1. Pathan RK, Biswas M, Yasmin S, Khandaker MU, Salman M, Youssef AAF. Sign language recognition using the fusion of image and hand landmarks through multi-headed convolutional neural network. Sci Rep 2023;13:16975. PMID: 37813932; PMCID: PMC10562485; DOI: 10.1038/s41598-023-43852-x.
Abstract
Sign language recognition is a breakthrough for communication with the deaf-mute community and has been a critical research topic for years. Although some previous studies have successfully recognized sign language, they require costly instruments, including sensors, devices, and high-end processing power. Such drawbacks, however, can be overcome by employing artificial intelligence-based techniques. Since capturing video or images with a camera is easy in this era of advanced mobile technology, this study demonstrates a cost-effective technique to detect American Sign Language (ASL) using an image dataset. Here, the "Finger Spelling, A" dataset has been used, covering 24 letters (excluding j and z, as they involve motion). The main reason for using this dataset is that its images have complex backgrounds with varying environments and scene colors. Two layers of image processing have been used: in the first layer, images are processed as a whole for training, and in the second layer, the hand landmarks are extracted. A multi-headed convolutional neural network (CNN) model has been proposed to train on these two layers and tested with 30% of the dataset. To avoid overfitting, data augmentation and dynamic learning rate reduction have been used. With the proposed model, 98.981% test accuracy has been achieved. This study may thus help to develop an efficient human-machine communication system for the deaf-mute community.
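As a rough illustration of the two-headed fusion described above, here is a minimal Keras sketch; the input shapes, layer sizes, and the 21-point landmark format are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a two-headed CNN fusing a whole-image branch with a
# hand-landmark branch (all shapes and sizes are illustrative assumptions).
import tensorflow as tf
from tensorflow.keras import layers, Model

image_in = tf.keras.Input(shape=(64, 64, 3))            # whole-image head
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)

landmark_in = tf.keras.Input(shape=(21, 2))             # assumed 21 (x, y) hand landmarks
y = layers.Flatten()(landmark_in)
y = layers.Dense(64, activation="relu")(y)

z = layers.concatenate([x, y])                          # fuse the two heads
out = layers.Dense(24, activation="softmax")(z)         # 24 static ASL letters
model = Model([image_in, landmark_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```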
Affiliation(s)
- Refat Khan Pathan
- Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, 47500, Bandar Sunway, Selangor, Malaysia
- Munmun Biswas
- Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong, 4381, Bangladesh
- Suraiya Yasmin
- Department of Computer and Information Science, Graduate School of Engineering, Tokyo University of Agriculture and Technology, Koganei, Tokyo, 184-0012, Japan
- Mayeen Uddin Khandaker
- Centre for Applied Physics and Radiation Technologies, School of Engineering and Technology, Sunway University, 47500, Bandar Sunway, Selangor, Malaysia
- Faculty of Graduate Studies, Daffodil International University, Daffodil Smart City, Birulia, Savar, Dhaka, 1216, Bangladesh
- Mohammad Salman
- College of Engineering and Technology, American University of the Middle East, Egaila, Kuwait
- Ahmed A F Youssef
- College of Engineering and Technology, American University of the Middle East, Egaila, Kuwait
2. Carfi A, Mastrogiovanni F. Gesture-Based Human-Machine Interaction: Taxonomy, Problem Definition, and Analysis. IEEE Trans Cybern 2023;53:497-513. PMID: 34910648; DOI: 10.1109/tcyb.2021.3129119.
Abstract
The possibility for humans to interact with physical or virtual systems using gestures has been vastly explored by researchers and designers over the last 20 years to provide new and intuitive interaction modalities. Unfortunately, the literature on gestural interaction is not homogeneous and is characterized by a lack of shared terminology. This leads to fragmented results and makes it difficult for research activities to build on state-of-the-art results and approaches. The analysis in this article aims at creating a common conceptual design framework to support development efforts in gesture-based human-machine interaction (HMI). The main contributions of this article can be summarized as follows: 1) we provide a broad definition for the notion of functional gesture in HMI; 2) we design a flexible and expandable gesture taxonomy; and 3) we put forward a detailed problem statement for gesture-based HMI. Finally, to support our main contribution, this article presents and analyzes the 83 most pertinent articles, classified on the basis of our taxonomy and problem statement.
3. Wu Z, Zhang H, Lin Y, Li G, Wang M, Tang Y. LIAF-Net: Leaky Integrate and Analog Fire Network for Lightweight and Efficient Spatiotemporal Information Processing. IEEE Trans Neural Netw Learn Syst 2022;33:6249-6262. PMID: 33979292; DOI: 10.1109/tnnls.2021.3073016.
Abstract
Spiking neural networks (SNNs) based on the leaky integrate and fire (LIF) model have been applied to energy-efficient temporal and spatiotemporal processing tasks. Thanks to its bioplausible neuronal dynamics and simplicity, a LIF-based SNN benefits from event-driven processing but usually suffers from reduced performance. This may be because, in a LIF-SNN, the neurons transmit information only via spikes. To address this issue, in this work we propose a leaky integrate and analog fire (LIAF) neuron model, so that analog values can be transmitted among neurons, and build on it a deep network termed LIAF-Net for efficient spatiotemporal processing. In the temporal domain, LIAF follows the traditional LIF dynamics to maintain its temporal processing capability. In the spatial domain, LIAF is able to integrate spatial information through convolutional or fully connected integration. As a spatiotemporal layer, LIAF can also be used jointly with traditional artificial neural network (ANN) layers. In addition, the network can be trained directly with backpropagation through time (BPTT), which avoids the performance loss caused by ANN-to-SNN conversion. Experimental results indicate that LIAF-Net achieves performance comparable to the gated recurrent unit (GRU) and long short-term memory (LSTM) on bAbI question answering (QA) tasks, and achieves state-of-the-art performance on spatiotemporal dynamic vision sensor (DVS) datasets, including MNIST-DVS, CIFAR10-DVS, and DVS128 Gesture, with far fewer synaptic weights and much lower computational overhead than traditional networks built with LSTM, GRU, convolutional LSTM (ConvLSTM), or 3-D convolution (Conv3D). Compared with a traditional LIF-SNN, LIAF-Net also shows dramatic accuracy gains on all these experiments. In conclusion, LIAF-Net provides a framework combining the advantages of both ANNs and SNNs for lightweight and efficient spatiotemporal information processing.
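The LIF-versus-LIAF distinction is compact to state in code. The following NumPy sketch uses assumed discrete-time dynamics and illustrative constants; it is not the paper's implementation.

```python
import numpy as np

def step(v, x, tau=0.9, v_th=1.0, analog_fire=False):
    """One discrete time step of a layer of leaky integrate(-and-fire) neurons.
    v: membrane potentials; x: input currents."""
    v = tau * v + x                        # leaky integration
    spike = (v >= v_th).astype(float)      # threshold crossing
    if analog_fire:
        out = np.maximum(v, 0.0)           # LIAF: transmit an analog value (ReLU of v)
    else:
        out = spike                        # LIF: transmit binary spikes
    v = v * (1.0 - spike)                  # reset fired neurons
    return v, out

v = np.zeros(4)
for t in range(10):
    v, out = step(v, np.random.rand(4), analog_fire=True)
```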
4. Zhang S, Wang W, Li H, Zhang S. EVtracker: An Event-Driven Spatiotemporal Method for Dynamic Object Tracking. Sensors (Basel) 2022;22:6090. PMID: 36015851; PMCID: PMC9414578; DOI: 10.3390/s22166090.
Abstract
An event camera is a novel bio-inspired sensor that effectively compensates for the shortcomings of current frame cameras, which include high latency, low dynamic range, motion blur, etc. Rather than capturing images at a fixed frame rate, an event camera produces an asynchronous signal by measuring the brightness change of each pixel. Consequently, an appropriate algorithm framework that can handle the unique data types of event-based vision is required. In this paper, we propose a dynamic object tracking framework using an event camera to achieve long-term stable tracking of event objects. One of the key novel features of our approach is to adopt an adaptive strategy that adjusts the spatiotemporal domain of event data. To achieve this, we reconstruct event images from high-speed asynchronous streaming data via online learning. Additionally, we apply the Siamese network to extract features from event data. In contrast to earlier models that only extract hand-crafted features, our method provides powerful feature description and a more flexible reconstruction strategy for event data. We assess our algorithm in three challenging scenarios: 6-DoF (six degrees of freedom), translation, and rotation. Unlike fixed cameras in traditional object tracking tasks, all three tracking scenarios involve the simultaneous violent rotation and shaking of both the camera and objects. Results from extensive experiments suggest that our proposed approach achieves superior accuracy and robustness compared to other state-of-the-art methods. Without reducing time efficiency, our novel method exhibits a 30% increase in accuracy over other recent models. Furthermore, results indicate that event cameras are capable of robust object tracking, which is a task that conventional cameras cannot adequately perform, especially for super-fast motion tracking and challenging lighting situations.
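A core step mentioned above is turning the asynchronous stream into frames over a spatiotemporal window. A minimal fixed-window accumulation sketch follows; the field names, sensor resolution, and window length are assumptions, and the paper's online-learned adaptive window is omitted.

```python
import numpy as np

def events_to_frame(events, shape=(260, 346), window=10e-3):
    """Accumulate events (structured array with fields t in s, x, y, p = +/-1)
    from the most recent time window into a signed 2-D frame."""
    t_end = events["t"].max()
    sel = events["t"] >= t_end - window
    frame = np.zeros(shape, dtype=np.float32)
    np.add.at(frame, (events["y"][sel], events["x"][sel]), events["p"][sel])
    return frame
```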
5. Gallego G, Delbruck T, Orchard G, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison AJ, Conradt J, Daniilidis K, Scaramuzza D. Event-Based Vision: A Survey. IEEE Trans Pattern Anal Mach Intell 2022;44:154-180. PMID: 32750812; DOI: 10.1109/tpami.2020.3008413.
Abstract
Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those demanding low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
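The per-pixel brightness-change principle summarized in the first sentence can be captured in a few lines. This sketch simulates event generation from a log-intensity video; the contrast threshold and simple loop structure are illustrative assumptions.

```python
import numpy as np

def emit_events(log_frames, times, C=0.2):
    """log_frames: (T, H, W) log-intensity video; returns (t, x, y, sign) events."""
    ref = log_frames[0].copy()                   # per-pixel reference level
    events = []
    for k in range(1, len(log_frames)):
        diff = log_frames[k] - ref
        ys, xs = np.nonzero(np.abs(diff) >= C)   # pixels whose change exceeds C
        for y, x in zip(ys, xs):
            sign = 1 if diff[y, x] > 0 else -1
            events.append((times[k], x, y, sign))
            ref[y, x] += sign * C                # move reference toward new level
    return events
```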
6. Ge Z, Gao Y, So HKH, Lam EY. Event-based laser speckle correlation for micro motion estimation. Opt Lett 2021;46:3885-3888. PMID: 34388766; DOI: 10.1364/ol.430419.
Abstract
Micro motion estimation has important applications in various fields such as microfluidic particle detection and biomedical cell imaging. Conventional methods analyze the motion from intensity images captured using frame-based imaging sensors such as the complementary metal-oxide semiconductor (CMOS) and the charge-coupled device (CCD). Recently, event-based sensors have evolved with the special capability to record asynchronous light changes with high dynamic range, high temporal resolution, low latency, and no motion blur. In this Letter, we explore the potential of using an event sensor to estimate micro motion based on the laser speckle correlation technique.
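For orientation, the classical frame-based speckle-correlation step works as sketched below: the displacement between two speckle patterns is read off the peak of their phase correlation. This is a conventional stand-in for context, not the event-based formulation of the Letter.

```python
import numpy as np

def estimate_shift(a, b):
    """Estimate the integer (dy, dx) shift of pattern b relative to a,
    where b[n] ~ a[n - d], via the phase-correlation peak."""
    F = np.conj(np.fft.fft2(a)) * np.fft.fft2(b)
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    if dy > a.shape[0] // 2: dy -= a.shape[0]    # wrap to signed shifts
    if dx > a.shape[1] // 2: dx -= a.shape[1]
    return dy, dx
```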
7. Tayarani-Najaran MH, Schmuker M. Event-Based Sensing and Signal Processing in the Visual, Auditory, and Olfactory Domain: A Review. Front Neural Circuits 2021;15:610446. PMID: 34135736; PMCID: PMC8203204; DOI: 10.3389/fncir.2021.610446.
Abstract
The nervous system converts the physical quantities sensed by its primary receptors into trains of events that are then processed in the brain. This unmatched efficiency in information processing has long inspired engineers to seek brain-like approaches to sensing and signal processing. The key principle pursued in neuromorphic sensing is to shed the traditional approach of periodic sampling in favor of an event-driven scheme that mimics sampling as it occurs in the nervous system, where events are preferably emitted upon a change of the sensed stimulus. In this paper we highlight the advantages and challenges of event-based sensing and signal processing in the visual, auditory, and olfactory domains. We also provide a survey of the literature covering neuromorphic sensing and signal processing in all three modalities. Our aim is to facilitate research in event-based sensing and signal processing by providing a comprehensive overview of the research performed previously, as well as by highlighting conceptual advantages, current progress, and future challenges in the field.
Affiliation(s)
- Michael Schmuker
- School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, United Kingdom
8. Wang G, Zhang C, Chen X, Ji X, Xue JH, Wang H. Bi-Stream Pose-Guided Region Ensemble Network for Fingertip Localization From Stereo Images. IEEE Trans Neural Netw Learn Syst 2020;31:5153-5165. PMID: 32070999; DOI: 10.1109/tnnls.2020.2964037.
Abstract
In human-computer interaction, it is important to accurately estimate the hand pose, especially the fingertips. However, traditional approaches to fingertip localization mainly rely on depth images and thus suffer considerably from noise and missing values. Instead of depth images, stereo images can also provide 3-D information about hands. There are nevertheless limitations on the dataset size, global viewpoints, hand articulations, and hand shapes in publicly available stereo-based hand pose datasets. To mitigate these limitations and promote further research on hand pose estimation from stereo images, we build a new large-scale binocular hand pose dataset called THU-Bi-Hand, offering a new perspective for fingertip localization. The THU-Bi-Hand dataset contains 447k pairs of stereo images of different hand shapes from ten subjects, with accurate 3-D location annotations of the wrist and five fingertips. Captured with minimal restriction on the range of hand motion, the dataset covers a large global viewpoint space and hand articulation space. To better present the performance of fingertip localization on THU-Bi-Hand, we propose a novel scheme termed bi-stream pose-guided region ensemble network (Bi-Pose-REN). It extracts more representative feature regions around joints in the feature maps under the guidance of the previously estimated pose. The feature regions are integrated hierarchically according to the topology of hand joints to regress a refined hand pose. Bi-Pose-REN and several existing methods are evaluated on THU-Bi-Hand to provide benchmarks for further research. Experimental results show that Bi-Pose-REN achieves the best performance on THU-Bi-Hand.
9. Howell J, Hammarton TC, Altmann Y, Jimenez M. High-speed particle detection and tracking in microfluidic devices using event-based sensing. Lab Chip 2020;20:3024-3035. PMID: 32700715; DOI: 10.1039/d0lc00556h.
Abstract
Visualising fluids and particles within channels is a key element of microfluidic work. Current imaging methods for particle image velocimetry often require expensive high-speed cameras with powerful illuminating sources, potentially limiting accessibility. This study explores, for the first time, the potential of an event-based camera for particle and fluid behaviour characterisation in a microfluidic system. Event-based cameras have the unique capacity to detect light intensity changes asynchronously and to record spatial and temporal information with low latency, low power, and high dynamic range. Event-based cameras could consequently be relevant for detecting light intensity changes due to moving particles, chemical reactions, or the intake of fluorescent dyes by cells, to mention a few. As a proof of principle, event-based sensing was tested in this work to detect particles of 1 μm and 10 μm diameter flowing in a microfluidic channel at average fluid velocities of up to 1.54 m s⁻¹. Importantly, experiments were performed by directly connecting the camera to a standard fluorescence microscope, relying only on the microscope arc lamp for illumination. We present a data processing strategy that allows particle detection and tracking in both bright-field and fluorescence imaging. Detection was achieved up to a fluid velocity of 1.54 m s⁻¹ and tracking up to 0.4 m s⁻¹, suggesting that event-based cameras could offer a paradigm shift in microscopic imaging.
Affiliation(s)
- Jessie Howell
- Biomedical Engineering Division, James Watt School of Engineering, University of Glasgow, Glasgow, G12 8LT, UK
10. Maro JM, Ieng SH, Benosman R. Event-Based Gesture Recognition With Dynamic Background Suppression Using Smartphone Computational Capabilities. Front Neurosci 2020;14:275. PMID: 32327968; PMCID: PMC7160298; DOI: 10.3389/fnins.2020.00275.
Abstract
In this paper, we introduce a framework for dynamic gesture recognition with background suppression operating on the output of a moving event-based camera. The system is developed to operate in real time using only the computational capabilities of a mobile phone. It introduces a new development around the concept of time-surfaces. It also presents a novel event-based methodology to dynamically remove backgrounds, exploiting the high temporal resolution of event-based cameras. To our knowledge, this is the first Android event-based framework for vision-based recognition of dynamic gestures running on a smartphone without off-board processing. We assess performance in several scenarios, indoors and outdoors, under static and dynamic conditions and uncontrolled lighting. We also introduce a new event-based dataset for gesture recognition with static and dynamic backgrounds (made publicly available). The set of gestures was selected following a clinical trial to allow human-machine interaction for the visually impaired and older adults. Finally, we report comparisons with prior work on event-based gesture recognition, obtaining comparable results without the use of advanced classification techniques or power-greedy hardware.
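A time-surface, the concept this framework builds on, maps the recency of neighboring events into a local patch descriptor. A minimal sketch follows; the patch radius and time constant are assumed values, and border handling is omitted.

```python
import numpy as np

def time_surface(last_t, t, x, y, radius=3, tau=50e-3):
    """last_t: (H, W) timestamps of the most recent event per pixel
    (initialize to -inf); returns the decayed patch around event (t, x, y).
    Assumes the event lies at least `radius` pixels inside the border."""
    patch = last_t[y - radius:y + radius + 1, x - radius:x + radius + 1]
    return np.exp(-(t - patch) / tau)   # ~1 for very recent neighbors, ~0 otherwise
```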
Affiliation(s)
- Sio-Hoi Ieng
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
- Ryad Benosman
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
- Departments of Ophthalmology/ECE/BioE, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Computer Science, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
11. Heidarpur M, Khosravifar P, Ahmadi A, Ahmadi M. CORDIC-Astrocyte: Tripartite Glutamate-IP3-Ca2+ Interaction Dynamics on FPGA. IEEE Trans Biomed Circuits Syst 2020;14:36-47. PMID: 31751284; DOI: 10.1109/tbcas.2019.2953631.
Abstract
Real-time, large-scale simulation of biological systems is challenging due to the different types of nonlinear functions describing biochemical reactions in cells. The promise of high speed, cost-effectiveness, and power efficiency, in addition to parallel processing, has made application-specific hardware an attractive simulation platform. This paper proposes high-speed and low-cost digital hardware to emulate a biologically plausible astrocyte and glutamate-release mechanism. The nonlinear terms of these models were calculated using a high-precision and cost-effective algorithm. Subsequently, the modified models were simulated to study and validate their functions. We developed several hardware versions by setting different constraints to investigate trade-offs and find the best possible design. FPGA implementation results confirmed the ability of the design to emulate biological cell behaviours in detail with high accuracy. As for performance, the proposed design turned out to be faster and more efficient than previously published works targeting digital hardware for biologically plausible astrocytes.
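The "CORDIC" in the title refers to evaluating nonlinear functions using only shifts and adds, which is what makes such designs FPGA-friendly. The rotation-mode sketch below computes sin/cos as a minimal example of the technique; the paper applies the same iteration family to its model's nonlinear terms, which this sketch does not reproduce.

```python
import math

def cordic_sin_cos(theta, n=24):
    """Approximate (sin, cos) of theta via CORDIC rotation mode
    (converges for |theta| <= ~1.74 rad)."""
    angles = [math.atan(2.0 ** -i) for i in range(n)]
    K = 1.0
    for i in range(n):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))   # pre-scale by the rotation gain
    x, y, z = K, 0.0, theta
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0                   # rotate toward the target angle
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * angles[i])
    return y, x                                       # (sin, cos)
```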
12. Chen G, Chen J, Lienen M, Conradt J, Röhrbein F, Knoll AC. FLGR: Fixed Length Gists Representation Learning for RNN-HMM Hybrid-Based Neuromorphic Continuous Gesture Recognition. Front Neurosci 2019;13:73. PMID: 30809114; PMCID: PMC6380225; DOI: 10.3389/fnins.2019.00073.
Abstract
A neuromorphic vision sensor is a novel, passive, frameless sensing modality with several advantages over conventional cameras. Frame-based cameras have an average frame rate of 30 fps, causing motion blur when capturing fast motion, e.g., hand gestures. Rather than wastefully sending entire images at a fixed frame rate, neuromorphic vision sensors only transmit the local pixel-level changes induced by movement in a scene, when they occur. This leads to advantageous characteristics including low energy consumption, high dynamic range, a sparse event stream, and low response latency. In this study, a novel representation learning method is proposed: Fixed Length Gists Representation (FLGR) learning for event-based gesture recognition. Previous methods accumulate events into video frames over a time window (e.g., 30 ms) to form an image-level representation. However, the accumulated-frame-based representation forgoes the event-driven paradigm that makes neuromorphic vision sensors attractive. New representations are needed to fill the gap of non-accumulated-frame-based representations and exploit the further capabilities of neuromorphic vision. The proposed FLGR is a sequence learned by a mixture density autoencoder that better preserves the nature of event-based data. FLGR has a fixed-length data format and is easy to feed to a sequence classifier. Moreover, an RNN-HMM hybrid is proposed to address the continuous gesture recognition problem: a recurrent neural network (RNN) is applied for FLGR sequence classification, while a hidden Markov model (HMM) is employed to localize candidate gestures and improve results on continuous sequences. A neuromorphic continuous hand gesture dataset (Neuro ConGD Dataset) with 17 hand gesture classes was developed for the neuromorphic research community. We hope FLGR can inspire studies on event-based, highly efficient, high-speed, and high-dynamic-range sequence classification tasks.
Affiliation(s)
- Guang Chen
- College of Automotive Engineering, Tongji University, Shanghai, China
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Technische Universität München, Munich, Germany
- Jieneng Chen
- College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Marten Lienen
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Technische Universität München, Munich, Germany
- Jörg Conradt
- Department of Computational Science and Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Florian Röhrbein
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Technische Universität München, Munich, Germany
- Alois C Knoll
- Chair of Robotics, Artificial Intelligence and Real-time Systems, Technische Universität München, Munich, Germany
13. Afshar S, Hamilton TJ, Tapson J, van Schaik A, Cohen G. Investigation of Event-Based Surfaces for High-Speed Detection, Unsupervised Feature Extraction, and Object Recognition. Front Neurosci 2019;12:1047. PMID: 30705618; PMCID: PMC6344467; DOI: 10.3389/fnins.2018.01047.
Abstract
In this work, we investigate event-based feature extraction through a rigorous testing framework. We test a hardware-efficient variant of Spike Timing Dependent Plasticity (STDP) on a range of spatio-temporal kernels with different surface decaying methods, decay functions, receptive field sizes, feature numbers, and back-end classifiers. This detailed investigation can provide helpful insights and rules of thumb for performance-versus-complexity trade-offs in more generalized networks, especially in the context of hardware implementation, where design choices can incur significant resource costs. The investigation is performed using a new dataset consisting of model airplanes dropped free-hand close to the sensor. The target objects exhibit a wide range of relative orientations and velocities. This range of target velocities, analyzed in multiple configurations, allows a rigorous comparison of time-based decaying surfaces (time surfaces) versus event-index-based decaying surfaces (index surfaces), which are used to perform unsupervised feature extraction, followed by target detection and recognition. We examine each processing stage by comparison to the use of raw events, as well as a range of alternative layer structures and the use of random features. By comparing results from a linear classifier and an ELM classifier, we evaluate how each element of the system affects accuracy. To generate time and index surfaces, the most commonly used kernels, namely event binning kernels and linearly and exponentially decaying kernels, are investigated. Index surfaces were found to outperform time surfaces in recognition when invariance to target velocity was made a requirement. In the investigation of network structure, larger networks of neurons with large receptive field sizes were found to perform best. We find that a small number of event-based feature extractors can project the complex spatio-temporal event patterns of the dataset to an almost linearly separable representation in feature space, with the best-performing linear classifier achieving 98.75% recognition accuracy using only 25 feature-extracting neurons.
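The time-surface versus index-surface contrast at the heart of this comparison is compact to state: one decays with elapsed time, the other with the count of subsequent events, which removes the dependence on target speed. The decay constants below are illustrative assumptions.

```python
import numpy as np

H, W = 128, 128
last_t = np.full((H, W), -np.inf)   # timestamp of the last event per pixel
last_n = np.full((H, W), -np.inf)   # global index of the last event per pixel

def update(x, y, t, n):
    last_t[y, x] = t
    last_n[y, x] = n

def surfaces(t_now, n_now, tau=0.05, k=1000.0):
    time_surf  = np.exp(-(t_now - last_t) / tau)   # decays with elapsed seconds
    index_surf = np.exp(-(n_now - last_n) / k)     # decays with elapsed event count
    return time_surf, index_surf
```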
Affiliation(s)
- Saeed Afshar
- Biomedical Engineering and Neuroscience Program, The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, NSW, Australia
- Tara Julia Hamilton
- Biomedical Engineering and Neuroscience Program, The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, NSW, Australia
- Jonathan Tapson
- Biomedical Engineering and Neuroscience Program, The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, NSW, Australia
- André van Schaik
- Biomedical Engineering and Neuroscience Program, The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, NSW, Australia
- Gregory Cohen
- Biomedical Engineering and Neuroscience Program, The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, NSW, Australia
14. Pfeiffer M, Pfeil T. Deep Learning With Spiking Neurons: Opportunities and Challenges. Front Neurosci 2018;12:774. PMID: 30410432; PMCID: PMC6209684; DOI: 10.3389/fnins.2018.00774.
Abstract
Spiking neural networks (SNNs) are inspired by information processing in biology, where sparse and asynchronous binary signals are communicated and processed in a massively parallel fashion. SNNs on neuromorphic hardware exhibit favorable properties such as low power consumption, fast inference, and event-driven information processing. This makes them interesting candidates for the efficient implementation of deep neural networks, the method of choice for many machine learning tasks. In this review, we address the opportunities that deep spiking networks offer and investigate in detail the challenges associated with training SNNs in a way that makes them competitive with conventional deep learning while allowing for efficient mapping to hardware. A wide range of training methods for SNNs is presented, ranging from the conversion of conventional deep networks into SNNs and constrained training before conversion, to spiking variants of backpropagation and biologically motivated variants of STDP. The goal of our review is to define a categorization of SNN training methods and to summarize their advantages and drawbacks. We further discuss relationships between SNNs and binary networks, which are becoming popular for efficient digital hardware implementation. Neuromorphic hardware platforms have great potential to enable deep spiking networks in real-world applications. We compare the suitability of various neuromorphic systems developed over the past years and investigate potential use cases. Neuromorphic approaches and conventional machine learning should not be considered simply two solutions to the same classes of problems; instead, it is possible to identify and exploit their task-specific advantages. Deep SNNs offer great opportunities to work with new types of event-based sensors and to exploit temporal codes and local on-chip learning; we have so far just scratched the surface of realizing these advantages in practical applications.
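The conversion route mentioned above rests on a simple correspondence: the firing rate of an integrate-and-fire neuron with soft reset approximates a ReLU activation. A minimal numerical check of this idea (constants are illustrative assumptions):

```python
import numpy as np

def if_rate(drive, steps=1000, v_th=1.0):
    """Empirical firing rate of a non-leaky IF neuron under constant drive."""
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += drive
        if v >= v_th:
            spikes += 1
            v -= v_th              # soft reset preserves rate coding
    return spikes / steps

for a in [0.0, 0.1, 0.3, 0.7]:
    print(a, if_rate(a))           # rate ~= ReLU(a) for 0 <= a <= 1
```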
Affiliation(s)
- Michael Pfeiffer
- Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany
15. Rebecq H, Gallego G, Mueggler E, Scaramuzza D. EMVS: Event-Based Multi-View Stereo—3D Reconstruction with an Event Camera in Real-Time. Int J Comput Vis 2017. DOI: 10.1007/s11263-017-1050-6.
16. Rivera-Acosta M, Ortega-Cisneros S, Rivera J, Sandoval-Ibarra F. American Sign Language Alphabet Recognition Using a Neuromorphic Sensor and an Artificial Neural Network. Sensors (Basel) 2017;17:2176. PMID: 28937644; PMCID: PMC5677181; DOI: 10.3390/s17102176.
Abstract
This paper reports the design and analysis of an American Sign Language (ASL) alphabet translation system implemented in hardware using a Field-Programmable Gate Array (FPGA). The system consists of three stages, the first being communication with the neuromorphic camera (also called a Dynamic Vision Sensor, DVS) over the Universal Serial Bus protocol. The second stage is feature extraction from the events generated by the DVS, presenting the digital image processing algorithms developed in software, which aim to reduce redundant information and prepare the data for the third stage. The last stage is classification of the ASL alphabet, achieved with a single artificial neural network implemented in digital hardware for higher speed. The overall result is a classification system based on the contours of the ASL signs, fully implemented in a reconfigurable device. The experimental results include a comparative analysis of the recognition rate among the alphabet signs captured with the neuromorphic camera, to verify the proper operation of the digital image processing algorithms. In experiments performed with 720 samples of 24 signs, a recognition accuracy of 79.58% was obtained.
Affiliation(s)
- Miguel Rivera-Acosta
- Advanced Studies and Research Center (CINVESTAV), National Polytechnic Institute (IPN), Zapopan 45019, Mexico
- Susana Ortega-Cisneros
- Advanced Studies and Research Center (CINVESTAV), National Polytechnic Institute (IPN), Zapopan 45019, Mexico
- Jorge Rivera
- CONACYT-Advanced Studies and Research Center (CINVESTAV), National Polytechnic Institute (IPN), Zapopan 45019, Mexico
- Federico Sandoval-Ibarra
- Advanced Studies and Research Center (CINVESTAV), National Polytechnic Institute (IPN), Zapopan 45019, Mexico
17. Coronary Heart Disease Preoperative Gesture Interactive Diagnostic System Based on Augmented Reality. J Med Syst 2017;41:126. PMID: 28718051; DOI: 10.1007/s10916-017-0768-6.
Abstract
Coronary heart disease preoperative diagnosis plays an important role in the treatment of vascular interventional surgery. In practice, most doctors locate a vascular stenosis and then empirically estimate its severity from selective coronary angiography images, rather than using a mouse, keyboard, and computer during preoperative diagnosis. This diagnostic workflow lacks intuitive, natural interaction, and its results are not accurate enough. To address these problems, a coronary heart disease preoperative gesture-interactive diagnostic system based on augmented reality is proposed. The system uses a Leap Motion Controller to capture hand gesture video sequences and extracts features, namely the position and orientation vectors of the gesture motion trajectory and changes in hand shape. The training plane is determined by the K-means algorithm, and the effect of gesture training is improved by using multiple features and multiple observation sequences. Gesture reusability is improved by establishing a state transition model. Algorithm efficiency is improved by gesture prejudgment, which applies threshold discrimination before recognition. The integrity of the trajectory is preserved, and the gesture motion space is extended, by employing a spatial rotation transformation of the gesture manipulation plane. Ultimately, gesture recognition based on SRT-HMM is realized. The diagnosis and measurement of vascular stenosis are realized intuitively and naturally by operating and measuring a coronary artery model with augmented reality and gesture interaction techniques. The gesture recognition experiments demonstrate the discrimination and generalization ability of the algorithm, and the gesture interaction experiments demonstrate the usability and reliability of the system.
18. Peng X, Zhao B, Yan R, Tang H, Yi Z. Bag of Events: An Efficient Probability-Based Feature Extraction Method for AER Image Sensors. IEEE Trans Neural Netw Learn Syst 2017;28:791-803. PMID: 28113870; DOI: 10.1109/tnnls.2016.2536741.
Abstract
Address event representation (AER) image sensors represent visual information as a sequence of events denoting the luminance changes of the scene. In this paper, we introduce a feature extraction method for AER image sensors based on probability theory, namely bag of events (BOE). The proposed approach represents each object as the joint probability distribution of concurrent events, where each event corresponds to a unique activated pixel of the AER sensor. The advantages of BOE include: 1) it is a statistical learning method with good mathematical interpretability; 2) it significantly reduces the effort of tuning parameters for different datasets, because it has only one hyperparameter and is robust to its value; 3) it is an online learning algorithm that does not require the training data to be collected in advance; 4) it can achieve competitive results in real time for feature extraction (>275 frames/s and >120,000 events/s); and 5) its implementation complexity involves only basic operations, e.g., addition and multiplication, which guarantees the hardware friendliness of our method. Experimental results on three popular AER databases (i.e., MNIST-dynamic vision sensor, Poker Card, and Posture) show that our method is remarkably faster than two recently proposed AER categorization systems while preserving good classification accuracy.
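In the spirit of the description above, BOE can be read as a bag-of-words histogram with one "word" per activated pixel. A minimal sketch follows; the smoothing constant stands in for the method's single hyperparameter and is an assumption, not the paper's exact estimator.

```python
import numpy as np

def boe_feature(events_xy, shape=(28, 28), alpha=1.0):
    """events_xy: (N, 2) integer (x, y) event addresses;
    returns a flattened probability vector over pixels."""
    counts = np.zeros(shape)
    np.add.at(counts, (events_xy[:, 1], events_xy[:, 0]), 1)
    counts = counts.ravel() + alpha    # additive smoothing
    return counts / counts.sum()       # empirical joint event distribution
```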
19. Clady X, Maro JM, Barré S, Benosman RB. A Motion-Based Feature for Event-Based Pattern Recognition. Front Neurosci 2017;10:594. PMID: 28101001; PMCID: PMC5209354; DOI: 10.3389/fnins.2016.00594.
Abstract
This paper introduces an event-based, luminance-free feature computed from the output of asynchronous event-based neuromorphic retinas. The feature consists of mapping the distribution of the optical flow along the contours of the moving objects in the visual scene into a matrix. Asynchronous event-based neuromorphic retinas are composed of autonomous pixels, each of them asynchronously generating "spiking" events that encode relative changes in the pixel's illumination at high temporal resolution. The optical flow is computed at each event and integrated, locally or globally, on a grid defined in a speed-direction coordinate frame, using speed-tuned temporal kernels. The latter ensure that the resulting feature equitably represents the distribution of the normal motion along the current moving edges, whatever their respective dynamics. The usefulness and generality of the proposed feature are demonstrated in pattern recognition applications: local corner detection and global gesture recognition.
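A stripped-down version of the feature reads as follows: per-event flow vectors are binned into a speed-direction matrix. The bin counts are assumed values, and the paper's speed-tuned temporal kernels are deliberately omitted for brevity.

```python
import numpy as np

def motion_feature(flows, n_dir=8, n_speed=6, v_max=2.0):
    """flows: (N, 2) per-event optical-flow vectors (vx, vy);
    returns a normalized direction-by-speed histogram."""
    vx, vy = flows[:, 0], flows[:, 1]
    speed = np.hypot(vx, vy)
    direction = np.arctan2(vy, vx)                                  # [-pi, pi]
    d_bin = ((direction + np.pi) / (2 * np.pi) * n_dir).astype(int) % n_dir
    s_bin = np.clip((speed / v_max * n_speed).astype(int), 0, n_speed - 1)
    hist = np.zeros((n_dir, n_speed))
    np.add.at(hist, (d_bin, s_bin), 1)
    return hist / max(len(flows), 1)
```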
Affiliation(s)
- Xavier Clady
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06, Paris, France
- Jean-Matthieu Maro
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06, Paris, France
- Sébastien Barré
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06, Paris, France
- Ryad B Benosman
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06, Paris, France
20. Lagorce X, Ieng SH, Clady X, Pfeiffer M, Benosman RB. Spatiotemporal features for asynchronous event-based data. Front Neurosci 2015;9:46. PMID: 25759637; PMCID: PMC4338664; DOI: 10.3389/fnins.2015.00046.
Abstract
Bio-inspired asynchronous event-based vision sensors are currently introducing a paradigm shift in visual information processing. These new sensors rely on a stimulus-driven principle of light acquisition similar to biological retinas. They are event-driven and fully asynchronous, thereby reducing redundancy and encoding exact times of input signal changes, leading to a very precise temporal resolution. Approaches for higher-level computer vision often rely on the reliable detection of features in visual frames, but similar definitions of features for the novel dynamic and event-based visual input representation of silicon retinas have so far been lacking. This article addresses the problem of learning and recognizing features for event-based vision sensors, which capture properties of truly spatiotemporal volumes of sparse visual event information. A novel computational architecture for learning and encoding spatiotemporal features is introduced based on a set of predictive recurrent reservoir networks, competing via winner-take-all selection. Features are learned in an unsupervised manner from real-world input recorded with event-based vision sensors. It is shown that the networks in the architecture learn distinct and task-specific dynamic visual features, and can predict their trajectories over time.
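One predictive reservoir of the kind described above can be sketched as an echo-state network whose linear readout is trained to predict its input one step ahead. Sizes, the spectral radius, and the ridge readout are assumptions; the winner-take-all competition between reservoirs is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 16                                   # reservoir size, input dimension
W_in = rng.normal(0.0, 0.5, (N, D))
W = rng.normal(0.0, 1.0, (N, N))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()    # spectral radius < 1 (echo state)

def run(inputs):
    """inputs: (T, D) feature stream; returns reservoir states (T, N)."""
    x, states = np.zeros(N), []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)
        states.append(x)
    return np.array(states)

U = rng.normal(size=(500, D))                    # stand-in event-feature stream
X, Y = run(U)[:-1], U[1:]                        # train readout to predict next input
W_out = np.linalg.solve(X.T @ X + 1e-2 * np.eye(N), X.T @ Y)   # ridge regression
```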
Affiliation(s)
- Xavier Lagorce
- Equipe de Vision et Calcul Naturel, UMR S968 Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique UMR 7210, Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, Université Pierre et Marie Curie, Paris, France
- Sio-Hoi Ieng
- Equipe de Vision et Calcul Naturel, UMR S968 Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique UMR 7210, Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, Université Pierre et Marie Curie, Paris, France
- Xavier Clady
- Equipe de Vision et Calcul Naturel, UMR S968 Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique UMR 7210, Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, Université Pierre et Marie Curie, Paris, France
- Michael Pfeiffer
- Institute of Neuroinformatics, University of Zürich and Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
- Ryad B. Benosman
- Equipe de Vision et Calcul Naturel, UMR S968 Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique UMR 7210, Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, Université Pierre et Marie Curie, Paris, France