1. Tan G, Wan Z, Wang Y, Cao Y, Zha ZJ. Tackling Event-Based Lip-Reading by Exploring Multigrained Spatiotemporal Clues. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:8279-8291. PMID: 39288038. DOI: 10.1109/tnnls.2024.3440495.
Abstract
Automatic lip-reading (ALR) is the task of recognizing words from visual information about the speaker's lip movements. In this study, we introduce event cameras, a novel type of sensing device, for ALR. Event cameras offer both technical and application advantages over conventional cameras for ALR due to their higher temporal resolution, less redundant visual information, and lower power consumption. To recognize words from the event data, we propose a novel multigrained spatiotemporal feature learning framework, which is capable of perceiving fine-grained spatiotemporal features from microsecond time-resolved event data. Specifically, we first convert the event data into event frames at multiple temporal resolutions to avoid losing too much visual information at the event representation stage. These frames are then fed into a multibranch subnetwork in which the branch operating on low-rate frames perceives spatially complete but temporally coarse features, while the branch operating on high-rate frames perceives spatially coarse but temporally fine features. Fine-grained spatial and temporal features can thus be learned simultaneously by integrating the features perceived by the different branches. Furthermore, to model the temporal relationships in the event stream, we design a temporal aggregation subnetwork to aggregate the features perceived by the multibranch subnetwork. In addition, we collect two event-based lip-reading datasets (DVS-Lip and DVS-LRW100) for the study of this task. Experimental results demonstrate the superiority of the proposed model over state-of-the-art event-based action recognition models and video-based lip-reading models.
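
To make the multiresolution representation concrete, here is a minimal sketch of converting a raw event stream into event frames at several temporal resolutions. It assumes events arrive as (x, y, t, p) tuples with t in microseconds relative to the window start; the function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def events_to_frames(events, sensor_hw, window_us, n_bins):
    """Accumulate (x, y, t, p) events into n_bins event frames
    spanning a fixed time window; a finer n_bins gives a higher frame rate."""
    h, w = sensor_hw
    frames = np.zeros((n_bins, 2, h, w), dtype=np.float32)  # 2 polarity channels
    bin_width = window_us / n_bins
    for x, y, t, p in events:
        b = min(int(t // bin_width), n_bins - 1)
        frames[b, int(p > 0), y, x] += 1.0
    return frames

# Two temporal resolutions of the same stream feed two branches:
# low-rate frames keep spatial detail, high-rate frames keep timing.
# low_res  = events_to_frames(events, (128, 128), window_us=400_000, n_bins=7)
# high_res = events_to_frames(events, (128, 128), window_us=400_000, n_bins=28)
```

Each branch of the multibranch subnetwork would then consume one of these frame stacks.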
2. Cao C, Fu X, Zhu Y, Sun Z, Zha ZJ. Event-Driven Video Restoration With Spiking-Convolutional Architecture. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:866-880. PMID: 37943649. DOI: 10.1109/tnnls.2023.3329741.
Abstract
With high temporal resolution, high dynamic range, and low latency, event cameras have enabled great progress in numerous low-level vision tasks. To help restore low-quality (LQ) video sequences, most existing event-based methods employ convolutional neural networks (CNNs) to extract sparse event features without considering the spatially sparse distribution of events or the temporal relations among neighboring events, which leads to insufficient use of the spatial and temporal information events carry. To address this problem, we propose a new spiking-convolutional network (SC-Net) architecture for event-driven video restoration. Specifically, to properly extract the rich temporal information contained in the event data, we use a spiking neural network (SNN), which suits the sparse characteristics of events and captures temporal correlation in neighboring regions; to make full use of the spatial consistency between events and frames, we adopt CNNs to transform sparse events into an extra brightness prior that is aware of detailed textures in the video sequences. In this way, both the temporal correlation among neighboring events and the mutual spatial information between the two types of features are fully explored and exploited to accurately restore detailed textures and sharp edges. The effectiveness of the proposed network is validated on three representative video restoration tasks: deblurring, super-resolution, and deraining. Extensive experiments on synthetic and real-world benchmarks show that our method outperforms existing competing methods.
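
As a rough sketch of the two ingredients, the snippet below pairs a leaky integrate-and-fire (LIF) layer, which processes a sequence of event bins, with a small CNN that maps accumulated events to a brightness-like prior. Shapes, constants, and module names are assumptions for illustration, not the paper's SC-Net; in practice a surrogate gradient would replace the non-differentiable threshold during training.

```python
import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    """Leaky integrate-and-fire over a sequence of event bins (T, B, C, H, W)."""
    def __init__(self, decay=0.8, v_th=1.0):
        super().__init__()
        self.decay, self.v_th = decay, v_th

    def forward(self, x_seq):
        v = torch.zeros_like(x_seq[0])
        spikes = []
        for x in x_seq:                      # iterate over time bins
            v = self.decay * v + x           # leaky integration of input current
            s = (v >= self.v_th).float()     # fire where the threshold is crossed
            v = v * (1.0 - s)                # hard reset of fired neurons
            spikes.append(s)
        return torch.stack(spikes)

# A plain CNN branch can meanwhile map accumulated events (2 polarity channels)
# to a single-channel brightness-like prior:
cnn_prior = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))
```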
3. Adhuran J, Khan N, Martini MG. Lossless Encoding of Time-Aggregated Neuromorphic Vision Sensor Data Based on Point-Cloud Compression. Sensors (Basel, Switzerland) 2024; 24:1382. PMID: 38474918. DOI: 10.3390/s24051382.
Abstract
Neuromorphic Vision Sensors (NVSs) are emerging sensors that acquire visual information asynchronously, whenever changes occur in the scene. Their advantages over synchronous capture (frame-based video) include low power consumption, a high dynamic range, extremely high temporal resolution, and lower data rates. Although this acquisition strategy already yields much lower data rates than conventional video, NVS data can be compressed further. For this purpose, we recently proposed Time Aggregation-based Lossless Video Encoding for Neuromorphic Vision Sensor Data (TALVEN), which consists of time aggregation of NVS events in the form of pixel-based event histograms, arrangement of the data in a specific format, and lossless compression inspired by video encoding. In this paper, we again leverage time aggregation but, rather than encoding in a manner inspired by frame-based video coding, we encode an appropriate representation of the time-aggregated data via point-cloud compression (similar to one of our previous works, in which time aggregation was not used). The proposed strategy, Time-Aggregated Lossless Encoding of Events based on Point-Cloud Compression (TALEN-PCC), outperforms the original TALVEN encoding strategy on the content of the considered dataset. The gain in compression ratio is highest for low-event-rate, low-complexity scenes and minimal for high-complexity, high-event-rate scenes. In experiments on outdoor and indoor spike event data, TALEN-PCC achieves higher compression gains for time aggregation intervals above 5 ms, whereas its gains are lower than those of state-of-the-art approaches for intervals below 5 ms.
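
The representation step can be sketched as follows: events are aggregated into per-pixel histograms over fixed intervals, and each occupied (pixel, interval, polarity) cell becomes a point whose count travels as an attribute. The exact layout and the point-cloud codec follow the paper; the code below is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def events_to_point_cloud(events, agg_interval_us):
    """Aggregate (x, y, t, p) events into per-pixel counts per time interval,
    then list occupied cells as points (x, y, bin, p) with count attributes."""
    hist = {}
    for x, y, t, p in events:
        key = (x, y, int(t // agg_interval_us), p)
        hist[key] = hist.get(key, 0) + 1
    # One point per occupied (pixel, interval, polarity) cell; the geometry
    # (x, y, bin) goes to the point-cloud codec, counts travel as attributes.
    return np.array([(x, y, b, p, c) for (x, y, b, p), c in hist.items()])

# cloud = events_to_point_cloud(events, agg_interval_us=5_000)  # 5 ms bins
# The cloud would then be fed to a lossless point-cloud codec (e.g., G-PCC).
```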
Affiliation(s)
- Jayasingam Adhuran, Faculty of Engineering, Computing, and the Environment, Kingston University London, Penrhyn Rd., Kingston upon Thames KT1 2EE, UK
- Nabeel Khan, Department of Computer Science, University of Chester, Parkgate Road, Chester CH1 4BJ, UK
- Maria G Martini, Faculty of Engineering, Computing, and the Environment, Kingston University London, Penrhyn Rd., Kingston upon Thames KT1 2EE, UK
4. Donati E, Valle G. Neuromorphic hardware for somatosensory neuroprostheses. Nature Communications 2024; 15:556. PMID: 38228580. PMCID: PMC10791662. DOI: 10.1038/s41467-024-44723-3.
Abstract
In individuals with sensory-motor impairments, missing limb functions can be restored using neuroprosthetic devices that interface directly with the nervous system. However, restoring the natural tactile experience through electrical neural stimulation requires complex encoding strategies, and current strategies are limited by bandwidth constraints in how effectively they can convey or restore tactile sensations. Neuromorphic technology, which mimics the natural behavior of neurons and synapses, holds promise for replicating the encoding of natural touch and could thereby inform neurostimulation design. In this perspective, we propose that incorporating neuromorphic technologies into neuroprostheses could be an effective approach to developing more natural human-machine interfaces, potentially improving device performance, acceptability, and embeddability. We also highlight ongoing challenges and the actions required to facilitate the future integration of these advanced technologies.
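
As a toy illustration of the encoding idea (not any specific device from the paper), a leaky integrate-and-fire neuron can turn a sampled pressure trace into a spike train whose rate tracks the stimulus; all constants and names here are illustrative assumptions.

```python
import numpy as np

def lif_encode(pressure, dt=1e-3, tau=0.02, gain=50.0, v_th=1.0):
    """Encode a sampled pressure trace into spike times with a
    leaky integrate-and-fire neuron (all constants illustrative)."""
    v, spikes = 0.0, []
    for i, p in enumerate(pressure):
        v += dt * (-v / tau + gain * p)   # leaky integration of the stimulus
        if v >= v_th:                     # threshold crossing emits a spike
            spikes.append(i * dt)
            v = 0.0                       # reset after firing
    return spikes

# A ramping touch yields an increasing spike rate, a candidate drive signal
# for biomimetic neurostimulation:
# spikes = lif_encode(np.linspace(0.0, 1.0, 500))
```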
Affiliation(s)
- Elisa Donati, Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Giacomo Valle, Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL, USA
5. Zhang P, Venketeswaran A, Wright RF, Lalam N, Sarcinelli E, Ohodnicki PR. Quasi-Distributed Fiber Sensor-Based Approach for Pipeline Health Monitoring: Generating and Analyzing Physics-Based Simulation Datasets for Classification. Sensors (Basel, Switzerland) 2023; 23:5410. PMID: 37420576. DOI: 10.3390/s23125410.
Abstract
This study presents a framework for detecting mechanical damage in pipelines, focusing on generating simulated data and sampling them so as to emulate the responses of distributed acoustic sensing (DAS) systems. The workflow transforms simulated ultrasonic guided wave (UGW) responses into DAS or quasi-DAS system responses to create a physically robust dataset for classifying pipeline events, including welds, clips, and corrosion defects. The investigation examines the effects of the sensing system and of noise on classification performance, emphasizing the importance of selecting a sensing system appropriate to the specific application. The framework demonstrates that deployments with different numbers of sensors remain robust at experimentally relevant noise levels, supporting its applicability in real-world scenarios where noise is present. Overall, this study contributes to a more reliable and effective method for detecting mechanical damage to pipelines by emphasizing the generation and use of simulated DAS system responses for pipeline classification. The results on how sensing systems and noise affect classification performance further strengthen the robustness and reliability of the framework.
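
A minimal sketch of the sampling step, under assumed names and shapes: given a simulated wavefield u(t, x) on a fine spatial grid, quasi-DAS channel responses can be approximated by differencing the field over a gauge length at each sensor position and adding noise at a target SNR. This illustrates the general idea, not the paper's actual pipeline.

```python
import numpy as np

def sample_quasi_das(wavefield, positions_m, dx_m, gauge_m, snr_db):
    """Sample a simulated wavefield u(t, x) at discrete sensor positions,
    approximate DAS strain as a spatial difference over the gauge length,
    and add white noise at a target SNR (all parameters illustrative)."""
    n = max(1, int(round(gauge_m / dx_m)))
    channels = []
    for pos in positions_m:
        i = int(round(pos / dx_m))
        strain = (wavefield[:, i + n] - wavefield[:, i]) / gauge_m
        p_sig = np.mean(strain**2)
        noise = np.random.randn(len(strain)) * np.sqrt(p_sig / 10**(snr_db / 10))
        channels.append(strain + noise)
    return np.stack(channels)   # (n_sensors, n_samples): one row per channel

# features = sample_quasi_das(u, positions_m=[1.0, 2.0, 4.0], dx_m=0.01,
#                             gauge_m=0.1, snr_db=20)
```

Sweeping the sensor count and the SNR then yields the noise-robustness comparison the abstract describes.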
Affiliation(s)
- Pengdi Zhang, Mechanical Engineering and Materials Science, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15261, USA
- Abhishek Venketeswaran, Mechanical Engineering and Materials Science, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15261, USA
- Ruishu F Wright, National Energy Technology Laboratory, 626 Cochrans Mill Road, Pittsburgh, PA 15236, USA
- Nageswara Lalam, National Energy Technology Laboratory, 626 Cochrans Mill Road, Pittsburgh, PA 15236, USA
- Enrico Sarcinelli, Mechanical Engineering and Materials Science, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15261, USA
- Paul R Ohodnicki, Mechanical Engineering and Materials Science, and Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15261, USA
6. Li J, Fu Y, Dong S, Yu Z, Huang T, Tian Y. Asynchronous Spatiotemporal Spike Metric for Event Cameras. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1742-1753. PMID: 33684047. DOI: 10.1109/tnnls.2021.3061122.
Abstract
Event cameras, as bioinspired vision sensors, have shown great advantages in high dynamic range and high temporal resolution for vision tasks. Asynchronous spikes from event cameras can be described using marked spatiotemporal point processes (MSTPPs). However, how to measure the distance between asynchronous spikes in MSTPPs remains an open issue. To address this problem, we propose a general asynchronous spatiotemporal spike metric that accounts for both spatiotemporal structural properties and polarity attributes of event cameras. Technically, a conditional probability density function is first introduced to describe the spatiotemporal distribution and polarity prior in the MSTPPs. In addition, a spatiotemporal Gaussian kernel is defined to capture the spatiotemporal structure; it transforms discrete spikes into a continuous function in a reproducing kernel Hilbert space (RKHS). Finally, the distance between asynchronous spikes can be quantified by the inner product in the RKHS. The experimental results demonstrate that the proposed approach outperforms state-of-the-art methods and achieves a significant improvement in computational efficiency. In particular, it better depicts changes involving spatiotemporal structural properties and polarity attributes.
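
The final distance computation can be illustrated directly: with a spatiotemporal Gaussian kernel k, the RKHS distance between two event sets A and B expands as ||f - g||^2 = <f, f> + <g, g> - 2<f, g>, each inner product being a double sum of kernel evaluations. The sketch below treats polarity as a multiplicative sign, which is a simplifying assumption; the paper's metric additionally uses a conditional probability density prior.

```python
import numpy as np

def st_kernel(e1, e2, sigma_xy=3.0, sigma_t=10e3):
    """Spatiotemporal Gaussian kernel between two events (x, y, t, p);
    polarity handled as a sign, a simplification of the paper's prior."""
    d_xy = ((e1[0] - e2[0])**2 + (e1[1] - e2[1])**2) / (2 * sigma_xy**2)
    d_t = (e1[2] - e2[2])**2 / (2 * sigma_t**2)
    return np.exp(-(d_xy + d_t)) * (1.0 if e1[3] == e2[3] else -1.0)

def inner(A, B, **kw):
    """<f, g> in the RKHS: double sum of kernel evaluations."""
    return sum(st_kernel(a, b, **kw) for a in A for b in B)

def spike_distance(A, B, **kw):
    """||f - g||^2 = <f, f> + <g, g> - 2<f, g>."""
    return inner(A, A, **kw) + inner(B, B, **kw) - 2 * inner(A, B, **kw)
```

The naive double sum is quadratic in the number of events; the paper's reported efficiency gains come from its own formulation, not from this sketch.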
7. Milde MB, Afshar S, Xu Y, Marcireau A, Joubert D, Ramesh B, Bethi Y, Ralph NO, El Arja S, Dennler N, van Schaik A, Cohen G. Neuromorphic Engineering Needs Closed-Loop Benchmarks. Frontiers in Neuroscience 2022; 16:813555. PMID: 35237122. PMCID: PMC8884247. DOI: 10.3389/fnins.2022.813555.
Abstract
Neuromorphic engineering aims to build (autonomous) systems by mimicking biological systems. It is motivated by the observation that biological organisms, from algae to primates, excel at sensing their environment and reacting promptly to their perils and opportunities. Furthermore, they do so more resiliently than our most advanced machines, at a fraction of the power consumption. It follows that the performance of neuromorphic systems should be evaluated in terms of real-time operation, power consumption, and resiliency to real-world perturbations and noise, using task-relevant evaluation metrics. Yet, following in the footsteps of conventional machine learning, most neuromorphic benchmarks rely on recorded datasets that promote sensing accuracy as the primary measure of performance. Sensing accuracy is but an arbitrary proxy for the system's actual goal: making a good decision in a timely manner. Moreover, static datasets hinder our ability to study and compare the closed-loop sensing and control strategies that are central to survival for biological organisms. This article makes the case for a renewed focus on closed-loop benchmarks involving real-world tasks. Such benchmarks will be crucial for developing and advancing neuromorphic intelligence. The shift toward dynamic, real-world benchmarking tasks should usher in richer, more resilient, and more robust artificially intelligent systems in the future.
8. Afshar S, Ralph N, Xu Y, Tapson J, van Schaik A, Cohen G. Event-Based Feature Extraction Using Adaptive Selection Thresholds. Sensors (Basel, Switzerland) 2020; 20:1600. PMID: 32183052. PMCID: PMC7146588. DOI: 10.3390/s20061600.
Abstract
Unsupervised feature extraction algorithms are among the most important building blocks in machine learning systems. These algorithms are often adapted to the event-based domain to perform online learning in neuromorphic hardware. However, because they were not designed for this purpose, such algorithms typically require significant simplification during implementation to meet hardware constraints, trading off performance. Furthermore, conventional feature extraction algorithms are not designed to generate the useful intermediary signals that become valuable in the context of neuromorphic hardware limitations. In this work, a novel event-based feature extraction method is proposed that addresses these issues. The algorithm operates via simple adaptive selection thresholds, which allow a simpler implementation of network homeostasis than previous works at the cost of a small amount of information loss, in the form of missed events that fall outside the selection thresholds. The behavior of the selection thresholds and the output of the network as a whole are shown to provide uniquely useful signals that indicate network weight convergence without the need to access the network weights. A novel heuristic method for network size selection is proposed that makes use of noise events and their feature representations. The use of selection thresholds is shown to produce network activation patterns that predict classification accuracy, allowing rapid evaluation and optimization of system parameters without running back-end classifiers. The feature extraction method is tested on both the N-MNIST (Neuromorphic-MNIST) benchmarking dataset and a dataset of airplanes passing through the field of view. Multiple configurations with different classifiers are tested, and the results quantify the performance gains at each processing stage.
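
A minimal sketch of one adaptive-selection-threshold update, in the spirit of the method (names and constants are illustrative, not the authors' code): each feature neuron holds a unit weight vector and its own threshold; the best-matching neuron above its threshold wins, adapts toward the input, and raises its threshold, while a miss lowers every threshold. The threshold trajectories themselves then provide the convergence signal the abstract describes.

```python
import numpy as np

def feast_step(patch, W, thresholds, eta=0.01, d_up=0.01, d_down=0.002):
    """One adaptive-selection-threshold update (a sketch, not the paper's code).
    patch: normalized event time-surface; W: (n_neurons, dim) unit features."""
    sims = W @ patch                           # cosine similarity to each feature
    eligible = sims >= thresholds              # only neurons above their own threshold
    if eligible.any():
        k = np.argmax(np.where(eligible, sims, -np.inf))
        W[k] = (1 - eta) * W[k] + eta * patch  # pull the winner toward the input
        W[k] /= np.linalg.norm(W[k])
        thresholds[k] += d_up                  # winner becomes more selective
        return k
    thresholds -= d_down                       # missed event: all neurons relax
    return None                                # event fell outside every threshold
```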