1. Tan G, Wan Z, Wang Y, Cao Y, Zha ZJ. Tackling Event-Based Lip-Reading by Exploring Multigrained Spatiotemporal Clues. IEEE Transactions on Neural Networks and Learning Systems 2025;36:8279-8291. PMID: 39288038. DOI: 10.1109/tnnls.2024.3440495.
Abstract
Automatic lip-reading (ALR) is the task of recognizing words from visual information about the speaker's lip movements. In this study, we introduce event cameras, a novel type of sensing device, for ALR. Event cameras offer both technical and application advantages over conventional cameras for ALR owing to their higher temporal resolution, less redundant visual information, and lower power consumption. To recognize words from event data, we propose a novel multigrained spatiotemporal feature-learning framework that can perceive fine-grained spatiotemporal features from microsecond time-resolved event data. Specifically, we first convert the event data into event frames at multiple temporal resolutions to avoid losing too much visual information at the event-representation stage. These frames are then fed into a multibranch subnetwork in which the branch operating on low-rate frames perceives spatially complete but temporally coarse features, while the branch operating on high-rate frames perceives spatially coarse but temporally fine features. Fine-grained spatial and temporal features can thus be learned simultaneously by integrating the features perceived by the different branches. Furthermore, to model the temporal relationships in the event stream, we design a temporal aggregation subnetwork that aggregates the features perceived by the multibranch subnetwork. In addition, we collect two event-based lip-reading datasets (DVS-Lip and DVS-LRW100) for the study of this task. Experimental results demonstrate the superiority of the proposed model over state-of-the-art event-based action recognition models and video-based lip-reading models.
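As an illustration of the multi-temporal-resolution event-frame representation described in the abstract, the following is a minimal sketch, not the authors' implementation; the (x, y, t, p) event layout, the 128x128 sensor size, and the name `events_to_frames` are assumptions made for the example.

```python
import numpy as np

def events_to_frames(events, sensor_hw, num_bins):
    """Accumulate an (N, 4) array of (x, y, t, p) events into `num_bins` frames.

    Each event adds its polarity (+1/-1) to the pixel of the frame whose
    time bin covers its timestamp.
    """
    H, W = sensor_hw
    frames = np.zeros((num_bins, H, W), dtype=np.float32)
    t = events[:, 2]
    # Normalize timestamps into [0, num_bins) and clip the last event into the final bin.
    bins = ((t - t.min()) / (t.max() - t.min() + 1e-9) * num_bins).astype(int)
    bins = np.clip(bins, 0, num_bins - 1)
    x, y = events[:, 0].astype(int), events[:, 1].astype(int)
    np.add.at(frames, (bins, y, x), events[:, 3])
    return frames

# The same clip at three temporal granularities: few bins -> spatially
# dense but temporally coarse; many bins -> spatially sparse but fine.
rng = np.random.default_rng(0)
ev = np.column_stack([rng.integers(0, 128, 10000),   # x
                      rng.integers(0, 128, 10000),   # y
                      np.sort(rng.random(10000)),    # t (normalized)
                      rng.choice([-1, 1], 10000)])   # polarity
stacks = {r: events_to_frames(ev, (128, 128), r) for r in (7, 14, 28)}
```

A multibranch network of the kind the abstract describes would consume one stack per branch and fuse the branch features afterward.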
2. Li J, Fu Y, Dong S, Yu Z, Huang T, Tian Y. Asynchronous Spatiotemporal Spike Metric for Event Cameras. IEEE Transactions on Neural Networks and Learning Systems 2023;34:1742-1753. PMID: 33684047. DOI: 10.1109/tnnls.2021.3061122.
Abstract
Event cameras, as bioinspired vision sensors, have shown great advantages in high dynamic range and high temporal resolution for vision tasks. Asynchronous spikes from event cameras can be described using marked spatiotemporal point processes (MSTPPs). However, how to measure the distance between asynchronous spikes in MSTPPs remains an open issue. To address this problem, we propose a general asynchronous spatiotemporal spike metric that considers both spatiotemporal structural properties and polarity attributes for event cameras. Technically, a conditional probability density function is first introduced to describe the spatiotemporal distribution and polarity prior in the MSTPPs. A spatiotemporal Gaussian kernel is then defined to capture the spatiotemporal structure, transforming discrete spikes into a continuous function in a reproducing kernel Hilbert space (RKHS). Finally, the distance between asynchronous spikes can be quantified by the inner product in the RKHS. Experimental results demonstrate that the proposed approach outperforms state-of-the-art methods and achieves a significant improvement in computational efficiency. In particular, it better depicts changes involving spatiotemporal structural properties and polarity attributes.
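The kernel inner product described in the abstract can be sketched densely as follows; this is one illustrative reading, not the paper's asynchronous implementation, and the kernel bandwidths and function names are assumptions.

```python
import numpy as np

def spike_inner(a, b, sigma_s=2.0, sigma_t=1e3):
    """Kernel inner product of two event sets in an RKHS.

    `a` and `b` are (N, 4) arrays of (x, y, t, p). The Gaussian kernel
    factors into spatial and temporal parts; polarities weight each pair.
    """
    d_xy = ((a[:, None, :2] - b[None, :, :2]) ** 2).sum(-1)   # (N, M)
    d_t = (a[:, None, 2] - b[None, :, 2]) ** 2                # (N, M)
    k = np.exp(-d_xy / (2 * sigma_s ** 2) - d_t / (2 * sigma_t ** 2))
    return float((a[:, 3:4] * b[None, :, 3] * k).sum())

def spike_distance(a, b, **kw):
    # ||a - b||^2 in the RKHS expands into three inner products.
    return spike_inner(a, a, **kw) + spike_inner(b, b, **kw) - 2 * spike_inner(a, b, **kw)
```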
3. Tan X, Xiang C, Cao J, Xu W, Wen G, Rutkowski L. Synchronization of Neural Networks via Periodic Self-Triggered Impulsive Control and Its Application in Image Encryption. IEEE Transactions on Cybernetics 2022;52:8246-8257. PMID: 33531321. DOI: 10.1109/tcyb.2021.3049858.
Abstract
In this article, a periodic self-triggered impulsive (PSTI) control scheme is proposed to achieve synchronization of neural networks (NNs). Two kinds of impulsive gains, with constant and random values, are considered, and the corresponding synchronization criteria are obtained using tools from impulsive control, event-driven control theory, and stability analysis. The designed triggering protocol is simpler, easier to implement, and more flexible than previously reported algorithms, as it combines the advantages of periodic sampling and event-driven control. In addition, chaotic synchronization of NNs via the presented PSTI sampling is applied to image encryption. Several examples illustrate the validity of the presented PSTI-based synchronization algorithm and its potential applications in image processing.
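A generic drive-response formulation of impulsive synchronization, written here only to fix ideas (it is not the paper's exact model): the error e(t) = y(t) - x(t) between two NNs evolves continuously between impulse instants t_k and is contracted by the impulsive gain at each instant.

```latex
% Generic drive-response impulsive synchronization sketch (assumed form,
% not the paper's exact equations): C is the self-feedback matrix, A the
% connection matrix, f the activation, mu_k the impulsive gain.
\begin{aligned}
\dot{e}(t) &= -C\,e(t) + A\bigl(f(y(t)) - f(x(t))\bigr), && t \neq t_k,\\
e(t_k^{+}) &= (1 + \mu_k)\, e(t_k^{-}),                  && k \in \mathbb{N}.
\end{aligned}
```

Synchronization then hinges on the impulsive gains satisfying |1 + mu_k| < 1 strongly enough to offset the error growth between impulses; the self-triggered part checks a triggering condition only at periodic sampling instants to decide when the next impulse fires.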
4.
Abstract
A large number of intelligent models for masked face recognition (MFR) have recently been presented and applied in various fields, such as masked face tracking for people's safety or secure authentication. Exceptional hazards such as pandemics and fraud have noticeably accelerated the creation and sharing of relevant algorithms, which has introduced new challenges. Recognizing and authenticating people wearing masks will therefore remain a long-standing research area, and more efficient methods are needed for real-time MFR. Machine learning has made progress in MFR and has significantly facilitated the intelligent process of detecting and authenticating persons with occluded faces. This survey organizes and reviews recent work on MFR based on deep learning techniques, providing insights and a thorough discussion of the development pipeline of MFR systems. State-of-the-art techniques are introduced according to the characteristics of deep network architectures and deep feature extraction strategies. The common benchmarking datasets and evaluation metrics used in the field of MFR are also discussed, and many challenges and promising research directions are highlighted. This comprehensive study considers a wide variety of recent approaches and achievements, aiming to shape a global view of the field of MFR.
5. Steffen L, Elfgen M, Ulbrich S, Roennau A, Dillmann R. A Benchmark Environment for Neuromorphic Stereo Vision. Front Robot AI 2021;8:647634. PMID: 34095240. PMCID: PMC8170485. DOI: 10.3389/frobt.2021.647634.
Abstract
Without neuromorphic hardware, artificial stereo vision suffers from high resource demands and processing times that impede real-time capability. This is mainly caused by high frame rates, a quality feature for conventional cameras, which generate large amounts of redundant data. Neuromorphic visual sensors generate less redundant and more relevant data, solving the issues of over- and undersampling at the same time. However, they require a rethinking of processing, as established techniques in conventional stereo vision do not exploit the potential of their event-based operating principle. Many alternatives have recently been proposed but have yet to be evaluated on a common data basis. We propose a benchmark environment offering the methods and tools to compare different algorithms for depth reconstruction from two event-based sensors. To this end, we present an experimental setup consisting of two event-based sensors and one depth sensor, together with a framework enabling synchronized, calibrated data recording. Furthermore, we define metrics enabling a meaningful comparison of the examined algorithms, covering aspects such as performance, precision, and applicability. To evaluate the benchmark, a stereo matching algorithm was implemented as a testing candidate, and multiple experiments with different settings and camera parameters were carried out. This work provides a foundation for a robust and flexible evaluation of the multitude of new techniques for event-based stereo vision, allowing a meaningful comparison.
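One common testing-candidate style for such a benchmark is temporal-coincidence matching on rectified event streams; the sketch below is a generic example under assumed (x, y, t, p) integer event arrays, not the specific algorithm evaluated in the paper.

```python
import numpy as np

def match_events(left, right, max_disp=60, t_win=500):
    """Temporal-coincidence stereo matching on rectified event streams.

    `left` and `right` are (N, 4) integer arrays of (x, y, t, p), with t in
    microseconds. For each left event, candidate right events share the row
    and polarity, lie within `t_win` of its timestamp, and yield a disparity
    in [0, max_disp]; the temporally closest candidate wins.
    """
    matches = []
    for x, y, t, p in left:
        cand = right[(right[:, 1] == y) & (right[:, 3] == p)]
        cand = cand[np.abs(cand[:, 2] - t) < t_win]
        disp = x - cand[:, 0]
        keep = (disp >= 0) & (disp <= max_disp)
        cand, disp = cand[keep], disp[keep]
        if len(cand):
            best = np.argmin(np.abs(cand[:, 2] - t))
            matches.append((x, y, disp[best]))
    return np.array(matches)   # rows of (x, y, disparity)
```

Benchmark metrics of the kind the abstract names would then score the recovered disparities against the depth sensor's ground truth.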
Affiliation(s)
- L. Steffen
- Interactive Diagnosis and Service Systems (IDS), Intelligent Systems and Production Engineering (ISPE), FZI Research Center for Information Technology, Karlsruhe, Germany
6. Ge S, Zhang C, Li S, Zeng D, Tao D. Cascaded Correlation Refinement for Robust Deep Tracking. IEEE Transactions on Neural Networks and Learning Systems 2021;32:1276-1288. PMID: 32305944. DOI: 10.1109/tnnls.2020.2984256.
Abstract
Recent deep trackers have shown superior performance in visual tracking. In this article, we propose a cascaded correlation refinement approach to improve the robustness of deep tracking. The core idea is to address accurate target localization and reliable model update in a collaborative way. To this end, our approach cascades multiple stages of correlation refinement to progressively refine target localization, so that the localized object can be used to learn an accurate on-the-fly model that improves the reliability of model update. Meanwhile, we introduce an explicit measure to identify tracking failure and then leverage a simple yet effective look-back scheme that adaptively combines the initial model and the on-the-fly model to update the tracking model. As a result, the tracking model can localize the target more accurately. Extensive experiments on OTB2013, OTB2015, VOT2016, VOT2018, UAV123, and GOT-10k demonstrate that the proposed tracker achieves the best robustness compared with state-of-the-art trackers.
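A minimal sketch of the cascaded-correlation idea, coarse-to-fine localization with a peak-score failure flag; the FFT correlation, window radii, and threshold are illustrative assumptions, not the paper's learned network.

```python
import numpy as np

def correlate(window, template):
    """Circular cross-correlation via FFT; the peak is the target offset."""
    f = np.fft.fft2(window) * np.conj(np.fft.fft2(template, s=window.shape))
    score = np.fft.ifft2(f).real
    dy, dx = np.unravel_index(np.argmax(score), score.shape)
    return dy, dx, score.max()

def cascaded_localize(search, templates, radii, fail_thresh=0.1):
    """Coarse-to-fine cascade: each stage correlates within a window centred
    on the previous stage's estimate; a low final peak score flags likely
    tracking failure (the cue for a look-back fallback to the initial model).
    Templates are assumed no larger than their stage's window.
    """
    cy, cx = search.shape[0] // 2, search.shape[1] // 2
    score = 0.0
    for tmpl, r in zip(templates, radii):          # radii shrink stage by stage
        y0, x0 = max(cy - r, 0), max(cx - r, 0)
        win = search[y0:y0 + 2 * r, x0:x0 + 2 * r]
        dy, dx, score = correlate(win, tmpl)
        cy, cx = y0 + dy, x0 + dx
    return (cy, cx), score, bool(score < fail_thresh)
```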
7.
8. Jiang R, Mou X, Shi S, Zhou Y, Wang Q, Dong M, Chen S. Object tracking on event cameras with offline–online learning. CAAI Transactions on Intelligence Technology 2020. DOI: 10.1049/trit.2019.0107.
Affiliation(s)
- Rui Jiang, Xiaozheng Mou, Shunshun Shi, Yueyin Zhou, Qinyi Wang, Meng Dong, Shoushun Chen
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075, Singapore (all authors)
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore (Shoushun Chen)
9. Gong M, Feng J, Xie Y. Privacy-enhanced multi-party deep learning. Neural Netw 2019;121:484-496. PMID: 31648120. DOI: 10.1016/j.neunet.2019.10.001.
Abstract
In multi-party deep learning, multiple participants jointly train a deep learning model through a central server to achieve common objectives without sharing their private data. Recently, significant progress has been made on the privacy issues of this emerging paradigm. In this paper, we focus on two problems in multi-party deep learning. The first is that most existing works cannot simultaneously defend against attacks by honest-but-curious participants and an honest-but-curious server without a manager trusted by all participants. To tackle this problem, we design a privacy-enhanced multi-party deep learning framework that integrates differential privacy and homomorphic encryption to prevent potential privacy leakage to other participants and to the central server, without requiring a manager that all participants trust. The second problem is that existing frameworks consume a high total privacy budget when applying differential privacy, which leads to a high risk of privacy leakage. To alleviate this problem, we propose three strategies for dynamically allocating the privacy budget at each epoch to further strengthen privacy guarantees without compromising model utility. Moreover, the framework provides participants with an intuitive handle to strike a balance between the privacy level and training efficiency by choosing different strategies. Both analytical and experimental evaluations demonstrate the promising performance of the proposed framework.
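The paper's three allocation strategies are not reproduced here; the sketch below shows generic per-epoch budget schedules under simple composition, plus the standard Gaussian-mechanism noise scale, as one plausible reading of the idea. All names and the decay rate are assumptions.

```python
import numpy as np

def uniform_budget(total_eps, epochs):
    # Baseline schedule: spend the same epsilon at every epoch.
    return np.full(epochs, total_eps / epochs)

def decaying_budget(total_eps, epochs, rate=0.9):
    # Alternative schedule: spend more epsilon (less noise) in early epochs,
    # decaying geometrically; weights are normalized to sum to the total.
    w = rate ** np.arange(epochs)
    return total_eps * w / w.sum()

def gaussian_sigma(eps, delta, sensitivity=1.0):
    # Standard Gaussian-mechanism noise scale for one (eps, delta)-DP release.
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

# Under simple composition the per-epoch epsilons add up, so either schedule
# exhausts exactly the same total budget while trading noise across epochs.
for eps_t in decaying_budget(total_eps=8.0, epochs=5):
    sigma = gaussian_sigma(eps_t, delta=1e-5)
    # ... add N(0, sigma^2) noise to the clipped gradient update here ...
```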
Affiliation(s)
- Maoguo Gong, Jialun Feng, Yu Xie
- School of Electronic Engineering, Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an, Shaanxi Province 710071, China
10. Noda A, Tabata S, Ishikawa M, Yamakawa Y. Synchronized High-Speed Vision Sensor Network for Expansion of Field of View. Sensors 2018;18(4):1276. PMID: 29690512. PMCID: PMC5948748. DOI: 10.3390/s18041276.
Abstract
We propose a 500-frames-per-second high-speed vision (HSV) sensor network that acquires frames at timings precisely synchronized across the network. Multiple vision sensor nodes, each comprising a camera and a PC, are connected via Ethernet for data transmission and clock synchronization. A network of synchronized HSV sensors provides a significantly expanded field of view compared with that of each individual sensor. In the proposed system, the shutter of each camera is controlled based on the clock of the PC local to the node, and the shutters are globally synchronized using the Precision Time Protocol (PTP) over the network. A theoretical analysis and experimental results indicate that the shutter trigger skew among nodes is at most a few tens of microseconds, significantly smaller than the frame interval of 1000-fps-class high-speed cameras. Experiments with a four-node system demonstrated the ability to capture the propagation of a small displacement along a large-scale structure.
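The period-boundary trigger alignment implied by the abstract can be sketched as follows; the microsecond clock accessor and function names are assumptions, and a real node would read its PTP-disciplined system clock rather than a local monotonic clock.

```python
import time

FRAME_PERIOD_US = 2_000   # 500 fps -> one global shutter trigger every 2 ms

def next_trigger_us(now_us, period_us=FRAME_PERIOD_US):
    """Next shutter instant on the shared (PTP-disciplined) timeline.

    Every node rounds the common clock up to the same period boundary,
    so all shutters fire together with no per-frame messaging.
    """
    return (now_us // period_us + 1) * period_us

def wait_for_trigger(clock_us=lambda: time.monotonic_ns() // 1000):
    # Stand-in clock for illustration; substitute the PTP-synchronized clock.
    target = next_trigger_us(clock_us())
    while clock_us() < target:   # spin-wait keeps skew below scheduler jitter
        pass
    return target
```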
Affiliation(s)
- Akihito Noda
- Department of Mechatronics, Nanzan University, 18 Yamazato-cho, Showa-ku, Nagoya 466-8673, Japan
- Satoshi Tabata
- Department of Information Physics and Computing, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Masatoshi Ishikawa
- Department of Creative Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Yuji Yamakawa
- Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan