1
Lin P, Li C, Chen S, Huangfu J, Yuan W. Intelligent Gesture Recognition Based on Screen Reflectance Multi-Band Spectral Features. Sensors (Basel) 2024; 24:5519. PMID: 39275430; PMCID: PMC11398176; DOI: 10.3390/s24175519.
Abstract
Gesture-based human-computer interaction (HCI) with screens is a pivotal interaction method amid the trend toward digitalization. In this work, a gesture recognition method is proposed that combines multi-band spectral features with the spatial characteristics of screen-reflected light. Based on this method, a red-green-blue (RGB) three-channel spectral gesture recognition system has been developed, whose hardware consists of a display screen integrated with narrowband spectral receivers. During operation, light emitted from the screen is reflected by the gesturing hand and received by the narrowband spectral receivers. Receivers at various locations capture multiple narrowband spectra and convert them into light-intensity time series. The multi-narrowband spectral data combine multidimensional features from the frequency and spatial domains, enhancing classification capability. Based on the RGB three-channel spectral features, this work formulates an RGB multi-channel convolutional neural network-long short-term memory (CNN-LSTM) gesture recognition model, which achieves accuracies of 99.93% in darkness and 99.89% under illumination, indicating stable operation and accurate interaction across lighting conditions. The method can be widely applied to interaction with screens such as computers and mobile phones, facilitating more convenient and precise HCI.
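The abstract above describes a multi-channel CNN-LSTM over light-intensity time series. A minimal PyTorch sketch of that kind of architecture is shown below; the channel count, layer sizes, sequence length, and number of gesture classes are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a multi-channel CNN-LSTM classifier of the kind the abstract
# describes; layer sizes, sequence length, and class count are assumptions.
import torch
import torch.nn as nn

class MultiChannelCnnLstm(nn.Module):
    def __init__(self, n_channels=3, n_classes=6, hidden=64):
        super().__init__()
        # One small 1D-CNN feature extractor per spectral channel (R, G, B).
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            )
            for _ in range(n_channels)
        ])
        # LSTM models the temporal evolution of the fused per-channel features.
        self.lstm = nn.LSTM(input_size=32 * n_channels, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, channels, time) light-intensity series from the receivers.
        feats = [branch(x[:, c:c + 1, :]) for c, branch in enumerate(self.branches)]
        fused = torch.cat(feats, dim=1).transpose(1, 2)   # (batch, time', features)
        out, _ = self.lstm(fused)
        return self.head(out[:, -1, :])                   # logits per gesture class

logits = MultiChannelCnnLstm()(torch.randn(8, 3, 200))    # 8 samples, 200 time steps
```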
Affiliation(s)
- Peiying Lin
- School of Electrical and Information Engineering, Jiangsu University of Science and Technology, Zhangjiagang 215600, China
- Chenrui Li
- Laboratory of Applied Research on Electromagnetics, Zhejiang University, Hangzhou 310027, China
- Sijie Chen
- Laboratory of Applied Research on Electromagnetics, Zhejiang University, Hangzhou 310027, China
- Jiangtao Huangfu
- Laboratory of Applied Research on Electromagnetics, Zhejiang University, Hangzhou 310027, China
- Wei Yuan
- School of Electrical and Information Engineering, Jiangsu University of Science and Technology, Zhangjiagang 215600, China
2
Khetavath S, Sendhilkumar NC, Mukunthan P, Jana S, Gopalakrishnan S, Malliga L, Chand SR, Farhaoui Y. An Intelligent Heuristic Manta-Ray Foraging Optimization and Adaptive Extreme Learning Machine for Hand Gesture Image Recognition. Big Data Mining and Analytics 2023; 6:321-335. DOI: 10.26599/bdma.2022.9020036.
Affiliation(s)
- Seetharam Khetavath
- Department of Electronics and Communication Engineering, Chaitanya (Deemed to be University), Warangal 506001, India
- Navalpur Chinnappan Sendhilkumar
- Department of Electronics and Communication Engineering, Sri Indu College of Engineering & Technology, Sheriguda, Hyderabad 501510, India
- Pandurangan Mukunthan
- Department of Electronics and Communication Engineering, Sri Indu College of Engineering & Technology, Sheriguda, Hyderabad 501510, India
- Selvaganesan Jana
- Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai 600062, India
- Subburayalu Gopalakrishnan
- Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai 600062, India
- Lakshmanan Malliga
- Department of Electronics and Communication Engineering, Malla Reddy Engineering College for Women (Autonomous), Telangana 500100, India
- Sankuru Ravi Chand
- Department of Electronics and Communication Engineering, Nalla Narasimha Reddy Education Society's Group of Institutions - Integrated Campus, Hyderabad 500088, India
- Yousef Farhaoui
- STI Laboratory, the IDMS Team, Faculty of Sciences and Techniques, Moulay Ismail University of Meknès, Errachidia 52000, Morocco
3
John J, Deshpande S. Static hand gesture recognition using multi-dilated DenseNet-based deep learning architecture. The Imaging Science Journal 2023. DOI: 10.1080/13682199.2023.2179965.
Affiliation(s)
- Jogi John
- P.G. Department of Computer Science & Technology, D.C.P.E, Hanuman Vyayam Prasarak Mandal, Amravati University, Amravati, India
- Shrinivas Deshpande
- P.G. Department of Computer Science & Technology, D.C.P.E, Hanuman Vyayam Prasarak Mandal, Amravati University, Amravati, India
4
Huang G, Tran SN, Bai Q, Alty J. Real-time automated detection of older adults' hand gestures in home and clinical settings. Neural Comput Appl 2022; 35:8143-8156. PMID: 36532882; PMCID: PMC9741488; DOI: 10.1007/s00521-022-08090-8.
Abstract
There is an urgent need, accelerated by the COVID-19 pandemic, for methods that allow clinicians and neuroscientists to remotely evaluate hand movements. This would help detect and monitor degenerative brain disorders that are particularly prevalent in older adults. With the wide accessibility of computer cameras, a vision-based real-time hand gesture detection method would facilitate online assessments in home and clinical settings. However, motion blur is one of the most challenging problems when collecting data from fast-moving hands. The objective of this study was to develop a computer vision-based method that accurately detects older adults' hand gestures using video data collected in real-life settings. We invited adults over 50 years old to complete validated hand movement tests (fast finger tapping and hand opening-closing) at home or in clinic. Data were collected without researcher supervision via a website programme using standard laptop and desktop cameras. We processed and labelled images, split the data into training, validation, and testing sets, and then analysed how well different network structures detected hand gestures. We recruited 1,900 adults (age range 50-90 years) as part of the TAS Test project and developed UTAS7k, a new dataset of 7071 hand gesture images split 4:1 into clear and motion-blurred images. Our new network, RGRNet, achieved a mean average precision (mAP) of 0.782 on clear images, outperforming the state-of-the-art network structure (YOLOv5-P6, mAP 0.776), and an mAP of 0.771 on blurred images. A new robust real-time automated network that detects static gestures from a single camera, RGRNet, and a new database comprising the largest range of individual hands, UTAS7k, both show strong potential for medical and research applications. Supplementary Information: The online version contains supplementary material available at 10.1007/s00521-022-08090-8.
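The reported mAP figures rest on matching predicted boxes to ground-truth boxes by intersection-over-union (IoU). A minimal sketch of that scoring step is given below; the boxes and the 0.5 IoU threshold are illustrative assumptions, not the paper's evaluation code.

```python
# Sketch of detection scoring: predictions are matched to ground truth by IoU
# and counted as true/false positives; the example boxes are made up.
def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_detections(preds, gts, thr=0.5):
    # Greedy matching: each ground-truth box may be claimed by one prediction;
    # preds are assumed sorted by descending confidence.
    used, tp, fp = set(), 0, 0
    for p in preds:
        best = max(((iou(p, g), i) for i, g in enumerate(gts) if i not in used),
                   default=(0.0, None))
        if best[0] >= thr:
            tp += 1
            used.add(best[1])
        else:
            fp += 1
    fn = len(gts) - len(used)
    return tp, fp, fn

tp, fp, fn = match_detections([(10, 10, 50, 60), (200, 30, 240, 80)],
                              [(12, 11, 52, 58)])
print(tp, fp, fn)   # -> 1 1 0
```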
Affiliation(s)
- Guan Huang
- College of Sciences and Engineering, University of Tasmania, Sandy Bay, TAS 7005, Australia
- Son N. Tran
- College of Sciences and Engineering, University of Tasmania, Sandy Bay, TAS 7005, Australia
- Quan Bai
- College of Sciences and Engineering, University of Tasmania, Sandy Bay, TAS 7005, Australia
- Jane Alty
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, TAS 7000, Australia
- School of Medicine, University of Tasmania, Hobart, TAS 7000, Australia
5
Zhang C, Wang Z, An Q, Li S, Hoorfar A, Kou C. Clustering-Driven DGS-Based Micro-Doppler Feature Extraction for Automatic Dynamic Hand Gesture Recognition. Sensors (Basel) 2022; 22:8535. PMID: 36366232; PMCID: PMC9657879; DOI: 10.3390/s22218535.
Abstract
In this work, we propose a dynamic group sparsity (DGS)-based time-frequency feature extraction method for dynamic hand gesture recognition (HGR) using millimeter-wave radar sensors. Micro-Doppler signatures of hand gestures are both sparse and structured in the time-frequency domain, but previous studies have focused only on sparsity. We introduce a structured prior when modeling the micro-Doppler signatures to further enhance the gesture features. The time-frequency distributions of dynamic hand gestures are first modeled using a dynamic group sparse model, and a DGS-Subspace Pursuit (DGS-SP) algorithm is then used to extract the corresponding features. Finally, a support vector machine (SVM) classifier performs dynamic HGR on the extracted group-sparse micro-Doppler features. Experiments show that the proposed method improves recognition accuracy by 3.3% over the sparsity-only method and achieves better accuracy than a CNN-based method on a small dataset.
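The final stage described above is an SVM trained on the extracted group-sparse micro-Doppler features. A minimal scikit-learn sketch of that stage follows; the DGS-SP extraction itself is not reproduced, and the random features, label count, and SVM settings are placeholders.

```python
# Sketch of the final SVM classification stage on pre-extracted micro-Doppler
# features; the random feature vectors and labels below are placeholders only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))          # placeholder group-sparse feature vectors
y = rng.integers(0, 4, size=200)        # placeholder labels for 4 hand gestures

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```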
Affiliation(s)
- Chengjin Zhang
- Beijing Key Laboratory of Millimeter Wave and Terahertz Technology, Beijing Institute of Technology, Beijing 100081, China
- Zehao Wang
- Key Laboratory of Microwave Remote Sensing, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
- Qiang An
- Department of Biomedical Engineering, Fourth Military Medical University, Xi’an 710032, China
- Shiyong Li
- Beijing Key Laboratory of Millimeter Wave and Terahertz Technology, Beijing Institute of Technology, Beijing 100081, China
- Ahmad Hoorfar
- Antenna Research Laboratory, Center for Advanced Communications, Villanova University, Villanova, PA 19085, USA
- Chenxiao Kou
- Beijing Key Laboratory of Millimeter Wave and Terahertz Technology, Beijing Institute of Technology, Beijing 100081, China
6
Dynamic Hand Gesture Recognition for Smart Lifecare Routines via K-Ary Tree Hashing Classifier. Applied Sciences (Basel) 2022. DOI: 10.3390/app12136481.
Abstract
In the past few years, home appliances have been shaped by the latest technologies and changing consumer trends. One of the most desired gadgets today is a universal, gesture-based remote control, as hand gestures offer a natural way to operate home appliances. This paper presents a novel method of recognizing hand gestures for smart home appliances using imaging sensors. The proposed model consists of six steps. First, preprocessing is performed to de-noise the video frames and resize each frame to a fixed dimension. Second, the hand is detected using a single-shot-detector-based convolutional neural network (SSD-CNN) model. Third, landmarks are localized on the hand using the skeleton method. Fourth, features are extracted from point-based trajectories, frame differencing, orientation histograms, and 3D point clouds. Fifth, features are optimized using fuzzy logic, and finally, the H-Hash classifier is used to classify the hand gestures. The system is tested on two benchmark datasets, the IPN Hand dataset and the Jester dataset, achieving recognition accuracies of 88.46% and 87.69%, respectively. Users can control smart home appliances such as televisions, radios, air conditioners, and vacuum cleaners using the proposed system.
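As a concrete illustration of the first pipeline step listed above (de-noising each video frame and resizing it to a fixed dimension), here is a small OpenCV sketch; the target size, denoising parameters, and input file name are assumptions, not the authors' settings.

```python
# Sketch of frame preprocessing: non-local-means de-noising then resizing.
# The video path, target size, and filter parameters are illustrative only.
import cv2

def preprocess_frame(frame, size=(320, 240)):
    # De-noise the colour frame, then resize it to a fixed dimension.
    denoised = cv2.fastNlMeansDenoisingColored(frame, None, 10, 10, 7, 21)
    return cv2.resize(denoised, size, interpolation=cv2.INTER_AREA)

cap = cv2.VideoCapture("gesture_clip.mp4")   # hypothetical input video
ok, frame = cap.read()
if ok:
    clean = preprocess_frame(frame)
cap.release()
```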
7
Noreen I, Hamid M, Akram U, Malik S, Saleem M. Hand Pose Recognition Using Parallel Multi Stream CNN. Sensors (Basel) 2021; 21:8469. PMID: 34960562; PMCID: PMC8708730; DOI: 10.3390/s21248469.
Abstract
Recently, several computer applications, such as sign language recognition, robot control, games, appliance control, and smart surveillance, have provided operating modes based on pointing fingers, waving hands, and body movement instead of mouse, keyboard, audio, or touch input. With the increase of hand-pose-based applications, new challenges in this domain have also emerged. Support vector machines and neural networks have been used extensively in this domain with conventional RGB data, which is not sufficient for adequate performance. Recently, depth data have become popular because they give a better understanding of posture attributes. In this study, a multiple parallel-stream 2D CNN (two-dimensional convolutional neural network) model is proposed to recognize hand postures. The proposed model comprises multiple steps and layers to detect hand poses from image maps obtained from depth data. The hyperparameters of the proposed model are tuned through experimental analysis. Three publicly available benchmark datasets, Kaggle, First Person, and Dexter, are used independently to train and test the proposed approach. The accuracy of the proposed method is 99.99%, 99.48%, and 98% on the Kaggle, First Person, and Dexter hand posture datasets, respectively. The F1 and AUC scores obtained are also near-optimal. Comparative analysis shows that the proposed model outperforms previous state-of-the-art methods.
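A minimal PyTorch sketch of a parallel multi-stream 2D CNN of the kind described follows: each stream processes one image map derived from the depth data and the stream features are concatenated before classification. Stream count, map size, layer widths, and class count are illustrative assumptions.

```python
# Sketch of a parallel multi-stream 2D CNN with late feature fusion;
# all sizes are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

def stream():
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )

class ParallelStreamCNN(nn.Module):
    def __init__(self, n_streams=3, n_classes=10):
        super().__init__()
        self.streams = nn.ModuleList([stream() for _ in range(n_streams)])
        self.head = nn.Linear(32 * n_streams, n_classes)

    def forward(self, maps):
        # maps: (batch, n_streams, H, W), one single-channel image map per stream.
        feats = [s(maps[:, i:i + 1]) for i, s in enumerate(self.streams)]
        return self.head(torch.cat(feats, dim=1))

logits = ParallelStreamCNN()(torch.randn(4, 3, 64, 64))
```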
Affiliation(s)
- Iram Noreen
- Department of Computer Science, Lahore Campus, Bahria University, Islamabad 54000, Pakistan
- Muhammad Hamid
- Department of Statistics and Computer Science, University of Veterinary and Animal Sciences (UVAS), Lahore 54000, Pakistan
- Uzma Akram
- Department of Computer Science, Lahore Campus, Bahria University, Islamabad 54000, Pakistan
- Saadia Malik
- Department of Information Systems, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Muhammad Saleem
- Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
8
Gammulle H, Denman S, Sridharan S, Fookes C. TMMF: Temporal Multi-Modal Fusion for Single-Stage Continuous Gesture Recognition. IEEE Transactions on Image Processing 2021; 30:7689-7701. PMID: 34478365; DOI: 10.1109/TIP.2021.3108349.
Abstract
Gesture recognition is a much studied research area which has myriad real-world applications including robotics and human-machine interaction. Current gesture recognition methods have focused on recognising isolated gestures, and existing continuous gesture recognition methods are limited to two-stage approaches where independent models are required for detection and classification, with the performance of the latter being constrained by detection performance. In contrast, we introduce a single-stage continuous gesture recognition framework, called Temporal Multi-Modal Fusion (TMMF), that can detect and classify multiple gestures in a video via a single model. This approach learns the natural transitions between gestures and non-gestures without the need for a pre-processing segmentation step to detect individual gestures. To achieve this, we introduce a multi-modal fusion mechanism to support the integration of important information that flows from multi-modal inputs, and is scalable to any number of modes. Additionally, we propose Unimodal Feature Mapping (UFM) and Multi-modal Feature Mapping (MFM) models to map uni-modal features and the fused multi-modal features respectively. To further enhance performance, we propose a mid-point based loss function that encourages smooth alignment between the ground truth and the prediction, helping the model to learn natural gesture transitions. We demonstrate the utility of our proposed framework, which can handle variable-length input videos, and outperforms the state-of-the-art on three challenging datasets: EgoGesture, IPN hand and ChaLearn LAP Continuous Gesture Dataset (ConGD). Furthermore, ablation experiments show the importance of different components of the proposed framework.
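A minimal sketch of the fusion idea described above, per-modality feature sequences mapped to a common size and fused so that any number of modalities can be handled, is given below. The projection sizes, the recurrent fusion layer, and the class count are assumptions for illustration; this is not the TMMF implementation.

```python
# Sketch of modality-count-agnostic fusion of temporal feature sequences;
# the mapping and fusion layers here are simplified stand-ins.
import torch
import torch.nn as nn

class SimpleMultiModalFusion(nn.Module):
    def __init__(self, in_dims, common=64, n_classes=10):
        super().__init__()
        # One per-modality mapping, projecting each stream to a shared size.
        self.maps = nn.ModuleList([nn.Linear(d, common) for d in in_dims])
        self.fuse = nn.GRU(common * len(in_dims), 128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, streams):
        # streams: list of (batch, time, dim_i) feature sequences, one per modality.
        mapped = [m(s) for m, s in zip(self.maps, streams)]
        out, _ = self.fuse(torch.cat(mapped, dim=-1))
        return self.head(out[:, -1])

model = SimpleMultiModalFusion(in_dims=[512, 128])       # e.g. RGB + depth features
logits = model([torch.randn(2, 30, 512), torch.randn(2, 30, 128)])
```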
9
No Interface, No Problem: Gesture Recognition on Physical Objects Using Radar Sensing. Sensors (Basel) 2021; 21:5771. PMID: 34502662; PMCID: PMC8433657; DOI: 10.3390/s21175771.
Abstract
Physical objects are usually not designed with interaction capabilities to control digital content. Nevertheless, they provide an untapped source for interactions since every object could be used to control our digital lives. We call this the missing interface problem: Instead of embedding computational capacity into objects, we can simply detect users’ gestures on them. However, gesture detection on such unmodified objects has to date been limited in the spatial resolution and detection fidelity. To address this gap, we conducted research on micro-gesture detection on physical objects based on Google Soli’s radar sensor. We introduced two novel deep learning architectures to process range Doppler images, namely a three-dimensional convolutional neural network (Conv3D) and a spectrogram-based ConvNet. The results show that our architectures enable robust on-object gesture detection, achieving an accuracy of approximately 94% for a five-gesture set, surpassing previous state-of-the-art performance results by up to 39%. We also showed that the decibel (dB) Doppler range setting has a significant effect on system performance, as accuracy can vary up to 20% across the dB range. As a result, we provide guidelines on how to best calibrate the radar sensor.
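A minimal PyTorch sketch of a three-dimensional CNN over a stack of range-Doppler frames, in the spirit of the Conv3D architecture mentioned above, follows; kernel sizes, frame counts, and the five-class output head are illustrative assumptions.

```python
# Sketch of a 3D CNN over a sequence of range-Doppler frames; all sizes are
# assumptions, not the architecture evaluated in the paper.
import torch
import torch.nn as nn

conv3d_net = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),
    nn.MaxPool3d(2),
    nn.Conv3d(8, 16, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(16, 5),                      # e.g. a five-gesture set
)

# Input: (batch, channel, time, Doppler bins, range bins) stack of radar frames.
logits = conv3d_net(torch.randn(2, 1, 16, 32, 32))
```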
10
Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model. Applied Sciences (Basel) 2021. DOI: 10.3390/app11094164.
Abstract
Using gestures can help people with certain disabilities communicate with other people. This paper proposes a lightweight model based on the YOLO (You Only Look Once) v3 and DarkNet-53 convolutional neural networks for gesture recognition without additional preprocessing, image filtering, or image enhancement. The proposed model achieves high accuracy even in complex environments and successfully detects gestures even in low-resolution pictures. The model was evaluated on a labeled dataset of hand gestures in both Pascal VOC and YOLO formats. By extracting features from the hand, the proposed YOLOv3-based model recognized hand gestures with an accuracy, precision, recall, and F1 score of 97.68%, 94.88%, 98.66%, and 96.70%, respectively. For comparison, Single Shot Detector (SSD) and Visual Geometry Group (VGG16) models achieved accuracies between 82% and 85%. The trained model can be used for real-time detection, both for static hand images and for dynamic gestures recorded on video.
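The reported precision, recall, and F1 values are related by the standard formulas; the short sketch below shows the arithmetic with made-up true/false positive counts chosen only to land near the reported figures, not the paper's actual data.

```python
# Precision, recall, and F1 from detection counts; the counts are hypothetical
# and chosen only to roughly reproduce the reported percentages.
def prf1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(prf1(tp=948, fp=51, fn=13))   # -> approx. 0.949, 0.986, 0.967
```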
11
A Two-Stream CNN Model with Adaptive Adjustment of Receptive Field Dedicated to Flame Region Detection. Symmetry (Basel) 2021. DOI: 10.3390/sym13030397.
Abstract
Convolutional neural networks (CNNs) have yielded state-of-the-art performance in image segmentation, and their application in video surveillance systems can provide very useful information for extinguishing fires in time. Current studies have mostly focused on CNN-based flame image classification and have achieved good accuracy, but research on CNN-based flame region detection is extremely scarce due to the bulky network structures and high hardware requirements of state-of-the-art CNN models. Therefore, this paper presents a two-stream convolutional neural network for flame region detection (TSCNNFlame). TSCNNFlame is a lightweight CNN architecture comprising a spatial stream and a temporal stream for detecting flame pixels in video sequences captured by fixed cameras. The static features from the spatial stream and the dynamic features from the temporal stream are fused by three convolutional layers to reduce false positives. The convolutional layer of the CNN is replaced with a selective kernel (SK)-Shuffle block constructed by integrating SK convolution into the deep convolutional layer of ShuffleNet V2. The SKNet blocks adaptively adjust the receptive field size according to the proportion of the region of interest (ROI) within it, while the grouped convolution used in ShuffleNet addresses the problem that the multi-branch structure of SKNet causes the network parameters to grow with the number of branches. The CNN network dedicated to flame region detection therefore balances efficiency and accuracy through its lightweight architecture, temporal-spatial feature fusion, and the advantages of the SK-Shuffle block. Experimental results, evaluated with multiple metrics and analyzed from several angles, show that the method achieves strong performance while reducing running time.
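A minimal PyTorch sketch of the two-stream fusion step described above (spatial-stream and temporal-stream feature maps fused by three convolutional layers into a per-pixel flame prediction) is given below; channel sizes are assumptions, and the SK-Shuffle blocks themselves are not reproduced.

```python
# Sketch of fusing spatial and temporal feature maps with three conv layers
# to produce a per-pixel flame logit; channel sizes are illustrative only.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, c_spatial=32, c_temporal=32):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_spatial + c_temporal, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),           # per-pixel flame logit
        )

    def forward(self, f_spatial, f_temporal):
        return self.fuse(torch.cat([f_spatial, f_temporal], dim=1))

mask_logits = TwoStreamFusion()(torch.randn(1, 32, 60, 80), torch.randn(1, 32, 60, 80))
```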