1
Wedasingha N, Samarasinghe P, Senevirathna L, Papandrea M, Puiatti A, Rankin D. Automated anomalous child repetitive head movement identification through transformer networks. Phys Eng Sci Med 2023; 46:1427-1445. [PMID: 37814077] [DOI: 10.1007/s13246-023-01309-5]
Abstract
The increasing prevalence of behavioral disorders in children is of growing concern within the medical community. There is a consensus that early identification of, and intervention for, atypical behaviors play a pivotal role in improving outcomes. Due to inadequate facilities and a shortage of medical professionals with specialized expertise, traditional diagnostic methods have been unable to effectively address the rising incidence of behavioral disorders. Hence, there is a need to develop automated approaches for the diagnosis of behavioral disorders in children to overcome the challenges of traditional methods. The purpose of this study is to develop an automated model capable of analyzing videos to differentiate between typical and atypical repetitive head movements in children. To mitigate the problems resulting from the limited availability of child datasets, various learning methods are employed. In this work, we present a fusion of transformer networks and non-deterministic finite automata (NFA) techniques, which classifies a child's repetitive head movements as typical or atypical based on an analysis of gender, age, and type of repetitive head movement, along with the count, duration, and frequency of each movement. Experimentation was carried out with different transfer learning methods to enhance the performance of the model. The experimental results on five datasets (the NIR face dataset, Bosphorus 3D face dataset, ASD dataset, SSBD dataset, and Head Movements in the Wild dataset) indicate that our proposed model outperforms many state-of-the-art frameworks in distinguishing typical and atypical repetitive head movements in children.
Affiliation(s)
- Nushara Wedasingha
- Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Rd, Malabe, 10115, Colombo, Sri Lanka.
- Pradeepa Samarasinghe
- Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Rd, Malabe, 10115, Colombo, Sri Lanka
- Lasantha Senevirathna
- Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Rd, Malabe, 10115, Colombo, Sri Lanka
- Michela Papandrea
- Information Systems and Networking Institute (ISIN), University of Applied Sciences and Arts of Southern Switzerland, Via Pobiette, Manno, 6928, Switzerland
- Alessandro Puiatti
- Institute of Digital Technologies for Personalized Healthcare (MeDiTech), University of Applied Sciences and Arts of Southern Switzerland, Via Pobiette, Manno, 6928, Switzerland
- Debbie Rankin
- School of Computing, Engineering and Intelligent Systems, Ulster University, Northland Road, Derry-Londonderry, BT48 7JL, Northern Ireland, UK
2
Liu F, Xu X, Xing X, Guo K, Wang L. Simple-action-guided dictionary learning for complex action recognition. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.034]
3
Xu Y, Xu X, Han G, He S. Holistically Associated Transductive Zero-Shot Learning. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2021.3049274]
Affiliation(s)
- Yangyang Xu
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Xuemiao Xu
- School of Computer Science and Engineering, Ministry of Education Key Laboratory of Big Data and Intelligent Robot, State Key Laboratory of Subtropical Building Science, and Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, South China University of Technology, Guangzhou, China
- Guoqiang Han
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Shengfeng He
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
4
Research on Repetition Counting Method Based on Complex Action Label String. Machines 2022. [DOI: 10.3390/machines10060419]
Abstract
Smart factories have real-time demands for productivity statistics to meet the need for quick reaction capabilities. To address this problem, a counting method for complex actions based on a decomposition strategy was proposed. The method decomposes complex actions into several essential actions and defines a label string for each complex action according to the sequence of those essential actions. While counting, we first employ an online action recognition algorithm to transform video frames into label numbers, which are stored in a result queue. The result queue is then searched for each label string; if the search succeeds, a complex action is considered to have occurred, and the corresponding counter is updated to accomplish counting. Comparison tests on a video dataset of workers' repetitive movements on package printing production lines illustrate that our method has low counting error, with an MAE (mean absolute error) of less than 5% and an OBOA (off-by-one accuracy) of more than 90%. Moreover, to enhance the adaptability of the action recognition model to changes in action duration, we propose an adaptive parameter module based on the Kalman filter, which improves counting performance to a certain extent. In conclusion, our method achieves high counting performance, and the adaptive parameter module improves it further.
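The label-string search described above can be sketched in a few lines; the action name, label values, and queue handling below are illustrative assumptions, not the paper's implementation.

```python
from collections import deque

# Illustrative decomposition: complex action "pack_box" consists of
# essential actions 1 (reach), 2 (fold), 3 (seal), in that order.
COMPLEX_ACTIONS = {"pack_box": [1, 2, 3]}

def count_repetitions(frame_labels, max_queue=64):
    """Count occurrences of each complex action in a stream of
    per-frame essential-action labels (as produced by an online
    action recognition model)."""
    counters = {name: 0 for name in COMPLEX_ACTIONS}
    queue = deque(maxlen=max_queue)  # result queue of recent labels
    for label in frame_labels:
        # collapse consecutive duplicates so the queue holds the
        # label *string*, not one entry per frame
        if not queue or queue[-1] != label:
            queue.append(label)
        for name, pattern in COMPLEX_ACTIONS.items():
            n = len(pattern)
            if len(queue) >= n and list(queue)[-n:] == pattern:
                counters[name] += 1   # one complex action completed
                queue.clear()         # avoid double counting
    return counters
```

A stream such as `[1, 1, 2, 2, 3, 1, 2, 3, 3]` collapses to the label string `1 2 3 1 2 3`, which contains the pattern twice.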
5
A Deep Learning and Clustering Extraction Mechanism for Recognizing the Actions of Athletes in Sports. Comput Intell Neurosci 2022; 2022:2663834. [PMID: 35371202] [PMCID: PMC8970900] [DOI: 10.1155/2022/2663834]
Abstract
In sports, the essence of a complete technical action is a complete information structure pattern, and an athlete's judgment of an action is actually the identification of that movement information structure pattern. Action recognition refers to the ability of the human brain to distinguish a perceived action from other actions and obtain predictive response information when it identifies and confirms the action according to the constantly changing motion information on the field. Action recognition mainly includes two aspects: obtaining the required action information through visual observation, and judging the action based on the obtained information; the neuropsychological mechanism of this process, however, is still unknown. In this paper, a new key frame extraction method based on a clustering algorithm and multi-feature fusion is proposed for sports videos with complex content, many scenes, and rich actions. First, a variety of features are fused so that similarity measurement can describe videos with complex content more completely and comprehensively. Second, a clustering algorithm is used to cluster sports video sequences by scene, eliminating the difficult and complicated shot-boundary detection otherwise required when there are many scenes. Third, key frames are extracted according to a minimum-motion criterion, which more accurately represents video content with rich actions. At the same time, the clustering algorithm used in this paper is improved to enhance the offline computing efficiency of the key frame extraction system.
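The clustering-based key-frame selection can be sketched as follows, assuming frames have already been encoded as fused feature vectors; the plain k-means loop and the "closest frame to each center" rule are illustrative stand-ins for the paper's improved algorithm and minimum-motion criterion.

```python
import numpy as np

def extract_keyframes(features, k=3, iters=20, seed=0):
    """Pick up to k key frames from an (n_frames, dim) array of fused
    frame features: cluster the frames, then keep the frame closest
    to each cluster center."""
    rng = np.random.default_rng(seed)
    n = len(features)
    centers = features[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):                      # plain k-means
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = features[assign == j].mean(axis=0)
    d = np.linalg.norm(features[:, None] - centers[None], axis=2)
    keyframes = sorted(set(int(d[:, j].argmin()) for j in range(k)))
    return keyframes  # indices of the selected frames
```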
Based on an analysis of the advantages and disadvantages of classical convolutional and recurrent neural network algorithms in deep learning, this paper proposes an improved, optimized convolutional network for recognizing and analyzing human actions in complex scenes with complex actions and fast motion, and compares it against recurrent and hybrid neural network algorithms. Experiments show that the algorithm approximates human observation of athletes' execution and completion of training actions. Compared with other algorithms, it is verified to have a very high learning rate and accuracy for athlete action recognition.
6
Dev K, Ashraf Z, Muhuri PK, Kumar S. Deep autoencoder based domain adaptation for transfer learning. Multimed Tools Appl 2022; 81:22379-22405. [PMID: 35310888] [PMCID: PMC8923974] [DOI: 10.1007/s11042-022-12226-2]
Abstract
The concept of transfer learning has received a great deal of attention and interest throughout the last decade. Selecting an ideal representational framework for instances of various domains, so as to minimize the divergence between source and target domains, is a fundamental research challenge in representative transfer learning. The domain adaptation approach is designed to learn the more robust or higher-level features required in transfer learning. This paper presents a novel transfer learning framework that employs a marginal probability-based domain adaptation methodology followed by a deep autoencoder. The proposed framework adapts the source and target domains by reducing the distribution deviation between the features of both domains. Further, we apply the deep neural network approach to transfer learning and suggest a supervised learning algorithm based on an encoding and decoding layer architecture. Moreover, we propose two variants of the transfer learning technique for classification, termed (i) domain-adapted transfer learning with deep autoencoder-1 (D-TLDA-1), using linear regression, and (ii) domain-adapted transfer learning with deep autoencoder-2 (D-TLDA-2), using softmax regression. Simulations have been conducted on two popular real-world datasets: the ImageNet dataset for image classification and the 20_Newsgroups dataset for text classification. Experimental findings and the resulting improvements in classification accuracy show the superiority of the proposed D-TLDA framework over prominent state-of-the-art machine learning and transfer learning approaches.
Affiliation(s)
- Krishna Dev
- RPATech (Spawn Ventures Services Private Limited), Gurugram, Haryana 122011, India
- Zubair Ashraf
- Department of Computer Science, Aligarh Muslim University, Aligarh, UP-202002, India
- Pranab K. Muhuri
- Department of Computer Science, South Asian University, New Delhi, 110021, India
- Sandeep Kumar
- Department of Computer Science, South Asian University, New Delhi, 110021, India
7
Gerace F, Saglietti L, Sarao Mannelli S, Saxe A, Zdeborová L. Probing transfer learning with a model of synthetic correlated datasets. Mach Learn Sci Technol 2022. [DOI: 10.1088/2632-2153/ac4f3f]
Abstract
Transfer learning can significantly improve the sample efficiency of neural networks by exploiting the relatedness between a data-scarce target task and a data-abundant source task. Despite years of successful applications, transfer learning practice often relies on ad hoc solutions, while theoretical understanding of these procedures is still limited. In the present work, we rethink a solvable model of synthetic data as a framework for modeling correlation between datasets. This setup allows for an analytic characterization of the generalization performance obtained when transferring the learned feature map from the source to the target task. Focusing on the problem of training two-layer networks in a binary classification setting, we show that our model can capture a range of salient features of transfer learning with real data. Moreover, by exploiting parametric control over the correlation between the two datasets, we systematically investigate under which conditions the transfer of features is beneficial for generalization.
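A hedged sketch of how such correlated synthetic tasks can be generated: source and target inputs pass through feature maps that share a controllable fraction of latent directions, with a common teacher supplying binary labels. The generator form and parameter names are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def make_correlated_tasks(n=200, latent=20, dim=50, overlap=0.5, seed=0):
    """Generate a source and a target binary-classification dataset
    whose feature maps share a controllable fraction of latent rows.
    overlap=1.0 gives identical feature maps (maximally related
    tasks); overlap=0.0 gives independent ones."""
    rng = np.random.default_rng(seed)
    F_src = rng.standard_normal((latent, dim))
    F_new = rng.standard_normal((latent, dim))
    k = int(overlap * latent)                  # shared latent rows
    F_tgt = np.vstack([F_src[:k], F_new[k:]])
    w = rng.standard_normal(latent)            # shared teacher vector

    def task(F):
        c = rng.standard_normal((n, latent))   # latent coefficients
        x = np.tanh(c @ F / np.sqrt(latent))   # inputs on a manifold
        y = np.sign(c @ w)                     # teacher labels
        return x, y

    return task(F_src), task(F_tgt)
```

Training a network on the source pair, freezing its feature map, and retraining only the head on the target pair then probes when transfer helps as `overlap` varies.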
8
Prabha B, Priya M, Shanker N, Ganesh E. Aberrant behavior prediction and severity analysis for autistic child through deep transfer learning to avoid adverse drug effect. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.103038]
9
Sun B, Wang S, Kong D, Wang L, Yin B. Real-Time Human Action Recognition Using Locally Aggregated Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model. IEEE Trans Cybern 2021; PP:4837-4849. [PMID: 34437085] [DOI: 10.1109/tcyb.2021.3100507]
Abstract
3-D action recognition refers to the classification of action sequences consisting of 3-D skeleton joints. Although many research works are devoted to 3-D action recognition, the task mainly suffers from three problems: 1) highly complicated articulation; 2) a great amount of noise; and 3) low implementation efficiency. To tackle all these problems, we propose a real-time 3-D action-recognition framework by integrating the locally aggregated kinematic-guided skeletonlet (LAKS) with a supervised hashing-by-analysis (SHA) model. We first define the skeletonlet as a few combinations of joint offsets grouped in terms of the kinematic principle and then represent an action sequence using LAKS, which consists of a denoising phase and a locally aggregating phase. The denoising phase detects noisy action data and adjusts it by replacing all the features within it with the features of the corresponding previous frame, while the locally aggregating phase sums the difference between an offset feature of the skeletonlet and its cluster center over all the offset features of the sequence. Finally, the SHA model combines sparse representation with a hashing model, aiming at promoting recognition accuracy while maintaining high efficiency. Experimental results on the MSRAction3D, UTKinectAction3D, and Florence3DAction datasets demonstrate that the proposed method outperforms state-of-the-art methods in both recognition accuracy and implementation efficiency.
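The denoising phase, as described, replaces an implausible frame with the previous frame's features. A minimal sketch, assuming a simple jump-magnitude threshold as the noise detector (the paper's detector is more elaborate):

```python
import numpy as np

def denoise_sequence(frames, thresh=5.0):
    """Replace frames whose features jump implausibly far from the
    previous frame (treated as noise) with the previous frame's
    features. `frames` is an (n_frames, dim) array."""
    out = np.array(frames, dtype=float, copy=True)
    for t in range(1, len(out)):
        if np.linalg.norm(out[t] - out[t - 1]) > thresh:
            out[t] = out[t - 1]       # carry the last clean frame
    return out
```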
10
Xu Y, Han C, Qin J, Xu X, Han G, He S. Transductive Zero-Shot Action Recognition via Visually Connected Graph Convolutional Networks. IEEE Trans Neural Netw Learn Syst 2021; 32:3761-3769. [PMID: 32822308] [DOI: 10.1109/tnnls.2020.3015848]
Abstract
With the explosive growth of action categories, zero-shot action recognition aims to extend a well-trained model to novel/unseen classes. To bridge the large knowledge gap between seen and unseen classes, in this brief, we visually associate unseen actions with seen categories in a visually connected graph, and the knowledge is then transferred from the visual features space to semantic space via the grouped attention graph convolutional networks (GAGCNs). In particular, we extract visual features for all the actions, and a visually connected graph is built to attach seen actions to visually similar unseen categories. Moreover, the proposed grouped attention mechanism exploits the hierarchical knowledge in the graph so that the GAGCN enables propagating the visual-semantic connections from seen actions to unseen ones. We extensively evaluate the proposed method on three data sets: HMDB51, UCF101, and NTU RGB + D. Experimental results show that the GAGCN outperforms state-of-the-art methods.
11
Ren Z, Zhang Q, Cheng J, Hao F, Gao X. Segment spatial-temporal representation and cooperative learning of convolution neural networks for multimodal-based action recognition. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.020]
12
Ren CX, Feng J, Dai DQ, Yan S. Heterogeneous Domain Adaptation via Covariance Structured Feature Translators. IEEE Trans Cybern 2021; 51:2166-2177. [PMID: 31880576] [DOI: 10.1109/tcyb.2019.2957033]
Abstract
Domain adaptation (DA) and transfer learning with statistical property description are very important in image analysis and data classification. This article studies the domain adaptive feature representation problem for heterogeneous data, in which both the feature dimensions and the sample distributions differ so much across domains that the features cannot be matched directly. To transfer the discriminant information efficiently from the source domain to the target domain, and thereby enhance the classification performance for the target data, we first introduce two projection matrices, one per domain, to transform the heterogeneous features into a shared space. We then propose a joint kernel regression model to learn the regression variable, which is called the feature translator in this article. The novelty lies in exploring optimal experimental design (OED) to deal with heterogeneous and nonlinear DA by seeking covariance structured feature translators (CSFTs). An approximate and efficient method is proposed to compute the optimal data projections. Comprehensive experiments are conducted to validate the effectiveness and efficacy of the proposed model. The results show the state-of-the-art performance of our method in heterogeneous DA.
13
Tong M, Bai H, Yue X, Bu H. PTL-LTM model for complex action recognition using local-weighted NMF and deep dual-manifold regularized NMF with sparsity constraint. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-04783-0]
14
Liu F, Xu X, Zhang T, Guo K, Wang L. Exploring privileged information from simple actions for complex action recognition. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.020]
15
Ren CX, Xu XL, Yan H. Generalized Conditional Domain Adaptation: A Causal Perspective With Low-Rank Translators. IEEE Trans Cybern 2020; 50:821-834. [PMID: 30346301] [DOI: 10.1109/tcyb.2018.2874219]
Abstract
Learning domain adaptive features aims to enhance the classification performance of the target domain by exploring the discriminant information from an auxiliary source set. Let X denote the feature and Y the label. The most typical problem to be addressed is that P(X,Y) varies so much between domains that classification in the target domain is difficult. In this paper, we study the generalized conditional domain adaptation (DA) problem, in which both P(Y) and P(X|Y) change across domains, from a causal perspective. We propose transforming the class-conditional probability matching problem into a marginal probability matching problem, under a proper assumption. We build an intermediate domain by employing a regression model. In order to force the most relevant data to reconstruct the intermediate representations, a low-rank constraint is placed on the regression model for regularization. The low-rank constraint underlines a global algebraic structure between different domains, and stresses the group compactness in representing the samples. The new model is considered under the discriminant subspace framework, which is favorable for simultaneously extracting the classification information from the source domain and the adaptation information across domains. The model can be solved by alternating optimization between quadratic programming and the alternating Lagrange multiplier method. To the best of our knowledge, this paper is the first to exploit low-rank representation, from the source domain to the intermediate domain, to learn domain adaptive features. Comprehensive experimental results validate that the proposed method provides better classification accuracies with DA, compared with well-established baselines.
16
Liu Y, Lu Z, Li J, Yang T, Yao C. Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition. IEEE Trans Image Process 2019; 29:3168-3182. [PMID: 31831421] [DOI: 10.1109/tip.2019.2957930]
Abstract
Existing deep learning methods for action recognition in videos require a large number of labeled videos for training, which is labor-intensive and time-consuming. For the same action, the knowledge learned from different media types, e.g., videos and images, may be related and complementary. However, due to the domain shifts and heterogeneous feature representations between videos and images, the performance of classifiers trained on images may be dramatically degraded when directly deployed to videos. In this paper, we propose a novel method, named Deep Image-to-Video Adaptation and Fusion Networks (DIVAFN), to enhance action recognition in videos by transferring knowledge from images using video keyframes as a bridge. The DIVAFN is a unified deep learning model, which integrates domain-invariant representation learning and cross-modal feature fusion into a unified optimization framework. Specifically, we design an efficient cross-modal similarities metric to reduce the modality shift among images, keyframes, and videos. Then, we adopt an autoencoder architecture, whose hidden layer is constrained to be the semantic representations of the action class names. In this way, when the autoencoder is adopted to project the learned features from different domains to the same space, more compact, informative, and discriminative representations can be obtained. Finally, the concatenation of the learned semantic feature representations from these three autoencoders is used to train the classifier for action recognition in videos. Comprehensive experiments on four real-world datasets show that our method outperforms some state-of-the-art domain adaptation and action recognition methods.
17
Arivazhagan S, Shebiah RN, Harini R, Swetha S. Human action recognition from RGB-D data using complete local binary pattern. Cogn Syst Res 2019. [DOI: 10.1016/j.cogsys.2019.05.002]
18
Mohammadi E, Jonathan Wu Q, Saif M, Yang Y. Hierarchical feature representation for unconstrained video analysis. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.06.097]
19
Wei P, Ke Y, Goh CK. Feature Analysis of Marginalized Stacked Denoising Autoenconder for Unsupervised Domain Adaptation. IEEE Trans Neural Netw Learn Syst 2019; 30:1321-1334. [PMID: 30281483] [DOI: 10.1109/tnnls.2018.2868709]
Abstract
The marginalized stacked denoising autoencoder (mSDA) has recently emerged with demonstrated effectiveness in domain adaptation. In this paper, we investigate the rationale for why mSDA benefits domain adaptation tasks from the perspective of adaptive regularization. Our investigations focus on two types of feature corruption noise: Gaussian noise (mSDA_g) and Bernoulli dropout noise (mSDA_bd). Both theoretical and empirical results demonstrate that mSDA_bd successfully boosts the adaptation performance but mSDA_g fails to do so. We then propose a new mSDA with data-dependent multinomial dropout noise (mSDA_md) that overcomes the limitations of mSDA_bd and further improves the adaptation performance. Our mSDA_md is based on a more realistic assumption: different features are correlated and, thus, should be corrupted with different probabilities. Experimental results demonstrate the superiority of mSDA_md to mSDA_bd in adaptation performance and convergence speed. Finally, we propose a deep transferable feature coding (DTFC) framework for unsupervised domain adaptation. The motivation of DTFC is that mSDA fails to consider the distribution discrepancy across different domains in the feature learning process. We introduce a new element to mSDA: domain divergence minimization by maximum mean discrepancy. This element is essential for domain adaptation as it ensures that the extracted deep features have a small distribution discrepancy. The effectiveness of DTFC is verified by extensive experiments on three benchmark data sets for both Bernoulli dropout noise and multinomial dropout noise.
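For reference, the single-layer marginalized denoising autoencoder with Bernoulli dropout noise (the mSDA_bd building block) admits a closed form in which the expected reconstruction weights are computed without sampling any corruptions. This is a minimal sketch following the standard mSDA construction; the small ridge term and the tanh nonlinearity are common choices, not specifics of this paper.

```python
import numpy as np

def msda_layer(X, p=0.5):
    """One marginalized denoising autoencoder layer with Bernoulli
    dropout noise of probability p, in closed form: the expected
    reconstruction weights W minimize the squared loss marginalized
    over all corruptions. X is (n_samples, n_features);
    returns W and the layer's hidden output."""
    q = 1.0 - p                          # per-feature keep probability
    S = X.T @ X                          # scatter matrix, (d, d)
    P = q * S                            # E[clean^T corrupted]
    Q = (q * q) * S                      # E[corrupted^T corrupted] ...
    np.fill_diagonal(Q, q * np.diag(S))  # ... diagonal keeps a single q
    W = np.linalg.solve(Q + 1e-5 * np.eye(len(Q)), P).T
    return W, np.tanh(X @ W.T)           # weights and hidden features
```

With p=0 (no corruption) the solution collapses to the identity mapping, which is a quick sanity check on the closed form.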
20
Zhang J, Shum HPH, Han J, Shao L. Arbitrary View Action Recognition via Transfer Dictionary Learning on Synthetic Training Data. IEEE Trans Image Process 2018; 27:4709-4723. [PMID: 29994770] [DOI: 10.1109/tip.2018.2836323]
Abstract
Human action recognition is crucial to many practical applications, ranging from human-computer interaction to video surveillance. Most approaches either recognize the human action from a fixed view or require the knowledge of view angle, which is usually not available in practical applications. In this paper, we propose a novel end-to-end framework to jointly learn a view-invariance transfer dictionary and a view-invariant classifier. The result of the process is a dictionary that can project real-world 2D video into a view-invariant sparse representation, as well as a classifier to recognize actions with an arbitrary view. The main feature of our algorithm is the use of synthetic data to extract view-invariance between 3D and 2D videos during the pre-training phase. This guarantees the availability of training data, and removes the hassle of obtaining real-world videos in specific viewing angles. Additionally, for better describing the actions in 3D videos, we introduce a new feature set called the 3D dense trajectories to effectively encode extracted trajectory information on 3D videos. Experimental results on the IXMAS, N-UCLA, i3DPost and UWA3DII datasets show improvements over existing algorithms.
21
Liu J, Wang G, Duan LY, Abdiyeva K, Kot AC. Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks. IEEE Trans Image Process 2018; 27:1586-1599. [PMID: 29324413] [DOI: 10.1109/tip.2017.2785279]
Abstract
Human action recognition in 3D skeleton sequences has attracted a lot of research attention. Recently, long short-term memory (LSTM) networks have shown promising performance in this task due to their strengths in modeling the dependencies and dynamics in sequential data. As not all skeletal joints are informative for action recognition, and the irrelevant joints often bring noise which can degrade the performance, we need to pay more attention to the informative ones. However, the original LSTM network does not have explicit attention ability. In this paper, we propose a new class of LSTM network, global context-aware attention LSTM, for skeleton-based action recognition, which is capable of selectively focusing on the informative joints in each frame by using a global context memory cell. To further improve the attention capability, we also introduce a recurrent attention mechanism, with which the attention performance of our network can be enhanced progressively. Besides, a two-stream framework, which leverages coarse-grained attention and fine-grained attention, is also introduced. The proposed method achieves state-of-the-art performance on five challenging datasets for skeleton-based action recognition.
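The central attention step, weighting each joint's features by its relevance to a global context vector, can be sketched without the surrounding LSTM machinery; the shapes and the bilinear scoring function here are illustrative simplifications of the paper's design.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def attend_joints(joint_feats, context, U):
    """Informativeness-weighted pooling of skeletal joints.
    joint_feats: (n_joints, dim) per-joint features for one frame,
    context:     (dim,) global context memory vector,
    U:           (dim, dim) learned bilinear scoring matrix.
    Returns the attention weights and the pooled feature."""
    scores = joint_feats @ U @ context   # relevance of each joint
    weights = softmax(scores)            # one weight per joint
    pooled = weights @ joint_feats       # (dim,) weighted sum
    return weights, pooled
```

In the full model the pooled feature would update the global context memory cell, and the score/pool cycle would repeat to refine attention progressively.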