1. Zhao H, Li Z, Chen W, Zheng Z, Xie S. Accelerated Partially Shared Dictionary Learning With Differentiable Scale-Invariant Sparsity for Multi-View Clustering. IEEE Trans Neural Netw Learn Syst 2023;34:8825-8839. [PMID: 35254997] [DOI: 10.1109/TNNLS.2022.3153310]
Abstract
Multiview dictionary learning (DL) is attracting attention in multiview clustering due to its efficient feature learning ability. However, because of the gaps between views, most existing multiview DL algorithms struggle to exploit consistent and complementary information in multiview data simultaneously and to learn a precise representation for multiview clustering. This article proposes an efficient multiview DL algorithm for multiview clustering, which uses a partially shared DL model with a flexible ratio of shared sparse coefficients to exploit both consistency and complementarity in multiview data. In particular, a differentiable scale-invariant function is used as the sparsity regularizer; like the l0 norm regularizer, it measures the absolute sparsity of the coefficients, but it is continuous and differentiable almost everywhere. The corresponding optimization problem is solved by the proximal splitting method with extrapolation; moreover, the proximal operator of the differentiable scale-invariant regularizer can be derived. Results on synthetic data demonstrate that the proposed algorithm recovers the synthetic dictionary well with reasonable convergence time. Multiview clustering experiments on six real-world multiview datasets show that the proposed algorithm is less sensitive to the regularization parameter than the other algorithms. Furthermore, an appropriate coefficient-sharing ratio helps to exploit consistent information while preserving complementary information in multiview data, and thus enhances multiview clustering performance. In addition, the convergence results show that the proposed algorithm obtains the best multiview clustering performance among the compared algorithms and converges faster than most of the compared multiview algorithms.
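The optimization strategy the abstract describes (proximal splitting with extrapolation) can be illustrated with a short sketch. Below is a minimal, hypothetical FISTA-style sparse-coding step in Python; the paper's proximal operator for its scale-invariant regularizer is not reproduced here, so the standard l1 soft-thresholding prox stands in for it, and all names and constants are illustrative.

```python
# Hypothetical sketch: accelerated proximal gradient (extrapolation/momentum)
# for the sparse-coding subproblem min_A 0.5*||X - D A||_F^2 + lam*R(A).
# The l1 soft-threshold below is a stand-in for the paper's derived prox.
import numpy as np

def soft_threshold(A, t):
    """Proximal operator of t*||A||_1 (placeholder for the paper's prox)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def sparse_code(X, D, lam=0.1, n_iter=100):
    A = np.zeros((D.shape[1], X.shape[1]))
    A_prev, t_prev = A.copy(), 1.0
    L = np.linalg.norm(D, 2) ** 2                    # Lipschitz constant of the gradient
    for _ in range(n_iter):
        t = (1 + np.sqrt(1 + 4 * t_prev ** 2)) / 2
        Y = A + ((t_prev - 1) / t) * (A - A_prev)    # extrapolation step
        grad = D.T @ (D @ Y - X)
        A_prev, t_prev = A, t
        A = soft_threshold(Y - grad / L, lam / L)    # proximal step
    return A
```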
2. Liu F, Xu X, Xing X, Guo K, Wang L. Simple-action-guided dictionary learning for complex action recognition. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.034]
3. Zhao H, Zhong P, Chen H, Li Z, Chen W, Zheng Z. Group non-convex sparsity regularized partially shared dictionary learning for multi-view learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108364]
4. Neural Networks for Automatic Posture Recognition in Ambient-Assisted Living. Sensors 2022;22:2609. [PMID: 35408224] [PMCID: PMC9003043] [DOI: 10.3390/s22072609]
Abstract
Human Action Recognition (HAR) is a rapidly evolving field impacting numerous domains, among which is Ambient Assisted Living (AAL). In this context, the aim of HAR is to meet the needs of frail individuals, whether elderly and/or disabled, and to promote autonomous, safe, and secure living. To this end, we propose a monitoring system that detects dangerous situations by classifying human postures through Artificial Intelligence (AI) solutions. The developed algorithm works on a set of features computed from skeleton data provided by four Kinect One systems simultaneously recording the scene from different angles, and it identifies the posture of the subject in an ecological context within each recorded frame. Here, we compare the recognition abilities of Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) sequence networks. Starting from the set of previously selected features, we performed a further SVM-based feature selection to optimize the MLP network and used a genetic algorithm to select the features for the LSTM sequence model. We then optimized the architecture and hyperparameters of both models before comparing their performances. The best MLP model (3 hidden layers and a softmax output layer) achieved 78.4% accuracy, while the best LSTM (2 bidirectional LSTM layers, 2 dropout layers, and a fully connected layer) reached 85.7%. The analysis of per-class performance highlights the better suitability of the LSTM approach.
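For illustration, the two classifier families the abstract compares can be sketched in PyTorch as below. Only the layer counts follow the abstract (MLP: 3 hidden layers plus softmax output; LSTM: 2 bidirectional layers, dropout, and a fully connected layer); the layer widths, dropout rates, and feature/class counts are assumptions.

```python
# Minimal sketch of the two compared posture classifiers; sizes are assumed.
import torch
import torch.nn as nn

class PostureMLP(nn.Module):
    def __init__(self, n_features=60, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),   # softmax is applied in the loss / at inference
        )
    def forward(self, x):               # x: (batch, features)
        return self.net(x)

class PostureLSTM(nn.Module):
    def __init__(self, n_features=60, n_classes=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            bidirectional=True, dropout=0.5, batch_first=True)
        self.drop = nn.Dropout(0.5)
        self.fc = nn.Linear(2 * hidden, n_classes)
    def forward(self, x):               # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.fc(self.drop(out[:, -1]))  # classify from the last time step
```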
5. Xu C, Wu X, Li Y, Jin Y, Wang M, Liu Y. Cross-modality online distillation for multi-view action recognition. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.077]
6. T-VLAD: Temporal Vector of Locally Aggregated Descriptor for Multiview Human Action Recognition. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.04.023]
7. Elharrouss O, Almaadeed N, Al-Maadeed S, Bouridane A, Beghdadi A. A combined multiple action recognition and summarization for surveillance video sequences. Appl Intell 2020. [DOI: 10.1007/s10489-020-01823-z]
Abstract
Human action recognition and video summarization represent challenging tasks for several computer vision applications, including video surveillance, criminal investigations, and sports applications. In long videos, it is difficult to search for a specific action and/or person. Human action recognition approaches presented in the literature usually deal with videos that contain only a single person whose action is to be recognized. This paper proposes an effective approach to multiple human action detection, recognition, and summarization. The multiple action detection extracts human body silhouettes, then generates a specific sequence for each of them using a motion detection and tracking method. Each extracted sequence is then divided into shots that represent homogeneous actions, using the similarity between each pair of frames. Using the histogram of oriented gradients (HOG) of the Temporal Difference Map (TDMap) of the frames of each shot, we recognize the action by comparing the generated HOG with the HOGs computed in the training phase, which cover many actions from a set of training videos. We also recognize the action from the TDMap images using a proposed CNN model. Action summarization is performed for each detected person. The efficiency of the proposed approach is shown through the results obtained, mainly for multi-action detection and recognition.
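A rough sketch of the HOG-of-TDMap matching step is given below. The exact construction of the TDMap is the paper's; here it is approximated as accumulated absolute frame differences over a shot, and matching against training HOGs uses a simple nearest-neighbor rule. All names are illustrative.

```python
# Hypothetical sketch of HOG-of-TDMap action matching, under the
# assumptions stated above; frames must share one image size.
import numpy as np
from skimage.feature import hog

def tdmap(frames):
    """frames: list of grayscale images (H, W) from one shot."""
    diffs = [np.abs(frames[i + 1].astype(float) - frames[i].astype(float))
             for i in range(len(frames) - 1)]
    return np.sum(diffs, axis=0) / len(diffs)   # accumulated motion energy

def recognize(shot_frames, train_hogs, train_labels):
    h = hog(tdmap(shot_frames), orientations=9,
            pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    dists = [np.linalg.norm(h - th) for th in train_hogs]
    return train_labels[int(np.argmin(dists))]  # label of the closest training HOG
```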
8. Huang Y, Zheng F, Cong R, Huang W, Scott MR, Shao L. MCMT-GAN: Multi-Task Coherent Modality Transferable GAN for 3D Brain Image Synthesis. IEEE Trans Image Process 2020;29:8187-8198. [PMID: 32746245] [DOI: 10.1109/TIP.2020.3011557]
Abstract
The ability to synthesize multi-modality data is highly desirable for many computer-aided medical applications, e.g., clinical diagnosis and neuroscience research, since rich imaging cohorts offer diverse and complementary information for characterizing human tissue. However, collecting acquisitions can be limited by adverse factors such as patient discomfort, high cost, and scanner unavailability. In this paper, we propose a multi-task coherent modality transferable GAN (MCMT-GAN) to address this issue for brain MRI synthesis in an unsupervised manner. By combining a bidirectional adversarial loss, a cycle-consistency loss, a domain-adapted loss, and manifold regularization in a volumetric space, MCMT-GAN is robust for multi-modality brain image synthesis with visually high fidelity. In addition, we complement the discriminators with segmentors that work collaboratively with them, ensuring that our results remain useful for the segmentation task. Experiments on various cross-modality synthesis tasks show that our method produces visually impressive results that can substitute for real acquisitions in clinical post-processing, and that it exceeds the state-of-the-art methods.
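The abstract names four loss terms combined in a volumetric space. As a rough illustration only, the sketch below composes one generator's objective from stand-ins for those terms; G_xy, G_yx, D_y, the weights, and in particular the domain and manifold proxies are assumptions, not the paper's definitions.

```python
# Hedged sketch: composing a four-term generator objective. The domain term
# is approximated by first-moment matching and the manifold term by a
# smoothness penalty; both are illustrative proxies, not the paper's losses.
import torch
import torch.nn.functional as F

def generator_loss(G_xy, G_yx, D_y, x, y,
                   w_adv=1.0, w_cyc=10.0, w_dom=1.0, w_man=0.1):
    fake_y = G_xy(x)
    d_out = D_y(fake_y)
    # adversarial term (least-squares form; one direction of the
    # bidirectional loss shown)
    adv = F.mse_loss(d_out, torch.ones_like(d_out))
    # cycle-consistency: x -> y -> x should reconstruct x
    cyc = F.l1_loss(G_yx(fake_y), x)
    # domain term proxy: match first-order statistics across domains
    dom = F.l1_loss(fake_y.mean(), y.mean())
    # manifold regularization proxy: penalize roughness of the synthesized volume
    man = (fake_y[..., 1:] - fake_y[..., :-1]).abs().mean()
    return w_adv * adv + w_cyc * cyc + w_dom * dom + w_man * man
```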
10. Khan MA, Javed K, Khan SA, Saba T, Habib U, Khan JA, Abbasi AA. Human action recognition using fusion of multiview and deep features: an application to video surveillance. Multimed Tools Appl 2020;83:14885-14911. [DOI: 10.1007/s11042-020-08806-9]
11. Zheng X, Chen X, Lu X. A Joint Relationship Aware Neural Network for Single-Image 3D Human Pose Estimation. IEEE Trans Image Process 2020;29:4747-4758. [PMID: 32070954] [DOI: 10.1109/TIP.2020.2972104]
Abstract
This paper studies the task of 3D human pose estimation from a single RGB image, which is challenging without depth information. Recently, many deep learning methods have been proposed and have achieved great improvements thanks to their strong representation learning. However, most existing methods ignore the relationships between joint features. In this paper, a joint relationship aware neural network is proposed that takes both global and local joint relationships into consideration. First, a whole feature block representing all human body joints is extracted by a convolutional neural network. A Dual Attention Module (DAM) is applied to the whole feature block to generate attention weights. By exploiting the attention module, the global relationship among all joints is encoded. Second, the weighted whole feature block is divided into individual joint features. To capture salient joint features, the individual joint features are refined by individual DAMs. Finally, a joint angle prediction constraint is proposed to model the local joint relationships. Quantitative and qualitative experiments on 3D human pose estimation benchmarks demonstrate the effectiveness of the proposed method.
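As a hypothetical sketch of the role a DAM plays (producing attention weights over a feature block), the module below combines a channel branch and a spatial branch; its internal design is an assumption, since the abstract does not specify it.

```python
# Sketch of a dual (channel + spatial) attention module; the structure is
# assumed for illustration, only the role follows the abstract.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(              # channel attention branch
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(              # spatial attention branch
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )
    def forward(self, x):                          # x: (B, C, H, W)
        cw = self.channel(x).view(x.size(0), -1, 1, 1)
        return x * cw * self.spatial(x)            # reweight the feature block
```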
13. Wang L, Huynh DQ, Koniusz P. A Comparative Review of Recent Kinect-Based Action Recognition Algorithms. IEEE Trans Image Process 2019;29:15-28. [PMID: 31283506] [DOI: 10.1109/TIP.2019.2925285]
Abstract
Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there still does not exist a thorough comparison of these Kinect-based techniques grouped by feature type, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare 10 recent Kinect-based algorithms for both cross-subject and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that the majority of methods perform better on cross-subject action recognition than on cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.
14. Tong M, Li M, Bai H, Ma L, Zhao M. DKD–DAD: a novel framework with discriminative kinematic descriptor and deep attention-pooled descriptor for action recognition. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04030-1]
15. Novel Cross-View Human Action Model Recognition Based on the Powerful View-Invariant Features Technique. Future Internet 2018. [DOI: 10.3390/fi10090089]
Abstract
Human action recognition is one of the most important research topics nowadays and is of significant interest to the computer vision and machine learning communities. Among the factors that hamper it are changes in posture and shape, as well as the memory space and time required to gather, store, label, and process the images. During our research, we noted the considerable complexity of recognizing human actions from different viewpoints, which can be explained by the position and orientation of the viewer relative to the position of the subject. We attempted to address this issue in this paper by learning view-invariant features that are robust to view variations. Moreover, we focused on providing a solution to this challenge by exploring view-specific as well as view-shared features using a novel deep model called the sample-affinity matrix (SAM). This model can accurately determine the similarities among video samples captured from diverse camera angles, enabling us to precisely fine-tune the transfer between views and to learn the more detailed shared features found in cross-view action identification. Additionally, we proposed a novel view-invariant features algorithm that enabled us to better understand the internal processes of our project. Through a series of experiments on the INRIA Xmas Motion Acquisition Sequences (IXMAS) and the Northwestern-UCLA Multi-view Action 3D (NUMA) datasets, we show that our technique performs much better than state-of-the-art techniques.
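The sample-affinity matrix is described as measuring similarities among video samples across camera views. The paper's SAM is a learned deep model; the sketch below only illustrates the underlying pairwise-affinity idea, assuming an RBF kernel on per-sample feature vectors.

```python
# Minimal sketch of a pairwise sample-affinity matrix over multi-view video
# features; the RBF kernel and sigma are assumptions for illustration.
import numpy as np

def sample_affinity(features, sigma=1.0):
    """features: (n_samples, dim) array, one row per video (any view)."""
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * features @ features.T  # squared distances
    d2 = np.maximum(d2, 0.0)                       # guard against float error
    return np.exp(-d2 / (2 * sigma ** 2))          # affinity in (0, 1]
```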