1
Yasin H, Ghani S, Krüger B. An Effective and Efficient Approach for 3D Recovery of Human Motion Capture Data. Sensors (Basel, Switzerland) 2023; 23:3664. [PMID: 37050724 PMCID: PMC10098987 DOI: 10.3390/s23073664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 03/10/2023] [Accepted: 03/14/2023] [Indexed: 06/19/2023]
Abstract
In this work, we propose a novel data-driven approach to recover missing or corrupted motion capture data, either in the form of 3D skeleton joints or 3D marker trajectories. We construct a knowledge base of prior existing knowledge, which makes it possible to infer the missing or corrupted information of the motion capture data. We then build a kd-tree in parallel on the GPU for fast search and retrieval of this knowledge in the form of nearest neighbors from the knowledge base. We exploit the concept of histograms to organize the data and use an off-the-shelf radix sort algorithm to sort the keys within a single GPU processor. We query the knowledge base with the motion that has missing joints or markers and fetch a fixed number of nearest neighbors for the given input query motion. We employ an objective function with multiple error terms that substantially recovers the 3D joints or marker trajectories in parallel on the GPU. We perform comprehensive experiments to evaluate our approach quantitatively and qualitatively on publicly available motion capture datasets, namely CMU and HDM05. The results show that the recovery of boxing, jumptwist, run, martial arts, salsa, and acrobatic motion sequences works best, while the recovery of kicking and jumping motion sequences results in slightly larger errors. On average, however, our approach produces strong results. Overall, it outperforms the competing state-of-the-art methods in most test cases with different action sequences and produces reliable results with minimal errors and without any user interaction.
Affiliation(s)
- Hashim Yasin
  - Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan
- Saba Ghani
  - Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan
- Björn Krüger
  - Faculty of Information, Media and Electrical Engineering, Institute of Media and Imaging Technology, TH Köln - University of Applied Sciences, 50679 Köln, Germany
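The retrieval step described in the abstract above (look up nearest neighbors in a knowledge base, then use them to fill in missing joints) can be sketched as follows. This is a minimal CPU illustration under stated assumptions, not the authors' method: it builds a plain recursive kd-tree instead of a parallel GPU one, and it replaces the multi-term objective function with a simple average of the neighbors' values. The names `build_kdtree`, `knn`, and `recover_missing` are hypothetical.

```python
import heapq
import math


def build_kdtree(entries, depth=0):
    """Build a kd-tree over (key_vector, payload) pairs by median splitting."""
    if not entries:
        return None
    axis = depth % len(entries[0][0])          # cycle through key dimensions
    entries = sorted(entries, key=lambda e: e[0][axis])
    mid = len(entries) // 2
    return {
        "entry": entries[mid],
        "axis": axis,
        "left": build_kdtree(entries[:mid], depth + 1),
        "right": build_kdtree(entries[mid + 1:], depth + 1),
    }


def knn(node, query, k, heap=None):
    """Return up to k nearest entries as (-distance, key, payload) tuples."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    key, payload = node["entry"]
    dist = math.dist(key, query)
    if len(heap) < k:
        heapq.heappush(heap, (-dist, key, payload))
    elif dist < -heap[0][0]:                   # closer than current worst
        heapq.heapreplace(heap, (-dist, key, payload))
    diff = query[node["axis"]] - key[node["axis"]]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    knn(node[near], query, k, heap)
    # Visit the far side only if the splitting plane is within reach.
    if len(heap) < k or abs(diff) < -heap[0][0]:
        knn(node[far], query, k, heap)
    return heap


def recover_missing(knowledge_base, observed_idx, missing_idx, query_pose, k=3):
    """Fill missing coordinates with the mean over the k nearest stored poses.

    Assumption: averaging stands in for the paper's objective function.
    """
    entries = [
        (tuple(p[i] for i in observed_idx), tuple(p[i] for i in missing_idx))
        for p in knowledge_base
    ]
    tree = build_kdtree(entries)
    query = tuple(query_pose[i] for i in observed_idx)
    neighbors = knn(tree, query, k)
    recovered = list(query_pose)
    for j, idx in enumerate(missing_idx):
        recovered[idx] = sum(nb[2][j] for nb in neighbors) / len(neighbors)
    return recovered
```

Here each stored pose is a flat tuple of coordinates; the observed coordinates form the search key and the missing ones are the payload. Rebuilding the tree on every call keeps the sketch short; a practical system would build it once per pattern of missing joints.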
2
Hu W, Zhu X, Wang T, Yi Y, Yu G. Discrete subspace structure constrained human motion capture data recovery. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
3
Xia G, Xue P, Sun H, Sun Y, Zhang D, Liu Q. Local Self-Expression Subspace Learning Network for Motion Capture Data. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2022; 31:4869-4883. [PMID: 35839181 DOI: 10.1109/tip.2022.3189822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Deep subspace learning is an important branch of self-supervised learning and has been a hot research topic in recent years, but current methods do not fully consider the individualities of temporal data and related tasks. In this paper, by using the individualities of motion capture data and of the segmentation task as the supervision, we propose the local self-expression subspace learning network. Specifically, considering the temporality of motion data, we use a temporal convolution module to extract temporal features. To implement the local validity of self-expression in temporal tasks, we design the local self-expression layer, which only maintains the representation relations with temporally adjacent motion frames. To simulate the interpolatability of motion data in the feature space, we impose a group sparseness constraint on the local self-expression layer to force the representations to use only selected keyframes. In addition, based on the subspace assumption, we propose the subspace projection loss, which is derived from the distances of each frame projected onto the fitted subspaces, to penalize potential clustering errors. The superior performance of the proposed model on the segmentation task of synthetic data and three tasks of real motion capture data demonstrates the feature learning ability of our model.
4
Xue Y, Chen J, Gu X, Ma H, Ma H. Boosting Monocular 3D Human Pose Estimation With Part Aware Attention. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2022; 31:4278-4291. [PMID: 35709111 DOI: 10.1109/tip.2022.3182269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Monocular 3D human pose estimation is challenging due to depth ambiguity. Convolution-based and graph-convolution-based methods have been developed to extract 3D information from temporal cues in motion videos. Among lifting-based methods, most recent works adopt the transformer to model the temporal relationship of 2D keypoint sequences. These previous works usually consider all the joints of a skeleton as a whole and then calculate the temporal attention based on the overall characteristics of the skeleton. Nevertheless, the human skeleton exhibits obvious part-wise inconsistency of motion patterns. It is therefore more appropriate to consider each part's temporal behaviors separately. To deal with such part-wise motion inconsistency, we propose the Part Aware Temporal Attention module to extract the temporal dependency of each part separately. Moreover, the conventional attention mechanism in 3D pose estimation usually calculates attention within a short time interval, so only the correlation within the temporal context is considered. However, we find that the part-wise structure of the human skeleton repeats across different periods, actions, and even subjects. Therefore, the part-wise correlation at a distance can be utilized to further boost 3D pose estimation. We thus propose the Part Aware Dictionary Attention module to calculate the attention for the part-wise features of the input in a dictionary, which contains multiple 3D skeletons sampled from the training set. Extensive experimental results show that our proposed part aware attention mechanism helps a transformer-based model to achieve state-of-the-art 3D pose estimation performance on two widely used public datasets. The code and the trained models are released at https://github.com/thuxyz19/3D-HPE-PAA.
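The idea of computing temporal attention separately for each body part, rather than once for the whole skeleton, can be sketched as below. This is a simplified illustration, not the Part Aware Temporal Attention module from the paper or its released code: it uses plain scaled dot-product self-attention with no learned projections, and the function names, joint names, and part grouping are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(seq):
    """Scaled dot-product self-attention over time for one body part.

    seq is a list of T feature vectors. Queries, keys, and values are the
    raw features themselves (assumption: no learned projection matrices).
    """
    dim = len(seq[0])
    output = []
    for query in seq:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in seq
        ]
        weights = softmax(scores)              # one weight per frame
        output.append([
            sum(weights[t] * seq[t][c] for t in range(len(seq)))
            for c in range(dim)
        ])
    return output


def part_aware_attention(frames, parts):
    """Run temporal attention independently for each group of joints.

    frames: list of T poses, each a dict mapping joint name -> (x, y).
    parts:  dict mapping part name -> list of joint names in that part.
    """
    result = {}
    for name, joints in parts.items():
        # Concatenate the 2D keypoints of this part into one vector per frame.
        seq = [[c for j in joints for c in frame[j]] for frame in frames]
        result[name] = temporal_self_attention(seq)
    return result
```

Because each part gets its own attention weights, frames that matter for the legs do not have to be the frames that matter for the arms, which is the part-wise inconsistency the abstract points to.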
5
Xia G, Xue P, Zhang D, Liu Q. Likelihood-constrained coupled space learning for motion synthesis. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.08.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
6
Machine learning model-based two-dimensional matrix computation model for human motion and dance recovery. Complex Intell Syst 2021. [DOI: 10.1007/s40747-020-00186-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
Abstract
Human motion capture is commonly used in many areas. Still, it involves a complicated capturing method, and the obtained data invariably contain missing information because of the structure of the human body or clothing. Motion recovery, which aims to recover the underlying complete motion sequence from degraded observations, is still a difficult task because the nonlinear structure and the filming property are integrated into the movements. The machine learning model-based two-dimensional matrix computation (MM-TDMC) approach demonstrates promising performance in short-term motion recovery problems. However, the two-dimensional matrix computation model was developed for linear information and lacks a theoretical guarantee for the recovery of nonlinear movement information. To overcome this drawback, this study proposes MM-TDMC for human motion and dance recovery. Extensive experimental results and comparisons with auto-conditioned recurrent neural network, multimodal corpus, low-rank matrix completion, and Kinect sensor methods show the advantages of the machine learning-based two-dimensional matrix computation model for human motion and dance recovery.
7
Xia G, Chen B, Sun H, Liu Q. Nonconvex Low-Rank Kernel Sparse Subspace Learning for Keyframe Extraction and Motion Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:1612-1626. [PMID: 32340963 DOI: 10.1109/tnnls.2020.2985817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
By exploiting the kernel trick, the sparse subspace model is extended to a nonlinear version with one or a combination of predefined kernels, but the high-dimensional space induced by predefined kernels is not guaranteed in theory to capture the features of the nonlinear data. In this article, we propose an unsupervised nonconvex low-rank learning framework that learns a kernel to replace the predefined kernel in the sparse subspace model. The kernel learned through a nonconvex relaxation of rank can better exploit the low-rank property of nonlinear data to induce a high-dimensional Hilbert space that more closely approaches the true feature space. Furthermore, we give a global closed-form optimal solution of the nonconvex rank minimization and prove it. Considering the low-rank and sparseness characteristics of motion capture data in its feature space, we use them to verify the better representation of nonlinear data with the learned kernel via two tasks: keyframe extraction and motion segmentation. The performance on both tasks demonstrates the advantage of our model over the sparse subspace model with predefined kernels and some other related state-of-the-art methods.
8
Qin A, Xian L, Yang Y, Zhang T, Tang YY. Low-Rank Matrix Recovery from Noise via an MDL Framework-Based Atomic Norm. Sensors (Basel, Switzerland) 2020; 20:E6111. [PMID: 33121059 PMCID: PMC7663647 DOI: 10.3390/s20216111] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/09/2020] [Revised: 10/16/2020] [Accepted: 10/22/2020] [Indexed: 06/11/2023]
Abstract
The recovery of the underlying low-rank structure of clean data corrupted with sparse noise/outliers is attracting increasing interest. However, in many low-level vision problems, the exact target rank of the underlying structure and the particular locations and values of the sparse outliers are not known. Thus, the conventional methods cannot separate the low-rank and sparse components completely, especially in the case of gross outliers or deficient observations. Therefore, in this study, we employ the minimum description length (MDL) principle and atomic norm for low-rank matrix recovery to overcome these limitations. First, we employ the atomic norm to find all the candidate atoms of low-rank and sparse terms, and then we minimize the description length of the model in order to select the appropriate atoms of low-rank and the sparse matrices, respectively. Our experimental analyses show that the proposed approach can obtain a higher success rate than the state-of-the-art methods, even when the number of observations is limited or the corruption ratio is high. Experimental results utilizing synthetic data and real sensing applications (high dynamic range imaging, background modeling, removing noise and shadows) demonstrate the effectiveness, robustness and efficiency of the proposed method.
Affiliation(s)
- Anyong Qin
  - School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Lina Xian
  - School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Yongliang Yang
  - School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Taiping Zhang
  - College of Computer Science, Chongqing University, Chongqing 400030, China
- Yuan Yan Tang
  - Faculty of Science and Technology, University of Macau, Macau 999078, China
9
Low-Rank and Sparse Recovery of Human Gait Data. Sensors 2020; 20:s20164525. [PMID: 32823505 PMCID: PMC7472490 DOI: 10.3390/s20164525] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/23/2020] [Revised: 08/07/2020] [Accepted: 08/07/2020] [Indexed: 11/17/2022]
Abstract
Due to occlusion or detached markers, information can often be lost while capturing human motion with optical tracking systems. Based on three natural properties of human gait movement, this study presents two different approaches to recover corrupted motion data. These properties are used to define a reconstruction model combining low-rank matrix completion of the measured data with a group-sparsity prior on the marker trajectories mapped in the frequency domain. Unlike most existing approaches, the proposed methodology is fully unsupervised and does not need training data or kinematic information of the user. We evaluated our methods on four different gait datasets with various gap lengths and compared their performance with a state-of-the-art approach using principal component analysis (PCA). Our methods recovered missing data more precisely, with a reduction of at least 2 mm in mean reconstruction error compared to the literature method. When only a small number of marker trajectories is available, our methods reduced the mean reconstruction error by more than 14 mm compared to the literature approach.
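The low-rank matrix completion component mentioned in this abstract can be illustrated with a deliberately small example. The sketch below is not the authors' method: it swaps in rank-one alternating least squares as the simplest instance of low-rank completion and omits the group-sparsity prior entirely. The function name `complete_rank1` and the data layout (rows as frames, columns as marker coordinates, `None` for a lost measurement) are assumptions.

```python
def complete_rank1(matrix, iters=200):
    """Fill None entries of a matrix that is assumed to be close to rank one.

    Alternating least squares: fit matrix[i][j] ~ u[i] * v[j] on the observed
    entries, updating u with v fixed and then v with u fixed.
    """
    rows, cols = len(matrix), len(matrix[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iters):
        for i in range(rows):
            seen = [j for j in range(cols) if matrix[i][j] is not None]
            denom = sum(v[j] ** 2 for j in seen)
            if denom > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in seen) / denom
        for j in range(cols):
            seen = [i for i in range(rows) if matrix[i][j] is not None]
            denom = sum(u[i] ** 2 for i in seen)
            if denom > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in seen) / denom
    # Keep measured values as they are and predict only the gaps.
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(cols)]
        for i in range(rows)
    ]
```

Real motion data need a rank higher than one, which is why practical methods rely on nuclear-norm or similar relaxations; the example only shows how a low-rank assumption lets observed entries determine the missing ones.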
11
Ye H, Li H, Cao F, Zhang L. A Hybrid Truncated Norm Regularization Method for Matrix Completion. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2019; 28:5171-5186. [PMID: 31170070 DOI: 10.1109/tip.2019.2918733] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Matrix completion has been widely used in image processing, in which the popular approach is to formulate this issue as a general low-rank matrix approximation problem. This paper proposes a novel regularization method referred to as truncated Frobenius norm (TFN), and presents a hybrid truncated norm (HTN) model combining the truncated nuclear norm and truncated Frobenius norm for solving matrix completion problems. To address this model, a simple and effective two-step iteration algorithm is designed. Further, an adaptive way to change the penalty parameter is introduced to reduce the computational cost. Also, the convergence of the proposed method is discussed and proved mathematically. The proposed approach could not only effectively improve the recovery performance but also greatly promote the stability of the model. Meanwhile, the use of this new method could eliminate large variations that exist when estimating complex models, and achieve competitive successes in matrix completion. Experimental results on the synthetic data, real-world images as well as recommendation systems, particularly the use of the statistical analysis strategy, verify the effectiveness and superiority of the proposed method, i.e. the proposed method is more stable and effective than other state-of-the-art approaches.
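For orientation, the matrix completion problem this abstract refers to is commonly written as below. This is the standard formulation from the wider literature, not an equation taken from the paper; the symbols $M$, $\Omega$, $P_\Omega$, and $r$ are notation assumed here. The truncated nuclear norm is shown in its usual form; the paper's truncated Frobenius norm is not reproduced because the abstract does not define it.

```latex
\min_{X \in \mathbb{R}^{m \times n}} \; \operatorname{rank}(X)
\quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M),
\qquad
\|X\|_{r} = \sum_{i=r+1}^{\min(m,n)} \sigma_i(X)
```

Here $M$ is the partially observed matrix, $\Omega$ is the set of observed index pairs, $P_\Omega$ keeps the entries in $\Omega$ and zeroes the rest, and $\sigma_i(X)$ is the $i$-th largest singular value. Truncation leaves the $r$ largest singular values unpenalized, so the regularizer only shrinks the part of the spectrum beyond the assumed rank.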