1. Tan B, Xiao Y, Li S, Tong X, Yan T, Cao Z, Zhou JT. Language-Guided 3-D Action Feature Learning Without Ground-Truth Sample Class Label. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9356-9369. [PMID: 38865228] [DOI: 10.1109/tnnls.2024.3409613]
Abstract
This work makes the first research effort to leverage point cloud sequence-based Self-supervised 3-D Action Feature Learning (S3AFL) under cross-modality weak supervision from text. We aim to close the large performance gap between point cloud sequence-based and 3-D skeleton-based approaches. The key intuition derives from the observation that skeleton-based approaches implicitly encode high-level knowledge of human pose, which directs attention to joint-aware local body parts. Inspired by this, we propose to introduce weak textual supervision of high-level semantics into the point cloud sequence-based paradigm. Given RGB-point cloud pair sequences acquired via an RGB-D camera, a text sequence is first generated from the RGB component using a pretrained image captioning model, serving as auxiliary weak supervision. S3AFL then runs by cross-modality and intra-modality contrastive learning (CL). To resist missing and redundant semantics in the text, feature learning is conducted in a multistage manner with semantic refinement. Essentially, text is required only for training. To strengthen the feature's representation power on fine-grained actions, a multirank max-pooling (MR-MP) scheme is also proposed for the point set network to better preserve discriminative clues. Experiments verify that weak textual supervision improves performance by up to 10.8%, 10.4%, and 8.0% on NTU RGB+D 60, NTU RGB+D 120, and N-UCLA, respectively. The performance gap between point cloud sequence-based and skeleton-based approaches is remarkably narrowed. The idea of transferring weak textual supervision to S3AFL also applies to skeleton-based approaches, showing strong generality. The source code is available at https://github.com/tangent-T/W3AMT.
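Editor's note: as a minimal sketch of one plausible reading of multirank max-pooling (MR-MP) for a point set network, the snippet below keeps the top-r responses per channel for several ranks r instead of only the single maximum. The rank values and mean aggregation are assumptions for exposition, not the paper's exact design (see the released code for that).

import torch

def multirank_max_pool(features: torch.Tensor, ranks=(1, 2, 4)) -> torch.Tensor:
    """Pool a point-feature set (B, N, C) by keeping the top-r responses per
    channel for several ranks r, then concatenating the rank-wise means.
    Plain max-pooling corresponds to ranks=(1,)."""
    pooled = []
    for r in ranks:
        topr, _ = features.topk(r, dim=1)   # (B, r, C): r strongest points per channel
        pooled.append(topr.mean(dim=1))     # (B, C): average over the top-r responses
    return torch.cat(pooled, dim=-1)        # (B, C * len(ranks))

# Example: 8 sequences, 512 points, 256-dim features -> (8, 768) descriptor
x = torch.randn(8, 512, 256)
print(multirank_max_pool(x).shape)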
2. Tan B, Xiao Y, Wang Y, Li S, Yang J, Cao Z, Zhou JT, Yuan J. Beyond Pattern Variance: Unsupervised 3-D Action Representation Learning With Point Cloud Sequence. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:18186-18199. [PMID: 37729565] [DOI: 10.1109/tnnls.2023.3312673]
Abstract
This work makes the first research effort to address unsupervised 3-D action representation learning with point cloud sequences, in contrast to existing unsupervised methods that rely on 3-D skeleton information. Our proposition builds on the state-of-the-art 3-D action descriptor, the 3-D dynamic voxel (3DV), with contrastive learning (CL). 3DV compresses a point cloud sequence into a compact point cloud of 3-D motion information, on which spatiotemporal data augmentations are conducted to drive CL. However, we find that existing CL methods (e.g., SimCLR or MoCo v2) often suffer from high pattern variance across the augmented 3DV samples from the same action instance: the augmented 3DV samples still exhibit high feature complementarity after CL, and the complementary discriminative clues within them have not been well exploited. To address this, a feature augmentation adapted CL (FACL) approach is proposed, which facilitates 3-D action representation by jointly considering the features from all augmented 3DV samples, in the spirit of feature augmentation. FACL runs in a global-local way: one branch learns a global feature that involves the discriminative clues from the raw and augmented 3DV samples, and the other enhances the discriminative power of the local feature learned from each augmented 3DV sample. The global and local features are fused via concatenation to characterize the 3-D action jointly. To fit FACL, a series of spatiotemporal data augmentation approaches on 3DV is also studied. Extensive experiments verify the superiority of our unsupervised method for 3-D action feature learning: it outperforms the state-of-the-art skeleton-based counterparts by 6.4% and 3.6% under the cross-setup and cross-subject test settings on NTU RGB+D 120, respectively. The source code is available at https://github.com/tangent-T/FACL.
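Editor's note: as a rough sketch of the two ingredients the abstract describes, the snippet below pairs a standard InfoNCE contrastive loss with concatenation-based global-local fusion. It is a minimal illustration under assumed shapes, not FACL's actual branch architecture.

import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE loss between two batches of paired embeddings."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def fuse_global_local(global_feat, local_feats):
    """Concatenate a global descriptor with pooled per-augmentation local features."""
    local = torch.stack(local_feats, dim=1).mean(dim=1)   # (B, C) pooled local feature
    return torch.cat([global_feat, local], dim=-1)        # (B, 2C) joint 3-D action feature

# Usage with hypothetical 128-dim features from two augmented views:
z_a, z_b = torch.randn(16, 128), torch.randn(16, 128)
loss = info_nce(z_a, z_b)
fused = fuse_global_local(torch.randn(16, 128), [z_a, z_b])  # (16, 256)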
3. Zhao F, Zhao W, Lu H. Interactive Feature Embedding for Infrared and Visible Image Fusion. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12810-12822. [PMID: 37040245] [DOI: 10.1109/tnnls.2023.3264911]
Abstract
General deep learning-based methods for infrared and visible image fusion rely on an unsupervised mechanism that retains vital information through elaborately designed loss functions. However, even a well-designed loss function cannot guarantee that all vital information in the source images is sufficiently extracted. In this work, we propose a novel interactive feature embedding in a self-supervised learning framework for infrared and visible image fusion, attempting to overcome the problem of vital information degradation. With the help of the self-supervised learning framework, hierarchical representations of the source images can be extracted efficiently. In particular, interactive feature embedding models are carefully designed to bridge self-supervised learning and infrared and visible image fusion learning, achieving vital information retention. Qualitative and quantitative evaluations show that the proposed method performs favorably against state-of-the-art methods.
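Editor's note: to make the data flow concrete, here is a toy sketch of feature-level interaction between the two modalities. The single-layer encoders, channel widths, and omission of the self-supervised pretext task are simplifications assumed for illustration, not the authors' architecture.

import torch
import torch.nn as nn

class InteractiveFusion(nn.Module):
    """Toy sketch: encode each modality, mix the features with a shared
    interaction block, then decode a fused single-channel image."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc_ir = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.enc_vis = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.interact = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)  # cross-modal mixing
        self.dec = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def forward(self, ir, vis):
        f = torch.cat([self.enc_ir(ir), self.enc_vis(vis)], dim=1)
        return self.dec(torch.relu(self.interact(f)))

fused = InteractiveFusion()(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))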
4. Ma N, Wu Z, Feng Y, Wang C, Gao Y. Multi-View Time-Series Hypergraph Neural Network for Action Recognition. IEEE Transactions on Image Processing 2024; 33:3301-3313. [PMID: 38700958] [DOI: 10.1109/tip.2024.3391913]
Abstract
Recently, action recognition has attracted considerable attention in computer vision. In dynamic circumstances and against complicated backgrounds, problems such as object occlusion, insufficient lighting, and weak correlation between human body joints cause skeleton-based human action recognition accuracy to drop sharply. To address this issue, we propose a Multi-View Time-Series Hypergraph Neural Network (MV-TSHGNN). The framework is composed of two main parts: construction of the multi-view time-series hypergraph structure and learning of the multi-view time-series hypergraph convolutions. Specifically, given multi-view video sequence frames, we first extract the joint features of actions from the different views. Then, limb-component and adjacent-joint spatial hypergraphs are constructed from the joints of different views at the same time, and temporal hypergraphs are constructed from the joints of the same view at consecutive times; these hypergraphs establish high-order semantic relationships and cooperatively generate complementary action features. After that, we design a multi-view time-series hypergraph neural network to efficiently learn the features of the spatial and temporal hypergraphs and effectively improve the accuracy of skeleton-based action recognition. To evaluate the effectiveness and efficiency of MV-TSHGNN, we conduct experiments on the NTU RGB+D, NTU RGB+D 120, and imitating traffic police gestures datasets. The experimental results indicate that our proposed model achieves new state-of-the-art performance.
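Editor's note: as a sketch of the kind of structure involved, the snippet below builds a joint-by-hyperedge incidence matrix in which each hyperedge groups the joints of one limb component. The 25-joint layout and the grouping are hypothetical placeholders, not the paper's construction.

import numpy as np

def build_spatial_hypergraph(num_joints: int, limb_groups) -> np.ndarray:
    """Build a joint-by-hyperedge incidence matrix H, where each hyperedge
    connects all joints of one limb component (hyperedges may share joints)."""
    H = np.zeros((num_joints, len(limb_groups)))
    for e, joints in enumerate(limb_groups):
        H[joints, e] = 1.0
    return H

# Hypothetical 5-part grouping for a 25-joint skeleton (indices are illustrative)
limbs = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11],
         [12, 13, 14, 15], [16, 17, 18, 19, 20]]
H = build_spatial_hypergraph(25, limbs)   # shape (25, 5)
# A standard hypergraph convolution then propagates joint features through the
# shared hyperedges, e.g. X' = Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta.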
5. Ang Z. Application of IoT technology based on neural networks in basketball training motion capture and injury prevention. Preventive Medicine 2023; 175:107660. [PMID: 37573953] [DOI: 10.1016/j.ypmed.2023.107660]
Abstract
Basketball players must frequently perform a variety of physical movements during a game, which places a considerable burden on their bodies and can easily lead to sports injuries; preventing such injuries is therefore crucial in basketball teaching. This paper also studies basketball motion trajectory capture, which preserves the motion posture of the target person in three-dimensional space. Because machine vision-based motion capture systems often encounter occlusion or self-occlusion in application scenes, human motion capture remains a challenging research problem. This article designs a multi-view human motion trajectory capture framework: a deep learning-based two-dimensional human pose estimation algorithm estimates the position distribution of human joint points in the two-dimensional image from each view, and by combining knowledge of the camera poses across views, these distributions are lifted into the three-dimensional spatial distribution of the joint points, yielding the final estimate of the target person's 3D pose. This article applies research results on neural networks and IoT devices to basketball motion capture methods, further developing basketball motion capture systems.
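Editor's note: the multi-view lifting step can be illustrated with classical linear (DLT) triangulation: given calibrated camera matrices and matching per-view 2-D joint detections, a direct linear transform recovers the 3-D joint. This is a generic sketch of the geometry, not the article's specific algorithm; the camera matrices below are hypothetical.

import numpy as np

def triangulate_joint(proj_mats, points_2d):
    """Linear (DLT) triangulation: recover a 3-D joint from its 2-D detections
    in several calibrated views. proj_mats: list of 3x4 camera matrices;
    points_2d: matching list of (u, v) detections."""
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        rows.append(u * P[2] - P[0])   # each view contributes two linear constraints
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)        # null-space vector = homogeneous 3-D point
    X = vt[-1]
    return X[:3] / X[3]                # dehomogenize to (x, y, z)

# Two hypothetical cameras one unit apart; the joint resolves to (0, 0, 1).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])
print(triangulate_joint([P1, P2], [(0.0, 0.0), (1.0, 0.0)]))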
Affiliation(s)
- Zhao Ang
- Hui Shang Vocational College, Hefei 230022, China.
6. Kim HH, Kim JY, Jang BK, Lee JH, Kim JH, Lee DH, Yang HM, Choi YJ, Sung MJ, Kang TJ, Kim E, Oh YS, Lim J, Hong SB, Ahn K, Park CL, Kwon SM, Park YR. Multiview child motor development dataset for AI-driven assessment of child development. GigaScience 2022; 12:giad039. [PMID: 37243520] [PMCID: PMC10220505] [DOI: 10.1093/gigascience/giad039]
Abstract
BACKGROUND Assessment of children's motor development is a crucial tool for determining developmental levels, identifying developmental disorders early, and taking appropriate action. Although the Korean Developmental Screening Test for Infants and Children (K-DST) can accurately assess childhood development, its dependence on parental surveys rather than reliable professional observation limits it. This study constructed a skeleton-based dataset from recordings of K-DST behaviors in children aged 20 to 71 months, with and without developmental disorders. The dataset was validated with a child behavior artificial intelligence (AI) learning model to demonstrate its potential. RESULTS The 339 participating children were divided into three age groups. We collected videos of four behaviors per age group from three different angles and extracted skeletons from them. The raw data were used to annotate labels for each image, denoting whether each child performed the behavior properly. Behaviors were selected from the K-DST's gross motor section, and the number of images collected differed by age group. The original dataset underwent additional processing to improve its quality. Finally, we confirmed that our dataset can be used in an action recognition model, with test accuracies of 93.94%, 87.50%, and 96.31% for the three age groups; models trained with data from multiple views showed the best performance. CONCLUSION Ours is the first publicly available dataset for skeleton-based action recognition in young children according to standardized criteria (K-DST). This dataset will enable the development of various models for developmental tests and screenings.
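Editor's note: to indicate how such skeleton sequences might feed an action recognition model, here is a toy per-sequence classifier. The 17-joint layout, GRU backbone, and binary performed/not-performed head are illustrative assumptions; the models evaluated in the paper are stronger.

import torch
import torch.nn as nn

class SkeletonActionClassifier(nn.Module):
    """Toy baseline: classify a skeleton sequence (T frames x J joints x 3 coords)
    with a GRU over flattened per-frame joint coordinates."""
    def __init__(self, joints=17, hidden=128, classes=2):
        super().__init__()
        self.rnn = nn.GRU(joints * 3, hidden, batch_first=True)
        self.head = nn.Linear(hidden, classes)   # e.g., performed vs. not performed

    def forward(self, seq):                      # seq: (B, T, J, 3)
        b, t = seq.shape[:2]
        _, h = self.rnn(seq.reshape(b, t, -1))   # flatten joints per frame
        return self.head(h[-1])                  # logits from the final hidden state

logits = SkeletonActionClassifier()(torch.randn(4, 60, 17, 3))  # (4, 2)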
Affiliation(s)
- Hye Hyeon Kim
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Jin Yong Kim
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Bong Kyung Jang
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Joo Hyun Lee
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Jong Hyun Kim
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Dong Hoon Lee
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Hee Min Yang
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Young Jo Choi
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Myung Jun Sung
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Tae Jun Kang
- MISO Info Tech Co. Ltd., Seoul 06222, Republic of Korea
- Eunah Kim
- Maumdri Co. Ltd., Muan-gun, Jeollanam-do 58563, Republic of Korea
- Yang Seong Oh
- Maumdri Co. Ltd., Muan-gun, Jeollanam-do 58563, Republic of Korea
- Jaehyun Lim
- Lumanlab, Inc., Seoul 05836, Republic of Korea
- Soon-Beom Hong
- Division of Child and Adolescent Psychiatry, Department of Psychiatry, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
- Institute of Human Behavioral Medicine, Seoul National University Medical Research Center, Seoul 03080, Republic of Korea
- Kiok Ahn
- GazziLabs, Inc., Anyang-si, Gyeonggi-do 14085, Republic of Korea
- Chan Lim Park
- Smart Safety Laboratory Co. Ltd., Seongnam-si, Gyeonggi-do 13494, Republic of Korea
- Soon Myeong Kwon
- Smart Safety Laboratory Co. Ltd., Seongnam-si, Gyeonggi-do 13494, Republic of Korea
- Yu Rang Park
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
7. Momin MS, Sufian A, Barman D, Dutta P, Dong M, Leo M. In-Home Older Adults' Activity Pattern Monitoring Using Depth Sensors: A Review. Sensors (Basel) 2022; 22:9067. [PMID: 36501769] [PMCID: PMC9735577] [DOI: 10.3390/s22239067]
Abstract
The global population is aging due to many factors, including longer life expectancy through better healthcare, changing diets, and physical activity. We are also witnessing frequent epidemics as well as pandemics, and the existing healthcare system has failed to deliver the care and support our older adults (seniors) need during these outbreaks. Sophisticated sensor-based in-home care systems may offer an effective solution to this global crisis, and the monitoring system is the key component of any such system. The evidence indicates that monitoring is most useful when implemented non-intrusively through visual and audio sensors, for which Artificial Intelligence (AI) and Computer Vision (CV) techniques are well suited. Since RGB imagery-based CV techniques may compromise privacy, people often hesitate to adopt in-home care systems that use them; depth-, thermal-, and audio-based CV techniques are meaningful substitutes. Because larger areas need to be monitored, this review article presents a systematic discussion of the state of the art that uses depth sensors as the primary data-capturing technique. We focus mainly on fall detection and other health-related physical patterns, and since gait parameters may help detect these activities, depth sensor-based gait parameters are considered separately. The article discusses the relevant terminology, reviews prior work, surveys popular datasets, and outlines future scope.
Affiliation(s)
- Md Sarfaraz Momin
- Department of Computer Science, Kaliachak College, University of Gour Banga, Malda 732101, India
- Department of Computer & System Sciences, Visva-Bharati University, Bolpur 731235, India
- Abu Sufian
- Department of Computer Science, University of Gour Banga, Malda 732101, India
- Debaditya Barman
- Department of Computer & System Sciences, Visva-Bharati University, Bolpur 731235, India
- Paramartha Dutta
- Department of Computer & System Sciences, Visva-Bharati University, Bolpur 731235, India
- Mianxiong Dong
- Department of Science and Informatics, Muroran Institute of Technology, Muroran 050-8585, Hokkaido, Japan
- Marco Leo
- National Research Council of Italy, Institute of Applied Sciences and Intelligent Systems, 73100 Lecce, Italy