2. Koutrintzes D, Spyrou E, Mathe E, Mylonas P. A Multimodal Fusion Approach for Human Activity Recognition. Int J Neural Syst 2023; 33:2350002. [PMID: 36573880] [DOI: 10.1142/s0129065723500028] [Indexed: 12/29/2022]
Abstract
The problem of human activity recognition (HAR), which consists of recognizing human motion and/or behavior within a given image or video sequence from raw sensor measurements, has been attracting increasing effort from the research community and has several applications. In this paper, a multimodal approach to video-based HAR is proposed. It is based on 3D visual data collected using an RGB + depth camera, resulting in both raw video and 3D skeletal sequences. These data are transformed into six different 2D image representations. Five are based on the skeletal data: four in the spectral domain and one a pseudo-colored image. The sixth is a "dynamic" image, an artificially created image that summarizes the RGB data of the whole video sequence in a visually comprehensible way. To classify a given activity video, all six 2D images are first extracted; six trained convolutional neural networks then extract visual features, which are fused into a single feature vector and fed into a support vector machine for classification into human activities. For evaluation, a challenging motion activity recognition dataset is used, and single-view, cross-view and cross-subject experiments are performed. The proposed approach is also compared to three other state-of-the-art methods, demonstrating superior performance in most experiments.
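The late-fusion step described in this abstract (per-stream CNN features concatenated into one vector, then classified with an SVM) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `stream_features`, `N_STREAMS`, and `FEAT_DIM` are assumptions, and random arrays stand in for real CNN features.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Illustrative sizes: six image representations, 128-D features each.
N_STREAMS, FEAT_DIM, N_VIDEOS, N_CLASSES = 6, 128, 40, 4

# Stand-in for per-stream CNN outputs: one (N_VIDEOS, FEAT_DIM) array
# per 2D image representation.
stream_features = [rng.normal(size=(N_VIDEOS, FEAT_DIM)) for _ in range(N_STREAMS)]
labels = rng.integers(0, N_CLASSES, size=N_VIDEOS)

# Fusion step: concatenate the six feature vectors of each video.
fused = np.concatenate(stream_features, axis=1)  # shape (N_VIDEOS, 6 * FEAT_DIM)

# Feed the fused vectors to a support vector machine.
clf = SVC(kernel="linear").fit(fused, labels)
predictions = clf.predict(fused)
```

Concatenation keeps the streams independent until the classifier, so a weak representation cannot corrupt the others' features before fusion.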
Affiliation(s)
- Dimitrios Koutrintzes, Institute of Informatics and Telecommunications, National Center for Scientific Research - "Demokritos", Athens, Greece
- Evaggelos Spyrou, Department of Informatics and Telecommunication, University of Thessaly, Lamia, Greece
- Eirini Mathe, Department of Informatics, Ionian University, Corfu, Greece
- Phivos Mylonas, Department of Informatics, Ionian University, Corfu, Greece
3. Kim HH, Kim JY, Jang BK, Lee JH, Kim JH, Lee DH, Yang HM, Choi YJ, Sung MJ, Kang TJ, Kim E, Oh YS, Lim J, Hong SB, Ahn K, Park CL, Kwon SM, Park YR. Multiview child motor development dataset for AI-driven assessment of child development. Gigascience 2022; 12:giad039. [PMID: 37243520] [PMCID: PMC10220505] [DOI: 10.1093/gigascience/giad039] [Received: 08/08/2022] [Revised: 03/15/2023] [Accepted: 05/21/2023] [Indexed: 05/29/2023]
Abstract
BACKGROUND: Assessment of children's motor development is a crucial tool for determining developmental levels, identifying developmental disorders early, and taking appropriate action. Although the Korean Developmental Screening Test for Infants and Children (K-DST) can accurately assess childhood development, it is limited by its dependence on parental surveys rather than reliable, professional observation. This study constructed a skeleton-based dataset from recordings of K-DST behaviors in children aged between 20 and 71 months, with and without developmental disorders. The dataset was validated with a child-behavior artificial intelligence (AI) learning model to highlight its potential. RESULTS: The 339 participating children were divided into 3 age groups. For each group, we collected videos of 4 behaviors, selected from the K-DST's gross motor section, from 3 different angles and extracted skeletons from them. The raw data were used to annotate labels for each image, denoting whether each child performed the behavior properly. The number of images collected differed by age group, and the original dataset underwent additional processing to improve its quality. Finally, we confirmed that our dataset can be used in an action recognition model, with test accuracies of 93.94%, 87.50%, and 96.31% for the 3 age groups; models trained with data including multiple views showed the best performance. CONCLUSION: Ours is the first publicly available dataset for skeleton-based action recognition in young children according to standardized criteria (K-DST). This dataset will enable the development of various models for developmental tests and screenings.
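A skeleton-based action recognition dataset like the one described typically stores, per clip, a sequence of per-frame joint coordinates plus a label. The sketch below shows a common layout and normalization step; the joint count, frame count, and root-joint choice are illustrative assumptions, not this dataset's actual schema.

```python
import numpy as np

# Illustrative dimensions; the real dataset's schema may differ.
N_FRAMES, N_JOINTS, N_COORDS = 60, 17, 2   # e.g. 17 COCO-style 2D keypoints

# One clip: joint coordinates extracted by a pose estimator, one row per frame.
clip = np.zeros((N_FRAMES, N_JOINTS, N_COORDS), dtype=np.float32)
label = 1  # e.g. behavior performed properly (1) or not (0)

# Typical normalization before feeding an action recognition model:
# center each frame on a root joint (here joint 0) so the model sees
# pose, not absolute image position.
root = clip[:, 0:1, :]              # shape (N_FRAMES, 1, N_COORDS)
clip_centered = clip - root

# Flatten to the (frames, features) shape many sequence models expect.
model_input = clip_centered.reshape(N_FRAMES, N_JOINTS * N_COORDS)
```

Centering on a root joint is what makes skeleton data transferable across the 3 camera angles mentioned above, since the same pose yields similar features regardless of where the child stands in the frame.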
Affiliation(s)
- Hye Hyeon Kim, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Jin Yong Kim, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Bong Kyung Jang, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Joo Hyun Lee, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Jong Hyun Kim, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Dong Hoon Lee, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Hee Min Yang, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Young Jo Choi, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Myung Jun Sung, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Tae Jun Kang, MISO Info Tech Co. Ltd., Seoul 06222, Republic of Korea
- Eunah Kim, Maumdri Co. Ltd., Muan-gun, Jeollanam-do 58563, Republic of Korea
- Yang Seong Oh, Maumdri Co. Ltd., Muan-gun, Jeollanam-do 58563, Republic of Korea
- Jaehyun Lim, Lumanlab, Inc., Seoul 05836, Republic of Korea
- Soon-Beom Hong, Division of Child and Adolescent Psychiatry, Department of Psychiatry, Seoul National University College of Medicine, Seoul 03080, Republic of Korea; Institute of Human Behavioral Medicine, Seoul National University Medical Research Center, Seoul 03080, Republic of Korea
- Kiok Ahn, GazziLabs, Inc., Anyang-si, Gyeonggi-do 14085, Republic of Korea
- Chan Lim Park, Smart Safety Laboratory Co. Ltd., Seongnam-si, Gyeonggi-do 13494, Republic of Korea
- Soon Myeong Kwon, Smart Safety Laboratory Co. Ltd., Seongnam-si, Gyeonggi-do 13494, Republic of Korea
- Yu Rang Park, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
4. Multiple Sensor Synchronization with the RealSense RGB-D Camera. Sensors 2021; 21:6276. [PMID: 34577483] [PMCID: PMC8472203] [DOI: 10.3390/s21186276] [Received: 08/18/2021] [Revised: 09/11/2021] [Accepted: 09/17/2021] [Indexed: 11/17/2022]
Abstract
When reconstructing a 3D object, it is difficult to obtain accurate 3D geometric information using a single camera, so capturing detailed geometry requires multiple cameras. However, the cameras must be synchronized to capture frames simultaneously; if they are incorrectly synchronized, many artifacts appear in the reconstructed 3D object. The RealSense RGB-D camera, which is commonly used for obtaining geometric information of a 3D object, provides synchronization modes to mitigate synchronization errors. However, these modes can only sync the depth cameras, and hardware constraints on stable data transmission limit the number of cameras that can be synchronized on a single host. Therefore, in this paper, we propose a novel synchronization method that synchronizes an arbitrary number of RealSense cameras by adjusting the number of hosts to support stable data transmission. Our method establishes a master–slave architecture to synchronize the system clocks of the hosts. While synchronizing the system clocks, the delays introduced by the synchronization process itself are estimated so that the difference between the system clocks can be minimized. Once the system clocks are synchronized, cameras connected to different hosts can be synchronized based on the timestamps of the data received by the hosts. Thus, our method synchronizes the RealSense cameras to simultaneously capture accurate 3D information of an object at a constant frame rate without dropping frames.
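The master–slave clock alignment with delay estimation described above resembles an NTP-style exchange: the slave timestamps a request and its reply, the master timestamps receipt and transmission, and the round-trip delay is factored out of the measured offset. The sketch below illustrates that general technique under those assumptions; it is not the paper's exact protocol, and the function names are invented for illustration.

```python
# NTP-style offset/delay estimation between a slave host and a master host.
# t1..t4 are the four timestamps of one request/reply exchange:
#   t1: slave sends request   (slave clock)
#   t2: master receives it    (master clock)
#   t3: master sends reply    (master clock)
#   t4: slave receives reply  (slave clock)

def clock_offset_and_delay(t1, t2, t3, t4):
    """Return (offset of master clock relative to slave, round-trip delay).
    Assumes the network delay is roughly symmetric in both directions."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

def to_master_time(slave_timestamp, offset):
    # Map a frame timestamp taken on the slave onto the master timeline,
    # so frames from cameras on different hosts can be matched.
    return slave_timestamp + offset

# Example: master clock runs 5 ms ahead of the slave, with a 2 ms one-way
# network delay and 1 ms of processing time on the master.
offset, delay = clock_offset_and_delay(100.0, 107.0, 108.0, 105.0)
# offset -> 5.0 ms, delay -> 4.0 ms
```

With the offset known, each host only needs to tag incoming camera frames with its local clock; frames across hosts are then matched by translated timestamps rather than by hardware trigger lines.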