1. Shao H, Luo L, Qian J, Yan M, Gao S, Yang J. Video-Based Multiphysiological Disentanglement and Remote Robust Estimation for Respiration. IEEE Trans Neural Netw Learn Syst 2025; 36:8360-8371. PMID: 39012736. DOI: 10.1109/tnnls.2024.3424772.
Abstract
Remote noncontact respiratory rate estimation from facial visual information has great research significance, providing valuable priors for health monitoring, clinical diagnosis, and anti-fraud. However, existing studies suffer from disturbances in epidermal specular reflections induced by head movements and facial expressions. Furthermore, diffuse reflections of light in the skin-colored subcutaneous tissue, caused by multiple time-varying physiological signals independent of breathing, are entangled with the respiratory process of interest, confounding current research. To address these issues, this article proposes a novel network for natural-light video-based remote respiration estimation. Specifically, our model consists of a two-stage architecture that progressively implements vital measurements. The first stage adopts an encoder-decoder structure to recharacterize the facial motion frame differences of the input video based on the gradient binary state of the respiratory signal during inspiration and expiration. The obtained generative mapping, which is disentangled from various time-varying interferences and is linearly related only to the respiratory state, is then combined with the facial appearance in the second stage. To further improve robustness, we design a targeted long-term temporal attention module and embed it between the two stages, enhancing the network's ability to model a breathing cycle that spans a very large number of frames and to mine hidden temporal cues. We train and validate the proposed network on a series of publicly available respiration estimation datasets, and the experimental results demonstrate its competitiveness against state-of-the-art breathing and physiological prediction frameworks.
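The gradient binary state that supervises the first stage, i.e. whether the respiratory signal is rising (inspiration) or falling (expiration), can be illustrated with a minimal sketch. This is an interpretation of the abstract, not the authors' exact pipeline, and the function name is hypothetical:

```python
import numpy as np

def breath_phase_labels(resp):
    """Binarize a respiratory waveform by the sign of its temporal gradient:
    1 while the signal rises (inspiration), 0 while it falls (expiration)."""
    grad = np.gradient(np.asarray(resp, dtype=float))
    return (grad >= 0).astype(int)

t = np.linspace(0.0, 10.0, 300)          # 10 s toy respiratory trace
resp = np.sin(2 * np.pi * 0.25 * t)      # ~15 breaths per minute
labels = breath_phase_labels(resp)       # per-frame inspiration/expiration state
```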
2. Liu M, Tang J, Chen Y, Li H, Qi J, Li S, Wang K, Gan J, Wang Y, Chen H. Spiking-PhysFormer: Camera-based remote photoplethysmography with parallel spike-driven transformer. Neural Netw 2025; 185:107128. PMID: 39817982. DOI: 10.1016/j.neunet.2025.107128.
Abstract
Artificial neural networks (ANNs) can help camera-based remote photoplethysmography (rPPG) measure cardiac activity and physiological signals from facial videos, such as pulse wave, heart rate, and respiration rate, with good accuracy. However, most existing ANN-based methods require substantial computing resources, which poses challenges for effective deployment on mobile devices. Spiking neural networks (SNNs), on the other hand, hold immense potential for energy-efficient deep learning owing to their binary and event-driven architecture. To the best of our knowledge, we are the first to introduce SNNs into the realm of rPPG, proposing a hybrid neural network (HNN) model, the Spiking-PhysFormer, aimed at reducing power consumption. Specifically, the proposed Spiking-PhysFormer consists of an ANN-based patch embedding block, SNN-based transformer blocks, and an ANN-based predictor head. First, to simplify the transformer block while preserving its capacity to aggregate local and global spatio-temporal features, we design a parallel spike transformer block to replace sequential sub-blocks. Additionally, we propose a simplified spiking self-attention mechanism that omits the value parameter without compromising the model's performance. Experiments conducted on four datasets (PURE, UBFC-rPPG, UBFC-Phys, and MMPD) demonstrate that the proposed model achieves a 10.1% reduction in power consumption compared to PhysFormer. Additionally, the power consumption of the transformer block is reduced by a factor of 12.2, while maintaining performance comparable to PhysFormer and other ANN-based models.
Affiliation(s)
- Yongli Chen
- Beijing Smartchip Microelectronics Technology Co., Ltd, Beijing, China
- Siwei Li
- Tsinghua University, Beijing, China
- Jie Gan
- Beijing Smartchip Microelectronics Technology Co., Ltd, Beijing, China
- Yuntao Wang
- Tsinghua University, Beijing, China; National Key Laboratory of Human Factors Engineering, Beijing, China.
- Hong Chen
- Tsinghua University, Beijing, China.
3. Jeong YH, Choi YS. Diffusion-Phys: noise-robust heart rate estimation from facial videos via diffusion models. Biomed Eng Lett 2025; 15:575-585. PMID: 40271392. PMCID: PMC12011667. DOI: 10.1007/s13534-025-00472-w.
Abstract
Remote photoplethysmography (rPPG) offers significant potential for health monitoring and emotional analysis through non-contact physiological measurement from facial videos. However, noise remains a crucial challenge, limiting the generalizability of current rPPG methods. This paper introduces Diffusion-Phys, a novel framework using diffusion models for robust heart rate (HR) estimation from facial videos. Diffusion-Phys employs Multi-scale Spatial-Temporal Maps (MSTmaps) to preprocess input data and introduces Gaussian noise to simulate real-world conditions. The model is trained using a denoising network for accurate HR estimation. Experimental evaluations on the VIPL-HR, UBFC-rPPG, and PURE datasets demonstrate that Diffusion-Phys achieves comparable or superior performance to state-of-the-art methods, with lower computational complexity. These results highlight the effectiveness of explicitly addressing noise through diffusion modeling, improving the reliability and generalization of non-contact physiological measurement systems.
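The noise-simulation step, perturbing the preprocessed MSTmaps with Gaussian noise before denoising, can be sketched as follows. The map shape and `sigma` are assumptions of this sketch, not the paper's settings:

```python
import numpy as np

def add_gaussian_noise(mstmap, sigma=0.05, seed=0):
    """Perturb a multi-scale spatial-temporal map (MSTmap) with additive
    Gaussian noise to mimic real-world corruption; values are assumed to be
    scaled to [0, 1] and are clipped back into that range afterwards."""
    rng = np.random.default_rng(seed)
    noisy = mstmap + rng.normal(0.0, sigma, size=mstmap.shape)
    return np.clip(noisy, 0.0, 1.0)

clean = np.random.default_rng(1).random((6, 300, 3))  # toy (regions, frames, channels) map
noisy = add_gaussian_noise(clean, sigma=0.05)
```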
Affiliation(s)
- Yong-Hoon Jeong
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826 Republic of Korea
- Young-Seok Choi
- Department of Electronics and Communications Engineering, Kwangwoon University, Seoul, 01897 Republic of Korea
4. Wang J, Shan C, Liu Z, Zhou S, Shu M. Physiological Information Preserving Video Compression for rPPG. IEEE J Biomed Health Inform 2025; 29:3563-3575. PMID: 40030966. DOI: 10.1109/jbhi.2025.3526837.
Abstract
Remote photoplethysmography (rPPG) has recently attracted much attention due to its non-contact measurement convenience and great potential in health care and computer vision applications. Early rPPG studies were mostly developed on self-collected uncompressed video data, which limited their application in scenarios that require long-distance real-time video transmission and also hindered the generation of large-scale publicly available benchmark datasets. In recent years, with the popularization of high-definition video and the rise of telemedicine, the pressures of storage and of real-time video transmission under limited bandwidth have made the compression of rPPG video inevitable. However, video compression can adversely affect rPPG measurements, because conventional video compression algorithms are not designed to preserve physiological signals. Motivated by this, we propose a video compression scheme specifically designed for rPPG applications. The proposed approach consists of three main strategies: 1) facial ROI-based computational resource reallocation; 2) rPPG signal-preserving bit resource reallocation; and 3) temporal-domain up- and down-sampling coding. UBFC-rPPG, ECG-Fitness, and a self-collected dataset are used to evaluate the performance of the proposed method. The results demonstrate that the proposed method preserves almost all physiological information after compressing the original video to 1/60 of its original size. The proposed method is expected to promote the development of telemedicine and of deep learning techniques relying on large-scale datasets in the field of rPPG measurement.
5. Bhutani S, Elgendi M, Menon C. Preserving privacy and video quality through remote physiological signal removal. Commun Eng 2025; 4:66. PMID: 40195503. PMCID: PMC11977227. DOI: 10.1038/s44172-025-00363-z.
Abstract
The revolutionary remote photoplethysmography (rPPG) technique has enabled intelligent devices to estimate physiological parameters with remarkable accuracy. However, the continuous and surreptitious recording of individuals by these devices, and the collection of sensitive health data without users' knowledge or consent, raise serious privacy concerns. Here we explore frugal methods for modifying facial videos to conceal physiological signals while maintaining image quality. Eleven lightweight modification methods, including blurring operations, additive noise, and time-averaging techniques, were evaluated using five different rPPG techniques across four activities: rest, talking, head rotation, and gym. These lightweight methods require minimal computational resources, enabling real-time implementation on low-compute devices. Our results indicate that the time-averaging sliding frame method achieved the best balance between preserving the information within the frame and inducing a heart rate error, with an average error of 22 beats per minute (bpm). Furthermore, the facial region of interest was found to be the most effective target, offering the best trade-off between bpm errors and information loss.
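The time-averaging sliding frame idea, the best-performing modification in this study, can be sketched as a causal moving average over the frame axis. The window length here is an assumption, not the paper's setting:

```python
import numpy as np

def sliding_frame_average(frames, window=15):
    """Causal time-averaging of video frames: each output frame is the mean
    of up to `window` most recent frames (inclusive), which smears the subtle
    pulse-induced color variations that rPPG methods rely on while keeping
    each frame visually intact. `frames`: float array of shape (T, H, W, C)."""
    frames = np.asarray(frames, dtype=float)
    out = np.empty_like(frames)
    for t in range(frames.shape[0]):
        lo = max(0, t - window + 1)
        out[t] = frames[lo:t + 1].mean(axis=0)
    return out
```

A static scene is unchanged by this operation; only temporal variation (including the pulse signal) is attenuated.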
Affiliation(s)
- Saksham Bhutani
- Biomedical and Mobile Health Technology Research Lab, ETH Zürich, Zürich, Switzerland
- Mohamed Elgendi
- Department of Biomedical Engineering and Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, UAE.
- Healthcare Engineering Innovation Group (HEIG), Khalifa University of Science and Technology, Abu Dhabi, UAE.
- Carlo Menon
- Biomedical and Mobile Health Technology Research Lab, ETH Zürich, Zürich, Switzerland.
6. Zhao X, Tanaka R, Mandour AS, Shimada K, Hamabe L. Remote Vital Sensing in Clinical Veterinary Medicine: A Comprehensive Review of Recent Advances, Accomplishments, Challenges, and Future Perspectives. Animals (Basel) 2025; 15:1033. PMID: 40218426. PMCID: PMC11988085. DOI: 10.3390/ani15071033.
Abstract
Remote vital sensing in veterinary medicine is a relatively new area of practice that involves acquiring data without invasion of the body cavities of live animals. This paper reviews several remote vital sensing technologies: infrared thermography, remote photoplethysmography (rPPG), radar, wearable sensors, and computer vision with machine learning. For each technology, we outline its concepts, uses, strengths, and limitations in multiple animal species, and its potential to reshape health surveillance, welfare evaluation, and clinical medicine in animals. The review also discusses the problems associated with applying these technologies, including species differences, external conditions, and questions about their reliability and classification. Additional topics include future developments such as the use of artificial intelligence, the combination of different sensing methods, and the creation of monitoring solutions tailored to specific animal species. This contribution gives a clear understanding of the status and future possibilities of remote vital sensing in veterinary applications and stresses its importance for advancing animal health and veterinary science.
Affiliation(s)
- Xinyue Zhao
- Department of Veterinary Science, Tokyo University of Agriculture and Technology, Tokyo 183-8509, Japan; (X.Z.); (A.S.M.); (L.H.)
- Ryou Tanaka
- Department of Veterinary Science, Tokyo University of Agriculture and Technology, Tokyo 183-8509, Japan; (X.Z.); (A.S.M.); (L.H.)
- Ahmed S. Mandour
- Department of Veterinary Science, Tokyo University of Agriculture and Technology, Tokyo 183-8509, Japan; (X.Z.); (A.S.M.); (L.H.)
- Department of Animal Medicine (Internal Medicine), Faculty of Veterinary Medicine, Suez Canal University, Ismailia 41522, Egypt
- Kazumi Shimada
- Department of Veterinary Science, Tokyo University of Agriculture and Technology, Tokyo 183-8509, Japan; (X.Z.); (A.S.M.); (L.H.)
- Lina Hamabe
- Department of Veterinary Science, Tokyo University of Agriculture and Technology, Tokyo 183-8509, Japan; (X.Z.); (A.S.M.); (L.H.)
7. Rao B, Fang R, Zhao C, Bai J. Measurement of heart rate from long-distance videos via projection of rotated orthogonal bases in POS. Med Eng Phys 2025; 138:104326. PMID: 40180538. DOI: 10.1016/j.medengphy.2025.104326.
Abstract
Remote photoplethysmography (rPPG) has long been an active research topic. Existing rPPG approaches achieve high accuracy of heart rate extraction as long as the user is relatively close to the camera (typically, less than 1 m). This article investigates the performance of existing rPPG approaches under long-distance recording conditions and proposes a novel Projection of Rotated Orthogonal Bases in POS (ProPOS) algorithm for heart rate extraction. A set of orthogonal projection bases is generated around the original plane of the POS algorithm. The raw measurement traces are projected onto these bases, and the final output signal is obtained by a designed SNR selection criterion. The long-distance rPPG (LD-rPPG) dataset is established for long-distance rPPG research by varying the recording distance from 3 m to 30 m. Extensive experiments are performed in comparison with existing approaches. They show that videos recorded by HikVision DS-V108 and Logitech C920 cameras contain a certain amount of physiological signal, whereas videos recorded by HikVision DS-U102D and Mercury cameras contain little. Using zoom lenses improves rPPG measurement accuracy under long-distance conditions.
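ProPOS builds on the well-known plane-orthogonal-to-skin (POS) projection. A minimal sketch of the fixed-plane POS core that the rotated bases generalize (window length and toy traces are illustrative; the SNR-based basis selection of ProPOS is not shown):

```python
import numpy as np

def pos_pulse(rgb, fps=30, win_sec=1.6):
    """Standard plane-orthogonal-to-skin (POS) pulse extraction.
    `rgb`: array of shape (T, 3) holding spatially averaged R, G, B skin
    traces. ProPOS additionally rotates the projection plane and keeps the
    basis with the best SNR; only the fixed-plane core is sketched here."""
    rgb = np.asarray(rgb, dtype=float)
    T = rgb.shape[0]
    w = int(win_sec * fps)                       # sliding-window length
    P = np.array([[0.0, 1.0, -1.0],              # two fixed projection axes
                  [-2.0, 1.0, 1.0]])             # spanning the POS plane
    h = np.zeros(T)
    for t in range(T - w + 1):
        block = rgb[t:t + w]
        cn = block / block.mean(axis=0)          # temporal normalization
        s = cn @ P.T                             # project onto the plane
        hw = s[:, 0] + (s[:, 0].std() / (s[:, 1].std() + 1e-9)) * s[:, 1]
        h[t:t + w] += hw - hw.mean()             # overlap-add the windows
    return h
```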
Affiliation(s)
- Bing Rao
- School of Information and Electrical Engineering, Hangzhou City University, Hangzhou, 310015, China
- Ruige Fang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Changchen Zhao
- School of Computer Science, Hangzhou Dianzi University, Hangzhou, 310018, China
- Jie Bai
- School of Information and Electrical Engineering, Hangzhou City University, Hangzhou, 310015, China.
8. Yu J, He Y, Li B, Chen H, Zheng H, Liu J, Zhou L, Niu Y, Wu H, Xu Z. Non-Contact Heart Rate Measurement Using Ghost Imaging System. J Biophotonics 2025; 18:e202400517. PMID: 39930850. DOI: 10.1002/jbio.202400517.
Abstract
Remote heart rate measurement is a research field of growing interest, usually relying on remote photoplethysmography to extract heart rate information from video recordings. However, in certain scenarios (such as low light, intense lighting, and non-line-of-sight situations), traditional methods fail to capture image information effectively, which may make heart rate measurement difficult or impossible. To address these limitations, this study proposes non-contact heart rate detection based on a ghost imaging architecture. The mean absolute error between experimental measurements and reference true values is 4.24 bpm. Additionally, the bucket signals obtained by the ghost imaging system can be processed directly with digital signal processing techniques, thereby enhancing personal privacy protection.
Affiliation(s)
- Jianming Yu
- Electronic Materials Research Laboratory, Key Laboratory of the Ministry of Education and International Center for Dielectric Research, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Yuchen He
- Electronic Materials Research Laboratory, Key Laboratory of the Ministry of Education and International Center for Dielectric Research, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Bin Li
- Bioinspired Engineering and Biomechanics Center, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Hui Chen
- Electronic Materials Research Laboratory, Key Laboratory of the Ministry of Education and International Center for Dielectric Research, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Huaibin Zheng
- Electronic Materials Research Laboratory, Key Laboratory of the Ministry of Education and International Center for Dielectric Research, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Jianbin Liu
- Electronic Materials Research Laboratory, Key Laboratory of the Ministry of Education and International Center for Dielectric Research, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Lifan Zhou
- Shaanxi Electronic Information Research Institute, Xi'an, People's Republic of China
- Yi Niu
- Shaanxi Electronic Information Research Institute, Xi'an, People's Republic of China
- Haodong Wu
- The Northwest Machine Co. Ltd, Xi'an, People's Republic of China
- Zhuo Xu
- Electronic Materials Research Laboratory, Key Laboratory of the Ministry of Education and International Center for Dielectric Research, Xi'an Jiaotong University, Xi'an, People's Republic of China
9. Wang Q, Cheng H, Wang W. Video-PSG: An Intelligent Contactless Monitoring System for Sleep Staging. IEEE Trans Biomed Eng 2025; 72:965-977. PMID: 39405136. DOI: 10.1109/tbme.2024.3480813.
Abstract
Polysomnography (PSG) is the gold standard for sleep staging in clinics, but its skin-contact nature makes it uncomfortable and inconvenient for long-term sleep monitoring. Video cameras, a complementary part of PSG, are not used to their full potential: they serve only for manual checks of simple sleep events, ignoring their potential for physiological and semantic measurement. This leads to a pivotal research question: can a camera be used for sleep staging, and to what extent? We developed a camera-based contactless sleep staging system in the Institute of Respiratory Diseases and created a clinical video dataset of 20 adults. The camera-based feature set, derived from physiological signals (pulse and breath) and motions all measured from video, was evaluated for 4-class sleep staging (Wake-REM-Light-Deep). Three optimization strategies were proposed to enhance sleep staging accuracy: using motion metrics to prune measurement outliers, personalizing the model via baseline calibration of waking-stage physiological signals, and deriving a specialized feature for REM detection. The system achieved the best accuracy of 73.1% (kappa = 0.62, F1-score = 0.74) in a benchmark of five sleep-staging classifiers. Notably, it exhibited high accuracy in predicting the overall sleep structure and subtle changes between different sleep stages. The study demonstrates that camera-based contactless sleep staging is a new value stream for sleep medicine, and it provides clinical and technical insights for future optimization and implementation.
10. Lee S, Do Song Y, Lee EC. Ultra-short-term stress measurement using RGB camera-based remote photoplethysmography with reduced effects of individual differences in heart rate. Med Biol Eng Comput 2025; 63:497-510. PMID: 39392540. DOI: 10.1007/s11517-024-03213-w.
Abstract
Stress is linked to health problems, increasing the need for immediate monitoring. Traditional methods like electrocardiograms or contact photoplethysmography require device attachment, causing discomfort, and ultra-short-term stress measurement research remains inadequate. This paper proposes a method for ultra-short-term stress monitoring using remote photoplethysmography (rPPG). Previous predictions of ultra-short-term stress have typically used pulse rate variability (PRV) features derived from time-segmented heart rate data. However, PRV varies at the same stress levels depending on heart rates, necessitating a new method to account for these differences. This study addressed this by segmenting rPPG data based on normal-to-normal intervals (NNIs), converted from peak-to-peak intervals, to predict ultra-short-term stress indices. We used NNI counts corresponding to average durations of 10, 20, and 30 s (13, 26, and 39 NNIs) to extract PRV features, predicting the Baevsky stress index through regressors. The Extra Trees Regressor achieved R2 scores of 0.6699 for 13 NNIs, 0.8751 for 26 NNIs, and 0.9358 for 39 NNIs, surpassing the time-segmented approach, which yielded 0.4162, 0.6528, and 0.7943 for 10, 20, and 30-s intervals, respectively. These findings demonstrate that using NNI counts for ultra-short-term stress prediction improves accuracy by accounting for individual bio-signal variations.
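The Baevsky stress index that the regressors predict can also be computed directly from a fixed count of NNIs, mirroring the paper's count-based segmentation. A sketch under the standard definition SI = AMo / (2 · Mo · MxDMn); the 50 ms histogram bin width is the conventional choice, assumed here:

```python
import numpy as np

def baevsky_si(nni_s, bin_width=0.05):
    """Baevsky stress index from a fixed *count* of normal-to-normal
    intervals (in seconds), e.g. 13/26/39 NNIs rather than a fixed window.
    AMo is the percentage of NNIs in the modal histogram bin, Mo the modal
    interval (s), MxDMn the variation range (s)."""
    nni = np.asarray(nni_s, dtype=float)
    edges = np.arange(nni.min(), nni.max() + bin_width, bin_width)
    if edges.size < 2:                       # degenerate case: all NNIs equal
        edges = np.array([nni.min() - bin_width / 2, nni.min() + bin_width / 2])
    hist, edges = np.histogram(nni, bins=edges)
    k = int(np.argmax(hist))
    mo = 0.5 * (edges[k] + edges[k + 1])     # mode of the NNI distribution (s)
    amo = 100.0 * hist[k] / nni.size         # share of NNIs in the modal bin (%)
    mxdmn = nni.max() - nni.min()            # variation range (s)
    return amo / (2.0 * mo * mxdmn + 1e-12)
```

Lower variability at the same mean interval yields a markedly higher index, which is the behavior the stress index is designed to capture.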
Affiliation(s)
- Seungkeon Lee
- Department of AI & Informatics, Graduate School, Sangmyung University, Hongjimun 2-Gil 20, Jongno-Gu, Seoul, 03016, Republic of Korea
- Young Do Song
- Department of AI & Informatics, Graduate School, Sangmyung University, Hongjimun 2-Gil 20, Jongno-Gu, Seoul, 03016, Republic of Korea
- Eui Chul Lee
- Department of Human-Centered Artificial Intelligence, Sangmyung University, Hongjimun 2-Gil 20, Jongno-Gu, Seoul, 03016, Republic of Korea.
11. Zheng X, Yan W, Liu B, Wu YI, Tu H. Estimation of heart rate and respiratory rate by fore-background spatiotemporal modeling of videos. Biomed Opt Express 2025; 16:760-777. PMID: 39958852. PMCID: PMC11828443. DOI: 10.1364/boe.546968.
Abstract
Heart rate (HR) and respiratory rate (RR) are two critical physiological parameters that can be estimated from video recordings. However, the accuracy of remote estimation of HR and RR is affected by fluctuations in ambient illumination. To address this adverse effect, we propose a fore-background spatiotemporal (FBST) method for estimating HR and RR from videos captured by consumer-grade cameras. Initially, we identify the foreground regions of interest (ROIs) on the face and chest, as well as the background ROIs in non-body areas of the videos. Subsequently, we construct the foreground and background spatiotemporal maps based on the dichromatic reflectance model. We then introduce a lightweight network equipped with adaptive spatiotemporal layers to process the spatiotemporal maps and automatically generate a feature map of the non-illumination perturbation pulses. This feature map serves as input to a ResNet-18 network to estimate the physiological rhythm. Finally, we extract pulse signals and estimate HR and RR concurrently. Experiments conducted on three public and one private dataset demonstrate the superiority of the proposed FBST method in terms of accuracy and computational efficiency. These findings provide novel insights into non-intrusive human physiological measurements using common devices.
Affiliation(s)
- Xiujuan Zheng
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Wenqin Yan
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Boxiang Liu
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Key Laboratory of Information and Automation Technology of Sichuan Province, Sichuan University, Chengdu 610065, China
- Yue Ivan Wu
- College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
- Haiyan Tu
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Key Laboratory of Information and Automation Technology of Sichuan Province, Sichuan University, Chengdu 610065, China
12. Cramer I, van Esch R, Verstappen C, Kloeze C, van Bussel B, Stuijk S, Bergmans J, van 't Veer M, Zinger S, Montenij L, Bouwman RA, Dekker L. Accuracy of remote, video-based supraventricular tachycardia detection in patients undergoing elective electrical cardioversion: a prospective cohort. J Clin Monit Comput 2025. PMID: 39881085. DOI: 10.1007/s10877-025-01263-5.
Abstract
Unobtrusive pulse rate monitoring by continuous video recording, based on remote photoplethysmography (rPPG), might enable early detection of perioperative arrhythmias in general ward patients. However, the accuracy of an rPPG-based machine learning model for monitoring the pulse rate during sinus rhythm and arrhythmias is unknown. We conducted a prospective, observational diagnostic study in a cohort with a high prevalence of arrhythmias (patients undergoing elective electrical cardioversion). Pulse rate was assessed with rPPG via a visible-light camera, with ECG as reference, before and after cardioversion. A cardiologist categorized ECGs into normal sinus rhythm or arrhythmias requiring further investigation. A supervised machine learning model (support vector machine with Gaussian kernel) was trained using rPPG signal features from 60-s intervals and validated via leave-one-subject-out cross-validation. Pulse rate measurement performance was evaluated with Bland-Altman analysis. Of 72 patients screened, 51 were included in the analyses, contributing 444 60-s intervals with normal sinus rhythm and 1130 60-s intervals of clinically relevant arrhythmias. The model showed robust discrimination (AUC 0.95 [0.93-0.96]) and good calibration. For pulse rate measurement, the bias and limits of agreement were 1.21 [-8.60 to 11.02] for sinus rhythm and -7.45 [-35.75 to 20.86] for arrhythmia. The machine learning model accurately identified sinus rhythm and arrhythmias using rPPG in real-world conditions. Heart rate underestimation during arrhythmias highlights the need for further optimization.
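The Bland-Altman statistics reported for the pulse-rate agreement can be reproduced in a few lines. A generic sketch, not the authors' code:

```python
import numpy as np

def bland_altman(reference, estimate):
    """Bland-Altman agreement statistics: bias (mean difference) and the
    95% limits of agreement (bias +/- 1.96 * SD of the differences), as
    used to compare rPPG pulse rate against an ECG reference."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)                 # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```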
Affiliation(s)
- Iris Cramer
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands.
- Department of Anesthesiology, Intensive Care and Pain Medicine, Catharina Hospital, Eindhoven, the Netherlands.
- Rik van Esch
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands
- Department of Anesthesiology, Intensive Care and Pain Medicine, Catharina Hospital, Eindhoven, the Netherlands
- Cindy Verstappen
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands
- Department of Cardiology, Catharina Hospital Eindhoven, Eindhoven, the Netherlands
- Carla Kloeze
- Department of Medical Physics, Catharina Hospital, Eindhoven, the Netherlands
- Bas van Bussel
- Department of Intensive Care Medicine, Maastricht University Medical Centre+, Maastricht, the Netherlands
- Care and Public Health Research Institute CAPHRI, Maastricht University, Maastricht, the Netherlands
- Cardiovascular Research Institute Maastricht CARIM, Maastricht University, Maastricht, the Netherlands
- Sander Stuijk
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands
- Jan Bergmans
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands
- Marcel van 't Veer
- Department of Cardiology, Catharina Hospital Eindhoven, Eindhoven, the Netherlands
- Svitlana Zinger
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands
- Leon Montenij
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands
- Department of Anesthesiology, Intensive Care and Pain Medicine, Catharina Hospital, Eindhoven, the Netherlands
- R Arthur Bouwman
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands
- Department of Anesthesiology, Intensive Care and Pain Medicine, Catharina Hospital, Eindhoven, the Netherlands
- Lukas Dekker
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AZ, Eindhoven, the Netherlands
- Department of Cardiology, Catharina Hospital Eindhoven, Eindhoven, the Netherlands
13. Chen CC, Lin SX, Jeong H. Low-Complexity Timing Correction Methods for Heart Rate Estimation Using Remote Photoplethysmography. Sensors (Basel) 2025; 25:588. PMID: 39860958. PMCID: PMC11768942. DOI: 10.3390/s25020588.
Abstract
With the rise of modern healthcare monitoring, heart rate (HR) estimation using remote photoplethysmography (rPPG) has gained attention for its non-contact, continuous tracking capabilities. However, most HR estimation methods rely on stable, fixed sampling intervals, while practical image capture often involves irregular frame rates and missing data, leading to inaccuracies in HR measurements. This study addresses these issues by introducing low-complexity timing correction methods, including linear, cubic, and filter interpolation, to improve HR estimation from rPPG signals under conditions of irregular sampling and data loss. Through a comparative analysis, this study offers insights into efficient timing correction techniques for enhancing HR estimation from rPPG, particularly suitable for edge-computing applications where low computational complexity is essential. Cubic interpolation can provide robust performance in reconstructing signals but requires higher computational resources, while linear and filter interpolation offer more efficient solutions. The proposed low-complexity timing correction methods improve the reliability of rPPG-based HR estimation, making it a more robust solution for real-world healthcare applications.
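The cheapest of the compared corrections, linear interpolation onto a uniform time base, can be sketched with `numpy.interp`. The sampling rate and signal below are illustrative, not the paper's settings; cubic and filter interpolation trade extra computation for smoother reconstruction:

```python
import numpy as np

def resample_uniform(t, x, fs=30.0):
    """Linear-interpolation timing correction: map an irregularly sampled
    rPPG trace (timestamps `t` in seconds, samples `x`) onto a uniform
    grid at `fs` Hz, so that standard FFT-based HR estimation applies."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    t_uni = np.arange(t[0], t[-1], 1.0 / fs)     # uniform time base
    return t_uni, np.interp(t_uni, t, x)         # piecewise-linear resample
```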
Affiliation(s)
- Chun-Chi Chen
- Electrical Engineering Department, National Chiayi University, Chiayi 600355, Taiwan
- Song-Xian Lin
- Electrical Engineering Department, National Chiayi University, Chiayi 600355, Taiwan
- Hyundoo Jeong
- Department of Biomedical and Robotics Engineering, Incheon National University, Incheon 22012, Republic of Korea
14
Duan C, Liang X, Dai F. Optimization of Video Heart Rate Detection Based on Improved SSA Algorithm. SENSORS (BASEL, SWITZERLAND) 2025; 25:501. [PMID: 39860871 PMCID: PMC11769212 DOI: 10.3390/s25020501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/11/2024] [Revised: 01/15/2025] [Accepted: 01/15/2025] [Indexed: 01/27/2025]
Abstract
A solution to address the issue of environmental light interference in Remote Photoplethysmography (rPPG) methods is proposed in this paper. First, signals from the face's region of interest (ROI) and background noise signals are simultaneously collected, and the two signals are differenced to obtain a more accurate rPPG signal. This method effectively suppresses background noise and enhances signal quality. Second, the singular spectrum analysis algorithm (SSA) is enhanced to further improve the accuracy of heart rate detection. The algorithm's parameters are adaptively optimized by integrating the spectral and periodic characteristics of the heart rate signal. Experimental results demonstrate that the method proposed in this paper effectively mitigates the effects of lighting changes on heart rate detection, thereby enhancing detection accuracy. Overall, the experiments indicate that the proposed method significantly improves the effectiveness and accuracy of heart rate detection, achieving a high level of consistency with existing contact-based detection methods.
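The background-differential step described above can be sketched as a mean-removed subtraction of the background trace from the ROI trace; this is a simplification of the paper's processing, and the function name is mine:

```python
def background_differential(roi, bg):
    """Suppress shared illumination noise by subtracting the zero-meaned
    background trace from the zero-meaned ROI trace."""
    roi_m = sum(roi) / len(roi)
    bg_m = sum(bg) / len(bg)
    return [(r - roi_m) - (b - bg_m) for r, b in zip(roi, bg)]
```

Any ambient-light drift that appears in both traces cancels in the difference, leaving the pulse component carried only by the ROI.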
Affiliation(s)
- Chengcheng Duan
- School of Defence Science and Technology, Xi’an Technological University, Xi’an 710021, China;
- Xiangyang Liang
- School of Defence Science and Technology, Xi’an Technological University, Xi’an 710021, China;
- School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China
- Fei Dai
- School of Sciences, Xi’an Technological University, Xi’an 710021, China;
15
Zhu Q, Wong CW, Lazri ZM, Chen M, Fu CH, Wu M. A Comparative Study of Principled rPPG-Based Pulse Rate Tracking Algorithms for Fitness Activities. IEEE Trans Biomed Eng 2025; 72:152-165. [PMID: 39137071 DOI: 10.1109/tbme.2024.3442785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/15/2024]
Abstract
Performance improvements obtained by recent principled approaches for pulse rate (PR) estimation from face videos have typically been achieved by adding or modifying certain modules within a reconfigurable system. Yet, evaluations of such remote photoplethysmography (rPPG) are usually performed only at the system level. To better understand each module's contribution and facilitate future research in explainable learning and artificial intelligence for physiological monitoring, this paper conducts a comparative study of video-based, principled PR tracking algorithms, with a focus on challenging fitness scenarios. A review of the progress achieved over the last decade and a half in this field is utilized to construct the major processing modules of a reconfigurable remote pulse rate sensing system. Experiments are conducted on two challenging datasets-an internal collection of 25 videos of two Asian males exercising on stationary-bike, elliptical, and treadmill machines and 34 videos from a public ECG fitness database of 14 men and 3 women exercising on elliptical and stationary-bike machines. The signal-to-noise ratio (SNR), Pearson's correlation coefficient, error count ratio, error rate, and root mean squared error are used for performance evaluation. The top-performing configuration produces respective values of 0.8 dB, 0.86, 9%, 1.7%, and 3.3 beats per minute (bpm) for the internal dataset and 1.3 dB, 0.77, 28.6%, 6.0%, and 8.1 bpm for the ECG Fitness dataset, achieving significant improvements over alternative configurations. Our results suggest a synergistic effect between pulse color mapping and adaptive motion filtering, as well as the importance of a robust frequency tracking algorithm for PR estimation in low SNR settings.
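Two of the evaluation metrics used in this study (and throughout this listing), Pearson's correlation coefficient and root mean squared error, follow their standard definitions; a small self-contained sketch with my own helper names:

```python
import math

def pearson_r(a, b):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rmse(a, b):
    """Root mean squared error, e.g. between reference and estimated PR in bpm."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```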
16
Huang L, Ye F, Shu H, Huang Y, Wang S, Wu Q, Lu H, Wang W. Exploiting Dual-Wavelength Depolarization of Skin-Tissues for Camera-Based Perfusion Monitoring. IEEE Trans Biomed Eng 2025; 72:358-369. [PMID: 39226200 DOI: 10.1109/tbme.2024.3453402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/05/2024]
Abstract
Perfusion index (PI), the ratio between variable pulsatile (AC) and non-pulsatile (DC) components in a photoplethysmographic (PPG) signal, is an indirect and non-invasive measure of peripheral perfusion. PI has been widely used in assessing sympathetic block success, and monitoring hemodynamics in anesthesia and intensive care. Based on the principle of dual-wavelength depolarization (DWD) of skin tissues, we propose to investigate its opportunity in quantifying the skin perfusion contactlessly. The proposed method exploits the characteristic changes in chromaticity caused by skin depolarization and chromophore absorption. The experimental results of DWD, obtained with the post occlusive reactive hyperemia test and the local cooling and heating test, were compared to the PI values obtained from the patient monitor and photoplethysmography imaging (PPGI). The comparison demonstrated the feasibility of using DWD for PI measurement. Clinical trials conducted in the anesthesia recovery room and operating theatre further showed that DWD is potentially a new metric for camera-based non-contact skin perfusion monitoring during clinical operations, such as the guidance in anesthetic surgery.
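The perfusion index defined in this abstract is the ratio of the pulsatile (AC) component to the non-pulsatile (DC) level of a PPG waveform, usually expressed in percent. A minimal sketch of that ratio, using peak-to-peak amplitude as a simple AC estimate (my own choice, not necessarily the monitor's):

```python
def perfusion_index(ppg):
    """PI = pulsatile (AC) amplitude over non-pulsatile (DC) level, in percent."""
    dc = sum(ppg) / len(ppg)          # mean level as the DC component
    ac = max(ppg) - min(ppg)          # peak-to-peak amplitude as the AC component
    return 100.0 * ac / dc
```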
17
Khaleel Sallam Ma'aitah M, Helwan A. 3D DenseNet with temporal transition layer for heart rate estimation from real-life RGB videos. Technol Health Care 2025; 33:419-430. [PMID: 39058471 DOI: 10.3233/thc-241104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/28/2024]
Abstract
BACKGROUND Deep learning has demonstrated superior performance over traditional methods for the estimation of heart rates in controlled contexts. However, in less controlled scenarios this performance seems to vary based on the training dataset and the architecture of the deep learning models. OBJECTIVES In this paper, we develop a deep learning-based model leveraging the power of 3D convolutional neural networks (3DCNN) to extract temporal and spatial features that lead to accurate heart rate estimation from RGB videos without a pre-defined region of interest (ROI). METHODS We propose a 3D DenseNet with a 3D temporal transition layer for the estimation of heart rates from a large-scale dataset of videos that appear more hospital-like and real-life than other existing facial video-based datasets. RESULTS Experimentally, our model was trained and tested on this less controlled dataset and showed heart rate estimation performance with root mean square error (RMSE) of 8.68 BPM and mean absolute error (MAE) of 3.34 BPM. CONCLUSION Moreover, we show that such a model can also achieve better results than the state-of-the-art models when tested on the VIPL-HR public dataset.
18
Tseng CW, Wu BF, Sun Y. A Real-Time Contact-Free Atrial Fibrillation Detection System for Mobile Devices. IEEE J Biomed Health Inform 2025; 29:17-29. [PMID: 38954564 DOI: 10.1109/jbhi.2024.3422155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/04/2024]
Abstract
As the global population ages, the mortality and prevalence of atrial fibrillation (AF) continue to rise, posing significant concerns due to its strong association with stroke-related disabilities. Detecting AF early before a stroke occurs has become paramount. However, existing methods face challenges in achieving quick, easy, and affordable detection in complex environments characterized by motion interference and varying light conditions. To address these challenges, we propose a system that is deployable on edge computing devices like smartphones, tablets, or laptops. Meanwhile, to ensure that the dataset reflects real-world scenarios, we collect 7,216 30-second segments from 452 subjects, categorized into Atrial Fibrillation (AF), Normal Sinus Rhythm (NSR), and Other Arrhythmias (Others), with a subject ratio of 105:116:231. Our lightweight non-contact facial rPPG atrial fibrillation detection system utilizes a Convolution Neural Network (CNN) with a large receptive field and a bidirectional spatial mapping augmented attention module (BiSME-ATT) coupled with a bidirectional feature pyramid network layer (BiFPN), optimized for deployment on mobile devices by reducing model parameters and floating-point operations per second (FLOPs). Our approach significantly improves AF detection accuracy, sensitivity, specificity, positive predictive value, and negative predictive value to 94.39%, 91.57%, 95.44%, 88.06%, and 96.93%, respectively, in AF vs. Non-AF scenarios. Furthermore, the results demonstrate notable enhancements in AF detection across various motion and light intensity levels.
19
Huo C, Yin P, Fu B. MultiPhys: Heterogeneous Fusion of Mamba and Transformer for Video-Based Multi-Task Physiological Measurement. SENSORS (BASEL, SWITZERLAND) 2024; 25:100. [PMID: 39796891 PMCID: PMC11722562 DOI: 10.3390/s25010100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/25/2024] [Revised: 11/29/2024] [Accepted: 12/23/2024] [Indexed: 01/13/2025]
Abstract
Due to its non-contact characteristics, remote photoplethysmography (rPPG) has attracted widespread attention in recent years, and has been widely applied for remote physiological measurements. However, most of the existing rPPG models are unable to estimate multiple physiological signals simultaneously, and the performance of the limited available multi-task models is also restricted due to their single-model architectures. To address the above problems, this study proposes MultiPhys, adopting a heterogeneous network fusion approach for its development. Specifically, a Convolutional Neural Network (CNN) is used to quickly extract local features in the early stage, a transformer captures global context and long-distance dependencies, and Mamba is used to compensate for the transformer's deficiencies, reducing the computational complexity and improving the accuracy of the model. Additionally, a gate is utilized for feature selection, which classifies the features of different physiological indicators. Finally, physiological indicators are estimated after passing features to each task-related head. Experiments on three datasets show that MultiPhys has superior performance in handling multiple tasks. The results of cross-dataset and hyper-parameter sensitivity tests also verify its generalization ability and robustness, respectively. MultiPhys can be considered as an effective solution for remote physiological estimation, thus promoting the development of this field.
Affiliation(s)
- Bo Fu
- School of Mechanical Engineering, Sichuan University, Chengdu 610065, China; (C.H.); (P.Y.)
20
Yan W, Zhuang J, Chen Y, Zhang Y, Zheng X. MFF-Net: A Lightweight Multi-Frequency Network for Measuring Heart Rhythm from Facial Videos. SENSORS (BASEL, SWITZERLAND) 2024; 24:7937. [PMID: 39771677 PMCID: PMC11679567 DOI: 10.3390/s24247937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/17/2024] [Revised: 12/07/2024] [Accepted: 12/10/2024] [Indexed: 01/11/2025]
Abstract
Remote photoplethysmography (rPPG) is a useful camera-based health monitoring method that can measure the heart rhythm from facial videos. Many well-established deep learning models can provide highly accurate and robust results in measuring heart rate (HR) and heart rate variability (HRV). However, these methods are unable to effectively eliminate illumination variation and motion artifact disturbances, and their substantial computational resource requirements significantly limit their applicability in real-world scenarios. Hence, we propose a lightweight multi-frequency network named MFF-Net to measure heart rhythm via facial videos in a short time. Firstly, we propose a multi-frequency mode signal fusion (MFF) mechanism, which can separate the characteristics of different modes of the original rPPG signals and send them to a processor with independent parameters, helping the network recover blood volume pulse (BVP) signals accurately under a complex noise environment. In addition, in order to help the network extract the characteristics of different modal signals effectively, we designed a temporal multiscale convolution module (TMSC-module) and spectrum self-attention module (SSA-module). The TMSC-module can expand the receptive field of the signal-refining network, obtain more abundant multiscale information, and transmit it to the signal reconstruction network. The SSA-module can help a signal reconstruction network locate the obvious inferior parts in the reconstruction process so as to make better decisions when merging multi-dimensional signals. Finally, in order to solve the over-fitting phenomenon that easily occurs in the network, we propose an over-fitting sampling training scheme to further improve the fitting ability of the network. Comprehensive experiments were conducted on three benchmark datasets, and we estimated HR and HRV based on the BVP signals derived by MFF-Net. Compared with state-of-the-art methods, our approach achieves better performance both on HR and HRV estimation with lower computational burden. We can conclude that the proposed MFF-Net has the opportunity to be applied in many real-world scenarios.
Affiliation(s)
- Wenqin Yan
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China; (W.Y.); (J.Z.); (Y.C.)
- Key Laboratory of Information and Automation Technology of Sichuan Province, Chengdu 610065, China
- Jialiang Zhuang
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China; (W.Y.); (J.Z.); (Y.C.)
- Yuheng Chen
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China; (W.Y.); (J.Z.); (Y.C.)
- Key Laboratory of Information and Automation Technology of Sichuan Province, Chengdu 610065, China
- Yun Zhang
- School of Information Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China;
- Xiujuan Zheng
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China; (W.Y.); (J.Z.); (Y.C.)
- Key Laboratory of Information and Automation Technology of Sichuan Province, Chengdu 610065, China
21
Hu R, Gao Y, Peng G, Yang H, Zhang J. A novel approach for contactless heart rate monitoring from pet facial videos. Front Vet Sci 2024; 11:1495109. [PMID: 39687850 PMCID: PMC11647959 DOI: 10.3389/fvets.2024.1495109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Accepted: 11/14/2024] [Indexed: 12/18/2024] Open
Abstract
Introduction Monitoring the heart rate (HR) of pets is challenging when contact with a conscious pet is inconvenient, difficult, injurious, distressing, or dangerous for veterinarians or pet owners. However, few established, simple, and non-invasive techniques for HR measurement in pets exist. Methods To address this gap, we propose a novel, contactless approach for HR monitoring in pet dogs and cats, utilizing facial videos and imaging photoplethysmography (iPPG). This method involves recording a video of the pet's face and extracting the iPPG signal from the video data, offering a simple, non-invasive, and stress-free alternative to conventional HR monitoring techniques. We validated the accuracy of the proposed method by comparing it to electrocardiogram (ECG) recordings in a controlled laboratory setting. Results Experimental results indicated that the average absolute errors between the reference ECG monitor and iPPG estimates were 2.94 beats per minute (BPM) for dogs and 3.33 BPM for cats under natural light, and 2.94 BPM for dogs and 2.33 BPM for cats under artificial light. These findings confirm the reliability and accuracy of our iPPG-based method for HR measurement in pets. Discussion This approach can be applied to resting animals for real-time monitoring of their health and welfare status, which is of significant interest to both veterinarians and families seeking to improve care for their pets.
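The iPPG pipeline summarized above ultimately reads heart rate off the dominant spectral peak of the signal extracted from the video. A generic illustration of that final step via a brute-force single-frequency DFT scan over the physiological band (this is a common approach, not necessarily the authors' implementation; names and band limits are mine):

```python
import math

def estimate_hr_bpm(signal, fs, lo=40, hi=240):
    """Estimate heart rate as the dominant spectral peak between lo and hi BPM."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC component
    best_bpm, best_power = lo, -1.0
    for bpm in range(lo, hi + 1):
        f = bpm / 60.0  # candidate frequency in Hz
        re = sum(x[i] * math.cos(2 * math.pi * f * i / fs) for i in range(n))
        im = sum(x[i] * math.sin(2 * math.pi * f * i / fs) for i in range(n))
        p = re * re + im * im
        if p > best_power:
            best_bpm, best_power = bpm, p
    return best_bpm
```

For dogs and cats the band limits would be widened accordingly, since resting feline heart rates routinely exceed the human range.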
Affiliation(s)
- Renjie Hu
- College of Big Data, Yunnan Agricultural University, Kunming, China
- Yu Gao
- College of Big Data, Yunnan Agricultural University, Kunming, China
- Guoying Peng
- College of Big Data, Yunnan Agricultural University, Kunming, China
- Hongyu Yang
- College of Mechanical and Electrical Engineering, Yunnan Agricultural University, Kunming, China
- Jiajin Zhang
- College of Big Data, Yunnan Agricultural University, Kunming, China
22
Wang J, Wei X, Lu H, Chen Y, He D. ConDiff-rPPG: Robust Remote Physiological Measurement to Heterogeneous Occlusions. IEEE J Biomed Health Inform 2024; 28:7090-7102. [PMID: 39052463 DOI: 10.1109/jbhi.2024.3433461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/27/2024]
Abstract
Remote photoplethysmography (rPPG) is a contactless technique that facilitates the measurement of physiological signals and cardiac activities through facial video recordings. This approach holds tremendous potential for various applications. However, existing rPPG methods often did not account for different types of occlusions that commonly occur in real-world scenarios, such as temporary movements or actions of humans in videos or dust on the camera. The failure to address these occlusions can compromise the accuracy of rPPG algorithms. To address this issue, we proposed a novel ConDiff-rPPG to improve the robustness of rPPG measurement facing various occlusions. First, we compressed the damaged face video into a spatio-temporal representation with several types of masks. Second, the diffusion model was designed to recover the missing information with observed values as a condition. Moreover, a novel low-rank decomposition regularization was proposed to eliminate background noise and maximize informative features. ConDiff-rPPG ensured consistency in optimization goals during the training process. Through extensive experiments, including intra- and cross-dataset evaluations, as well as ablation tests, we demonstrated the robustness and generalization ability of our proposed model.
23
Tong Y, Huang Z, Qiu F, Wang T, Wang Y, Qin F, Yin M. An Accurate Non-Contact Photoplethysmography via Active Cancellation of Reflective Interference. IEEE J Biomed Health Inform 2024; 28:7116-7125. [PMID: 39146172 DOI: 10.1109/jbhi.2024.3443988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/17/2024]
Abstract
Imaging Photoplethysmography (IPPG) is an emerging and efficient optical method for non-contact measurement of pulse waves using an image sensor. While the contactless way brings convenience, the inevitable distance between the sensor and the subject results in massive specular reflection interference on the skin surface, which leads to a low Signal to Interference plus Noise Ratio (SINR) of IPPG. To ease this challenge, this work proposes a novel modulation illumination approach to measure the accurate arterial pulse wave via surface reflection interference isolation from IPPG. Based on the proposed skin reflection model, a specific modulation illumination is designed to separate the surface reflections and obtain the subcutaneous diffuse reflections containing the pulse wave information. Compared with the results under ambient illumination and constant supplemental illumination, the SINR of the proposed method is improved by 4.56 and 3.74 dB, respectively.
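The 4.56 dB and 3.74 dB gains quoted above are dB ratios of pulse-signal power to interference-plus-noise power. A small sketch of how such a figure is computed, given a clean reference component and the residual left after subtracting it (function name is mine):

```python
import math

def sinr_db(signal, residual):
    """Signal to Interference plus Noise Ratio in dB, from a pulse-signal
    component and the residual interference-plus-noise component."""
    ps = sum(s * s for s in signal) / len(signal)      # mean signal power
    pn = sum(r * r for r in residual) / len(residual)  # mean residual power
    return 10.0 * math.log10(ps / pn)
```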
24
Buyung RA, Bustamam A, Ramazhan MRS. Integrating Remote Photoplethysmography and Machine Learning on Multimodal Dataset for Noninvasive Heart Rate Monitoring. SENSORS (BASEL, SWITZERLAND) 2024; 24:7537. [PMID: 39686079 DOI: 10.3390/s24237537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/11/2024] [Revised: 11/07/2024] [Accepted: 11/08/2024] [Indexed: 12/18/2024]
Abstract
Non-contact heart monitoring is crucial in advancing telemedicine, fitness tracking, and mass screening. Remote photoplethysmography (rPPG) is a non-contact technique to obtain information about heart pulse by analyzing the changes in the light intensity reflected or absorbed by the skin during the blood circulation cycle. However, this technique is sensitive to environmental lighting and differences in skin pigmentation, which can produce unreliable results. This research presents a multimodal approach to non-contact heart rate estimation by combining facial video and physical attributes, including age, gender, weight, height, and body mass index (BMI). For this purpose, we collected local datasets from 60 individuals containing a 1 min facial video and physical attributes such as age, gender, weight, and height, and we derived the BMI variable from the weight and height. We compare the performance of two machine learning models, support vector regression (SVR) and random forest regression, on the multimodal dataset. The experimental results demonstrate that incorporating a multimodal approach enhances model performance, with the random forest model achieving superior results, yielding a mean absolute error (MAE) of 3.057 bpm, a root mean squared error (RMSE) of 10.532 bpm, and a mean absolute percentage error (MAPE) of 4.2% that outperforms the state-of-the-art rPPG methods. These findings highlight the potential for interpretable, non-contact, real-time heart rate measurement systems to contribute effectively to applications in telemedicine and mass screening.
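The MAE and MAPE figures reported here (3.057 bpm and 4.2%) follow the standard definitions; for reference, a self-contained sketch:

```python
def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted heart rates."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent of the reference value."""
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)
```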
Affiliation(s)
- Rinaldi Anwar Buyung
- Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Indonesia, Depok 16424, Indonesia
- Alhadi Bustamam
- Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Indonesia, Depok 16424, Indonesia
- Data Science Center (DSC), Faculty of Mathematics and Natural Science, Universitas Indonesia, Depok 16424, Indonesia
- Muhammad Remzy Syah Ramazhan
- Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Indonesia, Depok 16424, Indonesia
- Data Science Center (DSC), Faculty of Mathematics and Natural Science, Universitas Indonesia, Depok 16424, Indonesia
25
Wang Z, Liao C, Pan L, Lu H, Shan C, Wang W. Living-Skin Detection Based on Spatio-Temporal Analysis of Structured Light Pattern. IEEE J Biomed Health Inform 2024; 28:6738-6750. [PMID: 39163185 DOI: 10.1109/jbhi.2024.3446193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/22/2024]
Abstract
Living-skin detection is an important step for imaging photoplethysmography and biometric anti-spoofing. In this paper, we propose a new approach that exploits spatio-temporal characteristics of structured light patterns projected on the skin surface for living-skin detection. We observed that due to the interactions between laser photons and tissues inside a multi-layer skin structure, the frequency-domain sharpness feature of laser spots on skin and non-skin surfaces exhibits clear difference. Additionally, the subtle physiological motion of living-skin causes laser interference, leading to brightness fluctuations of laser spots projected on the skin surface. Based on these two observations, we designed a new living-skin detection algorithm to distinguish skin from non-skin using spatio-temporal features of structured laser spots. Experiments in the dark chamber and Neonatal Intensive Care Unit (NICU) demonstrated that the proposed setup and method performed well, achieving a precision of 85.32%, recall of 83.87%, and F1-score of 83.03% averaged over these two scenes. Compared to the approach that only leverages the property of multilayer skin structure, the hybrid approach obtains an averaged improvement of 8.18% in precision, 3.93% in recall, and 8.64% in F1-score. These results validate the efficacy of using frequency domain sharpness and brightness fluctuations to augment the features of living-skin tissues irradiated by structured light, providing a solid basis for structured light based physiological imaging.
26
Zhang L, Ren J, Zhao S, Wu P. MDAR: A Multiscale Features-Based Network for Remotely Measuring Human Heart Rate Utilizing Dual-Branch Architecture and Alternating Frame Shifts in Facial Videos. SENSORS (BASEL, SWITZERLAND) 2024; 24:6791. [PMID: 39517688 PMCID: PMC11548444 DOI: 10.3390/s24216791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/13/2024] [Revised: 10/17/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
Remote photoplethysmography (rPPG) refers to a non-contact technique that measures heart rate through analyzing the subtle signal changes of facial blood flow captured by video sensors. It is widely used in contactless medical monitoring, remote health management, and activity monitoring, providing a more convenient and non-invasive way to monitor heart health. However, factors such as ambient light variations, facial movements, and differences in light absorption and reflection pose challenges to deep learning-based methods. To solve these difficulties, we put forward a measurement network of heart rate based on multiscale features. In this study, we designed and implemented a dual-branch signal processing framework that combines static and dynamic features, proposing a novel and efficient method for feature fusion, enhancing the robustness and reliability of the signal. Furthermore, we proposed an alternate time-shift module to enhance the model's temporal depth. To integrate the features extracted at different scales, we utilized a multiscale feature fusion method, enabling the model to accurately capture subtle changes in blood flow. We conducted cross-validation on three public datasets: UBFC-rPPG, PURE, and MMPD. The results demonstrate that MDAR not only ensures fast inference speed but also significantly improves performance. The two main indicators, MAE and MAPE, achieved improvements of at least 30.6% and 30.2%, respectively, surpassing state-of-the-art methods. These conclusions highlight the potential advantages of MDAR for practical applications.
Affiliation(s)
- Linhua Zhang
- Department of Computer Engineering, Taiyuan Institute of Technology, Taiyuan 030008, China;
- School of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China;
- Jinchang Ren
- School of Computing, Engineering and Technology, Robert Gordon University, Aberdeen AB10 7QB, UK;
- Shuang Zhao
- School of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China;
- Peng Wu
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
27
Zou B, Zhao Y, Hu X, He C, Yang T. Remote physiological signal recovery with efficient spatio-temporal modeling. Front Physiol 2024; 15:1428351. [PMID: 39469440 PMCID: PMC11513465 DOI: 10.3389/fphys.2024.1428351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2024] [Accepted: 09/30/2024] [Indexed: 10/30/2024] Open
Abstract
Contactless physiological signal measurement has great applications in various fields, such as affective computing and health monitoring. Physiological measurements based on remote photoplethysmography (rPPG) are realized by capturing the weak periodic color changes. The changes are caused by the variation in the light absorption of skin surface during systole and diastole stages of a functioning heart. This measurement mode has advantages of contactless measurement, simple operation, low cost, etc. In recent years, several deep learning-based rPPG measurement methods have been proposed. However, the features learned by deep learning models are vulnerable to motion and illumination artefacts, and are unable to fully exploit the intrinsic temporal characteristics of the rPPG. This paper presents an efficient spatiotemporal modeling-based rPPG recovery method for physiological signal measurements. First, two modules are utilized in the rPPG task: 1) 3D central difference convolution for temporal context modeling with enhanced representation and generalization capacity, and 2) Huber loss for robust intensity-level rPPG recovery. Second, a dual branch structure for both motion and appearance modeling and a soft attention mask are adapted to take full advantage of the central difference convolution. Third, a multi-task setting for joint cardiac and respiratory signals measurements is introduced to benefit from the internal relevance between two physiological signals. Last, extensive experiments performed on three public databases show that the proposed method outperforms prior state-of-the-art methods with the Pearson's correlation coefficient higher than 0.96 on all three datasets. The generalization ability of the proposed method is also evaluated by cross-database and video compression experiments. The effectiveness and necessity of each module are confirmed by ablation studies.
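The Huber loss this abstract adopts for robust rPPG recovery is quadratic for small residuals and linear beyond a transition point, which limits the influence of outlier samples. A minimal per-residual sketch (delta is the standard transition parameter):

```python
def huber(residual, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

Compared with a pure squared error, the linear tail keeps motion and illumination artefacts from dominating the training signal.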
Affiliation(s)
- Bochao Zou
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Shunde Graduate School of University of Science and Technology Beijing, Foshan, Guangdong, China
- Yu Zhao
- Key Laboratory of Complex System Control Theory and Application, Tianjin University of Technology, Tianjin, China
- Xiaocheng Hu
- China Academy of Electronics and Information Technology, Beijing, China
- Changyu He
- China Academy of Electronics and Information Technology, Beijing, China
- Tianwa Yang
- China University of Political Science and Law, Beijing, China
| |
Collapse
|
28
Wu YC, Lin CH, Chiu LW, Wu BF, Chung ML, Tang SC, Sun Y. Contact-Free Atrial Fibrillation Screening With Attention Network. IEEE J Biomed Health Inform 2024; 28:5124-5135. [PMID: 38412073 DOI: 10.1109/jbhi.2024.3368049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/29/2024]
Abstract
Atrial Fibrillation (AF) screening from face videos has become popular with the trend of telemedicine and telehealth in recent years. In this study, the largest facial image database for camera-based AF detection is proposed. There are 657 participants from two clinical sites, each recorded for about 10 minutes of video, which can be further processed into over 10,000 segments of around 30 seconds; this duration follows the guideline for AF diagnosis. It is also worth noting that 2,979 segments are segment-wise labeled, that is, every rhythm is independently labeled as AF or not. Besides, all labels are confirmed manually by a cardiologist. Various environments, talking, facial expressions, and head movements are involved in data collection, matching the situations encountered in practical usage. Specific to camera-based AF screening, a novel CNN-based architecture equipped with an attention mechanism is proposed. It is capable of fusing heartbeat consistency, heart rate variability derived from remote photoplethysmography, and motion features simultaneously into reliable outputs. With the proposed model, intra-database evaluation reaches 96.62% sensitivity, 90.61% specificity, and an AUC of 0.96. Furthermore, to check the adaptation capability of the proposed method thoroughly, cross-database evaluation is also conducted; performance reaches about 90% on average, with AUCs over 0.94 at both clinical sites.
29
Nguyen N, Nguyen L, Li H, Bordallo López M, Álvarez Casado C. Evaluation of video-based rPPG in challenging environments: Artifact mitigation and network resilience. Comput Biol Med 2024; 179:108873. [PMID: 39053334 DOI: 10.1016/j.compbiomed.2024.108873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2024] [Revised: 07/05/2024] [Accepted: 07/08/2024] [Indexed: 07/27/2024]
Abstract
Video-based remote photoplethysmography (rPPG) has emerged as a promising technology for non-contact vital sign monitoring, especially under controlled conditions. However, the accurate measurement of vital signs in real-world scenarios faces several challenges, including artifacts induced by video codecs, low-light noise, degradation, low dynamic range, occlusions, and hardware and network constraints. In this article, a systematic and comprehensive investigation of these issues is conducted, measuring their detrimental effects on the quality of rPPG measurements. Additionally, practical strategies are proposed for mitigating these challenges to improve the dependability and resilience of video-based rPPG systems. Methods for effective biosignal recovery in the presence of network limitations are detailed, along with denoising and inpainting techniques aimed at preserving video frame integrity. Compared to previous studies, this paper addresses a broader range of variables and demonstrates improved accuracy across various rPPG methods, emphasizing generalizability for practical applications in diverse scenarios with varying data quality. Extensive evaluations and direct comparisons demonstrate the effectiveness of these approaches in enhancing rPPG measurements under challenging environments, contributing to the development of more reliable and effective remote vital sign monitoring technologies.
Affiliation(s)
- Nhi Nguyen
- Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland.
- Le Nguyen
- Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland.
- Honghan Li
- Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland; Division of Bioengineering, Graduate School of Engineering Science, Osaka University, Osaka, Japan.
- Miguel Bordallo López
- Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland; VTT Technical Research Center of Finland Ltd., Oulu, Finland.
30
Xu J, Song C, Yue Z, Ding S. Facial Video-Based Non-Contact Stress Recognition Utilizing Multi-Task Learning With Peak Attention. IEEE J Biomed Health Inform 2024; 28:5335-5346. [PMID: 38861440 DOI: 10.1109/jbhi.2024.3412103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/13/2024]
Abstract
Negative emotional states, such as anxiety and depression, pose significant challenges in contemporary society, often stemming from the stress encountered in daily activities. Stress (state or level) recognition is a crucial prerequisite for effective stress management and intervention. Presently, wearable devices have been employed to capture physiological signals and analyze stress states. However, their constant skin contact can lead to discomfort and disturbance during prolonged monitoring. In this paper, a peak attention-based multitasking framework is presented for non-contact stress recognition. The framework extracts rPPG signals from RGB facial videos, utilizing them as inputs for a novel multi-task attentional convolutional neural network for stress recognition (MTASR). It incorporates peak detection and HR estimation as auxiliary tasks to facilitate stress recognition. By leveraging multi-task learning, MTASR can utilize information related to stress physiological responses, thereby enhancing feature extraction efficiency. For stress recognition, two binary classification tasks are applied: stress state recognition and stress level recognition. The model is validated on the UBFC-Phys public dataset and demonstrates an accuracy of 94.33% for stress state recognition and 83.83% for stress level recognition. The proposed method outperforms the dataset's baseline methods and other competing approaches.
31
Li K, Sun J. Understanding the physiological transmission mechanisms of photoplethysmography signals: a comprehensive review. Physiol Meas 2024; 45:08TR02. [PMID: 39106894 DOI: 10.1088/1361-6579/ad6be4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2024] [Accepted: 08/06/2024] [Indexed: 08/09/2024]
Abstract
Objective. The widespread adoption of Photoplethysmography (PPG) as a non-invasive method for detecting blood volume variations and deriving vital physiological parameters reflecting health status has surged, primarily due to its accessibility, cost-effectiveness, and non-intrusive nature. This has led to extensive research around this technique in both daily life and clinical applications. Interestingly, despite the existence of contradictory explanations of the underlying mechanism of PPG signals across various applications, a systematic investigation into this crucial matter has not been conducted thus far. This gap in understanding hinders the full exploitation of PPG technology and undermines its accuracy and reliability in numerous applications. Approach. Building upon a comprehensive review of the fundamental principles and technological advancements in PPG, this paper initially attributes the origin of PPG signals to a combination of physical and physiological transmission processes. Furthermore, three distinct models outlining the concerned physiological transmission processes are synthesized, with each model undergoing critical examination based on theoretical underpinnings, empirical evidence, and constraints. Significance. The ultimate objective is to form a fundamental framework for a better understanding of physiological transmission processes in PPG signal generation and to facilitate the development of more reliable technologies for detecting physiological signals.
Affiliation(s)
- Kai Li
- School of Medical Imaging, Jiading District Central Hospital Affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, People's Republic of China
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China
- Jiuai Sun
- School of Medical Imaging, Jiading District Central Hospital Affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, People's Republic of China
32
Zhu Y, Hong H, Wang W. Privacy-Protected Contactless Sleep Parameters Measurement Using a Defocused Camera. IEEE J Biomed Health Inform 2024; 28:4660-4673. [PMID: 38696292 DOI: 10.1109/jbhi.2024.3396397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/04/2024]
Abstract
Sleep monitoring plays a vital role in various scenarios such as hospitals and assisted-living homes, contributing to the prevention of sleep accidents as well as the assessment of sleep health. Contactless camera-based sleep monitoring is promising due to its user-friendly nature and rich visual semantics. However, the privacy concern of video cameras limits their applications in sleep monitoring. In this paper, we explored the opportunity of using a defocused camera that does not allow identification of the monitored subject when measuring sleep-related parameters, as face detection and recognition are impossible on optically blurred images. We proposed a novel privacy-protected sleep parameters measurement framework, including a physiological measurement branch and a semantic analysis branch based on ResNet-18. Four important sleep parameters are measured: heart rate (HR), respiration rate (RR), sleep posture, and movement. The results of HR, RR, and movement have strong correlations with the reference (HR: R = 0.9076; RR: R = 0.9734; Movement: R = 0.9946). The overall mean absolute errors (MAE) for HR and RR are 5.2 bpm and 1.5 bpm respectively. The measurement of HR and RR achieve reliable estimation coverage of 72.1% and 93.6%, respectively. The sleep posture detection achieves an overall accuracy of 94.5%. Experimental results show that the defocused camera is promising for sleep monitoring as it fundamentally eliminates the privacy issue while still allowing the measurement of multiple parameters that are essential for sleep health informatics.
33
Cao M, Cheng X, Liu X, Jiang Y, Yu H, Shi J. ST-Phys: Unsupervised Spatio-Temporal Contrastive Remote Physiological Measurement. IEEE J Biomed Health Inform 2024; 28:4613-4624. [PMID: 38743531 DOI: 10.1109/jbhi.2024.3400869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]
Abstract
Remote photoplethysmography (rPPG) is a non-contact method that employs facial videos for measuring physiological parameters. Existing rPPG methods have achieved remarkable performance. However, this success mainly profits from supervised learning over massive labeled data. On the other hand, existing unsupervised rPPG methods fail to fully utilize spatio-temporal features and encounter challenges in low-light or noisy environments. To address these problems, we propose an unsupervised contrastive learning approach, ST-Phys. We incorporate a low-light enhancement module, a temporal dilated module, and a spatial enhanced module to better deal with long-term dependencies under random low-light conditions. In addition, we design a circular margin loss, wherein rPPG signals originating from identical videos are attracted, while those from distinct videos are repelled. Our method is assessed on six openly accessible datasets, including RGB and NIR videos. Extensive experiments reveal the superior performance of our proposed ST-Phys over state-of-the-art unsupervised rPPG methods. Moreover, it offers advantages in parameter reduction and noise robustness.
34
Chen S, Wong KL, Chin JW, Chan TT, So RHY. DiffPhys: Enhancing Signal-to-Noise Ratio in Remote Photoplethysmography Signal Using a Diffusion Model Approach. Bioengineering (Basel) 2024; 11:743. [PMID: 39199701 PMCID: PMC11351469 DOI: 10.3390/bioengineering11080743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2024] [Revised: 06/30/2024] [Accepted: 07/08/2024] [Indexed: 09/01/2024] Open
Abstract
Remote photoplethysmography (rPPG) is an emerging non-contact method for monitoring cardiovascular health based on facial videos. The quality of the captured videos largely determines the efficacy of rPPG in this application. Traditional rPPG techniques, while effective for heart rate (HR) estimation, often produce signals with an inadequate signal-to-noise ratio (SNR) for reliable vital sign measurement due to artifacts such as head motion and measurement noise. Another pivotal limitation is that the inherent properties of the signals generated by rPPG (rPPG-signals) are often overlooked. To address these limitations, we introduce DiffPhys, a novel deep generative model designed specifically to enhance the SNR of rPPG-signals. DiffPhys leverages a conditional diffusion model to learn the distribution of rPPG-signals and uses a refined reverse process to generate rPPG-signals with a higher SNR. Experimental results demonstrate that DiffPhys elevates the SNR of rPPG-signals in both within-database and cross-database scenarios, facilitating the extraction of cardiovascular metrics such as HR and HRV with greater precision. This enhancement allows for more accurate monitoring of health conditions in non-clinical settings.
Affiliation(s)
- Shutao Chen
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China; (S.C.); (K.-L.W.); (J.-W.C.); (T.-T.C.)
- Kwan-Long Wong
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China; (S.C.); (K.-L.W.); (J.-W.C.); (T.-T.C.)
- Jing-Wei Chin
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China; (S.C.); (K.-L.W.); (J.-W.C.); (T.-T.C.)
- Tsz-Tai Chan
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China; (S.C.); (K.-L.W.); (J.-W.C.); (T.-T.C.)
- Richard H. Y. So
- Department of Industrial Engineering and Decision Analytics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
35
Chen W, Yi Z, Lim LJR, Lim RQR, Zhang A, Qian Z, Huang J, He J, Liu B. Deep learning and remote photoplethysmography powered advancements in contactless physiological measurement. Front Bioeng Biotechnol 2024; 12:1420100. [PMID: 39104628 PMCID: PMC11298756 DOI: 10.3389/fbioe.2024.1420100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2024] [Accepted: 06/27/2024] [Indexed: 08/07/2024] Open
Abstract
In recent decades, there has been ongoing development in the application of computer vision (CV) in the medical field. As conventional contact-based physiological measurement techniques often restrict a patient's mobility in the clinical environment, the ability to achieve continuous, comfortable and convenient monitoring is thus a topic of interest to researchers. One type of CV application is remote imaging photoplethysmography (rPPG), which can predict vital signs using a video or image. While contactless physiological measurement techniques have excellent application prospects, the lack of uniformity or standardization of contactless vital monitoring methods limits their application in remote healthcare/telehealth settings. Several methods have been developed to address this limitation and handle the heterogeneity of video signals caused by movement, lighting, and equipment. The fundamental algorithms include traditional algorithms with optimization and emerging deep learning (DL) algorithms. This article aims to provide an in-depth review of current Artificial Intelligence (AI) methods using CV and DL in contactless physiological measurement and a comprehensive summary of the latest development of contactless measurement techniques for skin perfusion, respiratory rate, blood oxygen saturation, heart rate, heart rate variability, and blood pressure.
Affiliation(s)
- Wei Chen
- Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Zhe Yi
- Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Lincoln Jian Rong Lim
- Department of Medical Imaging, Western Health, Footscray Hospital, Footscray, VIC, Australia
- Department of Surgery, The University of Melbourne, Melbourne, VIC, Australia
- Rebecca Qian Ru Lim
- Department of Hand & Reconstructive Microsurgery, Singapore General Hospital, Singapore, Singapore
- Aijie Zhang
- Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Zhen Qian
- Institute of Intelligent Diagnostics, Beijing United-Imaging Research Institute of Intelligent Imaging, Beijing, China
- Jiaxing Huang
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Jia He
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Bo Liu
- Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Beijing Research Institute of Traumatology and Orthopaedics, Beijing, China
36
Momeni M, Wuthe S, Molmer MB, Lobner Svendsen E, Brabrand M, Biesenbach P, Teichmann D. Facial Remote Photoplethysmography for Continuous Heart Rate Monitoring during Prolonged Cold Liquid Bolus Administration. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024; 2024:1-4. [PMID: 40039011 DOI: 10.1109/embc53108.2024.10781709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]
Abstract
This study investigates non-contact heart rate (HR) monitoring through camera-based remote PPG during intravenous fluid bolus (FB) therapy. The experiment, at Odense University Hospital, Denmark, involved 4 volunteers and over 350 minutes of filming. We implemented a MATLAB-based HR extraction tool chain. The proposed method includes a two-stage process for dynamically determining regions of interest (ROIs), incorporating deep learning for facial landmark detection and subsequent consideration of subjects' facial dimensions. HR estimation uses chrominance-based (CHROM) and plane-orthogonal-to-skin (POS) PPG signal extraction methods, chosen for robustness against motion artifacts. Deviating from the usual advice for other methods, preprocessing is omitted, minimizing signal processing while still yielding a low error rate. The system achieved a mean error of less than 2 beats per minute (bpm), underscoring iPPG's alignment with ground truth. The results exemplify the feasibility of remote PPG monitoring in critical care and emergency settings.
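The plane-orthogonal-to-skin (POS) extraction mentioned above projects temporally normalized RGB traces onto a plane orthogonal to the skin-tone direction and combines the two projections with a tuned gain. A minimal NumPy sketch; the window length and overlap-add details are simplified assumptions, not this study's exact MATLAB pipeline:

```python
import numpy as np

def pos_pulse(rgb, fps=30, win_sec=1.6):
    """Plane-orthogonal-to-skin (POS) pulse extraction.
    rgb: array of shape (N, 3), one spatially averaged RGB sample per frame."""
    n = len(rgb)
    w = int(win_sec * fps)                       # sliding-window length in frames
    h = np.zeros(n)
    P = np.array([[0, 1, -1], [-2, 1, 1]], dtype=float)  # POS projection plane
    for t in range(n - w + 1):
        c = rgb[t:t + w].T                       # (3, w) window
        cn = c / c.mean(axis=1, keepdims=True)   # temporal normalization
        s = P @ cn                               # two projected signals
        alpha = s[0].std() / (s[1].std() + 1e-9)
        p = s[0] + alpha * s[1]                  # tuned combination
        h[t:t + w] += p - p.mean()               # overlap-add into the output
    return h
```

On a constant (pulse-free) input the projections cancel and the output is identically zero; on real traces the band around the cardiac frequency dominates.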
37
Zhang T, Bolic M, Davoodabadi Farahani MH, Zadorsky T, Sabbagh R. Non-contact Heart Rate and Respiratory Rate Estimation from Videos of the Neck. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024; 2024:1-4. [PMID: 40039511 DOI: 10.1109/embc53108.2024.10781989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]
Abstract
Heart Rate (HR) and Respiratory Rate (RR) estimation constitutes a crucial part of non-contact assessment of cardiovascular disease, which has been a leading cause of death worldwide. This paper proposes a novel HR and RR estimation algorithm based on RGB videos recorded by a smartphone camera. Instead of requiring facial videos, the proposed algorithm demonstrates the ability to estimate HR and RR using only a video of the human neck. It captures cardiac as well as respiratory activity by detecting skin displacement through analysis of the Laplacian pyramid of each video frame. Its performance was evaluated by applying it to neck videos of 80 participants and comparing it to existing methods, demonstrating the superior performance of the proposed algorithm.
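The Laplacian-pyramid analysis described above decomposes each frame into band-pass detail levels in which subtle skin displacements are easier to isolate. A minimal NumPy sketch of the decomposition; a 2x2 box filter stands in for the usual Gaussian kernel, an assumption made for brevity:

```python
import numpy as np

def laplacian_pyramid(frame, levels=3):
    """Build a Laplacian pyramid of a grayscale frame (2-D array).
    Each level stores the detail lost between one blurred/downsampled
    scale and the next; small periodic skin displacements appear in
    these band-pass levels."""
    pyramid = []
    current = frame.astype(float)
    for _ in range(levels):
        h, w = current.shape
        h2, w2 = h // 2 * 2, w // 2 * 2   # trim to even size
        c = current[:h2, :w2]
        down = c.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))  # downsample
        up = down.repeat(2, axis=0).repeat(2, axis=1)               # upsample
        pyramid.append(c - up)            # band-pass detail at this scale
        current = down
    pyramid.append(current)               # low-pass residual
    return pyramid
```

With the box filter, each 2x2 block of a detail level sums to zero, since the upsampled image holds the exact block means.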
38
Anil AA, Karthik S, Sivaprakasam M, Joseph J. PhysioSens1D-NET: A 1D Convolution Network for Extracting Heart Rate from Facial Videos. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024; 2024:1-4. [PMID: 40039469 DOI: 10.1109/embc53108.2024.10782272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]
Abstract
Non-contact heart rate (HR) monitoring from video streams is the most established approach to unobtrusive vitals monitoring. A multitude of classical signal processing algorithms and cutting-edge deep learning models have been developed for non-contact HR extraction. Classical signal processing algorithms excel in real-time application, even on low-end CPUs, while deep learning models offer higher accuracy at the cost of computational complexity. In this study, we introduce PhysioSens1D-NET, a novel 1D convolutional neural network that delivers both computational efficiency and accurate HR measures. In contrast to classical rPPG algorithms such as ICA, POS, CHROM, PBV, LGI, and GREEN, PhysioSens1D-NET demonstrates significant improvements, achieving reductions in Mean Absolute Error (MAE) of 91.4%, 72.5%, 70.7%, 93.1%, 76.7%, and 95.1%, respectively. When compared to state-of-the-art deep learning models, including DeepPhys, EfficientNet, PhysNet, and TS-CAN, PhysioSens1D-NET exhibits comparable performance. A performance analysis on low-specification CPUs indicated that PhysioSens1D-NET outperforms deep learning models with a considerable speed advantage, being 180 times faster than the best-performing DL model. Furthermore, it aligns closely with classical algorithms, with a computational time of only 2.3 ms.
39
Anil AA, Karthik S, Sivaprakasam M, Joseph J. Enhancing Non-Contact Heart Rate Monitoring: An Intelligent Multi-ROI Approach with Face Masking and CNN-Based Feature Adaptation. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024; 2024:1-4. [PMID: 40040198 DOI: 10.1109/embc53108.2024.10781978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]
Abstract
Heart rate (HR) estimation from facial video streams has emerged in recent years as a promising method of unobtrusive vitals monitoring. Conventional non-contact HR monitoring algorithms such as POS, CHROM, and ICA are often applied to a single region of interest (ROI), typically the forehead. However, this approach has several disadvantages, such as ignoring other facial regions and poor tolerance to movement of the subject or face. To address this, we propose a multi-ROI approach with face masking and CNN-based facial feature adaptation. We introduce a novel face-masking technique using facial landmarks alone, effectively eliminating non-skin pixels such as background, hair, eyes, lips, and eyebrows. Additionally, a CNN model was designed to classify individuals based on facial features, dynamically adjusting ROI positions and numbers accordingly. The proposed approach significantly reduced the Mean Absolute Error (MAE) in HR measurement by 58.2%, 47.1%, and 33.2% for the POS, CHROM, and ICA algorithms respectively, compared to the traditional single-ROI approach. The multi-ROI approach can thus improve measurement reliability and robustness to motion.
40
Jaimme Poppen CD, Kumar NJ, Karthik S, Margana BS, Sivaprakasam M, Joseph J. Fusion of ballistocardiography and imaging for improved non-contact heart rate monitoring. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024; 2024:1-4. [PMID: 40039206 DOI: 10.1109/embc53108.2024.10781858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]
Abstract
Camera-based imaging is the most widely used technique for non-contact heart rate monitoring (HRM). However, its limited robustness to motion artifacts and its unreliability when the patient's face is not in the camera's field of view remain open problems. These are often addressed with computationally heavy AI algorithms or hardware-intensive multi-camera systems. Here, we investigate the improvement in accuracy and reliability of non-contact HRM by augmenting vision with an unobtrusive near-field sensing modality, ballistocardiography. The system seamlessly transitions between the two modalities based on a real-time signal quality index (SQI). The SQI parameter was able to discard segments with higher errors and select the data stream with lower error. The proposed system was validated on data collected from 20 subjects, with induced motion and facial occlusions to create artifacts. The accuracy was validated against a contact-based ground-truth reference. The findings indicate a notable enhancement in temporal coverage of up to 100 percent. The proposed method had a competitive error ranging between 3 and 17 bpm, with a median error of 8 bpm.
41
Li J, Vatanparvar K, Gwak M, Zhu L, Kuang J, Gao A. Enhance Heart Rate Measurement from Remote PPG with Head Motion Awareness from Image. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024; 2024:1-4. [PMID: 40039974 DOI: 10.1109/embc53108.2024.10782369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]
Abstract
Measurement of cardiac pulse rate through image-based remote photoplethysmography (rPPG) is drawing attention for applications in continuous health monitoring. Meanwhile, extracting clean rPPG signals and reliable heart rate (HR) remotely is challenging, especially in real-life scenarios where users can move freely. In this paper, we leverage head motion information in the video to increase the tolerance of vital-sign estimation to motion. A motion artifact classification model relying on rPPG and real-time head motion signals is developed to identify motion artifacts and reject outliers. We handcrafted 106 features and selected 20 features from both the time and frequency domains. The model and methodology are validated comprehensively on a dataset of 30 subjects performing 25 motion tasks at three motion intensity levels: low, medium, and high. The motion-aware pipeline achieves a mean absolute error of 4.03 bpm for high-motion-intensity tasks, improved by 31% by removing artifacts with specificity over 75%. In addition, the pipeline is tested with various light intensities, showing that the motion detection is robust in darker conditions.
42
Saikevičius L, Raudonis V, Dervinis G, Baranauskas V. Non-Contact Vision-Based Techniques of Vital Sign Monitoring: Systematic Review. SENSORS (BASEL, SWITZERLAND) 2024; 24:3963. [PMID: 38931747 PMCID: PMC11207835 DOI: 10.3390/s24123963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/15/2024] [Revised: 06/10/2024] [Accepted: 06/12/2024] [Indexed: 06/28/2024]
Abstract
The development of non-contact techniques for monitoring human vital signs has significant potential to improve patient care in diverse settings. By facilitating easier and more convenient monitoring, these techniques can prevent serious health issues and improve patient outcomes, especially for those unable or unwilling to travel to traditional healthcare environments. This systematic review examines recent advancements in non-contact vital sign monitoring techniques, evaluating publicly available datasets and signal preprocessing methods. Additionally, we identified potential future research directions in this rapidly evolving field.
Affiliation(s)
- Vidas Raudonis
- Automation Department, Faculty of Electrical and Electronics Engineering, Kaunas University of Technology, 44249 Kaunas, Lithuania; (L.S.); (G.D.); (V.B.)
43
Castellano Ontiveros R, Elgendi M, Menon C. A machine learning-based approach for constructing remote photoplethysmogram signals from video cameras. COMMUNICATIONS MEDICINE 2024; 4:109. [PMID: 38849495 PMCID: PMC11161609 DOI: 10.1038/s43856-024-00519-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 05/03/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND Advancements in health monitoring technologies are increasingly relying on capturing heart signals from video, a method known as remote photoplethysmography (rPPG). This study aims to enhance the accuracy of rPPG signals using a novel computer technique. METHODS We developed a machine-learning model to improve the clarity and accuracy of rPPG signals by comparing them with traditional photoplethysmogram (PPG) signals from sensors. The model was evaluated across various datasets and under different conditions, such as rest and movement. Evaluation metrics, including dynamic time warping (to assess timing alignment between rPPG and PPG) and correlation coefficients (to measure the linear association between rPPG and PPG), provided a robust framework for validating the effectiveness of our model in capturing and replicating physiological signals from videos accurately. RESULTS Our method showed significant improvements in the accuracy of heart signals captured from video, as evidenced by dynamic time warping and correlation coefficients. The model performed exceptionally well, demonstrating its effectiveness in achieving accuracy comparable to direct-contact heart signal measurements. CONCLUSIONS This study introduces a novel and effective machine-learning approach for improving the detection of heart signals from video. The results demonstrate the flexibility of our method across various scenarios and its potential to enhance the accuracy of health monitoring applications, making it a promising tool for remote healthcare.
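Dynamic time warping, used above to assess timing alignment between rPPG and reference PPG, can be sketched as the classic dynamic-programming recurrence (distance only, pure Python):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.
    Returns the minimal cumulative absolute-difference cost over all
    monotonic alignments of a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Unlike a sample-by-sample error, this distance is zero whenever one sequence is a time-stretched copy of the other, which is why it suits comparing rPPG against contact PPG with small timing offsets.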
Affiliation(s)
- Rodrigo Castellano Ontiveros: Biomedical and Mobile Health Technology Lab, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland; School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Mohamed Elgendi: Biomedical and Mobile Health Technology Lab, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
- Carlo Menon: Biomedical and Mobile Health Technology Lab, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
44
Khanam FTZ, Perera AG, Al-Naji A, Mcintyre TD, Chahl J. Integrating RGB-thermal image sensors for non-contact automatic respiration rate monitoring. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2024; 41:1140-1151. [PMID: 38856428 DOI: 10.1364/josaa.520757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/01/2024] [Accepted: 04/23/2024] [Indexed: 06/11/2024]
Abstract
Respiration rate (RR) is a significant indicator of human health. Conventional RR monitoring systems require direct physical contact, which may cause discomfort and pain. This paper therefore proposes a non-contact RR monitoring system that integrates RGB and thermal imaging through RGB-thermal image alignment. The proposed method employs an advanced image processing algorithm for automatic region of interest (ROI) selection. The experimental results demonstrated a close correlation and a low error rate between the measured thermal data, the measured RGB data, and the reference data. In summary, the proposed non-contact system is a promising alternative to conventional contact-based approaches without the associated discomfort and pain.
45
Wang W, Shu H, Lu H, Xu M, Ji X. Multispectral Depolarization Based Living-Skin Detection: A New Measurement Principle. IEEE Trans Biomed Eng 2024; 71:1937-1949. [PMID: 38241110 DOI: 10.1109/tbme.2024.3356410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2024]
Abstract
Camera-based photoplethysmographic imaging has enabled the segmentation of living-skin tissues in a video, but it has inherent limitations in real-life applications such as video health monitoring and face anti-spoofing. Inspired by the use of polarization to improve vital signs monitoring (i.e., specular reflection removal), we observed that skin tissue has an attractive property of wavelength-dependent depolarization owing to its multi-layer structure containing different absorbing chromophores: polarized light photons with longer wavelengths (R) penetrate deeper into the skin and thus experience more thorough depolarization than those with shorter wavelengths (G and B). We therefore propose a novel dual-polarization setup and an elegant algorithm (named "MSD") that exploits this multispectral depolarization of skin tissue to detect living-skin pixels; it requires only two images, sampled at the parallel and cross polarizations, to estimate the characteristic chromaticity changes (R/G) caused by tissue depolarization. Our proposal was verified in both laboratory and hospital settings (ICU and NICU), focusing on anti-spoofing and patient skin segmentation. The clinical experiments in the ICU also indicate the potential of MSD for skin perfusion analysis, which may lead to a new diagnostic imaging approach in the future.
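The polarization-difference idea behind MSD can be sketched numerically: per-channel degree of polarization from a parallel/cross image pair, and the R/G ratio of depolarization that distinguishes living skin. This is a minimal illustration of the measurement principle under assumed inputs, not the authors' MSD algorithm:

```python
import numpy as np

def degree_of_polarization(par, cross, eps=1e-6):
    """Per-pixel degree of linear polarization for one color channel:
    (I_par - I_cross) / (I_par + I_cross). Light that penetrates deep
    into skin is heavily depolarized, pushing this value toward 0."""
    par = np.asarray(par, dtype=float)
    cross = np.asarray(cross, dtype=float)
    return (par - cross) / (par + cross + eps)

def rg_chromaticity_change(par_rgb, cross_rgb, eps=1e-6):
    """R/G ratio of the cross-to-parallel intensity ratios. Skin
    depolarizes red (deeper penetration) more than green, so this
    ratio is characteristically elevated for living-skin pixels."""
    par_rgb = np.asarray(par_rgb, dtype=float)
    cross_rgb = np.asarray(cross_rgb, dtype=float)
    r = cross_rgb[..., 0] / (par_rgb[..., 0] + eps)
    g = cross_rgb[..., 1] / (par_rgb[..., 1] + eps)
    return r / (g + eps)
```

For a surface that fully depolarizes red (cross equals parallel in R) but only half-depolarizes green, the R/G map comes out near 2, while a mirror-like specular surface would stay near 1.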
46
Xiang G, Yao S, Peng Y, Deng H, Wu X, Wang K, Li Y, Wu F. An effective cross-scenario remote heart rate estimation network based on global-local information and video transformer. Phys Eng Sci Med 2024; 47:729-739. [PMID: 38504066 DOI: 10.1007/s13246-024-01401-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Accepted: 02/06/2024] [Indexed: 03/21/2024]
Abstract
Remote photoplethysmography (rPPG) is a non-contact physiological signal measurement method characterized by non-invasiveness and ease of use. It has broad application potential in medical health, human factors engineering, and other fields. However, current rPPG technology is highly susceptible to variations in lighting conditions, head pose changes, and partial occlusions, posing significant challenges for its widespread application. To improve the accuracy of remote heart rate estimation and enhance model generalization, we propose PulseFormer, a dual-path Transformer-based network. By integrating local and global information through fast and slow paths, PulseFormer effectively captures the temporal variations of key regions and the spatial variations of the global area, facilitating the extraction of rPPG features while mitigating the impact of background noise. Heart rate estimation results show that PulseFormer achieves state-of-the-art performance on popular public rPPG datasets. Additionally, we establish a dataset containing facial expressions and synchronized physiological signals in driving scenarios and test the model pre-trained on the public data against this collected dataset. The results indicate that PulseFormer generalizes well across different data distributions in cross-scenario settings, making it applicable to heart rate estimation in a variety of scenarios.
Affiliation(s)
- Guoliang Xiang, Song Yao, Yong Peng, Hanwen Deng, Xianhui Wu, Kui Wang, Yingli Li, Fan Wu: Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic & Transportation Engineering, Central South University, Changsha 410075, China
47
Zhu F, Niu Q, Li X, Zhao Q, Su H, Shuai J. FM-FCN: A Neural Network with Filtering Modules for Accurate Vital Signs Extraction. RESEARCH (WASHINGTON, D.C.) 2024; 7:0361. [PMID: 38737196 PMCID: PMC11082448 DOI: 10.34133/research.0361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Accepted: 04/01/2024] [Indexed: 05/14/2024]
Abstract
Neural networks excel at capturing local spatial patterns through convolutional modules, but they may struggle to identify and effectively utilize the morphological and amplitude periodicity of physiological signals. In this work, we propose a novel network named filtering module fully convolutional network (FM-FCN), which fuses traditional filtering techniques with neural networks to amplify physiological signals and suppress noise. First, instead of using a fully connected layer, we use an FCN to preserve the time-dimensional correlation of physiological signals, allowing multiple signal cycles to pass through the network and providing a basis for signal processing. Second, we introduce the FM as a network module that adapts to eliminate unwanted interference, leveraging the structure of a classical filter. This approach builds a bridge between deep learning and signal processing methodologies. Finally, we evaluate the performance of FM-FCN on remote photoplethysmography. Experimental results demonstrate that FM-FCN outperforms the second-ranked method in both blood volume pulse (BVP) signal and heart rate (HR) accuracy. It substantially improves the quality of BVP waveform reconstruction, with a 20.23% decrease in mean absolute error (MAE) and a 79.95% increase in signal-to-noise ratio (SNR). Regarding HR estimation accuracy, FM-FCN achieves a 35.85% decrease in MAE, a 29.65% decrease in error standard deviation, and a 32.88% decrease in the width of the 95% limits of agreement, meeting clinical standards for HR accuracy. These results highlight its potential to improve the accuracy and reliability of vital sign measurement through high-quality BVP signal extraction. The code and datasets are available online at https://github.com/zhaoqi106/FM-FCN.
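The classical filtering that an FM-style module generalizes can be illustrated with a simple frequency-domain band-pass over the heart-rate band. This numpy sketch is a stand-in for the traditional operation the paper fuses into the network, not the learnable module itself:

```python
import numpy as np

def bvp_bandpass(signal, fs, low=0.7, high=3.0):
    """Ideal FFT band-pass keeping only typical heart-rate frequencies
    (roughly 42-180 bpm): zero out all spectral bins outside [low, high]
    Hz and transform back. This is the classical interference-rejection
    step that a learnable filtering module is designed to emulate."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec = np.fft.rfft(signal)
    spec[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spec, n=len(signal))
```

Applied to a 30 fps pulse trace contaminated by flicker or motion components outside the cardiac band, the output retains the ~1-2 Hz BVP oscillation while suppressing the rest.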
Affiliation(s)
- Fangfang Zhu: Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, and State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen University, Xiamen 361005, China
- Qichao Niu: Vitalsilicon Technology Co. Ltd., Jiaxing, Zhejiang 314006, China
- Xiang Li: Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Qi Zhao: School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Honghong Su: Yangtze Delta Region Institute of Tsinghua University, Jiaxing, Zhejiang 314006, China
- Jianwei Shuai: Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China; Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou 325001, China
48
Liu X, Yang X, Li X. HRUNet: Assessing Uncertainty in Heart Rates Measured From Facial Videos. IEEE J Biomed Health Inform 2024; 28:2955-2966. [PMID: 38345952 DOI: 10.1109/jbhi.2024.3363006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/07/2024]
Abstract
Video-based photoplethysmography (VPPG) offers the capability to measure heart rate (HR) from facial videos. However, the reliability of the HR values extracted this way remains uncertain, especially when videos are affected by various disturbances. To address this challenge, we introduce a framework for VPPG-based HR measurement that captures diverse sources of uncertainty in the predicted HR values. A neural network named HRUNet is structured for HR extraction from input facial videos. Departing from the conventional approach of learning specific weight (and bias) values, we leverage Bayesian posterior estimation to derive weight distributions within HRUNet. Sampling from these distributions encodes the uncertainty stemming from HRUNet's limited performance. On this basis, we redefine HRUNet's output as a distribution of potential HR values, rather than the traditional single most probable value, with the goal of capturing the uncertainty arising from inherent noise in the input video. HRUNet is evaluated across 1,098 videos from seven datasets, spanning three scenarios: undisturbed, motion-disturbed, and light-disturbed. The test outcomes demonstrate that uncertainty in the HR measurements increases significantly in the disturbed scenarios compared to the undisturbed one. Moreover, HRUNet outperforms state-of-the-art methods in HR accuracy when HR values with uncertainty above 0.4 are excluded. This underscores that uncertainty is an informative indicator of potentially erroneous HR measurements. With this enhanced reliability, the VPPG technique holds promise for applications in safety-critical domains.
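The weight-sampling idea behind HRUNet's output distribution can be sketched generically: run a predictor whose weights are drawn from a posterior many times and summarize the spread of the resulting HR values as uncertainty. The predictor below is a hypothetical stand-in, not HRUNet:

```python
import numpy as np

rng = np.random.default_rng(0)

def hr_posterior_summary(stochastic_predict, video, n_samples=200):
    """Monte-Carlo approximation of a Bayesian HR posterior: call a
    stochastic predictor (one forward pass per weight sample) many
    times, then report the mean HR and its spread as uncertainty."""
    samples = np.array([stochastic_predict(video) for _ in range(n_samples)])
    return samples.mean(), samples.std()

def toy_bayesian_predictor(video):
    """Hypothetical stand-in for a network with sampled weights: a
    fixed 72 bpm estimate plus weight-sampling noise."""
    return 72.0 + rng.normal(0.0, 2.0)
```

In this framing, a disturbed video would widen the sample spread, and downstream consumers can discard estimates whose uncertainty exceeds a chosen threshold.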
49
Slapničar G, Wang W, Luštrek M. Generalized channel separation algorithms for accurate camera-based multi-wavelength PTT and BP estimation. BIOMEDICAL OPTICS EXPRESS 2024; 15:3128-3146. [PMID: 38855660 PMCID: PMC11161386 DOI: 10.1364/boe.518562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Revised: 03/19/2024] [Accepted: 03/22/2024] [Indexed: 06/11/2024]
Abstract
Single-site multi-wavelength (MW) pulse transit time (PTT) measurement was recently proposed using contact sensors with sequential illumination. It leverages the different penetration depths of light to measure the traversal of a cardiac pulse between skin layers. This enabled continuous single-site MW blood pressure (BP) monitoring, but faces challenges such as subtle skin compression, which strongly influences the PPG morphology and the subsequent PTT. We extended this idea to contact-free camera-based sensing and identified the major challenge of color channel overlap: the signals obtained from a consumer RGB camera are a mixture of responses at different wavelengths, which prevents meaningful PTT measurement. To address this, we propose novel camera-independent, data-driven channel separation algorithms based on constrained genetic algorithms. We systematically validated the algorithms on camera recordings of palms and corresponding ground-truth BP measurements of 13 subjects in two scenarios, rest and activity. We compared the proposed algorithms against established blind source separation methods and against a previous camera-specific physics-based method, showing good performance in both PTT reconstruction and BP estimation using a Random Forest regressor. The best-performing algorithm achieved mean absolute errors (MAEs) of 3.48 and 2.61 mmHg for systolic and diastolic BP in a leave-one-subject-out experiment with personalization, establishing the proposed algorithms as enablers of contact-free MW PTT and BP estimation.
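The channel-overlap problem can be written as a linear mixing model: each camera channel observes a weighted mixture of the per-wavelength skin responses. When the mixing matrix is known, separation reduces to solving a linear system; the paper instead learns the separation (via constrained genetic algorithms) because the matrix is camera-dependent and unknown. A minimal sketch with a hypothetical overlap matrix:

```python
import numpy as np

# Hypothetical spectral-overlap matrix: row = camera channel (R, G, B),
# column = illumination wavelength; off-diagonal terms are the overlap
# that mixes wavelength responses across channels.
MIX = np.array([[0.80, 0.15, 0.05],
                [0.20, 0.70, 0.10],
                [0.05, 0.25, 0.70]])

def separate_channels(observed, mixing=MIX):
    """Recover per-wavelength signals s from camera observations
    x = M s by solving the linear system; observed has shape (3, T)."""
    return np.linalg.solve(mixing, observed)
```

With a known (or estimated) matrix, the unmixed per-wavelength traces regain the distinct pulse timing needed for PTT; in the contact-free setting the whole difficulty lies in estimating that separation without calibration.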
Affiliation(s)
- Gašper Slapničar: Department of Intelligent Systems, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000 Ljubljana, Slovenia
- Wenjin Wang: Biomedical Engineering Department, Southern University of Science and Technology, 1088 Xueyuan Blvd, Nanshan, Shenzhen, Guangdong, China
- Mitja Luštrek: Department of Intelligent Systems, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000 Ljubljana, Slovenia
50
Talala S, Shvimmer S, Simhon R, Gilead M, Yitzhaky Y. Emotion Classification Based on Pulsatile Images Extracted from Short Facial Videos via Deep Learning. SENSORS (BASEL, SWITZERLAND) 2024; 24:2620. [PMID: 38676235 PMCID: PMC11053953 DOI: 10.3390/s24082620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/12/2024] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024]
Abstract
Most human emotion recognition methods largely depend on classifying stereotypical facial expressions that represent emotions. However, such facial expressions do not necessarily correspond to actual emotional states and may instead reflect communicative intentions. In other cases, emotions are hidden, cannot be expressed, or have lower arousal manifested by less pronounced facial expressions, as may occur during passive video viewing. This study improves an emotion classification approach developed in a previous study, which classifies emotions remotely from short facial video data without relying on stereotypical facial expressions or contact-based methods. In this approach, we aim to remotely sense transdermal cardiovascular spatiotemporal facial patterns associated with different emotional states and analyze these data via machine learning. In this paper, we propose several improvements, including better remote heart rate estimation via preliminary skin segmentation, an improved heartbeat peak-and-trough detection process, and better emotion classification accuracy obtained by employing an appropriate deep-learning classifier using only RGB camera data. We used the dataset obtained in the previous study, which contains facial videos of 110 participants who passively viewed 150 short videos eliciting five emotion types (amusement, disgust, fear, sexual arousal, and no emotion) while three cameras with different wavelength sensitivities (visible spectrum, near-infrared, and longwave infrared) recorded them simultaneously. From the short facial videos, we extracted unique high-resolution spatiotemporal, physiologically affected features and examined them as inputs to different deep-learning approaches. An EfficientNet-B0 model was able to classify participants' emotional states with an overall average accuracy of 47.36% using a single input spatiotemporal feature map obtained from a regular RGB camera.
Affiliation(s)
- Shlomi Talala: Department of Electro-Optics and Photonics Engineering, School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
- Shaul Shvimmer: Department of Electro-Optics and Photonics Engineering, School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
- Rotem Simhon: School of Psychology, Tel Aviv University, Tel Aviv 39040, Israel
- Michael Gilead: School of Psychology, Tel Aviv University, Tel Aviv 39040, Israel
- Yitzhak Yitzhaky: Department of Electro-Optics and Photonics Engineering, School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel