1
Xiang G, Yao S, Peng Y, Deng H, Wu X, Wang K, Li Y, Wu F. An effective cross-scenario remote heart rate estimation network based on global-local information and video transformer. Phys Eng Sci Med 2024;47:729-739. [PMID: 38504066] [DOI: 10.1007/s13246-024-01401-4]
Abstract
Remote photoplethysmography (rPPG) is a non-contact physiological measurement method, characterized by non-invasiveness and ease of use. It has broad application potential in medical health, human factors engineering, and other fields. However, current rPPG technology is highly susceptible to variations in lighting conditions, head pose changes, and partial occlusions, posing significant challenges for its widespread application. To improve the accuracy of remote heart rate estimation and enhance model generalization, we propose PulseFormer, a dual-path network based on the transformer. By integrating local and global information through fast and slow paths, PulseFormer effectively captures the temporal variations of key regions and the spatial variations of the global area, facilitating the extraction of rPPG feature information while mitigating the impact of background noise variations. Heart rate estimation results show that PulseFormer achieves state-of-the-art performance on popular public rPPG datasets. Additionally, we establish a dataset containing facial expressions and synchronized physiological signals in driving scenarios and test the model pre-trained on the public datasets on this collected dataset. The results indicate that PulseFormer exhibits strong generalization across different data distributions in cross-scenario settings. Therefore, this model is applicable for heart rate estimation of individuals in various scenarios.
Affiliation(s)
- Guoliang Xiang, Song Yao, Yong Peng, Hanwen Deng, Xianhui Wu, Kui Wang, Yingli Li, Fan Wu
- Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic & Transportation Engineering, Central South University, Changsha, 410075, China
2
Zhu F, Niu Q, Li X, Zhao Q, Su H, Shuai J. FM-FCN: A Neural Network with Filtering Modules for Accurate Vital Signs Extraction. Research (Washington, D.C.) 2024;7:0361. [PMID: 38737196] [PMCID: PMC11082448] [DOI: 10.34133/research.0361]
Abstract
Neural networks excel at capturing local spatial patterns through convolutional modules, but they may struggle to identify and effectively exploit the periodic morphology and amplitude of physiological signals. In this work, we propose a novel network named filtering module fully convolutional network (FM-FCN), which fuses traditional filtering techniques with neural networks to amplify physiological signals and suppress noise. First, instead of using a fully connected layer, we use an FCN to preserve the time-dimensional correlation of physiological signals, enabling multiple signal cycles within the network and providing a basis for signal processing. Second, we introduce the FM as a network module that adapts to eliminate unwanted interference, leveraging the structure of the filter. This approach builds a bridge between deep learning and signal processing methodologies. Finally, we evaluate the performance of FM-FCN using remote photoplethysmography. Experimental results demonstrate that FM-FCN outperforms the second-ranked method in both blood volume pulse (BVP) signal and heart rate (HR) accuracy. It substantially improves the quality of BVP waveform reconstruction, with a 20.23% decrease in mean absolute error (MAE) and a 79.95% increase in signal-to-noise ratio (SNR). Regarding HR estimation accuracy, FM-FCN achieves a decrease of 35.85% in MAE, 29.65% in error standard deviation, and 32.88% in the width of the 95% limits of agreement, meeting clinical standards for HR accuracy. The results highlight its potential to improve the accuracy and reliability of vital sign measurement through high-quality BVP signal extraction. The codes and datasets are available online at https://github.com/zhaoqi106/FM-FCN.
Affiliation(s)
- Fangfang Zhu
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- National Institute for Data Science in Health and Medicine, and State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen University, Xiamen 361005, China
- Qichao Niu
- Vitalsilicon Technology Co. Ltd., Jiaxing, Zhejiang 314006, China
- Xiang Li
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Honghong Su
- Yangtze Delta Region Institute of Tsinghua University, Zhejiang, Jiaxing 314006, China
- Jianwei Shuai
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou 325001, China
3
Fontes L, Machado P, Vinkemeier D, Yahaya S, Bird JJ, Ihianle IK. Enhancing Stress Detection: A Comprehensive Approach through rPPG Analysis and Deep Learning Techniques. Sensors (Basel) 2024;24:1096. [PMID: 38400254] [PMCID: PMC10892284] [DOI: 10.3390/s24041096]
Abstract
Stress has emerged as a major concern in modern society, significantly impacting human health and well-being. Statistical evidence underscores the extensive social influence of stress, especially in terms of work-related stress and associated healthcare costs. This paper addresses the critical need for accurate stress detection, emphasising its far-reaching effects on health and social dynamics. Focusing on remote stress monitoring, it proposes an efficient deep learning approach for stress detection from facial videos. In contrast to research on wearable devices, this paper proposes novel Hybrid Deep Learning (DL) networks for stress detection based on remote photoplethysmography (rPPG), employing Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and 1D Convolutional Neural Network (1D-CNN) models with hyperparameter optimisation and augmentation techniques to enhance performance. The proposed approach yields a substantial improvement in accuracy and efficiency in stress detection, achieving up to 95.83% accuracy on the UBFC-Phys dataset while maintaining excellent computational efficiency. The experimental results demonstrate the effectiveness of the proposed Hybrid DL models for rPPG-based stress detection.
Affiliation(s)
- Isibor Kennedy Ihianle
- Department of Computer Science, Nottingham Trent University, Nottingham NG1 4FQ, UK; (L.F.); (P.M.); (D.V.); (S.Y.); (J.J.B.)
4
Lee S, Lee M, Sim JY. DSE-NN: Deeply Supervised Efficient Neural Network for Real-Time Remote Photoplethysmography. Bioengineering (Basel) 2023;10:1428. [PMID: 38136019] [PMCID: PMC10740871] [DOI: 10.3390/bioengineering10121428]
Abstract
Non-contact remote photoplethysmography can be used in a variety of medical and healthcare fields by measuring vital signs continuously and unobtrusively. Recently, end-to-end deep learning methods have been proposed to replace existing handcrafted features. However, since these deep learning methods are black-box models, the problem of interpretability has been raised, and the same problem exists in remote photoplethysmography (rPPG) networks. In this study, we propose a method to visualize temporal and spectral representations of hidden layers, deeply supervise the spectral representation of intermediate layers through the depth of the network, and optimize it for a lightweight model. The optimized network improves performance and enables fast training and inference. The proposed spectral deep supervision helps achieve not only high performance but also fast convergence through the regularization of the intermediate layers. The effect of the proposed methods was confirmed through a thorough ablation study on public datasets. As a result, our model obtained results similar to or better than those of state-of-the-art models. In particular, it achieved an RMSE of 1 bpm on the PURE dataset, demonstrating high accuracy, and an RMSE of 6.65 bpm on the V4V dataset, outperforming other methods. We observe that our model began converging from the very first epoch, a significant improvement over other models in terms of learning efficiency. Our approach is expected to be generally applicable to models that learn spectral-domain information, as well as to regression applications that require representations of periodicity.
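The spectral supervision described above presupposes the classical frequency-domain HR readout: the heart rate is the dominant spectral peak of the rPPG trace inside a physiological band. As a generic illustration (not the authors' network; the band limits, sampling rate, and synthetic signal are assumptions), a minimal sketch:

```python
import numpy as np

def estimate_hr_bpm(signal, fs, lo=0.7, hi=4.0):
    """Return heart rate (bpm) as the dominant spectral peak inside a
    physiological band (lo..hi Hz, i.e. 42..240 bpm)."""
    x = signal - np.mean(signal)              # remove DC before the FFT
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= lo) & (freqs <= hi)      # restrict to plausible HRs
    return 60.0 * freqs[band][np.argmax(power[band])]

# Synthetic 72-bpm (1.2 Hz) pulse sampled at a 30-fps camera rate
rng = np.random.default_rng(0)
fs = 30.0
t = np.arange(0, 20, 1.0 / fs)
pulse = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)
hr = estimate_hr_bpm(pulse, fs)
```

The frequency resolution (here 0.05 Hz, i.e. 3 bpm for a 20 s window) bounds the precision of this readout, which is one motivation for learning spectral representations instead.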
Affiliation(s)
- Joo Yong Sim
- Department of Mechanical Systems Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea; (S.L.); (M.L.)
5
Casado CA, Lopez MB. Face2PPG: An Unsupervised Pipeline for Blood Volume Pulse Extraction From Faces. IEEE J Biomed Health Inform 2023;27:5530-5541. [PMID: 37610907] [DOI: 10.1109/jbhi.2023.3307942]
Abstract
Photoplethysmography (PPG) signals have become a key technology in many fields, such as medicine, well-being, and sports. Our work proposes a set of pipelines to extract remote PPG (rPPG) signals from the face robustly, reliably, and configurably. We identify and evaluate the possible choices in the critical steps of unsupervised rPPG methodologies. We assess a state-of-the-art processing pipeline on six different datasets, incorporating important corrections in the methodology that ensure reproducible and fair comparisons. In addition, we extend the pipeline with three novel ideas: (1) a new method to stabilize the detected face based on rigid mesh normalization; (2) a new method to dynamically select the facial regions that provide the best raw signals; and (3) a new RGB-to-rPPG transformation method, called Orthogonal Matrix Image Transformation (OMIT), based on QR decomposition, which increases robustness against compression artifacts. We show that all three changes introduce noticeable improvements in retrieving rPPG signals from faces, obtaining state-of-the-art results compared with unsupervised, non-learning-based methodologies and, on some databases, results very close to supervised, learning-based methods. We perform a comparative study to quantify the contribution of each proposed idea. In addition, we present a series of observations that could help in future implementations.
6
Lin B, Tao J, Xu J, He L, Liu N, Zhang X. Estimation of vital signs from facial videos via video magnification and deep learning. iScience 2023;26:107845. [PMID: 37790274] [PMCID: PMC10542939] [DOI: 10.1016/j.isci.2023.107845]
Abstract
The continuous monitoring of vital signs is one of the hottest topics in healthcare. Recent technological advances in sensors, signal processing, and image processing have spawned the development of non-contact techniques such as remote photoplethysmography (rPPG). To solve the common problems of rPPG, including weak extracted signals, body movements, and generalization with limited data resources, we proposed a dual-path estimation method based on video magnification and deep learning. First, image processing is applied to automatically detect, track, and magnify facial ROIs. Then, the steady part of the wave of each processed ROI is used to extract features including heart rate, PTT, and pulse-wave waveform features. Blood pressure is estimated from the features via a small CNN. The results comply with the current standard and promise potential clinical applications in the future.
Affiliation(s)
- Bin Lin, Jing Tao, Jingjing Xu, Liang He, Xianzeng Zhang
- Key Laboratory of Opto-Electronic Science and Technology for Medicine of Ministry of Education, Fujian Provincial Key Laboratory of Photonics Technology, College of Photonic and Electronic Engineering, Fujian Normal University, Fuzhou, Fujian 350117, China
- Nenrong Liu
- Fujian Provincial Key Laboratory of Quantum Manipulation and New Energy Materials, Fujian Provincial Collaborative Innovation Center for Advanced High-Field Superconducting Materials and Engineering, College of Physics and Energy, Fujian Normal University, Fuzhou, Fujian 350117, China
7
Zhang Q, Lin X, Zhang Y, Liu Q, Cai F. Non-contact high precision pulse-rate monitoring system for moving subjects in different motion states. Med Biol Eng Comput 2023;61:2769-2783. [PMID: 37474842] [DOI: 10.1007/s11517-023-02884-1]
Abstract
Remote photoplethysmography (rPPG) enables contact-free monitoring of the pulse rate using a color camera. The fundamental limitation is that motion artifacts and changes in ambient light conditions greatly affect the accuracy of pulse-rate monitoring. We propose the use of a high-speed camera and a motion suppression algorithm with high computational efficiency. This system incorporates a number of major improvements, including reproduction of pulse-wave details, high-precision pulse-rate monitoring of moving subjects, and excellent scene scalability. A series of quantization methods were used to evaluate the effect of different frame rates and different algorithms on pulse-rate monitoring of moving subjects. The experimental results show that using 180-fps video with the Plane-Orthogonal-to-Skin (POS) algorithm produces high-precision pulse-rate monitoring, with a mean absolute error of less than 5 bpm and a relative accuracy of 94.5%. Thus, it has significant potential to improve personal health care and intelligent health monitoring.
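The POS algorithm named above is a published unsupervised method (Wang et al., 2017). As a rough sketch of its core projection and overlap-add steps (not the authors' full system; the window length, frame rate, and synthetic RGB trace are illustrative assumptions):

```python
import numpy as np

def pos_pulse(rgb, fs, win_sec=1.6):
    """Plane-Orthogonal-to-Skin (POS) rPPG extraction sketch.
    rgb: (N, 3) array of mean skin-pixel RGB values per frame."""
    n = rgb.shape[0]
    w = int(win_sec * fs)                         # ~1.6 s sliding window
    proj = np.array([[0.0, 1.0, -1.0],            # plane orthogonal to
                     [-2.0, 1.0, 1.0]])           # the skin-tone axis
    h = np.zeros(n)
    for start in range(n - w + 1):
        c = rgb[start:start + w].T                        # (3, w) window
        cn = c / (c.mean(axis=1, keepdims=True) + 1e-9)   # temporal normalization
        s = proj @ cn                                     # two projected signals
        p = s[0] + (s[0].std() / (s[1].std() + 1e-9)) * s[1]  # alpha tuning
        h[start:start + w] += p - p.mean()                # overlap-add
    return h

# Synthetic trace: constant skin color with a weak 1.2 Hz (72 bpm) pulse
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
rgb = np.ones((t.size, 3)) * np.array([0.6, 0.4, 0.3])
rgb[:, 1] += 0.01 * np.sin(2 * np.pi * 1.2 * t)
pulse = pos_pulse(rgb, fs)
```

The paper's contribution sits on top of such an extractor: a high-speed camera and motion suppression keep the per-frame RGB means stable enough for the projection to isolate the pulsatile component.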
Affiliation(s)
- Qing Zhang, Xingsen Lin, Yuxin Zhang, Qian Liu, Fuhong Cai
- School of Biomedical Engineering, Hainan University, Haikou, 570228, Hainan, China
8
Hino Y, Ashida K, Ogawa-Ochiai K, Tsumura N. Noise-Robust Pulse Wave Estimation from Near-Infrared Face Video Images Using the Wiener Estimation Method. J Imaging 2023;9:202. [PMID: 37888309] [PMCID: PMC10607892] [DOI: 10.3390/jimaging9100202]
Abstract
In this paper, we propose a noise-robust pulse wave estimation method from near-infrared face video images. Pulse wave estimation in a near-infrared environment is expected to be applied to non-contact monitoring in dark areas. The conventional method cannot account for noise when performing estimation, so its pulse wave estimation accuracy in noisy environments is not very high. This may adversely affect the accuracy of heart rate data and other data derived from the pulse wave signal. The objective of this study is therefore to perform pulse wave estimation that is robust to noise. We used the Wiener estimation method, a simple linear computation that can account for noise. Experimental results showed that combining the proposed method with signal processing (detrending and bandpass filtering) increased the SNR (signal-to-noise ratio) by more than 2.5 dB compared to the conventional method with the same signal processing. The correlation coefficient between the pulse wave signal measured with a pulse wave meter and the estimated pulse wave signal was 0.30 larger on average for the proposed method. Furthermore, the AER (absolute error rate) between the heart rate measured with the pulse wave meter and the estimated heart rate was 0.82% on average for the proposed method, lower than that of the conventional method (12.53% on average). These results show that the proposed method is more robust to noise than the conventional method for pulse wave estimation.
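The Wiener estimation step itself requires correlation statistics learned from training data, so it is not reproduced here; the detrending and bandpass filtering the authors combine it with is standard signal processing and can be sketched as follows (the cutoffs and filter order are illustrative choices, not values from the paper):

```python
import numpy as np
from scipy.signal import butter, detrend, filtfilt

def clean_pulse(raw, fs, lo=0.75, hi=3.0, order=3):
    """Detrend, then band-pass a raw pulse trace to the physiological
    heart-rate band (0.75-3 Hz, i.e. 45-180 bpm)."""
    x = detrend(raw)                                  # remove linear drift
    b, a = butter(order, [lo, hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)                          # zero-phase filtering

# 1.5 Hz pulse buried under a linear trend and a slow 0.1 Hz drift
fs = 60.0
t = np.arange(0, 10, 1.0 / fs)
clean = np.sin(2 * np.pi * 1.5 * t)
raw = clean + 0.5 * t + 2.0 * np.sin(2 * np.pi * 0.1 * t)
filtered = clean_pulse(raw, fs)
```

Zero-phase filtering (`filtfilt`) matters here: any group delay would bias the correlation and heart-rate comparisons the paper reports against a contact pulse wave meter.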
Affiliation(s)
- Yuta Hino, Koichi Ashida, Norimichi Tsumura
- Graduate School of Science and Engineering, Chiba University, Chiba 263-8522, Japan
- Keiko Ogawa-Ochiai, Norimichi Tsumura
- Kampo Clinical Center, Hiroshima University Hospital, Hiroshima 734-8511, Japan
9
Hu M, Wu X, Wang X, Xing Y, An N, Shi P. Contactless blood oxygen estimation from face videos: A multi-model fusion method based on deep learning. Biomed Signal Process Control 2023;81:104487. [PMID: 36530216] [PMCID: PMC9735266] [DOI: 10.1016/j.bspc.2022.104487]
Abstract
Blood oxygen saturation (SpO2), a key indicator of respiratory function, has received increasing attention during the COVID-19 pandemic. Clinical results show that patients with COVID-19 likely have distinctly lower SpO2 before the onset of significant symptoms. To address the shortcomings of current methods for monitoring SpO2 from face videos, this paper proposes a novel multi-model fusion method based on deep learning for SpO2 estimation. The method includes a feature extraction network named Residuals and Coordinate Attention (RCA) and a multi-model fusion SpO2 estimation module. The RCA network uses a residual block cascade and a coordinate attention mechanism to focus on the correlation between feature channels and the location information of the feature space. The multi-model fusion module includes the Color Channel Model (CCM) and the Network-Based Model (NBM). To fully use the color feature information in face videos, an image generator is constructed in the CCM to calculate SpO2 by reconstructing the red and blue channel signals. Besides, to reduce the disturbance of other physiological signals, a novel two-part loss function is designed in the NBM. Given the complementarity of the features and models that the CCM and NBM focus on, a Multi-Model Fusion Model (MMFM) is constructed. The experimental results on the PURE and VIPL-HR datasets show that all three models meet the clinical requirement (mean absolute error ⩽ 2%) and demonstrate that multi-model fusion can fully exploit the SpO2 features of face videos and improve SpO2 estimation performance. Our research achievements will facilitate applications in remote medicine and home health.
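For context on the CCM's red/blue-channel route, the classic ratio-of-ratios relation that color-channel SpO2 models build on can be sketched as below. The calibration constants `a` and `b` are illustrative placeholders (real systems fit them against a reference oximeter), not values from this paper:

```python
import numpy as np

def spo2_ratio_of_ratios(red, blue, a=125.0, b=26.0):
    """Ratio-of-ratios SpO2 sketch: R = (AC/DC)_red / (AC/DC)_blue and
    SpO2 ~= a - b * R, with a and b as placeholder calibration constants."""
    def ac_dc(x):
        return np.std(x) / np.mean(x)     # pulsatile-to-baseline ratio
    return a - b * (ac_dc(red) / ac_dc(blue))

# Equal relative pulsatility in both channels gives R = 1, i.e. SpO2 = a - b
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
red = 1.0 + 0.02 * np.sin(2 * np.pi * 1.2 * t)
blue = 0.5 + 0.01 * np.sin(2 * np.pi * 1.2 * t)
spo2 = spo2_ratio_of_ratios(red, blue)
```

The deep-learning models in the paper can be read as replacing this fragile hand-calibrated mapping with one learned end to end, while the CCM keeps the physically motivated red/blue structure.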
Affiliation(s)
- Min Hu, Xia Wu, Xiaohua Wang, Ning An, Piao Shi
- Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei, Anhui 230601, China
- Yan Xing
- School of Mathematics, Hefei University of Technology, Hefei, Anhui 230601, China
- Ning An
- National Smart Eldercare International S&T Cooperation Base, Hefei University of Technology, Hefei, Anhui 230601, China
10
Jaiswal KB, Meenpal T. Heart rate estimation network from facial videos using spatiotemporal feature image. Comput Biol Med 2022;151:106307. [PMID: 36403356] [PMCID: PMC9671618] [DOI: 10.1016/j.compbiomed.2022.106307]
Abstract
Remote health monitoring has become quite inevitable since the SARS-CoV-2 pandemic and will likely remain a standard of care in the future. However, contactless measurement of vital signs such as heart rate (HR) is difficult because the amplitude of the physiological signal is very weak and can be easily degraded by noise. The sources of noise include head movements and variation in illumination or acquisition devices. In this paper, a noise-robust video-based cardiopulmonary measurement method is proposed. 3D videos are converted to 2D Spatio-Temporal Images (STI), which suppress noise while preserving the temporal information of the remote photoplethysmography (rPPG) signal. The proposed model feeds the CNN a new wavelet-derived motion representation, which enables HR estimation under heterogeneous lighting conditions and continuous motion. The STI is formed by concatenating the feature vectors obtained from the wavelet decomposition of subsequent frames, and is provided as input to a CNN that maps it to the corresponding HR values. The proposed approach exploits the ability of CNNs to recognize visual patterns, and yields better HR estimation results on four benchmark datasets: MAHNOB-HCI, MMSE-HR, UBFC-rPPG, and VIPL-HR.
11
Kiddle A, Barham H, Wegerif S, Petronzio C. Dynamic region of interest selection in remote photoplethysmography: proof of principle. JMIR Form Res 2023;7:e44575. [PMID: 36995742] [PMCID: PMC10131655] [DOI: 10.2196/44575]
Abstract
BACKGROUND Remote photoplethysmography (rPPG) can record vital signs (VSs) by detecting subtle changes in the light reflected from the skin. Lifelight (Xim Ltd) is a novel software being developed as a medical device for the contactless measurement of VSs using rPPG via integral cameras on smart devices. Research to date has focused on extracting the pulsatile VS from the raw signal, which can be influenced by factors such as ambient light, skin thickness, facial movements, and skin tone. OBJECTIVE This preliminary proof-of-concept study outlines a dynamic approach to rPPG signal processing wherein green channel signals from the most relevant areas of the face (the midface, comprising the cheeks, nose, and top of the lip) are optimized for each subject using tiling and aggregation (T&A) algorithms. METHODS High-resolution 60-second videos were recorded during the VISION-MD study. The midface was divided into 62 tiles of 20×20 pixels, and the signals from multiple tiles were evaluated using bespoke algorithms through weighting according to signal-to-noise ratio in the frequency domain (SNR-F) score or segmentation. Midface signals before and after T&A were categorized by a trained observer blinded to the data processing as 0 (high quality, suitable for algorithm training), 1 (suitable for algorithm testing), or 2 (inadequate quality). On secondary analysis, observer categories were compared for signals predicted to improve categories following T&A based on the SNR-F score. Observer ratings and SNR-F scores were also compared before and after T&A for Fitzpatrick skin tones 5 and 6, wherein rPPG is hampered by light absorption by melanin. RESULTS The analysis used 4310 videos recorded from 1315 participants. Category 2 and 1 signals had lower mean SNR-F scores than category 0 signals. T&A improved the mean SNR-F score using all algorithms. Depending on the algorithm, 18% (763/4212) to 31% (1306/4212) of signals improved by at least one category, with up to 10% (438/4212) improving into category 0, and 67% (2834/4212) to 79% (3337/4212) remaining in the same category. Importantly, 9% (396/4212) to 21% (875/4212) improved from category 2 (not usable) into category 1. All algorithms showed improvements. No more than 3% (137/4212) of signals were assigned to a lower-quality category following T&A. On secondary analysis, 62% of signals (32/52) were recategorized, as predicted from the SNR-F score. T&A improved SNR-F scores in darker skin tones; 41% of signals (151/369) improved from category 2 to 1 and 12% (44/369) from category 1 to 0. CONCLUSIONS The T&A approach to dynamic region of interest selection improved signal quality, including in dark skin tones. The method was verified by comparison with a trained observer's rating. T&A could overcome factors that compromise whole-face rPPG. This method's performance in estimating VS is currently being assessed. TRIAL REGISTRATION ClinicalTrials.gov NCT04763746; https://clinicaltrials.gov/ct2/show/NCT04763746.
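Lifelight's bespoke algorithms are not public, but the general idea of scoring each tile by frequency-domain SNR and weighting its contribution can be sketched generically as follows (the band limits, half-width, and synthetic tiles are assumptions for illustration, not the study's actual SNR-F definition):

```python
import numpy as np

def snr_f(signal, fs, half_width=0.1):
    """Score a tile trace: power near the dominant in-band frequency
    versus the remaining power in the 0.7-4 Hz heart-rate band (dB)."""
    x = signal - np.mean(signal)
    freqs = np.fft.rfftfreq(x.size, 1.0 / fs)
    p = np.abs(np.fft.rfft(x)) ** 2
    hr_band = (freqs >= 0.7) & (freqs <= 4.0)
    f0 = freqs[hr_band][np.argmax(p[hr_band])]        # dominant pulse frequency
    near = hr_band & (np.abs(freqs - f0) <= half_width)
    return 10.0 * np.log10(p[near].sum() / (p[hr_band & ~near].sum() + 1e-12))

def aggregate_tiles(tiles, fs):
    """Fuse tile traces, weighting each by its SNR-F score (linear units)."""
    w = np.array([10.0 ** (snr_f(tr, fs) / 10.0) for tr in tiles])
    w /= w.sum()
    return (w[:, None] * np.asarray(tiles)).sum(axis=0)

# A clean tile should outrank a noisy tile carrying the same 1.2 Hz pulse
rng = np.random.default_rng(0)
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
good = np.sin(2 * np.pi * 1.2 * t) + 0.05 * rng.standard_normal(t.size)
bad = np.sin(2 * np.pi * 1.2 * t) + 1.5 * rng.standard_normal(t.size)
fused = aggregate_tiles([good, bad], fs)
```

SNR-based weighting like this down-weights tiles degraded by movement, shadow, or low pulsatile contrast, which is consistent with the study's finding that aggregation helps most for signals that were previously unusable.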
12
Li B, Jiang W, Peng J, Li X. Deep learning-based remote-photoplethysmography measurement from short-time facial video. Physiol Meas 2022;43. [PMID: 36215976] [DOI: 10.1088/1361-6579/ac98f1]
Abstract
Objective. Efficient non-contact heart rate (HR) measurement from facial video has received much attention in health monitoring. Past methods relied on prior knowledge and unproven hypotheses to extract remote photoplethysmography (rPPG) signals, e.g. manually designed regions of interest (ROIs) and the skin reflection model. Approach. This paper presents a short-time, end-to-end HR estimation framework based on facial features and the temporal relationships of video frames. In the proposed method, a deep 3D multi-scale network with a cross-layer residual structure is designed to construct an autoencoder and extract robust rPPG features. Then, a spatial-temporal fusion mechanism is proposed to help the network focus on features related to rPPG signals. Both shallow and fused 3D spatial-temporal features are distilled to suppress redundant information in complex environments. Finally, a data augmentation strategy is presented to solve the problem of the uneven distribution of HR in existing datasets. Main results. The experimental results on four face-rPPG datasets show that our method outperforms state-of-the-art methods and requires fewer video frames. Compared with the previous best results, the proposed method improves the root mean square error (RMSE) by 5.9%, 3.4% and 21.4% on the OBF dataset (intra-test), COHFACE dataset (intra-test) and UBFC dataset (cross-test), respectively. Significance. Our method achieves good results on diverse datasets (i.e. highly compressed video, low resolution and illumination variation), demonstrating that it can extract stable rPPG signals in a short time.
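The abstract does not spell out the augmentation strategy. One common way to rebalance an uneven HR distribution (an assumption here, not necessarily the authors' exact method) is to time-scale traces or clips so the apparent HR shifts:

```python
import numpy as np

def hr_time_scale(signal, factor):
    """Resample a pulse trace by `factor`; replayed at the original frame
    rate, the apparent heart rate is multiplied by `factor`."""
    n = int(round(signal.size / factor))
    positions = np.linspace(0, signal.size - 1, n)
    return np.interp(positions, np.arange(signal.size), signal)

# A 60-bpm (1.0 Hz) trace time-scaled by 1.25 reads as ~75 bpm
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
trace = np.sin(2 * np.pi * 1.0 * t)
fast = hr_time_scale(trace, 1.25)
freqs = np.fft.rfftfreq(fast.size, 1.0 / fs)
apparent = 60.0 * freqs[np.argmax(np.abs(np.fft.rfft(fast - fast.mean())))]
```

Sampling the scale factor so the resulting HRs cover under-represented bins gives the network a flatter label distribution without collecting new videos.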
Affiliation(s)
- Bin Li
- School of Information Science and Technology, Northwest University, Xi'an, People's Republic of China
- Wei Jiang
- School of Information Science and Technology, Northwest University, Xi'an, People's Republic of China
- Jinye Peng
- School of Information Science and Technology, Northwest University, Xi'an, People's Republic of China
- Xiaobai Li
- Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
|
13
|
Man PK, Cheung KL, Sangsiri N, Shek WJ, Wong KL, Chin JW, Chan TT, So RHY. Blood Pressure Measurement: From Cuff-Based to Contactless Monitoring. Healthcare (Basel) 2022; 10:healthcare10102113. [PMID: 36292560 PMCID: PMC9601911 DOI: 10.3390/healthcare10102113] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2022] [Revised: 09/26/2022] [Accepted: 10/02/2022] [Indexed: 11/04/2022] Open
Abstract
Blood pressure (BP) determines whether a person has hypertension and indicates whether he or she may be at risk of cardiovascular disease. Cuff-based sphygmomanometers have traditionally provided both accuracy and reliability, but they require bulky equipment and relevant skills to obtain precise measurements. BP measurement from photoplethysmography (PPG) signals has become a promising alternative for convenient and unobtrusive BP monitoring. Moreover, recent developments in remote photoplethysmography (rPPG) algorithms have enabled new innovations for contactless BP measurement. This paper illustrates the evolution of BP measurement techniques from the biophysical theory, through the development of contact-based BP measurement from PPG signals, to the modern innovations of contactless BP measurement from rPPG signals. We consolidate knowledge from a diverse background of academic research to highlight the importance of multi-feature analysis for improving measurement accuracy. We conclude with the ongoing challenges, opportunities, and possible future directions in this emerging field of research.
Affiliation(s)
- Ping-Kwan Man
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Kit-Leong Cheung
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Nawapon Sangsiri
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Wilfred Jin Shek
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Biomedical Sciences, King's College London, London WC2R 2LS, UK
- Kwan-Long Wong
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Jing-Wei Chin
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Tsz-Tai Chan
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Richard Hau-Yue So
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
|
14
|
Jaiswal KB, Meenpal T. rPPG-FuseNet: Non-contact heart rate estimation from facial video via RGB/MSR signal fusion. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.104002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
15
|
PERSIST: Improving micro-expression spotting using better feature encodings and multi-scale Gaussian TCN. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03553-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/29/2022]
|
16
|
Heart Rate Measurement Based on 3D Central Difference Convolution with Attention Mechanism. Sensors (Basel) 2022; 22:s22020688. [PMID: 35062649 PMCID: PMC8781886 DOI: 10.3390/s22020688] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/29/2021] [Revised: 01/12/2022] [Accepted: 01/14/2022] [Indexed: 12/04/2022]
Abstract
Remote photoplethysmography (rPPG) is a video-based non-contact heart rate measurement technology. Most existing rPPG methods, however, fail to exploit the spatiotemporal features of the video, which are significant for extracting the rPPG signal. In this paper, we propose a 3D central difference convolutional network (CDCA-rPPGNet) to measure heart rate, with an attention mechanism to combine spatial and temporal features. First, we crop and stitch the regions of interest together using facial landmarks. Next, the high-quality regions of interest are fed to CDCA-rPPGNet, which is based on a central difference convolution that can enhance the spatiotemporal representation and capture rich, relevant temporal context by collecting time-difference information. In addition, we integrate an attention module into the neural network to strengthen its ability to extract channel and spatial features from the video, so as to obtain more accurate rPPG signals. In summary, the three main contributions of this paper are as follows: (1) the proposed network based on central difference convolution can better capture the subtle color changes needed to recover the rPPG signals; (2) the proposed ROI extraction method provides high-quality input to the network; (3) the attention module strengthens the ability of the network to extract features. Extensive experiments are conducted on two public datasets, the PURE dataset and the UBFC-rPPG dataset. Our proposed method achieves 0.46 MAE (bpm), 0.90 RMSE (bpm) and 0.99 Pearson's correlation coefficient (R) on the PURE dataset, and 0.60 MAE (bpm), 1.38 RMSE (bpm) and 0.99 Pearson's correlation coefficient (R) on the UBFC dataset, demonstrating the effectiveness of our proposed approach.
|