1. Shao H, Luo L, Qian J, Yan M, Gao S, Yang J. Video-Based Multiphysiological Disentanglement and Remote Robust Estimation for Respiration. IEEE Trans Neural Netw Learn Syst 2025; 36:8360-8371. [PMID: 39012736] [DOI: 10.1109/tnnls.2024.3424772]
Abstract
Remote noncontact respiratory rate estimation from facial visual information has great research significance, providing valuable priors for health monitoring, clinical diagnosis, and anti-fraud. However, existing studies suffer from disturbances in epidermal specular reflections induced by head movements and facial expressions. Furthermore, diffuse reflections of light in the skin-colored subcutaneous tissue, caused by multiple time-varying physiological signals independent of breathing, are entangled with the respiratory process of interest, confounding current research. To address these issues, this article proposes a novel network for natural-light video-based remote respiration estimation. Specifically, our model consists of a two-stage architecture that progressively implements vital measurements. The first stage adopts an encoder-decoder structure to recharacterize the facial motion frame differences of the input video based on the gradient binary state of the respiratory signal during inspiration and expiration. Then, the obtained generative mapping, which is disentangled from various time-varying interferences and is only linearly related to the respiratory state, is combined with the facial appearance in the second stage. To further improve the robustness of our algorithm, we design a targeted long-term temporal attention module and embed it between the two stages to enhance the network's ability to model the breathing cycle, which spans a very large number of frames, and to mine hidden timing-change clues. We train and validate the proposed network on a series of publicly available respiration estimation datasets, and the experimental results demonstrate its competitiveness against state-of-the-art breathing and physiological prediction frameworks.
2. Liu M, Tang J, Chen Y, Li H, Qi J, Li S, Wang K, Gan J, Wang Y, Chen H. Spiking-PhysFormer: Camera-based remote photoplethysmography with parallel spike-driven transformer. Neural Netw 2025; 185:107128. [PMID: 39817982] [DOI: 10.1016/j.neunet.2025.107128]
Abstract
Artificial neural networks (ANNs) can help camera-based remote photoplethysmography (rPPG) measure cardiac activity and physiological signals from facial videos, such as pulse wave, heart rate, and respiration rate, with better accuracy. However, most existing ANN-based methods require substantial computing resources, which poses challenges for effective deployment on mobile devices. Spiking neural networks (SNNs), on the other hand, hold immense potential for energy-efficient deep learning owing to their binary and event-driven architecture. To the best of our knowledge, we are the first to introduce SNNs into the realm of rPPG, proposing a hybrid neural network (HNN) model, the Spiking-PhysFormer, aimed at reducing power consumption. Specifically, the proposed Spiking-PhysFormer consists of an ANN-based patch embedding block, SNN-based transformer blocks, and an ANN-based predictor head. First, to simplify the transformer block while preserving its capacity to aggregate local and global spatio-temporal features, we design a parallel spike transformer block to replace sequential sub-blocks. Additionally, we propose a simplified spiking self-attention mechanism that omits the value parameter without compromising the model's performance. Experiments conducted on four datasets (PURE, UBFC-rPPG, UBFC-Phys, and MMPD) demonstrate that the proposed model achieves a 10.1% reduction in power consumption compared to PhysFormer. Additionally, the power consumption of the transformer block is reduced by a factor of 12.2, while maintaining performance comparable to PhysFormer and other ANN-based models.
Affiliation(s)
- Yongli Chen
- Beijing Smartchip Microelectronics Technology Co., Ltd, Beijing, China
- Siwei Li
- Tsinghua University, Beijing, China
- Jie Gan
- Beijing Smartchip Microelectronics Technology Co., Ltd, Beijing, China
- Yuntao Wang
- Tsinghua University, Beijing, China; National Key Laboratory of Human Factors Engineering, Beijing, China
- Hong Chen
- Tsinghua University, Beijing, China
3. Wang J, Shan C, Liu Z, Zhou S, Shu M. Physiological Information Preserving Video Compression for rPPG. IEEE J Biomed Health Inform 2025; 29:3563-3575. [PMID: 40030966] [DOI: 10.1109/jbhi.2025.3526837]
Abstract
Remote photoplethysmography (rPPG) has recently attracted much attention due to its non-contact measurement convenience and great potential in health care and computer vision applications. Early rPPG studies were mostly developed on self-collected uncompressed video data, which limited their application in scenarios requiring long-distance real-time video transmission and also hindered the generation of large-scale publicly available benchmark datasets. In recent years, with the popularization of high-definition video and the rise of telemedicine, the pressures of storage and real-time transmission under limited bandwidth have made the compression of rPPG video inevitable. However, video compression can adversely affect rPPG measurements because conventional compression algorithms are not designed to preserve physiological signals. We therefore propose a video compression scheme specifically designed for rPPG applications. The proposed approach consists of three main strategies: 1) facial-ROI-based computational resource reallocation; 2) rPPG-signal-preserving bit resource reallocation; and 3) temporal-domain up- and down-sampling coding. UBFC-rPPG, ECG-Fitness, and a self-collected dataset are used to evaluate the performance of the proposed method. The results demonstrate that the proposed method preserves almost all physiological information after compressing the original video to 1/60 of its original size. The proposed method is expected to promote the development of telemedicine and of deep learning techniques relying on large-scale datasets in the field of rPPG measurement.
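Of the three strategies, the temporal up- and down-sampling coding lends itself to a compact illustration: drop frames before encoding, then interpolate the decoded per-frame signal back to the original rate. The sketch below is a minimal reading of that idea, not the authors' implementation; all function names are ours.

```python
import numpy as np

def temporal_downsample(frames: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th frame before encoding; frames: (T, H, W, C)."""
    return frames[::factor]

def temporal_upsample_trace(trace: np.ndarray, factor: int, out_len: int) -> np.ndarray:
    """Linearly interpolate a decoded per-frame trace back to the original rate."""
    t_coarse = np.arange(trace.shape[0]) * factor  # timestamps of kept frames
    t_fine = np.arange(out_len)                    # original frame timestamps
    return np.interp(t_fine, t_coarse, trace)
```

Because the cardiac pulse occupies a narrow band (roughly 0.7-4 Hz), moderate temporal down-sampling can keep the rPPG band intact while cutting the bit budget.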
4. Yan W, Zhuang J, Chen Y, Zhang Y, Zheng X. MFF-Net: A Lightweight Multi-Frequency Network for Measuring Heart Rhythm from Facial Videos. Sensors (Basel) 2024; 24:7937. [PMID: 39771677] [PMCID: PMC11679567] [DOI: 10.3390/s24247937]
Abstract
Remote photoplethysmography (rPPG) is a useful camera-based health monitoring method that can measure heart rhythm from facial videos. Many well-established deep learning models can provide highly accurate and robust results in measuring heart rate (HR) and heart rate variability (HRV). However, these methods are unable to effectively eliminate illumination variation and motion artifact disturbances, and their substantial computational resource requirements significantly limit their applicability in real-world scenarios. Hence, we propose a lightweight multi-frequency network named MFF-Net to measure heart rhythm from facial videos in a short time. Firstly, we propose a multi-frequency mode signal fusion (MFF) mechanism, which can separate the characteristics of different modes of the original rPPG signals and send them to processors with independent parameters, helping the network recover blood volume pulse (BVP) signals accurately in complex noise environments. In addition, to help the network extract the characteristics of different modal signals effectively, we designed a temporal multiscale convolution module (TMSC-module) and a spectrum self-attention module (SSA-module). The TMSC-module can expand the receptive field of the signal-refining network, obtain more abundant multiscale information, and transmit it to the signal reconstruction network. The SSA-module can help the signal reconstruction network locate the clearly inferior parts in the reconstruction process so as to make better decisions when merging multi-dimensional signals. Finally, to address the over-fitting phenomenon that easily occurs in the network, we propose an over-fitting sampling training scheme to further improve the fitting ability of the network. Comprehensive experiments were conducted on three benchmark datasets, and we estimated HR and HRV based on the BVP signals derived by MFF-Net. Compared with state-of-the-art methods, our approach achieves better performance on both HR and HRV estimation with a lower computational burden. We conclude that the proposed MFF-Net has strong potential for application in many real-world scenarios.
Affiliation(s)
- Wenqin Yan
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Key Laboratory of Information and Automation Technology of Sichuan Province, Chengdu 610065, China
- Jialiang Zhuang
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Yuheng Chen
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Key Laboratory of Information and Automation Technology of Sichuan Province, Chengdu 610065, China
- Yun Zhang
- School of Information Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
- Xiujuan Zheng
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Key Laboratory of Information and Automation Technology of Sichuan Province, Chengdu 610065, China
5. Wang J, Wei X, Lu H, Chen Y, He D. ConDiff-rPPG: Robust Remote Physiological Measurement to Heterogeneous Occlusions. IEEE J Biomed Health Inform 2024; 28:7090-7102. [PMID: 39052463] [DOI: 10.1109/jbhi.2024.3433461]
Abstract
Remote photoplethysmography (rPPG) is a contactless technique that facilitates the measurement of physiological signals and cardiac activity through facial video recordings, and it holds tremendous potential for various applications. However, existing rPPG methods often do not account for the different types of occlusions that commonly occur in real-world scenarios, such as transient movements or actions of people in the video or dust on the camera. Failure to address these occlusions can compromise the accuracy of rPPG algorithms. To address this issue, we propose ConDiff-rPPG to improve the robustness of rPPG measurement under various occlusions. First, we compress the damaged face video into a spatio-temporal representation with several types of masks. Second, a diffusion model is designed to recover the missing information, with the observed values as a condition. Moreover, a novel low-rank decomposition regularization is proposed to eliminate background noise and maximize informative features. ConDiff-rPPG ensures consistency of optimization goals during the training process. Through extensive experiments, including intra- and cross-dataset evaluations as well as ablation tests, we demonstrate the robustness and generalization ability of our proposed model.
6. Zhang L, Ren J, Zhao S, Wu P. MDAR: A Multiscale Features-Based Network for Remotely Measuring Human Heart Rate Utilizing Dual-Branch Architecture and Alternating Frame Shifts in Facial Videos. Sensors (Basel) 2024; 24:6791. [PMID: 39517688] [PMCID: PMC11548444] [DOI: 10.3390/s24216791]
Abstract
Remote photoplethysmography (rPPG) refers to a non-contact technique that measures heart rate by analyzing the subtle signal changes of facial blood flow captured by video sensors. It is widely used in contactless medical monitoring, remote health management, and activity monitoring, providing a more convenient and non-invasive way to monitor heart health. However, factors such as ambient light variations, facial movements, and differences in light absorption and reflection pose challenges to deep learning-based methods. To address these difficulties, we propose a heart rate measurement network based on multiscale features. In this study, we designed and implemented a dual-branch signal processing framework that combines static and dynamic features, proposing a novel and efficient feature fusion method that enhances the robustness and reliability of the signal. Furthermore, we proposed an alternating time-shift module to enhance the model's temporal depth. To integrate the features extracted at different scales, we utilized a multiscale feature fusion method, enabling the model to accurately capture subtle changes in blood flow. We conducted cross-validation on three public datasets: UBFC-rPPG, PURE, and MMPD. The results demonstrate that MDAR not only ensures fast inference speed but also significantly improves performance. The two main indicators, MAE and MAPE, achieved improvements of at least 30.6% and 30.2%, respectively, surpassing state-of-the-art methods. These conclusions highlight the potential advantages of MDAR for practical applications.
Affiliation(s)
- Linhua Zhang
- Department of Computer Engineering, Taiyuan Institute of Technology, Taiyuan 030008, China
- School of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China
- Jinchang Ren
- School of Computing, Engineering and Technology, Robert Gordon University, Aberdeen AB10 7QB, UK
- Shuang Zhao
- School of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China
- Peng Wu
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
7. Li J, Peng J. End-to-End Multimodal Emotion Recognition Based on Facial Expressions and Remote Photoplethysmography Signals. IEEE J Biomed Health Inform 2024; 28:6054-6063. [PMID: 39024092] [DOI: 10.1109/jbhi.2024.3430310]
Abstract
Emotion is a complex physiological phenomenon, and a single modality may be insufficient for accurately determining human emotional states. This paper proposes an end-to-end multimodal emotion recognition method based on facial expressions and non-contact physiological signals. Facial expression features and remote photoplethysmography (rPPG) signals are extracted from facial video data, and a transformer-based cross-modal attention mechanism (TCMA) is used to learn the correlation between the two modalities. The results show that the accuracy of emotion recognition can be slightly improved by combining facial expressions with accurate rPPG signals. Performance improves further with TCMA, whose binary classification accuracy for valence and arousal reaches 91.11% and 90.00%, respectively. Moreover, when experiments are conducted on the whole dataset, using TCMA for modal fusion yields accuracy gains of 7.31% and 4.23% for the binary classification of valence and arousal, and of 5.36% for the four-class valence-arousal classification, compared to using the facial expression modality alone, which fully demonstrates the effectiveness and robustness of TCMA. This method makes it possible to realize multimodal emotion recognition from facial expressions and contactless physiological signals in real-world settings.
8. Nguyen N, Nguyen L, Li H, Bordallo López M, Álvarez Casado C. Evaluation of video-based rPPG in challenging environments: Artifact mitigation and network resilience. Comput Biol Med 2024; 179:108873. [PMID: 39053334] [DOI: 10.1016/j.compbiomed.2024.108873]
Abstract
Video-based remote photoplethysmography (rPPG) has emerged as a promising technology for non-contact vital sign monitoring, especially under controlled conditions. However, accurate measurement of vital signs in real-world scenarios faces several challenges, including artifacts induced by video codecs, low-light noise, degradation, low dynamic range, occlusions, and hardware and network constraints. In this article, a systematic and comprehensive investigation of these issues is conducted, measuring their detrimental effects on the quality of rPPG measurements. Additionally, practical strategies are proposed for mitigating these challenges to improve the dependability and resilience of video-based rPPG systems. Methods for effective biosignal recovery in the presence of network limitations are detailed, along with denoising and inpainting techniques aimed at preserving video frame integrity. Compared to previous studies, this paper addresses a broader range of variables and demonstrates improved accuracy across various rPPG methods, emphasizing generalizability for practical applications in diverse scenarios with varying data quality. Extensive evaluations and direct comparisons demonstrate the effectiveness of these approaches in enhancing rPPG measurements under challenging environments, contributing to the development of more reliable and effective remote vital sign monitoring technologies.
Affiliation(s)
- Nhi Nguyen
- Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland
- Le Nguyen
- Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland
- Honghan Li
- Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland; Division of Bioengineering, Graduate School of Engineering Science, Osaka University, Osaka, Japan
- Miguel Bordallo López
- Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland; VTT Technical Research Center of Finland Ltd., Oulu, Finland
9. Cao M, Cheng X, Liu X, Jiang Y, Yu H, Shi J. ST-Phys: Unsupervised Spatio-Temporal Contrastive Remote Physiological Measurement. IEEE J Biomed Health Inform 2024; 28:4613-4624. [PMID: 38743531] [DOI: 10.1109/jbhi.2024.3400869]
Abstract
Remote photoplethysmography (rPPG) is a non-contact method that employs facial videos for measuring physiological parameters. Existing rPPG methods have achieved remarkable performance; however, this success mainly profits from supervised learning over massive labeled data. Existing unsupervised rPPG methods, on the other hand, fail to fully utilize spatio-temporal features and encounter challenges in low-light or noisy environments. To address these problems, we propose an unsupervised contrastive learning approach, ST-Phys. We incorporate a low-light enhancement module, a temporal dilated module, and a spatial enhanced module to better deal with long-term dependencies under random low-light conditions. In addition, we design a circular margin loss, wherein rPPG signals originating from identical videos are attracted, while those from distinct videos are repelled. Our method is assessed on six openly accessible datasets, including RGB and NIR videos. Extensive experiments reveal the superior performance of our proposed ST-Phys over state-of-the-art unsupervised rPPG methods. Moreover, it offers advantages in parameter reduction and noise robustness.
10. Chen W, Yi Z, Lim LJR, Lim RQR, Zhang A, Qian Z, Huang J, He J, Liu B. Deep learning and remote photoplethysmography powered advancements in contactless physiological measurement. Front Bioeng Biotechnol 2024; 12:1420100. [PMID: 39104628] [PMCID: PMC11298756] [DOI: 10.3389/fbioe.2024.1420100]
Abstract
In recent decades, there has been ongoing development in the application of computer vision (CV) in the medical field. As conventional contact-based physiological measurement techniques often restrict a patient's mobility in the clinical environment, the ability to achieve continuous, comfortable, and convenient monitoring is a topic of interest to researchers. One type of CV application is remote imaging photoplethysmography (rPPG), which can predict vital signs from a video or image. While contactless physiological measurement techniques have excellent application prospects, the lack of uniformity or standardization of contactless vital monitoring methods limits their application in remote healthcare/telehealth settings. Several methods have been developed to address this limitation and to handle the heterogeneity of video signals caused by movement, lighting, and equipment. The underlying algorithms include optimized traditional methods and emerging deep learning (DL) algorithms. This article aims to provide an in-depth review of current artificial intelligence (AI) methods using CV and DL in contactless physiological measurement and a comprehensive summary of the latest developments in contactless measurement techniques for skin perfusion, respiratory rate, blood oxygen saturation, heart rate, heart rate variability, and blood pressure.
Affiliation(s)
- Wei Chen
- Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Zhe Yi
- Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Lincoln Jian Rong Lim
- Department of Medical Imaging, Western Health, Footscray Hospital, Footscray, VIC, Australia
- Department of Surgery, The University of Melbourne, Melbourne, VIC, Australia
- Rebecca Qian Ru Lim
- Department of Hand & Reconstructive Microsurgery, Singapore General Hospital, Singapore, Singapore
- Aijie Zhang
- Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Zhen Qian
- Institute of Intelligent Diagnostics, Beijing United-Imaging Research Institute of Intelligent Imaging, Beijing, China
- Jiaxing Huang
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Jia He
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Bo Liu
- Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Beijing Research Institute of Traumatology and Orthopaedics, Beijing, China
11. Cheng CH, Yuen Z, Chen S, Wong KL, Chin JW, Chan TT, So RHY. Contactless Blood Oxygen Saturation Estimation from Facial Videos Using Deep Learning. Bioengineering (Basel) 2024; 11:251. [PMID: 38534525] [DOI: 10.3390/bioengineering11030251]
Abstract
Blood oxygen saturation (SpO2) is an essential physiological parameter for evaluating a person's health. While conventional SpO2 measurement devices such as pulse oximeters require skin contact, advanced computer vision technology can enable remote SpO2 monitoring through a regular camera without skin contact. In this paper, we propose novel deep learning models to measure SpO2 remotely from facial videos and evaluate them on a public benchmark database, VIPL-HR. We utilize a spatial-temporal representation to encode the SpO2 information recorded by conventional RGB cameras and pass it directly into selected convolutional neural networks to predict SpO2. The best deep learning model achieves a mean absolute error of 1.274% and a root mean squared error of 1.71%, well within the international standard of 4% for an approved pulse oximeter. Our results significantly outperform the conventional analytical Ratio-of-Ratios model for contactless SpO2 measurement. Results of sensitivity analyses of the influence of spatial-temporal representation color spaces, subject scenarios, acquisition devices, and SpO2 ranges on model performance are reported, with explainability analyses to provide more insights for this emerging research field.
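For context, the Ratio-of-Ratios baseline that the deep models are compared against reduces to a one-line computation once per-channel AC and DC components are estimated. A minimal sketch follows; approximating AC by the standard deviation and using default calibration constants are our illustrative assumptions, not values from the paper.

```python
import numpy as np

def ratio_of_ratios_spo2(red: np.ndarray, blue: np.ndarray,
                         a: float = 100.0, b: float = 5.0) -> float:
    """Classic Ratio-of-Ratios SpO2 estimate from two camera-channel traces.

    AC is approximated by each trace's standard deviation, DC by its mean;
    SpO2 = a - b * R with R = (AC_red/DC_red) / (AC_blue/DC_blue).
    The constants a, b require device-specific calibration (placeholders here).
    """
    r = (red.std() / red.mean()) / (blue.std() / blue.mean())
    return a - b * r
```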
Affiliation(s)
- Chun-Hong Cheng
- Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, UK
- Zhikun Yuen
- Department of Computer Science, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Shutao Chen
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Kwan-Long Wong
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Jing-Wei Chin
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Tsz-Tai Chan
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Richard H Y So
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Industrial Engineering and Decision Analytics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
12. Yang Z, Wang H, Liu B, Lu F. cbPPGGAN: A Generic Enhancement Framework for Unpaired Pulse Waveforms in Camera-Based Photoplethysmography. IEEE J Biomed Health Inform 2024; 28:598-608. [PMID: 37695961] [DOI: 10.1109/jbhi.2023.3314282]
Abstract
Camera-based photoplethysmography (cbPPG) is a non-contact technique that measures cardiac-related blood volume alterations in skin surface vessels through the analysis of facial videos. While traditional approaches can estimate heart rate (HR) under different illuminations, their accuracy can be affected by motion artifacts, leading to poor waveform fidelity and hindering further analysis of heart rate variability (HRV); deep learning-based approaches reconstruct high-quality pulse waveforms, yet their performance degrades significantly under illumination variations. In this work, we aim to leverage the strengths of both and propose a framework that possesses favorable generalization capability while maintaining waveform fidelity. To this end, we propose cbPPGGAN, an enhancement framework for cbPPG that enables the flexible incorporation of both unpaired and paired data sources in the training process. Based on the waveforms extracted by traditional approaches, cbPPGGAN reconstructs high-quality waveforms that enable accurate HR estimation and HRV analysis. In addition, to address the lack of paired training data in real-world applications, we propose a cycle consistency loss that guarantees time-frequency consistency before and after mapping. The method enhances the waveform quality of traditional POS approaches in different-illumination tests (BH-rPPG) and cross-dataset tests (UBFC-rPPG), with mean absolute error (MAE) values of 1.34 bpm and 1.65 bpm and average beat-to-beat (AVBB) values of 27.46 ms and 45.28 ms, respectively. Experimental results demonstrate that cbPPGGAN enhances cbPPG signal quality and outperforms state-of-the-art approaches in HR estimation and HRV analysis. The proposed framework opens a new pathway toward accurate HR estimation in unconstrained environments.
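The cycle consistency idea is concrete enough to sketch: a waveform mapped through both generators should match the original in time and in spectrum. A hedged PyTorch illustration follows, assuming L1 penalties on the waveform and on FFT magnitudes; the paper's exact formulation and weighting may differ.

```python
import torch

def time_frequency_cycle_loss(x: torch.Tensor, x_cycled: torch.Tensor,
                              alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    """Cycle loss between a pulse batch x and its reconstruction G(F(x)).

    x, x_cycled: (batch, length). Penalizes both the time-domain error and the
    difference of FFT magnitude spectra, enforcing time-frequency consistency.
    """
    time_term = torch.mean(torch.abs(x - x_cycled))
    spec_x = torch.fft.rfft(x, dim=-1).abs()
    spec_c = torch.fft.rfft(x_cycled, dim=-1).abs()
    return alpha * time_term + beta * torch.mean(torch.abs(spec_x - spec_c))
```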
13. Zhao C, Zhou M, Zhao Z, Huang B, Rao B. Learning Spatio-Temporal Pulse Representation With Global-Local Interaction and Supervision for Remote Prediction of Heart Rate. IEEE J Biomed Health Inform 2024; 28:609-620. [PMID: 37028087] [DOI: 10.1109/jbhi.2023.3252091]
Abstract
Recent studies have demonstrated the benefit of extracting and fusing pulse signals from multi-scale regions of interest (ROIs). However, these methods suffer from heavy computational load. This paper aims to utilize multi-scale rPPG features effectively with a more compact architecture. Inspired by recent work exploring two-path architectures that leverage global and local information with a bidirectional bridge in between, this paper designs a novel architecture, the Global-Local Interaction and Supervision Network (GLISNet), which uses a local path to learn representations at the original scale and a global path to learn representations at another scale, capturing multi-scale information. A lightweight rPPG signal generation block is attached to the output of each path, mapping the pulse representation to the pulse output. A hybrid loss function enables the local and global representations to learn directly from the training data. Extensive experiments are conducted on two publicly available datasets, and the results demonstrate that GLISNet achieves superior performance in terms of signal-to-noise ratio (SNR), mean absolute error (MAE), and root mean squared error (RMSE). In terms of SNR, GLISNet achieves an increase of 4.41% compared with the second-best algorithm, PhysNet, on the PURE dataset. The MAE decreases by 13.16% compared with the second-best algorithm, DeeprPPG, on the UBFC-rPPG dataset, and the RMSE decreases by 26.29% compared with the second-best algorithm, PhysNet, on the UBFC-rPPG dataset. Experiments on the MIHR dataset demonstrate the robustness of GLISNet under low-light environments.
14. Casado CA, Lopez MB. Face2PPG: An Unsupervised Pipeline for Blood Volume Pulse Extraction From Faces. IEEE J Biomed Health Inform 2023; 27:5530-5541. [PMID: 37610907] [DOI: 10.1109/jbhi.2023.3307942]
Abstract
Photoplethysmography (PPG) signals have become a key technology in many fields, such as medicine, well-being, and sports. Our work proposes a set of pipelines to extract remote PPG signals (rPPG) from the face robustly, reliably, and configurably. We identify and evaluate the possible choices in the critical steps of unsupervised rPPG methodologies. We assess a state-of-the-art processing pipeline on six different datasets, incorporating important corrections in the methodology that ensure reproducible and fair comparisons. In addition, we extend the pipeline with three novel ideas: 1) a new method to stabilize the detected face based on a rigid mesh normalization; 2) a new method to dynamically select the facial regions that provide the best raw signals; and 3) a new RGB-to-rPPG transformation method, called Orthogonal Matrix Image Transformation (OMIT), based on QR decomposition, which increases robustness against compression artifacts. We show that all three changes introduce noticeable improvements in retrieving rPPG signals from faces, obtaining state-of-the-art results compared with unsupervised, non-learning-based methodologies and, on some databases, coming very close to supervised, learning-based methods. We perform a comparative study to quantify the contribution of each proposed idea. In addition, we present a series of observations that could help in future implementations.
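Of the three ideas, OMIT is compact enough to sketch from the description alone: QR-factorize the normalized RGB traces, treat the first orthonormal vector as the dominant skin-tone/intensity direction, and project it out. This is our reading of the method, not the authors' reference code; details such as component selection are assumptions.

```python
import numpy as np

def omit_pulse(rgb: np.ndarray) -> np.ndarray:
    """Sketch of OMIT on mean RGB traces of shape (3, T), T >= 3."""
    x = rgb / rgb.mean(axis=1, keepdims=True)  # temporally normalize channels
    q, _ = np.linalg.qr(x)                     # orthonormal basis of channel space
    s = q[:, :1]                               # dominant (intensity) direction
    p = np.eye(3) - s @ s.T                    # projector onto its complement
    y = p @ x                                  # traces with that direction removed
    return y[np.argmax(y.var(axis=1))]         # keep the most pulsatile component
```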
15. Liu X, Sun Z, Li X, Song R, Yang X. VidBP: Detecting Blood Pressure from Facial Videos with Personalized Calibration. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-5. [PMID: 38083294] [DOI: 10.1109/embc40787.2023.10340996]
Abstract
Recent studies have found that blood volume pulse (BVP) in facial videos contains features highly correlated with blood pressure (BP). However, the mapping from BVP features to BP varies from person to person. To address this issue, VidBP has been proposed as a BP detector that can be calibrated on an individual's data. VidBP is pre-trained on a large dataset to extract BP-related features from BVP. Then, BVP samples and BP labels of an individual are fed into the pre-trained VidBP to create a personal dictionary of BP-related features. When estimating the individual's BP, the current BP-related feature is compared to the features saved in the dictionary, and the BP labels of the most similar features are taken as the BP estimate. The performance of VidBP was evaluated on 640 samples from 16 subjects, and it demonstrated significantly lower BP estimation errors compared to state-of-the-art methods. The personalized calibration of VidBP is a significant advantage, enabling it to better capture the unique mapping from BVP features to BP for each individual. Clinical relevance: This study reports a feasible method to estimate BP from facial videos, providing a convenient and cost-effective way for home BP monitoring.
16. Yu Z, Qin Y, Li X, Zhao C, Lei Z, Zhao G. Deep Learning for Face Anti-Spoofing: A Survey. IEEE Trans Pattern Anal Mach Intell 2023; 45:5609-5631. [PMID: 36260579] [DOI: 10.1109/tpami.2022.3215850]
Abstract
Face anti-spoofing (FAS) has lately attracted increasing attention due to its vital role in securing face recognition systems from presentation attacks (PAs). As more and more realistic PAs of novel types spring up, early-stage FAS methods based on handcrafted features have become unreliable due to their limited representation capacity. With the emergence of large-scale academic datasets in the recent decade, deep learning based FAS has achieved remarkable performance and dominates this area. However, existing reviews in this field mainly focus on handcrafted features, which are outdated and uninspiring for the progress of the FAS community. In this paper, to stimulate future research, we present the first comprehensive review of recent advances in deep learning based FAS. It covers several novel and insightful components: 1) besides supervision with binary labels (e.g., '0' for bonafide versus '1' for PAs), we also investigate recent methods with pixel-wise supervision (e.g., pseudo depth maps); 2) in addition to traditional intra-dataset evaluation, we collect and analyze the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial RGB cameras, we summarize deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors. We conclude this survey by emphasizing current open issues and highlighting potential prospects.
17. Yu Z, Shen Y, Shi J, Zhao H, Cui Y, Zhang J, Torr P, Zhao G. PhysFormer++: Facial Video-Based Physiological Measurement with SlowFast Temporal Difference Transformer. Int J Comput Vis 2023. [DOI: 10.1007/s11263-023-01758-1]
Abstract
Remote photoplethysmography (rPPG), which aims at measuring heart activities and physiological signals from facial video without any contact, has great potential in many applications (e.g., remote healthcare and affective computing). Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect the long-range spatio-temporal perception and interaction needed for rPPG modeling. In this paper, we propose two end-to-end video transformer based architectures, namely PhysFormer and PhysFormer++, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement. As key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal difference guided global attention, and then refine the local spatio-temporal representation against interference. To better exploit the temporal contextual and periodic rPPG clues, we also extend PhysFormer to the two-pathway SlowFast based PhysFormer++ with temporal difference periodic and cross-attention transformers. Furthermore, we propose label distribution learning and a curriculum learning inspired dynamic constraint in the frequency domain, which provide elaborate supervision for PhysFormer and PhysFormer++ and alleviate overfitting. Comprehensive experiments on four benchmark datasets show our superior performance in both intra- and cross-dataset testing. Unlike most transformer networks, which need pretraining on large-scale datasets, the proposed PhysFormer family can easily be trained from scratch on rPPG datasets, making it promising as a novel transformer baseline for the rPPG community.
18. Qin K, Huang W, Zhang T, Tang S. Machine learning and deep learning for blood pressure prediction: a methodological review from multiple perspectives. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10353-8]
19. Zhang X, Yang C, Yin R, Meng L. An End-to-End Heart Rate Estimation Scheme Using Divided Space-Time Attention. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11097-w]
20. Li B, Jiang W, Peng J, Li X. Deep learning-based remote-photoplethysmography measurement from short-time facial video. Physiol Meas 2022; 43. [PMID: 36215976] [DOI: 10.1088/1361-6579/ac98f1]
Abstract
Objective. Efficient non-contact heart rate (HR) measurement from facial video has received much attention in health monitoring. Past methods relied on prior knowledge and unproven hypotheses to extract remote photoplethysmography (rPPG) signals, e.g., manually designed regions of interest (ROIs) and the skin reflection model. Approach. This paper presents a short-time, end-to-end HR estimation framework based on facial features and the temporal relationships of video frames. In the proposed method, a deep 3D multi-scale network with a cross-layer residual structure is designed to construct an autoencoder and extract robust rPPG features. Then, a spatial-temporal fusion mechanism is proposed to help the network focus on features related to rPPG signals. Both shallow and fused 3D spatial-temporal features are distilled to suppress redundant information in complex environments. Finally, a data augmentation strategy is presented to solve the problem of the uneven distribution of HR in existing datasets. Main results. The experimental results on four face-rPPG datasets show that our method outperforms the state-of-the-art methods and requires fewer video frames. Compared with the previous best results, the proposed method improves the root mean square error (RMSE) by 5.9%, 3.4% and 21.4% on the OBF dataset (intra-test), COHFACE dataset (intra-test) and UBFC dataset (cross-test), respectively. Significance. Our method achieves good results on diverse datasets (i.e., highly compressed video, low resolution, and illumination variation), demonstrating that it can extract stable rPPG signals in a short time.
Affiliation(s)
- Bin Li
- School of Information Science and Technology, Northwest University, Xi'an, People's Republic of China
- Wei Jiang
- School of Information Science and Technology, Northwest University, Xi'an, People's Republic of China
- Jinye Peng
- School of Information Science and Technology, Northwest University, Xi'an, People's Republic of China
- Xiaobai Li
- Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
21. Kim DY, Cho SY, Lee K, Sohn CB. A Study of Projection-Based Attentive Spatial-Temporal Map for Remote Photoplethysmography Measurement. Bioengineering (Basel) 2022; 9:638. [PMID: 36354549] [PMCID: PMC9687348] [DOI: 10.3390/bioengineering9110638]
Abstract
The photoplethysmography (PPG) signal contains various information related to cardiovascular disease (CVD). Remote PPG (rPPG) is a method that can measure a PPG signal from a face image captured with a camera, without a PPG device. Deep learning-based rPPG methods can be classified into three main categories. The first is the 3D CNN approach, which takes a facial video as input and focuses on the spatio-temporal changes in the facial video. The second approach uses a spatio-temporal map (STMap), pre-processing the video into a form in which changes in blood flow are easier to analyze in temporal order. The last approach uses a preprocessing model based on a dichromatic reflection model (DRM). This study proposes the concept of an axis projection network (APNET) that complements the drawbacks of each: the 3D CNN method requires significant memory, the STMap method requires a preprocessing step, and the DRM method does not learn long-term temporal characteristics. We also show that the proposed APNET effectively reduces network memory size and that a low-frequency signal is observed in the inferred PPG signal, suggesting that it can provide meaningful guidance for rPPG algorithm development.
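The STMap preprocessing mentioned above is easy to illustrate: split the face crop into blocks, average each block per frame and channel, and stack the traces into a 2D map that a 2D network can consume. A minimal sketch follows; the grid size and row ordering are our choices, not the paper's.

```python
import numpy as np

def build_stmap(frames: np.ndarray, grid: int = 5) -> np.ndarray:
    """Build a spatio-temporal map from a cropped face clip.

    frames: (T, H, W, 3) float array. Returns a (grid*grid*3, T) map in which
    each row is the time series of one block/channel mean.
    """
    t, h, w, _ = frames.shape
    rows = []
    for i in range(grid):
        for j in range(grid):
            block = frames[:, i * h // grid:(i + 1) * h // grid,
                           j * w // grid:(j + 1) * w // grid, :]
            rows.append(block.mean(axis=(1, 2)))  # (T, 3) per-channel means
    return np.concatenate(rows, axis=1).T         # (grid*grid*3, T)
```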
Affiliation(s)
- Dae-Yeol Kim
- AI Research Team, Tvstorm, Seoul 13875, Korea
- Department of Electronics and Communications Engineering, Kwangwoon University, Seoul 01897, Korea
- Soo-Young Cho
- Department of Information Contents, Kwangwoon University, Seoul 01897, Korea
- Chae-Bong Sohn
- Department of Electronics and Communications Engineering, Kwangwoon University, Seoul 01897, Korea
22. Intelligent Remote Photoplethysmography-Based Methods for Heart Rate Estimation from Face Videos: A Survey. Informatics 2022. [DOI: 10.3390/informatics9030057]
Abstract
Over the last few years, a rich body of research has been conducted on remote vital sign monitoring of the human body. Remote photoplethysmography (rPPG) is a camera-based, unobtrusive technology that allows continuous monitoring of changes in vital signs and thereby helps to diagnose and treat diseases earlier in an effective manner. Recent advances in computer vision and its extensive applications have led to rPPG being in high demand. This paper presents a survey of different remote photoplethysmography methods and investigates all facets of heart rate analysis. We examine the challenges of video-based rPPG methods and survey the recent advancements in the literature. We discuss the gaps within the literature and offer suggestions for future directions.
23. Hu M, Qian F, Wang X, He L, Guo D, Ren F. Robust Heart Rate Estimation With Spatial–Temporal Attention Network From Facial Videos. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2021.3062370]
Affiliation(s)
- Min Hu
- School of Computer and Information, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei, China
- Fei Qian
- School of Computer and Information, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei, China
- Xiaohua Wang
- School of Computer and Information, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei, China
- Lei He
- School of Mathematics, Hefei University of Technology, Hefei, China
- Dong Guo
- School of Computer and Information, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei, China
- Fuji Ren
- Graduate School of Advanced Technology and Science, University of Tokushima, Tokushima, Japan
24. Zheng K, Ci K, Li H, Shao L, Sun G, Liu J, Cui J. Heart rate prediction from facial video with masks using eye location and corrected by convolutional neural networks. Biomed Signal Process Control 2022; 75:103609. [PMID: 35287368] [PMCID: PMC8906658] [DOI: 10.1016/j.bspc.2022.103609]
Abstract
Remote photoplethysmography (rPPG), which aims at measuring heart activity without any contact, has great potential in many applications. The emergence of novel coronavirus pneumonia (COVID-19) has attracted worldwide attention. Contact photoplethysmography (cPPG) methods require the detection equipment to be in contact with the patient, which may accelerate the spread of the epidemic; in the future, non-contact heart rate detection will be an urgent need. However, existing methods for measuring heart rate from facial videos are vulnerable to less-constrained scenarios (e.g., with head movement or when wearing a mask). In this paper, we propose a heart rate detection method based on eye location of the region of interest (ROI) to solve the problem of missing information when wearing masks. In addition, we designed a residual-network-based model to filter outliers, yielding better heart rate measurement accuracy. To validate our method, we also created a mask dataset. The results demonstrate that after using our method to correct the heart rate (HR) value measured with the traditional method, the accuracy reaches 4.65 bpm, an improvement of 0.42 bpm over the uncorrected result.
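The eye-location idea reduces to simple geometry: when a mask hides the lower face, anchor a forehead box to the detected eye centers. The offsets below, proportional to the inter-eye distance, are illustrative assumptions rather than the paper's exact layout.

```python
import numpy as np

def forehead_roi(left_eye, right_eye):
    """Return (x0, y0, x1, y1) of a forehead box anchored to eye centers.

    Usable when a mask occludes the cheeks and nose; image y grows downward,
    so the forehead lies at smaller y than the eyes.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = np.hypot(rx - lx, ry - ly)             # inter-eye distance
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0  # midpoint between the eyes
    return (int(cx - 0.8 * d), int(cy - 1.1 * d),
            int(cx + 0.8 * d), int(cy - 0.3 * d))
```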
Affiliation(s)
- Kun Zheng
- Faculty of Information Technology, Beijing University of Technology, No.100, Pingleyuan, Chaoyang District, Beijing 100124, China
- Kangyi Ci
- Faculty of Information Technology, Beijing University of Technology, No.100, Pingleyuan, Chaoyang District, Beijing 100124, China
- Hui Li
- Faculty of Information Technology, Beijing University of Technology, No.100, Pingleyuan, Chaoyang District, Beijing 100124, China
- Lei Shao
- Department of Investigation, Sichuan Police College, No.186, Longtouguan Road, Jiangyang District, Luzhou, Sichuan 646000, China
- Guangmin Sun
- Faculty of Information Technology, Beijing University of Technology, No.100, Pingleyuan, Chaoyang District, Beijing 100124, China
- Junhua Liu
- Faculty of Information Technology, Beijing University of Technology, No.100, Pingleyuan, Chaoyang District, Beijing 100124, China
- Jinling Cui
- Faculty of Information Technology, Beijing University of Technology, No.100, Pingleyuan, Chaoyang District, Beijing 100124, China
25. PulseNet: A multitask learning network for remote heart rate estimation. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.108048]
26. McDuff D, Hernandez J, Liu X, Wood E, Baltrusaitis T. Using High-Fidelity Avatars to Advance Camera-based Cardiac Pulse Measurement. IEEE Trans Biomed Eng 2022; 69:2646-2656. [PMID: 35171764] [DOI: 10.1109/tbme.2022.3152070]
Abstract
Non-contact physiological measurement has the potential to provide low-cost, non-invasive health monitoring. However, machine vision approaches are often limited by the availability and diversity of annotated video datasets resulting in poor generalization to complex real-life conditions. To address these challenges, this work proposes the use of synthetic avatars that display facial blood flow changes and allow for systematic generation of samples under a wide variety of conditions. Our results show that training on both simulated and real video data can lead to performance gains under challenging conditions. We show strong performance on three large benchmark datasets and improved robustness to skin type and motion. These results highlight the promise of synthetic data for training camera-based pulse measurement; however, further research and validation is needed to establish whether synthetic data alone could be sufficient for training models.
27. Liu X, Yang X, Wang D, Wong A, Ma L, Li L. VidAF: A Motion-Robust Model for Screening Atrial Fibrillation from Facial Videos. IEEE J Biomed Health Inform 2021; 26:1672-1683. [PMID: 34735349] [DOI: 10.1109/jbhi.2021.3124967]
Abstract
Atrial fibrillation (AF) is the most common arrhythmia, but an estimated 30% of patients with AF are unaware of their condition. The purpose of this work is to design a model for AF screening from facial videos, with a focus on addressing typical motion disturbances in real life, such as head movements and expression changes. This model detects a pulse signal from the skin color changes in a facial video via a convolutional neural network, incorporating a phase-driven attention mechanism to suppress motion signals in the space domain. It then encodes the pulse signal into discriminative features for AF classification with a coding neural network, using a de-noise coding strategy to improve the robustness of the features to motion signals in the time domain. The proposed model was tested on a dataset containing 1200 samples from 100 AF patients and 100 non-AF subjects. Experimental results demonstrated that VidAF had significant robustness to facial motions, predicting clean pulse signals with a mean absolute error of inter-pulse intervals of less than 100 milliseconds. Besides, the model achieved promising performance in AF identification, showing an accuracy of more than 90% in multiple challenging scenarios. VidAF provides a more convenient and cost-effective approach for opportunistic AF screening in the community.
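The inter-pulse-interval figure quoted above can be reproduced with standard peak picking: detect systolic peaks in the predicted pulse and difference their times. A short sketch using SciPy; the minimum peak spacing (here capping rates near 180 bpm) is our heuristic, not the paper's setting.

```python
import numpy as np
from scipy.signal import find_peaks

def inter_pulse_intervals_ms(pulse: np.ndarray, fs: float) -> np.ndarray:
    """Inter-pulse intervals (ms) from a pulse waveform sampled at fs Hz."""
    peaks, _ = find_peaks(pulse, distance=max(1, int(0.33 * fs)))
    return np.diff(peaks) / fs * 1000.0
```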
28. Cheng CH, Wong KL, Chin JW, Chan TT, So RHY. Deep Learning Methods for Remote Heart Rate Measurement: A Review and Future Research Agenda. Sensors (Basel) 2021; 21:6296. [PMID: 34577503] [PMCID: PMC8473186] [DOI: 10.3390/s21186296]
Abstract
Heart rate (HR) is one of the essential vital signs used to indicate the physiological health of the human body. While traditional HR monitors usually require contact with skin, remote photoplethysmography (rPPG) enables contactless HR monitoring by capturing subtle light changes of skin through a video camera. Given the vast potential of this technology in the future of digital healthcare, remote monitoring of physiological signals has gained significant traction in the research community. In recent years, the success of deep learning (DL) methods for image and video analysis has inspired researchers to apply such techniques to various parts of the remote physiological signal extraction pipeline. In this paper, we discuss several recent advances of DL-based methods specifically for remote HR measurement, categorizing them based on model architecture and application. We further detail relevant real-world applications of remote physiological monitoring and summarize various common resources used to accelerate related research progress. Lastly, we analyze the implications of research findings and discuss research gaps to guide future explorations.
Affiliation(s)
- Chun-Hong Cheng
- Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Kwan-Long Wong
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Bioengineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Jing-Wei Chin
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Industrial Engineering and Decision Analytics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Tsz-Tai Chan
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Industrial Engineering and Decision Analytics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Richard H. Y. So
- PanopticAI, Hong Kong Science and Technology Parks, New Territories, Hong Kong, China
- Department of Industrial Engineering and Decision Analytics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
29. Lin JD, Lin HH, Dy J, Chen JC, Tanveer M, Razzak I, Hua KL. Lightweight Face Anti-Spoofing Network for Telehealth Applications. IEEE J Biomed Health Inform 2021; 26:1987-1996. [PMID: 34432642] [DOI: 10.1109/jbhi.2021.3107735]
Abstract
Online healthcare applications have grown more popular over the years. For instance, telehealth is an online healthcare application that allows patients and doctors to schedule consultations, prescribe medication, share medical documents, and monitor health conditions conveniently. Apart from this, telehealth can also be used to store a patient's personal and medical information. Given the amount of sensitive data it stores, security measures are necessary. With its rise in usage due to COVID-19, its usefulness may be undermined if security issues are not addressed. A simple way of making these applications more secure is through user authentication. One of the most common and often used authentications is face recognition. It is convenient and easy to use. However, face recognition systems are not foolproof. They are prone to malicious attacks like printed photos, paper cutouts, replayed videos, and 3D masks. To counter this, multiple face anti-spoofing methods have been proposed. The goal of face anti-spoofing is to differentiate real users (live) from attackers (spoof). Although effective in terms of performance, existing methods use a significant number of parameters, making them resource-heavy and unsuitable for handheld devices. Apart from this, they fail to generalize well to new environments like changes in lighting or background. This paper proposes a lightweight face anti-spoofing framework that does not compromise on performance. A lightweight model is critical for applications like telehealth that run on handheld devices. Our proposed method achieves good performance with the help of an ArcFace Classifier (AC). The AC encourages differentiation between spoof and live samples by drawing clear boundaries between them; with clear boundaries, classification becomes more accurate. We further demonstrate our model's capabilities by comparing the number of parameters, FLOPS, and performance with other state-of-the-art methods.
30. Kurihara K, Sugimura D, Hamamoto T. Non-Contact Heart Rate Estimation via Adaptive RGB/NIR Signal Fusion. IEEE Trans Image Process 2021; 30:6528-6543. [PMID: 34260354] [DOI: 10.1109/tip.2021.3094739]
Abstract
We propose a non-contact heart rate (HR) estimation method that is robust to various situations, such as bright, low-light, and varying illumination scenes. We utilize a camera that records red, green, and blue (RGB) and near-infrared (NIR) information to capture the subtle skin color changes induced by the cardiac pulse of a person. The key novelty of our method is the adaptive fusion of RGB and NIR signals for HR estimation based on the analysis of background illumination variations. RGB signals are suitable indicators for HR estimation in bright scenes. Conversely, NIR signals are more reliable than RGB signals in scenes with more complex illumination, as they can be captured independently of the changes in background illumination. By measuring the correlations between the lights reflected from the background and facial regions, we adaptively utilize RGB and NIR observations for HR estimation. The experiments demonstrate the effectiveness of the proposed method.
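The fusion rule admits a toy illustration: a channel whose facial trace correlates strongly with the background trace is likely dominated by illumination change and should be down-weighted. The weighting scheme below is our simplification of that idea, not the authors' estimator.

```python
import numpy as np

def fuse_rgb_nir(face_rgb: np.ndarray, face_nir: np.ndarray,
                 bg_rgb: np.ndarray, bg_nir: np.ndarray) -> np.ndarray:
    """Fuse two 1-D pulse candidates, trusting the one whose facial trace is
    less correlated with its background illumination trace."""
    def weight(face, bg):
        rho = abs(np.corrcoef(face, bg)[0, 1])  # contamination proxy in [0, 1]
        return 1.0 - rho
    w_rgb, w_nir = weight(face_rgb, bg_rgb), weight(face_nir, bg_nir)
    return (w_rgb * face_rgb + w_nir * face_nir) / (w_rgb + w_nir)
```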
31. Song R, Chen H, Cheng J, Li C, Liu Y, Chen X. PulseGAN: Learning to Generate Realistic Pulse Waveforms in Remote Photoplethysmography. IEEE J Biomed Health Inform 2021; 25:1373-1384. [PMID: 33434140] [DOI: 10.1109/jbhi.2021.3051176]
Abstract
Remote photoplethysmography (rPPG) is a non-contact technique for measuring cardiac signals from facial videos. High-quality rPPG pulse signals are urgently demanded in many fields, such as health monitoring and emotion recognition. However, most of the existing rPPG methods can only be used to get average heart rate (HR) values due to the limitation of inaccurate pulse signals. In this paper, a new framework based on generative adversarial network, called PulseGAN, is introduced to generate realistic rPPG pulse signals through denoising the chrominance (CHROM) signals. Considering that the cardiac signal is quasi-periodic and has apparent time-frequency characteristics, the error losses defined in time and spectrum domains are both employed with the adversarial loss to enforce the model generating accurate pulse waveforms as its reference. The proposed framework is tested on three public databases. The results show that the PulseGAN framework can effectively improve the waveform quality, thereby enhancing the accuracy of HR, the interbeat interval (IBI) and the related heart rate variability (HRV) features. The proposed method significantly improves the quality of waveforms compared to the input CHROM signals, with the mean absolute error of AVNN (the average of all normal-to-normal intervals) reduced by 41.19%, 40.45%, 41.63%, and the mean absolute error of SDNN (the standard deviation of all NN intervals) reduced by 37.53%, 44.29%, 58.41%, in the cross-database test on the UBFC-RPPG, PURE, and MAHNOB-HCI databases, respectively. This framework can be easily integrated with other existing rPPG methods to further improve the quality of waveforms, thereby obtaining more reliable IBI features and extending the application scope of rPPG techniques.
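For reference, the CHROM chrominance signals that PulseGAN denoises come from de Haan and Jeanne's 2013 projection, which is compact enough to sketch (band-pass filtering and windowing are omitted for brevity):

```python
import numpy as np

def chrom_pulse(rgb: np.ndarray) -> np.ndarray:
    """CHROM rPPG baseline from mean RGB traces of shape (T, 3).

    Channels are temporally normalized, projected onto two chrominance axes,
    and recombined with an alpha that suppresses motion-induced distortion.
    """
    c = rgb / rgb.mean(axis=0, keepdims=True)  # per-channel normalization
    r, g, b = c[:, 0], c[:, 1], c[:, 2]
    xs = 3.0 * r - 2.0 * g                     # chrominance axis 1
    ys = 1.5 * r + g - 1.5 * b                 # chrominance axis 2
    alpha = xs.std() / ys.std()                # balance the two projections
    return xs - alpha * ys
```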