1. Christakos P, Petrellis N, Mousouliotis P, Keramidas G, Antonopoulos CP, Voros N. A High Performance and Robust FPGA Implementation of a Driver State Monitoring Application. Sensors (Basel). 2023;23:6344. [PMID: 37514638; PMCID: PMC10383104; DOI: 10.3390/s23146344]
Abstract
A high-performance Driver State Monitoring (DSM) application for the detection of driver drowsiness is presented in this paper. The popular Ensemble of Regression Trees (ERT) machine learning method has been employed for the alignment of 68 facial landmarks. An open-source implementation of ERTs for facial shape alignment has been ported to different platforms and adapted to accelerate frame processing using reconfigurable hardware. Reducing the frame processing latency saves time that can be used to apply frame-to-frame facial shape coherency rules. False face detections and false shape estimations can be discarded for higher robustness and accuracy in the operation of the DSM application, without sacrificing the frame processing rate, which can reach 65 frames per second. The sensitivity and precision in yawning recognition reach 93% and 97%, respectively. Implementing the employed DSM algorithm in reconfigurable hardware is challenging, since the kernel arguments require large data transfers and the degree of data reuse in the computational kernel is low. Hence, unconventional hardware acceleration techniques have been employed; these can also be useful for accelerating several other machine learning applications that require large data transfers to their kernels with low reusability.
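For orientation, the ERT method named here is the regression-tree alignment popularized by Kazemi and Sullivan, available for example through dlib's pretrained 68-landmark shape predictor (the abstract does not say which open-source implementation was ported). A minimal CPU-only sketch of such an alignment step, with an illustrative mouth-aspect-ratio yawning test whose threshold is a placeholder rather than the paper's tuned value:

```python
# Sketch of ERT-based 68-landmark alignment with a simple yawning check.
# Assumes dlib's pretrained shape_predictor_68_face_landmarks.dat model;
# the MAR threshold is illustrative, not the paper's value.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mouth_aspect_ratio(pts: np.ndarray) -> float:
    # Inner-mouth landmarks 60-67 in the 68-point scheme:
    # vertical lip gap over horizontal mouth width.
    vertical = np.linalg.norm(pts[62] - pts[66])
    horizontal = np.linalg.norm(pts[60] - pts[64])
    return vertical / horizontal

def is_yawning(gray_frame: np.ndarray, mar_threshold: float = 0.6) -> bool:
    for face in detector(gray_frame):
        shape = predictor(gray_frame, face)
        pts = np.array([[p.x, p.y] for p in shape.parts()])
        if mouth_aspect_ratio(pts) > mar_threshold:
            return True
    return False
```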
Affiliation(s)
- P Christakos: Electrical and Computer Engineering, University of Peloponnese, 263 34 Patras, Greece
- N Petrellis: Electrical and Computer Engineering, University of Peloponnese, 263 34 Patras, Greece
- P Mousouliotis: Electrical and Computer Engineering, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece
- G Keramidas: Computer Science, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece
- C P Antonopoulos: Electrical and Computer Engineering, University of Peloponnese, 263 34 Patras, Greece
- N Voros: Electrical and Computer Engineering, University of Peloponnese, 263 34 Patras, Greece
2. Aldridge CM, McDonald MM, Wruble M, Zhuang Y, Uribe O, McMurry TL, Lin I, Pitchford H, Schneider BJ, Dalrymple WA, Carrera JF, Chapman S, Worrall BB, Rohde GK, Southerland AM. Human vs. Machine Learning Based Detection of Facial Weakness Using Video Analysis. Front Neurol. 2022;13:878282. [PMID: 35847210; PMCID: PMC9284117; DOI: 10.3389/fneur.2022.878282]
Abstract
Background: Current EMS stroke screening tools facilitate early detection and triage, but their accuracy and reliability are limited and highly variable. An automated stroke screening tool could improve stroke outcomes by facilitating more accurate prehospital diagnosis and delivery. We hypothesize that a machine learning algorithm using video analysis can detect common signs of stroke. As a proof-of-concept study, we trained a computer algorithm to detect the presence and laterality of facial weakness in publicly available videos with accuracy, sensitivity, and specificity comparable to paramedics.
Methods and Results: We curated videos of people with unilateral facial weakness (n = 93) and with a normal smile (n = 96) from publicly available web-based sources. Three board-certified vascular neurologists categorized the videos according to the presence or absence of weakness and its laterality. Three paramedics independently analyzed each video with a mean accuracy, sensitivity, and specificity of 92.6% [95% CI 90.1–94.7%], 87.8% [95% CI 83.9–91.7%], and 99.3% [95% CI 98.2–100%]. Using a 5-fold cross-validation scheme, we trained a computer vision algorithm to analyze the same videos, producing an accuracy, sensitivity, and specificity of 88.9% [95% CI 83.5–93%], 90.3% [95% CI 82.4–95.5%], and 87.5% [95% CI 79.2–93.4%].
Conclusions: These preliminary results suggest that a machine learning algorithm using computer vision analysis can detect unilateral facial weakness in pre-recorded videos with an accuracy and sensitivity comparable to trained paramedics. Further research is warranted to pursue the concept of augmented facial weakness detection and to externally validate this algorithm in independent data sets and prospective patient encounters.
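For concreteness, a minimal sketch of the kind of stratified 5-fold evaluation described, assuming per-video feature vectors X and binary weakness labels y as numpy arrays; the SVC classifier is a placeholder, not the paper's actual computer-vision pipeline:

```python
# Hedged sketch: 5-fold cross-validated accuracy/sensitivity/specificity
# for a binary facial-weakness classifier. X, y, and the SVC choice are
# placeholders, not the paper's actual features or model.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

def cross_validated_metrics(X, y, n_splits=5, seed=0):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs, sens, specs = [], [], []
    for train_idx, test_idx in skf.split(X, y):
        clf = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
        y_pred = clf.predict(X[test_idx])
        tn, fp, fn, tp = confusion_matrix(y[test_idx], y_pred).ravel()
        accs.append((tp + tn) / (tp + tn + fp + fn))
        sens.append(tp / (tp + fn))   # sensitivity: recall on positives
        specs.append(tn / (tn + fp))  # specificity: recall on negatives
    return np.mean(accs), np.mean(sens), np.mean(specs)
```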
Affiliation(s)
- Chad M. Aldridge: Department of Neurology, University of Virginia, Charlottesville, VA, United States (corresponding author)
- Mark M. McDonald: Department of Neurology, University of Virginia, Charlottesville, VA, United States
- Mattia Wruble: Department of Neurology, University of Virginia, Charlottesville, VA, United States
- Yan Zhuang: Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, United States
- Omar Uribe: Department of Neurology, University of Virginia, Charlottesville, VA, United States
- Timothy L. McMurry: Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States
- Iris Lin: Department of Neurology, University of Pittsburgh, Pittsburgh, PA, United States
- Haydon Pitchford: Department of Neurology, University of Virginia, Charlottesville, VA, United States
- Brett J. Schneider: Department of Neurology, University of Virginia, Charlottesville, VA, United States
- William A. Dalrymple: Department of Neurology, University of Virginia, Charlottesville, VA, United States
- Joseph F. Carrera: Department of Neurology, University of Michigan, Ann Arbor, MI, United States
- Sherita Chapman: Department of Neurology, University of Virginia, Charlottesville, VA, United States
- Bradford B. Worrall: Departments of Neurology and Public Health Sciences, University of Virginia, Charlottesville, VA, United States
- Gustavo K. Rohde: Departments of Electrical and Computer Engineering and Biomedical Engineering, University of Virginia, Charlottesville, VA, United States
- Andrew M. Southerland: Departments of Neurology and Public Health Sciences, University of Virginia, Charlottesville, VA, United States
3. Wang Y, Huang L, Yee AL. Full-convolution Siamese network algorithm under deep learning used in tracking of facial video image in newborns. J Supercomput. 2022;78:14343-14361. [PMID: 35382385; PMCID: PMC8972989; DOI: 10.1007/s11227-022-04439-x]
Abstract
This study explored the full-convolution Siamese network (SiamFC) for neonatal facial video tracking, with the aim of enabling accurate recognition of neonatal pain and helping doctors evaluate neonatal emotions automatically. Because current technology shows low accuracy in facial image recognition of newborns, the deep-learning-based SiamFC algorithm was optimized in this study. A newborn facial video image tracking model (FVIT model) was constructed by combining the SiamFC algorithm with an attention mechanism and a face tracking algorithm, and the facial features of newborns were tracked and recognized. In addition, a newborn face database was constructed, based on an adult face database, to evaluate the performance of the FVIT model. The accuracy of the improved algorithm is 0.889, higher by 0.036 than other models, and the area under the curve (AUC) of the success rate reaches 0.748, higher by 0.075 than other algorithms. Moreover, the improved algorithm performs well in tracking under facial occlusion, facial expression changes, and scale changes of newborns. The improved algorithm therefore shows higher accuracy and success rate and is effective at capturing and tracking facial images of newborns, providing an experimental basis for later facial recognition and pain assessment of newborns.
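SiamFC, the starting point here, scores candidate locations by cross-correlating the embedding of an exemplar (template) patch against the embedding of a larger search region. A minimal sketch of that scoring step, with a tiny placeholder backbone standing in for the paper's optimized network and attention mechanism:

```python
# Hedged sketch of the SiamFC scoring step: embed exemplar and search
# region with a shared backbone, then use each exemplar embedding as a
# correlation kernel over its own search embedding (batched via a
# grouped convolution). The conv backbone is a placeholder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiamFC(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(  # placeholder embedding network
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
        )

    def forward(self, exemplar, search):
        z = self.backbone(exemplar)   # (B, C, Hz, Wz)
        x = self.backbone(search)     # (B, C, Hx, Wx)
        b, c = z.shape[:2]
        score = F.conv2d(x.reshape(1, b * c, x.shape[2], x.shape[3]),
                         z, groups=b)                 # (1, B, H', W')
        return score.reshape(b, 1, score.shape[2], score.shape[3])

model = SiamFC()
resp = model(torch.randn(2, 3, 127, 127), torch.randn(2, 3, 255, 255))
```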
Affiliation(s)
- Yun Wang: Department of Computer Engineering, Shanxi Polytechnic College, Taiyuan 030006, China
- Lu Huang: Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China
- Austin Lin Yee: Department of Oral Biology, Division of Orthodontics, Harvard School of Dental Medicine, Harvard University, Boston 02115, USA
4. Sun Y, Ren Z, Zheng W. Research on Face Recognition Algorithm Based on Image Processing. Comput Intell Neurosci. 2022;2022:9224203. [PMID: 35341202; PMCID: PMC8956407; DOI: 10.1155/2022/9224203]
Abstract
While network technology brings convenience to daily life, it also exposes endless problems, the most important of which is information security. To improve the security of network information and to detect and identify faces, the method used in this paper improves on the traditional AdaBoost and skin-color methods: AdaBoost detection is performed on the image, which reduces the probability of false detection, and the experiments compare the results of the AdaBoost method, the skin-color method, and the combined skin-color + AdaBoost method. All operations in the KPCA (kernel principal component analysis) and KFDA (kernel Fisher discriminant analysis) algorithms are performed through an inner-product kernel function defined in the original space, so no explicit non-linear mapping function is involved. Combining the null-space method with kernel discriminant analysis improves the ability of discriminant analysis to extract non-linear features, and a secondary extraction of PCA features yields better recognition results than the PCA method alone. This paper also proposes a null-space-based Fisher discriminant analysis method. Experiments show that the null-space-based method makes full use of the useful discriminant information in the null space of the intraclass scatter matrix, which improves the accuracy of face recognition to some extent. With a polynomial kernel function, KPCA has higher recognition ability at d = 0.8, while the recognition rates of KFDA and null-space-based KFDA peak at d = 2; for polynomial kernels, d = 2 is a good general choice.
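A minimal sketch of the kernel-PCA feature-extraction step with a polynomial kernel, using scikit-learn; the KFDA and null-space variants discussed in the paper are not part of scikit-learn and are omitted, and the 1-NN classifier is a placeholder:

```python
# Hedged sketch: KPCA feature extraction with a degree-2 polynomial
# kernel, followed by a placeholder 1-NN classifier on the projected
# features. X_train/X_test are flattened face images (n_samples, n_pixels).
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier

def kpca_face_recognition(X_train, y_train, X_test, n_components=100):
    kpca = KernelPCA(n_components=n_components, kernel="poly", degree=2)
    Z_train = kpca.fit_transform(X_train)  # implicit non-linear mapping
    Z_test = kpca.transform(X_test)        # via the inner-product kernel
    clf = KNeighborsClassifier(n_neighbors=1).fit(Z_train, y_train)
    return clf.predict(Z_test)
```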
Affiliation(s)
- Yan Sun: College of Information and Communication Engineering University, Harbin 150001, Heilongjiang, China
- Zhenyun Ren: College of Information and Communication Engineering University, Harbin 150001, Heilongjiang, China
- Wenxi Zheng: College of Information and Communication Engineering University, Harbin 150001, Heilongjiang, China
5. Dong X, Yang Y, Wei SE, Weng X, Sheikh Y, Yu SI. Supervision by Registration and Triangulation for Landmark Detection. IEEE Trans Pattern Anal Mach Intell. 2021;43:3681-3694. [PMID: 32248096; DOI: 10.1109/tpami.2020.2983935]
Abstract
We present supervision by registration and triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors. Being able to utilize unlabeled data enables our detectors to learn from the massive amounts of freely available unlabeled data and not be limited by the quality and quantity of manual human annotations. To utilize unlabeled data, there are two key observations: (i) detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow; (ii) detections of the same landmark in multiple synchronized and geometrically calibrated views should correspond to a single 3D point, i.e., multi-view consistency. Registration and multi-view consistency are sources of supervision that do not require manual labeling, so they can be leveraged to augment existing training data during detector training. End-to-end training is made possible by differentiable registration and 3D triangulation modules. Experiments with 11 datasets and a newly proposed metric for measuring precision demonstrate accuracy and precision improvements in landmark detection on both images and video.
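The registration observation (i) can be sketched as a loss that propagates frame-t detections by optical flow and penalizes disagreement with the frame-(t+1) detections. In the sketch below the flow is treated as a given tensor, whereas in the paper the registration module itself is differentiable; the shapes and sampling conventions are assumptions:

```python
# Hedged sketch of registration supervision: a landmark detected in
# frame t, propagated by optical flow, should agree with the detection
# in frame t+1.
import torch
import torch.nn.functional as F

def registration_loss(landmarks_t, landmarks_t1, flow_t):
    """
    landmarks_t, landmarks_t1: (B, N, 2) detected (x, y) coordinates.
    flow_t: (B, 2, H, W) forward optical flow from frame t to t+1.
    """
    B, N, _ = landmarks_t.shape
    H, W = flow_t.shape[2:]
    # Normalize landmark coordinates to [-1, 1] for grid_sample.
    grid = landmarks_t.clone()
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    # Sample the flow vector at each detected landmark location.
    sampled = F.grid_sample(flow_t, grid.view(B, N, 1, 2),
                            align_corners=True)       # (B, 2, N, 1)
    flow_at_lm = sampled.squeeze(-1).permute(0, 2, 1)  # (B, N, 2)
    propagated = landmarks_t + flow_at_lm
    # Coherency: propagated detections should match frame-(t+1) ones.
    return (propagated - landmarks_t1).norm(dim=-1).mean()
```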
6.
7. Montaño-Serrano VM, Jacinto-Villegas JM, Vilchis-González AH, Portillo-Rodríguez O. Artificial Vision Algorithms for Socially Assistive Robot Applications: A Review of the Literature. Sensors (Basel). 2021;21:5728. [PMID: 34502617; PMCID: PMC8433764; DOI: 10.3390/s21175728]
Abstract
Today, computer vision algorithms are very important in many fields and applications, such as closed-circuit television security, health status monitoring, recognition of specific people or objects, and robotics. Accordingly, this paper presents a recent review of the literature on computer vision algorithms (recognition and tracking of faces, bodies, and objects) oriented towards socially assistive robot applications. The performance, frames-per-second (FPS) processing speed, and hardware used to run the algorithms are highlighted by comparing the available solutions. Moreover, this paper provides general information for researchers interested in knowing which vision algorithms are available, enabling them to select the one most suitable for inclusion in their robotic system applications.
Affiliation(s)
- Victor Manuel Montaño-Serrano: Facultad de Ingeniería, Universidad Autónoma del Estado de México, Toluca 50130, Mexico
- Juan Manuel Jacinto-Villegas: Facultad de Ingeniería, Universidad Autónoma del Estado de México, Toluca 50130, Mexico; Cátedras CONACYT, Ciudad de México 03940, Mexico
- A. H. Vilchis-González: Facultad de Ingeniería, Universidad Autónoma del Estado de México, Toluca 50130, Mexico
- Otniel Portillo-Rodríguez: Facultad de Ingeniería, Universidad Autónoma del Estado de México, Toluca 50130, Mexico
8. Zhuang Y, McDonald MM, Aldridge CM, Hassan MA, Uribe O, Arteaga D, Southerland AM, Rohde GK. Video-Based Facial Weakness Analysis. IEEE Trans Biomed Eng. 2021;68:2698-2705. [PMID: 33406036; DOI: 10.1109/tbme.2021.3049739]
Abstract
OBJECTIVE: Facial weakness is a common sign of neurological diseases such as Bell's palsy and stroke, but recognizing it remains a challenge because doing so requires experience and neurological training. METHODS: We propose a framework for facial weakness detection that models the temporal dynamics of both shape- and appearance-based features of each target frame through a bi-directional long short-term memory network (Bi-LSTM). The system is evaluated on an "in-the-wild" video dataset verified by three board-certified neurologists; three emergency medical services (EMS) personnel and three upper-level residents also rated the dataset. We compare the proposed algorithm against other methods as well as the human raters. RESULTS: Experimental evaluation demonstrates that: (1) the proposed algorithm achieves an accuracy, sensitivity, and specificity of 94.3%, 91.4%, and 95.7%, outperforming the comparison methods and matching the performance of paramedics; (2) the framework provides visualizable and interpretable results that increase model transparency; (3) a prototype implemented as a proof-of-concept showcases the feasibility of an inexpensive solution for facial weakness detection. CONCLUSION: The experimental results suggest that the proposed framework can identify facial weakness effectively. SIGNIFICANCE: We provide a proof-of-concept study showing that such technology could be used by non-neurologists to more readily identify facial weakness in the field, increasing coverage and enabling earlier treatment.
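A minimal sketch of the Bi-LSTM temporal model described, assuming per-frame shape/appearance feature vectors have already been extracted; the feature dimension and hidden size are illustrative, not the paper's settings:

```python
# Hedged sketch of a Bi-LSTM sequence classifier over per-frame
# feature vectors; the feature extraction stage is outside this sketch.
import torch
import torch.nn as nn

class FacialWeaknessBiLSTM(nn.Module):
    def __init__(self, feat_dim=136, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):             # x: (B, T, feat_dim)
        out, _ = self.lstm(x)         # (B, T, 2 * hidden)
        return self.head(out[:, -1])  # classify from the final timestep

model = FacialWeaknessBiLSTM()
logits = model(torch.randn(4, 30, 136))  # 4 clips of 30 frames each
```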
9. Savran A, Bartolozzi C. Face Pose Alignment with Event Cameras. Sensors (Basel). 2020;20:7079. [PMID: 33321842; PMCID: PMC7764104; DOI: 10.3390/s20247079]
Abstract
The event camera (EC) is emerging as a bio-inspired sensor that can serve as an alternative or complementary vision modality, offering energy efficiency, high dynamic range, and high temporal resolution coupled with activity-dependent sparse sensing. In this study we investigate with ECs the problem of face pose alignment, an essential pre-processing stage for facial processing pipelines. EC-based alignment can unlock all of these benefits in facial applications, especially where motion and dynamics carry the most relevant information, since ECs sense temporal change. We specifically aim at efficient processing by developing a coarse alignment method that handles large pose variations in facial applications. For this purpose, we have prepared a dataset of extreme head rotations with varying motion intensity, labeled by multiple human annotators. We propose a motion-detection-based alignment approach that generates activity-dependent pose-events, preventing unnecessary computation in the absence of pose change. The alignment is realized by cascaded regression of extremely randomized trees. Since EC sensors perform temporal differentiation, we characterize the performance of the alignment at different levels of head movement speed and face localization uncertainty, as well as face resolution and predictor complexity. Our method obtained a 2.7% alignment failure rate on average, whereas annotator disagreement was 1%. The promising coarse alignment performance on EC sensor data, together with a comprehensive analysis, demonstrates the potential of ECs in facial applications.
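Cascaded regression, as used here, iteratively refines a pose estimate by regressing an update from features computed at the current estimate. A generic sketch with scikit-learn's extremely randomized trees, where extract_features is an abstract callback standing in for the paper's event-based features:

```python
# Hedged sketch of cascaded regression with extremely randomized trees:
# each stage regresses a pose residual from features computed at the
# current pose estimate. extract_features is a placeholder callback.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

class CascadedPoseRegressor:
    def __init__(self, n_stages=5, n_trees=50):
        self.stages = [ExtraTreesRegressor(n_estimators=n_trees)
                       for _ in range(n_stages)]

    def fit(self, samples, poses_true, pose_init, extract_features):
        pose = np.repeat(pose_init[None], len(samples), axis=0)
        for stage in self.stages:
            feats = np.stack([extract_features(s, p)
                              for s, p in zip(samples, pose)])
            stage.fit(feats, poses_true - pose)  # regress the residual
            pose = pose + stage.predict(feats)
        return self

    def predict(self, samples, pose_init, extract_features):
        pose = np.repeat(pose_init[None], len(samples), axis=0)
        for stage in self.stages:
            feats = np.stack([extract_features(s, p)
                              for s, p in zip(samples, pose)])
            pose = pose + stage.predict(feats)
        return pose
```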
Affiliation(s)
- Arman Savran: Department of Computer Engineering, Yasar University, 35100 Izmir, Turkey
- Chiara Bartolozzi: Event-Driven Perception for Robotics, Istituto Italiano di Tecnologia, 16163 Genova, Italy
10. Liu C, Feng L, Guo S, Wang H, Liu S, Qiao H. An incrementally cascaded broad learning framework to facial landmark tracking. Neurocomputing. 2020. [DOI: 10.1016/j.neucom.2020.05.025]
11. Qi Y, Zhang S, Jiang F, Zhou H, Tao D, Li X. Siamese Local and Global Networks for Robust Face Tracking. IEEE Trans Image Process. 2020;29:9152-9164. [PMID: 32941139; DOI: 10.1109/tip.2020.3023621]
Abstract
Convolutional neural networks (CNNs) have achieved great success in several face-related tasks, such as face detection, alignment, and recognition. As a fundamental problem in computer vision, face tracking plays a crucial role in applications such as video surveillance, human emotion detection, and human-computer interaction. However, few CNN-based approaches have been proposed for face (bounding box) tracking. In this paper, we propose a face tracking method based on Siamese CNNs, which takes advantage of the powerful representations of hierarchical CNN features learned from massive face images. The proposed method captures discriminative face information at both local and global levels. At the local level, representations for attribute patches (i.e., eyes, nose, and mouth) are learned to distinguish one face from another; these are robust to pose changes and occlusions. At the global level, representations for each whole face are learned, which take into account the spatial relationships among local patches and facial characteristics such as skin color and nevi. In addition, we build a new large-scale, challenging face tracking dataset to evaluate face tracking methods and to push research in this field forward. Extensive experiments on the collected dataset demonstrate the effectiveness of our method in comparison to several state-of-the-art visual tracking methods.
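A minimal sketch of the local-plus-global scoring idea: a whole-face embedding and per-patch embeddings (eyes, nose, mouth) are compared between template and candidate, and the similarities are fused. The cosine similarity and fusion weight are assumptions, not the paper's learned components:

```python
# Hedged sketch: fuse a global whole-face similarity with the mean
# similarity of attribute patches. Embeddings come from placeholder
# networks outside this sketch.
import torch
import torch.nn.functional as F

def fused_score(global_emb_t, global_emb_c, local_embs_t, local_embs_c,
                alpha=0.5):
    """
    global_emb_*: (D,) whole-face embeddings (template vs. candidate).
    local_embs_*: (P, D) per-patch embeddings for P attribute patches.
    """
    g = F.cosine_similarity(global_emb_t, global_emb_c, dim=0)
    l = F.cosine_similarity(local_embs_t, local_embs_c, dim=1).mean()
    return alpha * g + (1 - alpha) * l  # weighted local/global fusion
```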
12. Learning from discrete Gaussian label distribution and spatial channel-aware residual attention for head pose estimation. Neurocomputing. 2020. [DOI: 10.1016/j.neucom.2020.05.010]
13. Burt AL, Crewther DP. The 4D Space-Time Dimensions of Facial Perception. Front Psychol. 2020;11:1842. [PMID: 32849084; PMCID: PMC7399249; DOI: 10.3389/fpsyg.2020.01842]
Abstract
Facial information is a powerful channel for human-to-human communication. Characteristically, faces can be defined as biological objects that are four-dimensional (4D) patterns: they concurrently have a spatial structure and surface as well as temporal dynamics. The spatial characteristics of facial objects comprise a volume and surface in three dimensions (3D), namely breadth, height and, importantly, depth. The temporal properties of facial objects are defined by how a 3D facial structure and surface evolve dynamically over time, where time is referred to as the fourth dimension (4D). Our entire perception of another's face, whether social, affective or cognitive, is therefore built on a combination of 3D and 4D visual cues. Counterintuitively, over the past few decades of experimental research in psychology, facial stimuli have largely been captured, reproduced and presented to participants in two dimensions (2D), while remaining largely static. The following review aims to update facial researchers on the recent revolution in computer-generated, realistic 4D facial models produced from real-life human subjects. We summarize in depth recent studies that have utilized facial stimuli possessing 3D structural and surface cues (geometry, surface and depth) and 4D temporal cues (3D structure plus dynamic viewpoint and movement). In sum, we find that higher-order perceptions such as identity, gender, ethnicity, emotion and personality are critically influenced by 4D characteristics. In future, it is recommended that facial stimuli incorporate the 4D space-time perspective with the proposed time-resolved methods.
Affiliation(s)
- Adelaide L Burt: Centre for Human Psychopharmacology, Swinburne University of Technology, Melbourne, VIC, Australia
- David P Crewther: Centre for Human Psychopharmacology, Swinburne University of Technology, Melbourne, VIC, Australia
14. Khurshid A, Scharcanski J. An Adaptive Face Tracker with Application in Yawning Detection. Sensors (Basel). 2020;20:1494. [PMID: 32182814; PMCID: PMC7085723; DOI: 10.3390/s20051494]
Abstract
In this work, we propose an adaptive face tracking scheme that compensates for face tracking errors during its operation. The proposed scheme is equipped with a tracking divergence estimate, which allows face tracking errors to be detected early and minimized, so the tracked face is not missed indefinitely. When the estimated face tracking error increases, a resyncing mechanism based on Constrained Local Models (CLM) is activated to reduce the tracking errors by re-estimating the locations of the tracked facial features (e.g., facial landmarks). To improve the CLM feature search mechanism, a Weighted CLM (W-CLM) is proposed and used in resyncing. The performance of the proposed face tracking method is evaluated in the challenging context of driver monitoring using yawning detection and talking video datasets, and an improvement to a yawning detection scheme is also proposed. Experiments suggest that our face tracking scheme outperforms comparable state-of-the-art face tracking methods and can be successfully applied to yawning detection.
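The control flow of the adaptive scheme can be sketched as track, monitor, resync, where a divergence estimate decides when the CLM re-fit is triggered. Below, track_step, divergence, and clm_refit are abstract callbacks, not the paper's implementations, and the threshold is illustrative:

```python
# Hedged sketch of divergence-triggered resyncing: track landmarks
# frame to frame, and re-fit with a CLM-style model when the estimated
# tracking divergence grows, to stop error accumulation.
def adaptive_track(frames, init_landmarks, track_step, divergence,
                   clm_refit, threshold=2.0):
    landmarks = init_landmarks
    history = []
    for frame in frames:
        landmarks = track_step(frame, landmarks)  # e.g., flow-based update
        if divergence(frame, landmarks) > threshold:
            # Tracking is drifting: re-estimate the landmark locations
            # from scratch with the constrained local model.
            landmarks = clm_refit(frame, landmarks)
        history.append(landmarks)
    return history
```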
Affiliation(s)
- Aasim Khurshid: Sidia Instituto de Ciencia e Tecnologia, Manaus 69055-035, Amazonas, Brazil; Instituto de Informatica, UFRGS, Porto Alegre, Brazil (corresponding author)
15. Zhuang Y, McDonald M, Uribe O, Yin X, Parikh D, Southerland AM, Rohde GK. Facial Weakness Analysis and Quantification of Static Images. IEEE J Biomed Health Inform. 2020;24:2260-2267. [PMID: 31944968; DOI: 10.1109/jbhi.2020.2964520]
Abstract
Facial weakness is a symptom commonly associated with a lack of facial muscle control due to neurological injury, and several diseases, such as stroke and Bell's palsy, are associated with it. The use of digital imaging through mobile phones, tablets, personal computers, and other devices could provide a timely opportunity for detection which, if accurate enough, can improve treatment by enabling faster patient triage and recovery progress monitoring. Most existing approaches for detecting facial weakness from static images are based on facial landmarks from which geometric features can be calculated; landmark-based methods, however, can suffer from inaccuracies in facial landmark localization. In this study, we experimentally evaluate the performance of several feature extraction methods for measuring facial weakness, including landmark-based as well as intensity-based features, on a neurologist-certified dataset that comprises 186 images of normal faces, 125 images of left facial weakness, and 126 images of right facial weakness. We demonstrate that, for facial weakness detection from single (static) images, approaches that incorporate Histogram of Oriented Gradients (HoG) features tend to be more accurate.
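A minimal sketch of HoG feature extraction for this kind of static-image classifier, using scikit-image with illustrative default parameters rather than the paper's settings; the downstream classifier (e.g., a linear SVM) is likewise a placeholder:

```python
# Hedged sketch: HoG features from a grayscale face crop for a
# static-image facial-weakness classifier. Parameters are illustrative.
import numpy as np
from skimage.feature import hog

def hog_features(gray_face_64x64: np.ndarray) -> np.ndarray:
    return hog(gray_face_64x64, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Usage sketch, with face_crops a list of 64x64 grayscale arrays:
#   X = np.stack([hog_features(f) for f in face_crops])
#   then fit any classifier on (X, labels): normal / left / right weakness.
```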
16. A Review of Facial Landmark Extraction in 2D Images and Videos Using Deep Learning. Big Data Cogn Comput. 2019;3:14. [DOI: 10.3390/bdcc3010014]
Abstract
The task of facial landmark extraction is fundamental in several applications involving facial analysis, such as facial expression analysis, identity and face recognition, facial animation, and 3D face reconstruction. With the most recent advances in deep-learning techniques, the performance of facial landmark extraction methods has improved substantially, even on in-the-wild datasets. This article therefore presents an updated survey on facial landmark extraction in 2D images and video, focusing on methods that make use of deep-learning techniques. An analysis comparing the performance of many approaches is provided, along with an analysis of common datasets, challenges, and future research directions.
17. Deng J, Trigeorgis G, Zhou Y, Zafeiriou S. Joint Multi-view Face Alignment in the Wild. IEEE Trans Image Process. 2019;28:3636-3648. [PMID: 30762549; DOI: 10.1109/tip.2019.2899267]
Abstract
The de facto algorithm for facial landmark estimation involves running a face detector and subsequently fitting a deformable model within the bounding box. This encompasses two basic problems: (i) the detection and deformable fitting steps are performed independently, so the detector might not provide the best-suited initialization for the fitting step; (ii) facial appearance varies hugely across poses, which makes deformable face fitting very challenging, so distinct models have to be used (e.g., one for profile and one for frontal faces). In this work, we propose the first, to the best of our knowledge, joint multi-view convolutional network to handle large pose variations across faces in-the-wild, elegantly bridging the face detection and facial landmark localization tasks. Existing joint face detection and landmark localization methods focus only on a very small set of landmarks; by contrast, our method can detect and align a large number of landmarks for semi-frontal (68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a plethora of datasets, including standard static image datasets such as IBUG, 300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile faces. Significant improvement over state-of-the-art methods on deformable face tracking is witnessed on the 300VW benchmark. We also demonstrate state-of-the-art results for face detection on the FDDB and MALF datasets.
18. Deep Affect Prediction in-the-Wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond. Int J Comput Vis. 2019. [DOI: 10.1007/s11263-019-01158-4]
19.

20.
21. Booth J, Roussos A, Ververas E, Antonakos E, Ploumpis S, Panagakis Y, Zafeiriou S. 3D Reconstruction of "In-the-Wild" Faces in Images and Videos. IEEE Trans Pattern Anal Mach Intell. 2018;40:2638-2652. [PMID: 29993707; DOI: 10.1109/tpami.2018.2832138]
Abstract
3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and are among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral and expressive faces; however, all such datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models sufficient to reconstruct faces captured in unconstrained conditions ("in-the-wild"). In this paper, we propose the first "in-the-wild" 3DMM by combining a statistical model of facial identity and expression shape with an "in-the-wild" texture model. We show that such an approach allows for a greatly simplified fitting procedure for images and videos, as there is no need to optimise over the illumination parameters. We have collected three new benchmarks that combine "in-the-wild" images and video with ground-truth 3D facial geometry, the first of their kind, and report extensive quantitative evaluations using them that demonstrate our method is state-of-the-art.
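At the core of any 3DMM, including the one proposed here, is a linear generative model: a face shape is the mean shape plus identity and expression bases weighted by coefficients. A minimal sketch with random placeholder bases rather than a learned model:

```python
# Hedged sketch of the linear 3DMM shape generator. The basis matrices
# are random placeholders; a real model learns them from scan data.
import numpy as np

n_vertices, n_id, n_expr = 5000, 80, 30
mean_shape = np.zeros(3 * n_vertices)             # flattened (x, y, z)
U_id = np.random.randn(3 * n_vertices, n_id)      # identity basis
U_expr = np.random.randn(3 * n_vertices, n_expr)  # expression basis

def generate_shape(alpha, beta):
    """alpha: (n_id,) identity coeffs; beta: (n_expr,) expression coeffs."""
    s = mean_shape + U_id @ alpha + U_expr @ beta
    return s.reshape(n_vertices, 3)

shape = generate_shape(np.zeros(n_id), np.zeros(n_expr))  # mean face
```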
22. Chrysos GG, Zafeiriou S. PD2T: Person-Specific Detection, Deformable Tracking. IEEE Trans Pattern Anal Mach Intell. 2018;40:2555-2568. [PMID: 29990150; DOI: 10.1109/tpami.2017.2769654]
Abstract
Face detection/alignment methods have reached a satisfactory state on static images captured under arbitrary conditions. Such methods typically perform (joint) fitting for each frame and are used in commercial applications; however, in the majority of real-world scenarios, dynamic scenes are of interest. We argue that generic per-frame fitting is suboptimal (it discards the informative correlation of sequential frames) and propose to learn person-specific statistics from the video to improve on the generic results. To that end, we introduce a meticulously studied pipeline, which we name PD2T, that performs person-specific detection and landmark localisation. We carry out extensive experimentation with a diverse set of (i) generic fitting results and (ii) different objects (human faces, animal faces), which illustrates the powerful properties of our proposed pipeline, and we experimentally verify that PD2T outperforms all the compared methods.
23. Chrysos GG, Antonakos E, Zafeiriou S. IPST: Incremental Pictorial Structures for Model-free Tracking of Deformable Objects. IEEE Trans Image Process. 2018;27:3529-3540. [PMID: 29993804; DOI: 10.1109/tip.2018.2816121]
Abstract
Model-free tracking is a well-studied task in computer vision. Typically, a rectangular bounding box containing a single object is provided in the first (few) frame(s), and the method then tracks the object in the remaining frames. However, for deformable objects (e.g., faces, bodies) the single-bounding-box scenario is sub-optimal; a part-based approach is more effective. The current state-of-the-art part-based approach is incrementally trained discriminative Deformable Part Models (DPMs). Nevertheless, training discriminative DPMs with one or a few examples poses a huge challenge. We argue that a generative model is a better fit for the task, and we utilise the powerful pictorial structures, which we augment with incremental updates to account for object adaptations. Our proposed incremental pictorial structures, which we call IPST, are experimentally validated in different scenarios. In thorough experimentation, we demonstrate that IPST outperforms existing model-free methods in facial landmark tracking, body tracking, and animal tracking (the last newly introduced to verify strength in ad hoc cases).
24.