1. Tian H, Gong W, Li W, Qian Y. PASTFNet: a paralleled attention spatio-temporal fusion network for micro-expression recognition. Med Biol Eng Comput 2024; 62:1911-1924. [PMID: 38413518] [DOI: 10.1007/s11517-024-03041-y]
Abstract
Micro-expressions (MEs) play an important role in revealing a person's genuine emotions, which has made micro-expression recognition (MER) an important research focus in recent years. Most recent work recognizes MEs using the spatial and temporal information of video clips. However, because of their short duration and subtle intensity, capturing the spatio-temporal features of micro-expressions remains challenging. To improve recognition performance, this paper presents a novel paralleled dual-branch attention-based spatio-temporal fusion network (PASTFNet). The spatial branch jointly extracts short- and long-range spatial relationships. Inspired by the composite architecture of the convolutional neural network (CNN) and long short-term memory (LSTM) for temporal modeling, we propose a novel attention-based multi-scale feature fusion network (AMFNet) to encode features of sequential frames; by integrating attention with multi-scale feature fusion, it learns more expressive, detailed facial features, and a subsequent aggregation block aggregates them into temporal features. Finally, the features learned by the two branches are fused to perform expression recognition. Experiments on two MER datasets (CASME II and SAMM) show that the PASTFNet model achieves promising ME recognition performance compared with other methods.
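The paper's code is not reproduced here, but the general dual-branch pattern the abstract describes (a spatial branch plus a CNN+LSTM temporal branch whose outputs are fused before classification) can be sketched as follows; the layer sizes, apex-frame choice, and class count are illustrative assumptions, not PASTFNet's actual configuration.

```python
# Minimal sketch (not the authors' code) of a dual-branch spatio-temporal
# fusion classifier: a CNN spatial branch on a single apex frame and a
# CNN+LSTM temporal branch over the frame sequence, fused before the classifier.
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    def __init__(self, num_classes=5, feat_dim=128):
        super().__init__()
        # Shared lightweight CNN encoder: (N, 3, H, W) -> (N, feat_dim)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Temporal branch: LSTM over per-frame CNN features
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        frame_feats = self.encoder(clip.flatten(0, 1)).view(b, t, -1)
        spatial = frame_feats[:, t // 2]          # middle/apex-frame spatial feature
        _, (h_n, _) = self.lstm(frame_feats)      # temporal summary
        fused = torch.cat([spatial, h_n[-1]], dim=1)
        return self.classifier(fused)

logits = DualBranchFusion()(torch.randn(2, 8, 3, 64, 64))  # (2, num_classes)
```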
Affiliation(s)
- Haichen Tian
- School of Information Science and Engineering, Xinjiang University, Urumqi, China
- Weijun Gong
- School of Information Science and Engineering, Xinjiang University, Urumqi, China
- Wei Li
- School of Software, Xinjiang University, Urumqi, China
- Yurong Qian
- School of Information Science and Engineering, Xinjiang University, Urumqi, China
- School of Software, Xinjiang University, Urumqi, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Urumqi, China
2. Tonguç G. Effect of distance education courses held in different environments on emotions of the instructor. PLoS One 2024; 19:e0295935. [PMID: 38277358] [PMCID: PMC10817226] [DOI: 10.1371/journal.pone.0295935]
Abstract
In this study, the emotional states of instructors teaching by distance education, and the effect of the environment in which they taught on those emotions, were examined. A computer-aided Facial Action Coding System method was used to measure emotion values from facial images. Using software developed by the researchers with the Microsoft Face Recognition API, 43,292 facial images captured from five instructors during their lessons were analysed, and seven basic emotions representing facial expressions were quantified. The analysis found that, in lessons held in the e-studio environment, the instructors' negative emotions generally increased at the beginning of the lesson, decreased in the following minutes, and increased again at the end, whereas positive emotions decreased at the beginning of the lesson and increased later. In the home environment, negative emotions decreased at the beginning while positive emotions increased. A significant difference between the home and e-studio environments was found for all emotions except anger. One of the emotions differing between the two environments was happiness, which showed higher values in the home environment; the other emotions were experienced more in the e-studio environment. The results are expected to contribute to understanding the mental states of instructors who teach through distance education and to the efficiency of distance education.
Affiliation(s)
- Güray Tonguç
- Applied Sciences Faculty, Information Management Systems Department, Akdeniz University, Antalya, Turkey
3. Li Y, Huang J, Lu S, Zhang Z, Lu G. Cross-Domain Facial Expression Recognition via Contrastive Warm up and Complexity-Aware Self-Training. IEEE Transactions on Image Processing 2023; 32:5438-5450. [PMID: 37773906] [DOI: 10.1109/tip.2023.3318955]
Abstract
Unsupervised cross-domain Facial Expression Recognition (FER) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Existing methods strive to reduce the discrepancy between the source and target domains, but cannot effectively explore the abundant semantic information of the target domain due to the absence of target labels. To this end, we propose a novel framework via Contrastive Warm up and Complexity-aware Self-Training (namely CWCST), which facilitates source knowledge transfer and target semantic learning jointly. Specifically, we formulate a contrastive warm up strategy via features, momentum features, and learnable category centers to concurrently learn discriminative representations and narrow the domain gap, which benefits domain adaptation by generating more accurate target pseudo labels. Moreover, to deal with the inevitable noise in pseudo labels, we develop complexity-aware self-training with a label selection module based on prediction entropy, which iteratively generates pseudo labels and adaptively chooses reliable ones for training, ultimately yielding effective exploration of target semantics. Furthermore, by jointly using these two components, our framework effectively utilizes source knowledge and target semantic information through source-target co-training. In addition, our framework can be easily incorporated into other baselines with consistent performance improvements. Extensive experimental results on seven databases show the superior performance of the proposed method against various baselines.
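As an illustration of the entropy-based label selection idea described above (not the paper's CWCST implementation), a minimal sketch might look like this; the entropy threshold is an assumed placeholder.

```python
# Minimal sketch (not the paper's code) of entropy-based pseudo-label selection:
# keep only target samples whose predicted class distribution has low entropy.
import torch
import torch.nn.functional as F

def select_pseudo_labels(logits: torch.Tensor, max_entropy: float = 0.5):
    """logits: (N, C) predictions on unlabeled target data; max_entropy is illustrative."""
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=1)  # (N,)
    keep = entropy < max_entropy            # reliable, low-entropy predictions
    pseudo_labels = probs.argmax(dim=1)
    return keep, pseudo_labels

keep, labels = select_pseudo_labels(torch.randn(16, 7))
```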
4. Chen X, Zheng X, Sun K, Liu W, Zhang Y. Self-supervised vision transformer-based few-shot learning for facial expression recognition. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.03.105]
5. Joudeh IO, Cretu AM, Bouchard S, Guimond S. Prediction of Continuous Emotional Measures through Physiological and Visual Data. Sensors (Basel) 2023; 23:5613. [PMID: 37420778] [DOI: 10.3390/s23125613]
Abstract
The affective state of a person can be measured using arousal and valence values. In this article, we contribute to the prediction of arousal and valence values from various data sources. Our goal is to later use such predictive models to adaptively adjust virtual reality (VR) environments and help facilitate cognitive remediation exercises for users with mental health disorders, such as schizophrenia, while avoiding discouragement. Building on our previous work on physiological, electrodermal activity (EDA) and electrocardiogram (ECG) recordings, we propose improving preprocessing and adding novel feature selection and decision fusion processes. We use video recordings as an additional data source for predicting affective states. We implement an innovative solution based on a combination of machine learning models alongside a series of preprocessing steps. We test our approach on RECOLA, a publicly available dataset. The best results are obtained with a concordance correlation coefficient (CCC) of 0.996 for arousal and 0.998 for valence using physiological data. Related work in the literature reported lower CCCs on the same data modality; thus, our approach outperforms the state-of-the-art approaches for RECOLA. Our study underscores the potential of using advanced machine learning techniques with diverse data sources to enhance the personalization of VR environments.
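For reference, the concordance correlation coefficient (CCC) reported above is typically computed with Lin's standard formula; the snippet below is a generic NumPy definition, not the authors' evaluation script.

```python
# Lin's concordance correlation coefficient (CCC), the metric reported above;
# a standard NumPy definition, not the authors' exact evaluation code.
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

print(ccc(np.array([0.1, 0.4, 0.8]), np.array([0.2, 0.35, 0.9])))
```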
Affiliation(s)
- Itaf Omar Joudeh
- Department of Computer Science and Engineering, University of Quebec in Outaouais, Gatineau, QC J8Y 3G5, Canada
- Ana-Maria Cretu
- Department of Computer Science and Engineering, University of Quebec in Outaouais, Gatineau, QC J8Y 3G5, Canada
- Stéphane Bouchard
- Department of Psychoeducation and Psychology, University of Quebec in Outaouais, Gatineau, QC J8X 3X7, Canada
- Synthia Guimond
- Department of Psychoeducation and Psychology, University of Quebec in Outaouais, Gatineau, QC J8X 3X7, Canada
- Department of Psychiatry, The Royal's Institute of Mental Health Research, University of Ottawa, Ottawa, ON K1N 6N5, Canada
6. Zhu X, Sun J, Liu G, Shen C, Dai Z, Zhao L. Hybrid Domain Consistency Constraints-Based Deep Neural Network for Facial Expression Recognition. Sensors (Basel) 2023; 23:5201. [PMID: 37299930] [DOI: 10.3390/s23115201]
Abstract
Facial expression recognition (FER) has received increasing attention. However, multiple factors (e.g., uneven illumination, facial deflection, occlusion, and subjectivity of annotations in image datasets) can reduce the performance of traditional FER methods. Thus, we propose a novel Hybrid Domain Consistency Network (HDCNet) based on a feature constraint method that combines both spatial domain consistency and channel domain consistency. Specifically, first, the proposed HDCNet mines potential attention-consistency features (different from manual features, e.g., HOG and SIFT) as effective supervision information by comparing the original sample image with the augmented facial expression image. Second, HDCNet extracts facial expression-related features in the spatial and channel domains, and then constrains the consistent expression of features through the mixed domain consistency loss function. In addition, the loss function based on the attention-consistency constraints does not require additional labels. Third, the network weights are learned to optimize the classification network through the loss function of the mixed domain consistency constraints. Finally, experiments conducted on the public RAF-DB and AffectNet benchmark datasets verify that the proposed HDCNet improves classification accuracy by 0.3-3.84% compared to existing methods.
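A minimal sketch of the attention-consistency idea (comparing attention maps of an original image and an augmented copy, with no extra labels) is shown below; the backbone, the horizontal-flip augmentation, and the mean-over-channels attention map are illustrative assumptions rather than HDCNet's actual design.

```python
# Minimal sketch (not the HDCNet implementation) of an attention-consistency
# loss: spatial attention maps of an image and its horizontally flipped copy
# should agree once the flipped map is flipped back. No extra labels needed.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

def spatial_attention(feat):                 # (B, C, H, W) -> (B, 1, H, W)
    return torch.sigmoid(feat.mean(dim=1, keepdim=True))

x = torch.randn(4, 3, 64, 64)
att_orig = spatial_attention(backbone(x))
att_flip = spatial_attention(backbone(torch.flip(x, dims=[3])))
consistency_loss = F.mse_loss(att_orig, torch.flip(att_flip, dims=[3]))
```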
Affiliation(s)
- Xiaoliang Zhu
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
- Junyi Sun
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
- Gendong Liu
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
- Chen Shen
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
- Zhicheng Dai
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
- Liang Zhao
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
7. Davodabadi A, Daneshian B, Saati S, Razavyan S. Mathematical model and artificial intelligence for diagnosis of Alzheimer's disease. European Physical Journal Plus 2023; 138:474. [PMID: 37274456] [PMCID: PMC10226030] [DOI: 10.1140/epjp/s13360-023-04128-5]
Abstract
Alzheimer's disease may be characterized by degeneration of the nervous system linked to cognitive deficits, difficulties with activities of daily living, and disturbing behavioral effects. Research on Alzheimer's disease, which develops later in life, focuses on describing ways for the early detection of dementia, a kind of mental disorder. To tailor care to each patient, we used visual cues to determine how patients were feeling, and we outline two approaches to diagnosing a person's mental state. The first technique is the support vector machine: image characteristics are extracted using a fractal model for classification, the histogram of a picture is modeled as a Gaussian distribution, and classification is performed with several support vector machine kernels whose outcomes are compared. The second approach uses a deep convolutional neural network architecture to identify Alzheimer's-related mental disorders. According to the findings, the support vector machine approach correctly recognized over 93% of the images tested. During model training, the deep convolutional neural network approach was 100% accurate, whereas the support vector machine approach achieved just 93%; on the test set, the deep convolutional neural network model was accurate 98.8% of the time, compared with 89.3% for the support vector machine. Based on the findings reported here, the proposed deep convolutional neural network architecture may be used for diagnostic purposes involving the patient's mental state.
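A minimal sketch of the first technique's general shape (Gaussian histogram summaries fed to SVMs with different kernels) is given below, assuming scikit-learn; the toy data, two-dimensional features, and kernel list are illustrative and do not reproduce the authors' fractal-model pipeline.

```python
# Minimal sketch (not the authors' pipeline): summarize each image by the
# Gaussian fit (mean, std) of its intensity histogram and compare SVM kernels.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(40, 64, 64))   # toy grayscale "scans"
labels = np.tile([0, 1], 20)                       # toy AD / control labels

# Gaussian summary of each image's histogram: intensity mean and std.
features = np.stack([[img.mean(), img.std()] for img in images])

for kernel in ("linear", "rbf", "poly"):
    score = cross_val_score(SVC(kernel=kernel), features, labels, cv=5).mean()
    print(kernel, round(score, 3))
```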
Affiliation(s)
- Afsaneh Davodabadi
- Department of Mathematics, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- Behrooz Daneshian
- Department of Mathematics, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- Saber Saati
- Department of Mathematics, North Tehran Branch, Islamic Azad University, Tehran, Iran
- Shabnam Razavyan
- Department of Mathematics, South Tehran Branch, Islamic Azad University, Tehran, Iran
8. Borgalli RA, Surve S. Review on learning framework for facial expression recognition. The Imaging Science Journal 2023. [DOI: 10.1080/13682199.2023.2172526]
Affiliation(s)
- Rohan Appasaheb Borgalli
- Department of Electronics Engineering, Fr. Conceicao Rodrigues College of Engineering, Bandra, University of Mumbai, Mumbai, Maharashtra, India
- Sunil Surve
- Department of Computer Engineering, Fr. Conceicao Rodrigues College of Engineering, Bandra, University of Mumbai, Mumbai, Maharashtra, India
9. Li J, Dong Z, Lu S, Wang SJ, Yan WJ, Ma Y, Liu Y, Huang C, Fu X. CAS(ME)³: A Third Generation Facial Spontaneous Micro-Expression Database With Depth Information and High Ecological Validity. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:2782-2800. [PMID: 35560102] [DOI: 10.1109/tpami.2022.3174895]
Abstract
Micro-expression (ME) is a significant non-verbal communication clue that reveals a person's genuine emotional state. The development of micro-expression analysis (MEA) has only gained attention in the last decade. However, the small sample size problem constrains the use of deep learning for MEA. Besides, ME samples are distributed across six different databases, leading to database bias. Moreover, developing ME databases is complicated. In this article, we introduce a large-scale spontaneous ME database: CAS(ME)³. The contribution of this article is summarized as follows: (1) CAS(ME)³ offers around 80 hours of video with over 8,000,000 frames, including 1,109 manually labeled MEs and 3,490 macro-expressions. Such a large sample size allows effective MEA method validation while avoiding database bias. (2) Inspired by psychological experiments, CAS(ME)³ is the first to provide depth information as an additional modality, contributing to multi-modal MEA. (3) For the first time, CAS(ME)³ elicits MEs with high ecological validity using the mock crime paradigm, along with physiological and voice signals, contributing to practical MEA. (4) Besides, CAS(ME)³ provides 1,508 unlabeled videos with more than 4,000,000 frames, i.e., a data platform for unsupervised MEA methods. (5) Finally, we demonstrate the effectiveness of depth information through the proposed depth flow algorithm and RGB-D information.
10. SoftClusterMix: learning soft boundaries for empirical risk minimization. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08338-x]
11. Xie W, Wang C, Lin Z, Luo X, Chen W, Xu M, Liang L, Liu X, Wang Y, Luo H, Cheng M. Multimodal fusion diagnosis of depression and anxiety based on CNN-LSTM model. Comput Med Imaging Graph 2022; 102:102128. [PMID: 36272311] [DOI: 10.1016/j.compmedimag.2022.102128]
Abstract
BACKGROUND: In recent years, more and more people suffer from depression and anxiety. These symptoms are hard to spot and can be very dangerous. Currently, the Self-Reported Anxiety Scale (SAS) and Self-Reported Depression Scale (SDS) are commonly used for initial screening for depression and anxiety disorders. However, the information contained in these two scales is limited, while the symptoms of subjects are various and complex, which results in inconsistency between the questionnaire evaluation results and the clinician's diagnosis. To fully mine the scale data, we propose a method to extract features from the facial expressions and movements in video recorded while subjects fill in the scales. We then combine the facial expression, movement, and scale information to establish a multimodal framework that improves the accuracy and robustness of the diagnosis of depression and anxiety.
METHODS: We collect the scale results of the subjects and the videos recorded while they fill in the scales. Given the two scales, SAS and SDS, we construct a model with two branches, where each branch processes the multimodal data of SAS and SDS, respectively. In each branch, we first build a convolutional neural network (CNN) to extract the facial expression features in each frame. Second, we establish a long short-term memory (LSTM) network to further embed the facial expression features and build connections between frames, so that movement features in the video can be generated. Third, we transform the scale scores into one-hot format and feed them into the corresponding branch of the network to further mine the information in the multimodal data. Finally, we fuse the embeddings of these two branches to generate inference results for depression and anxiety.
RESULTS AND CONCLUSIONS: Based on the score results of SAS and SDS, our multimodal model further mines the video information and reaches an accuracy of 0.946 in diagnosing depression and anxiety. This study demonstrates the feasibility of using our CNN-LSTM-based multimodal model for initial screening and diagnosis of depression and anxiety disorders with high diagnostic performance.
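A minimal sketch of the fusion step described in METHODS (concatenating a video embedding with a one-hot encoded scale score before classification) could look like the following; all dimensions and the binning of scores are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch (not the paper's code) of the late-fusion step described above:
# a video embedding from the CNN-LSTM branch is concatenated with the one-hot
# encoded questionnaire score before the final classifier. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

video_embedding = torch.randn(8, 128)                  # from the CNN-LSTM branch
sds_scores = torch.tensor([0, 1, 2, 3, 1, 0, 2, 3])    # binned SDS results (toy)
score_onehot = F.one_hot(sds_scores, num_classes=4).float()

classifier = nn.Linear(128 + 4, 2)                     # screened positive vs. negative
logits = classifier(torch.cat([video_embedding, score_onehot], dim=1))
```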
Affiliation(s)
- Wanqing Xie
- Department of Intelligent Medical Engineering, School of Biomedical Engineering, Anhui Medical University, Hefei, China; Department of Psychology, School of Mental Health and Psychological Sciences, Anhui Medical University, Hefei, China; Suzhou Fanhan Information Technology Company, Ltd, Suzhou, China
- Chen Wang
- College of the Mathematical Sciences, Harbin Engineering University, Harbin, China
- Zhixiong Lin
- Department of Psychiatry, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China
- Xudong Luo
- Department of Psychiatry, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China
- Wenqian Chen
- College of the Mathematical Sciences, Harbin Engineering University, Harbin, China
- Manzhu Xu
- Department of Biological Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Lizhong Liang
- Department of Psychiatry, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Xiaofeng Liu
- Suzhou Fanhan Information Technology Company, Ltd, Suzhou, China; Neurology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, USA
- Yanzhong Wang
- School of Population Health & Environmental Sciences, Faculty of Life Science and Medicine, King's College London, London, UK
- Hui Luo
- Marine Biomedical Research Institute of Guangdong Medical University, Zhanjiang 510240, China
- Mingmei Cheng
- Department of Intelligent Medical Engineering, School of Biomedical Engineering, Anhui Medical University, Hefei, China; Department of Psychology, School of Mental Health and Psychological Sciences, Anhui Medical University, Hefei, China
12. Ben X, Ren Y, Zhang J, Wang SJ, Kpalma K, Meng W, Liu YJ. Video-Based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:5826-5846. [PMID: 33739920] [DOI: 10.1109/tpami.2021.3067464]
Abstract
Unlike the conventional facial expressions, micro-expressions are involuntary and transient facial expressions capable of revealing the genuine emotions that people attempt to hide. Therefore, they can provide important information in a broad range of applications such as lie detection, criminal detection, etc. Since micro-expressions are transient and of low intensity, however, their detection and recognition is difficult and relies heavily on expert experiences. Due to its intrinsic particularity and complexity, video-based micro-expression analysis is attractive but challenging, and has recently become an active area of research. Although there have been numerous developments in this area, thus far there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences between macro- and micro-expressions, then use these differences to guide our research survey of video-based micro-expression analysis in a cascaded structure, encompassing the neuropsychological basis, datasets, features, spotting algorithms, recognition algorithms, applications and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are addressed and discussed. Furthermore, after considering the limitations of existing micro-expression datasets, we present and release a new dataset - called micro-and-macro expression warehouse (MMEW) - containing more video samples and more labeled emotion types. We then perform a unified comparison of representative methods on CAS(ME)² for spotting, and on MMEW and SAMM for recognition, respectively. Finally, some potential future research directions are explored and outlined.
13. Facial Emotion Expressions in Human–Robot Interaction: A Survey. Int J Soc Robot 2022. [DOI: 10.1007/s12369-022-00867-0]
Abstract
Facial expressions are an ideal means of communicating one's emotions or intentions to others. This overview focuses on human facial expression recognition as well as robotic facial expression generation. For human facial expression recognition, both recognition on predefined datasets and recognition in real time are covered. For robotic facial expression generation, hand-coded and automated methods are covered, i.e., methods in which the facial expressions of a robot are generated by moving its features (eyes, mouth) either by hand-coding or automatically using machine learning techniques. There are already plenty of studies that achieve high accuracy for emotion expression recognition on predefined datasets, but the accuracy for facial expression recognition in real time is comparatively lower. In the case of expression generation in robots, while most robots are capable of making basic facial expressions, there are not many studies that enable robots to do so automatically. In this overview, state-of-the-art research in facial emotion expressions during human–robot interaction is discussed, leading to several possible directions for future research.
14. A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) Database. Multimodal Technologies and Interaction 2022. [DOI: 10.3390/mti6060047]
Abstract
Multimodal human–computer interaction (HCI) systems pledge a more human–human-like interaction between machines and humans. Their prowess in emanating an unambiguous information exchange between the two makes these systems more reliable, efficient, less error prone, and capable of solving complex tasks. Emotion recognition is a realm of HCI that follows multimodality to achieve accurate and natural results. The prodigious use of affective identification in e-learning, marketing, security, health sciences, etc., has increased demand for high-precision emotion recognition systems. Machine learning (ML) is getting its feet wet to ameliorate the process by tweaking the architectures or wielding high-quality databases (DB). This paper presents a survey of such DBs that are being used to develop multimodal emotion recognition (MER) systems. The survey illustrates the DBs that contain multi-channel data, such as facial expressions, speech, physiological signals, body movements, gestures, and lexical features. Few unimodal DBs are also discussed that work in conjunction with other DBs for affect recognition. Further, VIRI, a new DB of visible and infrared (IR) images of subjects expressing five emotions in an uncontrolled, real-world environment, is presented. A rationale for the superiority of the presented corpus over the existing ones is instituted.
15. Micro-Expression Recognition Based on Optical Flow and PCANet+. Sensors (Basel) 2022; 22:4296. [PMID: 35684917] [PMCID: PMC9185295] [DOI: 10.3390/s22114296]
Abstract
Micro-expressions are rapid and subtle facial movements. Different from ordinary facial expressions in our daily life, micro-expressions are very difficult to detect and recognize. In recent years, due to a wide range of potential applications in many domains, micro-expression recognition has aroused extensive attention from computer vision. Because available micro-expression datasets are very small, deep neural network models with a huge number of parameters are prone to over-fitting. In this article, we propose an OF-PCANet+ method for micro-expression recognition, in which we design a spatiotemporal feature learning strategy based on shallow PCANet+ model, and we incorporate optical flow sequence stacking with the PCANet+ network to learn discriminative spatiotemporal features. We conduct comprehensive experiments on publicly available SMIC and CASME2 datasets. The results show that our lightweight model obviously outperforms popular hand-crafted methods and also achieves comparable performances with deep learning based methods, such as 3D-FCNN and ELRCN.
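The optical-flow stacking step described above can be sketched with OpenCV's dense Farneback flow as follows; this is a generic illustration rather than the OF-PCANet+ implementation, and the Farneback parameters are placeholder values.

```python
# Minimal sketch (not the OF-PCANet+ implementation) of the optical-flow
# stacking step: dense Farneback flow between consecutive frames, stacked
# along the channel axis as input to a downstream feature learner.
import cv2
import numpy as np

def stacked_optical_flow(frames):
    """frames: list of HxW uint8 grayscale frames -> (H, W, 2*(len-1)) array."""
    flows = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)                      # (H, W, 2): horizontal/vertical flow
    return np.concatenate(flows, axis=2)

frames = [np.random.randint(0, 256, (128, 128), dtype=np.uint8) for _ in range(4)]
print(stacked_optical_flow(frames).shape)       # (128, 128, 6)
```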
16. Pons G, Masip D. Multitask, Multilabel, and Multidomain Learning With Convolutional Networks for Emotion Recognition. IEEE Transactions on Cybernetics 2022; 52:4764-4771. [PMID: 33306479] [DOI: 10.1109/tcyb.2020.3036935]
Abstract
Automated emotion recognition in the wild from facial images remains a challenging problem. Although recent advances in deep learning have brought a significant breakthrough in this topic, strong changes in pose, orientation, and point of view severely harm current approaches. In addition, the acquisition of labeled datasets is costly, and the current state-of-the-art deep learning algorithms cannot model all the aforementioned difficulties. In this article, we propose applying a multitask learning loss function to share a common feature representation with other related tasks. Particularly, we show that emotion recognition benefits from jointly learning a model with a detector of facial action units (collective muscle movements). The proposed loss function addresses the problem of learning multiple tasks with heterogeneously labeled data, improving previous multitask approaches. We validate the proposal using three datasets acquired in non-controlled environments, and an application to predict compound facial emotion expressions.
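A minimal sketch of a multitask loss over heterogeneously labeled data (each task's term computed only on samples that carry that task's label) is shown below; the use of -1 as a missing-label marker and the task-head sizes are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's loss) of multitask learning with
# heterogeneously labeled data: each task's loss is only computed on samples
# that actually carry that task's label, signalled here by a -1 placeholder.
import torch
import torch.nn.functional as F

def multitask_loss(emotion_logits, au_logits, emotion_labels, au_labels):
    loss = emotion_logits.new_zeros(())
    has_emotion = emotion_labels >= 0                      # labeled for emotions
    if has_emotion.any():
        loss = loss + F.cross_entropy(emotion_logits[has_emotion],
                                      emotion_labels[has_emotion])
    has_au = (au_labels >= 0).all(dim=1)                   # labeled for action units
    if has_au.any():
        loss = loss + F.binary_cross_entropy_with_logits(
            au_logits[has_au], au_labels[has_au].float())
    return loss

loss = multitask_loss(torch.randn(4, 7), torch.randn(4, 12),
                      torch.tensor([2, -1, 5, -1]),
                      torch.cat([torch.randint(0, 2, (2, 12)),
                                 -torch.ones(2, 12, dtype=torch.long)]))
```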
17. Zhao S, Tang H, Liu S, Zhang Y, Wang H, Xu T, Chen E, Guan C. ME-PLAN: A deep prototypical learning with local attention network for dynamic micro-expression recognition. Neural Netw 2022; 153:427-443. [DOI: 10.1016/j.neunet.2022.06.024]
18. Zhao K, Chen T, Chen L, Fu X, Meng H, Yap MH, Yuan J, Davison AK. Editorial: Facial Expression Recognition and Computing: An Interdisciplinary Perspective. Front Psychol 2022; 13:940630. [PMID: 35712216] [PMCID: PMC9194941] [DOI: 10.3389/fpsyg.2022.940630]
Affiliation(s)
- Ke Zhao
- State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Tong Chen
- School of Electronic and Information Engineering, Southwest University, Chongqing, China
- *Correspondence: Tong Chen
- Liming Chen
- Université de Lyon, CNRS, École Centrale de Lyon, LIRIS UMR 5205, Lyon, France
- Xiaolan Fu
- State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Hongying Meng
- Department of Electronic and Electrical Engineering, Brunel University London, London, United Kingdom
- Moi Hoon Yap
- Department of Computing and Mathematics, Faculty of Science and Engineering, Manchester Metropolitan University, Manchester, United Kingdom
- Jiajin Yuan
- Institute of Brain and Psychological Sciences, Sichuan Normal University, Chengdu, China
- Adrian K. Davison
- Division of Musculoskeletal and Dermatological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, United Kingdom
19. Zheng Q, Wang Y, Hu Z, Zhang X, Wu Z, Pan G. Jointly Optimizing Expressional and Residual Models for 3D Facial Expression Removal. ACM Transactions on Intelligent Systems and Technology 2022. [DOI: 10.1145/3533312]
Abstract
This paper proposes a facial expression removal method to recover a 3D neutral face from a single 3D expressional or non-neutral face. We treat a 3D non-neutral face as the sum of its neutral one and the residual. This can be satisfied if the correspondence between 3D vertices of expressional faces and those of neutral faces is established. We propose a non-rigid deformation method to establish the correspondence between 3D faces. Then, according to an algebraic inequality, the minimization of a neutral face model can be replaced by the minimization of its upper bound, i.e., the errors of an expressional face model and a residual model. Thus, we co-optimize the representation errors of the latter two models and build the relationship between the representation coefficients of the two models. Given an expressional face as the input, its corresponding neutral face can be inferred by the associative representation parameters in these two models. In the testing stage, we use an iterative joint fitting scheme to obtain a more accurate recovery. Extensive experiments are conducted to evaluate our method. The results show that our method obtains considerably better performance than existing methods in terms of average RMS errors and recognition rates, and also better visual effects.
20. Liu W, Zheng WL, Li Z, Wu SY, Gan L, Lu BL. Identifying similarities and differences in emotion recognition with EEG and eye movements among Chinese, German, and French people. J Neural Eng 2022; 19. [PMID: 35272271] [DOI: 10.1088/1741-2552/ac5c8d]
Abstract
OBJECTIVE: Cultures have essential influences on emotions. However, most studies on cultural influences on emotions are in the areas of psychology and neuroscience, while existing affective models are mostly built with data from the same culture. In this paper, we identify the similarities and differences among Chinese, German, and French individuals in emotion recognition with electroencephalogram (EEG) and eye movements from an affective computing perspective.
APPROACH: Three experimental settings were designed: intraculture subject dependent, intraculture subject independent, and cross-culture subject independent. EEG and eye movements were acquired simultaneously from Chinese, German, and French subjects while watching positive, neutral, and negative movie clips. The affective models for Chinese, German, and French subjects were constructed using machine learning algorithms. A systematic analysis was performed from four aspects: affective model performance, neural patterns, complementary information from different modalities, and cross-cultural emotion recognition.
MAIN RESULTS: From emotion recognition accuracies, we find that EEG and eye movements can adapt to Chinese, German, and French cultural diversities and that a cultural in-group advantage phenomenon does exist in emotion recognition with EEG. From the topomaps of EEG, we find that the gamma and beta bands exhibit decreasing activities for Chinese, while for German and French, theta and alpha bands exhibit increasing activities. From confusion matrices and attentional weights, we find that EEG and eye movements have complementary characteristics. From a cross-cultural emotion recognition perspective, we observe that German and French people share more similarities in topographical patterns and attentional weight distributions than Chinese people, while the data from Chinese subjects fit well as test data but are not suitable as training data for the other two cultures.
SIGNIFICANCE: Our experimental results provide concrete evidence of the in-group advantage phenomenon, cultural influences on emotion recognition, and different neural patterns among Chinese, German, and French individuals.
Affiliation(s)
- Wei Liu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, No 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Wei-Long Zheng
- Massachusetts General Hospital, Boston, Massachusetts 02114-2696, United States
- Ziyi Li
- Shanghai Jiao Tong University, No 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Si-Yuan Wu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, No 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Lu Gan
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, No 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Bao-Liang Lu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
21. Churamani N, Barros P, Gunes H, Wermter S. Affect-Driven Learning of Robot Behaviour for Collaborative Human-Robot Interactions. Front Robot AI 2022; 9:717193. [PMID: 35265672] [PMCID: PMC8898942] [DOI: 10.3389/frobt.2022.717193]
Abstract
Collaborative interactions require social robots to share the users’ perspective on the interactions and adapt to the dynamics of their affective behaviour. Yet, current approaches for affective behaviour generation in robots focus on instantaneous perception to generate a one-to-one mapping between observed human expressions and static robot actions. In this paper, we propose a novel framework for affect-driven behaviour generation in social robots. The framework consists of (i) a hybrid neural model for evaluating facial expressions and speech of the users, forming intrinsic affective representations in the robot, (ii) an Affective Core, that employs self-organising neural models to embed behavioural traits like patience and emotional actuation that modulate the robot’s affective appraisal, and (iii) a Reinforcement Learning model that uses the robot’s appraisal to learn interaction behaviour. We investigate the effect of modelling different affective core dispositions on the affective appraisal and use this affective appraisal as the motivation to generate robot behaviours. For evaluation, we conduct a user study (n = 31) where the NICO robot acts as a proposer in the Ultimatum Game. The effect of the robot’s affective core on its negotiation strategy is witnessed by participants, who rank a patient robot with high emotional actuation higher on persistence, while an impatient robot with low emotional actuation is rated higher on its generosity and altruistic behaviour.
Affiliation(s)
- Nikhil Churamani
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- *Correspondence: Nikhil Churamani
- Pablo Barros
- Cognitive Architecture for Collaborative Technologies (CONTACT) Unit, Istituto Italiano di Tecnologia, Genova, Italy
- Hatice Gunes
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Stefan Wermter
- Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany
22. Chen Z, Ansari R, Wilkie DJ. Learning Pain from Action Unit Combinations: A Weakly Supervised Approach via Multiple Instance Learning. IEEE Transactions on Affective Computing 2022; 13:135-146. [PMID: 35242282] [PMCID: PMC8890070] [DOI: 10.1109/taffc.2019.2949314]
Abstract
Patient pain can be detected highly reliably from facial expressions using a set of facial muscle-based action units (AUs) defined by the Facial Action Coding System (FACS). A key characteristic of facial expression of pain is the simultaneous occurrence of pain-related AU combinations, whose automated detection would be highly beneficial for efficient and practical pain monitoring. Existing general Automated Facial Expression Recognition (AFER) systems prove inadequate when applied specifically to detecting pain: they either focus on detecting individual pain-related AUs but not combinations, or they seek to bypass AU detection by training a binary pain classifier directly on pain intensity data but are limited by the lack of enough labeled data for satisfactory training. In this paper, we propose a new approach that mimics the strategy of human coders by decoupling pain detection into two consecutive tasks: one performed at the individual video-frame level and the other at the video-sequence level. Using state-of-the-art AFER tools to detect single AUs at the frame level, we propose two novel data structures to encode AU combinations from single AU scores. Two weakly supervised learning frameworks, namely multiple instance learning (MIL) and multiple clustered instance learning (MCIL), are employed, corresponding to each data structure, to learn pain from video sequences. Experimental results show an 87% pain recognition accuracy with 0.94 AUC (Area Under Curve) on the UNBC-McMaster Shoulder Pain Expression dataset. Tests on long videos in a lung cancer patient video dataset demonstrate the potential value of the proposed system for pain monitoring in clinical settings.
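The bag-level intuition behind multiple instance learning can be sketched as follows; this is a generic MIL max-pooling illustration on per-frame AU score vectors, not the paper's MIL/MCIL models, and all dimensions are illustrative.

```python
# Minimal sketch (not the paper's model) of the multiple-instance-learning idea:
# a video (bag) is scored by its most pain-like frame (instance), so the bag
# logit is a max over per-frame scores computed from AU-combination vectors.
import torch
import torch.nn as nn

class MILPainScorer(nn.Module):
    def __init__(self, num_aus=12):
        super().__init__()
        self.instance_scorer = nn.Linear(num_aus, 1)   # per-frame AU vector -> score

    def forward(self, au_scores):                      # (B, T, num_aus) per-frame AUs
        instance_logits = self.instance_scorer(au_scores).squeeze(-1)  # (B, T)
        return instance_logits.max(dim=1).values        # (B,) bag-level pain logit

bag_logits = MILPainScorer()(torch.rand(2, 30, 12))
probs = torch.sigmoid(bag_logits)                        # per-video pain probability
```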
Affiliation(s)
- Zhanli Chen
- Department of Electrical and Computer Engineering, University of Illinois at Chicago
- Rashid Ansari
- Department of Electrical and Computer Engineering, University of Illinois at Chicago
- Diana J Wilkie
- Department of Biobehavioral Nursing, University of Florida
23.
Abstract
Human emotion recognition is an active research area in artificial intelligence and has made substantial progress over the past few years. Many recent works mainly focus on facial regions to infer human affection, while the surrounding context information is not effectively utilized. In this paper, we propose a new deep network to effectively recognize human emotions using a novel global-local attention mechanism. Our network is designed to extract features from both facial and context regions independently, then learn them together using the attention module. In this way, both facial and contextual information is used to infer human emotions, thereby enhancing the discrimination of the classifier. Intensive experiments show that our method surpasses the current state-of-the-art methods on recent emotion datasets by a fair margin. Qualitatively, our global-local attention module can extract more meaningful attention maps than previous methods. The source code and trained model of our network are available at https://github.com/minhnhatvt/glamor-net.
24.
Abstract
Artificial intelligence is developing rapidly in the direction of intellectualization and humanization. Recent studies have shown the vulnerability of many deep learning models to adversarial examples, but there are fewer studies on adversarial examples attacking facial expression recognition systems. Human–computer interaction requires facial expression recognition, so the security demands of artificial intelligence humanization should be considered. Motivated by this, we explore the characteristics of adversarial examples for facial expression recognition. In this paper, we are the first to study facial expression adversarial examples (FEAEs); we propose an adversarial attack method on facial expression recognition systems, a novel measurement method for the adversarial hardness of FEAEs, and two evaluation metrics for FEAE transferability. The experimental results illustrate that our approach is superior to other gradient-based attack methods. We find that FEAEs can attack not only facial expression recognition systems but also face recognition systems, and that the transferability and adversarial hardness of FEAEs can be measured effectively and accurately.
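For context, a standard gradient-based baseline of the kind the abstract compares against is the fast gradient sign method (FGSM); the sketch below is that baseline, not the paper's proposed FEAE attack, and the toy classifier and epsilon are illustrative.

```python
# Minimal sketch of a standard gradient-based attack baseline (FGSM), one of
# the attack families compared against above; this is not the paper's proposed
# FEAE method, and the epsilon value is illustrative.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, images, labels, epsilon=8 / 255):
    images = images.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()   # one signed-gradient step
    return adversarial.clamp(0, 1).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 48 * 48, 7))  # toy FER classifier
adv = fgsm_attack(model, torch.rand(4, 3, 48, 48), torch.tensor([0, 1, 2, 3]))
```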
Affiliation(s)
- Yudao Sun
- School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, P. R. China
- Chunhua Wu
- School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, P. R. China
25. Han Z, Huang H. GAN Based Three-Stage-Training Algorithm for Multi-view Facial Expression Recognition. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10591-x]
26. A novel approach for facial expression recognition based on Gabor filters and genetic algorithm. Evolving Systems 2021. [DOI: 10.1007/s12530-021-09393-2]
27. Xie W, Liang L, Lu Y, Wang C, Shen J, Luo H, Liu X. Interpreting Depression From Question-wise Long-term Video Recording of SDS Evaluation. IEEE J Biomed Health Inform 2021; 26:865-875. [PMID: 34170837] [DOI: 10.1109/jbhi.2021.3092628]
Abstract
The Self-Rating Depression Scale (SDS) questionnaire has frequently been used for efficient preliminary depression screening. However, this uncontrolled self-administered measure can easily be affected by careless or deceptive answering, producing results that differ from the clinician-administered Hamilton Depression Rating Scale (HDRS) and the final diagnosis. Clinically, facial expression (FE) and actions play a vital role in clinician-administered evaluations, while FE and action are underexplored in self-administered evaluations. In this work, we collect a novel dataset of 200 subjects to examine the validity of self-rating questionnaires using their corresponding question-wise video recordings. To automatically interpret depression from the SDS evaluation and the paired video, we propose an end-to-end hierarchical framework for the long-term variable-length video, which is also conditioned on the questionnaire results and the answering time. Specifically, we resort to a hierarchical model which utilizes a 3D CNN for local temporal pattern exploration and a redundancy-aware self-attention (RAS) scheme for question-wise global feature aggregation. Targeting redundant long-term FE video processing, our RAS effectively exploits the correlations of each video clip within a question set to emphasize the discriminative information and eliminate redundancy based on feature pair-wise affinity. Then, the question-wise video feature is concatenated with the questionnaire scores for final depression detection. Our thorough evaluations also show the validity of fusing the SDS evaluation and its video recording, and the superiority of our framework over conventional state-of-the-art temporal modeling methods.
28. Deep transfer learning in human–robot interaction for cognitive and physical rehabilitation purposes. Pattern Anal Appl 2021. [DOI: 10.1007/s10044-021-00988-8]
29. Niinuma K, Onal Ertugrul I, Cohn JF, Jeni LA. Systematic Evaluation of Design Choices for Deep Facial Action Coding Across Pose. Frontiers in Computer Science 2021. [DOI: 10.3389/fcomp.2021.636094]
Abstract
The performance of automated facial expression coding is improving steadily. Advances in deep learning techniques have been key to this success. While the advantage of modern deep learning techniques is clear, the contribution of critical design choices remains largely unknown, especially for facial action unit occurrence and intensity across pose. Using the Facial Expression Recognition and Analysis 2017 (FERA 2017) database, which provides a common protocol to evaluate robustness to pose variation, we systematically evaluated design choices in pre-training, feature alignment, model size selection, and optimizer details. Informed by the findings, we developed an architecture that exceeds the state of the art on FERA 2017. The architecture achieved a 3.5% increase in F1 score for occurrence detection and a 5.8% increase in Intraclass Correlation (ICC) for intensity estimation. To evaluate the generalizability of the architecture to unseen poses and new dataset domains, we performed experiments across pose in FERA 2017 and across domains in the Denver Intensity of Spontaneous Facial Action (DISFA) dataset and the UNBC Pain Archive.
30. Fu Y, Ruan Q, Luo Z, An G, Jin Y. Orthogonal tucker decomposition using factor priors for 2D+3D facial expression recognition. IET Biometrics 2021. [DOI: 10.1049/bme2.12035]
Affiliation(s)
- Yunfang Fu
- Institute of Information Science, Beijing Jiaotong University, Beijing, China
- School of Computer Science & Engineering, Shijiazhuang University, Shijiazhuang, China
- Beijing Key Laboratory of Information Science and Network Technology, Beijing, China
- Qiuqi Ruan
- Institute of Information Science, Beijing Jiaotong University, Beijing, China
- Beijing Key Laboratory of Information Science and Network Technology, Beijing, China
- Ziyan Luo
- Department of Mathematics, Beijing Jiaotong University, Beijing, China
- Gaoyun An
- Institute of Information Science, Beijing Jiaotong University, Beijing, China
- Beijing Key Laboratory of Information Science and Network Technology, Beijing, China
- Yi Jin
- Institute of Information Science, Beijing Jiaotong University, Beijing, China
- Beijing Key Laboratory of Information Science and Network Technology, Beijing, China
31. Context-Aware Emotion Recognition in the Wild Using Spatio-Temporal and Temporal-Pyramid Models. Sensors (Basel) 2021; 21:2344. [PMID: 33801739] [PMCID: PMC8036494] [DOI: 10.3390/s21072344]
Abstract
Emotion recognition plays an important role in human–computer interactions. Recent studies have focused on video emotion recognition in the wild and have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches. However, such methods have difficulties in exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a multi-modal flexible system for video-based emotion recognition in the wild. Our system tracks and votes on significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is that it proposes the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term dependencies in temporal information: a temporal-pyramid model and a spatiotemporal model with a “Conv2D+LSTM+3DCNN+Classify” architecture. Finally, we propose the best selection ensemble to improve the accuracy of multi-modal fusion. The best selection ensemble selects the best combination from the spatiotemporal and temporal-pyramid models to achieve the best accuracy for classifying the seven basic emotions. In our experiments, we benchmark the system on the AFEW dataset and achieve high accuracy.
32. Sepas-Moghaddam A, Etemad A, Pereira F, Correia PL. CapsField: Light Field-Based Face and Expression Recognition in the Wild Using Capsule Routing. IEEE Transactions on Image Processing 2021; 30:2627-2642. [PMID: 33523811] [DOI: 10.1109/tip.2021.3054476]
Abstract
Light field (LF) cameras provide rich spatio-angular visual representations by sensing the visual scene from multiple perspectives and have recently emerged as a promising technology to boost the performance of human-machine systems such as biometrics and affective computing. Despite the significant success of LF representation for constrained facial image analysis, this technology has never been used for face and expression recognition in the wild. In this context, this paper proposes a new deep face and expression recognition solution, called CapsField, based on a convolutional neural network and an additional capsule network that utilizes dynamic routing to learn hierarchical relations between capsules. CapsField extracts the spatial features from facial images and learns the angular part-whole relations for a selected set of 2D sub-aperture images rendered from each LF image. To analyze the performance of the proposed solution in the wild, the first in-the-wild LF face dataset has been captured and made available, along with a new complementary constrained face dataset captured earlier from the same subjects. A subset of the in-the-wild dataset contains facial images with different expressions, annotated for use in facial expression recognition tests. An extensive performance assessment study using the new datasets has been conducted for the proposed and relevant prior solutions, showing that the proposed CapsField solution achieves superior performance for both face and expression recognition tasks when compared to the state of the art.
33. Ben Tamou A, Benzinou A, Nasreddine K. Multi-stream fish detection in unconstrained underwater videos by the fusion of two convolutional neural network detectors. Appl Intell 2021. [DOI: 10.1007/s10489-020-02155-8]
34
|
Yang GZ, Bellingham J, Dupont PE, Fischer P, Floridi L, Full R, Jacobstein N, Kumar V, McNutt M, Merrifield R, Nelson BJ, Scassellati B, Taddeo M, Taylor R, Veloso M, Wang ZL, Wood R. The grand challenges of Science Robotics. Sci Robot 2018; 3(14):eaar7650. [PMID: 33141701 DOI: 10.1126/scirobotics.aar7650] [Citation(s) in RCA: 359] [Impact Index Per Article: 119.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2017] [Accepted: 01/12/2018] [Indexed: 12/17/2022]
Abstract
One of the ambitions of Science Robotics is to deeply root robotics research in science while developing novel robotic platforms that will enable new scientific discoveries. Of our 10 grand challenges, the first 7 represent underpinning technologies that have a wider impact on all application areas of robotics. For the next two challenges, we have included social robotics and medical robotics as application-specific areas of development to highlight the substantial societal and health impacts that they will bring. Finally, the last challenge is related to responsible innovation and how ethics and security should be carefully considered as we develop the technology further.
Affiliation(s)
- Guang-Zhong Yang
- Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK.
| | - Jim Bellingham
- Center for Marine Robotics, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA
| | - Pierre E Dupont
- Department of Cardiovascular Surgery, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Peer Fischer
- Institute of Physical Chemistry, University of Stuttgart, Stuttgart, Germany.
- Micro, Nano, and Molecular Systems Laboratory, Max Planck Institute for Intelligent Systems, Stuttgart, Germany
| | - Luciano Floridi
- Centre for Practical Ethics, Faculty of Philosophy, University of Oxford, Oxford, UK.
- Digital Ethics Lab, Oxford Internet Institute, University of Oxford, Oxford, UK.
- Department of Computer Science, University of Oxford, Oxford, UK.
- Data Ethics Group, Alan Turing Institute, London, UK.
- Department of Economics, American University, Washington, DC 20016, USA
| | - Robert Full
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Neil Jacobstein
- Singularity University, NASA Research Park, Moffett Field, CA 94035, USA.
- MediaX, Stanford University, Stanford, CA 94305, USA
| | - Vijay Kumar
- Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Marcia McNutt
- National Academy of Sciences, Washington, DC 20418, USA
| | - Robert Merrifield
- Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK
| | - Bradley J Nelson
- Institute of Robotics and Intelligent Systems, Department of Mechanical and Process Engineering, ETH Zürich, Zurich, Switzerland
| | - Brian Scassellati
- Department of Computer Science, Yale University, New Haven, CT 06520, USA.
- Department of Mechanical Engineering and Materials Science, Yale University, New Haven, CT 06520, USA
| | - Mariarosaria Taddeo
- Digital Ethics Lab, Oxford Internet Institute, University of Oxford, Oxford, UK.
- Department of Computer Science, University of Oxford, Oxford, UK.
- Data Ethics Group, Alan Turing Institute, London, UK
| | - Russell Taylor
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Manuela Veloso
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Zhong Lin Wang
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Robert Wood
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, MA 02138, USA
| |
|
35
|
Spezialetti M, Placidi G, Rossi S. Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives. Front Robot AI 2020; 7:532279. [PMID: 33501307 PMCID: PMC7806093 DOI: 10.3389/frobt.2020.532279] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2020] [Accepted: 09/18/2020] [Indexed: 12/11/2022] Open
Abstract
A fascinating challenge in the field of human-robot interaction is the possibility to endow robots with emotional intelligence in order to make the interaction more intuitive, genuine, and natural. To achieve this, a critical point is the capability of the robot to infer and interpret human emotions. Emotion recognition has been widely explored in the broader fields of human-machine interaction and affective computing. Here, we report recent advances in emotion recognition, with particular regard to the human-robot interaction context. Our aim is to review the state of the art of currently adopted emotional models, interaction modalities, and classification strategies and offer our point of view on future developments and critical issues. We focus on facial expressions, body poses and kinematics, voice, brain activity, and peripheral physiological responses, also providing a list of available datasets containing data from these modalities.
Affiliation(s)
- Matteo Spezialetti
- PRISCA (Intelligent Robotics and Advanced Cognitive System Projects) Laboratory, Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Naples, Italy
- Department of Information Engineering, Computer Science and Mathematics, University of L'Aquila, L'Aquila, Italy
| | - Giuseppe Placidi
- AVI (Acquisition, Analysis, Visualization & Imaging Laboratory) Laboratory, Department of Life, Health and Environmental Sciences (MESVA), University of L'Aquila, L'Aquila, Italy
| | - Silvia Rossi
- PRISCA (Intelligent Robotics and Advanced Cognitive System Projects) Laboratory, Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Naples, Italy
| |
|
36
|
Masson A, Cazenave G, Trombini J, Batt M. The current challenges of automatic recognition of facial expressions: A systematic review. AI COMMUN 2020. [DOI: 10.3233/aic-200631] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
In recent years, due to its great economic and social potential, the recognition of facial expressions linked to emotions has become one of the most flourishing applications in the field of artificial intelligence, and has been the subject of many developments. However, despite significant progress, this field is still subject to many theoretical debates and technical challenges. It therefore seems important to make a general inventory of the different lines of research and to present a synthesis of recent results in this field. To this end, we have carried out a systematic review of the literature according to the guidelines of the PRISMA method. A search of 13 documentary databases identified a total of 220 references over the period 2014–2019. After a global presentation of the current systems and their performance, we grouped and analyzed the selected articles in the light of the main problems encountered in the field of automated facial expression recognition. The conclusion of this review highlights the strengths, limitations and main directions for future research in this field.
Affiliation(s)
- Audrey Masson
- Interpsy – GRC, University of Lorraine, France.
- Two-I, France.
| | | | | | - Martine Batt
- Interpsy – GRC, University of Lorraine, France.
| |
|
37
|
Wang X, Fairhurst MC, Canuto AM. Improving multi-view facial expression recognition through two novel texture-based feature representations. INTELL DATA ANAL 2020. [DOI: 10.3233/ida-194798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Although several automatic computer systems have been proposed to address facial expression recognition problems, the majority of them still fail to cope with some requirements of many practical application scenarios. In this paper, one of the most influential and common issues raised when applying automatic facial expression recognition systems in practical scenarios, head pose variation, is comprehensively explored and investigated. To this end, two novel texture feature representations are proposed for implementing multi-view facial expression recognition systems in practical environments. These representations combine block-based techniques with Local Ternary Pattern-based features, providing a more informative and efficient feature representation of the facial images. In addition, an in-house multi-view facial expression database has been designed and collected to allow a detailed study of the effect of out-of-plane pose angles on the performance of a multi-view facial expression recognition system. Along with the proposed in-house dataset, the proposed system is tested on two well-known facial expression databases, the CK+ and BU-3DFE datasets. The obtained results show that the proposed system outperforms current state-of-the-art 2D facial expression systems in the presence of pose variations.
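A small NumPy sketch of a block-based Local Ternary Pattern (LTP) representation of the kind described above is given below; the threshold, block grid, and histogram normalisation are illustrative assumptions, not the paper's exact settings.

import numpy as np

def ltp_codes(gray, t=5):
    """Compute upper/lower Local Ternary Pattern codes for an 8-bit grayscale image.

    For each interior pixel, the 8 neighbours are compared against the centre
    with tolerance t; the ternary pattern is split into two binary codes.
    Returns two arrays of shape (H-2, W-2) with values in [0, 255].
    """
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros_like(center)
    lower = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy: g.shape[0] - 1 + dy, 1 + dx: g.shape[1] - 1 + dx]
        upper |= ((neigh > center + t).astype(np.int32) << bit)
        lower |= ((neigh < center - t).astype(np.int32) << bit)
    return upper, lower

def block_histograms(codes, blocks=4, bins=256):
    """Concatenate per-block histograms of LTP codes (block-based representation)."""
    h, w = codes.shape
    feats = []
    for by in np.array_split(np.arange(h), blocks):
        for bx in np.array_split(np.arange(w), blocks):
            hist, _ = np.histogram(codes[np.ix_(by, bx)], bins=bins, range=(0, bins))
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
up, lo = ltp_codes(img)
feature = np.hstack([block_histograms(up), block_histograms(lo)])
print(feature.shape)  # (8192,) for a 4x4 block grid and two code maps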
Affiliation(s)
- Xuejian Wang
- School of Engineering and Digital Arts, Jennison Building, University of Kent, UK
| | - Michael C. Fairhurst
- School of Engineering and Digital Arts, Jennison Building, University of Kent, UK
| | - Anne M.P. Canuto
- Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal, RN, Brazil
| |
|
38
|
Boman M, Downs J, Karali A, Pawlby S. Toward Learning Machines at a Mother and Baby Unit. Front Psychol 2020; 11:567310. [PMID: 33281668 PMCID: PMC7691596 DOI: 10.3389/fpsyg.2020.567310] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2020] [Accepted: 10/12/2020] [Indexed: 11/13/2022] Open
Abstract
Agnostic analyses of unique video material from a Mother and Baby Unit were carried out to investigate the usefulness of such analyses to the unit. The goal was to improve outcomes: the health of mothers and their babies. The method was to implement a learning machine that becomes more useful over time and over task. A feasible set-up is here described, with the purpose of producing intelligible and useful results to healthcare professionals at the unit by means of a vision processing pipeline, grouped together with multi-modal capabilities of handling annotations and audio. Algorithmic bias turned out to be an obstacle that could only partly be handled by modern pipelines for automated feature analysis. The professional use of complex quantitative scoring for various mental health-related assessments further complicated the automation of laborious tasks. Activities during the MBU stay had previously been shown to decrease psychiatric symptoms across diagnostic groups. The implementation and first set of experiments on a learning machine for the unit produced the first steps toward explaining why this is so, in turn enabling decision support to staff about what to do more and what to do less of.
Affiliation(s)
- Magnus Boman
- Department of Software and Computer Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Johnny Downs
- Child & Adolescent Psychiatry, Psychological Medicine and Integrated Care, Clinical Academic Group, The National Institute for Health Research Maudsley Biomedical Research Centre, King's College London, London, United Kingdom
| | - Abubakrelsedik Karali
- Department of Software and Computer Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- NVIDIA Corporation, London, United Kingdom
| | - Susan Pawlby
- Channi Kumar Mother and Baby Unit, Bethlem Royal Hospital, South London and Maudsley National Health Service Trust, London, United Kingdom
| |
|
39
|
Klimek R. Sensor-Enabled Context-Aware and Pro-Active Queue Management Systems in Intelligent Environments. SENSORS 2020; 20:s20205837. [PMID: 33076402 PMCID: PMC7602596 DOI: 10.3390/s20205837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Revised: 10/04/2020] [Accepted: 10/08/2020] [Indexed: 11/24/2022]
Abstract
Queue systems are used in practice in various institutions and commercial enterprises, constituting a challenge for intelligent environments in smart cities. Managing the flow of customers guarantees the elimination or reduction of queues as well as the economic benefits that follow from clients' satisfaction with a better quality of service. An intelligent queue management system has been proposed, designed as a pro-active and context-aware ecosystem based on multiple low-level sensors and devices constituting an IoT (Internet of Things) network. The designed context-driven system is characterised by user friendliness as well as client behaviour recognition and understanding, which generate actions that support clients and establish rich environments. A prototype version of the system has been developed and validated by formal analysis and simulation. This prototype can serve as a necessary experience and a reference point when building a target system and meeting requirements typical of context-aware and pro-active systems based on IoT networks that process massive data streams.
Affiliation(s)
- Radosław Klimek
- Department of Applied Computer Science, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
| |
|
40
|
|
41
|
Lee MK, Kim DH, Song BC. Visual Scene-Aware Hybrid and Multi-Modal Feature Aggregation for Facial Expression Recognition. SENSORS (BASEL, SWITZERLAND) 2020; 20:E5184. [PMID: 32932939 PMCID: PMC7571042 DOI: 10.3390/s20185184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Revised: 09/07/2020] [Accepted: 09/09/2020] [Indexed: 11/20/2022]
Abstract
Facial expression recognition (FER) technology has made considerable progress with the rapid development of deep learning. However, conventional FER techniques are mainly designed and trained for videos that are artificially acquired in a limited environment, so they may not operate robustly on videos acquired in the wild, where illumination and head pose vary. In order to solve this problem and improve the ultimate performance of FER, this paper proposes a new architecture that extends a state-of-the-art FER scheme and a multi-modal neural network that can effectively fuse image and landmark information. To this end, we propose three methods. First, to maximize the performance of the recurrent neural network (RNN) in the previous scheme, we propose a frame substitution module that replaces the latent features of less important frames with those of important frames based on inter-frame correlation. Second, we propose a method for extracting facial landmark features based on the correlation between frames. Third, we propose a new multi-modal fusion method that effectively fuses video and facial landmark information at the feature level; fusion is achieved by applying attention, tuned to the characteristics of each modality, to that modality's features. Experimental results show that the proposed method provides remarkable performance, with 51.4% accuracy on the wild AFEW dataset, 98.5% accuracy on the CK+ dataset and 81.9% accuracy on the MMI dataset, outperforming state-of-the-art networks.
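The frame substitution idea, overwriting the latent features of less important frames with those of important frames according to inter-frame correlation, can be sketched roughly as follows; this is a simplified NumPy interpretation in which the importance score (cosine similarity to the clip-level mean) and the keep ratio are assumptions, not the paper's exact criterion.

import numpy as np

def substitute_frames(latents, keep_ratio=0.5):
    """Simplified frame substitution: rank frames by similarity to the clip mean
    feature and overwrite the weakest frames with the strongest one.

    latents: (T, D) per-frame latent features. Returns an array of the same shape.
    """
    t = latents.shape[0]
    mean = latents.mean(axis=0)
    # cosine similarity of each frame to the clip-level mean as an importance score
    scores = latents @ mean / (np.linalg.norm(latents, axis=1) * np.linalg.norm(mean) + 1e-8)
    order = np.argsort(scores)                        # ascending: weakest frames first
    n_replace = int(t * (1 - keep_ratio))
    out = latents.copy()
    out[order[:n_replace]] = latents[order[-1]]       # copy the most important frame
    return out

feats = np.random.randn(16, 128)
print(substitute_frames(feats).shape)   # (16, 128)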
Affiliation(s)
| | | | - Byung Cheol Song
- Department of Electronic Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon 22212, Korea; (M.K.L.); (D.H.K.)
| |
|
42
|
|
43
|
Wang S, Zheng Z, Yin S, Yang J, Ji Q. A Novel Dynamic Model Capturing Spatial and Temporal Patterns for Facial Expression Analysis. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:2082-2095. [PMID: 30998459 DOI: 10.1109/tpami.2019.2911937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Facial expression analysis could be greatly improved by incorporating the spatial and temporal patterns present in facial behavior, but these patterns have not yet been utilized to their full advantage. We remedy this via a novel dynamic model, an interval temporal restricted Boltzmann machine (IT-RBM), that is able to capture both universal spatial patterns and complicated temporal patterns in facial behavior for facial expression analysis. We regard a facial expression as a multifarious activity composed of sequential or overlapping primitive facial events. Allen's interval algebra is used to portray these complicated temporal patterns via a two-layer Bayesian network. The nodes in the upper-most layer represent the primitive facial events, and the nodes in the lower layer depict the temporal relationships between those events. Our model also captures inherent universal spatial patterns via a multi-value restricted Boltzmann machine in which the visible nodes are facial events, and the connections between hidden and visible nodes model intrinsic spatial patterns. Efficient learning and inference algorithms are proposed. Experiments on posed and spontaneous expression distinction and on expression recognition demonstrate that the proposed IT-RBM achieves superior performance compared to the state of the art due to its ability to incorporate these facial behavior patterns.
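Because the temporal layer of the model builds on Allen's interval algebra, a small helper that classifies the relation between two facial-event intervals may be useful for orientation; the sketch below is a generic implementation of the thirteen Allen relations and is not taken from the paper.

def allen_relation(a, b):
    """Return the Allen interval-algebra relation between intervals a and b.

    a, b: (start, end) tuples with start < end. Returns one of the 13 relations
    (7 base relations, inverses suffixed with '_inv', and 'equals').
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:  return 'before'
    if a2 == b1: return 'meets'
    if b2 < a1:  return 'before_inv'
    if b2 == a1: return 'meets_inv'
    if (a1, a2) == (b1, b2): return 'equals'
    if a1 == b1: return 'starts' if a2 < b2 else 'starts_inv'
    if a2 == b2: return 'finishes' if a1 > b1 else 'finishes_inv'
    if b1 < a1 and a2 < b2: return 'during'
    if a1 < b1 and b2 < a2: return 'during_inv'
    return 'overlaps' if a1 < b1 else 'overlaps_inv'

# e.g. one facial event spanning frames 3-9 and another spanning frames 7-15
print(allen_relation((3, 9), (7, 15)))   # overlaps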
|
44
|
Feature Selection on 2D and 3D Geometric Features to Improve Facial Expression Recognition. SENSORS 2020; 20:s20174847. [PMID: 32867182 PMCID: PMC7506644 DOI: 10.3390/s20174847] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/06/2020] [Revised: 08/04/2020] [Accepted: 08/25/2020] [Indexed: 11/16/2022]
Abstract
An essential aspect in the interaction between people and computers is the recognition of facial expressions. A key issue in this process is to select relevant features to classify facial expressions accurately. This study examines the selection of optimal geometric features to classify six basic facial expressions: happiness, sadness, surprise, fear, anger, and disgust. Inspired by the Facial Action Coding System (FACS) and the Moving Picture Experts Group 4th standard (MPEG-4), an initial set of 89 features was proposed. These features are normalized distances and angles in 2D and 3D computed from 22 facial landmarks. To select a minimum set of features with the maximum classification accuracy, two selection methods and four classifiers were tested. The first selection method, principal component analysis (PCA), obtained 39 features. The second selection method, a genetic algorithm (GA), obtained 47 features. The experiments ran on the Bosphorus and UIVBFED data sets with 86.62% and 93.92% median accuracy, respectively. Our main finding is that the reduced feature set obtained by the GA is the smallest in comparison with other methods of comparable accuracy. This has implications in reducing the time of recognition.
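A rough Python sketch of this style of geometric descriptor, pairwise landmark distances normalised by an inter-ocular distance plus triplet angles, followed by PCA as a stand-in selector, is shown below; the landmark indices, the resulting feature count, and the number of retained components are assumptions and do not reproduce the paper's 89-feature set.

import numpy as np
from sklearn.decomposition import PCA

def geometric_features(landmarks):
    """Toy geometric descriptor from 2D facial landmarks of shape (N, 2): all
    pairwise distances normalised by an inter-ocular distance, plus one angle
    per consecutive landmark triplet. Landmark indices are assumptions.
    """
    n = len(landmarks)
    iod = np.linalg.norm(landmarks[0] - landmarks[1]) + 1e-8   # assumed eye corners
    dists = [np.linalg.norm(landmarks[i] - landmarks[j]) / iod
             for i in range(n) for j in range(i + 1, n)]
    angles = []
    for i in range(n - 2):                     # angle at the middle point of each triplet
        u = landmarks[i] - landmarks[i + 1]
        v = landmarks[i + 2] - landmarks[i + 1]
        cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
        angles.append(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return np.array(dists + angles)

# 200 synthetic faces with 22 landmarks each, reduced by PCA as a stand-in selector
X = np.array([geometric_features(np.random.rand(22, 2)) for _ in range(200)])
X_red = PCA(n_components=39).fit_transform(X)
print(X.shape, X_red.shape)   # (200, 251) (200, 39)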
|
45
|
|
46
|
Perveen N, Roy D, Mohan CK. Facial Expression Recognition in Videos using Dynamic Kernels. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:8316-8325. [PMID: 32746249 DOI: 10.1109/tip.2020.3011846] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Recognition of facial expressions across various actors, contexts, and recording conditions in real-world videos involves identifying local facial movements. Hence, it is important to discover the formation of expressions from local representations captured from different parts of the face. So in this paper, we propose a dynamic kernel-based representation for facial expressions that assimilates facial movements captured using local spatio-temporal representations in a large universal Gaussian mixture model (uGMM). These dynamic kernels are used to preserve local similarities while handling global context changes for the same expression by utilizing the statistics of uGMM. We demonstrate the efficacy of dynamic kernel representation using three different dynamic kernels, namely, explicit mapping based, probability-based, and matching-based, on three standard facial expression datasets, namely, MMI, AFEW, and BP4D. Our evaluations show that probability-based kernels are the most discriminative among the dynamic kernels. However, in terms of computational complexity, intermediate matching kernels are more efficient as compared to the other two representations.
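As a loose illustration of a probability-based dynamic kernel built on universal GMM (uGMM) statistics, the sketch below represents each video by posterior-weighted first-order statistics under a shared GMM and compares two videos with a linear kernel; the descriptor dimensionality, component count, and exact kernel form are assumptions rather than the paper's formulation.

import numpy as np
from sklearn.mixture import GaussianMixture

def ugmm_supervector(descriptors, ugmm):
    """First-order statistics of a video's local descriptors under a universal GMM:
    posterior-weighted mean deviations per component, concatenated into one vector."""
    post = ugmm.predict_proba(descriptors)              # (N, K) responsibilities
    nk = post.sum(axis=0) + 1e-8                        # soft counts per component
    mean_k = (post.T @ descriptors) / nk[:, None]       # (K, D) adapted means
    return ((mean_k - ugmm.means_) / np.sqrt(ugmm.covariances_ + 1e-8)).ravel()

rng = np.random.default_rng(0)
pool = rng.normal(size=(5000, 16))                      # pooled spatio-temporal descriptors
ugmm = GaussianMixture(n_components=8, covariance_type='diag', random_state=0).fit(pool)

video_a = ugmm_supervector(rng.normal(size=(300, 16)), ugmm)
video_b = ugmm_supervector(rng.normal(size=(250, 16)), ugmm)
print(float(video_a @ video_b))                         # linear dynamic-kernel value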
|
47
|
Athanasiadis C, Hortal E, Asteriadis S. Audio–visual domain adaptation using conditional semi-supervised Generative Adversarial Networks. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.106] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
48
|
Zhou F, Kong S, Fowlkes CC, Chen T, Lei B. Fine-grained facial expression analysis using dimensional emotion model. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.067] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
49
|
Intelligent Clustering and Dynamic Incremental Learning to Generate Multi-Codebook Fuzzy Neural Network for Multi-Modal Data Classification. Symmetry (Basel) 2020. [DOI: 10.3390/sym12040679] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Classification of multi-modal data is one of the challenges in the machine learning field. Multi-modal data need special treatment as their features are distributed in several areas. This study proposes multi-codebook fuzzy neural networks that use intelligent clustering and dynamic incremental learning for multi-modal data classification. We utilized intelligent K-means clustering based on anomalous patterns and intelligent K-means clustering based on histogram information. Clustering is used to generate codebook candidates before the training process, while incremental learning is utilized when the condition to generate a new codebook is met. The condition to generate a new codebook in incremental learning is based on the similarity of the winner class and the other classes. The proposed method was evaluated on synthetic and benchmark datasets. The experimental results showed that the proposed multi-codebook fuzzy neural networks that use dynamic incremental learning achieve significant improvements over the original fuzzy neural networks. The improvements were 15.65%, 5.31% and 11.42% on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively, for incremental version 1. Incremental learning version 2 improved by 21.08%, 4.63%, and 14.35% on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively. The multi-codebook fuzzy neural networks that use intelligent clustering also showed significant improvements over the original fuzzy neural networks, achieving 23.90%, 2.10%, and 15.02% improvements on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively.
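A compact sketch of the multi-codebook idea, with K-means building per-class codebooks and an incremental step adding a new codebook whenever the winning class and the runner-up are too close, is given below; the margin criterion, distance measure, and toy data are simplifying assumptions, and the fuzzy-membership machinery of the actual networks is omitted.

import numpy as np
from sklearn.cluster import KMeans

class MultiCodebookClassifier:
    """Sketch of a multi-codebook nearest-prototype classifier: K-means builds the
    initial codebooks per class, and a new codebook is added for a training sample
    whenever the winning class and the runner-up are too similar (incremental step)."""

    def __init__(self, codebooks_per_class=2, margin=0.1):
        self.margin = margin
        self.k = codebooks_per_class
        self.books = {}                       # class label -> (n_codebooks, D) array

    def fit(self, X, y):
        for c in np.unique(y):
            km = KMeans(n_clusters=self.k, n_init=10, random_state=0).fit(X[y == c])
            self.books[c] = km.cluster_centers_
        for xi, yi in zip(X, y):              # incremental pass over the training set
            d = {c: np.linalg.norm(b - xi, axis=1).min() for c, b in self.books.items()}
            win = min(d, key=d.get)
            runner = min(v for c, v in d.items() if c != win)
            if win != yi or runner - d[win] < self.margin:
                self.books[yi] = np.vstack([self.books[yi], xi])   # add a new codebook
        return self

    def predict(self, X):
        return np.array([min(self.books, key=lambda c:
                             np.linalg.norm(self.books[c] - xi, axis=1).min())
                         for xi in X])

X = np.vstack([np.random.randn(100, 4), np.random.randn(100, 4) + 3.0])
y = np.array([0] * 100 + [1] * 100)
clf = MultiCodebookClassifier().fit(X, y)
print((clf.predict(X) == y).mean())           # training accuracy on a toy 2-class set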
|
50
|
Ertugrul IO, Cohn JF, Jeni LA, Zhang Z, Yin L, Ji Q. Crossing Domains for AU Coding: Perspectives, Approaches, and Measures. IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE 2020; 2:158-171. [PMID: 32377637 PMCID: PMC7202467 DOI: 10.1109/tbiom.2020.2977225] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Facial action unit (AU) detectors have performed well when trained and tested within the same domain. How well do AU detectors transfer to domains in which they have not been trained? We review the literature on cross-domain transfer and conduct experiments to address limitations of prior research. We evaluate generalizability across four publicly available databases: EB+ (an expanded version of BP4D+), Sayette GFT, DISFA, and UNBC Shoulder Pain (SP). The databases differ in observational scenarios, context, participant diversity, range of head pose, video resolution, and AU base rates. In most cases, performance decreased with the change in domain, often to below the threshold needed for behavioral research. However, exceptions were noted. Deep and shallow approaches generally performed similarly, and average results were slightly better for the deep model than for the shallow one. Occlusion sensitivity maps revealed that local specificity was greater for AU detection within domains than across domains. The findings suggest that more varied domains and deep learning approaches may be better suited for generalizability, and they highlight the need for more attention to characteristics that vary between domains. Until further improvement is realized, caution is warranted when applying AU classifiers from one domain to another.
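The within-domain versus cross-domain protocol described above can be mimicked with a few lines of scikit-learn on synthetic data; the feature dimensionality, the logistic-regression detector, and the simulated domain shift are stand-ins chosen only to show the evaluation loop, not the paper's models or datasets.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def make_domain(rng, shift, n=400):
    """Synthetic stand-in for one database: features plus a binary AU label."""
    X = rng.normal(shift, 1.0, (n, 10))
    y = (X[:, 0] + 0.3 * rng.normal(size=n) > shift).astype(int)   # AU present / absent
    return X, y

def within_vs_cross_domain(domains):
    """Train one detector per source domain, then test it on every target domain."""
    scores = {}
    for src, (Xs, ys) in domains.items():
        clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
        for tgt, (Xt, yt) in domains.items():
            scores[(src, tgt)] = f1_score(yt, clf.predict(Xt))
    return scores

rng = np.random.default_rng(1)
domains = {'domain_A': make_domain(rng, 0.0), 'domain_B': make_domain(rng, 1.0)}
for (src, tgt), f1 in within_vs_cross_domain(domains).items():
    print(f'train {src} -> test {tgt}: F1 = {f1:.2f}')   # within-domain F1 exceeds cross-domain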
Affiliation(s)
| | - Jeffrey F Cohn
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
| | - László A Jeni
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Zheng Zhang
- Department of Computer Science, State University of New York at Binghamton, USA
| | - Lijun Yin
- Department of Computer Science, State University of New York at Binghamton, USA
| | - Qiang Ji
- Rensselaer Polytechnic Institute, Troy, NY, USA
| |
|