1
Fiorini L, D'Onofrio G, Sorrentino A, Cornacchia Loizzo FG, Russo S, Ciccone F, Giuliani F, Sancarlo D, Cavallo F. The Role of Coherent Robot Behavior and Embodiment in Emotion Perception and Recognition During Human-Robot Interaction: Experimental Study. JMIR Hum Factors 2024;11:e45494. PMID: 38277201. PMCID: PMC10858416. DOI: 10.2196/45494.
Abstract
BACKGROUND Social robots are becoming increasingly important as companions in our daily lives. Consequently, humans expect to interact with them using the same mental models applied to human-human interactions, including the use of cospeech gestures. Research efforts have been devoted to understanding users' needs and developing robot behavioral models that can perceive the user's state and plan an appropriate reaction. Despite these efforts, some challenges regarding the effect of robot embodiment and behavior on the perception of emotions remain open. OBJECTIVE The aim of this study is twofold. First, it assesses the role of the robot's cospeech gestures and embodiment in the user's perceived emotions in terms of valence (stimulus pleasantness), arousal (intensity of evoked emotion), and dominance (degree of control exerted by the stimulus). Second, it evaluates the robot's accuracy in identifying positive, negative, and neutral emotions displayed by interacting humans using 3 supervised machine learning algorithms: support vector machine, random forest, and K-nearest neighbor. METHODS The Pepper robot was used to elicit the 3 emotions in humans using a set of 60 images retrieved from a standardized database. In particular, 2 experimental conditions for emotion elicitation were performed with the Pepper robot: with a static behavior or with a robot that expresses coherent (COH) cospeech behavior. Furthermore, to evaluate the role of robot embodiment, a third elicitation was performed by asking the participant to interact with a PC, where a graphical interface showed the same images. Each participant underwent only 1 of the 3 experimental conditions. RESULTS A total of 60 participants were recruited for this study, 20 for each experimental condition, for a total of 3600 interactions. The results showed significant differences (P<.05) in valence, arousal, and dominance when participants were stimulated by the Pepper robot behaving COH compared with the PC condition, thus underlining the importance of the robot's nonverbal communication and embodiment. A higher valence score was obtained for elicitation by the robot (COH and robot with static behavior) than for the PC. For emotion recognition, the K-nearest neighbor classifiers achieved the best accuracy. In particular, the COH modality achieved the highest accuracy (0.97) compared with the static behavior and PC elicitations (0.88 and 0.94, respectively). CONCLUSIONS The results suggest that the use of multimodal communication channels, such as cospeech and visual channels, as in the COH modality, may improve recognition accuracy of the user's emotional state and can reinforce the perceived emotion. Future studies should investigate the effects of age, culture, and cognitive profile on emotion perception and recognition, going beyond the limitations of this work.
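To make the classifier comparison described in the abstract concrete, the following Python sketch (illustrative only, not the authors' code; the feature matrix, labels, and hyperparameters are placeholder assumptions) shows how support vector machine, random forest, and K-nearest neighbor models could be compared with cross-validated accuracy:

```python
# Illustrative comparison of the three classifier families named in the abstract
# on a hypothetical feature matrix X (one row per interaction) with labels
# in {"positive", "negative", "neutral"}. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))                                   # placeholder features
y = rng.choice(["positive", "negative", "neutral"], size=300)    # placeholder labels

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)                  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```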
Affiliation(s)
- Laura Fiorini
- Department of Industrial Engineering, University of Florence, Firenze, Italy
- The BioRobotics Institute, Scuola Superiore Sant'Anna, Pontedera (Pisa), Italy
- Grazia D'Onofrio
- Clinical Psychology Service, Health Department, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
- Sergio Russo
- Innovation & Research Unit, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
- Filomena Ciccone
- Clinical Psychology Service, Health Department, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
- Francesco Giuliani
- Innovation & Research Unit, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
- Daniele Sancarlo
- Complex Unit of Geriatrics, Department of Medical Sciences, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
- Filippo Cavallo
- Department of Industrial Engineering, University of Florence, Firenze, Italy
- The BioRobotics Institute, Scuola Superiore Sant'Anna, Pontedera (Pisa), Italy
2
Hossain S, Umer S, Rout RK, Tanveer M. Fine-grained image analysis for facial expression recognition using deep convolutional neural networks with bilinear pooling. Appl Soft Comput 2023. DOI: 10.1016/j.asoc.2023.109997.
3
Salhi I, Qbadou M, Gouraguine S, Mansouri K, Lytridis C, Kaburlasos V. Towards Robot-Assisted Therapy for Children With Autism—The Ontological Knowledge Models and Reinforcement Learning-Based Algorithms. Front Robot AI 2022;9:713964. PMID: 35462779. PMCID: PMC9020227. DOI: 10.3389/frobt.2022.713964.
Abstract
Robots are increasingly present in our lives, particularly in the health sector. In therapeutic centers, some therapists are beginning to explore various tools like video games, Internet exchanges, and robot-assisted therapy. These tools are placed at the disposal of professionals as additional resources that can help them assist their patients intuitively and remotely. Humanoid robots can capture young children's attention, which in turn has attracted the attention of researchers. A humanoid robot can act as a play partner and interact with children directly, with or without a third party present. It can also perform repetitive tasks that humans cannot carry out in the same way. Moreover, humanoid robots can assist therapists by allowing them to teleoperate the robot and interact from a distance. In this context, our research focuses on robot-assisted therapy and introduces a humanoid social robot into a pediatric hospital care unit. This is done by analyzing many aspects of the child's behavior, such as verbal interactions, gestures, and facial expressions. Consequently, the robot can reproduce consistent experiences and actions for children with communication capacity restrictions. This work applies a novel approach based on deep learning and reinforcement learning algorithms supported by an ontological knowledge base that contains relevant information and knowledge about patients, screening tests, and therapies. In this study, we realized a humanoid robot assistant for therapists by equipping the NAO robot 1) to detect whether a child is autistic using a convolutional neural network, 2) to recommend a set of therapies based on a selection algorithm using a correspondence matrix between screening tests and therapies, and 3) to assist and monitor autistic children by executing tasks required by those therapies.
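As an illustration of the therapy-selection step described above, the sketch below (assumed data layout and values, not the authors' implementation) ranks candidate therapies by weighting a hypothetical correspondence matrix between screening-test items and therapies with a child's screening scores:

```python
# Minimal sketch of selecting therapies from a correspondence matrix.
# Rows are screening-test items, columns are candidate therapies; a child's
# item scores weight the columns to rank therapies. All names and numbers
# are hypothetical placeholders.
import numpy as np

tests = ["eye_contact", "verbal_response", "repetitive_behavior"]
therapies = ["music_therapy", "imitation_games", "joint_attention_training"]

C = np.array([
    [0.2, 0.5, 0.9],   # eye_contact -> each therapy
    [0.7, 0.4, 0.3],   # verbal_response
    [0.1, 0.8, 0.4],   # repetitive_behavior
])

child_scores = np.array([0.9, 0.2, 0.6])       # screening scores for one child
relevance = child_scores @ C                   # aggregate relevance per therapy
ranked = sorted(zip(therapies, relevance), key=lambda t: t[1], reverse=True)
print(ranked[:2])                              # recommend the top-ranked therapies
```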
Affiliation(s)
- Intissar Salhi
- SSDIA, ENSET, Department of Mathematics & Computer Science, Hassan II University of Casablanca, Mohammedia, Morocco
- Mohammed Qbadou
- SSDIA, ENSET, Department of Mathematics & Computer Science, Hassan II University of Casablanca, Mohammedia, Morocco
- Corresponding author
- Soukaina Gouraguine
- SSDIA, ENSET, Department of Mathematics & Computer Science, Hassan II University of Casablanca, Mohammedia, Morocco
- Khalifa Mansouri
- SSDIA, ENSET, Department of Mathematics & Computer Science, Hassan II University of Casablanca, Mohammedia, Morocco
- Chris Lytridis
- HUman-MAchines INteraction (HUMAIN) Lab, Department of Computer Science, International Hellenic University (IHU), Kavala, Greece
- Vassilis Kaburlasos
- HUman-MAchines INteraction (HUMAIN) Lab, Department of Computer Science, International Hellenic University (IHU), Kavala, Greece
4
A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset. Appl Sci (Basel) 2021. DOI: 10.3390/app12010327.
Abstract
Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognition system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the network's prior knowledge was similar to the target task. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models. Results showed that sequential models beat static models by a narrow margin. Error analysis reported that the visual systems could improve with a detector of high-emotional-load frames, which opened a new line of research to discover new ways to learn from videos. Finally, combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. Results demonstrated that these modalities carried relevant information to detect users' emotional state, and their combination allowed us to improve the final system performance.
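The late fusion strategy mentioned above can be illustrated with a short sketch: each modality produces a posterior over the eight emotions, and the fused prediction takes the arg max of their weighted average. The emotion labels follow RAVDESS, but the probabilities and weighting are hypothetical, not the paper's values.

```python
# Hedged sketch of late fusion over two modality posteriors (speech and face).
# Probabilities and the 0.5 weight are illustrative assumptions.
import numpy as np

EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]

def late_fusion(p_speech: np.ndarray, p_face: np.ndarray, w_speech: float = 0.5) -> str:
    """Weighted average of the two modality posteriors; returns the fused label."""
    fused = w_speech * p_speech + (1.0 - w_speech) * p_face
    return EMOTIONS[int(np.argmax(fused))]

p_speech = np.array([0.05, 0.05, 0.55, 0.05, 0.10, 0.05, 0.05, 0.10])
p_face   = np.array([0.10, 0.05, 0.40, 0.10, 0.15, 0.05, 0.05, 0.10])
print(late_fusion(p_speech, p_face))   # -> "happy"
```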
5
Luna-Jiménez C, Griol D, Callejas Z, Kleinlein R, Montero JM, Fernández-Martínez F. Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors (Basel) 2021;21(22):7665. PMID: 34833739. PMCID: PMC8618559. DOI: 10.3390/s21227665.
Abstract
Emotion recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users' emotional state, and their combination enables improvement of the system performance.
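The video branch described above (a bi-LSTM with an attention mechanism over per-frame features) can be sketched as follows; tensor shapes and layer sizes are assumptions for illustration, not the authors' configuration:

```python
# Illustrative PyTorch sketch: per-frame embeddings pass through a bidirectional
# LSTM, a learned attention layer pools the frames, and a linear head classifies
# the emotion. Dimensions are placeholders.
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, n_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # one attention score per frame
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, frames):                      # frames: (batch, time, feat_dim)
        h, _ = self.lstm(frames)                    # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)      # attention weights over time
        pooled = (w * h).sum(dim=1)                 # weighted sum over frames
        return self.head(pooled)                    # emotion logits

logits = AttnBiLSTM()(torch.randn(4, 30, 256))      # 4 clips, 30 frames each
print(logits.shape)                                 # torch.Size([4, 8])
```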
Affiliation(s)
- Cristina Luna-Jiménez
- Grupo de Tecnología del Habla y Aprendizaje Automático (THAU Group), Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
- Corresponding author
- David Griol
- Department of Software Engineering, CITIC-UGR, University of Granada, Periodista Daniel Saucedo Aranda S/N, 18071 Granada, Spain
- Zoraida Callejas
- Department of Software Engineering, CITIC-UGR, University of Granada, Periodista Daniel Saucedo Aranda S/N, 18071 Granada, Spain
- Ricardo Kleinlein
- Grupo de Tecnología del Habla y Aprendizaje Automático (THAU Group), Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
- Juan M. Montero
- Grupo de Tecnología del Habla y Aprendizaje Automático (THAU Group), Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
- Fernando Fernández-Martínez
- Grupo de Tecnología del Habla y Aprendizaje Automático (THAU Group), Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
6
Abstract
Spatial Transformer Networks are considered a powerful algorithm for learning the main areas of an image, but they could still be more efficient if they received images with embedded expert knowledge. This paper aims to improve the performance of conventional Spatial Transformers when applied to Facial Expression Recognition. Based on the Spatial Transformers' capacity for spatial manipulation within networks, we propose different extensions to these models in which effective attentional regions are captured using facial landmarks or facial visual saliency maps. This specific attentional information is then hardcoded to guide the Spatial Transformers to learn the spatial transformations that best fit the proposed regions for better recognition results. For this study, we use two datasets: AffectNet and FER-2013. For AffectNet, we achieve a 0.35 percentage point absolute improvement over the traditional Spatial Transformer, whereas for FER-2013, our solution achieves an increase of 1.49% when models are fine-tuned with the AffectNet pre-trained weights.
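A minimal sketch of the underlying mechanism, hardcoding an attentional region into a differentiable sampler, is shown below; the region coordinates, helper name, and tensor sizes are assumptions for illustration, not the paper's implementation:

```python
# Hedged sketch: an affine transform derived from a landmark-based bounding
# region is fed to torch's grid sampler so the network attends to that region.
# Box coordinates and shapes are hypothetical.
import torch
import torch.nn.functional as F

def crop_region(img, box):
    """img: (1, C, H, W); box = (cx, cy, w, h) in normalized [-1, 1] coordinates."""
    cx, cy, w, h = box
    theta = torch.tensor([[[w / 2, 0.0, cx],
                           [0.0, h / 2, cy]]])          # affine matrix for the crop
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)

face = torch.randn(1, 3, 112, 112)
eyes = crop_region(face, (0.0, -0.4, 1.0, 0.5))          # upper-face region from landmarks
print(eyes.shape)                                        # torch.Size([1, 3, 112, 112])
```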
7
Cano S, González CS, Gil-Iranzo RM, Albiol-Pérez S. Affective Communication for Socially Assistive Robots (SARs) for Children with Autism Spectrum Disorder: A Systematic Review. Sensors (Basel) 2021;21(15):5166. PMID: 34372402. PMCID: PMC8347754. DOI: 10.3390/s21155166.
Abstract
Research on affective communication for socially assistive robots has been conducted to enable physical robots to perceive, express, and respond emotionally. However, the use of affective computing in social robots has been limited, particularly when social robots are designed for children, and especially for those with autism spectrum disorder (ASD). Social robots are based on cognitive-affective models, which allow them to communicate with people following social behaviors and rules. However, interactions between a child and a robot may change or differ from those with an adult, or when the child has an emotional deficit. In this study, we systematically reviewed studies related to computational models of emotions for children with ASD. We used the Scopus, WoS, Springer, and IEEE Xplore databases to answer different research questions related to the definition, interaction, and design of computational models supported by theoretical psychology approaches from 1997 to 2021. Our review found 46 articles; not all the studies considered children or those with ASD.
Affiliation(s)
- Sandra Cano
- School of Computer Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso 2340000, Chile
- Corresponding author
- Carina S. González
- Department of Computer Engineering and Systems, University of La Laguna, 38204 La Laguna, Spain
- Rosa María Gil-Iranzo
- Department of Computer Engineering and Industrial, University of Lleida, 25001 Lleida, Spain
- Sergio Albiol-Pérez
- Aragón Health Research Institute (IIS Aragón), Universidad de Zaragoza, Cdad. Escolar, 4, 44003 Teruel, Spain
8
Honig S, Oron-Gilad T. Expect the Unexpected: Leveraging the Human-Robot Ecosystem to Handle Unexpected Robot Failures. Front Robot AI 2021;8:656385. PMID: 34381819. PMCID: PMC8352555. DOI: 10.3389/frobt.2021.656385.
Abstract
Unexpected robot failures are inevitable. We propose to leverage socio-technical relations within the human-robot ecosystem to support adaptable strategies for handling unexpected failures. The Theory of Graceful Extensibility is used to understand how characteristics of the ecosystem can influence its ability to respond to unexpected events. By expanding our perspective from Human-Robot Interaction to the Human-Robot Ecosystem, adaptable failure-handling strategies are identified, alongside the technical, social, and organizational arrangements needed to support them. We argue that the robotics and HRI communities should pursue more holistic approaches to failure handling, recognizing the need to embrace the unexpected and to consider socio-technical relations within the human-robot ecosystem when designing failure-handling strategies.
Affiliation(s)
- Shanee Honig
- Department of Industrial Engineering and Management, Mobile Robotics Laboratory and HRI Laboratory, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Tal Oron-Gilad
- Department of Industrial Engineering and Management, Mobile Robotics Laboratory and HRI Laboratory, Ben-Gurion University of the Negev, Be'er Sheva, Israel