1. Naseem MT, Lee CS, Shahzad T, Khan MA, Abu-Mahfouz AM, Ouahada K. Facial expression recognition using visible and IR by early fusion of deep learning with attention mechanism. PeerJ Comput Sci 2025;11:e2676. PMID: 40134864; PMCID: PMC11935750; DOI: 10.7717/peerj-cs.2676.
Abstract
Facial expression recognition (FER) has garnered significant attention due to advances in artificial intelligence, particularly in applications such as driver monitoring, healthcare, and human-computer interaction, which benefit from deep learning techniques. This research addresses the challenge of recognizing emotions accurately despite the variation within each expression and the similarity between different expressions. In this work, we propose an early fusion approach that combines features from visible and infrared modalities using the publicly accessible VIRI and NVIE databases. Initially, we developed single-modality models for the visible and infrared datasets by incorporating an attention mechanism into the ResNet-18 architecture. We then extended this to a multi-modal early fusion approach using the same modified ResNet-18 with attention, achieving superior accuracy through the combination of a convolutional neural network (CNN) and transfer learning (TL). Our multi-modal approach attained 84.44% accuracy on the VIRI database and 85.20% on the natural visible and infrared facial expression (NVIE) database, outperforming previous methods. These results demonstrate that our single-modal and multi-modal approaches achieve state-of-the-art performance in FER.
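The early-fusion-with-attention idea described in the abstract can be sketched as below. This is a minimal NumPy illustration under assumed feature shapes: the sigmoid gate stands in for the learned channel-attention weights, and none of it reproduces the authors' actual ResNet-18 model.

```python
import numpy as np

def early_fusion(visible_feat, infrared_feat):
    # Early fusion: stack per-modality feature maps channel-wise
    # before any shared backbone processing. Shapes assumed (C, H, W).
    return np.concatenate([visible_feat, infrared_feat], axis=0)

def channel_attention(fused):
    # Squeeze-and-excitation-style gate: global-average-pool each channel,
    # squash to (0, 1), and rescale the channels. A trained model would
    # learn this mapping; a plain sigmoid stands in here.
    squeeze = fused.mean(axis=(1, 2))        # (C,)
    gate = 1.0 / (1.0 + np.exp(-squeeze))    # sigmoid
    return fused * gate[:, None, None]

vis = np.random.rand(3, 8, 8)   # visible-light feature map (assumed shape)
ir = np.random.rand(3, 8, 8)    # infrared feature map (assumed shape)
out = channel_attention(early_fusion(vis, ir))
print(out.shape)  # (6, 8, 8)
```

Fusing before the backbone (rather than averaging predictions afterwards) lets the attention weights trade off the two modalities per channel.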
Affiliation(s)
- Muhammad Tahir Naseem
- Department of Electronic Engineering, Yeungnam University, Gyeongsan, Republic of Korea
- Chan-Su Lee
- Department of Electronic Engineering, Yeungnam University, Gyeongsan, Republic of Korea
- Tariq Shahzad
- Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa
- Muhammad Adnan Khan
- School of Computing, Skyline University College, Sharjah, United Arab Emirates
- Riphah School of Computing and Innovation, Riphah International University, Lahore, Pakistan
- Department of Software, Faculty of Artificial Intelligence and Software, Gachon University, Seongnam-si, Republic of Korea
- Adnan M. Abu-Mahfouz
- Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa
- NextGen Enterprises and Institutions, Council for Scientific and Industrial Research, Pretoria, South Africa
- Khmaies Ouahada
- Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa
2. Hu M, Wei Y, Li M, Yao H, Deng W, Tong M, Liu Q. Bimodal Learning Engagement Recognition from Videos in the Classroom. Sensors (Basel) 2022;22(16):5932. PMID: 36015693; PMCID: PMC9415674; DOI: 10.3390/s22165932.
Abstract
Engagement plays an essential role in the learning process. Recognizing learning engagement in the classroom helps us understand students' learning states and optimize the teaching and study processes. Traditional recognition methods such as self-report and teacher observation are too time-consuming and obtrusive to meet the needs of large-scale classrooms. With the development of big data analysis and artificial intelligence, applying intelligent methods such as deep learning to recognize learning engagement has become a research hotspot in education. In this paper, based on non-invasive classroom videos, we first constructed a multi-cue classroom learning engagement database. Then, we introduced the power IoU loss function into You Only Look Once version 5 (YOLOv5) to detect students, obtaining a precision of 95.4%. Finally, we designed a bimodal learning engagement recognition method based on ResNet50 and CoAtNet. Our proposed bimodal method obtained an accuracy of 93.94% using a KNN classifier. The experimental results confirm that the proposed method outperforms most state-of-the-art techniques.
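The final stage of the pipeline above (a KNN vote over fused bimodal features) can be illustrated as follows. The feature dimensions and toy labels are assumptions for the sketch; real inputs would be ResNet50 and CoAtNet embeddings, not these stand-ins.

```python
import numpy as np

def knn_predict(train_x, train_y, probe, k=3):
    # Plain k-nearest-neighbour majority vote in Euclidean space.
    dists = np.linalg.norm(train_x - probe, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.bincount(train_y[nearest]).argmax())

# Stand-ins for the two modality embeddings, concatenated per sample.
appearance = np.vstack([np.zeros((5, 4)), np.ones((5, 4))])  # e.g. facial stream
posture = np.vstack([np.zeros((5, 4)), np.ones((5, 4))])     # e.g. behaviour stream
fused = np.concatenate([appearance, posture], axis=1)        # (10, 8)
labels = np.array([0] * 5 + [1] * 5)                         # engaged / not engaged

print(knn_predict(fused, labels, fused[7]))  # 1
```

Concatenating the two streams before the distance computation is what makes the classifier "bimodal": a probe must be close in both feature spaces at once to find same-class neighbours.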
Affiliation(s)
- Meijia Hu
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Huanggang High School of Hubei Province, Huanggang 438000, China
- Yantao Wei
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Mengsiying Li
- School of Management, Wuhan College, Wuhan 430212, China
- Huang Yao
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Wei Deng
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Mingwen Tong
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
- Qingtang Liu
- Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China
3. Zhang T, Mao Y, Shen F, Zhao J. Label distribution learning through exploring nonnegative components. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.06.017.
4. The Application of Adaptive Tolerance and Serialized Facial Feature Extraction to Automatic Attendance Systems. Electronics 2022. DOI: 10.3390/electronics11142278.
Abstract
The aim of this study was to develop a real-time automatic attendance system (AAS) based on Internet of Things (IoT) technology and facial recognition. A Raspberry Pi camera built into a Raspberry Pi 3B transfers facial images to a cloud server. Face detection and recognition libraries are implemented on this cloud server, which can thus handle all the processes involved in automatically recording student attendance. In addition, this study proposes applying data serialization and an adaptive tolerance for the Euclidean distance. The facial features encountered are serialized before being saved in the SQLite database; such serialized data can easily be written to and then read back from the database. When comparing the facial features already stored in the SQLite database with new facial features, the proposed adaptive tolerance improves the performance of Euclidean-distance facial recognition. The results show that the proposed AAS can recognize multiple faces and thus record attendance automatically. It can help detect students who attempt to skip classes without their teachers' knowledge, and it resolves the problems of students being unintentionally marked present when absent and of proxy attendance.
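The serialization and adaptive-tolerance matching steps described above can be sketched like this. The adaptation rule below is a hypothetical heuristic standing in for the paper's exact formula, and the function names are illustrative.

```python
import pickle

import numpy as np

def serialize(encoding):
    # Serialize a face encoding for storage, e.g. as a SQLite BLOB column.
    return pickle.dumps(np.asarray(encoding, dtype=float))

def deserialize(blob):
    return pickle.loads(blob)

def match(known, probe, base_tol=0.6):
    # Euclidean-distance match with an adaptive tolerance: the threshold
    # tightens when the probe sits far from the gallery on average.
    # (Hypothetical adaptation rule; the paper's exact rule may differ.)
    dists = np.linalg.norm(known - probe, axis=1)
    tol = min(base_tol, 0.9 * dists.mean())
    best = int(dists.argmin())
    return (best, dists[best]) if dists[best] <= tol else (None, dists[best])
```

A hit returns the row index of the closest stored encoding; `None` signals an unknown face, which is what lets the system flag proxies rather than silently matching the nearest enrolled student.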
5. Exploring the Effects of Caputo Fractional Derivative in Spiking Neural Network Training. Electronics 2022. DOI: 10.3390/electronics11142114.
Abstract
Fractional calculus is an emerging topic in artificial neural network training, especially when using gradient-based methods. This paper brings the idea of fractional derivatives to spiking neural network training using Caputo derivative-based gradient calculation. We conduct an extensive investigation of performance improvements via a case study of small-scale networks using derivative orders in the unit interval. Using particle swarm optimization, we demonstrate how the derivative order can be handled as an optimizable hyperparameter and viable values found for it. On multiple benchmark datasets we empirically show that there is no single generally optimal derivative order; rather, this value is data-dependent. However, statistics show that a range of derivative orders can be determined in which the Caputo derivative outperforms first-order gradient descent with high confidence. Improvements in convergence speed and training time are also examined and explained by reformulating Caputo derivative-based training as an adaptive weight normalization technique.
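A common first-order truncation of a Caputo-derivative-based update can be sketched as follows, on a scalar quadratic rather than a spiking network. The discretization below is a widely used approximation, not necessarily the paper's exact scheme, and the step sizes are arbitrary choices for the demo.

```python
import math

def caputo_step(w, grad, w_prev, alpha=0.9, lr=0.1, eps=1e-12):
    # Caputo-style fractional update of order alpha in (0, 1):
    # D^alpha f ~= f'(w) * |w - w_prev|^(1 - alpha) / Gamma(2 - alpha).
    # As alpha -> 1 the extra factor -> 1 and this reduces to plain
    # gradient descent; eps guards the w == w_prev case.
    frac = max(abs(w - w_prev), eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return w - lr * grad * frac

# Minimise f(w) = (w - 3)^2 with the fractional update.
w, w_prev = 0.5, 0.0
for _ in range(200):
    w, w_prev = caputo_step(w, 2.0 * (w - 3.0), w_prev), w
print(round(w, 2))  # converges close to 3.0
```

Because the factor depends on the most recent step magnitude, the update behaves like gradient descent with an adaptive per-step scaling, which is the reformulation-as-weight-normalization view the abstract mentions.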
6. A cascaded spatiotemporal attention network for dynamic facial expression recognition. Appl Intell 2022. DOI: 10.1007/s10489-022-03781-0.
7. Learning Robust Shape-Indexed Features for Facial Landmark Detection. Appl Sci (Basel) 2022. DOI: 10.3390/app12125828.
Abstract
In facial landmark detection, extracting shape-indexed features is widely used to impose a shape constraint over landmarks. Commonly, these methods crop shape-indexed patches surrounding the landmarks of a given initial shape. All landmarks are then detected jointly from these patches, with the shape constraint naturally embedded in the regressor. However, two remaining challenges degrade these methods. First, the initial shape may deviate seriously from the ground truth under large poses, introducing considerable noise into the shape-indexed features. Second, extracting local patch features is vulnerable to occlusion, since facial context information is missing under severe occlusion. To address these issues, this paper proposes a facial landmark detection algorithm named Sparse-To-Dense Network (STDN). First, STDN employs a lightweight network to detect sparse facial landmarks and form a reinitialized shape, which efficiently improves the quality of the cropped patches under large poses. Then, a group-relational module exploits the inherent geometric relations of the face, further strengthening the shape constraint against occlusion. Our method achieves 4.64% mean error with a 1.97% failure rate on the COFW68 dataset, 3.48% mean error with a 0.43% failure rate on the 300W dataset, and 7.12% mean error with an 11.61% failure rate on the Masked 300W dataset. These results demonstrate that STDN achieves outstanding performance compared with state-of-the-art methods, especially on occlusion datasets.
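The shape-indexed patch extraction that these methods share can be sketched as below. This is illustrative NumPy only; the patch size and border-clamping policy are assumptions, not STDN's actual settings.

```python
import numpy as np

def crop_patches(image, landmarks, size=8):
    # Crop one square patch per landmark of the current shape estimate,
    # clamping to the image border so every patch keeps the full size.
    h, w = image.shape[:2]
    half = size // 2
    patches = []
    for x, y in np.asarray(landmarks, dtype=int):
        x0 = int(np.clip(x - half, 0, w - size))
        y0 = int(np.clip(y - half, 0, h - size))
        patches.append(image[y0:y0 + size, x0:x0 + size])
    return np.stack(patches)

img = np.arange(32 * 32, dtype=float).reshape(32, 32)
pts = np.array([[4, 4], [30, 30]])     # (x, y) landmark estimates
print(crop_patches(img, pts).shape)    # (2, 8, 8)
```

The quality of these crops is exactly what degrades when the initial shape estimate is poor, which is why a coarse sparse-landmark pass before cropping helps.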
8. Zhou L, Wang Y, Lei B, Yang W. Regional Self-Attention Convolutional Neural Network for Facial Expression Recognition. Int J Pattern Recogn 2022. DOI: 10.1142/s0218001422560134.
9. Maithri M, Raghavendra U, Gudigar A, Samanth J, Murugappan M, Chakole Y, Acharya UR. Automated emotion recognition: Current trends and future perspectives. Comput Methods Programs Biomed 2022;215:106646. PMID: 35093645; DOI: 10.1016/j.cmpb.2022.106646.
Abstract
BACKGROUND: Human emotions greatly affect a person's actions. Automated emotion recognition has applications in multiple domains such as health care, e-learning, and surveillance. The development of computer-aided diagnosis (CAD) tools has led to the automated recognition of human emotions.
OBJECTIVE: This review provides an insight into the various methods employing electroencephalogram (EEG), facial, and speech signals, coupled with multi-modal emotion recognition techniques. We have reviewed most of the state-of-the-art papers published on this topic.
METHOD: This study considered the emotion recognition (ER) models proposed between 2016 and 2021. The papers were analysed by method employed, classifier used, and performance obtained.
RESULTS: There is a significant rise in the application of deep learning techniques for ER; they have been widely applied to EEG, speech, facial expression, and multimodal features to develop accurate ER models.
CONCLUSION: Most of the proposed machine and deep learning-based systems yield good performance for automated ER in a controlled environment; however, high performance is still needed for ER in uncontrolled environments.
Affiliation(s)
- M Maithri
- Department of Mechatronics, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- U Raghavendra
- Department of Instrumentation and Control Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Anjan Gudigar
- Department of Instrumentation and Control Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Jyothi Samanth
- Department of Cardiovascular Technology, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
- Murugappan Murugappan
- Department of Electronics and Communication Engineering, Kuwait College of Science and Technology, 13133, Kuwait
- Yashas Chakole
- Department of Instrumentation and Control Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- U Rajendra Acharya
- School of Engineering, Ngee Ann Polytechnic, Clementi 599489, Singapore
- Department of Biomedical Informatics and Medical Engineering, Asia University, Taichung, Taiwan
- Department of Biomedical Engineering, School of Science and Technology, SUSS University, Singapore
10. Kumari N, Bhatia R. Efficient facial emotion recognition model using deep convolutional neural network and modified joint trilateral filter. Soft Comput 2022. DOI: 10.1007/s00500-022-06804-7.
11. Xu L, Diao Z, Wei Y. Non-linear target trajectory prediction for robust visual tracking. Appl Intell 2021. DOI: 10.1007/s10489-021-02829-x.
12. Wang Y, Lu T, Zhang Y, Fang W, Wu Y, Wang Z. Cross-task feature alignment for seeing pedestrians in the dark. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.07.096.
13. Liu T, Wang J, Yang B, Wang X. NGDNet: Nonuniform Gaussian-label distribution learning for infrared head pose estimation and on-task behavior understanding in the classroom. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.12.090.