1
Chen D, Wen G, Li H, Yang P, Chen C, Wang B. CDGT: Constructing diverse graph transformers for emotion recognition from facial videos. Neural Netw 2024;179:106573. [PMID: 39096753] [DOI: 10.1016/j.neunet.2024.106573]
Abstract
Recognizing expressions from dynamic facial videos captures more natural human affective states, but it becomes more challenging in real-world scenes due to face pose variations, partial occlusions, and subtle dynamic changes in emotion sequences. Existing transformer-based methods often rely on self-attention to model the global relations among spatial or temporal features, and thus cannot adequately attend to the important expression-related locality structures in both the spatial and temporal features of in-the-wild expression videos. To this end, we incorporate diverse graph structures into transformers and propose CDGT, a method that constructs diverse graph transformers for efficient emotion recognition from in-the-wild videos. Specifically, our method contains a spatial dual-graph transformer and a temporal hyperbolic-graph transformer. The former deploys dual-graph constrained attention to capture latent emotion-related graph geometry structures among local spatial tokens for efficient feature representation, especially for video frames with pose variations and partial occlusions. The latter adopts hyperbolic-graph constrained self-attention, which explores important temporal graph structure information in hyperbolic space to model more subtle changes of dynamic emotion. Extensive experimental results on in-the-wild video-based facial expression databases show that the proposed CDGT outperforms other state-of-the-art methods.
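The paper's dual-graph and hyperbolic-graph constrained attention mechanisms are specific to CDGT and are not reproduced here, but the underlying idea of restricting self-attention to a graph structure can be sketched generically. In the following PyTorch fragment, the function name and the shared-projection shortcut are illustrative assumptions rather than the authors' code; attention scores are masked with a token adjacency matrix so that each local facial token attends only to its graph neighbors:

```python
import torch

def graph_constrained_attention(x, adj, num_heads=4):
    """Self-attention restricted to the edges of a token graph (sketch).

    x:   (batch, tokens, dim) local token features, e.g. facial patches;
         dim must be divisible by num_heads
    adj: (tokens, tokens) binary adjacency; adj[i, j] = 1 lets token i
         attend to token j (include self-loops to avoid empty rows)
    """
    b, n, d = x.shape
    h = d // num_heads
    # A real model would use learned Wq/Wk/Wv projections; shared here for brevity.
    q = k = v = x.view(b, n, num_heads, h).transpose(1, 2)   # (b, heads, n, h)
    scores = (q @ k.transpose(-2, -1)) / h ** 0.5            # (b, heads, n, n)
    scores = scores.masked_fill(adj == 0, float("-inf"))     # keep graph edges only
    attn = scores.softmax(dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, n, d)
```

A hyperbolic variant would additionally map token features into hyperbolic space (for example, the Poincaré ball) before computing similarities, which the paper argues better preserves the structure of subtle temporal emotion changes.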
Affiliation(s)
- Dongliang Chen
- School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China.
- Guihua Wen
- School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China.
- Huihui Li
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China.
- Pei Yang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China.
- Chuyun Chen
- The Affiliated Traditional Chinese Medicine Hospital of Guangzhou Medical University, Guangzhou 510140, China.
- Bao Wang
- The Affiliated Traditional Chinese Medicine Hospital of Guangzhou Medical University, Guangzhou 510140, China.
2
Munsif M, Sajjad M, Ullah M, Tarekegn AN, Cheikh FA, Tsakanikas P, Muhammad K. Optimized efficient attention-based network for facial expressions analysis in neurological health care. Comput Biol Med 2024;179:108822. [PMID: 38986286] [DOI: 10.1016/j.compbiomed.2024.108822]
Abstract
Facial expression analysis (FEA) plays a vital role in diagnosing and treating early-stage neurological disorders (NDs) such as Alzheimer's and Parkinson's disease. Manual FEA is hindered by expertise, time, and training requirements, while automatic methods are hampered by the unavailability of real patient data, high computational cost, and irrelevant feature extraction. To address these challenges, this paper proposes a novel approach: an efficient, lightweight convolutional block attention module (CBAM)-based deep learning network (DLN) to aid doctors in diagnosing ND patients. The method comprises two stages: data collection from real ND patients, and pre-processing involving face detection followed by an attention-enhanced DLN for feature extraction and refinement. Extensive experiments with validation on real patient data show compelling performance, achieving an accuracy of up to 73.2%. Despite its efficacy, the proposed model is lightweight, occupying only 3 MB, making it suitable for deployment on resource-constrained mobile healthcare devices. Moreover, the method exhibits significant advances over existing FEA approaches, holding promise for effectively diagnosing and treating ND patients. By accurately recognizing emotions and extracting relevant features, this approach supports medical professionals in early ND detection and management, overcoming the limitations of manual analysis and heavy models. In conclusion, this research presents a significant step forward in FEA, promising to enhance ND diagnosis and care. The code and data used in this work are available at: https://github.com/munsif200/Neurological-Health-Care.
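The authors' full network is available at the GitHub link above; for orientation, the CBAM building block the method is named after (Woo et al., 2018) applies channel attention followed by spatial attention, roughly as in this PyTorch sketch of the generic module, not the paper's tuned variant:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: conv over channel-wise average and max maps.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                    # (b, c)
        mx = self.mlp(x.amax(dim=(2, 3)))                     # (b, c)
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)      # channel attention
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)       # (b, 2, H, W)
        return x * torch.sigmoid(self.conv(s))                # spatial attention
```

Because the attention path adds only a small MLP and a single convolution, such a block contributes little to model size, which is consistent with the 3 MB footprint reported above.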
Affiliation(s)
- Muhammad Sajjad
- Digital Image Processing Lab, Department of Computer Science, Islamia College, Peshawar, 25000, Pakistan; Department of Computer Science, Norwegian University of Science and Technology, 2815 Gjøvik, Norway
- Mohib Ullah
- Intelligent Systems and Analytics Research Group (ISA), Department of Computer Science, Norwegian University of Science and Technology, 2815 Gjøvik, Norway
- Adane Nega Tarekegn
- Department of Computer Science, Norwegian University of Science and Technology, 2815 Gjøvik, Norway
- Faouzi Alaya Cheikh
- Department of Computer Science, Norwegian University of Science and Technology, 2815 Gjøvik, Norway
- Panagiotis Tsakanikas
- Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Athens, Greece
- Khan Muhammad
- Visual Analytics for Knowledge Laboratory (VIS2KNOW Lab), Department of Applied Artificial Intelligence, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul 03063, Republic of Korea.
3
Tareke TW, Leclerc S, Vuillemin C, Buffier P, Crevisy E, Nguyen A, Monnier Meteau MP, Legris P, Angiolini S, Lalande A. Automatic Classification of Nodules from 2D Ultrasound Images Using Deep Learning Networks. J Imaging 2024;10:203. [PMID: 39194992] [DOI: 10.3390/jimaging10080203]
Abstract
OBJECTIVE: In clinical practice, thyroid nodules are typically evaluated visually by expert physicians using 2D ultrasound images, and a fine needle aspiration (FNA) may be recommended based on this assessment. However, visual classification of thyroid nodules from ultrasound images may lead to unnecessary FNAs for patients. The aim of this study is to develop an automatic thyroid ultrasound image classification system to prevent unnecessary FNAs. METHODS: An automatic computer-aided artificial intelligence system is proposed for classifying thyroid nodules using a fine-tuned deep learning model based on the DenseNet architecture with an attention module. The dataset comprises 591 thyroid nodule images categorized according to the Bethesda score, with nodules classified as either requiring FNA or not. The challenges in this task include variability in image quality, artifacts in ultrasound image datasets, class imbalance, and model interpretability. We employed data augmentation, class weighting, and gradient-weighted class activation maps (Grad-CAM) to enhance model performance and provide insight into decision making. RESULTS: Our approach achieved excellent results, with an average accuracy of 0.94, F1-score of 0.93, and sensitivity of 0.96. Grad-CAM gives insight into the decision making and thereby reinforces the reliability of the binary classification from the end-user's perspective. CONCLUSIONS: We propose a deep learning architecture that effectively classifies thyroid nodules as requiring FNA or not from ultrasound images. Despite challenges related to image variability, class imbalance, and interpretability, our method demonstrated high classification accuracy with minimal false negatives, showing its potential to reduce unnecessary FNAs in clinical settings.
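Two of the remedies named in the methods, class weighting and fine-tuning a DenseNet, are standard and easy to illustrate. The PyTorch sketch below uses a hypothetical class split (the paper reports 591 images in total but the per-class counts here are illustrative), and the paper's attention module and exact training recipe are not reproduced:

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical per-class image counts for the binary FNA / no-FNA task
# (illustrative split of the 591 images; not the paper's actual counts).
counts = torch.tensor([440.0, 151.0])
class_weights = counts.sum() / (2.0 * counts)   # inverse-frequency weighting

# Fine-tune an ImageNet-pretrained DenseNet with a two-class head.
model = models.densenet121(weights="IMAGENET1K_V1")
model.classifier = nn.Linear(model.classifier.in_features, 2)

criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Grad-CAM visualizations, as used in the paper, can then be produced by hooking the activations and gradients of the last dense block and weighting each feature map by its pooled gradient.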
Affiliation(s)
- Tewele W Tareke
- ICMUB Laboratory, UMR CNRS 6302, University of Burgundy, 7 Bld Jeanne d'Arc, 21000 Dijon, France
- Sarah Leclerc
- ICMUB Laboratory, UMR CNRS 6302, University of Burgundy, 7 Bld Jeanne d'Arc, 21000 Dijon, France
- Perrine Buffier
- Department of Endocrinology-Diabetology, University Hospital, 21000 Dijon, France
- Elodie Crevisy
- Department of Endocrinology-Diabetology, University Hospital, 21000 Dijon, France
- Amandine Nguyen
- Department of Endocrinology-Diabetology, University Hospital, 21000 Dijon, France
- Pauline Legris
- Department of Endocrinology-Diabetology, University Hospital, 21000 Dijon, France
- Serge Angiolini
- Medical Imaging Department, Hospital of Bastia, 20600 Bastia, France
- Alain Lalande
- ICMUB Laboratory, UMR CNRS 6302, University of Burgundy, 7 Bld Jeanne d'Arc, 21000 Dijon, France
- Department of Medical Imaging, University Hospital of Dijon, 21000 Dijon, France
4
Joseph CW, Kathrine GJW, Vimal S, Sumathi S, Pelusi D, Valencia XPB, Verdú E. Improved optimizer with deep learning model for emotion detection and classification. Math Biosci Eng 2024;21:6631-6657. [PMID: 39176412] [DOI: 10.3934/mbe.2024290]
Abstract
Facial emotion recognition (FER) is widely used to analyze human emotion for many real-time applications such as human-computer interfaces, emotion detection, forensics, biometrics, and human-robot collaboration. Nonetheless, existing methods are mostly unable to offer correct predictions with a minimal error rate. In this paper, an innovative facial emotion recognition framework, termed extended walrus-based deep learning with Botox feature selection network (EWDL-BFSN), is designed to detect facial emotions accurately. The main goals of EWDL-BFSN are to identify facial emotions automatically and effectively by choosing the optimal features and tuning the hyperparameters of the classifier. The gradient wavelet anisotropic filter (GWAF) is used for image pre-processing in the EWDL-BFSN model, and SqueezeNet is used to extract significant features. The improved Botox optimization algorithm (IBoA) is then used to choose the best features. Lastly, FER and classification are accomplished with an enhanced optimization-based kernel residual 50 (EK-ResNet50) network, while a nature-inspired metaheuristic, the walrus optimization algorithm (WOA), is utilized to pick the hyperparameters of the EK-ResNet50 network. The EWDL-BFSN model was trained and tested on the publicly available CK+ and FER-2013 datasets. Implementation used the Python platform, and performance metrics such as accuracy, sensitivity, specificity, and F1-score were compared with state-of-the-art methods. The proposed EWDL-BFSN model achieved overall accuracies of 99.37% and 99.25% on the CK+ and FER-2013 datasets, respectively, demonstrating its superiority in predicting facial emotions over state-of-the-art methods.
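The IBoA and WOA components are bespoke metaheuristics whose update rules are not given in this abstract. As a stand-in, the sketch below extracts pooled SqueezeNet features and scores random binary feature masks by cross-validated accuracy, which captures the wrapper-style role the feature selector plays; all names, the search strategy, and the logistic-regression scorer are illustrative assumptions, not the paper's method:

```python
import numpy as np
import torch
from torchvision import models
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# SqueezeNet as a frozen feature extractor (the paper's GWAF pre-processing
# and EK-ResNet50 classifier are not reproduced here).
backbone = models.squeezenet1_1(weights="IMAGENET1K_V1").features.eval()

def extract_features(batch):
    """batch: (n, 3, 224, 224) float tensor -> (n, 512) pooled features."""
    with torch.no_grad():
        fmap = backbone(batch)               # (n, 512, 13, 13)
    return fmap.mean(dim=(2, 3)).numpy()     # global average pooling

def select_features(X, y, iters=50, seed=0):
    """Toy wrapper-style selector: score random binary masks by 3-fold
    cross-validated accuracy; a stand-in for the paper's IBoA search."""
    rng = np.random.default_rng(seed)
    best_mask, best_score = None, -1.0
    for _ in range(iters):
        mask = rng.random(X.shape[1]) < 0.5
        if not mask.any():
            continue
        clf = LogisticRegression(max_iter=1000)
        score = cross_val_score(clf, X[:, mask], y, cv=3).mean()
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask
```

A population-based metaheuristic like IBoA or WOA would replace the random sampling with guided updates of candidate masks or hyperparameters, but the scoring loop has the same shape.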
Affiliation(s)
- C Willson Joseph
- Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India
- Department of Computer Science and Engineering, Adi Shankara Institute of Engineering and Technology, Kerala, India
- G Jaspher Willsie Kathrine
- Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India
- Shanmuganathan Vimal
- Department of Artificial Intelligence and Data Science, Sri Eshwar College of Engineering, Coimbatore, India
- S Sumathi
- Department of CSE (Artificial Intelligence and Machine Learning), Sri Eshwar College of Engineering, Coimbatore, India
- Danilo Pelusi
- Communication Sciences, University of Teramo, Coste Sant'Agostino Campus, Teramo, Italy
- Elena Verdú
- Universidad Internacional de La Rioja, Logroño, La Rioja, Spain
5
Niu Z, Deng Z, Gao W, Bai S, Gong Z, Chen C, Rong F, Li F, Ma L. FNeXter: A Multi-Scale Feature Fusion Network Based on ConvNeXt and Transformer for Retinal OCT Fluid Segmentation. Sensors (Basel) 2024;24:2425. [PMID: 38676042] [PMCID: PMC11054479] [DOI: 10.3390/s24082425]
Abstract
The accurate segmentation and quantification of retinal fluid in optical coherence tomography (OCT) images are crucial for the diagnosis and treatment of ophthalmic diseases such as age-related macular degeneration. However, accurate segmentation of retinal fluid is challenging due to significant variations in the size, position, and shape of fluid regions, as well as their complex, curved boundaries. To address these challenges, we propose FNeXter, a novel multi-scale feature fusion attention network based on ConvNeXt and Transformer, for OCT fluid segmentation. In FNeXter, we introduce a novel global multi-scale hybrid encoder module that integrates ConvNeXt, Transformer, and region-aware spatial attention. This module captures long-range dependencies and non-local similarities while also focusing on local features, and its spatial region-aware capability enables it to focus adaptively on lesion regions. Additionally, we propose a novel self-adaptive multi-scale feature fusion attention module to enhance the skip connections between the encoder and the decoder, elevating the model's capacity to learn global features and multi-scale contextual information effectively. Finally, we conduct comprehensive experiments to evaluate the proposed FNeXter. Experimental results demonstrate that our approach outperforms other state-of-the-art methods on the fluid segmentation task.
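The exact fusion module is the paper's contribution and is not specified in this abstract. To make the skip-connection idea concrete, here is a generic attention-gated fusion of an encoder skip feature with upsampled decoder features in PyTorch; the class name, equal channel counts, and gating design are illustrative assumptions, not FNeXter's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionGate(nn.Module):
    """Toy attention-weighted fusion of an encoder skip feature with
    upsampled decoder features; a stand-in for FNeXter's fusion module.
    Assumes skip and decoder tensors have the same channel count."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )
        self.mix = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, skip, decoder):
        # Upsample decoder features to the skip connection's resolution.
        decoder = F.interpolate(decoder, size=skip.shape[-2:],
                                mode="bilinear", align_corners=False)
        a = self.gate(torch.cat([skip, decoder], dim=1))   # spatial attention map
        fused = torch.cat([a * skip, decoder], dim=1)      # re-weighted skip
        return self.mix(fused)
```

Applying such a gate at several encoder scales lets the decoder weight fluid-relevant regions of each skip feature instead of concatenating them uniformly.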
Affiliation(s)
- Lan Ma
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
6
Jamali R, Generosi A, Villafan JY, Mengoni M, Pelagalli L, Battista G, Martarelli M, Chiariotti P, Mansi SA, Arnesano M, Castellini P. Facial Expression Recognition for Measuring Jurors' Attention in Acoustic Jury Tests. Sensors (Basel) 2024;24:2298. [PMID: 38610510] [PMCID: PMC11014261] [DOI: 10.3390/s24072298]
Abstract
The perception of sound greatly impacts users' emotional states, expectations, affective relationships with products, and purchase decisions. Consequently, assessing the perceived quality of sounds through jury testing is crucial in product design. However, the subjective nature of jurors' responses may limit the accuracy and reliability of jury test outcomes. This research explores the utility of facial expression analysis in jury testing to enhance response reliability and mitigate subjectivity. Several quantitative indicators validate the research hypothesis, such as the correlation between jurors' emotional responses and valence values, the accuracy of the jury tests, and the disparities between jurors' questionnaire responses and the emotions measured by facial expression recognition (FER). Specifically, analysis across different juror states reveals a discernible decrease in attention, with 70 percent of jurors exhibiting reduced attention in the 'distracted' state and 62 percent in the 'heavy-eyed' state. Regression analysis shows that the correlation between jurors' valence and their choices in the jury test increases when only data from attentive jurors are considered. This correlation highlights the potential of facial expression analysis as a reliable tool for assessing juror engagement. The findings suggest that integrating facial expression recognition can enhance the accuracy of jury testing in product design by providing a more dependable assessment of user responses and deeper insights into participants' reactions to auditory stimuli.
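The key quantitative check, that the valence-choice correlation rises when only attentive jurors are kept, amounts to a Pearson correlation computed before and after filtering on an attention flag. A minimal sketch with made-up illustrative numbers (not the study's data; variable names are assumptions):

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative per-response records: FER valence score, jury-test choice,
# and an attention flag from the FER pipeline (values invented for the sketch).
valence   = np.array([0.8, -0.2, 0.5, 0.1, -0.6, 0.7, -0.4, 0.3])
choice    = np.array([1,    0,   1,   0,   0,    1,   1,    0])
attentive = np.array([True, True, False, True, True, True, False, True])

r_all, _ = pearsonr(valence, choice)                         # all responses
r_att, _ = pearsonr(valence[attentive], choice[attentive])   # attentive only
print(f"valence-choice correlation, all responses:  {r_all:+.2f}")
print(f"valence-choice correlation, attentive only: {r_att:+.2f}")
```

In the study's framing, a higher correlation on the attentive subset indicates that FER-derived attention flags isolate the responses on which jurors' stated choices track their measured emotional reactions.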
Affiliation(s)
- Reza Jamali
- Department of Industrial Engineering and Mathematical Sciences, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy; (R.J.); (J.Y.V.); (M.M.); (L.P.); (M.M.); (P.C.)
- Andrea Generosi
- Department of Industrial Engineering and Mathematical Sciences, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy
- Josè Yuri Villafan
- Department of Industrial Engineering and Mathematical Sciences, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy
- Maura Mengoni
- Department of Industrial Engineering and Mathematical Sciences, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy
- Leonardo Pelagalli
- Department of Industrial Engineering and Mathematical Sciences, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy
- Gianmarco Battista
- Department of Engineering and Architecture, Università di Parma, Parco Area delle Scienze 181/A, 43124 Parma, Italy
- Milena Martarelli
- Department of Industrial Engineering and Mathematical Sciences, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy
- Paolo Chiariotti
- Department of Mechanical Engineering, Politecnico di Milano, Via Privata Giuseppe La Masa, 1, 20156 Milano, Italy
- Silvia Angela Mansi
- Università Telematica eCampus, via Isimbardi 10, 22060 Novedrate, Italy
- Marco Arnesano
- Università Telematica eCampus, via Isimbardi 10, 22060 Novedrate, Italy
- Paolo Castellini
- Department of Industrial Engineering and Mathematical Sciences, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy