1
Li S, Wang J, Tian L, Wang J, Huang Y. A fine-grained human facial key feature extraction and fusion method for emotion recognition. Sci Rep 2025; 15:6153. [PMID: 39979500] [PMCID: PMC11842553] [DOI: 10.1038/s41598-025-90440-2]
Abstract
Emotion, a fundamental mapping of human responses to external stimuli, has been extensively studied in human-computer interaction, particularly in areas such as intelligent cockpits and systems. However, accurately recognizing emotions from facial expressions remains a significant challenge due to lighting conditions, posture, and micro-expressions. Emotion recognition using global or local facial features is a key research direction, but relying solely on either often yields models with uneven attention across facial features, neglecting the key variations critical for detecting emotional changes. This paper proposes a method for modeling and extracting key facial features by integrating global and local facial data. First, we construct a comprehensive image preprocessing model that includes super-resolution processing, lighting and shading processing, and texture enhancement; this preprocessing step significantly enriches the expression of image features. Second, a global facial feature recognition model is developed using an encoder-decoder architecture, which effectively eliminates environmental noise and generates a comprehensive global feature dataset for facial analysis. Simultaneously, a Haar cascade classifier extracts refined features from key facial regions, including the eyes, mouth, and overall face, producing a corresponding local feature dataset. Finally, a two-branch convolutional neural network integrates the global and local feature datasets: the global branch fully characterizes the face as a whole, the local branch focuses on regional details, and an adaptive fusion module combines the two, enhancing the model's ability to differentiate subtle emotional changes.
To evaluate the accuracy and robustness of the model, we train and test it on the FER-2013 and JAFFE emotion datasets, achieving average accuracies of 80.59% and 97.61%, respectively. Compared with existing state-of-the-art models, the refined facial feature extraction and fusion model demonstrates superior emotion recognition performance. Comparative analysis also shows that emotional features are similar across different faces. Building on psychological research, we regroup the datasets into three emotion classes: positive, neutral, and negative. Recognition accuracy improves significantly under this new classification criterion, and a self-built dataset further validates that the approach has important implications for practical applications.
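The two-branch design with adaptive fusion described above can be illustrated with a minimal numpy sketch. All shapes, the sigmoid gating formulation, and the seven-class head are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_fusion(global_feat, local_feat, W_gate, b_gate):
    """A per-dimension gate in (0, 1) decides how much of each branch to keep."""
    both = np.concatenate([global_feat, local_feat], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(both @ W_gate + b_gate)))
    return gate * global_feat + (1.0 - gate) * local_feat

rng = np.random.default_rng(0)
g = rng.standard_normal((4, 128))            # global-branch embeddings for 4 faces
l = rng.standard_normal((4, 128))            # local-branch embeddings (eye/mouth crops)
W = 0.01 * rng.standard_normal((256, 128))   # gate weights (untrained, illustrative)
b = np.zeros(128)
fused = adaptive_fusion(g, l, W, b)
probs = softmax(fused @ rng.standard_normal((128, 7)))  # 7 emotion classes
```

Because the gate is a convex combination, each fused coordinate always stays between the two branch values, which is one simple way to realize "adaptive" weighting of global versus local evidence.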
Affiliation(s)
- Shiwei Li
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China.
- Key Laboratory of Railway Industry on Plateau Railway Transportation Intelligent Management and Control, Lanzhou, 730070, China.
- Jisen Wang
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Linbo Tian
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Jianqiang Wang
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Key Laboratory of Railway Industry on Plateau Railway Transportation Intelligent Management and Control, Lanzhou, 730070, China
- Yan Huang
- School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China
- Key Laboratory of Railway Industry on Plateau Railway Transportation Intelligent Management and Control, Lanzhou, 730070, China
2
Mahavar A, Patel A, Patel A. A Comprehensive Review on Deep Learning Techniques in Alzheimer's Disease Diagnosis. Curr Top Med Chem 2025; 25:335-349. [PMID: 38847164] [DOI: 10.2174/0115680266310776240524061252]
Abstract
Alzheimer's Disease (AD) is a serious neurological illness that causes gradual memory loss by destroying brain cells. This deadly brain illness primarily strikes the elderly, impairing their cognitive and bodily abilities until brain shrinkage occurs. Modern techniques are required for an accurate diagnosis of AD. Machine learning has gained traction in the medical field as a means of determining a person's risk of developing AD in its early stages. Deep Learning (DL), one of the most advanced soft-computing, neural-network-based methodologies, has garnered significant interest among researchers for automating early-stage AD diagnosis. Hence, a comprehensive review is necessary to gain insights into DL techniques and to advance more effective methods for diagnosing AD. This review explores multiple biomarkers associated with AD and various DL methodologies, including Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), k-nearest neighbors (k-NN), Deep Boltzmann Machines (DBM), and Deep Belief Networks (DBN), which have been employed to automate early AD diagnosis. The unique contributions of this review include the classification of ATN biomarkers for AD, a systematic description of diverse DL algorithms for early AD assessment, and a discussion of widely used online datasets such as ADNI and OASIS. Additionally, this review offers perspectives on future trends derived from a critical evaluation of each variant of DL techniques across modalities, dataset sources, AUC values, and accuracies.
Affiliation(s)
- Anjali Mahavar
- Chandaben Mohanbhai Patel Institute of Computer Application, Charotar University of Science and Technology, CHARUSAT-Campus, Changa, 388421, Anand, Gujarat, India
- Atul Patel
- Chandaben Mohanbhai Patel Institute of Computer Application, Charotar University of Science and Technology, CHARUSAT-Campus, Changa, 388421, Anand, Gujarat, India
- Ashish Patel
- Ramanbhai Patel College of Pharmacy, Charotar University of Science and Technology, CHARUSAT- Campus, Changa, 388421, Anand, Gujarat, India
3
Chen L, Li M, Wu M, Pedrycz W, Hirota K. Coupled Multimodal Emotional Feature Analysis Based on Broad-Deep Fusion Networks in Human-Robot Interaction. IEEE Trans Neural Netw Learn Syst 2024; 35:9663-9673. [PMID: 37021991] [DOI: 10.1109/tnnls.2023.3236320]
Abstract
A coupled multimodal emotional feature analysis (CMEFA) method based on broad-deep fusion networks, which divides multimodal emotion recognition into two layers, is proposed. First, facial and gesture emotional features are extracted using the broad and deep learning fusion network (BDFN). Because the two modalities are not completely independent of each other, canonical correlation analysis (CCA) is used to analyze and extract the correlation between the emotion features, and a coupling network is established for emotion recognition of the extracted bi-modal features. Both simulation and application experiments were completed. In simulation experiments on the bimodal face and body gesture database (FABO), the recognition rate of the proposed method increased by 1.15% over that of support vector machine recursive feature elimination (SVMRFE) (without considering the unbalanced contribution of features). Moreover, the multimodal recognition rate of the proposed method is 21.22%, 2.65%, 1.61%, 1.54%, and 0.20% higher than those of the fuzzy deep neural network with sparse autoencoder (FDNNSA), ResNet-101 + GFK, C3D + MCB + DBN, the hierarchical classification fusion strategy (HCFS), and the cross-channel convolutional neural network (CCCNN), respectively. In addition, preliminary application experiments were carried out on our emotional social robot system, in which the robot recognizes the emotions of eight volunteers from their facial expressions and body gestures.
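The CCA step this abstract describes, extracting correlated directions between face and gesture features, can be sketched with a standard SVD-based formulation. The synthetic data, dimensions, and whitening approach here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cca(X, Y, k):
    """Canonical correlation analysis via SVD of the whitened cross-covariance."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Ux, Sx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, Sy, Vyt = np.linalg.svd(Yc, full_matrices=False)
    U, rho, Vt = np.linalg.svd(Ux.T @ Uy)            # rho: canonical correlations
    A = Vxt.T @ np.diag(1.0 / Sx) @ U[:, :k]          # projection for view X
    B = Vyt.T @ np.diag(1.0 / Sy) @ Vt.T[:, :k]       # projection for view Y
    return A, B, rho[:k]

rng = np.random.default_rng(1)
Z = rng.standard_normal((300, 2))                     # shared latent "emotion" factor
X = Z @ rng.standard_normal((2, 6)) + 0.1 * rng.standard_normal((300, 6))  # face features
Y = Z @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((300, 5))  # gesture features
A, B, rho = cca(X, Y, 2)
```

When both views share a strong latent factor, as simulated here, the first canonical correlation is close to 1, which is the property a coupling network can exploit.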
4
Ahmad M, Sanawar S, Alfandi O, Qadri SF, Saeed IA, Khan S, Hayat B, Ahmad A. Facial expression recognition using lightweight deep learning modeling. Math Biosci Eng 2023; 20:8208-8225. [PMID: 37161193] [DOI: 10.3934/mbe.2023357]
Abstract
Facial expression is a form of communication useful in many areas of computer vision, including intelligent visual surveillance, human-robot interaction, and human behavior analysis. A deep learning approach is presented to classify happy, sad, angry, fearful, contemptuous, surprised, and disgusted expressions. Accurate detection and classification of human facial expressions is a critical task in image processing because of challenges such as changes in illumination, occlusion, noise, and over-fitting. A stacked sparse autoencoder for facial expression recognition (SSAE-FER) is used for unsupervised pre-training and supervised fine-tuning. SSAE-FER automatically extracts features from input images, and a softmax classifier is used to classify the expressions. Our method achieved an accuracy of 92.50% on the JAFFE dataset and 99.30% on the CK+ dataset, performing well compared to other methods in the same domain.
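The SSAE-plus-softmax pipeline above can be sketched as a stack of sigmoid encoders with a KL sparsity penalty and a softmax head. Layer sizes, the sparsity target, and the random weights are illustrative assumptions standing in for the pre-trained SSAE-FER parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def kl_sparsity(hidden, rho=0.05):
    """KL penalty pushing mean hidden activation toward the sparsity target rho."""
    rho_hat = np.clip(hidden.mean(axis=0), 1e-6, 1 - 1e-6)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rng = np.random.default_rng(0)
X = rng.random((32, 48 * 48))                        # batch of flattened 48x48 face crops
W1, b1 = 0.01 * rng.standard_normal((48 * 48, 256)), np.zeros(256)
W2, b2 = 0.01 * rng.standard_normal((256, 64)), np.zeros(64)
Wc, bc = 0.01 * rng.standard_normal((64, 7)), np.zeros(7)

h1 = sigmoid(X @ W1 + b1)      # first encoder (pre-trained layer-wise, unsupervised)
h2 = sigmoid(h1 @ W2 + b2)     # stacked second encoder
probs = softmax(h2 @ Wc + bc)  # supervised softmax head over 7 expressions
penalty = kl_sparsity(h1)      # would be added to the reconstruction loss in pre-training
```

In the actual method the encoders are first trained to reconstruct their inputs under this sparsity penalty, then the whole stack is fine-tuned with the softmax cross-entropy.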
Affiliation(s)
- Mubashir Ahmad
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Tobe Camp, Abbottabad-22060, Pakistan
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
- Saira Sanawar
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
- Omar Alfandi
- College of Technological Innovation at Zayed University in Abu Dhabi, UAE
- Syed Furqan Qadri
- Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou 311121, China
- Iftikhar Ahmed Saeed
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
- Salabat Khan
- College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
- Bashir Hayat
- Department of Computer Science, Institute of Management Sciences, Peshawar, Pakistan
- Arshad Ahmad
- Department of IT & CS, Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology (PAF-IAST), Haripur 22620, Pakistan
5
Patch Attention Convolutional Vision Transformer for Facial Expression Recognition with Occlusion. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.068]
6
Kumar Arora T, Kumar Chaubey P, Shree Raman M, Kumar B, Nagesh Y, Anjani PK, Ahmed HMS, Hashmi A, Balamuralitharan S, Debtera B. Optimal Facial Feature Based Emotional Recognition Using Deep Learning Algorithm. Comput Intell Neurosci 2022; 2022:8379202. [PMID: 36177319] [PMCID: PMC9514924] [DOI: 10.1155/2022/8379202]
Abstract
Humans find it simple to identify emotions from facial expressions, but this remains far more difficult for a computer system. Emotion recognition from facial expressions, a subfield of social signal processing, is used in a wide range of contexts, particularly human-computer interaction. Automatic emotion recognition has been the subject of numerous studies, most using a machine learning methodology, yet recognizing even basic emotions such as anger, happiness, contempt, fear, sadness, and surprise remains a difficult problem in computer vision. Deep learning has recently drawn increased attention as a solution to a variety of practical issues, including emotion recognition. In this study, we improve a convolutional neural network (CNN) to identify the seven fundamental emotions and evaluate several preprocessing techniques to demonstrate how they affect CNN performance. This research focuses on improving emotion recognition from facial features and expressions: by identifying the facial expressions that accompany human responses, computers can make more accurate predictions about a person's mental state and provide more tailored responses. Our dataset consists of about 32,298 photos for training and testing and covers multiple facial expressions. The preprocessing stage removes noise from the input image, and the pretraining phase supports face detection after noise removal, including feature extraction. As a result, whereas existing work classifies the seven emotions of the Facial Action Coding System (FACS) without an optimization technique, the proposed approach recognizes the same seven FACS emotions.
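The noise-removal and normalization preprocessing this abstract emphasizes can be sketched in a few lines of numpy. The 3x3 mean filter and min-max scaling are generic stand-ins; the paper's actual preprocessing pipeline is not specified here:

```python
import numpy as np

def box_blur(img):
    """3x3 mean filter as a simple noise-removal step (illustrative only)."""
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def normalize(img):
    """Min-max scale to [0, 1] before feeding a CNN."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(48, 48)).astype(np.float64)  # toy grayscale crop
clean = normalize(box_blur(face))
```

Averaging neighboring pixels suppresses independent pixel noise (it reduces the variance of i.i.d. noise by roughly a factor of 9), and scaling to a fixed range keeps CNN inputs comparable across images.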
Affiliation(s)
- Tarun Kumar Arora
- Professor, Department of Applied Sciences and Humanities, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
- Pavan Kumar Chaubey
- Department of Applied Sciences Engineering, Tula's Institute, Dhoolkot, Dehradun, Uttarakhand, India
- Manju Shree Raman
- Department of Management College of Business & Economics, Debre Tabor University, Debra Tabor, Ethiopia
- Bhupendra Kumar
- College of Business & Economics, Debre Tabor University Ethiopia, Debra Tabor, Ethiopia
- Yagnam Nagesh
- IT Department, Debre Tabor University, Debra Tabor, Ethiopia
- P. K. Anjani
- Department of Management Studies, Sona College of Technology, Salem, TN, India
- Hamed M. S. Ahmed
- Department of Management, College of Business and Economics, Werabe University, Addis Ababa, Ethiopia
- Arshad Hashmi
- Information Systems Department, Faculty of Computing and Information Technology in Rabigh (FCITR), King Abdulaziz University, Jeddah, Saudi Arabia
- S. Balamuralitharan
- Department of Mathematics, Bharath Institute of Higher Education and Research, Bharath Institute of Science and Technology, No. 173 Agharam Road Selaiyur, Chennai 600 073, Tamil Nadu, India
- Baru Debtera
- Department of Chemical Engineering, College of Biological and Chemical Engineering, Addis Ababa Science and Technology University, Addis Ababa, Ethiopia
7
Wang Y, Zhou S, Liu Y, Wang K, Fang F, Qian H. ConGNN: Context-consistent cross-graph neural network for group emotion recognition in the wild. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.08.003]
8
Class-specific weighted broad learning system for imbalanced heartbeat classification. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.074]
9
A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) Database. Multimodal Technol Interact 2022. [DOI: 10.3390/mti6060047]
Abstract
Multimodal human–computer interaction (HCI) systems promise a more human–human-like interaction between machines and humans. Their ability to support unambiguous information exchange makes these systems more reliable, efficient, less error prone, and capable of solving complex tasks. Emotion recognition is a realm of HCI that uses multimodality to achieve accurate and natural results. The widespread use of affective identification in e-learning, marketing, security, health sciences, etc., has increased demand for high-precision emotion recognition systems. Machine learning (ML) is increasingly applied to improve the process by tweaking architectures or using high-quality databases (DBs). This paper presents a survey of the DBs used to develop multimodal emotion recognition (MER) systems. The survey covers DBs that contain multi-channel data such as facial expressions, speech, physiological signals, body movements, gestures, and lexical features. A few unimodal DBs that work in conjunction with other DBs for affect recognition are also discussed. Further, VIRI, a new DB of visible and infrared (IR) images of subjects expressing five emotions in an uncontrolled, real-world environment, is presented, together with a rationale for the superiority of the presented corpus over existing ones.
10
Construction of Home Product Design System Based on Self-Encoder Depth Neural Network. Comput Intell Neurosci 2022; 2022:8331504. [PMID: 35498170] [PMCID: PMC9050303] [DOI: 10.1155/2022/8331504]
Abstract
Traditional home product design systems rely on relatively shallow learning networks, comparatively simple embedded and Internet-of-Things technology, and conventional autoencoder techniques. When combined with deep neural networks, these techniques suffer serious defects in their computer vision algorithms, wasting system storage and computing resources and yielding poor learning efficiency and weak learning ability. This paper therefore builds a home product design system based on an autoencoder deep neural network. By improving the sparsity of the autoencoder during learning and training, and by further optimizing the autoencoder structure within the design system, the performance of the system's deep learning model is improved through the hierarchical features the autoencoder continuously learns during home case design. The optimized system improves the accuracy and stability of its internal feature classifier and the overall performance of the furniture design system. For the system construction itself, ZigBee and embedded technology serve as the design carrier, with simplicity, intelligence, and convenience as the design goals. Experimental results show that the noise level of the proposed home product design system is 4-5 dB lower than that of traditional design systems, and its image classification accuracy is about 4% higher. The proposed home design system therefore has clear advantages.
11
Robust affect analysis using committee of deep convolutional neural networks. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06632-0]
12
Rao Z, Wu J, Zhang F, Tian Z. Psychological and Emotional Recognition of Preschool Children Using Artificial Neural Network. Front Psychol 2022; 12:762396. [PMID: 35211052] [PMCID: PMC8861073] [DOI: 10.3389/fpsyg.2021.762396]
Abstract
An artificial neural network (ANN) is employed to study children's psychological emotion recognition, to fully reflect the psychological status of preschool children and promote their healthy growth. Specifically, the ANN model is used to construct a human physiological signal measurement platform and an emotion recognition platform that measure physiological signals in different psychological and emotional states. The parameter values are then analyzed on the emotion recognition platform to accurately identify children's psychological and emotional states. The experimental results demonstrate that the ability of children aged 4-6 to recognize the three basic emotions of happiness, calm, and fear increases with age, and that there are significant age differences in children's recognition of these emotions. In addition, 4-year-old children perform worse on theory-of-mind tasks than 5- to 6-year-old children, which may be related to more complex cognitive processes. Preschool children are in a stage of rapid emotional development; if they cannot be guided to reasonably identify and deal with emotions at this stage, their education and social development will be significantly affected. This study therefore has significant reference value for preschool children's emotion recognition and guidance and can promote children's emotional processing and mental health.
Affiliation(s)
- Zhangxue Rao
- School of Education, China West Normal University, Nanchong, China
- Jihui Wu
- School of Education, China West Normal University, Nanchong, China
- Fengrui Zhang
- College of Life Science, Sichuan Agricultural University, Yaan, China
- Zhouyu Tian
- School of Economics and Management, Shenyang Institute of Technology, Fushun, China
13
Kumari N, Bhatia R. Efficient facial emotion recognition model using deep convolutional neural network and modified joint trilateral filter. Soft Comput 2022. [DOI: 10.1007/s00500-022-06804-7]
14
Improving Facial Emotion Recognition Using Residual Autoencoder Coupled Affinity Based Overlapping Reduction. Mathematics 2022. [DOI: 10.3390/math10030406]
Abstract
Emotion recognition using facial images has been a challenging task in computer vision, and recent advancements in deep learning have helped achieve better results. Studies have pointed out that multiple facial expressions may be present in facial images of a particular type of emotion. Thus, facial images of one emotion category may resemble those of other categories, leading to overlapping classes in feature space. The class-overlap problem has been studied primarily in the context of imbalanced classes, and only a few studies have considered imbalanced facial emotion recognition. However, to the authors' best knowledge, no study has examined the effects of overlapped classes on emotion recognition. Motivated by this, the current study proposes an affinity-based overlap reduction technique (AFORET) to deal with the overlapped-class problem in facial emotion recognition. First, a residual variational autoencoder (RVA) model transforms the facial images to latent vectors. Next, the proposed AFORET method is applied to these overlapped latent vectors to reduce the overlap between classes. The proposed method is validated by training and testing various well-known classifiers and comparing their performance on a well-known set of performance indicators. In addition, AFORET is compared with existing overlap reduction techniques such as the OSM, ν-SVM, and NBU methods. Experimental results show that the proposed AFORET algorithm, when used with the RVA model, substantially boosts classifier performance in predicting human emotion from facial images.
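The idea of reducing class overlap in a latent space can be sketched with a deliberately simplified stand-in: pull each latent vector that sits closer to another class's centroid back toward its own centroid. This is an illustrative heuristic under assumed data, not the published AFORET algorithm:

```python
import numpy as np

def reduce_overlap(Z, y, alpha=0.5):
    """Pull each latent vector that is closer to another class's centroid back
    toward its own class centroid (a simplified affinity-style reduction)."""
    centroids = {c: Z[y == c].mean(axis=0) for c in np.unique(y)}
    out = Z.copy()
    for i, z in enumerate(Z):
        own = centroids[y[i]]
        nearest_other = min(np.linalg.norm(z - centroids[c])
                            for c in centroids if c != y[i])
        if nearest_other < np.linalg.norm(z - own):   # sample sits in an overlap region
            out[i] = z + alpha * (own - z)
    return out

rng = np.random.default_rng(0)
# two heavily overlapping classes of 8-dimensional "latent vectors"
Z = np.vstack([rng.normal(0.0, 1.5, (60, 8)), rng.normal(1.0, 1.5, (60, 8))])
y = np.array([0] * 60 + [1] * 60)
Z2 = reduce_overlap(Z, y)
```

After the shift, overlapped samples are strictly nearer their own class centroid while non-overlapped samples are untouched, which is the property that makes downstream classifiers easier to train.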
15
Hong A, Lunscher N, Hu T, Tsuboi Y, Zhang X, Franco Dos Reis Alves S, Nejat G, Benhabib B. A Multimodal Emotional Human-Robot Interaction Architecture for Social Robots Engaged in Bidirectional Communication. IEEE Trans Cybern 2021; 51:5954-5968. [PMID: 32149676] [DOI: 10.1109/tcyb.2020.2974688]
Abstract
For social robots to effectively engage in human-robot interaction (HRI), they need to interpret human affective cues and respond appropriately by displaying their own emotional behavior. In this article, we present a novel multimodal emotional HRI architecture to promote natural and engaging bidirectional emotional communication between a social robot and a human user. User affect is detected using a unique combination of body language and vocal intonation, and multimodal classification is performed using a Bayesian network. The Emotionally Expressive Robot uses the user's affect to determine its own emotional behavior via an innovative two-layer emotional model consisting of deliberative (hidden Markov model) and reactive (rule-based) layers. The proposed architecture has been implemented on a small humanoid robot performing diet and fitness counseling during HRI. To evaluate the Emotionally Expressive Robot's effectiveness, a Neutral Robot that can detect user affect but lacks an emotional display was also developed, and a between-subjects HRI experiment was conducted with both types of robots. Extensive results show that both robots can effectively detect user affect during real-time HRI. However, the Emotionally Expressive Robot can appropriately determine its own emotional response to the situation at hand and therefore induces more positive valence and less negative arousal in users than the Neutral Robot.
16
Mi X, Wang S, Shao C, Zhang P, Chen M. Resident travel mode prediction model in Beijing metropolitan area. PLoS One 2021; 16:e0259793. [PMID: 34762681] [PMCID: PMC8588932] [DOI: 10.1371/journal.pone.0259793]
Abstract
With the development of economic integration, Beijing has become more closely connected with its surrounding areas, gradually forming the Beijing metropolitan area (BMA). The authors define the scope of the BMA along the two dimensions of space and time, determining it to be the built-up area of Beijing and its 10 surrounding districts. A designed questionnaire surveyed the personal, family, and travel characteristics of residents of these 10 districts. Statistical analysis of the questionnaires shows that the supply of public transportation is insufficient to meet traffic demand. A travel mode prediction model based on the Softmax regression machine learning algorithm for the BMA (SRBM) is then established. To verify its prediction performance, the Multinomial Logit Model (MNL) and Support Vector Machine (SVM) models are introduced for comparison. The results show that the constructed SRBM model exhibits high prediction accuracy, averaging 88.35%, which is 2.83% and 18.11% higher than the SVM and MNL models, respectively. This article provides new ideas for predicting travel modes in the Beijing metropolitan area.
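Softmax (multinomial logistic) regression of the kind the SRBM model builds on can be sketched end to end in numpy. The toy traveler features, mode labels, and hyperparameters below are illustrative assumptions, not the paper's survey data:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(X, y, n_classes, lr=0.5, epochs=300):
    """Batch gradient descent on the multinomial cross-entropy."""
    Xb = np.hstack([X, np.ones((len(X), 1))])     # append a bias column
    W = np.zeros((Xb.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                      # one-hot targets
    for _ in range(epochs):
        P = softmax(Xb @ W)
        W -= lr * Xb.T @ (P - Y) / len(Xb)        # gradient of cross-entropy
    return W

def predict(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return softmax(Xb @ W).argmax(axis=1)

rng = np.random.default_rng(0)
# toy 2-feature "traveler" clusters for three modes (labels are hypothetical)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)
W = train_softmax_regression(X, y, 3)
accuracy = (predict(W, X) == y).mean()
```

On well-separated clusters like these, the model converges to near-perfect training accuracy; the real survey features are higher-dimensional and noisier, which is where softmax regression's probabilistic outputs help.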
Affiliation(s)
- Xueyu Mi
- Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Ministry of Transport, School of Traffic and Transportation, Beijing Jiaotong University, Beijing, China
- School of Civil Engineering, North China University of Technology, Tangshan, Hebei province, China
- Shengyou Wang
- Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Ministry of Transport, School of Traffic and Transportation, Beijing Jiaotong University, Beijing, China
- Chunfu Shao
- Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Ministry of Transport, School of Traffic and Transportation, Beijing Jiaotong University, Beijing, China
- Peng Zhang
- Engineering General Contracting Department II, Beijing Municipal Road & Bridge Co., Ltd., Beijing, China
- Mingming Chen
- School of Civil Engineering, North China University of Technology, Tangshan, Hebei province, China
17
Wang S, Yuan Y, Zheng X, Lu X. Local and correlation attention learning for subtle facial expression recognition. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.07.120]
18

19
Benouar S, Hafid A, Kedir-Talha M, Seoane F. Classification of impedance cardiography dZ/dt complex subtypes using pattern recognition artificial neural networks. Biomed Tech (Berl) 2021; 66:515-527. [PMID: 34162027] [DOI: 10.1515/bmt-2020-0267]
Abstract
In impedance cardiography (ICG), the detection of dZ/dt (ICG) signal characteristic points, especially the X point, is a crucial step in calculating hemodynamic parameters such as stroke volume (SV) and cardiac output (CO). Unfortunately, for beat-to-beat calculations, detection accuracy is affected by the variability of ICG complex subtypes. In this work, automated classification of ICG complexes is therefore proposed to support the detection of ICG characteristic points and the extraction of hemodynamic parameters across several existing subtypes. A novel pattern recognition artificial neural network (PRANN) approach was implemented, using a divide-and-conquer strategy to identify the five different waveforms of the ICG complex with no more than three output nodes per network. The PRANN was trained, tested, and validated using a dataset from four volunteers measured with eight electrodes. Once training was satisfactory, the deployed network was validated on two other datasets entirely separate from the training dataset; as an additional performance validation, each of these datasets included four volunteers, for a total of eight volunteers. The results show an average accuracy of 96% in classifying ICG complex subtypes, decreasing only to 83% and 80% on the validation datasets. This work indicates that the PRANN is a promising method for automated classification of ICG subtypes, facilitating the extraction of hemodynamic parameters from beat-to-beat dZ/dt complexes.
Affiliation(s)
- Sara Benouar
- Laboratory of Instrumentation, University of Sciences and Technology Houari Boumediene, Algiers, Algeria; Department of Textile Technology, University of Borås, Borås, Sweden
- Abdelakram Hafid
- Laboratory of Instrumentation, University of Sciences and Technology Houari Boumediene, Algiers, Algeria; Department of Textile Technology, University of Borås, Borås, Sweden
- Malika Kedir-Talha
- Laboratory of Instrumentation, University of Sciences and Technology Houari Boumediene, Algiers, Algeria
- Fernando Seoane
- Department for Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden; Department of Medical Technology, Karolinska University Hospital, Stockholm, Sweden; Swedish School of Textiles, University of Borås, Borås, Sweden
|
20
|
Deep transfer learning in human–robot interaction for cognitive and physical rehabilitation purposes. Pattern Anal Appl 2021. [DOI: 10.1007/s10044-021-00988-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
21
|
A Partially Interpretable Adaptive Softmax Regression for Credit Scoring. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11073227] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Credit scoring is the process of determining whether a borrower will succeed or fail in repaying a loan, using the borrower's qualitative and quantitative characteristics. In recent years, machine learning algorithms have been widely studied for the development of credit scoring models. Although efficiently classifying good and bad borrowers is the core objective of a credit scoring model, there is still a need for models that can explain the relationship between input and output. In this work, we propose a novel partially interpretable adaptive softmax (PIA-Soft) regression model that achieves both state-of-the-art predictive performance and marginal interpretability between input and output. We augment softmax regression with neural networks to make it adaptive for each borrower. The PIA-Soft model consists of two main components: a linear part (softmax regression) and a non-linear part (neural network). The linear part explains the fundamental relationship between input and output variables. The non-linear part improves prediction performance by identifying non-linear relationships among features for each borrower. Experimental results on public benchmark datasets show that the proposed model not only outperformed the machine learning baselines but also produced explanations logically related to the real world.
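The two-component idea behind PIA-Soft can be sketched as follows: per-class logits are the sum of an interpretable linear (softmax-regression) term and a non-linear neural-network adjustment. The `nn_adjust` callable stands in for the network, and all names here are illustrative, not the paper's implementation.

```python
import math

def pia_soft_probs(x, weights, bias, nn_adjust):
    """Class probabilities from linear logits plus a per-borrower
    non-linear adjustment, passed through a numerically stable softmax."""
    linear = [sum(w * v for w, v in zip(ws, x)) + b
              for ws, b in zip(weights, bias)]
    logits = [l + a for l, a in zip(linear, nn_adjust(x))]
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

The linear term alone would be an ordinary softmax regression; inspecting `weights` gives the interpretable part, while `nn_adjust` only shifts the logits.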
|
22
|
Chen L, Li M, Su W, Wu M, Hirota K, Pedrycz W. Adaptive Feature Selection-Based AdaBoost-KNN With Direct Optimization for Dynamic Emotion Recognition in Human–Robot Interaction. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2021. [DOI: 10.1109/tetci.2019.2909930] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
23
|
|
24
|
Liu L, Jiang R, Huo J, Chen J. Self-Difference Convolutional Neural Network for Facial Expression Recognition. SENSORS 2021; 21:s21062250. [PMID: 33807088 PMCID: PMC8005141 DOI: 10.3390/s21062250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Revised: 03/14/2021] [Accepted: 03/19/2021] [Indexed: 11/16/2022]
Abstract
Facial expression recognition (FER) is a challenging problem due to the intra-class variation caused by subject identities. In this paper, a self-difference convolutional network (SD-CNN) is proposed to address the intra-class variation issue in FER. First, the SD-CNN uses a conditional generative adversarial network to generate the six typical facial expressions for the same subject as the testing image. Second, six compact and lightweight difference-based CNNs, called DiffNets, are designed for classifying facial expressions. Each DiffNet extracts a pair of deep features from the testing image and one of the six synthesized expression images, and compares the difference between the deep feature pair. In this way, any potential facial expression in the testing image has the opportunity to be compared with the synthesized "self": an image of the same subject with the same facial expression as the testing image. As most of the self-difference features of images with the same facial expression gather tightly in the feature space, the intra-class variation issue is significantly alleviated. The proposed SD-CNN is extensively evaluated on two widely used facial expression datasets, CK+ and Oulu-CASIA. Experimental results demonstrate that the SD-CNN achieves state-of-the-art performance, with accuracies of 99.7% on CK+ and 91.3% on Oulu-CASIA. Moreover, the model size of the online processing part of the SD-CNN is only 9.54 MB (1.59 MB × 6), which enables the SD-CNN to run on low-cost hardware.
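The self-difference idea can be illustrated with a simplified nearest-"self" variant: compare the test image's deep feature against the features of the expressions synthesized for the same subject, and pick the expression whose synthesized image is closest. This is a sketch of the intuition only; the actual SD-CNN classifies via learned DiffNets rather than a raw distance.

```python
def predict_expression(test_feat, synthesized_feats):
    """Pick the expression label whose synthesized-"self" feature has the
    smallest L2 difference from the test feature (features as flat lists)."""
    best_label, best_dist = None, float("inf")
    for label, feat in synthesized_feats.items():
        dist = sum((a - b) ** 2 for a, b in zip(test_feat, feat)) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```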
Affiliation(s)
- Leyuan Liu
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; (L.L.); (R.J.); (J.H.)
- National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
- Rubin Jiang
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; (L.L.); (R.J.); (J.H.)
- Jiao Huo
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; (L.L.); (R.J.); (J.H.)
- Jingying Chen
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; (L.L.); (R.J.); (J.H.)
- National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
- Correspondence: ; Tel.: +86-135-1721-9631
|
25
|
Rawat A, Kumar A, Upadhyay P, Kumar S. Deep learning-based models for temporal satellite data processing: Classification of paddy transplanted fields. ECOL INFORM 2021. [DOI: 10.1016/j.ecoinf.2021.101214] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
26
|
Zhang S, Yu H, Wang T, Dong J, Pham TD. Linearly augmented real-time 4D expressional face capture. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.08.099] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
27
|
Liang X, Xu L, Liu J, Liu Z, Cheng G, Xu J, Liu L. Patch Attention Layer of Embedding Handcrafted Features in CNN for Facial Expression Recognition. SENSORS (BASEL, SWITZERLAND) 2021; 21:833. [PMID: 33513723 PMCID: PMC7865259 DOI: 10.3390/s21030833] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/25/2020] [Revised: 01/13/2021] [Accepted: 01/21/2021] [Indexed: 01/21/2023]
Abstract
Facial expression recognition has attracted much attention due to its broad range of applications in human-computer interaction systems. Although facial representation is crucial to final recognition accuracy, traditional handcrafted representations only reflect shallow characteristics, and it is uncertain whether convolutional layers can extract better ones. In addition, the policy of sharing weights across a whole image is ill-suited to structured face images. To overcome these limitations, a novel method based on patches of interest, the Patch Attention Layer (PAL) of embedding handcrafted features, is proposed to learn the local shallow facial features of each patch on face images. First, a handcrafted feature, the Gabor surface feature (GSF), is extracted by convolving the input face image with a set of predefined Gabor filters. Second, the generated feature is segmented into non-overlapping patches, which capture local shallow features through the strategy of applying different filters to different local patches. The weighted shallow features are then fed into the remaining convolutional layers to capture high-level features. Our method can be applied directly to a static image without facial landmark information, and the preprocessing step is very simple. Experiments on four databases show that our method achieved very competitive performance (Extended Cohn-Kanade database (CK+): 98.93%; Oulu-CASIA: 97.57%; Japanese Female Facial Expressions database (JAFFE): 93.38%; and RAF-DB: 86.8%) compared with other state-of-the-art methods.
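The patch segmentation step described above (splitting a feature map into non-overlapping patches before per-patch weighting) can be sketched as a minimal helper, assuming the image dimensions divide evenly by the patch size:

```python
def split_into_patches(image, ph, pw):
    """Segment a 2-D map (list of rows) into non-overlapping ph x pw patches,
    returned in row-major order."""
    rows, cols = len(image), len(image[0])
    return [
        [row[c:c + pw] for row in image[r:r + ph]]
        for r in range(0, rows, ph)
        for c in range(0, cols, pw)
    ]
```

For a 4x4 map with 2x2 patches this yields four patches, each of which could then receive its own attention weight and filter.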
Affiliation(s)
- Xingcan Liang
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; (X.L.); (J.L.); (Z.L.); (G.C.)
- University of Science and Technology of China, Hefei 230026, China; (J.X.); (L.L.)
- Linsen Xu
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; (X.L.); (J.L.); (Z.L.); (G.C.)
- Anhui Province Key Laboratory of Biomimetic Sensing and Advanced Robot Technology, Hefei 230031, China
- Jinfu Liu
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; (X.L.); (J.L.); (Z.L.); (G.C.)
- Zhipeng Liu
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; (X.L.); (J.L.); (Z.L.); (G.C.)
- University of Science and Technology of China, Hefei 230026, China; (J.X.); (L.L.)
- Gaoxin Cheng
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; (X.L.); (J.L.); (Z.L.); (G.C.)
- University of Science and Technology of China, Hefei 230026, China; (J.X.); (L.L.)
- Jiajun Xu
- University of Science and Technology of China, Hefei 230026, China; (J.X.); (L.L.)
- Lei Liu
- University of Science and Technology of China, Hefei 230026, China; (J.X.); (L.L.)
|
28
|
Ramis S, Buades JM, Perales FJ. Using a Social Robot to Evaluate Facial Expressions in the Wild. SENSORS 2020; 20:s20236716. [PMID: 33255347 PMCID: PMC7727691 DOI: 10.3390/s20236716] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Revised: 11/20/2020] [Accepted: 11/20/2020] [Indexed: 11/22/2022]
Abstract
In this work, an affective computing approach is used to study human-robot interaction, employing a social robot to validate facial expressions in the wild. Our overall goal is to show that a social robot can interact convincingly with human users and recognize their potential emotions through facial expressions, contextual cues, and bio-signals. This work focuses on analyzing facial expressions. A social robot is used to validate a pre-trained convolutional neural network (CNN) that recognizes facial expressions. Facial expression recognition plays an important role in how robots recognize and understand human emotion, and robots equipped with expression recognition capabilities can also be a useful tool for obtaining feedback from users. The designed experiment evaluates a neural network trained on facial expressions using a social robot in a real environment. In this paper, the CNN's accuracy is compared with that of human experts, and the interaction, attention, and difficulty of performing particular expressions are analyzed for 29 non-expert users. In the experiment, the robot leads the users to perform different facial expressions in a motivating and entertaining way. At the end of the experiment, the users are quizzed about their experience with the robot. Finally, a set of experts and the CNN classify the expressions. The results support the conclusion that a social robot is an adequate interaction paradigm for evaluating facial expressions.
|
29
|
Liu C, Hirota K, Wang B, Dai Y, Jia Z. Two-Channel Feature Extraction Convolutional Neural Network for Facial Expression Recognition. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2020. [DOI: 10.20965/jaciii.2020.p0792] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
An emotion recognition framework based on a two-channel convolutional neural network (CNN) is proposed to detect the affective state of humans through facial expressions. The framework consists of three parts: the frontal face detection module, the feature extraction module, and the classification module. The feature extraction module contains two channels: one for raw face images and the other for texture feature images. Local binary pattern (LBP) images are utilized for texture feature extraction to enrich facial features and improve network performance. An attention mechanism is adopted in both CNN feature extraction channels to highlight the features related to facial expressions. Moreover, the ArcFace loss function is integrated into the proposed network to increase the inter-class distance and decrease the intra-class distance of facial features. Experiments conducted on two public databases, FER2013 and CK+, demonstrate that the proposed method outperforms previous methods, with accuracies of 72.56% and 94.24%, respectively. The improvement in emotion recognition accuracy makes our approach applicable to service robots.
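The LBP texture images used in the second channel are built from a simple per-pixel code; a minimal sketch of the standard 3x3 LBP operator (one common neighbor ordering, not necessarily the paper's exact variant):

```python
def lbp_code(patch):
    """Local binary pattern of a 3x3 patch: compare the 8 neighbours
    (clockwise from top-left) against the centre pixel and pack the
    comparison bits, most significant bit first."""
    c = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (i, j) in enumerate(order):
        if patch[i][j] >= c:
            code |= 1 << (7 - bit)
    return code
```

Sliding this over every interior pixel of a grayscale face image produces the LBP texture image fed to the second CNN channel.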
|
30
|
Alo UR, Nweke HF, Teh YW, Murtaza G. Smartphone Motion Sensor-Based Complex Human Activity Identification Using Deep Stacked Autoencoder Algorithm for Enhanced Smart Healthcare System. SENSORS 2020; 20:s20216300. [PMID: 33167424 PMCID: PMC7663988 DOI: 10.3390/s20216300] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/02/2020] [Revised: 09/28/2020] [Accepted: 10/04/2020] [Indexed: 11/16/2022]
Abstract
Human motion analysis using a smartphone-embedded accelerometer sensor provides important context for the identification of static, dynamic, and complex sequences of activities. Research in smartphone-based motion analysis has addressed tasks such as health status monitoring, fall detection and prevention, energy expenditure estimation, and emotion detection. However, current methods assume that the device is tightly attached in a pre-determined position and orientation, and changing orientation can degrade the accelerometer data and, with it, performance. It is therefore challenging to accurately and automatically identify activity details, given the complexity and orientation inconsistencies of the smartphone. Furthermore, current activity identification methods rely on conventional machine learning algorithms that are application dependent, and it is difficult to model the hierarchical and temporal dynamics of the complex activity identification process. This paper proposes a deep stacked autoencoder algorithm with orientation-invariant features for complex human activity identification. The proposed approach comprises several stages. First, we computed the magnitude norm vector and rotation features (pitch and roll angles) to augment the three-axis dimensions (3-D) of the accelerometer sensor. Second, we propose a deep stacked autoencoder-based deep learning algorithm to automatically extract compact feature representations from the motion sensor data. The results show that the proposed integration of the deep learning algorithm and orientation-invariant features can accurately recognize complex activity details using only smartphone accelerometer data. The proposed deep stacked autoencoder method achieved 97.13% identification accuracy, outperforming conventional machine learning methods and the deep belief network algorithm. These results suggest that the proposed method can improve smartphone-based complex human activity identification frameworks.
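The orientation-invariant augmentation mentioned above (magnitude norm plus pitch and roll angles from a tri-axial accelerometer sample) can be sketched as follows; these are the standard textbook formulas, which may differ in detail from the authors' exact computation:

```python
import math

def orientation_invariant_features(ax, ay, az):
    """Augment one tri-axial accelerometer sample with its magnitude norm
    and the pitch/roll rotation angles (in radians), which do not depend
    on the phone's heading."""
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    roll = math.atan2(ay, az)
    return magnitude, pitch, roll
```

For a phone lying flat (gravity entirely on the z axis), the magnitude is ~9.81 m/s² and both angles are zero.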
Affiliation(s)
- Uzoma Rita Alo
- Computer Science Department, Alex Ekwueme Federal University, Ndufu-Alike, Ikwo, P.M.B 1010, Abakaliki, Ebonyi State 480263, Nigeria;
- Henry Friday Nweke
- Computer Science Department, Ebonyi State University, P.M.B 053, Abakaliki, Ebonyi State 480211, Nigeria
- Correspondence: (H.F.N.); (Y.W.T.); Tel.: +234-703-6799-510 (H.F.N.)
- Ying Wah Teh
- Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur 50603, Malaysia;
- Correspondence: (H.F.N.); (Y.W.T.); Tel.: +234-703-6799-510 (H.F.N.)
- Ghulam Murtaza
- Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur 50603, Malaysia;
- Department of Computer Science, Sukkur IBA University, Sukkur 65200, Pakistan
|
31
|
Abstract
Face recognition (FR) is a hotspot in pattern recognition and image processing due to its wide applications in real life. One of the most challenging problems in FR is single sample face recognition (SSFR). In this paper, we propose a novel algorithm based on nonnegative sparse representation, collaborative representation, and probabilistic graph estimation to address SSFR. The proposed algorithm is named Nonnegative Sparse Probabilistic Estimation (NNSPE). To extract variation information from the generic training set, we first select some neighbor samples from the generic training set for each sample in the gallery set, so the generic training set can be partitioned into reference subsets. To obtain more meaningful reconstructions, the proposed method adopts nonnegative sparse representation to reconstruct training samples, and according to the reconstruction coefficients, NNSPE computes probabilistic label estimates for the samples of the generic training set. Then, for a given test sample, collaborative representation (CR) is used to acquire an adaptive variation subset. Finally, NNSPE classifies the test sample using the adaptive variation subset and the probabilistic label estimates. Experiments on the AR and PIE databases verify the effectiveness of the proposed method in terms of both recognition rates and time cost.
Affiliation(s)
- Shuhuan Zhao
- College of Electronic and Information Engineering, Hebei University, Baoding 071000, P. R. China
|
32
|
Li Y, Fang S, Bai X, Jiao L, Marturi N. Parallel design of sparse deep belief network with multi-objective optimization. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.03.084] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
33
|
A Deep Learning Model for Fault Diagnosis with a Deep Neural Network and Feature Fusion on Multi-Channel Sensory Signals. SENSORS 2020; 20:s20154300. [PMID: 32752215 PMCID: PMC7436083 DOI: 10.3390/s20154300] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/22/2020] [Revised: 07/28/2020] [Accepted: 07/30/2020] [Indexed: 11/16/2022]
Abstract
Collecting multi-channel sensory signals is a feasible way to enhance performance in the diagnosis of mechanical equipment. In this article, a deep learning method combined with feature fusion of multi-channel sensory signals is proposed. First, a deep neural network (DNN) made up of auto-encoders is adopted to adaptively learn representative features from the sensory signals and approximate the non-linear relation between symptoms and fault modes. Then, Locality Preserving Projection (LPP) is utilized to fuse the features extracted from the multi-channel sensory signals. Finally, a novel diagnostic model based on multiple DNNs (MDNNs) and softmax is constructed, taking the fused deep features as input. The proposed method is verified on intelligent failure recognition for an automobile final drive to evaluate its performance. A set of contrastive analyses of several intelligent models based on the Back-Propagation Neural Network (BPNN), Support Vector Machine (SVM), and the proposed deep architecture with single and multi-channel sensory signals is implemented. The proposed deep architecture of feature extraction and feature fusion on multi-channel sensory signals can effectively recognize the fault patterns of the final drive, with a best diagnostic accuracy of 95.84%. The results confirm that the proposed method is more robust and effective than the comparative methods in the contrastive experiments.
|
34
|
Chen L, Chen J. Deep Neural Network for Automatic Classification of Pathological Voice Signals. J Voice 2020; 36:288.e15-288.e24. [PMID: 32660846 DOI: 10.1016/j.jvoice.2020.05.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2020] [Revised: 05/17/2020] [Accepted: 05/26/2020] [Indexed: 10/23/2022]
Abstract
OBJECTIVES Computer-aided pathological voice detection is efficient for the initial screening of pathological voice and has received high academic and clinical attention. This paper proposes an automatic diagnosis method for pathological voice based on a deep neural network (DNN). Two other classification models (support vector machines and random forests) were used to verify the effectiveness of the DNN. METHODS We extracted 12 Mel frequency cepstral coefficients from each voice sample as raw features. The constructed DNN consists of a two-layer stacked sparse autoencoder network and a softmax layer. The stacked sparse autoencoder layers learn high-level features from the raw Mel frequency cepstral coefficient features, and the softmax layer then diagnoses pathological voice from those high-level features. The DNN and the two comparison models used the same training and test sets. RESULTS Experimental results reveal that the sensitivity, specificity, precision, accuracy, and F1 score of the DNN reach 97.8%, 99.4%, 99.4%, 98.6%, and 98.4%, respectively. These five indexes are at least 6.2%, 5%, 5.6%, 5.7%, and 6.2% higher than those of the comparison models (support vector machine and random forest). CONCLUSIONS The proposed DNN can learn advanced features from raw acoustic features and distinguish pathological voice from healthy voice. To the extent of this preliminary study, future work can further explore the application of DNNs in other experiments and in clinical practice.
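The "sparse" part of a stacked sparse autoencoder, as used here, is usually enforced with a KL-divergence penalty between a target activation level and each hidden unit's mean activation. A minimal sketch of that standard textbook penalty (not necessarily the exact regularizer used in the paper):

```python
import math

def kl_sparsity_penalty(rho, rho_hat_list):
    """Sum over hidden units of KL(rho || rho_hat_j), where rho is the
    target mean activation and rho_hat_j the unit's observed mean
    activation; added to the reconstruction loss during training."""
    total = 0.0
    for rho_hat in rho_hat_list:
        total += (rho * math.log(rho / rho_hat)
                  + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))
    return total
```

The penalty is zero when every unit's mean activation matches the target and grows as units become either too active or too quiet, which is what pushes the learned code toward sparsity.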
Affiliation(s)
- Lili Chen
- School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing, China; Chongqing Survey Institute, Chongqing, China.
- Junjiang Chen
- School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing, China
|
35
|
Deep neural network for semi-automatic classification of term and preterm uterine recordings. Artif Intell Med 2020; 105:101861. [PMID: 32505424 DOI: 10.1016/j.artmed.2020.101861] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2019] [Revised: 02/25/2020] [Accepted: 04/14/2020] [Indexed: 02/04/2023]
Abstract
Pregnancy is a complex process, and the prediction of premature birth is uncertain. Many researchers are exploring non-invasive approaches to enhance its predictability. The ElectroHysteroGram (EHG) and Tocography (TOCO) signals are real-time, non-invasive measurements that can be employed to predict preterm birth. For this purpose, a sparse autoencoder (SAE)-based deep neural network (SAE-based DNN) is developed. The network has three layers: a stacked sparse autoencoder (SSAE) network with two hidden layers and a final softmax layer. The bursts of all 26 recordings of the publicly available TPEHGT DS database corresponding to uterine contraction intervals and non-contraction (dummy) intervals were manually segmented, and 20 features were extracted by two feature extraction algorithms, sample entropy and wavelet entropy. The SSAE network then learns high-level features from the raw features by unsupervised learning, and a softmax layer is added on top of the SSAE network for classification. To verify the effectiveness of the proposed method, this study used 10-fold cross-validation and four indicators to evaluate classification performance. Experimental results show that the deep neural network achieves a sensitivity of 98.2%, specificity of 97.74%, and accuracy of 97.9% on the publicly available TPEHGT DS database, outperforming the comparison models, deep belief networks (DBN) and the hierarchical extreme learning machine (H-ELM). These results indicate that the proposed method can be applied to the semi-automatic identification of term and preterm uterine recordings.
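Sample entropy, one of the two feature extractors named above, measures signal irregularity as the negative log ratio of template matches of length m+1 to matches of length m. A minimal O(n²) illustrative version (the paper's tolerance normalization and self-match handling may differ):

```python
import math

def sample_entropy(series, m=2, r=0.2):
    """SampEn(m, r) = -ln(A/B): B counts template pairs of length m within
    Chebyshev tolerance r, A counts pairs of length m+1. Regular signals
    score low; irregular signals score high (inf if no m+1 matches)."""
    def count_matches(length):
        templates = [series[i:i + length]
                     for i in range(len(series) - length + 1)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in
                       zip(templates[i], templates[j])) <= r:
                    matches += 1
        return matches
    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)
```

A perfectly alternating sequence yields a small value, while an aperiodic one with no matching templates yields infinity, which is the contrast the uterine-burst features exploit.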
|
36
|
Abstract
This paper investigates the use of recurrent neural networks to predict urban long-term traffic flows. A representation of the long-term flows with related weather and contextual information is first introduced. A recurrent neural network approach, named RNN-LF, is then proposed to predict long-term flows from multiple data sources. Moreover, a parallel GPU implementation of the proposed solution, GRNN-LF, is developed, which boosts the performance of RNN-LF. Several experiments were carried out on real traffic flows, including a small city (Odense, Denmark) and a very big city (Beijing). The results reveal that the sequential version (RNN-LF) is capable of dealing effectively with the traffic of small cities. They also confirm the scalability of GRNN-LF compared with the most competitive GPU-based software tools when dealing with big traffic flows such as the Beijing urban data.
|
37
|
Gautam R, Sharma M. Prevalence and Diagnosis of Neurological Disorders Using Different Deep Learning Techniques: A Meta-Analysis. J Med Syst 2020; 44:49. [DOI: 10.1007/s10916-019-1519-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2019] [Accepted: 12/12/2019] [Indexed: 02/06/2023]
|
38
|
Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.09.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
39
|
Chen L, Feng Y, Maram MA, Wang Y, Wu M, Hirota K, Pedrycz W. Multi-SVM based Dempster–Shafer theory for gesture intention understanding using sparse coding feature. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105787] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
40
|
|
41
|
Zhao Q, Adeli E, Honnorat N, Leng T, Pohl KM. Variational AutoEncoder For Regression: Application to Brain Aging Analysis. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION : MICCAI ... INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION 2019; 11765:823-831. [PMID: 32705091 PMCID: PMC7377006 DOI: 10.1007/978-3-030-32245-8_91] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
While unsupervised variational autoencoders (VAE) have become a powerful tool in neuroimage analysis, their application to supervised learning is under-explored. We aim to close this gap by proposing a unified probabilistic model for learning the latent space of imaging data and performing supervised regression. Based on recent advances in learning disentangled representations, the novel generative process explicitly models the conditional distribution of latent representations with respect to the regression target variable. Performing a variational inference procedure on this model leads to joint regularization between the VAE and a neural-network regressor. In predicting the age of 245 subjects from their structural Magnetic Resonance (MR) images, our model is more accurate than state-of-the-art methods when applied to either region-of-interest (ROI) measurements or raw 3D volume images. More importantly, unlike simple feed-forward neural-networks, disentanglement of age in latent representations allows for intuitive interpretation of the structural developmental patterns of the human brain.
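At the core of any VAE, including this supervised regression variant, is the reparameterization trick: the latent sample is written as a deterministic function of the encoder outputs plus independent noise, so gradients can flow through the sampling step. A minimal per-dimension sketch (illustrative names, not the authors' implementation):

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), where
    sigma = exp(0.5 * log_var); mu and log_var are lists of the same
    length produced by the encoder for one input."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

As the predicted variance shrinks toward zero, the sample collapses to the mean, which is why the KL term is needed to keep the posterior from degenerating.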
Affiliation(s)
- Tuo Leng
- Stanford University, Stanford, CA, USA
- Kilian M Pohl
- Stanford University, Stanford, CA, USA
- SRI International, Menlo Park, CA, USA
|
42
|
Huang H, Feng R, Zhu J, Li P. Prediction of pH Value by Multi-Classification in the Weizhou Island Area. SENSORS 2019; 19:s19183875. [PMID: 31500377 PMCID: PMC6767063 DOI: 10.3390/s19183875] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/21/2019] [Revised: 09/05/2019] [Accepted: 09/05/2019] [Indexed: 11/16/2022]
Abstract
Ocean acidification is changing the chemical environment on which marine life depends, lowering seawater pH and altering water quality parameters. Conversely, changes in water quality parameters may affect pH, a key indicator for assessing ocean acidification, so it is particularly important to study the correlation between pH and the various water quality parameters. In this paper, several water quality parameters potentially correlated with pH are investigated, and multiple linear regression, softmax regression, and support vector machines are used to perform multi-classification. Experimental data were collected from Weizhou Island, China. The classification results show that pH has a strong correlation with salinity, temperature, and dissolved oxygen, with the correlation with dissolved oxygen the most significant, and the prediction accuracy of the classification is good. The prediction accuracies of the three multi-classifiers based on these three factors reach 87.01%, 87.77%, and 89.04%, respectively.
Affiliation(s)
- Haocai Huang
- Ocean College, Zhejiang University, Zhoushan 316021, China.
- Laboratory for Marine Geology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266061, China.
- Rendong Feng
- Ocean College, Zhejiang University, Zhoushan 316021, China.
- Jiang Zhu
- Ocean College, Zhejiang University, Zhoushan 316021, China.
- Peiliang Li
- Ocean College, Zhejiang University, Zhoushan 316021, China.
|
43
|
Raghavendra U, Gudigar A, Bhandary SV, Rao TN, Ciaccio EJ, Acharya UR. A Two Layer Sparse Autoencoder for Glaucoma Identification with Fundus Images. J Med Syst 2019; 43:299. [DOI: 10.1007/s10916-019-1427-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2018] [Accepted: 07/21/2019] [Indexed: 12/12/2022]
|
44
|
Li Y, Gu JX, Zhen D, Xu M, Ball A. An Evaluation of Gearbox Condition Monitoring Using Infrared Thermal Images Applied with Convolutional Neural Networks. SENSORS (BASEL, SWITZERLAND) 2019; 19:E2205. [PMID: 31086051 PMCID: PMC6540112 DOI: 10.3390/s19092205] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/18/2019] [Revised: 05/08/2019] [Accepted: 05/10/2019] [Indexed: 11/16/2022]
Abstract
As an important machine component, the gearbox is widely used in industry for power transmission. Condition monitoring (CM) of a gearbox is critical to provide timely information for undertaking necessary maintenance actions. Massive research efforts have been made over the last two decades to develop vibration-based techniques. However, vibration-based methods usually suffer from several inherent shortcomings, including contact measurement, localized information, noise contamination, and high computation costs, making them difficult to deploy as a cost-effective CM technique. In this paper, infrared thermal (IRT) images, which cover a large area and can be acquired remotely, are used to develop a cost-effective CM method. Moreover, a convolutional neural network (CNN) is employed to process the raw IRT images automatically and obtain more comprehensive feature parameters, avoiding the incomplete information produced by the various feature-extraction methods used in vibration analysis. An IRT-CNN method is thus developed to achieve online remote monitoring of a gearbox. A performance evaluation on a bevel gearbox shows that the proposed method achieves nearly 100% correctness in identifying several common gear faults, such as tooth pitting, cracks, and breakages, and their compounds. It is also especially robust to ambient temperature changes. In addition, IRT significantly outperforms its vibration-based counterparts.
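The CNN feature-extraction stage described above (convolution, activation, pooling over a thermal image) can be sketched in NumPy. The 8×8 image patch, the 3×3 averaging kernel, and the layer sizes are illustrative assumptions; the paper's actual network architecture is not specified in the abstract:

```python
import numpy as np

def conv2d(img, kernel):
    # Valid (no-padding) 2D cross-correlation, the core CNN operation.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    # Non-overlapping max pooling; trims edges that don't fit a full window.
    h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:h2 * size, :w2 * size]
    return x.reshape(h2, size, w2, size).max(axis=(1, 3))

# One conv -> ReLU -> pool pass over a stand-in thermal image patch.
img = np.random.rand(8, 8)
feat = max_pool(np.maximum(conv2d(img, np.ones((3, 3)) / 9.0), 0.0))
```

A real IRT-CNN would stack several such layers with learned kernels and end in a fully connected classifier over the gear fault classes.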
Affiliation(s)
- Yongbo Li
- School of Aeronautics, Northwestern Polytechnical University, Xian 710072, China.
- James Xi Gu
- Centre for Efficiency and Performance Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK.
- Dong Zhen
- School of Mechanical Engineering, Hebei University of Technology, Tianjin 300401, China.
- Minqiang Xu
- Astronautical Science and Mechanics, Harbin Institute of Technology (HIT), No.92 West Dazhi Street, Harbin 150001, China.
- Andrew Ball
- Centre for Efficiency and Performance Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK.
|
45
|
Das S, Sil J. Managing uncertainty in imputing missing symptom value for healthcare of rural India. Health Inf Sci Syst 2019; 7:5. [PMID: 30863541 DOI: 10.1007/s13755-019-0066-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2018] [Accepted: 02/01/2019] [Indexed: 11/30/2022] Open
Abstract
Purpose: In India, 67% of the total population lives in remote areas, where providing primary healthcare is a real challenge due to the scarcity of doctors. Health kiosks are deployed in remote villages to collect basic health data such as blood pressure, pulse rate, height and weight, BMI, and oxygen saturation level (SpO2). The acquired data are often imprecise due to measurement error and contain missing values. The paper proposes a comprehensive framework to impute missing symptom values by managing the uncertainty present in the data set. Methods: The data sets are fuzzified to manage uncertainty, and the fuzzy c-means clustering algorithm is applied to group the symptom feature vectors into different disease classes. The missing symptom values corresponding to each disease are imputed using multiple fuzzy-based regression models. Relations between different symptoms are framed with the help of experts and the medical literature. The blood pressure symptom is handled with a novel approach because its characteristics differ from those of the other symptoms. Patients' records obtained from the kiosks are not adequate, so relevant data are simulated by the Monte Carlo method to avoid over-fitting while imputing missing symptom values. The generated data sets are verified using Kullback-Leibler (K-L) distance and distance correlation (dCor), showing that the simulated data sets are well correlated with the real data set. Results: Using these data sets, the proposed model is built, and new patients are provisionally diagnosed using a softmax cost function. Multiple class labels as diseases are determined with about 98% accuracy and verified against the ground truth provided by the experts. Conclusions: It is worth mentioning that the system is intended for primary healthcare; in emergency cases, patients are referred to the experts.
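The fuzzification step of the framework can be sketched as follows. The triangular membership functions and the SpO2 linguistic sets below are hypothetical illustrations; the abstract does not give the paper's actual membership definitions:

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic sets for SpO2 (%): (left foot, peak, right foot).
spo2_sets = {
    "low":    (85.0, 90.0, 94.0),
    "normal": (92.0, 97.0, 100.0),
}

def fuzzify(value, sets):
    # Map a crisp reading to a membership degree in each linguistic class.
    return {label: triangular(value, *abc) for label, abc in sets.items()}
```

Fuzzified symptom vectors of this kind would then feed the fuzzy c-means clustering and the regression-based imputation described in the abstract.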
Affiliation(s)
- Sayan Das
- Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, West Bengal, India
- Jaya Sil
- Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, West Bengal, India
|
46
|
|
47
|
Qian W, Li S, Wang J, Wu Q. A novel supervised sparse feature extraction method and its application on rotating machine fault diagnosis. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.09.027] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|