1
Rashad M, Alebiary D, Aldawsari M, Elsawy A, AbuEl-Atta AH. FERDCNN: an efficient method for facial expression recognition through deep convolutional neural networks. PeerJ Comput Sci 2024;10:e2272. PMID: 39650474; PMCID: PMC11622989; DOI: 10.7717/peerj-cs.2272.
Abstract
Facial expression recognition (FER) has recently caught the research community's attention because it can affect many real-life applications. Multiple studies have focused on automatic FER, most of which use a machine learning methodology, yet FER remains a difficult and exciting problem in computer vision. Deep learning has recently drawn increased attention as a solution to several practical issues, including facial expression recognition. This article introduces an efficient method for FER (FERDCNN), verified on five different pre-trained deep CNN (DCNN) models (AlexNet, GoogleNet, ResNet-18, ResNet-50, and ResNet-101). In the proposed method, the input image is first pre-processed using face detection, resizing, gamma correction, and histogram equalization. Second, the images pass through a DCNN to extract deep features. Finally, a support vector machine (SVM) and transfer learning are used to classify the generated features. Recent methods were employed to evaluate and contrast the performance of the proposed approach on two publicly available standard databases, CK+ and JAFFE, over the seven classes of fundamental emotions: anger, disgust, fear, happiness, sadness, and surprise, plus contempt for CK+ and neutrality for JAFFE. The suggested method tested four different traditional supervised classifiers with deep features; experiments found that AlexNet excels as a feature extractor while SVM demonstrates superiority as a classifier, with this combination achieving the highest accuracy rates of 99.0% and 95.16% on the CK+ and JAFFE datasets, respectively.
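For readers who want to prototype the kind of pipeline this abstract describes (preprocessing, deep features from a pre-trained CNN, an SVM on top), a minimal sketch follows. It assumes torchvision's AlexNet as the extractor and OpenCV for the image operations; it illustrates the general recipe and is not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): pre-process a face crop,
# extract deep features with pre-trained AlexNet, classify with an SVM.
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import SVC

def preprocess(img_bgr, gamma=1.5):
    """Resize, gamma-correct, and histogram-equalize a detected face crop."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (224, 224))
    lut = ((np.arange(256) / 255.0) ** (1.0 / gamma) * 255).astype("uint8")
    gray = cv2.LUT(gray, lut)            # gamma correction
    gray = cv2.equalizeHist(gray)        # histogram equalization
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)

extractor = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
extractor.classifier = extractor.classifier[:-1]   # drop last FC: 4096-d features
extractor.eval()
to_tensor = T.Compose([T.ToTensor(),
                       T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

@torch.no_grad()
def deep_features(img_bgr):
    return extractor(to_tensor(preprocess(img_bgr)).unsqueeze(0)).squeeze(0).numpy()

# Usage (face crops and labels assumed available):
# clf = SVC(kernel="linear").fit(np.stack([deep_features(f) for f in faces]), labels)
```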
Affiliation(s)
- Metwally Rashad: Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt; Department of Computer Engineering and Information, College of Engineering, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Doaa Alebiary: Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
- Mohammed Aldawsari: Department of Computer Engineering and Information, College of Engineering, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Ahmed Elsawy: Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt; Information Technology Department, Faculty of Technological Industry and Energy, Delta Technological University, Egypt
- Ahmed H. AbuEl-Atta: Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
2
Fang B, Zhao Y, Han G, He J. Expression-Guided Deep Joint Learning for Facial Expression Recognition. Sensors (Basel) 2023;23:7148. PMID: 37631685; PMCID: PMC10457757; DOI: 10.3390/s23167148.
Abstract
In recent years, convolutional neural networks (CNNs) have played a dominant role in facial expression recognition. While CNN-based methods have achieved remarkable success, they are notorious for having an excessive number of parameters, and they rely on a large amount of manually annotated data. To address this challenge, we expand the number of training samples by learning expressions from a face recognition dataset to reduce the impact of a small number of samples on the network training. In the proposed deep joint learning framework, the deep features of the face recognition dataset are clustered, and simultaneously, the parameters of an efficient CNN are learned, thereby marking the data for network training automatically and efficiently. Specifically, first, we develop a new efficient CNN based on the proposed affinity convolution module with much lower computational overhead for deep feature learning and expression classification. Then, we develop an expression-guided deep facial clustering approach to cluster the deep features and generate abundant expression labels from the face recognition dataset. Finally, the AC-based CNN is fine-tuned using an updated training set and a combined loss function. Our framework is evaluated on several challenging facial expression recognition datasets as well as a self-collected dataset. In the context of facial expression recognition applied to the field of education, our proposed method achieved an impressive accuracy of 95.87% on the self-collected dataset, surpassing other existing methods.
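As a rough illustration of the clustering step described here (the paper's expression-guided clustering is more elaborate; plain k-means is used below purely as a stand-in), deep features from the unlabelled face-recognition images can be grouped so that each cluster acts as a provisional expression label for fine-tuning:

```python
# Sketch: pseudo-label unlabelled faces by clustering their deep features.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label(deep_feats: np.ndarray, n_expressions: int = 7) -> np.ndarray:
    """deep_feats: (N, D) array; returns a provisional expression id per image."""
    km = KMeans(n_clusters=n_expressions, n_init=10, random_state=0)
    return km.fit_predict(deep_feats)
```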
Affiliation(s)
- Bei Fang: Key Laboratory of Modern Teaching Technology, Ministry of Education, Shaanxi Normal University, Xi’an 710062, China
- Yujie Zhao: Department of Information Construction and Management, Shaanxi Normal University, Xi’an 710061, China
- Guangxin Han: Key Laboratory of Modern Teaching Technology, Ministry of Education, Shaanxi Normal University, Xi’an 710062, China
- Juhou He: Key Laboratory of Modern Teaching Technology, Ministry of Education, Shaanxi Normal University, Xi’an 710062, China
3
Bellamkonda S, Gopalan NP, Mala C, Settipalli L. Facial expression recognition on partially occluded faces using component based ensemble stacked CNN. Cogn Neurodyn 2023;17:985-1008. PMID: 37522034; PMCID: PMC10374495; DOI: 10.1007/s11571-022-09879-y.
Abstract
Facial Expression Recognition (FER) is the basis for many applications, including human-computer interaction and surveillance. While developing such applications, it is imperative to understand human emotions for better interaction with machines. Among the many FER models developed so far, Ensemble Stacked Convolutional Neural Networks (ES-CNN) showed an empirical impact in improving the performance of FER on static images. However, existing ES-CNN-based FER models trained with features extracted from the entire face are unable to address ambient parameters such as pose, illumination, and occlusion. To mitigate the reduced performance of ES-CNN on partially occluded faces, a Component-based ES-CNN (CES-CNN) is proposed. CES-CNN applies ES-CNN to action units of individual face components such as the eyes, eyebrows, nose, cheek, mouth, and glabella, each as one subnet of the network. A max-voting ensemble classifier combines the decisions of the subnets to obtain optimized recognition accuracy. The proposed CES-CNN is validated by experiments on benchmark datasets, and its performance is compared with state-of-the-art models. The experimental results show that the proposed model significantly enhances recognition accuracy compared to existing models.
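The max-voting step is simple to reproduce; below is a sketch under the assumption that each component subnet outputs per-class probabilities (this is an illustration, not the paper's code):

```python
# Sketch of max-voting over component subnets (eyes, nose, mouth, ...).
import numpy as np

def max_vote(subnet_probs: list) -> np.ndarray:
    """subnet_probs: list of (N, C) probability arrays, one per facial component."""
    votes = np.stack([p.argmax(axis=1) for p in subnet_probs])       # (S, N)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```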
Affiliation(s)
- Sivaiah Bellamkonda: Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India
- N. P. Gopalan: Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India
- C. Mala: Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India
- Lavanya Settipalli: Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India
4
Lukac M, Zhambulova G, Abdiyeva K, Lewis M. Study on emotion recognition bias in different regional groups. Sci Rep 2023;13:8414. PMID: 37225756; PMCID: PMC10209154; DOI: 10.1038/s41598-023-34932-z.
Abstract
Human-machine communication can be substantially enhanced by the inclusion of high-quality real-time recognition of spontaneous human emotional expressions. However, successful recognition of such expressions can be negatively impacted by factors such as sudden variations of lighting or intentional obfuscation. Reliable recognition can be more substantively impeded by the fact that the presentation and meaning of emotional expressions can vary significantly based on the culture of the expressor and the environment within which the emotions are expressed. As an example, an emotion recognition model trained on a regionally specific database collected from North America might fail to recognize standard emotional expressions from another region, such as East Asia. To address the problem of regional and cultural bias in emotion recognition from facial expressions, we propose a meta-model that fuses multiple emotional cues and features. The proposed approach integrates image features, action level units, micro-expressions and macro-expressions into a multi-cues emotion model (MCAM). Each of the facial attributes incorporated into the model represents a specific category: fine-grained content-independent features, facial muscle movements, short-term facial expressions and high-level facial expressions. The results of the proposed meta-classifier (MCAM) approach show that a) the successful classification of regional facial expressions is based on non-sympathetic features, b) learning the emotional facial expressions of some regional groups can confound the successful recognition of emotional expressions of other regional groups unless it is done from scratch, and c) certain facial cues and features of the datasets preclude the design of a perfectly unbiased classifier. As a result of these observations, we posit that to learn certain regional emotional expressions, other regional expressions first have to be "forgotten".
Affiliation(s)
- Martin Lukac: Department of Computer Science, Nazarbayev University, Kabanbay Batyr 53, Astana, 010000, Kazakhstan
- Gulnaz Zhambulova: Department of Computer Science, Nazarbayev University, Kabanbay Batyr 53, Astana, 010000, Kazakhstan
- Kamila Abdiyeva: Department of Computer Science, Nazarbayev University, Kabanbay Batyr 53, Astana, 010000, Kazakhstan
- Michael Lewis: Department of Computer Science, Nazarbayev University, Kabanbay Batyr 53, Astana, 010000, Kazakhstan
5
Ryumina E, Dresvyanskiy D, Karpov A. In search of a robust facial expressions recognition model: A large-scale visual cross-corpus study. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.10.013.
6
Liu P, Lin Y, Meng Z, Lu L, Deng W, Zhou JT, Yang Y. Point Adversarial Self-Mining: A Simple Method for Facial Expression Recognition. IEEE Transactions on Cybernetics 2022;52:12649-12660. PMID: 34197333; DOI: 10.1109/TCYB.2021.3085744.
Abstract
In this article, we propose a simple yet effective approach, called point adversarial self-mining (PASM), to improve recognition accuracy in facial expression recognition (FER). Unlike previous works focusing on designing specific architectures or loss functions to solve this problem, PASM boosts the network capability by simulating human learning processes: providing updated learning materials and guidance from more capable teachers. Specifically, to generate new learning materials, PASM leverages a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task, generating harder learning samples to refine the network. The searched position is highly adaptive since it considers both the statistical information of each sample and the teacher network's capability. In addition to being provided with new learning materials, the student network also receives guidance from the teacher network. After the student network finishes training, it changes roles and acts as a teacher, generating new learning materials and providing stronger guidance to train a better student network. The adaptive learning-material generation and teacher/student update can be conducted more than once, improving the network capability iteratively. Extensive experimental results validate the efficacy of our method over the existing state of the art for FER.
7
A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) Database. Multimodal Technologies and Interaction 2022. DOI: 10.3390/mti6060047.
Abstract
Multimodal human–computer interaction (HCI) systems promise interaction between machines and humans that more closely resembles human–human interaction. Their prowess in enabling an unambiguous information exchange between the two makes these systems more reliable, efficient, less error-prone, and capable of solving complex tasks. Emotion recognition is a realm of HCI that follows multimodality to achieve accurate and natural results. The widespread use of affective identification in e-learning, marketing, security, health sciences, etc., has increased demand for high-precision emotion recognition systems. Machine learning (ML) is increasingly applied to improve the process by tweaking architectures or wielding high-quality databases (DBs). This paper presents a survey of the DBs that are being used to develop multimodal emotion recognition (MER) systems. The survey illustrates DBs that contain multi-channel data, such as facial expressions, speech, physiological signals, body movements, gestures, and lexical features. A few unimodal DBs that work in conjunction with other DBs for affect recognition are also discussed. Further, VIRI, a new DB of visible and infrared (IR) images of subjects expressing five emotions in an uncontrolled, real-world environment, is presented, along with a rationale for the superiority of the presented corpus over existing ones.
8
Kaur I, Goyal LM, Ghansiyal A, Hemanth DJ. Efficient Approach for Rhopalocera Classification Using Growing Convolutional Neural Network. Int J Uncertain Fuzz 2022. DOI: 10.1142/s0218488522400189.
Abstract
In the present times, artificial intelligence-based techniques are considered one of the prominent ways to classify images, which can be conveniently leveraged in real-world scenarios. This technology can be extremely beneficial to lepidopterists, assisting them in classifying the diverse species of Rhopalocera, commonly called butterflies. In this article, image classification is performed on a dataset of various butterfly species, using the feature extraction process of a Convolutional Neural Network (CNN) along with additional features calculated independently to train the model. The classification models deployed for this purpose predominantly include K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM). However, each of these methods tends to focus on one specific class of features. Therefore, an ensemble of multiple classes of features is implemented for image classification. This paper discusses the results of classification based on two different classes of features, i.e., structure and texture. The amalgamation of the two specified classes of features forms a combined dataset, which has further been used to train the Growing Convolutional Neural Network (GCNN), resulting in higher classification accuracy. The experiment yielded promising outcomes, with TP rate, FP rate, precision, recall, and F-measure values of 0.9690, 0.0034, 0.9889, 0.9692, and 0.9686, respectively. Furthermore, an accuracy of 96.98% was observed with the proposed methodology.
Affiliation(s)
- Iqbaldeep Kaur: Department of Computer Science, CGC, Landran, Mohali, India
- Lalit Mohan Goyal: Department of Computer Engineering, J C Bose University of Science and Technology, YMCA, Faridabad, India
- D. Jude Hemanth: Department of ECE, Karunya Institute of Technology and Sciences, Coimbatore, India
9
Liu M, Zhang Y, Wang J, Qin N, Yang H, Sun K, Hao J, Shu L, Liu J, Chen Q, Zhang P, Tao TH. A star-nose-like tactile-olfactory bionic sensing array for robust object recognition in non-visual environments. Nat Commun 2022;13:79. PMID: 35013205; PMCID: PMC8748716; DOI: 10.1038/s41467-021-27672-z.
Abstract
Object recognition is among the basic survival skills of human beings and other animals. To date, artificial intelligence (AI)-assisted high-performance object recognition has been primarily visual-based, empowered by the rapid development of sensing and computational capabilities. Here, we report a tactile-olfactory sensing array, inspired by the natural sense-fusion system of the star-nosed mole, which permits real-time acquisition of the local topography, stiffness, and odor of a variety of objects without visual input. The tactile-olfactory information is processed by a bioinspired olfactory-tactile associated machine-learning algorithm, essentially mimicking the biological fusion procedures in the neural system of the star-nosed mole. Aiming to achieve human identification during rescue missions in challenging environments such as dark or buried scenarios, our tactile-olfactory intelligent sensing system could classify 11 typical objects with an accuracy of 96.9% in a simulated rescue scenario at a fire department test site. The tactile-olfactory bionic sensing system requires no visual input and shows superior tolerance to environmental interference, highlighting its great potential for robust object recognition in difficult environments where other methods fall short.
Affiliation(s)
- Mengwei Liu: State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
- Yujia Zhang: State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jiachuang Wang: State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
- Nan Qin: State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China
- Heng Yang: State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
- Ke Sun: State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jie Hao: Institute of Automation, Chinese Academy of Sciences, Beijing, 100049, China
- Lin Shu: Institute of Automation, Chinese Academy of Sciences, Beijing, 100049, China
- Jiarui Liu: Institute of Automation, Chinese Academy of Sciences, Beijing, 100049, China
- Qiang Chen: Shanghai Fire Research Institute of MEM, Shanghai, 200003, China
- Pingping Zhang: Suzhou Huiwen Nanotechnology Co., Ltd, Suzhou, 215004, China
- Tiger H. Tao: State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China; Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing, 100049, China; 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Physical Science and Technology, ShanghaiTech University, Shanghai, 200031, China; Institute of Brain-Intelligence Technology, Zhangjiang Laboratory, Shanghai, 200031, China; Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, 200031, China; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
10
Georgescu MI, Duţǎ GE, Ionescu RT. Teacher-student training and triplet loss to reduce the effect of drastic face occlusion: Application to emotion recognition, gender identification and age estimation. Machine Vision and Applications 2021;33:12. PMID: 34955610; PMCID: PMC8693600; DOI: 10.1007/s00138-021-01270-x.
Abstract
We study a series of recognition tasks in two realistic scenarios requiring the analysis of faces under strong occlusion. On the one hand, we aim to recognize facial expressions of people wearing virtual reality headsets. On the other hand, we aim to estimate the age and identify the gender of people wearing surgical masks. For all these tasks, the common ground is that half of the face is occluded. In this challenging setting, we show that convolutional neural networks trained on fully visible faces exhibit very low performance levels. While fine-tuning the deep learning models on occluded faces is extremely useful, we show that additional performance gains can be obtained by distilling knowledge from models trained on fully visible faces. To this end, we study two knowledge distillation methods, one based on teacher-student training and one based on triplet loss. Our main contribution consists in a novel approach for knowledge distillation based on triplet loss, which generalizes across models and tasks. Furthermore, we consider combining distilled models learned through conventional teacher-student training or through our novel teacher-student training based on triplet loss. We provide empirical evidence showing that, in most cases, both individual and combined knowledge distillation methods bring statistically significant performance improvements. We conduct experiments with three different neural models (VGG-f, VGG-face and ResNet-50) on various tasks (facial expression recognition, gender recognition, age estimation), showing consistent improvements regardless of the model or task.
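The two distillation signals the abstract compares can be sketched as PyTorch losses; the temperature, margin, and argument names below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: conventional soft-target distillation vs. triplet-based distillation.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Teacher-student training: match softened teacher outputs (KL divergence)."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)

def triplet_distill_loss(student_emb, teacher_pos_emb, teacher_neg_emb, margin=1.0):
    """Triplet-based distillation: pull the student's embedding of an occluded
    face toward the teacher's embedding of the matching fully visible face and
    away from a non-matching one."""
    return F.triplet_margin_loss(student_emb, teacher_pos_emb, teacher_neg_emb,
                                 margin=margin)
```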
Affiliation(s)
- Mariana-Iuliana Georgescu: SecurifAI, Bd. Mircea Vodă 21D, Bucharest, Romania; Department of Computer Science, University of Bucharest, 14 Academiei, Bucharest, Romania
- Georgian-Emilian Duţǎ: SecurifAI, Bd. Mircea Vodă 21D, Bucharest, Romania; Department of Computer Science, University of Bucharest, 14 Academiei, Bucharest, Romania
- Radu Tudor Ionescu: SecurifAI, Bd. Mircea Vodă 21D, Bucharest, Romania; Department of Computer Science and Romanian Young Academy, University of Bucharest, 14 Academiei, Bucharest, Romania
11
Racial Identity-Aware Facial Expression Recognition Using Deep Convolutional Neural Networks. Applied Sciences (Basel) 2021. DOI: 10.3390/app12010088.
Abstract
Multi-culture facial expression recognition remains challenging due to cross-cultural variations in facial expression representation, caused by facial structure variations and culture-specific facial characteristics. In this research, a joint deep learning approach called the racial identity-aware deep convolutional neural network is developed to recognize multicultural facial expressions. In the proposed model, a pre-trained racial identity network learns the racial features. Then, the racial identity-aware network and the racial identity network jointly learn the racial identity-aware facial expressions. By enforcing the marginal independence of facial expression and racial identity, the proposed joint learning approach is expected to be purer for the expression and robust to variations in facial structure and culture-specific facial characteristics. To assess the reliability of the proposed joint learning technique, extensive experiments were performed with and without racial identity features. Moreover, culture-wise facial expression recognition was performed to analyze the effect of inter-cultural variations in facial expression representation. A large-scale multi-culture dataset was developed by combining four facial expression datasets: JAFFE, TFEID, CK+, and RaFD. It contains facial expression images of Japanese, Taiwanese, American, Caucasian, and Moroccan cultures. We achieved 96% accuracy with racial identity features and 93% accuracy without them.
12
Guo L, Li R, Jiang B. An Ensemble Broad Learning Scheme for Semisupervised Vehicle Type Classification. IEEE Transactions on Neural Networks and Learning Systems 2021;32:5287-5297. PMID: 34086583; DOI: 10.1109/TNNLS.2021.3083508.
Abstract
Nowadays, vehicle type classification is a fundamental part of intelligent transportation systems (ITSs) and is widely used in various applications such as traffic flow monitoring, security enforcement, and autonomous driving. However, vehicle classification usually relies on supervised learning, which greatly limits its applicability in real ITSs. This article proposes a semisupervised vehicle type classification scheme via ensemble broad learning for ITSs. The presented method contains two main parts. In the first part, a collection of base broad learning system (BLS) classifiers is trained by semisupervised learning to avoid a time-consuming training process and alleviate the burden of increasingly abundant unlabeled samples. In the second part, a dynamic ensemble structure constructed from trained classifier groups with different characteristics obtains the highest type probability and determines the type to which a vehicle belongs, achieving better generalization performance than a single base classifier. Several experiments conducted on the public BIT-Vehicle dataset and the MIO-TCD dataset demonstrate that the proposed method outperforms a single BLS classifier and some mainstream methods in effectiveness and efficiency.
13
Mittal S. Ensemble of transfer learnt classifiers for recognition of cardiovascular tissues from histological images. Phys Eng Sci Med 2021;44:655-665. PMID: 34014495; DOI: 10.1007/s13246-021-01013-2.
Abstract
Recognition of tissues and organs is a recurrent step performed by experts during analyses of histological images. With advancements in machine learning, such steps can be automated using computer vision methods. This paper presents an ensemble-based approach for improved classification of non-pathological tissues and organs in histological images using convolutional neural networks (CNNs). Given the limited dataset size, we relied upon transfer learning, where pre-trained CNNs are re-used for new classification problems. Transfer learning was performed with eleven CNN architectures on 6000 image patches constituting the training and validation subsets of a public dataset containing six cardiovascular categories. The CNN models were fine-tuned on a much larger dataset, obtained by augmenting the training subset, to reach agreeable performance on the validation subset. Lastly, we created various ensembles of the trained classifiers and evaluated them on a testing subset of 7500 patches. The best ensemble classifier gives precision, recall, and accuracy of 0.876, 0.869, and 0.869, respectively, on test images. With an overall F1-score of 0.870, our ensemble-based approach outperforms previous approaches with a single fine-tuned CNN, a CNN trained from scratch, and traditional machine learning by 0.019, 0.064, and 0.183, respectively. An ensemble approach can perform better than individual classifiers, provided the constituent classifiers are chosen wisely. The empirical choice of classifiers reinforces the intuition that models which are newer and perform better in their native domain are more likely to perform better in the transferred domain, since the best ensemble consists predominantly of more recently proposed and better architectures.
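A minimal soft-voting sketch of the ensembling idea follows; the three member architectures are placeholders, not the paper's empirically chosen (and fine-tuned) set.

```python
# Sketch: average the softmax outputs of several pre-trained CNNs (soft voting).
import torch
import torchvision.models as M

members = [M.resnet50(weights="DEFAULT"),
           M.densenet161(weights="DEFAULT"),
           M.inception_v3(weights="DEFAULT")]

@torch.no_grad()
def ensemble_predict(batch):                      # batch: (N, 3, H, W) tensor
    probs = [m.eval()(batch).softmax(dim=1) for m in members]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```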
Affiliation(s)
- Shubham Mittal: Department of Electronics and Communication Engineering, Ambedkar Institute of Advanced Communication Technologies and Research, Delhi, India
14
Abstract
Human facial emotion recognition (FER) has attracted the attention of the research community for its promising applications. Mapping different facial expressions to their respective emotional states is the main task in FER. Classical FER consists of two major steps: feature extraction and emotion recognition. Currently, Deep Neural Networks, especially the Convolutional Neural Network (CNN), are widely used in FER by virtue of their inherent ability to extract features from images. Several works have been reported on CNNs with only a few layers to resolve FER problems. However, standard shallow CNNs with straightforward learning schemes have limited feature extraction capability to capture emotion information from high-resolution images. A notable drawback of most existing methods is that they consider only frontal images (i.e., they ignore profile views for convenience), although profile views taken from different angles are important for a practical FER system. For developing a highly accurate FER system, this study proposes very Deep CNN (DCNN) modeling through the Transfer Learning (TL) technique, where a pre-trained DCNN model is adopted by replacing its dense upper layer(s) with layer(s) compatible with FER, and the model is fine-tuned with facial emotion data. A novel pipeline strategy is introduced, where the training of the dense layer(s) is followed by tuning each of the pre-trained DCNN blocks successively, leading to gradual improvement of FER accuracy. The proposed FER system is verified on eight different pre-trained DCNN models (VGG-16, VGG-19, ResNet-18, ResNet-34, ResNet-50, ResNet-152, Inception-v3 and DenseNet-161) and the well-known KDEF and JAFFE facial image datasets. FER is very challenging even for frontal views alone, and FER on the KDEF dataset poses further challenges due to the diversity of images with different profile views together with frontal views. The proposed method achieved remarkable accuracy on both datasets with pre-trained models. Under 10-fold cross-validation, the best achieved FER accuracies with DenseNet-161 on the test sets of KDEF and JAFFE are 96.51% and 99.52%, respectively. The evaluation results reveal the superiority of the proposed FER system over existing ones in terms of emotion detection accuracy. Moreover, the achieved performance on the KDEF dataset with profile views is promising, as it clearly demonstrates the proficiency required for real-life applications.
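The successive-unfreezing idea can be sketched as follows; the DenseNet-161 backbone, 7-class head, and deepest-first schedule are assumptions chosen for illustration.

```python
# Sketch of pipelined fine-tuning: train the new dense head first, then unfreeze
# the pre-trained blocks one at a time, starting from the deepest.
import torch
import torchvision.models as M

model = M.densenet161(weights="DEFAULT")
model.classifier = torch.nn.Linear(model.classifier.in_features, 7)  # FER head

blocks = list(model.features.children())
for p in model.features.parameters():
    p.requires_grad = False                 # stage 0: only the head trains

def unfreeze_stage(stage: int):
    """Open one more pre-trained block (deepest first) for the next tuning stage."""
    for p in blocks[-stage].parameters():
        p.requires_grad = True

# Training schedule (sketch): fit the head, then for stage = 1..len(blocks),
# call unfreeze_stage(stage) and fine-tune again with a small learning rate.
```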
15
Gupta V, Bhavsar A. Heterogeneous ensemble with information theoretic diversity measure for human epithelial cell image classification. Med Biol Eng Comput 2021;59:1035-1054. PMID: 33860445; DOI: 10.1007/s11517-021-02336-8.
Abstract
In this work, we propose a heterogeneous committee (ensemble) of diverse members (classification approaches) to solve the problem of human epithelial (HEp-2) cell image classification using indirect immunofluorescence (IIF) imaging. We hypothesize that an ensemble involving different feature representations can achieve higher performance if the individual members are sufficiently varied. These members are of two types: (1) CNN-based members and (2) traditional members. For the CNN members, we employ the well-established ResNet, DenseNet, and Inception models, which have distinctive salient aspects. For the traditional members, we incorporate class-specific features characterized by visual morphological attributes, along with some standard texture features. To select members that are discriminating and not redundant, we use an information-theoretic measure that considers the trade-off between individual accuracies and diversity among the members. For the selected members, a compelling fusion method is required to combine their outputs into a final decision. Thus, we also investigate various fusion methods that combine the opinion of the committee at different levels: maximum voting, product, decision templates, Bayes, Dempster-Shafer, etc. The proposed method is evaluated on the ICPR-2014 dataset, which consists of more images than previous datasets such as ICPR-2012, and demonstrates state-of-the-art performance. To check the effectiveness of the proposed methodology on other related datasets, we test it on a newly compiled large-scale HEp-2 dataset with 63K cell images and demonstrate comparable performance even with fewer training samples. The proposed method produces 99.80% and 86.03% accuracy, respectively, when tested on ICPR-2014 and the new large-scale dataset containing 63K samples.
Affiliation(s)
- Vibha Gupta: School of Computer and Electrical Engineering, Indian Institute of Technology Mandi, Himachal Pradesh 175005, India
- Arnav Bhavsar: School of Computer and Electrical Engineering, Indian Institute of Technology Mandi, Himachal Pradesh 175005, India
16
Ramis S, Buades JM, Perales FJ. Using a Social Robot to Evaluate Facial Expressions in the Wild. Sensors (Basel) 2020;20:6716. PMID: 33255347; PMCID: PMC7727691; DOI: 10.3390/s20236716.
Abstract
In this work, an affective computing approach is used to study human-robot interaction, using a social robot to validate facial expressions in the wild. Our overall goal is to evaluate whether a social robot can interact with human users in a convincing manner to recognize their potential emotions through facial expressions, contextual cues, and bio-signals. In particular, this work focuses on analyzing facial expressions. A social robot is used to validate a pre-trained convolutional neural network (CNN) that recognizes facial expressions. Facial expression recognition plays an important role in robots' recognition and understanding of human emotion. Robots equipped with expression recognition capabilities can also be a useful tool for getting feedback from users. The designed experiment allows evaluating a trained neural network on facial expressions using a social robot in a real environment. In this paper, the CNN's accuracy is compared with that of human experts, and the interaction, attention, and difficulty of performing particular expressions are analyzed for 29 non-expert users. In the experiment, the robot leads the users to perform different facial expressions in a motivating and entertaining way. At the end of the experiment, the users are quizzed about their experience with the robot. Finally, a set of experts and the CNN classify the expressions. The obtained results support the conclusion that a social robot is an adequate interaction paradigm for evaluating facial expressions.
17
Abstract
Learning to play and perform a musical instrument is a complex cognitive task, requiring high conscious control and coordination of an impressive number of cognitive and sensorimotor skills. For professional violinists, there exists a physical connection with the instrument allowing the player to continuously manage the sound through sophisticated bowing techniques and fine hand movements. Hence, it is not surprising that great importance in violin training is given to right hand techniques, responsible for most of the sound produced. In this paper, our aim is to understand which motion features can be used to efficiently and effectively distinguish a professional performance from that of a student without exploiting sound-based features. We collected and made freely available a dataset consisting of motion capture recordings of different violinists with different skills performing different exercises covering different pedagogical and technical aspects. We then engineered peculiar features and trained a data-driven classifier to distinguish among two different levels of violinist experience, namely beginners and experts. In accordance with the hierarchy present in the dataset, we study two different scenarios: extrapolation with respect to different exercises and violinists. Furthermore, we study which features are the most predictive of the quality of a violinist to corroborate the significance of the results. The results, both in terms of accuracy and insight on the cognitive problem, support the proposal and support the use of the proposed technique as a support tool for students to monitor and enhance their home study and practice.
18
Taherkhani A, Cosma G, McGinnity T. AdaBoost-CNN: An adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.03.064.
19
Deep reinforcement learning for robust emotional classification in facial expression recognition. Knowl Based Syst 2020. DOI: 10.1016/j.knosys.2020.106172.
20
A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images. Multimodal Technologies and Interaction 2020. DOI: 10.3390/mti4030046.
Abstract
The exigency of emotion recognition is pushing the envelope for meticulous strategies of discerning actual emotions through the use of superior multimodal techniques. This work presents a multimodal automatic emotion recognition (AER) framework capable of differentiating between expressed emotions with high accuracy. The contribution involves implementing an ensemble-based approach for AER through the fusion of visible images and infrared (IR) images with speech. The framework is implemented in two layers, where the first layer detects emotions using single modalities while the second layer combines the modalities and classifies emotions. Convolutional Neural Networks (CNNs) have been used for feature extraction and classification. A hybrid fusion approach, comprising early (feature-level) and late (decision-level) fusion, was applied to combine the features and the decisions at different stages. The output of the CNN trained with voice samples of the RAVDESS database was combined with the image classifier's output using decision-level fusion to obtain the final decision. An accuracy of 86.36% and similar recall (0.86), precision (0.88), and F-measure (0.87) scores were obtained. A comparison with contemporary work confirmed the framework's competitiveness, its distinguishing feature being the attainment of this accuracy in wild backgrounds and light-invariant conditions.
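The decision-level step can be illustrated in a few lines; equal weights are assumed below, whereas the framework's actual fusion may weight the modalities differently.

```python
# Toy decision-level fusion: combine the speech CNN's class probabilities with
# the (visible + IR) image classifier's by weighted averaging.
import numpy as np

def late_fusion(p_speech: np.ndarray, p_image: np.ndarray, w: float = 0.5):
    """p_speech, p_image: (N, C) softmax outputs from the unimodal classifiers."""
    fused = w * p_speech + (1.0 - w) * p_image
    return fused.argmax(axis=1)
```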
21
A Novel Functional Link Network Stacking Ensemble with Fractal Features for Multichannel Fall Detection. Cognit Comput 2020. DOI: 10.1007/s12559-020-09749-x.
Abstract
Falls are a major health concern and result in high morbidity and mortality rates in older adults with high costs to health services. Automatic fall classification and detection systems can provide early detection of falls and timely medical aid. This paper proposes a novel Random Vector Functional Link (RVFL) stacking ensemble classifier with fractal features for classification of falls. The fractal Hurst exponent is used as a representative of fractal dimensionality for capturing irregularity of accelerometer signals for falls and other activities of daily life. The generalised Hurst exponents along with wavelet transform coefficients are leveraged as input feature space for a novel stacking ensemble of RVFLs composed with an RVFL neural network meta-learner. Novel fast selection criteria are presented for base classifiers, founded on the proposed diversity indicator obtained from the overall performance values during the training phase. The proposed features and the stacking ensemble provide the highest classification accuracy of 95.71% compared with other machine learning techniques, such as Random Forest (RF), Artificial Neural Network (ANN) and Support Vector Machine. The proposed ensemble classifier is 2.3× faster than a single Decision Tree and achieves the highest speedup in training time of 317.7× and 198.56× compared with a highly optimised ANN and RF ensemble, respectively. The significant improvements in training times of the order of 100× and high accuracy demonstrate that the proposed RVFL ensemble is a prime candidate for real-time, embedded wearable device–based fall detection systems.
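The RVFL base learner itself is compact, which is what makes the reported training speedups plausible. Below is a sketch of one such learner; the tanh hidden units, ridge-solved output weights, and hyperparameters are assumptions for illustration.

```python
# Sketch of a single RVFL base learner: fixed random hidden layer plus direct
# input-output links, with output weights solved in closed form (ridge).
import numpy as np

class RVFL:
    def __init__(self, n_hidden=200, lam=1e-3, seed=0):
        self.n_hidden, self.lam = n_hidden, lam
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y):                    # X: (N, D), Y: one-hot (N, C)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.hstack([X, np.tanh(X @ self.W + self.b)])   # direct links + hidden
        A = H.T @ H + self.lam * np.eye(H.shape[1])
        self.beta = np.linalg.solve(A, H.T @ Y)            # closed-form solution
        return self

    def predict(self, X):
        H = np.hstack([X, np.tanh(X @ self.W + self.b)])
        return (H @ self.beta).argmax(axis=1)
```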
22
23
Investigation of Dual-Flow Deep Learning Models LSTM-FCN and GRU-FCN Efficiency against Single-Flow CNN Models for the Host-Based Intrusion and Malware Detection Task on Univariate Times Series Data. Applied Sciences (Basel) 2020. DOI: 10.3390/app10072373.
Abstract
Intrusion and malware detection tasks on the host level are a critical part of the overall information security infrastructure of a modern enterprise. While classical host-based intrusion detection systems (HIDS) and antivirus (AV) approaches are based on change monitoring of critical files and malware signatures, respectively, some recent research utilizing relatively vanilla deep learning (DL) methods has demonstrated promising anomaly-based detection results that already have practical applicability due to a low false positive rate (FPR). More complex DL methods typically provide better results in natural language processing and image recognition tasks. In this paper, we analyze the applicability of more complex dual-flow DL methods, such as the long short-term memory fully convolutional network (LSTM-FCN), the gated recurrent unit (GRU)-FCN, and several others, to the task specified on the attack-caused Windows OS system calls traces dataset (AWSCTD), and compare them with vanilla single-flow convolutional neural network (CNN) models. The results obtained do not demonstrate any advantage of dual-flow models in processing univariate time series data, while they introduce an unnecessary level of complexity, increasing training and anomaly detection time, which is crucial in the intrusion containment process. On the other hand, the newly tested AWSCTD-CNN-static (S) single-flow model demonstrated three times better training and testing times while preserving high detection accuracy.
24
Ensemble of Deep Convolutional Neural Networks for Automatic Pavement Crack Detection and Measurement. Coatings 2020. DOI: 10.3390/coatings10020152.
Abstract
Automated pavement crack detection and measurement are important road maintenance issues, as agencies must guarantee the improvement of road safety. Conventional crack detection and measurement algorithms can be extremely time-consuming and inefficient. Therefore, innovative algorithms have recently received increased attention from researchers. In this paper, we propose an ensemble of convolutional neural networks (without a pooling layer) based on probability fusion for automated pavement crack detection and measurement. Specifically, an ensemble of convolutional neural networks is employed to identify the structure of small cracks from raw images. Second, the outputs of the individual convolutional neural network models in the ensemble are averaged to produce the final crack probability value of each pixel, yielding a predicted probability map. Finally, the morphological features of the predicted cracks are measured using a skeleton extraction algorithm. To validate the proposed method, experiments were performed on two public crack databases (CFD and AigleRN) and the results were compared with different state-of-the-art methods. To evaluate the efficiency of crack detection methods, three parameters were considered: precision (Pr), recall (Re), and F1 score (F1). For the two public databases of pavement images, the proposed method obtained the highest values of the three evaluation parameters: for the CFD database, Pr = 0.9552, Re = 0.9521, and F1 = 0.9533 (up to 0.5175 higher than the values obtained on the same database with the other methods); for the AigleRN database, Pr = 0.9302, Re = 0.9166, and F1 = 0.9238 (up to 0.7313 higher than the values obtained on the same database with the other methods). The experimental results show that the proposed method outperforms the other methods. For crack measurement, the crack length and width can be measured for different crack types (complex, common, thin, and intersecting cracks). The results show that the proposed algorithm can be effectively applied to crack measurement.
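The probability-fusion and skeleton-based measurement steps can be sketched as follows; scikit-image's skeletonize is used as a stand-in for the paper's skeleton extraction algorithm, and the threshold is an assumption.

```python
# Sketch: average per-pixel crack probabilities over the ensemble, threshold,
# then skeletonize the mask to estimate crack length and mean width.
import numpy as np
from skimage.morphology import skeletonize

def fuse_and_measure(prob_maps, thresh=0.5):
    """prob_maps: list of (H, W) crack-probability maps, one per ensemble member."""
    fused = np.mean(prob_maps, axis=0)          # probability fusion
    mask = fused > thresh
    skel = skeletonize(mask)                    # one-pixel-wide centerline
    length_px = int(skel.sum())                 # length ~ number of skeleton pixels
    width_px = mask.sum() / max(length_px, 1)   # mean width ~ area / length
    return mask, length_px, width_px
```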
25
Seo YA, Kim KR, Cho C, Oh JW, Kim TH. Deep Neural Network-Based Concentration Model for Oak Pollen Allergy Warning in South Korea. Allergy Asthma Immunol Res 2020;12:149-163. PMID: 31743971; PMCID: PMC6875477; DOI: 10.4168/aair.2020.12.1.149.
Abstract
Purpose: Oak is the dominant tree species in Korea, and oak pollen has the highest sensitivity rate among all allergenic tree species in Korea. A deep neural network (DNN)-based estimation model was developed to determine the concentration of oak pollen and overcome the shortcomings of conventional regression models.
Methods: The DNN model proposed in this study utilized weather factors as the input and provided pollen concentrations as the output. Weather and pollen concentration data from 2007 to 2016 were obtained from the Korea Meteorological Administration pollen observation network. Because it is difficult to prevent over-fitting and underestimation using a DNN model alone, we developed a bootstrap-aggregating-type ensemble model. Each of the 30 ensemble members was trained with random sampling at a fixed rate according to the pollen risk grade. To verify the effectiveness of the proposed model, we compared its performance with those of regression and support vector regression (SVR) models under the same conditions, with respect to the prediction of pollen concentrations, risk levels, and season length.
Results: The mean absolute percentage error in the estimated pollen concentrations was 11.18%, 10.37%, and 5.04% for the regression, SVR, and DNN models, respectively. The start of the pollen season was estimated to be 20, 22, and 6 days earlier than the actual start by the regression, SVR, and DNN models, respectively. Similarly, the end of the pollen season was estimated to be 33, 20, and 9 days later than the actual end by the regression, SVR, and DNN models, respectively.
Conclusions: Overall, the DNN model performed better than the other models, although the prediction of peak pollen concentrations needs improvement. Improved observation quality with optimization of the DNN model will resolve this issue.
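The bagging scheme reads roughly as follows; uniform subsampling is shown for brevity, whereas the paper stratifies the sampling by pollen-risk grade, and the network size is an assumption.

```python
# Sketch of the 30-member bagging ensemble of neural-network regressors.
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_bagged(X, y, n_members=30, rate=0.8, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.choice(len(X), size=int(rate * len(X)), replace=True)
        members.append(MLPRegressor(hidden_layer_sizes=(64, 64),
                                    max_iter=500).fit(X[idx], y[idx]))
    return members

def predict_bagged(members, X):
    return np.mean([m.predict(X) for m in members], axis=0)  # averaged estimate
```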
Affiliation(s)
- Yun Am Seo: AI Weather Forecast Research Team, National Institute of Meteorological Science, Seogwipo, Korea
- Kyu Rang Kim: Applied Meteorology Research Division, National Institute of Meteorological Science, Seogwipo, Korea
- Changbum Cho: Applied Meteorology Research Division, National Institute of Meteorological Science, Seogwipo, Korea
- Jae Won Oh: Department of Pediatrics, Hanyang University College of Medicine, Seoul, Korea
- Tae Hee Kim: Urban Forest Research Center, National Institute of Forest Science, Korea Forest Service, Seoul, Korea
26
Ruiz J, Mahmud M, Modasshir M, Shamim Kaiser M; for the Alzheimer’s Disease Neuroimaging Initiative. 3D DenseNet Ensemble in 4-Way Classification of Alzheimer’s Disease. Brain Inform 2020. DOI: 10.1007/978-3-030-59277-6_8.
27
Abstract
As an important part of emotion research, facial expression recognition is a necessary requirement for human-machine interfaces. Generally, a facial expression recognition system includes face detection, feature extraction, and feature classification. Although traditional machine learning methods have achieved great success, most of them are computationally complex and lack the ability to extract comprehensive and abstract features. Deep learning-based methods can realize a higher recognition rate for facial expressions, but they need a large number of training samples and tuning parameters, and the hardware requirements are very high. To address these problems, this paper proposes a method that combines features extracted by a convolutional neural network (CNN) with a C4.5 classifier to recognize facial expressions, which not only addresses the incompleteness of handcrafted features but also avoids the high hardware configuration required by deep learning models. Considering the overfitting and weak generalization ability of a single classifier, random forest is also applied. Meanwhile, this paper makes some improvements to the C4.5 classifier and the traditional random forest during the experiments. A large number of experiments have proved the effectiveness and feasibility of the proposed method.
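Pairing CNN features with a C4.5-style tree can be approximated with scikit-learn's entropy-criterion tree; true C4.5 is not available in scikit-learn, so the block below is a stand-in, not the paper's improved classifier.

```python
# Sketch: CNN features feeding an information-gain decision tree and a random
# forest (the entropy criterion approximates C4.5's splitting rule).
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

tree = DecisionTreeClassifier(criterion="entropy")
forest = RandomForestClassifier(n_estimators=100, criterion="entropy")
# features: (N, D) deep features per face image; labels: expression classes.
# tree.fit(features, labels); forest.fit(features, labels)
```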
28
Abstract
In recent years, with the development of artificial intelligence and human-computer interaction, more attention has been paid to the recognition and analysis of facial expressions. Despite great success, many problems remain unsolved because facial expressions are subtle and complex. Hence, facial expression recognition is still a challenging problem. In most papers, the entire face image is chosen as the input. In daily life, people can perceive others' current emotions from only a few facial components (such as the eyes, mouth, and nose), while other areas of the face (such as hair, skin tone, and ears) play a smaller role in determining one's emotion. If the entire face image is used as the only input, the system will produce unnecessary information and miss important information during feature extraction. To solve this problem, this paper proposes a method that combines multiple sub-regions and the entire face image by weighting, which can capture more of the important feature information that is conducive to improving recognition accuracy. Our proposed method was evaluated on four well-known publicly available facial expression databases: JAFFE, CK+, FER2013, and SFEW. The new method showed better performance than most state-of-the-art methods.
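One plausible reading of the weighted combination (an assumption about the mechanism, with placeholder weights) is to scale each sub-region's features before concatenating them with the whole-face features:

```python
# Sketch: weight each sub-region's CNN features, then concatenate them with the
# whole-face features; the weights here are illustrative placeholders only.
import numpy as np

WEIGHTS = {"eyes": 0.25, "nose": 0.15, "mouth": 0.25, "whole_face": 0.35}

def combine(region_feats: dict) -> np.ndarray:
    """region_feats: region name -> (N, D_r) feature matrix."""
    parts = [WEIGHTS[name] * feats for name, feats in sorted(region_feats.items())]
    return np.concatenate(parts, axis=1)
```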
29
Hou B, Kang G, Zhang N, Liu K. Multi-target Interactive Neural Network for Automated Segmentation of the Hippocampus in Magnetic Resonance Imaging. Cognit Comput 2019. DOI: 10.1007/s12559-019-09645-z.
30
Gan Y, Chen J, Xu L. Facial expression recognition boosted by soft label with a diverse ensemble. Pattern Recognit Lett 2019. DOI: 10.1016/j.patrec.2019.04.002.
31
32
33
Wang H, Shen Y, Wang S, Xiao T, Deng L, Wang X, Zhao X. Ensemble of 3D densely connected convolutional network for diagnosis of mild cognitive impairment and Alzheimer’s disease. Neurocomputing 2019. DOI: 10.1016/j.neucom.2018.12.018.
34
35
Li W, Chu M, Qiao J. Design of a hierarchy modular neural network and its application in multimodal emotion recognition. Soft Comput 2019. DOI: 10.1007/s00500-018-03735-0.
36
37
Xi Y, Zheng J, Li X, Xu X, Ren J, Xie G. SR-POD: Sample rotation based on principal-axis orientation distribution for data augmentation in deep object detection. Cogn Syst Res 2018. DOI: 10.1016/j.cogsys.2018.06.014.
38
Wang QF, Xu M, Hussain A. Large-scale Ensemble Model for Customer Churn Prediction in Search Ads. Cognit Comput 2018. DOI: 10.1007/s12559-018-9608-3.
39
Liu B, He L, Li Y, Zhe S, Xu Z. NeuralCP: Bayesian Multiway Data Analysis with Neural Tensor Decomposition. Cognit Comput 2018. DOI: 10.1007/s12559-018-9587-4.
40
Unsupervised Domain Adaptation for Facial Expression Recognition Using Generative Adversarial Networks. Computational Intelligence and Neuroscience 2018;2018:7208794. PMID: 30111995; PMCID: PMC6077544; DOI: 10.1155/2018/7208794.
Abstract
In the facial expression recognition task, a well-performing convolutional neural network (CNN) model trained on one dataset (the source dataset) usually performs poorly on another dataset (the target dataset), because the feature distribution of the same emotion varies across datasets. To improve the cross-dataset accuracy of the CNN model, we introduce an unsupervised domain adaptation method that is especially suitable for small unlabelled target datasets. To address the lack of samples from the target dataset, we train a generative adversarial network (GAN) on the target dataset and use the GAN-generated samples to fine-tune the model pretrained on the source dataset. During fine-tuning, we dynamically give the unlabelled GAN-generated samples distributed pseudo-labels according to the current prediction probabilities. Our method can be easily applied to any existing convolutional neural network. We demonstrate the effectiveness of our method on four facial expression recognition datasets with two CNN structures and obtain inspiring results.
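The dynamic pseudo-labelling step can be sketched as below; a hard-label, confidence-thresholded variant is shown for simplicity, whereas the paper assigns distributed (soft) labels, and the names and threshold are assumptions.

```python
# Sketch of one fine-tuning step on GAN-generated images: the model's current
# confident predictions serve as pseudo-labels (hard-label simplification).
import torch
import torch.nn.functional as F

def pseudo_label_step(model, gan_batch, optimizer, threshold=0.8):
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(gan_batch), dim=1)
        conf, pseudo = probs.max(dim=1)          # current predictions as labels
        keep = conf > threshold                  # train only on confident samples
    if keep.any():
        model.train()
        loss = F.cross_entropy(model(gan_batch[keep]), pseudo[keep])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```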
41
42
A New Algorithm for SAR Image Target Recognition Based on an Improved Deep Convolutional Neural Network. Cognit Comput 2018. DOI: 10.1007/s12559-018-9563-z.
43
44
Li C, Deng C, Zhou S, Zhao B, Huang GB. Conditional Random Mapping for Effective ELM Feature Representation. Cognit Comput 2018. DOI: 10.1007/s12559-018-9557-x.
45
46
Anbar M, Abdullah R, Al-Tamimi BN, Hussain A. A Machine Learning Approach to Detect Router Advertisement Flooding Attacks in Next-Generation IPv6 Networks. Cognit Comput 2017. DOI: 10.1007/s12559-017-9519-8.
47
Ren P, Sun W, Luo C, Hussain A. Clustering-Oriented Multiple Convolutional Neural Networks for Single Image Super-Resolution. Cognit Comput 2017. DOI: 10.1007/s12559-017-9512-2.