51
|
Buongiorno D, Cascarano GD, De Feudis I, Brunetti A, Carnimeo L, Dimauro G, Bevilacqua V. Deep learning for processing electromyographic signals: A taxonomy-based survey. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.06.139] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
52
|
Zhang SS, Liu JW, Zuo X, Lu RK, Lian SM. Online deep learning based on auto-encoder. APPL INTELL 2021. [DOI: 10.1007/s10489-020-02058-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
53
|
Jia S, Jiang S, Lin Z, Li N, Xu M, Yu S. A survey: Deep learning for hyperspectral image classification with few labeled samples. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.035] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
54
|
A novel approach for facial expression recognition based on Gabor filters and genetic algorithm. EVOLVING SYSTEMS 2021. [DOI: 10.1007/s12530-021-09393-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
55
|
Hohmann V, Paluch R, Krueger M, Meis M, Grimm G. The Virtual Reality Lab: Realization and Application of Virtual Sound Environments. Ear Hear 2021; 41 Suppl 1:31S-38S. [PMID: 33105257 PMCID: PMC7676619 DOI: 10.1097/aud.0000000000000945] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2020] [Accepted: 07/15/2020] [Indexed: 12/23/2022]
Abstract
To assess perception with and performance of modern and future hearing devices with advanced adaptive signal processing capabilities, novel evaluation methods are required that go beyond already established methods. These novel methods will simulate to a certain extent the complexity and variability of acoustic conditions and acoustic communication styles in real life. This article discusses the current state and the perspectives of virtual reality technology use in the lab for designing complex audiovisual communication environments for hearing assessment and hearing device design and evaluation. In an effort to increase the ecological validity of lab experiments, that is, to increase the degree to which lab data reflect real-life hearing-related function, and to support the development of improved hearing-related procedures and interventions, this virtual reality lab marks a transition from conventional (audio-only) lab experiments to the field. The first part of the article introduces and discusses the notion of the communication loop as a theoretical basis for understanding the factors that are relevant for acoustic communication in real life. From this, requirements are derived that allow an assessment of the extent to which a virtual reality lab reflects these factors, and which may be used as a proxy for ecological validity. The most important factor of real-life communication identified is a closed communication loop among the actively behaving participants. The second part of the article gives an overview of the current developments towards a virtual reality lab at Oldenburg University that aims at interactive and reproducible testing of subjects with and without hearing devices in challenging communication conditions. The extent to which the virtual reality lab in its current state meets the requirements defined in the first part is discussed, along with its limitations and potential further developments. Finally, data are presented from a qualitative study that compared subject behavior and performance in two audiovisual environments presented in the virtual reality lab-a street and a cafeteria-with the corresponding field environments. The results show similarities and differences in subject behavior and performance between the lab and the field, indicating that the virtual reality lab in its current state marks a step towards more ecological validity in lab-based hearing and hearing device research, but requires further development towards higher levels of ecological validity.
Affiliation(s)
- Volker Hohmann: Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany; HörTech gGmbH, Oldenburg, Germany; Cluster of Excellence “Hearing4all,” Oldenburg, Germany
- Richard Paluch: Cluster of Excellence “Hearing4all,” Oldenburg, Germany; Department of Social Sciences, University of Oldenburg, Oldenburg, Germany
- Melanie Krueger: HörTech gGmbH, Oldenburg, Germany; Cluster of Excellence “Hearing4all,” Oldenburg, Germany
- Markus Meis: Cluster of Excellence “Hearing4all,” Oldenburg, Germany; Hörzentrum Oldenburg GmbH, Oldenburg, Germany
- Giso Grimm: Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany; HörTech gGmbH, Oldenburg, Germany; Cluster of Excellence “Hearing4all,” Oldenburg, Germany
|
56
|
Abstract
Cyber-Physical System (CPS) applications, including human-robot interaction, call for automated reasoning for rational decision-making. In the latter context, audio-visual signals are typically employed. This work considers brain signals for emotion recognition towards an effective human-robot interaction. An ElectroEncephaloGraphy (EEG) signal here is represented by an Intervals’ Number (IN). An IN-based, optimizable parametric k Nearest Neighbor (kNN) classifier scheme for decision-making by fuzzy lattice reasoning (FLR) is proposed, where the conventional distance between two points is replaced by a fuzzy order function (σ) for reasoning-by-analogy. A main advantage of employing INs is that no ad hoc feature extraction is required, since an IN may represent all-order data statistics; the latter are the features considered implicitly. Four different fuzzy order functions are employed in this work. Experimental results demonstrate the comparably good performance of the proposed techniques.
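To make the classification scheme above concrete, the sketch below (not the authors' code) replaces the usual distance in a k-nearest-neighbor rule with a user-supplied similarity computed between interval-based signal representations. The percentile-interval encoding and the overlap-based similarity are assumptions made for illustration; they stand in for, but are not, the paper's Intervals' Numbers and fuzzy order function σ.

```python
import numpy as np

def to_intervals(signal, levels=(10, 30, 50, 70, 90)):
    """Encode a 1-D signal as nested percentile intervals (a crude stand-in
    for an interval-based signal representation)."""
    lo = np.percentile(signal, [50 - l / 2 for l in levels])
    hi = np.percentile(signal, [50 + l / 2 for l in levels])
    return np.stack([lo, hi], axis=1)          # shape: (len(levels), 2)

def interval_similarity(a, b):
    """Toy order-like similarity in [0, 1]: average relative overlap of intervals."""
    lo = np.maximum(a[:, 0], b[:, 0])
    hi = np.minimum(a[:, 1], b[:, 1])
    overlap = np.clip(hi - lo, 0.0, None)
    span = np.maximum(a[:, 1], b[:, 1]) - np.minimum(a[:, 0], b[:, 0]) + 1e-9
    return float(np.mean(overlap / span))

def knn_predict(train_reprs, train_labels, test_repr, k=3):
    """k-NN where the 'metric' is a similarity (larger means closer)."""
    sims = np.array([interval_similarity(test_repr, r) for r in train_reprs])
    top = np.argsort(-sims)[:k]
    votes = np.array(train_labels)[top]
    return np.bincount(votes).argmax()

# Dummy EEG-like data: 20 training signals from two emotion classes.
rng = np.random.default_rng(0)
signals = [rng.normal(loc=c, scale=1.0, size=512) for c in (0, 2) for _ in range(10)]
labels = [0] * 10 + [1] * 10
reprs = [to_intervals(s) for s in signals]
test = to_intervals(rng.normal(loc=2, scale=1.0, size=512))
print(knn_predict(reprs, labels, test, k=5))
```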
|
57
|
|
58
|
Xie W, Shen L, Duan J. Adaptive Weighting of Handcrafted Feature Losses for Facial Expression Recognition. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:2787-2800. [PMID: 31395570 DOI: 10.1109/tcyb.2019.2925095] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Due to the importance of facial expressions in human-machine interaction, a number of handcrafted features and deep neural networks have been developed for facial expression recognition. Although a few studies have shown the similarity between handcrafted features and the features learned by deep networks, a new feature loss is proposed here that uses a feature bias constraint between handcrafted and deep features to guide deep feature learning during the early training of the network. The feature maps learned with and without the proposed feature loss for a toy network suggest that our approach can fully explore the complementarity between handcrafted features and deep features. Based on the feature loss, a general framework for embedding traditional feature information into deep network training was developed and tested using the FER2013, CK+, Oulu-CASIA, and MMI datasets. Moreover, adaptive loss weighting strategies are proposed to balance the influence of different losses for different expression databases. The experimental results show that the proposed feature loss with adaptive weighting achieves much better accuracy than the original handcrafted feature and the network trained without our feature loss. Meanwhile, the feature loss with adaptive weighting can provide complementary information to compensate for the deficiency of a single feature.
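The abstract does not give the exact loss formulation, so the following is only a minimal sketch of the general idea: a classification loss combined with a feature-constraint term that pulls an intermediate deep feature toward a precomputed handcrafted target, with the weight of the constraint adapted to keep both terms on a comparable scale. The tiny network, feature sizes, and the magnitude-balancing heuristic are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Toy CNN that exposes an intermediate feature vector and class logits."""
    def __init__(self, n_classes=7, feat_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.feat = nn.Linear(16 * 4 * 4, feat_dim)
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        z = self.conv(x).flatten(1)
        f = self.feat(z)                 # deep feature to be guided
        return f, self.head(f)

def total_loss(logits, labels, deep_feat, hand_feat):
    """Classification loss plus a feature-constraint loss whose weight is
    adapted so both terms contribute on a comparable scale (one simple
    heuristic; the paper's own weighting strategies may differ)."""
    cls = F.cross_entropy(logits, labels)
    feat = F.mse_loss(deep_feat, hand_feat)
    w = (cls.detach() / (feat.detach() + 1e-8)).clamp(max=10.0)
    return cls + w * feat

# Dummy batch: 8 grayscale 48x48 faces, 7 expression classes, and
# precomputed "handcrafted" feature targets of matching dimension.
net = TinyNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.randn(8, 1, 48, 48)
y = torch.randint(0, 7, (8,))
hand = torch.randn(8, 32)                # stand-in for handcrafted features

deep, logits = net(x)
loss = total_loss(logits, y, deep, hand)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```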
|
59
|
Soft-sensing of Wastewater Treatment Process via Deep Belief Network with Event-triggered Learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.108] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
60
|
Bhatti YK, Jamil A, Nida N, Yousaf MH, Viriri S, Velastin SA. Facial Expression Recognition of Instructor Using Deep Features and Extreme Learning Machine. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:5570870. [PMID: 34007266 PMCID: PMC8110428 DOI: 10.1155/2021/5570870] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/17/2021] [Revised: 02/22/2021] [Accepted: 04/12/2021] [Indexed: 11/25/2022]
Abstract
Classroom communication involves the teacher's behavior and the students' responses. Extensive research has been done on the analysis of students' facial expressions, but the impact of an instructor's facial expressions is as yet an unexplored area of research. Facial expression recognition has the potential to predict the impact of a teacher's emotions in a classroom environment. Intelligent assessment of instructor behavior during lecture delivery might not only improve the learning environment but could also save the time and resources utilized in manual assessment strategies. To address the issue of manual assessment, we propose an instructor facial expression recognition approach within a classroom using a feedforward learning model. First, the face is detected from the acquired lecture videos and key frames are selected, discarding all the redundant frames for effective high-level feature extraction. Then, deep features are extracted using multiple convolutional neural networks along with parameter tuning, and are then fed to a classifier. For fast learning and good generalization of the algorithm, a regularized extreme learning machine (RELM) classifier is employed which classifies five different expressions of the instructor within the classroom. Experiments are conducted on a newly created instructor facial expression dataset in classroom environments plus three benchmark facial datasets, i.e., Cohn-Kanade, the Japanese Female Facial Expression (JAFFE) dataset, and the Facial Expression Recognition 2013 (FER2013) dataset. Furthermore, the proposed method is compared with state-of-the-art techniques, traditional classifiers, and convolutional neural models. The experimental results indicate significant performance gains on parameters such as accuracy, F1-score, and recall.
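A regularized extreme learning machine, as named above, admits a standard closed-form solution, sketched below on top of precomputed deep features. The hidden-layer size, ridge parameter, and random data are illustrative assumptions; the paper's exact RELM variant and feature pipeline are not reproduced here.

```python
import numpy as np

class RELM:
    """Regularized extreme learning machine: random hidden layer + ridge-regression output."""
    def __init__(self, n_hidden=300, c=10.0, seed=0):
        self.n_hidden, self.c, self.seed = n_hidden, c, seed

    def _hidden(self, x):
        return np.tanh(x @ self.w + self.b)

    def fit(self, x, y):
        rng = np.random.default_rng(self.seed)
        self.w = rng.normal(size=(x.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        t = np.eye(y.max() + 1)[y]                     # one-hot targets
        h = self._hidden(x)
        # beta = (H^T H + I/C)^-1 H^T T  (ridge-regularized least squares)
        self.beta = np.linalg.solve(h.T @ h + np.eye(self.n_hidden) / self.c, h.T @ t)
        return self

    def predict(self, x):
        return self._hidden(x) @ self.beta

# Dummy "deep features" (e.g., pooled CNN activations) for 100 samples, 5 expression classes.
rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 256))
labels = rng.integers(0, 5, size=100)
clf = RELM().fit(feats, labels)
pred = clf.predict(feats).argmax(axis=1)
print("train accuracy:", (pred == labels).mean())
```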
Affiliation(s)
- Yusra Khalid Bhatti: Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan
- Afshan Jamil: Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan
- Nudrat Nida: Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan
- Muhammad Haroon Yousaf: Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan; Swarm Robotics Lab, National Centre for Robotics and Automation (NCRA), Rawalpindi, Pakistan
- Serestina Viriri: Department of Computer Science, University of Kwazulu Natal, Durban, South Africa
- Sergio A. Velastin: School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK; Department of Computer Science and Engineering, Universidad Carlos III de Madrid, Leganés, Madrid 28911, Spain
|
61
|
Kalsum T, Mehmood Z, Kulsoom F, Chaudhry HN, Khan AR, Rashid M, Saba T. Localization and classification of human facial emotions using local intensity order pattern and shape-based texture features. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-201799] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
A facial emotion recognition system (FERS) recognizes a person’s emotions through various image processing stages, with feature extraction as one of the major processing steps. In this study, we present a hybrid approach for recognizing facial expressions by performing feature-level fusion of a local and a global feature descriptor, classified by a support vector machine (SVM). Histogram of oriented gradients (HoG) is selected for the extraction of global facial features and local intensity order pattern (LIOP) for the local features. As HoG is a shape-based descriptor, it can, with the help of edge information, capture the deformations caused in facial muscles by changing emotions. LIOP, on the contrary, works on pixel intensity order information and is invariant to changes in image viewpoint, illumination conditions, JPEG compression, and image blurring. Thus both descriptors prove useful for recognizing emotions effectively in images captured in both constrained and realistic scenarios. The performance of the proposed model is evaluated on the lab-constrained datasets CK+, TFEID, and JAFFE as well as on the realistic datasets SFEW, RaF, and FER-2013. Optimal recognition accuracies of 99.8%, 98.2%, 93.5%, 78.1%, 63.0%, and 56.0% are achieved for the CK+, JAFFE, TFEID, RaF, FER-2013, and SFEW datasets, respectively.
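As a rough illustration of global-local feature fusion with an SVM, the sketch below concatenates a HoG descriptor with a local texture histogram and trains an SVM on synthetic images. Note the assumption: LIOP is not available in common Python libraries, so a local binary pattern histogram is used here purely as a stand-in local descriptor; the kernel and parameters are likewise illustrative.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

def global_local_features(face, p=8, r=1.0):
    """Concatenate a global shape descriptor (HoG) with a local texture
    descriptor (LBP histogram, used here as a stand-in for LIOP)."""
    g = hog(face, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    lbp = local_binary_pattern(face, P=p, R=r, method="uniform")
    hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
    return np.concatenate([g, hist])

# Dummy data: 40 synthetic 48x48 "faces", 2 expression classes.
rng = np.random.default_rng(0)
images = rng.random((40, 48, 48))
labels = rng.integers(0, 2, size=40)

x = np.stack([global_local_features(img) for img in images])
clf = SVC(kernel="rbf", C=10.0).fit(x, labels)
print("train accuracy:", clf.score(x, labels))
```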
Affiliation(s)
- Tehmina Kalsum: Department of Software Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan
- Zahid Mehmood: Department of Computer Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan
- Farzana Kulsoom: Department of Electrical, Computer, and Biomedical Engineering, University of Pavia, Pavia, Italy
- Hassan Nazeer Chaudhry: Department of Electrical, Information, and Bio Engineering, Politecnico di Milano, Milan, Italy
- Amjad Rehman Khan: Artificial Intelligence & Data Analytics (AIDA) Lab, CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Muhammad Rashid: Department of Computer Engineering, Umm Al-Qura University, Makkah, Saudi Arabia
- Tanzila Saba: Artificial Intelligence & Data Analytics (AIDA) Lab, CCIS, Prince Sultan University, Riyadh, Saudi Arabia
|
62
|
Salvi M, Bosco M, Molinaro L, Gambella A, Papotti M, Acharya UR, Molinari F. A hybrid deep learning approach for gland segmentation in prostate histopathological images. Artif Intell Med 2021; 115:102076. [PMID: 34001325 DOI: 10.1016/j.artmed.2021.102076] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Revised: 04/08/2021] [Accepted: 04/10/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND In digital pathology, the morphology and architecture of prostate glands have been routinely adopted by pathologists to evaluate the presence of cancer tissue. Manual annotations are operator-dependent, error-prone and time-consuming. The automated segmentation of prostate glands can also be very challenging due to large appearance variation and serious degeneration of these histological structures. METHOD A new image segmentation method, called RINGS (Rapid IdentificatioN of Glandular Structures), is presented to segment prostate glands in histopathological images. We designed a novel gland segmentation strategy using a multi-channel algorithm that exploits and fuses both traditional and deep learning techniques. Specifically, the proposed approach employs a hybrid segmentation strategy based on stroma detection to accurately detect and delineate the prostate gland contours. RESULTS Automated results are compared with manual annotations and seven state-of-the-art techniques designed for gland segmentation. Being based on stroma segmentation, no performance degradation is observed when segmenting healthy or pathological structures. Our method is able to delineate the prostate glands of an unknown histopathological image with a dice score of 90.16% and outperforms all the compared state-of-the-art methods. CONCLUSIONS To the best of our knowledge, the RINGS algorithm is the first fully automated method capable of maintaining a high sensitivity even in the presence of severe glandular degeneration. The proposed method will help to detect prostate glands accurately and assist pathologists in making accurate diagnosis and treatment decisions. The developed model can be used to support prostate cancer diagnosis in polyclinics and community care centres.
Affiliation(s)
- Massimo Salvi: Politecnico di Torino, PoliTo(BIO)Med Lab, Biolab, Department of Electronics and Telecommunications, Corso Duca degli Abruzzi 24, Turin, 10129, Italy
- Martino Bosco: San Lazzaro Hospital, Department of Pathology, Via Petrino Belli 26, Alba, 12051, Italy
- Luca Molinaro: A.O.U. Città della Salute e della Scienza Hospital, Division of Pathology, Corso Bramante 88, Turin, 10126, Italy
- Alessandro Gambella: A.O.U. Città della Salute e della Scienza Hospital, Division of Pathology, Corso Bramante 88, Turin, 10126, Italy
- Mauro Papotti: University of Turin, Division of Pathology, Department of Oncology, Via Santena 5, Turin, 10126, Italy
- U Rajendra Acharya: Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore; Department of Biomedical Engineering, School of Science and Technology, SUSS University, Clementi, 599491, Singapore; Department of Bioinformatics and Medical Engineering, Asia University, Taiwan
- Filippo Molinari: Politecnico di Torino, PoliTo(BIO)Med Lab, Biolab, Department of Electronics and Telecommunications, Corso Duca degli Abruzzi 24, Turin, 10129, Italy
|
63
|
Yiran L. Evaluation of students’ IELTS writing ability based on machine learning and neural network algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-189508] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
With increasing social development and international communication, more and more Chinese candidates take the IELTS test as a basis for study and work. IELTS is more difficult than the national CET-4 and CET-6 tests. Listening and reading are processes in which students take language in and accept it passively, whereas vocabulary, writing and translation are processes in which students produce language, and these lay the foundation for writing. Writing in a second language is one of the main indicators of students’ ability to use English. It is not only an important way to communicate ideas, emotional expressions and cultural exchanges, but also reflects students’ language ability, communication ability and level of thinking in English. Therefore, English writing ability is an important index of students’ English proficiency and an important aspect of their English training. This study identified the following research task: according to the established measurement and evaluation indicators, the students’ writing ability was evaluated.
Affiliation(s)
- Liu Yiran: International Exchange School, Changchun Normal University, Changchun, Jilin, China
|
64
|
Hung JY, Perera C, Chen KW, Myung D, Chiu HK, Fuh CS, Hsu CR, Liao SL, Kossler AL. A deep learning approach to identify blepharoptosis by convolutional neural networks. Int J Med Inform 2021; 148:104402. [PMID: 33609928 PMCID: PMC8191181 DOI: 10.1016/j.ijmedinf.2021.104402] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Revised: 01/22/2021] [Accepted: 01/24/2021] [Indexed: 11/17/2022]
Abstract
PURPOSE Blepharoptosis is a known cause of reversible vision loss. Accurate assessment can be difficult, especially amongst non-specialists. Existing automated techniques disrupt clinical workflow by requiring user input or placement of reference markers. Neural networks are known to be effective in image classification tasks. We aim to develop an algorithm that can accurately identify blepharoptosis from a clinical photo. METHODS A total of 500 clinical photographs from patients with and without blepharoptosis were sourced from a tertiary ophthalmic center in Taiwan. Images were labeled by two oculoplastic surgeons, with an independent third oculoplastic surgeon to adjudicate disagreements. These images were used to train a series of convolutional neural networks (CNNs) to ascertain the best CNN architecture for this particular task. RESULTS Of the models trained on the dataset, most were able to identify ptosis images with reasonable accuracy. The best performing model used the DenseNet121 architecture without pre-training and achieved a sensitivity of 90.1% with a specificity of 82.4%, compared to the worst performing model, which used a ResNet34 architecture with pre-training and achieved a sensitivity of 74.1% and a specificity of 63.6%. Models with and without pre-training performed similarly (mean accuracy 82.6% vs. 85.8%, respectively; p = 0.06), though models with pre-training took less time to train (1 min vs. 16 min, p < 0.01). CONCLUSIONS We report the use of AI to accurately diagnose blepharoptosis from a clinical photograph with no external reference markers or user input required. Most current-generation CNN architectures performed reasonably on this task, with the DenseNet121 and ResNet18 architectures without pre-training performing best on our dataset.
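For readers unfamiliar with how such a classifier is typically set up, the sketch below builds a DenseNet121 backbone with a two-class head and runs one training step on dummy data. It assumes a recent torchvision; the image size, optimizer, and weights flag are illustrative and do not claim to reproduce the study's training protocol.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_ptosis_classifier(pretrained: bool = False) -> nn.Module:
    """DenseNet121 backbone with a 2-way head (ptosis / no ptosis)."""
    weights = models.DenseNet121_Weights.DEFAULT if pretrained else None
    net = models.densenet121(weights=weights)
    net.classifier = nn.Linear(net.classifier.in_features, 2)
    return net

model = build_ptosis_classifier(pretrained=False)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One dummy training step on a batch of 4 RGB photos resized to 224x224.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 2, (4,))
logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```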
Affiliation(s)
- Ju-Yi Hung: Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Palo Alto, California, United States; Ophthalmology, Taipei Medical University Hospital, Taipei, Taiwan; Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Chandrashan Perera: Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Palo Alto, California, United States
- Ke-Wei Chen: Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Palo Alto, California, United States; Biomedical Engineering, National Cheng Kung University, Tainan, Taiwan
- David Myung: Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Palo Alto, California, United States
- Hsu-Kuang Chiu: Computer Science, Stanford University, Stanford, California, United States
- Chiou-Shann Fuh: Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Cherng-Ru Hsu: Ophthalmology, National Taiwan University Hospital, Taipei, Taiwan; Ophthalmology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Shu-Lang Liao: Ophthalmology, National Taiwan University Hospital, Taipei, Taiwan; College of Medicine, National Taiwan University, Taipei, Taiwan
- Andrea Lora Kossler: Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Palo Alto, California, United States
|
65
|
|
66
|
|
67
|
Abstract
Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Following, we discuss hardware and software needed to get started with deep learning and interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.
|
68
|
Liu L, Jiang R, Huo J, Chen J. Self-Difference Convolutional Neural Network for Facial Expression Recognition. SENSORS 2021; 21:s21062250. [PMID: 33807088 PMCID: PMC8005141 DOI: 10.3390/s21062250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Revised: 03/14/2021] [Accepted: 03/19/2021] [Indexed: 11/16/2022]
Abstract
Facial expression recognition (FER) is a challenging problem due to the intra-class variation caused by subject identities. In this paper, a self-difference convolutional network (SD-CNN) is proposed to address the intra-class variation issue in FER. First, the SD-CNN uses a conditional generative adversarial network to generate the six typical facial expressions for the same subject in the testing image. Second, six compact and light-weighted difference-based CNNs, called DiffNets, are designed for classifying facial expressions. Each DiffNet extracts a pair of deep features from the testing image and one of the six synthesized expression images, and compares the difference between the deep feature pair. In this way, any potential facial expression in the testing image has an opportunity to be compared with the synthesized "Self"-an image of the same subject with the same facial expression as the testing image. As most of the self-difference features of the images with the same facial expression gather tightly in the feature space, the intra-class variation issue is significantly alleviated. The proposed SD-CNN is extensively evaluated on two widely-used facial expression datasets: CK+ and Oulu-CASIA. Experimental results demonstrate that the SD-CNN achieves state-of-the-art performance with accuracies of 99.7% on CK+ and 91.3% on Oulu-CASIA, respectively. Moreover, the model size of the online processing part of the SD-CNN is only 9.54 MB (1.59 MB ×6), which enables the SD-CNN to run on low-cost hardware.
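The core comparison step can be pictured with the minimal sketch below: a shared encoder embeds the test image and six synthesized expression images of the same subject, and a small head scores each absolute feature difference. This is a simplification for illustration only; the paper uses a conditional GAN to synthesize the expressions and six separate DiffNets rather than one shared encoder and head.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Small stand-in feature extractor shared by all comparisons."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, dim),
        )

    def forward(self, x):
        return self.net(x)

class SelfDifferenceClassifier(nn.Module):
    """Scores the difference between the test embedding and each of the six
    synthesized-expression embeddings; the best match gives the class."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = Encoder(dim)
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, test_img, synth_imgs):          # synth_imgs: (B, 6, 1, H, W)
        b, k = synth_imgs.shape[:2]
        f_test = self.encoder(test_img)               # (B, dim)
        f_synth = self.encoder(synth_imgs.flatten(0, 1)).view(b, k, -1)
        diff = (f_test.unsqueeze(1) - f_synth).abs()  # (B, 6, dim) difference features
        return self.score(diff).squeeze(-1)           # (B, 6) match scores

model = SelfDifferenceClassifier()
test = torch.randn(2, 1, 48, 48)            # two test faces
synth = torch.randn(2, 6, 1, 48, 48)        # six synthesized expressions per face
print(model(test, synth).argmax(dim=1))     # predicted expression index per face
```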
Affiliation(s)
- Leyuan Liu: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
- Rubin Jiang: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
- Jiao Huo: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
- Jingying Chen: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China; National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China; Correspondence, Tel.: +86-135-1721-9631
|
69
|
Hybrid Attention Cascade Network for Facial Expression Recognition. SENSORS 2021; 21:s21062003. [PMID: 33809038 PMCID: PMC8002145 DOI: 10.3390/s21062003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/30/2021] [Revised: 03/05/2021] [Accepted: 03/06/2021] [Indexed: 11/30/2022]
Abstract
How to improve performance on the AFEW (Acted Facial Expressions in the Wild) dataset, a sub-challenge of EmotiW (the Emotion Recognition in the Wild challenge), is a popular benchmark for emotion recognition tasks under various constraints, including uneven illumination, head deflection, and facial posture. In this paper, we propose a convenient facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, in a video sequence, faces in each frame are detected, and the corresponding face ROI (region of interest) is extracted to obtain the face images. Then, the face images in each frame are aligned based on the position information of the facial feature points in the images. Second, the aligned face images are input to a residual neural network to extract the spatial features of the facial expressions corresponding to the face images. The spatial features are input to the hybrid attention module to obtain the fusion features of the facial expressions. Finally, the fusion features are input to the gated recurrent unit to extract the temporal features of the facial expressions. The temporal features are input to the fully connected layer to classify and recognize the facial expressions. Experiments using the CK+ (the extended Cohn-Kanade), Oulu-CASIA (Institute of Automation, Chinese Academy of Sciences) and AFEW datasets obtained recognition accuracy rates of 98.46%, 87.31%, and 53.44%, respectively. This demonstrates that the proposed method not only achieves competitive performance comparable to state-of-the-art methods but also obtains a performance improvement of more than 2% on the AFEW dataset, proving its significant advantage for facial expression recognition in natural environments.
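The cascade described above (per-frame spatial features, then a recurrent temporal model, then a fully connected classifier) can be sketched as follows. The hybrid attention module is omitted, the small CNN stands in for the residual network, and all sizes are arbitrary assumptions; this is an illustration of the pipeline shape, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Per-frame spatial feature extractor (a ResNet would be used in practice)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.net(x)

class CascadeExpressionNet(nn.Module):
    """Spatial features per frame -> GRU over the frame sequence -> FC classifier."""
    def __init__(self, n_classes=7, dim=128):
        super().__init__()
        self.encoder = FrameEncoder(dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(feats)                     # h: (1, B, dim), last hidden state
        return self.fc(h.squeeze(0))               # (B, n_classes)

model = CascadeExpressionNet()
clips = torch.randn(2, 16, 3, 112, 112)            # two clips of 16 aligned face frames
print(model(clips).shape)                          # torch.Size([2, 7])
```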
|
70
|
Akheel TS, Shree VU, Mastani SA. Stochastic gradient descent linear collaborative discriminant regression classification based face recognition. EVOLUTIONARY INTELLIGENCE 2021. [DOI: 10.1007/s12065-021-00585-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
71
|
Ma Q, Chen E, Lin Z, Yan J, Yu Z, Ng WWY. Convolutional Multitimescale Echo State Network. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:1613-1625. [PMID: 31217137 DOI: 10.1109/tcyb.2019.2919648] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
As efficient recurrent neural network (RNN) models, echo state networks (ESNs) have attracted widespread attention and been applied in many application domains in the last decade. Although they have achieved great success in modeling time series, a single ESN may have difficulty in capturing the multitimescale structures that naturally exist in temporal data. In this paper, we propose the convolutional multitimescale ESN (ConvMESN), which is a novel training-efficient model for capturing multitimescale structures and multiscale temporal dependencies of temporal data. In particular, a multitimescale memory encoder is constructed with a multireservoir structure, in which different reservoirs have recurrent connections with different skip lengths (or time spans). By collecting all past echo states in each reservoir, this multireservoir structure encodes the history of a time series as nonlinear multitimescale echo state representations (MESRs). Our visualization analysis verifies that the MESRs provide better discriminative features for time series. Finally, multiscale temporal dependencies of MESRs are learned by a convolutional layer. By leveraging the multitimescale reservoirs followed by a convolutional learner, the ConvMESN has not only efficient memory encoding ability for temporal data with multitimescale structures but also strong learning ability for complex temporal dependencies. Furthermore, the training-free reservoirs and the single convolutional layer provide high-computational efficiency for the ConvMESN to model complex temporal data. Extensive experiments on 18 multivariate time series (MTS) benchmark datasets and 3 skeleton-based action recognition datasets demonstrate that the ConvMESN captures multitimescale dynamics and outperforms existing methods.
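The multi-timescale reservoir idea can be illustrated with the sketch below: several fixed random reservoirs whose recurrent connection uses different skip lengths, producing echo states at different timescales that are concatenated for a downstream learner. The spectral-radius scaling, reservoir sizes, and the plain concatenation (instead of the paper's convolutional learner) are assumptions made for the example.

```python
import numpy as np

def run_reservoir(inputs, n_res=100, skip=1, rho=0.9, seed=0):
    """Collect echo states x(t) = tanh(W_in u(t) + W x(t - skip)).

    'skip' is the time span of the recurrent connection, so different
    values give reservoirs operating at different timescales."""
    rng = np.random.default_rng(seed)
    t_len, n_in = inputs.shape
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w *= rho / max(abs(np.linalg.eigvals(w)))       # scale spectral radius
    states = np.zeros((t_len, n_res))
    for t in range(t_len):
        prev = states[t - skip] if t >= skip else np.zeros(n_res)
        states[t] = np.tanh(w_in @ inputs[t] + w @ prev)
    return states

# Dummy multivariate time series: 200 steps, 3 channels.
rng = np.random.default_rng(1)
series = rng.normal(size=(200, 3))

# Multi-timescale representation: one reservoir per skip length (time span).
multi = np.concatenate([run_reservoir(series, skip=s, seed=s) for s in (1, 2, 4, 8)], axis=1)
print(multi.shape)      # (200, 400): echo states from four timescales, ready for a learner
```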
|
72
|
Ke L, Zhang Y, Yang B, Luo Z, Liu Z. Fault diagnosis with synchrosqueezing transform and optimized deep convolutional neural network: An application in modular multilevel converters. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.11.037] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
73
|
Liang J, Li K, Liu C, Li K. Joint offloading and scheduling decisions for DAG applications in mobile edge computing. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2019.11.081] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
74
|
Training deep neural networks for wireless sensor networks using loosely and weakly labeled images. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
75
|
Li H, Wang N, Ding X, Yang X, Gao X. Adaptively Learning Facial Expression Representation via C-F Labels and Distillation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2016-2028. [PMID: 33439841 DOI: 10.1109/tip.2021.3049955] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Facial expression recognition is of significant importance in criminal investigation and digital entertainment. Under unconstrained conditions, existing expression datasets are highly class-imbalanced, and the similarity between expressions is high. Previous methods tend to improve the performance of facial expression recognition through deeper or wider network structures, resulting in increased storage and computing costs. In this paper, we propose a new adaptive supervised objective named AdaReg loss, re-weighting category importance coefficients to address this class imbalance and increasing the discrimination power of expression representations. Inspired by human beings' cognitive mode, an innovative coarse-fine (C-F) labels strategy is designed to guide the model from easy to difficult to classify highly similar representations. On this basis, we propose a novel training framework named the emotional education mechanism (EEM) to transfer knowledge, composed of a knowledgeable teacher network (KTN) and a self-taught student network (STSN). Specifically, KTN integrates the outputs of coarse and fine streams, learning expression representations from easy to difficult. Under the supervision of the pre-trained KTN and existing learning experience, STSN can maximize the potential performance and compress the original KTN. Extensive experiments on public benchmarks demonstrate that the proposed method achieves superior performance compared to current state-of-the-art frameworks with 88.07% on RAF-DB, 63.97% on AffectNet and 90.49% on FERPlus.
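The abstract does not spell out the AdaReg formulation, so the snippet below only illustrates the general kind of category re-weighting it describes: a cross-entropy loss whose per-class weights are inversely proportional to class frequency. The class counts are hypothetical and the inverse-frequency rule is a generic stand-in, not the paper's learned coefficients.

```python
import torch
import torch.nn.functional as F

def reweighted_cross_entropy(logits, labels, class_counts):
    """Cross-entropy with per-class weights inversely proportional to class
    frequency (a simple stand-in for adaptive category importance coefficients)."""
    counts = torch.as_tensor(class_counts, dtype=torch.float32)
    weights = counts.sum() / (len(counts) * counts)
    return F.cross_entropy(logits, labels, weight=weights)

# Hypothetical per-class sample counts for 7 expression classes (imbalanced).
class_counts = [5000, 700, 2000, 7200, 5000, 1300, 2500]
logits = torch.randn(16, 7)
labels = torch.randint(0, 7, (16,))
print(float(reweighted_cross_entropy(logits, labels, class_counts)))
```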
|
76
|
Comparative analysis of machine learning algorithms for Lip print based person identification. EVOLUTIONARY INTELLIGENCE 2021. [DOI: 10.1007/s12065-020-00561-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
77
|
Kawulok M, Nalepa J, Kawulok J, Smolka B. Dynamics of facial actions for assessing smile genuineness. PLoS One 2021; 16:e0244647. [PMID: 33400708 PMCID: PMC7785114 DOI: 10.1371/journal.pone.0244647] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Accepted: 12/14/2020] [Indexed: 11/19/2022] Open
Abstract
Applying computer vision techniques to distinguish between spontaneous and posed smiles is an active research topic of affective computing. Although there have been many works published addressing this problem and a couple of excellent benchmark databases created, the existing state-of-the-art approaches do not exploit the action units defined within the Facial Action Coding System that has become a standard in facial expression analysis. In this work, we explore the possibilities of extracting discriminative features directly from the dynamics of facial action units to differentiate between genuine and posed smiles. We report the results of our experimental study which shows that the proposed features offer competitive performance to those based on facial landmark analysis and on textural descriptors extracted from spatial-temporal blocks. We make these features publicly available for the UvA-NEMO and BBC databases, which will allow other researchers to further improve the classification scores, while preserving the interpretation capabilities attributed to the use of facial action units. Moreover, we have developed a new technique for identifying the smile phases, which is robust against the noise and allows for continuous analysis of facial videos.
Affiliation(s)
- Michal Kawulok: Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
- Jakub Nalepa: Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
- Jolanta Kawulok: Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
- Bogdan Smolka: Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
|
78
|
Lamas A, Tabik S, Cruz P, Montes R, Martínez-Sevilla Á, Cruz T, Herrera F. MonuMAI: Dataset, deep learning pipeline and citizen science based app for monumental heritage taxonomy and classification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.041] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
79
|
Masson A, Cazenave G, Trombini J, Batt M. The current challenges of automatic recognition of facial expressions: A systematic review. AI COMMUN 2020. [DOI: 10.3233/aic-200631] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
In recent years, due to its great economic and social potential, the recognition of facial expressions linked to emotions has become one of the most flourishing applications in the field of artificial intelligence, and has been the subject of many developments. However, despite significant progress, this field is still subject to many theoretical debates and technical challenges. It therefore seems important to make a general inventory of the different lines of research and to present a synthesis of recent results in this field. To this end, we have carried out a systematic review of the literature according to the guidelines of the PRISMA method. A search of 13 documentary databases identified a total of 220 references over the period 2014–2019. After a global presentation of the current systems and their performance, we grouped and analyzed the selected articles in the light of the main problems encountered in the field of automated facial expression recognition. The conclusion of this review highlights the strengths, limitations and main directions for future research in this field.
Affiliation(s)
- Audrey Masson: Interpsy – GRC, University of Lorraine, France; Two-I, France
- Martine Batt: Interpsy – GRC, University of Lorraine, France
|
80
|
Tong M, Yu X, Shao J, Shao Z, Li W, Lin W. Automated measuring method based on Machine learning for optomotor response in mice. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
81
|
Karim AM, Kaya H, Güzel MS, Tolun MR, Çelebi FV, Mishra A. A Novel Framework Using Deep Auto-Encoders Based Linear Model for Data Classification. SENSORS 2020; 20:s20216378. [PMID: 33182270 PMCID: PMC7664945 DOI: 10.3390/s20216378] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/27/2020] [Revised: 11/03/2020] [Accepted: 11/05/2020] [Indexed: 11/25/2022]
Abstract
This paper proposes a novel data classification framework, combining sparse auto-encoders (SAEs) and a post-processing system consisting of a linear system model relying on Particle Swarm Optimization (PSO) algorithm. All the sensitive and high-level features are extracted by using the first auto-encoder which is wired to the second auto-encoder, followed by a Softmax function layer to classify the extracted features obtained from the second layer. The two auto-encoders and the Softmax classifier are stacked in order to be trained in a supervised approach using the well-known backpropagation algorithm to enhance the performance of the neural network. Afterwards, the linear model transforms the calculated output of the deep stacked sparse auto-encoder to a value close to the anticipated output. This simple transformation increases the overall data classification performance of the stacked sparse auto-encoder architecture. The PSO algorithm allows the estimation of the parameters of the linear model in a metaheuristic policy. The proposed framework is validated by using three public datasets, which present promising results when compared with the current literature. Furthermore, the framework can be applied to any data classification problem by considering minor updates such as altering some parameters including input features, hidden neurons and output classes.
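The post-processing idea (a PSO-estimated linear map that pulls the network output toward the anticipated output) can be pictured with the compact sketch below, which searches for the two parameters of y' = a*y + b. Swarm size, inertia, and acceleration coefficients are arbitrary, and in the actual framework this step follows a stacked sparse autoencoder rather than the synthetic outputs used here.

```python
import numpy as np

def pso_fit_linear(y_pred, y_true, n_particles=30, n_iter=100, seed=0):
    """Particle swarm search for (a, b) minimizing ||a*y_pred + b - y_true||^2."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-2, 2, size=(n_particles, 2))        # particles = (a, b) pairs
    vel = np.zeros_like(pos)
    cost = lambda p: np.mean((p[0] * y_pred + p[1] - y_true) ** 2)
    p_best = pos.copy()
    p_best_cost = np.array([cost(p) for p in pos])
    g_best = p_best[p_best_cost.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = 0.7 * vel + 1.5 * r1 * (p_best - pos) + 1.5 * r2 * (g_best - pos)
        pos = pos + vel
        costs = np.array([cost(p) for p in pos])
        improved = costs < p_best_cost
        p_best[improved], p_best_cost[improved] = pos[improved], costs[improved]
        g_best = p_best[p_best_cost.argmin()].copy()
    return g_best                                            # best (a, b)

# Dummy classifier outputs that are systematically offset from the targets.
rng = np.random.default_rng(1)
targets = rng.random(200)
outputs = 0.6 * targets + 0.2 + 0.01 * rng.normal(size=200)
a, b = pso_fit_linear(outputs, targets)
print(a, b)   # should roughly invert the 0.6x + 0.2 distortion
```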
Affiliation(s)
- Ahmad M. Karim: Computer Engineering Department, AYBU, Ankara 06830, Turkey
- Hilal Kaya: Computer Engineering Department, AYBU, Ankara 06830, Turkey
- Mehmet R. Tolun: Computer Engineering Department, Konya Food and Agriculture University, Konya 42080, Turkey
- Fatih V. Çelebi: Computer Engineering Department, AYBU, Ankara 06830, Turkey
- Alok Mishra: Faculty of Logistics, Molde University College-Specialized University in Logistics, 6402 Molde, Norway; Software Engineering Department, Atilim University, Ankara 06830, Turkey
|
82
|
Li L, Xu W, Yu H. Character-level neural network model based on Nadam optimization and its application in clinical concept extraction. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.07.027] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
83
|
Li Q, Li P, Mao K, Lo EYM. Improving convolutional neural network for text classification by recursive data pruning. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.07.049] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
84
|
Liu D, Ouyang X, Xu S, Zhou P, He K, Wen S. SAANet: Siamese action-units attention network for improving dynamic facial expression recognition. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.062] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
85
|
Facial Expression Recognition Method Based on a Part-Based Temporal Convolutional Network with a Graph-Structured Representation. ACTA ACUST UNITED AC 2020. [DOI: 10.1007/978-3-030-61609-0_48] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
|
86
|
Zhang Z, Lai C, Liu H, Li YF. Infrared facial expression recognition via Gaussian-based label distribution learning in the dark illumination environment for human emotion detection. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.081] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
87
|
Jiang J, Li W, Dong A, Gou Q, Luo X. A Fast Deep AutoEncoder for high-dimensional and sparse matrices in recommender systems. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.109] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
88
|
Mahmoudi MA, Chetouani A, Boufera F, Tabia H. Learnable pooling weights for facial expression recognition. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2020.09.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
89
|
Tan C, Ceballos G, Kasabov N, Puthanmadam Subramaniyam N. FusionSense: Emotion Classification Using Feature Fusion of Multimodal Data and Deep Learning in a Brain-Inspired Spiking Neural Network. SENSORS (BASEL, SWITZERLAND) 2020; 20:E5328. [PMID: 32957655 PMCID: PMC7571195 DOI: 10.3390/s20185328] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Revised: 09/04/2020] [Accepted: 09/11/2020] [Indexed: 01/22/2023]
Abstract
Using multimodal signals to solve the problem of emotion recognition is one of the emerging trends in affective computing. Several studies have utilized state-of-the-art deep learning methods and combined physiological signals, such as the electrocardiogram (ECG), the electroencephalogram (EEG), and skin temperature, along with facial expressions, voice, and posture, to name a few, in order to classify emotions. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle spatio-temporal data, which is essentially the nature of the data encountered in the emotion recognition problem, in an efficient manner. In this work, for the first time, we propose the application of SNNs to solve the emotion recognition problem with a multimodal dataset. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture to classify emotional valence, and evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consist of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under the Leave-One-Subject-Out (LOSO) cross-validation mode. Our results show that the proposed approach achieves an accuracy of 73.15% for classifying binary valence when applying feature-level fusion, which is comparable to other deep learning methods. We achieve this accuracy even without using EEG, which other deep learning methods have relied on to achieve this level of accuracy. In conclusion, we have demonstrated that the SNN can be successfully used for solving the emotion recognition problem with multimodal data, and we also provide directions for future research utilizing SNNs for affective computing. In addition to the good accuracy, the SNN recognition system is incrementally trainable on new data in an adaptive way and requires only one-pass training, which makes it suitable for practical and on-line applications. These features are not manifested in the other methods considered for this problem.
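For readers unfamiliar with the evaluation protocol named above, the sketch below shows feature-level fusion (concatenating per-modality features) combined with Leave-One-Subject-Out cross-validation, using subject IDs as groups. The logistic-regression classifier and the random features are placeholders; they illustrate the protocol only, not the NeuCube SNN.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Dummy multimodal features for 10 subjects x 20 trials each.
rng = np.random.default_rng(0)
n_subjects, n_trials = 10, 20
facial = rng.normal(size=(n_subjects * n_trials, 12))     # e.g., facial-expression features
physio = rng.normal(size=(n_subjects * n_trials, 8))      # e.g., ECG / skin / respiration features
fused = np.hstack([facial, physio])                       # feature-level fusion
valence = rng.integers(0, 2, size=n_subjects * n_trials)  # binary valence labels
subjects = np.repeat(np.arange(n_subjects), n_trials)     # group id = subject id

# Leave-One-Subject-Out: each fold tests on one subject never seen in training.
scores = cross_val_score(LogisticRegression(max_iter=1000), fused, valence,
                         groups=subjects, cv=LeaveOneGroupOut())
print(scores.mean())
```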
Affiliation(s)
- Clarence Tan: Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1010, New Zealand
- Gerardo Ceballos: School of Electrical Engineering, University of Los Andes, Merida 5101, Venezuela
- Nikola Kasabov: Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1010, New Zealand
- Narayan Puthanmadam Subramaniyam: Faculty of Medicine and Health Technology and BioMediTech Institute, Tampere University, 33520 Tampere, Finland; Department of Neuroscience and Biomedical Engineering, School of Science, Aalto University, 02150 Espoo, Finland
|
90
|
Lei G, Xia Y, Zhai DH, Zhang W, Chen D, Wang D. StainCNNs: An efficient stain feature learning method. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
91
|
An analysis on the use of autoencoders for representation learning: Fundamentals, learning task case studies, explainability and challenges. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.057] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
92
|
Li H, Li Q. End-to-End Training for Compound Expression Recognition. SENSORS 2020; 20:s20174727. [PMID: 32825666 PMCID: PMC7506941 DOI: 10.3390/s20174727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Revised: 08/09/2020] [Accepted: 08/19/2020] [Indexed: 11/17/2022]
Abstract
For a long time, expressions have been something that human beings are proud of; they are an essential difference between us and machines. With the development of computers, we are increasingly eager to develop communication between humans and machines, especially communication with emotions. The emotional growth of computers is similar to the growth process of each of us, starting with a natural, intimate, and vivid interaction by observing and discerning emotions. Since the basic emotions angry, disgusted, fearful, happy, neutral, sad and surprised were put forward, much research has been based on basic emotions, but little on compound emotions. However, in real life, people’s emotions are complex; single expressions cannot fully and accurately show people’s inner emotional changes, and thus exploration of compound expression recognition is essential for daily life. In this paper, we recommend a scheme combining spatial and frequency domain transforms to implement end-to-end joint training based on model ensembling between models for appearance and geometric representation learning, for the recognition of compound expressions in the wild. We are mainly devoted to mining the appearance and geometric information based on deep learning models. For appearance feature acquisition, we adopt the idea of transfer learning, introducing the ResNet50 model pretrained on VGGFace2 for face recognition to implement the fine-tuning process. Here, we try and compare two ideas: one is to fine tune using two static expression databases, FER2013 and RAF Basic, for basic emotion recognition; the other is to fine tune the model on a three-channel input composed of images generated by DWT2 and WAVEDEC2 wavelet transforms based on the rbio3.1 and sym1 wavelet bases, respectively. For geometric feature acquisition, we first introduce a dense SIFT operator to extract facial key points and their histogram descriptions. After that, we introduce a deep SAE with a softmax function, a stacked LSTM, and a Sequence-to-Sequence model with stacked LSTM, and define their structures ourselves. Then, we feed the salient key points and their descriptions into the three models to train them respectively and compare their performances. When the model training for appearance and geometric feature learning is completed, we combine the two models with category labels to achieve further end-to-end joint training, considering that ensembling models which describe different information can further improve recognition results. Finally, we validate the performance of our proposed framework on the RAF Compound database and achieve a recognition rate of 66.97%. Experiments show that integrating different models which express different information and achieving end-to-end training can quickly and effectively improve recognition performance.
Affiliation(s)
- Hongfei Li: Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Qing Li: Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China; University of Chinese Academy of Sciences, Beijing 100049, China
|
93
|
|
94
|
Wan X, Fang Z, Wu M, Du Y. Automatic detection of HFOs based on singular value decomposition and improved fuzzy c-means clustering for localization of seizure onset zones. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
95
|
|
96
|
Ren W, Zhai L, Jia J, Wang L, Zhang L. Learning selection channels for image steganalysis in spatial domain. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.105] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
97
|
Chen L, Chen J. Deep Neural Network for Automatic Classification of Pathological Voice Signals. J Voice 2020; 36:288.e15-288.e24. [PMID: 32660846 DOI: 10.1016/j.jvoice.2020.05.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2020] [Revised: 05/17/2020] [Accepted: 05/26/2020] [Indexed: 10/23/2022]
Abstract
OBJECTIVES Computer-aided pathological voice detection is efficient for initial screening of pathological voice and has received high academic and clinical attention. This paper proposes an automatic diagnosis method for pathological voice based on a deep neural network (DNN). Two other classification models (support vector machines and random forests) were used to verify the effectiveness of the DNN. METHODS In this paper, we extracted 12 Mel frequency cepstral coefficients of each voice sample as raw features. The constructed DNN consists of a two-layer stacked sparse autoencoder network and a softmax layer. The stacked sparse autoencoder layers learn high-level features from the raw Mel frequency cepstral coefficient features. Then, the softmax layer diagnoses pathological voice according to the high-level features. The DNN and the other two comparison models used the same train and test sets for the experiment. RESULTS Experimental results reveal that the sensitivity, specificity, precision, accuracy, and F1 score of the DNN can reach 97.8%, 99.4%, 99.4%, 98.6%, and 98.4%, respectively. The five indexes of the DNN classification results are at least 6.2%, 5%, 5.6%, 5.7%, and 6.2% higher than those of the comparison models (support vector machine and random forest). CONCLUSIONS The proposed DNN can learn advanced features from raw acoustic features and distinguish pathological voice from healthy voice. To the extent of this preliminary study, future studies can further explore the application of DNNs in other experiments and in clinical practice.
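The acoustic front end described above can be sketched as follows: 12 MFCCs are extracted per recording and summarized by their mean over time, then fed to a small neural classifier. The time-averaging, the synthetic audio, and the use of scikit-learn's MLP instead of the paper's stacked sparse autoencoder plus softmax are all assumptions made so the example stays short and self-contained.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def mfcc_features(y, sr, n_mfcc=12):
    """12 MFCCs per frame, averaged over time into one vector per recording."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (12, n_frames)
    return mfcc.mean(axis=1)

# Dummy corpus: 1-second synthetic signals, two classes (clean vs. noisy phonation).
sr = 16000
rng = np.random.default_rng(0)
t = np.linspace(0, 1, sr, endpoint=False)
healthy = [np.sin(2 * np.pi * 150 * t) + 0.01 * rng.normal(size=sr) for _ in range(10)]
patho = [np.sin(2 * np.pi * 150 * t) + 0.3 * rng.normal(size=sr) for _ in range(10)]

x = np.stack([mfcc_features(s.astype(np.float32), sr) for s in healthy + patho])
y = np.array([0] * 10 + [1] * 10)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0).fit(x, y)
print("train accuracy:", clf.score(x, y))
```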
Affiliation(s)
- Lili Chen: School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing, China; Chongqing Survey Institute, Chongqing, China
- Junjiang Chen: School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing, China
|
98
|
Zhang Y, Wang Y, Dong J, Qi L, Fan H, Dong X, Jian M, Yu H. A joint guidance-enhanced perceptual encoder and atrous separable pyramid-convolutions for image inpainting. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
99
|
Zhou F, Kong S, Fowlkes CC, Chen T, Lei B. Fine-grained facial expression analysis using dimensional emotion model. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.067] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
100
|
Qiu X, Zhou S. Generating adversarial examples with input significance indicator. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|