1. Miao J, Huang Y, Wang Z, Wu Z, Lv J. Image recognition of traditional Chinese medicine based on deep learning. Front Bioeng Biotechnol 2023;11:1199803. PMID: 37545883; PMCID: PMC10402920; DOI: 10.3389/fbioe.2023.1199803.
Abstract
Chinese herbal medicine is an essential part of traditional Chinese medicine and herbalism, and it plays an important role in treatment when combined with modern medicine. The correct use of Chinese herbal medicine, including its identification and classification, is crucial to patient safety. Recently, deep learning has achieved advanced performance in image classification, and researchers have applied this technology to classify traditional Chinese medicine and its products. This paper therefore uses an improved ConvNeXt network to extract features from and classify traditional Chinese medicine images. The model fuses ConvNeXt with the ACMix network to improve ConvNeXt's feature extraction. Data processing and data augmentation techniques indirectly expand the sample size, enhance generalization ability, and improve feature extraction. A traditional Chinese medicine classification model is established and achieves good recognition results. Finally, the effectiveness of traditional Chinese medicine identification is verified with the established classification model, and networks of different depths are compared to improve the model's efficiency and accuracy.
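As a rough illustration of the kind of augmentation this abstract mentions (the paper's exact transforms are not specified here), simple flips and rotations can turn one labelled image, represented below as a nested list of pixel values, into several training samples:

```python
def hflip(img):
    """Mirror an image (list of pixel rows) left-to-right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]

# One labelled sample becomes several without collecting new data,
# which indirectly expands the sample size as described above.
augmented = [img, hflip(img), rot90(img)]
print(augmented[1])  # [[2, 1], [4, 3]]
print(augmented[2])  # [[3, 1], [4, 2]]
```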
Affiliation(s)
- Junfeng Miao: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Yanan Huang: Business School, Ezhou Vocational University, Ezhou, Hubei, China
- Zhaoshun Wang: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Zeqing Wu: School of Pharmacy, Xinxiang Medical University, Xinxiang, China
2. Park HJ, Kang JW, Kim BG. ssFPN: Scale Sequence (S2) Feature-Based Feature Pyramid Network for Object Detection. Sensors (Basel) 2023;23(9):4432. PMID: 37177636; PMCID: PMC10181723; DOI: 10.3390/s23094432.
Abstract
Object detection is a fundamental task in computer vision. Over the past several years, convolutional neural network (CNN)-based object detection models have significantly improved detection accuracy in terms of average precision (AP). Feature pyramid networks (FPNs) are essential modules that let object detection models handle objects of various scales. However, AP for small objects remains lower than AP for medium and large objects: small objects are hard to recognize because they carry little information, and that information is lost in deeper CNN layers. This paper proposes a new FPN model named ssFPN (scale sequence (S2) feature-based feature pyramid network) to detect multi-scale objects, especially small ones. Motivated by scale-space theory, the FPN is regarded as a scale space, and a scale sequence (S2) feature is extracted by three-dimensional convolution along the level axis of the FPN. The feature is essentially scale-invariant and is built on the high-resolution pyramid feature map to strengthen information on small objects. The designed S2 feature can also be extended to most FPN-based object detection models. In addition, a feature-level super-resolution approach was designed to show the efficiency of the S2 feature: by training a feature-level super-resolution model, we verified that the S2 feature could improve classification accuracy for low-resolution images. To demonstrate the effect of the S2 feature, it was built into both one-stage and two-stage object detection models and evaluated on the MS COCO dataset.
For the two-stage models Faster R-CNN and Mask R-CNN with the S2 feature, AP improved by up to 1.6% and 1.4%, respectively, and the APS of each model improved by 1.2% and 1.1%. The one-stage YOLO-series models also improved: with the S2 feature, YOLOv4-P5, YOLOv4-P6, YOLOR-P6, YOLOR-W6, and YOLOR-D6 gained 0.9%, 0.5%, 0.5%, 0.1%, and 0.1% AP, and for small object detection, APS increased by 1.1%, 1.1%, 0.9%, 0.4%, and 0.1%, respectively. Experiments with the feature-level super-resolution approach and the proposed S2 feature were conducted on the CIFAR-100 dataset: ResNet-101 with the S2 feature trained on low-resolution images achieved 55.2% classification accuracy, 1.6% higher than ResNet-101 trained on high-resolution images.
Affiliation(s)
- Hye-Jin Park: Department of Artificial Intelligence Engineering, Sookmyung Women's University, 100 Chungpa-ro 47-gil, Yongsan-gu, Seoul 04310, Republic of Korea
- Ji-Woo Kang: Department of Artificial Intelligence Engineering, Sookmyung Women's University, 100 Chungpa-ro 47-gil, Yongsan-gu, Seoul 04310, Republic of Korea
- Byung-Gyu Kim: Department of Artificial Intelligence Engineering, Sookmyung Women's University, 100 Chungpa-ro 47-gil, Yongsan-gu, Seoul 04310, Republic of Korea
3. Nasir M, Dutta P, Nandi A. Recognition of human emotion transition from video sequence using triangulation induced various centre pairs distance signatures. Appl Soft Comput 2022. DOI: 10.1016/j.asoc.2022.109971.
4. Akiyama T, Matsumoto K, Osaka K, Tanioka R, Betriana F, Zhao Y, Kai Y, Miyagawa M, Yasuhara Y, Ito H, Soriano G, Tanioka T. Comparison of Subjective Facial Emotion Recognition and "Facial Emotion Recognition Based on Multi-Task Cascaded Convolutional Network Face Detection" between Patients with Schizophrenia and Healthy Participants. Healthcare (Basel) 2022;10(12):2363. PMID: 36553887; PMCID: PMC9777528; DOI: 10.3390/healthcare10122363.
Abstract
Patients with schizophrenia may exhibit a flat affect and poor facial expressions. This study compared subjective facial emotion recognition (FER) and FER based on multi-task cascaded convolutional network (MTCNN) face detection in 31 patients with schizophrenia (patient group) and 40 healthy participants (healthy participant group). A Pepper robot conversed with the 71 participants, and the conversations were recorded on video. Subjective FER (assigned by medical experts based on the video recordings) and FER based on MTCNN face detection were used to characterize facial expressions during the conversations. The study confirmed the discriminant accuracy of FER based on MTCNN face detection. Analysis of the smiles of healthy participants showed that subjective FER (by six examiners) and FER based on MTCNN face detection concurred (κ = 0.63). The perfect agreement rate between subjective FER (by three medical experts) and FER based on MTCNN face detection in the patient and healthy participant groups was analyzed with Fisher's exact probability test; no significant difference was observed (p = 0.72). Validity and reliability were assessed by comparing subjective FER with FER based on MTCNN face detection. The reliability coefficient of FER based on MTCNN face detection was low for both the patient and healthy participant groups.
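For readers unfamiliar with the agreement statistic reported above, the pairwise form of Cohen's kappa corrects raw rater agreement for the agreement expected by chance. A minimal illustrative sketch (not the study's own computation, which involves multiple raters) could look like:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Pairwise Cohen's kappa: chance-corrected agreement of two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in set(freq_a) | set(freq_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Perfect agreement on two distinct labels gives kappa = 1.0.
print(cohens_kappa(["smile", "neutral"], ["smile", "neutral"]))  # 1.0
```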
Affiliation(s)
- Toshiya Akiyama: Graduate School of Health Sciences, Tokushima University, Tokushima 770-8509, Japan
- Kazuyuki Matsumoto: Graduate School of Engineering, Tokushima University, Tokushima 770-8506, Japan
- Kyoko Osaka: Department of Psychiatric Nursing, Nursing Course of Kochi Medical School, Kochi University, Kochi 783-8505, Japan
- Ryuichi Tanioka: Department of Physical Therapy, Hiroshima Cosmopolitan University, Hiroshima 734-0014, Japan
- Yueren Zhao: Department of Psychiatry, Fujita Health University, Nagoya 470-1192, Japan
- Yoshihiro Kai: Department of Mechanical Engineering, Tokai University, Tokyo 151-8677, Japan
- Misao Miyagawa: Department of Nursing, Faculty of Health and Welfare, Tokushima Bunri University, Tokushima 770-8514, Japan
- Yuko Yasuhara: Institute of Biomedical Sciences, Tokushima University, Tokushima 770-8509, Japan
- Hirokazu Ito: Institute of Biomedical Sciences, Tokushima University, Tokushima 770-8509, Japan
- Gil Soriano: Department of Nursing, College of Allied Health, National University Philippines, Manila 1008, Philippines
- Tetsuya Tanioka: Institute of Biomedical Sciences, Tokushima University, Tokushima 770-8509, Japan (corresponding author)
5. Gong W, Qian Y, Fan Y. MPCSAN: multi-head parallel channel-spatial attention network for facial expression recognition in the wild. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-08040-4.
6. DeepFake detection algorithm based on improved vision transformer. Appl Intell 2022. DOI: 10.1007/s10489-022-03867-9.
7. Artificial Intelligence for Multimedia Signal Processing. Appl Sci (Basel) 2022;12(15):7358. DOI: 10.3390/app12157358.
Abstract
At the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a 2012 global image recognition contest, the University of Toronto SuperVision team led by Prof [...]
8. Driver Emotions Recognition Based on Improved Faster R-CNN and Neural Architectural Search Network. Symmetry (Basel) 2022;14(4):687. DOI: 10.3390/sym14040687.
Abstract
It is critical for intelligent vehicles, and especially autonomous vehicles, to continuously monitor the health and well-being of the drivers they transport. To address this, an automatic system for driver real-emotion recognition (DRER) is developed using deep learning. The emotional values of drivers inside vehicles are symmetrically mapped to image design in order to investigate the characteristics of abstract expressions and expression design principles, and an experimental evaluation is conducted based on existing research on the design of driver facial expressions for intelligent products. An improved Faster R-CNN face detector, built by substituting a custom CNN feature learning block for the base 11-layer CNN model, detects the driver's face at a high frame rate (FPS). Transfer learning is performed on the NasNet-Large CNN model to recognize the driver's various emotions, and a custom driver emotion recognition image dataset is developed as part of this research. The proposed model, which combines the improved Faster R-CNN with transfer learning on the NasNet-Large CNN architecture for DER based on facial images, enables greater accuracy than previously possible and outperforms several recently published state-of-the-art techniques, achieving 98.48% on JAFFE, 99.73% on CK+, 99.95% on FER-2013, 95.28% on AffectNet, and 99.15% on the custom-developed dataset.
9. Zhu Q, Mu Z, Yuan L. Corresponding keypoint constrained sparse representation three-dimensional ear recognition via one sample per person. IET Biometrics 2022. DOI: 10.1049/bme2.12067.
Affiliation(s)
- Qinping Zhu: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
- Zhichun Mu: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
- Li Yuan: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
10. Singkul S, Woraratpanya K. Vector learning representation for generalized speech emotion recognition. Heliyon 2022;8:e09196. PMID: 35846479; PMCID: PMC9280549; DOI: 10.1016/j.heliyon.2022.e09196.
Abstract
A verify-to-classify framework was designed to achieve generalization and strong overall performance; it works well in both verification (in-domain) and recognition (out-of-domain), and the proposed softmax with angular prototypical loss works well with emotion vectors and helps improve classification performance.
Speech emotion recognition (SER) plays an important role in global business today by improving service efficiency. In the SER literature, many techniques use deep learning to extract and learn features. We previously proposed end-to-end learning for a deep residual local feature learning block (DeepResLFLB). End-to-end learning has the advantages of low engineering effort and less hyperparameter tuning, but it easily falls into overfitting. This paper therefore describes a "verify-to-classify" framework for learning vectors extracted from the feature spaces of emotional information. The framework consists of two parts: speech emotion learning and recognition. Speech emotion learning comprises two steps, speech emotion verification enrolled training and prediction; a residual network (ResNet) with a squeeze-excitation (SE) block is the core component of both steps, extracting emotional state vectors and building an emotion model during enrolled training. The in-domain pre-trained weights of the trained emotion model are then transferred to the prediction step. As a result of speech emotion learning, the accepted model, validated by EER, is transferred to speech emotion recognition as out-of-domain pre-trained weights, ready for classification with a classical machine learning method. A suitable loss function is important for working with emotional vectors, so two loss functions were proposed: angular prototypical loss and softmax with angular prototypical loss. Experiments on two publicly available datasets, Emo-DB and RAVDESS, covering both high- and low-quality environments, show that the proposed method can significantly improve generalized performance and produce explainable emotion results when evaluated by standard metrics: EER, accuracy, precision, recall, and F1-score.
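The EER metric used above to validate the verification model is the operating point where the false-acceptance and false-rejection rates meet. A toy threshold sweep over similarity scores can sketch this; it is an illustrative approximation under assumed score lists, not the paper's implementation:

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep thresholds over all observed scores and
    return the point where false-acceptance and false-rejection rates
    are closest (higher score = more likely a genuine match)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)    # genuine rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Perfectly separable scores give an EER of 0.
print(equal_error_rate([0.9, 0.8, 0.7], [0.3, 0.2, 0.1]))  # 0.0
```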
11. Desai S, Sabar NR, Alhadad R, Mahmood A, Chilamkurti N. Mitigating consumer privacy breach in smart grid using obfuscation-based generative adversarial network. Math Biosci Eng 2022;19:3350-3368. PMID: 35341255; DOI: 10.3934/mbe.2022155.
Abstract
Smart meters allow real-time monitoring and collection of power consumption data from a consumer's premises. With the worldwide deployment of smart meters, concerns about threats to consumer privacy have risen substantially: exposed fine-grained power consumption data leaks behaviour by revealing the end-user's home appliance usage. Researchers have previously proposed altering the data through perturbation or aggregation, or hiding identifiers through anonymization, but these techniques suffer from various limitations. In this paper, we propose a privacy-preserving architecture for fine-grained power data in a smart grid. The architecture uses a generative adversarial network (GAN) and an obfuscator to generate a synthetic time series, replacing the existing appliance signature with appliances that are not active during that period while ensuring a minimal energy difference between the ground truth and the synthetic time series. We use a real-world dataset of power consumption readings for our experiments and use non-intrusive load monitoring (NILM) algorithms to show that our approach is more effective at preserving the privacy of a consumer's power consumption data.
Affiliation(s)
- Sanket Desai: Department of Computer Science & I.T., La Trobe University, Melbourne, VIC 3083, Australia
- Nasser R Sabar: Department of Computer Science & I.T., La Trobe University, Melbourne, VIC 3083, Australia
- Rabei Alhadad: Department of Computer Science & I.T., La Trobe University, Melbourne, VIC 3083, Australia
- Abdun Mahmood: Department of Computer Science & I.T., La Trobe University, Melbourne, VIC 3083, Australia
- Naveen Chilamkurti: Department of Computer Science & I.T., La Trobe University, Melbourne, VIC 3083, Australia
12. Liu M, Jiao R, Nian Q. Training method and system for stress management and mental health care of managers based on deep learning. Math Biosci Eng 2022;19:371-393. PMID: 34902996; DOI: 10.3934/mbe.2022019.
Abstract
In recent years, with the rapid development of the economy, companies seeking to hold their place in the market and expand their business have used various tangible and intangible indicators to intensify the work of their staff, speeding up the pace of work and occupying more working time. This article studies work stress management from the perspective of psychological capital, aiming to achieve proactive control of work stress through individuals' positive psychological capital; it offers a new perspective on work stress management in human resource management and a new management model through which enterprises and universities can build employees' psychological capital. Unreasonable workload distribution affects even the daily life of managers and aggravates their work pressure, and prolonged mental tension leads to a variety of physical and psychological discomforts. The training process of deep learning is the process of repeated forward and backward computations of a deep neural network on the provided data; deep learning frameworks abstract this process so that users can effectively train the models they want without fully understanding the principles and training procedure of deep neural networks. To help relieve and release this pressure, this article extends deep-learning-based health collection equipment into households, continuously records residents' health status through the mobile Internet, and uses the information resources of a regional residents' health file platform to provide residents with a series of personal health management services, such as health status evaluation, management and guidance, health care consultation, health education, and health risk factor assessment. After the intervention, the managers' positive emotion index increased from 18 to 27 and the negative emotion index decreased from 29 to 13; positive emotions significantly outnumbered negative emotions, and the emotional situation improved.
Affiliation(s)
- Mengfan Liu: School of Psychology, Northeast Normal University, Changchun, Jilin 130024, China
- Runkai Jiao: School of Psychology, Northeast Normal University, Changchun, Jilin 130024, China; National Training Center for Kindergarten Principals, Ministry of Education, Northeast Normal University, Changchun, Jilin 130024, China
- Qing Nian: School of Physical Education, Northeast Normal University, Changchun, Jilin 130024, China
13. A Robust Facial Expression Recognition Algorithm Based on Multi-Rate Feature Fusion Scheme. Sensors 2021;21(21):6954. PMID: 34770262; PMCID: PMC8587878; DOI: 10.3390/s21216954.
Abstract
In recent years, as the artificial intelligence (AI) field has developed, capturing human emotions has grown in importance. Facial expression recognition (FER) is one way to understand human emotion through facial expressions. We propose a robust multi-depth network that efficiently classifies facial expressions by feeding it varied and reinforced features. The inputs to the multi-depth network are designed as minimally overlapped frames so as to provide more spatio-temporal information. To exploit the multi-depth structure, a multirate-based 3D convolutional neural network (CNN) built on a multirate signal processing scheme is suggested. In addition, input images are normalized adaptively based on their intensity, and the output features from all depth networks are reinforced by a self-attention module; the reinforced features are then concatenated and the expression is classified by a joint fusion classifier. On the CK+ database, the proposed scheme achieved a comparable accuracy of 96.23%. On the MMI and GEMEP-FERA databases, it outperformed other state-of-the-art models with accuracies of 96.69% and 99.79%. On the AFEW database, known for its very wild environment, the proposed algorithm achieved an accuracy of 31.02%.
14. Subject-Specific Cognitive Workload Classification Using EEG-Based Functional Connectivity and Deep Learning. Sensors 2021;21(20):6710. PMID: 34695921; PMCID: PMC8541420; DOI: 10.3390/s21206710.
Abstract
Cognitive workload is a crucial factor in tasks involving dynamic decision-making and other real-time, high-risk situations. Neuroimaging techniques have long been used to estimate cognitive workload. Given the portability, cost-effectiveness, and high time-resolution of EEG compared to fMRI and other neuroimaging modalities, an efficient method of estimating an individual's workload from EEG is of paramount importance. Multiple cognitive, psychiatric, and behavioral phenotypes are already known to be linked with "functional connectivity", i.e., correlations between different brain regions. In this work, we explored using different model-free functional connectivity metrics together with deep learning to efficiently classify participants' cognitive workload. To this end, 64-channel EEG data from 19 participants were collected while they performed the traditional n-back task. After pre-processing, these data were used to extract the functional connectivity features Phase Transfer Entropy (PTE), Mutual Information (MI), and Phase Locking Value (PLV), chosen for a comprehensive comparison of directed and non-directed model-free functional connectivity metrics, which allow faster computation. Using these features, three deep learning classifiers (CNN, LSTM, and Conv-LSTM) were used to classify cognitive workload as low (1-back), medium (2-back), or high (3-back). Given the high inter-subject variability in EEG and cognitive workload, and recent research showing that EEG-based functional connectivity metrics are subject-specific, subject-specific classifiers were used. Results show state-of-the-art multi-class classification accuracy with the combination of MI and CNN at 80.87%, followed by PLV with CNN (75.88%) and MI with LSTM (71.87%).
The highest subject-specific performance was achieved by the combinations of PLV with Conv-LSTM and PLV with CNN, at 97.92% accuracy, followed by MI with CNN (95.83%) and MI with Conv-LSTM (93.75%). The results highlight the efficacy of combining EEG-based model-free functional connectivity metrics with deep learning to classify cognitive workload. The work can be further extended to classifying cognitive workload in real-time, dynamic, and complex real-world scenarios.
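Of the connectivity metrics compared above, the phase locking value (PLV) has a particularly compact definition: the magnitude of the time-averaged phase-difference vector between two channels, ranging from 0 (no locking) to 1 (constant lag). A minimal sketch, assuming the instantaneous phases have already been extracted (e.g. via a Hilbert transform, which is not shown here):

```python
import cmath
import math

def phase_locking_value(phases_a, phases_b):
    """PLV between two channels given instantaneous phases in radians:
    |mean of exp(i * (phi_a - phi_b))| over time, a value in [0, 1]."""
    assert len(phases_a) == len(phases_b) and phases_a
    mean_vec = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(mean_vec) / len(phases_a)

# A constant phase lag between the channels yields perfect locking.
a = [0.1 * k for k in range(100)]
b = [p + math.pi / 4 for p in a]
print(round(phase_locking_value(a, b), 6))  # 1.0
```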
15. Pandeya YR, Bhattarai B, Lee J. Music video emotion classification using slow-fast audio-video network and unsupervised feature representation. Sci Rep 2021;11:19834. PMID: 34615904; PMCID: PMC8494760; DOI: 10.1038/s41598-021-98856-2.
Abstract
Affective computing has suffered from imprecise annotation because emotions are highly subjective and vague. Music video emotion is complex owing to the diverse textual, acoustic, and visual information carried by lyrics, the singer's voice, the sounds of different instruments, and the visual presentation. This may be one reason study in this domain has been limited and no standard dataset had been produced before now. In this study, we propose an unsupervised method for music video emotion analysis using music video content from the Internet. We also produced a labelled dataset and compared supervised and unsupervised methods for emotion classification. The music and video information is processed through a multimodal architecture with audio-video information exchange and a boosting method. General 2D and 3D convolution networks are compared with a slow-fast network using filter- and channel-separable convolution within the multimodal architecture. Several supervised and unsupervised networks were trained end-to-end, and the results were evaluated using various metrics. The proposed method uses a large dataset for unsupervised emotion classification and interprets the results quantitatively and qualitatively on music videos, which had not been done before. The results show a large increase in classification score from unsupervised features and information-sharing techniques across the audio and video networks. Our best classifier attained 77% accuracy, an F1-score of 0.77, and an area-under-the-curve score of 0.94 with minimal computational cost.
Affiliation(s)
- Yagya Raj Pandeya: Department of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea
- Bhuwan Bhattarai: Department of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea
- Joonwhoan Lee: Department of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea
16. Behzad M, Vo N, Li X, Zhao G. Towards Reading Beyond Faces for Sparsity-aware 3D/4D Affect Recognition. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.06.023.
17. Filali H, Riffi J, Aboussaleh I, Mahraz AM, Tairi H. Meaningful Learning for Deep Facial Emotional Features. Neural Process Lett 2021. DOI: 10.1007/s11063-021-10636-1.
18. Tungjitnob S, Pasupa K, Suntisrivaraporn B. Identifying SME customers from click feedback on mobile banking apps: Supervised and semi-supervised approaches. Heliyon 2021;7:e07761. PMID: 34458608; PMCID: PMC8379470; DOI: 10.1016/j.heliyon.2021.e07761.
Abstract
Nowadays, the banking industry has moved from traditional branch services to mobile banking applications. Using customer segmentation, banks can gain insight and better understand their customers' lifestyles and behaviour. In this work, we describe a method to classify mobile app users' click behaviour into two groups, SME and non-SME users. This task enables the bank to identify anonymous users and offer them the right services and products. We extracted hand-crafted features from click log data and evaluated them with the Extreme Gradient Boosting algorithm (XGBoost). We also converted these logs into images that capture temporal information; these image representations reduce the need for feature engineering, are easier to visualize, and can be trained with a convolutional neural network (CNN). ResNet-18 on the image dataset achieved 71.69% accuracy on average, outperforming XGBoost at 61.70%. We also evaluated a semi-supervised learning model with the converted image data: it achieved 73.12% accuracy using just half of the labelled images combined with unlabelled images, showing that the converted images can train a semi-supervised algorithm that performs better than a CNN with fewer labelled images. Our work also leads to a better understanding of mobile banking user behaviour and a novel way of developing a customer segmentation classifier.
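The log-to-image conversion described above can be illustrated with a toy encoding in which rows are menu items and columns are time bins; the menu names, bin counts, and layout here are hypothetical, since the paper's exact scheme is not given in this abstract:

```python
def clicks_to_grid(events, menu_ids, n_time_bins, t_max):
    """Encode a click log as a 2D 'image': rows = menu items,
    columns = time bins, cell value = click count in that bin."""
    grid = [[0] * n_time_bins for _ in menu_ids]
    row_of = {menu: i for i, menu in enumerate(menu_ids)}
    for t, menu in events:
        col = min(int(t / t_max * n_time_bins), n_time_bins - 1)
        grid[row_of[menu]][col] += 1
    return grid

# A short session: one 'transfer' click early, two 'payroll' clicks late.
log = [(0.0, "transfer"), (5.0, "payroll"), (9.9, "payroll")]
print(clicks_to_grid(log, ["transfer", "payroll"], 2, 10.0))  # [[1, 0], [0, 2]]
```

A grid like this preserves the temporal ordering of clicks that a flat bag-of-features would discard, which is what lets a CNN pick up on behavioural patterns.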
Collapse
Affiliation(s)
- Suchat Tungjitnob
- Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
| | - Kitsuchart Pasupa
- Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
| | | |
Collapse
|
19
|
Gao B, Huang L. Toward a theory of smart media usage: The moderating role of smart media market development. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:7218-7238. [PMID: 34814246 DOI: 10.3934/mbe.2021357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Smart media usage is influenced by certain critical factors and can be further affected by the degree of diffusion in the market. However, existing research lacks sufficient understanding of the factors affecting smart media usage and their influential mechanisms. Taking AI-enabled smart TV in China as the research object, this study (1) develops a base model that includes users' three key gratifications (bi-directional communication, personalization, and co-creation); and (2) takes two sub-dimensions of market development (geographic segment and income segment) as moderators. Using data from 407 valid samples of current users, the partial least squares structural equation modeling analysis suggests that these three key smart gratifications can impact continuance intention with the moderating effect of market development. This study thus contributes to the literature by (1) clarifying the smart media gratification opportunities (smart media users' motivations or needs) for using smart media itself; (2) exploring the impact of the degree of market development on the uses and gratifications of the smart media itself; and (3) combining the uses and gratifications theory, and the diffusion of innovations theory, to complement each other in a model that provides a more complete picture of smart media usage.
Collapse
Affiliation(s)
- Biao Gao
- Jiangxi University of Finance and Economics, Nanchang 330013, China
| | - Lin Huang
- Graduate School of Business Administration, Kobe University, Kobe 6578501, Japan
| |
Collapse
|
20
|
Paxinou E, Kalles D, Panagiotakopoulos CT, Verykios VS. Analyzing Sequence Data with Markov Chain Models in Scientific Experiments. SN Comput Sci 2021; 2:385. [PMID: 34308368 PMCID: PMC8294291 DOI: 10.1007/s42979-021-00768-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Accepted: 07/04/2021] [Indexed: 11/05/2022]
Abstract
Virtual reality-based instruction is becoming an important resource for improving learning outcomes and communicating hands-on skills in science laboratory courses. Our study first investigates whether a Markov chain model can predict students' performance in conducting an experiment, and whether simulations improve learner achievement in handling lab equipment and conducting science experiments in physical labs. In the present study, three cohorts of graduate students are trained on a microscopy experiment using different teaching methodologies. The effectiveness of the teaching strategies is evaluated by observing the sequences of students' actions while they engage in the microscopy experiment in real-lab situations. The students' ability to perform the science experiment is estimated by sequential analysis using a Markov chain model. According to the Markov chain analysis, the students trained via virtual reality software exhibit a higher probability of performing the steps of the experiment without difficulty and without assistance than their fellow students who attend more traditional training scenarios. Our study indicates that a Markov chain model is a powerful tool that can support a dynamic evaluation of students' performance in science experiments by tracing their knowledge states and predicting their innate abilities.
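The Markov chain analysis described above rests on estimating a transition matrix from observed action sequences. A minimal sketch, with hypothetical action states standing in for the experiment's steps:

```python
import numpy as np

# Hypothetical action codes for steps of a microscopy experiment.
STATES = ["setup", "focus", "observe", "record"]

def estimate_transition_matrix(sequences, n_states):
    """Maximum-likelihood transition probabilities from action sequences."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1
    # Normalize each row by its total; rows never observed stay all-zero.
    row_sums = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        P = np.where(row_sums > 0, counts / row_sums, 0.0)
    return P

# Two students' action sequences (indices into STATES).
seqs = [[0, 1, 2, 3], [0, 1, 1, 2]]
P = estimate_transition_matrix(seqs, len(STATES))
```

Comparing the matrices estimated for each cohort then shows, for example, which group moves from "focus" to "observe" with higher probability.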
Collapse
Affiliation(s)
- Evgenia Paxinou
- School of Science and Technology, Hellenic Open University, Patras, Greece
| | - Dimitrios Kalles
- School of Science and Technology, Hellenic Open University, Patras, Greece
| | | | | |
Collapse
|
21
|
Guerrero MC, Parada JS, Espitia HE. EEG signal analysis using classification techniques: Logistic regression, artificial neural networks, support vector machines, and convolutional neural networks. Heliyon 2021; 7:e07258. [PMID: 34159278 PMCID: PMC8203713 DOI: 10.1016/j.heliyon.2021.e07258] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Revised: 02/21/2021] [Accepted: 06/03/2021] [Indexed: 12/18/2022] Open
Abstract
Epilepsy is a brain abnormality that causes its patients to suffer from seizures, which condition their behavior and lifestyle. Neurologists use an electroencephalogram (EEG) to diagnose this disease. This test illustrates the signaling behavior of a person's brain, allowing, among other things, the diagnosis of epilepsy. From a visual analysis of these signals, neurologists identify patterns such as peaks or valleys, looking for any indication of a brain disorder that leads to the diagnosis of epilepsy in a purely qualitative way. However, by applying Fourier signal analysis via the fast Fourier transform in the frequency domain, patterns can be quantitatively identified to differentiate patients diagnosed with the disease from those who are not. In this article, an analysis of the EEG signal is performed to extract characteristics of patients already classified as epileptic and non-epileptic, which are used to train models based on classification techniques such as logistic regression, artificial neural networks, support vector machines, and convolutional neural networks. Based on the results obtained with each technique, an analysis is performed to decide which of them behaves best. The traditional classification techniques were trained on frequency-domain data from the EEG channels carrying distinctive information, obtained through feature extraction based on Fourier analysis over frequency bands. The techniques were implemented in Python, and a comparison of metrics and performance led to the conclusion that the best classification technique for characterizing epileptic patients is the artificial neural network, with an accuracy of 86%.
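The band-based Fourier feature extraction the article describes can be sketched as follows; the band limits and the synthetic one-channel signal are illustrative, not the study's data:

```python
import numpy as np

def band_power(signal, fs, band):
    """Mean power of `signal` within a frequency band, via the FFT."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= band[0]) & (freqs < band[1])
    return spectrum[mask].mean()

# Synthetic 1-second "EEG" trace: a pure 10 Hz (alpha-band) oscillation.
fs = 256
t = np.arange(fs) / fs
eeg = np.sin(2 * np.pi * 10 * t)

alpha = band_power(eeg, fs, (8, 13))   # alpha band dominates here
delta = band_power(eeg, fs, (0.5, 4))  # delta band is nearly empty
```

A feature vector of such band powers per channel is what a logistic regression or neural network classifier would then consume.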
Collapse
|
22
|
WS-RCNN: Learning to Score Proposals for Weakly Supervised Instance Segmentation. SENSORS 2021; 21:s21103475. [PMID: 34067559 PMCID: PMC8156195 DOI: 10.3390/s21103475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/26/2021] [Revised: 04/29/2021] [Accepted: 05/07/2021] [Indexed: 11/18/2022]
Abstract
Weakly supervised instance segmentation (WSIS) provides a promising way to address instance segmentation in the absence of sufficient labeled data for training. Previous attempts at WSIS usually follow a proposal-based paradigm, in which the proposal scoring strategy is critical. These works mostly rely on heuristic strategies for proposal scoring, which largely hampers sustainable advances in WSIS. To address this, this paper introduces a novel framework for weakly supervised instance segmentation, called Weakly Supervised R-CNN (WS-RCNN). The basic idea is to deploy a deep network to learn to score proposals under the special setting of weak supervision. To tackle the key issue of acquiring proposal-level pseudo labels for model training, we propose an Attention-Guided Pseudo Labeling (AGPL) strategy, which leverages the local maxima (peaks) in image-level attention maps and the spatial relationship between peaks and proposals to infer pseudo labels. We also suggest a novel training loss, called the Entropic Open-Set Loss, to handle background proposals more effectively and further improve robustness. Comprehensive experiments on two standard benchmark datasets demonstrate that the proposed WS-RCNN outperforms the state-of-the-art by a large margin, with an improvement of 11.6% on PASCAL VOC 2012 and 10.7% on MS COCO 2014 in terms of mAP50, which indicates that learning-based proposal scoring and the proposed WS-RCNN framework are a promising way towards WSIS.
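The AGPL idea of inferring proposal labels from attention peaks can be sketched with a simple peak-in-box rule; the class names, box format, and single-peak-wins logic are hypothetical simplifications of the paper's strategy:

```python
def contains(box, point):
    """box = (x1, y1, x2, y2); point = (x, y)."""
    x1, y1, x2, y2 = box
    x, y = point
    return x1 <= x <= x2 and y1 <= y <= y2

def pseudo_label_proposals(proposals, peaks):
    """Assign each proposal the class of an attention peak it contains,
    or 'background' if it contains none.

    peaks: list of (class_name, (x, y)) -- a simplified stand-in for
    local maxima of image-level class attention maps.
    """
    labels = []
    for box in proposals:
        label = "background"
        for cls, pt in peaks:
            if contains(box, pt):
                label = cls
                break
        labels.append(label)
    return labels

proposals = [(0, 0, 10, 10), (20, 20, 30, 30)]
peaks = [("person", (5, 5))]
labels = pseudo_label_proposals(proposals, peaks)
```

These pseudo labels are what the scoring network would be trained against, with background proposals handled by the open-set loss.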
Collapse
|
23
|
Yao T, Gao F, Zhang Q, Ma Y. Multi-feature gait recognition with DNN based on sEMG signals. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:3521-3542. [PMID: 34198399 DOI: 10.3934/mbe.2021177] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
This study proposed a gait recognition method based on a deep neural network applied to surface electromyography (sEMG) signals, to improve the stability and accuracy of gait recognition from sEMG signals of the lower limbs. First, after noise elimination, we computed the time-domain features, including the mean absolute value, root mean square, waveform length, and number of zero-crossing points of the sEMG signals, and the frequency-domain features, including the mean power frequency and median frequency. Second, the time-domain and frequency-domain features were combined into a multi-feature set. The classifier was then trained and used for gait recognition. Finally, the classifier's recognition rate was compared with those of a support vector machine (SVM) and an extreme learning machine (ELM). The results showed that the deep neural network (DNN) achieved a better recognition rate than the SVM and ELM. The experimental results across participants indicated that the average recognition rate obtained with the DNN exceeded 95%. Moreover, the standard deviation of the recognition rate between subjects ranged from 0.46% to 0.94%, which further demonstrates the robustness and stability of the proposed method.
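The six features named in the abstract have standard definitions that can be computed directly; the synthetic sinusoid below stands in for a real sEMG recording:

```python
import numpy as np

def semg_features(x, fs):
    """Time- and frequency-domain sEMG features named in the abstract."""
    x = np.asarray(x, dtype=float)
    mav = np.mean(np.abs(x))                        # mean absolute value
    rms = np.sqrt(np.mean(x ** 2))                  # root mean square
    wl = np.sum(np.abs(np.diff(x)))                 # waveform length
    zc = int(np.sum(x[:-1] * x[1:] < 0))            # zero-crossing count
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    mpf = np.sum(freqs * power) / np.sum(power)     # mean power frequency
    cum = np.cumsum(power)
    mf = freqs[np.searchsorted(cum, cum[-1] / 2)]   # median frequency
    return {"MAV": mav, "RMS": rms, "WL": wl, "ZC": zc, "MPF": mpf, "MF": mf}

fs = 1000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 50 * t)  # synthetic 50 Hz stand-in signal
feats = semg_features(sig, fs)
```

Concatenating these per-channel values yields the multi-feature vector that the DNN, SVM, or ELM classifier is trained on.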
Collapse
Affiliation(s)
- Ting Yao
- Institute of Intelligent Control and Robotics, School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Farong Gao
- Institute of Intelligent Control and Robotics, School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Qizhong Zhang
- Institute of Intelligent Control and Robotics, School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Yuliang Ma
- Institute of Intelligent Control and Robotics, School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
| |
Collapse
|
24
|
"Reading Pictures Instead of Looking": RGB-D Image-Based Action Recognition via Capsule Network and Kalman Filter. SENSORS 2021; 21:s21062217. [PMID: 33810140 PMCID: PMC8005215 DOI: 10.3390/s21062217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/16/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 11/25/2022]
Abstract
This paper proposes an action recognition algorithm based on a capsule network and a Kalman filter, called “Reading Pictures Instead of Looking” (RPIL). This method resolves the convolutional neural network’s oversensitivity to rotation and scaling and increases the interpretability of the model in terms of spatial coordinates in graphics. The capsule network is first used to obtain the components of the target human body. The detected parts and their attribute parameters (e.g., spatial coordinates, color) are then analyzed by BERT. A Kalman filter analyzes the predicted capsules and filters out misinformation to prevent the action recognition results from being affected by incorrectly predicted capsules. The parameters between neuron layers are evaluated, and the structure is pruned into a dendritic network to enhance the computational efficiency of the algorithm. This minimizes the dependence of deep learning on the random features extracted by the CNN without sacrificing the model’s accuracy. The association between hidden layers of the neural network is also explained. With a 90% observation rate, the test precision is 83.3% on the OAD dataset, 72.2% on the ChaLearn Gesture dataset, and 86.5% on the G3D dataset. The RPILNet also satisfies real-time operation requirements (>30 fps).
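The Kalman filtering step, smoothing noisy capsule coordinates and damping mispredictions, can be sketched in one dimension; the constant-position model and the noise variances are illustrative, not the paper's tuning:

```python
def kalman_1d(measurements, q=1e-3, r=0.25):
    """Constant-position Kalman filter that smooths a noisy coordinate.

    q: process noise variance, r: measurement noise variance --
    hand-picked for illustration.
    """
    x, p = measurements[0], 1.0   # initial state estimate and variance
    estimates = [x]
    for z in measurements[1:]:
        p = p + q                 # predict: uncertainty grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update with measurement z
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy x-coordinates of a predicted body-part capsule; 30.0 is a
# misprediction the filter should damp rather than follow.
raw = [10.0, 10.4, 9.7, 30.0, 10.1, 9.9]
smooth = kalman_1d(raw)
```

In the full system the same idea is applied per capsule attribute, so a single incorrectly predicted capsule does not flip the recognized action.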
Collapse
|
25
|
Oh G, Ryu J, Jeong E, Yang JH, Hwang S, Lee S, Lim S. DRER: Deep Learning-Based Driver's Real Emotion Recognizer. SENSORS 2021; 21:s21062166. [PMID: 33808922 PMCID: PMC8003797 DOI: 10.3390/s21062166] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Revised: 03/15/2021] [Accepted: 03/16/2021] [Indexed: 12/18/2022]
Abstract
In intelligent vehicles, it is essential to monitor the driver’s condition; however, recognizing the driver’s emotional state is one of the most challenging and important tasks. Most previous studies focused on facial expression recognition to monitor the driver’s emotional state. However, while driving, many factors prevent drivers from revealing their emotions on their faces. To address this problem, we propose a deep learning-based driver’s real emotion recognizer (DRER), an algorithm to recognize drivers’ real emotions that cannot be completely identified from their facial expressions alone. The proposed algorithm comprises two models: (i) a facial expression recognition model, which uses a state-of-the-art convolutional neural network structure; and (ii) a sensor fusion emotion recognition model, which fuses the recognized facial expression state with electrodermal activity, a bio-physiological signal representing the electrical characteristics of the skin, to recognize the driver’s real emotional state. We categorized the driver’s emotions and conducted human-in-the-loop experiments to acquire the data. Experimental results show that the proposed fusion approach achieves a 114% increase in accuracy compared to using only facial expressions and a 146% increase compared to using only electrodermal activity. In conclusion, our proposed method achieves 86.8% accuracy in recognizing the driver’s induced emotion in driving situations.
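The paper's sensor fusion model is a learned network; a minimal late fusion sketch, with hypothetical emotion classes and hand-set weights, conveys the idea of blending facial expression probabilities with an EDA-derived arousal signal:

```python
import numpy as np

def fuse(expr_probs, eda_arousal, w_face=0.6, w_eda=0.4):
    """Late fusion: blend facial-expression probabilities with an
    EDA-derived distribution. Weights and the arousal mapping are
    illustrative, not the paper's learned fusion network."""
    expr_probs = np.asarray(expr_probs, dtype=float)
    # Map a scalar arousal value in [0, 1] to a crude distribution over
    # hypothetical classes (neutral, happy, angry).
    eda_probs = np.array([1 - eda_arousal, eda_arousal / 2, eda_arousal / 2])
    fused = w_face * expr_probs + w_eda * eda_probs
    return fused / fused.sum()

# The face looks neutral, but EDA indicates high arousal: fusion lowers
# the confidence in "neutral" accordingly.
fused = fuse([0.8, 0.1, 0.1], eda_arousal=0.9)
```

The point of the fusion is visible even in this toy version: the bio-signal can overrule a masked facial expression.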
Collapse
Affiliation(s)
- Geesung Oh
- Graduate School of Automotive Engineering, Kookmin University, 77, Jeongneung-ro, Seongbuk-gu, Seoul 02707, Korea; (G.O.); (J.R.); (E.J.)
| | - Junghwan Ryu
- Graduate School of Automotive Engineering, Kookmin University, 77, Jeongneung-ro, Seongbuk-gu, Seoul 02707, Korea; (G.O.); (J.R.); (E.J.)
| | - Euiseok Jeong
- Graduate School of Automotive Engineering, Kookmin University, 77, Jeongneung-ro, Seongbuk-gu, Seoul 02707, Korea; (G.O.); (J.R.); (E.J.)
| | - Ji Hyun Yang
- Department of Automobile and IT Convergence, Kookmin University, 77, Jeongneung-ro, Seongbuk-gu, Seoul 02707, Korea;
| | - Sungwook Hwang
- Chassis System Control Research Lab, Hyundai Motor Group, Hwaseong 18280, Korea; (S.H.); (S.L.)
| | - Sangho Lee
- Chassis System Control Research Lab, Hyundai Motor Group, Hwaseong 18280, Korea; (S.H.); (S.L.)
| | - Sejoon Lim
- Department of Automobile and IT Convergence, Kookmin University, 77, Jeongneung-ro, Seongbuk-gu, Seoul 02707, Korea;
- Correspondence: ; Tel.: +82-2-910-5469
| |
Collapse
|
26
|
Single Image Super-Resolution Method Using CNN-Based Lightweight Neural Networks. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11031092] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
There are many studies that seek to enhance a low-resolution image into a high-resolution image in the area of super-resolution. As deep learning technologies have recently shown impressive results in the image interpolation and restoration field, recent studies focus on convolutional neural network (CNN)-based super-resolution schemes to surpass conventional pixel-wise interpolation methods. In this paper, we propose two lightweight neural networks with a hybrid residual and dense connection structure to improve super-resolution performance. To design the proposed networks, we extracted training images from the DIVerse 2K (DIV2K) image dataset and investigated the trade-off between quality enhancement performance and network complexity under the proposed methods. The experimental results show that the proposed methods can significantly reduce both the inference time and the memory required to store parameters and intermediate feature maps, while maintaining image quality similar to that of previous methods.
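Lightweight CNN super-resolution networks commonly end with a sub-pixel (depth-to-space) upsampling step; a NumPy sketch of that rearrangement (a generic building block, not this paper's specific architecture):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space: rearrange (C*r*r, H, W) -> (C, H*r, W*r).

    This is the sub-pixel upsampling commonly used at the tail of
    lightweight super-resolution networks: channels trade depth for
    spatial resolution, avoiding expensive transposed convolutions.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)        # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# 4 channels of a 2x2 feature map become one 4x4 output (r = 2).
x = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
y = pixel_shuffle(x, 2)
```

Because the upscaling is a pure memory rearrangement, it adds no parameters, which is exactly what a lightweight design wants.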
Collapse
|
27
|
Abstract
Recent developments in image/video-based deep learning technology have enabled new services in the field of multimedia and recognition technology [...]
Collapse
|
28
|
Jiang W, Ye X, Chen R, Su F, Lin M, Ma Y, Zhu Y, Huang S. Wearable on-device deep learning system for hand gesture recognition based on FPGA accelerator. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2020; 18:132-153. [PMID: 33525084 DOI: 10.3934/mbe.2021007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Gesture recognition is critical in the field of human-computer interaction, especially in healthcare, rehabilitation, and sign language translation. Conventionally, gesture recognition data collected by inertial measurement unit (IMU) sensors are relayed to the cloud or a remote device with higher computing power to train models. However, this is not convenient for the remote follow-up of movement rehabilitation training. In this paper, based on a field-programmable gate array (FPGA) accelerator and the Cortex-M0 IP core, we propose a wearable deep learning system that is capable of processing data locally on the end device. With a pre-stage processing module and a serial-parallel hybrid method, the device achieves low power and low latency at the microcontroller unit (MCU) level, yet meets or exceeds the performance of single-board computers (SBCs); for example, its performance is more than twice that of the Cortex-A53 (commonly used in the Raspberry Pi). Moreover, a convolutional neural network (CNN) and a multilayer perceptron neural network (NN) are used in the recognition model to extract features and classify gestures, which helps achieve a high recognition accuracy of 97%. Finally, this paper offers a software-hardware co-design method that is worth referencing for the design of edge devices in other scenarios.
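Deploying CNN weights on an FPGA accelerator usually involves fixed-point quantization; below is a sketch of signed Q-format quantization with illustrative bit widths (the paper's actual word lengths are not stated here):

```python
import numpy as np

def quantize_q_format(weights, int_bits=2, frac_bits=6):
    """Quantize float weights to signed fixed-point Q(int).(frac).

    A typical step when mapping CNN weights onto FPGA multipliers; the
    2.6 split (8-bit words) is an illustrative choice only.
    """
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits - 1))   # most negative code
    hi = (1 << (int_bits + frac_bits - 1)) - 1  # most positive code
    q = np.clip(np.round(weights * scale), lo, hi).astype(np.int16)
    # Return both the integer codes (what the hardware stores) and the
    # dequantized values (to measure quantization error in software).
    return q, q.astype(np.float32) / scale

w = np.array([0.50, -0.26, 1.99, -3.0])
codes, deq = quantize_q_format(w)
```

Comparing `deq` against `w` lets the software side verify that the accuracy loss from the hardware's narrow word length stays acceptable before synthesis.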
Collapse
Affiliation(s)
- Weibin Jiang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
| | - Xuelin Ye
- Department of Statistics, University of Warwick CV4 7AL, United Kingdom
| | - Ruiqi Chen
- College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
- VeriMake Research, Nanjing Qujike Info-tech Co., Ltd., Nanjing 210088, China
| | - Feng Su
- VeriMake Research, Nanjing Qujike Info-tech Co., Ltd., Nanjing 210088, China
- Tsinghua-Berkeley Shenzhen institute, Tsinghua University, Shenzhen 518055, China
| | - Mengru Lin
- College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
| | - Yuhanxiao Ma
- Gallatin School of Individualized Study, New York University, NY 10012, United States
| | - Yanxiang Zhu
- VeriMake Research, Nanjing Qujike Info-tech Co., Ltd., Nanjing 210088, China
| | - Shizhen Huang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
| |
Collapse
|
29
|
Abstract
The human mood has a temporary effect on the face shape due to the movement of facial muscles. Happiness, sadness, fear, anger, and other emotional conditions may affect the reliability of a face biometric system. Most current studies on facial expressions are concerned with the accuracy of classifying subjects based on their expressions. This study investigated the effect of facial expressions on the reliability of a face biometric system to find out which facial expression puts the biometric system at greater risk. Moreover, it identified a set of facial features with the lowest facial deformation caused by facial expressions, to be generalized during the recognition process regardless of which facial expression is presented. To achieve the goal of this study, an analysis of 22 facial features is performed between the neutral face and the six universal facial expressions. The results show that face biometric systems are affected by facial expressions: the disgust expression achieved the highest dissimilarity score, while the sad expression achieved the lowest. Additionally, the study identified the top five and top ten facial features with the lowest facial deformation across all facial expressions. The relativity score also showed lower variance across the samples when using these top facial features. The results of this study minimize the false rejection rate in the face biometric system and consequently allow the system's acceptance threshold to be raised to maximize the intrusion detection rate without affecting user convenience.
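The per-feature deformation analysis can be sketched as normalized landmark displacement between a neutral face and an expression; the landmarks, coordinates, and scale normalization below are hypothetical, not the study's 22-feature protocol:

```python
import numpy as np

def deformation_scores(neutral, expressive):
    """Per-feature deformation: Euclidean displacement of each facial
    landmark between a neutral face and an expression, normalized by
    the overall face scale so different faces are comparable."""
    neutral = np.asarray(neutral, dtype=float)
    expressive = np.asarray(expressive, dtype=float)
    scale = np.linalg.norm(neutral.max(axis=0) - neutral.min(axis=0))
    return np.linalg.norm(expressive - neutral, axis=1) / scale

# Hypothetical landmarks: two eye corners and one mouth corner.
neutral = [[0, 0], [10, 0], [5, 8]]
disgust = [[0, 0], [10, 0], [5, 5]]   # only the mouth corner moved
scores = deformation_scores(neutral, disgust)
stable = int(np.argmin(scores))       # most expression-invariant feature
```

Ranking features by such scores across all six expressions is what yields the top-five and top-ten expression-invariant feature sets.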
Collapse
|
30
|
A Light-Weight Practical Framework for Feces Detection and Trait Recognition. SENSORS 2020; 20:s20092644. [PMID: 32384651 PMCID: PMC7248729 DOI: 10.3390/s20092644] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/10/2020] [Revised: 05/04/2020] [Accepted: 05/05/2020] [Indexed: 12/14/2022]
Abstract
Fecal trait examinations are critical in the clinical diagnosis of digestive diseases, as they can effectively reveal various aspects of the health of the digestive system. An automatic feces detection and trait recognition system based on a visual sensor could greatly alleviate the burden on medical inspectors and overcome many sanitation problems, such as infections. Unfortunately, the lack of digital medical images acquired with camera sensors, due to patient privacy, has obstructed the development of fecal examinations. In general, the computing power of an automatic fecal diagnosis machine or a mobile computer-aided diagnosis device is not always sufficient to run a deep network. Thus, a lightweight practical framework is proposed, which consists of three stages: illumination normalization, feces detection, and trait recognition. Illumination normalization effectively suppresses the illumination variances that degrade recognition accuracy. Because neither the shape nor the location of the feces object is fixed, shape-based and location-based object detection methods do not work well in this task; this also makes it difficult to label images for training convolutional neural networks (CNNs) for detection. Our segmentation scheme is free from training and labeling: the feces object is accurately detected with a well-designed threshold-based segmentation scheme on a selected color component, which reduces background disturbance. Finally, the preprocessed images are categorized into five classes with a lightweight shallow CNN, which is suitable for feces trait examinations in real hospital environments. The experimental results on our collected dataset demonstrate that our framework yields a satisfactory accuracy of 98.4%, while requiring low computational complexity and storage.
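The training-free, threshold-based detection on a selected color component can be sketched as follows; the channel choice, threshold, and synthetic image are illustrative assumptions, not the paper's tuned scheme:

```python
import numpy as np

def detect_by_threshold(rgb, channel=1, thresh=0.5):
    """Training-free detection sketch: normalize one color component,
    threshold it, and return the bounding box of the foreground mask.

    channel and thresh are illustrative; the paper selects the color
    component and threshold by design, not the values used here."""
    comp = rgb[..., channel].astype(float)
    span = comp.max() - comp.min()
    comp = (comp - comp.min()) / (span + 1e-9)   # normalize to [0, 1]
    mask = comp > thresh
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return mask, None                        # nothing detected
    return mask, (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max()))

# Tiny synthetic image: a bright green square on a dark background.
img = np.zeros((8, 8, 3), dtype=np.uint8)
img[2:5, 3:6, 1] = 200
mask, box = detect_by_threshold(img)
```

The detected region would then be cropped and passed to the shallow CNN for five-class trait recognition.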
Collapse
|