1. He L, Wei B, Hao K, Gao L, Peng C. Bio-inspired deep neural local acuity and focus learning for visual image recognition. Neural Netw 2025; 181:106712. PMID: 39388996. DOI: 10.1016/j.neunet.2024.106712
Abstract
In the field of computer vision and image recognition, enabling a computer to discern target features while filtering out irrelevant ones remains a challenge. Drawing on studies of biological vision, we find that a local visual acuity mechanism and a visual focus mechanism operate in the initial conversion and processing of visual information, ensuring that the visual system attends to salient features of the target in the early observation phase. Inspired by this, we build a novel image recognition network that concentrates on target features while ignoring irrelevant ones in the image, and further refines attention to focal features within the target features. Meanwhile, to accommodate the output characteristics that arise when similar features exist across different categories, we design a softer image-label operation for such features, which accounts for the correlation of labels between categories. Experimental findings underscore the efficacy of the proposed method, revealing discernible advantages over alternative approaches. Visualization results further attest to the method's capability to selectively focus on pertinent target features within the image while sidelining extraneous information.
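The "softer image label" idea can be illustrated with plain label smoothing toward similar classes. A minimal sketch, assuming a hand-specified class-similarity matrix and a mixing weight `alpha` (both illustrative; the paper's exact operation may differ):

```python
import numpy as np

def soften_labels(one_hot, similarity, alpha=0.1):
    """Softer labels: move a fraction `alpha` of each one-hot label's mass
    onto classes marked as similar (hypothetical sketch, not the paper's
    exact formulation)."""
    sim = similarity / similarity.sum(axis=1, keepdims=True)  # row-normalize
    return (1.0 - alpha) * one_hot + alpha * one_hot @ sim

# Three classes; class 0 and 1 share features, class 2 is more distinct.
one_hot = np.eye(3)
similarity = np.array([[0.0, 1.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 0.5, 0.5]])
soft = soften_labels(one_hot, similarity)
print(soft[0])  # class-0 label keeps 0.9 and gives 0.1 to similar class 1
```

Each softened row still sums to one, so the result remains a valid target distribution for a cross-entropy loss.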
Affiliation(s)
- Langping He
- Engineering Research Center of Digitized Textile & Apparel Technology, Ministry of Education, Donghua University, Shanghai 201620, China; College of Information Sciences and Technology, Donghua University, Shanghai 201620, China
- Bing Wei
- Engineering Research Center of Digitized Textile & Apparel Technology, Ministry of Education, Donghua University, Shanghai 201620, China; College of Information Sciences and Technology, Donghua University, Shanghai 201620, China.
- Kuangrong Hao
- Engineering Research Center of Digitized Textile & Apparel Technology, Ministry of Education, Donghua University, Shanghai 201620, China; College of Information Sciences and Technology, Donghua University, Shanghai 201620, China
- Lei Gao
- Commonwealth Scientific and Industrial Research Organization (CSIRO), Waite Campus, Urrbrae, SA 5064, Australia
- Chuang Peng
- Engineering Research Center of Digitized Textile & Apparel Technology, Ministry of Education, Donghua University, Shanghai 201620, China; College of Information Sciences and Technology, Donghua University, Shanghai 201620, China
2. Choo S, Park H, Jung JY, Flores K, Nam CS. Improving classification performance of motor imagery BCI through EEG data augmentation with conditional generative adversarial networks. Neural Netw 2024; 180:106665. PMID: 39241437. DOI: 10.1016/j.neunet.2024.106665
Abstract
In brain-computer interfaces (BCIs), building accurate electroencephalogram (EEG) classifiers for specific mental tasks is critical for performance. These classifiers are developed with machine learning (ML) and deep learning (DL) techniques, which require a large training dataset to build reliable and accurate models. However, collecting sufficiently large EEG datasets is difficult due to intra- and inter-subject variability and experimental costs. The resulting data scarcity causes overfitting to training samples and reduced generalization performance. To solve this problem and improve the performance of EEG classifiers, we propose a novel EEG data augmentation (DA) framework using conditional generative adversarial networks (cGANs). An experimental study on two public motor imagery (MI) EEG datasets (BCI Competition IV IIa and III IVa) validates the effectiveness of the proposed method. To evaluate the cGAN-based DA method, we tested eight EEG classifiers, including traditional ML models and state-of-the-art DL models, against three existing EEG DA methods. Experimental results showed that most DA methods, given a proper DA proportion in the training dataset, yielded higher classification performance than training without DA. Moreover, the proposed DA method achieved greater classification performance improvement than the other DA methods, showing that it is a promising approach for enhancing EEG classifiers in MI-based BCIs.
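The core of cGAN conditioning is feeding the generator noise concatenated with a class label, so synthetic trials can be generated per MI class. A minimal sketch of that input construction, with illustrative shapes (the paper's generator architecture is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def cgan_generator_input(batch, latent_dim, n_classes, labels):
    """cGAN conditioning sketch: the generator receives noise z concatenated
    with a one-hot class label, so synthetic EEG trials can be produced for
    a chosen MI class (shapes are illustrative assumptions)."""
    z = rng.standard_normal((batch, latent_dim))
    y = np.eye(n_classes)[labels]          # one-hot condition vectors
    return np.concatenate([z, y], axis=1)  # (batch, latent_dim + n_classes)

# Four MI classes as in BCI Competition IV IIa (left/right hand, feet, tongue).
labels = np.array([0, 1, 2, 3])
g_in = cgan_generator_input(4, 100, 4, labels)
print(g_in.shape)  # (4, 104)
```

The discriminator receives the same label condition alongside real or generated trials, which is what lets the trained generator augment a specific class.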
Affiliation(s)
- Sanghyun Choo
- Department of Industrial Engineering, Kumoh National Institute of Technology, South Korea
- Hoonseok Park
- Department of Big Data Analytics, Kyung Hee University, South Korea
- Jae-Yoon Jung
- Department of Big Data Analytics, Kyung Hee University, South Korea; Department of Industrial and Management Systems Engineering, Kyung Hee University, South Korea
- Kevin Flores
- Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Chang S Nam
- Department of Industrial and Management Systems Engineering, Kyung Hee University, South Korea; Department of Industrial and Systems Engineering, Northern Illinois University, DeKalb, IL, USA.
3. Tao W, Wang X, Yan T, Liu Z, Wan S. ESF-YOLO: an accurate and universal object detector based on neural networks. Front Neurosci 2024; 18:1371418. PMID: 38650621. PMCID: PMC11033406. DOI: 10.3389/fnins.2024.1371418
Abstract
As an excellent single-stage object detector based on neural networks, YOLOv5 has found extensive applications in the industrial domain; however, it still exhibits certain design limitations. To address these issues, this paper proposes Efficient Scale Fusion YOLO (ESF-YOLO). First, the Multi-Sampling Conv Module (MSCM) is designed, which enhances the backbone network's ability to learn low-level features through multi-scale receptive fields and cross-scale feature fusion. Second, to tackle occlusion, a new Block-wise Channel Attention Module (BCAM) is designed, assigning greater weights to channels carrying critical information. Next, a lightweight Decoupled Head (LD-Head) is devised. Additionally, the loss function is redesigned to address the asynchrony between labels and confidences, alleviating the imbalance between positive and negative samples during training. Finally, an adaptive scale factor for Intersection over Union (IoU) calculation is proposed, adjusting bounding-box sizes adaptively to accommodate targets of different sizes in the dataset. Experimental results on the SODA10M and CBIA8K datasets demonstrate that ESF-YOLO increases Average Precision at 0.50 IoU (AP50) by 3.93% and 2.24%, Average Precision at 0.75 IoU (AP75) by 4.77% and 4.85%, and mean Average Precision (mAP) by 4% and 5.39%, respectively, validating the model's broad applicability.
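The adaptive IoU scale factor can be sketched as a size-dependent box expansion applied before the overlap computation, so small targets are matched more tolerantly. The `base` constant and the expansion rule below are our assumptions, not the paper's exact formula:

```python
import numpy as np

def iou(a, b):
    """Standard IoU for [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def scaled_iou(pred, gt, base=32.0):
    """Hypothetical adaptive-scale IoU: expand both boxes around their centers
    by a factor that grows for small targets (a sketch of the idea only)."""
    def expand(box, s):
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        w, h = (box[2] - box[0]) * s, (box[3] - box[1]) * s
        return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
    gt_size = np.sqrt((gt[2] - gt[0]) * (gt[3] - gt[1]))
    s = max(1.0, base / gt_size)  # small ground-truth boxes get s > 1
    return iou(expand(pred, s), expand(gt, s))

small_gt = [0, 0, 8, 8]
pred = [4, 4, 12, 12]
print(iou(pred, small_gt), scaled_iou(pred, small_gt))
# the scaled variant scores this small-target match higher
```

For large targets `s` stays at 1 and the measure reduces to ordinary IoU.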
Affiliation(s)
- Wenguang Tao
- Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
- Xiaotian Wang
- Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
- Tian Yan
- Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
- Zhengzhuo Liu
- Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
- Shizheng Wan
- Shanghai Electro-Mechanical Engineering Institute, Shanghai, China
4. Ng WWY, Zhang Q, Zhong C, Zhang J. Improving domain generalization by hybrid domain attention and localized maximum sensitivity. Neural Netw 2024; 171:320-331. PMID: 38113717. DOI: 10.1016/j.neunet.2023.12.014
Abstract
Domain generalization has attracted much interest in recent years due to its practical application scenario, in which a model is trained on data from various source domains but tested on data from an unseen target domain. Existing domain generalization methods treat all visual features, including irrelevant ones, with the same priority, which easily results in poor generalization of the trained model. In contrast, human beings generalize well across domains by focusing on features important to the label while suppressing irrelevant ones. Motivated by this observation, we propose a channel-wise and spatial-wise hybrid domain attention mechanism that forces the model to focus on the features associated with labels. In addition, models that are more robust to small input perturbations are expected to generalize better, which is preferable in domain generalization. We therefore propose to reduce the localized maximum sensitivity to small input perturbations in order to improve the network's robustness and generalization capability. Extensive experiments on the PACS, VLCS, and Office-Home datasets validate the effectiveness of the proposed method.
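Localized maximum sensitivity can be approximated empirically as the worst-case output change over sampled perturbations in a small neighborhood of an input. A Monte Carlo sketch (the paper works with an analytical formulation; the toy network here is an illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(42)

def localized_max_sensitivity(f, x, radius=0.05, n_samples=64):
    """Sketch of localized maximum sensitivity: the largest output change of
    f over random perturbations within an L-inf ball of `radius` around x
    (a Monte Carlo approximation, not the paper's derived quantity)."""
    y = f(x)
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.uniform(-radius, radius, size=x.shape)
        worst = max(worst, np.linalg.norm(f(x + delta) - y))
    return worst

# A smooth toy "network": one linear layer with tanh activation.
W = rng.standard_normal((4, 8)) * 0.5
f = lambda x: np.tanh(W @ x)
x = rng.standard_normal(8)
print(localized_max_sensitivity(f, x))  # small value means locally robust
```

Minimizing such a quantity during training penalizes sharp local output changes, which is the robustness property the abstract links to generalization.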
Affiliation(s)
- Wing W Y Ng
- Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science & Engineer, South China University of Technology, Guangzhou, 510006, China
- Qin Zhang
- Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science & Engineer, South China University of Technology, Guangzhou, 510006, China
- Cankun Zhong
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510642, China.
- Jianjun Zhang
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510642, China.
5. Gao X, Jiang B, Wang X, Huang L, Tu Z. Chest x-ray diagnosis via spatial-channel high-order attention representation learning. Phys Med Biol 2024; 69:045026. PMID: 38347732. DOI: 10.1088/1361-6560/ad2014
Abstract
Objective. Chest x-ray image representation and learning is an important problem in computer-aided diagnosis. Existing methods usually adopt CNNs or Transformers for feature representation learning and focus on learning effective representations for chest x-ray images. Although they achieve good performance, these works remain limited, mainly because they ignore the correlations among channels and pay little attention to local context-aware feature representation of the chest x-ray image. Approach. To address these problems, we propose a novel spatial-channel high-order attention model (SCHA) for chest x-ray image representation and diagnosis. The proposed architecture contains three modules: CEBN, SHAM, and CHAM. Specifically, we first introduce a context-enhanced backbone network that employs multi-head self-attention to extract initial features from the input chest x-ray images. We then develop the novel SCHA, which contains both spatial and channel high-order attention learning branches. The spatial branch uses a novel locally biased self-attention mechanism that captures both local and long-range global dependencies among positions to learn rich context-aware representations. The channel branch employs Brownian distance covariance to encode the correlation information of channels and uses it as the image representation. Finally, the two learning branches are integrated for the final multi-label diagnosis classification and prediction. Main results. Experiments on the commonly used ChestX-ray14 and CheXpert datasets demonstrate that the proposed SCHA approach obtains better performance than many related approaches. Significance. This study provides a more discriminative method for chest x-ray classification and a technique for computer-aided diagnosis.
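The channel branch's use of Brownian distance covariance rests on a standard statistic that detects both linear and nonlinear dependence. A sketch of the sample definition applied to two channel-response vectors (its integration into SCHA is our illustration):

```python
import numpy as np

def distance_covariance(x, y):
    """Sample Brownian (distance) covariance between two channel responses,
    shown here to illustrate how channel correlations can be encoded
    (standard dCov definition; its use inside SCHA is our sketch)."""
    a = np.abs(x[:, None] - x[None, :])        # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()   # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return np.sqrt(max((A * B).mean(), 0.0))

rng = np.random.default_rng(0)
c1 = rng.standard_normal(100)
c2 = 2.0 * c1 + 0.1 * rng.standard_normal(100)   # strongly dependent channel
c3 = rng.standard_normal(100)                    # independent channel
print(distance_covariance(c1, c2), distance_covariance(c1, c3))
# the dependent pair yields the clearly larger value
```

Unlike plain covariance, the statistic is (in the population limit) zero only under full independence, which makes it a strong descriptor of inter-channel relationships.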
Affiliation(s)
- Xinyue Gao
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Bo Jiang
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Xixi Wang
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Lili Huang
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Zhengzheng Tu
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
6. Yun J, Jiang D, Huang L, Tao B, Liao S, Liu Y, Liu X, Li G, Chen D, Chen B. Grasping detection of dual manipulators based on Markov decision process with neural network. Neural Netw 2024; 169:778-792. PMID: 38000180. DOI: 10.1016/j.neunet.2023.09.016
Abstract
With the development of artificial intelligence, robots are widely used in various fields, and grasping detection has become a focus of intelligent-robot research. This paper proposes a dual-manipulator grasping detection model based on a Markov decision process to realize stable grasping in complex scenes with multiple objects. Based on the Markov decision process, a cross-entropy convolutional neural network and a fully convolutional neural network are used to parameterize the grasping detection models of the two manipulators, a two-finger gripper and a vacuum sucker, for unknown multi-object scenes. A dataset generated in a simulated environment is used to train the two detection networks. By comparing the grasp quality of the best grasp output by each detection network for the two grasping methods, the better-performing network for each method is determined, and the dual-manipulator grasping detection model is constructed. Robot grasping experiments show that the proposed method achieves a 90.6% success rate, much higher than that of the other experimental groups, verifying the feasibility and superiority of the dual-manipulator grasping detection method based on a Markov decision process.
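The decision between the two manipulators reduces to comparing the best grasp quality each detection network outputs. A minimal sketch with illustrative quality maps standing in for the networks' outputs:

```python
import numpy as np

def select_grasp(q_gripper, q_sucker):
    """Sketch of the dual-manipulator decision: each detection network outputs
    a per-location grasp-quality map, and the manipulator whose best candidate
    scores highest is chosen (maps here are illustrative arrays)."""
    best_g = np.unravel_index(np.argmax(q_gripper), q_gripper.shape)
    best_s = np.unravel_index(np.argmax(q_sucker), q_sucker.shape)
    if q_gripper[best_g] >= q_sucker[best_s]:
        return "two-finger", tuple(map(int, best_g)), float(q_gripper[best_g])
    return "vacuum-sucker", tuple(map(int, best_s)), float(q_sucker[best_s])

# Toy quality maps: the sucker network is more confident for this scene.
q_gripper = np.array([[0.2, 0.7], [0.4, 0.1]])
q_sucker = np.array([[0.9, 0.3], [0.5, 0.2]])
arm, pos, score = select_grasp(q_gripper, q_sucker)
print(arm, pos, score)  # vacuum-sucker (0, 0) 0.9
```

In the paper this comparison is made per object so that flat objects go to the sucker and graspable ones to the two-finger gripper; the sketch only shows the argmax-and-compare step.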
Affiliation(s)
- Juntong Yun
- Key Laboratory of Metallurgical Equipment and Control Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China; Research Center for Biomimetic Robot and Intelligent Measurement and Control, Wuhan University of Science and Technology, Wuhan 430081, China
- Du Jiang
- Research Center for Biomimetic Robot and Intelligent Measurement and Control, Wuhan University of Science and Technology, Wuhan 430081, China; Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan 430081, China; Hubei Longzhong Laboratory, Xiangyang 441000, Hubei, China
- Li Huang
- College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan 430081, China
- Bo Tao
- Key Laboratory of Metallurgical Equipment and Control Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China; Research Center for Biomimetic Robot and Intelligent Measurement and Control, Wuhan University of Science and Technology, Wuhan 430081, China; Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan 430081, China
- Shangchun Liao
- Hubei Longzhong Laboratory, Xiangyang 441000, Hubei, China
- Ying Liu
- Research Center for Biomimetic Robot and Intelligent Measurement and Control, Wuhan University of Science and Technology, Wuhan 430081, China; Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan 430081, China
- Xin Liu
- Key Laboratory of Metallurgical Equipment and Control Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China; Research Center for Biomimetic Robot and Intelligent Measurement and Control, Wuhan University of Science and Technology, Wuhan 430081, China
- Gongfa Li
- Key Laboratory of Metallurgical Equipment and Control Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China; Research Center for Biomimetic Robot and Intelligent Measurement and Control, Wuhan University of Science and Technology, Wuhan 430081, China; Hubei Longzhong Laboratory, Xiangyang 441000, Hubei, China
- Disi Chen
- Robotics and Machine Vision, Bristol Robotics Lab, University of the West of England, United Kingdom
- Baojia Chen
- Hubei Key Laboratory of Hydroelectric Machinery Design & Maintenance, China Three Gorges University, Yichang 443002, China
7. Han H, Zhang Q, Li F, Du Y. Spatial oblivion channel attention targeting intra-class diversity feature learning. Neural Netw 2023; 167:10-21. PMID: 37619510. DOI: 10.1016/j.neunet.2023.07.032
Abstract
Convolutional neural networks (CNNs) have successfully driven many visual recognition tasks, including image classification. However, in classification tasks with diverse intra-class sample styles, the network tends to be disturbed by the more varied features, limiting feature learning. In this article, a spatial oblivion channel attention (SOCA) mechanism for intra-class diversity feature learning is proposed. Specifically, SOCA applies progressively regularized spatial-structure oblivion to each channel after convolution, so that the network is not restricted to limited feature learning and attends to more regional detail features. Further, SOCA reassigns channel weights in the progressively oblivious feature space from top to bottom along the channel direction, ensuring the network learns image details in an orderly manner without falling into feature redundancy. Experiments are conducted on the standard CIFAR-10/100 classification datasets and two garbage-classification datasets with diverse intra-class styles. SOCA improves the classification accuracy of SqueezeNet, MobileNet, BN-VGG-19, Inception, and ResNet-50 by 1.31%, 1.18%, 1.57%, 2.09%, and 2.27% on average, respectively, verifying the feasibility and effectiveness of intra-class diversity feature learning in SOCA-enhanced networks. In addition, class activation maps show that more local detail regions are activated when the SOCA module is added, demonstrating the method's interpretability for intra-class diversity feature learning.
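The progressive oblivion plus channel reweighting can be sketched as follows: each channel keeps a shrinking fraction of its strongest spatial responses along the channel axis, and channel weights are computed from what survives. The keep schedule and softmax weighting below are our illustrative assumptions, not the paper's exact regularization:

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_oblivion_weights(feat, keep_start=1.0, keep_end=0.3):
    """Sketch of spatial-oblivion channel attention: channel c keeps only the
    top keep[c] fraction of its spatial responses (the fraction shrinks from
    top to bottom along the channel axis), and channel weights are then
    derived from the surviving responses."""
    C = feat.shape[0]
    keep = np.linspace(keep_start, keep_end, C)   # progressive schedule
    pooled = np.empty(C)
    for c in range(C):
        flat = np.sort(feat[c].ravel())[::-1]     # strongest responses first
        k = max(1, int(keep[c] * flat.size))
        pooled[c] = flat[:k].mean()               # pool over survivors only
    return np.exp(pooled) / np.exp(pooled).sum()  # softmax channel weights

feat = rng.random((4, 8, 8))                      # (channels, H, W) toy map
w = spatial_oblivion_weights(feat)
print(w.round(3), w.sum())
```

Applying `w` channel-wise to `feat` would then emphasize channels whose surviving detail responses are strongest, which is the reweighting step the abstract describes.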
Affiliation(s)
- Honggui Han
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Engineering Research Center of Digital Community Ministry of Education, Beijing University of Technology, Beijing 100124, China; Beijing Artificial Intelligence Institute and Beijing Laboratory for Intelligent Environmental Protection, Beijing 100124, China.
- Qiyu Zhang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China
- Fangyu Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Engineering Research Center of Digital Community Ministry of Education, Beijing University of Technology, Beijing 100124, China; Beijing Artificial Intelligence Institute and Beijing Laboratory for Intelligent Environmental Protection, Beijing 100124, China
- Yongping Du
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China
8. Huang X, Zhang B, Feng S, Ye Y, Li X. Interpretable local flow attention for multi-step traffic flow prediction. Neural Netw 2023; 161:25-38. PMID: 36735998. DOI: 10.1016/j.neunet.2023.01.023
Abstract
Traffic flow prediction (TFP) has attracted increasing attention with the development of smart cities. In the past few years, neural-network-based methods have shown impressive performance for TFP. However, most previous studies fail to explicitly and effectively model the relationship between inflows and outflows, so these methods are usually uninterpretable and inaccurate. In this paper, we propose an interpretable local flow attention (LFA) mechanism for TFP, which yields three advantages. (1) LFA is flow-aware: unlike existing works, which blend inflows and outflows in the channel dimension, we explicitly exploit the correlations between flows with a novel attention mechanism. (2) LFA is interpretable: it is formulated from the truisms of traffic flow, and the learned attention weights can well explain the flow correlations. (3) LFA is efficient: instead of the global spatial attention used in previous studies, LFA uses a local mode, performing the attention query only on locally related regions. This not only reduces computational cost but also avoids spurious attention. Based on LFA, we further develop a novel spatiotemporal cell, LFA-ConvLSTM (LFA-based convolutional long short-term memory), to capture the complex dynamics in traffic data. LFA-ConvLSTM consists of three parts: (1) a ConvLSTM module that learns flow-specific features; (2) an LFA module that models the correlations between flows; and (3) a feature-aggregation module that fuses the two into a comprehensive feature. Extensive experiments on two real-world datasets show that our method achieves better prediction performance, improving the RMSE metric by 3.2%-4.6% and the MAPE metric by 6.2%-6.7%. LFA-ConvLSTM is also almost 32% faster than global self-attention ConvLSTM in terms of prediction time. Furthermore, we present visual results to analyze the learned flow correlations.
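The local attention query can be sketched as restricting each outflow cell's attention to the inflows of its spatial neighborhood. A simplified dot-product form with illustrative grids (not the paper's exact parameterization):

```python
import numpy as np

def local_flow_attention(inflow, outflow, i, j, window=1):
    """Sketch of local flow attention: the outflow at cell (i, j) queries only
    the inflows of its (2*window+1)^2 neighborhood, so attention weights stay
    local and cheap to compute (simplified scalar query-key form)."""
    H, W = inflow.shape
    rows = slice(max(0, i - window), min(H, i + window + 1))
    cols = slice(max(0, j - window), min(W, j + window + 1))
    keys = inflow[rows, cols].ravel()
    scores = outflow[i, j] * keys                 # query-key products
    weights = np.exp(scores) / np.exp(scores).sum()
    context = (weights * keys).sum()              # attended local inflow
    return weights, context

inflow = np.arange(16, dtype=float).reshape(4, 4) / 10.0
outflow = np.ones((4, 4))
weights, context = local_flow_attention(inflow, outflow, 2, 2)
print(weights.shape, round(context, 3))  # attends over a 3x3 neighborhood
```

Because the softmax runs over at most 9 cells rather than the whole grid, the cost per query is constant, which is the efficiency argument the abstract makes for the local mode.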
Affiliation(s)
- Xu Huang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China.
- Bowen Zhang
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China.
- Shanshan Feng
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China.
- Yunming Ye
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China.
- Xutao Li
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China.
9. Olimov B, Subramanian B, Ugli RAA, Kim JS, Kim J. Consecutive multiscale feature learning-based image classification model. Sci Rep 2023; 13:3595. PMID: 36869132. PMCID: PMC9984458. DOI: 10.1038/s41598-023-30480-8
Abstract
Extracting useful features at multiple scales is a crucial task in computer vision. The emergence of deep-learning techniques and advancements in convolutional neural networks (CNNs) have enabled effective multiscale feature extraction, yielding stable performance improvements in numerous real-life applications. However, currently available state-of-the-art methods rely primarily on parallel multiscale feature extraction, and despite competitive accuracy, the resulting models suffer from poor computational efficiency and low generalization on small-scale images. Moreover, efficient, lightweight networks often cannot learn useful features appropriately, causing underfitting when training with small-scale images or datasets with a limited number of samples. To address these problems, we propose a novel image classification system based on elaborate data preprocessing steps and a carefully designed CNN architecture. Specifically, we present a consecutive multiscale feature-learning network (CMSFL-Net) that employs a consecutive feature-learning approach, using feature maps with different receptive fields to achieve faster training/inference and higher accuracy. In experiments on six real-life image classification datasets, including small-scale, large-scale, and limited-data settings, CMSFL-Net exhibits accuracy comparable with existing state-of-the-art efficient networks, outperforms them in efficiency and speed, and achieves the best accuracy-efficiency trade-off.
Affiliation(s)
- Bekhzod Olimov
- AI Department, IT Convergence R &D Center, Vitasoft, Seoul, South Korea
- Barathi Subramanian
- School of Computer Science and Engineering, Kyungpook National University, Daegu, 41586, South Korea
- Jea-Soo Kim
- School of Computer Science and Engineering, Kyungpook National University, Daegu, 41586, South Korea
- Jeonghong Kim
- School of Computer Science and Engineering, Kyungpook National University, Daegu, 41586, South Korea.
10. Liu B, Han Z, Chen X, Shao W, Jia H, Wang Y, Tang Y. A novel compact design of convolutional layers with spatial transformation towards lower-rank representation for image classification. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109723
11. Xu Z, Li J, Meng Y, Zhang X. CAP-YOLO: Channel Attention Based Pruning YOLO for Coal Mine Real-Time Intelligent Monitoring. Sensors (Basel) 2022; 22:4331. PMID: 35746116. PMCID: PMC9229694. DOI: 10.3390/s22124331
Abstract
Real-time intelligent monitoring of coal mines for pedestrian identification and positioning is an important means of ensuring production safety. Traditional neural-network-based object detection models require significant computational and storage resources, making it difficult to deploy them on edge devices for real-time intelligent monitoring. To address these problems, this paper proposes CAP-YOLO (Channel Attention based Pruning YOLO) and AEPSM (adaptive image enhancement parameter selection module) to achieve real-time intelligent analysis of coal mine surveillance videos. First, DCAM (Deep Channel Attention Module) is proposed to evaluate the importance of channels in YOLOv3. Second, the filters corresponding to low-importance channels are pruned to generate CAP-YOLO, which recovers accuracy through fine-tuning. Finally, because lighting environments vary across coal mine fields, AEPSM is proposed to select parameters for CLAHE (Contrast Limited Adaptive Histogram Equalization) in different fields. Experimental results show that the weight size of CAP-YOLO is 8.3× smaller than that of YOLOv3 with only a 7% drop in mAP, and its inference speed is three times faster than that of YOLOv3. On an NVIDIA Jetson TX2, CAP-YOLO achieves an inference speed of 31 FPS.
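Attention-guided pruning of this kind keeps only the channels with the highest importance scores and drops the corresponding filters. A sketch with illustrative scores standing in for DCAM's output:

```python
import numpy as np

def prune_by_channel_attention(weights, attention, prune_ratio=0.5):
    """Sketch of attention-guided filter pruning: output channels whose
    attention score falls in the lowest `prune_ratio` fraction are removed
    from a conv layer's weight tensor (scores here are illustrative; in
    CAP-YOLO they come from the proposed DCAM module)."""
    n = len(attention)
    n_keep = n - int(prune_ratio * n)
    keep = np.sort(np.argsort(attention)[::-1][:n_keep])  # top-scoring channels
    return weights[keep], keep

rng = np.random.default_rng(0)
conv_w = rng.standard_normal((8, 3, 3, 3))        # (out_ch, in_ch, kh, kw)
scores = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4])
pruned, kept = prune_by_channel_attention(conv_w, scores, prune_ratio=0.5)
print(pruned.shape, kept)  # (4, 3, 3, 3) [0 2 4 6]
```

In a full network the next layer's input channels must be sliced to match `kept`, and a fine-tuning pass then recovers the accuracy lost to pruning, as the abstract describes.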
12. Automated grading of diabetic retinopathy using CNN with hierarchical clustering of image patches by siamese network. Phys Eng Sci Med 2022; 45:623-635. PMID: 35587313. DOI: 10.1007/s13246-022-01129-z
Abstract
Diabetic retinopathy (DR) is a progressive vascular complication that affects people with diabetes. This retinal abnormality can cause irreversible vision loss or permanent blindness; it is therefore crucial to undergo frequent eye screening for early recognition and treatment. This paper proposes a feature extraction algorithm using discriminative multi-sized patches, based on a deep convolutional neural network (CNN), for DR grading. The algorithm extracts local and global features for efficient decision-making. Each input image is divided into small patches to extract local-level features and then split into clusters or subsets. Hierarchical clustering by a Siamese network with a pre-trained CNN is proposed to select clusters with more discriminative patches. A fine-tuned Xception CNN model extracts global-level features from larger image patches. Local and global features are combined to improve overall image-wise classification accuracy. The final support vector machine classifier achieves 96% classification accuracy with tenfold cross-validation in classifying DR images.
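The local-feature stage begins by splitting each fundus image into small patches. A minimal sketch of the patch extraction (patch size is illustrative; the Siamese clustering and CNN feature extraction steps are omitted):

```python
import numpy as np

def extract_patches(image, patch):
    """Sketch of the local-feature step: split an image into non-overlapping
    square patches (the paper additionally clusters patches with a Siamese
    network; sizes here are illustrative)."""
    h, w = image.shape[:2]
    return [image[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]

fundus = np.zeros((512, 512))          # placeholder retinal image
patches = extract_patches(fundus, 128)
print(len(patches))  # 16 patches of 128x128
```

Per-patch CNN features would then be pooled as the local descriptor and concatenated with the global Xception features before the SVM classifier.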