1
Liu T, Cao GZ, He Z, Xie S, Deng X. Multimodal Fusion Network for 3-D Lane Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:6054-6066. [PMID: 38776206 DOI: 10.1109/tnnls.2024.3398654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2024]
Abstract
3-D lane detection is a challenging task due to the diversity of lanes, occlusion, dazzling light, and so on. Traditional methods usually use highly specialized handcrafted features and carefully designed postprocessing to detect lanes. However, these methods rely on strong assumptions and a single modality, so they do not scale easily and perform poorly. In this article, a multimodal fusion network (MFNet) is proposed that uses multihead nonlocal attention and a feature pyramid for 3-D lane detection. It includes three parts: a multihead deformable transformation (MDT) module, a multidirectional attention feature pyramid fusion (MA-FPF) module, and a top-view lane prediction (TLP) module. First, MDT is presented to learn and mine multimodal features from RGB images, depth maps, and point cloud data (PCD) for optimal lane feature extraction. Then, MA-FPF is designed to fuse multiscale features to prevent lane features from vanishing as the network deepens. Finally, TLP is developed to estimate 3-D lanes and predict their positions. Experimental results on the 3-D lane synthetic and ONCE-3DLanes datasets demonstrate that the proposed MFNet outperforms state-of-the-art methods in both quantitative analyses and visual comparisons.
2
Lv C, Zhang E, Qi G, Li F, Huo J. A lightweight parallel attention residual network for tile defect recognition. Sci Rep 2024; 14:21872. [PMID: 39300076 DOI: 10.1038/s41598-024-70570-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Accepted: 08/19/2024] [Indexed: 09/22/2024] Open
Abstract
In modern industrial production, permanent magnet motors are an indispensable part of industrial manufacturing. The quality of the magnetic tiles directly affects the working performance of permanent magnet motors, making the detection of defects on the surface of magnetic tiles critically important. However, because defects occupy only a small part of the tile image and the defective surface is reflective, image details are not captured clearly. These problems make magnetic tile defects difficult to recognize. In this paper, a magnetic tile defect detection method is proposed to address unclear image features and small defects. First, the image is processed using linear variation to enhance image detail features. Then, by introducing the inverted bottleneck block structure of MobileNetV2, the Attention Parallel Residual Convolution Block (APR) is proposed, and the Lightweight Parallel Attention Residual Network (LPAR-Net) is built. In the APR Block, a 7 × 7 convolution is introduced so that the model can extract spatial features over a larger range, and the residual structure is used for weighted fusion of the input. In addition, CBAM is improved, split into two parts, and inserted into the APR Block. Finally, LPAR-Net is compared with mainstream image classification models. The experimental results show that the method achieves 93.63% accuracy on the adopted dataset, which is better than existing mainstream image classification networks such as DenseNet, MobileNet, and ConvNext. In addition, a strip steel surface defect dataset is used to compare the method with the above image classification models, which verifies that the proposed detection method retains strong recognition capability.
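The abstract does not define "linear variation". The sketch below assumes it refers to a linear gray-level transform (a min-max contrast stretch), which is one common way to make faint detail more visible. The function name `linear_stretch` and the sample pixel values are illustrative and are not taken from the paper.

```python
def linear_stretch(pixels, out_min=0, out_max=255):
    """Linearly map gray levels so the darkest pixel becomes out_min
    and the brightest becomes out_max (assumed preprocessing step)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch.
        return [out_min for _ in pixels]
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]


# A low-contrast row of gray levels (hypothetical values).
row = [100, 110, 120, 130]
stretched = linear_stretch(row)
print(stretched)  # [0, 85, 170, 255]
```

After the stretch, neighboring gray levels that differed by 10 differ by 85, so small intensity changes around a defect become easier to separate.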
Affiliation(s)
- Cheng Lv
  - School of Mechanical Engineering, Xijing University, Xi'an, 710123, China
- Enxu Zhang
  - School of Mechanical Engineering, Xijing University, Xi'an, 710123, China
- Guowei Qi
  - School of Mechanical Engineering, Xijing University, Xi'an, 710123, China
- Fei Li
  - School of Mechanical Engineering, Xijing University, Xi'an, 710123, China
- Jiaofei Huo
  - School of Mechanical Engineering, Xijing University, Xi'an, 710123, China
3
Zhao X, Lai L, Li Y, Zhou X, Cheng X, Chen Y, Huang H, Guo J, Wang G. A lightweight bladder tumor segmentation method based on attention mechanism. Med Biol Eng Comput 2024; 62:1519-1534. [PMID: 38308022 DOI: 10.1007/s11517-024-03018-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 01/05/2024] [Indexed: 02/04/2024]
Abstract
In endoscopic images of the bladder, accurate segmentation of bladder tumors of different grades, which have blurred boundary regions and highly variable shapes, is of great significance for doctors' diagnosis and patients' later treatment. We propose a nested attentional feature fusion segmentation network (NAFF-Net) based on an encoder-decoder structure that combines a weighted pyramid pooling module (WPPM) with nested attentional feature fusion (NAFF). WPPM applies a cascade of atrous convolutions to enlarge the overall receptive field while introducing adaptive weights to optimize multi-scale feature extraction. NAFF integrates deep semantic information into shallow feature maps, effectively focusing on edge and detail information in bladder tumor images. Additionally, a weighted mixed loss function is constructed to alleviate the impact of the imbalance between positive and negative samples on segmentation accuracy. Experiments show that the proposed NAFF-Net achieves better segmentation results than other mainstream models, with an MIoU of 84.05%, MPrecision of 91.52%, MRecall of 90.81%, and F1-score of 91.16%, and also achieves good results on the public datasets Kvasir-SEG and CVC-ClinicDB. Compared to other models, NAFF-Net has fewer parameters, which is a significant advantage in model deployment.
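As a quick consistency check on the reported metrics, the sketch below assumes the F1-score is the harmonic mean of the reported MPrecision and MRecall. The abstract does not say how the F1-score was averaged, so this is an assumption, and the function name is illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
m_precision = 91.52
m_recall = 90.81

f1 = f1_from_precision_recall(m_precision, m_recall)
print(f"{f1:.2f}")  # 91.16
```

Under that assumption the computed value agrees with the 91.16% stated in the abstract.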
Affiliation(s)
- Xiushun Zhao
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006, China
- Libing Lai
  - Department of Urology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
- Yunjiao Li
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006, China
- Xiaochen Zhou
  - Department of Urology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
- Xiaofeng Cheng
  - Department of Urology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
- Yujun Chen
  - Department of Urology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
- Haohui Huang
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006, China
- Jing Guo
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006, China
- Gongxian Wang
  - Department of Urology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
4
Zhao X, Guo J, He Z, Jiang X, Lou H, Li D. CLAD-Net: cross-layer aggregation attention network for real-time endoscopic instrument detection. Health Inf Sci Syst 2023; 11:58. [PMID: 38028959 PMCID: PMC10678866 DOI: 10.1007/s13755-023-00260-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 11/05/2023] [Indexed: 12/01/2023] Open
Abstract
As medical treatments continue to advance rapidly, minimally invasive surgery (MIS) has found extensive applications across various clinical procedures. Accurate identification of medical instruments plays a vital role in comprehending surgical situations and facilitating endoscopic image-guided surgical procedures. However, endoscopic instrument detection poses a great challenge owing to the narrow operating space, various interfering factors (e.g. smoke, blood, body fluids), and inevitable issues (e.g. mirror reflection, visual obstruction, illumination variation) during surgery. To promote surgical efficiency and safety in MIS, this paper proposes a cross-layer aggregated attention detection network (CLAD-Net) for accurate and real-time detection of endoscopic instruments in complex surgical scenarios. We propose a cross-layer aggregation attention module to enhance the fusion of features and improve the lateral propagation of feature information. We propose a composite attention mechanism (CAM) to extract contextual information at different scales and model the importance of each channel in the feature map, which mitigates the information loss due to feature fusion and effectively addresses inconsistent target sizes and low contrast in complex contexts. Moreover, the proposed feature refinement module (RM) enhances the network's ability to extract target edge and detail information by adaptively adjusting the feature weights used to fuse features from different layers. The performance of CLAD-Net was evaluated on the public laparoscopic dataset Cholec80 and on a neuroendoscopic dataset from Sun Yat-sen University Cancer Center. On the two datasets, CLAD-Net achieves an AP0.5 of 98.9% and 98.6%, respectively, which is better than advanced detection networks. A video of the real-time detection is available at the following link: https://github.com/A0268/video-demo.
Affiliation(s)
- Xiushun Zhao
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006 China
- Jing Guo
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006 China
- Zhaoshui He
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006 China
- Xiaobing Jiang
  - Department of Neurosurgery, Sun Yat-Sen University Cancer Center, Guangzhou, 510006 China
- Haifang Lou
  - Department of Gastroenterology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, 310006 China
- Depei Li
  - Department of Neurosurgery, Sun Yat-Sen University Cancer Center, Guangzhou, 510006 China
5
Luo F, Cui Y, Wang X, Zhang Z, Liao Y. Adaptive rotation attention network for accurate defect detection on magnetic tile surface. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:17554-17568. [PMID: 37920065 DOI: 10.3934/mbe.2023779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2023]
Abstract
Defect detection on magnetic tile surfaces is of great significance for the production monitoring of permanent magnet motors. However, detecting surface defects on magnetic tiles is challenging for two reasons: 1) defects appear randomly on the surface of the magnetic tile; 2) the defects are tiny and often overwhelmed by the background. To address these problems, an Adaptive Rotation Attention Network (ARA-Net) is proposed for defect detection on the magnetic tile surface. The Adaptive Rotation Convolution (ARC) module is devised to capture the random defects on the magnetic tile surface by learning multi-view feature maps, and the Rotation Region Attention (RAA) module is designed to locate small defects against the complicated background by focusing more attention on the defect features. Experiments conducted on the MTSD3C6K dataset demonstrate that the proposed ARA-Net outperforms state-of-the-art methods, further providing assistance for permanent magnet motor monitoring.
Affiliation(s)
- Fang Luo
  - School of Mechatronics and Automotive Engineering, Qingyuan Polytechnic, Qingyuan 511500, China
- Yuan Cui
  - Department of Intelligent Control, Guangzhou Light Industry Vocational School, Guangzhou 510300, China
- Xu Wang
  - School of Automation, Guangdong University of Technology, Guangzhou 510006, China
- Zhiliang Zhang
  - School of Mechatronics and Automotive Engineering, Qingyuan Polytechnic, Qingyuan 511500, China
- Yong Liao
  - Microelectronics and Optoelectronics Technology Key Laboratory of Hunan Higher Education, School of Physics and Electronic Electrical Engineering, Xiangnan University, Chenzhou 423000, China
6
Wang H, Yang X, Zhou B, Shi Z, Zhan D, Huang R, Lin J, Wu Z, Long D. Strip Surface Defect Detection Algorithm Based on YOLOv5. MATERIALS (BASEL, SWITZERLAND) 2023; 16:2811. [PMID: 37049103 PMCID: PMC10096323 DOI: 10.3390/ma16072811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 03/26/2023] [Accepted: 03/29/2023] [Indexed: 06/19/2023]
Abstract
To improve the accuracy of surface defect detection for industrial hot-rolled strip steel, deep learning is applied to the task. In this paper, we propose a framework for strip surface defect detection based on a convolutional neural network (CNN). In particular, we propose a novel multi-scale feature fusion module (ATPF) that integrates multi-scale features and adaptively assigns a weight to each feature. This module can extract semantic information at different scales more fully. Based on this module, we build a deep learning network, CG-Net, that is suitable for strip surface defect detection. The test results show that it achieves a mean average precision (mAP50) of 75.9% at 6.5 giga floating-point operations (GFLOPs) and 105 frames per second (FPS). The detection accuracy improves by 6.3% over the baseline YOLOv5s. Compared with YOLOv5s, the number of parameters and the amount of computation are reduced by 67% and 59.5%, respectively. We also verify that our model exhibits good generalization performance on the NEU-CLS dataset.
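The relative figures in this abstract can be turned back into approximate baseline values. The sketch below is plain arithmetic on the reported numbers, not data from the paper. It assumes the 59.5% reduction is relative to the YOLOv5s computation, and because the abstract does not say whether "6.3%" means percentage points or a relative gain, it computes both readings. All names are illustrative.

```python
def implied_baseline(value: float, reduction_pct: float) -> float:
    """Baseline that yields `value` after a relative reduction of `reduction_pct` percent."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return value / (1 - reduction_pct / 100)


# 6.5 GFLOPs after a 59.5% reduction implies this baseline cost.
baseline_gflops = implied_baseline(6.5, 59.5)    # about 16.05 GFLOPs

# Two readings of "improved by 6.3%" for an mAP50 of 75.9%.
map50 = 75.9
gain = 6.3
baseline_if_points = map50 - gain                # about 69.6 (percentage points)
baseline_if_relative = map50 / (1 + gain / 100)  # about 71.4 (relative gain)

print(f"{baseline_gflops:.2f} {baseline_if_points:.1f} {baseline_if_relative:.1f}")
```

The two mAP50 readings differ by almost two points, which is why comparisons of this kind are easier to interpret when the baseline value itself is reported.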
Affiliation(s)
- Han Wang
  - School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Xiuding Yang
  - School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Bei Zhou
  - School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Zhuohao Shi
  - School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Daohua Zhan
  - School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Renbin Huang
  - School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Jian Lin
  - School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Zhiheng Wu
  - Institute of Intelligent Manufacturing, Guangdong Academy of Sciences, Guangzhou 510070, China
  - Guangdong Provincial Key Laboratory of Modern Control Technology, Guangzhou 510070, China
- Danfeng Long
  - Institute of Intelligent Manufacturing, Guangdong Academy of Sciences, Guangzhou 510070, China
  - Guangdong Provincial Key Laboratory of Modern Control Technology, Guangzhou 510070, China