1. Zhang J, Sun S, Song W, Li Y, Teng Q. A novel convolutional neural network for enhancing the continuity of pavement crack detection. Sci Rep 2024;14:30376. PMID: 39639139. PMCID: PMC11621684. DOI: 10.1038/s41598-024-81119-1.
Abstract
Pavement cracks affect the structural stability and safety of roads, making accurate crack identification essential for assessing the extent of damage and evaluating road health. However, traditional convolutional neural networks often struggle with missed and false detections when extracting cracks. This paper introduces CPCDNet, a network designed to maintain continuous extraction of pavement cracks. The model incorporates a Crack Align Module (CAM) and a Weighted Edge Cross Entropy Loss Function (WECEL) to enhance the continuity of crack extraction in complex environments. Experimental results show that the proposed model achieves mIoU scores of 77.71%, 80.36%, 91.19%, and 71.16% on the public datasets CFD, Crack500, Deepcrack537, and Gaps384, respectively. Compared with other networks, the proposed method improves the continuity and accuracy of crack extraction.
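The abstract does not give the exact form of the WECEL loss; the following is a minimal PyTorch sketch of one plausible reading, an edge-weighted binary cross-entropy in which pixels in a thin band around the ground-truth crack boundary (approximated by morphological dilation minus erosion, implemented with max-pooling) receive a larger penalty. The function name and the `edge_weight` parameter are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def edge_weighted_bce(logits, target, edge_weight=5.0, kernel_size=3):
    """Hypothetical edge-weighted BCE for crack segmentation.

    logits, target: float tensors of shape (N, 1, H, W); target values in {0, 1}.
    The edge band is approximated as dilation(target) - erosion(target),
    with both morphological operations realised via max-pooling.
    """
    pad = kernel_size // 2
    dilated = F.max_pool2d(target, kernel_size, stride=1, padding=pad)
    eroded = -F.max_pool2d(-target, kernel_size, stride=1, padding=pad)
    edge_band = (dilated - eroded).clamp(0.0, 1.0)   # 1 inside the boundary band
    weights = 1.0 + edge_weight * edge_band          # heavier penalty near crack edges
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)
```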
Affiliation(s)
- Jinhe Zhang: School of Geomatics, Liaoning Technical University, Fuxin 123000, China; Collaborative Innovation Institute of Geospatial Information Service, Liaoning Technical University, Fuxin 123000, China
- Shangyu Sun: School of Geomatics, Liaoning Technical University, Fuxin 123000, China; Collaborative Innovation Institute of Geospatial Information Service, Liaoning Technical University, Fuxin 123000, China; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
- Weidong Song: Collaborative Innovation Institute of Geospatial Information Service, Liaoning Technical University, Fuxin 123000, China
- Yuxuan Li: School of Geomatics, Liaoning Technical University, Fuxin 123000, China; Collaborative Innovation Institute of Geospatial Information Service, Liaoning Technical University, Fuxin 123000, China
- Qiaoshuang Teng: School of Geomatics, Liaoning Technical University, Fuxin 123000, China; Collaborative Innovation Institute of Geospatial Information Service, Liaoning Technical University, Fuxin 123000, China
2. Chakurkar PS, Vora D, Patil S, Kotecha K. Automated crack localization for road safety using contextual u-net with spatial-channel feature integration. MethodsX 2024;13:102796. PMID: 39669512. PMCID: PMC11637186. DOI: 10.1016/j.mex.2024.102796.
Abstract
Accurate and timely crack localization is crucial for road safety and maintenance, but image processing and hand-crafted feature engineering methods often fail to distinguish cracks from background noise under diverse lighting and surface conditions. This paper proposes a framework utilizing a contextual U-Net deep learning model to automatically localize cracks in road images. The framework treats crack localization as a pixel-level segmentation task, analyzing each pixel in a road image to determine whether it belongs to a crack. The proposed U-Net model uses a robust EfficientNet encoder to capture crucial details (spatial features) and overall patterns (channel-wise features) within the road image. This balanced approach helps the model learn effectively from both individual elements and the overall context of the images, leading to improved crack detection. A customized hierarchical attention mechanism is designed to make the U-Net model contextually adaptive, focusing on relevant features at different scales and resolutions so that road cracks varying widely in size and shape can be localized accurately. The model's effectiveness is demonstrated through extensive evaluations on benchmark and custom-made datasets.
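The paper's exact attention design is not described in the abstract; below is a small illustrative PyTorch block showing one common way to integrate channel-wise and spatial features on a U-Net skip connection (a squeeze-and-excitation-style channel gate followed by a 1x1-conv spatial gate). All names are placeholders, not the authors' modules.

```python
import torch.nn as nn

class SpatialChannelGate(nn.Module):
    """Illustrative spatial-channel feature integration for a skip connection."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(          # channel-wise "overall pattern" weighting
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(          # per-pixel "spatial detail" weighting
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)
        return x * self.spatial_gate(x)
```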
Affiliation(s)
- Priti S. Chakurkar: Computer Science and Engineering, Symbiosis Institute of Technology Pune, Symbiosis International (Deemed University) (SIU), Lavale, Pune, Maharashtra, India; School of Computer Engineering, Dr. Vishwanath Karad MIT World Peace University, Kothrud, Pune, Maharashtra, India
- Deepali Vora: Computer Science and Engineering, Symbiosis Institute of Technology Pune, Symbiosis International (Deemed University) (SIU), Lavale, Pune, Maharashtra, India
- Shruti Patil: Artificial Intelligence and Machine Learning (AIML) Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University) (SIU), Lavale, Pune, Maharashtra, India
- Ketan Kotecha: Computer Science and Engineering, Symbiosis Institute of Technology Pune, Symbiosis International (Deemed University) (SIU), Lavale, Pune, Maharashtra, India
3. Li H, Hussin N, He D, Geng Z, Li S. Design of image segmentation model based on residual connection and feature fusion. PLoS One 2024;19:e0309434. PMID: 39361568. PMCID: PMC11449362. DOI: 10.1371/journal.pone.0309434.
Abstract
With the development of deep learning technology, convolutional neural networks have made great progress in the field of image segmentation. However, for complex scenes and multi-scale target images, existing techniques are still unable to achieve effective image segmentation. In view of this, an image segmentation model based on residual connections and feature fusion is proposed. The model makes comprehensive use of the deep feature extraction ability of residual connections and the multi-scale feature integration ability of feature fusion. To address background complexity and information loss in traditional image segmentation, experiments were carried out on two publicly available datasets. The results showed that on the ISPRS Vaihingen dataset and the Caltech UCSD Birds200 dataset, when the model completed the 56th and 84th iterations, respectively, the average accuracy of FRes-MFDNN was the highest, at 97.89% and 98.24%, respectively. On the same two datasets, at runtimes of 0.20 s and 0.26 s, the F1 value of the FRes-MFDNN method was the largest, approaching 100%. FRes-MFDNN segmented four images from the ISPRS Vaihingen dataset, and the segmentation accuracy of images 1, 2, 3, and 4 was 91.44%, 92.12%, 94.02%, and 91.41%, respectively. In practical applications, the MSRF-Net, LBN-AA-SPN, ARG-Otsu, and FRes-MFDNN methods were used to segment unlabeled bird images. The results showed that FRes-MFDNN preserved details more completely, and its overall effect was significantly better than that of the other three models. Meanwhile, in ordinary scene images, despite a certain degree of noise and occlusion, the model still accurately recognized and segmented the main bird subjects. Compared with traditional models, the completeness, detail, and spatial continuity of pixels after FRes-MFDNN segmentation were significantly improved, making it more suitable for complex scenes.
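The abstract names the two core ingredients, residual connections and multi-scale feature fusion. The sketch below shows generic PyTorch versions of both: an identity-shortcut residual block and a module that upsamples coarser feature maps and fuses them with a 1x1 convolution. It is a schematic illustration under those assumptions, not the FRes-MFDNN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Identity-shortcut residual block: deep features pass through unchanged."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return F.relu(x + self.body(x))

class MultiScaleFusion(nn.Module):
    """Upsample coarser maps to the finest resolution and fuse with a 1x1 conv."""
    def __init__(self, in_channels_list, out_channels=64):
        super().__init__()
        self.proj = nn.Conv2d(sum(in_channels_list), out_channels, kernel_size=1)

    def forward(self, features):                     # features[0] has the finest resolution
        target = features[0].shape[-2:]
        ups = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
               for f in features]
        return self.proj(torch.cat(ups, dim=1))
```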
Affiliation(s)
- Hong Li: School of Information Engineering, Pingdingshan University, Pingdingshan, China; Faculty of Engineering, Built Environment and Information Technology, SEGi University, Kota Damansara, Malaysia
- Norriza Hussin: Faculty of Engineering, Built Environment and Information Technology, SEGi University, Kota Damansara, Malaysia
- Dandan He: School of Information Engineering, Pingdingshan University, Pingdingshan, China; Faculty of Engineering, Built Environment and Information Technology, SEGi University, Kota Damansara, Malaysia
- Zexun Geng: School of Information Engineering, Pingdingshan University, Pingdingshan, China
- Shengpu Li: School of Information Engineering, Pingdingshan University, Pingdingshan, China
4. Yuan S, Zhang L, Dong R, Xiong J, Zheng J, Fu H, Gong P. Relational Part-Aware Learning for Complex Composite Object Detection in High-Resolution Remote Sensing Images. IEEE Trans Cybern 2024;54:6118-6131. PMID: 38768005. DOI: 10.1109/tcyb.2024.3392474.
Abstract
In high-resolution remote sensing images (RSIs), complex composite object detection (e.g., coal-fired power plant detection and harbor detection) is challenging because such targets consist of multiple discrete parts with variable layouts, leading to weak inter-part relationships and blurred boundaries rather than a clearly defined single object. To address this issue, this article proposes an end-to-end framework, the relational part-aware network (REPAN), to explore the semantic correlation and extract discriminative features among multiple parts. Specifically, we first design a part region proposal network (P-RPN) to locate discriminative yet subtle regions; with butterfly units (BFUs) embedded, feature-scale confusion stemming from aliasing effects can be largely alleviated. Second, a feature relation Transformer (FRT) explores the spatial relationships through part-and-global joint learning, mining correlations between various parts to enhance significant part representations. Finally, a contextual detector (CD) classifies and detects the parts and the whole composite object through multirelation-aware features, where part information guides the localization of the whole object. We collect three remote sensing object detection datasets covering four categories to evaluate our method. Consistently surpassing state-of-the-art methods, the results of extensive experiments underscore the effectiveness and superiority of the proposed method.
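To make the part-and-global relation idea concrete, here is a toy PyTorch stand-in: each detected part is summarised as a feature vector, a learnable global token is prepended, and a standard Transformer encoder lets the parts and the global token exchange context. This is an illustration of the general mechanism only, not the paper's FRT; the module name and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class PartRelationEncoder(nn.Module):
    """Toy part-and-global joint relation modelling with a Transformer encoder."""
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        self.global_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, part_feats):                    # part_feats: (B, num_parts, dim)
        b = part_feats.size(0)
        tokens = torch.cat([self.global_token.expand(b, -1, -1), part_feats], dim=1)
        tokens = self.encoder(tokens)
        return tokens[:, 0], tokens[:, 1:]            # global descriptor, refined part features
```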
5. Zhang Y, Lu Y, Huo Z, Li J, Sun Y, Huang H. USSC-YOLO: Enhanced Multi-Scale Road Crack Object Detection Algorithm for UAV Image. Sensors (Basel) 2024;24:5586. PMID: 39275498. PMCID: PMC11398180. DOI: 10.3390/s24175586.
Abstract
Road crack detection is of paramount importance for ensuring vehicular traffic safety, and implementing traditional detection methods for cracks inevitably impedes the normal flow of traffic. In light of the above, we propose USSC-YOLO, a machine-vision-based target detection algorithm for road cracks in unmanned aerial vehicle (UAV) images. The algorithm aims to achieve high-precision detection of road cracks at all scale levels. Compared with the original YOLOv5s, the main improvements in USSC-YOLO are the ShuffleNet V2 block, the coordinate attention (CA) mechanism, and the Swin Transformer. First, to address the problem of large network computational cost, we replace the backbone of YOLOv5s with ShuffleNet V2 blocks, significantly reducing computational overhead. Next, to mitigate complex background interference, we introduce the CA mechanism into the backbone network, which reduces the missed- and false-detection rates. Finally, we integrate a Swin Transformer block at the end of the neck to enhance detection accuracy for small crack targets. Experimental results on our self-constructed UAV near-far scene road crack image (UNFSRCI) dataset demonstrate that the model reduces giga floating-point operations (GFLOPs) compared with YOLOv5s while achieving a 6.3% increase in mAP@50 and a 12% improvement in mAP@[50:95]. This indicates that the model remains lightweight while providing excellent detection performance. In future work, we will assess road safety conditions based on these detection results to prioritize maintenance sequences for crack targets and facilitate further intelligent management.
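The CA mechanism referenced here follows the general coordinate attention design (Hou et al., 2021): features are pooled separately along the height and width axes, processed jointly, and converted into two directional attention maps. The PyTorch block below is a simplified sketch of that published idea, not the USSC-YOLO implementation; the reduction ratio and use of mean pooling are assumptions.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Simplified coordinate attention: direction-aware pooling and gating."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                       # (b, c, h, 1)
        pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (b, c, w, 1)
        y = torch.cat([pooled_h, pooled_w], dim=2)                   # joint encoding
        y = self.act(self.bn(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        att_h = torch.sigmoid(self.conv_h(y_h))                      # (b, c, h, 1)
        att_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * att_h * att_w
```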
Affiliation(s)
- Yanxiang Zhang: College of Civil Engineering, Central South University of Forestry & Technology, Changsha 410004, China
- Yao Lu: College of Information Science and Technology, Shihezi University, Shihezi 832003, China
- Zijian Huo: College of Water Conservancy and Transportation, Zhengzhou University, Zhengzhou 450001, China
- Jiale Li: College of Water Conservancy and Transportation, Zhengzhou University, Zhengzhou 450001, China
- Yurong Sun: College of Computer and Mathematics, Central South University of Forestry & Technology, Changsha 410004, China
- Hao Huang: College of Computer and Mathematics, Central South University of Forestry & Technology, Changsha 410004, China
6. Kong W, Li B, Wei K, Li D, Zhu J, Yu G. Dual contrast attention-guided multi-frequency fusion for multi-contrast MRI super-resolution. Phys Med Biol 2023;69:015010. PMID: 37944482. DOI: 10.1088/1361-6560/ad0b65.
Abstract
Objective. Multi-contrast magnetic resonance (MR) imaging super-resolution (SR) reconstruction is an effective solution for acquiring high-resolution MR images. It utilizes anatomical information from auxiliary contrast images to improve the quality of the target contrast images. However, existing studies have simply explored the relationships between auxiliary contrast and target contrast images but did not fully consider the different anatomical information contained in multi-contrast images, resulting in texture details and artifacts unrelated to the target contrast images. Approach. To address these issues, we propose a dual contrast attention-guided multi-frequency fusion (DCAMF) network to reconstruct SR MR images from low-resolution MR images, which adaptively captures relevant anatomical information and processes the texture details and low-frequency information from multi-contrast images in parallel. Specifically, after feature extraction, a feature selection module based on a dual contrast attention mechanism is proposed to focus on the texture details of the auxiliary contrast images and the low-frequency features of the target contrast images. Then, based on the characteristics of the selected features, a high- and low-frequency fusion decoder is constructed to fuse these features. In addition, a texture-enhancing module is embedded in the high-frequency fusion decoder to highlight and refine the texture details of the auxiliary contrast and target contrast images. Finally, the high- and low-frequency fusion process is constrained by integrating a deeply-supervised mechanism into the DCAMF network. Main results. The experimental results show that DCAMF outperforms other state-of-the-art methods. The peak signal-to-noise ratio and structural similarity of DCAMF are 39.02 dB and 0.9771 on the IXI dataset and 37.59 dB and 0.9770 on the BraTS2018 dataset, respectively. The image recovery is further validated in segmentation tasks. Significance. Our proposed SR model can enhance the quality of MR images. The results of the SR study provide a reliable basis for clinical diagnosis and subsequent image-guided treatment.
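The network fuses high-frequency (texture) and low-frequency (structure) information from learned features. Purely as an intuition-builder, the snippet below splits an image tensor into low- and high-frequency components with a separable Gaussian blur, the kind of decomposition such dual-branch fusion designs operate on conceptually; it is not part of the DCAMF network, and the kernel size and sigma are arbitrary.

```python
import torch
import torch.nn.functional as F

def split_frequencies(img, kernel_size=9, sigma=2.0):
    """Rough low/high-frequency decomposition of an image tensor (N, C, H, W)."""
    coords = torch.arange(kernel_size, dtype=torch.float32) - kernel_size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).view(1, 1, 1, -1)                        # separable 1-D Gaussian kernel
    c = img.shape[1]
    low = F.conv2d(img, g.expand(c, 1, 1, kernel_size),
                   padding=(0, kernel_size // 2), groups=c)    # horizontal blur
    low = F.conv2d(low, g.transpose(2, 3).expand(c, 1, kernel_size, 1),
                   padding=(kernel_size // 2, 0), groups=c)    # vertical blur
    high = img - low                                           # residual texture / detail
    return low, high
```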
Affiliation(s)
- Weipeng Kong: Shandong Key Laboratory of Medical Physics and Image Processing, Shandong Institute of Industrial Technology for Health Sciences and Precision Medicine, School of Physics and Electronics, Shandong Normal University, Jinan, People's Republic of China
- Baosheng Li: Department of Radiation Oncology Physics, Shandong Cancer Hospital and Institute, Shandong Cancer Hospital Affiliated to Shandong University, Jinan, People's Republic of China
- Kexin Wei: Shandong Key Laboratory of Medical Physics and Image Processing, Shandong Institute of Industrial Technology for Health Sciences and Precision Medicine, School of Physics and Electronics, Shandong Normal University, Jinan, People's Republic of China
- Dengwang Li: Shandong Key Laboratory of Medical Physics and Image Processing, Shandong Institute of Industrial Technology for Health Sciences and Precision Medicine, School of Physics and Electronics, Shandong Normal University, Jinan, People's Republic of China
- Jian Zhu: Department of Radiation Oncology Physics, Shandong Cancer Hospital and Institute, Shandong Cancer Hospital Affiliated to Shandong University, Jinan, People's Republic of China
- Gang Yu: Shandong Key Laboratory of Medical Physics and Image Processing, Shandong Institute of Industrial Technology for Health Sciences and Precision Medicine, School of Physics and Electronics, Shandong Normal University, Jinan, People's Republic of China
7. Ren Z, Zhang H, Li Z. Improved YOLOv5 Network for Real-Time Object Detection in Vehicle-Mounted Camera Capture Scenarios. Sensors (Basel) 2023;23:4589. PMID: 37430502. DOI: 10.3390/s23104589.
Abstract
Object detection from vehicle-mounted cameras during driving is a convenient and efficient approach. However, owing to the complex variation of the road environment and vehicle speed, the target scale changes significantly and is often accompanied by motion blur, both of which strongly affect detection accuracy. In practical application scenarios, it is difficult for traditional methods to satisfy the requirements of real-time detection and high accuracy simultaneously. To address these problems, this study proposes an improved network based on YOLOv5, taking traffic signs and road cracks as detection objects and studying each separately. For road cracks, the paper proposes a GS-FPN structure to replace the original feature fusion structure. This structure integrates the convolutional block attention module (CBAM) into a bidirectional feature pyramid network (Bi-FPN) and introduces a new lightweight convolution module (GSConv) to reduce information loss in the feature maps, enhance the expressive ability of the network, and ultimately improve recognition performance. For traffic signs, a four-scale feature detection structure is used to add a shallow detection scale and improve recognition accuracy for small targets. In addition, this study combines various data augmentation methods to improve the robustness of the network. In experiments on a road crack dataset of 2164 images and a traffic sign dataset of 8146 images annotated with LabelImg, the modified YOLOv5 network improves the mean average precision (mAP) on the road crack dataset and on small targets in the traffic sign dataset by 3% and 12.2%, respectively, compared with the baseline model (YOLOv5s).
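GSConv comes from the "slim-neck by GSConv" line of work: half the output channels are produced by a standard convolution, the other half by a cheap depth-wise convolution applied to that result, and the two halves are mixed by a channel shuffle. The PyTorch block below is a simplified sketch in that spirit; kernel sizes and naming are illustrative rather than copied from the cited paper.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    """Interleave channels across groups so the two halves mix."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class GSConv(nn.Module):
    """Lightweight convolution: standard conv + depth-wise conv + channel shuffle."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(                       # standard conv branch
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU(inplace=True))
        self.cheap = nn.Sequential(                       # depth-wise branch on the dense output
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU(inplace=True))

    def forward(self, x):
        y1 = self.dense(x)
        y2 = self.cheap(y1)
        return channel_shuffle(torch.cat([y1, y2], dim=1), groups=2)
```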
Affiliation(s)
- Zuyue Ren: School of Information Engineering, Minzu University of China, Beijing 100080, China
- Hong Zhang: School of Information Engineering, Minzu University of China, Beijing 100080, China
- Zan Li: School of Information Engineering, Minzu University of China, Beijing 100080, China
8. Quan J, Ge B, Wang M. CrackViT: a unified CNN-transformer model for pixel-level crack extraction. Neural Comput Appl 2023. DOI: 10.1007/s00521-023-08277-7.
9. Li H, Wang W, Wang M, Li L, Vimlund V. A review of deep learning methods for pixel-level crack detection. J Traffic Transp Eng (Engl Ed) 2022. DOI: 10.1016/j.jtte.2022.11.003.
10.
Abstract
Cracks are widespread in infrastructure closely related to human activity, and using artificial intelligence to identify them automatically, known as crack detection, has become very popular. Background noise in crack images, the discontinuity of cracks, and other problems make crack detection a major challenge. Although many approaches have been proposed, two challenges remain: (1) cracks are long and complex in shape, making it difficult to capture long-range continuity; and (2) most images in crack datasets contain noise, and it is difficult to detect only the cracks while ignoring the noise. In this paper, we propose a novel method called the Transformer-based Multi-scale Fusion Model (TransMF) for crack detection, comprising an Encoder Module (EM), a Decoder Module (DM), and a Fusion Module (FM). The Encoder Module uses a hybrid of convolution blocks and Swin Transformer blocks to model the long-range dependencies of different parts of a crack image from both local and global perspectives. The Decoder Module is designed with a structure symmetrical to the Encoder Module. In the Fusion Module, the outputs of each layer of the Encoder Module and Decoder Module, each at its own scale, are fused by convolution, which suppresses the effect of background noise and strengthens the correlations between relevant context to enhance crack detection. Finally, the outputs of each layer of the Fusion Module are concatenated to produce the crack detection result. Extensive experiments on three benchmark datasets (CrackLS315, CRKWH100 and DeepCrack) demonstrate that the proposed TransMF exceeds the best performance of existing baselines.
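The per-scale fusion described here, same-scale encoder and decoder features fused by convolution into a scale-specific prediction, can be sketched as the small PyTorch module below. It is one plausible reading of the abstract, not the paper's exact Fusion Module; the final prediction would then upsample and concatenate the per-scale logit maps as the abstract describes.

```python
import torch
import torch.nn as nn

class ScaleFusion(nn.Module):
    """Fuse same-scale encoder and decoder features into one crack logit map."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),   # per-scale crack logit map
        )

    def forward(self, enc_feat, dec_feat):           # both (N, channels, H, W)
        return self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
```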
11. Chen GH, Ni J, Chen Z, Huang H, Sun YL, Ip WH, Yung KL. Detection of Highway Pavement Damage Based on a CNN Using Grayscale and HOG Features. Sensors (Basel) 2022;22:2455. PMID: 35408070. PMCID: PMC9002920. DOI: 10.3390/s22072455.
Abstract
Aiming at the demand for rapid detection of highway pavement damage, many deep learning methods based on convolutional neural networks (CNNs) have been developed. However, CNN methods operating on raw image data require a high-performance hardware configuration and consume considerable machine time. To reduce machine time and to apply the detection methods in common scenarios, the CNN structure fed with preprocessed image data needs to be simplified. In this work, a detection method based on a CNN and the combination of grayscale and histogram of oriented gradients (HOG) features is proposed. First, gamma correction was employed to highlight the grayscale distribution of the damaged area while compressing the grayscale range of normal pavement. The preprocessed image was then divided into several unit cells, whose grayscale and HOG were calculated, respectively. The grayscale and HOG of each unit cell were combined to construct grayscale-weighted HOG (GHOG) feature patterns, which were input to a CNN with a specific structure and parameters. The evaluation indices suggested that the performance of the GHOG-based method was significantly improved compared with the traditional HOG-based method. Furthermore, at the same accuracy, the GHOG-feature-based CNN technique exhibited flexibility and effectiveness in comparison with deep learning techniques that deal directly with raw data. Since grayscale has a definite physical meaning, the present detection method has potential for the further detection of damage details in the future.
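The GHOG construction described above, gamma correction, per-cell HOG, and weighting each cell's HOG vector by its grayscale, can be sketched with NumPy and scikit-image as below. Weighting by the mean cell grayscale is an assumption based on the abstract; the paper's exact formulation may differ.

```python
import numpy as np
from skimage.feature import hog

def ghog_features(gray, gamma=0.5, cell=(8, 8), orientations=9):
    """Sketch of grayscale-weighted HOG (GHOG) feature patterns.

    gray: 2-D uint8 grayscale image. Returns an array of shape
    (n_cells_y, n_cells_x, orientations).
    """
    img = (gray.astype(np.float32) / 255.0) ** gamma           # gamma correction highlights damage
    cells = hog(img, orientations=orientations, pixels_per_cell=cell,
                cells_per_block=(1, 1), feature_vector=False)   # per-cell HOG histograms
    n_cy, n_cx = cells.shape[:2]
    cells = cells.reshape(n_cy, n_cx, orientations)
    # Mean grayscale of each cell, used here to weight that cell's HOG vector.
    h, w = n_cy * cell[0], n_cx * cell[1]
    means = img[:h, :w].reshape(n_cy, cell[0], n_cx, cell[1]).mean(axis=(1, 3))
    return cells * means[..., None]                             # grayscale-weighted HOG pattern
```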
Affiliation(s)
- Guo-Hong Chen: School of Information and Electrical Engineering, Zhejiang University City College, 51 Huzhou Street, Hangzhou 310015, China
- Jie Ni: School of Information and Electrical Engineering, Zhejiang University City College, 51 Huzhou Street, Hangzhou 310015, China
- Zhuo Chen: School of Information and Electrical Engineering, Zhejiang University City College, 51 Huzhou Street, Hangzhou 310015, China
- Hao Huang (corresponding author): Hubei Key Laboratory of Ferro- & Piezoelectric Materials and Devices, Faculty of Physics and Electronic Science, Hubei University, 368 Youyi Street, Wuhan 430062, China; Key Laboratory of Wireless Sensor Network & Communication, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, 865 Changning Road, Shanghai 200050, China
- Yun-Lei Sun (corresponding author): School of Information and Electrical Engineering, Zhejiang University City College, 51 Huzhou Street, Hangzhou 310015, China
- Wai Hung Ip: Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, China
- Kai Leung Yung: Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, China
12. An Automatic Surface Defect Detection Method with Residual Attention Network. Artif Intell 2022. DOI: 10.1007/978-3-031-20500-2_16.