1. Yang J, Chen G, Huang J, Ma D, Liu J, Zhu H. GLE-Net: global-local information enhancement for semantic segmentation of remote sensing images. Sci Rep 2024;14:25282. PMID: 39455717; PMCID: PMC11512047; DOI: 10.1038/s41598-024-76622-4.
Abstract
Remote sensing (RS) images contain a wealth of information and offer broad potential for image segmentation applications. However, convolutional neural networks (CNNs) struggle to fully capture global contextual information. Leveraging the strong global modeling capability of the Swin Transformer, we introduce GLE-Net, a novel RS image segmentation model that combines the Swin Transformer with a CNN in a redesigned encoder. The sub-branch first extracts features at multiple scales from the RS images using the Multiscale Feature Fusion Module (MFM), acquiring rich semantic information, capturing fine local detail, and handling occlusions. The Feature Compression Module (FCM) is then introduced in the main branch to downsize the feature map, reducing information loss while preserving fine detail and improving segmentation accuracy for small targets. Finally, the Spatial Information Enhancement Module (SIEM) integrates local and global features for comprehensive feature modeling, strengthening the model's segmentation capability. Experiments on public ISPRS datasets yield strong results, underscoring the model's potential for RS image segmentation.
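Editor's note: the paper's SIEM fuses local CNN features with global Swin Transformer features. As an illustration only, the minimal PyTorch sketch below shows one plausible way such a global-local fusion step could look; the gating design and all module names are assumptions, not the authors' released code.

```python
# Hypothetical sketch of a global-local fusion step in the spirit of GLE-Net's
# SIEM; the gating design and names are assumptions, not the authors' code.
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    """Fuse a local CNN feature map with a global transformer feature map
    of the same spatial size via a learned spatial attention gate."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv mixes the concatenated features; sigmoid yields a gate.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([local_feat, global_feat], dim=1)
        attn = self.gate(fused)  # (B, C, H, W) spatial weights
        # Weighted blend: the gate decides where global context dominates,
        # with a residual connection back to the local features.
        return self.project(fused) * attn + local_feat

# Toy usage: 64-channel features at 32x32 resolution.
x_local = torch.randn(2, 64, 32, 32)
x_global = torch.randn(2, 64, 32, 32)
out = GlobalLocalFusion(64)(x_local, x_global)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```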
Affiliation(s)
- Junliang Yang
- Department of Intelligent Technology and Engineering, Chongqing University of Science and Technology, No. 20, East University Town Road, Shapingba District, Chongqing, 401331, China
- Guorong Chen
- Department of Intelligent Technology and Engineering, Chongqing University of Science and Technology, No. 20, East University Town Road, Shapingba District, Chongqing, 401331, China
- Jiaming Huang
- Department of Intelligent Technology and Engineering, Chongqing University of Science and Technology, No. 20, East University Town Road, Shapingba District, Chongqing, 401331, China
- Denglong Ma
- School of Mechanical Engineering, Xi'an Jiaotong University, No. 28, Xianning West Road, Xi'an, 710049, Shaanxi, China
- Jingcheng Liu
- Luzhou Vocational and Technical College, No. 2, Changqiao Road, Longmatan District, Luzhou, 646299, Sichuan, China
- Huazheng Zhu
- Department of Intelligent Technology and Engineering, Chongqing University of Science and Technology, No. 20, East University Town Road, Shapingba District, Chongqing, 401331, China
2. Liu H, Li J, Tong J, Li G, Wang Q, Zhang M. Real-time spatiotemporal action localization algorithm using improved CNNs architecture. Sci Rep 2024;14:24772. PMID: 39433784; PMCID: PMC11493972; DOI: 10.1038/s41598-024-73622-2.
Abstract
This paper proposes a faster and more accurate network for human spatiotemporal action localization. Like the YOWO model, we use convolutional neural networks (CNNs) for feature extraction, but our model differs from YOWO in three significant ways. First, we do not use a feature fusion strategy; instead, we use only spatial features extracted by 2D CNNs for action localization and spatiotemporal features extracted by 3D CNNs for action recognition. Second, we improve the 2D CNN by introducing a coordinate attention mechanism and use the CIoU loss instead of the coordinate offset loss for bounding box regression. Third, we provide a lighter and faster spatiotemporal action localization architecture that uses 21.76 million fewer parameters than YOWO and runs at 39 fps on 16-frame input clips. We evaluate our model on three public datasets: UCF-Sports, JHMDB-21, and UCF101-24. Compared with YOWO, we improve frame-mAP (@IoU 0.5) by 17.09% and 7.15% on UCF-Sports and JHMDB-21, and improve video-mAP by 2.7%, 8.7%, and 14.4% at IoU thresholds of 0.2, 0.5, and 0.75 on JHMDB-21.
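Editor's note: the second change, replacing coordinate-offset regression with the CIoU loss, follows the standard Complete IoU formulation. Below is a minimal PyTorch sketch of that loss for reference; the (x1, y1, x2, y2) box format is an assumption, and this is not the authors' code.

```python
# Minimal sketch of the Complete IoU (CIoU) loss for bounding box regression;
# boxes are assumed to be (x1, y1, x2, y2) corner coordinates, shape (N, 4).
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Intersection and union -> IoU.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared center distance, normalized by the squared diagonal
    # of the smallest enclosing box.
    cx_p = (pred[:, 0] + pred[:, 2]) / 2
    cy_p = (pred[:, 1] + pred[:, 3]) / 2
    cx_t = (target[:, 0] + target[:, 2]) / 2
    cy_t = (target[:, 1] + target[:, 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term with its trade-off weight alpha.
    w_p = pred[:, 2] - pred[:, 0]
    h_p = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    w_t = target[:, 2] - target[:, 0]
    h_t = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(w_t / h_t) - torch.atan(w_p / h_p)) ** 2
    alpha = v / (1 - iou + v + eps)

    return (1 - iou + rho2 / c2 + alpha * v).mean()
```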
Affiliation(s)
- Hengshuai Liu
- School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou, 014000, Inner Mongolia, China
- Jianjun Li
- School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou, 014000, Inner Mongolia, China
- Jiale Tong
- School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou, 014000, Inner Mongolia, China
- Guang Li
- School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou, 014000, Inner Mongolia, China
- Qian Wang
- School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou, 014000, Inner Mongolia, China
- Ming Zhang
- School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou, 014000, Inner Mongolia, China
3. Liu R, Jiang S, Ou J, Kouadio KL, Xiong B. Multifaceted anomaly detection framework for leachate monitoring in landfills. J Environ Manage 2024;368:122130. PMID: 39180823; DOI: 10.1016/j.jenvman.2024.122130.
Abstract
Preserving environmental resources has transcended traditional conservation efforts and become crucial for sustaining life; our deep interconnectedness with the natural environment, which directly affects our well-being, underscores this urgency. Contaminants such as landfill leachate increasingly threaten groundwater, a vital resource that provides drinking water for nearly half of the global population. This critical environmental threat requires advanced detection and monitoring solutions. To address this need, we introduce the Multifaceted Anomaly Detection Framework (MADF), which integrates Electrical Resistivity Tomography (ERT) with three machine learning models: Isolation Forest (IF), One-Class Support Vector Machines (OC-SVM), and Local Outlier Factor (LOF). MADF processes and analyzes ERT data, combining these models through a majority-vote strategy to identify and quantify anomaly signals accurately. Applied to the Chaling landfill site in Zhuzhou, China, MADF demonstrated significant improvements in detection capability: higher Youden Index values (≈6.216%), a 30% increase in sensitivity, and a 25% reduction in false positives compared with traditional ERT inversion methods. These enhancements matter for environmental monitoring, where a missed leak can be catastrophic and unnecessary interventions are resource-intensive. The results underscore MADF's potential as a robust tool for proactive environmental management, offering a scalable and adaptable solution for landfill monitoring and pollution prevention across varied environmental settings.
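Editor's note: the majority-vote strategy over Isolation Forest, One-Class SVM, and Local Outlier Factor maps directly onto standard scikit-learn estimators. The sketch below illustrates the voting idea on synthetic data; the features, hyperparameters, and two-of-three threshold are assumptions rather than MADF's actual configuration.

```python
# Minimal sketch of a majority-vote anomaly ensemble in the spirit of MADF;
# synthetic features stand in for resistivity-derived data, and all
# hyperparameters are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Mostly background points plus a few injected anomalies.
X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(6, 1, (10, 3))])

# Each detector labels points +1 (inlier) / -1 (outlier).
votes = np.column_stack([
    IsolationForest(random_state=0).fit_predict(X),
    OneClassSVM(nu=0.05).fit_predict(X),
    LocalOutlierFactor(n_neighbors=20).fit_predict(X),
])

# A point is flagged anomalous when at least two of the three detectors agree.
is_anomaly = (votes == -1).sum(axis=1) >= 2
print(f"flagged {is_anomaly.sum()} of {len(X)} points")
```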
Affiliation(s)
- Rong Liu
- School of Geosciences and Info-Physics, Central South University, Changsha, Hunan, 410083, China; Hunan Key Laboratory of Nonferrous Resources and Geological Hazards Exploration, Changsha, Hunan, 410083, China
- Shiyu Jiang
- School of Geosciences and Info-Physics, Central South University, Changsha, Hunan, 410083, China; Hunan Key Laboratory of Nonferrous Resources and Geological Hazards Exploration, Changsha, Hunan, 410083, China
- Jian Ou
- School of Geosciences and Info-Physics, Central South University, Changsha, Hunan, 410083, China; Hunan Key Laboratory of Nonferrous Resources and Geological Hazards Exploration, Changsha, Hunan, 410083, China; Hunan Province Geological Disaster Survey and Monitoring Institute, Changsha, Hunan, 410004, China
- Kouao Laurent Kouadio
- School of Geosciences and Info-Physics, Central South University, Changsha, Hunan, 410083, China; Hunan Key Laboratory of Nonferrous Resources and Geological Hazards Exploration, Changsha, Hunan, 410083, China; UFR des Sciences de la Terre et des Ressources Minières, Université Félix Houphouët-Boigny, 22, Abidjan, 22 BP 582, Cote d'Ivoire
- Bo Xiong
- School of Geosciences and Info-Physics, Central South University, Changsha, Hunan, 410083, China; Hunan Key Laboratory of Nonferrous Resources and Geological Hazards Exploration, Changsha, Hunan, 410083, China
4. Choi JH, Kim JH, Nasridinov A, Kim YS. Three-dimensional atrous inception module for crowd behavior classification. Sci Rep 2024;14:14390. PMID: 38909074; PMCID: PMC11193784; DOI: 10.1038/s41598-024-65003-6.
Abstract
Recent advances in deep learning have led to a surge in computer vision research, including the recognition and classification of human behavior in video data. However, most studies have focused on recognizing individual behaviors, whereas recognizing crowd behavior remains a complex problem because of the large number of interactions and similar behaviors among individuals or crowds in video surveillance systems. To solve this problem, we propose the three-dimensional atrous inception module (3D-AIM) network, a crowd behavior classification model that uses atrous convolution to explore interactions between individuals or crowds. The 3D-AIM network is a 3D convolutional neural network that can use receptive fields of various sizes to effectively identify the specific features that determine crowd behavior. To further improve accuracy, we introduce a new loss function, the separation loss, which focuses the network on the features that distinguish one type of crowd behavior from another, enabling more precise classification. Finally, we demonstrate that the proposed model outperforms existing human behavior classification models in accurately classifying crowd behaviors. These results suggest that the 3D-AIM network with the separation loss can be valuable for understanding complex crowd behavior in video surveillance systems.
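Editor's note: the core idea of a 3D atrous inception module, parallel 3D convolutions with different dilation rates covering several receptive-field sizes at once, can be sketched compactly in PyTorch. The branch layout, channel split, and dilation rates below are assumptions, not the published 3D-AIM architecture.

```python
# Hypothetical sketch of a 3D atrous inception block in the spirit of 3D-AIM;
# branch count, channel split, and dilation rates are assumptions.
import torch
import torch.nn as nn

class AtrousInception3D(nn.Module):
    """Parallel 3D convolutions with different dilation rates give
    receptive fields of several sizes over space and time."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding=d keeps the temporal/spatial size unchanged
                # for a 3x3x3 kernel with dilation d.
                nn.Conv3d(in_ch, branch_ch, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm3d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the multi-scale responses along the channel axis.
        return torch.cat([b(x) for b in self.branches], dim=1)

# Toy clip: batch 2, 3 channels, 8 frames of 32x32.
clip = torch.randn(2, 3, 8, 32, 32)
feats = AtrousInception3D(3, 48)(clip)
print(feats.shape)  # torch.Size([2, 48, 8, 32, 32])
```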
Affiliation(s)
- Jong-Hyeok Choi
- Bigdata Research Institute, Chungbuk National University, Cheongju, 28644, South Korea
- Research Institute, AICON Company Co., Ltd, Seoul, 06774, South Korea
- Jeong-Hun Kim
- Bigdata Research Institute, Chungbuk National University, Cheongju, 28644, South Korea
- Aziz Nasridinov
- Bigdata Research Institute, Chungbuk National University, Cheongju, 28644, South Korea
- Department of Computer Science, Chungbuk National University, Cheongju, 28644, South Korea
- Yoo-Sung Kim
- Department of Artificial Intelligence, Inha University, Incheon, 22212, South Korea