1. Song H, Xie H, Duan Y, Xie X, Gan F, Wang W, Liu J. Pure data correction enhancing remote sensing image classification with a lightweight ensemble model. Sci Rep 2025;15:5507. PMID: 39953086; PMCID: PMC11829047. DOI: 10.1038/s41598-025-89735-1.
Abstract
The classification of remote sensing images is inherently challenging due to the complexity, diversity, and sparsity of the data across different image samples. Existing advanced methods often require substantial modifications to model architectures to achieve optimal performance, resulting in complex frameworks that are difficult to adapt. To overcome these limitations, we propose a lightweight ensemble method, enhanced by pure data correction, called the Exceptionally Straightforward Ensemble. This approach eliminates the need for extensive structural modifications to models. A key innovation in our method is the introduction of a novel strategy, quantitative augmentation, implemented through a plug-and-play module. This strategy effectively corrects feature distributions across remote sensing data, significantly improving the performance of Convolutional Neural Networks and Vision Transformers beyond traditional data augmentation techniques. Furthermore, we propose a straightforward algorithm to generate an ensemble network composed of two components, serving as the proposed lightweight classifier. We evaluate our method on three well-known datasets, with results demonstrating that our ensemble models outperform 48 state-of-the-art methods published since 2020, excelling in accuracy, inference speed, and model compactness. Specifically, our models achieve an overall accuracy of up to 96.8%, representing a 1.1% improvement on the challenging NWPU45 dataset. Moreover, the smallest model in our ensemble reduces parameters by up to 90% and inference time by 74%. Notably, our approach significantly enhances the performance of Convolutional Neural Networks and Vision Transformers, even with limited training data, thus alleviating the performance dependence on large-scale datasets. In summary, our data-driven approach offers an efficient, accessible solution for remote sensing image classification, providing an elegant alternative for researchers in geoscience fields who may have limited time or resources for model optimization.
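The abstract describes quantitative augmentation only as a plug-and-play module that corrects feature distributions; the exact formulation is not given here. As a loose illustration of the general idea only, the sketch below (the class name, the learnable-target design, and the correction rule are all our assumptions, not the paper's method) standardizes per-channel feature statistics and maps them onto learnable target statistics shared across samples:

```python
# Illustrative sketch only: a generic plug-and-play layer that re-centers and
# re-scales backbone feature maps toward a shared, learnable target
# distribution. This is NOT the paper's module; it is an assumed stand-in.
import torch
import torch.nn as nn

class FeatureDistributionCorrection(nn.Module):
    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Learnable target mean/std the corrected features should follow.
        self.target_mean = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.target_std = nn.Parameter(torch.ones(1, num_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True) + self.eps
        return (x - mean) / std * self.target_std + self.target_mean

# Usage: drop between a backbone stage and the classifier head.
corrected = FeatureDistributionCorrection(256)(torch.randn(8, 256, 14, 14))
```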
Affiliation(s)
- Huaxiang Song
- School of Geography Science and Tourism, Hunan University of Arts and Science, Changde, 415000, China.
- Hanglu Xie
- School of Geography Science and Tourism, Hunan University of Arts and Science, Changde, 415000, China
- Yingying Duan
- School of Geography Science and Tourism, Hunan University of Arts and Science, Changde, 415000, China
- Xinyi Xie
- School of Geography Science and Tourism, Hunan University of Arts and Science, Changde, 415000, China
- Fang Gan
- School of Geography Science and Tourism, Hunan University of Arts and Science, Changde, 415000, China
- Wei Wang
- School of Geography Science and Tourism, Hunan University of Arts and Science, Changde, 415000, China
- Jinling Liu
- School of Geography Science and Tourism, Hunan University of Arts and Science, Changde, 415000, China
2. Zhang M, Bai H, Shang W, Guo J, Li Y, Gao X. MDEformer: Mixed Difference Equation Inspired Transformer for Compressed Video Quality Enhancement. IEEE Transactions on Neural Networks and Learning Systems 2025;36:2410-2422. PMID: 38285580. DOI: 10.1109/tnnls.2024.3354982.
Abstract
Deep learning methods have achieved impressive performance in compressed video quality enhancement tasks. However, these methods rely excessively on practical experience when manually designing network structures, and they do not fully exploit the feature information contained in video sequences: they neither take full advantage of the multiscale similarity of compression artifact information nor seriously consider the impact of partition boundaries in the compressed video on overall video quality. In this article, we propose a novel Mixed Difference Equation inspired Transformer (MDEformer) for compressed video quality enhancement, which provides a relatively reliable principle to guide the network design and yields new insight into the interpretable transformer. Specifically, drawing on the graphical concept of the mixed difference equation (MDE), we utilize multiple cross-layer cross-attention aggregation (CCA) modules to establish long-range dependencies between the encoders and decoders of the transformer, with partition boundary smoothing (PBS) modules inserted as feedforward networks. The CCA module makes full use of the multiscale similarity of compression artifacts to remove them effectively and to recover the texture and detail information of the frame. The PBS module leverages the sensitivity of smoothing convolution to partition boundaries to eliminate their impact on compressed video quality while having little effect on non-boundary pixels. Extensive experiments on the MFQE 2.0 dataset demonstrate that the proposed MDEformer eliminates compression artifacts to improve the quality of compressed video, and it surpasses state-of-the-art methods in terms of both objective metrics and visual quality.
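The internal design of the CCA module is not reproduced in this abstract; the minimal sketch below illustrates only the general cross-layer cross-attention pattern it builds on, where decoder tokens query encoder tokens from another layer (the module name, shapes, and residual fusion here are our assumptions):

```python
# Generic cross-layer cross-attention sketch, not the MDEformer CCA itself.
import torch
import torch.nn as nn

class CrossLayerCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, dec_tokens, enc_tokens):
        # dec_tokens: (B, N, C) queries; enc_tokens: (B, M, C) keys/values.
        q = self.norm_q(dec_tokens)
        kv = self.norm_kv(enc_tokens)
        out, _ = self.attn(q, kv, kv)
        return dec_tokens + out  # residual connection across layers

fused = CrossLayerCrossAttention(256)(torch.randn(2, 196, 256),
                                      torch.randn(2, 196, 256))
```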
3. Zhong L, Chen Z, Wu Z, Du S, Chen Z, Wang S. Learnable Graph Convolutional Network With Semisupervised Graph Information Bottleneck. IEEE Transactions on Neural Networks and Learning Systems 2025;36:433-446. PMID: 37847634. DOI: 10.1109/tnnls.2023.3322739.
Abstract
Graph convolutional networks (GCNs) have gained widespread attention in semisupervised classification tasks, and recent studies show that GCN-based methods achieve decent performance in numerous fields. However, most existing methods adopt a fixed graph that cannot dynamically capture both local and global relationships: hidden but important relationships may not be directly exhibited in the fixed structure, degrading the performance of semisupervised classification. Moreover, missing and noisy data in the fixed graph may yield wrong connections that disturb the representation learning process. To cope with these issues, this article proposes a learnable GCN-based framework that aims to obtain optimal graph structures by jointly integrating graph learning and feature propagation in a unified network. Besides, to capture optimal graph representations, this article designs dual-GCN-based meta-channels to simultaneously explore local and global relations during training. To minimize the interference of noisy data, a semisupervised graph information bottleneck (SGIB) is introduced to conduct graph structural learning (GSL) and acquire minimal sufficient representations. Concretely, SGIB maximizes the mutual information of both the same and different meta-channels by designing constraints between them, thereby improving node classification performance in downstream tasks. Extensive experimental results on real-world datasets demonstrate the robustness of the proposed model, which outperforms state-of-the-art methods with fixed-structure graphs.
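The SGIB objective itself is not spelled out above. As a hedged stand-in, the sketch below uses the common InfoNCE estimator, a lower bound on mutual information, to encourage agreement between node embeddings produced by two meta-channels; the temperature and the symmetric form are our choices, not the paper's:

```python
# InfoNCE as a mutual-information surrogate between two meta-channel outputs.
import torch
import torch.nn.functional as F

def infonce_between_channels(z_local: torch.Tensor,
                             z_global: torch.Tensor,
                             temperature: float = 0.2) -> torch.Tensor:
    # z_local, z_global: (N, D) node embeddings from the two meta-channels.
    z1 = F.normalize(z_local, dim=1)
    z2 = F.normalize(z_global, dim=1)
    logits = z1 @ z2.t() / temperature        # (N, N) similarity matrix
    targets = torch.arange(z1.size(0))        # node i matches node i
    # Symmetric InfoNCE: each node's two views should identify each other.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = infonce_between_channels(torch.randn(64, 128), torch.randn(64, 128))
```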
4. Zhang Y, Gao X, Duan Q, Leng J, Pu X, Gao X. Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images. IEEE Transactions on Neural Networks and Learning Systems 2024;35:18590-18604. PMID: 37792649. DOI: 10.1109/tnnls.2023.3319363.
Abstract
Very high-resolution (VHR) remote sensing (RS) image classification is the fundamental task for RS image analysis and understanding. Recently, Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images of general resolution and have achieved remarkable results on general image classification tasks. However, the complexity of the naive Transformer grows quadratically with image size, which prevents Transformer-based models from being applied to VHR RS image classification and other computationally expensive downstream tasks. To this end, we propose to decompose the expensive self-attention (SA) into real and imaginary parts via the discrete Fourier transform (DFT) and thus propose an efficient complex SA (CSA) mechanism. Benefiting from the conjugate-symmetric property of the DFT, CSA can model high-order contextual information with less than half the computation of naive SA. To overcome gradient explosion in the Fourier complex field, we replace the Softmax function with a carefully designed Logmax function to normalize the attention map of CSA and stabilize gradient propagation. By stacking various layers of CSA blocks, we propose the Fourier complex Transformer (FCT) model to learn global contextual information from VHR aerial images in a hierarchical manner. Extensive experiments conducted on commonly used RS classification datasets demonstrate the effectiveness and efficiency of FCT, especially on VHR RS images. The source code of FCT will be available at https://github.com/Gao-xiyuan/FCT.
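A very rough sketch of attention computed in the Fourier domain follows: tokens are mapped with a real FFT (whose conjugate symmetry keeps roughly half the spectrum, the source of the saving), real and imaginary parts receive separate attention, and the result is mapped back. The paper's exact CSA and its Logmax normalizer are not reproduced; the `logmax` below is a stand-in of our own and purely an assumption:

```python
# Illustrative Fourier-domain attention; not the published CSA/Logmax.
import torch
import torch.nn.functional as F

def logmax(scores: torch.Tensor) -> torch.Tensor:
    # Assumed stand-in: sign-preserving log compression, then row softmax.
    compressed = torch.sign(scores) * torch.log1p(scores.abs())
    return F.softmax(compressed, dim=-1)

def fourier_attention(x: torch.Tensor) -> torch.Tensor:
    # x: (B, N, C) tokens. rfft over channels keeps ~half the spectrum.
    spec = torch.fft.rfft(x, dim=-1)
    out_parts = []
    for part in (spec.real, spec.imag):       # attend in each sub-field
        scores = part @ part.transpose(1, 2) / part.shape[-1] ** 0.5
        out_parts.append(logmax(scores) @ part)
    out = torch.complex(out_parts[0], out_parts[1])
    return torch.fft.irfft(out, n=x.shape[-1], dim=-1)

y = fourier_attention(torch.randn(2, 196, 64))
```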
5. Chen J, Jiao L, Liu X, Liu F, Li L, Yang S. Multiresolution Interpretable Contourlet Graph Network for Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2024;35:17716-17729. PMID: 37747859. DOI: 10.1109/tnnls.2023.3307721.
Abstract
Modeling contextual relationships in images as graph inference is an interesting and promising research topic. However, existing approaches only perform graph modeling of entities, ignoring the intrinsic geometric features of images. To overcome this problem, a novel multiresolution interpretable contourlet graph network (MICGNet) is proposed in this article. MICGNet delicately balances graph representation learning with the multiscale and multidirectional features of images: the contourlet transform captures the hyperplanar directional singularities of images, and multilevel sparse contourlet coefficients are encoded into a graph for further graph representation learning. This process provides interpretable theoretical support for optimizing the model structure. Specifically, a superpixel-based region graph is first constructed. Then, the region graph is applied to encode the nonsubsampled contourlet transform (NSCT) coefficients of the image, which serve as node features. Considering the statistical properties of the NSCT coefficients, we calculate the node similarity, i.e., the adjacency matrix, using the Mahalanobis distance. Next, graph convolutional networks (GCNs) are employed to learn more abstract multilevel NSCT-enhanced graph representations. Finally, a learnable graph assignment matrix is designed to obtain the geometric association representations, assigning the graph representations back to grid feature maps. We conduct comparative experiments on six publicly available datasets, and the experimental analysis shows that MICGNet is significantly more effective and efficient than other recent algorithms.
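The adjacency construction above is concrete enough to sketch: treating each superpixel's NSCT coefficient statistics as a node feature vector, pairwise Mahalanobis distances can be converted into similarities. The Gaussian kernel and the pseudo-inverse safeguard below are our assumptions, not the paper's exact recipe:

```python
# Mahalanobis-distance adjacency over node feature vectors.
import numpy as np

def mahalanobis_adjacency(node_feats: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    # node_feats: (num_nodes, feat_dim), e.g., per-superpixel NSCT statistics.
    cov = np.cov(node_feats, rowvar=False)
    cov_inv = np.linalg.pinv(cov)             # pinv guards against singularity
    diff = node_feats[:, None, :] - node_feats[None, :, :]
    d2 = np.einsum('ijk,kl,ijl->ij', diff, cov_inv, diff)  # squared distances
    return np.exp(-gamma * d2)                # similarity in (0, 1]

adj = mahalanobis_adjacency(np.random.rand(50, 16))
```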
6. Tang J. The deep convolution network in immersive design of digital media art in smart city. Sci Rep 2024;14:28219. PMID: 39548292; PMCID: PMC11568182. DOI: 10.1038/s41598-024-79742-z.
Abstract
This paper examines how convolutional neural networks (CNNs) are used in the immersive design of digital media art. It first explains digital media art in the context of smart cities and the use of immersive scenarios, then briefly analyzes the Unet network model and the deep aggregation structure. Next, an immersive projector-camera display system is constructed from a projector, a camera, and a computer. Based on the mathematical reflection model of this system, the paper discusses how a CNN can solve the photometric compensation of the projected picture. A Projection Compensation Network (PCN) is designed, a multi-scale perceptual loss is added to this compensation network, and the content information of the compensated image is improved by computing losses over feature maps at different scales. The final network is named the Perceptual Loss-Projection Compensation Network (PL-PCN). Experiments confirm the PL-PCN model's efficacy: the SSIM and PSNR of the projected picture compensated by PL-PCN are boosted by 35.8% and 31.6%, respectively, while the RMSE is reduced by 40.9%, demonstrating an improvement in compensated image quality. Additionally, using a CNN makes cross-reflection compensation possible. Compared to the network without the deep aggregation structure, the PL-PCN with deep aggregation boosts the projected image's SSIM and PSNR by 6% and 7.57%, respectively, and lowers the RMSE by 13.3%, showing that deep aggregation strengthens the compensatory effect. This paper offers a theoretical framework for improving the immersive scene quality of digital media art.
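The multi-scale perceptual loss described above can be sketched as follows: compensated and target images are compared on feature maps taken at several depths of a frozen VGG16. The specific layer cut points, the L1 distance, and equal per-scale weights are illustrative assumptions, not the PL-PCN configuration:

```python
# Multi-scale perceptual loss over frozen VGG16 feature maps (sketch).
import torch
import torch.nn.functional as F
from torchvision import models

class MultiScalePerceptualLoss(torch.nn.Module):
    def __init__(self, layer_ids=(3, 8, 15, 22)):  # assumed cut points
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.slices = torch.nn.ModuleList()
        prev = 0
        for idx in layer_ids:
            self.slices.append(torch.nn.Sequential(*vgg[prev:idx + 1]))
            prev = idx + 1
        for p in self.parameters():           # loss network stays frozen
            p.requires_grad_(False)

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        loss, x, y = 0.0, pred, target
        for block in self.slices:
            x, y = block(x), block(y)
            loss = loss + F.l1_loss(x, y)     # one term per scale
        return loss

loss = MultiScalePerceptualLoss()(torch.rand(1, 3, 224, 224),
                                  torch.rand(1, 3, 224, 224))
```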
Affiliation(s)
- Jiao Tang
- School of Fine Arts & Colored Lantern, Sichuan University of Science & Engineering, Zigong, 634000, China.
7. Liu D, Zhou W, Zhou L, Guan W. Target detection of helicopter electric power inspection based on the feature embedding convolution model. PLoS One 2024;19:e0311278. PMID: 39374316; PMCID: PMC11458054. DOI: 10.1371/journal.pone.0311278.
Abstract
This study aims to improve helicopter-based electric power inspection by using the feature embedding convolution (FEC) model to address its limited inspection range and poor real-time performance. First, simulation experiments and model analysis determine the keyframes and flight trajectory. Second, an improved FEC model is proposed that extracts features from wide-area aerial images in real time and accurately identifies and classifies electric power inspection targets. In simulation experiments, the model's accuracy in detecting electric power circuits and equipment improves by 30% over the traditional algorithm, and the inspection range expands by 26%. In addition, this study further optimizes the model with reinforcement learning, conducts a comparative analysis of different flight environments and facilities, and reveals the diversity and complexity of inspection objectives. The optimized model's fault-detection performance increases by more than 36%. In conclusion, the proposed model improves the accuracy and scope of inspection, provides a more scientific strategy for electric power inspection, and ensures inspection efficiency.
Affiliation(s)
- Dakun Liu
- School of Mechanical Engineering, Yancheng Institute of Technology, Yancheng, Jiangsu Province, P. R. China
- Wei Zhou
- School of Mechanical Engineering, Yancheng Institute of Technology, Yancheng, Jiangsu Province, P. R. China
- Linzhen Zhou
- School of Mechanical Engineering, Yancheng Institute of Technology, Yancheng, Jiangsu Province, P. R. China
- Wen Guan
- School of Mechanical Engineering, Yancheng Institute of Technology, Yancheng, Jiangsu Province, P. R. China
8. Liu Q, Yue J, Kuang Y, Xie W, Fang L. SemiRS-COC: Semi-Supervised Classification for Complex Remote Sensing Scenes With Cross-Object Consistency. IEEE Transactions on Image Processing 2024;33:3855-3870. PMID: 38896517. DOI: 10.1109/tip.2024.3414122.
Abstract
Semi-supervised learning (SSL), which aims to learn from limited labeled data and massive amounts of unlabeled data, offers a promising way to exploit the huge volume of satellite Earth observation imagery. The fundamental concept underlying most state-of-the-art SSL methods is to generate pseudo-labels for unlabeled data based on image-level predictions. However, complex remote sensing (RS) scene images frequently suffer from interference by multiple background objects and significant intra-class differences, resulting in unreliable pseudo-labels. In this paper, we propose SemiRS-COC, a novel semi-supervised classification method for complex RS scenes. Inspired by the idea that neighboring objects in feature space should share consistent semantic labels, SemiRS-COC utilizes the similarity between foreground objects in RS images to generate reliable object-level pseudo-labels, effectively addressing the issues of multiple background objects and significant intra-class differences in complex RS images. Specifically, we first design a Local Self-Learning Object Perception (LSLOP) mechanism, which transforms the interference of multiple background objects into usable annotation information, enhancing the model's object perception capability. Furthermore, we present a Cross-Object Consistency Pseudo-Labeling (COCPL) strategy, which generates reliable object-level pseudo-labels by comparing the similarity of foreground objects across different RS images, effectively handling significant intra-class differences. Extensive experiments demonstrate that our proposed method achieves excellent performance compared to state-of-the-art methods on three widely adopted RS datasets.
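As a hedged simplification of the cross-object consistency idea (not the paper's COCPL code), the sketch below keeps a pseudo-label only when an unlabeled object embedding agrees closely with its nearest labeled object embedding; the cosine similarity measure and the threshold value are our assumptions:

```python
# Similarity-gated object-level pseudo-labeling (simplified sketch).
import torch
import torch.nn.functional as F

def object_pseudo_labels(unlab_emb, lab_emb, lab_y, sim_thresh=0.8):
    # unlab_emb: (U, D), lab_emb: (L, D), lab_y: (L,) integer class labels.
    sims = F.normalize(unlab_emb, dim=1) @ F.normalize(lab_emb, dim=1).t()
    best_sim, best_idx = sims.max(dim=1)      # nearest labeled object
    pseudo = lab_y[best_idx]
    keep = best_sim > sim_thresh              # drop unreliable matches
    return pseudo[keep], keep

pseudo_y, mask = object_pseudo_labels(torch.randn(100, 128),
                                      torch.randn(40, 128),
                                      torch.randint(0, 10, (40,)))
```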
9. Gonçalves DN, Junior JM, Arruda MDSD, Fernandes VJM, Ramos APM, Furuya DEG, Osco LP, He H, Jorge LADC, Li J, Melgani F, Pistori H, Gonçalves WN. A deep learning approach based on graphs to detect plantation lines. Heliyon 2024;10:e31730. PMID: 38841473; PMCID: PMC11152659. DOI: 10.1016/j.heliyon.2024.e31730.
Abstract
Identifying plantation lines in aerial images of agricultural landscapes is required for many automatic farming processes. Deep learning-based networks are among the most prominent methods to learn such patterns and extract this type of information from diverse imagery conditions. However, even state-of-the-art methods may stumble in complex plantation patterns. Here, we propose a deep learning approach based on graphs to detect plantation lines in UAV-based RGB imagery, presenting a challenging scenario containing spaced plants. The first module of our method extracts a feature map with the backbone, which consists of the initial layers of VGG16. This feature map is input to the Knowledge Estimation Module (KEM), organized in three concatenated branches that detect 1) the plant positions, 2) the plantation lines, and 3) the displacement vectors between plants. Graph modeling is then applied, with each plant position on the image as a vertex and edges formed between pairs of vertices (i.e., plants). Finally, an edge is classified as belonging to a certain plantation line based on three probabilities (each higher than 0.5): i) a probability computed from the visual features of the backbone; ii) the probability that the edge pixels belong to a line, from the KEM step; and iii) the alignment of the displacement vectors with the edge, also from the KEM step. Experiments were first conducted on corn plantations with different growth stages and patterns in aerial RGB imagery to present the advantages of each module. We then assessed the generalization capability on two other crop datasets (orange and eucalyptus). The proposed method was compared against state-of-the-art deep learning methods and achieved superior performance by a significant margin on all three datasets. This approach is useful for extracting lines in spaced plantation patterns and could be implemented in scenarios where plantation gaps occur, generating lines with few to no interruptions.
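One plausible reading of the three-probability decision rule above is that an edge joins a plantation line only when every probability clears the 0.5 threshold; the function and argument names below are illustrative:

```python
# Edge acceptance rule for plantation-line graph edges (assumed reading).
def accept_edge(p_visual: float, p_line_pixels: float, p_alignment: float,
                thresh: float = 0.5) -> bool:
    """p_visual: backbone features; p_line_pixels, p_alignment: KEM outputs."""
    return min(p_visual, p_line_pixels, p_alignment) > thresh

assert accept_edge(0.9, 0.7, 0.6) and not accept_edge(0.9, 0.4, 0.6)
```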
Affiliation(s)
- Diogo Nunes Gonçalves
- Faculty of Computer Science, Federal University of Mato Grosso do Sul, Av. Costa e Silva, Campo Grande, 79070-900, MS, Brazil
- José Marcato Junior
- Faculty of Engineering, Architecture, and Urbanism and Geography, Federal University of Mato Grosso do Sul, Av. Costa e Silva, Campo Grande, 79070-900, MS, Brazil
- Mauro dos Santos de Arruda
- Faculty of Computer Science, Federal University of Mato Grosso do Sul, Av. Costa e Silva, Campo Grande, 79070-900, MS, Brazil
- Ana Paula Marques Ramos
- Faculty of Science and Technology, São Paulo State University (UNESP), R. Roberto Simonsen, 305, Presidente Prudente, 19060-900, SP, Brazil
- Danielle Elis Garcia Furuya
- Program of Environment and Regional Development, University of Western São Paulo, Raposo Tavares, km 572, Presidente Prudente, 19067-175, SP, Brazil
- Lucas Prado Osco
- Program of Environment and Regional Development, University of Western São Paulo, Raposo Tavares, km 572, Presidente Prudente, 19067-175, SP, Brazil
- Hongjie He
- Department of Geography and Environmental Management, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Lucio André de Castro Jorge
- National Research Center of Development of Agricultural Instrumentation, Brazilian Agricultural Research Agency (EMBRAPA), R. XV de Novembro, 1452, 13560-970, São Carlos, SP, Brazil
- Jonathan Li
- Department of Geography and Environmental Management, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Farid Melgani
- Department of Information Engineering and Computer Science, University of Trento, Trento, 38122, Italy
- Hemerson Pistori
- Faculty of Computer Science, Federal University of Mato Grosso do Sul, Av. Costa e Silva, Campo Grande, 79070-900, MS, Brazil
- INOVISAO, Dom Bosco Catholic University, Avenida Tamandaré, 6000, Campo Grande, 79117-900, MS, Brazil
- Wesley Nunes Gonçalves
- Faculty of Computer Science, Federal University of Mato Grosso do Sul, Av. Costa e Silva, Campo Grande, 79070-900, MS, Brazil
- Faculty of Engineering, Architecture, and Urbanism and Geography, Federal University of Mato Grosso do Sul, Av. Costa e Silva, Campo Grande, 79070-900, MS, Brazil
10. Li Z, Hu J, Wu K, Miao J, Zhao Z, Wu J. Local feature acquisition and global context understanding network for very high-resolution land cover classification. Sci Rep 2024;14:12597. PMID: 38824153; PMCID: PMC11144191. DOI: 10.1038/s41598-024-63363-7.
Abstract
Very high-resolution remote sensing images hold promise for ground observation tasks, paving the way for highly competitive land cover classification solutions based on image processing techniques. To address the difficulty convolutional neural networks (CNNs) face in exploring contextual information in remote sensing land cover classification, as well as the limitations of the vision transformer (ViT) family in effectively capturing local details and spatial information, we propose a local feature acquisition and global context understanding network (LFAGCU). Specifically, we design a multidimensional and multichannel convolutional module that serves as a local feature extractor, capturing local information and spatial relationships within images. Simultaneously, we introduce a global feature learning module that utilizes multiple sets of multi-head attention mechanisms to model global semantic information and abstract the overall feature representation of remote sensing images. Validation, comparative analyses, and ablation experiments conducted on three publicly available datasets of different scales demonstrate the effectiveness and strong generalization capability of LFAGCU, including its ability to locate category attribute information relevant to remote sensing areas. Code is available at https://github.com/lzp-lkd/LFAGCU.
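The two-branch idea above, a convolutional extractor for local detail alongside multi-head self-attention over the flattened map for global context, can be sketched as follows; the layer sizes and the fusion-by-addition choice are our assumptions, not the LFAGCU definition:

```python
# Local-conv plus global-attention hybrid block (illustrative sketch).
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        local = self.local(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)             # (B, HW, C)
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return local + glob                               # fuse the two views

out = LocalGlobalBlock(64)(torch.randn(2, 64, 32, 32))
```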
Affiliation(s)
- Zhengpeng Li
- School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, China
- Liaoning Province Key Laboratory of Intelligent Construction and Internet of Things Application Technologies, Anshan, China
- Jun Hu
- School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, China
- Liaoning Province Key Laboratory of Intelligent Construction and Internet of Things Application Technologies, Anshan, China
- Kunyang Wu
- College of Instrumentation and Electrical Engineering, Jilin University, Changchun, China
- National Geophysical Exploration Equipment Engineering Research Center, Jilin University, Changchun, China
- Key Laboratory of Geophysical Exploration Equipment, Ministry of Education of China (Jilin University), Changchun, China
- Jiawei Miao
- School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, China
- Liaoning Province Key Laboratory of Intelligent Construction and Internet of Things Application Technologies, Anshan, China
- Zixue Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, China
- Jiansheng Wu
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, China
11. Wan Y, Zhong Y, Ma A, Wang J, Zhang L. E2SCNet: Efficient Multiobjective Evolutionary Automatic Search for Remote Sensing Image Scene Classification Network Architecture. IEEE Transactions on Neural Networks and Learning Systems 2024;35:7752-7766. PMID: 36395135. DOI: 10.1109/tnnls.2022.3220699.
Abstract
Remote sensing image scene classification methods based on deep learning have been widely studied and discussed. However, most network architectures are borrowed directly from natural image processing and are fixed. A few studies have focused on automatic search mechanisms, but these cannot balance interpretation accuracy against parameter count for practical application. As a result, automatic global search methods based on multiobjective evolutionary computation have more advantages. However, in the ranking process, network individuals with large parameter counts are easily eliminated, even though they might reach higher accuracy after full training. In addition, evolutionary neural architecture search methods often take several days. In this article, to address these concerns, we propose an efficient multiobjective evolutionary automatic search framework for remote sensing image scene classification network architectures (E2SCNet). In E2SCNet, eight kinds of lightweight operators are used to build a diversified search space, and the coding connection mode is flexible. In the search process, a large-model retention mechanism is implemented through two-step multiobjective modeling and evolutionary search, where one step involves "parameter quantity and accuracy" and the other involves "parameter quantity and accuracy growth quantity." Moreover, a supernetwork is constructed to share weights during individual network evaluation and speed up the search. The effectiveness of E2SCNet is proven by comparison with several networks designed by human experts and networks obtained by gradient-based and evolutionary computation-based search methods.
12. Huang CQ, Jiang F, Huang QH, Wang XZ, Han ZM, Huang WY. Dual-Graph Attention Convolution Network for 3-D Point Cloud Classification. IEEE Transactions on Neural Networks and Learning Systems 2024;35:4813-4825. PMID: 35385393. DOI: 10.1109/tnnls.2022.3162301.
Abstract
Three-dimensional point cloud classification is fundamental but still challenging in 3-D vision. Existing graph-based deep learning methods fail to learn both low-level extrinsic and high-level intrinsic features together, yet these two levels of features are critical to improving classification accuracy. To this end, we propose a dual-graph attention convolution network (DGACN). The idea of DGACN is to use two types of graph attention convolution operations with a feedback graph feature fusion mechanism. Specifically, we exploit graph geometric attention convolution to capture low-level extrinsic features in 3-D space, and we apply graph embedding attention convolution to learn multiscale low-level extrinsic and high-level intrinsic fused graph features together. Moreover, DGACN distinguishes points belonging to different parts of real-world 3-D point cloud objects, which in practice yields more robust performance on 3-D point cloud classification than other competitive methods. Our extensive experimental results show that the proposed network achieves state-of-the-art performance on both the synthetic ModelNet40 and real-world ScanObjectNN datasets.
13. Alkendi Y, Azzam R, Ayyad A, Javed S, Seneviratne L, Zweiri Y. Neuromorphic Camera Denoising Using Graph Neural Network-Driven Transformers. IEEE Transactions on Neural Networks and Learning Systems 2024;35:4110-4124. PMID: 36107888. DOI: 10.1109/tnnls.2022.3201830.
Abstract
Neuromorphic vision is a bio-inspired technology that has triggered a paradigm shift in the computer vision community and is serving as a key enabler for a wide range of applications. This technology offers significant advantages, including reduced power consumption, reduced processing needs, and communication speedups. However, neuromorphic cameras suffer from significant amounts of measurement noise, which deteriorates the performance of neuromorphic event-based perception and navigation algorithms. In this article, we propose a novel noise filtration algorithm to eliminate events that do not represent real log-intensity variations in the observed scene. We employ a graph neural network (GNN)-driven transformer algorithm, called GNN-Transformer, to classify every active event pixel in the raw stream as either a real log-intensity variation or noise. Within the GNN, a message-passing framework, referred to as EventConv, reflects the spatiotemporal correlation among events while preserving their asynchronous nature. We also introduce the known-object ground-truth labeling (KoGTL) approach for generating approximate ground-truth labels of event streams under various illumination conditions. KoGTL is used to generate labeled datasets from experiments recorded in challenging lighting conditions, including moonlight. These datasets are used to train and extensively test our proposed algorithm. When tested on unseen datasets, the proposed algorithm outperforms state-of-the-art methods by at least 8.8% in terms of filtration accuracy. Additional tests on publicly available datasets (ETH Zürich Color-DAVIS346 datasets) demonstrate its generalization capability in the presence of illumination variations and different motion dynamics. Compared to state-of-the-art solutions, qualitative results verify the superior capability of the proposed algorithm to eliminate noise while preserving meaningful events in the scene.
14. Zhang DJ, Gao YL, Zhao JX, Zheng CH, Liu JX. A New Graph Autoencoder-Based Consensus-Guided Model for scRNA-seq Cell Type Detection. IEEE Transactions on Neural Networks and Learning Systems 2024;35:2473-2483. PMID: 35857730. DOI: 10.1109/tnnls.2022.3190289.
Abstract
Single-cell RNA sequencing (scRNA-seq) technology is famous for providing a microscopic view to help capture cellular heterogeneity. This characteristic has advanced the field of genomics by enabling the delicate differentiation of cell types. However, the properties of single-cell datasets, such as high dropout events, noise, and high dimensionality, are still a research challenge in the single-cell field. To utilize single-cell data more efficiently and to better explore the heterogeneity among cells, a new graph autoencoder (GAE)-based consensus-guided model (scGAC) is proposed in this article. The data are preprocessed into multiple top-level feature datasets. Then, feature learning is performed by using GAEs to generate new feature matrices, followed by similarity learning based on distance fusion methods. The learned similarity matrices are fed back to the GAEs to guide their feature learning process. Finally, the abovementioned steps are iterated continuously to integrate the final consistent similarity matrix and perform other related downstream analyses. The scGAC model can accurately identify critical features and effectively preserve the internal structure of the data. This can further improve the accuracy of cell type identification.
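The feature-learning stage above is built on graph autoencoders; a minimal GAE of the standard kind (one-layer GCN encoder, inner-product decoder) is sketched below. This shows the building block only, not scGAC's consensus loop; dimensions and the toy adjacency are assumptions:

```python
# Minimal graph autoencoder: GCN encoder + inner-product decoder.
import torch
import torch.nn as nn

class SimpleGAE(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, hid_dim, bias=False)

    def encode(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Symmetrically normalized adjacency with self-loops (standard GCN).
        a_hat = adj + torch.eye(adj.size(0))
        d_inv_sqrt = a_hat.sum(1).pow(-0.5)
        a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
        return torch.relu(a_norm @ self.weight(x))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(z @ z.t())       # reconstructed cell similarities

x = torch.randn(200, 500)                     # toy: 200 cells, 500 genes
adj = (torch.rand(200, 200) > 0.95).float()   # toy cell-cell graph
gae = SimpleGAE(500, 32)
recon = gae.decode(gae.encode(x, adj))
loss = nn.functional.binary_cross_entropy(recon, (adj > 0).float())
```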
15. Rubab S, Khan MA, Hamza A, Albarakati HM, Saidani O, Alshardan A, Alasiry A, Marzougui M, Nam Y. A Novel Network-Level Fusion Architecture of Proposed Self-Attention and Vision Transformer Models for Land Use and Land Cover Classification From Remote Sensing Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2024;17:13135-13148. DOI: 10.1109/jstars.2024.3426950.
Affiliation(s)
- Saddaf Rubab
- Department of Computer Engineering, College of Computing and Informatics, University of Sharjah, Sharjah, UAE
- Ameer Hamza
- Department of Computer Science, HITEC University, Taxila, Pakistan
- Hussain Mobarak Albarakati
- Department of Computer and Network Engineering, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
- Oumaima Saidani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Amal Alshardan
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Areej Alasiry
- College of Computer Science, King Khalid University, Abha, Saudi Arabia
- Mehrez Marzougui
- College of Computer Science, King Khalid University, Abha, Saudi Arabia
- Yunyoung Nam
- Department of ICT Convergence, Soonchunhyang University, Asan, South Korea
16. Zhang Z, Mi X, Yang J, Wei X, Liu Y, Yan J, Liu P, Gu X, Yu T. Remote Sensing Image Scene Classification in Hybrid Classical-Quantum Transferring CNN with Small Samples. Sensors (Basel) 2023;23:8010. PMID: 37766063; PMCID: PMC10537394. DOI: 10.3390/s23188010.
Abstract
This research combines pre-trained convolutional neural networks (CNNs) with quantum convolutional neural networks (QCNNs) for remote sensing image scene classification (RSISC). Deep learning is advancing remote sensing image (RSI) analysis by leaps and bounds, and pre-trained CNNs in particular have shown remarkable performance in RSISC. Nonetheless, training CNNs requires massive annotated samples. When labeled samples are insufficient, the most common solution is to use CNNs pre-trained on large natural image datasets (e.g., ImageNet). However, such transfer still demands a considerable quantity of labeled data, which is often not feasible in RSISC, especially when the target RSIs have imaging mechanisms different from those of RGB natural images. In this paper, we propose an improved hybrid classical-quantum transfer learning CNN, composed of classical and quantum elements, to classify an open-source RSI dataset. The classical part of the model is a ResNet that extracts useful features from the RSI dataset. To further refine network performance, a tensor quantum circuit is subsequently employed, with parameters tuned on near-term quantum processors. In our comparative study on the open-source RSI dataset, the hybrid classical-quantum transfer CNN achieved better performance than other pre-trained-CNN-based RSISC methods with small training samples. Moreover, the proposed algorithm improves classification accuracy while greatly decreasing the number of model parameters and the amount of training data required.
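A hedged sketch of the hybrid classical-quantum transfer-learning pattern follows: a frozen ResNet backbone feeds a small variational quantum circuit built with PennyLane. The qubit count, circuit depth, embedding templates, and class count below are illustrative choices, not the paper's exact configuration:

```python
# Hybrid classical-quantum classifier sketch (PennyLane + PyTorch).
import pennylane as qml
import torch
import torch.nn as nn
from torchvision import models

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))       # encode features
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))  # trainable layers
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in backbone.parameters():               # transfer learning: freeze
    p.requires_grad_(False)
backbone.fc = nn.Linear(backbone.fc.in_features, n_qubits)  # trainable head

model = nn.Sequential(
    backbone,
    qml.qnn.TorchLayer(circuit, weight_shapes={"weights": (2, n_qubits)}),
    nn.Linear(n_qubits, 10),                  # e.g., 10 scene classes (assumed)
)
logits = model(torch.rand(2, 3, 224, 224))
```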
Affiliation(s)
- Zhouwei Zhang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Xiaofei Mi
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Jian Yang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Xiangqin Wei
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Yan Liu
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Jian Yan
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Peizhuo Liu
- School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
- Xingfa Gu
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Tao Yu
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
17. Guo N, Jiang M, Gao L, Tang Y, Han J, Chen X. CRABR-Net: A Contextual Relational Attention-Based Recognition Network for Remote Sensing Scene Objective. Sensors (Basel) 2023;23:7514. PMID: 37687971; PMCID: PMC10490739. DOI: 10.3390/s23177514.
Abstract
Remote sensing scene objective recognition (RSSOR) has significant application value in both military and civilian fields, and convolutional neural networks (CNNs) have greatly advanced intelligent objective recognition for remote sensing scenes. However, most CNN-based methods for high-resolution RSSOR either use only the feature map of the last layer or directly fuse feature maps from various layers by summation, which ignores the favorable relationship information between adjacent layers and leads to redundancy and loss in the feature maps, hindering improvements in recognition accuracy. In this study, a contextual relational attention-based recognition network (CRABR-Net) is presented. It extracts convolutional feature maps from different CNN layers, highlights important feature content with a simple, parameter-free attention module (SimAM), fuses adjacent feature maps via complementary relationship feature map calculation, improves feature learning via enhanced relationship feature map calculation, and finally uses the concatenated feature maps from different layers for RSSOR. Experimental results show that CRABR-Net exploits the relationships between different CNN layers to improve recognition performance and achieves better results than several state-of-the-art algorithms, with average accuracies of up to 96.46%, 99.20%, and 95.43% on AID, UC-Merced, and RSSCN7, respectively, under generic training ratios.
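SimAM, the parameter-free attention module used above, is a published operator with a known closed form: each activation is reweighted by a sigmoid of its energy, computed from per-channel deviation statistics. A minimal version:

```python
# Parameter-free SimAM attention (e_lambda is SimAM's stability constant).
import torch

def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
    # x: (B, C, H, W). Energy computed from per-channel deviation statistics.
    b, c, h, w = x.shape
    n = h * w - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n           # channel variance
    e_inv = d / (4 * (v + e_lambda)) + 0.5            # inverse energy per unit
    return x * torch.sigmoid(e_inv)                   # reweight the features

out = simam(torch.randn(2, 64, 32, 32))
```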
Affiliation(s)
- Ningbo Guo
- Space Information Academic, Space Engineering University, Beijing 101407, China
- Mingyong Jiang
- Space Information Academic, Space Engineering University, Beijing 101407, China
- Lijing Gao
- State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
- Yizhuo Tang
- Space Information Academic, Space Engineering University, Beijing 101407, China
- Jinwei Han
- Space Information Academic, Space Engineering University, Beijing 101407, China
- Xiangning Chen
- Space Information Academic, Space Engineering University, Beijing 101407, China
18. Liu S, Li H, Chen J, Li S, Song L, Zhang G, Hu B. Adaptive convolution kernel network for change detection in hyperspectral images. Applied Optics 2023;62:2039-2047. PMID: 37133091. DOI: 10.1364/ao.479955.
Abstract
Feature extraction is a key step in hyperspectral image change detection. However, targets of greatly varying sizes, such as narrow paths, wide rivers, and large tracts of cultivated land, can appear in a satellite remote sensing image at the same time, which increases the difficulty of feature extraction. In addition, the number of changed pixels is far smaller than the number of unchanged pixels, which leads to class imbalance and affects change detection accuracy. To address these issues, building on the U-Net model, we propose an adaptive convolution kernel structure that replaces the original convolution operations, and we design a weighted loss function for the training stage. The adaptive convolution kernel combines two different kernel sizes and automatically generates their corresponding weight feature maps during training; each output pixel obtains its convolution kernel combination according to these weights. This structure of automatically selecting the kernel size effectively adapts to targets of different sizes and extracts multi-scale spatial features. The modified cross-entropy loss function solves the class imbalance problem by increasing the weight of changed pixels. Results on four datasets indicate that the proposed method performs better than most existing methods.
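Both ideas above are concrete enough to sketch: an adaptive kernel that blends a 3x3 and a 5x5 convolution with a learned per-pixel weight map, and a class-weighted cross-entropy that upweights the rare "changed" class. The kernel sizes, sigmoid gate, and the 10x class weight are our assumptions:

```python
# Adaptive kernel blending + class-weighted cross-entropy (sketch).
import torch
import torch.nn as nn

class AdaptiveKernelConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv3 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv5 = nn.Conv2d(in_ch, out_ch, 5, padding=2)
        self.gate = nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x)                      # (B, 1, H, W) weight map in [0, 1]
        return w * self.conv3(x) + (1 - w) * self.conv5(x)

# Weighted loss: give changed pixels (class 1) e.g. 10x the unchanged weight.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))
logits = AdaptiveKernelConv(4, 2)(torch.randn(2, 4, 64, 64))
loss = criterion(logits, torch.randint(0, 2, (2, 64, 64)))
```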
19. Ning H, Lei T, An M, Sun H, Hu Z, Nandi AK. Scale-wise interaction fusion and knowledge distillation network for aerial scene recognition. CAAI Transactions on Intelligence Technology 2023. DOI: 10.1049/cit2.12208.
Affiliation(s)
- Hailong Ning
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Tao Lei
- School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, China
- Mengyuan An
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Hao Sun
- School of Computer, Central China Normal University, Wuhan, China
- Zhanxuan Hu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Asoke K. Nandi
- Department of Electronic and Electrical Engineering, Brunel University London, London, UK
- Xi'an Jiaotong University, Xi'an, China
20. Yang Y, Tang X, Cheung YM, Zhang X, Jiao L. SAGN: Semantic-Aware Graph Network for Remote Sensing Scene Classification. IEEE Transactions on Image Processing 2023;32:1011-1025. PMID: 37021983. DOI: 10.1109/tip.2023.3238310.
Abstract
The scene classification of remote sensing (RS) images plays an essential role in the RS community, aiming to assign semantics to different RS scenes. With the increase in spatial resolution of RS images, high-resolution RS (HRRS) image scene classification becomes a challenging task because the contents within HRRS images are diverse in type, various in scale, and massive in volume. Recently, deep convolutional neural networks (DCNNs) have provided promising results for HRRS scene classification, mostly by treating it as a single-label problem in which the semantics represented by the manual annotation decide the final classification results directly. Although this is feasible, the various semantics hidden in HRRS images are ignored, resulting in inaccurate decisions. To overcome this limitation, we propose a semantic-aware graph network (SAGN) for HRRS images. SAGN consists of a dense feature pyramid network (DFPN), an adaptive semantic analysis module (ASAM), a dynamic graph feature update module, and a scene decision module (SDM), which respectively extract multi-scale information, mine the various semantics, exploit the unstructured relations between diverse semantics, and make decisions for HRRS scenes. Instead of transforming the single-label problem into a multi-label one, SAGN elaborates proper methods to make full use of the diverse semantics hidden in HRRS images for scene classification. Extensive experiments conducted on three popular HRRS scene datasets show the effectiveness of the proposed SAGN. Our source codes are available at https://github.com/TangXu-Group/SAGN.
21. Joint Classification of Hyperspectral and LiDAR Data Based on Position-Channel Cooperative Attention Network. Remote Sensing 2022. DOI: 10.3390/rs14143247.
Abstract
Remote sensing image classification is a prominent topic in Earth observation research, but single-source classification faces a performance bottleneck. As the types of remote sensing data gradually diversify, joint classification of multi-source remote sensing data becomes possible. However, existing classification methods are limited in their heterogeneous feature representation of multimodal remote sensing data, which restricts collaborative classification performance. To resolve this issue, a position-channel collaborative attention network is proposed for the joint classification of hyperspectral and LiDAR data. First, a multiscale network and a single-branch backbone network are designed to extract the spatial, spectral, and elevation features of land cover objects. Then, the proposed position-channel collaborative attention module adaptively enhances the features extracted from the multiscale network to different degrees through its self-attention module, and exploits the features extracted from the multiscale and single-branch networks through its cross-attention module, so as to capture the comprehensive features of HSI and LiDAR data, narrow the semantic differences between heterogeneous features, and realize complementary advantages. The depth intersection mode further improves collaborative classification performance. Finally, a series of comparative experiments on the Houston 2012 and Trento datasets demonstrates the effectiveness of the model through qualitative and quantitative comparison.
22. Knowledge Distillation of Grassmann Manifold Network for Remote Sensing Scene Classification. Remote Sensing 2021. DOI: 10.3390/rs13224537.
Abstract
Due to device limitations, small networks are necessary for some real-world scenarios, such as satellites and micro-robots. Therefore, the development of a network with both good performance and small size is an important area of research. Deep networks can learn well from large amounts of data, while manifold networks have outstanding feature representation at small sizes. In this paper, we propose an approach that exploits the advantages of deep networks and shallow Grassmannian manifold networks. Inspired by knowledge distillation, we use the information learned from convolutional neural networks to guide the training of the manifold networks. Our approach leads to a reduction in model size, which addresses the problem of deploying deep learning on resource-limited embedded devices. Finally, a series of experiments were conducted on four remote sensing scene classification datasets. The method in this paper improved the classification accuracy by 2.31% and 1.73% on the UC Merced Land Use and SIRIWHU datasets, respectively, and the experimental results demonstrate the effectiveness of our approach.
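The distillation step above, where a CNN teacher guides the smaller manifold-network student, is typically realized with the standard knowledge-distillation loss: soften both logit sets with a temperature and penalize their KL divergence alongside the usual cross-entropy term. The temperature and mixing weight below are conventional defaults, not the paper's reported settings:

```python
# Standard knowledge-distillation loss (Hinton-style), sketched in PyTorch.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.7):
    # Soft target term: KL between temperature-softened distributions.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard target term: ordinary cross-entropy with ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 21), torch.randn(8, 21),
                         torch.randint(0, 21, (8,)))  # e.g., 21 UC Merced classes
```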