1. Lian J, Wang L, Sun H, Huang H. GT-HAD: Gated Transformer for Hyperspectral Anomaly Detection. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3631-3645. PMID: 38347690. DOI: 10.1109/tnnls.2024.3355166.
Abstract
Hyperspectral anomaly detection (HAD) aims to distinguish between the background and anomalies in a scene, and it has been widely adopted in various applications. Deep neural network (DNN)-based methods have emerged as the predominant solution, wherein the standard paradigm is to discern the background and anomalies based on the error of self-supervised hyperspectral image (HSI) reconstruction. However, current DNN-based methods cannot guarantee correspondence between the background, anomalies, and reconstruction error, which limits the performance of HAD. In this article, we propose a novel gated transformer network for HAD (GT-HAD). Our key observation is that the spatial-spectral similarity in HSI can effectively distinguish between the background and anomalies, which aligns with the fundamental definition of HAD. Consequently, we develop GT-HAD to exploit spatial-spectral similarity during HSI reconstruction. GT-HAD consists of two distinct branches that model the features of the background and anomalies, respectively, with content similarity as a constraint. Furthermore, we introduce an adaptive gating unit to regulate the activation states of these two branches based on a content-matching method (CMM). Extensive experimental results demonstrate the superior performance of GT-HAD. The original code is publicly available at https://github.com/jeline0110/GT-HAD, along with a comprehensive benchmark of state-of-the-art HAD methods.
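The dual-branch-plus-gate idea can be pictured with a minimal sketch. This is an illustrative PyTorch reading of the abstract, not the authors' implementation: the MLP branches, the cosine-based content matching, and all layer sizes are assumptions; the real GT-HAD uses transformer blocks and a learned CMM (see the linked repository).

```python
import torch
import torch.nn as nn

class GatedTwoBranch(nn.Module):
    """Illustrative two-branch reconstruction with a content-matching gate
    (hypothetical layer sizes; GT-HAD's actual branches are transformers)."""
    def __init__(self, bands: int, hidden: int = 64):
        super().__init__()
        self.background = nn.Sequential(nn.Linear(bands, hidden), nn.ReLU(),
                                        nn.Linear(hidden, bands))
        self.anomaly = nn.Sequential(nn.Linear(bands, hidden), nn.ReLU(),
                                     nn.Linear(hidden, bands))
        # Stand-in background prototype for content matching (an assumption).
        self.reference = nn.Parameter(torch.randn(bands))

    def forward(self, x):                       # x: (N, bands) pixel spectra
        # Content matching: similarity to the background reference decides
        # how strongly each branch is activated for each pixel.
        score = torch.cosine_similarity(x, self.reference.expand_as(x), dim=-1)
        gate = torch.sigmoid(score).unsqueeze(-1)          # (N, 1) in (0, 1)
        recon = gate * self.background(x) + (1 - gate) * self.anomaly(x)
        return recon, (x - recon).abs().mean(dim=-1)       # per-pixel error

x = torch.rand(1024, 200)            # 1024 pixels, 200 spectral bands
recon, err = GatedTwoBranch(200)(x)  # large err flags anomaly candidates
```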
2. Liu Y, Li H, Hu C, Luo S, Luo Y, Chen CW. Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:595-609. PMID: 38261502. DOI: 10.1109/tnnls.2023.3336563.
Abstract
The task of instance segmentation in remote sensing images, which aims at per-pixel labeling of objects at the instance level, is of great importance for various civil applications. Despite previous successes, most existing instance segmentation methods designed for natural images suffer sharp performance degradation when directly applied to top-view remote sensing images. Through careful analysis, we observe that the challenges mainly come from the lack of discriminative object features due to severe scale variations, low contrast, and clustered distributions. To address these problems, a novel context aggregation network (CATNet) is proposed to improve the feature extraction process. The proposed model exploits three lightweight plug-and-play modules, namely a dense feature pyramid network (DenseFPN), a spatial context pyramid (SCP), and a hierarchical region of interest extractor (HRoIE), to aggregate global visual context in the feature, spatial, and instance domains, respectively. DenseFPN is a multi-scale feature propagation module that establishes more flexible information flows by adopting interlevel residual connections, cross-level dense connections, and a feature reweighting strategy. Leveraging the attention mechanism, SCP further augments the features by aggregating global spatial context into local regions. For each instance, HRoIE adaptively generates RoI features for different downstream tasks. Extensive evaluations on the iSAID, DIOR, NWPU VHR-10, and HRSID datasets demonstrate that the proposed approach outperforms state-of-the-art methods at similar computational cost. Source code and pretrained models are available at https://github.com/yeliudev/CATNet.
3. Zhang Y, Gao X, Duan Q, Leng J, Pu X, Gao X. Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:18590-18604. PMID: 37792649. DOI: 10.1109/tnnls.2023.3319363.
Abstract
Very high-resolution (VHR) remote sensing (RS) image classification is a fundamental task for RS image analysis and understanding. Recently, Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images of general resolution and have achieved remarkable results on general image classification tasks. However, the complexity of the naive Transformer grows quadratically with image size, which prevents Transformer-based models from being applied to VHR RS image classification and other computationally expensive downstream tasks. To this end, we propose to decompose the expensive self-attention (SA) into real and imaginary parts via the discrete Fourier transform (DFT) and thereby propose an efficient complex SA (CSA) mechanism. Benefiting from the conjugate symmetry of the DFT, CSA can model high-order contextual information with less than half the computation of naive SA. To overcome gradient explosion in the Fourier complex field, we replace the Softmax function with a carefully designed Logmax function to normalize the attention map of CSA and stabilize gradient propagation. By stacking multiple layers of CSA blocks, we propose the Fourier complex Transformer (FCT) model to learn global contextual information from VHR aerial images in a hierarchical manner. Extensive experiments on commonly used RS classification datasets demonstrate the effectiveness and efficiency of FCT, especially on VHR RS images. The source code of FCT will be available at https://github.com/Gao-xiyuan/FCT.
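One plausible reading of the CSA mechanism is sketched below in PyTorch: queries, keys, and values are moved to the Fourier domain with a real FFT (which keeps roughly half the coefficients thanks to conjugate symmetry) and attended with complex dot products. The abstract does not give the exact form of Logmax, so the log-scaling normalization here is only a stand-in assumption, as are all dimensions.

```python
import torch
import torch.nn as nn

class ComplexSelfAttention(nn.Module):
    """Sketch of a CSA-style block: attention computed on rfft coefficients.
    Illustrative only; the paper's Logmax and block design may differ."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, D) real tokens
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        qf = torch.fft.rfft(q, dim=1)            # (B, N//2+1, D), complex
        kf = torch.fft.rfft(k, dim=1)
        vf = torch.fft.rfft(v, dim=1)
        scores = qf @ kf.conj().transpose(1, 2)  # complex attention scores
        # Stand-in for Logmax: log-scale the magnitudes to tame gradients,
        # then renormalize rows (assumption, not the paper's exact formula).
        attn = torch.log1p(scores.abs())
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        out = torch.fft.irfft(attn.to(vf.dtype) @ vf, n=x.shape[1], dim=1)
        return self.proj(out)

y = ComplexSelfAttention(96)(torch.randn(2, 64, 96))  # same shape as input
```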
4. Chen J, Jiao L, Liu X, Liu F, Li L, Yang S. Multiresolution Interpretable Contourlet Graph Network for Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17716-17729. PMID: 37747859. DOI: 10.1109/tnnls.2023.3307721.
Abstract
Modeling contextual relationships in images as graph inference is an interesting and promising research topic. However, existing approaches only perform graph modeling of entities, ignoring the intrinsic geometric features of images. To overcome this problem, a novel multiresolution interpretable contourlet graph network (MICGNet) is proposed in this article. MICGNet delicately balances graph representation learning with the multiscale and multidirectional features of images: the contourlet transform is used to capture the hyperplanar directional singularities of images, and multilevel sparse contourlet coefficients are encoded into a graph for further representation learning. This process provides interpretable theoretical support for optimizing the model structure. Specifically, the superpixel-based region graph is constructed first. Then, the region graph is used to encode the nonsubsampled contourlet transform (NSCT) coefficients of the image, which are treated as node features. Considering the statistical properties of the NSCT coefficients, we compute the node similarity, i.e., the adjacency matrix, using the Mahalanobis distance. Next, graph convolutional networks (GCNs) are employed to learn more abstract multilevel NSCT-enhanced graph representations. Finally, a learnable graph assignment matrix is designed to obtain the geometric association representations, which assign the graph representations back to grid feature maps. We conduct comparative experiments on six publicly available datasets, and the experimental analysis shows that MICGNet is significantly more effective and efficient than other recent algorithms.
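The Mahalanobis-distance adjacency step is concrete enough to sketch. The snippet below is a generic illustration under assumptions of my own (Gaussian kernel on the squared distance, covariance estimated from the node features themselves); the paper's exact weighting of NSCT statistics may differ.

```python
import torch

def mahalanobis_adjacency(nodes: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Build a node-similarity (adjacency) matrix from per-superpixel
    features using Mahalanobis distance. nodes: (N, D)."""
    centered = nodes - nodes.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (nodes.shape[0] - 1)
    cov_inv = torch.linalg.inv(cov + eps * torch.eye(nodes.shape[1]))
    diff = nodes.unsqueeze(1) - nodes.unsqueeze(0)          # (N, N, D)
    d2 = torch.einsum("ijd,de,ije->ij", diff, cov_inv, diff)
    # Gaussian kernel turns distances into similarities (our assumption).
    return torch.exp(-d2.clamp_min(0))

adj = mahalanobis_adjacency(torch.randn(50, 16))  # 50 superpixels, 16-D features
```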
5. Bai J, Ren J, Xiao Z, Chen Z, Gao C, Ali TAA, Jiao L. Localizing From Classification: Self-Directed Weakly Supervised Object Localization for Remote Sensing Images. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17935-17949. PMID: 37672374. DOI: 10.1109/tnnls.2023.3309889.
Abstract
In recent years, object localization and detection methods for remote sensing images (RSIs) have received increasing attention due to their broad applications. However, most previous fully supervised methods require a large number of time-consuming and labor-intensive instance-level annotations. Compared with fully supervised methods, weakly supervised object localization (WSOL) aims to recognize object instances using only image-level labels, which greatly reduces the labeling cost for RSIs. In this article, we propose a self-directed weakly supervised strategy (SD-WSS) to perform WSOL in RSIs. Specifically, we fully exploit and enhance the spatial feature extraction capability of an RSI classification model to accurately localize the objects of interest. To alleviate the problem of previous WSOL methods highlighting only the most discriminative regions, the spatial location information implicit in the classification model is carefully extracted by GradCAM++ to guide the learning procedure. Furthermore, to eliminate interference from the complex backgrounds of RSIs, we design a novel self-directed loss that makes the model optimize itself and explicitly tells it where to look. Finally, we review and annotate the existing remote sensing scene classification dataset and create two new WSOL benchmarks for RSIs, named C45V2 and PN2. We conduct extensive experiments to evaluate the proposed method and six mainstream WSOL methods with three backbones on C45V2 and PN2. The results demonstrate that our proposed method achieves better performance than the state-of-the-art.
6. Liu Y, Zhang J. Deep and shallow feature fusion framework for remote sensing open pit coal mine scene recognition. Sci Rep 2024; 14:24124. PMID: 39406759; PMCID: PMC11480329. DOI: 10.1038/s41598-024-72855-5.
Abstract
Understanding land use and damage in open-pit coal mining areas is crucial for effective scientific oversight and management. Current recognition methods have limitations: traditional approaches depend on manually designed features with limited expressiveness, whereas deep learning techniques rely heavily on large sample sets. To overcome these limitations, a three-branch feature extraction framework is proposed in the present study. The framework effectively fuses deep features (DF) and shallow features (SF), and can accomplish scene recognition with high accuracy from fewer samples. Deep features are enhanced through a neighbouring feature attention module and a graph convolutional network (GCN) module, which capture neighbouring features and the correlation between local scene information, respectively. Shallow features are extracted using the gray-level co-occurrence matrix (GLCM) and Gabor filters, which capture local and overall texture variations, respectively. Evaluation on the AID and RSSCN7 datasets shows that the proposed deep feature extraction model achieves classification accuracies of 97.53% and 96.73%, respectively, indicating superior performance in deep feature extraction. Finally, the two kinds of features are fused and fed into a support vector machine optimized by particle swarm optimization (PSO-SVM) to classify remote sensing scenes, reaching a classification accuracy of 92.78% and outperforming four other classification methods.
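The shallow-feature branch maps onto standard scikit-image tools. Below is a minimal sketch of GLCM statistics plus Gabor filter-bank energies; the distances, angles, frequency, and the choice of GLCM properties are assumptions, not the paper's settings.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.filters import gabor

def shallow_features(gray_u8: np.ndarray) -> np.ndarray:
    """Hand-crafted texture descriptor in the spirit of the SF branch:
    GLCM statistics (local texture) + Gabor energies (overall texture).
    gray_u8: 2-D uint8 grayscale image."""
    glcm = graycomatrix(gray_u8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p).mean()
                  for p in ("contrast", "homogeneity", "energy", "correlation")]
    gabor_feats = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        real, imag = gabor(gray_u8, frequency=0.2, theta=theta)
        gabor_feats.append(np.sqrt(real ** 2 + imag ** 2).mean())
    return np.array(glcm_feats + gabor_feats)   # 8-D shallow descriptor

feats = shallow_features(np.random.randint(0, 256, (64, 64), dtype=np.uint8))
```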
Affiliation(s)
- Yang Liu
- School of Mining Engineering, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Jin Zhang
- School of Mining Engineering, Taiyuan University of Technology, Taiyuan, Shanxi, China
7. Xing C, Zhao J, Wang Z, Wang M. Deep Ring-Block-Wise Network for Hyperspectral Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:14125-14137. PMID: 37220048. DOI: 10.1109/tnnls.2023.3274745.
Abstract
Deep learning has achieved many successes in the field of hyperspectral image (HSI) classification. However, most existing deep learning-based methods do not consider feature distribution, and may therefore yield features with low separability and discriminability. From the perspective of spatial geometry, a good feature distribution should satisfy two properties: block and ring. Block means that, in the feature space, intraclass samples are close together while interclass samples are far apart. Ring means that all class samples are distributed overall in a ring topology. Accordingly, in this article, we propose a novel deep ring-block-wise network (DRN) for HSI classification, which takes full account of feature distribution. To obtain a distribution conducive to high classification performance, the DRN uses a ring-block perception (RBP) layer, built by integrating self-representation and a ring loss into a perception model. In this way, the exported features are required to satisfy both the block and ring properties, so that they are distributed more separably and discriminatively than with traditional deep networks. Besides, we design an optimization strategy with alternating updates to solve the RBP layer model. Extensive results on the Salinas, Pavia Centre, Indian Pines, and Houston datasets demonstrate that the proposed DRN achieves better classification performance than state-of-the-art approaches.
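The block and ring properties have natural loss-function readings, sketched below. This is only the intuition: a ring term pulling feature norms toward a shared radius and a block term on pairwise distances. The paper's RBP layer additionally uses self-representation inside a perception model with alternating updates, which this sketch does not reproduce; the margin and weights are assumptions.

```python
import torch

def ring_block_loss(feats, labels, radius=10.0,
                    lam_ring=0.1, lam_block=0.1, margin=1.0):
    """Sketch of 'ring' + 'block' objectives on a batch of features.
    Assumes the batch contains both intra- and inter-class pairs."""
    # Ring: all feature norms should approach a common radius R.
    ring = ((feats.norm(dim=1) - radius) ** 2).mean()
    # Block: intra-class distances small, inter-class distances large.
    d = torch.cdist(feats, feats)
    same = labels[:, None] == labels[None, :]
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    intra = d[same & ~eye].mean()
    inter = torch.relu(margin - d[~same]).mean()   # hinge on negatives
    return lam_ring * ring + lam_block * (intra + inter)

loss = ring_block_loss(torch.randn(32, 64), torch.randint(0, 4, (32,)))
```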
8. Dai W, Shi F, Wang X, Xu H, Yuan L, Wen X. A multi-scale dense residual correlation network for remote sensing scene classification. Sci Rep 2024; 14:22197. PMID: 39333732; PMCID: PMC11437181. DOI: 10.1038/s41598-024-73252-8.
Abstract
Most existing scene classification methods for remote sensing images tend to ignore important interactive information at different levels of the image. We propose an effective remote sensing scene classification method named the multi-scale dense residual correlation network. The method consists of three parts. First, a multi-stream feature extraction module is introduced, which effectively utilizes features at different scales to extract different levels of information. Second, a dense residual connected feature fusion technique is proposed, which allows for wide-ranging feature fusion. Third, a correlation attention module learns feature representations at multiple levels, improving classification performance. The method outperforms existing algorithms in effectiveness and accuracy, achieving state-of-the-art results on widely used remote sensing scene classification benchmarks.
Affiliation(s)
- Wei Dai
- Tianjin University of Technology, School of Computer Science and Engineering, Tianjin 300384, China
- Ministry of Education, Key Laboratory of Computer Vision and System, Tianjin 300384, China
- Furong Shi
- Tianjin University of Technology, School of Computer Science and Engineering, Tianjin 300384, China
- Ministry of Education, Key Laboratory of Computer Vision and System, Tianjin 300384, China
- Xinyu Wang
- Tianjin University of Technology, School of Computer Science and Engineering, Tianjin 300384, China
- Ministry of Education, Key Laboratory of Computer Vision and System, Tianjin 300384, China
- Haixia Xu
- Tianjin University of Technology, School of Computer Science and Engineering, Tianjin 300384, China
- Ministry of Education, Key Laboratory of Computer Vision and System, Tianjin 300384, China
- Liming Yuan
- Tianjin University of Technology, School of Computer Science and Engineering, Tianjin 300384, China
- Ministry of Education, Key Laboratory of Computer Vision and System, Tianjin 300384, China
- Xianbin Wen
- Tianjin University of Technology, School of Computer Science and Engineering, Tianjin 300384, China
- Ministry of Education, Key Laboratory of Computer Vision and System, Tianjin 300384, China
9. Wan Y, Zhong Y, Ma A, Wang J, Zhang L. E2SCNet: Efficient Multiobjective Evolutionary Automatic Search for Remote Sensing Image Scene Classification Network Architecture. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7752-7766. PMID: 36395135. DOI: 10.1109/tnnls.2022.3220699.
Abstract
Remote sensing image scene classification methods based on deep learning have been widely studied and discussed. However, most network architectures are borrowed directly from natural image processing and are fixed. A few studies have focused on automatic search mechanisms, but they cannot balance interpretation accuracy against parameter count for practical applications. As a result, automatic global search methods based on multiobjective evolutionary computation are more advantageous. However, in the ranking process, network individuals with large parameter counts are easily eliminated, even though they may reach higher accuracy after full training. In addition, evolutionary neural architecture search methods often take several days. To address these concerns, we propose an efficient multiobjective evolutionary automatic search framework for remote sensing image scene classification network architectures (E2SCNet). In E2SCNet, eight kinds of lightweight operators are used to build a diversified search space, and the coding connection mode is flexible. In the search process, a large-model retention mechanism is implemented through two-step multiobjective modeling and evolutionary search, where one step involves "parameter quantity and accuracy" and the other involves "parameter quantity and accuracy growth quantity." Moreover, a supernetwork is constructed to share weights during individual network evaluation and speed up the search. The effectiveness of E2SCNet is proven by comparison with several networks designed by human experts and networks obtained by gradient- and evolutionary-computation-based search methods.
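The core of multiobjective ranking is non-dominated (Pareto) sorting over competing objectives. The sketch below shows one generic step on (parameter count, accuracy) pairs; it is an illustration of the principle, not E2SCNet's two-step procedure or retention mechanism.

```python
def pareto_front(population):
    """Return the non-dominated candidates among (params, accuracy) pairs,
    minimizing params and maximizing accuracy."""
    front = []
    for i, (p_i, a_i) in enumerate(population):
        dominated = any(
            p_j <= p_i and a_j >= a_i and (p_j < p_i or a_j > a_i)
            for j, (p_j, a_j) in enumerate(population) if j != i
        )
        if not dominated:
            front.append((p_i, a_i))
    return front

# The 3M-parameter net is dominated (more params, less accuracy than 2M):
print(pareto_front([(1e6, 0.90), (2e6, 0.95), (3e6, 0.93)]))
# -> [(1000000.0, 0.9), (2000000.0, 0.95)]
```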
10. Huyan N, Zhang X, Quan D, Chanussot J, Jiao L. AUD-Net: A Unified Deep Detector for Multiple Hyperspectral Image Anomaly Detection via Relation and Few-Shot Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6835-6849. PMID: 36301787. DOI: 10.1109/tnnls.2022.3213023.
Abstract
This article addresses the problem of building an out-of-the-box deep detector, motivated by the need to perform anomaly detection across multiple hyperspectral images (HSIs) without repeated training. To solve this challenging task, we propose a unified anomaly detection network (AUD-Net) inspired by few-shot learning. The crucial issues solved by AUD-Net include how to improve the generalization of the model across HSIs that contain different categories of land cover, and how to unify the different spectral sizes of HSIs. To achieve this, we first build a series of subtasks to classify the relations between the center and its surroundings in a dual window. Through relation learning, AUD-Net generalizes more easily to unseen HSIs, as the relations of pixel pairs are shared among different HSIs. Second, to handle HSIs with various spectral sizes, we propose a pooling layer based on the vector of locally aggregated descriptors (VLAD), which maps variable-sized features to the same space and produces fixed-size relation embeddings. To determine whether the center of the dual window is an anomaly, we build a memory model with a transformer, which integrates the contextual relation embeddings in the dual window and estimates the relation embedding of the center. By computing the difference between the estimated relation embeddings of the centers and the corresponding real ones, centers with large differences are detected as anomalies, as they are harder to estimate from their surroundings. Extensive experiments on both a simulation dataset and 13 real HSIs demonstrate that AUD-Net generalizes strongly across HSIs and achieves significant advantages over detectors trained specifically for each HSI.
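The VLAD-style pooling step, which turns a variable number of descriptors into a fixed-size embedding, can be sketched with a NetVLAD-like soft assignment. This is a generic illustration with assumed sizes, not AUD-Net's exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADPool(nn.Module):
    """Soft-assignment VLAD pooling: (B, N, D) descriptors with variable N
    are mapped to a fixed (B, K*D) embedding."""
    def __init__(self, dim: int, num_clusters: int = 8):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.assign = nn.Linear(dim, num_clusters)

    def forward(self, x):                                  # x: (B, N, D)
        a = torch.softmax(self.assign(x), dim=-1)          # (B, N, K)
        resid = x.unsqueeze(2) - self.centroids            # (B, N, K, D)
        vlad = (a.unsqueeze(-1) * resid).sum(dim=1)        # (B, K, D)
        return F.normalize(vlad, dim=-1).flatten(1)        # (B, K*D)

pool = VLADPool(dim=32)
same_size = pool(torch.randn(4, 100, 32)).shape == pool(torch.randn(4, 17, 32)).shape
```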
11. Yuan X, Zhu J, Lei H, Peng S, Wang W, Li X. Duplex-Hierarchy Representation Learning for Remote Sensing Image Classification. Sensors (Basel) 2024; 24:1130. PMID: 38400288; PMCID: PMC10892595. DOI: 10.3390/s24041130.
Abstract
Remote sensing image classification (RSIC) aims to assign specific semantic labels to aerial images, which is significant and fundamental in many applications. In recent years, substantial work has been conducted on RSIC with the help of deep learning models. Even though these models have greatly enhanced RSIC performance, intra-class diversity and inter-class similarity in remote sensing images remain major challenges. To solve these problems, a duplex-hierarchy representation learning (DHRL) method is proposed. The proposed DHRL method explores duplex-hierarchy spaces, including a common space and a label space, to learn discriminative representations for RSIC. It consists of three main steps: First, paired images are fed to a pretrained ResNet to extract the corresponding features. Second, the extracted features are further mapped into a common space, reducing intra-class scatter and enlarging inter-class separation. Third, the obtained representations are used to predict the categories of the input images, and the discrimination loss in the label space is minimized to further promote the learning of discriminative representations. Meanwhile, a confusion score is computed and added to the classification loss, guiding discriminative representation learning via backpropagation. Comprehensive experimental results show that the proposed method is superior to existing state-of-the-art methods on two challenging remote sensing image scene datasets, demonstrating its effectiveness.
Affiliation(s)
- Xiaobin Yuan
- The School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- The Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
- Jingping Zhu
- The School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Hao Lei
- National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi’an Jiaotong University, Xi’an 710049, China
- Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an 710049, China
- Shengjun Peng
- The State Key Laboratory of Astronautic Dynamics, China Xi’an Satellite Control Center, Xi’an 710043, China
- Xiaobin Li
- The Beijing Institute of Remote Sensing Information, Beijing 100192, China
12. Xing C, Cong Y, Duan C, Wang Z, Wang M. Deep Network With Irregular Convolutional Kernels and Self-Expressive Property for Classification of Hyperspectral Images. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10747-10761. PMID: 35560082. DOI: 10.1109/tnnls.2022.3171324.
Abstract
This article presents a novel deep network with irregular convolutional kernels and self-expressive property (DIKS) for the classification of hyperspectral images (HSIs). Specifically, we use principal component analysis (PCA) and superpixel segmentation to obtain a series of irregular patches, which are used as the convolutional kernels of our network. With such kernels, the feature maps of HSIs can be adaptively computed to well describe the characteristics of each object class. After multiple convolutional layers, the features exported by all convolution operations are combined into a stacked form containing both shallow and deep features. These stacked features are then clustered by introducing self-expression theory to produce the final features. Unlike most traditional deep learning approaches, DIKS is self-adaptive to the given HSI because its kernels are built from the image itself. In addition, the method requires no training for feature extraction. Because it uses both shallow and deep features, DIKS is naturally multiscale. By introducing self-expression, DIKS exports more discriminative features for HSI classification. Extensive experimental results validate that our method achieves better classification performance than state-of-the-art algorithms.
13. Zhang Z, Mi X, Yang J, Wei X, Liu Y, Yan J, Liu P, Gu X, Yu T. Remote Sensing Image Scene Classification in Hybrid Classical-Quantum Transferring CNN with Small Samples. Sensors (Basel) 2023; 23:8010. PMID: 37766063; PMCID: PMC10537394. DOI: 10.3390/s23188010.
Abstract
The scope of this research lies in combining pre-trained convolutional neural networks (CNNs) and quantum convolutional neural networks (QCNNs) for remote sensing image scene classification (RSISC). Deep learning (DL) is advancing remote sensing image (RSI) analysis by leaps and bounds, and pre-trained CNNs have shown remarkable performance in RSISC. Nonetheless, training CNNs requires massive amounts of annotated data. When labeled samples are insufficient, the most common solution is to use CNNs pre-trained on large natural image datasets (e.g., ImageNet). However, these pre-trained CNNs still require a large quantity of labeled data for fine-tuning, which is often not feasible in RSISC, especially when the target RSIs have different imaging mechanisms from RGB natural images. In this paper, we propose an improved hybrid classical-quantum transfer learning CNN, composed of classical and quantum elements, to classify an open-source RSI dataset. The classical part of the model is a ResNet that extracts useful features from the RSI data. To further refine performance, a tensor quantum circuit is subsequently employed, with its parameters tuned on near-term quantum processors. We tested our models on the open-source RSI dataset. Our comparative study concludes that the hybrid classical-quantum transfer CNN achieves better performance than other pre-trained-CNN-based RSISC methods with small training samples. Moreover, the proposed algorithm improves classification accuracy while greatly decreasing the number of model parameters and the amount of training data.
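A common way to build such a hybrid head is with PennyLane's Torch integration: classical features are compressed to a few qubit angles, passed through a small variational circuit, and classified. The sketch below follows the standard PennyLane transfer-learning pattern; the circuit template, qubit count, and class count are assumptions, and the paper's "tensor quantum circuit" may differ.

```python
import torch
import torch.nn as nn
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    # Encode classical features as rotation angles, entangle, and read out
    # one expectation value per qubit.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

qlayer = qml.qnn.TorchLayer(circuit, weight_shapes={"weights": (2, n_qubits)})

# Hybrid head: a frozen 512-d ResNet feature vector is compressed to
# n_qubits angles, run through the quantum layer, then classified.
head = nn.Sequential(nn.Linear(512, n_qubits), nn.Tanh(), qlayer,
                     nn.Linear(n_qubits, 10))   # 10 classes is an assumption
logits = head(torch.randn(8, 512))              # batch of 8 feature vectors
```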
Affiliation(s)
- Zhouwei Zhang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Xiaofei Mi
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Jian Yang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Xiangqin Wei
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Yan Liu
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Jian Yan
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Peizhuo Liu
- School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
- Xingfa Gu
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
- Tao Yu
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- National Engineering Laboratory for Satellite Remote Sensing Applications, Beijing 100094, China
14. Yang S, Wang H, Gao H, Zhang L. Few-shot remote sensing scene classification based on multi subband deep feature fusion. Mathematical Biosciences and Engineering 2023; 20:12889-12907. PMID: 37501471. DOI: 10.3934/mbe.2023575.
Abstract
Recently, convolutional neural networks (CNNs) have performed well in object classification and recognition. However, due to the particularity of geographic data, labeled samples are seriously insufficient, which limits the practical application of CNN methods in remote sensing (RS) image processing. To address small-sample RS image classification, a discrete wavelet-based multi-level deep feature fusion method is proposed. First, deep features are extracted from RS images using pre-trained deep CNNs and the discrete wavelet transform (DWT). Next, a modified discriminant correlation analysis (DCA) approach, based on the between-class distance coefficient, is proposed to effectively distinguish easily confused categories. The proposed approach can effectively integrate deep feature information across frequency bands, yielding low-dimensional features with good discrimination, as demonstrated through experiments on four benchmark datasets. Compared with several state-of-the-art methods, the proposed method achieves outstanding performance under limited training samples, especially with one or two training samples per class.
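The multi-subband step is directly expressible with PyWavelets. The sketch below splits a 2-D deep feature map into the four DWT subbands and pools each into a scalar; the wavelet choice ('haar') and the mean-pooling are assumptions standing in for the paper's fusion pipeline.

```python
import numpy as np
import pywt

def subband_features(feature_map: np.ndarray) -> dict:
    """Split a 2-D feature map into DWT subbands (LL approximation plus
    LH/HL/HH details); each band can then be pooled and fused by DCA."""
    LL, (LH, HL, HH) = pywt.dwt2(feature_map, "haar")
    return {"LL": LL, "LH": LH, "HL": HL, "HH": HH}

bands = subband_features(np.random.rand(14, 14))          # e.g., a CNN map
vector = np.array([b.mean() for b in bands.values()])     # 4-D descriptor
```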
Affiliation(s)
- Song Yang
- College of Computer and Information, Hohai University, Nanjing 211100, China
- Faculty of Electronic Information Engineering, Huaiyin Institute of Technology, Huaian 223001, China
- Huibin Wang
- College of Computer and Information, Hohai University, Nanjing 211100, China
- Hongmin Gao
- College of Computer and Information, Hohai University, Nanjing 211100, China
- Lili Zhang
- College of Computer and Information, Hohai University, Nanjing 211100, China
15. Zhao Q, Lyu S, Li Y, Ma Y, Chen L. MGML: Multigranularity Multilevel Feature Ensemble Network for Remote Sensing Scene Classification. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:2308-2322. PMID: 34469317. DOI: 10.1109/tnnls.2021.3106391.
Abstract
Remote sensing (RS) scene classification is a challenging task that predicts the scene categories of RS images. RS images pose two main issues: large intraclass variance caused by large resolution variance, and confusing information from the large geographic areas they cover. To ease the negative influence of these two issues, we propose a multigranularity multilevel feature ensemble network (MGML-FENet) to efficiently tackle the RS scene classification task in this article. Specifically, we propose a multigranularity multilevel feature fusion branch (MGML-FFB) to extract multigranularity features at different levels of the network via a channel-separate feature generator (CS-FG). To avoid interference from confusing information, we propose a multigranularity multilevel feature ensemble module (MGML-FEM), which provides diverse predictions via a full-channel feature generator (FC-FG). Compared with previous methods, our proposed networks can exploit structural information and abundant fine-grained features. Furthermore, through ensemble learning, the proposed MGML-FENets obtain more convincing final predictions. Extensive classification experiments on multiple RS datasets (AID, NWPU-RESISC45, UC-Merced, and VGoogle) demonstrate that the proposed networks achieve better performance than previous state-of-the-art (SOTA) networks. Visualization analysis also shows the good interpretability of MGML-FENet.
16. Chen X, Zhu G, Liu M, Chen Z. Few-shot remote sensing image scene classification based on multiscale covariance metric network (MCMNet). Neural Netw 2023; 163:132-145. PMID: 37044028. DOI: 10.1016/j.neunet.2023.04.002.
Abstract
Few-shot learning (FSL) is a paradigm that simulates the fast learning ability of human beings: it learns the feature differences between two groups of small-scale samples that share a common label space, where the label spaces of the training and test sets do not overlap. In this way, it can quickly identify the categories of unseen images in the test set. This approach is widely used in image scene recognition and is expected to overcome the scarcity of annotated samples in remote sensing (RS). However, most existing FSL methods embed images into Euclidean space and measure the similarity between last-layer features by Euclidean distance, which makes it difficult to capture the inter-class similarity and intra-class difference of RS images. In this paper, we propose a multiscale covariance metric network (MCMNet) for remote sensing scene classification (RSSC). Taking Conv64F as the backbone, we map the features of layers 1, 2, and 4 to a manifold space by constructing regional covariance matrices, forming a covariance network at different scales. For each layer of features, we introduce a center in the manifold space as a prototype for each category. We simultaneously measure the similarity to the three prototypes at the different scales, forming three loss functions, and optimize the whole network with an episodic training strategy. We conducted comparative experiments on three public datasets. The results show that the classification accuracy (CA) of our proposed method is 1.35% to 2.36% higher than that of the best competing method, demonstrating that MCMNet outperforms other methods.
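The regional covariance idea maps a feature map to a symmetric positive-definite (SPD) matrix, which lives on a manifold rather than in Euclidean space. Below is a generic sketch of that descriptor plus one common manifold metric (log-Euclidean); the abstract does not specify MCMNet's exact metric, so the second function is an assumption.

```python
import torch

def regional_covariance(fmap: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Map a (C, H, W) feature map to a C x C SPD covariance descriptor."""
    C, H, W = fmap.shape
    x = fmap.reshape(C, H * W)
    x = x - x.mean(dim=1, keepdim=True)
    return x @ x.T / (H * W - 1) + eps * torch.eye(C)  # eps keeps it SPD

def log_euclidean_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Distance between SPD matrices via matrix logarithms (one common
    manifold metric; MCMNet's actual metric may differ)."""
    def logm(m):
        vals, vecs = torch.linalg.eigh(m)
        return vecs @ torch.diag(vals.clamp_min(1e-8).log()) @ vecs.T
    return torch.linalg.norm(logm(a) - logm(b))

d = log_euclidean_distance(regional_covariance(torch.randn(64, 8, 8)),
                           regional_covariance(torch.randn(64, 8, 8)))
```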
17. Ning H, Lei T, An M, Sun H, Hu Z, Nandi AK. Scale-wise interaction fusion and knowledge distillation network for aerial scene recognition. CAAI Transactions on Intelligence Technology 2023. DOI: 10.1049/cit2.12208.
Affiliation(s)
- Hailong Ning
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Tao Lei
- School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, China
- Mengyuan An
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Hao Sun
- School of Computer, Central China Normal University, Wuhan, China
- Zhanxuan Hu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Asoke K. Nandi
- Department of Electronic and Electrical Engineering, Brunel University London, London, UK
- Xi'an Jiaotong University, Xi'an, China
18. Tang C, Zheng X, Tang C. Adaptive Discriminative Regions Learning Network for Remote Sensing Scene Classification. Sensors (Basel) 2023; 23:773. PMID: 36679569; PMCID: PMC9865113. DOI: 10.3390/s23020773.
Abstract
As an auxiliary means of remote sensing (RS) intelligent interpretation, remote sensing scene classification (RSSC) attracts considerable attention, and its performance has been improved significantly by popular deep convolutional neural networks (DCNNs). However, several challenges still hinder the practical application of RSSC, such as the complex composition of land cover, scale variation of objects, and redundant and noisy regions. To mitigate these issues, we propose an adaptive discriminative regions learning network for RSSC, referred to as ADRL-Net, which locates discriminative regions effectively via a novel self-supervision mechanism. ADRL-Net consists of three main modules: a discriminative region generator, a region discriminator, and a region scorer. Specifically, the generator first proposes candidate regions that could be informative for RSSC. The discriminator then evaluates the generated regions and provides feedback for the generator to update them. Finally, the region scorer predicts scores for the whole image using the discriminative regions. In this manner, the three modules cooperate to focus on the most informative regions of an image and reduce the interference of redundant regions in the final classification, making the network robust to complex scene composition, varying object scales, and irrelevant information. Experiments on four widely used benchmark datasets demonstrate that ADRL-Net consistently outperforms other state-of-the-art RSSC methods.
Affiliation(s)
- Chuan Tang
- School of Computer Science, China University of Geosciences, No. 68 Jincheng Road, Wuhan 430078, China
- Xiao Zheng
- School of Computer, National University of Defense Technology, Deya Road, Changsha 410073, China
- Chang Tang
- School of Computer Science, China University of Geosciences, No. 68 Jincheng Road, Wuhan 430078, China
19. Dai Y, Song W, Li Y, Stefano LD. Feature disentangling and reciprocal learning with label-guided similarity for multi-label image retrieval. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.09.007.
20. Xu K, Huang H, Deng P, Li Y. Deep Feature Aggregation Framework Driven by Graph Convolutional Network for Scene Classification in Remote Sensing. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5751-5765. PMID: 33857002. DOI: 10.1109/tnnls.2021.3071369.
Abstract
Scene classification of high spatial resolution (HSR) images can provide data support for many practical applications, such as land planning and utilization, and has been a crucial research topic in the remote sensing (RS) community. Recently, deep learning methods driven by massive data have shown impressive feature learning ability for HSR scene classification, especially convolutional neural networks (CNNs). Although traditional CNNs achieve good classification results, it is difficult for them to effectively capture potential contextual relationships. Graphs have a powerful capacity to represent the relevance of data, and graph-based deep learning methods can spontaneously learn intrinsic attributes contained in RS images. Inspired by these facts, we develop a deep feature aggregation framework driven by a graph convolutional network (DFAGCN) for HSR scene classification. First, an off-the-shelf CNN pretrained on ImageNet is employed to obtain multilayer features. Second, a graph convolutional network-based model is introduced to effectively reveal patch-to-patch correlations in convolutional feature maps, from which more refined features can be harvested. Finally, a weighted concatenation method with three weighting coefficients is adopted to integrate multiple features (i.e., multilayer convolutional features and fully connected features), and a linear classifier predicts the semantic classes of query images. Experiments on the UCM, AID, RSSCN7, and NWPU-RESISC45 datasets demonstrate that the proposed DFAGCN framework is more competitive than state-of-the-art scene classification methods in terms of overall accuracy (OA).
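The patch-to-patch refinement can be sketched as one GCN layer over flattened CNN patches. The cosine-similarity adjacency below is my assumption (the paper does not state its graph construction in the abstract); the normalization is the standard symmetric GCN form.

```python
import torch
import torch.nn as nn

class PatchGCN(nn.Module):
    """One symmetric-normalized GCN layer over CNN patch features."""
    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Linear(dim, dim)

    def forward(self, patches):                       # (N, dim) patch features
        sim = torch.cosine_similarity(patches.unsqueeze(1),
                                      patches.unsqueeze(0), dim=-1)
        adj = sim.clamp_min(0) + torch.eye(len(patches))   # add self-loops
        d = adj.sum(-1).pow(-0.5)
        norm_adj = d.unsqueeze(-1) * adj * d.unsqueeze(0)  # D^-1/2 A D^-1/2
        return torch.relu(self.w(norm_adj @ patches))

refined = PatchGCN(256)(torch.randn(49, 256))   # e.g., a 7x7 conv map flattened
```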
21. A Lightweight Convolutional Neural Network Based on Hierarchical-Wise Convolution Fusion for Remote-Sensing Scene Image Classification. Remote Sensing 2022. DOI: 10.3390/rs14133184.
Abstract
The large intra-class differences and inter-class similarities of scene images pose great challenges for remote-sensing scene image classification. In recent years, many classification methods based on convolutional neural networks have been proposed. To improve classification performance, many studies increase the width and depth of the network to extract richer features, which raises model complexity and slows the model down. To solve this problem, a lightweight convolutional neural network based on hierarchical-wise convolution fusion (LCNN-HWCF) is proposed for remote-sensing scene image classification. First, in the shallow layers of the network (groups 1–3), the proposed lightweight dimension-wise convolution (DWC) extracts shallow features from remote-sensing images. Dimension-wise convolution operates along the width, depth and channel dimensions, and the features convolved along the three dimensions are then fused. Compared with traditional convolution, dimension-wise convolution has fewer parameters and lower computation. In the deep layers of the network (groups 4–7), the running speed usually decreases as the number of filters increases; a hierarchical-wise convolution fusion module is therefore designed to extract deep features. Finally, a global average pooling layer, a fully connected layer and the Softmax function perform the classification. Using global average pooling before the fully connected layer better preserves the spatial information of the features. The proposed method achieves good classification results on the UCM, RSSCN7, AID and NWPU datasets. The classification accuracy of LCNN-HWCF on the AID dataset (training:test = 2:8) and the NWPU dataset (training:test = 1:9), both of which are difficult to classify, reaches 95.76% and 94.53%, respectively. A series of experiments shows that, compared with some state-of-the-art methods, the proposed approach greatly reduces the number of network parameters while maintaining classification accuracy, achieving a good trade-off between accuracy and running speed.
22. Geographic Scene Understanding of High-Spatial-Resolution Remote Sensing Images: Methodological Trends and Current Challenges. Applied Sciences (Basel) 2022. DOI: 10.3390/app12126000.
Abstract
As one of the primary means of Earth observation, high-spatial-resolution remote sensing images can describe the geometry, texture and structure of objects in detail. Recognizing the semantic information of objects, analyzing the semantic relationships between objects and then understanding the more abstract geographic scenes in such images has become a research hotspot. Based on the basic connotation of geographic scene understanding for high-spatial-resolution remote sensing images, this paper first summarizes the key issues in the field, such as the multiple semantic hierarchies, complex spatial structures and limited labeled samples. Then, recent achievements in processing strategies and techniques are reviewed at three levels: visual semantics, object semantics and concept semantics. On this basis, the new challenges in geographic scene understanding of high-spatial-resolution remote sensing images are analyzed, and future research prospects are proposed.
23. Triplet-Metric-Guided Multi-Scale Attention for Remote Sensing Image Scene Classification with a Convolutional Neural Network. Remote Sensing 2022. DOI: 10.3390/rs14122794.
Abstract
Remote sensing image scene classification (RSISC) plays a vital role in remote sensing applications. Recent methods based on convolutional neural networks (CNNs) have driven the development of RSISC. However, these approaches do not adequately consider the contributions of different features to the global decision. In this paper, triplet-metric-guided multi-scale attention (TMGMA) is proposed to enhance task-related salient features and suppress task-unrelated salient and redundant features. First, we design a multi-scale attention module (MAM), guided by multi-scale feature maps, to adaptively emphasize salient features while fusing multi-scale and contextual information. Second, to capture task-related salient features, we use the triplet metric (TM) to optimize the learning of the MAM under the constraint that the distance of the negative pair must be larger than that of the positive pair. Notably, the collaboration of the MAM and TM enforces the learning of a more discriminative model. In this way, TMGMA avoids the classification confusion caused by using the attention mechanism alone and the excessive feature correction caused by using metric learning alone. Extensive experiments demonstrate that TMGMA outperforms the ResNet50 baseline by 0.47% on the UC Merced, 1.46% on the AID, and 1.55% on the NWPU-RESISC45 dataset, and achieves performance competitive with other state-of-the-art methods.
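The TM constraint (negative pair farther apart than positive pair by a margin) is exactly what PyTorch's built-in triplet loss expresses. The joint objective below, with equal weighting of the classification and metric terms, is a sketch of the idea rather than the paper's training recipe.

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)   # enforces d(a,n) > d(a,p) + margin
ce = nn.CrossEntropyLoss()

# Hypothetical embeddings/logits from an attention-equipped backbone:
anchor, positive, negative = (torch.randn(16, 128) for _ in range(3))
logits, labels = torch.randn(16, 45), torch.randint(0, 45, (16,))

loss = ce(logits, labels) + triplet(anchor, positive, negative)
loss.backward()
```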
24. Wang Q, Huang W, Xiong Z, Li X. Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1414-1428. PMID: 33332278. DOI: 10.1109/tnnls.2020.3042276.
Abstract
Remote sensing image scene classification has attracted great attention because of its wide applications. Although convolutional neural network (CNN)-based methods have achieved excellent results, the large-scale variation of features and objects in remote sensing images limits further improvement of classification performance. To address this issue, we present multiscale representation learning for scene classification, realized by a global-local two-stream architecture. The two branches, a global stream and a local stream, individually extract global features from the whole image and local features from the most important area. To locate the most important area using only image-level labels, a weakly supervised key area detection strategy called structured key area localization (SKAL) is specially designed to connect the two streams. To verify the effectiveness of the proposed SKAL-based two-stream architecture, we conduct comparative experiments with three widely used CNN backbones (AlexNet, GoogLeNet, and ResNet18) on four public remote sensing image scene classification datasets, and achieve state-of-the-art results on all four. Our code is provided at https://github.com/hw2hwei/SKAL.
25. Remote Sensing Scene Image Classification Based on Self-Compensating Convolution Neural Network. Remote Sensing 2022. DOI: 10.3390/rs14030545.
Abstract
In recent years, convolutional neural networks (CNNs) have been widely used in remote sensing scene image classification. However, CNN models with good classification performance tend to be highly complex, while low-complexity CNN models struggle to reach high accuracy; existing models rarely achieve a good trade-off between classification accuracy and model complexity. To solve this problem, we make three improvements and propose a lightweight modular network model. First, we propose a lightweight self-compensated convolution (SCC). Although traditional convolution effectively extracts features from the input feature map, it is slow when there are many filters (such as 512 or 1024). To speed up the network without increasing the computational load, we propose the self-compensated convolution: it performs traditional convolution with a reduced number of filters and then compensates the convolved channels with the input features. It thereby incorporates shallow features into deep, complex features, improving both the speed and the classification accuracy of the model. Second, we propose a self-compensating bottleneck module (SCBM) based on the self-compensated convolution. The wider channel shortcut in this module allows more shallow information to reach the deeper layers and improves the feature extraction ability of the model. Finally, we use the proposed bottleneck module to construct a lightweight and modular self-compensating convolutional neural network (SCCNN) for remote sensing scene image classification, built by reusing bottleneck modules with the same structure. Extensive experiments on six open and challenging remote sensing scene datasets show that the classification performance of the proposed method is superior to that of several state-of-the-art methods while using fewer parameters.
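One way to read the SCC idea is sketched below: convolve with only half the target filters, then fill the remaining output channels cheaply from the input. The 1x1 projection used for the compensation path is my assumption; the paper's exact compensation rule may differ.

```python
import torch
import torch.nn as nn

class SelfCompensatedConv(nn.Module):
    """Sketch of self-compensated convolution: an expensive KxK path with
    half the filters plus a cheap input-derived compensation path."""
    def __init__(self, cin: int, cout: int, k: int = 3):
        super().__init__()
        half = cout // 2
        self.conv = nn.Conv2d(cin, half, k, padding=k // 2)     # expensive path
        self.comp = nn.Conv2d(cin, cout - half, kernel_size=1)  # compensation
                                                                # (1x1 is assumed)
    def forward(self, x):
        # Shallow (input-derived) channels are concatenated with the
        # convolved channels, mixing shallow and deep information.
        return torch.cat([self.conv(x), self.comp(x)], dim=1)

y = SelfCompensatedConv(64, 512)(torch.randn(1, 64, 32, 32))  # (1, 512, 32, 32)
```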
26. Remote Sensing Scene Type Classification using Multi Trial Vector based Differential Evolution Algorithm and Multi Support Vector Machine Classifier. International Journal of e-Collaboration 2022. DOI: 10.4018/ijec.301259.
Abstract
In recent decades, remote sensing scene type classification has become a challenging task in remote sensing applications. In this paper, a new model is proposed for multi-class scene type classification in remote sensing images. First, aerial images are collected from the Aerial Image Dataset (AID), University of California Merced (UC Merced) and REmote Sensing Image Scene Classification 45 (RESISC45) datasets. Next, AlexNet, GoogLeNet, ResNet18, and Visual Geometry Group (VGG) 19 models are used to extract feature vectors from the collected images. After feature extraction, a multi-trial-vector-based differential evolution (MTDE) algorithm is proposed to select active feature vectors, improving classification while reducing system complexity and time consumption. The selected features are fed to a multi support vector machine (MSVM) for final scene type classification. Simulation results show that the proposed MTDE-MSVM model obtains high classification accuracies of 99.41%, 99.59% and 99.74% on the RESISC45, AID and UC Merced datasets, respectively.
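Evolutionary feature selection wrapped around an SVM can be sketched with classic single-trial-vector DE (DE/rand/1/bin); the paper's multi-trial-vector variant generates several trial vectors per individual, which this sketch does not reproduce. Population size, generations, F, and CR below are arbitrary assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def de_feature_selection(X, y, pop=20, gens=30, F=0.5, CR=0.9, seed=0):
    """Classic DE over real-valued masks; a dimension is selected when its
    mask value exceeds 0.5. Fitness = 3-fold CV accuracy of an SVM."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    P = rng.random((pop, d))

    def fitness(v):
        mask = v > 0.5
        if not mask.any():
            return 0.0
        return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

    fit = np.array([fitness(v) for v in P])
    for _ in range(gens):
        for i in range(pop):
            others = [j for j in range(pop) if j != i]
            a, b, c = P[rng.choice(others, 3, replace=False)]
            trial = np.where(rng.random(d) < CR, a + F * (b - c), P[i])
            trial = np.clip(trial, 0, 1)
            f = fitness(trial)
            if f > fit[i]:                      # greedy selection
                P[i], fit[i] = trial, f
    return P[fit.argmax()] > 0.5                # boolean feature mask
```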
27. A Lightweight Convolutional Neural Network Based on Group-Wise Hybrid Attention for Remote Sensing Scene Classification. Remote Sensing 2021. DOI: 10.3390/rs14010161.
Abstract
With the development of computer vision, attention mechanisms have been widely studied. Although introducing an attention module into a network model can help improve classification performance on remote sensing scene images, doing so directly increases the number of model parameters and the amount of computation, slowing the model down. To solve this problem, we carried out the following work. First, a channel attention module and a spatial attention module were constructed. The input features were enhanced through channel attention and spatial attention separately, and the features recalibrated by the two attention modules were fused to obtain hybrid-attention features. Then, to reduce the parameter increase caused by the attention module, a group-wise hybrid attention module was constructed. It divides the input features into four groups along the channel dimension, uses the hybrid attention mechanism to enhance each group's features in the channel and spatial dimensions, and then fuses the features of the four groups along the channel dimension. This greatly reduces the number of parameters and the computational burden of the network and shortens its running time. Finally, a lightweight convolutional neural network based on group-wise hybrid attention (LCNN-GWHA) was constructed for remote sensing scene image classification. Experiments on four open and challenging remote sensing scene datasets demonstrated that the proposed method has great advantages in classification accuracy, even with very few parameters.
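The group-wise hybrid attention flow (split into four channel groups, recalibrate each with channel and spatial attention, concatenate) is sketched below. The SE-style channel gate and 7x7 spatial gate are common designs assumed here, not the paper's exact layers.

```python
import torch
import torch.nn as nn

class GroupWiseHybridAttention(nn.Module):
    """Split channels into groups; apply channel + spatial attention per
    group; fuse by concatenation along the channel dimension."""
    def __init__(self, channels: int, groups: int = 4, reduction: int = 8):
        super().__init__()
        cg = channels // groups
        self.groups = groups
        self.channel_gate = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1),
                          nn.Conv2d(cg, cg // reduction, 1), nn.ReLU(),
                          nn.Conv2d(cg // reduction, cg, 1), nn.Sigmoid())
            for _ in range(groups))
        self.spatial_gate = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
            for _ in range(groups))

    def forward(self, x):
        outs = []
        for g, chunk in enumerate(x.chunk(self.groups, dim=1)):
            chunk = chunk * self.channel_gate[g](chunk)          # channel attn
            stats = torch.cat([chunk.mean(1, keepdim=True),      # avg + max
                               chunk.amax(1, keepdim=True)], dim=1)
            outs.append(chunk * self.spatial_gate[g](stats))     # spatial attn
        return torch.cat(outs, dim=1)

y = GroupWiseHybridAttention(64)(torch.randn(2, 64, 56, 56))  # shape preserved
```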
Collapse
|
28
|
A Lightweight Convolutional Neural Network Based on Channel Multi-Group Fusion for Remote Sensing Scene Classification. REMOTE SENSING 2021. [DOI: 10.3390/rs14010009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
With the development of remote sensing scene image classification, convolutional neural networks have become the most commonly used method in this field owing to their powerful feature extraction ability. To improve classification performance, many studies extract deeper features by increasing network depth and width, which improves accuracy but also increases model complexity. To solve this problem, a lightweight convolutional neural network based on channel multi-group fusion (LCNN-CMGF) is presented. In the proposed LCNN-CMGF method, a three-branch downsampling structure is designed to extract shallow features from remote sensing images. In the deep layers of the network, a channel multi-group fusion structure is used to extract abstract semantic features of remote sensing scene images. This structure addresses the lack of information exchange between groups caused by group convolution by fusing the channels of adjacent features. The four most commonly used remote sensing scene datasets, UCM21, RSSCN7, AID, and NWPU45, were used to carry out a variety of experiments. The results, across the four datasets and multiple training ratios, show that the proposed LCNN-CMGF method has more significant performance advantages than the compared state-of-the-art methods.
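One plausible reading of "channel fusion of adjacent features" is sketched below: each channel group is convolved independently (as in group convolution), and a 1x1 convolution then mixes each group with its already-fused neighbour so information propagates across groups. The wiring is an assumption; the real LCNN-CMGF block may differ.

```python
import torch
import torch.nn as nn

class ChannelMultiGroupFusion(nn.Module):
    def __init__(self, channels, groups=4):
        super().__init__()
        g = channels // groups
        self.groups = groups
        self.convs = nn.ModuleList(
            [nn.Conv2d(g, g, 3, padding=1) for _ in range(groups)])
        self.fuse = nn.ModuleList(
            [nn.Conv2d(2 * g, g, 1) for _ in range(groups - 1)])  # adjacent-group mixers

    def forward(self, x):
        feats = [conv(c) for conv, c in zip(self.convs, x.chunk(self.groups, 1))]
        fused = [feats[0]]
        for i in range(1, self.groups):           # fuse each group with its neighbour
            fused.append(self.fuse[i - 1](torch.cat([fused[-1], feats[i]], dim=1)))
        return torch.cat(fused, dim=1)
```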
Collapse
|
29
|
A Deformable Convolutional Neural Network with Spatial-Channel Attention for Remote Sensing Scene Classification. REMOTE SENSING 2021. [DOI: 10.3390/rs13245076] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Remote sensing scene classification converts remote sensing images into classification information to support high-level applications, so it is a fundamental problem in the field of remote sensing. In recent years, many convolutional neural network (CNN)-based methods have achieved impressive results in remote sensing scene classification, but they have two problems in extracting remote sensing scene features: (1) fixed-shape convolutional kernels cannot effectively extract features from remote sensing scenes with complex shapes and diverse distributions; (2) the features extracted by CNNs contain a large amount of redundant and invalid information. To solve these problems, this paper constructs a deformable convolutional neural network that adapts the convolutional sampling positions to the shapes of objects in the remote sensing scene. Meanwhile, spatial and channel attention mechanisms are used to focus on the effective features while suppressing the invalid ones. The experimental results indicate that the proposed method is competitive with state-of-the-art methods on three remote sensing scene classification datasets (UCM, NWPU, and AID).
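The deformable-sampling idea can be illustrated with torchvision's `DeformConv2d`, which is a standard implementation of deformable convolution (not the paper's exact network); the offsets that bend the sampling grid are predicted by a small companion convolution. The attention modules the paper adds around this layer are omitted here.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel location, predicted from the input itself
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))  # sampling grid adapts to object shape

# x = torch.randn(1, 64, 32, 32)
# y = DeformableBlock(64, 128)(x)  # -> (1, 128, 32, 32)
```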
Collapse
|
30
|
Chen SB, Wei QS, Wang WZ, Tang J, Luo B, Wang ZY. Remote Sensing Scene Classification via Multi-Branch Local Attention Network. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 31:99-109. [PMID: 34793302 DOI: 10.1109/tip.2021.3127851] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Remote sensing scene classification (RSSC) has been a research hotspot in recent years and plays a very important role in remote sensing image interpretation. With the recent development of convolutional neural networks (CNNs), a significant breakthrough has been made in the classification of remote sensing scenes. Many objects form complex and diverse scenes through spatial combination and association, which makes remote sensing scene classification difficult. The feature representations extracted by CNNs still lack sufficient discrimination, mainly because of inter-class similarity and intra-class diversity. In this paper, we propose a remote sensing image scene classification method based on a Multi-Branch Local Attention Network (MBLANet), in which a Convolutional Local Attention Module (CLAM) is embedded into all downsampling blocks and residual blocks of a ResNet backbone. CLAM contains two submodules, a Convolutional Channel Attention Module (CCAM) and a Local Spatial Attention Module (LSAM). The two submodules are placed in parallel to obtain both channel and spatial attention, which helps to emphasize the main target against a complex background and improves the feature representation ability. Extensive experiments on three benchmark datasets show that our method outperforms state-of-the-art methods.
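A hedged sketch of a CLAM-style block with the two submodules in parallel. The internal layer choices (pooling, reduction ratio, kernel sizes) are illustrative assumptions; only the parallel channel-plus-spatial structure comes from the abstract.

```python
import torch
import torch.nn as nn

class CLAM(nn.Module):
    """Parallel channel attention (CCAM-like) and local spatial attention (LSAM-like)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ccam = nn.Sequential(                    # convolutional channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.lsam = nn.Sequential(                    # local spatial attention map
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        # both attentions are computed from the same input and applied jointly,
        # so the block can be dropped into ResNet's residual/downsampling paths
        return x * self.ccam(x) * self.lsam(x)
```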
Collapse
|
31
|
Review of Image Classification Algorithms Based on Convolutional Neural Networks. REMOTE SENSING 2021. [DOI: 10.3390/rs13224712] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Image classification has long been a hot research direction, and the emergence of deep learning has promoted the development of this field. Convolutional neural networks (CNNs) have gradually become the mainstream algorithm for image classification since 2012, and the CNN architectures applied to other visual recognition tasks (such as object detection, object localization, and semantic segmentation) are generally derived from network architectures developed for image classification. In the wake of these successes, CNN-based methods have emerged in remote sensing image scene classification and achieved advanced classification accuracy. In this review, which focuses on the application of CNNs to image classification tasks, we cover their development, from their predecessors up to recent state-of-the-art (SOTA) network architectures. Along the way, we analyze (1) the basic structure of artificial neural networks (ANNs) and the basic network layers of CNNs, (2) the classic predecessor network models, (3) the recent SOTA network algorithms, and (4) a comprehensive comparison of the image classification methods mentioned in this article. Finally, we summarize the main analysis and discussion and introduce some current trends.
Collapse
|
32
|
Remote Sensing Image Scene Classification Based on Global Self-Attention Module. REMOTE SENSING 2021. [DOI: 10.3390/rs13224542] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
The complexity of scene images makes research on remote sensing image scene classification challenging. With the wide application of deep learning in recent years, many remote sensing scene classification methods using convolutional neural networks (CNNs) have emerged. Current CNNs usually produce global information by integrating the deep features extracted by the convolutional layers through fully connected layers; however, the global information extracted in this way is not comprehensive. This paper proposes an improved remote sensing image scene classification method based on a global self-attention module to address this problem. The global information is derived from the deep features extracted by the CNN. To better express the semantic information of the remote sensing image, a multi-head self-attention module is introduced for global information augmentation. Meanwhile, a local perception unit is utilized to improve the self-attention module's representation of local objects. The proposed method's effectiveness is validated through comparative experiments with various training ratios and different scales on public datasets (UC Merced, AID, and NWPU-RESISC45). The precision of the proposed model is significantly higher than that of other remote sensing image scene classification methods.
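A minimal sketch of multi-head self-attention over CNN feature maps with a depthwise-convolution stand-in for the local perception unit. The residual wiring and the depthwise choice are assumptions; `channels` must be divisible by `heads` for `nn.MultiheadAttention`.

```python
import torch
import torch.nn as nn

class GlobalSelfAttention(nn.Module):
    def __init__(self, channels, heads=8):
        super().__init__()
        # local perception: depthwise 3x3 conv keeps per-channel locality
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W) from the CNN backbone
        x = x + self.local(x)                  # inject local context before attention
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)     # -> (B, H*W, C) token sequence
        out, _ = self.attn(seq, seq, seq)      # global multi-head self-attention
        return out.transpose(1, 2).reshape(b, c, h, w)
```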
Collapse
|
33
|
Remote Sensing Scene Image Classification Based on Dense Fusion of Multi-level Features. REMOTE SENSING 2021. [DOI: 10.3390/rs13214379] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
For remote sensing scene image classification, many convolutional neural networks improve classification accuracy at the cost of the time and space complexity of the model. This leads to slow running speeds and fails to realize a trade-off between model accuracy and running speed. Moreover, as the network deepens, it becomes difficult to extract key features with a simple double-branched structure, and shallow features are lost, which is unfavorable for classifying remote sensing scene images. To solve these problems, we propose a dual-branch multi-level feature dense fusion-based lightweight convolutional neural network (BMDF-LCNN). The network fully extracts the information of the current layer through 3 × 3 depthwise separable convolution, 1 × 1 standard convolution, and identity branches, and fuses it with features extracted from the previous layer by 1 × 1 standard convolution, thus avoiding the loss of shallow information as the network deepens. In addition, we propose a downsampling structure better suited to extracting the network's shallow features, in which a pooling branch downsamples and a convolution branch compensates for the pooled features. Experiments were carried out on four open and challenging remote sensing scene datasets. The results show that the proposed method achieves higher classification accuracy and lower model complexity than several state-of-the-art classification methods and realizes a trade-off between model accuracy and running speed.
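A hedged sketch of the two structural ideas above: a downsampling unit whose pooling branch is compensated by a strided-convolution branch, and a fusion block mixing depthwise-separable, 1x1, and identity branches with previous-layer features. Class names, the sum/concat fusion, and layer sizes are illustrative assumptions, not the published BMDF-LCNN layers; spatial sizes are assumed even.

```python
import torch
import torch.nn as nn

class CompensatedDownsample(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.proj = nn.Conv2d(in_ch, out_ch, 1)          # match channel count after pooling
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)

    def forward(self, x):
        return self.proj(self.pool(x)) + self.conv(x)    # conv branch compensates pooling

class DenseFusionBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.dw = nn.Sequential(                          # 3x3 depthwise separable conv
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch), nn.Conv2d(ch, ch, 1))
        self.pw = nn.Conv2d(ch, ch, 1)                    # 1x1 standard conv branch
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x, prev):
        # prev: previous-layer features already passed through a 1x1 conv,
        # same channel count and spatial size as x
        return self.fuse(torch.cat([self.dw(x) + x, self.pw(x), prev], dim=1))
```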
Collapse
|
34
|
DM-CTSA: a discriminative multi-focused and complementary temporal/spatial attention framework for action recognition. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05698-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
35
|
Dai Y, Li Y, Sun B, Liu LJ. Skip-connected network with gram matrix for product image retrieval. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
36
|
SAFFNet: Self-Attention-Based Feature Fusion Network for Remote Sensing Few-Shot Scene Classification. REMOTE SENSING 2021. [DOI: 10.3390/rs13132532] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
In real applications, it is necessary to classify new, unseen classes that are not available in the training dataset. To solve this problem, few-shot learning methods are usually adopted to recognize new categories from only a few (out-of-bag) labeled samples, together with the known classes available in the (large-scale) training dataset. Unlike common scene classification images obtained by CCD (charge-coupled device) cameras, remote sensing scene classification datasets tend to have plentiful texture features rather than shape features. It is therefore important to extract more valuable texture semantic features from a limited number of labeled input images. In this paper, a multi-scale feature fusion network for few-shot remote sensing scene classification, denoted SAFFNet, is proposed by integrating a novel self-attention feature selection module. Unlike a pyramidal feature hierarchy for object detection, informative representations of the images with different receptive fields are automatically selected and re-weighted for feature fusion after a refinement network and global pooling operation. Here, the feature weighting values can be fine-tuned on the support set of the few-shot learning task. The proposed model is evaluated on three publicly available datasets for few-shot remote sensing scene classification. Experimental results demonstrate that SAFFNet improves few-shot classification accuracy significantly compared with other few-shot methods and a typical multi-scale feature fusion network.
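The select-and-re-weight step can be sketched as a learned softmax weighting over globally pooled multi-scale features. This is a simplification under assumptions: one scalar score per scale and a weighted sum, whereas SAFFNet's refinement stage is more elaborate.

```python
import torch
import torch.nn as nn

class ScaleReweighting(nn.Module):
    """Fuse per-scale pooled features with learned, input-dependent weights."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Linear(channels, 1)          # one score per scale

    def forward(self, feats):                        # feats: list of (B, C) pooled vectors
        stack = torch.stack(feats, dim=1)            # (B, S, C), S = number of scales
        w = torch.softmax(self.score(stack), dim=1)  # (B, S, 1) per-scale weights
        return (w * stack).sum(dim=1)                # fused (B, C) representation

# feats = [gap(level) for level in pyramid]  # global average pooling per receptive field
# fused = ScaleReweighting(256)(feats)       # weights can be fine-tuned on the support set
```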
Collapse
|
37
|
Zhang L, Nie J, Wei W, Li Y, Zhang Y. Deep Blind Hyperspectral Image Super-Resolution. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:2388-2400. [PMID: 32639931 DOI: 10.1109/tnnls.2020.3005234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The production of a high spatial resolution (HR) hyperspectral image (HSI) through the fusion of a low spatial resolution (LR) HSI with an HR multispectral image (MSI) has underpinned much of the recent progress in HSI super-resolution. The premise of this progress is that both the degeneration from the HR HSI to the LR HSI in the spatial domain and the degeneration from the HR HSI to the HR MSI in the spectral domain are known in advance. However, such a premise is difficult to satisfy in practice. To address this problem, we propose to incorporate degeneration estimation into HSI super-resolution and present an unsupervised deep framework for "blind" HSI super-resolution in which the degenerations in both domains are unknown. In this framework, we model the latent HR HSI and the unknown degenerations with deep network structures to regularize them, instead of using handcrafted (or shallow) priors. Specifically, we generate the latent HR HSI with an image-specific generator network and structure the degenerations in the spatial and spectral domains with a convolution layer and a fully connected layer, respectively. The proposed framework can thus be formulated as an end-to-end deep network learning problem, which is supervised purely by the two input images (i.e., the LR HSI and HR MSI) and can be effectively solved by backpropagation. Experiments on both natural scene and remote sensing HSI datasets show the superior performance of the proposed method in coping with unknown degeneration in the spatial domain, the spectral domain, or both.
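A sketch of how the two unknown degenerations can be parameterised as learnable layers, matching the abstract's description (spatial: a convolution; spectral: a fully connected map). Kernel size, scale factor, and the per-band (depthwise) blur are assumptions; the HR-HSI generator network and training loop are omitted.

```python
import torch
import torch.nn as nn

class SpatialDegradation(nn.Module):             # HR HSI -> LR HSI
    def __init__(self, bands, scale=4, k=9):
        super().__init__()
        # one learnable blur kernel per band, with strided downsampling
        self.blur = nn.Conv2d(bands, bands, k, stride=scale,
                              padding=k // 2, groups=bands, bias=False)

    def forward(self, hr_hsi):                   # (B, bands, H, W)
        return self.blur(hr_hsi)

class SpectralDegradation(nn.Module):            # HR HSI -> HR MSI
    def __init__(self, bands, msi_bands=3):
        super().__init__()
        self.response = nn.Linear(bands, msi_bands, bias=False)  # spectral response

    def forward(self, hr_hsi):
        x = hr_hsi.permute(0, 2, 3, 1)           # apply response along the spectral axis
        return self.response(x).permute(0, 3, 1, 2)
```

Training would compare `SpatialDegradation(generator_output)` against the observed LR HSI and `SpectralDegradation(generator_output)` against the observed HR MSI, so only the two inputs supervise the whole pipeline.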
Collapse
|
38
|
A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification. REMOTE SENSING 2021. [DOI: 10.3390/rs13101950] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve classification performance, many studies have increased the depth and width of convolutional neural networks (CNNs) to extract deeper features, thereby increasing model complexity. To solve this problem, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. First, we propose two convolution combination modules for feature extraction, through which deep image features can be fully extracted by the cooperation of multiple convolutions. Then, feature weights are calculated, and the extracted deep features are passed to the attention mechanism for further refinement. Next, all extracted features are fused across multiple branches. Finally, depthwise separable convolution and asymmetric convolution are employed to greatly reduce the number of parameters. The experimental results show that, compared with several state-of-the-art methods, the proposed method retains a clear advantage in classification accuracy with very few parameters.
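The two parameter-saving convolutions the abstract names are standard constructions and can be sketched generically; how AMB-CNN arranges them into its combination modules is not shown here, and the helper names are illustrative.

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # per-channel spatial filter
        nn.Conv2d(in_ch, out_ch, 1))                          # pointwise channel mixing

def asymmetric(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, (1, 3), padding=(0, 1)),     # 1x3 followed by 3x1
        nn.Conv2d(out_ch, out_ch, (3, 1), padding=(1, 0)))    # replaces a full 3x3

# A plain 3x3 conv has 9*in*out weights; the separable version has 9*in + in*out
# and the asymmetric version roughly 6*in*out when in == out, which is where the
# parameter savings come from.
```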
Collapse
|
39
|
Ru L, Du B, Wu C. Multi-Temporal Scene Classification and Scene Change Detection With Correlation Based Fusion. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 30:1382-1394. [PMID: 33237858 DOI: 10.1109/tip.2020.3039328] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Classifying the land-use categories of multi-temporal scenes and detecting their semantic scene-level changes in remote sensing imagery covering urban regions directly reflects land-use transitions. Existing methods for scene change detection rarely focus on the temporal correlation of bi-temporal features and are mainly evaluated on small-scale scene change detection datasets. In this work, we propose a CorrFusion module that fuses the most highly correlated components of bi-temporal feature embeddings. We first extract deep representations of the bi-temporal inputs with deep convolutional networks. The extracted features are then projected into a lower-dimensional space to extract their most correlated components and to compute the instance-level correlation. Cross-temporal fusion is performed in the CorrFusion module based on the computed correlation, and the final scene classification results are obtained with softmax layers. In the objective function, we introduce a new formulation that calculates the temporal correlation more efficiently and stably, and we give a detailed derivation of the backpropagation gradients for the proposed module. We also present a much larger-scale scene change detection dataset with more semantic categories and conduct extensive experiments on it. The experimental results demonstrate that the proposed CorrFusion module remarkably improves both multi-temporal scene classification and scene change detection results.
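A hedged sketch of correlation-based fusion of bi-temporal embeddings: project both features to a shared low-dimensional space, compute a per-instance correlation, and mix the correlated component back in. The projection sizes, the cosine-style correlation, and the concat-and-project fusion are assumptions; the paper's CorrFusion objective and gradient derivation are more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrFusionSketch(nn.Module):
    def __init__(self, dim, proj_dim=64):
        super().__init__()
        self.p1 = nn.Linear(dim, proj_dim)
        self.p2 = nn.Linear(dim, proj_dim)
        self.out = nn.Linear(dim + proj_dim, dim)

    def forward(self, f1, f2):                       # f1, f2: (B, dim) bi-temporal features
        z1 = F.normalize(self.p1(f1), dim=1)         # project and unit-normalise
        z2 = F.normalize(self.p2(f2), dim=1)
        corr = (z1 * z2).sum(dim=1, keepdim=True)    # instance-level correlation in [-1, 1]
        fused = torch.cat([f1, corr * z2], dim=1)    # inject t2's correlated component into t1
        return self.out(fused)
```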
Collapse
|
40
|
Ensemble Learning Approaches Based on Covariance Pooling of CNN Features for High Resolution Remote Sensing Scene Classification. REMOTE SENSING 2020. [DOI: 10.3390/rs12203292] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Remote sensing image scene classification, which consists of labeling remote sensing images with a set of categories based on their content, has received remarkable attention for many applications such as land-use mapping. Standard approaches are based on multi-layer representations of first-order convolutional neural network (CNN) features. However, second-order CNNs have recently been shown to outperform traditional first-order CNNs on many computer vision tasks. Hence, the aim of this paper is to show the value of second-order statistics of CNN features for remote sensing scene classification. These take the form of covariance matrices computed locally or globally on the output of a CNN. Such datapoints do not lie in a Euclidean space but on a Riemannian manifold, so Euclidean tools are not suitable for manipulating them; other metrics, such as the log-Euclidean metric, should be considered. This consists of projecting the set of covariance matrices onto a tangent space defined at a reference point. In this tangent plane, which is a vector space, conventional machine learning algorithms such as Fisher vector encoding or SVM classifiers can be applied. Based on this log-Euclidean framework, we propose a novel transfer learning approach composed of two hybrid architectures built on covariance pooling of CNN features, the first local and the second global. They rely on extracting features from models pre-trained on the ImageNet dataset and processing them with machine learning algorithms. The first hybrid architecture is an ensemble learning approach with log-Euclidean Fisher vector encoding of region covariance matrices computed locally on the first layers of a CNN. The second is an ensemble learning approach based on covariance pooling of CNN features extracted globally from the deepest layers. These two ensemble learning approaches are then combined using the strategy of the most diverse ensembles. For validation and comparison, the proposed approach is tested on various challenging remote sensing datasets. Experimental results exhibit a significant gain of approximately 2% in overall accuracy compared with a similar state-of-the-art method based on covariance pooling of CNN features (on the UC Merced dataset).
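A minimal sketch of global covariance pooling with the log-Euclidean mapping: compute the channel covariance of deep features, then apply the matrix logarithm (via eigendecomposition) so the result lives in a vector space where ordinary classifiers apply. The `eps` regularisation is an illustrative choice to keep the matrix positive definite.

```python
import torch

def log_euclidean_pooling(feats, eps=1e-5):
    """feats: (B, C, H, W) CNN activations -> (B, C, C) log-covariance matrices."""
    b, c, h, w = feats.shape
    x = feats.flatten(2)                                 # (B, C, N), N = H*W positions
    x = x - x.mean(dim=2, keepdim=True)                  # centre each channel
    cov = x @ x.transpose(1, 2) / (h * w - 1)            # per-image channel covariance
    cov = cov + eps * torch.eye(c, device=feats.device)  # ensure positive definiteness
    e, v = torch.linalg.eigh(cov)                        # symmetric eigendecomposition
    return v @ torch.diag_embed(torch.log(e)) @ v.transpose(1, 2)

# desc = log_euclidean_pooling(backbone(images)).flatten(1)
# feed `desc` (or its upper triangle) to an SVM or Fisher vector encoder
```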
Collapse
|
41
|
A Multiscale Self-Adaptive Attention Network for Remote Sensing Scene Classification. REMOTE SENSING 2020. [DOI: 10.3390/rs12142209] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
High-resolution optical remote sensing image classification is an important research direction in the field of computer vision, and it is difficult to extract rich semantic information from remote sensing images containing many objects. In this paper, a multiscale self-adaptive attention network (MSAA-Net) is proposed for optical remote sensing image classification; it comprises multiscale feature extraction, adaptive information fusion, and classification. In the first part, two parallel convolution blocks with different receptive fields capture multiscale features. Then, a squeeze operation obtains global information and an excitation operation learns per-channel weights, which adaptively select useful information from the multiscale features. Finally, the high-level features are classified by several residual blocks with an attention mechanism and a fully connected layer. Experiments were conducted on the UC Merced, NWPU, and Google SIRI-WHU datasets. Compared with state-of-the-art methods, MSAA-Net shows strong effectiveness and robustness, with average accuracies of 94.52%, 95.01%, and 95.21% on the three widely used remote sensing datasets.
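A hedged sketch of the adaptive fusion step: two parallel branches with different receptive fields are concatenated, and a squeeze-and-excitation gate re-weights their channels so the useful scales are selected. Kernel sizes and the reduction ratio are assumptions; MSAA-Net's residual-attention classifier head is omitted.

```python
import torch
import torch.nn as nn

class AdaptiveMultiscaleFusion(nn.Module):
    def __init__(self, in_ch, out_ch, reduction=16):
        super().__init__()
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # small receptive field
        self.b5 = nn.Conv2d(in_ch, out_ch, 5, padding=2)   # large receptive field
        self.se = nn.Sequential(                            # squeeze and excitation
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * out_ch, 2 * out_ch // reduction), nn.ReLU(),
            nn.Linear(2 * out_ch // reduction, 2 * out_ch), nn.Sigmoid())

    def forward(self, x):
        feats = torch.cat([self.b3(x), self.b5(x)], dim=1)  # multiscale features
        w = self.se(feats).unsqueeze(-1).unsqueeze(-1)      # learned channel weights
        return feats * w                                    # adaptively selected fusion
```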
Collapse
|