1. Pan X, Jiao C, Yang B, Zhu H, Wu J. Attribute-guided feature fusion network with knowledge-inspired attention mechanism for multi-source remote sensing classification. Neural Netw 2025;187:107332. PMID: 40088832. DOI: 10.1016/j.neunet.2025.107332.
Abstract
Land use and land cover (LULC) classification is a popular research area in remote sensing. Single-modal data provide insufficient information for accurate classification, especially in complex scenes, while the complementarity of multi-modal data such as hyperspectral images (HSIs) and light detection and ranging (LiDAR) data can effectively improve classification performance. Attention mechanisms have recently been widely used in multi-modal LULC classification methods to achieve better feature representation. However, these methods insufficiently consider knowledge of the data, such as spectral mixing in HSIs and the inconsistent spatial scales of different categories in LiDAR data. Moreover, multi-modal features carry different physical attributes: HSI features represent the spectral information of many channels, while LiDAR features focus on elevation information in the spatial dimension. If these attributes are ignored, feature fusion may introduce redundant information and adversely affect classification. In this paper, we propose an attribute-guided feature fusion network with knowledge-inspired attention mechanisms, named AFNKA. Focusing on the spectral characteristics of HSI and the elevation information of LiDAR data, we design the knowledge-inspired attention mechanism to explore enhanced features. In particular, a novel adaptive cosine estimator (ACE)-based attention module is presented to learn more discriminative features, fully utilizing the spatial-spectral correlation of mixed pixels in HSI. In the fusion stage, two novel attribute-guided fusion modules are developed to selectively aggregate multi-modal features, sufficiently exploiting the correlations between the spatial-spectral property of HSI features and the spatial-elevation property of LiDAR features. Experimental results on several multi-source datasets quantitatively indicate that the proposed AFNKA significantly outperforms state-of-the-art methods.
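The abstract does not spell out the ACE attention module, but the classical adaptive cosine estimator it builds on is standard: ACE(x) = (s^T C^-1 x)^2 / ((s^T C^-1 s)(x^T C^-1 x)) for a target signature s and background covariance C. Below is a minimal NumPy sketch of per-pixel ACE scores that such an attention module could use as weights; the function names and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ace_scores(pixels, signature, background):
    """Classical adaptive cosine estimator (ACE) scores per pixel.

    pixels:     (N, B) HSI pixel spectra to be scored.
    signature:  (B,) target spectral signature.
    background: (M, B) spectra used to estimate the background statistics.
    Returns an (N,) array in [0, 1]; higher means the whitened pixel is
    better aligned with the whitened signature, i.e. less "mixed".
    """
    mu = background.mean(axis=0)
    cov = np.cov(background, rowvar=False) + 1e-6 * np.eye(background.shape[1])
    cov_inv = np.linalg.inv(cov)

    s = signature - mu
    x = pixels - mu
    sx = x @ cov_inv @ s                          # s^T C^-1 x, per pixel
    ss = s @ cov_inv @ s                          # s^T C^-1 s, scalar
    xx = np.einsum('nb,bc,nc->n', x, cov_inv, x)  # x^T C^-1 x, per pixel
    return sx ** 2 / (ss * xx + 1e-12)

# Toy usage: score ten 5-band pixels against a random signature; the
# scores could then gate or re-weight the corresponding features.
rng = np.random.default_rng(0)
scores = ace_scores(rng.normal(size=(10, 5)), rng.normal(size=5),
                    rng.normal(size=(200, 5)))
```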
Affiliation(s)
- Xiao Pan, School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Changzhe Jiao, School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Bo Yang, School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Hao Zhu, School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Jinjian Wu, School of Artificial Intelligence, Xidian University, Xi'an 710119, China
2. Geng X, Jiao L, Liu X, Li L, Chen P, Liu F, Yang S. A Spatial-Spectral Relation-Guided Fusion Network for Multisource Optical RS Image Classification. IEEE Trans Neural Netw Learn Syst 2025;36:8991-9004. PMID: 38954572. DOI: 10.1109/tnnls.2024.3413799.
Abstract
Multisource optical remote sensing (RS) image classification has attracted extensive research interest and demonstrated superiority. Existing approaches mainly improve classification performance by exploiting complementary information from multisource data. However, they are insufficient at effectively extracting data features and utilizing the correlations of multisource optical RS images. To this end, this article proposes a generalized spatial-spectral relation-guided fusion network (S2RGF-Net) for multisource optical RS image classification. First, we design spatial- and spectral-domain-specific feature encoders based on data characteristics to deeply explore the rich feature information of optical RS data. Subsequently, two relation-guided fusion strategies are proposed at two levels (intradomain and interdomain) to integrate multisource image information effectively. In intradomain feature fusion, an adaptive de-redundancy fusion module (ADRF) is introduced to eliminate redundancy so that the spatial and spectral features are complete and compact, respectively. In interdomain feature fusion, we construct a spatial-spectral joint attention module (SSJA) based on interdomain relationships to sufficiently enhance the complementary features and thereby facilitate the subsequent fusion. Experiments on various multisource optical RS datasets demonstrate that S2RGF-Net outperforms other state-of-the-art (SOTA) methods.
3. Dentamaro V, Giglio P, Impedovo D, Pirlo G, Ciano MD. An Interpretable Adaptive Multiscale Attention Deep Neural Network for Tabular Data. IEEE Trans Neural Netw Learn Syst 2025;36:6995-7009. PMID: 38748522. DOI: 10.1109/tnnls.2024.3392355.
Abstract
Deep learning (DL) has been demonstrated to be a valuable tool for analyzing signals such as sounds and images, thanks to its ability to automatically extract relevant patterns and its end-to-end training properties. When applied to tabular structured data, however, DL has exhibited some performance limitations compared to shallow learning techniques. This work presents a novel technique for tabular data called the adaptive multiscale attention deep neural network architecture (also named excited attention). By exploiting parallel multilevel feature weighting, adaptive multiscale attention successfully learns feature attention and thus achieves high F1-scores on seven different classification tasks (on small, medium, large, and very large datasets) and low mean absolute errors on four regression tasks of different sizes. In addition, adaptive multiscale attention provides four levels of explainability (i.e., comprehension of its learning process and therefore of its outcomes): 1) it calculates attention weights to determine which layers are most important for given classes; 2) it shows each feature's attention across all instances; 3) it exposes the learned feature attention for each class, so that feature attention and behavior can be explored for specific classes; and 4) it finds nonlinear correlations between co-behaving features to reduce dataset dimensionality and improve interpretability. These interpretability levels, in turn, make adaptive multiscale attention a useful tool for feature ranking and feature selection.
4. Sun K, Zhang J, Xu S, Zhao Z, Zhang C, Liu J, Hu J. CACNN: Capsule Attention Convolutional Neural Networks for 3D Object Recognition. IEEE Trans Neural Netw Learn Syst 2025;36:4091-4102. PMID: 37934641. DOI: 10.1109/tnnls.2023.3326606.
Abstract
Recently, view-based approaches, which recognize a 3D object through its projected 2D images, have been extensively studied and have achieved considerable success in 3D object recognition. Nevertheless, most of them use a pooling operation to aggregate view-wise features, which usually leads to visual information loss. To tackle this problem, we propose a novel layer called the capsule attention layer (CAL), which uses an attention mechanism to fuse the features expressed by capsules. In detail, instead of the dynamic routing algorithm, we use an attention module to transmit information from lower-level capsules to higher-level capsules, which markedly improves the speed of capsule networks. In particular, the view pooling layer of the multiview convolutional neural network (MVCNN) becomes a special case of our CAL when the trainable weights take certain values. Furthermore, based on CAL, we propose a capsule attention convolutional neural network (CACNN) for 3D object recognition. Extensive experimental results on three benchmark datasets demonstrate the efficiency of our CACNN and show that it outperforms many state-of-the-art methods.
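As a rough illustration of the layer (not the paper's exact implementation), the PyTorch sketch below replaces dynamic routing with a learned softmax attention over per-view capsule features; the class name and dimensions are assumptions. A near-one-hot weight vector reduces the layer to selecting a single view, which is how view pooling arises as a special case.

```python
import torch
import torch.nn as nn

class CapsuleAttentionLayer(nn.Module):
    """Attention-based aggregation of per-view capsule features."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, 1)   # learned score for each view capsule

    def forward(self, views):            # views: (batch, n_views, dim)
        weights = torch.softmax(self.query(views), dim=1)   # (B, V, 1)
        return (weights * views).sum(dim=1)                 # (B, dim)

# Usage: fuse 12 view descriptors of width 512 into one object feature.
cal = CapsuleAttentionLayer(512)
fused = cal(torch.randn(4, 12, 512))     # -> shape (4, 512)
```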
5. Zhang B, Chen Y, Xiong S, Lu X. Hyperspectral Image Classification via Cascaded Spatial Cross-Attention Network. IEEE Trans Image Process 2025;PP:899-913. PMID: 40031310. DOI: 10.1109/tip.2025.3533205.
Abstract
In hyperspectral images (HSIs), different land cover (LC) classes have distinct reflective characteristics at various wavelengths. Relying on only a few bands to distinguish all LC classes therefore often leads to information loss, resulting in poor average accuracy. To address this problem, we propose a Cascaded Spatial Cross-Attention Network (CSCANet) for HSI classification. We design a cascaded spatial cross-attention module, which first performs cross-attention on local and global features in the spatial context, then uses a group cascade structure to sequentially propagate important spatial regions within the different channels, and finally obtains joint attention features to improve the robustness of the network. We also design a two-branch feature separation structure based on spatial-spectral features to separate the tokens of different LC classes as far as possible, thereby improving their distinguishability. Extensive experiments demonstrate that our method achieves excellent classification accuracy and robustness. The source code can be obtained from https://github.com/WUTCM-Lab/CSCANet.
6. Wang J, Zhang M, Li W, Tao R. A Multistage Information Complementary Fusion Network Based on Flexible-Mixup for HSI-X Image Classification. IEEE Trans Neural Netw Learn Syst 2024;35:17189-17201. PMID: 37578909. DOI: 10.1109/tnnls.2023.3300903.
Abstract
Mixup-based data augmentation has been proven to benefit model regularization during training, especially in the remote-sensing field, where training data are scarce. However, Mixup-based methods ignore the target proportions in different inputs and keep the linear insertion ratio consistent, so the label space responds even when, owing to the randomness of the augmentation process, no effective objects are introduced into the mixed image. Moreover, although some previous works have attempted different multimodal interaction strategies, these do not extend well to various remote-sensing data combinations. To this end, a multistage information complementary fusion network based on flexible-mixup (Flex-MCFNet) is proposed for hyperspectral-X image classification. First, to bridge the gap between the mixed image and the label, a flexible-mixup (FlexMix) data augmentation strategy is designed, in which the label weight increases with the proportion of the corresponding input image, preventing invalid information from negatively affecting the label space. More importantly, to accommodate diverse remote-sensing inputs, including various modal supplements and their uncertainties, a multistage information complementary fusion network (MCFNet) is developed. After the features of hyperspectral and complementary modalities [X-modal, including multispectral, synthetic aperture radar (SAR), and light detection and ranging (LiDAR)] are extracted separately, information from the complementary modalities is fully exchanged and enhanced through multiple stages of complementation and fusion, which is then used for the final image classification. Extensive experimental results demonstrate that Flex-MCFNet not only effectively expands the training data, but also adequately regularizes different data combinations to achieve state-of-the-art performance.
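The abstract does not give the exact FlexMix weighting rule; the sketch below shows one plausible reading under stated assumptions: images are mixed linearly as in standard mixup, but the label coefficient is sharpened so that an input mixed in at a small ratio contributes even less to the label. The gamma parameter and the sharpening rule are assumptions, not the paper's schedule.

```python
import numpy as np

def flexmix(x1, y1, x2, y2, alpha=1.0, gamma=2.0):
    """Mixup variant whose label weight tracks the image mixing ratio.

    x1, x2: input images (same shape); y1, y2: one-hot label vectors.
    lam is drawn as in standard mixup; the label weight lam_y is a
    sharpened, renormalised version of lam, so nearly-absent content
    barely moves the label space.
    """
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2                 # linear image mixing
    w = lam ** gamma
    lam_y = w / (w + (1.0 - lam) ** gamma)          # sharpened label weight
    y = lam_y * y1 + (1.0 - lam_y) * y2
    return x, y
```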
7. Li W, Gao Y, Zhang M, Tao R, Du Q. Asymmetric Feature Fusion Network for Hyperspectral and SAR Image Classification. IEEE Trans Neural Netw Learn Syst 2023;34:8057-8070. PMID: 35180093. DOI: 10.1109/tnnls.2022.3149394.
Abstract
Joint classification using multisource remote sensing data for Earth observation is promising but challenging. Due to the gap in imaging mechanisms and the imbalanced information between multisource data, integrating their complementary merits for interpretation remains difficult. In this article, a classification method based on asymmetric feature fusion, named the asymmetric feature fusion network (AsyFFNet), is proposed. First, weight-sharing residual blocks are used for feature extraction while separate batch normalization (BN) layers are retained. In the training phase, the redundancy of each channel is determined automatically by its BN scaling factor; a channel is replaced by another when its scaling factor falls below a threshold. To eliminate unnecessary channels and improve generalization, a sparse constraint is imposed on a subset of the scaling factors. Besides, a feature calibration module is designed to exploit the spatial dependence of multisource features, so that the discrimination capability is enhanced. Experimental results on three datasets demonstrate that the proposed AsyFFNet significantly outperforms other competitive approaches.
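The scaling-factor mechanism is in the spirit of network slimming; a minimal PyTorch sketch of the sparse constraint and the redundancy test might look as follows. The penalty weight and threshold are assumptions, and the cross-source channel replacement step is omitted.

```python
import torch
import torch.nn as nn

def bn_sparsity_loss(model, weight=1e-4):
    """L1 penalty on BatchNorm scaling factors (network-slimming style).

    Added to the task loss, it drives the gamma of redundant channels
    toward zero so they can be detected and swapped out.
    """
    penalty = sum(m.weight.abs().sum()
                  for m in model.modules() if isinstance(m, nn.BatchNorm2d))
    return weight * penalty

def redundant_channels(bn, threshold=1e-2):
    """Boolean mask of channels whose BN scale fell below the threshold."""
    return bn.weight.detach().abs() < threshold
```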
8. Li X, Li Z, Qiu H, Chen G, Fan P. Soil carbon content prediction using multi-source data feature fusion of deep learning based on spectral and hyperspectral images. Chemosphere 2023;336:139161. PMID: 37302502. DOI: 10.1016/j.chemosphere.2023.139161.
Abstract
Visible near-infrared reflectance spectroscopy (VNIR) and hyperspectral images (HSI) have their respective advantages in soil carbon content prediction, and the effective fusion of VNIR and HSI is of great significance for improving prediction accuracy. However, analysis of the differing contributions of the multiple features in multi-source data has been inadequate, and in-depth research on the respective contributions of artificial and deep-learning features is lacking. To solve this problem, soil carbon content prediction methods based on VNIR and HSI multi-source feature fusion are proposed: a multi-source data fusion network under an attention mechanism, and a multi-source data fusion network with artificial features. In the attention-based network, information is fused through the attention mechanism according to the contribution of each feature; in the other network, artificial features are introduced to fuse the multi-source data. The results show that the attention-based multi-source fusion network improves the prediction accuracy of soil carbon content, and the network combined with artificial features predicts even better. Compared with the two single-source inputs (VNIR and HSI), the relative percent deviation achieved by the artificial-feature fusion network increases by 56.81% and 149.18% for Neilu, 24.28% and 43.96% for Aoshan Bay, and 31.16% and 28.73% for Jiaozhou Bay, respectively. This study addresses the deep fusion of multiple features in soil carbon content prediction from VNIR and HSI, improving the accuracy and stability of the prediction, promoting the application and development of spectral and hyperspectral imaging for soil carbon prediction, and providing technical support for studies of the carbon cycle and carbon sinks.
Affiliation(s)
- Xueying Li, Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao, 266061, China; College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266590, China
- Zongmin Li, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266590, China
- Huimin Qiu, Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao, 266061, China
- Guangyuan Chen, College of Ocean Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, China
- Pingping Fan, Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao, 266061, China
9. Zhang M, Li W, Zhang Y, Tao R, Du Q. Hyperspectral and LiDAR Data Classification Based on Structural Optimization Transmission. IEEE Trans Cybern 2023;53:3153-3164. PMID: 35560096. DOI: 10.1109/tcyb.2022.3169773.
Abstract
With the development of sensor technology, complementary data from different sources can easily be obtained for various applications. Despite the availability of adequate multisource observation data, for example, hyperspectral image (HSI) and light detection and ranging (LiDAR) data, existing methods may lack effective handling of structural information transmission and physical-property alignment, weakening the complementary ability of multiple sources in the collaborative classification task. The manner in which complementary information collaborates and the redundancy-exclusion operator need to be redesigned to strengthen the semantic relatedness of the sources. As a remedy, we propose a structural optimization transmission framework, namely the structural optimization transmission network (SOT-Net), for collaborative land-cover classification of HSI and LiDAR data. Specifically, SOT-Net is built from three key modules: 1) a cross-attention module; 2) a dual-mode propagation module; and 3) a dynamic structure optimization module. Based on these designs, SOT-Net can take full advantage of the reflectance-specific information of HSI and the detailed edge (structure) representations of multisource data. The inferred transmission plan, which integrates a self-alignment regularizer into the classification task, enhances the robustness of feature extraction and classification. Experiments show consistent outperformance of SOT-Net over baselines across three benchmark remote sensing datasets, and the results also demonstrate that the proposed framework yields satisfying classification results even with small training sets.
10. HA-RoadFormer: Hybrid Attention Transformer with Multi-Branch for Large-Scale High-Resolution Dense Road Segmentation. Mathematics 2022. DOI: 10.3390/math10111915.
Abstract
Road segmentation is one of the essential tasks in remote sensing. Large-scale high-resolution remote sensing images have much larger pixel dimensions than natural images, while existing Transformer-based models incur the high computational cost of quadratic complexity, leading to long training and inference times. Inspired by long-text Transformer models, this paper proposes a novel hybrid attention mechanism to improve the inference speed of the model. By calculating several diagonals and random blocks of the attention matrix, hybrid attention achieves time complexity linear in the token sequence length. By superposing adjacent and random attention, hybrid attention introduces an inductive bias similar to that of convolutional neural networks (CNNs) while retaining the ability to capture long-distance dependencies. In addition, dense road segmentation results on remote sensing images still suffer from insufficient continuity, and multiscale feature representation is an effective remedy in CNN-based networks. Inspired by this, we propose a multi-scale patch embedding module, which divides images into patches at different scales to obtain coarse-to-fine feature representations. Experiments on the Massachusetts dataset show that the proposed HA-RoadFormer effectively preserves the integrity of road segmentation results, achieving a higher road-segmentation Intersection over Union (IoU) of 67.36% than other state-of-the-art (SOTA) methods. At the same time, inference speed is greatly improved compared with other Transformer-based models.
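A hedged sketch of the banded-plus-random attention pattern described above: each block of tokens attends to a diagonal band of neighboring blocks plus a few random blocks, so the number of attended positions grows linearly with sequence length. Block size, window, and random-block count here are illustrative assumptions, not the paper's settings.

```python
import torch

def hybrid_attention_mask(n_tokens, window=1, n_random=2, block=16, seed=0):
    """Boolean mask (True = attend) mixing banded and random block attention."""
    n_blocks = (n_tokens + block - 1) // block
    g = torch.Generator().manual_seed(seed)
    blk = torch.zeros(n_blocks, n_blocks, dtype=torch.bool)
    for i in range(n_blocks):
        lo, hi = max(0, i - window), min(n_blocks, i + window + 1)
        blk[i, lo:hi] = True                                  # diagonal band
        blk[i, torch.randint(0, n_blocks, (n_random,), generator=g)] = True
    mask = blk.repeat_interleave(block, 0).repeat_interleave(block, 1)
    return mask[:n_tokens, :n_tokens]

# Usage: a 1024-token mask to pass to a masked attention implementation.
mask = hybrid_attention_mask(1024)        # (1024, 1024), mostly False
```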
11. Building Damage Assessment Based on Siamese Hierarchical Transformer Framework. Mathematics 2022. DOI: 10.3390/math10111898.
Abstract
The rapid and accurate damage assessment of buildings plays a critical role in disaster response. Effective building damage level assessment can be conducted from pairs of pre- and post-disaster remote sensing images. However, most existing methods are based on convolutional neural networks, which have limited ability to learn global context. An attention mechanism helps ameliorate this problem, and hierarchical Transformers, with their strong global modeling capability, have powerful potential in the remote sensing field. In this paper, we propose a novel two-stage damage assessment framework called SDAFormer, which embeds a symmetric hierarchical Transformer into a siamese U-Net-like network. In the first stage, the pre-disaster image is fed into a segmentation network for building localization. In the second stage, a two-branch damage classification network is established based on weights shared from the first stage. Pre- and post-disaster images are then delivered to the network separately for damage assessment. Moreover, a spatial fusion module is designed to improve feature representation capability by building pixel-level correlations, which establishes spatial information in Swin Transformer blocks. The proposed framework achieves significant improvement on the large-scale building damage assessment dataset xBD.
12. Monitoring the Invasive Plant Spartina alterniflora in Jiangsu Coastal Wetland Using MRCNN and Long-Time Series Landsat Data. Remote Sensing 2022. DOI: 10.3390/rs14112630.
Abstract
Jiangsu coastal wetland has the largest area of the invasive plant Spartina alterniflora (S. alterniflora) in China. S. alterniflora has been present in the wetland for nearly 40 years and poses a substantial threat to the safety of coastal wetland ecosystems, so there is an urgent need to control its distribution. The biological characteristics of the S. alterniflora invasion process give rise to a multi-scale distribution. However, current classification methods do not deal successfully with multi-scale problems, and high-precision land cover classification on multi-temporal remote sensing images is also difficult. In this study, based on Landsat data from 1990 to 2020, a new deep learning multi-scale residual convolutional neural network (MRCNN) model was developed to identify S. alterniflora. In this method, features at different scales are extracted and concatenated to obtain multi-scale information, and residual connections are introduced to ensure gradient propagation. A unified training method over multi-year data was adopted to improve the temporal scalability of the MRCNN. The MRCNN model identified the annual S. alterniflora distribution more accurately, overcame the limitation that traditional CNNs extract feature information at only a single scale, and offered significant advantages in spatial characterization. A thematic map of the S. alterniflora distribution was obtained. Since its introduction in 1982, S. alterniflora has expanded to approximately 17,400 ha. In Jiangsu, its expansion over time divides into three stages: the growth period (1982–1994), the outbreak period (1995–2004), and the plateau period (2005–2020). The spatial expansion direction was mainly parallel and perpendicular to the coastline. The hydrodynamic conditions and tidal flat environment on the coast of Jiangsu Province are suitable for the growth of S. alterniflora, and reclamation of tidal flats is the main factor affecting its expansion.
13. Xue Z, Tan X, Yu X, Liu B, Yu A, Zhang P. Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification. IEEE Trans Image Process 2022;31:3095-3110. PMID: 35404817. DOI: 10.1109/tip.2022.3162964.
Abstract
In this study, we develop a novel deep hierarchical vision transformer (DHViT) architecture for the joint classification of hyperspectral and light detection and ranging (LiDAR) data. Current classification methods have limitations in heterogeneous feature representation and information fusion of multi-modality remote sensing data (e.g., hyperspectral and LiDAR data); these shortcomings restrict the collaborative classification accuracy of remote sensing data. The proposed architecture exploits both the powerful long-range dependency modeling capability and the strong cross-domain generalization ability of the transformer network, which is based exclusively on the self-attention mechanism. Specifically, the spectral sequence transformer handles long-range dependencies along the spectral dimension of hyperspectral images, because all diagnostic spectral bands contribute to land cover classification. Thereafter, we utilize the spatial hierarchical transformer structure to extract hierarchical spatial features from hyperspectral and LiDAR data, which are also crucial for classification. Furthermore, the cross attention (CA) feature fusion pattern adaptively and dynamically fuses heterogeneous features from multi-modality data, and this context-aware fusion mode further improves the collaborative classification performance. Comparative experiments and ablation studies are conducted on three benchmark hyperspectral and LiDAR datasets, on which the DHViT model yields average overall classification accuracies of 99.58%, 99.55%, and 96.40%, respectively, sufficiently certifying the effectiveness and superior performance of the proposed method.
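The CA fusion pattern can be sketched generically with standard multi-head attention, each modality querying the other before a residual merge. This is a minimal sketch of the pattern, not DHViT's exact block; all names and dimensions are assumed.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Mutual cross-attention between two modality token sequences."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.h2l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.l2h = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hsi, lidar):        # each: (batch, tokens, dim)
        h, _ = self.h2l(hsi, lidar, lidar)    # HSI tokens query LiDAR
        l, _ = self.l2h(lidar, hsi, hsi)      # LiDAR tokens query HSI
        return self.norm(hsi + h), self.norm(lidar + l)

# Usage: enhance 64 HSI tokens and 64 LiDAR tokens of width 128.
fuse = CrossAttentionFusion(128)
h_out, l_out = fuse(torch.randn(2, 64, 128), torch.randn(2, 64, 128))
```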
14. DBMF: A Novel Method for Tree Species Fusion Classification Based on Multi-Source Images. Forests 2021. DOI: 10.3390/f13010033.
Abstract
Multi-source remote sensing data provide innovative technical support for tree species recognition. Despite noteworthy advancements in image fusion methods, tree species recognition remains relatively poor because the multi-source features of each pixel in the same region cannot be deeply exploited. In the present paper, a novel deep learning approach for hyperspectral imagery is proposed to improve the accuracy of tree species classification. The proposed method, named the double branch multi-source fusion (DBMF) method, determines the relationship between multi-source data more deeply and provides more effective information. The DBMF method does this by fusing spectral features extracted from a hyperspectral image (HSI) captured by the HJ-1A satellite and spatial features extracted from a multispectral image (MSI) captured by the Sentinel-2 satellite. The network has two branches: in the spatial branch, to avoid the risk of information loss, sandglass blocks are embedded into a convolutional neural network (CNN) to extract the corresponding spatial neighborhood features from the MSI; in the spectral branch, to make useful spectral feature transfer more effective, bidirectional long short-term memory (Bi-LSTM) with a triple attention mechanism is employed to extract the spectral features of each pixel in the low-resolution HSI. The feature information is fused to classify the tree species after the addition of a fusion activation function, which allows the network to obtain more interactive information. Finally, the fusion strategy allows for the prediction of the full classification map of three study areas. Experimental results on a multi-source dataset show that DBMF has a significant advantage over other state-of-the-art frameworks.