1. Qi J, Gong Z, Liu X, Chen C, Zhong P. Masked Spatial-Spectral Autoencoders Are Excellent Hyperspectral Defenders. IEEE Transactions on Neural Networks and Learning Systems 2025;36:3012-3026. [PMID: 38163309] [DOI: 10.1109/tnnls.2023.3345734]
Abstract
Deep learning (DL) methodology has contributed substantially to the development of the hyperspectral image (HSI) analysis community. However, it also makes HSI analysis systems vulnerable to adversarial attacks. To this end, we propose a masked spatial-spectral autoencoder (MSSA) in this article, grounded in self-supervised learning theory, for enhancing the robustness of HSI analysis systems. First, a masked sequence attention learning (MSAL) module is designed to promote the inherent robustness of HSI analysis systems along the spectral channel. Then, we develop a graph convolutional network (GCN) with a learnable graph structure to establish global pixel-wise combinations. In this way, the attack effect is dispersed across all the related pixels in each combination, and better defense performance is achievable in the spatial aspect. Finally, to improve defense transferability and address the problem of limited labeled samples, MSSA employs spectral reconstruction as a pretext task and fits the datasets in a self-supervised manner. Comprehensive experiments over three benchmarks verify the effectiveness of MSSA in comparison with state-of-the-art hyperspectral classification methods and representative adversarial defense strategies.
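As a rough illustration of the masked-reconstruction pretext task described in this abstract, here is a minimal numpy sketch. It is not the paper's model: the array sizes, the fixed linear encoder/decoder, and the masking ratio are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HSI pixel spectra: 64 pixels x 100 spectral bands (hypothetical sizes).
spectra = rng.normal(size=(64, 100))

def mask_bands(x, ratio=0.5, rng=rng):
    """Zero out a random subset of spectral bands (the MAE-style pretext task)."""
    n_bands = x.shape[-1]
    n_masked = int(ratio * n_bands)
    masked_idx = rng.choice(n_bands, size=n_masked, replace=False)
    x_masked = x.copy()
    x_masked[..., masked_idx] = 0.0
    return x_masked, masked_idx

x_masked, masked_idx = mask_bands(spectra)

# A stand-in linear "autoencoder": project to a low-dimensional code and back.
W_enc = rng.normal(scale=0.1, size=(100, 16))
W_dec = rng.normal(scale=0.1, size=(16, 100))
recon = x_masked @ W_enc @ W_dec

# Reconstruction loss only on the masked bands, as in masked-autoencoder training.
loss = np.mean((recon[:, masked_idx] - spectra[:, masked_idx]) ** 2)
```

In the real method this reconstruction objective is what lets the network be pretrained without labels; here the weights are random, so only the data flow is shown.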
2. Gao J, Jiao L, Liu X, Li L, Chen P, Liu F, Yang S. Multiscale Dynamic Curvelet Scattering Network. IEEE Transactions on Neural Networks and Learning Systems 2024;35:7999-8012. [PMID: 36427283] [DOI: 10.1109/tnnls.2022.3223212]
Abstract
The feature representation learning process largely determines the performance of networks in classification tasks. Combining multiscale geometric tools with networks enables better representation and learning; however, relatively fixed geometric features and multiscale structures are typically used. In this article, we propose a more flexible framework called the multiscale dynamic curvelet scattering network (MSDCCN). This data-driven dynamic network is based on multiscale geometric prior knowledge. First, multiresolution scattering and multiscale curvelet features are efficiently aggregated at different levels. Then, these features can be reused in networks flexibly and dynamically, depending on a multiscale intervention flag. The initial value of this flag is based on a complexity assessment, and it is updated according to feature-sparsity statistics on the pretrained model. With the multiscale dynamic reuse structure, feature representation learning can be improved during subsequent training, and multistage fine-tuning can be performed to further improve classification accuracy. Furthermore, a novel, more flexible multiscale dynamic curvelet scattering module is developed that can be embedded into other networks. Extensive experimental results show that MSDCCN achieves better classification accuracies. In addition, the necessary evaluation experiments have been performed, including convergence, insight, and adaptability analyses.
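The sparsity-driven intervention flag described above can be sketched in a few lines of numpy. This is a hypothetical illustration only: the threshold, the feature shapes, and the zeroing step that fakes a sparse pretrained feature map are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pretrained feature maps: (channels, height, width).
features = rng.normal(size=(8, 16, 16))
features[features < 0.8] = 0.0  # make the activations sparse, as after a ReLU

def sparsity(f):
    """Fraction of near-zero activations in a feature map."""
    return float(np.mean(np.abs(f) < 1e-6))

# Intervention flag: inject/reuse multiscale curvelet-scattering features only
# when the learned features are too sparse to be informative on their own.
THRESHOLD = 0.5  # assumed hyperparameter
flag = sparsity(features) > THRESHOLD
```

The point of the sketch is the decision rule, not the statistic itself; the paper updates this flag from sparsity statistics on a pretrained model.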
3. Zhang Y, Li W, Zhang M, Wang S, Tao R, Du Q. Graph Information Aggregation Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2024;35:1912-1925. [PMID: 35771788] [DOI: 10.1109/tnnls.2022.3185795]
Abstract
Most domain adaptation (DA) methods in cross-scene hyperspectral image classification focus on cases where source data (SD) and target data (TD) with the same classes are obtained by the same sensor. However, classification performance is significantly reduced when there are new classes in TD. In addition, domain alignment, one of the main approaches in DA, is typically carried out based on local spatial information, rarely taking into account nonlocal spatial information (nonlocal relationships) with strong correspondence. A graph information aggregation cross-domain few-shot learning (Gia-CFSL) framework is proposed to make up for these shortcomings by combining FSL with domain alignment based on graph information aggregation. SD with all labeled samples and TD with a few labeled samples are used for FSL episodic training. Meanwhile, an intradomain distribution extraction block (IDE-block) and a cross-domain similarity aware block (CSA-block) are designed: the IDE-block characterizes and aggregates the intradomain nonlocal relationships, while the CSA-block captures the interdomain feature and distribution similarities. Furthermore, feature-level and distribution-level cross-domain graph alignments are used to mitigate the impact of domain shift on FSL. Experimental results on three public HSI datasets demonstrate the superiority of the proposed method. The code is available at https://github.com/YuxiangZhang-BIT/IEEE_TNNLS_Gia-CFSL.
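Episodic few-shot training of the kind described above is usually built on prototype-style classification. Below is a minimal numpy sketch of one episode under assumed dimensions and a nearest-prototype rule; it illustrates the episodic setup only, not the Gia-CFSL architecture or its graph alignment.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy episode: 3 classes, 5 labeled "support" samples and 10 "query" samples
# per class, each a 20-dimensional embedding (all sizes are illustrative).
n_class, n_support, n_query, dim = 3, 5, 10, 20
centers = rng.normal(scale=3.0, size=(n_class, dim))
support = centers[:, None, :] + rng.normal(size=(n_class, n_support, dim))
query = centers[:, None, :] + rng.normal(size=(n_class, n_query, dim))

# Class prototypes: the mean of each class's support embeddings.
prototypes = support.mean(axis=1)                       # (n_class, dim)

# Classify each query by its nearest prototype (Euclidean distance).
q = query.reshape(-1, dim)                              # (n_class*n_query, dim)
dists = np.linalg.norm(q[:, None, :] - prototypes[None], axis=-1)
pred = dists.argmin(axis=1)
truth = np.repeat(np.arange(n_class), n_query)
accuracy = float(np.mean(pred == truth))
```

In the cross-domain setting, the support set mixes all SD labels with the few available TD labels, and the embedding itself is what the graph-alignment losses shape.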
4. Zhu Z, Parker W, Wong A. Leveraging deep learning for automatic recognition of microplastics (MPs) via focal plane array (FPA) micro-FT-IR imaging. Environmental Pollution 2023;337:122548. [PMID: 37757933] [DOI: 10.1016/j.envpol.2023.122548] [Received: 04/04/2023] [Revised: 08/14/2023] [Accepted: 09/12/2023]
Abstract
The fast and accurate identification of MPs in environmental samples is essential for understanding the fate and transport of MPs in ecosystems. Recognizing MPs in environmental samples through spectral classification with conventional library-search routines can be challenging due to the presence of additives, surface modification, and adsorbed contaminants. Further, the thickness of MPs also affects the shape of spectra when FTIR spectra are collected in transmission mode. To overcome these challenges, PlasticNet, a deep learning convolutional neural network architecture, was developed for enhanced MP recognition. Once trained with more than 8000 spectra of virgin plastic, PlasticNet successfully classified 11 types of common plastic with accuracy higher than 95%. The identification errors indicated by a confusion matrix were found to be caused by edge effects, the molecular similarity of plastics, and the contamination of standards. When trained with spectra of virgin plastic, PlasticNet showed good performance (92%+) in recognizing spectra of increased complexity due to additives and weathering, and retraining with more complex spectra further enhanced its capability to recognize them. PlasticNet was also able to identify MPs successfully despite spectral variations caused by differences in MP thickness. When compared with the library search on the same complex dataset collected from an environmental sample, PlasticNet achieved comparable performance in identifying PP MPs, but a 17.3% improvement. PlasticNet has the potential to become a standard approach for the rapid and accurate automatic recognition of MPs in environmental samples analyzed by FPA FT-IR imaging.
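For contrast with PlasticNet, the conventional library-search baseline mentioned above can be sketched as a correlation match against reference spectra. This is a toy numpy illustration: the Gaussian peak positions are placeholders, not real FT-IR band assignments, and best-Pearson-correlation is just one common matching criterion.

```python
import numpy as np

rng = np.random.default_rng(3)

wavenumbers = np.linspace(400, 4000, 200)

def gaussian_peak(center, width=60.0):
    """A synthetic absorption band, standing in for a real spectral feature."""
    return np.exp(-((wavenumbers - center) ** 2) / (2 * width ** 2))

# Hypothetical reference spectra for three polymers (peak positions are
# illustrative only).
library = {
    "PE": gaussian_peak(2900) + gaussian_peak(1470),
    "PP": gaussian_peak(2950) + gaussian_peak(1375),
    "PS": gaussian_peak(3030) + gaussian_peak(700),
}

def library_search(spectrum):
    """Classic library search: pick the reference with the best Pearson correlation."""
    scores = {name: float(np.corrcoef(spectrum, ref)[0, 1])
              for name, ref in library.items()}
    return max(scores, key=scores.get), scores

# A "weathered" PP spectrum: the reference plus noise and a baseline offset.
measured = library["PP"] + 0.05 * rng.normal(size=200) + 0.1
best, scores = library_search(measured)
```

The abstract's point is exactly where this baseline breaks down: additives, weathering, and thickness effects distort the measured spectrum until correlation against clean references becomes unreliable.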
Affiliation(s)
- Ziang Zhu: Department of Systems Design Engineering, University of Waterloo, 200 University Ave W, Waterloo, ON, N2L 3G1, Canada.
- Wayne Parker: Department of Systems Design Engineering, University of Waterloo, 200 University Ave W, Waterloo, ON, N2L 3G1, Canada.
- Alexander Wong: Department of Civil and Environmental Engineering, University of Waterloo, 200 University Ave W, Waterloo, ON, N2L 3G1, Canada.
5. Zhang M, Li W, Zhang Y, Tao R, Du Q. Hyperspectral and LiDAR Data Classification Based on Structural Optimization Transmission. IEEE Transactions on Cybernetics 2023;53:3153-3164. [PMID: 35560096] [DOI: 10.1109/tcyb.2022.3169773]
Abstract
With the development of sensor technology, complementary data from different sources can be easily obtained for various applications. Despite the availability of adequate multisource observation data, for example, hyperspectral image (HSI) and light detection and ranging (LiDAR) data, existing methods may lack effective processing of structural-information transmission and physical-property alignment, weakening the complementary ability of multiple sources in the collaborative classification task. The complementary-information collaboration manner and the redundancy-exclusion operator need to be redesigned to strengthen the semantic relatedness of multiple sources. As a remedy, we propose a structural optimization transmission framework, namely, the structural optimization transmission network (SOT-Net), for collaborative land-cover classification of HSI and LiDAR data. Specifically, SOT-Net is built from three key modules: 1) a cross-attention module; 2) a dual-modes propagation module; and 3) a dynamic structure optimization module. Based on these designs, SOT-Net can take full advantage of the reflectance-specific information of HSI and the detailed edge (structure) representations of multisource data. The inferred transmission plan, which integrates a self-alignment regularizer into the classification task, enhances the robustness of feature extraction and classification. Experiments show consistent outperformance of SOT-Net over baselines across three benchmark remote sensing datasets, and the results also demonstrate that the proposed framework yields satisfying classification results even with small training sets.
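The cross-attention module named above follows the standard attention pattern. Here is a minimal single-head numpy sketch of HSI tokens attending to LiDAR tokens; there are no learned projections, all sizes are illustrative, and this is not the SOT-Net implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy tokens: 50 pixels, each with a 32-dimensional HSI (spectral) feature
# and a 32-dimensional LiDAR (structure/elevation) feature.
n, d = 50, 32
hsi = rng.normal(size=(n, d))
lidar = rng.normal(size=(n, d))

# Cross-attention: HSI tokens act as queries over the LiDAR tokens, so each
# spectral feature is refined by structural context from the whole scene.
scores = hsi @ lidar.T / np.sqrt(d)   # (n, n) scaled dot-product logits
attn = softmax(scores, axis=-1)       # each row is a distribution over LiDAR tokens
fused = attn @ lidar                  # LiDAR context aggregated per HSI token
```

In a full model the queries, keys, and values would each pass through learned projections, and the fused features would feed the downstream classifier.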
6. Jiang K, Xie W, Lei J, Li Z, Li Y, Jiang T, Du Q. E2E-LIADE: End-to-End Local Invariant Autoencoding Density Estimation Model for Anomaly Target Detection in Hyperspectral Image. IEEE Transactions on Cybernetics 2022;52:11385-11396. [PMID: 34077380] [DOI: 10.1109/tcyb.2021.3079247]
Abstract
Hyperspectral anomaly target detection (also known as hyperspectral anomaly detection, HAD) is a technique aiming to identify samples with atypical spectra. Although some density-estimation-based methods have been developed, they may suffer from two issues: 1) separated two-stage optimization with inconsistent objective functions prevents the representation learning model from digging out characterization customized for HAD; and 2) incapability of learning a low-dimensional representation that preserves the inherent information of the original high-dimensional spectral space. To address these problems, we propose a novel end-to-end local invariant autoencoding density estimation (E2E-LIADE) model. To satisfy the manifold assumption, E2E-LIADE introduces a local invariant autoencoder (LIA) to capture the intrinsic low-dimensional manifold embedded in the original space. An augmented low-dimensional representation (ALDR) is generated by concatenating the local invariant, constrained by a graph regularizer, with the reconstruction error. In particular, an end-to-end (E2E) multidistance measure, including mean-squared error (MSE) and orthogonal projection divergence (OPD), is imposed on the LIA with respect to the hyperspectral data. More importantly, E2E-LIADE simultaneously optimizes the ALDR of the LIA and a density estimation network in an E2E manner to avoid the model being trapped in a local optimum, resulting in an energy map in which each pixel represents a negative log-likelihood for the spectrum. Finally, a postprocessing procedure is conducted on the energy map to suppress the background. The experimental results demonstrate that, compared to the state of the art, the proposed E2E-LIADE offers more satisfactory performance.
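The idea of an energy map in which each pixel scores a negative log-likelihood can be illustrated with a classical Gaussian background model, i.e. a Mahalanobis-distance (RX-style) stand-in. This numpy sketch is not the E2E-LIADE network; the scene sizes and the Gaussian assumption are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy scene: 500 background spectra plus 5 anomalous ones (30 bands each).
background = rng.normal(size=(500, 30))
anomalies = rng.normal(loc=4.0, size=(5, 30))
pixels = np.vstack([background, anomalies])

# Gaussian background model: the squared Mahalanobis distance is, up to
# constants, the negative log-likelihood of each spectrum.
mu = pixels.mean(axis=0)
cov = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(30)  # ridge for stability
cov_inv = np.linalg.inv(cov)
diff = pixels - mu
energy = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # per-pixel energy

# Pixels with the highest energy are flagged as anomaly candidates.
top5 = set(np.argsort(energy)[-5:])
```

A learned density network replaces the Gaussian here; the output semantics (high energy = unlikely under the background) are the same.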
7. Gong Z, Hu W, Du X, Zhong P, Hu P. Deep Manifold Embedding for Hyperspectral Image Classification. IEEE Transactions on Cybernetics 2022;52:10430-10443. [PMID: 33872180] [DOI: 10.1109/tcyb.2021.3069790]
Abstract
Deep learning methods play an increasingly important role in hyperspectral image classification. However, general deep learning methods mainly exploit samplewise information to formulate the training loss while ignoring the intrinsic data structure of each class. Due to the high spectral dimension and great redundancy between different spectral channels in the hyperspectral image, such training losses usually do not work well for deep representation of the image. To tackle this problem, this work develops a novel deep manifold embedding method (DMEM) for deep learning in hyperspectral image classification. First, each class in the image is modeled as a specific nonlinear manifold, and the geodesic distance is used to measure the correlation between samples. Then, based on hierarchical clustering, the manifold structure of the data is captured and each nonlinear data manifold is divided into several subclasses. Finally, considering the distribution of each subclass and the correlation between different subclasses under the data manifold, DMEM is constructed as a novel training loss that incorporates this classwise information in the training process and obtains a discriminative representation of the hyperspectral image. Experiments over four real-world hyperspectral image datasets demonstrate the effectiveness of the proposed method compared with general sample-based losses and show its superiority over state-of-the-art methods.
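The subclass-splitting step (dividing one class's manifold into clumps via clustering over distances) can be sketched with a single-linkage-style split: connect samples closer than a threshold and take connected components as subclasses. This numpy toy uses plain Euclidean distance as a cheap stand-in for geodesic distance, and the threshold and data are illustrative assumptions, not DMEM itself.

```python
import numpy as np

rng = np.random.default_rng(6)

# One "class" whose samples actually lie in two separated clumps (subclasses).
a = rng.normal(loc=0.0, scale=0.2, size=(20, 5))
b = rng.normal(loc=3.0, scale=0.2, size=(20, 5))
samples = np.vstack([a, b])

# Pairwise Euclidean distances (a stand-in for geodesic distance on the manifold).
d = np.linalg.norm(samples[:, None] - samples[None], axis=-1)

# Single-linkage style split: link samples closer than a threshold, then label
# connected components via depth-first search (threshold is an assumed knob).
adj = d < 1.5
labels = -np.ones(len(samples), dtype=int)
cur = 0
for i in range(len(samples)):
    if labels[i] >= 0:
        continue
    stack = [i]
    labels[i] = cur
    while stack:
        j = stack.pop()
        for k in np.flatnonzero(adj[j] & (labels < 0)):
            labels[k] = cur
            stack.append(k)
    cur += 1
```

In DMEM, each such subclass then gets its own term in the training loss, so the network preserves within-class structure instead of collapsing it.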
8. Jia S, Jiang S, Zhang S, Xu M, Jia X. Graph-in-Graph Convolutional Network for Hyperspectral Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2022;PP:1157-1171. [PMID: 35724277] [DOI: 10.1109/tnnls.2022.3182715]
Abstract
With the development of hyperspectral sensors, accessible hyperspectral images (HSIs) are increasing, and pixel-oriented classification has attracted much attention. Recently, graph convolutional networks (GCNs) have been proposed to process graph-structured data in non-Euclidean domains and have been employed in HSI classification. However, most GCN-based methods struggle to sufficiently exploit the information of ground objects due to feature aggregation. To solve this issue, in this article we propose a graph-in-graph (GiG) model and a related GiG convolutional network (GiGCN) for HSI classification from a superpixel viewpoint. The GiG representation covers information inside and outside superpixels, corresponding respectively to the local and global characteristics of ground objects. Concretely, after segmenting the HSI into disjoint superpixels, each one is converted to an internal graph. Meanwhile, an external graph is constructed according to the spatial adjacency relationships among superpixels. Significantly, each node in the external graph embeds a corresponding internal graph, forming the so-called GiG structure. Then, GiGCN, composed of internal and external graph convolution (EGC), is designed to extract hierarchical features and integrate them over multiple scales, improving the discriminability of GiGCN. Ensemble learning is incorporated to further boost the robustness of GiGCN. It is worth noting that we are the first to propose the GiG framework from the superpixel viewpoint and the GiGCN scheme for HSI classification. Experimental results on four benchmark datasets demonstrate that the proposed method is effective and feasible for HSI classification with limited labeled samples. For study replication, the code developed for this study is available at https://github.com/ShuGuoJ/GiGCN.git.
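The external-graph convolution over superpixels follows the standard GCN propagation rule H = D^{-1/2}(A + I)D^{-1/2} X W. Here is a minimal numpy sketch of one such layer; the toy chain adjacency and one-hot node features are assumptions for illustration, not the GiGCN implementation.

```python
import numpy as np

# Toy external graph over 6 "superpixels": a chain of spatial neighbors.
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)
X = np.eye(6)  # one-hot node features, for illustration only

# Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(6)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

# One ReLU-activated GCN layer with random (untrained) weights.
rng = np.random.default_rng(7)
W = rng.normal(size=(6, 4))
H = np.maximum(norm @ X @ W, 0.0)
```

In the GiG structure, each node's feature vector X[i] would itself be produced by an internal-graph convolution over the pixels inside superpixel i.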
9
|
Multi-Prior Twin Least-Square Network for Anomaly Detection of Hyperspectral Imagery. REMOTE SENSING 2022. [DOI: 10.3390/rs14122859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/07/2022]
Abstract
Anomaly detection of hyperspectral imagery (HSI) identifies the very few samples that do not conform to an intricate background, without priors. Despite the extensive success of hyperspectral interpretation techniques based on generative adversarial networks (GANs), applying trained GAN models to hyperspectral anomaly detection remains promising but challenging. Previous generative models can accurately learn the complex background distribution of HSI and typically convert the high-dimensional data back to the latent space to extract features for detecting anomalies. However, both the background modeling and feature-extraction methods leave room for improvement in modeling power and reconstruction-consistency capability. In this work, we present a multi-prior-based network (MPN) that incorporates well-trained GANs as effective priors for a general anomaly-detection task. In particular, we introduce multi-scale covariance maps (MCMs) of precise second-order statistics to construct multi-scale priors. The MCM strategy implicitly bridges spectral- and spatial-specific information and fully represents multi-scale, enhanced information. Thus, we reliably and adaptively estimate the HSI labels to alleviate the problem of insufficient priors. Moreover, a twin least-square loss is imposed to improve the generative ability and training stability in the feature and image domains, as well as to overcome the gradient vanishing problem. Last but not least, the network, enforced with a new anomaly rejection loss, establishes a pure and discriminative background estimation.
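The multi-scale covariance maps (MCMs) mentioned above amount to windowed second-order statistics: for a pixel, compute the covariance of the spectra inside windows of several sizes. This numpy sketch shows that computation only; the cube size and window scales are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy HSI cube: 9x9 pixels, 6 spectral bands (sizes are illustrative).
cube = rng.normal(size=(9, 9, 6))

def local_covariance(cube, row, col, scale):
    """Covariance of the spectra inside a (2*scale+1)^2 window centered at
    (row, col): one entry of a multi-scale covariance map."""
    r0, r1 = max(row - scale, 0), min(row + scale + 1, cube.shape[0])
    c0, c1 = max(col - scale, 0), min(col + scale + 1, cube.shape[1])
    window = cube[r0:r1, c0:c1].reshape(-1, cube.shape[2])
    return np.cov(window, rowvar=False)

# Second-order statistics at several spatial scales for the center pixel.
mcm = [local_covariance(cube, 4, 4, s) for s in (1, 2, 3)]
```

Stacking such covariance matrices across scales is what couples the spectral statistics with spatial context of growing extent.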
10.
Abstract
Convolutional neural networks (CNNs) are widely used among the various deep learning techniques available because of their superior performance in the fields of computer vision and natural language processing. CNNs can effectively extract the locality and correlation of input data using structures in which convolutional layers are successively applied to the input data. In general, the performance of neural networks has improved as the depth of CNNs has increased. However, an increase in the depth of a CNN is not always accompanied by an increase in the accuracy of the neural network, because the gradient vanishing problem may arise, causing the weights of the weighted layers to fail to converge. Accordingly, the gradient flows of the VGGNet, ResNet, SENet, and DenseNet models were analyzed and compared in this study, and the reasons for the differences in the error-rate performance of the models were derived.
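The depth-versus-gradient argument can be made concrete by comparing the backward Jacobian product of a plain layer chain with that of a residual chain, where the identity shortcut keeps the gradient signal alive. This numpy sketch uses assumed layer widths and weight scales and linear layers only; it is an illustration of the vanishing-gradient mechanism, not any of the cited architectures.

```python
import numpy as np

rng = np.random.default_rng(9)

depth, width = 30, 16
# Small-weight layers: each Jacobian contracts, so a deep product shrinks fast.
layers = [rng.normal(scale=0.25 / np.sqrt(width), size=(width, width))
          for _ in range(depth)]

g_plain = np.eye(width)
g_resid = np.eye(width)
for W in layers:
    g_plain = W.T @ g_plain                     # plain chain: product of Jacobians
    g_resid = (np.eye(width) + W.T) @ g_resid   # residual: identity term survives

plain_norm = np.linalg.norm(g_plain)   # collapses toward zero with depth
resid_norm = np.linalg.norm(g_resid)   # stays on the order of the identity
```

This is the mechanism behind ResNet-style shortcuts: the backward pass always contains the identity path, so early layers still receive usable gradients.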
11
|
A Self-Improving Framework for Joint Depth Estimation and Underwater Target Detection from Hyperspectral Imagery. REMOTE SENSING 2021. [DOI: 10.3390/rs13091721] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Underwater target detection (UTD) is one of the most attractive research topics in hyperspectral imagery (HSI) processing. Most existing methods predict the signatures of desired targets in an underwater context but ignore the depth information, which is position-sensitive and contributes significantly to distinguishing background and target pixels. To take full advantage of depth information, in this paper a self-improving framework is proposed to perform joint depth estimation and underwater target detection, which exploits the depth information and detection results to alternately boost the final detection performance. However, it is difficult to calculate depth information under the interference of a water environment. To address this dilemma, the proposed self-improving underwater target detection framework (SUTDF) employs spectral and spatial contextual information to pick out target-associated pixels as the guidance dataset for depth estimation. Considering the incompleteness of the guidance dataset, an expectation-maximization-like updating scheme has also been developed to iteratively excavate statistical and structural information from the input HSI, further improving the diversity of the guidance dataset. During each updating epoch, the calculated depth information is used to yield a more diversified dataset for the target detection network, leading to a more accurate detection result. Meanwhile, the detection result in turn contributes to detecting more target-associated pixels as a supplement to the guidance dataset, eventually promoting the capacity of the depth estimation network. With this self-improving framework, we can provide a more precise detection result for the hyperspectral UTD task. Qualitative and quantitative illustrations verify the effectiveness and efficiency of SUTDF in comparison with state-of-the-art underwater target detection methods.
12. Zhang H, Yang J, Zhou K, Li F, Hu Y, Zhao Y, Zheng C, Zhang X, Liu J. Automatic Segmentation and Visualization of Choroid in OCT with Knowledge Infused Deep Learning. IEEE Journal of Biomedical and Health Informatics 2020;24:3408-3420. [PMID: 32931435] [DOI: 10.1109/jbhi.2020.3023144]
Abstract
The choroid provides oxygen and nourishment to the outer retina and is thus related to the pathology of various ocular diseases. Optical coherence tomography (OCT) is advantageous for visualizing and quantifying the choroid in vivo. However, its application in the study of the choroid is still limited for two reasons. (1) The lower boundary of the choroid (the choroid-sclera interface) in OCT is fuzzy, which makes automatic segmentation difficult and inaccurate. (2) The visualization of the choroid is hindered by vessel shadows from the superficial layers of the inner retina. In this paper, we propose to incorporate medical and imaging prior knowledge with deep learning to address these two problems. We propose a biomarker-infused global-to-local network (Bio-Net) for choroid segmentation, which not only regularizes the segmentation via predicted choroid thickness but also leverages a global-to-local segmentation strategy to provide global structure information and suppress overfitting. For eliminating the retinal vessel shadows, we propose a deep-learning pipeline that first locates the shadows using their projection on the retinal pigment epithelium layer and then predicts the contents of the choroidal vasculature at the shadow locations with an edge-to-texture generative adversarial inpainting network. The results show that our method outperforms the existing methods on both tasks. We further apply the proposed method in a prospective clinical study on the pathology of glaucoma, which demonstrates its capacity to detect the structural and vascular changes of the choroid related to the elevation of intraocular pressure.
13. A Multiscale Self-Adaptive Attention Network for Remote Sensing Scene Classification. Remote Sensing 2020. [DOI: 10.3390/rs12142209]
Abstract
High-resolution optical remote sensing image classification is an important research direction in the field of computer vision. It is difficult to extract rich semantic information from remote sensing images containing many objects. In this paper, a multiscale self-adaptive attention network (MSAA-Net) is proposed for optical remote sensing image classification, comprising multiscale feature extraction, adaptive information fusion, and classification. In the first part, two parallel convolution blocks with different receptive fields capture multiscale features. Then, a squeeze process obtains global information and an excitation process learns per-channel weights, which adaptively select useful information from the multiscale features. Finally, the high-level features are classified by several residual blocks with an attention mechanism and a fully connected layer. Experiments were conducted using the UC Merced, NWPU, and Google SIRI-WHU datasets. Compared with state-of-the-art methods, MSAA-Net shows strong performance and robustness, with average accuracies of 94.52%, 95.01%, and 95.21% on the three widely used remote sensing datasets.
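The squeeze-and-excitation step described above can be sketched in a few lines of numpy: global average pooling summarizes each channel, a small bottleneck produces per-channel gates in (0, 1), and the gates rescale the feature maps. Random weights stand in for learned ones, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy feature maps: 8 channels of 16x16 (sizes illustrative).
feats = rng.normal(size=(8, 16, 16))

# Squeeze: global average pooling reduces each channel to one scalar.
squeezed = feats.mean(axis=(1, 2))                    # (8,)

# Excitation: a tiny bottleneck MLP produces per-channel gates in (0, 1).
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(4, 8))
gates = sigmoid(np.maximum(squeezed @ W1, 0.0) @ W2)  # (8,)

# Recalibration: scale each channel by its gate.
recal = feats * gates[:, None, None]
```

The adaptive selection across multiscale branches works the same way: channels (here, from both receptive-field branches) that carry useful information receive gates near 1, while uninformative ones are suppressed.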