1
Pan X, Jiao C, Yang B, Zhu H, Wu J. Attribute-guided feature fusion network with knowledge-inspired attention mechanism for multi-source remote sensing classification. Neural Netw 2025;187:107332. PMID: 40088832. DOI: 10.1016/j.neunet.2025.107332.
Abstract
Land use and land cover (LULC) classification is a popular research area in remote sensing. Single-modal data carry insufficient information for accurate classification, especially in complex scenes, whereas complementary multi-modal data such as hyperspectral images (HSIs) and light detection and ranging (LiDAR) data can effectively improve classification performance. Attention mechanisms have recently been widely used in multi-modal LULC classification methods to achieve better feature representation. However, these methods insufficiently consider prior knowledge of the data, such as spectral mixing in HSIs and the inconsistent spatial scales of different categories in LiDAR data. Moreover, multi-modal features carry different physical attributes: HSI features represent spectral information across channels, while LiDAR features capture elevation information in the spatial dimension. Fusing features while ignoring these attributes may introduce redundant information and harm classification. In this paper, we propose an attribute-guided feature fusion network with knowledge-inspired attention mechanisms, named AFNKA. Focusing on the spectral characteristics of HSI and the elevation information of LiDAR data, we design knowledge-inspired attention mechanisms to derive enhanced features. In particular, a novel adaptive cosine estimator (ACE) based attention module is presented to learn more discriminative features by fully exploiting the spatial-spectral correlation of mixed HSI pixels. In the fusion stage, two novel attribute-guided fusion modules selectively aggregate multi-modal features, exploiting the correlations between the spatial-spectral property of HSI features and the spatial-elevation property of LiDAR features. Experimental results on several multi-source datasets quantitatively show that the proposed AFNKA significantly outperforms state-of-the-art methods.
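The adaptive cosine estimator the abstract builds its attention module on has a standard closed form, ACE(x) = (sᵀΣ⁻¹x)² / ((sᵀΣ⁻¹s)(xᵀΣ⁻¹x)). As a rough NumPy illustration only (the 3-band target and identity background covariance are hypothetical; this is not the authors' attention module):

```python
import numpy as np

def ace_score(x, s, cov):
    """Adaptive cosine estimator of pixel spectrum x against target spectrum s,
    given the background covariance cov. Returns a value in [0, 1]."""
    cov_inv = np.linalg.inv(cov)
    num = (s @ cov_inv @ x) ** 2
    den = (s @ cov_inv @ s) * (x @ cov_inv @ x)
    return num / den

# toy 3-band example with an identity background covariance
cov = np.eye(3)
target = np.array([1.0, 0.0, 0.0])
aligned = np.array([2.0, 0.0, 0.0])     # same direction as the target
orthogonal = np.array([0.0, 1.0, 0.0])  # no overlap with the target

print(round(ace_score(aligned, target, cov), 4))     # 1.0 (perfect match)
print(round(ace_score(orthogonal, target, cov), 4))  # 0.0 (no match)
```

Because the score depends only on the whitened angle between x and s, it responds to sub-pixel target presence regardless of pixel brightness, which is what makes it attractive for mixed HSI pixels.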
Affiliation(s)
- Xiao Pan
- School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Changzhe Jiao
- School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Bo Yang
- School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Hao Zhu
- School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Jinjian Wu
- School of Artificial Intelligence, Xidian University, Xi'an 710119, China
2
Yan Q, Zhang S, Chen X, Zheng Z. Multiscale superpixel depth feature extraction for hyperspectral image classification. Sci Rep 2025;15:13529. PMID: 40253388. PMCID: PMC12009309. DOI: 10.1038/s41598-025-90228-4.
Abstract
Superpixel segmentation has recently been widely employed in hyperspectral image (HSI) classification for remote sensing. However, land-cover structures in HSI vary greatly, which makes it difficult for single-scale superpixel segmentation to fit land-cover boundaries. Moreover, the irregular shapes of superpixels pose a challenge to depth feature extraction. To overcome these issues, this article proposes a multiscale superpixel depth feature extraction (MSDFE) method for HSI classification, which explores and integrates the spatial-spectral information of land-covers by adopting multiscale superpixel segmentation, constructing statistical features of superpixels, and conducting depth feature extraction. Specifically, to exploit the rich spatial information of HSI, multiscale superpixel segmentation is first applied to the image. Once superpixels at different scales are obtained, two-dimensional statistical features with a unified form are constructed for superpixels of differing spatial shapes. Based on these statistical features, a convolutional neural network is used to learn and classify deeper features. Finally, an adaptive strategy fuses the multiscale classification results. Experiments on three real hyperspectral datasets indicate the superiority of the proposed MSDFE method over several state-of-the-art methods.
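The key trick in the abstract is that per-superpixel statistics give a fixed-size feature regardless of a superpixel's irregular shape. A minimal NumPy sketch under assumed choices (mean and standard deviation per band; the label map and image sizes are made up, and the real method's statistics may differ):

```python
import numpy as np

def superpixel_stats(hsi, labels, sp_id):
    """Fixed-size (bands x 2) statistical feature (per-band mean and std)
    for an arbitrarily shaped superpixel identified by sp_id in labels."""
    pixels = hsi[labels == sp_id]  # (n_pixels, bands), shape-independent
    return np.stack([pixels.mean(axis=0), pixels.std(axis=0)], axis=1)

# toy 4x4 image with 3 bands split into two superpixels of different shapes
rng = np.random.default_rng(0)
hsi = rng.random((4, 4, 3))
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1

feat = superpixel_stats(hsi, labels, 0)
print(feat.shape)  # (3, 2): identical shape for every superpixel
```

Because every superpixel maps to the same (bands x 2) grid, a standard CNN can consume the features directly, which is the point the abstract makes.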
Affiliation(s)
- Qi Yan
- College of Communication and Electronic Engineering, Jishou University, People's South Road, Jishou, 416000, Hunan, China
- Shuzhen Zhang
- College of Communication and Electronic Engineering, Jishou University, People's South Road, Jishou, 416000, Hunan, China
- Xiang Chen
- College of Communication and Electronic Engineering, Jishou University, People's South Road, Jishou, 416000, Hunan, China
- Ziyou Zheng
- College of Communication and Electronic Engineering, Jishou University, People's South Road, Jishou, 416000, Hunan, China
3
Fang Y, Sun L, Zheng Y, Wu Z. Deformable Convolution-Enhanced Hierarchical Transformer with Spectral-Spatial Cluster Attention for Hyperspectral Image Classification. IEEE Trans Image Process 2025;PP:701-716. PMID: 40030755. DOI: 10.1109/tip.2024.3522809.
Abstract
Vision Transformer (ViT), known for capturing non-local features, is an effective tool for hyperspectral image classification (HSIC). However, ViT's multi-head self-attention (MHSA) mechanism often struggles to balance local details and long-range relationships in complex high-dimensional data, leading to a loss of spectral-spatial information in the representation. To address this issue, we propose a deformable convolution-enhanced hierarchical Transformer with spectral-spatial cluster attention (SClusterFormer) for HSIC. The model incorporates a cluster attention mechanism that uses spectral angle similarity and Euclidean distance metrics to enhance the representation of fine-grained homogeneous local details and improve the discrimination of non-local structures in 3-D HSI and 2-D morphological data, respectively. Additionally, a dual-branch multiscale deformable convolution framework augmented with frequency-based spectral attention is designed to capture both the high-frequency discrepancy patterns and the low-frequency overall trend of the spectral profile. Finally, a cross-feature pixel-level fusion module performs collaborative cross-learning and fusion of the dual-branch results. Comprehensive experiments on multiple HSIC datasets validate the superiority of the proposed SClusterFormer, which outperforms existing methods. The source code of SClusterFormer is available at https://github.com/Fang666666/HSIC SClusterFormer.
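The spectral angle similarity the cluster attention relies on is the standard spectral angle mapper. A minimal sketch (the spectra are invented 3-band examples, not data from the paper):

```python
import numpy as np

def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra. It depends only on the
    direction of the vectors, so it is invariant to per-pixel brightness."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

s = np.array([0.2, 0.5, 0.9])
print(spectral_angle(s, 3.0 * s))  # 0.0: same material, brighter pixel
print(round(spectral_angle(np.array([1.0, 0.0, 0.0]),
                           np.array([0.0, 1.0, 0.0])), 4))  # 1.5708 (pi/2)
```

Pairing this illumination-invariant metric with plain Euclidean distance gives two complementary notions of "same cluster", which is presumably why the paper uses both.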
4
Shao H, Li P, Zhong D. Generating Stylized Features for Single-Source Cross-Dataset Palmprint Recognition With Unseen Target Dataset. IEEE Trans Image Process 2024;33:4911-4922. PMID: 39236127. DOI: 10.1109/tip.2024.3451933.
Abstract
As a promising topic in palmprint recognition, cross-dataset palmprint recognition is attracting growing research interest. In this paper, a more difficult yet realistic scenario is studied: Single-Source Cross-Dataset Palmprint Recognition with Unseen Target dataset (S2CDPR-UT), which aims to generalize a palmprint feature extractor trained on a single source dataset to multiple unseen target datasets collected with different devices or in different environments. To meet this challenge, we propose a novel method, named Generating stylIzed FeaTures (GIFT), to improve the generalization of the feature extractor. First, the raw features are decoupled into high- and low-frequency components. Then, a feature stylization module perturbs the mean and variance of the low-frequency components to generate more stylized features, which provide more valuable knowledge. Furthermore, two feature-level supervisions, diversity enhancement and consistency preservation, are introduced to help train the model: the former enhances the diversity of stylized features to expand the feature space, while the latter maintains semantic consistency to ensure accurate palmprint recognition. Extensive experiments on the CASIA Multi-Spectral, XJTU-UP, and MPD palmprint databases show that our GIFT method achieves significant performance improvements over other methods. The codes will be released at https://github.com/HuikaiShao/GIFT.
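Perturbing the mean and variance of a feature map to synthesize new "styles" is commonly done AdaIN-style: normalize per channel, then re-scale and re-shift with perturbed statistics. A hypothetical NumPy sketch (the perturbation distribution, alpha, and shapes are assumptions, not GIFT's exact module):

```python
import numpy as np

def stylize(feat, alpha=0.5, rng=None):
    """Perturb the channel-wise mean/std of a (C, H, W) feature map so the
    'style' (statistics) changes while the spatial content layout is kept."""
    rng = rng or np.random.default_rng(0)
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + 1e-6
    normalized = (feat - mu) / sigma
    # sample new statistics in a neighborhood of the originals
    new_mu = mu + alpha * rng.standard_normal(mu.shape)
    new_sigma = sigma * np.abs(1.0 + alpha * rng.standard_normal(sigma.shape))
    return normalized * new_sigma + new_mu

feat = np.random.default_rng(1).random((8, 4, 4))
out = stylize(feat)
print(out.shape)  # (8, 4, 4): same layout, shifted channel statistics
```

Since each channel only undergoes a positive affine change, the spatial pattern inside a channel is preserved exactly, which is the "keep semantics, vary style" property the abstract's consistency supervision guards.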
5
Zhang X, Dong S, Chen J, Tian Q, Gong Y, Hong X. Deep Class-Incremental Learning From Decentralized Data. IEEE Trans Neural Netw Learn Syst 2024;35:7190-7203. PMID: 36315536. DOI: 10.1109/tnnls.2022.3214573.
Abstract
In this article, we focus on a new and challenging decentralized machine learning paradigm in which there are continuous inflows of data to be addressed and the data are stored in multiple repositories. We initiate the study of data-decentralized class-incremental learning (DCIL) with the following contributions. First, we formulate the DCIL problem and develop the experimental protocol. Second, we introduce a paradigm for creating a basic decentralized counterpart of typical (centralized) CIL approaches, thereby establishing a benchmark for DCIL study. Third, we propose a decentralized composite knowledge incremental distillation (DCID) framework that continually transfers knowledge from historical models and multiple local sites to the general model. DCID consists of three main components: local CIL, collaborative knowledge distillation (KD) among local models, and aggregated KD from the local models to the general one. We comprehensively investigate our DCID framework using different implementations of the three components. Extensive experimental results demonstrate its effectiveness. The source code of the baseline methods and the proposed DCIL is available at https://github.com/Vision-Intelligence-and-Robots-Group/DCIL.
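The KD components the abstract lists are usually built on the temperature-scaled KL divergence between teacher and student outputs. A generic NumPy sketch of that workhorse loss (illustrative logits and temperature; DCID's composite objective combines several such terms in ways not shown here):

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax with temperature t."""
    z = np.asarray(z, dtype=float) / t
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, t=2.0):
    """Temperature-scaled KL(teacher || student), rescaled by t^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * t * t)

teacher = np.array([[2.0, 0.5, -1.0]])
print(round(kd_loss(teacher, teacher), 6))                # 0.0: identical outputs
print(kd_loss(np.array([[0.0, 0.0, 0.0]]), teacher) > 0)  # True: mismatch is penalized
```

A higher temperature softens both distributions, exposing the teacher's "dark knowledge" about near-miss classes rather than only its top-1 prediction.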
6
Ashraf M, Alharthi R, Chen L, Umer M, Alsubai S, Eshmawi AA. Attention 3D central difference convolutional dense network for hyperspectral image classification. PLoS One 2024;19:e0300013. PMID: 38598444. PMCID: PMC11006129. DOI: 10.1371/journal.pone.0300013.
Abstract
Hyperspectral image (HSI) classification is a challenging task due to the large number of spatial-spectral bands, high inter-class similarity, highly variable classes, and complex region relationships, including overlapping and nested regions. Convolutional neural networks (CNNs) have gained popularity for addressing this challenge. However, the performance of 2D-CNN methods relies heavily on spatial information, while 3D-CNN methods offer an alternative that considers both spectral and spatial information, at a computational cost that increases significantly with model capacity and spectral dimensionality. These methods also have difficulty exploiting the local intrinsic detailed patterns of feature maps and tuning low-rank frequency features. To overcome these challenges and improve HSI classification performance, we propose the Attention 3D Central Difference Convolutional Dense Network (3D-CDC Attention DenseNet). Our 3D-CDC method manipulates the local intrinsic detailed patterns of the spatial-spectral feature maps, using pixel-wise concatenation and a spatial attention mechanism within a dense connectivity strategy to incorporate low-rank frequency features and guide feature tuning. Experimental results on the Pavia University, Houston 2018, and Indian Pines benchmark datasets demonstrate the superiority of our method over other HSI classification methods, including state-of-the-art techniques: with a 25 × 25 window size, it achieves 97.93% overall accuracy on Houston 2018, 99.89% on Pavia University, and 99.38% on Indian Pines.
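Central difference convolution, the primitive in the network's name, replaces the plain convolution response with vanilla output minus theta times (kernel sum x center pixel), so gradient-like local detail is mixed with ordinary intensity response. A single-channel 2-D toy sketch (the real network uses the 3-D variant inside a dense architecture; theta and the kernel here are arbitrary):

```python
import numpy as np

def cdc2d(x, kernel, theta=0.7):
    """Central difference convolution, single channel, valid padding:
    out = conv(x, k) - theta * sum(k) * x_center."""
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]
            vanilla = float((patch * kernel).sum())
            center = patch[kh // 2, kw // 2]
            out[i, j] = vanilla - theta * kernel.sum() * center
    return out

kernel = np.ones((3, 3)) / 9.0
flat = np.ones((5, 5))
print(cdc2d(flat, kernel, theta=1.0))  # all zeros: flat regions carry no detail
```

With theta = 0 the operator reduces to an ordinary convolution; with theta = 1 it responds only to local differences, suppressing constant regions, which is the texture-sensitivity the paper exploits.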
Affiliation(s)
- Mahmood Ashraf
- School of Micro Electronics & Communication Engineering, Chongqing University, Chongqing, China
- Raed Alharthi
- Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia
- Lihui Chen
- School of Micro Electronics & Communication Engineering, Chongqing University, Chongqing, China
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Ala Abdulmajid Eshmawi
- Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
7
Gan K, Li R, Zhang J, Sun Z, Yin Z. Instantaneous estimation of momentary affective responses using neurophysiological signals and a spatiotemporal emotional intensity regression network. Neural Netw 2024;172:106080. PMID: 38160622. DOI: 10.1016/j.neunet.2023.12.034.
Abstract
Previous studies in affective computing often use a fixed emotional label to train an emotion classifier on electroencephalography (EEG) recorded while individuals experience an affective stimulus. However, EEG encodes emotional dynamics that include varying intensities within a given emotional category. To investigate these variations in emotional intensity, we propose a framework that obtains momentary affective labels for fine-grained EEG segments with human feedback. We then model these labeled segments using a novel spatiotemporal emotional intensity regression network (STEIR-Net), which integrates temporal EEG patterns from nine predefined cortical regions to provide a continuous estimate of emotional intensity. STEIR-Net outperforms classical regression models, reducing the root mean square error (RMSE) by an average of 4-9% on the SEED database and 2-4% on SEED-IV. We find that the frontal and temporal cortical regions contribute most to the variation in affective intensity. The absolute Spearman correlation between the model estimates and the momentary affective labels is higher for happiness (0.2114) and fear (0.2072) than for the neutral (0.1694) and sad (0.1895) emotions. Besides, increasing the input EEG segment length from 4 to 20 s further reduces the RMSE from 1.3548 to 1.3188.
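The two evaluation statistics the abstract reports, RMSE and the Spearman correlation, are both one-liners; Spearman's rho is just the Pearson correlation of ranks. A small sketch on invented numbers (assuming no tied values, which the rank trick below requires):

```python
import numpy as np

def rmse(pred, target):
    """Root mean square error between predictions and targets."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(target)) ** 2)))

def spearman(a, b):
    """Spearman rho as the Pearson correlation of ranks (no ties assumed)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

x = np.array([0.5, 1.0, 2.0, 3.0])
print(rmse(x, x))                      # 0.0
print(round(spearman(x, x ** 3), 6))   # 1.0: any monotone relation gives rho = 1
```

Spearman is the right choice for this task because it credits the model for getting the ordering of intensities right, even when the raw scale of its estimates drifts.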
Affiliation(s)
- Kaiyu Gan
- Engineering Research Center of Optical Instrument and System, Ministry of Education, Shanghai Key Lab of Modern Optical System, University of Shanghai for Science and Technology, Shanghai 200093, PR China; School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, PR China
- Ruiding Li
- Engineering Research Center of Optical Instrument and System, Ministry of Education, Shanghai Key Lab of Modern Optical System, University of Shanghai for Science and Technology, Shanghai 200093, PR China; School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, PR China
- Jianhua Zhang
- OsloMet Artificial Intelligence Lab, Department of Computer Science, Oslo Metropolitan University, Oslo N-0130, Norway
- Zhanquan Sun
- Engineering Research Center of Optical Instrument and System, Ministry of Education, Shanghai Key Lab of Modern Optical System, University of Shanghai for Science and Technology, Shanghai 200093, PR China; School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, PR China
- Zhong Yin
- Engineering Research Center of Optical Instrument and System, Ministry of Education, Shanghai Key Lab of Modern Optical System, University of Shanghai for Science and Technology, Shanghai 200093, PR China; School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, PR China
8
Xu C, Wei Y, Tang B, Yin S, Zhang Y, Chen S, Wang Y. Dynamic-group-aware networks for multi-agent trajectory prediction with relational reasoning. Neural Netw 2024;170:564-577. PMID: 38056406. DOI: 10.1016/j.neunet.2023.11.005.
Abstract
Demystifying the interactions among multiple agents from their past trajectories is fundamental to precise and interpretable trajectory prediction. However, previous works mainly consider static, pairwise interactions with limited relational reasoning. To model interactions and reason about relations more comprehensively, we propose DynGroupNet, a dynamic-group-aware network that (i) models time-varying interactions in highly dynamic scenes; (ii) captures both pairwise and group-wise interactions; and (iii) reasons about both interaction strength and interaction category without direct supervision. On top of DynGroupNet, we design a prediction system that forecasts socially plausible trajectories with dynamic relational reasoning. It leverages a Gaussian mixture model, multiple sampling, and prediction refinement to promote, respectively, prediction diversity for capturing multiple future possibilities, training stability for efficient model learning, and trajectory smoothness for more realistic predictions. Together, the complex interaction modeling of DynGroupNet and the diversity capturing, efficient training, and trajectory smoothing of the prediction system yield more accurate and plausible future predictions. Extensive experiments show that: (1) DynGroupNet can capture time-varying group behaviors and infer time-varying interaction categories and strengths during prediction; (2) DynGroupNet significantly outperforms state-of-the-art trajectory prediction methods, by 28.0%, 34.9%, and 13.0% in FDE on the NBA, NFL, and SDD datasets, respectively.
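Using a Gaussian mixture plus multiple sampling to cover distinct futures can be sketched very simply: pick a mixture component per sample, then draw from its Gaussian. Everything below (the 2-D endpoints, two modes, weights) is an invented illustration, not DynGroupNet's sampler:

```python
import numpy as np

def sample_futures(means, stds, weights, n_samples, rng=None):
    """Draw n_samples 2-D endpoint hypotheses from a diagonal Gaussian
    mixture; each draw first picks a mode, then perturbs its mean."""
    rng = rng or np.random.default_rng(0)
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    return means[comps] + stds[comps] * rng.standard_normal((n_samples, 2))

means = np.array([[0.0, 0.0], [5.0, 5.0]])  # two plausible future modes
stds = np.full((2, 2), 0.1)
futures = sample_futures(means, stds, np.array([0.5, 0.5]), 20)
print(futures.shape)  # (20, 2)
```

Because samples land near both modes rather than averaging them, the downstream refinement stage can score each hypothesis instead of committing to a single blurred mean trajectory.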
Affiliation(s)
- Chenxin Xu
- Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
- Yuxi Wei
- Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
- Bohan Tang
- Oxford-Man Institute and the Department of Engineering Science, University of Oxford, Oxford, UK
- Sheng Yin
- Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
- Ya Zhang
- Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China
- Siheng Chen
- Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China
- Yanfeng Wang
- Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China
9
Wang L, Song C, Wan G, Cui S. A surface defect detection method for steel pipe based on improved YOLO. Math Biosci Eng 2024;21:3016-3036. PMID: 38454717. DOI: 10.3934/mbe.2024134.
Abstract
Surface defect detection is an important tool for ensuring the quality of steel pipes. Steel-pipe surface defects are characterized by insufficient texture, high similarity between different defect types, large size differences, and a high proportion of small targets, posing great challenges to defect detection algorithms. To overcome these issues, we propose a novel steel-pipe surface defect detection method based on the YOLO framework. First, to address the low detection rate caused by insufficient texture and high similarity among defect types, a new backbone block is proposed: by increasing high-order spatial interaction and enhancing the capture of internal correlations in data features, it extracts distinct feature information for similar defects, thereby reducing the false detection rate. Second, to enhance the detection of small defects, a new neck block is proposed that fuses multiple features to improve the accuracy of defect detection. Third, to handle the large size differences among surface defects, a novel regression loss function that considers aspect ratio and scale is proposed, and the focal loss is introduced to further address the sample imbalance problem in steel-pipe defect datasets. Experimental results show that the proposed method effectively improves the accuracy of steel-pipe surface defect detection.
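The focal loss mentioned for the imbalance problem has a well-known binary form, -alpha_t (1 - p_t)^gamma log(p_t); the modulating factor (1 - p_t)^gamma down-weights easy examples. A minimal NumPy sketch with the usual default hyperparameters (gamma = 2, alpha = 0.25 are conventional choices, not values reported by this paper):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss. p: predicted probabilities, y: labels in {0, 1}.
    (1 - p_t)^gamma shrinks the loss of well-classified examples so that
    rare, hard classes dominate the gradient."""
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return float((-alpha_t * (1 - p_t) ** gamma * np.log(p_t)).mean())

hard = focal_loss(np.array([0.1]), np.array([1]))  # confidently wrong
easy = focal_loss(np.array([0.9]), np.array([1]))  # confidently right
print(hard > easy)  # True: easy examples are strongly down-weighted
```

With gamma = 0 the expression falls back to weighted cross-entropy; increasing gamma sharpens the focus on the hard minority samples that dominate small-defect datasets.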
Affiliation(s)
- Lili Wang
- State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
- Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- College of Information, Liaoning University, Shenyang 110036, China
- Chunhe Song
- State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
- Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- Guangxi Wan
- State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
- Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- Shijie Cui
- State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
- Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
10
Xu J, Li D, Zhou P, Li C, Wang Z, Tong S. A multi-band centroid contrastive reconstruction fusion network for motor imagery electroencephalogram signal decoding. Math Biosci Eng 2023;20:20624-20647. PMID: 38124568. DOI: 10.3934/mbe.2023912.
Abstract
Motor imagery (MI) brain-computer interfaces (BCIs) help users establish direct communication between the brain and external devices by decoding the movement intention in human electroencephalogram (EEG) signals. However, cortical potentials are highly rhythmic, with distinct sub-band features, and different experimental situations and subjects carry different categories of semantic information in specific sample target spaces. Feature fusion can yield more discriminative features, but simply fusing features from different embedding spaces makes the model's global loss hard to converge and ignores the complementarity of features. Considering the similarity and category contribution of different sub-band features, we propose a multi-band centroid contrastive reconstruction fusion network (MB-CCRF). We obtain multi-band spatio-temporal features by frequency division, preserving the task-related rhythmic features of different EEG signals; use a multi-stream cross-layer connected convolutional network to compute a deep feature representation for each sub-band separately; and propose a centroid contrastive reconstruction fusion module, which maps different sub-band and category features into the same shared embedding space by comparison with category prototypes, reconstructing the feature semantic structure so that the global loss of the fused features converges more easily. Finally, a learned similarity between channel features is used to weight the fused sub-band features, enhancing the more discriminative features and suppressing the useless ones. The experimental accuracy is 79.96% on the BCI Competition IV-2a dataset. Moreover, comparison tests verify the classification effect of the sub-band features of different subjects, confusion matrices verify the category propensity of different sub-band features, and visual analysis shows the distribution of each sub-band feature and the fused feature across classes, revealing the importance of different sub-band features for the EEG-based MI classification task.
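The frequency-division step that produces the sub-band features can be illustrated with a crude FFT-domain band-pass filter. A toy stand-in only (the two-tone test signal, 250 Hz rate, and hard spectral cut are assumptions; real EEG pipelines typically use proper IIR/FIR filters):

```python
import numpy as np

def bandpass(signal, fs, lo, hi):
    """Crude FFT-domain band-pass: zero every frequency bin outside [lo, hi) Hz
    and transform back."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=signal.size)

fs = 250                                  # a typical EEG sampling rate
t = np.arange(0, 2, 1 / fs)
sig = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 20 * t)  # alpha + beta mix
alpha = bandpass(sig, fs, 8, 13)          # isolates the 10 Hz rhythm
beta = bandpass(sig, fs, 13, 30)          # isolates the 20 Hz rhythm
```

Splitting the raw signal this way before feature extraction is what lets each sub-band stream specialize in one task-related rhythm, as the abstract describes.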
Affiliation(s)
- Jiacan Xu
- The College of Engineering Training and Innovation, Shenyang Jianzhu University, Shenyang 110000, China
- Donglin Li
- The College of Electrical Engineering, Shenyang University of Technology, Shenyang 110000, China
- Peng Zhou
- The College of Engineering Training and Innovation, Shenyang Jianzhu University, Shenyang 110000, China
- Chunsheng Li
- The College of Electrical Engineering, Shenyang University of Technology, Shenyang 110000, China
- Zinan Wang
- The College of Engineering Training and Innovation, Shenyang Jianzhu University, Shenyang 110000, China
- Shenghao Tong
- The College of Engineering Training and Innovation, Shenyang Jianzhu University, Shenyang 110000, China
11
Wang Z, Wu B, Ota K, Dong M, Li H. A multi-scale self-supervised hypergraph contrastive learning framework for video question answering. Neural Netw 2023;168:272-286. PMID: 37774513. DOI: 10.1016/j.neunet.2023.08.057.
Abstract
Video question answering (VideoQA) is a challenging video understanding task that requires a comprehensive understanding of multimodal information and accurate answers to related questions. Most existing VideoQA models use graph neural networks (GNNs) to capture temporal-spatial interactions between objects. Despite achieving certain success, we argue that current schemes have two limitations: (i) existing graph-based methods require stacking multiple GNN layers to capture high-order relations between objects, which inevitably introduces irrelevant noise; and (ii) they neglect the unique self-supervised signals in the high-order relational structures among multiple objects that could facilitate more accurate QA. To this end, we propose a novel Multi-scale Self-supervised Hypergraph Contrastive Learning (MSHCL) framework for VideoQA. Specifically, we first segment the video along multiple temporal dimensions to obtain multiple frame groups. For different frame groups, we design appearance and motion hyperedges based on node semantics to connect object nodes. In this way, we construct a multi-scale temporal-spatial hypergraph that directly captures high-order relations among multiple objects. The node features after hypergraph convolution are then fed into a Transformer to capture the global information of the input sequence. Second, we design a self-supervised hypergraph contrastive learning task based on node- and hyperedge-dropping data augmentation, together with an improved question-guided multimodal interaction module, to enhance the accuracy and robustness of the VideoQA model. Finally, extensive experiments on three benchmark datasets demonstrate the superiority of our proposed MSHCL over state-of-the-art methods.
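Contrastive learning over two augmented views is typically driven by the InfoNCE objective: matching rows across views are positives, everything else is a negative. A generic NumPy sketch (the node embeddings and temperature are invented; MSHCL's augmentations and exact loss may differ):

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE between two views of N nodes, each (N, D): row i of z1 and
    row i of z2 are a positive pair; all other pairings act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                     # (N, N) similarity logits
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float((log_denom - np.diag(sim)).mean())

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 16))
aligned = info_nce(z, z)           # positives perfectly matched
shuffled = info_nce(z, z[::-1])    # positives misaligned
print(aligned < shuffled)  # True: the loss rewards matching the right rows
```

In the hypergraph setting, the two views would come from node- and hyperedge-dropping augmentations of the same graph, so minimizing this loss pulls each node toward its own augmented counterpart.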
Affiliation(s)
- Zheng Wang
- Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China; Muroran Institute of Technology, Muroran 050-8585, Japan
- Bin Wu
- Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Kaoru Ota
- Muroran Institute of Technology, Muroran 050-8585, Japan
- Mianxiong Dong
- Muroran Institute of Technology, Muroran 050-8585, Japan
- He Li
- Muroran Institute of Technology, Muroran 050-8585, Japan
12
Han Z. Multimodal intelligent logistics robot combining 3D CNN, LSTM, and visual SLAM for path planning and control. Front Neurorobot 2023;17:1285673. PMID: 37908407. PMCID: PMC10613672. DOI: 10.3389/fnbot.2023.1285673.
Abstract
Introduction In today's dynamic logistics landscape, the role of intelligent robots is paramount for enhancing efficiency, reducing costs, and ensuring safety. Traditional path planning methods often struggle to adapt to changing environments, resulting in issues like collisions and conflicts. This research addresses the challenge of path planning and control for logistics robots operating in complex environments. The proposed method aims to integrate information from various perception sources to enhance path planning and obstacle avoidance, thereby increasing the autonomy and reliability of logistics robots. Methods The method presented in this paper begins by employing a 3D Convolutional Neural Network (CNN) to learn feature representations of objects within the environment, enabling object recognition. Subsequently, Long Short-Term Memory (LSTM) models are utilized to capture spatio-temporal features and predict the behavior and trajectories of dynamic obstacles. This predictive capability empowers robots to more accurately anticipate the future positions of obstacles in intricate settings, thereby mitigating potential collision risks. Finally, the Dijkstra algorithm is employed for path planning and control decisions to ensure the selection of optimal paths across diverse scenarios. Results In a series of rigorous experiments, the proposed method outperforms traditional approaches in terms of both path planning accuracy and obstacle avoidance performance. These substantial improvements underscore the efficacy of the intelligent path planning and control scheme. Discussion This research contributes to enhancing the practicality of logistics robots in complex environments, thereby fostering increased efficiency and safety within the logistics industry. 
By combining object recognition, spatio-temporal modeling, and optimized path planning, the proposed method enables logistics robots to navigate intricate scenarios with higher precision and reliability, ultimately advancing the capabilities of autonomous logistics operations.
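The planning step above relies on the classical Dijkstra algorithm. A minimal grid-world sketch of that step (the 4-connected grid, unit step costs, and function name are illustrative assumptions, not the paper's implementation):

```python
import heapq

def dijkstra_grid(grid, start, goal):
    """Shortest 4-connected path cost on a grid; grid[r][c] == 1 marks an obstacle.
    Returns the number of unit steps, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    pq = [(0, start)]  # (cost so far, cell)
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float("inf")):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return None
```

In the paper's pipeline, the obstacle cells would come from the 3D CNN recognizer and the LSTM's predicted obstacle trajectories rather than a static map.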
Affiliation(s)
- Zhuqin Han
- School of Intelligent Engineering, Shaoguan University, Shaoguan, China
13

Lim JY, Lim KM, Lee CP, Tan YX. SCL: Self-supervised contrastive learning for few-shot image classification. Neural Netw 2023; 165:19-30. [PMID: 37263089 DOI: 10.1016/j.neunet.2023.05.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2022] [Revised: 04/07/2023] [Accepted: 05/18/2023] [Indexed: 06/03/2023]
Abstract
Few-shot learning aims to train a model with a limited number of base class samples to classify novel class samples. However, attaining generalization from so few samples is not a trivial task. This paper proposes a novel few-shot learning approach named Self-supervised Contrastive Learning (SCL) that enriches the model representation with multiple self-supervision objectives. Given the base class samples, the model is trained with the base class loss. Contrastive-based self-supervision is then introduced to minimize the distance between each training sample and its augmented variants, improving sample discrimination. To recognize distant samples, rotation-based self-supervision is proposed so that the model learns to recognize the rotation degree of the samples, yielding better sample diversity. A multitask environment is introduced in which each training sample is assigned two class labels: a base class label and a rotation class label. Complex augmentation is put forth to help the model learn a deeper understanding of the object: the image structure of the training samples is augmented independently of the base class information. The proposed SCL is trained to minimize the base class loss, contrastive distance loss, and rotation class loss simultaneously, learning generic features and improving novel class performance. With these multiple self-supervision objectives, SCL outperforms state-of-the-art few-shot approaches on few-shot image classification benchmark datasets.
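The three simultaneous objectives described above can be sketched as one combined loss. A toy scalar version, assuming cross-entropy for the base and rotation heads and a cosine-distance contrastive term (the weights and function names are illustrative, not the authors' code):

```python
import math

def cross_entropy(logits, label):
    """Numerically stable cross-entropy of one sample from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def cosine_distance(u, v):
    """1 - cosine similarity; near 0 when the embeddings agree."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def scl_loss(base_logits, base_label, rot_logits, rot_label,
             emb, emb_aug, w_con=1.0, w_rot=1.0):
    """Base class loss + contrastive distance loss + rotation class loss."""
    return (cross_entropy(base_logits, base_label)
            + w_con * cosine_distance(emb, emb_aug)
            + w_rot * cross_entropy(rot_logits, rot_label))
```

In practice each term would be averaged over a mini-batch and backpropagated through a shared feature extractor.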
Affiliation(s)
- Jit Yan Lim
- Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, 75450, Melaka, Malaysia.
- Kian Ming Lim
- Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, 75450, Melaka, Malaysia.
- Chin Poo Lee
- Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, 75450, Melaka, Malaysia.
- Yong Xuan Tan
- Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, 75450, Melaka, Malaysia.
14
Borsoi RA, Imbiriba T, Closas P. Dynamical Hyperspectral Unmixing With Variational Recurrent Neural Networks. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:2279-2294. [PMID: 37067972 DOI: 10.1109/tip.2023.3266660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Multitemporal hyperspectral unmixing (MTHU) is a fundamental tool in the analysis of hyperspectral image sequences. It reveals the dynamical evolution of the materials (endmembers) and of their proportions (abundances) in a given scene. However, adequately accounting for the spatial and temporal variability of the endmembers in MTHU is challenging and has not been fully addressed so far in unsupervised frameworks. In this work, we propose an unsupervised MTHU algorithm based on variational recurrent neural networks. First, a stochastic model is proposed to represent the dynamical evolution of the endmembers and their abundances, as well as the mixing process. Moreover, a new model based on a low-dimensional parametrization is used to represent spatial and temporal endmember variability, significantly reducing the number of variables to be estimated. We formulate MTHU as a Bayesian inference problem; however, this problem has no analytical solution due to the nonlinearity and non-Gaussianity of the model. Thus, we propose a solution based on deep variational inference, in which the posterior distribution of the estimated abundances and endmembers is represented by a combination of recurrent neural networks and a physically motivated model. The parameters of the model are learned using stochastic backpropagation. Experimental results show that the proposed method outperforms state-of-the-art MTHU algorithms.
15
Wang X, Tan K, Du P, Han B, Ding J. A capsule-vectored neural network for hyperspectral image classification. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/17/2023]
16
Brezini SE, Deville Y. Hyperspectral and Multispectral Image Fusion with Automated Extraction of Image-Based Endmember Bundles and Sparsity-Based Unmixing to Deal with Spectral Variability. SENSORS (BASEL, SWITZERLAND) 2023; 23:2341. [PMID: 36850938 PMCID: PMC9959671 DOI: 10.3390/s23042341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/14/2023] [Revised: 02/15/2023] [Accepted: 02/17/2023] [Indexed: 06/18/2023]
Abstract
The aim of fusing hyperspectral and multispectral images is to overcome the limited spatial resolution of remote sensing hyperspectral sensors. This process, also known as hypersharpening, generates an unobserved high-spatial-resolution hyperspectral image. To this end, several hypersharpening methods have been developed; however, most of them do not consider the spectral variability phenomenon, and neglecting it may cause errors that reduce the spatial and spectral quality of the sharpened products. Recently, new approaches have been proposed to tackle this problem, particularly ones based on spectral unmixing and parametric models. Nevertheless, the reported methods need a large number of parameters to address spectral variability, which inevitably yields a higher computation time than the standard hypersharpening methods. In this paper, a new hypersharpening method addressing spectral variability is introduced, combining a spectral-bundle-based method, namely Automated Extraction of Endmember Bundles (AEEB), with a sparsity-based method, Sparse Unmixing by Variable Splitting and Augmented Lagrangian (SUnSAL). This new method, called Hyperspectral Super-resolution with Spectra Bundles dealing with Spectral Variability (HSB-SV), was tested on both synthetic and real data. Experimental results showed that HSB-SV provides sharpened products with higher spectral and spatial reconstruction fidelity at very low computational complexity compared with other methods dealing with spectral variability, which are the main contributions of the designed method.
Affiliation(s)
- Salah Eddine Brezini
- Institut de Recherche en Astrophysique et Planétologie (IRAP), Université de Toulouse, UPS-CNRS-CNES, 31400 Toulouse, France
- Laboratoire Signaux et Images, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, Bir El Djir, Oran 31000, Algeria
- Yannick Deville
- Institut de Recherche en Astrophysique et Planétologie (IRAP), Université de Toulouse, UPS-CNRS-CNES, 31400 Toulouse, France
17
Wang M, Wang Q, Hong D, Roy SK, Chanussot J. Learning Tensor Low-Rank Representation for Hyperspectral Anomaly Detection. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:679-691. [PMID: 35609106 DOI: 10.1109/tcyb.2022.3175771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Recently, low-rank representation (LRR) methods have been widely applied for hyperspectral anomaly detection, due to their potential for separating backgrounds and anomalies. However, existing LRR models generally convert 3-D hyperspectral images (HSIs) into 2-D matrices, inevitably destroying the intrinsic 3-D structural properties of HSIs. To this end, we propose a novel tensor low-rank and sparse representation (TLRSR) method for hyperspectral anomaly detection. A 3-D tensor low-rank model is developed to separate the low-rank background part, represented by a tensorial background dictionary and corresponding coefficients; this representation characterizes the multiple-subspace property of the complex low-rank background. Based on the weighted tensor nuclear norm and the L_{F,1} sparse norm, a dictionary is designed so that its atoms are more relevant to the background. Moreover, principal component analysis (PCA) can be applied as a preprocessing step to extract a subset of HSI bands, retaining enough object information while reducing the computational time of the subsequent tensorial operations. The proposed model is efficiently solved by a well-designed alternating direction method of multipliers (ADMM). Experimental comparison with existing algorithms establishes the competitiveness of the proposed method with state-of-the-art competitors in the hyperspectral anomaly detection task.
18
Siamese transformer network-based similarity metric learning for cross-source remote sensing image retrieval. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08092-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
19
Hong D, Gao L, Yao J, Yokoya N, Chanussot J, Heiden U, Zhang B. Endmember-Guided Unmixing Network (EGU-Net): A General Deep Learning Framework for Self-Supervised Hyperspectral Unmixing. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:6518-6531. [PMID: 34048352 DOI: 10.1109/tnnls.2021.3082289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Over the past decades, enormous efforts have been made to improve the performance of linear or nonlinear mixing models for hyperspectral unmixing (HU), yet their ability to simultaneously generalize across various spectral variabilities (SVs) and extract physically meaningful endmembers remains limited, due to poor data fitting and reconstruction and sensitivity to SVs. Inspired by the powerful learning ability of deep learning (DL), we develop a general DL approach for HU that fully considers the properties of endmembers extracted from the hyperspectral imagery, called the endmember-guided unmixing network (EGU-Net). Beyond a standalone autoencoder-like architecture, EGU-Net is a two-stream Siamese deep network that learns an additional network from pure or nearly pure endmembers to correct the weights of the unmixing network, by sharing network parameters and adding spectrally meaningful constraints (e.g., nonnegativity and sum-to-one), toward a more accurate and interpretable unmixing solution. Furthermore, the resulting general framework is not limited to pixelwise spectral unmixing but is also applicable to spatial information modeling with convolutional operators for spatial-spectral unmixing. Experimental results on three different datasets with ground-truth abundance maps for each material demonstrate the effectiveness and superiority of EGU-Net over state-of-the-art unmixing algorithms. The codes are available at: https://github.com/danfenghong/IEEE_TNNLS_EGU-Net.
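The spectrally meaningful constraints mentioned above (nonnegativity and sum-to-one on abundances) are commonly enforced with a softmax over abundance logits. A hypothetical pure-Python sketch of this idea under the linear mixing model (function names are assumptions, not the EGU-Net code):

```python
import math

def softmax(z):
    """Map raw logits to nonnegative weights that sum to one."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def reconstruct(endmembers, abundance_logits):
    """Linear mixing: pixel spectrum = abundance-weighted sum of endmember spectra.
    endmembers: list of K spectra, each a list of band values."""
    a = softmax(abundance_logits)  # abundances satisfy both constraints by construction
    bands = len(endmembers[0])
    pixel = [sum(a[k] * endmembers[k][b] for k in range(len(a))) for b in range(bands)]
    return pixel, a
```

In an autoencoder-style unmixer, the logits would be the encoder output and the endmember matrix the decoder weights.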
20
Koosha M, Khodabandelou G, Ebadzadeh MM. A hierarchical estimation of multi-modal distribution programming for regression problems. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
21
Zhang J, Cong S, Zhang G, Ma Y, Zhang Y, Huang J. Detecting Pest-Infested Forest Damage through Multispectral Satellite Imagery and Improved UNet+. SENSORS (BASEL, SWITZERLAND) 2022; 22:7440. [PMID: 36236538 PMCID: PMC9570766 DOI: 10.3390/s22197440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/26/2022] [Revised: 09/22/2022] [Accepted: 09/28/2022] [Indexed: 06/16/2023]
Abstract
Plant pests are the primary biological threats to agricultural and forestry production, as well as to forest ecosystems. Monitoring forest-pest damage via satellite images is crucial for the development of prevention and control strategies. Previous studies using deep learning to monitor pest-induced damage in satellite imagery adopted RGB images only, even though multispectral imagery and vegetation indices contain a wealth of useful information for assessing plant health and can improve the precision of pest damage detection. The aim of this study is to further improve forest-pest infestation area segmentation by combining multispectral bands, vegetation indices, and RGB information in deep learning. We also propose a new image segmentation method based on UNet++ with an attention mechanism for detecting forest damage induced by bark beetle and aspen leaf miner in Sentinel-2 images. ResNeSt101 is used as the feature extraction backbone, and the scSE attention module is introduced in the decoding phase to improve the segmentation results. We used Sentinel-2 imagery to produce a dataset based on forest health damage data gathered by the Ministry of Forests, Lands, Natural Resource Operations and Rural Development (FLNRORD) in British Columbia (BC), Canada, during aerial overview surveys (AOS) in 2020. The dataset contains the 11 original Sentinel-2 bands and 13 vegetation indices. The experimental results confirmed the significance of vegetation indices and multispectral data in enhancing segmentation. The proposed method exhibits better segmentation quality and more accurate quantitative indices, with an overall accuracy of 85.11%, in comparison with state-of-the-art pest area segmentation methods.
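The scSE module mentioned above combines spatial and channel squeeze-and-excitation. A deliberately simplified sketch of the channel-gating half, with a single scalar weight per channel standing in for the module's excitation layers (illustrative only, not the paper's module):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_se(feature_maps, weights):
    """Toy channel squeeze-and-excitation: squeeze each channel to its global
    mean, gate it through a sigmoid, and rescale the channel by that gate.
    feature_maps: list of 2-D channels; weights: one scalar per channel."""
    gated = []
    for fmap, w in zip(feature_maps, weights):
        mean = sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))  # squeeze
        g = sigmoid(w * mean)                                              # excite
        gated.append([[g * v for v in row] for row in fmap])               # rescale
    return gated
```

The real scSE adds a parallel spatial gate (a 1×1 convolution over channels) and takes the element-wise maximum of the two recalibrations.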
22
Sarv Ahrabi S, Momenzadeh A, Baccarelli E, Scarpiniti M, Piazzo L. How much BiGAN and CycleGAN-learned hidden features are effective for COVID-19 detection from CT images? A comparative study. THE JOURNAL OF SUPERCOMPUTING 2022; 79:2850-2881. [PMID: 36042937 PMCID: PMC9411851 DOI: 10.1007/s11227-022-04775-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 08/10/2022] [Indexed: 06/15/2023]
Abstract
Bidirectional generative adversarial networks (BiGANs) and cycle generative adversarial networks (CycleGANs) are two emerging machine learning models that, up to now, have mainly been used as generative models, i.e., to generate output data sampled from a target probability distribution. However, these models are also equipped with encoding modules which, after weakly supervised training, could in principle be exploited for the extraction of hidden features from the input data. How these extracted features could be effectively exploited for classification tasks is still an unexplored field. Hence, motivated by this consideration, in this paper we develop and numerically test the performance of a novel inference engine that relies on BiGAN- and CycleGAN-learned hidden features for the detection of COVID-19 disease among other lung diseases in computed tomography (CT) scans. The main contributions of the paper are twofold. First, we develop a kernel density estimation (KDE)-based inference method which, in the training phase, leverages the hidden features extracted by BiGANs and CycleGANs to estimate the (a priori unknown) probability density function (PDF) of the CT scans of COVID-19 patients and then, in the inference phase, uses it as a target COVID-PDF for the detection of COVID diseases. As a second major contribution, we numerically evaluate and compare the classification accuracies of the implemented BiGAN and CycleGAN models against those of some state-of-the-art methods that rely on the unsupervised training of convolutional autoencoders (CAEs) for feature extraction. The performance comparisons are carried out over a spectrum of different training loss functions and distance metrics. The classification accuracies of the proposed CycleGAN-based (resp., BiGAN-based) models exceed those of the considered benchmark CAE-based models by about 16% (resp., 14%).
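The KDE-based inference step can be sketched directly: fit a class-conditional Gaussian KDE on hidden features (one-dimensional here, for illustration) and classify by comparing densities. Function names, the 1-D feature, and the fixed bandwidth are simplifying assumptions, not the paper's setup:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a PDF estimate: an equally weighted mixture of Gaussians,
    one centered at each training sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return pdf

def classify(x, pdf_covid, pdf_other):
    """Pick the class whose estimated density is higher at the test feature."""
    return "covid" if pdf_covid(x) > pdf_other(x) else "other"
```

The paper's target-PDF variant uses only the COVID-class density with a decision threshold; the two-density comparison above is the closest self-contained analogue.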
Affiliation(s)
- Sima Sarv Ahrabi
- Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome, Via Eudossiana, 18, 00184 Roma, Italy
- Alireza Momenzadeh
- Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome, Via Eudossiana, 18, 00184 Roma, Italy
- Enzo Baccarelli
- Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome, Via Eudossiana, 18, 00184 Roma, Italy
- Michele Scarpiniti
- Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome, Via Eudossiana, 18, 00184 Roma, Italy
- Lorenzo Piazzo
- Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome, Via Eudossiana, 18, 00184 Roma, Italy
23
Fully used reliable data and attention consistency for semi-supervised learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
24
A systematic review on hyperspectral imaging technology with a machine and deep learning methodology for agricultural applications. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101678] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
25
Zheng X, Sun H, Lu X, Xie W. Rotation-Invariant Attention Network for Hyperspectral Image Classification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:4251-4265. [PMID: 35635815 DOI: 10.1109/tip.2022.3177322] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Hyperspectral image (HSI) classification refers to identifying the land-cover categories of pixels based on the spectral signatures and spatial information of HSIs. In recent deep learning-based methods, an HSI patch cropped from the original HSI is usually used as the input to exploit spatial information, and 3×3 convolution is a key component for capturing spatial features. However, 3×3 convolution is sensitive to spatial rotation of the input, so recent methods perform worse on rotated HSIs. To alleviate this problem, a rotation-invariant attention network (RIAN) is proposed for HSI classification. First, a center spectral attention (CSpeA) module is designed to suppress redundant spectral bands while avoiding the influence of pixels from other categories. Then, a rectified spatial attention (RSpaA) module is proposed to replace 3×3 convolution for extracting rotation-invariant spectral-spatial features from HSI patches. The CSpeA module, 1×1 convolution, and the RSpaA module are used to build the proposed RIAN for HSI classification. Experimental results demonstrate that RIAN is invariant to spatial rotation of HSIs and has superior performance, e.g., achieving an overall accuracy of 86.53% (a 1.04% improvement) on the Houston database. The codes of this work are available at https://github.com/spectralpublic/RIAN.
26
PolSAR Scene Classification via Low-Rank Constrained Multimodal Tensor Representation. REMOTE SENSING 2022. [DOI: 10.3390/rs14133117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
Polarimetric synthetic aperture radar (PolSAR) data can be acquired at all times and are not impacted by weather conditions. They can efficiently capture geometrical and geographical structures on the ground. However, due to the complexity of the data and the difficulty of data availability, PolSAR image scene classification remains a challenging task. To this end, in this paper, a low-rank constrained multimodal tensor representation method (LR-MTR) is proposed to integrate PolSAR data in multimodal representations. To preserve the multimodal polarimetric information simultaneously, the target decompositions in a scene from multiple spaces (e.g., Freeman, H/A/α, Pauli, etc.) are exploited to provide multiple pseudo-color images. Furthermore, a representation tensor is constructed via the representation matrices and constrained by the low-rank norm to keep the cross-information from multiple spaces. A projection matrix is also calculated by minimizing the differences between the whole cascaded data set and the features in the corresponding space. It also reduces the redundancy of those multiple spaces and solves the out-of-sample problem in the large-scale data set. To support the experiments, two new PolSAR image data sets are built via ALOS-2 full polarization data, covering the areas of Shanghai, China, and Tokyo, Japan. Compared with state-of-the-art (SOTA) dimension reduction algorithms, the proposed method achieves the best quantitative performance and demonstrates superiority in fusing multimodal PolSAR features for image scene classification.
27
VSAI: A Multi-View Dataset for Vehicle Detection in Complex Scenarios Using Aerial Images. DRONES 2022. [DOI: 10.3390/drones6070161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Arbitrary-oriented vehicle detection via aerial imagery is essential in remote sensing and computer vision, with various applications in traffic management, disaster monitoring, smart cities, etc. In the last decade, we have seen notable progress in object detection in natural imagery; however, such development has been sluggish for airborne imagery, not only due to large-scale variations and various spins/appearances of instances but also due to the scarcity of the high-quality aerial datasets, which could reflect the complexities and challenges of real-world scenarios. To address this and to improve object detection research in remote sensing, we collected high-resolution images using different drone platforms spanning a large geographic area and introduced a multi-view dataset for vehicle detection in complex scenarios using aerial images (VSAI), featuring arbitrary-oriented views in aerial imagery, consisting of different types of complex real-world scenes. The imagery in our dataset was captured with a wide variety of camera angles, flight heights, times, weather conditions, and illuminations. VSAI contained 49,712 vehicle instances annotated with oriented bounding boxes and arbitrary quadrilateral bounding boxes (47,519 small vehicles and 2193 large vehicles); we also annotated the occlusion rate of the objects to further increase the generalization abilities of object detection networks. We conducted experiments to verify several state-of-the-art algorithms in vehicle detection on VSAI to form a baseline. As per our results, the VSAI dataset largely shows the complexity of the real world and poses significant challenges to existing object detection algorithms. The dataset is publicly available.
28
An Integrated Change Detection Method Based on Spectral Unmixing and the CNN for Hyperspectral Imagery. REMOTE SENSING 2022. [DOI: 10.3390/rs14112523] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
Hyperspectral remote sensing images (HSIs) include rich spectral information that can be very beneficial for change detection (CD). Due to the existence of many mixed pixels, pixel-wise approaches can lead to considerable errors in the resulting CD map. Spectral unmixing (SU) is a potential solution to this problem, as it decomposes mixed pixels into a set of land-cover fractions; the CD map is then created by comparing the abundance images. However, abundance images created through SU alone cannot effectively provide detailed change information, and the features of change information cannot be sufficiently extracted by the traditional sub-pixel CD framework, which leads to poor CD results. To address these problems, this paper presents an integrated CD method based on multi-endmember spectral unmixing, a joint matrix, and a CNN (MSUJMC) for HSIs. Three main steps are considered. First, accounting for endmember spectral variability, more reliable endmember abundance information is obtained by multi-endmember spectral unmixing (MSU). Second, the original image features are incorporated with the abundance images using a joint matrix (JM) algorithm to provide more temporal and spatial land-cover change characteristics. Third, to efficiently extract the change features and better handle the fused multi-source information, a convolutional neural network (CNN) is introduced to achieve a high-accuracy CD result. The proposed method has been verified on simulated and real multitemporal HSI datasets featuring multiple changes. Experimental results verify the effectiveness of the proposed approach.
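The sub-pixel comparison underlying this family of methods can be illustrated with a toy abundance-differencing step: per-pixel L1 distance between abundance vectors at the two dates, thresholded into a binary change map (the threshold and function name are illustrative assumptions; the paper's MSUJMC replaces this simple comparison with a joint matrix and a CNN):

```python
def change_map(abund_t1, abund_t2, threshold=0.3):
    """abund_t1/abund_t2: per-pixel abundance vectors at dates t1 and t2.
    Returns 1 where the L1 abundance difference exceeds the threshold."""
    cmap = []
    for a1, a2 in zip(abund_t1, abund_t2):
        d = sum(abs(x - y) for x, y in zip(a1, a2))  # L1 change magnitude
        cmap.append(1 if d > threshold else 0)
    return cmap
```
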
29
Abstract
Artificial intelligence is applied to many fields and contributes to many important applications and research areas, such as intelligent data processing, natural language processing, autonomous vehicles, and robotics. The adoption of artificial intelligence in several fields has been the subject of many research papers, and recently the space sector has also been receiving significant attention. This paper surveys the most relevant problems in the field of space applications solved by artificial intelligence techniques. We focus on applications related to mission design, space exploration, and Earth observation, and we provide a taxonomy of the current challenges. Moreover, we present and discuss current solutions proposed for each challenge, allowing researchers to identify and compare the state of the art in this context.
30
Hyperspectral Image Classification via Deep Structure Dictionary Learning. REMOTE SENSING 2022. [DOI: 10.3390/rs14092266] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
The construction of diverse dictionaries for sparse representation in hyperspectral image (HSI) classification has been a hot topic over the past few years. However, compared with convolutional neural network (CNN) models, dictionary-based models cannot extract deeper spectral information, which reduces their performance for HSI classification. Moreover, dictionary-based methods have low discriminative capability, which leads to less accurate classification. To solve these problems, we propose a deep learning-based structure dictionary for HSI classification. The core ideas are threefold: (1) To extract abundant spectral information, we incorporate deep residual neural networks into dictionary learning and represent input signals in the deep feature domain. (2) To enhance the discriminative ability of the proposed model, we optimize the structure of the dictionary and design a sharing constraint in terms of sub-dictionaries, so that the general and specific features of HSI samples can be learned separately. (3) To further enhance classification performance, we design two loss functions: a coding loss and a discriminating loss. The coding loss realizes group sparsity of the coding coefficients, so that within-class spectral samples can be represented intensively and effectively. The Fisher discriminating loss enforces sparse representation coefficients with large between-class scatter. Extensive tests on hyperspectral datasets show that the developed method is effective and outperforms other existing methods.
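Sparse coding over a dictionary, the building block of such models, can be illustrated with greedy matching pursuit (a generic stand-in for the paper's structured, loss-driven coding; unit-norm atoms and the function name are assumptions):

```python
def matching_pursuit(signal, dictionary, n_atoms=2):
    """Greedy sparse coding: repeatedly pick the atom most correlated with the
    residual and subtract its contribution. Atoms are assumed unit-norm.
    Returns (code coefficients, final residual)."""
    residual = list(signal)
    code = [0.0] * len(dictionary)
    for _ in range(n_atoms):
        best_k, best_c = 0, 0.0
        for k, atom in enumerate(dictionary):
            c = sum(r * a for r, a in zip(residual, atom))  # correlation with residual
            if abs(c) > abs(best_c):
                best_k, best_c = k, c
        code[best_k] += best_c
        residual = [r - best_c * a for r, a in zip(residual, dictionary[best_k])]
    return code, residual
```

In the proposed model, the "signals" would be deep residual-network features rather than raw spectra, and the coefficients would additionally be shaped by the group-sparsity and Fisher losses.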
31
A New Spatial–Temporal Depthwise Separable Convolutional Fusion Network for Generating Landsat 8-Day Surface Reflectance Time Series over Forest Regions. REMOTE SENSING 2022. [DOI: 10.3390/rs14092199] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]
Abstract
Landsat has provided the longest fine resolution data archive of Earth’s environment since 1972; however, one of the challenges in using Landsat data for various applications is its frequent large data gaps and heavy cloud contaminations. One pressing research topic is to generate the regular time series by integrating coarse-resolution satellite data through data fusion techniques. This study presents a novel spatiotemporal fusion (STF) method based on a depthwise separable convolutional neural network (DSC), namely, STFDSC, to generate Landsat-surface reflectance time series at 8-day intervals by fusing Landsat 30 m with high-quality Moderate Resolution Imaging Spectroradiometer (MODIS) 500 m surface reflectance data. The STFDSC method consists of three main stages: feature extraction, feature fusion and prediction. Features were first extracted from Landsat and MODIS surface reflectance changes, and the extracted multilevel features were then stacked and fused. Both low-level and middle-level features that were generally ignored in convolutional neural network (CNN)-based fusion models were included in STFDSC to avoid key information loss and thus ensure high prediction accuracy. The prediction stage generated a Landsat residual image and is combined with original Landsat data to obtain predictions of Landsat imagery at the target date. The performance of STFDSC was evaluated in the Greater Khingan Mountains (GKM) in Northeast China and the Ziwuling (ZWL) forest region in Northwest China. A comparison of STFDSC with four published fusion methods, including two classic fusion methods (FSDAF, ESTARFM) and two machine learning methods (EDCSTFN and STFNET), was also carried out. The results showed that STFDSC made stable and more accurate predictions of Landsat surface reflectance than other methods in both the GKM and ZWL regions. 
The root-mean-square errors (RMSEs) of TM bands 2, 3, 4 and 7 were 0.0046, 0.0038, 0.0143 and 0.0055 in GKM, respectively, and 0.0246, 0.0176, 0.0280 and 0.0141 in ZWL, respectively. STFDSC can therefore potentially be used for generating global surface reflectance and other high-level land products.
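The reported band-wise RMSEs can be reproduced for any prediction with a few lines of NumPy. This is an illustrative sketch only; the function name and array layout are assumptions, not the authors' code:

```python
import numpy as np

def bandwise_rmse(pred, ref):
    """Band-wise root-mean-square error between predicted and reference
    surface reflectance; both arrays have shape (bands, rows, cols)."""
    diff = pred.astype(np.float64) - ref.astype(np.float64)
    return np.sqrt((diff ** 2).mean(axis=(1, 2)))
```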
|
32
|
Hyperspectral Image Mixed Noise Removal Using a Subspace Projection Attention and Residual Channel Attention Network. REMOTE SENSING 2022. [DOI: 10.3390/rs14092071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/10/2022]
Abstract
Although existing deep-learning-based hyperspectral image (HSI) denoising methods have achieved tremendous success, recovering high-quality HSIs in complex scenes that contain mixed noise is still challenging. Moreover, these methods have not fully explored the local and global spatial–spectral information of HSIs. To address these issues, a novel HSI mixed-noise removal network called the subspace projection attention and residual channel attention network (SPARCA-Net) is proposed. Specifically, we propose an orthogonal subspace projection attention (OSPA) module that adaptively learns to generate bases of the signal subspace and projects the input into that space to remove noise. By leveraging local and global spatial relations, OSPA is able to reconstruct the local structure of the feature maps more precisely. We further propose a residual channel attention (RCA) module to emphasize the interdependence between feature maps and exploit their global channel correlation, which enhances channel-wise adaptive learning. In addition, multiscale joint spatial–spectral input and residual learning strategies are employed to capture multiscale spatial–spectral features and reduce the degradation problem, respectively. Experiments on synthetic and real HSI data demonstrated that the proposed denoising network outperforms many advanced methods in both quantitative and qualitative assessments.
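The core idea behind the OSPA module, projecting features onto a learned signal subspace so that noise components outside the subspace are discarded, can be sketched with plain linear algebra. In the paper the bases are learned by the network; in this minimal illustration `B` is simply given:

```python
import numpy as np

def subspace_project(Y, B):
    """Project noisy spectra Y (bands x pixels) onto the signal subspace
    spanned by the columns of B (bands x k). The orthogonal projector
    P = B (B^T B)^{-1} B^T keeps the signal component and removes any
    noise component orthogonal to the subspace."""
    P = B @ np.linalg.inv(B.T @ B) @ B.T
    return P @ Y
```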
|
33
|
Abstract
Hyperspectral image anomaly detection (HSI-AD) has become one of the research hotspots in the field of remote sensing. Because HSIs integrate image and spectral information, they provide a considerable data basis for abnormal object detection, giving HSI-AD huge application potential in HSI analysis. It is difficult to effectively extract the large number of nonlinear features contained in HSI data using traditional machine learning methods, whereas deep learning has incomparable advantages in nonlinear feature extraction. Therefore, deep learning has been widely used in HSI-AD and has shown excellent performance. This review systematically summarizes the literature on deep-learning-based HSI-AD and classifies the corresponding methods. Specifically, we first introduce the characteristics of HSI-AD and the challenges faced by traditional methods, and describe the advantages of deep learning in dealing with these problems. Then, we systematically review and classify the corresponding HSI-AD methods. Finally, the performance of deep-learning-based HSI-AD methods is compared on several mainstream data sets, and the remaining challenges are summarized. The main purpose of this article is to give a comprehensive overview of HSI-AD methods and provide a reference for future research.
|
34
|
Wan J, Li J, Xu M, Liu S, Sheng H. Node-splitting optimized canonical correlation forest algorithm for sea fog detection using MODIS data. OPTICS EXPRESS 2022; 30:13810-13824. [PMID: 35472986 DOI: 10.1364/oe.454570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Accepted: 03/23/2022] [Indexed: 06/14/2023]
Abstract
In this paper, a node-splitting optimized canonical correlation forest algorithm for sea fog detection is proposed, using active and passive satellite data. The traditional canonical correlation forest (CCF) algorithm insufficiently accounts for the spectral characteristics of clouds and fog and for the reliability of each classifier during integration. To deal with this problem, the information gain rate of node entropy is used as the splitting criterion, and the spectral characteristics of clouds and fog are incorporated into the model generation process. The proposed algorithm was verified using meteorological station data and compared with five state-of-the-art algorithms, which demonstrated that it has the best performance in sea fog detection and identifies mist more reliably.
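The information gain rate (gain ratio) used here as the splitting criterion is a standard decision-tree quantity: the information gain of a split normalized by the split's own entropy. A minimal sketch, not the authors' implementation:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(labels, split):
    """Information gain rate of a candidate split: information gain
    normalized by the split information (entropy of the split itself)."""
    n = len(labels)
    cond, split_info = 0.0, 0.0
    for v in set(split):
        idx = [i for i, s in enumerate(split) if s == v]
        w = len(idx) / n
        cond += w * entropy([labels[i] for i in idx])
        split_info -= w * np.log2(w)
    gain = entropy(labels) - cond
    return gain / split_info if split_info > 0 else 0.0
```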
|
35
|
Abstract
A reliable quality assessment procedure for pansharpening methods is of critical importance for the development of the related solutions. Unfortunately, the lack of ground truths to be used as guidance for an objective evaluation has pushed the community to resort to two approaches, which can also be jointly applied. Hence, two kinds of indexes can be found in the literature: (i) reference-based reduced-resolution indexes aimed to assess the synthesis ability; (ii) no-reference subjective quality indexes for full-resolution datasets aimed to assess spectral and spatial consistency. Both reference-based and no-reference indexes present critical shortcomings, which motivate the community to explore new solutions. In this work, we propose an alternative no-reference full-resolution assessment framework. On one side, we introduce a protocol, namely the reprojection protocol, to take care of the spectral consistency issue. On the other side, a new index of the spatial consistency between the pansharpened image and the panchromatic band at full resolution is also proposed. Experimental results carried out on different datasets/sensors demonstrate the effectiveness of the proposed approach.
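The notion of spatial consistency between the pansharpened image and the panchromatic band can be conveyed with a simple correlation-based stand-in; the index proposed in the paper is more elaborate, so this sketch only illustrates the underlying idea:

```python
import numpy as np

def spatial_consistency(fused, pan):
    """Correlation between the band-averaged intensity of the fused
    (pansharpened) image, shape (bands, rows, cols), and the panchromatic
    band, shape (rows, cols), at full resolution."""
    intensity = fused.mean(axis=0).ravel()
    return float(np.corrcoef(intensity, pan.ravel())[0, 1])
```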
|
36
|
A Two-Branch Convolutional Neural Network Based on Multi-Spectral Entropy Rate Superpixel Segmentation for Hyperspectral Image Classification. REMOTE SENSING 2022. [DOI: 10.3390/rs14071569] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Convolutional neural networks (CNNs) can extract advanced features of joint spectral–spatial information, which are useful for hyperspectral image (HSI) classification. However, fixed-size patch-based neighborhoods of samples are usually used as the input of CNNs, which cannot capture the homogeneity between pixels inside and outside of the patch. In addition, the spatial features differ considerably across spectral bands, which is not fully exploited by existing methods. In this paper, a two-branch convolutional neural network based on multi-spectral entropy rate superpixel segmentation (TBN-MERS) is designed for HSI classification. Firstly, entropy rate superpixel (ERS) segmentation is performed on the image of each spectral band in an HSI. The segmented images obtained are stacked band by band to form a multi-spectral entropy rate superpixel segmentation image (MERSI), which is then preprocessed to serve as the input of one branch in TBN-MERS. The preprocessed HSI is used as the input of the other branch. TBN-MERS extracts features from both the HSI and the MERSI and then utilizes the fused spectral–spatial features for classification. TBN-MERS makes full use of the joint spectral–spatial information of HSIs at both the superpixel and neighborhood scales, and therefore achieves excellent performance in HSI classification. Experimental results on four datasets demonstrate that the proposed TBN-MERS can effectively extract features from HSIs and significantly outperforms some state-of-the-art methods with only a few training samples.
|
37
|
Manifold-Based Multi-Deep Belief Network for Feature Extraction of Hyperspectral Image. REMOTE SENSING 2022. [DOI: 10.3390/rs14061484] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Deep belief networks (DBNs) have been widely applied in hyperspectral imagery (HSI) processing. However, the original DBN model fails to explore the prior knowledge of training samples, which limits the discriminant capability of the extracted features for classification. In this paper, we propose a new deep learning method, termed manifold-based multi-DBN (MMDBN), to obtain deep manifold features of HSI. MMDBN uses a hierarchical initialization method that initializes the network with the local geometric structure hidden in the data. On this basis, a multi-DBN structure is built to learn deep features for each land-cover class, serving as the front-end of the whole model. A discrimination manifold layer is then developed to improve the discriminability of the extracted deep features. To discover the manifold structure contained in HSI, an intrinsic graph and a penalty graph are constructed in this layer using the label information of training samples, after which the deep manifold features can be obtained for classification. MMDBN not only effectively extracts deep features from each class in HSI, but also maximizes the margins between different manifolds in the low-dimensional embedding space. Experimental results on the Indian Pines, Salinas, and Botswana datasets reach classification accuracies of 78.25%, 90.48%, and 97.35%, respectively, indicating that MMDBN achieves better performance than some state-of-the-art methods.
|
38
|
Geospatial Analysis of Geo-Ecotourism Site Suitability Using AHP and GIS for Sustainable and Resilient Tourism Planning in West Bengal, India. SUSTAINABILITY 2022. [DOI: 10.3390/su14042422] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
This study geospatially analyzes the potential and site suitability of geo-ecotourism in West Bengal, India. The state of West Bengal is a platform for diverse tourism and has enormous potential to cultivate the geo-ecotourism that has emerged in recent years. The work examines the possibility of turning the many geologically, geomorphologically and ecologically significant tourist spots of West Bengal into geo-ecotourism sites, aided by geospatial techniques. It presents a qualitative and quantitative investigation of the potential of the whole state, dividing it into several geo-ecotourism zones based on its physiographic setting and Land Use Land Cover (LULC) features derived from satellite image data. Geospatial technology combining Remote Sensing (RS) and a Geographic Information System (GIS) was employed to portray the potential zones using cartographic and statistical techniques. Furthermore, nine criteria were selected for the Analytic Hierarchy Process (AHP) method to determine site suitability for geo-ecotourism. The study maps and analyzes geo-ecotourism in West Bengal using a secondary database, expert opinions and primary observations, with the application of the AHP method and GIS. The outcomes were found to be significant, as they indicate scope for geo-ecotourism development in the state and will contribute to location-specific planning and the sustainable management of geo-ecotourism.
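The AHP step, deriving criteria weights from a pairwise-comparison matrix, can be sketched with the common geometric-mean approximation of the principal eigenvector. This is a generic illustration of the method, not the study's exact computation:

```python
import numpy as np

def ahp_weights(M):
    """Criteria weights from an AHP pairwise-comparison matrix M
    (positive, reciprocal: M[j, i] == 1 / M[i, j]) via the geometric-mean
    row approximation of the principal eigenvector."""
    g = np.prod(M, axis=1) ** (1.0 / M.shape[0])
    return g / g.sum()
```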
|
39
|
Liu Y, Chu M, Guo H, Hu X, Yu J, He X, Yi H, He X. Multispectral Differential Reconstruction Strategy for Bioluminescence Tomography. Front Oncol 2022; 12:768137. [PMID: 35251958 PMCID: PMC8895370 DOI: 10.3389/fonc.2022.768137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2021] [Accepted: 01/14/2022] [Indexed: 11/13/2022] Open
Abstract
Bioluminescence tomography (BLT) is a promising in vivo molecular imaging tool that allows non-invasive monitoring of physiological and pathological processes at the cellular and molecular levels. However, the accuracy of BLT reconstruction is significantly affected by forward modeling errors in the simplified photon propagation model, measurement noise in data acquisition, and the inherent ill-posedness of the inverse problem. In this paper, we present a new multispectral differential strategy (MDS) based on an analysis of the errors introduced by simplifying the radiative transfer equation (RTE) to the diffusion approximation and by the data acquisition of the imaging system. Rigorous theoretical analysis shows that spectral differencing not only eliminates the errors caused by the approximation of the RTE and by imaging system measurement noise, but also adds constraints and decreases the condition number of the system matrix for reconstruction, compared with the traditional multispectral (TM) reconstruction strategy. In forward simulations, energy differences and cosine similarity of the measured surface light energy calculated by Monte Carlo (MC) and the diffusion equation (DE) showed that MDS reduces the systematic errors in the process of light transmission. In inverse simulations and in vivo experiments, the results demonstrated that MDS alleviates the ill-posedness of the inverse problem of BLT. Thus, the MDS method had superior location accuracy, morphology recovery capability, and image contrast in source reconstruction compared with the TM method and the spectral derivative (SD) method. In vivo experiments verified the practicability and effectiveness of the proposed method.
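The key error-cancellation property of the differential strategy, that any error term shared across spectral bands drops out when adjacent-band measurements are subtracted, can be demonstrated numerically in a few lines. This is a simplified illustration of that claim, not the BLT reconstruction itself:

```python
import numpy as np

def differential_measurements(b):
    """Adjacent-band differences of the measured surface light energy;
    b has shape (wavelengths, detectors). Any error term shared across
    wavelengths cancels in the difference."""
    return np.diff(b, axis=0)
```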
Affiliation(s)
- Yanqiu Liu
- The Xi’an Key Laboratory of Radiomics and Intelligent Perception, Xi’an, China
- School of Information Sciences and Technology, Northwest University, Xi’an, China
- Mengxiang Chu
- The Xi’an Key Laboratory of Radiomics and Intelligent Perception, Xi’an, China
- Network and Data Center, Northwest University, Xi’an, China
- Hongbo Guo
- The Xi’an Key Laboratory of Radiomics and Intelligent Perception, Xi’an, China
- School of Information Sciences and Technology, Northwest University, Xi’an, China
- *Correspondence: Hongbo Guo, ; Xiaowei He,
- Xiangong Hu
- The Xi’an Key Laboratory of Radiomics and Intelligent Perception, Xi’an, China
- Network and Data Center, Northwest University, Xi’an, China
- Jingjing Yu
- School of Physics and Information Technology, Shaanxi Normal University, Xi’an, China
- Xuelei He
- The Xi’an Key Laboratory of Radiomics and Intelligent Perception, Xi’an, China
- School of Information Sciences and Technology, Northwest University, Xi’an, China
- Huangjian Yi
- The Xi’an Key Laboratory of Radiomics and Intelligent Perception, Xi’an, China
- School of Information Sciences and Technology, Northwest University, Xi’an, China
- Xiaowei He
- The Xi’an Key Laboratory of Radiomics and Intelligent Perception, Xi’an, China
- School of Information Sciences and Technology, Northwest University, Xi’an, China
- Network and Data Center, Northwest University, Xi’an, China
- *Correspondence: Hongbo Guo, ; Xiaowei He,
|
40
|
Transferable Deep Learning from Time Series of Landsat Data for National Land-Cover Mapping with Noisy Labels: A Case Study of China. REMOTE SENSING 2021. [DOI: 10.3390/rs13214194] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Large-scale land-cover classification using a supervised algorithm is a challenging task. Enormous efforts have been made to manually process and check the production of national land-cover maps; this has led to complex pre- and post-processing and even to inaccurate mapping products from large-scale remote sensing images. Inspired by the recent success of deep learning techniques, this study provides a feasible automatic solution for improving the quality of national land-cover maps. However, the application of deep learning to national land-cover mapping remains limited because only small-scale noisy labels are available. To this end, a mutual transfer network, MTNet, was developed. MTNet learns better feature representations by mutually transferring pre-trained models from time series of data and fine-tuning on current data. Such an interactive training strategy can effectively alleviate the effects of inaccurate or noisy labels and unbalanced sample distributions, yielding a relatively stable classification system. Extensive experiments were conducted on several representative regions to evaluate the classification results of the proposed method. Quantitative results showed that MTNet outperformed its baseline model by about 1%, and accuracy improved by up to 6.45% compared with a model trained on the training set of another year. We also visualized the national classification maps generated by MTNet for two different time periods to analyze the performance gain. It is concluded that the proposed MTNet provides an efficient method for large-scale land-cover mapping.
|
41
|
Khalil A, Rahimi A, Luthfi A, Azizan MM, Satapathy SC, Hasikin K, Lai KW. Brain Tumour Temporal Monitoring of Interval Change Using Digital Image Subtraction Technique. Front Public Health 2021; 9:752509. [PMID: 34621723 PMCID: PMC8490781 DOI: 10.3389/fpubh.2021.752509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Accepted: 08/25/2021] [Indexed: 11/13/2022] Open
Abstract
A process involving the registration of two brain Magnetic Resonance Imaging (MRI) acquisitions is proposed for the subtraction of previous and current images at two different follow-up (FU) time points. Brain tumours can be non-cancerous (benign) or cancerous (malignant). Treatment choices for these conditions rely on the type of brain tumour as well as its size and location. Brain cancer is a fast-spreading tumour that must be treated in time. MRI is commonly used to detect early signs of abnormality in the brain because it provides clear details. Abnormalities include the presence of cysts, haematomas or tumour cells, and a sequence of images can be used to detect their progression. A previous study on conventional (CONV) visual reading reported low accuracy and speed in the early detection of abnormalities, specifically in brain images, which can affect the proper diagnosis and treatment of the patient. This study proposes a digital subtraction technique in which two images acquired at two time points are subtracted to detect the progression of abnormalities in the brain image. MRI datasets of five patients, each including a series of brain images, were retrieved retrospectively. All methods were implemented on the MATLAB programming platform. ROI volume and diameter for both regions were recorded to analyse progression details, location, shape variations and size alteration of tumours. This study promotes the use of digital subtraction techniques on brain MRIs to track abnormalities, achieving early and accurate diagnosis while reducing reading time; improving the diagnostic information available to physicians can thus enhance the treatment plan for patients.
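Once the two acquisitions are co-registered, the subtraction step itself reduces to a per-pixel absolute difference. A minimal NumPy sketch (the study worked in MATLAB; names and shapes here are illustrative assumptions):

```python
import numpy as np

def interval_change(prev, curr):
    """Absolute per-pixel difference between two co-registered,
    intensity-normalized MRI slices; nonzero regions highlight
    interval change between the two follow-up time points."""
    return np.abs(curr.astype(np.int32) - prev.astype(np.int32))
```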
Affiliation(s)
- Azira Khalil
- Faculty of Science and Technology, Universiti Sains Islam Malaysia, Bandar Baru Nilai, Malaysia
- Aisyah Rahimi
- Faculty of Science and Technology, Universiti Sains Islam Malaysia, Bandar Baru Nilai, Malaysia
- Aida Luthfi
- Faculty of Science and Technology, Universiti Sains Islam Malaysia, Bandar Baru Nilai, Malaysia
- Muhammad Mokhzaini Azizan
- Department of Electrical and Electronic Engineering, Faculty of Engineering and Built Environment, Universiti Sains Islam Malaysia, Bandar Baru Nilai, Malaysia
- Suresh Chandra Satapathy
- School of Computer Engineering, Kalinga Institute of Industrial Technology, Deemed to Be University, Bhubaneshwar, India
- Khairunnisa Hasikin
- Biomedical Engineering Department, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Malaysia
- Khin Wee Lai
- Biomedical Engineering Department, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Malaysia
|
42
|
Abstract
Predicting soybean [Glycine max (L.) Merr.] seed yield is of interest to crop producers making important agronomic and economic decisions. Evaluating the soybean canopy across a range of common agronomic practices, using canopy measurements, provides a large inference space for soybean producers. The individual and synergistic relationships between fractional green canopy cover (FGCC), photosynthetically active radiation (PAR) interception, and normalized difference vegetation index (NDVI) measurements taken throughout the growing season to predict soybean seed yield in North Dakota, USA, were investigated in 12 environments. Canopy measurements were evaluated across early and late planting dates, 407,000 and 457,000 seeds ha−1 seeding rates, 0.5 and 0.8 relative maturities, and 30.5 and 61 cm row spacings. The single best yield predictor was an NDVI measurement at R5 (beginning of seed development) with a coefficient of determination of 0.65, followed by an FGCC measurement at R5 (R2 = 0.52). Stepwise and Lasso multiple regression methods were used to select the best prediction models from the canopy measurements, explaining 69% and 67% of the variation in yield, respectively. Including plant density, which can be easily measured by a producer, with an individual canopy measurement did not improve the explanation of yield. Using FGCC to estimate yield across the growing season explained 49% to 56% of yield variation, with a single FGCC measurement at R5 (R2 = 0.52) being the most efficient and practical method for a soybean producer to estimate yield.
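A single-predictor yield model of the kind evaluated here (e.g. NDVI at R5 against seed yield) is an ordinary least-squares fit together with its coefficient of determination. A generic sketch, not the study's analysis code:

```python
import numpy as np

def fit_r2(x, y):
    """Ordinary least-squares fit of yield y on a single canopy
    measurement x, returning slope, intercept and R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    ss_res = ((y - pred) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return slope, intercept, 1.0 - ss_res / ss_tot
```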
|
43
|
Abstract
Convolution-based autoencoder networks have yielded promising performance in exploiting spatial–contextual signatures for spectral unmixing. However, the extracted spectral and spatial features of some networks are aggregated, which makes it difficult to balance their effects on unmixing results. In this paper, we propose two gated autoencoder networks with the intention of adaptively controlling the contribution of spectral and spatial features in the unmixing process. A gating mechanism is adopted in the networks to filter and regularize spatial features, constructing an unmixing algorithm based on spectral information and supplemented by spatial information. In addition, abundance sparsity regularization and gating regularization are introduced to ensure an appropriate implementation. Experimental results validate the superiority of the proposed method over state-of-the-art techniques in both synthetic and real-world scenes. This study confirms the effectiveness of the gating mechanism in improving the accuracy and efficiency of utilizing spatial signatures for spectral unmixing.
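A sigmoid gate computed from the spectral stream and applied to the spatial stream captures the stated design: spectral information leads, spatial information supplements. A minimal sketch under assumed shapes (the papers' networks are convolutional autoencoders; this shows only the gating idea, with `W` and `b` as hypothetical gate parameters):

```python
import numpy as np

def gated_fusion(f_spec, f_spat, W, b):
    """Fuse spectral and spatial feature vectors: a sigmoid gate in (0, 1),
    computed from the spectral stream, scales the spatial contribution
    before it is added to the spectral features."""
    gate = 1.0 / (1.0 + np.exp(-(f_spec @ W + b)))
    return f_spec + gate * f_spat
```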
|
44
|
Hong D, Hu J, Yao J, Chanussot J, Zhu XX. Multimodal remote sensing benchmark datasets for land cover classification with a shared and specific feature learning model. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING : OFFICIAL PUBLICATION OF THE INTERNATIONAL SOCIETY FOR PHOTOGRAMMETRY AND REMOTE SENSING (ISPRS) 2021; 178:68-80. [PMID: 34433999 PMCID: PMC8336649 DOI: 10.1016/j.isprsjprs.2021.05.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/26/2020] [Revised: 05/13/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
As remote sensing (RS) data obtained from different sensors become largely and openly available, multimodal data processing and analysis techniques have been garnering increasing interest in the RS and geoscience community. However, due to the gap between different modalities in terms of imaging sensors, resolutions, and contents, embedding their complementary information into a consistent, compact, accurate, and discriminative representation remains, to a great extent, challenging. To this end, we propose a shared and specific feature learning (S2FL) model. S2FL is capable of decomposing multimodal RS data into modality-shared and modality-specific components, enabling more effective information blending across multiple modalities, particularly for heterogeneous data sources. Moreover, to better assess multimodal baselines and the newly proposed S2FL model, three multimodal RS benchmark datasets, i.e., Houston2013 (hyperspectral and multispectral data), Berlin (hyperspectral and synthetic aperture radar (SAR) data), and Augsburg (hyperspectral, SAR, and digital surface model (DSM) data), are released and used for land cover classification. Extensive experiments conducted on the three datasets demonstrate the superiority and advancement of our S2FL model in the task of land cover classification in comparison with previously proposed state-of-the-art baselines. Furthermore, the baseline codes and datasets used in this paper will be made freely available at https://github.com/danfenghong/ISPRS_S2FL.
Affiliation(s)
- Danfeng Hong
- Remote Sensing Technology Institute, German Aerospace Center, 82234 Wessling, Germany
- Jingliang Hu
- Data Science in Earth Observation, Technical University of Munich, 80333 Munich, Germany
- Jing Yao
- Aerospace Information Research Institute, Chinese Academy of Sciences, 100094 Beijing, China
- Jocelyn Chanussot
- Aerospace Information Research Institute, Chinese Academy of Sciences, 100094 Beijing, China
- Univ. Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Xiao Xiang Zhu
- Remote Sensing Technology Institute, German Aerospace Center, 82234 Wessling, Germany
- Data Science in Earth Observation, Technical University of Munich, 80333 Munich, Germany
|
45
|
Lu X, Zhang J, Yang D, Xu L, Jia F. Cascaded Convolutional Neural Network-Based Hyperspectral Image Resolution Enhancement via an Auxiliary Panchromatic Image. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:6815-6828. [PMID: 34310305 DOI: 10.1109/tip.2021.3098246] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Owing to the limits of incident energy and hardware system, hyperspectral (HS) images always suffer from low spatial resolution, compared with multispectral (MS) or panchromatic (PAN) images. Therefore, image fusion has emerged as a useful technology that is able to combine the characteristics of high spectral and spatial resolutions of HS and PAN/MS images. In this paper, a novel HS and PAN image fusion method based on convolutional neural network (CNN) is proposed. The proposed method incorporates the ideas of both hyper-sharpening and MS pan-sharpening techniques, thereby employing a two-stage cascaded CNN to reconstruct the anticipated high-resolution HS image. Technically, the proposed CNN architecture consists of two sub-networks, the detail injection sub-network and unmixing sub-network. The former aims at producing a latent high-resolution MS image, whereas the latter estimates the desired high-resolution abundance maps by exploring the spatial and spectral information of both HS and MS images. Moreover, two model-training fashions are presented in this paper for the sake of effectively training our network. Experiments on simulated and real remote sensing data demonstrate that the proposed method can improve the spatial resolution and spectral fidelity of HS image, and achieve better performance than some state-of-the-art HS pan-sharpening algorithms.
|
46
|
Maximum Likelihood Estimation Based Nonnegative Matrix Factorization for Hyperspectral Unmixing. REMOTE SENSING 2021. [DOI: 10.3390/rs13132637] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Hyperspectral unmixing (HU) is a research hotspot of hyperspectral remote sensing technology. As a classical HU method, nonnegative matrix factorization (NMF) decomposes an observed hyperspectral data matrix into the product of two nonnegative matrices, i.e., an endmember matrix and an abundance matrix. Because the objective function of NMF is the traditional least-squares function, NMF is sensitive to noise. To improve the robustness of NMF, this paper proposes a maximum likelihood estimation (MLE) based NMF model (MLENMF) for unmixing of hyperspectral images (HSIs), which substitutes a robust MLE-based loss function for the least-squares objective of traditional NMF. Experimental results on one simulated and two widely used real hyperspectral data sets demonstrate the superiority of MLENMF over existing NMF methods.
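For reference, the least-squares NMF baseline that MLENMF modifies can be written with the classical Lee-Seung multiplicative updates. MLENMF replaces this least-squares objective with a robust MLE-based loss, which is not reproduced here; this sketch shows only the standard baseline:

```python
import numpy as np

def nmf_ls(X, k, iters=500, eps=1e-9, seed=0):
    """Classical least-squares NMF X ~ W H via Lee-Seung multiplicative
    updates, with W as the endmember matrix and H as the abundance matrix.
    Nonnegativity is preserved because the updates only multiply by
    nonnegative ratios."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```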
|
47
|
Hong D, Yokoya N, Chanussot J, Xu J, Zhu XX. Joint and Progressive Subspace Analysis (JPSA) With Spatial-Spectral Manifold Alignment for Semisupervised Hyperspectral Dimensionality Reduction. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:3602-3615. [PMID: 33175688 DOI: 10.1109/tcyb.2020.3028931] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Conventional nonlinear subspace learning techniques (e.g., manifold learning) usually introduce drawbacks in explainability (explicit mapping), cost effectiveness (linearization), generalization capability (out-of-sample), and representability (spatial-spectral discrimination). To overcome these shortcomings, a novel linearized subspace analysis technique with spatial-spectral manifold alignment is developed for semisupervised hyperspectral dimensionality reduction (HDR), called joint and progressive subspace analysis (JPSA). JPSA learns a high-level, semantically meaningful, joint spatial-spectral feature representation from hyperspectral (HS) data by: 1) jointly learning latent subspaces and a linear classifier to find an effective projection direction favorable for classification; 2) progressively searching several intermediate states of subspaces to approach an optimal mapping from the original space to a potentially more discriminative subspace; and 3) spatially and spectrally aligning a manifold structure in each learned latent subspace in order to preserve the same or similar topological properties between the compressed data and the original data. A simple but effective classifier, nearest neighbor (NN), is explored as a potential application for validating the algorithm performance of different HDR approaches. Extensive experiments demonstrate the superiority and effectiveness of the proposed JPSA on two widely used HS datasets: Indian Pines (92.98%) and the University of Houston (86.09%), in comparison with previous state-of-the-art HDR methods. A demo of the underlying work (ECCV2018) is openly available at https://github.com/danfenghong/ECCV2018_J-Play.
|
48
|
A Novel Change Detection Approach Based on Spectral Unmixing from Stacked Multitemporal Remote Sensing Images with a Variability of Endmembers. REMOTE SENSING 2021. [DOI: 10.3390/rs13132550] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Owing to their high temporal repetition rates, medium/low spatial resolution remote sensing images are the main data source for change detection (CD). However, they contain a large number of mixed pixels, which makes adequately capturing details in the resulting thematic map challenging. Spectral unmixing (SU) is a potential solution to this problem, as it decomposes mixed pixels into a set of land-cover fractions. However, errors accumulate in the fractional difference images, leading to poor change detection results. Meanwhile, the spectral variability of endmembers and the heterogeneity of land-cover materials cannot be fully considered in the traditional framework. To solve these problems, a novel change detection approach with image stacking and dividing based on spectral unmixing while considering the variability of endmembers (CD_SDSUVE) is proposed in this paper. First, the remote sensing images at different times are stacked into a unified framework. Then, several patch images are produced by dividing the stacked images so that similar endmembers for each land cover can be completely extracted and compared. Finally, multiple endmember spectral mixture analysis (MESMA) is performed, and the abundance images are combined to produce the complete change detection thematic map. The proposed algorithm was implemented and compared with four relevant state-of-the-art methods on three experimental datasets, and the results confirmed that it effectively improves accuracy. On the simulated data, the overall accuracy (OA) and Kappa coefficient were 99.61% and 0.99. On the two real datasets, the best OA values were 93.26% and 80.85%, up to 14.88% and 13.42% higher than the worst results. The Kappa coefficients were consistent with the OA.
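The linear mixing model underlying SU-based change detection can be sketched with a two-endmember toy example: each pixel is modeled as a convex combination of endmember spectra, and change is measured on the estimated fractions rather than on raw radiance. This is only an illustration of the mixing model, assuming two fixed endmembers and a sum-to-one constraint; the actual CD_SDSUVE pipeline uses MESMA with per-patch endmember sets, and all spectra below are invented.

```python
# Toy linear spectral unmixing for change detection: estimate the fraction
# f of endmember e1 in a pixel under x ≈ f*e1 + (1-f)*e2 (least squares,
# closed form for two endmembers, clipped to [0, 1]).

def unmix(pixel, e1, e2):
    """Fraction of endmember e1 in a pixel under the two-endmember model."""
    d = [a - b for a, b in zip(e1, e2)]
    num = sum(di * (xi - bi) for di, xi, bi in zip(d, pixel, e2))
    den = sum(di * di for di in d)
    return max(0.0, min(1.0, num / den))

veg  = [0.05, 0.45, 0.50]   # toy endmember spectra (3 bands)
soil = [0.30, 0.35, 0.25]

pixel_t1 = [0.10, 0.43, 0.45]   # mostly vegetation at time 1
pixel_t2 = [0.25, 0.37, 0.30]   # mostly soil at time 2

f1 = unmix(pixel_t1, veg, soil)
f2 = unmix(pixel_t2, veg, soil)
change = abs(f1 - f2)           # sub-pixel change in vegetation fraction
print(round(f1, 2), round(f2, 2), round(change, 2))  # 0.8 0.2 0.6
```

Comparing fractions instead of raw pixel values is what lets a CD method report, e.g., "vegetation cover dropped from 80% to 20% inside this pixel" rather than a binary changed/unchanged label.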
|
49
|
Early Identification of Root Rot Disease by Using Hyperspectral Reflectance: The Case of Pathosystem Grapevine/Armillaria. REMOTE SENSING 2021. [DOI: 10.3390/rs13132436] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
The Armillaria genus represents one of the most common causes of chronic root rot disease in woody plants. Prompt recognition of diseased plants is crucial to control the pathogen. However, current disease detection methods are limited at the field scale, so an alternative approach is needed. In this study, we investigated the potential of hyperspectral techniques to distinguish fungus-infected from healthy plants of Vitis vinifera. We used the hyperspectral imaging sensor Specim-IQ to acquire leaf reflectance data of the Teroldego Rotaliano grapevine cultivar. We analyzed three different groups of plants: healthy, asymptomatic, and diseased. Highly significant differences were found in the near-infrared (NIR) spectral region, with a decreasing pattern from healthy to diseased plants attributable to changes in the leaf mesophyll. Asymptomatic plants were distinguished from the other groups by a lower reflectance in the red-edge spectrum (around 705 nm), ascribable to an accumulation of secondary metabolites involved in plant defense strategies. Further significant differences were observed at wavelengths close to 550 nm in diseased vs. asymptomatic plants. We evaluated several machine learning paradigms to differentiate the plant groups. The Naïve Bayes (NB) algorithm, combined with the most discriminant variables among vegetation indices and spectral narrow bands, provided the best results, with overall accuracies of 90% and 75% for healthy vs. diseased and healthy vs. asymptomatic plants, respectively. To our knowledge, this study represents the first report on the possibility of using hyperspectral data for root rot disease diagnosis in woody plants. Although further validation studies are required, the spectral reflectance technique, possibly implemented on unmanned aerial vehicles (UAVs), could be a promising tool for cost-effective, non-invasive Armillaria disease diagnosis and in-field mapping, contributing to a significant step forward in precision viticulture.
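The Naïve Bayes step above can be sketched with a toy Gaussian NB classifier on two spectral features (NIR and ~705 nm red-edge reflectance). The reflectance values below are invented for illustration and the feature choice is a simplification: the study selected discriminant variables among vegetation indices and narrow bands from Specim-IQ data.

```python
import math

# Toy Gaussian Naïve Bayes for healthy-vs-diseased discrimination from two
# spectral features [NIR, red-edge]. All training values are invented.

def fit(samples):
    """Per-feature mean and variance for one class (diagonal Gaussian)."""
    n = len(samples)
    cols = list(zip(*samples))
    means = [sum(c) / n for c in cols]
    varis = [sum((x - m) ** 2 for x in c) / n + 1e-9
             for c, m in zip(cols, means)]
    return means, varis

def log_likelihood(x, params):
    """Sum of per-feature Gaussian log-densities (the NB independence step)."""
    means, varis = params
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, varis))

healthy  = [[0.52, 0.11], [0.55, 0.12], [0.50, 0.10]]
diseased = [[0.38, 0.09], [0.35, 0.08], [0.40, 0.10]]
params = {"healthy": fit(healthy), "diseased": fit(diseased)}

def classify(x):
    """Pick the class whose Gaussian model best explains x (equal priors)."""
    return max(params, key=lambda c: log_likelihood(x, params[c]))

print(classify([0.53, 0.11]))  # healthy
print(classify([0.36, 0.08]))  # diseased
```

The per-feature independence assumption is what makes NB attractive with few labeled leaves per group: each band's mean and variance can be estimated from a handful of samples.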
|
50
|
Tree Height Growth Modelling Using LiDAR-Derived Topography Information. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2021. [DOI: 10.3390/ijgi10060419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The concepts of ecotopes and forest sites are used to describe the correlative complexes defined by landform, vegetation structure, forest stand characteristics, and the relationship between soil and physiography. Physically heterogeneous landscapes such as karst, which is characterized by abundant sinkholes and outcrops, exhibit diverse microtopography. Understanding the variation in tree growth across a heterogeneous topography is important for sustainable forest management. An R script for detailed stem analysis was used to reconstruct the height growth histories of individual trees. The results of this study reveal that the topographic factors influencing the height growth of silver fir trees can be detected within forest stands. Using topography modelling, we classified silver fir trees into groups with significant differences in height growth. This study provides a sound basis for the comparison of forest site differences and may be useful in the calibration of models for various tree species.
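The group comparison described above can be sketched minimally: given stem-analysis height series per tree, average them within each topographic class and compare the resulting curves. The group names and heights below are invented placeholders, not the study's silver fir data, and the real analysis was done in R with significance testing rather than a simple mean comparison.

```python
# Toy comparison of height-growth curves between topographic groups.
# Each inner list is one tree's reconstructed height (m) at 10-year steps.

growth = {
    "sinkhole": [[1.2, 4.0, 8.5], [1.0, 3.8, 8.0]],
    "ridge":    [[1.5, 5.5, 11.0], [1.4, 5.2, 10.5]],
}

def mean_curve(series):
    """Mean height at each age step across the trees in one group."""
    n = len(series)
    return [round(sum(h) / n, 2) for h in zip(*series)]

for group, series in growth.items():
    print(group, mean_curve(series))
# sinkhole [1.1, 3.9, 8.25]
# ridge [1.45, 5.35, 10.75]
```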
|