1
Wang Y, Wang Y, Khan ZA, Huang A, Sang J. Multi-level feature fusion networks for smoke recognition in remote sensing imagery. Neural Netw 2025; 184:107112. [PMID: 39793493 DOI: 10.1016/j.neunet.2024.107112]
Abstract
Smoke is a critical indicator of forest fires, often detectable before flames ignite. Accurate smoke identification in remote sensing images is vital for effective forest fire monitoring within Internet of Things (IoT) systems. However, existing detection methods frequently falter in complex real-world scenarios, where variable smoke shapes and sizes, intricate backgrounds, and smoke-like phenomena (e.g., clouds and haze) lead to missed detections and false alarms. To address these challenges, we propose the Multi-level Feature Fusion Network (MFFNet), a novel framework grounded in contrastive learning. MFFNet begins by extracting multi-scale features from remote sensing images using a pre-trained ConvNeXt model, capturing information across different levels of granularity to accommodate variations in smoke appearance. The Attention Feature Enhancement Module further refines these multi-scale features, enhancing fine-grained, discriminative attributes relevant to smoke detection. Subsequently, the Bilinear Feature Fusion Module combines these enriched features, effectively reducing background interference and improving the model's ability to distinguish smoke from visually similar phenomena. Finally, contrastive feature learning is employed to improve robustness against intra-class variations by focusing on unique regions within the smoke patterns. Evaluated on the benchmark dataset USTC_SmokeRS, MFFNet achieves an accuracy of 98.87%. Additionally, our model demonstrates a detection rate of 94.54% on the extended E_SmokeRS dataset, with a low false alarm rate of 3.30%. These results highlight the effectiveness of MFFNet in recognizing smoke in remote sensing images, surpassing existing methodologies. The code is accessible at https://github.com/WangYuPeng1/MFFNet.
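As a concrete (though purely illustrative) picture of the bilinear fusion step described above, the sketch below combines two spatially aligned feature maps by outer-product pooling; the module name, channel sizes, and projection dimension are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearFusion(nn.Module):
    """Fuses two feature maps by channel-wise outer-product (bilinear) pooling."""
    def __init__(self, c1, c2, out_dim):
        super().__init__()
        self.proj = nn.Linear(c1 * c2, out_dim)

    def forward(self, fa, fb):
        # fa: (B, c1, H, W), fb: (B, c2, H, W) -- assumed spatially aligned
        b, c1, h, w = fa.shape
        c2 = fb.shape[1]
        fa = fa.flatten(2)                                        # (B, c1, HW)
        fb = fb.flatten(2)                                        # (B, c2, HW)
        bilinear = torch.bmm(fa, fb.transpose(1, 2)) / (h * w)    # (B, c1, c2)
        bilinear = bilinear.flatten(1)
        bilinear = torch.sign(bilinear) * torch.sqrt(bilinear.abs() + 1e-8)  # signed sqrt
        bilinear = F.normalize(bilinear, dim=1)
        return self.proj(bilinear)

x_low = torch.randn(2, 96, 28, 28)    # e.g. an early backbone stage
x_high = torch.randn(2, 384, 28, 28)  # a deeper stage upsampled to the same size
fused = BilinearFusion(96, 384, 256)(x_low, x_high)
print(fused.shape)  # torch.Size([2, 256])
```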
Affiliation(s)
- Yupeng Wang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
- Yongli Wang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
- Zaki Ahmad Khan
- Department of Computer Science, University of Worcester, Worcester, UK.
- Anqi Huang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
- Jianghui Sang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
2
Chen Y, Yang X, Yan H, Liu J, Jiang J, Mao Z, Wang T. Chrysanthemum classification method integrating deep visual features from both the front and back sides. Front Plant Sci 2025; 15:1463113. [PMID: 39906232 PMCID: PMC11790631 DOI: 10.3389/fpls.2024.1463113]
Abstract
Introduction: Chrysanthemum morifolium Ramat (hereinafter referred to as Chrysanthemum) is one of the most beloved and economically valuable Chinese herbal crops, containing abundant medicinal ingredients and offering wide application prospects. Identifying the classification and origin of Chrysanthemum is therefore important for producers, consumers, and market regulators. Existing Chrysanthemum classification methods mostly rely on subjective visual identification, are time-consuming, and often require costly equipment. Methods: A novel method is proposed to identify Chrysanthemum classification accurately in a swift, non-invasive, and non-contact way. The proposed method is based on the fusion of deep visual features from both the front and back sides. Firstly, images of different Chrysanthemums are collected and labeled with origins and classifications. Secondly, background areas with little useful information are removed by image preprocessing. Thirdly, a two-stream feature extraction network is designed with two inputs, the preprocessed front and back Chrysanthemum images. Single-stream and cross-stream residual connections are incorporated to extend the receptive field of the network and fully fuse the features from both the front and back sides. Results: Experimental results demonstrate that the proposed method achieves an accuracy of 93.8%, outperforming existing methods and exhibiting superior stability. Discussion: The proposed method provides an effective and dependable solution for identifying Chrysanthemum classification and origin while offering practical benefits for quality assurance in production, consumer markets, and regulatory processes. Code and data are available at https://github.com/dart-into/CCMIFB.
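The two-stream design with cross-stream residual connections can be illustrated with a minimal PyTorch sketch; the layer widths, the simple additive cross-stream mixing, and the class count are assumptions for illustration rather than the authors' architecture (see the linked repository for that).

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class TwoStreamNet(nn.Module):
    """Two-stream sketch: front and back images, with cross-stream residual mixing."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.front1, self.back1 = conv_block(3, 32), conv_block(3, 32)
        self.front2, self.back2 = conv_block(32, 64), conv_block(32, 64)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64 * 2, num_classes)

    def forward(self, front, back):
        f, b = self.front1(front), self.back1(back)
        # cross-stream residual connection: each stream also sees the other side
        f2 = self.front2(f + b)
        b2 = self.back2(b + f)
        feat = torch.cat([self.pool(f2).flatten(1), self.pool(b2).flatten(1)], dim=1)
        return self.fc(feat)

logits = TwoStreamNet()(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```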
Affiliation(s)
- Yifan Chen
- School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing, Jiangsu, China
- Xichen Yang
- School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing, Jiangsu, China
- Hui Yan
- Nanjing University of Chinese Medicine, National and Local Collaborative Engineering Center of Chinese Medicinal Resources Industrialization and Formulae Innovative Medicine, Nanjing, China
- Jiangsu Collaborative Innovation Center of Chinese Medicinal Resources Industrialization, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Jia Liu
- College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Jian Jiang
- School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing, Jiangsu, China
- Zhongyuan Mao
- School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing, Jiangsu, China
- Tianshu Wang
- College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Jiangsu Province Engineering Research Center of Traditional Chinese Medicine (TCM) Intelligence Health Service, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
3
Liu Y, Li H, Hu C, Luo S, Luo Y, Chen CW. Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images. IEEE Trans Neural Netw Learn Syst 2025; 36:595-609. [PMID: 38261502 DOI: 10.1109/tnnls.2023.3336563]
Abstract
The task of instance segmentation in remote sensing images, aiming at performing per-pixel labeling of objects at the instance level, is of great importance for various civil applications. Despite previous successes, most existing instance segmentation methods designed for natural images encounter sharp performance degradations when they are directly applied to top-view remote sensing images. Through careful analysis, we observe that the challenges mainly come from the lack of discriminative object features due to severe scale variations, low contrasts, and clustered distributions. In order to address these problems, a novel context aggregation network (CATNet) is proposed to improve the feature extraction process. The proposed model exploits three lightweight plug-and-play modules, namely, dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE), to aggregate global visual context at the feature, spatial, and instance domains, respectively. DenseFPN is a multi-scale feature propagation module that establishes more flexible information flows by adopting interlevel residual connections, cross-level dense connections, and a feature reweighting strategy. Leveraging the attention mechanism, SCP further augments the features by aggregating global spatial context into local regions. For each instance, HRoIE adaptively generates RoI features for different downstream tasks. Extensive evaluations of the proposed scheme on the iSAID, DIOR, NWPU VHR-10, and HRSID datasets demonstrate that the proposed approach outperforms state-of-the-art methods at similar computational costs. Source code and pretrained models are available at https://github.com/yeliudev/CATNet.
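A rough sketch of the idea behind SCP, aggregating global spatial context with attention and injecting it back into every location, is shown below; this mirrors generic global-context attention rather than the exact CATNet module, and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class GlobalSpatialContext(nn.Module):
    """Sketch of aggregating global spatial context into local features via attention."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)   # spatial attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        weights = self.attn(x).flatten(2).softmax(dim=-1)            # (B, 1, HW)
        context = torch.bmm(x.flatten(2), weights.transpose(1, 2))   # (B, C, 1)
        context = self.transform(context.view(b, c, 1, 1))
        return x + context                                           # broadcast to every location

y = GlobalSpatialContext(256)(torch.randn(2, 256, 32, 32))
print(y.shape)  # torch.Size([2, 256, 32, 32])
```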
4
Zhang X, Xie W, Li Y, Lei J, Jiang K, Fang L, Du Q. Block-Wise Partner Learning for Model Compression. IEEE Trans Neural Netw Learn Syst 2024; 35:17582-17595. [PMID: 37656638 DOI: 10.1109/tnnls.2023.3306512]
Abstract
Despite the great potential of convolutional neural networks (CNNs) in various tasks, their resource-hungry nature greatly hinders wide deployment in cost-sensitive and low-powered scenarios, especially remote sensing applications. Existing model pruning approaches, implemented by a "subtraction" operation, impose a performance ceiling on the slimmed model. Self-knowledge distillation (Self-KD) resorts to auxiliary networks that are only active in the training phase for performance improvement. However, the transferred knowledge is holistic and coarse, and the learning-based knowledge transfer is indirect and lossy. Here, we propose a novel model-compression method, termed block-wise partner learning (BPL), which comprises "extension" and "fusion" operations and frees the compressed model from the constraints of the baseline. Different from Self-KD, the proposed BPL creates a partner for each block to enhance performance during training. For the model to absorb more diverse information, a diversity loss (DL) is designed to evaluate the difference between the original block and the partner. Besides, the partner is fused equivalently instead of being discarded directly. After training, we can simply adopt the fused compressed model, which contains the enhancement information of the partners but has fewer parameters and less inference cost. As validated on the UC Merced land-use, NWPU-RESISC45, and RSD46-WHU datasets, BPL demonstrates superiority over other model-compression approaches. For example, it attains a substantial floating-point operations (FLOPs) reduction of 73.97% with only a 0.24 accuracy (ACC.) loss for ResNet-50 on the UC Merced land-use dataset. The code is available at https://github.com/zhangxin-xd/BPL.
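The "extension then fusion" idea can be illustrated with a toy partnered convolution: a parallel partner is trained alongside the original 3x3 convolution and later merged into a single equivalent convolution. The cosine-similarity term below is only a placeholder for the paper's diversity loss, and the whole block is an assumption-laden sketch, not the released BPL code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartneredConv(nn.Module):
    """Training-time block: original 3x3 conv plus a 'partner' 3x3 conv in parallel."""
    def __init__(self, cin, cout):
        super().__init__()
        self.original = nn.Conv2d(cin, cout, 3, padding=1, bias=False)
        self.partner = nn.Conv2d(cin, cout, 3, padding=1, bias=False)

    def forward(self, x):
        yo, yp = self.original(x), self.partner(x)
        # placeholder diversity term: penalize similarity between the two outputs
        div_loss = F.cosine_similarity(yo.flatten(1), yp.flatten(1), dim=1).mean()
        return yo + yp, div_loss

    def fuse(self):
        """After training, merge the partner into a single conv with identical output."""
        fused = nn.Conv2d(self.original.in_channels, self.original.out_channels,
                          3, padding=1, bias=False)
        fused.weight.data = self.original.weight.data + self.partner.weight.data
        return fused

block = PartneredConv(16, 32)
x = torch.randn(2, 16, 8, 8)
y_train, div = block(x)
y_fused = block.fuse()(x)
print(torch.allclose(y_train, y_fused, atol=1e-5))  # True: the fusion is equivalent
```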
5
Bai J, Ren J, Xiao Z, Chen Z, Gao C, Ali TAA, Jiao L. Localizing From Classification: Self-Directed Weakly Supervised Object Localization for Remote Sensing Images. IEEE Trans Neural Netw Learn Syst 2024; 35:17935-17949. [PMID: 37672374 DOI: 10.1109/tnnls.2023.3309889]
Abstract
In recent years, object localization and detection methods in remote sensing images (RSIs) have received increasing attention due to their broad applications. However, most previous fully supervised methods require a large number of time-consuming and labor-intensive instance-level annotations. Compared with those fully supervised methods, weakly supervised object localization (WSOL) aims to recognize object instances using only image-level labels, which greatly saves the labeling costs of RSIs. In this article, we propose a self-directed weakly supervised strategy (SD-WSS) to perform WSOL in RSIs. Specifically, we fully exploit and enhance the spatial feature extraction capability of the RSI classification model to accurately localize the objects of interest. To alleviate the tendency of earlier WSOL methods to attend only to the most discriminative regions, the spatial location information implicit in the classification model is carefully extracted by GradCAM++ to guide the learning procedure. Furthermore, to eliminate interference from the complex backgrounds of RSIs, we design a novel self-directed loss that makes the model optimize itself and explicitly tells it where to look. Finally, we review and annotate existing remote sensing scene classification datasets and create two new WSOL benchmarks in RSIs, named C45V2 and PN2. We conduct extensive experiments to evaluate the proposed method and six mainstream WSOL methods with three backbones on C45V2 and PN2. The results demonstrate that our proposed method achieves better performance than the state-of-the-art.
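A minimal sketch of how a localization map can be pulled out of a pretrained classification model is given below; for brevity it uses plain Grad-CAM-style weighting rather than GradCAM++, and the backbone, class count, and hooks are illustrative assumptions, not the authors' pipeline.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=45).eval()   # 45 classes as a stand-in for a scene dataset
feats, grads = {}, {}
layer = model.layer4
layer.register_forward_hook(lambda m, i, o: feats.update(v=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)
logits = model(x)
logits[0, logits.argmax()].backward()     # backprop the top class score

# Grad-CAM-style map: channel weights from pooled gradients, then weighted feature sum
weights = grads["v"].mean(dim=(2, 3), keepdim=True)            # (1, C, 1, 1)
cam = F.relu((weights * feats["v"]).sum(dim=1, keepdim=True))  # (1, 1, 7, 7)
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalized localization map
print(cam.shape)  # torch.Size([1, 1, 224, 224])
```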
6
Xing C, Zhao J, Wang Z, Wang M. Deep Ring-Block-Wise Network for Hyperspectral Image Classification. IEEE Trans Neural Netw Learn Syst 2024; 35:14125-14137. [PMID: 37220048 DOI: 10.1109/tnnls.2023.3274745]
Abstract
Deep learning has achieved many successes in the field of hyperspectral image (HSI) classification. However, most existing deep learning-based methods do not consider the feature distribution, which may yield features that are poorly separable and weakly discriminative. From the perspective of spatial geometry, a desirable feature distribution should satisfy two properties: block and ring. Block means that, in the feature space, intraclass samples are close together while interclass samples are far apart. Ring means that the samples of all classes are overall distributed in a ring topology. Accordingly, in this article, we propose a novel deep ring-block-wise network (DRN) for HSI classification that takes full consideration of the feature distribution. To obtain a distribution conducive to high classification performance, a ring-block perception (RBP) layer is built in this DRN by integrating self-representation and a ring loss into a perception model. In this way, the exported features are constrained to satisfy both the block and ring requirements, making them more separably and discriminatively distributed than in traditional deep networks. Besides, we also design an optimization strategy with alternating updates to obtain the solution of this RBP layer model. Extensive results on the Salinas, Pavia Centre, Indian Pines, and Houston datasets demonstrate that the proposed DRN method achieves better classification performance than the state-of-the-art approaches.
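The ring property corresponds closely to the classic ring loss, which pulls every feature's L2 norm toward a learnable radius; a minimal version is sketched below (the self-representation term of the RBP layer is omitted, and the loss weighting is an arbitrary placeholder).

```python
import torch
import torch.nn as nn

class RingLoss(nn.Module):
    """Ring constraint: pull every feature's L2 norm towards a learnable radius R."""
    def __init__(self, radius_init=1.0, weight=0.01):
        super().__init__()
        self.radius = nn.Parameter(torch.tensor(radius_init))
        self.weight = weight

    def forward(self, features):
        norms = features.norm(p=2, dim=1)            # (B,) feature norms
        return self.weight * ((norms - self.radius) ** 2).mean()

feats = torch.randn(8, 128)                          # a batch of 128-dim features
loss = RingLoss()(feats)
print(loss.item())
```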
7
Zhou M, Zhou Y, Yang D, Song K. Remote Sensing Image Classification Based on Canny Operator Enhanced Edge Features. Sensors (Basel) 2024; 24:3912. [PMID: 38931695 PMCID: PMC11207323 DOI: 10.3390/s24123912]
Abstract
Remote sensing image classification plays a crucial role in the field of remote sensing interpretation. With the exponential growth of multi-source remote sensing data, accurately extracting target features and comprehending target attributes from complex images significantly affects classification accuracy. To address these challenges, we propose a Canny edge-enhanced multi-level attention feature fusion network (CAF) for remote sensing image classification. Specifically, the original image is fed into a convolutional network to extract global features, and increasing the depth of the convolutional layers facilitates feature extraction at various levels. Additionally, to emphasize detailed target features, we employ the Canny operator to extract edge information and utilize a convolutional layer to capture deep edge features. Finally, by leveraging the Attentional Feature Fusion (AFF) network, we fuse the global and detailed features to obtain more discriminative representations for scene classification tasks. The performance of the proposed method (CAF) is evaluated through experiments conducted on three openly accessible remote sensing scene classification datasets: NWPU-RESISC45, UCM, and MSTAR. The experimental findings indicate that our approach, which incorporates edge detail information, outperforms methods relying solely on global features.
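A rough sketch of the edge branch, Canny edge extraction followed by a small CNN for deep edge features, is given below; the thresholds, channel widths, and the way edges are batched are illustrative assumptions, and the AFF fusion with the global branch is not shown.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

class CannyEdgeBranch(nn.Module):
    """Sketch: extract Canny edges on the CPU, then learn deep edge features with a small CNN."""
    def __init__(self, out_channels=32, low=100, high=200):
        super().__init__()
        self.low, self.high = low, high
        self.conv = nn.Sequential(
            nn.Conv2d(1, out_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, images_uint8):
        # images_uint8: list of HxWx3 uint8 RGB images
        edges = [cv2.Canny(cv2.cvtColor(img, cv2.COLOR_RGB2GRAY), self.low, self.high)
                 for img in images_uint8]
        edges = torch.from_numpy(np.stack(edges)).float().unsqueeze(1) / 255.0  # (B, 1, H, W)
        return self.conv(edges)

imgs = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(2)]
edge_feats = CannyEdgeBranch()(imgs)
print(edge_feats.shape)  # torch.Size([2, 32, 64, 64])
```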
Affiliation(s)
- Kai Song
- College of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China
8
Li Z, Hu J, Wu K, Miao J, Zhao Z, Wu J. Local feature acquisition and global context understanding network for very high-resolution land cover classification. Sci Rep 2024; 14:12597. [PMID: 38824153 PMCID: PMC11144191 DOI: 10.1038/s41598-024-63363-7]
Abstract
Very high-resolution remote sensing images hold promise for ground observation tasks, paving the way for highly competitive image processing solutions for land cover classification. To address the challenges faced by convolutional neural networks (CNNs) in exploring contextual information in remote sensing land cover classification, and the limitations of the vision transformer (ViT) family in effectively capturing local details and spatial information, we propose a local feature acquisition and global context understanding network (LFAGCU). Specifically, we design a multidimensional and multichannel convolutional module to construct a local feature extractor aimed at capturing local information and spatial relationships within images. Simultaneously, we introduce a global feature learning module that utilizes multiple sets of multi-head attention mechanisms to model global semantic information, abstracting the overall feature representation of remote sensing images. Validation, comparative analyses, and ablation experiments conducted on three publicly available datasets of different scales demonstrate the effectiveness and generalization capability of the LFAGCU method, showing that it accurately locates category attribute information in remote sensing scenes and generalizes exceptionally well. Code is available at https://github.com/lzp-lkd/LFAGCU.
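The pairing of a convolutional local-feature extractor with multi-head self-attention for global context can be sketched as below; the depthwise-pointwise local block, head count, and residual wiring are assumptions for illustration, not the authors' LFAGCU modules.

```python
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    """Sketch: local features via convolution, global context via multi-head self-attention."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise: spatial detail
            nn.Conv2d(channels, channels, 1), nn.GELU())                   # pointwise: channel mixing
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        x = x + self.local(x)                                  # local feature acquisition
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))       # (B, HW, C)
        global_ctx, _ = self.attn(tokens, tokens, tokens)      # global context understanding
        return x + global_ctx.transpose(1, 2).reshape(b, c, h, w)

y = LocalGlobalBlock()(torch.randn(2, 64, 16, 16))
print(y.shape)  # torch.Size([2, 64, 16, 16])
```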
Affiliation(s)
- Zhengpeng Li
- School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, China
- Liaoning Province Key Laboratory of Intelligent Construction and Internet of Things Application Technologies, Anshan, China
- Jun Hu
- School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, China
- Liaoning Province Key Laboratory of Intelligent Construction and Internet of Things Application Technologies, Anshan, China
- Kunyang Wu
- College of Instrumentation and Electrical Engineering, Jilin University, Changchun, China
- National Geophysical Exploration Equipment Engineering Research Center, Jilin University, Changchun, China
- Key Laboratory of Geophysical Exploration Equipment, Ministry of Education of China (Jilin University), Changchun, China
- Jiawei Miao
- School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, China
- Liaoning Province Key Laboratory of Intelligent Construction and Internet of Things Application Technologies, Anshan, China
- Zixue Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, China
- Jiansheng Wu
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, China
9
Gao J, Jiao L, Liu X, Li L, Chen P, Liu F, Yang S. Multiscale Dynamic Curvelet Scattering Network. IEEE Trans Neural Netw Learn Syst 2024; 35:7999-8012. [PMID: 36427283 DOI: 10.1109/tnnls.2022.3223212]
Abstract
The feature representation learning process greatly determines the performance of networks in classification tasks. By combining multiscale geometric tools and networks, better representation and learning can be achieved. However, relatively fixed geometric features and multiscale structures are always used. In this article, we propose a more flexible framework called the multiscale dynamic curvelet scattering network (MSDCCN). This data-driven dynamic network is based on multiscale geometric prior knowledge. First, multiresolution scattering and multiscale curvelet features are efficiently aggregated in different levels. Then, these features can be reused in networks flexibly and dynamically, depending on the multiscale intervention flag. The initial value of this flag is based on the complexity assessment, and it is updated according to feature sparsity statistics on the pretrained model. With the multiscale dynamic reuse structure, the feature representation learning process can be improved in the following training process. Also, multistage fine-tuning can be performed to further improve the classification accuracy. Furthermore, a novel multiscale dynamic curvelet scattering module, which is more flexible, is developed to be further embedded into other networks. Extensive experimental results show that better classification accuracies can be achieved by MSDCCN. In addition, necessary evaluation experiments have been performed, including convergence analysis, insight analysis, and adaptability analysis.
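The multiscale intervention flag driven by feature-sparsity statistics might, in spirit, look like the toy check below; the statistic and threshold are placeholders, since the paper's complexity assessment and sparsity bookkeeping are more elaborate.

```python
import torch

def multiscale_intervention_flag(feature_map, sparsity_threshold=0.5):
    """Toggle multiscale feature reuse based on a simple sparsity statistic.

    Returns True (reuse the multiscale curvelet/scattering features) when activations
    are mostly near zero, i.e. the plain features alone look uninformative.
    """
    sparsity = (feature_map.abs() < 1e-3).float().mean().item()
    return sparsity > sparsity_threshold

feat = torch.relu(torch.randn(1, 64, 14, 14))   # ReLU output: roughly half zeros
print(multiscale_intervention_flag(feat, 0.4))  # True for this example
```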
10
Wan Y, Zhong Y, Ma A, Wang J, Zhang L. E2SCNet: Efficient Multiobjective Evolutionary Automatic Search for Remote Sensing Image Scene Classification Network Architecture. IEEE Trans Neural Netw Learn Syst 2024; 35:7752-7766. [PMID: 36395135 DOI: 10.1109/tnnls.2022.3220699]
Abstract
Remote sensing image scene classification methods based on deep learning have been widely studied and discussed. However, most of the network architectures are directly borrowed from natural image processing and are fixed. A few studies have focused on automatic search mechanisms, but they cannot balance interpretation accuracy against parameter quantity for practical applications. As a result, automatic global search methods based on multiobjective evolutionary computation have more advantages. However, in the ranking process, network individuals with large parameter quantities are easily eliminated even though they may reach higher accuracy after full training. In addition, evolutionary neural architecture search methods often take several days. In this article, in order to address these concerns, we propose an efficient multiobjective evolutionary automatic search framework for remote sensing image scene classification deep learning network architectures (E2SCNet). In E2SCNet, eight kinds of lightweight operators are used to build a diversified search space, and the coding connection mode is flexible. In the search process, a large-model retention mechanism is implemented through two-step multiobjective modeling and evolutionary search, where one step involves "parameter quantity and accuracy" and the other involves "parameter quantity and accuracy growth quantity." Moreover, a supernet is constructed to share weights during individual network evaluation and speed up the search. The effectiveness of E2SCNet is proven by comparison with several networks designed by human experts and networks obtained by gradient-based and evolutionary-computation-based search methods.
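The multiobjective selection step, keeping architectures that trade off parameter quantity against accuracy, boils down to non-dominated (Pareto) filtering; a toy version over a hypothetical population is sketched below, leaving out the second "accuracy growth" objective and the evolutionary operators.

```python
from typing import List, Tuple

def pareto_front(candidates: List[Tuple[str, float, float]]) -> List[str]:
    """Return architectures not dominated on the two objectives (min params, max accuracy)."""
    front = []
    for name, params, acc in candidates:
        dominated = any(p <= params and a >= acc and (p < params or a > acc)
                        for n, p, a in candidates if n != name)
        if not dominated:
            front.append(name)
    return front

# toy population: (name, parameters in millions, validation accuracy)
population = [("net-A", 1.2, 0.91), ("net-B", 3.5, 0.95),
              ("net-C", 3.6, 0.94), ("net-D", 0.8, 0.88)]
print(pareto_front(population))  # ['net-A', 'net-B', 'net-D']
```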
11
Muksimova S, Umirzakova S, Kang S, Cho YI. CerviLearnNet: Advancing cervical cancer diagnosis with reinforcement learning-enhanced convolutional networks. Heliyon 2024; 10:e29913. [PMID: 38694035 PMCID: PMC11061669 DOI: 10.1016/j.heliyon.2024.e29913]
Abstract
Women face many health problems throughout their lives; cervical cancer is one of the most dangerous diseases they can encounter, with many negative consequences. Regular screening and treatment of precancerous lesions play a vital role in the fight against cervical cancer. It is becoming increasingly common in medical practice to predict the early stages of serious illnesses, such as heart attacks, kidney failure, and cancer, using machine learning-based techniques. To overcome these obstacles, we propose the use of auxiliary modules and a special residual block to record contextual interactions between object classes and to support the object reference strategy. Unlike the latest state-of-the-art classification methods, we create a new architecture called the Reinforcement Learning Cancer Network, "RL-CancerNet", which diagnoses cervical cancer with remarkable accuracy. We trained and tested our method on two well-known publicly available datasets, SipaKMeD and Herlev, to assess it and enable comparisons with earlier methods. The cervical cancer images in these datasets are labeled, and the labels had to be assigned manually. Our study shows that, compared to previous approaches for classifying cervical cancer as an early cellular change, the proposed approach produces more reliable and stable results on datasets of vastly different sizes, indicating that it will also be effective for other datasets.
Affiliation(s)
- Shakhnoza Muksimova
- Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Gyeonggi-do, South Korea
- Sabina Umirzakova
- Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Gyeonggi-do, South Korea
- Seokwhan Kang
- Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Gyeonggi-do, South Korea
- Young Im Cho
- Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Gyeonggi-do, South Korea
12
Qu Q, Pan B, Xu X, Li T, Shi Z. Unmixing Guided Unsupervised Network for RGB Spectral Super-Resolution. IEEE Trans Image Process 2023; 32:4856-4867. [PMID: 37527312 DOI: 10.1109/tip.2023.3299197]
Abstract
Spectral super-resolution, which aims to generate hyperspectral images from RGB images, has recently attracted research attention. However, most existing spectral super-resolution algorithms work in a supervised manner, requiring paired data for training, which is difficult to obtain. In this paper, we propose an Unmixing Guided Unsupervised Network (UnGUN), which does not require paired imagery to achieve unsupervised spectral super-resolution. In addition, UnGUN utilizes arbitrary other hyperspectral imagery as a guidance image to guide the reconstruction of spectral information. UnGUN mainly comprises three branches: two unmixing branches and a reconstruction branch. The hyperspectral unmixing branch and the RGB unmixing branch decompose the guidance and RGB images into corresponding endmembers and abundances, respectively, from which the spectral and spatial priors are extracted. Meanwhile, the reconstruction branch integrates these spectral-spatial priors to generate a coarse hyperspectral image and then refines it. Besides, we design a discriminator to ensure that the distribution of the generated image is close to that of the guidance hyperspectral imagery, so that the reconstructed image follows the characteristics of a real hyperspectral image. The major contribution is that we develop an unsupervised framework based on spectral unmixing, which realizes spectral super-resolution without paired hyperspectral-RGB images. Experiments demonstrate the superiority of UnGUN when compared with some SOTA methods.
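The unmixing view rests on the linear mixing model, in which each pixel's spectrum is a weighted sum of endmember spectra; the toy reconstruction below shows how abundances and endmembers compose a hyperspectral cube, whereas in UnGUN both factors are produced by learned branches.

```python
import torch

# Linear mixing model: each pixel's spectrum = abundances @ endmembers
bands, n_endmembers, h, w = 31, 5, 8, 8
endmembers = torch.rand(n_endmembers, bands)                    # spectral prior (e.g. from guidance image)
abundances = torch.rand(h * w, n_endmembers)
abundances = abundances / abundances.sum(dim=1, keepdim=True)   # sum-to-one constraint

hsi = (abundances @ endmembers).view(h, w, bands)               # coarse hyperspectral reconstruction
print(hsi.shape)  # torch.Size([8, 8, 31])
```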
13
Huang G, Wang Y, Lv K, Jiang H, Huang W, Qi P, Song S. Glance and Focus Networks for Dynamic Visual Recognition. IEEE Trans Pattern Anal Mach Intell 2023; 45:4605-4621. [PMID: 35939472 DOI: 10.1109/tpami.2022.3196959]
Abstract
Spatial redundancy widely exists in visual recognition tasks, i.e., discriminative features in an image or video frame usually correspond to only a subset of pixels, while the remaining regions are irrelevant to the task at hand. Therefore, static models that process all pixels with an equal amount of computation incur considerable redundancy in terms of time and space consumption. In this paper, we formulate the image recognition problem as a sequential coarse-to-fine feature learning process, mimicking the human visual system. Specifically, the proposed Glance and Focus Network (GFNet) first extracts a quick global representation of the input image at a low resolution scale, and then strategically attends to a series of salient (small) regions to learn finer features. The sequential process naturally facilitates adaptive inference at test time, as it can be terminated once the model is sufficiently confident about its prediction, avoiding further redundant computation. It is worth noting that the problem of locating discriminant regions in our model is formulated as a reinforcement learning task, thus requiring no additional manual annotations other than classification labels. GFNet is general and flexible as it is compatible with any off-the-shelf backbone models (such as MobileNets, EfficientNets and TSM), which can be conveniently deployed as the feature extractor. Extensive experiments on a variety of image classification and video recognition tasks and with various backbone models demonstrate the remarkable efficiency of our method. For example, it reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by a factor of 1.3 without sacrificing accuracy. Code and pre-trained models are available at https://github.com/blackfeather-wang/GFNet-Pytorch.
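The adaptive, confidence-gated inference loop can be sketched as below; the toy model, the fixed crop list, and the additive logit fusion are stand-ins for GFNet's glance/focus policy and recurrent feature fusion, so this only illustrates the early-exit mechanism.

```python
import torch
import torch.nn.functional as F

def glance_and_focus(model, glance_input, focus_crops, threshold=0.9):
    """Sequential inference sketch: stop as soon as the prediction is confident enough."""
    logits = model(glance_input)                      # cheap low-resolution 'glance'
    for crop in focus_crops:                          # progressively 'focus' on salient regions
        if F.softmax(logits, dim=1).max().item() >= threshold:
            break                                     # early exit: already confident
        logits = logits + model(crop)                 # stand-in for GFNet's recurrent fusion
    return logits.argmax(dim=1)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 96 * 96, 10))
pred = glance_and_focus(model,
                        torch.randn(1, 3, 96, 96),
                        [torch.randn(1, 3, 96, 96) for _ in range(3)])
print(pred)
```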
14
Ning H, Lei T, An M, Sun H, Hu Z, Nandi AK. Scale-wise interaction fusion and knowledge distillation network for aerial scene recognition. CAAI Trans Intell Technol 2023. [DOI: 10.1049/cit2.12208]
Affiliation(s)
- Hailong Ning
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Tao Lei
- School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, China
- Mengyuan An
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Hao Sun
- School of Computer, Central China Normal University, Wuhan, China
- Zhanxuan Hu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Asoke K. Nandi
- Department of Electronic and Electrical Engineering, Brunel University London, London, UK
- Xi'an Jiaotong University, Xi'an, China
15
CDTNet: Improved Image Classification Method Using Standard, Dilated and Transposed Convolutions. Appl Sci (Basel) 2022. [DOI: 10.3390/app12125984]
Abstract
Convolutional neural networks (CNNs) have achieved great success in image classification tasks. In a convolutional operation, a larger input area can capture more context information. Stacking several convolutional layers enlarges the receptive field, but this increases the number of parameters. Most CNN models use pooling layers to extract important features, but pooling operations cause information loss. Transposed convolution can increase the spatial size of the feature maps to recover the lost low-resolution information. In this study, we used two branches with different dilation rates to obtain features at different scales. Dilated convolution can capture richer information, and the outputs from the two branches are concatenated together as input for the next block. The small feature maps of the top blocks are upsampled by transposed convolution to increase their spatial size and recover low-resolution prediction maps. We evaluated the model on three image classification benchmark datasets (CIFAR-10, SVHN, and FMNIST) against four state-of-the-art models, namely, VGG16, VGG19, ResNeXt, and DenseNet. The experimental results show that CDTNet achieved lower loss, higher accuracy, and faster convergence in both the training and test stages. The average test accuracy of CDTNet increased by up to 54.81% on SVHN with VGG19 and by at least 1.28% on FMNIST with VGG16, which shows that CDTNet has better performance and strong generalization ability, as well as fewer parameters.
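A minimal sketch of the dual dilated branches followed by a transposed-convolution upsample is given below; channel widths and kernel settings are illustrative assumptions rather than the CDTNet configuration.

```python
import torch
import torch.nn as nn

class DilatedTransposedBlock(nn.Module):
    """Sketch: two parallel dilated-conv branches, concatenation, then a transposed-conv upsample."""
    def __init__(self, cin=32, cout=64):
        super().__init__()
        self.branch1 = nn.Conv2d(cin, cout, 3, padding=1, dilation=1)   # small receptive field
        self.branch2 = nn.Conv2d(cin, cout, 3, padding=2, dilation=2)   # larger receptive field, same output size
        self.up = nn.ConvTranspose2d(2 * cout, cout, kernel_size=2, stride=2)  # recover spatial size

    def forward(self, x):
        merged = torch.cat([self.branch1(x), self.branch2(x)], dim=1)
        return self.up(merged)

y = DilatedTransposedBlock()(torch.randn(2, 32, 8, 8))
print(y.shape)  # torch.Size([2, 64, 16, 16])
```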
16
DMH-FSL: Dual-Modal Hypergraph for Few-Shot Learning. Neural Process Lett 2022. [DOI: 10.1007/s11063-021-10684-7]
17
Soleymanpour S, Sadr H, Nazari Soleimandarabi M. CSCNN: Cost-Sensitive Convolutional Neural Network for Encrypted Traffic Classification. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10534-6]