1
Zhang J, Hu Y, Zhang X, Chen M, Wang Z. Saccade and purify: Task adapted multi-view feature calibration network for few shot learning. Neural Netw 2025; 188:107482. [PMID: 40305990 DOI: 10.1016/j.neunet.2025.107482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 04/09/2025] [Accepted: 04/10/2025] [Indexed: 05/02/2025]
Abstract
Current few-shot image classification methods encounter challenges in extracting multi-view features that can complement each other and selecting optimal features for classification in a specific task. To address this problem, we propose a novel Task-adapted Multi-view feature Calibration Network (TMCN) inspired by the different saccade patterns observed in the human visual system. The TMCN is designed to "saccade" for extracting complementary multi-view features and "purify" multi-view features in a task-adapted manner. To capture more representative features, we propose a multi-view feature extraction method that simulates the voluntary saccades and scanning saccades in the human visual system, which generates global, local grid, and randomly sampled multi-view features. To purify and obtain the most appropriate features, we employ a global-local feature calibration module to calibrate global and local grid features to achieve more stable non-local image features. Furthermore, a sampling feature fusion method is proposed to fuse the randomly sampled features from classes to obtain better prototypes, and a multi-view feature calibrating module is proposed to adaptively fuse purified multi-view features based on the task information obtained from the task feature extracting module. Extensive experiments conducted on three widely used public datasets show that our proposed TMCN can achieve excellent performance and surpass state-of-the-art methods. The code is available at the following address: https://github.com/huyunzuo/TMCN.
Affiliation(s)
- Jing Zhang
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Yunzuo Hu
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Xinzhou Zhang
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Mingzhe Chen
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Zhe Wang
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
2
Yang X, Kong D, Wang N, Gao X. Hyperbolic Insights With Knowledge Distillation for Cross-Domain Few-Shot Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; 34:1921-1933. [PMID: 40126970 DOI: 10.1109/tip.2025.3551647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2025]
Abstract
Cross-domain few-shot learning aims to achieve swift generalization between a source domain and a target domain using a limited number of images. Current research predominantly relies on generalized feature embeddings, employing metric classifiers in Euclidean space for classification. However, due to existing disparities among different data domains, attaining generalized features in the embedding becomes challenging. Additionally, the rise in data domains leads to high-dimensional Euclidean spaces. To address the above problems, we introduce a cross-domain few-shot learning method named Hyperbolic Insights with Knowledge Distillation (HIKD). By integrating knowledge distillation, it enhances the model's generalization performance, thereby significantly improving task performance. Hyperbolic space, in comparison to Euclidean space, offers a larger capacity and supports the learning of hierarchical structures among images, which can aid generalized learning across different data domains. We therefore map the Euclidean space features to hyperbolic space via hyperbolic embedding and use a hyperbolic fitting distillation method in the meta-training phase to obtain a unified multi-domain generalization representation. In the meta-testing phase, accounting for biases between the source and target domains, we present a hyperbolic adaptive module to adjust embedded features and eliminate the inter-domain gap. Experiments on the Meta-Dataset demonstrate that HIKD outperforms state-of-the-art methods with an average accuracy of 80.6%.
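The abstract does not say which hyperbolic model or curvature HIKD uses, so the following is only a minimal sketch of the general idea of mapping Euclidean features into hyperbolic space and measuring distances there. It assumes the Poincaré ball model with curvature -c; the function names `exp_map_origin` and `poincare_distance` are illustrative and do not come from the paper.

```python
import math


def exp_map_origin(v, c=1.0):
    """Map a Euclidean (tangent) vector at the origin into the Poincare ball of curvature -c."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # the origin maps to itself
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points that lie strictly inside the ball."""
    sq_norm = lambda u: sum(a * a for a in u)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


# Assumed toy feature vector; a real model would use a backbone embedding.
embedded = exp_map_origin([0.3, 0.4])
distance_to_origin = poincare_distance([0.0, 0.0], embedded)
```

With c = 1, the distance from the origin to an embedded point is twice the Euclidean norm of the input vector, which is a convenient sanity check on the two formulas. A metric classifier in this space would assign a query to the class prototype with the smallest `poincare_distance`.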
3
Wei W, Wei P, Liao Z, Qin J, Cheng X, Liu M, Zheng N. Semantic Consistency Reasoning for 3-D Object Detection in Point Clouds. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:3356-3369. [PMID: 38113156 DOI: 10.1109/tnnls.2023.3341097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]
Abstract
Point cloud-based 3-D object detection is a significant and critical issue in numerous applications. While most existing methods attempt to capitalize on the geometric characteristics of point clouds, they neglect the internal semantic properties of points and the consistency between the semantic and geometric clues. We introduce a semantic consistency (SC) mechanism for 3-D object detection in this article, by reasoning about the semantic relations between 3-D object boxes and their internal points. This mechanism is based on a natural principle: the semantic category of a 3-D bounding box should be consistent with the categories of all points within the box. Driven by the SC mechanism, we propose a novel SC network (SCNet) to detect 3-D objects from point clouds. Specifically, the SCNet is composed of a feature extraction module, a detection decision module, and a semantic segmentation module. In inference, the feature extraction and the detection decision modules are used to detect 3-D objects. In training, the semantic segmentation module is jointly trained with the other two modules to produce more robust and applicable model parameters. The performance is greatly boosted through reasoning about the relations between the output 3-D object boxes and segmented points. The proposed SC mechanism is model-agnostic and can be integrated into other base 3-D object detection models. We test the proposed model on three challenging indoor and outdoor benchmark datasets: ScanNetV2, SUN RGB-D, and KITTI. Furthermore, to validate the universality of the SC mechanism, we implement it in three different 3-D object detectors. The experiments show that the performance is impressively improved and the extensive ablation studies also demonstrate the effectiveness of the proposed model.
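The stated principle, that a box's category should agree with the categories of the points inside it, can be illustrated with a small check. This is a sketch under simplifying assumptions, not SCNet's training objective: boxes are taken to be axis-aligned, labels are plain strings, and the function name `box_point_consistency` is hypothetical.

```python
def box_point_consistency(box_min, box_max, box_label, points, point_labels):
    """Fraction of points inside an axis-aligned box whose label equals the box label."""
    inside = [
        label
        for point, label in zip(points, point_labels)
        if all(lo <= coord <= hi for coord, lo, hi in zip(point, box_min, box_max))
    ]
    if not inside:
        return None  # no points fall in the box, so consistency is undefined
    return sum(label == box_label for label in inside) / len(inside)


# Assumed toy scene: three points fall inside the box, two of them labelled "chair".
points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0), (0.5, 0.5, 0.5)]
labels = ["chair", "chair", "table", "table"]
score = box_point_consistency((0.0, 0.0, 0.0), (2.0, 2.0, 2.0), "chair", points, labels)
```

A score of 1.0 means every enclosed point agrees with the box category; lower values flag boxes whose predicted category conflicts with the segmentation output.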
4
Yang L, Zhao H, Li H, Qiao L, Yang Z, Li X. GCSTG: Generating Class-confusion-aware Samples with a Tree-structure Graph for Few-shot Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; PP:772-784. [PMID: 40031273 DOI: 10.1109/tip.2025.3530792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Few-Shot Object Detection (FSOD) aims to detect the objects of novel classes using only a few manually annotated samples. With the few novel class samples, learning the inter-class relationships among foreground and constructing the corresponding class hierarchy in FSOD is a challenging task. The poor construction of the class hierarchy will result in the inter-class confusion problem, which has been identified as a primary cause of inferior performance in novel classes by recent FSOD methods. In this work, we further find that the intra-super-class confusion, where samples are misclassified as classes within their associated super-classes, is the main challenge in solving the confusion problem. To solve this issue, this work generates class-confusion-aware samples with a pre-defined tree-structure graph, to help models construct a precise class hierarchy. Specifically, to generate class-confusion-aware samples, we add noise to available samples and update the noise to maximize confidence scores on associated confusion categories of samples. Then, a confusion-aware curriculum learning strategy is proposed to make generated samples gradually participate in the training, which benefits the model convergence while learning the generated samples. Experimental results show that our method can be used as a plug-in in recent FSOD methods and consistently improves the model performance.
5
Guo Y, Du R, Sain A, Liang K, Dong Y, Song YZ, Ma Z. Understanding Episode Hardness in Few-Shot Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:616-633. [PMID: 39378258 DOI: 10.1109/tpami.2024.3476075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]
Abstract
Achieving generalization for deep learning models has usually suffered from the bottleneck of annotated sample scarcity. As a common way of tackling this issue, few-shot learning focuses on "episodes", i.e., sampled tasks that help the model acquire knowledge that generalizes to unseen categories: the better the episodes, the higher a model's generalisability. Despite extensive research, the characteristics of episodes and their potential effects are relatively less explored. A recent paper discussed that different episodes exhibit different prediction difficulties, and coined a new metric "hardness" to quantify episodes, which however is too wide-range for an arbitrary dataset and thus remains impractical for realistic applications. In this paper, therefore, we conduct for the first time an algebraic analysis of the critical factors influencing episode hardness, supported by experimental demonstrations, which reveals that episode hardness largely depends on the classes within an episode; importantly, we propose an efficient pre-sampling hardness assessment technique named Inverse-Fisher Discriminant Ratio (IFDR). This enables sampling hard episodes at the class level via a class-level (CL) sampling scheme that drastically decreases quantification cost. Delving deeper, we also develop a variant called class-pair-level (CPL) sampling, which further reduces the sampling cost while guaranteeing the sampled distribution. Finally, comprehensive experiments conducted on benchmark datasets verify the efficacy of our proposed method.
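The abstract names IFDR but does not give its formula. As a rough illustration only, the sketch below inverts the textbook two-class Fisher discriminant ratio (between-class separation over within-class scatter), so that larger values indicate class pairs that are harder to tell apart. The paper's exact definition may differ; all function names and the toy data are assumptions.

```python
def class_mean(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


def within_class_scatter(samples, mean):
    """Average squared Euclidean distance of the samples to their class mean."""
    total = sum(sum((x - m) ** 2 for x, m in zip(sample, mean)) for sample in samples)
    return total / len(samples)


def inverse_fisher_ratio(class_a, class_b):
    """Within-class scatter divided by between-class separation (higher means harder)."""
    mean_a, mean_b = class_mean(class_a), class_mean(class_b)
    between = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    within = within_class_scatter(class_a, mean_a) + within_class_scatter(class_b, mean_b)
    if between == 0.0:
        return float("inf")  # identical class means: the pair cannot be separated
    return within / between


# Assumed toy features: the second pair of classes sits much closer together.
far_pair = inverse_fisher_ratio([[0.0, 0.0], [2.0, 0.0]], [[10.0, 0.0], [12.0, 0.0]])
near_pair = inverse_fisher_ratio([[0.0, 0.0], [2.0, 0.0]], [[3.0, 0.0], [5.0, 0.0]])
```

Because the score depends only on per-class statistics, it can be computed once per class pair before any episode is drawn, which is in the spirit of the pre-sampling assessment the abstract describes.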
6
Wu J, Chang D, Sain A, Li X, Ma Z, Cao J, Guo J, Song YZ. Bi-Directional Ensemble Feature Reconstruction Network for Few-Shot Fine-Grained Classification. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:6082-6096. [PMID: 38478433 DOI: 10.1109/tpami.2024.3376686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
The main challenge for fine-grained few-shot image classification is to learn feature representations with higher inter-class and lower intra-class variations, with only a few labelled samples. Conventional few-shot learning methods however cannot be naively adopted for this fine-grained setting - a quick pilot study reveals that they in fact push for the opposite (i.e., lower inter-class variations and higher intra-class variations). To alleviate this problem, prior works predominantly use a support set to reconstruct the query image and then utilize metric learning to determine its category. Upon careful inspection, we further reveal that such unidirectional reconstruction methods only help to increase inter-class variations and are not effective in tackling intra-class variations. In this paper, we introduce a bi-reconstruction mechanism that can simultaneously accommodate inter-class and intra-class variations. In addition to using the support set to reconstruct the query set for increasing inter-class variations, we further use the query set to reconstruct the support set for reducing intra-class variations. This design effectively helps the model to explore more subtle and discriminative features, which is key for the fine-grained problem at hand. Furthermore, we also construct a self-reconstruction module to work alongside the bi-directional module to make the features even more discriminative. We introduce the snapshot ensemble method in the episodic learning strategy - a simple trick to further improve model performance without increasing training costs. Experimental results on three widely used fine-grained image classification datasets, as well as general and cross-domain few-shot image datasets, consistently show considerable improvements compared with other methods.
7
Cheng H, Wang Y, Li H, Kot AC, Wen B. Disentangled Feature Representation for Few-Shot Image Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:10422-10435. [PMID: 37027772 DOI: 10.1109/tnnls.2023.3241919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Learning a generalizable feature representation is critical to few-shot image classification. While recent works exploited task-specific feature embedding using meta-tasks for few-shot learning, they are limited in many challenging tasks because they are distracted by excursive features such as the background, domain, and style of the image samples. In this work, we propose a novel disentangled feature representation framework, dubbed DFR, for few-shot learning applications. DFR can adaptively decouple the discriminative features that are modeled by the classification branch, from the class-irrelevant component of the variation branch. In general, most of the popular deep few-shot learning methods can be plugged in as the classification branch, thus DFR can boost their performance on various few-shot tasks. Furthermore, we propose a novel FS-DomainNet dataset based on DomainNet, for benchmarking the few-shot domain generalization (DG) tasks. We conducted extensive experiments to evaluate the proposed DFR on general, fine-grained, and cross-domain few-shot classification, as well as few-shot DG, using the corresponding four benchmarks, i.e., mini-ImageNet, tiered-ImageNet, Caltech-UCSD Birds 200-2011 (CUB), and the proposed FS-DomainNet. Thanks to the effective feature disentangling, the DFR-based few-shot classifiers achieved state-of-the-art results on all datasets.
8
Wu Z, Zhang X, Li F, Wang S, Li J. TransRender: a transformer-based boundary rendering segmentation network for stroke lesions. Front Neurosci 2023; 17:1259677. [PMID: 37901438 PMCID: PMC10601640 DOI: 10.3389/fnins.2023.1259677] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/17/2023] [Accepted: 09/26/2023] [Indexed: 10/31/2023] [Open Access]
Abstract
Vision transformer architectures attract widespread interest due to their robust representation capabilities of global features. Methods that use a transformer as the encoder achieve superior performance compared to convolutional neural networks and other popular networks in many segmentation tasks for medical images. Due to the complex structure of the brain and the similar grayscale values of healthy tissue and lesions, lesion segmentation suffers from over-smooth boundaries or inaccurate segmentation. Existing methods, including the transformer, utilize stacked convolutional layers as the decoder to uniformly treat each pixel as a grid, which is convenient for feature computation. However, they often neglect the high-frequency features of the boundary and focus excessively on the region features. We propose an effective method for lesion boundary rendering called TransRender, which adaptively selects a series of important points to compute the boundary features in a point-based rendering way. The transformer-based method is selected to capture global information during the encoding stage. Several renders efficiently map the encoded features of different levels to the original spatial resolution by combining global and local features. Furthermore, the point-based function is employed to supervise the render module generating points, so that TransRender can continuously refine the uncertainty region. We conducted substantial experiments on different stroke lesion segmentation datasets to prove the efficiency of TransRender. Several evaluation metrics illustrate that our method can automatically segment the stroke lesion with relatively high accuracy and low computational complexity.
Affiliation(s)
- Zelin Wu
  - College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan, China
- Xueying Zhang
  - College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan, China
- Fenglian Li
  - College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan, China
- Suzhe Wang
  - College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan, China
- Jiaying Li
  - The First Clinical Medical College, Shanxi Medical University, Taiyuan, China
9
Zhou Z, Luo L, Liao Q, Liu X, Zhu E. Improving Embedding Generalization in Few-Shot Learning With Instance Neighbor Constraints. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:5197-5208. [PMID: 37669186 DOI: 10.1109/tip.2023.3310329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]
Abstract
Recently, metric-based meta-learning methods have been effectively applied to few-shot image classification. These methods classify images based on the relationship between samples in an embedding space, avoiding over-fitting that can occur when training classifiers with limited samples. However, finding an embedding space with good generalization properties remains a challenge. Our work highlights that having an initial manifold space that preserves sample neighbor relationships can prevent the metric model from reaching a suboptimal solution. We propose a feature learning method that leverages Instance Neighbor Constraints (INC). This theory is thoroughly evaluated and analyzed through experiments, demonstrating its effectiveness in improving the efficiency of learning and the overall performance of the model. We further integrate the INC into an alternate optimization training framework (AOT) that leverages both batch learning and episode learning to better optimize the metric-based model. We conduct extensive experiments on 5-way 1-shot and 5-way 5-shot settings on four popular few-shot image benchmarks: miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100), and Caltech-UCSD Birds-200-2011 (CUB). Results show that our method achieves consistent performance gains on benchmarks and state-of-the-art performance. Our findings suggest that initializing the embedding space appropriately and leveraging both batch and episode learning can significantly improve few-shot learning performance.
10
Pan S, Yan H, Liu Z, Chen N, Miao Y, Hou Y. Automatic pavement texture recognition using lightweight few-shot learning. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2023; 381:20220166. [PMID: 37454689 DOI: 10.1098/rsta.2022.0166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 05/08/2023] [Indexed: 07/18/2023]
Abstract
Texture is a crucial characteristic of roads, closely related to their performance. The recognition of pavement texture is of great significance for road maintenance professionals to detect potential safety hazards and carry out necessary countermeasures. Although deep learning models have been applied for recognition, the scarcity of data has always been a limitation. To address this issue, this paper proposes a few-shot learning model based on the Siamese network for pavement texture recognition with a limited dataset. The model achieved 89.8% accuracy in a four-way five-shot task classifying the pavement textures of dense asphalt concrete, micro surface, open-graded friction course and stone matrix asphalt. To align with engineering practice, global average pooling (GAP) and one-dimensional convolution are implemented, creating lightweight models that save storage and training time. Comparative experiments show that the lightweight model with GAP implemented on dense layers and one-dimensional convolution on convolutional layers reduced storage volume by 94% and training time by 99%, despite a 2.9% decrease in classification accuracy. Moreover, the model with only GAP implemented on dense layers achieved the highest accuracy at 93.5%, while reducing storage volume and training time by 83% and 6%, respectively. This article is part of the theme issue 'Artificial intelligence in failure analysis of transportation infrastructure and materials'.
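To make concrete why global average pooling shrinks a model, the sketch below counts the weights of the first dense layer when it follows a flattened feature map versus a globally pooled one. The feature-map shape (8 x 8 x 64) and the 128-unit layer are assumed values chosen for illustration, not the architecture from the paper, so the resulting percentage is not comparable to the reported 94% and 83% storage reductions.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def flatten_vs_gap(height, width, channels, units):
    """Parameter counts of the first dense layer after flattening versus after GAP."""
    flattened = dense_params(height * width * channels, units)
    pooled = dense_params(channels, units)  # GAP leaves one value per channel
    reduction = 1.0 - pooled / flattened
    return flattened, pooled, reduction


# Assumed shapes: an 8 x 8 x 64 feature map feeding a 128-unit dense layer.
flattened, pooled, reduction = flatten_vs_gap(8, 8, 64, 128)
```

The dense layer's input width drops from height x width x channels to channels alone, so the saving grows with the spatial size of the final feature map.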
Affiliation(s)
- Shuo Pan
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, People's Republic of China
- Hai Yan
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, People's Republic of China
- Zhuo Liu
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, People's Republic of China
- Ning Chen
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, People's Republic of China
- Yinghao Miao
  - National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
- Yue Hou
  - Department of Civil Engineering, Faculty of Science and Engineering, Swansea University, Swansea SA1 8EN, UK
11
Fatlawi HK, Kiss A. Similarity-Based Adaptive Window for Improving Classification of Epileptic Seizures with Imbalance EEG Data Stream. ENTROPY (BASEL, SWITZERLAND) 2022; 24:e24111641. [PMID: 36421496 PMCID: PMC9689083 DOI: 10.3390/e24111641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2022] [Revised: 11/06/2022] [Accepted: 11/08/2022] [Indexed: 06/12/2023]
Abstract
Data stream mining techniques have recently received increasing research interest, especially in medical data classification. An unbalanced representation of the classification's targets in these data is a common challenge because classification techniques are biased toward the major class. Many methods have attempted to address this problem but have been excessively biased toward the minor class. In this work, we propose a method for balancing the presence of the minor class within the current window of the data stream while preserving the data's original majority as much as possible. The proposed method utilized similarity analysis for selecting specific instances from the previous window. This group of minor-class instances was then added to the current window's instances. Implementing the proposed method using the Siena dataset showed promising results compared to the Skew ensemble method and some other research methods.
Affiliation(s)
- Hayder K. Fatlawi
  - Department of Information Systems, ELTE Eötvös Loránd University, 1117 Budapest, Hungary
  - Center of Information Technology Research and Development, University of Kufa, Najaf 540011, Iraq
- Attila Kiss
  - Department of Information Systems, ELTE Eötvös Loránd University, 1117 Budapest, Hungary
  - Department of Informatics, J. Selye University, 94501 Komárno, Slovakia