1
|
Liu C, Li B, Shi M, Chen X, Ye Q, Ji X. Explicit Margin Equilibrium for Few-Shot Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:8072-8084. [PMID: 38980785 DOI: 10.1109/tnnls.2024.3422216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]
Abstract
Under low data regimes, few-shot object detection (FSOD) transfers related knowledge from base classes with sufficient annotations to novel classes with limited samples in a two-step paradigm, including base training and balanced fine-tuning. In base training, the learned embedding space needs to be dispersed with large class margins to facilitate novel class accommodation and avoid feature aliasing, whereas in balanced fine-tuning it needs to be properly concentrated with small margins to represent novel classes precisely. Although the dilemma between discrimination and representation has stimulated substantial progress, the equilibrium of class margins within the embedding space is still being actively explored. In this study, we propose a class margin optimization scheme, termed explicit margin equilibrium (EME), by explicitly leveraging the quantified relationship between base and novel classes. EME first maximizes base-class margins to reserve adequate space to prepare for novel class adaptation. During fine-tuning, it quantifies the interclass semantic relationships by calculating the equilibrium coefficients based on the assumption that novel instances can be represented by linear combinations of base-class prototypes. EME finally reweights margin loss using equilibrium coefficients to adapt base knowledge for novel instance learning with the help of instance disturbance (ID) augmentation. As a plug-and-play module, EME can also be applied to few-shot classification. Consistent performance gains upon various baseline methods and benchmarks validate the generality and efficacy of EME. The code is available at github.com/Bohao-Lee/EME.
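The assumption that a novel instance can be written as a linear combination of base-class prototypes can be illustrated with a small sketch. The abstract does not say how EME computes its equilibrium coefficients, so the least-squares fit below is only an assumption, and the names `combination_coefficients`, `base_prototypes`, and `novel_instance` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("prototypes are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def combination_coefficients(prototypes, instance):
    """Least-squares coefficients a minimizing ||instance - sum_k a_k * prototypes[k]||^2.

    Assumption: a plain least-squares fit via the normal equations,
    not the coefficient computation used in the EME paper.
    """
    gram = [[dot(p, q) for q in prototypes] for p in prototypes]
    rhs = [dot(p, instance) for p in prototypes]
    return solve_linear_system(gram, rhs)

# Toy data: the novel instance is exactly 0.3 * prototype 0 + 0.7 * prototype 1.
base_prototypes = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
novel_instance = [1.0, 0.7, 0.0]
coefficients = combination_coefficients(base_prototypes, novel_instance)
```

In this toy case the recovered coefficients match the mixing weights used to build the instance; coefficients of this kind could then serve as per-class weights in a margin loss.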
|
2
|
Fu M, Wang X, Wang J, Yi Z. Prototype Bayesian Meta-Learning for Few-Shot Image Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:7010-7024. [PMID: 38837923 DOI: 10.1109/tnnls.2024.3403865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2024]
Abstract
Meta-learning aims to leverage prior knowledge from related tasks to enable a base learner to quickly adapt to new tasks with limited labeled samples. However, traditional meta-learning methods have limitations as they provide an optimal initialization for all new tasks, disregarding the inherent uncertainty induced by few-shot tasks and impeding task-specific self-adaptation initialization. In response to this challenge, this article proposes a novel probabilistic meta-learning approach called prototype Bayesian meta-learning (PBML). PBML focuses on meta-learning variational posteriors within a Bayesian framework, guided by prototype-conditioned prior information. Specifically, to capture model uncertainty, PBML treats both meta- and task-specific parameters as random variables and integrates their posterior estimates into hierarchical Bayesian modeling through variational inference (VI). During model inference, PBML employs Laplacian estimation to approximate the integral term over the likelihood loss, deriving a rigorous upper-bound for generalization errors. To enhance the model's expressiveness and enable task-specific adaptive initialization, PBML proposes a data-driven approach to model the task-specific variational posteriors. This is achieved by designing a generative model structure that incorporates prototype-conditioned task-dependent priors into the random generation of task-specific variational posteriors. Additionally, by performing latent embedding optimization, PBML decouples the gradient-based meta-learning from the high-dimensional variational parameter space. Experimental results on benchmark datasets for few-shot image classification illustrate that PBML attains state-of-the-art or competitive performance when compared to other related works. Versatility studies demonstrate the adaptability and applicability of PBML in addressing diverse and challenging few-shot tasks. Furthermore, ablation studies validate the performance gains attributed to the inference and model components.
|
3
|
Lu J, Xiao C, Zhang C. Meta-Modulation: A General Learning Framework for Cross-Task Adaptation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:6407-6421. [PMID: 38837924 DOI: 10.1109/tnnls.2024.3405938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2024]
Abstract
Building learning systems possessing adaptive flexibility to different tasks is critical and challenging. In this article, we propose a novel and general meta-learning framework, called meta-modulation (MeMo), to foster the adaptation capability of a base learner across different tasks where only a few training data are available per task. For one independent task, MeMo proceeds like a "feedback regulation system," which achieves an adaptive modulation on the so-called definitive embeddings of query data to maximize the corresponding task objective. Specifically, we devise a type of efficient feedback information, definitive embedding feedback (DEF), to mathematize and quantify the unsuitability between the few training data and the base learner as well as the promising adjustment direction to reduce this unsuitability. The DEFs are encoded into high-level representation and temporarily stored as task-specific modulator templates by a modulation encoder. For coming query data, we develop an attention mechanism acting upon these modulator templates and combine both task/data-level modulation to generate the final data-specific meta-modulator. This meta-modulator is then used to modulate the query's embedding for correct decision-making. Our framework is scalable for various base learner models like multi-layer perceptron (MLP), long short-term memory (LSTM), convolutional neural network (CNN), and transformer, and applicable to different learning problems like language modeling and image recognition. Experimental results on a 2-D point synthetic dataset and various benchmarks in language and vision domains demonstrate the effectiveness and competitiveness of our framework.
|
4
|
Lai L, Chen J, Zhang Z, Lin G, Wu Q. CMFAN: Cross-Modal Feature Alignment Network for Few-Shot Single-View 3D Reconstruction. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:5522-5534. [PMID: 38593016 DOI: 10.1109/tnnls.2024.3383039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/11/2024]
Abstract
Few-shot single-view 3D reconstruction learns to reconstruct novel-category objects based on a query image and a few support shapes. However, since the query image and the support shapes are of different modalities, there is an inherent feature misalignment problem that damages the reconstruction. Previous works in the literature do not consider this problem. To this end, we propose the cross-modal feature alignment network (CMFAN) with two novel techniques. One is a strategy for model pretraining, namely, cross-modal contrastive learning (CMCL), where the 2D images and 3D shapes of the same objects compose the positives, and those from different objects form the negatives. With CMCL, the model learns to embed the 2D and 3D modalities of the same object into a tight area in the feature space and push away those from different objects, thus effectively aligning the global cross-modal features. The other is cross-modal feature fusion (CMFF), which further aligns and fuses the local features. Specifically, it first re-represents the local features with the cross-attention operation, making the local features share more information. Then, CMFF generates a descriptor for the support features and attaches it to each local feature vector of the query image with dense concatenation. Moreover, CMFF can be applied to multilevel local features and brings further advantages. We conduct extensive experiments to evaluate the effectiveness of our designs, and CMFAN sets new state-of-the-art performance in all of the 1-/10-/25-shot tasks of ShapeNet and ModelNet datasets.
|
5
|
Wang K, Fei X, Su L, Fang T, Shen H. Auxiliary meta-learning strategy for cancer recognition: leveraging external data and optimized feature mapping. BMC Cancer 2025; 25:367. [PMID: 40016648 PMCID: PMC11869438 DOI: 10.1186/s12885-025-13740-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2025] [Accepted: 02/14/2025] [Indexed: 03/01/2025] Open
Abstract
As reported by the International Agency for Research on Cancer (IARC), the global incidence of cancer reached nearly 20 million new cases in recent years, with cancer-related fatalities amounting to around 9.7 million. This underscores the profound impact cancer has on public health worldwide. Deep learning has become a mainstream approach in cancer recognition. Despite its significant progress, deep learning is known for its requirement of large quantities of labeled data. Few-shot learning addresses this limitation by reducing the need for extensive labeled samples. In the field of cancer recognition, data collection is particularly challenging due to the scarcity of categories compared to other fields, and current few-shot learning methods have not yielded satisfactory results. To tackle this, we propose an auxiliary meta-learning strategy for cancer recognition. During the auxiliary training phase, the feature mapping model is trained in conjunction with external data. This process neutralizes the prediction probability of misclassification, allowing the model to more readily learn distinguishing features and avoid performance degradation caused by discrepancies in external data. Additionally, the redundancy of some input principal components in the feature mapping model is reduced, while the implicit information within these components is extracted. The training process is further accelerated by utilizing depthwise over-parameterized convolutional layers. Moreover, the implementation of a three-branch structure contributes to faster training and enhanced performance. In the meta-training stage, the feature mapping model is optimized within the embedding space, utilizing category prototypes and cosine distance. During the meta-testing phase, a small number of labeled samples are employed to classify unknown data. We have conducted extensive experiments on the BreakHis, Pap smear, and ISIC 2018 datasets. The results demonstrate that our method achieves superior accuracy in cancer recognition. Furthermore, experiments on few-shot benchmark datasets indicate that our approach exhibits excellent generalization capabilities.
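The meta-training and meta-testing steps described above (category prototypes compared by cosine distance) follow the general prototype-based classification pattern, which can be sketched as below. This is a generic illustration rather than the authors' implementation; the two-dimensional embeddings, the class labels, and the function names are made up for the example.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def build_prototypes(support):
    """support maps each class label to the embeddings of its few labeled samples."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}

def classify(query, prototypes):
    """Return the label whose prototype is closest to the query in cosine distance."""
    return min(prototypes, key=lambda label: cosine_distance(query, prototypes[label]))

# Hypothetical 2-shot support set with toy embeddings.
support = {
    "benign": [[1.0, 0.1], [0.9, 0.2]],
    "malignant": [[0.1, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
prediction = classify([0.8, 0.3], prototypes)
```

Here the query embedding points in nearly the same direction as the "benign" prototype, so that label is returned. In a real pipeline the embeddings would come from the trained feature mapping model.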
Affiliation(s)
- Kang Wang, Xihong Fei, Lei Su, Tian Fang, Hao Shen
- Key Laboratory of Multidisciplinary Management and Control of Complex Systems of Anhui Higher Education Institutes, Anhui University of Technology, Ma'anshan, 243032, Anhui, China
- School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan, 243032, Anhui, China
|
6
|
Kohler M, Eisenbach M, Gross HM. Few-Shot Object Detection: A Comprehensive Survey. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11958-11978. [PMID: 37067965 DOI: 10.1109/tnnls.2023.3265051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Humans are able to learn to recognize new objects even from a few examples. In contrast, training deep-learning-based object detectors requires huge amounts of annotated data. To avoid the need to acquire and annotate these huge amounts of data, few-shot object detection (FSOD) aims to learn from few object instances of new categories in the target domain. In this survey, we provide an overview of the state of the art in FSOD. We categorize approaches according to their training scheme and architectural layout. For each type of approach, we describe the general realization as well as concepts to improve the performance on novel categories. Whenever appropriate, we give short takeaways regarding these concepts in order to highlight the best ideas. Finally, we introduce commonly used datasets and their evaluation protocols and analyze the reported benchmark results. As a result, we emphasize common challenges in evaluation and identify the most promising current trends in this emerging field of FSOD.
|
7
|
Cheng H, Wang Y, Li H, Kot AC, Wen B. Disentangled Feature Representation for Few-Shot Image Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:10422-10435. [PMID: 37027772 DOI: 10.1109/tnnls.2023.3241919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Learning a generalizable feature representation is critical to few-shot image classification. While recent works exploited task-specific feature embedding using meta-tasks for few-shot learning, they are limited in many challenging tasks because they are distracted by excursive features such as the background, domain, and style of the image samples. In this work, we propose a novel disentangled feature representation framework, dubbed DFR, for few-shot learning applications. DFR can adaptively decouple the discriminative features that are modeled by the classification branch from the class-irrelevant component of the variation branch. In general, most of the popular deep few-shot learning methods can be plugged in as the classification branch; thus, DFR can boost their performance on various few-shot tasks. Furthermore, we propose a novel FS-DomainNet dataset based on DomainNet for benchmarking few-shot domain generalization (DG) tasks. We conducted extensive experiments to evaluate the proposed DFR on general, fine-grained, and cross-domain few-shot classification, as well as few-shot DG, using the corresponding four benchmarks, i.e., mini-ImageNet, tiered-ImageNet, Caltech-UCSD Birds 200-2011 (CUB), and the proposed FS-DomainNet. Thanks to the effective feature disentangling, the DFR-based few-shot classifiers achieved state-of-the-art results on all datasets.
|
8
|
Zhou Z, Luo L, Zhou S, Li W, Yang X, Liu X, Zhu E. Task-Related Saliency for Few-Shot Image Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:10751-10763. [PMID: 37027620 DOI: 10.1109/tnnls.2023.3243903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
A weakness of existing metric-based few-shot classification methods is that task-unrelated objects or backgrounds may mislead the model, since the small number of samples in the support set is insufficient to reveal the task-related targets. An essential cue from human cognition in the few-shot classification task is that humans can recognize the task-related targets at a glimpse of the support images without being distracted by task-unrelated things. Thus, we propose to explicitly learn task-related saliency features and make use of them in the metric-based few-shot learning schema. We divide the tackling of the task into three phases, namely, modeling, analyzing, and matching. In the modeling phase, we introduce a saliency sensitive module (SSM), which is an inexact supervision task jointly trained with a standard multiclass classification task. SSM not only enhances the fine-grained representation of feature embedding but also can locate the task-related saliency features. Meanwhile, we propose a self-training-based task-related saliency network (TRSN), which is a lightweight network to distill task-related salience produced by SSM. In the analyzing phase, we freeze TRSN and use it to handle novel tasks. TRSN extracts task-relevant features while suppressing the disturbing task-unrelated features. We, therefore, can discriminate samples accurately in the matching phase by strengthening the task-related features. We conduct extensive experiments on five-way 1-shot and 5-shot settings to evaluate the proposed method. Results show that our method achieves a consistent performance gain on benchmarks and achieves state-of-the-art performance.
|
9
|
Han M, Zhan Y, Luo Y, Du B, Hu H, Wen Y, Tao D. Not All Instances Contribute Equally: Instance-Adaptive Class Representation Learning for Few-Shot Visual Recognition. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:5447-5460. [PMID: 36136920 DOI: 10.1109/tnnls.2022.3204684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances. Many few-shot visual recognition methods adopt the metric-based meta-learning paradigm by comparing the query representation with class representations to predict the category of the query instance. However, current metric-based methods generally treat all instances equally and consequently often obtain a biased class representation, because not all instances are equally significant when summarizing the instance-level representations into the class-level representation. For example, some instances may contain unrepresentative information, such as too much background and information of unrelated concepts, which skews the results. To address the above issues, we propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition. Specifically, we develop an adaptive instance revaluing network (AIRN) with the capability to address the biased representation issue when generating the class representation, by learning and assigning adaptive weights for different instances according to their relative significance in the support set of the corresponding class. In addition, we design an improved bilinear instance representation and incorporate two novel structural losses, i.e., intraclass instance clustering loss and interclass representation distinguishing loss, to further regulate the instance revaluation process and refine the class representation. We conduct extensive experiments on four commonly adopted few-shot benchmarks: miniImageNet, tieredImageNet, CIFAR-FS, and FC100 datasets. The experimental results compared with the state-of-the-art approaches demonstrate the superiority of our ICRL-Net.
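The idea of weighting support instances unequally when forming a class representation can be illustrated with a toy sketch. In ICRL-Net the weights are learned by AIRN; the stand-in below instead derives them from a softmax over each instance's cosine similarity to the plain class mean, which is purely an assumption for illustration, as are the function names and the `temperature` parameter.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def adaptive_class_representation(instances, temperature=0.1):
    """Weighted mean of support instances.

    Hypothetical weighting rule (not AIRN): softmax over the similarity of each
    instance to the unweighted mean, so atypical instances contribute less.
    """
    mean = [sum(col) / len(instances) for col in zip(*instances)]
    scores = [cosine_similarity(v, mean) / temperature for v in instances]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    representation = [
        sum(w * v[i] for w, v in zip(weights, instances)) for i in range(len(mean))
    ]
    return representation, weights

# Two typical instances and one outlier (e.g. mostly background).
support_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
class_representation, instance_weights = adaptive_class_representation(support_embeddings)
```

With these toy embeddings the outlier receives the smallest weight, so the class representation stays close to the two typical instances instead of being pulled toward the outlier as a plain mean would be.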
|
10
|
Zhao Y, Yu G, Wang J, Domeniconi C, Guo M, Zhang X, Cui L. Personalized Federated Few-Shot Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:2534-2544. [PMID: 35862332 DOI: 10.1109/tnnls.2022.3190359] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Personalized federated learning (PFL) learns a personalized model for each client in a decentralized manner, where each client owns private data that are not shared and data among clients are non-independent and identically distributed (non-i.i.d.). However, existing PFL solutions assume that clients have sufficient training samples to jointly induce personalized models. Thus, existing PFL solutions cannot perform well in a few-shot scenario, where most or all clients only have a handful of samples for training. Furthermore, existing few-shot learning (FSL) approaches typically need centralized training data; as such, these FSL methods are not applicable in decentralized scenarios. How to enable PFL with limited training samples per client is a practical but understudied problem. In this article, we propose a solution called personalized federated few-shot learning (pFedFSL) to tackle this problem. Specifically, pFedFSL learns a personalized and discriminative feature space for each client by identifying which models perform well on which clients, without exposing local data of clients to the server and other clients, and which clients should be selected for collaboration with the target client. In the learned feature spaces, each sample is made closer to samples of the same category and farther away from samples of different categories. Experimental results on four benchmark datasets demonstrate that pFedFSL outperforms competitive baselines across different settings.
|
11
|
Wang Z, Liu L, Duan Y, Tao D. SIN: Semantic Inference Network for Few-Shot Streaming Label Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:9952-9965. [PMID: 35507625 DOI: 10.1109/tnnls.2022.3162747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Streaming label learning aims to model newly emerged labels for multilabel classification systems, which requires plenty of new label data for training. However, in changing environments, only a small amount of new label data can practically be collected. In this work, we formulate and study few-shot streaming label learning (FSLL), which models emerging new labels with only a few annotated examples by utilizing the knowledge learned from past labels. We propose a meta-learning framework, semantic inference network (SIN), which can learn and infer the semantic correlation between new labels and past labels to adapt to FSLL tasks from a few examples effectively. SIN leverages label semantic representation to regularize the output space and acquires labelwise meta-knowledge based on gradient-based meta-learning. Moreover, SIN incorporates a novel label decision module with a meta-threshold loss to find the optimal confidence thresholds for each new label. Theoretically, we illustrate that the proposed semantic inference mechanism could constrain the complexity of the hypothesis space to reduce the risk of overfitting and achieve better generalizability. Experimentally, extensive empirical results and ablation studies demonstrate that the performance of SIN is superior to the prior state-of-the-art methods on FSLL.
|
12
|
NC$^2$E: boosting few-shot learning with novel class center estimation. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08080-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/09/2022]
|
13
|
Yang B, Wan F, Liu C, Li B, Ji X, Ye Q. Part-Based Semantic Transform for Few-Shot Semantic Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:7141-7152. [PMID: 34101605 DOI: 10.1109/tnnls.2021.3084252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Few-shot semantic segmentation remains an open problem for the lack of an effective method to handle the semantic misalignment between objects. In this article, we propose part-based semantic transform (PST), which aims at aligning object semantics in support images with those in query images by semantic decomposition-and-match. The semantic decomposition process is implemented with prototype mixture models (PMMs), which use an expectation-maximization (EM) algorithm to decompose object semantics into multiple prototypes corresponding to object parts. The semantic match between prototypes is performed with a min-cost flow module, which encourages correct correspondence while depressing mismatches between object parts. With semantic decomposition-and-match, PST enforces the network's tolerance to objects' appearance and/or pose variation and facilitates channelwise and spatial semantic activation of objects in query images. Extensive experiments on Pascal VOC and MS-COCO datasets show that PST significantly improves upon state-of-the-art methods. In particular, on MS-COCO, it improves the performance of five-shot semantic segmentation by up to 7.79% with a moderate cost of inference speed and model size. Code for PST is released at https://github.com/Yang-Bob/PST.
|
14
|
Tian P, Li W, Gao Y. Consistent Meta-Regularization for Better Meta-Knowledge in Few-Shot Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:7277-7288. [PMID: 34106865 DOI: 10.1109/tnnls.2021.3084733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Recently, meta-learning has provided a powerful paradigm to deal with the few-shot learning problem. However, existing meta-learning approaches ignore the prior fact that good meta-knowledge should alleviate the data inconsistency between training and test data, caused by the extremely limited data, in each few-shot learning task. Moreover, legitimately utilizing the prior understanding of meta-knowledge can lead us to design an efficient method to improve the meta-learning model. Under this circumstance, we consider the data inconsistency from the distribution perspective, making it convenient to bring in the prior fact, and propose a new consistent meta-regularization (Con-MetaReg) to help the meta-learning model learn how to reduce the data-distribution discrepancy between the training and test data. In this way, the ability of meta-knowledge to keep the training and test data consistent is enhanced, and the performance of the meta-learning model can be further improved. Extensive analyses and experiments demonstrate that our method can indeed improve the performance of different meta-learning models in few-shot regression, classification, and fine-grained classification.
|
15
|
Ma Y, Bai S, Liu W, Wang S, Yu Y, Bai X, Liu X, Wang M. Transductive Relation-Propagation With Decoupling Training for Few-Shot Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:6652-6664. [PMID: 34138714 DOI: 10.1109/tnnls.2021.3082928] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Few-shot learning, aiming to learn novel concepts from one or a few labeled examples, is an interesting and very challenging problem with many practical advantages. Existing few-shot methods usually utilize data of the same classes to train the feature embedding module and the few-shot module in a row, and are therefore unable to learn to adapt to new tasks. Besides, traditional few-shot models fail to take advantage of the valuable relations of the support-query pairs, leading to performance degradation. In this article, we propose a transductive relation-propagation graph neural network (GNN) with a decoupling training strategy (TRPN-D) to explicitly model and propagate such relations across support-query pairs, and to empower the few-shot module with the ability to transfer past knowledge to new tasks via the decoupling training. Our few-shot module, namely TRPN, treats the relation of each support-query pair as a graph node, named relational node, and resorts to the known relations between support samples, including both intraclass commonality and interclass uniqueness. Through relation propagation, the model can generate discriminative relation embeddings for support-query pairs. To the best of our knowledge, this is the first work that decouples the training of the embedding network and the few-shot graph module with different tasks, which might offer a new way to solve the few-shot learning problem. Extensive experiments conducted on several benchmark datasets demonstrate that our method can significantly outperform a variety of state-of-the-art few-shot learning methods.
|
16
|
Zhang X, Wei Y, Li Z, Yan C, Yang Y. Rich Embedding Features for One-Shot Semantic Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:6484-6493. [PMID: 34161244 DOI: 10.1109/tnnls.2021.3081693] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
One-shot semantic segmentation poses the challenging task of segmenting object regions from unseen categories with only one annotated example as guidance. Thus, how to effectively construct robust feature representations from the guidance image is crucial to the success of one-shot semantic segmentation. To this end, we propose in this article a simple, yet effective approach named rich embedding features (REFs). Given a reference image accompanied by its annotated mask, our REF constructs rich embedding features of the support object from three perspectives: 1) global embedding to capture the general characteristics; 2) peak embedding to capture the most discriminative information; and 3) adaptive embedding to capture the internal long-range dependencies. By combining these informative features, we can easily harvest sufficient and rich guidance even from a single reference image. In addition to REF, we further propose a simple depth-priority context module to obtain useful contextual cues from the query image. This successfully raises the performance of one-shot semantic segmentation to a new level. We conduct experiments on pattern analysis, statistical modeling and computational learning (Pascal) visual object classes (VOC) 2012 and common objects in context (COCO) to demonstrate the effectiveness of our approach.
|
17
|
Hou R, Chen J, He S, Li F, Zhou Z. Prototype augmented network with metric-mixed under limited samples for mechanical intelligent fault recognition. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
18
|
Liu L, Wang B, Kuang Z, Xue JH, Chen Y, Yang W, Liao Q, Zhang W. GenDet: Meta Learning to Generate Detectors From Few Shots. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:3448-3460. [PMID: 33523819 DOI: 10.1109/tnnls.2021.3053005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Object detection has made enormous progress and has been widely used in many applications. However, it performs poorly when only limited training data is available for novel classes that the model has never seen before. Most existing approaches solve few-shot detection tasks implicitly without directly modeling the detectors for novel classes. In this article, we propose GenDet, a new meta-learning-based framework that can effectively generate object detectors for novel classes from few shots and, thus, conducts few-shot detection tasks explicitly. The detector generator is trained by numerous few-shot detection tasks sampled from base classes each with sufficient samples, and thus, it is expected to generalize well on novel classes. An adaptive pooling module is further introduced to suppress distracting samples and aggregate the detectors generated from multiple shots. Moreover, we propose to train a reference detector for each base class in the conventional way, with which to guide the training of the detector generator. The reference detectors and the detector generator can be trained simultaneously. Finally, the generated detectors of different classes are encouraged to be orthogonal to each other for better generalization. The proposed approach is extensively evaluated on the ImageNet, VOC, and COCO data sets under various few-shot detection settings, and it achieves new state-of-the-art results.
|
19
|
TDDA-Net: A transitive distant domain adaptation network for industrial sample enhancement. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.05.109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
|
20
|
Hu Y, Chapman A, Wen G, Hall DW. What Can Knowledge Bring to Machine Learning?—A Survey of Low-shot Learning for Structured Data. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3510030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Supervised machine learning has several drawbacks that make it difficult to use in many situations. Drawbacks include heavy reliance on massive training data, limited generalizability, and poor expressiveness of high-level semantics. Low-shot learning attempts to address these drawbacks. Low-shot learning allows the model to obtain good predictive power with very little or no training data, where structured knowledge plays a key role as a high-level semantic representation of human. This article reviews the fundamental factors of low-shot learning technologies, with a focus on the operation of structured knowledge under different low-shot conditions. We also introduce other techniques relevant to low-shot learning. Finally, we point out the limitations of low-shot learning, the prospects and gaps of industrial applications, and future research directions.
Affiliation(s)
- Yang Hu
- University of Southampton, United Kingdom and South China University of Technology, Guangzhou, Guangdong, China
| | - Adriane Chapman
- University of Southampton, Southampton, Hampshire, United Kingdom
| | - Guihua Wen
- South China University of Technology, Guangzhou, Guangdong, China
| | - Dame Wendy Hall
- University of Southampton, Southampton, Hampshire, United Kingdom
| |
|
21
|
Cheng J, Hao F, Liu L, Tao D. Imposing Semantic Consistency of Local Descriptors for Few-Shot Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:1587-1600. [PMID: 35073265 DOI: 10.1109/tip.2022.3143692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Few-shot learning suffers from the scarcity of labeled training data. Regarding local descriptors of an image as representations for the image could greatly augment existing labeled training data. Existing local descriptor based few-shot learning methods have taken advantage of this fact but ignore that the semantics exhibited by local descriptors may not be relevant to the image semantics. In this paper, we deal with this issue from the new perspective of imposing semantic consistency on the local descriptors of an image. Our proposed method consists of three modules. The first is a local descriptor extractor module, which can extract a large number of local descriptors in a single forward pass. The second is a local descriptor compensator module, which compensates the local descriptors with the image-level representation in order to align the semantics of the local descriptors with the image semantics. The third is a local descriptor based contrastive loss function, which supervises the learning of the whole pipeline, with the aim of making the semantics carried by the local descriptors of an image relevant to and consistent with the image semantics. Theoretical analysis demonstrates the generalization ability of our proposed method. Comprehensive experiments conducted on benchmark datasets indicate that our proposed method achieves semantic consistency of local descriptors and state-of-the-art performance.
|
22
|
Zheng W, Yan L, Gou C, Zhang Z, Zhang JJ, Hu M, Wang F. Learning to learn by yourself: Unsupervised meta-learning with self-knowledge distillation for COVID-19 diagnosis from pneumonia cases. INT J INTELL SYST 2021; 36:4033-4064. [PMID: 38607826 PMCID: PMC8242586 DOI: 10.1002/int.22449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/26/2020] [Revised: 03/16/2021] [Accepted: 04/09/2021] [Indexed: 12/15/2022]
Abstract
The goal of diagnosing the coronavirus disease 2019 (COVID-19) from suspected pneumonia cases, that is, recognizing COVID-19 from chest X-ray or computed tomography (CT) images, is to improve diagnostic accuracy, leading to faster intervention. The most important and challenging problem here is to design an effective and robust diagnosis model. To this end, there are three challenges to overcome: (1) The lack of training samples limits the success of existing deep-learning-based methods. (2) Many public COVID-19 data sets contain only a few images without fine-grained labels. (3) Due to the explosive growth of suspected cases, it is urgent and important to diagnose not only COVID-19 cases but also cases of other types of pneumonia whose symptoms are similar to those of COVID-19. To address these issues, we propose a novel framework called Unsupervised Meta-Learning with Self-Knowledge Distillation for differentiating COVID-19 from pneumonia cases. During training, our model cannot use any true labels and aims to gain the ability of learning to learn by itself. In particular, we first present a deep diagnosis model based on a relation network to capture and memorize the relation among different images. Second, to enhance the performance of our model, we design a self-knowledge distillation mechanism that distills knowledge within our model itself. Our network is divided into several parts, and the knowledge in the deeper parts is squeezed into the shallow ones. The final results are derived from our model by learning to compare the features of images. Experimental results demonstrate that our approach achieves significantly higher performance than other state-of-the-art methods. Moreover, we construct a new COVID-19 pneumonia data set based on text mining, consisting of 2696 COVID-19 images (347 X-ray + 2349 CT), 10,155 images (9661 X-ray + 494 CT) of other types of pneumonia, and fine-grained labels for all of them. Our data set considers not only a bacterial infection or viral infection which causes pneumonia but also a viral infection derived from the influenza virus or coronavirus.
Affiliation(s)
- Wenbo Zheng
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, China
- The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
| | - Lan Yan
- The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| | - Chao Gou
- School of Intelligent Systems Engineering, Sun Yat‐sen University, Guangzhou, China
| | - Zhi‐Cheng Zhang
- Seventh Medical Center, General Hospital of People's Liberation Army, Beijing, China
| | - Jun J. Zhang
- The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Electrical Engineering and Automation, Wuhan University, Wuhan, China
| | - Ming Hu
- Intensive Care Unit, Wuhan Pulmonary Hospital, Wuhan, China
| | - Fei‐Yue Wang
- The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
| |
|