1. Wei XS, Yu HT, Xu A, Zhang F, Peng Y. MECOM: A Meta-Completion Network for Fine-Grained Recognition With Incomplete Multi-Modalities. IEEE Transactions on Image Processing 2024; 33:3456-3469. PMID: 38787666. DOI: 10.1109/tip.2024.3403051.
Abstract
Our work focuses on tackling the problem of fine-grained recognition with incomplete multi-modal data, which has been overlooked by previous work in the literature. For such a practical problem, it is desirable not only to capture fine-grained patterns of objects but also to alleviate the challenges posed by missing modalities. In this paper, we propose to leverage a meta-learning strategy to learn model abilities of both fast modal adaptation and, more importantly, missing modality completion across a variety of incomplete multi-modality learning tasks. Based on that, we develop a meta-completion method, termed MECOM, to perform multi-modal fusion and explicit missing modality completion via our proposed cross-modal attention and decoupling reconstruction. To further improve fine-grained recognition accuracy, an additional partial stream (as a counterpart of MECOM's main, holistic stream) and a part-level feature selection mechanism (corresponding to fine-grained objects' parts) are designed, tailored to the fine-grained setting to capture discriminative but subtle part-level patterns. Comprehensive quantitative and qualitative experiments, as well as various ablation studies, on two fine-grained multi-modal datasets and one generic multi-modal dataset show our superiority over competing methods. Our code is open-source and available at https://github.com/SEU-VIPGroup/MECOM.
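Below is a minimal PyTorch sketch of the cross-modal attention idea described in this abstract, i.e., letting one modality attend to another so that its features can be fused and completed. The module name, shapes, and fusion scheme are illustrative assumptions, not the authors' MECOM implementation.

```python
# Minimal sketch of cross-modal attention for fusing two modalities.
# Hypothetical module and variable names; not the authors' MECOM code.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tokens, context_tokens):
        # query_tokens: features of the (possibly incomplete) target modality
        # context_tokens: features of the available modality used for completion
        fused, _ = self.attn(query_tokens, context_tokens, context_tokens)
        return self.norm(query_tokens + fused)  # residual fusion

# Toy usage: complete image-side tokens from text-side tokens.
img_tokens = torch.randn(2, 16, 256)   # (batch, tokens, dim)
txt_tokens = torch.randn(2, 10, 256)
out = CrossModalAttention()(img_tokens, txt_tokens)
print(out.shape)  # torch.Size([2, 16, 256])
```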
2. Yao J, Han B, Zhou Z, Zhang Y, Tsang IW. Latent Class-Conditional Noise Model. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:9964-9980. PMID: 37027688. DOI: 10.1109/tpami.2023.3247629.
Abstract
Learning with noisy labels has become imperative in the Big Data era, as it saves expensive human labor on accurate annotation. Previous noise-transition-based methods have achieved theoretically grounded performance under the Class-Conditional Noise model (CCN). However, these approaches build upon an ideal but impractical anchor set assumed to be available for pre-estimating the noise transition. Even though subsequent works adapt the estimation as a neural layer, the ill-posed stochastic learning of its parameters in back-propagation easily falls into undesired local minima. We solve this problem by introducing a Latent Class-Conditional Noise model (LCCN) that parameterizes the noise transition under a Bayesian framework. By projecting the noise transition into the Dirichlet space, the learning is constrained on a simplex characterized by the complete dataset, instead of some ad-hoc parametric space wrapped by the neural layer. We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels to train the classifier and to model the noise. Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples seen in previous work. We further generalize LCCN to variants compatible with open-set noisy labels, semi-supervised learning, and cross-model training. A range of experiments demonstrates the advantages of LCCN and its variants over current state-of-the-art methods. The code is publicly available.
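The sketch below illustrates, under simplifying assumptions, one Gibbs-style sweep of the kind of inference the abstract describes: sampling a Dirichlet-parameterized noise transition from label counts and then resampling latent true labels from classifier predictions. It is an illustration only, not the authors' LCCN code.

```python
# Illustrative Gibbs-style sweep: sample a Dirichlet-parameterized noise
# transition from label counts, then resample latent true labels given
# classifier predictions. Assumed setup; not the authors' LCCN implementation.
import numpy as np

rng = np.random.default_rng(0)
K = 3                                   # number of classes
N = 6                                   # number of samples
noisy = rng.integers(0, K, size=N)      # observed noisy labels
latent = noisy.copy()                   # current guess of the true labels
probs = rng.dirichlet(np.ones(K), N)    # classifier predictions p(y|x)
alpha = np.ones((K, K))                 # Dirichlet prior over transition rows

for _ in range(10):                     # a few Gibbs sweeps
    # 1) Sample the noise transition T[true, noisy] from its Dirichlet posterior.
    counts = np.zeros((K, K))
    np.add.at(counts, (latent, noisy), 1.0)
    T = np.array([rng.dirichlet(alpha[k] + counts[k]) for k in range(K)])
    # 2) Resample each latent label proportional to p(y|x) * T[y, noisy_label].
    for i in range(N):
        w = probs[i] * T[:, noisy[i]]
        latent[i] = rng.choice(K, p=w / w.sum())

print("sampled transition matrix:\n", T.round(2))
```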
3. Training CNN Classifiers Solely on Webly Data. Journal of Artificial Intelligence and Soft Computing Research 2023. DOI: 10.2478/jaiscr-2023-0005. Open Access.
Abstract
Real-life applications of deep learning (DL) are often limited by the lack of expert-labeled data required to effectively train DL models. Creating such data usually requires a substantial amount of time for manual categorization, which is costly and is considered one of the major impediments to the development of DL methods in many areas. This work proposes a classification approach that completely removes the need for costly expert-labeled data and instead utilizes noisy web data created by users who are not subject-matter experts. The experiments are performed with two well-known Convolutional Neural Network (CNN) architectures, VGG16 and ResNet50, trained on three randomly collected Instagram-based sets of images from three distinct domains: metropolitan cities, popular food, and common objects; the last two sets were compiled by the authors and made freely available to the research community. The dataset containing common objects is a webly counterpart of the PascalVOC2007 set. It is demonstrated that, despite a significant amount of label noise in the training data, applying the proposed approach together with a standard CNN training protocol leads to high classification accuracy on representative data in all three domains. Additionally, two straightforward procedures for automatically cleaning the data before its use in training are proposed. Notably, data cleaning does not improve the results, which suggests that the presence of noise in webly data is actually helpful for learning meaningful and robust class representations. Manual inspection of a subset of the web-based test data shows that the labels assigned to many images are ambiguous even for humans. Our conclusion is that, for the datasets and CNN architectures used in this paper, when training with webly data, a major factor contributing to the final classification accuracy is the representativeness of the test data rather than the application of data cleaning procedures.
4. Wei XS, Song YZ, Aodha OM, Wu J, Peng Y, Tang J, Yang J, Belongie S. Fine-Grained Image Analysis With Deep Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:8927-8948. PMID: 34752384. DOI: 10.1109/tpami.2021.3126648.
Abstract
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas - fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community.
5. Wu X, Chang J, Lai YK, Yang J, Tian Q. BiSPL: Bidirectional Self-Paced Learning for Recognition From Web Data. IEEE Transactions on Image Processing 2021; 30:6512-6527. PMID: 34252026. DOI: 10.1109/tip.2021.3094744.
Abstract
Deep learning (DL) is inherently subject to the requirement of a large amount of well-labeled data, which is expensive and time-consuming to obtain manually. To broaden the reach of DL, leveraging free web data becomes an attractive strategy for alleviating data scarcity. However, directly using collected web data to train a deep model is ineffective because of the mixed-in noisy data. To address this problem, we develop a novel bidirectional self-paced learning (BiSPL) framework that reduces the effect of noise by learning from web data in a meaningful order. Technically, the BiSPL framework consists of two essential steps. First, relying on distances defined between web samples and labeled source samples, the web samples with short distances are selected and combined to form a new training set. Second, based on the new training set, both easy and hard samples are initially employed to train deep models for higher stability, and hard samples are gradually dropped to reduce the noise as training progresses. By iteratively alternating these steps, deep models converge to a better solution. We mainly focus on fine-grained visual classification (FGVC) tasks because their datasets are generally small and therefore face a more significant data scarcity problem. Experiments conducted on six public FGVC tasks demonstrate that our proposed method outperforms state-of-the-art approaches. In particular, BiSPL maintains the highest and most stable performance even when the scale of the well-labeled training set decreases dramatically.
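A rough numpy sketch of the two selection steps described in this abstract follows; the distance-based cutoff, the shrinking schedule, and the function names are assumptions for illustration, not the authors' BiSPL implementation.

```python
# Assumed sketch of BiSPL-style selection: pick web samples close to labeled
# source features, then gradually drop the hardest (highest-loss) samples.
import numpy as np

rng = np.random.default_rng(0)
src_feat = rng.normal(size=(50, 128))    # features of labeled source samples
web_feat = rng.normal(size=(500, 128))   # features of crawled web samples

# Step 1: keep the web samples whose nearest source sample is closest.
dists = np.linalg.norm(web_feat[:, None, :] - src_feat[None, :, :], axis=-1).min(axis=1)
keep = np.argsort(dists)[:200]           # 200 closest web samples form the new training set

# Step 2: self-paced dropping of hard samples, epoch by epoch.
def select_for_epoch(losses: np.ndarray, epoch: int, total_epochs: int) -> np.ndarray:
    """Start with all samples, then linearly shrink toward the easiest 50%."""
    frac = 1.0 - 0.5 * epoch / max(total_epochs - 1, 1)
    n_keep = max(1, int(frac * len(losses)))
    return np.argsort(losses)[:n_keep]   # indices of the lowest-loss samples

losses = rng.exponential(size=len(keep)) # per-sample training losses (toy values)
print(len(select_for_epoch(losses, epoch=0, total_epochs=10)))   # 200
print(len(select_for_epoch(losses, epoch=9, total_epochs=10)))   # 100
```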
6. How to handle noisy labels for robust learning from uncertainty. Neural Networks 2021; 143:209-217. PMID: 34157645. DOI: 10.1016/j.neunet.2021.06.012.
Abstract
Most deep neural networks (DNNs) are trained with large amounts of noisy labels when they are applied. As DNNs have the high capacity to fit any noisy labels, it is known to be difficult to train DNNs robustly with noisy labels. These noisy labels cause the performance degradation of DNNs due to the memorization effect by over-fitting. Earlier state-of-the-art methods used small loss tricks to efficiently resolve the robust training problem with noisy labels. In this paper, relationship between the uncertainties and the clean labels is analyzed. We present novel training method to use not only small loss trick but also labels that are likely to be clean labels selected from uncertainty called "Uncertain Aware Co-Training (UACT)". Our robust learning techniques (UACT) avoid over-fitting the DNNs by extremely noisy labels. By making better use of the uncertainty acquired from the network itself, we achieve good generalization performance. We compare the proposed method to the current state-of-the-art algorithms for noisy versions of MNIST, CIFAR-10, CIFAR-100, T-ImageNet and News to demonstrate its excellence.
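The following sketch shows one plausible way to combine the small-loss trick with an uncertainty estimate when selecting probably-clean samples for co-training; the scoring rule, the keep ratio, and all names are assumptions, not the paper's UACT implementation.

```python
# Hedged sketch of uncertainty-aware clean-sample selection for co-training.
# Assumed scoring rule; not the paper's UACT code.
import numpy as np

def select_clean(losses: np.ndarray, uncertainties: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Rank samples by normalized loss plus normalized uncertainty; keep the lowest ones."""
    z = lambda x: (x - x.mean()) / (x.std() + 1e-8)
    score = z(losses) + z(uncertainties)        # low loss AND low uncertainty preferred
    n_keep = max(1, int(keep_ratio * len(score)))
    return np.argsort(score)[:n_keep]

rng = np.random.default_rng(0)
losses_net_a = rng.exponential(size=128)        # per-sample losses from network A
unc_net_a = rng.random(128)                     # e.g. predictive entropy or MC-dropout variance
# In co-training, the samples network A trusts are used to update network B.
clean_for_b = select_clean(losses_net_a, unc_net_a, keep_ratio=0.7)
print(clean_for_b.shape)                        # (89,)
```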
7.
8. Feng J, Wang X, Liu W. Deep graph cut network for weakly-supervised semantic segmentation. Science China Information Sciences 2021; 64:130105. PMCID: PMC7881314. DOI: 10.1007/s11432-020-3065-4.
Abstract
The scarcity of fully annotated data has become the biggest obstacle preventing many deep learning approaches from being widely applied. Weakly-supervised visual learning, which can utilize inexact annotations, has developed rapidly to remedy this situation. In this paper, we study the weakly-supervised task of achieving pixel-level semantic segmentation with only image-level labels as supervision. Different from other methods, our approach transforms the weakly-supervised learning problem into a semi-supervised one and then solves it with semi-supervised learning methods, which allows effective transductive learning with context information. In the semi-supervised learning module, we propose to use the graph cut algorithm to generate additional supervision from the activation seeds produced by a classification network. The generated labels provide the segmentation model with effective supervision; moreover, the graph cut module benefits from features extracted by the segmentation model. The two components then update and optimize each other iteratively until convergence. Experimental results on the PASCAL VOC and COCO benchmarks demonstrate the effectiveness of the proposed deep graph cut algorithm for weakly-supervised semantic segmentation.
Collapse
Affiliations
- Jiapei Feng, Xinggang Wang, Wenyu Liu: School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430000, China
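As a toy illustration of the seeds-to-graph-cut step described in the abstract above, the sketch below runs a min-cut over a tiny activation map using networkx; the seed thresholds, edge weights, and overall setup are assumptions, not the authors' implementation.

```python
# Toy sketch: turn activation seeds into pixel labels via a min-cut.
# Confident seeds are tied to the terminals, neighboring pixels are linked with
# similarity-weighted edges, and the cut labels the remaining pixels.
import numpy as np
import networkx as nx

cam = np.array([[0.9, 0.8, 0.2],          # toy class activation map (3x3)
                [0.7, 0.5, 0.1],
                [0.3, 0.2, 0.1]])
fg_seed, bg_seed = cam > 0.75, cam < 0.15  # confident seeds only

G = nx.DiGraph()
H, W = cam.shape
for y in range(H):
    for x in range(W):
        node = (y, x)
        # Terminal links: seeds are tied hard to source (fg) or sink (bg).
        G.add_edge("S", node, capacity=100.0 if fg_seed[y, x] else cam[y, x])
        G.add_edge(node, "T", capacity=100.0 if bg_seed[y, x] else 1.0 - cam[y, x])
        # Pairwise links: similar neighbors are expensive to separate.
        for dy, dx in [(0, 1), (1, 0)]:
            ny_, nx_ = y + dy, x + dx
            if ny_ < H and nx_ < W:
                w = float(np.exp(-abs(cam[y, x] - cam[ny_, nx_]) * 5.0))
                G.add_edge(node, (ny_, nx_), capacity=w)
                G.add_edge((ny_, nx_), node, capacity=w)

_, (fg_side, _) = nx.minimum_cut(G, "S", "T")
mask = np.array([[(y, x) in fg_side for x in range(W)] for y in range(H)])
print(mask.astype(int))
```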
9. Yang J, Wu X, Liang J, Sun X, Cheng MM, Rosin PL, Wang L. Self-Paced Balance Learning for Clinical Skin Disease Recognition. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2832-2846. PMID: 31199274. DOI: 10.1109/tnnls.2019.2917524.
Abstract
Class imbalance is a challenging problem in many classification tasks. It induces biased classification results for minority classes that contain fewer training samples than others. Most existing approaches aim to remedy the imbalanced number of instances among categories by resampling the majority and minority classes accordingly. However, the imbalanced level of difficulty of recognizing different categories is also crucial, especially when distinguishing samples among many classes. For example, in the task of clinical skin disease recognition, several rare diseases have a small number of training samples but are easy to diagnose because of their distinct visual properties. On the other hand, some common skin diseases, e.g., eczema, are hard to recognize due to the lack of special symptoms. To address this problem, we propose a self-paced balance learning (SPBL) algorithm. Specifically, we introduce a comprehensive metric termed the complexity of an image category, which combines both sample number and recognition difficulty. First, the complexity is initialized using the model of the first pace, where a pace indicates one iteration of the self-paced learning paradigm. We then assign each class a penalty weight that is larger for more complex categories and smaller for easier ones, after which the curriculum is reconstructed by rearranging the training samples. Consequently, the model can iteratively learn discriminative representations by balancing the complexity in each pace. Experimental results on the SD-198 and SD-260 benchmark datasets demonstrate that the proposed SPBL algorithm performs favorably against state-of-the-art methods. We also demonstrate the generalization capacity of the SPBL algorithm on other tasks, such as indoor scene image recognition and object classification.
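The short sketch below illustrates one possible reading of the per-class complexity metric and penalty weights described above; the equal mixing of scarcity and difficulty, and the weight formula, are assumptions rather than the paper's exact definitions.

```python
# Illustrative per-class "complexity" mixing sample scarcity with recognition
# difficulty, then converted into penalty weights for the next pace.
# Assumed formulas; not the paper's exact SPBL definition.
import numpy as np

sample_counts = np.array([500, 120, 40, 300])      # images per class
error_rates = np.array([0.05, 0.30, 0.10, 0.45])   # per-class error from the current pace

scarcity = 1.0 - sample_counts / sample_counts.max()
complexity = 0.5 * scarcity + 0.5 * error_rates    # equal mixing is an assumption
weights = 1.0 + complexity / complexity.sum()      # more complex classes get a larger penalty

print(np.round(complexity, 3))   # per-class complexity; rare-but-easy classes score lower than rare-and-hard ones
print(np.round(weights, 3))
```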
10. Udmale SS, Patil SS, Phalle VM, Singh SK. A bearing vibration data analysis based on spectral kurtosis and ConvNet. Soft Computing 2018. DOI: 10.1007/s00500-018-3644-5.
11. Yao J, Wang J, Tsang IW, Zhang Y, Sun J, Zhang C, Zhang R. Deep Learning from Noisy Image Labels with Quality Embedding. IEEE Transactions on Image Processing 2018; 28:1909-1922. PMID: 30369444. DOI: 10.1109/tip.2018.2877939.
Abstract
There is an emerging trend to leverage noisy image datasets in many visual recognition tasks. However, label noise severely degrades the performance of deep learning approaches. Recently, one mainstream approach has been to introduce a latent label to handle label noise, which has shown promising improvements in network design. Nevertheless, the mismatch between latent labels and noisy labels still affects the predictions of such methods. To address this issue, we propose a probabilistic model that explicitly introduces an extra variable to represent the trustworthiness of noisy labels, termed the quality variable. Our key idea is to identify the mismatch between the latent and noisy labels by embedding the quality variables into different subspaces, which effectively minimizes the influence of label noise. At the same time, reliable labels can still be used for training. To instantiate the model, we further propose a Contrastive-Additive Noise network (CAN), which consists of two important layers: (1) a contrastive layer that estimates the quality variable in the embedding space to reduce the influence of noisy labels; and (2) an additive layer that aggregates the prior prediction and the noisy label as the posterior used to train the classifier. Moreover, to tackle the challenges in optimization, we deduce an SGD algorithm with the reparameterization trick, which makes our method scalable to big data. We validate the proposed method on a range of noisy image datasets. Comprehensive results demonstrate that CAN outperforms state-of-the-art deep learning approaches.
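Below is a minimal sketch of how the additive layer's aggregation of the prior prediction and the noisy label could look, with the quality variable acting as a per-sample mixing weight; this is an assumed reading for illustration, not the authors' CAN code.

```python
# Minimal sketch of an additive aggregation: trust the noisy label in
# proportion to its estimated quality. Assumed formulation; not the CAN code.
import torch
import torch.nn.functional as F

def additive_posterior(prior_logits: torch.Tensor,
                       noisy_labels: torch.Tensor,
                       quality: torch.Tensor,
                       num_classes: int) -> torch.Tensor:
    """Posterior over labels as a quality-weighted mix of prediction and noisy label."""
    prior = F.softmax(prior_logits, dim=-1)                 # classifier prediction p(y|x)
    onehot = F.one_hot(noisy_labels, num_classes).float()   # observed noisy label
    q = quality.unsqueeze(-1)                               # quality in [0, 1], shape (B, 1)
    return q * onehot + (1.0 - q) * prior                   # convex combination

prior_logits = torch.randn(4, 5)                 # batch of 4 samples, 5 classes
noisy_labels = torch.tensor([0, 2, 1, 4])
quality = torch.tensor([0.9, 0.2, 0.6, 0.5])     # e.g. estimated by a contrastive embedding layer
posterior = additive_posterior(prior_logits, noisy_labels, quality, num_classes=5)
print(posterior.sum(dim=-1))                     # each row sums to 1
```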