1
Kumar V, Tripathi V, Pant B, Diwakar M, Singh P, Bijalwan A. Learning optimal image representations through noise injection for fine-grained search. Sci Rep 2025; 15:15560. [PMID: 40319122 PMCID: PMC12049428 DOI: 10.1038/s41598-025-97528-9]
Abstract
In recent years, fine-grained image search has been an area of interest within the computer vision community. Many current works follow a deep feature learning paradigm, which generally exploits the activations of pre-trained convolutional layers as representations and learns a low-dimensional embedding. This embedding is usually learned by defining loss functions based on local structure, such as the triplet loss. However, the triplet loss requires an expensive sampling strategy, and softmax-based losses (used when the problem is treated as a classification task) train faster but suffer from early saturation. To this end, a novel approach is proposed to enhance fine-grained representation learning by injecting noise at both the input and the feature level. At the input, the image is perturbed with noise, and the objective is to reduce the distance between the L2-normalized features of the image and its noisy version in the embedding space, relative to other instances. Concurrently, noise injection in the features acts as a regularizer, encouraging generalized features and mitigating overfitting. The proposed approach is tested on three public datasets, Oxford Flowers-17, CUB-200-2011, and Cars-196, and achieves better retrieval results than existing methods. We also tested the approach in the zero-shot setting and obtained favorable results compared with prior methods on Cars-196 and CUB-200-2011.
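The abstract does not give the exact loss, but the described objective (pulling an image and its noise-perturbed copy together in an L2-normalized embedding space, relative to other instances) resembles an InfoNCE-style contrastive loss. A minimal PyTorch sketch under that assumption; the function name and the noise_std and temperature parameters are hypothetical:

```python
import torch
import torch.nn.functional as F

def noise_pair_loss(model, images, noise_std=0.1, temperature=0.07):
    # Perturb the input with Gaussian noise (input-level noise injection).
    noisy = images + noise_std * torch.randn_like(images)
    # L2-normalize both embeddings so distances live on the unit sphere.
    z_clean = F.normalize(model(images), dim=1)
    z_noisy = F.normalize(model(noisy), dim=1)
    # Similarity of each clean embedding to every noisy embedding in the
    # batch; the matched (diagonal) pair is the positive, the rest negatives.
    logits = z_clean @ z_noisy.t() / temperature
    targets = torch.arange(images.size(0), device=images.device)
    return F.cross_entropy(logits, targets)
```

Feature-level noise injection, the paper's second ingredient, could analogously add noise to intermediate activations as a regularizer.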
Affiliation(s)
- Vidit Kumar
- Department of CSE, Graphic Era Deemed to be University, Dehradun, India
- Vikas Tripathi
- Department of CSE, Graphic Era Deemed to be University, Dehradun, India
- Bhaskar Pant
- Department of CSE, Graphic Era Deemed to be University, Dehradun, India
- Manoj Diwakar
- Department of CSE, Graphic Era Deemed to be University, Dehradun, India
- Prabhishek Singh
- School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India
- Anchit Bijalwan
- Faculty of Electrical and Computer Engineering, Arba Minch University, Arba Minch, Ethiopia.
2
Xu Y, Zhang X, Huang C, Qiu X. Can using a pre-trained deep learning model as the feature extractor in the bag-of-deep-visual-words model always improve image classification accuracy? PLoS One 2024; 19:e0298228. [PMID: 38422007 PMCID: PMC10903886 DOI: 10.1371/journal.pone.0298228]
Abstract
This article investigates whether higher classification accuracy can always be achieved by utilizing a pre-trained deep learning model as the feature extractor in the Bag-of-Deep-Visual-Words (BoDVW) classification model, as opposed to directly using the new classification layer of the pre-trained model for classification. Considering the multiple factors related to the feature extractor, such as model architecture, fine-tuning strategy, number of training samples, feature extraction method, and feature encoding method, we investigate these factors through experiments and then provide detailed answers to the question. In our experiments, we use five feature encoding methods: hard-voting, soft-voting, locality-constrained linear coding, super vector coding, and Fisher vector (FV). We also employ two popular feature extraction methods: one (denoted as Ext-DFs(CP)) uses a convolutional or non-global pooling layer, and another (denoted as Ext-DFs(FC)) uses a fully-connected or global pooling layer. Three pre-trained models, VGGNet-16, ResNext-50(32×4d), and Swin-B, are utilized as feature extractors. Experimental results on six datasets (15-Scenes, TF-Flowers, MIT Indoor-67, COVID-19 CXR, NWPU-RESISC45, and Caltech-101) reveal that compared to using the pre-trained model with only the new classification layer re-trained for classification, employing it as the feature extractor in the BoDVW model improves the accuracy in 35 out of 36 experiments when using FV. With Ext-DFs(CP), the accuracy increases by 0.13% to 8.43% (averaging 3.11%), and with Ext-DFs(FC), it increases by 1.06% to 14.63% (averaging 5.66%). Furthermore, when all layers of the pre-trained model are fine-tuned and used as the feature extractor, the results vary depending on the methods used. If FV and Ext-DFs(FC) are used, the accuracy increases by 0.21% to 5.65% (averaging 1.58%) in 14 out of 18 experiments. Our results suggest that while using a pre-trained deep learning model as the feature extractor does not always improve classification accuracy, it holds great potential as an accuracy improvement technique.
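For concreteness, here is a minimal sketch of the simplest encoding the paper evaluates, hard-voting over deep local descriptors; the function name is hypothetical and the codebook size is an arbitrary choice:

```python
import numpy as np
from sklearn.cluster import KMeans

def bodvw_hard_voting(local_descriptors, codebook):
    # Assign each local deep feature (one row per spatial position of a
    # conv feature map) to its nearest visual word, then build an
    # L1-normalized histogram of word counts as the image representation.
    words = codebook.predict(local_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Usage: fit the codebook on descriptors pooled from training images
# (random arrays here are stand-ins for real deep features).
train_descriptors = np.random.rand(10000, 512)
codebook = KMeans(n_clusters=256, n_init=4).fit(train_descriptors)
image_vector = bodvw_hard_voting(np.random.rand(300, 512), codebook)
```

Fisher vector encoding, the strongest performer in the study, replaces these hard counts with first- and second-order statistics against a GMM codebook.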
Affiliation(s)
- Ye Xu
- School of IoT Technology, Wuxi Institute of Technology, Wuxi, Jiangsu, China
- Xin Zhang
- School of IoT Technology, Wuxi Institute of Technology, Wuxi, Jiangsu, China
- Chongpeng Huang
- School of IoT Technology, Wuxi Institute of Technology, Wuxi, Jiangsu, China
- Xiaorong Qiu
- School of IoT Technology, Wuxi Institute of Technology, Wuxi, Jiangsu, China
3
Zeng C, Zhao S, Chen B, Zeng A, Li S. Feature-correlation-aware history-preserving-sparse-coding framework for automatic vertebra recognition. Comput Biol Med 2023; 160:106977. [PMID: 37163964 DOI: 10.1016/j.compbiomed.2023.106977]
Abstract
Automatic vertebra recognition from magnetic resonance imaging (MRI) is of significance for disease diagnosis and the surgical treatment of spinal patients. Although modern methods have achieved remarkable progress, vertebra recognition still faces two challenges in practice: (1) Vertebral appearance challenge: The repetitive nature of vertebrae causes similar appearance among different vertebrae, while pathological variation causes differing appearance within the same vertebra; (2) Field of view (FOV) challenge: The FOVs of the input MRI images are unpredictable, which exacerbates the appearance challenge because there may be no distinctively appearing vertebrae to assist recognition. In this paper, we propose a Feature-cOrrelation-aware history-pReserving-sparse-Coding framEwork (FORCE) to extract highly discriminative features and alleviate these challenges. FORCE is a recognition framework with two carefully designed modules: (1) A feature similarity regularization (FSR) module that constrains the features of vertebrae with the same label (but potentially different appearances) to be closer in the latent feature space, in an Eigenmap-based regularization manner. (2) A cumulative sparse representation (CSR) module that achieves feed-forward sparse coding while preventing historical features from being erased, leveraging both the intrinsic advantages of sparse codes and the historical features to obtain more discriminative sparse codes encoding each vertebra. These two modules are embedded into the vertebra recognition framework in a plug-and-play manner to improve feature discrimination. FORCE is trained and evaluated on a challenging dataset containing 600 MRI images. The evaluation results show that FORCE achieves high performance in vertebra recognition and outperforms other state-of-the-art methods.
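The Eigenmap-based FSR idea can be illustrated with a graph-Laplacian penalty that pulls same-label features together; a simplified sketch (the paper's exact formulation may differ, and feature_similarity_penalty is a hypothetical name):

```python
import torch

def feature_similarity_penalty(feats, labels):
    # Same-label adjacency: W[i, j] = 1 when samples i and j share a label.
    W = (labels[:, None] == labels[None, :]).float()
    # Unnormalized graph Laplacian L = D - W.
    L = torch.diag(W.sum(dim=1)) - W
    # tr(F^T L F) equals the weighted sum of squared distances between
    # connected pairs, so minimizing it draws same-label features together.
    return torch.trace(feats.t() @ L @ feats) / feats.size(0)
```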
Affiliation(s)
- Chenyi Zeng
- Department of Artificial Intelligence, Sun Yat-sen University, Guangzhou 510006, China
- Shen Zhao
- Department of Artificial Intelligence, Sun Yat-sen University, Guangzhou 510006, China.
- Bin Chen
- Zhejiang University, Hangzhou, Zhejiang, China
- An Zeng
- Guangdong University of Technology, Guangzhou, Guangdong, China
4
Zhang Y, Deng Y, Zhou Z, Zhang X, Jiao P, Zhao Z. Multimodal learning for fetal distress diagnosis using a multimodal medical information fusion framework. Front Physiol 2022; 13:1021400. [PMID: 36419838 PMCID: PMC9676934 DOI: 10.3389/fphys.2022.1021400]
Abstract
Cardiotocography (CTG) monitoring is an important medical diagnostic tool for fetal well-being evaluation in late pregnancy. In this regard, intelligent CTG classification based on Fetal Heart Rate (FHR) signals is a challenging research area that can assist obstetricians in making clinical decisions, thereby improving the efficiency and accuracy of pregnancy management. Most existing methods focus on one specific modality, that is, they only detect one type of modality, and inevitably have limitations such as incomplete or redundant source-domain feature extraction and poor repeatability. This study focuses on modeling multimodal learning for Fetal Distress Diagnosis (FDD); however, three major challenges exist: unaligned modalities; failure to learn and fuse the causality and inclusion relations between multimodal biomedical data; and modality sensitivity, that is, difficulty in performing a task when some modalities are absent. To address these three issues, we propose a Multimodal Medical Information Fusion framework named MMIF, in which a Category Constrained-Parallel ViT model (CCPViT) is first proposed to explore multimodal learning tasks and address the misalignment between modalities. Based on CCPViT, a cross-attention-based image-text joint component is introduced to establish a Multimodal Representation Alignment Network model (MRAN), explore deep-level interactive representations between cross-modal data, and assist multimodal learning. Furthermore, we designed a simply structured FDD test model based on the highly aligned MMIF, realizing task delegation from multimodal model training (image and text) to unimodal pathological diagnosis (image). Extensive experiments, including model parameter sensitivity analysis, cross-modal alignment assessment, and pathological diagnostic accuracy evaluation, were conducted to show our models' superior performance and effectiveness.
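The cross-attention image-text component can be sketched as image tokens querying text tokens; this is an illustrative module under that assumption, not the paper's architecture, and the class name and dimensions are hypothetical:

```python
import torch.nn as nn

class ImageTextCrossAttention(nn.Module):
    # Image tokens act as queries over text tokens; the attended text
    # context is residually fused back into the image stream.
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens, text_tokens):
        fused, _ = self.attn(image_tokens, text_tokens, text_tokens)
        return self.norm(image_tokens + fused)
```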
Affiliation(s)
- Yefei Zhang
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou, China
- Yanjun Deng
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou, China
- Zhixin Zhou
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou, China
- Xianfei Zhang
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou, China
- Pengfei Jiao
- School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
- Zhidong Zhao
- School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
5
Bera A, Wharton Z, Liu Y, Bessis N, Behera A. SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization. IEEE Trans Image Process 2022; 31:6017-6031. [PMID: 36103441 DOI: 10.1109/tip.2022.3205215]
Abstract
Over the past few years, significant progress has been made in deep convolutional neural network (CNN)-based image recognition, mainly due to the strong ability of such networks to mine discriminative object pose and part information from texture and shape. This is often insufficient for fine-grained visual classification (FGVC), which exhibits high intra-class and low inter-class variance due to occlusions, deformation, illumination, etc. Thus, an expressive feature representation describing global structural information is key to characterizing an object or scene. To this end, we propose a method that effectively captures subtle changes by aggregating context-aware features from the most relevant image regions and their importance in discriminating fine-grained categories, while avoiding bounding-box and/or distinguishable part annotations. Our approach draws on recent advances in self-attention and graph neural networks (GNNs) to include a simple yet effective relation-aware feature transformation and its refinement using a context-aware attention mechanism, boosting the discriminability of the transformed features in an end-to-end learning process. Our model is evaluated on eight benchmark datasets consisting of fine-grained objects and human-object interactions. It outperforms the state-of-the-art approaches by a significant margin in recognition accuracy.
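A toy sketch of the relation-aware idea, refining each region feature with an attention-weighted mixture of all regions (dot-product relations and residual refinement are assumptions; the function name is hypothetical):

```python
import torch
import torch.nn.functional as F

def relation_aware_refine(region_feats):
    # region_feats: (num_regions, dim) features pooled from image regions.
    # Pairwise relations via scaled dot products.
    sim = region_feats @ region_feats.t() / region_feats.size(1) ** 0.5
    attn = F.softmax(sim, dim=-1)
    # Each region absorbs context from the regions most related to it.
    return region_feats + attn @ region_feats
```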
6
Yang L, Fan W, Bouguila N. Robust unsupervised image categorization based on variational autoencoder with disentangled latent representations. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108671]
8
Han J, Yao X, Cheng G, Feng X, Xu D. P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization. IEEE Trans Pattern Anal Mach Intell 2022; 44:579-590. [PMID: 31398107 DOI: 10.1109/tpami.2019.2933510]
Abstract
This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, in which a bank of convolutional filters is learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) with two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates the part features and the global feature into a joint feature for the final classification. In order to learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss for metric learning and part classification, which focuses training on hard examples. We further merge PLN and PCN into a unified network for end-to-end training via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.
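The Duplex Focal Loss builds on the standard focal loss, which down-weights well-classified examples so that training concentrates on hard ones. A sketch of that standard building block (the duplex extension covering metric learning is the paper's contribution and is not reproduced here):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Cross-entropy gives -log p_t for the true class of each sample.
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)
    # (1 - p_t)^gamma shrinks the loss for confident, easy examples.
    return ((1.0 - p_t) ** gamma * ce).mean()
```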
9
Liu B, Xie H, Xiao Y. Multi-task analysis discriminative dictionary learning for one-class learning. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107195]
10
Zhao R, Liu T, Xiao J, Lun DPK, Lam KM. Invertible Image Decolorization. IEEE Trans Image Process 2021; 30:6081-6095. [PMID: 34185645 DOI: 10.1109/tip.2021.3091902]
Abstract
Invertible image decolorization is a useful color compression technique for reducing cost in multimedia systems. It aims to synthesize faithful grayscales from color images that can be fully restored to the original color version. In this paper, we propose a novel color compression method that produces invertible grayscale images using invertible neural networks (INNs). Our key idea is to separate the color information from color images and encode it into a set of Gaussian-distributed latent variables via INNs. By this means, we force the color information lost in grayscale generation to be independent of the input color image. Therefore, the original color version can be efficiently recovered by randomly re-sampling a new set of Gaussian-distributed variables, together with the synthetic grayscale, through the reverse mapping of the INN. To effectively learn the invertible grayscale, we introduce the wavelet transformation into a UNet-like INN architecture, and further present a quantization embedding to prevent information loss during format conversion, which improves the generalizability of the framework in real-world scenarios. Extensive experiments on three widely used benchmarks demonstrate that the proposed method achieves state-of-the-art performance in terms of both qualitative and quantitative results, showing its superiority in multimedia communication and storage systems.
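The exact invertibility that INNs rely on can be illustrated with an additive coupling layer, where half of the dimensions are updated by a function of the other half so the mapping inverts in closed form. This shows only the generic principle; the paper's wavelet-based UNet-like INN is more elaborate:

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    # Splits the input in two; the second half is shifted by a learned
    # function of the first, which can be undone exactly in inverse().
    def __init__(self, half_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(half_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, half_dim))

    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return torch.cat([a, b + self.net(a)], dim=1)

    def inverse(self, y):
        a, b = y.chunk(2, dim=1)
        return torch.cat([a, b - self.net(a)], dim=1)
```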
12
Deep Learning Using Isotroping, Laplacing, Eigenvalues Interpolative Binding, and Convolved Determinants with Normed Mapping for Large-Scale Image Retrieval. Sensors 2021; 21:1139. [PMID: 33561989 PMCID: PMC7914434 DOI: 10.3390/s21041139]
Abstract
Convolutional neural networks (CNNs) exploit the grid structure and spatial dependencies of two-dimensional images, capturing location adjacencies, color values, and hidden patterns through sparse, layered connections with local spatial mapping footprints. Their behavior varies with architectural choices, the inputs, the number and types of layers, and how derived signatures are fused. This research addresses this gap by combining GoogLeNet, VGG-19, and ResNet-50 architectures with maximum-response Eigenvalue texture features and convolutional Laplacian-scaled object features over mapped color channels, aiming for high retrieval rates over millions of images from diverse semantic groups and benchmarks. The time- and computation-efficient formulation of the presented model is a step forward in deep learning fusion and compact signature encapsulation for descriptor creation. Strong results on challenging benchmarks are presented with a thorough contextualization of CNN effects and anchor bindings. The presented method is tested on well-known datasets including ALOT (250), Corel-1000, CIFAR-10, Corel-10000, CIFAR-100, Oxford Buildings, FTVL Tropical Fruits, 17-Flowers, Fashion (15), and Caltech-256, and reports strong performance. The presented work is compared with state-of-the-art methods over images that are tiny, large, complex, overlaid, texture-, color-, object-, or shape-dominated, mimicked, with plain or cluttered backgrounds, and with multiple foreground objects, and achieves significant accuracies.
13
Tang H, Mao L, Zeng S, Deng S, Ai Z. Discriminative dictionary learning algorithm with pairwise local constraints for histopathological image classification. Med Biol Eng Comput 2021; 59:153-164. [PMID: 33386592 DOI: 10.1007/s11517-020-02281-y]
Abstract
Histopathological images contain rich pathological information that is valuable for computer-aided diagnosis of many diseases such as cancer. An important issue in histopathological image classification is how to learn a high-quality discriminative dictionary, given diverse tissue patterns, varied textures, and differing morphological structures. In this paper, we propose a discriminative dictionary learning algorithm with pairwise local constraints (PLCDDL) for histopathological image classification. Inspired by the one-to-one mapping between dictionary atoms and profiles, we learn a pair of discriminative graph Laplacian matrices that are less sensitive to noise or outliers, capturing the locality and discriminating information of the data manifold by utilizing the local geometry of category-specific dictionaries rather than the input data. Furthermore, graph-based pairwise local constraints are designed and incorporated into the original dictionary learning model to effectively encode the locality consistency with intra-class samples and the locality inconsistency with inter-class samples. Specifically, we learn discriminative localities for representations by jointly optimizing both the intra-class and inter-class locality, which can significantly improve the discriminability and robustness of the dictionary. Extensive experiments on challenging datasets verify that the proposed PLCDDL algorithm achieves better classification accuracy and stronger robustness than state-of-the-art dictionary learning methods. Graphical abstract: 1) a pair of graph Laplacian matrices is first learned based on the class-specific dictionaries; 2) graph-based pairwise local constraints are designed to transfer the locality to the coding coefficients; 3) class-specific dictionaries can be further updated.
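As a plain baseline for the category-specific part of such methods, one dictionary can be learned per class with off-the-shelf sparse dictionary learning; PLCDDL's pairwise local constraints are omitted in this sketch, and the atom count is arbitrary:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def learn_class_dictionaries(X, y, n_atoms=64):
    # X: (n_samples, n_features) image descriptors; y: class labels.
    # One sparse dictionary per class; a test sample can then be scored
    # by its reconstruction residual under each class dictionary.
    dictionaries = {}
    for c in np.unique(y):
        model = DictionaryLearning(n_components=n_atoms, alpha=1.0,
                                   max_iter=50)
        dictionaries[c] = model.fit(X[y == c]).components_
    return dictionaries
```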
Affiliation(s)
- Hongzhong Tang
- Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, People's Republic of China; College of Automation and Electronic Information, Xiangtan University, Xiangtan, Hunan, People's Republic of China; Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People's Republic of China
- Lizhen Mao
- Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, People's Republic of China
- Shuying Zeng
- Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, People's Republic of China
- Shijun Deng
- Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, People's Republic of China; College of Automation and Electronic Information, Xiangtan University, Xiangtan, Hunan, People's Republic of China
- Zhaoyang Ai
- Institute of Biophysics Linguistics, College of Foreign Languages, Hunan University, Changsha, Hunan, People's Republic of China
14
Kasaei SH, Lopes LS, Tome AM. Local-LDA: Open-Ended Learning of Latent Topics for 3D Object Recognition. IEEE Trans Pattern Anal Mach Intell 2020; 42:2567-2580. [PMID: 31283495 DOI: 10.1109/tpami.2019.2926459]
Abstract
Service robots are expected to be more autonomous and work effectively in human-centric environments. This implies that robots should have special capabilities, such as learning from past experiences and real-time object category recognition. This paper proposes an open-ended 3D object recognition system which concurrently learns both the object categories and the statistical features for encoding objects. In particular, we propose an extension of Latent Dirichlet Allocation to learn structural semantic features (i.e., visual topics), from low-level feature co-occurrences, for each category independently. Moreover, topics in each category are discovered in an unsupervised fashion and are updated incrementally using new object views. In this way, the advantages of both the (hand-crafted) local features and the (learned) structural semantic features have been considered and combined in an efficient way. An extensive set of experiments has been performed to assess the performance of the proposed Local-LDA in terms of descriptiveness, scalability, and computation time. Experimental results show that the overall classification performance obtained with Local-LDA is clearly better than the best performances obtained with the state-of-the-art approaches. Moreover, the best scalability, in terms of number of learned categories, was obtained with the proposed Local-LDA approach, closely followed by a Bag-of-Words (BoW) approach. Concerning computation time, the best result was obtained with BoW, immediately followed by the Local-LDA approach.
16
Zhou T, Zhang C, Gong C, Bhaskar H, Yang J. Multiview Latent Space Learning With Feature Redundancy Minimization. IEEE Trans Cybern 2020; 50:1655-1668. [PMID: 30571651 DOI: 10.1109/tcyb.2018.2883673]
Abstract
Multiview learning has received extensive research interest and has demonstrated promising results in recent years. Despite the progress made, there are two significant challenges within multiview learning. First, some existing methods directly use original features to reconstruct data points without considering feature redundancy. Second, existing methods cannot fully exploit the complementary information across multiple views while preserving view-specific properties, which degrades learning performance. To address these issues, we propose a novel multiview latent space learning framework with feature redundancy minimization. We aim to learn a latent space that mitigates feature redundancy and use the learned representation to reconstruct every original data point. More specifically, we first project the original features from multiple views onto a latent space, and then learn a shared dictionary and view-specific dictionaries to, respectively, exploit the correlations across multiple views and preserve the view-specific properties. Furthermore, the Hilbert-Schmidt independence criterion (HSIC) is adopted as a diversity constraint to explore the complementarity of multiview representations, which further ensures diversity across views and preserves the local structure of the data in each view. Experimental results on six public datasets demonstrate the effectiveness of our multiview learning approach against other state-of-the-art methods.
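The HSIC diversity term can be computed with the standard biased estimator; a sketch with Gaussian kernels (the kernel choice and bandwidth are assumptions):

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    # Biased HSIC estimate between two view representations X and Y
    # (same number of rows). Near zero when the views are statistically
    # independent; larger when they carry redundant information.
    def gram(A):
        sq = np.sum(A ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
        return np.exp(-d2 / (2.0 * sigma ** 2))
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(gram(X) @ H @ gram(Y) @ H) / (n - 1) ** 2
```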
17
Chi H, Xia H, Zhang L, Zhang C, Tang X. Competitive and collaborative representation for classification. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2018.06.019]
18
Min S, Yao H, Xie H, Zha ZJ, Zhang Y. Multi-Objective Matrix Normalization for Fine-grained Visual Recognition. IEEE Trans Image Process 2020; 29:4996-5009. [PMID: 32149637 DOI: 10.1109/tip.2020.2977457]
Abstract
Bilinear pooling has achieved great success in fine-grained visual recognition (FGVC). Recent methods have shown that matrix power normalization can stabilize the second-order information in bilinear features, but some problems, e.g., redundant information and over-fitting, remain to be resolved. In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity. These three regularizers can not only stabilize the second-order information but also compact the bilinear features and promote model generalization. In MOMN, a core challenge is how to jointly optimize three non-smooth regularizers of different convex properties. To this end, MOMN first formulates them into an augmented Lagrange formula with approximated regularizer constraints. Then, auxiliary variables are introduced to relax the different constraints, which allows each regularizer to be solved alternately. Finally, several updating strategies based on gradient descent are designed to obtain consistent convergence and an efficient implementation. Consequently, MOMN is implemented with only matrix multiplication, which is well-compatible with GPU acceleration, and the normalized bilinear features are stabilized and discriminative. Experiments on five public benchmarks for FGVC demonstrate that the proposed MOMN is superior to existing normalization-based methods in terms of both accuracy and efficiency. The code is available at https://github.com/mboboGO/MOMN.
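Matrix square-root normalization of a bilinear (covariance) feature is commonly approximated on GPU with the Newton-Schulz iteration, which uses only matrix multiplications; a sketch for a symmetric positive-definite input (the iteration count and scaling are standard conventions, not taken from this paper):

```python
import torch

def matrix_sqrt_newton_schulz(A, iters=5):
    # A: symmetric positive-definite matrix, e.g. a bilinear feature.
    # Scale A so the coupled iteration converges, then iterate
    # Y <- Y(3I - ZY)/2, Z <- (3I - ZY)Z/2, with Y -> sqrt(A / norm).
    n = A.size(0)
    norm = A.norm()
    Y, Z = A / norm, torch.eye(n, device=A.device)
    I3 = 3.0 * torch.eye(n, device=A.device)
    for _ in range(iters):
        T = 0.5 * (I3 - Z @ Y)
        Y, Z = Y @ T, T @ Z
    return Y * norm.sqrt()
```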
19
Li Z, Zhang Z, Qin J, Zhang Z, Shao L. Discriminative Fisher Embedding Dictionary Learning Algorithm for Object Recognition. IEEE Trans Neural Netw Learn Syst 2020; 31:786-800. [PMID: 31056524 DOI: 10.1109/tnnls.2019.2910146]
Abstract
Both interclass variances and intraclass similarities are crucial for improving the classification performance of discriminative dictionary learning (DDL) algorithms. However, existing DDL methods often ignore the combination between the interclass and intraclass properties of dictionary atoms and coding coefficients. To address this problem, in this paper, we propose a discriminative Fisher embedding dictionary learning (DFEDL) algorithm that simultaneously establishes Fisher embedding models on learned atoms and coefficients. Specifically, we first construct a discriminative Fisher atom embedding model by exploring the Fisher criterion of the atoms, which encourages the atoms of the same class to reconstruct the corresponding training samples as much as possible. At the same time, a discriminative Fisher coefficient embedding model is formulated by imposing the Fisher criterion on the profiles (row vectors of the coding coefficient matrix) and coding coefficients, which forces the coding coefficient matrix to become a block-diagonal matrix. Since the profiles can indicate which training samples are represented by the corresponding atoms, the proposed two discriminative Fisher embedding models can alternatively and interactively promote the discriminative capabilities of the learned dictionary and coding coefficients. The extensive experimental results demonstrate that the proposed DFEDL algorithm achieves superior performance in comparison with some state-of-the-art dictionary learning algorithms on both hand-crafted and deep learning-based features.
20
Du H, Ma L, Li G, Wang S. Low-rank graph preserving discriminative dictionary learning for image recognition. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.06.031]
21
Abdi A, Rahmati M, Ebadzadeh MM. Dictionary learning enhancement framework: Learning a non-linear mapping model to enhance discriminative dictionary learning methods. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.004]
22
Ship Detection for PolSAR Images via Task-Driven Discriminative Dictionary Learning. Remote Sens 2019. [DOI: 10.3390/rs11070769]
Abstract
Ship detection with polarimetric synthetic aperture radar (PolSAR) has received increasing attention for its wide usage in maritime applications. However, extracting discriminative features to implement ship detection is still a challenging problem. In this paper, we propose a novel ship detection method for PolSAR images via task-driven discriminative dictionary learning (TDDDL). An assumption that ship and clutter information are sparsely coded under two separate dictionaries is made. Contextual information is considered by imposing superpixel-level joint sparsity constraints. In order to amplify the discrimination of the ship and clutter, we impose incoherence constraints between the two sub-dictionaries in the objective of feature coding. The discriminative dictionary is trained jointly with a linear classifier in task-driven dictionary learning (TDDL) framework. Based on the learnt dictionary and classifier, we extract discriminative features by sparse coding, and obtain robust detection results through binary classification. Different from previous methods, our ship detection cue is obtained through active learning strategies rather than artificially designed rules, and thus, is more adaptive, effective and robust. Experiments performed on synthetic images and two RADARSAT-2 images demonstrate that our method outperforms other comparative methods. In addition, the proposed method yields better shape-preserving ability and lower computation cost.
23
Zhang C, Cheng J, Li C, Tian Q. Image-Specific Classification With Local and Global Discriminations. IEEE Trans Neural Netw Learn Syst 2018; 29:4479-4486. [PMID: 28961130 DOI: 10.1109/tnnls.2017.2748952]
Abstract
Most image classification methods try to learn classifiers for each class using training images alone. Due to the interclass and intraclass variations, it would be more effective to take the testing images into consideration for classifier learning. In this brief, we propose a novel image-specific classification method combining the local and global discriminations of training images. We adaptively train a classifier for each testing image instead of generating classifiers for each class from training images alone. For each testing image, we first select its k nearest neighbors in the training set, with the corresponding labels, for local classifier training. This helps to model the distinctive characteristics of each testing image. Besides, we also use all the training images for global discrimination modeling. The local and global discriminations are combined for the final classification. In this way, we not only model the specific character of each testing image but also avoid local optima by jointly considering all the training images. To evaluate the usefulness of the proposed image-specific classification with local and global discrimination (ISC-LG) method, we conduct image classification experiments on several public image datasets. The superior performance over other baseline methods proves the effectiveness of the proposed ISC-LG method.
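A minimal sketch of the local-plus-global idea; the logistic regression classifiers, the blending weight w, and the neighbor count are assumptions, not the paper's exact components:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def image_specific_predict(x, X_train, y_train, global_clf, k=50, w=0.5):
    # Train a classifier on the k training samples nearest to this test
    # image, then blend its class probabilities with a global classifier's.
    idx = NearestNeighbors(n_neighbors=k).fit(X_train).kneighbors(
        x[None], return_distance=False)[0]
    local_clf = LogisticRegression(max_iter=200).fit(X_train[idx], y_train[idx])
    # Map local probabilities into the global class order (the neighbors
    # may not cover every class).
    p_local = np.zeros(len(global_clf.classes_))
    for cls, pr in zip(local_clf.classes_, local_clf.predict_proba(x[None])[0]):
        p_local[np.where(global_clf.classes_ == cls)[0][0]] = pr
    p = w * p_local + (1.0 - w) * global_clf.predict_proba(x[None])[0]
    return global_clf.classes_[p.argmax()]
```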
24
Zhang C, Cheng J, Li L, Li C, Tian Q. Object Categorization Using Class-Specific Representations. IEEE Trans Neural Netw Learn Syst 2018; 29:4528-4534. [PMID: 29990030 DOI: 10.1109/tnnls.2017.2757497]
Abstract
Object categorization refers to the task of automatically classifying objects based on visual content. Existing approaches simply represent each image with visual features, without considering the specific characteristics of images within the same class. However, objects of the same class may exhibit unique characteristics, which should be represented accordingly. In this brief, we propose a novel class-specific representation strategy for object categorization. For each class, we first model the characteristics of images within that class using a Gaussian mixture model (GMM). We then represent each image by calculating the Euclidean distance and relative Euclidean distance between the image and the GMM of each class. We concatenate the representations over all classes into a joint representation. In this way, we can represent an image by not only considering the visual content but also incorporating class-specific characteristics. Experiments on several publicly available datasets validate the superiority of the proposed class-specific representation method over well-established algorithms for object category prediction.
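A simplified sketch of a class-specific GMM representation; distances to component means and per-class log-likelihoods stand in for the paper's Euclidean and relative Euclidean distances:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def class_specific_representation(x, class_gmms):
    # class_gmms: one fitted GaussianMixture per class. For each class,
    # record the distance from x to the nearest component mean and the
    # GMM log-likelihood, then concatenate across classes.
    feats = []
    for gmm in class_gmms:
        d = np.linalg.norm(gmm.means_ - x, axis=1)
        feats.extend([d.min(), gmm.score(x[None])])
    return np.asarray(feats)

# Usage: fit one GMM per class on that class's training features
# (random arrays here are stand-ins for real features).
X0, X1 = np.random.rand(200, 32), np.random.rand(150, 32)
gmms = [GaussianMixture(n_components=3).fit(X) for X in (X0, X1)]
rep = class_specific_representation(np.random.rand(32), gmms)
```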
25
Zhang C, Cheng J, Tian Q. Incremental Codebook Adaptation for Visual Representation and Categorization. IEEE Trans Cybern 2018; 48:2012-2023. [PMID: 28749362 DOI: 10.1109/tcyb.2017.2726079]
Abstract
The bag-of-visual-words model is widely used for visual content analysis. For visual data, the codebook plays an important role in efficient representation. However, the codebook has to be relearned as the training images change, and once the codebook changes, the encoding parameters of local features have to be recomputed. To alleviate this problem, in this paper we propose an incremental codebook adaptation method for efficient visual representation. Instead of learning a new codebook, we gradually adapt a prelearned codebook using new images in an incremental way, making changes to it under a sparsity constraint and low-rank correlation. Besides, we also encode visually similar local features within a neighborhood to take advantage of locality information and ensure the encoded parameters are consistent. To evaluate the effectiveness of the proposed method, we apply it to categorization tasks on several public image datasets. Experimental results prove the effectiveness and usefulness of the proposed method over other codebook-based methods.
26
Peng Y, He X, Zhao J. Object-Part Attention Model for Fine-Grained Image Classification. IEEE Trans Image Process 2018; 27:1487-1500. [PMID: 29990123 DOI: 10.1109/tip.2017.2774041]
Abstract
Fine-grained image classification aims to recognize hundreds of subcategories belonging to the same basic-level category, such as 200 subcategories of birds, which is highly challenging due to large variance within a subcategory and small variance among different subcategories. Existing methods generally first locate the objects or parts and then discriminate which subcategory the image belongs to. However, they mainly have two limitations: 1) relying on object or part annotations, which are heavily labor-intensive; and 2) ignoring the spatial relationships between the object and its parts as well as among the parts, both of which are significantly helpful for finding discriminative parts. Therefore, this paper proposes the object-part attention model (OPAM) for weakly supervised fine-grained image classification, with two main novelties: 1) the object-part attention model integrates two levels of attention: object-level attention localizes objects in images, and part-level attention selects discriminative parts of the object. Both are jointly employed to learn multi-view and multi-scale features and enhance their mutual promotion. 2) The object-part spatial constraint model combines two spatial constraints: the object spatial constraint ensures that selected parts are highly representative, and the part spatial constraint eliminates redundancy and enhances the discrimination of selected parts. Both are jointly employed to exploit the subtle and local differences that distinguish the subcategories. Importantly, neither object nor part annotations are used in our proposed approach, which avoids the heavy labor cost of labeling. Compared with more than ten state-of-the-art methods on four widely used datasets, our OPAM approach achieves the best performance.
27
Ship Classification Based on MSHOG Feature and Task-Driven Dictionary Learning with Structured Incoherent Constraints in SAR Images. Remote Sens 2018. [DOI: 10.3390/rs10020190]
Abstract
In this paper, we present a novel method for ship classification in synthetic aperture radar (SAR) images. The proposed method consists of feature extraction and classifier training. Inspired by the SAR-HOG feature in automatic target recognition, we first design a novel feature named MSHOG by improving SAR-HOG, adapting it to ship classification, and employing manifold learning for dimensionality reduction. Then, we train the classifier and dictionary jointly in the task-driven dictionary learning (TDDL) framework. To further improve the performance of TDDL, we enforce structured incoherent constraints on it and develop an efficient algorithm for solving the corresponding optimization problem. Extensive experiments performed on two datasets with TerraSAR-X images demonstrate that the proposed method, the MSHOG feature with TDDL under structured incoherent constraints, outperforms other existing methods and achieves state-of-the-art performance.
28
Laplace Graph Embedding Class Specific Dictionary Learning for Face Recognition. J Electr Comput Eng 2018. [DOI: 10.1155/2018/2179049]
Abstract
The sparse representation based classification (SRC) and collaborative representation based classification (CRC) methods have attracted increasing attention in recent years due to their promising results and robustness. However, both SRC and CRC directly use the training samples as the dictionary, which leads to a large fitting error. In this paper, we propose the Laplace graph embedding class specific dictionary learning (LGECSDL) algorithm, which trains a weight matrix and embeds a Laplace graph to reconstruct the dictionary. First, it can increase the dimension of the dictionary matrix, which helps classify small-sample databases. Second, it assigns different dictionary atoms different weights to improve classification accuracy. Additionally, in each class dictionary training process, the LGECSDL algorithm introduces the Laplace graph embedding method into the objective function in order to keep the local structure of each class, and the proposed method is capable of improving face recognition performance through class-specific dictionary learning and the Laplace graph embedding regularizer. Moreover, we also extend the proposed method to an arbitrary kernel space. Extensive experimental results on several face recognition benchmark databases demonstrate the superior performance of our proposed algorithm.
29
Porikli F. Optimal Couple Projections for Domain Adaptive Sparse Representation-Based Classification. IEEE Trans Image Process 2017; 26:5922-5935. [PMID: 28858805 DOI: 10.1109/tip.2017.2745684]
Abstract
In recent years, sparse representation-based classification (SRC) has been one of the most successful methods and has shown impressive performance in various classification tasks. However, when the training data have a different distribution than the testing data, the learned sparse representation may not be optimal, and the performance of SRC will be degraded significantly. To address this problem, in this paper we propose an optimal couple projections for domain-adaptive SRC (OCPD-SRC) method, in which the discriminative features of data in the two domains are simultaneously learned with a dictionary that can succinctly represent the training and testing data in the projected space. OCPD-SRC is designed based on the decision rule of SRC, with the objective of learning coupled projection matrices and a common discriminative dictionary such that the between-class sparse reconstruction residuals of data from both domains are maximized and the within-class sparse reconstruction residuals are minimized in the projected low-dimensional space. Thus, the resulting representations can well fit SRC and simultaneously have better discriminant ability. In addition, our method can be easily extended to multiple domains and can be kernelized to deal with the nonlinear structure of data. The optimal solution for the proposed method can be efficiently obtained following the alternating optimization method. Extensive experimental results on a series of benchmark databases show that our method is better than or comparable to many state-of-the-art methods.
30
Monga V. Fast Low-Rank Shared Dictionary Learning for Image Classification. IEEE Trans Image Process 2017; 26:5160-5175. [PMID: 28742035 DOI: 10.1109/tip.2017.2729885]
Abstract
Despite the fact that different objects possess distinct class-specific features, they also usually share common patterns. This observation has been exploited partially in a recently proposed dictionary learning framework by separating the particularity and the commonality (COPAR). Inspired by this, we propose a novel method to explicitly and simultaneously learn a set of common patterns as well as class-specific features for classification with more intuitive constraints. Our dictionary learning framework is hence characterized by both a shared dictionary and particular (class-specific) dictionaries. For the shared dictionary, we enforce a low-rank constraint, i.e., claim that its spanning subspace should have low dimension and the coefficients corresponding to this dictionary should be similar. For the particular dictionaries, we impose on them the well-known constraints stated in the Fisher discrimination dictionary learning (FDDL). Furthermore, we develop new fast and accurate algorithms to solve the subproblems in the learning step, accelerating its convergence. The said algorithms could also be applied to FDDL and its extensions. The efficiencies of these algorithms are theoretically and experimentally verified by comparing their complexities and running time with those of other well-known dictionary learning methods. Experimental results on widely used image data sets establish the advantages of our method over the state-of-the-art dictionary learning methods.
31
Qu Y, Lin L, Shen F, Lu C, Wu Y, Xie Y, Tao D. Joint Hierarchical Category Structure Learning and Large-Scale Image Classification. IEEE Trans Image Process 2017; 26:4331-4346. [PMID: 27723591 DOI: 10.1109/tip.2016.2615423]
Abstract
We investigate the scalable image classification problem with a large number of categories. Hierarchical visual data structures are helpful for improving the efficiency and performance of large-scale multi-class classification. We propose a novel image classification method based on learning hierarchical inter-class structures. Specifically, we first design a fast algorithm to compute the similarity metric between categories, based on which a visual tree is constructed by hierarchical spectral clustering. Using the learned visual tree, a test sample label is efficiently predicted by searching for the best path over the entire tree. The proposed method is extensively evaluated on the ILSVRC2010 and Caltech 256 benchmark datasets. The experimental results show that our method obtains significantly better category hierarchies than other state-of-the-art visual tree-based methods and, therefore, much more accurate classification.
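One level of the visual-tree construction can be sketched as spectral clustering over a class-similarity matrix; the Gaussian similarity on class-mean features is an assumption standing in for the paper's fast similarity metric, and the function name is hypothetical:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def split_categories(class_means, n_groups=2):
    # class_means: (num_classes, dim) mean feature per category.
    # Build a similarity matrix between categories, then partition them
    # into groups; applying this recursively yields a category tree.
    d2 = np.square(class_means[:, None] - class_means[None, :]).sum(-1)
    sim = np.exp(-d2 / d2.mean())
    labels = SpectralClustering(n_clusters=n_groups,
                                affinity="precomputed").fit_predict(sim)
    return [np.where(labels == g)[0] for g in range(n_groups)]
```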
32
Wang X, Gu Y. Cross-Label Suppression: A Discriminative and Fast Dictionary Learning With Group Regularization. IEEE Trans Image Process 2017; 26:3859-3873. [PMID: 28500002 DOI: 10.1109/tip.2017.2703101]
Abstract
This paper addresses image classification through learning a compact and discriminative dictionary efficiently. Given a structured dictionary with each atom (column in the dictionary matrix) related to some label, we propose a cross-label suppression constraint to enlarge the difference among representations for different classes. Meanwhile, we introduce group regularization to enforce representations to preserve the label properties of the original samples, meaning that representations for the same class are encouraged to be similar. With the cross-label suppression, we do not resort to the frequently used ℓ0-norm or ℓ1-norm for coding, and obtain computational efficiency without losing discriminative power for categorization. Moreover, two simple classification schemes are also developed to take full advantage of the learnt dictionary. Extensive experiments on six datasets, covering face recognition, object categorization, scene classification, texture recognition, and sport action categorization, are conducted, and the results show that the proposed approach can outperform many recently presented dictionary learning algorithms in both recognition accuracy and computational efficiency.
33
Zhang C, Liang C, Li L, Liu J, Huang Q, Tian Q. Fine-Grained Image Classification via Low-Rank Sparse Coding With General and Class-Specific Codebooks. IEEE Trans Neural Netw Learn Syst 2017; 28:1550-1559. [PMID: 28060711 DOI: 10.1109/tnnls.2016.2545112]
Abstract
This paper tries to separate fine-grained images by jointly learning the encoding parameters and codebooks through low-rank sparse coding (LRSC) with general and class-specific codebook generation. Instead of treating each local feature independently, we encode the local features within a spatial region jointly by LRSC. This ensures that spatially nearby local features with similar visual characteristics are encoded by correlated parameters. In this way, we can make the encoded parameters more consistent for fine-grained image representation. Besides, we also learn a general codebook and a number of class-specific codebooks in combination with the encoding scheme. Since images of fine-grained classes are visually similar, the difference between the general codebook and each class-specific codebook is relatively small. We impose sparsity constraints to model this relationship. Moreover, the incoherences among different codebooks and class-specific codebooks are jointly considered. We evaluate the proposed method on several public image datasets. The experimental results show that by learning general and class-specific codebooks with joint encoding of local features, we are able to model the differences among fine-grained classes better than many other fine-grained image classification methods.
34
Han J, Yue J, Zhang Y, Bai L. Local structure preserving sparse coding for infrared target recognition. PLoS One 2017; 12:e0173613. [PMID: 28323824 PMCID: PMC5360252 DOI: 10.1371/journal.pone.0173613]
Abstract
Sparse coding performs well in image classification. However, robust target recognition requires a large set of comprehensive template images, and the sparse learning process is complex. We incorporate sparsity into a template matching concept to construct a local sparse structure matching (LSSM) model for general infrared target recognition. A local structure preserving sparse coding (LSPSc) formulation is proposed to simultaneously preserve the local sparse and structural information of objects. By adding a spatial local structure constraint to the classical sparse coding algorithm, LSPSc can improve the stability of sparse representation for targets and inhibit background interference in infrared images. Furthermore, a kernel LSPSc (K-LSPSc) formulation is proposed, which extends LSPSc to the kernel space to weaken the influence of the linear structure constraint in nonlinear natural data. Because of its anti-interference and fault-tolerant capabilities, both LSPSc- and K-LSPSc-based LSSM can implement target identification based on a simple template set, which needs only several images containing enough local sparse structures to learn a sufficient sparse structure dictionary of a target class. Specifically, this LSSM approach has stable performance in target detection under scene, shape and occlusion variations. High performance is demonstrated on several datasets, indicating robust infrared target recognition in diverse environments and imaging conditions.
Affiliation(s)
- Jing Han
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Jiang Yue
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Yi Zhang
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Lianfa Bai
- Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
35
Zhang D. A Locality-Constrained and Label Embedding Dictionary Learning Algorithm for Image Classification. IEEE Trans Neural Netw Learn Syst 2017; 28:278-293. [PMID: 28055916 DOI: 10.1109/tnnls.2015.2508025]
Abstract
Locality and label information of training samples play an important role in image classification. However, previous dictionary learning algorithms do not take the locality and label information of atoms into account together in the learning process, and thus their performance is limited. In this paper, a discriminative dictionary learning algorithm, called the locality-constrained and label embedding dictionary learning (LCLE-DL) algorithm, was proposed for image classification. First, the locality information was preserved using the graph Laplacian matrix of the learned dictionary instead of the conventional one derived from the training samples. Then, the label embedding term was constructed using the label information of atoms instead of the classification error term, which contained discriminating information of the learned dictionary. The optimal coding coefficients derived by the locality-based and label-based reconstruction were effective for image classification. Experimental results demonstrated that the LCLE-DL algorithm can achieve better performance than some state-of-the-art algorithms.
36
Zhang C, Zhu G, Huang Q, Tian Q. Image classification by search with explicitly and implicitly semantic representations. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.10.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
37
Wang M, Xie J, Zhu F, Fang Y. Linear discrimination dictionary learning for shape descriptors. Pattern Recognit Lett 2016. [DOI: 10.1016/j.patrec.2016.05.028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
38
Liu BD, Shen B, Gui L, Wang YX, Li X, Yan F, Wang YJ. Face recognition using class specific dictionary learning for sparse representation and collaborative representation. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.08.128] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
39
Babaee M, Wolf T, Rigoll G. Toward semantic attributes in dictionary learning and non-negative matrix factorization. Pattern Recognit Lett 2016. [DOI: 10.1016/j.patrec.2016.06.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
40
Wu F, Jing XY, Yue D. Multi-view Discriminant Dictionary Learning via Learning View-specific and Shared Structured Dictionaries for Image Classification. Neural Process Lett 2016. [DOI: 10.1007/s11063-016-9545-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
41
Do MN. Weakly Supervised Fine-Grained Categorization With Part-Based Image Representation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2016; 25:1713-1725. [PMID: 26890872 DOI: 10.1109/tip.2016.2531289] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
In this paper, we propose a fine-grained image categorization system that is easy to deploy. We do not use any object or part annotations (i.e., the method is weakly supervised) in either the training or the testing stage; only class labels of training images are required. Fine-grained image categorization aims to classify objects with only subtle distinctions (e.g., two breeds of dogs that look alike). Most existing works rely heavily on object/part detectors to build correspondences between object parts, which requires accurate object or part annotations at least for the training images; the cost of such annotations prevents the wide adoption of these methods. Instead, we propose to generate multi-scale part proposals from object proposals, select useful part proposals, and use them to compute a global image representation for categorization. This design targets the weakly supervised fine-grained setting, where useful parts have been shown to play a critical role in existing annotation-dependent works but accurate part detectors are hard to acquire. With the proposed image representation, we can further detect and visualize the key (most discriminative) parts of objects in different classes. In the experiments, the proposed weakly supervised method achieves accuracy comparable to or better than state-of-the-art weakly supervised methods and most existing annotation-dependent methods on three challenging datasets. Its success suggests that learning expensive object/part detectors is not always necessary for fine-grained image categorization.
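To make the proposal-based pipeline concrete, here is a minimal Python sketch of the flow described above: object proposals, nested multi-scale part proposals, part selection, and pooling into a global representation. The proposal generator (random boxes), part descriptor (mean colour), and selection rule (distance from the mean feature) are crude stand-ins for the learned components in the paper, so this shows structure only.

    import numpy as np

    def object_proposals(image, n=8, seed=0):
        """Stand-in for an object proposal generator (e.g., selective search):
        returns n random boxes (x, y, w, h). Illustrative only."""
        rng = np.random.default_rng(seed)
        H, W = image.shape[:2]
        xs = rng.integers(0, W // 2, n)
        ys = rng.integers(0, H // 2, n)
        ws = rng.integers(W // 4, W // 2, n)
        hs = rng.integers(H // 4, H // 2, n)
        return np.stack([xs, ys, ws, hs], axis=1)

    def part_proposals(box, scales=(0.5, 0.75)):
        """Multi-scale part proposals nested inside one object proposal."""
        x, y, w, h = box
        parts = []
        for s in scales:
            pw, ph = int(w * s), int(h * s)
            for dx in (0, w - pw):
                for dy in (0, h - ph):
                    parts.append((x + dx, y + dy, pw, ph))
        return parts

    def part_feature(image, box):
        """Stand-in part descriptor: mean colour of the crop. A real system
        would pool CNN activations over the part region instead."""
        x, y, w, h = box
        crop = image[y:y + h, x:x + w]
        return crop.reshape(-1, image.shape[2]).mean(axis=0)

    def select_parts(feats, k=4):
        """Keep the k parts farthest from the mean feature, a crude stand-in
        for the paper's learned part selection."""
        d = np.linalg.norm(feats - feats.mean(axis=0), axis=1)
        return feats[np.argsort(d)[-k:]]

    def image_representation(image):
        """Global representation: features of selected parts, max-pooled."""
        feats = np.array([part_feature(image, p)
                          for b in object_proposals(image)
                          for p in part_proposals(b)])
        return select_parts(feats).max(axis=0)

    img = np.random.default_rng(1).random((128, 128, 3))
    print(image_representation(img).shape)   # (3,) with the toy colour feature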
42
Williams K. A Feature Learning and Object Recognition Framework for Underwater Fish Images. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2016; 25:1862-1872. [PMID: 26930683 DOI: 10.1109/tip.2016.2535342] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Live fish recognition is one of the most crucial elements of fisheries survey applications, in which vast amounts of data are rapidly acquired. Unlike general scenarios, underwater image recognition is challenged by poor image quality, uncontrolled objects and environments, and the difficulty of acquiring representative samples. In addition, most existing feature extraction techniques cannot be fully automated because they involve human supervision. To this end, we propose an underwater fish recognition framework that consists of a fully unsupervised feature learning technique and an error-resilient classifier. Object parts are initialized based on saliency, and relaxation labeling is used to match them correctly. A non-rigid part model is then learned based on fitness, separation, and discrimination criteria. For the classifier, an unsupervised clustering approach generates a binary class hierarchy in which each node is a classifier. To exploit information from ambiguous images, the notion of partial classification is introduced: coarse labels are assigned by optimizing the benefit of the indecision made by the classifier. Experiments show that the proposed framework achieves high accuracy on both public and self-collected underwater fish images with high uncertainty and class imbalance.
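The binary class hierarchy and the partial-classification idea admit a short numpy-only sketch: classes are split recursively by a simple 2-means on class centroids, and at test time the search descends the tree but returns a coarse label (the set of classes under the current node) whenever the decision margin is too small. The clustering, the nearest-mean node classifier, and the fixed margin are simplifications assumed here, not the paper's components.

    import numpy as np

    def two_means(points, n_iter=10, seed=0):
        """Crude 2-means used to split a set of class centroids into two groups."""
        rng = np.random.default_rng(seed)
        means = points[rng.choice(len(points), 2, replace=False)].copy()
        assign = np.zeros(len(points), dtype=int)
        for _ in range(n_iter):
            assign = np.argmin(((points[:, None] - means[None]) ** 2).sum(-1), axis=1)
            for g in (0, 1):
                if (assign == g).any():
                    means[g] = points[assign == g].mean(axis=0)
        return means, assign

    def build_hierarchy(centroids, labels):
        """Recursively split classes in two, yielding a binary class hierarchy.
        A leaf is a label; an inner node is (means, left_subtree, right_subtree)."""
        if len(labels) == 1:
            return labels[0]
        means, assign = two_means(centroids)
        left = [i for i in range(len(labels)) if assign[i] == 0]
        right = [i for i in range(len(labels)) if assign[i] == 1]
        if not left or not right:                  # degenerate split: halve
            left = list(range(len(labels) // 2))
            right = list(range(len(labels) // 2, len(labels)))
        return (means,
                build_hierarchy(centroids[left], [labels[i] for i in left]),
                build_hierarchy(centroids[right], [labels[i] for i in right]))

    def leaves(node):
        if not isinstance(node, tuple):
            return [node]
        return leaves(node[1]) + leaves(node[2])

    def classify(node, x, margin=0.1):
        """Descend the hierarchy with nearest-mean decisions; on an ambiguous
        decision, stop and return the coarse label set (partial classification)."""
        if not isinstance(node, tuple):
            return node
        means, L, R = node
        d = np.linalg.norm(x - means, axis=1)
        if abs(d[0] - d[1]) < margin:              # indecision at this node
            return leaves(node)
        return classify(L if d[0] < d[1] else R, x, margin)

    rng = np.random.default_rng(1)
    tree = build_hierarchy(rng.random((4, 8)), ["cod", "haddock", "pollock", "herring"])
    print(classify(tree, rng.random(8)))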
43
Zhang X, Xiong H, Zhou W, Tian Q. Fused One-vs-All Features With Semantic Alignments for Fine-Grained Visual Categorization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2016; 25:878-892. [PMID: 26685247 DOI: 10.1109/tip.2015.2509425] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Fine-grained visual categorization is an emerging research area that has been attracting growing attention. Due to the large inter-class similarity and intra-class variance, recognizing objects in fine-grained domains is extremely challenging. A traditional spatial pyramid matching model can obtain desirable results for basic-level category classification through weak alignment, but it easily fails in fine-grained domains, where the discriminative features are extremely localized. This paper proposes a new framework for fine-grained visual categorization. First, an efficient part localization method incorporates semantic priors into geometric alignment: it detects the less deformable parts, such as the heads of birds, with a template-based model and localizes other, highly deformable parts with simple geometric alignment. Second, we learn one-vs-all features, which are simple and transplantable; the learned mid-level features are dimension-friendly and more robust to outlier instances. Furthermore, since some subcategories are too similar to tell apart easily, we fuse subcategories iteratively according to their similarities and learn fused one-vs-all features. Experimental results show the superior performance of our algorithms over existing methods.
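A small sketch of the fused one-vs-all idea, with ridge-regression scorers standing in for whatever classifier the paper trains and cosine similarity of scorer weights standing in for its subcategory similarity measure: the vector of one-vs-all decision values is the mid-level feature, and the two most similar subcategories are fused before retraining.

    import numpy as np

    def ova_scorers(X, y, classes, lam=1e-2):
        """One linear scorer per class via ridge regression, a stand-in for
        the per-class classifiers a real system would train."""
        d = X.shape[1]
        A = X.T @ X + lam * np.eye(d)
        W = [np.linalg.solve(A, X.T @ np.where(y == c, 1.0, -1.0)) for c in classes]
        return np.array(W)

    def midlevel_features(X, W):
        """The vector of one-vs-all decision values is the mid-level feature."""
        return X @ W.T

    def fuse_most_similar(y, classes, W):
        """Fuse the two classes whose scorers are most alike (cosine similarity
        of weight vectors), mimicking one step of the iterative fusion."""
        Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
        S = Wn @ Wn.T
        np.fill_diagonal(S, -np.inf)
        i, j = np.unravel_index(np.argmax(S), S.shape)
        merged = classes[i] + "+" + classes[j]
        y = np.where(np.isin(y, [classes[i], classes[j]]), merged, y)
        classes = [c for k, c in enumerate(classes) if k not in (i, j)] + [merged]
        return y, classes

    rng = np.random.default_rng(0)
    X = rng.random((60, 16))
    y = np.array(["sub%d" % (k % 3) for k in range(60)])
    classes = ["sub0", "sub1", "sub2"]
    W = ova_scorers(X, y, classes)
    y2, classes2 = fuse_most_similar(y, classes, W)
    F = np.hstack([midlevel_features(X, W),
                   midlevel_features(X, ova_scorers(X, y2, classes2))])
    print(F.shape)   # original one-vs-all scores concatenated with fused scores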
44

45
Zhang C, Cheng J, Liu J, Pang J, Huang Q, Tian Q. Beyond Explicit Codebook Generation: Visual Representation Using Implicitly Transferred Codebooks. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2015; 24:5777-5788. [PMID: 26441449 DOI: 10.1109/tip.2015.2485783] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
The bag-of-visual-words model plays an important role in visual applications. Local features are first extracted and then encoded to obtain a histogram-based image representation; encoding requires a proper codebook. Usually, the codebook has to be generated for each data set, which makes it data set dependent, and it may be biased when only a limited number of training images is available. Moreover, the codebook has to be pre-learned and cannot be updated quickly, which is problematic for online visual applications. To solve these problems, in this paper we propose a novel implicit codebook transfer method for visual representation. Instead of explicitly generating a codebook for the new data set, we make use of pre-learned codebooks via non-linear transfer: the pre-learned codebooks are transferred with a non-linear transformation and used to reconstruct local features under sparsity constraints. The codebook thus does not need to be explicitly generated; it is implicitly transferred. In this way, we can reuse pre-learned codebooks for new visual applications by implicitly learning the codebook and the corresponding encoding parameters for image representation. We apply the proposed method to image classification and evaluate it on several public image data sets. Experimental results demonstrate the effectiveness and efficiency of the proposed method.
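A minimal numpy sketch of the encoding step, under stated assumptions: take a codebook pre-learned on another data set, apply a non-linear transformation (a placeholder tanh map here; the paper learns its transform), and reconstruct the new data set's local features against it under an L1 sparsity constraint via plain ISTA. All names and parameters are illustrative.

    import numpy as np

    def transfer_codebook(B, gamma=1.0):
        """Non-linear transfer of a pre-learned codebook: an elementwise tanh
        map followed by re-normalisation of the atoms. The specific map is a
        placeholder for the learned transformation in the paper."""
        Bt = np.tanh(gamma * B)
        return Bt / np.linalg.norm(Bt, axis=1, keepdims=True)

    def sparse_codes(X, B, lam=0.1, n_iter=200):
        """Encode local features against the transferred codebook with an
        L1 penalty, via plain ISTA (iterative soft-thresholding)."""
        A = np.zeros((X.shape[0], B.shape[0]))
        L = np.linalg.norm(B @ B.T, 2)          # Lipschitz constant of the gradient
        for _ in range(n_iter):
            G = (A @ B - X) @ B.T               # gradient of 0.5 * ||X - A B||^2
            A = A - G / L
            A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)   # soft threshold
        return A

    rng = np.random.default_rng(0)
    B_src = rng.standard_normal((64, 32))       # codebook pre-learned on another data set
    X_new = rng.standard_normal((10, 32))       # local features of the new data set
    codes = sparse_codes(X_new, transfer_codebook(B_src))
    print(codes.shape, (np.abs(codes) > 1e-8).mean())   # pooling into histograms would follow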
46
Multiple graph regularized sparse coding and multiple hypergraph regularized sparse coding for image representation. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.11.067] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
47