1
Hosseinzadeh Taher MR, Haghighi F, Gotway MB, Liang J. Large-scale benchmarking and boosting transfer learning for medical image analysis. Med Image Anal 2025; 102:103487. [PMID: 40117988] [DOI: 10.1016/j.media.2025.103487]
Abstract
Transfer learning, particularly fine-tuning models pretrained on photographic images to medical images, has proven indispensable for medical image analysis. There are numerous models with distinct architectures pretrained on various datasets using different strategies, but there is a lack of up-to-date large-scale evaluations of their transferability to medical imaging, posing a challenge for practitioners in selecting the most appropriate pretrained models for their tasks at hand. To fill this gap, we conduct a comprehensive systematic study, focusing on (i) benchmarking numerous conventional and modern convolutional neural network (ConvNet) and vision transformer architectures across various medical tasks; (ii) investigating the impact of fine-tuning data size on the performance of ConvNets compared with vision transformers in medical imaging; (iii) examining the impact of pretraining data granularity on transfer learning performance; (iv) evaluating the transferability of a wide range of recent self-supervised methods with diverse training objectives to a variety of medical tasks across different modalities; and (v) delving into the efficacy of domain-adaptive pretraining on both photographic and medical datasets to develop high-performance models for medical tasks. Our large-scale study (∼5,000 experiments) yields impactful insights: (1) ConvNets demonstrate higher transferability than vision transformers when fine-tuning for medical tasks; (2) ConvNets prove to be more annotation-efficient than vision transformers when fine-tuning for medical tasks; (3) fine-grained representations, rather than high-level semantic features, prove pivotal for fine-grained medical tasks; (4) self-supervised models excel in learning holistic features compared with supervised models; and (5) domain-adaptive pretraining leads to performant models by harnessing knowledge acquired from ImageNet and enhancing it through readily accessible expert annotations associated with medical datasets. As open science, all code and pretrained models are available at GitHub.com/JLiangLab/BenchmarkTransferLearning (Version 2).
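As a minimal illustration of the transfer-learning recipe the study benchmarks (not the authors' released pipeline; see their GitHub for that), the sketch below fine-tunes an ImageNet-pretrained ConvNet on a hypothetical multi-label medical classification task; the class count and loss choice are placeholder assumptions.

```python
# Sketch: fine-tuning an ImageNet-pretrained ResNet-50 on a medical task.
import torch
import torch.nn as nn
from torchvision import models

def build_finetune_model(num_classes: int) -> nn.Module:
    # Load ImageNet-pretrained weights, then replace the classification head.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_finetune_model(num_classes=14)   # e.g., 14 thorax disease labels (assumption)
# Full fine-tuning: all layers are updated, typically with a small learning rate.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()             # multi-label chest X-ray setting (assumption)

x = torch.randn(8, 3, 224, 224)                # grayscale scans replicated to 3 channels
y = torch.randint(0, 2, (8, 14)).float()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```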
Affiliation(s)
- Fatemeh Haghighi
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
- Jianming Liang
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
2
Liu SL, Ding YN, Zhang JR, Liu KY, Zhang SF, Wang FL, Huang G. Multidimensional Refinement Graph Convolutional Network With Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition. IEEE Trans Neural Netw Learn Syst 2025; 36:7615-7626. [PMID: 38619962] [DOI: 10.1109/tnnls.2024.3384770]
Abstract
Graph convolutional networks (GCNs) have been widely used in skeleton-based action recognition. However, existing approaches are limited in fine-grained action recognition due to the similarity of interclass data. Moreover, the noisy data from pose extraction increase the challenge of fine-grained recognition. In this work, we propose a flexible attention block called channel-variable spatial-temporal attention (CVSTA) to enhance the discriminative power of spatial-temporal joints and obtain a more compact intraclass feature distribution. Based on CVSTA, we construct a multidimensional refinement GCN (MDR-GCN) that can improve the discrimination among channel-, joint-, and frame-level features for fine-grained actions. Furthermore, we propose a robust decouple loss (RDL) that significantly boosts the effect of the CVSTA and reduces the impact of noise. The proposed method combining MDR-GCN with RDL outperforms the known state-of-the-art skeleton-based approaches on the fine-grained datasets FineGym99 and FSD-10, as well as on the coarse-grained NTU-RGB+D 120 dataset and the NTU-RGB+D X-view benchmark. Our code is publicly available at https://github.com/dingyn-Reno/MDR-GCN.
3
Bai X, Zhang P, Yu X, Zheng J, Hancock ER, Zhou J, Gu L. Learning From Human Attention for Attribute-Assisted Visual Recognition. IEEE Trans Pattern Anal Mach Intell 2024; 46:11152-11167. [PMID: 39259624] [DOI: 10.1109/tpami.2024.3458921]
Abstract
With prior knowledge of seen objects, humans have a remarkable ability to recognize novel objects using shared and distinct local attributes. This is significant for the challenging tasks of zero-shot learning (ZSL) and fine-grained visual classification (FGVC), where the discriminative attributes of objects have played an important role. Inspired by human visual attention, neural networks have widely exploited the attention mechanism to learn locally discriminative attributes for such challenging tasks. Though these works have greatly promoted the development of the field, they mainly focus on learning the region embeddings of different attribute features and neglect the importance of discriminative attribute localization. It is also unclear whether the learned attention truly matches real human attention. To tackle this problem, this paper proposes to employ real human gaze data for visual recognition networks to learn from human attention. Specifically, we design a unified Attribute Attention Network (A²Net) that learns from human attention for both ZSL and FGVC tasks. The overall model consists of an attribute attention branch and a baseline classification network. On top of the image feature maps provided by the baseline classification network, the attribute attention branch employs attribute prototypes to produce attribute attention maps and attribute features. The attribute attention maps are converted to gaze-like attentions to be aligned with real human gaze attention. To guarantee the effectiveness of attribute feature learning, we further align the extracted attribute features with attribute-defined class embeddings. To facilitate learning from human gaze attention for visual recognition problems, we design a bird classification game to collect real human gaze data on the CUB dataset via an eye-tracker device. Experiments on ZSL and FGVC tasks, both with and without real human gaze data, validate the benefits and accuracy of our proposed model. This work supports the promising benefits of collecting human gaze datasets and of automatic gaze estimation algorithms that learn from human attention for high-level computer vision tasks.
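The paper's exact alignment objective is not spelled out in the abstract; as a hedged sketch, one plausible way to align an attribute attention map with a human gaze heatmap is to treat both as spatial distributions and minimize a KL divergence between them:

```python
# Sketch: KL-based alignment of model attention with a gaze heatmap (assumption).
import torch
import torch.nn.functional as F

def gaze_alignment_loss(attn_map: torch.Tensor, gaze_map: torch.Tensor) -> torch.Tensor:
    """attn_map, gaze_map: (B, H, W) non-negative saliency maps."""
    b = attn_map.size(0)
    # Flatten and normalize each map into a spatial probability distribution.
    p_attn = F.log_softmax(attn_map.view(b, -1), dim=1)   # log-probs of model attention
    p_gaze = F.softmax(gaze_map.view(b, -1), dim=1)       # target gaze distribution
    return F.kl_div(p_attn, p_gaze, reduction="batchmean")

attn = torch.rand(4, 14, 14, requires_grad=True)
gaze = torch.rand(4, 14, 14)
loss = gaze_alignment_loss(attn, gaze)
loss.backward()
```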
4
Sikdar A, Liu Y, Kedarisetty S, Zhao Y, Ahmed A, Behera A. Interweaving Insights: High-Order Feature Interaction for Fine-Grained Visual Recognition. Int J Comput Vis 2024; 133:1755-1779. [PMID: 40160952] [PMCID: PMC11953118] [DOI: 10.1007/s11263-024-02260-y]
Abstract
This paper presents a novel approach for Fine-Grained Visual Classification (FGVC) by exploring Graph Neural Networks (GNNs) to facilitate high-order feature interactions, with a specific focus on constructing both inter- and intra-region graphs. Unlike previous FGVC techniques that often isolate global and local features, our method combines both features seamlessly during learning via graphs. Inter-region graphs capture long-range dependencies to recognize global patterns, while intra-region graphs delve into finer details within specific regions of an object by exploring high-dimensional convolutional features. A key innovation is the use of shared GNNs with an attention mechanism coupled with the Approximate Personalized Propagation of Neural Predictions (APPNP) message-passing algorithm, enhancing information propagation efficiency for better discriminability and simplifying the model architecture for computational efficiency. Additionally, the introduction of residual connections improves performance and training stability. Comprehensive experiments showcase state-of-the-art results on benchmark FGVC datasets, affirming the efficacy of our approach. This work underscores the potential of GNN in modeling high-level feature interactions, distinguishing it from previous FGVC methods that typically focus on singular aspects of feature representation. Our source code is available at https://github.com/Arindam-1991/I2-HOFI.
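The APPNP message-passing step mentioned here follows the published personalized-PageRank form; the sketch below shows that propagation rule on a generic dense adjacency (the paper's inter-/intra-region graph construction is not reproduced):

```python
# Sketch: APPNP propagation, Z <- (1 - alpha) * A_hat @ Z + alpha * H.
import torch

def appnp_propagate(h: torch.Tensor, adj: torch.Tensor,
                    alpha: float = 0.1, k: int = 10) -> torch.Tensor:
    """h: (N, D) node features; adj: (N, N) adjacency with self-loops."""
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.clamp(min=1e-12).pow(-0.5)
    a_hat = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)  # D^-1/2 A D^-1/2
    z = h
    for _ in range(k):
        z = (1 - alpha) * a_hat @ z + alpha * h    # teleport back to the initial features
    return z

h = torch.randn(49, 256)                           # e.g., 7x7 region nodes (assumption)
adj = (torch.rand(49, 49) > 0.7).float()
adj = ((adj + adj.t() + torch.eye(49)) > 0).float()  # symmetrize + self-loops
out = appnp_propagate(h, adj)
```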
Affiliation(s)
- Arindam Sikdar
- Department of Computer Science, Edge Hill University, Ormskirk, UK
- Yonghuai Liu
- Department of Computer Science, Edge Hill University, Ormskirk, UK
- Siddhardha Kedarisetty
- Department of Aerospace Engineering, Technion—Israel Institute of Technology, Haifa, Israel
- Yitian Zhao
- Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Amr Ahmed
- Department of Computer Science, Edge Hill University, Ormskirk, UK
- Ardhendu Behera
- Department of Computer Science, Edge Hill University, Ormskirk, UK
5
Yang S, Yang X, Wu J, Feng B. Significant feature suppression and cross-feature fusion networks for fine-grained visual classification. Sci Rep 2024; 14:24051. [PMID: 39402140] [PMCID: PMC11473661] [DOI: 10.1038/s41598-024-74654-4]
Abstract
Techniques that achieve fine-grained visual classification (FGVC) by locating different part regions and extracting their distinguishing features have improved significantly. Utilizing attention mechanisms for feature extraction has become one of the mainstream methods in computer vision, but these methods have certain limitations. They typically focus on the most discriminative regions and directly combine the features of these parts, neglecting other less prominent yet still discriminative regions. Additionally, these methods may not fully explore the intrinsic connections between higher-order and lower-order features to optimize model classification performance. By considering the potential relationships between different higher-order feature representations in the object image, we can enable the integrated higher-order features to contribute more significantly to the model's classification decision-making capabilities. To this end, we propose a saliency feature suppression and cross-feature fusion network model (SFSCF-Net) to explore interaction learning between different higher-order feature representations. SFSCF-Net comprises (1) an object-level image generator (OIG): the intersection of the output feature maps of the last two convolutional blocks of the backbone network is used as an object mask and mapped to the original image for cropping to obtain an object-level image, which can effectively reduce the interference caused by complex backgrounds; (2) a saliency feature suppression module (SFSM): the most distinguishing part of the object image is obtained by a feature extractor and masked by a two-dimensional suppression method, which improves the accuracy of feature suppression; and (3) a cross-feature fusion method (CFM) based on inter-layer interaction: the output feature maps of different network layers are interactively integrated to obtain high-dimensional features, which are then channel-compressed to obtain the inter-layer interaction feature representation, enriching the semantic information of the output features. The proposed SFSCF-Net can be trained end-to-end and achieves state-of-the-art or competitive results on four FGVC benchmark datasets.
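A hedged re-creation of the object-level image generator (OIG) as the abstract describes it: the activation maps of the last two backbone blocks are upsampled, binarized, and intersected into an object mask whose bounding box crops the input. The mean-activation threshold is an assumption:

```python
# Sketch: object-level crop from the intersection of two activation maps.
import torch
import torch.nn.functional as F

def object_level_crop(image: torch.Tensor, feat4: torch.Tensor,
                      feat5: torch.Tensor) -> torch.Tensor:
    """image: (C, H, W); feat4/feat5: (C', h, w) feature maps of one sample."""
    H, W = image.shape[1:]
    masks = []
    for feat in (feat4, feat5):
        act = feat.mean(dim=0, keepdim=True)              # channel-mean activation map
        act = F.interpolate(act[None], size=(H, W), mode="bilinear",
                            align_corners=False)[0, 0]
        masks.append(act > act.mean())                    # binarize at the mean (assumption)
    obj_mask = masks[0] & masks[1]                        # intersection of the two maps
    ys, xs = torch.where(obj_mask)
    if ys.numel() == 0:                                   # fallback: keep the full image
        return image
    crop = image[:, ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Resize the object-level crop back to the network input size.
    return F.interpolate(crop[None], size=(H, W), mode="bilinear",
                         align_corners=False)[0]

out = object_level_crop(torch.randn(3, 224, 224),
                        torch.randn(1024, 14, 14), torch.randn(2048, 7, 7))
```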
Affiliation(s)
- Shengying Yang
- Zhejiang University of Science and Technology, Hangzhou, 310023, China
- Xinqi Yang
- Zhejiang University of Science and Technology, Hangzhou, 310023, China
- Jianfeng Wu
- Zhejiang Shuren University, Hangzhou, 310023, China
- Boyang Feng
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, 3000, Australia
6
Li R, Huang Y, Wang Y, Song C, Lai X. MRI-based deep learning for differentiating between bipolar and major depressive disorders. Psychiatry Res Neuroimaging 2024; 345:111907. [PMID: 39357171] [DOI: 10.1016/j.pscychresns.2024.111907]
Abstract
Mood disorders, particularly bipolar disorder (BD) and major depressive disorder (MDD), manifest changes in brain structure that can be detected using structural magnetic resonance imaging (MRI). Although structural MRI is a promising diagnostic tool, prevailing diagnostic criteria for BD and MDD are predominantly subjective, sometimes leading to misdiagnosis. This challenge is compounded by a limited understanding of the underlying causes of these disorders. In response, we present SE-ResNet, a Residual Network (ResNet)-based framework designed to discriminate between BD, MDD, and healthy controls (HC) using structural MRI data. Our approach extends the traditional Squeeze-and-Excitation (SE) layer by incorporating a dedicated branch for spatial attention map generation, equipped with soft-pooling, a 7 × 7 convolution, and a sigmoid function, intended to detect complex spatial patterns. The fusion of channel and spatial attention maps through element-wise addition aims to enhance the model's ability to discriminate features. Unlike conventional methods that use max-pooling for downsampling, our methodology employs soft-pooling, which aims to preserve a richer representation of input features and reduce data loss. When evaluated on a proprietary dataset comprising 303 subjects, the SE-ResNet achieved an accuracy of 85.8%, a recall of 85.7%, a precision of 85.9%, and an F1 score of 85.8%. These performance metrics suggest that the SE-ResNet framework has potential as a tool for detecting psychiatric disorders using structural MRI data.
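A hedged sketch of the extended SE layer as described: the usual channel squeeze-and-excitation branch plus a spatial branch built from channel-wise soft-pooling, a 7 × 7 convolution, and a sigmoid, fused by element-wise addition. The soft-pool formulation (softmax-weighted averaging) and branch details are assumptions:

```python
# Sketch: SE block with an added spatial-attention branch (assumed details).
import torch
import torch.nn as nn

class SESpatialBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
        self.spatial_conv = nn.Conv2d(1, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        ch_attn = self.channel_mlp(x).view(b, c, 1, 1)       # (B, C, 1, 1)
        # Soft-pool across the channel axis: a softmax-weighted average,
        # which keeps more information than max-pooling.
        weights = torch.softmax(x, dim=1)
        sp_desc = (weights * x).sum(dim=1, keepdim=True)     # (B, 1, H, W)
        sp_attn = torch.sigmoid(self.spatial_conv(sp_desc))  # (B, 1, H, W)
        attn = ch_attn + sp_attn                             # element-wise (broadcast) addition
        return x * attn

block = SESpatialBlock(64)
y = block(torch.randn(2, 64, 28, 28))
```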
Affiliation(s)
- Ruipeng Li
- Third People's Hospital of Hangzhou, Hangzhou, 310010, China
- Yueqi Huang
- Seventh People's Hospital of Hangzhou, Hangzhou, 310013, China
- Yanbin Wang
- Third People's Hospital of Hangzhou, Hangzhou, 310010, China
- Chen Song
- Third People's Hospital of Hangzhou, Hangzhou, 310010, China
- Xiaobo Lai
- School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, 310053, China
7
Xie F, Xu P, Xi X, Gu X, Zhang P, Wang H, Shen X. Oral mucosal disease recognition based on dynamic self-attention and feature discriminant loss. Oral Dis 2024; 30:3094-3107. [PMID: 37731172] [DOI: 10.1111/odi.14732]
Abstract
OBJECTIVES To develop a dynamic self-attention and feature discrimination loss function (DSDF) model for identifying oral mucosal diseases, addressing the problems of data imbalance, complex image backgrounds, and the high visual similarity among, and variation within, different types of lesion areas. METHODS In DSDF, the dynamic self-attention network fully mines the context information between adjacent areas, improves the visual representation of the network, and encourages the model to learn and localize image regions of interest. The feature discrimination loss function then constrains the diversity of channel characteristics, so as to enhance the feature discrimination ability in locally similar areas. RESULTS The experimental results show that the proposed method achieves the highest recognition accuracy for oral mucosal disease at 91.16%, about 6% ahead of other advanced methods. In addition, DSDF achieves a recall of 90.87% and an F1 score of 90.60%. CONCLUSIONS Convolutional neural networks can effectively capture the visual features of oral mucosal disease lesions, and the distinguishing visual features of different oral lesions can be better extracted using dynamic self-attention and the feature discrimination loss function, which is conducive to the auxiliary diagnosis of oral mucosal diseases.
Affiliation(s)
- Fei Xie
- Xi'an Key Laboratory of Human-Machine Integration and Control Technology for Intelligent Rehabilitation, Xijing University, Xi'an, China
- School of AOAIR, Xidian University, Xi'an, China
- Pengfei Xu
- School of Information Science and Technology, Northwest University, Xi'an, China
- Xinyi Xi
- School of Information Science and Technology, Northwest University, Xi'an, China
- Xiaokang Gu
- School of Information Science and Technology, Northwest University, Xi'an, China
- Panpan Zhang
- School of Information Science and Technology, Northwest University, Xi'an, China
- Hexu Wang
- Xi'an Key Laboratory of Human-Machine Integration and Control Technology for Intelligent Rehabilitation, Xijing University, Xi'an, China
- Xuemin Shen
- Department of Oral Mucosal Diseases, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
8
Zhao LJ, Chen ZD, Ma ZX, Luo X, Xu XS. Angular Isotonic Loss Guided Multi-Layer Integration for Few-Shot Fine-Grained Image Classification. IEEE Trans Image Process 2024; 33:3778-3792. [PMID: 38870000] [DOI: 10.1109/tip.2024.3411474]
Abstract
Recent research on few-shot fine-grained image classification (FSFG) has predominantly focused on extracting discriminative features. The limited attention paid to the role of loss functions has resulted in weaker preservation of similarity relationships between query and support instances, thereby potentially limiting the performance of FSFG. In this regard, we analyze the limitations of widely adopted cross-entropy loss and introduce a novel Angular ISotonic (AIS) loss. The AIS loss introduces an angular margin to constrain the prototypes to maintain a certain distance from a pre-set threshold. It guides the model to converge more stably, learn clearer boundaries among highly similar classes, and achieve higher accuracy faster with limited instances. Moreover, to better accommodate the feature requirements of the AIS loss and fully exploit its potential in FSFG, we propose a Multi-Layer Integration (MLI) network that captures object features from multiple perspectives to provide more comprehensive and informative representations of the input images. Extensive experiments demonstrate the effectiveness of our proposed method on four standard fine-grained benchmarks. Codes are available at: https://github.com/Legenddddd/AIS-MLI.
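The exact AIS formulation is in the authors' repository; as a hedged sketch of the general cosine/angular-margin family it belongs to, the loss below compares query embeddings to class prototypes and subtracts a margin from the target-class term:

```python
# Sketch: cosine-margin prototype loss (generic form, not the exact AIS loss).
import torch
import torch.nn.functional as F

def angular_margin_loss(query: torch.Tensor, prototypes: torch.Tensor,
                        labels: torch.Tensor, s: float = 10.0,
                        m: float = 0.1) -> torch.Tensor:
    """query: (B, D); prototypes: (C, D); labels: (B,). s, m are assumptions."""
    cos = F.normalize(query, dim=1) @ F.normalize(prototypes, dim=1).t()  # (B, C)
    onehot = F.one_hot(labels, prototypes.size(0)).float()
    logits = s * (cos - m * onehot)      # margin tightens the target-class boundary
    return F.cross_entropy(logits, labels)

proto = torch.randn(5, 64, requires_grad=True)    # 5-way few-shot prototypes
q = torch.randn(25, 64)                           # 5 queries per class
lbl = torch.arange(5).repeat_interleave(5)
loss = angular_margin_loss(q, proto, lbl)
loss.backward()
```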
9
Wei XS, Yu HT, Xu A, Zhang F, Peng Y. MECOM: A Meta-Completion Network for Fine-Grained Recognition With Incomplete Multi-Modalities. IEEE Trans Image Process 2024; 33:3456-3469. [PMID: 38787666] [DOI: 10.1109/tip.2024.3403051]
Abstract
Our work focuses on tackling the problem of fine-grained recognition with incomplete multi-modal data, which has been overlooked by previous work in the literature. It is desirable not only to capture fine-grained patterns of objects but also to alleviate the challenges of missing modalities in such a practical problem. In this paper, we propose to leverage a meta-learning strategy to learn model abilities of both fast modal adaptation and, more importantly, missing modality completion across a variety of incomplete multi-modality learning tasks. Based on that, we develop a meta-completion method, termed MECOM, to perform multi-modal fusion and explicit missing modality completion via our proposed cross-modal attention and decoupling reconstruction. To further improve fine-grained recognition accuracy, an additional partial stream (as a counterpart of MECOM's holistic main stream) and part-level feature selection (corresponding to fine-grained objects' parts) are designed, tailored to the fine-grained nature of the task to capture discriminative but subtle part-level patterns. Comprehensive experiments from quantitative and qualitative aspects, as well as various ablation studies, on two fine-grained multi-modal datasets and one generic multi-modal dataset show our superiority over competing methods. Our code is open-source and available at https://github.com/SEU-VIPGroup/MECOM.
10
Zhang X, Dong S, Chen J, Tian Q, Gong Y, Hong X. Deep Class-Incremental Learning From Decentralized Data. IEEE Trans Neural Netw Learn Syst 2024; 35:7190-7203. [PMID: 36315536] [DOI: 10.1109/tnnls.2022.3214573]
Abstract
In this article, we focus on a new and challenging decentralized machine learning paradigm in which there are continuous inflows of data to be addressed and the data are stored in multiple repositories. We initiate the study of data-decentralized class-incremental learning (DCIL) by making the following contributions. First, we formulate the DCIL problem and develop the experimental protocol. Second, we introduce a paradigm to create a basic decentralized counterpart of typical (centralized) CIL approaches, and as a result, establish a benchmark for the DCIL study. Third, we further propose a decentralized composite knowledge incremental distillation (DCID) framework to transfer knowledge from historical models and multiple local sites to the general model continually. DCID consists of three main components, namely, local CIL, collaborated knowledge distillation (KD) among local models, and aggregated KD from local models to the general one. We comprehensively investigate our DCID framework by using a different implementation of the three components. Extensive experimental results demonstrate the effectiveness of our DCID framework. The source code of the baseline methods and the proposed DCIL is available at https://github.com/Vision-Intelligence-and-Robots-Group/DCIL.
11
Liang Y, Zhu L, Wang X, Yang Y. Penalizing the Hard Example But Not Too Much: A Strong Baseline for Fine-Grained Visual Classification. IEEE Trans Neural Netw Learn Syst 2024; 35:7048-7059. [PMID: 36409807] [DOI: 10.1109/tnnls.2022.3213563]
Abstract
Though significant progress has been achieved on fine-grained visual classification (FGVC), severe overfitting still hinders model generalization. A recent study shows that hard samples in the training set can be easily fit, but most existing FGVC methods fail to classify some hard examples in the test set. The reason is that the model overfits those hard examples in the training set, but does not learn to generalize to unseen examples in the test set. In this article, we propose a moderate hard example modulation (MHEM) strategy to properly modulate the hard examples. MHEM encourages the model to not overfit hard examples and offers better generalization and discrimination. First, we introduce three conditions and formulate a general form of a modulated loss function. Second, we instantiate the loss function and provide a strong baseline for FGVC, where the performance of a naive backbone can be boosted and be comparable with recent methods. Moreover, we demonstrate that our baseline can be readily incorporated into the existing methods and empower these methods to be more discriminative. Equipped with our strong baseline, we achieve consistent improvements on three typical FGVC datasets, i.e., CUB-200-2011, Stanford Cars, and FGVC-Aircraft. We hope the idea of moderate hard example modulation will inspire future research work toward more effective fine-grained visual recognition.
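As a hedged illustration of the modulation idea (not the paper's instantiation), the loss below reweights cross-entropy so that emphasis peaks at moderately hard examples and decays for the very hardest ones, which are the ones most likely to be overfit:

```python
# Sketch: moderate hard-example modulation of cross-entropy (assumed weighting).
import torch
import torch.nn.functional as F

def mhem_loss(logits: torch.Tensor, targets: torch.Tensor,
              alpha: float = 0.5, gamma: float = 1.0) -> torch.Tensor:
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.softmax(logits, dim=1).gather(1, targets[:, None]).squeeze(1)
    # p_t**alpha shrinks the weight for the hardest examples (p_t -> 0),
    # (1 - p_t)**gamma shrinks it for the easiest ones (p_t -> 1),
    # so emphasis peaks in between; detach so the weight is not backpropagated.
    weight = (p_t ** alpha) * ((1.0 - p_t) ** gamma)
    return (weight.detach() * ce).mean()

logits = torch.randn(16, 200, requires_grad=True)   # e.g., CUB-200 logits
targets = torch.randint(0, 200, (16,))
loss = mhem_loss(logits, targets)
loss.backward()
```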
12
Pu Y, Han Y, Wang Y, Feng J, Deng C, Huang G. Fine-Grained Recognition With Learnable Semantic Data Augmentation. IEEE Trans Image Process 2024; 33:3130-3144. [PMID: 38662557] [DOI: 10.1109/tip.2024.3364500]
Abstract
Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification problems, they are rarely applied in fine-grained scenarios, because their random editing-region behavior is prone to destroy the discriminative visual cues residing in the subtle regions. In this paper, we propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC-Aircraft, NABirds) demonstrate that our method significantly improves the generalization performance on several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets, and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. Source code is available at https://github.com/LeapLabTHU/LearnableISDA.
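A hedged sketch of the feature-level semantic augmentation described: features are translated along random directions drawn from a sample-wise Gaussian whose diagonal covariance is predicted by a small network. The meta-learning outer loop that jointly optimizes the covariance network is omitted:

```python
# Sketch: sample-wise semantic feature augmentation (diagonal covariance assumed).
import torch
import torch.nn as nn

class SemanticAugClassifier(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, lam: float = 0.5):
        super().__init__()
        # Softplus keeps the predicted per-dimension variances positive.
        self.cov_net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Softplus())
        self.fc = nn.Linear(feat_dim, num_classes)
        self.lam = lam

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        if self.training:
            var = self.cov_net(feat)                    # sample-wise diagonal covariance
            eps = torch.randn_like(feat) * var.sqrt()   # semantic translation direction
            feat = feat + self.lam * eps
        return self.fc(feat)

clf = SemanticAugClassifier(feat_dim=2048, num_classes=200)
logits = clf(torch.randn(8, 2048))                      # backbone features (stand-in)
```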
13
Ye S, Peng Q, Sun W, Xu J, Wang Y, You X, Cheung YM. Discriminative Suprasphere Embedding for Fine-Grained Visual Categorization. IEEE Trans Neural Netw Learn Syst 2024; 35:5092-5102. [PMID: 36107889] [DOI: 10.1109/tnnls.2022.3202534]
Abstract
Despite the great success of existing work in fine-grained visual categorization (FGVC), several challenges remain unsolved, e.g., poor interpretability and vague attribution of feature contributions. To circumvent these drawbacks, motivated by the hypersphere embedding method, we propose a discriminative suprasphere embedding (DSE) framework, which can provide intuitive geometric interpretation and effectively extract discriminative features. Specifically, DSE consists of three modules. The first module is a suprasphere embedding (SE) block, which learns discriminative information by emphasizing weight and phase. The second module is a phase activation map (PAM) used to analyze the contribution of local descriptors to the suprasphere feature representation, which uniformly highlights the object region and exhibits remarkable object localization capability. The last module is a class contribution map (CCM), which quantitatively analyzes the network's classification decision and provides insight into the domain knowledge about classified objects. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness of our proposed method in comparison with state-of-the-art methods.
14
Niu ZB, Jia SY, Xu HH. Automated graptolite identification at high taxonomic resolution using residual networks. iScience 2024; 27:108549. [PMID: 38213629] [PMCID: PMC10783601] [DOI: 10.1016/j.isci.2023.108549]
Abstract
Graptolites, fossils significant for evolutionary studies and shale gas exploration, are traditionally identified visually by taxonomists due to their intricate morphologies and preservation challenges. Artificial intelligence (AI) holds great promise for transforming such meticulous tasks. In this paper, we demonstrate that graptolites can be identified with taxonomist-level accuracy using a deep learning model. We construct the largest and most sophisticated professional single-organism image dataset to date, composed of >34,000 images of 113 graptolite species annotated at pixel-level resolution, and use it to train, develop, and evaluate deep learning networks for classifying graptolites. The model surpassed taxonomists in accuracy, time, and generalization, achieving 86% and 81% accuracy in identifying graptolite genera and species, respectively. This AI-based method, capable of recognizing minute morphological details better than taxonomists, can be integrated into web and mobile apps, extending graptolite identification beyond research institutes and enhancing the efficiency of shale gas exploration.
Affiliation(s)
- Zhi-Bin Niu
- College of Intelligence and Computing, Tianjin University, Tianjin 300354, China
- State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology and Centre for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Nanjing 210008, China
- Si-Yuan Jia
- College of Intelligence and Computing, Tianjin University, Tianjin 300354, China
- Hong-He Xu
- State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology and Centre for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Nanjing 210008, China
15
Zhang Y, Hu J, Jiang R, Lin Z, Chen Z. Fine-Grained Radio Frequency Fingerprint Recognition Network Based on Attention Mechanism. Entropy (Basel) 2023; 26:29. [PMID: 38248155] [PMCID: PMC10814318] [DOI: 10.3390/e26010029]
Abstract
With the rapid development of the internet of things (IoT), hundreds of millions of IoT devices, such as smart home appliances, intelligent-connected vehicles, and wearable devices, have been connected to the network. The open nature of IoT makes it vulnerable to cybersecurity threats. Traditional cryptography-based encryption methods are not suitable for IoT due to their complexity and high communication overhead requirements. By contrast, RF-fingerprint-based recognition is promising because it is rooted in the inherent non-reproducible hardware defects of the transmitter. However, it still faces the challenges of low inter-class variation and large intra-class variation among RF fingerprints. Inspired by fine-grained recognition in computer vision, we propose a fine-grained RF fingerprint recognition network (FGRFNet) in this article. The network consists of a top-down feature pathway hierarchy to generate pyramidal features, attention modules to locate discriminative regions, and a fusion module to adaptively integrate features from different scales. Experiments demonstrate that the proposed FGRFNet achieves recognition accuracies of 89.8% on 100 ADS-B devices, 99.5% on 54 Zigbee devices, and 83.0% on 25 LoRa devices.
Affiliation(s)
- Jun Hu
- School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen 518107, China; (Y.Z.); (R.J.); (Z.L.); (Z.C.)
16
Yang Y, Feng Y, Zhu L, Fu H, Pan X, Jin C. Feature fusion network based on few-shot fine-grained classification. Front Neurorobot 2023; 17:1301192. [PMID: 38023453] [PMCID: PMC10665847] [DOI: 10.3389/fnbot.2023.1301192]
Abstract
The objective of few-shot fine-grained learning is to identify subclasses within a primary class using a limited number of labeled samples. However, many current methodologies rely on a single feature metric, either global or local. In fine-grained image classification tasks, where the inter-class distance is small and the intra-class distance is large, relying on a single similarity measure can lead to the omission of either inter-class or intra-class information. We delve into inter-class information through global measures and tap into intra-class information via local measures. In this study, we introduce the Feature Fusion Similarity Network (FFSNet). This model employs global measures to accentuate the differences between classes, while utilizing local measures to consolidate intra-class data. Such an approach enables the model to learn features characterized by enlarged inter-class distances and reduced intra-class distances, even with a limited dataset of fine-grained images. Consequently, this greatly enhances the model's generalization capabilities. Our experimental results demonstrate that the proposed paradigm holds its ground against state-of-the-art models across multiple established fine-grained image benchmark datasets.
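A hedged sketch of fusing global and local similarity measures for few-shot matching in the spirit of the abstract: the global term compares pooled embeddings (inter-class cue), the local term matches each query descriptor to its best support descriptor (intra-class cue). The fusion weight is an assumption:

```python
# Sketch: fused global/local similarity between a query and a support image.
import torch
import torch.nn.functional as F

def fused_similarity(query_map: torch.Tensor, support_map: torch.Tensor,
                     w: float = 0.5) -> torch.Tensor:
    """query_map/support_map: (C, H, W) feature maps of one image each."""
    q_global = F.normalize(query_map.flatten(1).mean(dim=1), dim=0)
    s_global = F.normalize(support_map.flatten(1).mean(dim=1), dim=0)
    global_sim = (q_global * s_global).sum()                      # inter-class cue

    q_local = F.normalize(query_map.flatten(1).t(), dim=1)        # (HW, C) descriptors
    s_local = F.normalize(support_map.flatten(1).t(), dim=1)
    local_sim = (q_local @ s_local.t()).max(dim=1).values.mean()  # intra-class cue

    return w * global_sim + (1 - w) * local_sim

sim = fused_similarity(torch.randn(64, 10, 10), torch.randn(64, 10, 10))
```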
Affiliation(s)
- Li Zhu
- College of Information Technology, Jilin Agriculture University, Changchun, China
17
Lyu X, Gao L, Zeng P, Shen HT, Song J. Adaptive Fine-Grained Predicates Learning for Scene Graph Generation. IEEE Trans Pattern Anal Mach Intell 2023; 45:13921-13940. [PMID: 37788219] [DOI: 10.1109/tpami.2023.3298356]
Abstract
The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach". As general SGG models tend to predict head predicates and re-balancing strategies prefer tail categories, none of them can appropriately handle hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained image classification, which focuses on differentiating hard-to-distinguish objects, we propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) framework which aims at differentiating hard-to-distinguish predicates for SGG. First, we introduce an Adaptive Predicate Lattice (PL-A) to identify hard-to-distinguish predicates, which adaptively explores predicate correlations in keeping with the model's dynamic learning pace. Practically, PL-A is initialized from the SGG dataset and refined by exploring the model's predictions on the current mini-batch. Utilizing PL-A, we propose an Adaptive Category Discriminating Loss (CDL-A) and an Adaptive Entity Discriminating Loss (EDL-A), which progressively regularize the model's discriminating process with fine-grained supervision concerning its dynamic learning status, ensuring a balanced and efficient learning process. Extensive experimental results show that our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance. Moreover, experiments on Sentence-to-Graph Retrieval and Image Captioning tasks further demonstrate the practicability of our method.
18
Liu Y, Hong X, Tao X, Dong S, Shi J, Gong Y. Model Behavior Preserving for Class-Incremental Learning. IEEE Trans Neural Netw Learn Syst 2023; 34:7529-7540. [PMID: 35120008] [DOI: 10.1109/tnnls.2022.3144183]
Abstract
Deep models have been shown to be vulnerable to catastrophic forgetting, a phenomenon in which the recognition performance on old data degrades when a pre-trained model is fine-tuned on new data. Knowledge distillation (KD) is a popular incremental approach to alleviate catastrophic forgetting. However, it usually fixes the absolute values of neural responses for isolated historical instances, without considering the intrinsic structure of the responses by a convolutional neural network (CNN) model. To overcome this limitation, we recognize the importance of the global property of the whole instance set and treat it as a behavior characteristic of a CNN model relevant to incremental learning. On this basis: 1) we design an instance neighborhood-preserving (INP) loss to maintain the order of pair-wise instance similarities of the old model in the feature space; 2) we devise a label priority-preserving (LPP) loss to preserve the label ranking lists within instance-wise label probability vectors in the output space; and 3) we introduce an efficient derivable ranking algorithm for calculating the two loss functions. Extensive experiments conducted on CIFAR100 and ImageNet show that our approach achieves state-of-the-art performance.
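A hedged sketch of the neighborhood-preserving idea: rather than pinning absolute responses, the new model is asked to preserve the old model's pairwise instance-similarity structure. A soft listwise surrogate (KL between similarity distributions) stands in for the paper's derivable ranking algorithm:

```python
# Sketch: preserving the old model's instance-similarity structure.
import torch
import torch.nn.functional as F

def neighborhood_preserving_loss(feat_new: torch.Tensor,
                                 feat_old: torch.Tensor,
                                 tau: float = 0.5) -> torch.Tensor:
    """feat_new/feat_old: (B, D) embeddings of the same batch."""
    sim_new = F.normalize(feat_new, dim=1) @ F.normalize(feat_new, dim=1).t()
    sim_old = F.normalize(feat_old, dim=1) @ F.normalize(feat_old, dim=1).t()
    b = feat_new.size(0)
    mask = ~torch.eye(b, dtype=torch.bool)        # drop self-similarities
    # Soft surrogate: match each row's similarity distribution over neighbors.
    p_new = F.log_softmax(sim_new[mask].view(b, -1) / tau, dim=1)
    p_old = F.softmax(sim_old[mask].view(b, -1) / tau, dim=1)
    return F.kl_div(p_new, p_old, reduction="batchmean")

loss = neighborhood_preserving_loss(torch.randn(32, 128, requires_grad=True),
                                    torch.randn(32, 128))
loss.backward()
```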
19
Li Y, Xia T, Luo H, He B, Jia F. MT-FiST: A Multi-Task Fine-Grained Spatial-Temporal Framework for Surgical Action Triplet Recognition. IEEE J Biomed Health Inform 2023; 27:4983-4994. [PMID: 37498758] [DOI: 10.1109/jbhi.2023.3299321]
Abstract
Surgical action triplet recognition plays a significant role in helping surgeons facilitate scene analysis and decision-making in computer-assisted surgeries. Compared to traditional context-aware tasks such as phase recognition, surgical action triplets, comprising the instrument, verb, and target, can offer more comprehensive and detailed information. However, current triplet recognition methods fall short in distinguishing the fine-grained subclasses and disregard temporal correlation in action triplets. In this article, we propose a multi-task fine-grained spatial-temporal framework for surgical action triplet recognition named MT-FiST. The proposed method utilizes a multi-label mutual channel loss, which consists of diversity and discriminative components. This loss function decouples global task features into class-aligned features, enabling the learning of more local details from the surgical scene. The proposed framework utilizes partially shared-parameter LSTM units to capture temporal correlations between adjacent frames. We conducted experiments on the CholecT50 dataset proposed in the MICCAI 2021 Surgical Action Triplet Recognition Challenge. Our framework was evaluated on the private test set of the challenge to ensure fair comparisons. Our model clearly outperformed state-of-the-art models in instrument, verb, target, and action triplet recognition tasks, with mAPs of 82.1% (+4.6%), 51.5% (+4.0%), 45.5% (+7.8%), and 35.8% (+3.1%), respectively. The proposed MT-FiST boosts the recognition of surgical action triplets in a context-aware surgical assistant system, addressing multi-task recognition through effective temporal aggregation and fine-grained features.
20
Hayee S, Hussain F, Yousaf MH. A Novel FDLSR-Based Technique for View-Independent Vehicle Make and Model Recognition. Sensors (Basel) 2023; 23:7920. [PMID: 37765976] [PMCID: PMC10537004] [DOI: 10.3390/s23187920]
Abstract
Vehicle make and model recognition (VMMR) is an important aspect of intelligent transportation systems (ITS). In VMMR systems, surveillance cameras capture vehicle images for real-time vehicle detection and recognition. These captured images pose challenges, including shadows, reflections, changes in weather and illumination, occlusions, and perspective distortion. Another significant challenge in VMMR is multiclass classification, which has two main facets: (a) multiplicity and (b) ambiguity. Multiplicity concerns the issue of different forms among car models manufactured by the same company, while the ambiguity problem arises when multiple models from the same manufacturer have visually similar appearances or when vehicle models of different makes have visually comparable rear/front views. This paper introduces a novel and robust VMMR model that can address the above-mentioned issues with accuracy comparable to state-of-the-art methods. Our proposed hybrid CNN model selects the best descriptive fine-grained features with the help of Fisher Discriminative Least Squares Regression (FDLSR). These features are extracted from a deep CNN model fine-tuned on the fine-grained vehicle datasets Stanford-196 and BoxCars21k. Using ResNet-152 features, our proposed model outperformed the SVM and FC layers in accuracy by 0.5% and 4% on Stanford-196 and by 0.4% and 1% on BoxCars21k, respectively. Moreover, this model is well-suited for small-scale fine-grained vehicle datasets.
Affiliation(s)
- Sobia Hayee
- Department of Computer Engineering, University of Engineering & Technology, Taxila 47050, Pakistan; (S.H.); (M.H.Y.)
- Fawad Hussain
- Department of Computer Engineering, University of Engineering & Technology, Taxila 47050, Pakistan; (S.H.); (M.H.Y.)
- Muhammad Haroon Yousaf
- Department of Computer Engineering, University of Engineering & Technology, Taxila 47050, Pakistan; (S.H.); (M.H.Y.)
- SWARM Robotics Lab, National Center of Robotics & Automation (NCRA), Taxila 47050, Pakistan
21
Qin H. Design of oral English teaching model based on multi-modal perception of the Internet of Things and improved conventional neural networks. PeerJ Comput Sci 2023; 9:e1503. [PMID: 37705645] [PMCID: PMC10495997] [DOI: 10.7717/peerj-cs.1503]
Abstract
Oral English instruction plays a pivotal role in educational endeavors. The emergence of online teaching in response to the epidemic has created an urgent demand for a methodology to evaluate and monitor oral English instruction. In the post-epidemic era, distance learning has become indispensable for educational pursuits. Given the distinct teaching modality and approach of oral English instruction, it is imperative to explore an intelligent scoring technique that can effectively oversee the content of English teaching. With this objective in mind, we have devised a scoring approach for oral English instruction based on multi-modal perception utilizing the Internet of Things (IoT). Initially, a trained convolutional neural network (CNN) model is employed to extract and quantify visual information and audio features from the IoT, reducing them to a fixed dimension. Subsequently, an external attention model is proposed to compute spoken English and image characteristics. Lastly, the content of English instruction is classified and graded based on the quantitative attributes of the oral dialogue. Our findings illustrate that our scoring model for oral English instruction surpasses alternative models, ranking first with an accuracy of 88.8%, more than 2% higher than the next best.
Affiliation(s)
- Haitao Qin
- College of Foreign Studies, Hubei Normal University, Huangshi, Hubei, China
22
Nwoye CI, Alapatt D, Yu T, Vardazaryan A, Xia F, Zhao Z, Xia T, Jia F, Yang Y, Wang H, Yu D, Zheng G, Duan X, Getty N, Sanchez-Matilla R, Robu M, Zhang L, Chen H, Wang J, Wang L, Zhang B, Gerats B, Raviteja S, Sathish R, Tao R, Kondo S, Pang W, Ren H, Abbing JR, Sarhan MH, Bodenstedt S, Bhasker N, Oliveira B, Torres HR, Ling L, Gaida F, Czempiel T, Vilaça JL, Morais P, Fonseca J, Egging RM, Wijma IN, Qian C, Bian G, Li Z, Balasubramanian V, Sheet D, Luengo I, Zhu Y, Ding S, Aschenbrenner JA, van der Kar NE, Xu M, Islam M, Seenivasan L, Jenke A, Stoyanov D, Mutter D, Mascagni P, Seeliger B, Gonzalez C, Padoy N. CholecTriplet2021: A benchmark challenge for surgical action triplet recognition. Med Image Anal 2023; 86:102803. [PMID: 37004378] [DOI: 10.1016/j.media.2023.102803]
Abstract
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of ‹instrument, verb, target› combination delivers more comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and the assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from the competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.
23
Zhang Z, Cao W. Visual-Semantic Consistency Matching Network for Generalized Zero-shot Learning. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.03.007]
24
Dual-domain reciprocal learning design for few-shot image classification. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08255-z]
25
Huang YS, Wang TC, Huang SZ, Zhang J, Chen HM, Chang YC, Chang RF. An improved 3-D attention CNN with hybrid loss and feature fusion for pulmonary nodule classification. Comput Methods Programs Biomed 2023; 229:107278. [PMID: 36463674] [DOI: 10.1016/j.cmpb.2022.107278]
Abstract
BACKGROUND AND OBJECTIVE Lung cancer has the highest cancer-related mortality worldwide, and lung nodules usually present with no symptoms. Low-dose computed tomography (LDCT) is an important tool for lung cancer detection and diagnosis, providing a complete three-dimensional (3-D) chest image with high resolution. Recently, convolutional neural networks (CNNs) have flourished, and CNN-based computer-aided diagnosis (CADx) systems have been shown to extract features and help radiologists make a preliminary diagnosis. Therefore, a 3-D ResNeXt-based CADx system is proposed in this study to assist radiologists with diagnosis. METHODS The proposed CADx system consists of image preprocessing and a 3-D CNN-based classification model for pulmonary nodule classification. First, image preprocessing is executed to generate a normalized volume of interest (VOI) including only the nodule and a few surrounding tissues. Then, the extracted VOI is forwarded to the 3-D nodule classification model. In the classification model, ResNeXt is employed as the backbone and an attention scheme is embedded to focus on the important features. Moreover, a multi-level feature fusion network incorporating feature information at different scales is used to enhance the prediction accuracy for small malignant nodules. Finally, a hybrid loss based on channel optimization, which makes the network learn more detailed information, is employed in place of a binary cross-entropy (BCE) loss. RESULTS In this research, a total of 880 low-dose CT images, including 440 benign and 440 malignant nodules from the American National Lung Screening Trial (NLST), were used for system evaluation. The results showed that our system achieves an accuracy of 85.3%, a sensitivity of 86.8%, a specificity of 83.9%, and an area-under-curve (AUC) value of 0.9042, confirming that the designed system has good diagnostic ability. CONCLUSION In this study, a CADx composed of image preprocessing and a 3-D nodule classification model with an attention scheme, feature fusion, and a hybrid loss is proposed for pulmonary nodule classification in LDCT. The results indicate that the proposed CADx system has the potential to achieve high performance in classifying lung nodules as benign or malignant.
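A hedged sketch of the VOI preprocessing step: a fixed-size cube around the nodule center is cropped from the CT volume, clipped to a lung window, and normalized. The window bounds and VOI size are assumptions, not the paper's values:

```python
# Sketch: VOI extraction and intensity normalization from an LDCT volume.
import numpy as np

def extract_voi(ct: np.ndarray, center: tuple[int, int, int],
                size: int = 64, hu_min: float = -1000.0,
                hu_max: float = 400.0) -> np.ndarray:
    """ct: (Z, Y, X) volume in Hounsfield units; center: nodule (z, y, x)."""
    half = size // 2
    slices = tuple(slice(max(c - half, 0), c + half) for c in center)
    voi = ct[slices].astype(np.float32)
    voi = np.clip(voi, hu_min, hu_max)                # lung window (assumed bounds)
    voi = (voi - hu_min) / (hu_max - hu_min)          # normalize to [0, 1]
    # Pad back to the full cube if the crop hit a volume border.
    pad = [(0, size - s) for s in voi.shape]
    return np.pad(voi, pad, mode="constant")

ct = np.random.randint(-1000, 400, (128, 256, 256)).astype(np.int16)
voi = extract_voi(ct, center=(60, 120, 130))          # -> (64, 64, 64) cube
```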
Affiliation(s)
- Yao-Sian Huang
- Department of Computer Science and Information Engineering, National Changhua University of Education, Changhua, Taiwan, ROC
- Teh-Chen Wang
- Department of Medical Imaging, Taipei City Hospital Yangming Branch, Taipei, Taiwan, ROC
- Sheng-Zhi Huang
- Graduate Institute of Network and Multimedia, National Taiwan University, Taipei, Taiwan, ROC
- Jun Zhang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan, ROC
- Hsin-Ming Chen
- Department of Medical Imaging, National Taiwan University Hospital Hsin-Chu Branch, Hsin-Chu, Taiwan, ROC
- Yeun-Chung Chang
- Department of Medical Imaging, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei 10617, Taiwan, ROC
- Ruey-Feng Chang
- Graduate Institute of Network and Multimedia, National Taiwan University, Taipei, Taiwan, ROC; Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan, ROC; Department of Computer Science and Information Engineering, National Taiwan University, Taipei 10617, Taiwan, ROC; MOST Joint Research Center for AI Technology and All Vista Healthcare, Taipei, Taiwan, ROC
26
Zhang J, Qi C, Mecha P, Zuo Y, Ben Z, Liu H, Chen K. Pseudo high-frequency boosts the generalization of a convolutional neural network for cassava disease detection. Plant Methods 2022; 18:136. [PMID: 36517873] [PMCID: PMC9749340] [DOI: 10.1186/s13007-022-00969-w]
Abstract
Frequency is essential in signal transmission, especially in convolutional neural networks, and maintaining signal frequency within the network is vital to preserving its performance. Owing to destructive signal transmission in convolutional neural networks, frequency downconversion in the channels results in incomplete spatial information. In communication theory, the number of Fourier series coefficients determines the integrity of the information transmitted in channels. Consequently, the number of Fourier series coefficients of the signals can be replenished to reduce the information transmission loss. To achieve this, the ArsenicNetPlus neural network is proposed for signal transmission modulation in detecting cassava diseases. First, multiattention is used to maintain the long-term dependency of cassava disease features. Afterward, depthwise convolution is implemented to remove aliasing signals and downconvert before the sampling operation. An instance batch normalization algorithm is utilized to keep features in an appropriate form in the convolutional neural network channels. Finally, the ArsenicPlus block is implemented to generate pseudo high-frequency in the residual structure. The proposed method was tested on the Cassava Datasets and compared with V2-ResNet-101, EfficientNet-B5, RepVGG-B3g4, and AlexNet. The results showed that the proposed method achieved [Formula: see text] in terms of accuracy, 1.2440 in terms of loss, and [Formula: see text] in terms of the F1-score, outperforming the comparison algorithms.
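The described depthwise anti-aliasing step is sketched below in the well-known BlurPool style: a fixed binomial low-pass filter is applied per channel (groups = channels) before strided subsampling. This is an interpretation of the abstract, not the ArsenicNetPlus code:

```python
# Sketch: depthwise low-pass filtering before strided downsampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AntiAliasDownsample(nn.Module):
    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        kernel = k[:, None] * k[None, :]
        kernel = kernel / kernel.sum()                 # 3x3 binomial low-pass filter
        self.register_buffer("kernel",
                             kernel.expand(channels, 1, 3, 3).contiguous())
        self.stride = stride
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # groups=channels makes this a depthwise filtering step: each channel
        # is blurred independently before the strided subsampling.
        return F.conv2d(x, self.kernel, stride=self.stride,
                        padding=1, groups=self.channels)

down = AntiAliasDownsample(64)
y = down(torch.randn(1, 64, 56, 56))   # -> (1, 64, 28, 28)
```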
Affiliation(s)
- Jiayu Zhang
- College of Engineering, Nanjing Agricultural University, Nanjing, China
- Chao Qi
- College of Engineering, Nanjing Agricultural University, Nanjing, China
- Peter Mecha
- College of Engineering, Nanjing Agricultural University, Nanjing, China
- Yi Zuo
- College of Engineering, Nanjing Agricultural University, Nanjing, China
- Zongyou Ben
- College of Engineering, Nanjing Agricultural University, Nanjing, China
- Haolu Liu
- Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing, China
- Kunjie Chen
- College of Engineering, Nanjing Agricultural University, Nanjing, China
27
Wei K, Deng C, Yang X, Tao D. Incremental Zero-Shot Learning. IEEE Trans Cybern 2022; 52:13788-13799. [PMID: 34591777] [DOI: 10.1109/tcyb.2021.3110369]
Abstract
The goal of zero-shot learning (ZSL) is to correctly recognize objects from unseen classes without corresponding training samples. Existing ZSL methods are trained on a set of predefined classes and cannot learn from a stream of training data; however, in many real-world applications, training data are collected incrementally, which is one of the main reasons why ZSL methods cannot be applied in certain real-world situations. Accordingly, to handle practical learning tasks of this kind, we introduce a novel ZSL setting, referred to as incremental ZSL (IZSL), whose goal is to accumulate historical knowledge and alleviate catastrophic forgetting so as to facilitate better recognition when incrementally trained on new classes. We further propose a novel method to realize IZSL, which employs a generative replay strategy to produce virtual samples of previously seen classes. Historical knowledge is then transferred from the former learning step to the current one through joint training on both real new data and virtual old data. Subsequently, a knowledge distillation strategy distills knowledge from the former model into the current model, regularizing the current model's training. In addition, our method can be flexibly combined with most generative ZSL methods to tackle IZSL. Extensive experiments on three challenging benchmarks indicate that the proposed method effectively tackles the IZSL problem, while existing ZSL methods fail.
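The replay-plus-distillation recipe in this abstract can be sketched as a single training step. The sketch below is our hedged reading, not the authors' code: the generator API (noise plus label in, virtual sample out), the loss weighting lam, and the temperature T are all assumptions.

import torch
import torch.nn.functional as F

def izsl_step(model, old_model, generator, x_new, y_new, old_labels, T=2.0, lam=1.0):
    """One hedged IZSL training step: generative replay of previously
    seen classes plus distillation from the frozen former model."""
    # Replay: synthesize virtual samples for old classes. The generator
    # is assumed to map (noise, label) -> sample, as in generative ZSL.
    z = torch.randn(len(old_labels), generator.noise_dim)
    x_old = generator(z, old_labels)

    # Joint classification on real new data and virtual old data.
    x = torch.cat([x_new, x_old], dim=0)
    y = torch.cat([y_new, old_labels], dim=0)
    logits = model(x)
    loss_cls = F.cross_entropy(logits, y)

    # Distillation: keep the current model close to the former one on
    # the replayed old classes (soft targets with temperature T).
    with torch.no_grad():
        old_logits = old_model(x_old)
    loss_kd = F.kl_div(
        F.log_softmax(logits[len(x_new):] / T, dim=1),
        F.softmax(old_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T
    return loss_cls + lam * loss_kd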
Collapse
|
28
|
Wei XS, Song YZ, Aodha OM, Wu J, Peng Y, Tang J, Yang J, Belongie S. Fine-Grained Image Analysis With Deep Learning: A Survey. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:8927-8948. [PMID: 34752384 DOI: 10.1109/tpami.2021.3126648] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas - fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community.
Collapse
|
29
|
Du R, Xie J, Ma Z, Chang D, Song YZ, Guo J. Progressive Learning of Category-Consistent Multi-Granularity Features for Fine-Grained Visual Classification. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:9521-9535. [PMID: 34752385 DOI: 10.1109/tpami.2021.3126668] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works are mainly part-driven (either explicitly or implicitly), with the assumption that fine-grained information naturally rests within the parts. In this paper, we take a different stance and show that part operations are not strictly necessary: the key lies in encouraging the network to learn at different granularities and progressively fusing multi-granularity features together. In particular, we propose: (i) a progressive training strategy that effectively fuses features from different granularities, and (ii) a consistent block convolution that encourages the network to learn category-consistent features at specific granularities. We evaluate on several standard FGVC benchmark datasets and demonstrate that the proposed method consistently outperforms existing alternatives or delivers competitive results. Code is available at https://github.com/PRIS-CV/PMG-V2.
Collapse
|
30
|
Li X, Li Y, Zheng Y, Zhu R, Ma Z, Xue JH, Cao J. ReNAP: Relation Network with Adaptive Prototypical Learning for Few-Shot Classification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
31
|
Fine-grained image recognition via trusted multi-granularity information fusion. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01685-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
32
|
Zhao P, Miao Q, Li H, Liu R, Quan Y, Song J. Refined Probability Distribution Module for Fine-Grained Visual Categorization. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
33
|
Zhang G, Wei S, Pang H, Qiu S, Zhao Y. Composed Image Retrieval via Explicit Erasure and Replenishment With Semantic Alignment. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:5976-5988. [PMID: 36094980 DOI: 10.1109/tip.2022.3204213] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Composed image retrieval aims to retrieve the desired images given a reference image and a text piece. To handle this task, two important subprocesses should be modeled reasonably: one erases details of the reference image that are unrelated to the text piece, and the other replenishes the desired details specified by the text piece. Existing methods neglect to distinguish between the two subprocesses and implicitly handle them together when solving the composed image retrieval task. To explicitly and orderly model the two subprocesses, we propose a novel composed image retrieval method containing three key components: a Multi-semantic Dynamic Suppression module (MDS), a Text-semantic Complementary Selection module (TCS), and Semantic Space Alignment constraints (SSA). Concretely, MDS erases unrelated details of the reference image by suppressing its semantic features. TCS selects and enhances the semantic features of the text piece and then replenishes them into the reference image. Finally, to facilitate the erasure and replenishment subprocesses, SSA aligns the semantics of the two modalities' features in the final space. Extensive experiments on three benchmark datasets (Shoes, FashionIQ, and Fashion200K) show the superior performance of our approach against state-of-the-art methods.
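The erase-then-replenish composition described here admits a very small sketch: a text-conditioned gate suppresses reference-image semantics, and projected text semantics are added back. Everything below (module name, gate form, dimensions) is our assumption, loosely standing in for the MDS/TCS roles.

import torch
import torch.nn as nn

class EraseReplenish(nn.Module):
    """Hedged sketch of the two subprocesses: erasure via a learned
    suppression gate (cf. MDS), replenishment via selected text
    semantics (cf. TCS). Not the authors' architecture."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.select = nn.Linear(dim, dim)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([img, txt], dim=-1))  # what to erase
        erased = img * (1 - g)                        # suppress unrelated details
        return erased + self.select(txt)              # replenish text semantics

img, txt = torch.randn(4, 512), torch.randn(4, 512)
print(EraseReplenish(512)(img, txt).shape)  # torch.Size([4, 512])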
Collapse
|
34
|
Bera A, Wharton Z, Liu Y, Bessis N, Behera A. SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; PP:6017-6031. [PMID: 36103441 DOI: 10.1109/tip.2022.3205215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Over the past few years, significant progress has been made in image recognition based on deep convolutional neural networks (CNNs), mainly due to such networks' strong ability to mine discriminative object pose and part information from texture and shape. This is often inadequate for fine-grained visual classification (FGVC), which exhibits high intra-class and low inter-class variance due to occlusion, deformation, illumination changes, etc. An expressive feature representation describing global structural information is therefore key to characterizing an object or scene. To this end, we propose a method that effectively captures subtle changes by aggregating context-aware features from the most relevant image regions and their importance in discriminating fine-grained categories, avoiding bounding-box and/or distinguishable part annotations. Our approach is inspired by recent advances in self-attention and graph neural networks (GNNs): it includes a simple yet effective relation-aware feature transformation and refines the transformed feature with a context-aware attention mechanism to boost its discriminability in an end-to-end learning process. Our model is evaluated on eight benchmark datasets consisting of fine-grained objects and human-object interactions, and it outperforms state-of-the-art approaches by a significant margin in recognition accuracy.
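The two ingredients named in this abstract (a relation-aware transformation over image regions, followed by context-aware attention) can be sketched compactly. The block below is our hedged illustration, not SR-GNN itself; layer shapes and names are assumptions.

import torch
import torch.nn as nn

class RelationAwareBlock(nn.Module):
    """Hedged sketch: transform region features via their pairwise
    affinities, then re-weight regions with a learned context score."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.ctx = nn.Linear(dim, 1)  # per-region importance score

    def forward(self, regions: torch.Tensor) -> torch.Tensor:
        # regions: (B, N, D) pooled features of N image regions.
        A = torch.softmax(
            self.q(regions) @ self.k(regions).transpose(1, 2)
            / regions.size(-1) ** 0.5, dim=-1)        # (B, N, N) affinities
        rel = A @ self.v(regions)                     # relation-aware transform
        w = torch.softmax(self.ctx(rel), dim=1)       # context attention (B, N, 1)
        return (w * rel).sum(dim=1)                   # (B, D) image descriptor

feats = torch.randn(4, 16, 256)  # 16 regions, 256-d each
print(RelationAwareBlock(256)(feats).shape)  # torch.Size([4, 256])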
Collapse
|
35
|
Multi-scale confusion and filling mechanism for pressure footprint recognition. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07777-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
36
|
Symmetrical irregular local features for fine-grained visual classification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.07.056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
37
|
Liu K, Chen K, Jia K. Convolutional Fine-Grained Classification With Self-Supervised Target Relation Regularization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:5570-5584. [PMID: 35981063 DOI: 10.1109/tip.2022.3197931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Fine-grained visual classification can be addressed by deep representation learning under the supervision of manually predefined targets (e.g., one-hot or Hadamard codes). Such target coding schemes are less flexible in modeling inter-class correlation and are sensitive to sparse and imbalanced data distributions as well. In light of this, this paper introduces a novel target coding scheme - dynamic target relation graphs (DTRG) - which, as an auxiliary feature regularization, is a self-generated structural output to be mapped from input images. Specifically, online computation of class-level feature centers is designed to generate cross-category distances in the representation space, which can thus be depicted by a dynamic graph in a non-parametric manner. Explicitly minimizing intra-class feature variations anchored on those class-level centers encourages the learning of discriminative features. Moreover, by exploiting inter-class dependency, the proposed target graphs can alleviate data sparsity and imbalance in representation learning. Inspired by the recent success of mixup-style data augmentation, this paper introduces randomness into the soft construction of dynamic target relation graphs to further explore the relation diversity of target classes. Experimental results demonstrate the effectiveness of our method on a number of diverse visual classification benchmarks, in particular achieving state-of-the-art performance on three popular fine-grained object benchmarks and superior robustness against sparse and imbalanced data. Source code is publicly available at https://github.com/AkonLau/DTRG.
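The two computational pieces described here (online class-center updates and a non-parametric center-distance graph) are easy to sketch. The snippet below is a hedged reading under our own naming; the released DTRG code at the linked repository is the authoritative version.

import torch
import torch.nn.functional as F

@torch.no_grad()
def update_centers(centers, feats, labels, momentum=0.9):
    """Online class-level center update (hedged sketch, our naming)."""
    for c in labels.unique():
        centers[c] = momentum * centers[c] + (1 - momentum) * feats[labels == c].mean(0)
    return centers

def dtrg_terms(centers, feats, labels):
    # Intra-class term: pull each feature toward its class-level center.
    loss_intra = F.mse_loss(feats, centers[labels])
    # Non-parametric relation graph: pairwise distances between centers,
    # usable as a self-generated structural target for an auxiliary head.
    graph = torch.cdist(centers, centers)  # (C, C) dynamic target graph
    return loss_intra, graph

centers = torch.zeros(10, 128)
feats, labels = torch.randn(32, 128), torch.randint(0, 10, (32,))
centers = update_centers(centers, feats, labels)
print(dtrg_terms(centers, feats, labels)[0])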
Collapse
|
38
|
Chen J, Li H, Liang J, Su X, Zhai Z, Chai X. Attention-based cropping and erasing learning with coarse-to-fine refinement for fine-grained visual classification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
39
|
GA-SRN: graph attention based text-image semantic reasoning network for fine-grained image classification and retrieval. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07617-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/16/2022]
|
40
|
Lei J, Zhang Z, Pan Z, Liu D, Liu X, Chen Y, Ling N. Disparity-Aware Reference Frame Generation Network for Multiview Video Coding. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:4515-4526. [PMID: 35727785 DOI: 10.1109/tip.2022.3183436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Multiview video coding (MVC) aims to compress multiview video by eliminating video redundancies, where the quality of the reference frame directly affects compression efficiency. In this paper, we propose a deep virtual reference frame generation method based on a disparity-aware reference frame generation network (DAG-Net), which models the disparity relationship between different viewpoints to generate a more reliable reference frame. The proposed DAG-Net consists of a multi-level receptive field module, a disparity-aware alignment module, and a fusion reconstruction module. First, the multi-level receptive field module enlarges the receptive field and extracts multi-scale deep features of the temporal and inter-view reference frames. Then, the disparity-aware alignment module learns the disparity relationship and performs a disparity shift on the inter-view reference frame to align it with the temporal reference frame. Finally, the fusion reconstruction module fuses the complementary information and generates a more reliable virtual reference frame. Experiments demonstrate that the proposed reference frame generation method achieves superior performance for multiview video coding.
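The disparity-shift step can be sketched as predicting a per-pixel horizontal offset and warping the inter-view frame toward the temporal one. The module below is a loose, hedged illustration, not DAG-Net: the offset predictor and grid-based warp are our assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DisparityShift(nn.Module):
    """Hedged sketch of disparity-aware alignment: predict a horizontal
    offset per pixel and warp the inter-view reference frame."""
    def __init__(self, channels: int):
        super().__init__()
        # Offsets are predicted from both frames stacked together.
        self.offset = nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1)

    def forward(self, inter_view: torch.Tensor, temporal: torch.Tensor) -> torch.Tensor:
        B, C, H, W = inter_view.shape
        disp = self.offset(torch.cat([inter_view, temporal], dim=1))  # (B,1,H,W) pixels
        # Base sampling grid in [-1, 1]; shift x-coordinates by the
        # predicted disparity (converted to normalized coordinates).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=inter_view.device),
            torch.linspace(-1, 1, W, device=inter_view.device), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2).clone()
        grid[..., 0] = grid[..., 0] + disp.squeeze(1) * 2 / max(W - 1, 1)
        return F.grid_sample(inter_view, grid, align_corners=True)

a, b = torch.randn(1, 16, 24, 32), torch.randn(1, 16, 24, 32)
print(DisparityShift(16)(a, b).shape)  # torch.Size([1, 16, 24, 32])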
Collapse
|
41
|
Deng W, Marsh J, Gould S, Zheng L. Fine-Grained Classification via Categorical Memory Networks. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:4186-4196. [PMID: 35700253 DOI: 10.1109/tip.2022.3181492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Motivated by the desire to exploit patterns shared across classes, we present a simple yet effective class-specific memory module for fine-grained feature learning. The memory module stores the prototypical feature representation for each category as a moving average. We hypothesize that the combination of similarities with respect to each category is itself a useful discriminative cue. To detect these similarities, we use attention as a querying mechanism. The attention scores with respect to each class prototype are used as weights to combine prototypes via weighted sum, producing a uniquely tailored response feature representation for a given input. The original and response features are combined to produce an augmented feature for classification. We integrate our class-specific memory module into a standard convolutional neural network, yielding a Categorical Memory Network. Our memory module significantly improves accuracy over baseline CNNs, achieving competitive accuracy with state-of-the-art methods on four benchmarks, including CUB-200-2011, Stanford Cars, FGVC Aircraft, and NABirds.
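The memory mechanism in this abstract is concrete enough for a short sketch: per-class prototypes kept as moving averages, queried by attention, with the response concatenated to the original feature. The module below is our hedged reading, not the authors' released code; names and the momentum value are assumptions.

import torch
import torch.nn as nn

class CategoricalMemory(nn.Module):
    """Hedged sketch of a class-specific memory module: prototypes as
    moving averages, attention as the querying mechanism."""
    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
        super().__init__()
        self.register_buffer("protos", torch.zeros(num_classes, dim))
        self.momentum = momentum

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # Moving-average update of each seen class's prototype.
        for c in labels.unique():
            self.protos[c] = (self.momentum * self.protos[c]
                              + (1 - self.momentum) * feats[labels == c].mean(0))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Attention scores over prototypes -> tailored response feature.
        attn = torch.softmax(feats @ self.protos.t(), dim=1)  # (B, C)
        response = attn @ self.protos                         # (B, D)
        return torch.cat([feats, response], dim=1)            # augmented feature

mem = CategoricalMemory(num_classes=200, dim=512)
feats, labels = torch.randn(8, 512), torch.randint(0, 200, (8,))
mem.update(feats, labels)
print(mem(feats).shape)  # torch.Size([8, 1024])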
Collapse
|
42
|
Liu Z, Wang H, Chen W, Wang L, Li T. Bilateral discriminative autoencoder model orienting co-representation learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
|
43
|
Progressive Training Technique with Weak-Label Boosting for Fine-Grained Classification on Unbalanced Training Data. ELECTRONICS 2022. [DOI: 10.3390/electronics11111684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
In practical classification tasks, the sample distribution of a dataset is often unbalanced; for example, a dataset may contain a massive quantity of samples with weak labels for which concrete identification is unavailable. Even among samples with exact labels, many labels correspond to only a few samples, making it difficult to learn the concepts from so few labeled examples. In addition, there is always small inter-class variance and large intra-class variance among categories. Weak labels, few-shot problems, and fine-grained analysis are thus the key challenges affecting the performance of a classification model. In this paper, we develop a progressive training technique to address the few-shot challenge, along with a weak-label boosting method that treats all weak IDs as negative samples of every predefined ID in order to take full advantage of the more numerous weak-label data. We introduce an instance-aware hard ID mining strategy in the classification loss and further develop global and local feature-mapping losses to expand the decision margin. We entered the proposed method into the Kaggle competition that aims to build an algorithm to identify individual humpback whales in images; with a few other common training tricks, the proposed approach won first place. All three problems (weak labels, few-shot problems, and fine-grained analysis) exist in the competition dataset. Additionally, we applied our method to CUB-2011 and Cars-196, the most widely used datasets for fine-grained visual categorization, achieving accuracies of 90.1% and 94.9%, respectively. These experiments show that the proposed method outperforms other common baselines, verifying its effectiveness. Our solution has been made available as an open-source project.
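The weak-label boosting idea (every weak-ID sample acts as a negative for every predefined ID) maps naturally onto per-ID binary classification. Below is a hedged sketch; the convention that labels == -1 marks weak-label samples is ours, not the paper's.

import torch
import torch.nn.functional as F

def weak_label_boosting_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Hedged sketch: samples with unknown concrete ID (labels == -1,
    our convention) are negatives for every predefined ID."""
    targets = torch.zeros_like(logits)       # (B, num_ids)
    exact = labels >= 0
    targets[exact, labels[exact]] = 1.0      # one-hot rows for exact IDs;
                                             # weak-label rows stay all-zero
    return F.binary_cross_entropy_with_logits(logits, targets)

logits = torch.randn(4, 10)
labels = torch.tensor([3, -1, 7, -1])        # -1 marks weak-label samples
print(weak_label_boosting_loss(logits, labels))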
Collapse
|
44
|
Grouping Bilinear Pooling for Fine-Grained Image Classification. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12105063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Fine-grained image classification is a challenging computer vision task due to small inter-class variations and large intra-class variations. Extracting an expressive feature representation is an effective way to improve classification accuracy, and bilinear pooling is a simple and effective high-order feature interaction method for doing so: compared with common pooling methods, it obtains a better feature representation by capturing complex associations between high-order features. However, the dimensionality of the bilinear representation often reaches hundreds of thousands or even millions. To obtain a compact bilinear representation, we propose grouping bilinear pooling (GBP) for fine-grained image classification, which divides the feature channels into different groups and then carries out intra-group or inter-group bilinear pooling. The representation captured by GBP achieves the same accuracy with less than 0.4% of the parameters of the full bilinear representation when using the same backbone. This extremely compact representation largely eliminates the high redundancy, computational cost, and storage consumption of the full bilinear representation. Moreover, because GBP compresses the bilinear representation to this extreme, it can be used with more powerful backbones as a plug-and-play module. The effectiveness of GBP is demonstrated by experiments on the widely used fine-grained recognition datasets CUB and Stanford Cars.
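The intra-group variant of GBP is straightforward to sketch: split the C channels into g groups and take the outer-product (bilinear) pooling within each group, shrinking the usual C*C descriptor to g*(C/g)^2 dimensions. The function below is our hedged illustration; the signed-sqrt and L2 normalization steps are the standard post-processing used with bilinear features, assumed rather than quoted from the paper.

import torch
import torch.nn.functional as F

def grouping_bilinear_pooling(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Hedged sketch of intra-group bilinear pooling (GBP)."""
    B, C, H, W = x.shape
    g, cg = groups, C // groups
    x = x.reshape(B, g, cg, H * W)
    # Per-group outer products, averaged over spatial locations: (B, g, cg, cg).
    z = torch.einsum('bgch,bgdh->bgcd', x, x) / (H * W)
    z = z.reshape(B, -1)
    # Standard signed sqrt + L2 normalization for bilinear features.
    z = torch.sign(z) * torch.sqrt(z.abs() + 1e-12)
    return F.normalize(z, dim=1)

feat = torch.randn(2, 512, 14, 14)
print(grouping_bilinear_pooling(feat, groups=32).shape)  # (2, 32*16*16) = (2, 8192)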
Collapse
|
45
|
Chengcheng H, Jian Y, Xiao Q. Research and Application of Fine-Grained Image Classification Based on Small Collar Dataset. Front Comput Neurosci 2022; 15:766284. [PMID: 35480229 PMCID: PMC9035927 DOI: 10.3389/fncom.2021.766284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2021] [Accepted: 11/29/2021] [Indexed: 12/01/2022] Open
Abstract
With the rapid development of apparel e-commerce, the variety of apparel is increasing, and classifying apparel by its collar design is becoming more and more important. Traditional image processing methods have struggled to cope with increasingly complex image backgrounds. To solve this problem, an EMRes-50 classification algorithm, designed on the ECA-ResNet50 model combined with the MC-Loss loss function, is proposed for garment collar image classification. Applied to the Coller-6 dataset, the algorithm achieved a classification accuracy of 73.6%; to further verify its effectiveness, it was applied to the DeepFashion-6 dataset, where it achieved 86.09%. The experimental results show that the improved model is more accurate than existing CNN models and has stronger feature extraction ability, which helps address the difficulty of fine-grained collar classification and promotes the further development of clothing product image classification.
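The ECA building block behind the ECA-ResNet50 backbone named here is well known: a channel descriptor from global average pooling, a 1D convolution for local cross-channel interaction, and a sigmoid gate. Below is a minimal restatement of that block, with the kernel size fixed to 3 instead of ECA's adaptive choice; treat it as a hedged sketch, not the paper's implementation.

import torch
import torch.nn as nn

class ECA(nn.Module):
    """Minimal Efficient Channel Attention block (kernel fixed to 3)."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> channel descriptor via global average pooling.
        y = x.mean(dim=(2, 3))                    # (B, C)
        # Local cross-channel interaction with a 1D convolution.
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # (B, C)
        return x * torch.sigmoid(y)[:, :, None, None]

print(ECA()(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])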
Collapse
Affiliation(s)
- Huang Chengcheng
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
- Guangxi International Business Vocational College, Nanning, China
| | - Yuan Jian
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
- Guangxi University for Nationalities, Nanning, China
- *Correspondence: Yuan Jian
| | - Qin Xiao
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
| |
Collapse
|
46
|
Abstract
For the quality inspection of brown rice, the segmentation of connected grains and the identification of germ integrity are both very important, yet traditional algorithms struggle to achieve good segmentation and recognition results. This paper improves a background-skeleton-based brown rice (BR) segmentation algorithm: candidate matching points are obtained from the background skeleton, and the optimal matching points are found with an ant colony algorithm. Experimental results show that the proposed segmentation algorithm achieves 96% accuracy, indicating that it effectively suppresses interference from the endosperm surface. After segmentation, germ integrity is identified. First, a convolutional neural network (CNN) is built to identify the germ direction; the germ direction is then normalized; finally, an improved Inception-v3 network is built to identify germ integrity. On top of the Inception-v3 network, additional branches are added to improve the detection accuracy of small objects, and mutual-channel loss and mlpconv are added so the model can better approximate the abstraction of the latent space. The experimental results show that the comprehensive recognition accuracy of the proposed algorithm reaches 94.83%, significantly higher than current mainstream recognition algorithms.
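The overall pipeline (segment connected grains, normalize germ direction, then score germ integrity) can be sketched at a high level. Every callable below is a stand-in we invented for components that are not available; the toy lambdas only exist so the sketch runs end to end.

import torch

def inspect_brown_rice(image, segmenter, direction_cnn, integrity_net):
    """Hedged pipeline sketch: all three callables are hypothetical
    stand-ins (skeleton + ant-colony segmentation, germ-direction CNN,
    improved Inception-v3 integrity classifier)."""
    grains = segmenter(image)                # split connected grains
    results = []
    for g in grains:
        step = direction_cnn(g)              # germ direction as 90-degree steps
        g = torch.rot90(g, k=int(step) % 4, dims=(-2, -1))  # normalize pose
        results.append(integrity_net(g))     # germ-integrity score
    return results

# Toy stand-ins so the sketch executes:
segmenter = lambda img: [img]                       # pretend one grain found
direction_cnn = lambda g: torch.tensor(1)           # pretend a 90-degree pose
integrity_net = lambda g: torch.sigmoid(g.mean())   # pretend integrity score
print(inspect_brown_rice(torch.randn(1, 64, 64), segmenter, direction_cnn, integrity_net))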
Collapse
|
47
|
Li M, Zhou G, Cai W, Li J, Li M, He M, Hu Y, Li L. Multi-scale Sparse Network with Cross-Attention Mechanism for image-based butterflies fine-grained classification. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108419] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
48
|
Yu J, Li K, Peng J. Reference-guided face inpainting with reference attention network. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-06961-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
49
|
Mishra P, Kumar S, Chaube MK. Classifying Chart Based on Structural Dissimilarities using Improved Regularized Loss Function. Neural Process Lett 2022. [DOI: 10.1007/s11063-021-10735-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
50
|
Liao Q, Wang D, Xu M. Category attention transfer for efficient fine-grained visual categorization. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2021.11.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|