1. Wang W, Chang F, Liu C, Wang B, Liu Z. TODO-Net: Temporally Observed Domain Contrastive Network for 3-D Early Action Prediction. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6122-6133. [PMID: 38743544] [DOI: 10.1109/tnnls.2024.3394254]
Abstract
Early action prediction, which aims to recognize an action's class before the action is fully conveyed, is a very challenging task owing to the insufficient discriminative information caused by the domain gaps among different temporally observed domains. Most existing approaches focus on using the fully observed temporal domains to "guide" the partially observed domains while ignoring the discrepancies between the harder low-observed temporal domains and the easier highly observed temporal domains. Recognition models tend to learn the easier samples from the highly observed temporal domains, which may lead to significant performance drops on the low-observed temporal domains. Therefore, in this article, we propose a novel temporally observed domain contrastive network, namely TODO-Net, which explicitly mines discriminative information from the hard action samples of the low-observed temporal domains by mitigating the domain gaps among the various temporally observed domains for 3-D early action prediction. More specifically, the proposed TODO-Net mines the relationship between the low-observed sequences and all the highly observed sequences belonging to the same action category to boost the recognition performance on hard samples with fewer observed frames. We also introduce a temporal domain conditioned supervised contrastive (TD-conditioned SupCon) learning scheme that empowers TODO-Net to minimize the gaps between temporal domains within the same action category while pushing apart temporal domains belonging to different action classes. We conduct extensive experiments on two public 3-D skeleton-based activity datasets, and the results show the efficacy of the proposed TODO-Net.
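The TD-conditioned SupCon scheme builds on the general supervised contrastive recipe, in which all same-class samples in a batch act as positives for one another; in the TODO-Net setting a batch would mix low- and highly-observed sequences of the same actions. Below is a minimal NumPy sketch of such a loss, assuming L2-normalized embeddings and integer class labels; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    z: (N, D) array of L2-normalized embeddings.
    labels: (N,) integer action classes.
    """
    n = z.shape[0]
    sim = z @ z.T / temperature                      # pairwise similarities
    # exclude self-similarity from the softmax denominator
    mask_self = ~np.eye(n, dtype=bool)
    log_prob = sim - np.log((np.exp(sim) * mask_self).sum(axis=1, keepdims=True))
    # positives: same class, not self
    pos = (labels[:, None] == labels[None, :]) & mask_self
    # average log-probability of the positives for each anchor
    loss = -(log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return loss.mean()
```

The loss shrinks as same-class embeddings (here, sequences from different temporal observation ratios) move together and different classes move apart, which is the behavior the abstract describes.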
2. Wang W, Li H, Wang C, Huang C, Ding Z, Nie F, Cao X. Deep Label Propagation with Nuclear Norm Maximization for Visual Domain Adaptation. IEEE Transactions on Image Processing 2025; PP:1246-1258. [PMID: 40031314] [DOI: 10.1109/tip.2025.3533199]
Abstract
Domain adaptation aims to transfer abundant label information from a source domain to an unlabeled target domain whose distribution differs from the source. Existing methods usually rely on a classifier to generate high-quality pseudo-labels for the target domain, facilitating the learning of discriminative features. Label propagation (LP), an effective classifier, propagates labels from the source domain to the target domain by designing a smooth function over a similarity graph, which represents the structural relationships among data points in feature space. However, LP has not been thoroughly explored in deep neural network-based domain adaptation approaches. Additionally, the probability labels generated by LP have low confidence, and LP is sensitive to the class-imbalance problem. To address these problems, we propose a novel approach for domain adaptation named deep label propagation with nuclear norm maximization (DLP-NNM). Specifically, we employ a nuclear norm maximization constraint to enhance both label confidence and class diversity in LP, and we propose an efficient algorithm to solve the corresponding optimization problem. Subsequently, we utilize the proposed LP to guide the classifier layer of a deep discriminative adaptation network via the cross-entropy loss. As such, the network produces more reliable predictions for the target domain, thereby facilitating more effective discriminative feature learning. Extensive experimental results on three cross-domain benchmark datasets demonstrate that the proposed DLP-NNM surpasses existing state-of-the-art domain adaptation approaches.
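The label propagation component that this line of work builds on has a classic closed form: the soft label matrix is F = (I - alpha*S)^{-1} Y over a normalized similarity graph S. The sketch below shows only this standard LP step, assuming a Gaussian similarity graph; the nuclear-norm maximization constraint and the deep network are omitted, and all names are illustrative.

```python
import numpy as np

def label_propagation(X, Y, alpha=0.99, sigma=1.0):
    """Closed-form label propagation over a Gaussian similarity graph.

    X: (N, D) features from both domains.
    Y: (N, C) one-hot labels, with all-zero rows for unlabeled target points.
    Returns soft label scores F = (I - alpha*S)^{-1} Y.
    """
    # pairwise squared distances -> Gaussian affinities
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                      # no self-loops
    Dinv = 1.0 / np.sqrt(W.sum(axis=1))
    S = Dinv[:, None] * W * Dinv[None, :]         # symmetric normalization
    F = np.linalg.solve(np.eye(len(X)) - alpha * S, Y)
    return F
```

Unlabeled points then take the argmax of their row of F as a pseudo-label; the abstract's point is that these raw LP scores tend to be low-confidence and class-imbalanced, which is what the nuclear-norm constraint is meant to correct.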
3. Li L, Ding W, Huang L, Zhuang X, Grau V. Multi-modality cardiac image computing: A survey. Med Image Anal 2023; 88:102869. [PMID: 37384950] [DOI: 10.1016/j.media.2023.102869]
Abstract
Multi-modality cardiac imaging plays a key role in the management of patients with cardiovascular diseases. It allows a combination of complementary anatomical, morphological, and functional information, increases diagnostic accuracy, and improves the efficacy of cardiovascular interventions and clinical outcomes. Fully automated processing and quantitative analysis of multi-modality cardiac images could have a direct impact on clinical research and evidence-based patient management. However, these require overcoming significant challenges, including inter-modality misalignment and finding optimal methods to integrate information from different modalities. This paper provides a comprehensive review of multi-modality imaging in cardiology, covering computing methods, validation strategies, related clinical workflows, and future perspectives. For the computing methodologies, we focus on three tasks, i.e., registration, fusion, and segmentation, which generally involve multi-modality imaging data, either combining information from different modalities or transferring information across modalities. The review highlights that multi-modality cardiac imaging data have the potential for wide clinical applicability, for example in trans-aortic valve implantation guidance, myocardial viability assessment, and catheter ablation therapy and its patient selection. Nevertheless, many challenges remain unsolved, such as missing modalities, modality selection, the combination of imaging and non-imaging data, and uniform analysis and representation of different modalities. There is also work to be done in defining how well-developed techniques fit into clinical workflows and how much additional, relevant information they introduce. These problems are likely to remain an active field of research, with questions still to be answered in the future.
Affiliation(s)
- Lei Li: Department of Engineering Science, University of Oxford, Oxford, UK
- Wangbin Ding: College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
- Liqin Huang: College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
- Xiahai Zhuang: School of Data Science, Fudan University, Shanghai, China
- Vicente Grau: Department of Engineering Science, University of Oxford, Oxford, UK
4. Li L, Yang J, Ma Y, Kong X. Pseudo-labeling Integrating Centers and Samples with Consistent Selection Mechanism for Unsupervised Domain Adaptation. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.109]
5. Moradi M, Hamidzadeh J. A domain adaptation method by incorporating belief function in twin quarter-sphere SVM. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-023-01857-y]
6. Dey S, Sahidullah M, Saha G. Cross-corpora spoken language identification with domain diversification and generalization. Comput Speech Lang 2023. [DOI: 10.1016/j.csl.2023.101489]
7. Zhao Q, Wu H, Zhu J. Margin-Based Modal Adaptive Learning for Visible-Infrared Person Re-Identification. Sensors (Basel) 2023; 23:1426. [PMID: 36772466] [PMCID: PMC9921303] [DOI: 10.3390/s23031426]
Abstract
Visible-infrared person re-identification (VIPR) has great potential for the intelligent transportation systems of smart cities, but it is challenging owing to the huge modal discrepancy between visible and infrared images. Although visible and infrared data can be regarded as two domains, VIPR is not identical to domain adaptation, which would massively eliminate modal discrepancies. Because VIPR has complete identity information in both the visible and infrared modalities, once domain adaptation is overemphasized, the discriminative appearance information in the visible and infrared domains would be drained. For this reason, we propose a novel margin-based modal adaptive learning (MMAL) method for VIPR in this paper. On each domain, we apply triplet and label-smoothing cross-entropy functions to learn appearance-discriminative features. Between the two domains, we design a simple yet effective marginal maximum mean discrepancy (M3D) loss function to avoid excessive suppression of modal discrepancies and thereby protect the features' discriminative ability on each domain. As a result, our MMAL method learns modal-invariant yet appearance-discriminative features that improve VIPR. The experimental results show that our MMAL method achieves state-of-the-art VIPR performance; e.g., on the RegDB dataset in the visible-to-infrared retrieval mode, the rank-1 accuracy is 93.24% and the mean average precision is 83.77%.
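A margin-based discrepancy loss of the kind the M3D function describes can be sketched with a standard RBF-kernel MMD estimator that is penalized only beyond a margin, so the two modalities are pulled together without being forced to collapse onto each other. The kernel choice, margin value, and names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Biased estimator of squared MMD between two samples, RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def marginal_mmd_loss(feat_visible, feat_infrared, margin=0.1, gamma=1.0):
    """Penalize the modal discrepancy only beyond a margin: once the two
    modalities are close enough, the loss stops pushing them together."""
    return max(0.0, rbf_mmd2(feat_visible, feat_infrared, gamma) - margin)
```

The margin is what distinguishes this from plain MMD alignment: below the margin the gradient is zero, which matches the abstract's goal of avoiding excessive suppression of modal discrepancies.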
Affiliation(s)
- Qianqian Zhao: College of Information Science and Engineering, Huaqiao University, Xiamen 361021, China
- Hanxiao Wu: College of Information Science and Engineering, Huaqiao University, Xiamen 361021, China
- Jianqing Zhu (corresponding author): College of Engineering, Huaqiao University, Quanzhou 362021, China; Xiamen Yealink Network Technology Company Limited, Xiamen 361015, China
8. Mukherjee S, Sarkar R, Manich M, Labruyere E, Olivo-Marin JC. Domain Adapted Multitask Learning for Segmenting Amoeboid Cells in Microscopy. IEEE Transactions on Medical Imaging 2023; 42:42-54. [PMID: 36044485] [DOI: 10.1109/tmi.2022.3203022]
Abstract
The method proposed in this paper is a robust combination of multi-task learning and unsupervised domain adaptation for segmenting amoeboid cells in microscopy. A highlight of this work is the manner in which the model's hyperparameters are estimated. The detriments of ad hoc parameter estimation are well known, but this issue remains largely unaddressed in the context of CNN-based segmentation. Using a novel min-max formulation of the segmentation cost function, our proposed method analytically estimates the model's hyperparameters while simultaneously learning the CNN weights during training. This end-to-end framework provides a consolidated mechanism to harness the potential of multi-task learning to isolate and segment clustered cells from low-contrast brightfield images, and it simultaneously leverages deep domain adaptation to segment fluorescent cells without explicit pixel-level re-annotation of the data. Experimental validation on multi-cellular images strongly suggests the effectiveness of the proposed technique, and our quantitative results show at least 15% and 10% improvement in cell segmentation on brightfield and fluorescence images, respectively, compared to contemporary supervised segmentation methods.
9. Wang W, Shen Z, Li D, Zhong P, Chen Y. Probability-Based Graph Embedding Cross-Domain and Class Discriminative Feature Learning for Domain Adaptation. IEEE Transactions on Image Processing 2022; 32:72-87. [PMID: 37015526] [DOI: 10.1109/tip.2022.3226405]
Abstract
Feature-based domain adaptation methods project samples from different domains into the same feature space and try to align the distributions of the two domains to learn an effective transferable model. The vital problem is how to reduce the domain shift while improving the discriminability of features. To address these issues, we propose a unified Probability-based Graph embedding Cross-domain and class Discriminative feature learning framework for unsupervised domain adaptation (PGCD). Specifically, we propose novel graph embedding structures as the class-discriminative transfer feature learning item and the cross-domain alignment item, which make same-category samples compact in each domain and fully align the local and global geometric structure across domains. Besides, two theoretical analyses are given to prove the interpretability of the proposed graph structures, which further describe the sample-to-sample relationships in single-domain and cross-domain transfer feature learning scenarios. Moreover, we adopt novel probability-based weight strategies to generate robust centroids in each proposed item, enhancing the accuracy of transfer feature learning and reducing error accumulation. In comprehensive experiments against advanced approaches, the promising performance on benchmark datasets verifies the effectiveness of the proposed model.
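The probability-weighted robust-centroid idea can be illustrated in a few lines: each class centroid is a soft-prediction-weighted average of the features, so low-confidence samples contribute little and noisy pseudo-labels accumulate less error. This is a hypothetical sketch of that general idea, not the paper's exact weighting strategy.

```python
import numpy as np

def weighted_centroids(X, probs):
    """Class centroids weighted by predicted class probabilities.

    X: (N, D) features; probs: (N, C) soft class predictions.
    Returns (C, D) centroids; uncertain samples contribute less mass,
    which damps error accumulation from noisy pseudo-labels.
    """
    # normalize each class column so the per-class weights sum to 1
    weights = probs / probs.sum(axis=0, keepdims=True)
    return weights.T @ X
```

With hard one-hot `probs` this reduces to ordinary class means; soft predictions shade the centroids toward confidently classified samples.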
10. Luo L, Chen L, Hu S. Attention Regularized Laplace Graph for Domain Adaptation. IEEE Transactions on Image Processing 2022; 31:7322-7337. [PMID: 36306308] [DOI: 10.1109/tip.2022.3216781]
Abstract
In leveraging manifold learning for domain adaptation (DA), graph embedding-based DA methods have shown their effectiveness in preserving the data manifold through the Laplace graph. However, current graph embedding DA methods suffer from two issues: 1) they are concerned only with preserving the underlying data structures in the embedding and ignore sub-domain adaptation, which requires taking into account intra-class similarity and inter-class dissimilarity, thereby leading to negative transfer; 2) manifold learning is carried out across different feature/label spaces separately, thereby hindering unified, comprehensive manifold learning. In this paper, starting from our previous DGA-DA, we propose a novel DA method, namely Attention Regularized Laplace Graph-based Domain Adaptation (ARG-DA), to remedy the aforementioned issues. Specifically, by weighting the importance across different sub-domain adaptation tasks, we propose the Attention Regularized Laplace Graph for class-aware DA, thereby generating attention-regularized DA. Furthermore, using a specifically designed FEEL strategy, our approach dynamically unifies the alignment of manifold structures across different feature/label spaces, thus leading to comprehensive manifold learning. Comprehensive experiments verify the effectiveness of the proposed method, which consistently outperforms state-of-the-art DA methods on 7 standard DA benchmarks, i.e., 37 cross-domain image classification tasks including object, face, and digit images. An in-depth analysis of the proposed method is also provided, covering sensitivity, convergence, and robustness.
11. Liu H, Zhuang Y, Song E, Xu X, Hung CC. A bidirectional multilayer contrastive adaptation network with anatomical structure preservation for unpaired cross-modality medical image segmentation. Comput Biol Med 2022; 149:105964. [PMID: 36007288] [DOI: 10.1016/j.compbiomed.2022.105964]
Abstract
Multi-modal medical image segmentation has achieved great success with supervised deep learning networks. However, because of domain shift and limited annotation information, unpaired cross-modality segmentation tasks remain challenging. Unsupervised domain adaptation (UDA) methods can alleviate the segmentation degradation of cross-modality segmentation through knowledge transfer between different domains, but current methods still suffer from model collapse, adversarial training instability, and mismatch of anatomical structures. To tackle these issues, we propose a bidirectional multilayer contrastive adaptation network (BMCAN) for unpaired cross-modality segmentation. A shared encoder is first adopted to learn modality-invariant encoding representations for image synthesis and segmentation simultaneously. Secondly, to retain anatomical structure consistency in cross-modality image synthesis, we present a structure-constrained cross-modality image translation approach for image alignment. Thirdly, we construct a bidirectional multilayer contrastive learning approach to preserve anatomical structures and enhance encoding representations, which utilizes two groups of domain-specific multilayer perceptron (MLP) networks to learn modality-specific features. Finally, a semantic information adversarial learning approach is designed to learn structural similarities of semantic outputs for output-space alignment. Our proposed method was tested on three different cross-modality segmentation tasks: brain tissue, brain tumor, and cardiac substructure segmentation. Compared with other UDA methods, experimental results show that our proposed BMCAN achieves state-of-the-art segmentation performance on the above three tasks, with fewer training components and better feature representations for overcoming overfitting and domain shift. Our proposed method can efficiently reduce the annotation burden of radiologists in cross-modality image analysis.
Affiliation(s)
- Hong Liu: Center for Biomedical Imaging and Bioinformatics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Yuzhou Zhuang: Institute of Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, 430074, China
- Enmin Song: Center for Biomedical Imaging and Bioinformatics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Xiangyang Xu: Center for Biomedical Imaging and Bioinformatics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Chih-Cheng Hung: Center for Machine Vision and Security Research, Kennesaw State University, Marietta, GA, 30060, USA
12. Unsupervised domain adaptation via discriminative feature learning and classifier adaptation from center-based distances. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109022]
13. Discriminative transfer feature learning based on robust-centers. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.042]
14. Wang M, Deng W. Adaptive Face Recognition Using Adversarial Information Network. IEEE Transactions on Image Processing 2022; 31:4909-4921. [PMID: 35839179] [DOI: 10.1109/tip.2022.3189830]
Abstract
In many real-world applications, face recognition models degenerate when the training data (referred to as the source domain) differ from the testing data (referred to as the target domain). To alleviate this mismatch, caused by factors such as pose and skin tone, utilizing pseudo-labels generated by clustering algorithms is an effective approach in unsupervised domain adaptation. However, such pseudo-labels always miss some hard positive samples. Supervision on pseudo-labeled samples attracts them toward their prototypes and causes an intra-domain gap between pseudo-labeled samples and the remaining unlabeled samples within the target domain, which results in a lack of discrimination in face recognition. In this paper, considering the particularity of face recognition, we propose a novel adversarial information network (AIN) to address this issue. First, a novel adversarial mutual information (MI) loss is proposed to alternately minimize MI with respect to the target classifier and maximize MI with respect to the feature extractor. In this min-max manner, the positions of target prototypes are adaptively modified so that unlabeled images cluster more easily and the intra-domain gap is mitigated. Second, to assist the adversarial MI loss, we utilize a graph convolution network to predict linkage likelihoods between target data and generate pseudo-labels. It leverages valuable information in the context of nodes and achieves more reliable results. The proposed method is evaluated under two scenarios, i.e., domain adaptation across poses and image conditions, and domain adaptation across faces with different skin tones. Extensive experiments show that AIN successfully improves cross-domain generalization and offers a new state of the art on the RFW dataset.
15. Zhao X, Stanislawski R, Gardoni P, Sulowicz M, Glowacz A, Krolczyk G, Li Z. Adaptive Contrastive Learning with Label Consistency for Source Data Free Unsupervised Domain Adaptation. Sensors (Basel) 2022; 22:4238. [PMID: 35684857] [PMCID: PMC9185254] [DOI: 10.3390/s22114238]
Abstract
Unsupervised domain adaptation, which aims to alleviate the domain shift between a source domain and a target domain, has attracted extensive research interest; however, source-domain data are often unavailable in practical application scenarios owing to privacy and intellectual-property concerns. In this paper, we therefore address the more challenging and practical source-free unsupervised domain adaptation problem, which requires adapting a source-domain model to the target domain without the aid of source-domain data. We propose label consistent contrastive learning (LCCL), an adaptive contrastive learning framework for source-free unsupervised domain adaptation that encourages target-domain samples to learn class-level discriminative features. Considering that the source-domain data are unavailable, we introduce a memory bank to store samples with the same pseudo-label output and samples obtained by clustering, so that trusted historical samples are involved in contrastive learning. In addition, we demonstrate that LCCL is a general framework that can also be applied to standard unsupervised domain adaptation. Extensive experiments on digit recognition and image classification benchmark datasets demonstrate the effectiveness of the proposed method.
Affiliation(s)
- Xuejun Zhao: CRRC Academy Co., Ltd., Beijing 100070, China
- Rafal Stanislawski: Department of Electrical, Control and Computer Engineering, Opole University of Technology, 45-758 Opole, Poland
- Paolo Gardoni: Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Maciej Sulowicz: Department of Electrical Engineering, Cracow University of Technology, 31-155 Cracow, Poland
- Adam Glowacz: Department of Electrical Engineering, Cracow University of Technology, 31-155 Cracow, Poland
- Grzegorz Krolczyk: Faculty of Mechanical Engineering, Opole University of Technology, 45-758 Opole, Poland
- Zhixiong Li: Faculty of Mechanical Engineering, Opole University of Technology, 45-758 Opole, Poland; Yonsei Frontier Lab, Yonsei University, Seoul 03722, Korea
16. Wang M, Deng W, Liu CL. Unsupervised Structure-Texture Separation Network for Oracle Character Recognition. IEEE Transactions on Image Processing 2022; 31:3137-3150. [PMID: 35420984] [DOI: 10.1109/tip.2022.3165989]
Abstract
Oracle bone script is the earliest known Chinese writing system, dating from the Shang dynasty, and is precious to archeology and philology. However, real-world scanned oracle data are rare and few experts are available for annotation, which makes the automatic recognition of scanned oracle characters a challenging task. We therefore explore unsupervised domain adaptation to transfer knowledge from handprinted oracle data, which are easy to acquire, to the scanned domain. We propose a structure-texture separation network (STSN), an end-to-end learning framework for joint disentanglement, transformation, adaptation, and recognition. First, STSN disentangles features into structure (glyph) and texture (noise) components with generative models, and then aligns handprinted and scanned data in the structure feature space, so that the negative influence of severe noise can be avoided during adaptation. Second, transformation is achieved by swapping the learned textures across domains, and a classifier is trained to predict the labels of the transformed scanned characters. This not only guarantees absolute separation but also enhances the discriminative ability of the learned features. Extensive experiments on the Oracle-241 dataset show that STSN outperforms other adaptation methods and successfully improves recognition performance on scanned data, even when they are contaminated by long burial and careless excavation.
17. Hedegaard L, Sheikh-Omar OA, Iosifidis A. Supervised Domain Adaptation: A Graph Embedding Perspective and a Rectified Experimental Protocol. IEEE Transactions on Image Processing 2021; 30:8619-8631. [PMID: 34648445] [DOI: 10.1109/tip.2021.3118978]
Abstract
Domain Adaptation is the process of alleviating distribution gaps between data from different domains. In this paper, we show that Domain Adaptation methods using pair-wise relationships between source and target domain data can be formulated as a Graph Embedding in which the domain labels are incorporated into the structure of the intrinsic and penalty graphs. Specifically, we analyse the loss functions of three existing state-of-the-art Supervised Domain Adaptation methods and demonstrate that they perform Graph Embedding. Moreover, we highlight some generalisation and reproducibility issues related to the experimental setup commonly used to demonstrate the few-shot learning capabilities of these methods. To assess and compare Supervised Domain Adaptation methods accurately, we propose a rectified evaluation protocol, and report updated benchmarks on the standard datasets Office31 (Amazon, DSLR, and Webcam), Digits (MNIST, USPS, SVHN, and MNIST-M) and VisDA (Synthetic, Real).
18. Feng Z, Xu C, Tao D. Open-Set Hypothesis Transfer With Semantic Consistency. IEEE Transactions on Image Processing 2021; 30:6473-6484. [PMID: 34224354] [DOI: 10.1109/tip.2021.3093393]
Abstract
Unsupervised open-set domain adaptation (UODA) is a realistic problem where unlabeled target data contain unknown classes. Prior methods rely on the coexistence of both source and target domain data to perform domain alignment, which greatly limits their applications when source domain data are restricted due to privacy concerns. In this paper we address the challenging hypothesis transfer setting for UODA, where data from source domain are no longer available during adaptation on target domain. Specifically, we propose to use pseudo-labels and a novel consistency regularization on target data, where using conventional formulations fails in this open-set setting. Firstly, our method discovers confident predictions on target domain and performs classification with pseudo-labels. Then we enforce the model to output consistent and definite predictions on semantically similar transformed inputs, discovering all latent class semantics. As a result, unlabeled data can be classified into discriminative classes coincided with either source classes or unknown classes. We theoretically prove that under perfect semantic transformation, the proposed objective that enforces consistency can recover the information of true labels in prediction. Experimental results show that our model outperforms state-of-the-art methods on UODA benchmarks.
19. Zhou H, Azzam M, Zhong J, Liu C, Wu S, Wong HS. Knowledge Exchange Between Domain-Adversarial and Private Networks Improves Open Set Image Classification. IEEE Transactions on Image Processing 2021; 30:5807-5818. [PMID: 34138710] [DOI: 10.1109/tip.2021.3088642]
Abstract
Both target-specific and domain-invariant features can facilitate Open Set Domain Adaptation (OSDA). To exploit these features, we propose a Knowledge Exchange (KnowEx) model which jointly trains two complementary constituent networks: (1) a Domain-Adversarial Network (DAdvNet) learning the domain-invariant representation, through which the supervision in source domain can be exploited to infer the class information of unlabeled target data; (2) a Private Network (PrivNet) exclusive for target domain, which is beneficial for discriminating between instances from known and unknown classes. The two constituent networks exchange training experience in the learning process. Toward this end, we exploit an adversarial perturbation process against DAdvNet to regularize PrivNet. This enhances the complementarity between the two networks. At the same time, we incorporate an adaptation layer into DAdvNet to address the unreliability of the PrivNet's experience. Therefore, DAdvNet and PrivNet are able to mutually reinforce each other during training. We have conducted thorough experiments on multiple standard benchmarks to verify the effectiveness and superiority of KnowEx in OSDA.
|
20
|
Han C, Zhou D, Xie Y, Lei Y, Shi J. Label propagation with multi-stage inference for visual domain adaptation. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
21
|
Jiao Y, Yao H, Xu C. SAN: Selective Alignment Network for Cross-Domain Pedestrian Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2155-2167. [PMID: 33471752 DOI: 10.1109/tip.2021.3049948] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Cross-domain pedestrian detection, which has been attracting much attention, assumes that the training and test images are drawn from different data distributions. Existing methods focus on aligning the descriptions of whole candidate instances between source and target domains. Since there are substantial visual differences among candidate instances, aligning whole candidate instances between two domains cannot overcome the inter-instance difference. Compared with aligning whole candidate instances, we consider that aligning each type of instance separately is more reasonable. Therefore, we propose a novel Selective Alignment Network for cross-domain pedestrian detection, which consists of three components: a Base Detector, an Image-Level Adaptation Network, and an Instance-Level Adaptation Network. The Image-Level Adaptation Network and Instance-Level Adaptation Network can be regarded as global-level and local-level alignments, respectively. Similar to Faster R-CNN, the Base Detector, which is composed of a Feature module, an RPN module, and a Detection module, is used to infer a robust pedestrian detector with the annotated source data. Once the image description has been extracted by the Feature module, the Image-Level Adaptation Network aligns it with an adversarial domain classifier. Given the candidate proposals generated by the RPN module, the Instance-Level Adaptation Network first clusters the source candidate proposals into several groups according to their visual features, thus generating a pseudo label for each candidate proposal. After generating the pseudo labels, we align the source and target domains by iteratively maximizing and minimizing the discrepancy between the predictions of two classifiers. Extensive evaluations on several benchmarks demonstrate the effectiveness of the proposed approach for cross-domain pedestrian detection.
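The instance-level grouping step, clustering source proposals by visual features and using cluster indices as pseudo labels, can be sketched with a toy k-means. This is a sketch of the idea only; the paper's actual feature extraction, classifier pair, and alignment losses are not reproduced.

```python
import numpy as np

def cluster_proposals(features, k=3, iters=20, seed=0):
    """Toy k-means grouping of source proposal features; the resulting
    cluster index serves as a pseudo label for instance-level alignment."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each proposal to its nearest cluster center
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # update each non-empty cluster center to the mean of its members
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers
```

Each pseudo-labeled group can then be aligned separately across domains, which is the "selective" alignment the abstract argues for.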
|
22
|
Han C, Li X, Yang Z, Zhou D, Zhao Y, Kong W. Sample-Guided Adaptive Class Prototype for Visual Domain Adaptation. SENSORS (BASEL, SWITZERLAND) 2020; 20:E7036. [PMID: 33316906 PMCID: PMC7764304 DOI: 10.3390/s20247036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/14/2020] [Revised: 12/02/2020] [Accepted: 12/07/2020] [Indexed: 11/21/2022]
Abstract
Domain adaptation aims to handle the distribution mismatch between training and testing data, and has achieved dramatic progress in multi-sensor systems. Previous methods align the cross-domain distributions using statistics such as means and variances. Despite their appeal, such methods often fail to model the discriminative structures existing within testing samples. In this paper, we present a sample-guided adaptive class prototype method, which requires no explicit distribution matching. Specifically, two adaptive measures are proposed. First, we introduce a modified nearest class prototype, which allows more diversity within the same class while keeping most of the class-wise discriminative information. Second, we put forward an easy-to-hard testing scheme that takes into account the different difficulties of recognizing target samples: easy samples are classified first and then selected to assist the prediction of hard samples. Extensive experiments verify the effectiveness of the proposed method.
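The easy-to-hard prototype scheme sketched in the abstract can be illustrated as follows: classify targets by nearest class prototype, treat the highest-margin predictions as "easy", refine the prototypes with them, and relabel the rest. Function names, the margin heuristic, and the selection fraction are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def easy_to_hard_predict(Xs, ys, Xt, easy_frac=0.5):
    """Classify target samples by nearest class prototype, then refine the
    prototypes with the most confident ('easy') targets before labeling the rest."""
    classes = np.unique(ys)
    protos = np.stack([Xs[ys == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Xt[:, None] - protos[None], axis=2)
    pred = classes[d.argmin(axis=1)]
    # margin between the two nearest prototypes as a confidence score
    margin = np.sort(d, axis=1)[:, 1] - d.min(axis=1)
    easy = np.argsort(-margin)[: int(len(Xt) * easy_frac)]
    # absorb each class's easy targets into its prototype
    for i, c in enumerate(classes):
        sel = easy[pred[easy] == c]
        if len(sel):
            protos[i] = np.vstack([Xs[ys == c], Xt[sel]]).mean(axis=0)
    d = np.linalg.norm(Xt[:, None] - protos[None], axis=2)
    return classes[d.argmin(axis=1)]
```

The refined prototypes adapt toward the target distribution without any explicit statistic matching, which is the point of the "no distribution matching" strategy.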
Affiliation(s)
- Xiaoyang Li: School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China (C.H.; Z.Y.; D.Z.; Y.Z.; W.K.)
|
23
|
Jin X, Yang X, Fu B, Chen S. Joint distribution matching embedding for unsupervised domain adaptation. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.098] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
24
|
Chen S, Harandi M, Jin X, Yang X. Domain Adaptation by Joint Distribution Invariant Projections. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; PP:8264-8277. [PMID: 32755860 DOI: 10.1109/tip.2020.3013167] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Domain adaptation addresses the learning problem in which the training data are sampled from a source joint distribution (source domain) while the test data are sampled from a different target joint distribution (target domain). Because of this joint distribution mismatch, a discriminative classifier naively trained on the source domain often generalizes poorly to the target domain. In this paper, we therefore present a Joint Distribution Invariant Projections (JDIP) approach to solve this problem. The proposed approach exploits linear projections to directly match the source and target joint distributions under the L2-distance. Since traditional kernel density estimators for distribution estimation tend to become less reliable as the dimensionality increases, we propose a least-squares method to estimate the L2-distance without the need to estimate the two joint distributions, leading to a quadratic problem with an analytic solution. Furthermore, we introduce a kernel version of JDIP to account for inherent nonlinearity in the data. We show that the proposed learning problems can be naturally cast as optimization problems defined on the product of Riemannian manifolds. To be comprehensive, we also establish an error bound, theoretically explaining how our method works and contributes to reducing the target domain generalization error. Extensive empirical evidence demonstrates the benefits of our approach over state-of-the-art domain adaptation methods on several visual datasets.
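As intuition for matching distributions under the L2-distance, here is a kernel-based surrogate for the squared L2-distance between two sample sets (an MMD-style sketch with a Gaussian kernel). This is an illustrative assumption; JDIP's actual least-squares estimator and its Riemannian-manifold optimization are not reproduced here.

```python
import numpy as np

def gaussian_l2_discrepancy(X, Y, sigma=1.0):
    """Kernel-based discrepancy between the empirical distributions of X and Y.
    Zero when the two sample sets coincide; a projection could be chosen
    to minimize this quantity between projected source and target features."""
    def k(A, B):
        # mean Gaussian kernel value over all cross pairs
        d2 = ((A[:, None] - B[None]) ** 2).sum(axis=2)
        return np.exp(-d2 / (4 * sigma ** 2)).mean()
    return k(X, X) + k(Y, Y) - 2 * k(X, Y)
```

Minimizing such a discrepancy over a projection matrix is the flavor of the "directly match the joint distributions" objective the abstract describes.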
|