1
Chen S. Joint weight optimization for partial domain adaptation via kernel statistical distance estimation. Neural Netw 2024; 180:106739. PMID: 39299038. DOI: 10.1016/j.neunet.2024.106739.
Abstract
The goal of Partial Domain Adaptation (PDA) is to transfer a neural network from a source domain (joint source distribution) to a distinct target domain (joint target distribution), where the source label space subsumes the target label space. To address the PDA problem, existing works have proposed to learn marginal source weights that match the weighted marginal source distribution to the marginal target distribution. However, this is sub-optimal, since the neural network's target performance depends on the joint distribution disparity, not the marginal distribution disparity. In this paper, we propose a Joint Weight Optimization (JWO) approach that optimizes the joint source weights to match the weighted joint source distribution to the joint target distribution in the neural network's feature space. To measure the joint distribution disparity, we exploit two statistical distances: the distribution-difference-based L2-distance and the distribution-ratio-based χ2-divergence. Since these two distances are unknown in practice, we propose a Kernel Statistical Distance Estimation (KSDE) method to estimate them from the weighted source data and the target data. Our KSDE method explicitly expresses the two estimated statistical distances as functions of the joint source weights. Therefore, we can optimize the joint weights to minimize the estimated distance functions and reduce the joint distribution disparity. Finally, we achieve the PDA goal by training the neural network on the weighted source data. Experiments on several popular datasets demonstrate the effectiveness of our approach. An intro video and PyTorch code are available at https://github.com/sentaochen/Joint-Weight-Optimation. Interested readers can also visit https://github.com/sentaochen for more source code of the related domain adaptation, multi-source domain adaptation, and domain generalization approaches.
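The abstract does not spell out the KSDE estimator itself, so the following is only a minimal sketch of the general idea: write a kernel estimate of the distance between the weighted source sample and the target sample as an explicit function of the weights w, then minimize it under w ≥ 0 and mean(w) = 1. The L2/MMD-style objective, the Gaussian kernel, the softmax parameterization, and all names (`fit_source_weights`, `sigma`) are our assumptions, not the paper's method.

```python
import torch

def gaussian_gram(a, b, sigma=1.0):
    # Pairwise Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def fit_source_weights(zs, zt, sigma=1.0, steps=500, lr=0.05):
    """Minimize a kernel estimate of the squared distance between the
    weighted source sample zs and the target sample zt over the weights."""
    ns, nt = zs.shape[0], zt.shape[0]
    Kss = gaussian_gram(zs, zs, sigma)
    Kst = gaussian_gram(zs, zt, sigma)
    theta = torch.zeros(ns, requires_grad=True)  # w = ns * softmax(theta): w >= 0, mean(w) = 1
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        w = ns * torch.softmax(theta, dim=0)
        # Weight-dependent part of the estimate; the target-only term is constant in w.
        dist = (w @ Kss @ w) / ns**2 - 2 * (w @ Kst).sum() / (ns * nt)
        opt.zero_grad(); dist.backward(); opt.step()
    return ns * torch.softmax(theta.detach(), dim=0)

# Toy usage; in the paper the rows would be joint representations from the network's feature space.
zs = torch.randn(100, 6); zt = torch.randn(50, 6) + 0.5
w = fit_source_weights(zs, zt)
print(w.mean().item(), w.min().item())  # mean(w) == 1, w >= 0
```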
Affiliation(s)
- Sentao Chen, Department of Computer Science, Shantou University, China.
2
Wen L, Chen S, Xie M, Liu C, Zheng L. Training multi-source domain adaptation network by mutual information estimation and minimization. Neural Netw 2024; 171:353-361. PMID: 38128299. DOI: 10.1016/j.neunet.2023.12.022.
Abstract
We address the problem of Multi-Source Domain Adaptation (MSDA), which trains a neural network using multiple labeled source datasets and an unlabeled target dataset, and expects the trained network to classify the unlabeled target data well. The main challenge in this problem is that the datasets are generated by related but different joint distributions. In this paper, we propose to address this challenge by estimating and minimizing the mutual information in the network's latent feature space, which simultaneously aligns the source joint distributions with the target joint distribution. Here, the estimation of the mutual information is formulated as a convex optimization problem, such that the globally optimal solution can be easily found. We conduct experiments on several public datasets, and show that our algorithm statistically outperforms its competitors. Video and code are available at https://github.com/sentaochen/Mutual-Information-Estimation-and-Minimization.
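The abstract says the MI estimation is convex but does not give the formulation. A classical convex estimator in the same spirit is least-squares (squared-loss) mutual information, whose ridge-regularized fit reduces to one linear solve; the sketch below is that generic estimator under our own choices of kernel, centers, and regularization, not necessarily the paper's exact objective.

```python
import numpy as np

def gauss_k(a, b, sigma):
    # Gaussian kernel matrix between row samples a (n x d) and centers b (m x d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lsmi(x, y, n_basis=100, sigma_x=1.0, sigma_y=1.0, lam=1e-3, seed=0):
    """Least-squares (squared-loss) mutual information between paired samples
    (x_i, y_i): a ridge-regularized quadratic fit with a closed-form solution."""
    n = x.shape[0]
    rng = np.random.default_rng(seed)
    c = rng.choice(n, size=min(n_basis, n), replace=False)  # kernel centers
    Kx = gauss_k(x, x[c], sigma_x)           # n x b
    Ky = gauss_k(y, y[c], sigma_y)           # n x b
    h = (Kx * Ky).mean(axis=0)               # basis mean under the joint p(x, y)
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n**2     # basis second moment under p(x) p(y)
    theta = np.linalg.solve(H + lam * np.eye(len(c)), h)
    return 0.5 * h @ theta - 0.5             # SMI estimate (value at the optimum)

x = np.random.randn(300, 2); y = x + 0.1 * np.random.randn(300, 2)
print(lsmi(x, y))  # large for dependent x, y; near 0 for independent samples
```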
Affiliation(s)
- Lisheng Wen, Department of Computer Science, Shantou University, China
- Sentao Chen, Department of Computer Science, Shantou University, China
- Mengying Xie, College of Computer Science, Chongqing University, China
- Cheng Liu, Department of Computer Science, Shantou University, China
- Lin Zheng, Department of Computer Science, Shantou University, China
3
Xie M, Liu X, Yang X, Cai W. Multichannel Image Completion With Mixture Noise: Adaptive Sparse Low-Rank Tensor Subspace Meets Nonlocal Self-Similarity. IEEE Trans Cybern 2023; 53:7521-7534. PMID: 35580099. DOI: 10.1109/tcyb.2022.3169800.
Abstract
Multichannel image completion with mixture noise is a common but complex problem in the fields of machine learning, image processing, and computer vision. Most existing algorithms are devoted to exploring global low-rank information and fail to optimize local and joint-mode structures, which may lead to over-smoothed restoration results or lower-quality restored details. In this study, we propose a novel model for multichannel image completion with mixture noise based on an adaptive sparse low-rank tensor subspace and nonlocal self-similarity (ASLTS-NS). In the proposed model, a nonlocal similar-patch-matching framework cooperating with Tucker decomposition is used to explore information of global and joint modes and to optimize the local structure, improving restoration quality. To enhance the robustness of the low-rank decomposition to missing data and mixture noise, we present an adaptive sparse low-rank regularization that constructs a robust tensor subspace, self-weighting the importance of different modes and capturing a stable inherent structure. In addition, joint tensor Frobenius and l1 regularizations are exploited to control the two different types of noise. Based on the alternating direction method of multipliers (ADMM), a convergent learning algorithm is designed to solve this model. Experimental results on three different types of multichannel image sets demonstrate the advantages of ASLTS-NS under five complex scenarios.
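ASLTS-NS combines several interacting components (adaptive sparse low-rank regularization, an ADMM solver) that the abstract only names. The toy sketch below illustrates just the nonlocal-patch-grouping-plus-Tucker ingredient: stack similar patches into a third-order tensor and take a truncated HOSVD as a low-rank Tucker surrogate. Patch size, ranks, and function names are our assumptions.

```python
import numpy as np

def truncated_hosvd(T, ranks):
    """Tucker low-rank approximation via truncated HOSVD: keep the leading
    left singular vectors of each mode unfolding, project, then map back."""
    factors = []
    for mode, r in enumerate(ranks):
        unfold = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfold, full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):   # core = T x_1 U1^T x_2 U2^T x_3 U3^T
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    approx = core
    for mode, U in enumerate(factors):   # reconstruct: core x_1 U1 x_2 U2 x_3 U3
        approx = np.moveaxis(np.tensordot(U, np.moveaxis(approx, mode, 0), axes=1), 0, mode)
    return approx

def group_similar_patches(img, ref, size=8, stride=4, k=16):
    # Collect the k patches most similar (in L2 distance) to the reference patch.
    patches, coords = [], []
    H, W = img.shape
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            patches.append(img[i:i + size, j:j + size]); coords.append((i, j))
    order = np.argsort([np.sum((p - ref) ** 2) for p in patches])[:k]
    return np.stack([patches[t] for t in order], axis=2), [coords[t] for t in order]

img = np.random.rand(64, 64)
group, coords = group_similar_patches(img, img[:8, :8])  # 8 x 8 x 16 patch tensor
restored = truncated_hosvd(group, ranks=(4, 4, 4))       # low-rank Tucker surrogate
```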
4
Chen S, Hong Z, Harandi M, Yang X. Domain Neural Adaptation. IEEE Trans Neural Netw Learn Syst 2023; 34:8630-8641. PMID: 35259116. DOI: 10.1109/tnnls.2022.3151683.
Abstract
Domain adaptation is concerned with the problem of generalizing a classification model to a target domain with little or no labeled data, by leveraging the abundant labeled data from a related source domain. The source and target domains possess different joint probability distributions, making model generalization challenging. In this article, we introduce domain neural adaptation (DNA): an approach that exploits a nonlinear deep neural network to 1) match the source and target joint distributions in the network activation space and 2) learn the classifier in an end-to-end manner. Specifically, we employ the relative chi-square divergence to compare the two joint distributions, and show that the divergence can be estimated by seeking the maximal value of a quadratic functional over a reproducing kernel Hilbert space. The analytic solution to this maximization problem enables us to explicitly express the divergence estimate as a function of the neural network mapping. We optimize the network parameters to minimize the estimated joint distribution divergence and the classification loss, yielding a classification model that generalizes well to the target domain. Empirical results on several visual datasets demonstrate that our solution is statistically better than its competitors.
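The closed-form estimate mentioned here is in the spirit of RuLSIF-style relative density-ratio estimation, where the maximization over the RKHS reduces to a ridge linear solve. The sketch below shows that computation under our own conventions (Gaussian kernel, centers on the target sample, mixture parameter α); the paper's exact estimator and its coupling to the network mapping are not reproduced.

```python
import numpy as np

def gauss_k(a, b, sigma):
    # Gaussian kernel matrix between row samples a (n x d) and centers b (m x d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def relative_pearson(zs, zt, alpha=0.5, sigma=1.0, lam=1e-3):
    """Relative chi-square (Pearson) divergence estimate between the
    distributions of zt (numerator) and the alpha-mixture of zt and zs.
    The inner RKHS maximization has a closed form: one ridge solve."""
    c = zt                                   # kernel centers on one sample
    Ks, Kt = gauss_k(zs, c, sigma), gauss_k(zt, c, sigma)
    H = alpha * Kt.T @ Kt / len(zt) + (1 - alpha) * Ks.T @ Ks / len(zs)
    h = Kt.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(c)), h)   # analytic maximizer
    return h @ theta - 0.5 * theta @ H @ theta - 0.5       # divergence estimate

zs = np.random.randn(200, 3); zt = np.random.randn(150, 3) + 1.0
print(relative_pearson(zs, zt))  # larger when the two samples differ more
```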
5
Andéol L, Kawakami Y, Wada Y, Kanamori T, Müller KR, Montavon G. Learning domain invariant representations by joint Wasserstein distance minimization. Neural Netw 2023; 167:233-243. PMID: 37660672. DOI: 10.1016/j.neunet.2023.07.028.
Abstract
Domain shifts in the training data are common in practical applications of machine learning; they occur, for instance, when the data comes from different sources. Ideally, an ML model should work well independently of these shifts, for example, by learning a domain-invariant representation. However, common ML losses do not give strong guarantees on how consistently an ML model performs across domains, in particular, whether the model performs well on one domain at the expense of its performance on another. In this paper, we build new theoretical foundations for this problem by contributing a set of mathematical relations between classical losses for supervised ML and the Wasserstein distance in joint space (i.e., representation and output space). We show that classification or regression losses, when combined with a GAN-type discriminator between domains, form an upper bound on the true Wasserstein distance between domains. This implies a more invariant representation and also more stable prediction performance across domains. The theoretical results are corroborated empirically on several image datasets. Our proposed approach systematically produces the highest minimum classification accuracy across domains, and the most invariant representation.
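As a rough illustration of the joint-space discriminator term, here is a minimal PyTorch sketch: the discriminator sees joint representations (e.g., features concatenated with predicted outputs) from the two domains, and its binary cross-entropy loss supplies the GAN-type term that, per the paper's bound, combines with the task loss to upper-bound the joint Wasserstein distance. The architecture and names are our assumptions, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDiscriminator(nn.Module):
    """Discriminator over the joint space (representation concatenated with
    the model's output), trained to tell domain A from domain B."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z).squeeze(-1)  # logits

def domain_term(disc, joint_a, joint_b):
    # GAN-type discriminator loss between the two domains' joint samples.
    loss_a = F.binary_cross_entropy_with_logits(disc(joint_a), torch.ones(len(joint_a)))
    loss_b = F.binary_cross_entropy_with_logits(disc(joint_b), torch.zeros(len(joint_b)))
    return loss_a + loss_b

# Toy joint representations: [features | predicted class probabilities].
za, zb = torch.randn(32, 12), torch.randn(32, 12)
disc = JointDiscriminator(12)
print(domain_term(disc, za, zb).item())
```

In training, the feature extractor would minimize the task loss while maximizing the discriminator's confusion (e.g., via a gradient reversal layer).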
Affiliation(s)
- Léo Andéol, Machine Learning group, Technische Universität Berlin, 10587 Berlin, Germany; Berlin Institute for the Foundations of Learning and Data (BIFOLD), 10587 Berlin, Germany.
- Yusei Kawakami, Tokyo Institute of Technology, Tokyo, Japan; Fujitsu Laboratories Ltd., Japan.
- Klaus-Robert Müller, Machine Learning group, Technische Universität Berlin, 10587 Berlin, Germany; Berlin Institute for the Foundations of Learning and Data (BIFOLD), 10587 Berlin, Germany; Max Planck Institute for Informatics, Stuhlsatzenhausweg 4, 66123 Saarbrücken, Germany; Department of Artificial Intelligence, Korea University, Seoul 136-713, South Korea; Google DeepMind, Berlin, Germany.
- Grégoire Montavon, Machine Learning group, Technische Universität Berlin, 10587 Berlin, Germany; Berlin Institute for the Foundations of Learning and Data (BIFOLD), 10587 Berlin, Germany; Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany.
6
Moradi M, Hamidzadeh J. A domain adaptation method by incorporating belief function in twin quarter-sphere SVM. Knowl Inf Syst 2023. DOI: 10.1007/s10115-023-01857-y.
7
Xia H, Jing T, Ding Z. Generative Inference Network for Imbalanced Domain Generalization. IEEE Trans Image Process 2023; 32:1694-1704. PMID: 37028055. DOI: 10.1109/tip.2023.3251103.
Abstract
Domain generalization (DG) aims to learn transferable knowledge from multiple source domains and generalize it to an unseen target domain. To meet this expectation, the intuitive solution is to seek domain-invariant representations via a generative adversarial mechanism or minimization of cross-domain discrepancy. However, the imbalance in data scale across source domains and categories, which is widespread in real-world applications, becomes the key bottleneck to improving a model's generalization ability, due to its negative effect on learning a robust classification model. Motivated by this observation, we first formulate a practical and challenging imbalanced domain generalization (IDG) scenario, and then propose a straightforward but effective novel method, the generative inference network (GINet), which augments reliable samples for minority domains/categories to promote the discriminative ability of the learned model. Concretely, GINet utilizes the available cross-domain images from the identical category and estimates their common latent variable, which serves to discover domain-invariant knowledge for the unseen target domain. Based on these latent variables, GINet further generates novel samples under an optimal transport constraint and deploys them to enhance the desired model with more robustness and generalization ability. Extensive empirical analysis and ablation studies on three popular benchmarks under normal DG and IDG setups demonstrate the advantage of our method over other DG methods in elevating model generalization. The source code is available on GitHub: https://github.com/HaifengXia/IDG.
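GINet's generative inference involves learned encoders and decoders that the abstract does not detail; the toy sketch below only mirrors the data flow (cross-domain features of one category → a common latent → synthetic samples → an optimal transport match against the real ones), using the cross-domain mean as a crude stand-in for the inferred latent. It assumes the POT package (`pip install pot`); all names are ours.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def augment_minority(feats_by_domain, n_new=32, scale=0.1, rng=None):
    """Estimate a shared latent for one category from its features across
    domains (here: simply the cross-domain mean), sample new features around
    it, and score the match to the real features with an entropic OT cost."""
    rng = rng or np.random.default_rng(0)
    real = np.concatenate(feats_by_domain, axis=0)
    latent = real.mean(axis=0)                      # crude common latent variable
    fake = latent + scale * rng.standard_normal((n_new, real.shape[1]))
    a = np.full(len(fake), 1 / len(fake)); b = np.full(len(real), 1 / len(real))
    ot_cost = ot.sinkhorn2(a, b, ot.dist(fake, real), reg=0.1)  # OT constraint value
    return fake, ot_cost

# Toy usage: the same category observed in two domains.
fake, cost = augment_minority([np.random.randn(20, 8), np.random.randn(15, 8)])
print(fake.shape, float(cost))
```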
8
Xie M, Liu X, Yang X. A Nonlocal Self-Similarity-Based Weighted Tensor Low-Rank Decomposition for Multichannel Image Completion With Mixture Noise. IEEE Trans Neural Netw Learn Syst 2022; PP:73-87. PMID: 35544496. DOI: 10.1109/tnnls.2022.3172184.
Abstract
Multichannel image completion with mixture noise is a challenging problem in the fields of machine learning, computer vision, image processing, and data mining. Traditional image completion models are not appropriate for this problem, since their reconstruction priors may not match the corruption priors. To address this issue, we propose a novel nonlocal self-similarity-based weighted tensor low-rank decomposition (NSWTLD) model that achieves global optimization and local enhancement. In the proposed model, based on the corruption priors and the reconstruction priors, a pixel-weighting strategy is given to characterize the joint effects of missing data, Gaussian noise, and impulse noise. To discover and utilize accurate nonlocal self-similarity information and enhance the restoration quality of details, the traditional nonlocal learning framework is optimized by improving the index determination of patch groups and handling the strip noise caused by patch overlapping. In addition, an efficient and convergent algorithm is presented to solve the NSWTLD model. Comprehensive experiments are conducted on four types of multichannel images under various corruption scenarios. The results demonstrate the efficiency and effectiveness of the proposed model.
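The paper derives its pixel-weighting strategy from explicit corruption priors; the sketch below is only a generic robust-weighting heuristic in the same spirit: zero weight for missing pixels, down-weighting for impulse-like outliers detected by a robust residual scale, and weight ≈ 1 elsewhere. The threshold `tau` and all names are our assumptions.

```python
import numpy as np

def pixel_weights(obs, est, mask, tau=2.5):
    """Heuristic per-pixel weights for a weighted low-rank fit under mixture
    noise: missing pixels get weight 0, pixels whose residual looks like an
    impulse (outlier) get a small weight, and the rest get weight ~1."""
    r = np.abs(obs - est)
    sigma = 1.4826 * np.median(r[mask])        # robust residual scale (MAD)
    w = np.where(r <= tau * sigma, 1.0, (tau * sigma) / (r + 1e-12))
    return np.where(mask, w, 0.0)              # zero weight where data is missing

obs = np.random.rand(32, 32)
mask = np.random.rand(32, 32) > 0.2            # True where the pixel is observed
est = np.full_like(obs, obs[mask].mean())      # current reconstruction estimate
W = pixel_weights(obs, est, mask)              # feeds a weighted low-rank solver
```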
9
Xu B, Zeng Z, Lian C, Ding Z. Few-Shot Domain Adaptation via Mixup Optimal Transport. IEEE Trans Image Process 2022; 31:2518-2528. PMID: 35275818. DOI: 10.1109/tip.2022.3157139.
Abstract
Unsupervised domain adaptation aims to learn a classification model for the target domain without any labeled samples by transferring knowledge from a source domain with sufficient labeled samples. The source and target domains usually share the same label space but have different data distributions. In this paper, we consider a more difficult but insufficiently explored problem, few-shot domain adaptation, where a classifier should generalize well to the target domain given only a small number of examples in the source domain. For this problem, we recast the link between the source and target samples via a mixup optimal transport model. The mixup mechanism is integrated into optimal transport to perform few-shot adaptation, learning the cross-domain alignment matrix and the domain-invariant classifier simultaneously so as to augment the source distribution and align the two probability distributions. Moreover, spectral shrinkage regularization is deployed to improve the transferability and discriminability of the mixup optimal transport model by utilizing all singular eigenvectors. Experiments conducted on several domain adaptation tasks demonstrate the effectiveness of our proposed model in dealing with the few-shot domain adaptation problem compared with state-of-the-art methods.
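A minimal sketch of the two mechanisms named above, under our own simplifications: mixup augments the scarce source set by convex combinations of sample pairs, and an entropic OT plan between the mixed source and the target gives a cross-domain alignment (here summarized by its barycentric mapping). It uses the POT package; the joint learning of the classifier and the spectral shrinkage term are omitted.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def mixup_transport(xs, ys_onehot, xt, lam=0.7, reg=0.05, rng=None):
    """Mix source samples pairwise (mixup) to augment a small source set,
    then compute an entropic OT plan between the mixed source and the target
    features; the plan's barycentric map aligns source points with the target."""
    rng = rng or np.random.default_rng(0)
    perm = rng.permutation(len(xs))
    xm = lam * xs + (1 - lam) * xs[perm]                 # mixed features
    ym = lam * ys_onehot + (1 - lam) * ys_onehot[perm]   # mixed soft labels
    a = np.full(len(xm), 1 / len(xm)); b = np.full(len(xt), 1 / len(xt))
    G = ot.sinkhorn(a, b, ot.dist(xm, xt), reg)          # cross-domain alignment matrix
    xm_mapped = (G / G.sum(axis=1, keepdims=True)) @ xt  # barycentric mapping
    return xm, ym, G, xm_mapped

# Toy few-shot source (10 labeled samples, 3 classes) versus a larger target.
xs = np.random.randn(10, 5); ys = np.eye(3)[np.random.randint(3, size=10)]
xt = np.random.randn(40, 5) + 0.5
xm, ym, G, xm_mapped = mixup_transport(xs, ys, xt)
```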
10
Ren CX, Liu YH, Zhang XW, Huang KK. Multi-Source Unsupervised Domain Adaptation via Pseudo Target Domain. IEEE Trans Image Process 2022; 31:2122-2135. PMID: 35196236. DOI: 10.1109/tip.2022.3152052.
Abstract
Multi-source domain adaptation (MDA) aims to transfer knowledge from multiple source domains to an unlabeled target domain. MDA is a challenging task due to severe domain shift, which exists not only between target and source but also among the diverse sources. Prior studies on MDA either estimate a mixed distribution of the source domains or combine multiple single-source models, but few of them delve into the relevant information among the diverse source domains. For this reason, we propose a novel MDA approach, termed Pseudo Target for MDA (PTMDA). Specifically, PTMDA maps each group of source and target domains into a group-specific subspace using adversarial learning with a metric constraint, and constructs a series of pseudo target domains correspondingly. It then aligns the remaining source domains with the pseudo target domain in the subspace efficiently, which makes it possible to exploit additional structured source information through training on the pseudo target domains, and improves performance on the real target domain. Besides, to improve the transferability of deep neural networks (DNNs), we replace the traditional batch normalization layer with an effective matching normalization layer, which enforces alignment in the latent layers of DNNs and thus gains further improvement. We give a theoretical analysis showing that PTMDA as a whole reduces the target error bound and leads to a better approximation of the target risk in MDA settings. Extensive experiments demonstrate PTMDA's effectiveness on MDA tasks, as it outperforms state-of-the-art methods in most experimental settings.
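The abstract does not specify the matching normalization layer's exact form. One plausible instantiation, sketched below in PyTorch, whitens each domain's batch with its own statistics while sharing the affine parameters, so source and target features are forced onto a common latent scale; treat it as an assumption-laden stand-in, not the paper's layer.

```python
import torch
import torch.nn as nn

class DomainAlignedNorm(nn.Module):
    """Normalization that standardizes each domain's batch with its own
    statistics and shares the affine parameters, mapping both domains to a
    common (zero-mean, unit-variance) latent scale before the affine map."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x_src, x_tgt):
        def norm(x):
            mu = x.mean(0, keepdim=True)
            var = x.var(0, unbiased=False, keepdim=True)
            return (x - mu) / torch.sqrt(var + self.eps)
        return (norm(x_src) * self.gamma + self.beta,
                norm(x_tgt) * self.gamma + self.beta)

layer = DomainAlignedNorm(16)
zs, zt = layer(torch.randn(32, 16) + 3.0, torch.randn(32, 16) - 3.0)
print(zs.mean().item(), zt.mean().item())  # both near the shared beta
```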
11
Hedegaard L, Sheikh-Omar OA, Iosifidis A. Supervised Domain Adaptation: A Graph Embedding Perspective and a Rectified Experimental Protocol. IEEE Trans Image Process 2021; 30:8619-8631. PMID: 34648445. DOI: 10.1109/tip.2021.3118978.
Abstract
Domain Adaptation is the process of alleviating distribution gaps between data from different domains. In this paper, we show that Domain Adaptation methods using pairwise relationships between source and target domain data can be formulated as a Graph Embedding in which the domain labels are incorporated into the structure of the intrinsic and penalty graphs. Specifically, we analyse the loss functions of three existing state-of-the-art Supervised Domain Adaptation methods and demonstrate that they perform Graph Embedding. Moreover, we highlight some generalisation and reproducibility issues related to the experimental setup commonly used to demonstrate the few-shot learning capabilities of these methods. To assess and compare Supervised Domain Adaptation methods accurately, we propose a rectified evaluation protocol, and report updated benchmarks on the standard datasets Office31 (Amazon, DSLR, and Webcam), Digits (MNIST, USPS, SVHN, and MNIST-M) and VisDA (Synthetic, Real).
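Graph Embedding with intrinsic and penalty graphs reduces to a generalized eigenproblem; the sketch below shows that solver for a linear embedding, where the graph weights would encode the domain-aware pairwise relationships the paper analyses. The regularization constant and names are our choices.

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding(X, W_intr, W_pen, dim=2):
    """Linear graph embedding: minimize tr(V^T X^T L X V) for the intrinsic
    graph subject to the penalty-graph constraint, solved as a generalized
    symmetric eigenproblem. Rows of X are samples; W encodes pairwise
    (here, domain-aware) relationships between them."""
    def laplacian(W):
        return np.diag(W.sum(axis=1)) - W
    L, Lp = laplacian(W_intr), laplacian(W_pen)
    A = X.T @ L @ X
    B = X.T @ Lp @ X + 1e-6 * np.eye(X.shape[1])  # regularize for stability
    vals, vecs = eigh(A, B)                        # ascending eigenvalues
    return vecs[:, :dim]                           # smallest-eigenvalue directions

# Toy graphs: same-class cross-domain pairs attract (intrinsic), others repel (penalty).
X = np.random.randn(20, 5)
W_intr = np.random.rand(20, 20); W_intr = (W_intr + W_intr.T) / 2
W_pen = np.random.rand(20, 20); W_pen = (W_pen + W_pen.T) / 2
V = graph_embedding(X, W_intr, W_pen)
Z = X @ V  # embedded data
```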
12
Wu H, Zhu H, Yan Y, Wu J, Zhang Y, Ng MK. Heterogeneous Domain Adaptation by Information Capturing and Distribution Matching. IEEE Trans Image Process 2021; 30:6364-6376. PMID: 34236965. DOI: 10.1109/tip.2021.3094137.
Abstract
Heterogeneous domain adaptation (HDA) is a challenging problem because of the different feature representations in the source and target domains. Most HDA methods search for mapping matrices from the source and target domains to discover latent features for learning. However, these methods barely consider the reconstruction error that measures the information loss during the mapping procedure. In this paper, we propose to jointly capture the information and match the source and target domain distributions in the latent feature space. In the learning model, we minimize the reconstruction loss between the original and reconstructed representations to preserve information during transformation, and reduce the Maximum Mean Discrepancy between the source and target domains to align their distributions. The resulting minimization problem involves two projection variables with orthogonal constraints, and can be solved by the generalized gradient flow method, which preserves the orthogonal constraints throughout the computation. We conduct extensive experiments on several image classification datasets to demonstrate that the proposed method outperforms state-of-the-art HDA methods in both effectiveness and efficiency.
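A sketch of the two objective terms named above, evaluated for given orthogonal projections: the reconstruction loss of each projection (information capturing) plus the MMD between the projected source and target samples (distribution matching). The generalized gradient flow update that optimizes the projections on the orthogonal manifold is not reproduced; the kernel width and trade-off μ are our assumptions.

```python
import numpy as np

def mmd2(a, b, sigma=1.0):
    # Squared Maximum Mean Discrepancy with a Gaussian kernel.
    def k(x, y):
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def hda_objective(Xs, Xt, Ps, Pt, mu=1.0):
    """Information capturing + distribution matching for orthogonal
    projections Ps (ds x k) and Pt (dt x k): reconstruction error of each
    domain plus the MMD between the projected (latent) samples."""
    Zs, Zt = Xs @ Ps, Xt @ Pt                 # latent features of each domain
    rec = (np.linalg.norm(Xs - Zs @ Ps.T) ** 2 +
           np.linalg.norm(Xt - Zt @ Pt.T) ** 2)
    return rec + mu * mmd2(Zs, Zt)

# Toy usage with random orthonormal projections (heterogeneous dimensions).
Xs = np.random.randn(100, 20); Xt = np.random.randn(80, 30)
Ps, _ = np.linalg.qr(np.random.randn(20, 5))
Pt, _ = np.linalg.qr(np.random.randn(30, 5))
print(hda_objective(Xs, Xt, Ps, Pt))
```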
13
Joint distribution adaptation network with adversarial learning for rolling bearing fault diagnosis. Knowl Based Syst 2021. DOI: 10.1016/j.knosys.2021.106974.