1
|
Li J, Liang B, Lu X, Li M, Lu G, Xu Y. From Global to Local: Multi-Patch and Multi-Scale Contrastive Similarity Learning for Unsupervised Defocus Blur Detection. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2023; 32:1158-1169. [PMID: 37022428 DOI: 10.1109/tip.2023.3240856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Defocus blur detection (DBD), which aims to detect out-of-focus or in-focus pixels from a single image, has been widely applied to many vision tasks. To remove the dependence on abundant pixel-level manual annotations, unsupervised DBD has attracted much attention in recent years. In this paper, a novel deep network named Multi-patch and Multi-scale Contrastive Similarity (M2CS) learning is proposed for unsupervised DBD. Specifically, the predicted DBD mask from a generator is first exploited to re-generate two composite images by transporting the estimated clear and unclear areas from the source image to realistic full-clear and full-blurred images, respectively. To encourage these two composite images to be completely in-focus or out-of-focus, a global similarity discriminator is exploited to measure the similarity of each pair in a contrastive way, through which every pair of positive samples (two clear images or two blurred images) is pulled close while every pair of negative samples (a clear image and a blurred image) is pushed apart. Since the global similarity discriminator only focuses on the blur level of a whole image, and some misdetected pixels cover only a small area, a set of local similarity discriminators is further designed to measure the similarity of image patches at multiple scales. Thanks to this joint global and local strategy, as well as the contrastive similarity learning, the two composite images are more efficiently driven to be all-clear or all-blurred. Experimental results on real-world datasets substantiate the superiority of our proposed method both quantitatively and visually. The source code is released at: https://github.com/jerysaw/M2CS.
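The compositing step described in this abstract can be illustrated with a small sketch. It is not taken from the released M2CS code: the function names, the convention that a mask value of 1 marks an in-focus pixel, and the tiny grayscale images stored as nested lists are all assumptions made for illustration.

```python
def blend(mask, source, reference):
    """Per pixel: keep the source where mask is 1, take the reference where mask is 0."""
    return [
        [m * s + (1.0 - m) * r for m, s, r in zip(mask_row, src_row, ref_row)]
        for mask_row, src_row, ref_row in zip(mask, source, reference)
    ]


def make_composites(mask, source, clear_ref, blurred_ref):
    """Build the two composite images (hypothetical helper, assumed mask convention).

    composite_clear:   estimated in-focus pixels pasted onto a full-clear image.
    composite_blurred: estimated out-of-focus pixels pasted onto a full-blurred image.
    """
    inverse_mask = [[1.0 - m for m in row] for row in mask]
    composite_clear = blend(mask, source, clear_ref)
    composite_blurred = blend(inverse_mask, source, blurred_ref)
    return composite_clear, composite_blurred


# One-row toy image: the first pixel is predicted in-focus, the second out-of-focus.
mask = [[1.0, 0.0]]
source = [[0.8, 0.2]]
clear_ref = [[0.5, 0.5]]
blurred_ref = [[0.1, 0.1]]
composite_clear, composite_blurred = make_composites(mask, source, clear_ref, blurred_ref)
```

If the mask is accurate, the first composite should look entirely sharp and the second entirely blurred, which is the property the global and local similarity discriminators are described as checking.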
|
2
|
Li H, Xu K, Li J, Yu Z. Dual-stream Reciprocal Disentanglement Learning for domain adaptation person re-identification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
|
3
|
Chen X, Wang C, Lan X, Zheng N, Zeng W. Neighborhood Geometric Structure-Preserving Variational Autoencoder for Smooth and Bounded Data Sources. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:3598-3611. [PMID: 33556022 DOI: 10.1109/tnnls.2021.3053591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Many data sources, such as human poses, lie on low-dimensional manifolds that are smooth and bounded. Learning low-dimensional representations for such data is an important problem. One typical solution is to utilize encoder-decoder networks. However, due to the lack of effective regularization in latent space, the learned representations usually do not preserve the essential data relations. For example, adjacent video frames in a sequence may be encoded into very different zones across the latent space with holes in between. This is problematic for many tasks such as denoising because slightly perturbed data risk being encoded into very different latent variables, making the output unpredictable. To resolve this problem, we first propose a neighborhood geometric structure-preserving variational autoencoder (SP-VAE), which not only maximizes the evidence lower bound but also encourages latent variables to preserve their structures as in ambient space. Then, we learn a set of small surfaces to approximately bound the learned manifold to deal with holes in latent space. We extensively validate the properties of our approach by reconstruction, denoising, and random image generation experiments on a number of data sources, including synthetic Swiss roll, human pose sequences, and facial expression images. The experimental results show that our approach learns smoother manifolds than the baselines. We also apply our approach to the tasks of human pose refinement and facial expression image interpolation, where it achieves better results than the baselines.
|
4
|
Li J, Zhang B, Lu G, Xu Y, Wu F, Zhang D. Harmonization Shared Autoencoder Gaussian Process Latent Variable Model With Relaxed Hamming Distance. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:5093-5107. [PMID: 33027008 DOI: 10.1109/tnnls.2020.3026876] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Multiview learning has shown its superiority in visual classification compared with single-view-based methods. In particular, owing to their powerful representation capacity, Gaussian process latent variable model (GPLVM)-based multiview approaches have achieved outstanding performance. However, most of them only follow the assumption that the shared latent variables can be generated from or projected to the multiple observations but fail to exploit the harmonization in the back constraint and to adaptively learn a classifier according to these learned variables, which results in performance degradation. To tackle these two issues, in this article, we propose a novel harmonization shared autoencoder GPLVM with a relaxed Hamming distance (HSAGP-RHD). Particularly, an autoencoder structure with the Gaussian process (GP) prior is first constructed to learn the shared latent variable for multiple views. To enforce the agreement among various views in the encoder, a harmonization constraint is embedded into the model by enforcing consistency among the view-specific similarities. Furthermore, we also propose a novel discriminative prior, which is directly imposed on the latent variable to simultaneously learn the fused features and an adaptive classifier in a unified model. In detail, the centroid matrix corresponding to the centroids of different categories is first obtained. A relaxed Hamming distance (RHD)-based measurement is subsequently presented to measure the similarity and dissimilarity between the latent variable and centroids, not only allowing us to get closed-form solutions but also encouraging the points belonging to the same class to be close and those belonging to different classes to be far apart. Thanks to this novel prior, the category of an out-of-sample point can also be assigned simply in the testing phase. Experimental results on three real-world data sets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
|
5
|
Lu Y, Qin X, Fan H, Lai T, Li Z. WBC-Net: A white blood cell segmentation network based on UNet++ and ResNet. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.107006] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/23/2022]
|
6
|
Song G, Wang S, Huang Q, Tian Q. Harmonized Multimodal Learning with Gaussian Process Latent Variable Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:858-872. [PMID: 31545710 DOI: 10.1109/tpami.2019.2942028] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Multimodal learning aims to discover the relationship between multiple modalities. It has become an important research topic due to extensive multimodal applications such as cross-modal retrieval. This paper attempts to address the modality heterogeneity problem based on Gaussian process latent variable models (GPLVMs) to represent multimodal data in a common space. Previous multimodal GPLVM extensions generally adopt individual learning schemes on latent representations and kernel hyperparameters, which ignore their intrinsic relationship. To exploit strong complementarity among different modalities and GPLVM components, we develop a novel learning scheme called Harmonization, where latent representations and kernel hyperparameters are jointly learned from each other. Beyond the correlation fitting or intra-modal structure preservation paradigms widely used in existing studies, the harmonization is derived in a model-driven manner to encourage the agreement between modality-specific GP kernels and the similarity of latent representations. We present a range of multimodal learning models by incorporating the harmonization mechanism into several representative GPLVM-based approaches. Experimental results on four benchmark datasets show that the proposed models outperform the strong baselines for cross-modal retrieval tasks, and that the harmonized multimodal learning method is superior in discovering semantically consistent latent representation.
|
7
|
Li J, Lu G, Zhang B, You J, Zhang D. Shared Linear Encoder-Based Multikernel Gaussian Process Latent Variable Model for Visual Classification. IEEE Transactions on Cybernetics 2021; 51:534-547. [PMID: 31170087 DOI: 10.1109/tcyb.2019.2915789] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
Multiview learning has been widely studied in various fields and has achieved outstanding performance in comparison to many single-view-based approaches. In this paper, a novel multiview learning method based on the Gaussian process latent variable model (GPLVM) is proposed. In contrast to existing GPLVM methods, which only assume that there are transformations from the latent variable to the multiple observed inputs, our proposed method simultaneously takes a back constraint into account, encoding multiple observations to the latent variable under a Gaussian process (GP) prior. Particularly, to overcome the difficulty of the covariance matrix calculation in the encoder, a linear projection is designed to map different observations to a consistent subspace first. The obtained variable in this subspace is then projected to the latent variable in the manifold space with the GP prior. Furthermore, unlike most GPLVM methods, which strongly assume that the covariance matrices follow a certain kernel function, for example, the radial basis function (RBF), we introduce a multikernel strategy to design the covariance matrix, which is more reasonable and adaptive for the data representation. In order to apply the presented approach to classification, a discriminative prior is also embedded into the learned latent variables to encourage samples belonging to the same category to be close and those belonging to different categories to be far apart. Experimental results on three real-world databases substantiate the effectiveness and superiority of the proposed method compared with state-of-the-art approaches.
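As a rough illustration of what a multikernel covariance can look like, the sketch below builds a covariance matrix as a non-negative weighted sum of one linear kernel and several RBF kernels. The abstract does not say how the kernels are combined, so the weighted-sum form, the function names, and all weights and bandwidths here are assumptions, not the paper's formulation.

```python
import math


def linear_kernel(x, y):
    """Plain inner product between two points."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def multikernel_matrix(points, linear_weight, rbf_weights, gammas):
    """Covariance matrix as a weighted sum of base kernels (assumed combination rule).

    With non-negative weights, the sum of positive semidefinite kernels
    is itself positive semidefinite, so the result is a valid covariance.
    """
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            value = linear_weight * linear_kernel(points[i], points[j])
            for weight, gamma in zip(rbf_weights, gammas):
                value += weight * rbf_kernel(points[i], points[j], gamma)
            K[i][j] = value
    return K


# Two 2-D points, one linear kernel and two RBF kernels of different widths.
points = [[0.0, 0.0], [1.0, 0.0]]
K = multikernel_matrix(points, 0.5, [0.25, 0.25], [0.5, 2.0])
```

Mixing bandwidths lets the covariance capture both short-range and long-range similarity, which is one common motivation for not committing to a single kernel.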
|
8
|
A diversified shared latent variable model for efficient image characteristics extraction and modelling. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
|
9
|
Li J, Zhang B, Lu G, You J, Xu Y, Wu F, Zhang D. Relaxed Asymmetric Deep Hashing Learning: Point-to-Angle Matching. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:4791-4805. [PMID: 31902779 DOI: 10.1109/tnnls.2019.2958061] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Owing to its powerful data representation capability, deep learning has achieved remarkable performance in supervised hash function learning. However, most existing hashing methods focus on point-to-point matching, which is too strict and unnecessary. In this article, we propose a novel deep supervised hashing method that relaxes the matching between each pair of instances to a point-to-angle manner. Specifically, an inner product is introduced to asymmetrically measure the similarity and dissimilarity between the real-valued output and the binary code. Unlike existing methods that strictly enforce each element in the real-valued output to be either +1 or -1, we only encourage the output to be close to its corresponding semantic-related binary code under the cross-angle. This asymmetric product not only projects both the real-valued output and the binary code into the same Hamming space but also relaxes the output with wider choices. To further exploit the semantic affinity, we propose a novel Hamming-distance-based triplet loss, which efficiently ranks the positive and negative pairs. An algorithm is then designed to alternately obtain optimal deep features and binary codes. Experiments on four real-world data sets demonstrate the effectiveness and superiority of our approach over the state of the art.
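The difference between point-to-point and point-to-angle matching can be seen in a small sketch. It is only an illustration under stated assumptions: the cosine of the angle is used here as the angular measure, and the function names and toy vectors are hypothetical, not the loss defined in the paper.

```python
import math


def cosine_to_code(output, code):
    """Cosine of the angle between a real-valued output and a +/-1 binary code."""
    dot = sum(o * b for o, b in zip(output, code))
    norm_output = math.sqrt(sum(o * o for o in output))
    norm_code = math.sqrt(len(code))  # every code entry is +1 or -1
    return dot / (norm_output * norm_code)


def hamming_distance(code_a, code_b):
    """Hamming distance between two +/-1 codes, computed from their inner product."""
    k = len(code_a)
    dot = sum(a * b for a, b in zip(code_a, code_b))
    return (k - dot) // 2


code = [1, -1, 1, 1]
strict_output = [1.0, -1.0, 1.0, 1.0]   # matches the code element by element
relaxed_output = [0.3, -0.3, 0.3, 0.3]  # same direction, much smaller magnitude

strict_cos = cosine_to_code(strict_output, code)
relaxed_cos = cosine_to_code(relaxed_output, code)
distance = hamming_distance(code, [1, 1, -1, 1])
```

Both outputs have essentially the same angular similarity to the code, although only the first satisfies an element-wise +1/-1 constraint. That is the sense in which an angle-based criterion leaves the output "wider choices".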
|
10
|
Tian C, Fei L, Zheng W, Xu Y, Zuo W, Lin CW. Deep learning on image denoising: An overview. Neural Netw 2020; 131:251-275. [PMID: 32829002 DOI: 10.1016/j.neunet.2020.07.025] [Citation(s) in RCA: 197] [Impact Index Per Article: 39.4] [Received: 01/16/2020] [Revised: 06/17/2020] [Accepted: 07/21/2020] [Indexed: 01/19/2023]
Abstract
Deep learning techniques have received much attention in the area of image denoising. However, there are substantial differences in the various types of deep learning methods dealing with image denoising. Specifically, discriminative learning based on deep learning can ably address the issue of Gaussian noise. Optimization models based on deep learning are effective in estimating the real noise. However, there has thus far been little related research to summarize the different deep learning techniques for image denoising. In this paper, we offer a comparative study of deep techniques in image denoising. We first classify the deep convolutional neural networks (CNNs) for additive white noisy images; the deep CNNs for real noisy images; the deep CNNs for blind denoising and the deep CNNs for hybrid noisy images, which represents the combination of noisy, blurred and low-resolution images. Then, we analyze the motivations and principles of the different types of deep learning methods. Next, we compare the state-of-the-art methods on public denoising datasets in terms of quantitative and qualitative analyses. Finally, we point out some potential challenges and directions of future research.
Affiliation(s)
- Chunwei Tian
  - Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Shenzhen, 518055, Guangdong, China
- Lunke Fei
  - School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Wenxian Zheng
  - Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, Guangdong, China
- Yong Xu
  - Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Shenzhen, 518055, Guangdong, China; Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Wangmeng Zuo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China; Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Chia-Wen Lin
  - Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan
|
11
|
Wang Z, Chen L, Zhang J, Yin Y, Li D. Multi-view ensemble learning with empirical kernel for heart failure mortality prediction. International Journal for Numerical Methods in Biomedical Engineering 2020; 36:e3273. [PMID: 31680466 DOI: 10.1002/cnm.3273] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/21/2019] [Revised: 09/30/2019] [Accepted: 09/30/2019] [Indexed: 06/10/2023]
Abstract
Heart failure (HF) is the heart's inability to pump enough blood to meet the body's needs, and it has a very serious impact on human health. In recent years, the prevalence of HF has remained high. This paper proposes a multi-view ensemble learning algorithm based on empirical kernel mapping, called MVE-EK, which predicts patient mortality from hospital records. Multi-view ensemble learning can take advantage of the consistency and complementarity of different views. MVE-EK first divides the patient's features into multiple views and then divides the samples of each view into multiple subsets through undersampling, which reduces the imbalance rate of the original dataset and yields relatively balanced subsets. Each subset is mapped into kernel space by empirical kernel mapping, which can map samples from linearly inseparable spaces to linearly separable spaces. Finally, the multi-view ensemble learning is performed by the designed loss of acquaintance between views. The effectiveness of the algorithm is verified on three real-world datasets of HF patients, on which it performs better than the comparison algorithms. The datasets were collected from Shanghai Shuguang Hospital and involve 10,203 hospitalization records for 4682 HF patients between March 2009 and April 2016. The prediction information provided by the algorithm can assist clinicians in providing a more personalized treatment plan for patients with HF.
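The undersampling step mentioned in this abstract can be sketched as follows. The abstract does not give the exact procedure, so this example assumes one simple variant: the majority class is shuffled and cut into chunks the size of the minority class, and each chunk is paired with the full minority class. The function name, the seed, and the toy records are hypothetical.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Split an imbalanced dataset into roughly balanced subsets (assumed scheme).

    Each subset holds one chunk of the shuffled majority class plus
    every minority-class sample.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    chunk_size = len(minority)
    subsets = []
    for start in range(0, len(shuffled), chunk_size):
        chunk = shuffled[start:start + chunk_size]
        subsets.append(chunk + list(minority))
    return subsets


# Toy records as (id, label): label 0 = survived (majority), 1 = died (minority).
majority = [("s%d" % i, 0) for i in range(9)]
minority = [("d%d" % i, 1) for i in range(3)]
subsets = balanced_subsets(majority, minority)
```

Each subset can then be used to train one base learner, so no majority-class record is discarded while every learner still sees a balanced sample.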
Affiliation(s)
- Zhe Wang
  - Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, People's Republic of China
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, People's Republic of China
- Lilong Chen
  - Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, People's Republic of China
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, People's Republic of China
- Jing Zhang
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, People's Republic of China
- Yichao Yin
  - Information Center, Shanghai Shuguang Hospital, Shanghai, People's Republic of China
- Dongdong Li
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, People's Republic of China
|