1. Pang M, Wang B, Ye M, Cheung YM, Zhou Y, Huang W, Wen B. Heterogeneous Prototype Learning From Contaminated Faces Across Domains via Disentangling Latent Factors. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:7169-7183. [PMID: 38691434] [DOI: 10.1109/tnnls.2024.3393072]
Abstract
This article studies an emerging practical problem called heterogeneous prototype learning (HPL). Unlike the conventional heterogeneous face synthesis (HFS) problem that focuses on precisely translating a face image from a source domain to another target one without removing facial variations, HPL aims at learning the variation-free prototype of an image in the target domain while preserving the identity characteristics. HPL is a compounded problem involving two cross-coupled subproblems, that is, domain transfer and prototype learning (PL), thus making most of the existing HFS methods that simply transfer the domain style of images unsuitable for HPL. To tackle HPL, we advocate disentangling the prototype and domain factors in their respective latent feature spaces and then replacing the source domain with the target one for generating a new heterogeneous prototype. In doing so, the two subproblems in HPL can be solved jointly in a unified manner. Based on this, we propose a disentangled HPL framework, dubbed DisHPL, which is composed of one encoder-decoder generator and two discriminators. The generator and discriminators play adversarial games such that the generator embeds contaminated images into a prototype feature space only capturing identity information and a domain-specific feature space, while generating realistic-looking heterogeneous prototypes. Experiments on various heterogeneous datasets with diverse variations validate the superiority of DisHPL.
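For readers who want a concrete picture of the latent-factor replacement described above, the following is a minimal sketch, not the authors' implementation: the class name, layer sizes, and the use of simple fully connected blocks are all illustrative assumptions. It only shows the key step of encoding a contaminated source image and a target-domain reference, keeping the identity (prototype) code of the former, and decoding it together with the domain code of the latter.

```python
import torch
import torch.nn as nn

class DisentangledGenerator(nn.Module):
    """Illustrative encoder-decoder that splits an image into a prototype
    (identity) code and a domain code, then decodes any pairing of the two."""
    def __init__(self, img_dim=64 * 64, proto_dim=128, domain_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, 512), nn.ReLU())
        self.to_proto = nn.Linear(512, proto_dim)    # identity-only factor
        self.to_domain = nn.Linear(512, domain_dim)  # style/domain factor
        self.decoder = nn.Sequential(
            nn.Linear(proto_dim + domain_dim, 512), nn.ReLU(),
            nn.Linear(512, img_dim), nn.Tanh())

    def forward(self, contaminated_src, clean_tgt_ref):
        h_src = self.encoder(contaminated_src)
        h_tgt = self.encoder(clean_tgt_ref)
        proto_code = self.to_proto(h_src)        # keep identity of the source face
        tgt_domain_code = self.to_domain(h_tgt)  # borrow the target-domain style
        out = self.decoder(torch.cat([proto_code, tgt_domain_code], dim=1))
        return out.view(contaminated_src.shape)

gen = DisentangledGenerator()
src = torch.rand(4, 1, 64, 64)   # e.g., contaminated photos
ref = torch.rand(4, 1, 64, 64)   # e.g., sketch-domain references
proto = gen(src, ref)            # variation-free prototype in the target domain
```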
2. Zhao R, Zhu M, Wang N, Gao X. Few-Shot Face Stylization via GAN Prior Distillation. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4492-4503. [PMID: 38536698] [DOI: 10.1109/tnnls.2024.3377609]
Abstract
Face stylization has made notable progress in recent years. However, when trained on limited data, the performance of existing approaches declines significantly. Although some studies have attempted to tackle this problem, they either fail to handle the few-shot setting (fewer than 10 samples) or obtain only suboptimal results. In this article, we propose GAN Prior Distillation (GPD) to enable effective few-shot face stylization. GPD contains two models: a teacher network with a GAN prior and a student network that performs end-to-end translation. Specifically, we adapt the teacher network, trained on large-scale data in the source domain, to the target domain using a handful of samples, where it can learn the target domain's knowledge. Then, we achieve few-shot augmentation by generating source-domain and target-domain images simultaneously with the same latent codes. We propose an anchor-based knowledge distillation module that fully exploits the difference between the training data and the augmented data to distill the knowledge of the teacher network into the student network. The trained student network achieves excellent generalization performance by absorbing this additional knowledge. Qualitative and quantitative experiments demonstrate that our method achieves better results than state-of-the-art approaches in the few-shot setting.
3. Zhang M, Bai H, Shang W, Guo J, Li Y, Gao X. MDEformer: Mixed Difference Equation Inspired Transformer for Compressed Video Quality Enhancement. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:2410-2422. [PMID: 38285580] [DOI: 10.1109/tnnls.2024.3354982]
Abstract
Deep learning methods have achieved impressive performance on compressed video quality enhancement tasks. However, these methods rely excessively on practical experience through manually designed network structures and do not fully exploit the potential of the feature information contained in video sequences; that is, they neither take full advantage of the multiscale similarity of compression-artifact information nor seriously consider the impact of partition boundaries in the compressed video on overall video quality. In this article, we propose a novel Mixed Difference Equation inspired Transformer (MDEformer) for compressed video quality enhancement, which provides a relatively reliable principle to guide the network design and yields new insight into interpretable transformers. Specifically, drawing on the graphical concept of the mixed difference equation (MDE), we utilize multiple cross-layer cross-attention aggregation (CCA) modules to establish long-range dependencies between the encoders and decoders of the transformer, where partition boundary smoothing (PBS) modules are inserted as feedforward networks. The CCA module makes full use of the multiscale similarity of compression artifacts to effectively remove them and recover the texture and detail information of the frame. The PBS module leverages the sensitivity of smoothing convolution to partition boundaries to eliminate their impact on compressed video quality and improve the overall quality, without overly affecting non-boundary pixels. Extensive experiments on the MFQE 2.0 dataset demonstrate that the proposed MDEformer can eliminate compression artifacts and improve the quality of the compressed video, surpassing state-of-the-art (SOTA) methods in terms of both objective metrics and visual quality.
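A minimal sketch of what a cross-layer cross-attention step between decoder and encoder features could look like is given below; the module name, dimensions, and the use of PyTorch's stock nn.MultiheadAttention are assumptions for illustration, not the CCA module as implemented in MDEformer.

```python
import torch
import torch.nn as nn

class CrossLayerCrossAttention(nn.Module):
    """Minimal cross-attention: decoder tokens attend to encoder tokens,
    mimicking a long-range encoder-decoder dependency (dims are assumptions)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_tokens, enc_tokens):
        # dec_tokens: (B, N_dec, C), enc_tokens: (B, N_enc, C)
        fused, _ = self.attn(query=dec_tokens, key=enc_tokens, value=enc_tokens)
        return self.norm(dec_tokens + fused)  # residual connection

cca = CrossLayerCrossAttention()
dec = torch.rand(2, 256, 64)   # flattened decoder feature map
enc = torch.rand(2, 1024, 64)  # flattened encoder feature map (finer scale)
out = cca(dec, enc)            # (2, 256, 64)
```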
4. Kong X, Deng Y, Tang F, Dong W, Ma C, Chen Y, He Z, Xu C. Exploring the Temporal Consistency of Arbitrary Style Transfer: A Channelwise Perspective. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:8482-8496. [PMID: 37018565] [DOI: 10.1109/tnnls.2022.3230084]
Abstract
Arbitrary image stylization by neural networks has become a popular topic, and video stylization is attracting more attention as an extension of image stylization. However, when image stylization methods are applied to videos, the results are unsatisfactory and suffer from severe flickering effects. In this article, we conduct a detailed and comprehensive analysis of the cause of such flickering effects. Systematic comparisons among typical neural style transfer approaches show that the feature migration modules of state-of-the-art (SOTA) learning systems are ill-conditioned and can lead to a channelwise misalignment between the input content representations and the generated frames. Unlike traditional methods that relieve the misalignment via additional optical flow constraints or regularization modules, we focus on keeping temporal consistency by aligning each output frame with the input frame. To this end, we propose a simple yet efficient multichannel correlation network (MCCNet) to ensure that output frames are directly aligned with inputs in the hidden feature space while maintaining the desired style patterns. An inner channel similarity loss is adopted to eliminate side effects caused by the absence of nonlinear operations such as softmax for strict alignment. Furthermore, to improve the performance of MCCNet under complex light conditions, we introduce an illumination loss during training. Qualitative and quantitative evaluations demonstrate that MCCNet performs well in arbitrary video and image style transfer tasks. Code is available at https://github.com/kongxiuxiu/MCCNetV2.
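The following sketch illustrates the general idea of channelwise fusion that keeps the output spatially aligned with the content input: each output location is only a channel recombination of the co-located content feature, with mixing weights derived from style channel statistics. The concrete formula is an assumption for illustration and is not claimed to be MCCNet's operator.

```python
import torch

def multichannel_correlation_fuse(content_feat, style_feat, eps=1e-5):
    """Hedged sketch of channelwise fusion that keeps spatial alignment:
    each output pixel is a linear recombination (over channels) of the
    co-located content pixel, with the mixing matrix built from style
    channel correlations. Not the paper's exact formulation."""
    b, c, h, w = content_feat.shape
    fc = content_feat.view(b, c, -1)                        # (B, C, HW)
    fs = style_feat.view(b, c, -1)
    fs = fs - fs.mean(dim=2, keepdim=True)                  # center style channels
    corr = torch.bmm(fs, fs.transpose(1, 2)) / fs.shape[2]  # (B, C, C) style covariance
    corr = corr / (corr.norm(dim=2, keepdim=True) + eps)    # row-normalize mixing weights
    out = torch.bmm(corr, fc)                               # recombine content channels
    return out.view(b, c, h, w)

out = multichannel_correlation_fuse(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 48, 48))
```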
5. Roizman V, Jonckheere M, Pascal F. A Flexible EM-Like Clustering Algorithm for Noisy Data. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:2709-2721. [PMID: 38015701] [DOI: 10.1109/tpami.2023.3337195]
Abstract
Though very popular, the Expectation-Maximisation (EM) algorithm for the Gaussian mixture model is well known to perform poorly for non-Gaussian distributions or in the presence of outliers or noise. In this paper, we propose a Flexible EM-like Clustering Algorithm (FEMCA): a new clustering algorithm that follows an EM procedure and is based on estimates of both the cluster centers and the covariances. In addition, using a semi-parametric paradigm, the method estimates an unknown scale parameter per data point. This allows the algorithm to accommodate heavier-tailed distributions, noise, and outliers without significantly losing efficiency in various classical scenarios. We first present the general underlying model for independent, but not necessarily identically distributed, samples of elliptical distributions. We then derive and analyze the proposed algorithm in this context, showing in particular important distribution-free properties of the underlying data distributions. The convergence and accuracy properties of the algorithm are first analyzed on synthetic data. Finally, we show that FEMCA outperforms other classical unsupervised methods from the literature, such as k-means, EM for Gaussian mixture models and its recent modifications, or spectral clustering, when applied to real data sets such as MNIST, NORB, and 20newsgroups.
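As an illustration of an EM-like scheme with a per-point scale parameter, a simplified sketch is given below. The update rules (responsibilities under scaled covariances, a scale estimate proportional to the expected Mahalanobis distance, and tau-weighted M-step updates) are a plausible reading of such heavy-tailed elliptical models and are not the exact FEMCA estimators.

```python
import numpy as np

def flexible_em(X, k, n_iter=50, seed=0, eps=1e-6):
    """Hedged sketch of an EM-like clustering with a per-point scale factor
    tau_i (as in heavy-tailed elliptical models). The update equations are a
    simplified reading of such models, not the authors' exact estimators."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = X[rng.choice(n, k, replace=False)]              # initial centers
    sigma = np.stack([np.cov(X.T) + eps * np.eye(p)] * k)
    pi = np.full(k, 1.0 / k)
    tau = np.ones(n)                                     # per-point scale
    for _ in range(n_iter):
        # E-step: responsibilities under scaled covariances tau_i * Sigma_k
        logr = np.empty((n, k))
        for j in range(k):
            inv = np.linalg.inv(sigma[j])
            d = X - mu[j]
            maha = np.einsum('ni,ij,nj->n', d, inv, d)
            logdet = np.linalg.slogdet(sigma[j])[1]
            logr[:, j] = np.log(pi[j]) - 0.5 * (maha / tau + p * np.log(tau) + logdet)
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # scale update: expected Mahalanobis distance divided by the dimension
        maha_all = np.zeros(n)
        for j in range(k):
            inv = np.linalg.inv(sigma[j])
            d = X - mu[j]
            maha_all += r[:, j] * np.einsum('ni,ij,nj->n', d, inv, d)
        tau = np.maximum(maha_all / p, eps)
        # M-step: tau-weighted means and covariances
        w = r / tau[:, None]
        for j in range(k):
            mu[j] = (w[:, j:j + 1] * X).sum(0) / w[:, j].sum()
            d = X - mu[j]
            sigma[j] = (w[:, j, None, None] * np.einsum('ni,nj->nij', d, d)).sum(0) \
                       / r[:, j].sum() + eps * np.eye(p)
            pi[j] = r[:, j].mean()
    return r.argmax(axis=1), mu

labels, centers = flexible_em(np.random.randn(300, 2), k=3)
```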
6. Melnik A, Miasayedzenkau M, Makaravets D, Pirshtuk D, Akbulut E, Holzmann D, Renusch T, Reichert G, Ritter H. Face Generation and Editing With StyleGAN: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:3557-3576. [PMID: 38224501] [DOI: 10.1109/tpami.2024.3350004]
Abstract
Our goal with this survey is to provide an overview of state-of-the-art deep learning methods for face generation and editing using StyleGAN. The survey covers the evolution of StyleGAN, from PGGAN to StyleGAN3, and explores relevant topics such as suitable metrics for training, different latent representations, GAN inversion to the latent spaces of StyleGAN, face image editing, cross-domain face stylization, face restoration, and even Deepfake applications. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
7. Lin F, Bao K, Li Y, Zeng D, Ge S. Learning Contrast-Enhanced Shape-Biased Representations for Infrared Small Target Detection. IEEE Transactions on Image Processing 2024; 33:3047-3058. [PMID: 38656838] [DOI: 10.1109/tip.2024.3391011]
Abstract
Detecting infrared small targets against cluttered backgrounds is mainly challenged by dim textures, low contrast, and varying shapes. This paper proposes an approach that facilitates infrared small target detection by learning contrast-enhanced shape-biased representations. The approach cascades a contrast-shape encoder and a shape-reconstructable decoder to learn discriminative representations that can effectively identify target objects. The contrast-shape encoder applies a stem of central difference convolutions and a few large-kernel convolutions to extract shape-preserving features from input infrared images. This specific design of the convolutions can effectively overcome the challenges of low contrast and varying shapes in a unified way. Meanwhile, the shape-reconstructable decoder accepts the edge map of the input infrared image and is learned by simultaneously optimizing two shape-related consistencies: the internal one decodes the encoder representations by upsampling reconstruction and constrains segmentation consistency, whilst the external one cascades three gated ResNet blocks to hierarchically fuse edge maps and decoder representations and constrains contour consistency. This decoding scheme can bypass the challenges of dim textures and varying shapes. In our approach, the encoder and decoder are learned in an end-to-end manner, and the resulting shape-biased encoder representations are well suited to identifying infrared small targets. Extensive experimental evaluations are conducted on public benchmarks, and the results demonstrate the effectiveness of our approach.
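The central difference convolution mentioned above has a widely used generic form, sketched below: a vanilla convolution minus a theta-weighted term that aggregates the patch center, which amounts to convolving local differences and thus emphasizes contrast and shape cues. Whether the authors use exactly this variant is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    """Standard central-difference convolution: blends a vanilla convolution
    with a term built from the patch center, equivalent to convolving the
    local differences (x_local - x_center). The exact variant used in the
    paper is an assumption."""
    def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)  # vanilla convolution over the local patch
        # center term: kernel weights summed into a 1x1 kernel applied to the center pixel
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # (out, in, 1, 1)
        center = F.conv2d(x, kernel_sum, bias=None)
        return out - self.theta * center

cdc = CentralDifferenceConv2d(1, 16)
y = cdc(torch.rand(2, 1, 128, 128))   # (2, 16, 128, 128)
```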
8. Zhang M, Wu Q, Guo J, Li Y, Gao X. Heat Transfer-Inspired Network for Image Super-Resolution Reconstruction. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:1810-1820. [PMID: 35776820] [DOI: 10.1109/tnnls.2022.3185529]
Abstract
Image super-resolution (SR) is a critical image preprocessing task for many applications. How to recover features as accurately as possible is the focus of SR algorithms. Most existing SR methods tend to guide the image reconstruction process with gradient maps, frequency perception modules, etc., and improve the quality of recovered images from the perspective of enhancing edges, but they rarely optimize the neural network structure at the system level. In this article, we conduct an in-depth exploration of the inner nature of the SR network structure. In light of the consistency between thermal particles in the thermal field and pixels in the image domain, we propose a novel heat-transfer-inspired network (HTI-Net) for image SR reconstruction based on the theoretical basis of heat transfer. With finite difference theory, we use a second-order mixed-difference equation to redesign the residual network (ResNet), which can fully integrate multiple sources of information to achieve better feature reuse. In addition, according to the thermal conduction differential equation (TCDE) in the thermal field, the pixel value flow equation (PVFE) in the image domain is derived to mine deep potential feature information. Experimental results on multiple standard databases demonstrate that the proposed HTI-Net achieves superior edge detail reconstruction and parameter efficiency compared with existing SR methods. Experimental results on the microscope chip image (MCI) database, which consists of realistic low-resolution (LR) and high-resolution (HR) images, show that the proposed HTI-Net for image SR reconstruction can improve the effectiveness of a hardware Trojan detection system.
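One way to read a "second-order difference" residual design is a three-point update that carries both the current and the previous feature state, as sketched below; the coefficients and the exact mixed-difference form used by HTI-Net are not reproduced here, so treat this purely as an illustration.

```python
import torch
import torch.nn as nn

class SecondOrderResidualBlock(nn.Module):
    """One concrete reading of a second-order (three-point) difference update
    x_{k+1} = x_k + alpha * (x_k - x_{k-1}) + F(x_k); the coefficients and the
    exact mixed-difference scheme of the paper are assumptions."""
    def __init__(self, channels=64, alpha=0.5):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.alpha = alpha

    def forward(self, x_curr, x_prev):
        x_next = x_curr + self.alpha * (x_curr - x_prev) + self.f(x_curr)
        return x_next, x_curr  # carry (new state, previous state) to the next block

blocks = nn.ModuleList([SecondOrderResidualBlock() for _ in range(4)])
x_prev = x_curr = torch.rand(1, 64, 32, 32)
for blk in blocks:
    x_curr, x_prev = blk(x_curr, x_prev)
```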
9. Zhang M, Xin J, Zhang J, Tao D, Gao X. Curvature Consistent Network for Microscope Chip Image Super-Resolution. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10538-10551. [PMID: 35482691] [DOI: 10.1109/tnnls.2022.3168540]
Abstract
Detecting hardware Trojans (HTs) from microscope chip images (MCIs) is crucial for many applications, such as financial infrastructure and transport security. Scanning high-resolution (HR) microscope images for HT detection incurs an inordinate cost. It is therefore useful when the chip image is in low resolution (LR), since it can be acquired faster and at a lower cost than its HR counterpart. However, the lost details and the noise due to the electric charge effect in LR MCIs affect detection performance, making the problem more challenging. In this article, we address this issue by first discussing why recovering curvature information matters for HT detection and then proposing a novel MCI super-resolution (SR) method via a curvature consistent network (CCN). It consists of a homogeneous workflow and a heterogeneous workflow, where the former learns a mapping between homogeneous images, i.e., LR and HR MCIs, and the latter learns a mapping between heterogeneous images, i.e., MCIs and curvature images. Besides, a collaborative fusion strategy is used to leverage features learned from both workflows level by level to eventually recover the HR image. To mitigate the lack of an MCI dataset, we construct a new benchmark consisting of realistic MCIs at different resolutions, called MCI. Experiments on MCI demonstrate that the proposed CCN outperforms representative SR methods by recovering more delicate circuit lines and yields higher HT detection performance. The dataset is available at github.com/RuiZhang97/CCN.
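The abstract does not define the curvature image; a common choice for such a target is the level-set curvature of the intensity surface, kappa = div(grad I / |grad I|), and the sketch below computes that map under this assumption.

```python
import numpy as np

def curvature_map(img, eps=1e-8):
    """Level-set curvature of the intensity surface, kappa = div(grad I / |grad I|).
    Using this as the 'curvature image' is an assumption; the abstract does not
    spell out the exact definition used by the paper."""
    gy, gx = np.gradient(img.astype(np.float64))   # gradients along rows, columns
    mag = np.sqrt(gx ** 2 + gy ** 2) + eps
    nx, ny = gx / mag, gy / mag                    # unit normal field
    dny_dy, _ = np.gradient(ny)
    _, dnx_dx = np.gradient(nx)
    return dnx_dx + dny_dy                         # divergence of the unit normals

kappa = curvature_map(np.random.rand(256, 256))    # target for a heterogeneous workflow
```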
10. Ma Z, Lin T, Li X, Li F, He D, Ding E, Wang N, Gao X. Dual-Affinity Style Embedding Network for Semantic-Aligned Image Style Transfer. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7404-7417. [PMID: 35108207] [DOI: 10.1109/tnnls.2022.3143356]
Abstract
Image style transfer aims at synthesizing an image with the content from one image and the style from another. User studies have revealed that the semantic correspondence between style and content greatly affects subjective perception of style transfer results. While current studies have made great progress in improving the visual quality of stylized images, most methods directly transfer global style statistics without considering semantic alignment. Current semantic style transfer approaches still work in an iterative optimization fashion, which is impractically computationally expensive. Addressing these issues, we introduce a novel dual-affinity style embedding network (DaseNet) to synthesize images with style aligned at semantic region granularity. In the dual-affinity module, feature correlation and semantic correspondence between content and style images are modeled jointly for embedding local style patterns according to semantic distribution. Furthermore, the semantic-weighted style loss and the region-consistency loss are introduced to ensure semantic alignment and content preservation. With the end-to-end network architecture, DaseNet can well balance visual quality and inference efficiency for semantic style transfer. Experimental results on different scene categories have demonstrated the effectiveness of the proposed method.
11. Kong F, Pu Y, Lee I, Nie R, Zhao Z, Xu D, Qian W, Liang H. Unpaired Artistic Portrait Style Transfer via Asymmetric Double-Stream GAN. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5427-5439. [PMID: 37459266] [DOI: 10.1109/tnnls.2023.3263846]
Abstract
With the development of image style transfer technologies, portrait style transfer has attracted growing attention in this research community. In this article, we present an asymmetric double-stream generative adversarial network (ADS-GAN) to solve the problems caused by cartoonization and other style transfer techniques when they are applied to portrait photos, such as facial deformation, missing contours, and stiff lines. By observing the characteristics of source and target images, we propose an edge contour retention (ECR) regularized loss to constrain the local and global contours of generated portrait images and thus avoid portrait deformation. In addition, a content-style feature fusion module is introduced for further learning of the target image style; it uses a style attention mechanism to integrate features and embeds style features into the content features of portrait photos according to the attention weights. Finally, a guided filter is introduced into the content encoder to smooth the textures and specific details of the source image, thereby eliminating their negative impact on style transfer. We conduct overall unified optimization training on all components to obtain an ADS-GAN for unpaired artistic portrait style transfer. Qualitative comparisons and quantitative analyses demonstrate that the proposed method generates better results than benchmark works in preserving the overall structure and contours of the portrait; an ablation and parameter study demonstrates the effectiveness of each component in our framework.
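A concrete, hedged reading of the edge contour retention idea is a penalty on the difference between edge maps of the source portrait and the generated portrait; the Sobel-based form below is an assumption about its shape, not the loss as defined in the paper.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Per-channel Sobel gradient magnitude for a batch of images (B, C, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    c = img.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(img)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(img)
    gx = F.conv2d(img, kx, padding=1, groups=c)
    gy = F.conv2d(img, ky, padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_contour_retention_loss(source, generated):
    """Sketch of a contour-preservation penalty: L1 between the edge maps of the
    source portrait and the generated stylized portrait (assumed form)."""
    return F.l1_loss(sobel_edges(source), sobel_edges(generated))

loss = edge_contour_retention_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```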
12. Zhang M, Wu Q, Zhang J, Gao X, Guo J, Tao D. Fluid Micelle Network for Image Super-Resolution Reconstruction. IEEE Transactions on Cybernetics 2023; 53:578-591. [PMID: 35442898] [DOI: 10.1109/tcyb.2022.3163294]
Abstract
Most existing convolutional neural-network-based super-resolution (SR) methods focus on designing effective neural blocks but rarely describe the image SR mechanism from the perspective of image evolution in the SR process. In this study, we explore a new research routine by abstracting the movement of pixels in the reconstruction process as the flow of fluid in the field of fluid dynamics (FD), where explicit motion laws of particles have been discovered. Specifically, a novel fluid micelle network is devised for image SR based on the theory of FD that follows the residual learning scheme but learns the residual structure by solving the finite difference equation in FD. The pixel motion equation in the SR process is derived from the Navier-Stokes (N-S) FD equation, establishing a guided branch that is aware of edge information. Thus, the second-order residual drives the network for feature extraction, and the guided branch corrects the direction of the pixel stream to supplement the details. Experiments on popular benchmarks and a real-world microscope chip image dataset demonstrate that the proposed method outperforms other modern methods in terms of both objective metrics and visual quality. The proposed method can also reconstruct clear geometric structures, offering the potential for real-world applications.
13. Li P, Sheng B, Chen CLP. Face Sketch Synthesis Using Regularized Broad Learning System. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5346-5360. [PMID: 33852397] [DOI: 10.1109/tnnls.2021.3070463]
Abstract
There are two main categories of face sketch synthesis: data-driven and model-driven. Data-driven methods synthesize sketches from training photograph-sketch patches at the cost of detail loss. Model-driven methods can preserve more details, but learning the mapping from photographs to sketches is a time-consuming training process, especially when deep structures need to be refined. We propose a face sketch synthesis method via a regularized broad learning system (RBLS). The broad-learning-based system directly transforms photographs into sketches with rich details preserved. Also, the incremental learning scheme of the broad learning system (BLS) ensures that our method can easily add feature mappings and remodel the network without retraining when the extracted feature mapping nodes are insufficient. Besides, a Bayesian estimation-based regularization is introduced with the BLS to aid further feature selection and improve the generalization ability and robustness. Various experiments on the CUHK student dataset and the Aleix Robert (AR) dataset demonstrate the effectiveness and efficiency of our RBLS method. Unlike existing methods, our method synthesizes high-quality face sketches much more efficiently and greatly reduces computational complexity in both the training and test processes.
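For context, a broad learning system fits its output weights in closed form over randomly mapped feature and enhancement nodes; the sketch below uses plain ridge regularization as a stand-in for the paper's Bayesian estimation-based regularization, so the regularizer and the node counts are assumptions.

```python
import numpy as np

def broad_learning_fit(X, Y, n_feature=100, n_enhance=200, lam=1e-3, seed=0):
    """Sketch of a broad learning system: random feature nodes, random
    enhancement nodes, and a closed-form ridge solution for the output weights.
    Plain ridge stands in for the Bayesian-estimation regularization of RBLS."""
    rng = np.random.default_rng(seed)
    Wf = rng.standard_normal((X.shape[1], n_feature))
    Z = np.tanh(X @ Wf)                                  # feature mapping nodes
    We = rng.standard_normal((n_feature, n_enhance))
    H = np.tanh(Z @ We)                                  # enhancement nodes
    A = np.hstack([Z, H])
    # closed-form output weights: W = (A^T A + lam I)^-1 A^T Y
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return Wf, We, W

def broad_learning_predict(X, Wf, We, W):
    Z = np.tanh(X @ Wf)
    A = np.hstack([Z, np.tanh(Z @ We)])
    return A @ W

Wf, We, W = broad_learning_fit(np.random.rand(500, 64), np.random.rand(500, 64))
pred = broad_learning_predict(np.random.rand(10, 64), Wf, We, W)
```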
14. Shared Dictionary Learning Via Coupled Adaptations for Cross-Domain Classification. Neural Processing Letters 2022. [DOI: 10.1007/s11063-022-10967-7]
15. Nie L, Liu L, Wu Z, Kang W. Unconstrained face sketch synthesis via perception-adaptive network and a new benchmark. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.077]
16. Zhang Y, Cheung YM. Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:3560-3576. [PMID: 33534702] [DOI: 10.1109/tpami.2021.3056510]
Abstract
The success of categorical data clustering generally relies heavily on the distance metric that measures the dissimilarity between two objects. However, most existing clustering methods treat the two categorical subtypes, i.e., nominal and ordinal attributes, in the same way when calculating the dissimilarity, without considering the relative order information of ordinal values. Moreover, there may exist interdependence among the nominal and ordinal attributes that is worth exploiting when indicating the dissimilarity. This paper therefore studies the intrinsic difference and connection between nominal and ordinal attribute values from a perspective akin to a graph. Accordingly, we propose a novel distance metric to measure the intra-attribute distances of nominal and ordinal attributes in a unified way, while preserving the order relationship among ordinal values. Subsequently, we propose a new clustering algorithm that turns the learning of intra-attribute distance weights and the partitioning of data objects into a single learning paradigm rather than two separate steps, thereby circumventing a suboptimal solution. Experiments show the efficacy of the proposed algorithm in comparison with existing counterparts.
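A small sketch of the unified intra-attribute distance idea: ordinal categories get distances built from cumulative non-negative step weights (so the order is preserved), nominal categories get a symmetric distance table, and object dissimilarity is a weighted sum of per-attribute lookups. The joint learning of weights and partitions described above is not reproduced here; all numbers are illustrative.

```python
import numpy as np

def ordinal_distance_matrix(step_weights):
    """Distances between ordered categories as cumulative sums of non-negative
    step weights, so d(a, c) >= d(a, b) whenever a < b < c (order preserved)."""
    cum = np.concatenate([[0.0], np.cumsum(np.abs(step_weights))])
    return np.abs(cum[:, None] - cum[None, :])

def object_distance(x, y, dist_mats, attr_weights):
    """Dissimilarity of two categorical objects: weighted sum of per-attribute
    intra-attribute distances looked up from given matrices (a sketch; the
    joint weight/partition learning of the paper is not reproduced here)."""
    return sum(w * D[a, b] for w, D, a, b in zip(attr_weights, dist_mats, x, y))

# attribute 0: ordinal with 4 ordered levels; attribute 1: nominal with 3 values
D_ord = ordinal_distance_matrix([0.2, 0.5, 0.3])
D_nom = 1.0 - np.eye(3)                    # simple 0/1 nominal distance
d = object_distance([0, 2], [3, 2], [D_ord, D_nom], attr_weights=[0.6, 0.4])
print(d)  # 0.6 * 1.0 + 0.4 * 0.0 = 0.6
```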
17. Edge-Preserving Convolutional Generative Adversarial Networks for SAR-to-Optical Image Translation. Remote Sensing 2021. [DOI: 10.3390/rs13183575]
Abstract
With the ability for all-day, all-weather acquisition, synthetic aperture radar (SAR) remote sensing is an important technique in modern Earth observation. However, the interpretation of SAR images is a highly challenging task, even for well-trained experts, due to the imaging principle of SAR images and the high-frequency speckle noise. Some image-to-image translation methods are used to convert SAR images into optical images that are closer to what we perceive through our eyes. There exist two weaknesses in these methods: (1) these methods are not designed for an SAR-to-optical translation task, thereby losing sight of the complexity of SAR images and the speckle noise. (2) The same convolution filters in a standard convolution layer are utilized for the whole feature maps, which ignore the details of SAR images in each window and generate images with unsatisfactory quality. In this paper, we propose an edge-preserving convolutional generative adversarial network (EPCGAN) to enhance the structure and aesthetics of the output image by leveraging the edge information of the SAR image and implementing content-adaptive convolution. The proposed edge-preserving convolution (EPC) decomposes the content of the convolution input into texture components and content components and then generates a content-adaptive kernel to modify standard convolutional filter weights for the content components. Based on the EPC, the EPCGAN is presented for SAR-to-optical image translation. It uses a gradient branch to assist in the recovery of structural image information. Experiments on the SEN1-2 dataset demonstrated that the proposed method can outperform other SAR-to-optical methods by recovering more structures and yielding a superior evaluation index.
18. Yu J, Xu X, Gao F, Shi S, Wang M, Tao D, Huang Q. Toward Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs. IEEE Transactions on Cybernetics 2021; 51:4350-4362. [PMID: 32149668] [DOI: 10.1109/tcyb.2020.2972944]
Abstract
Face photo-sketch synthesis aims at generating a facial sketch/photo conditioned on a given photo/sketch. It covers a wide range of applications, including digital entertainment and law enforcement. Precisely depicting face photos/sketches remains challenging due to the restrictions on structural realism and textural consistency. While existing methods achieve compelling results, they mostly yield blurred effects and great deformation over various facial components, making the synthesized images feel unrealistic. To tackle this challenge, in this article, we propose using facial composition information to help the synthesis of the face sketch/photo. Specifically, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis. In CA-GAN, we utilize paired inputs, including a face photo/sketch and the corresponding pixelwise face labels, for generating a sketch/photo. Next, to focus training on hard-to-generate components and delicate facial structures, we propose a compositional reconstruction loss. In addition, we employ a perceptual loss function to encourage the synthesized image and the real image to be perceptually similar. Finally, we use stacked CA-GANs (SCA-GANs) to further rectify defects and add compelling details. The experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data. In addition, our method significantly decreases the best previous Fréchet inception distance (FID) from 36.2 to 26.2 for sketch synthesis, and from 60.9 to 30.5 for photo synthesis. Besides, we demonstrate that the proposed method has considerable generalization ability.
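The compositional reconstruction loss can be pictured as a region-weighted L1 error driven by the pixelwise face labels; the sketch below uses an assumed per-region weighting scheme and label set, not the exact CA-GAN formulation.

```python
import torch

def compositional_reconstruction_loss(fake, real, parsing, region_weights):
    """Sketch of a composition-aided reconstruction loss: the L1 error is
    averaged per facial region (from a pixelwise parsing map) and recombined
    with per-region weights, so delicate components can be emphasized.
    The weighting scheme and label set are assumptions, not CA-GAN's exact loss."""
    err = (fake - real).abs().mean(dim=1)              # (B, H, W) per-pixel L1
    loss = 0.0
    for label, weight in region_weights.items():
        mask = (parsing == label).float()              # (B, H, W)
        denom = mask.sum().clamp(min=1.0)
        loss = loss + weight * (err * mask).sum() / denom
    return loss

fake = torch.rand(2, 3, 64, 64)
real = torch.rand(2, 3, 64, 64)
parsing = torch.randint(0, 4, (2, 64, 64))             # e.g., 0: bg, 1: skin, 2: eyes, 3: mouth
loss = compositional_reconstruction_loss(fake, real, parsing,
                                          region_weights={1: 1.0, 2: 2.0, 3: 2.0})
```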
19. Wan W, Yang Y, Lee HJ. Generative adversarial learning for detail-preserving face sketch synthesis. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.01.050]
20. Gao F, Xu X, Yu J, Shang M, Li X, Tao D. Complementary, Heterogeneous and Adversarial Networks for Image-to-Image Translation. IEEE Transactions on Image Processing 2021; 30:3487-3498. [PMID: 33646952] [DOI: 10.1109/tip.2021.3061286]
Abstract
Image-to-image translation transfers images from a source domain to a target domain. Conditional Generative Adversarial Networks (GANs) have enabled a variety of such applications. Initial GANs typically include a single generator for producing a target image. Recently, using multiple generators has shown promising results in various tasks. However, the generators in these works are typically of homogeneous architectures. In this paper, we argue that heterogeneous generators are complementary to each other and will benefit the generation of images. By heterogeneous, we mean that generators are of different architectures, focus on diverse positions, and perform over multiple scales. To this end, we build two generators using a deep U-Net and a shallow residual network, respectively. The former comprises a series of down-sampling and up-sampling layers, which typically have a large perceptual field and strong spatial locality. In contrast, the residual network has small perceptual fields and works well in characterizing details, especially textures and local patterns. Afterwards, we use a gated fusion network to combine these two generators and produce the final output. The gated fusion unit automatically induces the heterogeneous generators to focus on different positions and complement each other. Finally, we propose a novel approach to integrate multi-level and multi-scale features in the discriminator. This multi-layer integration discriminator encourages the generators to produce realistic details from coarse to fine scales. We quantitatively and qualitatively evaluate our model on various benchmark datasets. Experimental results demonstrate that our method significantly improves the quality of transferred images across a variety of image-to-image translation tasks. We have made our code and results publicly available: http://aiart.live/chan/.
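The gated fusion unit described above can be sketched as a small convolutional gate that mixes the two generators' outputs per pixel; the layer sizes below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of a gated fusion unit: a learned per-pixel gate mixes the outputs
    of two heterogeneous generators (e.g., a deep U-Net and a shallow ResNet)."""
    def __init__(self, channels=3):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, out_unet, out_resnet):
        g = self.gate(torch.cat([out_unet, out_resnet], dim=1))  # gate in [0, 1]
        return g * out_unet + (1.0 - g) * out_resnet

fuse = GatedFusion()
fused = fuse(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```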