1. Wang Q, Li B, Li X, Cao B, Ma L, Lu H, Jia X. CharacterFactory: Sampling Consistent Characters With GANs for Diffusion Models. IEEE Transactions on Image Processing 2025; 34:2544-2559. PMID: 40227897. DOI: 10.1109/tip.2025.3558668.
Abstract
Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models. More specifically, we consider the word embeddings of celeb names as ground truths for the identity-consistent generation task and train a GAN model to learn the mapping from a latent space to the celeb embedding space. In addition, we design a context-consistent loss to ensure that the generated identity embeddings can produce identity-consistent images in various contexts. Remarkably, the whole model only takes 10 minutes for training, and can sample infinite characters end-to-end during inference. Extensive experiments demonstrate excellent performance of the proposed CharacterFactory on character creation in terms of identity consistency and editability. Furthermore, the generated characters can be seamlessly combined with the off-the-shelf image/video/3D diffusion models. We believe that the proposed CharacterFactory is an important step for identity-consistent character generation. Code and Gradio demo are available at: https://qinghew.github.io/CharacterFactory/.
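To make the embedding-space GAN described above concrete, the following is a minimal PyTorch sketch of a generator/discriminator pair trained against a bank of celeb-name word embeddings. The embedding dimension, token count, and network sizes are assumptions, and the paper's context-consistent loss (which requires the diffusion model's text encoder) is omitted.

```python
# Minimal sketch of a GAN trained in a word-embedding space; dimensions are
# assumptions (e.g., 768-dim token embeddings, two pseudo-tokens per identity),
# and the context-consistent loss from the paper is not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, N_TOKENS, Z_DIM = 768, 2, 64          # assumed sizes

class EmbeddingGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, N_TOKENS * EMB_DIM),
        )
    def forward(self, z):                        # (B, Z_DIM) -> (B, N_TOKENS, EMB_DIM)
        return self.net(z).view(-1, N_TOKENS, EMB_DIM)

class EmbeddingDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_TOKENS * EMB_DIM, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1),
        )
    def forward(self, e):                        # (B, N_TOKENS, EMB_DIM) -> (B, 1)
        return self.net(e.flatten(1))

def train_step(G, D, opt_g, opt_d, celeb_embeddings):
    """One adversarial step against a batch of celeb-name word embeddings."""
    z = torch.randn(celeb_embeddings.size(0), Z_DIM)
    fake = G(z)

    real_logits = D(celeb_embeddings)
    fake_logits = D(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    g_logits = D(fake)                           # non-saturating generator loss
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```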
2. Shen J, Yuan L, Lu Y, Lyu S. Leveraging Predictions of Task-Related Latents for Interactive Visual Navigation. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:704-717. PMID: 38039173. DOI: 10.1109/tnnls.2023.3335416.
Abstract
Interactive visual navigation (IVN) involves tasks where embodied agents learn to interact with the objects in the environment to reach the goals. Current approaches exploit visual features to train a reinforcement learning (RL) navigation control policy network. However, RL-based methods continue to struggle at the IVN tasks as they are inefficient in learning a good representation of the unknown environment in partially observable settings. In this work, we introduce predictions of task-related latents (PTRLs), a flexible self-supervised RL framework for IVN tasks. PTRL learns the latent structured information about environment dynamics and leverages multistep representations of the sequential observations. Specifically, PTRL trains its representation by explicitly predicting the next pose of the agent conditioned on the actions. Moreover, an attention and memory module is employed to associate the learned representation to each action and exploit spatiotemporal dependencies. Furthermore, a state value boost module is introduced to adapt the model to previously unseen environments by leveraging input perturbations and regularizing the value function. Sample efficiency in the training of RL networks is enhanced by modular training and hierarchical decomposition. Extensive evaluations have proved the superiority of the proposed method in increasing the accuracy and generalization capacity.
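As an illustration of the self-supervised signal described above (predicting the agent's next pose conditioned on its action), here is a minimal PyTorch sketch; the observation encoder, the 4-dimensional pose target (x, y, sin θ, cos θ), and all layer sizes are assumptions rather than the paper's architecture.

```python
# Minimal sketch of an auxiliary next-pose prediction head used as a
# representation-learning signal alongside the RL objective. All sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PosePredictor(nn.Module):
    def __init__(self, n_actions: int, pose_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(                 # observation encoder
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.action_emb = nn.Embedding(n_actions, 32)
        self.head = nn.LazyLinear(pose_dim)           # predicts the next pose

    def forward(self, obs, action):
        h = self.encoder(obs)                         # (B, feat)
        a = self.action_emb(action)                   # (B, 32)
        return self.head(torch.cat([h, a], dim=1))

def pose_prediction_loss(model, obs, action, next_pose):
    """Auxiliary loss; in practice it would be added to the RL objective."""
    return F.mse_loss(model(obs, action), next_pose)
```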
3. Yang S, Dong M, Wang Y, Xu C. Adversarial Recurrent Time Series Imputation. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1639-1650. PMID: 32749970. DOI: 10.1109/tnnls.2020.3010524.
Abstract
In real-world time series analysis, missing data is a ubiquitous problem caused by anomalies during data collection and storage. If not treated properly, this problem will seriously hinder classification, regression, and related tasks. Existing methods for time series imputation either impose overly strong assumptions on the distribution of the missing data or cannot fully exploit, or even simply ignore, the informative temporal dependencies and feature correlations across different time steps. In this article, inspired by the idea of conditional generative adversarial networks, we propose a generative adversarial learning framework for time series imputation conditioned on the observed data (as well as the labels, if available). In our model, we employ a modified bidirectional RNN structure as the generator G, which aims to generate the missing values by taking advantage of the temporal and nontemporal information extracted from the observed time series. The discriminator D is designed to distinguish whether each value in a time series is generated or not, so that it can help the generator adjust toward a more authentic imputation result. For an empirical verification of our model, we conduct imputation and classification experiments on several real-world time series data sets. The experimental results show a marked improvement over state-of-the-art baseline models.
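A minimal PyTorch sketch of this kind of adversarial imputation setup follows; the bidirectional GRU generator, the per-step discriminator, and the loss weights are simplifying assumptions, not the paper's exact modified bidirectional RNN.

```python
# Minimal sketch: a bidirectional GRU fills masked entries, a discriminator scores
# each time step as observed vs. imputed. Dimensions and the weight 10.0 are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNImputer(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(n_features * 2, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_features)

    def forward(self, x, mask):
        # x: (B, T, F) with zeros at missing entries; mask: 1 = observed, 0 = missing.
        h, _ = self.rnn(torch.cat([x * mask, mask], dim=-1))
        return self.out(h)                              # reconstruction of every step

class StepDiscriminator(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_features)    # one logit per value

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h)

def imputation_losses(G, D, x, mask):
    x_hat = G(x, mask)
    imputed = mask * x + (1 - mask) * x_hat             # keep observed values as-is
    d_logits = D(imputed.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_logits, mask)   # observed=1, generated=0
    g_logits = D(imputed)
    g_adv = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    g_rec = F.mse_loss(x_hat * mask, x * mask)           # fit the observed entries
    return d_loss, g_adv + 10.0 * g_rec
```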
4. Wang X, Yang Y, Wang W, Zhou Y, Yin Y, Gong Z. Generative adversarial networks based motion learning towards robotic calligraphy synthesis. CAAI Transactions on Intelligence Technology 2023. DOI: 10.1049/cit2.12198.
5. Maldonado-Romo J, Maldonado-Romo A, Aldape-Pérez M. Path Generator with Unpaired Samples Employing Generative Adversarial Networks. Sensors (Basel) 2022; 22:9411. PMID: 36502113. PMCID: PMC9738659. DOI: 10.3390/s22239411.
Abstract
Interactive technologies such as augmented reality have grown in popularity, but specialized sensors and high computer power must be used to perceive and analyze the environment in order to obtain an immersive experience in real time. However, these kinds of implementations have high costs. On the other hand, machine learning has helped create alternative solutions for reducing costs, but it is limited to particular solutions because the creation of datasets is complicated. Due to this problem, this work suggests an alternate strategy for dealing with limited information: unpaired samples from known and unknown surroundings are used to generate a path on embedded devices, such as smartphones, in real time. This strategy creates a path that avoids virtual elements through physical objects. The authors suggest an architecture for creating a path using imperfect knowledge. Additionally, an augmented reality experience is used to describe the generated path, and some users tested the proposal to evaluate the performance. Finally, the primary contribution is the approximation of a path produced from a known environment by using an unpaired dataset.
Affiliation(s)
- Javier Maldonado-Romo: Institute of Advanced Materials and Sustainable Manufacturing, Tecnologico de Monterrey, Mexico City 14380, Mexico; Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Unidad Profesional Adolfo López Mateos, Juan de Dios Bátiz s/n esq. Miguel Othón de Mendizábal, Mexico City 07700, Mexico
- Alberto Maldonado-Romo: Centro de Investigación en Computación, Instituto Politécnico Nacional, Unidad Profesional Adolfo López Mateos, Juan de Dios Bátiz s/n esq. Miguel Othón de Mendizábal, Mexico City 07700, Mexico
- Mario Aldape-Pérez: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Unidad Profesional Adolfo López Mateos, Juan de Dios Bátiz s/n esq. Miguel Othón de Mendizábal, Mexico City 07700, Mexico
6. Artistic Neural Style Transfer using CycleGAN and FABEMD by adaptive information selection. Pattern Recognit Lett 2022. DOI: 10.1016/j.patrec.2022.11.026.
7. Zhao J, Lee F, Hu C, Yu H, Chen Q. LDA-GAN: Lightweight Domain-attention GAN for Unpaired Image-to-Image Translation. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.07.084.
8. Juefei-Xu F, Wang R, Huang Y, Guo Q, Ma L, Liu Y. Countering Malicious DeepFakes: Survey, Battleground, and Horizon. Int J Comput Vis 2022; 130:1678-1734. PMID: 35528632. PMCID: PMC9066404. DOI: 10.1007/s11263-022-01606-8.
Abstract
The creation or manipulation of facial appearance through deep generative approaches, known as DeepFake, has achieved significant progress and promoted a wide range of benign and malicious applications, e.g., visual effect assistance in movies and misinformation generation by faking famous persons. The evil side of this new technique has spawned another popular line of study, i.e., DeepFake detection, which aims to identify fake faces from real ones. With the rapid development of DeepFake-related studies in the community, both sides (i.e., DeepFake generation and detection) have formed a battleground relationship, pushing each other's improvements and inspiring new directions, e.g., the evasion of DeepFake detection. Nevertheless, an overview of this battleground and the new directions remains unclear and has been neglected by recent surveys due to the rapid increase of related publications, limiting the in-depth understanding of the tendencies and future work. To fill this gap, in this paper, we provide a comprehensive overview and detailed analysis of the research work on DeepFake generation, DeepFake detection, as well as evasion of DeepFake detection, with more than 318 research papers carefully surveyed. We present a taxonomy of DeepFake generation methods and a categorization of DeepFake detection methods, and, more importantly, we showcase the battleground between the two parties with detailed interactions between the adversaries (DeepFake generation) and the defenders (DeepFake detection). The battleground offers a fresh perspective on the latest landscape of DeepFake research and provides valuable analysis of research challenges and opportunities, as well as research trends and future directions. We also elaborately design interactive diagrams (http://www.xujuefei.com/dfsurvey) to allow researchers to explore their own interests in popular DeepFake generators or detectors.
Affiliation(s)
- Run Wang: Key Laboratory of Aerospace Information Security and Trust Computing, School of Cyber Science and Engineering, Wuhan University, Wuhan, China
- Yihao Huang: East China Normal University, Shanghai, China
- Qing Guo: College of Intelligence and Computing, Tianjin University, Tianjin, China; Nanyang Technological University, Singapore
- Lei Ma: Alberta Machine Intelligence Institute (AMII), University of Alberta, Edmonton, AB, Canada
- Yang Liu: Nanyang Technological University, Singapore; Zhejiang Sci-Tech University, Hangzhou, China
9. Wei T, Chen D, Zhou W, Liao J, Zhang W, Yuan L, Hua G, Yu N. E2Style: Improve the Efficiency and Effectiveness of StyleGAN Inversion. IEEE Transactions on Image Processing 2022; 31:3267-3280. PMID: 35439133. DOI: 10.1109/tip.2022.3167305.
Abstract
This paper studies the problem of StyleGAN inversion, which plays an essential role in enabling the pretrained StyleGAN to be used for real image editing tasks. The goal of StyleGAN inversion is to find the exact latent code of the given image in the latent space of StyleGAN. This problem has a high demand for quality and efficiency. Existing optimization-based methods can produce high-quality results, but the optimization often takes a long time. On the contrary, forward-based methods are usually faster but the quality of their results is inferior. In this paper, we present a new feed-forward network "E2Style" for StyleGAN inversion, with significant improvement in terms of efficiency and effectiveness. In our inversion network, we introduce: 1) a shallower backbone with multiple efficient heads across scales; 2) multi-layer identity loss and multi-layer face parsing loss to the loss function; and 3) multi-stage refinement. Combining these designs together forms an effective and efficient method that exploits all benefits of optimization-based and forward-based methods. Quantitative and qualitative results show that our E2Style performs better than existing forward-based methods and comparably to state-of-the-art optimization-based methods while maintaining the high efficiency as well as forward-based methods. Moreover, a number of real image editing applications demonstrate the efficacy of our E2Style. Our code is available at https://github.com/wty-ustc/e2style.
10. Perceptual adversarial non-residual learning for blind image denoising. Soft Comput 2022. DOI: 10.1007/s00500-022-06853-y.
11. Liao YS, Huang CR. Semantic Context-Aware Image Style Transfer. IEEE Transactions on Image Processing 2022; 31:1911-1923. PMID: 35143399. DOI: 10.1109/tip.2022.3149237.
Abstract
To provide semantic image style transfer results which are consistent with human perception, transferring styles of semantic regions of the style image to their corresponding semantic regions of the content image is necessary. However, when the object categories between the content and style images are not the same, it is difficult to match semantic regions between two images for semantic image style transfer. To solve the semantic matching problem and guide the semantic image style transfer based on matched regions, we propose a novel semantic context-aware image style transfer method by performing semantic context matching followed by a hierarchical local-to-global network architecture. The semantic context matching aims to obtain the corresponding regions between the content and style images by using context correlations of different object categories. Based on the matching results, we retrieve semantic context pairs where each pair is composed of two semantically matched regions from the content and style images. To achieve semantic context-aware style transfer, a hierarchical local-to-global network architecture, which contains two sub-networks including the local context network and the global context network, is proposed. The former focuses on style transfer for each semantic context pair from the style image to the content image, and generates a local style transfer image storing the detailed style feature representations for corresponding semantic regions. The latter aims to derive the stylized image by considering the content, the style, and the intermediate local style transfer images, so that inconsistency between different corresponding semantic regions can be addressed and solved. The experimental results show that the stylized results using our method are more consistent with human perception compared with the state-of-the-art methods.
12. Liu S, Tang K, Yao X. Generative Adversarial Construction of Parallel Portfolios. IEEE Transactions on Cybernetics 2022; 52:784-795. PMID: 32356768. DOI: 10.1109/tcyb.2020.2984546.
Abstract
Since automatic algorithm configuration methods have been very effective, recently there is increasing research interest in utilizing them for automatic solver construction, resulting in several notable approaches. For these approaches, a basic assumption is that the given training set could sufficiently represent the target use cases such that the constructed solvers can generalize well. However, such an assumption does not always hold in practice since in some cases, we might only have scarce and biased training data. This article studies effective construction approaches for the parallel algorithm portfolios that are less affected in these cases. Unlike previous approaches, the proposed approach simultaneously considers instance generation and portfolio construction in an adversarial process, in which the aim of the former is to generate instances that are challenging for the current portfolio, while the aim of the latter is to find a new component solver for the portfolio to better solve the newly generated instances. Applied to two widely studied problem domains, that is, the Boolean satisfiability problems (SAT) and the traveling salesman problems (TSPs), the proposed approach identified parallel portfolios with much better generalization than the ones generated by the existing approaches when the training data were scarce and biased. Moreover, it was further demonstrated that the generated portfolios could even rival the state-of-the-art manually designed parallel solvers.
13. Multi-CartoonGAN with Conditional Adaptive Instance-Layer Normalization for Conditional Artistic Face Translation. AI 2022. DOI: 10.3390/ai3010003.
Abstract
In CycleGAN, an image-to-image translation architecture was established without the use of paired datasets by employing both adversarial and cycle consistency loss. The success of CycleGAN was followed by numerous studies that proposed new translation models. For example, StarGAN works as a multi-domain translation model based on a single generator–discriminator pair, while U-GAT-IT aims to close the large face-to-anime translation gap by adapting its original normalization to the process. However, constructing robust and conditional translation models requires tradeoffs when the computational costs of training on graphic processing units (GPUs) are considered. This is because, if designers attempt to implement conditional models with complex convolutional neural network (CNN) layers and normalization functions, the GPUs will need to secure large amounts of memory when the model begins training. This study aims to resolve this tradeoff issue via the development of Multi-CartoonGAN, which is an improved CartoonGAN architecture that can output conditional translated images and adapt to large feature gap translations between the source and target domains. To accomplish this, Multi-CartoonGAN reduces the computational cost by using a pretrained VGGNet to calculate the consistency loss instead of reusing the generator. Additionally, we report on the development of the conditional adaptive layer-instance normalization (CAdaLIN) process for use with our model to make it robust to unique feature translations. We performed extensive experiments using Multi-CartoonGAN to translate real-world face images into three different artistic styles: portrait, anime, and caricature. An analysis of the visualized translated images and GPU computation comparison shows that our model is capable of performing translations with unique style features that follow the conditional inputs and at a reduced GPU computational cost during training.
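A minimal PyTorch sketch of a conditional adaptive layer-instance normalization block in the spirit of CAdaLIN follows; the exact parameterization used in the paper may differ, and the condition dimension, initialization, and usage shown here are assumptions.

```python
# Minimal sketch of conditional adaptive layer-instance normalization: a learned
# per-channel rho mixes instance and layer statistics, while gamma/beta come from
# a condition vector (e.g., a one-hot style code). Sizes are assumptions.
import torch
import torch.nn as nn

class CAdaLIN(nn.Module):
    def __init__(self, num_channels: int, cond_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.rho = nn.Parameter(torch.full((1, num_channels, 1, 1), 0.9))
        self.to_gamma = nn.Linear(cond_dim, num_channels)
        self.to_beta = nn.Linear(cond_dim, num_channels)

    def forward(self, x, cond):
        # Instance-norm statistics: per sample, per channel.
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        # Layer-norm statistics: per sample, over all channels and positions.
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        rho = self.rho.clamp(0.0, 1.0)
        mixed = rho * x_in + (1.0 - rho) * x_ln
        gamma = self.to_gamma(cond).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(cond).unsqueeze(-1).unsqueeze(-1)
        return gamma * mixed + beta

# Usage sketch: y = CAdaLIN(256, cond_dim=3)(features, style_one_hot)
```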
14. Image Motion Deblurring Based on Deep Residual Shrinkage and Generative Adversarial Networks. Computational Intelligence and Neuroscience 2022; 2022:5605846. PMID: 35096042. PMCID: PMC8799329. DOI: 10.1155/2022/5605846.
Abstract
A network structure (DRSN-GAN) is proposed for image motion deblurring that combines a deep residual shrinkage network (DRSN) with a generative adversarial network (GAN) to address the issues of poor noise immunity and low generalizability in deblurring algorithms based solely on GANs. First, an end-to-end approach is used to recover a clear image from a blurred image, without the need to estimate a blurring kernel. Next, a DRSN is used as the generator in a GAN to remove noise from the input image while learning residuals to improve robustness. The BN and ReLU layers in the DRSN were moved in front of the convolution layers, making the network easier to train. Finally, deblurring performance was verified using the GoPro, Köhler, and Lai datasets. Experimental results showed that the deblurred images had better subjective visual quality and higher objective evaluation scores than those of algorithms such as MPRNet. Furthermore, image edge and texture restoration effects were improved along with image quality. Our model produced slightly higher PSNR and SSIM values than the latest MPRNet, as well as increased YOLO detection accuracy. The number of required parameters in the DRSN-GAN was also reduced by 21.89%.
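A minimal PyTorch sketch of a pre-activation residual shrinkage block (BN and ReLU before the convolutions, with a learned channel-wise soft threshold) follows; channel counts and the threshold sub-network follow the generic DRSN design and are assumptions, not necessarily the paper's exact generator.

```python
# Minimal sketch of a pre-activation residual shrinkage block with channel-wise
# soft thresholding; sizes are assumptions.
import torch
import torch.nn as nn

class ResidualShrinkageBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Small sub-network estimating a per-channel threshold scale from |features|.
        self.threshold_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        r = self.body(x)
        scale = self.threshold_net(r.abs())                   # in (0, 1), per channel
        tau = scale.unsqueeze(-1).unsqueeze(-1) * r.abs().mean(dim=(2, 3), keepdim=True)
        r = torch.sign(r) * torch.relu(r.abs() - tau)         # soft thresholding
        return x + r
```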
15. Li Z, Deng C, Wei K, Liu W, Tao D. Learning semantic priors for texture-realistic sketch-to-image synthesis. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.08.085.
16. Li H, Sheng B, Li P, Ali R, Chen CLP. Globally and Locally Semantic Colorization via Exemplar-Based Broad-GAN. IEEE Transactions on Image Processing 2021; 30:8526-8539. PMID: 34633929. DOI: 10.1109/tip.2021.3117061.
Abstract
Given a target grayscale image and a reference color image, exemplar-based image colorization aims to generate a visually natural-looking color image by transforming meaningful color information from the reference image to the target image. It remains a challenging problem due to the differences in semantic content between the target image and the reference image. In this paper, we present a novel globally and locally semantic colorization method called exemplar-based conditional broad-GAN, a broad generative adversarial network (GAN) framework, to deal with this limitation. Our colorization framework is composed of two sub-networks: the match sub-net and the colorization sub-net. We reconstruct the target image with a dictionary-based sparse representation in the match sub-net, where the dictionary consists of features extracted from the reference image. To enforce global-semantic and local-structure self-similarity constraints, global-local affinity energy is explored to constrain the sparse representation for matching consistency. Then, the matching information of the match sub-net is fed into the colorization sub-net as the perceptual information of the conditional broad-GAN to facilitate the personalized results. Finally, inspired by the observation that a broad learning system is able to extract semantic features efficiently, we further introduce a broad learning system into the conditional GAN and propose a novel loss, which substantially improves the training stability and the semantic similarity between the target image and the ground truth. Extensive experiments have shown that our colorization approach outperforms the state-of-the-art methods, both perceptually and semantically.
17. PeaceGAN: A GAN-Based Multi-Task Learning Method for SAR Target Image Generation with a Pose Estimator and an Auxiliary Classifier. Remote Sensing 2021. DOI: 10.3390/rs13193939.
Abstract
Although generative adversarial networks (GANs) have been successfully applied to diverse fields, training GANs on synthetic aperture radar (SAR) data is a challenging task due to speckle noise. From the perspective of human learning, it is natural to learn a task by using information from multiple sources. However, previous GAN works on SAR image generation have used only target-class information. Due to the backscattering characteristics of SAR signals, the structures of SAR images are strongly dependent on their pose angles. Nevertheless, pose angle information has not been incorporated into GAN models for SAR images. In this paper, we propose a novel GAN-based multi-task learning (MTL) method for SAR target image generation, called PeaceGAN, that has two additional structures, a pose estimator and an auxiliary classifier, at the side of its discriminator in order to effectively combine the pose and class information via MTL. Extensive experiments showed that the proposed MTL framework can help PeaceGAN's generator effectively learn the distributions of SAR images, so that it can generate SAR target images more faithfully at intended pose angles for desired target classes in comparison with recent state-of-the-art methods.
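The multi-task discriminator idea can be sketched as a shared backbone with three heads, as in the following PyTorch snippet; the layer sizes, the single input channel, and the (sin, cos) pose output are assumptions chosen for illustration, not the paper's exact design.

```python
# Minimal sketch of a discriminator with a real/fake head, an auxiliary class
# head, and a pose-angle regression head sharing one backbone. Sizes are assumptions.
import torch
import torch.nn as nn

class MultiTaskDiscriminator(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adv_head = nn.Linear(256, 1)            # real vs. fake
        self.cls_head = nn.Linear(256, num_classes)  # auxiliary target-class classifier
        self.pose_head = nn.Linear(256, 2)           # (sin, cos) of the pose angle

    def forward(self, x):
        h = self.backbone(x)
        return self.adv_head(h), self.cls_head(h), self.pose_head(h)
```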
18. Adversarial Gaussian Denoiser for Multiple-Level Image Denoising. Sensors (Basel) 2021; 21:2998. PMID: 33923320. PMCID: PMC8123214. DOI: 10.3390/s21092998.
Abstract
Image denoising is a challenging task that is essential in numerous computer vision and image processing problems. This study proposes and applies a generative adversarial network-based image denoising training architecture to multiple-level Gaussian image denoising tasks. Convolutional neural network-based denoising approaches suffer from a blurriness issue that leaves denoised images blurry in texture details. To resolve the blurriness issue, we first performed a theoretical study of the cause of the problem. Subsequently, we proposed an adversarial Gaussian denoiser network, which uses a generative adversarial network-based adversarial learning process for image denoising tasks. This framework resolves the blurriness problem by encouraging the denoiser network to find the distribution of sharp noise-free images instead of blurry images. Experimental results demonstrate that the proposed framework can effectively resolve the blurriness problem and achieve significantly better denoising performance than the state-of-the-art denoising methods.
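The adversarial denoising objective can be sketched as a pixel loss plus a GAN loss that pushes the denoiser's outputs toward the distribution of sharp, clean images. In the PyTorch snippet below the denoiser and discriminator are assumed to be given modules, and the adversarial weight is an arbitrary assumption.

```python
# Minimal sketch of an adversarial denoising objective: L1 reconstruction plus a
# GAN term; the 1e-3 weight and network choices are assumptions.
import torch
import torch.nn.functional as F

def denoiser_losses(denoiser, discriminator, noisy, clean, adv_weight=1e-3):
    denoised = denoiser(noisy)

    # Discriminator: clean images are real, denoised outputs are fake.
    real_logits = discriminator(clean)
    fake_logits = discriminator(denoised.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

    # Denoiser: stay close to the target and fool the discriminator.
    g_logits = discriminator(denoised)
    g_loss = (F.l1_loss(denoised, clean)
              + adv_weight * F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits)))
    return d_loss, g_loss
```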
19. Gao F, Xu X, Yu J, Shang M, Li X, Tao D. Complementary, Heterogeneous and Adversarial Networks for Image-to-Image Translation. IEEE Transactions on Image Processing 2021; 30:3487-3498. PMID: 33646952. DOI: 10.1109/tip.2021.3061286.
Abstract
Image-to-image translation aims to transfer images from a source domain to a target domain. Conditional Generative Adversarial Networks (GANs) have enabled a variety of applications. Early GANs typically include a single generator for generating a target image. Recently, using multiple generators has shown promising results in various tasks. However, the generators in these works typically have homogeneous architectures. In this paper, we argue that heterogeneous generators are complementary to each other and will benefit the generation of images. By heterogeneous, we mean that the generators have different architectures, focus on diverse positions, and operate over multiple scales. To this end, we build two generators using a deep U-Net and a shallow residual network, respectively. The former comprises a series of down-sampling and up-sampling layers, which typically have a large perceptual field and strong spatial locality. In contrast, the residual network has small perceptual fields and works well in characterizing details, especially textures and local patterns. Afterwards, we use a gated fusion network to combine these two generators and produce a final output. The gated fusion unit automatically induces the heterogeneous generators to focus on different positions and complement each other. Finally, we propose a novel approach to integrate multi-level and multi-scale features in the discriminator. This multi-layer integration discriminator encourages the generators to produce realistic details from coarse to fine scales. We quantitatively and qualitatively evaluate our model on various benchmark datasets. Experimental results demonstrate that our method significantly improves the quality of transferred images across a variety of image-to-image translation tasks. We have made our code and results publicly available: http://aiart.live/chan/.
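A minimal PyTorch sketch of a gated fusion unit that mixes the outputs of two heterogeneous generators (e.g., a U-Net and a shallow residual network) follows; the gate network shown here is an assumption, not the paper's exact design.

```python
# Minimal sketch: a small CNN predicts a per-pixel gate that blends two candidate
# translated images. The architecture is an assumption.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, in_channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, out_unet, out_resnet):
        g = self.gate(torch.cat([out_unet, out_resnet], dim=1))   # per-pixel weights in (0, 1)
        return g * out_unet + (1.0 - g) * out_resnet
```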
20. Wang Y, Zhang Z, Hao W, Song C. Multi-Domain Image-to-Image Translation via a Unified Circular Framework. IEEE Transactions on Image Processing 2020; 30:670-684. PMID: 33201817. DOI: 10.1109/tip.2020.3037528.
Abstract
The image-to-image translation aims to learn the corresponding information between the source and target domains. Several state-of-the-art works have made significant progress based on generative adversarial networks (GANs). However, most existing one-to-one translation methods ignore the correlations among different domain pairs. We argue that there is common information among different domain pairs and it is vital to multiple domain pairs translation. In this paper, we propose a unified circular framework for multiple domain pairs translation, leveraging a shared knowledge module across numerous domains. One selected translation pair can benefit from the complementary information from other pairs, and the sharing knowledge is conducive to mutual learning between domains. Moreover, absolute consistency loss is proposed and applied in the corresponding feature maps to ensure intra-domain consistency. Furthermore, our model can be trained in an end-to-end manner. Extensive experiments demonstrate the effectiveness of our approach on several complex translation scenarios, such as Thermal IR switching, weather changing, and semantic transfer tasks.
21. Li R, Wu CH, Liu S, Wang J, Wang G, Liu G, Zeng B. SDP-GAN: Saliency Detail Preservation Generative Adversarial Networks for High Perceptual Quality Style Transfer. IEEE Transactions on Image Processing 2020; 30:374-385. PMID: 33186111. DOI: 10.1109/tip.2020.3036754.
Abstract
The paper proposes a solution to effectively handle salient regions for style transfer between unpaired datasets. Recently, Generative Adversarial Networks (GAN) have demonstrated their potential for translating images from a source domain X to a target domain Y in the absence of paired examples. However, such a translation cannot guarantee high perceptual quality results. Existing style transfer methods work well with relatively uniform content, but they often fail to capture geometric or structural patterns that usually belong to salient regions. Detail losses in structured regions and undesired artifacts in smooth regions are unavoidable even if each individual region is correctly transferred into the target style. In this paper, we propose SDP-GAN, a GAN-based network for solving such problems while generating enjoyable style transfer results. We introduce a saliency network, which is trained with the generator simultaneously. The saliency network has two functions: (1) providing constraints for the content loss to increase the punishment for salient regions, and (2) supplying saliency features to the generator to produce coherent results. Moreover, two novel losses are proposed to optimize the generator and saliency networks. The proposed method preserves the details in important salient regions and improves the overall image perceptual quality. Qualitative and quantitative comparisons against several leading prior methods demonstrate the superiority of our method.
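One way to realize a saliency-weighted content constraint is sketched below in PyTorch; the feature extractor, the source of the saliency map, and the weighting scheme are assumptions rather than the paper's exact losses.

```python
# Minimal sketch of a saliency-weighted content loss: feature differences are
# weighted so that errors in salient regions are punished more.
import torch
import torch.nn.functional as F

def saliency_weighted_content_loss(feat_generated, feat_content, saliency):
    # feat_*: (B, C, H, W) features from a fixed encoder (e.g., a VGG layer);
    # saliency: (B, 1, h, w) map in [0, 1] from the jointly trained saliency net.
    s = F.interpolate(saliency, size=feat_content.shape[-2:], mode="bilinear",
                      align_corners=False)
    weight = 1.0 + s                                   # salient pixels count double
    return (weight * (feat_generated - feat_content) ** 2).mean()
```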
22.
Abstract
Many image processing, computer graphics, and computer vision problems can be treated as image-to-image translation tasks. Such translation entails learning to map one visual representation of a given input to another representation. Image-to-image translation with generative adversarial networks (GANs) has been intensively studied and applied to various tasks, such as multimodal image-to-image translation, super-resolution translation, object transfiguration-related translation, etc. However, image-to-image translation techniques suffer from some problems, such as mode collapse, instability, and a lack of diversity. This article provides a comprehensive overview of image-to-image translation based on GAN algorithms and its variants. It also discusses and analyzes current state-of-the-art image-to-image translation techniques that are based on multimodal and multidomain representations. Finally, open issues and future research directions utilizing reinforcement learning and three-dimensional (3D) modal translation are summarized and discussed.
23. Liu S, Hong C, He J, Tian Z. Robust Nonparametric Distribution Transfer with Exposure Correction for Image Neural Style Transfer. Sensors (Basel) 2020; 20:5232. PMID: 32937788. PMCID: PMC7571219. DOI: 10.3390/s20185232.
Abstract
Image neural style transfer is a process of utilizing convolutional neural networks to render a content image based on a style image. The algorithm can compute a stylized image with the original content from the given content image but a new style from the given style image. Style transfer has become a hot topic both in the academic literature and in industrial applications. The stylized results of existing models are not ideal because of the color difference between the two input images and the inconspicuous details of the content image. To solve these problems, we propose two style transfer models based on robust nonparametric distribution transfer. The first model converts the color probability density function of the content image into that of the style image before style transfer. When the color dynamic range of the content image is smaller than that of the style image, this model renders a more reasonable spatial structure than existing models. Then, an adaptive detail-enhanced exposure correction algorithm is proposed for underexposed images. Based on this, the second model is proposed for the style transfer of underexposed content images. It can further improve the stylized results of underexposed images. Compared with popular methods, the proposed methods achieve satisfactory qualitative and quantitative results.
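The simplest form of nonparametric color-distribution transfer is channel-wise histogram matching, sketched below with NumPy; the paper's robust variant and the exposure-correction step are not reproduced here.

```python
# Minimal sketch of channel-wise histogram matching between a content image and a
# style image (both (H, W, 3) arrays).
import numpy as np

def match_channel(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Map one channel of `source` so its empirical CDF matches `reference`."""
    s_values, s_idx, s_counts = np.unique(source.ravel(), return_inverse=True,
                                          return_counts=True)
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts).astype(np.float64) / source.size
    r_cdf = np.cumsum(r_counts).astype(np.float64) / reference.size
    matched = np.interp(s_cdf, r_cdf, r_values)        # invert the reference CDF
    return matched[s_idx].reshape(source.shape)

def match_colors(content: np.ndarray, style: np.ndarray) -> np.ndarray:
    """Remap the content image channel-wise toward the style image's color distribution."""
    return np.stack([match_channel(content[..., c], style[..., c])
                     for c in range(3)], axis=-1)
```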
Affiliation(s)
- Shuai Liu (corresponding author; Tel.: +86-1569-197-3161)
24. Ma Z, Li J, Wang N, Gao X. Semantic-related image style transfer with dual-consistency loss. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.04.027.
25. Chen P, Zhang Y, Tan M, Xiao H, Huang D, Gan C. Generating Visually Aligned Sound from Videos. IEEE Transactions on Image Processing 2020; PP:8292-8302. PMID: 32746241. DOI: 10.1109/tip.2020.3009820.
Abstract
We focus on the task of generating sound from natural videos, and the sound should be both temporally and content-wise aligned with visual signals. This task is extremely challenging because some sounds generated outside a camera can not be inferred from video content. The model may be forced to learn an incorrect mapping between visual content and these irrelevant sounds. To address this challenge, we propose a framework named REGNET. In this framework, we first extract appearance and motion features from video frames to better distinguish the object that emits sound from complex background information. We then introduce an innovative audio forwarding regularizer that directly considers the real sound as input and outputs bottlenecked sound features. Using both visual and bottlenecked sound features for sound prediction during training provides stronger supervision for the sound prediction. The audio forwarding regularizer can control the irrelevant sound component and thus prevent the model from learning an incorrect mapping between video frames and sound emitted by the object that is out of the screen. During testing, the audio forwarding regularizer is removed to ensure that REGNET can produce purely aligned sound only from visual features. Extensive evaluations based on Amazon Mechanical Turk demonstrate that our method significantly improves both temporal and contentwise alignment. Remarkably, our generated sound can fool the human with a 68.12% success rate. Code and pre-trained models are publicly available at https://github.com/PeihaoChen/regnet.
26. Khan A, Jin W, Ahmad M, Naqvi RA, Wang D. An Input-Perceptual Reconstruction Adversarial Network for Paired Image-to-Image Conversion. Sensors (Basel) 2020; 20:4161. PMID: 32726915. PMCID: PMC7435982. DOI: 10.3390/s20154161.
Abstract
Image-to-image conversion based on deep learning techniques is a topic of interest in the fields of robotics and computer vision. A series of typical tasks, such as applying semantic labels to building photos, edges to photos, and raining to de-raining, can be seen as paired image-to-image conversion problems. In such problems, the image generation network learns from the information in the form of input images. The input images and the corresponding targeted images must share the same basic structure to perfectly generate target-oriented output images. However, the shared basic structure between paired images is not as ideal as assumed, which can significantly affect the output of the generating model. Therefore, we propose a novel Input-Perceptual and Reconstruction Adversarial Network (IP-RAN) as an all-purpose framework for imperfect paired image-to-image conversion problems. We demonstrate, through the experimental results, that our IP-RAN method significantly outperforms the current state-of-the-art techniques.
Affiliation(s)
- Aamir Khan: School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
- Weidong Jin: School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China; China-ASEAN International Joint Laboratory of Integrated Transport, Nanning University, Nanning 530000, China
- Muqeet Ahmad: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
- Rizwan Ali Naqvi: Department of Unmanned Vehicle Engineering, Sejong University, Seoul 05006, Korea
- Desheng Wang: School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
27. Song L, Xu Y, Zhang L, Du B, Zhang Q, Wang X. Learning from Synthetic Images via Active Pseudo-Labeling. IEEE Transactions on Image Processing 2020; 29:6452-6465. PMID: 32386150. DOI: 10.1109/tip.2020.2989100.
Abstract
Synthetic visual data refers to data automatically rendered by mature computer graphics algorithms. With the rapid development of these techniques, we can now collect photo-realistic synthetic images with accurate pixel-level annotations without much effort. However, due to the domain gaps between synthetic data and real data, in terms of not only visual appearance but also label distribution, directly applying models trained on synthetic images to real ones can hardly yield satisfactory performance. Since the collection of accurate labels for real images is very laborious and time-consuming, developing algorithms which can learn from synthetic images is of great significance. In this paper, we propose a novel framework, namely Active Pseudo-Labeling (APL), to reduce the domain gaps between synthetic images and real images. In the APL framework, we first predict pseudo-labels for the unlabeled real images in the target domain by actively adapting the style of the real images to the source domain. Specifically, the style of real images is adjusted via a novel task-guided generative model, and then pseudo-labels are predicted for these actively adapted images. Lastly, we fine-tune the source-trained model in the pseudo-labeled target domain, which helps to fit the distribution of the real data. Experiments on both semantic segmentation and object detection tasks with several challenging benchmark data sets demonstrate the superiority of our proposed method over existing state-of-the-art approaches.
30. Liang DT, Liang D, Xing SM, Li P, Wu XC. A robot calligraphy writing method based on style transferring algorithm and similarity evaluation. Intel Serv Robot 2019. DOI: 10.1007/s11370-019-00298-3.
31. An Improved Style Transfer Algorithm Using Feedforward Neural Network for Real-Time Image Conversion. Sustainability 2019. DOI: 10.3390/su11205673.
Abstract
The creation of art is a complex process owing to its abstraction and novelty. To create such art at lower cost, style transfer using advanced machine learning technology has become a popular method in the computer vision field. However, traditionally transferred images still suffer from color distortion, content loss, and long processing times. In this paper, we propose an improved style transfer algorithm using a feedforward neural network. The whole network is composed of two parts, a style transfer network and a loss network. After training, the style transfer network can directly map the content image into the stylized image. Content loss, style loss, and Total Variation (TV) loss are calculated by the loss network to update the weights of the style transfer network. Additionally, a cross-training strategy is proposed to better preserve the details of the content image. Extensive experiments are conducted to show the superior performance of the presented algorithm compared with the classic neural style transfer algorithm.
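The three losses named above are standard in feed-forward style transfer; a minimal PyTorch sketch follows, with the choice of feature layers and loss weights left to the caller as assumptions.

```python
# Minimal sketch of content, Gram-matrix style, and total-variation losses computed
# from features of a fixed loss network (e.g., VGG); layer choices are assumptions.
import torch

def gram_matrix(feat):
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_loss(feat_out, feat_content):
    return torch.mean((feat_out - feat_content) ** 2)

def style_loss(feats_out, feats_style):
    # Sum of Gram-matrix differences over the selected feature layers.
    return sum(torch.mean((gram_matrix(fo) - gram_matrix(fs)) ** 2)
               for fo, fs in zip(feats_out, feats_style))

def tv_loss(img):
    # Total variation: encourages spatial smoothness in the stylized image.
    return (torch.mean(torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]))
            + torch.mean(torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1])))
```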
32. Deng C, Yang E, Liu T, Li J, Liu W, Tao D. Unsupervised Semantic-Preserving Adversarial Hashing for Image Search. IEEE Transactions on Image Processing 2019; 28:4032-4044. PMID: 30872226. DOI: 10.1109/tip.2019.2903661.
Abstract
Hashing plays a pivotal role in nearest-neighbor searching for large-scale image retrieval. Recently, deep learning-based hashing methods have achieved promising performance. However, most of these deep methods involve discriminative models, which require large-scale, labeled training datasets, thus hindering their real-world applications. In this paper, we propose a novel strategy to exploit the semantic similarity of the training data and design an efficient generative adversarial framework to learn binary hash codes in an unsupervised manner. Specifically, our model consists of three different neural networks: an encoder network to learn hash codes from images, a generative network to generate images from hash codes, and a discriminative network to distinguish between pairs of hash codes and images. By adversarially training these networks, we successfully learn mutually coherent encoder and generative networks, and can output efficient hash codes from the encoder network. We also propose a novel strategy, which utilizes both feature and neighbor similarities, to construct a semantic similarity matrix, then use this matrix to guide the hash code learning process. Integrating the supervision of this semantic similarity matrix into the adversarial learning framework can efficiently preserve the semantic information of training data in Hamming space. The experimental results on three widely used benchmarks show that our method not only significantly outperforms several state-of-the-art unsupervised hashing methods, but also achieves comparable performance with popular supervised hashing methods.
33. Chen Y, Tao J, Wang J, Chen X, Xie J, Xiong J, Yang K. The Novel Sensor Network Structure for Classification Processing Based on the Machine Learning Method of the ACGAN. Sensors (Basel) 2019; 19:3145. PMID: 31319556. PMCID: PMC6679324. DOI: 10.3390/s19143145.
Abstract
To address the problem of unstable training and poor accuracy in image classification algorithms based on generative adversarial networks (GAN), a novel sensor network structure for classification processing using auxiliary classifier generative adversarial networks (ACGAN) is proposed in this paper. Firstly, the real/fake discrimination of sensor samples in the network has been canceled at the output layer of the discriminative network and only the posterior probability estimation of the sample tag is outputted. Secondly, by regarding the real sensor samples as supervised data and the generative sensor samples as labeled fake data, we have reconstructed the loss function of the generator and discriminator by using the real/fake attributes of sensor samples and the cross-entropy loss function of the label. Thirdly, the pooling and caching method has been introduced into the discriminator to enable more effective extraction of the classification features. Finally, feature matching has been added to the discriminative network to ensure the diversity of the generative sensor samples. Experimental results have shown that the proposed algorithm (CP-ACGAN) achieves better classification accuracy on the MNIST dataset, CIFAR10 dataset and CIFAR100 dataset than other solutions. Moreover, when compared with the ACGAN and CNN classification algorithms, which have the same deep network structure as CP-ACGAN, the proposed method continues to achieve better classification effects and stability than other main existing sensor solutions.
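One common way to realize a class-posterior-only discriminator is a K+1-class cross-entropy formulation, sketched below in PyTorch; this illustrates the idea of combining real/fake attributes with the label cross-entropy, and is not necessarily CP-ACGAN's exact loss.

```python
# Minimal sketch: the discriminator outputs logits over K real classes plus one
# extra "fake" class, so real/fake information and the label enter one cross-entropy.
import torch
import torch.nn.functional as F

def discriminator_loss(class_logits_real, labels_real, class_logits_fake, num_classes):
    # class_logits_*: (B, K + 1) logits; the extra index K is the "fake" class.
    fake_label = torch.full((class_logits_fake.size(0),), num_classes,
                            dtype=torch.long, device=class_logits_fake.device)
    return (F.cross_entropy(class_logits_real, labels_real)      # real -> true class
            + F.cross_entropy(class_logits_fake, fake_label))    # fake -> extra class

def generator_loss(class_logits_fake, intended_labels):
    # The generator wants its samples classified as the class it conditioned on.
    return F.cross_entropy(class_logits_fake, intended_labels)
```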
Affiliation(s)
- Yuantao Chen: School of Computer and Communication Engineering & Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China
- Jiajun Tao: School of Computer and Communication Engineering & Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China
- Jin Wang: School of Computer and Communication Engineering & Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China; School of Information Science and Engineering, Fujian University of Technology, Fuzhou 350118, China
- Xi Chen: School of Computer and Communication Engineering & Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China
- Jingbo Xie: Hunan Institute of Scientific and Technical Information, Changsha 410001, China
- Jie Xiong: Electronics & Information School, Yangtze University, Jingzhou 434023, China
- Kai Yang: Technical Quality Department, Hunan ZOOMLION Heavy Industry Intelligent Technology Corporation Limited, Changsha 410005, China