1. Qi X, Sun M, Wang Z, Liu J, Li Q, Zhao F, Zhang S, Shan C. Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network With Graph Representation Learning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:2182-2195. PMID: 38113153. DOI: 10.1109/tnnls.2023.3341246.
Abstract
Biphasic face photo-sketch synthesis has significant practical value in wide-ranging fields such as digital entertainment and law enforcement. Previous approaches generate the photo or sketch directly in a global view; they suffer from the low quality of sketches and complex photograph variations, leading to unnatural and low-fidelity results. In this article, we propose a novel semantic-driven generative adversarial network, cooperating with graph representation learning, to address the above issues. Considering that human faces have distinct spatial structures, we first inject class-wise semantic layouts into the generator to provide style-based spatial information for synthesized face photographs and sketches. In addition, to enhance the authenticity of details in generated faces, we construct two types of representational graphs from semantic parsing maps of the input faces, dubbed the intraclass semantic graph (IASG) and the interclass structure graph (IRSG). Specifically, the IASG models the intraclass semantic correlations of each facial component, producing realistic facial details. To keep the generated faces structurally coordinated, the IRSG models interclass structural relations among the facial components via graph representation learning. To further enhance the perceptual quality of synthesized images, we present a biphasic interactive cycle training strategy that fully exploits the multilevel feature consistency between the photograph and the sketch. Extensive experiments demonstrate that our method outperforms state-of-the-art competitors on the CUHK Face Sketch (CUFS) and CUHK Face Sketch FERET (CUFSF) datasets.
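The graph representation learning described in this abstract can be pictured with a toy message-passing step over facial components. The sketch below is a generic mean-aggregation update with self-loops, not the paper's IASG/IRSG implementation; the component names and adjacency matrix are hypothetical.

```python
import numpy as np

def graph_propagate(H, A):
    """One mean-aggregation message-passing step over a component graph:
    each node's feature becomes the degree-normalized average of its own
    and its neighbours' features."""
    A_hat = A + np.eye(len(A))             # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return (A_hat / deg) @ H               # row-normalized aggregation

# Hypothetical component graph over skin, eyes, nose, mouth (IRSG-like idea).
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
H = np.arange(12, dtype=float).reshape(4, 3)   # 3-d feature per component
H1 = graph_propagate(H, A)                     # structure-aware features
```

Because the aggregation is row-normalized, propagating a constant feature leaves it unchanged, which is a quick sanity check on the update.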
2. Tang D, Jiang X, Wang K, Guo W, Zhang J, Lin Y, Pu H. Toward identity preserving in face sketch-photo synthesis using a hybrid CNN-Mamba framework. Scientific Reports 2024; 14:22495. PMID: 39341858. PMCID: PMC11438986. DOI: 10.1038/s41598-024-72066-y.
Abstract
Facial sketch-photo synthesis has important practical applications, such as criminal investigation. Many convolutional neural network (CNN)-based methods have been proposed to address this task. However, due to the substantial modal differences between sketches and photos, the CNN's insensitivity to global information, and insufficient utilization of hierarchical features, synthesized photos struggle to balance identity preservation and image quality. Recently, state space sequence models (SSMs) have achieved exciting results in computer vision (CV) tasks. Inspired by SSMs, we design a hybrid CNN-SSM model called FaceMamba for the face sketch-photo synthesis (FSPS) task. It includes an original Face Vision Mamba Attention module for modeling in latent space using an SSM, and it incorporates a general auxiliary method called Attention Feature Injection that combines encoding features, decoding features, and external auxiliary features using attention mechanisms. FaceMamba combines Mamba's ability to model long-range dependencies with the CNN's powerful local feature extraction, and it utilizes hierarchical features at the appropriate positions. Extensive experiments and evaluations reveal that FaceMamba is strongly competitive on the FSPS task, achieving the best balance between identity preservation and image quality.
Affiliation(s)
- Duoxun Tang: College of Science, Sichuan Agricultural University, Ya'an 625000, China
- Xinhang Jiang: College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Kunpeng Wang: College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Weichen Guo: College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Jingyuan Zhang: College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Ye Lin: College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Haibo Pu: College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
3. Melnik A, Miasayedzenkau M, Makaravets D, Pirshtuk D, Akbulut E, Holzmann D, Renusch T, Reichert G, Ritter H. Face Generation and Editing With StyleGAN: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:3557-3576. PMID: 38224501. DOI: 10.1109/tpami.2024.3350004.
Abstract
Our goal with this survey is to provide an overview of state-of-the-art deep learning methods for face generation and editing using StyleGAN. The survey covers the evolution of StyleGAN, from PGGAN to StyleGAN3, and explores relevant topics such as suitable metrics for training, different latent representations, GAN inversion to the latent spaces of StyleGAN, face image editing, cross-domain face stylization, face restoration, and even Deepfake applications. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
4. Zhang M, Wu Q, Guo J, Li Y, Gao X. Heat Transfer-Inspired Network for Image Super-Resolution Reconstruction. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:1810-1820. PMID: 35776820. DOI: 10.1109/tnnls.2022.3185529.
Abstract
Image super-resolution (SR) is a critical image preprocessing task for many applications. How to recover features as accurately as possible is the focus of SR algorithms. Most existing SR methods guide the image reconstruction process with gradient maps, frequency perception modules, and similar cues, improving the quality of recovered images by enhancing edges, but they rarely optimize the neural network structure at the system level. In this article, we conduct an in-depth exploration of the inner nature of the SR network structure. In light of the consistency between thermal particles in the thermal field and pixels in the image domain, we propose a novel heat-transfer-inspired network (HTI-Net) for image SR reconstruction based on the theory of heat transfer. Using finite difference theory, we employ a second-order mixed-difference equation to redesign the residual network (ResNet), which can fully integrate multiple sources of information to achieve better feature reuse. In addition, following the thermal conduction differential equation (TCDE) in the thermal field, a pixel value flow equation (PVFE) in the image domain is derived to mine deep potential feature information. Experimental results on multiple standard databases demonstrate that the proposed HTI-Net achieves superior edge detail reconstruction and parameter efficiency compared with existing SR methods. Experimental results on the microscope chip image (MCI) database, consisting of realistic low-resolution (LR) and high-resolution (HR) images, show that the proposed HTI-Net for image SR reconstruction can improve the effectiveness of a hardware Trojan detection system.
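The heat-transfer view of residual learning in this abstract can be made concrete with a toy explicit finite-difference step: a residual update u + α·Lap(u) is exactly one step of the discrete heat equation. This is a hedged illustration of that general correspondence, not HTI-Net's second-order mixed-difference design.

```python
import numpy as np

def laplacian(u):
    # Second-order 5-point finite-difference Laplacian, periodic boundary.
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0)
            + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)

def heat_residual_step(u, alpha=0.2):
    # Residual update u_{t+1} = u_t + alpha * Lap(u_t): one explicit step
    # of the discrete heat (thermal conduction) equation; stable in 2D
    # for alpha <= 0.25.
    return u + alpha * laplacian(u)

rng = np.random.default_rng(0)
u0 = rng.standard_normal((16, 16))
u = u0
for _ in range(50):
    u = heat_residual_step(u)   # repeated steps diffuse (smooth) the field
```

Diffusion conserves the mean of the field while damping high-frequency content, which is why such a step behaves like a structured smoothing residual.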
5. Zhang M, Xin J, Zhang J, Tao D, Gao X. Curvature Consistent Network for Microscope Chip Image Super-Resolution. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10538-10551. PMID: 35482691. DOI: 10.1109/tnnls.2022.3168540.
Abstract
Detecting hardware Trojans (HTs) from microscope chip images (MCIs) is crucial for many applications, such as financial infrastructure and transport security. Scanning high-resolution (HR) microscope images for HT detection incurs an inordinate cost, so it is useful to work with low-resolution (LR) chip images, which can be acquired faster and more cheaply than their HR counterparts. However, the details lost in LR MCIs and the noise caused by the electric charge effect degrade detection performance, making the problem more challenging. In this article, we address this issue by first discussing why recovering curvature information matters for HT detection and then proposing a novel MCI super-resolution (SR) method based on a curvature consistent network (CCN). It consists of a homogeneous workflow and a heterogeneous workflow: the former learns a mapping between homogeneous images, i.e., LR and HR MCIs, and the latter learns a mapping between heterogeneous images, i.e., MCIs and curvature images. A collaborative fusion strategy leverages features learned from both workflows level by level to recover the final HR image. To mitigate the lack of an MCI dataset, we construct a new benchmark consisting of realistic MCIs at different resolutions, called MCI. Experiments on MCI demonstrate that the proposed CCN outperforms representative SR methods by recovering more delicate circuit lines and yields higher HT detection performance. The dataset is available at github.com/RuiZhang97/CCN.
6. Zhang M, Wu Q, Zhang J, Gao X, Guo J, Tao D. Fluid Micelle Network for Image Super-Resolution Reconstruction. IEEE Transactions on Cybernetics 2023; 53:578-591. PMID: 35442898. DOI: 10.1109/tcyb.2022.3163294.
Abstract
Most existing convolutional neural-network-based super-resolution (SR) methods focus on designing effective neural blocks but rarely describe the image SR mechanism from the perspective of image evolution in the SR process. In this study, we explore a new research routine by abstracting the movement of pixels in the reconstruction process as the flow of fluid in the field of fluid dynamics (FD), where explicit motion laws of particles have been discovered. Specifically, a novel fluid micelle network is devised for image SR based on the theory of FD that follows the residual learning scheme but learns the residual structure by solving the finite difference equation in FD. The pixel motion equation in the SR process is derived from the Navier-Stokes (N-S) FD equation, establishing a guided branch that is aware of edge information. Thus, the second-order residual drives the network for feature extraction, and the guided branch corrects the direction of the pixel stream to supplement the details. Experiments on popular benchmarks and a real-world microscope chip image dataset demonstrate that the proposed method outperforms other modern methods in terms of both objective metrics and visual quality. The proposed method can also reconstruct clear geometric structures, offering the potential for real-world applications.
7. Yu S, Han H, Shan S, Chen X. CMOS-GAN: Semi-Supervised Generative Adversarial Model for Cross-Modality Face Image Synthesis. IEEE Transactions on Image Processing 2022; 32:144-158. PMID: 37015478. DOI: 10.1109/tip.2022.3226413.
Abstract
Cross-modality face image synthesis such as sketch-to-photo, NIR-to-RGB, and RGB-to-depth has wide applications in face recognition, face animation, and digital entertainment. Conventional cross-modality synthesis methods usually require paired training data, i.e., each subject has images of both modalities. However, paired data can be difficult to acquire, while unpaired data commonly exist. In this paper, we propose a novel semi-supervised cross-modality synthesis method (namely CMOS-GAN), which can leverage both paired and unpaired face images to learn a robust cross-modality synthesis model. Specifically, CMOS-GAN uses a generator of encoder-decoder architecture for new-modality synthesis. We leverage pixel-wise loss, adversarial loss, classification loss, and face feature loss to exploit the information from both paired multi-modality face images and unpaired face images for model learning. In addition, since we expect the synthesized modality to also help improve face recognition accuracy, we further use a modified triplet loss to retain the discriminative features of the subject in the synthesized modality. Experiments on three cross-modality face synthesis tasks (NIR-to-VIS, RGB-to-depth, and sketch-to-photo) show the effectiveness of the proposed approach compared with the state of the art. In addition, we collect a large-scale RGB-D dataset (VIPL-MumoFace-3K) for the RGB-to-depth synthesis task. We plan to open-source our code and the VIPL-MumoFace-3K dataset to the community (https://github.com/skgyu/CMOS-GAN).
8. FRAN: Feature-Filtered Residual Attention Network for Realistic Face Sketch-to-Photo Transformation. Applied Intelligence 2022. DOI: 10.1007/s10489-022-04352-z.
9. Li P, Sheng B, Chen CLP. Face Sketch Synthesis Using Regularized Broad Learning System. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5346-5360. PMID: 33852397. DOI: 10.1109/tnnls.2021.3070463.
Abstract
There are two main categories of face sketch synthesis: data-driven and model-driven. Data-driven methods synthesize sketches from training photograph-sketch patches at the cost of detail loss. Model-driven methods can preserve more details, but learning the mapping from photographs to sketches is a time-consuming training process, especially when the deep structures require refinement. We propose a face sketch synthesis method via a regularized broad learning system (RBLS). The broad-learning-based system directly transforms photographs into sketches with rich details preserved. The incremental learning scheme of the broad learning system (BLS) ensures that our method easily adds feature mappings and remodels the network without retraining when the extracted feature mapping nodes are insufficient. Besides, a Bayesian-estimation-based regularization is introduced with the BLS to aid further feature selection and improve generalization ability and robustness. Various experiments on the CUHK student dataset and the Aleix Robert (AR) dataset demonstrate the effectiveness and efficiency of our RBLS method. Unlike existing methods, our method synthesizes high-quality face sketches much more efficiently and greatly reduces computational complexity in both the training and test processes.
10. Nie F, Xue J, Wu D, Wang R, Li H, Li X. Coordinate Descent Method for k-means. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:2371-2385. PMID: 34061737. DOI: 10.1109/tpami.2021.3085739.
Abstract
The k-means method with the Lloyd heuristic is a traditional clustering method that has played a key role in many downstream machine-learning tasks because of its simplicity. However, the Lloyd heuristic often finds a bad local minimum, i.e., one whose objective value is not small enough, which limits the performance of k-means. In this paper, we use the coordinate descent (CD) method to solve this problem. First, we show that the k-means minimization problem can be reformulated as a trace maximization problem; a simple and efficient coordinate descent scheme is then proposed to solve the maximization problem. Two interesting theoretical findings are that Lloyd's updates cannot further decrease the k-means objective value produced by our CD method, and that CD avoids producing empty clusters. In addition, the computational complexity analysis verifies that CD has the same time complexity as the original k-means method. Extensive experiments, including statistical hypothesis testing, on several real-world datasets with varying numbers of clusters, samples, and dimensions show that CD performs better than Lloyd: lower objective values, better local minima, and fewer iterations. CD is also more robust to initialization than Lloyd, whether the initialization strategy is random or k-means++.
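The coordinate-descent idea can be illustrated with a point-wise reassignment scheme (a Hartigan-style update, used here as a hedged stand-in for the paper's trace-maximization formulation): each point moves to the cluster that most decreases the objective, and moves that would empty a cluster are forbidden, which also avoids empty clusters.

```python
import numpy as np

def move_delta(X, labels, i, j):
    """Exact change in the k-means objective if point i moves to cluster j
    (Hartigan's formula); returns +inf to forbid emptying a cluster."""
    a = labels[i]
    Xa, Xb = X[labels == a], X[labels == j]
    na, nb = len(Xa), len(Xb)
    if na <= 1:
        return np.inf                      # never create an empty cluster
    gain = na / (na - 1) * np.sum((X[i] - Xa.mean(0)) ** 2)
    cost = nb / (nb + 1) * np.sum((X[i] - Xb.mean(0)) ** 2) if nb else 0.0
    return cost - gain

def cd_kmeans(X, k, n_passes=20, seed=0):
    """Coordinate descent over assignments: accept any strictly improving
    single-point move; stop when a full pass changes nothing."""
    labels = np.random.default_rng(seed).integers(0, k, len(X))
    for _ in range(n_passes):
        changed = False
        for i in range(len(X)):
            deltas = [move_delta(X, labels, i, j) if j != labels[i] else 0.0
                      for j in range(k)]
            j = int(np.argmin(deltas))
            if deltas[j] < 0:
                labels[i] = j
                changed = True
        if not changed:
            break
    return labels

# Two well-separated blobs: CD should recover the blob partition.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(8, 0.5, (10, 2))])
labels = cd_kmeans(X, 2)
```

Each accepted move strictly decreases the objective, so the scheme terminates; unlike a Lloyd batch update, it evaluates one assignment coordinate at a time.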
11. Azhar I, Sharif M, Raza M, Khan MA, Yong HS. A Decision Support System for Face Sketch Synthesis Using Deep Learning and Artificial Intelligence. Sensors (Basel) 2021; 21:8178. PMID: 34960274. PMCID: PMC8708226. DOI: 10.3390/s21248178.
Abstract
Recent developments in IoT technologies are likely to be implemented extensively in the next decade. With a great increase in the crime rate, handling officers are responsible for dealing with a broad range of cyber and Internet issues during investigation. IoT technologies are helpful in the identification of suspects, but few technologies are available that use IoT and deep learning together for face sketch synthesis. Convolutional neural networks (CNNs) and other deep learning constructs have become major tools in recent approaches. A new neural network architecture is presented in this work. It is called Spiral-Net, a modified version of U-Net that performs face sketch synthesis (this phase is known as the compiler network C here). Spiral-Net works in combination with a pre-trained VGG-19 network called the feature extractor F. It first identifies the top n matches from viewed sketches for a given photo. F is used again to formulate a feature map based on the cosine distance between a candidate sketch formed by C and the top n matches. A customized CNN configuration (called the discriminator D) then computes loss functions based on differences between the candidate sketch and the features. The values of these loss functions alternately update C and F. The ensemble of these nets is trained and tested on selected datasets, including CUFS, CUFSF, and part of the IIT photo-sketch dataset. Results of this modified U-Net are evaluated with the legacy NLDA (1998) scheme of face recognition and its newer counterpart, OpenBR (2013), and demonstrate an improvement of 5% over the current state of the art in the relevant domain.
Affiliation(s)
- Irfan Azhar: Department of Computer Science, COMSATS University Islamabad, Wah Campus, Wah Cantt 47040, Pakistan
- Muhammad Sharif: Department of Computer Science, COMSATS University Islamabad, Wah Campus, Wah Cantt 47040, Pakistan
- Mudassar Raza: Department of Computer Science, COMSATS University Islamabad, Wah Campus, Wah Cantt 47040, Pakistan
- Hwan-Seung Yong: Department of Computer Science & Engineering, Ewha Womans University, Seoul 03760, Korea
12. Radman A, Suandi SA. BiLSTM regression model for face sketch synthesis using sequential patterns. Neural Computing and Applications 2021. DOI: 10.1007/s00521-021-05916-9.
13. Edge-Preserving Convolutional Generative Adversarial Networks for SAR-to-Optical Image Translation. Remote Sensing 2021. DOI: 10.3390/rs13183575.
Abstract
With the ability for all-day, all-weather acquisition, synthetic aperture radar (SAR) remote sensing is an important technique in modern Earth observation. However, the interpretation of SAR images is a highly challenging task, even for well-trained experts, due to the imaging principle of SAR images and the high-frequency speckle noise. Some image-to-image translation methods are used to convert SAR images into optical images that are closer to what we perceive through our eyes. There exist two weaknesses in these methods: (1) these methods are not designed for an SAR-to-optical translation task, thereby losing sight of the complexity of SAR images and the speckle noise. (2) The same convolution filters in a standard convolution layer are utilized for the whole feature maps, which ignore the details of SAR images in each window and generate images with unsatisfactory quality. In this paper, we propose an edge-preserving convolutional generative adversarial network (EPCGAN) to enhance the structure and aesthetics of the output image by leveraging the edge information of the SAR image and implementing content-adaptive convolution. The proposed edge-preserving convolution (EPC) decomposes the content of the convolution input into texture components and content components and then generates a content-adaptive kernel to modify standard convolutional filter weights for the content components. Based on the EPC, the EPCGAN is presented for SAR-to-optical image translation. It uses a gradient branch to assist in the recovery of structural image information. Experiments on the SEN1-2 dataset demonstrated that the proposed method can outperform other SAR-to-optical methods by recovering more structures and yielding a superior evaluation index.
14. Yu J, Xu X, Gao F, Shi S, Wang M, Tao D, Huang Q. Toward Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs. IEEE Transactions on Cybernetics 2021; 51:4350-4362. PMID: 32149668. DOI: 10.1109/tcyb.2020.2972944.
Abstract
Face photo-sketch synthesis aims at generating a facial sketch/photo conditioned on a given photo/sketch. It covers wide applications including digital entertainment and law enforcement. Precisely depicting face photos/sketches remains challenging due to the restrictions on structural realism and textural consistency. While existing methods achieve compelling results, they mostly yield blurred effects and severe deformations over various facial components, leading to an unrealistic feel in the synthesized images. To tackle this challenge, in this article, we propose using facial composition information to aid the synthesis of face sketches/photos. Specifically, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis. In CA-GAN, we utilize paired inputs, including a face photo/sketch and the corresponding pixelwise face labels, for generating a sketch/photo. Next, to focus training on hard-to-generate components and delicate facial structures, we propose a compositional reconstruction loss. In addition, we employ a perceptual loss function to encourage the synthesized image and the real image to be perceptually similar. Finally, we use stacked CA-GANs (SCA-GANs) to further rectify defects and add compelling details. The experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data. In addition, our method significantly decreases the best previous Fréchet inception distance (FID) from 36.2 to 26.2 for sketch synthesis, and from 60.9 to 30.5 for photo synthesis. We also demonstrate that the proposed method has considerable generalization ability.
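A compositional reconstruction loss of the kind described above can be sketched as a per-component average of pixel errors, so that small components (eyes, lips) weigh as much as large ones (skin, hair). This is a hedged reading of the idea in the abstract, not CA-GAN's exact formulation; the toy label layout below is hypothetical.

```python
import numpy as np

def compositional_l1(pred, target, labels, n_classes):
    """L1 error averaged within each semantic component, then across
    components, so errors on small facial parts are not drowned out."""
    per_class = [np.abs(pred[labels == c] - target[labels == c]).mean()
                 for c in range(n_classes) if (labels == c).any()]
    return float(np.mean(per_class))

# Toy 4x4 'face': 15 background pixels (class 0) and 1 eye pixel (class 1).
labels = np.zeros((4, 4), dtype=int)
labels[1, 1] = 1
target = np.zeros((4, 4))
pred = target.copy()
pred[1, 1] = 1.0                    # the only error is on the tiny component

plain = float(np.abs(pred - target).mean())        # 1/16 = 0.0625
comp = compositional_l1(pred, target, labels, 2)   # (0 + 1)/2 = 0.5
```

The compositional loss penalizes the single-pixel component error eight times more heavily than a plain pixel-wise mean, which is the intended focusing effect.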
15. Wan W, Yang Y, Lee HJ. Generative adversarial learning for detail-preserving face sketch synthesis. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.01.050.
16. Gao F, Xu X, Yu J, Shang M, Li X, Tao D. Complementary, Heterogeneous and Adversarial Networks for Image-to-Image Translation. IEEE Transactions on Image Processing 2021; 30:3487-3498. PMID: 33646952. DOI: 10.1109/tip.2021.3061286.
Abstract
Image-to-image translation transfers images from a source domain to a target domain. Conditional generative adversarial networks (GANs) have enabled a variety of applications. Early GANs typically include a single generator for generating a target image. Recently, using multiple generators has shown promising results in various tasks. However, the generators in these works are typically of homogeneous architectures. In this paper, we argue that heterogeneous generators are complementary to each other and benefit the generation of images. By heterogeneous, we mean that the generators have different architectures, focus on diverse positions, and operate over multiple scales. To this end, we build two generators using a deep U-Net and a shallow residual network, respectively. The former comprises a series of down-sampling and up-sampling layers, which typically have a large receptive field and strong spatial locality. In contrast, the residual network has small receptive fields and works well in characterizing details, especially textures and local patterns. Afterwards, we use a gated fusion network to combine these two generators and produce the final output. The gated fusion unit automatically induces the heterogeneous generators to focus on different positions and complement each other. Finally, we propose a novel approach to integrating multi-level and multi-scale features in the discriminator. This multi-layer integration discriminator encourages the generators to produce realistic details from coarse to fine scales. We quantitatively and qualitatively evaluate our model on various benchmark datasets. Experimental results demonstrate that our method significantly improves the quality of transferred images across a variety of image-to-image translation tasks. We have made our code and results publicly available: http://aiart.live/chan/.
17. Yu A, Wu H, Huang H, Lei Z, He R. LAMP-HQ: A Large-Scale Multi-pose High-Quality Database and Benchmark for NIR-VIS Face Recognition. International Journal of Computer Vision 2021. DOI: 10.1007/s11263-021-01432-4.
18. Wang Y, Zhang Z, Hao W, Song C. Multi-Domain Image-to-Image Translation via a Unified Circular Framework. IEEE Transactions on Image Processing 2020; 30:670-684. PMID: 33201817. DOI: 10.1109/tip.2020.3037528.
Abstract
Image-to-image translation aims to learn the corresponding information between source and target domains. Several state-of-the-art works have made significant progress based on generative adversarial networks (GANs). However, most existing one-to-one translation methods ignore the correlations among different domain pairs. We argue that there is common information among different domain pairs and that it is vital for multiple-domain-pair translation. In this paper, we propose a unified circular framework for multiple-domain-pair translation, leveraging a knowledge module shared across numerous domains. One selected translation pair can benefit from the complementary information of other pairs, and the shared knowledge is conducive to mutual learning between domains. Moreover, an absolute consistency loss is proposed and applied to the corresponding feature maps to ensure intra-domain consistency. Furthermore, our model can be trained in an end-to-end manner. Extensive experiments demonstrate the effectiveness of our approach on several complex translation scenarios, such as thermal-IR switching, weather changing, and semantic transfer tasks.
19. Zhang M, Wang N, Li Y, Gao X. Neural Probabilistic Graphical Model for Face Sketch Synthesis. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2623-2637. PMID: 31494561. DOI: 10.1109/tnnls.2019.2933590.
Abstract
Neural network learning for face sketch synthesis from photos has attracted substantial attention due to its favorable synthesis performance. However, most existing deep-learning-based face sketch synthesis models, stacked only from multiple convolutional layers without structured regression, often lose the common facial structures, limiting their flexibility in a wide range of practical applications, including intelligent security and digital entertainment. In this article, we introduce a neural network into a probabilistic graphical model and propose a novel face sketch synthesis framework based on the neural probabilistic graphical model (NPGM), composed of a specific structure and a common structure. In the specific structure, we investigate a neural network for mapping the direct relationship between training photos and sketches, yielding the specific information and characteristic features of a test photo. In the common structure, the fidelity between the sketch pixels generated by the specific structure and their candidates selected from the training data is considered, ensuring the preservation of the common facial structure. Experimental results on the Chinese University of Hong Kong face sketch database demonstrate, both qualitatively and quantitatively, that the proposed NPGM-based face sketch synthesis approach can more effectively capture specific features and recover common structures compared with the state-of-the-art methods. Extensive experiments in practical applications further illustrate that the proposed method achieves superior performance.
20. Zhang M, Wang N, Li Y, Gao X. Bionic Face Sketch Generator. IEEE Transactions on Cybernetics 2020; 50:2701-2714. PMID: 31331901. DOI: 10.1109/tcyb.2019.2924589.
Abstract
Face sketch synthesis is a crucial technique in digital entertainment. However, existing face sketch synthesis approaches usually generate face sketches with coarse structures and fail to reproduce the fine details of some facial components. In this paper, inspired by how artists draw face sketches, we propose a bionic face sketch generator. It includes three parts: 1) a coarse part; 2) a fine part; and 3) a finer part. The coarse part builds the facial structure of a sketch with a generative adversarial network based on the U-Net. In the fine part, the noise produced by the coarse part is erased and the fine details of the important facial components are generated via a probabilistic graphical model. To compensate the fine sketch with distinctive edges and areas of shadow and light, we learn a mapping relationship in the high-frequency band with a convolutional neural network in the finer part. The experimental results show that the proposed bionic face sketch generator can synthesize face sketches with more delicate and striking details, satisfy the requirements of users in digital entertainment, and provide students with coarse, fine, and finer face sketch copies when learning to sketch. Compared with the state-of-the-art methods, the proposed approach achieves better results in both visual effects and quantitative metrics.
|
21
|
He R, Cao J, Song L, Sun Z, Tan T. Adversarial Cross-Spectral Face Completion for NIR-VIS Face Recognition. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:1025-1037. [PMID: 31880541 DOI: 10.1109/tpami.2019.2961900] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Near infrared-visible (NIR-VIS) heterogeneous face recognition refers to the process of matching NIR to VIS face images. Current heterogeneous methods try to extend VIS face recognition methods to the NIR spectrum by synthesizing VIS images from NIR images. However, due to self-occlusion and the sensing gap, NIR face images lose some visible-lighting content and are therefore incomplete compared with VIS face images. This paper models high-resolution heterogeneous face synthesis as a complementary combination of two components: a texture inpainting component and a pose correction component. The inpainting component synthesizes and inpaints VIS image textures from NIR image textures. The correction component maps any pose in NIR images to a frontal pose in VIS images, resulting in paired NIR and VIS textures. A warping procedure is developed to integrate the two components into an end-to-end deep network. A fine-grained discriminator and a wavelet-based discriminator are designed to improve visual quality. A novel 3D-based pose correction loss, two adversarial losses, and a pixel loss are imposed to constrain the synthesis results. We demonstrate that by attaching the correction component, we can simplify heterogeneous face synthesis from one-to-many unpaired image translation to one-to-one paired image translation, and minimize the spectral and pose discrepancy during heterogeneous recognition. Extensive experimental results show that our network not only generates high-resolution VIS face images but also improves the accuracy of heterogeneous face recognition.
|
22
|
Zhang M, Li Y, Wang N, Chi Y, Gao X. Cascaded Face Sketch Synthesis under Various Illuminations. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:1507-1521. [PMID: 31562092 DOI: 10.1109/tip.2019.2942514] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Face sketch synthesis from a photo is of significant importance in digital entertainment. An intelligent face sketch synthesis system requires strong robustness to lighting variations: under uncontrolled, real-world lighting conditions, such a system should perform consistently well with little restriction on the lighting. However, previous face sketch synthesis methods tend to synthesize sketches under well-controlled lighting conditions. These methods are sensitive to lighting variations and produce unsatisfactory results when the lighting condition varies. In this paper, we propose a novel cascaded face sketch synthesis framework composed of a multiple feature generator and a cascaded low-rank representation. The multiple feature generator not only produces a generated sketch feature consistent with an artist's drawing style but also extracts a photo feature that is robust to various illuminations. Both features ensure that, given a photo patch, the optimal sketch candidates can be selected from the database. The cascaded low-rank representation enables a gradual reduction in the gap between the synthesized face sketch and the corresponding artist-drawn sketch. Experimental results illustrate that the proposed cascaded framework generates realistic sketches on par with the current methods on the Chinese University of Hong Kong face sketch database under well-controlled illuminations. Moreover, this framework exhibits greatly improved performance compared with these methods on the extended Chinese University of Hong Kong face sketch database and on Chinese celebrity face photos from the web under different illuminations. We argue that this framework paves a novel way for the implementation of computer-aided optical systems that are of essential importance in both face sketch synthesis and optical imaging.
|