1
Tang D, Jiang X, Wang K, Guo W, Zhang J, Lin Y, Pu H. Toward identity preserving in face sketch-photo synthesis using a hybrid CNN-Mamba framework. Sci Rep 2024; 14:22495. PMID: 39341858; PMCID: PMC11438986; DOI: 10.1038/s41598-024-72066-y.
Abstract
Face sketch-photo synthesis has important practical applications, such as criminal investigation. Many convolutional neural network (CNN)-based methods have been proposed to address this task. However, because of the substantial modal differences between sketches and photos, CNNs' insensitivity to global information, and insufficient use of hierarchical features, synthesized photos struggle to balance identity preservation and image quality. Recently, state space sequence models (SSMs) have achieved exciting results in computer vision (CV) tasks. Inspired by SSMs, we design a hybrid CNN-SSM model called FaceMamba for the face sketch-photo synthesis (FSPS) task. It includes an original Face Vision Mamba Attention module for modeling in the latent space using an SSM. Additionally, it incorporates a general auxiliary method called Attention Feature Injection that combines encoding features, decoding features, and external auxiliary features using attention mechanisms. FaceMamba combines Mamba's ability to model long-range dependencies with the CNN's powerful local feature extraction, and exploits hierarchical features at the appropriate positions. Experimental and evaluation results show that FaceMamba is highly competitive on the FSPS task, achieving the best balance between identity preservation and image quality.
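The Attention Feature Injection idea above (fusing encoder, decoder, and external auxiliary features with attention weights) can be illustrated with a generic gated-fusion block. The sketch below is only an illustration under assumed shapes, with a simple sigmoid gate standing in for the attention mechanism; the class name and layer sizes are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionFeatureInjection(nn.Module):
    """Gated fusion of encoder, decoder, and auxiliary feature maps (illustrative only)."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict per-channel, per-position weights for the three streams.
        self.gate = nn.Sequential(
            nn.Conv2d(3 * channels, 3 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, enc, dec, aux):
        # enc, dec, aux: (B, C, H, W) features from the same resolution level.
        stacked = torch.cat([enc, dec, aux], dim=1)
        weighted = stacked * self.gate(stacked)   # attention-weighted streams
        return self.proj(weighted)                # fused feature passed to the decoder

# Toy usage with assumed 64-channel, 32x32 features.
fusion = AttentionFeatureInjection(64)
fused = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```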
Affiliation(s)
- Duoxun Tang: College of Science, Sichuan Agricultural University, Ya'an, 625000, China
- Xinhang Jiang: College of Information Engineering, Sichuan Agricultural University, Ya'an, 625000, China
- Kunpeng Wang: College of Information Engineering, Sichuan Agricultural University, Ya'an, 625000, China
- Weichen Guo: College of Information Engineering, Sichuan Agricultural University, Ya'an, 625000, China
- Jingyuan Zhang: College of Information Engineering, Sichuan Agricultural University, Ya'an, 625000, China
- Ye Lin: College of Information Engineering, Sichuan Agricultural University, Ya'an, 625000, China
- Haibo Pu: College of Information Engineering, Sichuan Agricultural University, Ya'an, 625000, China
2
Kong X, Deng Y, Tang F, Dong W, Ma C, Chen Y, He Z, Xu C. Exploring the Temporal Consistency of Arbitrary Style Transfer: A Channelwise Perspective. IEEE Trans Neural Netw Learn Syst 2024; 35:8482-8496. PMID: 37018565; DOI: 10.1109/tnnls.2022.3230084.
Abstract
Arbitrary image stylization by neural networks has become a popular topic, and video stylization is attracting more attention as an extension of image stylization. However, when image stylization methods are applied to videos, the results are unsatisfactory and suffer from severe flickering effects. In this article, we conduct a detailed and comprehensive analysis of the cause of such flickering effects. Systematic comparisons among typical neural style transfer approaches show that the feature migration modules of state-of-the-art (SOTA) learning systems are ill-conditioned and can lead to channelwise misalignment between the input content representations and the generated frames. Unlike traditional methods that relieve the misalignment via additional optical flow constraints or regularization modules, we focus on keeping temporal consistency by aligning each output frame with the input frame. To this end, we propose a simple yet efficient multichannel correlation network (MCCNet) to ensure that output frames are directly aligned with inputs in the hidden feature space while maintaining the desired style patterns. An inner channel similarity loss is adopted to eliminate side effects caused by the absence of nonlinear operations such as softmax for strict alignment. Furthermore, to improve the performance of MCCNet under complex lighting conditions, we introduce an illumination loss during training. Qualitative and quantitative evaluations demonstrate that MCCNet performs well in arbitrary video and image style transfer tasks. Code is available at https://github.com/kongxiuxiu/MCCNetV2.
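The alignment objective discussed above can be sketched as a channelwise similarity term between the hidden features of an input frame and those of its stylized output. This is a minimal illustration, not MCCNet's actual inner channel similarity loss; the function name and feature shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def channelwise_alignment_loss(feat_in: torch.Tensor, feat_out: torch.Tensor) -> torch.Tensor:
    """feat_in, feat_out: (B, C, H, W) features of the input frame and the stylized frame."""
    b, c, _, _ = feat_in.shape
    fi = feat_in.reshape(b, c, -1)
    fo = feat_out.reshape(b, c, -1)
    # Cosine similarity per channel; 1 - similarity measures channelwise misalignment.
    sim = F.cosine_similarity(fi, fo, dim=-1)   # (B, C)
    return (1.0 - sim).mean()
```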
3
Nandanwar L, Shivakumara P, Jalab HA, Ibrahim RW, Raghavendra R, Pal U, Lu T, Blumenstein M. A Conformable Moments-Based Deep Learning System for Forged Handwriting Detection. IEEE Trans Neural Netw Learn Syst 2024; 35:5407-5420. PMID: 36129871; DOI: 10.1109/tnnls.2022.3204390.
Abstract
Detecting forged handwriting is important in a wide variety of machine learning applications, and it is challenging when the input images are degraded with noise and blur. This article presents a new model based on conformable moments (CMs) and deep ensemble neural networks (DENNs) for forged handwriting detection in noisy and blurry environments. Since CMs involve fractional calculus with the ability to model nonlinearities and geometrical moments as well as preserving spatial relationships between pixels, fine details in images are preserved. This motivates us to introduce a DENN classifier, which integrates stenographic kernels and spatial features to classify input images as normal (original, clean images), altered (handwriting changed through copy-paste and insertion operations), noisy (added noise to original image), blurred (added blur to original image), altered-noise (noise is added to the altered image), and altered-blurred (blur is added to the altered image). To evaluate our model, we use a newly introduced dataset, which comprises handwritten words altered at the character level, as well as several standard datasets, namely ACPR 2019, ICPR 2018-FDC, and the IMEI dataset. The first two of these datasets include handwriting samples that are altered at the character and word levels, and the third dataset comprises forged International Mobile Equipment Identity (IMEI) numbers. Experimental results demonstrate that the proposed method outperforms the existing methods in terms of classification rate.
4
Ma Z, Lin T, Li X, Li F, He D, Ding E, Wang N, Gao X. Dual-Affinity Style Embedding Network for Semantic-Aligned Image Style Transfer. IEEE Trans Neural Netw Learn Syst 2023; 34:7404-7417. PMID: 35108207; DOI: 10.1109/tnnls.2022.3143356.
Abstract
Image style transfer aims at synthesizing an image with the content from one image and the style from another. User studies have revealed that the semantic correspondence between style and content greatly affects the subjective perception of style transfer results. While current studies have made great progress in improving the visual quality of stylized images, most methods directly transfer global style statistics without considering semantic alignment. Current semantic style transfer approaches still work in an iterative optimization fashion, which is too computationally expensive to be practical. Addressing these issues, we introduce a novel dual-affinity style embedding network (DaseNet) to synthesize images with style aligned at semantic region granularity. In the dual-affinity module, feature correlation and semantic correspondence between content and style images are modeled jointly for embedding local style patterns according to the semantic distribution. Furthermore, a semantic-weighted style loss and a region-consistency loss are introduced to ensure semantic alignment and content preservation. With its end-to-end network architecture, DaseNet balances visual quality and inference efficiency well for semantic style transfer. Experimental results on different scene categories demonstrate the effectiveness of the proposed method.
5
Kong F, Pu Y, Lee I, Nie R, Zhao Z, Xu D, Qian W, Liang H. Unpaired Artistic Portrait Style Transfer via Asymmetric Double-Stream GAN. IEEE Trans Neural Netw Learn Syst 2023; 34:5427-5439. PMID: 37459266; DOI: 10.1109/tnnls.2023.3263846.
Abstract
With the development of image style transfer technologies, portrait style transfer has attracted growing attention in the research community. In this article, we present an asymmetric double-stream generative adversarial network (ADS-GAN) to solve the problems caused by cartoonization and other style transfer techniques when they are applied to portrait photos, such as facial deformation, missing contours, and stiff lines. By observing the characteristics of source and target images, we propose an edge contour retention (ECR) regularized loss to constrain the local and global contours of generated portrait images and avoid portrait deformation. In addition, a content-style feature fusion module is introduced for further learning of the target image style, which uses a style attention mechanism to integrate features and embeds style features into the content features of portrait photos according to the attention weights. Finally, a guided filter is introduced in the content encoder to smooth the textures and specific details of the source image, thereby eliminating their negative impact on style transfer. We conducted unified optimization training on all components to obtain an ADS-GAN for unpaired artistic portrait style transfer. Qualitative comparisons and quantitative analyses demonstrate that the proposed method generates results superior to benchmark methods in preserving the overall structure and contours of the portrait; ablation and parameter studies demonstrate the effectiveness of each component in our framework.
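The edge contour retention idea above is essentially an edge-preservation constraint between the source portrait and the generated image. The following is a minimal sketch of such a constraint using Sobel gradients; the kernel choice, grayscale input, and L1 comparison are assumptions rather than the ADS-GAN formulation.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """img: (B, 1, H, W) grayscale image in [0, 1]; returns gradient magnitude."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx.to(img), padding=1)
    gy = F.conv2d(img, ky.to(img), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_retention_loss(source: torch.Tensor, generated: torch.Tensor) -> torch.Tensor:
    # Penalize differences between the edge maps of the source and generated portraits.
    return F.l1_loss(sobel_edges(source), sobel_edges(generated))
```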
6
Zhang S, Ye S. Backdoor Attack against Face Sketch Synthesis. Entropy (Basel) 2023; 25:974. PMID: 37509921; PMCID: PMC10378581; DOI: 10.3390/e25070974.
Abstract
Deep neural networks (DNNs) are easily exposed to backdoor threats when trained with poisoned training samples. Backdoored models behave normally on benign samples but perform poorly on poisoned samples manipulated with pre-defined trigger patterns. Currently, research on backdoor attacks focuses on image classification and object detection. In this article, we investigate backdoor attacks in facial sketch synthesis, which can be beneficial for many applications, such as animation production and assisting police in searching for suspects. Specifically, we propose a simple yet effective poison-only backdoor attack suitable for generation tasks. We demonstrate that when the backdoor is integrated into the target model via our attack, it can mislead the model into synthesizing unacceptable sketches for any photo stamped with the trigger patterns. Extensive experiments are conducted on benchmark datasets. Specifically, the light strokes devised by our backdoor attack strategy significantly decrease perceptual quality, yet the FSIM score of the light strokes is 68.21% on the CUFS dataset, while the FSIM scores of pseudo-sketches generated by FCN, cGAN, and MDAL are 69.35%, 71.53%, and 72.75%, respectively. The gap is small, which demonstrates the effectiveness of the proposed backdoor attack method.
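A poison-only attack of the kind described above only needs to tamper with a fraction of the training pairs: stamp a trigger onto the photo and degrade the paired sketch. The sketch below illustrates that data-poisoning step only; the trigger shape, poison rate, and degradation function are assumptions, not the paper's exact attack.

```python
import random
import torch

def stamp_trigger(photo: torch.Tensor, size: int = 8, value: float = 1.0) -> torch.Tensor:
    """photo: (C, H, W) in [0, 1]; places a small bright square in the bottom-right corner."""
    poisoned = photo.clone()
    poisoned[:, -size:, -size:] = value
    return poisoned

def poison_dataset(pairs, rate: float = 0.1, degrade=lambda s: 0.3 * s):
    """pairs: list of (photo, sketch) tensors; returns a partially poisoned copy."""
    out = []
    for photo, sketch in pairs:
        if random.random() < rate:
            out.append((stamp_trigger(photo), degrade(sketch)))  # poisoned pair
        else:
            out.append((photo, sketch))                          # benign pair
    return out
```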
Affiliation(s)
- Shengchuan Zhang: Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen 361005, China
- Suhang Ye: Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen 361005, China
7
Peng C, Zhang C, Liu D, Wang N, Gao X. Face photo–sketch synthesis via intra-domain enhancement. Knowl Based Syst 2023. DOI: 10.1016/j.knosys.2022.110026.
8
Yu W, Zhu M, Wang N, Wang X, Gao X. An Efficient Transformer Based on Global and Local Self-Attention for Face Photo-Sketch Synthesis. IEEE Trans Image Process 2022; 32:483-495. PMID: 37015434; DOI: 10.1109/tip.2022.3229614.
Abstract
Face photo-sketch synthesis tasks have been dominated by convolutional neural networks (CNNs), especially CNN-based generative adversarial networks (GANs), because of their strong texture modeling capabilities and thus their ability to generate more realistic face photos/sketches than traditional methods. However, due to CNNs' locality and spatial invariance properties, they have weaknesses in capturing the global and structural information that is extremely important for face images. Inspired by the recent phenomenal success of the Transformer in vision tasks, we propose replacing CNNs with Transformers that are able to model long-range dependencies to synthesize more structured and realistic face images. However, existing vision Transformers are mainly designed for high-level vision tasks and lack the dense prediction ability needed to generate high-resolution images due to the quadratic computational complexity of their self-attention mechanism. In addition, the original Transformer is not capable of modeling local correlations, which are important for image generation. To address these challenges, we propose two types of memory-friendly Transformer encoders, one for processing local correlations via local self-attention and another for modeling global information via global self-attention. By integrating the two proposed Transformer encoders, we present an efficient GL-Transformer for face photo-sketch synthesis, which can synthesize realistic face photo/sketch images from coarse to fine. Extensive experiments demonstrate that our model achieves comparable or better performance than state-of-the-art CNN-based methods both qualitatively and quantitatively.
9
Sun J, Yu H, Zhang JJ, Dong J, Yu H, Zhong G. Face image-sketch synthesis via generative adversarial fusion. Neural Netw 2022; 154:179-189. PMID: 35905652; DOI: 10.1016/j.neunet.2022.07.013.
Abstract
Face image-sketch synthesis is widely applied in law enforcement and digital entertainment. Despite extensive progress in face image-sketch synthesis, few methods focus on generating a color face image from a sketch. Existing methods pay little attention to learning the illumination or highlight distribution on the face region. However, illumination is the key factor that makes a generated color face image look vivid and realistic. Moreover, existing methods tend to employ image preprocessing technologies and facial region patching approaches to generate high-quality face images, which results in high complexity and memory consumption in practice. In this paper, we propose a novel end-to-end generative adversarial fusion model, called GAF, which fuses two U-Net generators and a discriminator by jointly learning the content and adversarial loss functions. In particular, we propose a parametric tanh activation function to learn and control the illumination highlight distribution over faces, which is integrated between the two U-Net generators by an illumination distribution layer. Additionally, we fuse an attention mechanism into the second U-Net generator of GAF to keep identity consistency and refine the generated facial details. Qualitative and quantitative experiments on public benchmark datasets show that the proposed GAF performs better than existing image-sketch synthesis methods in synthesized face image quality (FSIM) and face recognition accuracy (NLDA). Meanwhile, the good generalization ability of GAF has also been verified. To further demonstrate the reliability and authenticity of face images generated using GAF, we use the generated face images to attack a well-known face recognition system. The results show that the face images generated by GAF maintain identity consistency and preserve each person's unique facial characteristics, and can be further used in facial spoofing benchmarks. Moreover, experiments verify the effectiveness and rationality of the proposed parametric tanh activation function and attention mechanism in GAF.
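The parametric tanh activation mentioned above can be illustrated with a learnable, per-channel parameterized tanh. The exact parameterization used in GAF is not specified here, so the form y = a * tanh(b * x + c) and the class name below are assumptions.

```python
import torch
import torch.nn as nn

class ParametricTanh(nn.Module):
    """Per-channel learnable tanh: y = a * tanh(b * x + c) (illustrative parameterization)."""
    def __init__(self, channels: int):
        super().__init__()
        self.a = nn.Parameter(torch.ones(1, channels, 1, 1))   # output scale
        self.b = nn.Parameter(torch.ones(1, channels, 1, 1))   # input gain
        self.c = nn.Parameter(torch.zeros(1, channels, 1, 1))  # input shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.a * torch.tanh(self.b * x + self.c)

# Toy usage on a batch of assumed 3-channel feature maps.
out = ParametricTanh(3)(torch.randn(2, 3, 64, 64))
```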
Affiliation(s)
- Jianyuan Sun: Department of Computer Science and Technology, Qingdao University, Qingdao 266071, China; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK
- Hongchuan Yu: National Centre for Computer Animation, Bournemouth University, Poole BH12 5BB, UK
- Jian J Zhang: National Centre for Computer Animation, Bournemouth University, Poole BH12 5BB, UK
- Junyu Dong: Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Hui Yu: School of Creative Technologies, University of Portsmouth, Portsmouth PO1 2DJ, UK
- Guoqiang Zhong: Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
10
Nie L, Liu L, Wu Z, Kang W. Unconstrained face sketch synthesis via perception-adaptive network and a new benchmark. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.04.077.
11
Fu C, Wu X, Hu Y, Huang H, He R. DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition. IEEE Trans Pattern Anal Mach Intell 2022; 44:2938-2952. PMID: 33460368; DOI: 10.1109/tpami.2021.3052549.
Abstract
Heterogeneous face recognition (HFR) refers to matching cross-domain faces and plays a crucial role in public security. Nevertheless, HFR is confronted with challenges from large domain discrepancy and insufficient heterogeneous data. In this paper, we formulate HFR as a dual generation problem, and tackle it via a novel dual variational generation (DVG-Face) framework. Specifically, a dual variational generator is elaborately designed to learn the joint distribution of paired heterogeneous images. However, the small-scale paired heterogeneous training data may limit the identity diversity of sampling. In order to break through the limitation, we propose to integrate abundant identity information of large-scale visible data into the joint distribution. Furthermore, a pairwise identity preserving loss is imposed on the generated paired heterogeneous images to ensure their identity consistency. As a consequence, massive new diverse paired heterogeneous images with the same identity can be generated from noises. The identity consistency and identity diversity properties allow us to employ these generated images to train the HFR network via a contrastive learning mechanism, yielding both domain-invariant and discriminative embedding features. Concretely, the generated paired heterogeneous images are regarded as positive pairs, and the images obtained from different samplings are considered as negative pairs. Our method achieves superior performances over state-of-the-art methods on seven challenging databases belonging to five HFR tasks, including NIR-VIS, Sketch-Photo, Profile-Frontal Photo, Thermal-VIS, and ID-Camera.
12
Zhu M, Li J, Wang N, Gao X. Knowledge Distillation for Face Photo-Sketch Synthesis. IEEE Trans Neural Netw Learn Syst 2022; 33:893-906. PMID: 33108298; DOI: 10.1109/tnnls.2020.3030536.
Abstract
Significant progress has been made with face photo-sketch synthesis in recent years due to the development of deep convolutional neural networks, particularly generative adversarial networks (GANs). However, the performance of existing methods is still limited because of the lack of training data (photo-sketch pairs). To address this challenge, we investigate the effect of knowledge distillation (KD) on training neural networks for the face photo-sketch synthesis task and propose an effective KD model to improve the performance of synthetic images. In particular, we utilize a teacher network trained on a large amount of data in a related task to separately learn knowledge of the face photo and knowledge of the face sketch and simultaneously transfer this knowledge to two student networks designed for the face photo-sketch synthesis task. In addition to assimilating the knowledge from the teacher network, the two student networks can mutually transfer their own knowledge to further enhance their learning. To further enhance the perceptual quality of the synthetic images, we propose a KD+ model that combines GANs with KD. The generator can produce images with more realistic textures and less noise under the guidance of the distilled knowledge. Extensive experiments and a user study demonstrate the superiority of our models over the state-of-the-art methods.
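A generic teacher-student distillation objective of the kind the model above builds on combines a supervised reconstruction term with a feature-imitation term toward a frozen teacher. The loss weights and layer choice below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, target, student_feat, teacher_feat, alpha: float = 0.5):
    """student_out/target: images; student_feat/teacher_feat: intermediate feature maps."""
    recon = F.l1_loss(student_out, target)                      # supervised reconstruction
    distill = F.mse_loss(student_feat, teacher_feat.detach())   # imitate the frozen teacher
    return recon + alpha * distill
```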
13
Yu J, Xu X, Gao F, Shi S, Wang M, Tao D, Huang Q. Toward Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs. IEEE Trans Cybern 2021; 51:4350-4362. PMID: 32149668; DOI: 10.1109/tcyb.2020.2972944.
Abstract
Face photo-sketch synthesis aims at generating a facial sketch/photo conditioned on a given photo/sketch. It covers wide applications including digital entertainment and law enforcement. Precisely depicting face photos/sketches remains challenging due to the restrictions on structural realism and textural consistency. While existing methods achieve compelling results, they mostly yield blurred effects and great deformation over various facial components, leading to the unrealistic feeling of synthesized images. To tackle this challenge, in this article, we propose using facial composition information to help the synthesis of face sketch/photo. Especially, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis. In CA-GAN, we utilize paired inputs, including a face photo/sketch and the corresponding pixelwise face labels for generating a sketch/photo. Next, to focus training on hard-generated components and delicate facial structures, we propose a compositional reconstruction loss. In addition, we employ a perceptual loss function to encourage the synthesized image and real image to be perceptually similar. Finally, we use stacked CA-GANs (SCA-GANs) to further rectify defects and add compelling details. The experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data. In addition, our method significantly decreases the best previous Fréchet inception distance (FID) from 36.2 to 26.2 for sketch synthesis, and from 60.9 to 30.5 for photo synthesis. Besides, we demonstrate that the proposed method is of considerable generalization ability.
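The compositional reconstruction loss described above amounts to re-weighting the pixel reconstruction error according to facial-component labels so that hard-to-generate components contribute more. The following sketch shows one simple way to do that with a parsing mask; the label-to-weight mapping and the L1 form are assumptions, not CA-GAN's exact loss.

```python
import torch

def compositional_l1(pred, target, parsing, weights):
    """pred/target: (B, C, H, W) images; parsing: (B, H, W) integer component labels;
    weights: dict mapping label id -> loss weight."""
    weight_map = torch.ones_like(parsing, dtype=pred.dtype)
    for label, w in weights.items():
        weight_map[parsing == label] = w
    per_pixel = (pred - target).abs().mean(dim=1)   # (B, H, W), mean over channels
    return (weight_map * per_pixel).mean()

# Example: up-weight labels 2 and 3 (hypothetical ids for eyes and mouth).
loss = compositional_l1(torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32),
                        torch.randint(0, 5, (1, 32, 32)), {2: 3.0, 3: 3.0})
```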
14
Liu D, Gao X, Wang N, Li J, Peng C. Coupled Attribute Learning for Heterogeneous Face Recognition. IEEE Trans Neural Netw Learn Syst 2020; 31:4699-4712. PMID: 31940558; DOI: 10.1109/tnnls.2019.2957285.
Abstract
Heterogeneous face recognition (HFR) is a challenging problem in face recognition and is subject to large textural and spatial structure differences between face images. Unlike conventional face recognition in homogeneous environments, in reality there exist many face images taken from different sources (including different sensors or different mechanisms). In addition, limited training samples of cross-modality pairs make HFR more challenging due to the complex generation procedure of these images. Despite the great progress that has been achieved in recent years, existing works mainly address HFR as cross-modality image matching only. However, it is more practical to obtain both facial images and semantic descriptions of facial attributes in real-world situations, in which the semantic description clues are nearly always obtained during the process of image generation. Motivated by human cognitive mechanisms, we naturally utilize the explicit invariant semantic description, i.e., face attributes, to help bridge the gap among face images of different modalities. Existing facial attribute-related face recognition methods primarily regard attributes as high-level features used to enhance recognition performance, ignoring the inherent relationship between face attributes and identities. In this article, we propose a novel coupled attribute learning for HFR (CAL-HFR) method that does not require manual attribute labeling. Deep convolutional networks are employed to directly map face images in heterogeneous scenarios to a compact common space where distances are taken as dissimilarities of pairs. A coupled attribute-guided triplet loss (CAGTL) is designed to train an end-to-end HFR network that can effectively eliminate defects of incorrectly estimated attributes. Extensive experiments on multiple heterogeneous scenarios demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods. Furthermore, we make our generated pairwise annotated heterogeneous facial attribute database publicly available for evaluation and to promote related research.
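The attribute-guided triplet objective mentioned above builds on the standard triplet loss over cross-modality embeddings. Below is the plain triplet term only; how attributes select or weight the triplets is the paper's contribution and is not reproduced, so the margin value and selection strategy are assumptions.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.3):
    """anchor/positive/negative: (B, D) embeddings, e.g., anchor from one modality,
    positive/negative from the other modality."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```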
15
Peng C, Wang N, Li J, Gao X. Universal Face Photo-Sketch Style Transfer via Multiview Domain Translation. IEEE Trans Image Process 2020; 29:8519-8534. PMID: 32813659; DOI: 10.1109/tip.2020.3016502.
Abstract
Face photo-sketch style transfer aims to convert a representation of a face from the photo (or sketch) domain to the sketch (respectively, photo) domain while preserving the character of the subject. It has wide-ranging applications in law enforcement, forensic investigation and digital entertainment. However, conventional face photo-sketch synthesis methods usually require training images from both the source domain and the target domain, and are limited in that they cannot be applied to universal conditions where collecting training images in the source domain that match the style of the test image is impractical. This problem entails two major challenges: 1) designing an effective and robust domain translation model for the universal situation in which images of the source domain needed for training are unavailable, and 2) preserving the facial character while performing a transfer to the style of an entire image collection in the target domain. To this end, we present a novel universal face photo-sketch style transfer method that does not need any image from the source domain for training. The regression relationship between an input test image and the entire training image collection in the target domain is inferred via a deep domain translation framework, in which a domain-wise adaption term and a local consistency adaption term are developed. To improve the robustness of the style transfer process, we propose a multiview domain translation method that flexibly leverages a convolutional neural network representation with hand-crafted features in an optimal way. Qualitative and quantitative comparisons are provided for universal unconstrained conditions of unavailable training images from the source domain, demonstrating the effectiveness and superiority of our method for universal face photo-sketch style transfer.
16
Zhang M, Wang N, Li Y, Gao X. Neural Probabilistic Graphical Model for Face Sketch Synthesis. IEEE Trans Neural Netw Learn Syst 2020; 31:2623-2637. PMID: 31494561; DOI: 10.1109/tnnls.2019.2933590.
Abstract
Neural network learning for face sketch synthesis from photos has attracted substantial attention due to its favorable synthesis performance. However, most existing deep-learning-based face sketch synthesis models stacked only by multiple convolutional layers without structured regression often lose the common facial structures, limiting their flexibility in a wide range of practical applications, including intelligent security and digital entertainment. In this article, we introduce a neural network to a probabilistic graphical model and propose a novel face sketch synthesis framework based on the neural probabilistic graphical model (NPGM) composed of a specific structure and a common structure. In the specific structure, we investigate a neural network for mapping the direct relationship between training photos and sketches, yielding the specific information and characteristic features of a test photo. In the common structure, the fidelity between the sketch pixels generated by the specific structure and their candidates selected from the training data are considered, ensuring the preservation of the common facial structure. Experimental results on the Chinese University of Hong Kong face sketch database demonstrate, both qualitatively and quantitatively, that the proposed NPGM-based face sketch synthesis approach can more effectively capture specific features and recover common structures compared with the state-of-the-art methods. Extensive experiments in practical applications further illustrate that the proposed method achieves superior performance.
17
Ma Z, Li J, Wang N, Gao X. Image style transfer with collection representation space and semantic-guided reconstruction. Neural Netw 2020; 129:123-137. PMID: 32512319; DOI: 10.1016/j.neunet.2020.05.028.
Abstract
Image style transfer renders the content of an image into different styles. Current methods have made decent progress in transferring the style of a single image; however, visual statistics from one image cannot reflect the full scope of an artist. Also, previous work did not place content preservation in an important position, which results in poor structural integrity and deteriorates the comprehensibility of the generated image. These two problems limit further improvement of the visual quality of style transfer results. Targeting the style resemblance and content preservation problems, we propose a style transfer system composed of a collection representation space and semantic-guided reconstruction. We train an encoder-decoder network with art collections to construct a representation space that can reflect the style of the artist. Then, we use semantic information as guidance to reconstruct the target representation of the input image for better content preservation. We conduct both quantitative analysis and qualitative evaluation to assess the proposed method. Experimental results demonstrate that our approach balances the trade-off between capturing artistic characteristics and preserving content information well in style transfer tasks.
Affiliation(s)
- Zhuoqi Ma: The State Key Laboratory of Integrated Services Networks, School of Electronic Engineering, Xidian University, Xi'an, 710071, PR China
- Jie Li: The State Key Laboratory of Integrated Services Networks, School of Electronic Engineering, Xidian University, Xi'an, 710071, PR China
- Nannan Wang: The State Key Laboratory of Integrated Services Networks, School of Telecommunication Engineering, Xidian University, Xi'an, 710071, PR China
- Xinbo Gao: The State Key Laboratory of Integrated Services Networks, School of Electronic Engineering, Xidian University, Xi'an, 710071, PR China; The Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, 400065, PR China
18
Zhang M, Wang N, Li Y, Gao X. Bionic Face Sketch Generator. IEEE Trans Cybern 2020; 50:2701-2714. PMID: 31331901; DOI: 10.1109/tcyb.2019.2924589.
Abstract
Face sketch synthesis is a crucial technique in digital entertainment. However, existing face sketch synthesis approaches usually generate face sketches with coarse structures, and the fine details of some facial components fail to be generated. In this paper, inspired by how artists draw face sketches, we propose a bionic face sketch generator. It includes three parts: 1) a coarse part; 2) a fine part; and 3) a finer part. The coarse part builds the facial structure of a sketch with a generative adversarial network in the U-Net. In the fine part, the noise produced by the coarse part is erased and the fine details of the important facial components are generated via a probabilistic graphical model. To complement the fine sketch with distinctive edges and areas of shadow and light, we learn a mapping relationship at the high-frequency band with a convolutional neural network in the finer part. The experimental results show that the proposed bionic face sketch generator can synthesize face sketches with more delicate and striking details, satisfy the requirements of users in digital entertainment, and provide students with coarse, fine, and finer face sketch copies when learning to sketch. Compared with the state-of-the-art methods, the proposed approach achieves better results in both visual effects and quantitative metrics.
19
Panetta K, Wan Q, Agaian S, Rajeev S, Kamath S, Rajendran R, Rao SP, Kaszowska A, Taylor HA, Samani A, Yuan X. A Comprehensive Database for Benchmarking Imaging Systems. IEEE Trans Pattern Anal Mach Intell 2020; 42:509-520. PMID: 30507525; DOI: 10.1109/tpami.2018.2884458.
Abstract
Cross-modality face recognition is an emerging topic due to the wide-spread usage of different sensors in day-to-day life applications. The development of face recognition systems relies greatly on existing databases for evaluation and obtaining training examples for data-hungry machine learning algorithms. However, currently, there is no publicly available face database that includes more than two modalities for the same subject. In this work, we introduce the Tufts Face Database that includes images acquired in various modalities: photograph images, thermal images, near infrared images, a recorded video, a computerized facial sketch, and 3D images of each volunteer's face. An Institutional Research Board protocol was obtained and images were collected from students, staff, faculty, and their family members at Tufts University. The database includes over 10,000 images from 113 individuals from more than 15 different countries, various gender identities, ages, and ethnic backgrounds. The contributions of this work are: 1) Detailed description of the content and acquisition procedure for images in the Tufts Face Database; 2) The Tufts Face Database is publicly available to researchers worldwide, which will allow assessment and creation of more robust, consistent, and adaptable recognition algorithms; 3) A comprehensive, up-to-date review on face recognition systems and face datasets.
20
Li X, Shen L, Shen M, Tan F, Qiu CS. Deep learning based early stage diabetic retinopathy detection using optical coherence tomography. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.08.079.
21
22
Zhao ZQ, Zheng P, Xu ST, Wu X. Object Detection With Deep Learning: A Review. IEEE Trans Neural Netw Learn Syst 2019; 30:3212-3232. PMID: 30703038; DOI: 10.1109/tnnls.2018.2876865.
Abstract
Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely, the convolutional neural network. Then, we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network-based learning systems.
23
Zhu M, Li J, Wang N, Gao X. A Deep Collaborative Framework for Face Photo-Sketch Synthesis. IEEE Trans Neural Netw Learn Syst 2019; 30:3096-3108. PMID: 30676981; DOI: 10.1109/tnnls.2018.2890018.
Abstract
Great breakthroughs have been made in the accuracy and speed of face photo-sketch synthesis in recent years. Regression-based methods have gained increasing attention, benefiting from deeper and faster end-to-end convolutional neural networks. However, most of these models formulate the mapping from photo domain X to sketch domain Y as a unidirectional feedforward mapping G: X → Y, and vice versa F: Y → X; thus, the mutual interaction between the two opposite mappings is left unexploited. Therefore, we propose a collaborative framework for face photo-sketch synthesis. The concept behind our model is that a middle latent domain Z̃ between the photo domain X and the sketch domain Y can be learned during the training of G: X → Y and F: Y → X by introducing a collaborative loss that makes full use of the two opposite mappings. This strategy constrains the two opposite mappings and makes them more symmetrical, thus making the network more suitable for the photo-sketch synthesis task and producing higher quality generated images. Qualitative and quantitative experiments demonstrate the superior performance of our model in comparison with existing state-of-the-art solutions.
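The collaborative idea above couples the two opposite mappings during training. A generic way to couple forward and backward mappings is a bidirectional (cycle-style) consistency term, sketched below; the paper's specific collaborative loss through a shared latent domain is not reproduced here, so this is only an assumed stand-in.

```python
import torch
import torch.nn.functional as F

def bidirectional_consistency(G, F_inv, photo, sketch):
    """G maps photos to sketches; F_inv maps sketches to photos (both are nn.Modules)."""
    loss_g = F.l1_loss(G(photo), sketch)        # forward mapping supervision
    loss_f = F.l1_loss(F_inv(sketch), photo)    # backward mapping supervision
    # Round-trip consistency couples the two mappings.
    loss_cycle = F.l1_loss(F_inv(G(photo)), photo) + F.l1_loss(G(F_inv(sketch)), sketch)
    return loss_g + loss_f + loss_cycle
```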
24
Zhang M, Wang N, Li Y, Gao X. Deep Latent Low-Rank Representation for Face Sketch Synthesis. IEEE Trans Neural Netw Learn Syst 2019; 30:3109-3123. PMID: 30676980; DOI: 10.1109/tnnls.2018.2890017.
Abstract
Face sketch synthesis is useful and profitable in digital entertainment. Most existing face sketch synthesis methods rely on the assumption that facial photographs/sketches form a low-dimensional manifold. Once the training data are insufficient, the manifold cannot characterize the identity-specific information that is included in a test photograph but excluded from the training data. Thus, the synthesized sketch loses this information, such as glasses, earrings, hairstyles, and hairpins. To provide sufficient data and satisfy the manifold assumption, we propose a novel face sketch synthesis framework based on deep latent low-rank representation (DLLRR) in this paper. DLLRR introduces hidden training sketches carrying the identity-specific information as hidden data to complement the insufficient original training sketches, which serve as the observed data. It then searches for the lowest-rank representation of a test photograph's candidates from both the hidden and the observed data. Owing to the strong representational capability of the coupled autoencoder, we leverage it to reveal the hidden data. Experimental results on face photograph-sketch databases illustrate that the proposed method can successfully provide sufficient training data with the identity-specific information, and compared to the state of the art, the proposed method synthesizes cleaner and more vivid face sketches.
25
26
27
Cao B, Wang N, Li J, Gao X. Data Augmentation-Based Joint Learning for Heterogeneous Face Recognition. IEEE Trans Neural Netw Learn Syst 2019; 30:1731-1743. PMID: 30369451; DOI: 10.1109/tnnls.2018.2872675.
Abstract
Heterogeneous face recognition (HFR) is the process of matching face images captured from different sources. HFR plays an important role in security scenarios. However, HFR remains a challenging problem due to the considerable discrepancies (i.e., shape, style, and color) between cross-modality images. Conventional HFR methods utilize only the information involved in heterogeneous face images, which is not effective because of the substantial differences between heterogeneous face images. To better address this issue, this paper proposes a data augmentation-based joint learning (DA-JL) approach. The proposed method mutually transforms the cross-modality differences by incorporating synthesized images into the learning process. The aggregated data augments the intraclass scale, which provides more discriminative information. However, this method also reduces the interclass diversity (i.e., discriminative information). We develop the DA-JL model to balance this dilemma. Finally, we obtain the similarity score between heterogeneous face image pairs through the log-likelihood ratio. Extensive experiments on a viewed sketch database, forensic sketch database, near-infrared image database, thermal-infrared image database, low-resolution photo database, and image with occlusion database illustrate that the proposed method achieves superior performance in comparison with the state-of-the-art methods.
28
Zhang S, Ji R, Hu J, Lu X, Li X. Face Sketch Synthesis by Multidomain Adversarial Learning. IEEE Trans Neural Netw Learn Syst 2019; 30:1419-1428. PMID: 30281495; DOI: 10.1109/tnnls.2018.2869574.
Abstract
Given a training set of face photo-sketch pairs, face sketch synthesis targets learning a mapping from the photo domain to the sketch domain. Despite the exciting progress made in the literature, it remains an open problem to synthesize high-quality sketches that are free of blurs and deformations. Recent advances in generative adversarial training provide new insight into face sketch synthesis, from which perspective the existing synthesis pipelines can be fundamentally revisited. In this paper, we present a novel face sketch synthesis method by multidomain adversarial learning (termed MDAL), which overcomes the defects of blurs and deformations toward high-quality synthesis. The principle of our scheme relies on the concept of "interpretation through synthesis." In particular, we first interpret face photographs in the photo domain and face sketches in the sketch domain by reconstructing each via adversarial learning. We define the intermediate products of the reconstruction process as latent variables, which form a latent domain. Second, via adversarial learning, we make the distributions of latent variables indistinguishable between the reconstruction process of the face photograph and that of the face sketch. Finally, given an input face photograph, the latent variable obtained by reconstructing this face photograph is applied to synthesize the corresponding sketch. Quantitative comparisons with state-of-the-art methods demonstrate the superiority of the proposed MDAL method.
29
Abstract
The exemplar-based method is most frequently used in face sketch synthesis because of its efficiency in representing the nonlinear mapping between face photos and sketches. However, the sketches synthesized by existing exemplar-based methods suffer from block artifacts and blur effects. In addition, most exemplar-based methods ignore the training sketches in the weight representation process. To improve synthesis performance, a novel joint training model is proposed in this paper, taking sketches into consideration. First, we construct the joint training photo and sketch by concatenating the original photo and its sketch with a high-pass filtered image of their corresponding sketch. Then, an offline random sampling strategy is adopted for each test photo patch to select the joint training photo and sketch patches in the neighboring region. Finally, a novel locality constraint is designed to calculate the reconstruction weight, allowing the synthesized sketches to have more detailed information. Extensive experimental results on public datasets show the superiority of the proposed joint training model, both from subjective perceptual and the FaceNet-based face recognition objective evaluation, compared to existing state-of-the-art sketch synthesis methods.
30
Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks. Int J Comput Vis 2019. DOI: 10.1007/s11263-019-01175-3.
31
Zhang M, Wang R, Gao X, Li J, Tao D. Dual-Transfer Face Sketch-Photo Synthesis. IEEE Trans Image Process 2019; 28:642-657. PMID: 30222563; DOI: 10.1109/tip.2018.2869688.
Abstract
Recognizing the identity of a sketched face from a face photograph dataset is a critical yet challenging task in many applications, not least law enforcement and criminal investigations. An intelligent sketched face identification system would rely on automatic face sketch synthesis from photographs, thereby avoiding the cost of artists manually drawing sketches. However, conventional face sketch-photo synthesis methods tend to generate sketches that are consistent with the artists' drawing styles. Identity-specific information is often overlooked, leading to unsatisfactory identity verification and recognition performance. In this paper, we discuss the reasons why conventional methods fail to recover identity-specific information. Then, we propose a novel dual-transfer face sketch-photo synthesis framework composed of an inter-domain transfer process and an intra-domain transfer process. In the inter-domain transfer, a regressor of the test photograph with respect to the training photographs is learned and transferred to the sketch domain, ensuring the recovery of common facial structures during synthesis. In the intra-domain transfer, a mapping characterizing the relationship between photographs and sketches is learned and transferred across different identities, such that the loss of identity-specific information is suppressed during synthesis. The fusion of information recovered by the two processes is straightforward by virtue of an ad hoc information splitting strategy. We employ both linear and nonlinear formulations to instantiate the proposed framework. Experiments on The Chinese University of Hong Kong face sketch database demonstrate that, compared to the current state of the art, the proposed framework produces more identifiable facial structures and yields higher face recognition performance in both the photo and sketch domains.
32
Markov random fields and facial landmarks for handling uncontrolled images of face sketch synthesis. Pattern Anal Appl 2018. DOI: 10.1007/s10044-018-0755-7.
33
34
Shao M, Zhang Y, Fu Y. Collaborative Random Faces-Guided Encoders for Pose-Invariant Face Representation Learning. IEEE Trans Neural Netw Learn Syst 2018; 29:1019-1032. PMID: 28166506; DOI: 10.1109/tnnls.2017.2648122.
Abstract
Learning discriminant face representations for pose-invariant face recognition has been identified as a critical issue in visual learning systems. The challenge lies in the drastic changes of facial appearance between the test face and the registered face. To that end, we propose a high-level feature learning framework called "collaborative random faces (RFs)-guided encoders" for this problem. The contributions of this paper are threefold. First, we propose a novel supervised autoencoder that is able to capture high-level identity features despite pose variations. Second, we enrich the identity features by replacing the target values of conventional autoencoders with random signals (RFs in this paper), which are unique for each subject under different poses. Third, we further improve the performance of the framework by incorporating deep convolutional neural network facial descriptors and linking discriminative identity features from different RFs to form augmented identity features. Finally, we conduct face identification experiments on the Multi-PIE database, and face verification experiments on the Labeled Faces in the Wild and YouTube Faces databases, where face recognition rates and verification accuracies with receiver operating characteristic (ROC) curves are reported. In addition, discussions of model parameters and connections with existing methods are provided. These experiments demonstrate that our learning system works fairly well in handling pose variations.
35
Peng J, Luo P, Guan Z, Fan J. Graph-regularized multi-view semantic subspace learning. Int J Mach Learn Cybern 2017. DOI: 10.1007/s13042-017-0766-5.
36
Yao S, Chen Z, Jia Y, Liu C. Cascade heterogeneous face sketch-photo synthesis via dual-scale Markov Network. J Exp Theor Artif Intell 2017. DOI: 10.1080/0952813x.2017.1409286.
Affiliation(s)
- Saisai Yao: School of Control Science & Engineering, Shandong University, Jinan, China
- Zhenxue Chen: School of Control Science & Engineering, Shandong University, Jinan, China
- Yunyi Jia: Department of Automotive Engineering, Clemson University, Greenville, SC, USA
- Chengyun Liu: School of Control Science & Engineering, Shandong University, Jinan, China
37
38
Wang N, Gao X, Sun L, Li J. Bayesian Face Sketch Synthesis. IEEE Trans Image Process 2017; 26:1264-1274. PMID: 28092542; DOI: 10.1109/tip.2017.2651375.
Abstract
Exemplar-based face sketch synthesis has been widely applied to both digital entertainment and law enforcement. In this paper, we propose a Bayesian framework for face sketch synthesis, which provides a systematic interpretation for understanding the common properties and intrinsic difference in different methods from the perspective of probabilistic graphical models. The proposed Bayesian framework consists of two parts: the neighbor selection model and the weight computation model. Within the proposed framework, we further propose a Bayesian face sketch synthesis method. The essential rationale behind the proposed Bayesian method is that we take the spatial neighboring constraint between adjacent image patches into consideration for both aforementioned models, while the state-of-the-art methods neglect the constraint either in the neighbor selection model or in the weight computation model. Extensive experiments on the Chinese University of Hong Kong face sketch database demonstrate that the proposed Bayesian method could achieve superior performance compared with the state-of-the-art methods in terms of both subjective perceptions and objective evaluations.
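Exemplar-based synthesis of the kind framed above typically represents each test photo patch as a weighted combination of selected training photo patches and reuses those weights on the paired sketch patches. The sketch below shows the generic locally linear weight computation only; the ridge regularizer and solver choice are assumptions, and the paper's Bayesian neighbor selection and spatial neighboring constraints are not modeled.

```python
import torch

def reconstruction_weights(test_patch, candidate_patches, ridge: float = 1e-3):
    """test_patch: (D,), candidate_patches: (K, D); returns weights (K,) summing to 1."""
    diffs = candidate_patches - test_patch             # (K, D)
    gram = diffs @ diffs.t()                           # local covariance (K, K)
    gram = gram + ridge * torch.eye(gram.shape[0])     # regularize for numerical stability
    w = torch.linalg.solve(gram, torch.ones(gram.shape[0]))
    return w / w.sum()

def synthesize_patch(weights, candidate_sketch_patches):
    """candidate_sketch_patches: (K, D) sketch patches paired with the photo candidates."""
    return weights @ candidate_sketch_patches

# Toy usage with assumed 5 candidates of dimension 16.
w = reconstruction_weights(torch.rand(16), torch.rand(5, 16))
patch = synthesize_patch(w, torch.rand(5, 16))
```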
39
Zhang D, Lin L, Chen T, Wu X, Tan W, Izquierdo E. Content-Adaptive Sketch Portrait Generation by Decompositional Representation Learning. IEEE Trans Image Process 2017; 26:328-339. PMID: 27831874; DOI: 10.1109/tip.2016.2623485.
Abstract
Sketch portrait generation benefits a wide range of applications such as digital entertainment and law enforcement. Although plenty of efforts have been dedicated to this task, several issues still remain unsolved for generating vivid and detail-preserving personal sketch portraits. For example, quite a few artifacts may exist in synthesizing hairpins and glasses, and textural details may be lost in the regions of hair or mustache. Moreover, the generalization ability of current systems is somewhat limited since they usually require elaborately collecting a dictionary of examples or carefully tuning features/components. In this paper, we present a novel representation learning framework that generates an end-to-end photo-sketch mapping through structure and texture decomposition. In the training stage, we first decompose the input face photo into different components according to their representational contents (i.e., structural and textural parts) by using a pre-trained convolutional neural network (CNN). Then, we utilize a branched fully CNN for learning structural and textural representations, respectively. In addition, we design a sorted matching mean square error metric to measure texture patterns in the loss function. In the stage of sketch rendering, our approach automatically generates structural and textural representations for the input photo and produces the final result via a probabilistic fusion scheme. Extensive experiments on several challenging benchmarks suggest that our approach outperforms example-based synthesis algorithms in terms of both perceptual and objective metrics. In addition, the proposed method also has better generalization ability across data set without additional training.
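The sorted matching mean square error mentioned above can be illustrated by sorting responses within each feature channel before comparing them, so that texture statistics rather than exact spatial positions are matched. The function below is an illustration of that idea under assumed feature shapes, not the paper's exact metric.

```python
import torch
import torch.nn.functional as F

def sorted_matching_mse(pred_feat: torch.Tensor, target_feat: torch.Tensor) -> torch.Tensor:
    """pred_feat, target_feat: (B, C, H, W) texture-branch feature maps."""
    b, c, _, _ = pred_feat.shape
    # Sort responses within each channel so the comparison is position-independent.
    p = pred_feat.reshape(b, c, -1).sort(dim=-1).values
    t = target_feat.reshape(b, c, -1).sort(dim=-1).values
    return F.mse_loss(p, t)
```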