1. Chen Y, Zhao Y, Cao L, Jia W, Liu X. Learning Deep Blind Quality Assessment for Cartoon Images. IEEE Transactions on Neural Networks and Learning Systems 2023;34:6650-6655. [PMID: 34847046] [DOI: 10.1109/tnnls.2021.3127720]
Abstract
Although the cartoon industry has developed rapidly in recent years, few studies pay special attention to cartoon image quality assessment (IQA). Unfortunately, applying blind natural IQA algorithms directly to cartoons often leads to inconsistent results with subjective visual perception. Hence, this brief proposes a blind cartoon IQA method based on convolutional neural networks (CNNs). Note that training a robust CNN depends on manually labeled training sets. However, for a large number of cartoon images, it is very time-consuming and costly to manually generate enough mean opinion scores (MOSs). Therefore, this brief first proposes a full reference (FR) cartoon IQA metric based on cartoon-texture decomposition and then uses the estimated FR index to guide the no-reference IQA network. Moreover, in order to improve the robustness of the proposed network, a large-scale dataset is established in the training stage, and a stochastic degradation strategy is presented, which randomly implements different degradations with random parameters. Experimental results on both synthetic and real-world cartoon image datasets demonstrate the effectiveness and robustness of the proposed method.
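The stochastic degradation strategy described here maps naturally onto a small augmentation routine. The sketch below is illustrative only; the operators, probabilities, and parameter ranges are assumptions, not the brief's actual settings:

```python
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def stochastic_degrade(img: Image.Image) -> Image.Image:
    """Apply a random subset of degradations with random parameters."""
    ops = []
    if random.random() < 0.5:  # Gaussian blur with a random radius
        ops.append(lambda x: x.filter(ImageFilter.GaussianBlur(random.uniform(0.5, 3.0))))
    if random.random() < 0.5:  # additive Gaussian noise with a random sigma
        def add_noise(x):
            arr = np.asarray(x).astype(np.float32)
            arr += np.random.normal(0, random.uniform(2, 15), arr.shape)
            return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
        ops.append(add_noise)
    if random.random() < 0.5:  # JPEG compression at a random quality factor
        def jpeg(x):
            buf = io.BytesIO()
            x.save(buf, format="JPEG", quality=random.randint(10, 80))
            return Image.open(buf).convert("RGB")
        ops.append(jpeg)
    random.shuffle(ops)  # apply the chosen degradations in random order
    for op in ops:
        img = op(img)
    return img
```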
2. Yan J, Zhang K, Luo S, Xu J, Lu J, Xiong Z. Learning graph-constrained cascade regressors for single image super-resolution. Applied Intelligence 2022. [DOI: 10.1007/s10489-021-02904-3]
3.
Abstract
Ranking problems, also known as preference learning problems, define a widely spread class of statistical learning problems with many applications, including fraud detection, document ranking, medicine, chemistry, credit risk screening, image ranking, and media memorability. While reviews already exist for specific types of ranking problems such as label and object ranking, there does not yet seem to be an overview of instance ranking problems that covers both the developments in distinguishing between different types of instance ranking problems and a careful discussion of their differences and of the applicability of existing ranking algorithms to them. In instance ranking, one explicitly takes the responses into account with the goal of inferring a scoring function that directly maps feature vectors to real-valued ranking scores, in contrast to object ranking problems, where the ranks are given as preference information with the goal of learning a permutation. In this article, we systematically review different types of instance ranking problems and the corresponding loss functions and goodness criteria, and we discuss the difficulties that arise when trying to optimize those criteria. To give a detailed and comprehensive overview of existing machine learning techniques for such ranking problems, we systematize them and recapitulate the corresponding optimization problems in a unified notation. We also discuss which instance ranking problems the respective algorithms are tailored to and identify their strengths and limitations. Computational aspects and open research problems are also considered.
4. Wu L, Zhang X, Chen H, Wang D, Deng J. VP-NIQE: An opinion-unaware visual perception natural image quality evaluator. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.048]
5.
Abstract
Image quality assessment (IQA) models aim to establish a quantitative relationship between visual images and their quality as perceived by human observers. IQA modeling plays a special bridging role between vision science and engineering practice, both as a test-bed for vision theories and computational biovision models and as a powerful tool that could potentially have a profound impact on a broad range of image processing, computer vision, and computer graphics applications for design, optimization, and evaluation purposes. The growth of IQA research has accelerated over the past two decades. In this review, we present an overview of IQA methods from a Bayesian perspective, with the goals of unifying a wide spectrum of IQA approaches under a common framework and providing useful references to fundamental concepts accessible to vision scientists and image processing practitioners. We discuss the implications of the successes and limitations of modern IQA methods for biological vision and the prospect for vision science to inform the design of future artificial vision systems. (The detailed model taxonomy can be found at http://ivc.uwaterloo.ca/research/bayesianIQA/.)
Affiliation(s)
- Zhengfang Duanmu, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Wentao Liu, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Zhongling Wang, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Zhou Wang, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
6. Li N, Chen Z. Toward Visual Distortion in Black-Box Attacks. IEEE Transactions on Image Processing 2021;30:6156-6167. [PMID: 34214038] [DOI: 10.1109/tip.2021.3092822]
Abstract
Constructing adversarial examples in a black-box threat model degrades the original images by introducing visual distortion. In this paper, we propose a novel black-box attack approach that can directly minimize the induced distortion by learning the noise distribution of the adversarial example, assuming only loss-oracle access to the black-box network. To quantify visual distortion, the perceptual distance between the adversarial example and the original image is introduced into our loss. We first approximate the gradient of the corresponding non-differentiable loss function by sampling noise from the learned noise distribution. The distribution is then updated using the estimated gradient to reduce visual distortion. The learning continues until an adversarial example is found. We validate the effectiveness of our attack on ImageNet: it results in much lower distortion than state-of-the-art black-box attacks and achieves a 100% success rate on InceptionV3, ResNet50, and VGG16bn. Furthermore, we theoretically prove the convergence of our model. The code is publicly available at https://github.com/Alina-1997/visual-distortion-in-attack.
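The gradient-estimation step described here, sampling noise and querying only a loss oracle, is in the spirit of score-function (NES/REINFORCE-style) estimators. A minimal sketch follows, assuming a Gaussian noise distribution parameterized by its mean; the paper's actual distribution and update rule may differ:

```python
import numpy as np

def estimate_gradient(loss_fn, mu, sigma=0.1, n_samples=50):
    """Score-function estimate of d E[loss(mu + sigma*eps)] / d mu.

    loss_fn: black-box loss (e.g. attack loss plus perceptual distance),
             evaluated via the loss oracle only, no gradients needed.
    mu:      current mean of the learned noise distribution (ndarray).
    """
    grad = np.zeros_like(mu)
    for _ in range(n_samples):
        eps = np.random.randn(*mu.shape)
        # REINFORCE: loss times the gradient of the Gaussian log-density w.r.t. mu
        grad += loss_fn(mu + sigma * eps) * (eps / sigma)
    return grad / n_samples

# One update of the distribution parameters, reducing the expected loss:
# mu -= lr * estimate_gradient(loss_fn, mu)
```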
7. Zhang W, Ma K, Zhai G, Yang X. Uncertainty-Aware Blind Image Quality Assessment in the Laboratory and Wild. IEEE Transactions on Image Processing 2021;30:3474-3486. [PMID: 33661733] [DOI: 10.1109/tip.2021.3061932]
Abstract
Performance of blind image quality assessment (BIQA) models has been significantly boosted by end-to-end optimization of feature engineering and quality regression. Nevertheless, due to the distributional shift between images simulated in the laboratory and captured in the wild, models trained on databases with synthetic distortions remain particularly weak at handling realistic distortions (and vice versa). To confront the cross-distortion-scenario challenge, we develop a unified BIQA model and an approach of training it for both synthetic and realistic distortions. We first sample pairs of images from individual IQA databases, and compute a probability that the first image of each pair is of higher quality. We then employ the fidelity loss to optimize a deep neural network for BIQA over a large number of such image pairs. We also explicitly enforce a hinge constraint to regularize uncertainty estimation during optimization. Extensive experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in the laboratory and wild. In addition, we demonstrate the universality of the proposed training strategy by using it to improve existing BIQA models.
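For readers unfamiliar with the fidelity loss, a minimal PyTorch sketch of the pairwise probability and the loss follows. The Gaussian form of the predicted probability mirrors the uncertainty-aware setup described above, but the exact parameterization and the epsilon terms are assumptions:

```python
import torch

def pairwise_prob(mean1, var1, mean2, var2):
    """Probability that image 1 beats image 2, under a Gaussian model
    of predicted quality with a per-image mean and variance."""
    z = (mean1 - mean2) / torch.sqrt(var1 + var2 + 1e-8)
    normal = torch.distributions.Normal(0.0, 1.0)
    return normal.cdf(z)

def fidelity_loss(p, p_hat, eps=1e-8):
    """Fidelity loss between the ground-truth preference probability p
    and the model's predicted probability p_hat (elementwise; take the
    mean over a batch of pairs for training)."""
    return 1.0 - torch.sqrt(p * p_hat + eps) - torch.sqrt((1 - p) * (1 - p_hat) + eps)
```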
8. Peng C, Wang N, Li J, Gao X. Universal Face Photo-Sketch Style Transfer via Multiview Domain Translation. IEEE Transactions on Image Processing 2020;29:8519-8534. [PMID: 32813659] [DOI: 10.1109/tip.2020.3016502]
Abstract
Face photo-sketch style transfer aims to convert a representation of a face from the photo (or sketch) domain to the sketch (respectively, photo) domain while preserving the character of the subject. It has wide-ranging applications in law enforcement, forensic investigation, and digital entertainment. However, conventional face photo-sketch synthesis methods usually require training images from both the source domain and the target domain, and they cannot be applied in universal conditions, where collecting training images in the source domain that match the style of the test image is impractical. This problem entails two major challenges: 1) designing an effective and robust domain translation model for the universal situation in which images of the source domain needed for training are unavailable, and 2) preserving the facial character while performing a transfer to the style of an entire image collection in the target domain. To this end, we present a novel universal face photo-sketch style transfer method that does not need any image from the source domain for training. The regression relationship between an input test image and the entire training image collection in the target domain is inferred via a deep domain translation framework, in which a domain-wise adaption term and a local consistency adaption term are developed. To improve the robustness of the style transfer process, we propose a multiview domain translation method that flexibly leverages a convolutional neural network representation together with hand-crafted features in an optimal way. Qualitative and quantitative comparisons are provided for universal unconstrained conditions with no training images from the source domain, demonstrating the effectiveness and superiority of our method for universal face photo-sketch style transfer.
9. Multi-granularity generative adversarial nets with reconstructive sampling for image inpainting. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.011]
10.
11. Zhang M, Wang N, Li Y, Gao X. Neural Probabilistic Graphical Model for Face Sketch Synthesis. IEEE Transactions on Neural Networks and Learning Systems 2020;31:2623-2637. [PMID: 31494561] [DOI: 10.1109/tnnls.2019.2933590]
Abstract
Neural network learning for face sketch synthesis from photos has attracted substantial attention due to its favorable synthesis performance. However, most existing deep-learning-based face sketch synthesis models, stacked only from multiple convolutional layers without structured regression, often lose the common facial structures, limiting their flexibility in a wide range of practical applications, including intelligent security and digital entertainment. In this article, we introduce a neural network into a probabilistic graphical model and propose a novel face sketch synthesis framework based on the neural probabilistic graphical model (NPGM), composed of a specific structure and a common structure. In the specific structure, we investigate a neural network for mapping the direct relationship between training photos and sketches, yielding the specific information and characteristic features of a test photo. In the common structure, the fidelity between the sketch pixels generated by the specific structure and their candidates selected from the training data is considered, ensuring the preservation of the common facial structure. Experimental results on the Chinese University of Hong Kong face sketch database demonstrate, both qualitatively and quantitatively, that the proposed NPGM-based face sketch synthesis approach can more effectively capture specific features and recover common structures compared with the state-of-the-art methods. Extensive experiments in practical applications further illustrate that the proposed method achieves superior performance.
12. Zhang M, Wang N, Li Y, Gao X. Bionic Face Sketch Generator. IEEE Transactions on Cybernetics 2020;50:2701-2714. [PMID: 31331901] [DOI: 10.1109/tcyb.2019.2924589]
Abstract
Face sketch synthesis is a crucial technique in digital entertainment. However, existing face sketch synthesis approaches usually generate face sketches with coarse structures, and the fine details on some facial components fail to be generated. In this paper, inspired by how artists draw face sketches, we propose a bionic face sketch generator. It includes three parts: 1) a coarse part; 2) a fine part; and 3) a finer part. The coarse part builds the facial structure of a sketch with a generative adversarial network based on the U-Net. In the middle part, the noise produced by the coarse part is erased and the fine details on the important face components are generated via a probabilistic graphical model. To complement the fine sketch with distinctive edges and areas of shadow and light, we learn a mapping relationship in the high-frequency band with a convolutional neural network in the finer part. The experimental results show that the proposed bionic face sketch generator can synthesize face sketches with more delicate and striking details, satisfy the requirements of users in digital entertainment, and provide students with coarse, fine, and finer face sketch copies when learning to draw sketches. Compared with the state-of-the-art methods, the proposed approach achieves better results in both visual effects and quantitative metrics.
13. Duanmu Z, Liu W, Li Z, Ma K, Wang Z. Characterizing Generalized Rate-Distortion Performance of Video Coding: An Eigen Analysis Approach. IEEE Transactions on Image Processing 2020;29:6180-6193. [PMID: 32356747] [DOI: 10.1109/tip.2020.2988437]
Abstract
Rate-distortion (RD) theory is at the heart of lossy data compression. Here we aim to model the generalized RD (GRD) trade-off between the visual quality of a compressed video and its encoding profiles (e.g., bitrate and spatial resolution). We first define the theoretical functional space W of the GRD function by analyzing its mathematical properties. We show that W is a convex set in a Hilbert space, inspiring a computational model of the GRD function, and a method of estimating model parameters from sparse measurements. To demonstrate the feasibility of our idea, we collect a large-scale database of real-world GRD functions, which turn out to live in a low-dimensional subspace of W. Combining the GRD reconstruction framework and the learned low-dimensional space, we create a low-parameter eigen GRD method to accurately estimate the GRD function of a source video content from only a few queries. Experimental results on the database show that the learned GRD method significantly outperforms state-of-the-art empirical RD estimation methods both in accuracy and efficiency. Last, we demonstrate the promise of the proposed model in video codec comparison.
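The eigen-GRD recipe, learning a low-dimensional basis of GRD functions offline and then fitting a new content's function from a few queries, can be sketched with plain PCA and least squares. This is illustrative only; the paper additionally constrains estimates to the function space W (e.g., monotonicity along each encoding axis):

```python
import numpy as np

# Offline: learn a low-dimensional basis from a database of GRD functions.
# G is (num_contents, num_profiles): each row is one content's GRD function
# sampled over a fixed grid of encoding profiles (bitrate x resolution).
def learn_basis(G, k=4):
    mean = G.mean(axis=0)
    # principal components of the centered database via SVD
    _, _, Vt = np.linalg.svd(G - mean, full_matrices=False)
    return mean, Vt[:k]  # mean function plus top-k eigen functions

# Online: estimate a new content's full GRD function from a few queries.
def reconstruct(mean, basis, query_idx, query_vals):
    A = basis[:, query_idx].T            # (num_queries, k)
    b = query_vals - mean[query_idx]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mean + coef @ basis           # estimated GRD over all profiles
```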
14. Abe Y, Shimada M, Takeda Y, Enoki T, Omachi K, Abe S. Evaluation of Patient Positioning during Digital Tomosynthesis and Reconstruction Algorithms for Ilizarov Frames: A Phantom Study. Strategies in Trauma and Limb Reconstruction 2020;15:1-6. [PMID: 33363634] [PMCID: PMC7744665] [DOI: 10.5005/jp-journals-10080-1446]
Abstract
Aim: Metallic components from circular external fixators, including the Ilizarov frame, cause artefacts on X-rays and obstruct clear visualisation of bone detail. We evaluated the ability of tomosynthesis to reduce interference on radiographs caused by metal artefacts and developed an optimal image acquisition method for such cases.
Materials and methods: An Ilizarov frame phantom was constructed using rods placed on the bone in order to evaluate the benefits of tomosynthesis. The distance between the rod and bone and the angle between the rod and the X-ray tube orbit were each set at three different levels. Filtered backprojection images were reconstructed using two different features of the reconstruction function: THICKNESS−− (CONTRAST4), which is suitable for improving contrast, and THICKNESS++ (METAL4), which is suitable for metal artefacts. The peak signal-to-noise ratio (PSNR) was used during image evaluation to determine the influence of the metallic rod on bone structure visibility.
Results: The PSNR increased as the angle between the metal rod and the X-ray tube orbit and the distance between the metallic rod and bone increased. The PSNR was larger when using THICKNESS−− (CONTRAST4) than when using THICKNESS++ (METAL4).
Conclusion: The optimal reconstruction function and image acquisition determined using the metallic rod in this study suggest that quality equal to that without the metallic rod can be obtained.
Clinical significance: We describe an optimised method for image acquisition without unnecessary acquisition repetition and unreasonable posture changes when the bone cannot be adequately visualised.
Affiliation(s)
- Yuki Abe, Department of Radiological Technology, Graduate School of Health Sciences, Okayama University, Okayama, Japan
- Makoto Shimada, Department of Radiological Technology, Graduate School of Health Sciences, Okayama University, Okayama, Japan
- Yoshihiro Takeda, Department of Radiological Technology, Graduate School of Health Sciences, Okayama University, Okayama, Japan
- Taisuke Enoki, Department of Educational Collaboration, Health and Safety Sciences, Osaka Kyoiku University, Kashiwara, Osaka, Japan
- Kumiko Omachi, Department of Radiology, Osaka General Medical Center, Osaka, Japan
- Shuji Abe, Department of Radiology, Osaka Women's and Children's Hospital, Izumi, Osaka, Japan
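PSNR, the image-evaluation metric used in this study, is standard; for reference:

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```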
15. Chen Z, Zhu H. Visual Quality Evaluation for Semantic Segmentation: Subjective Assessment Database and Objective Assessment Measure. IEEE Transactions on Image Processing 2019;28:5785-5796. [PMID: 31217113] [DOI: 10.1109/tip.2019.2922072]
Abstract
To promote the applications of semantic segmentation, quality evaluation is important to assess different algorithms and guide their development and optimization. In this paper, we establish a subjective semantic segmentation quality assessment database based on the stimulus-comparison method. Given that the database reflects the relative quality of pairs of semantic segmentation results, we adopt a robust regression mapping model to explore the relationship between subjective assessment and objective distance. With the help of the regression model, we can examine whether objective metrics coincide with subjective judgement. In addition, we propose a novel relative quality prediction network (RQPN) based on a Siamese CNN as a new objective metric. The metric is trained on our subjective assessment database and can be applied to evaluate the performance of semantic segmentation algorithms, even ones that were not used to build the database. Experiments demonstrate the advantages and reliability of our database and show that the results predicted by RQPN are more consistent with subjective assessment than existing objective metrics.
16. Chen W, Gu K, Lin W, Xia Z, Le Callet P, Cheng E. Reference-Free Quality Assessment of Sonar Images via Contour Degradation Measurement. IEEE Transactions on Image Processing 2019;28:5336-5351. [PMID: 31021766] [DOI: 10.1109/tip.2019.2910666]
Abstract
Sonar imagery plays a significant role in oceanic applications, since there is little natural light underwater and light is irrelevant to sonar imaging. Sonar images are very likely to be affected by various distortions during transmission via the underwater acoustic channel for further analysis. At the receiving end, the reference image is unavailable due to the complex and changing underwater environment and our unfamiliarity with it. One of the important uses of sonar images is target recognition on the basis of contour information, and the degree of contour degradation in a sonar image is related to the distortions it contains. To this end, we developed a new no-reference contour degradation measurement for perceiving the quality of sonar images. The sparsities of a series of transform coefficient matrices, which are descriptive of contour information, are first extracted as features from the frequency and spatial domains. The contour degradation degree of a sonar image is then measured by calculating the ratios of extracted features before and after filtering the image. Finally, a bootstrap aggregating (bagging)-based support vector regression module is learned to capture the relationship between the contour degradation degree and sonar image quality. Experiments validate that the proposed metric is competitive with state-of-the-art reference-based quality metrics and outperforms the latest reference-free competitors.
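The before/after-filtering feature construction can be illustrated with a Gini-style sparsity index on DCT coefficients. The actual transforms, filters, and sparsity measure used in the paper may differ, so treat this as a sketch:

```python
import numpy as np
from scipy.fftpack import dct
from scipy.ndimage import gaussian_filter

def gini_sparsity(coeffs):
    """Gini index of the absolute coefficients: 0 = dense, 1 = maximally sparse."""
    c = np.sort(np.abs(coeffs).ravel())  # ascending order
    n = c.size
    if c.sum() == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum(c * (n - k + 0.5)) / (n * c.sum())

def contour_degradation_feature(img):
    """Ratio of coefficient sparsity before vs. after low-pass filtering
    (illustrative stand-in for one of the paper's feature ratios)."""
    dct2 = lambda x: dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")
    s_before = gini_sparsity(dct2(img))
    s_after = gini_sparsity(dct2(gaussian_filter(img, sigma=2)))
    return s_before / (s_after + 1e-8)
```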
17. Yang J, Xu H, Zhao Y, Liu H, Lu W. Stereoscopic image quality assessment combining statistical features and binocular theory. Pattern Recognition Letters 2019. [DOI: 10.1016/j.patrec.2018.10.012]
18. Zhang M, Wang N, Li Y, Gao X. Deep Latent Low-Rank Representation for Face Sketch Synthesis. IEEE Transactions on Neural Networks and Learning Systems 2019;30:3109-3123. [PMID: 30676980] [DOI: 10.1109/tnnls.2018.2890017]
Abstract
Face sketch synthesis is useful and profitable in digital entertainment. Most existing face sketch synthesis methods rely on the assumption that facial photographs/sketches form a low-dimensional manifold. When the training data are insufficient, the manifold cannot characterize the identity-specific information that is included in a test photograph but excluded from the training data, and the synthesized sketch loses this information, such as glasses, earrings, hairstyles, and hairpins. To provide sufficient data and satisfy the manifold assumption, we propose a novel face sketch synthesis framework based on deep latent low-rank representation (DLLRR) in this paper. The DLLRR induces hidden training sketches carrying the identity-specific information as hidden data to complement the insufficient original training sketches as observed data, and it searches for the lowest-rank representation of the candidates of a test photograph from both the hidden and observed data. Owing to its strong representational capability, a coupled autoencoder is leveraged to reveal the hidden data. Experimental results on a face photograph-sketch database illustrate that the proposed method can successfully provide sufficient training data with the identity-specific information and, compared with the state of the art, synthesizes cleaner and more vivid face sketches.
19. Zhang M, Li Y, Wang N, Chi Y, Gao X. Cascaded Face Sketch Synthesis under Various Illuminations. IEEE Transactions on Image Processing 2019;29:1507-1521. [PMID: 31562092] [DOI: 10.1109/tip.2019.2942514]
Abstract
Face sketch synthesis from a photo is of significant importance in digital entertainment. An intelligent face sketch synthesis system requires strong robustness to lighting variations: under the uncontrolled lighting conditions of real-world settings, such a system should perform consistently well with little restriction on the lighting conditions. However, previous face sketch synthesis methods tend to synthesize sketches under well-controlled lighting conditions; they are sensitive to lighting variations and produce unsatisfactory results when the lighting condition varies. In this paper, we propose a novel cascaded face sketch synthesis framework composed of a multiple feature generator and a cascaded low-rank representation. The multiple feature generator not only produces a generated sketch feature consistent with an artist's drawing style but also extracts a photo feature that is robust to various illuminations. Both features ensure that, given a photo patch, the optimal sketch candidates can be selected from the database. The cascaded low-rank representation enables a gradual reduction in the gap between the synthesized face sketch and the corresponding artist-drawn sketch. Experimental results illustrate that the proposed cascaded framework generates realistic sketches on par with current methods on the Chinese University of Hong Kong face sketch database under well-controlled illuminations. Moreover, it exhibits greatly improved performance compared with these methods on the extended Chinese University of Hong Kong face sketch database and on Chinese celebrity face photos from the web under different illuminations. We argue that this framework paves a novel way for the implementation of computer-aided optical systems, which are of essential importance in both face sketch synthesis and optical imaging.
20. Liu X, van de Weijer J, Bagdanov AD. Exploiting Unlabeled Data in CNNs by Self-Supervised Learning to Rank. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019;41:1862-1878. [PMID: 30794168] [DOI: 10.1109/tpami.2019.2899857]
Abstract
For many applications the collection of labeled data is expensive and laborious. Exploitation of unlabeled data during training is thus a long-pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different from, but related to, the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground-truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of the informativeness of unlabeled data. This can be used to drive an algorithm for active learning, and we show that this reduces labeling effort by up to 50 percent.
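The ranking proxy task pairs an image with a more-distorted copy of itself, whose relative order is known for free. A minimal PyTorch sketch follows, using a margin ranking loss and a single batched pass through the shared network; the toy backbone is a stand-in, not the paper's architecture:

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(  # stand-in for any CNN quality/count regressor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
)
rank_loss = nn.MarginRankingLoss(margin=0.5)

def ranking_step(x_better, x_worse, optimizer):
    # One batched forward pass through the shared network covers both
    # branches of the Siamese pair, avoiding duplicated computation.
    scores = backbone(torch.cat([x_better, x_worse], dim=0)).squeeze(1)
    s1, s2 = scores.chunk(2, dim=0)
    target = torch.ones_like(s1)  # s1 should exceed s2 by the margin
    loss = rank_loss(s1, s2, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```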
21. Zhang M, Wang R, Gao X, Li J, Tao D. Dual-Transfer Face Sketch-Photo Synthesis. IEEE Transactions on Image Processing 2019;28:642-657. [PMID: 30222563] [DOI: 10.1109/tip.2018.2869688]
Abstract
Recognizing the identity of a sketched face from a face photograph dataset is a critical yet challenging task in many applications, not least law enforcement and criminal investigations. An intelligent sketched face identification system would rely on automatic face sketch synthesis from photographs, thereby avoiding the cost of artists manually drawing sketches. However, conventional face sketch-photo synthesis methods tend to generate sketches that are consistent with the artists' drawing styles; identity-specific information is often overlooked, leading to unsatisfactory identity verification and recognition performance. In this paper, we discuss the reasons why conventional methods fail to recover identity-specific information. We then propose a novel dual-transfer face sketch-photo synthesis framework composed of an inter-domain transfer process and an intra-domain transfer process. In the inter-domain transfer, a regressor of the test photograph with respect to the training photographs is learned and transferred to the sketch domain, ensuring the recovery of common facial structures during synthesis. In the intra-domain transfer, a mapping characterizing the relationship between photographs and sketches is learned and transferred across different identities, such that the loss of identity-specific information is suppressed during synthesis. The fusion of information recovered by the two processes is straightforward by virtue of an ad hoc information splitting strategy. We employ both linear and nonlinear formulations to instantiate the proposed framework. Experiments on the Chinese University of Hong Kong face sketch database demonstrate that, compared to the current state of the art, the proposed framework produces more identifiable facial structures and yields higher face recognition performance in both the photo and sketch domains.
22. Lavoué G, Langer M, Peytavie A, Poulin P. A Psychophysical Evaluation of Texture Compression Masking Effects. IEEE Transactions on Visualization and Computer Graphics 2019;25:1336-1346. [PMID: 29994636] [DOI: 10.1109/tvcg.2018.2805355]
Abstract
Lossy texture compression is increasingly used to reduce GPU memory and bandwidth consumption. However, as raised by recent studies, evaluating the quality of compressed textures is a difficult problem. In particular, using the Peak Signal-to-Noise Ratio (PSNR) on texture images, as is done in most applications, may not be the correct way to proceed: there is evidence that masking effects apply when the texture image is mapped onto a surface and combined with other textures (e.g., affecting geometry or normals). These masking effects have to be taken into account when compressing a set of texture maps in order to have a real understanding of the visual impact of the compression artifacts on the rendered images. In this work, we present the first psychophysical experiment investigating the perceptual impact of texture compression on rendered images. We explore the influence of compression bit rate, light direction, and diffuse and normal map content on the visual impact of artifacts. The collected data reveal huge masking effects between normal map and diffuse map artifacts, in both directions, and reveal the weakness of PSNR applied to individual textures for evaluating compression quality. The results also allow us to analyze the performance and failures of image quality metrics for predicting the visibility of these artifacts. We finally provide some recommendations for evaluating the quality of texture compression and show a practical application to approximating the distortion measured on a rendered 3D shape.
23. Effect of external fixation rod coupling in computed tomography. Strategies in Trauma and Limb Reconstruction 2018;13:137-149. [PMID: 30220005] [PMCID: PMC6249148] [DOI: 10.1007/s11751-018-0318-x]
Abstract
External fixation is a common tool in the treatment of complex fractures, correction of limb deformity, and salvage arthrodesis. These devices typically incorporate radio-opaque metal rods/struts connected at varying distances and orientations between rings. Whilst the predominant imaging modality is plain film radiology, computed tomography (CT) may be performed in order for the surgeon to make a more confident clinical decision (e.g. timing of frame removal, assessment of degree of arthrodesis). We used a fractured sheep leg to systematically assess CT imaging performance with a Discovery CT750 HD CT scanner (GE Healthcare) and show how rod coupling in both traditional Ilizarov and hexapod frames distorts images. We also investigated the role of dual-energy CT (DECT) and metal artefact reduction software (MARS) in the visualisation of the fractured leg. Whilst mechanical reasons predominantly dictate the rod/strut configurations when building a circular frame, rod coupling in CT can be minimised. Ideally, all or all but one rod can be removed during imaging, resulting in no rod coupling. If this is not possible, strategies for configuring the rods to minimise the effect of rod coupling on the region of interest are demonstrated; e.g., in the case of a four-rod construct, switching the two anterior rods to a more central single one will achieve this goal without particularly jeopardising mechanical strength for a short period. It is also shown that the addition of DECT and MARS results in a reduction of artefacts, but also affects tissue and bone differentiation.
24.
25. Gu K, Tao D, Qiao JF, Lin W. Learning a No-Reference Quality Assessment Model of Enhanced Images With Big Data. IEEE Transactions on Neural Networks and Learning Systems 2018;29:1301-1313. [PMID: 28287984] [DOI: 10.1109/tnnls.2017.2649101]
Abstract
In this paper, we investigate the problem of image quality assessment (IQA) and enhancement via machine learning. This issue has long attracted a wide range of attention in the computational intelligence and image processing communities, since, for many practical applications, e.g., object detection and recognition, raw images usually need to be appropriately enhanced to raise the visual quality (e.g., visibility and contrast). In fact, proper enhancement can noticeably improve the quality of input images, even beyond that of the originally captured images, which are generally thought to be of the best quality. This paper makes two main contributions. The first is a new no-reference (NR) IQA model. Given an image, our quality measure first extracts 17 features through analysis of contrast, sharpness, brightness, and more, and then yields a measure of visual quality using a regression module, which is learned with big-data training samples much larger than the relevant image data sets. The results of experiments on nine data sets validate the superiority and efficiency of our blind metric compared with typical state-of-the-art full-reference, reduced-reference, and NR IQA methods. The second contribution is a robust image enhancement framework based on quality optimization. For an input image, guided by the proposed NR-IQA measure, we conduct histogram modification to successively rectify image brightness and contrast to a proper level. Thorough tests demonstrate that our framework can effectively enhance natural images, low-contrast images, low-light images, and dehazed images. The source code will be released at https://sites.google.com/site/guke198701/publications.
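A toy version of the extract-features-then-regress recipe is shown below. The four statistics and the random-forest regressor are stand-ins for the paper's 17 features and its learned regression module:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def enhancement_features(gray):
    """Tiny illustrative feature set: brightness, contrast, sharpness, entropy."""
    g = gray.astype(np.float64)
    gy, gx = np.gradient(g)
    hist, _ = np.histogram(g, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return np.array([
        g.mean(),                       # brightness
        g.std(),                        # global contrast
        np.mean(np.hypot(gx, gy)),      # mean gradient magnitude (sharpness)
        -np.sum(p * np.log2(p)),        # histogram entropy
    ])

# Train on (features, MOS) pairs, then predict quality for new images.
def train_quality_model(images, mos_scores):
    X = np.stack([enhancement_features(im) for im in images])
    return RandomForestRegressor(n_estimators=200).fit(X, mos_scores)
```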
26. Yang J, Jiang B, Wang Y, Lu W, Meng Q. Sparse representation based stereoscopic image quality assessment accounting for perceptual cognitive process. Information Sciences 2018. [DOI: 10.1016/j.ins.2017.10.053]
27. Zhang M, Li J, Wang N, Gao X. Compositional Model-Based Sketch Generator in Facial Entertainment. IEEE Transactions on Cybernetics 2018;48:904-915. [PMID: 28212105] [DOI: 10.1109/tcyb.2017.2664499]
Abstract
Face sketch synthesis (FSS) plays an important role in facial entertainment, which includes face sketch morphing between two styles, multiview FSS, and face sketch expression manipulation. For facial entertainment, most existing FSS methods generate sketches with over-smoothing effects, i.e., fine details are suppressed to some degree. In this paper, we propose a face sketch generator based on a compositional model to handle this issue. It decomposes a face into different components instead of patches, as done before, and each component has several candidate templates. Multilevel B-spline approximation is utilized to delicately polish the chosen templates of all components. To fuse these components, Poisson blending is employed instead of a weighted average operator. The proposed compositional method crucially reduces the high-frequency loss and improves synthesis performance in comparison with the state-of-the-art methods. Experiments on face sketch morphing, expression manipulation, and multiview FSS further demonstrate the effectiveness of the proposed method.
28. Gao F, Wang Y, Li P, Tan M, Yu J, Zhu Y. DeepSim: Deep similarity for image quality assessment. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.054]
29. Ma K, Liu W, Liu T, Wang Z, Tao D. dipIQ: Blind Image Quality Assessment by Learning-to-Rank Discriminable Image Pairs. IEEE Transactions on Image Processing 2017;26:3951-3964. [PMID: 28574353] [DOI: 10.1109/tip.2017.2708503]
Abstract
Objective assessment of image quality is fundamentally important in many image processing tasks. In this paper, we focus on learning blind image quality assessment (BIQA) models, which predict the quality of a digital image with no access to its original pristine-quality counterpart as reference. One of the biggest challenges in learning BIQA models is the conflict between the gigantic image space (which is in the dimension of the number of image pixels) and the extremely limited reliable ground truth data for training. Such data are typically collected via subjective testing, which is cumbersome, slow, and expensive. Here, we first show that a vast amount of reliable training data in the form of quality-discriminable image pairs (DIPs) can be obtained automatically at low cost by exploiting large-scale databases with diverse image content. We then learn an opinion-unaware BIQA (OU-BIQA, meaning that no subjective opinions are used for training) model using RankNet, a pairwise learning-to-rank (L2R) algorithm, from millions of DIPs, each associated with a perceptual uncertainty level, leading to a DIP inferred quality (dipIQ) index. Extensive experiments on four benchmark IQA databases demonstrate that dipIQ outperforms the state-of-the-art OU-BIQA models. The robustness of dipIQ is also significantly improved as confirmed by the group MAximum Differentiation competition method. Furthermore, we extend the proposed framework by learning models with ListNet (a listwise L2R algorithm) on quality-discriminable image lists (DIL). The resulting DIL inferred quality index achieves an additional performance gain.
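RankNet's pairwise objective, which dipIQ learns from millions of DIPs, reduces to a logistic loss on score differences when the target preference probability is 1. The per-pair uncertainty weighting described above is omitted in this sketch:

```python
import torch.nn.functional as F

def ranknet_loss(score_better, score_worse):
    """Pairwise cross-entropy: push P(better > worse) toward 1.

    With P_ij = sigmoid(s_i - s_j) and target probability 1, the
    cross-entropy reduces to softplus(-(s_i - s_j)).
    """
    return F.softplus(score_worse - score_better).mean()
```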
30. Yu S, Wu S, Wang L, Jiang F, Xie Y, Li L. A shallow convolutional neural network for blind image sharpness assessment. PLoS One 2017;12:e0176632. [PMID: 28459832] [PMCID: PMC5436206] [DOI: 10.1371/journal.pone.0176632]
Abstract
Blind image quality assessment can be modeled as feature extraction followed by score prediction. It necessitates considerable expertise and effort to handcraft features for optimal representation of perceptual image quality. This paper addresses blind image sharpness assessment using a shallow convolutional neural network (CNN). The network takes a single feature layer to unearth intrinsic features for image sharpness representation and utilizes a multilayer perceptron (MLP) to rate image quality. Different from traditional methods, the CNN integrates feature extraction and score prediction into one optimization procedure and retrieves features automatically from raw images. Moreover, its prediction performance can be enhanced by replacing the MLP with a general regression neural network (GRNN) or support vector regression (SVR). Experiments on Gaussian blur images from LIVE-II, CSIQ, TID2008, and TID2013 demonstrate that CNN features with SVR achieve the best overall performance, indicating high correlation with human subjective judgment.
Affiliation(s)
- Shaode Yu, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Shibin Wu, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Lei Wang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Fan Jiang, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
- Yaoqin Xie, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Leida Li, School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
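Swapping the MLP head for an SVR fitted on the learned features, which the paper reports performs best, looks roughly like this; the backbone below is a generic stand-in, not the paper's exact shallow CNN:

```python
import torch
import torch.nn as nn
from sklearn.svm import SVR

cnn = nn.Sequential(  # generic shallow feature extractor for grayscale patches
    nn.Conv2d(1, 32, 7), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 5), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def extract_features(patches):
    """patches: float tensor of shape (N, 1, H, W) -> (N, 64) features."""
    with torch.no_grad():
        return cnn(patches).numpy()

# After (or instead of) training the MLP head, fit SVR on the CNN features:
# svr = SVR(kernel="rbf").fit(extract_features(train_patches), train_mos)
# pred = svr.predict(extract_features(test_patches))
```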
31. Zhang K, Tao D, Gao X, Li X, Li J. Coarse-to-Fine Learning for Single-Image Super-Resolution. IEEE Transactions on Neural Networks and Learning Systems 2017;28:1109-1122. [PMID: 26915133] [DOI: 10.1109/tnnls.2015.2511069]
Abstract
This paper develops a coarse-to-fine framework for single-image super-resolution (SR) reconstruction. The coarse-to-fine approach achieves high-quality SR recovery based on the complementary properties of both example learning- and reconstruction-based algorithms: example learning-based SR approaches are useful for generating plausible details from external exemplars but poor at suppressing aliasing artifacts, while reconstruction-based SR methods are propitious for preserving sharp edges yet fail to generate fine details. In the coarse stage of the method, we use a set of simple yet effective mapping functions, learned via correlative neighbor regression of grouped low-resolution (LR) to high-resolution (HR) dictionary atoms, to synthesize an initial SR estimate with particularly low computational cost. In the fine stage, we devise an effective regularization term that seamlessly integrates the properties of local structural regularity, nonlocal self-similarity, and collaborative representation over relevant atoms in a learned HR dictionary, to further improve the visual quality of the initial SR estimate obtained in the coarse stage. The experimental results indicate that our method outperforms other state-of-the-art methods in producing high-quality images, even though both the initial SR estimation and the subsequent enhancement are cheap to implement.
32. Ma K, Duanmu Z, Wu Q, Wang Z, Yong H, Li H, Zhang L. Waterloo Exploration Database: New Challenges for Image Quality Assessment Models. IEEE Transactions on Image Processing 2017;26:1004-1016. [PMID: 27893392] [DOI: 10.1109/tip.2016.2631888]
Abstract
The great content diversity of real-world digital images poses a grand challenge to image quality assessment (IQA) models, which are traditionally designed and validated on a handful of commonly used IQA databases with very limited content variation. To test the generalization capability and to facilitate the wide usage of IQA techniques in real-world applications, we establish a large-scale database named the Waterloo Exploration Database, which in its current state contains 4744 pristine natural images and 94 880 distorted images created from them. Instead of collecting the mean opinion score for each image via subjective testing, which is extremely difficult if not impossible, we present three alternative test criteria to evaluate the performance of IQA models, namely, the pristine/distorted image discriminability test, the listwise ranking consistency test, and the pairwise preference consistency test (P-test). We compare 20 well-known IQA models using the proposed criteria, which not only provide a stronger test in a more challenging testing environment for existing models, but also demonstrate the additional benefits of using the proposed database. For example, in the P-test, even for the best performing no-reference IQA model, more than 6 million failure cases against the model are "discovered" automatically out of over 1 billion test pairs. Furthermore, we discuss how the new database may be exploited using innovative approaches in the future, to reveal the weaknesses of existing IQA models, to provide insights on how to improve the models, and to shed light on how the next-generation IQA models may be developed. The database and codes are made publicly available at: https://ece.uwaterloo.ca/~k29ma/exploration/.
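The pairwise preference consistency test (P-test) admits a very small sketch: for pairs whose quality order is certain by construction (e.g., a pristine image versus a heavily distorted version of it), count how often a model agrees. This is an illustrative reading, not the database's official evaluation code:

```python
def pairwise_preference_consistency(model_score, known_pairs):
    """Fraction of (better, worse) pairs that the model orders correctly.

    known_pairs: list of (img_better, img_worse) whose relative quality
    is certain by construction, so no subjective testing is needed.
    """
    correct = sum(model_score(b) > model_score(w) for b, w in known_pairs)
    return correct / len(known_pairs)
```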
33.
34. Shao F, Tian W, Lin W, Jiang G, Dai Q. Toward a Blind Deep Quality Evaluator for Stereoscopic Images Based on Monocular and Binocular Interactions. IEEE Transactions on Image Processing 2016;25:2059-2074. [PMID: 26960225] [DOI: 10.1109/tip.2016.2538462]
Abstract
During recent years, blind image quality assessment (BIQA) has been intensively studied with different machine learning tools. Existing BIQA metrics, however, are not designed for stereoscopic images. We believe this problem can be resolved by separating 3D images and capturing the essential attributes of images via deep neural networks. In this paper, we propose a blind deep quality evaluator (DQE) for stereoscopic images (denoted 3D-DQE) based on monocular and binocular interactions. The key technical steps in the proposed 3D-DQE are to train two separate 2D deep neural networks (2D-DNNs) on 2D monocular images and cyclopean images to model the process of monocular and binocular quality prediction, and to combine the measured 2D monocular and cyclopean quality scores using different weighting schemes. Experimental results on four public 3D image quality assessment databases demonstrate that, in comparison with existing methods, the devised algorithm achieves highly consistent alignment with subjective assessment.