1
Huang Y, Li L, Chen P, Wu H, Lin W, Shi G. Multi-Modality Multi-Attribute Contrastive Pre-Training for Image Aesthetics Computing. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:1205-1218. [PMID: 39504278] [DOI: 10.1109/TPAMI.2024.3492259]
Abstract
In the Image Aesthetics Computing (IAC) field, most prior methods leverage off-the-shelf backbones pre-trained on the large-scale ImageNet database. While these pre-trained backbones have achieved notable success, they often overemphasize object-level semantics and fail to capture the high-level concepts of image aesthetics, which leads to suboptimal performance. To tackle this long-neglected problem, we propose a multi-modality multi-attribute contrastive pre-training framework, aiming to construct an alternative to ImageNet-based pre-training for IAC. The proposed framework consists of two main aspects. 1) We build a multi-attribute image description database with human feedback, leveraging the competent image understanding capability of a multi-modality large language model to generate rich aesthetic descriptions. 2) To better adapt models to aesthetic computing tasks, we integrate the image-based visual features with the attribute-based text features and map the integrated features into different embedding spaces, on which multi-attribute contrastive learning is performed to obtain a more comprehensive aesthetic representation. To alleviate the distribution shift encountered when transitioning from the general visual domain to the aesthetic domain, we further propose a semantic affinity loss that constrains the content information and enhances model generalization. Extensive experiments demonstrate that the proposed framework sets a new state of the art for IAC tasks.
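As a rough illustration of the per-attribute contrastive objective sketched in this abstract, the following minimal PyTorch snippet pairs a symmetric InfoNCE loss with one projection head per aesthetic attribute. The attribute names, dimensions, and random features are placeholders, not the paper's actual components.

```python
# Minimal sketch of per-attribute contrastive pre-training (all names and
# shapes below are illustrative assumptions, not the paper's API).
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE between L2-normalized image and text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))            # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# One projection head per aesthetic attribute maps the fused feature into its
# own embedding space; per-attribute losses are averaged.
attributes = ["color", "composition", "lighting"]       # hypothetical attribute set
dim, proj_dim, batch = 512, 128, 8
heads = {a: torch.nn.Linear(dim, proj_dim) for a in attributes}

fused_visual = torch.randn(batch, dim)                  # stand-in fused visual features
attr_text = {a: torch.randn(batch, dim) for a in attributes}  # stand-in attribute captions

loss = sum(info_nce(heads[a](fused_visual), heads[a](attr_text[a]))
           for a in attributes) / len(attributes)
print(float(loss))
```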
2
Shi H, Wang L, Wang G. Blind Quality Prediction for View Synthesis Based on Heterogeneous Distortion Perception. Sensors (Basel, Switzerland) 2022; 22:7081. [PMID: 36146438] [PMCID: PMC9504726] [DOI: 10.3390/s22187081]
Abstract
The quality of synthesized images directly affects the practical application of virtual view synthesis technology, which typically uses a depth-image-based rendering (DIBR) algorithm to generate a new viewpoint from texture and depth images. Current view synthesis quality metrics evaluate the quality of DIBR-synthesized images, yet the DIBR process itself is computationally expensive and time-consuming. In addition, existing view synthesis quality metrics lack robustness because they rely on shallow hand-crafted features. To avoid the complicated DIBR process and learn more efficient features, this paper presents a blind quality prediction model for view synthesis based on HEterogeneous DIstortion Perception, dubbed HEDIP, which predicts the quality of a synthesized view directly from the texture and depth images. Specifically, the texture and depth images are first fused based on the discrete cosine transform to simulate the distortions of view synthesis, and then spatial- and gradient-domain features are extracted by a Two-Channel Convolutional Neural Network (TCCNN). Finally, a fully connected layer maps the extracted features to a quality score. Notably, owing to local distortions in view synthesis images, the ground-truth score of a source image cannot effectively label each of its patches during training. We therefore design a Heterogeneous Distortion Perception (HDP) module to provide effective training labels for each image patch. Experimental results demonstrate that, with the help of the HDP module, the proposed model effectively predicts the quality of view synthesis.
Affiliation(s)
- Haozhi Shi
- School of Physics, Xidian University, Xi’an 710071, China
- Lanmei Wang
- School of Physics, Xidian University, Xi’an 710071, China
- Guibao Wang
- School of Physics and Telecommunication Engineering, Shaanxi University of Technology, Hanzhong 723001, China
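As a rough illustration of the HEDIP fusion step described above, the snippet below blends a texture and a depth image in the 2D DCT domain with SciPy; the fixed blending weight is an assumption, not the paper's fusion rule.

```python
# Minimal sketch of DCT-domain fusion of a texture and a depth image
# (the coefficient-wise blend and weight are illustrative assumptions).
import numpy as np
from scipy.fft import dctn, idctn

def dct_fuse(texture, depth, w=0.7):
    """Fuse two same-sized grayscale images in the 2D DCT domain."""
    T = dctn(texture.astype(np.float64), norm="ortho")
    D = dctn(depth.astype(np.float64), norm="ortho")
    fused = w * T + (1.0 - w) * D        # simple coefficient-wise blend
    return idctn(fused, norm="ortho")

rng = np.random.default_rng(0)
texture = rng.random((64, 64))
depth = rng.random((64, 64))
print(dct_fuse(texture, depth).shape)    # (64, 64)
```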
3
Quality Assessment of View Synthesis Based on Visual Saliency and Texture Naturalness. Electronics 2022. [DOI: 10.3390/electronics11091384]
Abstract
Depth-Image-Based Rendering (DIBR) is one of the core techniques for generating new views in 3D video applications. However, the distortion characteristics of DIBR-synthesized views differ from those of 2D images, so it is necessary to study these unique distortion characteristics and to design effective and efficient algorithms that evaluate DIBR-synthesized images and guide DIBR algorithms. In this work, visual saliency and texture naturalness features are extracted to evaluate the quality of DIBR views. After feature extraction, a machine learning method is adopted to map the extracted features to the quality score of the DIBR views. Experiments are conducted on two synthesized-view databases, IETR and IRCCyN/IVC, and the results show that the proposed algorithm outperforms the compared synthesized-view quality evaluation methods.
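The regression stage described above (hand-crafted features mapped to quality scores by machine learning) might look like the sketch below; random placeholders stand in for the saliency and naturalness features, and support vector regression stands in for the unspecified learner.

```python
# Sketch of the feature-to-score regression stage (features and labels are
# random stand-ins; the actual feature extraction is not reproduced here).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((60, 12))                 # saliency + naturalness features per image
y = rng.random(60) * 5                   # subjective quality scores (e.g., MOS)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print(scores.mean())
```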
4
Jakhetiya V, Chaudhary S, Subudhi BN, Lin W, Guntuku SC. Perceptually Unimportant Information Reduction and Cosine Similarity-Based Quality Assessment of 3D-Synthesized Images. IEEE Transactions on Image Processing 2022; 31:2027-2039. [PMID: 35167450] [DOI: 10.1109/TIP.2022.3147981]
Abstract
Quality assessment of 3D-synthesized images has traditionally been based on detecting specific categories of distortions such as stretching, black holes, and blurring. However, such approaches have limited ability to detect all the distortions in 3D-synthesized images, which hurts their performance. This work proposes an algorithm that efficiently detects the distortions and subsequently evaluates the perceptual quality of 3D-synthesized images. The generation of 3D-synthesized images introduces a shift of a few pixels between the reference and the synthesized image, so the two are not properly aligned. To address this, we apply a morphological opening to the residual image to reduce perceptually unimportant information between the reference and the distorted 3D-synthesized image. The resulting residual suppresses perceptually unimportant information and highlights the geometric distortions that significantly affect the overall quality of 3D-synthesized images. We use the information in this residual image to quantify perceptual quality and name the procedure the Perceptually Unimportant Information Reduction (PU-IR) algorithm. At the same time, the residual image cannot capture minor structural and geometric distortions because of the erosion step involved in the opening. To address this, we extract perceptually important deep features from a pre-trained VGG-16 architecture on the Laplacian pyramid. The distortions in 3D-synthesized images occur in patches, and the human visual system perceives even small levels of these distortions. Accordingly, to compare these deep features between the reference and distorted images, we propose using cosine similarity and name this the Deep Features extraction and comparison using Cosine Similarity (DF-CS) algorithm. Cosine similarity measures the similarity of the deep features rather than the magnitude of their difference. Finally, the objective quality score is obtained by simple multiplication of the PU-IR and DF-CS outputs. Our source code is available online: https://github.com/sadbhawnathakur/3D-Image-Quality-Assessment.
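A minimal sketch of the two cues described above, with the VGG-16/Laplacian-pyramid feature extraction abstracted away to plain vectors; the structuring-element size and the toy pooling at the end are assumptions.

```python
# Sketch of (1) opening applied to the residual image to suppress small
# misalignments, and (2) cosine similarity between deep feature vectors.
import numpy as np
from scipy.ndimage import grey_opening

def pu_ir_map(reference, synthesized, size=3):
    """Opened residual: keeps geometric distortions, drops small pixel shifts."""
    residual = np.abs(reference.astype(np.float64) - synthesized.astype(np.float64))
    return grey_opening(residual, size=(size, size))

def cosine_similarity(f_ref, f_dist, eps=1e-12):
    """Similarity of deep feature vectors, invariant to their magnitudes."""
    num = float(np.dot(f_ref, f_dist))
    den = float(np.linalg.norm(f_ref) * np.linalg.norm(f_dist)) + eps
    return num / den

rng = np.random.default_rng(0)
ref, syn = rng.random((64, 64)), rng.random((64, 64))
# Toy pooled score: product of the two cues, mirroring the multiplication
# mentioned in the abstract (feature vectors here are random stand-ins).
score = pu_ir_map(ref, syn).mean() * cosine_similarity(rng.random(512), rng.random(512))
print(score)
```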
5
Jakhetiya V, Mumtaz D, Subudhi BN, Guntuku SC. Stretching Artifacts Identification for Quality Assessment of 3D-Synthesized Views. IEEE Transactions on Image Processing 2022; 31:1737-1750. [PMID: 35100114] [DOI: 10.1109/TIP.2022.3145997]
Abstract
Existing Quality Assessment (QA) algorithms assess the perceptual quality of 3D-synthesized views by identifying "black-hole" artifacts. However, advancements in rendering and inpainting techniques have made black-hole artifacts nearly obsolete. Instead, 3D-synthesized views frequently suffer from stretching artifacts caused by occlusion, which in turn affect perceptual quality. Existing QA algorithms are inefficient at identifying these artifacts, as evidenced by their performance on the IETR dataset. We found, empirically, that the number of blocks with stretching artifacts in a view is related to its overall perceptual quality. Building on this observation, we propose a Convolutional Neural Network (CNN) based algorithm that identifies the blocks with stretching artifacts and incorporates their count to predict the quality of 3D-synthesized views. To address the small size of existing 3D-synthesized view datasets, we collect images from other related datasets to increase the sample size and improve generalization while training the proposed CNN. The proposed algorithm identifies blocks with stretching distortions and fuses them to predict perceptual quality without a reference, improving on existing no-reference QA algorithms that are not trained on the IETR dataset. It can also identify the blocks with stretching artifacts efficiently, which can be used in downstream applications to improve the quality of 3D views. Our source code is available online: https://github.com/sadbhawnathakur/3D-Image-Quality-Assessment.
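A toy version of the block-counting idea described above; a simple gradient-magnitude threshold stands in for the paper's trained CNN classifier, and the block size and threshold are arbitrary assumptions.

```python
# Sketch: split the view into fixed-size blocks, flag candidate stretching
# blocks, and use the count as the quality cue (the thresholding rule below
# is a placeholder for a trained per-block classifier).
import numpy as np

def count_stretched_blocks(image, block=32, grad_thresh=0.02):
    """Flag low-gradient blocks as candidate stretching artifacts."""
    h, w = image.shape
    count = 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = image[y:y + block, x:x + block]
            gy, gx = np.gradient(patch.astype(np.float64))
            # Stretching smears texture, so mean gradient magnitude drops.
            if np.hypot(gy, gx).mean() < grad_thresh:
                count += 1
    return count

rng = np.random.default_rng(0)
print(count_stretched_blocks(rng.random((128, 128))))
```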
6
Sandic-Stankovic DD, Kukolj DD, Le Callet P. Quality Assessment of DIBR-Synthesized Views Based on Sparsity of Difference of Closings and Difference of Gaussians. IEEE Transactions on Image Processing 2022; 31:1161-1175. [PMID: 34990360] [DOI: 10.1109/TIP.2021.3139238]
Abstract
Images synthesized using depth-image-based rendering (DIBR) techniques may suffer from complex structural distortions. The primary visual cortex and other parts of the brain reduce redundancies in the input visual signal to discover the intrinsic image structure and thus create a sparse image representation. The human visual system (HVS) processes images at several scales and levels of resolution when perceiving a visual scene. In an attempt to emulate these properties of the HVS, we design a no-reference model for the quality assessment of DIBR-synthesized views. To extract higher-order structure of high curvature, which corresponds to the shape distortions to which the HVS is highly sensitive, we define a morphological oriented Difference of Closings (DoC) operator and apply it at multiple scales and resolutions. The DoC operator nonlinearly removes redundancies and extracts fine-grained details, the texture of local image structure, and contrast, to which the HVS is highly sensitive. We introduce a new feature based on the sparsity of the DoC band. To extract perceptually important low-order structural information (edges), we use the non-oriented Difference of Gaussians (DoG) operator at different scales and resolutions, and compute a measure of sparsity for the DoG bands to obtain scalar features. To model the relationship between the extracted features and subjective scores, a general regression neural network (GRNN) is used. Quality predictions by the proposed DoC-DoG-GRNN model show higher agreement with perceptual quality scores than the tested state-of-the-art metrics on four benchmark datasets of synthesized views: the IRCCyN/IVC image and video datasets, the MCL-3D stereoscopic image dataset, and the IST image dataset.
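An isotropic, single-scale sketch of the two operators and a sparsity feature; the paper's DoC is oriented and applied across multiple scales and resolutions, and the Hoyer sparsity measure used here is one standard choice, not necessarily the paper's.

```python
# Sketch of DoC and DoG bands plus a scalar sparsity feature per band
# (filter sizes, sigmas, and the sparsity measure are illustrative).
import numpy as np
from scipy.ndimage import grey_closing, gaussian_filter

def doc_band(img, small=3, large=7):
    """Difference of Closings: emphasizes high-curvature structure."""
    return grey_closing(img, size=(large, large)) - grey_closing(img, size=(small, small))

def dog_band(img, sigma1=1.0, sigma2=2.0):
    """Difference of Gaussians: emphasizes edges."""
    return gaussian_filter(img, sigma1) - gaussian_filter(img, sigma2)

def sparsity(band, eps=1e-12):
    """Hoyer-style sparsity in [0, 1]: 1 means a single nonzero coefficient."""
    v = band.ravel()
    n = v.size
    l1, l2 = np.abs(v).sum(), np.sqrt((v ** 2).sum()) + eps
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
features = [sparsity(doc_band(img)), sparsity(dog_band(img))]
print(features)   # scalar features that would feed the GRNN regressor
```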
7
Tang L, Tian C, Meng Y, Xu K. Longitudinal evaluation for COVID-19 chest CT disease progression based on Tchebichef moments. International Journal of Imaging Systems and Technology 2021; 31:1120-1127. [PMID: 34219952] [PMCID: PMC8239802] [DOI: 10.1002/ima.22583]
Abstract
Blur is a key property in the perception of COVID-19 computed tomography (CT) image manifestations. Typically, blur causes edge extension, which changes the shapes of infection regions. Tchebichef moments (TM) have proven effective for shape representation. Intuitively, the disease progression of the same patient over the course of treatment appears as different degrees of blur in the infection regions; since different blur degrees change the magnitudes of the TM computed on the infection-region image, this blur can be captured by TM. Based on this observation, a longitudinal objective quantitative evaluation method for COVID-19 disease progression based on TM is proposed. A COVID-19 disease progression CT image database (COVID-19 DPID) is built, with radiologist subjective ratings and manual contouring, to test and compare disease progression on CT images acquired from the same patient over time. The images are preprocessed with automatic lung segmentation, longitudinal registration, and slice fusion, yielding a fused slice image with a region of interest (ROI). Next, the gradient of the fused ROI image is calculated to represent shape. The gradient image of the fused ROI is divided into equally sized blocks, and the energy of each block is calculated as the quadratic sum of its non-DC moment values. Finally, the objective assessment score is obtained by applying the block variances to the normalized TM energy. Experiments conducted on COVID-19 DPID indicate that the proposed metric correlates well with subjective evaluation scores, demonstrating its effectiveness in the quantitative evaluation of COVID-19 disease progression.
Affiliation(s)
- Lu Tang
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Chuangeng Tian
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, China
- Yankai Meng
- Department of Radiology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Kai Xu
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Department of Radiology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
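The TM block-energy feature described in the abstract above can be sketched as follows. The orthonormal discrete Tchebichef basis is obtained numerically via QR factorization of a Vandermonde matrix (Gram-Schmidt under the uniform discrete weight yields these polynomials up to sign), and the block size and moment order are illustrative assumptions.

```python
# Sketch: energy of an image block as the quadratic sum of its non-DC
# 2D Tchebichef moments.
import numpy as np

def tchebichef_basis(n, order):
    """Columns: orthonormal discrete Tchebichef polynomials on {0,...,n-1}."""
    V = np.vander(np.arange(n, dtype=np.float64), order, increasing=True)
    Q, _ = np.linalg.qr(V)   # orthonormalize 1, x, x^2, ... under uniform weight
    return Q

def block_energy(block, order=4):
    """Quadratic sum of non-DC 2D Tchebichef moments of one square block."""
    Q = tchebichef_basis(block.shape[0], order)
    M = Q.T @ block @ Q                            # 2D moments up to the given order
    return float((M ** 2).sum() - M[0, 0] ** 2)    # drop the DC term

rng = np.random.default_rng(0)
blocks = rng.random((4, 8, 8))          # e.g., 8x8 blocks of a gradient image
print([block_energy(b) for b in blocks])
```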
8
de Oliveira AQ, Silveira TLTD, Walter M, Jung CR. A Hierarchical Superpixel-Based Approach for DIBR View Synthesis. IEEE Transactions on Image Processing 2021; 30:6408-6419. [PMID: 34214037] [DOI: 10.1109/TIP.2021.3092817]
Abstract
View synthesis allows observers to explore static scenes using aligned color images and depth maps captured along a preset camera path. Among the available options, depth-image-based rendering (DIBR) approaches are effective and efficient because only one pair of color image and depth map is required, saving storage and bandwidth. The present work proposes a novel DIBR pipeline for view synthesis that properly handles the different artifacts that arise from 3D warping, such as cracks, disocclusions, ghosts, and out-of-field areas. A key aspect of our contribution is the adaptation and use of a hierarchical image superpixel algorithm that helps to maintain structural characteristics of the scene during image reconstruction. We compare our approach with state-of-the-art methods and show that it attains the best average results on two common assessment metrics over public still-image and video-sequence datasets. Visual results are also provided, illustrating the potential of our technique in real-world applications.
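For context, a bare-bones version of the 3D-warping step that produces the artifacts this pipeline must repair: a purely horizontal forward warp with simplified camera geometry and no z-buffering, so it is only a sketch of DIBR, not the paper's pipeline.

```python
# Minimal DIBR forward warp: shift each pixel by disparity = baseline*focal/depth
# and mark unfilled pixels as holes (cracks and disocclusions).
import numpy as np

def forward_warp(color, depth, baseline_focal=8.0):
    """Return the warped view and a boolean mask of hole pixels."""
    h, w = depth.shape
    out = np.zeros_like(color)
    filled = np.zeros((h, w), dtype=bool)
    disparity = (baseline_focal / np.maximum(depth, 1e-3)).astype(int)
    for y in range(h):
        for x in range(w):                 # last-write-wins splatting, no z-buffer
            xt = x + disparity[y, x]
            if 0 <= xt < w:
                out[y, xt] = color[y, x]
                filled[y, xt] = True
    return out, ~filled

rng = np.random.default_rng(0)
color = rng.random((32, 32))
depth = rng.random((32, 32)) + 0.5
warped, holes = forward_warp(color, depth)
print(holes.sum(), "hole pixels to reconstruct (e.g., with superpixel guidance)")
```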
9
Jin C, Peng Z, Zou W, Chen F, Jiang G, Yu M. No-Reference Quality Assessment for 3D Synthesized Images Based on Visual-Entropy-Guided Multi-Layer Features Analysis. Entropy (Basel, Switzerland) 2021; 23:770. [PMID: 34207229] [PMCID: PMC8233917] [DOI: 10.3390/e23060770]
Abstract
Multiview video plus depth is one of the mainstream representations of 3D scenes in emerging free-viewpoint video, which generates virtual 3D-synthesized images through a depth-image-based rendering (DIBR) technique. However, inaccurate depth maps and imperfect DIBR techniques produce various geometric distortions that seriously degrade the user's visual perception. An effective 3D-synthesized image quality assessment (IQA) metric can simulate human visual perception and determine the application feasibility of synthesized content. In this paper, a no-reference IQA metric for 3D-synthesized images based on visual-entropy-guided multi-layer feature analysis is proposed. According to energy entropy, the geometric distortions are divided into two visual attention layers: a bottom-up layer and a top-down layer. On the bottom-up layer, the feature of salient distortion is measured by regional proportion combined with a transition threshold. In parallel, the key distribution regions of insignificant geometric distortion are extracted by a relative total variation model, and the features of these distortions are measured by the interaction of decentralized and concentrated attention on the top-down layer. By integrating the features of both layers, a more perceptually consistent quality evaluation model is built. Experimental results show that the proposed method is superior to the state of the art in assessing the quality of 3D-synthesized images.
Affiliation(s)
- Chongchong Jin
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- Zongju Peng
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
- Wenhui Zou
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- Fen Chen
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
- Gangyi Jiang
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- Mei Yu
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
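A sketch of one plausible energy-entropy computation of the kind used in the abstract above to separate the two attention layers; the gradient-energy definition and block size are assumptions, not the paper's formulation.

```python
# Sketch: Shannon entropy of the normalized per-block gradient-energy
# distribution of an image (all parameters are illustrative).
import numpy as np

def block_energy_entropy(image, block=16):
    """Entropy of how gradient energy is distributed across image blocks."""
    gy, gx = np.gradient(image.astype(np.float64))
    energy_map = gy ** 2 + gx ** 2
    h, w = image.shape
    energies = [energy_map[y:y + block, x:x + block].sum()
                for y in range(0, h - block + 1, block)
                for x in range(0, w - block + 1, block)]
    p = np.asarray(energies)
    p = p / (p.sum() + 1e-12)                      # normalize to a distribution
    return float(-(p * np.log2(p + 1e-12)).sum())  # high entropy = diffuse energy

rng = np.random.default_rng(0)
print(block_energy_entropy(rng.random((64, 64))))
```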
10
Xi X, Meng X, Qin Z, Nie X, Yin Y, Chen X. IA-net: informative attention convolutional neural network for choroidal neovascularization segmentation in OCT images. Biomedical Optics Express 2020; 11:6122-6136. [PMID: 33282479] [PMCID: PMC7687935] [DOI: 10.1364/BOE.400816]
Abstract
Choroidal neovascularization (CNV) is a characteristic feature of wet age-related macular degeneration (AMD). Quantification of CNV is useful to clinicians in the diagnosis and treatment of CNV disease, and requires the CNV lesion to first be delineated by automatic segmentation. Recently, deep learning methods have achieved significant success in medical image segmentation. However, some CNVs are small objects that are hard to discriminate, which degrades performance, and the complicated characteristics of CNV in OCT images make it difficult to train an effective network for accurate segmentation. To tackle these two challenges, this paper proposes a novel Informative Attention convolutional neural Network (IA-net) for automatic CNV segmentation in OCT images. Because the attention mechanism can enhance the discriminative power of interesting regions in the feature maps, we develop an attention enhancement block that introduces an additional attention constraint. It forces the model to pay high attention to CNV in the learned feature maps, improving the discriminative ability of the learned CNV features, which is useful for improving segmentation performance on small CNV. For accurate pixel classification, a novel informative loss is proposed that incorporates an informative attention map. It focuses training on a set of informative samples that are difficult to predict, so the trained model learns enough information to classify these samples, further improving performance. Experimental results on our database demonstrate that the proposed method outperforms traditional CNV segmentation methods.
Affiliation(s)
- Xiaoming Xi
- School of Computer Science and Technology, Shandong Jianzhu University, 250101, China
- Xianjing Meng
- School of Computer Science and Technology, Shandong University of Finance and Economics, 250014, China
- Zheyun Qin
- School of Software, Shandong University, 250101, China
- Xiushan Nie
- School of Computer Science and Technology, Shandong Jianzhu University, 250101, China
- Yilong Yin
- School of Software, Shandong University, 250101, China
- Xinjian Chen
- School of Electronic and Information Engineering, Soochow University, 215006, China
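A sketch of a pixel-wise loss reweighted by an informative attention map that up-weights hard-to-predict pixels, in the spirit of the informative loss described above; the exact weighting form here is an assumption (it resembles a focal-style modulation), not the paper's formulation.

```python
# Sketch: binary cross-entropy where each pixel is weighted by how wrong
# the current prediction is, so training focuses on informative samples.
import torch
import torch.nn.functional as F

def informative_loss(logits, target, gamma=2.0):
    """BCE modulated by a per-pixel 'informativeness' map (illustrative form)."""
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    attention = (prob - target).abs() ** gamma   # high on mispredicted pixels
    return (attention * bce).mean()

logits = torch.randn(2, 1, 64, 64)                  # toy segmentation logits
target = (torch.rand(2, 1, 64, 64) > 0.9).float()   # toy sparse CNV masks
print(float(informative_loss(logits, target)))
```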
11
Abstract
In this paper, we propose a no-reference image quality assessment (NR-IQA) approach for authentically distorted images based on expanding proxy labels. To distinguish them from human labels, we define the quality scores generated by a traditional NR-IQA algorithm as “proxy labels”: “proxy” means the objective results are computed from extracted image features rather than obtained by human judgment. To address the limited size of image quality assessment (IQA) datasets, we adopt a cascading transfer-learning method. First, we obtain large numbers of proxy labels denoting the quality of authentically distorted images using a traditional no-reference IQA method. Then a deep network is trained on the proxy labels to learn IQA-related knowledge from the large set of scored images. Finally, we use fine-tuning to inherit the knowledge represented in the trained network. During this procedure, the learned mapping fits human visual perception more closely. Experimental results demonstrate that the proposed algorithm outperforms existing algorithms: on the LIVE In the Wild Image Quality Challenge database and the KonIQ-10k database (two standard databases for authentically distorted image quality assessment), it achieves good consistency between human visual perception and the predicted quality scores of authentically distorted images.
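The two-stage recipe above can be sketched as follows, with a toy network and random tensors standing in for the real images, the traditional proxy scorer, and the IQA databases.

```python
# Sketch of cascading transfer learning: pretrain a quality regressor on
# cheap proxy labels, then fine-tune the same network on scarce human labels.
import torch
from torch import nn

net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128),
                    nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def fit(images, labels, steps):
    """Simple regression loop; returns the final loss value."""
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(net(images).squeeze(1), labels)
        loss.backward()
        opt.step()
    return float(loss)

proxy_imgs, proxy_y = torch.randn(256, 3, 32, 32), torch.rand(256)  # proxy scores
human_imgs, human_y = torch.randn(32, 3, 32, 32), torch.rand(32)    # scarce MOS labels

print("proxy stage:", fit(proxy_imgs, proxy_y, 50))   # stage 1: learn from proxies
print("fine-tune:", fit(human_imgs, human_y, 20))     # stage 2: inherit and adapt
```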
12
Wang G, Wang Z, Gu K, Li L, Xia Z, Wu L. Blind Quality Metric of DIBR-Synthesized Images in the Discrete Wavelet Transform Domain. IEEE Transactions on Image Processing 2019; 29:1802-1814. [PMID: 31613757] [DOI: 10.1109/TIP.2019.2945675]
Abstract
Free-viewpoint video (FVV) has received considerable attention owing to its widespread applications in areas such as immersive entertainment, remote surveillance, and distance education. Since FVV images are synthesized via a depth-image-based rendering (DIBR) procedure in a "blind" environment (without reference images), a real-time and reliable blind quality assessment metric is urgently required. However, existing image quality assessment metrics are insensitive to the geometric distortions engendered by DIBR. In this research, a novel blind quality metric for DIBR-synthesized images is proposed based on measuring geometric distortion, global sharpness, and image complexity. First, a DIBR-synthesized image is decomposed into wavelet subbands using the discrete wavelet transform. Then, the Canny operator is employed to detect the edges of the binarized low-frequency subband and the high-frequency subbands, and the edge similarities between them are computed to quantify geometric distortions in DIBR-synthesized images. Second, the log-energies of the wavelet subbands are calculated to evaluate global sharpness. Third, a hybrid filter combining autoregressive and bilateral filters is adopted to compute image complexity. Finally, the overall quality score is derived by normalizing the geometric distortion and global sharpness scores by the image complexity. Experiments show that the proposed method is superior to competing reference-free state-of-the-art DIBR-synthesized image quality models.
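A sketch of the first two cues (edge similarity between wavelet subbands and subband log-energy), using PyWavelets and scikit-image; Dice overlap of Canny edge maps stands in for the paper's edge-similarity measure, the explicit binarization step is folded into Canny's own thresholding, and the autoregressive/bilateral complexity filter is omitted.

```python
# Sketch: one-level Haar decomposition, Canny edge similarity between the
# low-frequency and each high-frequency subband, and subband log-energy.
import numpy as np
import pywt
from skimage.feature import canny

def wavelet_cues(image):
    """Return (edge-similarity cues, log-energy cues) over the HF subbands."""
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(np.float64), "haar")
    low_edges = canny(cA)
    similarities, log_energies = [], []
    for band in (cH, cV, cD):
        band_edges = canny(band)
        inter = np.logical_and(low_edges, band_edges).sum()
        dice = 2.0 * inter / (low_edges.sum() + band_edges.sum() + 1e-12)
        similarities.append(dice)                          # geometric-distortion cue
        log_energies.append(np.log1p((band ** 2).mean()))  # global-sharpness cue
    return similarities, log_energies

rng = np.random.default_rng(0)
sims, energies = wavelet_cues(rng.random((64, 64)))
print(sims, energies)
```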