1. Huang Y, Li L, Chen P, Wu H, Lin W, Shi G. Multi-Modality Multi-Attribute Contrastive Pre-Training for Image Aesthetics Computing. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:1205-1218. PMID: 39504278. DOI: 10.1109/tpami.2024.3492259.
Abstract
In the Image Aesthetics Computing (IAC) field, most prior methods leverage off-the-shelf backbones pre-trained on the large-scale ImageNet database. While these pre-trained backbones have achieved notable success, they often overemphasize object-level semantics and fail to capture the high-level concepts of image aesthetics, which can lead to suboptimal performance. To tackle this long-neglected problem, we propose a multi-modality multi-attribute contrastive pre-training framework that constructs an alternative to ImageNet-based pre-training for IAC. The proposed framework consists of two main aspects. 1) We build a multi-attribute image description database with human feedback, leveraging the strong image understanding capability of a multi-modality large language model to generate rich aesthetic descriptions. 2) To better adapt models to aesthetic computing tasks, we integrate image-based visual features with attribute-based text features and map the integrated features into different embedding spaces, on which multi-attribute contrastive learning is performed to obtain a more comprehensive aesthetic representation. To alleviate the distribution shift encountered when transitioning from the general visual domain to the aesthetic domain, we further propose a semantic affinity loss that constrains content information and enhances model generalization. Extensive experiments demonstrate that the proposed framework sets a new state of the art for IAC tasks.
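As a rough illustration of the multi-attribute contrastive objective described above, the following PyTorch sketch computes one CLIP-style InfoNCE loss per aesthetic attribute, each in its own embedding space, and sums them. The attribute set, projection heads, and feature dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss between paired image and text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))               # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical attribute-specific projection heads (one embedding space per attribute).
attributes = ["composition", "color", "lighting"]          # assumed attribute set
dim_img, dim_txt, dim_emb, batch = 2048, 768, 256, 32
img_heads = {a: torch.nn.Linear(dim_img, dim_emb) for a in attributes}
txt_heads = {a: torch.nn.Linear(dim_txt, dim_emb) for a in attributes}

img_feat = torch.randn(batch, dim_img)                            # backbone visual features
txt_feat = {a: torch.randn(batch, dim_txt) for a in attributes}   # per-attribute text features

# Multi-attribute contrastive loss: sum of per-attribute InfoNCE terms.
loss = sum(info_nce(img_heads[a](img_feat), txt_heads[a](txt_feat[a])) for a in attributes)
print(float(loss))
```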
2. Cherepkova O, Amirshahi SA, Pedersen M. Individual Contrast Preferences in Natural Images. J Imaging 2024; 10:25. PMID: 38249010. PMCID: PMC10817677. DOI: 10.3390/jimaging10010025.
Abstract
This paper investigates personalized image quality assessment, focusing on individual contrast preferences for natural images. To achieve this objective, we conducted an in-lab experiment in which 22 observers assessed 499 natural images and recorded their preferred contrast levels. We used a three-alternative forced choice comparison approach coupled with a modified adaptive staircase algorithm to dynamically adjust the contrast for each new triplet. Through cluster analysis, we grouped observers into three clusters based on their preferred contrast ranges: low contrast, natural contrast, and high contrast. This finding demonstrates the existence of individual variations in contrast preferences among observers. To facilitate further research in personalized image quality assessment, we have created a database containing 10,978 original contrast level values preferred by observers, which is publicly available online.
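For intuition, the sketch below simulates how an adaptive staircase can converge on an observer's preferred contrast level in a triplet (3AFC) setting. The step-size schedule and reversal rule are assumptions; the study uses its own modified staircase.

```python
def adaptive_staircase(preferred=1.3, start=1.0, step=0.4, min_step=0.05, trials=40):
    """Toy adaptive staircase for a 3AFC contrast-preference task.

    `preferred` stands in for the observer's (unknown) favourite contrast gain;
    the simulated observer simply picks whichever candidate is closest to it.
    The step size is halved after a reversal or when the middle level wins.
    """
    level, last_direction = start, 0
    for _ in range(trials):
        candidates = [level - step, level, level + step]      # present a triplet
        choice = min(candidates, key=lambda c: abs(c - preferred))
        direction = (choice > level) - (choice < level)       # -1, 0 or +1
        if direction == 0 or (last_direction and direction != last_direction):
            step = max(step / 2, min_step)                    # refine around the preference
        level = choice
        last_direction = direction or last_direction
    return level

print(round(adaptive_staircase(), 3))   # converges near the simulated preference (1.3)
```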
Affiliation(s)
- Olga Cherepkova
- Department of Computer Science, Norwegian University of Science and Technology, 2802 Gjøvik, Norway; (S.A.A.); (M.P.)
3. Yang H, Chen J. Art appreciation model design based on improved PageRank and ECA-ResNeXt50 algorithm. PeerJ Comput Sci 2023; 9:e1734. PMID: 38192472. PMCID: PMC10773910. DOI: 10.7717/peerj-cs.1734.
Abstract
Image sentiment analysis technology can predict, measure, and understand the emotional experience of human beings through images. Targeting the problem of extracting emotional characteristics in art appreciation, this article puts forward an innovative method. First, the PageRank algorithm is enhanced using tweet content similarity and time factors; second, following the SE-ResNet design, Efficient Channel Attention (ECA) is integrated with the residual network structure, and ResNeXt50 is optimized to enhance the extraction of image sentiment features. Finally, the weight coefficients of overall emotions are dynamically adjusted to select a specific emotion incorporation strategy, resulting in effective bimodal fusion. The proposed model demonstrates strong performance in predicting sentiment labels, with maximum classification accuracy reaching 88.20%, an improvement of 21.34% over a traditional deep convolutional neural network (DCNN) model. This research strengthens emotion feature extraction from both images and text and improves the accuracy of emotion fusion classification.
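A minimal sketch of a PageRank variant whose edge weights are modulated by content similarity and a time (recency) factor, in the spirit of the enhancement described above; the exact weighting used by the authors is not specified in the abstract and is assumed here.

```python
import numpy as np

def weighted_pagerank(adj, similarity, recency, d=0.85, iters=100):
    """PageRank variant whose transition weights are modulated by
    content similarity and a recency (time-decay) factor.

    adj:        (n, n) 0/1 link matrix, adj[i, j] = 1 if node i links to node j
    similarity: (n, n) pairwise content-similarity scores in [0, 1]
    recency:    (n,)   per-node time factor in (0, 1], newer -> larger
    """
    w = adj * similarity * recency[None, :]             # weight each edge i -> j
    row_sum = w.sum(axis=1, keepdims=True)
    # Normalize rows; dangling nodes fall back to a uniform distribution.
    w = np.divide(w, row_sum, out=np.full_like(w, 1.0 / len(w)), where=row_sum > 0)
    rank = np.full(len(w), 1.0 / len(w))
    for _ in range(iters):
        rank = (1 - d) / len(w) + d * rank @ w           # standard power iteration
    return rank

rng = np.random.default_rng(0)
n = 5
adj = (rng.random((n, n)) > 0.5).astype(float)
sim = rng.random((n, n))
rec = rng.random(n) * 0.5 + 0.5
print(weighted_pagerank(adj, sim, rec).round(3))
```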
Affiliation(s)
- Hang Yang
- School of Journalism, Qinghai Normal University, Xining, Qinghai, China
- Jingyao Chen
- The Graduate School of Namseoul University, Cheonan, Republic of Korea
4. Jia G, Li P, He R. Theme-Aware Aesthetic Distribution Prediction With Full-Resolution Photographs. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:8654-8668. PMID: 35245201. DOI: 10.1109/tnnls.2022.3151787.
Abstract
Aesthetic quality assessment (AQA) is a challenging task due to complex aesthetic factors. Currently, it is common to conduct AQA using deep neural networks (DNNs) that require fixed-size inputs. The existing methods mainly transform images by resizing, cropping, and padding or use adaptive pooling to alternately capture the aesthetic features from fixed-size inputs. However, these transformations potentially damage aesthetic features. To address this issue, we propose a simple but effective method to accomplish full-resolution image AQA by combining image padding with region of image (RoM) pooling. Padding turns inputs into the same size. RoM pooling pools image features and discards extra padded features to eliminate the side effects of padding. In addition, the image aspect ratios are encoded and fused with visual features to remedy the shape information loss of RoM pooling. Furthermore, we observe that the same image may receive different aesthetic evaluations under different themes, which we call the theme criterion bias. Hence, a theme-aware model that uses theme information to guide model predictions is proposed. Finally, we design an attention-based feature fusion module to effectively use both the shape and theme information. Extensive experiments prove the effectiveness of the proposed method over state-of-the-art methods.
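The padding-plus-masked-pooling idea can be pictured as follows: pad each image to a common size, track which positions are real content, and pool features only over those positions. This PyTorch sketch is a simplified stand-in, not the paper's RoM pooling operator or aspect-ratio encoding.

```python
import torch

def pad_and_mask(images, target_h, target_w):
    """Zero-pad variable-size images to a common size and record a validity mask."""
    batch = torch.zeros(len(images), images[0].size(0), target_h, target_w)
    mask = torch.zeros(len(images), 1, target_h, target_w)
    for i, img in enumerate(images):
        _, h, w = img.shape
        batch[i, :, :h, :w] = img
        mask[i, :, :h, :w] = 1.0
    return batch, mask

def masked_pool(feat, mask):
    """Average-pool feature maps while ignoring padded positions.

    feat: (B, C, H', W') backbone feature maps; mask: (B, 1, H, W) validity mask,
    downsampled here to the feature resolution before pooling.
    """
    m = torch.nn.functional.adaptive_max_pool2d(mask, feat.shape[-2:])  # mark valid cells
    return (feat * m).sum(dim=(2, 3)) / m.sum(dim=(2, 3)).clamp(min=1.0)

imgs = [torch.randn(3, 240, 320), torch.randn(3, 180, 180)]   # different aspect ratios
x, mask = pad_and_mask(imgs, 240, 320)
feats = torch.nn.functional.avg_pool2d(x, kernel_size=16)     # stand-in for a CNN backbone
print(masked_pool(feats, mask).shape)                          # -> torch.Size([2, 3])
```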
5. Yang J, Zhou Y, Zhao Y, Lu W, Gao X. MetaMP: Metalearning-Based Multipatch Image Aesthetics Assessment. IEEE Transactions on Cybernetics 2023; 53:5716-5728. PMID: 35580097. DOI: 10.1109/tcyb.2022.3169017.
Abstract
Image aesthetics assessment (IAA) is a subjective and complex task. The aesthetics of different themes vary greatly in both content and aesthetic outcome, whether or not the themes belong to the same aesthetic community. In aesthetic evaluation tasks, a pretrained network with direct fine-tuning may not adapt quickly to tasks on various themes. This article introduces a metalearning-based multipatch (MetaMP) IAA method that adapts quickly to various thematic tasks. The network is trained with metalearning to obtain content-oriented aesthetic representations. In addition, we design a complete-information patch selection scheme and a multipatch (MP) network so that fine details remain consistent with the overall impression. Experimental results demonstrate the superiority of the proposed method over state-of-the-art models on the aesthetic visual analysis (AVA) benchmark datasets. The evaluation also shows the effectiveness of our metalearning training scheme, which not only improves MetaMP assessment accuracy but also provides valuable guidance for network initialization in IAA.
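To illustrate the complete-information multipatch idea, the sketch below tiles an image into a grid of patches that jointly cover the whole picture and averages per-patch scores from a shared encoder. The tiny encoder and the plain averaging are placeholders, not the MetaMP architecture.

```python
import torch
import torch.nn as nn

def complete_patches(img, patch=224):
    """Tile an image into a grid of non-overlapping patches so that, together,
    the patches cover the whole picture (a 'complete-information' selection)."""
    _, h, w = img.shape
    rows, cols = max(h // patch, 1), max(w // patch, 1)
    img = torch.nn.functional.interpolate(img[None], size=(rows * patch, cols * patch),
                                          mode="bilinear", align_corners=False)[0]
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)   # (C, rows, cols, p, p)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, img.size(0), patch, patch)

class MultiPatchScorer(nn.Module):
    """Shared encoder applied to every patch; patch scores are averaged."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 7, stride=4), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

    def forward(self, patches):                 # patches: (N, 3, p, p)
        return self.encoder(patches).mean()     # aggregate fine details into one score

img = torch.randn(3, 500, 700)
patches = complete_patches(img)
print(patches.shape, MultiPatchScorer()(patches).item())
```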
6. Li L, Zhi T, Shi G, Yang Y, Xu L, Li Y, Guo Y. Anchor-based Knowledge Embedding for Image Aesthetics Assessment. Neurocomputing 2023. DOI: 10.1016/j.neucom.2023.03.058.
7. Song Y, Tang F, Dong W, Huang F, Lee TY, Xu C. Balance-Aware Grid Collage for Small Image Collections. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1330-1344. PMID: 34529567. DOI: 10.1109/tvcg.2021.3113031.
Abstract
Grid collages (GClg) of small image collections are popular and useful in many applications, such as personal album management, online photo posting, and graphic design. In this article, we focus on how visual effects influence individual preferences through various arrangements of multiple images under such scenarios. A novel balance-aware metric is proposed to bridge the gap between multi-image joint presentation and visual pleasure. The metric brings established findings from psychology into the field of grid collage. To capture user preference, a bonus mechanism related to a user-specified special location in the grid and the uniqueness values of the subimages is integrated into the metric. An end-to-end reinforcement learning mechanism empowers the model without tedious manual annotations. Experiments demonstrate that our metric evaluates GClg visual balance in line with human subjective perception and that the model generates visually pleasant GClg results comparable to manual designs.
8. Hentschel S, Kobs K, Hotho A. CLIP knows image aesthetics. Front Artif Intell 2022; 5:976235. PMID: 36504688. PMCID: PMC9732445. DOI: 10.3389/frai.2022.976235.
Abstract
Most Image Aesthetic Assessment (IAA) methods use a pretrained ImageNet classification model as a base to fine-tune. We hypothesize that content classification is not an optimal pretraining task for IAA, since the task discourages the extraction of features that are useful for IAA, e.g., composition, lighting, or style. On the other hand, we argue that the Contrastive Language-Image Pretraining (CLIP) model is a better base for IAA models, since it has been trained using natural language supervision. Due to the rich nature of language, CLIP needs to learn a broad range of image features that correlate with sentences describing the image content, composition, environments, and even subjective feelings about the image. While it has been shown that CLIP extracts features useful for content classification tasks, its suitability for tasks that require the extraction of style-based features like IAA has not yet been demonstrated. We test our hypothesis by conducting a three-step study, investigating the usefulness of features extracted by CLIP compared to features obtained from the last layer of a comparable ImageNet classification model. Each step is more computationally expensive than the previous one. First, we engineer natural language prompts that let CLIP assess an image's aesthetic without adjusting any weights in the model. To overcome the limitation that CLIP prompting is only applicable to classification tasks, we propose a simple but effective strategy to convert multiple prompts to a continuous scalar as required when predicting an image's mean aesthetic score. Second, we train a linear regression on the AVA dataset using image features obtained by CLIP's image encoder. The resulting model outperforms a linear regression trained on features from an ImageNet classification model. It also shows competitive performance with fully fine-tuned networks based on ImageNet, while only training a single layer. Finally, by fine-tuning CLIP's image encoder on the AVA dataset, we show that CLIP only needs a fraction of training epochs to converge, while also performing better than a fine-tuned ImageNet model. Overall, our experiments suggest that CLIP is better suited as a base model for IAA methods than ImageNet pretrained networks.
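A minimal sketch of the prompting step, using the public OpenAI CLIP package: several graded prompts are scored against the image, and the resulting probabilities are collapsed into a continuous mean aesthetic score via probability-weighted anchor values. The prompts, anchor scores, and file name are illustrative assumptions, not the paper's exact prompt engineering.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical graded prompts; the paper engineers its own.
prompts = ["a very beautiful photo", "a beautiful photo", "an ugly photo", "a very ugly photo"]
anchor_scores = torch.tensor([9.0, 7.0, 3.0, 1.0], device=device)   # assumed score per prompt

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)       # prompt probabilities

# Convert the classification-style output into a continuous aesthetic score
# by taking the probability-weighted average of the anchor scores.
mean_score = (probs * anchor_scores).sum().item()
print(f"predicted mean aesthetic score: {mean_score:.2f}")
```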
9. Yang J, Li J, Li L, Wang X, Ding Y, Gao X. Seeking Subjectivity in Visual Emotion Distribution Learning. IEEE Transactions on Image Processing 2022; 31:5189-5202. PMID: 35914042. DOI: 10.1109/tip.2022.3193749.
Abstract
Visual Emotion Analysis (VEA), which aims to predict people's emotions towards different visual stimuli, has become an attractive research topic recently. Rather than a single-label classification task, it is more rational to regard VEA as a Label Distribution Learning (LDL) problem built from the votes of different individuals. Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in the crowd voting process. In psychology, the Object-Appraisal-Emotion model has demonstrated that each individual's emotion is affected by his/her subjective appraisal, which is further shaped by affective memory. Inspired by this, we propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution. To depict the diversity in the crowd voting process, we first propose Subjectivity Appraising with multiple branches, where each branch simulates the emotion evocation process of a specific individual. Specifically, we construct the affective memory with an attention-based mechanism to preserve each individual's unique emotional experience. A subjectivity loss is further proposed to guarantee the divergence between different individuals. Moreover, we propose Subjectivity Matching with a matching loss, aiming at assigning unordered emotion labels to ordered individual predictions in a one-to-one correspondence with the Hungarian algorithm. Extensive experiments and comparisons are conducted on public visual emotion distribution datasets, and the results demonstrate that the proposed SAMNet consistently outperforms the state-of-the-art methods. Ablation studies verify the effectiveness of our method, and visualizations demonstrate its interpretability.
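The subjectivity-matching step can be illustrated with SciPy's Hungarian solver: given a cost matrix between ordered branch predictions and unordered labels, it returns the minimum-cost one-to-one assignment. The scalar toy values below are assumptions; SAMNet's matching loss operates on emotion distributions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy example: 4 branch predictions (ordered) vs. 4 ground-truth emotion labels
# (unordered). Each entry of `cost` is the disagreement between one branch's
# prediction and one label; the Hungarian algorithm finds the assignment
# with minimum total cost.
predictions = np.array([0.9, 0.2, 0.6, 0.4])      # e.g. per-branch predicted intensities
labels = np.array([0.5, 1.0, 0.0, 0.3])           # unordered individual votes

cost = np.abs(predictions[:, None] - labels[None, :])   # (branches, labels) cost matrix
row_idx, col_idx = linear_sum_assignment(cost)           # optimal one-to-one matching
print(list(zip(row_idx.tolist(), col_idx.tolist())), cost[row_idx, col_idx].sum())
```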
10.
Abstract
Image quality assessment (IQA) aims to automatically evaluate image perceptual quality by simulating the human visual system, which is an important research topic in the field of image processing and computer vision. Although existing deep-learning-based IQA models have achieved significant success, these IQA models usually require fixed-size input images, and resizing to that fixed size alters the perceptual quality of images. To this end, this paper proposes an aspect-ratio-embedded Transformer-based image quality assessment method, which implants the adaptive aspect ratios of input images into the multihead self-attention module of the Swin Transformer. In this way, the proposed IQA model not only mitigates the perceptual-quality variation caused by resizing input images but also leverages more global content correlations to infer image perceptual quality. Furthermore, to comprehensively capture the impact of low-level and high-level features on image quality, the proposed IQA model combines the output features of multistage Transformer blocks for jointly inferring image quality. Experimental results on multiple IQA databases show that the proposed IQA method is superior to state-of-the-art methods for assessing image technical and aesthetic quality.
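As a conceptual sketch only, the snippet below conditions a single-head self-attention block on the image's aspect ratio by adding a learned, ratio-dependent bias to the attention logits. This is a heavily simplified stand-in for implanting aspect ratios into Swin Transformer multi-head self-attention; the module names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AspectRatioAttention(nn.Module):
    """Single-head self-attention whose logits receive an aspect-ratio-conditioned bias."""
    def __init__(self, dim=64):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.ratio_mlp = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, tokens, aspect_ratio):
        # tokens: (B, N, dim); aspect_ratio: (B,) width / height of the original image
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        attn = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5          # (B, N, N) logits
        bias = self.ratio_mlp(aspect_ratio[:, None])[:, :, None]    # (B, 1, 1) learned bias
        attn = (attn + bias).softmax(dim=-1)
        return attn @ v

x = torch.randn(2, 49, 64)                     # two images, 7x7 token grid
ar = torch.tensor([4 / 3, 16 / 9])
print(AspectRatioAttention()(x, ar).shape)     # -> torch.Size([2, 49, 64])
```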
11. Sun M, Gong X, Nie H, Iqbal MM, Xie B. SRAFE: Siamese Regression Aesthetic Fusion Evaluation for Chinese Calligraphic Copy. CAAI Transactions on Intelligence Technology 2022. DOI: 10.1049/cit2.12095.
Affiliation(s)
- Mingwei Sun
- Central South University Changsha China
- Hunan Xiangjiang Artificial Intelligence Academy Changsha China
- Xinyu Gong
- Central South University Changsha China
- Hunan Xiangjiang Artificial Intelligence Academy Changsha China
- Bin Xie
- Central South University Changsha China
13. Zhu H, Li L, Wu J, Zhao S, Ding G, Shi G. Personalized Image Aesthetics Assessment via Meta-Learning With Bilevel Gradient Optimization. IEEE Transactions on Cybernetics 2022; 52:1798-1811. PMID: 32525805. DOI: 10.1109/tcyb.2020.2984670.
Abstract
Typical image aesthetics assessment (IAA) is modeled for the generic aesthetics perceived by an "average" user. However, such generic aesthetics models neglect the fact that aesthetic preferences vary significantly from user to user. Therefore, it is essential to address personalized IAA (PIAA). Since PIAA is a typical small sample learning (SSL) problem, existing PIAA models are usually built by fine-tuning well-established generic IAA (GIAA) models, which are regarded as prior knowledge. Nevertheless, this kind of prior knowledge based on "average aesthetics" fails to capture the aesthetic diversity of different people. In order to learn the prior knowledge shared when different people judge aesthetics, that is, to learn how people judge image aesthetics, we propose a PIAA method based on meta-learning with bilevel gradient optimization (BLG-PIAA), which is trained directly on individual aesthetic data and generalizes quickly to unknown users. The proposed approach consists of two phases: 1) meta-training and 2) meta-testing. In meta-training, the aesthetics assessment of each user is regarded as a task, and the training set of each task is divided into two sets: 1) a support set and 2) a query set. Unlike traditional methods that train a GIAA model based on average aesthetics, we train an aesthetic meta-learner model by bilevel gradient updating from the support set to the query set using many users' PIAA tasks. In meta-testing, the aesthetic meta-learner model is fine-tuned using a small amount of aesthetic data of a target user to obtain the PIAA model. The experimental results show that the proposed method outperforms the state-of-the-art PIAA metrics, and the learned prior model of BLG-PIAA can be quickly adapted to unseen PIAA tasks.
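The bilevel (support-to-query) gradient update described above follows the general MAML pattern; the following sketch shows one such update for a single user task with stand-in data. It is not the BLG-PIAA implementation.

```python
import torch
import torch.nn as nn

# Minimal MAML-style bilevel update for a single PIAA "user task":
# an inner gradient step on the user's support set, then an outer (meta) step
# evaluated on the same user's query set. Data here is random stand-in.
model = nn.Linear(16, 1)                       # stand-in for an aesthetic scoring head
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr, loss_fn = 0.01, nn.MSELoss()

support_x, support_y = torch.randn(8, 16), torch.randn(8, 1)
query_x, query_y = torch.randn(8, 16), torch.randn(8, 1)

# Inner loop: adapt to the user's support set (keep the graph for the outer step).
support_loss = loss_fn(model(support_x), support_y)
grads = torch.autograd.grad(support_loss, list(model.parameters()), create_graph=True)
adapted = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

# Outer loop: evaluate the adapted parameters on the query set and backpropagate
# the query loss through the inner update into the meta-parameters.
query_pred = torch.nn.functional.linear(query_x, adapted[0], adapted[1])
query_loss = loss_fn(query_pred, query_y)
meta_opt.zero_grad()
query_loss.backward()
meta_opt.step()
print(float(query_loss))
```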
14. Zhu H, Zhou Y, Yao R, Wang G, Yang Y. Learning image aesthetic subjectivity from attribute-aware relational reasoning network. Pattern Recognit Lett 2022. DOI: 10.1016/j.patrec.2022.02.008.
15. Yang J, Gao X, Li L, Wang X, Ding J. SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network. IEEE Transactions on Image Processing 2021; 30:8686-8701. PMID: 34665725. DOI: 10.1109/tip.2021.3118983.
Abstract
Visual Emotion Analysis (VEA) aims at finding out how people feel emotionally towards different visual stimuli, and it has attracted great attention recently with the prevalence of image sharing on social networks. Since human emotion involves a highly complex and abstract cognitive process, it is difficult to infer visual emotions directly from holistic or regional features in affective images. It has been demonstrated in psychology that visual emotions are evoked by the interactions between objects as well as the interactions between objects and scenes within an image. Inspired by this, we propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images. To mine the emotional relationships between distinct objects, we first build an Emotion Graph based on semantic concepts and visual features. Then, we conduct reasoning on the Emotion Graph using a Graph Convolutional Network (GCN), yielding emotion-enhanced object features. We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism. Extensive experiments and comparisons are conducted on eight public visual emotion datasets, and the results demonstrate that the proposed SOLVER consistently outperforms the state-of-the-art methods by a large margin. Ablation studies verify the effectiveness of our method, and visualizations demonstrate its interpretability, bringing new insight into VEA. Notably, we further discuss SOLVER on three other potential datasets with extended experiments, where we validate the robustness of our method and note some of its limitations.
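A minimal graph-convolution step of the kind used for reasoning on the Emotion Graph might look as follows; the toy adjacency weights and feature sizes are assumptions, and SOLVER's actual graph construction and attention-based fusion are not reproduced.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: symmetrically normalized adjacency times features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        a = adj + torch.eye(adj.size(0))                     # add self-loops
        deg_inv_sqrt = a.sum(dim=1).pow(-0.5)
        a_norm = deg_inv_sqrt[:, None] * a * deg_inv_sqrt[None, :]
        return torch.relu(self.linear(a_norm @ node_feats))  # propagate and transform

# Toy "emotion graph": 5 detected objects with 32-d features, edges weighted by
# (hypothetical) pairwise emotional relatedness.
feats = torch.randn(5, 32)
adj = torch.rand(5, 5)
adj = (adj + adj.t()) / 2                                    # make it symmetric
layer = SimpleGCNLayer(32, 64)
print(layer(feats, adj).shape)                               # -> (5, 64) emotion-enhanced features
```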
17. Zhang L, Zhang P. Research on aesthetic models based on neural architecture search. Journal of Intelligent & Fuzzy Systems 2021. DOI: 10.3233/jifs-210026.
Abstract
Computational aesthetics, which uses computers to learn human aesthetic habits and ultimately replace humans in scoring images, has become a hot topic in recent years due to its wide applicability. Most early research manually extracted features and used classifiers such as support vector machines to score images. With the development of deep learning, traditional manual feature extraction methods have gradually been replaced by convolutional neural networks that extract more comprehensive features. However, manually designing an aesthetic neural network is a considerable challenge. Recently, Neural Architecture Search has emerged as a way to find suitable neural networks for many deep learning tasks. In this paper, we make a first attempt to combine Neural Architecture Search with computational aesthetics. We design and apply a customized progressive differentiable architecture search strategy to obtain a lightweight and efficient aesthetic baseline model. In addition, we simulate the multi-person rating mechanism by outputting the distribution of an image's aesthetic value, replacing the previous scheme of classifying images as beautiful or not via a threshold, and propose a self-weighted Earth Mover's Distance loss to better fit human subjective scoring. Based on the baseline model, we further introduce several strategies, including an attention mechanism, dilated convolution, and adaptive pooling, to enhance the performance. Finally, we design several groups of comparative experiments to demonstrate the effectiveness of our baseline aesthetic model and the introduced improvement strategies.
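For reference, the standard Earth Mover's Distance loss for ordered score bins (difference of cumulative distributions) can be written as below; the paper's self-weighted variant adds a weighting scheme that is not reproduced here.

```python
import torch

def emd_loss(pred, target, r=2):
    """Earth Mover's Distance between two score distributions over ordered bins
    (the usual closed form for 1-D ordered histograms: difference of CDFs)."""
    cdf_pred = torch.cumsum(pred, dim=-1)
    cdf_target = torch.cumsum(target, dim=-1)
    per_image = torch.pow(torch.abs(cdf_pred - cdf_target), r).mean(dim=-1).pow(1.0 / r)
    return per_image.mean()

# Toy example with 10 score bins (scores 1..10), batch of 2 images.
pred = torch.softmax(torch.randn(2, 10), dim=-1)       # predicted rating distribution
target = torch.softmax(torch.randn(2, 10), dim=-1)     # ground-truth rating histogram
print(float(emd_loss(pred, target)))
```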
Affiliation(s)
- Lingyun Zhang
- School of Software Engineering, South China University of Technology, Guangzhou, China
- Pingjian Zhang
- School of Software Engineering, South China University of Technology, Guangzhou, China