1. Cao Y, Min X, Sun W, Zhai G. Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment. IEEE Transactions on Image Processing 2023; 32:1882-1896. [PMID: 37030730] [DOI: 10.1109/tip.2023.3251695]
Abstract
With the popularity of the mobile Internet, audio and video (A/V) have become the main media through which people entertain themselves and socialize daily. However, to reduce the cost of media storage and transmission, A/V signals are compressed by service providers before being transmitted to end-users, which inevitably introduces distortions and degrades the end-user's Quality of Experience (QoE). This motivates research on objective audio-visual quality assessment (AVQA). In the field of AVQA, most previous works focus only on single-mode audio or visual signals, ignoring the fact that users' perceptual quality depends on both the audio and the video signal. We therefore propose an objective AVQA architecture for multi-mode signals based on attentional neural networks. Specifically, we first utilize an attention prediction model to extract the salient regions of video frames. A pre-trained convolutional neural network then extracts short-time features of the salient regions and the corresponding audio signals. Next, the short-time features are fed into Gated Recurrent Unit (GRU) networks to model the temporal relationship between adjacent frames. Finally, fully connected layers fuse the temporally related A/V features modeled by the GRU networks into the final quality score. The proposed architecture is flexible and can be applied to both full-reference and no-reference AVQA. Experimental results on the LIVE-SJTU Database and the UnB-AVC Database demonstrate that our model outperforms state-of-the-art AVQA methods. The code of the proposed method will be made publicly available to promote the development of the AVQA field.
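The pipeline this abstract describes (pre-trained CNN features over salient regions, GRU temporal modeling, FC fusion) can be sketched briefly. The following is a minimal illustration, not the authors' implementation: the ResNet-18 backbone, the 128-d audio features, the hidden sizes, and the late concatenation fusion are all assumptions.

```python
# Minimal sketch of a CNN -> GRU -> FC audio-visual quality pipeline.
# Backbone, feature dimensions, and fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class AVQASketch(nn.Module):
    def __init__(self, hidden=128, audio_dim=128):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # 512-d pooled frame features
        self.video_gru = nn.GRU(512, hidden, batch_first=True)
        self.audio_gru = nn.GRU(audio_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, frames, audio_feats):
        # frames: (B, T, 3, H, W) salient-region crops; audio_feats: (B, T, audio_dim)
        b, t = frames.shape[:2]
        v = self.cnn(frames.flatten(0, 1)).flatten(1).view(b, t, -1)
        v, _ = self.video_gru(v)                         # temporal modelling across frames
        a, _ = self.audio_gru(audio_feats)
        fused = torch.cat([v[:, -1], a[:, -1]], dim=1)   # last hidden state of each stream
        return self.head(fused).squeeze(-1)              # predicted quality score
```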
2. Wang Z, Shen L, Xu M, Yu M, Wang K, Lin Y. Domain Adaptation for Underwater Image Enhancement. IEEE Transactions on Image Processing 2023; 32:1442-1457. [PMID: 37027547] [DOI: 10.1109/tip.2023.3244647]
Abstract
Recently, learning-based algorithms have shown impressive performance in underwater image enhancement. Most of them resort to training on synthetic data and achieve outstanding performance there. However, these deep methods ignore the significant domain gap between synthetic and real data (i.e., the inter-domain gap), so models trained on synthetic data often fail to generalize to real-world underwater scenarios. Moreover, the complex and changeable underwater environment also causes a large distribution gap within the real data itself (i.e., the intra-domain gap). Almost no research has focused on this problem, and existing techniques therefore often produce visually unpleasant artifacts and color distortions on various real images. Motivated by these observations, we propose a novel Two-phase Underwater Domain Adaptation network (TUDA) to simultaneously minimize the inter-domain and intra-domain gaps. Concretely, in the first phase, a new triple-alignment network is designed, comprising a translation part that enhances the realism of input images, followed by a task-oriented enhancement part. By performing image-level, feature-level, and output-level adaptation in these two parts through joint adversarial learning, the network can better build invariance across domains and thus bridge the inter-domain gap. In the second phase, the real data are classified into easy and hard samples according to the assessed quality of their enhanced images, using a new rank-based underwater quality assessment method embedded in this phase. By leveraging implicit quality information learned from rankings, this method can more accurately assess the perceptual quality of enhanced images. Using pseudo labels from the easy part, an easy-hard adaptation technique then effectively reduces the intra-domain gap between easy and hard samples. Extensive experimental results demonstrate that the proposed TUDA is significantly superior to existing works in terms of both visual quality and quantitative metrics.
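The rank-based quality assessment used in the second phase can be illustrated with a small sketch: a scoring network is trained on ranked pairs of enhanced images with a margin ranking loss, so relative quality is learned implicitly from rankings. The tiny backbone and margin value below are assumptions, not the paper's design.

```python
# Minimal sketch of rank-based quality learning: train a scorer on ranked pairs.
import torch
import torch.nn as nn

class QualityRanker(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)  # scalar quality score per image

ranker = QualityRanker()
loss_fn = nn.MarginRankingLoss(margin=0.5)
img_hi = torch.rand(4, 3, 128, 128)  # enhanced images ranked higher by annotators
img_lo = torch.rand(4, 3, 128, 128)  # enhanced images ranked lower
target = torch.ones(4)               # +1: the first input should score higher
loss = loss_fn(ranker(img_hi), ranker(img_lo), target)
loss.backward()
```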
3. Yang L, Xu M, Li S, Guo Y, Wang Z. Blind VQA on 360° Video via Progressively Learning From Pixels, Frames, and Video. IEEE Transactions on Image Processing 2022; 32:128-143. [PMID: 37015524] [DOI: 10.1109/tip.2022.3226417]
Abstract
Blind visual quality assessment (BVQA) on 360° video plays a key role in optimizing immersive multimedia systems. When assessing the quality of 360° video, humans tend to perceive quality degradation progressively, from the viewport-based spatial distortion of each spherical frame, to motion artifacts across adjacent frames, and finally to the video-level quality score, i.e., a progressive quality assessment paradigm. However, existing BVQA approaches for 360° video neglect this paradigm. In this paper, we take into account the progressive paradigm of human perception of spherical video quality and propose a novel BVQA approach (named ProVQA) for 360° video that progressively learns from pixels, frames, and video. Corresponding to this progression, three sub-nets are designed in our ProVQA approach: the spherical perception aware quality prediction (SPAQ), motion perception aware quality prediction (MPAQ), and multi-frame temporal non-local (MFTN) sub-nets. The SPAQ sub-net first models spatial quality degradation based on the human spherical perception mechanism. Then, by exploiting motion cues across adjacent frames, the MPAQ sub-net incorporates motion contextual information for quality assessment on 360° video. Finally, the MFTN sub-net aggregates multi-frame quality degradation to yield the final quality score, exploring long-term quality correlation across multiple frames. The experiments validate that our approach significantly advances the state-of-the-art BVQA performance on 360° video over two datasets; the code is publicly available at https://github.com/yanglixiaoshen/ProVQA.
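The final aggregation step, in which long-term quality correlation is explored across frames, is essentially attention over the temporal axis. The sketch below shows one way to pool per-frame quality features non-locally before regressing a video-level score; the dimensions and single-head attention are assumptions, not the MFTN sub-net itself.

```python
# Minimal sketch of non-local temporal pooling of frame-level quality features.
import torch
import torch.nn as nn

class TemporalNonLocalPooling(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, frame_feats):
        # frame_feats: (B, T, dim), e.g. outputs of upstream spatial/motion stages
        ctx, _ = self.attn(frame_feats, frame_feats, frame_feats)  # long-term correlation
        return self.score(ctx.mean(dim=1)).squeeze(-1)             # video-level score

pool = TemporalNonLocalPooling()
print(pool(torch.rand(2, 16, 64)).shape)  # torch.Size([2])
```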
4. Duan H, Min X, Zhu Y, Zhai G, Yang X, Le Callet P. Confusing Image Quality Assessment: Toward Better Augmented Reality Experience. IEEE Transactions on Image Processing 2022; 31:7206-7221. [PMID: 36367913] [DOI: 10.1109/tip.2022.3220404]
Abstract
With the development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary value of AR is to promote the fusion of digital content and real-world environments; however, studies on how this fusion influences the Quality of Experience (QoE) of the two components are lacking. To achieve better QoE in AR, whose two layers influence each other, it is important to first evaluate its perceptual quality. In this paper, we treat AR technology as the superimposition of virtual scenes and real scenes, and introduce visual confusion as its basic theory. We first pose a more general problem: evaluating the perceptual quality of superimposed images, i.e., confusing image quality assessment. A ConFusing Image Quality Assessment (CFIQA) database is established, which includes 600 reference images and 300 distorted images generated by mixing reference images in pairs. A subjective quality perception experiment is then conducted to attain a better understanding of how humans perceive confusing images. Based on the CFIQA database, several benchmark models and a specifically designed CFIQA model are proposed for solving this problem. Experimental results show that the proposed CFIQA model achieves state-of-the-art performance compared with the benchmark models. Moreover, an extended ARIQA study is conducted on top of the CFIQA study. We establish an ARIQA database to better simulate real AR application scenarios; it contains 20 AR reference images, 20 background (BG) reference images, and 560 distorted images generated from the AR and BG references, together with the corresponding subjective quality ratings. Three types of full-reference (FR) IQA benchmark variants are designed to study whether visual confusion should be considered when designing IQA algorithms. An ARIQA metric is finally proposed for better evaluating the perceptual quality of AR images. Experimental results demonstrate the good generalization ability of the CFIQA model and the state-of-the-art performance of the ARIQA model. The databases, benchmark models, and proposed metrics are available at: https://github.com/DuanHuiyu/ARIQA.
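The stimulus construction, mixing reference images in pairs, amounts to a linear superposition of two layers. A minimal sketch is shown below; the fixed 0.5 mixing weight and the resize-to-match step are illustrative assumptions rather than the database's exact generation protocol.

```python
# Minimal sketch: form a "confusing" stimulus by superimposing two references.
import numpy as np
from PIL import Image

def superimpose(path_a, path_b, alpha=0.5):
    a = np.asarray(Image.open(path_a).convert("RGB"), dtype=np.float32)
    b_img = Image.open(path_b).convert("RGB").resize(a.shape[1::-1])  # match (W, H)
    b = np.asarray(b_img, dtype=np.float32)
    mixed = alpha * a + (1.0 - alpha) * b  # linear superposition of the two layers
    return Image.fromarray(mixed.astype(np.uint8))
```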
5. Chen B, Zhu L, Kong C, Zhu H, Wang S, Li Z. No-Reference Image Quality Assessment by Hallucinating Pristine Features. IEEE Transactions on Image Processing 2022; 31:6139-6151. [PMID: 36112560] [DOI: 10.1109/tip.2022.3205770]
Abstract
In this paper, we propose a no-reference (NR) image quality assessment (IQA) method via feature-level pseudo-reference (PR) hallucination. The proposed quality assessment framework is rooted in the view that perceptually meaningful features can be well exploited to characterize visual quality, and natural image statistical behaviors are exploited to deliver accurate predictions. Herein, the PR features of distorted images are learned by a mutual learning scheme with the pristine reference as supervision, and the discriminative characteristics of the PR features are further ensured with triplet constraints. Given a distorted image for quality inference, feature-level disentanglement is performed with an invertible neural layer, yielding the PR features and the corresponding distortion features, whose comparison drives the final quality prediction. The effectiveness of our method is demonstrated on four popular IQA databases, and superior performance in cross-database evaluation also reveals its high generalization capability. The implementation of our method is publicly available at https://github.com/Baoliang93/FPR.
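The triplet constraint on the hallucinated features has a standard form: the pseudo-reference (PR) feature of a distorted image should lie closer to the pristine-reference feature than to the distorted feature. A minimal sketch follows; the margin and feature dimension are assumptions.

```python
# Minimal sketch of the triplet constraint on hallucinated PR features.
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)
dim = 256
pr_feat = torch.randn(8, dim, requires_grad=True)  # anchor: hallucinated PR features
ref_feat = torch.randn(8, dim)                     # positive: pristine-reference features
dist_feat = torch.randn(8, dim)                    # negative: distorted-image features
loss = triplet(pr_feat, ref_feat, dist_feat)       # pull PR toward pristine, push from distorted
loss.backward()
```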
6. Ahmed N, Shahzad Asif HM, Bhatti AR, Khan A. Deep ensembling for perceptual image quality assessment. Soft Computing 2022. [DOI: 10.1007/s00500-021-06662-9]
7. Yang J, Bian Z, Liu J, Jiang B, Lu W, Gao X, Song H. No-Reference Quality Assessment for Screen Content Images Using Visual Edge Model and AdaBoosting Neural Network. IEEE Transactions on Image Processing 2021; 30:6801-6814. [PMID: 34310304] [DOI: 10.1109/tip.2021.3098245]
Abstract
In this paper, a competitive no-reference metric is proposed to assess the perceptual quality of screen content images (SCIs), using a human visual edge model and an AdaBoosting neural network. Inspired by the theory that the edge information reflecting the visual quality of an SCI is effectively captured by the human-visual difference-of-Gaussian (DoG) model, we first compute two types of multi-scale edge maps via the DoG operator, containing contour and edge information respectively. After locally normalizing the edge maps, L-moments distribution estimation is utilized to fit their DoG coefficients, and the fitted L-moments parameters serve as edge features. Finally, to obtain the perceptual quality score, we use an AdaBoosting back-propagation neural network (ABPNN) to map the quality-aware features to the perceptual quality score of the SCI. The ABPNN is an appropriate approach for the visual quality assessment of SCIs because we abandon shallow regression networks in favor of a deeper regression architecture, achieving good generalization ability. The proposed method delivers highly competitive performance and shows high consistency with the human visual system (HVS) on public SCI-oriented databases.
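The multi-scale DoG edge maps at the front of this pipeline are straightforward to compute: each scale is the difference between two Gaussian blurs of the image. A minimal sketch under an assumed sigma schedule:

```python
# Minimal sketch of multi-scale difference-of-Gaussian (DoG) edge maps.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_edge_maps(gray, sigmas=(1.0, 2.0, 4.0), k=1.6):
    """One DoG response per scale for a 2-D grayscale image (sigma schedule assumed)."""
    gray = gray.astype(np.float32)
    return [gaussian_filter(gray, s) - gaussian_filter(gray, k * s) for s in sigmas]

maps = dog_edge_maps(np.random.rand(64, 64))
print([m.shape for m in maps])  # three 64x64 edge maps
```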
8. Zhang F, Zhang B, Zhang R, Zhang X. SPCM: Image quality assessment based on symmetry phase congruency. Applied Soft Computing 2020. [DOI: 10.1016/j.asoc.2019.105987]
9. Chen Z, Zhu H. Visual Quality Evaluation for Semantic Segmentation: Subjective Assessment Database and Objective Assessment Measure. IEEE Transactions on Image Processing 2019; 28:5785-5796. [PMID: 31217113] [DOI: 10.1109/tip.2019.2922072]
Abstract
To promote applications of semantic segmentation, quality evaluation is important for assessing different algorithms and guiding their development and optimization. In this paper, we establish a subjective semantic segmentation quality assessment database based on the stimulus-comparison method. Given that the database reflects the relative quality of pairs of semantic segmentation results, we adopt a robust regression mapping model to explore the relationship between subjective assessment and objective distance. With the help of the regression model, we can examine whether objective metrics coincide with subjective judgement. In addition, we propose a novel relative quality prediction network (RQPN), based on a Siamese CNN, as a new objective metric. The metric is trained on our subjective assessment database and can be applied to evaluate the performance of semantic segmentation algorithms even if those algorithms were not used to build the database. Experiments demonstrate the advantages and reliability of our database and show that results predicted by RQPN are more consistent with subjective assessment than existing objective metrics.
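A Siamese CNN for relative quality prediction, as RQPN uses, embeds both segmentation results with shared weights and predicts which of the pair is better. The sketch below is a toy version under assumed layer sizes, not the RQPN architecture.

```python
# Minimal sketch of a Siamese network for pairwise (relative) quality prediction.
import torch
import torch.nn as nn

class SiameseRQ(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Sequential(                     # shared-weight branch
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(64, 1)                    # pair embedding -> preference logit

    def forward(self, seg_a, seg_b):
        za, zb = self.embed(seg_a), self.embed(seg_b)   # same weights for both inputs
        return self.head(torch.cat([za, zb], dim=1)).squeeze(-1)

net = SiameseRQ()
logit = net(torch.rand(2, 3, 96, 96), torch.rand(2, 3, 96, 96))
# sigmoid(logit) ~ probability that seg_a is judged better than seg_b
```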
10. Sinno Z, Moorthy A, De Cock J, Li Z, Bovik AC. Quality Measurement of Images on Mobile Streaming Interfaces Deployed at Scale. IEEE Transactions on Image Processing 2019; 29:2536-2551. [PMID: 31514136] [DOI: 10.1109/tip.2019.2939733]
Abstract
With the growing use of smart cellular devices for entertainment, audio and video streaming services now offer a wide variety of popular mobile applications that provide portable and accessible ways to consume content. The user interfaces of these applications have become increasingly visual in nature and are commonly loaded with dense multimedia content such as thumbnail images, animated GIFs, and short videos. To render these efficiently and to aid rapid download to the client display, it is necessary to compress, scale, and color-subsample them. These operations introduce distortions that reduce the appeal of the application. It is therefore desirable to automatically monitor and govern the visual quality of these images, which are usually small. However, while a variety of high-performing image quality assessment (IQA) algorithms exist, none has been designed for this particular use case, whose content often has unique characteristics such as overlaid graphics, intentional brightness gradients, text, and warping. We describe a study we conducted on the subjective and objective quality of images embedded in the displayed user interfaces of mobile streaming applications. We created a database of typical "billboard" and "thumbnail" images viewed on such services. Using the collected data, we studied the effects of compression, scaling, and chroma subsampling on perceived quality via a subjective study. We also evaluated the performance of leading picture quality prediction models on the new database. We report some surprising results regarding algorithm performance and find that ample scope remains for future model development.
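Evaluating picture quality models against subjective data, as done here, conventionally reports Spearman (SROCC) and Pearson (PLCC) correlations between model predictions and mean opinion scores. A minimal sketch with toy arrays standing in for real data:

```python
# Minimal sketch of standard IQA model evaluation against subjective scores.
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos = np.array([3.1, 4.5, 2.2, 3.8, 1.9])        # subjective mean opinion scores
pred = np.array([0.62, 0.91, 0.40, 0.75, 0.35])  # model-predicted quality

srocc, _ = spearmanr(mos, pred)  # rank correlation: prediction monotonicity
plcc, _ = pearsonr(mos, pred)    # linear correlation with subjective scores
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```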
11. Duanmu Z, Ma K, Wang Z. Quality-of-Experience for Adaptive Streaming Videos: An Expectation Confirmation Theory Motivated Approach. IEEE Transactions on Image Processing 2018; 27:6135-6146. [PMID: 30010561] [DOI: 10.1109/tip.2018.2855403]
Abstract
Dynamic adaptive streaming over HTTP (DASH) provides an interoperable solution for coping with volatile network conditions, but how human visual quality-of-experience (QoE) changes with time-varying video quality is not well understood. Here, we build a large-scale video database of time-varying quality and design a series of subjective experiments to investigate how humans respond to compression-level, spatial-resolution, and temporal-resolution adaptations. Our path-analytic results show that quality adaptations influence QoE by modifying the perceived quality of subsequent video segments. Specifically, the quality deviation introduced by an adaptation is asymmetric with respect to the adaptation direction, and is further influenced by other factors such as compression level and content. Furthermore, we propose an objective QoE model that integrates the empirical findings from our subjective experiments with expectation confirmation theory (ECT). Experimental results show that the proposed ECT-QoE model agrees closely with subjective opinions and significantly outperforms existing QoE models. The video database and code are available online at https://ece.uwaterloo.ca/~zduanmu/tip2018ectqoe/.
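The reported asymmetry, where a downward adaptation hurts the perceived quality of subsequent segments more than an equal upward adaptation helps, can be caricatured in a few lines. The weights below are invented for illustration and are not the fitted ECT-QoE parameters.

```python
# Toy sketch of asymmetric adaptation effects on per-segment perceived quality.
def segment_qoe(prev_q, cur_q, w_down=0.6, w_up=0.2):
    """Adjust the current segment's quality by the adaptation effect (weights assumed)."""
    delta = cur_q - prev_q
    return cur_q + w_down * min(delta, 0.0) + w_up * max(delta, 0.0)

qualities = [80, 80, 60, 60, 80]  # per-segment objective quality
perceived = [qualities[0]] + [segment_qoe(p, c) for p, c in zip(qualities, qualities[1:])]
print(perceived)  # the drop to 60 is felt harder than the later recovery is rewarded
```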