1
Liu Y, Tian Y, Wang S, Zhang X, Kwong S. Overview of High-Dynamic-Range Image Quality Assessment. J Imaging 2024; 10:243. [PMID: 39452406 PMCID: PMC11508586 DOI: 10.3390/jimaging10100243]
Abstract
In recent years, High-Dynamic-Range (HDR) images have gained widespread popularity across various domains, such as the security, multimedia, and biomedical fields, owing to their ability to deliver an authentic visual experience. However, the extensive dynamic range and rich detail of HDR images make assessing their quality challenging. Therefore, current efforts involve constructing subjective databases and proposing objective quality assessment metrics to achieve efficient HDR Image Quality Assessment (IQA). Recognizing the absence of a systematic overview of these approaches, this paper provides a comprehensive survey of both subjective and objective HDR IQA methods. Specifically, we review 7 subjective HDR IQA databases and 12 objective HDR IQA metrics. In addition, we conduct a statistical analysis of 9 IQA algorithms, incorporating 3 perceptual mapping functions. Our findings highlight two main areas for improvement. First, the size and diversity of HDR IQA subjective databases should be significantly increased, encompassing a broader range of distortion types. Second, objective quality assessment algorithms need to identify more generalizable perceptual mapping approaches and feature extraction methods to enhance their robustness and applicability. Furthermore, this paper aims to serve as a valuable resource for researchers by discussing the limitations of current methodologies and potential future research directions.
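As a rough illustration of the perceptual-mapping idea discussed in this survey (not one of the three mappings it evaluates), the sketch below applies the SMPTE ST 2084 (PQ) inverse EOTF to linear HDR luminance before computing a conventional PSNR; the toy data and parameters are assumptions.

import numpy as np

def pq_encode(luminance_cd_m2):
    """SMPTE ST 2084 (PQ) inverse EOTF: map absolute luminance (cd/m^2)
    to a perceptually uniform [0, 1] signal."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    y = np.clip(luminance_cd_m2 / 10000.0, 0.0, 1.0)
    return ((c1 + c2 * y**m1) / (1 + c3 * y**m1)) ** m2

def psnr(a, b, peak=1.0):
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak**2 / mse)

# Toy example: compare an HDR reference with a distorted copy after
# perceptual mapping, instead of on raw linear luminance.
rng = np.random.default_rng(0)
ref = rng.uniform(0.1, 4000.0, size=(64, 64))        # linear luminance in cd/m^2
dist = ref + rng.normal(0, 20.0, size=ref.shape)      # additive noise
print("PQ-domain PSNR:", psnr(pq_encode(ref), pq_encode(np.clip(dist, 0, None))))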
Affiliation(s)
- Yue Liu: Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Yu Tian: Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Shiqi Wang: Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Xinfeng Zhang: School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China
- Sam Kwong: School of Data Sciences, Lingnan University, Hong Kong, China
2
Chen Y, Zhao Y, Cao L, Jia W, Liu X. Learning Deep Blind Quality Assessment for Cartoon Images. IEEE Trans Neural Netw Learn Syst 2023; 34:6650-6655. [PMID: 34847046 DOI: 10.1109/tnnls.2021.3127720]
Abstract
Although the cartoon industry has developed rapidly in recent years, few studies pay special attention to cartoon image quality assessment (IQA). Unfortunately, applying blind natural-image IQA algorithms directly to cartoons often produces results inconsistent with subjective visual perception. Hence, this brief proposes a blind cartoon IQA method based on convolutional neural networks (CNNs). Training a robust CNN depends on manually labeled training sets; however, for a large number of cartoon images, it is very time-consuming and costly to manually generate enough mean opinion scores (MOSs). Therefore, this brief first proposes a full-reference (FR) cartoon IQA metric based on cartoon-texture decomposition and then uses the estimated FR index to guide the no-reference IQA network. Moreover, to improve the robustness of the proposed network, a large-scale dataset is established for the training stage, together with a stochastic degradation strategy that randomly applies different degradations with random parameters. Experimental results on both synthetic and real-world cartoon image datasets demonstrate the effectiveness and robustness of the proposed method.
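The abstract does not specify the degradation set, so the following is only a minimal sketch of a stochastic degradation strategy, with blur, noise, and JPEG compression as placeholder distortion types and arbitrarily chosen parameter ranges.

import random
import numpy as np
import cv2

def random_degrade(img):
    """Apply a randomly chosen degradation with random parameters.
    The distortion set here (blur, noise, JPEG) is illustrative only."""
    choice = random.choice(["blur", "noise", "jpeg"])
    if choice == "blur":
        k = random.choice([3, 5, 7])
        return cv2.GaussianBlur(img, (k, k), random.uniform(0.5, 2.0))
    if choice == "noise":
        noisy = img.astype(np.float32) + np.random.normal(0, random.uniform(2, 15), img.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)
    quality = random.randint(10, 60)
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

# During training, each clean cartoon image would be degraded on the fly:
clean = np.full((128, 128, 3), 200, dtype=np.uint8)
degraded = random_degrade(clean)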
3
Cao Y, Min X, Sun W, Zhai G. Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment. IEEE Trans Image Process 2023; 32:1882-1896. [PMID: 37030730 DOI: 10.1109/tip.2023.3251695]
Abstract
With the popularity of the mobile Internet, audio and video (A/V) have become the main media through which people are entertained and socialize daily. However, to reduce storage and transmission costs, A/V signals are compressed by service providers before being transmitted to end-users, which inevitably introduces distortions and degrades the end-user's Quality of Experience (QoE). This motivates research on objective audio-visual quality assessment (AVQA). Most previous AVQA works focus only on single-mode audio or visual signals, ignoring the fact that perceived quality depends on both audio and video. Therefore, we propose an objective AVQA architecture for multi-mode signals based on attentional neural networks. Specifically, we first utilize an attention prediction model to extract the salient regions of video frames. Then, a pre-trained convolutional neural network extracts short-time features of the salient regions and the corresponding audio signals. Next, the short-time features are fed into Gated Recurrent Unit (GRU) networks to model the temporal relationship between adjacent frames. Finally, fully connected layers fuse the temporally related A/V features modeled by the GRU networks into the final quality score. The proposed architecture is flexible and can be applied to both full-reference and no-reference AVQA. Experimental results on the LIVE-SJTU Database and UnB-AVC Database demonstrate that our model outperforms state-of-the-art AVQA methods. The code of the proposed method will be made publicly available to promote the development of the AVQA field.
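A hedged PyTorch sketch of the described layout (per-frame CNN features for video and audio, GRU temporal modeling, fully connected fusion) is given below; the backbone, feature dimensions, and input shapes are assumptions, and the saliency/attention stage is omitted.

import torch
import torch.nn as nn
from torchvision import models

class AVQASketch(nn.Module):
    """Rough layout only: per-frame CNN features for video and for audio
    (here assumed to be spectrogram patches), GRUs for temporal modeling,
    and a fully connected head that fuses both streams into one score."""
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> (B*T, 512, 1, 1)
        self.v_gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.a_gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, video, audio):
        # video, audio: (B, T, 3, H, W); audio frames are spectrogram patches here.
        B, T = video.shape[:2]
        v = self.cnn(video.flatten(0, 1)).flatten(1).view(B, T, -1)
        a = self.cnn(audio.flatten(0, 1)).flatten(1).view(B, T, -1)
        _, hv = self.v_gru(v)
        _, ha = self.a_gru(a)
        return self.head(torch.cat([hv[-1], ha[-1]], dim=1)).squeeze(1)

model = AVQASketch()
score = model(torch.randn(2, 8, 3, 224, 224), torch.randn(2, 8, 3, 224, 224))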
4
Saleem S, Amin J, Sharif M, Mallah GA, Kadry S, Gandomi AH. Leukemia segmentation and classification: A comprehensive survey. Comput Biol Med 2022; 150:106028. [PMID: 36126356 DOI: 10.1016/j.compbiomed.2022.106028]
Abstract
Blood is made up of leukocytes (WBCs), erythrocytes (RBCs), and thrombocytes. The incidence of blood cancers is increasing rapidly; among them, leukemia is one of the most common and may lead to death. Leukemia is initiated by the abnormal growth of immature WBCs in the spongy tissue of the bone marrow. It is generally analyzed by etiologists examining blood smear slides under a microscope, where morphological features and blood cell counts help them detect the disease. Because of late detection and the expensive instruments required for analysis, the death rate has risen significantly. Fluorescence-based cell sorting and manual counts using a hemocytometer are error-prone and imprecise. Leukemia detection methods typically consist of pre-processing, segmentation, feature extraction, and classification. In this article, recent deep learning methodologies and challenges for leukemia detection are discussed. These methods help examine microscopic blood smear images and detect leukemia more accurately.
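As a minimal, hedged example of the pre-processing/segmentation stage mentioned above (not any specific method from the survey), the sketch below thresholds the saturation channel of a stained blood smear to obtain a rough leukocyte-nucleus mask.

import numpy as np
import cv2

def segment_nuclei(bgr_smear):
    """A common classical baseline: leukocyte nuclei are strongly stained,
    so Otsu thresholding on the saturation channel gives a rough nucleus
    mask. The surveyed literature uses far more elaborate (often deep)
    pipelines; this is only an illustration of the segmentation step."""
    hsv = cv2.cvtColor(bgr_smear, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1]
    blurred = cv2.GaussianBlur(saturation, (5, 5), 0)
    _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    return mask

smear = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)  # placeholder image
nucleus_mask = segment_nuclei(smear)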
Affiliation(s)
- Saba Saleem: Department of Computer Science, COMSATS University Islamabad, Wah Campus, Pakistan
- Javaria Amin: Department of Computer Science, University of Wah, Wah Cantt, Pakistan
- Muhammad Sharif: Department of Computer Science, COMSATS University Islamabad, Wah Campus, Pakistan
- Seifedine Kadry: Department of Applied Data Science, Noroff University College, Kristiansand, Norway; Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon
- Amir H Gandomi: Faculty of Engineering & Information Technology, University of Technology Sydney, Ultimo, NSW 2007, Australia
5
Zeng H, Huang H, Hou J, Cao J, Wang Y, Ma KK. Screen Content Video Quality Assessment Model Using Hybrid Spatiotemporal Features. IEEE Trans Image Process 2022; 31:6175-6187. [PMID: 36126028 DOI: 10.1109/tip.2022.3206621]
Abstract
In this paper, a full-reference video quality assessment (VQA) model, called the hybrid spatiotemporal feature-based model (HSFM), is designed for the perceptual quality assessment of screen content videos (SCVs). SCVs have a hybrid structure containing both screen and natural scenes, which the human visual system (HVS) perceives with different visual effects. With this in mind, the three-dimensional Laplacian of Gaussian (3D-LoG) filter and three-dimensional Natural Scene Statistics (3D-NSS) are exploited to extract screen and natural spatiotemporal features from the reference and distorted SCV sequences separately. The similarities of these extracted features are then computed independently to generate quality scores for the screen and natural scenes. An adaptive fusion scheme based on local video activity then combines these scores into the final VQA score of the distorted SCV under evaluation. Experimental results on the Screen Content Video Database (SCVD) and the Compressed Screen Content Video Quality (CSCVQ) database show that the proposed HSFM is more in line with the perceptual quality of SCVs as perceived by the HVS, compared with a variety of classic and recent IQA/VQA models.
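The following is only a rough sketch of the 3D-LoG part of the pipeline: it filters reference and distorted video volumes with a 3D Laplacian of Gaussian and compares the responses with an SSIM-style ratio; the scale, constant, and pooling are assumptions, and the 3D-NSS branch and adaptive fusion are omitted.

import numpy as np
from scipy.ndimage import gaussian_laplace

def log3d_similarity(ref_video, dist_video, sigma=1.5, c=1e-3):
    """3D-LoG responses on (T, H, W) volumes, compared with a
    structural-similarity-style ratio and averaged into one score."""
    r = gaussian_laplace(ref_video.astype(np.float64), sigma=sigma)
    d = gaussian_laplace(dist_video.astype(np.float64), sigma=sigma)
    sim = (2 * r * d + c) / (r**2 + d**2 + c)
    return float(sim.mean())

rng = np.random.default_rng(1)
ref = rng.random((16, 64, 64))
dist = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)
print(log3d_similarity(ref, dist))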
6
Cartoon Image Processing: A Survey. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01645-1]
7
Huang H, Zeng H, Hou J, Chen J, Zhu J, Ma KK. A Spatial and Geometry Feature-Based Quality Assessment Model for the Light Field Images. IEEE Trans Image Process 2022; 31:3765-3779. [PMID: 35604974 DOI: 10.1109/tip.2022.3175619]
Abstract
This paper proposes a new full-reference image quality assessment (IQA) model, called the spatial and geometry feature-based model (SGFM), for the perceptual quality evaluation of light field (LF) images. Considering that an LF image describes both spatial and geometry information of the scene, spatial features are extracted from the sub-aperture images (SAIs) using the contourlet transform to reflect the spatial quality degradation of the LF image, while geometry features are extracted across adjacent SAIs using a 3D-Gabor filter to describe the loss of viewing consistency. These designs are motivated by the fact that the human eye is more sensitive to scale, direction, and contour from the spatial perspective, and to viewing-angle variations from the geometry perspective. The operations are applied to the reference and distorted LF images independently, and the similarity between the measured quantities is computed to arrive at the final IQA score of the distorted LF image. Experimental results on three commonly used LF IQA datasets show that the proposed SGFM is more in line with the quality of LF images as perceived by the human visual system (HVS), compared with multiple classical and state-of-the-art IQA models.
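Neither the contourlet transform nor a 3D-Gabor bank has a one-line standard implementation, so the hedged sketch below only illustrates the cross-SAI consistency idea using 2D Gabor responses on an epipolar-plane image; the light-field layout and all parameters are assumptions, not the paper's method.

import numpy as np
from skimage.filters import gabor

def epi_consistency(ref_lf, dist_lf, frequency=0.2, c=1e-3):
    """ref_lf, dist_lf: light fields shaped (U, V, H, W) (angular, spatial).
    Build a horizontal epipolar-plane image (EPI) by fixing one angular row
    and one image row, then compare Gabor responses of the two EPIs."""
    U, V, H, W = ref_lf.shape
    u, row = U // 2, H // 2                       # central angular row, central image row
    ref_epi = ref_lf[u, :, row, :]                 # (V, W) EPI
    dist_epi = dist_lf[u, :, row, :]
    r_real, _ = gabor(ref_epi, frequency=frequency)
    d_real, _ = gabor(dist_epi, frequency=frequency)
    sim = (2 * r_real * d_real + c) / (r_real**2 + d_real**2 + c)
    return float(sim.mean())

rng = np.random.default_rng(2)
ref = rng.random((7, 7, 32, 32))
dist = np.clip(ref + rng.normal(0, 0.02, ref.shape), 0, 1)
print(epi_consistency(ref, dist))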
8
Yang Q, Ma Z, Xu Y, Li Z, Sun J. Inferring Point Cloud Quality via Graph Similarity. IEEE Trans Pattern Anal Mach Intell 2022; 44:3015-3029. [PMID: 33360982 DOI: 10.1109/tpami.2020.3047083]
Abstract
Objective quality estimation of media content plays a vital role in a wide range of applications. Although numerous metrics exist for 2D images and videos, similar metrics are missing for 3D point clouds with unstructured and non-uniformly distributed points. In this paper, we propose GraphSIM, a metric that accurately and quantitatively predicts human perception of point clouds with superimposed geometry and color impairments. The human visual system is more sensitive to high spatial-frequency components (e.g., contours and edges) and weighs local structural variations more than individual point intensities. Motivated by this, we use the graph signal gradient as a quality index to evaluate point cloud distortions. Specifically, we first extract geometric keypoints by resampling the reference point cloud geometry to form an object skeleton. Then, we construct local graphs centered at these keypoints for both the reference and distorted point clouds. Next, we compute three moments of the color gradients between each centered keypoint and all other points in the same local graph to obtain a local significance similarity feature. Finally, we obtain the similarity index by pooling the local graph significance across all color channels and averaging across all graphs. We evaluate GraphSIM on two large and independent point cloud assessment datasets covering a wide range of impairments (e.g., re-sampling, compression, and additive noise). GraphSIM provides state-of-the-art performance for all distortions, with noticeable gains in predicting the subjective mean opinion score (MOS) compared with the point-wise distance-based metrics adopted in standardized reference software. Ablation studies further show that GraphSIM can be generalized to various scenarios with consistent performance by adjusting its key modules and parameters. Models and associated materials will be made available at https://njuvision.github.io/GraphSIM or http://smt.sjtu.edu.cn/papers/GraphSIM.
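A hedged sketch of the keypoint-centered local-graph feature is shown below; it uses random subsampling in place of the paper's resampling step and plain color-difference moments as a crude stand-in for graph-signal gradients. Comparing these moments between reference and distorted clouds and pooling over keypoints and color channels would yield a similarity index.

import numpy as np
from scipy.spatial import cKDTree

def local_graph_moments(points, colors, keypoints, radius=0.1):
    """For each (keypoint position, keypoint color) pair, gather neighbors
    within `radius`, form color differences to the keypoint, and summarize
    their magnitudes with three simple moments."""
    tree = cKDTree(points)
    feats = []
    for kp, kc in keypoints:
        idx = tree.query_ball_point(kp, r=radius)
        if not idx:
            feats.append(np.zeros(3))
            continue
        grad = colors[idx] - kc                      # (n, 3) color differences
        mag = np.linalg.norm(grad, axis=1)
        feats.append([mag.mean(), mag.var(), np.abs(mag - mag.mean()).mean()])
    return np.asarray(feats)

rng = np.random.default_rng(3)
pts, cols = rng.random((2000, 3)), rng.random((2000, 3))
kp_idx = rng.choice(2000, size=32, replace=False)    # placeholder for resampled keypoints
keys = list(zip(pts[kp_idx], cols[kp_idx]))
ref_feats = local_graph_moments(pts, cols, keys)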
10
Tang T, Li L, Wu X, Chen R, Li H, Lu G, Cheng L. TSA-SCC: Text Semantic-Aware Screen Content Coding With Ultra Low Bitrate. IEEE Trans Image Process 2022; 31:2463-2477. [PMID: 35196232 DOI: 10.1109/tip.2022.3152003]
Abstract
Due to the rapid growth of web conferencing, remote screen sharing, and online gaming, screen content has become an important type of Internet media, and over 90% of online media interactions are screen based. Meanwhile, as the main component of screen content, textual information on average takes up over 40% of the image on various commonly used screen content datasets. However, textual information is difficult to compress with traditional coding schemes such as HEVC, which assume strong spatial and temporal correlations within the image/video. The state-of-the-art screen content coding (SCC) standard, HEVC-SCC, still adopts a block-based coding framework and does not consider text semantics during compression, inevitably blurring text at lower bitrates. In this paper, we propose a general text semantic-aware screen content coding scheme (TSA-SCC) for ultra-low-bitrate settings. The method detects abrupt pictures in a screen content video (or image), recognizes their textual information (including words, positions, font type, font size, and font color) using neural networks, and encodes the text with text coding tools. The other pictures, as well as the background image obtained by removing text from the abrupt picture via inpainting, are encoded with HEVC-SCC. Compared with HEVC-SCC, TSA-SCC reduces the bitrate by up to 3× at similar compression quality. Moreover, TSA-SCC achieves much better visual quality with less bitrate when encoding screen content videos/images at ultra-low bitrates.
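As a hedged illustration of the text/background separation step (assuming text boxes come from an external detector, which is not sketched here), the snippet below masks detected text regions and inpaints the background so it could be passed to a conventional codec while the text is coded separately.

import numpy as np
import cv2

def split_text_and_background(frame, text_boxes):
    """Remove detected text regions from the frame by inpainting, so the
    background can go to a conventional codec while the text (glyphs,
    position, style) is coded separately. `text_boxes` is assumed to come
    from an external text detector/recognizer."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    for (x, y, w, h) in text_boxes:
        mask[y:y + h, x:x + w] = 255
    background = cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA)
    return background, mask

frame = np.full((240, 320, 3), 230, dtype=np.uint8)
cv2.putText(frame, "hello", (20, 120), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 0), 2)
bg, text_mask = split_text_and_background(frame, [(15, 95, 120, 40)])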
11
Jiang Q, Liu Z, Gu K, Shao F, Zhang X, Liu H, Lin W. Single Image Super-Resolution Quality Assessment: A Real-World Dataset, Subjective Studies, and an Objective Metric. IEEE Trans Image Process 2022; 31:2279-2294. [PMID: 35239481 DOI: 10.1109/tip.2022.3154588]
Abstract
Numerous single image super-resolution (SISR) algorithms have been proposed in recent years to reconstruct a high-resolution (HR) image from its low-resolution (LR) observation. However, fairly comparing the performance of different SISR algorithms and results remains challenging. So far, the lack of comprehensive human subjective studies on large-scale real-world SISR datasets and of accurate objective SISR quality assessment metrics makes it unreliable to truly understand the performance of different SISR algorithms. In this paper, we make efforts to tackle these two issues. First, we construct a real-world SISR quality dataset (RealSRQ) and conduct human subjective studies to compare the performance of representative SISR algorithms. Second, we propose a new objective metric, KLTSRQA, based on the Karhunen-Loève Transform (KLT), to evaluate the quality of SISR images in a no-reference (NR) manner. Experiments on our constructed RealSRQ and the latest synthetic SISR quality dataset (QADS) demonstrate the superiority of the proposed KLTSRQA metric, which achieves higher consistency with human subjective scores than relevant existing NR image quality assessment (NR-IQA) metrics. The dataset and code will be made available at https://github.com/Zhentao-Liu/RealSRQ-KLTSRQA.
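A minimal sketch of a KLT-based feature is given below: it learns a KLT (PCA) basis from patches of the image itself and summarizes the coefficient energy distribution; how such features are regressed to a quality score in KLTSRQA is not reproduced here, and the patch size is an assumption.

import numpy as np

def klt_coefficient_energy(img, patch=8):
    """Learn a KLT (PCA) basis from non-overlapping patches of the image
    itself and return the normalized energy of each transform coefficient;
    the mapping from such features to a quality score is left out here."""
    H, W = img.shape
    ph, pw = H // patch * patch, W // patch * patch
    patches = (img[:ph, :pw]
               .reshape(ph // patch, patch, pw // patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(-1, patch * patch))
    patches = patches - patches.mean(axis=0)
    cov = np.cov(patches, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    coeffs = patches @ eigvecs[:, ::-1]           # basis sorted by decreasing variance
    energy = (coeffs**2).mean(axis=0)
    return energy / energy.sum()

rng = np.random.default_rng(4)
feat = klt_coefficient_energy(rng.random((128, 128)))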
12
Yang J, Bian Z, Liu J, Jiang B, Lu W, Gao X, Song H. No-Reference Quality Assessment for Screen Content Images Using Visual Edge Model and AdaBoosting Neural Network. IEEE Trans Image Process 2021; 30:6801-6814. [PMID: 34310304 DOI: 10.1109/tip.2021.3098245]
Abstract
In this paper, a competitive no-reference metric based on a human visual edge model and an AdaBoosting neural network is proposed to assess the perceptual quality of screen content images (SCIs). Inspired by the theory that the edge information reflecting the visual quality of an SCI is effectively captured by the difference-of-Gaussian (DOG) model of human vision, we first compute two types of multi-scale edge maps via the DOG operator, containing contour and edge information respectively. After locally normalizing the edge maps, L-moments distribution estimation is used to fit their DOG coefficients, and the fitted L-moments parameters are regarded as edge features. Finally, an AdaBoosting back-propagation neural network (ABPNN) maps these quality-aware features to the perceptual quality score of the SCI. The ABPNN is adopted because a regression network with a deeper architecture, rather than a shallow one, achieves better generalization. The proposed method delivers highly competitive performance and shows high consistency with the human visual system (HVS) on public SCI-oriented databases.
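The sketch below illustrates only the multi-scale DOG edge-map and local-normalization stage; plain moments replace the paper's L-moments fitting, and the ABPNN regressor is omitted. All parameters are assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def dog_edge_features(img, sigmas=(1.0, 2.0, 4.0), k=1.6, eps=1e-6):
    """Multi-scale difference-of-Gaussian edge maps with local normalization.
    The paper fits L-moments to the normalized coefficients; plain moments
    are used here as a simpler placeholder feature."""
    feats = []
    for s in sigmas:
        dog = gaussian_filter(img, s) - gaussian_filter(img, k * s)
        mu = gaussian_filter(dog, 2 * s)
        sd = np.sqrt(np.maximum(gaussian_filter(dog**2, 2 * s) - mu**2, 0))
        norm = (dog - mu) / (sd + eps)
        feats += [norm.mean(), norm.std(), np.abs(norm).mean()]
    return np.asarray(feats)  # fed to a regressor (the ABPNN in the paper)

rng = np.random.default_rng(5)
print(dog_edge_features(rng.random((96, 96))))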
13
Cheng S, Zeng H, Chen J, Hou J, Zhu J, Ma KK. Screen Content Video Quality Assessment: Subjective and Objective Study. IEEE Trans Image Process 2020; 29:8636-8651. [PMID: 32845839 DOI: 10.1109/tip.2020.3018256]
Abstract
In this paper, we make the first attempt to study subjective and objective quality assessment for screen content videos (SCVs). To this end, we construct the first large-scale video quality assessment (VQA) database specifically for SCVs, called the screen content video database (SCVD). The SCVD provides 16 reference SCVs, 800 distorted SCVs, and their corresponding subjective scores, and it is made publicly available for research. The distorted SCVs are generated from each reference SCV with 10 distortion types and 5 degradation levels per type, and each distorted SCV is rated by at least 32 subjects. Furthermore, we propose the first full-reference VQA model for SCVs, called the spatiotemporal Gabor feature tensor-based model (SGFTM), to objectively evaluate the perceptual quality of distorted SCVs. This design is motivated by the observation that the 3D-Gabor filter can well simulate the visual functions of the human visual system (HVS) in perceiving videos, being more sensitive to the edge and motion information frequently encountered in SCVs. Specifically, SGFTM exploits the 3D-Gabor filter to extract spatiotemporal Gabor feature tensors from the reference and distorted SCVs individually, measures their similarities, and combines them through the developed spatiotemporal feature tensor pooling strategy to obtain the final SGFTM score. Experimental results on SCVD show that the proposed SGFTM is highly consistent with the subjective perception of SCV quality and consistently outperforms multiple classical and state-of-the-art image/video quality assessment models.
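A hedged sketch of a single real-valued spatiotemporal Gabor filter and an SSIM-style response comparison is given below; the paper uses a bank of 3D-Gabor filters and a tensor pooling strategy, neither of which is reproduced here, and all parameters are assumptions.

import numpy as np
from scipy.signal import fftconvolve

def gabor_3d_kernel(size=9, sigma=2.0, f_spatial=0.2, f_temporal=0.1):
    """One real-valued spatiotemporal Gabor kernel (Gaussian envelope times
    a cosine carrier); a full filter bank would vary orientation and
    frequency."""
    r = np.arange(size) - size // 2
    t, y, x = np.meshgrid(r, r, r, indexing="ij")
    envelope = np.exp(-(t**2 + y**2 + x**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * (f_spatial * x + f_temporal * t))
    return envelope * carrier

def gabor_similarity(ref_video, dist_video, c=1e-3):
    """Compare reference/distorted Gabor responses with a similarity ratio."""
    k = gabor_3d_kernel()
    r = fftconvolve(ref_video, k, mode="same")
    d = fftconvolve(dist_video, k, mode="same")
    sim = (2 * r * d + c) / (r**2 + d**2 + c)
    return float(sim.mean())

rng = np.random.default_rng(6)
ref = rng.random((16, 48, 48))
dist = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)
print(gabor_similarity(ref, dist))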