1. Zhu X, Yang L, Duan H, Min X, Zhai G, Le Callet P. ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images. IEEE Transactions on Visualization and Computer Graphics 2025;31:2277-2287. [PMID: 40067703] [DOI: 10.1109/tvcg.2025.3549174]
Abstract
With the development of eXtended Reality (XR), photo capture and display technologies based on head-mounted displays (HMDs) have advanced significantly and attracted considerable attention. Egocentric spatial images and videos are emerging as a compelling form of stereoscopic XR content. Assessing the Quality of Experience (QoE) of XR content is important for ensuring a high-quality viewing experience. Unlike traditional 2D images, egocentric spatial images pose challenges for perceptual quality assessment because of their distinctive capture and processing pipelines and their stereoscopic characteristics, yet the corresponding image quality assessment (IQA) research is still lacking. In this paper, we establish the Egocentric Spatial Images Quality Assessment Database (ESIQAD), to our knowledge the first IQA database dedicated to egocentric spatial images. ESIQAD includes 500 egocentric spatial images and the corresponding mean opinion scores (MOSs) under three display modes: 2D display, 3D-window display, and 3D-immersive display. Based on ESIQAD, we propose a novel Mamba2-based multi-stage feature fusion model, termed ESIQAnet, which predicts the perceptual quality of egocentric spatial images under the three display modes. Specifically, we first extract features from multiple visual state space duality (VSSD) blocks, then apply cross attention to fuse binocular view information and transposed attention to further refine the features. The multi-stage features are finally concatenated and fed into a quality regression network to predict the quality score. Extensive experiments demonstrate that ESIQAnet outperforms 22 state-of-the-art IQA models on ESIQAD under all three display modes. The database and code are available at https://github.com/IntMeGroup/ESIQA.
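
A minimal sketch of the cross-attention binocular fusion step named in the abstract; all module names, shapes, and hyperparameters are illustrative assumptions, not the authors' released code (which is at the GitHub link above):

```python
# Illustrative cross-attention fusion of left/right view features.
import torch
import torch.nn as nn

class BinocularCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Queries come from one view, keys/values from the other.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # left, right: (batch, tokens, dim) feature sequences from the two views.
        l2r, _ = self.attn(left, right, right)   # left attends to right
        r2l, _ = self.attn(right, left, left)    # right attends to left
        return torch.cat([l2r, r2l], dim=-1)     # fused binocular representation

fused = BinocularCrossAttention(dim=256)(torch.randn(2, 196, 256),
                                         torch.randn(2, 196, 256))
print(fused.shape)  # torch.Size([2, 196, 512])
```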

2. Wang H, Ke X, Guo W, Zheng W. No-reference stereoscopic image quality assessment based on binocular collaboration. Neural Networks 2024;180:106752. [PMID: 39340969] [DOI: 10.1016/j.neunet.2024.106752]
Abstract
Stereoscopic images typically consist of left and right views along with depth information. Assessing the quality of stereoscopic/3D images (SIQA) is generally more complex than assessing 2D images because of scene disparities between the left and right views and the intricate fusion process of binocular vision. To address the quality prediction bias on multiply-distorted images, we investigated visual physiology and the processing of visual information in the primary visual cortex and propose a no-reference stereoscopic image quality assessment method. The method comprises an innovative end-to-end NR-SIQA neural network and an image patch generation algorithm. The algorithm fuses the left and right views to generate a saliency map, which then guides image cropping in the database. The proposed models are validated and compared on publicly available databases. The results show that the model and algorithm together outperform state-of-the-art NR-SIQA metrics on the LIVE 3D and WIVC 3D databases and perform particularly well on specific noise distortions. Generalization experiments demonstrate a reasonable degree of generality of the proposed model.
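
A rough sketch of the saliency-guided patch selection idea described above; the gradient-magnitude saliency here is a stand-in for the paper's fusion-based saliency model, and all parameters are assumptions:

```python
# Fuse the two views, build a (stand-in) saliency map, crop the most
# salient fixed-size patches.
import numpy as np

def select_patches(left, right, patch=64, k=8):
    fused = 0.5 * (left + right)                  # naive binocular fusion
    gy, gx = np.gradient(fused.mean(axis=-1))     # grayscale gradients
    saliency = np.hypot(gx, gy)                   # stand-in saliency map
    h, w = saliency.shape
    scores = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            scores.append((saliency[y:y+patch, x:x+patch].mean(), y, x))
    scores.sort(reverse=True)                     # most salient first
    return [(y, x) for _, y, x in scores[:k]]

coords = select_patches(np.random.rand(256, 256, 3), np.random.rand(256, 256, 3))
```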

Affiliation(s)
- Hanling Wang, Xiao Ke, Wenzhong Guo, Wukun Zheng
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, Fujian, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou University, Fuzhou 350116, China

3. Tian Y, Yan Y, Zhai G, Chen L, Gao Z. CLSA: A Contrastive Learning Framework With Selective Aggregation for Video Rescaling. IEEE Transactions on Image Processing 2023;32:1300-1314. [PMID: 37022906] [DOI: 10.1109/tip.2023.3242774]
Abstract
Video rescaling has recently drawn extensive attention for practical applications such as video compression. Compared with video super-resolution, which focuses on upscaling bicubic-downscaled videos, video rescaling methods jointly optimize a downscaler and an upscaler. However, the inevitable loss of information during downscaling leaves the upscaling procedure ill-posed. Furthermore, the network architectures of previous methods mostly rely on convolution to aggregate information within local regions, which cannot effectively capture relationships between distant locations. To address these two issues, we propose a unified video rescaling framework with the following designs. First, we regularize the information of the downscaled videos via a contrastive learning framework in which hard negative samples for learning are synthesized online. With this auxiliary contrastive objective, the downscaler tends to retain more information that benefits the upscaler. Second, we present a selective global aggregation module (SGAM) to efficiently capture long-range redundancy in high-resolution videos, where only a few representative locations are adaptively selected to participate in the computationally heavy self-attention (SA) operations. SGAM enjoys the efficiency of sparse modeling while preserving the global modeling capability of SA. We refer to the proposed framework as the Contrastive Learning framework with Selective Aggregation (CLSA) for video rescaling. Comprehensive experiments show that CLSA outperforms video rescaling and rescaling-based video compression methods on five datasets, achieving state-of-the-art performance.
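
A minimal InfoNCE-style sketch of an auxiliary contrastive objective with online hard negatives, in the spirit of the loss described above; the exact loss form and shapes are assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    # anchor, positive: (batch, dim); negatives: (batch, n_neg, dim)
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    pos = (a * p).sum(-1, keepdim=True) / tau         # (batch, 1)
    neg = torch.einsum('bd,bkd->bk', a, n) / tau      # (batch, n_neg)
    logits = torch.cat([pos, neg], dim=1)
    # The positive sits at index 0 of each row of logits.
    return F.cross_entropy(logits, torch.zeros(a.size(0), dtype=torch.long))

# Hard negatives could be synthesized online, e.g., by mixing anchor features
# with sampled negatives before passing them in.
loss = contrastive_loss(torch.randn(4, 128), torch.randn(4, 128),
                        torch.randn(4, 16, 128))
```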

4. Scislo L. Single-Point and Surface Quality Assessment Algorithm in Continuous Production with the Use of 3D Laser Doppler Scanning Vibrometry System. Sensors (Basel) 2023;23:1263. [PMID: 36772303] [PMCID: PMC9920583] [DOI: 10.3390/s23031263]
Abstract
In the current economic situation of many companies, reducing production time is critical; however, this usually cannot come at the cost of lower final product quality. This article presents a possible solution for reducing the time needed for quality management. With modern optical measurement systems, quality control can be performed without additional stoppage time. For single-point measurement with a Laser Doppler Vibrometer, the measurement can be performed in a matter of milliseconds for each product. This article presents an example of such fully non-contact quality assurance measurements, together with a proposed evaluation criterion for quality assessment. The proposed quality assurance algorithm compares each product's modal response with an ideal template and stores the result in the cloud, e.g., in the company's supervisory system. This makes the presented 3D laser vibrometry system an advanced instrumentation and data acquisition system well suited to factory quality management based on the Industry 4.0 concept.
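
A small sketch of the template-comparison idea: the measured vibration spectrum is correlated against an ideal reference spectrum and thresholded. The sampling rate, windowing, and correlation criterion are all assumptions, not the paper's exact algorithm:

```python
import numpy as np

def passes_quality_check(signal, template, min_corr=0.95):
    # signal, template: 1-D vibration time series of equal length.
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    ref = np.abs(np.fft.rfft(template * np.hanning(len(template))))
    # Normalized correlation between the two magnitude spectra.
    corr = np.dot(spec, ref) / (np.linalg.norm(spec) * np.linalg.norm(ref))
    return corr >= min_corr

ok = passes_quality_check(np.random.randn(4096), np.random.randn(4096))
```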
Affiliation(s)
- Lukasz Scislo
- Faculty of Electrical and Computer Engineering, Cracow University of Technology, Warszawska 24, 31-155 Cracow, Poland

5. Yang CJ, Huang WK, Lin KP. Three-Dimensional Printing Quality Inspection Based on Transfer Learning with Convolutional Neural Networks. Sensors (Basel) 2023;23:491. [PMID: 36617085] [PMCID: PMC9824655] [DOI: 10.3390/s23010491]
Abstract
Fused deposition modeling (FDM) is a form of additive manufacturing where three-dimensional (3D) models are created by depositing melted thermoplastic polymer filaments in layers. Although FDM is a mature process, defects can occur during printing. Therefore, an image-based quality inspection method for 3D-printed objects of varying geometries was developed in this study. Transfer learning with pretrained models, which were used as feature extractors, was combined with ensemble learning, and the resulting model combinations were used to inspect the quality of FDM-printed objects. Model combinations with VGG16 and VGG19 had the highest accuracy in most situations. Furthermore, the classification accuracies of these model combinations were not significantly affected by differences in color. In summary, the combination of transfer learning with ensemble learning is an effective method for inspecting the quality of 3D-printed objects. It reduces time and material wastage and improves 3D printing quality.
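
A minimal sketch of a frozen VGG16 backbone used as a feature extractor with a small classification head, as in the transfer-learning setup described above; the head, input size, and class labels are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained VGG16 convolutional features, frozen as a pure feature extractor.
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.Linear(512, 128), nn.ReLU(),
                     nn.Linear(128, 2))     # defective vs. acceptable print
logits = head(backbone(torch.randn(1, 3, 224, 224)))
# An ensemble would train several such heads (e.g., over VGG16 and VGG19
# backbones) and average or vote over their predictions.
```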

Affiliation(s)
- Cheng-Jung Yang
- Program in Interdisciplinary Studies, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Wei-Kai Huang
- Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Keng-Pei Lin
- Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan

6. Cao L, You J, Song Y, Xu H, Jiang Z, Jiang G. Client-Oriented Blind Quality Metric for High Dynamic Range Stereoscopic Omnidirectional Vision Systems. Sensors (Basel) 2022;22:8513. [PMID: 36366211] [PMCID: PMC9655719] [DOI: 10.3390/s22218513]
Abstract
A high dynamic range (HDR) stereoscopic omnidirectional vision system can provide users with more realistic binocular and immersive perception, but the HDR stereoscopic omnidirectional image (HSOI) suffers distortions during encoding and visualization, making its quality evaluation more challenging. To solve this problem, this paper proposes a client-oriented blind HSOI quality metric based on visual perception. The proposed metric consists mainly of a monocular perception module (MPM) and a binocular perception module (BPM), which combine monocular/binocular, omnidirectional, and HDR/tone-mapping perception. The MPM extracts features from three aspects: global color distortion, symmetric/asymmetric distortion, and scene distortion. In the BPM, a binocular fusion map and a binocular difference map are generated by joint image filtering. Brightness segmentation is then performed on the binocular fusion image, and distinctive features are extracted from the segmented high-, low-, and middle-brightness regions. For the binocular difference map, natural scene statistics features are extracted from multi-coefficient derivative maps. Finally, feature screening removes redundancy between the extracted features. Experimental results on the HSOID database show that the proposed metric is generally better than representative quality metrics and is more consistent with subjective perception.
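
For flavor, a sketch of one classic natural-scene-statistics feature family (mean-subtracted contrast-normalized coefficients) of the kind such blind metrics build on; the paper's actual features differ:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_features(gray, sigma=7/6):
    # gray: 2-D float image in [0, 1].
    mu = gaussian_filter(gray, sigma)                    # local mean
    var = gaussian_filter(gray * gray, sigma) - mu * mu  # local variance
    mscn = (gray - mu) / (np.sqrt(np.abs(var)) + 1.0)    # normalized coefficients
    return mscn.mean(), mscn.var(), np.abs(mscn).mean()  # simple summary stats

feats = mscn_features(np.random.rand(128, 128))
```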

Affiliation(s)
- Liuyan Cao
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- Jihao You
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- Yang Song
- College of Science and Technology, Ningbo University, Ningbo 315300, China; Zhejiang Provincial United Key Laboratory of Embedded Systems, Hangzhou 310032, China
- Haiyong Xu
- Zhejiang Provincial United Key Laboratory of Embedded Systems, Hangzhou 310032, China; School of Mathematics and Statistics, Ningbo University, Ningbo 315211, China
- Zhidi Jiang
- College of Science and Technology, Ningbo University, Ningbo 315300, China
- Gangyi Jiang
- Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China; Zhejiang Provincial United Key Laboratory of Embedded Systems, Hangzhou 310032, China

7. Pan Y, Zhou W, Ye L, Yu L. HFFNet: hierarchical feature fusion network for blind binocular image quality prediction. Applied Optics 2022;61:7602-7607. [PMID: 36256359] [DOI: 10.1364/ao.465349]
Abstract
Compared with monocular images, scene discrepancies between the left- and right-view images impose additional challenges on visual quality prediction for binocular images. Herein, we propose a hierarchical feature fusion network (HFFNet) for blind binocular image quality prediction that handles scene discrepancies and uses multi-level fusion features from the left- and right-view images to reflect distortions in binocular images. Specifically, a feature extraction network based on MobileNetV2 extracts feature layers from distorted binocular images; low-level binocular fusion features (and, likewise, middle- and high-level ones) are then obtained by fusing the corresponding left and right monocular features with a feature gate module; further, three feature enhancement modules enrich the information of the extracted features at different levels. Finally, the feature maps obtained from the high-, middle-, and low-level fusion features are merged by a three-input feature fusion module. The proposed HFFNet provides better results, to the best of our knowledge, than existing methods on two benchmark datasets.
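
A minimal sketch of a gated left/right feature fusion like the feature gate module described above; the gating form is an assumption:

```python
import torch
import torch.nn as nn

class FeatureGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, left, right):
        g = self.gate(torch.cat([left, right], dim=1))  # per-pixel mixing weight
        return g * left + (1 - g) * right               # gated binocular fusion

fused = FeatureGate(32)(torch.randn(1, 32, 56, 56), torch.randn(1, 32, 56, 56))
```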

8. Varga D. A Human Visual System Inspired No-Reference Image Quality Assessment Method Based on Local Feature Descriptors. Sensors (Basel) 2022;22:6775. [PMID: 36146123] [PMCID: PMC9502000] [DOI: 10.3390/s22186775]
Abstract
Objective quality assessment of natural images plays a key role in many fields related to imaging and sensor technology. This paper therefore introduces an innovative quality-aware feature extraction method for no-reference image quality assessment (NR-IQA). Specifically, a sequence of HVS-inspired filters is applied to the color channels of an input image to enhance those statistical regularities to which the human visual system is sensitive. From the resulting feature maps, the statistics of a wide range of local feature descriptors are extracted to compile quality-aware features, since such descriptors treat images from the human visual system's point of view. To demonstrate the efficiency of the proposed method, it was compared with 16 state-of-the-art NR-IQA techniques on five large benchmark databases: CLIVE, KonIQ-10k, SPAQ, TID2013, and KADID-10k. The proposed method proved superior to the state of the art in terms of three different performance indices.
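
A small sketch of extracting statistics of one local descriptor (a uniform local binary pattern histogram) as a quality-aware feature; the paper's filters and descriptor set differ:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, p=8, r=1.0):
    # gray: 2-D grayscale image; "uniform" LBP yields values in [0, p+1].
    lbp = local_binary_pattern(gray, p, r, method="uniform")
    hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
    return hist   # a compact, quality-aware feature vector

feats = lbp_histogram(np.random.rand(64, 64))
```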

9. Zhu R. Research on the Evaluation of Moral Education Effectiveness and Student Behavior in Universities under the Environment of Big Data. Computational Intelligence and Neuroscience 2022;2022:2832661. [PMID: 35942466] [PMCID: PMC9356784] [DOI: 10.1155/2022/2832661]
Abstract
Traditional moral education evaluation relies on manual, subjective assessment by teachers and is therefore prone to error and bias. For a more objective evaluation, students' classroom behavior can be recognized and the effectiveness of moral education evaluated from it. Since classroom behavior is random and uncertain, accurate evaluation requires a large amount of behavioral data as the basis for analysis, together with techniques for filtering out the valuable information it contains. This paper proposes an improved graph convolutional network algorithm for studying student behavior, using video-based action recognition to improve the accuracy and quality of moral education evaluation in colleges and universities. First, multiple information streams related to skeleton joints are fused, reducing the number of network parameters and improving computing speed. Second, a spatiotemporal attention module based on non-local operations is constructed to focus on the most action-discriminative joints, improving recognition accuracy by reducing redundant information. Then, a spatiotemporal feature extraction module obtains the spatiotemporal association information of the joints of interest. Finally, action recognition is performed by a softmax layer. Experimental results show that the proposed action recognition algorithm is more accurate and can better support moral education evaluation.
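
A minimal sketch of a graph convolution over skeleton joints, the building block such an improved GCN rests on; the adjacency matrix and shapes are toy assumptions:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_ch, out_ch, adjacency: torch.Tensor):
        super().__init__()
        # Row-normalized adjacency of the joint graph (joints x joints).
        self.register_buffer("A", adjacency / adjacency.sum(dim=1, keepdim=True))
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):
        # x: (batch, joints, features); aggregate neighbors, then project.
        return torch.relu(self.proj(self.A @ x))

A = torch.eye(17) + torch.rand(17, 17).round()    # toy 17-joint skeleton graph
out = GraphConv(3, 64, A)(torch.randn(2, 17, 3))  # per-frame joint coordinates
```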

Affiliation(s)
- Rui Zhu
- Publicity Department, Shandong Management University, Jinan, Shandong 250000, China

10. BPG-Based Automatic Lossy Compression of Noisy Images with the Prediction of an Optimal Operation Existence and Its Parameters. Applied Sciences 2022;12:7555. [DOI: 10.3390/app12157555]
Abstract
As resolution improves, the size of modern remote sensing images increases, making it desirable to compress them, mostly with lossy compression techniques. Often the images to be compressed (or some component images of multichannel remote sensing data) are noisy. Lossy compression of such images has several peculiarities related to noise filtering effects and to evaluating the compression technique's performance. In particular, an optimal operation point (OOP) may exist at which, according to a given criterion (metric), the compressed image is closer to the corresponding noise-free (true) image than the uncompressed (original, noisy) image is. In such a case, it is reasonable to automatically compress an image of interest in the OOP neighborhood; but without the true image at one's disposal, it is impossible in practice to determine exactly whether an OOP exists. Here we show that simple and fast preliminary analysis and pre-training make it possible to predict the OOP's existence and the metric values at it with appropriate accuracy. The study is carried out for the better portable graphics (BPG) coder and additive white Gaussian noise, focusing mainly on one-component (grayscale) images. The results show that the improvement (reduction) in the quality metrics PSNR and PSNR-HVS-M can be predicted, which in turn allows decision-making about the existence or absence of an OOP. If an OOP is absent, a more "careful" compression is recommended. With such rules, compression can then be carried out automatically. Additionally, possible modifications for signal-dependent noise and for the joint compression of three-component images are considered, and the possible existence of an OOP in these cases is demonstrated.
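
For reference, a sketch of the PSNR criterion discussed above; at an OOP, PSNR computed against the true image peaks as compression strength varies:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    # Peak signal-to-noise ratio in dB between two images of equal shape.
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak * peak / mse)

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
print(psnr(ref, ref))  # inf: identical images
```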

11. Alrashedy HHN, Almansour AF, Ibrahim DM, Hammoudeh MAA. BrainGAN: Brain MRI Image Generation and Classification Framework Using GAN Architectures and CNN Models. Sensors (Basel) 2022;22:4297. [PMID: 35684918] [PMCID: PMC9185441] [DOI: 10.3390/s22114297]
Abstract
Deep learning models have been applied in many domains, but adapting them to sensitive areas such as medical imaging is still required; given the time constraints of clinical work, a high level of accuracy is what assures trustworthiness. Privacy concerns, moreover, often prevent machine learning applications in the medical field from using real patient data: the scarcity of brain MRI images, for example, makes image-based brain tumor classification difficult. This challenge can be addressed with Generative Adversarial Network (GAN)-based augmentation techniques; Deep Convolutional GAN (DCGAN) and Vanilla GAN are two GAN architectures used for image generation. In this paper, a framework denoted BrainGAN for generating and classifying brain MRI images using GAN architectures and deep learning models is proposed, along with an automatic way to check that the generated images are satisfactory. It uses three models: CNN, MobileNetV2, and ResNet152V2. The deep transfer models were trained on images produced by Vanilla GAN and DCGAAN and then evaluated on a test set composed of real brain MRI images. The experiments showed that the ResNet152V2 model outperformed the other two, achieving 99.09% accuracy, 99.12% precision, 99.08% recall, 99.51% area under the curve (AUC), and 0.196 loss on the brain MRI images generated by the DCGAN architecture.
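
A compact sketch of a DCGAN-style generator of the kind used for augmentation above; the depth, channel widths, and 64x64 grayscale output are assumptions:

```python
import torch
import torch.nn as nn

# Maps a 100-dim latent vector up to a 64x64 single-channel image.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # 16x16
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),     # 32x32
    nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),                          # 64x64
)
fake = generator(torch.randn(8, 100, 1, 1))   # a batch of synthetic slices
```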

Affiliation(s)
- Halima Hamid N. Alrashedy
- Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia
- Atheer Fahad Almansour
- Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia
- Dina M. Ibrahim
- Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia; Computers and Control Engineering Department, Faculty of Engineering, Tanta University, Tanta 31733, Egypt
- Mohammad Ali A. Hammoudeh
- Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia

12. De Falco I, De Pietro G, Sannino G. A Two-Step Approach for Classification in Alzheimer's Disease. Sensors (Basel) 2022;22:3966. [PMID: 35684587] [PMCID: PMC9183018] [DOI: 10.3390/s22113966]
Abstract
The classification of images is highly important in medicine, and deep learning methodologies show excellent performance with regard to accuracy. Their drawback is that they are black boxes, so users receive no explanation of the reasons underlying their choices. In the medical domain, this lack of transparency raises concerns among practitioners and leads to resistance to the use of deep learning tools. To overcome this problem, a different machine learning approach to image classification is used here, based on interpretability concepts and an evolutionary algorithm. It applies two steps in succession. The first receives a set of images in the input and performs image filtering on them, generating a numerical data set. The second is a classifier whose kernel is an evolutionary algorithm, which simultaneously classifies and automatically extracts explicit knowledge as a set of IF-THEN rules. The method is investigated on a data set of MRI brain imagery referring to Alzheimer's disease, from which a two-class data set (non-demented and moderate demented) and a three-class data set (non-demented, mild demented, and moderate demented) are extracted. The methodology shows good results in terms of accuracy (100% for the best run on the two-class problem and 91.49% for the best run on the three-class one), F_score (1.0000 and 0.9149, respectively), and Matthews Correlation Coefficient (1.0000 and 0.8763, respectively). To ascertain the quality of these results, they are contrasted against those from a wide set of well-known classifiers; in both problems the methodology achieves the best results in terms of accuracy and F_score, while for the Matthews Correlation Coefficient it is best on the two-class problem and second best on the three-class one.
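
A toy sketch of what interpretable IF-THEN rule classification looks like as the output of such an evolutionary step; the feature names and thresholds are invented purely for illustration:

```python
def classify(features):
    # features: dict of numeric image-derived measurements (names are hypothetical).
    if features["ventricle_area"] > 0.18 and features["cortex_thickness"] < 2.1:
        return "moderate demented"
    if features["ventricle_area"] > 0.12:
        return "mild demented"
    return "non-demented"

print(classify({"ventricle_area": 0.2, "cortex_thickness": 1.9}))
```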

13. Learning-Based Text Image Quality Assessment with Texture Feature and Embedding Robustness. Electronics 2022;11:1611. [DOI: 10.3390/electronics11101611]
Abstract
The quality of the input text image clearly affects the output of a scene text recognition (STR) system; however, because the main content of a text image is a sequence of characters carrying semantic information, effectively assessing text image quality remains a research challenge. Text image quality assessment (TIQA) can help pick hard samples, leading to more robust STR systems and to recognition-oriented text image restoration. In this paper, arguing that text image quality derives from character-level texture features and embedding robustness, we propose a learning-based fine-grained, sharp, and recognizable text image quality assessment method (FSR-TIQA), which is, to our knowledge, the first TIQA scheme. To overcome the difficulty of locating characters in a text image, an attention-based recognizer is used to generate the character embedding and character image. We use the similarity distribution distance to evaluate character embedding robustness between the intra-class and inter-class similarity distributions, and the Haralick feature to reflect the clarity of the character region's texture. A quality score network is then designed under a label-free training scheme to normalize the texture feature and output the quality score. Extensive experiments indicate that FSR-TIQA discriminates significantly between text images of different quality on benchmark and TextZoom datasets. Our method shows good potential for analyzing dataset distributions and guiding dataset collection.
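
A sketch of Haralick-style texture features computed from a gray-level co-occurrence matrix over a character crop, as referenced above; distances, angles, and the property set are assumptions:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_features(char_img):
    # char_img: uint8 grayscale crop of a single character region.
    glcm = graycomatrix(char_img, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return {p: graycoprops(glcm, p)[0, 0]
            for p in ("contrast", "homogeneity", "energy", "correlation")}

feats = texture_features(np.random.randint(0, 256, (32, 32), dtype=np.uint8))
```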

14. Si J, Huang B, Yang H, Lin W, Pan Z. A No-Reference Stereoscopic Image Quality Assessment Network Based on Binocular Interaction and Fusion Mechanisms. IEEE Transactions on Image Processing 2022;31:3066-3080. [PMID: 35394908] [DOI: 10.1109/tip.2022.3164537]
Abstract
In a society full of stereoscopic images, how to assess the visual quality of 3D images has attracted increasing attention in the field of Stereoscopic Image Quality Assessment (SIQA). Compared with 2D-IQA, SIQA is more challenging because complicated features of the Human Visual System (HVS), such as binocular interaction and binocular fusion, must be considered. In this paper, considering both mechanisms, a hierarchical no-reference stereoscopic image quality assessment network (StereoIF-Net) is proposed to simulate the whole quality perception of 3D visual signals in the human cortex, with two key modules: Binocular Interaction Modules (BIMs) and a Binocular Fusion Module (BFM). The BIMs simulate binocular interaction in the V2-V5 visual cortex regions, using a novel cross convolution to explore the interaction details in each region; different output channel numbers imitate the various receptive fields of V2-V5. The BFM, with automatically learned weights, models binocular fusion of the HVS in higher cortex layers. Verification experiments are conducted on the LIVE 3D, IVC, and Waterloo-IVC SIQA databases, with the PLCC, SROCC, and RMSE indices used to evaluate the consistency between StereoIF-Net and the HVS. StereoIF-Net achieves almost the best results among advanced SIQA methods: its metric values are the best on LIVE 3D, IVC, and WIVC-I, and second best on WIVC-II.
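
A minimal sketch of binocular fusion with automatically learned weights, in the spirit of the BFM described above; the exact fusion form is an assumption:

```python
import torch
import torch.nn as nn

class LearnedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One learned mixing logit per channel.
        self.w = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, left, right):
        alpha = torch.sigmoid(self.w)            # mixing weight in (0, 1)
        return alpha * left + (1 - alpha) * right

out = LearnedFusion(64)(torch.randn(1, 64, 28, 28), torch.randn(1, 64, 28, 28))
```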

15. Varga D. No-Reference Video Quality Assessment Using Multi-Pooled, Saliency Weighted Deep Features and Decision Fusion. Sensors (Basel) 2022;22:2209. [PMID: 35336380] [PMCID: PMC8948651] [DOI: 10.3390/s22062209]
Abstract
With the constantly growing popularity of video-based services and applications, no-reference video quality assessment (NR-VQA) has become a very active research topic. Over the years, many different approaches have been introduced to evaluate the perceptual quality of digital videos, and with the advent of large benchmark video quality assessment databases, deep learning has attracted significant attention in this field in recent years. This paper presents a novel deep learning-based approach to NR-VQA that relies on a set of pre-trained convolutional neural networks (CNNs) applied in parallel to characterize potential image and video distortions from multiple perspectives. Specifically, temporally pooled and saliency-weighted video-level deep features are extracted with the help of the pre-trained CNNs and mapped onto perceptual quality scores independently of each other. Finally, the quality scores from the different regressors are fused to obtain the perceptual quality of a given video sequence. Extensive experiments demonstrate that the proposed method sets a new state of the art on two large benchmark video quality assessment databases with authentic distortions. Moreover, the results underline that decision fusion over multiple deep architectures can significantly benefit NR-VQA.
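
A sketch of the final decision-fusion step: quality scores predicted independently from several deep feature sets are combined into one. The weighted average here is an assumption about the fusion rule:

```python
import numpy as np

def fuse_scores(scores, weights=None):
    # scores: per-regressor predicted quality scores for one video.
    scores = np.asarray(scores, dtype=float)
    w = np.ones_like(scores) if weights is None else np.asarray(weights, float)
    return float(np.sum(w * scores) / np.sum(w))   # weighted-average fusion

print(fuse_scores([3.8, 4.1, 3.9]))   # fused perceptual quality estimate
```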

16. Messai O, Chetouani A, Hachouf F, Ahmed Seghir Z. 3D saliency guided deep quality predictor for no-reference stereoscopic images. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.002]

17. High-Frequency Ultrasound Dataset for Deep Learning-Based Image Quality Assessment. Sensors (Basel) 2022;22:1478. [PMID: 35214381] [PMCID: PMC8875486] [DOI: 10.3390/s22041478]
Abstract
This study addresses high-frequency ultrasound image quality assessment for computer-aided diagnosis of skin. In recent decades, high-frequency ultrasound imaging has opened up new opportunities in dermatology, utilizing the most recent deep learning-based algorithms for automated image analysis. An individual dermatological examination contains a single image, a couple of images, or an image series acquired during probe movement. The estimated skin parameters may depend on the probe position, orientation, or acquisition setup, so the more images analyzed, the more precise the obtained measurements; for automated measurement, the best choice is therefore to acquire an image series and analyze its parameters statistically. However, besides correctly acquired images, the resulting series contains plenty of non-informative data: images with artifacts or noise, and images captured at moments when the ultrasound probe had no contact with the patient's skin. All of these influence further analysis, leading to misclassification or incorrect image segmentation, so an automated image selection step is crucial. To meet this need, we collected and shared 17,425 high-frequency images of facial skin from 516 measurements of 44 patients, with each image annotated as correct or not by two experts. The proposed framework uses a deep convolutional neural network followed by a fuzzy reasoning system to automatically assess the quality of the acquired data. Different approaches to binary and multi-class image analysis, based on the VGG-16 model, were developed and compared; the best classification results reach 91.7% accuracy for the binary analysis and 82.3% for the multi-class one.
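
A sketch of the automated selection step: a trained classifier screens an acquired series and keeps only frames judged informative. The toy model and threshold below stand in for the paper's VGG-16 plus fuzzy reasoning system:

```python
import torch

@torch.no_grad()
def select_informative(frames, model, threshold=0.5):
    # frames: (n, 1, 224, 224) tensor of ultrasound images.
    probs = torch.sigmoid(model(frames)).squeeze(-1)   # P(image is "correct")
    return frames[probs >= threshold]                  # keep informative frames

# Placeholder model; in practice this would be the trained CNN classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(224 * 224, 1))
kept = select_informative(torch.randn(10, 1, 224, 224), model)
```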

18. Shen L, Chen X, Pan Z, Fan K, Li F, Lei J. No-reference stereoscopic image quality assessment based on global and local content characteristics. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.10.024]