1. Ma L, Zhao Y, Peng P, Tian Y. Sensitivity Decouple Learning for Image Compression Artifacts Reduction. IEEE Transactions on Image Processing 2024; 33:3620-3633. [PMID: 38787669] [DOI: 10.1109/tip.2024.3403034]
Abstract
With the benefit of deep learning techniques, recent research has made significant progress in image compression artifacts reduction. Despite their improved performance, prevailing methods focus only on learning a mapping from the compressed image to the original one, ignoring the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing tasks. Different from these methods, we propose to decouple the intrinsic attributes into two complementary features for artifacts reduction: compression-insensitive features, which regularize the high-level semantic representations during training, and compression-sensitive features, which are aware of the compression degree. To achieve this, we first employ adversarial training to regularize the compressed and original encoded features so that high-level semantics are retained, and we then develop a compression quality-aware feature encoder for the compression-sensitive features. Based on these dual complementary features, we propose a Dual Awareness Guidance Network (DAGN) that uses the awareness features as transformation guidance during the decoding phase. In DAGN, we develop a cross-feature fusion module that maintains the consistency of compression-insensitive features by fusing them into the artifacts reduction baseline. Our method achieves an average 2.06 dB PSNR gain on BSD500, outperforming state-of-the-art methods, and requires only 29.7 ms to process one image. Experimental results on LIVE1 and LIU4K further demonstrate the efficiency, effectiveness, and superiority of the proposed method in terms of quantitative metrics, visual quality, and downstream machine vision tasks.
2. Ali AM, Benjdira B, Koubaa A, El-Shafai W, Khan Z, Boulila W. Vision Transformers in Image Restoration: A Survey. Sensors (Basel) 2023; 23:2385. [PMID: 36904589] [PMCID: PMC10006889] [DOI: 10.3390/s23052385]
Abstract
The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a long time, convolutional neural networks (CNNs) predominated in most computer vision tasks. Now, both CNNs and ViTs are efficient approaches that demonstrate powerful capabilities for restoring a better version of an image given in a low-quality format. In this study, the efficiency of ViTs in image restoration is studied extensively. The ViT architectures are classified for every image restoration task. Seven tasks are considered: image super-resolution, image denoising, general image enhancement, JPEG compression artifact reduction, image deblurring, removal of adverse weather conditions, and image dehazing. The outcomes, advantages, limitations, and possible areas for future research are detailed. Overall, it is noted that incorporating ViTs in new image restoration architectures is becoming the rule. This is due to several advantages over CNNs, such as better efficiency (especially when more data are fed to the network), robustness in feature extraction, and a feature learning approach that better captures the variances and characteristics of the input. Nevertheless, some drawbacks exist, such as the need for more data to show the benefits of ViTs over CNNs, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and a lack of interpretability. These drawbacks represent future research directions that should be targeted to increase the efficiency of ViTs in the image restoration domain.
Affiliation(s)
- Anas M. Ali
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
- Bilel Benjdira
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- SE & ICT Laboratory, LR18ES44, ENICarthage, University of Carthage, Tunis 1054, Tunisia
- Anis Koubaa
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Walid El-Shafai
- Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
- Security Engineering Laboratory, Computer Science Department, Prince Sultan University, Riyadh 11586, Saudi Arabia
- Zahid Khan
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Wadii Boulila
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- RIADI Laboratory, University of Manouba, Manouba 2010, Tunisia
3. Dinesh C, Cheung G, Bajic IV. Point Cloud Sampling via Graph Balancing and Gershgorin Disc Alignment. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:868-886. [PMID: 35025739] [DOI: 10.1109/tpami.2022.3143089]
Abstract
Point cloud (PC), a collection of discrete geometric samples of a 3D object's surface, is typically large, which entails expensive subsequent operations. Thus, PC sub-sampling is of practical importance. Previous model-based sub-sampling schemes are ad hoc in design and do not preserve the overall shape sufficiently well, while previous data-driven schemes are trained for specific pre-determined input PC sizes and sub-sampling rates and thus do not generalize well. Leveraging advances in graph sampling, we propose a fast PC sub-sampling algorithm of linear time complexity that chooses a 3D point subset while minimizing a global reconstruction error. Specifically, to articulate a sampling objective, we first assume a super-resolution (SR) method based on feature graph Laplacian regularization (FGLR) that reconstructs the original high-resolution PC, given points chosen by a sampling matrix H. We prove that minimizing a worst-case SR reconstruction error is equivalent to maximizing the smallest eigenvalue λ_min of the matrix H^T H + μL, where L is a symmetric, positive semi-definite matrix derived from a neighborhood graph connecting the 3D points. To arrive at a fast algorithm, instead of maximizing λ_min, we maximize a lower bound λ⁻_min(H^T H + μL) via the selection of H; this translates to a graph sampling problem for a signed graph G with self-loops specified by the graph Laplacian L. We tackle this general graph sampling problem in three steps. First, we approximate G with a balanced graph G_B specified by Laplacian L_B. Second, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we perform a similarity transform L_p = S L_B S^{-1}, so that all Gershgorin disc left-ends of L_p are aligned exactly at λ_min(L_B). Finally, we choose samples on G_B using a previous graph sampling algorithm to maximize λ⁻_min(H^T H + μL_p) in linear time. Experimental results show that 3D points chosen by our algorithm outperformed competing schemes both numerically and visually in reconstruction quality.
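The Gershgorin machinery in this abstract is easy to make concrete. Below is a minimal Python sketch (illustrative only, not the authors' GDPA algorithm): by the Gershgorin circle theorem, the smallest disc left-end lower-bounds the smallest eigenvalue of a symmetric matrix, and sampling a node (adding to its diagonal via H^T H) shifts that node's disc to the right.

```python
def gershgorin_left_ends(M):
    """Left-end of row i's Gershgorin disc: M[i][i] minus the row's off-diagonal radius."""
    n = len(M)
    return [M[i][i] - sum(abs(M[i][j]) for j in range(n) if j != i) for i in range(n)]

def gct_lower_bound(M):
    """Gershgorin circle theorem lower bound on the smallest eigenvalue of symmetric M."""
    return min(gershgorin_left_ends(M))

# Path-graph Laplacian on 3 nodes: every disc left-end sits at 0 = lambda_min(L).
L = [[ 1.0, -1.0,  0.0],
     [-1.0,  2.0, -1.0],
     [ 0.0, -1.0,  1.0]]
print(gct_lower_bound(L))       # 0.0

# Sampling node 0 adds 1 to that diagonal entry (H^T H with mu = 1),
# shifting its disc right while the unsampled discs stay put.
M = [row[:] for row in L]
M[0][0] += 1.0
print(gershgorin_left_ends(M))  # [1.0, 0.0, 0.0]
```

GDPA's similarity transform S L_B S^{-1} exists precisely to align all left-ends so this bound becomes tight; the sketch omits that step.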
4. Fu X, Wang M, Cao X, Ding X, Zha ZJ. A Model-Driven Deep Unfolding Method for JPEG Artifacts Removal. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6802-6816. [PMID: 34081590] [DOI: 10.1109/tnnls.2021.3083504]
Abstract
Deep learning-based methods have achieved notable progress in removing the blocking artifacts caused by lossy JPEG compression. However, most deep learning-based methods handle this task by designing black-box network architectures that directly learn the relationships between the compressed images and their clean versions. Such architectures lack sufficient interpretability, which limits further improvements in deblocking performance. To address this issue, we propose a model-driven deep unfolding method for JPEG artifacts removal with interpretable network structures. First, we build a maximum a posteriori (MAP) model for deblocking using convolutional dictionary learning and design an iterative optimization algorithm using proximal operators. Second, we unfold this iterative algorithm into a learnable deep network structure, where each module corresponds to a specific operation of the iterative algorithm. In this way, our network inherits both the powerful modeling ability of data-driven deep learning methods and the interpretability of traditional model-driven methods. By training the proposed network end-to-end, all learnable modules can be automatically explored to characterize the representations of both JPEG artifacts and image content. Experiments on synthetic and real-world datasets show that our method generates competitive or even better deblocking results compared with state-of-the-art methods, both quantitatively and qualitatively.
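The unfolding idea can be shown in miniature. The sketch below (a toy scalar ISTA analogue with assumed names, not the paper's convolutional-dictionary MAP solver) turns each proximal-gradient iteration into one "network stage"; in a real deep-unfolding model the per-stage thresholds and step sizes would be learned end-to-end.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x| (the sparsity prior's shrinkage step)."""
    return max(x - t, 0.0) if x > 0 else min(x + t, 0.0)

def unfolded_ista(y, thresholds, step=0.5):
    """Run one unfolded stage per entry of `thresholds`: a gradient step on the
    data-fidelity term followed by the proximal (shrinkage) step."""
    x = 0.0
    for t in thresholds:          # each loop body = one interpretable network module
        x = soft_threshold(x + step * (y - x), t)
    return x

print(unfolded_ista(2.0, [0.5, 0.25, 0.1]))  # 1.4
```

Because each stage mirrors one algorithmic operation, the learned parameters retain a clear meaning, which is the interpretability argument made in the abstract.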
5. Hum YC, Tee YK, Yap WS, Mokayed H, Tan TS, Salim MIM, Lai KW. A contrast enhancement framework under uncontrolled environments based on just noticeable difference. Signal Processing: Image Communication 2022; 103:116657. [DOI: 10.1016/j.image.2022.116657]
6. Niu Y, Liu C, Ma M, Li F, Chen Z, Shi G. NL-CALIC Soft Decoding Using Strict Constrained Wide-Activated Recurrent Residual Network. IEEE Transactions on Image Processing 2022; 31:1243-1257. [PMID: 34951841] [DOI: 10.1109/tip.2021.3136608]
Abstract
In this work, we propose a normalized tanh activation strategy and a lightweight wide-activated recurrent structure to address three key challenges of the soft decoding of near-lossless codes: (1) how to impose an effective strict peak absolute error (PAE) bound on the network; (2) an end-to-end solution that is suitable for different quantization steps (compression ratios); and (3) a simple structure that favors GPU and FPGA implementation. To this end, we propose a Wide-activated Recurrent structure with a normalized tanh activation strategy for Soft Decoding (WRSD). Experiments demonstrate the effectiveness of the proposed technique: WRSD outperforms state-of-the-art soft decoders with fewer than 5% of their parameters, and every computation node of WRSD requires less than 64 KB of parameter storage, which can easily be cached by most current consumer-level GPUs. Source code is available at https://github.com/dota-109/WRSD.
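The strict PAE constraint admits a one-line construction. The sketch below (hypothetical function names; the exact normalization in WRSD may differ) scales a tanh by the error bound delta, so the refined reconstruction provably stays inside the near-lossless band regardless of what the network predicts.

```python
import math

def pae_bounded_refine(decoded, raw_residual, delta):
    """tanh squashes the raw residual into (-1, 1); scaling by delta guarantees
    |output - decoded| < delta, i.e. a strict peak-absolute-error bound."""
    return decoded + delta * math.tanh(raw_residual)

# Even an extreme residual prediction cannot escape the band [8, 12] for delta = 2.
print(pae_bounded_refine(10.0, 100.0, 2.0))   # ~12.0
print(pae_bounded_refine(10.0, -100.0, 2.0))  # ~8.0
```

Because the bound holds for any delta, one model can in principle serve different quantization steps, matching the end-to-end, ratio-agnostic goal stated above.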
8. Franzoni V, Biondi G, Milani A. Emotional sounds of crowds: spectrogram-based analysis using deep learning. Multimedia Tools and Applications 2020; 79:36063-36075. [PMID: 32837250] [PMCID: PMC7429201] [DOI: 10.1007/s11042-020-09428-x]
Abstract
Crowds express emotions as a collective individual, which is evident from the sounds a crowd produces in particular events, e.g., collective booing, laughing, or cheering in sports matches, movies, theaters, concerts, political demonstrations, and riots. A critical question concerning the innovative concept of crowd emotions is whether the emotional content of crowd sounds can be characterized by frequency-amplitude features, using analysis techniques similar to those applied to individual voices, where deep learning classification is applied to spectrogram images derived from sound transformations. In this work, we present a technique based on the generation of sound spectrograms from fragments of fixed length, extracted from original audio clips recorded in high-attendance events, where the crowd acts as a collective individual. Transfer learning techniques are used on a convolutional neural network, pre-trained on low-level features using the well-known ImageNet dataset of visual knowledge. The original sound clips are filtered and normalized in amplitude for correct spectrogram generation, on which we fine-tune the domain-specific features. Experiments on the final trained convolutional neural network show the promising performance of the proposed model in classifying the emotions of the crowd.
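The fixed-length-fragment and spectrogram steps can be sketched with a plain DFT (a stdlib-only toy under assumed names; the actual pipeline adds filtering, amplitude normalization, and log/mel scaling before the CNN).

```python
import cmath

def frame_signal(x, frame_len, hop):
    """Cut an audio clip into fixed-length, overlapping fragments."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def dft_magnitudes(frame):
    """Magnitude spectrum of one fragment (non-negative frequency bins only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(x, frame_len=8, hop=4):
    """One magnitude spectrum per fragment: the rows of the spectrogram image."""
    return [dft_magnitudes(f) for f in frame_signal(x, frame_len, hop)]

# A constant signal puts all its energy in the DC bin of every fragment.
s = spectrogram([1.0] * 16)
print(len(s))  # 3
```

Stacking the rows of `s` as pixel intensities gives the image that the pre-trained CNN is fine-tuned on.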
Affiliation(s)
- Valentina Franzoni
- Department of Mathematics and Computer Science, University of Perugia, Perugia, Italy
- Giulio Biondi
- Department of Mathematics and Computer Science, University of Florence, Florence, Italy
- Alfredo Milani
- Department of Mathematics and Computer Science, University of Perugia, Perugia, Italy
9. Dinesh C, Cheung G, Bajic IV. Point Cloud Denoising via Feature Graph Laplacian Regularization. IEEE Transactions on Image Processing 2020; 29:4143-4158. [PMID: 32012012] [DOI: 10.1109/tip.2020.2969052]
Abstract
A point cloud is a collection of 3D coordinates that are discrete geometric samples of an object's 2D surfaces. Imperfection in the acquisition process means that point clouds are often corrupted with noise. Building on recent advances in graph signal processing, we design local algorithms for 3D point cloud denoising. Specifically, we design a signal-dependent feature graph Laplacian regularizer (SDFGLR) that assumes surface normals computed from point coordinates are piecewise smooth with respect to a signal-dependent graph Laplacian matrix. Using SDFGLR as a signal prior, we formulate an optimization problem with a general ℓp-norm fidelity term that can explicitly remove only two types of additive noise: small but non-sparse noise like Gaussian (using the ℓ2 fidelity term) and large but sparser noise like Laplacian (using the ℓ1 fidelity term). To establish a linear relationship between normals and 3D point coordinates, we first perform bipartite graph approximation to divide the point cloud into two disjoint node sets (red and blue). We then optimize the red and blue nodes' coordinates alternately. For the ℓ2-norm fidelity term, we iteratively solve an unconstrained quadratic programming (QP) problem, efficiently computed using conjugate gradient with a bounded condition number to ensure numerical stability. For the ℓ1-norm fidelity term, we iteratively minimize an ℓ1-ℓ2 cost function using accelerated proximal gradient (APG), where a good step size is chosen via Lipschitz continuity analysis. Finally, we propose simple mean and median filters for flat patches of a given point cloud to estimate the noise variance given the noise type, which in turn is used to compute a weight parameter trading off the fidelity term and the signal prior in the problem formulation. Extensive experiments show state-of-the-art denoising performance among local methods using our proposed algorithms.
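For the ℓ2-norm fidelity term, the optimization reduces to a linear system that conjugate gradient solves without forming a matrix inverse. The sketch below (a minimal scalar-signal analogue with assumed names, not the authors' normal-domain SDFGLR) solves x* = argmin ||x - y||^2 + mu * x^T L x, i.e. (I + mu L) x = y:

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def conjugate_gradient(A, b, iters=100, tol=1e-20):
    """Solve A x = b for symmetric positive-definite A without inverting A."""
    x = [0.0] * len(b)
    r = [bi - vi for bi, vi in zip(b, matvec(A, x))]
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        Ap = matvec(A, p)
        alpha = rs / sum(pi * qi for pi, qi in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def glr_denoise(y, L, mu):
    """Graph Laplacian regularized denoising: solve (I + mu * L) x = y."""
    n = len(y)
    A = [[(1.0 if i == j else 0.0) + mu * L[i][j] for j in range(n)] for i in range(n)]
    return conjugate_gradient(A, y)

# 3-node path graph; a large mu pulls the noisy spike toward the graph-smooth mean.
L = [[ 1.0, -1.0,  0.0],
     [-1.0,  2.0, -1.0],
     [ 0.0, -1.0,  1.0]]
smoothed = glr_denoise([0.0, 3.0, 0.0], L, 1000.0)
```

Since the constant vector lies in L's nullspace, the total mass of the signal is preserved while the regularizer smooths the spike, which mirrors why such priors preserve shape while removing noise.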
10. Liu X, Zhai D, Chen R, Ji X, Zhao D, Gao W. Depth Super-Resolution via Joint Color-Guided Internal and External Regularizations. IEEE Transactions on Image Processing 2019; 28:1636-1645. [PMID: 30334757] [DOI: 10.1109/tip.2018.2875506]
Abstract
Depth information is widely used in many real-world applications. However, due to limitations of depth sensing technology, the captured depth map in practice usually has a much lower resolution than its color image counterpart. In this paper, we propose to combine an internal smoothness prior and an external gradient consistency constraint in the graph domain for depth super-resolution. On one hand, a new graph Laplacian regularizer is proposed to preserve the inherent piecewise smooth characteristic of depth, which has desirable filtering properties. A specific weight matrix of the respective graph is defined to make full use of the information in both the depth and the corresponding guidance image. On the other hand, inspired by the observation that the gradient of depth is small except at edges separating regions, we introduce a graph gradient consistency constraint to enforce that the graph gradient of depth is close to the thresholded gradient of the guidance. We reinterpret the gradient thresholding model as variational optimization with a sparsity constraint. In this way, we remedy the problem of structure discrepancy between depth and guidance. Finally, the internal and external regularizations are cast into a unified optimization framework, which can be efficiently solved by ADMM. Experimental results demonstrate that our method outperforms the state-of-the-art with respect to both objective and subjective quality evaluations.
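The external constraint's key step, thresholding the guidance gradient so that only genuine depth edges survive, can be shown in 1-D (an illustrative toy with assumed names; the paper applies it to graph gradients of a 2-D guidance image):

```python
def thresholded_gradient(guide, tau):
    """Forward differences of the guidance signal; gradients weaker than tau are
    zeroed (the sparsity constraint), leaving only likely depth edges."""
    g = [guide[i + 1] - guide[i] for i in range(len(guide) - 1)]
    return [gi if abs(gi) > tau else 0.0 for gi in g]

# Fine color texture (steps of 0.1) is suppressed; the depth discontinuity survives.
edges = thresholded_gradient([0.0, 0.1, 0.2, 5.0, 5.1], 1.0)
```

Forcing the depth gradient toward this sparse target is how the method avoids copying color texture into the upsampled depth map.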
11. Hu W, Fu Z, Guo Z. Local Frequency Interpretation and Non-Local Self-Similarity on Graph for Point Cloud Inpainting. IEEE Transactions on Image Processing 2019; 28:4087-4100. [PMID: 30908221] [DOI: 10.1109/tip.2019.2906554]
Abstract
As 3D scanning devices and depth sensors mature, point clouds have attracted increasing attention as a format for 3D object representation, with applications in various fields such as tele-presence, navigation, and heritage reconstruction. However, point clouds usually exhibit holes of missing data, mainly due to the limitations of acquisition techniques and complicated structure. Further, point clouds are defined on irregular non-Euclidean domains, which is challenging to address, especially with conventional signal processing tools. Hence, leveraging recent advances in graph signal processing, we propose an efficient point cloud inpainting method exploiting both the local smoothness and the non-local self-similarity in point clouds. Specifically, we first propose a frequency interpretation in the graph nodal domain, based on which we derive the smoothing and denoising properties of a graph-signal smoothness prior in order to describe the local smoothness of point clouds. Secondly, we explore the characteristics of non-local self-similarity by globally searching for the area most similar to the missing region. The similarity metric between two areas is defined based on the direct component and the anisotropic graph total variation of the normals in each area. Finally, we formulate the hole-filling step as an optimization problem based on the selected most-similar area, regularized by the graph-signal smoothness prior. Besides, we propose voxelization and automatic hole detection methods for the point cloud prior to inpainting. Experimental results show that the proposed approach significantly outperforms four competing methods, both in objective and subjective quality.
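The global non-local search can be sketched in 1-D, substituting plain sum-of-squared differences for the paper's similarity metric (direct component plus anisotropic graph total variation of normals), an assumption made purely for illustration:

```python
def find_most_similar(template, signal):
    """Global non-local search: start index of the window minimizing SSD
    to the template (the context surrounding the hole)."""
    n, m = len(signal), len(template)
    best, best_cost = 0, float("inf")
    for s in range(n - m + 1):
        cost = sum((signal[s + i] - template[i]) ** 2 for i in range(m))
        if cost < best_cost:
            best, best_cost = s, cost
    return best

# The region around a hole is matched against the rest of the data.
print(find_most_similar([1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 2.0, 3.0, 9.0]))  # 2
```

The selected best-matching region then seeds the smoothness-regularized hole-filling optimization described above.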