1
Deng X, Zhang C, Jiang L, Xia J, Xu M. DeepSN-Net: Deep Semi-Smooth Newton Driven Network for Blind Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:2632-2646. [PMID: 40030883] [DOI: 10.1109/tpami.2024.3525089]
Abstract
The deep unfolding network represents a promising research avenue in image restoration. However, most current deep unfolding methods are anchored in first-order optimization algorithms, which suffer from slow convergence and unsatisfactory learning efficiency. In this paper, to address this issue, we first formulate an improved second-order semi-smooth Newton (ISN) algorithm, transforming the original nonlinear equations into an optimization problem amenable to network implementation. After that, we propose an innovative network architecture based on the ISN algorithm for blind image restoration, namely DeepSN-Net. To the best of our knowledge, DeepSN-Net is the first successful endeavor to design a second-order deep unfolding network for image restoration, filling a gap in this area. Furthermore, it offers several distinct advantages: 1) DeepSN-Net provides a unified framework for a variety of image restoration tasks in both synthetic and real-world contexts, without imposing constraints on the degradation conditions. 2) The network architecture is meticulously aligned with the ISN algorithm, ensuring that each module possesses robust physical interpretability. 3) The network exhibits high learning efficiency, superior restoration accuracy, and good generalization ability across 11 datasets on three typical restoration tasks. The success of DeepSN-Net on image restoration may inspire subsequent work centered on second-order optimization algorithms, which would benefit the community.
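For context, a generic semi-smooth Newton iteration for a nonsmooth system F(x) = 0 takes a step with an element of the Clarke generalized Jacobian. This is the standard second-order template that such an unfolding scheme builds on, not the paper's specific ISN update:

```latex
% Generic semi-smooth Newton step for F(x) = 0; H_k is an element of the Clarke
% generalized Jacobian \partial F at x^{(k)}. The paper's ISN algorithm is a modified,
% network-friendly variant of this template, not this exact update.
x^{(k+1)} = x^{(k)} - H_k^{-1} F\!\left(x^{(k)}\right), \qquad H_k \in \partial F\!\left(x^{(k)}\right).
```

Because F is only semi-smooth, H_k is drawn from the generalized Jacobian rather than the classical one; the abstract states that the paper reformulates the resulting equations as an optimization problem so the step can be realized by network layers.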
2
Zhong Z, Liu X, Jiang J, Zhao D, Ji X. Deep Attentional Guided Image Filtering. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12236-12250. [PMID: 37015130] [DOI: 10.1109/tnnls.2023.3253472]
Abstract
The guided filter is a fundamental tool in computer vision and computer graphics that aims to transfer structure information from the guide image to the target image. Most existing methods construct filter kernels from the guidance alone without considering the mutual dependency between the guidance and the target. However, since the two images typically contain significantly different edges, simply transferring all structural information from the guide to the target results in various artifacts. To cope with this problem, we propose an effective framework named deep attentional guided image filtering, whose filtering process fully integrates the complementary information contained in both images. Specifically, we propose an attentional kernel learning module to generate dual sets of filter kernels from the guidance and the target, and then adaptively combine them by modeling the pixel-wise dependency between the two images. Meanwhile, we propose a multiscale guided image filtering module to progressively generate the filtering result with the constructed kernels in a coarse-to-fine manner. Correspondingly, a multiscale fusion strategy is introduced to reuse the intermediate results in the coarse-to-fine process. Extensive experiments show that the proposed framework compares favorably with the state-of-the-art methods in a wide range of guided image filtering applications, such as guided super-resolution (SR), cross-modality restoration, and semantic segmentation. Moreover, our scheme achieved first place in the real depth map SR challenge held at ACM ICMR 2021. The code can be found at https://github.com/zhwzhong/DAGF.
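For reference, a minimal NumPy/SciPy sketch of the classical guided filter that such learned methods build on is given below; its kernel coefficients are computed from the guidance alone, which is exactly the limitation the attentional kernel learning module addresses. The window radius and regularizer are illustrative defaults, not values from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide: np.ndarray, target: np.ndarray, r: int = 8, eps: float = 1e-3) -> np.ndarray:
    """Classical guided filter: filter `target` with local linear coefficients from `guide`."""
    box = lambda x: uniform_filter(x, size=2 * r + 1)       # local means over a (2r+1)^2 window
    mean_g, mean_t = box(guide), box(target)
    cov_gt = box(guide * target) - mean_g * mean_t           # local covariance of guide and target
    var_g = box(guide * guide) - mean_g ** 2                 # local variance of the guide
    a = cov_gt / (var_g + eps)                               # per-pixel linear coefficient
    b = mean_t - a * mean_g
    return box(a) * guide + box(b)                           # q = mean(a) * guide + mean(b)
```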
3
Lei P, Hu L, Fang F, Zhang G. Joint Under-Sampling Pattern and Dual-Domain Reconstruction for Accelerating Multi-Contrast MRI. IEEE Transactions on Image Processing 2024; 33:4686-4701. [PMID: 39178087] [DOI: 10.1109/tip.2024.3445729]
Abstract
Multi-Contrast Magnetic Resonance Imaging (MCMRI) utilizes the short-acquisition-time reference image to facilitate the reconstruction of the long-acquisition-time target image, providing a new solution for fast MRI. Although various methods have been proposed, they still have certain limitations: 1) existing methods rely on preset under-sampling patterns, which give rise to redundancy between multi-contrast images and limit model performance; 2) most methods focus on information in the image domain, while prior knowledge in the k-space domain has not been fully explored; and 3) most networks are manually designed and lack physical interpretability. To address these issues, we propose a joint optimization of the under-sampling pattern and a deep-unfolding dual-domain network for accelerating MCMRI. First, to reduce redundant information and sample more contrast-specific information, we propose a new framework to learn the optimal under-sampling pattern for MCMRI. Second, a dual-domain model is established to reconstruct the target image in both the image domain and the k-space frequency domain. The model in the image domain introduces a spatial transformation to explicitly model the inconsistent and unaligned structures of MCMRI. The model in k-space learns prior knowledge from the frequency domain, enabling the network to capture more global information from the input images. Third, we employ the proximal gradient algorithm to optimize the proposed model and then unfold the iterative steps into a deep-unfolding network, called MC-DuDoN. We evaluate the proposed MC-DuDoN on MCMRI super-resolution and reconstruction tasks. Experimental results confirm the superiority of the proposed model. In particular, since our approach explicitly models the inconsistent structures, it is robust to spatially misaligned MCMRI. In the reconstruction task, compared with conventional masks, the learned mask restores more realistic images, even under an ultra-high acceleration ratio (×30). Code is available at https://github.com/lpcccc-cv/MC-DuDoNet.
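As background, the proximal gradient iteration that such unfolding follows has the generic form below for a composite objective min_x f(x) + λg(x), with a smooth data-fidelity term f and a prior g. The symbols and step size are generic placeholders, not the paper's specific operators:

```latex
% Generic proximal-gradient update; \eta is the step size. When the iteration is
% unfolded into a network, the proximal operator of the prior g is typically
% replaced by a learned module.
x^{(k+1)} = \operatorname{prox}_{\eta \lambda g}\!\left( x^{(k)} - \eta \, \nabla f\!\left(x^{(k)}\right) \right).
```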
4
An B, Wang S, Qin F, Zhao Z, Yan R, Chen X. Adversarial Algorithm Unrolling Network for Interpretable Mechanical Anomaly Detection. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6007-6020. [PMID: 37028350] [DOI: 10.1109/tnnls.2023.3250664]
Abstract
In mechanical anomaly detection, algorithms with higher accuracy, such as those based on artificial neural networks, are frequently constructed as black boxes, resulting in opaque architectures and results with low credibility. This article proposes an adversarial algorithm unrolling network (AAU-Net) for interpretable mechanical anomaly detection. AAU-Net is a generative adversarial network (GAN). Its generator, composed of an encoder and a decoder, is mainly produced by algorithm unrolling of a sparse coding model specially designed for feature encoding and decoding of vibration signals. Thus, AAU-Net has a mechanism-driven and interpretable network architecture; in other words, it is ante-hoc interpretable. Moreover, a multiscale feature visualization approach for AAU-Net is introduced to verify that meaningful features are encoded by AAU-Net, helping users trust the detection results. The feature visualization approach makes the results of AAU-Net interpretable, i.e., post-hoc interpretable. To verify AAU-Net's capability of feature encoding and anomaly detection, we designed and performed simulations and experiments. The results show that AAU-Net can learn signal features that match the dynamic mechanism of the mechanical system. Given this feature learning ability, AAU-Net unsurprisingly achieves the best overall anomaly detection performance compared with other algorithms.
5
Deng X, Xu J, Gao F, Sun X, Xu M. Deep M²CDL: Deep Multi-Scale Multi-Modal Convolutional Dictionary Learning Network. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:2770-2787. [PMID: 37983156] [DOI: 10.1109/tpami.2023.3334624]
Abstract
For multi-modal image processing, network interpretability is essential due to the complicated dependency across modalities. Recently, a promising research direction for interpretable networks is to incorporate dictionary learning into deep learning through an unfolding strategy. However, existing multi-modal dictionary learning models are single-layer and single-scale, which restricts their representation ability. In this paper, we first introduce a multi-scale multi-modal convolutional dictionary learning (M²CDL) model, which operates in a multi-layer fashion to associate different image modalities in a coarse-to-fine manner. Then, we propose a unified framework, namely Deep M²CDL, derived from the M²CDL model for both multi-modal image restoration (MIR) and multi-modal image fusion (MIF) tasks. The network architecture of Deep M²CDL fully matches the optimization steps of the M²CDL model, which endows each network module with good interpretability. Different from handcrafted priors, both the dictionary and sparse feature priors are learned through the network. The performance of the proposed Deep M²CDL is evaluated on a wide variety of MIR and MIF tasks, showing its superiority over many state-of-the-art methods both quantitatively and qualitatively. In addition, we visualize the multi-modal sparse features and dictionary filters learned by the network, which demonstrates the good interpretability of the Deep M²CDL network.
6
Liu Y, Jia Q, Zhang J, Fan X, Wang S, Ma S, Gao W. Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:2759-2771. [PMID: 35930518] [DOI: 10.1109/tnnls.2022.3191674]
Abstract
As a highly ill-posed problem, single-image super-resolution (SISR) has been widely investigated in recent years. The main task of SISR is to recover the information lost in the degradation procedure. According to the Nyquist sampling theorem, the degradation leads to aliasing, which makes it hard to restore the correct textures from low-resolution (LR) images. In practice, there are correlations and self-similarities among adjacent patches in natural images. This article exploits this self-similarity and proposes a hierarchical image super-resolution network (HSRNet) to suppress the influence of aliasing. We consider the SISR issue from an optimization perspective and propose an iterative solution pattern based on the half-quadratic splitting (HQS) method. To explore the texture with a local image prior, we design a hierarchical exploration block (HEB) that progressively increases the receptive field. Furthermore, multilevel spatial attention (MSA) is devised to capture the relations between adjacent features and enhance the high-frequency information, which plays a crucial role in visual quality. Experimental results show that HSRNet achieves better quantitative and visual performance than other works and suppresses aliasing more effectively.
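For context, half-quadratic splitting applied to a generic restoration objective min_x ½‖y − Ax‖² + λΦ(x) alternates the two sub-problems below; A, Φ, and the penalty weight μ are generic placeholders rather than the paper's specific choices, and in an unfolded network the Φ-sub-problem is usually realized by a learned module:

```latex
% Half-quadratic splitting: an auxiliary variable z decouples the quadratic data term
% from the prior \Phi, and the two sub-problems are solved alternately.
x^{(k+1)} = \arg\min_x \; \tfrac{1}{2}\left\| y - A x \right\|_2^2 + \tfrac{\mu}{2}\left\| x - z^{(k)} \right\|_2^2, \qquad
z^{(k+1)} = \arg\min_z \; \tfrac{\mu}{2}\left\| x^{(k+1)} - z \right\|_2^2 + \lambda \, \Phi(z).
```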
7
Yang G, Zhang L, Liu A, Fu X, Chen X, Wang R. MGDUN: An interpretable network for multi-contrast MRI image super-resolution reconstruction. Comput Biol Med 2023; 167:107605. [PMID: 37925907] [DOI: 10.1016/j.compbiomed.2023.107605]
Abstract
Magnetic resonance imaging (MRI) super-resolution (SR) aims to obtain high-resolution (HR) images with more detailed information for precise diagnosis and quantitative image analysis. Deep unfolding networks outperform general MRI SR reconstruction methods by providing better performance and improved interpretability, which enhances the trustworthiness required in clinical practice. However, current SR reconstruction techniques often rely on a single contrast or a simple multi-contrast fusion mechanism, ignoring the complex relationships between different contrasts. To address these issues, in this paper we propose a Model-Guided multi-contrast interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction, which explicitly incorporates the well-studied multi-contrast MRI observation model into an unfolding iterative network. Specifically, we manually design an objective function for MGDUN that can be iteratively minimized via the half-quadratic splitting algorithm. The iterative MGDUN algorithm is unfolded into a model-guided deep unfolding network that explicitly takes into account both the multi-contrast relationship matrix and the MRI observation matrix during the end-to-end optimization process. Extensive experimental results on the multi-contrast IXI dataset and the BraTS 2019 dataset demonstrate the superiority of our proposed model, with PSNR reaching 37.3366 dB and 35.9690 dB, respectively. Our proposed MGDUN provides a promising solution for multi-contrast MR image super-resolution reconstruction. Code is available at https://github.com/yggame/MGDUN.
Affiliation(s)
- Gang Yang: School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China.
- Li Zhang: School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China; Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China.
- Aiping Liu: School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China.
- Xueyang Fu: School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China.
- Xun Chen: School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China.
- Rujing Wang: Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China.
8
Liu S, Dragotti PL. Sensing Diversity and Sparsity Models for Event Generation and Video Reconstruction from Events. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:12444-12458. [PMID: 37216257] [DOI: 10.1109/tpami.2023.3278940]
Abstract
Events-to-video (E2V) reconstruction and video-to-events (V2E) simulation are two fundamental research topics in event-based vision. Current deep neural networks for E2V reconstruction are usually complex and difficult to interpret. Moreover, existing event simulators are designed to generate realistic events, but research on how to improve the event generation process has so far been limited. In this paper, we propose a light, simple model-based deep network for E2V reconstruction, explore sensing diversity across adjacent pixels in V2E generation, and finally build a video-to-events-to-video (V2E2V) architecture to validate how alternative event generation strategies improve video reconstruction. For E2V reconstruction, we model the relationship between events and intensity using sparse representation models. A convolutional ISTA network (CISTA) is then designed using the algorithm unfolding strategy. Long short-term temporal consistency (LSTC) constraints are further introduced to enhance temporal coherence. In V2E generation, we introduce the idea of interleaving pixels with different contrast thresholds and low-pass bandwidths, and conjecture that this can help extract more useful information from the intensity signal. Finally, the V2E2V architecture is used to verify the effectiveness of this strategy. Results highlight that our CISTA-LSTC network outperforms state-of-the-art methods and achieves better temporal consistency. Sensing diversity in event generation reveals more fine details, leading to significantly improved reconstruction quality.
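As an illustration of the algorithm-unfolding idea behind CISTA, here is a minimal sketch of one learned ISTA stage with dense weights. The real network uses convolutional, recurrent layers operating on event tensors, so the class below is a generic toy; all names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ISTAStage(nn.Module):
    """One LISTA-style stage: a linear update followed by learnable soft-thresholding."""
    def __init__(self, input_dim: int, code_dim: int):
        super().__init__()
        self.We = nn.Linear(input_dim, code_dim, bias=False)      # encodes the measurement
        self.Ws = nn.Linear(code_dim, code_dim, bias=False)       # propagates the previous code
        self.theta = nn.Parameter(torch.full((code_dim,), 0.1))   # learnable shrinkage threshold

    def forward(self, y: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        pre = self.We(y) + self.Ws(z)                              # gradient-like update
        return torch.sign(pre) * torch.relu(pre.abs() - self.theta)  # soft shrinkage

# Usage: stack K stages and iterate z_{k+1} = stage_k(y, z_k), starting from z_0 = 0.
```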
9
10
TC-Net: Transformer combined with CNN for image denoising. Appl Intell 2022. [DOI: 10.1007/s10489-022-03785-w]
11
Ma C, Yu P, Lu J, Zhou J. Recovering Realistic Details for Magnification-Arbitrary Image Super-Resolution. IEEE Transactions on Image Processing 2022; 31:3669-3683. [PMID: 35580105] [DOI: 10.1109/tip.2022.3174393]
Abstract
The emergence of implicit neural representations (INRs) has shown the potential to represent images in a continuous form by mapping pixel coordinates to RGB values. Recent work can recover arbitrary-resolution images from the continuous representations of input low-resolution (LR) images. However, its super-resolved results remain blurry, and it lacks the ability to generate perceptually pleasant details. In this paper, we propose implicit pixel flow (IPF) to model the coordinate dependency between the blurry INR distribution and the sharp real-world distribution. For each pixel near the blurry edges, IPF assigns offsets to the coordinates of the pixel so that the original RGB values can be replaced by the RGB values of a neighboring pixel that are more appropriate for forming sharper edges. By modifying the relationship between INR-domain coordinates and image-domain pixels via IPF, we convert the original blurry INR distribution to a sharp one. Specifically, we adopt convolutional neural networks to extract continuous flow representations and employ multi-layer perceptrons to build the implicit function for calculating the pixel flow. In addition, we propose a new double-constraint module to search for more stable and optimal pixel flows during training. To the best of our knowledge, this is the first method to recover perceptually pleasant details for magnification-arbitrary single-image super-resolution. Experimental results on public benchmark datasets demonstrate that we successfully restore sharp edges and satisfactory textures from continuous image representations.
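For background, a minimal sketch of an implicit neural representation as used by arbitrary-scale SR methods is shown below: an MLP maps Fourier-encoded pixel coordinates to RGB values, so the image can be queried at any resolution. The architecture and encoding are generic assumptions and do not include the paper's pixel-flow offsets.

```python
import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Generic INR: (x, y) coordinates in [-1, 1] -> RGB, via a sinusoidal encoding and an MLP."""
    def __init__(self, hidden: int = 256, num_freqs: int = 8):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 2 * 2 * num_freqs                               # sin/cos of num_freqs frequencies per axis
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),                                # RGB output
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:    # coords: (N, 2)
        freqs = 2.0 ** torch.arange(self.num_freqs, dtype=torch.float32, device=coords.device) * torch.pi
        enc = torch.cat([torch.sin(coords[..., None] * freqs),
                         torch.cos(coords[..., None] * freqs)], dim=-1).flatten(-2)
        return self.net(enc)
```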
12
Gupta H, Mitra K. Toward Unaligned Guided Thermal Super-Resolution. IEEE Transactions on Image Processing 2021; 31:433-445. [PMID: 34855595] [DOI: 10.1109/tip.2021.3130538]
Abstract
Thermography is a useful imaging technique as it works well in poor visibility conditions. High-resolution thermal imaging sensors are usually expensive and this limits the general applicability of such imaging systems. Many thermal cameras are accompanied by a high-resolution visible-range camera, which can be used as a guide to super-resolve the low-resolution thermal images. However, the thermal and visible images form a stereo pair and the difference in their spectral range makes it very challenging to pixel-wise align the two images. The existing guided super-resolution (GSR) methods are based on aligned image pairs and hence are not appropriate for this task. In this paper, we attempt to remove the necessity of pixel-to-pixel alignment for GSR by proposing two models: the first one employs a correlation-based feature-alignment loss to reduce the misalignment in the feature-space itself and the second model includes a misalignment-map estimation block as a part of an end-to-end framework that adequately aligns the input images for performing guided super-resolution. We conduct multiple experiments to compare our methods with existing state-of-the-art single and guided super-resolution techniques and show that our models are better suited for the task of unaligned guided super-resolution from very low-resolution thermal images.
13
Deng X, Dragotti PL. Deep Convolutional Neural Network for Multi-Modal Image Restoration and Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:3333-3348. [PMID: 32248098] [DOI: 10.1109/tpami.2020.2984244]
Abstract
In this paper, we propose a novel deep convolutional neural network to solve the general multi-modal image restoration (MIR) and multi-modal image fusion (MIF) problems. Different from other deep-learning-based methods, our network architecture is designed by drawing inspiration from a newly proposed multi-modal convolutional sparse coding (MCSC) model. The key feature of the proposed network is that it can automatically split the common information shared among different modalities from the unique information that belongs to each single modality, and it is therefore denoted CU-Net, i.e., common and unique information splitting network. Specifically, CU-Net is composed of three modules: the unique feature extraction module (UFEM), the common feature preservation module (CFPM), and the image reconstruction module (IRM). The architecture of each module is derived from the corresponding part of the MCSC model, which consists of several learned convolutional sparse coding (LCSC) blocks. Extensive numerical results verify the effectiveness of our method on a variety of MIR and MIF tasks, including RGB-guided depth image super-resolution, flash-guided non-flash image denoising, and multi-focus and multi-exposure image fusion.
14
Shi Z, Chen Y, Gavves E, Mettes P, Snoek CGM. Unsharp Mask Guided Filtering. IEEE Transactions on Image Processing 2021; 30:7472-7485. [PMID: 34449363] [DOI: 10.1109/tip.2021.3106812]
Abstract
The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering by means of an additional guidance image. Where classical guided filters transfer structures using hand-designed functions, recent guided filters have been considerably advanced through parametric learning of deep networks. The state-of-the-art leverages deep networks to estimate the two core coefficients of the guided filter. In this work, we posit that simultaneously estimating both coefficients is suboptimal, resulting in halo artifacts and structure inconsistencies. Inspired by unsharp masking, a classical technique for edge enhancement that requires only a single coefficient, we propose a new and simplified formulation of the guided filter. Our formulation enjoys a filtering prior from a low-pass filter and enables explicit structure transfer by estimating a single coefficient. Based on our proposed formulation, we introduce a successive guided filtering network, which provides multiple filtering results from a single network, allowing for a trade-off between accuracy and efficiency. Extensive ablations, comparisons and analysis show the effectiveness and efficiency of our formulation and network, resulting in state-of-the-art results across filtering tasks like upsampling, denoising, and cross-modality filtering. Code is available at https://github.com/shizenglin/Unsharp-Mask-Guided-Filtering.
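As a reminder of the classical operation the paper draws on, a minimal unsharp-masking sketch is shown below: the output is a low-pass component plus a single gain applied to the detail layer. The Gaussian sigma and the gain are illustrative constants; in the paper the corresponding single coefficient is estimated by a network rather than fixed, and the structure to transfer comes from the guidance.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image: np.ndarray, sigma: float = 2.0, lam: float = 1.5) -> np.ndarray:
    """Classical unsharp masking: low-pass component plus a single-gain detail layer."""
    low = gaussian_filter(image, sigma=sigma)   # low-pass (smoothed) component
    detail = image - low                        # high-frequency detail layer
    return low + lam * detail                   # one coefficient controls structure enhancement
```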
15
Sun Z, Yu Y. Fast Approximation for Sparse Coding with Applications to Object Recognition. Sensors (Basel) 2021; 21:1442. [PMID: 33669576] [PMCID: PMC7923134] [DOI: 10.3390/s21041442]
Abstract
Sparse coding (SC) has been widely studied and has shown its superiority in signal processing, statistics, and machine learning. However, due to the high computational cost of the optimization algorithms required to compute the sparse feature, the applicability of SC to real-time object recognition tasks is limited. Many deep neural networks have been constructed to quickly estimate the sparse feature with the help of a large number of training samples, which is not suitable for small-scale datasets. Therefore, this work presents a simple and efficient fast approximation method for SC, in which a special single-hidden-layer neural network (SLNN) is constructed to perform the approximation task, and the optimal sparse features of the training samples, exactly computed by a sparse coding algorithm, are used as ground truth to train the SLNN. After training, the proposed SLNN can quickly estimate sparse features for testing samples. Ten benchmark datasets taken from UCI databases and two face image datasets are used for the experiments, and the low root mean square error (RMSE) between the approximated sparse features and the optimal ones verifies the approximation quality of the proposed method. Furthermore, the recognition results demonstrate that the proposed method effectively reduces the computational time of the testing process while maintaining recognition performance, and outperforms several state-of-the-art fast approximate sparse coding methods as well as exact sparse coding algorithms.
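A minimal sketch of the training idea, under assumed dimensions: sparse codes are first computed for the training signals with an exact solver (scikit-learn's Lasso is used here purely as a stand-in for the paper's sparse coding algorithm), and a single-hidden-layer network is then fit to regress them, so test-time inference needs only one forward pass. The dictionary, dataset sizes, and hidden width are all illustrative.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))                        # dictionary: 64-dim signals, 256 atoms
X = rng.standard_normal((200, 64)).astype(np.float32)     # toy training signals

# "Ground-truth" sparse codes from an exact solver (stand-in for the paper's algorithm).
lasso = Lasso(alpha=0.05, max_iter=2000)
Z = np.stack([lasso.fit(D, x).coef_ for x in X]).astype(np.float32)

# Single-hidden-layer regressor trained to mimic the solver.
slnn = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 256))
opt = torch.optim.Adam(slnn.parameters(), lr=1e-3)
x_t, z_t = torch.from_numpy(X), torch.from_numpy(Z)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(slnn(x_t), z_t)          # regress the sparse codes
    loss.backward()
    opt.step()
```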
Affiliation(s)
- Yuanlong Yu: College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China.
16
Wang M, Wei S, Liang J, Zhou Z, Qu Q, Shi J, Zhang X. TPSSI-Net: Fast and Enhanced Two-Path Iterative Network for 3D SAR Sparse Imaging. IEEE Transactions on Image Processing 2021; 30:7317-7332. [PMID: 34415832] [DOI: 10.1109/tip.2021.3104168]
Abstract
The emerging field of combining compressed sensing (CS) and three-dimensional synthetic aperture radar (3D SAR) imaging has shown significant potential to reduce sampling rates and improve image quality. However, conventional CS-driven algorithms are limited by huge computational costs and non-trivial parameter tuning. In this article, to address this problem, we propose a two-path iterative framework dubbed TPSSI-Net for 3D SAR sparse imaging. By mapping the approximate message passing (AMP) algorithm onto a deep neural network with a fixed number of layers, each layer of TPSSI-Net consists of four modules in cascade corresponding to the four steps of the AMP iteration. Unlike standard AMP, the Onsager terms in TPSSI-Net are modified to be differentiable and are scaled by learnable coefficients. Rather than manually choosing a sparsifying basis, a two-path convolutional neural network (CNN) is developed and embedded in TPSSI-Net for nonlinear sparse representation in the complex-valued domain. All parameters are layer-varied and optimized by end-to-end training based on a channel-wise loss function that enforces both a symmetry constraint and measurement fidelity. Finally, extensive SAR imaging experiments, including simulations and real-measured tests, demonstrate the effectiveness and high efficiency of the proposed TPSSI-Net.
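For reference, the generic AMP recursion that such unrolling follows is shown below for measurements y = Ax + n, where A has m rows; the final term of the residual update is the Onsager correction, which TPSSI-Net replaces with a differentiable, learnably scaled version. The denoiser η_t and all symbols are generic, not the paper's specific modules:

```latex
% Generic AMP iteration: elementwise denoiser \eta_t, residual r, and the Onsager
% correction built from the average derivative of \eta_t over the n entries.
x^{(t+1)} = \eta_t\!\left( x^{(t)} + A^{\mathsf H} r^{(t)} \right), \qquad
r^{(t+1)} = y - A x^{(t+1)} + \frac{r^{(t)}}{m} \sum_{i=1}^{n} \eta_t'\!\left( x^{(t)} + A^{\mathsf H} r^{(t)} \right)_i .
```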
17
Marivani I, Tsiligianni E, Cornelis B, Deligiannis N. Multimodal Deep Unfolding for Guided Image Super-Resolution. IEEE Transactions on Image Processing 2020; PP:8443-8456. [PMID: 32784140] [DOI: 10.1109/tip.2020.3014729]
Abstract
The reconstruction of a high-resolution image given a low-resolution observation is an ill-posed inverse problem in imaging. Deep learning methods rely on training data to learn an end-to-end mapping from a low-resolution input to a high-resolution output. Unlike existing deep multimodal models that do not incorporate domain knowledge about the problem, we propose a multimodal deep learning design that incorporates sparse priors and allows the effective integration of information from another image modality into the network architecture. Our solution relies on a novel deep unfolding operator, performing steps similar to an iterative algorithm for convolutional sparse coding with side information; therefore, the proposed neural network is interpretable by design. The deep unfolding architecture is used as a core component of a multimodal framework for guided image super-resolution. An alternative multimodal design is investigated by employing residual learning to improve the training efficiency. The presented multimodal approach is applied to super-resolution of near-infrared and multi-spectral images as well as depth upsampling using RGB images as side information. Experimental results show that our model outperforms state-of-the-art methods.