1. Zhu Y, Fu X, Zhang Z, Liu A, Xiong Z, Zha ZJ. Hue Guidance Network for Single Image Reflection Removal. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13701-13712. [PMID: 37220051 DOI: 10.1109/tnnls.2023.3270938]
Abstract
Reflection from glass is ubiquitous in daily life but is usually undesirable in photographs. To remove these unwanted artifacts, existing methods rely either on correlative auxiliary information or on handcrafted priors to constrain this ill-posed problem. However, because of their limited capability to describe the properties of reflections, these methods cannot handle strong and complex reflection scenes. In this article, we propose a hue guidance network (HGNet) with two branches for single image reflection removal (SIRR) that integrates image information with the corresponding hue information. The complementarity between these two kinds of information had not previously been exploited; the key observation is that hue information describes reflections well and can therefore serve as a superior constraint for the SIRR task. Accordingly, the first branch extracts salient reflection features by directly estimating the hue map. The second branch leverages these features to locate salient reflection regions and obtain a high-quality restored image. Furthermore, we design a new cyclic hue loss to provide a more accurate optimization direction during network training. Experiments substantiate the superiority of our network, especially its excellent generalization to various reflection scenes, compared with state-of-the-art methods both qualitatively and quantitatively. Source code is available at https://github.com/zhuyr97/HGRR.
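Editor's note: hue is an angular quantity, so a "cyclic" hue loss has to respect the wrap-around at the ends of the hue axis. The abstract does not give the exact form, so the following PyTorch snippet is only a minimal sketch of one plausible wrap-around distance, assuming hue maps normalized to [0, 1):

```python
import torch

def cyclic_hue_loss(h_pred: torch.Tensor, h_gt: torch.Tensor) -> torch.Tensor:
    """Mean wrap-around (circular) distance between two hue maps in [0, 1).

    A plain L1 loss would treat h=0.01 vs. h=0.99 as a large error even though
    the hues are nearly identical on the color circle; taking the minimum of
    the direct and the wrapped difference avoids that.
    """
    diff = (h_pred - h_gt).abs()
    return torch.minimum(diff, 1.0 - diff).mean()

# usage: loss = cyclic_hue_loss(predicted_hue_map, target_hue_map)
```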
2. Lei P, Hu L, Fang F, Zhang G. Joint Under-Sampling Pattern and Dual-Domain Reconstruction for Accelerating Multi-Contrast MRI. IEEE Transactions on Image Processing 2024; 33:4686-4701. [PMID: 39178087 DOI: 10.1109/tip.2024.3445729]
Abstract
Multi-contrast magnetic resonance imaging (MCMRI) uses a short-acquisition-time reference image to facilitate the reconstruction of the long-acquisition-time target image, providing a new solution for fast MRI. Although various methods have been proposed, they still have limitations: 1) existing methods rely on preset under-sampling patterns, which introduce redundancy between multi-contrast images and limit model performance; 2) most methods focus on information in the image domain, while prior knowledge in the k-space domain has not been fully explored; and 3) most networks are manually designed and lack physical interpretability. To address these issues, we propose a joint optimization of the under-sampling pattern and a deep-unfolding dual-domain network for accelerating MCMRI. First, to reduce redundant information and sample more contrast-specific information, we propose a new framework to learn the optimal under-sampling pattern for MCMRI. Second, a dual-domain model is established to reconstruct the target image in both the image domain and the k-space frequency domain: the image-domain model introduces a spatial transformation to explicitly model the inconsistent and unaligned structures of MCMRI, while the k-space model learns prior knowledge from the frequency domain, enabling the network to capture more global information from the input images. Third, we employ the proximal gradient algorithm to optimize the proposed model and unfold the iterations into a deep-unfolding network, called MC-DuDoN. We evaluate MC-DuDoN on MCMRI super-resolution and reconstruction tasks, and the experimental results confirm the superiority of the proposed model. In particular, because our approach explicitly models inconsistent structures, it is robust to spatially misaligned MCMRI. In the reconstruction task, compared with conventional masks, the learned mask restores more realistic images, even under an ultra-high acceleration ratio (×30). Code is available at https://github.com/lpcccc-cv/MC-DuDoNet.
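Editor's note: the abstract describes a proximal-gradient iteration unfolded into network stages. The sketch below shows the usual deep-unfolding recipe for under-sampled MRI (a k-space data-consistency gradient step followed by a small learned image-domain proximal network); it is a generic illustration under those assumptions, not the exact MC-DuDoN stage.

```python
import torch
import torch.nn as nn

class UnrolledStep(nn.Module):
    """One generic proximal-gradient stage for under-sampled MRI reconstruction."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.eta = nn.Parameter(torch.tensor(1.0))        # learned step size
        self.prox = nn.Sequential(                        # learned proximal operator
            nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, 3, padding=1),
        )

    def forward(self, x, y, mask):
        # x: complex image (B, H, W); y: under-sampled k-space; mask: binary sampling mask
        k = torch.fft.fft2(x, norm="ortho")
        grad = torch.fft.ifft2(mask * (k - y), norm="ortho")   # A^H(Ax - y), data consistency
        z = x - self.eta * grad                                 # gradient step
        z2 = torch.stack([z.real, z.imag], dim=1)               # complex -> 2-channel real
        z2 = z2 + self.prox(z2)                                 # residual refinement in image domain
        return torch.complex(z2[:, 0], z2[:, 1])
```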
3. Ma L, Zhao Y, Peng P, Tian Y. Sensitivity Decouple Learning for Image Compression Artifacts Reduction. IEEE Transactions on Image Processing 2024; 33:3620-3633. [PMID: 38787669 DOI: 10.1109/tip.2024.3403034]
Abstract
Benefiting from deep learning techniques, recent research has made significant progress in image compression artifacts reduction. Despite their improved performance, prevailing methods only learn a mapping from the compressed image to the original one and ignore the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing tasks. Unlike these methods, we propose to decouple the intrinsic attributes into two complementary features for artifacts reduction: compression-insensitive features, which regularize the high-level semantic representations during training, and compression-sensitive features, which are aware of the compression degree. To achieve this, we first employ adversarial training to regularize the compressed and original encoded features so as to retain high-level semantics, and we then develop a compression quality-aware feature encoder for the compression-sensitive features. Based on these dual complementary features, we propose a Dual Awareness Guidance Network (DAGN) that uses them as transformation guidance during the decoding phase. In DAGN, we develop a cross-feature fusion module that maintains the consistency of compression-insensitive features by fusing them into the artifacts-reduction baseline. Our method achieves an average PSNR gain of 2.06 dB on BSD500, outperforming state-of-the-art methods, and requires only 29.7 ms to process one image. Experimental results on LIVE1 and LIU4K further demonstrate the efficiency, effectiveness, and superiority of the proposed method in terms of quantitative metrics, visual quality, and downstream machine vision tasks.
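Editor's note: the abstract describes awareness features being used as "transformation guidance" during decoding but does not specify the fusion mechanism. The snippet below is only a generic sketch of such guidance-style fusion (an affine feature modulation); the module name and structure are assumptions, not the published DAGN design.

```python
import torch
import torch.nn as nn

class GuidanceFusion(nn.Module):
    """Modulate restoration-trunk features with auxiliary guidance features."""
    def __init__(self, channels: int):
        super().__init__()
        self.to_scale = nn.Conv2d(channels, channels, 1)
        self.to_shift = nn.Conv2d(channels, channels, 1)

    def forward(self, trunk_feat, guide_feat):
        # affine modulation of the trunk features by the guidance features
        return trunk_feat * (1 + self.to_scale(guide_feat)) + self.to_shift(guide_feat)
```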
4. Chang Y, Chen M, Yu C, Li Y, Chen L, Yan L. Direction and Residual Awareness Curriculum Learning Network for Rain Streaks Removal. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:8414-8428. [PMID: 37018699 DOI: 10.1109/tnnls.2022.3227730]
Abstract
Single-image rain streak removal has attracted great attention in recent years. However, because rain streaks are visually very similar to line-pattern image edges, deraining results often suffer from over-smoothed edges or residual rain streaks. To overcome this problem, we propose a direction- and residual-aware network within the curriculum learning paradigm for rain streak removal. Specifically, a statistical analysis of rain streaks in large-scale real rainy images shows that rain streaks in local patches possess a principal directionality. This motivates us to design a direction-aware network for rain streak modeling, in which the principal directionality provides a discriminative representation that better distinguishes rain streaks from image edges. For image modeling, we are motivated by iterative regularization in classical image processing and unfold it into a novel residual-aware block (RAB) that explicitly models the relationship between the image and the residual. The RAB adaptively learns balance parameters to selectively emphasize informative image features and better suppress rain streaks. Finally, we formulate rain streak removal within a curriculum learning paradigm that progressively learns the directionality of rain streaks, the rain streak appearance, and the image layer in a coarse-to-fine, easy-to-hard manner. Solid experiments on extensive simulated and real benchmarks demonstrate the visual and quantitative improvement of the proposed method over state-of-the-art methods.
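Editor's note: the residual-aware block is described only at a high level. The PyTorch sketch below shows one plausible reading, in which a learnable per-channel balance parameter blends background features with the estimated rain-residual features; the names and structure are assumptions, not the published RAB.

```python
import torch
import torch.nn as nn

class ResidualAwareBlock(nn.Module):
    """Blend image (background) features with rain-residual features via a
    learnable balance parameter, then refine - a sketch of the RAB idea."""
    def __init__(self, channels: int):
        super().__init__()
        self.balance = nn.Parameter(torch.zeros(1, channels, 1, 1))  # learned per channel
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, img_feat, residual_feat):
        # negative balance values suppress the residual, positive ones emphasize it
        mixed = img_feat + torch.tanh(self.balance) * residual_feat
        return img_feat + self.refine(mixed)
```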
5. An B, Wang S, Qin F, Zhao Z, Yan R, Chen X. Adversarial Algorithm Unrolling Network for Interpretable Mechanical Anomaly Detection. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6007-6020. [PMID: 37028350 DOI: 10.1109/tnnls.2023.3250664]
Abstract
In mechanical anomaly detection, algorithms with higher accuracy, such as those based on artificial neural networks, are frequently constructed as black boxes, resulting in opaque architectures and low credibility of their results. This article proposes an adversarial algorithm unrolling network (AAU-Net) for interpretable mechanical anomaly detection. AAU-Net is a generative adversarial network (GAN). Its generator, composed of an encoder and a decoder, is produced mainly by algorithm unrolling of a sparse coding model specially designed for feature encoding and decoding of vibration signals. AAU-Net therefore has a mechanism-driven, interpretable network architecture; in other words, it is interpretable by design. Moreover, a multiscale feature visualization approach for AAU-Net is introduced to verify that meaningful features are encoded, helping users to trust the detection results. This feature visualization also makes the results of AAU-Net post hoc interpretable. To verify AAU-Net's feature encoding and anomaly detection capability, we designed and performed simulations and experiments. The results show that AAU-Net can learn signal features that match the dynamic mechanism of the mechanical system. Given this feature learning ability, AAU-Net unsurprisingly achieves the best overall anomaly detection performance compared with other algorithms.
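Editor's note: "algorithm unrolling of a sparse coding model" usually means turning ISTA-style iterations into network layers. The snippet below is a minimal LISTA-style sketch of that principle for 1-D vibration signals; the layer sizes and names are assumptions and this is not the published AAU-Net generator.

```python
import torch
import torch.nn as nn

class UnrolledSparseEncoder(nn.Module):
    """ISTA iterations for convolutional sparse coding, unrolled into layers."""
    def __init__(self, atoms: int = 32, kernel: int = 31, iters: int = 5):
        super().__init__()
        self.synth = nn.Conv1d(atoms, 1, kernel, padding=kernel // 2, bias=False)  # dictionary D
        self.analy = nn.Conv1d(1, atoms, kernel, padding=kernel // 2, bias=False)  # adjoint-like D^T
        self.theta = nn.Parameter(torch.full((iters, 1, atoms, 1), 1e-2))          # per-stage thresholds
        self.iters = iters

    def forward(self, x):                        # x: (B, 1, T) vibration signal
        z = torch.zeros(x.size(0), self.analy.out_channels, x.size(-1), device=x.device)
        for k in range(self.iters):
            r = self.synth(z) - x                # reconstruction residual D z - x
            z = z - self.analy(r)                # gradient step
            z = torch.sign(z) * torch.relu(z.abs() - self.theta[k])  # soft-thresholding
        return z, self.synth(z)                  # sparse code and reconstruction
```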
6. Wang Z, Guo D, Tu Z, Huang Y, Zhou Y, Wang J, Feng L, Lin D, You Y, Agback T, Orekhov V, Qu X. A Sparse Model-Inspired Deep Thresholding Network for Exponential Signal Reconstruction-Application in Fast Biological Spectroscopy. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7578-7592. [PMID: 35120010 DOI: 10.1109/tnnls.2022.3144580]
Abstract
Nonuniform sampling (NUS) is a powerful approach to enable fast acquisition but requires sophisticated reconstruction algorithms. Faithful reconstruction from partially sampled exponentials is highly desirable in general signal processing and many applications. Deep learning (DL) has shown astonishing potential in this field, but issues such as a lack of robustness and explainability greatly limit its applications. In this work, by combining the merits of sparse model-based optimization and data-driven DL, we propose a DL architecture for spectrum reconstruction from undersampled data, called MoDern. It follows the iterative procedure for solving a sparse model to build the neural network, and we elaborately design a learnable soft-thresholding operator to adaptively eliminate the spectrum artifacts introduced by undersampling. Extensive results on both synthetic and biological data show that MoDern enables more robust, high-fidelity, and ultrafast reconstruction than state-of-the-art methods. Remarkably, MoDern has a small number of network parameters and is trained solely on synthetic data, yet it generalizes well to biological data in various scenarios. Furthermore, we extend it to an open-access and easy-to-use cloud computing platform (XCloud-MoDern), contributing a promising strategy for further development of biological applications.
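Editor's note: the core operation named in the abstract is a learnable soft-thresholding applied to the spectrum. A minimal sketch of such an operator for complex spectra (threshold learned as a single parameter here, which is an assumption; the paper's parameterization may differ) is:

```python
import torch
import torch.nn as nn

class LearnableSoftThreshold(nn.Module):
    """Soft-thresholding with a learned threshold, applied to the magnitude of a
    complex spectrum while preserving its phase."""
    def __init__(self, init: float = 1e-2):
        super().__init__()
        self.theta = nn.Parameter(torch.tensor(init))

    def forward(self, spec: torch.Tensor) -> torch.Tensor:   # spec: complex tensor
        theta = torch.relu(self.theta)                        # keep the threshold non-negative
        mag = spec.abs()
        shrink = torch.relu(mag - theta) / (mag + 1e-12)      # shrinkage factor in [0, 1)
        return spec * shrink
```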
7. Huang Y, Zhao J, Wang Z, Orekhov V, Guo D, Qu X. Exponential Signal Reconstruction With Deep Hankel Matrix Factorization. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:6214-6226. [PMID: 34941531 DOI: 10.1109/tnnls.2021.3134717]
Abstract
The exponential function is a basic form of temporal signal, and how to acquire such signals quickly is one of the fundamental problems and frontiers in signal processing. To this end, only part of the data may be acquired, but this results in severe artifacts in the spectrum, i.e., the Fourier transform of the exponentials. Reliable spectrum reconstruction is therefore highly desirable for fast data acquisition in many applications, such as chemistry, biology, and medical imaging. In this work, we propose a deep learning method whose network structure is designed by imitating the iterative process of the model-based state-of-the-art exponential reconstruction method built on low-rank Hankel matrix factorization. With experiments on synthetic data and realistic biological magnetic resonance signals, we demonstrate that the new method yields much lower reconstruction errors and preserves low-intensity signals much better than the compared methods.
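Editor's note: the low-rank Hankel structure that factorization-based methods exploit can be checked in a few lines: a signal made of R damped exponentials yields a Hankel matrix of rank R. The snippet below only illustrates that property (the signal and sizes are made up); it is not the reconstruction algorithm itself.

```python
import numpy as np

def hankel_matrix(x: np.ndarray, window: int) -> np.ndarray:
    """Stack sliding windows of length `window` as rows of a Hankel matrix."""
    n = len(x)
    return np.array([x[i:i + window] for i in range(n - window + 1)])

# a sum of two damped complex exponentials (R = 2)
t = np.arange(128)
x = np.exp((-0.01 + 2j * np.pi * 0.11) * t) + 0.5 * np.exp((-0.02 + 2j * np.pi * 0.23) * t)

H = hankel_matrix(x, window=64)
print(np.linalg.matrix_rank(H, tol=1e-8))   # prints 2: rank equals the number of exponentials
```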
8. Wang K, Liao X, Li J, Meng D, Wang Y. Hyperspectral Image Super-Resolution via Knowledge-Driven Deep Unrolling and Transformer Embedded Convolutional Recurrent Neural Network. IEEE Transactions on Image Processing 2023; 32:4581-4594. [PMID: 37467098 DOI: 10.1109/tip.2023.3293768]
Abstract
Hyperspectral (HS) imaging has been widely used in many real-world applications. However, due to hardware limitations, the acquired HS images usually have low spatial resolution, which can significantly degrade their performance in downstream use. By fusing a low-spatial-resolution HS image with a high-spatial-resolution auxiliary image (e.g., a multispectral, RGB, or panchromatic image), so-called HS image fusion has underpinned much of the recent progress in enhancing the spatial resolution of HS images. Nonetheless, a well-registered auxiliary image is not always available in practice. To remedy this issue, we propose a new single HS image super-resolution method based on a novel knowledge-driven deep unrolling technique. Precisely, we first formulate a maximum a posteriori (MAP)-based energy model with implicit priors, which can be solved by alternating optimization to determine an elementary iteration mechanism. We then unroll this iteration mechanism into a Transformer-embedded convolutional recurrent neural network that integrates two structural designs: the vision Transformer and 3D convolutions learn the implicit spatial-spectral priors, while recurrent hidden connections across iterations model the recurrence of the iterative reconstruction stages. In this way, an effective knowledge-driven, end-to-end, and data-dependent HS image super-resolution framework is attained. Extensive experiments on three HS image datasets demonstrate the superiority of the proposed method over several state-of-the-art HS image super-resolution methods.
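Editor's note: the abstract names the MAP energy model without stating it. A generic single-image super-resolution energy of the kind described, with B a blur operator, S a spatial downsampler, y the observed low-resolution HS image, and R(·) the implicit prior (all notation assumed here, not taken from the paper), together with the proximal-gradient iteration that unrolling typically builds on, is:

```latex
\min_{x}\; \tfrac{1}{2}\,\| S B x - y \|_2^2 + \lambda\, R(x),
\qquad
x^{(k+1)} = \operatorname{prox}_{\eta \lambda R}\!\left( x^{(k)} - \eta\, B^\top S^\top \big( S B x^{(k)} - y \big) \right).
```

In the unrolled network, the proximal operator is replaced by the learned Transformer-embedded recurrent module at each stage.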
9. Liu Y, Zhou X, Zhong W. Multi-Modality Image Fusion and Object Detection Based on Semantic Information. Entropy (Basel) 2023; 25:718. [PMID: 37238472 PMCID: PMC10216995 DOI: 10.3390/e25050718]
Abstract
Infrared and visible image fusion (IVIF) aims to provide informative images by combining complementary information from different sensors. Existing deep learning-based IVIF methods focus on strengthening the network by increasing its depth but often ignore transmission characteristics, resulting in the degradation of important information. In addition, while many methods use various loss functions or fusion rules to retain the complementary features of both modalities, the fusion results often retain redundant or even invalid information. In order to accurately extract the effective information from both infrared and visible-light images without omission or redundancy, and to better serve downstream tasks such as object detection with the fused image, we propose a multi-level structure-search attention fusion network guided by semantic information, which fuses infrared and visible images in an end-to-end way. Our network makes two main contributions: the use of neural architecture search (NAS) and a newly designed multilevel adaptive attention module (MAAB). These components enable the network to retain the typical characteristics of the two modalities in the fusion results while removing information that is useless for the detection task. In addition, our loss function and joint training method establish a reliable relationship between the fusion network and the subsequent detection task. Extensive experiments on the M3FD dataset show that our fusion method achieves advanced performance in both subjective and objective evaluations, and the mAP in the object detection task is improved by 0.5% compared with the second-best method (FusionGAN).
Affiliation(s)
- Yong Liu
- School of Software Technology, Dalian University of Technology, Dalian 116620, China
- Xin Zhou
- International School of Information Science & Engineering, Dalian University of Technology, Dalian 116620, China
- Wei Zhong
- International School of Information Science & Engineering, Dalian University of Technology, Dalian 116620, China
10. Yin JL, Chen BH, Peng YT, Hwang H. Automatic Intermediate Generation With Deep Reinforcement Learning for Robust Two-Exposure Image Fusion. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:7853-7862. [PMID: 34181551 DOI: 10.1109/tnnls.2021.3088907]
Abstract
Fusing low dynamic range (LDR) images into high dynamic range (HDR) images has gained much attention, especially for real-world applications in which hardware resources are too limited to capture images with many different exposure times. However, generating an HDR image by picking the best parts from each LDR image often yields unsatisfactory results when input images or well-exposed content are lacking. To overcome this limitation, we model HDR image generation in two-exposure fusion as a deep reinforcement learning problem and learn an online compensating representation to fuse with the LDR inputs for HDR image generation. Moreover, we build a two-exposure dataset with reference HDR images from a public multi-exposure dataset that has not yet been normalized, in order to train and evaluate the proposed model. On this dataset, our reinforcement-based HDR image generation significantly outperforms other competing methods under different challenging scenarios, even with limited well-exposed content. Further experimental results on a no-reference multi-exposure image dataset demonstrate the generality and effectiveness of the proposed model. To the best of our knowledge, this is the first work to use a reinforcement-learning-based framework to learn an online compensating representation for two-exposure image fusion.
11. Fu X, Wang M, Cao X, Ding X, Zha ZJ. A Model-Driven Deep Unfolding Method for JPEG Artifacts Removal. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6802-6816. [PMID: 34081590 DOI: 10.1109/tnnls.2021.3083504]
Abstract
Deep learning-based methods have achieved notable progress in removing the blocking artifacts caused by lossy JPEG compression. However, most of these methods handle the task by designing black-box network architectures that directly learn the relationship between compressed images and their clean versions. Such architectures often lack sufficient interpretability, which limits further improvements in deblocking performance. To address this issue, we propose a model-driven deep unfolding method for JPEG artifacts removal with interpretable network structures. First, we build a maximum a posteriori (MAP) model for deblocking based on convolutional dictionary learning and design an iterative optimization algorithm using proximal operators. Second, we unfold this iterative algorithm into a learnable deep network in which each module corresponds to a specific operation of the algorithm. In this way, our network inherits both the expressive power of data-driven deep learning and the interpretability of traditional model-driven methods. By training the proposed network end to end, all learnable modules can be automatically explored to characterize the representations of both JPEG artifacts and image content. Experiments on synthetic and real-world datasets show that our method generates competitive or even better deblocking results than state-of-the-art methods, both quantitatively and qualitatively.
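Editor's note: the abstract names a MAP model built on convolutional dictionary learning without writing it out. A generic form of such a model, with y the compressed image, d_k the convolutional dictionary filters, z_k the sparse feature maps, and * convolution (all symbols assumed here, not taken from the paper), is:

```latex
\min_{\{z_k\}}\; \tfrac{1}{2}\,\Big\| y - \sum_{k=1}^{K} d_k * z_k \Big\|_2^2 + \lambda \sum_{k=1}^{K} \| z_k \|_1 ,
```

where the clean image is synthesized as the sum of dictionary convolutions and the proximal (soft-thresholding) updates of the iterative solver are what each unfolded module mirrors.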
12. Pu W. Shuffle GAN With Autoencoder: A Deep Learning Approach to Separate Moving and Stationary Targets in SAR Imagery. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4770-4784. [PMID: 33684045 DOI: 10.1109/tnnls.2021.3060747]
Abstract
Synthetic aperture radar (SAR) has been widely applied in both civilian and military fields because it provides high-resolution images of ground targets regardless of weather conditions, day or night. In SAR imaging, separating moving and stationary targets is of great significance: it removes the ambiguity caused by inevitable moving targets when imaging stationary scenes and suppresses clutter when imaging moving targets. Recently emerged generative adversarial networks (GANs) perform well in many other signal processing areas, but they had not been introduced to radar imaging tasks. In this work, we propose a novel shuffle GAN with autoencoder separation method to separate moving and stationary targets in SAR imagery. The proposed algorithm builds adversarial constraints on the independence of well-focused stationary targets and blurred moving targets. Notably, the algorithm operates in a fully unsupervised fashion, without requiring a sample set containing mixed and separated SAR images. Experiments on synthetic and real SAR data validate the effectiveness of the proposed method.
13. Liu R, Chen Q, Yao Y, Fan X, Luo Z. Location-Aware and Regularization-Adaptive Correlation Filters for Robust Visual Tracking. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:2430-2442. [PMID: 32749966 DOI: 10.1109/tnnls.2020.3005447]
Abstract
Correlation filters (CFs) have recently been widely used for visual tracking. Estimating the search window and choosing the filter-learning strategy are the key components of CF trackers. Nevertheless, prevalent CF models address these issues separately and heuristically: they typically set the location estimated in the previous frame as the search center for the current frame, and they rely on simple, fixed regularization for filter learning, so their performance is compromised by the search window size and by optimization heuristics. To break these limits, this article proposes a location-aware and regularization-adaptive CF (LRCF) for robust visual tracking. LRCF establishes a novel bilevel optimization model that addresses the location-estimation and filter-training problems simultaneously. We prove that our bilevel formulation obtains a globally converged CF and the corresponding object location in a collaborative manner. Moreover, based on the LRCF framework, we design two trackers, LRCF-S and LRCF-SA, and a series of comparisons to demonstrate the flexibility and effectiveness of the framework. Extensive experiments on challenging benchmark datasets demonstrate that our LRCF trackers perform favorably against state-of-the-art methods in practice.
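Editor's note: for context, the basic building block that LRCF generalizes is the ridge-regression correlation filter, which has a closed-form solution in the Fourier domain. The snippet below is that standard single-channel baseline (MOSSE/KCF-style), not the bilevel LRCF formulation described in the abstract.

```python
import numpy as np

def train_correlation_filter(patch: np.ndarray, target: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Closed-form ridge-regression correlation filter in the Fourier domain.

    patch  : real-valued image patch around the object (H x W)
    target : desired correlation response, e.g. a Gaussian peak at the object center
    returns: filter in the Fourier domain
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(target)
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)   # element-wise closed-form solution

def detect(filter_hat: np.ndarray, patch: np.ndarray):
    """Correlate the filter with a new patch and return the response peak location."""
    response = np.real(np.fft.ifft2(filter_hat * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(response), response.shape)
```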
14. Wang H, Wu Y, Xie Q, Zhao Q, Liang Y, Zhang S, Meng D. Structural residual learning for single image rain removal. Knowledge-Based Systems 2021. [DOI: 10.1016/j.knosys.2020.106595]