1. Zhong Z, Liu X, Jiang J, Zhao D, Ji X. Deep Attentional Guided Image Filtering. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12236-12250. [PMID: 37015130] [DOI: 10.1109/tnnls.2023.3253472]
Abstract
The guided filter is a fundamental tool in computer vision and computer graphics that aims to transfer structure information from the guide image to the target image. Most existing methods construct filter kernels from the guidance itself without considering the mutual dependency between the guidance and the target. However, since the two images typically contain significantly different edges, simply transferring all structural information from the guide to the target results in various artifacts. To cope with this problem, we propose an effective framework named deep attentional guided image filtering, whose filtering process can fully integrate the complementary information contained in both images. Specifically, we propose an attentional kernel learning module to generate dual sets of filter kernels from the guidance and the target and then adaptively combine them by modeling the pixelwise dependency between the two images. Meanwhile, we propose a multiscale guided image filtering module to progressively generate the filtering result with the constructed kernels in a coarse-to-fine manner. Correspondingly, a multiscale fusion strategy is introduced to reuse the intermediate results in the coarse-to-fine process. Extensive experiments show that the proposed framework compares favorably with state-of-the-art methods in a wide range of guided image filtering applications, such as guided super-resolution (SR), cross-modality restoration, and semantic segmentation. Moreover, our scheme achieved first place in the real depth map SR challenge held at ACM ICMR 2021. The code is available at https://github.com/zhwzhong/DAGF.
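As a rough illustration of the attentional kernel-combination idea described above, here is a minimal PyTorch sketch; the module names, kernel size, channel widths, and the exact way the two kernel sets are fused are assumptions rather than the authors' implementation (the official code is at https://github.com/zhwzhong/DAGF).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionalKernelFusion(nn.Module):
    """Fuse per-pixel kernels predicted from guidance and target features."""
    def __init__(self, channels=32, ksize=3):
        super().__init__()
        self.ksize = ksize
        # Predict per-pixel filter kernels from guidance and target features.
        self.kernel_g = nn.Conv2d(channels, ksize * ksize, 3, padding=1)
        self.kernel_t = nn.Conv2d(channels, ksize * ksize, 3, padding=1)
        # Pixel-wise attention that weighs the two kernel sets.
        self.attn = nn.Conv2d(2 * channels, 1, 3, padding=1)

    def forward(self, feat_guide, feat_target, target):
        kg = self.kernel_g(feat_guide)      # (B, k*k, H, W), kernels from guidance
        kt = self.kernel_t(feat_target)     # (B, k*k, H, W), kernels from target
        a = torch.sigmoid(self.attn(torch.cat([feat_guide, feat_target], 1)))
        k = F.softmax(a * kg + (1 - a) * kt, dim=1)   # fused, normalized kernels
        # Apply the per-pixel kernels to the target image (filtering step).
        b, c, h, w = target.shape
        patches = F.unfold(target, self.ksize, padding=self.ksize // 2)
        patches = patches.view(b, c, self.ksize * self.ksize, h * w)
        k = k.view(b, 1, self.ksize * self.ksize, h * w)
        return (patches * k).sum(dim=2).view(b, c, h, w)
```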
2. Wang X, Wang S, Xiong H, Xuan K, Zhuang Z, Liu M, Shen Z, Zhao X, Zhang L, Wang Q. Spatial attention-based implicit neural representation for arbitrary reduction of MRI slice spacing. Med Image Anal 2024; 94:103158. [PMID: 38569379] [DOI: 10.1016/j.media.2024.103158]
Abstract
Magnetic resonance (MR) images collected in 2D clinical protocols typically have large inter-slice spacing, resulting in high in-plane resolution but reduced through-plane resolution. Super-resolution techniques can enhance the through-plane resolution of MR images to facilitate downstream visualization and computer-aided diagnosis. However, most existing works train the super-resolution network at a fixed scaling factor, which does not suit clinical practice, where inter-slice spacing varies across MR scans. Inspired by recent progress in implicit neural representation, we propose a Spatial Attention-based Implicit Neural Representation (SA-INR) network for arbitrary reduction of MR inter-slice spacing. The SA-INR represents an MR image as a continuous implicit function of 3D coordinates, so it can reconstruct the MR image at arbitrary inter-slice spacing by continuously sampling coordinates in 3D space. In particular, a local-aware spatial attention operation is introduced to model nearby voxels and their affinity more accurately within a larger receptive field. Meanwhile, to improve computational efficiency, a gradient-guided gating mask is proposed so that the local-aware spatial attention is applied only to selected areas. We evaluate our method on the public HCP-1200 dataset and a clinical knee MR dataset to demonstrate its superiority over existing methods.
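The implicit-representation idea can be sketched with a small coordinate MLP; the layer sizes and the feature conditioning below are assumptions, not the SA-INR architecture, and the spatial-attention and gating components are omitted.

```python
import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Map a normalized 3D coordinate plus a local feature vector to an intensity."""
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords, feats):
        # coords: (N, 3) normalized (x, y, z); feats: (N, feat_dim) local features.
        return self.net(torch.cat([coords, feats], dim=-1))

# Reconstructing at a smaller inter-slice spacing amounts to querying the MLP at a
# denser grid of z coordinates between the acquired slices.
```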
Affiliation(s)
- Xin Wang: School of Biomedical Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
- Sheng Wang: School of Biomedical Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
- Honglin Xiong: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, 201210, China
- Kai Xuan: School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, 210044, China
- Zixu Zhuang: School of Biomedical Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
- Mengjun Liu: School of Biomedical Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
- Zhenrong Shen: School of Biomedical Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
- Xiangyu Zhao: School of Biomedical Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
- Lichi Zhang: School of Biomedical Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
- Qian Wang: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, 201210, China; Shanghai Clinical Research and Trial Center, Shanghai, 201210, China
3. Deng X, Xu J, Gao F, Sun X, Xu M. Deep M2CDL: Deep Multi-Scale Multi-Modal Convolutional Dictionary Learning Network. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:2770-2787. [PMID: 37983156] [DOI: 10.1109/tpami.2023.3334624]
Abstract
For multi-modal image processing, network interpretability is essential due to the complicated dependency across modalities. Recently, a promising research direction for interpretable networks is to incorporate dictionary learning into deep learning through an unfolding strategy. However, existing multi-modal dictionary learning models are both single-layer and single-scale, which restricts their representation ability. In this paper, we first introduce a multi-scale multi-modal convolutional dictionary learning (M2CDL) model, which adopts a multi-layer strategy to associate different image modalities in a coarse-to-fine manner. Then, we propose a unified framework, named Deep M2CDL, derived from the M2CDL model for both multi-modal image restoration (MIR) and multi-modal image fusion (MIF) tasks. The network architecture of Deep M2CDL fully matches the optimization steps of the M2CDL model, so each network module has good interpretability. Different from handcrafted priors, both the dictionary and sparse feature priors are learned through the network. The performance of the proposed Deep M2CDL is evaluated on a wide variety of MIR and MIF tasks, showing its superiority over many state-of-the-art methods both quantitatively and qualitatively. In addition, we visualize the multi-modal sparse features and dictionary filters learned by the network, which demonstrates the good interpretability of the Deep M2CDL network.
4. Liu A, Liu Y, Gu J, Qiao Y, Dong C. Blind Image Super-Resolution: A Survey and Beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:5461-5480. [PMID: 36040934] [DOI: 10.1109/tpami.2022.3203009]
Abstract
Blind image super-resolution (SR), which aims to super-resolve low-resolution images with unknown degradation, has attracted increasing attention due to its significance for real-world applications. Many novel and effective solutions have been proposed recently, especially with powerful deep learning techniques. Despite years of effort, it remains a challenging research problem. This paper provides a systematic review of recent progress in blind image SR and proposes a taxonomy that categorizes existing methods into three classes according to their way of modelling the degradation and the data used to solve the SR model. This taxonomy helps summarize and distinguish among existing methods. We hope to provide insights into the current state of research, as well as reveal novel research directions worth exploring. In addition, we summarize commonly used datasets and previous competitions related to blind image SR. Last but not least, a comparison among different methods is provided, with detailed analysis of their merits and demerits on both synthetic and real testing images.
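For orientation, the degradation model that blind SR methods differ in how they handle is commonly written as

y = (x \otimes k) \downarrow_{s} + n,

where y is the low-resolution observation, x the latent high-resolution image, k an unknown blur kernel, \downarrow_{s} downsampling by scale factor s, and n additive noise; the classes of the taxonomy differ chiefly in how this degradation is modelled and in the data used to solve for x.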
5. Ariav I, Cohen I. Fully Cross-Attention Transformer for Guided Depth Super-Resolution. Sensors (Basel) 2023; 23:2723. [PMID: 36904930] [PMCID: PMC10007518] [DOI: 10.3390/s23052723]
Abstract
Modern depth sensors are often characterized by low spatial resolution, which hinders their use in real-world applications. However, in many scenarios the depth map is accompanied by a corresponding high-resolution color image. In light of this, learning-based methods have been extensively used for guided super-resolution of depth maps: a guided super-resolution scheme uses the corresponding high-resolution color image to infer high-resolution depth maps from low-resolution ones. Unfortunately, these methods still suffer from texture copying caused by improper guidance from the color image. Specifically, in most existing methods, guidance from the color image is achieved by a naive concatenation of color and depth features. In this paper, we propose a fully transformer-based network for depth map super-resolution. A cascaded transformer module extracts deep features from a low-resolution depth map and incorporates a novel cross-attention mechanism to seamlessly and continuously guide the color image into the depth upsampling process. A window partitioning scheme yields linear complexity in image resolution, so the method can be applied to high-resolution images. Extensive experiments show that the proposed guided depth super-resolution method outperforms other state-of-the-art methods.
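A minimal PyTorch sketch of the cross-attention step, with depth tokens as queries and color tokens as keys and values; the head count, dimensions, and residual/normalization placement are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Depth features query the color (guidance) features."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, depth_tokens, color_tokens):
        # depth_tokens, color_tokens: (B, N, dim) tokens from window partitioning.
        guided, _ = self.attn(query=depth_tokens, key=color_tokens,
                              value=color_tokens)
        return self.norm(depth_tokens + guided)   # residual connection
```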
6. Fang L, Wang X. Multi-input Unet model based on the integrated block and the aggregation connection for MRI brain tumor segmentation. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104027]
7. Ge H, Dai Y, Zhu Z, Zang X. Single-Stage Underwater Target Detection Based on Feature Anchor Frame Double Optimization Network. Sensors (Basel) 2022; 22:7875. [PMID: 36298226] [PMCID: PMC9608072] [DOI: 10.3390/s22207875]
Abstract
OBJECTIVE: The shallow underwater environment is complex, with color shift, uneven illumination, blurring, and distortion arising during imaging. Such scenes are very unfavorable for the inference of a detection network. Additionally, typical object detection algorithms struggle to remain robust in underwater environments because of the image domain shift, which makes underwater object detection problematic.
METHODS: This paper proposes a single-stage detection method with double refinement of anchor boxes and features. Feature context relevance is improved by a proposed composite-connected backbone network. A receptive field enhancement module is introduced to enhance multi-scale detection capability. Finally, a prediction refinement strategy is proposed, which refines the anchor boxes and features through two regressions, solves the feature-anchor misalignment problem, and improves the detection performance of the single-stage underwater algorithm.
RESULTS: We achieved 80.2 mAP on the Labeled Fish in the Wild dataset, saving computational resources and time while still improving accuracy. Owing to its powerful feature extraction and the critical role of the multi-scale modules, UWNet achieves a 2.1 AP improvement over the original baseline. At an input resolution of 300 × 300, UWNet provides an accuracy of 32.4 AP. When choosing the number of prediction layers, structures with four and six prediction layers were compared; experiments on the Labeled Fish in the Wild dataset show that six prediction layers outperform four.
CONCLUSION: The single-stage underwater detection model UWNet proposed in this research performs double optimization of anchor boxes and features. By adding three functional modules, the underwater detection ability of the single-stage detector is enhanced, addressing its tendency to miss small underwater targets.
9. CDMC-Net: Context-Aware Image Deblurring Using a Multi-scale Cascaded Network. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10976-6]
10. Zhang W, Zhuang P, Sun H, Li G, Kwong S, Li C. Underwater Image Enhancement via Minimal Color Loss and Locally Adaptive Contrast Enhancement. IEEE Transactions on Image Processing 2022; 31:3997-4010. [PMID: 35657839] [DOI: 10.1109/tip.2022.3177129]
Abstract
Underwater images typically suffer from color deviations and low visibility due to wavelength-dependent light absorption and scattering. To deal with these degradation issues, we propose an efficient and robust underwater image enhancement method, called MLLE. Specifically, we first locally adjust the color and details of an input image according to a minimum color loss principle and a maximum attenuation map-guided fusion strategy. Afterward, we employ integral and squared integral maps to compute the mean and variance of local image blocks, which are used to adaptively adjust the contrast of the input image. Meanwhile, a color balance strategy is introduced to balance the color differences between the a and b channels in the CIELAB color space. Our enhanced results are characterized by vivid color, improved contrast, and enhanced details. Extensive experiments on three underwater image enhancement datasets demonstrate that our method outperforms state-of-the-art methods. Our method is also appealing for its fast processing speed: within 1 s for an image of size 1024×1024×3 on a single CPU. Experiments further suggest that our method can effectively improve the performance of underwater image segmentation, keypoint detection, and saliency detection. The project page is available at https://li-chongyi.github.io/proj_MMLE.html.
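A minimal NumPy sketch of the integral-image computation mentioned above (local mean and variance from an integral and a squared-integral map), followed by a simple adaptive contrast stretch; the window size, gain, and the stretch formula itself are assumptions, not MLLE's exact formulation.

```python
import numpy as np

def local_mean_var(img, r=15):
    """Local mean/variance in (2r+1)x(2r+1) windows for a single-channel image."""
    h, w = img.shape
    pad = np.pad(img.astype(np.float64), r + 1, mode="edge")
    ii = pad.cumsum(0).cumsum(1)           # integral image
    ii2 = (pad ** 2).cumsum(0).cumsum(1)   # squared integral image
    k = 2 * r + 1

    def box(s):
        # Window sums from four corner lookups of an integral image.
        return s[k:, k:] - s[k:, :-k] - s[:-k, k:] + s[:-k, :-k]

    n = float(k * k)
    mean = box(ii) / n
    var = box(ii2) / n - mean ** 2
    return mean[:h, :w], var[:h, :w]

def adaptive_contrast(img, r=15, gain=1.5):
    """Simple locally adaptive stretch for an 8-bit single-channel image."""
    mean, var = local_mean_var(img, r)
    std = np.sqrt(np.maximum(var, 1e-6))
    return np.clip(mean + gain * (img - mean) * (std.mean() / (std + 1e-6)), 0, 255)
```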
11. A Generic Framework for Depth Reconstruction Enhancement. J Imaging 2022; 8:138. [PMID: 35621902] [PMCID: PMC9145806] [DOI: 10.3390/jimaging8050138]
Abstract
We propose a generic depth-refinement scheme based on GeoNet, a recent deep-learning approach for predicting depth and normals from a single color image, and extend it so that it can be applied to any depth reconstruction task, such as super-resolution, denoising, and deblurring, as long as the task produces a depth output. Our approach utilizes the tight coupling of the inherent geometric relationship between depth and normal maps to guide a neural network. In contrast to GeoNet, we do not use the original input to the backbone reconstruction task, which makes our network structure generically applicable. Our approach first learns a high-quality normal map from the depth image generated by the backbone method and then uses this normal map to refine the initial depth image jointly with the learned normal map. This is motivated by the fact that it is hard for neural networks to learn a direct mapping between depth and normal maps without explicit geometric constraints. We show the effectiveness of our method on the exemplary inverse depth-image reconstruction tasks of denoising, super-resolution, and motion-blur removal.
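The geometric relationship the refinement relies on can be sketched as computing surface normals from depth gradients; the network learns this mapping rather than applying it analytically, and the orthographic approximation below is an assumption.

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate per-pixel surface normals from a depth map (orthographic model)."""
    dzdx = np.gradient(depth, axis=1)   # depth change along x (columns)
    dzdy = np.gradient(depth, axis=0)   # depth change along y (rows)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```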
12. Liu P, Zhang Z, Meng Z, Gao N, Wang C. PDR-Net: Progressive depth reconstruction network for color guided depth map super-resolution. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.050]
13. Xiao Y, Wu J, Zhang J, Zhou P, Zheng Y, Leung CS, Kavan L. Interactive Deep Colorization and its Application for Image Compression. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1557-1572. [PMID: 32881687] [DOI: 10.1109/tvcg.2020.3021510]
Abstract
Recent methods based on deep learning have shown promise in converting grayscale images to colored ones. However, most of them allow only limited user inputs (no inputs, only global inputs, or only local inputs) to control the colorized output. The main difficulty lies in differentiating the influences of different inputs. To solve this problem, we propose a two-stage deep colorization method that allows users to control the results by flexibly setting global and local inputs. The key steps include enabling color themes as global inputs by extracting K mean colors and generating K-color maps to define a global theme loss, and designing a loss function that differentiates the influences of different inputs without causing artifacts. We also propose a color theme recommendation method to help users choose color themes. Based on the colorization model, we further propose an image compression scheme that supports variable compression ratios in a single network. Experiments on colorization show that our method can flexibly control the colorized results with only a few inputs and generates state-of-the-art results. Experiments on compression show that our method achieves much higher image quality at the same compression ratio compared with state-of-the-art methods.
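A minimal sketch of the "K mean colors" global input, extracted here with k-means in RGB space using scikit-learn; the value of K and the choice of color space are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def color_theme(image, k=5):
    """image: (H, W, 3) array in [0, 255]; returns the k theme (mean) colors."""
    pixels = image.reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    return km.cluster_centers_   # (k, 3) mean colors used as the global theme
```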
14. Cui Y, Sun Y, Jian M, Zhang X, Yao T, Gao X, Li Y, Zhang Y. A novel underwater image restoration method based on decomposition network and physical imaging model. Int J Intell Syst 2021. [DOI: 10.1002/int.22806]
Affiliation(s)
- Yanfang Cui: School of Information and Electrical Engineering, Ludong University, Yantai, China
- Yujuan Sun: School of Information and Electrical Engineering, Ludong University, Yantai, China
- Muwei Jian: School of Information Science and Engineering, Linyi University, Linyi, China; School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
- Xiaofeng Zhang: School of Information and Electrical Engineering, Ludong University, Yantai, China
- Tao Yao: School of Information and Electrical Engineering, Ludong University, Yantai, China
- Xin Gao: School of Information and Electrical Engineering, Ludong University, Yantai, China
- Yiru Li: School of Information and Electrical Engineering, Ludong University, Yantai, China
- Yan Zhang: School of Information and Electrical Engineering, Ludong University, Yantai, China
15. Zhong Z, Liu X, Jiang J, Zhao D, Chen Z, Ji X. High-Resolution Depth Maps Imaging via Attention-Based Hierarchical Multi-Modal Fusion. IEEE Transactions on Image Processing 2021; 31:648-663. [PMID: 34878976] [DOI: 10.1109/tip.2021.3131041]
Abstract
A depth map records the distance between the viewpoint and the objects in the scene and plays a critical role in many real-world applications. However, depth maps captured by consumer-grade RGB-D cameras suffer from low spatial resolution. Guided depth map super-resolution (DSR) is a popular approach to this problem: it attempts to restore a high-resolution (HR) depth map from the input low-resolution (LR) depth and its coupled HR RGB image, which serves as the guidance. The most challenging issue for guided DSR is how to correctly select consistent structures and propagate them, and how to properly handle inconsistent ones. In this paper, we propose a novel attention-based hierarchical multi-modal fusion (AHMF) network for guided DSR. Specifically, to effectively extract and combine relevant information from the LR depth and the HR guidance, we propose a multi-modal attention-based fusion (MMAF) strategy for hierarchical convolutional layers, including a feature enhancement block to select valuable features and a feature recalibration block to unify the similarity metrics of modalities with different appearance characteristics. Furthermore, we propose a bi-directional hierarchical feature collaboration (BHFC) module to fully leverage low-level spatial information and high-level structure information among multi-scale features. Experimental results show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed, and memory efficiency.
16. Deng X, Dragotti PL. Deep Convolutional Neural Network for Multi-Modal Image Restoration and Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:3333-3348. [PMID: 32248098] [DOI: 10.1109/tpami.2020.2984244]
Abstract
In this paper, we propose a novel deep convolutional neural network to solve the general multi-modal image restoration (MIR) and multi-modal image fusion (MIF) problems. Different from other deep-learning-based methods, our network architecture is designed by drawing inspiration from a newly proposed multi-modal convolutional sparse coding (MCSC) model. The key feature of the proposed network is that it can automatically separate the common information shared among different modalities from the unique information belonging to each single modality; it is therefore denoted CU-Net, i.e., common and unique information splitting network. Specifically, CU-Net is composed of three modules: the unique feature extraction module (UFEM), the common feature preservation module (CFPM), and the image reconstruction module (IRM). The architecture of each module is derived from the corresponding part of the MCSC model and consists of several learned convolutional sparse coding (LCSC) blocks. Extensive numerical results verify the effectiveness of our method on a variety of MIR and MIF tasks, including RGB-guided depth image super-resolution, flash-guided non-flash image denoising, and multi-focus and multi-exposure image fusion.
17. Zhou J, Yang T, Ren W, Zhang D, Zhang W. Underwater image restoration via depth map and illumination estimation based on a single image. Optics Express 2021; 29:29864-29886. [PMID: 34614723] [DOI: 10.1364/oe.427839]
Abstract
When enhancing underwater images taken in various water types, previous methods employ the simple image formation model and thus obtain poor restoration results. Recently, a revised underwater image formation model (the Akkaynak-Treibitz model) has shown better robustness in underwater image restoration, but it has drawn little attention due to its complexity. Herein, we develop a dehazing method based on the revised model, which relies on a scene depth map and a color correction method to eliminate color distortion. Specifically, we first design an underwater depth estimation method to create the depth map. Subsequently, according to the depth value of each pixel, the backscatter is estimated and removed channel by channel based on the revised model. Furthermore, we propose a color correction approach that automatically adjusts the global color distribution of the image. Our method uses only a single underwater image as input to eliminate the effects of light absorption and scattering. Compared with state-of-the-art methods, both subjective and objective experimental results show that our approach can be applied to various real-world underwater scenes and yields better contrast and color.
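For context, the revised (Akkaynak-Treibitz) image formation model referred to above separates the attenuation of the direct signal from that of the backscatter:

I_c = J_c \, e^{-\beta_c^{D}(\mathbf{v}_D)\, z} + B_c^{\infty}\left(1 - e^{-\beta_c^{B}(\mathbf{v}_B)\, z}\right), \quad c \in \{R, G, B\},

where z is the scene range (hence the need for the depth map), J_c is the unattenuated scene radiance, B_c^{\infty} is the veiling light, and \beta_c^{D} and \beta_c^{B} are distinct, range-dependent attenuation coefficients. Estimating and subtracting the second term per channel is the backscatter-removal step described in the abstract.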
19. Li C, Anwar S, Hou J, Cong R, Guo C, Ren W. Underwater Image Enhancement via Medium Transmission-Guided Multi-Color Space Embedding. IEEE Transactions on Image Processing 2021; 30:4985-5000. [PMID: 33961554] [DOI: 10.1109/tip.2021.3076367]
Abstract
Underwater images suffer from color casts and low contrast due to wavelength- and distance-dependent attenuation and scattering. To solve these two degradation issues, we present an underwater image enhancement network via medium transmission-guided multi-color space embedding, called Ucolor. Concretely, we first propose a multi-color space encoder network, which enriches the diversity of feature representations by incorporating the characteristics of different color spaces into a unified structure. Coupled with an attention mechanism, the most discriminative features extracted from multiple color spaces are adaptively integrated and highlighted. Inspired by underwater imaging physical models, we design a medium transmission (indicating the percentage of the scene radiance reaching the camera)-guided decoder network to enhance the network's response to quality-degraded regions. As a result, our network can effectively improve the visual quality of underwater images by exploiting multi-color space embedding and the advantages of both physical-model-based and learning-based methods. Extensive experiments demonstrate that Ucolor achieves superior performance against state-of-the-art methods in terms of both visual quality and quantitative metrics. The code is publicly available at: https://li-chongyi.github.io/Proj_Ucolor.html.
20. Stereo superpixel: An iterative framework based on parallax consistency and collaborative optimization. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.12.031]
21. Ruget A, McLaughlin S, Henderson RK, Gyongy I, Halimi A, Leach J. Robust super-resolution depth imaging via a multi-feature fusion deep network. Optics Express 2021; 29:11917-11937. [PMID: 33984963] [DOI: 10.1364/oe.415563]
Abstract
The number of applications that use depth imaging is increasing rapidly, e.g., autonomous vehicles and auto-focus assist on smartphone cameras. Light detection and ranging (LIDAR) via single-photon avalanche diode (SPAD) arrays is an emerging technology that enables the acquisition of depth images at high frame rates. However, the spatial resolution of this technology is typically low in comparison to the intensity images recorded by conventional cameras. To increase the native resolution of depth images from a SPAD camera, we develop a deep network built to take advantage of the multiple features that can be extracted from the camera's histogram data. The network is designed for a SPAD camera operating in a dual mode, capturing alternate low-resolution depth and high-resolution intensity images at high frame rates, so the system does not require an additional sensor to provide intensity images. The network then uses the intensity images and multiple features extracted from down-sampled histograms to guide the upsampling of the depth. Our network provides significant image resolution enhancement and image denoising across a wide range of signal-to-noise ratios and photon levels. Additionally, we show that the network can be applied to other types of SPAD data, demonstrating the generality of the algorithm.
22. Li C, Cong R, Kwong S, Hou J, Fu H, Zhu G, Zhang D, Huang Q. ASIF-Net: Attention Steered Interweave Fusion Network for RGB-D Salient Object Detection. IEEE Transactions on Cybernetics 2021; 51:88-100. [PMID: 32078571] [DOI: 10.1109/tcyb.2020.2969255]
Abstract
Salient object detection from RGB-D images is an important yet challenging vision task that aims to detect the most distinctive objects in a scene by combining color information and depth constraints. Unlike prior fusion schemes, we propose an attention steered interweave fusion network (ASIF-Net) to detect salient objects, which progressively integrates cross-modal and cross-level complementarity from the RGB image and the corresponding depth map via the steering of an attention mechanism. Specifically, the complementary features from RGB-D images are jointly extracted and hierarchically fused in a dense and interweaved manner. Such a manner breaks down the barriers of inconsistency in the cross-modal data and sufficiently captures their complementarity. Meanwhile, an attention mechanism is introduced to locate potential salient regions in an attention-weighted fashion, which helps highlight the salient objects and suppress cluttered background regions. Instead of focusing only on pixelwise saliency, we also ensure that the detected salient objects have objectness characteristics (e.g., complete structure and sharp boundary) by incorporating adversarial learning, which provides a global semantic constraint for RGB-D salient object detection. Quantitative and qualitative experiments demonstrate that the proposed method performs favorably against 17 state-of-the-art saliency detectors on four publicly available RGB-D salient object detection datasets. The code and results of our method are available at https://github.com/Li-Chongyi/ASIF-Net.
23. Li C, Cong R, Guo C, Li H, Zhang C, Zheng F, Zhao Y. A parallel down-up fusion network for salient object detection in optical remote sensing images. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.108]
24. Cong R, Lei J, Fu H, Hou J, Huang Q, Kwong S. Going From RGB to RGBD Saliency: A Depth-Guided Transformation Model. IEEE Transactions on Cybernetics 2020; 50:3627-3639. [PMID: 31443060] [DOI: 10.1109/tcyb.2019.2932005]
Abstract
Depth information has been demonstrated to be useful for saliency detection. However, the existing methods for RGBD saliency detection mainly focus on designing straightforward and comprehensive models, while ignoring the transferable ability of the existing RGB saliency detection models. In this article, we propose a novel depth-guided transformation model (DTM) going from RGB saliency to RGBD saliency. The proposed model includes three components, that is: 1) multilevel RGBD saliency initialization; 2) depth-guided saliency refinement; and 3) saliency optimization with depth constraints. The explicit depth feature is first utilized in the multilevel RGBD saliency model to initialize the RGBD saliency by combining the global compactness saliency cue and local geodesic saliency cue. The depth-guided saliency refinement is used to further highlight the salient objects and suppress the background regions by introducing the prior depth domain knowledge and prior refined depth shape. Benefiting from the consistency of the entire object in the depth map, we formulate an optimization model to attain more consistent and accurate saliency results via an energy function, which integrates the unary data term, color smooth term, and depth consistency term. Experiments on three public RGBD saliency detection benchmarks demonstrate the effectiveness and performance improvement of the proposed DTM from RGB to RGBD saliency.
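The optimization step can be illustrated with a generic energy of the following form (the precise terms and weights are not given in the abstract, so this is only an assumed, schematic formulation):

E(S) = \sum_i (S_i - \hat{S}_i)^2 + \lambda_c \sum_{(i,j) \in \mathcal{N}} w^{c}_{ij} (S_i - S_j)^2 + \lambda_d \sum_{(i,j) \in \mathcal{N}} w^{d}_{ij} (S_i - S_j)^2,

where the first (unary data) term keeps the saliency S close to the initialization \hat{S}, the second smooths it over color-similar neighbors, and the third enforces consistency over depth-similar neighbors.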
25. Chen R, Gao W. Color-Guided Depth Map Super-Resolution Using a Dual-Branch Multi-Scale Residual Network with Channel Interaction. Sensors (Basel) 2020; 20:1560. [PMID: 32168872] [PMCID: PMC7146598] [DOI: 10.3390/s20061560]
Abstract
We designed an end-to-end dual-branch residual network architecture that takes a low-resolution (LR) depth map and a corresponding high-resolution (HR) color image as separate inputs to the two branches, and outputs an HR depth map through multi-scale, channel-wise feature extraction, interaction, and upsampling. Each branch of this network contains several residual levels at different scales, and each level comprises multiple residual groups composed of several residual blocks. A short skip connection in every residual block and a long skip connection in each residual group or level allow low-frequency information to be bypassed while the main network focuses on learning high-frequency information. High-frequency information learned by each residual block in the color image branch is fed into the corresponding residual block in the depth map branch; this kind of channel-wise feature supplement and fusion not only helps the depth map branch alleviate blur in details such as edges, but can also introduce depth artifacts into the feature maps. To avoid these artifacts, the channel interaction fuses the feature maps using weights derived from a channel attention mechanism. The parallel multi-scale network architecture with channel interaction for feature guidance is the main contribution of our work, and experiments show that the proposed method achieves better accuracy than other methods.
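A minimal PyTorch sketch of the channel-attention weighting behind the channel interaction: squeeze-and-excitation-style weights rescale the color features before they supplement the depth branch. The reduction ratio and the exact placement in the network are assumptions.

```python
import torch
import torch.nn as nn

class ChannelInteraction(nn.Module):
    """Weight color-branch features per channel before adding them to the depth branch."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, depth_feat, color_feat):
        w = self.fc(color_feat)               # (B, C, 1, 1) channel weights
        return depth_feat + w * color_feat    # weighted guidance, fewer artifacts
```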
Affiliation(s)
- Ruijin Chen: National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Gao: National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
26. Huang Z, Fan J, Cheng S, Yi S, Wang X, Li H. HMS-Net: Hierarchical Multi-Scale Sparsity-Invariant Network for Sparse Depth Completion. IEEE Transactions on Image Processing 2019; 29:3429-3441. [PMID: 31902762] [DOI: 10.1109/tip.2019.2960589]
Abstract
Dense depth cues are important and have wide applications in various computer vision tasks. In autonomous driving, LIDAR sensors are adopted to acquire depth measurements around the vehicle to perceive the surrounding environment. However, depth maps obtained by LIDAR are generally sparse because of hardware limitations. The task of depth completion, which aims to generate a dense depth map from an input sparse depth map, has therefore attracted increasing attention. To effectively utilize multi-scale features, we propose three novel sparsity-invariant operations, based on which a sparsity-invariant multi-scale encoder-decoder network (HMS-Net) for handling sparse inputs and sparse feature maps is proposed. Additional RGB features can be incorporated to further improve the depth completion performance. Our extensive experiments and component analysis on two public benchmarks, the KITTI depth completion benchmark and the NYU-depth-v2 dataset, demonstrate the effectiveness of the proposed approach. As of Aug. 12th, 2018, on the KITTI depth completion leaderboard, our proposed model without RGB guidance ranks 1st among all peer-reviewed methods that do not use RGB information, and our model with RGB guidance ranks 2nd among all RGB-guided methods.
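A minimal PyTorch sketch of a basic sparsity-invariant convolution of the kind such networks build on: the response is normalized by the number of valid pixels in each window, and the validity mask is propagated by max pooling. Kernel size and channel counts are assumptions, and HMS-Net's multi-scale operators are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv(nn.Module):
    """Convolution over sparse inputs, normalized by the count of valid pixels."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)
        self.k = k

    def forward(self, x, mask):
        # x: (B, C, H, W) sparse features; mask: (B, 1, H, W) validity map in {0, 1}.
        x = self.conv(x * mask)
        # Count of valid entries per window, used to normalize the response.
        valid = F.avg_pool2d(mask, self.k, stride=1, padding=self.k // 2) * self.k ** 2
        x = x / valid.clamp(min=1) + self.bias.view(1, -1, 1, 1)
        return x, self.pool(mask)   # propagate the mask to the next layer
```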
27. Li C, Guo C, Ren W, Cong R, Hou J, Kwong S, Tao D. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Transactions on Image Processing 2019; 29:4376-4389. [PMID: 31796402] [DOI: 10.1109/tip.2019.2955241]
Abstract
Underwater image enhancement has been attracting much attention due to its significance in marine engineering and aquatic robotics. Numerous underwater image enhancement algorithms have been proposed in the last few years. However, these algorithms are mainly evaluated using either synthetic datasets or a few selected real-world images. It is thus unclear how these algorithms would perform on images acquired in the wild and how we could gauge progress in the field. To bridge this gap, we present the first comprehensive perceptual study and analysis of underwater image enhancement using large-scale real-world images. In this paper, we construct an Underwater Image Enhancement Benchmark (UIEB) including 950 real-world underwater images, 890 of which have corresponding reference images. The remaining 60 underwater images, for which satisfactory reference images could not be obtained, are treated as challenging data. Using this dataset, we conduct a comprehensive qualitative and quantitative study of state-of-the-art underwater image enhancement algorithms. In addition, we propose an underwater image enhancement network (called Water-Net) trained on this benchmark as a baseline, which indicates the suitability of the proposed UIEB for training convolutional neural networks (CNNs). The benchmark evaluations and the proposed Water-Net demonstrate the performance and limitations of state-of-the-art algorithms, shedding light on future research in underwater image enhancement. The dataset and code are publicly available.
28. Deng X, Dragotti PL. Deep Coupled ISTA Network for Multi-modal Image Super-Resolution. IEEE Transactions on Image Processing 2019; 29:1683-1698. [PMID: 31603781] [DOI: 10.1109/tip.2019.2944270]
Abstract
Given a low-resolution (LR) image, multi-modal image super-resolution (MISR) aims to find the high-resolution (HR) version of this image with the guidance of an HR image from another modality. In this paper, we use a model-based approach to design a new deep network architecture for MISR. We first introduce a novel joint multi-modal dictionary learning (JMDL) algorithm to model cross-modality dependency. In JMDL, we simultaneously learn three dictionaries and two transform matrices to combine the modalities. Then, by unfolding the iterative shrinkage and thresholding algorithm (ISTA), we turn the JMDL model into a deep neural network, called the deep coupled ISTA network. Since network initialization plays an important role in deep network training, we further propose a layer-wise optimization algorithm (LOA) to initialize the parameters of the network before running the back-propagation strategy. Specifically, we model the network initialization as a multi-layer dictionary learning problem and solve it through convex optimization. The proposed LOA is demonstrated to effectively decrease the training loss and increase the reconstruction accuracy. Finally, we compare our method with other state-of-the-art methods on the MISR task. The numerical results show that our method consistently outperforms others both quantitatively and qualitatively at different upscaling factors for various multi-modal scenarios.
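A minimal PyTorch sketch of a single unfolded ISTA stage of the kind such networks are built from: a learned linear step followed by soft thresholding. This is the generic LISTA recursion; the multi-modal coupling, the three dictionaries, and the layer-wise initialization of the paper are not reproduced.

```python
import torch
import torch.nn as nn

def soft_threshold(x, theta):
    """Element-wise soft-thresholding (proximal operator of the L1 norm)."""
    return torch.sign(x) * torch.relu(torch.abs(x) - theta)

class ISTAStage(nn.Module):
    """One unfolded ISTA iteration with learnable matrices and thresholds."""
    def __init__(self, signal_dim, code_dim):
        super().__init__()
        self.We = nn.Linear(signal_dim, code_dim, bias=False)  # plays the role of D^T / L
        self.S = nn.Linear(code_dim, code_dim, bias=False)     # plays the role of I - D^T D / L
        self.theta = nn.Parameter(torch.full((code_dim,), 0.1))

    def forward(self, y, z):
        # y: input signal (B, signal_dim); z: previous sparse code (B, code_dim).
        return soft_threshold(self.We(y) + self.S(z), self.theta)
```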
29. Cong R, Lei J, Fu H, Porikli F, Huang Q, Hou C. Video Saliency Detection via Sparsity-Based Reconstruction and Propagation. IEEE Transactions on Image Processing 2019; 28:4819-4831. [PMID: 31059438] [DOI: 10.1109/tip.2019.2910377]
Abstract
Video saliency detection aims to continuously discover the motion-related salient objects from the video sequences. Since it needs to consider the spatial and temporal constraints jointly, video saliency detection is more challenging than image saliency detection. In this paper, we propose a new method to detect the salient objects in video based on sparse reconstruction and propagation. With the assistance of novel static and motion priors, a single-frame saliency model is first designed to represent the spatial saliency in each individual frame via the sparsity-based reconstruction. Then, through a progressive sparsity-based propagation, the sequential correspondence in the temporal space is captured to produce the inter-frame saliency map. Finally, these two maps are incorporated into a global optimization model to achieve spatio-temporal smoothness and global consistency of the salient object in the whole video. The experiments on three large-scale video saliency datasets demonstrate that the proposed method outperforms the state-of-the-art algorithms both qualitatively and quantitatively.