1. Li J, Jiang J, Liang P, Ma J, Nie L. MaeFuse: Transferring Omni Features With Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training. IEEE Trans Image Process 2025; 34:1340-1353. [PMID: 40031430] [DOI: 10.1109/tip.2025.3541562]
Abstract
In this paper, we introduce MaeFuse, a novel autoencoder model designed for infrared and visible image fusion (IVIF). Existing approaches to image fusion often rely on training combined with downstream tasks to obtain high-level visual information, which is effective in emphasizing target objects and delivers impressive results in visual quality and task-specific applications. Instead of being driven by downstream tasks, MaeFuse utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni feature extraction for low-level reconstruction and high-level vision tasks, to obtain perception-friendly features at low cost. To eliminate the domain gap between different modal features and the block effect caused by the MAE encoder, we further develop a guided training strategy. This strategy is crafted so that the fusion layer seamlessly adjusts to the feature space of the encoder, gradually enhancing the fusion performance. The proposed method facilitates the comprehensive integration of feature vectors from both infrared and visible modalities, thus preserving the rich details inherent in each modality. MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also delivers impressive performance across various public datasets. The code is available at https://github.com/Henry-Lee-real/MaeFuse.
2. Leng X, Wang X, Yue W, Jin J, Xu G. Structural tensor and frequency guided semi-supervised segmentation for medical images. Med Phys 2024; 51:8929-8942. [PMID: 39284343] [DOI: 10.1002/mp.17399]
Abstract
BACKGROUND Semi-supervised semantic segmentation entails training with a limited number of labeled samples alongside many unlabeled samples, aiming to reduce dependence on pixel-level annotations. Most semi-supervised semantic segmentation methods focus primarily on sample augmentation in the spatial dimensions to mitigate the shortage of labeled samples, and they tend to ignore the structural information of objects. In addition, frequency-domain information provides another perspective for evaluating image information, with properties different from those of the spatial domain. PURPOSE In this study, we attempt to answer two questions: (1) is it helpful to provide structural information of objects in semi-supervised semantic segmentation tasks for medical images? (2) is it more effective to evaluate segmentation performance in the frequency domain than in the spatial domain for semi-supervised medical image segmentation? We therefore introduce structural and frequency information to improve the performance of semi-supervised semantic segmentation for medical images. METHODS We present a novel structural tensor loss (STL) to guide feature learning in the spatial domain for semi-supervised semantic segmentation. Specifically, STL utilizes the structural information encoded in the tensors to enforce the consistency of objects across spatial regions, thereby promoting more robust and accurate feature extraction. Additionally, we propose a frequency-domain alignment loss (FAL) to enable the neural network to learn frequency-domain information across different augmented samples. It leverages the inherent patterns present in frequency-domain representations to guide the network in capturing and aligning features across diverse augmentation variations, thereby enhancing the model's robustness to input variations. RESULTS We conduct experiments on three benchmark datasets: MRI (ACDC) for cardiac segmentation, CT (Synapse) for abdominal organ segmentation, and ultrasound (BUSI) for breast lesion segmentation. The experimental results demonstrate that our method outperforms state-of-the-art semi-supervised approaches in terms of the Dice similarity coefficient. CONCLUSIONS The proposed approach improves the final performance of semi-supervised medical image segmentation and will help reduce the need for medical image labels. Our code is available at https://github.com/apple1986/STLFAL.
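For readers who want a concrete starting point, the sketch below shows one plausible way such a frequency-domain alignment term could be implemented; it is not the authors' code, and the softmax normalization and the magnitude-based L1 distance are assumptions.

```python
# Minimal sketch (not the paper's code): a frequency-domain alignment term
# between segmentation logits of two augmented views of the same image.
import torch
import torch.nn.functional as F

def frequency_alignment_loss(pred_a: torch.Tensor, pred_b: torch.Tensor) -> torch.Tensor:
    """pred_a, pred_b: (B, C, H, W) logits from two augmented views."""
    prob_a = torch.softmax(pred_a, dim=1)
    prob_b = torch.softmax(pred_b, dim=1)
    # 2-D FFT over the spatial dimensions; compare magnitude spectra so the
    # term is insensitive to small spatial shifts introduced by augmentation.
    mag_a = torch.abs(torch.fft.fft2(prob_a, norm="ortho"))
    mag_b = torch.abs(torch.fft.fft2(prob_b, norm="ortho"))
    return F.l1_loss(mag_a, mag_b)

if __name__ == "__main__":
    a, b = torch.randn(2, 4, 64, 64), torch.randn(2, 4, 64, 64)
    print(frequency_alignment_loss(a, b).item())
```

In practice such a term would be added, with a weight, to the supervised loss computed on the labeled subset.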
Affiliations:
- Xuesong Leng: School of Computer Science and Engineering, Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, Hubei, China
- Xiaxia Wang: School of Computer Science and Engineering, Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, Hubei, China
- Wenbo Yue: School of Computer Science and Engineering, Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, Hubei, China
- Jianxiu Jin: School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
- Guoping Xu: School of Computer Science and Engineering, Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, Hubei, China

3. Huang J, Li X, Tan H, Cheng X. Generative Adversarial Network for Trimodal Medical Image Fusion Using Primitive Relationship Reasoning. IEEE J Biomed Health Inform 2024; 28:5729-5741. [PMID: 39093669] [DOI: 10.1109/jbhi.2024.3426664]
Abstract
Medical image fusion has become a prominent biomedical image processing technology in recent years. The technology coalesces useful information from different modal medical images into a single informative fused image to provide reasonable and effective medical assistance. Current research has mainly focused on dual-modal medical image fusion, and little attention has been paid to trimodal medical image fusion, which has greater application requirements and clinical significance. To this end, this study proposes an end-to-end generative adversarial network for trimodal medical image fusion. Utilizing a multi-scale squeeze-and-excitation reasoning attention network, the proposed method generates an energy map for each source image, facilitating efficient trimodal medical image fusion under the guidance of an energy-ratio fusion strategy. To obtain global semantic information, we introduce squeeze-and-excitation reasoning attention blocks and enhance the global features by primitive relationship reasoning. Through extensive fusion experiments, we demonstrate that our method yields superior visual results and objective evaluation metric scores compared with state-of-the-art fusion methods. Furthermore, the proposed method also obtains the best accuracy in the glioma segmentation experiment.
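As a rough illustration of an energy-ratio fusion rule (not the paper's attention-predicted energy maps), the following sketch approximates each source's energy map with local gradient energy; the gradient-based energy and the smoothing parameter are assumptions.

```python
# Minimal NumPy sketch of an energy-ratio fusion rule for several co-registered
# sources. The energy maps here are hand-crafted (local gradient energy), not
# the network-predicted maps used in the paper.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def local_energy(img: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    gx, gy = sobel(img, axis=0), sobel(img, axis=1)
    return gaussian_filter(gx ** 2 + gy ** 2, sigma)

def energy_ratio_fusion(sources):
    """sources: list of float images in [0, 1] with identical shapes."""
    energies = np.stack([local_energy(s) for s in sources])          # (N, H, W)
    weights = energies / (energies.sum(axis=0, keepdims=True) + 1e-8)
    return np.sum(weights * np.stack(sources), axis=0)

if __name__ == "__main__":
    srcs = [np.random.rand(128, 128).astype(np.float32) for _ in range(3)]
    print(energy_ratio_fusion(srcs).shape)
```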
4. Deng Z, Wang L, Kuai Z, Chen Q, Ye C, Scott AD, Nielles-Vallespin S, Zhu Y. Deep learning method with integrated invertible wavelet scattering for improving the quality of in vivo cardiac DTI. Phys Med Biol 2024; 69:185005. [PMID: 39142339] [DOI: 10.1088/1361-6560/ad6f6a]
Abstract
Objective. Respiratory motion, cardiac motion, and inherently low signal-to-noise ratio (SNR) are major limitations of in vivo cardiac diffusion tensor imaging (DTI). We propose a novel enhancement method that uses unsupervised-learning-based invertible wavelet scattering (IWS) to improve the quality of in vivo cardiac DTI. Approach. Our method starts by extracting nearly transformation-invariant features from multiple cardiac diffusion-weighted (DW) image acquisitions using multi-scale wavelet scattering (WS). Then, the relationship between the WS coefficients and the DW images is learned through a multi-scale encoder and a decoder network. Using the trained encoder, the deep features of the WS coefficients of multiple DW image acquisitions are further extracted and then fused using an average rule. Finally, using the fused WS features and the trained decoder, the enhanced DW images are derived. Main results. We evaluate the performance of the proposed method by comparing it with several methods on three in vivo cardiac DTI datasets in terms of SNR, contrast-to-noise ratio (CNR), fractional anisotropy (FA), mean diffusivity (MD), and helix angle (HA). Compared with the best comparison method, the SNR/CNR of diastolic, gastric-peristalsis-influenced, and end-systolic DW images was improved by 1%/16%, 5%/6%, and 56%/30%, respectively. The approach also yielded more consistent FA and MD values and more coherent helical fiber structures than the comparison methods used in this work. Significance. The ablation results verify that using transformation-invariant and noise-robust wavelet scattering features enables us to effectively exploit the useful information in limited data, providing a potential means to alleviate the dependence of the fusion results on the number of repeated acquisitions, which is beneficial for dealing with noise and residual motion simultaneously and therefore improving the quality of in vivo cardiac DTI. Code can be found at https://github.com/strawberry1996/WS-MCNN.
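A minimal sketch of the average-rule fusion in feature space described above is given below; the toy encoder/decoder stand in for the trained wavelet-scattering encoder and decoder and are assumptions.

```python
# Minimal PyTorch sketch (not the authors' code): fuse repeated DW acquisitions
# by averaging their encoder features and decoding the fused representation.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 1, 3, padding=1))

def fuse_acquisitions(acquisitions):
    """acquisitions: list of (B, 1, H, W) repeated DW acquisitions of one slice."""
    with torch.no_grad():
        feats = torch.stack([encoder(a) for a in acquisitions])  # (N, B, 16, H, W)
        fused_feat = feats.mean(dim=0)                           # average fusion rule
        return decoder(fused_feat)

if __name__ == "__main__":
    acqs = [torch.rand(1, 1, 96, 96) for _ in range(4)]
    print(fuse_acquisitions(acqs).shape)
```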
Affiliations:
- Zeyu Deng: Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, College of Computer Science and Technology, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, People's Republic of China
- Lihui Wang: Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, College of Computer Science and Technology, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, People's Republic of China
- Zixiang Kuai: Imaging Center, Harbin Medical University Cancer Hospital, Harbin, People's Republic of China
- Qijian Chen: Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, College of Computer Science and Technology, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, People's Republic of China
- Chen Ye: Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, College of Computer Science and Technology, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, People's Republic of China
- Andrew D Scott: CMR Unit, Royal Brompton Hospital, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom; National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Sonia Nielles-Vallespin: CMR Unit, Royal Brompton Hospital, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom; National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Yuemin Zhu: University Lyon, INSA Lyon, CNRS, Inserm, IRP Metislab CREATIS UMR5220, U1206, Lyon 69621, France

5. Xu G, He C, Wang H, Zhu H, Ding W. DM-Fusion: Deep Model-Driven Network for Heterogeneous Image Fusion. IEEE Trans Neural Netw Learn Syst 2024; 35:10071-10085. [PMID: 37022081] [DOI: 10.1109/tnnls.2023.3238511]
Abstract
Heterogeneous image fusion (HIF) is an enhancement technique for highlighting the discriminative information and textural detail of heterogeneous source images. Although various deep neural network-based HIF methods have been proposed, the widely used purely data-driven convolutional neural network cannot guarantee a theoretically grounded architecture or optimal convergence for the HIF problem. In this article, a deep model-driven neural network is designed for the HIF problem, which adaptively integrates the merits of model-based techniques for interpretability and of deep learning-based methods for generalizability. Unlike a general black-box network architecture, the proposed objective function is tailored into several domain-knowledge network modules to form a compact and explainable deep model-driven HIF network, termed DM-Fusion. The proposed network demonstrates the feasibility and effectiveness of its three parts: the specific HIF model, an iterative parameter learning scheme, and a data-driven network architecture. Furthermore, a task-driven loss function strategy is proposed to achieve feature enhancement and preservation. Numerous experiments on four fusion tasks and downstream applications illustrate the advancement of DM-Fusion over state-of-the-art (SOTA) methods in both fusion quality and efficiency. The source code will be available soon.
6. Hu X, Jiang J, Wang C, Liu X, Ma J. Incrementally Adapting Pretrained Model Using Network Prior for Multi-Focus Image Fusion. IEEE Trans Image Process 2024; 33:3950-3963. [PMID: 38905081] [DOI: 10.1109/tip.2024.3409940]
Abstract
Multi-focus image fusion can fuse the clear parts of two or more source images captured of the same scene with different focal lengths into an all-in-focus image. On the one hand, previous supervised learning-based multi-focus image fusion methods relying on synthetic datasets exhibit a clear distribution shift from real scenarios. On the other hand, unsupervised learning-based multi-focus image fusion methods can adapt well to the observed images but lack the general knowledge of defocus blur that can be learned from paired data. To avoid the problems of existing methods, this paper presents a novel multi-focus image fusion model that considers both the general knowledge brought by a supervised pretrained backbone and the extrinsic priors optimized on the specific test sample to improve fusion performance. Specifically, the Incremental Network Prior Adaptation (INPA) framework is proposed to incrementally integrate features extracted from pretrained strong baselines into a tiny prior network (6.9% of the parameters of the backbone network) to boost the performance on test samples. We evaluate our method on both synthetic and real-world public datasets (Lytro, MFI-WHU, and Real-MFF) and show that it outperforms existing supervised and unsupervised learning-based methods.
7. Zhou H, Zeng X, Lin B, Li D, Ali Shah SA, Liu B, Guo K, Guo Z. Polarization motivating high-performance weak targets' imaging based on a dual-discriminator GAN. Opt Express 2024; 32:3835-3851. [PMID: 38297596] [DOI: 10.1364/oe.504918]
Abstract
High-level detection of weak targets under bright light has always been an important yet challenging task. In this paper, a method that effectively fuses intensity and polarization information is proposed to tackle this issue. Specifically, an attention-guided dual-discriminator generative adversarial network (GAN) is designed for fusing these two sources, so that the fusion results maintain the rich background information of the intensity images while substantially completing the target information from the polarization images. The framework consists of a generator and two discriminators, which retain as much of the texture and salient information of the source images as possible. Furthermore, an attention mechanism is introduced to focus on contextual semantic information and enhance long-term dependency. To preserve salient information, a suitable loss function is introduced to constrain the pixel-level distribution between the result and the original image. Moreover, a real-scene dataset of weak targets under bright light has been built, and the effects of fusing polarization and intensity information for different weak targets are investigated and discussed. The results demonstrate that the proposed method outperforms other methods in both subjective evaluations and objective indexes, which proves its effectiveness in achieving accurate detection of weak targets against bright backgrounds.
8. Aymaz S. A new hybrid approach to multi-focus image fusion using CNN and SVM methods [CNN ve SVM yöntemleriyle çoklu-odaklı görüntü birleştirmede yeni bir hibrit yaklaşım]. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 2023; 39:1123-1136. [DOI: 10.17341/gazimmfd.1208107]
Abstract
Multi-focus image fusion is the process of combining two or more images of the same scene, captured with different focus settings, into a single all-in-focus image. The main goal when constructing the all-in-focus image is to transfer as much of the correct focus information from the source images as possible into the fused image. To this end, the proposed study introduces a new hybrid approach. The approach is based on classifying salient features extracted from the images and combining them with effective fusion rules. For feature extraction, a custom-designed CNN architecture that can run easily even on modest hardware is used. The extracted features are fed to an SVM classifier, which labels each feature vector as focused or unfocused. After classification, binary decision maps are generated for each source image. In addition to these decision maps, one of the original aspects of the study is the extraction of maps of undecided regions. These regions consist of transition points between focused and unfocused areas where the classifier cannot classify the feature vectors with certainty. One of the most important issues in image fusion is the choice of fusion rule. In the proposed study, pixels for which the classifier makes a confident decision are transferred directly to the fused image, while two alternative fusion rules are used for the undecided regions: a gradient-based rule and a Laplacian-based rule. The effect of each fusion rule on the fusion result is examined. Finally, the performance of the proposed study is evaluated with objective performance metrics. The results show that the method is an effective fusion tool that can run on modest systems.
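The sketch below illustrates a Laplacian-energy fusion rule of the kind applied to the undecided regions; the window size, the -1/0/1 decision-map encoding, and the scipy-based implementation are assumptions, not the paper's code.

```python
# Minimal NumPy sketch: a Laplacian-energy rule for pixels left undecided by
# the classifier. decision map: 1 -> take A, 0 -> take B, -1 -> undecided.
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def laplacian_rule_fusion(img_a, img_b, decision, win=7):
    energy_a = uniform_filter(laplace(img_a) ** 2, size=win)
    energy_b = uniform_filter(laplace(img_b) ** 2, size=win)
    # Confident pixels follow the decision map; undecided pixels take the
    # source with the larger local Laplacian energy (i.e. sharper detail).
    take_a = np.where(decision == -1, energy_a >= energy_b, decision == 1)
    return np.where(take_a, img_a, img_b)

if __name__ == "__main__":
    a, b = np.random.rand(64, 64), np.random.rand(64, 64)
    d = np.random.choice([-1, 0, 1], size=(64, 64))
    print(laplacian_rule_fusion(a, b, d).shape)
```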
9. Zhou T, Zhang X, Lu H, Li Q, Liu L, Zhou H. GMRE-iUnet: Isomorphic Unet fusion model for PET and CT lung tumor images. Comput Biol Med 2023; 166:107514. [PMID: 37826951] [DOI: 10.1016/j.compbiomed.2023.107514]
Abstract
Lung tumor PET and CT image fusion is a key technology in clinical diagnosis. However, existing fusion methods struggle to obtain fused images with high contrast, prominent morphological features, and accurate spatial localization. In this paper, an isomorphic Unet fusion model (GMRE-iUnet) for lung tumor PET and CT images is proposed to address these problems. The main ideas of this network are as follows. First, an isomorphic Unet fusion network is constructed that contains two independent multiscale dual-encoder Unets, which capture lesion-region features and spatial localization and enrich morphological information. Second, a hybrid CNN-Transformer feature extraction module (HCTrans) is constructed to effectively integrate local lesion features and global contextual information. In addition, a residual axial attention feature compensation module (RAAFC) is embedded into the Unet to capture fine-grained information as compensation features, making the model focus on local connections between neighboring pixels. Third, a hybrid attentional feature fusion module (HAFF) is designed for multiscale feature fusion; it aggregates edge information and detail representations using local entropy and Gaussian filtering. Finally, experimental results on a multimodal lung tumor image dataset show that the model achieves excellent fusion performance compared with eight other fusion models. In the comparison experiment on CT mediastinal-window and PET images, the AG, EI, QAB/F, SF, SD, and IE indexes are improved by 16.19%, 26%, 3.81%, 1.65%, 3.91%, and 8.01%, respectively. GMRE-iUnet can highlight the information and morphological features of the lesion areas and provides practical help for the aided diagnosis of lung tumors.
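To make the local-entropy-plus-Gaussian weighting idea concrete, here is a minimal sketch (not the GMRE-iUnet module); the disk radius, smoothing sigma, and single-channel inputs are assumptions.

```python
# Minimal sketch: weight two single-channel maps by local entropy and smooth
# the weights with a Gaussian filter before blending.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.filters.rank import entropy
from skimage.morphology import disk
from skimage.util import img_as_ubyte

def entropy_weighted_fusion(feat_pet, feat_ct, radius=5, sigma=2.0):
    """feat_pet, feat_ct: single-channel maps scaled to [0, 1]."""
    e_pet = entropy(img_as_ubyte(feat_pet), disk(radius)).astype(np.float32)
    e_ct = entropy(img_as_ubyte(feat_ct), disk(radius)).astype(np.float32)
    w = gaussian_filter(e_pet / (e_pet + e_ct + 1e-8), sigma)
    return w * feat_pet + (1.0 - w) * feat_ct

if __name__ == "__main__":
    pet, ct = np.random.rand(128, 128), np.random.rand(128, 128)
    print(entropy_weighted_fusion(pet, ct).shape)
```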
Affiliations:
- Tao Zhou: School of Computer Science and Engineering, North Minzu University, Yinchuan, 750021, China; Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan, 750021, China
- Xiangxiang Zhang: School of Computer Science and Engineering, North Minzu University, Yinchuan, 750021, China
- Huiling Lu: School of Medical Information & Engineering, Ningxia Medical University, Yinchuan, 750004, China
- Qi Li: School of Computer Science and Engineering, North Minzu University, Yinchuan, 750021, China
- Long Liu: School of Computer Science and Engineering, North Minzu University, Yinchuan, 750021, China
- Huiyu Zhou: School of Computing and Mathematical Sciences, University of Leicester, LE1 7RH, United Kingdom

10. Xu H, Yuan J, Ma J. MURF: Mutually Reinforcing Multi-Modal Image Registration and Fusion. IEEE Trans Pattern Anal Mach Intell 2023; 45:12148-12166. [PMID: 37285256] [DOI: 10.1109/tpami.2023.3283682]
Abstract
Existing image fusion methods are typically limited to aligned source images and have to "tolerate" parallaxes when images are unaligned. Simultaneously, the large variances between different modalities pose a significant challenge for multi-modal image registration. This study proposes a novel method called MURF, where for the first time, image registration and fusion are mutually reinforced rather than being treated as separate issues. MURF leverages three modules: shared information extraction module (SIEM), multi-scale coarse registration module (MCRM), and fine registration and fusion module (F2M). The registration is carried out in a coarse-to-fine manner. During coarse registration, SIEM first transforms multi-modal images into mono-modal shared information to eliminate the modal variances. Then, MCRM progressively corrects the global rigid parallaxes. Subsequently, fine registration to repair local non-rigid offsets and image fusion are uniformly implemented in F2M. The fused image provides feedback to improve registration accuracy, and the improved registration result further improves the fusion result. For image fusion, rather than solely preserving the original source information in existing methods, we attempt to incorporate texture enhancement into image fusion. We test on four types of multi-modal data (RGB-IR, RGB-NIR, PET-MRI, and CT-MRI). Extensive registration and fusion results validate the superiority and universality of MURF.
11. Luo J, Ren W, Gao X, Cao X. Multi-Exposure Image Fusion via Deformable Self-Attention. IEEE Trans Image Process 2023; 32:1529-1540. [PMID: 37022900] [DOI: 10.1109/tip.2023.3242824]
Abstract
Most multi-exposure image fusion (MEF) methods perform unidirectional alignment within limited, local regions, which ignores the effects of augmented locations and preserves deficient global features. In this work, we propose a multi-scale bidirectional alignment network via deformable self-attention to perform adaptive image fusion. The proposed network exploits differently exposed images and aligns them to the normal exposure in varying degrees. Specifically, we design a novel deformable self-attention module that considers variant long-distance attention and interaction and implements bidirectional alignment for image fusion. To realize adaptive feature alignment, we employ a learnable weighted summation of different inputs and predict the offsets in the deformable self-attention module, which helps the model generalize well in various scenes. In addition, the multi-scale feature extraction strategy makes the features across different scales complementary and provides fine details and contextual features. Extensive experiments demonstrate that our proposed algorithm performs favorably against state-of-the-art MEF methods.
12. Fine-grained multi-focus image fusion based on edge features. Sci Rep 2023; 13:2478. [PMID: 36774391] [PMCID: PMC9922251] [DOI: 10.1038/s41598-023-29584-y]
Abstract
Multi-focus image fusion is the process of fusing multiple images with different focus areas into a single all-in-focus image, and it has important application value. To address the shortcomings of current fusion methods in retaining detail information from the source images, a two-stage fusion architecture is designed. In the training phase, an encoder-decoder network combining a polarized self-attention module with the DenseNet structure is designed for the image reconstruction task to strengthen the model's ability to retain the original information. In the fusion stage, a fusion strategy based on edge feature maps, applied to the encoded feature maps, is designed for the image fusion task to strengthen attention to detail information during fusion. Compared with nine classical fusion algorithms, the proposed algorithm achieves advanced fusion performance in both subjective and objective evaluations, and the fused images retain the information of the source images better.
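A minimal sketch of an edge-feature-driven fusion rule in this spirit follows; it operates directly on the source images rather than on encoded feature maps, and the Sobel operator and window size are assumptions.

```python
# Minimal sketch: per-pixel selection between two multi-focus sources using
# Sobel edge-feature maps, followed by window averaging of the edge strength.
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def edge_feature_fusion(img_a, img_b, win=9):
    edge_a = np.hypot(sobel(img_a, axis=0), sobel(img_a, axis=1))
    edge_b = np.hypot(sobel(img_b, axis=0), sobel(img_b, axis=1))
    # Aggregate edge strength over a window to form a cleaner decision map.
    decision = uniform_filter(edge_a, win) > uniform_filter(edge_b, win)
    return np.where(decision, img_a, img_b)

if __name__ == "__main__":
    a, b = np.random.rand(100, 100), np.random.rand(100, 100)
    print(edge_feature_fusion(a, b).shape)
```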
13. Aymaz S, Köse C, Aymaz Ş. A novel approach with the dynamic decision mechanism (DDM) in multi-focus image fusion. Multimed Tools Appl 2023; 82:1821-1871. [DOI: 10.1007/s11042-022-13323-y]
14. Chen Y, Wan M, Xu Y, Cao X, Zhang X, Chen Q, Gu G. Unsupervised end-to-end infrared and visible image fusion network using learnable fusion strategy. J Opt Soc Am A Opt Image Sci Vis 2022; 39:2257-2270. [PMID: 36520746] [DOI: 10.1364/josaa.473908]
Abstract
Infrared and visible image fusion aims to reconstruct fused images with comprehensive visual information by merging the complementary features of source images captured by different imaging sensors. This technology has been widely used in civil and military fields, such as urban security monitoring, remote sensing measurement, and battlefield reconnaissance. However, existing methods still suffer from preset fusion strategies that cannot be adjusted to different fusion demands and from the loss of information during feature propagation, leading to poor generalization ability and limited fusion performance. Therefore, we propose an unsupervised end-to-end network with a learnable fusion strategy for infrared and visible image fusion. The presented network consists of three parts: a feature extraction module, a fusion strategy module, and an image reconstruction module. First, to preserve more information during feature propagation, dense connections and residual connections are applied to the feature extraction module and the image reconstruction module, respectively. Second, a new convolutional neural network is designed to adaptively learn the fusion strategy, which enhances the generalization ability of our algorithm. Third, because fusion tasks lack ground truth, a loss function consisting of a saliency loss and a detail loss is exploited to guide the training direction and balance the retention of different types of information. Finally, the experimental results verify that the proposed algorithm delivers competitive performance compared with several state-of-the-art algorithms in terms of both subjective and objective evaluations. Our code is available at https://github.com/MinjieWan/Unsupervised-end-to-end-infrared-and-visible-image-fusion-network-using-learnable-fusion-strategy.
15. Wang L, Yoon KJ. Deep Learning for HDR Imaging: State-of-the-Art and Future Trends. IEEE Trans Pattern Anal Mach Intell 2022; 44:8874-8895. [PMID: 34714739] [DOI: 10.1109/tpami.2021.3123686]
Abstract
High dynamic range (HDR) imaging is a technique that allows an extensive dynamic range of exposures, which is important in image processing, computer graphics, and computer vision. In recent years, there has been a significant advancement in HDR imaging using deep learning (DL). This study conducts a comprehensive and insightful survey and analysis of recent developments in deep HDR imaging methodologies. We hierarchically and structurally group existing deep HDR imaging methods into five categories based on (1) number/domain of input exposures, (2) number of learning tasks, (3) novel sensor data, (4) novel learning strategies, and (5) applications. Importantly, we provide a constructive discussion on each category regarding its potential and challenges. Moreover, we review some crucial aspects of deep HDR imaging, such as datasets and evaluation metrics. Finally, we highlight some open problems and point out future research directions.
16. Li B, Hwang JN, Liu Z, Li C, Wang Z. PET and MRI image fusion based on a dense convolutional network with dual attention. Comput Biol Med 2022; 151:106339. [PMID: 36459810] [DOI: 10.1016/j.compbiomed.2022.106339]
Abstract
The fusion of different modalities of medical images, e.g., positron emission tomography (PET) and magnetic resonance imaging (MRI), is increasingly significant in many clinical applications because it integrates the complementary information of different medical images. In this paper, we propose a novel fusion model based on a dense convolutional network with dual attention (CSpA-DN) for PET and MRI images. In our framework, an encoder composed of a densely connected neural network is constructed to extract features from the source images, and a decoder network is employed to generate the fused image from these features. Simultaneously, a dual-attention module is introduced in the encoder and decoder to adaptively integrate local features along with their global dependencies. In the dual-attention module, a spatial attention block extracts the feature of each point from the encoder network via a weighted sum of the feature information at all positions. Meanwhile, the interdependent correlation of all image features is aggregated via a channel attention module. In addition, we design a specific loss function including image loss, structural loss, gradient loss, and perception loss to preserve more structural and detail information and sharpen the edges of targets. Our approach enables the fused images not only to preserve abundant functional information from the PET images but also to retain the rich structural detail of the MRI images. Experimental results on publicly available datasets illustrate the superiority of the CSpA-DN model compared with state-of-the-art methods according to both qualitative observation and objective assessment.
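The sketch below shows how an intensity term and a gradient term of such a composite loss could be combined in PyTorch; the max-based fusion targets and the weights are assumptions, and the structural and perception terms are omitted.

```python
# Minimal PyTorch sketch (not the CSpA-DN loss): an intensity term plus a
# Sobel-gradient term of the kind composite fusion losses combine.
import torch
import torch.nn.functional as F

_kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
_ky = _kx.transpose(2, 3)

def sobel_grad(x):
    gx = F.conv2d(x, _kx.to(x.device), padding=1)
    gy = F.conv2d(x, _ky.to(x.device), padding=1)
    return torch.abs(gx) + torch.abs(gy)

def fusion_loss(fused, pet, mri, w_img=1.0, w_grad=5.0):
    """All tensors: (B, 1, H, W). Keep the stronger intensity and gradients."""
    image_term = F.l1_loss(fused, torch.maximum(pet, mri))
    grad_term = F.l1_loss(sobel_grad(fused),
                          torch.maximum(sobel_grad(pet), sobel_grad(mri)))
    return w_img * image_term + w_grad * grad_term

if __name__ == "__main__":
    f = torch.rand(2, 1, 64, 64, requires_grad=True)
    p, m = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
    fusion_loss(f, p, m).backward()
```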
Affiliations:
- Bicao Li: School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, 450007, China; School of Information Engineering, Zhengzhou University, Zhengzhou, 450001, China; Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou, 450000, China
- Jenq-Neng Hwang: Department of Electrical Engineering, University of Washington, Seattle, WA, 98195, USA
- Zhoufeng Liu: School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, 450007, China
- Chunlei Li: School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, 450007, China
- Zongmin Wang: School of Information Engineering, Zhengzhou University, Zhengzhou, 450001, China; Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou, 450000, China

17. Liu J, Duan J, Hao Y, Chen G, Zhang H. Semantic-guided polarization image fusion method based on a dual-discriminator GAN. Opt Express 2022; 30:43601-43621. [PMID: 36523055] [DOI: 10.1364/oe.472214]
Abstract
Polarization image fusion is the process of fusing an intensity image and a polarization parameter image, solved from the Stokes vector, into a more detailed image. Conventional polarization image fusion strategies lack specificity and robustness when fusing different targets in the images because they do not account for differences in how the polarization properties of different materials are characterized, and the fusion rule is manually designed. Therefore, we propose a novel end-to-end network model, a semantic-guided dual-discriminator generative adversarial network (SGPF-GAN), to solve the polarization image fusion problem. We specifically create a polarization image information quality discriminator (PIQD) block and employ it in a weighted way to guide the fusion process. The network establishes an adversarial game between a generator and two discriminators: the goal of the generator is to generate a fused image by weighted fusion of each semantic object of the image, while the dual discriminators' objective is to identify the specific modality (polarization/intensity) of the various semantic targets. Qualitative and quantitative evaluations demonstrate the superiority of SGPF-GAN in terms of both visual effects and quantitative measures. Additionally, applying this fusion approach to transparent and camouflaged hidden-target detection and to image segmentation can significantly boost performance.
18. Kong W, Li C, Lei Y. Multimodal medical image fusion using convolutional neural network and extreme learning machine. Front Neurorobot 2022; 16:1050981. [PMID: 36467563] [PMCID: PMC9708736] [DOI: 10.3389/fnbot.2022.1050981]
Abstract
The emergence of multimodal medical imaging technology greatly increases the accuracy of clinical diagnosis and etiological analysis. Nevertheless, each medical imaging modality unavoidably has its own limitations, so the fusion of multimodal medical images may be an effective solution. In this paper, a novel fusion method for multimodal medical images exploiting a convolutional neural network (CNN) and an extreme learning machine (ELM) is proposed. As a typical representative of deep learning, the CNN has been gaining more and more popularity in the field of image processing. However, CNNs often suffer from several drawbacks, such as high computational cost and intensive human intervention. To this end, a convolutional extreme learning machine (CELM) model is constructed by incorporating the ELM into the traditional CNN model. The CELM serves as an important tool to extract and capture the features of the source images from a variety of different angles. The final fused image is obtained by integrating the significant features together. Experimental results indicate that the proposed method not only helps enhance the accuracy of lesion detection and localization, but is also superior to current state-of-the-art methods in terms of both subjective visual performance and objective criteria.
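For reference, a plain extreme learning machine, the component combined with the CNN above, can be sketched as follows; the hidden width, tanh activation, and regression-style targets are assumptions.

```python
# Minimal NumPy sketch of a plain extreme learning machine (ELM): random
# hidden weights are fixed and only the output weights are solved in closed
# form with a pseudo-inverse.
import numpy as np

class ELM:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.beta = None

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y):
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y      # closed-form least squares
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

if __name__ == "__main__":
    X, Y = np.random.rand(200, 32), np.random.rand(200, 3)
    print(ELM(32, 64).fit(X, Y).predict(X).shape)
```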
Affiliations:
- Weiwei Kong: School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, China; Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China; Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Chi Li: School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, China; Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, China; Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, China
- Yang Lei: College of Cryptography Engineering, Engineering University of PAP, Xi'an, China

19. Zhang X. Deep Learning-Based Multi-Focus Image Fusion: A Survey and a Comparative Study. IEEE Trans Pattern Anal Mach Intell 2022; 44:4819-4838. [PMID: 33974542] [DOI: 10.1109/tpami.2021.3078906]
Abstract
Multi-focus image fusion (MFIF) is an important area in image processing. Since 2017, deep learning has been introduced to the field of MFIF and various methods have been proposed. However, there is a lack of survey papers that discuss deep learning-based MFIF methods in detail. In this study, we fill this gap by giving a detailed survey on deep learning-based MFIF algorithms, including methods, datasets and evaluation metrics. To the best of our knowledge, this is the first survey paper that focuses on deep learning-based approaches in the field of MFIF. Besides, extensive experiments have been conducted to compare the performance of deep learning-based MFIF algorithms with conventional MFIF approaches. By analyzing qualitative and quantitative results, we give some observations on the current status of MFIF and discuss some future prospects of this field.
21. Multi-modal medical image fusion based on densely-connected high-resolution CNN and hybrid transformer. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07635-1]
22. Wang Z, Li X, Duan H, Zhang X. A Self-Supervised Residual Feature Learning Model for Multifocus Image Fusion. IEEE Trans Image Process 2022; 31:4527-4542. [PMID: 35737635] [DOI: 10.1109/tip.2022.3184250]
Abstract
Multi-focus image fusion (MFIF) attempts to obtain an "all-focused" image from multiple source images of the same scene with different focused objects. Given the lack of multi-focus image sets for network training, we propose a self-supervised residual feature learning model in this paper. The model consists of a feature extraction network and a fusion module. We select image super-resolution as a pretext task for MFIF, which is supported by a new residual gradient prior discovered by our theoretical study of low- and high-resolution (LR-HR) image pairs as well as of multi-focus images. In the pretext task, the network's training set consists of LR-HR image pairs generated from natural images, and the HR images can be regarded as pseudo-labels of the LR images. In the fusion task, the trained network first extracts residual features from the multi-focus images. Then, the fusion module, consisting of an activity-level measurement and a new boundary refinement method, is applied to these features to generate decision maps. Experimental results, in both subjective and objective evaluations, demonstrate that our approach outperforms other state-of-the-art fusion algorithms.
23. Multimodal medical image fusion based on multichannel coupled neural P systems and max-cloud models in spectral total variation domain. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.059]
24. Liang X, Jung C. Deep Cross Spectral Stereo Matching Using Multi-Spectral Image Fusion. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3155202]
25. Wang Y, Jin X, Yang J, Jiang Q, Tang Y, Wang P, Lee SJ. Color multi-focus image fusion based on transfer learning. J Intell Fuzzy Syst 2022. [DOI: 10.3233/jifs-211434]
Abstract
Multi-focus image fusion is a technique that integrates the focused areas of a pair or set of source images of the same scene into a fully focused image. Inspired by transfer learning, this paper proposes a novel color multi-focus image fusion method based on deep learning. First, the color multi-focus source images are fed into a VGG-19 network, and the parameters of its convolutional layers are transferred to a neural network containing multiple convolutional layers and multilayer skip-connection structures for feature extraction. Second, initial decision maps are generated using the reconstructed feature maps of a deconvolution module. Third, the initial decision maps are refined and processed to obtain the second decision maps, and the source images are then fused based on the second decision maps to obtain the initial fused images. Finally, the final fused image is produced by comparing the QABF metrics of the initial fused images. The experimental results show that the proposed method can effectively improve the segmentation of the focused and unfocused areas in the source images, and the generated fused images are superior in both subjective and objective metrics compared with most comparison methods.
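The decision-map fusion step described above reduces to a per-pixel weighted blend; a minimal sketch is given below, with the QABF-based selection of the final image left out and the array shapes assumed.

```python
# Minimal NumPy sketch: fuse two color multi-focus sources with a decision map.
import numpy as np

def fuse_with_decision_map(src_a, src_b, decision):
    """src_a, src_b: (H, W, 3) float images; decision: (H, W) map in [0, 1],
    where 1 means 'take src_a'."""
    d = decision[..., None]                 # broadcast over the color channels
    return d * src_a + (1.0 - d) * src_b

if __name__ == "__main__":
    a, b = np.random.rand(64, 64, 3), np.random.rand(64, 64, 3)
    dm = (np.random.rand(64, 64) > 0.5).astype(np.float64)
    print(fuse_with_decision_map(a, b, dm).shape)
```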
Affiliations:
- Yun Wang: School of Software, Yunnan University, Kunming, Yunnan, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- Xin Jin: School of Software, Yunnan University, Kunming, Yunnan, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- Jie Yang: School of Software, Yunnan University, Kunming, Yunnan, China; School of Physics and Electronic Science, Normal University, Zunyi, China
- Qian Jiang: School of Software, Yunnan University, Kunming, Yunnan, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- Yue Tang: School of Mathematics and Statistics, Yunnan University, Kunming, Yunnan, China
- Puming Wang: School of Software, Yunnan University, Kunming, Yunnan, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- Shin-Jye Lee: Institute of Technology Management, National Chiao Tung University, Hsinchu, Taiwan

26. Ma B, Yin X, Wu D, Shen H, Ban X, Wang Y. End-to-end learning for simultaneously generating decision map and multi-focus image fusion result. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.115]
27. Jung H, Kim Y, Jang H, Ha N, Sohn K. Multi-Task Learning Framework for Motion Estimation and Dynamic Scene Deblurring. IEEE Trans Image Process 2021; 30:8170-8183. [PMID: 34550887] [DOI: 10.1109/tip.2021.3113185]
Abstract
Motion blur, which disturbs human and machine perceptions of a scene, has been considered an unnecessary artifact that should be removed. However, the blur can be a useful clue to understanding the dynamic scene, since various sources of motion generate different types of artifacts. Motivated by the relationship between motion and blur, we propose a motion-aware feature learning framework for dynamic scene deblurring through multi-task learning. Our multi-task framework simultaneously estimates a deblurred image and a motion field from a blurred image. We design the encoder-decoder architectures for two tasks, and the encoder part is shared between them. Our motion estimation network could effectively distinguish between different types of blur, which facilitates image deblurring. Understanding implicit motion information through image deblurring could improve the performance of motion estimation. In addition to sharing the network between two tasks, we propose a reblurring loss function to optimize the overall parameters in our multi-task architecture. We provide an intensive analysis of complementary tasks to show the effectiveness of our multi-task framework. Furthermore, the experimental results demonstrate that the proposed method outperforms the state-of-the-art deblurring methods with respect to both qualitative and quantitative evaluations.
28. Multi-Spectral Fusion and Denoising of Color and Near-Infrared Images Using Multi-Scale Wavelet Analysis. Sensors (Basel) 2021; 21:3610. [PMID: 34067310] [PMCID: PMC8196879] [DOI: 10.3390/s21113610]
Abstract
We formulate multi-spectral fusion and denoising for the luminance channel as a maximum a posteriori estimation problem in the wavelet domain. To deal with the discrepancy between RGB and near infrared (NIR) data in fusion, we build a discrepancy model and introduce the wavelet scale map. The scale map adjusts the wavelet coefficients of NIR data to have the same distribution as the RGB data. We use the priors of the wavelet scale map and its gradient as the contrast preservation term and gradient denoising term, respectively. Specifically, we utilize the local contrast and visibility measurements in the contrast preservation term to transfer the selected NIR data to the fusion result. We also use the gradient of NIR wavelet coefficients as the weight for the gradient denoising term in the wavelet scale map. Based on the wavelet scale map, we perform fusion of the RGB and NIR wavelet coefficients in the base and detail layers. To remove noise, we model the prior of the fused wavelet coefficients using NIR-guided Laplacian distributions. In the chrominance channels, we remove noise guided by the fused luminance channel. Based on the luminance variation after fusion, we further enhance the color of the fused image. Our experimental results demonstrated that the proposed method successfully performed the fusion of RGB and NIR images with noise reduction, detail preservation, and color enhancement.
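As a simplified, assumption-laden sketch of wavelet-domain base/detail fusion of the luminance and NIR channels (the paper's MAP formulation with a wavelet scale map is considerably richer), one could write:

```python
# Simplified sketch: keep the RGB-luminance approximation (base) band and take
# the larger-magnitude detail coefficients from luminance vs. NIR, then invert.
import numpy as np
import pywt

def fuse_luminance_nir(lum, nir, wavelet="db2", level=3):
    c_lum = pywt.wavedec2(lum, wavelet, level=level)
    c_nir = pywt.wavedec2(nir, wavelet, level=level)
    fused = [c_lum[0]]                                    # keep the base layer
    for dl, dn in zip(c_lum[1:], c_nir[1:]):              # detail triples (H, V, D)
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(dl, dn)))
    return pywt.waverec2(fused, wavelet)

if __name__ == "__main__":
    lum, nir = np.random.rand(256, 256), np.random.rand(256, 256)
    print(fuse_luminance_nir(lum, nir).shape)
```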
29. Li H, Cen Y, Liu Y, Chen X, Yu Z. Different Input Resolutions and Arbitrary Output Resolution: A Meta Learning-Based Deep Framework for Infrared and Visible Image Fusion. IEEE Trans Image Process 2021; 30:4070-4083. [PMID: 33798086] [DOI: 10.1109/tip.2021.3069339]
Abstract
Infrared and visible image fusion has gained ever-increasing attention in recent years due to its great significance in a variety of vision-based applications. However, existing fusion methods suffer from some limitations in terms of the spatial resolutions of both the input source images and the output fused image, which prevents their practical usage to a great extent. In this paper, we propose a meta learning-based deep framework for the fusion of infrared and visible images. Unlike most existing methods, the proposed framework can accept source images of different resolutions and generate a fused image of arbitrary resolution with just a single learned model. In the proposed framework, the features of each source image are first extracted by a convolutional network and upscaled by a meta-upscale module with an arbitrary appropriate factor according to practical requirements. Then, a dual attention mechanism-based feature fusion module is developed to combine features from different source images. Finally, a residual compensation module, which can be iteratively adopted in the proposed framework, is designed to enhance the capability of our method in detail extraction. In addition, the loss function is formulated in a multi-task learning manner via simultaneous fusion and super-resolution, aiming to improve the effect of feature learning. Moreover, a new contrast loss inspired by a perceptual contrast enhancement approach is proposed to further improve the contrast of the fused image. Extensive experiments on widely used fusion datasets demonstrate the effectiveness and superiority of the proposed method. The code of the proposed method is publicly available at https://github.com/yuliu316316/MetaLearning-Fusion.
30. Deng X, Zhang Y, Xu M, Gu S, Duan Y. Deep Coupled Feedback Network for Joint Exposure Fusion and Image Super-Resolution. IEEE Trans Image Process 2021; 30:3098-3112. [PMID: 33600315] [DOI: 10.1109/tip.2021.3058764]
Abstract
Nowadays, people are used to taking photos to record their daily life; however, the photos are often not consistent with the real natural scenes. The two main differences are that photos tend to have low dynamic range (LDR) and low resolution (LR), due to the inherent imaging limitations of cameras. Multi-exposure image fusion (MEF) and image super-resolution (SR) are two widely used techniques to address these two issues, but they are usually treated as independent research problems. In this paper, we propose a deep Coupled Feedback Network (CF-Net) to achieve MEF and SR simultaneously. Given a pair of extremely over-exposed and under-exposed LDR images with low resolution, our CF-Net is able to generate an image with both high dynamic range (HDR) and high resolution. Specifically, CF-Net is composed of two coupled recursive sub-networks, with the LR over-exposed and under-exposed images as inputs, respectively. Each sub-network consists of one feature extraction block (FEB), one super-resolution block (SRB), and several coupled feedback blocks (CFBs). The FEB and SRB extract high-level features from the input LDR image, which are required to be helpful for resolution enhancement. The CFB is arranged after the SRB, and its role is to absorb the learned features from the SRBs of the two sub-networks so that it can produce a high-resolution HDR image. A series of CFBs progressively refines the fused high-resolution HDR image. Extensive experimental results show that our CF-Net drastically outperforms other state-of-the-art methods in terms of both SR accuracy and fusion performance. The software code is available at https://github.com/ytZhang99/CF-Net.
31. Li H, Yang M, Yu Z. Joint image fusion and super-resolution for enhanced visualization via semi-coupled discriminative dictionary learning and advantage embedding. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.024]
32. Multi-focus image fusion algorithm based on supervised learning for fully convolutional neural network. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2020.11.014]
34. Pyramid Inter-Attention for High Dynamic Range Imaging. Sensors (Basel) 2020; 20:5102. [PMID: 32906841] [PMCID: PMC7570613] [DOI: 10.3390/s20185102]
Abstract
This paper proposes a novel approach to high-dynamic-range (HDR) imaging of dynamic scenes that eliminates ghosting artifacts in HDR images in the presence of severe misalignment (large object or camera motion) in the input low-dynamic-range (LDR) images. Recent non-flow-based methods suffer from ghosting artifacts in the presence of large object motion. Flow-based methods face the same issue, since their optical flow algorithms yield large alignment errors. To eliminate ghosting artifacts, we propose a simple yet effective alignment network for resolving the misalignment. The proposed pyramid inter-attention module (PIAM) performs alignment of LDR features by leveraging inter-attention maps. Additionally, to boost the representation of aligned features in the merging process, we propose a dual excitation block (DEB) that recalibrates each feature both spatially and channel-wise. Exhaustive experimental results demonstrate the effectiveness of the proposed PIAM and DEB, achieving state-of-the-art performance in terms of producing ghost-free HDR images.
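A generic recalibration block in the spirit of the dual excitation block can be sketched as follows; the squeeze-and-excitation channel gate and the 1x1-convolution spatial gate are stand-ins, not the paper's exact DEB.

```python
# Minimal PyTorch sketch: recalibrate a feature map channel-wise (SE-style
# gate) and spatially (1x1-conv gate), then sum the two recalibrated branches.
import torch
import torch.nn as nn

class DualExcitation(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.channel_gate(x) + x * self.spatial_gate(x)

if __name__ == "__main__":
    print(DualExcitation(32)(torch.rand(2, 32, 64, 64)).shape)
```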