1. Liu J, Wu G, Liu Z, Wang D, Jiang Z, Ma L, Zhong W, Fan X, Liu R. Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption. IEEE Trans Pattern Anal Mach Intell 2025; 47:2349-2369. PMID: 40030603. DOI: 10.1109/tpami.2024.3521416.
Abstract
Infrared-visible image fusion (IVIF) is a fundamental and critical task in the field of computer vision. Its aim is to integrate the unique characteristics of both infrared and visible spectra into a holistic representation. Since 2018, a growing number and diversity of IVIF approaches have entered the deep-learning era, introducing a broad spectrum of networks and loss functions to improve visual enhancement. As research deepens and practical demands grow, several intricate issues such as data compatibility, perception accuracy, and efficiency cannot be ignored. Regrettably, there is a lack of recent surveys that comprehensively introduce and organize this expanding domain of knowledge. Given the current rapid development, this paper aims to fill the existing gap by providing a comprehensive survey that covers a wide array of aspects. Initially, we introduce a multi-dimensional framework to elucidate the prevalent learning-based IVIF methodologies, spanning topics from basic visual enhancement strategies to data compatibility, task adaptability, and further extensions. Subsequently, we delve into a profound analysis of these new approaches, offering a detailed lookup table to clarify their core ideas. Last but not least, we also summarize performance comparisons quantitatively and qualitatively, covering registration, fusion, and follow-up high-level tasks. Beyond delving into the technical nuances of these learning-based fusion approaches, we also explore potential future directions and open issues that warrant further exploration by the community.
2. Lv J, Zeng X, Chen B, Hu M, Yang S, Qiu X, Wang Z. A stochastic structural similarity guided approach for multi-modal medical image fusion. Sci Rep 2025; 15:8792. PMID: 40082698. PMCID: PMC11906891. DOI: 10.1038/s41598-025-93662-6.
Abstract
Multi-modal medical image fusion (MMIF) aims to integrate complementary information from different modalities to obtain a fused image that contains more comprehensive details, providing clinicians with a more thorough reference for diagnosis. However, most existing deep learning-based fusion methods predominantly focus on the local statistical features within images, which limits the ability of the model to capture long-range dependencies and correlations within source images, thus compromising fusion performance. To address this issue, we propose an unsupervised image fusion method guided by stochastic structural similarity (S3IMFusion). This method incorporates a multi-scale fusion network based on CNN and Transformer modules to extract complementary information from the images effectively. During training, a loss function capable of exchanging global contextual information was designed. Specifically, a random sorting index is generated based on the source images, and pixel features are mixed and rearranged between the fused and source images according to this index. The structural similarity loss is then computed by averaging the losses between pixel blocks of the rearranged images. This ensures that the fusion result preserves the globally correlated complementary features from the source images. Experimental results on the Harvard dataset demonstrate that S3IMFusion outperforms existing methods, achieving more accurate fusion of medical images. Additionally, we extend the method to infrared and visible image fusion tasks, with results indicating that S3IMFusion exhibits excellent generalization performance.
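To make the loss described above concrete, here is a minimal sketch of a stochastic structural-similarity term in Python/NumPy. It only illustrates the mechanism (a shared random permutation, then block-wise SSIM averaging); the block size, number of blocks, constants, and function names are illustrative assumptions, not the authors' S3IMFusion implementation.

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-statistic SSIM over a whole block (no sliding window), for images in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def stochastic_ssim_loss(fused, source, block_size=4096, n_blocks=16, rng=None):
    """Shuffle pixels of both images with one shared random permutation,
    cut the shuffled vectors into blocks, and average (1 - SSIM) over the blocks.
    Hypothetical re-implementation of the loss sketched in the abstract."""
    if rng is None:
        rng = np.random.default_rng(0)
    f = fused.reshape(-1)
    s = source.reshape(-1)
    perm = rng.permutation(f.size)          # shared random sorting index
    f, s = f[perm], s[perm]
    n = max(1, min(n_blocks, f.size // block_size))
    losses = [1.0 - global_ssim(f[i * block_size:(i + 1) * block_size],
                                s[i * block_size:(i + 1) * block_size])
              for i in range(n)]
    return float(np.mean(losses))

# toy usage
if __name__ == "__main__":
    a = np.random.rand(256, 256).astype(np.float32)
    b = 0.5 * a + 0.5 * np.random.rand(256, 256).astype(np.float32)
    print(stochastic_ssim_loss(b, a))
```

In practice such a term would be evaluated between the fused image and each source modality and combined with the network's other training losses, as the abstract describes.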
Affiliation(s)
- Junhui Lv: Department of Neurosurgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, 310016, China
- Xiangzhi Zeng: Department of Automation, Zhejiang University of Technology, Hangzhou, 310023, China
- Bo Chen: Department of Automation, Zhejiang University of Technology, Hangzhou, 310023, China
- Mingnan Hu: Department of Automation, Zhejiang University of Technology, Hangzhou, 310023, China
- Shuxu Yang: Department of Neurosurgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, 310016, China
- Xiang Qiu: Department of Automation, Zhejiang University of Technology, Hangzhou, 310023, China
- Zheming Wang: Department of Automation, Zhejiang University of Technology, Hangzhou, 310023, China
3. Jiang C, Qian C, Jiang Q, Zhou H, Jiang Z, Teng Y, Xu B, Li X, Ding C, Tian R. Virtual biopsy for non-invasive identification of follicular lymphoma histologic transformation using radiomics-based imaging biomarker from PET/CT. BMC Med 2025; 23:49. PMID: 39875864. PMCID: PMC11776338. DOI: 10.1186/s12916-025-03893-7.
Abstract
BACKGROUND: This study aimed to construct a radiomics-based imaging biomarker for the non-invasive identification of transformed follicular lymphoma (t-FL) using PET/CT images. METHODS: A total of 784 follicular lymphoma (FL), diffuse large B-cell lymphoma, and t-FL patients from 5 independent medical centers were included. The unsupervised EMFusion method was applied to fuse PET and CT images. Deep radiomic features were extracted from the fused images using a deep learning model (ResNet18). These features, along with handcrafted radiomics, were utilized to construct a radiomic signature (R-signature) using automatic machine learning in the training and internal validation cohorts. The R-signature was then tested for its predictive ability in the t-FL test cohort. Subsequently, this R-signature was combined with clinical parameters and SUVmax to develop a t-FL scoring system. RESULTS: The R-signature demonstrated high accuracy, with mean AUC values of 0.994 in the training cohort and 0.976 in the internal validation cohort. In the t-FL test cohort, the R-signature achieved an AUC of 0.749, with an accuracy of 75.2%, sensitivity of 68.0%, and specificity of 77.5%. Furthermore, the t-FL scoring system, incorporating the R-signature along with clinical parameters (age, LDH, and ECOG PS) and SUVmax, achieved an AUC of 0.820, facilitating the stratification of patients into low, medium, and high transformation risk groups. CONCLUSIONS: This study offers a promising approach for identifying t-FL non-invasively by radiomics analysis on PET/CT images. The developed t-FL scoring system provides a valuable tool for clinical decision-making, potentially improving patient management and outcomes.
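As a rough illustration of the deep-feature step mentioned in METHODS (ResNet18 features from fused images), the sketch below uses torchvision's ResNet18 as a fixed 512-dimensional feature extractor. The EMFusion step, ImageNet normalization, and the AutoML classifier are omitted, and the helper function is hypothetical rather than the authors' pipeline.

```python
import torch
import torchvision.models as models

# Pretrained ResNet18 (torchvision >= 0.13) with the classification head removed,
# used as a fixed feature extractor for already-fused PET/CT slices.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()            # keep the 512-d pooled features
backbone.eval()

@torch.no_grad()
def deep_features(fused_slice):
    """fused_slice: (H, W) tensor in [0, 1]; replicated to 3 channels.
    ImageNet mean/std normalization is skipped here for brevity."""
    x = fused_slice.unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)   # (1, 3, H, W)
    return backbone(x).squeeze(0)            # (512,) deep radiomic descriptor

feats = deep_features(torch.rand(224, 224))
print(feats.shape)   # torch.Size([512])
```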
Affiliation(s)
- Chong Jiang: Department of Nuclear Medicine, West China Hospital, Sichuan University, No. 37 Guoxue Alley, Chengdu, Sichuan, 610041, China
- Chunjun Qian: School of Electrical and Information Engineering, Changzhou Institute of Technology, Changzhou, Jiangsu, China; The Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University, Changzhou, China; Center of Medical Physics, Nanjing Medical University, Changzhou, China
- Qiuhui Jiang: Department of Hematology, School of Medicine, The First Affiliated Hospital of Xiamen University and Institute of Hematology, Xiamen University, Xiamen, China
- Hang Zhou: Department of Nuclear Medicine, Qilu Hospital of Shandong University, No. 107 Wenhua Xilu, Lixia District, Jinan, Shandong, 250012, China
- Zekun Jiang: Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- Yue Teng: Department of Nuclear Medicine, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Bing Xu: Department of Hematology, School of Medicine, The First Affiliated Hospital of Xiamen University and Institute of Hematology, Xiamen University, Xiamen, China
- Xin Li: Department of Nuclear Medicine, Qilu Hospital of Shandong University, No. 107 Wenhua Xilu, Lixia District, Jinan, Shandong, 250012, China
- Chongyang Ding: Department of Nuclear Medicine, The First Affiliated Hospital of Nanjing Medical University, Jiangsu Province Hospital, No. 321 Zhongshan Road, Nanjing, Jiangsu, 210008, China
- Rong Tian: Department of Nuclear Medicine, West China Hospital, Sichuan University, No. 37 Guoxue Alley, Chengdu, Sichuan, 610041, China
4. Li J, Liu J, Zhou S, Zhang Q, Kasabov NK. GeSeNet: A General Semantic-Guided Network With Couple Mask Ensemble for Medical Image Fusion. IEEE Trans Neural Netw Learn Syst 2024; 35:16248-16261. PMID: 37478044. DOI: 10.1109/tnnls.2023.3293274.
Abstract
At present, multimodal medical image fusion technology has become an essential means for researchers and doctors to predict diseases and study pathology. Nevertheless, how to preserve more unique features from different modal source images while ensuring time efficiency is a tricky problem. To handle this issue, we propose a flexible semantic-guided architecture with a mask-optimized framework in an end-to-end manner, termed GeSeNet. Specifically, a region mask module is devised to deepen the learning of important information while pruning redundant computation to reduce the runtime. An edge enhancement module and a global refinement module are presented to modify the extracted features, boosting the edge textures and adjusting overall visual performance. In addition, we introduce a semantic module that is cascaded with the proposed fusion network to deliver semantic information into our generated results. Sufficient qualitative and quantitative comparative experiments (i.e., MRI-CT, MRI-PET, and MRI-SPECT) are conducted between our proposed method and ten state-of-the-art methods, which show that our generated images lead the way. Moreover, we also conduct operational efficiency comparisons and ablation experiments to prove that our proposed method can perform excellently in the field of multimodal medical image fusion. The code is available at https://github.com/lok-18/GeSeNet.
5. Du Y, Li D, Hu Z, Liu S, Xia Q, Zhu J, Xu J, Yu T, Zhu D. Dual-Channel in Spatial-Frequency Domain CycleGAN for perceptual enhancement of transcranial cortical vascular structure and function. Comput Biol Med 2024; 173:108377. PMID: 38569233. DOI: 10.1016/j.compbiomed.2024.108377.
Abstract
Observing cortical vascular structures and functions using laser speckle contrast imaging (LSCI) at high resolution plays a crucial role in understanding cerebral pathologies. Usually, open-skull window techniques have been applied to reduce scattering by the skull and enhance image quality. However, craniotomy surgeries inevitably induce inflammation, which may obstruct observations in certain scenarios. In contrast, image enhancement algorithms provide popular tools for improving the signal-to-noise ratio (SNR) of LSCI. Current methods are less than satisfactory through intact skulls because the transcranial cortical images are of poor quality. Moreover, existing algorithms do not guarantee the accuracy of dynamic blood flow mappings. In this study, we develop an unsupervised deep learning method, named Dual-Channel in Spatial-Frequency Domain CycleGAN (SF-CycleGAN), to enhance the perceptual quality of cortical blood flow imaging by LSCI. SF-CycleGAN enabled convenient, non-invasive, and effective cortical vascular structure observation and accurate dynamic blood flow mappings without craniotomy surgeries, visualizing biodynamics in an undisturbed biological environment. Our experimental results showed that SF-CycleGAN achieved an SNR at least 4.13 dB higher than that of other unsupervised methods, imaged the complete vascular morphology, and enabled the functional observation of small cortical vessels. Additionally, the proposed method showed remarkable robustness and could be generalized to various imaging configurations and image modalities, including fluorescence images, without retraining.
Affiliation(s)
- Yuwei Du: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China
- Dongyu Li: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China; School of Optical and Electronic Information - Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
- Zhengwu Hu: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China
- Shaojun Liu: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China
- Qing Xia: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China
- Jingtan Zhu: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China
- Jianyi Xu: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China
- Tingting Yu: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China
- Dan Zhu: Britton Chance Center for Biomedical Photonics - MoE Key Laboratory for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics - Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, 430074, China
6. Pang S, Xia H, Zhang X, Wang Z, Luo J, Li H. An enhanced visualization image acquisition method for samples with poor conductivity under a conventional scanning electron microscope. Rev Sci Instrum 2023; 94:123701. PMID: 38038635. DOI: 10.1063/5.0160950.
Abstract
The low-vacuum and low-accelerating-voltage modes are the simplest and most practical ways to directly analyze poorly conductive samples in conventional scanning electron microscopy (SEM). However, structural feature information may disappear or be obscured in these imaging modes, making it challenging to identify and analyze some local microstructures of poorly conductive samples. To overcome this challenge, an enhanced visualization image acquisition method for samples with poor conductivity is proposed based on image registration and multi-sensor fusion technology. Experiments demonstrate that the proposed method can effectively obtain enhanced visualization images containing clearer terrain information than the SEM source images, thereby providing new references for measuring and analyzing microstructures.
Affiliation(s)
- Shuiquan Pang: China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, Guangdong 511370, People's Republic of China
- Hao Xia: China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, Guangdong 511370, People's Republic of China
- Xianmin Zhang: Guangdong Key Laboratory of Precision Equipment and Manufacturing Technology, School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, People's Republic of China
- Zhizhe Wang: China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, Guangdong 511370, People's Republic of China
- Jun Luo: China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, Guangdong 511370, People's Republic of China
- Hai Li: Guangdong Key Laboratory of Precision Equipment and Manufacturing Technology, School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, People's Republic of China
7. Liu R, Liu X, Zeng S, Zhang J, Zhang Y. Value-Function-Based Sequential Minimization for Bi-Level Optimization. IEEE Trans Pattern Anal Mach Intell 2023; 45:15930-15948. PMID: 37552592. DOI: 10.1109/tpami.2023.3303227.
Abstract
Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to handle modern learning tasks. However, most existing strategies are theoretically designed based on restrictive assumptions (e.g., convexity of the lower-level sub-problem) and are computationally impractical for high-dimensional tasks. Moreover, there are almost no gradient-based methods able to solve BLO in challenging scenarios such as BLO with functional constraints and pessimistic BLO. In this work, by reformulating BLO into approximated single-level problems, we provide a new algorithm, named Bi-level Value-Function-based Sequential Minimization (BVFSM), to address the above issues. Specifically, BVFSM constructs a series of value-function-based approximations, and thus avoids the repeated calculations of recurrent gradients and Hessian inverses required by existing approaches, which are especially time-consuming for high-dimensional tasks. We also extend BVFSM to address BLO with additional functional constraints. More importantly, BVFSM can be used for the challenging pessimistic BLO, which has never been properly solved before. In theory, we prove the asymptotic convergence of BVFSM on these types of BLO, in which the restrictive lower-level convexity assumption is discarded. To the best of our knowledge, this is the first gradient-based algorithm that can solve different kinds of BLO (e.g., optimistic, pessimistic, and with constraints) with solid convergence guarantees. Extensive experiments verify the theoretical investigations and demonstrate our superiority on various real-world applications.
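For readers unfamiliar with the value-function idea referenced above, the standard reformulation can be written as follows (generic notation; this states the well-known construction, not the paper's specific BVFSM algorithm):

```latex
% Bi-level problem: upper-level objective F, lower-level objective f.
\min_{x \in X}\; F(x, y)
\quad \text{s.t.} \quad
y \in \mathcal{S}(x) := \operatorname*{arg\,min}_{y' \in Y} f(x, y').

% Value-function (single-level) reformulation: replace the arg-min constraint by an
% inequality on the lower-level optimal value v(x) := \min_{y' \in Y} f(x, y').
\min_{x \in X,\, y \in Y}\; F(x, y)
\quad \text{s.t.} \quad
f(x, y) \le v(x).
```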
8. Liu R, Liu Z, Mu P, Fan X, Luo Z. Optimization-Inspired Learning With Architecture Augmentations and Control Mechanisms for Low-Level Vision. IEEE Trans Image Process 2023; 32:6075-6089. PMID: 37922167. DOI: 10.1109/tip.2023.3328486.
Abstract
In recent years, there has been a growing interest in combining learnable modules with numerical optimization to solve low-level vision tasks. However, most existing approaches focus on designing specialized schemes to generate image/feature propagation. There is a lack of unified consideration to construct propagative modules, provide theoretical analysis tools, and design effective learning mechanisms. To mitigate these issues, this paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC for short) principles with strong generalization for diverse optimization models. Specifically, by introducing a general energy minimization model and formulating its descent direction from different viewpoints (i.e., in a generative manner, based on the discriminative metric, and with optimality-based correction), we construct three propagative modules to effectively solve the optimization models with flexible combinations. We design two control mechanisms that provide non-trivial theoretical guarantees for both fully- and partially-defined optimization formulations. Supported by these theoretical guarantees, we can introduce diverse architecture augmentation strategies, such as normalization and search, to ensure stable, convergent propagation and to seamlessly integrate suitable modules into the propagation. Extensive experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
9. Ye S, Wang T, Ding M, Zhang X. F-DARTS: Foveated Differentiable Architecture Search Based Multimodal Medical Image Fusion. IEEE Trans Med Imaging 2023; 42:3348-3361. PMID: 37285248. DOI: 10.1109/tmi.2023.3283517.
Abstract
Multimodal medical image fusion (MMIF) is highly significant in such fields as disease diagnosis and treatment. Traditional MMIF methods struggle to provide satisfactory fusion accuracy and robustness due to the influence of hand-crafted components such as image transforms and fusion strategies. Existing deep learning-based fusion methods generally find it difficult to ensure a good fusion effect because they adopt a human-designed network structure and a relatively simple loss function and ignore human visual characteristics during weight learning. To address these issues, we present the foveated differentiable architecture search (F-DARTS) based unsupervised MMIF method. In this method, the foveation operator is introduced into the weight learning process to fully explore human visual characteristics for effective image fusion. Meanwhile, a distinctive unsupervised loss function is designed for network training by integrating mutual information, the sum of the correlations of differences, structural similarity, and the edge preservation value. Based on the presented foveation operator and loss function, an end-to-end encoder-decoder network architecture is searched using F-DARTS to produce the fused image. Experimental results on three multimodal medical image datasets demonstrate that F-DARTS performs better than several traditional and deep learning-based fusion methods, providing visually superior fused results and better objective evaluation metrics.
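Since the method builds on differentiable architecture search, the minimal PyTorch sketch below shows the generic DARTS-style mixed operation (a softmax-weighted blend of candidate operations) that such searches relax over. The candidate set and dimensions are illustrative, and the foveation operator and fusion-specific loss of F-DARTS are not modeled here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed operation: a softmax over architecture weights
    blends the outputs of all candidate operations on an edge."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.AvgPool2d(3, stride=1, padding=1),
        ])
        # architecture parameters (alpha), learned jointly with the network weights
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

x = torch.rand(1, 8, 32, 32)
print(MixedOp(8)(x).shape)   # torch.Size([1, 8, 32, 32])
```

After search, the operation with the largest alpha on each edge is typically kept to form the final discrete architecture.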
10. Luo Y, Cha H, Zuo L, Cheng P, Zhao Q. General cross-modality registration framework for visible and infrared UAV target image registration. Sci Rep 2023; 13:12941. PMID: 37558713. PMCID: PMC10412594. DOI: 10.1038/s41598-023-39863-3.
Abstract
In all-day, all-weather tasks, well-aligned multi-modality image pairs can provide extensive complementary information for image-guided UAV target detection. However, multi-modality images in real scenarios are often misaligned, and image registration is extremely difficult due to spatial deformation and the difficulty of narrowing the cross-modality discrepancy. To better overcome these obstacles, in this paper we construct a General Cross-Modality Registration (GCMR) framework, which explores a generation-registration pattern to simplify cross-modality image registration into an easier mono-modality registration using an Image Cross-Modality Translation Network (ICMTN) module and a Multi-level Residual Dense Registration Network (MRDRN) module. Specifically, the ICMTN module generates a pseudo infrared image from a visible input and corrects the distortion of structural information during the translation of image modalities. Benefiting from the favorable geometry-correction ability of the ICMTN, we further employ the MRDRN module, which fully extracts and exploits the mutual information of misaligned images to better register visible and infrared images in a mono-modality setting. We evaluate five variants of our approach on the public Anti-UAV datasets. The extensive experimental results demonstrate that the proposed architecture achieves state-of-the-art performance.
Affiliation(s)
- Yu Luo: College of Electronic Engineering, Naval University of Engineering, Wuhan, 430000, China
- Hao Cha: College of Electronic Engineering, Naval University of Engineering, Wuhan, 430000, China
- Lei Zuo: College of Electronic Engineering, Naval University of Engineering, Wuhan, 430000, China
- Peng Cheng: College of Electronic Engineering, Naval University of Engineering, Wuhan, 430000, China
- Qing Zhao: College of Electronic Engineering, Naval University of Engineering, Wuhan, 430000, China
11. Yang D, Wang X, Zhu N, Li S, Hou N. MJ-GAN: Generative Adversarial Network with Multi-Grained Feature Extraction and Joint Attention Fusion for Infrared and Visible Image Fusion. Sensors (Basel) 2023; 23:6322. PMID: 37514617. PMCID: PMC10385123. DOI: 10.3390/s23146322.
Abstract
The challenging issues in infrared and visible image fusion (IVIF) are extracting and fusing as much of the useful information contained in the source images as possible, namely, the rich textures in visible images and the significant contrast in infrared images. Existing fusion methods cannot address this problem well due to handcrafted fusion operations and the extraction of features from only a single scale. In this work, we solve the problems of insufficient information extraction and fusion from another perspective to overcome the difficulties of lacking textures and unhighlighted targets in fused images. We propose a multi-scale feature extraction (MFE) and joint attention fusion (JAF) based end-to-end method using a generative adversarial network (MJ-GAN) framework for IVIF. The MFE modules are embedded in the two-stream structure-based generator in a densely connected manner to comprehensively extract multi-grained deep features from the source image pairs and reuse them during reconstruction. Moreover, an improved self-attention structure is introduced into the MFEs to enhance the pertinence among multi-grained features. The merging of salient and important features is conducted via the JAF network in a feature recalibration manner, which also produces the fused image in a reasonable way. Eventually, we can reconstruct a primary fused image with the major infrared radiometric information and a small amount of visible texture information via a single decoder network. The dual discriminator with strong discriminative power can add more texture and contrast information to the final fused image. Extensive experiments on four publicly available datasets show that the proposed method ultimately achieves phenomenal performance in both visual quality and quantitative assessment compared with nine leading algorithms.
Affiliation(s)
- Danqing Yang: School of Optoelectronic Engineering, Xidian University, Xi'an 710071, China
- Xiaorui Wang: School of Optoelectronic Engineering, Xidian University, Xi'an 710071, China
- Naibo Zhu: Research Institute of System Engineering, PLA Academy of Military Science, Beijing 100091, China
- Shuang Li: Research Institute of System Engineering, PLA Academy of Military Science, Beijing 100091, China
- Na Hou: Research Institute of System Engineering, PLA Academy of Military Science, Beijing 100091, China
12. Liu Y, Zhou X, Zhong W. Multi-Modality Image Fusion and Object Detection Based on Semantic Information. Entropy (Basel) 2023; 25:718. PMID: 37238472. PMCID: PMC10216995. DOI: 10.3390/e25050718.
Abstract
Infrared and visible image fusion (IVIF) aims to provide informative images by combining complementary information from different sensors. Existing IVIF methods based on deep learning focus on strengthening the network with increasing depth but often ignore the importance of transmission characteristics, resulting in the degradation of important information. In addition, while many methods use various loss functions or fusion rules to retain complementary features of both modalities, the fusion results often retain redundant or even invalid information. In order to accurately extract the effective information from both infrared and visible light images without omission or redundancy, and to better serve downstream tasks such as target detection with the fused image, we propose a multi-level structure search attention fusion network guided by semantic information, which realizes the fusion of infrared and visible images in an end-to-end way. Our network has two main contributions: the use of neural architecture search (NAS) and the newly designed multilevel adaptive attention module (MAAB). These methods enable our network to retain the typical characteristics of the two modalities while removing information that is useless for the detection task from the fusion results. In addition, our loss function and joint training method establish a reliable relationship between the fusion network and subsequent detection tasks. Extensive experiments on the new dataset (M3FD) show that our fusion method achieves advanced performance in both subjective and objective evaluations, and the mAP in the object detection task is improved by 0.5% compared to the second-best method (FusionGAN).
Affiliation(s)
- Yong Liu: School of Software Technology, Dalian University of Technology, Dalian 116620, China
- Xin Zhou: International School of Information Science & Engineering, Dalian University of Technology, Dalian 116620, China
- Wei Zhong: International School of Information Science & Engineering, Dalian University of Technology, Dalian 116620, China
13. Chang Z, Feng Z, Yang S, Gao Q. AFT: Adaptive Fusion Transformer for Visible and Infrared Images. IEEE Trans Image Process 2023; 32:2077-2092. PMID: 37018097. DOI: 10.1109/tip.2023.3263113.
Abstract
In this paper, an Adaptive Fusion Transformer (AFT) is proposed for unsupervised pixel-level fusion of visible and infrared images. Different from existing convolutional networks, a transformer is adopted in AFT to model the relationships between multi-modality images and explore cross-modal interactions. The encoder of AFT uses a Multi-Head Self-attention (MSA) module and a Feed Forward (FF) network for feature extraction. Then, a Multi-head Self-Fusion (MSF) module is designed for the adaptive perceptual fusion of the features. By sequentially stacking the MSF, MSA, and FF, a fusion decoder is constructed to gradually locate complementary features for recovering informative images. In addition, a structure-preserving loss is defined to enhance the visual quality of fused images. Extensive experiments are conducted on several datasets to compare our proposed AFT method with 21 popular approaches. The results show that AFT has state-of-the-art performance in both quantitative metrics and visual perception.
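The encoder described above combines multi-head self-attention (MSA) with a feed-forward (FF) network; the sketch below shows a generic block of this kind using PyTorch's built-in attention. It is standard transformer machinery for illustration only and does not include AFT's Multi-head Self-Fusion (MSF) module or its structure-preserving loss.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Generic MSA + feed-forward block over token sequences
    (e.g., flattened image patches from one modality)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))

    def forward(self, tokens):                         # (B, N, dim)
        h = self.norm1(tokens)
        attn_out, _ = self.attn(h, h, h)
        tokens = tokens + attn_out                     # residual around MSA
        tokens = tokens + self.ff(self.norm2(tokens))  # residual around FF
        return tokens

tokens = torch.rand(2, 196, 64)                        # 2 images, 14x14 patches
print(EncoderBlock()(tokens).shape)                    # torch.Size([2, 196, 64])
```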
14. Fang Z, Du S, Lin X, Yang J, Wang S, Shi Y. DBO-Net: Differentiable Bi-level Optimization Network for Multi-view Clustering. Inf Sci (N Y) 2023. DOI: 10.1016/j.ins.2023.01.071.
15. Liu R, Gao J, Zhang J, Meng D, Lin Z. Investigating Bi-Level Optimization for Learning and Vision From a Unified Perspective: A Survey and Beyond. IEEE Trans Pattern Anal Mach Intell 2022; 44:10045-10067. PMID: 34871167. DOI: 10.1109/tpami.2021.3132674.
Abstract
Bi-Level Optimization (BLO) originated in the area of economic game theory and was later introduced into the optimization community. BLO is able to handle problems with a hierarchical structure, involving two levels of optimization tasks, where one task is nested inside the other. In the machine learning and computer vision fields, despite their different motivations and mechanisms, many complex problems, such as hyper-parameter optimization, multi-task and meta learning, neural architecture search, adversarial learning, and deep reinforcement learning, all contain a series of closely related subproblems. In this paper, we first uniformly express these complex learning and vision problems from the perspective of BLO. Then we construct a best-response-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies, covering aspects ranging from fundamental automatic differentiation schemes to various accelerations, simplifications, extensions, and their convergence and complexity properties. Last but not least, we discuss the potential of our unified BLO framework for designing new algorithms and point out some promising directions for future research. A list of important papers discussed in this survey, corresponding codes, and additional resources on BLOs are publicly available at: https://github.com/vis-opt-group/BLO.
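As background for the survey's single-level reformulation, the bi-level problem and the implicit-function hypergradient that most gradient-based BLO methods approximate can be stated as follows (standard results in generic notation, not a result specific to this paper):

```latex
% Bi-level optimization with upper-level variables x and lower-level variables y:
\min_{x}\; F\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y} f(x, y).

% Under a unique, smooth lower-level solution, the implicit function theorem gives
% the best-response Jacobian and hence the hypergradient used by gradient-based BLO:
\frac{\partial y^{*}}{\partial x}
  = -\bigl(\nabla^{2}_{yy} f\bigr)^{-1} \nabla^{2}_{yx} f,
\qquad
\nabla_{x}\, F\bigl(x, y^{*}(x)\bigr)
  = \nabla_{x} F
  + \Bigl(\frac{\partial y^{*}}{\partial x}\Bigr)^{\!\top} \nabla_{y} F .
```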
16. Liu R, Li Z, Fan X, Zhao C, Huang H, Luo Z. Learning Deformable Image Registration From Optimization: Perspective, Modules, Bilevel Training and Beyond. IEEE Trans Pattern Anal Mach Intell 2022; 44:7688-7704. PMID: 34582346. DOI: 10.1109/tpami.2021.3115825.
Abstract
Conventional deformable registration methods aim at solving an optimization model carefully designed on image pairs, and their computational costs are exceptionally high. In contrast, recent deep learning-based approaches can provide fast deformation estimation. These heuristic network architectures are fully data-driven and thus lack the explicit geometric constraints that are indispensable to generate plausible deformations, e.g., topology preservation. Moreover, these learning-based approaches typically pose hyper-parameter learning as a black-box problem and require considerable computational and human effort to perform many training runs. To tackle the aforementioned problems, we propose a new learning-based framework to optimize a diffeomorphic model via multi-scale propagation. Specifically, we introduce a generic optimization model to formulate diffeomorphic registration and develop a series of learnable architectures to obtain propagative updating in the coarse-to-fine feature space. Further, we propose a new bilevel self-tuned training strategy, allowing efficient search of task-specific hyper-parameters. This training strategy increases flexibility across various types of data while reducing computational and human burdens. We conduct two groups of image registration experiments on 3D volume datasets, including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. Extensive results demonstrate the state-of-the-art performance of the proposed method with a diffeomorphic guarantee and extreme efficiency. We also apply our framework to challenging multi-modal image registration and investigate how our registration supports downstream medical image analysis tasks, including multi-modal fusion and image segmentation.
17. Tang L, Hui Y, Yang H, Zhao Y, Tian C. Medical image fusion quality assessment based on conditional generative adversarial network. Front Neurosci 2022; 16:986153. PMID: 36033610. PMCID: PMC9400712. DOI: 10.3389/fnins.2022.986153.
Abstract
Multimodal medical image fusion (MMIF) has been proven to effectively improve the efficiency of disease diagnosis and treatment. However, few works have explored dedicated evaluation methods for MMIF. This paper proposes a novel quality assessment method for MMIF based on conditional generative adversarial networks. First, with the mean opinion score (MOS) as the guiding condition, the feature information of the two source images is extracted separately through a dual-channel encoder-decoder. The features at different levels of the encoder-decoder are hierarchically input into the self-attention feature block, a fusion strategy for self-identifying favorable features. Then, the discriminator is used to improve the fusion objective of the generator. Finally, we calculate the structural similarity index between the fake image and the true image, and the MOS corresponding to the maximum result is used as the final assessment of the fused image quality. Based on the established MMIF database, the proposed method achieves state-of-the-art performance among the comparison methods and agrees closely with subjective evaluations, indicating that the method is effective for the quality assessment of medical fusion images.
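A minimal sketch of the final scoring step described above (picking the MOS whose conditionally generated image is most similar, by SSIM, to the image under test), using scikit-image's SSIM; the generator call and the candidate MOS levels are placeholders rather than the trained cGAN from the paper.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def assess_quality(fused_img, generate_fake, mos_levels=(1, 2, 3, 4, 5)):
    """Return the MOS whose conditionally generated 'fake' image is most
    similar (by SSIM) to the fused image under test.
    `generate_fake(mos)` is a placeholder for the trained cGAN generator."""
    scores = {mos: ssim(generate_fake(mos), fused_img,
                        data_range=fused_img.max() - fused_img.min())
              for mos in mos_levels}
    return max(scores, key=scores.get)

# toy usage with a dummy "generator"
ref = np.random.rand(128, 128)
print(assess_quality(ref, lambda mos: ref + 0.02 * mos * np.random.rand(128, 128)))
```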
Affiliation(s)
- Lu Tang: School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Yu Hui: School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Hang Yang: School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Yinghong Zhao: School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Chuangeng Tian: School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, China
18. Tang W, He F, Liu Y, Duan Y. MATR: Multimodal Medical Image Fusion via Multiscale Adaptive Transformer. IEEE Trans Image Process 2022; 31:5134-5149. PMID: 35901003. DOI: 10.1109/tip.2022.3193288.
Abstract
Owing to the limitations of imaging sensors, it is challenging to obtain a medical image that simultaneously contains functional metabolic information and structural tissue details. Multimodal medical image fusion, an effective way to merge the complementary information in different modalities, has become a significant technique to facilitate clinical diagnosis and surgical navigation. With powerful feature representation ability, deep learning (DL)-based methods have improved such fusion results but still have not achieved satisfactory performance. Specifically, existing DL-based methods generally depend on convolutional operations, which can well extract local patterns but have limited capability in preserving global context information. To compensate for this defect and achieve accurate fusion, we propose a novel unsupervised method to fuse multimodal medical images via a multiscale adaptive Transformer termed MATR. In the proposed method, instead of directly employing vanilla convolution, we introduce an adaptive convolution for adaptively modulating the convolutional kernel based on the global complementary context. To further model long-range dependencies, an adaptive Transformer is employed to enhance the global semantic extraction capability. Our network architecture is designed in a multiscale fashion so that useful multimodal information can be adequately acquired from the perspective of different scales. Moreover, an objective function composed of a structural loss and a region mutual information loss is devised to construct constraints for information preservation at both the structural-level and the feature-level. Extensive experiments on a mainstream database demonstrate that the proposed method outperforms other representative and state-of-the-art methods in terms of both visual quality and quantitative evaluation. We also extend the proposed method to address other biomedical image fusion issues, and the pleasing fusion results illustrate that MATR has good generalization capability. The code of the proposed method is available at https://github.com/tthinking/MATR.
19. Liu R, Jiang Z, Yang S, Fan X. Twin Adversarial Contrastive Learning for Underwater Image Enhancement and Beyond. IEEE Trans Image Process 2022; 31:4922-4936. PMID: 35849672. DOI: 10.1109/tip.2022.3190209.
Abstract
Underwater images suffer from severe distortion, which degrades the accuracy of object detection performed in an underwater environment. Existing underwater image enhancement algorithms focus on the restoration of contrast and scene reflection. In practice, the enhanced images may not benefit the effectiveness of detection and can even lead to a severe performance drop. In this paper, we propose an object-guided twin adversarial contrastive learning based underwater enhancement method to achieve both visual-friendly and task-oriented enhancement. Concretely, we first develop a bilateral constrained closed-loop adversarial enhancement module, which eases the requirement of paired data through an unsupervised scheme and preserves more informative features by coupling with the twin inverse mapping. In addition, to give the restored images a more realistic appearance, we also adopt contrastive cues in the training phase. To narrow the gap between visually-oriented and detection-favorable target images, a task-aware feedback module is embedded in the enhancement process, where the coherent gradient information of the detector is incorporated to guide the enhancement towards the detection-pleasing direction. To validate the performance, we allocate a series of prolific detectors into our framework. Extensive experiments demonstrate that the enhanced results of our method show remarkable improvement in visual quality, and the accuracy of different detectors run on our enhanced images is notably improved. Moreover, we also conduct a study on semantic segmentation to illustrate how object guidance improves high-level tasks. Code and models are available at https://github.com/Jzy2017/TACL.
20. Li Z, Xin F, Liu R, Luo Z. Optimizing Loss Function for Uni-modal and Multi-modal Medical Registration. Artif Intell 2021. DOI: 10.1007/978-3-030-93046-2_23.