1
|
Cao B, Qi G, Zhao J, Zhu P, Hu Q, Gao X. RTF: Recursive TransFusion for Multi-Modal Image Synthesis. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; 34:1573-1587. [PMID: 40031796 DOI: 10.1109/tip.2025.3541877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Multi-modal image synthesis is crucial for obtaining complete modalities due to the imaging restrictions in reality. Current methods, primarily CNN-based models, find it challenging to extract global representations because of local inductive bias, leading to synthetic structure deformation or color distortion. Despite the significant global representation ability of transformer in capturing long-range dependencies, its huge parameter size requires considerable training data. Multi-modal synthesis solely based on one of the two structures makes it hard to extract comprehensive information from each modality with limited data. To tackle this dilemma, we propose a simple yet effective Recursive TransFusion (RTF) framework for multi-modal image synthesis. Specifically, we develop a TransFusion unit to integrate local knowledge extracted from the individual modality by connecting a CNN-based local representation block (LRB) and a transformer-based global fusion block (GFB) via a feature translating gate (FTG). Considering the numerous parameters introduced by the transformer, we further unfold a TransFusion unit with recursive constraint repeatedly, forming recursive TransFusion (RTF), which progressively extracts multi-modal information at different depths. Our RTF remarkably reduces network parameters while maintaining superior performance. Extensive experiments validate our superiority against the competing methods on multiple benchmarks. The source code will be available at https://github.com/guoliangq/RTF.
Collapse
|
2
|
Gou F, Liu J, Xiao C, Wu J. Research on Artificial-Intelligence-Assisted Medicine: A Survey on Medical Artificial Intelligence. Diagnostics (Basel) 2024; 14:1472. [PMID: 39061610 PMCID: PMC11275417 DOI: 10.3390/diagnostics14141472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2024] [Revised: 07/04/2024] [Accepted: 07/05/2024] [Indexed: 07/28/2024] Open
Abstract
With the improvement of economic conditions and the increase in living standards, people's attention in regard to health is also continuously increasing. They are beginning to place their hopes on machines, expecting artificial intelligence (AI) to provide a more humanized medical environment and personalized services, thus greatly expanding the supply and bridging the gap between resource supply and demand. With the development of IoT technology, the arrival of the 5G and 6G communication era, and the enhancement of computing capabilities in particular, the development and application of AI-assisted healthcare have been further promoted. Currently, research on and the application of artificial intelligence in the field of medical assistance are continuously deepening and expanding. AI holds immense economic value and has many potential applications in regard to medical institutions, patients, and healthcare professionals. It has the ability to enhance medical efficiency, reduce healthcare costs, improve the quality of healthcare services, and provide a more intelligent and humanized service experience for healthcare professionals and patients. This study elaborates on AI development history and development timelines in the medical field, types of AI technologies in healthcare informatics, the application of AI in the medical field, and opportunities and challenges of AI in the field of medicine. The combination of healthcare and artificial intelligence has a profound impact on human life, improving human health levels and quality of life and changing human lifestyles.
Collapse
Affiliation(s)
- Fangfang Gou
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
| | - Jun Liu
- The Second People's Hospital of Huaihua, Huaihua 418000, China
| | - Chunwen Xiao
- The Second People's Hospital of Huaihua, Huaihua 418000, China
| | - Jia Wu
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Research Center for Artificial Intelligence, Monash University, Melbourne, Clayton, VIC 3800, Australia
| |
Collapse
|
3
|
Yang H, Sun J, Xu Z. Learning Unified Hyper-Network for Multi-Modal MR Image Synthesis and Tumor Segmentation With Missing Modalities. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:3678-3689. [PMID: 37540616 DOI: 10.1109/tmi.2023.3301934] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/06/2023]
Abstract
Accurate segmentation of brain tumors is of critical importance in clinical assessment and treatment planning, which requires multiple MR modalities providing complementary information. However, due to practical limits, one or more modalities may be missing in real scenarios. To tackle this problem, existing methods need to train multiple networks or a unified but fixed network for various possible missing modality cases, which leads to high computational burdens or sub-optimal performance. In this paper, we propose a unified and adaptive multi-modal MR image synthesis method, and further apply it to tumor segmentation with missing modalities. Based on the decomposition of multi-modal MR images into common and modality-specific features, we design a shared hyper-encoder for embedding each available modality into the feature space, a graph-attention-based fusion block to aggregate the features of available modalities to the fused features, and a shared hyper-decoder for image reconstruction. We also propose an adversarial common feature constraint to enforce the fused features to be in a common space. As for missing modality segmentation, we first conduct the feature-level and image-level completion using our synthesis method and then segment the tumors based on the completed MR images together with the extracted common features. Moreover, we design a hypernet-based modulation module to adaptively utilize the real and synthetic modalities. Experimental results suggest that our method can not only synthesize reasonable multi-modal MR images, but also achieve state-of-the-art performance on brain tumor segmentation with missing modalities.
Collapse
|
4
|
Nouri H, Nasri R, Abtahi SH. Addressing inter-device variations in optical coherence tomography angiography: will image-to-image translation systems help? Int J Retina Vitreous 2023; 9:51. [PMID: 37644613 PMCID: PMC10466880 DOI: 10.1186/s40942-023-00491-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2023] [Accepted: 08/17/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND Optical coherence tomography angiography (OCTA) is an innovative technology providing visual and quantitative data on retinal microvasculature in a non-invasive manner. MAIN BODY Due to variations in the technical specifications of different OCTA devices, there are significant inter-device differences in OCTA data, which can limit their comparability and generalizability. These variations can also result in a domain shift problem that may interfere with applicability of machine learning models on data obtained from different OCTA machines. One possible approach to address this issue may be unsupervised deep image-to-image translation leveraging systems such as Cycle-Consistent Generative Adversarial Networks (Cycle-GANs) and Denoising Diffusion Probabilistic Models (DDPMs). Through training on unpaired images from different device domains, Cycle-GANs and DDPMs may enable cross-domain translation of images. They have been successfully applied in various medical imaging tasks, including segmentation, denoising, and cross-modality image-to-image translation. In this commentary, we briefly describe how Cycle-GANs and DDPMs operate, and review the recent experiments with these models on medical and ocular imaging data. We then discuss the benefits of applying such techniques for inter-device translation of OCTA data and the potential challenges ahead. CONCLUSION Retinal imaging technologies and deep learning-based domain adaptation techniques are rapidly evolving. We suggest exploring the potential of image-to-image translation methods in improving the comparability of OCTA data from different centers or devices. This may facilitate more efficient analysis of heterogeneous data and broader applicability of machine learning models trained on limited datasets in this field.
Collapse
Affiliation(s)
- Hosein Nouri
- Ophthalmic Research Center, Research Institute for Ophthalmology and Vision Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
| | - Reza Nasri
- School of Engineering, University of Isfahan, Isfahan, Iran
| | - Seyed-Hossein Abtahi
- Ophthalmic Research Center, Research Institute for Ophthalmology and Vision Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Department of Ophthalmology, Torfe Medical Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
| |
Collapse
|
5
|
Chen J, Chen S, Wee L, Dekker A, Bermejo I. Deep learning based unpaired image-to-image translation applications for medical physics: a systematic review. Phys Med Biol 2023; 68. [PMID: 36753766 DOI: 10.1088/1361-6560/acba74] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2022] [Accepted: 02/08/2023] [Indexed: 02/10/2023]
Abstract
Purpose. There is a growing number of publications on the application of unpaired image-to-image (I2I) translation in medical imaging. However, a systematic review covering the current state of this topic for medical physicists is lacking. The aim of this article is to provide a comprehensive review of current challenges and opportunities for medical physicists and engineers to apply I2I translation in practice.Methods and materials. The PubMed electronic database was searched using terms referring to unpaired (unsupervised), I2I translation, and medical imaging. This review has been reported in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. From each full-text article, we extracted information extracted regarding technical and clinical applications of methods, Transparent Reporting for Individual Prognosis Or Diagnosis (TRIPOD) study type, performance of algorithm and accessibility of source code and pre-trained models.Results. Among 461 unique records, 55 full-text articles were included in the review. The major technical applications described in the selected literature are segmentation (26 studies), unpaired domain adaptation (18 studies), and denoising (8 studies). In terms of clinical applications, unpaired I2I translation has been used for automatic contouring of regions of interest in MRI, CT, x-ray and ultrasound images, fast MRI or low dose CT imaging, CT or MRI only based radiotherapy planning, etc Only 5 studies validated their models using an independent test set and none were externally validated by independent researchers. Finally, 12 articles published their source code and only one study published their pre-trained models.Conclusion. I2I translation of medical images offers a range of valuable applications for medical physicists. However, the scarcity of external validation studies of I2I models and the shortage of publicly available pre-trained models limits the immediate applicability of the proposed methods in practice.
Collapse
Affiliation(s)
- Junhua Chen
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre+, Maastricht, 6229 ET, The Netherlands
| | - Shenlun Chen
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre+, Maastricht, 6229 ET, The Netherlands
| | - Leonard Wee
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre+, Maastricht, 6229 ET, The Netherlands
| | - Andre Dekker
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre+, Maastricht, 6229 ET, The Netherlands
| | - Inigo Bermejo
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre+, Maastricht, 6229 ET, The Netherlands
| |
Collapse
|
6
|
Tang Z, Duan J, Sun Y, Zeng Y, Zhang Y, Yao X. A combined deformable model and medical transformer algorithm for medical image segmentation. Med Biol Eng Comput 2023; 61:129-137. [PMID: 36323981 PMCID: PMC9816223 DOI: 10.1007/s11517-022-02702-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2022] [Accepted: 10/19/2022] [Indexed: 01/07/2023]
Abstract
Deep learning-based segmentation models usually require substantial data, and the model usually suffers from poor generalization due to the lack of training data and inefficient network structure. We proposed to combine the deformable model and medical transformer neural network on the image segmentation task to alleviate the aforementioned problems. The proposed method first employs a statistical shape model to generate simulated contours of the target object, and then the thin plate spline is applied to create a realistic texture. Finally, a medical transformer network was constructed to segment three types of medical images, including prostate MR image, heart US image, and tongue color images. The segmentation accuracy of the three tasks achieved 89.97%, 91.90%, and 94.25%, respectively. The experimental results show that the proposed method improves medical image segmentation performance.
Collapse
Affiliation(s)
- Zhixian Tang
- grid.507037.60000 0004 1764 1277College of Medical Imaging, Shanghai University of Medicine & Health Sciences, Shanghai, 201318 China ,grid.507037.60000 0004 1764 1277Radiology Department, Shanghai University of Medicine & Health Sciences Affiliated Jiading Hospital, Shanghai, 201800 China
| | - Jintao Duan
- grid.507037.60000 0004 1764 1277College of Medical Instrumentation, Shanghai University of Medicine & Health Sciences, Shanghai, 201318 China
| | - Yanming Sun
- grid.507037.60000 0004 1764 1277College of Medical Instrumentation, Shanghai University of Medicine & Health Sciences, Shanghai, 201318 China
| | - Yanan Zeng
- grid.507037.60000 0004 1764 1277College of Medical Instrumentation, Shanghai University of Medicine & Health Sciences, Shanghai, 201318 China
| | - Yile Zhang
- grid.507037.60000 0004 1764 1277College of Medical Instrumentation, Shanghai University of Medicine & Health Sciences, Shanghai, 201318 China
| | - Xufeng Yao
- grid.507037.60000 0004 1764 1277College of Medical Imaging, Shanghai University of Medicine & Health Sciences, Shanghai, 201318 China
| |
Collapse
|
7
|
Iqbal A, Sharif M, Yasmin M, Raza M, Aftab S. Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL 2022; 11:333-368. [PMID: 35821891 PMCID: PMC9264294 DOI: 10.1007/s13735-022-00240-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/24/2021] [Revised: 03/16/2022] [Accepted: 05/24/2022] [Indexed: 05/13/2023]
Abstract
Recent advancements with deep generative models have proven significant potential in the task of image synthesis, detection, segmentation, and classification. Segmenting the medical images is considered a primary challenge in the biomedical imaging field. There have been various GANs-based models proposed in the literature to resolve medical segmentation challenges. Our research outcome has identified 151 papers; after the twofold screening, 138 papers are selected for the final survey. A comprehensive survey is conducted on GANs network application to medical image segmentation, primarily focused on various GANs-based models, performance metrics, loss function, datasets, augmentation methods, paper implementation, and source codes. Secondly, this paper provides a detailed overview of GANs network application in different human diseases segmentation. We conclude our research with critical discussion, limitations of GANs, and suggestions for future directions. We hope this survey is beneficial and increases awareness of GANs network implementations for biomedical image segmentation tasks.
Collapse
Affiliation(s)
- Ahmed Iqbal
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Pakistan
| | - Muhammad Sharif
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Pakistan
| | - Mussarat Yasmin
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Pakistan
| | - Mudassar Raza
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Pakistan
| | - Shabib Aftab
- Department of Computer Science, Virtual University of Pakistan, Lahore, Pakistan
| |
Collapse
|
8
|
Ali H, Biswas MR, Mohsen F, Shah U, Alamgir A, Mousa O, Shah Z. The role of generative adversarial networks in brain MRI: a scoping review. Insights Imaging 2022; 13:98. [PMID: 35662369 PMCID: PMC9167371 DOI: 10.1186/s13244-022-01237-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2022] [Accepted: 05/11/2022] [Indexed: 11/23/2022] Open
Abstract
The performance of artificial intelligence (AI) for brain MRI can improve if enough data are made available. Generative adversarial networks (GANs) showed a lot of potential to generate synthetic MRI data that can capture the distribution of real MRI. Besides, GANs are also popular for segmentation, noise removal, and super-resolution of brain MRI images. This scoping review aims to explore how GANs methods are being used on brain MRI data, as reported in the literature. The review describes the different applications of GANs for brain MRI, presents the most commonly used GANs architectures, and summarizes the publicly available brain MRI datasets for advancing the research and development of GANs-based approaches. This review followed the guidelines of PRISMA-ScR to perform the study search and selection. The search was conducted on five popular scientific databases. The screening and selection of studies were performed by two independent reviewers, followed by validation by a third reviewer. Finally, the data were synthesized using a narrative approach. This review included 139 studies out of 789 search results. The most common use case of GANs was the synthesis of brain MRI images for data augmentation. GANs were also used to segment brain tumors and translate healthy images to diseased images or CT to MRI and vice versa. The included studies showed that GANs could enhance the performance of AI methods used on brain MRI imaging data. However, more efforts are needed to transform the GANs-based methods in clinical applications.
Collapse
Affiliation(s)
- Hazrat Ali
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar.
| | - Md Rafiul Biswas
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
| | - Farida Mohsen
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
| | - Uzair Shah
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
| | - Asma Alamgir
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
| | - Osama Mousa
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
| | - Zubair Shah
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar.
| |
Collapse
|
9
|
Zhu L, He Q, Huang Y, Zhang Z, Zeng J, Lu L, Kong W, Zhou F. DualMMP-GAN: Dual-scale multi-modality perceptual generative adversarial network for medical image segmentation. Comput Biol Med 2022; 144:105387. [PMID: 35305502 DOI: 10.1016/j.compbiomed.2022.105387] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2021] [Revised: 03/04/2022] [Accepted: 03/04/2022] [Indexed: 01/22/2023]
Abstract
Multi-modality magnetic resonance imaging (MRI) can reveal distinct patterns of tissue in the human body and is crucial to clinical diagnosis. But it still remains a challenge to obtain diverse and plausible multi-modality MR images due to expense, noise, and artifacts. For the same lesion, different modalities of MRI have big differences in context information, coarse location, and fine structure. In order to achieve better generation and segmentation performance, a dual-scale multi-modality perceptual generative adversarial network (DualMMP-GAN) is proposed based on cycle-consistent generative adversarial networks (CycleGAN). Dilated residual blocks are introduced to increase the receptive field, preserving structure and context information of images. A dual-scale discriminator is constructed. The generator is optimized by discriminating patches to represent lesions with different sizes. The perceptual consistency loss is introduced to learn the mapping between the generated and target modality at different semantic levels. Moreover, generative multi-modality segmentation (GMMS) combining given modalities with generated modalities is proposed for brain tumor segmentation. Experimental results show that the DualMMP-GAN outperforms the CycleGAN and some state-of-the-art methods in terms of PSNR, SSMI, and RMSE in most tasks. In addition, dice, sensitivity, specificity, and Hausdorff95 obtained from segmentation by GMMS are all higher than those from a single modality. The objective index obtained by the proposed methods are close to upper bounds obtained from real multiple modalities, indicating that GMMS can achieve similar effects as multi-modality. Overall, the proposed methods can serve as an effective method in clinical brain tumor diagnosis with promising application potential.
Collapse
Affiliation(s)
- Li Zhu
- School of Information Engineering, Nanchang University, Nanchang, 330031, China.
| | - Qiong He
- School of Information Engineering, Nanchang University, Nanchang, 330031, China.
| | - Yue Huang
- School of Informatics, Xiamen University, Xiamen, 361005, China.
| | - Zihe Zhang
- School of Information Engineering, Nanchang University, Nanchang, 330031, China.
| | - Jiaming Zeng
- School of Information Engineering, Nanchang University, Nanchang, 330031, China.
| | - Ling Lu
- School of Information Engineering, Nanchang University, Nanchang, 330031, China.
| | - Weiming Kong
- Hospital of the Joint Logistics Support Force of the Chinese People's Liberation Army, No.908, Nanchang, 330002, China.
| | - Fuqing Zhou
- Department of Radiology, The First Affiliated Hospital, Nanchang University, Nanchang, 330006, China.
| |
Collapse
|
10
|
Li D, Peng Y, Guo Y, Sun J. TAUNet: a triple-attention-based multi-modality MRI fusion U-Net for cardiac pathology segmentation. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00660-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
AbstractAutomated segmentation of cardiac pathology in MRI plays a significant role for diagnosis and treatment of some cardiac disease. In clinical practice, multi-modality MRI is widely used to improve the cardiac pathology segmentation, because it can provide multiple or complementary information. Recently, deep learning methods have presented implausible performance in multi-modality medical image segmentation. However, how to fuse the underlying multi-modality information effectively to segment the pathology with irregular shapes and small region at random locations, is still a challenge task. In this paper, a triple-attention-based multi-modality MRI fusion U-Net was proposed to learn complex relationship between different modalities and pay more attention on shape information, thus to achieve improved pathology segmentation. First, three independent encoders and one fusion encoder were applied to extract specific and multiple modality features. Secondly, we concatenate the modality feature maps and use the channel attention to fuse specific modal information at every stage of the three dedicate independent encoders, then the three single modality feature maps and channel attention feature maps are together concatenated to the decoder path. Spatial attention was adopted in decoder path to capture the correlation of various positions. Once more, we employ shape attention to focus shape-dependent information. Lastly, the training approach is made efficient by introducing deep supervision mechanism with object contextual representations block to ensure precisely boundary prediction. Our proposed network was evaluated on the public MICCAI 2020 Myocardial pathology segmentation dataset which involves patients suffering from myocardial infarction. Experiments on the dataset with three modalities demonstrate the effectiveness of fusion mode of our model, and attention mechanism can integrate various modality information well. We demonstrated that such a deep learning approach could better fuse complementary information to improve the segmentation performance of cardiac pathology.
Collapse
|