1
Zhao L, Wang T, Chen Y, Zhang X, Tang H, Lin F, Li C, Li Q, Tan T, Kang D, Tong T. A novel framework for segmentation of small targets in medical images. Sci Rep 2025;15:9924. [PMID: 40121297] [PMCID: PMC11929788] [DOI: 10.1038/s41598-025-94437-9]
Abstract
Medical image segmentation is a pivotal and intricate procedure in medical image processing and analysis. With the progress of artificial intelligence in recent years, deep learning techniques for medical image segmentation have become increasingly popular. Nevertheless, the intricate nature of medical images poses challenges, and the segmentation of diminutive targets remains at an early stage. Current networks struggle to segment exceedingly small targets, especially when the number of training samples is limited. To overcome this constraint, we implemented an effective strategy to augment lesion images containing small targets under limited samples. We also introduce a segmentation framework termed STS-Net, specifically designed for small target segmentation, which leverages the established capacity of convolutional neural networks to learn effective image representations. STS-Net adopts a ResNeXt50-32x4d architecture as its encoder and integrates attention mechanisms during the encoding phase to strengthen the network's feature representation. We evaluated the proposed network on four publicly available datasets. Experimental results underscore the superiority of our approach for medical image segmentation, particularly for small targets. The code is available at https://github.com/zlxokok/STSNet .
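As a rough illustration of the encoder described here, below is a minimal PyTorch sketch: a ResNeXt50-32x4d backbone whose stage outputs pass through channel-attention blocks. The squeeze-and-excitation attention and the `SEBlock`/`STSEncoder` names are assumptions for illustration; the abstract states only that attention is integrated during encoding.

```python
# Minimal sketch (assumes torchvision >= 0.13): ResNeXt50-32x4d encoder with
# channel attention after each stage. SE attention is an assumed choice.
import torch
import torch.nn as nn
from torchvision.models import resnext50_32x4d

class SEBlock(nn.Module):
    """Channel attention: reweight feature maps by pooled global context."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class STSEncoder(nn.Module):
    """Backbone returning attention-refined multi-scale features for a decoder."""
    def __init__(self, pretrained=True):
        super().__init__()
        m = resnext50_32x4d(weights="DEFAULT" if pretrained else None)
        self.stem = nn.Sequential(m.conv1, m.bn1, m.relu, m.maxpool)
        self.stages = nn.ModuleList([m.layer1, m.layer2, m.layer3, m.layer4])
        self.attn = nn.ModuleList(SEBlock(c) for c in (256, 512, 1024, 2048))

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage, attn in zip(self.stages, self.attn):
            x = attn(stage(x))
            feats.append(x)  # skip connections at 1/4 .. 1/32 resolution
        return feats

feats = STSEncoder(pretrained=False)(torch.randn(1, 3, 224, 224))
print([f.shape for f in feats])
```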
Affiliation(s)
- Longxuan Zhao
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
  - Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Tao Wang
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
  - Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Yuanbin Chen
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
  - Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Xinlin Zhang
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
  - Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
  - Imperial Vision Technology, Fuzhou, 350100, China
- Hui Tang
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
  - Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Fuxin Lin
  - Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
  - Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
  - Department of Neurosurgery, Fujian Institute of Brain Disorders and Brain Science, Fujian Clinical Research Center for Neurological Diseases, The First Affiliated Hospital and Neurosurgery Research Institute, Fujian Medical University, Fuzhou, 350100, China
  - Fujian Provincial Clinical Research Center for Neurological Diseases, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
  - Clinical Research and Translation Center, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Chunwang Li
  - Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
  - Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Qixuan Li
  - Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
  - Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Tao Tan
  - Macao Polytechnic University, Macao, 999078, China
- Dezhi Kang
  - Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
  - Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
  - Department of Neurosurgery, Fujian Institute of Brain Disorders and Brain Science, Fujian Clinical Research Center for Neurological Diseases, The First Affiliated Hospital and Neurosurgery Research Institute, Fujian Medical University, Fuzhou, 350100, China
  - Fujian Provincial Clinical Research Center for Neurological Diseases, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
  - Clinical Research and Translation Center, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Tong Tong
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
  - Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
  - Imperial Vision Technology, Fuzhou, 350100, China
2
Han K, Lou Q, Lu F. A semi-supervised domain adaptation method with scale-aware and global-local fusion for abdominal multi-organ segmentation. J Appl Clin Med Phys 2025;26:e70008. [PMID: 39924943] [PMCID: PMC11905256] [DOI: 10.1002/acm2.70008]
Abstract
BACKGROUND: Abdominal multi-organ segmentation remains a challenging task. Semi-supervised domain adaptation (SSDA) has emerged as an innovative solution, but SSDA frameworks based on UNet struggle to capture multi-scale and global information.
PURPOSE: Our work aimed to propose a novel SSDA method that achieves more accurate abdominal multi-organ segmentation with limited labeled target-domain data, with a superior ability to capture multi-scale features and to integrate local and global information effectively.
METHODS: The proposed network is based on UNet. In the encoder, a scale-aware module with domain-specific batch normalization (SAD) adaptively extracts multi-scale features and improves generalization across the source and target domains. In the bottleneck, a global-local fusion (GLF) module captures and integrates both local and global information. Both modules are embedded in a self-ensembling mean-teacher (SE-MT) framework to enhance the model's capability to learn features common to the source and target domains.
RESULTS: The model was evaluated on the public CHAOS and BTCV datasets. On CHAOS, the proposed method obtains an average DSC of 88.97% and an ASD of 1.12 mm with only 20% labeled target data; on BTCV, it achieves an average DSC of 88.95% and an ASD of 1.13 mm with 20% labeled target data. Compared with state-of-the-art methods, DSC and ASD improved by at least 0.72% and 0.33 mm on CHAOS and by 1.29% and 0.06 mm on BTCV, respectively. Ablation studies verified the contribution of each component; the proposed method achieves a DSC improvement of 3.17% over the baseline with 20% labeled target data.
CONCLUSION: The proposed SSDA method for abdominal multi-organ segmentation has a powerful ability to extract multi-scale and global features, significantly improving segmentation accuracy and robustness.
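A hedged sketch of the self-ensembling mean-teacher core this method builds on: the teacher is an exponential moving average (EMA) of the student, and unlabeled target-domain images contribute a consistency term. The noise level, loss weight, and function names are illustrative; the SAD and GLF modules would live inside the shared segmentation architecture, not shown here.

```python
# Sketch of SE-MT training: EMA teacher plus consistency loss on unlabeled data.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """teacher <- alpha * teacher + (1 - alpha) * student, parameter-wise."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)

def semi_supervised_step(student, teacher, labeled, masks, unlabeled, lam=0.1):
    # Supervised loss on the few labeled target-domain images:
    # logits are (N, C, H, W), masks are (N, H, W) class indices.
    sup = F.cross_entropy(student(labeled), masks)
    # Consistency: student on perturbed input should match the teacher.
    noisy = unlabeled + 0.05 * torch.randn_like(unlabeled)
    with torch.no_grad():
        target = teacher(unlabeled).softmax(dim=1)
    cons = F.mse_loss(student(noisy).softmax(dim=1), target)
    return sup + lam * cons  # backprop this, then call ema_update()
```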
Affiliation(s)
- Kexin Han
  - School of Science, Zhejiang University of Science and Technology, Hangzhou, China
- Qiong Lou
  - School of Science, Zhejiang University of Science and Technology, Hangzhou, China
- Fang Lu
  - School of Science, Zhejiang University of Science and Technology, Hangzhou, China
3
Sun H, Chen L, Li J, Yang Z, Zhu J, Wang Z, Ren G, Cai J, Zhao L. Synthesis of pseudo-PET/CT fusion images in radiotherapy based on a new transformer model. Med Phys 2025;52:1070-1085. [PMID: 39569842] [DOI: 10.1002/mp.17512]
Abstract
BACKGROUND: PET/CT and planning CT are commonly used medical images in radiotherapy for esophageal and nasopharyngeal cancer. However, repeated scans expose patients to additional radiation doses and introduce registration errors, leaving room to improve this multimodal workflow.
PURPOSE: To propose a new Transformer model for obtaining pseudo-PET/CT fusion images for esophageal and nasopharyngeal cancer radiotherapy.
METHODS: Data from 129 esophageal cancer cases and 141 nasopharyngeal cancer cases were retrospectively selected for training, validation, and testing. PET and CT images serve as input. A Transformer model with a "focus-disperse" attention mechanism and multi-consistency loss constraints effectively captures the feature information in the two images, ultimately synthesizing pseudo-PET/CT fusion images with enhanced tumor-region imaging. During the testing phase, the accuracy of the pseudo-PET/CT fusion images was verified anatomically and dosimetrically, and two prospective cases were selected for further dose verification.
RESULTS: For anatomical verification, the PET/CT fusion image obtained with a wavelet fusion algorithm and corrected by clinicians served as the ground truth. The evaluation metrics between the pseudo-fused images from the proposed model and the ground truth, reported as mean (standard deviation), are peak signal-to-noise ratio 37.82 (1.57), structural similarity index 95.23 (2.60), mean absolute error 29.70 (2.49), and normalized root mean square error 9.48 (0.32), outperforming state-of-the-art deep learning comparison models. For dosimetric validation, based on a 3%/2 mm gamma analysis, the average passing rates of the global and tumor regions between the pseudo-fused images (with a PET/CT weight ratio of 2:8) and the planning CT images are 97.2% and 95.5%, respectively, superior to pseudo-PET/CT fusion images with other weight ratios.
CONCLUSIONS: The pseudo-PET/CT fusion images obtained with the proposed model hold promise as a new modality in radiotherapy for esophageal and nasopharyngeal cancer.
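The four anatomical-verification metrics reported above are standard and can be reproduced with scikit-image; a minimal sketch, assuming float image arrays and the illustrative function name `fusion_metrics`:

```python
# Sketch of the reported image-quality metrics: PSNR, SSIM, MAE, NRMSE.
import numpy as np
from skimage.metrics import (peak_signal_noise_ratio,
                             structural_similarity,
                             normalized_root_mse)

def fusion_metrics(ground_truth: np.ndarray, pseudo_fused: np.ndarray) -> dict:
    rng = ground_truth.max() - ground_truth.min()  # dynamic range for PSNR/SSIM
    return {
        "psnr": peak_signal_noise_ratio(ground_truth, pseudo_fused, data_range=rng),
        "ssim": structural_similarity(ground_truth, pseudo_fused, data_range=rng),
        "mae": float(np.abs(ground_truth - pseudo_fused).mean()),
        "nrmse": normalized_root_mse(ground_truth, pseudo_fused),
    }
```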
Affiliation(s)
- Hongfei Sun
  - Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Liting Chen
  - Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Jie Li
  - Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Zhi Yang
  - Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Jiarui Zhu
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Zhongfei Wang
  - Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Ge Ren
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Jing Cai
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Lina Zhao
  - Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
4
Shafi SM, Chinnappan SK. Hybrid transformer-CNN and LSTM model for lung disease segmentation and classification. PeerJ Comput Sci 2024;10:e2444. [PMID: 39896390] [PMCID: PMC11784776] [DOI: 10.7717/peerj-cs.2444]
Abstract
According to the World Health Organization (WHO), lung disorders are the third leading cause of mortality worldwide, affecting approximately three million individuals annually. These figures underscore the need for control measures such as early diagnosis and accurate treatment. Precise identification from medical images is crucial for pulmonary disease diagnosis, yet it remains a formidable challenge due to the diverse and unpredictable nature of pathological lung appearances and shapes; an efficient lung disease segmentation and classification model is therefore essential. This article proposes a novel lung disease segmentation and classification approach built on a hybrid LinkNet-Modified LSTM (L-MLSTM) model, implemented in four fundamental steps. First, the input lung images are pre-processed using median filtering. Second, an improved Transformer-based convolutional neural network (ITCNN) segments the affected region. Third, essential features such as texture, shape, color, and deep features are retrieved; texture features in particular are extracted using a modified Local Gradient Increasing Pattern (LGIP) and Multi-texton analysis. Finally, the classification step utilizes the hybrid L-MLSTM model. This work leverages two datasets, the COVID-19 normal pneumonia-CT images dataset (Dataset 1) and the Chest CT scan images dataset (Dataset 2), which provide a comprehensive basis for training and evaluation and for robust, generalizable results. The L-MLSTM model outperforms several existing models, including HDE-NN, DBN, LSTM, LINKNET, SVM, Bi-GRU, RNN, CNN, and VGG19 + CNN, with accuracies of 89% and 95% at learning percentages of 70 and 90, respectively, for Datasets 1 and 2. The improved accuracy highlights the model's capability to handle the complexity and variability in lung images; the hybrid approach enhances its ability to distinguish between different types of lung diseases and reduces diagnostic errors compared to existing methods.
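Two stages of this pipeline are concrete enough to sketch: the median-filter pre-processing and an LSTM classification head over per-slice feature vectors. This is a hedged sketch only: a plain `nn.LSTM` stands in for the paper's modified LSTM, and the filter window and feature dimensions are assumptions.

```python
# Sketch of pre-processing (median filter) and an LSTM classification head.
import numpy as np
import torch
import torch.nn as nn
from scipy.ndimage import median_filter

def preprocess(slices: np.ndarray, size: int = 3) -> np.ndarray:
    """Denoise each 2-D slice with a median filter (window size assumed)."""
    return np.stack([median_filter(s, size=size) for s in slices])

class LSTMHead(nn.Module):
    """Classify a sequence of per-slice feature vectors (texture/shape/deep)."""
    def __init__(self, feat_dim=256, hidden=128, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, feats):          # feats: (batch, seq_len, feat_dim)
        _, (h, _) = self.lstm(feats)   # h: (num_layers, batch, hidden)
        return self.fc(h[-1])          # logits: (batch, n_classes)
```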
5
Showrav TT, Hasan MK. Hi-gMISnet: generalized medical image segmentation using DWT based multilayer fusion and dual mode attention into high resolution pGAN. Phys Med Biol 2024;69:115019. [PMID: 38593830] [DOI: 10.1088/1361-6560/ad3cb3]
Abstract
Objective. Automatic medical image segmentation is crucial for accurately isolating target tissue areas from background tissues, facilitating precise diagnoses and procedures. While the proliferation of publicly available clinical datasets has led to the development of deep learning-based medical image segmentation methods, a generalized, accurate, robust, and reliable approach across diverse imaging modalities remains elusive.
Approach. This paper proposes a novel high-resolution parallel generative adversarial network (pGAN)-based generalized deep learning method for automatic segmentation of medical images from diverse imaging modalities. The proposed method achieves better performance and generalizability by incorporating novel components such as partial hybrid transfer learning, discrete wavelet transform (DWT)-based multilayer and multiresolution feature fusion in the encoder, and a dual mode attention gate in the decoder of the multi-resolution U-Net-based GAN. Multi-objective adversarial training loss functions, including a unique reciprocal loss for enforcing cooperative learning in pGANs, further enhance the robustness and accuracy of the segmentation map.
Main results. Experimental evaluations on nine diverse publicly available medical image segmentation datasets, including PhysioNet ICH, BUSI, CVC-ClinicDB, MoNuSeg, GLAS, ISIC-2018, DRIVE, Montgomery, and PROMISE12, demonstrate the proposed method's superior performance. It achieves mean F1 scores of 79.53%, 88.68%, 82.50%, 93.25%, 90.40%, 94.19%, 81.65%, 98.48%, and 90.79%, respectively, on these datasets, surpassing state-of-the-art segmentation methods. The method also demonstrates robust multi-domain segmentation with consistent and reliable performance, and an assessment of its proficiency in identifying small details indicates that the high-resolution generalized medical image segmentation network (Hi-gMISnet) remains precise even when the target area is very small.
Significance. The proposed method provides robust and reliable segmentation performance on medical images and thus has the potential to be used in clinical settings for patient diagnosis.
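As a hedged illustration of DWT-based feature fusion of the kind the encoder uses: decompose two feature maps with a 2-D Haar transform, blend the approximation bands, keep the stronger detail coefficients, and reconstruct. The averaging/maximum fusion rules and the name `dwt_fuse` are assumptions, not the paper's exact scheme.

```python
# Sketch of 2-D DWT feature fusion with PyWavelets.
import numpy as np
import pywt

def dwt_fuse(a: np.ndarray, b: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    cA_a, (cH_a, cV_a, cD_a) = pywt.dwt2(a, wavelet)
    cA_b, (cH_b, cV_b, cD_b) = pywt.dwt2(b, wavelet)
    cA = 0.5 * (cA_a + cA_b)  # blend low-frequency content
    details = tuple(np.where(np.abs(x) >= np.abs(y), x, y)  # keep stronger edges
                    for x, y in ((cH_a, cH_b), (cV_a, cV_b), (cD_a, cD_b)))
    return pywt.idwt2((cA, details), wavelet)
```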
Affiliation(s)
- Tushar Talukder Showrav
  - Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka, 1205, Bangladesh
- Md Kamrul Hasan
  - Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka, 1205, Bangladesh
6
Kunkyab T, Bahrami Z, Zhang H, Liu Z, Hyde D. A deep learning-based framework (Co-ReTr) for auto-segmentation of non-small cell-lung cancer in computed tomography images. J Appl Clin Med Phys 2024;25:e14297. [PMID: 38373289] [DOI: 10.1002/acm2.14297]
Abstract
PURPOSE: Deep learning-based auto-segmentation algorithms can improve clinical workflows by defining accurate regions of interest while reducing manual labor. Over the past decade, convolutional neural networks (CNNs) have become prominent in medical image segmentation, but they are limited in learning long-range spatial dependencies because of the locality of their convolutional layers. Transformers were introduced to address this challenge: with a self-attention mechanism, even the first layer of processing makes connections between distant image locations. This paper presents a novel framework that bridges these two techniques, CNNs and transformers, to segment the gross tumor volume (GTV) accurately and efficiently in computed tomography (CT) images of non-small cell lung cancer (NSCLC) patients.
METHODS: Under this framework, multi-resolution input images are paired with multi-depth backbones to retain the benefits of both high- and low-resolution images in the architecture. A deformable transformer then learns long-range dependencies over the extracted features. To reduce computational complexity and efficiently process multi-scale, multi-depth, high-resolution 3D images, this transformer attends to a small set of key positions identified by the self-attention mechanism. We evaluated the framework on an NSCLC dataset containing 563 training images and 113 test images, benchmarking it against five similar deep learning models.
RESULTS: The experimental results indicate that the proposed framework outperforms other CNN-based, transformer-based, and hybrid methods in terms of Dice score (0.92) and Hausdorff distance (1.33). The proposed model could therefore improve the efficiency of auto-segmentation of early-stage NSCLC in the clinical workflow, and this type of framework may facilitate online adaptive radiotherapy, where an efficient auto-segmentation workflow is required.
CONCLUSIONS: Our deep learning framework, based on CNNs and transformers, performs auto-segmentation efficiently and could assist the clinical radiotherapy workflow.
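The two reported metrics, Dice score and Hausdorff distance, can be computed for binary GTV masks as in this minimal sketch (function names illustrative; assumes non-empty masks):

```python
# Sketch of Dice score and symmetric Hausdorff distance for binary masks.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    inter = np.logical_and(pred, gt).sum()
    return float((2 * inter + eps) / (pred.sum() + gt.sum() + eps))

def hausdorff(pred: np.ndarray, gt: np.ndarray) -> float:
    p, g = np.argwhere(pred), np.argwhere(gt)  # foreground voxel coordinates
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
```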
Affiliation(s)
- Tenzin Kunkyab
  - Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Zhila Bahrami
  - School of Engineering, The University of British Columbia Okanagan Campus, Kelowna, British Columbia, Canada
- Heqing Zhang
  - School of Engineering, The University of British Columbia Okanagan Campus, Kelowna, British Columbia, Canada
- Zheng Liu
  - School of Engineering, The University of British Columbia Okanagan Campus, Kelowna, British Columbia, Canada
- Derek Hyde
  - Department of Medical Physics, BC Cancer - Kelowna, Kelowna, Canada