1. Li G, Xie J, Zhang L, Cheng G, Zhang K, Bai M. Dynamic graph consistency and self-contrast learning for semi-supervised medical image segmentation. Neural Netw 2025; 184:107063. PMID: 39700823. DOI: 10.1016/j.neunet.2024.107063.
Abstract
Semi-supervised medical image segmentation endeavors to exploit a limited set of labeled data in conjunction with a substantial corpus of unlabeled data, with the aim of training models that can match or even exceed the efficacy of fully supervised segmentation models. Despite the potential of this approach, most existing semi-supervised medical image segmentation techniques that employ consistency regularization focus predominantly on spatial consistency at the image level, often neglecting the crucial role of feature-level channel information. To address this limitation, we propose an innovative method that integrates graph convolutional networks with a consistency regularization framework to develop a dynamic graph consistency approach. This method imposes channel-level constraints across different decoders by leveraging high-level features within the network. Furthermore, we introduce a novel self-contrast learning strategy, which performs image-level comparison within the same batch and engages in pixel-level contrastive learning based on pixel positions. This approach sidesteps the difficulty of identifying positive and negative samples in traditional contrastive learning, reduces computational resource consumption, and significantly improves model performance. Our experimental evaluation on three distinct medical image segmentation datasets indicates that the proposed method demonstrates superior performance across a variety of test scenarios.
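To make the pixel-position pairing concrete, here is a minimal, hypothetical sketch (our own names and shapes, not the authors' code): features from two decoders at the same pixel position act as positives, while the same position in the other images of the batch supplies negatives.

```python
import torch
import torch.nn.functional as F

def position_contrast_loss(feat_a, feat_b, tau=0.1):
    """Pixel-position contrastive loss between two decoder feature maps.

    feat_a, feat_b: (B, C, H, W) features from two decoders.
    Positives: same image, same position, across decoders.
    Negatives: same position in the other images of the batch.
    """
    B, C, H, W = feat_a.shape
    a = F.normalize(feat_a, dim=1).permute(2, 3, 0, 1).reshape(H * W, B, C)
    b = F.normalize(feat_b, dim=1).permute(2, 3, 0, 1).reshape(H * W, B, C)
    # (H*W, B, B): similarity of every image pair at each pixel position
    logits = torch.bmm(a, b.transpose(1, 2)) / tau
    # the diagonal entry (same image index) is the positive at each position
    target = torch.arange(B, device=feat_a.device).expand(H * W, B)
    return F.cross_entropy(logits.reshape(-1, B), target.reshape(-1))
```

Because positives and negatives are fixed by batch index and pixel position, no sample mining is needed, which matches the reduced-computation claim above.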
Affiliation(s)
- Gang Li: College of Software, Taiyuan University of Technology, Taiyuan, China
- Jinjie Xie: College of Software, Taiyuan University of Technology, Taiyuan, China
- Ling Zhang: College of Software, Taiyuan University of Technology, Taiyuan, China
- Guijuan Cheng: College of Software, Taiyuan University of Technology, Taiyuan, China
- Kairu Zhang: College of Software, Taiyuan University of Technology, Taiyuan, China
- Mingqi Bai: College of Software, Taiyuan University of Technology, Taiyuan, China
2. Yang Y, Sun G, Zhang T, Wang R, Su J. Semi-supervised medical image segmentation via weak-to-strong perturbation consistency and edge-aware contrastive representation. Med Image Anal 2025; 101:103450. PMID: 39798528. DOI: 10.1016/j.media.2024.103450.
Abstract
Although supervised learning has demonstrated impressive accuracy in medical image segmentation, its reliance on large labeled datasets poses a challenge due to the effort and expertise required for data acquisition. Semi-supervised learning has emerged as a potential solution. However, it tends to yield satisfactory segmentation performance in the central region of the foreground but struggles in the edge region. In this paper, we propose an innovative framework that effectively leverages unlabeled data to improve segmentation performance, especially in edge regions. Our proposed framework includes two novel designs. First, we introduce a weak-to-strong perturbation strategy with a corresponding feature-perturbed consistency loss to efficiently utilize unlabeled data and guide our framework in learning reliable regions. Second, we propose an edge-aware contrastive loss that utilizes uncertainty to select positive pairs, thereby learning discriminative pixel-level features in the edge regions from unlabeled data. In this way, the model minimizes the discrepancy among multiple predictions and improves representation ability, ultimately achieving strong performance in both central and edge regions. We conducted a comparative analysis of segmentation results on the publicly available BraTS2020, LA, and ACDC 2017 datasets. Through extensive quantitative and visualization experiments under three standard semi-supervised settings, we demonstrate the effectiveness of our approach and set a new state of the art for semi-supervised medical image segmentation. Our code is released publicly at https://github.com/youngyzzZ/SSL-w2sPC.
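The weak-to-strong idea can be sketched in a few lines. Below is a hedged, FixMatch-style reading for segmentation (function names are ours; the paper's exact perturbations and its feature-level consistency loss differ): the prediction under a weak perturbation pseudo-labels the prediction under a strong one, masked by confidence.

```python
import torch
import torch.nn.functional as F

def weak_to_strong_loss(model, x_u, weak_aug, strong_aug, thresh=0.95):
    """Consistency on an unlabeled batch x_u. weak_aug/strong_aug are
    assumed to be intensity perturbations that keep pixels spatially
    aligned, so the pseudo-label map transfers position-wise."""
    with torch.no_grad():
        p_weak = torch.softmax(model(weak_aug(x_u)), dim=1)  # (B, K, H, W)
        conf, pseudo = p_weak.max(dim=1)                     # per-pixel label
    logits_strong = model(strong_aug(x_u))
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    mask = (conf >= thresh).float()  # keep only reliable pixels
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```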
Affiliation(s)
- Yang Yang: School of Computer Science and Technology, Harbin Institute of Technology at Shenzhen, Shenzhen, 518055, China
- Guoying Sun: School of Computer Science and Technology, Harbin Institute of Technology at Shenzhen, Shenzhen, 518055, China
- Tong Zhang: Department of Network Intelligence, Peng Cheng Laboratory, Shenzhen, 518055, China
- Ruixuan Wang: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China; Department of Network Intelligence, Peng Cheng Laboratory, Shenzhen, 518055, China
- Jingyong Su: School of Computer Science and Technology, Harbin Institute of Technology at Shenzhen, Shenzhen, 518055, China; National Key Laboratory of Smart Farm Technologies and Systems, Harbin, 150001, China
3. Wen C, Ye M, Li H, Chen T, Xiao X. Concept-Based Lesion Aware Transformer for Interpretable Retinal Disease Diagnosis. IEEE Trans Med Imaging 2025; 44:57-68. PMID: 39012729. DOI: 10.1109/tmi.2024.3429148.
Abstract
Existing deep learning methods have achieved remarkable results in diagnosing retinal diseases, showcasing the potential of advanced AI in ophthalmology. However, the black-box nature of these methods obscures the decision-making process, compromising their trustworthiness and acceptability. Inspired by concept-based approaches and recognizing the intrinsic correlation between retinal lesions and diseases, we regard retinal lesions as concepts and propose an inherently interpretable framework designed to enhance both the performance and explainability of diagnostic models. Leveraging the transformer architecture, known for its proficiency in capturing long-range dependencies, our model can effectively identify lesion features. By integrating image-level annotations, it aligns lesion concepts with human cognition under the guidance of a retinal foundation model. Furthermore, to attain interpretability without losing lesion-specific information, our method employs a classifier built on a cross-attention mechanism for disease diagnosis and explanation, where explanations are grounded in the contributions of human-understandable lesion concepts and their visual localization. Notably, due to the structure and inherent interpretability of our model, clinicians can implement concept-level interventions to correct diagnostic errors by simply adjusting erroneous lesion predictions. Experiments conducted on four fundus image datasets demonstrate that our method achieves favorable performance against state-of-the-art methods while providing faithful explanations and enabling concept-level interventions. Our code is publicly available at https://github.com/Sorades/CLAT.
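As a rough illustration of such a cross-attention classifier (a hypothetical sketch, not the CLAT architecture itself), disease-specific queries can attend over lesion-concept tokens, so the attention weights double as per-concept contributions to each diagnosis:

```python
import torch
import torch.nn as nn

class ConceptCrossAttentionHead(nn.Module):
    """Disease queries attend over lesion-concept tokens; the attention
    weights expose how much each concept contributes to each diagnosis."""
    def __init__(self, n_diseases, n_concepts, dim):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_diseases, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, concept_tokens):
        # concept_tokens: (B, n_concepts, dim), e.g. pooled lesion features
        B = concept_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        out, attn_w = self.attn(q, concept_tokens, concept_tokens)
        logits = self.score(out).squeeze(-1)  # (B, n_diseases)
        return logits, attn_w                 # attn_w: (B, n_diseases, n_concepts)
```

Editing or zeroing one row of `concept_tokens` before the forward pass is one way to realize the concept-level intervention the abstract describes.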
4. Gu Y, Sun Z, Chen T, Xiao X, Liu Y, Xu Y, Najman L. Dual structure-aware image filterings for semi-supervised medical image segmentation. Med Image Anal 2025; 99:103364. PMID: 39418830. DOI: 10.1016/j.media.2024.103364.
Abstract
Semi-supervised image segmentation has attracted great attention recently. The key question is how to leverage unlabeled images in the training process. Most methods maintain consistent predictions on unlabeled images under variations (e.g., adding noise/perturbations or creating alternative versions) at the image and/or model level. However, most image-level variations ignore the prior structural information of medical images, which remains under-explored. In this paper, we propose novel dual structure-aware image filterings (DSAIF) as image-level variations for semi-supervised medical image segmentation. Motivated by connected filtering, which simplifies an image by filtering a structure-aware, tree-based image representation, we resort to the dual, contrast-invariant Max-tree and Min-tree representations. Specifically, we propose a novel connected filtering that removes topologically equivalent nodes (i.e., connected components) having no siblings in the Max/Min-tree. This results in two filtered images that preserve topologically critical structure. Applying the proposed DSAIF to mutually supervised networks decreases the consensus of their erroneous predictions on unlabeled images. This helps alleviate the confirmation bias caused by overfitting to noisy pseudo labels of unlabeled images, and thus effectively improves segmentation performance. Extensive experimental results on three benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art methods. The source code will be publicly available.
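The mutual-supervision side of this design is straightforward to sketch. Below is a minimal, hypothetical loop (names ours) in which each network sees a differently filtered view of the same unlabeled batch and learns from the other's pseudo labels; the Max-/Min-tree sibling-removal filterings themselves are abstracted as `filter_a` and `filter_b`, since they are the paper's contribution.

```python
import torch.nn.functional as F

def dual_filtered_mutual_step(net_a, net_b, x_u, filter_a, filter_b):
    """One unlabeled step of cross pseudo supervision. filter_a/filter_b
    stand in for the paper's Max-tree and Min-tree structure-aware
    filterings, which give the two networks decorrelated views."""
    logits_a = net_a(filter_a(x_u))
    logits_b = net_b(filter_b(x_u))
    pseudo_a = logits_a.argmax(dim=1).detach()
    pseudo_b = logits_b.argmax(dim=1).detach()
    # each network is supervised by the other's hard pseudo labels
    return F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)
```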
Affiliation(s)
- Yuliang Gu: National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China; Medical Artificial Intelligence Research Institute of Renmin Hospital, Wuhan University, Wuhan, China
- Zhichao Sun: National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China; Medical Artificial Intelligence Research Institute of Renmin Hospital, Wuhan University, Wuhan, China
- Tian Chen: National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China; Medical Artificial Intelligence Research Institute of Renmin Hospital, Wuhan University, Wuhan, China
- Xin Xiao: National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China; Medical Artificial Intelligence Research Institute of Renmin Hospital, Wuhan University, Wuhan, China
- Yepeng Liu: National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China; Medical Artificial Intelligence Research Institute of Renmin Hospital, Wuhan University, Wuhan, China
- Yongchao Xu: National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China; Medical Artificial Intelligence Research Institute of Renmin Hospital, Wuhan University, Wuhan, China
- Laurent Najman: Univ Gustave Eiffel, CNRS, LIGM, Marne-la-Vallée, France
5. Shah J, Che Y, Sohankar J, Luo J, Li B, Su Y, Wu T. Enhancing Amyloid PET Quantification: MRI-Guided Super-Resolution Using Latent Diffusion Models. Life (Basel) 2024; 14:1580. PMID: 39768288. PMCID: PMC11678505. DOI: 10.3390/life14121580.
Abstract
Amyloid PET imaging plays a crucial role in the diagnosis and research of Alzheimer's disease (AD), allowing non-invasive detection of amyloid-β plaques in the brain. However, the low spatial resolution of PET scans limits the accurate quantification of amyloid deposition due to partial volume effects (PVE). In this study, we propose a novel approach to addressing PVE using a latent diffusion model for resolution recovery (LDM-RR) of PET imaging. We leverage a synthetic data generation pipeline to create high-resolution PET digital phantoms for model training. The proposed LDM-RR model incorporates a weighted combination of L1, L2, and MS-SSIM losses at both noise and image scales to enhance MRI-guided reconstruction. We evaluated the model's performance in improving statistical power for detecting longitudinal changes and enhancing agreement between amyloid PET measurements from different tracers. The results demonstrate that the LDM-RR approach significantly improves PET quantification accuracy, reduces inter-tracer variability, and enhances the detection of subtle changes in amyloid deposition over time. We show that deep learning has the potential to improve PET quantification in AD, effectively contributing to the early detection and monitoring of disease progression.
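The composite reconstruction loss is the most concrete piece of the training recipe. Here is a hedged, single-scale sketch (assuming the third-party pytorch_msssim package; the weights are illustrative, and the paper applies the combination at both the noise and the image scale):

```python
import torch
from pytorch_msssim import ms_ssim  # third-party package, assumed available

def rr_loss(pred, target, w1=1.0, w2=1.0, w3=1.0):
    """Weighted L1 + L2 + MS-SSIM reconstruction loss.

    pred, target: (N, C, H, W) tensors with intensities in [0, 1].
    """
    l1 = torch.mean(torch.abs(pred - target))
    l2 = torch.mean((pred - target) ** 2)
    # MS-SSIM is a similarity in [0, 1]; use 1 - MS-SSIM as a loss term
    ssim_loss = 1.0 - ms_ssim(pred, target, data_range=1.0)
    return w1 * l1 + w2 * l2 + w3 * ssim_loss
```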
Affiliation(s)
- Jay Shah: School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA; ASU-Mayo Center for Innovative Imaging, Arizona State University, Tempe, AZ 85287, USA
- Yiming Che: School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA; ASU-Mayo Center for Innovative Imaging, Arizona State University, Tempe, AZ 85287, USA
- Javad Sohankar: Banner Alzheimer’s Institute, Banner Health, Phoenix, AZ 85006, USA
- Ji Luo: Banner Alzheimer’s Institute, Banner Health, Phoenix, AZ 85006, USA
- Baoxin Li: School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA; ASU-Mayo Center for Innovative Imaging, Arizona State University, Tempe, AZ 85287, USA
- Yi Su: Banner Alzheimer’s Institute, Banner Health, Phoenix, AZ 85006, USA
- Teresa Wu: School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA; ASU-Mayo Center for Innovative Imaging, Arizona State University, Tempe, AZ 85287, USA
6. Li Z, Xie S. EBC-Net: 3D semi-supervised segmentation of pancreas based on edge-biased consistency regularization in dual perturbation space. Med Phys 2024; 51:8260-8271. PMID: 39042373. DOI: 10.1002/mp.17323.
Abstract
BACKGROUND: Deep learning technology has made remarkable progress in pancreatic image segmentation tasks. However, annotating 3D medical images is time-consuming and requires expertise, and existing semi-supervised segmentation methods perform poorly on organs with blurred edges in enhanced CT, such as the pancreas.
PURPOSE: To address the challenges of limited labeled data and indistinct boundaries of regions of interest (ROI).
METHODS: We propose Edge-Biased Consistency Regularization (EBC-Net). 3D edge detection is employed to construct edge perturbations and integrate edge prior information into the limited data, aiding the network in learning from unlabeled data. Additionally, because a single perturbation space is one-sided, we expand the perturbation space to the dual level of images and features, focusing the model's attention more efficiently on the edges of the ROI. Finally, inspired by the clinical habits of doctors, we propose a 3D Anatomical Invariance Extraction Module and Anatomical Attention to capture anatomy-invariant features.
RESULTS: Extensive experiments have demonstrated that our method outperforms state-of-the-art methods in semi-supervised pancreas image segmentation. Moreover, it better preserves the morphology of pancreatic organs and excels in edge-region accuracy.
CONCLUSIONS: Incorporating edge prior knowledge, our method mixes perturbations in a dual perturbation space, shifting the network's attention to fuzzy edge regions using only a few labeled samples. These ideas have been verified on the pancreas segmentation dataset.
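One plausible reading of the edge perturbation, sketched with our own names (not the paper's code): detect 3D edges with a gradient filter and concentrate the injected noise there, so the consistency constraint is biased toward boundary voxels.

```python
import numpy as np
from scipy import ndimage

def edge_biased_perturbation(volume, sigma=0.1, rng=None):
    """Perturb a 3D volume with noise weighted toward detected edges.

    volume: 3D float array. sigma: noise scale (illustrative).
    """
    rng = rng if rng is not None else np.random.default_rng()
    # 3D gradient magnitude as a simple edge detector
    grads = [ndimage.sobel(volume, axis=ax) for ax in range(3)]
    edges = np.sqrt(sum(g ** 2 for g in grads))
    edges = edges / (edges.max() + 1e-8)  # normalize edge map to [0, 1]
    noise = rng.normal(0.0, sigma, volume.shape)
    return volume + edges * noise         # noise concentrated on edges
```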
Affiliation(s)
- Zheng Li: School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
- Shipeng Xie: School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
7. Huang K, Ma X, Zhang Z, Zhang Y, Yuan S, Fu H, Chen Q. Diverse Data Generation for Retinal Layer Segmentation With Potential Structure Modeling. IEEE Trans Med Imaging 2024; 43:3584-3595. PMID: 38587957. DOI: 10.1109/tmi.2024.3384484.
Abstract
Accurate retinal layer segmentation on optical coherence tomography (OCT) images is hampered by the difficulty of collecting OCT images with diverse pathological characterization and balanced distribution. Current generative models can produce highly realistic images and corresponding labels in unlimited quantities by fitting the distribution of real collected data. Nevertheless, the diversity of their generated data is still limited by the inherent imbalance of the training data. To address these issues, we propose an image-label pair generation framework that generates diverse and balanced potential data from imbalanced real samples. Specifically, the framework first generates diverse layer masks and then generates plausible OCT images corresponding to these layer masks, using two customized diffusion probabilistic models respectively. To learn from imbalanced data and facilitate balanced generation, we introduce pathology-related conditions to guide the generation processes. To enhance the diversity of the generated image-label pairs, we propose a potential structure modeling technique that transfers the knowledge of diverse sub-structures from mildly pathological or non-pathological samples to highly pathological samples. We conducted extensive experiments on two public datasets for retinal layer segmentation. First, our method generates OCT images with higher quality and diversity than other generative methods. Furthermore, downstream retinal layer segmentation tasks trained on the generated OCT images show improved results. The code is publicly available at: https://github.com/nicetomeetu21/GenPSM.
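The two-stage pipeline can be summarized as a short driver. This is only a structural skeleton (both diffusion models, their `sample` methods, and the condition encoding are placeholders for the paper's customized DPMs):

```python
def generate_pairs(mask_dpm, image_dpm, n, conditions):
    """Two-stage sampling skeleton: a mask model draws a layer mask under
    a pathological condition, then an image model draws an OCT image
    conditioned on that mask. Cycling over conditions balances the
    generated set across pathologies."""
    pairs = []
    for i in range(n):
        cond = conditions[i % len(conditions)]  # balance over pathologies
        mask = mask_dpm.sample(cond)            # stage 1: diverse layer mask
        image = image_dpm.sample(mask, cond)    # stage 2: plausible OCT image
        pairs.append((image, mask))
    return pairs
```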
8. Chai L, Xue S, Tang D, Liu J, Sun N, Liu X. TLF: Triple learning framework for intracranial aneurysms segmentation from unreliable labeled CTA scans. Comput Med Imaging Graph 2024; 116:102421. PMID: 39084165. DOI: 10.1016/j.compmedimag.2024.102421.
Abstract
Intracranial aneurysm (IA) is a prevalent disease that poses a significant threat to human health. The use of computed tomography angiography (CTA) as a diagnostic tool for IAs remains time-consuming and challenging. Deep neural networks (DNNs) have made significant advancements in the field of medical image segmentation. Nevertheless, training large-scale DNNs demands substantial quantities of high-quality labeled data, making the annotation of numerous brain CTA scans a challenging endeavor. To address these challenges and effectively develop a robust IA segmentation model from a large amount of unreliably labeled training data, we propose a triple learning framework (TLF). The framework primarily consists of three learning paradigms: pseudo-supervised learning, contrastive learning, and confident learning. This paper introduces an enhanced mean teacher model and a voxel-selective strategy to conduct pseudo-supervised learning on unreliably labeled training data. Concurrently, we construct positive and negative training pairs within the high-level semantic feature space to improve the overall learning efficiency of the TLF through contrastive learning. In addition, a multi-scale confident learning scheme is proposed to correct unreliable labels, which acquires broader local structural information instead of relying on individual voxels. To evaluate the effectiveness of our method, we conducted extensive experiments on a self-built database of hundreds of brain CTA scans with IAs. Experimental results demonstrate that our method can effectively learn a robust CTA-based IA segmentation model from unreliably labeled data, outperforming state-of-the-art methods in segmentation accuracy. Code is released at https://github.com/XueShuangqian/TLF.
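Two of the building blocks are standard enough to sketch. Below is a minimal, hypothetical version (our names) of the mean-teacher weight update and a confidence-based voxel-selective step; the paper's enhancements to both are not reproduced here.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher update: teacher weights track an exponential moving
    average of the student's weights."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1.0 - alpha)

@torch.no_grad()
def select_voxels(teacher_logits, thresh=0.9):
    """Voxel-selective strategy (our reading): keep only voxels where the
    teacher is confident enough for its pseudo label to be trusted."""
    probs = torch.softmax(teacher_logits, dim=1)  # (B, K, D, H, W)
    conf, pseudo = probs.max(dim=1)
    return pseudo, (conf >= thresh)               # labels and trust mask
```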
Affiliation(s)
- Lei Chai: Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Shuangqian Xue: Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Daodao Tang: Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Jixin Liu: Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Ning Sun: Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Xiujuan Liu: Department of Radiology, Zhuhai People's Hospital (Zhuhai Clinical Medical College of Jinan University), Zhuhai 519000, China
9. Li W, Song H, Ai D, Shi J, Wang Y, Wu W, Yang J. Semi-supervised segmentation of orbit in CT images with paired copy-paste strategy. Comput Biol Med 2024; 171:108176. PMID: 38401453. DOI: 10.1016/j.compbiomed.2024.108176.
Abstract
The segmentation of the orbit in computed tomography (CT) images plays a crucial role in facilitating the quantitative analysis of orbital decompression surgery for patients with Thyroid-associated Ophthalmopathy (TAO). However, the task of orbit segmentation, particularly in postoperative images, remains challenging due to significant shape variation and the limited amount of labeled data. In this paper, we present a two-stage semi-supervised framework for the automatic segmentation of the orbit in both preoperative and postoperative images, consisting of a pseudo-label generation stage and a semi-supervised segmentation stage. A Paired Copy-Paste strategy is also introduced to combine features extracted from preoperative and postoperative images, strengthening the network's ability to discern changes in orbital boundaries. More specifically, we employ a random cropping technique to transfer regions from labeled preoperative images (foreground) onto unlabeled postoperative images (background), as well as from unlabeled preoperative images (foreground) onto labeled postoperative images (background). Every preoperative-postoperative pair belongs to the same patient. The semi-supervised segmentation network (stage 2) processes the two mixed images using a combination of supervisory signals from pseudo labels (stage 1) and ground truth. The proposed method was trained and tested on a CT dataset obtained from the Eye Hospital of Wenzhou Medical University. The experimental results demonstrate that the proposed method achieves a mean Dice similarity coefficient (DSC) of 91.92% with only 5% labeled data, surpassing the current state-of-the-art method by 2.4%.
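The copy-paste mixing itself is simple to write down. Here is a hedged sketch of one direction (labeled preoperative crop pasted onto the unlabeled postoperative image of the same patient; the symmetric direction is analogous, and all names, the crop fraction, and the signed-integer label convention are our assumptions):

```python
import numpy as np

def paired_copy_paste(pre_img, pre_lbl, post_img, rng=None, frac=0.5):
    """Paste a random crop from a labeled preoperative image onto the
    unlabeled postoperative image of the same patient. Images are
    assumed to share shape; labels are assumed signed integers, with
    -1 marking pixels to be supervised by pseudo labels in stage 2."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = pre_img.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    mixed = post_img.copy()
    mixed[y:y + ch, x:x + cw] = pre_img[y:y + ch, x:x + cw]
    # the pasted region carries preoperative ground-truth labels
    mixed_lbl = np.full(pre_lbl.shape, -1, dtype=pre_lbl.dtype)
    mixed_lbl[y:y + ch, x:x + cw] = pre_lbl[y:y + ch, x:x + cw]
    return mixed, mixed_lbl
```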
Affiliation(s)
- Wentao Li: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai: School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jieliang Shi: Eye Hospital of Wenzhou Medical University, Wenzhou, 325027, China
- Yuanyuan Wang: School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Wencan Wu: Eye Hospital of Wenzhou Medical University, Wenzhou, 325027, China
- Jian Yang: School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
10. Murmu A, Kumar P. GIFNet: an effective global infection feature network for automatic COVID-19 lung lesions segmentation. Med Biol Eng Comput 2024. PMID: 38308670. DOI: 10.1007/s11517-024-03024-z.
Abstract
The COronaVIrus Disease 2019 (COVID-19) pandemic, caused by the SARS-CoV-2 virus, emerged in late 2019 and spread worldwide, bringing about a global health catastrophe. Automatic segmentation of infected lungs from COVID-19 X-ray and computed tomography (CT) images supports a quantitative approach to treatment and diagnosis. Multi-class information about the infected lung is often obtained from the patient's CT dataset. However, the main challenges are the extensive range of infected features and the lack of contrast between infected and normal areas. To resolve these issues, a novel Global Infection Feature Network (GIFNet)-based UNet with a ResNet50 model is proposed for segmenting the locations of COVID-19 lung infections. The UNet layers are used to extract features from the input images and select the region of interest (ROI), while the ResNet50 component speeds up training. Moreover, integrating a pooling layer into the atrous spatial pyramid pooling (ASPP) mechanism in the bottleneck improves feature selection and handles scale variation during training. Furthermore, a partial differential equation (PDE) approach is used to enhance image quality and the intensity values at ROI boundary edges in the COVID-19 images. The proposed scheme has been validated on two datasets, the SARS-CoV-2 CT scan and COVIDx-19 datasets, for detecting infected lung segmentation (ILS). The experimental findings were analyzed using a range of evaluation metrics, including accuracy (ACC), area under the curve (AUC), recall (REC), specificity (SPE), dice similarity coefficient (DSC), mean absolute error (MAE), precision (PRE), and mean squared error (MSE), to ensure rigorous validation. The results demonstrate the superior performance of the proposed system compared to state-of-the-art (SOTA) segmentation models on both X-ray and CT datasets.
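The ASPP bottleneck with an added pooling branch is a well-known construction; a minimal sketch follows (dilation rates and channel counts are illustrative, not GIFNet's exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling with an extra global-pooling branch,
    of the kind integrated into the GIFNet bottleneck."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        ys = [b(x) for b in self.branches]
        # global-pooling branch, upsampled back to the feature size
        g = F.interpolate(self.pool(x), size=x.shape[2:], mode="bilinear",
                          align_corners=False)
        return self.project(torch.cat(ys + [g], dim=1))
```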
Affiliation(s)
- Anita Murmu: Computer Science and Engineering Department, National Institute of Technology Patna, Ashok Rajpath, Patna, Bihar, 800005, India
- Piyush Kumar: Computer Science and Engineering Department, National Institute of Technology Patna, Ashok Rajpath, Patna, Bihar, 800005, India
11. Yue G, Yang C, Zhao Z, An Z, Yang Y. ERGPNet: lesion segmentation network for COVID-19 chest X-ray images based on embedded residual convolution and global perception. Front Physiol 2023; 14:1296185. PMID: 38028767. PMCID: PMC10679680. DOI: 10.3389/fphys.2023.1296185.
Abstract
The segmentation of infected areas from COVID-19 chest X-ray (CXR) images is of great significance for the diagnosis and treatment of patients. However, accurately and effectively segmenting infected areas of CXR images remains challenging due to the inherent ambiguity of CXR images and the cross-scale variations in infected regions. To address these issues, this article proposes ERGPNet, a network based on embedded residual convolution and global perception, to segment lesion regions in COVID-19 CXR images. First, to address the inherent fuzziness of CXR images, an embedded residual convolution structure is proposed to enhance internal feature extraction. Second, a global information perception module is constructed to guide the network in generating long-distance information flow, alleviating the interference of cross-scale variations with the algorithm's discrimination ability. Finally, the network's sensitivity to target regions is improved, and noise interference is suppressed, through parallel spatial and serial channel attention modules. The interactions between these modules fully establish the mapping between feature representation and information decision-making, improving the accuracy of lesion segmentation. Extensive experiments were conducted on three datasets of COVID-19 CXR images, and the results demonstrate that the proposed method outperforms other state-of-the-art CXR segmentation methods.
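One plausible reading of the attention arrangement (a hypothetical sketch, not the ERGPNet modules themselves): an SE-style channel attention applied in series, combined with a spatial attention branch in parallel.

```python
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    """Serial channel attention plus a parallel spatial attention branch
    (our reading of the paper's module layout; reduction ratio and
    kernel size are illustrative)."""
    def __init__(self, ch, r=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(ch, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        y = x * self.channel(x)          # serial channel re-weighting
        return y + x * self.spatial(x)   # parallel spatial branch
```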
Affiliation(s)
- Gongtao Yue: School of Computer Science, Xijing University, Xi’an, China
- Chen Yang: School of Computer Science, Xijing University, Xi’an, China
- Zhengyang Zhao: School of Information and Navigation, Air Force Engineering University, Xi’an, China
- Ziheng An: School of Integrated Circuits, Anhui University, Hefei, China
- Yongsheng Yang: School of Computer Science, Xijing University, Xi’an, China
12. Lyu F, Ye M, Yip TCF, Wong GLH, Yuen PC. Local Style Transfer via Latent Space Manipulation for Cross-Disease Lesion Segmentation. IEEE J Biomed Health Inform 2023; PP:273-284. PMID: 37883256. DOI: 10.1109/jbhi.2023.3327726.
Abstract
Automatic lesion segmentation is important for assisting doctors in the diagnostic process. Recent deep learning approaches rely heavily on large-scale datasets, which are difficult to obtain in many clinical applications. Leveraging external labeled datasets is an effective way to tackle insufficient training data. In this paper, we propose a new framework, LatenTrans, that utilizes existing datasets to boost lesion segmentation performance in extremely low data regimes. LatenTrans translates non-target lesions into target-like lesions and expands the training dataset with the target-like data for better performance. Images are first projected into the latent space via aligned style-based generative models, with rich lesion semantics encoded by the latent codes. A novel consistency-aware latent code manipulation module is proposed to enable high-quality local style transfer from non-target to target-like lesions while preserving the rest of the image. Moreover, we propose a new metric, Normalized Latent Distance, to guide the selection of an adequate source dataset for knowledge transfer among various existing ones. Extensive experiments on segmenting lung and brain lesions demonstrate that the proposed LatenTrans is superior to existing methods for cross-disease lesion segmentation.
13. Yang Q, Ye M, Cai Z, Su K, Du B. Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer. IEEE Trans Image Process 2023; 32:4543-4554. PMID: 37531308. DOI: 10.1109/tip.2023.3299791.
Abstract
Composing Text and Image to Image Retrieval (CTI-IR) aims at finding the target image that matches the query image visually and the query text semantically. However, existing works ignore the fact that the reference text usually serves multiple functions, e.g., modification and auxiliary description. To address this issue, we put forth a unified solution: a Hierarchical Aggregation Transformer incorporated with a Cross Relation Network (CRN). CRN unifies the modification and relevance manners in a single framework. This configuration shows broader applicability, enabling us to model modification text, auxiliary text, or their combination in triplet relationships simultaneously. Specifically, CRN includes: 1) a Cross Relation Network that comprehensively captures the relationships of the composed retrieval scenarios induced by the two query text types, allowing a unified retrieval model to designate adaptive combination strategies for flexible applicability; and 2) a Hierarchical Aggregation Transformer that aggregates top-down features with a Multi-Layer Perceptron (MLP) to overcome the loss of edge information in a window-based multi-stage Transformer. Extensive experiments demonstrate the superiority of the proposed CRN on three fashion-domain datasets. Code is available at github.com/yan9qu/crn.
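The CTI-IR task itself reduces to a fuse-then-rank step. The following is a generic, hypothetical sketch of that step only (the `fuse` callable stands in for CRN's relation-aware combiner, which is the paper's contribution):

```python
import torch
import torch.nn.functional as F

def composed_retrieval_scores(img_emb, txt_emb, target_embs, fuse):
    """Score candidate targets for (query image, query text) pairs:
    fuse the two query embeddings, then rank targets by cosine
    similarity of the fused query against each candidate."""
    q = F.normalize(fuse(img_emb, txt_emb), dim=-1)  # (B, D) fused queries
    t = F.normalize(target_embs, dim=-1)             # (N, D) candidates
    return q @ t.t()                                 # (B, N) similarity matrix

# e.g. a trivial additive fusion as a placeholder combiner:
# fuse = lambda i, t: i + t
```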