1
Guo X, Wen H, Hao H, Zhao Y, Meng Y, Liu J, Zheng Y, Chen W, Zhao Y. Randomness-Restricted Diffusion Model for Ocular Surface Structure Segmentation. IEEE Transactions on Medical Imaging 2025; 44:1359-1372. [PMID: 39527437] [DOI: 10.1109/tmi.2024.3494762]
Abstract
Ocular surface diseases affect a significant portion of the population worldwide. Accurate segmentation and quantification of the different ocular surface structures are crucial for understanding these diseases and for clinical decision-making. However, automated segmentation of ocular surface structures remains relatively unexplored and faces several challenges: structure boundaries are often inconspicuous and obscured by glare from reflections, and segmenting different ocular structures typically requires training multiple individual models, so a one-model-fits-all segmentation approach is desirable. In this paper, we introduce a randomness-restricted diffusion model for multiple ocular surface structure segmentation. First, a time-controlled fusion-attention module (TFM) is proposed to dynamically adjust the information flow within the diffusion model based on the relationship between the network's input and the diffusion time step. TFM enables the network to effectively utilize image features to constrain the randomness of the generation process. We further propose a low-frequency consistency filter and a new loss to alleviate the model uncertainty and error accumulation caused by the multi-step denoising process. Extensive experiments show that our approach can segment seven different ocular surface structures and performs better than both dedicated ocular surface segmentation methods and general medical image segmentation methods. We further validated the proposed method on two clinical datasets, and the results demonstrate that it benefits clinical applications such as meibomian gland dysfunction grading and aqueous-deficient dry eye diagnosis.
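To make the time-controlled fusion idea concrete, here is a minimal, hypothetical PyTorch sketch of a timestep-gated fusion-attention block; the module name, tensor shapes, and gating design are illustrative assumptions, not the authors' TFM implementation.

# Hypothetical sketch, not the authors' TFM implementation.
import torch
import torch.nn as nn

class TimeControlledFusion(nn.Module):
    """Fuses image-conditioning features into the denoising features, with the
    mixing strength predicted from the diffusion timestep embedding."""
    def __init__(self, channels: int, time_dim: int, heads: int = 4):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(time_dim, channels), nn.Sigmoid())
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, denoise_feat, image_feat, t_emb):
        # denoise_feat, image_feat: (B, N, C) token features; t_emb: (B, time_dim)
        g = self.gate(t_emb).unsqueeze(1)                  # (B, 1, C) time-dependent gate
        fused, _ = self.attn(denoise_feat, image_feat, image_feat)
        return denoise_feat + g * fused                    # image features restrict randomness

x = torch.randn(2, 64, 128)   # features of the noisy mask
c = torch.randn(2, 64, 128)   # features of the conditioning image
t = torch.randn(2, 256)       # timestep embedding
print(TimeControlledFusion(128, 256)(x, c, t).shape)       # torch.Size([2, 64, 128])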
2
Jiang X, Zhang D, Li X, Liu K, Cheng KT, Yang X. Labeled-to-unlabeled distribution alignment for partially-supervised multi-organ medical image segmentation. Med Image Anal 2025; 99:103333. [PMID: 39244795] [DOI: 10.1016/j.media.2024.103333]
Abstract
Partially-supervised multi-organ medical image segmentation aims to develop a unified semantic segmentation model from multiple partially-labeled datasets, each providing labels for a single organ class. However, the limited availability of labeled foreground organs and the absence of supervision for distinguishing unlabeled foreground organs from the background pose a significant challenge, leading to a distribution mismatch between labeled and unlabeled pixels. Although existing pseudo-labeling methods can learn from both labeled and unlabeled pixels, they are prone to performance degradation in this task because they assume that labeled and unlabeled pixels share the same distribution. To address this distribution mismatch, we propose a labeled-to-unlabeled distribution alignment (LTUDA) framework that aligns feature distributions and enhances discriminative capability. Specifically, we introduce a cross-set data augmentation strategy that performs region-level mixing between labeled and unlabeled organs to reduce the distribution discrepancy and enrich the training set. In addition, we propose a prototype-based distribution alignment method that implicitly reduces intra-class variation and increases the separation between the unlabeled foreground and the background; this is achieved by encouraging consistency between the outputs of two prototype classifiers and a linear classifier. Extensive experiments on the AbdomenCT-1K dataset and a union of four benchmark datasets (LiTS, MSD-Spleen, KiTS, and NIH82) demonstrate that our method outperforms state-of-the-art partially-supervised methods by a considerable margin and even surpasses fully-supervised methods. The source code is publicly available at LTUDA.
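As an illustration of region-level mixing between labeled and unlabeled data, here is a minimal CutMix-style sketch in PyTorch; the function name, rectangular region, and tensor layouts are assumptions for illustration, not the paper's exact augmentation.

# Hypothetical sketch of cross-set region mixing, for illustration only.
import torch

def cross_set_region_mix(labeled_img, labeled_mask, unlabeled_img, pseudo_mask, box):
    """Paste a rectangular region of the labeled image (and its label) into the
    unlabeled image (and its pseudo-label) to reduce the distribution gap."""
    x0, y0, x1, y1 = box
    mixed_img, mixed_mask = unlabeled_img.clone(), pseudo_mask.clone()
    mixed_img[..., y0:y1, x0:x1] = labeled_img[..., y0:y1, x0:x1]
    mixed_mask[..., y0:y1, x0:x1] = labeled_mask[..., y0:y1, x0:x1]
    return mixed_img, mixed_mask

img_l, msk_l = torch.rand(1, 1, 64, 64), torch.randint(0, 2, (1, 64, 64))
img_u, msk_u = torch.rand(1, 1, 64, 64), torch.randint(0, 2, (1, 64, 64))
mix_img, mix_msk = cross_set_region_mix(img_l, msk_l, img_u, msk_u, (16, 16, 48, 48))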
Affiliation(s)
- Xixi Jiang, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Dong Zhang, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Xiang Li, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
- Kangyi Liu, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
- Kwang-Ting Cheng, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Xin Yang, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
3
Fu W, Hu H, Li X, Guo R, Chen T, Qian X. A Generalizable Causal-Invariance-Driven Segmentation Model for Peripancreatic Vessels. IEEE Transactions on Medical Imaging 2024; 43:3794-3806. [PMID: 38739508] [DOI: 10.1109/tmi.2024.3400528]
Abstract
Segmenting peripancreatic vessels in CT, including the superior mesenteric artery (SMA), the coeliac artery (CA), and the partial portal venous system (PPVS), is crucial for preoperative resectability analysis in pancreatic cancer. However, the clinical applicability of vessel segmentation methods is impeded by their low generalizability on multi-center data, mainly attributable to wide variations in image appearance, which act as a spurious correlation factor. We therefore propose a causal-invariance-driven generalizable segmentation model for peripancreatic vessels. It incorporates interventions at both the image and feature levels to guide the model to capture causal information by enforcing consistency across datasets, thus enhancing generalization performance. Specifically, a contrast-driven image intervention strategy first constructs image-level interventions by generating images with various contrast-related appearances and seeking invariant causal features. Second, a feature intervention strategy simulates various patterns of feature bias across different centers to pursue invariant predictions. The proposed model achieved high DSC scores (79.69%, 82.62%, and 83.10%) for the three vessels on a cross-validation set of 134 cases, and its generalizability was further confirmed on three independent test sets comprising 233 cases. Overall, the proposed method provides an accurate and generalizable segmentation model for peripancreatic vessels and offers a promising paradigm for increasing the generalizability of segmentation models from a causality perspective. Our source code will be released at https://github.com/SJTUBME-QianLab/PC_VesselSeg.
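A minimal sketch of the intervention-and-consistency idea, assuming a simple gamma-based contrast perturbation and a softmax consistency loss; these specific choices are illustrative stand-ins and are not taken from the paper.

# Hypothetical sketch of an image-level intervention with a prediction-consistency loss.
import torch
import torch.nn.functional as F

def contrast_intervention(img, gamma_range=(0.7, 1.4)):
    """Randomly re-map image contrast so that only contrast-invariant
    (assumed causal) cues remain predictive."""
    gamma = torch.empty(1).uniform_(*gamma_range).item()
    img01 = (img - img.min()) / (img.max() - img.min() + 1e-6)
    return img01.pow(gamma)

def invariance_loss(logits_a, logits_b):
    """Penalize disagreement between predictions under two different interventions."""
    return F.mse_loss(torch.softmax(logits_a, dim=1), torch.softmax(logits_b, dim=1))

x = torch.rand(2, 1, 32, 32, 32)                  # a CT patch
x_a, x_b = contrast_intervention(x), contrast_intervention(x)
# total = seg_loss(model(x_a), y) + invariance_loss(model(x_a), model(x_b))  # model is assumed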
4
Liu J, Zhang Y, Wang K, Yavuz MC, Chen X, Yuan Y, Li H, Yang Y, Yuille A, Tang Y, Zhou Z. Universal and extensible language-vision models for organ segmentation and tumor detection from abdominal computed tomography. Med Image Anal 2024; 97:103226. [PMID: 38852215] [DOI: 10.1016/j.media.2024.103226]
Abstract
The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and learning scheme. To overcome these limitations, we propose a universal, extensible framework enabling a single model, termed Universal Model, to deal with multiple public datasets and adapt to new classes (e.g., organs/tumors). Firstly, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models, enriching semantic encoding compared with one-hot encoding. Secondly, the conventional output layers are replaced with lightweight, class-specific heads, allowing Universal Model to simultaneously segment 25 organs and six types of tumors and ease the addition of new classes. We train our Universal Model on 3410 CT volumes assembled from 14 publicly available datasets and then test it on 6173 CT volumes from four external datasets. Universal Model achieves first place on six CT tasks in the Medical Segmentation Decathlon (MSD) public leaderboard and leading performance on the Beyond The Cranial Vault (BTCV) dataset. In summary, Universal Model exhibits remarkable computational efficiency (6× faster than other dataset-specific models), demonstrates strong generalization across different hospitals, transfers well to numerous downstream tasks, and more importantly, facilitates the extensibility to new classes while alleviating the catastrophic forgetting of previously learned classes. Codes, models, and datasets are available at https://github.com/ljwztc/CLIP-Driven-Universal-Model.
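To illustrate the language-driven parameter generator, here is a minimal PyTorch sketch in which text embeddings produce per-class 1x1x1 convolution weights; the class and layer names, dimensions, and single linear generator are assumptions, not the released implementation (see the linked repository for the actual code).

# Hypothetical sketch of a language-driven, class-specific segmentation head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageDrivenHead(nn.Module):
    """Generates per-class 1x1x1 convolution parameters from text embeddings, so
    adding a class only needs a new prompt embedding rather than a new output layer."""
    def __init__(self, text_dim: int, feat_dim: int):
        super().__init__()
        self.param_gen = nn.Linear(text_dim, feat_dim + 1)   # per-class weights + bias

    def forward(self, feat, text_emb):
        # feat: (B, C, D, H, W) decoder features; text_emb: (K, text_dim) for K classes
        params = self.param_gen(text_emb)                    # (K, C + 1)
        w = params[:, :-1].view(text_emb.size(0), feat.size(1), 1, 1, 1)
        b = params[:, -1]
        return F.conv3d(feat, w, b)                          # (B, K, D, H, W) class logits

feat = torch.randn(1, 32, 8, 8, 8)
text = torch.randn(6, 512)                                   # e.g. embeddings of 6 class prompts
print(LanguageDrivenHead(512, 32)(feat, text).shape)         # torch.Size([1, 6, 8, 8, 8])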
Affiliation(s)
- Jie Liu, City University of Hong Kong, Hong Kong
- Yixiao Zhang, Johns Hopkins University, United States of America
- Kang Wang, University of California, San Francisco, United States of America
- Mehmet Can Yavuz, University of California, San Francisco, United States of America
- Xiaoxi Chen, University of Illinois Urbana-Champaign, United States of America
- Yang Yang, University of California, San Francisco, United States of America
- Alan Yuille, Johns Hopkins University, United States of America
- Zongwei Zhou, Johns Hopkins University, United States of America
5
Liu H, Zhuang Y, Song E, Xu X, Hung CC. A bidirectional multilayer contrastive adaptation network with anatomical structure preservation for unpaired cross-modality medical image segmentation. Comput Biol Med 2022; 149:105964. [PMID: 36007288] [DOI: 10.1016/j.compbiomed.2022.105964]
Abstract
Multi-modal medical image segmentation has achieved great success with supervised deep learning networks. However, because of domain shift and limited annotations, unpaired cross-modality segmentation remains challenging. Unsupervised domain adaptation (UDA) methods can alleviate the performance degradation in cross-modality segmentation by transferring knowledge between domains, but current methods still suffer from model collapse, unstable adversarial training, and mismatched anatomical structures. To tackle these issues, we propose a bidirectional multilayer contrastive adaptation network (BMCAN) for unpaired cross-modality segmentation. A shared encoder is first adopted to learn modality-invariant encoding representations for image synthesis and segmentation simultaneously. Second, to retain anatomical structure consistency in cross-modality image synthesis, we present a structure-constrained cross-modality image translation approach for image alignment. Third, we construct a bidirectional multilayer contrastive learning approach to preserve anatomical structures and enhance encoding representations, using two groups of domain-specific multilayer perceptron (MLP) networks to learn modality-specific features. Finally, a semantic information adversarial learning approach is designed to learn structural similarities of semantic outputs for output-space alignment. Our proposed method was tested on three different cross-modality segmentation tasks: brain tissue, brain tumor, and cardiac substructure segmentation. Compared with other UDA methods, experimental results show that BMCAN achieves state-of-the-art performance on all three tasks, with fewer training components and better feature representations for overcoming overfitting and domain shift. The proposed method can efficiently reduce the annotation burden of radiologists in cross-modality image analysis.
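As a rough illustration of MLP-projected, patch-wise contrastive alignment across modalities, here is a minimal InfoNCE-style sketch; the module name, projector size, and temperature are illustrative assumptions rather than the BMCAN implementation.

# Hypothetical sketch of patch-wise contrastive alignment with an MLP projector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchContrast(nn.Module):
    """Projects patch features with a small MLP and pulls corresponding patches
    from the two modalities together with an InfoNCE-style loss."""
    def __init__(self, in_dim: int, proj_dim: int = 128, tau: float = 0.07):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, proj_dim),
                                 nn.ReLU(),
                                 nn.Linear(proj_dim, proj_dim))
        self.tau = tau

    def forward(self, feat_src, feat_tgt):
        # feat_src, feat_tgt: (N, in_dim) features of N corresponding patches
        z_s = F.normalize(self.mlp(feat_src), dim=1)
        z_t = F.normalize(self.mlp(feat_tgt), dim=1)
        logits = z_s @ z_t.t() / self.tau                    # (N, N) similarities
        labels = torch.arange(z_s.size(0))                   # positives on the diagonal
        return F.cross_entropy(logits, labels)

loss = PatchContrast(256)(torch.randn(32, 256), torch.randn(32, 256))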
Affiliation(s)
- Hong Liu, Center for Biomedical Imaging and Bioinformatics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yuzhou Zhuang, Institute of Artificial Intelligence, Huazhong University of Science and Technology, Wuhan 430074, China
- Enmin Song, Center for Biomedical Imaging and Bioinformatics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Xiangyang Xu, Center for Biomedical Imaging and Bioinformatics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Chih-Cheng Hung, Center for Machine Vision and Security Research, Kennesaw State University, Marietta, GA 30060, USA