1. Hosseinzadeh Taher MR, Haghighi F, Gotway MB, Liang J. Large-scale benchmarking and boosting transfer learning for medical image analysis. Med Image Anal 2025;102:103487. PMID: 40117988; DOI: 10.1016/j.media.2025.103487.
Abstract
Transfer learning, particularly fine-tuning models pretrained on photographic images to medical images, has proven indispensable for medical image analysis. There are numerous models with distinct architectures pretrained on various datasets using different strategies. However, there is a lack of up-to-date large-scale evaluations of their transferability to medical imaging, posing a challenge for practitioners in selecting the most appropriate pretrained models for their tasks at hand. To fill this gap, we conduct a comprehensive systematic study, focusing on (i) benchmarking numerous conventional and modern convolutional neural network (ConvNet) and vision transformer architectures across various medical tasks; (ii) investigating the impact of fine-tuning data size on the performance of ConvNets compared with vision transformers in medical imaging; (iii) examining the impact of pretraining data granularity on transfer learning performance; (iv) evaluating the transferability of a wide range of recent self-supervised methods with diverse training objectives to a variety of medical tasks across different modalities; and (v) delving into the efficacy of domain-adaptive pretraining on both photographic and medical datasets to develop high-performance models for medical tasks. Our large-scale study (∼5,000 experiments) yields impactful insights: (1) ConvNets demonstrate higher transferability than vision transformers when fine-tuning for medical tasks; (2) ConvNets prove to be more annotation-efficient than vision transformers when fine-tuning for medical tasks; (3) Fine-grained representations, rather than high-level semantic features, prove pivotal for fine-grained medical tasks; (4) Self-supervised models excel in learning holistic features compared with supervised models; and (5) Domain-adaptive pretraining leads to performant models by harnessing knowledge acquired from ImageNet and enhancing it with the readily accessible expert annotations associated with medical datasets. In the spirit of open science, all code and pretrained models are available at GitHub.com/JLiangLab/BenchmarkTransferLearning (Version 2).
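To make the benchmarking setup concrete, the following is a minimal sketch (not the authors' released code) of how an ImageNet-pretrained ConvNet or vision transformer from torchvision is re-headed and fine-tuned for a medical classification task; the label count, hyperparameters, and dummy batch are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_finetune_model(num_classes: int, backbone: str = "resnet50") -> nn.Module:
    """Initialize an ImageNet-pretrained backbone and replace its classification
    head for a downstream medical imaging task."""
    if backbone == "resnet50":
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    elif backbone == "vit_b_16":
        model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
        model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)
    else:
        raise ValueError(f"unknown backbone: {backbone}")
    return model

model = build_finetune_model(num_classes=14)   # e.g., 14 thorax disease labels
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()             # multi-label chest X-ray setting

# One illustrative fine-tuning step on a dummy batch (replace with a real DataLoader).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8, 14)).float()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```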
Affiliation(s)
- Fatemeh Haghighi, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
- Jianming Liang, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA

2. Zou L, Cao Y, Nie Z, Mao L, Qiu Y, Wang Z, Cai Z, Yang X. Segment Like A Doctor: Learning reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation. Med Image Anal 2025;102:103539. PMID: 40112510; DOI: 10.1016/j.media.2025.103539.
Abstract
Pancreatic cancer is a lethal invasive tumor with one of the worst prognoses. Accurate and reliable segmentation of the pancreas and pancreatic cancer on computed tomography (CT) images is vital in clinical diagnosis and treatment. Although certain deep learning-based techniques have been tentatively applied to this task, the current performance of pancreatic cancer segmentation is far from meeting clinical needs due to the tiny size, irregular shape and extremely uncertain boundary of the cancer. Moreover, most existing studies are built on black-box models that only learn the annotation distribution rather than the logical thinking and diagnostic experience of senior medical experts, which is more credible and interpretable. To alleviate the above issues, we propose a novel Segment-Like-A-Doctor (SLAD) framework to learn the reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation on CT images. Specifically, SLAD aims to simulate the essential logical thinking and experience of doctors in the progressive diagnostic stages of pancreatic cancer: the organ, lesion and boundary stages. Firstly, in the organ stage, an Anatomy-aware Masked AutoEncoder (AMAE) is introduced to model the doctors' overall cognition of the anatomical distribution of abdominal organs on CT images by self-supervised pretraining. Secondly, in the lesion stage, a Causality-driven Graph Reasoning Module (CGRM) is designed to learn the global judgment of doctors for lesion detection by exploring topological feature differences between the causal lesion and the non-causal organ. Finally, in the boundary stage, a Diffusion-based Discrepancy Calibration Module (DDCM) is developed to fit the refined understanding of doctors for the uncertain boundary of pancreatic cancer by inferring the ambiguous segmentation discrepancy based on the trustworthy lesion core. Experimental results on three independent datasets demonstrate that our approach boosts pancreatic cancer segmentation accuracy by 4%-9% compared with state-of-the-art methods. Additionally, a tumor-vascular involvement analysis is conducted to verify the superiority of our method in clinical applications. Our source codes will be publicly available at https://github.com/ZouLiwen-1999/SLAD.
Affiliation(s)
- Liwen Zou, School of Mathematics, Nanjing University, Nanjing, 210093, China
- Yingying Cao, Department of Radiology, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210000, China
- Ziwei Nie, School of Mathematics, Nanjing University, Nanjing, 210093, China
- Liang Mao, Department of Pancreatic Surgery, Nanjing Drum Tower Hospital, Nanjing, 210008, China
- Yudong Qiu, Department of Pancreatic Surgery, Nanjing Drum Tower Hospital, Nanjing, 210008, China
- Zhongqiu Wang, Department of Radiology, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210000, China
- Zhenghua Cai, Department of Pancreatic Surgery, Nanjing Drum Tower Hospital, Nanjing, 210008, China; Medical School, Nanjing University, Nanjing, 210007, China
- Xiaoping Yang, School of Mathematics, Nanjing University, Nanjing, 210093, China

3. Zhang X, Xiao Z, Wu X, Chen Y, Zhao J, Hu Y, Liu J. Pyramid Pixel Context Adaption Network for Medical Image Classification With Supervised Contrastive Learning. IEEE Trans Neural Netw Learn Syst 2025;36:6802-6815. PMID: 38829749; DOI: 10.1109/tnnls.2024.3399164.
Abstract
Spatial attention (SA) mechanism has been widely incorporated into deep neural networks (DNNs), significantly lifting the performance in computer vision tasks via long-range dependency modeling. However, it may perform poorly in medical image analysis. Unfortunately, the existing efforts are often unaware that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, pyramid pixel context adaption (PPCA) module, which exploits multiscale pixel context information to recalibrate pixel position in a pixel-independent manner dynamically. PPCA first applies a well-designed cross-channel pyramid pooling (CCPP) to aggregate multiscale pixel context information, then eliminates the inconsistency among them by the well-designed pixel normalization (PN), and finally estimates per pixel attention weight via a pixel context integration. By embedding PPCA into a DNN with negligible overhead, the PPCA network (PPCANet) is developed for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting the potential of label information via supervised contrastive loss (CL). The extensive experiments on six medical image datasets show that the PPCANet outperforms state-of-the-art (SOTA) attention-based networks and recent DNNs. We also provide visual analysis and ablation study to explain the behavior of PPCANet in the decision-making process.
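The supervised contrastive loss mentioned above follows the general SupCon idea of pulling together embeddings that share a class label; a simplified, single-view sketch (not the paper's exact implementation) is shown below.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Simplified supervised contrastive loss: samples sharing a label are
    treated as positives for one another."""
    z = F.normalize(features, dim=1)                      # (N, D) unit-norm embeddings
    sim = z @ z.t() / temperature                         # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float("-inf"))       # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_counts
    return loss[pos_mask.sum(1) > 0].mean()               # skip anchors without positives

# Example: embeddings from a classification backbone, 4 classes in the batch.
feats = torch.randn(16, 128)
labels = torch.randint(0, 4, (16,))
print(supervised_contrastive_loss(feats, labels))
```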

4. He Y, An C, Dong K, Lyu Z, Qin S, Tan K, Hao X, Zhu C, Xiu W, Hu B, Xia N, Wang C, Dong Q. A Novel Visual Model for Predicting Prognosis of Resected Hepatoblastoma: A Multicenter Study. Acad Radiol 2025:S1076-6332(25)00197-7. PMID: 40140274; DOI: 10.1016/j.acra.2025.03.004.
Abstract
RATIONALE AND OBJECTIVES: This study aimed to evaluate the application of a contrast-enhanced CT-based visual model in predicting postoperative prognosis in patients with hepatoblastoma (HB).
MATERIALS AND METHODS: We analyzed data from 224 patients across three centers (178 in the training cohort, 46 in the validation cohort). Visual features were extracted from contrast-enhanced CT images, and key features, along with clinicopathological data, were identified using LASSO Cox regression. Visual (DINOv2_score) and clinical (Clinical_score) models were developed, and a combined model integrating DINOv2_score and clinical risk factors was constructed. Nomograms were created for personalized risk assessment, with calibration curves and decision curve analysis (DCA) used to evaluate model performance.
RESULTS: The DINOv2_score was recognized as a key prognostic indicator for HB. In both the training and validation cohorts, the combined model demonstrated superior performance in predicting disease-free survival (DFS) [C-index (95% CI): 0.886 (0.879-0.895) and 0.873 (0.837-0.909), respectively] and overall survival (OS) [C-index (95% CI): 0.887 (0.877-0.897) and 0.882 (0.858-0.906), respectively]. Calibration curves showed strong alignment between predicted and observed outcomes, while DCA demonstrated that the combined model provided greater clinical net benefit than the clinical or visual models alone across a range of threshold probabilities.
CONCLUSION: The contrast-enhanced CT-based visual model serves as an effective tool for predicting postoperative prognosis in HB patients. The combined model, integrating the DINOv2_score and clinical risk factors, demonstrated superior performance in survival prediction, offering more precise guidance for personalized treatment strategies.
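The pipeline summarized above (a DINOv2-derived visual score combined with clinical covariates in a Cox model, evaluated by the C-index) can be approximated as in the sketch below; the DINOv2 backbone is loaded from the public torch.hub entry point (which downloads a checkpoint), the survival table is synthetic, and the column names are placeholders rather than the study's actual variables.

```python
import numpy as np
import pandas as pd
import torch
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

# Frozen DINOv2 backbone (public torch.hub entry point) as a visual feature extractor.
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
with torch.no_grad():
    feats = dino(torch.randn(1, 3, 224, 224))   # (1, 384) embedding of a CT slice/montage
# In a study like this, such embeddings would be reduced to a per-patient visual score.

# Synthetic patient table: a visual score plus a clinical covariate fed to a
# Cox proportional-hazards model (all names and values are made up).
rng = np.random.default_rng(0)
n = 60
score = rng.uniform(0, 1, n)
df = pd.DataFrame({
    "months":       rng.exponential(scale=36 / (1 + 2 * score)),  # follow-up time
    "event":        rng.integers(0, 2, n),                         # 1 = event observed
    "dinov2_score": score,
    "afp_level":    rng.normal(500, 150, n),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="event")
# Concordance index: higher predicted hazard should correspond to shorter survival.
cindex = concordance_index(df["months"], -cph.predict_partial_hazard(df), df["event"])
print(f"C-index: {cindex:.3f}")
```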
Affiliation(s)
- Ying He, Department of Pediatric Surgery, The Affiliated Hospital of Qingdao University, No. 16 Jiangsu Road, Qingdao 266003, China
- Chaohui An, Department of General Surgery, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Kuiran Dong, Department of Pediatric Surgery, Children's Hospital of Fudan University, 399 Wanyuan Road, Shanghai 201102, China
- Zhibao Lyu, Department of General Surgery, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanlu Qin, Department of Pediatric Surgery, Children's Hospital of Fudan University, 399 Wanyuan Road, Shanghai 201102, China
- Kezhe Tan, Department of General Surgery, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Xiwei Hao, Department of Pediatric Surgery, The Affiliated Hospital of Qingdao University, No. 16 Jiangsu Road, Qingdao 266003, China
- Chengzhan Zhu, Department of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Qingdao University, No. 16 Jiangsu Road, Qingdao 266003, China
- Wenli Xiu, Department of Pediatric Surgery, The Affiliated Hospital of Qingdao University, No. 16 Jiangsu Road, Qingdao 266003, China
- Bin Hu, Department of Radiology, The Affiliated Hospital of Qingdao University, No. 16 Jiangsu Road, Qingdao 266003, China
- Nan Xia, Shandong Key Laboratory of Digital Medicine and Computer-Assisted Surgery, The Affiliated Hospital of Qingdao University, No. 16 Jiangsu Road, Qingdao 266003, China
- Chaojin Wang, Department of Pediatric Surgery, The Affiliated Hospital of Qingdao University, No. 16 Jiangsu Road, Qingdao 266003, China
- Qian Dong, Department of Pediatric Surgery, The Affiliated Hospital of Qingdao University, No. 16 Jiangsu Road, Qingdao 266003, China

5. Lyu J, Bartlett PF, Nasrallah FA, Tang X. Masked Deformation Modeling for Volumetric Brain MRI Self-Supervised Pre-Training. IEEE Trans Med Imaging 2025;44:1596-1607. PMID: 40030579; DOI: 10.1109/tmi.2024.3510922.
Abstract
Self-supervised learning (SSL) has been proposed to alleviate neural networks' reliance on annotated data and to improve downstream tasks' performance, which has obtained substantial success in several volumetric medical image segmentation tasks. However, most existing approaches are designed and pre-trained on CT or MRI datasets of non-brain organs. The lack of brain prior limits those methods' performance on brain segmentation, especially on fine-grained brain parcellation. To overcome this limitation, we here propose a novel SSL strategy for MRI of the human brain, named Masked Deformation Modeling (MDM). MDM first conducts atlas-guided patch sampling on individual brain MRI scans (moving volumes) and an MNI152 template (a fixed volume). The sampled moving volumes are randomly masked in a feature-aligned manner, and then sent into a U-Net-based network to extract latent features. An intensity head and a deformation field head are used to decode the latent features, respectively restoring the masked volume and predicting the deformation field from the moving volume to the fixed volume. The proposed MDM is fine-tuned and evaluated on three brain parcellation datasets with different granularities (JHU, Mindboggle-101, CANDI), a brain lesion segmentation dataset (ATLAS2), and a brain tumor segmentation dataset (BraTS21). Results demonstrate that MDM outperforms various state-of-the-art medical SSL methods by considerable margins, and can effectively reduce the annotation effort by at least 40%. Codes and pre-trained weights will be released at https://github.com/CRazorback/MDM.
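A schematic of the dual-head objective described above (restoring masked intensities while predicting a deformation field from the moving volume toward a fixed template) is given below; the network, shapes, and loss weighting are toy stand-ins, not the released MDM code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMDM(nn.Module):
    """Toy dual-head network: a shared 3D encoder feeds an intensity head
    (masked-volume restoration) and a deformation head (moving -> fixed)."""
    def __init__(self, ch: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        self.intensity_head = nn.Conv3d(ch, 1, 3, padding=1)   # restore masked voxels
        self.flow_head = nn.Conv3d(ch, 3, 3, padding=1)        # per-voxel displacement

    def forward(self, masked_moving):
        feats = self.encoder(masked_moving)
        return self.intensity_head(feats), self.flow_head(feats)

def warp(volume, flow):
    """Warp a volume with a dense displacement field via grid_sample."""
    b, _, d, h, w = volume.shape
    zs, ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, d), torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
        indexing="ij")
    grid = torch.stack((xs, ys, zs), dim=-1).unsqueeze(0).expand(b, -1, -1, -1, -1)
    grid = grid + flow.permute(0, 2, 3, 4, 1)     # displacements in normalized coords
    return F.grid_sample(volume, grid, align_corners=True)

moving = torch.rand(1, 1, 32, 32, 32)             # brain MRI patch (moving volume)
fixed = torch.rand(1, 1, 32, 32, 32)              # template patch (fixed volume)
mask = (torch.rand_like(moving) > 0.6).float()    # random mask stands in for the real scheme
model = TinyMDM()

restored, flow = model(moving * mask)
loss = F.mse_loss(restored, moving) + F.mse_loss(warp(moving, flow), fixed)
loss.backward()
```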

6. Chu S, Ren X, Ji G, Zhao J, Shi J, Wei Y, Pei B, Qiang Y. Learning Consistent Semantic Representation for Chest X-ray via Anatomical Localization in Self-Supervised Pre-Training. IEEE J Biomed Health Inform 2025;29:2100-2112. PMID: 40030350; DOI: 10.1109/jbhi.2024.3505303.
Abstract
Despite the similar global structures in Chest X-ray (CXR) images, the same anatomy exhibits varying appearances across images, including differences in local textures, shapes, colors, etc. Learning consistent representations for anatomical semantics through these diverse appearances poses a great challenge for self-supervised pre-training in CXR images. To address this challenge, we propose two new pre-training tasks: inner-image anatomy localization (IIAL) and cross-image anatomy localization (CIAL). Leveraging the relatively stable positions of identical anatomy across images, we utilize position information directly as supervision to learn consistent semantic representations. Specifically, IIAL adopts a coarse-to-fine heatmap localization approach to correlate anatomical semantics with positions, while CIAL leverages feature affine alignment and heatmap localization to establish a correspondence between identical anatomical semantics across varying images, despite their appearance diversity. Furthermore, we introduce a unified end-to-end pre-training framework, anatomy-aware representation learning (AARL), integrating IIAL, CIAL, and a pixel restoration task. The advantages of AARL are: 1) preserving the appearance diversity and 2) training in a simple end-to-end way avoiding complicated preprocessing. Extensive experiments on six downstream tasks, including classification and segmentation tasks in various application scenarios, demonstrate that our AARL: 1) has more powerful representation and transferring ability; 2) is annotation-efficient, reducing the demand for labeled data and 3) improves the sensitivity to detecting various pathological and anatomical patterns.
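The anatomy-localization pretext tasks amount to regressing heatmaps centred on anatomical positions that are roughly stable across chest X-rays; below is a toy sketch of that supervision signal (coordinates and sizes are arbitrary, not the paper's settings).

```python
import torch
import torch.nn.functional as F

def gaussian_heatmap(size: int, center_xy, sigma: float = 4.0) -> torch.Tensor:
    """Build a (size, size) Gaussian heatmap centred on an anatomical position."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    cx, cy = center_xy
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# Pretext supervision: the network must localize the same anatomy (e.g., the carina),
# whose position is relatively stable across chest radiographs.
target = gaussian_heatmap(64, center_xy=(30, 22))             # (64, 64) target heatmap
predicted = torch.rand(1, 1, 64, 64, requires_grad=True)       # stand-in model output
loss = F.mse_loss(predicted.squeeze(), target)
loss.backward()
print(float(loss))
```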

7. Shi C, Zhang X, Zhao R, Zhang W, Chen F. Semantic structure preservation for accurate multi-modal glioma diagnosis. Sci Rep 2025;15:7185. PMID: 40021688; PMCID: PMC11871068; DOI: 10.1038/s41598-025-88458-7.
Abstract
Pretraining has laid the foundation for the recent success of deep learning in multimodal medical image analysis. However, existing methods often overlook the semantic structure embedded in modality-specific representations, and supervised pretraining requires a carefully designed, time-consuming two-stage annotation process. To address this, we propose a novel semantic structure-preserving consistency method, named "Review of Free-Text Reports for Preserving Multimodal Semantic Structure" (RFPMSS). During the semantic structure training phase, we learn multiple anchors to capture the semantic structure of each modality, and sample-sample relationships are represented by associating samples with these anchors, forming modality-specific semantic relationships. For comprehensive modality alignment, RFPMSS extracts supervision signals from patient examination reports, establishing global alignment between images and text. Evaluations on datasets collected from Shanxi Provincial Cancer Hospital and Shanxi Provincial People's Hospital demonstrate that our proposed cross-modal supervision using free-text image reports and multi-anchor allocation achieves state-of-the-art performance under highly limited supervision. Code: https://github.com/shichaoyu1/RFPMSS.
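The global image-report alignment described here is commonly realized as a symmetric InfoNCE objective over paired image and text embeddings; a minimal sketch with placeholder encoder outputs (not the RFPMSS implementation) follows.

```python
import torch
import torch.nn.functional as F

def clip_style_alignment_loss(img_emb: torch.Tensor,
                              txt_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: each image should match its own report and vice versa."""
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Placeholder embeddings standing in for image features and free-text report features
# produced by the respective encoders.
image_features = torch.randn(8, 256)
report_features = torch.randn(8, 256)
print(clip_style_alignment_loss(image_features, report_features))
```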
Affiliation(s)
- Chaoyu Shi, School of Computer Information Engineering, Shanxi Technology and Business University, Taiyuan City, Shanxi Province, China
- Xia Zhang, Respiratory Medicine Department, Shanxi Province Cancer Hospital / Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences / Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan City, Shanxi Province, China
- Runzhen Zhao, Pathology Department, Fenyang Hospital of Shanxi Province, Fenyang City, Shanxi Province, China
- Wen Zhang, Neurosurgery Department, Shanxi Provincial People's Hospital, Taiyuan City, Shanxi Province, China
- Fei Chen, Department of Radiotherapy, Shanxi Province Cancer Hospital / Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences / Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan City, Shanxi Province, China

8. He Y, Huang F, Jiang X, Nie Y, Wang M, Wang J, Chen H. Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions. IEEE Rev Biomed Eng 2025;18:172-191. PMID: 39531565; DOI: 10.1109/rbme.2024.3496744.
Abstract
Foundation model, trained on a diverse range of data and adaptable to a myriad of tasks, is advancing healthcare. It fosters the development of healthcare artificial intelligence (AI) models tailored to the intricacies of the medical field, bridging the gap between limited AI models and the varied nature of healthcare practices. The advancement of a healthcare foundation model (HFM) brings forth tremendous potential to augment intelligent healthcare services across a broad spectrum of scenarios. However, despite the imminent widespread deployment of HFMs, there is currently a lack of clear understanding regarding their operation in the healthcare field, their existing challenges, and their future trajectory. To answer these critical inquiries, we present a comprehensive and in-depth examination that delves into the landscape of HFMs. It begins with a comprehensive overview of HFMs, encompassing their methods, data, and applications, to provide a quick understanding of the current progress. Subsequently, it delves into a thorough exploration of the challenges associated with data, algorithms, and computing infrastructures in constructing and widely applying foundation models in healthcare. Furthermore, this survey identifies promising directions for future development in this field. We believe that this survey will enhance the community's understanding of the current progress of HFMs and serve as a valuable source of guidance for future advancements in this domain.

9. Cai Z, Zhong Z, Lin H, Huang B, Xu Z, Huang B, Deng W, Wu Q, Lei K, Lyu J, Ye Y, Chen H, Zhang J. Self-supervised learning on dual-sequence magnetic resonance imaging for automatic segmentation of nasopharyngeal carcinoma. Comput Med Imaging Graph 2024;118:102471. PMID: 39608271; DOI: 10.1016/j.compmedimag.2024.102471.
Abstract
Automating the segmentation of nasopharyngeal carcinoma (NPC) is crucial for therapeutic procedures but presents challenges given the hurdles in amassing extensively annotated datasets. Although previous studies have applied self-supervised learning to capitalize on unlabeled data to improve segmentation performance, these methods have often overlooked the benefits of dual-sequence magnetic resonance imaging (MRI). In the present study, we incorporated self-supervised learning with a saliency transformation module using unlabeled dual-sequence MRI for accurate NPC segmentation. Data from 44 labeled and 72 unlabeled patients were collected to develop and evaluate our network. Impressively, our network achieved a mean Dice similarity coefficient (DSC) of 0.77, which is consistent with a previous study that relied on a training set of 4,100 annotated cases. The results further revealed that our approach required only minimal adjustments, primarily a tweak of less than 20% in the DSC, to meet clinical standards. By enhancing the automatic segmentation of NPC, our method alleviates the annotation burden on oncologists, curbs subjectivity, and ensures reliable NPC delineation.
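For reference, the Dice similarity coefficient reported above is the standard overlap metric between a predicted and a reference mask; a simple implementation for binary masks is shown below.

```python
import torch

def dice_similarity(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.bool()
    target = target.bool()
    intersection = (pred & target).sum().float()
    return float((2 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Example on a toy 3D mask pair (1 = tumour voxel).
prediction = torch.zeros(32, 64, 64, dtype=torch.int64)
reference = torch.zeros(32, 64, 64, dtype=torch.int64)
prediction[10:20, 20:40, 20:40] = 1
reference[12:20, 22:40, 20:44] = 1
print(f"DSC = {dice_similarity(prediction, reference):.3f}")
```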
Affiliation(s)
- Zongyou Cai, Medical AI Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Zhangnan Zhong, Medical AI Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Haiwei Lin, Medical AI Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Bingsheng Huang, Medical AI Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Ziyue Xu, NVIDIA Corporation, Bethesda, MD, USA
- Bin Huang, Medical AI Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Wei Deng, Department of Radiology, Panyu Central Hospital, Guangzhou, China; Medical Imaging Institute of Panyu, Guangzhou, China
- Qiting Wu, Medical AI Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Kaixin Lei, Medical AI Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Jiegeng Lyu, Medical AI Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Yufeng Ye, Department of Radiology, Panyu Central Hospital, Guangzhou, China; Medical Imaging Institute of Panyu, Guangzhou, China
- Hanwei Chen, Panyu Health Management Center (Panyu Rehabilitation Hospital), Guangzhou, China
- Jian Zhang, Shenzhen-Hong Kong Institute of Brain Science-Shenzhen Fundamental Research Institutions, Shenzhen, China; Shenzhen University Medical School, Shenzhen University, Shenzhen, 518055, Guangdong, China

10. Liu J, Zhang Y, Wang K, Yavuz MC, Chen X, Yuan Y, Li H, Yang Y, Yuille A, Tang Y, Zhou Z. Universal and extensible language-vision models for organ segmentation and tumor detection from abdominal computed tomography. Med Image Anal 2024;97:103226. PMID: 38852215; DOI: 10.1016/j.media.2024.103226.
Abstract
The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and learning scheme. To overcome these limitations, we propose a universal, extensible framework enabling a single model, termed Universal Model, to deal with multiple public datasets and adapt to new classes (e.g., organs/tumors). Firstly, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models, enriching semantic encoding compared with one-hot encoding. Secondly, the conventional output layers are replaced with lightweight, class-specific heads, allowing Universal Model to simultaneously segment 25 organs and six types of tumors and ease the addition of new classes. We train our Universal Model on 3410 CT volumes assembled from 14 publicly available datasets and then test it on 6173 CT volumes from four external datasets. Universal Model achieves first place on six CT tasks in the Medical Segmentation Decathlon (MSD) public leaderboard and leading performance on the Beyond The Cranial Vault (BTCV) dataset. In summary, Universal Model exhibits remarkable computational efficiency (6× faster than other dataset-specific models), demonstrates strong generalization across different hospitals, transfers well to numerous downstream tasks, and more importantly, facilitates the extensibility to new classes while alleviating the catastrophic forgetting of previously learned classes. Codes, models, and datasets are available at https://github.com/ljwztc/CLIP-Driven-Universal-Model.
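The language-driven parameter generator can be pictured as a small network that turns a text embedding of each class name into the weights of a lightweight class-specific head; the toy sketch below uses random placeholder vectors in place of actual CLIP text embeddings and is not the released CLIP-Driven Universal Model code.

```python
import torch
import torch.nn as nn

class LanguageDrivenHead(nn.Module):
    """Generate per-class 1x1x1 conv parameters from a text embedding and apply
    them to shared voxel features, yielding one binary mask per class."""
    def __init__(self, text_dim: int = 512, feat_dim: int = 32):
        super().__init__()
        self.feat_dim = feat_dim
        self.param_gen = nn.Linear(text_dim, feat_dim + 1)   # conv weights + bias per class

    def forward(self, voxel_feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # voxel_feats: (B, C, D, H, W); text_emb: (num_classes, text_dim)
        params = self.param_gen(text_emb)                    # (K, C + 1)
        weight = params[:, :self.feat_dim].view(-1, self.feat_dim, 1, 1, 1)
        bias = params[:, self.feat_dim]
        logits = nn.functional.conv3d(voxel_feats, weight, bias)   # (B, K, D, H, W)
        return torch.sigmoid(logits)                         # independent mask per class

# Placeholder text embeddings stand in for CLIP embeddings of prompts such as
# "a computed tomography of a liver tumor".
text_embeddings = torch.randn(31, 512)                       # e.g., 25 organs + 6 tumour types
features = torch.randn(2, 32, 8, 16, 16)                     # decoder voxel features
masks = LanguageDrivenHead()(features, text_embeddings)
print(masks.shape)                                           # torch.Size([2, 31, 8, 16, 16])
```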
Affiliation(s)
- Jie Liu, City University of Hong Kong, Hong Kong
- Yixiao Zhang, Johns Hopkins University, United States of America
- Kang Wang, University of California, San Francisco, United States of America
- Mehmet Can Yavuz, University of California, San Francisco, United States of America
- Xiaoxi Chen, University of Illinois Urbana-Champaign, United States of America
- Yang Yang, University of California, San Francisco, United States of America
- Alan Yuille, Johns Hopkins University, United States of America
- Zongwei Zhou, Johns Hopkins University, United States of America

11. Huang W, Li C, Yang H, Liu J, Liang Y, Zheng H, Wang S. Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement. Med Image Anal 2024;97:103299. PMID: 39146702; DOI: 10.1016/j.media.2024.103299.
Abstract
Recently, vision-language representation learning has made remarkable advancements in building medical foundation models, holding immense potential for transforming the landscape of clinical research and medical care. The underlying hypothesis is that the rich knowledge embedded in radiology reports can effectively assist and guide the learning process, reducing the need for additional labels. However, these reports tend to be complex and sometimes even contain redundant descriptions that make it difficult for representation learning to capture the key semantic information. This paper develops a novel iterative vision-language representation learning framework by proposing a key semantic knowledge-emphasized report refinement method. Particularly, raw radiology reports are refined to highlight the key information according to a constructed clinical dictionary and two model-optimized knowledge-enhancement metrics. The iterative framework is designed to learn progressively, starting from a general understanding of the patient's condition based on raw reports and gradually refining and extracting critical information essential to fine-grained analysis tasks. The effectiveness of the proposed framework is validated on various downstream medical image analysis tasks, including disease classification, region-of-interest segmentation, and phrase grounding. Our framework surpasses seven state-of-the-art methods in both fine-tuning and zero-shot settings, demonstrating its encouraging potential for different clinical applications.
Affiliation(s)
- Weijian Huang, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Peng Cheng Laboratory, Shenzhen 518066, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Cheng Li, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Hao Yang, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Peng Cheng Laboratory, Shenzhen 518066, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jiarun Liu, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Peng Cheng Laboratory, Shenzhen 518066, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yong Liang, Peng Cheng Laboratory, Shenzhen 518066, China
- Hairong Zheng, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shanshan Wang, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

12. Hu Z, Liu J, Shen S, Wu W, Yuan J, Shen W, Ma L, Wang G, Yang S, Xu X, Cui Y, Li Z, Shen L, Li L, Bian J, Zhang X, Han H, Lin J. Large-volume fully automated cell reconstruction generates a cell atlas of plant tissues. Plant Cell 2024;36:koae250. PMID: 39283506; PMCID: PMC11852339; DOI: 10.1093/plcell/koae250.
Abstract
The geometric shape and arrangement of individual cells play a role in shaping organ functions. However, analyzing multicellular features and exploring their connectomes in centimeter-scale plant organs remain challenging. Here, we established a set of frameworks named Large-Volume Fully Automated Cell Reconstruction (LVACR), enabling the exploration of three-dimensional (3D) cytological features and cellular connectivity in plant tissues. Through benchmark testing, our framework demonstrated superior efficiency in cell segmentation and aggregation, successfully addressing the inherent challenges posed by light sheet fluorescence microscopy (LSFM) imaging. Using LVACR, we successfully established a cell atlas of different plant tissues. Cellular morphology analysis revealed differences in cell clusters and shapes between different poplar (P. simonii Carr. and P. canadensis Moench.) seeds, whereas topological analysis revealed that they maintained conserved cellular connectivity. Furthermore, LVACR spatiotemporally demonstrated an initial burst of cell proliferation, accompanied by morphological transformations, at an early stage of shoot apical meristem development. During subsequent development, cell differentiation produced anisotropic features, thereby resulting in various cell shapes. Overall, our findings provided valuable insights into the precise spatial arrangement and cellular behavior of multicellular organisms, thus enhancing our understanding of the complex processes underlying plant growth and differentiation.
Affiliation(s)
- Zijian Hu, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Jiazheng Liu, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Future Technology, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China
- Shiya Shen, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Weiqian Wu, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Jingbin Yuan, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Weiwei Shen, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Lingyu Ma, Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing 100091, China
- Guangchao Wang, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Shunyao Yang, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Xiuping Xu, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Yaning Cui, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Zhenchen Li, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Future Technology, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China
- Lijun Shen, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Linlin Li, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Jiahui Bian, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Xi Zhang, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Hua Han, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Future Technology, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China
- Jinxing Lin, State Key Laboratory of Tree Genetics and Breeding, State Key Laboratory of Efficient Production of Forest Resources, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China

13. Huang W, Li C, Zhou HY, Yang H, Liu J, Liang Y, Zheng H, Zhang S, Wang S. Enhancing representation in radiography-reports foundation model: a granular alignment algorithm using masked contrastive learning. Nat Commun 2024;15:7620. PMID: 39223122; PMCID: PMC11369198; DOI: 10.1038/s41467-024-51749-0.
Abstract
Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face crucial challenges, such as the requirement for fine-grained knowledge understanding in computer-aided diagnosis and the capability of utilizing very limited or even no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a masked contrastive chest X-ray foundation model that tackles these challenges. MaCo explores masked contrastive learning to simultaneously achieve fine-grained image understanding and zero-shot learning for a variety of medical imaging tasks. It designs a correlation weighting mechanism to adjust the correlation between masked chest X-ray image patches and their corresponding reports, thereby enhancing the model's representation learning capabilities. To evaluate the performance of MaCo, we conducted extensive experiments using 6 well-known open-source X-ray datasets. The experimental results demonstrate the superiority of MaCo over 10 state-of-the-art approaches across tasks such as classification, segmentation, detection, and phrase grounding. These findings highlight the significant potential of MaCo in advancing a wide range of medical image analysis tasks.
Affiliation(s)
- Weijian Huang, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Cheng Li, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Hong-Yu Zhou, Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Hao Yang, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Jiarun Liu, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Hairong Zheng, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shaoting Zhang, Qingyuan Research Institute, Shanghai Jiao Tong University, Shanghai, China
- Shanshan Wang, Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

14. Jiang N, Wang G, Ye C, Liu T, Yan T. Multi-Task Collaborative Pre-Training and Adaptive Token Selection: A Unified Framework for Brain Representation Learning. IEEE J Biomed Health Inform 2024;28:5528-5539. PMID: 38889024; DOI: 10.1109/jbhi.2024.3416038.
Abstract
Structural magnetic resonance imaging (sMRI) reveals the structural organization of the brain. Learning general brain representations from sMRI is an enduring topic in neuroscience. Previous deep learning models neglect that the brain, as the core of cognition, is distinct from other organs whose primary attribute is anatomy. Capturing the high-level representation associated with inter-individual cognitive variability is key to appropriately representing the brain. Given that this cognition-related information is subtle, mixed, and distributed in the brain structure, sMRI-based models need to both capture fine-grained details and understand how they relate to the overall global structure. Additionally, it is necessary to explicitly express the cognitive information that is implicitly embedded in local-global image features. Therefore, we propose MCPATS, a brain representation learning framework that combines Multi-task Collaborative Pre-training (MCP) and Adaptive Token Selection (ATS). First, we develop MCP, including mask-reconstruction to understand global context, distort-restoration to capture fine-grained local details, adversarial learning to integrate features at different granularities, and age-prediction, using age as a surrogate for cognition to explicitly encode cognition-related information from local-global image features. This co-training allows progressive learning of implicit and explicit cognition-related representations. Then, we develop ATS based on mutual attention for downstream use of the learned representation. During fine-tuning, the ATS highlights discriminative features and reduces the impact of irrelevant information. MCPATS was validated on three different public datasets for brain disease diagnosis, outperforming competing methods and achieving accurate diagnosis. Further, we performed a detailed analysis to confirm that the MCPATS-learned representation captures cognition-related information.

15. Choopong P, Kusakunniran W. Selection of pre-trained weights for transfer learning in automated cytomegalovirus retinitis classification. Sci Rep 2024;14:15899. PMID: 38987446; PMCID: PMC11237151; DOI: 10.1038/s41598-024-67121-7.
Abstract
Cytomegalovirus retinitis (CMVR) is a significant cause of vision loss. Regular screening is crucial but challenging in resource-limited settings. A convolutional neural network is a state-of-the-art deep learning technique to generate automatic diagnoses from retinal images. However, there are limited numbers of CMVR images to train the model properly. Transfer learning (TL) is a strategy to train a model with a scarce dataset. This study explores the efficacy of TL with different pre-trained weights for automated CMVR classification using retinal images. We utilised a dataset of 955 retinal images (524 CMVR and 431 normal) from Siriraj Hospital, Mahidol University, collected between 2005 and 2015. Images were processed using Kowa VX-10i or VX-20 fundus cameras and augmented for training. We employed DenseNet121 as a backbone model, comparing the performance of TL with weights pre-trained on ImageNet, APTOS2019, and CheXNet datasets. The models were evaluated based on accuracy, loss, and other performance metrics, with the depth of fine-tuning varied across different pre-trained weights. The study found that TL significantly enhances model performance in CMVR classification. The best results were achieved with weights sequentially transferred from ImageNet to APTOS2019 dataset before application to our CMVR dataset. This approach yielded the highest mean accuracy (0.99) and lowest mean loss (0.04), outperforming other methods. The class activation heatmaps provided insights into the model's decision-making process. The model with APTOS2019 pre-trained weights offered the best explanation and highlighted the pathologic lesions resembling human interpretation. Our findings demonstrate the potential of sequential TL in improving the accuracy and efficiency of CMVR diagnosis, particularly in settings with limited data availability. They highlight the importance of domain-specific pre-training in medical image classification. This approach streamlines the diagnostic process and paves the way for broader applications in automated medical image analysis, offering a scalable solution for early disease detection.
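The sequential transfer strategy (ImageNet → APTOS2019 → CMVR) reduces to initializing DenseNet121 with ImageNet weights, optionally overwriting the feature extractor with an APTOS-fine-tuned checkpoint, and re-heading the network for the binary CMVR task; the sketch below is illustrative only, the checkpoint path is hypothetical, and the training step uses dummy data.

```python
import torch
import torch.nn as nn
from torchvision import models

# Step 1: start from an ImageNet-pretrained DenseNet121.
model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)

# Step 2 (optional intermediate domain): overwrite the feature extractor with weights
# previously fine-tuned on a fundus-image task such as APTOS2019. Hypothetical file name.
try:
    state = torch.load("aptos2019_densenet121.pth", map_location="cpu")
    state = {k: v for k, v in state.items() if not k.startswith("classifier")}
    model.load_state_dict(state, strict=False)        # keep only the feature weights
except FileNotFoundError:
    print("APTOS checkpoint not found; continuing from ImageNet weights only.")

# Step 3: replace the classifier for the binary CMVR-vs-normal task and fine-tune
# only the deeper layers (one common partial-fine-tuning choice).
model.classifier = nn.Linear(model.classifier.in_features, 2)
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("features.denseblock4", "classifier"))

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of fundus images.
images, labels = torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```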
Affiliation(s)
- Pitipol Choopong, Department of Ophthalmology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand; Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand
- Worapan Kusakunniran, Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand

16. Taher MRH, Gotway MB, Liang J. Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2024:11269-11281. PMID: 39670210; PMCID: PMC11636527; DOI: 10.1109/cvpr52733.2024.01071.
Abstract
Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning excels in learning multi-level feature spaces, but they often lack explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework extending Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared to 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The higher generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, yet overlooked in existing SSL methods. All code and pretrained models are available at GitHub.com/JLiangLab/Eden.

17. Zeng M, Wang X, Chen W. Worldwide research landscape of artificial intelligence in lung disease: A scientometric study. Heliyon 2024;10:e31129. PMID: 38826704; PMCID: PMC11141367; DOI: 10.1016/j.heliyon.2024.e31129.
Abstract
Purpose: To perform a comprehensive bibliometric analysis of the application of artificial intelligence (AI) in lung disease to understand the current status and emerging trends of this field.
Materials and methods: AI-based lung disease research publications were selected from the Web of Science Core Collection. CiteSpace, VOSviewer and Excel were used to perform and visualize co-authorship, co-citation, and co-occurrence analyses of authors, keywords, countries/regions, references and institutions in this field.
Results: Our study included a total of 5210 papers. The number of publications on AI in lung disease has shown explosive growth since 2017. China and the United States lead in publication numbers. The most productive authors were Li Weimin and Qian Wei, with Shanghai Jiaotong University as the most productive institution. Radiology was the most co-cited journal. Lung cancer and COVID-19 emerged as the most studied diseases. Deep learning, convolutional neural networks, lung cancer, and radiomics will be the focus of future research.
Conclusions: AI-based diagnosis and treatment of lung disease has become a research hotspot in recent years, yielding significant results. Future work should focus on establishing multimodal AI models that incorporate clinical, imaging and laboratory information. Enhanced visualization of deep learning, AI-driven differential diagnosis models for lung disease and the creation of international large-scale lung disease databases should also be considered.
Affiliation(s)
- Wei Chen, Department of Radiology, Southwest Hospital, Third Military Medical University, Chongqing, China

18. Haghighi F, Hosseinzadeh Taher MR, Gotway MB, Liang J. Self-supervised learning for medical image analysis: Discriminative, restorative, or adversarial? Med Image Anal 2024;94:103086. PMID: 38537414; PMCID: PMC11044023; DOI: 10.1016/j.media.2024.103086.
Abstract
Discriminative, restorative, and adversarial learning have proven beneficial for self-supervised learning schemes in computer vision and medical imaging. Existing efforts, however, fail to capitalize on the potentially synergistic effects these methods may offer in a ternary setup, which, we envision can significantly benefit deep semantic representation learning. Towards this end, we developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. Our extensive experiments demonstrate that DiRA: (1) encourages collaborative learning among three learning ingredients, resulting in more generalizable representation across organs, diseases, and modalities; (2) outperforms fully supervised ImageNet models and increases robustness in small data regimes, reducing annotation cost across multiple medical imaging applications; (3) learns fine-grained semantic representation, facilitating accurate lesion localization with only image-level annotation; (4) improves reusability of low/mid-level features; and (5) enhances restorative self-supervised approaches, revealing that DiRA is a general framework for united representation learning. Code and pretrained models are available at https://github.com/JLiangLab/DiRA.
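The ternary objective unites an instance-discrimination term, a restoration term, and an adversarial term; the skeleton below shows how such losses can be combined, with the encoder, decoder, and discriminator reduced to stubs and the loss weights chosen arbitrarily (not the released DiRA code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))
decoder = nn.Sequential(nn.Linear(64, 32 * 32), nn.Unflatten(1, (1, 32, 32)))
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 1))

x = torch.rand(8, 1, 32, 32)                       # unlabeled medical images
x_aug1 = x + 0.05 * torch.randn_like(x)            # two lightly augmented views
x_aug2 = x + 0.05 * torch.randn_like(x)

# (D)iscriminative: two augmented views of the same image should agree.
z1 = F.normalize(encoder(x_aug1), dim=1)
z2 = F.normalize(encoder(x_aug2), dim=1)
logits = z1 @ z2.t() / 0.1
loss_dis = F.cross_entropy(logits, torch.arange(x.size(0)))

# (R)estorative: reconstruct the original image from the embedding.
x_rec = decoder(encoder(x_aug1))
loss_res = F.mse_loss(x_rec, x)

# (A)dversarial: a discriminator judges real vs. restored images
# (generator side shown; the discriminator would be trained with opposite labels).
loss_adv = F.binary_cross_entropy_with_logits(
    discriminator(x_rec), torch.ones(x.size(0), 1))

total = loss_dis + loss_res + 0.1 * loss_adv        # weights are illustrative only
total.backward()
```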
Affiliation(s)
- Fatemeh Haghighi, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
- Jianming Liang, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA

19. Chen J, Li M, Han H, Zhao Z, Chen X. SurgNet: Self-Supervised Pretraining With Semantic Consistency for Vessel and Instrument Segmentation in Surgical Images. IEEE Trans Med Imaging 2024;43:1513-1525. PMID: 38090838; DOI: 10.1109/tmi.2023.3341948.
Abstract
Blood vessel and surgical instrument segmentation is a fundamental technique for robot-assisted surgical navigation. Despite the significant progress in natural image segmentation, surgical image-based vessel and instrument segmentation are rarely studied. In this work, we propose a novel self-supervised pretraining method (SurgNet) that can effectively learn representative vessel and instrument features from unlabeled surgical images. As a result, it allows for precise and efficient segmentation of vessels and instruments with only a small amount of labeled data. Specifically, we first construct a region adjacency graph (RAG) based on local semantic consistency in unlabeled surgical images and use it as a self-supervision signal for pseudo-mask segmentation. We then use the pseudo-mask to perform guided masked image modeling (GMIM) to learn representations that integrate structural information of intraoperative objectives more effectively. Our pretrained model, paired with various segmentation methods, can be applied to perform vessel and instrument segmentation accurately using limited labeled data for fine-tuning. We build an Intraoperative Vessel and Instrument Segmentation (IVIS) dataset, comprised of ~3 million unlabeled images and over 4,000 labeled images with manual vessel and instrument annotations to evaluate the effectiveness of our self-supervised pretraining method. We also evaluated the generalizability of our method to similar tasks using two public datasets. The results demonstrate that our approach outperforms the current state-of-the-art (SOTA) self-supervised representation learning methods in various surgical image segmentation tasks.
20
Xing Z, Zhu L, Yu L, Xing Z, Wan L. Hybrid Masked Image Modeling for 3D Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:2115-2125. [PMID: 38289846 DOI: 10.1109/jbhi.2024.3360239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2024]
Abstract
Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. The existing MIM methods adopt the strategy of masking random patches of the image and reconstructing the missing pixels, which only considers semantic information at a lower level, and causes a long pre-training time. This paper presents HybridMIM, a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation. Specifically, we design a two-level masking hierarchy to specify which patches in sub-volumes are masked and how, effectively providing the constraints of higher level semantic information. Then we learn the semantic information of medical images at three levels, including: 1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden (pixel-level); 2) patch-masking perception to learn the spatial relationship between the patches in each sub-volume (region-level); and 3) drop-out-based contrastive learning between samples within a mini-batch, which further improves the generalization ability of the framework (sample-level). The proposed framework is versatile, supporting both CNN and transformer encoder backbones, and also enables pre-training decoders for image segmentation. We conduct comprehensive experiments on five widely-used public medical image segmentation datasets, including BraTS2020, BTCV, MSD Liver, MSD Spleen, and BraTS2023. The experimental results show the clear superiority of HybridMIM against competing supervised methods, masked pre-training approaches, and other self-supervised methods, in terms of quantitative metrics, speed performance and qualitative observations.
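The following toy function illustrates a two-level masking hierarchy of the kind described above: sub-volumes are selected first, then patches inside them are masked. Shapes, ratios, and names are illustrative assumptions rather than the HybridMIM implementation.

```python
# Toy sketch of hierarchical masking for a 3D volume: level 1 picks sub-volumes,
# level 2 masks patches inside them; all sizes and ratios are placeholders.
import numpy as np

def two_level_mask(vol_shape=(128, 128, 128), sub=32, patch=8,
                   sub_ratio=0.5, patch_ratio=0.6, seed=0):
    rng = np.random.default_rng(seed)
    gs = tuple(s // sub for s in vol_shape)    # sub-volume grid
    ps = sub // patch                          # patches per sub-volume edge
    mask = np.zeros(vol_shape, dtype=bool)
    # Level 1: select which sub-volumes participate in masking.
    for (i, j, k) in np.ndindex(*gs):
        if rng.random() >= sub_ratio:
            continue
        # Level 2: mask a fraction of the patches inside the chosen sub-volume.
        for p in np.ndindex(ps, ps, ps):
            if rng.random() < patch_ratio:
                z, y, x = i * sub + p[0] * patch, j * sub + p[1] * patch, k * sub + p[2] * patch
                mask[z:z + patch, y:y + patch, x:x + patch] = True
    return mask
```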
21
Kyung S, Jang M, Park S, Yoon HM, Hong GS, Kim N. Supervised representation learning based on various levels of pediatric radiographic views for transfer learning. Sci Rep 2024; 14:7551. [PMID: 38555414 PMCID: PMC10981659 DOI: 10.1038/s41598-024-58163-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2023] [Accepted: 03/26/2024] [Indexed: 04/02/2024] Open
Abstract
Transfer learning plays a pivotal role in addressing the paucity of data, expediting training processes, and enhancing model performance. Nonetheless, the prevailing practice of transfer learning predominantly relies on pre-trained models designed for the natural image domain, which may not be well-suited for the medical image domain in grayscale. Recognizing the significance of leveraging transfer learning in medical research, we undertook the construction of class-balanced pediatric radiograph datasets collectively referred to as PedXnets, grounded in radiographic views using the pediatric radiographs collected over 24 years at Asan Medical Center. For PedXnets pre-training, approximately 70,000 X-ray images were utilized. Three different pre-training weights of PedXnet were constructed using Inception V3 for various radiation perspective classifications: Model-PedXnet-7C, Model-PedXnet-30C, and Model-PedXnet-68C. We validated the transferability and positive effects of transfer learning of PedXnets through pediatric downstream tasks including fracture classification and bone age assessment (BAA). The evaluation of transfer learning effects through classification and regression metrics showed superior performance of Model-PedXnets in quantitative assessments. Additionally, visual analyses confirmed that the Model-PedXnets were more focused on meaningful regions of interest.
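For readers who want to reuse such pre-trained weights, the sketch below shows the generic transfer-learning recipe with a torchvision-style Inception V3: load a pretraining checkpoint, replace the classification heads, and fine-tune on the downstream task. The checkpoint path and class count are placeholders; this is not the released PedXnet code.

```python
# Generic fine-tuning recipe for an Inception V3 backbone; the checkpoint is a
# placeholder for any pretraining weights (e.g. a PedXnet-style checkpoint).
import torch
import torch.nn as nn
from torchvision.models import inception_v3

def build_downstream_model(num_classes, checkpoint_path=None):
    model = inception_v3(weights=None)                     # aux_logits=True by default
    if checkpoint_path is not None:
        state = torch.load(checkpoint_path, map_location="cpu")
        sd = model.state_dict()
        # Keep only weights whose names and shapes match the freshly built model,
        # so pretraining heads with a different class count are simply skipped.
        compatible = {k: v for k, v in state.items()
                      if k in sd and v.shape == sd[k].shape}
        model.load_state_dict(compatible, strict=False)
    # Replace both classification heads for the downstream task
    # (e.g. fracture classification or bone age bins).
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)
    return model
```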
Affiliation(s)
- Sunggu Kyung
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, College of Medicine, University of Ulsan, Seoul, Republic of Korea
| | - Miso Jang
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-gu, Seoul, 05505, Republic of Korea
| | - Seungju Park
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-gu, Seoul, 05505, Republic of Korea
| | - Hee Mang Yoon
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-gu, Seoul, 05505, Republic of Korea
| | - Gil-Sun Hong
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-gu, Seoul, 05505, Republic of Korea
| | - Namkug Kim
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-gu, Seoul, 05505, Republic of Korea.
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-gu, Seoul, 05505, Republic of Korea.
22
Zhao L, Fong TC, Bell MAL. Detection of COVID-19 features in lung ultrasound images using deep neural networks. COMMUNICATIONS MEDICINE 2024; 4:41. [PMID: 38467808 PMCID: PMC10928066 DOI: 10.1038/s43856-024-00463-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2023] [Accepted: 02/16/2024] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND Deep neural networks (DNNs) to detect COVID-19 features in lung ultrasound B-mode images have primarily relied on either in vivo or simulated images as training data. However, in vivo images suffer from limited access to required manual labeling of thousands of training image examples, and simulated images can suffer from poor generalizability to in vivo images due to domain differences. We address these limitations and identify the best training strategy. METHODS We investigated in vivo COVID-19 feature detection with DNNs trained on our carefully simulated datasets (40,000 images), publicly available in vivo datasets (174 images), in vivo datasets curated by our team (958 images), and a combination of simulated and internal or external in vivo datasets. Seven DNN training strategies were tested on in vivo B-mode images from COVID-19 patients. RESULTS Here, we show that Dice similarity coefficients (DSCs) between ground truth and DNN predictions are maximized when simulated data are mixed with external in vivo data and tested on internal in vivo data (i.e., 0.482 ± 0.211), compared with using only simulated B-mode image training data (i.e., 0.464 ± 0.230) or only external in vivo B-mode training data (i.e., 0.407 ± 0.177). Additional maximization is achieved when a separate subset of the internal in vivo B-mode images are included in the training dataset, with the greatest maximization of DSC (and minimization of required training time, or epochs) obtained after mixing simulated data with internal and external in vivo data during training, then testing on the held-out subset of the internal in vivo dataset (i.e., 0.735 ± 0.187). CONCLUSIONS DNNs trained with simulated and in vivo data are promising alternatives to training with only real or only simulated data when segmenting in vivo COVID-19 lung ultrasound features.
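For reference, the Dice similarity coefficient (DSC) reported above can be computed for a pair of binary masks as follows; this is the standard definition, shown as a minimal NumPy function.

```python
# Standard Dice similarity coefficient between two binary segmentation masks.
import numpy as np

def dice(pred, truth, eps=1e-7):
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    # 2*|A∩B| / (|A| + |B|); eps guards against two empty masks.
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)
```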
Affiliation(s)
- Lingyi Zhao
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Tiffany Clair Fong
- Department of Emergency Medicine, Johns Hopkins Medicine, Baltimore, MD, USA
| | - Muyinatu A Lediju Bell
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
23
Pang Y, Liang J, Huang T, Chen H, Li Y, Li D, Huang L, Wang Q. Slim UNETR: Scale Hybrid Transformers to Efficient 3D Medical Image Segmentation Under Limited Computational Resources. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:994-1005. [PMID: 37862274 DOI: 10.1109/tmi.2023.3326188] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/22/2023]
Abstract
Hybrid transformer-based segmentation approaches have shown great promise in medical image analysis. However, they typically require considerable computational power and resources during both training and inference stages, posing a challenge for resource-limited medical applications common in the field. To address this issue, we present an innovative framework called Slim UNETR, designed to achieve a balance between accuracy and efficiency by leveraging the advantages of both convolutional neural networks and transformers. Our method features the Slim UNETR Block as a core component, which effectively enables information exchange through self-attention mechanism decomposition and cost-effective representation aggregation. Additionally, we utilize the throughput metric as an efficiency indicator to provide feedback on model resource consumption. Our experiments demonstrate that Slim UNETR outperforms state-of-the-art models in terms of accuracy, model size, and efficiency when deployed on resource-constrained devices. Remarkably, Slim UNETR achieves 92.44% dice accuracy on BraTS2021 while being 34.6x smaller and 13.4x faster during inference compared to Swin UNETR. Code: https://github.com/aigzhusmart/Slim-UNETR.
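The throughput metric used above as an efficiency indicator is typically measured as images processed per second; below is a minimal, device-agnostic sketch in which the model and batch are placeholders (on a GPU one would additionally call torch.cuda.synchronize() before reading the timer).

```python
# Simple inference-throughput measurement (images per second).
import time
import torch

@torch.no_grad()
def throughput(model, batch, warmup=5, iters=20):
    model.eval()
    for _ in range(warmup):        # warm-up runs are excluded from timing
        model(batch)
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    elapsed = time.perf_counter() - start
    return iters * batch.shape[0] / elapsed
```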
24
Yu K, Sun L, Chen J, Reynolds M, Chaudhary T, Batmanghelich K. DrasCLR: A self-supervised framework of learning disease-related and anatomy-specific representation for 3D lung CT images. Med Image Anal 2024; 92:103062. [PMID: 38086236 PMCID: PMC10872608 DOI: 10.1016/j.media.2023.103062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 08/24/2023] [Accepted: 12/05/2023] [Indexed: 01/12/2024]
Abstract
Large-scale volumetric medical images with annotation are rare, costly, and time prohibitive to acquire. Self-supervised learning (SSL) offers a promising pre-training and feature extraction solution for many downstream tasks, as it only uses unlabeled data. Recently, SSL methods based on instance discrimination have gained popularity in the medical imaging domain. However, SSL pre-trained encoders may use many clues in the image to discriminate an instance that are not necessarily disease-related. Moreover, pathological patterns are often subtle and heterogeneous, requiring the ability of the desired method to represent anatomy-specific features that are sensitive to abnormal changes in different body parts. In this work, we present a novel SSL framework, named DrasCLR, for 3D lung CT images to overcome these challenges. We propose two domain-specific contrastive learning strategies: one aims to capture subtle disease patterns inside a local anatomical region, and the other aims to represent severe disease patterns that span larger regions. We formulate the encoder using a conditional hyper-parameterized network, in which the parameters are dependent on the anatomical location, to extract anatomically sensitive features. Extensive experiments on large-scale datasets of lung CT scans show that our method improves the performance of many downstream prediction and segmentation tasks. The patient-level representation improves the performance of the patient survival prediction task. We show how our method can detect emphysema subtypes via dense prediction. We demonstrate that fine-tuning the pre-trained model can significantly reduce annotation efforts without sacrificing emphysema detection accuracy. Our ablation study highlights the importance of incorporating anatomical context into the SSL framework. Our codes are available at https://github.com/batmanlab/DrasCLR.
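As background, contrastive instance-discrimination methods of this family are commonly built on an InfoNCE-style loss; the sketch below shows only that generic loss and omits DrasCLR's anatomy-conditioned encoder and region-specific strategies.

```python
# Generic InfoNCE/contrastive loss: matching view pairs sit on the diagonal of
# the similarity matrix; temperature and shapes are illustrative.
import torch
import torch.nn.functional as F

def info_nce(q, k, temperature=0.1):
    """q, k: (N, D) embeddings of two views of the same N instances/patches."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    logits = q @ k.t() / temperature              # (N, N) cosine similarities
    labels = torch.arange(q.shape[0], device=q.device)
    return F.cross_entropy(logits, labels)        # positives are on the diagonal
```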
Affiliation(s)
- Ke Yu
- School of Computing and Information, University of Pittsburgh, Pittsburgh, USA.
| | - Li Sun
- Department of Electrical and Computer Engineering, Boston University, Boston, USA
| | - Junxiang Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
| | - Maxwell Reynolds
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
| | - Tigmanshu Chaudhary
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
| | - Kayhan Batmanghelich
- Department of Electrical and Computer Engineering, Boston University, Boston, USA
25
Fischer M, Bartler A, Yang B. Prompt tuning for parameter-efficient medical image segmentation. Med Image Anal 2024; 91:103024. [PMID: 37976866 DOI: 10.1016/j.media.2023.103024] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2022] [Revised: 07/16/2023] [Accepted: 11/03/2023] [Indexed: 11/19/2023]
Abstract
Neural networks pre-trained on a self-supervision scheme have become the standard when operating in data rich environments with scarce annotations. As such, fine-tuning a model to a downstream task in a parameter-efficient but effective way, e.g. for a new set of classes in the case of semantic segmentation, is of increasing importance. In this work, we propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets. Relying on the recently popularized prompt tuning approach, we provide a prompt-able UNETR (PUNETR) architecture, that is frozen after pre-training, but adaptable throughout the network by class-dependent learnable prompt tokens. We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes (contrastive prototype assignment, CPA) of a student teacher combination. Concurrently, an additional segmentation loss is applied for a subset of classes during pre-training, further increasing the effectiveness of leveraged prompts in the fine-tuning phase. We demonstrate that the resulting method is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models on CT imaging datasets. To this end, the difference between fully fine-tuned and prompt-tuned variants amounts to 7.81 pp for the TCIA/BTCV dataset as well as 5.37 and 6.57 pp for subsets of the TotalSegmentator dataset in the mean Dice Similarity Coefficient (DSC, in %) while only adjusting prompt tokens, corresponding to 0.51% of the pre-trained backbone model with 24.4M frozen parameters. The code for this work is available on https://github.com/marcdcfischer/PUNETR.
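To make the parameter-efficiency argument concrete, the sketch below freezes a pretrained token-based backbone and trains only a small set of learnable prompt tokens, then reports the trainable-parameter fraction. Class names and shapes are illustrative assumptions, not the PUNETR code.

```python
# Prompt tuning as parameter-efficient adaptation: the backbone stays frozen,
# only the prepended prompt tokens receive gradients.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, backbone, embed_dim, num_prompts):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():        # keep pretrained weights frozen
            p.requires_grad = False
        # The only trainable parameters: learnable prompt tokens.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, tokens):                      # tokens: (B, N, D) patch embeddings
        prompts = self.prompts.expand(tokens.shape[0], -1, -1)
        return self.backbone(torch.cat([prompts, tokens], dim=1))

def trainable_fraction(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total                        # e.g. on the order of 0.5% as reported above
```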
Affiliation(s)
- Marc Fischer
- Institute of Signal Processing and System Theory, University of Stuttgart, 70550 Stuttgart, Germany.
| | - Alexander Bartler
- Institute of Signal Processing and System Theory, University of Stuttgart, 70550 Stuttgart, Germany
| | - Bin Yang
- Institute of Signal Processing and System Theory, University of Stuttgart, 70550 Stuttgart, Germany
26
Zhou J, Zhao M, Yang Z, Chen L, Liu X. Exploring the Value of MRI Measurement of Hippocampal Volume for Predicting the Occurrence and Progression of Alzheimer's Disease Based on Artificial Intelligence Deep Learning Technology and Evidence-Based Medicine Meta-Analysis. J Alzheimers Dis 2024; 97:1275-1288. [PMID: 38277290 DOI: 10.3233/jad-230733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2024]
Abstract
BACKGROUND Alzheimer's disease (AD), a major dementia cause, lacks effective treatment. MRI-based hippocampal volume measurement using artificial intelligence offers new insights into early diagnosis and intervention in AD progression. OBJECTIVE This study, involving 483 AD patients, 756 patients with mild cognitive impairment (MCI), and 968 normal controls (NC), investigated the predictive capability of MRI-based hippocampus volume measurements for AD risk using artificial intelligence and evidence-based medicine. METHODS Utilizing data from ADNI and OASIS-brains databases, three convolutional neural networks (InceptionResNetv2, Densenet169, and SEResNet50) were employed for automated AD classification based on structural MRI imaging. A multitask deep learning model and a densely connected 3D convolutional network were utilized. Additionally, a systematic meta-analysis explored the value of MRI-based hippocampal volume measurement in predicting AD occurrence and progression, drawing on 23 eligible articles from PubMed and Embase databases. RESULTS InceptionResNetv2 outperformed other networks, achieving 99.75% accuracy and 100% AUC for AD-NC classification and 99.16% accuracy and 100% AUC for MCI-NC classification. Notably, at a 512×512 size, InceptionResNetv2 demonstrated a classification accuracy of 94.29% and an AUC of 98% for AD-NC and 97.31% accuracy and 98% AUC for MCI-NC. CONCLUSIONS The study concludes that MRI-based hippocampal volume changes effectively predict AD onset and progression, facilitating early intervention and prevention.
Affiliation(s)
- Jianguo Zhou
- Department of Radiology, Lianyungang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Lianyungang, China
| | - Mingli Zhao
- Department of Radiology, The Fourth People's Hospital of Lianyungang Affiliated to Nanjing Medical University Kangda, Lianyungang, China
| | - Zhou Yang
- Department of Rehabilitation, Lianyungang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Lianyungang, China
| | - Liping Chen
- Department of Rehabilitation, Lianyungang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Lianyungang, China
| | - Xiaoli Liu
- Department of Rehabilitation, Lianyungang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Lianyungang, China
27
Liu F, Zhu T, Wu X, Yang B, You C, Wang C, Lu L, Liu Z, Zheng Y, Sun X, Yang Y, Clifton L, Clifton DA. A medical multimodal large language model for future pandemics. NPJ Digit Med 2023; 6:226. [PMID: 38042919 PMCID: PMC10693607 DOI: 10.1038/s41746-023-00952-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2023] [Accepted: 10/24/2023] [Indexed: 12/04/2023] Open
Abstract
Deep neural networks have been integrated into the whole clinical decision procedure which can improve the efficiency of diagnosis and alleviate the heavy workload of physicians. Since most neural networks are supervised, their performance heavily depends on the volume and quality of available labels. However, few such labels exist for rare diseases (e.g., new pandemics). Here we report a medical multimodal large language model (Med-MLLM) for radiograph representation learning, which can learn broad medical knowledge (e.g., image understanding, text semantics, and clinical phenotypes) from unlabelled data. As a result, when encountering a rare disease, our Med-MLLM can be rapidly deployed and easily adapted to them with limited labels. Furthermore, our model supports medical data across visual modality (e.g., chest X-ray and CT) and textual modality (e.g., medical report and free-text clinical note); therefore, it can be used for clinical tasks that involve both visual and textual data. We demonstrate the effectiveness of our Med-MLLM by showing how it would perform using the COVID-19 pandemic "in replay". In the retrospective setting, we test the model on the early COVID-19 datasets; and in the prospective setting, we test the model on the new variant COVID-19-Omicron. The experiments are conducted on 1) three kinds of input data; 2) three kinds of downstream tasks, including disease reporting, diagnosis, and prognosis; 3) five COVID-19 datasets; and 4) three different languages, including English, Chinese, and Spanish. All experiments show that our model can make accurate and robust COVID-19 decision-support with little labelled data.
Affiliation(s)
- Fenglin Liu
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK.
| | - Tingting Zhu
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
| | - Xian Wu
- Jarvis Research Center, Tencent YouTu Lab, Beijing, China
| | - Bang Yang
- School of Computer Science, Peking University, Beijing, China
| | | | - Chenyang Wang
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
| | - Lei Lu
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
| | - Zhangdaihong Liu
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Oxford-Suzhou Centre for Advanced Research, Suzhou, China
| | - Yefeng Zheng
- Jarvis Research Center, Tencent YouTu Lab, Beijing, China
| | - Xu Sun
- School of Computer Science, Peking University, Beijing, China
| | - Yang Yang
- School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Lei Clifton
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - David A Clifton
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK.
- Oxford-Suzhou Centre for Advanced Research, Suzhou, China.
28
Lin G, Zhang Z, Long K, Zhang Y, Lu Y, Geng J, Zhou Z, Feng Q, Lu L, Cao L. GCLR: A self-supervised representation learning pretext task for glomerular filtration barrier segmentation in TEM images. Artif Intell Med 2023; 146:102720. [PMID: 38042604 DOI: 10.1016/j.artmed.2023.102720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2022] [Revised: 10/04/2023] [Accepted: 11/14/2023] [Indexed: 12/04/2023]
Abstract
Automatic segmentation of the three substructures of glomerular filtration barrier (GFB) in transmission electron microscopy (TEM) images holds immense potential for aiding pathologists in renal disease diagnosis. However, the labor-intensive nature of manual annotations limits the training data for a fully-supervised deep learning model. Addressing this, our study harnesses self-supervised representation learning (SSRL) to utilize vast unlabeled data and mitigate annotation scarcity. Our innovation, GCLR, is a hybrid pixel-level pretext task tailored for GFB segmentation, integrating two subtasks: global clustering (GC) and local restoration (LR). GC captures the overall GFB by learning global context representations, while LR refines three substructures by learning local detail representations. Experiments on 18,928 unlabeled glomerular TEM images for self-supervised pre-training and 311 labeled ones for fine-tuning demonstrate that our proposed GCLR obtains the state-of-the-art segmentation results for all three substructures of GFB with the Dice similarity coefficient of 86.56 ± 0.16%, 75.56 ± 0.36%, and 79.41 ± 0.16%, respectively, compared with other representative self-supervised pretext tasks. Our proposed GCLR also outperforms the fully-supervised pre-training methods based on the three large-scale public datasets - MitoEM, COCO, and ImageNet - with less training data and time.
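Schematically, a hybrid pixel-level pretext task of this kind can be written as a weighted sum of a global clustering term and a local restoration term; the sketch below is only that schematic, with placeholder prototypes and pseudo-labels, and is not the GCLR implementation.

```python
# Schematic two-term pretext loss: global clustering (cross-entropy against
# cluster assignments) plus local restoration (pixel reconstruction).
import torch
import torch.nn.functional as F

def clustering_plus_restoration_loss(features, prototypes, assignments,
                                     restored, original,
                                     temperature=0.1, w_gc=1.0, w_lr=1.0):
    """features: (N, D) embeddings; prototypes: (K, D) cluster centers;
    assignments: (N,) integer pseudo-labels; restored/original: image tensors."""
    feats = F.normalize(features, dim=1)
    protos = F.normalize(prototypes, dim=1)
    gc = F.cross_entropy(feats @ protos.t() / temperature, assignments)  # global clustering
    lr = F.mse_loss(restored, original)                                  # local restoration
    return w_gc * gc + w_lr * lr
```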
Affiliation(s)
- Guoyu Lin
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
| | - Zhentai Zhang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
| | - Kaixing Long
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
| | - Yiwen Zhang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
| | - Yanmeng Lu
- Central Laboratory, Southern Medical University, Guangzhou, 510515, China
| | - Jian Geng
- Department of Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, 510515, China; Guangzhou Huayin Medical Laboratory Center, Guangzhou, 510515, China
| | - Zhitao Zhou
- Central Laboratory, Southern Medical University, Guangzhou, 510515, China
| | - Qianjin Feng
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
| | - Lijun Lu
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China.
| | - Lei Cao
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China.
29
Taher MRH, Gotway MB, Liang J. Towards Foundation Models Learned from Anatomy in Medical Imaging via Self-supervision. DOMAIN ADAPTATION AND REPRESENTATION TRANSFER: 5TH MICCAI WORKSHOP, DART 2023, HELD IN CONJUNCTION WITH MICCAI 2023, VANCOUVER, BC, CANADA, OCTOBER 12, 2023, PROCEEDINGS 2023; 14293:94-104. [PMID: 38752223 PMCID: PMC11095552 DOI: 10.1007/978-3-031-45857-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/18/2024]
Abstract
Human anatomy is the foundation of medical imaging and boasts one striking characteristic: its hierarchy in nature, exhibiting two intrinsic properties: (1) locality: each anatomical structure is morphologically distinct from the others; and (2) compositionality: each anatomical structure is an integrated part of a larger whole. We envision a foundation model for medical imaging that is consciously and purposefully developed upon this foundation to gain the capability of "understanding" human anatomy and to possess the fundamental properties of medical imaging. As our first step in realizing this vision towards foundation models in medical imaging, we devise a novel self-supervised learning (SSL) strategy that exploits the hierarchical nature of human anatomy. Our extensive experiments demonstrate that the SSL pretrained model, derived from our training strategy, not only outperforms state-of-the-art (SOTA) fully/self-supervised baselines but also enhances annotation efficiency, offering potential few-shot segmentation capabilities with performance improvements ranging from 9% to 30% for segmentation tasks compared to SSL baselines. This performance is attributed to the significance of anatomy comprehension via our learning strategy, which encapsulates the intrinsic attributes of anatomical structures-locality and compositionality-within the embedding space, yet overlooked in existing SSL methods. All code and pretrained models are available at GitHub.com/JLiangLab/Eden.
30
Kazemimoghadam M, Yang Z, Chen M, Ma L, Lu W, Gu X. Leveraging global binary masks for structure segmentation in medical images. Phys Med Biol 2023; 68:10.1088/1361-6560/acf2e2. [PMID: 37607564 PMCID: PMC10511220 DOI: 10.1088/1361-6560/acf2e2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2022] [Accepted: 08/22/2023] [Indexed: 08/24/2023]
Abstract
Deep learning (DL) models for medical image segmentation are highly influenced by intensity variations of input images and lack generalization due to primarily utilizing pixels' intensity information for inference. Acquiring sufficient training data is another challenge limiting models' applications. Here, we proposed to leverage the consistency of organs' anatomical position and shape information in medical images. We introduced a framework leveraging recurring anatomical patterns through global binary masks for organ segmentation. Two scenarios were studied: (1) global binary masks were the only input for the U-Net based model, forcing exclusively encoding organs' position and shape information for rough segmentation or localization. (2) Global binary masks were incorporated as an additional channel providing position/shape clues to mitigate training data scarcity. Two datasets of the brain and heart computed tomography (CT) images with their ground-truth were split into (26:10:10) and (12:3:5) for training, validation, and test respectively. The two scenarios were evaluated using the full training split as well as reduced subsets of training data. In scenario (1), training exclusively on global binary masks led to Dice scores of 0.77 ± 0.06 and 0.85 ± 0.04 for the brain and heart structures respectively. Average Euclidean distances of 3.12 ± 1.43 mm and 2.5 ± 0.93 mm were obtained relative to the center of mass of the ground truth for the brain and heart structures respectively. The outcomes indicated that global binary masks encode a surprising degree of position and shape information. In scenario (2), incorporating global binary masks led to significantly higher accuracy relative to the model trained on only CT images in small subsets of training data; the performance improved by 4.3%-125.3% and 1.3%-48.1% for 1-8 training cases of the brain and heart datasets respectively. The findings imply the advantages of utilizing global binary masks for building models that are robust to image intensity variations as well as an effective approach to boost performance when access to labeled training data is highly limited.
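Scenario (2) above amounts to concatenating the global binary mask to the image as an extra input channel; a minimal wrapper is sketched below, assuming a segmentation network whose first layer accepts two input channels (the wrapper and names are illustrative, not the authors' code).

```python
# Feed a global binary mask as a second input channel alongside the CT image.
import torch
import torch.nn as nn

class MaskAugmentedInput(nn.Module):
    """Wraps any segmentation network whose first layer expects 2 channels."""
    def __init__(self, seg_net):
        super().__init__()
        self.seg_net = seg_net

    def forward(self, ct, global_mask):
        # ct: (B, 1, H, W) intensities; global_mask: (B, 1, H, W) in {0, 1}
        x = torch.cat([ct, global_mask], dim=1)   # position/shape clue as 2nd channel
        return self.seg_net(x)
```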
Affiliation(s)
- Mahdieh Kazemimoghadam
- Department of Radiation Oncology, the University of Texas Southwestern Medical Center, Dallas TX, 75390 USA
| | - Zi Yang
- Department of Radiation Oncology, the University of Texas Southwestern Medical Center, Dallas TX, 75390 USA
| | - Mingli Chen
- Department of Radiation Oncology, the University of Texas Southwestern Medical Center, Dallas TX, 75390 USA
| | - Lin Ma
- Department of Radiation Oncology, the University of Texas Southwestern Medical Center, Dallas TX, 75390 USA
| | - Weiguo Lu
- Department of Radiation Oncology, the University of Texas Southwestern Medical Center, Dallas TX, 75390 USA
| | - Xuejun Gu
- Department of Radiation Oncology, the University of Texas Southwestern Medical Center, Dallas TX, 75390 USA
- Department of Radiation Oncology, Stanford University, Stanford, CA 94305
31
Chen Y, Lu X, Xie Q. Collaborative networks of transformers and convolutional neural networks are powerful and versatile learners for accurate 3D medical image segmentation. Comput Biol Med 2023; 164:107228. [PMID: 37473563 DOI: 10.1016/j.compbiomed.2023.107228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 06/13/2023] [Accepted: 07/01/2023] [Indexed: 07/22/2023]
Abstract
Integrating transformers and convolutional neural networks represents a crucial and cutting-edge approach for tackling medical image segmentation problems. Nonetheless, the existing hybrid methods fail to fully leverage the strengths of both operators. During the Patch Embedding, the patch projection method ignores the two-dimensional structure and local spatial information within each patch, while the fixed patch size cannot capture features with rich representation effectively. Moreover, the calculation of self-attention results in attention diffusion, hindering the provision of precise details to the decoder while maintaining feature consistency. Lastly, none of the existing methods establish an efficient multi-scale modeling concept. To address these issues, we design the Collaborative Networks of Transformers and Convolutional neural networks (TC-CoNet), which is generally used for accurate 3D medical image segmentation. First, we elaborately design precise patch embedding to generate 3D features with accurate spatial position information, laying a solid foundation for subsequent learning. The encoder-decoder backbone network is then constructed by TC-CoNet in an interlaced combination to properly incorporate long-range dependencies and hierarchical object concepts at various scales. Furthermore, we employ the constricted attention bridge to constrict attention to local features, allowing us to accurately guide the recovery of detailed information while maintaining feature consistency. Finally, atrous spatial pyramid pooling is applied to high-level feature map to establish the concept of multi-scale objects. On five challenging datasets, including Synapse, ACDC, brain tumor segmentation, cardiac left atrium segmentation, and lung tumor segmentation, the extensive experiments demonstrate that TC-CoNet outperforms state-of-the-art approaches in terms of superiority, migration, and strong generalization. These illustrate in full the efficacy of the proposed transformers and convolutional neural networks combination for medical image segmentation. Our code is freely available at: https://github.com/YongChen-Exact/TC-CoNet.
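One of the standard components mentioned above, atrous spatial pyramid pooling (ASPP), is sketched below in a compact 2D form for illustration (TC-CoNet itself operates on 3D volumes); channel sizes and dilation rates are placeholders rather than the authors' configuration.

```python
# Compact 2D atrous spatial pyramid pooling: parallel dilated convolutions
# capture multi-scale context, then a 1x1 projection fuses the branches.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```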
Affiliation(s)
- Yong Chen
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, Hubei, China
| | - Xuesong Lu
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, Hubei, China
| | - Qinlan Xie
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, Hubei, China.
32
Huang SC, Pareek A, Jensen M, Lungren MP, Yeung S, Chaudhari AS. Self-supervised learning for medical image classification: a systematic review and implementation guidelines. NPJ Digit Med 2023; 6:74. [PMID: 37100953 PMCID: PMC10131505 DOI: 10.1038/s41746-023-00811-0] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2022] [Accepted: 03/30/2023] [Indexed: 04/28/2023] Open
Abstract
Advancements in deep learning and computer vision provide promising solutions for medical image analysis, potentially improving healthcare and patient outcomes. However, the prevailing paradigm of training deep learning models requires large quantities of labeled training data, which is both time-consuming and cost-prohibitive to curate for medical images. Self-supervised learning has the potential to make significant contributions to the development of robust medical imaging models through its ability to learn useful insights from copious medical datasets without labels. In this review, we provide consistent descriptions of different self-supervised learning strategies and compose a systematic review of papers published between 2012 and 2022 on PubMed, Scopus, and ArXiv that applied self-supervised learning to medical imaging classification. We screened a total of 412 relevant studies and included 79 papers for data extraction and analysis. With this comprehensive effort, we synthesize the collective knowledge of prior work and provide implementation guidelines for future researchers interested in applying self-supervised learning to their development of medical imaging classification models.
Affiliation(s)
- Shih-Cheng Huang
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA.
| | - Anuj Pareek
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA
| | - Malte Jensen
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Matthew P Lungren
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
| | - Serena Yeung
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, CA, USA
| | - Akshay S Chaudhari
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
- Stanford Cardiovascular Institute, Stanford University, Stanford, CA, USA
33
Yang Z, Xie L, Zhou W, Huo X, Wei L, Lu J, Tian Q, Tang S. VoxSeP: semi-positive voxels assist self-supervised 3D medical segmentation. MULTIMEDIA SYSTEMS 2023; 29:33-48. [DOI: 10.1007/s00530-022-00977-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Accepted: 06/28/2022] [Indexed: 01/23/2025]
34
Bilic P, Christ P, Li HB, Vorontsov E, Ben-Cohen A, Kaissis G, Szeskin A, Jacobs C, Mamani GEH, Chartrand G, Lohöfer F, Holch JW, Sommer W, Hofmann F, Hostettler A, Lev-Cohain N, Drozdzal M, Amitai MM, Vivanti R, Sosna J, Ezhov I, Sekuboyina A, Navarro F, Kofler F, Paetzold JC, Shit S, Hu X, Lipková J, Rempfler M, Piraud M, Kirschke J, Wiestler B, Zhang Z, Hülsemeyer C, Beetz M, Ettlinger F, Antonelli M, Bae W, Bellver M, Bi L, Chen H, Chlebus G, Dam EB, Dou Q, Fu CW, Georgescu B, Giró-I-Nieto X, Gruen F, Han X, Heng PA, Hesser J, Moltz JH, Igel C, Isensee F, Jäger P, Jia F, Kaluva KC, Khened M, Kim I, Kim JH, Kim S, Kohl S, Konopczynski T, Kori A, Krishnamurthi G, Li F, Li H, Li J, Li X, Lowengrub J, Ma J, Maier-Hein K, Maninis KK, Meine H, Merhof D, Pai A, Perslev M, Petersen J, Pont-Tuset J, Qi J, Qi X, Rippel O, Roth K, Sarasua I, Schenk A, Shen Z, Torres J, Wachinger C, Wang C, Weninger L, Wu J, Xu D, Yang X, Yu SCH, Yuan Y, Yue M, Zhang L, Cardoso J, Bakas S, Braren R, Heinemann V, Pal C, Tang A, Kadoury S, Soler L, van Ginneken B, Greenspan H, Joskowicz L, Menze B. The Liver Tumor Segmentation Benchmark (LiTS). Med Image Anal 2023; 84:102680. [PMID: 36481607 PMCID: PMC10631490 DOI: 10.1016/j.media.2022.102680] [Citation(s) in RCA: 175] [Impact Index Per Article: 87.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2021] [Revised: 09/27/2022] [Accepted: 10/29/2022] [Indexed: 11/18/2022]
Abstract
In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances with various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that not a single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dices scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research. LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in http://medicaldecathlon.com/. In addition, both data and online evaluation are accessible via https://competitions.codalab.org/competitions/17094.
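The lesion-wise recall reported above counts a ground-truth lesion as detected when any predicted voxel overlaps its connected component; a minimal version of that evaluation is sketched below using SciPy's connected-component labelling (the exact LiTS evaluation protocol may differ in details such as overlap thresholds).

```python
# Lesion-wise recall: fraction of ground-truth connected components touched by
# the prediction.
import numpy as np
from scipy import ndimage

def lesion_wise_recall(pred, truth):
    labeled, n_lesions = ndimage.label(truth.astype(bool))
    if n_lesions == 0:
        return float("nan")                    # undefined when there is no lesion
    detected = sum(
        1 for i in range(1, n_lesions + 1)
        if np.logical_and(labeled == i, pred.astype(bool)).any()
    )
    return detected / n_lesions
```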
Affiliation(s)
- Patrick Bilic
- Department of Informatics, Technical University of Munich, Germany
| | - Patrick Christ
- Department of Informatics, Technical University of Munich, Germany
| | - Hongwei Bran Li
- Department of Informatics, Technical University of Munich, Germany; Department of Quantitative Biomedicine, University of Zurich, Switzerland.
| | | | - Avi Ben-Cohen
- Department of Biomedical Engineering, Tel-Aviv University, Israel
| | - Georgios Kaissis
- Institute for AI in Medicine, Technical University of Munich, Germany; Institute for diagnostic and interventional radiology, Klinikum rechts der Isar, Technical University of Munich, Germany; Department of Computing, Imperial College London, London, United Kingdom
| | - Adi Szeskin
- School of Computer Science and Engineering, the Hebrew University of Jerusalem, Israel
| | - Colin Jacobs
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
| | | | - Gabriel Chartrand
- The University of Montréal Hospital Research Centre (CRCHUM) Montréal, Québec, Canada
| | - Fabian Lohöfer
- Institute for diagnostic and interventional radiology, Klinikum rechts der Isar, Technical University of Munich, Germany
| | - Julian Walter Holch
- Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; Comprehensive Cancer Center Munich, Munich, Germany; Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Wieland Sommer
- Department of Radiology, University Hospital, LMU Munich, Germany
| | - Felix Hofmann
- Department of General, Visceral and Transplantation Surgery, University Hospital, LMU Munich, Germany; Department of Radiology, University Hospital, LMU Munich, Germany
| | - Alexandre Hostettler
- Department of Surgical Data Science, Institut de Recherche contre les Cancers de l'Appareil Digestif (IRCAD), France
| | - Naama Lev-Cohain
- Department of Radiology, Hadassah University Medical Center, Jerusalem, Israel
| | | | | | | | - Jacob Sosna
- Department of Radiology, Hadassah University Medical Center, Jerusalem, Israel
| | - Ivan Ezhov
- Department of Informatics, Technical University of Munich, Germany
| | - Anjany Sekuboyina
- Department of Informatics, Technical University of Munich, Germany; Department of Quantitative Biomedicine, University of Zurich, Switzerland
| | - Fernando Navarro
- Department of Informatics, Technical University of Munich, Germany; Department of Radiation Oncology and Radiotherapy, Klinikum rechts der Isar, Technical University of Munich, Germany; TranslaTUM - Central Institute for Translational Cancer Research, Technical University of Munich, Germany
| | - Florian Kofler
- Department of Informatics, Technical University of Munich, Germany; Institute for diagnostic and interventional neuroradiology, Klinikum rechts der Isar,Technical University of Munich, Germany; Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany; TranslaTUM - Central Institute for Translational Cancer Research, Technical University of Munich, Germany
| | - Johannes C Paetzold
- Department of Computing, Imperial College London, London, United Kingdom; Institute for Tissue Engineering and Regenerative Medicine, Helmholtz Zentrum München, Neuherberg, Germany
| | - Suprosanna Shit
- Department of Informatics, Technical University of Munich, Germany
| | - Xiaobin Hu
- Department of Informatics, Technical University of Munich, Germany
| | - Jana Lipková
- Brigham and Women's Hospital, Harvard Medical School, USA
| | - Markus Rempfler
- Department of Informatics, Technical University of Munich, Germany
| | - Marie Piraud
- Department of Informatics, Technical University of Munich, Germany; Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany
| | - Jan Kirschke
- Institute for diagnostic and interventional neuroradiology, Klinikum rechts der Isar,Technical University of Munich, Germany
| | - Benedikt Wiestler
- Institute for diagnostic and interventional neuroradiology, Klinikum rechts der Isar,Technical University of Munich, Germany
| | - Zhiheng Zhang
- Department of Hepatobiliary Surgery, the Affiliated Drum Tower Hospital of Nanjing University Medical School, China
| | | | - Marcel Beetz
- Department of Informatics, Technical University of Munich, Germany
| | | | - Michela Antonelli
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
| | | | | | - Lei Bi
- School of Computer Science, the University of Sydney, Australia
| | - Hao Chen
- Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, China
| | - Grzegorz Chlebus
- Fraunhofer MEVIS, Bremen, Germany; Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Erik B Dam
- Department of Computer Science, University of Copenhagen, Denmark
| | - Qi Dou
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
| | - Chi-Wing Fu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
| | | | - Xavier Giró-I-Nieto
- Signal Theory and Communications Department, Universitat Politecnica de Catalunya, Catalonia, Spain
| | - Felix Gruen
- Institute of Control Engineering, Technische Universität Braunschweig, Germany
| | - Xu Han
- Department of computer science, UNC Chapel Hill, USA
| | - Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
| | - Jürgen Hesser
- Mannheim Institute for Intelligent Systems in Medicine, department of Medicine Mannheim, Heidelberg University, Germany; Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Germany; Central Institute for Computer Engineering (ZITI), Heidelberg University, Germany
| | | | - Christian Igel
- Department of Computer Science, University of Copenhagen, Denmark
| | - Fabian Isensee
- Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany; Helmholtz Imaging, Germany
| | - Paul Jäger
- Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany; Helmholtz Imaging, Germany
| | - Fucang Jia
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China
| | - Krishna Chaitanya Kaluva
- Medical Imaging and Reconstruction Lab, Department of Engineering Design, Indian Institute of Technology Madras, India
| | - Mahendra Khened
- Medical Imaging and Reconstruction Lab, Department of Engineering Design, Indian Institute of Technology Madras, India
| | | | - Jae-Hun Kim
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, South Korea
| | | | - Simon Kohl
- Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Tomasz Konopczynski
- Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Germany
| | - Avinash Kori
- Medical Imaging and Reconstruction Lab, Department of Engineering Design, Indian Institute of Technology Madras, India
| | - Ganapathy Krishnamurthi
- Medical Imaging and Reconstruction Lab, Department of Engineering Design, Indian Institute of Technology Madras, India
| | - Fan Li
- Sensetime, Shanghai, China
| | - Hongchao Li
- Department of Computer Science, Guangdong University of Foreign Studies, China
| | - Junbo Li
- Philips Research China, Philips China Innovation Campus, Shanghai, China
| | - Xiaomeng Li
- Department of Electrical and Electronic Engineering, The University of Hong Kong, China
| | - John Lowengrub
- Departments of Mathematics, Biomedical Engineering, University of California, Irvine, USA; Center for Complex Biological Systems, University of California, Irvine, USA; Chao Family Comprehensive Cancer Center, University of California, Irvine, USA
| | - Jun Ma
- Department of Mathematics, Nanjing University of Science and Technology, China
| | - Klaus Maier-Hein
- Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany; Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany; Helmholtz Imaging, Germany
| | | | - Hans Meine
- Fraunhofer MEVIS, Bremen, Germany; Medical Image Computing Group, FB3, University of Bremen, Germany
| | - Dorit Merhof
- Institute of Imaging & Computer Vision, RWTH Aachen University, Germany
| | - Akshay Pai
- Department of Computer Science, University of Copenhagen, Denmark
| | - Mathias Perslev
- Department of Computer Science, University of Copenhagen, Denmark
| | - Jens Petersen
- Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Jordi Pont-Tuset
- Eidgenössische Technische Hochschule Zurich (ETHZ), Zurich, Switzerland
| | - Jin Qi
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, China
| | - Xiaojuan Qi
- Department of Electrical and Electronic Engineering, The University of Hong Kong, China
| | - Oliver Rippel
- Institute of Imaging & Computer Vision, RWTH Aachen University, Germany
| | | | - Ignacio Sarasua
- Institute for diagnostic and interventional radiology, Klinikum rechts der Isar, Technical University of Munich, Germany; Department of Child and Adolescent Psychiatry, Ludwig-Maximilians-Universität, Munich, Germany
| | - Andrea Schenk
- Fraunhofer MEVIS, Bremen, Germany; Institute for Diagnostic and Interventional Radiology, Hannover Medical School, Hannover, Germany
| | - Zengming Shen
- Beckman Institute, University of Illinois at Urbana-Champaign, USA; Siemens Healthineers, USA
| | - Jordi Torres
- Barcelona Supercomputing Center, Barcelona, Spain; Universitat Politecnica de Catalunya, Catalonia, Spain
| | - Christian Wachinger
- Department of Informatics, Technical University of Munich, Germany; Institute for diagnostic and interventional radiology, Klinikum rechts der Isar, Technical University of Munich, Germany; Department of Child and Adolescent Psychiatry, Ludwig-Maximilians-Universität, Munich, Germany
| | - Chunliang Wang
- Department of Biomedical Engineering and Health Systems, KTH Royal Institute of Technology, Sweden
| | - Leon Weninger
- Institute of Imaging & Computer Vision, RWTH Aachen University, Germany
| | - Jianrong Wu
- Tencent Healthcare (Shenzhen) Co., Ltd, China
| | | | - Xiaoping Yang
- Department of Mathematics, Nanjing University, China
| | - Simon Chun-Ho Yu
- Department of Imaging and Interventional Radiology, Chinese University of Hong Kong, Hong Kong, China
| | - Yading Yuan
- Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, NY, USA
| | - Miao Yue
- CGG Services (Singapore) Pte. Ltd., Singapore
| | - Liping Zhang
- Department of Imaging and Interventional Radiology, Chinese University of Hong Kong, Hong Kong, China
| | - Jorge Cardoso
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
| | - Spyridon Bakas
- Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, PA, USA; Department of Radiology, Perelman School of Medicine, University of Pennsylvania, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA, USA
| | - Rickmer Braren
- German Cancer Consortium (DKTK), Germany; Institute for diagnostic and interventional radiology, Klinikum rechts der Isar, Technical University of Munich, Germany; Comprehensive Cancer Center Munich, Munich, Germany
| | - Volker Heinemann
- Department of Hematology/Oncology & Comprehensive Cancer Center Munich, LMU Klinikum Munich, Germany
| | | | - An Tang
- Department of Radiology, Radiation Oncology and Nuclear Medicine, University of Montréal, Canada
| | | | - Luc Soler
- Department of Surgical Data Science, Institut de Recherche contre les Cancers de l'Appareil Digestif (IRCAD), France
| | - Bram van Ginneken
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Hayit Greenspan
- Department of Biomedical Engineering, Tel-Aviv University, Israel
| | - Leo Joskowicz
- School of Computer Science and Engineering, the Hebrew University of Jerusalem, Israel
| | - Bjoern Menze
- Department of Informatics, Technical University of Munich, Germany; Department of Quantitative Biomedicine, University of Zurich, Switzerland
35
Zhou HY, Lu C, Wang L, Yu Y. GraVIS: Grouping Augmented Views From Independent Sources for Dermatology Analysis. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:3498-3508. [PMID: 36260573 DOI: 10.1109/tmi.2022.3216005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Self-supervised representation learning has been extremely successful in medical image analysis, as it requires no human annotations to provide transferable representations for downstream tasks. Recent self-supervised learning methods are dominated by noise-contrastive estimation (NCE, also known as contrastive learning), which aims to learn invariant visual representations by contrasting one homogeneous image pair with a large number of heterogeneous image pairs in each training step. Nonetheless, NCE-based approaches still suffer from one major problem: a single homogeneous pair is not enough to extract robust and invariant semantic information. Inspired by the archetypical triplet loss, we propose GraVIS, which is specifically optimized for learning self-supervised features from dermatology images, to group homogeneous dermatology images while separating heterogeneous ones. In addition, a hardness-aware attention mechanism is introduced to emphasize homogeneous image views with similar appearance over dissimilar homogeneous ones. GraVIS significantly outperforms its transfer learning and self-supervised learning counterparts in both lesion segmentation and disease classification tasks, sometimes by 5 percentage points under extremely limited supervision. More importantly, when equipped with the pre-trained weights provided by GraVIS, a single model achieves better results than the winners of the well-known ISIC 2017 challenge, who relied heavily on ensemble strategies. Code is available at https://bit.ly/3xiFyjx.
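As an illustration of the grouping idea sketched in this abstract, the snippet below is a minimal, hedged reconstruction of such an objective: it pulls several augmented views of the same dermatology image together, pushes views of other images apart, and applies a hardness-aware weight over homogeneous pairs. The view count, temperature, and softmax-based weighting are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch (not the authors' code): a grouping objective over M augmented
# views per image with a hardness-aware weight on homogeneous pairs.
import torch
import torch.nn.functional as F

def grouping_loss(embeddings: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """embeddings: (N, M, D) -- N images, M homogeneous views each, D-dim features."""
    n, m, d = embeddings.shape
    z = F.normalize(embeddings.reshape(n * m, d), dim=1)       # unit-norm features
    sim = z @ z.t() / temperature                              # pairwise similarities
    group = torch.arange(n).repeat_interleave(m)               # group id of each view
    pos_mask = (group[:, None] == group[None, :])              # homogeneous pairs
    self_mask = torch.eye(n * m, dtype=torch.bool)
    pos_mask = pos_mask & ~self_mask

    # Assumed hardness-aware weighting: upweight homogeneous pairs that already
    # look similar, matching the emphasis described in the abstract.
    with torch.no_grad():
        weights = torch.softmax(sim.masked_fill(~pos_mask, float("-inf")), dim=1)

    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")),
                                     dim=1, keepdim=True)
    return -(weights * log_prob).masked_fill(~pos_mask, 0.0).sum(dim=1).mean()

# Toy usage: 8 images, 3 augmented views each, 128-dim embeddings.
loss = grouping_loss(torch.randn(8, 3, 128))
```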
Collapse
|
36
|
Pang J, Haghighi F, Ma D, Islam NU, Taher MRH, Gotway MB, Liang J. POPAR: Patch Order Prediction and Appearance Recovery for Self-supervised Medical Image Analysis. DOMAIN ADAPTATION AND REPRESENTATION TRANSFER : 4TH MICCAI WORKSHOP, DART 2022, HELD IN CONJUNCTION WITH MICCAI 2022, SINGAPORE, SEPTEMBER 22, 2022, PROCEEDINGS 2022; 13542:77-87. [PMID: 36507898 PMCID: PMC9728135 DOI: 10.1007/978-3-031-16852-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]
Abstract
Vision transformer-based self-supervised learning (SSL) approaches have recently shown substantial success in learning visual representations from unannotated photographic images. However, their acceptance in medical imaging is still lukewarm, due to the significant discrepancy between medical and photographic images. Consequently, we propose POPAR (patch order prediction and appearance recovery), a novel vision transformer-based self-supervised learning framework for chest X-ray images. POPAR leverages the benefits of vision transformers and unique properties of medical imaging, aiming to simultaneously learn patch-wise high-level contextual features by correcting shuffled patch orders and fine-grained features by recovering patch appearance. We transfer POPAR pretrained models to diverse downstream tasks. The experimental results suggest that (1) POPAR outperforms state-of-the-art (SoTA) self-supervised models with a vision transformer backbone; (2) POPAR achieves significantly better performance than all three SoTA contrastive learning methods; and (3) POPAR also outperforms fully-supervised pretrained models across architectures. In addition, our ablation study suggests that to achieve better performance on medical imaging tasks, both fine-grained and global contextual features are preferred. All code and models are available at GitHub.com/JLiangLab/POPAR.
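To make the two pretext tasks concrete, here is a toy, hedged sketch of a transformer over flattened patches with an order-prediction head and an appearance-recovery head; the tiny encoder, the noise-based appearance distortion, and all dimensions are assumptions for illustration and do not reproduce the released POPAR models.

```python
# Hedged sketch (assumptions throughout, not the released POPAR models): a toy
# transformer over flattened patches with an order-prediction head and an
# appearance-recovery head.
import torch
import torch.nn as nn

class ToyPOPAR(nn.Module):
    def __init__(self, num_patches: int, patch_dim: int, embed_dim: int = 256):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.order_head = nn.Linear(embed_dim, num_patches)    # which position was this patch?
        self.recover_head = nn.Linear(embed_dim, patch_dim)    # what did this patch look like?

    def forward(self, patches):                                # patches: (B, N, patch_dim)
        feats = self.encoder(self.embed(patches))
        return self.order_head(feats), self.recover_head(feats)

def popar_style_losses(model, patches, noise_std=0.1):
    """patches: (B, N, patch_dim) flattened image patches in their true order."""
    b, n, _ = patches.shape
    perm = torch.stack([torch.randperm(n) for _ in range(b)])  # per-image shuffle
    idx = perm.unsqueeze(-1).expand_as(patches)
    shuffled = torch.gather(patches, 1, idx)                   # shuffle patch order
    corrupted = shuffled + noise_std * torch.randn_like(shuffled)   # distort appearance
    order_logits, recovered = model(corrupted)
    order_loss = nn.functional.cross_entropy(order_logits.reshape(b * n, n),
                                             perm.reshape(b * n))
    recover_loss = nn.functional.mse_loss(recovered, shuffled)      # recover clean patches
    return order_loss, recover_loss

# Toy usage: 4 images, 49 patches of dimension 192 each.
model = ToyPOPAR(num_patches=49, patch_dim=192)
order_loss, recover_loss = popar_style_losses(model, torch.randn(4, 49, 192))
```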
Collapse
Affiliation(s)
| | | | - DongAo Ma
- Arizona State University, Tempe, AZ 85281, USA
| | | | | | | | | |
Collapse
|
37
|
Ma D, Taher MRH, Pang J, Islam NU, Haghighi F, Gotway MB, Liang J. Benchmarking and Boosting Transformers for Medical Image Classification. DOMAIN ADAPTATION AND REPRESENTATION TRANSFER : 4TH MICCAI WORKSHOP, DART 2022, HELD IN CONJUNCTION WITH MICCAI 2022, SINGAPORE, SEPTEMBER 22, 2022, PROCEEDINGS 2022; 13542:12-22. [PMID: 36383492 PMCID: PMC9646404 DOI: 10.1007/978-3-031-16852-9_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Visual transformers have recently gained popularity in the computer vision community as they began to outrank convolutional neural networks (CNNs) in one representative visual benchmark after another. However, the competition between visual transformers and CNNs in medical imaging is rarely studied, leaving many important questions unanswered. As the first step, we benchmark how well existing transformer variants that use various (supervised and self-supervised) pre-training methods perform against CNNs on a variety of medical classification tasks. Furthermore, given the data-hungry nature of transformers and the annotation-deficiency challenge of medical imaging, we present a practical approach for bridging the domain gap between photographic and medical images by utilizing unlabeled large-scale in-domain data. Our extensive empirical evaluations reveal the following insights in medical imaging: (1) good initialization is more crucial for transformer-based models than for CNNs, (2) self-supervised learning based on masked image modeling captures more generalizable representations than supervised models, and (3) assembling a larger-scale domain-specific dataset can better bridge the domain gap between photographic and medical images via self-supervised continual pre-training. We hope this benchmark study can direct future research on applying transformers to medical image analysis. All codes and pre-trained models are available on our GitHub page https://github.com/JLiangLab/BenchmarkTransformers.
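The domain-adaptive recipe described here can be pictured roughly as follows: initialize from ImageNet weights, then keep pre-training on unlabeled in-domain medical images with a self-supervised objective before fine-tuning. The sketch below uses torchvision's vit_b_16 as the backbone and a crude mask-and-reconstruct objective with a low-resolution target purely as a stand-in; the paper's actual pretext task, datasets, and schedules are not reproduced.

```python
# Hedged sketch of domain-adaptive continual pre-training: start from ImageNet
# weights and keep pre-training on unlabeled in-domain medical images with a
# stand-in self-supervised objective (mask half the image, reconstruct a
# low-resolution target). Backbone, masking, and target size are assumptions.
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.vit_b_16(weights="DEFAULT")   # supervised ImageNet init
backbone.heads = nn.Identity()                               # expose the 768-d [CLS] feature
decoder = nn.Sequential(nn.Linear(768, 1024), nn.ReLU(), nn.Linear(1024, 3 * 64 * 64))
optimizer = torch.optim.AdamW(list(backbone.parameters()) + list(decoder.parameters()), lr=1e-4)

def continual_pretrain_step(images):
    """images: (B, 3, 224, 224) unlabeled in-domain medical images."""
    masked = images.clone()
    masked[:, :, :, 112:] = 0.0                               # crude mask of the right half
    feats = backbone(masked)                                  # (B, 768)
    recon = decoder(feats).view(-1, 3, 64, 64)
    target = nn.functional.interpolate(images, size=(64, 64)) # low-res reconstruction target
    loss = nn.functional.mse_loss(recon, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```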
Collapse
Affiliation(s)
- DongAo Ma
- Arizona State University, Tempe, AZ 85281, USA
| | | | | | | | | | | | | |
Collapse
|
38
|
Guo Z, Islam NU, Gotway MB, Liang J. Discriminative, Restorative, and Adversarial Learning: Stepwise Incremental Pretraining. DOMAIN ADAPTATION AND REPRESENTATION TRANSFER : 4TH MICCAI WORKSHOP, DART 2022, HELD IN CONJUNCTION WITH MICCAI 2022, SINGAPORE, SEPTEMBER 22, 2022, PROCEEDINGS 2022; 13542:66-76. [PMID: 36507899 PMCID: PMC9728134 DOI: 10.1007/978-3-031-16852-9_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Uniting three self-supervised learning (SSL) ingredients (discriminative, restorative, and adversarial learning) enables collaborative representation learning and yields three transferable components: a discriminative encoder, a restorative decoder, and an adversary encoder. To leverage this advantage, we have redesigned five prominent SSL methods, including Rotation, Jigsaw, Rubik's Cube, Deep Clustering, and TransVW, and formulated each in a United framework for 3D medical imaging. However, such a United framework increases model complexity and pretraining difficulty. To overcome this difficulty, we develop a stepwise incremental pretraining strategy: a discriminative encoder is first trained via discriminative learning; the pretrained discriminative encoder is then attached to a restorative decoder, forming a skip-connected encoder-decoder, for joint discriminative and restorative learning; and finally, the pretrained encoder-decoder is associated with an adversarial encoder for full discriminative, restorative, and adversarial learning. Our extensive experiments demonstrate that the stepwise incremental pretraining stabilizes the training of United models, resulting in significant performance gains and annotation cost reduction via transfer learning for five target tasks, encompassing both classification and segmentation, across diseases, organs, datasets, and modalities. This performance is attributed to the synergy of the three SSL ingredients in our United framework unleashed via stepwise incremental pretraining. All codes and pretrained models are available at GitHub.com/JLiangLab/StepwisePretraining.
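The three-stage schedule reads naturally as pseudocode; the following hedged sketch shows only its control flow with toy modules and placeholder losses (a quadratic "discriminative" term, plain reconstruction, and a generator-side adversarial term), not the released StepwisePretraining code.

```python
# Hedged sketch of the three-stage schedule; modules and losses are placeholders.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU(),
                        nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
proj = nn.Linear(64, 128)                                       # discriminative head
decoder = nn.Sequential(nn.Linear(64, 28 * 28), nn.Unflatten(1, (1, 28, 28)))
adversary = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1))  # real/fake critic

def disc_loss(z):      return z.pow(2).mean()                   # placeholder objective
def rest_loss(x, xr):  return nn.functional.mse_loss(xr, x)
def adv_loss(logits):                                           # generator-side term only
    return nn.functional.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

x = torch.randn(8, 1, 28, 28)                                   # toy unlabeled batch

# Stage 1: discriminative learning only.
opt1 = torch.optim.Adam(list(encoder.parameters()) + list(proj.parameters()), lr=1e-3)
loss = disc_loss(proj(encoder(x)))
opt1.zero_grad(); loss.backward(); opt1.step()

# Stage 2: attach the restorative decoder; joint discriminative + restorative learning.
opt2 = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
z = encoder(x)
loss = disc_loss(proj(z)) + rest_loss(x, decoder(z))
opt2.zero_grad(); loss.backward(); opt2.step()

# Stage 3: add the adversary; full discriminative + restorative + adversarial learning
# (the critic's own update is omitted for brevity).
opt3 = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
z = encoder(x); xr = decoder(z)
loss = disc_loss(proj(z)) + rest_loss(x, xr) + adv_loss(adversary(xr))
opt3.zero_grad(); loss.backward(); opt3.step()
```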
Collapse
Affiliation(s)
- Zuwei Guo
- Arizona State University, Tempe, AZ 85281, USA
| | | | | | | |
Collapse
|
39
|
Taher MRH, Haghighi F, Gotway MB, Liang J. CAiD: Context-Aware Instance Discrimination for Self-supervised Learning in Medical Imaging. PROCEEDINGS OF MACHINE LEARNING RESEARCH 2022; 172:535-551. [PMID: 36579134 PMCID: PMC9793869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Recently, self-supervised instance discrimination methods have achieved significant success in learning visual representations from unlabeled photographic images. However, given the marked differences between photographic and medical images, the efficacy of instance-based objectives, focusing on learning the most discriminative global features in the image (e.g., the wheels of a bicycle), remains unknown in medical imaging. Our preliminary analysis showed that high global similarity of medical images in terms of anatomy hampers instance discrimination methods for capturing a set of distinct features, negatively impacting their performance on medical downstream tasks. To alleviate this limitation, we have developed a simple yet effective self-supervised framework, called Context-Aware instance Discrimination (CAiD). CAiD aims to improve instance discrimination learning by providing finer and more discriminative information encoded from a diverse local context of unlabeled medical images. We conduct a systematic analysis to investigate the utility of the learned features from a three-pronged perspective: (i) generalizability and transferability, (ii) separability in the embedding space, and (iii) reusability. Our extensive experiments demonstrate that CAiD (1) enriches representations learned from existing instance discrimination methods; (2) delivers more discriminative features by adequately capturing finer contextual information from individual medical images; and (3) improves reusability of low/mid-level features compared to standard instance discrimination methods. As open science, all codes and pre-trained models are available on our GitHub page: https://github.com/JLiangLab/CAiD.
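One way to picture the idea of enriching instance discrimination with local context is the hedged sketch below, which adds a reconstruction term to an InfoNCE loss over two augmented views; the toy encoder and decoder, the image size, and the weighting factor alpha are illustrative assumptions rather than the CAiD architecture.

```python
# Hedged sketch (not the CAiD architecture): couple instance discrimination
# (InfoNCE over two augmented views) with a restorative term that forces the
# embedding to retain finer local context.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU(),
                        nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128))
decoder = nn.Sequential(nn.Linear(128, 32 * 32), nn.Unflatten(1, (1, 32, 32)))

def info_nce(z1, z2, temperature=0.2):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (B, B) cross-view similarities
    labels = torch.arange(z1.size(0))           # matching index = positive pair
    return F.cross_entropy(logits, labels)

def context_aware_loss(view1, view2, alpha=1.0):
    """view1, view2: (B, 1, 32, 32) two augmentations of the same images."""
    z1, z2 = encoder(view1), encoder(view2)
    discrimination = info_nce(z1, z2)
    restoration = F.mse_loss(decoder(z1), view1)    # keep finer local context
    return discrimination + alpha * restoration

# Toy usage with random "views".
loss = context_aware_loss(torch.randn(8, 1, 32, 32), torch.randn(8, 1, 32, 32))
```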
Collapse
|
40
|
Haghighi F, Taher MRH, Gotway MB, Liang J. DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis. PROCEEDINGS. IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2022; 2022:20792-20802. [PMID: 36313959 PMCID: PMC9615927 DOI: 10.1109/cvpr52688.2022.02016] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Discriminative learning, restorative learning, and adversarial learning have proven beneficial for self-supervised learning schemes in computer vision and medical imaging. Existing efforts, however, omit their synergistic effects on each other in a ternary setup, which, we envision, can significantly benefit deep semantic representation learning. To realize this vision, we have developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. Our extensive experiments demonstrate that DiRA (1) encourages collaborative learning among three learning ingredients, resulting in more generalizable representation across organs, diseases, and modalities; (2) outperforms fully supervised ImageNet models and increases robustness in small data regimes, reducing annotation cost across multiple medical imaging applications; (3) learns fine-grained semantic representation, facilitating accurate lesion localization with only image-level annotation; and (4) enhances state-of-the-art restorative approaches, revealing that DiRA is a general mechanism for united representation learning. All code and pretrained models are available at https://github.com/JLiangLab/DiRA.
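Read as a single objective, the ternary setup described in this abstract amounts to a weighted combination of the three learning ingredients; the display below is a generic, hedged summary with placeholder weights, not the paper's exact notation.

```latex
\mathcal{L}_{\mathrm{united}}
  = \lambda_{d}\,\mathcal{L}_{\mathrm{discriminative}}
  + \lambda_{r}\,\mathcal{L}_{\mathrm{restorative}}
  + \lambda_{a}\,\mathcal{L}_{\mathrm{adversarial}},
\qquad \lambda_{d},\lambda_{r},\lambda_{a}\ge 0.
```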
Collapse
|
41
|
Torrent TT, Matos EEDS, Belcavello F, Viridiano M, Gamonal MA, da Costa AD, Marim MC. Representing Context in FrameNet: A Multidimensional, Multimodal Approach. Front Psychol 2022; 13:838441. [PMID: 35444591 PMCID: PMC9014903 DOI: 10.3389/fpsyg.2022.838441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2021] [Accepted: 01/31/2022] [Indexed: 11/13/2022] Open
Abstract
Frame Semantics includes context as a central aspect of the theory. Frames themselves can be regarded as a representation of the immediate context against which meaning is to be construed. Moreover, the notion of frame invocation includes context as one possible source of information comprehenders use to construe meaning. As the original implementation of Frame Semantics, Berkeley FrameNet is capable of providing computational representations of some aspects of context, but not all of them. In this article, we present FrameNet Brasil: a framenet enriched with qualia relations and capable of taking other semiotic modes as input data, namely pictures and videos. We claim that such an enriched model is capable of addressing other types of contextual information in a framenet, namely sentence-level cotext and commonsense knowledge. We demonstrate how the FrameNet Brasil software infrastructure addresses contextual information in both database construction and corpora annotation. We present the guidelines for the construction of two multimodal datasets whose annotations represent contextual information and also report on two experiments: (i) the identification of frame-evoking lexical units in sentences and (ii) a methodology for domain adaptation in Neural Machine Translation that leverages frames and qualia for representing sentence-level context. Experimental results emphasize the importance of computationally representing contextual information in a principled structured fashion as opposed to trying to derive it from the manipulation of linguistic form alone.
Collapse
Affiliation(s)
- Tiago Timponi Torrent
- FrameNet Brasil, Graduate Program in Linguistics, Faculty of Letters, Federal University of Juiz de Fora, Juiz de Fora, Brazil
| | - Ely Edison da Silva Matos
- FrameNet Brasil, Graduate Program in Linguistics, Faculty of Letters, Federal University of Juiz de Fora, Juiz de Fora, Brazil
| | - Frederico Belcavello
- FrameNet Brasil, Graduate Program in Linguistics, Faculty of Letters, Federal University of Juiz de Fora, Juiz de Fora, Brazil
| | - Marcelo Viridiano
- FrameNet Brasil, Graduate Program in Linguistics, Faculty of Letters, Federal University of Juiz de Fora, Juiz de Fora, Brazil
| | - Maucha Andrade Gamonal
- FrameNet Brasil, Graduate Program in Linguistics, Faculty of Letters, Federal University of Juiz de Fora, Juiz de Fora, Brazil.,Laboratório Experimental de Tradução, Graduate Program in Linguistics, Faculty of Letters, Federal University of Minas Gerais, Belo Horizonte, Brazil
| | - Alexandre Diniz da Costa
- FrameNet Brasil, Graduate Program in Linguistics, Faculty of Letters, Federal University of Juiz de Fora, Juiz de Fora, Brazil
| | - Mateus Coutinho Marim
- FrameNet Brasil, Graduate Program in Linguistics, Faculty of Letters, Federal University of Juiz de Fora, Juiz de Fora, Brazil
| |
Collapse
|
42
|
Liu Z, Lu H, Pan X, Xu M, Lan R, Luo X. Diagnosis of Alzheimer’s disease via an attention-based multi-scale convolutional neural network. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107942] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
|
43
|
Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-021-00425-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
44
|
Tajbakhsh N, Roth H, Terzopoulos D, Liang J. Guest Editorial Annotation-Efficient Deep Learning: The Holy Grail of Medical Imaging. IEEE TRANSACTIONS ON MEDICAL IMAGING 2021; 40:2526-2533. [PMID: 34795461 PMCID: PMC8594751 DOI: 10.1109/tmi.2021.3089292] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Affiliation(s)
| | | | - Demetri Terzopoulos
- University of California, Los Angeles, and VoxelCloud, Inc., Los Angeles, CA, USA
| | | |
Collapse
|
45
|
Hosseinzadeh Taher MR, Haghighi F, Feng R, Gotway MB, Liang J. A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis. DOMAIN ADAPTATION AND REPRESENTATION TRANSFER, AND AFFORDABLE HEALTHCARE AND AI FOR RESOURCE DIVERSE GLOBAL HEALTH : THIRD MICCAI WORKSHOP, DART 2021 AND FIRST MICCAI WORKSHOP, FAIR 2021 : HELD IN CONJUNCTION WITH MICCAI 2021 : STRASBOU... 2021; 12968:3-13. [PMID: 35713581 PMCID: PMC9197759 DOI: 10.1007/978-3-030-87722-4_1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
Transfer learning from supervised ImageNet models has been frequently used in medical image analysis. Yet, no large-scale evaluation has been conducted to benchmark the efficacy of newly-developed pre-training techniques for medical image analysis, leaving several important questions unanswered. As the first step in this direction, we conduct a systematic study on the transferability of models pre-trained on iNat2021, the most recent large-scale fine-grained dataset, and 14 top self-supervised ImageNet models on 7 diverse medical tasks in comparison with the supervised ImageNet model. Furthermore, we present a practical approach to bridge the domain gap between natural and medical images by continually (pre-)training supervised ImageNet models on medical images. Our comprehensive evaluation yields new insights: (1) pre-trained models on fine-grained data yield distinctive local representations that are more suitable for medical segmentation tasks, (2) self-supervised ImageNet models learn holistic features more effectively than supervised ImageNet models, and (3) continual pre-training can bridge the domain gap between natural and medical images. We hope that this large-scale open evaluation of transfer learning can direct the future research of deep learning for medical imaging. As open science, all codes and pre-trained models are available on our GitHub page https://github.com/JLiangLab/BenchmarkTransferLearning.
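The transfer protocol benchmarked in this study can be summarized, under assumptions, as: take a pre-trained backbone, swap its head for the target medical task, and fine-tune end to end. The snippet below sketches this with torchvision's resnet50 and a hypothetical 14-class multi-label chest X-ray task; the backbone choice, class count, optimizer, and loss are placeholders, not the paper's exact protocol.

```python
# Hedged sketch of the transfer protocol: pre-trained backbone, new task head,
# end-to-end fine-tuning. Backbone, task, optimizer, and loss are placeholders.
import torch
import torch.nn as nn
import torchvision

NUM_CLASSES = 14                                                # assumed target task
model = torchvision.models.resnet50(weights="DEFAULT")          # pre-trained init
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)         # replace the head
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.BCEWithLogitsLoss()                              # multi-label objective

def finetune_step(images, labels):
    """images: (B, 3, 224, 224); labels: (B, NUM_CLASSES) multi-hot targets."""
    loss = criterion(model(images), labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```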
Collapse
Affiliation(s)
| | | | - Ruibin Feng
- Stanford University, Stanford, California 94305, USA
| | | | | |
Collapse
|
46
|
Islam NU, Gehlot S, Zhou Z, Gotway MB, Liang J. Seeking an Optimal Approach for Computer-Aided Pulmonary Embolism Detection. MACHINE LEARNING IN MEDICAL IMAGING. MLMI (WORKSHOP) 2021; 12966:692-702. [PMID: 35695860 PMCID: PMC9184235 DOI: 10.1007/978-3-030-87589-3_71] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Pulmonary embolism (PE) represents a thrombus ("blood clot"), usually originating from a lower extremity vein, that travels to the blood vessels in the lung, causing vascular obstruction and, in some patients, death. This disorder is commonly diagnosed using CT pulmonary angiography (CTPA). Deep learning holds great promise for the computer-aided CTPA diagnosis (CAD) of PE. However, numerous competing methods exist for a given task in the deep learning literature, causing great confusion regarding the development of a CAD PE system. To address this confusion, we present a comprehensive analysis of competing deep learning methods applicable to PE diagnosis using CTPA at both the image and exam levels. At the image level, we compare convolutional neural networks (CNNs) with vision transformers, and contrast self-supervised learning (SSL) with supervised learning, followed by an evaluation of transfer learning compared with training from scratch. At the exam level, we focus on comparing conventional classification (CC) with multiple instance learning (MIL). Our extensive experiments consistently show: (1) transfer learning boosts performance despite differences between natural images and CT scans; (2) transfer learning with SSL surpasses its supervised counterparts; (3) CNNs outperform vision transformers, which otherwise show satisfactory performance; and (4) CC is, surprisingly, superior to MIL. Compared with the state of the art, our optimal approach provides an AUC gain of 0.2% and 1.05% at the image and exam levels, respectively.
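The exam-level comparison between CC and MIL can be pictured with the hedged sketch below: CC pools per-slice features and classifies the exam once, whereas MIL scores each slice and pools the scores. The feature dimension, mean/max pooling choices, and module names are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch contrasting two exam-level strategies: conventional
# classification (pool features, classify once) versus multiple instance
# learning (score each slice, pool the scores).
import torch
import torch.nn as nn

class ExamLevelCC(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, 1)
    def forward(self, slice_feats):                 # (B, S, D): S slices per exam
        pooled = slice_feats.mean(dim=1)            # exam-level feature
        return self.classifier(pooled).squeeze(-1)  # one exam-level logit

class ExamLevelMIL(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.instance_scorer = nn.Linear(feat_dim, 1)
    def forward(self, slice_feats):                 # (B, S, D)
        scores = self.instance_scorer(slice_feats).squeeze(-1)   # per-slice logits
        return scores.max(dim=1).values             # max-pooled exam-level logit

# Toy usage: 2 exams, 40 slices each, 512-dim slice features.
feats = torch.randn(2, 40, 512)
cc_logit, mil_logit = ExamLevelCC()(feats), ExamLevelMIL()(feats)
```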
Collapse
Affiliation(s)
| | - Shiv Gehlot
- Arizona State University, Tempe, AZ 85281, USA
| | | | | | | |
Collapse
|