1. Adamson PM, Desai AD, Dominic J, Varma M, Bluethgen C, Wood JP, Syed AB, Boutin RD, Stevens KJ, Vasanawala S, Pauly JM, Gunel B, Chaudhari AS. Using deep feature distances for evaluating the perceptual quality of MR image reconstructions. Magn Reson Med 2025;94:317-330. PMID: 39921580; PMCID: PMC12021552; DOI: 10.1002/mrm.30437.
Abstract
PURPOSE: Commonly used MR image quality (IQ) metrics have poor concordance with radiologist-perceived diagnostic IQ. Here, we develop and explore deep feature distances (DFDs), distances computed in a lower-dimensional feature space encoded by a convolutional neural network (CNN), as improved perceptual IQ metrics for MR image reconstruction. We further explore the impact of distribution shifts between the images used to train the DFD CNN encoders and those used in IQ metric evaluation.
METHODS: We compare commonly used IQ metrics (PSNR and SSIM) to two "out-of-domain" DFDs with encoders trained on natural images, an "in-domain" DFD trained on MR images alone, and two domain-adjacent DFDs trained on large medical imaging datasets. We additionally compare these with several state-of-the-art but less commonly reported IQ metrics: visual information fidelity (VIF), the noise quality metric (NQM), and the high-frequency error norm (HFEN). IQ metric performance is assessed via correlations with five expert radiologist readers' scores of perceived diagnostic IQ of various accelerated MR image reconstructions. We characterize the behavior of these IQ metrics under common distortions expected during image acquisition, including their sensitivity to acquisition noise.
RESULTS: All DFDs and HFEN correlate more strongly with radiologist-perceived diagnostic IQ than SSIM, PSNR, and other state-of-the-art metrics, with correlations comparable to radiologist inter-reader variability. Surprisingly, out-of-domain DFDs perform comparably to in-domain and domain-adjacent DFDs.
CONCLUSION: A suite of IQ metrics, including DFDs and HFEN, should be used alongside commonly reported IQ metrics for a more holistic evaluation of MR image reconstruction perceptual quality. We also observe that general vision encoders are capable of assessing visual IQ even for MR images.
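As a concrete illustration of the two best-performing metric families above, the sketch below computes an out-of-domain DFD with an ImageNet-pretrained VGG16 encoder and the HFEN between a reconstruction and its reference. The encoder choice, truncation layer, and normalization are illustrative assumptions, not the authors' exact implementation (grayscale MR slices would need replication to three channels first).

```python
# Hedged sketch: deep feature distance (DFD) and HFEN for image pairs.
# VGG16 stands in for the paper's encoders; layer and normalization are assumptions.
import numpy as np
import torch
import torchvision.models as models
from scipy.ndimage import gaussian_laplace

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

def deep_feature_distance(x, y, layer_idx=16):
    """Mean squared distance between channel-normalized CNN features of
    two (1, 3, H, W) image tensors."""
    feats = []
    with torch.no_grad():
        for img in (x, y):
            h = img
            for i, layer in enumerate(vgg):
                h = layer(h)
                if i == layer_idx:          # truncate at an intermediate conv block
                    break
            feats.append(h / (h.norm(dim=1, keepdim=True) + 1e-8))
    return (feats[0] - feats[1]).pow(2).mean().item()

def hfen(recon, ref, sigma=1.5):
    """High-frequency error norm: Laplacian-of-Gaussian filtered error,
    normalized by the filtered reference."""
    log_r = gaussian_laplace(recon.astype(np.float64), sigma)
    log_g = gaussian_laplace(ref.astype(np.float64), sigma)
    return np.linalg.norm(log_r - log_g) / np.linalg.norm(log_g)

print(deep_feature_distance(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)))
print(hfen(np.random.rand(128, 128), np.random.rand(128, 128)))
```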
Affiliation(s)
- Philip M. Adamson: Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Arjun D. Desai: Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Jeffrey Dominic: Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Maya Varma: Department of Computer Science, Stanford University, Stanford, California, USA
- Jeff P. Wood: Austin Radiological Association, Austin, Texas, USA
- Ali B. Syed: Department of Radiology, Stanford University, Stanford, California, USA
- Robert D. Boutin: Department of Radiology, Stanford University, Stanford, California, USA
- John M. Pauly: Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Beliz Gunel: Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Akshay S. Chaudhari: Department of Radiology, Stanford University, Stanford, California, USA; Department of Biomedical Data Science, Stanford University, Stanford, California, USA
2. Wang W, Jin Z, Liu X, Chen X. NaMA-Mamba: Foundation model for generalizable nasal disease detection using masked autoencoder with Mamba on endoscopic images. Comput Med Imaging Graph 2025;122:102524. PMID: 40088572; DOI: 10.1016/j.compmedimag.2025.102524.
Abstract
Artificial intelligence (AI) has shown great promise in analyzing nasal endoscopic images for disease detection. However, current AI systems require extensive expert-labeled data for each specific medical condition, limiting their applications. In this work, that challenge is addressed through two key innovations: the creation of the first large-scale pre-training dataset of nasal endoscopic images, and the development of a novel self-supervised AI system designed specifically for nasal endoscopy, named NaMA-Mamba. The proposed NaMA-Mamba model relies on two key technologies: a nasal endoscopic state space model (NE-SSM) for analyzing sequences of images, and an enhanced learning mechanism (CoMAE) for capturing fine details in nasal tissues. These innovations enable the system to learn effectively from unlabeled images while maintaining high accuracy across different diagnostic tasks. In extensive testing, NaMA-Mamba achieved remarkable results with minimal labeled data, matching the performance of traditional systems that require full expert labeling while needing only 1% of the labeled data for tasks such as detecting nasal polyps and identifying nasopharyngeal cancer. These results demonstrate the potential of NaMA-Mamba to significantly improve the efficiency and accessibility of AI-assisted nasal disease diagnosis in clinical practice.
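The masked-autoencoder pretraining that NaMA-Mamba builds on can be sketched generically: random patches are hidden and the network is trained to reconstruct them, so no labels are needed. The toy encoder below is a placeholder MLP; the paper's NE-SSM and CoMAE components are not reproduced here.

```python
# Generic masked-image-modeling step (MAE-style); encoder/decoder are toy stand-ins.
import torch
import torch.nn as nn

patch, mask_ratio = 16, 0.75
encoder = nn.Sequential(nn.Linear(patch * patch * 3, 256), nn.GELU(),
                        nn.Linear(256, 256))
decoder = nn.Linear(256, patch * patch * 3)

def mim_loss(images):                                  # images: (B, 3, H, W)
    B, C, H, W = images.shape
    tiles = images.unfold(2, patch, patch).unfold(3, patch, patch)
    tiles = tiles.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)
    n = tiles.shape[1]
    keep = int(n * (1 - mask_ratio))
    order = torch.rand(B, n).argsort(dim=1)            # random patch shuffle
    vis = torch.gather(tiles, 1, order[:, :keep, None].expand(-1, -1, tiles.shape[2]))
    hid = torch.gather(tiles, 1, order[:, keep:, None].expand(-1, -1, tiles.shape[2]))
    code = encoder(vis).mean(dim=1, keepdim=True)      # pooled code from visible patches
    pred = decoder(code.expand(-1, hid.shape[1], -1))  # reconstruct the hidden patches
    return nn.functional.mse_loss(pred, hid)

mim_loss(torch.randn(2, 3, 224, 224)).backward()
```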
Affiliation(s)
- Wensheng Wang: Academy for Engineering and Technology, Fudan University, Shanghai, 200433, China; Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China
- Zewen Jin: Academy for Engineering and Technology, Fudan University, Shanghai, 200433, China; Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China
- Xueli Liu: Eye & ENT Hospital of Fudan University, Shanghai, 200031, China
- Xinrong Chen: Academy for Engineering and Technology, Fudan University, Shanghai, 200433, China; Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China
3. Noh S, Lee MS, Lee BD. Automated radiography assessment of ankle joint instability using deep learning. Sci Rep 2025;15:15012. PMID: 40301608; DOI: 10.1038/s41598-025-99620-6.
Abstract
This study developed and evaluated a deep learning (DL)-based system for automatically measuring talar tilt and anterior talar translation on weight-bearing ankle radiographs, which are key parameters in diagnosing ankle joint instability. The system was trained and tested using a dataset comprising 1,452 anteroposterior radiographs (mean age ± standard deviation [SD]: 43.70 ± 22.60 years; age range: 6-87 years; males: 733, females: 719) and 2,984 lateral radiographs (mean age ± SD: 44.37 ± 22.72 years; age range: 6-92 years; males: 1,533, females: 1,451) from a total of 4,000 patients, provided by the National Information Society Agency. Patients who underwent joint fusion, bone grafting, or joint replacement were excluded. Statistical analyses, including correlation coefficient analysis and Bland-Altman plots, were conducted to assess the agreement and consistency between the DL-calculated and clinician-assessed measurements. The system demonstrated high accuracy, with strong correlations for talar tilt (Pearson correlation coefficient [r] = 0.798 [p < .001]; intraclass correlation coefficient [ICC] = 0.797 [95% CI 0.74, 0.82]; concordance correlation coefficient [CCC] = 0.796 [95% CI 0.69, 0.85]; mean absolute error [MAE] = 1.088° [95% CI 0.06°, 1.14°]; mean square error [MSE] = 1.780° [95% CI 1.69°, 2.73°]; root mean square error [RMSE] = 1.374° [95% CI 1.31°, 1.44°]; 95% limits of agreement [LoA]: 2.0° to -2.3°) and anterior talar translation (r = 0.862 [p < .001]; ICC = 0.861 [95% CI 0.84, 0.89]; CCC = 0.861 [95% CI 0.86, 0.89]; MAE = 0.468 mm [95% CI 0.42 mm, 0.51 mm]; MSE = 0.551 mm [95% CI 0.49 mm, 0.61 mm]; RMSE = 0.742 mm [95% CI 0.69 mm, 0.79 mm]; 95% LoA: 1.5 mm to -1.3 mm). These results demonstrate the system's capability to provide objective and reproducible measurements, supporting the clinical interpretation of ankle instability in routine radiographic practice.
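The agreement statistics quoted above are standard and easy to reproduce; the sketch below computes Pearson r, MAE, RMSE, and Bland-Altman limits of agreement for paired measurements, with synthetic values standing in for the DL-calculated and clinician-assessed angles.

```python
# Agreement statistics for paired measurements (synthetic stand-in data).
import numpy as np

rng = np.random.default_rng(0)
clinician = rng.normal(5.0, 2.0, 200)          # e.g., talar tilt in degrees
dl = clinician + rng.normal(0.0, 1.0, 200)     # model estimates with noise

r = np.corrcoef(dl, clinician)[0, 1]           # Pearson correlation
mae = np.mean(np.abs(dl - clinician))
rmse = np.sqrt(np.mean((dl - clinician) ** 2))

diff = dl - clinician                          # Bland-Altman analysis
bias = diff.mean()
loa = (bias - 1.96 * diff.std(ddof=1), bias + 1.96 * diff.std(ddof=1))
print(f"r={r:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}  LoA=({loa[0]:.2f}, {loa[1]:.2f})")
```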
Affiliation(s)
- Seungha Noh: Department of Computer Science, Graduate School, Kyonggi University, Suwon-si, Republic of Korea
- Mu Sook Lee: Department of Radiology, Keimyung University Dongsan Hospital, Daegu, Republic of Korea
- Byoung-Dai Lee: Division of AI and Computer Engineering, Kyonggi University, Suwon-si, Gyeonggi-do, 16227, Republic of Korea
4. Vega R, Dehghan M, Nagdev A, Buchanan B, Kapur J, Jaremko JL, Zonoobi D. Overcoming barriers in the use of artificial intelligence in point of care ultrasound. NPJ Digit Med 2025;8:213. PMID: 40253547; PMCID: PMC12009405; DOI: 10.1038/s41746-025-01633-y.
Abstract
Point-of-care ultrasound is a portable, low-cost imaging technology focused on answering specific clinical questions in real time. Artificial intelligence amplifies its capabilities by aiding clinicians in acquiring and interpreting the images; however, there are growing concerns about its effectiveness and trustworthiness. Here, we address key issues such as population bias, explainability, and training of artificial intelligence in this field, and propose approaches to ensure clinical effectiveness.
Affiliation(s)
- Arun Nagdev: Alameda Health System, Highland Hospital, University of California San Francisco, San Francisco, CA, 94143, USA
- Brian Buchanan: Department of Critical Care Medicine, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, T6G 2B7, Canada
- Jeevesh Kapur: Department of Diagnostic Imaging, National University of Singapore, Queenstown, 119074, Singapore
- Jacob L Jaremko: Department of Radiology and Diagnostic Imaging, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, T6G 2R3, Canada
5. Huang X, Wang Z, Zhou W, Yang K, Wen K, Liu H, Huang S, Lyu M. Tailored self-supervised pretraining improves brain MRI diagnostic models. Comput Med Imaging Graph 2025;123:102560. PMID: 40252479; DOI: 10.1016/j.compmedimag.2025.102560.
Abstract
Self-supervised learning has shown potential in enhancing deep learning methods, yet its application in brain magnetic resonance imaging (MRI) analysis remains underexplored. This study seeks to leverage large-scale, unlabeled public brain MRI datasets to improve the performance of deep learning models in various downstream tasks for the development of clinical decision support systems. To enhance training efficiency, data filtering methods based on image entropy and slice positions were developed, condensing a combined dataset of approximately 2 million images from fastMRI-brain, OASIS-3, IXI, and BraTS21 into a more focused set of 250,000 images enriched with brain features. The Momentum Contrast (MoCo) v3 algorithm was then employed to learn these image features, resulting in robustly pretrained models specifically tailored to brain MRI. The pretrained models were subsequently evaluated in tumor classification, lesion detection, hippocampal segmentation, and image reconstruction tasks. The results demonstrate that our brain MRI-oriented pretraining outperformed both ImageNet pretraining and pretraining on larger multi-organ, multi-modality medical datasets, achieving a ∼2.8% increase in 4-class tumor classification accuracy, a ∼0.9% improvement in tumor detection mean average precision, a ∼3.6% gain in adult hippocampal segmentation Dice score, and a ∼0.1 PSNR improvement in reconstruction at 2-fold acceleration. This study underscores the potential of self-supervised learning for brain MRI using large-scale, tailored datasets derived from public sources.
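The entropy-based filtering step can be pictured as follows: slices whose intensity histograms carry little information (mostly background) are dropped before pretraining. The threshold and binning below are illustrative assumptions, not the paper's values.

```python
# Illustrative image-entropy filter for condensing an unlabeled slice pool.
import numpy as np

def shannon_entropy(img, bins=256):
    hist, _ = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def keep_slice(img, min_entropy=4.0):          # threshold is an assumption
    """Discard near-empty slices with little anatomical content."""
    return shannon_entropy(img) >= min_entropy

rng = np.random.default_rng(0)
empty = np.zeros((256, 256)); empty[:8, :8] = rng.random((8, 8))   # mostly air
brain = rng.normal(0.5, 0.2, (256, 256)).clip(0, 1)                # rich content
print(keep_slice(empty), keep_slice(brain))                        # False True
```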
Affiliation(s)
- Xinhao Huang: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China; College of Applied Sciences, Shenzhen University, Shenzhen, China; Guangdong-Hongkong-Macau CNS Regeneration Institute, Key Laboratory of CNS Regeneration (Jinan University)-Ministry of Education, Jinan University, Guangzhou, China
- Zihao Wang: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China; College of Applied Sciences, Shenzhen University, Shenzhen, China; Guangdong-Hongkong-Macau CNS Regeneration Institute, Key Laboratory of CNS Regeneration (Jinan University)-Ministry of Education, Jinan University, Guangzhou, China
- Weichen Zhou: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Kexin Yang: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China; College of Applied Sciences, Shenzhen University, Shenzhen, China
- Kaihua Wen: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Haiguang Liu: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Shoujin Huang: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Mengye Lyu: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China; College of Applied Sciences, Shenzhen University, Shenzhen, China; Guangdong-Hongkong-Macau CNS Regeneration Institute, Key Laboratory of CNS Regeneration (Jinan University)-Ministry of Education, Jinan University, Guangzhou, China
6. Chyrmang G, Barua B, Bora K, Ahmed GN, Das AK, Kakoti L, Lemos B, Mallik S. Self-HER2Net: A generative self-supervised framework for HER2 classification in IHC histopathology of breast cancer. Pathol Res Pract 2025;270:155961. PMID: 40245674; DOI: 10.1016/j.prp.2025.155961.
Abstract
Breast cancer is a significant global health concern, where precise identification of proteins like Human Epidermal Growth Factor Receptor 2 (HER2) in cancer cells via immunohistochemistry (IHC) is pivotal for treatment decisions. HER2 overexpression is evaluated through HER2 scoring on a scale from 0 to 3+ based on staining patterns and intensity. Recent efforts have been made to automate HER2 scoring using image processing and AI techniques. However, existing methods follow supervised learning paradigms and therefore require large manually annotated datasets. We therefore propose a generative self-supervised learning (SSL) framework, "Self-HER2Net," for HER2 score classification, reducing dependence on large manually annotated datasets by leveraging the best-performing of four novel generative self-supervised tasks that we propose. The first two SSL tasks, HER2hsl and HER2hsv, are domain-agnostic, while the other two, HER2dab and HER2hae, are domain-specific, targeting staining patterns and intensity representations. Our approach is evaluated under different budget scenarios (2%, 15%, and 100% labeled data) and on an out-of-distribution test. For tile-level assessment, HER2hsv achieved the best performance, with an AUC-ROC of 0.965 ± 0.037. Our self-supervised learning approach shows potential for application in scenarios with limited annotated data for HER2 analysis.
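Although the exact design of the four pretext tasks is specific to the paper, a generative color-space task in the spirit of HER2hsv can be sketched simply: the network regresses the HSV rendering of an RGB IHC tile, so the target is computed from the input itself and no manual labels are needed. The toy network and tile below are illustrative.

```python
# Sketch of a generative color-space pretext task (HER2hsv-like); illustrative only.
import numpy as np
import torch
import torch.nn as nn
from skimage.color import rgb2hsv

net = nn.Sequential(                         # toy fully convolutional regressor
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)

rgb = np.random.rand(64, 64, 3).astype(np.float32)   # stand-in IHC tile
hsv = rgb2hsv(rgb).astype(np.float32)                # self-supervised target

x = torch.from_numpy(rgb).permute(2, 0, 1)[None]     # (1, 3, 64, 64)
t = torch.from_numpy(hsv).permute(2, 0, 1)[None]
loss = nn.functional.mse_loss(net(x), t)
loss.backward()
```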
Affiliation(s)
- Genevieve Chyrmang: Department of Computer Science and Information Technology, Cotton University, Guwahati, Assam, India
- Barun Barua: Department of Computer Science and Information Technology, Cotton University, Guwahati, Assam, India
- Kangkana Bora: Department of Computer Science and Information Technology, Cotton University, Guwahati, Assam, India
- Gazi N Ahmed: North East Cancer Hospital and Research Institute, Guwahati, Assam, India
- Anup Kr Das: Arya Wellness centre, Guwahati, Assam, India
- Bernardo Lemos: Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA 02115, USA; Department of Pharmacology & Toxicology, University of Arizona, AZ 85721, USA
- Saurav Mallik: Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA 02115, USA; Department of Pharmacology & Toxicology, University of Arizona, AZ 85721, USA
7. Shi G, Lu H, Hui H, Tian J. Benefit from public unlabeled data: A Frangi filter-based pretraining network for 3D cerebrovascular segmentation. Med Image Anal 2025;101:103442. PMID: 39837153; DOI: 10.1016/j.media.2024.103442.
Abstract
Precise cerebrovascular segmentation in Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) data is crucial for computer-aided clinical diagnosis. The sparse distribution of cerebrovascular structures within TOF-MRA images often results in high costs for manual data labeling. Leveraging unlabeled TOF-MRA data can significantly enhance model performance. In this study, we have constructed the largest preprocessed unlabeled TOF-MRA dataset to date, comprising 1510 subjects. Additionally, we provide manually annotated segmentation masks for 113 subjects based on existing external image datasets to facilitate evaluation. We propose a simple yet effective pretraining strategy utilizing the Frangi filter, known for its capability to enhance vessel-like structures, to optimize the use of the unlabeled data for 3D cerebrovascular segmentation. This involves a Frangi filter-based preprocessing workflow tailored for large-scale unlabeled datasets and a multi-task pretraining strategy to efficiently utilize the preprocessed data. This approach ensures maximal extraction of useful knowledge from the unlabeled data. The efficacy of the pretrained model is assessed across four cerebrovascular segmentation datasets, where it demonstrates superior performance, improving the clDice metric by approximately 2%-3% compared to the latest semi- and self-supervised methods. Additionally, ablation studies validate the generalizability and effectiveness of our pretraining method across various backbone structures. The code and data have been open-sourced at: https://github.com/shigen-StoneRoot/FFPN.
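The central trick, turning the Frangi filter's vesselness response into a free pretraining target, looks roughly like this; filter scales and normalization are illustrative, not the paper's settings.

```python
# Frangi-based pretraining targets from an unlabeled TOF-MRA volume (illustrative).
import numpy as np
from skimage.filters import frangi

volume = np.random.rand(32, 64, 64)                  # stand-in TOF-MRA volume

# skimage's frangi supports 2D and 3D; bright vessels on a dark background.
vesselness = frangi(volume, sigmas=range(1, 4), black_ridges=False)

# Min-max normalize and use as a regression/segmentation target for
# multi-task pretraining; no manual vessel labels are required.
target = (vesselness - vesselness.min()) / (vesselness.max() - vesselness.min() + 1e-8)
print(target.shape, float(target.max()))
```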
Affiliation(s)
- Gen Shi: School of Engineering Medicine and School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China; Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of China, Beijing, 100191, China; CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Hao Lu: State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 10086, China
- Hui Hui: CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; National Key Laboratory of Kidney Diseases, Beijing, 100853, China
- Jie Tian: School of Engineering Medicine and School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China; Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of China, Beijing, 100191, China; CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; National Key Laboratory of Kidney Diseases, Beijing, 100853, China
8. Ben Atitallah S, Ben Rabah C, Driss M, Boulila W, Koubaa A. Self-supervised learning for graph-structured data in healthcare applications: A comprehensive review. Comput Biol Med 2025;188:109874. PMID: 39999496; DOI: 10.1016/j.compbiomed.2025.109874.
Abstract
The increasing complexity and interconnectedness of healthcare data present numerous opportunities to improve prediction, diagnosis, and treatment. Graph-structured data, which represents entities and their relationships, is well-suited for modeling these complex connections. However, effectively utilizing this data often requires strong and efficient learning algorithms, especially when dealing with limited labeled data. Self-supervised learning (SSL) has emerged as a powerful paradigm for leveraging unlabeled data to learn effective representations. This paper presents a comprehensive review of SSL approaches specifically designed for graph-structured data in healthcare applications. We explore the challenges and opportunities associated with healthcare data and assess the effectiveness of SSL techniques in real-world healthcare applications. Our discussion encompasses various healthcare settings, such as disease prediction, medical image analysis, and drug discovery. We critically evaluate the performance of different SSL methods across these tasks, highlighting their strengths, limitations, and potential future research directions. To the best of our knowledge, this is the first comprehensive review of SSL applied to graph data in healthcare, providing valuable guidance for researchers and practitioners looking to leverage these techniques to enhance outcomes and drive progress in the field.
Affiliation(s)
- Safa Ben Atitallah: Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, 12435, Saudi Arabia; RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba, 2010, Tunisia
- Chaima Ben Rabah: RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba, 2010, Tunisia
- Maha Driss: Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, 12435, Saudi Arabia; RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba, 2010, Tunisia
- Wadii Boulila: Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, 12435, Saudi Arabia; RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba, 2010, Tunisia
- Anis Koubaa: Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, 12435, Saudi Arabia
9. Zambrano Chaves JM, Huang SC, Xu Y, Xu H, Usuyama N, Zhang S, Wang F, Xie Y, Khademi M, Yang Z, Awadalla H, Gong J, Hu H, Yang J, Li C, Gao J, Gu Y, Wong C, Wei M, Naumann T, Chen M, Lungren MP, Chaudhari A, Yeung-Levy S, Langlotz CP, Wang S, Poon H. A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings. Nat Commun 2025;16:3108. PMID: 40169573; PMCID: PMC11962106; DOI: 10.1038/s41467-025-58344-x.
Abstract
Large foundation models show promise in biomedicine but face challenges in clinical use due to performance gaps, accessibility, cost, and lack of scalable evaluation. Here we show that open-source small multimodal models can bridge these gaps in radiology by generating free-text findings from chest X-ray images. Our data-centric approach leverages 697K curated radiology image-text pairs to train a specialized, domain-adapted chest X-ray encoder. We integrate this encoder with pre-trained language models via a lightweight adapter that aligns image and text modalities. To enable robust, clinically relevant evaluation, we develop and validate CheXprompt, a GPT-4-based metric for assessing factual accuracy aligned with radiologists' evaluations. Benchmarked with CheXprompt and other standard factuality metrics, LLaVA-Rad (7B) achieves state-of-the-art performance, outperforming much larger models like GPT-4V and Med-PaLM M (84B). While not immediately ready for real-time clinical deployment, LLaVA-Rad is a scalable, privacy-preserving and cost-effective step towards clinically adaptable multimodal AI for radiology.
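The "lightweight adapter" pattern described above can be sketched as a small projection that maps vision-encoder patch features into the language model's embedding space; the dimensions and two-layer MLP below are assumptions for illustration, not LLaVA-Rad's exact configuration.

```python
# Sketch of an image-to-LLM adapter (dimensions are illustrative assumptions).
import torch
import torch.nn as nn

class VisionLanguageAdapter(nn.Module):
    def __init__(self, d_vision=768, d_llm=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_vision, d_llm), nn.GELU(), nn.Linear(d_llm, d_llm))

    def forward(self, patch_feats):          # (B, n_patches, d_vision)
        return self.proj(patch_feats)        # (B, n_patches, d_llm)

adapter = VisionLanguageAdapter()
img_tokens = adapter(torch.randn(1, 196, 768))   # prepended to text embeddings
print(img_tokens.shape)                          # torch.Size([1, 196, 4096])
```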
Affiliation(s)
- Yanbo Xu: Microsoft Research, Redmond, WA, USA
- Hanwen Xu: University of Washington, Seattle, WA, USA
- Fei Wang: University of Southern California, Los Angeles, CA, USA
- Yujia Xie: Microsoft Research, Redmond, WA, USA
- Ziyi Yang: Microsoft Research, Redmond, WA, USA
- Yu Gu: Microsoft Research, Redmond, WA, USA
- Mu Wei: Microsoft Research, Redmond, WA, USA
- Muhao Chen: University of California, Davis, CA, USA
- Matthew P Lungren: Microsoft Research, Redmond, WA, USA; Stanford University, Stanford, CA, USA; University of California, San Francisco, CA, USA
- Sheng Wang: University of Washington, Seattle, WA, USA
10. Avram O, Durmus B, Rakocz N, Corradetti G, An U, Nittala MG, Terway P, Rudas A, Chen ZJ, Wakatsuki Y, Hirabayashi K, Velaga S, Tiosano L, Corvi F, Verma A, Karamat A, Lindenberg S, Oncel D, Almidani L, Hull V, Fasih-Ahmad S, Esmaeilkhanian H, Cannesson M, Wykoff CC, Rahmani E, Arnold CW, Zhou B, Zaitlen N, Gronau I, Sankararaman S, Chiang JN, Sadda SR, Halperin E. Accurate prediction of disease-risk factors from volumetric medical scans by a deep vision model pre-trained with 2D scans. Nat Biomed Eng 2025;9:507-520. PMID: 39354052; DOI: 10.1038/s41551-024-01257-9.
Abstract
The application of machine learning to tasks involving volumetric biomedical imaging is constrained by the limited availability of annotated datasets of three-dimensional (3D) scans for model training. Here we report a deep-learning model pre-trained on 2D scans (for which annotated data are relatively abundant) that accurately predicts disease-risk factors from 3D medical-scan modalities. The model, which we named SLIViT (for 'slice integration by vision transformer'), preprocesses a given volumetric scan into 2D images, extracts their feature map and integrates it into a single prediction. We evaluated the model in eight different learning tasks, including classification and regression for six datasets involving four volumetric imaging modalities (computed tomography, magnetic resonance imaging, optical coherence tomography and ultrasound). SLIViT consistently outperformed domain-specific state-of-the-art models and was typically as accurate as clinical specialists who had spent considerable time manually annotating the analysed scans. Automating diagnosis tasks involving volumetric scans may save valuable clinician hours, reduce data acquisition costs and duration, and help expedite medical research and clinical applications.
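SLIViT's core idea, embedding each 2D slice with a pretrained backbone and fusing the per-slice features into one volume-level prediction, can be sketched as below; the ResNet backbone and mean-pooling fusion are simplifications (the paper integrates slice features with a vision transformer).

```python
# Sketch of slice-wise feature extraction + fusion for a 3D volume (simplified).
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()                      # 512-d embedding per slice
head = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 1))

def predict_volume(volume):                      # volume: (S, H, W)
    slices = volume[:, None].repeat(1, 3, 1, 1)  # grayscale -> 3-channel images
    feats = backbone(slices)                     # (S, 512)
    fused = feats.mean(dim=0)                    # transformer fusion in the paper
    return head(fused)

print(predict_volume(torch.randn(24, 224, 224)).shape)   # torch.Size([1])
```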
Affiliation(s)
- Oren Avram: Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA; Department of Anesthesiology and Perioperative Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Berkin Durmus: Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Nadav Rakocz: Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Giulia Corradetti: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA; Department of Ophthalmology, University of California, Los Angeles, Los Angeles, CA, USA
- Ulzee An: Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Muneeswar G Nittala: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA; Department of Ophthalmology, University of California, Los Angeles, Los Angeles, CA, USA
- Prerit Terway: Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Akos Rudas: Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Zeyuan Johnson Chen: Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Yu Wakatsuki: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Swetha Velaga: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Liran Tiosano: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA; Department of Ophthalmology, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
- Federico Corvi: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Aditya Verma: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA; Department of Ophthalmology and Visual Sciences, University of Louisville, Louisville, KY, USA
- Ayesha Karamat: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Sophiana Lindenberg: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Deniz Oncel: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Louay Almidani: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Victoria Hull: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Sohaib Fasih-Ahmad: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA
- Maxime Cannesson: Department of Anesthesiology and Perioperative Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Charles C Wykoff: Retina Consultants of Texas, Retina Consultants of America, Houston, TX, USA; Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA
- Elior Rahmani: Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Corey W Arnold: Department of Radiology, University of California, Los Angeles, Los Angeles, CA, USA; Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, USA; Department of Pathology, University of California, Los Angeles, Los Angeles, CA, USA
- Bolei Zhou: Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Noah Zaitlen: Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Ilan Gronau: School of Computer Science, Reichman University, Herzliya, Israel
- Sriram Sankararaman: Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Jeffrey N Chiang: Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Neurosurgery, University of California, Los Angeles, Los Angeles, CA, USA
- Srinivas R Sadda: Doheny Eye Institute, University of California, Los Angeles, Pasadena, CA, USA; Department of Ophthalmology, University of California, Los Angeles, Los Angeles, CA, USA
- Eran Halperin: Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
11. Bluethgen C, Chambon P, Delbrouck JB, van der Sluijs R, Połacin M, Zambrano Chaves JM, Abraham TM, Purohit S, Langlotz CP, Chaudhari AS. A vision-language foundation model for the generation of realistic chest X-ray images. Nat Biomed Eng 2025;9:494-506. PMID: 39187663; PMCID: PMC11861387; DOI: 10.1038/s41551-024-01246-y.
Abstract
The paucity of high-quality medical imaging datasets could be mitigated by machine learning models that generate compositionally diverse images that faithfully represent medical concepts and pathologies. However, large vision-language models are trained on natural images, and the diversity distribution of the generated images substantially differs from that of medical images. Moreover, medical language involves specific and semantically rich vocabulary. Here we describe a domain-adaptation strategy for large vision-language models that overcomes distributional shifts. Specifically, by leveraging publicly available datasets of chest X-ray images and the corresponding radiology reports, we adapted a latent diffusion model pre-trained on pairs of natural images and text descriptors to generate diverse and visually plausible synthetic chest X-ray images (as confirmed by board-certified radiologists) whose appearance can be controlled with free-form medical text prompts. The domain-adaptation strategy for the text-conditioned synthesis of medical images can be used to augment training datasets and is a viable alternative to the sharing of real medical images for model training and fine-tuning.
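The domain-adaptation recipe described here follows the standard text-to-image latent diffusion fine-tuning loop: encode the image to a latent, add noise, and train the U-Net to predict that noise conditioned on the report text. The sketch below uses the common diffusers pattern with an illustrative base checkpoint; the authors' exact base model, data, and schedule are not reproduced.

```python
# One hedged fine-tuning step for a text-conditioned latent diffusion model.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

repo = "runwayml/stable-diffusion-v1-5"          # illustrative base checkpoint
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
scheduler = DDPMScheduler.from_pretrained(repo, subfolder="scheduler")

pixels = torch.randn(1, 3, 512, 512)             # stand-in chest X-ray batch
report = ["Right basal consolidation, no pleural effusion."]

with torch.no_grad():                            # frozen VAE and text encoder
    latents = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor
    ids = tokenizer(report, padding="max_length", truncation=True,
                    max_length=77, return_tensors="pt").input_ids
    text_emb = text_encoder(ids)[0]

noise = torch.randn_like(latents)
t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
noisy = scheduler.add_noise(latents, noise, t)
pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
F.mse_loss(pred, noise).backward()               # update only the U-Net
```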
Affiliation(s)
- Christian Bluethgen: Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA; Department of Radiology, Stanford University, Palo Alto, CA, USA; Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Pierre Chambon: Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA; Department of Radiology, Stanford University, Palo Alto, CA, USA
- Jean-Benoit Delbrouck: Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA; Department of Radiology, Stanford University, Palo Alto, CA, USA
- Rogier van der Sluijs: Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA; Department of Radiology, Stanford University, Palo Alto, CA, USA
- Małgorzata Połacin: Department of Radiology, Stanford University, Palo Alto, CA, USA; Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Juan Manuel Zambrano Chaves: Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA; Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Curtis P Langlotz: Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA; Department of Radiology, Stanford University, Palo Alto, CA, USA; Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Akshay S Chaudhari: Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA; Department of Radiology, Stanford University, Palo Alto, CA, USA; Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
12. Ryu SY, Choi JY, Yoo TK. Automated detection of retinal artery occlusion in fundus photography via self-supervised deep learning and multimodal interpretability using a multimodal AI chatbot. Med Biol Eng Comput 2025. PMID: 40163243; DOI: 10.1007/s11517-025-03353-7.
Abstract
Retinal artery occlusion (RAO) is a sight-threatening condition that requires prompt diagnosis to prevent irreversible vision loss. This study presents an innovative AI-driven approach for RAO detection from fundus images, marking the first application of deep learning for this purpose. Using a self-supervised learning (SSL) framework with SimCLR, our model addresses the challenge of limited labeled RAO data. The ResNet50 model pretrained with SimCLR demonstrated high diagnostic accuracy, achieving areas under the receiver operating characteristic curve (AUC) of 0.924 and 0.988 on two external validation datasets, highlighting its robustness and generalizability in RAO detection. To enhance transparency in clinical AI, we incorporated a multimodal interpretability approach using a ChatGPT-4-based AI chatbot. This chatbot, combined with Grad-CAM visualizations, provides detailed clinical explanations of the model's predictions, emphasizing key RAO features such as retinal whitening and cherry-red spots. This multimodal interpretability framework improves clinicians' understanding of the model's decision-making process, facilitating clinical adoption and trust. By automating RAO detection, this AI model serves as a valuable tool for the early identification of ocular and systemic vascular risks, enabling timely intervention. These findings highlight the potential of fundus imaging for RAO detection and broader cardiovascular risk assessment, advancing AI's role in predictive healthcare.
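The SimCLR pretraining used here boils down to the NT-Xent contrastive loss: two augmented views of each fundus image are pulled together in embedding space while all other pairs are pushed apart. A minimal version of the loss (the ResNet50 encoder and augmentations are omitted):

```python
# Minimal NT-Xent (SimCLR) loss over two augmented views of a batch.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """z1, z2: (B, D) projection-head outputs for two views of the same batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D)
    sim = z @ z.t() / tau                                # scaled cosine similarity
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    n = z.shape[0]
    targets = (torch.arange(n, device=z.device) + n // 2) % n  # the other view
    return F.cross_entropy(sim, targets)

z1 = torch.randn(8, 128, requires_grad=True)
z2 = torch.randn(8, 128, requires_grad=True)
nt_xent(z1, z2).backward()
```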
Affiliation(s)
- Joon Yul Choi: Department of Biomedical Engineering, Yonsei University, Wonju, South Korea
- Tae Keun Yoo: Department of Ophthalmology, Hangil Eye Hospital, Incheon, South Korea
13. Mugambi L, wa Maina C, Zühlke L. Self-Supervised Multi-Task Learning for the Detection and Classification of RHD-Induced Valvular Pathology. J Imaging 2025;11:97. PMID: 40278013; PMCID: PMC12028028; DOI: 10.3390/jimaging11040097.
Abstract
Rheumatic heart disease (RHD) poses a significant global health challenge, necessitating improved diagnostic tools. This study investigated the use of self-supervised multi-task learning for automated echocardiographic analysis, aiming to predict echocardiographic views, diagnose RHD conditions, and determine severity. We compared two prominent self-supervised learning (SSL) methods: DINOv2, a vision-transformer-based approach known for capturing implicit features, and SimCLR, a ResNet-based simple framework for contrastive learning of representations recognised for its simplicity and effectiveness. Both models were pre-trained on a large, unlabelled echocardiogram dataset and fine-tuned on a smaller, labelled subset. DINOv2 achieved accuracies of 92% for view classification, 98% for condition detection, and 99% for severity assessment. SimCLR demonstrated good performance as well, achieving accuracies of 99% for view classification, 92% for condition detection, and 96% for severity assessment. Embedding visualisations, using both Uniform Manifold Approximation and Projection (UMAP) and t-distributed Stochastic Neighbor Embedding (t-SNE), revealed distinct clusters for all tasks in both models, indicating the effective capture of the discriminative features of the echocardiograms. This study demonstrates the potential of using self-supervised multi-task learning for automated echocardiogram analysis, offering a scalable and efficient approach to improving RHD diagnosis, especially in resource-limited settings.
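The embedding visualisations mentioned above follow a standard recipe: project the learned feature vectors to 2D and inspect class clusters. A minimal t-SNE version with stand-in embeddings:

```python
# 2D projection of learned embeddings for cluster inspection (stand-in data).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.random((200, 384))              # stand-in SSL features
labels = rng.integers(0, 4, 200)                 # e.g., four echo views

xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
for k in range(4):
    pts = xy[labels == k]
    print(f"class {k}: n={len(pts)}, centroid=({pts[:, 0].mean():.2f}, {pts[:, 1].mean():.2f})")
```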
Affiliation(s)
- Lorna Mugambi: Centre for Data Science and Artificial Intelligence, Dedan Kimathi University of Technology, Nyeri 10143, Kenya
- Ciira wa Maina: Centre for Data Science and Artificial Intelligence, Dedan Kimathi University of Technology, Nyeri 10143, Kenya
- Liesl Zühlke: South African Medical Research Council, Francie Van Zyl Drive, Cape Town 7505, South Africa; Division of Paediatric Cardiology, Department of Paediatrics and Child Health, Red Cross War Memorial Children’s Hospital, Cape Town 7700, South Africa; Division of Cardiology, Department of Medicine, Groote Schuur Hospital, Cape Town 7925, South Africa
14. Zhou K, Xin E, Yang S, Luo X, Zhu Y, Zeng Y, Fu J, Ruan Z, Wang R, Geng D, Yang L. Automated Fast Prediction of Bone Mineral Density From Low-dose Computed Tomography. Acad Radiol 2025:S1076-6332(25)00185-0. PMID: 40082126; DOI: 10.1016/j.acra.2025.02.041.
Abstract
BACKGROUND: Low-dose chest CT (LDCT) is commonly employed for the early screening of lung cancer. However, it has rarely been utilized in the assessment of volumetric bone mineral density (vBMD) and the diagnosis of osteoporosis (OP).
PURPOSE: This study investigated the feasibility of using deep learning to establish a system for vBMD prediction and OP classification based on LDCT scans.
METHODS: This study included 551 subjects who underwent both LDCT and QCT examinations. First, a U-Net was developed to automatically segment lumbar vertebrae from single 2D LDCT slices near the mid-vertebral level. Then, a prediction model was proposed to estimate vBMD, which was subsequently employed for detecting OP and osteopenia (OA). Specifically, two input modalities were constructed for the prediction model. The performance metrics of the models were calculated and evaluated.
RESULTS: The segmentation model exhibited strong agreement with manual segmentation, achieving a mean Dice similarity coefficient (DSC) of 0.974, sensitivity of 0.964, positive predictive value (PPV) of 0.985, and Hausdorff distance of 3.261 on the test set. Linear regression and Bland-Altman analysis demonstrated strong agreement between the vBMD predicted from two-channel inputs and QCT-derived vBMD, with a root mean square error of 8.958 mg/mm3 and an R2 of 0.944. The areas under the curve for detecting OP and OA were 0.800 and 0.878, respectively, with an overall accuracy of 94.2%. The average processing time for this system was 1.5 s.
CONCLUSION: This prediction system can automatically estimate vBMD and detect OP and OA on LDCT scans, showing great potential for osteoporosis screening.
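The segmentation metrics reported above (DSC, sensitivity, PPV, Hausdorff distance) are computed from a pair of binary masks as follows; toy masks stand in for the predicted and manual vertebra segmentations.

```python
# Segmentation agreement metrics for binary masks (toy example).
import numpy as np
from scipy.spatial.distance import directed_hausdorff

pred = np.zeros((64, 64), dtype=bool); pred[20:40, 20:40] = True
gt = np.zeros((64, 64), dtype=bool);   gt[22:42, 21:41] = True

tp = np.logical_and(pred, gt).sum()
dsc = 2 * tp / (pred.sum() + gt.sum())           # Dice similarity coefficient
sensitivity = tp / gt.sum()
ppv = tp / pred.sum()

p_pts, g_pts = np.argwhere(pred), np.argwhere(gt)
hd = max(directed_hausdorff(p_pts, g_pts)[0], directed_hausdorff(g_pts, p_pts)[0])
print(f"DSC={dsc:.3f}  Sens={sensitivity:.3f}  PPV={ppv:.3f}  HD={hd:.2f}")
```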
Affiliation(s)
- Kun Zhou: Academy for Engineering and Technology, Fudan University, Shanghai, China
- Enhui Xin: Academy for Engineering and Technology, Fudan University, Shanghai, China; Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China
- Shan Yang: Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China
- Xiao Luo: Academy for Engineering and Technology, Fudan University, Shanghai, China
- Yuqi Zhu: Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China
- Yanwei Zeng: Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China
- Junyan Fu: Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China
- Zhuoying Ruan: Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China
- Rong Wang: Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China
- Daoying Geng: Academy for Engineering and Technology, Fudan University, Shanghai, China; Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China; Shanghai Engineering Research Center of Intelligent Imaging for Critical Brain Diseases, Shanghai, China; Institute of Functional and Molecular Medical Imaging, Fudan University, Shanghai, China
- Liqin Yang: Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China; Shanghai Engineering Research Center of Intelligent Imaging for Critical Brain Diseases, Shanghai, China; Institute of Functional and Molecular Medical Imaging, Fudan University, Shanghai, China
15. Li C, Yang D, Yao S, Wang S, Wu Y, Zhang L, Li Q, Cho KIK, Seitz-Holland J, Ning L, Legarreta JH, Rathi Y, Westin CF, O'Donnell LJ, Sochen NA, Pasternak O, Zhang F. DDEvENet: Evidence-based ensemble learning for uncertainty-aware brain parcellation using diffusion MRI. Comput Med Imaging Graph 2025;120:102489. PMID: 39787735; PMCID: PMC11792617; DOI: 10.1016/j.compmedimag.2024.102489.
Abstract
In this study, we developed an Evidential Ensemble Neural Network based on deep learning and diffusion MRI, named DDEvENet, for anatomical brain parcellation. The key innovation of DDEvENet is an evidential deep learning framework that quantifies predictive uncertainty at each voxel during a single inference. To do so, we designed an evidence-based ensemble learning framework for uncertainty-aware parcellation that leverages multiple parameters derived from diffusion MRI (dMRI). Using DDEvENet, we obtained accurate parcellation and uncertainty estimates across different datasets from healthy and clinical populations and with different imaging acquisitions. The overall network includes five parallel subnetworks, each dedicated to learning the FreeSurfer parcellation for a particular dMRI parameter. An evidence-based ensemble methodology is then proposed to fuse the individual outputs. We performed experimental evaluations on large-scale datasets from multiple imaging sources, including high-quality diffusion MRI data from healthy adults and clinical diffusion MRI data from participants with various brain diseases (schizophrenia, bipolar disorder, attention-deficit/hyperactivity disorder, Parkinson's disease, cerebral small vessel disease, and neurosurgical patients with brain tumors). Compared to several state-of-the-art methods, our experimental results demonstrate markedly improved parcellation accuracy across the multiple testing datasets despite the differences in dMRI acquisition protocols and health conditions. Furthermore, thanks to the uncertainty estimation, our DDEvENet approach demonstrates a good ability to detect abnormal brain regions in patients with lesions, consistent with expert-drawn results, enhancing the interpretability and reliability of the segmentation results.
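The evidential idea, obtaining a prediction and a per-voxel uncertainty in a single forward pass, can be sketched with a Dirichlet output head: non-negative evidence raises the concentration parameters, and uncertainty is high wherever total evidence is low. Layer sizes are illustrative; DDEvENet's five subnetworks and fusion rule are not shown, and the paper's exact fusion may differ from simply combining evidence maps.

```python
# Sketch of an evidential (Dirichlet) segmentation head with uncertainty.
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 5                                        # number of parcels/classes
head = nn.Conv2d(16, K, kernel_size=1)       # per-voxel evidence logits

feats = torch.randn(1, 16, 32, 32)           # stand-in feature map
evidence = F.softplus(head(feats))           # e >= 0
alpha = evidence + 1.0                       # Dirichlet concentration parameters
S = alpha.sum(dim=1, keepdim=True)           # total evidence per voxel
prob = alpha / S                             # expected class probabilities
uncertainty = K / S                          # in (0, 1]; high where evidence is low

print(prob.shape, float(uncertainty.mean()))
```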
Affiliation(s)
- Chenjun Li: University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Dian Yang: University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Shun Yao: The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
- Shuyue Wang: The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Ye Wu: Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Le Zhang: University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Qiannuo Li: East China University of Science and Technology, Shanghai, China
- Nir A Sochen: School of Mathematical Sciences, University of Tel Aviv, Tel Aviv, Israel
- Fan Zhang: University of Electronic Science and Technology of China, Chengdu, Sichuan, China
16. Chang JM, Lee W, Bahl M. Clinical Application of Artificial Intelligence in Digital Breast Tomosynthesis. J Korean Soc Radiol 2025;86:205-215. PMID: 40201610; PMCID: PMC11973105; DOI: 10.3348/jksr.2025.0011.
Abstract
Digital breast tomosynthesis (DBT) provides improved cancer detection and lower recall rates when compared with full-field digital mammography (DM) and has been widely adopted for breast cancer screening. However, adopting DBT presents new challenges such as an increased number of acquired images resulting in longer interpretation times. Artificial intelligence (AI) offers numerous opportunities to enhance the advantages of DBT and mitigate its shortcomings. Research in the DBT AI domain has grown significantly and AI algorithms play a key role in the screening and diagnostic phases of breast cancer detection and characterization. The application of AI may streamline the workflow and reduce the time required for radiologists to interpret images. In addition, AI can minimize radiation exposure and enhance lesion visibility in synthetic two-dimensional DM images. This review provides an overview of AI technology in DBT, its clinical applications, and future considerations.
17. Sugawara K, Takaya E, Inamori R, Konaka Y, Sato J, Shiratori Y, Hario F, Kobayashi T, Ueda T, Okamoto Y. Breast cancer classification based on breast tissue structures using the Jigsaw puzzle task in self-supervised learning. Radiol Phys Technol 2025;18:209-218. PMID: 39760975; PMCID: PMC11876229; DOI: 10.1007/s12194-024-00874-y.
Abstract
Self-supervised learning (SSL) has gained attention in the medical field as a deep learning approach that utilizes unlabeled data. The Jigsaw puzzle task in SSL enables models to learn both the features of images and the positional relationships within them. In breast cancer diagnosis, radiologists evaluate not only lesion-specific features but also the surrounding breast structures. However, deep learning models that adopt a diagnostic approach similar to that of human radiologists are still limited. This study aims to evaluate the effectiveness of the Jigsaw puzzle task in characterizing breast tissue structures for breast cancer classification on mammographic images. Using the Chinese Mammography Database (CMMD), we compared four pre-training pipelines: (1) IN-Jig, pre-trained with both the ImageNet classification task and the Jigsaw puzzle task; (2) Scratch-Jig, pre-trained only with the Jigsaw puzzle task; (3) IN, pre-trained only with the ImageNet classification task; and (4) Scratch, trained from random initialization without any pre-training task. All pipelines were fine-tuned for binary classification to distinguish between the presence and absence of breast cancer. Performance was evaluated based on the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Additionally, performance was analyzed in detail across different radiological findings and breast densities, and regions of interest were visualized using gradient-weighted class activation mapping (Grad-CAM). The AUCs for the four pipelines were 0.925, 0.921, 0.918, and 0.909, respectively. Our results suggest that the Jigsaw puzzle task is an effective pre-training method for breast cancer classification, with the potential to enhance diagnostic accuracy with limited data.
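The Jigsaw pretext task can be shown in miniature: an image is cut into a 3x3 grid, the tiles are shuffled by one of a fixed set of permutations, and the network must recover which permutation was applied, thereby learning positional relationships without labels. The permutation set size and tiny classifier below are simplifications.

```python
# Miniature Jigsaw puzzle pretext task (simplified permutation set and model).
import itertools
import torch
import torch.nn as nn

perms = list(itertools.permutations(range(9)))[:32]   # small fixed permutation set

def make_puzzle(img, perm):                  # img: (C, H, W), H and W divisible by 3
    C, H, W = img.shape
    h, w = H // 3, W // 3
    tiles = [img[:, r*h:(r+1)*h, c*w:(c+1)*w] for r in range(3) for c in range(3)]
    rows = [torch.cat([tiles[perm[r*3 + c]] for c in range(3)], dim=2)
            for r in range(3)]
    return torch.cat(rows, dim=1)

net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 96 * 96, 256), nn.ReLU(),
                    nn.Linear(256, len(perms)))       # predicts permutation index

img = torch.randn(3, 96, 96)
label = torch.randint(len(perms), (1,))
puzzle = make_puzzle(img, perms[label.item()])
nn.functional.cross_entropy(net(puzzle[None]), label).backward()
```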
Affiliation(s)
- Keisuke Sugawara: Department of Diagnostic Radiology, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Eichi Takaya: Department of Diagnostic Imaging, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan; AI Lab, Tohoku University Hospital, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Ryusei Inamori: Department of Radiological Imaging and Informatics, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Yuma Konaka: Department of Diagnostic Radiology, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Jumpei Sato: Department of Diagnostic Radiology, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Yuta Shiratori: Department of Diagnostic Imaging, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Fumihito Hario: Department of Diagnostic Imaging, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Tomoya Kobayashi: Department of Diagnostic Imaging, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan; AI Lab, Tohoku University Hospital, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Takuya Ueda: Department of Diagnostic Radiology, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Yoshikazu Okamoto: Department of Diagnostic Imaging, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan; AI Lab, Tohoku University Hospital, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
18. Li Y, Wynne JF, Wu Y, Qiu RLJ, Tian S, Wang T, Patel PR, Yu DS, Yang X. Automatic medical imaging segmentation via self-supervising large-scale convolutional neural networks. Radiother Oncol 2025;204:110711. PMID: 39798701; PMCID: PMC11938206; DOI: 10.1016/j.radonc.2025.110711.
Abstract
PURPOSE This study aims to develop a robust, large-scale deep learning model for medical image segmentation, leveraging self-supervised learning to overcome the limitations of supervised learning and data variability in clinical settings. METHODS AND MATERIALS We curated a substantial multi-center CT dataset for self-supervised pre-training using masked image modeling with sparse submanifold convolution. We designed a series of Sparse Submanifold U-Nets (SS-UNets) of varying sizes and performed self-supervised pre-training. We fine-tuned the SS-UNets on the TotalSegmentator dataset. The evaluation encompassed robustness tests on four unseen datasets and transferability assessments on three additional datasets. RESULTS Our SS-UNets outperformed state-of-the-art self-supervised methods, demonstrating higher Dice Similarity Coefficient (DSC) and Surface Dice Coefficient (SDC) metrics. SS-UNet-B achieved 84.3% DSC and 88.0% SDC on TotalSegmentator. We further demonstrated the scalability of our networks, with segmentation performance increasing with model size from 58 million to 1.4 billion parameters: a 4.6% DSC and 3.2% SDC improvement on TotalSegmentator from SS-UNet-B to SS-UNet-H. CONCLUSIONS We demonstrate the efficacy of self-supervised learning for medical image segmentation in the CT, MRI, and PET domains. Our approach significantly reduces reliance on extensively labeled data, mitigates the risk of overfitting, and enhances model generalizability. Future applications may allow accurate segmentation of organs and lesions across several imaging domains, potentially streamlining cancer detection and radiotherapy treatment planning.
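As a point of reference, the masked-image-modeling objective behind this kind of pre-training reduces to reconstructing randomly hidden patch tokens. The sketch below uses a toy patch-wise MLP in place of the paper's sparse submanifold U-Nets; the mask ratio and tensor sizes are assumptions.

```python
import torch
import torch.nn as nn

def masked_image_modeling_loss(volume_patches, model, mask_ratio=0.6):
    """volume_patches: (B, N, D) patch tokens from a CT volume.
    Randomly mask a fraction of patches, reconstruct them, and score MSE
    only on the masked positions (the core MIM objective)."""
    B, N, D = volume_patches.shape
    mask = torch.rand(B, N, device=volume_patches.device) < mask_ratio
    corrupted = volume_patches.clone()
    corrupted[mask] = 0.0                      # zero out masked tokens
    reconstruction = model(corrupted)          # (B, N, D) predicted tokens
    return ((reconstruction - volume_patches) ** 2)[mask].mean()

# Toy stand-in "model": a patch-wise MLP, not the paper's SS-UNet encoder.
model = nn.Sequential(nn.Linear(256, 512), nn.GELU(), nn.Linear(512, 256))
patches = torch.randn(2, 128, 256)
print(masked_image_modeling_loss(patches, model).item())
```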
Affiliation(s)
- Yuheng Li: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, USA; Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA 30308, USA
- Jacob F Wynne: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, USA
- Yizhou Wu: Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Richard L J Qiu: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, USA
- Sibo Tian: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, USA
- Tonghe Wang: Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Pretesh R Patel: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, USA
- David S Yu: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, USA
- Xiaofeng Yang: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, USA; Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA 30308, USA
19
Brahma S, Kofler A, Zimmermann FF, Schaeffter T, Chiribiri A, Kolbitsch C. Robust Myocardial Perfusion MRI Quantification With DeepFermi. IEEE Trans Biomed Eng 2025; 72:1031-1044. [PMID: 39441677 DOI: 10.1109/tbme.2024.3485233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2024]
Abstract
Stress perfusion cardiac magnetic resonance is an important technique for examining and assessing the blood supply of the myocardium. Currently, the majority of clinical perfusion scans are evaluated based on visual assessment by experienced clinicians. This makes the process subjective, and to this end, quantitative methods have been proposed to offer a more user-independent assessment of perfusion. These methods, however, rely on time-consuming deconvolution analysis and are susceptible to data outliers caused by artifacts due to cardiac or respiratory motion. In our work, we introduce a novel deep-learning method that integrates the commonly used Fermi function with a neural network architecture for fast, accurate, and robust myocardial perfusion quantification. This approach employs the Fermi model to ensure that the perfusion maps are consistent with measured data, while also utilizing a prior based on a 3D convolutional neural network to generalize spatio-temporal information across different patient data. Our network is trained within a self-supervised learning framework, which circumvents the need for ground-truth perfusion labels that are challenging to obtain. Furthermore, we extended this training methodology by adopting a technique that ensures estimations are resistant to data outliers, thereby improving robustness against motion artifacts. Our simulation experiments demonstrated an overall improvement in the accuracy and robustness of perfusion parameter estimation, consistently outperforming traditional deconvolution analysis algorithms across varying Signal-to-Noise Ratio scenarios in the presence of data outliers. For the in vivo studies, our method generated robust perfusion estimates that aligned with clinical diagnoses, while being approximately five times faster than conventional algorithms.
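For orientation, one common parameterization of the Fermi-constrained deconvolution that this method builds on is sketched below: a Fermi-shaped impulse response is convolved with the arterial input function (AIF) and fit to the measured tissue curve. The parameter names, the synthetic gamma-variate AIF, and the least-squares fit are illustrative assumptions, not the DeepFermi network itself.

```python
import numpy as np
from scipy.optimize import curve_fit

def fermi_impulse_response(t, amplitude, width, tau0):
    """A common Fermi parameterization of the contrast impulse response;
    perfusion is proportional to the early-time amplitude of this kernel."""
    return amplitude / (1.0 + np.exp((t - tau0) / width))

def tissue_curve(t, amplitude, width, tau0, aif, dt):
    """Tissue enhancement = AIF convolved with the Fermi kernel."""
    h = fermi_impulse_response(t, amplitude, width, tau0)
    return np.convolve(aif, h)[: len(t)] * dt

# Synthetic example: gamma-variate AIF sampled once per heartbeat.
dt = 1.0
t = np.arange(60) * dt
aif = (t / 6.0) ** 3 * np.exp(-t / 6.0)
clean = tissue_curve(t, 1.2, 2.5, 4.0, aif, dt)
noisy = clean + np.random.normal(0, 0.02, t.size)

popt, _ = curve_fit(lambda t, a, w, tau: tissue_curve(t, a, w, tau, aif, dt),
                    t, noisy, p0=(1.0, 1.0, 1.0))
print("estimated perfusion-related amplitude:", popt[0])
```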
20
Xu Z, Zhao L, Yin L, Cao M, Liu Y, Gu F, Liu X, Zhang G. Support Vector Machine for Stratification of Cognitive Impairment Using 3D T1WI in Patients with Type 2 Diabetes Mellitus. Diabetes Metab Syndr Obes 2025; 18:435-451. [PMID: 39967716 PMCID: PMC11832351 DOI: 10.2147/dmso.s480317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/07/2024] [Accepted: 02/03/2025] [Indexed: 02/20/2025] Open
Abstract
Purpose To explore the potential of MRI-based radiomics in predicting cognitive dysfunction in patients with diagnosed type 2 diabetes mellitus (T2DM). Patients and Methods In this study, data on 158 patients with T2DM were retrospectively collected between September 2019 and December 2020. The participants were categorized into a normal cognitive function (N) group (n=30), a mild cognitive impairment (MCI) group (n=90), and a dementia (DM) group (n=38) according to the Chinese version of the Montréal Cognitive Assessment Scale-B (MoCA-B). Radiomics features were extracted from the brain tissue, excluding the ventricles and sulci, in 3D T1WI images, and support vector machine (SVM) models were then established to distinguish the cognitive impairment (CI) group from the N group and the MCI group from the DM group, respectively. The models were evaluated based on their area under the receiver operating characteristic curve (AUC), precision (P), recall (R), F1-score, and support, and ROC curves were plotted for each model. Results The N vs. CI comparison comprised 68 cases (54 in the training set and 14 in the validation set), and the MCI vs. DM comparison comprised 128 cases (90 in the training set and 38 in the validation set). The inter- and intra-observer consistency of the radiomics features extracted by the two physicians was 0.86 and 0.90, respectively. After feature selection, 11 optimal features remained to distinguish N from CI and 12 to distinguish MCI from DM. In the test set, the SVM classifier achieved an AUC of 0.857 and an accuracy of 0.830 in distinguishing CI from N, and an AUC of 0.821 and an accuracy of 0.830 in distinguishing MCI from DM. Conclusion The SVM model based on MRI radiomics exhibits high efficacy in the diagnosis of cognitive dysfunction and the evaluation of its severity among patients with T2DM.
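The classification stage described here maps onto a short scikit-learn pipeline. The sketch below stands in synthetic data for the selected radiomics features and reports a cross-validated AUC; the feature count, kernel, and hyperparameters are assumptions, not the study's exact configuration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder radiomics matrix: rows = patients, columns = selected features
# (the study retained 11-12 features per task after feature selection).
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 12))
y = rng.integers(0, 2, size=128)  # e.g., MCI vs. DM labels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC: %.3f +/- %.3f" % (auc.mean(), auc.std()))
```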
Affiliation(s)
- Zhigao Xu: Department of Radiology, The Third People’s Hospital of Datong, Datong, 037046, People’s Republic of China
- Lili Zhao: Department of Radiology, The Third People’s Hospital of Datong, Datong, 037046, People’s Republic of China
- Lei Yin: Graduate School, Changzhi Medical School, Changzhi, 046013, People’s Republic of China
- Milan Cao: Department of Science and Education, The Third People’s Hospital of Datong, Datong, 037046, People’s Republic of China
- Yan Liu: Department of Endocrinology, The Third People’s Hospital of Datong, Datong, 037046, People’s Republic of China
- Feng Gu: Department of Radiology, The Third People’s Hospital of Datong, Datong, 037046, People’s Republic of China
- Xiaohui Liu: Department of Radiology, The Third People’s Hospital of Datong, Datong, 037046, People’s Republic of China
- Guojiang Zhang: Department of Cardiovasology, The Third People’s Hospital of Datong, Datong, 037046, People’s Republic of China
21
Liao W, Luo X, Li L, Xu J, He Y, Huang H, Zhang S. Automatic cervical lymph nodes detection and segmentation in heterogeneous computed tomography images using deep transfer learning. Sci Rep 2025; 15:4250. [PMID: 39905029 PMCID: PMC11794882 DOI: 10.1038/s41598-024-84804-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2024] [Accepted: 12/27/2024] [Indexed: 02/06/2025] Open
Abstract
To develop a deep learning model using transfer learning for automatic detection and segmentation of neck lymph nodes (LNs) in computed tomography (CT) images, the study included 11,013 annotated LNs with a short-axis diameter ≥ 3 mm from 626 head and neck cancer patients across four hospitals. The nnUNet model was used as a baseline, pre-trained on a large-scale head and neck dataset, and then fine-tuned with 4,729 LNs from hospital A for detection and segmentation. Validation was conducted on an internal testing cohort (ITC A) and three external testing cohorts (ETCs B, C, and D), with 1684 and 4600 LNs, respectively. Detection was evaluated via sensitivity, positive predictive value (PPV), and false positive rate per case (FP/vol), while segmentation was assessed using the Dice similarity coefficient (DSC) and Hausdorff distance (HD95). For detection, the sensitivity, PPV, and FP/vol in ITC A were 54.6%, 69.0%, and 3.4, respectively. In ETCs, the sensitivity ranged from 45.7% at 3.9 FP/vol to 63.5% at 5.8 FP/vol. Segmentation achieved a mean DSC of 0.72 in ITC A and 0.72 to 0.74 in ETCs, as well as a mean HD95 of 3.78 mm in ITC A and 2.73 mm to 2.85 mm in ETCs. No significant sensitivity difference was found between contrast-enhanced and unenhanced CT images (p = 0.502) or repeated CT images (p = 0.815) during adaptive radiotherapy. The model's segmentation accuracy was comparable to that of experienced oncologists. The model shows promise in automatically detecting and segmenting neck LNs in CT images, potentially reducing oncologists' segmentation workload.
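The two segmentation metrics reported here can be computed directly from binary masks. The sketch below uses a common distance-transform approximation for HD95 (surface-mesh variants differ slightly); the mask sizes and voxel spacing are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hd95(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric Hausdorff distance (mm), approximated with
    Euclidean distance transforms over the full masks."""
    dist_to_b = distance_transform_edt(~b, sampling=spacing)
    dist_to_a = distance_transform_edt(~a, sampling=spacing)
    d_ab = dist_to_b[a]   # distances from voxels of a to the nearest voxel of b
    d_ba = dist_to_a[b]
    return float(np.percentile(np.concatenate([d_ab, d_ba]), 95))

pred = np.zeros((32, 64, 64), dtype=bool)
pred[10:20, 20:40, 20:40] = True
gt = np.zeros_like(pred)
gt[11:21, 22:42, 22:40] = True
print("DSC %.3f, HD95 %.2f mm" % (dice(pred, gt), hd95(pred, gt)))
```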
Affiliation(s)
- Wenjun Liao: Department of Radiation Oncology, Sichuan Cancer Hospital and Institute, Sichuan Cancer Center, Cancer Hospital Affiliate to School of Medicine, University of Electronic Science and Technology of China, Chengdu, 610041, China
- Xiangde Luo: School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Lu Li: Department of Radiation Oncology, Sichuan Cancer Hospital and Institute, Sichuan Cancer Center, Cancer Hospital Affiliate to School of Medicine, University of Electronic Science and Technology of China, Chengdu, 610041, China
- Jinfeng Xu: Department of Radiation Oncology, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China
- Yuan He: Department of Radiation Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 23000, Anhui, China
- Hui Huang: Cancer Center, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, 610072, China
- Shichuan Zhang: Department of Radiation Oncology, Sichuan Cancer Hospital and Institute, Sichuan Cancer Center, Cancer Hospital Affiliate to School of Medicine, University of Electronic Science and Technology of China, Chengdu, 610041, China
22
Gao R, Peng A, Duan Y, Chen M, Zheng T, Zhang M, Chen L, Sun H. Associations of Postencephalitic Epilepsy Using Multi-Contrast Whole Brain MRI: A Large Self-Supervised Vision Foundation Model Strategy. J Magn Reson Imaging 2025. [PMID: 39898495 DOI: 10.1002/jmri.29734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2024] [Revised: 01/22/2025] [Accepted: 01/23/2025] [Indexed: 02/04/2025] Open
Abstract
BACKGROUND Postencephalitic epilepsy (PEE) is a severe neurological complication following encephalitis. Early identification of individuals at high risk for PEE is important for timely intervention. PURPOSE To develop a large self-supervised vision foundation model using a big dataset of multi-contrast head MRI scans, followed by fine-tuning with MRI data and follow-up outcomes from patients with PEE to develop a PEE association model. STUDY TYPE Retrospective. POPULATION Fifty-seven thousand six hundred twenty-one contrast-enhanced head MRI scans from 34,871 patients for foundation model construction, and head MRI scans from 144 patients with encephalitis (64 PEE, 80 N-PEE) for the PEE association model. FIELD STRENGTH/SEQUENCE 1.5-T, 3-T, T1-weighted imaging, T2-weighted imaging, fluid attenuated inversion recovery, T1-weighted contrast-enhanced imaging. ASSESSMENT The foundation model was developed using self-supervised learning and cross-contrast context recovery. Patients with encephalitis were monitored for a median of 3.7 years (range 0.7-7.5 years), with epilepsy diagnosed according to International League Against Epilepsy criteria. Occlusion sensitivity mapping highlighted brain regions involved in PEE classifications. Model performance was compared with DenseNet without pre-trained weights. STATISTICAL TESTS Performance was assessed via confusion matrices, accuracy, sensitivity, specificity, precision, F1 score, and area under the receiver operating characteristic curve (AUC). The DeLong test compared the AUCs of the two models (P < 0.05 for statistical significance). RESULTS The PEE association model achieved accuracy, sensitivity, specificity, precision, F1 score, and AUC of 79.3% (95% CI: 0.71-0.92), 92.3% (95% CI: 0.80-1.00), 68.8% (95% CI: 0.55-0.87), 70.6% (95% CI: 0.61-0.90), 80.0% (95% CI: 0.71-0.93), and 81.0% (95% CI: 0.68-0.92), respectively. A significant AUC improvement was found compared to DenseNet (DeLong test, P = 0.03). The association model focused on brain regions affected by encephalitis. DATA CONCLUSION Using extensive unlabeled data via self-supervised learning addressed the limitations of supervised tasks with limited data. The fine-tuned foundation model outperformed DenseNet, which was trained exclusively on task data. PLAIN LANGUAGE SUMMARY This research develops a model to assess the occurrence of epilepsy after encephalitis, a severe brain inflammation condition. By using over 57,000 brain scans, the study trains a computer program to recognize patterns in brain images. The model analyzes whole-brain scans to identify areas commonly affected by the disease, such as the temporal and frontal lobes. It was tested on data from patients with encephalitis and showed better performance than older methods. The model can assess the risk of secondary epilepsy in patients with encephalitis, allowing doctors to intervene early and improve treatment outcomes for those affected by this condition. EVIDENCE LEVEL 4 TECHNICAL EFFICACY: Stage 1.
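Occlusion sensitivity mapping, used here to localize the regions driving the PEE classification, is straightforward to sketch: slide a blank patch over the input and record the drop in the target-class probability. The toy classifier, patch size, and stride below are assumptions, not the paper's settings.

```python
import torch

@torch.no_grad()
def occlusion_sensitivity(model, image, target_class, patch=16, stride=16, fill=0.0):
    """Slide an occluding patch across the image and record the drop in the
    target-class probability; high values mark regions the classifier relies on."""
    model.eval()
    base = torch.softmax(model(image.unsqueeze(0)), dim=1)[0, target_class]
    _, H, W = image.shape
    heatmap = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.clone()
            occluded[:, y:y + patch, x:x + patch] = fill
            prob = torch.softmax(model(occluded.unsqueeze(0)), dim=1)[0, target_class]
            heatmap[i, j] = base - prob
    return heatmap

# Toy classifier stand-in for the fine-tuned PEE model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 2))
scan_slice = torch.randn(3, 64, 64)
print(occlusion_sensitivity(model, scan_slice, target_class=1).shape)
```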
Affiliation(s)
- Ronghui Gao: Department of Radiology, West China Hospital, Sichuan University, Chengdu, China
- Anjiao Peng: Department of Neurology, West China Hospital, Sichuan University, Chengdu, China
- Yifei Duan: Department of Neurology, West China Hospital, Sichuan University, Chengdu, China
- Mengyao Chen: Huaxi MR Research Center (HMRRC), West China Hospital, Sichuan University, Chengdu, China
- Tao Zheng: IT Center, West China Hospital, Sichuan University, Chengdu, China
- Meng Zhang: NVIDIA Corp, Beijing Representative Office, Beijing, China
- Lei Chen: Department of Neurology, West China Hospital, Sichuan University, Chengdu, China; Pazhou Lab, Guangzhou, China
- Huaiqiang Sun: Department of Radiology, West China Hospital, Sichuan University, Chengdu, China; Huaxi MR Research Center (HMRRC), West China Hospital, Sichuan University, Chengdu, China
23
Paschali M, Chen Z, Blankemeier L, Varma M, Youssef A, Bluethgen C, Langlotz C, Gatidis S, Chaudhari A, Atzen S. Foundation Models in Radiology: What, How, Why, and Why Not. Radiology 2025; 314:e240597. [PMID: 39903075 PMCID: PMC11868850 DOI: 10.1148/radiol.240597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Revised: 06/02/2024] [Accepted: 06/11/2024] [Indexed: 02/06/2025]
Abstract
Recent advances in artificial intelligence have witnessed the emergence of large-scale deep learning models capable of interpreting and generating both textual and imaging data. Such models, typically referred to as foundation models (FMs), are trained on extensive corpora of unlabeled data and demonstrate high performance across various tasks. FMs have recently received extensive attention from academic, industry, and regulatory bodies. Given the potentially transformative impact that FMs can have on the field of radiology, radiologists must be aware of potential pathways to train these radiology-specific FMs, including understanding both the benefits and challenges. Thus, this review aims to explain the fundamental concepts and terms of FMs in radiology, with a specific focus on the requirements of training data, model training paradigms, model capabilities, and evaluation strategies. Overall, the goal of this review is to unify technical advances and clinical needs for safe and responsible training of FMs in radiology to ultimately benefit patients, providers, and radiologists.
Affiliation(s)
- Magdalini Paschali, Zhihong Chen, Louis Blankemeier, Maya Varma, Alaa Youssef, Christian Bluethgen, Curtis Langlotz, Sergios Gatidis, Akshay Chaudhari, Sarah Atzen: From the Stanford Center for Artificial Intelligence in Medicine and Imaging, 1701 Page Mill Rd, Palo Alto, CA 94304 (M.P., Z.C., L.B., M.V., A.Y., C.B., C.L., S.G., A.C.); Departments of Radiology (M.P., Z.C., A.Y., C.L., S.G., A.C.), Electrical Engineering (L.B.), Computer Science (M.V.), Medicine (C.L.), and Biomedical Data Science (C.L., A.C.), Stanford University, Stanford, Calif; and Department of Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland (C.B.)
24
Trigka M, Dritsas E. A Comprehensive Survey of Deep Learning Approaches in Image Processing. SENSORS (BASEL, SWITZERLAND) 2025; 25:531. [PMID: 39860903 PMCID: PMC11769216 DOI: 10.3390/s25020531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/20/2024] [Revised: 01/13/2025] [Accepted: 01/13/2025] [Indexed: 01/27/2025]
Abstract
The integration of deep learning (DL) into image processing has driven transformative advancements, enabling capabilities far beyond the reach of traditional methodologies. This survey offers an in-depth exploration of the DL approaches that have redefined image processing, tracing their evolution from early innovations to the latest state-of-the-art developments. It also analyzes the progression of architectural designs and learning paradigms that have significantly enhanced the ability to process and interpret complex visual data. Key advancements, such as techniques improving model efficiency, generalization, and robustness, are examined, showcasing DL's ability to address increasingly sophisticated image-processing tasks across diverse domains. Metrics used for rigorous model evaluation are also discussed, underscoring the importance of performance assessment in varied application contexts. The impact of DL in image processing is highlighted through its ability to tackle complex challenges and generate actionable insights. Finally, this survey identifies potential future directions, including the integration of emerging technologies like quantum computing and neuromorphic architectures for enhanced efficiency and federated learning for privacy-preserving training. Additionally, it highlights the potential of combining DL with emerging technologies such as edge computing and explainable artificial intelligence (AI) to address scalability and interpretability challenges. These advancements are positioned to further extend the capabilities and applications of DL, driving innovation in image processing.
Affiliation(s)
- Elias Dritsas: Industrial Systems Institute (ISI), Athena Research and Innovation Center, 26504 Patras, Greece
25
Lee S, Youn J, Kim H, Kim M, Yoon SH. CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images. Eur Radiol 2025:10.1007/s00330-024-11339-6. [PMID: 39812665 DOI: 10.1007/s00330-024-11339-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2024] [Revised: 10/25/2024] [Accepted: 12/04/2024] [Indexed: 01/16/2025]
Abstract
OBJECTIVE This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists. MATERIALS AND METHODS For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLaVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting. RESULTS The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports. CONCLUSION This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts. KEY POINTS Question How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports? Findings The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with a higher accuracy compared to general-purpose models. Clinical relevance This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.
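Architecturally, LLaVA-style models couple the vision encoder to the LLM through a small projector so that image patches become pseudo-tokens in the language model's embedding space. The sketch below illustrates that coupling in general terms; all dimensions and module shapes are assumptions, not CXR-LLaVA's actual configuration.

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Map vision-transformer patch embeddings into the LLM token-embedding
    space so image 'tokens' can be prepended to the report prompt (LLaVA-style)."""
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(vision_dim, llm_dim),
                                  nn.GELU(),
                                  nn.Linear(llm_dim, llm_dim))

    def forward(self, patch_features):           # (B, num_patches, vision_dim)
        return self.proj(patch_features)         # (B, num_patches, llm_dim)

patches = torch.randn(1, 196, 1024)              # CXR patch features from a ViT
image_tokens = VisionToLLMProjector()(patches)
text_tokens = torch.randn(1, 32, 4096)           # embedded prompt tokens
llm_input = torch.cat([image_tokens, text_tokens], dim=1)
print(llm_input.shape)                           # torch.Size([1, 228, 4096])
```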
Affiliation(s)
- Seowoo Lee: Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Jiwon Youn: AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Hyungjin Kim: Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Mansu Kim: AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Soon Ho Yoon: Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
26
Silva-Rodríguez J, Chakor H, Kobbi R, Dolz J, Ben Ayed I. A Foundation Language-Image Model of the Retina (FLAIR): encoding expert knowledge in text supervision. Med Image Anal 2025; 99:103357. [PMID: 39418828 DOI: 10.1016/j.media.2024.103357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 05/06/2024] [Accepted: 09/23/2024] [Indexed: 10/19/2024]
Abstract
Foundation vision-language models are currently transforming computer vision, and are on the rise in medical imaging fueled by their very promising generalization capabilities. However, the initial attempts to transfer this new paradigm to medical imaging have shown less impressive performances than those observed in other domains, due to the significant domain shift and the complex, expert domain knowledge inherent to medical-imaging tasks. Motivated by the need for domain-expert foundation models, we present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding. To this end, we compiled 38 open-access, mostly categorical fundus imaging datasets from various sources, with up to 101 different target conditions and 288,307 images. We integrate the expert's domain knowledge in the form of descriptive textual prompts, during both pre-training and zero-shot inference, enhancing the less-informative categorical supervision of the data. Such a textual expert's knowledge, which we compiled from the relevant clinical literature and community standards, describes the fine-grained features of the pathologies as well as the hierarchies and dependencies between them. We report comprehensive evaluations, which illustrate the benefit of integrating expert knowledge and the strong generalization capabilities of FLAIR under difficult scenarios with domain shifts or unseen categories. When adapted with a lightweight linear probe, FLAIR outperforms fully-trained, dataset-focused models, more so in the few-shot regimes. Interestingly, FLAIR outperforms by a wide margin larger-scale generalist image-language models and retina domain-specific self-supervised networks, which emphasizes the potential of embedding experts' domain knowledge and the limitations of generalist models in medical imaging. The pre-trained model is available at: https://github.com/jusiro/FLAIR.
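Zero-shot inference in this kind of vision-language model reduces to comparing a normalized image embedding against one text embedding per candidate condition. The sketch below shows only that scoring step with stand-in embeddings; the prompt wording and temperature are illustrative, and FLAIR's actual encoders are available at the linked repository.

```python
import torch
import torch.nn.functional as F

def zero_shot_probs(image_emb, prompt_embs, temperature=0.07):
    """Score one fundus image against one expert-knowledge prompt per condition:
    cosine similarity between L2-normalized embeddings, softmax over prompts."""
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(prompt_embs, dim=-1)
    logits = img @ txt.T / temperature
    return logits.softmax(dim=-1)

# Stand-in embeddings; in practice these come from pre-trained vision and text
# encoders fed with descriptive prompts of each condition's fine-grained signs.
image_emb = torch.randn(512)
prompts = ["yellow drusen deposits under the retina",
           "microaneurysms and dot hemorrhages"]   # illustrative prompt texts
prompt_embs = torch.randn(len(prompts), 512)
print(zero_shot_probs(image_emb, prompt_embs))     # probability per prompt
```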
Affiliation(s)
- Jose Dolz: ÉTS Montréal, Québec, Canada; Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CR-CHUM), Québec, Canada
- Ismail Ben Ayed: ÉTS Montréal, Québec, Canada; Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CR-CHUM), Québec, Canada
27
Li K, Yang J, Liang W, Li X, Zhang C, Chen L, Wu C, Zhang X, Xu Z, Wang Y, Meng L, Zhang Y, Chen Y, Zhou SK. O-PRESS: Boosting OCT axial resolution with Prior guidance, Recurrence, and Equivariant Self-Supervision. Med Image Anal 2025; 99:103319. [PMID: 39270466 DOI: 10.1016/j.media.2024.103319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2024] [Revised: 07/10/2024] [Accepted: 08/19/2024] [Indexed: 09/15/2024]
Abstract
Optical coherence tomography (OCT) is a noninvasive technology that enables real-time imaging of tissue microanatomies. The axial resolution of OCT is intrinsically constrained by the spectral bandwidth of the employed light source while maintaining a fixed center wavelength for a specific application. Physically extending this bandwidth faces strong limitations and incurs substantial cost. We present a novel computational approach, called O-PRESS, for boosting the axial resolution of OCT with Prior guidance, a Recurrent mechanism, and Equivariant Self-Supervision. Diverging from conventional deconvolution methods that rely on physical models or data-driven techniques, our method seamlessly integrates OCT modeling and deep learning, enabling us to achieve real-time axial-resolution enhancement exclusively from measurements, without the need for paired images. Our approach solves the two primary tasks of resolution enhancement and noise reduction with one treatment. Both tasks are executed in a self-supervised manner, with equivariant imaging and free-space priors guiding their respective processes. Experimental evaluations, encompassing both quantitative metrics and visual assessments, consistently verify the efficacy and superiority of our approach, which exhibits performance on par with fully supervised methods. Importantly, the robustness of our model is affirmed, showcasing its dual capability to enhance axial resolution while concurrently improving the signal-to-noise ratio.
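The equivariance idea can be stated in a few lines: the network's output for a transformed input should match the transformed output for the original input. The sketch below shows a simplified equivariance penalty with a horizontal flip; the full method additionally couples such a constraint with an OCT measurement model and free-space priors, which are not reproduced here.

```python
import torch

def equivariance_loss(model, y, transform):
    """Simplified equivariance penalty: restoring a transformed measurement
    should match transforming the restoration of the original measurement."""
    return torch.mean((model(transform(y)) - transform(model(y))) ** 2)

flip = lambda t: torch.flip(t, dims=[-1])                 # an invertible transform
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)   # toy "enhancer" network
y = torch.randn(4, 1, 64, 64)                             # stand-in OCT B-scans
print(equivariance_loss(model, y, flip).item())
```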
Affiliation(s)
- Kaiyan Li: School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei Anhui, 230026, China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, USTC, Suzhou Jiangsu, 215123, China
- Jingyuan Yang: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Wenxuan Liang: School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei Anhui, 230026, China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, USTC, Suzhou Jiangsu, 215123, China; School of Physical Sciences, University of Science and Technology of China, Hefei Anhui, 230026, China
- Xingde Li: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21287, USA
- Chenxi Zhang: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Lulu Chen: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Chan Wu: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Xiao Zhang: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Zhiyan Xu: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Yueling Wang: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Lihui Meng: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Yue Zhang: School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei Anhui, 230026, China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, USTC, Suzhou Jiangsu, 215123, China
- Youxin Chen: Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China; Key Laboratory of Ocular Fundus Diseases, Chinese Academy of Medical Sciences, Beijing, 100730, China
- S Kevin Zhou: School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei Anhui, 230026, China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, USTC, Suzhou Jiangsu, 215123, China; Key Laboratory of Precision and Intelligent Chemistry, USTC, Hefei Anhui, 230026, China; Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
28
Chen CH, Hsieh KY, Huang KE, Cheng ET. Using the Regression Slope of Training Loss to Optimize Chest X-ray Generation in Deep Convolutional Generative Adversarial Networks. Cureus 2025; 17:e77391. [PMID: 39811723 PMCID: PMC11730489 DOI: 10.7759/cureus.77391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/13/2025] [Indexed: 01/16/2025] Open
Abstract
Diffusion models, variational autoencoders, and generative adversarial networks (GANs) are three common types of generative artificial intelligence models for image generation. Among these, GANs are the most frequently used for medical image generation and are often employed for data augmentation in various studies. However, due to the adversarial nature of GANs, where the generator and discriminator compete against each other, the training process can sometimes end with the model unable to generate meaningful images or even producing noise. This phenomenon is rarely discussed in the literature, and no studies have proposed solutions to address this issue. Such outcomes can introduce significant bias when GANs are used for data augmentation in medical image training. Moreover, GANs often require substantial computational power and storage, adding to the challenges. In this study, we used deep convolutional GANs for chest X-ray generation, and three typical training outcomes were found. Two scenarios generated meaningful medical images and one failed to produce usable images. By analyzing the loss history during training, we observed that the regression line of the overall losses tends to diverge slowly. After excluding outlier losses, we found that the slope of the regression line within the stable loss segment indicates the optimal point to terminate training, ensuring the generation of meaningful medical images.
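The stopping criterion described here is easy to prototype. The sketch below filters outlier losses with a median-absolute-deviation rule (an assumed rule; the paper's exact outlier exclusion may differ) and fits a regression line to the remaining stable segment, whose slope indicates whether training is still producing usable images.

```python
import numpy as np

def stable_slope(losses, mad_factor=3.0):
    """Fit a regression line to the loss history after discarding outliers
    (beyond mad_factor * MAD from the median); the slope of the remaining
    'stable segment' signals when to terminate GAN training."""
    losses = np.asarray(losses, dtype=float)
    med = np.median(losses)
    mad = np.median(np.abs(losses - med)) + 1e-12
    keep = np.abs(losses - med) < mad_factor * mad
    steps = np.arange(losses.size)[keep]
    slope, _ = np.polyfit(steps, losses[keep], deg=1)
    return slope

history = 1.0 + 0.0005 * np.arange(2000) + np.random.normal(0, 0.05, 2000)
history[::400] += 5.0                       # occasional adversarial spikes
print("stable-segment slope:", stable_slope(history))
# One plausible rule: stop while the slope stays near zero, before it diverges.
```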
Affiliation(s)
- Chih-Hsiung Chen: Department of Critical Care Medicine, Mennonite Christian Hospital, Hualien, TWN
- Kuang-Yu Hsieh: Department of Critical Care Medicine, Mennonite Christian Hospital, Hualien, TWN
- Kuo-En Huang: Department of Critical Care Medicine, Mennonite Christian Hospital, Hualien, TWN
- En-Tsung Cheng: Department of Critical Care Medicine, Jen Ho Hospital, Show Chwan Health Care System, Changhua, TWN
29
Kim JN, Song Y, Wu H, Subramaniam A, Lee J, Makhlouf MHE, Hassani NS, Al-Kindi S, Wilson DL, Lee J. Improving coronary artery segmentation with self-supervised learning and automated pericoronary adipose tissue segmentation: a multi-institutional study on coronary computed tomography angiography images. J Med Imaging (Bellingham) 2025; 12:016002. [PMID: 39967897 PMCID: PMC11831809 DOI: 10.1117/1.jmi.12.1.016002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Revised: 12/20/2024] [Accepted: 01/21/2025] [Indexed: 02/20/2025] Open
Abstract
Purpose Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide, with coronary computed tomography angiography (CCTA) playing a crucial role in its diagnosis. The mean Hounsfield unit (HU) of pericoronary adipose tissue (PCAT) is linked to cardiovascular risk. We utilized a self-supervised learning framework (SSL) to improve the accuracy and generalizability of coronary artery segmentation on CCTA volumes while addressing the limitations of small annotated datasets. Approach We utilized self-supervised pretraining followed by supervised fine-tuning to segment coronary arteries. To evaluate the data efficiency of SSL, we varied the number of CCTA volumes used during pretraining. In addition, we developed an automated PCAT segmentation algorithm utilizing centerline extraction, spatial-geometric coronary identification, and landmark detection. We evaluated our method on a multi-institutional dataset by assessing coronary artery and PCAT segmentation accuracy via Dice scores and comparing mean PCAT HU values with the ground truth. Results Our approach significantly improved coronary artery segmentation, achieving Dice scores up to 0.787 after self-supervised pretraining. The automated PCAT segmentation achieved near-perfect performance, with R-squared values of 0.9998 for both the left anterior descending artery and the right coronary artery, indicating excellent agreement between predicted and actual mean PCAT HU values. Self-supervised pretraining notably enhanced model generalizability on external datasets, improving overall segmentation accuracy. Conclusions We demonstrate the potential of SSL to advance CCTA image analysis, enabling more accurate CAD diagnostics. Our findings highlight the robustness of SSL for automated coronary artery and PCAT segmentation, offering promising advancements in cardiovascular care.
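The PCAT readout at the end of this pipeline is a simple voxel statistic. The sketch below computes the mean HU of fat-window voxels (the commonly used -190 to -30 HU range) within a fixed radius of a coronary centerline; the fixed radius, spacing, and brute-force distance test are illustrative assumptions, as published protocols often tie the radius to the vessel diameter instead.

```python
import numpy as np

FAT_WINDOW = (-190, -30)  # HU range conventionally used to define PCAT

def mean_pcat_hu(ct_hu, centerline_mm, spacing_mm, radius_mm=3.0):
    """Mean HU of adipose-tissue voxels within radius_mm of the coronary
    centerline: voxel positions are tested against every centerline point,
    then restricted to the perivascular fat window."""
    zz, yy, xx = np.indices(ct_hu.shape)
    coords = np.stack([zz, yy, xx], axis=-1) * np.asarray(spacing_mm)
    near = np.zeros(ct_hu.shape, dtype=bool)
    for p in centerline_mm:                  # brute force; fine for a sketch
        near |= np.linalg.norm(coords - p, axis=-1) <= radius_mm
    fat = (ct_hu >= FAT_WINDOW[0]) & (ct_hu <= FAT_WINDOW[1])
    return float(ct_hu[near & fat].mean())

ct = np.random.randint(-200, 200, size=(20, 64, 64)).astype(float)
centerline = np.array([[5.0, 16.0, 16.0], [6.0, 17.0, 17.0]])  # points in mm
print(mean_pcat_hu(ct, centerline, spacing_mm=(1.0, 0.5, 0.5)))
```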
Affiliation(s)
- Justin N. Kim: Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
- Yingnan Song: Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
- Hao Wu: Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
- Ananya Subramaniam: Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
- Jihye Lee: Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
- Mohamed H. E. Makhlouf: Harrington Heart and Vascular Institute, University Hospitals Cleveland Medical Center, Cardiovascular Imaging Core Laboratory, Cleveland, Ohio, United States
- Neda S. Hassani: Harrington Heart and Vascular Institute, University Hospitals Cleveland Medical Center, Cardiovascular Imaging Core Laboratory, Cleveland, Ohio, United States
- Sadeer Al-Kindi: Houston Methodist, DeBakey Heart and Vascular Center, Houston, Texas, United States
- David L. Wilson: Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States; Case Western Reserve University, Department of Radiology, Cleveland, Ohio, United States
- Juhwan Lee: Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
30
Nguyen MTP, Phan Tran MK, Nakano T, Tran TH, Nguyen QDN. Partial Attention in Global Context and Local Interaction for Addressing Noisy Labels and Weighted Redundancies on Medical Images. SENSORS (BASEL, SWITZERLAND) 2024; 25:163. [PMID: 39796954 PMCID: PMC11722591 DOI: 10.3390/s25010163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/19/2024] [Revised: 12/22/2024] [Accepted: 12/27/2024] [Indexed: 01/13/2025]
Abstract
Recently, the application of deep neural networks to anomaly detection on medical images has been hindered by noisy labels, including overlapping objects and similar classes. This study addresses the challenge by proposing an attention module that helps deep neural networks focus on important object features under noisy medical image conditions. The module, named Global Context and Local Interaction (GCLI), integrates global context modeling to create long-range dependencies with local interactions that enable channel attention through a 1D convolution, which not only performs well with noisy labels but also consumes significantly fewer resources without any dimensionality reduction. We further propose a partial attention strategy for the GCLI module, aiming to efficiently reduce weighted redundancies: a subset of channels, rather than every single channel, is used to produce the attention weights, which greatly reduces the risk of weighted redundancies introduced by modeling global context. For classification, the proposed method assists ResNet34 in achieving up to 82.5% accuracy on the Chaoyang test set, the highest figure among the compared state-of-the-art attention modules, without using any processing filter to reduce the effect of noisy labels. For object detection, GCLI boosts YOLOv8 to 52.1% mAP50 on the GRAZPEDWRI-DX test set, the highest performance among the compared attention modules, and ranks second in mAP50 on the VinDR-CXR test set. In terms of model complexity, the GCLI module consumes up to 225 times fewer extra parameters and runs inference more than 30% faster than the other attention modules.
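A GCLI-like module can be approximated in a few lines: global average pooling supplies the global context, and a small 1D convolution models local cross-channel interaction without dimensionality reduction. The handling of the partial strategy below, re-weighting only a strided subset of channels and passing the rest through unchanged, is one plausible reading of the abstract, not the authors' exact design.

```python
import torch
import torch.nn as nn

class PartialChannelAttention(nn.Module):
    """Sketch of a GCLI-like module: global average pooling for global context,
    a k-sized 1D convolution for local cross-channel interaction, and attention
    weights computed only from a strided subset of channels."""
    def __init__(self, channels, k=3, subset_stride=2):
        super().__init__()
        self.idx = torch.arange(0, channels, subset_stride)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                       # x: (B, C, H, W)
        ctx = x.mean(dim=(2, 3))                # global context, (B, C)
        sub = ctx[:, self.idx].unsqueeze(1)     # (B, 1, C_subset)
        w_sub = torch.sigmoid(self.conv(sub)).squeeze(1)
        w = torch.ones_like(ctx)
        w[:, self.idx] = w_sub                  # only the subset is re-weighted
        return x * w.unsqueeze(-1).unsqueeze(-1)

feat = torch.randn(2, 64, 32, 32)
print(PartialChannelAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```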
Affiliation(s)
- Minh Tai Pham Nguyen: Faculty of Advanced Program, Ho Chi Minh City Open University, Ho Chi Minh City 700000, Vietnam
- Minh Khue Phan Tran: Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City 700000, Vietnam
- Tadashi Nakano: Department of Core Informatics, Graduate School of Informatics, Osaka Metropolitan University, Osaka 558-8585, Japan
- Thi Hong Tran: Department of Core Informatics, Graduate School of Informatics, Osaka Metropolitan University, Osaka 558-8585, Japan
- Quoc Duy Nam Nguyen: Department of Core Informatics, Graduate School of Informatics, Osaka Metropolitan University, Osaka 558-8585, Japan
31
Wu Y, Ramai D, Smith ER, Mega PF, Qatomah A, Spadaccini M, Maida M, Papaefthymiou A. Applications of Artificial Intelligence in Gastrointestinal Endoscopic Ultrasound: Current Developments, Limitations and Future Directions. Cancers (Basel) 2024; 16:4196. [PMID: 39766095 PMCID: PMC11674484 DOI: 10.3390/cancers16244196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2024] [Revised: 12/09/2024] [Accepted: 12/14/2024] [Indexed: 01/09/2025] Open
Abstract
Endoscopic ultrasound (EUS) effectively diagnoses malignant and pre-malignant gastrointestinal lesions. In the past few years, artificial intelligence (AI) has shown promising results in enhancing EUS sensitivity and accuracy, particularly for subepithelial lesions (SELs) like gastrointestinal stromal tumors (GISTs). Furthermore, AI models have shown high accuracy in predicting malignancy in gastric GISTs and distinguishing between benign and malignant intraductal papillary mucinous neoplasms (IPMNs). The utility of AI has also been applied to existing and emerging technologies involved in the performance and evaluation of EUS-guided biopsies. These advancements may improve training in EUS, allowing trainees to focus on technical skills and image interpretation. This review evaluates the current state of AI in EUS, covering imaging diagnosis, EUS-guided biopsies, training advancements, and applications in clinical practice, discussing both early feasibility studies and recent developments while addressing the limitations, pitfalls, and challenges.
Affiliation(s)
- Yizhong Wu: Department of Internal Medicine, Baylor Scott & White Round Rock Hospital, Round Rock, TX 78665, USA
- Daryl Ramai: Division of Gastroenterology, Hepatology and Endoscopy, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Eric R. Smith: Department of Internal Medicine, Baylor Scott & White Round Rock Hospital, Round Rock, TX 78665, USA
- Paulo F. Mega: Gastrointestinal Endoscopy Unit, Universidade de Sao Paulo Hospital das Clinicas, São Paulo 05403-010, Brazil
- Abdulrahman Qatomah: Division of Gastroenterology, Hepatology and Endoscopy, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Marco Spadaccini: Department of Endoscopy, Humanitas Research Hospital, 20089 Rozzano, Italy
- Marcello Maida: Department of Medicine and Surgery, School of Medicine and Surgery, University of Enna ‘Kore’, 94100 Enna, Italy
32
Wang Q, Xiong Y, Zhu H, Mu X, Zhang Y, Ma Y. Cervical OCT image classification using contrastive masked autoencoders with Swin Transformer. Comput Med Imaging Graph 2024; 118:102469. [PMID: 39577206 DOI: 10.1016/j.compmedimag.2024.102469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2024] [Revised: 10/24/2024] [Accepted: 11/10/2024] [Indexed: 11/24/2024]
Abstract
BACKGROUND AND OBJECTIVE Cervical cancer poses a major health threat to women globally. Optical coherence tomography (OCT) imaging has recently shown promise for non-invasive cervical lesion diagnosis. However, obtaining high-quality labeled cervical OCT images is challenging and time-consuming because they must correspond precisely with pathological results. The scarcity of such high-quality labeled data hinders the application of supervised deep-learning models in practical clinical settings. This study addresses this challenge by proposing CMSwin, a novel self-supervised learning (SSL) framework combining masked image modeling (MIM) with contrastive learning based on the Swin-Transformer architecture to utilize abundant unlabeled cervical OCT images. METHODS In this contrastive-MIM framework, mixed image encoding is combined with a latent contextual regressor to resolve the inconsistency between pre-training and fine-tuning and to separate the encoder's feature-extraction task from the decoder's reconstruction task, allowing the encoder to extract better image representations. In addition, contrastive losses at the patch and image levels are carefully designed to leverage massive unlabeled data. RESULTS We validated the superiority of CMSwin over state-of-the-art SSL approaches with five-fold cross-validation on an OCT image dataset containing 1,452 patients from a multi-center clinical study in China, plus two external validation sets from top-ranked Chinese hospitals: the Huaxi dataset from the West China Hospital of Sichuan University and the Xiangya dataset from the Xiangya Second Hospital of Central South University. A human-machine comparison experiment on the Huaxi and Xiangya datasets for volume-level binary classification also indicates that CMSwin can match or exceed the average level of four skilled medical experts, especially in identifying high-risk cervical lesions. CONCLUSION Our work has great potential to assist gynecologists in intelligently interpreting cervical OCT images in clinical settings. Additionally, the integrated Grad-CAM module of CMSwin enables cervical lesion visualization and interpretation, providing good interpretability for gynecologists to diagnose cervical diseases efficiently.
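The training objective described here combines a masked-reconstruction term with contrastive terms at two granularities. The sketch below shows that combination with a generic InfoNCE loss and random stand-in embeddings; the loss weights, shapes, and pairing scheme are assumptions, not CMSwin's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: each anchor should match its own positive against the batch."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature
    targets = torch.arange(a.size(0))
    return F.cross_entropy(logits, targets)

def contrastive_mim_loss(recon, target, mask, img_a, img_b, patch_a, patch_b,
                         w_img=1.0, w_patch=1.0):
    """Masked-token reconstruction plus image-level and patch-level
    contrastive terms; the weights are illustrative."""
    mim = ((recon - target) ** 2)[mask].mean()
    return mim + w_img * info_nce(img_a, img_b) + w_patch * info_nce(patch_a, patch_b)

B, N, D = 4, 49, 96
mask = torch.rand(B, N) < 0.5
loss = contrastive_mim_loss(torch.randn(B, N, D), torch.randn(B, N, D), mask,
                            torch.randn(B, D), torch.randn(B, D),
                            torch.randn(B * N, D), torch.randn(B * N, D))
print(loss.item())
```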
Affiliation(s)
- Qingbin Wang: School of Computer Science, Wuhan University, Wuhan, 430072, China
- Yuxuan Xiong: School of Computer Science, Wuhan University, Wuhan, 430072, China
- Hanfeng Zhu: School of Computer Science & Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, 430079, China
- Xuefeng Mu: Department of Obstetrics and Gynecology, Remin Hospital of Wuhan University, Wuhan, 430060, China
- Yan Zhang: Department of Obstetrics and Gynecology, Remin Hospital of Wuhan University, Wuhan, 430060, China
- Yutao Ma: School of Computer Science & Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, 430079, China
33
Wolf D, Payer T, Lisson CS, Lisson CG, Beer M, Götz M, Ropinski T. Less is More: Selective reduction of CT data for self-supervised pre-training of deep learning models with contrastive learning improves downstream classification performance. Comput Biol Med 2024; 183:109242. [PMID: 39388839 DOI: 10.1016/j.compbiomed.2024.109242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Revised: 09/30/2024] [Accepted: 10/01/2024] [Indexed: 10/12/2024]
Abstract
BACKGROUND Self-supervised pre-training of deep learning models with contrastive learning is a widely used technique in image analysis. Current findings indicate a strong potential for contrastive pre-training on medical images. However, further research is necessary to incorporate the particular characteristics of these images. METHOD We hypothesize that the similarity of medical images hinders the success of contrastive learning in the medical imaging domain. To this end, we investigate different strategies based on deep embedding, information theory, and hashing in order to identify and reduce redundancy in medical pre-training datasets. The effect of these different reduction strategies on contrastive learning is evaluated on two pre-training datasets and several downstream classification tasks. RESULTS In all of our experiments, dataset reduction leads to a considerable performance gain in downstream tasks, e.g., an AUC score improvement from 0.78 to 0.83 for the COVID CT Classification Grand Challenge, 0.97 to 0.98 for the OrganSMNIST Classification Challenge and 0.73 to 0.83 for a brain hemorrhage classification task. Furthermore, pre-training is up to nine times faster due to the dataset reduction. CONCLUSIONS In conclusion, the proposed approach highlights the importance of dataset quality and provides a transferable approach to improve contrastive pre-training for classification downstream tasks on medical images.
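Of the reduction strategies investigated, the embedding-based one is the simplest to sketch: embed every scan, then greedily keep a scan only if it is not too similar to anything already kept. The cosine threshold and greedy order below are assumptions; the study also evaluates information-theoretic and hashing-based criteria, which are not shown.

```python
import numpy as np

def reduce_redundancy(embeddings, threshold=0.95):
    """Greedy near-duplicate filtering: keep a scan only if its cosine
    similarity to every already-kept scan stays below the threshold."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, e in enumerate(emb):
        if all(float(e @ emb[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Stand-in deep embeddings of CT slices; a duplicate is injected deliberately.
rng = np.random.default_rng(1)
emb = rng.normal(size=(200, 128))
emb[50] = emb[10] + 1e-3          # a near-duplicate slice
kept = reduce_redundancy(emb, threshold=0.9)
print(f"kept {len(kept)} of {len(emb)} scans for contrastive pre-training")
```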
Affiliation(s)
- Daniel Wolf: Visual Computing Research Group, Institute of Media Informatics, Ulm University, James-Franck-Ring, Ulm, 89081, Germany; Experimental Radiology Research Group, Department for Diagnostic and Interventional Radiology, Ulm University Medical Center, Albert Einstein Allee, Ulm, 89081, Germany
- Tristan Payer: Visual Computing Research Group, Institute of Media Informatics, Ulm University, James-Franck-Ring, Ulm, 89081, Germany
- Catharina Silvia Lisson: Experimental Radiology Research Group, Department for Diagnostic and Interventional Radiology, Ulm University Medical Center, Albert Einstein Allee, Ulm, 89081, Germany
- Christoph Gerhard Lisson: Experimental Radiology Research Group, Department for Diagnostic and Interventional Radiology, Ulm University Medical Center, Albert Einstein Allee, Ulm, 89081, Germany
- Meinrad Beer: Experimental Radiology Research Group, Department for Diagnostic and Interventional Radiology, Ulm University Medical Center, Albert Einstein Allee, Ulm, 89081, Germany
- Michael Götz: Experimental Radiology Research Group, Department for Diagnostic and Interventional Radiology, Ulm University Medical Center, Albert Einstein Allee, Ulm, 89081, Germany
- Timo Ropinski: Visual Computing Research Group, Institute of Media Informatics, Ulm University, James-Franck-Ring, Ulm, 89081, Germany
Collapse
|
34
|
Zanini LGK, Rubira-Bullen IRF, Nunes FDLDS. Enhancing dental caries classification in CBCT images by using image processing and self-supervised learning. Comput Biol Med 2024; 183:109221. [PMID: 39378579 DOI: 10.1016/j.compbiomed.2024.109221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2024] [Revised: 09/21/2024] [Accepted: 09/26/2024] [Indexed: 10/10/2024]
Abstract
Diagnosing dental caries poses a significant challenge in dentistry, necessitating precise and early detection for effective management. This study utilizes Self-Supervised Learning (SSL) tasks to improve the classification of dental caries in Cone Beam Computed Tomography (CBCT) images, employing the International Caries Detection and Assessment System (ICDAS). Faced with the challenge of scarce annotated medical images, our research employs SSL to utilize unlabeled data, thereby improving model performance. We have developed a pipeline incorporating unlabeled data extraction from CBCT exams and subsequent model training using SSL tasks. A distinctive aspect of our approach is the integration of image processing techniques with SSL tasks, along with an analysis of how much unlabeled data is needed. Our research aims to identify the most effective image processing techniques for data extraction, the most efficient deep learning architectures for caries classification, the impact of unlabeled dataset sizes on model performance, and the comparative effectiveness of different SSL approaches in this domain. Among the tested architectures, ResNet-18, combined with the SimCLR task, demonstrated an average macro F1-score of 88.42%, macro precision of 90.44%, and macro sensitivity of 86.67%, a 5.5% increase in F1-score compared to models using the deep learning architecture alone. These results suggest that SSL can significantly enhance the accuracy and efficiency of caries classification in CBCT images.
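A minimal sketch of pairing a ResNet-18 backbone with a SimCLR-style projection head, the configuration the abstract reports performing best; `torchvision` is assumed available, and the projection sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SimCLREncoder(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = resnet18(weights=None)
        feat_dim = backbone.fc.in_features        # 512 for ResNet-18
        backbone.fc = nn.Identity()               # drop the supervised classifier
        self.backbone = backbone
        self.projector = nn.Sequential(           # MLP head used only in pre-training
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, proj_dim))
    def forward(self, x):
        h = self.backbone(x)                      # representation kept for fine-tuning
        z = self.projector(h)                     # embedding fed to the contrastive loss
        return h, z

model = SimCLREncoder()
h, z = model(torch.randn(4, 3, 224, 224))
print(h.shape, z.shape)  # torch.Size([4, 512]) torch.Size([4, 128])
```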
Collapse
Affiliation(s)
- Luiz Guilherme Kasputis Zanini
- Polytechnic School, University of São Paulo, Av. Prof. Luciano Gualberto, 158 - Butantã, São Paulo, 05089030, São Paulo, Brazil.
| | | | | |
Collapse
|
35
|
Zhang Z, Gao Q, Fang D, Mijit A, Chen L, Li W, Wei Y. Effective automatic classification methods via deep learning for myopic maculopathy. Front Med (Lausanne) 2024; 11:1492808. [PMID: 39606624 PMCID: PMC11598530 DOI: 10.3389/fmed.2024.1492808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2024] [Accepted: 10/28/2024] [Indexed: 11/29/2024] Open
Abstract
Background Pathologic myopia (PM) associated with myopic maculopathy (MM) is a significant cause of visual impairment, especially in East Asia, where its prevalence has surged. Early detection and accurate classification of myopia-related fundus lesions are critical for managing PM. Traditional clinical analysis of fundus images is time-consuming and dependent on specialist expertise, driving the need for automated, accurate diagnostic tools. Methods This study developed a deep learning-based system for classifying five types of MM using color fundus photographs. Five architectures were utilized: ResNet50, EfficientNet-B0, Vision Transformer (ViT), Contrastive Language-Image Pre-Training (CLIP), and RETFound. An ensemble learning approach with weighted voting was employed to enhance model performance. The models were trained on a dataset of 2,159 annotated images from Shenzhen Eye Hospital, with performance evaluated using accuracy, sensitivity, specificity, F1-Score, Cohen's Kappa, and area under the receiver operating characteristic curve (AUC). Results The ensemble model achieved superior performance across all metrics, with an accuracy of 95.4% (95% CI: 93.0-97.0%), sensitivity of 95.4% (95% CI: 86.8-97.5%), specificity of 98.9% (95% CI: 97.1-99.5%), F1-Score of 95.3% (95% CI: 93.2-97.2%), Kappa value of 0.976 (95% CI: 0.957-0.989), and AUC of 0.995 (95% CI: 0.992-0.998). The voting ensemble method demonstrated robustness and high generalization ability in classifying complex lesions, outperforming individual models. Conclusion The ensemble deep learning system significantly enhances the accuracy and reliability of MM classification. This system holds potential for assisting ophthalmologists in early detection and precise diagnosis, thereby improving patient outcomes. Future work could focus on expanding the dataset, incorporating image quality assessment, and optimizing the ensemble algorithm for better efficiency and broader applicability.
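A small sketch of weighted soft voting over per-model class probabilities, the ensemble scheme the abstract describes; the weights and probability arrays are illustrative stand-ins:

```python
import numpy as np

def weighted_vote(probs_per_model, weights):
    """probs_per_model: (n_models, n_samples, n_classes) predicted probabilities."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize model weights
    fused = np.tensordot(w, probs_per_model, axes=1)  # (n_samples, n_classes)
    return fused.argmax(axis=1)

rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(5), size=(3, 4))        # 3 models, 4 samples, 5 MM classes
labels = weighted_vote(probs, weights=[0.5, 0.3, 0.2])
print(labels)
```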
Collapse
Affiliation(s)
- Zheming Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Qi Gao
- School of Future Technology, South China University of Technology, Guangzhou, China
- Pazhou Lab, Guangzhou, China
| | - Dong Fang
- Shenzhen Eye Hospital, Jinan University, Shenzhen, China
| | - Alfira Mijit
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Lu Chen
- Shenzhen Eye Hospital, Jinan University, Shenzhen, China
| | - Wangting Li
- Shenzhen Eye Hospital, Jinan University, Shenzhen, China
| | - Yantao Wei
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| |
Collapse
|
36
|
Lenskjold A, Brejnebøl MW, Rose MH, Gudbergsen H, Chaudhari A, Troelsen A, Moller A, Nybing JU, Boesen M. Artificial intelligence tools trained on human-labeled data reflect human biases: a case study in a large clinical consecutive knee osteoarthritis cohort. Sci Rep 2024; 14:26782. [PMID: 39500908 PMCID: PMC11538298 DOI: 10.1038/s41598-024-75752-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2024] [Accepted: 10/08/2024] [Indexed: 11/08/2024] Open
Abstract
Humans have been shown to have biases when reading medical images, raising questions about whether humans are uniform in their disease gradings. Artificial intelligence (AI) tools trained on human-labeled data may have inherent human non-uniformity. In this study, we used a radiographic knee osteoarthritis external validation dataset of 50 patients and a six-year retrospective consecutive clinical cohort of 8,273 patients. An FDA-approved and CE-marked AI tool was tested for potential non-uniformity in Kellgren-Lawrence grades between the right and left sides of the images. We flipped the images horizontally so that a left knee looked like a right knee and vice versa. According to human review, the AI tool showed non-uniformity with 20-22% disagreements on the external validation dataset and 13.6% on the cohort. However, we found no evidence of a significant difference in accuracy compared to senior radiologists on the external validation dataset, nor of age or sex bias on the cohort. AI non-uniformity can inflate the performance evaluated against humans, but image areas with inferior performance should be investigated.
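A minimal sketch of the left/right uniformity probe the abstract describes: compare a model's grades on each image and on its horizontal mirror. The `model` here is a toy stand-in, not the FDA-approved tool:

```python
import torch

def flip_disagreement(model, images):
    """Fraction of images whose predicted KL grade changes under mirroring."""
    with torch.no_grad():
        g_orig = model(images).argmax(dim=1)
        g_flip = model(torch.flip(images, dims=[-1])).argmax(dim=1)  # mirror left-right
    return (g_orig != g_flip).float().mean().item()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 5))  # toy grader
images = torch.randn(32, 1, 64, 64)              # stand-in knee radiographs
print(f"disagreement rate: {flip_disagreement(model, images):.1%}")
```

A perfectly uniform grader would score 0% here; the abstract reports 13.6-22% for the tool under test.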
Collapse
Affiliation(s)
- Anders Lenskjold
- Department of Radiology, Copenhagen University Hospital Bispebjerg-Frederiksberg, Copenhagen, Denmark.
- Radiological Artificial Intelligence Testcenter, Copenhagen, Denmark.
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark.
| | - Mathias W Brejnebøl
- Department of Radiology, Copenhagen University Hospital Bispebjerg-Frederiksberg, Copenhagen, Denmark
- Radiological Artificial Intelligence Testcenter, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Martin H Rose
- Center for Surgical Science, Zealand University Hospital, Køge, Denmark
| | - Henrik Gudbergsen
- The Parker Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Public Health, Center for General Practice, University of Copenhagen, Copenhagen, Denmark
| | - Akshay Chaudhari
- Department of Radiology, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Anders Troelsen
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Department of Orthopaedic Surgery, Copenhagen University Hospital Hvidovre & CAG, ROAD - Research OsteoArthritis, Hvidovre, Denmark
| | - Anne Moller
- Department of Public Health, Center for General Practice, University of Copenhagen, Copenhagen, Denmark
- Primary Health Care Research Unit, Region Zealand, Denmark
| | - Janus U Nybing
- Department of Radiology, Copenhagen University Hospital Bispebjerg-Frederiksberg, Copenhagen, Denmark
- Radiological Artificial Intelligence Testcenter, Copenhagen, Denmark
| | - Mikael Boesen
- Department of Radiology, Copenhagen University Hospital Bispebjerg-Frederiksberg, Copenhagen, Denmark
- Radiological Artificial Intelligence Testcenter, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| |
Collapse
|
37
|
Sükei E, Rumetshofer E, Schmidinger N, Mayr A, Schmidt-Erfurth U, Klambauer G, Bogunović H. Multi-modal representation learning in retinal imaging using self-supervised learning for enhanced clinical predictions. Sci Rep 2024; 14:26802. [PMID: 39500979 PMCID: PMC11538269 DOI: 10.1038/s41598-024-78515-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2024] [Accepted: 10/31/2024] [Indexed: 11/08/2024] Open
Abstract
Self-supervised learning has become the cornerstone of building generalizable and transferable artificial intelligence systems in medical imaging. In particular, contrastive representation learning techniques trained on large multi-modal datasets have demonstrated impressive capabilities of producing highly transferable representations for different downstream tasks. In ophthalmology, large multi-modal datasets are abundantly available and conveniently accessible as modern retinal imaging scanners acquire both 2D fundus images and 3D optical coherence tomography (OCT) scans to assess the eye. In this context, we introduce a novel multi-modal contrastive learning-based pipeline to facilitate learning joint representations for the two retinal imaging modalities. After self-supervised pre-training on 153,306 scan pairs, we show that such a pre-training framework can provide both a retrieval system and encoders that produce comprehensive OCT and fundus image representations that generalize well for various downstream tasks on three independent external datasets, explicitly focusing on clinically pertinent prediction tasks. In addition, we show that interchanging OCT with lower-cost fundus imaging can preserve the predictive power of the trained models.
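A minimal sketch, under assumptions, of the symmetric paired-modality contrastive objective such pipelines typically rely on (CLIP-style InfoNCE over matched fundus/OCT pairs); the embeddings are stand-ins for encoder outputs and `tau` is an illustrative temperature:

```python
import torch
import torch.nn.functional as F

def paired_contrastive_loss(z_fundus, z_oct, tau=0.07):
    """Matching fundus/OCT pairs sit on the diagonal of the similarity matrix."""
    zf = F.normalize(z_fundus, dim=1)
    zo = F.normalize(z_oct, dim=1)
    logits = (zf @ zo.t()) / tau
    targets = torch.arange(zf.size(0))
    # symmetric loss: fundus->OCT retrieval and OCT->fundus retrieval
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

z_fundus = torch.randn(16, 256, requires_grad=True)   # stand-in encoder outputs
z_oct = torch.randn(16, 256, requires_grad=True)
paired_contrastive_loss(z_fundus, z_oct).backward()
```

The same similarity matrix doubles as the retrieval system the abstract mentions: nearest neighbors across the diagonal retrieve the matching scan of the other modality.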
Collapse
Affiliation(s)
- Emese Sükei
- OPTIMA Lab, Department of Ophthalmology and Optometry, Medical University of Vienna, Vienna, Austria.
| | - Elisabeth Rumetshofer
- LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria
| | - Niklas Schmidinger
- LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria
| | - Andreas Mayr
- LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria
| | - Ursula Schmidt-Erfurth
- OPTIMA Lab, Department of Ophthalmology and Optometry, Medical University of Vienna, Vienna, Austria
| | - Günter Klambauer
- LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria
| | - Hrvoje Bogunović
- OPTIMA Lab, Department of Ophthalmology and Optometry, Medical University of Vienna, Vienna, Austria.
- Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Vienna, Austria.
| |
Collapse
|
38
|
Busayakanon S, Kaewthamasorn M, Pinetsuksai N, Tongloy T, Chuwongin S, Boonsang S, Kittichai V. Identification of veterinary and medically important blood parasites using contrastive loss-based self-supervised learning. Vet World 2024; 17:2619-2634. [PMID: 39829660 PMCID: PMC11736362 DOI: 10.14202/vetworld.2024.2619-2634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2024] [Accepted: 10/15/2024] [Indexed: 01/22/2025] Open
Abstract
Background and Aim Zoonotic diseases caused by various blood parasites are important public health concerns that impact animals and humans worldwide. The traditional method of microscopic examination for parasite diagnosis is labor-intensive, time-consuming, and prone to variability among observers, necessitating highly skilled and experienced personnel. Therefore, an innovative approach is required to enhance the conventional method. This study aimed to develop a self-supervised learning (SSL) approach to identify zoonotic blood parasites from microscopic images, with an initial focus on parasite species classification. Materials and Methods We acquired a public dataset featuring microscopic images of Giemsa-stained thin blood films of trypanosomes and other blood parasites, including Babesia, Leishmania, Plasmodium, Toxoplasma, and Trichomonad, as well as images of both white and red blood cells. The input data were subjected to SSL model training using the Bootstrap Your Own Latent (BYOL) algorithm with Residual Network 50 (ResNet50), ResNet101, and ResNet152 as the backbones. The performance of the proposed SSL model was then compared to that of baseline models. Results The proposed BYOL SSL model outperformed supervised learning models across all classes. Among the SSL models, ResNet50 consistently achieved high accuracy, reaching 0.992 in most classes, which aligns well with the patterns observed in the pre-trained uniform manifold approximation and projection representations. Fine-tuned SSL models exhibited high performance, achieving 95% accuracy and an area under the receiver operating characteristic (ROC) curve of 0.960 even when fine-tuned with 1% of the data in the downstream process. Furthermore, training the SSL models with 20% of the data yielded ≥95% on all other statistical metrics, including accuracy, recall, precision, specificity, F1 score, and area under the ROC curve. As a result, multi-class classification prediction demonstrated that model performance exceeded 91% for the F1 score, except for the early stage of Trypanosoma evansi, which showed an F1 score of 87%. This may be due to the model being exposed to high levels of variation during the developmental stage. Conclusion This approach can significantly enhance active surveillance efforts to improve disease control and prevent outbreaks, particularly in resource-limited settings. In addition, SSL addresses significant challenges, such as data variability and the requirement for extensive class labeling, which are common in biology and medical fields.
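A minimal sketch of the core BYOL mechanics the abstract relies on: an online network predicts the target network's projection, and the target is an exponential moving average (EMA) of the online weights. The toy MLPs below stand in for the ResNet backbones, and the momentum value is an illustrative default:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

online = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
predictor = nn.Linear(32, 32)                  # only the online branch has a predictor
target = copy.deepcopy(online)
for p in target.parameters():
    p.requires_grad_(False)                    # target is updated by EMA, not gradients

def byol_loss(v1, v2):
    p = F.normalize(predictor(online(v1)), dim=1)
    with torch.no_grad():
        t = F.normalize(target(v2), dim=1)
    return (2 - 2 * (p * t).sum(dim=1)).mean() # equivalent to normalized MSE

@torch.no_grad()
def ema_update(m=0.996):
    for po, pt in zip(online.parameters(), target.parameters()):
        pt.mul_(m).add_((1 - m) * po)

v1, v2 = torch.randn(8, 128), torch.randn(8, 128)  # two augmented views
loss = byol_loss(v1, v2) + byol_loss(v2, v1)       # symmetrized loss
loss.backward()
ema_update()
```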
Collapse
Affiliation(s)
- Supasuta Busayakanon
- Faculty of Medicine, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
| | - Morakot Kaewthamasorn
- Department of Pathology, Center of Excellence in Veterinary Parasitology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok 10330, Thailand
| | - Natchapon Pinetsuksai
- College of Advanced Manufacturing Innovation, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
| | - Teerawat Tongloy
- College of Advanced Manufacturing Innovation, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
| | - Santhad Chuwongin
- College of Advanced Manufacturing Innovation, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
| | - Siridech Boonsang
- Department of Electrical Engineering, School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
| | - Veerayuth Kittichai
- Faculty of Medicine, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
| |
Collapse
|
39
|
Zeng X, Abdullah N, Sumari P. Self-supervised learning framework application for medical image analysis: a review and summary. Biomed Eng Online 2024; 23:107. [PMID: 39465395 PMCID: PMC11514943 DOI: 10.1186/s12938-024-01299-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2024] [Accepted: 10/17/2024] [Indexed: 10/29/2024] Open
Abstract
Manual annotation of medical image datasets is labor-intensive and prone to biases. Moreover, the rate at which image data accumulates significantly outpaces the speed of manual annotation, posing a challenge to the advancement of machine learning, particularly in the realm of supervised learning. Self-supervised learning is an emerging field that capitalizes on unlabeled data for training, thereby circumventing the need for extensive manual labeling. This learning paradigm generates synthetic pseudo-labels through pretext tasks, compelling the network to acquire image representations in a pseudo-supervised manner and subsequently fine-tuning with a limited set of annotated data to achieve enhanced performance. This review begins with an overview of prevalent types and advancements in self-supervised learning, followed by an exhaustive and systematic examination of methodologies within the medical imaging domain from 2018 to September 2024. The review encompasses a range of medical image modalities, including CT, MRI, X-ray, Histology, and Ultrasound. It addresses specific tasks, such as Classification, Localization, Segmentation, Reduction of False Positives, Improvement of Model Performance, and Enhancement of Image Quality. The analysis reveals a descending order in the volume of related studies, with CT and MRI leading the list, followed by X-ray, Histology, and Ultrasound. Except for CT and MRI, there is a greater prevalence of studies focusing on contrastive learning methods over generative learning approaches. The performance of MRI/Ultrasound classification and all image types segmentation still has room for further exploration. Generally, this review can provide conceptual guidance for medical professionals to combine self-supervised learning with their research.
Collapse
Affiliation(s)
- Xiangrui Zeng
- School of Computer Sciences, Universiti Sains Malaysia, USM, 11800, Pulau Pinang, Malaysia.
| | - Nibras Abdullah
- Faculty of Computer Studies, Arab Open University, Jeddah, Saudi Arabia.
| | - Putra Sumari
- School of Computer Sciences, Universiti Sains Malaysia, USM, 11800, Pulau Pinang, Malaysia
| |
Collapse
|
40
|
Guo B, Chen Y, Lin J, Huang B, Bai X, Guo C, Gao B, Gong Q, Bai X. Self-supervised learning for accurately modelling hierarchical evolutionary patterns of cerebrovasculature. Nat Commun 2024; 15:9235. [PMID: 39455566 PMCID: PMC11511858 DOI: 10.1038/s41467-024-53550-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Accepted: 10/16/2024] [Indexed: 10/28/2024] Open
Abstract
Cerebrovascular abnormalities are critical indicators of stroke and neurodegenerative diseases like Alzheimer's disease (AD). Understanding the normal evolution of brain vessels is essential for detecting early deviations and enabling timely interventions. Here, for the first time, we proposed a pipeline exploring the joint evolution of cortical volumes (CVs) and arterial volumes (AVs) in a large cohort of 2841 individuals. Using advanced deep learning for vessel segmentation, we built normative models of CVs and AVs across spatially hierarchical brain regions. We found that while AVs generally decline with age, distinct trends appear in regions like the circle of Willis. Comparing healthy individuals with those affected by AD or stroke, we identified significant reductions in both CVs and AVs, with patients with AD showing the most severe impact. Our findings reveal gender-specific effects and provide critical insights into how these conditions alter brain structure, potentially guiding future clinical assessments and interventions.
Collapse
Affiliation(s)
- Bin Guo
- Xiamen Key Laboratory of Psychoradiology and Neuromodulation, Department of Radiology, West China Xiamen Hospital of Sichuan University, Xiamen, China
- Image Processing Center, Beihang University, Beijing, China
| | - Ying Chen
- Image Processing Center, Beihang University, Beijing, China
- Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, Munich, Germany
| | - Jinping Lin
- Xiamen Key Laboratory of Psychoradiology and Neuromodulation, Department of Radiology, West China Xiamen Hospital of Sichuan University, Xiamen, China
| | - Bin Huang
- Department of Radiology, Affiliated Hospital of Guizhou Medical University, Guizhou, China
| | - Xiangzhuo Bai
- Zhongxiang Hospital of Traditional Chinese Medicine, Hubei, China
| | | | - Bo Gao
- Department of Radiology, Affiliated Hospital of Guizhou Medical University, Guizhou, China
| | - Qiyong Gong
- Xiamen Key Laboratory of Psychoradiology and Neuromodulation, Department of Radiology, West China Xiamen Hospital of Sichuan University, Xiamen, China.
- Functional and Molecular Imaging Key Laboratory of Sichuan Province, West China Hospital of Sichuan University, Chengdu, China.
- Huaxi MR Research Center (HMRRC), Department of Radiology, West China Hospital of Sichuan University, Chengdu, China.
- Research Unit of Psychoradiology, Chinese Academy of Medical Sciences, Chengdu, China.
| | - Xiangzhi Bai
- Image Processing Center, Beihang University, Beijing, China.
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China.
- Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing, China.
| |
Collapse
|
41
|
Wang Y, Ni H, Zhou J, Liu L, Lin J, Yin M, Gao J, Zhu S, Yin Q, Zhu J, Li R. A Semi-Supervised Learning Framework for Classifying Colorectal Neoplasia Based on the NICE Classification. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:2342-2353. [PMID: 38653910 PMCID: PMC11522217 DOI: 10.1007/s10278-024-01123-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/21/2024] [Revised: 04/02/2024] [Accepted: 04/12/2024] [Indexed: 04/25/2024]
Abstract
Labelling medical images is an arduous and costly task that necessitates clinical expertise and large numbers of qualified images. Insufficient samples can lead to underfitting during training and poor performance of supervised learning models. In this study, we aim to develop a SimCLR-based semi-supervised learning framework to classify colorectal neoplasia based on the NICE classification. First, the proposed framework was trained under self-supervised learning using a large unlabelled dataset; subsequently, it was fine-tuned on a limited labelled dataset based on the NICE classification. The model was evaluated on an independent dataset and compared with models based on supervised transfer learning and endoscopists using accuracy, Matthews correlation coefficient (MCC), and Cohen's kappa. Finally, Grad-CAM and t-SNE were applied to visualize the models' interpretations. A ResNet-backboned SimCLR model (accuracy of 0.908, MCC of 0.862, and Cohen's kappa of 0.896) outperformed supervised transfer learning-based models (means: 0.803, 0.698, and 0.742) and junior endoscopists (0.816, 0.724, and 0.863), while performing only slightly worse than senior endoscopists (0.916, 0.875, and 0.944). Moreover, t-SNE showed a better clustering of ternary samples through self-supervised learning in SimCLR than through supervised transfer learning. Compared with traditional supervised learning, semi-supervised learning enables deep learning models to achieve improved performance with limited labelled endoscopic images.
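For reference, a small sketch of how the reported agreement metrics can be computed with scikit-learn; the ternary labels below are toy stand-ins for NICE types 1-3, not study data:

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef, cohen_kappa_score

y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3, 2]   # toy NICE-type ground truth
y_pred = [1, 2, 3, 1, 2, 1, 1, 2, 3, 2]   # toy model predictions

print("accuracy:", accuracy_score(y_true, y_pred))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))
print("kappa:   ", cohen_kappa_score(y_true, y_pred))
```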
Collapse
Affiliation(s)
- Yu Wang
- Department of Hepatobiliary Surgery, Jintan Affiliated Hospital of Jiangsu University, Changzhou, Jiangsu, 213200, China
| | - Haoxiang Ni
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, # 899 Pinghai St., Suzhou, Jiangsu, 215006, China
- Suzhou Clinical Center of Digestive Disease, Suzhou, Jiangsu, 215006, China
| | - Jielu Zhou
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, # 899 Pinghai St., Suzhou, Jiangsu, 215006, China
- Department of Geriatrics, Kowloon Affiliated Hospital of Shanghai Jiao Tong University, Suzhou, Jiangsu, 215006, China
| | - Lihe Liu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, # 899 Pinghai St., Suzhou, Jiangsu, 215006, China
- Suzhou Clinical Center of Digestive Disease, Suzhou, Jiangsu, 215006, China
| | - Jiaxi Lin
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, # 899 Pinghai St., Suzhou, Jiangsu, 215006, China
- Suzhou Clinical Center of Digestive Disease, Suzhou, Jiangsu, 215006, China
| | - Minyue Yin
- Department of Gastroenterology, Beijing Friendship Hospital, Capital Medical University, Beijing, 100050, China
- National Clinical Research Center for Digestive Disease, Beijing Digestive Disease Center, State Key Laboratory of Digestive Health, Beijing, 100050, China
| | - Jingwen Gao
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, # 899 Pinghai St., Suzhou, Jiangsu, 215006, China
- Suzhou Clinical Center of Digestive Disease, Suzhou, Jiangsu, 215006, China
| | - Shiqi Zhu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, # 899 Pinghai St., Suzhou, Jiangsu, 215006, China
- Suzhou Clinical Center of Digestive Disease, Suzhou, Jiangsu, 215006, China
| | - Qi Yin
- Department of Anesthesiology, Jintan Affiliated Hospital of Jiangsu University, Changzhou, Jiangsu, 213200, China
| | - Jinzhou Zhu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, # 899 Pinghai St., Suzhou, Jiangsu, 215006, China.
- Suzhou Clinical Center of Digestive Disease, Suzhou, Jiangsu, 215006, China.
- Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, China.
| | - Rui Li
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, # 899 Pinghai St., Suzhou, Jiangsu, 215006, China.
- Suzhou Clinical Center of Digestive Disease, Suzhou, Jiangsu, 215006, China.
| |
Collapse
|
42
|
Chen M, Zhang M, Yin L, Ma L, Ding R, Zheng T, Yue Q, Lui S, Sun H. Medical image foundation models in assisting diagnosis of brain tumors: a pilot study. Eur Radiol 2024; 34:6667-6679. [PMID: 38627290 DOI: 10.1007/s00330-024-10728-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2023] [Revised: 02/08/2024] [Accepted: 03/04/2024] [Indexed: 04/23/2024]
Abstract
OBJECTIVES To build self-supervised foundation models for multicontrast MRI of the whole brain and evaluate their efficacy in assisting diagnosis of brain tumors. METHODS In this retrospective study, foundation models were developed using 57,621 enhanced head MRI scans through self-supervised learning with a pretext task of cross-contrast context restoration with two different content dropout schemes. Downstream classifiers were constructed based on the pretrained foundation models and fine-tuned for brain tumor detection, discrimination, and molecular status prediction. Metrics including accuracy, sensitivity, specificity, and area under the ROC curve (AUC) were used to evaluate the performance. Convolutional neural networks trained exclusively on downstream task data were employed for comparative analysis. RESULTS The pretrained foundation models demonstrated their ability to extract effective representations from multicontrast whole-brain volumes. The best classifiers, endowed with pretrained weights, showed remarkable performance with accuracies of 94.9, 92.3, and 80.4%, and corresponding AUC values of 0.981, 0.972, and 0.852 on independent test datasets in brain tumor detection, discrimination, and molecular status prediction, respectively. The classifiers with pretrained weights outperformed the convolutional classifiers trained from scratch by approximately 10% in terms of accuracy and AUC across all tasks. The saliency regions in the correctly predicted cases are mainly clustered around the tumors. Classifiers derived from the two dropout schemes differed significantly only in the detection of brain tumors. CONCLUSIONS Foundation models obtained from self-supervised learning have demonstrated encouraging potential for scalability and interpretability in downstream brain tumor-related tasks and hold promise for extension to neurological diseases with diffusely distributed lesions. CLINICAL RELEVANCE STATEMENT The application of our proposed method to the prediction of key molecular status in gliomas is expected to improve treatment planning and patient outcomes. Additionally, the foundation model we developed could serve as a cornerstone for advancing AI applications in the diagnosis of brain-related diseases.
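A speculative sketch of a cross-contrast context-restoration pretext task consistent with the abstract's description: regions are dropped from one MRI contrast and restored with the help of an intact companion contrast. The network, shapes, and block-wise dropout scheme are assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

restorer = nn.Sequential(                      # takes [corrupted contrast A, intact contrast B]
    nn.Conv3d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, 3, padding=1))

t1 = torch.randn(2, 1, 16, 32, 32)             # stand-in contrast volumes
t2 = torch.randn(2, 1, 16, 32, 32)
keep = (torch.rand(2, 1, 4, 8, 8) > 0.5).float()
keep = F.interpolate(keep, size=t1.shape[2:], mode='nearest')  # block-wise dropout
corrupted = t1 * keep
restored = restorer(torch.cat([corrupted, t2], dim=1))
loss = F.mse_loss(restored * (1 - keep), t1 * (1 - keep))  # restore dropped content
loss.backward()
```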
Collapse
Affiliation(s)
- Mengyao Chen
- Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Huaxi MR Research Center (HMRRC), West China Hospital of Sichuan University, Chengdu, China
| | | | - Lijuan Yin
- Department of Pathology, West China Hospital of Sichuan University, Chengdu, China
| | - Lu Ma
- Department of Neurosurgery, West China Hospital of Sichuan University, Chengdu, China
| | - Renxing Ding
- IT center, West China Hospital of Sichuan University, Chengdu, China
| | - Tao Zheng
- IT center, West China Hospital of Sichuan University, Chengdu, China
| | - Qiang Yue
- Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Huaxi MR Research Center (HMRRC), West China Hospital of Sichuan University, Chengdu, China
| | - Su Lui
- Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Huaxi MR Research Center (HMRRC), West China Hospital of Sichuan University, Chengdu, China
| | - Huaiqiang Sun
- Department of Radiology, West China Hospital of Sichuan University, Chengdu, China.
- Huaxi MR Research Center (HMRRC), West China Hospital of Sichuan University, Chengdu, China.
| |
Collapse
|
43
|
Kudus K, Wagner M, Ertl-Wagner BB, Khalvati F. Applications of machine learning to MR imaging of pediatric low-grade gliomas. Childs Nerv Syst 2024; 40:3027-3035. [PMID: 38972953 DOI: 10.1007/s00381-024-06522-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/25/2024] [Accepted: 06/21/2024] [Indexed: 07/09/2024]
Abstract
INTRODUCTION Machine learning (ML) shows promise for the automation of routine tasks related to the treatment of pediatric low-grade gliomas (pLGG) such as tumor grading, typing, and segmentation. Moreover, it has been shown that ML can identify crucial information from medical images that is otherwise currently unattainable. For example, ML appears to be capable of preoperatively identifying the underlying genetic status of pLGG. METHODS In this chapter, we reviewed, to the best of our knowledge, all published works that have used ML techniques for the imaging-based evaluation of pLGGs. Additionally, we aimed to provide some context on what it will take to go from the exploratory studies we reviewed to clinically deployed models. RESULTS Multiple studies have demonstrated that ML can accurately grade, type, and segment and detect the genetic status of pLGGs. We compared the approaches used between the different studies and observed a high degree of variability throughout the methodologies. Standardization and cooperation between the numerous groups working on these approaches will be key to accelerating the clinical deployment of these models. CONCLUSION The studies reviewed in this chapter detail the potential for ML techniques to transform the treatment of pLGG. However, there are still challenges that need to be overcome prior to clinical deployment.
Collapse
Affiliation(s)
- Kareem Kudus
- Neurosciences & Mental Health Research Program, The Hospital for Sick Children, Toronto, Canada
- Institute of Medical Science, University of Toronto, Toronto, Canada
| | - Matthias Wagner
- Department of Diagnostic & Interventional Radiology, The Hospital for Sick Children, Toronto, Canada
- Department of Diagnostic and Interventional Neuroradiology, University Hospital Augsburg, Augsburg, Germany
| | - Birgit Betina Ertl-Wagner
- Neurosciences & Mental Health Research Program, The Hospital for Sick Children, Toronto, Canada
- Institute of Medical Science, University of Toronto, Toronto, Canada
- Department of Diagnostic & Interventional Radiology, The Hospital for Sick Children, Toronto, Canada
- Department of Medical Imaging, University of Toronto, Toronto, Canada
| | - Farzad Khalvati
- Neurosciences & Mental Health Research Program, The Hospital for Sick Children, Toronto, Canada.
- Institute of Medical Science, University of Toronto, Toronto, Canada.
- Department of Diagnostic & Interventional Radiology, The Hospital for Sick Children, Toronto, Canada.
- Department of Medical Imaging, University of Toronto, Toronto, Canada.
- Department of Computer Science, University of Toronto, Toronto, Canada.
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada.
| |
Collapse
|
44
|
Martin E, Cook AG, Frost SM, Turner AW, Chen FK, McAllister IL, Nolde JM, Schlaich MP. Ocular biomarkers: useful incidental findings by deep learning algorithms in fundus photographs. Eye (Lond) 2024; 38:2581-2588. [PMID: 38734746 PMCID: PMC11385472 DOI: 10.1038/s41433-024-03085-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2023] [Revised: 04/03/2024] [Accepted: 04/11/2024] [Indexed: 05/13/2024] Open
Abstract
BACKGROUND/OBJECTIVES Artificial intelligence can assist with ocular image analysis for screening and diagnosis, but it is not yet capable of autonomous full-spectrum screening. Hypothetically, false-positive results may have unrealized screening potential arising from signals persisting despite training and/or ambiguous signals such as from biomarker overlap or high comorbidity. The study aimed to explore the potential to detect clinically useful incidental ocular biomarkers by screening fundus photographs of hypertensive adults using diabetic deep learning algorithms. SUBJECTS/METHODS Patients referred for treatment-resistant hypertension were imaged at a hospital unit in Perth, Australia, between 2016 and 2022. The same 45° colour fundus photograph selected for each of the 433 participants imaged was processed by three deep learning algorithms. Two expert retinal specialists graded all false-positive results for diabetic retinopathy in non-diabetic participants. RESULTS Of the 29 non-diabetic participants misclassified as positive for diabetic retinopathy, 28 (97%) had clinically useful retinal biomarkers. The models designed to screen for fewer diseases captured more incidental disease. All three algorithms showed a positive correlation between severity of hypertensive retinopathy and misclassified diabetic retinopathy. CONCLUSIONS The results suggest that diabetic deep learning models may be responsive to hypertensive and other clinically useful retinal biomarkers within an at-risk, hypertensive cohort. The observation that models trained for fewer diseases captured more incidental pathology strengthens confidence in the signalling hypotheses and supports using self-supervised learning to develop autonomous comprehensive screening. Meanwhile, non-referable and false-positive outputs of other deep learning screening models could be explored for immediate clinical use in other populations.
Collapse
Affiliation(s)
- Eve Martin
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Kensington, WA, Australia.
- School of Population and Global Health, The University of Western Australia, Crawley, Australia.
- Dobney Hypertension Centre - Royal Perth Hospital Unit, Medical School, The University of Western Australia, Perth, Australia.
- Australian e-Health Research Centre, Floreat, WA, Australia.
| | - Angus G Cook
- School of Population and Global Health, The University of Western Australia, Crawley, Australia
| | - Shaun M Frost
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Kensington, WA, Australia
- Australian e-Health Research Centre, Floreat, WA, Australia
| | - Angus W Turner
- Lions Eye Institute, Nedlands, WA, Australia
- Centre for Ophthalmology and Visual Science, The University of Western Australia, Perth, Australia
| | - Fred K Chen
- Lions Eye Institute, Nedlands, WA, Australia
- Centre for Ophthalmology and Visual Science, The University of Western Australia, Perth, Australia
- Centre for Eye Research Australia, The Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
- Ophthalmology, Department of Surgery, The University of Melbourne, East Melbourne, VIC, Australia
- Ophthalmology Department, Royal Perth Hospital, Perth, Australia
| | - Ian L McAllister
- Lions Eye Institute, Nedlands, WA, Australia
- Centre for Ophthalmology and Visual Science, The University of Western Australia, Perth, Australia
| | - Janis M Nolde
- Dobney Hypertension Centre - Royal Perth Hospital Unit, Medical School, The University of Western Australia, Perth, Australia
- Departments of Cardiology and Nephrology, Royal Perth Hospital, Perth, Australia
| | - Markus P Schlaich
- Dobney Hypertension Centre - Royal Perth Hospital Unit, Medical School, The University of Western Australia, Perth, Australia
- Departments of Cardiology and Nephrology, Royal Perth Hospital, Perth, Australia
| |
Collapse
|
45
|
Bhattacharya D, Behrendt F, Becker BT, Maack L, Beyersdorff D, Petersen E, Petersen M, Cheng B, Eggert D, Betz C, Hoffmann AS, Schlaefer A. Self-supervised learning for classifying paranasal anomalies in the maxillary sinus. Int J Comput Assist Radiol Surg 2024; 19:1713-1721. [PMID: 38850438 PMCID: PMC11365849 DOI: 10.1007/s11548-024-03172-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Accepted: 05/01/2024] [Indexed: 06/10/2024]
Abstract
PURPOSE Paranasal anomalies, frequently identified in routine radiological screenings, exhibit diverse morphological characteristics. Due to this diversity, supervised learning methods require large labelled datasets exhibiting diverse anomaly morphology. Self-supervised learning (SSL) can be used to learn representations from unlabelled data. However, there are no SSL methods designed for the downstream task of classifying paranasal anomalies in the maxillary sinus (MS). METHODS Our approach uses a 3D convolutional autoencoder (CAE) trained in an unsupervised anomaly detection (UAD) framework. Initially, we train the 3D CAE to reduce reconstruction errors when reconstructing normal MS images. Then, this CAE is applied to an unlabelled dataset to generate coarse anomaly locations by creating residual MS images. Following this, a 3D convolutional neural network (CNN) reconstructs these residual images, which forms our SSL task. Lastly, we fine-tune the encoder part of the 3D CNN on a labelled dataset of normal and anomalous MS images. RESULTS The proposed SSL technique exhibits superior performance compared to existing generic self-supervised methods, especially in scenarios with limited annotated data. When trained on just 10% of the annotated dataset, our method achieves an area under the precision-recall curve (AUPRC) of 0.79 for the downstream classification task. This performance surpasses other methods, with BYOL attaining an AUPRC of 0.75, SimSiam at 0.74, SimCLR at 0.73, and masked autoencoding using SparK at 0.75. CONCLUSION A self-supervised learning approach that inherently focuses on localizing paranasal anomalies proves to be advantageous, particularly when the subsequent task involves differentiating normal from anomalous maxillary sinuses. Our code is available at https://github.com/mtec-tuhh/self-supervised-paranasal-anomaly
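A minimal sketch of the residual-image step described in METHODS: an autoencoder trained on normal anatomy reconstructs an input volume, and the absolute residual marks coarse candidate anomaly locations. The toy 3D autoencoder and shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

cae = nn.Sequential(                       # toy 3D convolutional autoencoder
    nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1))

volume = torch.randn(2, 1, 32, 32, 32)     # stand-in MS sub-volumes
with torch.no_grad():
    reconstruction = cae(volume)
residual = (volume - reconstruction).abs() # large values = poorly reconstructed,
                                           # i.e. candidate anomalous regions
print(residual.shape)                      # torch.Size([2, 1, 32, 32, 32])
```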
Collapse
Affiliation(s)
- Debayan Bhattacharya
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany.
- Department of Otorhinolaryngology, Head and Neck Surgery and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
| | - Finn Behrendt
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany
| | - Benjamin Tobias Becker
- Department of Otorhinolaryngology, Head and Neck Surgery and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Lennart Maack
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany
| | - Dirk Beyersdorff
- Clinic and Polyclinic for Diagnostic and Interventional Radiology and Nuclear Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Elina Petersen
- Population Health Research Department, University Heart and Vascular Center, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Marvin Petersen
- Clinic and Polyclinic for Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Bastian Cheng
- Clinic and Polyclinic for Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Dennis Eggert
- Department of Otorhinolaryngology, Head and Neck Surgery and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Christian Betz
- Department of Otorhinolaryngology, Head and Neck Surgery and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Anna Sophie Hoffmann
- Department of Otorhinolaryngology, Head and Neck Surgery and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Alexander Schlaefer
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany
| |
Collapse
|
46
|
Zhang J, Xiao F, Zou H, Feng R, He J. Self-supervised learning-enhanced deep learning method for identifying myopic maculopathy in high myopia patients. iScience 2024; 27:110566. [PMID: 39211543 PMCID: PMC11359982 DOI: 10.1016/j.isci.2024.110566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Revised: 04/28/2024] [Accepted: 07/18/2024] [Indexed: 09/04/2024] Open
Abstract
Accurate detection and timely care for patients with high myopia present significant challenges. We developed a deep learning (DL) system enhanced by a self-supervised learning (SSL) approach to improve the automatic diagnosis of myopic maculopathy (MM). Using a dataset of 7,906 images from the Shanghai High Myopia Screening Project and a public validation set of 1,391 images from MMAC2023, our method significantly outperformed conventional techniques. Internally, it achieved 96.8% accuracy, 83.1% sensitivity, and 95.6% specificity, with AUC values of 0.982 and 0.999. Externally, it maintained 89.0% accuracy, 71.7% sensitivity, and 87.8% specificity, with AUC values of 0.978 and 0.973. The model's Cohen's kappa values exceeded 0.8, indicating substantial agreement with retinal experts. Our SSL-enhanced DL approach offers high accuracy and potential to enhance large-scale myopia screenings, demonstrating broader significance in improving early detection and treatment of MM.
Collapse
Affiliation(s)
- Juzhao Zhang
- Shanghai Eye Disease Prevention & Treatment Center/Shanghai Eye Hospital, School of Medicine, Tongji University, Shanghai, China
- National Clinical Research Center for Eye Diseases, Shanghai, China
- Department of Ophthalmology, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
| | - Fan Xiao
- School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China
- Academy for Engineering and Technology, Fudan University, Shanghai, China
| | - Haidong Zou
- Shanghai Eye Disease Prevention & Treatment Center/Shanghai Eye Hospital, School of Medicine, Tongji University, Shanghai, China
- National Clinical Research Center for Eye Diseases, Shanghai, China
- Department of Ophthalmology, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
| | - Rui Feng
- Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
- School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China
- Academy for Engineering and Technology, Fudan University, Shanghai, China
| | - Jiangnan He
- Shanghai Eye Disease Prevention & Treatment Center/Shanghai Eye Hospital, School of Medicine, Tongji University, Shanghai, China
- Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
| |
Collapse
|
47
|
Imagawa K, Shiomoto K. Evaluation of Effectiveness of Self-Supervised Learning in Chest X-Ray Imaging to Reduce Annotated Images. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:1618-1624. [PMID: 38459399 PMCID: PMC11300406 DOI: 10.1007/s10278-024-00975-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 11/17/2023] [Accepted: 11/17/2023] [Indexed: 03/10/2024]
Abstract
A significant challenge in machine learning-based medical image analysis is the scarcity of medical images. Obtaining a large number of labeled medical images is difficult because annotating medical images is a time-consuming process that requires specialized knowledge. In addition, inappropriate annotation processes can increase model bias. Self-supervised learning (SSL) is a type of unsupervised learning method that extracts image representations. Thus, SSL can be an effective method to reduce the number of labeled images. In this study, we investigated the feasibility of reducing the number of labeled images given a limited set of unlabeled medical images. The unlabeled chest X-ray (CXR) images were pretrained using the SimCLR framework, and then the representations were fine-tuned as supervised learning for the target task. A total of 2000 task-specific CXR images were used to perform binary classification of coronavirus disease 2019 (COVID-19) and normal cases. The results demonstrate that the performance of pretraining on task-specific unlabeled CXR images can be maintained when the number of labeled CXR images is reduced by approximately 40%. In addition, the performance was significantly better than that obtained without pretraining. In contrast, when only a small number of labeled CXR images is available, a large number of unlabeled pretraining images is required to maintain performance, regardless of task specificity. In summary, to reduce the number of labeled images using SimCLR, we must consider both the number of unlabeled pretraining images and the task-specific characteristics of the target images.
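A small sketch of the label-budget experiment design this abstract reports, under assumptions: a stand-in classifier replaces the fine-tuned SimCLR encoder, and stratified subsampling shrinks the labeled training set while test accuracy is tracked:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))                    # stand-in CXR feature vectors
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)  # COVID-19 vs normal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y,
                                          random_state=0)

for frac in (1.0, 0.8, 0.6, 0.4):                  # shrinking label budgets
    if frac < 1.0:
        X_sub, _, y_sub, _ = train_test_split(X_tr, y_tr, train_size=frac,
                                              stratify=y_tr, random_state=0)
    else:
        X_sub, y_sub = X_tr, y_tr
    acc = LogisticRegression(max_iter=1000).fit(X_sub, y_sub).score(X_te, y_te)
    print(f"labeled fraction {frac:.0%}: test accuracy {acc:.3f}")
```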
Collapse
Affiliation(s)
- Kuniki Imagawa
- Faculty of Information Technology, Tokyo City University, 1-28-1 Tamazutsumi, Setagaya-ku, Tokyo, 158-8557, Japan.
| | - Kohei Shiomoto
- Faculty of Information Technology, Tokyo City University, 1-28-1 Tamazutsumi, Setagaya-ku, Tokyo, 158-8557, Japan
| |
Collapse
|
48
|
Paverd H, Zormpas-Petridis K, Clayton H, Burge S, Crispin-Ortuzar M. Radiology and multi-scale data integration for precision oncology. NPJ Precis Oncol 2024; 8:158. [PMID: 39060351 PMCID: PMC11282284 DOI: 10.1038/s41698-024-00656-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2024] [Accepted: 07/15/2024] [Indexed: 07/28/2024] Open
Abstract
In this Perspective paper we explore the potential of integrating radiological imaging with other data types, a critical yet underdeveloped area in comparison to the fusion of other multi-omic data. Radiological images provide a comprehensive, three-dimensional view of cancer, capturing features that would be missed by biopsies or other data modalities. This paper explores the complexities and challenges of incorporating medical imaging into data integration models, in the context of precision oncology. We present the different categories of imaging-omics integration and discuss recent progress, highlighting the opportunities that arise from bringing together spatial data on different scales.
Collapse
Affiliation(s)
- Hania Paverd
- Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Department of Oncology, University of Cambridge, Cambridge, UK
- Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, UK
| | | | - Hannah Clayton
- Department of Oncology, University of Cambridge, Cambridge, UK
- Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, UK
| | - Sarah Burge
- Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, UK
| | - Mireia Crispin-Ortuzar
- Department of Oncology, University of Cambridge, Cambridge, UK.
- Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, UK.
| |
Collapse
|
49
|
Corponi F, Li BM, Anmella G, Valenzuela-Pascual C, Mas A, Pacchiarotti I, Valentí M, Grande I, Benabarre A, Garriga M, Vieta E, Young AH, Lawrie SM, Whalley HC, Hidalgo-Mazzei D, Vergari A. Wearable Data From Subjects Playing Super Mario, Taking University Exams, or Performing Physical Exercise Help Detect Acute Mood Disorder Episodes via Self-Supervised Learning: Prospective, Exploratory, Observational Study. JMIR Mhealth Uhealth 2024; 12:e55094. [PMID: 39018100 PMCID: PMC11292167 DOI: 10.2196/55094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2023] [Revised: 04/14/2024] [Accepted: 05/24/2024] [Indexed: 07/18/2024] Open
Abstract
BACKGROUND Personal sensing, leveraging data passively and near-continuously collected with wearables from patients in their ecological environment, is a promising paradigm to monitor mood disorders (MDs), a major determinant of the worldwide disease burden. However, collecting and annotating wearable data is resource intensive. Studies of this kind can thus typically afford to recruit only a few dozen patients. This constitutes one of the major obstacles to applying modern supervised machine learning techniques to MD detection. OBJECTIVE In this paper, we overcame this data bottleneck and advanced the detection of acute MD episodes from wearables' data on the back of recent advances in self-supervised learning (SSL). This approach leverages unlabeled data to learn representations during pretraining, subsequently exploited for a supervised task. METHODS We collected open access datasets recorded with the Empatica E4 wristband, spanning personal sensing tasks unrelated to MD monitoring (from emotion recognition in Super Mario players to stress detection in undergraduates), and devised a preprocessing pipeline performing on-/off-body detection, sleep/wake detection, segmentation, and (optionally) feature extraction. With 161 E4-recorded subjects, we introduced E4SelfLearning, the largest-to-date open access collection, and its preprocessing pipeline. We developed a novel E4-tailored transformer (E4mer) architecture, serving as the blueprint for both SSL and fully supervised learning; we assessed whether and under which conditions self-supervised pretraining led to an improvement over fully supervised baselines (ie, the fully supervised E4mer and pre-deep learning algorithms) in detecting acute MD episodes from recording segments taken in 64 (n=32, 50%, acute; n=32, 50%, stable) patients. RESULTS SSL significantly outperformed fully supervised pipelines using either our novel E4mer or extreme gradient boosting (XGBoost): n=3353 (81.23%) against n=3110 (75.35%; E4mer) and n=2973 (72.02%; XGBoost) correctly classified recording segments from a total of 4128 segments. SSL performance was strongly associated with the specific surrogate task used for pretraining, as well as with unlabeled data availability. CONCLUSIONS We showed that SSL, a paradigm where a model is pretrained on unlabeled data with no need for human annotations before deployment on the supervised target task of interest, helps overcome the annotation bottleneck; the choice of the pretraining surrogate task and the size of unlabeled data for pretraining are key determinants of SSL success. We introduced E4mer, which can be used for SSL, and shared the E4SelfLearning collection, along with its preprocessing pipeline, which can foster and expedite future research into SSL for personal sensing.
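A minimal sketch of the segmentation step in such a preprocessing pipeline: slicing a multichannel wristband recording into fixed-length windows for the model. The channel count, the common 4 Hz rate, and the window length are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def segment(recording, window_s=60, step_s=30, fs=4):
    """recording: (n_channels, n_samples) array sampled at `fs` Hz."""
    win, step = window_s * fs, step_s * fs
    n = (recording.shape[1] - win) // step + 1
    return np.stack([recording[:, i * step:i * step + win] for i in range(n)])

rec = np.random.default_rng(0).normal(size=(3, 4 * 3600))  # 1 h of 3-channel data at 4 Hz
segments = segment(rec)
print(segments.shape)  # (119, 3, 240): n_segments, n_channels, samples_per_window
```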
Collapse
Affiliation(s)
- Filippo Corponi
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
| | - Bryan M Li
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- The Alan Turing Institute, London, United Kingdom
| | - Gerard Anmella
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Clàudia Valenzuela-Pascual
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Ariadna Mas
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Isabella Pacchiarotti
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Marc Valentí
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Iria Grande
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Antoni Benabarre
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Marina Garriga
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Eduard Vieta
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
| | - Allan H Young
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Stephen M Lawrie
- Division of Psychiatry, Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Heather C Whalley
- Division of Psychiatry, Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Generation Scotland, Institute for Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
| | - Diego Hidalgo-Mazzei
- Bipolar and Depressive Disorders Unit, Department of Psychiatry and Psychology, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Instituto de Salud Carlos III, Madrid, Spain
- Departament de Medicina, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Antonio Vergari
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
| |
Collapse
|
50
|
Gryshchuk V, Singh D, Teipel S, Dyrba M. Contrastive Self-supervised Learning for Neurodegenerative Disorder Classification. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.07.03.24309882. [PMID: 39006425 PMCID: PMC11245060 DOI: 10.1101/2024.07.03.24309882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/16/2024]
Abstract
Neurodegenerative diseases such as Alzheimer's disease (AD) or frontotemporal lobar degeneration (FTLD) involve specific loss of brain volume, detectable in vivo using T1-weighted MRI scans. Supervised machine learning approaches for classifying neurodegenerative diseases require diagnostic labels for each sample. However, it can be difficult to obtain expert labels for a large amount of data. Self-supervised learning (SSL) offers an alternative for training machine learning models without data labels. We investigated whether SSL models can be applied to distinguish between different neurodegenerative disorders in an interpretable manner. Our method comprises a feature extractor and a downstream classification head. A deep convolutional neural network trained in a contrastive self-supervised way serves as the feature extractor, learning latent representations, while the classifier head is a single-layer perceptron. We used N=2694 T1-weighted MRI scans from four data cohorts: two ADNI datasets, AIBL, and FTLDNI, including cognitively normal controls (CN), cases with prodromal and clinical AD, as well as FTLD cases differentiated into its subtypes. Our results showed that the feature extractor trained in a self-supervised way provides generalizable and robust representations for downstream classification. For AD vs. CN, our model achieves 82% balanced accuracy on the test subset and 80% on an independent holdout dataset. Similarly, the behavioral variant of frontotemporal dementia (BV) vs. CN model attains an 88% balanced accuracy on the test subset. The average feature attribution heatmaps obtained by the Integrated Gradients method highlighted hallmark regions, i.e., temporal gray matter atrophy for AD and insular atrophy for BV. In conclusion, our models perform comparably to state-of-the-art supervised deep learning approaches. This suggests that the SSL methodology can successfully make use of unannotated neuroimaging datasets as training data while remaining robust and interpretable.
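To make the contrastive pretraining idea concrete, here is a minimal PyTorch sketch of an NT-Xent (InfoNCE-style) loss over paired embeddings of two augmented views of the same scans. This is an illustrative assumption, not the paper's exact objective: the loss variant, temperature, and embedding dimension are placeholders, and a real pipeline would produce z1 and z2 by applying the CNN feature extractor to two augmentations of each MRI volume.

```python
# NT-Xent (normalized temperature-scaled cross-entropy) sketch for
# contrastive pretraining. Illustrative assumption, not the paper's
# exact loss or hyperparameters.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same scans."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, dim), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine sims
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    batch = z1.shape[0]
    # The positive for row i is its other view: i+B (first half), i-B (second).
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

# Usage sketch with random embeddings standing in for CNN features of
# two differently augmented copies of 16 T1-weighted scans.
z1 = torch.randn(16, 128)
z2 = torch.randn(16, 128)
loss = nt_xent(z1, z2)
```

With the extractor pretrained this way, it can be frozen and a single-layer perceptron head trained on its representations, matching the two-stage design described in the abstract.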
Collapse
|