1
Li Y, Huang J, Zhang Y, Deng J, Zhang J, Dong L, Wang D, Mei L, Lei C. Dual branch segment anything model-transformer fusion network for accurate breast ultrasound image segmentation. Med Phys 2025; 52:4108-4119. [PMID: 40103542 DOI: 10.1002/mp.17751]
Abstract
BACKGROUND Precise and rapid ultrasound-based breast cancer diagnosis is essential for effective treatment. However, existing ultrasound image segmentation methods often fail to capture both global contextual features and fine-grained boundary details. PURPOSE This study proposes a dual-branch network architecture that combines the Swin Transformer and Segment Anything Model (SAM) to enhance breast ultrasound image (BUSI) segmentation accuracy and reliability. METHODS Our network integrates the global attention mechanism of the Swin Transformer with fine-grained boundary detection from SAM through a multi-stage feature fusion module. We evaluated our method against state-of-the-art methods on two datasets: the Breast Ultrasound Images dataset from Wuhan University (BUSI-WHU), which contains 927 images (560 benign and 367 malignant) with ground truth masks annotated by radiologists, and the public BUSI dataset. Performance was evaluated using mean Intersection-over-Union (mIoU), 95th percentile Hausdorff Distance (HD95) and Dice Similarity coefficients, with statistical significance assessed using two-tailed independent t-tests with Holm-Bonferroni correction (α = 0.05). RESULTS On our proposed dataset, the network achieved a mIoU of 90.82% and a HD95 of 23.50 pixels, demonstrating significant improvements over current state-of-the-art methods with effect sizes for mIoU ranging from 0.38 to 0.61 (p < 0.05). On the BUSI dataset, the network achieved a mIoU of 82.83% and a HD95 of 71.13 pixels, demonstrating comparable improvements with effect sizes for mIoU ranging from 0.45 to 0.58 (p < 0.05). CONCLUSIONS Our dual-branch network leverages the complementary strengths of Swin Transformer and SAM through a fusion mechanism, demonstrating superior breast ultrasound segmentation performance. Our code is publicly available at https://github.com/Skylanding/DSATNet.
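For readers scanning these results, a minimal sketch of how the reported Dice and IoU overlap metrics are conventionally computed from binary masks (illustrative only, not the authors' DSATNet code; the array names and NumPy usage are assumptions):

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Compute Dice and IoU for two binary masks of identical shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

# Toy example: two overlapping squares; mIoU in the paper is this IoU averaged over images.
pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64), dtype=np.uint8); gt[15:45, 15:45] = 1
print(dice_iou(pred, gt))
```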
Affiliation(s)
- Yu Li
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
- Jin Huang
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
- Yimin Zhang
- The Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, Wuhan, China
- Jingwen Deng
- The Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, Wuhan, China
- Jingwen Zhang
- The Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, Wuhan, China
- Lan Dong
- The Department of Gynecology, Renmin Hospital of Wuhan University, Wuhan, China
- Du Wang
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
- Liye Mei
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
- School of Computer Science, Hubei University of Technology, Wuhan, China
- Cheng Lei
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
- Suzhou Institute of Wuhan University, Suzhou, China
- Shenzhen Institute of Wuhan University, Shenzhen, China
2
Jannatdoust P, Valizadeh P, Saeedi N, Valizadeh G, Salari HM, Saligheh Rad H, Gity M. Computer-Aided Detection (CADe) and Segmentation Methods for Breast Cancer Using Magnetic Resonance Imaging (MRI). J Magn Reson Imaging 2025; 61:2376-2390. [PMID: 39781684 DOI: 10.1002/jmri.29687]
Abstract
Breast cancer continues to be a major health concern, and early detection is vital for enhancing survival rates. Magnetic resonance imaging (MRI) is a key tool due to its substantial sensitivity for invasive breast cancers. Computer-aided detection (CADe) systems enhance the effectiveness of MRI by identifying potential lesions, aiding radiologists in focusing on areas of interest, extracting quantitative features, and integrating with computer-aided diagnosis (CADx) pipelines. This review aims to provide a comprehensive overview of the current state of CADe systems in breast MRI, focusing on the technical details of pipelines and segmentation models including classical intensity-based methods, supervised and unsupervised machine learning (ML) approaches, and the latest deep learning (DL) architectures. It highlights recent advancements from traditional algorithms to sophisticated DL models such as U-Nets, emphasizing CADe implementation of multi-parametric MRI acquisitions. Despite these advancements, CADe systems face challenges like variable false-positive and negative rates, complexity in interpreting extensive imaging data, variability in system performance, and lack of large-scale studies and multicentric models, limiting the generalizability and suitability for clinical implementation. Technical issues, including image artefacts and the need for reproducible and explainable detection algorithms, remain significant hurdles. Future directions emphasize developing more robust and generalizable algorithms, integrating explainable AI to improve transparency and trust among clinicians, developing multi-purpose AI systems, and incorporating large language models to enhance diagnostic reporting and patient management. Additionally, efforts to standardize and streamline MRI protocols aim to increase accessibility and reduce costs, optimizing the use of CADe systems in clinical practice. LEVEL OF EVIDENCE: NA TECHNICAL EFFICACY: Stage 2.
Affiliation(s)
- Payam Jannatdoust
- School of Medicine, Tehran University of Medical Science, Tehran, Iran
- Parya Valizadeh
- School of Medicine, Tehran University of Medical Science, Tehran, Iran
- Nikoo Saeedi
- Student Research Committee, Islamic Azad University, Mashhad Branch, Mashhad, Iran
- Gelareh Valizadeh
- Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Hanieh Mobarak Salari
- Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Hamidreza Saligheh Rad
- Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Department of Medical Physics and Biomedical Engineering, Tehran University of Medical Sciences, Tehran, Iran
- Masoumeh Gity
- Advanced Diagnostic and Interventional Radiology Research Center, Tehran University of Medical Sciences, Tehran, Iran
3
Synek A, Benca E, Licandro R, Hirtler L, Pahr DH. Predicting strength of femora with metastatic lesions from single 2D radiographic projections using convolutional neural networks. Comput Methods Programs Biomed 2025; 265:108724. [PMID: 40174318 DOI: 10.1016/j.cmpb.2025.108724]
Abstract
BACKGROUND AND OBJECTIVE Patients with metastatic bone disease are at risk of pathological femoral fractures and may require prophylactic surgical fixation. Current clinical decision support tools often overestimate fracture risk, leading to overtreatment. While novel scores integrating femoral strength assessment via finite element (FE) models show promise, they require 3D imaging, extensive computation, and are difficult to automate. Predicting femoral strength directly from single 2D radiographic projections using convolutional neural networks (CNNs) could address these limitations, but this approach has not yet been explored for femora with metastatic lesions. This study aimed to test whether CNNs can accurately predict strength of femora with metastatic lesions from single 2D radiographic projections. METHODS CNNs with various architectures were developed and trained using an FE model generated training dataset. This training dataset was based on 36,000 modified computed tomography (CT) scans, created by randomly inserting artificial lytic lesions into the CT scans of 36 intact anatomical femoral specimens. From each modified CT scan, an anterior-posterior 2D projection was generated and femoral strength in one-legged stance was determined using nonlinear FE models. Following training, the CNN performance was evaluated on an independent experimental test dataset consisting of 31 anatomical femoral specimens (16 intact, 15 with artificial lytic lesions). 2D projections of each specimen were created from corresponding CT scans and femoral strength was assessed in mechanical tests. The CNNs' performance was evaluated using linear regression analysis and compared to 2D densitometric predictors (bone mineral density and content) and CT-based 3D FE models. RESULTS All CNNs accurately predicted the experimentally measured strength in femora with and without metastatic lesions of the test dataset (R²≥0.80, CCC≥0.81). In femora with metastatic lesions, the performance of the CNNs (best: R²=0.84, CCC=0.86) was considerably superior to 2D densitometric predictors (R²≤0.07) and slightly inferior to 3D FE models (R²=0.90, CCC=0.94). CONCLUSIONS CNNs, trained on a large dataset generated via FE models, predicted experimentally measured strength of femora with artificial metastatic lesions with accuracy comparable to 3D FE models. By eliminating the need for 3D imaging and reducing computational demands, this novel approach demonstrates potential for application in a clinical setting.
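The reported CCC is Lin's concordance correlation coefficient between predicted and experimentally measured strength; a small illustrative sketch (not the authors' evaluation script; the example values are made up):

```python
import numpy as np

def lin_ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Lin's concordance correlation coefficient between measured and predicted values."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()   # population (biased) variances
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    return 2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# Hypothetical femoral strengths (kN): mechanically measured vs. CNN-predicted.
measured = np.array([4.2, 6.1, 3.8, 5.5, 7.0])
predicted = np.array([4.0, 6.4, 3.5, 5.9, 6.6])
print(round(lin_ccc(measured, predicted), 3))
```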
Affiliation(s)
- Alexander Synek
- Institute of Lightweight Design and Structural Biomechanics, TU Wien, Gumpendorfer Straße 7, 1060 Vienna, Austria.
- Emir Benca
- Department of Orthopedics and Trauma-Surgery, Medical University of Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria
- Roxane Licandro
- Department of Biomedical Imaging and Image-guided Therapy, Computational Imaging Research Lab (CIR), Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
- Lena Hirtler
- Center for Anatomy and Cell Biology, Medical University of Vienna, Währinger Straße 13, 1090 Vienna, Austria
- Dieter H Pahr
- Institute of Lightweight Design and Structural Biomechanics, TU Wien, Gumpendorfer Straße 7, 1060 Vienna, Austria
4
Li G, Ge H, Jiang Y, Zhang Y, Jin X. Non-destructive detection of early wheat germination via deep learning-optimized terahertz imaging. Plant Methods 2025; 21:75. [PMID: 40448208 DOI: 10.1186/s13007-025-01393-6]
Abstract
Wheat, a major global cereal crop, is prone to quality degradation from early sprouting when stored improperly, resulting in substantial economic losses. Traditional methods for detecting early sprouting are labor-intensive and destructive, underscoring the need for rapid, non-destructive alternatives. Terahertz (THz) technology provides a promising solution due to its ability to perform non-invasive imaging of internal structures. However, current THz imaging techniques are limited by low image resolution, which restricts their practical application. We address these challenges by proposing an advanced deep learning framework for THz image classification of early sprouting wheat. We first develop an Enhanced Super-Resolution Generative Adversarial Network (AESRGAN) to improve the resolution of THz images, integrating an attention mechanism to focus on critical image regions. This model achieves a 0.76 dB improvement in Peak Signal-to-Noise Ratio (PSNR). Subsequently, we introduce the EfficientViT-based YOLO V8 classification model, incorporating a Depthwise Separable Attention (C2F-DSA) module, and further optimize the model using the Gazelle Optimization Algorithm (GOA). Experimental results demonstrate the GOA-EViTDSA-YOLO model achieves an accuracy of 97.5% and a mean Average Precision (mAP) of 0.962. The model is efficient and significantly enhances the classification of early sprouting wheat compared to other deep learning models.
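The 0.76 dB gain refers to the standard peak signal-to-noise ratio; a brief sketch of the conventional PSNR computation (illustrative; assumes images scaled to [0, 1], not the authors' code):

```python
import numpy as np

def psnr(reference: np.ndarray, reconstructed: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with intensities in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a clean image versus a noisy copy.
rng = np.random.default_rng(0)
clean = rng.random((128, 128))
noisy = np.clip(clean + 0.05 * rng.standard_normal(clean.shape), 0.0, 1.0)
print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```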
Affiliation(s)
- Guangming Li
- Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou, 450001, China
- Henan Provincial Key Laboratory of Grain Photoelectric Detection and Control, Zhengzhou, 450001, China
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
- Hongyi Ge
- Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou, 450001, China
- Henan Provincial Key Laboratory of Grain Photoelectric Detection and Control, Zhengzhou, 450001, China
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
- Yuying Jiang
- Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou, 450001, China.
- Henan Provincial Key Laboratory of Grain Photoelectric Detection and Control, Zhengzhou, 450001, China.
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, 450001, China.
- Yuan Zhang
- Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou, 450001, China.
- Henan Provincial Key Laboratory of Grain Photoelectric Detection and Control, Zhengzhou, 450001, China.
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China.
- Xi Jin
- Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou, 450001, China
- Henan Provincial Key Laboratory of Grain Photoelectric Detection and Control, Zhengzhou, 450001, China
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
5
Darbari Kaul R, Zhong W, Liu S, Azemi G, Liang K, Zou E, Sacks PL, Thiel C, Campbell RG, Kalish L, Sacks R, Di Ieva A, Harvey RJ. Development of an Open-Source Algorithm for Automated Segmentation in Clinician-Led Paranasal Sinus Radiologic Research. Laryngoscope 2025. [PMID: 40421828 DOI: 10.1002/lary.32292]
Abstract
OBJECTIVE Artificial Intelligence (AI) research needs to be clinician led; however, the required expertise typically lies outside clinicians' skill sets. Collaborations exist but are often commercially driven. Free and open-source computational algorithms and software expertise are required for meaningful clinically driven AI medical research. Deep learning algorithms automate segmenting regions of interest for analysis and clinical translation. Numerous studies have automatically segmented paranasal sinus computed tomography (CT) scans; however, openly accessible algorithms capturing the sinonasal cavity remain scarce. The purpose of this study was to validate and provide an open-source segmentation algorithm for paranasal sinus CTs for the otolaryngology research community. METHODS A cross-sectional comparative study was conducted comparing a deep learning algorithm (UNet++, modified for automatic segmentation of paranasal sinus CTs) with "ground-truth" manual segmentations. A dataset of 100 paranasal sinus scans was manually segmented, with an 80/20 training/testing split. The algorithm is available at https://github.com/rheadkaul/SinusSegment. Primary outcomes included the Dice similarity coefficient (DSC) score, Intersection over Union (IoU), Hausdorff distance (HD), sensitivity, specificity, and visual similarity grading. RESULTS Twenty scans representing 7300 slices were assessed. The mean DSC was 0.87 and IoU 0.80, with HD 33.61 mm. The mean sensitivity was 83.98% and specificity 99.81%. The median visual similarity grading score was 3 (good). There were no statistically significant differences in outcomes with normal or diseased paranasal sinus CTs. CONCLUSION Automatic segmentation of the paranasal sinuses on CT yields good results when compared with manual segmentation. This study provides an open-source segmentation algorithm as a foundation and gateway for more complex AI-based analysis of large datasets. LEVEL OF EVIDENCE: 3
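The Hausdorff distance reported above measures boundary disagreement; a minimal sketch of a 95th-percentile Hausdorff distance between binary masks (illustrative only; assumes SciPy is available; computed over full foreground voxel sets, whereas many implementations use boundary voxels only):

```python
import numpy as np
from scipy import ndimage

def hd95(mask_a: np.ndarray, mask_b: np.ndarray, percentile: float = 95.0) -> float:
    """Symmetric 95th-percentile Hausdorff distance (in voxels) between two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    # Distance from every voxel to the nearest foreground voxel of the other mask.
    dist_to_b = ndimage.distance_transform_edt(~b)
    dist_to_a = ndimage.distance_transform_edt(~a)
    d_ab = dist_to_b[a]   # distances from A's voxels to B
    d_ba = dist_to_a[b]   # distances from B's voxels to A
    return float(np.percentile(np.hstack([d_ab, d_ba]), percentile))

# Toy example: two shifted squares.
a = np.zeros((50, 50), bool); a[10:30, 10:30] = True
b = np.zeros((50, 50), bool); b[14:34, 14:34] = True
print(hd95(a, b))
```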
Affiliation(s)
- Rhea Darbari Kaul
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, Australia
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Wenjin Zhong
- Centre for Health Informatics, Australian Institute of Health Innovation, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, Australia
- Sidong Liu
- Centre for Health Informatics, Australian Institute of Health Innovation, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, Australia
- Ghasem Azemi
- Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, Australia
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Kate Liang
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Emma Zou
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Peta-Lee Sacks
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Cedric Thiel
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Raewyn Gay Campbell
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Department of Otolaryngology Head and Neck Surgery, Royal Prince Alfred Hospital, Sydney, Australia
- Larry Kalish
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, Australia
- Department of Otolaryngology, Head and Neck Surgery, Concord General Hospital, University of Sydney, Sydney, Australia
- Raymond Sacks
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, Australia
- Department of Otolaryngology, Head and Neck Surgery, Concord General Hospital, University of Sydney, Sydney, Australia
- Antonio Di Ieva
- Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, Australia
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Richard John Harvey
- Rhinology and Skull Base Research Group, Applied Medical Research Centre, University of New South Wales, Sydney, Australia
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- School of Clinical Medicine, St Vincent's Healthcare Clinical Campus, Faculty of Medicine and Health, UNSW Sydney, Australia
6
Kot WY, Au Yeung SY, Leung YY, Leung PH, Yang WF. Evolution of deep learning tooth segmentation from CT/CBCT images: a systematic review and meta-analysis. BMC Oral Health 2025; 25:800. [PMID: 40420051 PMCID: PMC12107724 DOI: 10.1186/s12903-025-05984-6]
Abstract
BACKGROUND Deep learning has been utilized to segment teeth from computed tomography (CT) or cone-beam CT (CBCT). However, the performance of deep learning is unknown due to multiple models and diverse evaluation metrics. This systematic review and meta-analysis aims to evaluate the evolution and performance of deep learning in tooth segmentation. METHODS We systematically searched PubMed, Web of Science, Scopus, IEEE Xplore, arXiv.org, and ACM for studies investigating deep learning in human tooth segmentation from CT/CBCT. Included studies were assessed using the Quality Assessment of Diagnostic Accuracy Study (QUADAS-2) tool. Data were extracted for meta-analyses by random-effects models. RESULTS A total of 30 studies were included in the systematic review, and 28 of them were included for meta-analyses. Various deep learning algorithms were categorized according to the backbone network, encompassing single-stage convolutional models, convolutional models with U-Net architecture, Transformer models, convolutional models with attention mechanisms, and combinations of multiple models. Convolutional models with U-Net architecture were the most commonly used deep learning algorithms. The integration of attention mechanism within convolutional models has become a new topic. 29 evaluation metrics were identified, with Dice Similarity Coefficient (DSC) being the most popular. The pooled results were 0.93 [0.93, 0.93] for DSC, 0.86 [0.85, 0.87] for Intersection over Union (IoU), 0.22 [0.19, 0.24] for Average Symmetric Surface Distance (ASSD), 0.92 [0.90, 0.94] for sensitivity, 0.71 [0.26, 1.17] for 95% Hausdorff distance, and 0.96 [0.93, 0.98] for precision. No significant difference was observed in the segmentation of single-rooted or multi-rooted teeth. No obvious correlation between sample size and segmentation performance was observed. CONCLUSIONS Multiple deep learning algorithms have been successfully applied to tooth segmentation from CT/CBCT and their evolution has been well summarized and categorized according to their backbone structures. In future, studies are needed with standardized protocols and open labelled datasets.
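The pooled estimates above come from random-effects models; a compact sketch of DerSimonian-Laird pooling, the estimator typically used for such meta-analyses (illustrative; the per-study Dice values and variances below are hypothetical, not data from the review):

```python
import numpy as np

def dersimonian_laird(effects: np.ndarray, variances: np.ndarray):
    """Random-effects pooled estimate and 95% CI via the DerSimonian-Laird estimator."""
    w_fixed = 1.0 / variances
    pooled_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)
    q = np.sum(w_fixed * (effects - pooled_fixed) ** 2)        # Cochran's Q
    df = len(effects) - 1
    c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
    tau2 = max(0.0, (q - df) / c)                               # between-study variance
    w_rand = 1.0 / (variances + tau2)
    pooled = np.sum(w_rand * effects) / np.sum(w_rand)
    se = np.sqrt(1.0 / np.sum(w_rand))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical per-study Dice scores and their squared standard errors.
dsc = np.array([0.92, 0.94, 0.93, 0.95, 0.91])
var = np.array([0.0001, 0.0002, 0.00015, 0.0001, 0.0003])
print(dersimonian_laird(dsc, var))
```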
Affiliation(s)
- Wai Ying Kot
- Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Sum Yin Au Yeung
- Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Yin Yan Leung
- Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Pui Hang Leung
- Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Division of Oral & Maxillofacial Surgery, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Wei-Fa Yang
- Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China.
- Division of Oral & Maxillofacial Surgery, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China.
7
Zhu Y, Wang X, Liu T, Fu Y. Multi-perspective dynamic consistency learning for semi-supervised medical image segmentation. Sci Rep 2025; 15:18266. [PMID: 40415094 DOI: 10.1038/s41598-025-03124-2]
Abstract
Semi-supervised learning (SSL) is an effective method for medical image segmentation as it alleviates the dependence on clinical pixel-level annotations. Among the SSL methods, pseudo-labels and consistency regularization play a key role as the dominant paradigm. However, current consistency regularization methods based on shared encoder structures tend to trap the model in cognitive bias, which impairs segmentation performance. Furthermore, traditional fixed-threshold-based pseudo-label selection methods lack the utilization of low-confidence pixels, making the model's initial segmentation capability insufficient, especially for confusing regions. To this end, we propose a multi-perspective dynamic consistency (MPDC) framework to mitigate model cognitive bias and to fully utilize the low-confidence pixels. Specifically, we propose a novel multi-perspective collaborative learning strategy that encourages the sub-branch networks to learn discriminative features from multiple perspectives, thus avoiding the problem of model cognitive bias and enhancing boundary perception. In addition, we further employ a dynamic decoupling consistency scheme to fully utilize low-confidence pixels. By dynamically adjusting the threshold, more pseudo-labels are involved in the early stages of training. Extensive experiments on several challenging medical image segmentation datasets show that our method achieves state-of-the-art performance, especially on boundaries, with significant improvements.
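The abstract does not spell out the dynamic threshold schedule; the sketch below only illustrates the general idea of ramping a pseudo-label confidence threshold so that more low-confidence pixels participate early in training (all function names and the linear schedule are assumptions, not the MPDC implementation):

```python
import numpy as np

def dynamic_threshold(step: int, total_steps: int, t_min: float = 0.5, t_max: float = 0.95) -> float:
    """Linearly ramp the pseudo-label confidence threshold from t_min to t_max."""
    frac = min(1.0, step / max(1, total_steps))
    return t_min + frac * (t_max - t_min)

def select_pseudo_labels(probs: np.ndarray, step: int, total_steps: int):
    """Hard pseudo-labels plus a mask of pixels whose confidence exceeds the current threshold.
    probs: (C, H, W) softmax output of the unlabeled branch."""
    conf = probs.max(axis=0)
    labels = probs.argmax(axis=0)
    keep = conf >= dynamic_threshold(step, total_steps)
    return labels, keep

# Early in training more pixels are kept; later only high-confidence ones remain.
probs = np.random.default_rng(0).dirichlet(alpha=np.ones(4), size=(32, 32)).transpose(2, 0, 1)
for step in (0, 5000, 10000):
    _, keep = select_pseudo_labels(probs, step, total_steps=10000)
    print(step, keep.mean())
```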
Affiliation(s)
- Yongfa Zhu
- College of Computer Science and Technology, Beihua University, Jilin, 132013, China
- Xue Wang
- College of Computer Science and Technology, Beihua University, Jilin, 132013, China.
- Taihui Liu
- College of Computer Science and Technology, Beihua University, Jilin, 132013, China
- Yongkang Fu
- College of Computer Science and Technology, Beihua University, Jilin, 132013, China
8
Padovani Ederli R, Vega-Oliveros DA, Soriano-Vargas A, Rocha A, Dias Z. Time-series visual representations for sleep stages classification. PLoS One 2025; 20:e0323689. [PMID: 40397888 DOI: 10.1371/journal.pone.0323689]
Abstract
Polysomnography is the standard method for sleep stage classification; however, it is costly and requires controlled environments, which can disrupt natural sleep patterns. Smartwatches offer a practical, non-invasive, and cost-effective alternative for sleep monitoring. Equipped with multiple sensors, smartwatches allow continuous data collection in home environments, making them valuable for promoting health and improving sleep habits. Traditional methods for sleep stage classification using smartwatch data often rely on raw data or extracted features combined with artificial intelligence techniques. Transforming time series into visual representations enables the application of two-dimensional convolutional neural networks, which excel in classification tasks. Despite their success in other domains, these methods are underexplored for sleep stage classification. To address this, we evaluated visual representations of time series data collected from accelerometer and heart rate sensors in smartwatches. Techniques such as Gramian Angular Field, Recurrence Plots, Markov Transition Field, and spectrograms were implemented. Additionally, image patching and ensemble methods were applied to enhance classification performance. The results demonstrated that Gramian Angular Field, combined with patching and ensembles, achieved superior performance, exceeding 82% balanced accuracy for two-stage classification and 62% for three-stage classification. A comparison with traditional approaches, conducted under identical conditions, showed that the proposed method outperformed others, offering improvements of up to 8 percentage points in two-stage classification and 9 percentage points in three-stage classification. These findings show that visual representations effectively capture key sleep patterns, enhancing classification accuracy and enabling more reliable health monitoring and earlier interventions. This study highlights that visual representations not only surpass traditional methods but also emerge as a competitive and effective approach for sleep stage classification based on smartwatch data, paving the way for future research.
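A minimal sketch of the Gramian Angular Summation Field encoding used to turn a 1-D sensor series into an image for a 2-D CNN (illustrative; production pipelines often rely on a library such as pyts; assumes the series is not constant):

```python
import numpy as np

def gramian_angular_summation_field(x: np.ndarray) -> np.ndarray:
    """Encode a 1-D time series as a Gramian Angular Summation Field image."""
    x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0   # rescale to [-1, 1]
    x_scaled = np.clip(x_scaled, -1.0, 1.0)
    phi = np.arccos(x_scaled)                                    # polar-coordinate angle
    return np.cos(phi[:, None] + phi[None, :])                   # GASF(i, j) = cos(phi_i + phi_j)

# Toy example: one epoch of heart-rate samples becomes a 2-D image.
t = np.linspace(0, 2 * np.pi, 60)
hr = 60 + 5 * np.sin(t)
image = gramian_angular_summation_field(hr)
print(image.shape)  # (60, 60)
```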
Affiliation(s)
| | - Didier A Vega-Oliveros
- Department of Science and Technology, Federal University of Sao Paulo (Unifesp), São José dos Campos, SP, Brazil
- Aurea Soriano-Vargas
- Departamento Académico de Ciencia de Computación y Datos, Universidad de Ingeniería y Tecnología (UTEC), Peru
- Anderson Rocha
- Institute of Computing, University of Campinas (Unicamp), Campinas, SP, Brazil
- Zanoni Dias
- Institute of Computing, University of Campinas (Unicamp), Campinas, SP, Brazil
9
Gomaa A, Huang Y, Stephan P, Breininger K, Frey B, Dörfler A, Schnell O, Delev D, Coras R, Donaubauer AJ, Schmitter C, Stritzelberger J, Semrau S, Maier A, Bayer S, Schönecker S, Heiland DH, Hau P, Gaipl US, Bert C, Fietkau R, Schmidt MA, Putz F. A self-supervised multimodal deep learning approach to differentiate post-radiotherapy progression from pseudoprogression in glioblastoma. Sci Rep 2025; 15:17133. [PMID: 40382400 DOI: 10.1038/s41598-025-02026-7]
Abstract
Accurate differentiation of pseudoprogression (PsP) from True Progression (TP) following radiotherapy (RT) in glioblastoma patients is crucial for optimal treatment planning. However, this task remains challenging due to the overlapping imaging characteristics of PsP and TP. This study therefore proposes a multimodal deep-learning approach utilizing complementary information from routine anatomical MR images, clinical parameters, and RT treatment planning information for improved predictive accuracy. The approach utilizes a self-supervised Vision Transformer (ViT) to encode multi-sequence MR brain volumes to effectively capture both global and local context from the high dimensional input. The encoder is trained in a self-supervised upstream task on unlabeled glioma MRI datasets from the open BraTS2021, UPenn-GBM, and UCSF-PDGM datasets (n = 2317 MRI studies) to generate compact, clinically relevant representations from FLAIR and T1 post-contrast sequences. These encoded MR inputs are then integrated with clinical data and RT treatment planning information through guided cross-modal attention, improving progression classification accuracy. This work was developed using two datasets from different centers: the Burdenko Glioblastoma Progression Dataset (n = 59) for training and validation, and the GlioCMV progression dataset from the University Hospital Erlangen (UKER) (n = 20) for testing. The proposed method achieved competitive performance, with an AUC of 75.3%, outperforming the current state-of-the-art data-driven approaches. Importantly, the proposed approach relies solely on readily available anatomical MRI sequences, clinical data, and RT treatment planning information, enhancing its clinical feasibility. The proposed approach addresses the challenge of limited data availability for PsP and TP differentiation and could allow for improved clinical decision-making and optimized treatment plans for glioblastoma patients.
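The exact fusion module is not detailed in the abstract; below is a generic PyTorch sketch of cross-attention in which clinical and RT-planning features query ViT image tokens (all dimensions, names, and the single-query design are assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Clinical/RT features attend over ViT image tokens (generic sketch only)."""
    def __init__(self, img_dim: int = 768, clin_dim: int = 32, embed_dim: int = 256, heads: int = 4):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.clin_proj = nn.Linear(clin_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        self.head = nn.Linear(embed_dim, 2)          # true progression vs. pseudoprogression

    def forward(self, img_tokens, clin_feats):
        # img_tokens: (B, N, img_dim) encoded MR tokens; clin_feats: (B, clin_dim) tabular features
        kv = self.img_proj(img_tokens)
        q = self.clin_proj(clin_feats).unsqueeze(1)  # one query token per patient
        fused, _ = self.attn(q, kv, kv)
        return self.head(fused.squeeze(1))

logits = CrossModalFusion()(torch.randn(2, 196, 768), torch.randn(2, 32))
print(logits.shape)  # torch.Size([2, 2])
```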
Affiliation(s)
- Ahmed Gomaa
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany.
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany.
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany.
- Yixing Huang
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Pluvio Stephan
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Katharina Breininger
- Center for Artificial Intelligence and Data Science, Universität Würzburg, Würzburg, 97074, Germany
- Benjamin Frey
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Arnd Dörfler
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Institute of Neuroradiology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Oliver Schnell
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Department of Neurosurgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Daniel Delev
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Department of Neurosurgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Roland Coras
- Department of Neurosurgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Anna-Jasmina Donaubauer
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Charlotte Schmitter
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Jenny Stritzelberger
- Department of Neurology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Sabine Semrau
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Andreas Maier
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Siming Bayer
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Stephan Schönecker
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Department of Radiation Oncology, University Hospital Ludwig Maximilian University of Munich, 81377, Munich, Germany
- Dieter H Heiland
- Translational Neurosurgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Department of Neurosurgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Department of Neurological Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Peter Hau
- Department of Neurology, University Hospital Regensburg, Regensburg, Germany
- Wilhelm Sander-NeuroOncology Unit, University Hospital Regensburg, Regensburg, Germany
- Udo S Gaipl
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Christoph Bert
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Rainer Fietkau
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Manuel A Schmidt
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
- Institute of Neuroradiology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Florian Putz
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, 91054, Germany
- Bavarian Cancer Research Center (BZKF), Erlangen, 91052, Germany
10
Mao YC, Lin YJ, Hu JP, Liu ZY, Chen SL, Chen CA, Chen TY, Li KC, Wang LH, Tu WC, Abu PAR. Automated Caries Detection Under Dental Restorations and Braces Using Deep Learning. Bioengineering (Basel) 2025; 12:533. [PMID: 40428152 PMCID: PMC12108948 DOI: 10.3390/bioengineering12050533]
Abstract
In the dentistry field, dental caries is a common issue affecting all age groups. The presence of dental braces and dental restoration makes the detection of caries more challenging. Traditionally, dentists rely on visual examinations to diagnose caries under restoration and dental braces, which can be prone to errors and are time-consuming. This study proposes an innovative deep learning and image processing-based approach for automated caries detection under restoration and dental braces, aiming to reduce the clinical burden on dental practitioners. The contributions of this research are summarized as follows: (1) YOLOv8 was employed to detect individual teeth in bitewing radiographs, and a rotation-aware segmentation method was introduced to handle angular variations in BW. The method achieved a sensitivity of 99.40% and a recall of 98.5%. (2) Using the original unprocessed images, AlexNet achieved an accuracy of 95.83% for detecting caries under restoration and dental braces. By incorporating the image processing techniques developed in this study, the accuracy of Inception-v3 improved to a maximum of 99.17%, representing a 3.34% increase over the baseline. (3) In clinical evaluation scenarios, the proposed AlexNet-based model achieved a specificity of 99.94% for non-caries cases and a precision of 99.99% for detecting caries under restoration and dental braces. All datasets used in this study were obtained with IRB approval (certificate number: 02002030B0). A total of 505 bitewing radiographs were collected from Chang Gung Memorial Hospital in Taoyuan, Taiwan. Patients with a history of the human immunodeficiency virus (HIV) were excluded from the dataset. The proposed system effectively identifies caries under restoration and dental braces, strengthens the dentist-patient relationship, and reduces dentist time during clinical consultations.
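For reference, the reported sensitivity, specificity, and precision follow directly from confusion-matrix counts; a small illustrative sketch (the counts below are made up, not the study's data):

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Sensitivity (recall), specificity, and precision from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return sensitivity, specificity, precision

# Hypothetical counts for caries vs. non-caries tooth regions.
print(classification_metrics(tp=118, fp=2, tn=340, fn=3))
```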
Affiliation(s)
- Yi-Cheng Mao
- Department of Operative Dentistry, Taoyuan Chang Gung Memorial Hospital, Taoyuan City 33305, Taiwan
- Yuan-Jin Lin
- Department of Program on Semiconductor Manufacturing Technology, Academy of Innovative Semiconductor and Sustainable Manufacturing, National Cheng Kung University, Tainan City 701401, Taiwan;
- Jen-Peng Hu
- Department of Electronic Engineering, Chung Yuan Christian University, Taoyuan City 32023, Taiwan; (J.-P.H.); (Z.-Y.L.); (S.-L.C.)
- Zi-Yu Liu
- Department of Electronic Engineering, Chung Yuan Christian University, Taoyuan City 32023, Taiwan; (J.-P.H.); (Z.-Y.L.); (S.-L.C.)
- Shih-Lun Chen
- Department of Electronic Engineering, Chung Yuan Christian University, Taoyuan City 32023, Taiwan; (J.-P.H.); (Z.-Y.L.); (S.-L.C.)
- Chiung-An Chen
- Department of Electrical Engineering, Ming Chi University of Technology, New Taipei City 243303, Taiwan
- Tsung-Yi Chen
- Department of Electronic Engineering, Feng Chia University, Taichung City 40724, Taiwan;
- Kuo-Chen Li
- Department of Information Management, Chung Yuan Christian University, Taoyuan City 320317, Taiwan
- Liang-Hung Wang
- Department of Microelectronics, College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China;
- Wei-Chen Tu
- Department of Electrical Engineering, National Cheng Kung University, Tainan City 701401, Taiwan;
- Patricia Angela R. Abu
- Ateneo Laboratory for Intelligent Visual Environments, Department of Information Systems and Computer Science, Ateneo de Manila University, Quezon City 1108, Philippines;
11
Wakao H, Iizuka T, Shimizu A. Improvements in dementia classification for brain SPECT volumes using vision transformer and the Brodmann areas. Int J Comput Assist Radiol Surg 2025. [PMID: 40343640 DOI: 10.1007/s11548-025-03365-6]
Abstract
PURPOSE This study proposes a vision transformer (ViT)-based model for dementia classification, able to classify representative dementias (Alzheimer's disease, dementia with Lewy bodies, and frontotemporal dementia) as well as healthy controls using brain single-photon emission computed tomography (SPECT) images. The proposed method allows for an input based on the anatomical structure of the brain and the efficient use of five different SPECT images. METHODS The proposed model comprises a linear projection of input patches, eight transformer encoder layers, and a multilayer perceptron for classification with the following features: 1. diverse feature extraction with a multi-head structure for five different SPECT images; 2. Brodmann area-based input patches reflecting the anatomical structure of the brain; 3. cross-attention for fusing diverse features. RESULTS The proposed method achieved a classification accuracy of 85.89% for 418 SPECT images from real clinical cases, significantly outperforming previous studies. Ablation studies were conducted to investigate the validity of each contribution, in which the consistency between the model's attention map and the physician's attention region was analyzed in detail. CONCLUSION The proposed ViT-based model demonstrated superior dementia classification accuracy compared to previous methods, and is thus expected to contribute to early diagnosis and treatment of dementia using SPECT imaging. In the future, we aim to further improve the accuracy through the incorporation of patient clinical information.
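The Brodmann area-based input is the distinctive element here; a toy sketch of one plausible way to turn an atlas labelmap and a SPECT volume into per-region tokens (everything below is an assumption for illustration, not the paper's embedding):

```python
import numpy as np

def region_tokens(volume: np.ndarray, atlas: np.ndarray, n_regions: int) -> np.ndarray:
    """Mean uptake per atlas region -> one token per Brodmann area (toy feature only)."""
    tokens = np.zeros(n_regions, dtype=np.float32)
    for region in range(1, n_regions + 1):
        mask = atlas == region
        tokens[region - 1] = volume[mask].mean() if mask.any() else 0.0
    return tokens

# Toy volume and atlas with 5 "Brodmann areas"; five SPECT images would yield a (5, n_regions) token matrix.
rng = np.random.default_rng(1)
spect = rng.random((16, 16, 16))
atlas = rng.integers(0, 6, size=(16, 16, 16))   # label 0 = background
print(region_tokens(spect, atlas, n_regions=5))
```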
Affiliation(s)
- Hirotaka Wakao
- Institute of Engineering, Tokyo University of Agriculture and Technology, Koganei, Tokyo, Japan.
- Akinobu Shimizu
- Institute of Engineering, Tokyo University of Agriculture and Technology, Koganei, Tokyo, Japan.
12
Liu S, Lin Y, Yan R, Wang Z, Bold D, Hu X. Leveraging Artificial Intelligence for Digital Symptom Management in Oncology: The Development of CRCWeb. JMIR Cancer 2025; 11:e68516. [PMID: 40324958 DOI: 10.2196/68516]
Abstract
Digital health interventions offer promise for scalable and accessible healthcare, but access is still limited by some participatory challenges, especially for disadvantaged families facing limited health literacy, language barriers, low income, or living in marginalized areas. These issues are particularly pronounced for colorectal cancer (CRC) patients, who often experience distressing symptoms and struggle with educational materials due to complex jargon, fatigue, or reading level mismatches. To address these issues, we developed and assessed the feasibility of a digital health platform, CRCWeb, to improve the accessibility of educational resources on symptom management for disadvantaged CRC patients and their caregivers facing limited health literacy or low income. CRCWeb was developed through a stakeholder-centered participatory design approach. Two-phase semi-structured interviews with patients, caregivers, and oncology experts informed the iterative design process. From the interviews, we developed the following five key design principles: user-friendly navigation, multimedia integration, concise and clear content, enhanced accessibility for individuals with vision and reading disabilities, and scalability for future content expansion. Initial feedback from iterative stakeholder engagements confirmed high user satisfaction, with participants rating CRCWeb an average of 3.98 out of 5 on the post-intervention survey. Additionally, using generative AI (GenAI) tools, including large language models (LLMs) like ChatGPT and multimedia generation tools such as Pictory, complex healthcare guidelines were transformed into concise, easily comprehensible multimedia content, and made accessible through CRCWeb. User engagement was notably higher among disadvantaged participants with limited health literacy or low income, who logged into the platform 2.52 times more frequently than non-disadvantaged participants. The structured development approach of CRCWeb demonstrates that GenAI-powered multimedia interventions can effectively address healthcare accessibility barriers faced by disadvantaged CRC patients and caregivers with limited health literacy or low income. This structured approach highlights how digital innovations can enhance healthcare. INTERNATIONAL REGISTERED REPORT RR2-10.2196/48499.
Affiliation(s)
- Sizuo Liu
- Department of Computer Science, Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, US
- Yufen Lin
- Nell Hodgson Woodruff School of Nursing, Winship Cancer Institute, Emory University, Atlanta, US
- Runze Yan
- Center for Data Science, Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GE
- Zhiyuan Wang
- Department of Systems and Information Engineering, University of Virginia, Charlottesville, US
- Delgersuren Bold
- Center for Data Science, Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GE
- Xiao Hu
- Center for Data Science, Nell Hodgson Woodruff School of Nursing, Emory University, 1520 Clifton Rd, Atlanta, US
13
Zhong Z, Li J, Sollee J, Collins S, Bai H, Zhang P, Healey T, Atalay M, Gao X, Jiao Z. Multi-Modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation. IEEE J Biomed Health Inform 2025; 29:3293-3303. [PMID: 38905090 DOI: 10.1109/jbhi.2024.3417849]
Abstract
In response to the worldwide COVID-19 pandemic, advanced automated technologies have emerged as valuable tools to aid healthcare professionals in managing an increased workload by improving radiology report generation and prognostic analysis. This study proposes a Multi-modality Regional Alignment Network (MRANet), an explainable model for radiology report generation and survival prediction that focuses on high-risk regions. By learning spatial correlation in the detector, MRANet visually grounds region-specific descriptions, providing robust anatomical regions with a completion strategy. The visual features of each region are embedded using a novel survival attention mechanism, offering spatially and risk-aware features for sentence encoding while maintaining global coherence across tasks. A cross-domain LLMs-Alignment is employed to enhance the image-to-text transfer process, resulting in sentences rich with clinical detail and improved explainability for radiologists. Multi-center experiments validate the overall performance and each module's composition within the model, encouraging further advancements in radiology report generation research emphasizing clinical interpretation and trustworthiness in AI models applied to medical studies.
14
Lou M, Ying H, Liu X, Zhou HY, Zhang Y, Yu Y. SDR-Former: A Siamese Dual-Resolution Transformer for liver lesion classification using 3D multi-phase imaging. Neural Netw 2025; 185:107228. [PMID: 39908910 DOI: 10.1016/j.neunet.2025.107228]
Abstract
Automated classification of liver lesions in multi-phase CT and MR scans is of clinical significance but challenging. This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework, specifically designed for liver lesion classification in 3D multi-phase CT and MR imaging with varying phase counts. The proposed SDR-Former utilizes a streamlined Siamese Neural Network (SNN) to process multi-phase imaging inputs, possessing robust feature representations while maintaining computational efficiency. The weight-sharing feature of the SNN is further enriched by a hybrid Dual-Resolution Transformer (DR-Former), comprising a 3D Convolutional Neural Network (CNN) and a tailored 3D Transformer for processing high- and low-resolution images, respectively. This hybrid sub-architecture excels in capturing detailed local features and understanding global contextual information, thereby, boosting the SNN's feature extraction capabilities. Additionally, a novel Adaptive Phase Selection Module (APSM) is introduced, promoting phase-specific intercommunication and dynamically adjusting each phase's influence on the diagnostic outcome. The proposed SDR-Former framework has been validated through comprehensive experiments on two clinically collected datasets: a 3-phase CT dataset and an 8-phase MR dataset. The experimental results affirm the efficacy of the proposed framework. To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public. This pioneering dataset, being the first publicly available multi-phase MR dataset in this field, also underpins the MICCAI LLD-MMRI Challenge. The dataset is publicly available at: https://github.com/LMMMEng/LLD-MMRI-Dataset.
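A minimal PyTorch sketch of the weight-sharing idea behind a Siamese multi-phase encoder: one backbone processes every contrast phase and the per-phase features are then combined, here by simple averaging rather than the paper's adaptive phase selection module (all layer sizes and names are assumptions):

```python
import torch
import torch.nn as nn

class SiamesePhaseEncoder(nn.Module):
    """One shared 3D CNN encoder applied to every imaging phase (sketch, not SDR-Former itself)."""
    def __init__(self, feat_dim: int = 64, n_classes: int = 7):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(16, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, phases):
        # phases: (B, P, 1, D, H, W) with a variable number of phases P
        feats = torch.stack([self.backbone(phases[:, p]) for p in range(phases.shape[1])], dim=1)
        return self.classifier(feats.mean(dim=1))   # naive fusion in place of the APSM

print(SiamesePhaseEncoder()(torch.randn(2, 3, 1, 16, 32, 32)).shape)  # torch.Size([2, 7])
```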
Affiliation(s)
- Meng Lou
- School of Computing and Data Science, The University of Hong Kong, Hong Kong SAR, China; AI Lab, Deepwise Healthcare, Beijing, China.
- Hanning Ying
- Department of General Surgery, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Hong-Yu Zhou
- School of Computing and Data Science, The University of Hong Kong, Hong Kong SAR, China; Department of Biomedical Informatics, Harvard Medical School, Boston, USA.
- Yuqin Zhang
- Department of Radiology, The Affiliated LiHuiLi Hospital of Ningbo University, Ningbo, Zhejiang, China.
- Yizhou Yu
- School of Computing and Data Science, The University of Hong Kong, Hong Kong SAR, China.
15
Nair A, Ong W, Lee A, Leow NW, Makmur A, Ting YH, Lee YJ, Ong SJ, Tan JJH, Kumar N, Hallinan JTPD. Enhancing Radiologist Productivity with Artificial Intelligence in Magnetic Resonance Imaging (MRI): A Narrative Review. Diagnostics (Basel) 2025; 15:1146. [PMID: 40361962 PMCID: PMC12071790 DOI: 10.3390/diagnostics15091146]
Abstract
Artificial intelligence (AI) shows promise in streamlining MRI workflows by reducing radiologists' workload and improving diagnostic accuracy. Despite MRI's extensive clinical use, systematic evaluation of AI-driven productivity gains in MRI remains limited. This review addresses that gap by synthesizing evidence on how AI can shorten scanning and reading times, optimize worklist triage, and automate segmentation. On 15 November 2024, we searched PubMed, EMBASE, MEDLINE, Web of Science, Google Scholar, and Cochrane Library for English-language studies published between 2000 and 15 November 2024, focusing on AI applications in MRI. Additional searches of grey literature were conducted. After screening for relevance and full-text review, 67 studies met inclusion criteria. Extracted data included study design, AI techniques, and productivity-related outcomes such as time savings and diagnostic accuracy. The included studies were categorized into five themes: reducing scan times, automating segmentation, optimizing workflow, decreasing reading times, and general time-saving or workload reduction. Convolutional neural networks (CNNs), especially architectures like ResNet and U-Net, were commonly used for tasks ranging from segmentation to automated reporting. A few studies also explored machine learning-based automation software and, more recently, large language models. Although most demonstrated gains in efficiency and accuracy, limited external validation and dataset heterogeneity could reduce broader adoption. AI applications in MRI offer potential to enhance radiologist productivity, mainly through accelerated scans, automated segmentation, and streamlined workflows. Further research, including prospective validation and standardized metrics, is needed to enable safe, efficient, and equitable deployment of AI tools in clinical MRI practice.
Affiliation(s)
- Arun Nair
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore; (A.N.); (W.O.); (A.L.); (A.M.); (Y.H.T.); (Y.J.L.); (S.J.O.)
- Wilson Ong
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore; (A.N.); (W.O.); (A.L.); (A.M.); (Y.H.T.); (Y.J.L.); (S.J.O.)
- Aric Lee
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore; (A.N.); (W.O.); (A.L.); (A.M.); (Y.H.T.); (Y.J.L.); (S.J.O.)
- Naomi Wenxin Leow
- AIO Innovation Office, National University Health System, 3 Research Link, #02-04 Innovation 4.0, Singapore 117602, Singapore;
- Andrew Makmur
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore; (A.N.); (W.O.); (A.L.); (A.M.); (Y.H.T.); (Y.J.L.); (S.J.O.)
- Department of Diagnostic Radiology, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore 117597, Singapore
| | - Yong Han Ting
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore; (A.N.); (W.O.); (A.L.); (A.M.); (Y.H.T.); (Y.J.L.); (S.J.O.)
- Department of Diagnostic Radiology, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore 117597, Singapore
| | - You Jun Lee
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore; (A.N.); (W.O.); (A.L.); (A.M.); (Y.H.T.); (Y.J.L.); (S.J.O.)
| | - Shao Jin Ong
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore; (A.N.); (W.O.); (A.L.); (A.M.); (Y.H.T.); (Y.J.L.); (S.J.O.)
- Department of Diagnostic Radiology, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore 117597, Singapore
| | - Jonathan Jiong Hao Tan
- National University Spine Institute, Department of Orthopaedic Surgery, National University Health System, 1E Lower Kent Ridge Road, Singapore 119228, Singapore; (J.J.H.T.); (N.K.)
| | - Naresh Kumar
- National University Spine Institute, Department of Orthopaedic Surgery, National University Health System, 1E Lower Kent Ridge Road, Singapore 119228, Singapore; (J.J.H.T.); (N.K.)
| | - James Thomas Patrick Decourcy Hallinan
- Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore; (A.N.); (W.O.); (A.L.); (A.M.); (Y.H.T.); (Y.J.L.); (S.J.O.)
- Department of Diagnostic Radiology, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore 117597, Singapore
| |
Collapse
|
16
|
Chu Q, Wang X, Lv H, Zhou Y, Jiang T. Vision transformer-based diagnosis of lumbar disc herniation with grad-CAM interpretability in CT imaging. BMC Musculoskelet Disord 2025; 26:419. [PMID: 40301802 PMCID: PMC12039304 DOI: 10.1186/s12891-025-08602-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/17/2025] [Accepted: 03/31/2025] [Indexed: 05/01/2025] Open
Abstract
BACKGROUND In this study, a computed tomography (CT)-vision transformer (ViT) framework for diagnosing lumbar disc herniation (LDH) was proposed for the first time, combining the multidirectional imaging capability of CT with the global modeling strength of a ViT. METHODS The proposed ViT model was trained and validated on a dataset of 2100 CT images from 983 patients. We compared the performance of the ViT model with that of several convolutional neural networks (CNNs), including ResNet18, ResNet50, LeNet, AlexNet, and VGG16, across two primary tasks: vertebra localization and disc abnormality classification. RESULTS The integration of a ViT with CT imaging allowed the constructed model to capture the complex spatial relationships and global dependencies within scans, outperforming CNN models and achieving accuracies of 97.13% and 93.63% for vertebra localization and disc abnormality classification, respectively. Model predictions were further examined via gradient-weighted class activation mapping (Grad-CAM), providing interpretable insights into the regions of the CT scans that contributed to the predictions. CONCLUSION This study demonstrated the potential of a ViT for diagnosing LDH using CT imaging. The results highlight the promising clinical applications of this approach, particularly for enhancing the diagnostic efficiency and transparency of medical AI systems.
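For orientation, the sketch below shows the generic pattern such a classifier follows: split an image into patches, encode the patch tokens with a transformer, and classify from a class token. The patch size, depth, head count, and single-channel input are illustrative assumptions, not the configuration reported in the study.

```python
# Minimal ViT-style classifier for single-channel CT slices (illustrative sketch only).
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=6, heads=8, num_classes=2):
        super().__init__()
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)  # CT = 1 channel
        n_patches = (img_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        tok = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tok = torch.cat([cls, tok], dim=1) + self.pos_embed
        tok = self.encoder(tok)
        return self.head(tok[:, 0])                            # classify from the [CLS] token

model = TinyViT()
logits = model(torch.randn(2, 1, 224, 224))   # two dummy CT slices
print(logits.shape)                           # torch.Size([2, 2])
```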
Collapse
Affiliation(s)
- Qingsong Chu
- The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China
- Anhui University of Chinese Medicine, Hefei, China
| | - Xingyu Wang
- The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China
- Anhui University of Chinese Medicine, Hefei, China
| | - Hao Lv
- The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China
- Anhui University of Chinese Medicine, Hefei, China
| | - Yao Zhou
- The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China
- Anhui University of Chinese Medicine, Hefei, China
| | - Ting Jiang
- The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China.
| |
Collapse
|
17
|
Li W, Xia J, Gao W, Hu Z, Nie S, Li Y. Dual-way magnetic resonance image translation with transformer-based adversarial network. Med Phys 2025. [PMID: 40270088 DOI: 10.1002/mp.17837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Accepted: 04/05/2025] [Indexed: 04/25/2025] Open
Abstract
BACKGROUND The magnetic resonance (MR) image translation model is designed to generate MR images of a required sequence from images of an existing sequence. However, the generalization performance of MR image generation models on external datasets tends to be unsatisfactory due to the inconsistency in the data distribution of MR images across different centers or scanners. PURPOSE The aim of this study is to propose a cross-sequence MR image synthesis model that can generate high-quality synthetic MR images with high transferability to small external datasets. METHODS We proposed a dual-way magnetic resonance image translation model using a transformer-based adversarial network (DMTrans) for MR image synthesis across sequences. It integrates a transformer-based generative architecture with an innovative discriminator design. The shifted window-based multi-head self-attention mechanism in DMTrans enables efficient capture of global and local features from MR images. The sequential dual-scale discriminator is designed to distinguish features of the generated images at multiple scales. RESULTS We pre-trained the DMTrans model for bi-directional image synthesis on a T1/T2-weighted MR image dataset comprising 4229 slices. It demonstrates superior performance to baseline methods in both qualitative and quantitative evaluations. The SSIM, PSNR, and MAE metrics for synthetic T1 image generation from T2 images are 0.91 ± 0.04, 25.30 ± 2.40, and 24.65 ± 10.46, while the values are 0.90 ± 0.04, 24.72 ± 1.62, and 23.28 ± 7.40 for the opposite direction. Fine-tuning is then used to adapt the model to another public dataset with T1/T2/proton density-weighted (PD) images, so that only 6 patients (500 slices) are required for model adaptation to achieve high-quality T1/T2, T1/PD, and T2/PD image translation results. CONCLUSIONS The proposed DMTrans achieves state-of-the-art performance for cross-sequence MR image conversion, which could provide additional information to assist clinical diagnosis and treatment. It also offers a versatile and efficient solution for high-quality MR image synthesis under data-scarce conditions at different centers.
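As background, the sketch below shows one training step of a paired image-translation GAN with an adversarial plus L1 objective, the general recipe adversarial MR translation models build on. The tiny convolutional generator and discriminator are placeholders, not the transformer architecture or dual-scale discriminator described in the paper.

```python
# One illustrative training step for paired T2 -> T1 translation (adversarial + L1).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 1, 3, padding=1))                 # T2 -> synthetic T1
D = nn.Sequential(nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(32, 1, 4, stride=2, padding=1))       # patch-level real/fake scores
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
adv, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

t2, t1 = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)     # dummy paired slices

# Discriminator step: real T1 vs generated T1.
fake = G(t2).detach()
d_loss = adv(D(t1), torch.ones_like(D(t1))) + adv(D(fake), torch.zeros_like(D(fake)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool D while staying close to the ground-truth T1.
fake = G(t2)
g_loss = adv(D(fake), torch.ones_like(D(fake))) + 100.0 * l1(fake, t1)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```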
Collapse
Affiliation(s)
- Wenxin Li
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, PR China
| | - Jun Xia
- Department of Radiology, The First Affiliated Hospital of Shenzhen University, Shenzhen University, Shenzhen Second People's Hospital, Shenzhen, PR China
| | - Weilin Gao
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, PR China
| | - Zaiqi Hu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, PR China
| | - Shengdong Nie
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, PR China
| | - Yafen Li
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, PR China
| |
Collapse
|
18
|
Liu Z, Yang H, Nie L, Xian P, Chen J, Huang J, Yao Z, Yuan T. Prediction of Tumor Budding Grading in Rectal Cancer Using a Multiparametric MRI Radiomics Combined with a 3D Vision Transformer Deep Learning Approach. Acad Radiol 2025:S1076-6332(25)00282-X. [PMID: 40246672 DOI: 10.1016/j.acra.2025.03.046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2025] [Revised: 03/19/2025] [Accepted: 03/21/2025] [Indexed: 04/19/2025]
Abstract
RATIONALE AND OBJECTIVES The objective is to assess the effectiveness of a multiparametric MRI radiomics strategy combined with a 3D Vision Transformer (ViT) deep learning (DL) model in predicting tumor budding (TB) grading in individuals diagnosed with rectal cancer (RC). MATERIALS AND METHODS This retrospective study analyzed data from 349 patients diagnosed with rectal adenocarcinoma across two hospitals. A total of 267 patients from our institution were randomly allocated to a training cohort (n=187) or an internal test cohort (n=80) in a 7:3 ratio. Furthermore, a cohort of 82 patients from another hospital was established for external testing purposes. Univariate and multivariate analyses were performed to pinpoint independent clinical risk factors, which were then utilized to develop a clinical model. Radiomics (Rad) models, a 3D ViT DL model, and a combined model (DLR) were built using 3D T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and contrast-enhanced T1-weighted imaging (T1CE). The evaluation of each model's predictive performance involved calculating the area under the curve (AUC), conducting the Delong test, and examining calibration curves alongside decision curve analysis (DCA). RESULTS No notable clinical characteristics were observed in either univariate or multivariate analyses, hindering the establishment of a clinical model. The DLR model demonstrated exceptional performance, attaining an AUC of 0.938 (95% CI: 0.906-0.969) within the training cohort, 0.867 (95% CI: 0.779-0.954) in the internal test cohort, and 0.824 (95% CI: 0.734-0.914) in the external test cohort. CONCLUSION The combination of multiparametric MRI radiomics and 3D ViT DL effectively and non-invasively predicts TB grading in RC patients, offering valuable insights for personalized treatment planning and prognosis assessment.
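As a rough illustration of the modeling pattern, the sketch below concatenates radiomics-style and deep-feature vectors and scores a simple classifier by ROC AUC. The synthetic feature matrices and the logistic-regression head are assumptions for demonstration; they are not the study's DLR pipeline.

```python
# Combining hand-crafted (radiomics) and deep features, then evaluating by ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
radiomics = rng.normal(size=(300, 50))      # hand-crafted features per patient
deep_feats = rng.normal(size=(300, 128))    # features from a 3D ViT / CNN encoder
y = rng.integers(0, 2, size=300)            # binary tumor budding grade

X = np.concatenate([radiomics, deep_feats], axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```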
Collapse
Affiliation(s)
- Zhanhong Liu
- CT/MRI Department, Beijing Anzhen Nanchong Hospital, Capital Medical University & Nanchong Central Hospital, No.97, People's South Road, Nanchong, China
| | - Hao Yang
- CT/MRI Department, Beijing Anzhen Nanchong Hospital, Capital Medical University & Nanchong Central Hospital, No.97, People's South Road, Nanchong, China.
| | - Lin Nie
- CT/MRI Department, Beijing Anzhen Nanchong Hospital, Capital Medical University & Nanchong Central Hospital, No.97, People's South Road, Nanchong, China
| | - Peng Xian
- CT/MRI Department, Beijing Anzhen Nanchong Hospital, Capital Medical University & Nanchong Central Hospital, No.97, People's South Road, Nanchong, China
| | - Junfan Chen
- CT/MRI Department, Beijing Anzhen Nanchong Hospital, Capital Medical University & Nanchong Central Hospital, No.97, People's South Road, Nanchong, China
| | - Jianru Huang
- CT/MRI Department, Beijing Anzhen Nanchong Hospital, Capital Medical University & Nanchong Central Hospital, No.97, People's South Road, Nanchong, China
| | - Zhengkang Yao
- CT/MRI Department, Beijing Anzhen Nanchong Hospital, Capital Medical University & Nanchong Central Hospital, No.97, People's South Road, Nanchong, China
| | - Tianqi Yuan
- CT/MRI Department, Beijing Anzhen Nanchong Hospital, Capital Medical University & Nanchong Central Hospital, No.97, People's South Road, Nanchong, China
| |
Collapse
|
19
|
Pak S, Son HJ, Kim D, Woo JY, Yang I, Hwang HS, Rim D, Choi MS, Lee SH. Comparison of CNNs and Transformer Models in Diagnosing Bone Metastases in Bone Scans Using Grad-CAM. Clin Nucl Med 2025:00003072-990000000-01645. [PMID: 40237349 DOI: 10.1097/rlu.0000000000005898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2025] [Accepted: 03/09/2025] [Indexed: 04/18/2025]
Abstract
PURPOSE Convolutional neural networks (CNNs) have been studied for detecting bone metastases on bone scans; however, the application of ConvNeXt and transformer models has not yet been explored. This study aims to evaluate the performance of various deep learning models, including the ConvNeXt and transformer models, in diagnosing metastatic lesions from bone scans. MATERIALS AND METHODS We retrospectively analyzed bone scans from patients with cancer obtained at 2 institutions: the training and validation sets (n=4626) were from Hospital 1 and the test set (n=1428) was from Hospital 2. The deep learning models evaluated included ResNet18, the Data-Efficient Image Transformer (DeiT), the Vision Transformer (ViT Large 16), the Swin Transformer (Swin Base), and ConvNeXt Large. Gradient-weighted class activation mapping (Grad-CAM) was used for visualization. RESULTS Both the validation set and the test set demonstrated that the ConvNeXt large model (0.969 and 0.885, respectively) exhibited the best performance, followed by the Swin Base model (0.965 and 0.840, respectively), both of which significantly outperformed ResNet (0.892 and 0.725, respectively). Subgroup analyses revealed that all the models demonstrated greater diagnostic accuracy for patients with polymetastasis compared with those with oligometastasis. Grad-CAM visualization revealed that the ConvNeXt Large model focused more on identifying local lesions, whereas the Swin Base model focused on global areas such as the axial skeleton and pelvis. CONCLUSIONS Compared with traditional CNN and transformer models, the ConvNeXt model demonstrated superior diagnostic performance in detecting bone metastases from bone scans, especially in cases of polymetastasis, suggesting its potential in medical image analysis.
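The Grad-CAM visualization mentioned above can be sketched as follows: weight the last convolutional feature maps by the gradient of the predicted class score and keep the positive part. The toy CNN here is a stand-in for the far larger ConvNeXt, Swin, and ResNet networks used in the study, and the feature map is exposed directly rather than captured with hooks.

```python
# Minimal Grad-CAM sketch on a toy CNN (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.fc = nn.Linear(32, 2)
    def forward(self, x):
        f = self.features(x)
        return self.fc(f.mean(dim=(2, 3))), f        # logits + last feature maps

model = ToyCNN().eval()
x = torch.randn(1, 1, 64, 64)
logits, fmap = model(x)
fmap.retain_grad()
logits[0, logits.argmax()].backward()                # gradient of the predicted class score

weights = fmap.grad.mean(dim=(2, 3), keepdim=True)   # channel-wise importance
cam = F.relu((weights * fmap).sum(dim=1))            # Grad-CAM heat map
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
print(cam.shape)                                     # torch.Size([1, 64, 64])
```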
Collapse
Affiliation(s)
- Sehyun Pak
- Department of Medicine, Hallym University College of Medicine, Chuncheon, Gangwon, Republic of Korea
| | - Hye Joo Son
- Department of Nuclear Medicine, Dankook University Medical Center, Cheonan, Chungnam, Republic of Korea
| | - Dongwoo Kim
- Department of Nuclear Medicine, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Gyeonggi, Republic of Korea
| | - Ji Young Woo
- Department of Radiology, Hallym University Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea
| | - Ik Yang
- Department of Radiology, Hallym University Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea
| | - Hee Sung Hwang
- Department of Nuclear Medicine, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Gyeonggi, Republic of Korea
| | | | - Min Seok Choi
- PE Data Solution, SK hynix, Icheon, Gyeonggi, Republic of Korea
| | - Suk Hyun Lee
- Department of Radiology, Hallym University Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea
| |
Collapse
|
20
|
Lu J, Liu X, Ji X, Jiang Y, Zuo A, Guo Z, Yang S, Peng H, Sun F, Lu D. Predicting PD-L1 status in NSCLC patients using deep learning radiomics based on CT images. Sci Rep 2025; 15:12495. [PMID: 40216830 PMCID: PMC11992188 DOI: 10.1038/s41598-025-91575-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2024] [Accepted: 02/21/2025] [Indexed: 04/14/2025] Open
Abstract
Radiomics refers to the utilization of automated or semi-automated techniques to extract and analyze numerous quantitative features from medical images, such as computerized tomography (CT) or magnetic resonance imaging (MRI) scans. This study aims to develop a deep learning radiomics (DLR)-based approach for predicting programmed death-ligand 1 (PD-L1) expression in patients with non-small cell lung cancer (NSCLC). Data from 352 NSCLC patients with known PD-L1 expression were collected, of which 48.29% (170/352) were tested positive for PD-L1 expression. Tumor regions of interest (ROI) were semi-automatically segmented based on CT images, and DL features were extracted using Residual Network 50. The least absolute shrinkage and selection operator (LASSO) algorithm was used for feature selection and dimensionality reduction. Seven algorithms were used to build models, and the most optimal ones were identified. A combined model integrating DLR with clinical data was also developed. The predictive performance of each model was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve analysis. The DLR model, based on CT images, demonstrated an AUC of 0.85 (95% confidence interval (CI), 0.82-0.88), sensitivity of 0.80 (0.74-0.85), and specificity of 0.73 (0.70-0.77) for predicting PD-L1 status. The integrated model exhibited superior performance, with an AUC of 0.91 (0.87-0.95), sensitivity of 0.85 (0.82-0.89), and specificity of 0.75 (0.72-0.80). Our findings indicate that the DLR model holds promise as a valuable tool for predicting the PD-L1 status in patients with NSCLC, which can greatly assist in clinical decision-making and the selection of personalized treatment strategies.
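A minimal sketch of the feature-selection stage described above: L1-penalized (LASSO-style) selection over extracted deep features, followed by a simple classifier and AUC evaluation. The synthetic feature matrix and the specific estimators are assumptions, not the study's exact pipeline or its seven candidate algorithms.

```python
# L1-based feature selection over deep features, then classification and AUC.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(352, 2048))            # ResNet50-sized feature vectors (synthetic)
y = rng.integers(0, 2, size=352)            # PD-L1 positive / negative labels (synthetic)

lasso_select = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
pipe = make_pipeline(StandardScaler(), lasso_select, LogisticRegression(max_iter=1000))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pipe.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1]))
```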
Collapse
Affiliation(s)
- Jiameng Lu
- Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China
- Faculty of Medicine, Macau University of Science and Technology, Avenida Wai Long, Taipa, 999078, Macau Special Administrative Region, People's Republic of China
| | - Xinyi Liu
- Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China
| | - Xiaoqing Ji
- Department of Nursing, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, 250014, Shandong, China
| | - Yunxiu Jiang
- Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China
| | - Anli Zuo
- Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China
| | - Zihan Guo
- Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China
| | - Shuran Yang
- Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China
| | - Haiying Peng
- Department of Respiratory and Critical Care Medicine, The Second People's Hospital of Yibin City, 644002, Yibin, People's Republic of China
| | - Fei Sun
- Department of Respiratory and Critical Care Medicine, Jining No.1 People's Hospital, 272000, Jining, People's Republic of China
| | - Degan Lu
- Department of Respiratory, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Institute of Respiratory Diseases, Shandong Institute of Anesthesia and Respiratory Critical Medicine, 16766 Jingshilu, Lixia, Jinan, 250014, Shandong, People's Republic of China.
| |
Collapse
|
21
|
Zheng S, Ye X, Yang C, Yu L, Li W, Gao X, Zhao Y. Asymmetric Adaptive Heterogeneous Network for Multi-Modality Medical Image Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:1836-1852. [PMID: 40031190 DOI: 10.1109/tmi.2025.3526604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Existing studies of multi-modality medical image segmentation tend to aggregate all modalities without discrimination and employ multiple symmetric encoders or decoders for feature extraction and fusion. They often overlook the differing contributions that individual modalities make to visual representation and intelligent decision making. Motivated by this observation, this paper proposes an asymmetric adaptive heterogeneous network for multi-modality image feature extraction with modality discrimination and adaptive fusion. For feature extraction, it uses a heterogeneous two-stream asymmetric feature-bridging network to extract complementary features from auxiliary multi-modality and leading single-modality images, respectively. For adaptive feature fusion, the proposed Transformer-CNN Feature Alignment and Fusion (T-CFAF) module enhances the leading single-modality information, and the Cross-Modality Heterogeneous Graph Fusion (CMHGF) module further fuses multi-modality features adaptively at a high-level semantic layer. Comparative evaluation with ten segmentation models on six datasets demonstrates significant efficiency gains as well as highly competitive segmentation accuracy. (Our code is publicly available at https://github.com/joker-527/AAHN).
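To give a feel for transformer-CNN feature fusion in general, the sketch below lets a CNN feature map attend to transformer tokens via cross-attention and adds the result back residually. The shapes, dimensions, and the use of nn.MultiheadAttention are illustrative assumptions, not the paper's T-CFAF or CMHGF designs.

```python
# Hypothetical cross-attention fusion of a CNN feature map with transformer tokens.
import torch
import torch.nn as nn

B, C, H, W, N = 2, 64, 16, 16, 196
cnn_feat = torch.randn(B, C, H, W)          # leading-modality CNN features
trans_tokens = torch.randn(B, N, C)         # auxiliary-modality transformer tokens

attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)
queries = cnn_feat.flatten(2).transpose(1, 2)                    # (B, H*W, C)
fused, _ = attn(query=queries, key=trans_tokens, value=trans_tokens)
fused = (fused + queries).transpose(1, 2).reshape(B, C, H, W)    # residual + back to a map
print(fused.shape)                                               # torch.Size([2, 64, 16, 16])
```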
Collapse
|
22
|
Hang T, Fan D, Sun T, Chen Z, Yang X, Yue X. Deep Learning and Hyperspectral Imaging for Liver Cancer Staging and Cirrhosis Differentiation. JOURNAL OF BIOPHOTONICS 2025; 18:e202400557. [PMID: 39873135 DOI: 10.1002/jbio.202400557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/16/2024] [Revised: 01/14/2025] [Accepted: 01/16/2025] [Indexed: 01/30/2025]
Abstract
Liver malignancies, particularly hepatocellular carcinoma (HCC), pose a formidable global health challenge. Conventional diagnostic techniques frequently fall short in precision, especially at advanced HCC stages. In response, we have developed a novel diagnostic strategy that integrates hyperspectral imaging with deep learning. This innovative approach captures detailed spectral data from tissue samples, pinpointing subtle cellular differences that elude traditional methods. A sophisticated deep convolutional neural network processes this data, effectively distinguishing high-grade liver cancer from cirrhosis with an accuracy of 89.45%, a sensitivity of 90.29%, and a specificity of 88.64%. For HCC differentiation specifically, it achieves an impressive accuracy of 93.73%, sensitivity of 92.53%, and specificity of 90.07%. Our results underscore the potential of this technique as a precise, rapid, and non-invasive diagnostic tool that surpasses existing clinical methods in staging liver cancer and differentiating cirrhosis.
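The accuracy, sensitivity, and specificity figures reported above follow from a standard binary confusion matrix; the short sketch below shows the computation on synthetic labels and predictions.

```python
# Accuracy, sensitivity, and specificity from a binary confusion matrix (synthetic data).
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=200)                            # 1 = high-grade HCC, 0 = cirrhosis
y_pred = np.where(rng.random(200) < 0.9, y_true, 1 - y_true)     # ~90% correct predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
```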
Collapse
Affiliation(s)
- Tianyi Hang
- Nanjing University of Chinese Medicine, Nanjing, China
| | - Danfeng Fan
- Nanjing University of Chinese Medicine, Nanjing, China
| | - Tiefeng Sun
- Nanjing University of Chinese Medicine, Nanjing, China
- Shandong Academy of Chinese Medicine, Jinan, China
| | | | - Xiaoqing Yang
- Department of Pathology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
| | - Xiaoqing Yue
- Nanjing University of Chinese Medicine, Nanjing, China
- Yucheng People's Hospital, Dezhou, China
| |
Collapse
|
23
|
Shabani S, Sohaib M, Mohamed SA, Parvin B. COUPLED SWIN TRANSFORMERS AND MULTI-APERTURES NETWORK(CSTA-NET) IMPROVES MEDICAL IMAGE SEGMENTATION. PROCEEDINGS. IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING 2025; 2025:10.1109/ISBI60581.2025.10981294. [PMID: 40365016 PMCID: PMC12068877 DOI: 10.1109/isbi60581.2025.10981294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2025]
Abstract
Vision Transformers have outperformed traditional convolution-based frameworks across various visual tasks, including, but not limited to, the segmentation of 3D medical images. To further advance this area, this study introduces the Coupled Swin Transformers and Multi-Apertures Network (CSTA-Net), which integrates the outputs of each Swin Transformer with an Aperture Network. Each aperture network consists of a convolution and a fusion block for combining global and local feature maps. The proposed model was evaluated on two independent datasets, showing that fine details are delineated. Trained on the Synapse multi-organ and ACDC datasets, the architecture achieved average Dice scores of 90.19±0.05 and 93.77±0.04, respectively. The code is available at https://github.com/Siyavashshabani/CSTANet.
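For reference, the Dice similarity coefficient used above can be computed as in the short sketch below; the masks are synthetic binary volumes rather than Synapse or ACDC segmentations.

```python
# Dice similarity coefficient for binary segmentation masks.
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks of any shape."""
    pred, target = pred.float(), target.float()
    intersection = (pred * target).sum()
    return (2 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = torch.randint(0, 2, (1, 64, 64, 64))   # predicted organ mask (synthetic)
gt = torch.randint(0, 2, (1, 64, 64, 64))     # ground-truth mask (synthetic)
print(float(dice_score(pred, gt)))
```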
Collapse
Affiliation(s)
- Siyavash Shabani
- Department of Electrical and Biomedical Engineering, University of Nevada, Reno
| | - Muhammad Sohaib
- Department of Electrical and Biomedical Engineering, University of Nevada, Reno
| | - Sahar A Mohamed
- Department of Electrical and Biomedical Engineering, University of Nevada, Reno
| | - Bahram Parvin
- Department of Electrical and Biomedical Engineering, University of Nevada, Reno
| |
Collapse
|
24
|
Hu Y, Sirinukunwattana K, Li B, Gaitskell K, Domingo E, Bonnaffé W, Wojciechowska M, Wood R, Alham NK, Malacrino S, Woodcock DJ, Verrill C, Ahmed A, Rittscher J. Self-interactive learning: Fusion and evolution of multi-scale histomorphology features for molecular traits prediction in computational pathology. Med Image Anal 2025; 101:103437. [PMID: 39798526 DOI: 10.1016/j.media.2024.103437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2024] [Revised: 10/06/2024] [Accepted: 12/09/2024] [Indexed: 01/15/2025]
Abstract
Predicting disease-related molecular traits from histomorphology brings great opportunities for precision medicine. Despite the rich information present in histopathological images, extracting fine-grained molecular features from standard whole slide images (WSI) is non-trivial. The task is further complicated by the lack of annotations for subtyping and contextual histomorphological features that might span multiple scales. This work proposes a novel multiple-instance learning (MIL) framework capable of WSI-based cancer morpho-molecular subtyping by fusing features at different scales. Our method, termed Inter-MIL, follows a weakly supervised scheme. It enables training of the patch-level encoder for WSI in a task-aware optimisation procedure, a step normally not modelled in most existing MIL-based WSI analysis frameworks. We demonstrate that optimising the patch-level encoder is crucial to achieving high-quality fine-grained and tissue-level subtyping results and offers a significant improvement over task-agnostic encoders. Our approach deploys a pseudo-label propagation strategy to update the patch encoder iteratively, allowing discriminative subtype features to be learned. This mechanism also empowers extracting fine-grained attention within image tiles (the small patches), a task largely ignored in most existing weakly supervised frameworks. With Inter-MIL, we carried out four challenging cancer molecular subtyping tasks in the context of ovarian, colorectal, lung, and breast cancer. Extensive evaluation results show that Inter-MIL is a robust framework for cancer morpho-molecular subtyping with superior performance compared to several recently proposed methods, in small-dataset scenarios where fewer than 100 training slides are available. The iterative optimisation mechanism of Inter-MIL significantly improves the quality of the image features learned by the patch encoder and generally directs the attention map to areas that better align with experts' interpretation, leading to the identification of more reliable histopathology biomarkers. Moreover, an external validation cohort is used to verify the robustness of Inter-MIL on molecular trait prediction.
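The sketch below shows the generic attention-based MIL pooling pattern that frameworks of this kind build on: score each patch embedding, pool the bag (slide) with the resulting attention weights, and classify at the slide level. It is a simplified stand-in for Inter-MIL, not its implementation; the dimensions are assumptions.

```python
# Attention-based MIL pooling over patch embeddings of one whole slide image.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, in_dim=512, hidden=128, num_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.classifier = nn.Linear(in_dim, num_classes)

    def forward(self, patches):                               # patches: (num_patches, in_dim)
        weights = torch.softmax(self.attn(patches), dim=0)    # one weight per patch
        bag = (weights * patches).sum(dim=0)                  # slide-level embedding
        return self.classifier(bag), weights.squeeze(-1)

mil = AttentionMIL()
patch_embeddings = torch.randn(1000, 512)     # embeddings of 1000 tiles from one WSI
logits, attention = mil(patch_embeddings)
print(logits.shape, attention.shape)          # torch.Size([2]) torch.Size([1000])
```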
Collapse
Affiliation(s)
- Yang Hu
- Nuffield Department of Medicine, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.
| | - Korsuk Sirinukunwattana
- Department of Engineering Science, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Bin Li
- Department of Engineering Science, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Kezia Gaitskell
- Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, Oxford, UK; Department of Cellular Pathology, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Enric Domingo
- Department of Oncology, University of Oxford, Oxford, UK
| | - Willem Bonnaffé
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
| | - Marta Wojciechowska
- Nuffield Department of Medicine, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Ruby Wood
- Department of Engineering Science, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Nasullah Khalid Alham
- Department of Engineering Science, University of Oxford, Oxford, UK; Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
| | - Stefano Malacrino
- Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
| | - Dan J Woodcock
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
| | - Clare Verrill
- Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK; Department of Cellular Pathology, Oxford University Hospitals NHS Foundation Trust, Oxford, UK; Oxford National Institute for Health Research (NIHR) Biomedical Research Centre, Oxford, UK
| | - Ahmed Ahmed
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK; Nuffield Department of Womenś and Reproductive Health, University of Oxford, Oxford, UK; Oxford National Institute for Health Research (NIHR) Biomedical Research Centre, Oxford, UK
| | - Jens Rittscher
- Nuffield Department of Medicine, University of Oxford, Oxford, UK; Department of Engineering Science, University of Oxford, Oxford, UK; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Ludwig Institute for Cancer Research, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK; Oxford National Institute for Health Research (NIHR) Biomedical Research Centre, Oxford, UK.
| |
Collapse
|
25
|
Huang X, Qin M, Fang M, Wang Z, Hu C, Zhao T, Qin Z, Zhu H, Wu L, Yu G, De Cobelli F, Xie X, Palumbo D, Tian J, Dong D. The application of artificial intelligence in upper gastrointestinal cancers. JOURNAL OF THE NATIONAL CANCER CENTER 2025; 5:113-131. [PMID: 40265096 PMCID: PMC12010392 DOI: 10.1016/j.jncc.2024.12.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2024] [Revised: 09/17/2024] [Accepted: 12/20/2024] [Indexed: 04/24/2025] Open
Abstract
Upper gastrointestinal cancers, mainly comprising esophageal and gastric cancers, are among the most prevalent cancers worldwide. There are many new cases of upper gastrointestinal cancers annually, and the survival rate tends to be low. Therefore, timely screening, precise diagnosis, appropriate treatment strategies, and effective prognosis are crucial for patients with upper gastrointestinal cancers. In recent years, an increasing number of studies suggest that artificial intelligence (AI) technology can effectively address clinical tasks related to upper gastrointestinal cancers. These studies mainly focus on four aspects: screening, diagnosis, treatment, and prognosis. In this review, we focus on the application of AI technology in clinical tasks related to upper gastrointestinal cancers. Firstly, the basic application pipelines of radiomics and deep learning in medical image analysis were introduced. Furthermore, we separately reviewed the application of AI technology in the aforementioned aspects for both esophageal and gastric cancers. Finally, the current limitations and challenges faced in the field of upper gastrointestinal cancers were summarized, and explorations were conducted on the selection of AI algorithms in various scenarios, the popularization of early screening, the clinical applications of AI, and large multimodal models.
Collapse
Affiliation(s)
- Xiaoying Huang
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| | - Minghao Qin
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- University of Science and Technology Beijing, Beijing, China
| | - Mengjie Fang
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China
- Key Laboratory of Big Data-Based Precision Medicine, Beihang University, Ministry of Industry and Information Technology, Beijing, China
| | - Zipei Wang
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| | - Chaoen Hu
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| | - Tongyu Zhao
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- University of Science and Technology of China, Hefei, China
| | - Zhuyuan Qin
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Beijing University of Chinese Medicine, Beijing, China
| | | | - Ling Wu
- KiangWu Hospital, Macau, China
| | | | | | | | - Diego Palumbo
- Department of Radiology, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Jie Tian
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China
- Key Laboratory of Big Data-Based Precision Medicine, Beihang University, Ministry of Industry and Information Technology, Beijing, China
| | - Di Dong
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| |
Collapse
|
26
|
Aburass S, Dorgham O, Al Shaqsi J, Abu Rumman M, Al-Kadi O. Vision Transformers in Medical Imaging: a Comprehensive Review of Advancements and Applications Across Multiple Diseases. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025:10.1007/s10278-025-01481-y. [PMID: 40164818 DOI: 10.1007/s10278-025-01481-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/27/2024] [Revised: 02/15/2025] [Accepted: 03/11/2025] [Indexed: 04/02/2025]
Abstract
The rapid advancement of artificial intelligence techniques, particularly deep learning, has transformed medical imaging. This paper presents a comprehensive review of recent research that leverages vision transformer (ViT) models for medical image classification across various disciplines. The medical fields of focus include breast cancer, skin lesions, brain tumors on magnetic resonance imaging, lung diseases, retinal and eye analysis, COVID-19, heart diseases, colon cancer, brain disorders, diabetic retinopathy, skin diseases, kidney diseases, lymph node diseases, and bone analysis. Each work is critically analyzed and interpreted with respect to its performance, data preprocessing methodologies, model architecture, transfer learning techniques, model interpretability, and identified challenges. Our findings suggest that ViT shows promising results in the medical imaging domain, often outperforming traditional convolutional neural networks (CNNs). A comprehensive overview is presented in the form of figures and tables summarizing the key findings from each field. This paper provides critical insights into the current state of medical image classification using ViT and highlights potential future directions for this rapidly evolving research area.
Collapse
Affiliation(s)
- Sanad Aburass
- Department of Computer Science, Luther College, Decorah, IA, USA.
| | - Osama Dorgham
- Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
| | - Jamil Al Shaqsi
- Information Systems Department, Sultan Qaboos University, Seeb, Oman
| | - Maha Abu Rumman
- Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
| | - Omar Al-Kadi
- Artificial Intelligence Department, King Abdullah II School of Information Technology, University of Jordan, Amman, 11942, Jordan
| |
Collapse
|
27
|
Hu C, Cao N, Li X, He Y, Zhou H. CBCT-to-CT synthesis using a hybrid U-Net diffusion model based on transformers and information bottleneck theory. Sci Rep 2025; 15:10816. [PMID: 40155469 PMCID: PMC11953287 DOI: 10.1038/s41598-025-92094-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2024] [Accepted: 02/25/2025] [Indexed: 04/01/2025] Open
Abstract
Cone-beam computed tomography (CBCT) scans are widely used for real time monitoring and patient positioning corrections in image-guided radiation therapy (IGRT), enhancing the precision of radiation treatment. However, compared to high-quality computed tomography (CT) images, CBCT images suffer from severe artifacts and noise, which significantly hinder their application in IGRT. Therefore, synthesizing CBCT images into CT-like quality has become a critical necessity. In this study, we propose a hybrid U-Net diffusion model (HUDiff) based on Vision Transformer (ViT) and the information bottleneck theory to improve CBCT image quality. First, to address the limitations of the original U-Net in diffusion models, which primarily retain and transfer only local feature information, we introduce a ViT-based U-Net framework. By leveraging the self-attention mechanism, our model automatically focuses on different regions of the image during generation, aiming to better capture global features. Second, we incorporate a variational information bottleneck module at the base of the U-Net. This module filters out redundant and irrelevant information while compressing essential input data, thereby enabling more efficient summarization and better feature extraction. Finally, a dynamic modulation factor is introduced to balance the contributions of the main network and skip connections, optimizing the reverse denoising process in the diffusion model. We conducted extensive experiments on private Brain and Head & Neck datasets. The results, evaluated from multiple perspectives, demonstrate that our model outperforms state-of-the-art methods, validating its clinical applicability and robustness. In future clinical practice, our model has the potential to assist clinicians in formulating more precise radiation therapy plans.
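As background for the diffusion component, the sketch below shows the standard epsilon-prediction training step (forward noising plus an MSE loss on the predicted noise). The tiny convolutional denoiser is a placeholder for the hybrid ViT U-Net, and conditioning on the CBCT input and on the timestep is omitted for brevity.

```python
# One illustrative diffusion training step: forward noising + noise-prediction loss.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 1, 3, padding=1))        # predicts the added noise
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

x0 = torch.randn(4, 1, 64, 64)                        # clean (CT-like) training images
t = torch.randint(0, T, (4,))
a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
noise = torch.randn_like(x0)
x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward diffusion q(x_t | x_0)

loss = nn.functional.mse_loss(denoiser(x_t), noise)   # epsilon-prediction objective
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```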
Collapse
Affiliation(s)
- Can Hu
- School of Computer and Software, Hohai University, Nanjing, 211100, China
| | - Ning Cao
- School of Computer and Software, Hohai University, Nanjing, 211100, China
| | - Xiuhan Li
- School of Computer and Software, Hohai University, Nanjing, 211100, China
- Jiangsu Province Engineering Research Center of Smart Wearable and Rehabilitation Devices, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Engineering Research Center of Intelligent Theranostics Technology and Instruments, Ministry of Education, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
| | - Yang He
- School of Computer and Software, Hohai University, Nanjing, 211100, China
| | - Han Zhou
- Department of Radiation Oncology, The Fourth Affiliated Hospital of Nanjing Medical University, Nanjing, 210013, China.
- School of Electronic Science and Engineering, Nanjing University, Nanjing, 210046, China.
| |
Collapse
|
28
|
Zhu M, Zhang L, Wang L, Wang Z, Wang Y, Qian G. Local Extremum Mapping for Weak Supervision Learning on Mammogram Classification and Localization. Bioengineering (Basel) 2025; 12:325. [PMID: 40281685 PMCID: PMC12024162 DOI: 10.3390/bioengineering12040325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2025] [Revised: 03/06/2025] [Accepted: 03/10/2025] [Indexed: 04/29/2025] Open
Abstract
The early and accurate detection of breast lesions through mammography is crucial for improving survival rates. However, the existing deep learning-based methods often rely on costly pixel-level annotations, limiting their scalability in real-world applications. To address this issue, a novel local extremum mapping (LEM) mechanism is proposed for mammogram classification and weakly supervised lesion localization. The proposed method first divides the input mammogram into multiple regions and generates score maps through convolutional neural networks. Then, it identifies the most informative regions by filtering local extrema in the score maps and aggregating their scores for final classification. This strategy enables lesion localization with only image-level labels, significantly reducing annotation costs. Experiments on two public mammography datasets, CBIS-DDSM and INbreast, demonstrate that the proposed method achieves competitive performance. On the INbreast dataset, LEM improves classification accuracy to 96.3% with an AUC of 0.976. Furthermore, the proposed method effectively localizes lesions with a dice similarity coefficient of 0.37, outperforming Grad-CAM and other baseline approaches. These results highlight the practical significance and potential clinical applications of our approach, making automated mammogram analysis more accessible and efficient.
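The core mechanism can be sketched in a few lines: keep only the positions of a class score map that are local maxima of their neighbourhood, aggregate the top-scoring extrema into an image-level prediction, and let the surviving positions serve as a coarse localization map. The kernel size and the number of aggregated extrema below are illustrative assumptions.

```python
# Local-extremum filtering of a score map with top-k aggregation (illustrative sketch).
import torch
import torch.nn.functional as F

score_map = torch.randn(1, 1, 32, 32)                       # per-region malignancy scores

local_max = F.max_pool2d(score_map, kernel_size=3, stride=1, padding=1)
is_extremum = (score_map == local_max)                      # True at local maxima
extrema_scores = score_map[is_extremum]

k = 5
image_score = extrema_scores.topk(min(k, extrema_scores.numel())).values.mean()
print("image-level score:", float(image_score))

# The surviving extrema double as a coarse localization map.
localization = torch.where(is_extremum, score_map,
                           torch.full_like(score_map, float("-inf")))
```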
Collapse
Affiliation(s)
- Minjuan Zhu
- College of Computer Science, Sichuan University, Section 4, Southern 1st Ring Rd., Chengdu 610065, China; (M.Z.); (L.Z.); (L.W.)
| | - Lei Zhang
- College of Computer Science, Sichuan University, Section 4, Southern 1st Ring Rd., Chengdu 610065, China; (M.Z.); (L.Z.); (L.W.)
| | - Lituan Wang
- College of Computer Science, Sichuan University, Section 4, Southern 1st Ring Rd., Chengdu 610065, China; (M.Z.); (L.Z.); (L.W.)
| | - Zizhou Wang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore; (Z.W.); (Y.W.)
| | - Yan Wang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore; (Z.W.); (Y.W.)
| | - Guangwu Qian
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Section 4, Southern 1st Ring Rd., Chengdu 610065, China
| |
Collapse
|
29
|
Luong HH, Hong PP, Minh DV, Quang TNL, The AD, Thai-Nghe N, Nguyen HT. Principal component analysis and fine-tuned vision transformation integrating model explainability for breast cancer prediction. Vis Comput Ind Biomed Art 2025; 8:5. [PMID: 40063312 PMCID: PMC11893953 DOI: 10.1186/s42492-025-00186-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2024] [Accepted: 01/23/2025] [Indexed: 03/14/2025] Open
Abstract
Breast cancer, the most commonly diagnosed cancer among women, is a major global health issue. It results from abnormal cells in the breast tissue growing out of control. Histopathology, the study of tissue disease, has become central to breast cancer care because it plays a vital role in diagnosis and classification. Considerable research in medicine and computer science has therefore been devoted to developing effective histopathology-based methods for breast cancer management. In this study, a vision transformer (ViT) was employed to classify tumors into two classes, benign and malignant, in the Breast Cancer Histopathological Database (BreakHis). To enhance the model performance, we introduced the novel multi-head locality large kernel self-attention during fine-tuning, achieving an accuracy of 95.94% at 100× magnification, thereby improving the accuracy by 3.34% compared to a standard ViT (which uses multi-head self-attention). In addition, the application of principal component analysis for dimensionality reduction led to an accuracy improvement of 3.34%, highlighting its role in mitigating overfitting and reducing the computational complexity. In the final phase, SHapley Additive exPlanations, Local Interpretable Model-agnostic Explanations, and Gradient-weighted Class Activation Mapping were used to interpret and explain the model, aiding in understanding feature importance, providing local explanations, and visualizing model attention. In another experiment, ensemble learning with VGGIN further boosted the performance to 97.13% accuracy. Our approach exhibited a 0.98% to 17.13% improvement in accuracy compared with state-of-the-art methods, establishing a new benchmark for breast cancer histopathological image classification.
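The dimensionality-reduction step can be illustrated as below: standardize extracted image embeddings, project them with PCA, and train a simple classifier on the reduced features. The synthetic feature matrix, component count, and logistic-regression head are assumptions, not the study's ViT embeddings or settings.

```python
# PCA-reduced features feeding a simple classifier (illustrative sketch).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(800, 768))             # ViT-sized embeddings per image (synthetic)
y = rng.integers(0, 2, size=800)            # benign vs malignant labels (synthetic)

pipe = make_pipeline(StandardScaler(), PCA(n_components=64),
                     LogisticRegression(max_iter=1000))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
pipe.fit(X_tr, y_tr)
print("test accuracy:", pipe.score(X_te, y_te))
```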
Collapse
Affiliation(s)
- Huong Hoang Luong
- College of Information and Communication Technology, Can Tho University, Can Tho 900000, Vietnam
- Information Assurance Department, FPT University, Can Tho 900000, Vietnam
| | - Phuc Phan Hong
- Information Technology Department, FPT University, Can Tho 900000, Vietnam
| | - Dat Vo Minh
- Information Technology Department, FPT University, Can Tho 900000, Vietnam
| | | | - Anh Dinh The
- Information Technology Department, FPT University, Can Tho 900000, Vietnam
| | - Nguyen Thai-Nghe
- College of Information and Communication Technology, Can Tho University, Can Tho 900000, Vietnam
| | - Hai Thanh Nguyen
- College of Information and Communication Technology, Can Tho University, Can Tho 900000, Vietnam.
| |
Collapse
|
30
|
Nicke T, Schäfer JR, Höfener H, Feuerhake F, Merhof D, Kießling F, Lotz J. Tissue concepts: Supervised foundation models in computational pathology. Comput Biol Med 2025; 186:109621. [PMID: 39793348 DOI: 10.1016/j.compbiomed.2024.109621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2024] [Revised: 11/14/2024] [Accepted: 12/23/2024] [Indexed: 01/13/2025]
Abstract
Due to the increasing workload of pathologists, the need for automation to support diagnostic tasks and quantitative biomarker evaluation is becoming increasingly apparent. Foundation models have the potential to improve generalizability within and across centers and serve as starting points for data-efficient development of specialized yet robust AI models. However, the training of foundation models themselves is usually very expensive in terms of data, computation, and time. This paper proposes a supervised training method that drastically reduces these expenses. The proposed method is based on multi-task learning to train a joint encoder, by combining 16 different classification, segmentation, and detection tasks on a total of 912,000 patches. Since the encoder is capable of capturing the properties of the samples, we term it the Tissue Concepts encoder. To evaluate the performance and generalizability of the Tissue Concepts encoder across centers, classification of whole slide images from four of the most prevalent solid cancers - breast, colon, lung, and prostate - was used. The experiments show that the Tissue Concepts encoder achieves comparable performance to models trained with self-supervision, while requiring only 6% of the training patches. Furthermore, the Tissue Concepts encoder outperforms an ImageNet pre-trained encoder on both in-domain and out-of-domain data. The pre-trained models will be made available at https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling.
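A hypothetical sketch of the multi-task pattern described above: one shared encoder, per-task heads, and a summed loss over batches from different tasks. The architectures, task names, and equal loss weighting are assumptions for illustration only.

```python
# Multi-task training with a shared encoder and task-specific heads (illustrative sketch).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())   # shared patch encoder
heads = nn.ModuleDict({
    "tumor_cls": nn.Linear(32, 2),        # one classification task (hypothetical)
    "tissue_cls": nn.Linear(32, 5),       # another classification task (hypothetical)
})
params = list(encoder.parameters()) + list(heads.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
ce = nn.CrossEntropyLoss()

# One combined step over two task batches drawn from different datasets.
x1, y1 = torch.randn(8, 3, 64, 64), torch.randint(0, 2, (8,))
x2, y2 = torch.randn(8, 3, 64, 64), torch.randint(0, 5, (8,))
loss = ce(heads["tumor_cls"](encoder(x1)), y1) + ce(heads["tissue_cls"](encoder(x2)), y2)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```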
Collapse
Affiliation(s)
- Till Nicke
- Fraunhofer Institute for Digital Medicine MEVIS, Bremen/Lübeck/Aachen, Germany.
| | - Jan Raphael Schäfer
- Fraunhofer Institute for Digital Medicine MEVIS, Bremen/Lübeck/Aachen, Germany
| | - Henning Höfener
- Fraunhofer Institute for Digital Medicine MEVIS, Bremen/Lübeck/Aachen, Germany
| | - Friedrich Feuerhake
- Institute for Pathology, Hannover Medical School, Hannover, Germany; Institute of Neuropathology, Medical Center - University of Freiburg, Freiburg, Germany
| | - Dorit Merhof
- Fraunhofer Institute for Digital Medicine MEVIS, Bremen/Lübeck/Aachen, Germany; Institute of Image Analysis and Computer Vision, University of Regensburg, Regensburg, Germany
| | - Fabian Kießling
- Fraunhofer Institute for Digital Medicine MEVIS, Bremen/Lübeck/Aachen, Germany; Institute for Experimental Molecular Imaging, RWTH Aachen University, Aachen, Germany
| | - Johannes Lotz
- Fraunhofer Institute for Digital Medicine MEVIS, Bremen/Lübeck/Aachen, Germany
| |
Collapse
|
31
|
Liu X, Xin J, Shen Q, Huang Z, Wang Z. Automatic medical report generation based on deep learning: A state of the art survey. Comput Med Imaging Graph 2025; 120:102486. [PMID: 39787734 DOI: 10.1016/j.compmedimag.2024.102486] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2024] [Revised: 11/15/2024] [Accepted: 12/30/2024] [Indexed: 01/12/2025]
Abstract
The increasing popularity of medical imaging and its expanding applications pose significant challenges for radiologists. Radiologists must spend substantial time and effort every day reviewing images and manually writing reports. To address these challenges and speed up the process of patient care, researchers have employed deep learning methods to automatically generate medical reports. In recent years, researchers have been increasingly focusing on this task and a large amount of related work has emerged. Although there have been some review articles summarizing the state of the art in this field, their discussions remain relatively limited. Therefore, this paper provides a comprehensive review of the latest advancements in automatic medical report generation, focusing on four key aspects: (1) describing the problem of automatic medical report generation, (2) introducing datasets of different modalities, (3) thoroughly analyzing existing evaluation metrics, (4) classifying existing studies into the following categories: retrieval-based, domain knowledge-based, attention-based, reinforcement learning-based, large language model-based, and merged models. In addition, we point out open problems in this field and discuss future challenges. We hope that this review provides a thorough understanding of automatic medical report generation and encourages continued development in this area.
Collapse
Affiliation(s)
- Xinyao Liu
- College of Medicine and Biological Information Engineering, Northeastern University, 110819, China
| | - Junchang Xin
- College of Computer Science and Engineering, Northeastern University, 110819, China
| | - Qi Shen
- College of Medicine and Biological Information Engineering, Northeastern University, 110819, China
| | - Zhihong Huang
- School of Science and Engineering, University of Dundee, DD1 4HN, UK
| | - Zhiqiong Wang
- College of Medicine and Biological Information Engineering, Northeastern University, 110819, China.
| |
Collapse
|
32
|
Volovăț SR, Boboc DI, Ostafe MR, Buzea CG, Agop M, Ochiuz L, Rusu DI, Vasincu D, Ungureanu MI, Volovăț CC. Utilizing Vision Transformers for Predicting Early Response of Brain Metastasis to Magnetic Resonance Imaging-Guided Stage Gamma Knife Radiosurgery Treatment. Tomography 2025; 11:15. [PMID: 39997998 PMCID: PMC11860310 DOI: 10.3390/tomography11020015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2024] [Revised: 01/11/2025] [Accepted: 02/01/2025] [Indexed: 02/26/2025] Open
Abstract
BACKGROUND/OBJECTIVES This study explores the application of vision transformers to predict early responses to stereotactic radiosurgery in patients with brain metastases using minimally pre-processed magnetic resonance imaging scans. The objective is to assess the potential of vision transformers as a predictive tool for clinical decision-making, particularly in the context of imbalanced datasets. METHODS We analyzed magnetic resonance imaging scans from 19 brain metastases patients, focusing on axial fluid-attenuated inversion recovery and high-resolution contrast-enhanced T1-weighted sequences. Patients were categorized into responders (complete or partial response) and non-responders (stable or progressive disease). RESULTS Despite the imbalanced nature of the dataset, our results demonstrate that vision transformers can predict early treatment responses with an overall accuracy of 99%. The model exhibited high precision (99% for progression and 100% for regression) and recall (99% for progression and 100% for regression). The use of the attention mechanism in the vision transformers allowed the model to focus on relevant features in the magnetic resonance imaging images, ensuring an unbiased performance even with the imbalanced data. Confusion matrix analysis further confirmed the model's reliability, with minimal misclassifications. Additionally, the model achieved a perfect area under the receiver operator characteristic curve (AUC = 1.00), effectively distinguishing between responders and non-responders. CONCLUSIONS These findings highlight the potential of vision transformers, aided by the attention mechanism, as a non-invasive, predictive tool for early response assessment in clinical oncology. The vision transformer (ViT) model employed in this study processes MRIs as sequences of patches, enabling the capture of localized tumor features critical for early response prediction. By leveraging patch-based feature learning, this approach enhances robustness, interpretability, and clinical applicability, addressing key challenges in tumor progression prediction following stereotactic radiosurgery (SRS). The model's robust performance, despite the dataset imbalance, underscores its ability to provide unbiased predictions. This approach could significantly enhance clinical decision-making and support personalized treatment strategies for brain metastases. Future research should validate these findings in larger, more diverse cohorts and explore the integration of additional data types to further optimize the model's clinical utility.
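For context on the reported metrics, the sketch below computes per-class precision and recall together with ROC-AUC on a synthetic imbalanced responder / non-responder split; the labels, scores, and decision threshold are illustrative only.

```python
# Per-class precision/recall and ROC-AUC on an imbalanced binary split (synthetic data).
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score

rng = np.random.default_rng(3)
y_true = (rng.random(120) < 0.25).astype(int)                     # minority class = responders
scores = np.clip(y_true * 0.8 + rng.normal(0, 0.2, 120), 0, 1)    # model probabilities
y_pred = (scores >= 0.5).astype(int)

print(classification_report(y_true, y_pred, target_names=["progression", "regression"]))
print("AUC:", roc_auc_score(y_true, scores))
```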
Collapse
Affiliation(s)
- Simona Ruxandra Volovăț
- Medical Oncology-Radiotherapy Department, “Grigore T. Popa” University of Medicine and Pharmacy Iași, 700115 Iași, Romania; (S.R.V.); (D.-I.B.); (M.-R.O.)
| | - Diana-Ioana Boboc
- Medical Oncology-Radiotherapy Department, “Grigore T. Popa” University of Medicine and Pharmacy Iași, 700115 Iași, Romania; (S.R.V.); (D.-I.B.); (M.-R.O.)
| | - Mădălina-Raluca Ostafe
- Medical Oncology-Radiotherapy Department, “Grigore T. Popa” University of Medicine and Pharmacy Iași, 700115 Iași, Romania; (S.R.V.); (D.-I.B.); (M.-R.O.)
| | - Călin Gheorghe Buzea
- “Prof. Dr. Nicolae Oblu” Clinical Emergency Hospital Iași, 700309 Iași, Romania;
- National Institute of Research and Development for Technical Physics, IFT Iași, 700050 Iași, Romania
| | - Maricel Agop
- Physics Department, “Gheorghe Asachi” Technical University Iași, 700050 Iași, Romania;
| | - Lăcrămioara Ochiuz
- Faculty of Pharmacy, “Grigore T. Popa” University of Medicine and Pharmacy Iași, 700115 Iași, Romania;
| | - Dragoș Ioan Rusu
- Faculty of Science, “V. Alecsandri” University of Bacău, 600115 Bacău, Romania;
| | - Decebal Vasincu
- Surgery Department, “Grigore T. Popa” University of Medicine and Pharmacy Iași, 700115 Iași, Romania;
| | - Monica Iuliana Ungureanu
- Preventive Medicine and Interdisciplinarity Department, “Grigore T. Popa” University of Medicine and Pharmacy Iași, 700115 Iași, Romania
| | | |
Collapse
|
33
|
Li H, Dong D, Fang M, He B, Liu S, Hu C, Liu Z, Wang H, Tang L, Tian J. ContraSurv: Enhancing Prognostic Assessment of Medical Images via Data-Efficient Weakly Supervised Contrastive Learning. IEEE J Biomed Health Inform 2025; 29:1232-1242. [PMID: 39437290 DOI: 10.1109/jbhi.2024.3484991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2024]
Abstract
Prognostic assessment remains a critical challenge in medical research, often limited by the lack of well-labeled data. In this work, we introduce ContraSurv, a weakly-supervised learning framework based on contrastive learning, designed to enhance prognostic predictions in 3D medical images. ContraSurv utilizes both the self-supervised information inherent in unlabeled data and the weakly-supervised cues present in censored data, refining its capacity to extract prognostic representations. For this purpose, we establish a Vision Transformer architecture optimized for our medical image datasets and introduce novel methodologies for both self-supervised and supervised contrastive learning for prognostic assessment. Additionally, we propose a specialized supervised contrastive loss function and introduce SurvMix, a novel data augmentation technique for survival analysis. Evaluations were conducted across three cancer types and two imaging modalities on three real-world datasets. The results confirmed the enhanced performance of ContraSurv over competing methods, particularly in data with a high censoring rate.
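To make the supervised contrastive idea more concrete, the sketch below (PyTorch) pulls together embeddings of samples that share a weak prognostic label and pushes apart the rest. The risk-group labels, temperature, and toy data are illustrative assumptions; the paper's SurvMix augmentation and its handling of censored cases are not reproduced here.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, temperature=0.1):
    """z: (N, D) embeddings; labels: (N,) prognostic group ids (e.g. high/low risk)."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                        # pairwise similarities
    mask_pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)                           # exclude self-pairs
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()  # numerical stability
    exp = torch.exp(logits) * (1 - torch.eye(len(z)))    # drop the self term
    log_prob = logits - torch.log(exp.sum(dim=1, keepdim=True))
    pos_count = mask_pos.sum(dim=1).clamp(min=1)
    return -(mask_pos * log_prob).sum(dim=1).div(pos_count).mean()

loss = supcon_loss(torch.randn(8, 128), torch.randint(0, 2, (8,)))
```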
Collapse
|
34
|
Trigka M, Dritsas E. A Comprehensive Survey of Deep Learning Approaches in Image Processing. SENSORS (BASEL, SWITZERLAND) 2025; 25:531. [PMID: 39860903 PMCID: PMC11769216 DOI: 10.3390/s25020531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/20/2024] [Revised: 01/13/2025] [Accepted: 01/13/2025] [Indexed: 01/27/2025]
Abstract
The integration of deep learning (DL) into image processing has driven transformative advancements, enabling capabilities far beyond the reach of traditional methodologies. This survey offers an in-depth exploration of the DL approaches that have redefined image processing, tracing their evolution from early innovations to the latest state-of-the-art developments. It also analyzes the progression of architectural designs and learning paradigms that have significantly enhanced the ability to process and interpret complex visual data. Key advancements, such as techniques improving model efficiency, generalization, and robustness, are examined, showcasing DL's ability to address increasingly sophisticated image-processing tasks across diverse domains. Metrics used for rigorous model evaluation are also discussed, underscoring the importance of performance assessment in varied application contexts. The impact of DL in image processing is highlighted through its ability to tackle complex challenges and generate actionable insights. Finally, this survey identifies potential future directions, including the integration of emerging technologies like quantum computing and neuromorphic architectures for enhanced efficiency and federated learning for privacy-preserving training. Additionally, it highlights the potential of combining DL with emerging technologies such as edge computing and explainable artificial intelligence (AI) to address scalability and interpretability challenges. These advancements are positioned to further extend the capabilities and applications of DL, driving innovation in image processing.
Collapse
Affiliation(s)
| | - Elias Dritsas
- Industrial Systems Institute (ISI), Athena Research and Innovation Center, 26504 Patras, Greece;
| |
Collapse
|
35
|
Bui DC, Song B, Kim K, Kwak JT. Spatially-Constrained and -Unconstrained Bi-Graph Interaction Network for Multi-Organ Pathology Image Classification. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:194-206. [PMID: 39083386 DOI: 10.1109/tmi.2024.3436080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/02/2024]
Abstract
In computational pathology, graphs have been shown to be promising for pathology image analysis. There exist various graph structures that can discover differing features of pathology images. However, the combination and interaction between differing graph structures have not been fully studied and utilized for pathology image analysis. In this study, we propose a parallel, bi-graph neural network, designated as SCUBa-Net, equipped with both graph convolutional networks and Transformers, that processes a pathology image as two distinct graphs: a spatially-constrained graph and a spatially-unconstrained graph. For efficient and effective graph learning, we introduce two intra-graph interaction blocks and an inter-graph interaction block. The intra-graph interaction blocks learn the node-to-node interactions within each graph. The inter-graph interaction block learns the graph-to-graph interactions at both global and local levels with the help of virtual nodes that collect and summarize the information from the entire graphs. SCUBa-Net is systematically evaluated on four multi-organ datasets, including colorectal, prostate, gastric, and bladder cancers. The experimental results demonstrate the effectiveness of SCUBa-Net in comparison to state-of-the-art convolutional neural networks, Transformers, and graph neural networks.
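The virtual-node idea mentioned above can be illustrated with a very small message-passing layer in which ordinary nodes exchange messages over the adjacency matrix while a virtual node summarizes the whole graph and broadcasts global context back. Feature sizes and the mean-aggregation rule are assumptions; the full bi-graph interaction blocks are not reproduced.

```python
import torch
import torch.nn as nn

class VirtualNodeLayer(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.virt = nn.Linear(dim, dim)

    def forward(self, x, adj):                 # x: (N, dim) node features, adj: (N, N) 0/1
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ self.msg(x) / deg        # local (spatially constrained) messages
        v = self.virt(x.mean(dim=0))           # virtual node summarizes the whole graph
        return torch.relu(x + neigh + v)       # broadcast the global context back

x = torch.randn(6, 32)                         # 6 patch nodes from a pathology image
adj = (torch.rand(6, 6) > 0.5).float()
out = VirtualNodeLayer()(x, adj)
```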
Collapse
|
36
|
Kusters CHJ, Jaspers TJM, Boers TGW, Jong MR, Jukema JB, Fockens KN, de Groof AJ, Bergman JJ, van der Sommen F, De With PHN. Will Transformers change gastrointestinal endoscopic image analysis? A comparative analysis between CNNs and Transformers, in terms of performance, robustness and generalization. Med Image Anal 2025; 99:103348. [PMID: 39298861 DOI: 10.1016/j.media.2024.103348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Revised: 07/10/2024] [Accepted: 09/10/2024] [Indexed: 09/22/2024]
Abstract
Gastrointestinal endoscopic image analysis presents significant challenges, such as considerable variations in quality due to the challenging in-body imaging environment, the often-subtle nature of abnormalities with low interobserver agreement, and the need for real-time processing. These challenges place strong requirements on the performance, generalization, robustness and complexity of deep learning-based techniques in such safety-critical applications. While Convolutional Neural Networks (CNNs) have been the go-to architecture for endoscopic image analysis, recent successes of the Transformer architecture in computer vision raise the question of whether this conclusion still holds. To this end, we evaluate and compare clinically relevant performance, generalization and robustness of state-of-the-art CNNs and Transformers for neoplasia detection in Barrett's esophagus. We have trained and validated several top-performing CNNs and Transformers on a total of 10,208 images (2,079 patients), and tested on a total of 7,118 images (998 patients) across multiple test sets, including a high-quality test set, two internal and two external generalization test sets, and a robustness test set. Furthermore, to expand the scope of the study, we have conducted the performance and robustness comparisons for colonic polyp segmentation (Kvasir-SEG) and angiodysplasia detection (Giana). The results obtained for featured models across a wide range of training set sizes demonstrate that Transformers achieve performance comparable to CNNs on various applications, show comparable or slightly improved generalization capabilities and offer equally strong resilience and robustness against common image corruptions and perturbations. These findings confirm the viability of the Transformer architecture, which is particularly suited to the dynamic nature of endoscopic video analysis, characterized by image quality, appearance and equipment configurations that fluctuate from hospital to hospital. The code is made publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.
Collapse
Affiliation(s)
- Carolus H J Kusters
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
| | - Tim J M Jaspers
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Tim G W Boers
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Martijn R Jong
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Jelmer B Jukema
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Kiki N Fockens
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Albert J de Groof
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Jacques J Bergman
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Fons van der Sommen
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Peter H N De With
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
| |
Collapse
|
37
|
Bongrand P. Should Artificial Intelligence Play a Durable Role in Biomedical Research and Practice? Int J Mol Sci 2024; 25:13371. [PMID: 39769135 PMCID: PMC11676049 DOI: 10.3390/ijms252413371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2024] [Revised: 11/26/2024] [Accepted: 12/09/2024] [Indexed: 01/11/2025] Open
Abstract
During the last decade, artificial intelligence (AI) was applied to nearly all domains of human activity, including scientific research. It is thus warranted to ask whether AI thinking should be durably involved in biomedical research. This problem was addressed by examining three complementary questions. (i) What are the major barriers currently met by biomedical investigators? It is suggested that during the last two decades there was a shift towards a growing need to elucidate complex systems, and that this was not sufficiently fulfilled by previously successful methods such as theoretical modeling or computer simulation. (ii) What is the potential of AI to meet the aforementioned need? It is suggested that recent AI methods are well-suited to perform classification and prediction tasks on multivariate systems, and possibly help in data interpretation, provided their efficiency is properly validated. (iii) Recent representative results obtained with machine learning suggest that AI efficiency may be comparable to that displayed by human operators. It is concluded that AI should durably play an important role in biomedical practice. Also, as already suggested in other scientific domains such as physics, combining AI with conventional methods might generate further progress and new applications, involving heuristics and data interpretation.
Collapse
Affiliation(s)
- Pierre Bongrand
- Laboratory Adhesion and Inflammation (LAI), Inserm UMR 1067, Cnrs Umr 7333, Aix-Marseille Université UM 61, 13009 Marseille, France
| |
Collapse
|
38
|
Azad R, Aghdam EK, Rauland A, Jia Y, Avval AH, Bozorgpour A, Karimijafarbigloo S, Cohen JP, Adeli E, Merhof D. Medical Image Segmentation Review: The Success of U-Net. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:10076-10095. [PMID: 39167505 DOI: 10.1109/tpami.2024.3435571] [Citation(s) in RCA: 40] [Impact Index Per Article: 40.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/23/2024]
Abstract
Automatic medical image segmentation is a crucial topic in the medical domain and consequently a critical component of the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success in all medical image modalities. Over the years, the U-Net model has received tremendous attention from academic and industrial researchers who have extended it to address the scale and complexity created by medical tasks. These extensions are commonly related to enhancing the U-Net's backbone, bottleneck, or skip connections, or including representation learning, or combining it with a Transformer architecture, or even addressing probabilistic prediction of the segmentation map. Having a compendium of different previously proposed U-Net variants makes it easier for machine learning researchers to identify relevant research questions and understand the challenges posed by the underlying biological tasks. In this work, we discuss the practical aspects of the U-Net model and organize each variant model into a taxonomy. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and famous designs on well-known datasets. Furthermore, we provide a comprehensive implementation library with trained models. In addition, for ease of future studies, we created an online list of U-Net papers with their official implementations, where available.
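For readers unfamiliar with the architecture the taxonomy is organized around, a minimal encoder-decoder with a single skip connection is sketched below in PyTorch; the depth, channel counts, and single skip level are illustrative and do not correspond to any specific U-Net variant discussed in the review.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, 1, 1)                 # per-pixel segmentation logits

    def forward(self, x):
        e = self.enc(x)                                # encoder features (skip source)
        b = self.bottleneck(self.down(e))
        u = self.up(b)
        d = self.dec(torch.cat([u, e], dim=1))         # skip connection via concatenation
        return self.out(d)

mask_logits = TinyUNet()(torch.randn(1, 1, 64, 64))    # -> (1, 1, 64, 64)
```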
Collapse
|
39
|
Hussain MA, Grant PE, Ou Y. Inferring neurocognition using artificial intelligence on brain MRIs. FRONTIERS IN NEUROIMAGING 2024; 3:1455436. [PMID: 39664769 PMCID: PMC11631947 DOI: 10.3389/fnimg.2024.1455436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/26/2024] [Accepted: 11/07/2024] [Indexed: 12/13/2024]
Abstract
Brain magnetic resonance imaging (MRI) offers a unique lens to study the neuroanatomic support of human neurocognition. A core mystery is how MRI can explain individual differences in neurocognition and their manifestation in intelligence. The past four decades have seen great advancement in studying this century-long mystery, but limited sample sizes and population-level study designs restrict explanations at the individual level. The recent rise of big data and artificial intelligence offers novel opportunities. Yet, data sources, harmonization, study design, and interpretation must be carefully considered. This review aims to summarize past work, discuss rising opportunities and challenges, and facilitate further investigations into using artificial intelligence to infer human neurocognition.
Collapse
Affiliation(s)
- Mohammad Arafat Hussain
- Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, United States
| | - Patricia Ellen Grant
- Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, United States
- Department of Radiology, Harvard Medical School, Boston, MA, United States
| | - Yangming Ou
- Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, United States
- Department of Radiology, Harvard Medical School, Boston, MA, United States
- Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, United States
| |
Collapse
|
40
|
Vidanagamachchi SM, Waidyarathna KMGTR. Opportunities, challenges and future perspectives of using bioinformatics and artificial intelligence techniques on tropical disease identification using omics data. Front Digit Health 2024; 6:1471200. [PMID: 39654982 PMCID: PMC11625773 DOI: 10.3389/fdgth.2024.1471200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2024] [Accepted: 11/06/2024] [Indexed: 12/12/2024] Open
Abstract
Tropical diseases are often caused by viruses, bacteria, parasites, and fungi, and many are spread by vectors. Analysis of multiple omics data types can provide comprehensive insights into biological system functions and disease progression. To this end, bioinformatics tools and diverse AI techniques are pivotal in identifying and understanding tropical diseases through the analysis of omics data. In this article, we provide a thorough review of the opportunities, challenges, and future directions of utilizing bioinformatics tools and AI-assisted models for tropical disease identification using various omics data types. We conducted the review covering 2015 to 2024, considering reliable databases of peer-reviewed journal and conference articles. Several keywords were used for the literature search, and around 40 articles were reviewed. According to the review, we observed that utilization of omics data with bioinformatics tools like BLAST and Clustal Omega can yield significant outcomes in tropical disease identification. Further, the integration of multiple omics data improves biomarker identification and disease predictions, including disease outbreak predictions. Moreover, AI-assisted models can improve the precision, cost-effectiveness, and efficiency of CRISPR-based gene editing, optimizing gRNA design and supporting advanced genetic correction. Several AI-assisted models, including XAI, can be used to identify diseases and repurpose therapeutic targets and biomarkers efficiently. Furthermore, recent advancements, including Transformer-based models such as BERT and GPT-4, have been mainly applied for sequence analysis and functional genomics. Finally, the most recent GeneViT model, utilizing Vision Transformers, and other AI techniques like Generative Adversarial Networks, Federated Learning, Transfer Learning, Reinforcement Learning, Automated ML and Attention Mechanisms have shown significant performance in disease classification using omics data.
Collapse
Affiliation(s)
- S. M. Vidanagamachchi
- Department of Computer Science, Faculty of Science, University of Ruhuna, Matara, Sri Lanka
| | - K. M. G. T. R. Waidyarathna
- Department of Information Technology, Sri Lanka Institute of Advanced Technological Education, Galle, Sri Lanka
| |
Collapse
|
41
|
Mohanty MR, Mallick PK, Reddy AVN. Optimizing pulmonary chest x-ray classification with stacked feature ensemble and swin transformer integration. Biomed Phys Eng Express 2024; 11:015009. [PMID: 39504146 DOI: 10.1088/2057-1976/ad8c46] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2024] [Indexed: 11/08/2024]
Abstract
This research presents an integrated framework designed to automate the classification of pulmonary chest x-ray images. Leveraging convolutional neural networks (CNNs) with a focus on transformer architectures, the framework aims to improve both the accuracy and efficiency of pulmonary chest x-ray image analysis. A central aspect of this approach involves utilizing pre-trained networks such as VGG16, ResNet50, and MobileNetV2 to create a feature ensemble. A notable innovation is the adoption of a stacked ensemble technique, which combines outputs from multiple pre-trained models to generate a comprehensive feature representation. In the feature ensemble approach, each image undergoes individual processing through the three pre-trained networks, and pooled images are extracted just before the flatten layer of each model. Consequently, three pooled images in 2D grayscale format are obtained for each original image. These pooled images serve as samples for creating 3D images resembling RGB images through stacking, intended for classifier input in subsequent analysis stages. By incorporating stacked pooling layers to facilitate the feature ensemble, a broader range of features is utilized while effectively managing the complexities associated with processing the augmented feature pool. Moreover, the study incorporates the Swin Transformer architecture, known for effectively capturing both local and global features. The Swin Transformer architecture is further optimized using the artificial hummingbird algorithm (AHA). By fine-tuning hyperparameters such as patch size, multi-layer perceptron (MLP) ratio, and channel numbers, the AHA optimization technique aims to maximize classification accuracy. The proposed integrated framework, featuring the AHA-optimized Swin Transformer classifier utilizing stacked features, is evaluated using three diverse chest x-ray datasets: VinDr-CXR, PediCXR, and MIMIC-CXR. The observed accuracies of 98.874%, 98.528%, and 98.958%, respectively, underscore the robustness and generalizability of the developed model across various clinical scenarios and imaging conditions.
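A rough sketch of the stacked feature-ensemble step described above is given below: three backbone networks each yield a pooled 2D map, and the maps are stacked as a 3-channel pseudo-RGB image for a downstream (Swin-style) classifier. The stand-in CNNs, the channel-averaging used to obtain a single grayscale map, and the target resolution are assumptions, not the actual pre-trained VGG16/ResNet50/MobileNetV2 pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def tiny_backbone():
    # Stand-in for a pre-trained CNN truncated just before its flatten layer.
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.MaxPool2d(2), nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

backbones = [tiny_backbone() for _ in range(3)]
x = torch.randn(4, 1, 224, 224)                    # a batch of chest x-rays

maps = []
for net in backbones:
    f = net(x)                                     # (B, C, h, w) feature maps
    g = f.mean(dim=1, keepdim=True)                # collapse channels into one grayscale map
    maps.append(F.interpolate(g, size=(112, 112), mode="bilinear", align_corners=False))

stacked = torch.cat(maps, dim=1)                   # (B, 3, 112, 112) pseudo-RGB input
print(stacked.shape)                               # ready for a Swin-style classifier
```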
Collapse
Affiliation(s)
| | | | - Annapareddy V N Reddy
- Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, NTR District, Andhra Pradesh, India
| |
Collapse
|
42
|
Wei J, Xu Y, Wang H, Niu T, Jiang Y, Shen Y, Su L, Dou T, Peng Y, Bi L, Xu X, Wang Y, Liu K. Metadata information and fundus image fusion neural network for hyperuricemia classification in diabetes. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 256:108382. [PMID: 39213898 DOI: 10.1016/j.cmpb.2024.108382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/10/2024] [Revised: 07/21/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024]
Abstract
OBJECTIVE In diabetes mellitus patients, hyperuricemia may lead to the development of diabetic complications, including macrovascular and microvascular dysfunction. However, the level of blood uric acid in diabetic patients is obtained by sampling peripheral blood from the patient, which is an invasive procedure and not conducive to routine monitoring. Therefore, we developed a deep learning algorithm to detect hyperuricemia noninvasively from retina photographs and metadata of patients with diabetes, and evaluated its performance in multiethnic populations and different subgroups. MATERIALS AND METHODS To achieve non-invasive detection of hyperuricemia in diabetic patients, and given that blood uric acid metabolism is directly related to the estimated glomerular filtration rate (eGFR), we first performed a regression task for the eGFR value before the classification task for hyperuricemia and reintroduced the eGFR regression values into the baseline information. We trained 3 deep learning models: (1) a metadata model adjusted for sex, age, body mass index, duration of diabetes, HbA1c, systolic blood pressure, and diastolic blood pressure; (2) an image model based on fundus photographs; (3) a hybrid model combining the image and metadata models. Data from the Shanghai General Hospital Diabetes Management Center (ShDMC) were used to develop (6091 participants with diabetes) and internally validate (using 5-fold cross-validation) the models. External testing was performed on an independent dataset (UK Biobank) consisting of 9327 participants with diabetes. RESULTS For the eGFR regression task, in the ShDMC dataset, the coefficient of determination (R2) was 0.684±0.07 (95% CI) for the image model, 0.501±0.04 for the metadata model, and 0.727±0.002 for the hybrid model. In the external UK Biobank dataset, the coefficient of determination (R2) was 0.647±0.06 for the image model, 0.627±0.03 for the metadata model, and 0.697±0.07 for the hybrid model. Our method was demonstrably superior to previous methods. For the classification of hyperuricemia, in ShDMC validation, the area under the curve (AUC) was 0.86±0.013 for the image model, 0.86±0.013 for the metadata model, and 0.92±0.026 for the hybrid model. Estimates with UK Biobank were 0.82±0.017 for the image model, 0.79±0.024 for the metadata model, and 0.89±0.032 for the hybrid model. CONCLUSION Deep learning on fundus photographs has potential as a noninvasive screening adjunct for hyperuricemia among individuals with diabetes, and combining patient metadata enables higher screening accuracy. Visualization showed that the deep learning network for the identification of hyperuricemia mainly focuses on the fundus optic disc region.
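A minimal sketch of the two-stage hybrid design described above is shown below: an image branch and a metadata branch are fused, an eGFR value is regressed first, and that prediction is appended to the fused features before the binary hyperuricemia head. Layer sizes, the metadata dimension, and the exact fusion point are assumptions for illustration.

```python
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    def __init__(self, n_meta=7):
        super().__init__()
        self.img_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())            # -> (B, 16) image features
        self.egfr_head = nn.Linear(16 + n_meta, 1)            # stage 1: regress eGFR
        self.cls_head = nn.Sequential(                        # stage 2: classify hyperuricemia
            nn.Linear(16 + n_meta + 1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, img, meta):
        f = self.img_branch(img)
        fused = torch.cat([f, meta], dim=1)                   # image + metadata fusion
        egfr = self.egfr_head(fused)                          # predicted eGFR
        logit = self.cls_head(torch.cat([fused, egfr], dim=1))
        return egfr, logit

egfr, logit = HybridNet()(torch.randn(2, 3, 256, 256), torch.randn(2, 7))
```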
Collapse
Affiliation(s)
- Jin Wei
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Yupeng Xu
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Hanying Wang
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Tian Niu
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Yan Jiang
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Yinchen Shen
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Li Su
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Tianyu Dou
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Yige Peng
- Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 20080, PR China
| | - Lei Bi
- Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 20080, PR China
| | - Xun Xu
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
| | - Yufan Wang
- Department of Endocrinology and Metabolism, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, PR China
| | - Kun Liu
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China.
| |
Collapse
|
43
|
Artsi Y, Sorin V, Glicksberg BS, Nadkarni GN, Klang E. Advancing Clinical Practice: The Potential of Multimodal Technology in Modern Medicine. J Clin Med 2024; 13:6246. [PMID: 39458196 PMCID: PMC11508674 DOI: 10.3390/jcm13206246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2024] [Revised: 10/15/2024] [Accepted: 10/17/2024] [Indexed: 10/28/2024] Open
Abstract
Multimodal technology is poised to revolutionize clinical practice by integrating artificial intelligence with traditional diagnostic modalities. This evolution traces its roots from Hippocrates' humoral theory to the use of sophisticated AI-driven platforms that synthesize data across multiple sensory channels. The interplay between historical medical practices and modern technology challenges conventional patient-clinician interactions and redefines diagnostic accuracy. Highlighting applications from neurology to radiology, the potential of multimodal technology emerges, suggesting a future where AI not only supports but enhances human sensory inputs in medical diagnostics. This shift invites the medical community to navigate the ethical, practical, and technological changes reshaping the landscape of clinical medicine.
Collapse
Affiliation(s)
- Yaara Artsi
- Azrieli Faculty of Medicine, Bar-Ilan University, Zefat 1311502, Israel
| | - Vera Sorin
- Department of Radiology, Mayo Clinic, Rochester, MN 55905, USA;
| | - Benjamin S. Glicksberg
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; (B.S.G.); (G.N.N.); (E.K.)
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Girish N. Nadkarni
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; (B.S.G.); (G.N.N.); (E.K.)
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; (B.S.G.); (G.N.N.); (E.K.)
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| |
Collapse
|
44
|
Huang P, Shang J, Fan Y, Hu Z, Dai J, Liu Z, Yan H. Unsupervised machine learning model for detecting anomalous volumetric modulated arc therapy plans for lung cancer patients. Front Big Data 2024; 7:1462745. [PMID: 39421134 PMCID: PMC11484413 DOI: 10.3389/fdata.2024.1462745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2024] [Accepted: 09/16/2024] [Indexed: 10/19/2024] Open
Abstract
Purpose Volumetric modulated arc therapy (VMAT) is a new treatment modality in modern radiotherapy. To ensure the quality of the radiotherapy plan, a physics plan review is routinely conducted by senior clinicians; however, this manual process is relatively inefficient and error-prone. In this study, a multi-task AutoEncoder (AE) is proposed to automate anomaly detection of VMAT plans for lung cancer patients. Methods The feature maps are first extracted from a VMAT plan. Then, a multi-task AE is trained on the input feature map, and its outputs are the two targets (beam aperture and prescribed dose). Based on the distribution of reconstruction errors on the training set, a detection threshold value is obtained. For a testing sample, its reconstruction error is calculated using the AE model and compared with the threshold value to determine its class (anomaly or regular). The proposed multi-task AE model is compared to other existing AE models, including Vanilla AE, Contractive AE, and Variational AE. The area under the receiver operating characteristic curve (AUC) and other statistics are used to evaluate the performance of these models. Results Among the four tested AE models, the proposed multi-task AE model achieves the highest values in AUC (0.964), accuracy (0.821), precision (0.471), and F1 score (0.632), and the lowest value in FPR (0.206). Conclusion The proposed multi-task AE model using two-dimensional (2D) feature maps can effectively detect anomalies in radiotherapy plans for lung cancer patients. Compared to the other existing AE models, the multi-task AE is more accurate and efficient. The proposed model provides a feasible way to carry out automated anomaly detection of VMAT plans in radiotherapy.
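The reconstruction-error thresholding described above can be sketched as follows: an autoencoder is fit on feature vectors from regular plans, a threshold is taken from the training-error distribution, and test samples whose error exceeds it are flagged as anomalous. The single-output AE, the toy data, and the 95th-percentile rule are simplifying assumptions; the paper's two reconstruction targets (beam aperture and prescribed dose) are not shown.

```python
import numpy as np
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 64))  # tiny autoencoder
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

train = torch.randn(200, 64)                    # flattened feature maps of regular plans (toy data)
for _ in range(100):                            # fit the AE to reconstruct regular plans
    opt.zero_grad()
    loss = ((ae(train) - train) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    train_err = ((ae(train) - train) ** 2).mean(dim=1).numpy()
    threshold = np.percentile(train_err, 95)    # detection threshold from training errors

    test = torch.randn(5, 64) * 3               # a few suspicious samples
    test_err = ((ae(test) - test) ** 2).mean(dim=1).numpy()
    print(test_err > threshold)                 # True -> flagged as an anomalous plan
```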
Collapse
Affiliation(s)
| | | | | | | | - Jianrong Dai
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Zhiqiang Liu
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Hui Yan
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| |
Collapse
|
45
|
Guo Z, Zhang J, Wang H, Dong H, Li S, Shao X, Huang J, Yin X, Zhang Q, Guo Y, Sun X, Darwish I. Enhanced detection of Aspergillus flavus in peanut kernels using a multi-scale attention transformer (MSAT): Advancements in food safety and contamination analysis. Int J Food Microbiol 2024; 423:110831. [PMID: 39083880 DOI: 10.1016/j.ijfoodmicro.2024.110831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2024] [Revised: 06/18/2024] [Accepted: 07/18/2024] [Indexed: 08/02/2024]
Abstract
In this study, a multi-scale attention transformer (MSAT) was coupled with hyperspectral imaging for classifying peanut kernels contaminated with diverse Aspergillus flavus fungi. The results underscored that the MSAT significantly outperformed classic deep learning models, owing to its sophisticated multi-scale attention mechanism, which enhanced its classification capabilities. The multi-scale attention mechanism employed several multi-head attention layers to focus on both fine-scale and broad-scale features. It also integrated a series of scale processing layers to capture features at different resolutions and incorporated a self-attention mechanism to integrate information across different levels. The MSAT model achieved outstanding performance in different classification tasks, particularly in distinguishing healthy peanut kernels from those contaminated with aflatoxigenic fungi, with a test accuracy of 98.42±0.22%. However, it faced challenges in differentiating peanut kernels contaminated with aflatoxigenic fungi from those with non-aflatoxigenic contamination. Visualization of attention weights explicitly revealed that the MSAT model's multi-scale attention mechanism progressively refined its focus from broad spatial-spectral features to more specialized signatures. Overall, the MSAT model's advanced processing capabilities marked a notable advancement in the field of food quality and safety, offering a robust and reliable tool for the rapid and accurate detection of Aspergillus flavus contamination in food.
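As a loose illustration of the multi-scale attention idea, the block below runs self-attention over the same tokens at a fine and a coarse resolution and fuses the two views. Token counts, the pooling factor, and the fusion rule are assumptions and do not reproduce the MSAT architecture.

```python
import torch
import torch.nn as nn

class MultiScaleAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.fine = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.coarse = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, tokens):                        # tokens: (B, N, dim) spectral-spatial tokens
        fine, _ = self.fine(tokens, tokens, tokens)   # fine-scale self-attention
        pooled = tokens.transpose(1, 2)
        pooled = nn.functional.avg_pool1d(pooled, 4).transpose(1, 2)   # coarser token set
        coarse, _ = self.coarse(tokens, pooled, pooled)                # attention across scales
        return self.fuse(torch.cat([fine, coarse], dim=-1))            # fuse the two views

out = MultiScaleAttention()(torch.randn(2, 64, 64))   # hyperspectral patch tokens -> (2, 64, 64)
```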
Collapse
Affiliation(s)
- Zhen Guo
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Shandong Provincial Engineering Research Center of Vegetable Safety and Quality Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Zibo City Key Laboratory of Agricultural Product Safety Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China
| | - Jing Zhang
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China
| | - Haifang Wang
- Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing 100700, China
| | - Haowei Dong
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Shandong Provincial Engineering Research Center of Vegetable Safety and Quality Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Zibo City Key Laboratory of Agricultural Product Safety Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China
| | - Shiling Li
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Shandong Provincial Engineering Research Center of Vegetable Safety and Quality Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Zibo City Key Laboratory of Agricultural Product Safety Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China
| | - Xijun Shao
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Shandong Provincial Engineering Research Center of Vegetable Safety and Quality Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Zibo City Key Laboratory of Agricultural Product Safety Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China
| | - Jingcheng Huang
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Shandong Provincial Engineering Research Center of Vegetable Safety and Quality Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Zibo City Key Laboratory of Agricultural Product Safety Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China
| | - Xiang Yin
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China
| | - Qi Zhang
- Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan 430062, China
| | - Yemin Guo
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Shandong Provincial Engineering Research Center of Vegetable Safety and Quality Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Zibo City Key Laboratory of Agricultural Product Safety Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China.
| | - Xia Sun
- School of Agricultural Engineering and Food Science, Shandong University of Technology, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Shandong Provincial Engineering Research Center of Vegetable Safety and Quality Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China; Zibo City Key Laboratory of Agricultural Product Safety Traceability, No. 266 Xincun Xilu, Zibo, Shandong 255049, China.
| | - Ibrahim Darwish
- Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh 11451, Saudi Arabia
| |
Collapse
|
46
|
Abbasian Ardakani A, Airom O, Khorshidi H, Bureau NJ, Salvi M, Molinari F, Acharya UR. Interpretation of Artificial Intelligence Models in Healthcare: A Pictorial Guide for Clinicians. JOURNAL OF ULTRASOUND IN MEDICINE : OFFICIAL JOURNAL OF THE AMERICAN INSTITUTE OF ULTRASOUND IN MEDICINE 2024; 43:1789-1818. [PMID: 39032010 DOI: 10.1002/jum.16524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/13/2024] [Revised: 06/19/2024] [Accepted: 07/01/2024] [Indexed: 07/22/2024]
Abstract
With the explosion of digital health records in the healthcare industry, artificial intelligence (AI) models can play a more effective role in managing patients. Machine-learning (ML) and deep-learning (DL) techniques are two methods used to develop predictive models that serve to improve clinical processes in the healthcare industry. These models are also implemented in medical imaging machines to empower them with an intelligent decision system that aids physicians in their decisions and increases the efficiency of their routine clinical practices. The physicians who work with these machines need insight into what happens in the background of the implemented models and how they work. More importantly, they need to be able to interpret the models' predictions, assess their performance, and compare them to find the one with the best performance and fewest errors. This review aims to provide an accessible overview of key evaluation metrics for physicians without AI expertise. In this review, we developed four real-world diagnostic AI models (two ML and two DL models) for breast cancer diagnosis using ultrasound images. Then, 23 of the most commonly used evaluation metrics were reviewed in an accessible manner for physicians. Finally, all metrics were calculated and applied in practice to interpret and evaluate the outputs of the models. Accessible explanations and practical applications empower physicians to effectively interpret, evaluate, and optimize AI models to ensure safety and efficacy when integrated into clinical practice.
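As a worked example of how several of the commonly reported metrics derive from confusion-matrix counts, the short sketch below uses made-up predictions; the threshold and data are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])            # ground-truth labels
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])
y_pred = (y_prob >= 0.5).astype(int)                    # binarize at an arbitrary threshold

tp = int(((y_pred == 1) & (y_true == 1)).sum())
tn = int(((y_pred == 0) & (y_true == 0)).sum())
fp = int(((y_pred == 1) & (y_true == 0)).sum())
fn = int(((y_pred == 0) & (y_true == 1)).sum())

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)                              # positive predictive value
recall = tp / (tp + fn)                                 # sensitivity
f1 = 2 * precision * recall / (precision + recall)
auc = roc_auc_score(y_true, y_prob)                     # threshold-free ranking quality
print(accuracy, precision, recall, f1, auc)
```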
Collapse
Affiliation(s)
- Ali Abbasian Ardakani
- Department of Radiology Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Omid Airom
- Department of Mathematics, University of Padova, Padova, Italy
| | - Hamid Khorshidi
- Department of Information Engineering, University of Padova, Padova, Italy
| | - Nathalie J Bureau
- Department of Radiology, Centre Hospitalier de l'Université de Montréal, Montreal, Quebec, Canada
| | - Massimo Salvi
- Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
| | - Filippo Molinari
- Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
| | - U Rajendra Acharya
- School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, Queensland, Australia
- Centre for Health Research, University of Southern Queensland, Springfield, Queensland, Australia
| |
Collapse
|
47
|
Kalavakonda V, Mohamed S, Abhay L, Muthu S. Automated Screening of Hip X-rays for Osteoporosis by Singh's Index Using Machine Learning Algorithms. Indian J Orthop 2024; 58:1449-1457. [PMID: 39324087 PMCID: PMC11420408 DOI: 10.1007/s43465-024-01246-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/01/2024] [Accepted: 07/22/2024] [Indexed: 09/11/2024]
Abstract
INTRODUCTION Osteoporosis is a significant and growing global public health problem, projected to increase in the next decade. The Singh Index (SI) is a simple, semi-quantitative evaluation tool for diagnosing osteoporosis on plain hip radiographs based on the visibility of the trabecular pattern in the proximal femur. This work aims to develop an automated tool to diagnose osteoporosis using the SI of hip radiograph images with the help of machine learning algorithms. METHODS We used 830 hip X-ray images collected from Indian men and women aged between 20 and 70 years, which were annotated and labeled with the appropriate SI. We employed three state-of-the-art machine learning algorithms-Vision Transformer (ViT), MobileNet-V3, and a Stacked Convolutional Neural Network (CNN)-for image pre-processing, feature extraction, and automation. Each algorithm was evaluated and compared for accuracy, precision, recall, and generalization capability to diagnose osteoporosis. RESULTS The ViT model achieved an overall accuracy of 62.6% with macro-averages of 0.672, 0.597, and 0.622 for precision, recall, and F1 score, respectively. MobileNet-V3 presented a more encouraging accuracy of 69.6% with macro-averages for precision, recall, and F1 score of 0.845, 0.636, and 0.652, respectively. The stacked CNN model demonstrated the strongest performance, achieving an accuracy of 93.6% with well-balanced precision, recall, and F1-score metrics. CONCLUSION The superior accuracy, precision-recall balance, and high F1-scores of the stacked CNN model make it the most reliable tool for screening radiographs and diagnosing osteoporosis using the SI.
Collapse
Affiliation(s)
- Vijaya Kalavakonda
- Department of Computing Technologies, School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Chennai, India
| | - Sameer Mohamed
- Sri Ramachandra Institute of Higher Education and Research, Chennai, India
| | - Lal Abhay
- Department of Computing Technologies, School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Chennai, India
| | - Sathish Muthu
- Orthopaedic Research Group, Coimbatore, Tamil Nadu India
- Department of Biotechnology, Faculty of Engineering, Karpagam Academy of Higher Education, Coimbatore, Tamil Nadu India
- Department of Orthopaedics, Government Medical College, Karur, Tamil Nadu India
| |
Collapse
|
48
|
Bhati D, Neha F, Amiruzzaman M. A Survey on Explainable Artificial Intelligence (XAI) Techniques for Visualizing Deep Learning Models in Medical Imaging. J Imaging 2024; 10:239. [PMID: 39452402 PMCID: PMC11508748 DOI: 10.3390/jimaging10100239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2024] [Revised: 09/14/2024] [Accepted: 09/21/2024] [Indexed: 10/26/2024] Open
Abstract
The combination of medical imaging and deep learning has significantly improved diagnostic and prognostic capabilities in the healthcare domain. Nevertheless, the inherent complexity of deep learning models poses challenges in understanding their decision-making processes. Interpretability and visualization techniques have emerged as crucial tools to unravel the black-box nature of these models, providing insights into their inner workings and enhancing trust in their predictions. This survey paper comprehensively examines various interpretation and visualization techniques applied to deep learning models in medical imaging. The paper reviews methodologies, discusses their applications, and evaluates their effectiveness in enhancing the interpretability, reliability, and clinical relevance of deep learning models in medical image analysis.
Collapse
Affiliation(s)
- Deepshikha Bhati
- Department of Computer Science, Kent State University, Kent, OH 44242, USA;
| | - Fnu Neha
- Department of Computer Science, Kent State University, Kent, OH 44242, USA;
| | - Md Amiruzzaman
- Department of Computer Science, West Chester University, West Chester, PA 19383, USA;
| |
Collapse
|
49
|
Mahdi MA, Ahamad S, Saad SA, Dafhalla A, Alqushaibi A, Qureshi R. Enhancing Predictive Accuracy for Recurrence-Free Survival in Head and Neck Tumor: A Comparative Study of Weighted Fusion Radiomic Analysis. Diagnostics (Basel) 2024; 14:2038. [PMID: 39335718 PMCID: PMC11431645 DOI: 10.3390/diagnostics14182038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2024] [Revised: 08/27/2024] [Accepted: 09/09/2024] [Indexed: 09/30/2024] Open
Abstract
Despite advancements in oncology, predicting recurrence-free survival (RFS) in head and neck (H&N) cancer remains challenging due to the heterogeneity of tumor biology and treatment responses. This study aims to address the research gap in the prognostic efficacy of traditional clinical predictors versus advanced radiomics features and to explore the potential of weighted fusion techniques for enhancing RFS prediction. We utilized clinical data, radiomic features from CT and PET scans, and various weighted fusion algorithms to stratify patients into low- and high-risk groups for RFS. The predictive performance of each model was evaluated using Kaplan-Meier survival analysis, and the significance of differences in RFS rates was assessed using confidence interval (CI) tests. The weighted fusion model with a 90% emphasis on PET features significantly outperformed individual modalities, yielding the highest C-index. Additionally, the incorporation of contextual information by varying peritumoral radii did not substantially improve prediction accuracy. While the clinical model and the radiomics model, individually, did not achieve statistical significance in survival differentiation, the combined feature set showed improved performance. The integration of radiomic features with clinical data through weighted fusion algorithms enhances the predictive accuracy of RFS outcomes in head and neck cancer. Our findings suggest that the utilization of multi-modal data helps in developing more reliable predictive models and underscore the potential of PET imaging in refining prognostic assessments. This study propels the discussion forward, indicating a pivotal step toward the adoption of precision medicine in cancer care.
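The late weighted-fusion step described above can be sketched as follows: per-modality risk scores are combined with a 0.9 PET / 0.1 CT weighting and the fused score is checked with a plain pairwise concordance index. The toy data, the convention that higher scores indicate higher risk, and the simple C-index implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
pet_risk = rng.normal(size=50)               # radiomics risk score from PET
ct_risk = rng.normal(size=50)                # radiomics risk score from CT
time = rng.exponential(24, size=50)          # recurrence-free survival time (months)
event = rng.integers(0, 2, size=50)          # 1 = recurrence observed, 0 = censored

fused = 0.9 * pet_risk + 0.1 * ct_risk       # PET-weighted late fusion

def c_index(risk, time, event):
    # Plain pairwise concordance: higher risk should pair with earlier recurrence.
    num, den = 0.0, 0.0
    for i in range(len(time)):
        for j in range(len(time)):
            if event[i] == 1 and time[i] < time[j]:      # comparable pair
                den += 1
                num += (risk[i] > risk[j]) + 0.5 * (risk[i] == risk[j])
    return num / den

print(c_index(fused, time, event))
```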
Collapse
Affiliation(s)
- Mohammed A Mahdi
- Information and Computer Science Department, College of Computer Science and Engineering, University of Ha'il, Ha'il 55476, Saudi Arabia
| | - Shahanawaj Ahamad
- Software Engineering Department, College of Computer Science and Engineering, University of Ha'il, Ha'il 55476, Saudi Arabia
| | - Sawsan A Saad
- Computer Engineering Department, College of Computer Science and Engineering, University of Ha'il, Ha'il 55476, Saudi Arabia
| | - Alaa Dafhalla
- Computer Engineering Department, College of Computer Science and Engineering, University of Ha'il, Ha'il 55476, Saudi Arabia
| | - Alawi Alqushaibi
- Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
| | - Rizwan Qureshi
- Fast School of Computing, National University of Computer and Emerging Sciences, Karachi 75270, Pakistan
| |
Collapse
|
50
|
Bi S, Yuan Q, Dai Z, Sun X, Wan Sohaimi WFB, Bin Yusoff AL. Advances in CT-based lung function imaging for thoracic radiotherapy. Front Oncol 2024; 14:1414337. [PMID: 39286020 PMCID: PMC11403405 DOI: 10.3389/fonc.2024.1414337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2024] [Accepted: 08/14/2024] [Indexed: 09/19/2024] Open
Abstract
The objective of this review is to examine the potential benefits and challenges of CT-based lung function imaging in radiotherapy over recent decades. This includes reviewing background information, defining related concepts, classifying and reviewing existing studies, and proposing directions for further investigation. The lung function imaging techniques reviewed herein encompass CT-based methods, specifically utilizing phase-resolved four-dimensional CT (4D-CT) or end-inspiratory and end-expiratory CT scans, to delineate distinct functional regions within the lungs. These methods extract crucial functional parameters, including lung volume and ventilation distribution, pivotal for assessing and characterizing the functional capacity of the lungs. CT-based lung ventilation imaging offers numerous advantages, notably in the realm of thoracic radiotherapy. By utilizing routine CT scans, additional radiation exposure and financial burdens on patients can be avoided. This imaging technique also enables the identification of different functional areas of the lung, which is crucial for minimizing radiation exposure to healthy lung tissue and predicting and detecting lung injury during treatment. In conclusion, CT-based lung function imaging holds significant promise for improving the effectiveness and safety of thoracic radiotherapy. Nevertheless, challenges persist, necessitating further research to address limitations and optimize clinical utilization. Overall, this review highlights the importance of CT-based lung function imaging as a valuable tool in radiotherapy planning and lung injury monitoring.
Collapse
Affiliation(s)
- Suyan Bi
- School of Medical Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
| | - Qingqing Yuan
- National Cancer Center/National Clinical Research Center for Cancer/ Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
| | - Zhitao Dai
- National Cancer Center/National Clinical Research Center for Cancer/ Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
| | - Xingru Sun
- Huizhou Third People's Hospital, Guangzhou Medical University, Huizhou, Guangdong, China
| | - Wan Fatihah Binti Wan Sohaimi
- Department of Nuclear Medicine Radiotherapy and Oncology, School of Medical Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
| | - Ahmad Lutfi Bin Yusoff
- Department of Nuclear Medicine Radiotherapy and Oncology, School of Medical Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
| |
Collapse
|