1. Lei J, Dai L, Jiang H, Wu C, Zhang X, Zhang Y, Yao J, Xie W, Zhang Y, Li Y, Zhang Y, Wang Y. UniBrain: Universal Brain MRI diagnosis with hierarchical knowledge-enhanced pre-training. Comput Med Imaging Graph 2025;122:102516. [PMID: 40073706] [DOI: 10.1016/j.compmedimag.2025.102516]
Abstract
Magnetic Resonance Imaging (MRI) has become a pivotal tool in diagnosing brain diseases, with a wide array of computer-aided artificial intelligence methods being proposed to enhance diagnostic accuracy. However, early studies were often limited by small-scale datasets and a narrow range of disease types, which posed challenges in model generalization. This study presents UniBrain, a hierarchical knowledge-enhanced pre-training framework designed for universal brain MRI diagnosis. UniBrain leverages a large-scale dataset comprising 24,770 imaging-report pairs from routine diagnostics for pre-training. Unlike previous approaches that either focused solely on visual representation learning or used brute-force alignment between vision and language, the framework introduces a hierarchical alignment mechanism. This mechanism extracts structured knowledge from free-text clinical reports at multiple granularities, enabling vision-language alignment at both the sequence and case levels, thereby significantly improving feature learning efficiency. A coupled vision-language perception module is further employed for text-guided multi-label classification, which facilitates zero-shot evaluation and fine-tuning of downstream tasks without modifying the model architecture. UniBrain is validated on both in-domain and out-of-domain datasets, consistently surpassing existing state-of-the-art diagnostic models and demonstrating performance on par with radiologists in specific disease categories. It shows strong generalization capabilities across diverse tasks, highlighting its potential for broad clinical application. The code is available at https://github.com/ljy19970415/UniBrain.
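UniBrain's hierarchical alignment operates at both the sequence and case level; as orientation only, the sketch below shows the plain symmetric image-report contrastive objective (CLIP-style) that such vision-language pre-training typically builds on. The function name, embedding width, and temperature are illustrative assumptions, not UniBrain's actual implementation.

```python
# Minimal CLIP-style contrastive alignment sketch (assumed, single granularity);
# UniBrain additionally aligns at the sequence and case levels, omitted here.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (batch, dim) paired image/report embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(img_emb.size(0))        # matching pairs lie on the diagonal
    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = contrastive_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```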
Affiliations
- Jiayu Lei: School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Lisong Dai: Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200233, China
- Haoyun Jiang: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, 200240, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Chaoyi Wu: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, 200240, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Xiaoman Zhang: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, 200240, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Yao Zhang: Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Jiangchao Yao: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, 200240, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Weidi Xie: School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, 200230, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Yanyong Zhang: School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Yuehua Li: Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200233, China
- Ya Zhang: School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, 200230, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Yanfeng Wang: School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, 200230, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
2. Yao Z, Xie W, Chen J, Zhan Y, Wu X, Dai Y, Pei Y, Wang Z, Zhang G. IT: An interpretable transformer model for Alzheimer's disease prediction based on PET/MR images. Neuroimage 2025;311:121210. [PMID: 40222500] [DOI: 10.1016/j.neuroimage.2025.121210]
Abstract
Alzheimer's disease (AD) represents a significant challenge due to its progressive neurodegenerative impact, particularly within an aging global demographic. This underscores the critical need for developing sophisticated diagnostic tools for its early detection and precise monitoring. Within this realm, PET/MR imaging stands out as a potent dual-modality approach that transforms sensor data into detailed perceptual mappings, thereby enriching our grasp of brain pathophysiology. To capitalize on the strengths of PET/MR imaging in diagnosing AD, we have introduced a novel deep learning framework named "IT", which is inspired by the Transformer architecture. This innovative model adeptly captures both local and global characteristics within the imaging data, refining these features through advanced feature engineering techniques to achieve a synergistic integration. The efficiency of our model is underscored by robust experimental validation, wherein it delivers superior performance on a host of evaluative benchmarks, all while maintaining low demands on computational resources. Furthermore, the features we extracted resonate with established medical theories regarding feature distribution and usage efficiency, enhancing the clinical relevance of our findings. These insights significantly bolster the arsenal of tools available for AD diagnostics and contribute to the broader narrative of deciphering brain functionality through state-of-the-art imaging modalities.
Affiliations
- Zhaomin Yao: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Weiming Xie: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Jiaming Chen: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Ying Zhan: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Xiaodan Wu: Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Yingxin Dai: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Yusong Pei: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Zhiguo Wang: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Guoxu Zhang: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
3. Pak S, Son HJ, Kim D, Woo JY, Yang I, Hwang HS, Rim D, Choi MS, Lee SH. Comparison of CNNs and Transformer Models in Diagnosing Bone Metastases in Bone Scans Using Grad-CAM. Clin Nucl Med 2025:00003072-990000000-01645. [PMID: 40237349] [DOI: 10.1097/rlu.0000000000005898]
Abstract
PURPOSE Convolutional neural networks (CNNs) have been studied for detecting bone metastases on bone scans; however, the application of ConvNeXt and transformer models has not yet been explored. This study aims to evaluate the performance of various deep learning models, including the ConvNeXt and transformer models, in diagnosing metastatic lesions from bone scans. MATERIALS AND METHODS We retrospectively analyzed bone scans from patients with cancer obtained at 2 institutions: the training and validation sets (n=4626) were from Hospital 1 and the test set (n=1428) was from Hospital 2. The deep learning models evaluated included ResNet18, the Data-Efficient Image Transformer (DeiT), the Vision Transformer (ViT Large 16), the Swin Transformer (Swin Base), and ConvNeXt Large. Gradient-weighted class activation mapping (Grad-CAM) was used for visualization. RESULTS On both the validation and test sets, the ConvNeXt Large model exhibited the best performance (0.969 and 0.885, respectively), followed by the Swin Base model (0.965 and 0.840, respectively); both significantly outperformed ResNet18 (0.892 and 0.725, respectively). Subgroup analyses revealed that all the models demonstrated greater diagnostic accuracy for patients with polymetastasis than for those with oligometastasis. Grad-CAM visualization revealed that the ConvNeXt Large model focused more on identifying local lesions, whereas the Swin Base model focused on global areas such as the axial skeleton and pelvis. CONCLUSIONS Compared with traditional CNN and transformer models, the ConvNeXt model demonstrated superior diagnostic performance in detecting bone metastases from bone scans, especially in cases of polymetastasis, suggesting its potential in medical image analysis.
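For readers unfamiliar with the visualization step, the snippet below is a minimal Grad-CAM sketch using PyTorch hooks. The torchvision ResNet18 and random input are stand-ins for the authors' trained bone-scan classifiers, not their actual models.

```python
# Grad-CAM sketch: pool gradients over space to weight the last conv feature maps.
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()     # stand-in network, untrained
feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)           # dummy "bone scan" input
model(x)[0].max().backward()              # backprop the top-class logit

w = grads["a"].mean(dim=(2, 3), keepdim=True)             # channel weights (1, C, 1, 1)
cam = torch.relu((w * feats["a"]).sum(dim=1))             # class activation map (1, 7, 7)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```

Upsampling `cam` to the input resolution and overlaying it on the scan gives the kind of heatmap used to compare where each architecture attends.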
Affiliations
- Sehyun Pak: Department of Medicine, Hallym University College of Medicine, Chuncheon, Gangwon, Republic of Korea
- Hye Joo Son: Department of Nuclear Medicine, Dankook University Medical Center, Cheonan, Chungnam, Republic of Korea
- Dongwoo Kim: Department of Nuclear Medicine, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Gyeonggi, Republic of Korea
- Ji Young Woo: Department of Radiology, Hallym University Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea
- Ik Yang: Department of Radiology, Hallym University Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea
- Hee Sung Hwang: Department of Nuclear Medicine, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Gyeonggi, Republic of Korea
- Min Seok Choi: PE Data Solution, SK hynix, Icheon, Gyeonggi, Republic of Korea
- Suk Hyun Lee: Department of Radiology, Hallym University Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea
4. Kuang H, Hu B, Wan W, Liu S, Yang S, Liao W, Yuan L, Luo G, Qiu W. A dual-branch hybrid network with bilateral-difference awareness for collateral scoring on CT angiography of acute ischemic stroke patients. Phys Med Biol 2025;70:085010. [PMID: 40179945] [DOI: 10.1088/1361-6560/adc8f5]
Abstract
Objective. Acute ischemic stroke (AIS) patients with good collaterals tend to have better outcomes after endovascular therapy. Existing collateral scoring methods rely mainly on vessel segmentation and convolutional neural networks (CNNs), often ignoring bilateral brain differences. This study aims to develop an automated collateral scoring model incorporating bilateral-difference awareness to improve prediction accuracy. Approach. We propose a new dual-branch hybrid network to achieve vessel-segmentation-free collateral scoring on the CT angiography (CTA) of 255 AIS patients. Specifically, we first adopt a data preprocessing method based on maximum intensity projection. To capture the differences between the left and right sides of the brain, we propose a novel bilateral-difference awareness module (BDAM). We then design a hybrid network that consists of a multi-scale module, a CNN branch, a transformer branch, and a feature interaction enhancement module in each stage. In addition, to learn more effective features, we propose a novel local enhancement module (LEM) and a novel global enhancement module (GEM) to strengthen the local features captured by the CNN branch and the global features of the transformer branch, respectively. Main results. Experiments on a private clinical dataset with CTA images of 255 AIS patients show that our proposed method achieves an accuracy of 85.49% and an intraclass correlation coefficient of 0.9284 for 3-point collateral scoring, outperforming 13 state-of-the-art methods. Moreover, for the binary classification tasks (good vs. non-good collateral scoring, poor vs. non-poor collateral scoring), our proposed method also achieves the best accuracies (89.02% and 92.94%). Significance. We propose a novel dual-branch hybrid network that incorporates distinct local and global enhancement modules, along with a BDAM, to achieve collateral scoring without the need for vessel segmentation. Our experimental evaluation shows that our model achieves state-of-the-art performance, providing valuable support for improving the efficiency of stroke treatment.
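The abstract does not spell out the BDAM's internals; purely as an illustration of the bilateral-difference idea, the sketch below contrasts a feature map with its left-right mirror image. The module design, channel counts, and fusion layer are assumptions, not the authors' architecture.

```python
# Assumed bilateral-difference sketch: expose left/right asymmetries by
# differencing a feature map with its horizontally flipped copy.
import torch
import torch.nn as nn

class BilateralDifference(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat):
        mirrored = torch.flip(feat, dims=[-1])   # flip along width (left vs. right)
        diff = torch.abs(feat - mirrored)        # asymmetry map
        return self.fuse(torch.cat([feat, diff], dim=1))

out = BilateralDifference(64)(torch.randn(2, 64, 32, 32))  # shape preserved
```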
Affiliations
- Hulin Kuang: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Bin Hu: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Wenfang Wan: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Shulin Liu: Department of Radiology, Xiangya Hospital, Central South University, Changsha 410008, People's Republic of China
- Shuai Yang: Department of Radiology, Xiangya Hospital, Central South University, Changsha 410008, People's Republic of China
- Weihua Liao: Department of Radiology, Xiangya Hospital, Central South University, Changsha 410008, People's Republic of China
- Li Yuan: Wuhan Children's Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430016, People's Republic of China
- Guanghua Luo: Department of Radiology, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang 421001, People's Republic of China
- Wu Qiu: School of Life Science and Technology, Advanced Bio-Medical Imaging Facility, Department of Neurology in Union Hospital, Huazhong University of Science and Technology, Wuhan 430074, People's Republic of China; Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, People's Republic of China
5. Fan C, Zhu Z, Peng B, Xuan Z, Zhu X. EAAC-Net: An Efficient Adaptive Attention and Convolution Fusion Network for Skin Lesion Segmentation. J Imaging Inform Med 2025;38:1120-1136. [PMID: 39147886] [PMCID: PMC11950606] [DOI: 10.1007/s10278-024-01223-6]
Abstract
Accurate segmentation of skin lesions in dermoscopic images is of key importance for quantitative analysis of melanoma. Although existing medical image segmentation methods significantly improve skin lesion segmentation, they still have limitations in extracting local features with global information, do not handle challenging lesions well, and usually have a large number of parameters and high computational complexity. To address these issues, this paper proposes an efficient adaptive attention and convolutional fusion network for skin lesion segmentation (EAAC-Net). We designed two parallel encoders, where the efficient adaptive attention feature extraction module (EAAM) adaptively establishes global spatial dependence and global channel dependence by constructing the adjacency matrix of the directed graph and can adaptively filter out the least relevant tokens at the coarse-grained region level, thus reducing the computational complexity of the self-attention mechanism. The efficient multiscale attention-based convolution module (EMA⋅C) utilizes multiscale attention for cross-space learning of local features extracted from the convolutional layer to enhance the representation of richly detailed local features. In addition, we designed a reverse attention feature fusion module (RAFM) to enhance the effective boundary information gradually. To validate the performance of our proposed network, we compared it with other methods on ISIC 2016, ISIC 2018, and PH2 public datasets, and the experimental results show that EAAC-Net has superior segmentation performance under commonly used evaluation metrics.
Affiliations
- Chao Fan: School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, Henan Province, China; Key Laboratory of Grain Information Processing and Control, Ministry of Education, Zhengzhou, Henan Province, China
- Zhentong Zhu: School of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan Province, 450001, China
- Bincheng Peng: School of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan Province, 450001, China
- Zhihui Xuan: School of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan Province, 450001, China
- Xinru Zhu: School of Information Science and Engineering, Henan University of Technology, Zhengzhou, Henan Province, 450001, China
6. Yan Y, Lu R, Sun J, Zhang J, Zhang Q. Breast cancer histopathology image classification using transformer with discrete wavelet transform. Med Eng Phys 2025;138:104317. [PMID: 40180530] [DOI: 10.1016/j.medengphy.2025.104317]
Abstract
Early diagnosis of breast cancer using pathological images is essential to effective treatment. With the development of deep learning techniques, breast cancer histopathology image classification methods based on neural networks have developed rapidly. However, these methods usually capture features in the spatial domain and rarely consider frequency feature distributions, which limits classification performance to some extent. This paper proposes a novel breast cancer histopathology image classification network, called DWNAT-Net, which introduces the Discrete Wavelet Transform (DWT) to the Neighborhood Attention Transformer (NAT). DWT decomposes inputs into different frequency bands through iterative filtering and downsampling, extracting frequency information while retaining spatial information. NAT utilizes Neighborhood Attention (NA) to confine the attention computation to a local neighborhood around each token, enabling efficient modeling of local dependencies. The proposed method was evaluated on the BreakHis and BACH datasets, yielding impressive image-level recognition accuracy rates: 99.66% on the BreakHis dataset and 91.25% on the BACH dataset, demonstrating competitive performance compared to state-of-the-art methods.
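For reference, the single-level 2D DWT that underlies this kind of frequency branch can be reproduced with PyWavelets; the Haar wavelet and random patch below are illustrative choices, not necessarily the paper's configuration.

```python
# One 2D DWT level: an LL approximation plus horizontal/vertical/diagonal details.
import numpy as np
import pywt

patch = np.random.rand(224, 224)                 # dummy grayscale histopathology patch
ll, (lh, hl, hh) = pywt.dwt2(patch, wavelet="haar")
print(ll.shape)                                  # (112, 112): resolution halved
# Re-applying dwt2 to the LL band yields the iterative multi-band decomposition.
```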
Affiliations
- Yuting Yan: School of Computer Science and Engineering, Dalian Minzu University, 116650, Dalian, China
- Ruidong Lu: School of Computer Science and Engineering, Dalian Minzu University, 116650, Dalian, China
- Jian Sun: School of Computer Science and Engineering, Dalian Minzu University, 116650, Dalian, China
- Jianxin Zhang: School of Computer Science and Engineering, Dalian Minzu University, 116650, Dalian, China; SEAC Key Lab of Big Data Applied Technology, School of Computer Science and Engineering, Dalian Minzu University, 116650, Dalian, China
- Qiang Zhang: Key Lab of Advanced Design and Intelligent Computing (Ministry of Education), Dalian University, 116622, Dalian, China
7. Wang Z, Li F, Cai J, Xue Z, Du K, Tao Y, Zhang H, Zhou Y, Fan H, Wang Z. Identification of lesion bioactivity in hepatic cystic echinococcosis using a transformer-based fusion model. J Infect 2025;90:106455. [PMID: 40049526] [DOI: 10.1016/j.jinf.2025.106455]
Abstract
BACKGROUND Differentiating whether hepatic cystic echinococcosis (HCE) lesions exhibit biological activity is essential for developing effective treatment plans. This study evaluates the performance of a Transformer-based fusion model in assessing HCE lesion activity. METHODS This study analyzed CT images and clinical variables from 700 HCE patients across three hospitals from 2018 to 2023. Univariate and multivariate logistic regression analyses were conducted for the selection of clinical variables to construct a clinical model. Radiomics features were extracted from CT images using Pyradiomics to develop a radiomics model. Additionally, a 2D deep learning model and a 3D deep learning model were trained using the CT images. The fusion model was constructed using feature-level fusion, decision-level fusion, and a Transformer network architecture, allowing for the analysis of the discriminative ability and correlation among radiomics features, 2D deep learning features, and 3D deep learning features, while comparing the classification performance of the three multimodal fusion models. RESULTS In comparison to radiomics and 2D deep learning features, the 3D deep learning features exhibited superior discriminative ability in identifying the biological activity of HCE lesions. The Transformer-based fusion model demonstrated the highest performance in both the internal validation set and the external validation set, achieving AUC values of 0.997 (0.992-1.000) and 0.944 (0.911-0.977), respectively, thereby outperforming both the feature-level and decision-level fusion models, and enabling precise differentiation of HCE lesion biological activity. CONCLUSION The Transformer multimodal fusion model integrates clinical features, radiomics features, and both 2D and 3D deep learning features, facilitating accurate differentiation of the biological activity of HCE lesions and exhibiting significant potential for clinical application.
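As a rough sketch of what transformer-based feature-level fusion can look like, the snippet below treats each feature source (clinical, radiomics, 2D deep, 3D deep) as one token and lets a small encoder attend across them. Dimensions, depth, pooling, and the binary head are assumptions for illustration, not the authors' architecture.

```python
# Assumed fusion sketch: one token per feature source, jointly attended.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 1)            # active vs. inactive lesion logit

    def forward(self, tokens):                   # tokens: (batch, n_sources, dim)
        fused = self.encoder(tokens).mean(dim=1) # pool the attended source tokens
        return self.head(fused)

# Four sources (clinical, radiomics, 2D DL, 3D DL), each projected to 128-d upstream.
logit = FusionClassifier()(torch.randn(2, 4, 128))
```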
Affiliations
- Fuyuan Li: Qinghai University, Xining, Qinghai, China
- Junjie Cai: Qinghai University, Xining, Qinghai, China
- Kaihao Du: Qinghai University, Xining, Qinghai, China
- Hanxi Zhang: Department of Hepatobiliary and Pancreatic Surgery, Qinghai University Affiliated Hospital, Xining, Qinghai, China
- Ying Zhou: Department of Hepatobiliary and Pancreatic Surgery, Qinghai University Affiliated Hospital, Xining, Qinghai, China
- Haining Fan: Department of Hepatobiliary and Pancreatic Surgery, Qinghai University Affiliated Hospital, Xining, Qinghai, China
- Zhan Wang: Department of Medical Engineering and Translational Applications, Qinghai University Affiliated Hospital, Xining, Qinghai, China
8. Kanakarajan H, De Baene W, Hanssens P, Sitskoorn M. Automated segmentation of brain metastases in T1-weighted contrast-enhanced MR images pre and post stereotactic radiosurgery. BMC Med Imaging 2025;25:101. [PMID: 40140740] [PMCID: PMC11948752] [DOI: 10.1186/s12880-025-01643-y]
Abstract
BACKGROUND AND PURPOSE Accurate segmentation of brain metastases on Magnetic Resonance Imaging (MRI) is a tedious and time-consuming task for radiologists that could be optimized with deep learning (DL). Previous studies assessed several DL algorithms but focused on training and testing the models on the planning MRI only. The purpose of this study is to evaluate well-known DL approaches (nnU-Net and MedNeXt) for their performance on both planning and follow-up MRI. MATERIALS AND METHODS Pre-treatment brain MRIs were retrospectively collected for 255 patients at Elisabeth-TweeSteden Hospital (ETZ): 201 for training and 54 for testing, including follow-up MRIs for the test set. To increase heterogeneity, we added the publicly available MRI scans of 75 patients from the Mathematical Oncology Laboratory to the training data. The performance was compared between the two models, with and without the addition of the public data. To statistically compare the Dice Similarity Coefficient (DSC) of the two models trained on different datasets over multiple time points, we used linear mixed models. RESULTS All models obtained a good DSC (DSC ≥ 0.93) for planning MRI. MedNeXt trained with combined data provided the best DSC for follow-ups at 6, 15, and 21 months (DSC of 0.74, 0.74, and 0.70, respectively) and jointly the best DSC at three months with MedNeXt trained on ETZ data only (DSC of 0.78) and at 12 months with nnU-Net trained with combined data (DSC of 0.71). On the other hand, nnU-Net trained with combined data provided the best sensitivity and FNR for most follow-ups. The statistical analysis showed that MedNeXt provides a higher DSC for both datasets and that the addition of public data to the training dataset results in a statistically significant increase in performance in both models. CONCLUSION The models achieved a good performance score for planning MRI. Though the models performed less effectively for follow-ups, the addition of public data enhanced their performance, providing a viable solution to improve their efficacy for the follow-ups. These algorithms hold promise as a valuable tool for clinicians for automated segmentation of planning and follow-up MRI scans during stereotactic radiosurgery treatment planning and response evaluations, respectively. CLINICAL TRIAL NUMBER Not applicable.
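The Dice Similarity Coefficient reported throughout is simple to compute; below is a minimal version on dummy binary masks (not study data).

```python
# DSC = 2|P ∩ T| / (|P| + |T|) for binary prediction and ground-truth masks.
import numpy as np

def dice(pred, truth, eps=1e-8):
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)

a = np.zeros((64, 64)); a[10:30, 10:30] = 1      # dummy predicted lesion
b = np.zeros((64, 64)); b[15:35, 12:32] = 1      # dummy ground-truth lesion
print(round(dice(a, b), 3))                      # partial-overlap score
```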
Affiliations
- Hemalatha Kanakarajan: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Wouter De Baene: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Patrick Hanssens: Gamma Knife Center, Elisabeth-TweeSteden Hospital, Tilburg, The Netherlands; Department of Neurosurgery, Elisabeth-TweeSteden Hospital, Tilburg, The Netherlands
- Margriet Sitskoorn: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
9. Liu Y, Li J, Zhao C, Zhang Y, Chen Q, Qin J, Dong L, Wang T, Jiang W, Lei B. FAMF-Net: Feature Alignment Mutual Attention Fusion With Region Awareness for Breast Cancer Diagnosis via Imbalanced Data. IEEE Trans Med Imaging 2025;44:1153-1167. [PMID: 39499601] [DOI: 10.1109/tmi.2024.3485612]
Abstract
Automatic and accurate classification of breast cancer in multimodal ultrasound images is crucial to improve patients' diagnosis and treatment effect and save medical resources. Methodologically, the fusion of multimodal ultrasound images often encounters challenges such as misalignment, limited utilization of complementary information, poor interpretability in feature fusion, and imbalances in sample categories. To solve these problems, we propose a feature alignment mutual attention fusion method (FAMF-Net), which consists of a region awareness alignment (RAA) block, a mutual attention fusion (MAF) block, and a reinforcement learning-based dynamic optimization strategy (RDO). Specifically, RAA achieves region awareness through class activation mapping and performs translation transformation to achieve feature alignment. When MAF utilizes a mutual attention mechanism for feature interaction fusion, it mines edge and color features separately in B-mode and shear wave elastography images, enhancing the complementarity of features and improving interpretability. Finally, RDO uses the distribution of samples and prediction probabilities during training as the state of reinforcement learning to dynamically optimize the weights of the loss function, thereby solving the problem of class imbalance. The experimental results based on our clinically obtained dataset demonstrate the effectiveness of the proposed method. Our code will be available at: https://github.com/Magnety/Multi_modal_Image.
10. Jiang K, Xie Y, Zhang X, Zhang X, Zhou B, Li M, Chen Y, Hu J, Zhang Z, Chen S, Yu K, Qiu C, Zhang X. Fully and Weakly Supervised Deep Learning for Meniscal Injury Classification and Location Based on MRI. J Imaging Inform Med 2025;38:191-202. [PMID: 39020156] [PMCID: PMC11811310] [DOI: 10.1007/s10278-024-01198-4]
Abstract
Meniscal injury is a common cause of knee joint pain and a precursor to knee osteoarthritis (KOA). The purpose of this study is to develop an automatic pipeline for meniscal injury classification and localization using fully and weakly supervised networks based on MRI images. In this retrospective study, data were from the Osteoarthritis Initiative (OAI). The MR images were reconstructed using a sagittal intermediate-weighted fat-suppressed turbo spin-echo sequence. (1) We used 130 knees from the OAI to develop the LGSA-UNet model, which fuses the features of adjacent slices and adjusts the blocks in Siam to enable the central slice to obtain rich contextual information. (2) One thousand seven hundred and fifty-six knees from the OAI were included to establish segmentation and classification models. The segmentation model achieved a Dice coefficient ranging from 0.84 to 0.93. The AUC values ranged from 0.85 to 0.95 in the binary models. The accuracy for the three types of menisci (normal, tear, and maceration) ranged from 0.60 to 0.88. Furthermore, 206 knees from the orthopedic hospital were used as an external validation data set to evaluate the performance of the model. The segmentation and classification models still performed well on the external validation set. To compare the diagnostic performances between the deep learning (DL) models and radiologists, the external validation sets were sent to two radiologists. The binary classification model outperformed the diagnostic performance of the junior radiologist (0.82-0.87 versus 0.74-0.88). This study highlights the potential of DL in knee meniscus segmentation and injury classification, which can help improve diagnostic efficiency.
Affiliations
- Kexin Jiang: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
- Yuhan Xie: School of Electronics and Communication Engineering, Sun Yat-sen University, Guangzhou, China
- Xintao Zhang: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
- Xinru Zhang: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
- Beibei Zhou: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
- Mianwen Li: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
- Yanjun Chen: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
- Jiaping Hu: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
- Zhiyong Zhang: School of Electronics and Communication Engineering, Sun Yat-sen University, Guangzhou, China
- Shaolong Chen: School of Electronics and Communication Engineering, Sun Yat-sen University, Guangzhou, China
- Keyan Yu: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
- Changzhen Qiu: School of Electronics and Communication Engineering, Sun Yat-sen University, Guangzhou, China
- Xiaodong Zhang: Department of Medical Imaging, The Third Affiliated Hospital, Southern Medical University (Academy of Orthopedics Guangdong Province), 183 Zhongshan Ave W, Guangzhou, 510630, China
11. Li X, Zhao L, Zhang L, Wu Z, Liu Z, Jiang H, Cao C, Xu S, Li Y, Dai H, Yuan Y, Liu J, Li G, Zhu D, Yan P, Li Q, Liu W, Liu T, Shen D. Artificial General Intelligence for Medical Imaging Analysis. IEEE Rev Biomed Eng 2025;18:113-129. [PMID: 39509310] [DOI: 10.1109/rbme.2024.3493775]
Abstract
Large-scale Artificial General Intelligence (AGI) models, including Large Language Models (LLMs) such as ChatGPT/GPT-4, have achieved unprecedented success in a variety of general domain tasks. Yet, when applied directly to specialized domains like medical imaging, which require in-depth expertise, these models face notable challenges arising from the medical field's inherent complexities and unique characteristics. In this review, we delve into the potential applications of AGI models in medical imaging and healthcare, with a primary focus on LLMs, Large Vision Models, and Large Multimodal Models. We provide a thorough overview of the key features and enabling techniques of LLMs and AGI, and further examine the roadmaps guiding the evolution and implementation of AGI models in the medical sector, summarizing their present applications, potentialities, and associated challenges. In addition, we highlight potential future research directions, offering a holistic view on upcoming ventures. This comprehensive review aims to offer insights into the future implications of AGI in medical imaging, healthcare, and beyond.
12. Yue Y, Zeng X, Lin H, Xu J, Zhang F, Zhou K, Li L, Li Z. A deep learning based smartphone application for early detection of nasopharyngeal carcinoma using endoscopic images. NPJ Digit Med 2024;7:384. [PMID: 39738998] [DOI: 10.1038/s41746-024-01403-2]
Abstract
Nasal endoscopy is crucial for the early detection of nasopharyngeal carcinoma (NPC), but its accuracy relies heavily on the clinician's expertise, posing challenges for primary healthcare providers. Here, we retrospectively analysed 39,340 nasal endoscopic white-light images from three high-incidence NPC centres, utilising eight advanced deep learning models to develop an Internet-enabled smartphone application, "Nose-Keeper", that can be used for early detection of NPC and five prevalent nasal diseases and assessment of healthy individuals. Our app demonstrated a remarkable overall accuracy of 92.27% (95% Confidence Interval (CI): 90.66%-93.61%). Notably, its sensitivity and specificity in NPC detection achieved 96.39% and 99.91%, respectively, outperforming nine experienced otolaryngologists. Explainable artificial intelligence was employed to highlight key lesion areas, improving Nose-Keeper's decision-making accuracy and safety. Nose-Keeper can assist primary healthcare providers in diagnosing NPC and common nasal diseases efficiently, offering a valuable resource for people in high-incidence NPC regions to manage nasal cavity health effectively.
Affiliations
- Yubiao Yue: School of Mathematics and Systems Science, Guangdong Polytechnic Normal University, Guangzhou, China; School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Xinyu Zeng: Department of Otorhinolaryngology, The Second Affiliated Hospital of Shenzhen University, Shenzhen, China
- Huanjie Lin: Department of Radiology, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jialong Xu: School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Fan Zhang: Department of Science and Education, Foshan Sanshui District People's Hospital, Foshan, China
- KeLin Zhou: Department of Otorhinolaryngology, Leizhou People's Hospital, Leizhou, China
- Li Li: Department of Otorhinolaryngology, Leizhou People's Hospital, Leizhou, China
- Zhenzhang Li: School of Mathematics and Systems Science, Guangdong Polytechnic Normal University, Guangzhou, China; School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China; Eleflai Intelligent Technology (Shenzhen) Co. Ltd, Shenzhen, China
13. Nguyen MTP, Phan Tran MK, Nakano T, Tran TH, Nguyen QDN. Partial Attention in Global Context and Local Interaction for Addressing Noisy Labels and Weighted Redundancies on Medical Images. Sensors (Basel) 2024;25:163. [PMID: 39796954] [PMCID: PMC11722591] [DOI: 10.3390/s25010163]
Abstract
Recently, the application of deep neural networks to detect anomalies on medical images has been facing the appearance of noisy labels, including overlapping objects and similar classes. This study addresses the challenge by proposing a unique attention module that can assist deep neural networks in focusing on important object features in noisy medical image conditions. The module integrates global context modeling to create long-range dependencies and local interactions to enable channel attention through 1D convolution; it not only performs well with noisy labels but also consumes significantly fewer resources, without any dimensionality reduction. The module is named Global Context and Local Interaction (GCLI). We further propose a partial attention strategy for the GCLI module, aiming to efficiently reduce weighted redundancies. This strategy utilizes a subset of channels for GCLI to produce attention weights instead of considering every single channel, greatly reducing the risk of introducing weighted redundancies caused by modeling global context. For classification, our proposed method is able to assist ResNet34 in achieving up to 82.5% accuracy on the Chaoyang test set, the highest figure among the other SOTA attention modules without using any processing filter to reduce the effect of noisy labels. For object detection, GCLI is able to boost the capability of YOLOv8 up to 52.1% mAP50 on the GRAZPEDWRI-DX test set, demonstrating the highest performance among other attention modules and ranking second in the mAP50 metric on the VinDR-CXR test set. In terms of model complexity, our proposed GCLI module consumes up to 225 times fewer extra parameters and achieves more than 30% faster inference than the other attention modules.
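GCLI's channel attention via 1D convolution without dimensionality reduction follows the ECA line of work; the sketch below shows that core mechanism only, omitting the paper's global-context modeling and partial-attention channel subsetting. Kernel size and shapes are illustrative.

```python
# ECA-style channel attention sketch: a 1D conv over pooled channel descriptors.
import torch
import torch.nn as nn

class Conv1dChannelAttention(nn.Module):
    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                                  # x: (B, C, H, W)
        ctx = x.mean(dim=(2, 3))                           # global average per channel
        attn = self.conv(ctx.unsqueeze(1)).squeeze(1)      # local cross-channel mixing
        return x * torch.sigmoid(attn)[:, :, None, None]   # reweight channels

y = Conv1dChannelAttention()(torch.randn(2, 64, 16, 16))   # output shape matches input
```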
Affiliations
- Minh Tai Pham Nguyen: Faculty of Advanced Program, Ho Chi Minh City Open University, Ho Chi Minh City 700000, Vietnam
- Minh Khue Phan Tran: Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City 700000, Vietnam
- Tadashi Nakano: Department of Core Informatics, Graduate School of Informatics, Osaka Metropolitan University, Osaka 558-8585, Japan
- Thi Hong Tran: Department of Core Informatics, Graduate School of Informatics, Osaka Metropolitan University, Osaka 558-8585, Japan
- Quoc Duy Nam Nguyen: Department of Core Informatics, Graduate School of Informatics, Osaka Metropolitan University, Osaka 558-8585, Japan
14. Bouzarjomehri N, Barzegar M, Rostami H, Keshavarz A, Asghari AN, Azad ST. Multi-modal classification of breast cancer lesions in Digital Mammography and contrast enhanced spectral mammography images. Comput Biol Med 2024;183:109266. [PMID: 39405734] [DOI: 10.1016/j.compbiomed.2024.109266]
Abstract
Breast cancer ranks as the second most prevalent cancer in women, is recognized as one of the most dangerous types of cancer, and is on the rise globally. Regular screenings are essential for early-stage treatment. Digital mammography (DM) is the most recognized and widely used technique for breast cancer screening. Contrast-Enhanced Spectral Mammography (CESM or CM) is used in conjunction with DM to detect and identify hidden abnormalities, particularly in dense breast tissue where DM alone might not be as effective. In this work, we explore the effectiveness of each modality (CM, DM, or both) in detecting breast cancer lesions using deep learning methods. We introduce an architecture for detecting and classifying breast cancer lesions in DM and CM images in Craniocaudal (CC) and Mediolateral Oblique (MLO) views. The proposed architecture (JointNet) consists of a convolution module for extracting local features, a transformer module for extracting long-range features, and a feature fusion layer to fuse the local features, global features, and global features weighted based on the local ones. This significantly enhances the accuracy of classifying DM and CM images into normal or abnormal categories and lesion classification into benign or malignant. Using our architecture as a backbone, three lesion classification pipelines are introduced that utilize attention mechanisms focused on lesion shape, texture, and overall breast texture, examining the critical features for effective lesion classification. The results demonstrate that our proposed methods outperform their components in classifying images as normal or abnormal and mitigate the limitations of independently using the transformer module or the convolution module. An ensemble model is also introduced to explore the effect of each modality and each view to increase our baseline architecture's accuracy. The results demonstrate superior performance compared with other similar works. The best performance on DM images was achieved with the semi-automatic AOL Lesion Classification Pipeline, yielding an accuracy of 98.85%, AUROC of 0.9965, F1-score of 98.85%, precision of 98.85%, and specificity of 98.85%. For CM images, the highest results were obtained using the automatic AOL Lesion Classification Pipeline, with an accuracy of 97.47%, AUROC of 0.9771, F1-score of 97.34%, precision of 94.45%, and specificity of 97.23%. The semi-automatic ensemble AOL Classification Pipeline provided the best overall performance when using both DM and CM images, with an accuracy of 94.74%, F1-score of 97.67%, specificity of 93.75%, and sensitivity of 95.45%. Furthermore, we explore the comparative effectiveness of CM and DM images in deep learning models, indicating that while CM images offer clearer insights to the human eye, our model trained on DM images yields better results using Attention on Lesion (AOL) techniques. The research also suggests that a multimodal approach using both DM and CM images and ensemble learning could provide more robust classification outcomes.
Affiliations
- Narjes Bouzarjomehri: Department of Computer Engineering, Faculty of Intelligent Systems Engineering and Data Science, Persian Gulf University, Bushehr, 7516913817, Iran
- Mohammad Barzegar: Department of Computer Engineering, Faculty of Intelligent Systems Engineering and Data Science, Persian Gulf University, Bushehr, 7516913817, Iran
- Habib Rostami: Department of Computer Engineering, Faculty of Intelligent Systems Engineering and Data Science, Persian Gulf University, Bushehr, 7516913817, Iran
- Ahmad Keshavarz: Department of Electrical Engineering, Faculty of Intelligent Systems Engineering and Data Science, Persian Gulf University, Bushehr, 7516913817, Iran
- Ahmad Navid Asghari: Department of Computer Engineering, Faculty of Intelligent Systems Engineering and Data Science, Persian Gulf University, Bushehr, 7516913817, Iran
- Saeed Talatian Azad: Department of Computer Engineering, Faculty of Intelligent Systems Engineering and Data Science, Persian Gulf University, Bushehr, 7516913817, Iran
15. Oghbaie M, Araújo T, Schmidt-Erfurth U, Bogunović H. VLFATRollout: Fully transformer-based classifier for retinal OCT volumes. Comput Med Imaging Graph 2024;118:102452. [PMID: 39489098] [DOI: 10.1016/j.compmedimag.2024.102452]
Abstract
BACKGROUND AND OBJECTIVE Despite the promising capabilities of 3D transformer architectures in video analysis, their application to high-resolution 3D medical volumes encounters several challenges. One major limitation is the high number of 3D patches, which reduces the efficiency of the global self-attention mechanisms of transformers. Additionally, background information can distract vision transformers from focusing on crucial areas of the input image, thereby introducing noise into the final representation. Moreover, the variability in the number of slices per volume complicates the development of models capable of processing input volumes of any resolution, while simple solutions like subsampling may risk losing essential diagnostic details. METHODS To address these challenges, we introduce an end-to-end transformer-based framework, variable length feature aggregator transformer rollout (VLFATRollout), to classify volumetric data. The proposed VLFATRollout offers several advantages. First, it can effectively mine slice-level foreground-background information with the help of the transformer's attention matrices. Second, randomization of volume-wise resolution (i.e., the number of slices) during training enhances the learning capacity of the learnable positional embedding (PE) assigned to each volume slice. This technique allows the PEs to generalize across neighboring slices, facilitating the handling of high-resolution volumes at test time. RESULTS VLFATRollout was thoroughly tested on the retinal optical coherence tomography (OCT) volume classification task, demonstrating a notable average improvement of 5.47% in balanced accuracy over the leading convolutional models for a 5-class diagnostic task. These results emphasize the effectiveness of our framework in enhancing slice-level representation and its adaptability across different volume resolutions, paving the way for advanced transformer applications in medical image analysis. The code is available at https://github.com/marziehoghbaie/VLFATRollout/.
Affiliations
- Marzieh Oghbaie: Christian Doppler Laboratory for Artificial Intelligence in Retina, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria; Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria
- Teresa Araújo: Christian Doppler Laboratory for Artificial Intelligence in Retina, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria; Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria
- Hrvoje Bogunović: Christian Doppler Laboratory for Artificial Intelligence in Retina, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria; Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria
16. Su Y, Xia X, Sun R, Yuan J, Hua Q, Han B, Gong J, Nie S. Res-TransNet: A Hybrid Deep Learning Network for Predicting Pathological Subtypes of Lung Adenocarcinoma in CT Images. J Imaging Inform Med 2024;37:2883-2894. [PMID: 38861071] [PMCID: PMC11612082] [DOI: 10.1007/s10278-024-01149-z]
Abstract
This study aims to develop a CT-based hybrid deep learning network to predict pathological subtypes of early-stage lung adenocarcinoma by integrating residual network (ResNet) with Vision Transformer (ViT). A total of 1411 pathologically confirmed ground-glass nodules (GGNs) retrospectively collected from two centers were used as internal and external validation sets for model development. 3D ResNet and ViT were applied to investigate two deep learning frameworks to classify three subtypes of lung adenocarcinoma, namely invasive adenocarcinoma (IAC), minimally invasive adenocarcinoma, and adenocarcinoma in situ. To further improve the model performance, four Res-TransNet based models were proposed by integrating ResNet and ViT with different ensemble learning strategies. Two classification tasks involving predicting IAC from Non-IAC (Task 1) and classifying three subtypes (Task 2) were designed and conducted in this study. For Task 1, the optimal Res-TransNet model yielded area under the receiver operating characteristic curve (AUC) values of 0.986 and 0.933 on the internal and external validation sets, which were significantly higher than those of the ResNet and ViT models (p < 0.05). For Task 2, the optimal fusion model generated an accuracy and weighted F1 score of 68.3% and 66.1% on the external validation set. The experimental results demonstrate that Res-TransNet can significantly increase the classification performance compared with the two basic models and has the potential to assist radiologists in precision diagnosis.
Affiliations
- Yue Su: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Xianwu Xia: Department of Oncology Intervention, Municipal Hospital Affiliated of Taizhou University, Taizhou, Zhejiang, 318000, China
- Rong Sun: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Jianjun Yuan: Department of Oncology Intervention, Municipal Hospital Affiliated of Taizhou University, Taizhou, Zhejiang, 318000, China
- Qianjin Hua: Department of Oncology Intervention, Municipal Hospital Affiliated of Taizhou University, Taizhou, Zhejiang, 318000, China
- Baosan Han: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China; Department of Breast Surgery, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China
- Jing Gong: Department of Radiology, Fudan University Shanghai Cancer Center, 270 Dongan Road, Shanghai, 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Shengdong Nie: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
17. Gasmi K, Ben Aoun N, Alsalem K, Ltaifa IB, Alrashdi I, Ammar LB, Mrabet M, Shehab A. Enhanced brain tumor diagnosis using combined deep learning models and weight selection technique. Front Neuroinform 2024;18:1444650. [PMID: 39659489] [PMCID: PMC11628532] [DOI: 10.3389/fninf.2024.1444650]
Abstract
Brain tumor classification is a critical task in medical imaging, as accurate diagnosis directly influences treatment planning and patient outcomes. Traditional methods often fall short in achieving the required precision due to the complex and heterogeneous nature of brain tumors. In this study, we propose an innovative approach to brain tumor multi-classification by leveraging an ensemble learning method that combines advanced deep learning models with an optimal weighting strategy. Our methodology integrates Vision Transformers (ViT) and EfficientNet-V2 models, both renowned for their powerful feature extraction capabilities in medical imaging. This model enhances the feature extraction step by capturing both global and local features, thanks to the combination of different deep learning models with the ViT model. These models are then combined using a weighted ensemble approach, where each model's prediction is assigned a weight. To optimize these weights, we employ a genetic algorithm, which iteratively selects the best weight combinations to maximize classification accuracy. We trained and validated our ensemble model using a well-curated dataset comprising labeled brain MRI images. The model's performance was benchmarked against standalone ViT and EfficientNet-V2 models, as well as other traditional classifiers. The ensemble approach achieved a notable improvement in classification accuracy, precision, recall, and F1-score compared to individual models. Specifically, our model attained an accuracy rate of 95%, significantly outperforming existing methods. This study underscores the potential of combining advanced deep learning models with a genetic algorithm-optimized weighting strategy to tackle complex medical classification tasks. The enhanced diagnostic precision offered by our ensemble model can lead to better-informed clinical decisions, ultimately improving patient outcomes. Furthermore, our approach can be generalized to other medical imaging classification problems, paving the way for broader applications of AI in healthcare. This advancement in brain tumor classification contributes valuable insights to the field of medical AI, supporting the ongoing efforts to integrate advanced computational tools in clinical practice.
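To make the genetic-algorithm weighting concrete, here is a deliberately minimal sketch that evolves two ensemble weights on random stand-in predictions; the population size, operators, and fitness (plain accuracy) are illustrative assumptions, not the paper's exact setup.

```python
# Minimal GA sketch: evolve w so that w[0]*p_vit + w[1]*p_eff maximizes accuracy.
import numpy as np

rng = np.random.default_rng(0)
p_vit, p_eff = rng.random((100, 4)), rng.random((100, 4))  # dummy class probabilities
labels = rng.integers(0, 4, 100)                           # dummy ground truth

def fitness(w):
    blend = w[0] * p_vit + w[1] * p_eff
    return (blend.argmax(axis=1) == labels).mean()         # ensemble accuracy

pop = rng.random((20, 2))
pop /= pop.sum(axis=1, keepdims=True)                      # weights sum to 1
for _ in range(30):                                        # generations
    ranked = pop[np.argsort([fitness(w) for w in pop])]
    parents = ranked[-10:]                                 # keep the fittest half
    children = (parents[rng.integers(0, 10, 10)] +
                parents[rng.integers(0, 10, 10)]) / 2      # crossover by averaging
    children += rng.normal(0, 0.05, children.shape)        # mutation
    pop = np.clip(np.vstack([parents, children]), 1e-6, None)
    pop /= pop.sum(axis=1, keepdims=True)
best = max(pop, key=fitness)                               # best weight pair found
```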
Collapse
Affiliation(s)
- Karim Gasmi
- Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
| | - Najib Ben Aoun
- College of Computing and Information, Al-Baha University, Alaqiq, Saudi Arabia
- REGIM-Lab: Research Groups in Intelligent Machines, National School of Engineers of Sfax (ENIS), University of Sfax, Sfax, Tunisia
| | - Khalaf Alsalem
- Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
| | - Ibtihel Ben Ltaifa
- STIH: Sens Texte Informatique Histoire, Sorbonne University, Paris, France
| | - Ibrahim Alrashdi
- Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
| | | | - Manel Mrabet
- Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
| | - Abdulaziz Shehab
- Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
| |
Collapse
|
18
|
Zheng Q, Zhao W, Wu C, Zhang X, Dai L, Guan H, Li Y, Zhang Y, Wang Y, Xie W. Large-scale long-tailed disease diagnosis on radiology images. Nat Commun 2024; 15:10147. [PMID: 39578456 PMCID: PMC11584732 DOI: 10.1038/s41467-024-54424-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Accepted: 11/08/2024] [Indexed: 11/24/2024] Open
Abstract
Developing a generalist radiology diagnosis system can greatly enhance clinical diagnostics. In this paper, we introduce RadDiag, a foundation model supporting 2D and 3D inputs across various modalities and anatomies, using a transformer-based fusion module for comprehensive disease diagnosis. Due to patient privacy concerns and the lack of large-scale radiology diagnosis datasets, we utilize high-quality, clinician-reviewed radiological images available online with diagnosis labels. Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5568 disorders (930 unique ICD-10-CM codes). Experimentally, our RadDiag achieves 95.14% AUC on internal evaluation with the knowledge-enhancement strategy. Additionally, RadDiag can be applied zero-shot or fine-tuned to external diagnosis datasets sourced from various medical centers, demonstrating state-of-the-art results. In conclusion, we show that publicly shared medical data on the Internet is a tremendous and valuable resource that can potentially support building strong models for image understanding in healthcare.
Collapse
Affiliation(s)
- Qiaoyu Zheng
- Shanghai Jiao Tong University, Shanghai, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Weike Zhao
- Shanghai Jiao Tong University, Shanghai, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Chaoyi Wu
- Shanghai Jiao Tong University, Shanghai, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Xiaoman Zhang
- Shanghai Jiao Tong University, Shanghai, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Lisong Dai
- Shanghai Jiao Tong University, Shanghai, China
- Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University, Shanghai, China
| | - Hengyu Guan
- Department of Reproductive Medicine, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shanghai Key Laboratory for Assisted Reproduction and Reproductive Genetics, Shanghai, China
| | - Yuehua Li
- Shanghai Jiao Tong University, Shanghai, China
- Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University, Shanghai, China
| | - Ya Zhang
- Shanghai Jiao Tong University, Shanghai, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Yanfeng Wang
- Shanghai Jiao Tong University, Shanghai, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, China.
| | - Weidi Xie
- Shanghai Jiao Tong University, Shanghai, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, China.
| |
Collapse
|
19
|
Kim S, Park H, Park SH. A review of deep learning-based reconstruction methods for accelerated MRI using spatiotemporal and multi-contrast redundancies. Biomed Eng Lett 2024; 14:1221-1242. [PMID: 39465106 PMCID: PMC11502678 DOI: 10.1007/s13534-024-00425-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2024] [Revised: 08/27/2024] [Accepted: 09/06/2024] [Indexed: 10/29/2024] Open
Abstract
Accelerated magnetic resonance imaging (MRI) has played an essential role in reducing data acquisition time for MRI. Acceleration can be achieved by acquiring fewer data points in k-space, which results in various artifacts in the image domain. Conventional reconstruction methods have resolved these artifacts by utilizing multi-coil information, but with limited robustness. Recently, numerous deep learning-based reconstruction methods have been developed, enabling outstanding reconstruction performance at higher acceleration; advances in hardware and the development of specialized network architectures have produced these achievements. Moreover, MRI signals contain various kinds of redundant information, including multi-coil redundancy, multi-contrast redundancy, and spatiotemporal redundancy. Exploiting this redundant information in combination with deep learning approaches allows not only higher acceleration but also well-preserved details in the reconstructed images. Consequently, this review introduces the basic concepts of deep learning and conventional accelerated MRI reconstruction methods, followed by a review of recent deep learning-based reconstruction methods that exploit various redundancies. Lastly, the paper concludes by discussing the challenges, limitations, and potential directions of future developments.
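As background to why such reconstruction networks are needed, the following numpy snippet (ours, not from the review) shows how undersampling k-space and zero-filling the missing lines produces the aliasing artifacts that deep models learn to remove.

```python
# A minimal numpy illustration of zero-filled reconstruction from
# undersampled k-space; the "anatomy" is a toy square phantom.
import numpy as np

image = np.zeros((128, 128))
image[40:88, 40:88] = 1.0                          # toy anatomy
kspace = np.fft.fftshift(np.fft.fft2(image))

# Keep every 4th phase-encode line plus the low-frequency center (R ~ 4).
mask = np.zeros_like(kspace, dtype=bool)
mask[::4, :] = True
mask[60:68, :] = True

zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace * mask)))
print("mean aliasing error:", float(np.abs(zero_filled - image).mean()))
```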
Collapse
Affiliation(s)
- Seonghyuk Kim
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - HyunWook Park
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Sung-Hong Park
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141 Republic of Korea
| |
Collapse
|
20
|
Huang X, Wang Q, He J, Ban C, Zheng H, Chen H, Zhu X. Fast Multiphoton Microscopic Imaging Joint Image Super-Resolution for Automated Gleason Grading of Prostate Cancers. JOURNAL OF BIOPHOTONICS 2024; 17:e202400233. [PMID: 39262127 DOI: 10.1002/jbio.202400233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/29/2024] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/13/2024]
Abstract
The Gleason grading system is a dependable standard for quantifying prostate cancer. This paper introduces a fast multiphoton microscopic imaging method via deep learning for automatic Gleason grading. To address the trade-off between multiphoton microscopy (MPM) imaging speed and image quality, a deep learning architecture (SwinIR) is used for image super-resolution. The quality of low-resolution images is improved, which increases the acquisition speed from 7.55 s per frame to 0.24 s per frame. A classification network (Swin Transformer) was introduced for automated Gleason grading. The classification accuracy and Macro-F1 achieved by training on high-resolution images are 90.9% and 90.9%, respectively. For training on super-resolution images, the classification accuracy and Macro-F1 are 89.9% and 89.9%, respectively, showing that super-resolution images can provide performance comparable to high-resolution images. Our results suggest that MPM combined with image super-resolution and automatic classification holds the potential to be a real-time clinical diagnostic tool for prostate cancer diagnosis.
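The two-stage pipeline described in the abstract (super-resolve the fast low-resolution acquisition, then classify the restored image) can be sketched as below. The tiny upsampler and classifier are hypothetical stand-ins for SwinIR and the Swin Transformer, which are far too large to reproduce here.

```python
# A schematic of the super-resolve-then-classify pipeline; both networks are
# illustrative stand-ins, not the models used in the paper.
import torch
import torch.nn as nn

sr_model = nn.Sequential(                       # stand-in for SwinIR
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 4, 3, padding=1),
    nn.PixelShuffle(2))                         # 2x spatial upscaling

classifier = nn.Sequential(                     # stand-in for Swin Transformer
    nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 3))                            # 3 illustrative grade groups

fast_low_res = torch.randn(2, 1, 64, 64)        # quickly acquired MPM frame
grade_logits = classifier(sr_model(fast_low_res))
print(grade_logits.shape)                       # torch.Size([2, 3])
```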
Collapse
Affiliation(s)
- Xinpeng Huang
- Institute of Laser and Optoelectronics Technology, Fujian Provincial Key Laboratory for Photonics Technology, Key Laboratory of Optoelectronic Science and Technology for Medicine of Ministry of Education, Fujian Normal University, Fuzhou, China
| | - Qianqiong Wang
- Institute of Laser and Optoelectronics Technology, Fujian Provincial Key Laboratory for Photonics Technology, Key Laboratory of Optoelectronic Science and Technology for Medicine of Ministry of Education, Fujian Normal University, Fuzhou, China
| | - Jia He
- Institute of Laser and Optoelectronics Technology, Fujian Provincial Key Laboratory for Photonics Technology, Key Laboratory of Optoelectronic Science and Technology for Medicine of Ministry of Education, Fujian Normal University, Fuzhou, China
| | - Chaoran Ban
- Department of Pathology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
| | - Hua Zheng
- Institute of Laser and Optoelectronics Technology, Fujian Provincial Key Laboratory for Photonics Technology, Key Laboratory of Optoelectronic Science and Technology for Medicine of Ministry of Education, Fujian Normal University, Fuzhou, China
| | - Hong Chen
- Department of Pathology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
| | - Xiaoqin Zhu
- Institute of Laser and Optoelectronics Technology, Fujian Provincial Key Laboratory for Photonics Technology, Key Laboratory of Optoelectronic Science and Technology for Medicine of Ministry of Education, Fujian Normal University, Fuzhou, China
| |
Collapse
|
21
|
Qiao S, Xue M, Zuo Y, Zheng J, Jiang H, Zeng X, Peng D. Four-phase CT lesion recognition based on multi-phase information fusion framework and spatiotemporal prediction module. Biomed Eng Online 2024; 23:103. [PMID: 39434126 PMCID: PMC11492744 DOI: 10.1186/s12938-024-01297-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2024] [Accepted: 10/02/2024] [Indexed: 10/23/2024] Open
Abstract
Multiphase information fusion and spatiotemporal feature modeling play a crucial role in the task of four-phase CT lesion recognition. In this paper, we propose a four-phase CT lesion recognition algorithm based on a multiphase information fusion framework and a spatiotemporal prediction module. Specifically, the multiphase information fusion framework uses an interactive perception mechanism to realize channel-spatial interactive weighting between multiphase features. In the spatiotemporal prediction module, we design a 1D deep residual network to integrate multiphase feature vectors and use a GRU architecture to model the temporal enhancement information between CT slices. In addition, we employ CT image pseudo-color processing for data augmentation and train the whole network within a multi-task learning framework. We verify the proposed network on a four-phase CT dataset. The experimental results show that the proposed network can effectively fuse multiphase information and model the temporal enhancement information between CT slices, showing excellent performance in lesion recognition.
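A rough PyTorch sketch of the sequential-modeling idea follows; it is our reconstruction under stated assumptions, with per-slice feature vectors (such as a 1D residual network might produce) fed to a GRU whose final hidden state drives lesion classification. All sizes are illustrative.

```python
# A rough sketch (assumptions ours) of GRU-based modeling of enhancement
# information across CT slices, following the abstract's description.
import torch
import torch.nn as nn

class SliceGRUClassifier(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, slice_feats):         # (B, n_slices, feat_dim)
        _, h_n = self.gru(slice_feats)      # h_n: (1, B, hidden)
        return self.head(h_n[-1])           # (B, n_classes)

logits = SliceGRUClassifier()(torch.randn(2, 24, 256))
print(logits.shape)                          # torch.Size([2, 2])
```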
Collapse
Affiliation(s)
- Shaohua Qiao
- HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
| | - Mengfan Xue
- School of Automation, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
| | - Yan Zuo
- School of Automation, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
| | - Jiannan Zheng
- School of Automation, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
| | - Haodong Jiang
- School of Automation, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
| | - Xiangai Zeng
- School of Automation, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
| | - Dongliang Peng
- School of Automation, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China.
| |
Collapse
|
22
|
Priyadharshini S, Ramkumar K, Vairavasundaram S, Narasimhan K, Venkatesh S, Madhavasarma P, Kotecha K. Bio-inspired feature selection for early diagnosis of Parkinson's disease through optimization of deep 3D nested learning. Sci Rep 2024; 14:23394. [PMID: 39379451 PMCID: PMC11461848 DOI: 10.1038/s41598-024-74405-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2024] [Accepted: 09/25/2024] [Indexed: 10/10/2024] Open
Abstract
Parkinson's disease (PD) is one of the most common neurodegenerative disorders, affecting the quality of life of millions of people throughout the world. The probability of being affected by this disease increases with age, and it is common among the elderly population. Early detection can help in initiating medication at an earlier stage and can significantly slow the progression of the disease, helping the patient maintain a good quality of life for a more extended period. Magnetic resonance imaging (MRI)-based brain imaging is an area of active research used to diagnose PD early and to understand its key biomarkers. Prior investigations using MRI data mainly focused on volumetric, structural, and morphological changes in the basal ganglia (BG) region for diagnosing PD. Recently, researchers have emphasized the significance of studying other areas of the human brain for a more comprehensive understanding of PD and for analyzing changes happening in brain tissue. Thus, to perform accurate diagnosis and treatment planning for early identification of PD, this work focuses on learning the onset of PD from whole-brain MRI using a novel 3D convolutional neural network (3D-CNN) deep learning architecture. A conventional 3D-ResNet deep learning model, after various hyper-parameter tuning and architectural changes, achieved an accuracy of 90%. In this work, a novel 3D-CNN architecture was developed, and after several ablation studies, the model yielded an improved accuracy of 93.4%. Combining features from the 3D-CNN and 3D-ResNet models using Canonical Correlation Analysis (CCA) resulted in 95% accuracy. To further enhance model performance, feature fusion with optimization was employed, utilizing various optimization techniques; whale optimization, a biologically inspired approach, was selected on the basis of a convergence diagram. The performance of this approach was compared to other methods and gave an accuracy of 97%. This work represents a critical advancement in improving PD diagnosis techniques, emphasizing the importance of deep nested 3D learning and bio-inspired feature selection.
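The CCA-based feature-fusion step can be illustrated with scikit-learn; the sketch below uses random stand-in features for the 3D-CNN and 3D-ResNet branches and is only meant to show the mechanics, not to reproduce the reported accuracy.

```python
# An illustrative sketch (assumptions ours) of CCA-based fusion: features
# from two networks are projected into a shared space and concatenated
# before the final classifier.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
feats_cnn = rng.normal(size=(100, 64))      # stand-in 3D-CNN features
feats_resnet = rng.normal(size=(100, 64))   # stand-in 3D-ResNet features

cca = CCA(n_components=16).fit(feats_cnn, feats_resnet)
u, v = cca.transform(feats_cnn, feats_resnet)
fused = np.concatenate([u, v], axis=1)      # (100, 32) fused representation
print(fused.shape)
```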
Collapse
Affiliation(s)
- S Priyadharshini
- School of Electrical and Electronics Engineering, SASTRA Deemed University, Thanjavur, India
| | - K Ramkumar
- School of Computing, SASTRA Deemed University, Thanjavur, India
| | | | - K Narasimhan
- School of Electrical and Electronics Engineering, SASTRA Deemed University, Thanjavur, India
| | - S Venkatesh
- School of Electrical and Electronics Engineering, SASTRA Deemed University, Thanjavur, India
| | - P Madhavasarma
- School of Electrical and Electronics Engineering, SASTRA Deemed University, Thanjavur, India
| | - Ketan Kotecha
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India.
| |
Collapse
|
23
|
Zhao R, Li W, Chen X, Li Y, He B, Zhang Y, Deng Y, Wang C, Jia F. A position-enhanced sequential feature encoding model for lung infections and lymphoma classification on CT images. Int J Comput Assist Radiol Surg 2024; 19:2001-2009. [PMID: 39003438 DOI: 10.1007/s11548-024-03230-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 07/01/2024] [Indexed: 07/15/2024]
Abstract
PURPOSE Differentiating pulmonary lymphoma from lung infections using CT images is challenging. Existing deep neural network-based lung CT classification models rely on 2D slices, lacking comprehensive information and requiring manual selection. 3D models that involve chunking compromise image information and struggle with parameter reduction, limiting performance. These limitations must be addressed to improve accuracy and practicality. METHODS We propose a transformer sequential feature encoding structure to integrate multi-level information from complete CT images, inspired by the clinical practice of using a sequence of cross-sectional slices for diagnosis. We incorporate position encoding and cross-level long-range information fusion modules into the feature extraction CNN network for cross-sectional slices, ensuring high-precision feature extraction. RESULTS We conducted comprehensive experiments on a dataset of 124 patients, with respective sizes of 64, 20 and 40 for training, validation and testing. The results of ablation and comparative experiments demonstrated the effectiveness of our approach. Our method outperforms existing state-of-the-art methods on the 3D CT image classification problem of distinguishing between lung infections and pulmonary lymphoma, achieving an accuracy of 0.875, an AUC of 0.953 and an F1 score of 0.889. CONCLUSION The experiments verified that our proposed position-enhanced transformer-based sequential feature encoding model is capable of effectively performing high-precision feature extraction and contextual feature fusion in the lungs, enhancing the ability of a standalone CNN network or transformer to extract features and thereby improving classification performance. The source code is accessible at https://github.com/imchuyu/PTSFE.
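A schematic reconstruction of the position-enhanced sequential encoding is given below; for the exact architecture consult the released PTSFE code. In this sketch a small 2D CNN embeds each slice, a learned position embedding marks its location in the stack, and a transformer encoder fuses the sequence. All layer sizes are our assumptions.

```python
# A schematic sketch (ours, not the released PTSFE code) of sequential
# slice encoding with position information and a transformer fusion stage.
import torch
import torch.nn as nn

class SequentialSliceEncoder(nn.Module):
    def __init__(self, n_slices=32, dim=128, n_classes=2):
        super().__init__()
        self.slice_cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_slices, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, volume):                       # (B, n_slices, H, W)
        b, s, h, w = volume.shape
        feats = self.slice_cnn(volume.reshape(b * s, 1, h, w))
        feats = feats.reshape(b, s, -1) + self.pos   # add position encoding
        return self.head(self.encoder(feats).mean(dim=1))

print(SequentialSliceEncoder()(torch.randn(2, 32, 64, 64)).shape)
```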
Collapse
Affiliation(s)
- Rui Zhao
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China
| | - Wenhao Li
- Department of Hematology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | - Xilai Chen
- Department of Radiology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | - Yuchong Li
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China
| | - Baochun He
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China
| | - Yucong Zhang
- Department of Radiation Oncology, Shenzhen People's Hospital, Shenzhen, China
| | - Yu Deng
- Department of Radiology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
| | - Chunyan Wang
- Department of Hematology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
| | - Fucang Jia
- Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China.
- Key Laboratory of Biomedical Imaging Science and System, Chinese Academy of Sciences, Shenzhen, China.
| |
Collapse
|
24
|
Chen C, Zhao LL, Lang Q, Xu Y. A Novel Detection and Classification Framework for Diagnosing of Cerebral Microbleeds Using Transformer and Language. Bioengineering (Basel) 2024; 11:993. [PMID: 39451369 PMCID: PMC11504022 DOI: 10.3390/bioengineering11100993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2024] [Revised: 09/24/2024] [Accepted: 09/27/2024] [Indexed: 10/26/2024] Open
Abstract
The detection of cerebral microbleeds (CMBs) is crucial for diagnosing cerebral small vessel disease. However, due to the small size and subtle appearance of CMBs in susceptibility-weighted imaging (SWI), manual detection is both time-consuming and labor-intensive. Meanwhile, the presence of similar-looking features in SWI images demands significant expertise from clinicians, further complicating this process. Recently, there has been significant advancement in the automated detection of CMBs using convolutional neural network (CNN) structures, aimed at enhancing diagnostic efficiency for neurologists. However, existing methods still show discrepancies when compared to the actual clinical diagnostic process. To bridge this gap, we introduce a novel multimodal detection and classification framework for CMB diagnosis, termed MM-UniCMBs. This framework includes a lightweight detection model and a multi-modal classification network. Specifically, we propose a new CMB detection network, CMBs-YOLO, designed to capture the salient features of CMBs in SWI images. Additionally, we design an innovative language-vision classification network, CMBsFormer (CF), which integrates patient textual descriptions, such as gender, age, and medical history, with image data. The MM-UniCMBs framework is designed to align closely with the diagnostic workflow of clinicians, offering greater interpretability and flexibility compared to existing methods. Extensive experimental results show that MM-UniCMBs achieves a sensitivity of 94% in CMB classification and can process a patient's data within 5 s.
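How the language branch and the vision branch might be fused can be sketched as follows; this is our simplified reading of the abstract, not the MM-UniCMBs implementation. Tokenized patient text is embedded and pooled, then concatenated with CNN features from the SWI patch before classification.

```python
# A simplified language-vision fusion sketch (assumptions ours): pooled text
# embeddings are concatenated with image features for CMB classification.
import torch
import torch.nn as nn

class TextImageCMBClassifier(nn.Module):
    def __init__(self, vocab=1000, txt_dim=32, img_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, txt_dim)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, img_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(txt_dim + img_dim, n_classes)

    def forward(self, patch, tokens):             # (B,1,H,W), (B,T)
        txt = self.embed(tokens).mean(dim=1)      # mean-pooled text feature
        return self.head(torch.cat([self.cnn(patch), txt], dim=1))

model = TextImageCMBClassifier()
print(model(torch.randn(2, 1, 32, 32),
            torch.randint(0, 1000, (2, 8))).shape)   # torch.Size([2, 2])
```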
Collapse
Affiliation(s)
- Cong Chen
- School of Clinical Medicine, College of Medicine, Nanjing Medical University, Nanjing 211166, China
- Department of Neurology, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
| | - Lin-Lin Zhao
- Department of Computer Science and Technology, Shanghai University, 99 Shangda Road, Baoshan District, Shanghai 200444, China
| | - Qin Lang
- Department of Neurology, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
| | - Yun Xu
- School of Clinical Medicine, College of Medicine, Nanjing Medical University, Nanjing 211166, China
- Department of Neurology, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Department of Neurology, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, 321 Zhongshan Road, Nanjing 210008, China
| |
Collapse
|
25
|
Li Q, Huang X, Fang B, Chen H, Ding S, Liu X. Embracing Large Natural Data: Enhancing Medical Image Analysis via Cross-Domain Fine-Tuning. IEEE J Biomed Health Inform 2024; 28:4512-4521. [PMID: 38100336 DOI: 10.1109/jbhi.2023.3343518] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2023]
Abstract
With the rapid advancements of Big Data and computer vision, many large-scale natural visual datasets are proposed, such as ImageNet-21K, LAION-400M, and LAION-2B. These large-scale datasets significantly improve the robustness and accuracy of models in the natural vision domain. However, the field of medical images continues to face limitations due to relatively small-scale datasets. In this article, we propose a novel method to enhance medical image analysis across domains by leveraging pre-trained models on large natural datasets. Specifically, a Cross-Domain Transfer Module (CDTM) is proposed to transfer natural vision domain features to the medical image domain, facilitating efficient fine-tuning of models pre-trained on large datasets. In addition, we design a Staged Fine-Tuning (SFT) strategy in conjunction with CDTM to further improve the model performance. Experimental results demonstrate that our method achieves state-of-the-art performance on multiple medical image datasets through efficient fine-tuning of models pre-trained on large natural datasets.
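A staged fine-tuning loop in the spirit of SFT might look like the sketch below: a natural-image backbone stays frozen while the new head trains, then everything is unfrozen at a lower learning rate. The ResNet-50 backbone and learning rates are our assumptions, not the paper's configuration.

```python
# A hedged sketch of staged fine-tuning; details are illustrative, not the
# paper's CDTM/SFT code. Loading weights downloads ImageNet checkpoints.
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
model.fc = torch.nn.Linear(model.fc.in_features, 3)   # medical classes

# Stage 1: freeze the pre-trained backbone, train only the new head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
stage1_opt = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)

# Stage 2: unfreeze all layers and fine-tune gently.
for p in model.parameters():
    p.requires_grad = True
stage2_opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
```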
Collapse
|
26
|
Zhu S, Lin L, Liu Q, Liu J, Song Y, Xu Q. Integrating a deep neural network and Transformer architecture for the automatic segmentation and survival prediction in cervical cancer. Quant Imaging Med Surg 2024; 14:5408-5419. [PMID: 39144008 PMCID: PMC11320496 DOI: 10.21037/qims-24-560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2024] [Accepted: 05/24/2024] [Indexed: 08/16/2024]
Abstract
Background Automated tumor segmentation and survival prediction are critical to clinical diagnosis and treatment. This study aimed to develop deep-learning models for automatic tumor segmentation and survival prediction in magnetic resonance imaging (MRI) of cervical cancer (CC) by combining deep neural networks and the Transformer architecture. Methods This study included 406 patients with CC, each with comprehensive clinical information and MRI scans. We randomly divided patients into training, validation, and independent test cohorts in a 6:2:2 ratio. During model training, we employed two architecture types: one a hybrid model combining a convolutional neural network (CNN) and a Transformer (CoTr), and one consisting of pure CNNs. For survival prediction, the hybrid model combined tumor image features extracted by the segmentation models with clinical information. The performance of the segmentation models was evaluated using the Dice similarity coefficient (DSC) and 95% Hausdorff distance (HD95). The performance of the survival models was assessed using the concordance index. Results The CoTr model performed well in both contrast-enhanced T1-weighted (ceT1W) and T2-weighted (T2W) imaging segmentation tasks, with average DSCs of 0.827 and 0.820, respectively, outperforming other CNN models such as U-Net (DSC: 0.807 and 0.808), attention U-Net (DSC: 0.814 and 0.811), and V-Net (DSC: 0.805 and 0.807). For survival prediction, the proposed deep-learning model significantly outperformed traditional methods, yielding a concordance index of 0.732. Moreover, it effectively divided patients into low-risk and high-risk groups for disease progression (P<0.001). Conclusions Combining the Transformer architecture with a CNN can improve MRI tumor segmentation, and this deep-learning model excelled in the survival prediction of patients with CC as compared to traditional methods.
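The concordance index used to assess the survival models has a simple pairwise definition: the fraction of comparable patient pairs whose predicted risks are ordered consistently with their outcomes. A small numpy version (ours, for illustration; ties receive half credit):

```python
# Pairwise concordance index; our illustrative implementation.
import numpy as np

def concordance_index(times, events, risks):
    """times: survival/censoring times; events: 1 = progressed; risks: scores."""
    num, den = 0.0, 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if patient i progressed before time j.
            if events[i] == 1 and times[i] < times[j]:
                den += 1
                num += 1.0 if risks[i] > risks[j] else 0.5 * (risks[i] == risks[j])
    return num / den

print(concordance_index(np.array([5, 8, 12]), np.array([1, 1, 0]),
                        np.array([0.9, 0.4, 0.1])))   # 1.0 (perfect ordering)
```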
Collapse
Affiliation(s)
- Shitao Zhu
- College of Computer and Data Science, Fuzhou University, Fuzhou, China
| | - Ling Lin
- Department of Gynecology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
| | - Qin Liu
- Department of Clinical Oncology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Jing Liu
- Department of Gynecology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
| | - Yanwen Song
- Department of Radiation Oncology, Xiamen Humanity Hospital, Xiamen, China
| | - Qin Xu
- Department of Gynecology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
| |
Collapse
|
27
|
Xu Z, Dai Y, Liu F, Li S, Liu S, Shi L, Fu J. Parotid Gland Segmentation Using Purely Transformer-Based U-Shaped Network and Multimodal MRI. Ann Biomed Eng 2024; 52:2101-2117. [PMID: 38691234 DOI: 10.1007/s10439-024-03510-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Accepted: 04/03/2024] [Indexed: 05/03/2024]
Abstract
Parotid gland tumors account for approximately 2% to 10% of head and neck tumors. Segmentation of parotid glands and tumors on magnetic resonance images is essential for accurately diagnosing and selecting appropriate surgical plans. However, segmentation of the parotid glands is particularly challenging due to their variable shape and low contrast with surrounding structures. Recently, deep learning has developed rapidly, and Transformer-based networks have performed well on many computer vision tasks; however, they have yet to be widely applied to parotid gland segmentation. We collected a multi-center multimodal parotid gland MRI dataset and implemented parotid gland segmentation using a purely Transformer-based U-shaped segmentation network. We used both absolute and relative positional encoding to improve parotid gland segmentation and achieved multimodal information fusion without increasing the network computation. In addition, our novel training approach reduces the clinician's labeling workload by nearly half. Our method achieved good segmentation of both parotid glands and tumors. On the test set, our model achieved a Dice Similarity Coefficient of 86.99%, Pixel Accuracy of 99.19%, Mean Intersection over Union of 81.79%, and Hausdorff Distance of 3.87. The purely Transformer-based U-shaped segmentation network we used outperforms other convolutional neural networks. In addition, our method can effectively fuse the information from the multi-center multimodal MRI dataset, thus improving parotid gland segmentation.
Collapse
Affiliation(s)
- Zi'an Xu
- Northeastern University, Shenyang, China
| | - Yin Dai
- Northeastern University, Shenyang, China.
| | - Fayu Liu
- China Medical University, Shenyang, China
| | - Siqi Li
- China Medical University, Shenyang, China
| | - Sheng Liu
- China Medical University, Shenyang, China
| | - Lifu Shi
- Liaoning Jiayin Medical Technology Co., Shenyang, China
| | - Jun Fu
- Northeastern University, Shenyang, China
| |
Collapse
|
28
|
Zhang D, Han Q, Xiong Y, Du H. Multi-modal straight flow matching for accelerated MR imaging. Comput Biol Med 2024; 178:108668. [PMID: 38870720 DOI: 10.1016/j.compbiomed.2024.108668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Revised: 04/05/2024] [Accepted: 05/26/2024] [Indexed: 06/15/2024]
Abstract
Diffusion models have lately garnered great interest in Magnetic Resonance (MR) image reconstruction. A key component of generating high-quality samples from noise is iterative denoising over thousands of steps. However, the complexity of the inference steps has limited their applications. To address the challenge of obtaining high-quality reconstructed images with fewer inference steps and lower computational complexity, we introduce a novel straight flow matching, based on a neural ordinary differential equation (ODE) generative model. Our model creates a linear path between undersampled images and reconstructed images, which can be accurately simulated with a few Euler steps. Furthermore, we propose a multi-modal straight flow matching model, which uses relatively easily available modalities as supplementary information to guide the reconstruction of target modalities. We introduce a low-frequency fusion layer and a high-frequency fusion layer into our multi-modal model, which have been proved to produce promising results in fusion tasks. The proposed multi-modal straight flow matching (MMSflow) achieves state-of-the-art performance in reconstruction tasks on fastMRI and BraTS-2020 and improves the sampling rate by an order of magnitude compared with other methods based on stochastic differential equations (SDEs).
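The straight-path idea can be reduced to a toy example: regress a network onto the constant velocity x1 - x0 along the linear interpolation x_t = (1 - t)*x0 + t*x1, then integrate with a few Euler steps. The sketch below is our illustration on 64-dimensional toy data, not the MMSflow model.

```python
# A toy sketch of straight flow matching (our illustration): learn the
# constant velocity along a linear path, then reconstruct with Euler steps.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(65, 64), nn.ReLU(), nn.Linear(64, 64))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(200):                        # training on toy 64-d "images"
    x0 = torch.randn(32, 64)                # source (e.g., undersampled) dist.
    x1 = torch.randn(32, 64) + 2.0          # target (e.g., fully sampled) dist.
    t = torch.rand(32, 1)
    xt = (1 - t) * x0 + t * x1              # point on the straight path
    loss = ((net(torch.cat([xt, t], 1)) - (x1 - x0)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

x = torch.randn(8, 64)                      # inference: a few Euler steps
for k in range(4):
    t = torch.full((8, 1), k / 4)
    x = x + 0.25 * net(torch.cat([x, t], 1))
print(x.mean().item())                      # should drift toward ~2.0
```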
Collapse
Affiliation(s)
- Daikun Zhang
- University of Science and Technology of China, Hefei, Anhui 230026, China.
| | - Qiuyi Han
- University of Science and Technology of China, Hefei, Anhui 230026, China.
| | - Yuzhu Xiong
- University of Science and Technology of China, Hefei, Anhui 230026, China.
| | - Hongwei Du
- University of Science and Technology of China, Hefei, Anhui 230026, China.
| |
Collapse
|
29
|
Reddy CKK, Reddy PA, Janapati H, Assiri B, Shuaib M, Alam S, Sheneamer A. A fine-tuned vision transformer based enhanced multi-class brain tumor classification using MRI scan imagery. Front Oncol 2024; 14:1400341. [PMID: 39091923 PMCID: PMC11291226 DOI: 10.3389/fonc.2024.1400341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Accepted: 06/25/2024] [Indexed: 08/04/2024] Open
Abstract
Brain tumors occur due to the expansion of abnormal cell tissues and can be malignant (cancerous) or benign (not cancerous). Numerous factors, such as position, size, and progression rate, are considered while detecting and diagnosing brain tumors. Detecting brain tumors in their initial phases is vital for diagnosis, where MRI (magnetic resonance imaging) scans play an important role. Over the years, deep learning models have been extensively used for medical image processing. The current study primarily investigates the novel Fine-Tuned Vision Transformer models (FTVTs), namely FTVT-b16, FTVT-b32, FTVT-l16, and FTVT-l32, for brain tumor classification, while also comparing them with other established deep learning models such as ResNet-50, MobileNet-V2, and EfficientNet-B0. A dataset with 7,023 images (MRI scans) categorized into four different classes, namely glioma, meningioma, pituitary, and no tumor, is used for classification. Further, the study presents a comparative analysis of these models, including their accuracies and other evaluation metrics such as recall, precision, and F1-score across each class. The deep learning models ResNet-50, EfficientNet-B0, and MobileNet-V2 obtained accuracies of 96.5%, 95.1%, and 94.9%, respectively. Among all the FTVT models, FTVT-l16 achieved a remarkable accuracy of 98.70%, whereas the other FTVT models, FTVT-b16, FTVT-b32, and FTVT-l32, achieved accuracies of 98.09%, 96.87%, and 98.62%, respectively, proving the efficacy and robustness of FTVTs in medical image processing.
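Fine-tuning a pre-trained ViT for four-class tumor classification follows the standard recipe sketched below; the torchvision ViT-B/16 backbone, head replacement, and hyperparameters are our stand-ins for the FTVT setup, not the authors' exact configuration.

```python
# A brief ViT fine-tuning sketch (stand-in configuration, ours). Loading
# the weights downloads an ImageNet checkpoint.
import torch
import torchvision

model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")
# Replace the classification head: glioma/meningioma/pituitary/no tumor.
model.heads = torch.nn.Linear(model.hidden_dim, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
criterion = torch.nn.CrossEntropyLoss()

images = torch.randn(2, 3, 224, 224)        # stand-in MRI batch
labels = torch.tensor([0, 2])
loss = criterion(model(images), labels)
loss.backward(); optimizer.step()           # one illustrative update
```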
Collapse
Affiliation(s)
- C. Kishor Kumar Reddy
- Department of Computer Science and Engineering, Stanley College of Engineering and Technology for Women, Hyderabad, India
| | - Pulakurthi Anaghaa Reddy
- Department of Computer Science and Engineering, Stanley College of Engineering and Technology for Women, Hyderabad, India
| | - Himaja Janapati
- Department of Computer Science and Engineering, Stanley College of Engineering and Technology for Women, Hyderabad, India
| | - Basem Assiri
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
| | - Mohammed Shuaib
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
| | - Shadab Alam
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
| | - Abdullah Sheneamer
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
| |
Collapse
|
30
|
Schwab RJ, Erus G. We Can Use Machine Learning to Predict Obstructive Sleep Apnea. Am J Respir Crit Care Med 2024; 210:141-143. [PMID: 38701391 PMCID: PMC11273305 DOI: 10.1164/rccm.202403-0666ed] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2024] [Accepted: 05/02/2024] [Indexed: 05/05/2024] Open
Affiliation(s)
- Richard J Schwab
- Department of Medicine University of Pennsylvania Perelman School of Medicine Philadelphia, Pennsylvania
| | - Guray Erus
- Center for Biomedical Image Computing and Analytics University of Pennsylvania Philadelphia, Pennsylvania
| |
Collapse
|
31
|
Li Y, El Habib Daho M, Conze PH, Zeghlache R, Le Boité H, Tadayoni R, Cochener B, Lamard M, Quellec G. A review of deep learning-based information fusion techniques for multimodal medical image classification. Comput Biol Med 2024; 177:108635. [PMID: 38796881 DOI: 10.1016/j.compbiomed.2024.108635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Revised: 03/18/2024] [Accepted: 05/18/2024] [Indexed: 05/29/2024]
Abstract
Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, handling incomplete multimodal data, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.
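The three fusion schemes the review distinguishes can be condensed into one toy sketch (our illustration): input fusion concatenates modalities at the image level, intermediate fusion merges per-modality features, and output fusion averages per-modality predictions.

```python
# Toy contrast of input, intermediate, and output fusion (ours).
import torch
import torch.nn as nn

m1, m2 = torch.randn(2, 1, 32, 32), torch.randn(2, 1, 32, 32)  # two modalities
enc = lambda c_in: nn.Sequential(                               # tiny encoder
    nn.Conv2d(c_in, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())

# Input fusion: concatenate images, encode once.
input_fusion = nn.Sequential(enc(2), nn.Linear(8, 2))(torch.cat([m1, m2], 1))

# Intermediate fusion: encode separately, merge feature vectors.
f1, f2 = enc(1)(m1), enc(1)(m2)
intermediate_fusion = nn.Linear(16, 2)(torch.cat([f1, f2], 1))

# Output fusion: average per-modality predictions.
head = nn.Linear(8, 2)
output_fusion = (head(enc(1)(m1)) + head(enc(1)(m2))) / 2
print(input_fusion.shape, intermediate_fusion.shape, output_fusion.shape)
```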
Collapse
Affiliation(s)
- Yihao Li
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France
| | - Mostafa El Habib Daho
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France.
| | | | - Rachid Zeghlache
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France
| | - Hugo Le Boité
- Sorbonne University, Paris, France; Ophthalmology Department, Lariboisière Hospital, AP-HP, Paris, France
| | - Ramin Tadayoni
- Ophthalmology Department, Lariboisière Hospital, AP-HP, Paris, France; Paris Cité University, Paris, France
| | - Béatrice Cochener
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France; Ophthalmology Department, CHRU Brest, Brest, France
| | - Mathieu Lamard
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France
| | | |
Collapse
|
32
|
Liu S, Yue W, Guo Z, Wang L. Multi-branch CNN and grouping cascade attention for medical image classification. Sci Rep 2024; 14:15013. [PMID: 38951526 PMCID: PMC11217469 DOI: 10.1038/s41598-024-64982-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2024] [Accepted: 06/14/2024] [Indexed: 07/03/2024] Open
Abstract
Vision Transformers (ViT) have made remarkable achievements in the field of medical image analysis. However, ViT-based methods perform poorly on some small-scale medical image classification datasets. Meanwhile, many ViT-based models sacrifice computational cost for superior performance, which is a great challenge in practical clinical applications. In this paper, we propose an efficient medical image classification network based on an alternating mixture of CNN and Transformer blocks in tandem, called Eff-CTNet. Existing ViT-based methods still rely mainly on multi-head self-attention (MHSA), whose attention maps are highly similar across heads, leading to computational redundancy. Therefore, we propose a group cascade attention (GCA) module that splits the feature maps and provides them to different attention heads, which further improves the diversity of attention and reduces the computational cost. In addition, we propose an efficient CNN (EC) module to enhance the model's ability to extract local detail information in medical images. Finally, we connect them to form the efficient hybrid medical image classification network Eff-CTNet. Extensive experimental results show that our Eff-CTNet achieves advanced classification performance with less computational cost on three public medical image classification datasets.
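Our loose reading of the group cascade idea is sketched below: the channel dimension is split so each head attends over its own slice of the feature map, and each group's output cascades into the next group's input. This is an assumption-laden illustration, not the released Eff-CTNet code.

```python
# A loose sketch of group-cascade attention (our reading of the abstract).
import torch
import torch.nn as nn

class GroupCascadeAttention(nn.Module):
    def __init__(self, dim=64, groups=4):
        super().__init__()
        self.groups = groups
        gd = dim // groups
        self.heads = nn.ModuleList(
            [nn.MultiheadAttention(gd, num_heads=1, batch_first=True)
             for _ in range(groups)])

    def forward(self, x):                        # (B, N, dim) token features
        chunks = x.chunk(self.groups, dim=-1)    # split channels into groups
        outs, carry = [], 0
        for chunk, head in zip(chunks, self.heads):
            inp = chunk + carry                  # cascade previous output
            out, _ = head(inp, inp, inp)
            outs.append(out)
            carry = out
        return torch.cat(outs, dim=-1)

print(GroupCascadeAttention()(torch.randn(2, 49, 64)).shape)  # (2, 49, 64)
```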
Collapse
Affiliation(s)
- Shiwei Liu
- School of Computer Science and Technology, Xinjiang University, Urumqi, 830017, Xinjiang, China
| | - Wenwen Yue
- School of Computer Science and Technology, Xinjiang University, Urumqi, 830017, Xinjiang, China
| | - Zhiqing Guo
- School of Computer Science and Technology, Xinjiang University, Urumqi, 830017, Xinjiang, China
| | - Liejun Wang
- School of Computer Science and Technology, Xinjiang University, Urumqi, 830017, Xinjiang, China.
| |
Collapse
|
33
|
Wang G, Jiang N, Ma Y, Chen D, Wu J, Li G, Liang D, Yan T. Connectional-style-guided contextual representation learning for brain disease diagnosis. Neural Netw 2024; 175:106296. [PMID: 38653077 DOI: 10.1016/j.neunet.2024.106296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 01/26/2024] [Accepted: 04/06/2024] [Indexed: 04/25/2024]
Abstract
Structural magnetic resonance imaging (sMRI) has shown great clinical value and has been widely used in deep learning (DL)-based computer-aided brain disease diagnosis. Previous DL-based approaches focused on local shapes and textures in brain sMRI that may be significant only within a particular domain; the learned representations are likely to contain spurious information and generalize poorly to other diseases and datasets. To facilitate capturing meaningful and robust features, it is necessary to first comprehensively understand the intrinsic pattern of the brain that is not restricted to a single data/task domain. Considering that the brain is a complex connectome of interlinked neurons, the connectional properties of the brain have strong biological significance, are shared across multiple domains, and cover most pathological information. In this work, we propose a connectional-style contextual representation learning model (CS-CRL) to capture the intrinsic pattern of the brain for multiple brain disease diagnosis. Specifically, it has a vision transformer (ViT) encoder and leverages mask reconstruction as the proxy task and Gram matrices to guide the representation of connectional information. It facilitates the capture of global context and the aggregation of features with biological plausibility. The results indicate that CS-CRL achieves superior accuracy in multiple brain disease diagnosis tasks across six datasets and three diseases, outperforming state-of-the-art models. Furthermore, we demonstrate that CS-CRL captures more brain-network-like properties, better aggregates features, is easier to optimize, and is more robust to noise, which explains its superiority in theory.
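The Gram-matrix guidance can be illustrated compactly: the Gram matrix of a feature map summarizes channel co-activation (a "style" statistic), and an L2 loss pulls a model's Gram matrix toward a reference. The sketch below is ours; shapes and names are illustrative.

```python
# Gram-matrix style loss on feature maps; our illustration.
import torch

def gram(feats):                 # feats: (B, C, H, W)
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # (B, C, C) co-activation

student = torch.randn(2, 32, 14, 14, requires_grad=True)
reference = torch.randn(2, 32, 14, 14)
style_loss = ((gram(student) - gram(reference)) ** 2).mean()
style_loss.backward()
print(style_loss.item())
```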
Collapse
Affiliation(s)
- Gongshu Wang
- School of Medical Technology, Beijing Institute of Technology, Beijing, China.
| | - Ning Jiang
- School of Medical Technology, Beijing Institute of Technology, Beijing, China.
| | - Yunxiao Ma
- School of Medical Technology, Beijing Institute of Technology, Beijing, China.
| | - Duanduan Chen
- School of Medical Technology, Beijing Institute of Technology, Beijing, China.
| | - Jinglong Wu
- School of Medical Technology, Beijing Institute of Technology, Beijing, China.
| | - Guoqi Li
- Institute of Automation, Chinese Academy of Sciences, Beijing, China.
| | - Dong Liang
- Research Center for Medical AI, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
| | - Tianyi Yan
- School of Medical Technology, Beijing Institute of Technology, Beijing, China.
| |
Collapse
|
34
|
Piffer S, Ubaldi L, Tangaro S, Retico A, Talamonti C. Tackling the small data problem in medical image classification with artificial intelligence: a systematic review. PROGRESS IN BIOMEDICAL ENGINEERING (BRISTOL, ENGLAND) 2024; 6:032001. [PMID: 39655846 DOI: 10.1088/2516-1091/ad525b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/25/2023] [Accepted: 05/30/2024] [Indexed: 12/18/2024]
Abstract
Though medical imaging has seen growing interest in AI research, training models requires large amounts of data. In this domain, only limited data are available, as collecting new data is either not feasible or requires burdensome resources. Researchers are thus faced with the problem of small datasets and must apply techniques to mitigate overfitting. A total of 147 peer-reviewed articles published in English up until 31 July 2022 were retrieved from PubMed and assessed by two independent reviewers. We followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines for paper selection, and 77 studies were regarded as eligible for the scope of this review. Adherence to reporting standards was assessed using the TRIPOD statement (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis). To address the small-data issue, transfer learning, basic data augmentation and generative adversarial networks were applied in 75%, 69% and 14% of cases, respectively. More than 60% of the authors performed binary classification, given the data scarcity and the difficulty of the tasks. Concerning generalizability, only four studies explicitly stated that an external validation of the developed model was carried out. Full access to all datasets and code was severely limited (unavailable in more than 80% of studies). Adherence to reporting standards was suboptimal (<50% adherence for 13 of 37 TRIPOD items). The goal of this review is to provide a comprehensive survey of recent advancements in dealing with small sample sizes in medical imaging. Greater transparency and improved quality in publications, as well as adherence to existing reporting standards, are also advocated.
Collapse
Affiliation(s)
- Stefano Piffer
- Department of Experimental and Clinical Biomedical Sciences, University of Florence, Florence, Italy
- National Institute for Nuclear Physics (INFN), Florence Division, Florence, Italy
| | - Leonardo Ubaldi
- Department of Experimental and Clinical Biomedical Sciences, University of Florence, Florence, Italy
- National Institute for Nuclear Physics (INFN), Florence Division, Florence, Italy
| | - Sabina Tangaro
- Department of Soil, Plant and Food Sciences, University of Bari Aldo Moro, Bari, Italy
- INFN, Bari Division, Bari, Italy
| | | | - Cinzia Talamonti
- Department of Experimental and Clinical Biomedical Sciences, University of Florence, Florence, Italy
- National Institute for Nuclear Physics (INFN), Florence Division, Florence, Italy
| |
Collapse
|
35
|
Bougourzi F, Dornaika F, Distante C, Taleb-Ahmed A. D-TrAttUnet: Toward hybrid CNN-transformer architecture for generic and subtle segmentation in medical images. Comput Biol Med 2024; 176:108590. [PMID: 38763066 DOI: 10.1016/j.compbiomed.2024.108590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 04/16/2024] [Accepted: 05/09/2024] [Indexed: 05/21/2024]
Abstract
Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As complicated diseases increase and the number of cases rises, the role of machine-based imaging analysis has become indispensable. It serves as both a tool and an assistant to medical experts, providing valuable insights and guidance. A particularly challenging task in this area is lesion segmentation, a task that is challenging even for experienced radiologists. The complexity of this task highlights the urgent need for robust machine learning approaches to support medical staff. In response, we present our novel solution: the D-TrAttUnet architecture. This framework is based on the observation that different diseases often target specific organs. Our architecture includes an encoder-decoder structure with a composite Transformer-CNN encoder and dual decoders. The encoder includes two paths: the Transformer path and the Encoders Fusion Module path. The Dual-Decoder configuration uses two identical decoders, each with attention gates. This allows the model to simultaneously segment lesions and organs and integrate their segmentation losses. To validate our approach, we performed evaluations on the Covid-19 and Bone Metastasis segmentation tasks. We also investigated the adaptability of the model by testing it without the second decoder in the segmentation of glands and nuclei. The results confirmed the superiority of our approach, especially in Covid-19 infections and the segmentation of bone metastases. In addition, the hybrid encoder showed exceptional performance in the segmentation of glands and nuclei, solidifying its role in modern medical image analysis.
Collapse
Affiliation(s)
- Fares Bougourzi
- Junia, UMR 8520, CNRS, Centrale Lille, University of Polytechnique Hauts-de-France, 59000 Lille, France.
| | - Fadi Dornaika
- University of the Basque Country UPV/EHU, San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain.
| | - Cosimo Distante
- Institute of Applied Sciences and Intelligent Systems, National Research Council of Italy, 73100 Lecce, Italy.
| | - Abdelmalik Taleb-Ahmed
- Université Polytechnique Hauts-de-France, Université de Lille, CNRS, Valenciennes, 59313, Hauts-de-France, France.
| |
Collapse
|
36
|
Liu X, Li W, Miao S, Liu F, Han K, Bezabih TT. HAMMF: Hierarchical attention-based multi-task and multi-modal fusion model for computer-aided diagnosis of Alzheimer's disease. Comput Biol Med 2024; 176:108564. [PMID: 38744010 DOI: 10.1016/j.compbiomed.2024.108564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Revised: 04/15/2024] [Accepted: 05/05/2024] [Indexed: 05/16/2024]
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative condition, and early intervention can help slow its progression. However, integrating multi-dimensional information with deep convolutional networks increases the number of model parameters, affecting diagnostic accuracy and efficiency and hindering the deployment of clinical diagnostic models. Multi-modal neuroimaging can offer more precise diagnostic results, while multi-task modeling of classification and regression tasks can enhance the performance and stability of AD diagnosis. This study proposes a Hierarchical Attention-based Multi-task Multi-modal Fusion model (HAMMF) that leverages multi-modal neuroimaging data to concurrently learn an AD classification task, a cognitive score regression task, and an age regression task using attention-based techniques. First, we preprocess MRI and PET image data to obtain two modalities, each containing distinct information. Next, we incorporate a novel Contextual Hierarchical Attention Module (CHAM) to aggregate multi-modal features. This module employs channel and spatial attention to extract fine-grained pathological features from unimodal image data across various dimensions. Using these attention mechanisms, the Transformer can effectively capture correlated features of multi-modal inputs. Lastly, we adopt multi-task learning in our model to investigate the influence of different variables on diagnosis, with a primary classification task and a secondary regression task for optimal multi-task prediction performance. Our experiments utilized MRI and PET images from 720 subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. The results show that our proposed model achieves an overall accuracy of 93.15% for AD/NC recognition, and the visualization results demonstrate its strong pathological feature recognition performance.
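A channel-then-spatial attention block of the kind CHAM builds on can be sketched as below; this is a CBAM-style simplification of ours, not the paper's exact module.

```python
# A condensed channel-then-spatial attention block (our simplification).
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels=32, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                 # (B, C, H, W)
        ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))))  # channel weights
        x = x * ca[:, :, None, None]
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                     # spatial weights

print(ChannelSpatialAttention()(torch.randn(2, 32, 28, 28)).shape)
```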
Collapse
Affiliation(s)
- Xiao Liu
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
| | - Weimin Li
- School of Computer Engineering and Science, Shanghai University, Shanghai, China.
| | - Shang Miao
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
| | - Fangyu Liu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China; BGI-Shenzhen, Shenzhen, China
| | - Ke Han
- Medical and Health Center, Liaocheng People's Hospital, LiaoCheng, China
| | - Tsigabu T Bezabih
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
| |
Collapse
|
37
|
Zhang G, Gu W, Wang S, Li Y, Zhao D, Liang T, Gong Z, Ju R. MOTC: Abdominal Multi-objective Segmentation Model with Parallel Fusion of Global and Local Information. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:1-16. [PMID: 38347391 PMCID: PMC11169149 DOI: 10.1007/s10278-024-00978-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 11/19/2023] [Accepted: 11/21/2023] [Indexed: 06/13/2024]
Abstract
Convolutional Neural Networks have been widely applied in medical image segmentation. However, the local inductive bias of convolutional operations restricts the modeling of long-term dependencies. The introduction of the Transformer enables the modeling of long-term dependencies and partially eliminates the local inductive bias of convolutional operations, thereby improving the accuracy of tasks such as segmentation and classification. Researchers have proposed various hybrid structures combining Transformers and Convolutional Neural Networks. One strategy is to stack Transformer blocks and convolutional blocks, concentrating on eliminating the accumulated local bias of convolutional operations. Another strategy is to nest convolutional blocks and Transformer blocks to eliminate bias within each nested block. However, due to the granularity of the bias elimination operations, these two strategies cannot fully exploit the potential of the Transformer. In this paper, a parallel hybrid model is proposed for segmentation, whose encoder includes a Transformer branch and a Convolutional Neural Network branch. After parallel feature extraction, inter-layer information fusion and exchange of complementary information are performed between the two branches, simultaneously extracting local and global features while eliminating the local bias generated by convolutional operations within the current layer. A pure convolutional operation is used in the decoder to obtain the final segmentation results. To validate the impact of the granularity of bias elimination operations on the effectiveness of local bias elimination, experiments were conducted on the Flare21 and Amos22 datasets. The average Dice coefficient reached 92.65% on the Flare21 dataset and 91.61% on the Amos22 dataset, surpassing comparative methods. The experimental results demonstrate that a smaller granularity of bias elimination operations leads to better performance.
Collapse
Affiliation(s)
- GuoDong Zhang, School of Computer, Shenyang Aerospace University, Daoyi South Street, Shenyang, 110135, Liaoning Province, China
- WenWen Gu, School of Computer, Shenyang Aerospace University, Daoyi South Street, Shenyang, 110135, Liaoning Province, China
- SuRan Wang, School of Computer, Shenyang Aerospace University, Daoyi South Street, Shenyang, 110135, Liaoning Province, China
- YanLin Li, School of Computer, Shenyang Aerospace University, Daoyi South Street, Shenyang, 110135, Liaoning Province, China
- DaZhe Zhao, Key Laboratory of Intelligent Computing in Medical Image, Northeastern University, Wenhua Street, Shenyang, 110819, Liaoning Province, China
- TingYu Liang, School of Computer, Shenyang Aerospace University, Daoyi South Street, Shenyang, 110135, Liaoning Province, China
- ZhaoXuan Gong, School of Computer, Shenyang Aerospace University, Daoyi South Street, Shenyang, 110135, Liaoning Province, China
- RongHui Ju, School of Computer, Shenyang Aerospace University, Daoyi South Street, Shenyang, 110135, Liaoning Province, China; Department of Radiology, The People's Hospital of Liaoning Province, Wenyi Street, Shenyang, 110016, Liaoning Province, China
38
Zhao H, Cai H, Liu M. Transformer based multi-modal MRI fusion for prediction of post-menstrual age and neonatal brain development analysis. Med Image Anal 2024; 94:103140. [PMID: 38461655 DOI: 10.1016/j.media.2024.103140]
Abstract
Brain development during the perinatal period is characterized by rapid changes in both structure and function, which have a significant impact on cognitive and behavioral abilities later in life. Accurate assessment of brain age is a crucial indicator of brain development maturity and can help predict the risk of neonatal pathology. However, evaluating neonatal brains using magnetic resonance imaging (MRI) is challenging due to their complexity, high dimensionality, and noise, with only subtle alterations between subjects. In this paper, we propose a multi-modal deep learning framework based on transformers for precise post-menstrual age (PMA) estimation and brain development analysis using T2-weighted structural MRI (T2-sMRI) and diffusion MRI (dMRI) data. First, we build a two-stream dense network to learn modality-specific features from T2-sMRI and dMRI individually. Then, a transformer module based on the self-attention mechanism integrates these features for PMA prediction and preterm/term classification. Finally, saliency maps on brain templates are used to enhance the interpretability of the results. Our method is evaluated on the multi-modal MRI dataset of the developing Human Connectome Project (dHCP), which contains 592 neonates, including 478 term-born and 114 preterm-born subjects. The results demonstrate that our method achieves a 0.5-week mean absolute error (MAE) in PMA estimation for term-born subjects. Notably, preterm-born subjects exhibit delayed brain development, worsening with increasing prematurity. Our method also achieves 95% accuracy in classifying term-born and preterm-born subjects, revealing significant group differences.
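A minimal sketch of the two-stream design follows: modality-specific 3D encoders whose embeddings are fused by self-attention for joint PMA regression and preterm/term classification. Layer sizes and module names are hypothetical stand-ins for the paper's dense network and transformer module.

```python
# Illustrative two-stream fusion: one encoder per modality, self-attention
# over the two embeddings, and two heads (regression + classification).
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # One small 3D CNN per modality (T2-sMRI and dMRI).
        def encoder():
            return nn.Sequential(
                nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            )
        self.enc_t2, self.enc_dwi = encoder(), encoder()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.age_head = nn.Linear(feat_dim, 1)     # PMA in weeks
        self.cls_head = nn.Linear(feat_dim, 2)     # preterm vs. term

    def forward(self, t2, dwi):
        # Treat the two modality embeddings as a 2-token sequence.
        tokens = torch.stack([self.enc_t2(t2), self.enc_dwi(dwi)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        pooled = fused.mean(dim=1)
        return self.age_head(pooled).squeeze(-1), self.cls_head(pooled)

model = TwoStreamFusion()
t2 = torch.randn(2, 1, 32, 32, 32)
dwi = torch.randn(2, 1, 32, 32, 32)
pma, logits = model(t2, dwi)
print(pma.shape, logits.shape)  # torch.Size([2]) torch.Size([2, 2])
```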
Affiliation(s)
- Haiyan Zhao, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- Hongjie Cai, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- Manhua Liu, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai, China
39
Liu Z, Shen L. CECT: Controllable ensemble CNN and transformer for COVID-19 image classification. Comput Biol Med 2024; 173:108388. [PMID: 38569235 DOI: 10.1016/j.compbiomed.2024.108388]
Abstract
The COVID-19 pandemic has resulted in hundreds of millions of cases and numerous deaths worldwide. Here, we develop CECT, a novel classification network built on a controllable ensemble of convolutional neural networks and transformers, to provide timely and accurate COVID-19 diagnosis. CECT is composed of a parallel convolutional encoder block, an aggregate transposed-convolutional decoder block, and a windowed attention classification block. Each block captures features at different scales, from 28 × 28 to 224 × 224, from the input, composing enriched and comprehensive information. Unlike existing methods, CECT can capture features at both multi-local and global scales without any sophisticated module design. Moreover, the contribution of local features at different scales can be controlled with the proposed ensemble coefficients. We evaluate CECT on two public COVID-19 datasets, where it reaches the highest accuracy of 98.1% in the intra-dataset evaluation, outperforming existing state-of-the-art methods. Moreover, CECT achieves an accuracy of 90.9% on the unseen dataset in the inter-dataset evaluation, showing extraordinary generalization ability. With its remarkable feature-capture and generalization abilities, we believe CECT can be extended to other medical scenarios as a powerful diagnostic tool. Code is available at https://github.com/NUS-Tim/CECT.
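The controllable-ensemble idea can be sketched as multi-scale local features combined under user-set coefficients. This toy version assumes arbitrary scales and a single stem per scale; the actual CECT blocks are more elaborate.

```python
# Sketch of the "controllable ensemble" idea: multi-scale local features are
# combined with user-set ensemble coefficients before classification.
# Coefficient values and layer shapes are illustrative assumptions.
import torch
import torch.nn as nn

class ControllableEnsemble(nn.Module):
    def __init__(self, scales=(224, 112, 56, 28), dim=64, num_classes=2):
        super().__init__()
        self.stems = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(s),                 # view at one scale
                nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            ) for s in scales
        )
        # Ensemble coefficients controlling each scale's contribution;
        # stored as a buffer so they can be set rather than learned.
        self.register_buffer("coeffs", torch.ones(len(scales)) / len(scales))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x):
        feats = torch.stack([stem(x) for stem in self.stems], dim=0)
        weighted = (self.coeffs.view(-1, 1, 1) * feats).sum(dim=0)
        return self.classifier(weighted)

model = ControllableEnsemble()
model.coeffs = torch.tensor([0.4, 0.3, 0.2, 0.1])  # re-weight local scales
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2])
```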
Affiliation(s)
- Zhaoshan Liu, Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
- Lei Shen, Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
40
Zhang H, Liu J, Liu W, Chen H, Yu Z, Yuan Y, Wang P, Qin J. MHD-Net: Memory-Aware Hetero-Modal Distillation Network for Thymic Epithelial Tumor Typing With Missing Pathology Modality. IEEE J Biomed Health Inform 2024; 28:3003-3014. [PMID: 38470599 DOI: 10.1109/jbhi.2024.3376462]
Abstract
Fusing multi-modal radiology and pathology data with complementary information can improve the accuracy of tumor typing. However, collecting pathology data is difficult, since it is costly and sometimes only obtainable after surgery, which limits the application of multi-modal methods in diagnosis. To address this problem, we propose comprehensively learning from multi-modal radiology-pathology data during training while using only uni-modal radiology data in testing. Concretely, a Memory-aware Hetero-modal Distillation Network (MHD-Net) is proposed, which can distill well-learned multi-modal knowledge from the teacher to the student with the assistance of memory. In the teacher, to tackle the challenge of hetero-modal feature fusion, we propose a novel spatial-differentiated hetero-modal fusion module (SHFM) that models spatial-specific tumor information correlations across modalities. As only radiology data is accessible to the student, we store pathology features in the proposed contrast-boosted typing memory module (CTMM), which achieves type-wise memory updating and stage-wise contrastive memory boosting to ensure the effectiveness and generalization of memory items. In the student, to improve cross-modal distillation, we propose a multi-stage memory-aware distillation (MMD) scheme that reads memory-aware pathology features from the CTMM to remedy missing modality-specific information. Furthermore, we construct a Radiology-Pathology Thymic Epithelial Tumor (RPTET) dataset containing paired CT and whole-slide images (WSI) with annotations. Experiments on the RPTET and CPTAC-LUAD datasets demonstrate that MHD-Net significantly improves tumor typing and outperforms existing multi-modal methods in missing-modality situations.
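The train-multi-modal, test-uni-modal recipe at the heart of MHD-Net is essentially cross-modal knowledge distillation. A hedged sketch of one such training step follows, with the memory and fusion modules abstracted away; the toy teacher and student networks are ours, not the paper's.

```python
# Hedged sketch of radiology-pathology distillation: a multi-modal teacher
# guides a radiology-only student so pathology can be missing at test time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Teacher(nn.Module):
    """Toy multi-modal teacher: sees radiology (CT) and pathology (WSI) features."""
    def __init__(self, dim=32, classes=3):
        super().__init__()
        self.ct, self.wsi = nn.Linear(64, dim), nn.Linear(64, dim)
        self.head = nn.Linear(2 * dim, classes)
    def forward(self, ct, wsi):
        return self.head(torch.cat([self.ct(ct), self.wsi(wsi)], dim=1))

class Student(nn.Module):
    """Toy student: radiology features only, as at test time."""
    def __init__(self, dim=32, classes=3):
        super().__init__()
        self.ct, self.head = nn.Linear(64, dim), nn.Linear(dim, classes)
    def forward(self, ct):
        return self.head(self.ct(ct))

def distillation_step(teacher, student, ct, wsi, labels, temp=4.0, alpha=0.5):
    """Cross-entropy on labels plus KL to the teacher's softened predictions."""
    with torch.no_grad():
        t_logits = teacher(ct, wsi)          # teacher uses both modalities
    s_logits = student(ct)                   # student sees radiology only
    hard = F.cross_entropy(s_logits, labels)
    soft = F.kl_div(
        F.log_softmax(s_logits / temp, dim=1),
        F.softmax(t_logits / temp, dim=1),
        reduction="batchmean",
    ) * temp ** 2
    return alpha * hard + (1 - alpha) * soft

loss = distillation_step(Teacher(), Student(),
                         ct=torch.randn(4, 64), wsi=torch.randn(4, 64),
                         labels=torch.randint(0, 3, (4,)))
print(loss.item())
```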
41
Wu H, Peng L, Du D, Xu H, Lin G, Zhou Z, Lu L, Lv W. BAF-Net: bidirectional attention-aware fluid pyramid feature integrated multimodal fusion network for diagnosis and prognosis. Phys Med Biol 2024; 69:105007. [PMID: 38593831 DOI: 10.1088/1361-6560/ad3cb2]
Abstract
Objective. To go beyond the deficiencies of the three conventional multimodal fusion strategies (i.e., input-, feature- and output-level fusion), we propose a bidirectional attention-aware fluid pyramid feature integrated fusion network (BAF-Net) with cross-modal interactions for multimodal medical image diagnosis and prognosis. Approach. BAF-Net is composed of two identical branches to preserve the unimodal features and one bidirectional attention-aware distillation stream to progressively assimilate cross-modal complements and to learn supplementary features in both bottom-up and top-down processes. Fluid pyramid connections were adopted to integrate the hierarchical features at different levels of the network, and channel-wise attention modules were exploited to mitigate cross-modal cross-level incompatibility. Furthermore, depth-wise separable convolution was introduced to fuse the cross-modal cross-level features and greatly alleviate the increase in parameters. The generalization abilities of BAF-Net were evaluated on two clinical tasks: (1) an in-house PET-CT dataset with 174 patients for differentiation between lung cancer and pulmonary tuberculosis (LC-PTB); (2) a public multicenter PET-CT head and neck cancer dataset with 800 patients from nine centers for overall survival prediction. Main results. On the LC-PTB dataset, improved performance was found in BAF-Net (AUC = 0.7342) compared with the input-level fusion model (AUC = 0.6825; p < 0.05), the feature-level fusion model (AUC = 0.6968; p = 0.0547), and the output-level fusion model (AUC = 0.7011; p < 0.05). On the H&N cancer dataset, BAF-Net (C-index = 0.7241) outperformed the input-, feature-, and output-level fusion models, with 2.95%, 3.77%, and 1.52% increments of C-index (p = 0.3336, 0.0479 and 0.2911, respectively). The ablation experiments demonstrated the effectiveness of all the designed modules regarding all the evaluated metrics in both datasets. Significance. Extensive experiments on two datasets demonstrated better performance and robustness of BAF-Net than the three conventional fusion strategies and PET or CT unimodal networks in terms of diagnosis and prognosis.
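Two ingredients named above, channel-wise attention and depth-wise separable fusion, can be sketched compactly. Shapes and module layouts below are illustrative assumptions, not BAF-Net's actual layers.

```python
# Sketch of a channel-wise attention module (to reconcile cross-modal
# features) and a depth-wise separable convolution (to fuse them cheaply).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gating over channels."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
    def forward(self, x):
        w = self.gate(x).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # re-weight channels

class DepthwiseSeparableFusion(nn.Module):
    """Fuses concatenated PET/CT features with far fewer parameters
    than a dense convolution of the same kernel size."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

pet = torch.randn(2, 64, 32, 32)
ct = torch.randn(2, 64, 32, 32)
attn = ChannelAttention(128)
fuse = DepthwiseSeparableFusion(128, 64)
fused = fuse(attn(torch.cat([pet, ct], dim=1)))
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```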
Affiliation(s)
- Huiqin Wu, Department of Medical Imaging, Guangdong Second Provincial General Hospital, Guangzhou, Guangdong, 518037, People's Republic of China; School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Lihong Peng, School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Dongyang Du, School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Hui Xu, School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guoyu Lin, School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Zidong Zhou, School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Lijun Lu, School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China; Pazhou Lab, Guangzhou, Guangdong, 510330, People's Republic of China
- Wenbing Lv, School of Information and Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming, Yunnan, 650504, People's Republic of China
42
Chen W, Lim LJR, Lim RQR, Yi Z, Huang J, He J, Yang G, Liu B. Artificial intelligence powered advancements in upper extremity joint MRI: A review. Heliyon 2024; 10:e28731. [PMID: 38596104 PMCID: PMC11002577 DOI: 10.1016/j.heliyon.2024.e28731]
Abstract
Magnetic resonance imaging (MRI) is an indispensable medical imaging examination technique in musculoskeletal medicine. Modern MRI techniques achieve superior high-quality multiplanar imaging of soft tissue and skeletal pathologies without the harmful effects of ionizing radiation. Current limitations of MRI include long acquisition times, artifacts, and noise. In addition, it is often challenging to distinguish abutting or closely applied soft tissue structures with similar signal characteristics. In the past decade, Artificial Intelligence (AI) has been widely employed in musculoskeletal MRI to help reduce image acquisition time and improve image quality. Apart from reducing medical costs, AI can assist clinicians in diagnosing diseases more accurately, which helps formulate appropriate treatment plans and ultimately improves patient care. This review article summarizes current research on and applications of AI in musculoskeletal MRI, particularly the advancement of deep learning (DL) in identifying the structures and lesions of upper extremity joints in MRI images.
Affiliation(s)
- Wei Chen, Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Lincoln Jian Rong Lim, Department of Medical Imaging, Western Health, Footscray Hospital, Victoria, Australia; Department of Surgery, The University of Melbourne, Victoria, Australia
- Rebecca Qian Ru Lim, Department of Hand & Reconstructive Microsurgery, Singapore General Hospital, Singapore
- Zhe Yi, Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
- Jiaxing Huang, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Jia He, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Ge Yang, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Bo Liu, Department of Hand Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, China
43
Lambert B, Forbes F, Doyle S, Dehaene H, Dojat M. Trustworthy clinical AI solutions: A unified review of uncertainty quantification in Deep Learning models for medical image analysis. Artif Intell Med 2024; 150:102830. [PMID: 38553168 DOI: 10.1016/j.artmed.2024.102830]
Abstract
Acceptance of Deep Learning (DL) models in the clinical field remains rather low relative to the quantity of high-performing solutions reported in the literature. End users are particularly reluctant to rely on the opaque predictions of DL models. Uncertainty quantification methods have been proposed as a potential solution, to reduce the black-box effect of DL models and increase the interpretability and acceptability of results for the end user. In this review, we provide an overview of existing methods for quantifying the uncertainty associated with DL predictions. We focus on applications to medical image analysis, which present specific challenges due to the high dimensionality of images and their variable quality, as well as constraints associated with real-world clinical routine. Moreover, we discuss the concept of structural uncertainty, a corpus of methods to facilitate the alignment of segmentation uncertainty estimates with clinical attention. We then discuss the evaluation protocols used to validate the relevance of uncertainty estimates. Finally, we highlight the open challenges for uncertainty quantification in the medical field.
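As one concrete instance of the methods such a review covers, Monte Carlo dropout estimates predictive uncertainty by keeping dropout active at inference and measuring the spread of repeated predictions. This generic sketch is ours for illustration, not a method proposed by the review.

```python
# Monte Carlo dropout: repeated stochastic forward passes at test time;
# the mean gives the prediction, the standard deviation its uncertainty.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 3)
)

def mc_dropout_predict(model, x, n_samples: int = 30):
    model.train()  # keeps dropout stochastic during inference
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
        )
    return probs.mean(0), probs.std(0)  # prediction and its uncertainty

mean_p, std_p = mc_dropout_predict(model, torch.randn(4, 16))
print(mean_p.shape, std_p.shape)  # torch.Size([4, 3]) twice
```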
Affiliation(s)
- Benjamin Lambert, Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut des Neurosciences, Grenoble, 38000, France; Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Florence Forbes, Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, 38000, France
- Senan Doyle, Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Harmonie Dehaene, Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Michel Dojat, Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut des Neurosciences, Grenoble, 38000, France
44
Sun S, Mei Z, Li X, Tang T, Su Z, Wu Y. A label information fused medical image report generation framework. Artif Intell Med 2024; 150:102823. [PMID: 38553163 DOI: 10.1016/j.artmed.2024.102823]
Abstract
Medical imaging is an important tool for clinical diagnosis. Nevertheless, preparing imaging diagnosis reports is time-consuming and error-prone for physicians, so methods that generate medical imaging reports automatically are needed. Currently, the task of medical imaging report generation is challenging in at least two respects: (1) medical images are very similar to each other, and the differences between normal and abnormal images, and between different abnormal images, are usually subtle; (2) unrelated or incorrect keywords describing abnormal findings in the generated reports lead to miscommunication. In this paper, we propose a medical image report generation framework composed of four modules: a Transformer encoder, a MIX-MLP multi-label classification network, a co-attention mechanism (CAM) based semantic and visual feature fusion, and a hierarchical LSTM decoder. The Transformer encoder learns long-range dependencies between images and labels, effectively extracts visual and semantic features of images, and establishes long-term dependencies between visual and semantic information to accurately extract abnormal features from images. The MIX-MLP multi-label classification network, the co-attention mechanism, and the hierarchical LSTM network can better identify abnormalities, achieving visual-text alignment fusion and multi-label diagnostic classification to better facilitate report generation. Experiments on two widely used radiology report datasets, IU X-RAY and MIMIC-CXR, show that our proposed framework outperforms current report generation models in terms of both natural language generation metrics and clinical efficacy assessment metrics. The code of this work is available online at https://github.com/watersunhznu/LIFMRG.
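The overall pipeline shape, visual encoding, multi-label prediction, semantic-visual fusion, and recurrent decoding, can be sketched as follows. Module sizes, the single-layer LSTM, and the stand-ins for MIX-MLP and the co-attention module are simplifying assumptions, not the paper's implementation.

```python
# Minimal end-to-end shape of a label-fused report generator:
# encode patches, predict labels, fuse label semantics with vision, decode.
import torch
import torch.nn as nn

class TinyReportGenerator(nn.Module):
    def __init__(self, n_labels=14, vocab=1000, dim=128):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.label_head = nn.Linear(dim, n_labels)           # stand-in for MIX-MLP
        self.label_emb = nn.Embedding(n_labels, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)  # stand-in for CAM
        self.decoder = nn.LSTM(dim, dim, batch_first=True)   # one LSTM instead of a hierarchy
        self.vocab_head = nn.Linear(dim, vocab)

    def forward(self, patches, steps=20):
        vis = self.encoder(patches)                   # (B, N, D) visual tokens
        label_logits = self.label_head(vis.mean(1))   # multi-label diagnosis
        # Semantic feature: label embeddings weighted by predicted confidence.
        sem = torch.sigmoid(label_logits) @ self.label_emb.weight   # (B, D)
        fused, _ = self.attn(sem.unsqueeze(1), vis, vis)  # semantics attend to vision
        out, _ = self.decoder(fused.repeat(1, steps, 1))  # unrolled decoding
        return label_logits, self.vocab_head(out)         # per-step token logits

model = TinyReportGenerator()
labels, tokens = model(torch.randn(2, 49, 128))
print(labels.shape, tokens.shape)  # torch.Size([2, 14]) torch.Size([2, 20, 1000])
```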
Affiliation(s)
- Shuifa Sun, School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China; Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China
- Zhoujunsen Mei, Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, China
- Xiaolong Li, Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Economics and Management, China Three Gorges University, Yichang, 443002, Hubei, China
- Tinglong Tang, Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, China
- Zhanglin Su, School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China
- Yirong Wu, Institute of Advanced Studies in Humanities and Social Sciences, Beijing Normal University, Zhuhai, 519087, Guangdong, China
45
Liu Y, Zhang Z, Yue J, Guo W. SCANeXt: Enhancing 3D medical image segmentation with dual attention network and depth-wise convolution. Heliyon 2024; 10:e26775. [PMID: 38439873 PMCID: PMC10909707 DOI: 10.1016/j.heliyon.2024.e26775]
Abstract
Existing approaches to 3D medical image segmentation can be generally categorized into convolution-based or transformer-based methods. While convolutional neural networks (CNNs) demonstrate proficiency in extracting local features, they encounter challenges in capturing global representations. In contrast, the consecutive self-attention modules present in vision transformers excel at capturing long-range dependencies and achieving an expanded receptive field. In this paper, we propose a novel approach, termed SCANeXt, for 3D medical image segmentation. Our method combines the strengths of dual attention (Spatial and Channel Attention) and ConvNeXt to enhance representation learning for 3D medical images. In particular, we propose a novel self-attention mechanism crafted to encompass spatial and channel relationships throughout the entire feature dimension. To further extract multiscale features, we introduce a depth-wise convolution block inspired by ConvNeXt after the dual attention block. Extensive evaluations on three benchmark datasets, namely Synapse, BraTS, and ACDC, demonstrate the effectiveness of our proposed method in terms of accuracy. Our SCANeXt model achieves a state-of-the-art result with a Dice Similarity Score of 95.18% on the ACDC dataset, significantly outperforming current methods.
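A compact sketch of the two building blocks named above, dual (spatial plus channel) attention followed by a ConvNeXt-style depth-wise convolution block, is given below, with 2D toy shapes in place of full 3D volumes; it is illustrative, not the SCANeXt code.

```python
# Dual attention over tokens, then a ConvNeXt-style depth-wise block.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Spatial self-attention over positions plus channel self-attention
    computed from channel-by-channel similarity."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, N, C) token form
        sp, _ = self.spatial(x, x, x)          # long-range spatial relations
        xt = x.transpose(1, 2)                 # (B, C, N): channels as tokens
        ch = torch.softmax(xt @ xt.transpose(1, 2) / xt.shape[-1] ** 0.5, dim=-1) @ xt
        return sp + ch.transpose(1, 2)         # combine both relations

class ConvNeXtBlock(nn.Module):
    """Depth-wise 7x7 convolution plus pointwise MLP, as in ConvNeXt."""
    def __init__(self, c: int):
        super().__init__()
        self.dw = nn.Conv2d(c, c, 7, padding=3, groups=c)
        self.norm = nn.LayerNorm(c)
        self.mlp = nn.Sequential(nn.Linear(c, 4 * c), nn.GELU(), nn.Linear(4 * c, c))

    def forward(self, x):                      # x: (B, C, H, W)
        y = self.dw(x).permute(0, 2, 3, 1)     # channel-last for LayerNorm
        y = self.mlp(self.norm(y)).permute(0, 3, 1, 2)
        return x + y                           # residual connection

b, c, h, w = 2, 64, 16, 16
tokens = torch.randn(b, h * w, c)
mixed = DualAttention(c)(tokens)               # dual attention on tokens
img = mixed.transpose(1, 2).reshape(b, c, h, w)
print(ConvNeXtBlock(c)(img).shape)             # torch.Size([2, 64, 16, 16])
```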
Affiliation(s)
- Yajun Liu, Shanghai Key Laboratory of Intelligent Sensing and Recognition, Shanghai Jiao Tong University, China
- Zenghui Zhang, Shanghai Key Laboratory of Intelligent Sensing and Recognition, Shanghai Jiao Tong University, China
- Jiang Yue, Department of Endocrinology and Metabolism, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, China
- Weiwei Guo, Center for Digital Innovation, Tongji University, China
46
Liu J, Wang H, Shan X, Zhang L, Cui S, Shi Z, Liu Y, Zhang Y, Wang L. Hybrid transformer convolutional neural network-based radiomics models for osteoporosis screening in routine CT. BMC Med Imaging 2024; 24:62. [PMID: 38486185 PMCID: PMC10938662 DOI: 10.1186/s12880-024-01240-5]
Abstract
OBJECTIVE Early diagnosis of osteoporosis is crucial to prevent osteoporotic vertebral fracture and complications of spine surgery. We aimed to develop a hybrid transformer convolutional neural network (HTCNN)-based radiomics model for osteoporosis screening in routine CT. METHODS To investigate the HTCNN algorithm for vertebrae and trabecular segmentation, 92 training subjects and 45 test subjects were employed. Furthermore, we included 283 vertebral bodies and randomly divided them into a training cohort (n = 204) and a test cohort (n = 79) for radiomics analysis. Areas under the receiver operating characteristic curve (AUCs) and decision curve analysis (DCA) were applied to compare the performance and clinical value of the radiomics models against Hounsfield Unit (HU) values for detecting dual-energy X-ray absorptiometry (DXA)-based osteoporosis. RESULTS The HTCNN algorithm achieved high precision for segmentation of the vertebral body and trabecular compartment; in the test sets, the mean Dice scores reached 0.968 and 0.961. Twelve features from the trabecular compartment and 15 features from the entire vertebral body were used to calculate the radiomics score (rad-score). Compared with HU values and the trabecular rad-score, the vertebral rad-score showed the best efficacy for discriminating osteoporosis from non-osteoporosis (training group: AUC = 0.95, 95% CI 0.91-0.99; test group: AUC = 0.97, 95% CI 0.93-1.00), and the differences were significant in the test group according to the DeLong test (p < 0.05). CONCLUSIONS This retrospective study demonstrated the superiority of the HTCNN-based vertebral radiomics model for osteoporosis discrimination in routine CT.
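Downstream of segmentation, the radiomics step amounts to fitting a model on extracted features and comparing AUCs. The sketch below uses synthetic features in place of the paper's vertebral and trabecular radiomics; the cohort split echoes the abstract, but the data are random.

```python
# Radiomics-style workflow: features -> linear model -> rad-score -> AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(283, 15))              # 15 synthetic features per vertebra
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=283) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=79, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# The "rad-score" is the model's continuous output; AUC measures how well
# it separates osteoporotic from non-osteoporotic vertebrae.
rad_score = model.decision_function(X_te)
print(f"test AUC = {roc_auc_score(y_te, rad_score):.3f}")
```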
Affiliation(s)
- Jiachen Liu, Department of Orthopedics, Shengjing Hospital of China Medical University, 110004, Shenyang, People's Republic of China
- Huan Wang, Department of Orthopedics, Shengjing Hospital of China Medical University, 110004, Shenyang, People's Republic of China
- Xiuqi Shan, Department of Orthopedics, Shengjing Hospital of China Medical University, 110004, Shenyang, People's Republic of China
- Lei Zhang, Department of Orthopedics, Shengjing Hospital of China Medical University, 110004, Shenyang, People's Republic of China
- Shaoqian Cui, Department of Orthopedics, Shengjing Hospital of China Medical University, 110004, Shenyang, People's Republic of China
- Zelin Shi, Shenyang Institute of Automation, Chinese Academy of Sciences, 110016, Shenyang, People's Republic of China
- Yunpeng Liu, Shenyang Institute of Automation, Chinese Academy of Sciences, 110016, Shenyang, People's Republic of China
- Yingdi Zhang, Shenyang Institute of Automation, Chinese Academy of Sciences, 110016, Shenyang, People's Republic of China
- Lanbo Wang, Department of Radiology, Shengjing Hospital of China Medical University, 110004, Shenyang, People's Republic of China
47
Kodipalli A, Fernandes SL, Dasar S. An Empirical Evaluation of a Novel Ensemble Deep Neural Network Model and Explainable AI for Accurate Segmentation and Classification of Ovarian Tumors Using CT Images. Diagnostics (Basel) 2024; 14:543. [PMID: 38473015 DOI: 10.3390/diagnostics14050543]
Abstract
Ovarian cancer is one of the leading causes of death among the female population worldwide, and early diagnosis is crucial for patient treatment. In this work, our main objective is to accurately detect and classify ovarian cancer. To achieve this, two datasets are considered: CT scan images of patients with and without cancer, and biomarker (clinical parameter) data from all patients. We propose an ensemble deep neural network model and an ensemble machine learning model for the automatic binary classification of ovarian CT scan images and biomarker data. The proposed model incorporates four convolutional neural network models, VGG16, ResNet 152, Inception V3, and DenseNet 101, with transformers applied for feature extraction. These extracted features are fed into our proposed ensemble multi-layer perceptron model for classification. Preprocessing and CNN tuning techniques such as hyperparameter optimization, data augmentation, and fine-tuning are utilized during model training. Our ensemble model outperforms single classifiers and machine learning algorithms, achieving a mean accuracy of 98.96%, a precision of 97.44%, and an F1-score of 98.7%. We compared these results with those obtained using features extracted by the UNet model, followed by classification with our ensemble model. The transformer demonstrated superior performance in feature extraction over the UNet, with a mean Dice score and mean Jaccard score of 0.98 and 0.97 (standard deviations of 0.04 and 0.06) for benign tumors, and 0.99 and 0.98 (standard deviations of 0.01) for malignant tumors. For the biomarker data, the combination of five machine learning models (KNN, logistic regression, SVM, decision tree, and random forest) resulted in an improved accuracy of 92.8% compared to single classifiers.
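The biomarker-side ensemble of the five named classifiers can be reproduced in outline with soft voting; the voting scheme is our assumption, synthetic clinical features stand in for the real biomarker data, and the paper's preprocessing and tuning are omitted.

```python
# Five-classifier ensemble over synthetic biomarker data via soft voting.
import numpy as np
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))                      # 8 biomarker values per patient
y = (X[:, 0] - X[:, 2] + rng.normal(size=200) > 0).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),            # probabilities for soft voting
        ("tree", DecisionTreeClassifier()),
        ("rf", RandomForestClassifier()),
    ],
    voting="soft",
)
print(cross_val_score(ensemble, X, y, cv=5, scoring="accuracy").mean())
```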
Affiliation(s)
- Ashwini Kodipalli, Department of Artificial Intelligence and Data Science, Global Academy of Technology, Bangalore 560098, India
- Steven L Fernandes, Department of Computer Science, Design, Journalism, Creighton University, Omaha, NE 68178, USA
- Santosh Dasar, Department of Radiology, SDM College of Medical Sciences & Hospital, Shri Dharmasthala Manjunatheshwara University, Dharwad 580009, India
48
Papanastasiou G, Dikaios N, Huang J, Wang C, Yang G. Is Attention all You Need in Medical Image Analysis? A Review. IEEE J Biomed Health Inform 2024; 28:1398-1411. [PMID: 38157463 DOI: 10.1109/jbhi.2023.3348436]
Abstract
Medical imaging is a key component in clinical diagnosis, treatment planning and clinical trial design, accounting for almost 90% of all healthcare data. Convolutional neural networks (CNNs) have achieved performance gains in medical image analysis (MIA) over recent years. CNNs can efficiently model local pixel interactions and can be trained on small-scale medical imaging data. Despite these important advances, typical CNNs have relatively limited capabilities in modelling "global" pixel interactions, which restricts their generalisation ability to out-of-distribution data with different "global" information. The recent progress of Artificial Intelligence gave rise to Transformers, which can learn global relationships from data. However, full Transformer models need to be trained on large-scale data and involve tremendous computational complexity. Attention and Transformer compartments ("Transf/Attention"), which maintain the properties needed for modelling global relationships, have been proposed as lighter alternatives to full Transformers. Recently, there has been an increasing trend to cross-pollinate complementary local-global properties from CNN and Transf/Attention architectures, which has led to a new era of hybrid models. The past years have witnessed substantial growth in hybrid CNN-Transf/Attention models across diverse MIA problems. In this systematic review, we survey existing hybrid CNN-Transf/Attention models, review and unravel key architectural designs, analyse breakthroughs, and evaluate current and future opportunities as well as challenges. We also introduce an analysis framework for generalisation opportunities of scientific and clinical impact, based on which new data-driven domain generalisation and adaptation methods can be stimulated.
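The hybrid family surveyed here interleaves convolution for local pixel interactions with attention for global relationships. A minimal generic block of that kind, ours purely for illustration and not any specific reviewed model, might look like this:

```python
# One generic hybrid block: convolution (local) followed by
# self-attention over flattened tokens (global), with a residual.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Convolution for local pixel interactions, attention for global ones."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        x = torch.relu(self.local(x))            # local mixing
        tokens = x.flatten(2).transpose(1, 2)    # (B, H*W, C) for global mixing
        out, _ = self.attn(tokens, tokens, tokens)
        return x + out.transpose(1, 2).reshape(b, c, h, w)   # residual hybrid

print(HybridBlock(32)(torch.randn(1, 32, 14, 14)).shape)  # torch.Size([1, 32, 14, 14])
```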
49
Mozaffari J, Amirkhani A, Shokouhi SB. ColonGen: an efficient polyp segmentation system for generalization improvement using a new comprehensive dataset. Phys Eng Sci Med 2024; 47:309-325. [PMID: 38224384 DOI: 10.1007/s13246-023-01368-8]
Abstract
Colorectal cancer (CRC) is one of the most common causes of cancer-related deaths. While polyp detection is important for diagnosing CRC, high miss rates for polyps have been reported during colonoscopy. Most deep learning methods extract features from images using convolutional neural networks (CNNs). In recent years, vision transformer (ViT) models have been employed for image processing and have been successful in image segmentation. Image processing can be improved by combining transformer models, which can extract spatial location information, with CNNs, which are capable of aggregating local information. Despite this, recent research shows limited effectiveness in increasing data diversity and generalization accuracy. This paper investigates the generalization proficiency of polyp image segmentation based on transformer architecture and proposes a novel approach using two different ViT architectures. This allows the model to learn representations from different perspectives, which can then be combined to create a richer feature representation. Additionally, a more universal and comprehensive dataset has been derived from the datasets presented in related research, which can be used to improve generalization. We first evaluated the generalization of our proposed model using three distinct training-testing scenarios. Our experimental results demonstrate that our ColonGen-V1 outperforms other state-of-the-art methods in all scenarios. As a next step, we used the comprehensive dataset to improve the performance of the model on in- and out-of-domain data. The results show that our ColonGen-V2 outperforms state-of-the-art studies by 5.1%, 1.3%, and 1.1% on the ETIS-Larib, Kvasir-Seg, and CVC-ColonDB datasets, respectively. The inclusive dataset and the model introduced in this paper are available to the public at https://github.com/javadmozaffari/Polyp_segmentation.
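The two-backbone idea, combining features from two differently configured ViTs into a richer representation, can be sketched as below. Patch sizes, depths, and the naive upsampling head are illustrative choices, not ColonGen's architecture.

```python
# Two toy ViT encoders whose token features are concatenated before a
# simple segmentation head producing a polyp mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

def tiny_vit(patch: int, dim: int, depth: int) -> nn.ModuleDict:
    """A bare-bones ViT encoder: patch embedding plus Transformer layers."""
    return nn.ModuleDict({
        "embed": nn.Conv2d(3, dim, kernel_size=patch, stride=patch),
        "blocks": nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), depth
        ),
    })

def encode(vit: nn.ModuleDict, x: torch.Tensor):
    t = vit["embed"](x)                             # (B, D, H/p, W/p)
    b, d, h, w = t.shape
    return vit["blocks"](t.flatten(2).transpose(1, 2)), (h, w)

vit_a = tiny_vit(patch=16, dim=64, depth=2)         # one "perspective"
vit_b = tiny_vit(patch=16, dim=64, depth=4)         # a deeper second perspective
head = nn.Conv2d(128, 1, kernel_size=1)             # binary polyp-mask logits

x = torch.randn(2, 3, 224, 224)
feats_a, (h, w) = encode(vit_a, x)
feats_b, _ = encode(vit_b, x)
# Concatenate the two representations into a richer feature map.
fused = torch.cat([feats_a, feats_b], dim=-1).transpose(1, 2).reshape(2, 128, h, w)
mask_logits = F.interpolate(head(fused), size=(224, 224))
print(mask_logits.shape)  # torch.Size([2, 1, 224, 224])
```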
Affiliation(s)
- Javad Mozaffari, School of Electrical Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
- Abdollah Amirkhani, School of Automotive Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
- Shahriar B Shokouhi, School of Electrical Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
50
Liu S, Zhang A, Xiong J, Su X, Zhou Y, Li Y, Zhang Z, Li Z, Liu F. The application of radiomics machine learning models based on multimodal MRI with different sequence combinations in predicting cervical lymph node metastasis in oral tongue squamous cell carcinoma patients. Head Neck 2024; 46:513-527. [PMID: 38108536 DOI: 10.1002/hed.27605]
Abstract
BACKGROUND The purpose of this study was to preliminarily explore the performance of radiomics machine learning models based on multimodal MRI in predicting the risk of cervical lymph node metastasis (CLNM) in oral tongue squamous cell carcinoma (OTSCC) patients. METHODS A total of 400 patients were enrolled in this study and divided into six groups according to different combinations of MRI sequences: group I consisted of patients with T1-weighted images (T1WI) and fat-suppressed T2-weighted images (FS-T2WI); group II of patients with T1WI, FS-T2WI, and contrast-enhanced MRI (CE-MRI); group III of patients with T1WI, FS-T2WI, and T2-weighted images (T2WI); group IV of patients with T1WI, FS-T2WI, CE-MRI, and T2WI; group V of patients with T1WI, FS-T2WI, T2WI, and the apparent diffusion coefficient map (ADC); and group VI of patients with T1WI, FS-T2WI, CE-MRI, T2WI, and ADC. Machine learning models were constructed, and their performance was compared across the groups. RESULTS The machine learning model in group IV, including T1WI, FS-T2WI, T2WI, and CE-MRI, presented the best prediction performance, with AUCs of 0.881 and 0.868 in the two sets. The models with CE-MRI performed better than the models without CE-MRI (I vs. II, III vs. IV, V vs. VI). CONCLUSIONS The radiomics machine learning models based on CE-MRI showed great accuracy and stability in predicting the risk of CLNM in OTSCC patients.
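The sequence-combination comparison boils down to training the same classifier on different feature subsets and comparing AUCs. The sketch below simulates per-sequence radiomics blocks for three of the six groups; the data are synthetic, so the AUC ordering merely illustrates the workflow, not the paper's results.

```python
# Compare the same classifier across MRI-sequence feature combinations.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 400
# Hypothetical per-sequence radiomics blocks: 5 features per sequence.
seqs = {s: rng.normal(size=(n, 5))
        for s in ["T1WI", "FS-T2WI", "T2WI", "CE-MRI", "ADC"]}
y = (seqs["CE-MRI"][:, 0] + 0.5 * seqs["T2WI"][:, 1] + rng.normal(size=n) > 0).astype(int)

groups = {
    "I": ["T1WI", "FS-T2WI"],
    "IV": ["T1WI", "FS-T2WI", "CE-MRI", "T2WI"],
    "VI": ["T1WI", "FS-T2WI", "CE-MRI", "T2WI", "ADC"],
}
for name, combo in groups.items():
    X = np.hstack([seqs[s] for s in combo])
    auc = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                          cv=5, scoring="roc_auc").mean()
    print(f"group {name}: AUC = {auc:.3f}")
```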
Affiliation(s)
- Sheng Liu, Department of Oromaxillofacial-Head and Neck Surgery, Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, China Medical University, Liaoning Provincial Key Laboratory of Oral Diseases, Shenyang, China
- Aihua Zhang, Department of Oromaxillofacial-Head and Neck Surgery, Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, China Medical University, Liaoning Provincial Key Laboratory of Oral Diseases, Shenyang, China
- Jianjun Xiong, Department of Oromaxillofacial-Head and Neck Surgery, Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, China Medical University, Liaoning Provincial Key Laboratory of Oral Diseases, Shenyang, China
- Xingzhou Su, Department of Oromaxillofacial-Head and Neck Surgery, Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, China Medical University, Liaoning Provincial Key Laboratory of Oral Diseases, Shenyang, China
- Yuhang Zhou, Department of Oromaxillofacial-Head and Neck Surgery, Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, China Medical University, Liaoning Provincial Key Laboratory of Oral Diseases, Shenyang, China
- Yang Li, Department of Oromaxillofacial-Head and Neck Surgery, Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, China Medical University, Liaoning Provincial Key Laboratory of Oral Diseases, Shenyang, China
- Zheng Zhang, Department of Radiology, The First Affiliated Hospital of China Medical University, Shenyang, China
- Zhenning Li, Department of Oromaxillofacial-Head and Neck Surgery, Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, China Medical University, Liaoning Provincial Key Laboratory of Oral Diseases, Shenyang, China
- Fayu Liu, Department of Oromaxillofacial-Head and Neck Surgery, Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, China Medical University, Liaoning Provincial Key Laboratory of Oral Diseases, Shenyang, China