51. Asiri AA, Shaf A, Ali T, Pasha MA, Khan A, Irfan M, Alqahtani S, Alghamdi A, Alghamdi AH, Alshamrani AFA, Alelyani M, Alamri S. Advancing brain tumor detection: harnessing the Swin Transformer's power for accurate classification and performance analysis. PeerJ Comput Sci 2024; 10:e1867. [PMID: 38435590] [PMCID: PMC10909192] [DOI: 10.7717/peerj-cs.1867]
Abstract
The accurate detection of brain tumors through medical imaging is paramount for precise diagnosis and effective treatment planning. In this study, we introduce a robust methodology that leverages the Swin Transformer architecture for brain tumor image classification. Our approach classifies brain tumors into four categories: glioma, meningioma, non-tumor, and pituitary, using a dataset of 2,870 images. Built on the Swin Transformer, the method integrates a pipeline of preprocessing, feature extraction, and classification. We evaluate performance with 21 metrics across all four classes, which together give detailed insight into the model's behavior throughout the learning process, and we provide graphical representations of the confusion matrix and of training and validation loss and accuracy. The standout performance parameter, accuracy, stands at an impressive 97%, outperforming established models such as CNN, DCNN, ViT, and their variants in brain tumor classification. The robustness and accuracy of our methodology demonstrate its potential as a pioneering model in this domain, promising substantial advances in accurate tumor identification and classification and contributing significantly to the landscape of medical image analysis.
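To make the classification setup concrete, the following is a minimal sketch, not the authors' code, of fine-tuning a Swin Transformer for this four-class problem; the timm model name, input size, and optimizer settings are illustrative assumptions:

```python
# A minimal sketch (not the authors' code): fine-tuning a Swin Transformer
# for four-class brain MRI classification with the timm library.
import timm
import torch
import torch.nn as nn

# Assumed class order; the paper's classes are glioma, meningioma, non-tumor, pituitary.
CLASSES = ["glioma", "meningioma", "non-tumor", "pituitary"]

# Swin-Tiny pretrained on ImageNet; the head is re-initialized for 4 classes.
model = timm.create_model(
    "swin_tiny_patch4_window7_224", pretrained=True, num_classes=len(CLASSES)
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of (B, 3, 224, 224) images."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)           # (B, 4) class scores
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```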
Affiliation(s)
- Abdullah A. Asiri: Radiological Sciences Department, College of Applied Medical Sciences, Najran University, Najran, Saudi Arabia
- Ahmad Shaf: Department of Computer Science, COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Pakistan
- Tariq Ali: Department of Computer Science, COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Pakistan
- Muhammad Ahmad Pasha: Department of Computer Science, COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Pakistan
- Aiza Khan: Department of Computer Science, COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Pakistan
- Muhammad Irfan: Faculty of Electrical Engineering, Najran University, Najran, Saudi Arabia
- Saeed Alqahtani: Radiological Sciences Department, College of Applied Medical Sciences, Najran University, Najran, Saudi Arabia
- Ahmad Alghamdi: Radiological Sciences Department, College of Applied Medical Sciences, Taif University, Taif, Saudi Arabia
- Ali H. Alghamdi: Department of Radiological Sciences, Faculty of Applied Medical Sciences, University of Tabuk, Tabuk, Saudi Arabia
- Abdullah Fahad A. Alshamrani: Department of Diagnostic Radiology Technology, College of Applied Medical Sciences, Taibah University, Taibah, Saudi Arabia
- Magbool Alelyani: Department of Radiological Sciences, College of Applied Medical Science, King Khalid University, Abha, Saudi Arabia
- Sultan Alamri: Radiological Sciences Department, College of Applied Medical Sciences, Taif University, Taif, Saudi Arabia
52. Ji W, Chung ACS. Unsupervised Domain Adaptation for Medical Image Segmentation Using Transformer With Meta Attention. IEEE Trans Med Imaging 2024; 43:820-831. [PMID: 37801381] [DOI: 10.1109/tmi.2023.3322581]
Abstract
Image segmentation is essential to medical image analysis as it provides the labeled regions of interest for subsequent diagnosis and treatment. However, fully-supervised segmentation methods require high-quality annotations produced by experts, a laborious and expensive process. In addition, when segmentation is performed on another, unlabeled image modality, performance is adversely affected by the domain shift. Unsupervised domain adaptation (UDA) is an effective way to tackle these problems, but the performance of existing methods still leaves room for improvement. Moreover, despite the effectiveness of recent Transformer-based methods in medical image segmentation, the adaptability of Transformers has rarely been investigated. In this paper, we present a novel UDA framework that uses a Transformer to build a cross-modality segmentation method with the advantages of learning long-range dependencies and transferring attentive information. To fully utilize the attention learned by the Transformer in UDA, we propose Meta Attention (MA) and use it to perform a fully attention-based alignment scheme, which can learn the hierarchical consistencies of attention and transfer more discriminative information between two modalities. We conducted extensive experiments on cross-modality segmentation using three datasets: a whole heart segmentation dataset (MMWHS), an abdominal organ segmentation dataset, and a brain tumor segmentation dataset. The promising results show that our method significantly improves performance compared with state-of-the-art UDA methods.
53. Huang L, Xu Y, Wang S, Sang L, Ma H. SRT: Swin-residual transformer for benign and malignant nodules classification in thyroid ultrasound images. Med Eng Phys 2024; 124:104101. [PMID: 38418029] [DOI: 10.1016/j.medengphy.2024.104101]
Abstract
With the advancement of deep learning technology, computer-aided diagnosis (CAD) is playing an increasing role in medical diagnosis. In particular, the emergence of Transformer-based models has broadened the application of computer vision techniques to medical image processing. In the diagnosis of thyroid disease, classifying benign and malignant thyroid nodules under the TI-RADS system is strongly influenced by the subjective judgment of ultrasonographers and imposes an extremely heavy workload on them. To address this, we propose the Swin-Residual Transformer (SRT), which incorporates residual blocks and a triplet loss into the Swin Transformer (SwinT). It improves sensitivity to global and localized features of thyroid nodules and better distinguishes small feature differences. In our exploratory experiments, the SRT model achieves an accuracy of 0.8832 with an AUC of 0.8660, outperforming state-of-the-art convolutional neural network (CNN) and Transformer models. Ablation experiments also demonstrate improved performance on the thyroid nodule classification task after introducing the residual blocks and triplet loss. These results validate the potential of the proposed SRT model to improve the diagnosis of thyroid nodules in ultrasound images, and offer a feasible way to avoid excessive puncture sampling of thyroid nodules in future clinical diagnosis.
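The training objective described here, a classification loss paired with a triplet loss on embeddings, can be sketched as follows; the margin, loss weighting, and pre-mined triplets are assumptions rather than the paper's exact configuration:

```python
# A minimal sketch (assumptions: margin value, loss weight alpha, and that
# anchor/positive/negative triplets are pre-mined) of combining cross-entropy
# with a triplet loss on image embeddings, as the SRT paper describes.
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)
ce = nn.CrossEntropyLoss()

def srt_style_loss(logits, labels, anchor_emb, pos_emb, neg_emb, alpha=0.5):
    """Weighted sum of classification and metric-learning terms.

    logits: (B, 2) benign/malignant scores; *_emb: (B, D) backbone embeddings
    for anchor, same-class, and different-class samples.
    """
    return ce(logits, labels) + alpha * triplet(anchor_emb, pos_emb, neg_emb)
```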
Affiliation(s)
- Long Huang: Department of Oncology, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, China
- Yanran Xu: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110169, China
- Shuhuan Wang: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110169, China
- Liang Sang: Department of Ultrasound, The First Hospital of China Medical University, Shenyang, Liaoning, 110001, China
- He Ma: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110169, China; National University of Singapore (Suzhou) Research Institute, Suzhou, Jiangsu, 215123, China
54. Oukdach Y, Kerkaou Z, El Ansari M, Koutti L, Fouad El Ouafdi A, De Lange T. ViTCA-Net: a framework for disease detection in video capsule endoscopy images using a vision transformer and convolutional neural network with a specific attention mechanism. Multimed Tools Appl 2024; 83:63635-63654. [DOI: 10.1007/s11042-023-18039-1]
55. Azad R, Kazerouni A, Heidari M, Aghdam EK, Molaei A, Jia Y, Jose A, Roy R, Merhof D. Advances in medical image analysis with vision Transformers: A comprehensive review. Med Image Anal 2024; 91:103000. [PMID: 37883822] [DOI: 10.1016/j.media.2023.103000]
Abstract
The remarkable performance of the Transformer architecture in natural language processing has recently also triggered broad interest in computer vision. Among other merits, Transformers have been shown capable of learning long-range dependencies and spatial correlations, a clear advantage over convolutional neural networks (CNNs), which have been the de facto standard in computer vision so far. Thus, Transformers have become an integral part of modern medical image analysis. In this review, we provide an encyclopedic overview of the applications of Transformers in medical imaging. Specifically, we present a systematic and thorough review of recent Transformer literature for different medical image analysis tasks, including classification, segmentation, detection, registration, synthesis, and clinical report generation. For each of these applications, we investigate the novelty, strengths, and weaknesses of the proposed strategies and develop taxonomies highlighting key properties and contributions. Further, where applicable, we outline current benchmarks on different datasets. Finally, we summarize key challenges and discuss future research directions. We also provide the cited papers with their corresponding implementations at https://github.com/mindflow-institue/Awesome-Transformer.
Affiliation(s)
- Reza Azad: Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Amirhossein Kazerouni: School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
- Moein Heidari: School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
- Amirali Molaei: School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Yiwei Jia: Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Abin Jose: Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Rijo Roy: Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Dorit Merhof: Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany; Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
56. Hussain D, Al-Masni MA, Aslam M, Sadeghi-Niaraki A, Hussain J, Gu YH, Naqvi RA. Revolutionizing tumor detection and classification in multimodality imaging based on deep learning approaches: Methods, applications and limitations. J Xray Sci Technol 2024; 32:857-911. [PMID: 38701131] [DOI: 10.3233/xst-230429]
Abstract
BACKGROUND The emergence of deep learning (DL) techniques has revolutionized tumor detection and classification in medical imaging, with multimodal medical imaging (MMI) gaining recognition for its precision in diagnosis, treatment, and progression tracking. OBJECTIVE This review comprehensively examines DL methods in transforming tumor detection and classification across MMI modalities, aiming to provide insights into advancements, limitations, and key challenges for further progress. METHODS Systematic literature analysis identifies DL studies for tumor detection and classification, outlining methodologies including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their variants. Integration of multimodality imaging enhances accuracy and robustness. RESULTS Recent advancements in DL-based MMI evaluation methods are surveyed, focusing on tumor detection and classification tasks. Various DL approaches, including CNNs, YOLO, Siamese Networks, Fusion-Based Models, Attention-Based Models, and Generative Adversarial Networks, are discussed with emphasis on PET-MRI, PET-CT, and SPECT-CT. FUTURE DIRECTIONS The review outlines emerging trends and future directions in DL-based tumor analysis, aiming to guide researchers and clinicians toward more effective diagnosis and prognosis. Continued innovation and collaboration are stressed in this rapidly evolving domain. CONCLUSION Conclusions drawn from literature analysis underscore the efficacy of DL approaches in tumor detection and classification, highlighting their potential to address challenges in MMI analysis and their implications for clinical practice.
Affiliation(s)
- Dildar Hussain: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Korea
- Mohammed A Al-Masni: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Korea
- Muhammad Aslam: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Korea
- Abolghasem Sadeghi-Niaraki: Department of Computer Science & Engineering and Convergence Engineering for Intelligent Drone, XR Research Center, Sejong University, Seoul, Korea
- Jamil Hussain: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Korea
- Yeong Hyeon Gu: Department of Artificial Intelligence and Data Science, Sejong University, Seoul, Korea
- Rizwan Ali Naqvi: Department of Intelligent Mechatronics Engineering, Sejong University, Seoul, Korea
57. Zhang H, Chen K, Guo K, Tao J, Song L, Ren S, Zhao Y, Teng Z, Qiu W, Wang Z. Multimodal Imaging-Guided Photoimmunotherapy of Pancreatic Cancer by Organosilica Nanomedicine. Adv Healthc Mater 2024; 13:e2302195. [PMID: 37792547] [DOI: 10.1002/adhm.202302195]
Abstract
Immune checkpoint blockade (ICB) treatments have contributed to substantial clinical progress. However, challenges persist, including inefficient drug delivery and penetration into deep tumor areas, inadequate response to ICB treatments, and potential risk of inflammation due to over-activation of immune cells and uncontrolled release of cytokines following immunotherapy. In response, this study, for the first time, presents a multimodal imaging-guided organosilica nanomedicine (DCCGP) for photoimmunotherapy of pancreatic cancer. The novel DCCGP nanoplatform integrates fluorescence, magnetic resonance, and real-time infrared photothermal imaging, thereby enhancing diagnostic precision and treatment efficacy for pancreatic cancer. In addition, the incorporated copper sulfide nanoparticles (CuS NPs) lead to improved tumor penetration and provide external regulation of immunotherapy via photothermal stimulation. The synergistic immunotherapy effect is realized through the photothermal behavior of CuS NPs, inducing immunogenic cell death and relieving the immunosuppressive tumor microenvironment. Coupling photothermal stimulation with αPD-L1-induced ICB, the platform amplifies the clearance efficiency of tumor cells, achieving an optimized synergistic photoimmunotherapy effect. This study offers a promising strategy for the clinical application of ICB-based combined immunotherapy and presents valuable insights for applications of organosilica in precise tumor immunotherapy and theranostics.
Affiliation(s)
- Huifeng Zhang: Department of Radiology, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210029, China
- Kun Chen: NMPA Key Laboratory for Research and Evaluation of Drug Metabolism & Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510515, China
- Kai Guo: Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, 250021, China
- Jun Tao: Key Laboratory for Organic Electronics and Information Displays, Jiangsu Key Laboratory for Biosensors, Institute of Advanced Materials, Jiangsu National Synergetic Innovation Centre for Advanced Materials, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Lina Song: Department of Radiology, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210029, China
- Shuai Ren: Department of Radiology, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210029, China
- Yatong Zhao: Department of Radiology, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210029, China
- Zhaogang Teng: Key Laboratory for Organic Electronics and Information Displays, Jiangsu Key Laboratory for Biosensors, Institute of Advanced Materials, Jiangsu National Synergetic Innovation Centre for Advanced Materials, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Wenli Qiu: Department of Radiology, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210029, China
- Zhongqiu Wang: Department of Radiology, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210029, China
58. Gu J, Qiu Q, Zhu J, Cao Q, Hou Z, Li B, Shu H. Deep learning-based combination of [18F]-FDG PET and CT images for producing pulmonary perfusion image. Med Phys 2023; 50:7779-7790. [PMID: 37387645] [DOI: 10.1002/mp.16566]
Abstract
BACKGROUND The main application of [18F]-FDG PET (18F-FDG PET) and CT images in oncology is tumor identification and quantification. Combining PET and CT images to mine pulmonary perfusion information for functional lung avoidance radiation therapy (FLART) is desirable but remains challenging. PURPOSE To develop a deep-learning-based (DL) method to combine 18F-FDG PET and CT images for producing pulmonary perfusion images (PPI). METHODS Pulmonary technetium-99m-labeled macroaggregated albumin SPECT (PPI_SPECT), 18F-FDG PET, and CT images obtained from 53 patients were enrolled. CT and PPI_SPECT images were rigidly registered, and the registration displacement was subsequently used to align 18F-FDG PET and PPI_SPECT images. The left/right lung was separated and rigidly registered again to improve registration accuracy. A DL model based on the 3D U-Net architecture was constructed to directly combine multi-modality 18F-FDG PET and CT images for producing PPI (PPI_DLM). The 3D U-Net was used as the basic architecture, with the input expanded from a single channel to dual channels to combine the multi-modality images. For comparative evaluation, 18F-FDG PET images were also used alone to generate PPI_DLPET. Sixty-seven samples were randomly selected for training and cross-validation, and 36 were used for testing. The Spearman correlation coefficient (r_s) and multi-scale structural similarity index measure (MS-SSIM) between PPI_DLM/PPI_DLPET and PPI_SPECT were computed to assess statistical and perceptual image similarity. The Dice similarity coefficient (DSC) was calculated to determine the similarity between high-/low-functional lung (HFL/LFL) volumes. RESULTS The voxel-wise r_s and MS-SSIM of PPI_DLM/PPI_DLPET were 0.78 ± 0.04/0.57 ± 0.03 and 0.93 ± 0.01/0.89 ± 0.01 for cross-validation, and 0.78 ± 0.11/0.55 ± 0.18 and 0.93 ± 0.03/0.90 ± 0.04 for testing. PPI_DLM/PPI_DLPET achieved average DSC values of 0.78 ± 0.03/0.64 ± 0.02 for HFL and 0.83 ± 0.01/0.72 ± 0.03 for LFL in the training dataset, and 0.77 ± 0.11/0.64 ± 0.12 and 0.82 ± 0.05/0.72 ± 0.06 in the testing dataset. PPI_DLM yielded a stronger correlation and higher MS-SSIM with PPI_SPECT than PPI_DLPET (p < 0.001). CONCLUSIONS The DL-based method integrates lung metabolic and anatomic information for producing PPI and significantly improves accuracy over methods based on metabolic information alone. The generated PPI_DLM can be applied to pulmonary perfusion volume segmentation, which is potentially beneficial for FLART treatment plan optimization.
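The dual-channel input idea can be sketched as below; the use of MONAI, the channel widths, and the patch size are illustrative assumptions, not the authors' implementation:

```python
# A minimal sketch (not the authors' network) of the dual-channel idea:
# registered PET and CT volumes are stacked along the channel axis of a
# 3D U-Net, which regresses a single-channel perfusion image. Channel
# widths, strides, and patch size are illustrative assumptions.
import torch
from monai.networks.nets import UNet

model = UNet(
    spatial_dims=3,
    in_channels=2,             # channel 0: FDG-PET, channel 1: CT
    out_channels=1,            # predicted perfusion image (PPI)
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
)

pet = torch.randn(1, 1, 96, 96, 96)   # registered PET patch
ct = torch.randn(1, 1, 96, 96, 96)    # registered CT patch
x = torch.cat([pet, ct], dim=1)       # (1, 2, 96, 96, 96) dual-channel input
ppi = model(x)                        # (1, 1, 96, 96, 96) perfusion estimate
```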
Affiliation(s)
- Jiabing Gu: Laboratory of Image Science and Technology, School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, P.R. China; Department of Radiation Oncology Physics and Technology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, P.R. China
- Qingtao Qiu: Laboratory of Image Science and Technology, School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, P.R. China
- Jian Zhu: Department of Radiation Oncology Physics and Technology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, P.R. China; Shandong Key Laboratory of Digital Medicine and Computer Assisted Surgery, The Affiliated Hospital of Qingdao University, Qingdao, P.R. China
- Qiang Cao: Department of Radiation Oncology Physics and Technology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, P.R. China
- Zhen Hou: The Comprehensive Cancer Centre of Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, Jiangsu, P.R. China
- Baosheng Li: Laboratory of Image Science and Technology, School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, P.R. China; Department of Radiation Oncology Physics and Technology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, P.R. China
- Huazhong Shu: Laboratory of Image Science and Technology, School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, P.R. China
59. Chen W, Ayoub M, Liao M, Shi R, Zhang M, Su F, Huang Z, Li Y, Wang Y, Wong KK. A fusion of VGG-16 and ViT models for improving bone tumor classification in computed tomography. J Bone Oncol 2023; 43:100508. [PMID: 38021075] [PMCID: PMC10654018] [DOI: 10.1016/j.jbo.2023.100508]
Abstract
Background and Objective Bone tumors present significant challenges in orthopedic medicine because clinical treatment approaches differ across tumor types, which include benign, malignant, and intermediate cases. Convolutional Neural Networks (CNNs) have emerged as prominent models for tumor classification, but their limited perceptive ability hinders the acquisition of global structural information, potentially affecting classification accuracy. To address this limitation, we propose an optimized deep learning algorithm for precise classification of diverse bone tumors. Materials and Methods Our dataset comprises 786 computed tomography (CT) images of bone tumors, featuring sections from two distinct bones, the tibia and the femur. Sourced from The Second Affiliated Hospital of Fujian Medical University, the dataset was preprocessed with noise-reduction techniques. We introduce a novel fusion model, VGG16-ViT, leveraging the advantages of the VGG-16 network and the Vision Transformer (ViT) model. Specifically, we select 27 features from the third layer of VGG-16 and input them into the Vision Transformer encoder for comprehensive training. We also evaluate the impact of secondary migration using CT images from Xiangya Hospital for validation. Results The proposed fusion model reduces training time while achieving a classification accuracy of 97.6%, an improvement of 8% in sensitivity and specificity. The investigation of secondary migration's effects on the outcomes of the three models further reveals its potential to enhance system performance. Conclusion The proposed joint VGG-16 and Vision Transformer network exhibits robust classification performance on bone tumor datasets. Integrating the two models enables precise and efficient classification that accommodates the diverse characteristics of different bone tumor types, which holds great significance for the early detection and prognosis of bone tumor patients.
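The CNN-to-transformer fusion can be sketched as follows; the layer cut, token layout, and classification head are illustrative assumptions and do not reproduce the paper's exact "27 features from the third layer" design:

```python
# A minimal sketch (layer choice, token size, and head are assumptions) of
# the fusion idea: intermediate VGG-16 feature maps are flattened into
# tokens and passed through a Transformer encoder for classification.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class VGGTransformer(nn.Module):
    def __init__(self, num_classes=3, d_model=256):
        super().__init__()
        # First 16 modules of VGG-16 features -> (B, 256, 56, 56) for 224x224 input.
        self.cnn = nn.Sequential(*list(vgg16(weights="DEFAULT").features[:16]))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, num_classes)  # benign/malignant/intermediate

    def forward(self, x):
        f = self.cnn(x)                         # (B, 256, H, W) CNN feature maps
        tokens = f.flatten(2).transpose(1, 2)   # (B, H*W, 256) token sequence
        encoded = self.encoder(tokens)          # global self-attention over tokens
        return self.head(encoded.mean(dim=1))   # mean-pool tokens, then classify
```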
Affiliation(s)
- Weimin Chen: School of Information and Electronics, Hunan City University, Yiyang 413000, China
- Muhammad Ayoub: School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China
- Mengyun Liao: School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China
- Ruizheng Shi: National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Mu Zhang: Department of Emergency, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Feng Su: Department of Emergency, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Zhiguo Huang: Department of Emergency, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Yuanzhe Li: Department of CT/MRI, The Second Affiliated Hospital of Fujian Medical University, Quanzhou 362000, China
- Yi Wang: Department of CT/MRI, The Second Affiliated Hospital of Fujian Medical University, Quanzhou 362000, China
- Kevin K.L. Wong: School of Information and Electronics, Hunan City University, Yiyang 413000, China; Department of Mechanical Engineering, College of Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
60. Khan RF, Lee BD, Lee MS. Transformers in medical image segmentation: a narrative review. Quant Imaging Med Surg 2023; 13:8747-8767. [PMID: 38106306] [PMCID: PMC10722011] [DOI: 10.21037/qims-23-542]
Abstract
Background and Objective Transformers, widely recognized as state-of-the-art tools in natural language processing (NLP), have also come to be valued in computer vision tasks. With this increasing popularity, they have been extensively researched in the more complex medical imaging domain. The associated developments have put transformers on par with the sought-after convolutional neural networks, particularly for medical image segmentation. Methods combining both types of networks have proven especially successful in capturing local and global contexts, thereby significantly boosting performance on various segmentation problems. Motivated by this success, we survey the consequential research on innovative transformer networks, specifically those designed to handle medical image segmentation efficiently. Methods Databases such as Google Scholar, arXiv, ResearchGate, Microsoft Academic, and Semantic Scholar were used to find recent developments in this field. Specifically, research in the English language from 2021 to 2023 was considered. Key Content and Findings In this survey, we examine the different types of architectures and attention mechanisms that uniquely improve performance, and the structures in place to handle complex medical data. We summarize popular and unconventional transformer-based research from several key angles and quantitatively analyze the strategies that have proven more advanced. Conclusions We also identify existing gaps and challenges in current research, notably the deficiency of annotated medical data for precise deep learning model training. Furthermore, we outline potential future directions for enhancing transformers' utility in healthcare, encompassing strategies such as transfer learning and exploiting foundation models for specialized medical image segmentation.
Affiliation(s)
- Rabeea Fatma Khan: Department of Computer Science, Graduate School, Kyonggi University, Suwon, Republic of Korea
- Byoung-Dai Lee: Department of Computer Science, Graduate School, Kyonggi University, Suwon, Republic of Korea
- Mu Sook Lee: Department of Radiology, Keimyung University Dongsan Hospital, Daegu, Republic of Korea
61. Bakasa W, Viriri S. Stacked ensemble deep learning for pancreas cancer classification using extreme gradient boosting. Front Artif Intell 2023; 6:1232640. [PMID: 37876961] [PMCID: PMC10591225] [DOI: 10.3389/frai.2023.1232640]
Abstract
Ensemble learning aims to improve prediction performance by combining several models or forecasts. However, how much, and which, ensemble learning techniques are useful in deep-learning-based pipelines for pancreas computed tomography (CT) image classification remains an open question. Ensemble approaches are among the most advanced solutions to many machine learning problems; they entail training multiple models and combining their predictions to improve on the predictive performance of any single model. This article introduces Stacked Ensemble Deep Learning (SEDL), a pipeline for classifying pancreas CT medical images. The weak learners are Inception V3, VGG16, and ResNet34, combined in a stacking ensemble. The first-level predictions form the training input for XGBoost, the ensemble model at the second level of prediction. Extreme Gradient Boosting (XGBoost), employed as the strong learner, makes the final classification. After some hyperparameter adjustment, SEDL performed well, with a 98.8% ensemble accuracy. The Cancer Imaging Archive (TCIA) public-access dataset consists of 80 pancreas CT scans, each 512 × 512 pixels, from 53 male and 27 female subjects; a sample of 222 images was used for training and testing. We conclude that the SEDL technique is an effective way to strengthen robustness and increase the performance of a pipeline for classifying pancreas CT medical images. Interestingly, grouping like-minded or similarly talented learners makes no difference.
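The two-level stacking scheme can be sketched as below; the placeholder arrays and XGBoost hyperparameters are assumptions, and in practice the meta-features would come from held-out-fold predictions of the trained base CNNs:

```python
# A minimal sketch (feature extraction details are assumptions) of the SEDL
# idea: class-probability outputs of several CNN weak learners become the
# feature matrix for an XGBoost meta-learner.
import numpy as np
from xgboost import XGBClassifier

def stack_predictions(prob_list):
    """Concatenate per-model class probabilities into meta-features.

    prob_list: list of (N, C) arrays, one per base learner
    (e.g., Inception V3, VGG16, ResNet34). Returns (N, len(prob_list) * C).
    """
    return np.concatenate(prob_list, axis=1)

# Placeholder shapes only; real values come from trained base CNNs.
probs_inception = np.random.rand(222, 2)
probs_vgg = np.random.rand(222, 2)
probs_resnet = np.random.rand(222, 2)
y = np.random.randint(0, 2, size=222)

X_meta = stack_predictions([probs_inception, probs_vgg, probs_resnet])
meta = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
meta.fit(X_meta, y)                  # second-level (strong) learner
final_pred = meta.predict(X_meta)    # in practice, predict on unseen folds
```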
Affiliation(s)
- Serestina Viriri: School of Mathematics, Statistics & Computer Science, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Durban, South Africa
62. Fu CP, Yu MJ, Huang YS, Fuh CS, Chang RF. Stratifying High-Risk Thyroid Nodules Using a Novel Deep Learning System. Exp Clin Endocrinol Diabetes 2023; 131:508-514. [PMID: 37604165] [DOI: 10.1055/a-2122-5585]
Abstract
INTRODUCTION The current ultrasound scan classification system for thyroid nodules is time-consuming, labor-intensive, and subjective. Artificial intelligence (AI) has been shown to increase the accuracy of predicting the malignancy rate of thyroid nodules. This study aims to demonstrate the state-of-the-art Swin Transformer for classifying thyroid nodules. MATERIALS AND METHODS Ultrasound images were collected prospectively from patients who received fine-needle aspiration biopsy for thyroid nodules from January 2016 to June 2021. One hundred thirty-nine patients with malignant thyroid nodules were enrolled, while 235 patients with benign nodules served as controls. Images were fed to Swin-T and ResNeSt50 models to classify the thyroid nodules. RESULTS Patients with malignant nodules were younger and more likely to be male compared with those with benign nodules. The average sensitivity and specificity of Swin-T were 82.46% and 84.29%, respectively; those of ResNeSt50 were 72.51% and 77.14%. Receiver operating characteristic analysis revealed a higher area under the curve for Swin-T (AUC = 0.91) than for ResNeSt50 (AUC = 0.82). The McNemar test evaluating the performance of these models showed that Swin-T performed significantly better than ResNeSt50. CONCLUSIONS The Swin-T classifier can be a useful tool for shared decision-making between physicians and patients with thyroid nodules, particularly those with high-risk sonographic patterns.
Affiliation(s)
- Chia-Po Fu: Graduate Institute of Biomedical Electronics and Bioinformatics, College of Electrical Engineering and Computer Science, National Taiwan University, Taipei, Taiwan; Division of Endocrinology and Metabolism, Department of Medicine, Taichung Veterans General Hospital, Taichung, Taiwan; Department of Medicine, Chung Shan Medical University, Taichung, Taiwan; Department of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Ming-Jen Yu: Graduate Institute of Biomedical Electronics and Bioinformatics, College of Electrical Engineering and Computer Science, National Taiwan University, Taipei, Taiwan
- Yao-Sian Huang: Department of Computer Science and Information Engineering, National Changhua University of Education, Changhua County, Taiwan
- Chiou-Shann Fuh: Graduate Institute of Biomedical Electronics and Bioinformatics, College of Electrical Engineering and Computer Science, National Taiwan University, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Ruey-Feng Chang: Graduate Institute of Biomedical Electronics and Bioinformatics, College of Electrical Engineering and Computer Science, National Taiwan University, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
63. Fujima N, Kamagata K, Ueda D, Fujita S, Fushimi Y, Yanagawa M, Ito R, Tsuboyama T, Kawamura M, Nakaura T, Yamada A, Nozaki T, Fujioka T, Matsui Y, Hirata K, Tatsugami F, Naganawa S. Current State of Artificial Intelligence in Clinical Applications for Head and Neck MR Imaging. Magn Reson Med Sci 2023; 22:401-414. [PMID: 37532584] [PMCID: PMC10552661] [DOI: 10.2463/mrms.rev.2023-0047]
Abstract
Owing primarily to the excellent soft-tissue contrast it provides, head and neck MRI is widely applied in clinical practice to assess various diseases. Artificial intelligence (AI)-based methodologies, particularly deep learning analyses using convolutional neural networks, have recently gained global recognition and have been extensively investigated in clinical research for their applicability across a range of categories within medical imaging, including head and neck MRI. AI-based analytical approaches have shown potential for addressing the clinical limitations associated with head and neck MRI. In this review, we focus primarily on the technical advancements in deep-learning-based methodologies and their clinical utility within the field of head and neck MRI, encompassing image acquisition and reconstruction, lesion segmentation, disease classification and diagnosis, and prognostic prediction for patients presenting with head and neck diseases. We then discuss the limitations of current deep-learning-based approaches and offer insights regarding future challenges in this field.
Affiliation(s)
- Noriyuki Fujima: Department of Diagnostic and Interventional Radiology, Hokkaido University Hospital, Sapporo, Hokkaido, Japan
- Koji Kamagata: Department of Radiology, Juntendo University Graduate School of Medicine, Tokyo, Japan
- Daiju Ueda: Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Osaka, Japan
- Shohei Fujita: Department of Radiology, University of Tokyo, Tokyo, Japan
- Yasutaka Fushimi: Department of Diagnostic Imaging and Nuclear Medicine, Kyoto University Graduate School of Medicine, Kyoto, Kyoto, Japan
- Masahiro Yanagawa: Department of Radiology, Osaka University Graduate School of Medicine, Suita, Osaka, Japan
- Rintaro Ito: Department of Radiology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
- Takahiro Tsuboyama: Department of Radiology, Osaka University Graduate School of Medicine, Suita, Osaka, Japan
- Mariko Kawamura: Department of Radiology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
- Takeshi Nakaura: Department of Diagnostic Radiology, Kumamoto University Graduate School of Medicine, Kumamoto, Kumamoto, Japan
- Akira Yamada: Department of Radiology, Shinshu University School of Medicine, Matsumoto, Nagano, Japan
- Taiki Nozaki: Department of Radiology, Keio University School of Medicine, Tokyo, Japan
- Tomoyuki Fujioka: Department of Diagnostic Radiology, Tokyo Medical and Dental University, Tokyo, Japan
- Yusuke Matsui: Department of Radiology, Faculty of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University, Okayama, Okayama, Japan
- Kenji Hirata: Department of Diagnostic Imaging, Graduate School of Medicine, Hokkaido University, Sapporo, Hokkaido, Japan
- Fuminari Tatsugami: Department of Diagnostic Radiology, Hiroshima University, Hiroshima, Hiroshima, Japan
- Shinji Naganawa: Department of Radiology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
64. Wang K, George-Jones NA, Chen L, Hunter JB, Wang J. Joint Vestibular Schwannoma Enlargement Prediction and Segmentation Using a Deep Multi-task Model. Laryngoscope 2023; 133:2754-2760. [PMID: 36495306] [PMCID: PMC10256836] [DOI: 10.1002/lary.30516]
Abstract
OBJECTIVE To develop a deep-learning-based multi-task (DMT) model for joint tumor enlargement prediction (TEP) and automatic tumor segmentation (TS) in vestibular schwannoma (VS) patients using their initial diagnostic contrast-enhanced T1-weighted (ceT1) magnetic resonance images (MRIs). METHODS Initial ceT1 MRIs of VS patients meeting the inclusion/exclusion criteria were retrospectively collected. VSs on the initial MRIs and their first follow-up scans were manually contoured, and tumor volume and enlargement ratio were measured from the expert contours. A DMT model was constructed for joint TS and TEP. The manually segmented VS volume on the initial scan and the tumor enlargement label (≥20% volumetric growth) were used as the ground truth for training and evaluating the TS and TEP modules, respectively. RESULTS We performed 5-fold cross-validation with the eligible patients (n = 103). Median segmentation Dice coefficient, prediction sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC) were 84.20%, 0.68, 0.78, 0.72, and 0.77, respectively. The segmentation result is significantly better than that of a separate TS network (Dice coefficient 83.13%, p = 0.03) and marginally lower than the state-of-the-art segmentation model nnU-Net (Dice coefficient 86.45%, p = 0.16). The TEP performance is significantly better than that of a single-task prediction model (AUC = 0.60, p = 0.01) and marginally better than a radiomics-based prediction model (AUC = 0.70, p = 0.17). CONCLUSION The proposed DMT model has higher learning efficiency and achieves promising performance on TEP and TS, with the potential to improve VS patient management. LEVEL OF EVIDENCE NA. Laryngoscope, 133:2754-2760, 2023.
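The joint objective of such a multi-task model can be sketched as a weighted sum of a segmentation loss and a classification loss; the Dice/BCE combination and weighting below are assumptions, not the paper's exact loss:

```python
# A minimal sketch (loss weighting and tensor shapes are assumptions) of a
# deep multi-task objective: a shared network emits a segmentation map and
# an enlargement logit, trained with soft Dice + binary cross-entropy.
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss; pred and target are (B, 1, D, H, W) in [0, 1]."""
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def multitask_loss(seg_logits, seg_gt, cls_logit, growth_label, w=0.5):
    """Joint tumor-segmentation (TS) and enlargement-prediction (TEP) loss.

    growth_label is a float tensor (1.0 if >= 20% volumetric growth).
    """
    seg_term = dice_loss(torch.sigmoid(seg_logits), seg_gt)
    cls_term = F.binary_cross_entropy_with_logits(cls_logit, growth_label)
    return seg_term + w * cls_term
```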
Affiliation(s)
- Kai Wang: The Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Nicholas A George-Jones: The Department of Otolaryngology-Head and Neck Surgery, University of Texas Southwestern Medical Center, Dallas, Texas, USA; The Department of Otolaryngology-Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
- Liyuan Chen: The Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Jacob B Hunter: The Department of Otolaryngology-Head and Neck Surgery, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Jing Wang: The Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
65. Barbero JA, Unadkat P, Choi YY, Eidelberg D. Functional Brain Networks to Evaluate Treatment Responses in Parkinson's Disease. Neurotherapeutics 2023; 20:1653-1668. [PMID: 37684533] [PMCID: PMC10684458] [DOI: 10.1007/s13311-023-01433-w]
Abstract
Network analysis of functional brain scans acquired with [18F]-fluorodeoxyglucose positron emission tomography (FDG PET, to map cerebral glucose metabolism), or resting-state functional magnetic resonance imaging (rs-fMRI, to map blood oxygen level-dependent brain activity) has increasingly been used to identify and validate reproducible circuit abnormalities associated with neurodegenerative disorders such as Parkinson's disease (PD). In addition to serving as imaging markers of the underlying disease process, these networks can be used singly or in combination as an adjunct to clinical diagnosis and as a screening tool for therapeutics trials. Disease networks can also be used to measure rates of progression in natural history studies and to assess treatment responses in individual subjects. Recent imaging studies in PD subjects scanned before and after treatment have revealed therapeutic effects beyond the modulation of established disease networks. Rather, other mechanisms of action may be at play, such as the induction of novel functional brain networks directly by treatment. To date, specific treatment-induced networks have been described in association with novel interventions for PD such as subthalamic adeno-associated virus glutamic acid decarboxylase (AAV2-GAD) gene therapy, as well as sham surgery or oral placebo under blinded conditions. Indeed, changes in the expression of these networks with treatment have been found to correlate consistently with clinical outcome. In aggregate, these attributes suggest a role for functional brain networks as biomarkers in future clinical trials.
Affiliation(s)
- János A Barbero: Center for Neurosciences, The Feinstein Institutes for Medical Research, 350 Community Drive, Manhasset, NY, 11030, USA; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, 11549, USA
- Prashin Unadkat: Center for Neurosciences, The Feinstein Institutes for Medical Research, 350 Community Drive, Manhasset, NY, 11030, USA; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, 11549, USA; Elmezzi Graduate School of Molecular Medicine, Manhasset, NY, 11030, USA
- Yoon Young Choi: Center for Neurosciences, The Feinstein Institutes for Medical Research, 350 Community Drive, Manhasset, NY, 11030, USA
- David Eidelberg: Center for Neurosciences, The Feinstein Institutes for Medical Research, 350 Community Drive, Manhasset, NY, 11030, USA; Molecular Medicine and Neurology, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, 11549, USA
66. Dai Y, Zou B, Zhu C, Li Y, Chen Z, Ji Z, Kui X, Zhang W. DE-JANet: A unified network based on dual encoder and joint attention for Alzheimer's disease classification using multi-modal data. Comput Biol Med 2023; 165:107396. [PMID: 37703717] [DOI: 10.1016/j.compbiomed.2023.107396]
Abstract
Structural magnetic resonance imaging (sMRI), which can reflect cerebral atrophy, plays an important role in the early detection of Alzheimer's disease (AD). However, the information provided by analyzing only the morphological changes in sMRI is relatively limited, and assessment of the degree of atrophy is subjective. It is therefore valuable to combine sMRI with other clinical information to acquire complementary diagnostic information and achieve more accurate AD classification. Nevertheless, fusing these multi-modal data effectively remains challenging. In this paper, we propose DE-JANet, a unified AD classification network that integrates sMRI image data with non-image clinical data, such as age and Mini-Mental State Examination (MMSE) score, for more effective multi-modal analysis. DE-JANet consists of three key components: (1) a dual encoder module for extracting low-level features from the image and non-image data according to modality-specific encoding regularities, (2) a joint attention module for fusing multi-modal features, and (3) a token classification module for performing AD-related classification on the fused multi-modal features. Evaluated on the ADNI dataset, DE-JANet achieves a mean accuracy of 0.9722 for AD classification and 0.9538 for mild cognitive impairment (MCI) classification, superior to existing methods and indicative of advanced performance on AD-related diagnostic tasks.
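The joint-attention fusion of image and non-image features can be sketched with a cross-attention layer; the dimensions, clinical variables, and attention layout below are assumptions rather than DE-JANet's exact design:

```python
# A minimal sketch (dimensions and attention layout are assumptions) of
# fusing image features with encoded clinical variables (age, MMSE) via
# cross-attention, in the spirit of DE-JANet's joint attention module.
import torch
import torch.nn as nn

class JointAttentionFusion(nn.Module):
    def __init__(self, d_model=128, nhead=4):
        super().__init__()
        # Encode two clinical scalars (e.g., age and MMSE) into one token.
        self.clinical_encoder = nn.Sequential(nn.Linear(2, d_model), nn.ReLU())
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.classifier = nn.Linear(d_model, 2)  # AD vs. control logits

    def forward(self, image_tokens, clinical):
        # image_tokens: (B, N, d_model) from an sMRI encoder;
        # clinical: (B, 2) holding, e.g., age and MMSE score.
        c = self.clinical_encoder(clinical).unsqueeze(1)      # (B, 1, d_model)
        fused, _ = self.attn(query=c, key=image_tokens, value=image_tokens)
        return self.classifier(fused.squeeze(1))              # (B, 2)
```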
Affiliation(s)
- Yulan Dai: School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Beiji Zou: School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Chengzhang Zhu: School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Yang Li: School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Zhi Chen: School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Zexin Ji: School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Xiaoyan Kui: School of Computer Science and Engineering, Central South University, Changsha, China
- Wensheng Zhang: Institute of Automation, Chinese Academy of Sciences, Beijing, China
67. Shetty ND, Dhande R, Unadkat BS, Parihar P. A Comprehensive Review on the Diagnosis of Knee Injury by Deep Learning-Based Magnetic Resonance Imaging. Cureus 2023; 15:e45730. [PMID: 37868582] [PMCID: PMC10590246] [DOI: 10.7759/cureus.45730]
Abstract
Continual improvement in medical diagnosis has led to the widespread use of deep learning (DL)-based magnetic resonance imaging (MRI) for diagnosing knee injuries, including meniscal injury; ligament injury involving the cruciate ligaments, collateral ligaments, and medial patellofemoral ligament; and cartilage injury. The present systematic review was conducted through PubMed and the Directory of Open Access Journals (DOAJ), from which we selected 24 studies on the accuracy of DL MRI for knee injury identification. The studies reported accuracies of 72.5% to 100%, indicating that DL MRI performs on par with humans in decision-making and the management of knee injuries. This opens up further exploration for improving MRI-based diagnosis, keeping in mind the limitations of verification bias, data imbalance, and ground-truth subjectivity.
Affiliation(s)
- Neha D Shetty: Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Rajasbala Dhande: Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Bhavik S Unadkat: Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Pratapsingh Parihar: Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, IND
68. Liu Z, Lv Q, Yang Z, Li Y, Lee CH, Shen L. Recent progress in transformer-based medical image analysis. Comput Biol Med 2023; 164:107268. [PMID: 37494821] [DOI: 10.1016/j.compbiomed.2023.107268]
Abstract
The transformer is primarily used in the field of natural language processing. Recently, it has been adopted in, and shows promise for, the computer vision (CV) field. Medical image analysis (MIA), as a critical branch of CV, also greatly benefits from this state-of-the-art technique. In this review, we first recap the core component of the transformer, the attention mechanism, and the detailed structure of the transformer. We then chart the recent progress of the transformer in MIA, organizing the applications by task: classification, segmentation, captioning, registration, detection, enhancement, localization, and synthesis. The mainstream classification and segmentation tasks are further divided into eleven medical image modalities. The large number of experiments surveyed in this review illustrates that transformer-based methods outperform existing methods across multiple evaluation metrics. Finally, we discuss open challenges and future opportunities in this field. This task-modality review, with its up-to-date content, detailed information, and comprehensive comparisons, may greatly benefit the broad MIA community.
Affiliation(s)
- Zhaoshan Liu: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
- Qiujie Lv: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore; School of Intelligent Systems Engineering, Sun Yat-sen University, No. 66, Gongchang Road, Guangming District, 518107, China
- Ziduo Yang: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore; School of Intelligent Systems Engineering, Sun Yat-sen University, No. 66, Gongchang Road, Guangming District, 518107, China
- Yifan Li: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
- Chau Hung Lee: Department of Radiology, Tan Tock Seng Hospital, 11 Jalan Tan Tock Seng, Singapore, 308433, Singapore
- Lei Shen: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
69. Pachetti E, Colantonio S. 3D-Vision-Transformer Stacking Ensemble for Assessing Prostate Cancer Aggressiveness from T2w Images. Bioengineering (Basel) 2023; 10:1015. [PMID: 37760117] [PMCID: PMC10525095] [DOI: 10.3390/bioengineering10091015]
Abstract
Vision transformers represent the cutting edge in computer vision and are usually applied to two-dimensional data via transfer learning. In this work, we propose a trained-from-scratch stacking ensemble of 3D vision transformers to assess prostate cancer aggressiveness from T2-weighted images, to help radiologists diagnose this disease without a biopsy. We trained 18 3D vision transformers on T2-weighted axial acquisitions and combined them into two- and three-model stacking ensembles. We defined two metrics for measuring model prediction confidence and trained all ensemble combinations under five-fold cross-validation, evaluating their accuracy, prediction confidence, and calibration. In addition, we optimized the 18 base ViTs and compared the best-performing base and ensemble models by re-training them on a 100-sample bootstrapped training set and evaluating each model on the hold-out test set. We compared the two distributions by calculating the median and the 95% confidence interval and performing a Wilcoxon signed-rank test. The best-performing 3D-vision-transformer stacking ensemble provided state-of-the-art results in terms of area under the receiver operating characteristic curve (0.89 [0.61-1]) and exceeded the area under the precision-recall curve of the base model by 22% (p < 0.001). However, it proved less confident in classifying the positive class.
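The paper builds stacking ensembles with its own confidence metrics; as a simpler stand-in, the sketch below averages member softmax outputs (soft voting) and uses the maximum class probability as a confidence score, all of which are assumptions rather than the authors' method:

```python
# A minimal sketch: soft-voting over 3D-ViT members and a max-softmax
# confidence score (a common stand-in, not the paper's confidence metrics).
import torch

def ensemble_with_confidence(logits_list):
    """logits_list: list of (B, 2) member outputs for low/high aggressiveness."""
    probs = torch.stack([torch.softmax(l, dim=1) for l in logits_list]).mean(0)
    confidence, prediction = probs.max(dim=1)   # per-sample confidence & class
    return prediction, confidence
```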
Affiliation(s)
- Eva Pachetti: “Alessandro Faedo” Institute of Information Science and Technologies (ISTI), National Research Council of Italy (CNR), 56127 Pisa, Italy; Department of Information Engineering (DII), University of Pisa, 56122 Pisa, Italy
- Sara Colantonio: “Alessandro Faedo” Institute of Information Science and Technologies (ISTI), National Research Council of Italy (CNR), 56127 Pisa, Italy

70
Li Y, El Habib Daho M, Conze PH, Zeghlache R, Le Boité H, Bonnin S, Cosette D, Magazzeni S, Lay B, Le Guilcher A, Tadayoni R, Cochener B, Lamard M, Quellec G. Hybrid Fusion of High-Resolution and Ultra-Widefield OCTA Acquisitions for the Automatic Diagnosis of Diabetic Retinopathy. Diagnostics (Basel) 2023; 13:2770. [PMID: 37685306 PMCID: PMC10486731 DOI: 10.3390/diagnostics13172770] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Revised: 08/19/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
Optical coherence tomography angiography (OCTA) can deliver enhanced diagnosis for diabetic retinopathy (DR). This study evaluated a deep learning (DL) algorithm for automatic DR severity assessment using high-resolution and ultra-widefield (UWF) OCTA. Diabetic patients were examined with 6×6 mm² high-resolution OCTA and 15×15 mm² UWF-OCTA using the PLEX® Elite 9000. A novel DL algorithm was trained for automatic DR severity inference using both OCTA acquisitions. The algorithm employed a unique hybrid fusion framework, integrating structural and flow information from both acquisitions. It was trained on data from 875 eyes of 444 patients. Tested on 53 patients (97 eyes), the algorithm achieved a good area under the receiver operating characteristic curve (AUC) for detecting DR (0.8868), moderate non-proliferative DR (0.8276), severe non-proliferative DR (0.8376), and proliferative/treated DR (0.9070). These results significantly outperformed detection with the 6×6 mm² (AUC = 0.8462, 0.7793, 0.7889, and 0.8104, respectively) or 15×15 mm² (AUC = 0.8251, 0.7745, 0.7967, and 0.8786, respectively) acquisitions alone. Thus, combining high-resolution and UWF-OCTA acquisitions holds potential for improved early- and late-stage DR detection, offering a foundation for enhancing DR management and a clear path for future work involving expanded datasets and the integration of additional imaging modalities.
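To make the fusion idea concrete, here is a minimal PyTorch sketch of a two-branch late-fusion classifier, one encoder per field of view, with concatenated features feeding a shared severity head. The channel counts, depths, input sizes, and the four severity grades are illustrative assumptions; the paper's hybrid framework also fuses structural and flow information, which is omitted here.

```python
import torch
import torch.nn as nn

class TwoFieldFusionNet(nn.Module):
    """Minimal late-fusion sketch: one encoder per OCTA acquisition,
    concatenated features feed a shared DR-severity head."""
    def __init__(self, n_grades: int = 4):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.enc_hr = encoder()   # high-resolution (6x6 mm) branch
        self.enc_uwf = encoder()  # ultra-widefield (15x15 mm) branch
        self.head = nn.Linear(64, n_grades)

    def forward(self, x_hr, x_uwf):
        z = torch.cat([self.enc_hr(x_hr), self.enc_uwf(x_uwf)], dim=1)
        return self.head(z)

logits = TwoFieldFusionNet()(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 4])
```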
Affiliation(s)
- Yihao Li: Inserm, UMR 1101 LaTIM, F-29200 Brest, France; Univ Bretagne Occidentale, F-29200 Brest, France
- Mostafa El Habib Daho: Inserm, UMR 1101 LaTIM, F-29200 Brest, France; Univ Bretagne Occidentale, F-29200 Brest, France
- Pierre-Henri Conze: Inserm, UMR 1101 LaTIM, F-29200 Brest, France; IMT Atlantique, ITI Department, F-29200 Brest, France
- Rachid Zeghlache: Inserm, UMR 1101 LaTIM, F-29200 Brest, France; Univ Bretagne Occidentale, F-29200 Brest, France
- Hugo Le Boité: Sorbonne University, F-75006 Paris, France; Service d’Ophtalmologie, Hôpital Lariboisière, AP-HP, F-75475 Paris, France
- Sophie Bonnin: Service d’Ophtalmologie, Hôpital Lariboisière, AP-HP, F-75475 Paris, France
- Bruno Lay: ADCIS, F-14280 Saint-Contest, France
- Ramin Tadayoni: Service d’Ophtalmologie, Hôpital Lariboisière, AP-HP, F-75475 Paris, France
- Béatrice Cochener: Inserm, UMR 1101 LaTIM, F-29200 Brest, France; Univ Bretagne Occidentale, F-29200 Brest, France; Service d’Ophtalmologie, CHRU Brest, F-29200 Brest, France
- Mathieu Lamard: Inserm, UMR 1101 LaTIM, F-29200 Brest, France; Univ Bretagne Occidentale, F-29200 Brest, France

71
Shamshad F, Khan S, Zamir SW, Khan MH, Hayat M, Khan FS, Fu H. Transformers in medical imaging: A survey. Med Image Anal 2023; 88:102802. [PMID: 37315483 DOI: 10.1016/j.media.2023.102802] [Citation(s) in RCA: 186] [Impact Index Per Article: 93.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Revised: 03/11/2023] [Accepted: 03/23/2023] [Indexed: 06/16/2023]
Abstract
Following their unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest in Transformers, which can capture global context, in contrast to CNNs with local receptive fields. Inspired by this transition, in this survey, we attempt to provide a comprehensive review of the applications of Transformers in medical imaging, covering various aspects ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, restoration, synthesis, registration, clinical report generation, and other tasks. In particular, for each of these applications, we develop a taxonomy, identify application-specific challenges as well as insights to solve them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, including the identification of key challenges and open problems, and outline promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the relevant latest papers and their open-source implementations at https://github.com/fahadshamshad/awesome-transformers-in-medical-imaging.
Affiliation(s)
- Fahad Shamshad: MBZ University of Artificial Intelligence, Abu Dhabi, United Arab Emirates.
- Salman Khan: MBZ University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; CECS, Australian National University, Canberra ACT 0200, Australia
- Syed Waqas Zamir: Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Munawar Hayat: Faculty of IT, Monash University, Clayton VIC 3800, Australia
- Fahad Shahbaz Khan: MBZ University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; Computer Vision Laboratory, Linköping University, Sweden
- Huazhu Fu: Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore

72
Zou X, Zhai J, Qian S, Li A, Tian F, Cao X, Wang R. Improved breast ultrasound tumor classification using dual-input CNN with GAP-guided attention loss. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:15244-15264. [PMID: 37679179 DOI: 10.3934/mbe.2023682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
Ultrasonography is a widely used medical imaging technique for detecting breast cancer. Manual diagnostic methods are subject to variability and are time-consuming, whereas computer-aided diagnostic (CAD) methods have proven to be more efficient. However, current CAD approaches neglect the impact of noise and artifacts on the accuracy of image analysis. To enhance the precision of breast ultrasound image analysis for identifying tissues, organs, and lesions, we propose a novel approach for improved tumor classification through a dual-input model and a global average pooling (GAP)-guided attention loss function. Our approach leverages a convolutional neural network with a transformer architecture and modifies the single-input model for dual input. This technique employs a fusion module and a GAP-operation-guided attention loss function simultaneously to supervise the extraction of effective features from the target region and to mitigate the effect of information loss or redundancy on misclassification. Our proposed method has three key features: (i) ResNet and MobileViT are combined to enhance local and global information extraction. In addition, a dual-input channel is designed to include both attention images and original breast ultrasound images, mitigating the impact of noise and artifacts in ultrasound images. (ii) A fusion module and a GAP-operation-guided attention loss function are proposed to improve the fusion of dual-channel feature information, as well as to supervise and constrain the weight of the attention mechanism on the fused focus region. (iii) Using a collected uterine fibroid ultrasound dataset to pre-train ResNet18 and loading the pre-trained weights, our experiments on the BUSI and BUSC public datasets demonstrate that the proposed method outperforms some state-of-the-art methods. The code will be publicly released at https://github.com/425877/Improved-Breast-Ultrasound-Tumor-Classification.
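One plausible reading of a GAP-guided attention constraint is sketched below: channel weights obtained from global average pooling build a spatial attention map, which is then encouraged to match the (downsampled) lesion mask. This is an interpretation of the idea for illustration, not the authors' exact loss; all tensor shapes and the weighting factor are assumptions.

```python
import torch
import torch.nn.functional as F

def gap_guided_attention_loss(features, lesion_mask, lam: float = 0.1):
    """Sketch: GAP channel weights -> spatial attention map -> penalize
    mismatch with the lesion mask (illustrative, not the paper's exact loss)."""
    b, c, h, w = features.shape
    weights = features.mean(dim=(2, 3))                  # GAP -> (B, C) channel weights
    attn = torch.einsum("bc,bchw->bhw", weights, features)
    attn = torch.sigmoid(attn)                           # squash to (0, 1)
    mask = F.interpolate(lesion_mask, size=(h, w), mode="nearest").squeeze(1)
    return lam * F.mse_loss(attn, mask)

feats = torch.randn(2, 32, 16, 16)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(gap_guided_attention_loss(feats, mask))
```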
Affiliation(s)
- Xiao Zou: School of Physics and Electronics, Hunan Normal University, Changsha 410081, China
- Jintao Zhai: School of Physics and Electronics, Hunan Normal University, Changsha 410081, China
- Shengyou Qian: School of Physics and Electronics, Hunan Normal University, Changsha 410081, China
- Ang Li: School of Physics and Electronics, Hunan Normal University, Changsha 410081, China
- Feng Tian: School of Physics and Electronics, Hunan Normal University, Changsha 410081, China
- Xiaofei Cao: College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Runmin Wang: College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China

73
Al-Hammuri K, Gebali F, Kanan A, Chelvan IT. Vision transformer architecture and applications in digital health: a tutorial and survey. Vis Comput Ind Biomed Art 2023; 6:14. [PMID: 37428360 PMCID: PMC10333157 DOI: 10.1186/s42492-023-00140-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Accepted: 05/30/2023] [Indexed: 07/11/2023] Open
Abstract
The vision transformer (ViT) is a state-of-the-art architecture for image recognition tasks that plays an important role in digital health applications. Medical images account for 90% of the data in digital medicine applications. This article discusses the core foundations of the ViT architecture and its digital health applications, including image segmentation, classification, detection, prediction, reconstruction, synthesis, and telehealth applications such as report generation and security. It also presents a roadmap for implementing the ViT in digital health systems and discusses its limitations and challenges.
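The tutorial's core building block, the ViT front end, can be summarized in a few lines: split the image into fixed-size patches, linearly embed them, and prepend a class token plus positional embeddings. The sketch below follows the common ViT-Base convention (224×224 images, 16×16 patches, 768-dimensional embeddings), which is an assumption here rather than a value from this article.

```python
import torch
import torch.nn as nn

# Patchify + embed via a strided convolution, then add class token and
# learnable positional embeddings (ViT-Base-style sizes, assumed).
img = torch.randn(1, 3, 224, 224)
patch, dim = 16, 768
to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
tokens = to_patches(img).flatten(2).transpose(1, 2)   # (1, 196, 768)
cls = nn.Parameter(torch.zeros(1, 1, dim)).expand(1, -1, -1)
pos = nn.Parameter(torch.zeros(1, tokens.shape[1] + 1, dim))
x = torch.cat([cls, tokens], dim=1) + pos             # transformer encoder input
print(x.shape)  # torch.Size([1, 197, 768])
```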
Affiliation(s)
- Khalid Al-Hammuri: Electrical and Computer Engineering, University of Victoria, Victoria, V8W 2Y2, Canada.
- Fayez Gebali: Electrical and Computer Engineering, University of Victoria, Victoria, V8W 2Y2, Canada
- Awos Kanan: Computer Engineering, Princess Sumaya University for Technology, Amman, 11941, Jordan

74
Li R, Yang F, Liu X, Shi H. HGT: A Hierarchical GCN-Based Transformer for Multimodal Periprosthetic Joint Infection Diagnosis Using Computed Tomography Images and Text. SENSORS (BASEL, SWITZERLAND) 2023; 23:5795. [PMID: 37447649 DOI: 10.3390/s23135795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/15/2023] [Revised: 06/12/2023] [Accepted: 06/19/2023] [Indexed: 07/15/2023]
Abstract
Prosthetic joint infection (PJI) is a prevalent and severe complication that poses substantial diagnostic challenges. Currently, a unified diagnostic standard incorporating both computed tomography (CT) images and numerical text data for PJI remains unestablished, owing to the substantial noise in CT images and the disparity in data volume between CT images and text data. This study introduces a diagnostic method, HGT, based on deep learning and multimodal techniques. It effectively merges features from CT scan images and patients' numerical text data via a Unidirectional Selective Attention (USA) mechanism and a graph convolutional network (GCN)-based feature fusion network. We evaluated the proposed method on a custom-built multimodal PJI dataset, assessing its performance through ablation experiments and interpretability evaluations. Our method achieved an accuracy (ACC) of 91.4% and an area under the curve (AUC) of 95.9%, outperforming recent multimodal approaches by 2.9% in ACC and 2.2% in AUC, with a parameter count of only 68 M. Notably, the interpretability results highlighted our model's strong focus and localization capabilities at lesion sites. The proposed method could provide clinicians with additional diagnostic tools to enhance accuracy and efficiency in clinical practice.
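As a rough illustration of a unidirectional cross-modal attention step of the kind the abstract describes, the sketch below lets the (smaller) text-indicator representation query the (larger) set of CT patch features, so information flows from image to text only. This is a generic cross-attention sketch under assumed dimensions, not the authors' USA mechanism.

```python
import torch
import torch.nn as nn

dim = 64
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
ct_tokens = torch.randn(2, 196, dim)   # CT image patch features (assumed shape)
txt_tokens = torch.randn(2, 12, dim)   # numerical/text indicator embeddings (assumed)
fused, _ = attn(query=txt_tokens, key=ct_tokens, value=ct_tokens)
print(fused.shape)  # torch.Size([2, 12, 64]) -- text enriched with image context
```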
Affiliation(s)
- Ruiyang Li: College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China
- Fujun Yang: College of Computer Science, Sichuan University, Chengdu 610041, China
- Xianjie Liu: College of Computer Science, Sichuan University, Chengdu 610041, China
- Hongwei Shi: College of Computer Science, Sichuan University, Chengdu 610041, China

75
Sanmarchi F, Bucci A, Nuzzolese AG, Carullo G, Toscano F, Nante N, Golinelli D. A step-by-step researcher's guide to the use of an AI-based transformer in epidemiology: an exploratory analysis of ChatGPT using the STROBE checklist for observational studies. ZEITSCHRIFT FUR GESUNDHEITSWISSENSCHAFTEN = JOURNAL OF PUBLIC HEALTH 2023:1-36. [PMID: 37361298 PMCID: PMC10215032 DOI: 10.1007/s10389-023-01936-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/16/2023] [Accepted: 05/03/2023] [Indexed: 06/28/2023]
Abstract
Objective: This study aims to investigate how AI-based transformers can support researchers in designing and conducting an epidemiological study. To accomplish this, we used ChatGPT to reformulate the STROBE recommendations into a list of questions to be answered by the transformer itself. We then qualitatively evaluated the coherence and relevance of the transformer's outputs. Study design: Descriptive study. Methods: We first chose a study to be used as a basis for the simulation. We then used ChatGPT to transform each STROBE checklist item into specific prompts. Each answer to the respective prompt was evaluated by independent researchers in terms of coherence and relevance. Results: The mean scores assigned to each prompt were heterogeneous. On average, the overall mean score was 3.6 out of 5.0 for the coherence domain and 3.3 out of 5.0 for relevance. The lowest scores were assigned to items belonging to the Methods section of the checklist. Conclusions: ChatGPT can be considered a valuable support for researchers in conducting an epidemiological study following internationally recognized guidelines and standards. It is crucial for users to have knowledge of the subject and a critical mindset when evaluating the outputs. The potential benefits of AI in scientific research and publishing are undeniable, but it is crucial to address the risks and the ethical and legal consequences associated with its use.
Affiliation(s)
- Francesco Sanmarchi: Department of Biomedical and Neuromotor Sciences, Alma Mater Studiorum – University of Bologna, Via San Giacomo 12, 40126 Bologna, Italy
- Andrea Bucci: Department of Economics and Law, University of Macerata, Macerata, Italy
- Gherardo Carullo: Department of Italian and Supranational Public Law, University of Milan, Milan, Italy
- Nicola Nante: Present address: Department of Molecular and Developmental Medicine, University of Siena, Siena, Italy
- Davide Golinelli: Department of Biomedical and Neuromotor Sciences, Alma Mater Studiorum – University of Bologna, Via San Giacomo 12, 40126 Bologna, Italy; present address: Department of Molecular and Developmental Medicine, University of Siena, Siena, Italy

76
Wang J, Wang J, Chen D, Wu X, Xu Z, Yu X, Sheng S, Lin X, Chen X, Wu J, Ying H, Xu W. Prediction of postoperative visual acuity in patients with age-related cataracts using macular optical coherence tomography-based deep learning method. Front Med (Lausanne) 2023; 10:1165135. [PMID: 37250634 PMCID: PMC10213207 DOI: 10.3389/fmed.2023.1165135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Accepted: 04/14/2023] [Indexed: 05/31/2023] Open
Abstract
Background: To predict postoperative visual acuity (VA) in patients with age-related cataracts using a macular optical coherence tomography-based deep learning method. Methods: A total of 2,051 eyes from 2,051 patients with age-related cataracts were included. Preoperative optical coherence tomography (OCT) images and best-corrected visual acuity (BCVA) were collected. Five novel models (I, II, III, IV, and V) were proposed to predict postoperative BCVA. The dataset was randomly divided into a training (n = 1,231), validation (n = 410), and test set (n = 410). The performance of the models in predicting exact postoperative BCVA was evaluated using mean absolute error (MAE) and root mean square error (RMSE). The performance of the models in predicting whether postoperative BCVA improved by at least two lines on the visual chart (0.2 logMAR) was evaluated using precision, sensitivity, accuracy, F1 and area under the curve (AUC). Results: Model V, containing preoperative OCT images with horizontal and vertical B-scans, macular morphological feature indices, and preoperative BCVA, had the best performance in predicting postoperative VA, with the lowest MAE (0.1250 and 0.1194 logMAR) and RMSE (0.2284 and 0.2362 logMAR), and the highest precision (90.7% and 91.7%), sensitivity (93.4% and 93.8%), accuracy (88% and 89%), F1 (92% and 92.7%) and AUCs (0.856 and 0.854) in the validation and test datasets, respectively. Conclusion: The model performed well in predicting postoperative VA when the input information contained preoperative OCT scans, macular morphological feature indices, and preoperative BCVA. The preoperative BCVA and macular OCT indices were of great significance in predicting postoperative VA in patients with age-related cataracts.
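For readers unfamiliar with the two regression metrics used to score exact BCVA prediction, here is a short sketch computing MAE and RMSE on hypothetical logMAR values (not the study's data).

```python
import numpy as np

pred = np.array([0.10, 0.30, 0.00, 0.52])   # predicted postoperative BCVA (logMAR)
true = np.array([0.00, 0.40, 0.10, 0.40])   # measured postoperative BCVA (logMAR)

mae = np.mean(np.abs(pred - true))          # mean absolute error
rmse = np.sqrt(np.mean((pred - true) ** 2)) # root mean square error
print(f"MAE = {mae:.4f} logMAR, RMSE = {rmse:.4f} logMAR")
```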
Affiliation(s)
- Jingwen Wang: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Jinhong Wang: College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China
- Dan Chen: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Xingdi Wu: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Zhe Xu: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Xuewen Yu: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China; Department of Ophthalmology, The First People’s Hospital of Xiaoshan District, Xiaoshan Affiliated Hospital of Wenzhou Medical University, Hangzhou, Zhejiang, China
- Siting Sheng: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Xueqi Lin: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Xiang Chen: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Jian Wu: Second Affiliated Hospital School of Medicine, School of Public Health, and Institute of Wenzhou, Zhejiang University, Hangzhou, Zhejiang, China
- Haochao Ying: School of Public Health, Zhejiang University, Hangzhou, Zhejiang, China
- Wen Xu: Eye Center of the Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China

77
Gao Y, Dai Y, Liu F, Chen W, Shi L. An anatomy-aware framework for automatic segmentation of parotid tumor from multimodal MRI. Comput Biol Med 2023; 161:107000. [PMID: 37201442 DOI: 10.1016/j.compbiomed.2023.107000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2022] [Revised: 03/10/2023] [Accepted: 05/02/2023] [Indexed: 05/20/2023]
Abstract
Magnetic resonance imaging (MRI) plays an important role in diagnosing parotid tumors, where accurate segmentation is highly desired for determining appropriate treatment plans and avoiding unnecessary surgery. However, the task remains nontrivial and challenging due to the ambiguous boundaries and varying sizes of tumors, as well as the presence of a large number of anatomical structures around the parotid gland that are similar to the tumor. To overcome these problems, we propose a novel anatomy-aware framework for automatic segmentation of parotid tumors from multimodal MRI. First, a transformer-based multimodal fusion network, PT-Net, is proposed. The encoder of PT-Net extracts and fuses contextual information from three MRI modalities, from coarse to fine, to obtain cross-modality and multi-scale tumor information. The decoder stacks the feature maps of the different modalities and calibrates the multimodal information using a channel attention mechanism. Second, considering that the segmentation model is prone to be disturbed by similar anatomical structures and to make wrong predictions, we design an anatomy-aware loss. By calculating the distance between the activation regions of the predicted segmentation and the ground truth, our loss function forces the model to distinguish similar anatomical structures from the tumor and make correct predictions. Extensive experiments with MRI scans of parotid tumors showed that our PT-Net achieved higher segmentation accuracy than existing networks, and the anatomy-aware loss outperformed state-of-the-art loss functions for parotid tumor segmentation. Our framework can potentially improve the quality of preoperative diagnosis and surgical planning for parotid tumors.
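A simplified version of the "distance between activation regions" idea can be written as a penalty on the distance between the soft centroid of the predicted segmentation and that of the ground truth, which discourages responses on look-alike anatomy far from the tumor. This is a sketch of the concept under assumed tensor shapes, not the authors' exact loss.

```python
import torch

def activation_centroid(prob_map):
    """Soft (y, x) centroid of a probability map of shape (B, H, W)."""
    b, h, w = prob_map.shape
    ys = torch.arange(h, dtype=prob_map.dtype).view(1, h, 1)
    xs = torch.arange(w, dtype=prob_map.dtype).view(1, 1, w)
    mass = prob_map.sum(dim=(1, 2)).clamp_min(1e-6)
    cy = (prob_map * ys).sum(dim=(1, 2)) / mass
    cx = (prob_map * xs).sum(dim=(1, 2)) / mass
    return torch.stack([cy, cx], dim=1)

def anatomy_aware_penalty(pred_prob, gt_mask):
    """Penalize distance between predicted and true activation centroids
    (simplified sketch of the anatomy-aware idea)."""
    diff = activation_centroid(pred_prob) - activation_centroid(gt_mask)
    return (diff ** 2).sum(dim=1).mean()

pred = torch.rand(2, 64, 64)
gt = (torch.rand(2, 64, 64) > 0.8).float()
print(anatomy_aware_penalty(pred, gt))
```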
Affiliation(s)
- Yifan Gao: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Yin Dai: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China; Engineering Center on Medical Imaging and Intelligent Analysis, Ministry of Education, Northeastern University, Shenyang, 110169, China.
- Fayu Liu: Department of Oromaxillofacial-Head and Neck Surgery, School of Stomatology, China Medical University, Shenyang, 110002, China
- Weibing Chen: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China; Engineering Center on Medical Imaging and Intelligent Analysis, Ministry of Education, Northeastern University, Shenyang, 110169, China
- Lifu Shi: Liaoning Jiayin Medical Technology Co., LTD, Shenyang, 110170, China

78
Li J, Chen J, Tang Y, Wang C, Landman BA, Zhou SK. Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives. Med Image Anal 2023; 85:102762. [PMID: 36738650 PMCID: PMC10010286 DOI: 10.1016/j.media.2023.102762] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2022] [Revised: 01/18/2023] [Accepted: 01/27/2023] [Indexed: 02/01/2023]
Abstract
The Transformer, one of the latest technological advances in deep learning, has gained prevalence in natural language processing and computer vision. Since medical imaging bears some resemblance to computer vision, it is natural to inquire about the status quo of Transformers in medical imaging and ask the question: can Transformer models transform medical imaging? In this paper, we attempt to respond to this inquiry. After a brief introduction to the fundamentals of Transformers, especially in comparison with convolutional neural networks (CNNs), and a highlight of the key defining properties that characterize Transformers, we offer a comprehensive review of state-of-the-art Transformer-based approaches for medical imaging and exhibit current research progress made in the areas of medical image segmentation, recognition, detection, registration, reconstruction, enhancement, and beyond. In particular, what distinguishes our review is its organization based on the Transformer's key defining properties, which are mostly derived from comparing the Transformer and the CNN, and its type of architecture, which specifies the manner in which the Transformer and the CNN are combined, all helping readers to best understand the rationale behind the reviewed approaches. We conclude with discussions of future perspectives.
Affiliation(s)
- Jun Li: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Junyu Chen: Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medical Institutes, Baltimore, MD, USA
- Yucheng Tang: Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA
- Ce Wang: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Bennett A. Landman: Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA
- S. Kevin Zhou: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; School of Biomedical Engineering & Suzhou Institute for Advanced Research, Center for Medical Imaging, Robotics, and Analytic Computing & Learning (MIRACLE), University of Science and Technology of China, Suzhou 215123, China.

79
Tang S, Yu X, Cheang CF, Liang Y, Zhao P, Yu HH, Choi IC. Transformer-based multi-task learning for classification and segmentation of gastrointestinal tract endoscopic images. Comput Biol Med 2023; 157:106723. [PMID: 36907035 DOI: 10.1016/j.compbiomed.2023.106723] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2022] [Revised: 02/04/2023] [Accepted: 02/26/2023] [Indexed: 03/07/2023]
Abstract
Although widely utilized to help endoscopists identify gastrointestinal (GI) tract diseases through classification and segmentation, models based on convolutional neural networks (CNNs) have difficulty distinguishing similar, ambiguous types of lesions in endoscopic images and are difficult to train when labeled datasets are scarce. These limitations prevent CNNs from further improving the accuracy of diagnosis. To address these challenges, we first propose a multi-task network, TransMT-Net, capable of simultaneously learning two tasks (classification and segmentation). It employs a transformer to learn global features and combines this with the advantages of CNNs in learning local features, so as to achieve more accurate predictions of lesion types and regions in GI tract endoscopic images. We further adopted active learning in TransMT-Net to tackle the shortage of labeled images. A dataset was created from the CVC-ClinicDB dataset, Macau Kiang Wu Hospital, and Zhongshan Hospital to evaluate model performance. The experimental results show that our model not only achieved 96.94% accuracy in the classification task and a 77.76% Dice similarity coefficient in the segmentation task but also outperformed other models on our test set. Meanwhile, active learning produced positive results for the performance of our model with a small-scale initial training set; even with 30% of the initial training set, its performance was comparable to that of most comparable models trained with the full training set. Consequently, the proposed TransMT-Net has demonstrated promising performance on GI tract endoscopic images and, through active learning, can alleviate the shortage of labeled images.
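Joint training of the two tasks typically comes down to a weighted sum of a classification loss and a segmentation loss. The sketch below combines cross-entropy for the lesion type with a soft Dice term for the lesion region; the weighting and smoothing constants are illustrative assumptions, not TransMT-Net's actual objective.

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits, cls_target, seg_logits, seg_target, alpha=0.5):
    """Sketch of a joint classification + segmentation objective:
    cross-entropy for lesion type plus soft Dice for the lesion region."""
    ce = F.cross_entropy(cls_logits, cls_target)
    probs = torch.sigmoid(seg_logits)
    inter = (probs * seg_target).sum(dim=(1, 2, 3))
    dice = (2 * inter + 1.0) / (probs.sum(dim=(1, 2, 3))
                                + seg_target.sum(dim=(1, 2, 3)) + 1.0)
    return alpha * ce + (1 - alpha) * (1 - dice).mean()

loss = multitask_loss(torch.randn(4, 3), torch.randint(0, 3, (4,)),
                      torch.randn(4, 1, 32, 32),
                      (torch.rand(4, 1, 32, 32) > 0.5).float())
print(loss)
```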
Affiliation(s)
- Suigu Tang: Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China
- Xiaoyuan Yu: Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China
- Chak Fong Cheang: Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China.
- Yanyan Liang: Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China
- Penghui Zhao: Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China
- Hon Ho Yu: Kiang Wu Hospital, Macao Special Administrative Region of China
- I Cheong Choi: Kiang Wu Hospital, Macao Special Administrative Region of China

80
Patsanis A, Sunoqrot MRS, Bathen TF, Elschot M. CROPro: a tool for automated cropping of prostate magnetic resonance images. J Med Imaging (Bellingham) 2023; 10:024004. [PMID: 36895761 PMCID: PMC9990132 DOI: 10.1117/1.jmi.10.2.024004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Accepted: 02/09/2023] [Indexed: 03/09/2023] Open
Abstract
Purpose: To bypass manual data preprocessing and optimize deep learning performance, we developed and evaluated CROPro, a tool to standardize automated cropping of prostate magnetic resonance (MR) images. Approach: CROPro enables automatic cropping of MR images regardless of patient health status, image size, prostate volume, or pixel spacing. CROPro can crop foreground pixels from a region of interest (e.g., the prostate) with different image sizes, pixel spacings, and sampling strategies. Performance was evaluated in the context of clinically significant prostate cancer (csPCa) classification. Transfer learning was used to train five convolutional neural network (CNN) and five vision transformer (ViT) models using different combinations of cropped image sizes (64×64, 128×128, and 256×256 pixels²), pixel spacings (0.2×0.2, 0.3×0.3, 0.4×0.4, and 0.5×0.5 mm²), and sampling strategies (center, random, and stride cropping) over the prostate. T2-weighted MR images (N = 1475) from the publicly available PI-CAI challenge were used to train (N = 1033), validate (N = 221), and test (N = 221) all models. Results: Among CNNs, SqueezeNet with stride cropping (image size: 128×128, pixel spacing: 0.2×0.2 mm²) achieved the best classification performance (0.678 ± 0.006). Among ViTs, ViT-H/14 with random cropping (image size: 64×64, pixel spacing: 0.5×0.5 mm²) achieved the best performance (0.756 ± 0.009). Model performance depended on the cropped area, with the optimal size generally larger for center cropping (∼40 cm²) than for random/stride cropping (∼10 cm²). Conclusion: We found that the csPCa classification performance of CNNs and ViTs depends on the cropping settings. We demonstrated that CROPro is well suited to optimize these settings in a standardized manner, which could improve the overall performance of deep learning models.
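The sampling strategies compared in the paper are easy to picture as variations on taking a fixed-size patch around a region of interest. The sketch below implements center and random cropping; stride cropping, which slides a grid of overlapping patches over the ROI, is omitted for brevity. Function and parameter names are assumptions for illustration, not CROPro's API.

```python
import numpy as np

def crop(image, center, size, strategy="center", rng=None, jitter=16):
    """Take a size x size patch around an ROI center. 'center' uses the ROI
    center as-is; 'random' jitters it before cropping (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    cy, cx = center
    if strategy == "random":                    # jitter around the ROI center
        cy += rng.integers(-jitter, jitter + 1)
        cx += rng.integers(-jitter, jitter + 1)
    half = size // 2
    y0 = int(np.clip(cy - half, 0, image.shape[0] - size))
    x0 = int(np.clip(cx - half, 0, image.shape[1] - size))
    return image[y0:y0 + size, x0:x0 + size]

img = np.random.rand(384, 384)
print(crop(img, (200, 180), 128).shape)                     # center crop
print(crop(img, (200, 180), 128, strategy="random").shape)  # random crop
```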
Affiliation(s)
- Alexandros Patsanis: Norwegian University of Science and Technology, Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Trondheim, Norway
- Mohammed R. S. Sunoqrot: Norwegian University of Science and Technology, Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Trondheim, Norway; St. Olavs Hospital, Trondheim University Hospital, Department of Radiology and Nuclear Medicine, Trondheim, Norway
- Tone F. Bathen: Norwegian University of Science and Technology, Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Trondheim, Norway; St. Olavs Hospital, Trondheim University Hospital, Department of Radiology and Nuclear Medicine, Trondheim, Norway
- Mattijs Elschot: Norwegian University of Science and Technology, Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Trondheim, Norway; St. Olavs Hospital, Trondheim University Hospital, Department of Radiology and Nuclear Medicine, Trondheim, Norway

81
Dual encoder network with transformer-CNN for multi-organ segmentation. Med Biol Eng Comput 2023; 61:661-671. [PMID: 36580181 DOI: 10.1007/s11517-022-02723-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Accepted: 11/27/2022] [Indexed: 12/30/2022]
Abstract
Medical image segmentation is a critical step in many imaging applications. Automatic segmentation with convolutional neural networks (CNNs) has attracted extensive attention. However, traditional CNN-based methods fail to extract global and long-range contextual information due to their local convolution operations. The transformer overcomes this limitation of CNN-based models. Inspired by the success of transformers in computer vision (CV), many researchers have focused on designing transformer-based U-shaped methods for medical image segmentation; however, purely transformer-based approaches cannot effectively capture fine-grained details. This paper proposes a dual encoder network with transformer-CNN for multi-organ segmentation. The new segmentation framework takes full advantage of the CNN and the transformer to enhance segmentation accuracy. The Swin-transformer encoder extracts global information, and the CNN encoder captures local information. We introduce fusion modules to fuse convolutional features with the sequence of features from the transformer, and the fused features are concatenated through skip connections to smooth the decision boundary effectively. We extensively evaluate our method on the Synapse multi-organ CT dataset and the automated cardiac diagnosis challenge (ACDC) dataset. The results demonstrate that the proposed method achieves Dice similarity coefficient (DSC) metrics of 80.68% and 91.12% on the Synapse multi-organ CT and ACDC datasets, respectively. Ablation studies on the ACDC dataset demonstrate the effectiveness of the critical components of our method. Our results match the ground-truth boundaries more consistently than those of existing models, and our approach attains more accurate results on challenging 2D images for multi-organ segmentation. Compared with state-of-the-art methods, our proposed method achieves superior performance in multi-organ segmentation tasks. Graphical Abstract: The key process in medical image segmentation.
82
Feng J, Zhang H, Geng M, Chen H, Jia K, Sun Z, Li Z, Cao X, Pogue BW. X-ray Cherenkov-luminescence tomography reconstruction with a three-component deep learning algorithm: Swin transformer, convolutional neural network, and locality module. JOURNAL OF BIOMEDICAL OPTICS 2023; 28:026004. [PMID: 36818584 PMCID: PMC9932523 DOI: 10.1117/1.jbo.28.2.026004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2022] [Accepted: 01/19/2023] [Indexed: 06/18/2023]
Abstract
SIGNIFICANCE: X-ray Cherenkov-luminescence tomography (XCLT) produces fast emission data from megavoltage (MV) x-ray scanning, from which the excitation location of molecules within tissue is reconstructed. However, standard filtered backprojection (FBP) algorithms for XCLT sinogram reconstruction can suffer from insufficient data due to dose limitations, so reconstruction quality is limited and artifacts can appear. We report a deep learning algorithm for XCLT with high image quality and improved quantitative accuracy. AIM: To directly reconstruct the distribution of emission quantum yield for x-ray Cherenkov-luminescence tomography, we propose a three-component deep learning algorithm that includes a Swin transformer, a convolutional neural network (CNN), and a locality module. APPROACH: A data-to-image model for x-ray Cherenkov-luminescence tomography is developed based on a Swin transformer, which is used to extract pixel-level prior information from the sinogram domain. Meanwhile, a convolutional neural network structure is deployed to transform the extracted pixel information from the sinogram domain to the image domain. Finally, a locality module is designed between the encoder and decoder connection structures for delivering features. The performance was validated with simulation, physical phantom, and in vivo experiments. RESULTS: This approach handles limited data better than conventional FBP methods. The method was validated with numerical and physical phantom experiments, with results showing that it improved reconstruction performance in terms of mean square error (>94.1%), peak signal-to-noise ratio (>41.7%), and Pearson correlation (>19%) compared with the FBP algorithm. The Swin-CNN also achieved a 32.1% improvement in PSNR over the deep learning method AUTOMAP. CONCLUSIONS: This study shows that the three-component deep learning algorithm provides an effective reconstruction method for x-ray Cherenkov-luminescence tomography.
Affiliation(s)
- Jinchao Feng: Beijing University of Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing, China; Beijing Laboratory of Advanced Information Networks, Beijing, China
- Hu Zhang: Beijing University of Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing, China
- Mengfan Geng: Beijing University of Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing, China
- Hanliang Chen: Beijing University of Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing, China
- Kebin Jia: Beijing University of Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing, China; Beijing Laboratory of Advanced Information Networks, Beijing, China
- Zhonghua Sun: Beijing University of Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing, China; Beijing Laboratory of Advanced Information Networks, Beijing, China
- Zhe Li: Beijing University of Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing, China; Beijing Laboratory of Advanced Information Networks, Beijing, China
- Xu Cao: Xidian University, Engineering Research Center of Molecular and Neuro Imaging of the Ministry of Education and School of Life Science and Technology, Xi’an, China
- Brian W. Pogue: University of Wisconsin-Madison, Department of Medical Physics, Madison, Wisconsin, United States

83
Wei Y, Yang M, Xu L, Liu M, Zhang F, Xie T, Cheng X, Wang X, Che F, Li Q, Xu Q, Huang Z, Liu M. Novel Computed-Tomography-Based Transformer Models for the Noninvasive Prediction of PD-1 in Pre-Operative Settings. Cancers (Basel) 2023; 15:cancers15030658. [PMID: 36765615 PMCID: PMC9913645 DOI: 10.3390/cancers15030658] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2022] [Revised: 01/05/2023] [Accepted: 01/12/2023] [Indexed: 01/24/2023] Open
Abstract
The expression status of programmed cell death protein 1 (PD-1) in patients with hepatocellular carcinoma (HCC) is associated with responses to PD-1/PD-L1 checkpoint blockade treatment. Thus, accurately and preoperatively identifying PD-1 status has great clinical implications for constructing personalized treatment strategies. To investigate the preoperative predictive value of a transformer-based model for identifying PD-1 expression status, 93 HCC patients were included, comprising a training cohort of 75 patients (2859 images) and a testing cohort of 18 patients (670 images). We propose a transformer-based network architecture, ResTransNet, that efficiently employs convolutional neural networks (CNNs) and self-attention mechanisms to automatically learn a persuasive feature representation, from which a prediction score is obtained with a nonlinear classifier. The area under the curve, receiver operating characteristic curve, and decision curves were used to evaluate the prediction model's performance. Kaplan-Meier survival analyses were then applied to evaluate overall survival (OS) and recurrence-free survival (RFS) in PD-1-positive and PD-1-negative patients. The proposed transformer-based model obtained an accuracy of 88.2%, with a sensitivity of 88.5%, a specificity of 88.9%, and an area under the curve of 91.1% in the testing cohort.
Affiliation(s)
- Yi Wei: Department of Radiology, West China Hospital, Sichuan University, Chengdu 610000, China
- Meiyi Yang: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610000, China
- Lifeng Xu: Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou 324000, China
- Minghui Liu: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610000, China
- Feng Zhang: Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou 324000, China
- Tianshu Xie: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610000, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Xuan Cheng: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610000, China
- Xiaomin Wang: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610000, China
- Feng Che: Department of Radiology, West China Hospital, Sichuan University, Chengdu 610000, China
- Qian Li: Department of Radiology, West China Hospital, Sichuan University, Chengdu 610000, China
- Qing Xu: Institute of Clinical Pathology, West China Hospital, Sichuan University, Chengdu 610000, China
- Zixing Huang: Department of Radiology, West China Hospital, Sichuan University, Chengdu 610000, China; Correspondence: (Z.H.); (M.L.)
- Ming Liu: Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou 324000, China; Correspondence: (Z.H.); (M.L.)

84
Wang J, Mao Y, Gao X, Zhang Y. Recurrence risk stratification for locally advanced cervical cancer using multi-modality transformer network. Front Oncol 2023; 13:1100087. [PMID: 36874136 PMCID: PMC9978213 DOI: 10.3389/fonc.2023.1100087] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2022] [Accepted: 02/01/2023] [Indexed: 02/18/2023] Open
Abstract
Objectives: Recurrence risk evaluation is clinically significant for patients with locally advanced cervical cancer (LACC). We investigated the ability of a transformer network to stratify the recurrence risk of LACC based on computed tomography (CT) and magnetic resonance (MR) images. Methods: A total of 104 patients with pathologically diagnosed LACC between July 2017 and December 2021 were enrolled in this study. All patients underwent CT and MR scanning, and their recurrence status was identified by biopsy. We randomly divided the patients into a training cohort (48 cases, non-recurrence:recurrence = 37:11), a validation cohort (21 cases, non-recurrence:recurrence = 16:5), and a testing cohort (35 cases, non-recurrence:recurrence = 27:8), from which we extracted 1989, 882, and 315 patches for the model's development, validation, and evaluation, respectively. The transformer network consists of three modality fusion modules to extract multi-modality and multi-scale information, and a fully connected module to perform recurrence risk prediction. The model's prediction performance was assessed by six metrics: the area under the receiver operating characteristic curve (AUC), accuracy, F1-score, sensitivity, specificity, and precision. Univariate analyses with F-tests and t-tests were conducted for statistical analysis. Results: The proposed transformer network was superior to conventional radiomics methods and other deep learning networks in the training, validation, and testing cohorts. In particular, in the testing cohort, the transformer network achieved the highest AUC of 0.819 ± 0.038, while four conventional radiomics methods and two deep learning networks achieved AUCs of 0.680 ± 0.050, 0.720 ± 0.068, 0.777 ± 0.048, 0.691 ± 0.103, 0.743 ± 0.022, and 0.733 ± 0.027, respectively. Conclusions: The multi-modality transformer network showed promising performance in recurrence risk stratification of LACC and may be used as an effective tool to help clinicians make clinical decisions.
Affiliation(s)
- Jian Wang: School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China
- Yixiao Mao: School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China
- Xinna Gao: Department of Radiation Oncology, Southern Medical University Nanfang Hospital, Guangzhou, Guangdong, China
- Yu Zhang: School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China

85
Sun J, Wu B, Zhao T, Gao L, Xie K, Lin T, Sui J, Li X, Wu X, Ni X. Classification for thyroid nodule using ViT with contrastive learning in ultrasound images. Comput Biol Med 2023; 152:106444. [PMID: 36565481 DOI: 10.1016/j.compbiomed.2022.106444] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Revised: 12/01/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The lack of representative features between benign nodules, especially level 3 of the Thyroid Imaging Reporting and Data System (TI-RADS), and malignant nodules limits diagnostic accuracy, leading to inconsistent interpretation, overdiagnosis, and unnecessary biopsies. We propose a Vision Transformer (ViT)-based thyroid nodule classification model using contrastive learning, called TC-ViT, to improve the accuracy of diagnosis and the specificity of biopsy recommendations. The ViT can explore the global features of thyroid nodules well, and nodule images are used as regions of interest (ROIs) to enhance the local features of the ViT. Contrastive learning can minimize the representation distance between nodules of the same category, enhance the representation consistency of global and local features, and achieve accurate diagnosis of TI-RADS 3 or malignant nodules. The test results achieve an accuracy of 86.9%. The evaluation metrics show that the network outperforms other classical deep learning-based networks in classification performance. TC-ViT can achieve automatic classification of TI-RADS 3 and malignant nodules on ultrasound images. It can also be used as a key step in computer-aided diagnosis for comprehensive analysis and accurate diagnosis. The code will be available at https://github.com/Jiawei217/TC-ViT.
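The "pull same-category nodules together, push different categories apart" objective can be written as a standard supervised contrastive loss over normalized embeddings, sketched below. The exact formulation, temperature, and batch composition used by the authors may differ; this is a generic illustration.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss: same-label pairs are positives,
    all other samples are negatives (sketch, not necessarily TC-ViT's loss)."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature
    n = z.shape[0]
    eye = torch.eye(n, dtype=torch.bool)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    sim = sim.masked_fill(eye, float("-inf"))     # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(log_prob[pos]).mean()                # average over positive pairs

z = torch.randn(8, 128)                           # assumed embedding batch
y = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])        # assumed category labels
print(supervised_contrastive_loss(z, y))
```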
Affiliation(s)
- Jiawei Sun: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center of Medical Physics, Nanjing Medical University, Changzhou 213003, China
- Bobo Wu: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China
- Tong Zhao: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China
- Liugang Gao: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center of Medical Physics, Nanjing Medical University, Changzhou 213003, China
- Kai Xie: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center of Medical Physics, Nanjing Medical University, Changzhou 213003, China
- Tao Lin: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center of Medical Physics, Nanjing Medical University, Changzhou 213003, China
- Jianfeng Sui: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center of Medical Physics, Nanjing Medical University, Changzhou 213003, China
- Xiaoqin Li: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China
- Xiaojin Wu: Oncology Department, Xuzhou NO.1 People's Hospital, Xuzhou 221000, China.
- Xinye Ni: The Affiliated Changzhou NO.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center of Medical Physics, Nanjing Medical University, Changzhou 213003, China.

86
He P, Chen Z, He Y, Chen J, Hayat K, Pan J, Lin H. A reliable and low-cost deep learning model integrating convolutional neural network and transformer structure for fine-grained classification of chicken Eimeria species. Poult Sci 2022; 102:102459. [PMID: 36682127 PMCID: PMC9876957 DOI: 10.1016/j.psj.2022.102459] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2022] [Revised: 12/07/2022] [Accepted: 12/25/2022] [Indexed: 12/31/2022] Open
Abstract
Chicken coccidiosis is a disease caused by Eimeria spp. that costs the broiler industry more than 14 billion dollars per year globally. Different chicken Eimeria species vary significantly in pathogenicity and virulence, so the classification of the different species is of great significance for epidemiological surveys and related prevention and control. Microscopic morphological examination is widely used for their classification in clinical applications, but it is time-consuming and requires expertise. To increase classification efficiency and accuracy, a novel model integrating a transformer and a convolutional neural network (CNN), named Residual-Transformer-Fine-Grained (ResTFG), was proposed and evaluated for fine-grained classification of microscopic images of seven chicken Eimeria species. The results showed that ResTFG achieved the best performance, with high accuracy and low cost, compared with traditional models. Specifically, the parameter count, inference speed, and overall accuracy of ResTFG are 1.95 M, 256 FPS, and 96.9%, respectively, which are 10.9 times lighter, 1.5 times faster, and 2.7% higher in accuracy than the benchmark model. In addition, ResTFG showed better performance on the classification of the more virulent species. Ablation experiments showed that the CNN or the transformer alone achieved model accuracies of only 89.8% and 87.0%, respectively, which proves that the improved performance of ResTFG benefited from the complementary effect of the CNN's local feature extraction and the transformer's global receptive field. This study presents a reliable, low-cost, and promising deep learning model for the automatic fine-grained classification of chicken Eimeria species, which could potentially be embedded in microscopic devices to improve researchers' work efficiency, be extended to other parasite ova, and be applied to other agricultural tasks as a backbone.
Affiliation(s)
- Pengguang He: College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China; Key Laboratory of Equipment and Informatization in Environment Controlled Agriculture, Ministry of Agriculture and Rural Affairs of China, Hangzhou 310058, China
- Zhonghao Chen: College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China; Key Laboratory of Equipment and Informatization in Environment Controlled Agriculture, Ministry of Agriculture and Rural Affairs of China, Hangzhou 310058, China
- Yefan He: College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China; Key Laboratory of Equipment and Informatization in Environment Controlled Agriculture, Ministry of Agriculture and Rural Affairs of China, Hangzhou 310058, China
- Jintian Chen: College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China; Key Laboratory of Equipment and Informatization in Environment Controlled Agriculture, Ministry of Agriculture and Rural Affairs of China, Hangzhou 310058, China
- Khawar Hayat: College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China; Key Laboratory of Equipment and Informatization in Environment Controlled Agriculture, Ministry of Agriculture and Rural Affairs of China, Hangzhou 310058, China
- Jinming Pan: College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China; Key Laboratory of Equipment and Informatization in Environment Controlled Agriculture, Ministry of Agriculture and Rural Affairs of China, Hangzhou 310058, China
- Hongjian Lin: College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China; Key Laboratory of Equipment and Informatization in Environment Controlled Agriculture, Ministry of Agriculture and Rural Affairs of China, Hangzhou 310058, China.

87
Park M, Oh S, Jeong T, Yu S. Multi-Stage Temporal Convolutional Network with Moment Loss and Positional Encoding for Surgical Phase Recognition. Diagnostics (Basel) 2022; 13:107. [PMID: 36611399 PMCID: PMC9818879 DOI: 10.3390/diagnostics13010107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 12/28/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022] Open
Abstract
In recent times, many studies on surgical video analysis have been conducted owing to its growing importance in many medical applications. In particular, recognizing the current surgical phase is very important because the phase information can be utilized in various ways both during and after surgery. This paper proposes an efficient phase recognition network, called MomentNet, for cholecystectomy endoscopic videos. Unlike LSTM-based networks, MomentNet is based on a multi-stage temporal convolutional network. In addition, to improve the phase prediction accuracy, the proposed method adopts a new loss function to supplement the general cross-entropy loss function. The new loss function significantly improves the performance of the phase recognition network by constraining undesirable phase transitions and preventing over-segmentation. MomentNet also effectively applies positional encoding techniques, which are commonly used in transformer architectures, to the multi-stage temporal convolutional network. By using positional encoding, MomentNet can capture important temporal context, resulting in higher phase prediction accuracy. Furthermore, MomentNet applies a label smoothing technique to suppress overfitting and replaces the backbone network for feature extraction to further improve performance. As a result, MomentNet achieves 92.31% accuracy on the phase recognition task with the Cholec80 dataset, which is 4.55% higher than that of the baseline architecture.
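The abstract's key ingredient is a loss that supplements cross entropy to penalize implausible phase transitions and over-segmentation. The paper's moment loss has its own formulation; the sketch below uses the common MS-TCN-style truncated smoothing term only to illustrate the mechanism, with `lam` and `tau` as assumed hyperparameters.

```python
# Hedged sketch: frame-wise cross entropy plus a temporal smoothing penalty
# that discourages abrupt phase flips. This is the generic MS-TCN-style
# stand-in for the paper's own "moment loss", shown only for the mechanism.
import torch
import torch.nn.functional as F

def phase_loss(logits, labels, lam=0.15, tau=4.0):
    # logits: (B, C, T) frame-wise class scores; labels: (B, T) phase ids
    ce = F.cross_entropy(logits, labels)
    logp = F.log_softmax(logits, dim=1)
    # penalize large jumps between adjacent frames' log-probabilities
    diff = (logp[:, :, 1:] - logp[:, :, :-1].detach()) ** 2
    smooth = torch.clamp(diff, max=tau * tau).mean()
    return ce + lam * smooth

loss = phase_loss(torch.randn(2, 7, 100), torch.randint(0, 7, (2, 100)))
```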
Collapse
Affiliation(s)
- Minyoung Park
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
| | - Seungtaek Oh
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
| | - Taikyeong Jeong
- School of Artificial Intelligence Convergence, Hallym University, Chuncheon 24252, Republic of Korea
| | - Sungwook Yu
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
| |
Collapse
|
88
|
Simović A, Lutovac-Banduka M, Lekić S, Kuleto V. Smart Visualization of Medical Images as a Tool in the Function of Education in Neuroradiology. Diagnostics (Basel) 2022; 12:3208. [PMID: 36553215 PMCID: PMC9777748 DOI: 10.3390/diagnostics12123208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 12/09/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022] Open
Abstract
The smart visualization of medical images (SVMI) model is based on multi-detector computed tomography (MDCT) data sets and can provide a clearer view of changes in the brain, such as tumors (expansive changes), bleeding, and ischemia, on native imaging (i.e., a non-contrast MDCT scan). The new SVMI method provides a more precise representation of the brain image by hiding pixels that carry no information and by rescaling and coloring the range of pixels essential for detecting and visualizing the disease. In addition, SVMI can be used to avoid additional exposure of patients to ionizing radiation and the allergic reactions that contrast media administration can cause. Results of the SVMI model were compared with the final diagnosis of the disease after additional diagnostics and confirmation by neuroradiologists, highly trained physicians with many years of experience. The application of the presented SVMI model can optimize the engagement of material, medical, and human resources and has the potential for general application in medical training, education, and clinical research.
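The pixel-level operation SVMI describes, hiding uninformative pixels and rescaling and coloring the diagnostically relevant range, amounts to a windowing transform. A minimal sketch, assuming an illustrative 0-80 HU brain window and a viridis colormap (both assumptions, not the authors' choices):

```python
# Minimal windowing sketch (assumed, not the authors' pipeline): pixels
# outside a diagnostic Hounsfield-unit window are hidden, the remaining
# range is rescaled to [0, 1] and mapped to color.
import numpy as np
from matplotlib import cm

def smart_visualize(hu_slice, lo=0, hi=80):
    mask = (hu_slice >= lo) & (hu_slice <= hi)           # keep informative pixels
    scaled = np.clip((hu_slice - lo) / (hi - lo), 0, 1)  # rescale window
    rgb = cm.viridis(scaled)[..., :3]                    # color the diagnostic range
    rgb[~mask] = 0.0                                     # hide everything else
    return rgb

image = smart_visualize(np.random.randint(-1000, 1000, (64, 64)).astype(float))
```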
Collapse
Affiliation(s)
- Aleksandar Simović
- Department of Information Technology, Information Technology School ITS, 11000 Belgrade, Serbia
| | - Maja Lutovac-Banduka
- Department of RT-RK Institute, RT-RK for Computer Based Systems, 21000 Novi Sad, Serbia
| | - Snežana Lekić
- Department of Emergency Neuroradiology, University Clinical Centre of Serbia UKCS, 11000 Belgrade, Serbia
| | - Valentin Kuleto
- Department of Information Technology, Information Technology School ITS, 11000 Belgrade, Serbia
| |
Collapse
|
89
|
Zhao Y, Zhang J, Hu D, Qu H, Tian Y, Cui X. Application of Deep Learning in Histopathology Images of Breast Cancer: A Review. MICROMACHINES 2022; 13:2197. [PMID: 36557496 PMCID: PMC9781697 DOI: 10.3390/mi13122197] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Revised: 12/04/2022] [Accepted: 12/09/2022] [Indexed: 06/17/2023]
Abstract
With the development of artificial intelligence technology and computer hardware, deep learning algorithms have become a powerful auxiliary tool for medical image analysis. This study used statistical methods to analyze studies related to the detection, segmentation, and classification of breast cancer in pathological images. After analyzing 107 articles on the application of deep learning to pathological images of breast cancer, we divided the studies into three directions based on the types of results they report: detection, segmentation, and classification. We introduce and analyze models that performed well in these three directions and summarize related work from recent years. Based on the results obtained, the significant ability of deep learning applied to breast cancer pathological images can be recognized. Furthermore, in the classification and detection of pathological images of breast cancer, the accuracy of deep learning algorithms has surpassed that of pathologists in certain circumstances. Our study provides a comprehensive review of the development of research on breast cancer pathological imaging and offers reliable recommendations for the structure of deep learning network models in different application scenarios.
Collapse
Affiliation(s)
- Yue Zhao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
- Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Shenyang 110169, China
- Key Laboratory of Data Analytics and Optimization for Smart Industry, Northeastern University, Shenyang 110169, China
| | - Jie Zhang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
| | - Dayu Hu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
| | - Hui Qu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
| | - Ye Tian
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
| | - Xiaoyu Cui
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
- Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Shenyang 110169, China
- Key Laboratory of Data Analytics and Optimization for Smart Industry, Northeastern University, Shenyang 110169, China
| |
Collapse
|
90
|
Hu W, Chen H, Liu W, Li X, Sun H, Huang X, Grzegorzek M, Li C. A comparative study of gastric histopathology sub-size image classification: From linear regression to visual transformer. Front Med (Lausanne) 2022; 9:1072109. [PMID: 36569152 PMCID: PMC9767945 DOI: 10.3389/fmed.2022.1072109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2022] [Accepted: 11/18/2022] [Indexed: 12/12/2022] Open
Abstract
Introduction: Gastric cancer is the fifth most common and the fourth most deadly cancer in the world. Early detection serves as a guide for its treatment, and computer technology has advanced rapidly to assist physicians in the diagnosis of pathological images of gastric cancer. Ensemble learning is a way to improve the accuracy of algorithms, and finding multiple learning models with complementary behavior is its basis. Therefore, this paper compares the performance of multiple algorithms in anticipation of applying ensemble learning to a practical gastric cancer classification problem. Methods: The complementarity of sub-size pathology image classifiers when machine performance is insufficient is explored on this experimental platform. We chose seven classical machine learning classifiers and four deep learning classifiers for classification experiments on the GasHisSDB database. For the classical machine learning algorithms, five different virtual image features were extracted to match the multiple classifier algorithms. For deep learning, we chose three convolutional neural network classifiers and a novel Transformer-based classifier. Results: The experiments, covering a large number of classical machine learning and deep learning methods, demonstrate that different classifiers perform differently on GasHisSDB. Among the classical machine learning models, some classifiers classify the Abnormal category very well, while others excel at classifying the Normal category. The deep learning models likewise include multiple models that complement one another. Discussion: Suitable classifiers can thus be selected for ensemble learning when machine performance is insufficient. This experimental platform demonstrates that multiple classifiers are indeed complementary and can improve the efficiency of ensemble learning. This can better assist doctors in diagnosis, improve the detection of gastric cancer, and increase the cure rate.
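The complementarity argument motivates soft-voting ensembles: classifiers that err on different classes compensate one another when their probability estimates are averaged. A minimal scikit-learn sketch on synthetic data (placeholder classifiers, not the GasHisSDB experiments):

```python
# Soft-voting ensemble sketch: averaged predicted probabilities let
# complementary classifiers correct each other's class-specific errors.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    voting="soft",  # average predicted probabilities across members
)
print(ensemble.fit(Xtr, ytr).score(Xte, yte))
```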
Collapse
Affiliation(s)
- Weiming Hu
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
| | - Haoyuan Chen
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
| | - Wanli Liu
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
| | - Xiaoyan Li
- Department of Pathology, Liaoning Cancer Hospital and Institute, Cancer Hospital, China Medical University, Shenyang, China
| | - Hongzan Sun
- Department of Radiology, Shengjing Hospital, China Medical University, Shenyang, China
| | - Xinyu Huang
- Institute of Medical Informatics, University of Luebeck, Luebeck, Germany
| | - Marcin Grzegorzek
- Institute of Medical Informatics, University of Luebeck, Luebeck, Germany
- Department of Knowledge Engineering, University of Economics in Katowice, Katowice, Poland
| | - Chen Li
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
| |
Collapse
|
91
|
Liu S, Xin J, Wu J, Deng Y, Su R, Niessen WJ, Zheng N, van Walsum T. Multi-view Contour-constrained Transformer Network for Thin-cap Fibroatheroma Identification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.12.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
|
92
|
Wu C, Long C, Li S, Yang J, Jiang F, Zhou R. MSRAformer: Multiscale spatial reverse attention network for polyp segmentation. Comput Biol Med 2022; 151:106274. [PMID: 36375412 DOI: 10.1016/j.compbiomed.2022.106274] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2022] [Revised: 10/10/2022] [Accepted: 10/30/2022] [Indexed: 11/11/2022]
Abstract
Colon polyps are an important reference basis in the diagnosis of colorectal cancer (CRC). In routine diagnosis, the polyp area is segmented from the colorectal endoscopy image, and the obtained pathological information is used to assist in the diagnosis of the disease and in surgery. Accurate segmentation of polyps in colonoscopy images remains a challenging task: polyps of the same type differ greatly in shape, size, color, and texture, and it is difficult to distinguish the polyp region from the mucosal boundary. In recent years, convolutional neural networks (CNNs) have achieved good results in medical image segmentation. However, CNNs focus on the extraction of local features and lack the ability to extract global feature information. This paper presents a multiscale spatial reverse attention network, called MSRAformer, with high performance in medical segmentation. It adopts a Swin Transformer encoder with a pyramid structure to extract features at four different stages and extracts multi-scale feature information through a multi-scale channel attention module, which enhances the global feature extraction ability and generalization of the network and preliminarily aggregates a pre-segmentation result. A spatial reverse attention module then gradually supplements the edge structure and detail information of the polyp region. Extensive experiments show that MSRAformer segments colonoscopy polyp datasets better than most state-of-the-art (SOTA) medical image segmentation methods, with better generalization performance. A reference implementation of MSRAformer is available at https://github.com/ChengLong1222/MSRAformer-main.
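The spatial reverse attention idea can be illustrated compactly: the preliminary segmentation map is inverted so the network attends to pixels it has not yet explained, recovering boundary detail. A hedged PyTorch sketch, with shapes and the refinement convolution chosen for illustration rather than taken from the MSRAformer code:

```python
# Reverse-attention sketch (illustrative, not the MSRAformer implementation):
# the coarse prediction is inverted and used to gate features, so the
# refinement convolution mines edges and detail the coarse map missed.
import torch
import torch.nn as nn

class ReverseAttentionBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.refine = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, features, coarse_pred):
        # coarse_pred: (B, 1, H, W) logits of the preliminary segmentation
        reverse = 1.0 - torch.sigmoid(coarse_pred)  # focus on unexplained pixels
        residual = self.refine(features * reverse)  # mine edges / fine detail
        return coarse_pred + residual               # refined segmentation logits

block = ReverseAttentionBlock()
out = block(torch.randn(1, 64, 44, 44), torch.randn(1, 1, 44, 44))
```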
Collapse
Affiliation(s)
- Cong Wu
- School of Computer Science, Hubei University of Technology, Wuhan, China
| | - Cheng Long
- School of Computer Science, Hubei University of Technology, Wuhan, China
| | - Shijun Li
- School of Computer Science, Hubei University of Technology, Wuhan, China
| | - Junjie Yang
- Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Fagang Jiang
- Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Ran Zhou
- School of Computer Science, Hubei University of Technology, Wuhan, China
| |
Collapse
|
93
|
Usman M, Zia T, Tariq A. Analyzing Transfer Learning of Vision Transformers for Interpreting Chest Radiography. J Digit Imaging 2022; 35:1445-1462. [PMID: 35819537 PMCID: PMC9274969 DOI: 10.1007/s10278-022-00666-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2021] [Revised: 05/28/2022] [Accepted: 06/03/2022] [Indexed: 12/01/2022] Open
Abstract
The limited availability of medical imaging datasets is a vital limitation when using "data hungry" deep learning to gain performance improvements. To deal with this issue, transfer learning has become the de facto standard, where a convolutional neural network (CNN) pre-trained on natural images (e.g., ImageNet) is finetuned on medical images. Meanwhile, pre-trained transformers, which are self-attention-based models, have become the de facto standard in natural language processing (NLP) and the state of the art in image classification due to their powerful transfer learning abilities. Inspired by this success, large-scale transformers (such as the vision transformer) have been trained on natural images. Based on these recent developments, this research explores the efficacy of pre-trained natural-image transformers for medical images. Specifically, we analyze a pre-trained vision transformer on the CheXpert and pediatric pneumonia datasets, using standard CNN models, including VGGNet and ResNet, as baselines. By examining the acquired representations and results, we find that transfer learning from the pre-trained vision transformer yields improved results compared with pre-trained CNNs, demonstrating the greater transfer ability of transformers in medical imaging.
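The setup analyzed here, finetuning a natural-image-pretrained ViT with a replaced classification head on a medical task, follows the standard transfer-learning recipe. A minimal sketch with torchvision's vit_b_16 as a stand-in (the two-class head, learning rate, and random inputs are illustrative, and the weights call downloads ImageNet checkpoints):

```python
# Transfer-learning sketch: replace the head of an ImageNet-pretrained ViT
# and finetune it on a (here: two-class, e.g. pneumonia vs. normal) task.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)  # natural-image pretraining
model.heads = nn.Linear(model.hidden_dim, 2)              # new task-specific head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small lr for finetuning
x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```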
Collapse
Affiliation(s)
- Mohammad Usman
- Department of Computer Science, COMSATS University Islamabad (CUI), Islamabad, Pakistan
| | - Tehseen Zia
- Department of Computer Science, COMSATS University Islamabad (CUI), Islamabad, Pakistan
- Medical Imaging and Diagnostic Center, National Center for Artificial Intelligence, Islamabad, Pakistan
| | - Ali Tariq
- Department of Computer Science, COMSATS University Islamabad (CUI), Islamabad, Pakistan
| |
Collapse
|
94
|
Zhong X, Gu Y, Luo Y, Zeng X, Liu G. Bi-hemisphere asymmetric attention network: recognizing emotion from EEG signals based on the transformer. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04228-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
95
|
Novel artificial intelligent transformer U-NET for better identification and management of prostate cancer. Mol Cell Biochem 2022; 478:1439-1445. [DOI: 10.1007/s11010-022-04600-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2022] [Accepted: 10/24/2022] [Indexed: 11/10/2022]
|
96
|
Li J, Qu Z, Yang Y, Zhang F, Li M, Hu S. TCGAN: a transformer-enhanced GAN for PET synthetic CT. BIOMEDICAL OPTICS EXPRESS 2022; 13:6003-6018. [PMID: 36733758 PMCID: PMC9872870 DOI: 10.1364/boe.467683] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Revised: 08/06/2022] [Accepted: 10/05/2022] [Indexed: 06/18/2023]
Abstract
Multimodal medical images can be used in a multifaceted approach to resolve a wide range of medical diagnostic problems. However, these images are generally difficult to obtain due to various limitations, such as the cost of capture and patient safety. Medical image synthesis is therefore used in various tasks to obtain better results. Recently, various studies have used generative adversarial networks for missing-modality image synthesis, making good progress. In this study, we propose a generator based on a combination of a transformer network and a convolutional neural network (CNN). The proposed method combines the advantages of transformers and CNNs to promote better detail. The network is designed for positron emission tomography (PET) to computed tomography (CT) synthesis, which can be used for PET attenuation correction. We also experimented on two datasets for magnetic resonance T1- to T2-weighted image synthesis. Based on qualitative and quantitative analyses, our proposed method outperforms existing methods.
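The generator design, convolutional encoding and decoding for local detail around a transformer bottleneck for global context, can be sketched as follows. Layer sizes and depths are illustrative assumptions, not the TCGAN configuration:

```python
# Sketch of a transformer-bottleneck generator for modality translation
# (e.g. PET -> synthetic CT); illustrative, not the TCGAN implementation.
import torch
import torch.nn as nn

class TransformerGenerator(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, dim, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(dim, dim, 4, 2, 1), nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.bottleneck = nn.TransformerEncoder(layer, num_layers=2)
        self.dec = nn.Sequential(nn.ConvTranspose2d(dim, dim, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(dim, 1, 4, 2, 1), nn.Tanh())

    def forward(self, pet):
        f = self.enc(pet)                                  # (B, dim, H/4, W/4)
        b, c, h, w = f.shape
        t = self.bottleneck(f.flatten(2).transpose(1, 2))  # global context tokens
        return self.dec(t.transpose(1, 2).reshape(b, c, h, w))

ct = TransformerGenerator()(torch.randn(1, 1, 64, 64))  # synthetic CT, same size
```

In the full adversarial setup, a discriminator would score real versus synthesized CT slices and provide the adversarial loss; only the generator's wiring is sketched here.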
Collapse
Affiliation(s)
- Jitao Li
- College of Information Science and Engineering, Linyi University, Linyi, 276000, China
- College of Chemistry and Chemical Engineering, Linyi University, Linyi, 276000, China
- These authors contributed equally
| | - Zongjin Qu
- College of Chemistry and Chemical Engineering, Linyi University, Linyi, 276000, China
- These authors contributed equally
| | - Yue Yang
- College of Information Science and Engineering, Linyi University, Linyi, 276000, China
| | - Fuchun Zhang
- College of Information Science and Engineering, Linyi University, Linyi, 276000, China
| | - Meng Li
- College of Information Science and Engineering, Linyi University, Linyi, 276000, China
| | - Shunbo Hu
- College of Information Science and Engineering, Linyi University, Linyi, 276000, China
| |
Collapse
|
97
|
|
98
|
Tummala S, Kadry S, Bukhari SAC, Rauf HT. Classification of Brain Tumor from Magnetic Resonance Imaging Using Vision Transformers Ensembling. Curr Oncol 2022; 29:7498-7511. [PMID: 36290867 PMCID: PMC9600395 DOI: 10.3390/curroncol29100590] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 09/29/2022] [Accepted: 10/04/2022] [Indexed: 01/13/2023] Open
Abstract
The automated classification of brain tumors plays an important role in supporting radiologists' decision making. Recently, vision transformer (ViT)-based deep neural network architectures have gained attention in computer vision research owing to the tremendous success of transformer models in natural language processing. Hence, in this study, the ability of an ensemble of standard ViT models to diagnose brain tumors from T1-weighted (T1w) magnetic resonance imaging (MRI) is investigated. ViT models (B/16, B/32, L/16, and L/32) pretrained on ImageNet were adopted and finetuned for the classification task. A brain tumor dataset from figshare, consisting of 3064 T1w contrast-enhanced (CE) MRI slices with meningiomas, gliomas, and pituitary tumors, was used for cross-validation and testing of the ensemble ViT model's ability to perform a three-class classification task. The best individual model was L/32, with an overall test accuracy of 98.2% at 384 × 384 resolution. The ensemble of all four ViT models demonstrated an overall testing accuracy of 98.7% at the same resolution, outperforming the individual models at both resolutions as well as their ensemble at 224 × 224 resolution. In conclusion, an ensemble of ViT models could be deployed for the computer-aided diagnosis of brain tumors based on T1w CE MRI, reducing the burden on radiologists.
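The ensembling step reported here is prediction-level: softmax outputs of independently finetuned ViT variants are averaged before the argmax. A minimal sketch with untrained torchvision models standing in for the finetuned B/16, B/32, L/16, and L/32 checkpoints:

```python
# Prediction-averaging ensemble sketch; the two untrained models below are
# stand-ins for the paper's four finetuned ViT checkpoints.
import torch
from torchvision.models import vit_b_16, vit_b_32

models = [vit_b_16(num_classes=3), vit_b_32(num_classes=3)]  # 3 tumor classes
for m in models:
    m.eval()                                   # inference mode for ensembling

x = torch.randn(2, 3, 224, 224)                # dummy batch of T1w CE MRI slices
with torch.no_grad():
    probs = torch.stack([m(x).softmax(dim=1) for m in models]).mean(dim=0)
prediction = probs.argmax(dim=1)               # ensemble class per slice
```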
Collapse
Affiliation(s)
- Sudhakar Tummala
- Department of Electronics and Communication Engineering, School of Engineering and Sciences, SRM University—AP, Amaravati 522503, India
| | - Seifedine Kadry
- Department of Applied Data Science, Noroff University College, 4612 Kristiansand, Norway
- Department of Electrical and Computer Engineering, Lebanese American University, Byblos P.O. Box 36, Lebanon
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman 346, United Arab Emirates
| | - Syed Ahmad Chan Bukhari
- Division of Computer Science, Mathematics and Science, Collins College of Professional Studies, St. John’s University, New York, NY 11439, USA
| | - Hafiz Tayyab Rauf
- Centre for Smart Systems, AI and Cybersecurity, Staffordshire University, Stoke-on-Trent ST4 2DE, UK
| |
Collapse
|
99
|
Dalmaz O, Yurt M, Cukur T. ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:2598-2614. [PMID: 35436184 DOI: 10.1109/tmi.2022.3167808] [Citation(s) in RCA: 122] [Impact Index Per Article: 40.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and realism of adversarial learning. ResViT's generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant information. A weight sharing strategy is introduced among ART blocks to mitigate computational burden. A unified implementation is introduced to avoid the need to rebuild separate synthesis models for varying source-target modality configurations. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate superiority of ResViT against competing CNN- and transformer-based methods in terms of qualitative observations and quantitative metrics.
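The ART block's central idea, a residual path that aggregates a convolutional branch and a transformer branch and then compresses channels, can be sketched as follows; the wiring and sizes are illustrative assumptions, not the ResViT implementation:

```python
# Sketch of an aggregated residual conv + transformer block with channel
# compression (illustrative, not the ResViT ART block).
import torch
import torch.nn as nn

class ARTBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                  nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.compress = nn.Conv2d(2 * channels, channels, 1)  # channel compression

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.conv(x)                                   # precise local branch
        tokens = self.transformer(x.flatten(2).transpose(1, 2))
        globl = tokens.transpose(1, 2).reshape(b, c, h, w)     # contextual branch
        return x + self.compress(torch.cat([local, globl], dim=1))  # residual

y = ARTBlock()(torch.randn(1, 64, 16, 16))
```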
Collapse
|
100
|
Yang M, He X, Xu L, Liu M, Deng J, Cheng X, Wei Y, Li Q, Wan S, Zhang F, Wu L, Wang X, Song B, Liu M. CT-based transformer model for non-invasively predicting the Fuhrman nuclear grade of clear cell renal cell carcinoma. Front Oncol 2022; 12:961779. [PMID: 36249050 PMCID: PMC9555088 DOI: 10.3389/fonc.2022.961779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2022] [Accepted: 08/29/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Clear cell renal cell carcinoma (ccRCC) is the most common malignant tumor in the urinary system and the predominant subtype of malignant renal tumors, with high mortality. Biopsy is the main examination used to determine ccRCC grade, but it can lead to unavoidable complications and sampling bias. Therefore, non-invasive technology (e.g., CT examination) for ccRCC grading is attracting more and more attention. However, noisy labels exist in CT images, which contain multiple grades but carry only one label, making prediction difficult. Aim: We propose a Transformer-based deep learning algorithm using CT images to improve the diagnostic accuracy of ccRCC grading. Methods: We collected patients with pathologically proven ccRCC diagnosed from April 2010 to December 2018 as the training and internal test dataset, containing 759 patients. We propose a Transformer-based network architecture that efficiently combines convolutional neural networks (CNNs) and self-attention mechanisms to automatically extract persuasive features, followed by a nonlinear classifier for Fuhrman nuclear grade prediction. We also integrate differently trained models to improve the accuracy and robustness of the prediction. Average classification accuracy, sensitivity, specificity, and area under the curve (AUC) are used as indicators to evaluate the quality of a model, and various current deep learning algorithms were run as comparative experiments to show the advantages of the proposed method. Results: The mean accuracy, sensitivity, specificity, and AUC achieved by the CNN were 82.3%, 89.4%, 83.2%, and 85.7%, respectively. In contrast, the proposed Transformer-based model obtains a mean accuracy of 87.1%, with a sensitivity of 91.3%, a specificity of 85.3%, and an AUC of 90.3%. The integrated model acquires a better performance (86.5% accuracy and an AUC of 91.2%). Conclusion: A Transformer-based network performs better than traditional deep learning algorithms in terms of the accuracy of ccRCC grade prediction. Meanwhile, the Transformer has a certain advantage in dealing with the noisy labels existing in CT images of ccRCC. This method is promising for application to other medical tasks (e.g., the grading of neurogliomas and meningiomas).
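The model-integration step the abstract emphasizes, averaging predictions of differently trained models for robustness to noisy slice-level labels, can be sketched minimally; the tiny CNNs below are untrained stand-ins for the paper's trained CNN/Transformer models:

```python
# Integration sketch: average softmax outputs of differently initialized
# (in practice: differently trained) models for a more robust grade call.
import torch
import torch.nn as nn

def make_model(seed):
    torch.manual_seed(seed)                      # different initialization per run
    return nn.Sequential(nn.Conv2d(1, 8, 3, 2, 1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, 2))        # low vs. high Fuhrman grade

models = [make_model(s) for s in range(3)]       # stand-ins for trained models
ct = torch.randn(4, 1, 64, 64)                   # dummy batch of CT patches
with torch.no_grad():
    probs = torch.stack([m(ct).softmax(dim=1) for m in models]).mean(dim=0)
grade = probs.argmax(dim=1)                      # integrated grade prediction
```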
Collapse
Affiliation(s)
- Meiyi Yang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiaopeng He
- Department of Radiology, Affiliated Hospital of Southwest Medical University, Luzhou, China
| | - Lifeng Xu
- Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou, China
| | - Minghui Liu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Jiali Deng
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Xuan Cheng
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Yi Wei
- Department of Radiology, West China Hospital, Sichuan University, Chengdu, China
| | - Qian Li
- Department of Radiology, West China Hospital, Sichuan University, Chengdu, China
| | - Shang Wan
- Department of Radiology, West China Hospital, Sichuan University, Chengdu, China
| | - Feng Zhang
- Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou, China
| | - Lei Wu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiaomin Wang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Bin Song
- Department of Radiology, West China Hospital, Sichuan University, Chengdu, China
- *Correspondence: Ming Liu; Bin Song
| | - Ming Liu
- Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou, China
- *Correspondence: Ming Liu; Bin Song
| |
Collapse
|