1. Wang H, Ahn E, Bi L, Kim J. Self-supervised multi-modality learning for multi-label skin lesion classification. Comput Methods Programs Biomed 2025;265:108729. PMID: 40184849. DOI: 10.1016/j.cmpb.2025.108729.
Abstract
BACKGROUND: The clinical diagnosis of skin lesions involves the analysis of dermoscopic and clinical modalities. Dermoscopic images provide detailed views of surface structures, while clinical images offer complementary macroscopic information. Clinicians frequently use the seven-point checklist as an auxiliary tool for melanoma diagnosis and for identifying lesion attributes. Supervised deep learning approaches, such as convolutional neural networks, have performed well using dermoscopic and clinical modalities (multi-modality) and have further enhanced classification by predicting seven skin lesion attributes (multi-label). However, the performance of these approaches relies on the availability of large-scale labeled data, which are costly and time-consuming to obtain, all the more so when annotating multiple attributes.
METHODS: To reduce the dependency on large labeled datasets, we propose a self-supervised learning (SSL) algorithm for multi-modality multi-label skin lesion classification. Compared with single-modality SSL, our algorithm enables multi-modality SSL by maximizing the similarities between paired dermoscopic and clinical images from different views. We introduce a novel multi-modal and multi-label SSL strategy that generates surrogate pseudo-multi-labels for seven skin lesion attributes through clustering analysis. A label-relation-aware module is proposed to refine each pseudo-label embedding, capturing the interrelationships between pseudo-multi-labels. We further illustrate the interrelationships of skin lesion attributes, and their relationships with clinical diagnoses, using an attention visualization technique.
RESULTS: The proposed algorithm was validated on the well-benchmarked seven-point skin lesion dataset. Our results demonstrate that our method outperforms state-of-the-art SSL counterparts. Improvements in the area under the receiver operating characteristic curve, precision, sensitivity, and specificity were observed across various lesion attributes and melanoma diagnoses.
CONCLUSIONS: Our self-supervised learning algorithm offers a robust and efficient solution for multi-modality multi-label skin lesion classification, reducing the reliance on large-scale labeled data. By effectively capturing and leveraging the complementary information between dermoscopic and clinical images and the interrelationships between lesion attributes, our approach holds the potential to improve clinical diagnostic accuracy in dermatology.
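The cross-modal SSL objective described above, maximizing similarity between paired dermoscopic and clinical views, is typically implemented as an InfoNCE-style contrastive loss. A minimal numpy sketch (the function name and temperature value are chosen for illustration, not taken from the paper):

```python
import numpy as np

def cross_modal_contrastive_loss(z_derm, z_clin, temperature=0.1):
    """InfoNCE-style objective: the i-th dermoscopic embedding should be
    most similar to its paired i-th clinical embedding."""
    # L2-normalise both embedding batches
    z_d = z_derm / np.linalg.norm(z_derm, axis=1, keepdims=True)
    z_c = z_clin / np.linalg.norm(z_clin, axis=1, keepdims=True)
    logits = z_d @ z_c.T / temperature                    # (N, N) cross-modal similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # positives sit on the diagonal
```

Correctly paired modalities yield a lower loss than mismatched ones, which is the signal the encoder is trained on.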
Affiliation(s)
- Hao Wang: School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia; Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Euijoon Ahn: College of Science and Engineering, James Cook University, Cairns, QLD 4870, Australia.
- Lei Bi: Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Jinman Kim: School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia.
2. Yang Y, Fu H, Aviles-Rivero AI, Xing Z, Zhu L. DiffMIC-v2: Medical Image Classification via Improved Diffusion Network. IEEE Trans Med Imaging 2025;44:2244-2255. PMID: 40031019. DOI: 10.1109/tmi.2025.3530399.
Abstract
Recently, Denoising Diffusion Models have achieved outstanding success in generative image modeling and attracted significant attention in the computer vision community. Although a substantial amount of diffusion-based research has focused on generative tasks, few studies apply diffusion models to medical diagnosis. In this paper, we propose a diffusion-based network (named DiffMIC-v2) to address general medical image classification by eliminating unexpected noise and perturbations in image representations. To achieve this goal, we first devise an improved dual-conditional guidance strategy that conditions each diffusion step with multiple granularities to enhance step-wise regional attention. Furthermore, we design a novel Heterologous diffusion process that achieves efficient visual representation learning in the latent space. We evaluate the effectiveness of our DiffMIC-v2 on four medical classification tasks with different image modalities, including thoracic diseases classification on chest X-ray, placental maturity grading on ultrasound images, skin lesion classification using dermatoscopic images, and diabetic retinopathy grading using fundus images. Experimental results demonstrate that our DiffMIC-v2 outperforms state-of-the-art methods by a significant margin, which indicates the universality and effectiveness of the proposed model on multi-class and multi-label classification tasks. DiffMIC-v2 can use fewer iterations than our previous DiffMIC to obtain accurate estimations, and also achieves greater runtime efficiency with superior results. The code will be publicly available at https://github.com/scott-yjyang/DiffMICv2.
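For context, diffusion-based classifiers of this kind corrupt a representation with a closed-form forward-noising process and learn to reverse it. A minimal sketch of the standard DDPM forward step (illustrative only; DiffMIC-v2's dual-conditional guidance and heterologous latent-space process are not reproduced here):

```python
import numpy as np

def diffuse_label(y0, t, betas, rng):
    """Standard DDPM forward step in closed form:
    y_t = sqrt(alpha_bar_t) * y0 + sqrt(1 - alpha_bar_t) * noise,
    where alpha_bar_t is the cumulative product of (1 - beta_s)."""
    alpha_bar = np.cumprod(1.0 - np.asarray(betas))[t]
    noise = rng.normal(size=np.shape(y0))
    return np.sqrt(alpha_bar) * np.asarray(y0) + np.sqrt(1.0 - alpha_bar) * noise
```

With all betas at zero no noise is injected and the representation passes through unchanged; larger noise schedules progressively destroy it, which the denoising network must undo step by step.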
3. Ni L, Liu Y, Zhang Z, Li Y, Zhang J. Dual-Filter Cross Attention and Onion Pooling Network for Enhanced Few-Shot Medical Image Segmentation. Sensors (Basel) 2025;25:2176. PMID: 40218686. PMCID: PMC11991012. DOI: 10.3390/s25072176.
Abstract
Few-shot learning has demonstrated remarkable performance in medical image segmentation. However, existing few-shot medical image segmentation (FSMIS) models often struggle to fully utilize query image information, leading to prototype bias and limited generalization ability. To address these issues, we propose the dual-filter cross attention and onion pooling network (DCOP-Net) for FSMIS. DCOP-Net consists of a prototype learning stage and a segmentation stage. During the prototype learning stage, we introduce a dual-filter cross attention (DFCA) module to avoid entanglement between query background features and support foreground features, effectively integrating query foreground features into support prototypes. Additionally, we design an onion pooling (OP) module that combines eroding mask operations with masked average pooling to generate multiple prototypes, preserving contextual information and mitigating prototype bias. In the segmentation stage, we present a parallel threshold perception (PTP) module to generate robust thresholds for foreground and background differentiation and a query self-reference regularization (QSR) strategy to enhance model accuracy and consistency. Extensive experiments on three publicly available medical image datasets demonstrate that DCOP-Net outperforms state-of-the-art methods, exhibiting superior segmentation and generalization capabilities.
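The onion pooling idea, masked average pooling over successively eroded masks so that core and peripheral foreground context yield separate prototypes, can be sketched as follows (a simplified illustration with a 3x3 cross erosion; function names and layer count are assumptions, not the paper's exact design):

```python
import numpy as np

def erode(mask):
    """One step of binary erosion with a 3x3 cross structuring element."""
    m = np.pad(mask, 1)
    return (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
            & m[1:-1, :-2] & m[1:-1, 2:])

def onion_prototypes(feat, mask, layers=3):
    """Masked average pooling over successively eroded masks: one prototype
    per 'onion layer', preserving contextual information rather than
    collapsing the whole foreground into a single mean vector.
    feat: (C, H, W) feature map; mask: (H, W) binary foreground mask."""
    protos = []
    cur = mask.astype(np.int64)
    for _ in range(layers):
        if cur.sum() == 0:          # mask fully eroded away
            break
        protos.append((feat * cur).sum(axis=(1, 2)) / cur.sum())
        cur = erode(cur)
    return np.stack(protos)
```

Each returned row is a prototype for a progressively shrunken foreground region; a real model would match query features against all of them.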
Affiliation(s)
- Lina Ni: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China; Key Laboratory of the Ministry of Education for Embedded System and Service Computing, Tongji University, Shanghai 201804, China
- Yang Liu: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
- Zekun Zhang: School of Computer Science, University of Glasgow, Glasgow G12 8QQ, UK
- Yongtao Li: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
- Jinquan Zhang: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
4. Pham TT, Brecheisen J, Wu CC, Nguyen H, Deng Z, Adjeroh D, Doretto G, Choudhary A, Le N. ItpCtrl-AI: End-to-end interpretable and controllable artificial intelligence by modeling radiologists' intentions. Artif Intell Med 2025;160:103054. PMID: 39689443. PMCID: PMC11757032. DOI: 10.1016/j.artmed.2024.103054.
Abstract
Using deep learning in computer-aided diagnosis systems has been of great interest due to its impressive performance in both the general and medical domains. However, a notable challenge is the lack of explainability of many advanced models, which poses risks in critical applications such as diagnosing findings in chest X-rays (CXRs). To address this problem, we propose ItpCtrl-AI, a novel end-to-end interpretable and controllable framework that mirrors the decision-making process of the radiologist. By emulating the eye-gaze patterns of radiologists, our framework first determines the focal areas and assesses the significance of each pixel within those regions. The model then generates an attention heatmap representing the radiologist's attention, which is used to extract the attended visual information and diagnose the findings. By accepting directional input, our framework is controllable by the user. Furthermore, by displaying the eye-gaze heatmap that guides the diagnostic conclusion, the underlying rationale behind the model's decision is revealed, making it interpretable. In addition to developing an interpretable and controllable framework, our work includes the creation of a dataset, named Diagnosed-Gaze++, which aligns medical findings with eye-gaze data. Our extensive experimentation validates the effectiveness of our approach in generating accurate attention heatmaps and diagnoses. The experimental results show that our model not only accurately identifies medical findings but also precisely reproduces the eye-gaze attention of radiologists. The dataset, models, and source code will be made publicly available upon acceptance.
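The "attend where the radiologist looked" step can be illustrated by pooling a feature map with a normalized gaze heatmap (a hypothetical simplification of the framework's attention mechanism; the function name is ours):

```python
import numpy as np

def gaze_weighted_pooling(features, heatmap):
    """Pool a (C, H, W) feature map with a non-negative (H, W) gaze heatmap,
    so features at fixated locations dominate the pooled descriptor."""
    w = heatmap / (heatmap.sum() + 1e-12)   # normalise to a spatial distribution
    return (features * w).sum(axis=(1, 2))  # (C,) gaze-attended descriptor
```

A heatmap concentrated on one region returns that region's features, which is what lets the gaze prediction steer the downstream diagnosis head.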
Affiliation(s)
- Trong-Thang Pham: AICV Lab, Department of EECS, University of Arkansas, AR 72701, USA.
- Jacob Brecheisen: AICV Lab, Department of EECS, University of Arkansas, AR 72701, USA.
- Carol C Wu: MD Anderson Cancer Center, Houston, TX 77079, USA.
- Hien Nguyen: Department of ECE, University of Houston, TX 77204, USA.
- Zhigang Deng: Department of CS, University of Houston, TX 77204, USA.
- Donald Adjeroh: Department of CSEE, West Virginia University, WV 26506, USA.
- Arabinda Choudhary: University of Arkansas for Medical Sciences, Little Rock, AR 72705, USA.
- Ngan Le: AICV Lab, Department of EECS, University of Arkansas, AR 72701, USA.
5. Liu C, Wang W, Lian J, Jiao W. Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer. Front Public Health 2025;12:1442114. PMID: 39835306. PMCID: PMC11743363. DOI: 10.3389/fpubh.2024.1442114.
Abstract
Introduction: Diabetic retinopathy grading plays a vital role in the diagnosis and treatment of patients. In practice, this task mainly relies on manual inspection using the human visual system. However, such screening is labor-intensive, time-consuming, and error-prone, so numerous automated screening techniques have been developed to address this task. Methods: Among these techniques, deep learning models have demonstrated promising outcomes in various machine vision tasks. However, most medical image analysis-oriented deep learning approaches are built upon convolutional operations, which may neglect the global dependencies between long-range pixels in medical images. Vision transformer models, which can unveil the associations between global pixels, have therefore gradually been employed in medical image analysis; however, the quadratic computational complexity of the attention mechanism has hindered their deployment in clinical practice. Bearing this analysis in mind, this study introduces an integrated self-attention mechanism with both softmax and linear modules to guarantee efficiency and expressiveness simultaneously. Specifically, a set of proxy tokens, far fewer than the original query and key tokens, is introduced into the attention module, allowing it to exploit the advantages of both softmax and linear attention. Results: To evaluate the presented approach, comparison experiments against state-of-the-art algorithms were conducted. Experimental results demonstrate that the proposed approach achieves superior outcomes over the state-of-the-art algorithms on the publicly available datasets. Discussion: Accordingly, the proposed approach can be regarded as a potentially valuable instrument in clinical practice.
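The proxy-token idea, routing softmax attention through a small set of proxies so the cost drops from O(N^2) to O(NP), can be sketched as follows (a generic two-stage illustration; the paper's exact combination of softmax and linear attention may differ):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def proxy_attention(q, k, v, proxies):
    """Two-stage attention through P proxy tokens:
    stage 1: proxies summarise the full key/value sequence (P x N),
    stage 2: every query reads only from the proxy summaries (N x P),
    so the cost is O(N*P) instead of the quadratic O(N*N)."""
    d = q.shape[-1]
    proxy_v = softmax(proxies @ k.T / np.sqrt(d)) @ v      # (P, d_v) global summaries
    return softmax(q @ proxies.T / np.sqrt(d)) @ proxy_v   # (N, d_v) outputs
```

Because both stages are convex combinations, every output stays within the range of the value entries, just as in full softmax attention.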
Affiliation(s)
- Chong Liu: School of Intelligence Engineering, Shandong Management University, Jinan, China
- Weiguang Wang: Department of Ophthalmology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
- Jian Lian: School of Intelligence Engineering, Shandong Management University, Jinan, China
- Wanzhen Jiao: Department of Ophthalmology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
6. Li H, Yin D, Li B, Liu C, Xiong C, Fan Q, Yao S, Huang W, Li W, Zhang J, Li H. A novel semi-supervised learning model based on pelvic radiographs for ankylosing spondylitis diagnosis reduces 90% of annotation cost. Comput Biol Med 2025;184:109232. PMID: 39522130. DOI: 10.1016/j.compbiomed.2024.109232.
Abstract
OBJECTIVE Our study aims to develop a deep learning-based Ankylosing Spondylitis (AS) diagnostic model that achieves human expert-level performance using only a minimal amount of labeled samples for training, in regions with limited access to expert resources. METHODS Our semi-supervised diagnostic model for AS was developed using 5389 pelvic radiographs (PXRs) from a single medical center, collected from March 2014 to April 2022. The dataset was split into a training set and a validation set with an 8:2 ratio, allocating 431 labeled images and the remaining 3880 unlabeled images for semi-supervised learning. The model's performance was evaluated on 982 PXRs from the same center, assessing metrics such as AUC, accuracy, precision, recall, and F1 scores. Interpretability analysis was performed using explainable algorithms to validate the model's clinical applicability. RESULTS Our semi-supervised learning model achieved accuracy, recall, and precision values of 0.891, 0.865, and 0.859, respectively, using only 10% of labeled data from the entire training set, surpassing human expert performance. Extensive interpretability analysis demonstrated the reliability of our model's predictions, making the deep neural network no longer a black box. CONCLUSION This study marks the first application of semi-supervised learning to diagnose AS using PXRs, achieving a 90% reduction in manual annotation costs. The model showcases robust generalization on an independent test set and delivers reliable diagnostic performance, supported by comprehensive interpretability analysis. This innovative approach paves the way for training high-performance diagnostic models on large datasets with minimal labeled data, heralding a cost-effective future for medical imaging research in big data analytics.
Affiliation(s)
- Hao Li, Dong Yin, Baichuan Li, Chunxiang Xiong, Qie Fan, Shuyu Yao, Wenwen Huang, Wenhao Li, Jingda Zhang: Orthopedic, People's Hospital of Guangxi Zhuang Autonomous Region, 6 Taoyuan Road, Qingxiu District, Nanning, 530016, Guangxi Province, China.
- Chong Liu: Department of Spine and Osteopathy Ward, The First Affiliated Hospital of Guangxi Medical University, Nanning, 530021, Guangxi Province, China.
- Hongmian Li: Department of Plastic and Reconstructive Surgery, People's Hospital of Guangxi Zhuang Autonomous Region & Research Center of Medical Sciences, 6 Taoyuan Road, Qingxiu District, Nanning, 530016, Guangxi Province, China.
7. Li J, Shi H, Chen W, Liu N, Hwang KS. Semi-Supervised Detection Model Based on Adaptive Ensemble Learning for Medical Images. IEEE Trans Neural Netw Learn Syst 2025;36:237-248. PMID: 37339032. DOI: 10.1109/tnnls.2023.3282809.
Abstract
Introducing deep learning technologies into medical image processing requires accuracy guarantees, especially for high-resolution images relayed through endoscopes. Moreover, approaches relying on supervised learning are powerless when labeled samples are inadequate. Therefore, for end-to-end medical image detection with strict efficiency and accuracy requirements in endoscopy, an ensemble-learning-based model with a semi-supervised mechanism is developed in this work. To obtain a more accurate result from multiple detection models, we propose a new ensemble mechanism, termed the alternative adaptive boosting method (Al-Adaboost), which combines the decision-making of two hierarchical models. Specifically, the proposal consists of two modules. One is a local region proposal model with attentive temporal-spatial pathways for bounding box regression and classification; the other is a recurrent attention model (RAM) that provides more precise inferences for further classification based on the regression result. Al-Adaboost adjusts the weights of the labeled samples and the two classifiers adaptively, and unlabeled samples are assigned pseudo-labels by our model. We investigate the performance of Al-Adaboost on colonoscopy and laryngoscopy data from CVC-ClinicDB and the affiliated hospital of Kaohsiung Medical University. The experimental results demonstrate the feasibility and superiority of our model.
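Al-Adaboost builds on the classic AdaBoost reweighting rule, in which samples the current classifier misclassifies gain weight in the next round. A minimal sketch of that underlying update (the two-model coupling and pseudo-labeling specifics of Al-Adaboost are not shown; the function name is ours):

```python
import numpy as np

def adaboost_reweight(weights, correct, eps=1e-12):
    """One AdaBoost-style round: compute the weighted error, derive the
    classifier's vote weight alpha, then up-weight wrong samples and
    down-weight right ones before renormalising.
    weights: (N,) current sample weights; correct: (N,) boolean mask."""
    err = weights[~correct].sum() / (weights.sum() + eps)
    err = np.clip(err, eps, 1.0 - eps)
    alpha = 0.5 * np.log((1.0 - err) / err)          # classifier vote weight
    new_w = weights * np.exp(np.where(correct, -alpha, alpha))
    return new_w / new_w.sum(), alpha
```

When the classifier is better than chance (err < 0.5), alpha is positive and misclassified samples end up with strictly larger weights, steering the next round toward the hard cases.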
8. Zeng Q, Xie Y, Lu Z, Lu M, Zhang J, Xia Y. Consistency-Guided Differential Decoding for Enhancing Semi-Supervised Medical Image Segmentation. IEEE Trans Med Imaging 2025;44:44-56. PMID: 39088492. DOI: 10.1109/tmi.2024.3429340.
Abstract
Semi-supervised learning (SSL) has proven beneficial for mitigating the issue of limited labeled data, especially in volumetric medical image segmentation. Unlike previous SSL methods, which focus on exploring highly confident pseudo-labels or developing consistency regularization schemes, our empirical findings suggest that differential decoder features emerge naturally when two decoders strive to generate consistent predictions. Based on this observation, we first analyze the value of such discrepancy in learning towards consistency, under both pseudo-labeling and consistency regularization settings, and subsequently propose a novel SSL method called LeFeD, which learns from the feature-level discrepancies between two decoders by feeding this information back to the encoder as a feedback signal. The core design of LeFeD is to enlarge the discrepancies by training differential decoders, and then learn from the differential features iteratively. We evaluate LeFeD against eight state-of-the-art (SOTA) methods on three public datasets. Experiments show LeFeD surpasses competitors without any bells and whistles, such as uncertainty estimation or strong constraints, and sets a new state of the art for semi-supervised medical image segmentation. Code has been released at https://github.com/maxwell0027/LeFeD.
9. Xiao J, Li S, Lin T, Zhu J, Yuan X, Feng DD, Sheng B. Multi-Label Chest X-Ray Image Classification With Single Positive Labels. IEEE Trans Med Imaging 2024;43:4404-4418. PMID: 38949934. DOI: 10.1109/tmi.2024.3421644.
Abstract
Deep learning approaches for multi-label chest X-ray (CXR) image classification usually require large-scale datasets. However, acquiring such datasets with full annotations is costly, time-consuming, and prone to noisy labels. Therefore, we introduce a weakly supervised learning problem called Single Positive Multi-label Learning (SPML) into CXR image classification (abbreviated as SPML-CXR), in which only one positive label is annotated per image. A simple solution to the SPML-CXR problem is to assume that all unannotated pathological labels are negative; however, this may introduce false negative labels and degrade model performance. To this end, we present a Multi-level Pseudo-label Consistency (MPC) framework for SPML-CXR. First, inspired by pseudo-labeling and consistency regularization in semi-supervised learning, we construct a weak-to-strong consistency framework, where the model prediction on a weakly augmented image is treated as the pseudo-label for supervising the model prediction on a strongly augmented version of the same image, and define an Image-level Perturbation-based Consistency (IPC) regularization to recover potentially mislabeled positive labels. We also incorporate Random Elastic Deformation (RED) as an additional strong augmentation to enhance the perturbation. Second, to expand the perturbation space, we add a feature-level perturbation stream to the consistency framework and introduce a Feature-level Perturbation-based Consistency (FPC) regularization as a supplement. Third, we design a Transformer-based encoder module to explore the sample relationships within each mini-batch via a Batch-level Transformer-based Correlation (BTC) regularization. Extensive experiments on the CheXpert and MIMIC-CXR datasets show the effectiveness of our MPC framework for solving the SPML-CXR problem.
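The weak-to-strong consistency at the heart of the IPC regularization follows the familiar pseudo-labeling recipe. A simplified multi-label sketch under the single-positive setting (threshold value and masking details are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def consistency_loss(weak_logits, strong_logits, known_pos, threshold=0.95):
    """Weak-to-strong consistency for multi-label outputs: sigmoid
    predictions on the weakly augmented view become pseudo-labels for the
    strongly augmented view, kept only where confidence passes `threshold`;
    the single annotated positive label is always enforced."""
    p_weak = 1.0 / (1.0 + np.exp(-weak_logits))
    pseudo = (p_weak >= 0.5).astype(float)
    confident = (p_weak >= threshold) | (p_weak <= 1.0 - threshold)
    pos = known_pos.astype(bool)
    pseudo = np.where(pos, 1.0, pseudo)     # annotated positive always kept
    mask = confident | pos                  # supervise only trusted entries
    p_strong = 1.0 / (1.0 + np.exp(-strong_logits))
    bce = -(pseudo * np.log(p_strong + 1e-12)
            + (1.0 - pseudo) * np.log(1.0 - p_strong + 1e-12))
    return float((bce * mask).sum() / max(mask.sum(), 1))
```

Agreement between the two views produces a small loss; a strong-view prediction that contradicts a confident weak-view pseudo-label is penalised heavily.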
10. Bayasi N, Hamarneh G, Garbi R. GC2: Generalizable Continual Classification of Medical Images. IEEE Trans Med Imaging 2024;43:3767-3779. PMID: 38717881. DOI: 10.1109/tmi.2024.3398533.
Abstract
Deep learning models have achieved remarkable success in medical image classification. These models are typically trained once on the available annotated images and thus lack the ability of continually learning new tasks (i.e., new classes or data distributions) due to the problem of catastrophic forgetting. Recently, there has been more interest in designing continual learning methods to learn different tasks presented sequentially over time while preserving previously acquired knowledge. However, these methods focus mainly on preventing catastrophic forgetting and are tested under a closed-world assumption; i.e., assuming the test data is drawn from the same distribution as the training data. In this work, we advance the state-of-the-art in continual learning by proposing GC2 for medical image classification, which learns a sequence of tasks while simultaneously enhancing its out-of-distribution robustness. To alleviate forgetting, GC2 employs a gradual culpability-based network pruning to identify an optimal subnetwork for each task. To improve generalization, GC2 incorporates adversarial image augmentation and knowledge distillation approaches for learning generalized and robust representations for each subnetwork. Our extensive experiments on multiple benchmarks in a task-agnostic inference demonstrate that GC2 significantly outperforms baselines and other continual learning methods in reducing forgetting and enhancing generalization. Our code is publicly available at the following link: https://github.com/nourhanb/TMI2024-GC2.
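GC2's culpability-based pruning gradually selects a subnetwork per task; the underlying primitive is magnitude pruning, sketched below (an assumption for illustration, not the paper's exact gradual criterion):

```python
import numpy as np

def prune_by_magnitude(weights, keep_ratio):
    """Magnitude pruning: keep the largest-|w| fraction of weights as the
    task-specific subnetwork and zero out the rest; the boolean mask can
    be stored per task so later tasks do not overwrite it."""
    flat = np.abs(weights).ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    thresh = np.partition(flat, -k)[-k]      # k-th largest magnitude
    mask = np.abs(weights) >= thresh
    return weights * mask, mask
```

Freezing each task's surviving mask is what prevents catastrophic forgetting in pruning-based continual learners, while the freed weights remain available for the next task.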
11. Kamalakannan N, Macharla SR, Kanimozhi M, Sudhakar MS. Exponential Pixelating Integral transform with dual fractal features for enhanced chest X-ray abnormality detection. Comput Biol Med 2024;182:109093. PMID: 39232407. DOI: 10.1016/j.compbiomed.2024.109093.
Abstract
The heightened prevalence of respiratory disorders, exacerbated by a significant upswing in fatalities due to the novel coronavirus, underscores the critical need for early detection and timely intervention, which have the potential to safeguard numerous lives. Medically, chest radiography stands out as an essential and economically viable imaging approach for diagnosing and assessing the severity of diverse respiratory disorders. However, detecting these disorders in chest X-rays is a cumbersome task even for well-trained radiologists, owing to low-contrast issues, overlapping tissue structures, subjective variability, and the presence of noise. To address these issues, a novel analytical model termed the Exponential Pixelating Integral is introduced in this work for the automatic detection of infections in chest X-rays. Initially, the Exponential Pixelating Integral enhances pixel intensities to overcome low-contrast issues; the images are then polar-transformed and represented using the locally invariant Mandelbrot and Julia fractal geometries for effective distinction of structural features. The collated features, the Exponential Pixelating Integral with dually characterized fractal features, are then classified by non-parametric multivariate adaptive regression splines, which establish an ensemble model between each pair of classes for effective diagnosis of diverse diseases. Rigorous analysis of the proposed classification framework on large benchmarked medical datasets showcases its superiority over its peers, registering higher classification accuracy and F1 scores ranging from 98.46-99.45% and 96.53-98.10%, respectively, making it a precise and interpretable automated system for diagnosing respiratory disorders.
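Fractal descriptors of the Mandelbrot/Julia kind are typically derived from escape-time statistics of the quadratic iteration z -> z^2 + c; how the paper maps image content onto z and c is not reproduced here. A minimal sketch of the escape-time primitive:

```python
def julia_feature(z0, c, max_iter=50):
    """Escape-time descriptor for the Julia iteration z -> z**2 + c:
    the number of steps before |z| exceeds 2 (bounded orbits, which stay
    inside the fractal set, return max_iter)."""
    z = z0
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter
```

Evaluating this over a grid of points yields the locally invariant texture maps that such fractal-feature pipelines feed into their classifier.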
Affiliation(s)
- M Kanimozhi: School of Electrical & Electronics, Sathyabama Institute of Science and Technology, Chennai, Tamilnadu, India
- M S Sudhakar: School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, India.
12. Fang P, Feng R, Liu C, Wen R. Boundary sample-based class-weighted semi-supervised learning for malignant tumor classification of medical imaging. Med Biol Eng Comput 2024;62:2987-2997. PMID: 38727760. DOI: 10.1007/s11517-024-03114-y.
Abstract
Medical image classification plays a pivotal role within the field of medicine. Existing models predominantly rely on supervised learning methods, which necessitate large volumes of labeled data for effective training. However, acquiring and annotating medical image data is both an expensive and time-consuming endeavor. In contrast, semi-supervised learning methods offer a promising approach by harnessing limited labeled data alongside abundant unlabeled data to enhance the performance of medical image classification. Nonetheless, current methods often encounter confirmation bias due to noise inherent in self-generated pseudo-labels and the presence of boundary samples from different classes. To overcome these challenges, this study introduces a novel framework known as boundary sample-based class-weighted semi-supervised learning (BSCSSL) for medical image classification. Our method aims to alleviate the impact of intra- and inter-class boundary samples derived from unlabeled data. Specifically, we address reliable confidential data and inter-class boundary samples separately through the utilization of an inter-class boundary sample mining module. Additionally, we implement an intra-class boundary sample weighting mechanism to extract class-aware features specific to intra-class boundary samples. Rather than discarding such intra-class boundary samples outright, our approach acknowledges their intrinsic value despite the difficulty associated with accurate classification, as they contribute significantly to model prediction. Experimental results on widely recognized medical image datasets demonstrate the superiority of our proposed BSCSSL method over existing semi-supervised learning approaches. By enhancing the accuracy and robustness of medical image classification, our BSCSSL approach yields considerable implications for advancing medical diagnosis and future research endeavors.
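Class weighting of the kind used here can be illustrated with the standard "effective number of samples" scheme computed from the current pseudo-label distribution, so rare classes are up-weighted (an assumption for illustration; the paper's boundary-sample-based weighting differs):

```python
import numpy as np

def class_weights_from_pseudo_labels(pseudo_labels, n_classes, beta=0.999):
    """Effective-number class weighting from pseudo-label counts:
    w_c is inversely proportional to (1 - beta^n_c) / (1 - beta),
    then normalised so the weights average to 1."""
    counts = np.bincount(pseudo_labels, minlength=n_classes).astype(float)
    effective = (1.0 - beta ** counts) / (1.0 - beta)
    w = 1.0 / np.maximum(effective, 1e-12)
    return w * n_classes / w.sum()
```

Multiplying each sample's loss by its class weight counters the bias that an imbalanced pseudo-label distribution would otherwise introduce.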
Affiliation(s)
- Pei Fang, Renwei Feng, Changdong Liu, Renjun Wen: China Comservice Enrising Information Technology Co., Ltd., Chengdu, Sichuan, 610041, China.
Collapse
|
13
|
Hou H, Zhang R, Li J. Artificial intelligence in the clinical laboratory. Clin Chim Acta 2024; 559:119724. [PMID: 38734225 DOI: 10.1016/j.cca.2024.119724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2024] [Revised: 05/07/2024] [Accepted: 05/08/2024] [Indexed: 05/13/2024]
Abstract
Laboratory medicine has become a highly automated medical discipline. Artificial intelligence (AI) applied to laboratory medicine is now gaining increasing attention; it can optimize the entire laboratory workflow and may even revolutionize laboratory medicine in the future. However, only a few commercially available AI models are currently approved for use in clinical laboratories, and these have drawbacks such as high cost, limited accuracy, and the need for manual review of model results. Furthermore, few literature reviews comprehensively address the research status, challenges, and future opportunities of AI applications in laboratory medicine. Our article begins with a brief introduction to AI and some of its subsets, then reviews AI models that are currently used in clinical laboratories or described in emerging studies, explains the existing challenges associated with their application and possible solutions, and finally provides insights into future opportunities in the field. We highlight the current status of implementation and potential applications of AI models in different stages of the clinical testing process.
Affiliation(s)
- Hanjing Hou
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, PR China; National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PR China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, PR China
- Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, PR China; National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PR China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, PR China
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, PR China; National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PR China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, PR China

14
Berrimi M, Hans D, Jennane R. A semi-supervised multiview-MRI network for the detection of Knee Osteoarthritis. Comput Med Imaging Graph 2024; 114:102371. [PMID: 38513397 DOI: 10.1016/j.compmedimag.2024.102371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2023] [Revised: 03/09/2024] [Accepted: 03/13/2024] [Indexed: 03/23/2024]
Abstract
Knee OsteoArthritis (OA) is a prevalent chronic condition, affecting a significant proportion of the global population. Detecting knee OA is crucial because the degeneration of the knee joint is irreversible. In this paper, we introduce a semi-supervised multi-view framework and a 3D CNN model for detecting knee OA from 3D Magnetic Resonance Imaging (MRI) scans. The semi-supervised learning approach combines labeled and unlabeled data to improve the performance and generalizability of the proposed model. Experimental results show the efficacy of the proposed approach in detecting knee OA from 3D MRI scans using a large cohort of 4297 subjects. An ablation study was conducted to investigate the contributions of various components of the proposed model, providing insights into its optimal design. Our results indicate the potential of the proposed approach to improve the accuracy and efficiency of OA diagnosis. The proposed framework reported an AUC of 93.20% for the detection of knee OA.
Affiliation(s)
- Mohamed Berrimi
- University of Orleans, Institut Denis Poisson, UMR CNRS 7013, Orleans, 45067, France
- Didier Hans
- Lausanne University Hospital, Center of Bone Diseases & University of Lausanne, Lausanne, Switzerland
- Rachid Jennane
- University of Orleans, Institut Denis Poisson, UMR CNRS 7013, Orleans, 45067, France

15
Zhang Z, Yao P, Chen M, Zeng L, Shao P, Shen S, Xu RX. SCAC: A Semi-Supervised Learning Approach for Cervical Abnormal Cell Detection. IEEE J Biomed Health Inform 2024; 28:3501-3512. [PMID: 38470598 DOI: 10.1109/jbhi.2024.3375889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/14/2024]
Abstract
Cervical abnormal cell detection plays a crucial role in the early screening of cervical cancer. In recent years, some deep learning-based methods have been proposed. However, these methods rely heavily on large amounts of annotated images, which are time-consuming and labor-intensive to acquire, thus limiting the detection performance. In this paper, we present a novel Semi-supervised Cervical Abnormal Cell detector (SCAC), which effectively utilizes the abundant unlabeled data. We utilize Transformer as the backbone of SCAC to capture long-range dependencies to mimic the diagnostic process of pathologists. In addition, in SCAC, we design a Unified Strong and Weak Augment strategy (USWA) that unifies two data augmentation pipelines, implementing consistent regularization in semi-supervised learning and enhancing the diversity of the training data. We also develop a Global Attention Feature Pyramid Network (GAFPN), which utilizes the attention mechanism to better extract multi-scale features from cervical cytology images. Notably, we have created an unlabeled cervical cytology image dataset, which can be leveraged by semi-supervised learning to enhance detection accuracy. To the best of our knowledge, this is the first publicly available large unlabeled cervical cytology image dataset. By combining this dataset with two publicly available annotated datasets, we demonstrate that SCAC outperforms other existing methods, achieving state-of-the-art performance. Additionally, comprehensive ablation studies are conducted to validate the effectiveness of USWA and GAFPN. These promising results highlight the capability of SCAC to achieve high diagnostic accuracy and extensive clinical applications.
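The USWA idea above, pairing a weakly and a strongly augmented view and enforcing consistency only on confident predictions, follows the general strong/weak-augmentation recipe popularized by FixMatch. A minimal, framework-free sketch of that recipe (illustrative only; SCAC's actual pipeline operates on Transformer detection features):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def consistency_loss(weak_logits, strong_logits, threshold=0.95):
    """Strong/weak-augmentation consistency: the weak view yields a
    pseudo-label, and the strong view is trained to match it, but only
    when the weak prediction is confident enough."""
    total, used = 0.0, 0
    for wl, sl in zip(weak_logits, strong_logits):
        q = softmax(wl)
        conf = max(q)
        if conf < threshold:
            continue  # unconfident pseudo-labels are skipped
        target = q.index(conf)
        p = softmax(sl)
        total += -math.log(p[target])  # cross-entropy against pseudo-label
        used += 1
    return total / max(used, 1)

# Two unlabeled samples: only the first weak prediction clears the threshold.
loss = consistency_loss([[5.0, 0.0], [0.1, 0.0]], [[4.0, 0.0], [0.0, 0.1]])
```

The threshold acts as the consistency-regularization gate: unlabeled samples only start contributing once the model becomes confident about them under weak augmentation.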
16
Berenguer AD, Kvasnytsia M, Bossa MN, Mukherjee T, Deligiannis N, Sahli H. Semi-supervised medical image classification via distance correlation minimization and graph attention regularization. Med Image Anal 2024; 94:103107. [PMID: 38401269 DOI: 10.1016/j.media.2024.103107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 12/11/2023] [Accepted: 02/13/2024] [Indexed: 02/26/2024]
Abstract
We propose a novel semi-supervised learning method to leverage unlabeled data alongside minimal annotated data and improve medical imaging classification performance in realistic scenarios with limited labeling budgets for data annotation. Our method introduces distance correlation to minimize correlations between feature representations from different views of the same image encoded with non-coupled deep neural network architectures. In addition, it incorporates a data-driven graph-attention based regularization strategy to model affinities among images within the unlabeled data by exploiting their inherent relational information in the feature space. We conduct extensive experiments on four medical imaging benchmark data sets involving X-ray, dermoscopic, magnetic resonance, and computed tomography imaging in single- and multi-label medical imaging classification scenarios. Our experiments demonstrate the effectiveness of our method in achieving very competitive performance and outperforming several state-of-the-art semi-supervised learning methods. Furthermore, they confirm the suitability of distance correlation as a versatile dependence measure and the benefits of the proposed graph-attention based regularization for semi-supervised learning in medical imaging analysis.
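Distance correlation, the dependence measure this method minimizes, has a simple empirical estimator built from double-centered pairwise-distance matrices. A small pure-Python sketch for 1-D samples (the paper applies the same statistic to high-dimensional feature representations):

```python
def dist_matrix(xs):
    """Pairwise absolute distances."""
    n = len(xs)
    return [[abs(xs[i] - xs[j]) for j in range(n)] for i in range(n)]

def double_center(d):
    """Subtract row and column means, add back the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - col[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(xs, ys):
    """Empirical distance correlation of two 1-D samples: 0 indicates
    independence (in the large-sample limit), 1 indicates an exact
    affine relationship."""
    n = len(xs)
    A = double_center(dist_matrix(xs))
    B = double_center(dist_matrix(ys))
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dcov2 = max(dcov2, 0.0)  # guard against tiny negative rounding error
    dvarx = sum(a * a for r in A for a in r) / n**2
    dvary = sum(b * b for r in B for b in r) / n**2
    if dvarx * dvary == 0:
        return 0.0
    return (dcov2 / (dvarx * dvary) ** 0.5) ** 0.5
```

Minimizing this quantity between two views' feature representations, as the abstract describes, pushes the non-coupled encoders toward capturing complementary rather than redundant information.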
Affiliation(s)
- Abel Díaz Berenguer
- Vrije Universiteit Brussel (VUB), Department of Electronics and Informatics (ETRO), Pleinlaan 2, 1050 Brussels, Belgium
- Maryna Kvasnytsia
- Vrije Universiteit Brussel (VUB), Department of Electronics and Informatics (ETRO), Pleinlaan 2, 1050 Brussels, Belgium
- Matías Nicolás Bossa
- Vrije Universiteit Brussel (VUB), Department of Electronics and Informatics (ETRO), Pleinlaan 2, 1050 Brussels, Belgium
- Tanmoy Mukherjee
- Vrije Universiteit Brussel (VUB), Department of Electronics and Informatics (ETRO), Pleinlaan 2, 1050 Brussels, Belgium
- Nikos Deligiannis
- Vrije Universiteit Brussel (VUB), Department of Electronics and Informatics (ETRO), Pleinlaan 2, 1050 Brussels, Belgium; Interuniversity Microelectronics Centre (IMEC), Kapeldreef 75, 3001 Heverlee, Belgium
- Hichem Sahli
- Vrije Universiteit Brussel (VUB), Department of Electronics and Informatics (ETRO), Pleinlaan 2, 1050 Brussels, Belgium; Interuniversity Microelectronics Centre (IMEC), Kapeldreef 75, 3001 Heverlee, Belgium

17
Shakya KS, Alavi A, Porteous J, K P, Laddi A, Jaiswal M. A Critical Analysis of Deep Semi-Supervised Learning Approaches for Enhanced Medical Image Classification. INFORMATION 2024; 15:246. [DOI: 10.3390/info15050246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2025] Open
Abstract
Deep semi-supervised learning (DSSL) is a machine learning paradigm that blends supervised and unsupervised learning techniques to improve the performance of various models in computer vision tasks. Medical image classification plays a crucial role in disease diagnosis, treatment planning, and patient care. However, obtaining labeled medical image data is often expensive and time-consuming for medical practitioners, leading to limited labeled datasets. DSSL techniques aim to address this challenge, particularly in various medical image tasks, to improve model generalization and performance. DSSL models leverage both the labeled information, which provides explicit supervision, and the unlabeled data, which can provide additional information about the underlying data distribution. This offers a practical solution to the resource-intensive demands of data annotation and enhances the model's ability to generalize across diverse and previously unseen data landscapes. The present study provides a critical review of various DSSL approaches and their effectiveness and challenges in enhancing medical image classification tasks. The study categorizes DSSL techniques into six classes: consistency regularization methods, deep adversarial methods, pseudo-learning methods, graph-based methods, multi-label methods, and hybrid methods. Further, a comparative analysis of the performance of the six considered methods is conducted using existing studies. The referenced studies have employed metrics such as accuracy, sensitivity, specificity, AUC-ROC, and F1 score to evaluate the performance of DSSL methods on different medical image datasets. Additionally, challenges of the datasets, such as heterogeneity, limited labeled data, and model interpretability, are discussed and highlighted in the context of DSSL for medical image classification.
The current review provides future directions and considerations for researchers to further address these challenges and take full advantage of such methods in clinical practice.
Affiliation(s)
- Kaushlesh Singh Shakya
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad 201002, India
- CSIR-Central Scientific Instruments Organisation, Chandigarh 160030, India
- School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia
- Azadeh Alavi
- School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia
- Julie Porteous
- School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia
- Priti K
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad 201002, India
- CSIR-Central Scientific Instruments Organisation, Chandigarh 160030, India
- Amit Laddi
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad 201002, India
- CSIR-Central Scientific Instruments Organisation, Chandigarh 160030, India
- Manojkumar Jaiswal
- Oral Health Sciences Centre, Post Graduate Institute of Medical Education & Research (PGIMER), Chandigarh 160012, India

18
Qu A, Wu Q, Wang J, Yu L, Li J, Liu J. TNCB: Tri-Net With Cross-Balanced Pseudo Supervision for Class Imbalanced Medical Image Classification. IEEE J Biomed Health Inform 2024; 28:2187-2198. [PMID: 38329849 DOI: 10.1109/jbhi.2024.3362243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/10/2024]
Abstract
In clinical settings, the implementation of deep neural networks is impeded by the prevalent problems of label scarcity and class imbalance in medical images. To mitigate the need for labeled data, semi-supervised learning (SSL) has gained traction. However, existing SSL schemes exhibit certain limitations. 1) They commonly fail to address the class imbalance problem. Training with imbalanced data biases the model's predictions towards majority classes, introducing prediction bias. 2) They usually suffer from training bias arising from unreasonable training strategies, such as strong coupling between the generation and utilization of pseudo labels. To address these problems, we propose a novel SSL framework called Tri-Net with Cross-Balanced pseudo supervision (TNCB). Specifically, two student networks focusing on different learning tasks and a teacher network equipped with an adaptive balancer are designed. This design enables the teacher model to focus more on minority classes, thereby reducing prediction bias. Additionally, we propose a virtual optimization strategy to further enhance the teacher model's resistance to class imbalance. Finally, to fully exploit valuable knowledge from unlabeled images, we employ cross-balanced pseudo supervision, where an adaptive cross loss function is introduced to reduce training bias. Extensive evaluation on four datasets with different diseases, image modalities, and imbalance ratios consistently demonstrates the superior performance of TNCB over state-of-the-art SSL methods. These results indicate the effectiveness and robustness of TNCB in addressing imbalanced medical image classification challenges.
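An adaptive balancer of the kind described above can be approximated by re-weighting pseudo-labels inversely to how often each class appears. The sketch below uses the common "effective number of samples" heuristic as a stand-in; it is a generic illustration, not TNCB's actual balancer:

```python
from collections import Counter

def class_balanced_weights(pseudo_labels, num_classes, beta=0.999):
    """Per-class loss weights that shrink as a class accumulates
    pseudo-labels, so majority classes stop dominating the
    unsupervised loss."""
    counts = Counter(pseudo_labels)
    weights = {}
    for c in range(num_classes):
        n = counts.get(c, 0)
        # Effective number of samples: (1 - beta^n) / (1 - beta).
        effective = (1.0 - beta ** n) / (1.0 - beta) if n > 0 else 1.0
        weights[c] = 1.0 / effective
    return weights

# 90 pseudo-labels for class 0 vs. 10 for class 1: class 1 is up-weighted.
w = class_balanced_weights([0] * 90 + [1] * 10, num_classes=2)
```

Multiplying each pseudo-labeled sample's loss by its class weight counteracts the prediction bias toward majority classes that the abstract identifies.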
19
Zeng H, Zhou K, Ge S, Gao Y, Zhao J, Gao S, Zheng R. Anatomical Prior and Inter-Slice Consistency for Semi-Supervised Vertebral Structure Detection in 3D Ultrasound Volume. IEEE J Biomed Health Inform 2024; 28:2211-2222. [PMID: 38289848 DOI: 10.1109/jbhi.2024.3360102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2024]
Abstract
Three-dimensional (3D) ultrasound imaging has been applied to scoliosis assessment, but the current assessment method uses only coronal projection images and cannot illustrate the 3D deformity and vertebra rotation. Vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a framework comprising a detector and a classifier, to detect vertebral structures in 3D ultrasound volumes. The detector network finds the potential positions of structures on transverse slices globally, and local patches are then cropped based on the detected positions. The classifier distinguishes whether the patches contain real vertebral structures and screens the predicted positions from the detector. VertMatch utilizes unlabeled data in a semi-supervised manner, and we develop two novel techniques for semi-supervised learning: 1) anatomical priors are used to acquire high-quality pseudo labels; 2) inter-slice consistency is used to exploit more unlabeled data by inputting multiple adjacent slices. Experimental results demonstrate that VertMatch detects vertebrae accurately in ultrasound volumes and outperforms state-of-the-art methods. Moreover, VertMatch is also validated for automatic spinous process angle measurement on forty subjects with scoliosis, and the results illustrate that it can be a promising approach for the 3D assessment of scoliosis.
20
Xue Y, Zhang D, Jia L, Yang W, Zhao J, Qiang Y, Wang L, Qiao Y, Yue H. Integrating image and gene-data with a semi-supervised attention model for prediction of KRAS gene mutation status in non-small cell lung cancer. PLoS One 2024; 19:e0297331. [PMID: 38466735 PMCID: PMC10927133 DOI: 10.1371/journal.pone.0297331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Accepted: 01/03/2024] [Indexed: 03/13/2024] Open
Abstract
KRAS is a pathogenic gene frequently implicated in non-small cell lung cancer (NSCLC). However, biopsy as a diagnostic method has practical limitations. Therefore, it is important to accurately determine the mutation status of the KRAS gene non-invasively by combining NSCLC CT images and genetic data for early diagnosis and subsequent targeted therapy of patients. This paper proposes a Semi-supervised Multimodal Multiscale Attention Model (S2MMAM). S2MMAM comprises a Supervised Multilevel Fusion Segmentation Network (SMF-SN) and a Semi-supervised Multimodal Fusion Classification Network (S2MF-CN). S2MMAM facilitates the execution of the classification task by transferring the useful information captured in SMF-SN to the S2MF-CN to improve the model prediction accuracy. In SMF-SN, we propose a Triple Attention-guided Feature Aggregation module for obtaining segmentation features that incorporate high-level semantic abstract features and low-level semantic detail features. Segmentation features provide pre-guidance and key information expansion for S2MF-CN. S2MF-CN shares the encoder and decoder parameters of SMF-SN, which enables S2MF-CN to obtain rich classification features. S2MF-CN uses the proposed Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module to first guide segmentation and classification feature fusion to extract hidden multi-scale contextual information. I2MGAF then guides the multidimensional fusion of genetic data and CT image data to compensate for the lack of information in single modality data. S2MMAM achieved 83.27% AUC and 81.67% accuracy in predicting KRAS gene mutation status in NSCLC. This method uses medical image CT and genetic data to effectively improve the accuracy of predicting KRAS gene mutation status in NSCLC.
Affiliation(s)
- Yuting Xue
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Dongxu Zhang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Liye Jia
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Wanting Yang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Juanjuan Zhao
- School of Software, Taiyuan University of Technology, Taiyuan, Shanxi, China
- College of Information, Jinzhong College of Information, Taiyuan, Shanxi, China
- Yan Qiang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Long Wang
- College of Information, Jinzhong College of Information, Taiyuan, Shanxi, China
- Ying Qiao
- First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
- Huajie Yue
- First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China

21
Gao X, Jiang B, Wang X, Huang L, Tu Z. Chest x-ray diagnosis via spatial-channel high-order attention representation learning. Phys Med Biol 2024; 69:045026. [PMID: 38347732 DOI: 10.1088/1361-6560/ad2014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Accepted: 01/18/2024] [Indexed: 02/15/2024]
Abstract
Objective. Chest x-ray image representation and learning is an important problem in the computer-aided diagnosis area. Existing methods usually adopt CNNs or Transformers for feature representation learning and focus on learning effective representations for chest x-ray images. Although these methods obtain good performance, they remain limited, mainly because they ignore the correlations among channels and pay little attention to local context-aware feature representation of the chest x-ray image. Approach. To address these problems, in this paper, we propose a novel spatial-channel high-order attention model (SCHA) for chest x-ray image representation and diagnosis. The proposed network architecture mainly contains three modules, i.e. CEBN, SHAM and CHAM. To be specific, we first introduce a context-enhanced backbone network that employs multi-head self-attention to extract initial features for the input chest x-ray images. Then, we develop a novel SCHA which contains both spatial and channel high-order attention learning branches. For the spatial branch, we develop a novel local biased self-attention mechanism which can capture both local and long-range global dependences of positions to learn rich context-aware representations. For the channel branch, we employ Brownian Distance Covariance to encode the correlation information of channels and regard it as the image representation. Finally, the two learning branches are integrated for the final multi-label diagnosis classification and prediction. Main results. Experiments on the commonly used datasets ChestX-ray14 and CheXpert demonstrate that our proposed SCHA approach obtains better performance than many related approaches. Significance. This study obtains a more discriminative method for chest x-ray classification and provides a technique for computer-aided diagnosis.
Affiliation(s)
- Xinyue Gao
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Bo Jiang
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Xixi Wang
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Lili Huang
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Zhengzheng Tu
- The School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China

22
Fan L, Gong X, Zheng C, Li J. Data pyramid structure for optimizing EUS-based GISTs diagnosis in multi-center analysis with missing label. Comput Biol Med 2024; 169:107897. [PMID: 38171262 DOI: 10.1016/j.compbiomed.2023.107897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Revised: 12/04/2023] [Accepted: 12/23/2023] [Indexed: 01/05/2024]
Abstract
This study introduces the Data Pyramid Structure (DPS) to address data sparsity and missing labels in medical image analysis. The DPS optimizes multi-task learning and enables sustainable expansion of multi-center data analysis. Specifically, it facilitates attribute prediction and malignant tumor diagnosis tasks by implementing a segmentation and aggregation strategy on data with absent attribute labels. To leverage multi-center data, we propose the Unified Ensemble Learning Framework (UELF) and the Unified Federated Learning Framework (UFLF), which incorporate strategies for data transfer and incremental learning in scenarios with missing labels. The proposed method was evaluated on a challenging EUS patient dataset from five centers, achieving promising diagnostic performance. The average accuracy was 0.984 with an AUC of 0.927 for multi-center analysis, surpassing state-of-the-art approaches. The interpretability of the predictions further highlights the potential clinical relevance of our method.
Affiliation(s)
- Lin Fan
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, China
- Xun Gong
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, China
- Cenyang Zheng
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, China
- Jiao Li
- Department of Gastroenterology, The Third People's Hospital of Chengdu, Affiliated Hospital of Southwest Jiaotong University, Chengdu 610031, China

23
Feng W, Huang Q, Ma T, Ju L, Ge Z, Chen Y, Zhao P. Development and validation of a semi-supervised deep learning model for automatic retinopathy of prematurity staging. iScience 2024; 27:108516. [PMID: 38269093 PMCID: PMC10805639 DOI: 10.1016/j.isci.2023.108516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Revised: 07/03/2023] [Accepted: 11/20/2023] [Indexed: 01/26/2024] Open
Abstract
Retinopathy of prematurity (ROP) is currently one of the leading causes of infant blindness worldwide. Recently significant progress has been made in deep learning-based computer-aided diagnostic methods. However, deep learning often requires a large amount of annotated data for model optimization, but this requires long hours of effort by experienced doctors in clinical scenarios. In contrast, a large number of unlabeled images are relatively easy to obtain. In this paper, we propose a new semi-supervised learning framework to reduce annotation costs for automatic ROP staging. We design two consistency regularization strategies, prediction consistency loss and semantic structure consistency loss, which can help the model mine useful discriminative information from unlabeled data, thus improving the generalization performance of the classification model. Extensive experiments on a real clinical dataset show that the proposed method promises to greatly reduce the labeling requirements in clinical scenarios while achieving good classification performance.
Affiliation(s)
- Wei Feng
- Beijing Airdoc Technology Co., Ltd, Beijing 100089, China
- Faculty of Engineering, Monash University, Melbourne, VIC 3000, Australia
- Qiujing Huang
- Department of Ophthalmology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200092, China
- Department of Ophthalmology, Rainbow Children's Clinic, Shanghai 200010, China
- Tong Ma
- Beijing Airdoc Technology Co., Ltd, Beijing 100089, China
- Lie Ju
- Beijing Airdoc Technology Co., Ltd, Beijing 100089, China
- Faculty of Engineering, Monash University, Melbourne, VIC 3000, Australia
- Zongyuan Ge
- Faculty of Engineering, Monash University, Melbourne, VIC 3000, Australia
- Yuzhong Chen
- Beijing Airdoc Technology Co., Ltd, Beijing 100089, China
- Peiquan Zhao
- Department of Ophthalmology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200092, China

24
Li Z, Li Y, Li Q, Wang P, Guo D, Lu L, Jin D, Zhang Y, Hong Q. LViT: Language Meets Vision Transformer in Medical Image Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:96-107. [PMID: 37399157 DOI: 10.1109/tmi.2023.3291719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/05/2023]
Abstract
Deep learning has been widely used in medical image segmentation and other aspects. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining sufficient high-quality labeled data due to the prohibitive data annotation cost. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our LViT model, medical text annotation is incorporated to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of pseudo labels of improved quality in semi-supervised learning. We also propose an Exponential Pseudo label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-rays and CT images. Experimental results show that our proposed LViT has superior segmentation performance in both fully-supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT.
25
Xie Y, Zhang J, Liu L, Wang H, Ye Y, Verjans J, Xia Y. ReFs: A hybrid pre-training paradigm for 3D medical image segmentation. Med Image Anal 2024; 91:103023. [PMID: 37956551 DOI: 10.1016/j.media.2023.103023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Revised: 10/31/2023] [Accepted: 11/02/2023] [Indexed: 11/15/2023]
Abstract
Self-supervised learning (SSL) has achieved remarkable progress in medical image segmentation. The application of an SSL algorithm often follows a two-stage training process: using unlabeled data to perform label-free representation learning, and fine-tuning the pre-trained model on the downstream tasks. One issue with this paradigm is that the SSL step is unaware of the downstream task, which may lead to sub-optimal feature representations for the target task. In this paper, we propose a hybrid pre-training paradigm that is driven by both self-supervised and supervised objectives. To achieve this, a supervised reference task is involved in self-supervised learning, aiming to improve the representation quality. Specifically, we employ the off-the-shelf medical image segmentation task as the reference, and encourage learning a representation that (1) incurs low prediction loss on both the SSL and reference tasks and (2) leads to similar gradients when updating the feature extractor from either task. In this way, the reference task pilots SSL in a direction beneficial to the downstream segmentation. To this end, we propose a simple but effective gradient matching method to optimize the model towards a consistent direction, thus improving the compatibility of the SSL and supervised reference tasks. We call this hybrid pre-training paradigm reference-guided self-supervised learning (ReFs), and perform it on a large-scale unlabeled dataset and an additional reference dataset. The experimental results demonstrate its effectiveness on seven downstream medical image segmentation benchmarks.
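The gradient-matching idea can be illustrated with a small helper that measures how consistently the SSL and reference objectives would update the shared encoder; this is a hedged sketch of the generic technique (the function name is illustrative), not the authors' implementation:

```python
import numpy as np

def gradient_match_score(grads_ssl, grads_ref):
    """Cosine similarity between the flattened parameter gradients induced
    by the SSL objective and by the supervised reference objective; a
    gradient-matching scheme pushes this toward 1 so both tasks move the
    shared encoder in a consistent direction."""
    g1 = np.concatenate([g.reshape(-1) for g in grads_ssl])
    g2 = np.concatenate([g.reshape(-1) for g in grads_ref])
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))
```

In training, such a score (or an equivalent inner-product term) is typically added to the loss so that maximizing agreement between the two gradients becomes part of the optimization objective.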
Affiliation(s)
- Jianpeng Zhang
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Hu Wang
- University of Adelaide, Australia
- Yiwen Ye
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Yong Xia
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China.
26
Wang C, Chen Y, Liu F, Elliott M, Kwok CF, Pena-Solorzano C, Frazer H, McCarthy DJ, Carneiro G. An Interpretable and Accurate Deep-Learning Diagnosis Framework Modeled With Fully and Semi-Supervised Reciprocal Learning. IEEE Transactions on Medical Imaging 2024; 43:392-404. [PMID: 37603481] [DOI: 10.1109/tmi.2023.3306781]
Abstract
The deployment of automated deep-learning classifiers in clinical practice has the potential to streamline the diagnosis process and improve diagnostic accuracy, but the acceptance of such classifiers relies on both their accuracy and interpretability. In general, accurate deep-learning classifiers provide little model interpretability, while interpretable models do not have competitive classification accuracy. In this paper, we introduce a new deep-learning diagnosis framework, called InterNRL, that is designed to be highly accurate and interpretable. InterNRL consists of a student-teacher framework, where the student model is an interpretable prototype-based classifier (ProtoPNet) and the teacher is an accurate global image classifier (GlobalNet). The two classifiers are mutually optimised with a novel reciprocal learning paradigm, in which the student ProtoPNet learns from optimal pseudo labels produced by the teacher GlobalNet, while GlobalNet learns from ProtoPNet's classification performance and pseudo labels. This reciprocal learning paradigm enables InterNRL to be flexibly optimised under both fully- and semi-supervised learning scenarios, reaching state-of-the-art classification performance in both scenarios for the tasks of breast cancer and retinal disease diagnosis. Moreover, relying on weakly-labelled training images, InterNRL also achieves superior breast cancer localisation and brain tumour segmentation results compared with other competing methods.
27
Xiang H, Shen J, Yan Q, Xu M, Shi X, Zhu X. Multi-scale representation attention based deep multiple instance learning for gigapixel whole slide image analysis. Med Image Anal 2023; 89:102890. [PMID: 37467642] [DOI: 10.1016/j.media.2023.102890]
Abstract
Recently, convolutional neural networks (CNNs) that directly use whole slide images (WSIs) for tumor diagnosis and analysis have attracted considerable attention, because they utilize only the slide-level label for model training without any additional annotations. However, directly handling gigapixel WSIs remains challenging, due to the billions of pixels and the internal variations within each WSI. To overcome this problem, in this paper, we propose a novel end-to-end interpretable deep MIL framework for WSI analysis, which uses a two-branch deep neural network and a multi-scale representation attention mechanism to directly extract features from all patches of each WSI. Specifically, we first divide each WSI into bag-, patch- and cell-level images, and then assign the slide-level label to its corresponding bag-level images, so that WSI classification becomes a MIL problem. Additionally, we design a novel multi-scale representation attention mechanism and embed it into a two-branch deep network to simultaneously mine the bag with a correct label, the significant patches, and their cell-level information. Extensive experiments demonstrate the superior performance of the proposed framework over recent state-of-the-art methods, in terms of classification accuracy and model interpretability. All source codes are released at: https://github.com/xhangchen/MRAN/.
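A common building block behind such frameworks is attention-based MIL pooling, which scores each patch and aggregates a weighted bag embedding. A generic sketch in the style of standard attention MIL (parameter names are illustrative; this is not the paper's exact multi-scale module):

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    """Attention-based MIL pooling: score each instance with a small
    tanh-gated projection, softmax the scores over the bag, and return
    the attention-weighted bag embedding plus the weights themselves."""
    scores = np.tanh(instances @ V) @ w        # (n_instances,)
    a = np.exp(scores - scores.max())          # numerically stable softmax
    a /= a.sum()
    return a @ instances, a                    # bag embedding, attention
```

The returned attention weights double as an interpretability signal: high-weight patches are the ones driving the slide-level prediction.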
Affiliation(s)
- Hangchen Xiang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Junyi Shen
- Division of Liver Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, 610044, China
- Qingguo Yan
- Department of Pathology Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, School of Medicine, Northwest University, 229 Taibai North Road, Xi'an 710069, China
- Meilian Xu
- School of Electronic Information and Artificial Intelligence, Leshan Normal University, Leshan, 614000, China.
- Xiaoshuang Shi
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Xiaofeng Zhu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
28
Zhao S, Wang J, Wang X, Wang Y, Zheng H, Chen B, Zeng A, Wei F, Al-Kindi S, Li S. Attractive deep morphology-aware active contour network for vertebral body contour extraction with extensions to heterogeneous and semi-supervised scenarios. Med Image Anal 2023; 89:102906. [PMID: 37499333] [DOI: 10.1016/j.media.2023.102906]
Abstract
Automatic vertebral body contour extraction (AVBCE) from heterogeneous spinal MRI is indispensable for the comprehensive diagnosis and treatment of spinal diseases. However, AVBCE is challenging due to data heterogeneity, the complexity of image characteristics, and variations in vertebral body morphology, which may cause morphology errors in semantic segmentation. Deep active contour-based (deep ACM-based) methods provide a promising complement for tackling morphology errors by directly parameterizing the contour coordinates. Extending the target contours' capture range and providing morphology-aware parameter maps are crucial for deep ACM-based methods. For this purpose, we propose a novel Attractive Deep Morphology-aware actIve contouR nEtwork (ADMIRE) that embeds an elaborated contour attraction term (CAT) and a comprehensive contour quality (CCQ) loss into the deep ACM-based framework. The CAT adaptively extends the target contours' capture range by designing an all-to-all force field that enables the target contours' energy to contribute to farther locations. Furthermore, the CCQ loss is carefully designed to generate morphology-aware active contour parameters by simultaneously supervising the contour shape, tension, and smoothness. These designs, in cooperation with the deep ACM-based framework, enable robustness to data heterogeneity, image characteristic complexity, and target contour morphology variations. Furthermore, the deep ACM-based ADMIRE cooperates well with semi-supervised strategies such as mean teacher, enabling it to function in semi-supervised scenarios. ADMIRE is trained and evaluated on four challenging datasets, including three spinal datasets with more than 1,000 heterogeneous images and more than 10,000 vertebral bodies, as well as a cardiac dataset with both normal and pathological cases. Results show that ADMIRE achieves state-of-the-art performance on all datasets, demonstrating its accuracy, robustness, and generalization ability.
Affiliation(s)
- Shen Zhao
- Department of Artificial Intelligence, Sun Yat-sen University, Guangzhou 510006, China
- Jinhong Wang
- Department of Artificial Intelligence, Sun Yat-sen University, Guangzhou 510006, China
- Xinxin Wang
- Department of Artificial Intelligence, Sun Yat-sen University, Guangzhou 510006, China
- Yikang Wang
- Department of Artificial Intelligence, Sun Yat-sen University, Guangzhou 510006, China
- Hanying Zheng
- Department of Artificial Intelligence, Sun Yat-sen University, Guangzhou 510006, China
- Bin Chen
- Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Zhejiang, China.
- An Zeng
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
- Fuxin Wei
- Department of Orthopedics, the Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen, China
- Sadeer Al-Kindi
- School of Medicine, Case Western Reserve University, Cleveland, USA
- Shuo Li
- School of Medicine, Case Western Reserve University, Cleveland, USA
29
Huang Z, Wu J, Wang T, Li Z, Ioannou A. Class-Specific Distribution Alignment for semi-supervised medical image classification. Comput Biol Med 2023; 164:107280. [PMID: 37517324] [DOI: 10.1016/j.compbiomed.2023.107280]
Abstract
Despite the success of deep neural networks in medical image classification, the problem remains challenging because data annotation is time-consuming and the class distribution is imbalanced due to the relative scarcity of diseases. To address this problem, we propose Class-Specific Distribution Alignment (CSDA), a semi-supervised learning framework based on self-training that is suitable for learning from highly imbalanced datasets. Specifically, we first provide a new perspective on distribution alignment by considering the process as a change of basis in the vector space spanned by marginal predictions, and then derive CSDA to capture class-dependent marginal predictions on both labeled and unlabeled data, in order to avoid bias towards majority classes. Furthermore, we propose a Variable Condition Queue (VCQ) module to maintain a proportionately balanced number of unlabeled samples for each class. Experiments on three public datasets (HAM10000, CheXpert, and Kvasir) show that our method provides competitive performance on semi-supervised skin disease, thoracic disease, and endoscopic image classification tasks.
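The distribution-alignment step that CSDA generalizes can be sketched as the classic reweighting of model predictions by the ratio of a target class distribution to the model's running marginal (variable names are illustrative; CSDA's class-specific, change-of-basis variant is more involved):

```python
import numpy as np

def distribution_align(probs, target_dist, model_dist):
    """Classic distribution alignment: rescale per-sample class
    probabilities by target/model marginal ratios, then renormalize
    each row so it is again a valid distribution."""
    aligned = probs * (target_dist / model_dist)
    return aligned / aligned.sum(axis=-1, keepdims=True)
```

With an imbalanced model marginal, this boosts minority-class probabilities before pseudo-labels are thresholded, counteracting the bias toward majority classes.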
Affiliation(s)
- Zhongzheng Huang
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China; College of Computer and Data Science, Fuzhou University, Fuzhou, China
- Jiawei Wu
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China; College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou, China
- Tao Wang
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China; International Digital Economy College, Minjiang University, Fuzhou, China.
- Zuoyong Li
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China.
- Anastasia Ioannou
- International Digital Economy College, Minjiang University, Fuzhou, China; Department of Computer Science and Engineering, European University Cyprus, Nicosia, Cyprus
30
Shen N, Xu T, Bian Z, Huang S, Mu F, Huang B, Xiao Y, Li J. SCANet: A Unified Semi-Supervised Learning Framework for Vessel Segmentation. IEEE Transactions on Medical Imaging 2023; 42:2476-2489. [PMID: 35862338] [DOI: 10.1109/tmi.2022.3193150]
Abstract
Automatic subcutaneous vessel imaging with near-infrared (NIR) optical apparatus can improve the accuracy of locating blood vessels, thus contributing significantly to clinical venipuncture research. Though deep learning models have achieved remarkable success in medical image segmentation, they still struggle in the subfield of subcutaneous vessel segmentation due to the scarcity and low quality of annotated data. To alleviate this, this work presents a novel semi-supervised learning framework, SCANet, that achieves accurate vessel segmentation through an alternate training strategy. SCANet is composed of a multi-scale recurrent neural network that embeds coarse-to-fine features and two auxiliary branches, a consistency decoder and an adversarial learning branch, responsible for strengthening fine-grained details and eliminating differences between ground truths and predictions, respectively. Equipped with a novel semi-supervised alternate training strategy, the three components work collaboratively, enabling SCANet to accurately segment vessel regions with only a handful of labeled data and abundant unlabeled data. Moreover, to mitigate the shortage of annotated data in this field, we provide a new subcutaneous vessel dataset, VESSEL-NIR. Extensive experiments on a wide variety of tasks, including the segmentation of subcutaneous vessels, retinal vessels, and skin lesions, demonstrate the superiority and generality of our approach.
31
FixMatch-LS: Semi-supervised skin lesion classification with label smoothing. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104709]
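No abstract is indexed for this entry, but the title names a concrete combination: FixMatch-style confidence-thresholded pseudo-labeling with label smoothing applied to the pseudo-label targets. A hedged NumPy sketch of that combination (the threshold τ and smoothing factor ε are illustrative, not taken from the paper):

```python
import numpy as np

def fixmatch_ls_targets(weak_probs, num_classes, tau=0.95, eps=0.1):
    """From weakly-augmented predictions, build smoothed one-hot pseudo-label
    targets and a confidence mask: only samples whose max probability
    reaches tau contribute to the unlabeled loss."""
    conf = weak_probs.max(axis=-1)
    hard = weak_probs.argmax(axis=-1)
    onehot = np.eye(num_classes)[hard]
    smoothed = (1 - eps) * onehot + eps / num_classes
    mask = conf >= tau
    return smoothed, mask
```

The smoothed targets would then supervise predictions on strongly-augmented views of the same images, with the mask zeroing out low-confidence samples.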
32
Azizi S, Culp L, Freyberg J, Mustafa B, Baur S, Kornblith S, Chen T, Tomasev N, Mitrović J, Strachan P, Mahdavi SS, Wulczyn E, Babenko B, Walker M, Loh A, Chen PHC, Liu Y, Bavishi P, McKinney SM, Winkens J, Roy AG, Beaver Z, Ryan F, Krogue J, Etemadi M, Telang U, Liu Y, Peng L, Corrado GS, Webster DR, Fleet D, Hinton G, Houlsby N, Karthikesalingam A, Norouzi M, Natarajan V. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat Biomed Eng 2023. [PMID: 37291435] [DOI: 10.1038/s41551-023-01049-7]
Abstract
Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such 'out-of-distribution' performance problems and improves model robustness and training efficiency. The strategy, which we named REMEDIS (for 'Robust and Efficient Medical Imaging with Self-supervision'), combines large-scale supervised transfer learning on natural images with intermediate contrastive self-supervised learning on medical images, and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies by up to 11.5% relative to strong supervised baseline models, and in out-of-distribution settings required only 1-33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.
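The intermediate contrastive stage follows SimCLR-style training on two augmented views of each image. A minimal NumPy sketch of the NT-Xent objective that this family of methods is built on (temperature and shapes are illustrative; a real implementation runs on framework tensors with autodiff):

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.1):
    """SimCLR-style NT-Xent loss: for 2n embeddings (two views of n images),
    each view's positive is its counterpart; all other embeddings are
    negatives in a softmax over cosine similarities."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
    sim = z @ z.T / temperature
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_denom = np.log(np.exp(sim).sum(axis=1))
    pos = sim[np.arange(2 * n), targets]
    return float(np.mean(log_denom - pos))
```

Minimizing this pulls the two views of each image together while pushing apart embeddings of different images.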
Affiliation(s)
- Ting Chen
- Google Research, Mountain View, CA, USA
- Aaron Loh
- Google Research, Mountain View, CA, USA
- Yuan Liu
- Google Research, Mountain View, CA, USA
- Fiona Ryan
- Georgia Institute of Technology, Computer Science, Atlanta, GA, USA
- Mozziyar Etemadi
- School of Medicine/School of Engineering, Northwestern University, Chicago, IL, USA
- Yun Liu
- Google Research, Mountain View, CA, USA
- Lily Peng
- Google Research, Mountain View, CA, USA
33
Park S, Ye JC, Lee ES, Cho G, Yoon JW, Choi JH, Joo I, Lee YJ. Deep Learning-Enabled Detection of Pneumoperitoneum in Supine and Erect Abdominal Radiography: Modeling Using Transfer Learning and Semi-Supervised Learning. Korean J Radiol 2023; 24:541-552. [PMID: 37271208] [DOI: 10.3348/kjr.2022.1032]
Abstract
OBJECTIVE Detection of pneumoperitoneum using abdominal radiography, particularly in the supine position, is often challenging. This study aimed to develop and externally validate a deep learning model for the detection of pneumoperitoneum using supine and erect abdominal radiography. MATERIALS AND METHODS A model that can utilize "pneumoperitoneum" and "non-pneumoperitoneum" classes was developed through knowledge distillation. To train the proposed model with limited training data and weak labels, it was trained using a recently proposed semi-supervised learning method called distillation for self-supervised and self-train learning (DISTL), which leverages the Vision Transformer. The proposed model was first pre-trained on chest radiographs to utilize common knowledge between modalities, then fine-tuned and self-trained on labeled and unlabeled abdominal radiographs. The proposed model was trained using data from supine and erect abdominal radiographs. In total, 191,212 chest radiographs (CheXpert data) were used for pre-training, and 5,518 labeled and 16,671 unlabeled abdominal radiographs were used for fine-tuning and self-supervised learning, respectively. The proposed model was internally validated on 389 abdominal radiographs and externally validated on 475 and 798 abdominal radiographs from two institutions. We evaluated the performance in diagnosing pneumoperitoneum using the area under the receiver operating characteristic curve (AUC) and compared it with that of radiologists. RESULTS In the internal validation, the proposed model had an AUC, sensitivity, and specificity of 0.881, 85.4%, and 73.3% for the supine position and 0.968, 91.1%, and 95.0% for the erect position, respectively. In the external validation at the two institutions, the AUCs were 0.835 and 0.852 for the supine position and 0.909 and 0.944 for the erect position. In the reader study, the readers' performances improved with the assistance of the proposed model.
CONCLUSION The proposed model trained with the DISTL method can accurately detect pneumoperitoneum on abdominal radiography in both the supine and erect positions.
Affiliation(s)
- Sangjoon Park
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea
- Jong Chul Ye
- Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology, Daejeon, Korea.
- Eun Sun Lee
- Department of Radiology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Korea
- Biomedical Research Institute, Chung-Ang University Hospital, Seoul, Korea.
- Gyeongme Cho
- Department of Radiology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Korea
- Jin Woo Yoon
- Department of Radiology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Korea
- Joo Hyeok Choi
- Department of Radiology, Chung-Ang University Hospital, Chung-Ang University College of Medicine, Seoul, Korea
- Ijin Joo
- Department of Radiology, Seoul National University Hospital, Seoul, Korea
- Yoon Jin Lee
- Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Korea
34
Tsuji T, Hirata Y, Kusunose K, Sata M, Kumagai S, Shiraishi K, Kotoku J. Classification of chest X-ray images by incorporation of medical domain knowledge into operation branch networks. BMC Med Imaging 2023; 23:62. [PMID: 37161392] [PMCID: PMC10169130] [DOI: 10.1186/s12880-023-01019-0]
Abstract
BACKGROUND This study was conducted to alleviate a common difficulty in chest X-ray image diagnosis: the attention region of a convolutional neural network (CNN) often does not match the doctor's point of focus. The method presented herein, which guides the area of attention in a CNN to a medically plausible region, can thereby improve diagnostic capabilities. METHODS The model is based on an attention branch network, which offers excellent interpretability of the classification model. This model has an additional new operation branch that guides the attention region to the lung field and heart in chest X-ray images. We also used three chest X-ray image datasets (Teikyo, Tokushima, and ChestX-ray14) to evaluate the CNN's attention areas of interest in these regions. Additionally, after devising a quantitative method for evaluating the improvement of a CNN's region of interest, we applied it to the evaluation of the proposed model. RESULTS Operation branch networks maintain or improve the area under the curve to a greater degree than conventional CNNs do. Furthermore, the network better emphasizes reasonable anatomical parts in chest X-ray images. CONCLUSIONS The proposed network better emphasizes the reasonable anatomical parts in chest X-ray images. This method can enhance capabilities for image interpretation based on judgment.
Affiliation(s)
- Takumasa Tsuji
- Graduate School of Medical Care and Technology, Teikyo University, 2-11-1 Kaga, Itabashi-Ku, Tokyo, 173-8605, Japan
- Yukina Hirata
- Ultrasound Examination Center, Tokushima University Hospital, 2-50-1, Kuramoto, Tokushima, Japan
- Kenya Kusunose
- Department of Cardiovascular Medicine, Tokushima University Hospital, 2-50-1, Kuramoto, Tokushima, Japan
- Masataka Sata
- Department of Cardiovascular Medicine, Tokushima University Hospital, 2-50-1, Kuramoto, Tokushima, Japan
- Shinobu Kumagai
- Central Radiology Division, Teikyo University Hospital, 2-11-1 Kaga, Itabashi-Ku, Tokyo, 173-8606, Japan
- Kenshiro Shiraishi
- Department of Radiology, Teikyo University School of Medicine, 2-11-1 Kaga, Itabashi-Ku, Tokyo, 173-8605, Japan
- Jun'ichi Kotoku
- Graduate School of Medical Care and Technology, Teikyo University, 2-11-1 Kaga, Itabashi-Ku, Tokyo, 173-8605, Japan.
- Central Radiology Division, Teikyo University Hospital, 2-11-1 Kaga, Itabashi-Ku, Tokyo, 173-8606, Japan.
35
Su L, Wang Z, Shi Y, Li A, Wang M. Local augmentation based consistency learning for semi-supervised pathology image classification. Comput Methods Programs Biomed 2023; 232:107446. [PMID: 36871546] [DOI: 10.1016/j.cmpb.2023.107446]
Abstract
BACKGROUND AND OBJECTIVE Labeling pathology images is often costly and time-consuming, which is quite detrimental to supervised pathology image classification, which relies heavily on sufficient labeled data during training. Exploring semi-supervised methods based on image augmentation and consistency regularization may effectively alleviate this problem. Nevertheless, traditional image-based augmentation (e.g., flipping) produces only a single enhancement of an image, whereas combining multiple image sources may mix unimportant image regions, resulting in poor performance. In addition, the regularization losses used in these augmentation approaches typically enforce consistency of image-level predictions and simply require each prediction of an augmented image to be bilaterally consistent, which may force pathology image features with better predictions to be wrongly aligned towards features with worse predictions. METHODS To tackle these problems, we propose a novel semi-supervised method called Semi-LAC for pathology image classification. Specifically, we first present a local augmentation technique that randomly applies different augmentations to each local pathology patch, which can boost the diversity of pathology images and avoid mixing unimportant regions from other images. Moreover, we further propose a directional consistency loss to enforce consistency of both features and prediction results, thus improving the ability of the network to obtain robust representations and achieve accurate predictions. RESULTS The proposed method is evaluated on the Bioimaging2015 and BACH datasets, and extensive experiments show the superior performance of our Semi-LAC compared with state-of-the-art methods for pathology image classification.
CONCLUSIONS We conclude that the Semi-LAC method can effectively reduce the cost of annotating pathology images and enhance the ability of classification networks to represent pathology images by using local augmentation techniques and the directional consistency loss.
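The local augmentation idea, applying an independently chosen augmentation to each local patch rather than one global transform per image, can be sketched as follows (the operation set and patch size are illustrative, not the paper's exact configuration):

```python
import numpy as np

def local_augment(image, patch=8, rng=None):
    """Apply an independently sampled augmentation (identity, horizontal
    flip, or 180-degree rotation) to each non-overlapping local patch of
    a single image, preserving the image's overall layout."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = image.copy()
    h, w = image.shape[:2]
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tile = out[i:i + patch, j:j + patch]
            op = rng.integers(3)
            if op == 1:
                tile = np.fliplr(tile)
            elif op == 2:
                tile = np.rot90(tile, 2)
            out[i:i + patch, j:j + patch] = tile
    return out
```

Because each patch keeps its own pixels, the augmented image is diverse locally yet never mixes content from other images.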
Affiliation(s)
- Lei Su
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Zhi Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Yi Shi
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China.
36
Yang G, Luo S, Greer P. A Novel Vision Transformer Model for Skin Cancer Classification. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11204-5]
Abstract
Skin cancer can be fatal if it is found to be malignant. Modern diagnosis of skin cancer heavily relies on visual inspection through clinical screening, dermoscopy, or histopathological examinations. However, due to the similarity among cancer types, it is usually challenging to identify the type of skin cancer, especially at its early stages. Deep learning techniques have been developed over the last few years and have achieved success in helping to improve the accuracy of diagnosis and classification. However, the latest deep learning algorithms still do not provide ideal classification accuracy. To further improve classification accuracy, this paper presents a novel method of classifying skin cancer in clinical skin images. The method consists of four blocks. First, class rebalancing is applied to the images of seven skin cancer types for better classification performance. Second, an image is preprocessed by being split into patches of the same size and then flattened into a series of tokens. Third, a transformer encoder is used to process the flattened patches. The transformer encoder consists of N identical layers, each containing two sublayers. Sublayer one is a multihead self-attention unit, and sublayer two is a fully connected feed-forward network unit. For each of the two sublayers, a normalization operation is applied to its input, and a residual connection of its input and its output is calculated. Finally, a classification block is implemented after the transformer encoder. The block consists of a flattening layer and a dense layer with batch normalization. Transfer learning is used to build the whole network: the ImageNet dataset is used to pretrain the network, and the HAM10000 dataset is used to fine-tune it. Experiments have shown that the method achieves a classification accuracy of 94.1%, outperforming the current state-of-the-art model, IRv2 with soft attention, on the same training and testing datasets. The method also shows better performance than baseline models on the Edinburgh DERMOFIT dataset.
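The patch-splitting and flattening step described above is standard ViT preprocessing and can be sketched as:

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles
    and flatten each tile into one token vector, ViT-style."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    tokens = (image.reshape(h // patch, patch, w // patch, patch, c)
                   .transpose(0, 2, 1, 3, 4)       # group by (row, col) of patches
                   .reshape(-1, patch * patch * c))
    return tokens
```

Each token is then linearly projected and combined with a position embedding before entering the transformer encoder.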
37
Gulakala R, Markert B, Stoffel M. Rapid diagnosis of Covid-19 infections by a progressively growing GAN and CNN optimisation. Comput Methods Programs Biomed 2023; 229:107262. [PMID: 36463675] [PMCID: PMC9699959] [DOI: 10.1016/j.cmpb.2022.107262]
Abstract
BACKGROUND AND OBJECTIVE Covid-19 infections are spreading around the globe since December 2019. Several diagnostic methods were developed based on biological investigations and the success of each method depends on the accuracy of identifying Covid infections. However, access to diagnostic tools can be limited, depending on geographic region and the diagnosis duration plays an important role in treating Covid-19. Since the virus causes pneumonia, its presence can also be detected using medical imaging by Radiologists. Hospitals with X-ray capabilities are widely distributed all over the world, so a method for diagnosing Covid-19 from chest X-rays would present itself. Studies have shown promising results in automatically detecting Covid-19 from medical images using supervised Artificial neural network (ANN) algorithms. The major drawback of supervised learning algorithms is that they require huge amounts of data to train. Also, the radiology equipment is not computationally efficient for deep neural networks. Therefore, we aim to develop a Generative Adversarial Network (GAN) based image augmentation to optimize the performance of custom, light, Convolutional networks used for the classification of Chest X-rays (CXR). METHODS A Progressively Growing Generative Adversarial Network (PGGAN) is used to generate synthetic and augmented data to supplement the dataset. We propose two novel CNN architectures to perform the Multi-class classification of Covid-19, healthy and pneumonia affected Chest X-rays. Comparisons have been drawn to the state of the art models and transfer learning methods to evaluate the superiority of the networks. All the models are trained using enhanced and augmented X-ray images and are compared based on classification metrics. 
RESULTS The proposed models achieved extremely high classification metrics, with the two proposed architectures reaching test accuracies of 98.78% and 99.2%, respectively, while having 40% fewer training parameters than their state-of-the-art counterparts. CONCLUSION In the present study, a method based on artificial intelligence is proposed, leading to a rapid diagnostic tool for Covid infections based on a Generative Adversarial Network (GAN) and Convolutional Neural Networks (CNN). The benefits are a high detection accuracy with up to a 99% hit rate, a rapid diagnosis, and an accessible Covid identification method using chest X-ray images.
Affiliation(s)
- Rutwik Gulakala
- Institute of General Mechanics, RWTH Aachen University, Eilfschornsteinstr. 18, D-52062 Aachen, Germany
- Bernd Markert
- Institute of General Mechanics, RWTH Aachen University, Eilfschornsteinstr. 18, D-52062 Aachen, Germany
- Marcus Stoffel
- Institute of General Mechanics, RWTH Aachen University, Eilfschornsteinstr. 18, D-52062 Aachen, Germany

38
Chen Z, Liu Y, Zhang Y, Li Q. Orthogonal latent space learning with feature weighting and graph learning for multimodal Alzheimer's disease diagnosis. Med Image Anal 2023; 84:102698. [PMID: 36462372 DOI: 10.1016/j.media.2022.102698] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2022] [Revised: 10/18/2022] [Accepted: 11/17/2022] [Indexed: 11/23/2022]
Abstract
Recent studies have shown that multimodal neuroimaging data provide complementary information about the brain, and latent space-based methods have achieved promising results in fusing multimodal data for Alzheimer's disease (AD) diagnosis. However, most existing methods treat all features equally and adopt nonorthogonal projections to learn the latent space, which cannot retain enough discriminative information in the latent space. Moreover, they usually preserve the relationships among subjects in the latent space based on a similarity graph constructed on the original features for performance boosting, yet noise and redundant features significantly corrupt that graph. To address these limitations, we propose an Orthogonal Latent space learning with Feature weighting and Graph learning (OLFG) model for multimodal AD diagnosis. Specifically, we map multiple modalities into a common latent space by an orthogonality-constrained projection to capture the discriminative information for AD diagnosis. Then, a feature weighting matrix is utilized to adaptively rank the importance of features for AD diagnosis. Furthermore, we devise a regularization term with a learned graph to preserve the local structure of the data in the latent space, and we integrate the graph construction into the learning process to accurately encode the relationships among samples. Instead of constructing a similarity graph for each modality, we learn a joint graph for multiple modalities to capture the correlations among them. Finally, the representations in the latent space are projected into the target space to perform AD diagnosis. An alternating optimization algorithm with proven convergence is developed to solve the optimization objective. Extensive experimental results show the effectiveness of the proposed method.
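The central constraint in OLFG, an orthogonal projection into the common latent space, can be illustrated with a minimal NumPy sketch. Here the projection is taken from an SVD of the centered features (a PCA-style stand-in assumed purely for illustration); OLFG itself learns the projection jointly with its feature-weighting and graph-learning terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthogonal_projection(X, k):
    # Project d-dim features X (n x d) into a k-dim latent space with a
    # projection W whose columns are orthonormal (W.T @ W = I_k). W is
    # taken from the top-k right singular vectors of centered X -- an
    # illustrative PCA-style choice, not the OLFG solver itself.
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    W = Vt[:k].T                  # d x k, orthonormal columns
    return W, X @ W               # latent representation Z = X W

X = rng.normal(size=(50, 8))
W, Z = orthogonal_projection(X, k=3)
print(np.allclose(W.T @ W, np.eye(3)))  # orthogonality constraint holds
```

The orthogonality of W is what preserves discriminative structure in the latent space; a nonorthogonal projection can collapse directions that separate the diagnostic classes.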
Affiliation(s)
- Zhi Chen
- Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yongguo Liu
- Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yun Zhang
- Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
- Qiaoqin Li
- Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China

39
Semi-supervised medical image classification with adaptive threshold pseudo-labeling and unreliable sample contrastive loss. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
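The adaptive-threshold pseudo-labeling named in this title can be sketched as follows. The per-class threshold rule below (scaling a base threshold by an estimated per-class learning status) is a generic FlexMatch-style heuristic assumed for illustration; the cited paper's exact rule may differ.

```python
import numpy as np

def pseudo_label(probs, base_tau=0.95):
    # Keep a pseudo-label only where model confidence clears a per-class
    # adaptive threshold. The class-wise scaling by estimated learning
    # status is an assumed FlexMatch-style heuristic, not the paper's rule.
    probs = np.asarray(probs, dtype=float)
    conf = probs.max(axis=1)                  # confidence per unlabeled sample
    pred = probs.argmax(axis=1)
    counts = np.bincount(pred, minlength=probs.shape[1]).astype(float)
    status = counts / max(counts.max(), 1.0)  # how "learned" each class looks
    tau = base_tau * status[pred]             # frequent classes get higher bars
    return pred, conf >= tau

probs = np.array([[0.97, 0.02, 0.01],
                  [0.50, 0.30, 0.20],
                  [0.10, 0.85, 0.05]])
pred, mask = pseudo_label(probs)
```

Only samples passing the mask contribute to the pseudo-labeled loss; under-represented classes get a lower bar so they are not starved of pseudo-labels early in training.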
40
Gulakala R, Markert B, Stoffel M. Generative adversarial network based data augmentation for CNN based detection of Covid-19. Sci Rep 2022; 12:19186. [PMID: 36357530 PMCID: PMC9647771 DOI: 10.1038/s41598-022-23692-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Accepted: 11/03/2022] [Indexed: 11/11/2022] Open
Abstract
Covid-19 has been a global concern since 2019, crippling the world economy and health. Biological diagnostic tools have since been developed to identify the virus from bodily fluids, and since the virus causes pneumonia, which results in lung inflammation, its presence can also be detected from medical images by expert radiologists. The success of each diagnostic method is measured by the hit rate for identifying Covid infections. However, access to each diagnostic tool can be limited depending on the geographic region, and since Covid treatment is a race against time, the diagnosis duration plays an important role. Hospitals with X-ray facilities are widely distributed all over the world, so a method investigating lung X-ray images for possible Covid-19 infections is a natural candidate. Promising results have been achieved in the literature in automatically detecting the virus from medical images such as CT scans and X-rays using supervised artificial neural network algorithms. One of the major drawbacks of supervised learning models is that they require enormous amounts of data to train and to generalize to new data. In this study, we develop a Swish-activated, instance- and batch-normalized residual U-Net GAN with dense blocks and skip connections to create synthetic and augmented data for training. Thanks to instance normalization and Swish activation, the proposed GAN architecture can handle the randomness in luminosity that arises from different X-ray sources better than the classical architecture, and it generates realistic-looking synthetic data. In addition, radiology equipment is generally not computationally powerful and cannot efficiently run state-of-the-art deep neural networks such as DenseNet and ResNet. Hence, we propose a novel CNN architecture that is 40% lighter and more accurate than state-of-the-art CNNs.
Multi-class classification of the three classes of chest X-rays (CXR), i.e., Covid-19, healthy, and pneumonia, is performed using the proposed model, which achieved an extremely high test accuracy of 99.2%, a result not reached in any previous study in the literature. Based on the criteria mentioned for developing Corona-infection diagnosis, the present study proposes an artificial-intelligence-based method, resulting in a rapid diagnostic tool for Covid infections based on generative adversarial and convolutional neural networks. The benefit is lung-infection identification with 99% accuracy. This could lead to a support tool that helps in rapid diagnosis and provides an accessible Covid identification method using CXR images.
Affiliation(s)
- Rutwik Gulakala
- Institute of General Mechanics, RWTH Aachen University, Aachen, Germany
- Bernd Markert
- Institute of General Mechanics, RWTH Aachen University, Aachen, Germany
- Marcus Stoffel
- Institute of General Mechanics, RWTH Aachen University, Aachen, Germany

41
Wang C, Grau A, Guerra E, Shen Z, Hu J, Fan H. Semi-supervised wildfire smoke detection based on smoke-aware consistency. FRONTIERS IN PLANT SCIENCE 2022; 13:980425. [PMID: 36426142 PMCID: PMC9678925 DOI: 10.3389/fpls.2022.980425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/28/2022] [Accepted: 09/20/2022] [Indexed: 06/16/2023]
Abstract
The semi-transparency of smoke blends it strongly with the background contextual information in an image, which results in great visual differences across different areas. In addition, the limited annotation of smoke images from real forest scenarios brings further challenges for model training. In this paper, we design a semi-supervised learning strategy, named smoke-aware consistency (SAC), to maintain pixel-level and context-level perceptual consistency across different backgrounds. Furthermore, we propose a smoke detection strategy with triple-classification assistance for discriminating smoke from smoke-like objects. Finally, we simplify the LFNet fire-smoke detection network into LFNet-v2, since the proposed SAC and triple-classification assistance can take over the functions of certain dedicated modules. Extensive experiments validate that the proposed method significantly outperforms state-of-the-art object detection algorithms on wildfire smoke datasets and achieves satisfactory performance under challenging weather conditions.
Affiliation(s)
- Chuansheng Wang
- Department of Automatic Control Technical, Polytechnic University of Catalonia, Barcelona, Spain
- Antoni Grau
- Department of Automatic Control Technical, Polytechnic University of Catalonia, Barcelona, Spain
- Edmundo Guerra
- Department of Automatic Control Technical, Polytechnic University of Catalonia, Barcelona, Spain
- Zhiguo Shen
- Henan Academy of Forestry, Zhengzhou, Henan, China
- Jinxing Hu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Haoyi Fan
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China

42
Zhang W, Zhou Z, Gao Z, Yang G, Xu L, Wu W, Zhang H. Multiple Adversarial Learning based Angiography Reconstruction for Ultra-low-dose Contrast Medium CT. IEEE J Biomed Health Inform 2022; 27:409-420. [PMID: 36219660 DOI: 10.1109/jbhi.2022.3213595] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Iodinated contrast medium (ICM) dose reduction is beneficial for decreasing the potential health risk to renal-insufficiency patients in CT scanning. Because vessels appear at low intensity in ultra-low-dose-ICM CT angiography, such scans cannot directly support the clinical diagnosis of vascular diseases. Angiography reconstruction for ultra-low-dose-ICM CT can enhance vascular intensity so that vascular diseases can be diagnosed directly. However, angiography reconstruction is challenging because of individual patient differences and the diversity of vascular diseases. In this paper, we propose a Multiple Adversarial Learning based Angiography Reconstruction (MALAR) framework to enhance vascular intensity. Specifically, a bilateral learning mechanism is developed to map a relationship between source and target domains rather than an image-to-image mapping. Then, a dual correlation constraint is introduced to simultaneously characterize both the distribution uniformity of cross-domain features and sample inconsistency within each domain. Finally, an adaptive fusion module combining multi-scale information and long-range interactive dependency is explored to alleviate the interference of high-noise metal. Experiments are performed on CT sequences with different ICM doses. Quantitative results based on multiple metrics demonstrate the effectiveness of MALAR for angiography reconstruction. Qualitative assessments by radiographers confirm the potential of MALAR for the clinical diagnosis of vascular diseases. The code and model are available at https://github.com/HIC-SYSU/MALAR.
Affiliation(s)
- Weiwei Zhang
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China
- Zhen Zhou
- Department of Radiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Zhifan Gao
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China
- Guang Yang
- Cardiovascular Research Centre, Royal Brompton Hospital, London, U.K.
- Lei Xu
- Department of Radiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Weiwen Wu
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China
- Heye Zhang
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China

43
A semi-supervised learning approach for COVID-19 detection from chest CT scans. Neurocomputing 2022; 503:314-324. [PMID: 35765410 PMCID: PMC9221925 DOI: 10.1016/j.neucom.2022.06.076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Revised: 05/11/2022] [Accepted: 06/18/2022] [Indexed: 01/17/2023]
Abstract
COVID-19 has spread rapidly all over the world and has affected more than 200 countries and regions. Early screening of suspected infected patients is essential for preventing and combating COVID-19. Computed Tomography (CT) is a fast and efficient tool that can quickly provide chest scan results. To reduce the burden of reading CTs on doctors, in this article a high-precision algorithm for the intelligent diagnosis of COVID-19 from chest CTs is designed. A semi-supervised learning approach is developed to address the setting where only a small amount of labelled data is available. While following the MixMatch rules to conduct sophisticated data augmentation, we introduce a model training technique to reduce the risk of model over-fitting. At the same time, a new data enhancement method is proposed to modify the regularization term in MixMatch. To further enhance the generalization of the model, a convolutional neural network based on an attention mechanism is then developed that enables the extraction of multi-scale features from CT scans. The proposed algorithm is evaluated on an independent chest CT dataset for COVID-19 and achieves an area under the receiver operating characteristic curve (AUC) of 0.932, accuracy of 90.1%, sensitivity of 91.4%, specificity of 88.9%, and an F1-score of 89.9%. The results show that the proposed algorithm can accurately determine whether a chest CT indicates a positive or negative COVID-19 finding and can help doctors diagnose rapidly in the early stages of a COVID-19 outbreak.
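The MixMatch ingredients the abstract refers to, temperature sharpening of averaged predictions and mixup of example pairs, can be sketched in a few lines. This is a generic sketch of the standard MixMatch operations, not the modified regularization or data enhancement the authors propose.

```python
import numpy as np

def sharpen(p, T=0.5):
    # MixMatch temperature sharpening: raise to 1/T and renormalize, so the
    # averaged guess over augmentations becomes a more confident target.
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum()

def mixup(x1, x2, y1, y2, alpha=0.75, rng=np.random.default_rng(0)):
    # MixMatch mixes pairs of examples with lambda ~ Beta(alpha, alpha),
    # biased toward the first argument so mixed inputs stay close to x1.
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

q = sharpen([0.6, 0.4])          # the guessed label becomes more confident
x, y = mixup(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
             np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Sharpened guesses serve as the unlabeled targets, and the mixed pairs feed both the supervised and the consistency (regularization) losses.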
44
Mao C, Yao L, Luo Y. ImageGCN: Multi-Relational Image Graph Convolutional Networks for Disease Identification With Chest X-Rays. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:1990-2003. [PMID: 35192461 PMCID: PMC9367633 DOI: 10.1109/tmi.2022.3153322] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Image representation is a fundamental task in computer vision. However, most existing approaches to image representation ignore the relations between images and consider each input image independently. Intuitively, relations between images can help in understanding the images and maintaining model consistency over related images, leading to better explainability. In this paper, we consider modeling image-level relations to generate more informative image representations, and propose ImageGCN, an end-to-end graph convolutional network framework for inductive multi-relational image modeling. We apply ImageGCN to chest X-ray images, where rich relational information is available for disease identification. Unlike previous image representation models, ImageGCN learns the representation of an image using both its original pixel features and its relationships with other images. Besides learning informative representations for images, ImageGCN can also be used for object detection in a weakly supervised manner. Experimental results on three open-source X-ray datasets (ChestX-ray14, CheXpert, and MIMIC-CXR) demonstrate that ImageGCN outperforms respective baselines in both disease identification and localization tasks and achieves comparable, and often better, results than state-of-the-art methods.
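The aggregation underlying a graph convolutional network can be illustrated with the standard single-relational propagation rule H' = ReLU(D^-1/2 (A+I) D^-1/2 H W). ImageGCN extends this to multiple relation types (one aggregation per relation), so the sketch below shows only the basic mechanism, on a toy image graph.

```python
import numpy as np

def gcn_layer(A, H, W):
    # One graph-convolution step: add self-loops, symmetrically normalize
    # the adjacency, aggregate neighbor features, apply weights and ReLU.
    A_hat = A + np.eye(A.shape[0])                 # A + I (self-loops)
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))         # D^-1/2
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph of 3 "images": 0-1 and 1-2 are related (e.g., same patient).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)          # one-hot stand-ins for per-image features
W = np.ones((3, 2))    # toy weight matrix
out = gcn_layer(A, H, W)
```

Each image's new representation mixes its own features with those of related images, which is how relational information propagates into the disease-identification head.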
45
Zhou Q, Wang R, Zeng G, Fan H, Zheng G. Towards bridging the distribution gap: Instance to Prototype Earth Mover’s Distance for distribution alignment. Med Image Anal 2022; 82:102607. [DOI: 10.1016/j.media.2022.102607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2021] [Revised: 06/28/2022] [Accepted: 08/25/2022] [Indexed: 11/16/2022]
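For intuition about the distance this paper aligns distributions with: between two one-dimensional histograms on the same bins, the Earth Mover's Distance reduces to the L1 distance between their cumulative distributions. The toy sketch below shows only that reduction, not the instance-to-prototype formulation of the paper.

```python
import numpy as np

def emd_1d(p, q):
    # EMD between two 1-D histograms on common bins: normalize both to
    # probability mass, then sum |CDF_p - CDF_q| over the bins.
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

d = emd_1d([1, 0, 0], [0, 0, 1])   # all mass must move 2 bins
```

Unlike a pointwise divergence, this cost grows with how far mass must travel, which is what makes it useful for aligning feature distributions across domains.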
46
Park S, Kim G, Oh Y, Seo JB, Lee SM, Kim JH, Moon S, Lim JK, Park CM, Ye JC. Self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation. Nat Commun 2022; 13:3848. [PMID: 35789159 PMCID: PMC9252561 DOI: 10.1038/s41467-022-31514-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Accepted: 06/16/2022] [Indexed: 11/14/2022] Open
Abstract
Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, developing a robust model requires large, high-quality data with annotations that are expensive to obtain. This situation poses a conundrum: annually collected chest X-rays cannot be utilized due to the absence of labels, especially in deprived areas. In this study, we present a framework named distillation for self-supervision and self-train learning (DISTL), inspired by the learning process of radiologists, which can improve the performance of a vision transformer simultaneously through self-supervision and self-training via knowledge distillation. In external validation from three hospitals for the diagnosis of tuberculosis, pneumothorax, and COVID-19, DISTL offers gradually improved performance as the amount of unlabeled data increases, even surpassing the fully supervised model trained with the same amount of labeled data. We additionally show that the model obtained with DISTL is robust to various real-world nuisances, offering better applicability in clinical settings.
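Knowledge distillation, the mechanism DISTL builds its self-training on, trains a student against temperature-softened teacher outputs. The soft-target cross-entropy below is a generic sketch of that loss, not the DISTL objective itself.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; larger T spreads probability mass.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy of the student's softened distribution against the
    # teacher's softened distribution (soft targets carry "dark knowledge"
    # about relative class similarity that hard labels discard).
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(-(p_t * np.log(p_s + 1e-12)).sum())
```

Matching the teacher exactly leaves only the teacher's entropy; any disagreement raises the loss, which is what drives the student toward the teacher's behavior on unlabeled data.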
Affiliation(s)
- Sangjoon Park
- Department of Bio and Brain Engineering, KAIST, Daejeon, Korea
- Gwanghyun Kim
- Department of Bio and Brain Engineering, KAIST, Daejeon, Korea
- Yujin Oh
- Department of Bio and Brain Engineering, KAIST, Daejeon, Korea
- Joon Beom Seo
- Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
- Sang Min Lee
- Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
- Jin Hwan Kim
- College of Medicine, Chungnam National University, Daejeon, South Korea
- Sungjun Moon
- College of Medicine, Yeungnam University, Daegu, South Korea
- Jae-Kwang Lim
- School of Medicine, Kyungpook National University, Daegu, South Korea
- Chang Min Park
- College of Medicine, Seoul National University, Seoul, South Korea
- Jong Chul Ye
- Department of Bio and Brain Engineering, KAIST, Daejeon, Korea; Kim Jaechul Graduate School of AI, KAIST, Daejeon, Korea

47
Semi-Supervised Medical Image Classification Based on Attention and Intrinsic Features of Samples. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12136726] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
The training of deep neural networks usually requires a lot of high-quality data with good annotations to obtain good performance. However, in clinical medicine, obtaining high-quality annotated data is laborious and expensive because it requires the professional skill of clinicians. In this paper, based on the consistency strategy, we propose a new semi-supervised model for medical image classification that introduces a self-attention mechanism into the backbone network to learn more meaningful features in image classification tasks and uses an improved version of focal loss as the supervised loss to reduce the misclassification of samples. Finally, we add a consistency loss, similar to the unsupervised consistency loss, to encourage the model to learn more about the internal features of unlabeled samples. Our method achieved 94.02% AUC and 72.03% sensitivity on the ISIC 2018 dataset and 79.74% AUC on the ChestX-ray14 dataset. These results show the effectiveness of our method in single-label and multi-label classification.
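The focal loss mentioned above down-weights well-classified samples so training focuses on hard ones. Since the paper uses an improved variant, the standard binary form below is only an illustrative baseline.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # Standard binary focal loss: FL = -alpha * (1 - p_t)^gamma * log(p_t),
    # where p_t is the predicted probability of the true class. The
    # (1 - p_t)^gamma factor shrinks the loss of confident, correct
    # predictions, concentrating gradient on misclassified samples.
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1.0 - p)
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12)))
```

With gamma = 0 this reduces to weighted cross-entropy; raising gamma increasingly suppresses the contribution of easy samples.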
48
Chen X, Wang X, Zhang K, Fung KM, Thai TC, Moore K, Mannel RS, Liu H, Zheng B, Qiu Y. Recent advances and clinical applications of deep learning in medical image analysis. Med Image Anal 2022; 79:102444. [PMID: 35472844 PMCID: PMC9156578 DOI: 10.1016/j.media.2022.102444] [Citation(s) in RCA: 275] [Impact Index Per Article: 91.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Revised: 03/09/2022] [Accepted: 04/01/2022] [Indexed: 02/07/2023]
Abstract
Deep learning has received extensive research interest for developing new medical image processing algorithms, and deep learning-based models have been remarkably successful in a variety of medical imaging tasks that support disease detection and diagnosis. Despite this success, further improvement of deep learning models in medical image analysis is largely bottlenecked by the lack of large, well-annotated datasets. In the past five years, many studies have focused on addressing this challenge. In this paper, we review and summarize these recent studies to provide a comprehensive overview of deep learning methods applied to various medical image analysis tasks. In particular, we emphasize the latest progress and contributions of state-of-the-art unsupervised and semi-supervised deep learning in medical image analysis, summarized by application scenario: classification, segmentation, detection, and image registration. We also discuss major technical challenges and suggest possible solutions for future research efforts.
Affiliation(s)
- Xuxin Chen
- School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Ximin Wang
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Ke Zhang
- School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Kar-Ming Fung
- Department of Pathology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Theresa C Thai
- Department of Radiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Kathleen Moore
- Department of Obstetrics and Gynecology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Robert S Mannel
- Department of Obstetrics and Gynecology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Hong Liu
- School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Bin Zheng
- School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Yuchen Qiu
- School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA

49
Semi-supervised segmentation of echocardiography videos via noise-resilient spatiotemporal semantic calibration and fusion. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.03.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
50
Zhou Y, Dreizin D, Wang Y, Liu F, Shen W, Yuille AL. External Attention Assisted Multi-Phase Splenic Vascular Injury Segmentation With Limited Data. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:1346-1357. [PMID: 34968179 PMCID: PMC9167782 DOI: 10.1109/tmi.2021.3139637] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
The spleen is one of the most commonly injured solid organs in blunt abdominal trauma. The development of automatic segmentation systems from multi-phase CT for splenic vascular injury can augment severity grading, improving clinical decision support and outcome prediction. However, accurate segmentation of splenic vascular injury is challenging for the following reasons: 1) Splenic vascular injury can be highly variable in shape, texture, size, and overall appearance; and 2) Data acquisition is a complex and expensive procedure that requires intensive effort from both data scientists and radiologists, which makes large-scale, well-annotated datasets hard to acquire in general. In light of these challenges, we design a novel framework for multi-phase splenic vascular injury segmentation, especially with limited data. On the one hand, we propose to leverage external data to mine pseudo splenic masks as spatial attention, dubbed external attention, for guiding the segmentation of splenic vascular injury. On the other hand, we develop a synthetic phase augmentation module, built upon generative adversarial networks, for populating the internal data by fully leveraging the relation between different phases. By jointly enforcing external attention and populating the internal data representation during training, our proposed method outperforms other competing methods and substantially improves the popular DeepLab-v3+ baseline by more than 7% in terms of average DSC, which confirms its effectiveness.