1. Wang H, Ahn E, Bi L, Kim J. Self-supervised multi-modality learning for multi-label skin lesion classification. Computer Methods and Programs in Biomedicine 2025; 265:108729. PMID: 40184849; DOI: 10.1016/j.cmpb.2025.108729.
Abstract
BACKGROUND: The clinical diagnosis of skin lesions involves the analysis of dermoscopic and clinical modalities. Dermoscopic images provide detailed views of surface structures, while clinical images offer complementary macroscopic information. Clinicians frequently use the seven-point checklist as an auxiliary tool for melanoma diagnosis and for identifying lesion attributes. Supervised deep learning approaches, such as convolutional neural networks, have performed well using dermoscopic and clinical modalities (multi-modality) and further enhanced classification by predicting seven skin lesion attributes (multi-label). However, the performance of these approaches relies on the availability of large-scale labeled data, which is costly and time-consuming to obtain, even more so when annotating multiple attributes. METHODS: To reduce the dependency on large labeled datasets, we propose a self-supervised learning (SSL) algorithm for multi-modality multi-label skin lesion classification. Compared with single-modality SSL, our algorithm enables multi-modality SSL by maximizing the similarities between paired dermoscopic and clinical images from different views. We introduce a novel multi-modal and multi-label SSL strategy that generates surrogate pseudo-multi-labels for seven skin lesion attributes through clustering analysis. A label-relation-aware module is proposed to refine each pseudo-label embedding, capturing the interrelationships between pseudo-multi-labels. We further illustrate the interrelationships of skin lesion attributes and their relationships with clinical diagnoses using an attention visualization technique. RESULTS: The proposed algorithm was validated on the well-benchmarked seven-point skin lesion dataset. Our results demonstrate that our method outperforms state-of-the-art SSL counterparts. Improvements in the area under the receiver operating characteristic curve, precision, sensitivity, and specificity were observed across various lesion attributes and melanoma diagnoses. CONCLUSIONS: Our self-supervised learning algorithm offers a robust and efficient solution for multi-modality multi-label skin lesion classification, reducing the reliance on large-scale labeled data. By effectively capturing and leveraging the complementary information between dermoscopic and clinical images and the interrelationships between lesion attributes, our approach holds the potential to improve clinical diagnostic accuracy in dermatology.
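The two core ingredients described above lend themselves to a compact illustration. The sketch below (PyTorch/scikit-learn; not the authors' released code — the encoder outputs, per-attribute random projections, and cluster count are all illustrative assumptions) shows a cross-modal InfoNCE-style objective for paired dermoscopic/clinical embeddings plus clustering-based pseudo-multi-labels for seven attributes:

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def cross_modal_nce(z_derm, z_clin, temperature=0.1):
    """InfoNCE-style loss pulling paired dermoscopic/clinical embeddings
    together while pushing non-paired ones apart."""
    z_d = F.normalize(z_derm, dim=1)
    z_c = F.normalize(z_clin, dim=1)
    logits = z_d @ z_c.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(z_d.size(0))            # i-th derm pairs with i-th clin
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def pseudo_multi_labels(features, n_attributes=7, n_clusters=3):
    """Surrogate pseudo-multi-labels: one k-means clustering per attribute,
    each on a different random projection of the features (the per-attribute
    projection is purely illustrative)."""
    feats = features.detach().cpu().numpy()
    rng = np.random.default_rng(0)
    labels = []
    for _ in range(n_attributes):
        proj = feats @ rng.standard_normal((feats.shape[1], 32))
        labels.append(KMeans(n_clusters=n_clusters, n_init=10).fit_predict(proj))
    return torch.from_numpy(np.stack(labels, axis=1))  # (B, n_attributes)

# Toy usage with random embeddings standing in for encoder outputs.
z_derm, z_clin = torch.randn(16, 128), torch.randn(16, 128)
print(cross_modal_nce(z_derm, z_clin).item(), pseudo_multi_labels(z_derm).shape)
```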
Affiliation(s)
- Hao Wang
- School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia; Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Euijoon Ahn
- College of Science and Engineering, James Cook University, Cairns, QLD 4870, Australia.
- Lei Bi
- Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Jinman Kim
- School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia.
2. Xiang F, Li Z, Jiang S, Li C, Li S, Gao T, He K, Chen J, Zhang J, Zhang J. Multimodal Masked Autoencoder Based on Adaptive Masking for Vitiligo Stage Classification. Journal of Imaging Informatics in Medicine 2025. PMID: 40301294; DOI: 10.1007/s10278-025-01521-7.
Abstract
Vitiligo, a prevalent skin condition characterized by depigmentation, presents challenges in staging due to its inherent complexity. Multimodal skin images can provide complementary information, and in this study the integration of clinical images of vitiligo with images obtained under Wood's lamp is used for the classification of vitiligo stages. However, the difficulty of annotating multimodal data and the scarcity of multimodal data limit the performance of deep learning models on related classification tasks. To address these issues, a Multimodal Masked Autoencoder (Multi-MAE) based on adaptive masking is proposed; it reduces the reliance on annotated multimodal data, mitigates the problem of multimodal data scarcity, and enhances the model's ability to extract features from multimodal data. Specifically, an image reconstruction task is constructed to diminish reliance on annotated multimodal data, and a pre-training strategy is employed to alleviate the scarcity of multimodal data. Experimental results demonstrate that the proposed model achieves a vitiligo stage classification accuracy of 95.48% after pre-training on a dataset of unlabeled dermatological images, an improvement of 5.16%, 4.51%, 3.87%, 2.58%, 4.51%, 4.51%, 3.87%, and 2.58% over MobileNet, DenseNet, VGG, ResNet-50, BEIT, MaskFeat, SimMIM, and MAE, respectively. These results verify the effectiveness of the proposed Multi-MAE model in assessing the stable and active vitiligo stages, making it a suitable clinical aid for evaluating the severity of vitiligo lesions.
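Adaptive masking can be sketched in a few lines: score each patch's importance and mask the most salient patches so the autoencoder must reconstruct informative regions. Patch variance as the saliency proxy and masking the high-saliency patches are assumptions for illustration, not the paper's exact criterion:

```python
import torch

def adaptive_mask(patches, mask_ratio=0.75):
    """patches: (B, N, D) patch embeddings (e.g., both modalities concatenated).
    Returns the visible (kept) patches and their indices."""
    B, N, D = patches.shape
    saliency = patches.var(dim=-1)                    # (B, N) importance proxy
    noise = 1e-3 * torch.rand_like(saliency)          # tie-breaking jitter
    order = torch.argsort(saliency + noise, dim=1)    # ascending saliency
    n_keep = int(N * (1 - mask_ratio))
    keep = order[:, :n_keep]                          # keep low-saliency, mask salient
    visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep

x = torch.randn(2, 196, 768)                          # e.g., 14x14 ViT patch grid
visible, keep_idx = adaptive_mask(x)
print(visible.shape)                                  # (2, 49, 768)
```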
Affiliation(s)
- Fan Xiang
- Department of Automation, College of Electrical Engineering, Sichuan University, Chengdu, 610065, China
- Zhiming Li
- Department of Automation, College of Electrical Engineering, Sichuan University, Chengdu, 610065, China
- Shuying Jiang
- Department of Automation, College of Electrical Engineering, Sichuan University, Chengdu, 610065, China
- Chunying Li
- Department of Dermatology, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China
- Shuli Li
- Department of Dermatology, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China
- Tianwen Gao
- Department of Dermatology, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China
- Kaiqiao He
- Department of Dermatology, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China
- Jianru Chen
- Department of Dermatology, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China
- Junpeng Zhang
- Department of Automation, College of Electrical Engineering, Sichuan University, Chengdu, 610065, China
- Junran Zhang
- Department of Automation, College of Electrical Engineering, Sichuan University, Chengdu, 610065, China.
3. Zuo L, Wang Z, Wang Y. A multi-stage multi-modal learning algorithm with adaptive multimodal fusion for improving multi-label skin lesion classification. Artificial Intelligence in Medicine 2025; 162:103091. PMID: 40015211; DOI: 10.1016/j.artmed.2025.103091.
Abstract
Skin cancer occurs frequently and has become a major contributor to both cancer incidence and mortality. Accurate and timely diagnosis of skin cancer holds the potential to save lives. Deep learning-based methods have demonstrated significant advancements in the screening of skin cancers. However, most current approaches rely on a single input modality for diagnosis, thereby missing out on valuable complementary information that could enhance accuracy. Although some multimodal-based methods exist, they often lack adaptability and fail to fully leverage multimodal information. In this paper, we introduce a novel uncertainty-based hybrid fusion strategy for a multi-modal learning algorithm aimed at skin cancer diagnosis. Our approach combines three different modalities: clinical images, dermoscopy images, and metadata, to make the final classification. For the fusion of the two image modalities, we employ an intermediate fusion strategy that considers the similarity between clinical and dermoscopy images to extract features containing both complementary and correlated information. To capture the correlated information, we utilize cosine similarity, and we employ concatenation as the means of integrating complementary information. In the fusion of image and metadata modalities, we leverage uncertainty to obtain confident late fusion results, allowing our method to adaptively combine the information from different modalities. We conducted comprehensive experiments using a popular publicly available skin disease diagnosis dataset, and the results demonstrate the effectiveness of our proposed method. Our fusion algorithm could enhance the clinical applicability of automated skin lesion classification, offering a more robust and adaptive way to make automatic diagnoses with the help of the uncertainty mechanism. Code is available at https://github.com/Zuo-Lihan/CosCatNet-Adaptive_Fusion_Algorithm.
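The described two-stage fusion can be summarized in a few lines. Below is a minimal sketch (not the released CosCatNet code; the feature shapes and the entropy-based confidence are illustrative assumptions) of cosine-similarity-aware intermediate fusion of the two image modalities plus uncertainty-weighted late fusion with the metadata branch:

```python
import torch
import torch.nn.functional as F

def intermediate_fusion(f_clin, f_derm):
    """Concatenate (complementary info) and weight by cosine similarity
    (correlated info) between the two image-modality features."""
    cos = F.cosine_similarity(f_clin, f_derm, dim=1, eps=1e-8)  # (B,)
    fused = torch.cat([f_clin, f_derm], dim=1)                  # (B, 2D)
    return fused * cos.unsqueeze(1)

def uncertainty_late_fusion(logits_img, logits_meta):
    """Weight each branch inversely to its predictive entropy — an assumed
    stand-in for the paper's uncertainty mechanism."""
    def confidence(logits):
        p = F.softmax(logits, dim=1)
        entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1, keepdim=True)
        return 1.0 / (1.0 + entropy)
    w_img, w_meta = confidence(logits_img), confidence(logits_meta)
    return (w_img * logits_img + w_meta * logits_meta) / (w_img + w_meta)

f_c, f_d = torch.randn(4, 256), torch.randn(4, 256)
fused = intermediate_fusion(f_c, f_d)
out = uncertainty_late_fusion(torch.randn(4, 5), torch.randn(4, 5))
print(fused.shape, out.shape)
```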
Affiliation(s)
- Lihan Zuo
- School of Computer and Artificial Intelligence, Southwest Jiaotong University, Chengdu 610000, PR China
- Zizhou Wang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Yan Wang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore.
4. Xiao C, Zhu A, Xia C, Qiu Z, Liu Y, Zhao C, Ren W, Wang L, Dong L, Wang T, Guo L, Lei B. Attention-Guided Learning With Feature Reconstruction for Skin Lesion Diagnosis Using Clinical and Ultrasound Images. IEEE Transactions on Medical Imaging 2025; 44:543-555. PMID: 39208042; DOI: 10.1109/tmi.2024.3450682.
Abstract
Skin lesions are among the most common diseases, and many categories are highly similar in morphology and appearance. Deep learning models can effectively cope with the small inter-class and large intra-class variability and improve diagnostic accuracy. However, existing multi-modal methods are limited to the surface information of lesions captured by clinical and dermatoscopic modalities, which hinders further improvement of skin lesion diagnostic accuracy. This motivates the study of the depth information of lesions provided by skin ultrasound. In this paper, we propose a novel skin lesion diagnosis network that combines clinical and ultrasound modalities, fusing the surface and depth information of the lesion to improve diagnostic accuracy. Specifically, we propose an attention-guided learning (AL) module that fuses clinical and ultrasound modalities from both local and global perspectives to enhance feature representation. The AL module consists of two parts: attention-guided local learning (ALL) computes the intra-modality and inter-modality correlations to fuse multi-scale information, making the network focus on the local information of each modality, and attention-guided global learning (AGL) fuses global information to further enhance the feature representation. In addition, we propose a feature reconstruction learning (FRL) strategy that encourages the network to extract more discriminative features and corrects the focus of the network, enhancing the model's robustness and certainty. We conduct extensive experiments and the results confirm the superiority of our proposed method. Our code is available at: https://github.com/XCL-hub/AGFnet.
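As a rough illustration of the inter-modality attention the AL module describes, the sketch below (layer sizes, mean pooling, and the shared attention module are assumptions, not the authors' AGFnet code) lets each modality attend to the other and fuses the results:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Bidirectional cross-attention between clinical and ultrasound tokens,
    followed by a simple global fusion of the pooled responses."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, f_clin, f_us):
        # f_clin, f_us: (B, N, dim) token features from each modality
        clin_att, _ = self.cross(f_clin, f_us, f_us)   # clinical attends to US
        us_att, _ = self.cross(f_us, f_clin, f_clin)   # US attends to clinical
        fused = torch.cat([clin_att.mean(1), us_att.mean(1)], dim=1)
        return self.proj(fused)                        # (B, dim) global descriptor

fusion = CrossModalFusion()
out = fusion(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
print(out.shape)  # (2, 256)
```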
5. Xu J, Huang K, Zhong L, Gao Y, Sun K, Liu W, Zhou Y, Guo W, Guo Y, Zou Y, Duan Y, Lu L, Wang Y, Chen X, Zhao S. RemixFormer++: A Multi-Modal Transformer Model for Precision Skin Tumor Differential Diagnosis With Memory-Efficient Attention. IEEE Transactions on Medical Imaging 2025; 44:320-337. PMID: 39120989; DOI: 10.1109/tmi.2024.3441012.
Abstract
Diagnosing malignant skin tumors accurately at an early stage can be challenging due to the ambiguous and even confusing visual characteristics displayed by various categories of skin tumors. To improve diagnostic precision, all available clinical data from multiple sources, particularly clinical images, dermoscopy images, and medical history, could be considered. Aligning with clinical practice, we propose a novel Transformer model, named RemixFormer++, that consists of a clinical image branch, a dermoscopy image branch, and a metadata branch. Given the unique characteristics inherent in clinical and dermoscopy images, specialized attention strategies are adopted for each type. Clinical images are processed through a top-down architecture, capturing both localized lesion details and global contextual information. Conversely, dermoscopy images undergo bottom-up processing with two-level hierarchical encoders, designed to pinpoint fine-grained structural and textural features. A dedicated metadata branch seamlessly integrates non-visual information by encoding relevant patient data. Fusing features from the three branches substantially boosts disease classification accuracy. RemixFormer++ demonstrates exceptional performance on four single-modality datasets (PAD-UFES-20, ISIC 2017/2018/2019). Compared with the previous best method on the public multi-modal Derm7pt dataset, we achieved an absolute 5.3% increase in averaged F1 and 1.2% in accuracy for the classification of five skin tumors. Furthermore, using a large-scale in-house dataset of 10,351 patients with the twelve most common skin tumors, our method obtained an overall classification accuracy of 92.6%. These promising results, on par with or better than the performance of 191 dermatologists in a comprehensive reader study, indicate the potential clinical usability of our method.
6. Liu M, Yao J, Yang J, Wan Z, Lin X. Bidirectional interaction directional variance attention model based on increased-transformer for thyroid nodule classification. Biomedical Physics & Engineering Express 2024; 11:015048. PMID: 39681000; DOI: 10.1088/2057-1976/ad9f68.
Abstract
Malignant thyroid nodules are closely linked to cancer, making the precise classification of thyroid nodules into benign and malignant categories highly significant. However, the subtle differences in contour between benign and malignant thyroid nodules, combined with texture features obscured by the inherent noise in ultrasound images, often result in low classification accuracy in most models. To address this, we propose a Bidirectional Interaction Directional Variance Attention Model based on the Increased-Transformer, named IFormer-DVNet. The Increased-Transformer enables global feature modeling of the feature maps extracted by the Convolutional Feature Extraction Module (CFEM), a design that greatly alleviates noise interference in ultrasound images. The Bidirectional Interaction Directional Variance Attention module (BIDVA) dynamically calculates attention weights using the variance of the input tensor along both the vertical and horizontal directions, allowing the model to focus more effectively on information-rich regions of the image. The vertical and horizontal features are interactively combined to enhance the model's representational capability. During training, we designed a Multi-Dimensional Loss function (MD Loss) to stretch the boundary distance between different classes and reduce the distance between samples of the same class; the MD Loss also helps mitigate class imbalance in the dataset. We evaluated the model on the public TNCD dataset and a private dataset, achieving accuracies of 76.55% and 93.02%, respectively. Compared with other state-of-the-art classification networks, our model outperformed them across all evaluation metrics.
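The directional variance idea is easy to sketch: attention weights are derived from feature variance along the vertical and horizontal axes. The following minimal PyTorch module (the 1x1 gating convolutions and sigmoid fusion are assumptions standing in for the full BIDVA design) illustrates it:

```python
import torch
import torch.nn as nn

class DirectionalVarianceAttention(nn.Module):
    """Gate features by variance statistics computed along H and W."""
    def __init__(self, channels):
        super().__init__()
        self.gate_h = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate_w = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                     # x: (B, C, H, W)
        var_h = x.var(dim=2, keepdim=True)    # variance along vertical axis -> (B, C, 1, W)
        var_w = x.var(dim=3, keepdim=True)    # variance along horizontal axis -> (B, C, H, 1)
        att = torch.sigmoid(self.gate_h(var_h)) * torch.sigmoid(self.gate_w(var_w))
        return x * att                        # broadcasts to (B, C, H, W)

m = DirectionalVarianceAttention(64)
print(m(torch.randn(2, 64, 32, 32)).shape)    # (2, 64, 32, 32)
```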
Affiliation(s)
- Ming Liu
- Key Laboratory of Digital Medical Engineering of Hebei Province, Baoding 071000, People's Republic of China
- College of Electronic and Information Engineering, Hebei University, Baoding 071002, People's Republic of China
- Jianing Yao
- Key Laboratory of Digital Medical Engineering of Hebei Province, Baoding 071000, People's Republic of China
- College of Electronic and Information Engineering, Hebei University, Baoding 071002, People's Republic of China
- Jianli Yang
- Key Laboratory of Digital Medical Engineering of Hebei Province, Baoding 071000, People's Republic of China
- College of Electronic and Information Engineering, Hebei University, Baoding 071002, People's Republic of China
- Zhenzhen Wan
- Key Laboratory of Digital Medical Engineering of Hebei Province, Baoding 071000, People's Republic of China
- College of Electronic and Information Engineering, Hebei University, Baoding 071002, People's Republic of China
- Xiong Lin
- Key Laboratory of Digital Medical Engineering of Hebei Province, Baoding 071000, People's Republic of China
- College of Electronic and Information Engineering, Hebei University, Baoding 071002, People's Republic of China
7. Chen H, Qu Z, Tian Y, Jiang N, Qin Y, Gao J, Zhang R, Ma Y, Jin Z, Zhai G. A cross-temporal multimodal fusion system based on deep learning for orthodontic monitoring. Computers in Biology and Medicine 2024; 180:109025. PMID: 39159544; DOI: 10.1016/j.compbiomed.2024.109025.
Abstract
INTRODUCTION In the treatment of malocclusion, continuous monitoring of the three-dimensional relationship between dental roots and the surrounding alveolar bone is essential for preventing complications from orthodontic procedures. Cone-beam computed tomography (CBCT) provides detailed root and bone data, but its high radiation dose limits its frequent use, consequently necessitating an alternative for ongoing monitoring. OBJECTIVES We aimed to develop a deep learning-based cross-temporal multimodal image fusion system for acquiring root and jawbone information without additional radiation, enhancing the ability of orthodontists to monitor risk. METHODS Utilizing CBCT and intraoral scans (IOSs) as cross-temporal modalities, we integrated deep learning with multimodal fusion technologies to develop a system that includes a CBCT segmentation model for teeth and jawbones. This model incorporates a dynamic kernel prior model, resolution restoration, and an IOS segmentation network optimized for dense point clouds. Additionally, a coarse-to-fine registration module was developed. This system facilitates the integration of IOS and CBCT images across varying spatial and temporal dimensions, enabling the comprehensive reconstruction of root and jawbone information throughout the orthodontic treatment process. RESULTS The experimental results demonstrate that our system not only maintains the original high resolution but also delivers outstanding segmentation performance on external testing datasets for CBCT and IOSs. CBCT achieved Dice coefficients of 94.1 % and 94.4 % for teeth and jawbones, respectively, and it achieved a Dice coefficient of 91.7 % for the IOSs. Additionally, in the context of real-world registration processes, the system achieved an average distance error (ADE) of 0.43 mm for teeth and 0.52 mm for jawbones, significantly reducing the processing time. CONCLUSION We developed the first deep learning-based cross-temporal multimodal fusion system, addressing the critical challenge of continuous risk monitoring in orthodontic treatments without additional radiation exposure. We hope that this study will catalyze transformative advancements in risk management strategies and treatment modalities, fundamentally reshaping the landscape of future orthodontic practice.
Affiliation(s)
- Haiwen Chen
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, National Clinical Research Center for Oral Diseases, Shaanxi Clinical Research Center for Oral Diseases, Department of Orthodontics, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, China
- Zhiyuan Qu
- Institute of Image Communication and Network Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200011, China
- Yuan Tian
- Institute of Image Communication and Network Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200011, China
- Ning Jiang
- Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, 200030, China
- Yuan Qin
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, National Clinical Research Center for Oral Diseases, Shaanxi Clinical Research Center for Oral Diseases, Department of Orthodontics, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, China
- Jie Gao
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, National Clinical Research Center for Oral Diseases, Shaanxi Clinical Research Center for Oral Diseases, Department of Orthodontics, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, China
- Ruoyan Zhang
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, National Clinical Research Center for Oral Diseases, Shaanxi Clinical Research Center for Oral Diseases, Department of Orthodontics, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, China
- Yanning Ma
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, National Clinical Research Center for Oral Diseases, Shaanxi Clinical Research Center for Oral Diseases, Department of Orthodontics, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, China.
- Zuolin Jin
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, National Clinical Research Center for Oral Diseases, Shaanxi Clinical Research Center for Oral Diseases, Department of Orthodontics, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, China
- Guangtao Zhai
- Institute of Image Communication and Network Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200011, China
8. Lyakhova UA, Lyakhov PA. Systematic review of approaches to detection and classification of skin cancer using artificial intelligence: Development and prospects. Computers in Biology and Medicine 2024; 178:108742. PMID: 38875908; DOI: 10.1016/j.compbiomed.2024.108742.
Abstract
In recent years, there has been a significant improvement in the accuracy of the classification of pigmented skin lesions using artificial intelligence algorithms. Intelligent analysis and classification systems are significantly superior to visual diagnostic methods used by dermatologists and oncologists. However, the application of such systems in clinical practice is severely limited due to a lack of generalizability and risks of potential misclassification. Successful implementation of artificial intelligence-based tools into clinicopathological practice requires a comprehensive study of the effectiveness and performance of existing models, as well as further promising areas for potential research development. The purpose of this systematic review is to investigate and evaluate the accuracy of artificial intelligence technologies for detecting malignant forms of pigmented skin lesions. For the study, 10,589 scientific research and review articles were selected from electronic scientific publishers, of which 171 articles were included in the presented systematic review. All selected scientific articles are distributed according to the proposed neural network algorithms from machine learning to multimodal intelligent architectures and are described in the corresponding sections of the manuscript. This research aims to explore automated skin cancer recognition systems, from simple machine learning algorithms to multimodal ensemble systems based on advanced encoder-decoder models, visual transformers (ViT), and generative and spiking neural networks. In addition, as a result of the analysis, future directions of research, prospects, and potential for further development of automated neural network systems for classifying pigmented skin lesions are discussed.
Affiliation(s)
- U A Lyakhova
- Department of Mathematical Modeling, North-Caucasus Federal University, 355017, Stavropol, Russia.
- P A Lyakhov
- Department of Mathematical Modeling, North-Caucasus Federal University, 355017, Stavropol, Russia; North-Caucasus Center for Mathematical Research, North-Caucasus Federal University, 355017, Stavropol, Russia.
9. Imran M, Akram MU, Salam AA. Transformer-Based Skin Carcinoma Classification using Histopathology Images via Incremental Learning. 2024 14th International Conference on Pattern Recognition Systems (ICPRS) 2024:1-7. DOI: 10.1109/icprs62101.2024.10677812.
Affiliation(s)
- Muhammad Imran
- National University of Sciences and Technology, Dept. of Mechatronics Engr., Islamabad, Pakistan
- Muhammad Usman Akram
- National University of Sciences and Technology, Dept. of Comp. & Software Engr., Islamabad, Pakistan
- Anum Abdul Salam
- National University of Sciences and Technology, Dept. of Comp. & Software Engr., Islamabad, Pakistan
10. Wang Y, Zhen L, Tan TE, Fu H, Feng Y, Wang Z, Xu X, Goh RSM, Ng Y, Calhoun C, Tan GSW, Sun JK, Liu Y, Ting DSW. Geometric Correspondence-Based Multimodal Learning for Ophthalmic Image Analysis. IEEE Transactions on Medical Imaging 2024; 43:1945-1957. PMID: 38206778; DOI: 10.1109/tmi.2024.3352602.
Abstract
Color fundus photography (CFP) and Optical coherence tomography (OCT) images are two of the most widely used modalities in the clinical diagnosis and management of retinal diseases. Despite the widespread use of multimodal imaging in clinical practice, few methods for automated diagnosis of eye diseases utilize correlated and complementary information from multiple modalities effectively. This paper explores how to leverage the information from CFP and OCT images to improve the automated diagnosis of retinal diseases. We propose a novel multimodal learning method, named geometric correspondence-based multimodal learning network (GeCoM-Net), to achieve the fusion of CFP and OCT images. Specifically, inspired by clinical observations, we consider the geometric correspondence between the OCT slice and the CFP region to learn the correlated features of the two modalities for robust fusion. Furthermore, we design a new feature selection strategy to extract discriminative OCT representations by automatically selecting the important feature maps from OCT slices. Unlike the existing multimodal learning methods, GeCoM-Net is the first method that formulates the geometric relationships between the OCT slice and the corresponding region of the CFP image explicitly for CFP and OCT fusion. Experiments have been conducted on a large-scale private dataset and a publicly available dataset to evaluate the effectiveness of GeCoM-Net for diagnosing diabetic macular edema (DME), impaired visual acuity (VA) and glaucoma. The empirical results show that our method outperforms the current state-of-the-art multimodal learning methods by improving the AUROC score 0.4%, 1.9% and 2.9% for DME, VA and glaucoma detection, respectively.
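A toy version of the geometric correspondence idea: pair each OCT B-scan with the band of CFP feature rows it maps onto, then fuse the two descriptors. The row-band correspondence and the linear fusion below are simplifying assumptions, not GeCoM-Net itself:

```python
import torch
import torch.nn as nn

def corresponding_cfp_band(cfp_feat, slice_idx, n_slices):
    """cfp_feat: (B, C, H, W). Pool the band of feature rows that the
    given OCT slice index geometrically projects onto (assumed mapping)."""
    B, C, H, W = cfp_feat.shape
    rows_per_slice = max(H // n_slices, 1)
    r0 = slice_idx * rows_per_slice
    band = cfp_feat[:, :, r0:r0 + rows_per_slice, :]
    return band.mean(dim=(2, 3))                      # (B, C)

fuse = nn.Linear(256 + 256, 256)
cfp = torch.randn(2, 256, 32, 32)                     # CFP feature map
oct_slice_feat = torch.randn(2, 256)                  # encoder output for one B-scan
band_feat = corresponding_cfp_band(cfp, slice_idx=5, n_slices=16)
fused = fuse(torch.cat([oct_slice_feat, band_feat], dim=1))
print(fused.shape)                                    # (2, 256)
```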
11. Morano J, Aresta G, Grechenig C, Schmidt-Erfurth U, Bogunovic H. Deep Multimodal Fusion of Data With Heterogeneous Dimensionality via Projective Networks. IEEE Journal of Biomedical and Health Informatics 2024; 28:2235-2246. PMID: 38206782; DOI: 10.1109/jbhi.2024.3352970.
Abstract
The use of multimodal imaging has led to significant improvements in the diagnosis and treatment of many diseases. Similar to clinical practice, some works have demonstrated the benefits of multimodal fusion for automatic segmentation and classification using deep learning-based methods. However, current segmentation methods are limited to fusion of modalities with the same dimensionality (e.g., 3D + 3D, 2D + 2D), which is not always possible, and the fusion strategies implemented by classification methods are incompatible with localization tasks. In this work, we propose a novel deep learning-based framework for the fusion of multimodal data with heterogeneous dimensionality (e.g., 3D + 2D) that is compatible with localization tasks. The proposed framework extracts the features of the different modalities and projects them into the common feature subspace. The projected features are then fused and further processed to obtain the final prediction. The framework was validated on the following tasks: segmentation of geographic atrophy (GA), a late-stage manifestation of age-related macular degeneration, and segmentation of retinal blood vessels (RBV) in multimodal retinal imaging. Our results show that the proposed method outperforms the state-of-the-art monomodal methods on GA and RBV segmentation by up to 3.10% and 4.64% Dice, respectively.
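The core trick — projecting features of the higher-dimensional modality into the common (2D) subspace before fusion — can be sketched as follows; the pooling-based depth projection and 1x1 convolution are illustrative assumptions rather than the paper's exact projective network:

```python
import torch
import torch.nn as nn

class ProjectiveFusion(nn.Module):
    """Fuse 3D (e.g., volumetric) and 2D features by collapsing the depth
    axis of the 3D features into the shared 2D plane, then fusing."""
    def __init__(self, c3d=64, c2d=64, out=64):
        super().__init__()
        self.proj = nn.Conv2d(c3d, out, kernel_size=1)    # after depth collapse
        self.fuse = nn.Conv2d(out + c2d, out, kernel_size=3, padding=1)

    def forward(self, f3d, f2d):
        # f3d: (B, C, D, H, W); f2d: (B, C2, H, W) with matching H, W
        f3d_plane = self.proj(f3d.mean(dim=2))            # project depth axis away
        return self.fuse(torch.cat([f3d_plane, f2d], dim=1))

m = ProjectiveFusion()
out = m(torch.randn(1, 64, 16, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # (1, 64, 32, 32) — compatible with localization heads
```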
12. Yousefi S, Najjar-Ghabel S, Danehchin R, Band SS, Hsu CC, Mosavi A. Automatic melanoma detection using discrete cosine transform features and metadata on dermoscopic images. Journal of King Saud University - Computer and Information Sciences 2024; 36:101944. DOI: 10.1016/j.jksuci.2024.101944.
13. Zhang L, Xiao X, Wen J, Li H. MDKLoss: Medicine domain knowledge loss for skin lesion recognition. Mathematical Biosciences and Engineering 2024; 21:2671-2690. PMID: 38454701; DOI: 10.3934/mbe.2024118.
Abstract
Methods based on deep learning have shown clear advantages in skin lesion recognition. However, the diversity of lesion shapes and the influence of noise disturbances such as hair, bubbles, and markers lead to large intra-class differences and high inter-class similarity, which existing methods have not yet effectively resolved. In addition, most existing methods enhance the performance of skin lesion recognition by improving deep learning models without considering the guidance that medical knowledge of skin lesions can provide. In this paper, we innovatively construct feature associations between different lesions using medical knowledge, and design a medical domain knowledge loss function (MDKLoss) based on these associations. By expanding the gap between samples of different lesion categories, MDKLoss enhances the capacity of deep learning models to differentiate between lesions and consequently boosts classification performance. Extensive experiments on the ISIC2018 and ISIC2019 datasets show that the proposed method achieves a maximum of 91.6% and 87.6% accuracy, respectively. Furthermore, compared with existing state-of-the-art loss functions, the proposed method demonstrates its effectiveness, universality, and superiority.
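One way to render the idea in code: a class-association matrix built from medical knowledge scales the margin by which class prototypes are pushed apart. The sketch below (the association values, prototype formulation, and hinge form are assumptions, not the published MDKLoss) illustrates the mechanism:

```python
import torch
import torch.nn.functional as F

def knowledge_margin_loss(class_embs, assoc, base_margin=1.0):
    """class_embs: (K, D) learnable class prototypes.
    assoc: (K, K) in [0, 1]; values near 1 mark clinically similar lesions,
    so dissimilar classes get a larger separation margin."""
    K = class_embs.size(0)
    dist = torch.cdist(class_embs, class_embs)            # pairwise distances
    margin = base_margin * (1.0 + (1.0 - assoc))          # dissimilar -> larger margin
    hinge = F.relu(margin - dist)                         # penalize too-close pairs
    off_diag = ~torch.eye(K, dtype=torch.bool)
    return hinge[off_diag].mean()

embs = torch.randn(5, 128, requires_grad=True)
assoc = torch.rand(5, 5)                                  # placeholder knowledge matrix
loss = knowledge_margin_loss(embs, assoc)
loss.backward()
print(loss.item())
```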
Affiliation(s)
- Li Zhang
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou 510515, China
- Department of Dermatology, Guangdong Second Provincial General Hospital, Guangzhou 510317, China
- Department of Dermatology, Ningbo No. 6 Hospital, Ningbo 315040, China
- Xiangling Xiao
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China
- Ju Wen
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou 510515, China
- Department of Dermatology, Guangdong Second Provincial General Hospital, Guangzhou 510317, China
- Huihui Li
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China
14. Zhang D, Li A, Wu W, Yu L, Kang X, Huo X. CR-Conformer: a fusion network for clinical skin lesion classification. Medical & Biological Engineering & Computing 2024; 62:85-94. PMID: 37653185; DOI: 10.1007/s11517-023-02904-0.
Abstract
Deep convolutional neural network (DCNN) models have been widely used to diagnose skin lesions, and some have achieved diagnostic results comparable to or even better than dermatologists. Most publicly available skin lesion datasets used to train DCNNs consist of dermoscopic images, yet expensive dermoscopic equipment is rarely available in rural clinics or small hospitals in remote areas. Therefore, it is of great significance to rely on clinical images for computer-aided diagnosis of skin lesions. This paper proposes an improved dual-branch fusion network called CR-Conformer. It integrates a DCNN branch that can effectively extract local features and a Transformer branch that can extract global features to capture more valuable features in clinical skin lesion images. In addition, we improved the DCNN branch to extract enhanced features in four directions through a convolutional rotation operation, further improving the classification performance on clinical skin lesion images. To verify the effectiveness of our proposed method, we conducted comprehensive tests on a private dataset named XJUSL, which contains ten types of clinical skin lesions. The test results indicate that our proposed method reduced the number of parameters by 11.17 M and improved the accuracy of clinical skin lesion image classification by 1.08%. It has the potential to enable automatic diagnosis of skin lesions on mobile devices.
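The convolutional rotation operation can be illustrated by sharing one convolution across four rotated copies of the feature map and rotating the responses back; averaging the four responses is an assumption (the paper's aggregation may differ):

```python
import torch
import torch.nn as nn

class RotationConv(nn.Module):
    """Apply the same convolution in four directions (0/90/180/270 degrees)
    and aggregate the aligned responses. Assumes square feature maps."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):                          # x: (B, C, H, W), H == W
        outs = []
        for k in range(4):                         # four rotation directions
            y = self.conv(torch.rot90(x, k, dims=(2, 3)))
            outs.append(torch.rot90(y, -k, dims=(2, 3)))  # rotate back
        return torch.stack(outs).mean(dim=0)

m = RotationConv(32, 64)
print(m(torch.randn(2, 32, 28, 28)).shape)         # (2, 64, 28, 28)
```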
Affiliation(s)
- Dezhi Zhang
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, 830000, China
- Xinjiang Clinical Research Center for Dermatologic Diseases, Urumqi, China
- Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
- Aolun Li
- School of Information Science and Engineering, Xinjiang University, Urumqi, China
- Weidong Wu
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, 830000, China.
- Xinjiang Clinical Research Center for Dermatologic Diseases, Urumqi, China.
- Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China.
- Long Yu
- School of Information Science and Engineering, Xinjiang University, Urumqi, China
- Xiaojing Kang
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, 830000, China
- Xinjiang Clinical Research Center for Dermatologic Diseases, Urumqi, China
- Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
- Xiangzuo Huo
- School of Information Science and Engineering, Xinjiang University, Urumqi, China
15. Guo R, Tian X, Lin H, McKenna S, Li HD, Guo F, Liu J. Graph-Based Fusion of Imaging, Genetic and Clinical Data for Degenerative Disease Diagnosis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:57-68. PMID: 37991907; DOI: 10.1109/tcbb.2023.3335369.
Abstract
Graph learning methods have achieved noteworthy performance in disease diagnosis due to their ability to represent unstructured information such as inter-subject relationships. While it has been shown that imaging, genetic and clinical data are crucial for degenerative disease diagnosis, existing methods rarely consider how best to use their relationships. How best to utilize information from imaging, genetic and clinical data remains a challenging problem. This study proposes a novel graph-based fusion (GBF) approach to meet this challenge. To extract effective imaging-genetic features, we propose an imaging-genetic fusion module which uses an attention mechanism to obtain modality-specific and joint representations within and between imaging and genetic data. Then, considering the effectiveness of clinical information for diagnosing degenerative diseases, we propose a multi-graph fusion module to further fuse imaging-genetic and clinical features, which adopts a learnable graph construction strategy and a graph ensemble method. Experimental results on two benchmarks for degenerative disease diagnosis (Alzheimer's Disease Neuroimaging Initiative and Parkinson's Progression Markers Initiative) demonstrate its effectiveness compared to state-of-the-art graph-based methods. Our findings should help guide further development of graph-based models for dealing with imaging, genetic and clinical data.
16. Wang Z, Zhang L, Shu X, Wang Y, Feng Y. Consistent representation via contrastive learning for skin lesion diagnosis. Computer Methods and Programs in Biomedicine 2023; 242:107826. PMID: 37837885; DOI: 10.1016/j.cmpb.2023.107826.
Abstract
BACKGROUND: Skin lesions are a prevalent ailment, with melanoma emerging as a particularly perilous variant. Encouragingly, artificial intelligence displays promising potential for early detection, yet its integration within clinical contexts, particularly involving multi-modal data, presents challenges. While multi-modal approaches enhance diagnostic efficacy, the influence of modality bias is often disregarded. METHODS: In this investigation, a multi-modal feature learning technique termed "Contrast-based Consistent Representation Disentanglement" for dermatological diagnosis is introduced. This approach employs adversarial domain adaptation to disentangle features from distinct modalities, fostering a shared representation. Furthermore, a contrastive learning strategy is devised to incentivize the model to preserve uniformity in common lesion attributes across modalities. By emphasizing the learning of a uniform representation, this approach circumvents reliance on supplementary data. RESULTS: Assessment of the proposed technique on the seven-point criteria evaluation dataset yields an average accuracy of 76.1% for multi-classification tasks, surpassing state-of-the-art methods. The approach tackles modality bias, enabling the acquisition of a consistent representation of common lesion appearances across diverse modalities that transcends modality boundaries. This study underscores the latent potential of multi-modal feature learning in dermatological diagnosis. CONCLUSION: In summation, a multi-modal feature learning strategy is posited for dermatological diagnosis. This approach outperforms other state-of-the-art methods, underscoring its capacity to enhance diagnostic precision for skin lesions.
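The adversarial disentanglement ingredient is commonly realized with a gradient-reversal layer (GRL): a modality discriminator is trained normally while reversed gradients push the encoder toward modality-invariant shared features. The GRL sketch below is a standard construction and an assumption about the paper's exact mechanism:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scaled, sign-flipped gradient backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None             # reversed gradient to encoder

modality_disc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
feat = torch.randn(8, 128, requires_grad=True)    # shared features, both modalities
logits = modality_disc(GradReverse.apply(feat, 1.0))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
loss.backward()                                   # encoder gradients are reversed
print(feat.grad.shape)
```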
Affiliation(s)
- Zizhou Wang
- College of Computer Science, Sichuan University, Chengdu 610065, China; Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore.
- Lei Zhang
- College of Computer Science, Sichuan University, Chengdu 610065, China.
- Xin Shu
- College of Computer Science, Sichuan University, Chengdu 610065, China.
- Yan Wang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore.
- Yangqin Feng
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore.
17. You H, Wang J, Ma R, Chen Y, Li L, Song C, Dong Z, Feng S, Zhou X. Clinical Interpretability of Deep Learning for Predicting Microvascular Invasion in Hepatocellular Carcinoma by Using Attention Mechanism. Bioengineering (Basel) 2023; 10:948. PMID: 37627833; PMCID: PMC10451856; DOI: 10.3390/bioengineering10080948.
Abstract
Preoperative prediction of microvascular invasion (MVI) is essential for management decisions in hepatocellular carcinoma (HCC). Deep learning-based prediction models of MVI are numerous but lack clinical interpretation due to their "black-box" nature. Consequently, we aimed to use an attention-guided feature fusion network, including intra- and inter-attention modules, to solve this problem. This retrospective study recruited 210 HCC patients who underwent gadoxetate-enhanced MRI examination before surgery. The MRI scans in the pre-contrast, arterial, portal, and hepatobiliary phases (hepatobiliary phase: HBP) were used to develop single-phase and multi-phase models. Attention weights provided by the attention modules were used to obtain visual explanations of predictive decisions. The four-phase fusion model achieved the highest area under the curve (AUC) of 0.92 (95% CI: 0.84-1.00), and the other models achieved AUCs of 0.75-0.91. Attention heatmaps of the collaborative-attention layers revealed that tumor margins in all phases and peritumoral areas in the arterial phase and HBP were salient regions for MVI prediction. Heatmaps of weights in the fully connected layers showed that the HBP contributed the most to MVI prediction. Our study is the first to implement self-attention and collaborative-attention to reveal the relationship between deep features and MVI, improving the clinical interpretation of prediction models. The clinical interpretability offers radiologists and clinicians more confidence to apply deep learning models in clinical practice, helping HCC patients formulate personalized therapies.
Affiliation(s)
- Shiting Feng
- Department of Radiology, The First Affiliated Hospital, Sun Yat-sen University, 58th the Second Zhongshan Road, Guangzhou 510080, China; (H.Y.); (J.W.); (R.M.); (Y.C.); (L.L.); (C.S.); (Z.D.)
- Xiaoqi Zhou
- Department of Radiology, The First Affiliated Hospital, Sun Yat-sen University, 58th the Second Zhongshan Road, Guangzhou 510080, China; (H.Y.); (J.W.); (R.M.); (Y.C.); (L.L.); (C.S.); (Z.D.)
18. Feng Y, Sim Zheng Ting J, Xu X, Bee Kun C, Ong Tien En E, Irawan Tan Wee Jun H, Ting Y, Lei X, Chen WX, Wang Y, Li S, Cui Y, Wang Z, Zhen L, Liu Y, Siow Mong Goh R, Tan CH. Deep Neural Network Augments Performance of Junior Residents in Diagnosing COVID-19 Pneumonia on Chest Radiographs. Diagnostics (Basel) 2023; 13:1397. PMID: 37189498; DOI: 10.3390/diagnostics13081397.
Abstract
Chest X-rays (CXRs) are essential in the preliminary radiographic assessment of patients affected by COVID-19. Junior residents, as the first point-of-contact in the diagnostic process, are expected to interpret these CXRs accurately. We aimed to assess the effectiveness of a deep neural network in distinguishing COVID-19 from other types of pneumonia, and to determine its potential contribution to improving the diagnostic precision of less experienced residents. A total of 5051 CXRs were utilized to develop and assess an artificial intelligence (AI) model capable of performing three-class classification, namely non-pneumonia, non-COVID-19 pneumonia, and COVID-19 pneumonia. Additionally, an external dataset comprising 500 distinct CXRs was examined by three junior residents with differing levels of training. The CXRs were evaluated both with and without AI assistance. The AI model demonstrated impressive performance, with an Area under the ROC Curve (AUC) of 0.9518 on the internal test set and 0.8594 on the external test set, which improves the AUC score of the current state-of-the-art algorithms by 1.25% and 4.26%, respectively. When assisted by the AI model, the performance of the junior residents improved in a manner that was inversely proportional to their level of training. Among the three junior residents, two showed significant improvement with the assistance of AI. This research highlights the novel development of an AI model for three-class CXR classification and its potential to augment junior residents' diagnostic accuracy, with validation on external data to demonstrate real-world applicability. In practical use, the AI model effectively supported junior residents in interpreting CXRs, boosting their confidence in diagnosis. While the AI model improved junior residents' performance, a decline in performance was observed on the external test compared to the internal test set. This suggests a domain shift between the patient dataset and the external dataset, highlighting the need for future research on test-time training domain adaptation to address this issue.
Affiliation(s)
- Yangqin Feng
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Jordan Sim Zheng Ting
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Xinxing Xu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Chew Bee Kun
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Edward Ong Tien En
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Hendra Irawan Tan Wee Jun
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Yonghan Ting
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Xiaofeng Lei
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Wen-Xiang Chen
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Yan Wang
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Shaohua Li
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Yingnan Cui
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Zizhou Wang
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Liangli Zhen
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Yong Liu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Rick Siow Mong Goh
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Cher Heng Tan
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Lee Kong Chian School of Medicine, 11, Mandalay Road, Singapore 308232, Singapore
19. Iqbal S, Qureshi AN, Li J, Mahmood T. On the Analyses of Medical Images Using Traditional Machine Learning Techniques and Convolutional Neural Networks. Archives of Computational Methods in Engineering 2023; 30:3173-3233. PMID: 37260910; PMCID: PMC10071480; DOI: 10.1007/s11831-023-09899-9.
Abstract
Convolutional neural networks (CNNs) have shown impressive performance in many areas, especially object detection, segmentation, 2D and 3D reconstruction, information retrieval, medical image registration, multi-lingual translation, natural language processing, anomaly detection in video, and speech recognition. A CNN is a special type of neural network with a compelling and effective ability to learn features at several levels of abstraction from the data. Recently, different ideas in deep learning (DL), such as new activation functions, hyperparameter optimization, regularization, momentum, and loss functions, have improved the performance, operation, and execution of CNNs. Innovations in internal CNN architecture and in representational style have also significantly improved performance. This survey focuses on an internal taxonomy of deep learning and on different convolutional neural network models, especially the depth and width of models, and in addition covers CNN components, applications, and the current challenges of deep learning.
Affiliation(s)
- Saeed Iqbal
- Department of Computer Science, Faculty of Information Technology & Computer Science, University of Central Punjab, Lahore, Punjab 54000, Pakistan
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Adnan N. Qureshi
- Department of Computer Science, Faculty of Information Technology & Computer Science, University of Central Punjab, Lahore, Punjab 54000, Pakistan
- Jianqiang Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Beijing Engineering Research Center for IoT Software and Systems, Beijing University of Technology, Beijing 100124, China
- Tariq Mahmood
- Artificial Intelligence and Data Analytics (AIDA) Lab, College of Computer & Information Sciences (CCIS), Prince Sultan University, Riyadh 11586, Kingdom of Saudi Arabia
20. Hasan MK, Ahamad MA, Yap CH, Yang G. A survey, review, and future trends of skin lesion segmentation and classification. Computers in Biology and Medicine 2023; 155:106624. PMID: 36774890; DOI: 10.1016/j.compbiomed.2023.106624.
Abstract
The Computer-aided Diagnosis or Detection (CAD) approach for skin lesion analysis is an emerging field of research that has the potential to alleviate the burden and cost of skin cancer screening. Researchers have recently indicated increasing interest in developing such CAD systems, with the intention of providing a user-friendly tool to dermatologists to reduce the challenges encountered or associated with manual inspection. This article aims to provide a comprehensive literature survey and review of a total of 594 publications (356 for skin lesion segmentation and 238 for skin lesion classification) published between 2011 and 2022. These articles are analyzed and summarized in a number of different ways to contribute vital information regarding the methods for the development of CAD systems. These ways include: relevant and essential definitions and theories, input data (dataset utilization, preprocessing, augmentations, and fixing imbalance problems), method configuration (techniques, architectures, module frameworks, and losses), training tactics (hyperparameter settings), and evaluation criteria. We intend to investigate a variety of performance-enhancing approaches, including ensemble and post-processing. We also discuss these dimensions to reveal their current trends based on utilization frequencies. In addition, we highlight the primary difficulties associated with evaluating skin lesion segmentation and classification systems using minimal datasets, as well as the potential solutions to these difficulties. Findings, recommendations, and trends are disclosed to inform future research on developing an automated and robust CAD system for skin lesion analysis.
Affiliation(s)
- Md Kamrul Hasan
- Department of Bioengineering, Imperial College London, UK; Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh.
- Md Asif Ahamad
- Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh.
- Choon Hwai Yap
- Department of Bioengineering, Imperial College London, UK.
- Guang Yang
- National Heart and Lung Institute, Imperial College London, UK; Cardiovascular Research Centre, Royal Brompton Hospital, UK.
21. Li D, Zhang L, Zhang J, Xie X. Convolutional Feature Descriptor Selection for Mammogram Classification. IEEE Journal of Biomedical and Health Informatics 2023; 27:1467-1476. PMID: 37018253; DOI: 10.1109/jbhi.2022.3233535.
Abstract
Breast cancer was the most commonly diagnosed cancer among women worldwide in 2020. Recently, several deep learning-based classification approaches have been proposed to screen breast cancer in mammograms. However, most of these approaches require additional detection or segmentation annotations. Meanwhile, some other image-level label-based methods often pay insufficient attention to lesion areas, which are critical for diagnosis. This study designs a novel deep-learning method for automatically diagnosing breast cancer in mammography, which focuses on the local lesion areas and only utilizes image-level classification labels. In this study, we propose to select discriminative feature descriptors from feature maps instead of identifying lesion areas using precise annotations. And we design a novel adaptive convolutional feature descriptor selection (AFDS) structure based on the distribution of the deep activation map. Specifically, we adopt the triangle threshold strategy to calculate a specific threshold for guiding the activation map to determine which feature descriptors (local areas) are discriminative. Ablation experiments and visualization analysis indicate that the AFDS structure makes the model easier to learn the difference between malignant and benign/normal lesions. Furthermore, since the AFDS structure can be regarded as a highly efficient pooling structure, it can be easily plugged into most existing convolutional neural networks with negligible effort and time consumption. Experimental results on two publicly available INbreast and CBIS-DDSM datasets indicate that the proposed method performs satisfactorily compared with state-of-the-art methods.
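The AFDS selection step can be sketched directly: threshold the channel-averaged activation map with the triangle method and pool only the surviving feature descriptors. The channel-mean saliency and average pooling below are assumptions; the triangle threshold itself is as described:

```python
import torch
from skimage.filters import threshold_triangle

def afds_pool(feat):
    """feat: (B, C, H, W). Keep spatial positions whose channel-mean activation
    exceeds a per-image triangle threshold; average-pool the survivors."""
    act = feat.mean(dim=1)                               # (B, H, W) activation map
    pooled = []
    for i in range(feat.size(0)):
        a = act[i].detach().cpu().numpy()
        t = threshold_triangle(a)                        # data-driven threshold
        mask = torch.from_numpy(a > t).to(feat.device)   # discriminative positions
        sel = feat[i][:, mask]                           # (C, n_selected) descriptors
        pooled.append(sel.mean(dim=1) if sel.numel()
                      else feat[i].mean(dim=(1, 2)))     # fallback: global pooling
    return torch.stack(pooled)                           # (B, C)

print(afds_pool(torch.randn(2, 512, 14, 14)).shape)      # (2, 512)
```

Because it reduces a (C, H, W) map to a (C,) vector, the structure behaves like a drop-in pooling layer, which matches the abstract's remark that AFDS can be plugged into most existing CNNs.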
22. Integrated Design of Optimized Weighted Deep Feature Fusion Strategies for Skin Lesion Image Classification. Cancers (Basel) 2022; 14:5716. PMID: 36428808; PMCID: PMC9688253; DOI: 10.3390/cancers14225716.
Abstract
This study focuses on pre-processing the HAM10000 and BCN20000 skin lesion datasets to select important features that drive proper skin cancer classification. In this work, three feature fusion strategies are proposed that utilize three pre-trained Convolutional Neural Network (CNN) models, namely VGG16, EfficientNet B0, and ResNet50, to select important features based on the weights of the features; this scheme is coined the Adaptive Weighted Feature Set (AWFS). Two further strategies, the Model-based Optimized Weighted Feature Set (MOWFS) and the Feature-based Optimized Weighted Feature Set (FOWFS), are proposed in which the weights are chosen optimally and adaptively using a meta-heuristic artificial jellyfish (AJS) algorithm. MOWFS-AJS is a model-specific approach, whereas FOWFS-AJS is a feature-specific approach to optimizing the weights chosen for obtaining optimal feature sets. The performance of the three proposed feature selection strategies is evaluated using Decision Tree (DT), Naïve Bayesian (NB), Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM) classifiers and measured through accuracy, precision, sensitivity, and F1-score. Additionally, the area under the receiver operating characteristic curve (AUC-ROC) is plotted, and FOWFS-AJS shows the best accuracy with the SVM, at 94.05% and 94.90% for the HAM10000 and BCN20000 datasets, respectively. Finally, the experimental results are analyzed using a non-parametric Friedman statistical test and the computational times are recorded; the results show that, of the three proposed feature selection strategies, FOWFS-AJS performs best because of the quick convergence imparted by AJS.
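The weight-optimization loop can be illustrated with a stand-in optimizer: per-feature fusion weights are scored by the validation accuracy of an SVM, as in the FOWFS idea, but with plain random search replacing the artificial jellyfish algorithm and synthetic features replacing real CNN embeddings:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder deep features from three backbones (e.g., VGG16/EfficientNet B0/ResNet50).
X = np.hstack([rng.normal(size=(200, 32)) for _ in range(3)])
y = rng.integers(0, 2, size=200)
Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=0.3, random_state=0)

best_w, best_acc = None, -1.0
for _ in range(50):                            # stand-in for AJS iterations
    w = rng.uniform(0, 1, size=X.shape[1])     # candidate per-feature weights
    clf = SVC().fit(Xtr * w, ytr)              # train on weighted features
    acc = clf.score(Xva * w, yva)              # fitness = validation accuracy
    if acc > best_acc:
        best_w, best_acc = w, acc
print(f"best validation accuracy: {best_acc:.3f}")
```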