1
Yang X, Peng P, Li D, Ye Y, Lu X. Adaptive decoupling-fusion in Siamese network for image classification. Neural Netw 2025;187:107346. PMID: 40101559. DOI: 10.1016/j.neunet.2025.107346.
Abstract
Convolutional neural networks (CNNs) are highly regarded for their ability to extract semantic information from visual inputs. However, this capability often leads to the inadvertent loss of important visual details. In this paper, we introduce an Adaptive Decoupling Fusion (ADF) module designed to preserve these valuable visual details and integrate seamlessly with existing hierarchical models. Our approach emphasizes retaining and leveraging appearance information from the network's shallow layers to enhance semantic understanding. We first decouple the appearance information from one branch of a Siamese network and embed it into the deep feature space of the other branch. This facilitates a synergistic interaction: one branch supplies appearance information that benefits semantic understanding, while the other integrates this information into the semantic space. Traditional Siamese networks typically use shared weights, which constrains the diversity of features that can be learned. To address this, we propose a differentiated collaborative learning strategy in which both branches receive the same input but are trained with cross-entropy loss, allowing them to learn distinct weights. This enhances the network's adaptability to specific tasks. To further optimize the decoupling and fusion, we introduce a Mapper module featuring depthwise separable convolution and a gated fusion mechanism. This module regulates the information flow between branches, balancing appearance and semantic information. Under fully self-supervised conditions, utilizing only minimal data augmentation, we achieve a top-1 accuracy of 81.11% on the ImageNet-1k dataset with ADF-ResNeXt-101.
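The gated fusion the abstract describes admits a compact illustration. The NumPy sketch below is illustrative only: the shapes, the pointwise (1x1-convolution-style) gate parameterization, and all names (`gated_fusion`, `w`, `b`) are assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(appearance, semantic, w, b):
    """Fuse an appearance feature map with a semantic feature map via a gate.

    appearance, semantic: arrays of shape (C, H, W).
    w: (C, 2C) pointwise weights, b: (C,) bias -- both hypothetical here.
    The gate g in (0, 1) decides, per channel and pixel, how much shallow
    appearance information is injected into the deep semantic space.
    """
    x = np.concatenate([appearance, semantic], axis=0)            # (2C, H, W)
    gate = sigmoid(np.einsum('oc,chw->ohw', w, x) + b[:, None, None])
    return gate * appearance + (1.0 - gate) * semantic            # (C, H, W)

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
app = rng.standard_normal((C, H, W))
sem = rng.standard_normal((C, H, W))
fused = gated_fusion(app, sem, 0.1 * rng.standard_normal((C, 2 * C)),
                     np.zeros(C))
print(fused.shape)  # (4, 8, 8)
```

Because the gate is a convex combination, every fused value lies between the corresponding appearance and semantic values, so neither branch can be entirely suppressed.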
Affiliation(s)
- Xi Yang
- College of Big Data and Information Engineering, Guizhou University, Guiyang, China.
- Pai Peng
- College of Big Data and Information Engineering, Guizhou University, Guiyang, China.
- Danyang Li
- College of Big Data and Information Engineering, Guizhou University, Guiyang, China.
- Yinghao Ye
- College of Big Data and Information Engineering, Guizhou University, Guiyang, China.
- Xiaohuan Lu
- College of Big Data and Information Engineering, Guizhou University, Guiyang, China.
2
Huang Y, Chang A, Dou H, Tao X, Zhou X, Cao Y, Huang R, Frangi AF, Bao L, Yang X, Ni D. Flip Learning: Weakly supervised erase to segment nodules in breast ultrasound. Med Image Anal 2025;102:103552. PMID: 40179628. DOI: 10.1016/j.media.2025.103552.
Abstract
Accurate segmentation of nodules in both 2D breast ultrasound (BUS) and 3D automated breast ultrasound (ABUS) is crucial for clinical diagnosis and treatment planning. Therefore, developing an automated system for nodule segmentation can enhance user independence and expedite clinical analysis. Unlike fully-supervised learning, weakly-supervised segmentation (WSS) can streamline the laborious and intricate annotation process. However, current WSS methods face challenges in achieving precise nodule segmentation, as many of them depend on inaccurate activation maps or inefficient pseudo-mask generation algorithms. In this study, we introduce a novel multi-agent reinforcement learning-based WSS framework called Flip Learning, which relies solely on 2D/3D boxes for accurate segmentation. Specifically, multiple agents are employed to erase the target from the box to facilitate classification tag flipping, with the erased region serving as the predicted segmentation mask. The key contributions of this research are as follows: (1) Adoption of a superpixel/supervoxel-based approach to encode the standardized environment, capturing boundary priors and expediting the learning process. (2) Introduction of three meticulously designed rewards, comprising a classification score reward and two intensity distribution rewards, to steer the agents' erasing process precisely, thereby avoiding both under- and over-segmentation. (3) Implementation of a progressive curriculum learning strategy to enable agents to interact with the environment in a progressively challenging manner, thereby enhancing learning efficiency. Extensively validated on large in-house BUS and ABUS datasets, our Flip Learning method outperforms state-of-the-art WSS methods and foundation models, and achieves performance comparable to fully-supervised learning algorithms.
Affiliation(s)
- Yuhao Huang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Ao Chang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Haoran Dou
- Centre for Computational Imaging and Simulation Technologies in Biomedicine (CISTIB), University of Leeds, Leeds, UK; Department of Computer Science, School of Engineering, University of Manchester, Manchester, UK
- Xing Tao
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Xinrui Zhou
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Yan Cao
- Shenzhen RayShape Medical Technology Co., Ltd, Shenzhen, China
- Ruobing Huang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Alejandro F Frangi
- Division of Informatics, Imaging and Data Science, School of Health Sciences, University of Manchester, Manchester, UK; Department of Computer Science, School of Engineering, University of Manchester, Manchester, UK; Medical Imaging Research Center (MIRC), Department of Electrical Engineering, Department of Cardiovascular Sciences, KU Leuven, Belgium; Alan Turing Institute, London, UK; NIHR Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, Manchester, UK
- Lingyun Bao
- Department of Ultrasound, Affiliated Hangzhou First People's Hospital, School of Medicine, Westlake University, China.
- Xin Yang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China.
- Dong Ni
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China; School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China.
3
Ying Z, Li Q, Lian Z, Hou J, Lin T, Wang T. Understanding Convolutional Neural Networks From Excitations. IEEE Trans Neural Netw Learn Syst 2025;36:8227-8239. PMID: 39088494. DOI: 10.1109/tnnls.2024.3430978.
Abstract
Saliency maps have proven to be a highly effective approach for explaining the decisions of convolutional neural networks (CNNs). However, existing methodologies predominantly rely on gradients, which constrains their ability to explain complex models. Furthermore, such approaches are not fully adept at leveraging negative gradient information to improve interpretive accuracy. In this study, we present a novel concept, termed positive and negative excitation (PANE), which enables the direct extraction of PANE for each layer, thus enabling complete layer-by-layer information utilization without gradients. To organize these excitations into final saliency maps, we introduce a double-chain backpropagation procedure. A comprehensive experimental evaluation, encompassing both binary classification and multiclassification tasks, was conducted to gauge the effectiveness of our proposed method. Encouragingly, the results show that our approach offers a significant improvement over state-of-the-art methods in terms of salient pixel removal, minor pixel removal, and inconspicuous adversarial perturbation generation guidance. In addition, we verify the correlation between PANEs.
4
He J, Wang X, Wang Z, Xie R, Zhang Z, Liu TM, Cai Y, Chen L. Interpretable deep learning method to predict wound healing progress based on collagen fibers in wound tissue. Comput Biol Med 2025;191:110110. PMID: 40198981. DOI: 10.1016/j.compbiomed.2025.110110.
Abstract
BACKGROUND AND OBJECTIVE: The dynamic evolution of collagen fibers during wound healing is crucial for assessing repair progression, guiding clinical treatment, and drug screening. Current quantitative methods analyzing collagen spatial patterns (density, orientation variance) lack established criteria to both stratify distinct healing periods and detect delayed healing conditions, necessitating a novel classification method for wound healing status based on collagen fibers.
METHODS: We propose a deep learning method to classify various time points of wound healing and delayed healing using histological images of skin tissue. We fine-tune a pre-trained VGG16 model and enhance it with an interpretable framework that combines LayerCAM and Guided Backpropagation, leveraging model gradients and features to visually identify the tissue regions driving model predictions.
RESULTS: Our model achieved 85% accuracy in a five-class classification task (normal skin, wound skin at 0, 3, 7, and 10 days) and 78% in a three-class task (normal skin, wound skin at 0 days, diabetic wound skin at 10 days). Our interpretable framework accurately localizes collagen fibers without pixel-level annotations, demonstrating that our model classifies healing periods and delayed healing based on collagen regions in histological images rather than other, less relevant tissue structures.
CONCLUSIONS: Our deep learning method leverages collagen fiber features to predict various time points of wound healing and delayed healing with high accuracy and visual interpretability, enhancing doctors' trust in model decisions. This could lead to more precise and effective wound treatment practices.
Affiliation(s)
- Juan He
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, 999078, Macau; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Xiaoyan Wang
- Institute of Translational Medicine, Faculty of Health Sciences & Ministry of Education Frontiers Science Center for Precision Oncology, University of Macau, 999078, Macau
- Zhengshan Wang
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, 999078, Macau
- Ruitao Xie
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Zhiming Zhang
- Institute of Translational Medicine, Faculty of Health Sciences & Ministry of Education Frontiers Science Center for Precision Oncology, University of Macau, 999078, Macau
- Tzu-Ming Liu
- Institute of Translational Medicine, Faculty of Health Sciences & Ministry of Education Frontiers Science Center for Precision Oncology, University of Macau, 999078, Macau
- Yunpeng Cai
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Long Chen
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, 999078, Macau
5
Yang W, Wang X, Qi W, Wang W. LGFormer: integrating local and global representations for EEG decoding. J Neural Eng 2025;22:026042. PMID: 40138736. DOI: 10.1088/1741-2552/adc5a3.
Abstract
Objective: Electroencephalography (EEG) decoding is challenging because of its temporal variability and low signal-to-noise ratio, which complicate the extraction of meaningful information from signals. Although convolutional neural networks (CNNs) effectively extract local features from EEG signals, they are constrained by restricted receptive fields. In contrast, transformers excel at capturing global dependencies through self-attention mechanisms but often require extensive training data and computational resources, which limits their efficiency on EEG datasets with limited samples.
Approach: In this paper, we propose LGFormer, a hybrid network designed to efficiently learn both local and global representations for EEG decoding. LGFormer employs a deep attention module to extract global information from EEG signals, dynamically adjusting the focus of CNNs. Subsequently, LGFormer incorporates a local-enhanced transformer, combining the strengths of CNNs and transformers to achieve multiscale perception from local to global. Despite integrating multiple advanced techniques, LGFormer maintains a lightweight design and training efficiency.
Main results: LGFormer achieves state-of-the-art performance within 200 training epochs across four public datasets, including motor imagery, cognitive workload, and error-related negativity decoding tasks. Additionally, we propose a novel spatial and temporal attention visualization method, revealing that LGFormer captures discriminative spatial and temporal features, enhancing model interpretability and providing insights into its decision-making process.
Significance: In summary, LGFormer demonstrates superior performance while maintaining high training efficiency across different tasks, highlighting its potential as a versatile and practical model for EEG decoding.
Affiliation(s)
- Wenjie Yang
- CAS Key Laboratory of Space Manufacturing Technology, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Xingfu Wang
- CAS Key Laboratory of Space Manufacturing Technology, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Wenxia Qi
- CAS Key Laboratory of Space Manufacturing Technology, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Wei Wang
- CAS Key Laboratory of Space Manufacturing Technology, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
6
Chen Z, Wang S, Cao L, Shen Y, Ji R. Adaptive Zone Learning for Weakly Supervised Object Localization. IEEE Trans Neural Netw Learn Syst 2025;36:7211-7224. PMID: 38833389. DOI: 10.1109/tnnls.2024.3392948.
Abstract
Weakly supervised object localization (WSOL) is a pivotal task in computer vision, entailing the localization of objects using merely image-level labels. Contemporary approaches in WSOL have leveraged foreground prediction maps (FPMs), yielding commendable outcomes. However, these existing FPM-based techniques are predominantly confined to rudimentary strategies of either augmenting the foreground or diminishing the background presence. We argue for the exploration and exploitation of the intricate interplay between the object's foreground and its background to achieve efficient object localization. In this paper, we introduce an innovative framework, termed adaptive zone learning (AZL), which operates on a coarse-to-fine basis to refine FPMs through three adaptive zone mechanisms. First, an adversarial learning mechanism (ALM) is employed, orchestrating an interplay between the foreground and background regions that accentuates coarse-grained object regions in a mutually adversarial manner. Subsequently, an oriented learning mechanism (OLM) harnesses local insights from both foreground and background in a fine-grained manner, delineating object regions with greater granularity and thereby generating better FPMs. Furthermore, we propose a reinforced learning mechanism (RLM) as a compensatory mechanism for the adversarial design, by which undesirable foreground maps are refined again. Extensive experiments on the CUB-200-2011 and ILSVRC datasets demonstrate that AZL achieves significant and consistent performance improvements over other state-of-the-art WSOL methods.
7
Zhang L, Yin K, Lee SW. Semantic prioritization in visual counterfactual explanations with weighted segmentation and auto-adaptive region selection. Neural Netw 2025;184:107097. PMID: 39765041. DOI: 10.1016/j.neunet.2024.107097.
Abstract
In the domain of non-generative visual counterfactual explanations (CE), traditional techniques frequently involve the substitution of sections within a query image with corresponding sections from distractor images. Such methods have historically overlooked the semantic relevance of the replacement regions to the target object, thereby impairing the model's interpretability and hindering the editing workflow. Addressing these challenges, the present study introduces an innovative methodology named the Weighted Semantic Map with Auto-adaptive Candidate Editing Network (WSAE-Net). It is characterized by two significant advancements: the determination of a weighted semantic map and the auto-adaptive candidate editing sequence. First, the generation of the weighted semantic map is designed to maximize the reduction of non-semantic feature units that need to be computed, thereby optimizing computational efficiency. Second, the auto-adaptive candidate editing sequences are designed to determine the optimal computational order among the feature units to be processed, thereby ensuring the efficient generation of counterfactuals while maintaining the semantic relevance of the replacement feature units to the target object. Through comprehensive experimentation, our methodology demonstrates superior performance, contributing to a more lucid and in-depth understanding of visual counterfactual explanations.
Affiliation(s)
- Lintong Zhang
- Department of Artificial Intelligence, Korea University, 02841, Seoul, Republic of Korea.
- Kang Yin
- Department of Artificial Intelligence, Korea University, 02841, Seoul, Republic of Korea.
- Seong-Whan Lee
- Department of Artificial Intelligence, Korea University, 02841, Seoul, Republic of Korea.
8
Xu Z, Liu H, Fu G, Zheng R, Zayed T, Liu S. Interpretable deep learning for acoustic leak detection in water distribution systems. Water Res 2025;273:123076. PMID: 39756226. DOI: 10.1016/j.watres.2024.123076.
Abstract
Leak detection is crucial for ensuring the safety of water systems and conserving water resources. However, current research on machine learning methods for leak detection focuses excessively on model development while neglecting model interpretability, which leads to transparency and credibility issues in practical applications. This study proposes a multi-channel convolutional neural network (MCNN) model and compares its performance with that of an existing benchmark algorithm, the frequency convolutional neural network (FCNN), using both experimental and real field data. Additionally, Multi-channel Gradient-weighted Class Activation Mapping (MGrad-CAM) was introduced to visualize the decision-making criterion of the model and identify critical signatures of acoustic signals. The study also employed clustering methods to analyze the impact mechanisms of various factors (i.e., pressure, leak flow rate, and distance) on acoustic signals from a machine learning perspective. Results show that the MCNN method outperformed the FCNN across laboratory and real-world datasets, achieving a high accuracy rate of 95.4% in real-field scenarios. Using MGrad-CAM, the interpretability of the deep learning model was analyzed, successfully identifying and visualizing the critical signatures of leak acoustic signals with more precise and fine-grained details. Additionally, this study clusters leak signals into two patterns and confirms that the bandwidth of the leak acoustic signal increases with higher pressure, closer proximity to the leak, and higher leak flow rates. It has also been discovered that the high-frequency components of the signal assist the model in more accurately detecting leaks. This study provides a new perspective for understanding the decision-making criterion of the leak detection model and the mechanism of leak acoustic signal generation.
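MGrad-CAM builds on Gradient-weighted Class Activation Mapping (Grad-CAM), which weights each feature map by its average gradient with respect to the class score. The NumPy sketch below shows the basic Grad-CAM computation only; it is a generic illustration with toy data, not the paper's multi-channel variant.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Minimal Grad-CAM: weight each feature map by its mean gradient,
    sum over channels, keep positive evidence (ReLU), normalize to [0, 1].

    activations, gradients: (K, H, W) feature maps and their gradients
    w.r.t. the class score. Returns an (H, W) saliency map.
    """
    weights = gradients.mean(axis=(1, 2))                            # (K,)
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: channel 0 fires on a localized region and has positive gradient.
acts = np.zeros((2, 4, 4))
acts[0, 1:3, 1:3] = 1.0        # localized activation
grads = np.zeros((2, 4, 4))
grads[0] = 1.0                 # class score increases with channel 0
cam = grad_cam(acts, grads)
print(cam[2, 2], cam[0, 0])    # 1.0 0.0
```

The resulting map highlights exactly the region carried by the positively weighted channel, which is the mechanism that lets such maps localize the critical signatures of a signal.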
Affiliation(s)
- Ziyang Xu
- School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, PR China
- Haixing Liu
- School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, PR China.
- Guangtao Fu
- Centre for Water Systems, University of Exeter, Exeter EX4 4QF, UK
- Run Zheng
- School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, PR China
- Tarek Zayed
- Department of Building and Real Estate, The Hong Kong Polytechnic University, Hong Kong SAR, PR China
- Shuming Liu
- School of Environment, Tsinghua University, 100084, Beijing, PR China
9
Wei F, Jiao Y, Huangfu Z, Shi G, Wang N, Dong H. Weakly-supervised segmentation with ensemble explainable AI: A comprehensive evaluation on crack detection. Rev Sci Instrum 2025;96:045106. PMID: 40249259. DOI: 10.1063/5.0249805.
Abstract
Surface cracks are crucial indicators for structural health monitoring of various types of buildings. Despite substantial advancements in crack detection through deep neural networks, their reliance on pixel-level crack annotation escalates labeling costs and renders the labeling procedure time-intensive. Consequently, researchers have proposed multiple Explainable Artificial Intelligence (XAI) methodologies to improve the efficiency of pseudo-labeling. However, cracks' slender, continuous, and inconspicuous characteristics render current XAI approaches ineffective at adequately gathering feature information. This work examines the characteristics of many XAI strategies through extensive experimentation and synthesizes the advantages of each strategy to mitigate the uncertainty error associated with a single model in the crack region. Moreover, we formulate and implement various integration strategies to reconcile the discrepancies across distinct XAI algorithms on two separate datasets. The experimental results indicate that the proposed method provides more accurate basic annotations for weakly supervised crack segmentation.
Affiliation(s)
- Fupeng Wei
- School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
- Yibo Jiao
- School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
- Zhongmin Huangfu
- School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
- Ge Shi
- School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
- Nan Wang
- School of Information Science and Technology, Hainan Normal University, Haikou 571158, China
- Hainan Engineering Research Center for Extended Reality and Digital Intelligent Education, Hainan Normal University, Haikou 571158, China
- Hangcheng Dong
- School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin 150001, China
10
Yang Y, Long X, Dai J, Liu X, Zheng D, Cao J, Hu Y. Interpretable Identification of Single-Molecule Charge Transport via Fusion Attention-Based Deep Learning. J Phys Chem Lett 2025;16:3165-3176. PMID: 40111072. DOI: 10.1021/acs.jpclett.4c03650.
Abstract
Interpretability is fundamental in the precise identification of single-molecule charge transport, and its absence in deep learning models is currently the major barrier to the usage of such powerful algorithms in the field. Here, we have pioneered a novel identification method employing fusion attention-based deep learning technologies. Central to our approach is the innovative neural network architecture, SingleFACNN, which integrates convolutional neural networks with a fusion of multihead self-attention and spatial attention mechanisms. Our findings demonstrate that SingleFACNN accurately classifies the three-type and four-type STM-BJ data sets, leveraging the convolutional layers' robust feature extraction and the attention layers' capacity to capture long-range interactions. Through comprehensive gradient-weighted class activation mapping and ablation studies, we identified and analyzed the critical features impacting classification outcomes with remarkable accuracy, thus enhancing the interpretability of our deep learning model. Furthermore, SingleFACNN's application was extended to mixed samples with varying proportions, achieving commendable prediction performance at low computational cost. Our study underscores the potential of SingleFACNN in advancing the interpretability and credibility of deep learning applications in single-molecule charge transport, opening new avenues for single-molecule detection in complex systems.
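The multihead self-attention that SingleFACNN fuses with convolutional features reduces, in its simplest single-head form, to scaled dot-product attention over a sequence of feature vectors. The sketch below is a generic illustration of that mechanism with random weights and hypothetical names, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: (T, D) sequence of T feature vectors (e.g. segments of a trace);
    wq, wk, wv: (D, D) query/key/value projections. Each output position
    is a weighted mix of all positions, which is how attention captures
    long-range interactions that a convolution's receptive field misses.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    weights = softmax(q @ k.T / np.sqrt(x.shape[1]), axis=-1)  # (T, T)
    return weights @ v                                          # (T, D)

rng = np.random.default_rng(1)
T, D = 6, 8
x = rng.standard_normal((T, D))
out = self_attention(x, *(rng.standard_normal((D, D)) for _ in range(3)))
print(out.shape)  # (6, 8)
```

Each row of the attention matrix is a probability distribution over all positions, so every output vector is a convex combination of the value vectors.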
Affiliation(s)
- Yanyi Yang
- Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan 411105, PR China
- Hunan Provincial Key Laboratory of Smart Carbon Materials and Advanced Sensing, Xiangtan University, Xiangtan 411105, PR China
- Xia Long
- Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan 411105, PR China
- Hunan Provincial Key Laboratory of Smart Carbon Materials and Advanced Sensing, Xiangtan University, Xiangtan 411105, PR China
- Jiaqing Dai
- Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan 411105, PR China
- Hunan Provincial Key Laboratory of Smart Carbon Materials and Advanced Sensing, Xiangtan University, Xiangtan 411105, PR China
- Xiaochi Liu
- Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan 411105, PR China
- Hunan Provincial Key Laboratory of Smart Carbon Materials and Advanced Sensing, Xiangtan University, Xiangtan 411105, PR China
- Duokai Zheng
- Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan 411105, PR China
- Hunan Provincial Key Laboratory of Smart Carbon Materials and Advanced Sensing, Xiangtan University, Xiangtan 411105, PR China
- Juexian Cao
- Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan 411105, PR China
- Hunan Provincial Key Laboratory of Smart Carbon Materials and Advanced Sensing, Xiangtan University, Xiangtan 411105, PR China
- Yong Hu
- Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan 411105, PR China
- Hunan Provincial Key Laboratory of Smart Carbon Materials and Advanced Sensing, Xiangtan University, Xiangtan 411105, PR China
11
Wang M, Gu Y, Yang L, Zhang B, Wang J, Lu X, Li J, Liu X, Zhao Y, Yu D, Tang S, He Q. A novel high-precision bilevel optimization method for 3D pulmonary nodule classification. Phys Med 2025;133:104954. PMID: 40117722. DOI: 10.1016/j.ejmp.2025.104954.
Abstract
BACKGROUND AND OBJECTIVE: Classification of pulmonary nodules is important for the early diagnosis of lung cancer; however, the manual design of classification models requires substantial expert effort. To automate the model design process, we propose a neural architecture search with high-precision bilevel optimization (NAS-HBO) that directly searches for the optimal network on three-dimensional (3D) images.
METHODS: We propose a novel high-precision bilevel optimization method (HBOM) to search for an optimal 3D pulmonary nodule classification model. We employed memory optimization techniques with a partially decoupled operation-weighting method to reduce the memory overhead while maintaining path selection stability. Additionally, we introduce a novel maintaining receptive field criterion (MRFC) within the NAS-HBO framework. MRFC narrows the search space by selecting and expanding the 3D Mobile Inverted Residual Bottleneck Block (3D-MBConv) operation based on previous receptive fields, thereby enhancing the scalability and practical applicability of NAS-HBO in terms of model complexity and performance.
RESULTS: In this study, 888 CT images, including 554 benign and 450 malignant nodules, were obtained from the LIDC-IDRI dataset. The results showed that NAS-HBO achieved an accuracy of 91.51% after less than 6 h of searching, utilizing only 12.79 M parameters.
CONCLUSION: The proposed NAS-HBO method effectively automates the design of 3D pulmonary nodule classification models, achieving high accuracy with an efficient parameter count. By incorporating the HBOM and MRFC techniques, we demonstrated enhanced accuracy and scalability in model optimization for early lung cancer diagnosis. The related code and results have been released at https://github.com/GuYuIMUST/NAS-HBO.
Affiliation(s)
- Mansheng Wang
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Yu Gu
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Lidong Yang
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Baohua Zhang
- School of Automation and Electrical Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Jing Wang
- School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
- Xiaoqi Lu
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China; College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010051, China
- Jianjun Li
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Xin Liu
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Ying Zhao
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Dahua Yu
- School of Automation and Electrical Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Siyuan Tang
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China; School of Computer Science and Technology, Baotou Medical College, Inner Mongolia University of Science and Technology, Baotou 014040, China
- Qun He
- Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China
12
Huangfu Z, Jiao Y, Wei F, Shi G, Dong H. A unified approach for weakly supervised crack detection via affine transformation and pseudo label refinement. Sci Rep 2025; 15:8673. [PMID: 40082505 PMCID: PMC11906806 DOI: 10.1038/s41598-025-93196-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2024] [Accepted: 03/05/2025] [Indexed: 03/16/2025] Open
Abstract
Consistent detection of cracks in engineering structures is essential for maintaining structural integrity. Deep neural networks perform well at this task, but their reliance on pixel-level labels drives up annotation costs. Weakly supervised learning methods have therefore emerged, although the labels they produce are of substantially lower quality than manual annotations, and current approaches for visually interpreting deep neural networks suffer from problems such as erroneous target localization. This study proposes an Affine Transformation and Pseudo Label Refinement (AT-CAM) method comprising three phases. In the first phase, a geometric enhancement strategy produces a sequence of augmented images from each input and applies the Axiom-based Grad-CAM (XGradCAM) algorithm to generate a class activation map for each, which are then merged into a unified saliency map. In the second phase, the information flow paths in the convolutional layers' subsampling are modified by a dedicated hook, and the information flow in the samples is used to spatially invert and eliminate the checkerboard noise produced during fusion. In the third phase, a dynamic range compression mechanism enhances the prominence of cracked areas by compressing the highlighted regions of the saliency map and diminishing the influence of background noise. Experimental results indicate that the proposed method increases segmentation accuracy by 7.2% relative to the original baseline, markedly improves the visual interpretability of deep neural networks, and offers an efficient, cost-effective, and interpretable approach to detecting structural cracks in engineering.
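The first phase, fusing class activation maps computed over augmented views back in the original image frame, can be sketched as follows (the flip-only augmentation, toy values, and function names are illustrative assumptions, not the paper's exact pipeline):

```python
# Sketch of augmentation-fused CAMs: one map per augmented view, each
# mapped back to the original frame by its inverse transform, then
# averaged and min-max normalized into a single saliency map.
def hflip(cam):
    return [row[::-1] for row in cam]

def fuse_cams(cams, inverse_transforms):
    aligned = [inv(c) for c, inv in zip(cams, inverse_transforms)]
    h, w = len(aligned[0]), len(aligned[0][0])
    fused = [[sum(c[i][j] for c in aligned) / len(aligned)
              for j in range(w)] for i in range(h)]
    lo = min(min(r) for r in fused)  # min-max normalize to [0, 1]
    hi = max(max(r) for r in fused)
    return [[(v - lo) / (hi - lo + 1e-8) for v in r] for r in fused]

cam_orig = [[0.1, 0.9], [0.4, 0.6]]  # CAM of the original image
cam_flip = [[0.8, 0.2], [0.5, 0.5]]  # CAM of the flipped image
fused = fuse_cams([cam_orig, cam_flip], [lambda c: c, hflip])
```

Averaging over views suppresses activation that is not stable under the augmentation, which is the intuition behind fusing the per-view maps before refinement.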
Affiliation(s)
- Zhongmin Huangfu
- School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou, 450046, China
- Yibo Jiao
- School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou, 450046, China
- Fupeng Wei
- School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou, 450046, China
- Ge Shi
- School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou, 450046, China
- Hangcheng Dong
- School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
13
Liang C, Yang F, Huang X, Zhang L, Wang Y. Deep learning assists early-detection of hypertension-mediated heart change on ECG signals. Hypertens Res 2025; 48:681-692. [PMID: 39394520 DOI: 10.1038/s41440-024-01938-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2024] [Revised: 09/06/2024] [Accepted: 09/23/2024] [Indexed: 10/13/2024]
Abstract
Arterial hypertension is a major risk factor for cardiovascular diseases. While cardiac ultrasound is the typical way to diagnose hypertension-mediated heart change, it often fails to detect early subtle structural changes. The electrocardiogram (ECG) represents the electrical activity of the heart muscle, which is affected by changes in the heart's structure, so it is crucial to explore whether ECG can capture slight signals of hypertension-mediated heart change. However, reading ECG records is complex, and some signals are too subtle to be captured by a cardiologist's visual inspection. In this study, we designed a deep learning model to predict hypertension from ECG signals and then to identify hypertension-associated ECG segments. From The First Affiliated Hospital of Xiamen University, we collected 210,120 10-s 12-lead ECGs using the FX-8322 manufactured by FUKUDA and 812 ECGs using the RAGE-12 manufactured by NALONG. We propose a deep learning framework comprising MML-Net, a multi-branch, multi-scale LSTM neural network that evaluates the potential of ECG signals to detect hypertension, and ECG-XAI, an ECG-oriented wave-alignment AI explanation pipeline that identifies hypertension-associated ECG segments. MML-Net achieved 82% recall and 87% precision in testing, and 80% recall and 82% precision in independent testing. In contrast, experienced clinical cardiologists typically attain recall rates ranging from 30% to 50% by visual inspection. The experiments demonstrate that ECG signals are sensitive to the slight changes in heart structure caused by hypertension, and ECG-XAI identifies the R-wave and P-wave as the hypertension-associated ECG segments. The proposed framework has the potential to facilitate early diagnosis of heart change.
Affiliation(s)
- Chengwei Liang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
- Fan Yang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
- Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Xiamen, Fujian, China
- Xiaobing Huang
- Fuzhou First General Hospital, Fujian Medical University, Fujian, China
- Lijuan Zhang
- The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, Fujian, China
- Ying Wang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
- Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Xiamen, Fujian, China
14
Wang K, Zhu M, Chen Z, Weng J, Li M, Yiu SM, Ding W, Gu T. A Statistical Physics Perspective: Understanding the Causality Behind Convolutional Neural Network Adversarial Vulnerability. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:2118-2132. [PMID: 38324429 DOI: 10.1109/tnnls.2024.3359269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/09/2024]
Abstract
The adversarial vulnerability of convolutional neural networks (CNNs) refers to their performance degradation under adversarial attacks, which leads to incorrect decisions. However, the causes of adversarial vulnerability in CNNs remain unknown. To address this issue, we propose a unique cross-scale analytical approach from a statistical physics perspective. It reveals that the abundance of nonlinear effects inherent in CNNs is the fundamental cause of the formation and evolution of system vulnerability. Vulnerability forms spontaneously at the macroscopic level after the system's symmetry is broken through nonlinear interactions between microscopic state order parameters. We develop a cascade failure algorithm that visualizes how micro-perturbations of neurons' activations can cascade and influence macro decision paths. Our empirical results demonstrate the interplay between microlevel activation maps and macrolevel decision-making and provide a statistical physics perspective for understanding the causality behind CNN vulnerability. Our work will help subsequent research improve the adversarial robustness of CNNs.
15
Fu J, Chen K, Dou Q, Gao Y, He Y, Zhou P, Lin S, Wang Y, Guo Y. IPNet: An Interpretable Network With Progressive Loss for Whole-Stage Colorectal Disease Diagnosis. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:789-800. [PMID: 39298304 DOI: 10.1109/tmi.2024.3459910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/21/2024]
Abstract
Colorectal cancer is a leading cause of cancer-related deaths, primarily because obvious early-stage symptoms are absent. Whole-stage colorectal disease diagnosis is crucial for assessing lesion evolution and determining treatment plans. However, locality differences and disease progression lead to intra-class disparities and inter-class similarities in colorectal lesion representation. In addition, interpretable algorithms explaining lesion progression are still lacking, making the prediction process a "black box". In this paper, we propose IPNet, a dual-branch interpretable network with progressive loss for whole-stage colorectal disease diagnosis. The dual-branch architecture captures unbiased features representing diverse localities to suppress intra-class variation. The progressive loss function considers inter-class relationships, using prior knowledge of disease evolution to guide classification. Furthermore, a novel Grain-CAM is designed to interpret IPNet by visualizing pixel-wise attention maps from shallow to deep layers, revealing regions semantically related to IPNet's progressive classification. We conducted whole-stage diagnosis on two image modalities, i.e., colorectal lesion classification on 129,893 endoscopic optical images and rectal tumor T-staging on 11,072 endoscopic ultrasound images. IPNet surpasses other state-of-the-art algorithms, achieving accuracies of 93.15% and 89.62%, respectively. In particular, it establishes effective decision boundaries for challenges such as polyp vs. adenoma and T2 vs. T3. The results demonstrate an explainable approach to colorectal lesion classification at the whole-stage level, and rectal tumor T-staging by endoscopic ultrasound is also explored for the first time. IPNet is expected to be further applied, assisting physicians in whole-stage disease diagnosis and enhancing diagnostic interpretability.
16
Wang C, Han J, Liu C, Zhang J, Qi Y. LEHP-DETR: A model with backbone improved and hybrid encoding innovated for flax capsule detection. iScience 2025; 28:111558. [PMID: 39877068 PMCID: PMC11773470 DOI: 10.1016/j.isci.2024.111558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2024] [Revised: 08/09/2024] [Accepted: 12/05/2024] [Indexed: 01/31/2025] Open
Abstract
Flax, a functional crop rich in essential fatty acids and nutrients, is important in nutrition and industrial applications. However, flax capsule detection currently relies mainly on manual operation, which is not only inefficient but also prone to error. The development of computer vision and deep learning techniques offers a new way to solve this problem. In this study, based on RT-DETR, we introduced the RepNCSPELAN4, ADown, Context Aggregation, and TFE modules, designed the HWD-ADown, HiLo-AIFI, and DSSFF modules, and proposed an improved model called LEHP-DETR. Experimental results show that LEHP-DETR achieves a significant performance improvement on the flax dataset and comprehensively outperforms the comparison models. Compared to the base model, LEHP-DETR reduces the number of parameters by 67.3%, the model size by 66.3%, and the FLOPs by 37.6%, while the average detection accuracies mAP50 and mAP50:95 increase by 2.6% and 3.5%, respectively.
Affiliation(s)
- Changshun Wang
- College of Information Science and Technology, Gansu Agricultural University, Lanzhou 730000, China
- Junying Han
- College of Information Science and Technology, Gansu Agricultural University, Lanzhou 730000, China
- Chengzhong Liu
- College of Information Science and Technology, Gansu Agricultural University, Lanzhou 730000, China
- Jianping Zhang
- Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou 730000, China
- Yanni Qi
- Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou 730000, China
17
Coluzzi D, Bordin V, Rivolta MW, Fortel I, Zhan L, Leow A, Baselli G. Biomarker Investigation Using Multiple Brain Measures from MRI Through Explainable Artificial Intelligence in Alzheimer's Disease Classification. Bioengineering (Basel) 2025; 12:82. [PMID: 39851356 PMCID: PMC11763248 DOI: 10.3390/bioengineering12010082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2024] [Revised: 01/05/2025] [Accepted: 01/09/2025] [Indexed: 01/26/2025] Open
Abstract
As the leading cause of dementia worldwide, Alzheimer's Disease (AD) has prompted significant interest in developing Deep Learning (DL) approaches for its classification. However, it currently remains unclear whether these models rely on established biological indicators. This work compares a novel DL model using structural connectivity (namely, BC-GCN-SE, adapted from functional connectivity tasks) with an established model using structural magnetic resonance imaging (MRI) scans (namely, ResNet18). Unlike most studies, which focus primarily on performance, our work places explainability at the forefront. Specifically, we define a novel Explainable Artificial Intelligence (XAI) metric based on gradient-weighted class activation mapping, which quantitatively measures how well each model's decision-making aligns with established AD biomarkers. The XAI assessment was conducted across 132 brain parcels, and results were compared to AD-relevant regions to measure adherence to domain knowledge. Differences in explainability patterns between the two models were then assessed to explore the insights offered by each data type (i.e., MRI vs. connectivity). Classification performance was satisfactory in terms of both the median true positive rate (ResNet18: 0.817; BC-GCN-SE: 0.703) and the median true negative rate (ResNet18: 0.816; BC-GCN-SE: 0.738). Statistical tests (p < 0.05) and ranking of the 15% most relevant parcels revealed the involvement of target areas: the medial temporal lobe for ResNet18 and the default mode network for BC-GCN-SE. Additionally, our findings suggest that different imaging modalities provide complementary information to DL models. This lays the foundation for bioengineering advancements in developing more comprehensive and trustworthy DL models, potentially enhancing their applicability as diagnostic support tools for neurodegenerative diseases.
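A simple stand-in for such a parcel-level adherence score is the share of total class-activation mass that falls inside each parcel, ranked across parcels (illustrative only; the paper's XAI metric is defined differently in detail, and the parcel names below are hypothetical):

```python
# Rank brain parcels by the fraction of CAM activation they contain.
def parcel_relevance(cam, parcels):
    """cam: dict voxel -> activation; parcels: dict name -> set of voxels."""
    total = sum(cam.values()) or 1.0
    scores = {name: sum(cam.get(v, 0.0) for v in voxels) / total
              for name, voxels in parcels.items()}
    # most relevant parcel first
    return sorted(scores.items(), key=lambda kv: -kv[1])

cam = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.0}
parcels = {"medial_temporal": {(0, 0)}, "occipital": {(0, 1), (1, 0)}}
ranking = parcel_relevance(cam, parcels)
```

Comparing the top-ranked parcels against regions known to be AD-relevant is one way to quantify adherence to domain knowledge.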
Affiliation(s)
- Davide Coluzzi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
- Dipartimento di Informatica, Università degli Studi di Milano, 20133 Milan, Italy
- Valentina Bordin
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
- Massimo W. Rivolta
- Dipartimento di Informatica, Università degli Studi di Milano, 20133 Milan, Italy
- Igor Fortel
- Department of Biomedical Engineering, University of Illinois Chicago, Chicago, IL 60612, USA
- Liang Zhan
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Alex Leow
- Department of Biomedical Engineering, University of Illinois Chicago, Chicago, IL 60612, USA
- Department of Psychiatry, University of Illinois Chicago, Chicago, IL 60612, USA
- Department of Computer Science, University of Illinois Chicago, Chicago, IL 60612, USA
- Giuseppe Baselli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
18
Fang X, Chong CF, Wong KL, Simões M, Ng BK. Investigating the key principles in two-step heterogeneous transfer learning for early laryngeal cancer identification. Sci Rep 2025; 15:2146. [PMID: 39820368 PMCID: PMC11739633 DOI: 10.1038/s41598-024-84836-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2024] [Accepted: 12/27/2024] [Indexed: 01/19/2025] Open
Abstract
Data scarcity in medical imaging makes transfer learning a common approach in computer-aided diagnosis. Some disease classification tasks can rely on large homogeneous public datasets to train the transferred model, while others, such as endoscopic laryngeal cancer image identification, cannot. Unlike most current work, this study pioneers a two-step heterogeneous transfer learning (THTL) framework for laryngeal cancer identification and summarizes fundamental principles for intermediate-domain selection. Diabetic retinopathy images were chosen as THTL's intermediate domain for their heterogeneity and clear vascular representation. The experimental results reveal two vital principles for intermediate-domain selection in future studies: 1) the size of the intermediate domain is not a sufficient condition for improving transfer learning performance; 2) even distinct vascular features in the intermediate domain do not guarantee improved performance in the target domain. We observe that radial vascular patterns benefit benign classification, whereas twisted and tangled patterns align more with malignant classification. Additionally, to compensate for the absence of twisted patterns in the intermediate domain, we propose a Step-Wise Fine-Tuning (SWFT) technique, guided by Layer Class Activation Map (LayerCAM) visualizations, which increases accuracy by 20.4% over THTL alone, exceeding even fine-tuning all layers.
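The step-wise fine-tuning idea, starting from the classifier head and unfreezing one deeper block per step rather than fine-tuning everything at once, can be sketched as a schedule (the block names are hypothetical; in the paper the choice of how far to unfreeze is guided by LayerCAM visualizations):

```python
# Build a step-wise fine-tuning schedule: at step k, only the deepest
# k blocks are trainable and all shallower blocks stay frozen.
def swft_schedule(blocks):
    """blocks: list ordered from shallow to deep.
    Returns, per step, the list of trainable blocks."""
    return [blocks[-k:] for k in range(1, len(blocks) + 1)]

steps = swft_schedule(["conv1", "layer1", "layer2", "layer3", "fc"])
# step 0 trains only the head; the final step fine-tunes every block
```

In a real training loop each step would set `requires_grad` accordingly and train for a few epochs before unfreezing the next block.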
Affiliation(s)
- Xinyi Fang
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Coimbra, 3000, Portugal
- Chak Fong Chong
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Coimbra, 3000, Portugal
- Kei Long Wong
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Department of Computer Science and Engineering, University of Bologna, Bologna, 40100, Italy
- Marco Simões
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Coimbra, 3000, Portugal
- Benjamin K Ng
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
19
Liu J, Xu Y, Liu Y, Luo H, Huang W, Yao L. Attention-Guided 3D CNN With Lesion Feature Selection for Early Alzheimer's Disease Prediction Using Longitudinal sMRI. IEEE J Biomed Health Inform 2025; 29:324-332. [PMID: 39412975 DOI: 10.1109/jbhi.2024.3482001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2024]
Abstract
Predicting the progression from mild cognitive impairment (MCI) to Alzheimer's disease (AD) is critical for early intervention. To this end, various deep learning models have been applied in this domain, typically relying on structural magnetic resonance imaging (sMRI) data from a single time point while neglecting the dynamic changes in brain structure over time. Current longitudinal studies inadequately explore disease evolution dynamics and are burdened by high computational complexity. This paper introduces a novel lightweight 3D convolutional neural network specifically designed to capture the evolution of brain diseases for modeling the progression of MCI. First, a longitudinal lesion feature selection strategy is proposed to extract core features from temporal data, facilitating the detection of subtle differences in brain structure between two time points. Next, to focus the model more sharply on lesion features, a disease trend attention mechanism is introduced to learn the dependencies between overall disease trends and local variation features. Finally, disease prediction visualization techniques are employed to improve the interpretability of the final predictions. Extensive experiments demonstrate that the proposed model achieves state-of-the-art performance in terms of area under the curve (AUC), accuracy, specificity, precision, and F1 score. This study confirms the efficacy of our early diagnostic method, which uses only two follow-up sMRI scans to predict the disease status of MCI patients 24 months later with an AUC of 79.03%.
20
Zheng Y, Zheng W, Du X. A lightweight rice pest detection algorithm based on improved YOLOv8. Sci Rep 2024; 14:29888. [PMID: 39623058 PMCID: PMC11612280 DOI: 10.1038/s41598-024-81587-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2024] [Accepted: 11/27/2024] [Indexed: 12/06/2024] Open
Abstract
Timely and accurate detection of rice pests is highly important for pest control, as well as for improving rice yield and quality. However, owing to the high interclass similarity, significant intraclass age differences, and complex backgrounds among different pests, accurately and rapidly identifying a variety of rice pests with deep neural network models poses a significant challenge. To address this issue, this paper presents a fast and accurate method for rice pest detection and identification named Rice-YOLO (You Only Look Once). The model is based on YOLOv8-N and incorporates an efficient detection head designed for the complex characteristics of pests. Additionally, deep supervision layers were introduced into the network, along with the incorporation and improvement of a dynamic upsampling module. The experimental data included the large-scale public pest dataset IP102 and the sixteen-class rice pest dataset R2000. The experimental results demonstrated that Rice-YOLO outperformed previous object detection algorithms, achieving 78.1% mAP@0.5, 62.9% mAP@0.5:0.95, and a 74.3% F1 score.
Affiliation(s)
- Yong Zheng
- Xiamen University of Technology, Fujian, 361024, China
- Hunan Provincial Key Laboratory of Remote Sensing Monitoring for Eco-environment in Dongting Lake Area, Changsha, 410004, Hunan, China
- Weiheng Zheng
- Xiamen University of Technology, Fujian, 361024, China
- Hunan Provincial Key Laboratory of Remote Sensing Monitoring for Eco-environment in Dongting Lake Area, Changsha, 410004, Hunan, China
- Xia Du
- Xiamen University of Technology, Fujian, 361024, China
21
Wang T, Huang K, Xu M, Huang J. Weakly supervised chest X-ray abnormality localization with non-linear modulation and foreground control. Sci Rep 2024; 14:29181. [PMID: 39587117 PMCID: PMC11589575 DOI: 10.1038/s41598-024-79701-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2024] [Accepted: 11/11/2024] [Indexed: 11/27/2024] Open
Abstract
Chest X-ray is widely used to diagnose lung diseases. Due to the demand for accelerating analysis and interpretation to reduce the workload of radiologists, there has been a growing interest in building automated systems of chest X-ray abnormality localization. However, fully supervised methods usually require well-trained radiologists to annotate bounding boxes manually, which is labor-intensive and time-consuming. As a result, weakly supervised chest X-ray abnormality localization is gaining increasing attention because it only requires image-level annotations. Existing weakly supervised object localization (WSOL) techniques, which typically utilize class activation maps, often result in incomplete coverage and fragmentation of the objects and rely on class-specific classification accuracy. In this study, we propose a novel WSOL framework for chest X-ray abnormality localization that uses VMamba as the backbone and integrates three practical components to improve localization accuracy. First, we propose a non-linear modulation module to refine Foreground Prediction Maps (FPM) by expanding the foreground activation region and enhancing its continuity. Second, we design an FPM fusion module to strengthen the foreground and suppress the background, thereby improving their separability in chest X-ray images. Third, we craft a novel foreground control loss that regulates the feature maps to refine the background and foreground activation for better foreground identification. The proposed method is evaluated on two commonly used chest X-ray datasets, the NIH chest X-ray dataset and the RSNA dataset, and demonstrates superior performance over six state-of-the-art WSOL methods. In addition, the robustness and applicability of the proposed method are evaluated using three additional datasets with varying modalities and image quality.
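The idea of a non-linear modulation that expands and smooths the foreground region of a prediction map can be illustrated with a steep sigmoid around a threshold (the functional form and the constants are assumptions for illustration, not the paper's exact module):

```python
import math

# Illustrative non-linear modulation of a foreground prediction map
# (FPM): values above the threshold b are pushed toward 1 and values
# below it toward 0, sharpening foreground/background separability.
def modulate_fpm(fpm, k=10.0, b=0.3):
    return [[1.0 / (1.0 + math.exp(-k * (v - b))) for v in row]
            for row in fpm]

fpm = [[0.1, 0.35], [0.6, 0.9]]
out = modulate_fpm(fpm)
```

Because the sigmoid is monotonic, the relative ordering of activations is preserved while the contrast between foreground and background grows.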
Affiliation(s)
- Tongyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Kuan Huang
- Department of Computer Science and Technology, Kean University, Union, 07083, USA
- Meng Xu
- Department of Computer Science and Technology, Kean University, Union, 07083, USA
- Jianhua Huang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
22
Hu H, Wang R, Lin H, Yu H. UnionCAM: enhancing CNN interpretability through denoising, weighted fusion, and selective high-quality class activation mapping. Front Neurorobot 2024; 18:1490198. [PMID: 39610839 PMCID: PMC11602493 DOI: 10.3389/fnbot.2024.1490198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2024] [Accepted: 10/04/2024] [Indexed: 11/30/2024] Open
Abstract
Deep convolutional neural networks (CNNs) have achieved remarkable success in various computer vision tasks. However, the lack of interpretability in these models has raised concerns and hindered their widespread adoption in critical domains. Generating activation maps that highlight the regions contributing to the CNN's decision has emerged as a popular approach to visualize and interpret these models. Nevertheless, existing methods often produce activation maps contaminated with irrelevant background noise or incomplete object activation, limiting their effectiveness in providing meaningful explanations. To address this challenge, we propose Union Class Activation Mapping (UnionCAM), an innovative visual interpretation framework that generates high-quality class activation maps (CAMs) through a novel three-step approach. UnionCAM introduces a weighted fusion strategy that adaptively combines multiple CAMs to create more informative and comprehensive activation maps. First, the denoising module removes background noise from CAMs by using adaptive thresholding. Subsequently, the union module fuses the denoised CAMs with region-based CAMs using a weighted combination scheme to obtain more comprehensive and informative maps, which we refer to as fused CAMs. Lastly, the activation map selection module automatically selects the optimal CAM that offers the best interpretation from the pool of fused CAMs. Extensive experiments on ILSVRC2012 and VOC2007 datasets demonstrate UnionCAM's superior performance over state-of-the-art methods. It effectively suppresses background noise, captures complete object regions, and provides intuitive visual explanations. UnionCAM achieves significant improvements in insertion and deletion scores, outperforming the best baseline. UnionCAM makes notable contributions by introducing a novel denoising strategy, adaptive fusion of CAMs, and an automatic selection mechanism. It bridges the gap between CNN performance and interpretability, providing a valuable tool for understanding and trusting CNN-based systems. UnionCAM has the potential to foster responsible deployment of CNNs in real-world applications.
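The three steps can be sketched on toy one-dimensional "maps" (the max-fraction threshold, equal fusion weights, and sum-based scoring below are simplified assumptions, not UnionCAM's exact formulations):

```python
# Step 1: denoise a CAM with an adaptive threshold tied to its peak.
def denoise(cam, frac=0.5):
    t = frac * max(cam)
    return [v if v >= t else 0.0 for v in cam]

# Step 2: weighted fusion of several CAMs into one map.
def weighted_fuse(cams, weights):
    s = sum(weights)
    return [sum(w * c[i] for w, c in zip(weights, cams)) / s
            for i in range(len(cams[0]))]

# Step 3: pick the map that scores best under an evaluation function.
def select_best(cams, score):
    return max(cams, key=score)

cam_a = [0.9, 0.2, 0.1, 0.8]   # noisy CAM
cam_b = [0.7, 0.0, 0.0, 0.9]   # region-based CAM
fused = weighted_fuse([denoise(cam_a), cam_b], [0.5, 0.5])
best = select_best([cam_b, fused], score=sum)  # the fused map wins here
```

In UnionCAM proper, the selection step uses interpretation-quality criteria (in the spirit of insertion/deletion scores) rather than a raw sum.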
Affiliation(s)
- Hao Hu
- The Institute of Computing, China Academy of Railway Sciences Corporation Ltd, Beijing, China
- The Center of National Railway Intelligent Transportation System Engineering and Technology, Beijing, China
- Rui Wang
- The Institute of Computing, China Academy of Railway Sciences Corporation Ltd, Beijing, China
- Hao Lin
- Xi'an Jiaotong University, Xi'an, China
- Huai Yu
- Signal and Communication Research Institute, China Academy of Railway Sciences Corporation Ltd, Beijing, China
23
Tang C, Zhou Y, Zhao S, Xie M, Zhang R, Long X, Zhu L, Lu Y, Ma G, Li H. Segmentation tracking and clustering system enables accurate multi-animal tracking of social behaviors. PATTERNS (NEW YORK, N.Y.) 2024; 5:101057. [PMID: 39568468 PMCID: PMC11573910 DOI: 10.1016/j.patter.2024.101057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Revised: 04/18/2024] [Accepted: 08/13/2024] [Indexed: 11/22/2024]
Abstract
Accurate analysis of social behaviors in animals is hindered by methodological challenges. Here, we develop a segmentation tracking and clustering system (STCS) to address two major challenges in computational neuroethology: reliable multi-animal tracking and pose estimation under complex interaction conditions, and interpretable, genotype-guided insights into social differences. We established a comprehensive, long-term, multi-animal-tracking dataset across various experimental settings. Benchmarking STCS against state-of-the-art tracking algorithms, we demonstrated its superior efficacy in analyzing behavioral experiments and in establishing a robust tracking baseline. By analyzing the behavior of mice with autism spectrum disorder (ASD) under both solitary and social conditions using a novel weakly supervised clustering method, STCS reveals potential links between social stress and motor impairments. Benefiting from its modular, web-based design, STCS allows researchers to easily integrate the latest computer vision methods, enabling comprehensive behavior-analysis services over the Internet, even from a single laptop.
Affiliation(s)
- Cheng Tang
- Innovation Center of Brain Medical Sciences, the Ministry of Education, China, Huazhong University of Science and Technology, Wuhan 430022, China
- Department of Nuclear Medicine, Wuhan Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Yang Zhou
- Innovation Center of Brain Medical Sciences, the Ministry of Education, China, Huazhong University of Science and Technology, Wuhan 430022, China
- Department of Pathophysiology, School of Basic Medicine and Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Shuaizhu Zhao
- Innovation Center of Brain Medical Sciences, the Ministry of Education, China, Huazhong University of Science and Technology, Wuhan 430022, China
- Department of Pathophysiology, School of Basic Medicine and Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Mingshu Xie
- Innovation Center of Brain Medical Sciences, the Ministry of Education, China, Huazhong University of Science and Technology, Wuhan 430022, China
- Department of Pathophysiology, School of Basic Medicine and Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Ruizhe Zhang
- Wuhan Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Xiaoyan Long
- Wuhan Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Lingqiang Zhu
- Innovation Center of Brain Medical Sciences, the Ministry of Education, China, Huazhong University of Science and Technology, Wuhan 430022, China
- Department of Pathophysiology, School of Basic Medicine and Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Youming Lu
- Innovation Center of Brain Medical Sciences, the Ministry of Education, China, Huazhong University of Science and Technology, Wuhan 430022, China
- Department of Pathophysiology, School of Basic Medicine and Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Guangzhi Ma
- School of Computer Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
- Hao Li
- Innovation Center of Brain Medical Sciences, the Ministry of Education, China, Huazhong University of Science and Technology, Wuhan 430022, China
- Department of Pathophysiology, School of Basic Medicine and Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
24
Wang C, He N, Zhang Y, Li Y, Huang P, Liu Y, Jin Z, Cheng Z, Liu Y, Wang Y, Zhang C, Haacke EM, Chen S, Yan F, Yang G. Enhancing Nigrosome-1 Sign Identification via Interpretable AI using True Susceptibility Weighted Imaging. J Magn Reson Imaging 2024; 60:1904-1915. [PMID: 38236577 DOI: 10.1002/jmri.29245] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Revised: 01/05/2024] [Accepted: 01/08/2024] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND: Nigrosome 1 (N1), the largest nigrosome region in the ventrolateral area of the substantia nigra pars compacta, is identifiable by the "N1 sign" in long echo time gradient echo MRI. Absence of the N1 sign is a vital Parkinson's disease (PD) diagnostic marker. However, the N1 sign is challenging to visualize and assess in clinical practice. PURPOSE: To automatically detect the presence or absence of the N1 sign on true susceptibility weighted imaging (tSWI) using a deep-learning method. STUDY TYPE: Prospective. POPULATION/SUBJECTS: 453 subjects (227 males, 226 females) were prospectively recruited, including 225 PD patients, 120 healthy controls (HCs), and 108 patients with other movement disorders. They were divided into training, validation, and test cohorts of 289, 73, and 91 cases, respectively. FIELD STRENGTH/SEQUENCE: 3D gradient echo SWI sequence at 3T; 3D multiecho strategically acquired gradient echo imaging at 3T; NM-sensitive 3D gradient echo sequence with MTC pulse at 3T. ASSESSMENT: A neuroradiologist with 5 years of experience manually delineated substantia nigra regions. Two raters with 2 and 36 years of experience assessed the N1 sign on tSWI, QSM with high-pass filter, and magnitude data combined with MTC data. We proposed NINet, a neural model, for automatic N1 sign identification in tSWI images. STATISTICAL TESTS: We compared the performance of NINet to the subjective reference standard using receiver operating characteristic (ROC) analyses, and a decision curve analysis assessed identification accuracy. RESULTS: NINet achieved an area under the curve (AUC) of 0.87 (CI: 0.76-0.89) in N1 sign identification, surpassing other models and neuroradiologists. NINet localized the putative N1 sign within tSWI images with 67.3% accuracy.
DATA CONCLUSION: Our proposed NINet model's capability to determine the presence or absence of the N1 sign, along with its localization, holds promise for enhancing diagnostic accuracy when evaluating PD using MR images. LEVEL OF EVIDENCE: 2. TECHNICAL EFFICACY: Stage 1.
Affiliation(s)
- Chenglong Wang
- Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
- Naying He
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Youmin Zhang
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yan Li
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Pei Huang
- Department of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yu Liu
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhijia Jin
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zenghui Cheng
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yun Liu
- Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
- Yida Wang
- Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
- Chengxiu Zhang
- Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
- E Mark Haacke
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Department of Biomedical Engineering, Wayne State University, Detroit, Michigan, USA
- Shengdi Chen
- Department of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Fuhua Yan
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Guang Yang
- Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
25
Quan L, Huang D, Liu Z, Gao K, Mi X. An One-step Triple Enhanced weakly supervised semantic segmentation using image-level labels. PLoS One 2024; 19:e0309126. [PMID: 39432517 PMCID: PMC11493269 DOI: 10.1371/journal.pone.0309126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2024] [Accepted: 08/05/2024] [Indexed: 10/23/2024] Open
Abstract
Weakly supervised semantic segmentation based on image-level labels abandons the pixel-level labels relied upon by traditional semantic segmentation algorithms. It uses only image-level supervision, thereby reducing the time and human resources required for pixel-level annotation. The prevailing approach in weakly supervised segmentation is a two-step method that introduces an additional network and numerous parameters, complicating the model structure. Furthermore, image-level labels typically furnish only category information for the entire image, lacking specific location details and accurate target boundaries during model training. We propose an innovative One-Step Triple Enhanced weakly supervised semantic segmentation network (OSTE). OSTE streamlines the model structure and accomplishes both pseudo-label generation and semantic segmentation in a single step. Furthermore, we augment the weakly supervised semantic segmentation network in three key aspects based on the class activation map construction method, thereby enhancing segmentation accuracy. First, by integrating local information from the activation map with the image, we enhance the network's localization and expansion capabilities to obtain more accurate and richer location information. Second, we refine the seed areas of the class activation map by exploiting the correlation between multi-level features. Finally, we incorporate conditional random field theory to generate pseudo-labels with higher confidence and richer boundary information. In comparison to the prevailing two-step weakly supervised semantic segmentation schemes, the segmentation network proposed in this paper achieves a competitive mean Intersection over Union (mIoU) score of 58.47% on Pascal VOC. Additionally, it improves the mIoU score by at least 5.03% compared to existing end-to-end schemes.
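The seed-generation step that weakly supervised methods like this build on can be shown with a minimal sketch: threshold a normalized class activation map into confident foreground and background seeds and ignore the uncertain band during training. The thresholds here are illustrative assumptions; the paper's multi-level feature refinement and conditional-random-field step are omitted:

```python
import numpy as np

IGNORE = 255  # pixels left unlabeled and excluded from the segmentation loss

def cam_to_pseudo_label(cam, fg_thresh=0.6, bg_thresh=0.2):
    """Turn a class activation map into a pixel pseudo-label.

    High-activation pixels become foreground seeds, low-activation pixels
    background; the uncertain band in between is ignored.
    """
    cam = (cam - cam.min()) / (np.ptp(cam) + 1e-8)  # normalize to [0, 1]
    label = np.full(cam.shape, IGNORE, dtype=np.uint8)
    label[cam >= fg_thresh] = 1   # confident foreground seed
    label[cam <= bg_thresh] = 0   # confident background
    return label
```

A segmentation head trained on such labels simply masks out `IGNORE` pixels in its loss.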
Affiliation(s)
- Longjie Quan
- School of Electronics and Information Engineering, Changchun University of Science and Technology, Changchun, People’s Republic of China
- Dandan Huang
- School of Electronics and Information Engineering, Changchun University of Science and Technology, Changchun, People’s Republic of China
- Zhi Liu
- School of Electronics and Information Engineering, Changchun University of Science and Technology, Changchun, People’s Republic of China
- National and Local Joint Engineering Research Center of Space Photoelectric Technology, Changchun University of Science and Technology, Changchun, People’s Republic of China
- Kai Gao
- School of Electronics and Information Engineering, Changchun University of Science and Technology, Changchun, People’s Republic of China
- Xiaohong Mi
- School of Business, Henan University of Science and Technology, Luoyang, People’s Republic of China
26
Cai L, Chen L, Huang J, Wang Y, Zhang Y. Know your orientation: A viewpoint-aware framework for polyp segmentation. Med Image Anal 2024; 97:103288. [PMID: 39096844 DOI: 10.1016/j.media.2024.103288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2023] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 08/05/2024]
Abstract
Automatic polyp segmentation in endoscopic images is critical for the early diagnosis of colorectal cancer. Despite the availability of powerful segmentation models, two challenges still impede the accuracy of polyp segmentation algorithms. Firstly, during a colonoscopy, physicians frequently adjust the orientation of the colonoscope tip to capture underlying lesions, resulting in viewpoint changes in the colonoscopy images. These variations increase the diversity of polyp visual appearance, posing a challenge for learning robust polyp features. Secondly, polyps often exhibit properties similar to the surrounding tissues, leading to indistinct polyp boundaries. To address these problems, we propose a viewpoint-aware framework named VANet for precise polyp segmentation. In VANet, polyps are emphasized as a discriminative feature and thus can be localized by class activation maps in a viewpoint classification process. With these polyp locations, we design a viewpoint-aware Transformer (VAFormer) to alleviate the erosion of attention by the surrounding tissues, thereby inducing better polyp representations. Additionally, to enhance the polyp boundary perception of the network, we develop a boundary-aware Transformer (BAFormer) to encourage self-attention towards uncertain regions. As a consequence, the combination of the two modules is capable of calibrating predictions and significantly improving polyp segmentation performance. Extensive experiments on seven public datasets across six metrics demonstrate the state-of-the-art results of our method, and VANet can handle colonoscopy images in real-world scenarios effectively. The source code is available at https://github.com/1024803482/Viewpoint-Aware-Network.
Affiliation(s)
- Linghan Cai
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China; Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China.
- Lijiang Chen
- Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China
- Jianhao Huang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
- Yifeng Wang
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
- Yongbing Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
27
Chen J, Guo J, Zhang H, Liang Z, Wang S. Weakly supervised localization model for plant disease based on Siamese networks. FRONTIERS IN PLANT SCIENCE 2024; 15:1418201. [PMID: 39399542 PMCID: PMC11466783 DOI: 10.3389/fpls.2024.1418201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/16/2024] [Accepted: 09/04/2024] [Indexed: 10/15/2024]
Abstract
Problems: Plant diseases significantly impact crop growth and yield. The variability and unpredictability of symptoms postinfection increase the complexity of image-based disease detection methods, leading to a higher false alarm rate. Aim: To address this challenge, we have developed an efficient, weakly supervised agricultural disease localization model using Siamese neural networks. Methods: This model innovatively employs a Siamese network structure with a weight-sharing mechanism to effectively capture the visual differences in plants affected by diseases. Combined with our proprietary Agricultural Disease Precise Localization Class Activation Mapping algorithm (ADPL-CAM), the model can accurately identify areas affected by diseases, achieving effective localization of plant diseases. Results and conclusion: ADPL-CAM performed the best on all network architectures. On ResNet50, ADPL-CAM's top-1 accuracy was 3.96% higher than GradCAM and 2.77% higher than SmoothCAM; its average Intersection over Union (IoU) was 27.09% higher than GradCAM and 19.63% higher than SmoothCAM. Under the SPDNet architecture, ADPL-CAM achieved a top-1 accuracy of 54.29% and an average IoU of 67.5%, outperforming other CAM methods on all metrics. It can accurately and promptly identify and locate diseased leaves in crops.
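The core weight-sharing idea of a Siamese comparison can be shown with a toy sketch: both inputs pass through the same encoder, and the embedding distance grows as a query image deviates from a healthy reference. The linear-plus-tanh encoder and random weights are stand-ins for illustration, not ADPL-CAM:

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 8))  # one weight matrix shared by both branches

def encode(x):
    """Both branches run the identical encoder, i.e. the weights are shared."""
    return np.tanh(W @ np.asarray(x, dtype=float))

def disease_score(reference, query):
    """Embedding distance: grows as the query deviates from the healthy
    reference, which is the signal a Siamese comparison exploits."""
    return float(np.linalg.norm(encode(reference) - encode(query)))
```

In the paper the two branches feed a CAM-style localization head; here the distance alone illustrates why shared weights make the comparison meaningful.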
Affiliation(s)
- Jianwen Guo
- Dongguan University of Technology, Dongguan, China
28
Joye AS, Firlie MG, Wittberg DM, Aragie S, Nash SD, Tadesse Z, Dagnew A, Hailu D, Admassu F, Wondimteka B, Getachew H, Kabtu E, Beyecha S, Shibiru M, Getnet B, Birhanu T, Abdu S, Tekew S, Lietman TM, Keenan JD, Redd TK. Computer Vision Identification of Trachomatous Inflammation-Follicular Using Deep Learning. Cornea 2024; 44:613-618. [PMID: 39312712 PMCID: PMC11949225 DOI: 10.1097/ico.0000000000003701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2024] [Revised: 07/22/2024] [Accepted: 07/25/2024] [Indexed: 09/25/2024]
Abstract
PURPOSE: Trachoma surveys are used to estimate the prevalence of trachomatous inflammation-follicular (TF) to guide mass antibiotic distribution. These surveys currently rely on human graders, introducing a significant resource burden and potential for human error. This study describes the development and evaluation of machine learning models intended to reduce the cost and improve the reliability of these surveys. METHODS: Fifty-six thousand seven hundred twenty-five everted eyelid photographs were obtained from 11,358 children aged 0 to 9 years in a single trachoma-endemic region of Ethiopia over a 3-year period. Expert graders reviewed all images from each examination to determine the estimated number of tarsal conjunctival follicles and the degree of trachomatous inflammation-intense. The median estimate of the 3 grader groups was used as the ground truth to train a MobileNetV3-Large deep convolutional neural network to detect cases with TF. RESULTS: The classification model predicted a TF prevalence of 32%, which was not significantly different from the human consensus estimate (30%; 95% confidence interval of difference, -2 to +4%). The model had an area under the receiver operating characteristic curve of 0.943, an F1 score of 0.923, 88% accuracy, 83% sensitivity, and 91% specificity. The area under the receiver operating characteristic curve increased to 0.995 when interpreting nonborderline cases of TF. CONCLUSIONS: Deep convolutional neural network models performed well at classifying TF and detecting the number of follicles evident in conjunctival photographs. Implementation of similar models may enable accurate, efficient, large-scale trachoma screening. Further validation in diverse populations with varying TF prevalence is needed before implementation at scale.
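The median-consensus labeling described in the methods can be sketched directly. The follicle cutoff of 5 follows the WHO simplified grading definition of TF but is stated here as an assumption, since the abstract does not give it:

```python
from statistics import median

TF_FOLLICLE_CUTOFF = 5  # assumed: WHO simplified grading defines TF as >=5 follicles

def consensus_label(grader_counts):
    """Median follicle estimate across grader groups, then a binary TF label.

    The median makes the ground truth robust to a single outlying grader group.
    """
    m = median(grader_counts)
    return m, m >= TF_FOLLICLE_CUTOFF
```

For example, grader estimates of 3, 6, and 7 follicles yield a consensus of 6 and a positive TF label.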
Affiliation(s)
- Ashlin S. Joye
- Casey Eye Institute, Oregon Health and Science University, Portland, OR
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
- Marissa G. Firlie
- George Washington University, School of Medicine and Health Sciences, Washington, DC
- Dionna M. Wittberg
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
- Adane Dagnew
- The Carter Center Ethiopia, Addis Ababa, Ethiopia
- Fisseha Admassu
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Bilen Wondimteka
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Habib Getachew
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Endale Kabtu
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Social Beyecha
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Meskerem Shibiru
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Banchalem Getnet
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Tibebe Birhanu
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Seid Abdu
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Solomon Tekew
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Thomas M. Lietman
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
- Jeremy D. Keenan
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
- Travis K. Redd
- Casey Eye Institute, Oregon Health and Science University, Portland, OR
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
29
El Hmimdi AE, Palpanas T, Kapoula Z. Efficient diagnostic classification of diverse pathologies through contextual eye movement data analysis with a novel hybrid architecture. Sci Rep 2024; 14:21461. [PMID: 39271749 PMCID: PMC11399410 DOI: 10.1038/s41598-024-68056-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2023] [Accepted: 07/19/2024] [Indexed: 09/15/2024] Open
Abstract
The analysis of eye movements has proven valuable for understanding brain function and the neuropathology of various disorders. This research aims to use eye movement data analysis as a screening tool for differentiating between eight groups of pathologies, including scholar, neurologic, and postural disorders. Leveraging a dataset from 20 clinical centers, all employing AIDEAL and REMOBI eye movement technologies, this study extends prior research by considering a multi-annotation setting, incorporating recordings from saccade and vergence eye movement tests, and using contextual information (e.g., target signals, the latency of the eye movement relative to the target, and the confidence level of the quality of the eye movement recording) to improve accuracy while reducing noise interference. Additionally, we introduce a novel hybrid architecture that combines the weight-sharing feature of convolution layers with the long-range capabilities of the transformer architecture to improve model efficiency, reducing the computation cost by a factor of 3.36 while remaining competitive in terms of macro F1 score. Evaluated on two diverse datasets, our method demonstrates promising results, the most powerful discrimination being the Attention & Neurologic disorder group, with a macro F1 score of up to 78.8%. The results indicate the effectiveness of our approach in accurately classifying eye movement data from different pathologies and different clinical centers, thus enabling the creation of an assistant tool in the future.
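The two ingredients of the hybrid architecture, a weight-sharing convolutional front-end and long-range self-attention, can be illustrated with a toy numpy sketch. Scalar features, a single head, and identity projections are simplifying assumptions; this is not the paper's model:

```python
import numpy as np

def conv1d(x, kernel):
    """Weight-sharing front-end: the same kernel slides over every position
    of the eye-movement signal, so parameters do not grow with length."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def self_attention(h):
    """Long-range mixing: every position attends to every other position
    (single head, identity projections, a minimal stand-in for the
    transformer stage)."""
    scores = np.outer(h, h) / np.sqrt(len(h))
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))  # row softmax
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ h
```

The convolution keeps the parameter count small while attention supplies global context, which is the efficiency trade-off the abstract describes.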
Affiliation(s)
- Alae Eddine El Hmimdi
- Orasis Eye Analytics and Rehabilitation, Paris, France
- Laboratoire d'Informatique Paris Descartes (LIPADE), French University Institute (IUF), Université de Paris, 45 Rue des Saints-Pères, 75006 Paris, France
- Themis Palpanas
- Laboratoire d'Informatique Paris Descartes (LIPADE), French University Institute (IUF), Université de Paris, 45 Rue des Saints-Pères, 75006 Paris, France
- Zoi Kapoula
- Orasis Eye Analytics and Rehabilitation, Paris, France
- Laboratoire d'Informatique Paris Descartes (LIPADE), French University Institute (IUF), Université de Paris, 45 Rue des Saints-Pères, 75006 Paris, France
30
Bankin M, Tyrykin Y, Duk M, Samsonova M, Kozlov K. Modeling Chickpea Productivity with Artificial Image Objects and Convolutional Neural Network. PLANTS (BASEL, SWITZERLAND) 2024; 13:2444. [PMID: 39273927 PMCID: PMC11397516 DOI: 10.3390/plants13172444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2024] [Revised: 08/23/2024] [Accepted: 08/28/2024] [Indexed: 09/15/2024]
Abstract
The chickpea plays a significant role in global agriculture and occupies an increasing share in the human diet. The main aim of the research was to develop a model for the prediction of two chickpea productivity traits in the available dataset. Genomic data for accessions were encoded in Artificial Image Objects, and a model for the thousand-seed weight (TSW) and number of seeds per plant (SNpP) prediction was constructed using a Convolutional Neural Network, dictionary learning and sparse coding for feature extraction, and extreme gradient boosting for regression. The model was capable of predicting both traits with an acceptable accuracy of 84-85%. The most important factors for model solution were identified using the dense regression attention maps method. The SNPs important for the SNpP and TSW traits were found in 34 and 49 genes, respectively. Genomic prediction with a constructed model can help breeding programs harness genotypic and phenotypic diversity to more effectively produce varieties with a desired phenotype.
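The encoding of genotypes into "Artificial Image Objects" can be sketched in its simplest form: pack allele counts into a fixed-size grayscale array that a CNN can consume. The row-major layout and 0/1/2 scaling are assumptions; the paper's pipeline additionally uses dictionary learning, sparse coding, and gradient boosting, which are omitted here:

```python
import numpy as np

def genotype_to_image(genotypes, side=4):
    """Pad a 0/1/2 allele-count vector and reshape it into a square grayscale
    'artificial image object' (layout and scaling are illustrative choices)."""
    pixels = np.zeros(side * side)
    g = np.asarray(genotypes, dtype=float)
    pixels[:len(g)] = g / 2.0   # scale allele counts to [0, 1] intensities
    return pixels.reshape(side, side)
```

Each SNP then occupies a fixed pixel, so convolutional filters can pick up combinations of nearby markers.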
Affiliation(s)
- Mikhail Bankin
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Yaroslav Tyrykin
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Maria Duk
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Maria Samsonova
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Konstantin Kozlov
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
31
Zhao C, Hsiao JH, Chan AB. Gradient-Based Instance-Specific Visual Explanations for Object Specification and Object Discrimination. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:5967-5985. [PMID: 38517727 DOI: 10.1109/tpami.2024.3380604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/24/2024]
Abstract
We propose gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Using the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous work on class activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to one-stage, two-stage, and transformer-based detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state of the art in terms of both effectiveness and efficiency. We discuss two explanation tasks for object detection: 1) object specification: what is the important region for the prediction? 2) object discrimination: which object is detected? Aiming at these two aspects, we present a detailed analysis of the visual explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM. Furthermore, we investigate user trust in the explanation maps, how well the visual explanations of object detectors agree with human explanations as measured through human eye gaze, and whether this agreement is related to user trust. Finally, we also propose two applications, ODAM-KD and ODAM-NMS, based on these two abilities of ODAM. ODAM-KD utilizes the object specification of ODAM to generate top-down attention for key predictions and guide the knowledge distillation of object detection. ODAM-NMS considers the location of the model's explanation for each prediction to distinguish duplicate detected objects. A training scheme, ODAM-Train, is proposed to improve the quality of object discrimination and help with ODAM-NMS.
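A toy version of the ODAM-NMS idea, suppressing duplicates whose explanations point at the same object, can be sketched by reducing each instance-specific heat map to its peak location. Real ODAM compares full explanation maps and relies on ODAM-Train; the peak-distance rule and threshold below are illustrative assumptions:

```python
import numpy as np

def explanation_nms(detections, dist_thresh=2.0):
    """Keep the highest-scoring detections whose explanation maps point at
    distinct locations; drop ones explained by an already-kept location.

    `detections` is a list of (score, heatmap) pairs; each heatmap is the
    instance-specific explanation for that prediction.
    """
    kept = []  # (score, peak) of surviving detections
    for score, heatmap in sorted(detections, key=lambda d: -d[0]):
        peak = np.unravel_index(int(np.argmax(heatmap)), heatmap.shape)
        if all(np.hypot(peak[0] - p[0], peak[1] - p[1]) > dist_thresh
               for _, p in kept):
            kept.append((score, peak))
    return kept
```

Unlike box-IoU NMS, two boxes that overlap heavily but are explained by different locations would both survive under this rule.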
32
Won H, Lee HS, Youn D, Park D, Eo T, Kim W, Hwang D. Deep Learning-Based Joint Effusion Classification in Adult Knee Radiographs: A Multi-Center Prospective Study. Diagnostics (Basel) 2024; 14:1900. [PMID: 39272685 PMCID: PMC11394442 DOI: 10.3390/diagnostics14171900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2024] [Revised: 08/09/2024] [Accepted: 08/23/2024] [Indexed: 09/15/2024] Open
Abstract
Knee effusion, a common and important indicator of joint diseases such as osteoarthritis, is typically more discernible on magnetic resonance imaging (MRI) scans than on radiographs. However, the use of radiographs for the early detection of knee effusion remains promising due to their cost-effectiveness and accessibility. This multi-center prospective study collected a total of 1413 radiographs from four hospitals between February 2022 and March 2023, of which 1281 were analyzed after exclusions. To automatically detect knee effusion on radiographs, we utilized a state-of-the-art (SOTA) deep learning-based classification model with a novel preprocessing technique that optimizes images for diagnosing knee effusion. The diagnostic performance of the proposed method was significantly higher than that of the baseline model, achieving an area under the receiver operating characteristic curve (AUC) of 0.892, an accuracy of 0.803, a sensitivity of 0.820, and a specificity of 0.785. Moreover, the proposed method significantly outperformed two non-orthopedic physicians. Coupled with an explainable artificial intelligence method for visualization, this approach improved not only diagnostic performance but also interpretability, highlighting areas of effusion. These results demonstrate that the proposed method enables early and accurate classification of knee effusions on radiographs, thereby reducing healthcare costs and improving patient outcomes through timely interventions.
Affiliation(s)
- Hyeyeon Won
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Probe Medical Inc., 61, Yonsei-ro 2na-gil, Seodaemun-gu, Seoul 03777, Republic of Korea
- Hye Sang Lee
- Independent Researcher, Seoul 06295, Republic of Korea
- Daemyung Youn
- School of Management of Technology, Yonsei University, Seoul 03722, Republic of Korea
- Doohyun Park
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Taejoon Eo
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Probe Medical Inc., 61, Yonsei-ro 2na-gil, Seodaemun-gu, Seoul 03777, Republic of Korea
- Wooju Kim
- Department of Industrial Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Dosik Hwang
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Probe Medical Inc., 61, Yonsei-ro 2na-gil, Seodaemun-gu, Seoul 03777, Republic of Korea
- Artificial Intelligence and Robotics Institute, Korea Institute of Science and Technology, 5, Hwarang-ro 14-gil, Seongbuk-gu, Seoul 02792, Republic of Korea
- Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul 03722, Republic of Korea
- Department of Radiology, Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, Seoul 03722, Republic of Korea
33
Guo W, Jin S, Li Y, Jiang Y. The dynamic-static dual-branch deep neural network for urban speeding hotspot identification using street view image data. Accid Anal Prev 2024; 203:107636. [PMID: 38776837 DOI: 10.1016/j.aap.2024.107636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Revised: 04/24/2024] [Accepted: 05/10/2024] [Indexed: 05/25/2024]
Abstract
The visual information regarding the road environment can influence drivers' perception and judgment, often resulting in frequent speeding incidents. Identifying speeding hotspots in cities can prevent potential speeding incidents, thereby improving traffic safety levels. We propose the Dual-Branch Contextual Dynamic-Static Feature Fusion Network, based on static panoramic images and dynamically changing sequence data, which captures global features of the macro scene in an area together with dynamically changing information in the micro view for more accurate identification of urban speeding hotspot areas. For the static branch, we propose the Multi-scale Contextual Feature Aggregation Network for learning global spatial contextual association information. In the dynamic branch, we construct the Multi-view Dynamic Feature Fusion Network to capture the dynamically changing features of a scene from a continuous sequence of street view images. Additionally, we design the Dynamic-Static Feature Correlation Fusion Structure to correlate and fuse dynamic and static features. The experimental results show that the model performs well, reaching an overall recognition accuracy of 99.4%. The ablation experiments show that fusing dynamic and static features yields better recognition than either the static or the dynamic branch alone. The proposed model also outperforms other deep learning models. In addition, we combine image processing methods with different Class Activation Mapping (CAM) methods to extract speeding-frequency visual features from the model's perception results. The results show that more accurate speeding-frequency features can be obtained by using LayerCAM for static global scenes and GradCAM-Plus for dynamic local sequences.
In the static global scene, the speeding frequency features are mainly concentrated on the buildings and green layout on both sides of the road, while in the dynamic scene, the speeding frequency features shift with the scene changes and are mainly concentrated on the dynamically changing transition areas of greenery, roads, and surrounding buildings. The code and model used for identifying hotspots of urban traffic accidents in this study are available for access: https://github.com/gwt-ZJU/DCDSFF-Net.
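The CAM family of methods used above all reduce to a channel-weighted combination of convolutional feature maps; LayerCAM and GradCAM variants differ mainly in how the channel weights are derived. A minimal numpy sketch of the vanilla CAM formulation (toy feature maps and weights, not the paper's model):

```python
import numpy as np

def class_activation_map(feature_maps, channel_weights):
    """Vanilla CAM: weight each (H, W) feature map by the target class's
    weight for that channel, sum over channels, then ReLU.
    feature_maps: shape (C, H, W); channel_weights: shape (C,)."""
    cam = np.tensordot(channel_weights, feature_maps, axes=1)  # -> (H, W)
    return np.maximum(cam, 0.0)

# Toy example: channel 0 activates on the left half, channel 1 on the right.
fmap = np.zeros((2, 4, 4))
fmap[0, :, :2] = 1.0
fmap[1, :, 2:] = 1.0
cam = class_activation_map(fmap, np.array([1.0, 0.0]))
print(cam.sum())  # only the left-half activations survive -> 8.0
```

Gradient-weighted variants would replace the fixed `channel_weights` with (functions of) backpropagated gradients, leaving the weighted-sum-plus-ReLU core unchanged.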
Affiliation(s)
- Wentong Guo
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
- Sheng Jin
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China; Zhongyuan Institute, Zhejiang University, Zhengzhou 450000, China
- Yiding Li
- Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450003, China
- Yang Jiang
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
34
Dai W, Wu T, Liu R, Wang M, Yin J, Liu J. Any region can be perceived equally and effectively on rotation pretext task using full rotation and weighted-region mixture. Neural Netw 2024; 176:106350. [PMID: 38723309 DOI: 10.1016/j.neunet.2024.106350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Revised: 01/15/2024] [Accepted: 04/28/2024] [Indexed: 06/17/2024]
Abstract
In recent years, self-supervised learning has emerged as a powerful approach to learning visual representations without requiring extensive manual annotation. One popular technique involves using rotation transformations of images, which provide a clear visual signal for learning semantic representation. However, in this work, we revisit the pretext task of predicting image rotation in self-supervised learning and discover that it tends to marginalise the perception of features located near the centre of an image. To address this limitation, we propose a new self-supervised learning method, namely FullRot, which spotlights underrated regions by resizing the randomly selected and cropped regions of images. Moreover, FullRot increases the complexity of the rotation pretext task by applying the degree-free rotation to the region cropped into a circle. To encourage models to learn from different general parts of an image, we introduce a new data mixture technique called WRMix, which merges two random intra-image patches. By combining these innovative crop and rotation methods with the data mixture scheme, our approach, FullRot + WRMix, surpasses the state-of-the-art self-supervision methods in classification, segmentation, and object detection tasks on ten benchmark datasets with an improvement of up to +13.98% accuracy on STL-10, +8.56% accuracy on CIFAR-10, +10.20% accuracy on Sports-100, +15.86% accuracy on Mammals-45, +15.15% accuracy on PAD-UFES-20, +32.44% mIoU on VOC 2012, +7.62% mIoU on ISIC 2018, +9.70% mIoU on FloodArea, +25.16% AP50 on VOC 2007, and +58.69% AP50 on UTDAC 2020. The code is available at https://github.com/anthonyweidai/FullRot_WRMix.
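The rotation pretext idea is easiest to see in its classic 4-way form: rotate an input by a random multiple of 90° and train the network to predict which rotation was applied. The sketch below shows only the label-generation step (FullRot itself uses free-angle rotation on circular crops, which is not reproduced here):

```python
import numpy as np

def make_rotation_sample(img, rng):
    """Pick a rotation class k in {0, 1, 2, 3}, rotate the image by
    90*k degrees, and return (rotated image, k) as a self-supervised
    training pair for the rotation-prediction pretext task."""
    k = int(rng.integers(0, 4))
    return np.rot90(img, k), k

rng = np.random.default_rng(0)
img = np.arange(16).reshape(4, 4)
rotated, label = make_rotation_sample(img, rng)
# Undoing the sampled rotation recovers the original image exactly.
assert np.array_equal(np.rot90(rotated, -label), img)
```

A model trained on such pairs must attend to image content to infer orientation, which is the learning signal the paper revisits and extends.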
Affiliation(s)
- Wei Dai
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
- Tianyi Wu
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
- Rui Liu
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
- Min Wang
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
- Jianqin Yin
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
- Jun Liu
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
35
Yuan H, Hong C, Jiang PT, Zhao G, Tran NTA, Xu X, Yan YY, Liu N. Clinical domain knowledge-derived template improves post hoc AI explanations in pneumothorax classification. J Biomed Inform 2024; 156:104673. [PMID: 38862083 DOI: 10.1016/j.jbi.2024.104673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2024] [Revised: 06/01/2024] [Accepted: 06/07/2024] [Indexed: 06/13/2024]
Abstract
OBJECTIVE Pneumothorax is an acute thoracic disease caused by abnormal air collection between the lungs and chest wall. Recently, artificial intelligence (AI), especially deep learning (DL), has been increasingly employed for automating the diagnostic process of pneumothorax. To address the opaqueness often associated with DL models, explainable artificial intelligence (XAI) methods have been introduced to outline regions related to pneumothorax. However, these explanations sometimes diverge from actual lesion areas, highlighting the need for further improvement. METHOD We propose a template-guided approach to incorporate the clinical knowledge of pneumothorax into model explanations generated by XAI methods, thereby enhancing the quality of the explanations. Utilizing one lesion delineation created by radiologists, our approach first generates a template that represents potential areas of pneumothorax occurrence. This template is then superimposed on model explanations to filter out extraneous explanations that fall outside the template's boundaries. To validate its efficacy, we carried out a comparative analysis of three XAI methods (Saliency Map, Grad-CAM, and Integrated Gradients) with and without our template guidance when explaining two DL models (VGG-19 and ResNet-50) in two real-world datasets (SIIM-ACR and ChestX-Det). RESULTS The proposed approach consistently improved baseline XAI methods across twelve benchmark scenarios built on three XAI methods, two DL models, and two datasets. The average incremental percentages, calculated by the performance improvements over the baseline performance, were 97.8% in Intersection over Union (IoU) and 94.1% in Dice Similarity Coefficient (DSC) when comparing model explanations and ground-truth lesion areas. We further visualized baseline and template-guided model explanations on radiographs to showcase the performance of our approach. 
CONCLUSIONS In the context of pneumothorax diagnoses, we proposed a template-guided approach for improving model explanations. Our approach not only aligns model explanations more closely with clinical insights but also exhibits extensibility to other thoracic diseases. We anticipate that our template guidance will forge a novel approach to elucidating AI models by integrating clinical domain expertise.
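The template-guided step described above is, at its core, an element-wise masking of the explanation: saliency falling outside the anatomically plausible region is zeroed out, and the filtered map is then compared against ground-truth lesions. A minimal numpy sketch (the array shapes and binary-template representation are assumptions, not the paper's code):

```python
import numpy as np

def template_filter(saliency, template):
    """Suppress explanation values outside a binary template of plausible
    lesion locations; inside the template the saliency is unchanged."""
    return np.where(template > 0, saliency, 0.0)

saliency = np.full((4, 4), 0.5)   # a hypothetical uniform saliency map
template = np.zeros((4, 4))
template[:2, :2] = 1              # plausible region: top-left quadrant
filtered = template_filter(saliency, template)
print(filtered.sum())  # only the 4 in-template pixels survive -> 2.0
```

Because the mask can only remove out-of-template mass, overlap metrics such as IoU and DSC against in-template ground truth cannot decrease, which matches the consistent improvements the paper reports.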
Affiliation(s)
- Han Yuan
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, USA
- Gangming Zhao
- Faculty of Engineering, The University of Hong Kong, China
- Xinxing Xu
- Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore
- Yet Yen Yan
- Department of Radiology, Changi General Hospital, Singapore
- Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Institute of Data Science, National University of Singapore, Singapore
36
Li C, Narayanan A, Ghobakhlou A. Overlapping Shoeprint Detection by Edge Detection and Deep Learning. J Imaging 2024; 10:186. [PMID: 39194975 DOI: 10.3390/jimaging10080186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2024] [Revised: 07/04/2024] [Accepted: 07/30/2024] [Indexed: 08/29/2024] Open
Abstract
In the field of 2-D image processing and computer vision, accurately detecting and segmenting objects in scenarios where they overlap or are obscured remains a challenge. This difficulty is exacerbated in the analysis of shoeprints used in forensic investigations, because they are embedded in noisy environments such as the ground and can be indistinct. Traditional convolutional neural networks (CNNs), despite their success in various image analysis tasks, struggle to accurately delineate overlapping objects due to the complexity of segmenting intertwined textures and boundaries against a noisy background. This study introduces the YOLO (You Only Look Once) model enhanced by edge detection and image segmentation techniques to improve the detection of overlapping shoeprints. By focusing on the critical boundary information between shoeprint textures and the ground, our method demonstrates improvements in sensitivity and precision, achieving confidence levels above 85% for minimally overlapped images and maintaining above 70% for extensively overlapped instances. Heatmaps of convolution layers were generated to show how the network converges towards successful detection using these enhancements. This research may provide a methodology for addressing the broader challenge of detecting multiple overlapping objects against noisy backgrounds.
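The boundary emphasis described above starts from a gradient-magnitude edge map. A minimal numpy sketch of a Sobel edge operator, as a stand-in for the edge-detection preprocessing (the paper's exact operators and pipeline are not reproduced here):

```python
import numpy as np

def sobel_edges(gray):
    """Sobel gradient magnitude of a 2-D grayscale array, computed with an
    explicit 3x3 sliding window (valid convolution, output is (H-2, W-2))."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx = (patch * kx).sum()   # horizontal gradient
            gy = (patch * ky).sum()   # vertical gradient
            out[i, j] = np.hypot(gx, gy)
    return out

gray = np.zeros((5, 5))
gray[:, 3:] = 1.0                 # a vertical step edge
edges = sobel_edges(gray)
print(edges[1])  # response peaks where the step crosses the 3x3 window
```

In practice such an edge map would be fed to the detector alongside (or fused with) the original image to sharpen texture boundaries against the noisy ground.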
Affiliation(s)
- Chengran Li
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
- Ajit Narayanan
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
- Akbar Ghobakhlou
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
37
Wei Y, Abrol A, Lah J, Qiu D, Calhoun VD. A deep spatio-temporal attention model of dynamic functional network connectivity shows sensitivity to Alzheimer's in asymptomatic individuals. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-4. [PMID: 40039841 DOI: 10.1109/embc53108.2024.10781740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]
Abstract
Alzheimer's disease (AD) progresses from asymptomatic changes to clinical symptoms, emphasizing the importance of early detection for proper treatment. Functional magnetic resonance imaging (fMRI), particularly dynamic functional network connectivity (dFNC), has emerged as an important biomarker for AD. Nevertheless, studies probing at-risk subjects in the pre-symptomatic stage using dFNC are limited. To identify at-risk subjects and understand alterations of dFNC in different stages, we leverage deep learning advancements and introduce a transformer-convolution framework for predicting at-risk subjects based on dFNC, incorporating spatial-temporal self-attention to capture brain network dependencies and temporal dynamics. Our model significantly outperforms other popular machine learning methods. By analyzing individuals with diagnosed AD and mild cognitive impairment (MCI), we studied the AD progression and observed a higher similarity between MCI and asymptomatic AD. The interpretable analysis highlights the cognitive-control network's diagnostic importance, with the model focusing on intra-visual domain dFNC when predicting asymptomatic AD subjects.
38
Bhave S, Rodriguez V, Poterucha T, Mutasa S, Aberle D, Capaccione KM, Chen Y, Dsouza B, Dumeer S, Goldstein J, Hodes A, Leb J, Lungren M, Miller M, Monoky D, Navot B, Wattamwar K, Wattamwar A, Clerkin K, Ouyang D, Ashley E, Topkara VK, Maurer M, Einstein AJ, Uriel N, Homma S, Schwartz A, Jaramillo D, Perotte AJ, Elias P. Deep learning to detect left ventricular structural abnormalities in chest X-rays. Eur Heart J 2024; 45:2002-2012. [PMID: 38503537 PMCID: PMC11156488 DOI: 10.1093/eurheartj/ehad782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/11/2023] [Revised: 09/24/2023] [Accepted: 11/14/2023] [Indexed: 03/21/2024] Open
Abstract
BACKGROUND AND AIMS Early identification of cardiac structural abnormalities indicative of heart failure is crucial to improving patient outcomes. Chest X-rays (CXRs) are routinely conducted on a broad population of patients, presenting an opportunity to build scalable screening tools for structural abnormalities indicative of Stage B or worse heart failure with deep learning methods. In this study, a model was developed to identify severe left ventricular hypertrophy (SLVH) and dilated left ventricle (DLV) using CXRs. METHODS A total of 71 589 unique CXRs from 24 689 different patients completed within 1 year of echocardiograms were identified. Labels for SLVH, DLV, and a composite label indicating the presence of either were extracted from echocardiograms. A deep learning model was developed and evaluated using area under the receiver operating characteristic curve (AUROC). Performance was additionally validated on 8003 CXRs from an external site and compared against visual assessment by 15 board-certified radiologists. RESULTS The model yielded an AUROC of 0.79 (0.76-0.81) for SLVH, 0.80 (0.77-0.84) for DLV, and 0.80 (0.78-0.83) for the composite label, with similar performance on an external data set. The model outperformed all 15 individual radiologists for predicting the composite label and achieved a sensitivity of 71% vs. 66% against the consensus vote across all radiologists at a fixed specificity of 73%. CONCLUSIONS Deep learning analysis of CXRs can accurately detect the presence of certain structural abnormalities and may be useful in early identification of patients with LV hypertrophy and dilation. As a resource to promote further innovation, 71 589 CXRs with adjoining echocardiographic labels have been made publicly available.
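The radiologist comparison above reports sensitivity at a fixed specificity. Operationally, this means sweeping the decision threshold until the false-positive rate meets the target, then reading off the true-positive rate. A small pure-Python sketch of that evaluation (hypothetical scores, not the study's code):

```python
def sensitivity_at_specificity(y_true, y_score, target_spec):
    """Highest sensitivity achievable while keeping specificity at or
    above target_spec, sweeping thresholds over the observed scores."""
    best = 0.0
    for thr in sorted(set(y_score)):
        pred = [1 if s >= thr else 0 for s in y_score]
        tp = sum(p and t for p, t in zip(pred, y_true))
        tn = sum((not p) and (not t) for p, t in zip(pred, y_true))
        fp = sum(p and (not t) for p, t in zip(pred, y_true))
        fn = sum((not p) and t for p, t in zip(pred, y_true))
        if tn / (tn + fp) >= target_spec:          # specificity constraint
            best = max(best, tp / (tp + fn))       # sensitivity
    return best

# Hypothetical labels and scores
print(sensitivity_at_specificity([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.1], 0.5))
```

Fixing specificity puts model and readers on a common footing: the 71% vs. 66% sensitivity comparison holds both to the same 73% specificity operating point.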
Affiliation(s)
- Shreyas Bhave
- Division of Cardiology and Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 West 168th Street, PH20, New York, NY 10032, USA
- Victor Rodriguez
- Division of Cardiology and Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 West 168th Street, PH20, New York, NY 10032, USA
- Timothy Poterucha
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Simukayi Mutasa
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Dwight Aberle
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Kathleen M Capaccione
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Yibo Chen
- Inova Fairfax Hospital Imaging Center, Inova Fairfax Medical Campus, Falls Church, VA, USA
- Belinda Dsouza
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Shifali Dumeer
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Jonathan Goldstein
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Aaron Hodes
- Hackensack Radiology Group, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Jay Leb
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Matthew Lungren
- Department of Radiology, University of California, San Francisco, CA, USA
- Mitchell Miller
- Hackensack Radiology Group, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- David Monoky
- Hackensack Radiology Group, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Benjamin Navot
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Kapil Wattamwar
- Division of Vascular and Interventional Radiology, Department of Radiology, Montefiore Medical Center, Bronx, NY, USA
- Anoop Wattamwar
- Hackensack Radiology Group, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Kevin Clerkin
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- David Ouyang
- Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Euan Ashley
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Palo Alto, CA, USA
- Veli K Topkara
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Mathew Maurer
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Andrew J Einstein
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Nir Uriel
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Shunichi Homma
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Allan Schwartz
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Diego Jaramillo
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Adler J Perotte
- Division of Cardiology and Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 West 168th Street, PH20, New York, NY 10032, USA
- Pierre Elias
- Division of Cardiology and Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 West 168th Street, PH20, New York, NY 10032, USA
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
39
You J, Ajlouni S, Kakaletri I, Charalampaki P, Giannarou S. XRelevanceCAM: towards explainable tissue characterization with improved localisation of pathological structures in probe-based confocal laser endomicroscopy. Int J Comput Assist Radiol Surg 2024; 19:1061-1073. [PMID: 38538880 PMCID: PMC11178611 DOI: 10.1007/s11548-024-03096-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Accepted: 02/29/2024] [Indexed: 06/15/2024]
Abstract
PURPOSE Probe-based confocal laser endomicroscopy (pCLE) enables intraoperative tissue characterization with improved resection rates of brain tumours. Although a plethora of deep learning models have been developed for automating tissue characterization, their lack of transparency is a concern. To tackle this issue, techniques like Class Activation Map (CAM) and its variations highlight image regions related to model decisions. However, they often fall short of providing human-interpretable visual explanations for surgical decision support, primarily due to the shattered gradient problem or insufficient theoretical underpinning. METHODS In this paper, we introduce XRelevanceCAM, an explanation method rooted in an improved backpropagation approach that incorporates the sensitivity and conservation axioms. This enhanced method offers a stronger theoretical foundation and effectively mitigates the shattered gradient issue compared to other CAM variants. RESULTS Qualitative and quantitative evaluations are based on ex vivo pCLE data of brain tumours. XRelevanceCAM effectively highlights clinically relevant areas that characterize the tissue type. Specifically, it yields a remarkable 56% improvement over our closest baseline, RelevanceCAM, in the network's shallowest layer as measured by the mean Intersection over Union (mIoU) metric based on ground-truth annotations (from 18% to 28.07%). Furthermore, a 6% improvement in mIoU is observed when generating the final saliency map from all network layers. CONCLUSION We introduce a new CAM variation, XRelevanceCAM, for the precise identification of clinically important structures in pCLE data. This can aid intraoperative decision support in brain tumour resection surgery, as validated in our performance study.
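The mIoU figures above compare binarized saliency maps against ground-truth annotations. The underlying metric is simple to state; a minimal sketch of IoU over binary masks (how the paper binarizes its saliency maps is not reproduced here):

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks; defined as 1.0 when
    both masks are empty (no region predicted, no region annotated)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return np.logical_and(a, b).sum() / union

print(iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # 1 overlap / 2 union -> 0.5
```

Mean IoU then averages this score over the evaluation set (or over classes, depending on the benchmark's convention).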
Affiliation(s)
- Jianzhong You
- Department of Computing, Imperial College London, Huxley Building, 180 Queen's Gate, South Kensington, London, UK
- Serine Ajlouni
- Medical Faculty, University Witten Herdecke, 58455, Witten, Germany
- Irini Kakaletri
- Medical Faculty, Rheinische Friedrich Wilhelms University of Bonn, 53127, Bonn, Germany
- Patra Charalampaki
- Department of Neurosurgery, University Witten Herdecke, 58455, Witten, Germany
- Stamatia Giannarou
- Department of Surgery and Cancer, Imperial College London, 413, 4th Floor, Bessemer Building, South Kensington Campus, London, UK
40
Huang P, Xiao H, He P, Li C, Guo X, Tian S, Feng P, Chen H, Sun Y, Mercaldo F, Santone A, Qin J. LA-ViT: A Network With Transformers Constrained by Learned-Parameter-Free Attention for Interpretable Grading in a New Laryngeal Histopathology Image Dataset. IEEE J Biomed Health Inform 2024; 28:3557-3570. [PMID: 38442048 DOI: 10.1109/jbhi.2024.3373438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/07/2024]
Abstract
Grading laryngeal squamous cell carcinoma (LSCC) based on histopathological images is a clinically significant yet challenging task. A large amount of low-effect background semantic information appears in the feature maps, feature channels, and class activation maps, seriously impairing the accuracy and interpretability of LSCC grading. Because the traditional transformer block makes extensive use of parameterized attention, the model overlearns this low-effect background semantic information and fails to effectively reduce the proportion of background semantics. Therefore, we propose an end-to-end network with transformers constrained by learned-parameter-free attention (LA-ViT), which improves the ability to learn high-effect target semantic information and reduces the proportion of background semantics. First, drawing on the generalized linear model and probability theory, we demonstrate that learned-parameter-free attention (LA) has a stronger ability to learn highly effective target semantic information than parameterized attention. Second, the first-type LA transformer block of LA-ViT utilizes the feature-map position subspace to realize the query, uses the feature-channel subspace to realize the key, and adopts average convergence to obtain the value; together, these operations construct the LA mechanism and reduce the proportion of background semantics in the feature maps and feature channels. Third, the second-type LA transformer block of LA-ViT uses the model probability matrix information and the decision-level weight information to realize the key and the query, respectively, again realizing the LA mechanism and reducing the proportion of background semantics in the class activation maps. Finally, we build a new LSCC pathology image dataset with complex semantics, addressing the scarcity of clinically meaningful datasets that has limited research on LSCC grading models.
In extensive experiments, LA-ViT outperforms other state-of-the-art methods on all metrics, and its visualization maps match the regions of interest in the pathologists' decision-making more closely. Moreover, experimental results on a public LSCC pathology image dataset show that LA-ViT has superior generalization performance compared with other state-of-the-art methods.
41
Wang S, Sun M, Sun J, Wang Q, Wang G, Wang X, Meng X, Wang Z, Yu H. Advancing musculoskeletal tumor diagnosis: Automated segmentation and predictive classification using deep learning and radiomics. Comput Biol Med 2024; 175:108502. [PMID: 38678943 DOI: 10.1016/j.compbiomed.2024.108502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2024] [Revised: 03/18/2024] [Accepted: 04/21/2024] [Indexed: 05/01/2024]
Abstract
OBJECTIVES Musculoskeletal (MSK) tumors, given their high mortality rate and heterogeneity, necessitate precise examination and diagnosis to guide clinical treatment effectively. Magnetic resonance imaging (MRI) is pivotal in detecting MSK tumors, as it offers exceptional image contrast between bone and soft tissue. This study aims to enhance the speed of detection and the diagnostic accuracy of MSK tumors through automated segmentation and grading utilizing MRI. MATERIALS AND METHODS The research included 170 patients (mean age, 58 ± 12 years [standard deviation]; 84 men) with MSK lesions, who underwent MRI scans from April 2021 to May 2023. We proposed a deep learning (DL) segmentation model, MSAPN, based on multi-scale attention and pixel-level reconstruction, and compared it with existing algorithms. Radiomic features were then extracted from the MSAPN-segmented lesions to classify tumors as benign or malignant. RESULTS Compared to the most advanced segmentation algorithms, MSAPN demonstrates better performance. The Dice similarity coefficients (DSC) are 0.871 and 0.815 in the testing set and independent validation set, respectively. The radiomics model for classifying benign and malignant lesions achieves an accuracy of 0.890. Moreover, there is no statistically significant difference between the radiomics models based on manual segmentation and MSAPN segmentation. CONCLUSION This research contributes to the advancement of MSK tumor diagnosis through automated segmentation and predictive classification. The integration of DL algorithms and radiomics shows promising results, and the visualization analysis of feature maps enhances clinical interpretability.
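The Dice similarity coefficient used to evaluate MSAPN measures overlap between predicted and reference segmentations; it resembles IoU but counts the intersection twice. A minimal sketch on toy binary masks:

```python
import numpy as np

def dice(pred, ref):
    """Dice similarity coefficient between two binary masks:
    2|A n B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    a = np.asarray(pred, dtype=bool)
    b = np.asarray(ref, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

print(dice([1, 1, 1, 0], [1, 1, 0, 0]))  # 2*2 / (3+2) = 0.8
```

A DSC of 0.871 therefore means that, on average, the doubled overlap covers about 87% of the combined predicted and reference lesion area.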
Affiliation(s)
- Shuo Wang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China; State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, 300072, China
- Man Sun
- Radiology Department, Tianjin University Tianjin Hospital, Tianjin, 300299, China
- Jinglai Sun
- The School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
- Qingsong Wang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Guangpu Wang
- The School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
- Xiaolin Wang
- The School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
- Xianghong Meng
- Radiology Department, Tianjin University Tianjin Hospital, Tianjin, 300299, China
- Zhi Wang
- Radiology Department, Tianjin University Tianjin Hospital, Tianjin, 300299, China
- Hui Yu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China; State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, 300072, China; The School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
42
Rao S, Bohle M, Schiele B. Better Understanding Differences in Attribution Methods via Systematic Evaluations. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:4090-4101. [PMID: 38215324 DOI: 10.1109/tpami.2024.3353528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/14/2024]
Abstract
Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output in order to distinguish possible from impossible attributions. To address fairness, we note that different methods are applied at different layers, which skews any comparison, and so evaluate all methods on the same layers (ML-Att) and discuss how this impacts their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.
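As background on what such attribution methods produce: a simple occlusion-based attribution scores each image patch by the drop in model output when that patch is masked out. A toy sketch under a stand-in scoring function (all names illustrative, not the paper's evaluation schemes):

```python
import numpy as np

def occlusion_attribution(score_fn, image, patch=2, baseline=0.0):
    """Attribution map: score drop when each patch is replaced by a baseline value."""
    h, w = image.shape
    full = score_fn(image)
    attr = np.zeros_like(image, dtype=float)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            attr[i:i + patch, j:j + patch] = full - score_fn(occluded)
    return attr

# Toy "model": scores only the top-left quadrant of a 4x4 image
score_fn = lambda x: float(x[:2, :2].sum())
img = np.ones((4, 4))
attr = occlusion_attribution(score_fn, img)
# Only the top-left patch receives non-zero attribution here
```

Faithfulness evaluations like the one above ask, in effect, whether a method's attribution map agrees with such ground-truth influence when influence can be controlled.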
43
Odusami M, Maskeliūnas R, Damaševičius R, Misra S. Machine learning with multimodal neuroimaging data to classify stages of Alzheimer's disease: a systematic review and meta-analysis. Cogn Neurodyn 2024; 18:775-794. [PMID: 38826669 PMCID: PMC11143094 DOI: 10.1007/s11571-023-09993-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2023] [Revised: 06/23/2023] [Accepted: 07/17/2023] [Indexed: 06/04/2024] Open
Abstract
In recent years, Alzheimer's disease (AD) has been a serious threat to human health. Researchers and clinicians alike encounter a significant obstacle when trying to accurately identify and classify AD stages. Several studies have shown that multimodal neuroimaging input can assist in providing valuable insights into the structural and functional changes in the brain related to AD. Machine learning (ML) algorithms can accurately categorize AD phases by identifying patterns and linkages in multimodal neuroimaging data using powerful computational methods. This study aims to assess the contribution of ML methods to the accurate classification of the stages of AD using multimodal neuroimaging data. A systematic search is carried out in IEEE Xplore, Science Direct/Elsevier, ACM Digital Library, and PubMed databases with forward snowballing performed on Google Scholar. The quantitative analysis used 47 studies. The explainable analysis was performed on the classification algorithm and fusion methods used in the selected studies. The pooled sensitivity and specificity, including diagnostic efficiency, were evaluated by conducting a meta-analysis based on a bivariate model with the hierarchical summary receiver operating characteristics (ROC) curve of multimodal neuroimaging data and ML methods in the classification of AD stages. The Wilcoxon signed-rank test is further used to statistically compare the accuracy scores of the existing models. The pooled sensitivity for separating participants with mild cognitive impairment (MCI) from healthy control (NC) participants was 83.77% (95% CI: 78.87%, 87.71%); for separating participants with AD from NC, it was 94.60% (90.76%, 96.89%); for separating participants with progressive MCI (pMCI) from stable MCI (sMCI), it was 80.41% (74.73%, 85.06%); and for separating early MCI (EMCI) from NC, it was 86.63% (82.43%, 89.95%). The pooled specificity was 79.16% (70.97%, 87.71%) for MCI versus NC, 93.49% (91.60%, 94.90%) for AD versus NC, 81.44% (76.32%, 85.66%) for pMCI versus sMCI, and 85.68% (81.62%, 88.96%) for EMCI versus NC. The Wilcoxon signed-rank test showed a low P-value across all classification tasks. Multimodal neuroimaging data combined with ML is promising for classifying the stages of AD, but more research is required to increase the validity of its application in clinical practice.
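For orientation on the sensitivity figures above: a single study's sensitivity with a normal-approximation confidence interval can be sketched as below. This is deliberately simplified; the review pools studies with a bivariate hierarchical model, which this does not reproduce, and the counts are made up:

```python
import math

def sensitivity_ci(tp, fn, z=1.96):
    """Sensitivity (true positive rate) with a normal-approximation 95% CI.
    Simplified illustration, not the bivariate meta-analysis model."""
    n = tp + fn
    sens = tp / n
    se = math.sqrt(sens * (1 - sens) / n)
    return sens, max(0.0, sens - z * se), min(1.0, sens + z * se)

# Hypothetical study: 84 true positives, 16 false negatives
sens, lo, hi = sensitivity_ci(tp=84, fn=16)
print(f"{sens:.2%} ({lo:.2%}, {hi:.2%})")
```

Pooling across studies additionally weights each study by its precision and models between-study correlation of sensitivity and specificity, which is why the bivariate model is preferred in diagnostic meta-analysis.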
Affiliation(s)
- Modupe Odusami
- Department of Multimedia Engineering, Kaunas University of Technology, Kaunas, Lithuania
- Rytis Maskeliūnas
- Department of Multimedia Engineering, Kaunas University of Technology, Kaunas, Lithuania
- Sanjay Misra
- Department of Applied Data Science, Institute for Energy Technology, Halden, Norway
44
Song B, Yoshida S. Explainability of three-dimensional convolutional neural networks for functional magnetic resonance imaging of Alzheimer's disease classification based on gradient-weighted class activation mapping. PLoS One 2024; 19:e0303278. [PMID: 38771733 PMCID: PMC11108152 DOI: 10.1371/journal.pone.0303278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Accepted: 04/22/2024] [Indexed: 05/23/2024] Open
Abstract
Currently, numerous studies focus on employing fMRI-based deep neural networks to diagnose neurological disorders such as Alzheimer's Disease (AD), yet only a handful have provided results regarding explainability. We address this gap by applying several prevalent explainability methods, such as gradient-weighted class activation mapping (Grad-CAM), to an fMRI-based 3D-VGG16 network for AD diagnosis to improve the model's explainability. The aim is to explore the specific regions of interest (ROIs) of the brain that the model primarily focuses on when making predictions, as well as whether these ROIs differ between AD subjects and normal controls (NCs). First, we utilized multiple resting-state functional activity maps, including ALFF, fALFF, ReHo, and VMHC, to reduce the complexity of fMRI data, which differs from many studies that utilize raw fMRI data. Compared to methods utilizing raw fMRI data, this manual feature extraction approach may alleviate the model's burden. Subsequently, a 3D-VGG16 was employed for AD classification, with the final fully connected layers replaced by a Global Average Pooling (GAP) layer, aimed at mitigating overfitting while preserving spatial information within the feature maps. The model achieved a maximum of 96.4% accuracy on the test set. Finally, several 3D CAM methods were employed to interpret the models. In the explainability results of the models with relatively high accuracy, the highlighted ROIs were primarily located in the precuneus and the hippocampus for AD subjects, while the models focused on the entire brain for NCs. This supports current research on ROIs involved in AD. We believe that explaining deep learning models would not only provide support for existing research on brain disorders, but also offer important reference recommendations for the study of currently unknown etiologies.
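As context for the CAM family used here: with a GAP head, a class activation map is simply the class's classifier weights applied as a weighted sum over the final feature maps, rectified and normalized (Grad-CAM generalizes this by deriving the weights from gradients). A minimal numpy sketch with illustrative toy inputs:

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM for a GAP-head classifier: weighted sum of the last conv layer's
    feature maps (K, H, W) under the class weights (K,), then ReLU and
    normalization to [0, 1]."""
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # -> (H, W)
    cam = np.maximum(cam, 0.0)                               # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 2 feature maps of size 3x3
fmaps = np.stack([np.eye(3), np.ones((3, 3))])
weights = np.array([1.0, 0.5])
cam = class_activation_map(fmaps, weights)
```

For the 3D case in the paper, the same weighted sum runs over volumetric feature maps, yielding a 3D heatmap that can be overlaid on brain space.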
Affiliation(s)
- Boyue Song
- Graduate School of Engineering, Kochi University of Technology, Kami City, Kochi Prefecture, Japan
- Shinichi Yoshida
- School of Information, Kochi University of Technology, Kami City, Kochi Prefecture, Japan
45
Fan Y, Li Q, Mao H, Jiang F. Magnetoencephalography Decoding Transfer Approach: From Deep Learning Models to Intrinsically Interpretable Models. IEEE J Biomed Health Inform 2024; 28:2818-2829. [PMID: 38349827 DOI: 10.1109/jbhi.2024.3365051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/15/2024]
Abstract
When decoding neuroelectrophysiological signals represented by Magnetoencephalography (MEG), deep learning models generally achieve high predictive performance but lack the ability to interpret their predicted results. This limitation prevents them from meeting the essential requirements of reliability and ethical-legal considerations in practical applications. In contrast, intrinsically interpretable models, such as decision trees, possess self-evident interpretability while typically sacrificing accuracy. To effectively combine the respective advantages of both deep learning and intrinsically interpretable models, an MEG transfer approach through feature attribution-based knowledge distillation is pioneered, which transforms deep models (teacher) into highly accurate intrinsically interpretable models (student). The resulting models provide not only intrinsic interpretability but also high predictive performance, besides serving as an excellent approximate proxy to understand the inner workings of deep models. In the proposed approach, post-hoc feature knowledge derived from post-hoc interpretable algorithms, specifically feature attribution maps, is introduced into knowledge distillation for the first time. By guiding intrinsically interpretable models to assimilate this knowledge, the transfer of MEG decoding information from deep models to intrinsically interpretable models is implemented. Experimental results demonstrate that the proposed approach outperforms the benchmark knowledge distillation algorithms. This approach successfully improves the prediction accuracy of Soft Decision Tree by a maximum of 8.28%, reaching almost equivalent or even superior performance to deep teacher models. Furthermore, the model-agnostic nature of this approach offers broad application potential.
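For readers unfamiliar with the distillation backbone: standard response-based knowledge distillation matches the student's temperature-softened output distribution to the teacher's. The paper's contribution adds feature-attribution knowledge on top of this, which the sketch below does not include; all names are illustrative:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled
    by T^2 as in standard response-based knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

loss = distillation_loss([1.0, 0.5, -1.0], [2.0, 0.2, -0.5])
```

A higher temperature spreads the teacher's probability mass over non-target classes, exposing the "dark knowledge" that guides the student (here, a soft decision tree).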
46
Niu Y, Ding M, Ge M, Karlsson R, Zhang Y, Carballo A, Takeda K. R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut. SENSORS (BASEL, SWITZERLAND) 2024; 24:2695. [PMID: 38732800 PMCID: PMC11085337 DOI: 10.3390/s24092695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Revised: 04/20/2024] [Accepted: 04/21/2024] [Indexed: 05/13/2024]
Abstract
Transformer-based models have gained popularity in the field of natural language processing (NLP) and are extensively utilized in computer vision tasks and multi-modal models such as GPT-4. This paper presents a novel method to enhance the explainability of transformer-based image classification models. Our method aims to improve trust in classification results and empower users to gain a deeper understanding of the model for downstream tasks by providing visualizations of class-specific maps. We introduce two modules: the "Relationship Weighted Out" and the "Cut" modules. The "Relationship Weighted Out" module focuses on extracting class-specific information from intermediate layers, enabling us to highlight relevant features. Additionally, the "Cut" module performs fine-grained feature decomposition, taking into account factors such as position, texture, and color. By integrating these modules, we generate dense class-specific visual explainability maps. We validate our method with extensive qualitative and quantitative experiments on the ImageNet dataset. Furthermore, we conduct a large number of experiments on the LRN dataset, which is specifically designed for automated-driving danger alerts, to evaluate the explainability of our method in scenarios with complex backgrounds. The results demonstrate a significant improvement over previous methods. Moreover, we conduct ablation experiments to validate the effectiveness of each module. Through these experiments, we are able to confirm the respective contributions of each module, thus solidifying the overall effectiveness of our proposed approach.
Affiliation(s)
- Yingjie Niu
- Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Ming Ding
- Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Maoning Ge
- Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Robin Karlsson
- Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Yuxiao Zhang
- Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Alexander Carballo
- Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Graduate School of Engineering, Gifu University, Gifu 501-1112, Japan
- Kazuya Takeda
- Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Tier IV Inc., Tokyo 140-0001, Japan
47
Huang C, Jiang Y, Yang X, Wei C, Chen H, Xiong W, Lin H, Wang X, Tian T, Tan H. Enhancing Retinal Fundus Image Quality Assessment With Swin-Transformer-Based Learning Across Multiple Color-Spaces. Transl Vis Sci Technol 2024; 13:8. [PMID: 38568606 PMCID: PMC10996994 DOI: 10.1167/tvst.13.4.8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Accepted: 02/18/2024] [Indexed: 04/05/2024] Open
Abstract
Purpose The assessment of retinal image (RI) quality holds significant importance in both clinical trials and large datasets, because suboptimal images can conceal early signs of disease, resulting in inaccurate medical diagnoses. This study aims to develop an automatic method for Retinal Image Quality Assessment (RIQA) that incorporates visual explanations, comprehensively evaluating the quality of retinal fundus images (RIs). Methods We developed an automatic RIQA system, named Swin-MCSFNet, utilizing 28,792 RIs from the EyeQ dataset, 2,000 images from the EyePACS dataset, and an additional 1,000 images from the OIA-ODIR dataset. After preprocessing, including cropping of black regions, data augmentation, and normalization, a Swin-MCSFNet classifier based on the Swin Transformer with multiple color-space fusion was proposed to grade the quality of RIs. The generalizability of Swin-MCSFNet was validated across multiple data centers. Additionally, for enhanced interpretability, a Score-CAM-generated heatmap was applied to provide visual explanations. Results Experimental results reveal that the proposed Swin-MCSFNet achieves promising performance, yielding a micro-averaged receiver operating characteristic (ROC) score of 0.93 and ROC scores of 0.96, 0.81, and 0.96 for the "Good," "Usable," and "Reject" categories, respectively. These scores underscore the accuracy of Swin-MCSFNet-based RIQA in distinguishing among the three categories. Furthermore, heatmaps generated across different RIQA classification scores and various color spaces suggest that regions of the retinal images from multiple color spaces contribute significantly to the decision-making process of the Swin-MCSFNet classifier. Conclusions Our study demonstrates that the proposed Swin-MCSFNet outperforms other methods in experiments conducted on multiple datasets, as evidenced by the superior performance metrics and insightful Score-CAM heatmaps.
Translational Relevance This study constructs a new retinal image quality evaluation system, which will contribute to subsequent research on retinal images.
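Since the evaluation above rests on ROC scores: the area under an ROC curve equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which gives a compact rank-based computation (a generic sketch, not the paper's pipeline; names are illustrative):

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive scores higher than a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scores: one positive ranked below the only negative -> AUC 2/3
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])
```

For the three-class setting above, per-class AUCs treat one category as positive and the rest as negative, and the micro-average pools all such decisions.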
Affiliation(s)
- Chengcheng Huang
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Yukang Jiang
- School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Xiaochun Yang
- The First People's Hospital of Yun Nan Province, Kunming, China
- Chiyu Wei
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Hongyu Chen
- Department of Optoelectronic Information Science and Engineering, Physical and Materials Science College, Guangzhou University, Guangzhou, China
- Weixue Xiong
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Henghui Lin
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Xueqin Wang
- School of Management, University of Science and Technology of China, Hefei, Anhui, China
- Ting Tian
- School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Haizhu Tan
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
48
Deng J, Heybati K, Shammas-Toma M. When vision meets reality: Exploring the clinical applicability of GPT-4 with vision. Clin Imaging 2024; 108:110101. [PMID: 38341880 DOI: 10.1016/j.clinimag.2024.110101] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Revised: 01/29/2024] [Accepted: 02/01/2024] [Indexed: 02/13/2024]
Affiliation(s)
- Jiawen Deng
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, ON, Canada.
- Kiyan Heybati
- Mayo Clinic Alix School of Medicine, Mayo Clinic, Jacksonville, FL, USA.
- Matthew Shammas-Toma
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, ON, Canada.
49
Hong SJ, Hou JU, Chung MJ, Kang SH, Shim BS, Lee SL, Park DH, Choi A, Oh JY, Lee KJ, Shin E, Cho E, Park SW. Convolutional neural network model for automatic recognition and classification of pancreatic cancer cell based on analysis of lipid droplet on unlabeled sample by 3D optical diffraction tomography. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 246:108041. [PMID: 38325025 DOI: 10.1016/j.cmpb.2024.108041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/05/2023] [Revised: 01/05/2024] [Accepted: 01/19/2024] [Indexed: 02/09/2024]
Abstract
INTRODUCTION Pancreatic cancer cells generally accumulate large numbers of lipid droplets (LDs), which regulate lipid storage. To promote rapid diagnosis, an automatic pancreatic cancer cell recognition system based on a deep convolutional neural network was proposed in this study, using quantitative images of LDs from stain-free cytologic samples acquired by optical diffraction tomography. METHODS We retrieved 3D refractive index tomograms and reconstructed 37 optical images per cell. The fields obtained from the four cell lines were separated into training and test datasets of 10,397 and 3,478 images, respectively. Furthermore, we adopted several machine learning techniques on top of a single-image-based prediction model to improve the performance of the computer-aided diagnostic system. RESULTS Pancreatic cancer cells had a significantly lower total cell volume and dry mass than normal pancreatic cells and contained greater numbers of LDs. When evaluating multitask learning techniques utilizing the EfficientNet-b3 model through confusion matrices, the overall 2-category accuracy for cancer classification reached 96.7%, and the overall 4-category accuracy for individual cell line classification reached 96.2%. Furthermore, as we added the core techniques one by one, the overall performance of the proposed technique improved significantly, reaching an area under the curve (AUC) of 0.997 and an accuracy of 97.06%. Finally, the AUC reached 0.998 in the ablation study with the score fusion technique. DISCUSSION Our novel training strategy has significant potential for automating and accelerating the recognition of pancreatic cancer cells. In the near future, deep learning-embedded medical devices may substitute for laborious manual cytopathologic examinations, with sustainable economic potential.
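On the score fusion idea mentioned above: one common form fuses a cell's many per-image class probabilities by averaging them before taking the argmax, so that a cell-level decision draws on all optical sections. The paper's exact fusion scheme may differ; all names below are illustrative:

```python
import numpy as np

def fuse_scores(per_image_probs):
    """Cell-level prediction by averaging per-image class probabilities
    (one simple form of score fusion)."""
    probs = np.mean(per_image_probs, axis=0)
    return int(np.argmax(probs)), probs

# Toy: 3 optical sections of one cell, 2 classes (normal vs. cancer)
per_image = np.array([[0.40, 0.60],
                      [0.20, 0.80],
                      [0.55, 0.45]])
label, fused = fuse_scores(per_image)  # majority of evidence favors class 1
```

Averaging probabilities tends to suppress the occasional confidently wrong section, which is one plausible reason score fusion lifts the AUC relative to single-image prediction.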
Affiliation(s)
- Seok Jin Hong
- Department of Otolaryngology-Head and Neck Surgery, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jong-Uk Hou
- School of Software, Hallym University, Chuncheon, Republic of Korea
- Moon Jae Chung
- Division of Gastroenterology, Department of Internal Medicine, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
- Sung Hun Kang
- Department of Otolaryngology-Head and Neck Surgery, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Bo-Seok Shim
- School of Software, Hallym University, Chuncheon, Republic of Korea
- Seung-Lee Lee
- School of Software, Hallym University, Chuncheon, Republic of Korea
- Da Hae Park
- Division of Gastroenterology, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, 7, Keunjaebong-gil, Hwaseong-si, Gyeonggi-do 18450, Republic of Korea
- Anna Choi
- Division of Gastroenterology, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, 7, Keunjaebong-gil, Hwaseong-si, Gyeonggi-do 18450, Republic of Korea
- Jae Yeon Oh
- Hallym University College of Medicine, Chuncheon, Republic of Korea
- Kyong Joo Lee
- Division of Gastroenterology, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, 7, Keunjaebong-gil, Hwaseong-si, Gyeonggi-do 18450, Republic of Korea
- Eun Shin
- Department of Pathology, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, Hwaseong, Republic of Korea
- Eunae Cho
- Division of Gastroenterology, Department of Internal Medicine, Chonnam National University Hospital, Gwangju, Republic of Korea
- Se Woo Park
- Division of Gastroenterology, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, 7, Keunjaebong-gil, Hwaseong-si, Gyeonggi-do 18450, Republic of Korea
50
Famiglini L, Campagner A, Barandas M, La Maida GA, Gallazzi E, Cabitza F. Evidence-based XAI: An empirical approach to design more effective and explainable decision support systems. Comput Biol Med 2024; 170:108042. [PMID: 38308866 DOI: 10.1016/j.compbiomed.2024.108042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 12/19/2023] [Accepted: 01/26/2024] [Indexed: 02/05/2024]
Abstract
This paper presents a user study evaluating the impact of Class Activation Maps (CAMs) as an eXplainable AI (XAI) method in a radiological diagnostic task: the detection of thoracolumbar (TL) fractures from vertebral X-rays. In particular, we focus on two oft-neglected properties of CAMs, granularity and coloring: which features (lower-level vs. higher-level) the maps should highlight, and which coloring scheme they should adopt, to best support the decision-making process, both in terms of diagnostic accuracy (that is, effectiveness) and of user-centered dimensions such as perceived confidence and utility (that is, satisfaction), depending on case complexity, AI accuracy, and user expertise. Our findings show that lower-level-feature CAMs, which highlight more focused anatomical landmarks, are associated with higher diagnostic accuracy than higher-level-feature CAMs, particularly among experienced physicians. Moreover, despite the intuitive appeal of semantic CAMs, traditionally colored CAMs consistently yielded higher diagnostic accuracy across all groups. Our results challenge some prevalent assumptions in the XAI field and emphasize the importance of adopting an evidence-based, human-centered approach to designing and evaluating AI- and XAI-assisted diagnostic tools. To this aim, the paper also proposes a hierarchy-of-evidence framework to help designers and practitioners choose the XAI solutions that optimize performance and satisfaction on the basis of the strongest evidence available, or to focus on the gaps in the literature that must be filled to move from opinionated, eminence-based research toward research grounded in empirical evidence and end-users' work and preferences.
Affiliation(s)
- Lorenzo Famiglini
- Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy.
- Marilia Barandas
- Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, Porto, Portugal
- Enrico Gallazzi
- Istituto Ortopedico Gaetano Pini - ASST Pini-CTO, Milan, Italy
- Federico Cabitza
- Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy; IRCCS Istituto Ortopedico Galeazzi, Milan, Italy