1
Gou J, Xin X, Yu B, Song H, Zhang W, Wan S. Neighborhood relation-based knowledge distillation for image classification. Neural Netw 2025; 188:107429. [PMID: 40179584] [DOI: 10.1016/j.neunet.2025.107429]
Abstract
As an efficient model compression method, knowledge distillation primarily transfers knowledge from a large teacher model to a small student model by minimizing the differences between the predictions of the teacher and the student. However, the relationship between different samples has not been well investigated, since recent relational distillation methods mainly construct the knowledge from all randomly selected samples, e.g., the similarity matrix of mini-batch samples. In this paper, we propose Neighborhood Relation-Based Knowledge Distillation (NRKD) to consider the local structure as novel relational knowledge for better knowledge transfer. Specifically, we first find a subset of samples with their K-nearest neighbors according to the similarity matrix of mini-batch samples and then build the neighborhood relationship knowledge for knowledge distillation, where the characterized relational knowledge can be transferred through both intermediate feature maps and output logits. We perform extensive experiments on several popular image classification datasets for knowledge distillation, including CIFAR10, CIFAR100, Tiny ImageNet, and ImageNet. Experimental results demonstrate that the proposed NRKD yields competitive results compared to state-of-the-art distillation methods. Our code is available at: https://github.com/xinxiaoxiaomeng/NRKD.git.
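The neighborhood-selection step the abstract describes can be illustrated with a minimal numpy sketch: build a mini-batch similarity matrix, keep only each sample's K nearest neighbors, and penalize the gap between the teacher's and student's neighborhood graphs. The function names and the exact loss form are our own simplifications, not the paper's released implementation.

```python
import numpy as np

def cosine_similarity_matrix(feats):
    """Pairwise cosine similarities for a mini-batch of feature vectors."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def knn_relation_graph(feats, k):
    """For each sample, keep only its k most similar neighbours
    (excluding itself); zero out the rest of the similarity matrix."""
    sim = cosine_similarity_matrix(feats)
    np.fill_diagonal(sim, -np.inf)           # exclude self-similarity
    graph = np.zeros_like(sim)
    for i, row in enumerate(sim):
        nbrs = np.argsort(row)[-k:]          # indices of the k nearest neighbours
        graph[i, nbrs] = row[nbrs]
    return graph

def neighborhood_kd_loss(teacher_feats, student_feats, k=2):
    """L2 gap between the teacher's and student's neighbourhood graphs."""
    g_t = knn_relation_graph(teacher_feats, k)
    g_s = knn_relation_graph(student_feats, k)
    return float(np.mean((g_t - g_s) ** 2))
```

In a real pipeline this loss term would be added to the usual logit-matching objective and computed from intermediate feature maps as well, as the abstract indicates.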
Affiliation(s)
- Jianping Gou
- College of Computer and Information Science, College of Software, Southwest University, Chongqing, 400715, China.
- Xiaomeng Xin
- School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, 212013, Jiangsu, China.
- Baosheng Yu
- Lee Kong Chian School of Medicine, Nanyang Technological University, 639798, Singapore.
- Heping Song
- School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, 212013, Jiangsu, China.
- Weiyong Zhang
- College of Computer and Information Science, College of Software, Southwest University, Chongqing, 400715, China.
- Shaohua Wan
- Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110, Guangdong, China.
2
Fan C, Guo D, Wang Z, Wang M. Multi-Objective Convex Quantization for Efficient Model Compression. IEEE Trans Pattern Anal Mach Intell 2025; 47:2313-2329. [PMID: 40030682] [DOI: 10.1109/tpami.2024.3521589]
Abstract
Quantization is an efficient model compression method that represents the network with fixed-point or low-bit numbers. Existing quantization methods treat network quantization as a single-objective optimization that pursues high accuracy (performance optimization) while keeping the quantization constraint. However, owing to the non-differentiability of the quantization operation, it is challenging to integrate the quantization operation into network training and achieve optimal parameters. In this paper, a novel multi-objective convex quantization for efficient model compression is proposed. Specifically, network training is modeled as a multi-objective optimization to find a network with both high precision and low quantization error (in practice, these two goals are somewhat contradictory and affect each other). To achieve effective multi-objective optimization, this paper designs a quantization error function that is differentiable and convex within each period, so as to avoid the non-differentiable back-propagation of the quantization operation. Then, a time-series self-distillation training scheme is performed on the multi-objective optimization framework, which distills the model's past softened labels and combines them with the hard targets to guarantee controllable and stable convergence during training. Finally, and most importantly, a new dynamic Lagrangian coefficient adaptation is designed to adjust the gradient magnitudes of the quantization loss and the performance loss and to balance the two losses during training. The proposed method is evaluated on well-known benchmarks: MNIST, CIFAR-10/100, ImageNet, Penn Treebank and Microsoft COCO, and experimental results show that it achieves outstanding performance compared to existing methods.
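The two-objective structure the abstract describes can be sketched in a few lines: a quantization-error term that is convex within each grid period, a scalarized loss weighted by a Lagrangian coefficient, and a toy rule that rescales the coefficient from the two gradient magnitudes. The parabolic error function and the adaptation rule below are our own stand-ins for the paper's constructions.

```python
import numpy as np

def quantization_error(w, step=0.25):
    """Mean squared distance of each weight to its nearest grid point.
    Piecewise-quadratic: convex within each quantization period (our
    stand-in for the paper's error function)."""
    return float(np.mean((w - step * np.round(w / step)) ** 2))

def multi_objective_loss(task_loss, w, lam, step=0.25):
    """Scalarized multi-objective loss: task performance plus a
    Lagrangian-weighted quantization error."""
    return task_loss + lam * quantization_error(w, step)

def adapt_lambda(lam, grad_task_norm, grad_quant_norm, eps=1e-8):
    """Toy dynamic Lagrangian coefficient: rescale lambda so the
    quantization-loss gradient magnitude tracks the task-loss one."""
    return lam * grad_task_norm / (grad_quant_norm + eps)
```

Weights that already sit on the grid incur zero quantization error, so the scalarized loss reduces to the task loss alone.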
3
Tölle M, Garthe P, Scherer C, Seliger JM, Leha A, Krüger N, Simm S, Martin S, Eble S, Kelm H, Bednorz M, André F, Bannas P, Diller G, Frey N, Groß S, Hennemuth A, Kaderali L, Meyer A, Nagel E, Orwat S, Seiffert M, Friede T, Seidler T, Engelhardt S. Real world federated learning with a knowledge distilled transformer for cardiac CT imaging. NPJ Digit Med 2025; 8:88. [PMID: 39915633] [PMCID: PMC11802793] [DOI: 10.1038/s41746-025-01434-3]
Abstract
Federated learning is a renowned technique for utilizing decentralized data while preserving privacy. However, real-world applications often face challenges like partially labeled datasets, where only a few locations have certain expert annotations, leaving large portions of unlabeled data unused. Leveraging these data could enhance transformer architectures' ability in regimes with small and diversely annotated sets. We conduct the largest federated cardiac CT analysis to date (n = 8,104) in a real-world setting across eight hospitals. Our two-step semi-supervised strategy distills knowledge from task-specific CNNs into a transformer. First, CNNs predict on unlabeled data per label type; then the transformer learns from these predictions with label-specific heads. This improves predictive accuracy, enables simultaneous learning of all partial labels across the federation, and outperforms UNet-based models in generalizability on downstream tasks. Code and model weights are made openly available to support future cardiac CT analysis.
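The two-step strategy can be sketched abstractly: task-specific teachers generate pseudo-labels per label type, then label-specific heads of a shared-feature student are fit to those targets. The label-type names ("calcium", "stenosis"), the linear heads, and the gradient rule below are illustrative placeholders, not the paper's actual tasks or architecture.

```python
import numpy as np

def build_pseudo_label_sets(unlabeled, teachers):
    """Step 1: each task-specific teacher predicts on the unlabeled pool
    for its own label type, yielding one target set per label type."""
    return {task: teacher(unlabeled) for task, teacher in teachers.items()}

def student_step(heads, features, targets, lr=0.1):
    """Step 2 (toy version): one gradient step per label-specific head
    of a shared-feature student, each head fit to its teacher's targets."""
    for task, w in heads.items():
        pred = features @ w
        grad = features.T @ (pred - targets[task]) / len(features)
        heads[task] = w - lr * grad
    return heads
```

In the paper's setting the student is a transformer trained across a federation; here the point is only the per-label-type decomposition of the distillation targets.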
Affiliation(s)
- Malte Tölle
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
- Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany.
- Heidelberg University, Heidelberg, Germany.
- Informatics for Life Institute, Heidelberg, Germany.
- Philipp Garthe
- Clinic for Cardiology III, University Hospital Münster, Münster, Germany
- Clemens Scherer
- DZHK (German Centre for Cardiovascular Research), partner site Munich, Munich, Germany
- Department of Medicine I, LMU University Hospital, LMU Munich, Munich, Germany
- Jan Moritz Seliger
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Department of Diagnostic and Interventional Radiology and Nuclear Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Andreas Leha
- DZHK (German Centre for Cardiovascular Research), partner site Lower Saxony, Göttingen, Germany
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Nina Krüger
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Deutsches Herzzentrum der Charité (DHZC), Institute of Computer-assisted Cardiovascular Medicine, Berlin, Germany
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
- Stefan Simm
- DZHK (German Centre for Cardiovascular Research), partner site Greifswald, Greifswald, Germany
- Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany
- Simon Martin
- DZHK (German Centre for Cardiovascular Research), partner site RhineMain, Frankfurt, Germany
- Institute for Experimental and Translational Cardiovascular Imaging, Goethe University, Frankfurt am Main, Germany
- Sebastian Eble
- Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany
- Halvar Kelm
- Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany
- Moritz Bednorz
- Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany
- Florian André
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany
- Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany
- Heidelberg University, Heidelberg, Germany
- Informatics for Life Institute, Heidelberg, Germany
- Peter Bannas
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Department of Diagnostic and Interventional Radiology and Nuclear Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Gerhard Diller
- Clinic for Cardiology III, University Hospital Münster, Münster, Germany
- Norbert Frey
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany
- Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany
- Heidelberg University, Heidelberg, Germany
- Informatics for Life Institute, Heidelberg, Germany
- Stefan Groß
- DZHK (German Centre for Cardiovascular Research), partner site Greifswald, Greifswald, Germany
- Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany
- Anja Hennemuth
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Deutsches Herzzentrum der Charité (DHZC), Institute of Computer-assisted Cardiovascular Medicine, Berlin, Germany
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
- Lars Kaderali
- DZHK (German Centre for Cardiovascular Research), partner site Greifswald, Greifswald, Germany
- Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany
- Alexander Meyer
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Deutsches Herzzentrum der Charité (DHZC), Institute of Computer-assisted Cardiovascular Medicine, Berlin, Germany
- Eike Nagel
- DZHK (German Centre for Cardiovascular Research), partner site RhineMain, Frankfurt, Germany
- Institute for Experimental and Translational Cardiovascular Imaging, Goethe University, Frankfurt am Main, Germany
- Stefan Orwat
- Clinic for Cardiology III, University Hospital Münster, Münster, Germany
- Moritz Seiffert
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Tim Friede
- DZHK (German Centre for Cardiovascular Research), partner site Lower Saxony, Göttingen, Germany
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Tim Seidler
- DZHK (German Centre for Cardiovascular Research), partner site Lower Saxony, Göttingen, Germany
- Department of Cardiology, University Medicine Göttingen, Göttingen, Germany
- Department of Cardiology, Campus Kerckhoff of the Justus-Liebig-University at Gießen, Kerckhoff-Clinic, Gießen, Germany
- Sandy Engelhardt
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany
- Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany
- Heidelberg University, Heidelberg, Germany
- Informatics for Life Institute, Heidelberg, Germany
4
Dominguez-Morales JP, Hernandez-Rodriguez JC, Duran-Lopez L, Conejo-Mir J, Pereyra-Rodriguez JJ. Melanoma Breslow Thickness Classification Using Ensemble-Based Knowledge Distillation With Semi-Supervised Convolutional Neural Networks. IEEE J Biomed Health Inform 2025; 29:443-455. [PMID: 39302772] [DOI: 10.1109/jbhi.2024.3465929]
Abstract
Melanoma is considered a global public health challenge and is responsible for more than 90% of deaths related to skin cancer. Although the diagnosis of early melanoma is the main goal of dermoscopy, the discrimination between dermoscopic images of in situ and invasive melanomas can be a difficult task even for experienced dermatologists. Recent advances in artificial intelligence in the field of medical image analysis show that its application to dermoscopy, with the aim of supporting and providing a second opinion to the medical expert, could be of great interest. In this work, four datasets from different sources were used to train and evaluate deep learning models on in situ versus invasive melanoma classification and on Breslow thickness prediction. Supervised learning and semi-supervised learning using a multi-teacher ensemble knowledge distillation approach were considered and evaluated using a stratified 5-fold cross-validation scheme. The best supervised models achieved AUCs of 0.8085 ± 0.0242 and 0.8232 ± 0.0666 on the former and latter classification tasks, respectively. The best results were obtained using semi-supervised learning, with the best model achieving 0.8547 and 0.8768 AUC, respectively. An external test set was also evaluated, where semi-supervision achieved higher performance in all the classification tasks. The results show that semi-supervised learning can improve the performance of trained models in different melanoma classification tasks compared to supervised learning. Automatic deep learning-based diagnosis systems could support medical professionals in their decisions, serving as a second opinion or as a triage tool for medical centers.
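The multi-teacher ensemble distillation idea can be sketched with one common recipe: average the temperature-softened predictions of several teachers into one soft target per sample, then train the student toward those targets with a soft cross-entropy. This is a generic formulation; the paper may weight or combine its teachers differently.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_soft_targets(teacher_logits, T=4.0):
    """Average the softened predictions of several teachers into one
    soft target distribution per sample."""
    return np.mean([softmax(l, T) for l in teacher_logits], axis=0)

def kd_cross_entropy(student_logits, soft_targets, T=4.0):
    """Soft cross-entropy of the student against the ensemble targets."""
    log_p = np.log(softmax(student_logits, T) + 1e-12)
    return float(-np.mean(np.sum(soft_targets * log_p, axis=-1)))
```

In the semi-supervised setting, the ensemble targets would be computed on unlabeled dermoscopic images and mixed with the supervised loss on labeled ones.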
5
Gou J, Sun L, Yu B, Du L, Ramamohanarao K, Tao D. Collaborative Knowledge Distillation via Multiknowledge Transfer. IEEE Trans Neural Netw Learn Syst 2024; 35:6718-6730. [PMID: 36264723] [DOI: 10.1109/tnnls.2022.3212733]
Abstract
Knowledge distillation (KD), as an efficient and effective model compression technique, has received considerable attention in deep learning. The key to its success lies in transferring knowledge from a large teacher network to a small student network. However, most existing KD methods consider only one type of knowledge, learned from either instance features or instance relations via a specific distillation strategy, failing to explore the idea of transferring different types of knowledge with different distillation strategies. Moreover, the widely used offline distillation also suffers from a limited learning capacity due to the fixed large-to-small teacher-student architecture. In this article, we devise a collaborative KD via multiknowledge transfer (CKD-MKT) that prompts both self-learning and collaborative learning in a unified framework. Specifically, CKD-MKT utilizes a multiple knowledge transfer framework that assembles self and online distillation strategies to effectively: 1) fuse different kinds of knowledge, which allows multiple students to learn knowledge from both individual instances and instance relations, and 2) let the students guide each other through collaborative learning and learn from themselves through self-learning. Experiments and ablation studies on six image datasets demonstrate that the proposed CKD-MKT significantly outperforms recent state-of-the-art KD methods.
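The two kinds of knowledge named above, individual instances and instance relations, can be combined in a toy two-student collaborative loss: match the peers' softened predictions (instance level) and their batch similarity matrices (relation level). The weighting and the exact distance measures are our own simplification of the paper's framework.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def relation_matrix(feats):
    """Instance-relation knowledge: pairwise cosine similarity of
    the batch features."""
    n = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    return n @ n.T

def peer_kd_loss(logits_a, logits_b, feats_a, feats_b, alpha=0.5):
    """Toy collaborative loss between two peer students: match both
    instance-level predictions and instance relations."""
    inst = np.mean((softmax(logits_a) - softmax(logits_b)) ** 2)
    rel = np.mean((relation_matrix(feats_a) - relation_matrix(feats_b)) ** 2)
    return float(alpha * inst + (1 - alpha) * rel)
```

With more than two students, each peer would average this loss against all the others, alongside its own supervised and self-distillation terms.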
6
Zhang Y, Chen Z, Yang X. Light-M: An efficient lightweight medical image segmentation framework for resource-constrained IoMT. Comput Biol Med 2024; 170:108088. [PMID: 38320339] [DOI: 10.1016/j.compbiomed.2024.108088]
Abstract
The Internet of Medical Things (IoMT) is being incorporated into current healthcare systems. This technology intends to connect patients, IoMT devices, and hospitals over mobile networks, allowing for more secure, quick, and convenient health monitoring and intelligent healthcare services. However, existing intelligent healthcare applications typically rely on large-scale AI models, while standard IoMT devices have significant resource constraints. To alleviate this paradox, in this paper we propose a Knowledge Distillation (KD)-based IoMT end-edge-cloud orchestrated architecture for medical image segmentation tasks, called Light-M, which aims to deploy a lightweight medical model on resource-constrained IoMT devices. Specifically, Light-M trains a large teacher model in the cloud server and performs computation on local nodes with a student model that imitates the teacher via knowledge distillation. Light-M contains two KD strategies for the medical image segmentation task: (1) active exploration and passive transfer (AEPT) and (2) self-attention-based inter-class feature variation (AIFV) distillation. AEPT encourages the student model to learn undiscovered knowledge/features of the teacher model without additional feature layers, aiming to explore new features and outperform the teacher. To improve the student's ability to distinguish different classes, the student learns the self-attention-based feature variation between classes. Since the proposed AEPT and AIFV appear only in the training process, our framework adds no computation burden to the student model when the segmentation task is deployed. Extensive experiments on cardiac images and public real-scene datasets demonstrate that our approach improves the student model's learned representations and outperforms state-of-the-art methods by combining the two knowledge distillation strategies. Moreover, when deployed on an IoT device, the distilled student model takes only 29.6 ms per sample at inference.
Affiliation(s)
- Yifan Zhang
- Shenzhen University, 3688 Nanhai Ave., Shenzhen, 518060, Guangdong, China
- Zhuangzhuang Chen
- Shenzhen University, 3688 Nanhai Ave., Shenzhen, 518060, Guangdong, China
- Xuan Yang
- Shenzhen University, 3688 Nanhai Ave., Shenzhen, 518060, Guangdong, China.
7
Yan X, Jia L, Cao H, Yu Y, Wang T, Zhang F, Guan Q. Multitargets Joint Training Lightweight Model for Object Detection of Substation. IEEE Trans Neural Netw Learn Syst 2024; 35:2413-2424. [PMID: 35877791] [DOI: 10.1109/tnnls.2022.3190139]
Abstract
Object detection in substations is key to ensuring their safe and reliable operation. Traditional image detection algorithms rely on texture features specific to a single object class and cannot easily handle objects of other classes. Deep-network-based object detectors generalize well, but their sizeable, complex backbones limit their application in substation monitoring terminals with weak computing power. This article proposes a multitargets joint training lightweight model. The proposed model uses the feature maps of a complex model and the labels of objects in the images as joint training targets. The feature maps carry deeper feature information, and the feature maps of complex networks have higher information entropy than those of lightweight networks. Because the proportions of foreground and background are imbalanced, this article proposes a heat pixels method to strengthen the effective object information. The heat pixels method is designed as a kind of reverse network calculation that projects each object's position onto the pixels of the feature maps; the temperature of a pixel indicates the probability that an object exists at that location. Three different lightweight networks are trained with the complex model's feature maps and the traditional labels as joint targets. The public VOC dataset and a substation equipment dataset are adopted in the experiments. The experimental results demonstrate that the proposed model effectively improves object detection accuracy while reducing time consumption and computation.
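The "heat pixels" idea, pixel temperature reflecting the probability that an object occupies a location, can be illustrated with a toy map: project object boxes onto a feature-map grid and raise a Gaussian temperature bump around each object centre. The Gaussian form and the max-combination are our own assumptions; the paper's reverse network calculation is not published here.

```python
import numpy as np

def heat_pixel_map(shape, boxes, sigma=2.0):
    """Toy 'heat pixels': for each box (x0, y0, x1, y1) on an h-by-w
    feature-map grid, raise a Gaussian temperature around the box
    centre; combine multiple objects with an element-wise max."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape)
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)
    return heat
```

A lightweight student could then be trained against both such maps derived from the complex model and the ordinary detection labels, which is the joint-target idea of the abstract.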
8
Zhou L, Ni X, Kong Y, Zeng H, Xu M, Zhou J, Wang Q, Liu C. Mitigating misalignment in MRI-to-CT synthesis for improved synthetic CT generation: an iterative refinement and knowledge distillation approach. Phys Med Biol 2023; 68:245020. [PMID: 37976548] [DOI: 10.1088/1361-6560/ad0ddc]
Abstract
Objective. Deep learning has shown promise in generating synthetic CT (sCT) from magnetic resonance imaging (MRI). However, the misalignment between MRIs and CTs has not been adequately addressed, leading to reduced prediction accuracy and potential harm to patients due to the generative adversarial network (GAN) hallucination phenomenon. This work proposes a novel approach to mitigate misalignment and improve sCT generation. Approach. Our approach has two stages: iterative refinement and knowledge distillation. First, we iteratively refine registration and synthesis by leveraging their complementary nature. In each iteration, we register CT to the sCT from the previous iteration, generating a more aligned deformed CT (dCT). We train a new model on the refined 〈dCT, MRI〉 pairs to enhance synthesis. Second, we distill knowledge by creating a target CT (tCT) that combines sCT and dCT images from the previous iterations. This further improves alignment beyond the individual sCT and dCT images. We train a new model with the 〈tCT, MRI〉 pairs to transfer insights from multiple models into this final knowledgeable model. Main results. Our method outperformed conditional GANs on 48 head and neck cancer patients. It reduced hallucinations and improved accuracy in geometry (3% ↑ Dice), intensity (16.7% ↓ MAE), and dosimetry (1% ↑ γ3%/3mm). It also achieved <1% relative dose difference for specific dose volume histogram points. Significance. This pioneering approach for addressing misalignment shows promising performance in MRI-to-CT synthesis for MRI-only planning. It could be applied to other modalities like cone beam computed tomography and tasks such as organ contouring.
Affiliation(s)
- Leyuan Zhou
- Department of Radiation Oncology, Dushu Lake Hospital Affiliated to Soochow University, Suzhou, People's Republic of China
- Department of Radiation Oncology, Affiliated Hospital of Jiangnan University, Wuxi, People's Republic of China
- Xinye Ni
- Radiation Oncology Center, Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University, Changzhou, People's Republic of China
- Center of Medical Physics, Nanjing Medical University, Changzhou, People's Republic of China
- Yan Kong
- Department of Radiation Oncology, Affiliated Hospital of Jiangnan University, Wuxi, People's Republic of China
- Haibin Zeng
- Department of Radiation Oncology, Dushu Lake Hospital Affiliated to Soochow University, Suzhou, People's Republic of China
- Muchen Xu
- Department of Radiation Oncology, Dushu Lake Hospital Affiliated to Soochow University, Suzhou, People's Republic of China
- Juying Zhou
- Department of Radiation Oncology, Dushu Lake Hospital Affiliated to Soochow University, Suzhou, People's Republic of China
- Qingxin Wang
- Department of Radiation Oncology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, People's Republic of China
- Cong Liu
- Radiation Oncology Center, Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University, Changzhou, People's Republic of China
- Center of Medical Physics, Nanjing Medical University, Changzhou, People's Republic of China
- Faculty of Business Information, Shanghai Business School, Shanghai, People's Republic of China
9
Xu C, Song Y, Zhang D, Bittencourt LK, Tirumani SH, Li S. Spatiotemporal knowledge teacher-student reinforcement learning to detect liver tumors without contrast agents. Med Image Anal 2023; 90:102980. [PMID: 37820417] [DOI: 10.1016/j.media.2023.102980]
Abstract
Detecting liver tumors without contrast agents (CAs) has shown great potential to advance liver cancer screening. It enables a reliable liver tumor-detection result from non-enhanced MR images comparable to radiologists' results from CA-enhanced MR images, thus eliminating the high risk of CAs, avoiding experience gaps between radiologists, and simplifying clinical workflows. In this paper, we propose a novel spatiotemporal knowledge teacher-student reinforcement learning (SKT-RL) framework as a safe, speedy, and inexpensive contrast-free technology for liver tumor detection. Our SKT-RL builds a teacher-student framework in which explicit liver tumor knowledge explored by a teacher network on clear contrast-enhanced images guides a student network to detect tumors from non-enhanced images directly. Importantly, our SKT-RL introduces three novelties in how tumor knowledge is constructed, transferred, and optimized, all aimed at improving the guidance effect. (1) A new spatiotemporal ternary knowledge set enables the construction of accurate knowledge that captures the DRL agent's behavior (what to do) and reason (why to do it) behind reliable detection within each state and between related historical states. (2) A novel pixel momentum transferring strategy enables detailed and controlled knowledge transfer. It transfers knowledge at the pixel level to enlarge the explorable transfer space and controls how much knowledge is transferred to prevent over-reliance of the student on the teacher. (3) A phase-trend reward function applies different evaluations to different detection phases to optimize each phase with high precision, while the reward trend constrains the evaluation to improve stability. Comprehensive experiments on a generalized liver tumor dataset with 375 patients (including hemangiomas, hepatocellular carcinoma, and normal controls) show that our SKT-RL attains new state-of-the-art performance (improving precision by at least 4% over six recent advanced methods) in the task of liver tumor detection without CAs. The results show that our SKT-RL can greatly promote the development and deployment of contrast-free liver tumor technology.
Affiliation(s)
- Chenchu Xu
- School of Computer Science and Technology, Anhui University, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Yuhong Song
- School of Computer Science and Technology, Anhui University, Hefei, China
- Dong Zhang
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada.
- Shuo Li
- School of Engineering, Case Western Reserve University, Cleveland, United States.
10
Yu C, Liu H, Zhang H. Distilling sub-space structure across views for cardiac indices estimation. Med Image Anal 2023; 85:102764. [PMID: 36791621] [DOI: 10.1016/j.media.2023.102764]
Abstract
Cardiac indices estimation from multi-view images attracts great attention due to its capability for cardiac function assessment. However, the variation of the cardiac indices across views means that most estimation methods can only be trained separately on each view, resulting in low data utilization. To solve this problem, we propose distilling the sub-space structure across views to fully exploit the multi-view data for cardiac indices estimation. In particular, the sub-space structure is obtained by building an n×n covariance matrix that describes the correlation between the output dimensions of all views. Then, an alternate convex search algorithm is proposed to optimize the cross-view learning framework, by which: (i) we train the model with regularization by the sub-space structure in each view; and (ii) we update the sub-space structure based on the learned parameters from all views. Finally, we conducted a series of experiments to verify the effectiveness of the proposed framework. The model is trained on three views (short axis, 2-chamber view and 4-chamber view) with two modalities (magnetic resonance imaging and computed tomography). Compared to state-of-the-art methods, our method demonstrates superior performance on cardiac indices estimation tasks.
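The alternating scheme above can be sketched with a vector simplification: each view has a parameter vector, the shared structure is a covariance-style matrix built from all of them, and each view's loss carries a w^T C^{-1} w coupling term. One would alternate between (i) fitting each view's weights with C fixed and (ii) recomputing C from the fitted weights. The specific regularizer below is a common multi-task stand-in, not the paper's exact formulation.

```python
import numpy as np

def subspace_structure(view_weights):
    """Shared sub-space structure: covariance-style matrix built from
    the per-view parameter vectors (step (ii) of the alternation)."""
    W = np.stack(view_weights)               # (num_views, dim)
    C = W.T @ W / len(W)
    return C + 1e-3 * np.eye(C.shape[0])     # small ridge keeps C invertible

def regularized_view_loss(X, y, w, C_inv, lam=0.1):
    """Per-view squared loss plus the coupling term w^T C^{-1} w
    (step (i): train each view under the shared structure)."""
    resid = X @ w - y
    return float(np.mean(resid ** 2) + lam * w @ C_inv @ w)
```

Directions that many views agree on get large variance in C, so their C^{-1} penalty is small and cross-view sharing is encouraged.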
Affiliation(s)
- Chengjin Yu
- College of Optical Science and Engineering, Zhejiang University, Hangzhou, China
- Huafeng Liu
- College of Optical Science and Engineering, Zhejiang University, Hangzhou, China
- Heye Zhang
- School of Biomedical Engineering, Sun Yat-sen University, Guangzhou, China.
11
Transferable and Differentiable Discrete Network Embedding for multi-domains with Hierarchical Knowledge Distillation. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.146]
12
Ye HJ, Lu S, Zhan DC. Generalized Knowledge Distillation via Relationship Matching. IEEE Trans Pattern Anal Mach Intell 2023; 45:1817-1834. [PMID: 35298374] [DOI: 10.1109/tpami.2022.3160328]
Abstract
The knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks. Knowledge distillation extracts knowledge from the teacher and integrates it with the target model (a.k.a. the "student"), which expands the student's knowledge and improves its learning efficacy. Instead of enforcing the teacher to work on the same task as the student, we borrow the knowledge from a teacher trained on a general label space - in this "Generalized Knowledge Distillation (GKD)," the classes of the teacher and the student may be the same, completely different, or partially overlapped. We claim that the comparison ability between instances acts as an essential factor threading knowledge across tasks, and propose the RElationship FacIlitated Local cLassifiEr Distillation (ReFilled) approach. To align the instance-label confidence between models, ReFilled requires the teacher to reweight the hard tuples pushed forward by the student and then matches the similarity comparison levels between instances. An embedding-induced classifier based on the teacher model supervises the student's classification confidence and adaptively emphasizes the most related supervision from the teacher. ReFilled demonstrates strong discriminative ability when the classes of the teacher vary from the same to a fully non-overlapped set w.r.t. the student. It also achieves state-of-the-art performance on standard knowledge distillation, one-step incremental learning, and few-shot learning tasks.
|
13
|
Liu K, Chen K, Jia K. Convolutional Fine-Grained Classification With Self-Supervised Target Relation Regularization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:5570-5584. [PMID: 35981063 DOI: 10.1109/tip.2022.3197931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Fine-grained visual classification can be addressed by deep representation learning under supervision of manually pre-defined targets (e.g., one-hot or the Hadamard codes). Such target coding schemes are less flexible for modeling inter-class correlation and are also sensitive to sparse and imbalanced data distributions. In light of this, this paper introduces a novel target coding scheme - dynamic target relation graphs (DTRG), which, as an auxiliary feature regularization, is a self-generated structural output to be mapped from input images. Specifically, online computation of class-level feature centers is designed to generate cross-category distances in the representation space, which can thus be depicted by a dynamic graph in a non-parametric manner. Explicitly minimizing intra-class feature variations anchored on those class-level centers can encourage learning of discriminative features. Moreover, owing to exploiting inter-class dependency, the proposed target graphs can alleviate data sparsity and imbalance in representation learning. Inspired by the recent success of mixup-style data augmentation, this paper introduces randomness into the soft construction of dynamic target relation graphs to further explore the relation diversity of target classes. Experimental results demonstrate the effectiveness of our method on a number of diverse visual classification benchmarks, especially achieving state-of-the-art performance on three popular fine-grained object benchmarks and superior robustness against sparse and imbalanced data. Source codes are made publicly available at https://github.com/AkonLau/DTRG.
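The construction described above - class-level feature centers whose pairwise distances induce a non-parametric relation graph - can be sketched as follows. This is an illustrative stand-in for the paper's dynamic target relation graph, with assumed function names and a simple softmax-over-negative-distance normalization:

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Per-class mean feature vectors (the class-level centers)."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def relation_graph(centers, temperature=1.0):
    """Row-normalized softmax over negative pairwise center distances:
    each row is a soft relational target for one class."""
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    logits = -d / temperature
    np.fill_diagonal(logits, -np.inf)  # ignore self-relations
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
feats = rng.normal(size=(30, 16))          # a mini-batch of features
labels = rng.integers(0, 3, size=30)       # 3 classes
G = relation_graph(class_centers(feats, labels, 3))
```

In the paper the centers are maintained online across batches and the graph serves as an auxiliary regression target; this sketch only shows a single-batch construction.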
|
14
|
|
15
|
|
16
|
Yu Z, Shen D, Jin Z, Huang J, Cai D, Hua XS. Progressive Transfer Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:1340-1348. [PMID: 35025744 DOI: 10.1109/tip.2022.3141258] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Model fine-tuning is a widely used transfer learning approach in person Re-identification (ReID) applications, which fine-tunes a pre-trained feature extraction model on the target scenario instead of training a model from scratch. It is challenging due to the significant variations inside the target scenario, e.g., different camera viewpoints, illumination changes, and occlusion. These variations result in a gap between each mini-batch's distribution and the whole dataset's distribution when using mini-batch training. In this paper, we study model fine-tuning from the perspective of the aggregation and utilization of the dataset's global information when using mini-batch training. Specifically, we introduce a novel network structure called the Batch-related Convolutional Cell (BConv-Cell), which progressively collects the dataset's global information into a latent state and uses it to rectify the extracted features. Based on BConv-Cells, we further propose the Progressive Transfer Learning (PTL) method to facilitate the model fine-tuning process by jointly optimizing BConv-Cells and the pre-trained ReID model. Empirical experiments show that our proposal can greatly improve the ReID model's performance on the MSMT17, Market-1501, CUHK03, and DukeMTMC-reID datasets. Moreover, we extend our proposal to the general image classification task. Experiments on several image classification benchmark datasets demonstrate that our proposal can significantly improve baseline models' performance. The code has been released at https://github.com/ZJULearning/PTL.
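The mechanism of progressively collecting global statistics into a latent state and using them to rectify per-batch features can be illustrated with a toy exponential-moving-average version. This is a deliberately simplified stand-in for the paper's BConv-Cell (which is convolutional and jointly trained), with hypothetical class and parameter names:

```python
import numpy as np

class GlobalStateRectifier:
    """Toy sketch of the BConv-Cell idea: accumulate a running
    estimate of the dataset's feature mean across mini-batches
    and use it to re-center each batch's features."""
    def __init__(self, dim, momentum=0.9):
        self.state = np.zeros(dim)
        self.momentum = momentum

    def __call__(self, batch_feats):
        # Fold this batch's statistics into the latent state ...
        self.state = (self.momentum * self.state
                      + (1 - self.momentum) * batch_feats.mean(axis=0))
        # ... and rectify the batch with the global estimate.
        return batch_feats - self.state

rng = np.random.default_rng(2)
rect = GlobalStateRectifier(dim=4)
out = None
for _ in range(100):                            # a stream of mini-batches
    batch = rng.normal(loc=5.0, size=(16, 4))   # dataset mean is 5.0
    out = rect(batch)
```

After enough batches the latent state approaches the dataset-level mean, so the rectified features are centered with respect to the whole dataset rather than the current (noisy) mini-batch - the distribution gap the abstract describes.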
|
17
|
Wu L, Dong B, Liu X, Hong W, Chen L, Gao K, Sheng Q, Yu Y, Zhao L, Zhang Y. Standard Echocardiographic View Recognition in Diagnosis of Congenital Heart Defects in Children Using Deep Learning Based on Knowledge Distillation. Front Pediatr 2022; 9:770182. [PMID: 35118028 PMCID: PMC8805220 DOI: 10.3389/fped.2021.770182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Accepted: 12/20/2021] [Indexed: 01/10/2023] Open
Abstract
Standard echocardiographic view recognition is a prerequisite for automatic diagnosis of congenital heart defects (CHDs). This study aims to evaluate the feasibility and accuracy of standard echocardiographic view recognition in the diagnosis of CHDs in children using convolutional neural networks (CNNs). A new deep learning-based neural network method was proposed to automatically and efficiently identify commonly used standard echocardiographic views. A total of 367,571 echocardiographic image slices from 3,772 subjects were used to train and validate the proposed echocardiographic view recognition model, in which 23 standard echocardiographic views commonly used to diagnose CHDs in children were identified. The F1 scores of a majority of views were all ≥0.90, including the subcostal sagittal/coronal view of the atrium septum, apical four-chamber view, apical five-chamber view, low parasternal four-chamber view, sax-mid, sax-basal, parasternal long-axis view of the left ventricle (PSLV), suprasternal long-axis view of the entire aortic arch, M-mode echocardiographic recordings of the aorta (M-AO) and the left ventricle at the level of the papillary muscle (M-LV), and Doppler recordings from the mitral valve (DP-MV), the tricuspid valve (DP-TV), the ascending aorta (DP-AAO), the pulmonary valve (DP-PV), and the descending aorta (DP-DAO). This study provides a solid foundation for the subsequent use of artificial intelligence (AI) to identify CHDs in children.
Affiliation(s)
- Lanping Wu
- Department of Pediatric Cardiology, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Bin Dong
- Shanghai Engineering Research Center of Intelligence Pediatrics, Shanghai, China
- Xiaoqing Liu
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Wenjing Hong
- Department of Pediatric Cardiology, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Lijun Chen
- Department of Pediatric Cardiology, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Kunlun Gao
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Qiuyang Sheng
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Yizhou Yu
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Liebin Zhao
- Shanghai Engineering Research Center of Intelligence Pediatrics, Shanghai, China
- Yuqi Zhang
- Department of Pediatric Cardiology, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
|
18
|
Antaris S, Rafailidis D, Girdzijauskas S. Knowledge distillation on neural networks for evolving graphs. SOCIAL NETWORK ANALYSIS AND MINING 2021. [DOI: 10.1007/s13278-021-00816-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Graph representation learning on dynamic graphs has become an important task in several real-world applications, such as recommender systems, email spam detection, and so on. To efficiently capture the evolution of a graph, representation learning approaches employ deep neural networks with a large number of parameters to train. Due to the large model size, such approaches have high online inference latency. As a consequence, such models are challenging to deploy in an industrial setting with a vast number of users/nodes. In this study, we propose DynGKD, a distillation strategy to transfer the knowledge from a large teacher model to a small student model with low inference latency, while achieving high prediction accuracy. We first study different distillation loss functions to separately train the student model with various types of information from the teacher model. In addition, we propose a hybrid distillation strategy for evolving graph representation learning to combine the teacher's different types of information. Our experiments with five publicly available datasets demonstrate the superiority of our proposed model against several baselines, with an average relative drop of 40.60% in terms of RMSE in the link prediction task. Moreover, our DynGKD model achieves a compression ratio of 21:100, accelerating inference with a speed-up factor of ×30 compared with the teacher model. For reproduction purposes, we make our datasets and implementation publicly available at https://github.com/stefanosantaris/DynGKD.
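The hybrid strategy of combining several types of teacher information can be sketched generically: a softened-logit cross-entropy term plus an intermediate-feature regression term, mixed by a weight. This is a hedged illustration of hybrid distillation losses in general, not the DynGKD loss; all names and the weighting scheme are assumptions:

```python
import numpy as np

def softmax(z, tau=1.0):
    e = np.exp((z - z.max(axis=1, keepdims=True)) / tau)
    return e / e.sum(axis=1, keepdims=True)

def kd_logit_loss(t_logits, s_logits, tau=2.0):
    """Cross-entropy of the student against the teacher's
    temperature-softened output distribution."""
    p, q = softmax(t_logits, tau), softmax(s_logits, tau)
    return float(-(p * np.log(q + 1e-12)).sum(axis=1).mean())

def feature_loss(t_feats, s_feats):
    """Mean squared error between intermediate representations
    (assumed already projected to a common dimension)."""
    return float(np.mean((t_feats - s_feats) ** 2))

def hybrid_loss(t_logits, s_logits, t_feats, s_feats, alpha=0.5):
    """Weighted combination of the two information sources."""
    return (alpha * kd_logit_loss(t_logits, s_logits)
            + (1 - alpha) * feature_loss(t_feats, s_feats))

rng = np.random.default_rng(3)
tl, sl = rng.normal(size=(8, 5)), rng.normal(size=(8, 5))
tf, sf = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
loss = hybrid_loss(tl, sl, tf, sf)
```

Separately tuning the two terms (or training with each alone, as the abstract describes) lets one measure which kind of teacher information the student benefits from most before combining them.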
|