1. Fu Z, Wang Z, Yu C, Xu X, Li D. Double Confidence Calibration Focused Distillation for Task-Incremental Learning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9070-9083. PMID: 38954573. DOI: 10.1109/tnnls.2024.3418811.
Abstract
Task-incremental learning methods that adopt knowledge distillation face two significant challenges: confidence bias and knowledge loss. These challenges make it difficult to balance the stability and plasticity of the network during incremental learning. In this article, we propose double confidence calibration focused distillation (DCCFD) to address these challenges. We introduce intra-task and inter-task confidence calibration modules that mitigate network overconfidence during incremental learning and reduce feature representation bias. We also propose a focused distillation (FD) module that alleviates knowledge loss during the task-increment process, improving model stability without reducing plasticity. Experimental results on the CIFAR-100, TinyImageNet, and CORe50 datasets demonstrate the effectiveness of our method, with performance that matches or exceeds the state of the art. Furthermore, our method can be used as a plug-and-play module to consistently improve class-incremental learning methods.
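The abstract names calibrated distillation only at a high level, so the following is a minimal, hypothetical sketch of the general pattern it points at: temperature-softened distillation plus label smoothing as stand-ins for the paper's calibration and focused-distillation modules. All names and hyperparameters here are assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def calibrated_distillation_loss(student_logits, teacher_logits, targets,
                                 temperature=2.0, smoothing=0.1, alpha=0.5):
    """Hypothetical stand-in for calibrated distillation: temperature
    scaling softens an overconfident teacher, and label smoothing
    discourages overconfident hard targets on the student side."""
    # Softened teacher/student distributions for the distillation term.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Calibrated supervised term on the current task's labels.
    ce = F.cross_entropy(student_logits, targets, label_smoothing=smoothing)
    return alpha * kd + (1.0 - alpha) * ce
```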
2. Zhi R, Meng Y, Hou J, Wan J. Dual Balanced Class-Incremental Learning With im-Softmax and Angular Rectification. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4437-4447. PMID: 38442059. DOI: 10.1109/tnnls.2024.3368341.
Abstract
Owing to their superior performance, exemplar-based methods with knowledge distillation (KD) are widely applied in class-incremental learning (CIL). However, they suffer from two drawbacks: 1) data imbalance between the old/learned and new classes biases the classifier toward the head/new classes and 2) deep neural networks (DNNs) suffer from distribution drift when learning sequential tasks, which results in a narrowed feature space and deficient representations of old tasks. For the first problem, we theoretically analyze the insufficiency of the softmax loss under data imbalance and then propose the imbalance softmax (im-softmax) loss to relieve imbalanced learning, re-scaling the output logits to underfit the head/new classes. For the second problem, we calibrate the feature space with an incremental-adaptive angular margin (IAAM) loss. The new classes form a complete distribution in feature space while the old ones are squeezed. To recover the old feature space, we first compute the angle between normalized features and normalized anchor prototypes, use the angle distribution to represent the class distribution, and then replenish the old distribution with the deviation from the new one. Each anchor prototype is predefined as a learnable vector for a designated class. The proposed im-softmax reduces the bias in the linear classification layer, while IAAM rectifies representation learning, reducing the intra-class distance and enlarging the inter-class margin. Finally, we seamlessly combine im-softmax and IAAM in an end-to-end training framework, called dual balanced class-incremental learning (DBL), for further improvements. Experiments demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on several benchmarks, including CIFAR10, CIFAR100, Tiny-ImageNet, and ImageNet-100.
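As a concrete illustration of the two ingredients described above, here is a hedged PyTorch sketch: a frequency-based logit re-scaling in the spirit of im-softmax, and an ArcFace-style additive angular margin over learnable prototypes in the spirit of IAAM. The exact re-scaling rule, the margin schedule, and all names here are assumptions rather than the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def im_softmax_loss(logits, targets, class_counts, gamma=0.5):
    """Down-scale logits of frequent (head/new) classes so the linear
    classifier underfits them; the re-scaling rule is hypothetical."""
    counts = class_counts.float()
    scale = (counts.min() / counts) ** gamma  # (C,), smaller for head classes
    return F.cross_entropy(logits * scale, targets)

class AngularPrototypeClassifier(nn.Module):
    """Cosine classifier over learnable anchor prototypes with an additive
    angular margin on the true class (a generic stand-in for IAAM)."""

    def __init__(self, feat_dim, num_classes, scale=16.0, margin=0.2):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = scale, margin

    def forward(self, features, targets=None):
        # Cosine of the angle between normalized features and prototypes.
        cos = F.linear(F.normalize(features), F.normalize(self.prototypes))
        if targets is not None:
            theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
            one_hot = F.one_hot(targets, cos.size(1)).float()
            # A margin on the true-class angle tightens intra-class
            # distance and enlarges inter-class margins.
            cos = torch.cos(theta + one_hot * self.m)
        return self.s * cos
```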
3. Ji Z, Jiao Z, Wang Q, Pang Y, Han J. Imbalance Mitigation for Continual Learning via Knowledge Decoupling and Dual Enhanced Contrastive Learning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3450-3463. PMID: 38190680. DOI: 10.1109/tnnls.2023.3347477.
Abstract
Continual learning (CL) studies how to learn new knowledge continuously from data streams without catastrophically forgetting previous knowledge. A key problem is catastrophic forgetting: the performance of the model on previous tasks declines significantly after learning subsequent tasks. Several studies address it by replaying samples stored in a buffer when training new tasks. However, the data imbalance between old- and new-task samples results in two serious problems: information suppression and weak feature discriminability. The former refers to the information in the abundant new-task samples suppressing that in the old-task samples, which harms knowledge retention because the biased output degrades the consistency of a sample's outputs at different moments. The latter refers to the feature representation being biased toward the new task and lacking the discriminability to distinguish old and new tasks. To this end, we build an imbalance mitigation for CL (IMCL) framework that incorporates a decoupled knowledge distillation (DKD) approach and a dual enhanced contrastive learning (DECL) approach to tackle both problems. Specifically, the DKD approach alleviates the suppression of old tasks by the new task by decoupling the model's output probability during the replay stage, which better maintains the knowledge of old tasks. The DECL approach enhances both low- and high-level features and fuses the enhanced features into a contrastive loss that effectively distinguishes different tasks. Extensive experiments on three popular datasets show that our method achieves promising performance under task-incremental learning (Task-IL), class-incremental learning (Class-IL), and domain-incremental learning (Domain-IL) settings.
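The abstract does not spell out the decoupling, so the sketch below shows one established way to decouple the output probability, following the decoupled KD of Zhao et al. (2022): a target/non-target binary term (TCKD) plus a distribution over non-target classes (NCKD). Whether IMCL uses exactly this split, and all names and hyperparameters here, are assumptions.

```python
import torch
import torch.nn.functional as F

def decoupled_kd_loss(student_logits, teacher_logits, targets,
                      alpha=1.0, beta=2.0, T=4.0):
    """Decoupled KD sketch: KL on the target/non-target binary split
    (TCKD) plus KL over the non-target classes only (NCKD)."""
    num_classes = student_logits.size(1)
    gt_mask = F.one_hot(targets, num_classes).float()

    # TCKD: KL between binary [p(target), p(non-target)] distributions.
    p_s = F.softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    def to_binary(p):
        pt = p.gather(1, targets.unsqueeze(1))
        return torch.cat([pt, 1.0 - pt], dim=1)
    tckd = F.kl_div(to_binary(p_s).clamp_min(1e-7).log(), to_binary(p_t),
                    reduction="batchmean")

    # NCKD: KL over non-target classes; a large negative logit on the true
    # class removes it from the softmax.
    log_s = F.log_softmax(student_logits / T - 1000.0 * gt_mask, dim=1)
    q_t = F.softmax(teacher_logits / T - 1000.0 * gt_mask, dim=1)
    nckd = F.kl_div(log_s, q_t, reduction="batchmean")

    return (alpha * tckd + beta * nckd) * T ** 2
```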
4. Shi Y, Tang S, Li Y, He Z, Tang S, Wang R, Zheng W, Chen Z, Zhou Y. Continual learning for seizure prediction via memory projection strategy. Comput Biol Med 2024; 181:109028. PMID: 39173485. DOI: 10.1016/j.compbiomed.2024.109028.
Abstract
Despite the many machine learning algorithms for epilepsy prediction, most models are tailored to offline scenarios and cannot handle real-world settings where data change over time. Catastrophic forgetting (CF) of learned electroencephalogram (EEG) data occurs when EEG changes dynamically in the clinical setting. This paper implements a continual learning (CL) strategy, Memory Projection (MP), for epilepsy prediction, which can be combined with other algorithms to avoid CF. The strategy enables the model to learn EEG data from each patient in dynamic, weakly correlated subspaces, layer by layer, to minimize interference and promote knowledge transfer. A regularization loss reconstruction algorithm and a matrix dimensionality reduction algorithm form the core of MP. Experimental results show that MP exhibits excellent performance and low forgetting rates in sequential learning for seizure prediction. The forgetting rates for accuracy and sensitivity are below 5% across multiple experiments. When learning from multi-center datasets, the forgetting rates for accuracy and sensitivity decrease to 0.65% and 1.86%, respectively, making MP comparable to state-of-the-art CL strategies. Ablation experiments show that MP operates with minimal storage and computational cost, demonstrating practical potential for seizure prediction in clinical scenarios.
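The abstract describes learning in weakly correlated subspaces only at a high level; the sketch below illustrates the generic gradient-projection idea such methods build on (in the spirit of Gradient Projection Memory): keep an orthonormal basis of directions important to earlier patients and remove the component of new gradients that falls in that subspace. The basis-growing rule and every name here are assumptions, not the paper's algorithm.

```python
import torch

def update_basis(layer_inputs, old_basis=None, energy=0.99):
    """Grow the protected subspace with the leading left singular vectors
    of this task's layer inputs (rows = samples); the energy threshold is
    a hypothetical rule."""
    if old_basis is not None:
        # Keep columns orthonormal: remove the part of the inputs already
        # covered by the stored subspace before the SVD.
        layer_inputs = layer_inputs - (layer_inputs @ old_basis) @ old_basis.T
    U, S, _ = torch.linalg.svd(layer_inputs.T, full_matrices=False)
    k = max(1, int((S.cumsum(0) / S.sum() <= energy).sum()))
    new = U[:, :k]
    return new if old_basis is None else torch.cat([old_basis, new], dim=1)

def project_gradient(grad, basis):
    """Remove the component of a (out, in) weight gradient lying in the
    subspace spanned by the orthonormal columns of `basis` (in, k)."""
    if basis is None:
        return grad
    return grad - (grad @ basis) @ basis.T

# Typical use after loss.backward(), per layer:
# layer.weight.grad = project_gradient(layer.weight.grad, basis_for_layer)
```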
Affiliation(s)
- Yufei Shi: Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
- Shishi Tang: Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
- Yuxuan Li: Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
- Zhipeng He: Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
- Shengsheng Tang: Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
- Ruixuan Wang: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Weishi Zheng: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Ziyi Chen: The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
- Yi Zhou: Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
5. Yu H, Cong Y, Sun G, Hou D, Liu Y, Dong J. Open-Ended Online Learning for Autonomous Visual Perception. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10178-10198. PMID: 37027689. DOI: 10.1109/tnnls.2023.3242448.
Abstract
Visual perception systems aim to autonomously collect consecutive visual data and perceive the relevant information online, like human beings. In contrast to classical static visual systems focused on fixed tasks (e.g., face recognition for visual surveillance), real-world visual systems (e.g., robot vision systems) often need to handle unpredicted tasks and dynamically changing environments, and thus need to imitate human-like intelligence with open-ended online learning ability. This survey therefore provides a comprehensive analysis of open-ended online learning problems for autonomous visual perception. Based on "what to learn online" in visual perception scenarios, we classify open-ended online learning methods into five categories: instance incremental learning to handle changing data attributes; feature evolution learning for incremental and decremental features whose dimensionality changes dynamically; class incremental learning and task incremental learning for adding new classes/tasks online; and parallel and distributed learning for large-scale data, which reveals computational and storage advantages. We discuss the characteristics of each category and introduce several representative works. Finally, we present representative visual perception applications that show the enhanced performance achieved with various open-ended online learning models, followed by a discussion of several future directions.
6. Ohki T, Kunii N, Chao ZC. Efficient, continual, and generalized learning in the brain - neural mechanism of Mental Schema 2.0. Rev Neurosci 2023; 34:839-868. PMID: 36960579. DOI: 10.1515/revneuro-2022-0137.
Abstract
There has been tremendous progress in artificial neural networks (ANNs) over the past decade; however, the gap between ANNs and the biological brain as a learning device remains large. With the goal of closing this gap, this paper reviews learning mechanisms in the brain by focusing on three important issues in ANN research: efficiency, continuity, and generalization. We first discuss how the brain uses a variety of self-organizing mechanisms to maximize learning efficiency, focusing on the role of the brain's spontaneous activity in shaping synaptic connections to facilitate spatiotemporal learning and numerical processing. We then examine the neuronal mechanisms that enable lifelong continual learning, focusing on memory replay during sleep and its implementation in brain-inspired ANNs. Finally, we explore how the brain generalizes learned knowledge to new situations, particularly from the mathematical perspective of topology. Beyond a systematic comparison of learning mechanisms between the brain and ANNs, we propose "Mental Schema 2.0," a new computational property underlying the brain's unique learning ability that can be implemented in ANNs.
Affiliation(s)
- Takefumi Ohki: International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
- Naoto Kunii: Department of Neurosurgery, The University of Tokyo, Tokyo 113-0033, Japan
- Zenas C Chao: International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
7. Jeon I, Kim T. Distinctive properties of biological neural networks and recent advances in bottom-up approaches toward a better biologically plausible neural network. Front Comput Neurosci 2023; 17:1092185. PMID: 37449083. PMCID: PMC10336230. DOI: 10.3389/fncom.2023.1092185.
Abstract
Although it may appear infeasible and impractical, building artificial intelligence (AI) using a bottom-up approach based on an understanding of neuroscience is straightforward. The lack of a generalized governing principle for biological neural networks (BNNs) forces us to address this problem by converting piecemeal information on the diverse features of neurons, synapses, and neural circuits into AI. In this review, we describe recent attempts to build a biologically plausible neural network by following neuroscientifically similar strategies of neural network optimization or by implanting the outcomes of such optimization, such as the properties of single computational units and the characteristics of the network architecture. In addition, we propose a formalism for the relationship between the set of objectives that neural networks attempt to achieve and neural network classes categorized by how closely their architectural features resemble those of BNNs. This formalism is expected to define the potential roles of top-down and bottom-up approaches in building a biologically plausible neural network and to offer a map that helps navigate the gap between neuroscience and AI engineering.
Affiliation(s)
- Taegon Kim: Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea
8. Fukai T. Computational models of idling brain activity for memory processing. Neurosci Res 2022; 189:75-82. PMID: 36592825. DOI: 10.1016/j.neures.2022.12.024.
Abstract
Studying the neural mechanisms underlying the brain's cognitive functions is one of the central questions in modern biology; it has also significantly impacted the development of novel technologies in artificial intelligence. Spontaneous activity is a unique feature of the brain and is currently lacking in many artificially constructed intelligent machines. Spontaneous activity may represent the brain's idling states, which are internally driven by neuronal networks and possibly participate in offline processing during wakefulness, sleep, and rest. Evidence is accumulating that the brain's spontaneous activity is not mere noise but part of the machinery for processing information about previous experiences. A large body of literature has shown, with various methods and in various animals, how previous sensory and behavioral experiences influence subsequent patterns of brain activity. It seems, however, that the patterns of neural activity and their computational roles differ significantly from area to area and from function to function. In this article, I review the various forms of the brain's spontaneous activity, especially those observed during memory processing, and some attempts to model the generation mechanisms and computational roles of such activity.
Affiliation(s)
- Tomoki Fukai: Okinawa Institute of Science and Technology, Tancha 1919-1, Onna-son, Okinawa 904-0495, Japan
9. Irfan B, Hellou M, Belpaeme T. Coffee With a Hint of Data: Towards Using Data-Driven Approaches in Personalised Long-Term Interactions. Front Robot AI 2021; 8:676814. PMID: 34651017. PMCID: PMC8505524. DOI: 10.3389/frobt.2021.676814.
Abstract
While earlier research in human-robot interaction predominantly used rule-based architectures for natural language interaction, these approaches are not flexible enough for long-term interactions in the real world, owing to the large variation in user utterances. In contrast, data-driven approaches map user input directly to agent output and thus handle this variation without requiring a set of rules. However, data-driven approaches are generally applied to single dialogue exchanges with a user and do not build up a memory over long-term conversations with different users. Long-term interaction requires remembering users and their preferences incrementally and continuously, and recalling previous interactions to adapt and personalise the interaction; this is known as the lifelong learning problem. In addition, it is desirable to learn user preferences from a few samples of interaction (i.e., few-shot learning). These problems are known to be challenging in machine learning, while they are trivial for rule-based approaches, creating a trade-off between flexibility and robustness. Correspondingly, in this work, we present the text-based Barista Datasets, generated to evaluate the potential of data-driven approaches in generic and personalised long-term human-robot interaction with simulated real-world problems, such as recognition errors, incorrect recalls, and changes in user preferences. Based on these datasets, we explore the performance and the underlying inaccuracies of state-of-the-art data-driven dialogue models that are strong baselines in other domains of personalisation in single interactions, namely Supervised Embeddings, Sequence-to-Sequence, End-to-End Memory Network, Key-Value Memory Network, and Generative Profile Memory Network. The experiments show that while data-driven approaches are suitable for generic task-oriented dialogue and real-time interaction, no model performs well enough to be deployed for personalised long-term interaction in the real world, because of their inability to learn and use new identities and their poor performance in recalling user-related data.
Affiliation(s)
- Bahar Irfan: Centre for Robotics and Neural Systems, University of Plymouth, Plymouth, United Kingdom
- Tony Belpaeme: Centre for Robotics and Neural Systems, University of Plymouth, Plymouth, United Kingdom; IDLab-imec, Ghent University, Ghent, Belgium