1
CeCR: Cross-entropy contrastive replay for online class-incremental continual learning. Neural Netw 2024; 173:106163. [PMID: 38430638] [DOI: 10.1016/j.neunet.2024.106163]
Abstract
For learning continually from an online data stream, replay-based methods have shown superior potential. Their main challenge is selecting the representative samples that are stored in the buffer and replayed. In this paper, we propose the Cross-entropy Contrastive Replay (CeCR) method for the online class-incremental setting. First, we present the Class-focused Memory Retrieval method, which performs class-level sampling without replacement. Second, we put forward the class-mean approximation memory update method, which selectively replaces mistakenly classified training samples with samples from the current input batch. In addition, the Cross-entropy Contrastive Loss is proposed so that model training acquires more solid knowledge and learns effectively. Experiments show that the CeCR method achieves comparable or improved performance on two benchmark datasets in comparison with state-of-the-art methods.
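A rough illustration of the two buffer mechanics described above, not the authors' implementation: class-level sampling without replacement and replacement of misclassified stored samples. The buffer layout and the `predict` interface are assumptions made for the sketch.

```python
import numpy as np

def class_focused_retrieval(buffer_x, buffer_y, n_per_class, rng):
    """Sample up to n_per_class exemplars per stored class, without replacement."""
    idx = []
    for c in np.unique(buffer_y):
        members = np.flatnonzero(buffer_y == c)
        take = min(n_per_class, members.size)
        idx.extend(rng.choice(members, size=take, replace=False))
    return buffer_x[idx], buffer_y[idx]

def memory_update(buffer_x, buffer_y, batch_x, batch_y, predict):
    """Replace buffer entries the current model misclassifies with new batch samples."""
    wrong = np.flatnonzero(predict(buffer_x) != buffer_y)
    n = min(wrong.size, len(batch_x))
    buffer_x[wrong[:n]] = batch_x[:n]
    buffer_y[wrong[:n]] = batch_y[:n]
    return buffer_x, buffer_y
```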
2
A survey on few-shot class-incremental learning. Neural Netw 2024; 169:307-324. [PMID: 37922714] [DOI: 10.1016/j.neunet.2023.10.039]
Abstract
Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks: learning new tasks from just a few labeled samples without forgetting the previously learned ones. This setup can easily lead to catastrophic forgetting and overfitting, severely affecting model performance. Studying FSCIL helps overcome deep learning models' limitations on data volume and acquisition time, while improving the practicality and adaptability of machine learning models. This paper provides a comprehensive survey of FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental learning, introducing FSCIL from both perspectives, and review over 30 theoretical research studies and more than 20 applied research studies. From the theoretical perspective, we provide a novel categorization that divides the field into five subcategories: traditional machine learning methods, meta learning-based methods, feature and feature space-based methods, replay-based methods, and dynamic network structure-based methods. We also evaluate the performance of recent theoretical research on FSCIL benchmark datasets. From the application perspective, FSCIL has seen impressive progress in various fields of computer vision such as image classification, object detection, and image segmentation, as well as in natural language processing and graph learning. We summarize the important applications. Finally, we point out potential future research directions, including applications, problem setups, and theory development. Overall, this paper offers a comprehensive analysis of the latest advances in FSCIL from the methodological, performance, and application perspectives.
3
Subspace distillation for continual learning. Neural Netw 2023; 167:65-79. [PMID: 37625243] [DOI: 10.1016/j.neunet.2023.07.047]
Abstract
An ultimate objective in continual learning is to preserve knowledge learned in preceding tasks while learning new tasks. To mitigate forgetting of prior knowledge, we propose a novel knowledge distillation technique that takes into account the manifold structure of the latent/output space of a neural network when learning novel tasks. To achieve this, we propose to approximate the data manifold up to first order, hence benefiting from linear subspaces to model the structure and maintain the knowledge of a neural network while learning novel concepts. We demonstrate that modeling with subspaces provides several intriguing properties, including robustness to noise, and is therefore effective for mitigating catastrophic forgetting in continual learning. We also discuss and show how our proposed method can be adopted to address both classification and segmentation problems. Empirically, we observe that our proposed method outperforms various continual learning methods on several challenging datasets, including Pascal VOC and Tiny-ImageNet. Furthermore, we show how the proposed method can be seamlessly combined with existing learning approaches to improve their performance. The code for this article will be available at https://github.com/csiro-robotics/SDCL.
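A minimal sketch, under assumptions, of distilling linear subspaces of intermediate features: each model's batch of features is summarized by its top-k principal directions, and the loss penalizes the distance between the induced projection operators. This illustrates the general idea only; the rank `k` and the projector-difference penalty are choices made for the example, not the authors' implementation.

```python
import torch

def subspace(feats, k):
    """Top-k orthonormal basis of a (batch x dim) feature matrix."""
    f = feats - feats.mean(dim=0, keepdim=True)
    _, _, Vh = torch.linalg.svd(f, full_matrices=False)
    return Vh[:k]                                  # (k, dim)

def subspace_distill_loss(old_feats, new_feats, k=8):
    B_old = subspace(old_feats.detach(), k)        # frozen teacher subspace
    B_new = subspace(new_feats, k)
    # Compare projection operators; invariant to basis rotations within a subspace.
    proj_old = B_old.T @ B_old
    proj_new = B_new.T @ B_new
    return (proj_old - proj_new).pow(2).sum()
```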
4
Class incremental learning of remote sensing images based on class similarity distillation. PeerJ Comput Sci 2023; 9:e1583. [PMID: 37810339] [PMCID: PMC10557500] [DOI: 10.7717/peerj-cs.1583]
Abstract
When a well-trained model learns a new class, the data distribution differences between the new and old classes inevitably cause catastrophic forgetting as the model adapts to perform better on the new class. This behavior differs from human learning. In this article, we propose a class incremental object detection method for remote sensing images to address the catastrophic forgetting caused by distribution differences among classes. First, we introduce a class similarity distillation (CSD) loss based on the similarity between new and old class prototypes, ensuring the model's plasticity to learn new classes and its stability in detecting old classes. Second, to better extract class similarity features, we propose a global similarity distillation (GSD) loss that maximizes the mutual information between new class features and old class features. Additionally, we present a region proposal network (RPN)-based method that assigns positive and negative labels to prevent mislearning issues. Experiments demonstrate that our method is more accurate for class incremental learning on the public DOTA and DIOR datasets and significantly improves training efficiency compared to state-of-the-art class incremental object detection methods.
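One hedged reading of a prototype-similarity distillation term, shown only for intuition: the frozen old model's features define similarity scores between each sample and the stored old-class prototypes, and the new model is encouraged to reproduce that similarity structure. The cosine similarity, temperature, and KL matching below are illustrative assumptions; the exact form in the paper may differ.

```python
import torch
import torch.nn.functional as F

def csd_like_loss(new_feats, old_feats, old_prototypes, tau=2.0):
    """new_feats / old_feats: (B, D) features of the same images from the new and
    frozen old models; old_prototypes: (C_old, D) stored class means."""
    sim_new = F.cosine_similarity(new_feats.unsqueeze(1), old_prototypes.unsqueeze(0), dim=-1)
    sim_old = F.cosine_similarity(old_feats.unsqueeze(1), old_prototypes.unsqueeze(0), dim=-1)
    # Match soft similarity distributions over old classes with KL divergence.
    return F.kl_div(F.log_softmax(sim_new / tau, dim=1),
                    F.softmax(sim_old.detach() / tau, dim=1),
                    reduction="batchmean")
```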
5
Dual memory model for experience-once task-incremental lifelong learning. Neural Netw 2023; 166:174-187. [PMID: 37494763] [DOI: 10.1016/j.neunet.2023.07.009]
Abstract
Experience replay (ER) is a widely adopted, neuroscience-inspired method for lifelong learning. Nonetheless, existing ER-based approaches use very coarse memory modules with simple memory and rehearsal mechanisms that cannot fully exploit the potential of memory replay. Evidence from neuroscience points to fine-grained memory and rehearsal mechanisms, such as the dual-store memory system formed by PFC-HC circuits. However, the computational abstraction of these processes remains very challenging. To address these problems, we introduce the Dual-Memory (Dual-MEM) model, which emulates the memorization, consolidation, and rehearsal processes in the PFC-HC dual-store memory circuit. Dual-MEM maintains an incrementally updated short-term memory to benefit current-task learning. At the end of the current task, short-term memories are consolidated into long-term ones for future rehearsal to alleviate forgetting. For Dual-MEM optimization, we propose two learning policies that emulate different memory retrieval strategies: Direct Retrieval Learning and Mixup Retrieval Learning. Extensive evaluations on eight benchmarks demonstrate that Dual-MEM delivers compelling performance while maintaining high learning and memory utilization efficiency under the challenging experience-once setting.
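As a sketch of what a mixup-style retrieval step can look like (an illustration of the general idea, not the paper's Mixup Retrieval Learning): samples drawn from long-term memory are linearly mixed with the current batch before rehearsal. The Beta(alpha, alpha) mixing and the variable names are assumptions for the example.

```python
import torch

def mixup_retrieval(cur_x, cur_y_onehot, mem_x, mem_y_onehot, alpha=0.4):
    """Interpolate current-task samples with retrieved memory samples."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    n = min(len(cur_x), len(mem_x))
    mixed_x = lam * cur_x[:n] + (1 - lam) * mem_x[:n]
    mixed_y = lam * cur_y_onehot[:n] + (1 - lam) * mem_y_onehot[:n]
    return mixed_x, mixed_y   # train on (mixed_x, mixed_y) with a soft-label loss
```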
6
Mitigate forgetting in few-shot class-incremental learning using different image views. Neural Netw 2023; 165:999-1009. [PMID: 37467587] [DOI: 10.1016/j.neunet.2023.06.043]
Abstract
In the few-shot class incremental learning (FSCIL) setting, new classes with few training examples become available incrementally, and deep learning models suffer from catastrophic forgetting of the previous classes when trained on new classes. Data augmentation techniques are generally used to increase the training data and improve the model performance. In this work, we demonstrate that differently augmented views of the same image obtained by applying data augmentations may not necessarily activate the same set of neurons in the model. Therefore, the information gained by a model regarding a class, when trained using data augmentation, may not necessarily be stored in the same set of neurons in the model. Consequently, during incremental training, even if some of the model weights that store the previously seen class information for a particular view get overwritten, the information of the previous classes for the other views may still remain intact in the other model weights. Therefore, the impact of catastrophic forgetting on the model predictions is different for different data augmentations used during training. Based on this, we present an Augmentation-based Prediction Rectification (APR) approach to reduce the impact of catastrophic forgetting in the FSCIL setting. APR can also augment other FSCIL approaches and significantly improve their performance. We also propose a novel feature synthesis module (FSM) for synthesizing features relevant to the previously seen classes without requiring training data from these classes. FSM outperforms other generative approaches in this setting. We experimentally show that our approach outperforms other methods on benchmark datasets.
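A minimal illustration of rectifying a prediction by aggregating over several augmented views of the same test image, on the premise that forgetting does not hit every view equally. The augmentation list and simple probability averaging below are assumptions made for the sketch, not the authors' exact APR procedure.

```python
import torch

@torch.no_grad()
def view_averaged_predict(model, image, augmentations):
    """image: (C, H, W) tensor; augmentations: list of callables, each returning one view."""
    views = torch.stack([aug(image) for aug in augmentations])   # (V, C, H, W)
    probs = torch.softmax(model(views), dim=1)                   # (V, num_classes)
    return probs.mean(dim=0).argmax().item()                     # rectified class index
```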
7
Continual learning with invertible generative models. Neural Netw 2023; 164:606-616. [PMID: 37244212] [DOI: 10.1016/j.neunet.2023.05.020]
Abstract
Catastrophic forgetting (CF) happens whenever a neural network overwrites past knowledge while being trained on new tasks. Common techniques to handle CF include regularization of the weights (using, e.g., their importance on past tasks), and rehearsal strategies, where the network is constantly re-trained on past data. Generative models have also been applied for the latter, in order to have endless sources of data. In this paper, we propose a novel method that combines the strengths of regularization and generative-based rehearsal approaches. Our generative model consists of a normalizing flow (NF), a probabilistic and invertible neural network, trained on the internal embeddings of the network. By keeping a single NF throughout the training process, we show that our memory overhead remains constant. In addition, exploiting the invertibility of the NF, we propose a simple approach to regularize the network's embeddings with respect to past tasks. We show that our method performs favorably with respect to state-of-the-art approaches in the literature, with bounded computational power and memory overheads.
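A compact, illustrative sketch of generative rehearsal with a normalizing flow over feature embeddings: an affine coupling layer maps embeddings to a Gaussian latent space, and its exact inverse turns Gaussian samples back into pseudo-embeddings for replay. A practical flow would stack several such layers; the sizes and training details here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x):                      # embedding -> latent, with log|det J|
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales bounded for stability
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)

    def inverse(self, z):                      # latent sample -> pseudo-embedding
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, x2], dim=1)

# Training maximizes the Gaussian log-likelihood of real embeddings:
#   z, logdet = flow(embeddings); loss = 0.5 * (z ** 2).sum(1).mean() - logdet.mean()
# Rehearsal then draws z ~ N(0, I) and replays flow.inverse(z) through the classifier head.
```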
8
A domain-agnostic approach for characterization of lifelong learning systems. Neural Netw 2023; 160:274-296. [PMID: 36709531] [DOI: 10.1016/j.neunet.2023.01.007]
Abstract
Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of (1) Continuous Learning, (2) Transfer and Adaptation, and (3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.
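For intuition only, two commonly reported lifelong-learning quantities, backward and forward transfer, computed from an accuracy matrix R where R[i, j] is accuracy on task j after training through task i. These standard formulas illustrate the kind of measurement such an evaluation framework aggregates; they are not the specific metric suite proposed in the paper.

```python
import numpy as np

def backward_transfer(R):
    """Average change on earlier tasks after the final task; negative values indicate forgetting."""
    T = R.shape[0]
    return np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)])

def forward_transfer(R, scratch_baseline):
    """Average zero-shot gain on task j before training it, relative to a from-scratch baseline."""
    T = R.shape[0]
    return np.mean([R[j - 1, j] - scratch_baseline[j] for j in range(1, T)])
```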
9
IDT: An incremental deep tree framework for biological image classification. Artif Intell Med 2022; 134:102392. [PMID: 36462909] [DOI: 10.1016/j.artmed.2022.102392]
Abstract
Nowadays, breast and cervical cancers are respectively the first and fourth most common causes of cancer death in females. It is believed that automated systems based on artificial intelligence would allow early diagnosis, which significantly increases the chances of proper treatment and survival. Although Convolutional Neural Networks (CNNs) have achieved human-level performance in object classification tasks, the steady growth in the amount of medical data and the continuous increase in the number of classes make it difficult for them to learn new tasks without being re-trained from scratch. Moreover, fine-tuning and transfer learning in deep models are techniques that lead to the well-known catastrophic forgetting problem. In this paper, an Incremental Deep Tree (IDT) framework for biological image classification is proposed to address catastrophic forgetting in CNNs, allowing them to learn new classes while maintaining acceptable accuracy on the previously learnt ones. To evaluate the performance of our approach, the IDT framework is compared against three popular incremental methods, namely iCaRL, LwF and SupportNet. The experimental results achieved 87% accuracy on the MNIST dataset, and the values obtained on the BreakHis, LBC and SIPaKMeD datasets are promising, at 92%, 98% and 93%, respectively.
10
TaskDrop: A competitive baseline for continual learning of sentiment classification. Neural Netw 2022; 155:551-560. [PMID: 36191451] [DOI: 10.1016/j.neunet.2022.08.033]
Abstract
In this paper, we study the multi-task sentiment classification problem in the continual learning setting, i.e., a model is sequentially trained to classify the sentiment of reviews of products in a particular category. The use of common sentiment words in reviews of different product categories leads to large cross-task similarity, which differentiates this problem from continual learning in other domains. This knowledge-sharing nature renders approaches focused on forgetting reduction less effective for the problem under consideration. Unlike existing approaches, where task-specific masks are learned with specifically presumed training objectives, we propose an approach called Task-aware Dropout (TaskDrop) that randomly samples a binary mask for each task. While standard dropout generates and applies random masks to each training instance per epoch for regularization, the random masks in TaskDrop are used to allocate and reuse model capacity for each incoming task. We conducted experimental studies on Amazon review data and compared with various baselines and state-of-the-art approaches. Our empirical results show that, despite its simplicity, TaskDrop achieved overall competitive performance, especially after relatively long-term learning. This demonstrates that the proposed random capacity allocation mechanism works well for continual sentiment classification.
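A short sketch of the per-task random mask idea in the spirit of TaskDrop: each task samples one fixed binary mask over a hidden layer's units and reuses it for every batch of that task, unlike dropout, which resamples per instance. The layer size, keep probability, and seeding scheme are assumptions for the example.

```python
import torch

def make_task_mask(num_units, keep_prob=0.5, seed=0):
    """One fixed binary mask per task (e.g., seed = task id)."""
    g = torch.Generator().manual_seed(seed)
    return (torch.rand(num_units, generator=g) < keep_prob).float()

def masked_hidden(hidden, task_mask):
    """Apply the task's fixed mask to hidden activations: (B, H) * (H,)."""
    return hidden * task_mask
```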
11
Continuous learning of spiking networks trained with local rules. Neural Netw 2022; 155:512-522. [PMID: 36166978] [DOI: 10.1016/j.neunet.2022.09.003]
Abstract
Artificial neural networks (ANNs) experience catastrophic forgetting (CF) during sequential learning. In contrast, the brain can learn continuously without any signs of catastrophic forgetting. Spiking neural networks (SNNs) are the next generation of ANNs, with many features borrowed from biological neural networks. Thus, SNNs potentially promise better resilience to CF. In this paper, we study the susceptibility of SNNs to CF and test several biologically inspired methods for mitigating catastrophic forgetting. The SNNs are trained with biologically plausible local training rules based on spike-timing-dependent plasticity (STDP). Local training prohibits the direct use of CF prevention methods based on gradients of a global loss function. We developed and tested a method to determine the importance of synapses (weights) based on stochastic Langevin dynamics without the need for gradients. Several other catastrophic forgetting prevention methods adapted from analog neural networks were tested as well. The experiments were performed on freely available datasets in the SpykeTorch environment.
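A gradient-free sketch of estimating per-weight importance by perturbation: weights are jittered with small Gaussian noise (loosely, Langevin-style exploration) and drops in a task score are attributed to the perturbed weights. This only illustrates the idea of computing importance without gradients; the paper's actual estimator may differ substantially, and the attribution rule below is an assumption.

```python
import numpy as np

def perturbation_importance(weights, score_fn, sigma=0.01, n_samples=20, rng=None):
    """weights: 1-D array; score_fn(w) returns task performance for weight vector w."""
    rng = rng or np.random.default_rng(0)
    base = score_fn(weights)
    importance = np.zeros_like(weights)
    for _ in range(n_samples):
        noise = rng.normal(0.0, sigma, size=weights.shape)
        drop = base - score_fn(weights + noise)
        # Weights whose perturbations coincide with larger score drops accumulate importance.
        importance += drop * noise ** 2
    return importance / n_samples
```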
12
Adaptive windowing based recurrent neural network for drift adaption in non-stationary environment. J Ambient Intell Humaniz Comput 2022; 14:1-15. [PMID: 35789602] [PMCID: PMC9243804] [DOI: 10.1007/s12652-022-04116-0]
Abstract
In today's digital era, many applications generate massive data streams that must be sequenced and processed immediately. Therefore, storing large amounts of data for later analysis is impractical. This effectively infinite amount of evolving data gives rise to concept drift in data stream classification. Concept drift is a phenomenon in which the distribution of input data, or the relationship between input data and the target label, changes over time. If drifts are not addressed, the learning model's performance suffers. Non-stationary data streams must be processed as they arrive, and the built-in capabilities of neural networks aid in processing huge non-stationary data streams. We propose an adaptive windowing approach based on a gated recurrent unit, a variant of the recurrent neural network, incrementally trained on incoming data (for the real-world airline dataset and the synthetic Streaming Ensemble Algorithm (SEA) dataset), and employ elastic weight consolidation with the Fisher information matrix to prevent forgetting. Unlike the traditional fixed-window methodology, the proposed model dynamically increases the window size if the prediction is correct and reduces it if drift occurs. As a result, the adaptive recurrent neural network model can adapt to changes in the non-stationary data stream and provide consistent performance. Moreover, the findings revealed that the proposed model outperforms state-of-the-art methods, achieving 67.74% and 91.70% accuracy on the airline and SEA datasets, respectively. Further, the results demonstrated accuracy improvements of 3.6% and 1.6% on the SEA and airline datasets, respectively.
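A toy sketch of the adaptive-window rule described above: the training window over the stream grows while predictions stay correct and shrinks when drift (or an error) appears. The thresholds, growth and shrink factors, and the drift test are illustrative assumptions, not the paper's settings.

```python
def adapt_window(window_size, prediction_correct, drift_detected,
                 min_size=32, max_size=1024, grow=1.1, shrink=0.5):
    """Return the new window size after observing one stream instance."""
    if drift_detected or not prediction_correct:
        window_size = max(min_size, int(window_size * shrink))
    else:
        window_size = min(max_size, int(window_size * grow))
    return window_size
```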
13
LwF-ECG: Learning-without-forgetting approach for electrocardiogram heartbeat classification based on memory with task selector. Comput Biol Med 2021; 137:104807. [PMID: 34496312] [DOI: 10.1016/j.compbiomed.2021.104807]
Abstract
Most existing Electrocardiogram (ECG) classification methods assume that all arrhythmia classes are known during the training phase. In this paper, the problem of learning several successive tasks is addressed, where, in each new task, there are new arrhythmia classes to learn. Unfortunately, it is known that when a model is retrained on a new task, it tends to forget the old task; in machine learning this is known as the catastrophic forgetting phenomenon. To this end, a learning-without-forgetting (LwF) approach to solve this problem is proposed. This novel deep LwF method for ECG heartbeat classification is the first work of its kind in the field. The proposed LwF approach consists of a deep learning architecture that includes the following important components: a feature extraction module, classification layers for each learned task, a memory module that stores one prototype for each task, and a task selection module able to identify the most suitable task for each input sample. The feature extraction module constitutes another contribution of this work. It starts with a set of deep layers that convert an ECG heartbeat signal into an image; then the pre-trained DenseNet169 CNN takes the obtained image and extracts rich and powerful features that are effective inputs for the classification layers of the model. Whenever a new task is to be learned, the network expands with a new classification layer having a softmax activation function. The newly added layer is responsible for learning the classes of the new task. When the network is trained for the new task, the shared layers, as well as the output layers of the old tasks, are also fine-tuned using pseudo labels. This helps in retaining knowledge of old tasks. Finally, the task selector stores feature prototypes for each task and, using a distance-matching network, is trained to select the task most suitable for classifying a new test sample. The whole network uses end-to-end learning to optimize a single loss function, which is a weighted combination of the loss functions of the different network modules. The proposed model was tested on three common ECG datasets, namely the MIT-BIH, INCART, and SVDB datasets. The results obtained demonstrate the success of the proposed method in learning, without forgetting, successive ECG heartbeat classification tasks.
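An illustration of a prototype-based task selector of the general kind described above: each learned task stores one feature prototype, and a test beat is routed to the output head whose prototype is nearest in feature space. The Euclidean distance and the interfaces are assumptions made for the sketch; the paper uses a trained distance-matching network rather than a fixed metric.

```python
import numpy as np

def select_task(feature, task_prototypes):
    """feature: (D,) embedding of the input beat; task_prototypes: (T, D) stored prototypes."""
    dists = np.linalg.norm(task_prototypes - feature, axis=1)
    return int(np.argmin(dists))     # index of the classification head used for this sample

# Usage sketch: t = select_task(extract(x), prototypes); y = heads[t](extract(x))
```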
14
A comprehensive study of class incremental learning algorithms for visual tasks. Neural Netw 2020; 135:38-54. [PMID: 33341513] [DOI: 10.1016/j.neunet.2020.12.003]
Abstract
The ability of artificial agents to increment their capabilities when confronted with new data is an open challenge in artificial intelligence. The main challenge faced in such cases is catastrophic forgetting, i.e., the tendency of neural networks to underfit past data when new data are ingested. A first group of approaches tackles forgetting by increasing deep model capacity to accommodate new knowledge. A second group fixes the deep model size and introduces a mechanism to ensure a good compromise between stability and plasticity of the model. While the first type of algorithm has been compared thoroughly, this is not the case for methods that exploit a fixed-size model. Here, we focus on the latter, place them in a common conceptual and experimental framework, and propose the following contributions: (1) define six desirable properties of incremental learning algorithms and analyze existing algorithms according to these properties, (2) introduce a unified formalization of the class-incremental learning problem, (3) propose a common evaluation framework which is more thorough than existing ones in terms of number of datasets, size of datasets, size of bounded memory and number of incremental states, (4) investigate the usefulness of herding for past exemplar selection, (5) provide experimental evidence that it is possible to obtain competitive performance without the use of knowledge distillation to tackle catastrophic forgetting and (6) facilitate reproducibility by integrating all tested methods in a common open-source repository. The main experimental finding is that none of the existing algorithms achieves the best results in all evaluated settings. Important differences arise notably depending on whether a bounded memory of past classes is allowed or not.
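To make the "herding for past exemplar selection" step concrete, here is a herding-style selection routine as popularized by iCaRL: exemplars are picked greedily so that their running mean tracks the class mean. This is a generic sketch of the technique the study evaluates, not the study's exact implementation.

```python
import numpy as np

def herding_select(features, m):
    """features: (N, D) L2-normalized features of one class; returns indices of m exemplars."""
    class_mean = features.mean(axis=0)
    selected, running_sum = [], np.zeros_like(class_mean)
    for k in range(1, m + 1):
        # Choose the sample that brings the mean of selected exemplars closest to the class mean.
        gaps = np.linalg.norm(class_mean - (running_sum + features) / k, axis=1)
        gaps[selected] = np.inf              # sample without replacement
        best = int(np.argmin(gaps))
        selected.append(best)
        running_sum += features[best]
    return selected
```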
15
Encoding primitives generation policy learning for robotic arm to overcome catastrophic forgetting in sequential multi-tasks learning. Neural Netw 2020; 129:163-173. [PMID: 32535306] [DOI: 10.1016/j.neunet.2020.06.003]
Abstract
Continual learning, a widespread ability in people and animals, aims to learn and acquire new knowledge and skills continuously. Catastrophic forgetting usually occurs in continual learning when an agent attempts to learn different tasks sequentially without storing or accessing previous task information. Unfortunately, current learning systems, e.g., neural networks, are prone to deviating from the weights learned in previous tasks after training on new tasks, leading to catastrophic forgetting, especially in sequential multi-task scenarios. To address this problem, in this paper, we propose to overcome catastrophic forgetting with a focus on learning a series of robotic tasks sequentially. In particular, a novel hierarchical neural network framework called Encoding Primitives Generation Policy Learning (E-PGPL) is developed to enable continual learning with two components. By employing a variational autoencoder to project the original state space into a meaningful low-dimensional feature space, representative state primitives can be sampled to help learn corresponding policies for different tasks. When learning a new task, the feature space is required to stay close to the previous ones so that previously learned tasks are protected. Extensive experiments on several simulated robotic tasks demonstrate our method's efficacy in learning control policies for sequentially arriving tasks, delivering substantial improvements over other continual learning methods, especially for more diverse tasks.
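A hedged sketch of one way to keep the new task's latent space close to the old one: stored state primitives are re-encoded by both the current and the frozen previous encoder, and their latent mismatch is penalized alongside the new task's objective. The specific penalty, its weighting, and the interfaces are illustrative assumptions, not the E-PGPL implementation.

```python
import torch

def latent_consistency_penalty(encoder, old_encoder, state_primitives):
    """state_primitives: (K, state_dim) representative states kept from earlier tasks."""
    with torch.no_grad():
        z_old = old_encoder(state_primitives)       # frozen reference latents
    z_new = encoder(state_primitives)
    return (z_new - z_old).pow(2).mean()

# Usage sketch: total_loss = new_task_policy_loss + lam * latent_consistency_penalty(enc, old_enc, primitives)
```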
16
A neural model of schemas and memory encoding. Biol Cybern 2020; 114:169-186. [PMID: 31686197] [DOI: 10.1007/s00422-019-00808-7]
Abstract
The ability to rapidly assimilate new information is essential for survival in a dynamic environment. This requires experiences to be encoded alongside the contextual schemas in which they occur. Tse et al. (Science 316(5821):76-82, 2007) showed that new information matching a preexisting schema is learned rapidly. To better understand the neurobiological mechanisms for creating and maintaining schemas, we constructed a biologically plausible neural network to learn context in a spatial memory task. Our model suggests that this occurs through two processing streams of indexing and representation, in which the medial prefrontal cortex and hippocampus work together to index cortical activity. Additionally, our study shows how neuromodulation contributes to rapid encoding within consistent schemas. The level of abstraction of our model further provides a basis for creating context-dependent memories while preventing catastrophic forgetting in artificial neural networks.
17
Uncertainty-based modulation for lifelong learning. Neural Netw 2019; 120:129-142. [PMID: 31708227] [DOI: 10.1016/j.neunet.2019.09.011]
Abstract
The creation of machine learning algorithms for intelligent agents capable of continuous, lifelong learning is a critical objective for algorithms being deployed on real-life systems in dynamic environments. Here we present an algorithm inspired by neuromodulatory mechanisms in the human brain that integrates and expands upon Stephen Grossberg's ground-breaking Adaptive Resonance Theory proposals. Specifically, it builds on the concept of uncertainty, and employs a series of "neuromodulatory" mechanisms to enable continuous learning, including self-supervised and one-shot learning. Algorithm components were evaluated in a series of benchmark experiments that demonstrate stable learning without catastrophic forgetting. We also demonstrate the critical role of developing these systems in a closed-loop manner where the environment and the agent's behaviors constrain and guide the learning process. To this end, we integrated the algorithm into an embodied simulated drone agent. The experiments show that the algorithm is capable of continuous learning of new tasks and under changed conditions with high classification accuracy (>94%) in a virtual environment, without catastrophic forgetting. The algorithm accepts high dimensional inputs from any state-of-the-art detection and feature extraction algorithms, making it a flexible addition to existing systems. We also describe future development efforts focused on imbuing the algorithm with mechanisms to seek out new knowledge as well as employ a broader range of neuromodulatory processes.