1. Bortolussi L, Carbone G, Laurenti L, Patane A, Sanguinetti G, Wicker M. On the Robustness of Bayesian Neural Networks to Adversarial Attacks. IEEE Transactions on Neural Networks and Learning Systems 2025;36:6679-6692. PMID: 38648123; DOI: 10.1109/tnnls.2024.3386642.
Abstract
Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, training deep learning models that are robust to adversarial attacks is still an open problem. In this article, we analyse the geometry of adversarial attacks in the over-parameterized limit for Bayesian neural networks (BNNs). We show that, in this limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lie on a lower-dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in this limit, BNN posteriors are robust to gradient-based adversarial attacks. Crucially, by relying on the convergence of infinitely wide BNNs to Gaussian processes (GPs), we prove that, under certain relatively mild assumptions, the expected gradient of the loss with respect to the BNN posterior distribution vanishes, even when each network sampled from the BNN posterior does not have vanishing gradients. Experimental results on MNIST, Fashion MNIST, and a synthetic dataset, with BNNs trained via Hamiltonian Monte Carlo and variational inference, support this line of argument, showing empirically that BNNs can display both high accuracy on clean data and robustness to gradient-based and gradient-free adversarial attacks.
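The paper's central quantity lends itself to a direct Monte Carlo check. Below is a minimal sketch, assuming a hypothetical helper sample_posterior_nets that yields networks drawn from an approximate BNN posterior (e.g., HMC or VI draws), of estimating the expected loss gradient with respect to the input by averaging over posterior samples; the claim above is that this average vanishes even when each per-sample gradient does not.

```python
# Hedged sketch (not the authors' code): Monte Carlo estimate of the
# expected input-gradient under a BNN posterior.
import torch
import torch.nn.functional as F

def expected_input_gradient(sample_posterior_nets, x, y, n_samples=100):
    """Estimate E_w[ d loss(f_w(x), y) / dx ] over posterior weight samples."""
    grad_sum = torch.zeros_like(x)
    for net in sample_posterior_nets(n_samples):   # hypothetical helper
        x_in = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(net(x_in), y)
        loss.backward()
        grad_sum += x_in.grad                      # gradient of one posterior draw
    return grad_sum / n_samples                    # near zero in the paper's limit
```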
2. Zhao Y, Saxena D, Cao J. AdaptCL: Adaptive Continual Learning for Tackling Heterogeneity in Sequential Datasets. IEEE Transactions on Neural Networks and Learning Systems 2025;36:2509-2522. PMID: 38113151; DOI: 10.1109/tnnls.2023.3341841.
Abstract
Managing heterogeneous datasets that vary in complexity, size, and similarity is a significant challenge in continual learning. Task-agnostic continual learning is necessary to address this challenge, as datasets with varying similarity make it difficult to distinguish task boundaries. Conventional task-agnostic continual learning practices typically rely on rehearsal or regularization techniques. However, rehearsal methods may struggle with varying dataset sizes and with regulating the importance of old and new data due to rigid buffer sizes, while regularization methods apply generic constraints to promote generalization and can hinder performance on dissimilar datasets lacking shared features, necessitating a more adaptive approach. In this article, we propose a novel adaptive continual learning (AdaptCL) method to tackle heterogeneity in sequential datasets. AdaptCL employs fine-grained data-driven pruning to adapt to variations in data complexity and dataset size, and utilizes task-agnostic parameter isolation to mitigate the varying degrees of catastrophic forgetting caused by differences in data similarity. Through a two-pronged case-study approach, we evaluate AdaptCL both on MNIST variants and DomainNet and on datasets from different domains, the latter including large-scale, diverse binary-class datasets and few-shot, multiclass datasets. Across all these scenarios, AdaptCL consistently exhibits robust performance, demonstrating its flexibility and general applicability in handling heterogeneous datasets.
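A sketch of the pruning-plus-isolation mechanics the abstract names, under loud assumptions: the magnitude criterion and mask bookkeeping below are illustrative stand-ins, not AdaptCL's exact procedure.

```python
# Hedged sketch: data-driven magnitude pruning combined with task-agnostic
# parameter isolation via binary masks. Thresholds are assumptions.
import torch

def prune_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude weights, zeroing out a `sparsity` fraction."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    kept = weight.abs().flatten().topk(k).values
    return (weight.abs() >= kept.min()).float()

def isolate_and_prune(weight, frozen_mask, sparsity):
    """Prune only weights not owned by earlier tasks, then freeze the survivors."""
    new_mask = prune_mask(weight * (1.0 - frozen_mask), sparsity)
    combined = (frozen_mask + new_mask).clamp(max=1.0)
    weight.data *= combined            # drop unused capacity for this task
    return combined                    # parameters now isolated across tasks
```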
3. Zhang J, Xiao J, Chen M, Hong X. Multimodal Continual Learning for Process Monitoring: A Novel Weighted Canonical Correlation Analysis With Attention Mechanism. IEEE Transactions on Neural Networks and Learning Systems 2025;36:1827-1841. PMID: 38109253; DOI: 10.1109/tnnls.2023.3331732.
Abstract
Aimed at sequentially arriving dynamic modes, a novel multimodal weighted canonical correlation analysis with an attention (MWCCA-A) mechanism is introduced to derive a single model for process monitoring, integrating the two continual-learning ideas of replay and regularization. Under the assumption that data are received sequentially, subsets of data from past modes with dynamic features are selected and stored as replay data, which are used together with the current mode's data for continual model parameter estimation. Weighted canonical correlation analysis (WCCA) is introduced to weight past modes' replay data appropriately, so that latent variables are extracted by maximizing the weighted correlation with their prediction via the attention mechanism. Specifically, replay-data weightings are obtained via probability density estimation for each mode, which also helps overcome data imbalance among multiple modes and further consolidates the significant features of past modes. In addition, the proposed model regularizes parameters according to their importance for previous modes, measured by synaptic intelligence (SI). Meanwhile, the objective is decoupled into a regularization-related part and a replay-related part to overcome the potentially unstable optimization trajectory of SI-based continual learning. In comparison with several multimode monitoring methods, the effectiveness of the proposed MWCCA-A approach is demonstrated on a continuous stirred tank heater (CSTH), the Tennessee Eastman process (TEP), and a practical coal pulverizing system.
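The weighted-CCA core can be illustrated with a small numpy sketch, assuming per-sample weights obtained elsewhere (e.g., from the density estimates the abstract mentions); the attention-based prediction step and SI regularizer are omitted, so this is an assumption-laden illustration rather than the paper's implementation.

```python
# Hedged sketch: CCA with sample weights via weighted covariance matrices.
import numpy as np

def weighted_cca(X, Y, w, reg=1e-6):
    """Directions maximizing the sample-weighted correlation between X and Y."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    Xc = X - w @ X                                   # weighted centering
    Yc = Y - w @ Y
    Cxx = (Xc.T * w) @ Xc + reg * np.eye(X.shape[1])
    Cyy = (Yc.T * w) @ Yc + reg * np.eye(Y.shape[1])
    Cxy = (Xc.T * w) @ Yc
    Kx = np.linalg.inv(np.linalg.cholesky(Cxx))      # whitening transforms
    Ky = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, corrs, Vt = np.linalg.svd(Kx @ Cxy @ Ky.T)
    return Kx.T @ U, Ky.T @ Vt.T, corrs              # projections, correlations
```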
4. Chen D, Xie Z, Liu R, Yu W, Hu Q, Li X, Ding SX. Bayesian Hierarchical Graph Neural Networks With Uncertainty Feedback for Trustworthy Fault Diagnosis of Industrial Processes. IEEE Transactions on Neural Networks and Learning Systems 2024;35:18635-18648. PMID: 37843997; DOI: 10.1109/tnnls.2023.3319468.
Abstract
Deep learning (DL) methods have been widely applied to intelligent fault diagnosis of industrial processes and have achieved state-of-the-art performance. However, fault diagnosis with point estimates may yield untrustworthy decisions. Bayesian inference has recently emerged as a promising route to trustworthy fault diagnosis by quantifying the uncertainty of decisions made with a DL model. In existing approaches, however, the uncertainty information is not involved in the training process, so it neither helps the learning of highly uncertain samples nor does much to improve fault-diagnosis performance. To address this challenge, we propose a Bayesian hierarchical graph neural network (BHGNN) with an uncertainty feedback mechanism, which formulates trustworthy fault diagnosis within a Bayesian DL (BDL) framework. Specifically, BHGNN captures epistemic and aleatoric uncertainty via a variational dropout approach and uses each sample's uncertainty to adjust the strength of a temporal consistency (TC) constraint for robust feature learning. Meanwhile, BHGNN models the process data as a hierarchical graph (HG) by leveraging an interaction-aware module and the physical topology knowledge of the industrial process, integrating data with domain knowledge to learn fault representations. Experiments on a three-phase flow facility (TFF) and secure water treatment (SWaT) show superior and competitive fault-diagnosis performance and verify the trustworthiness of the proposed method.
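A minimal sketch of the uncertainty-feedback loss, assuming a per-sample uncertainty vector from variational dropout; the inverse weighting form below is an illustrative choice, not necessarily the paper's exact rule.

```python
# Hedged sketch: per-sample uncertainty scales a temporal-consistency term.
import torch
import torch.nn.functional as F

def uncertainty_weighted_loss(logits_t, logits_t1, labels, uncertainty):
    """Cross-entropy plus a TC penalty whose strength is fed back per sample."""
    ce = F.cross_entropy(logits_t, labels)
    tc = ((logits_t - logits_t1) ** 2).mean(dim=1)   # consistency across time steps
    weight = 1.0 / (1.0 + uncertainty)               # trust certain samples more
    return ce + (weight * tc).mean()
```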
5. Dedeoglu M, Lin S, Zhang Z, Zhang J. Continual Learning of Generative Models With Limited Data: From Wasserstein-1 Barycenter to Adaptive Coalescence. IEEE Transactions on Neural Networks and Learning Systems 2024;35:12042-12056. PMID: 37028381; DOI: 10.1109/tnnls.2023.3251096.
Abstract
Learning generative models is challenging for a network edge node with limited data and computing power. Since tasks in similar environments share model similarity, it is plausible to leverage pretrained generative models from other edge nodes. Appealing to optimal transport theory tailored toward Wasserstein-1 generative adversarial networks (WGANs), this study develops a framework that systematically optimizes continual learning of generative models using local data at the edge node while exploiting adaptive coalescence of pretrained generative models. Specifically, by treating the knowledge transferred from other nodes as Wasserstein balls centered around their pretrained models, continual learning of generative models is cast as a constrained optimization problem, which is further reduced to a Wasserstein-1 barycenter problem. A two-stage approach is devised accordingly: 1) the barycenters among the pretrained models are computed offline, where displacement interpolation serves as the theoretical foundation for finding adaptive barycenters via a "recursive" WGAN configuration; and 2) the barycenter computed offline is used as metamodel initialization for continual learning, after which fast adaptation finds the generative model using the local samples at the target edge node. Finally, a weight ternarization method, based on joint optimization of weights and the quantization threshold, is developed to compress the generative model further. Extensive experimental studies corroborate the effectiveness of the proposed framework.
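The final compression step, weight ternarization, can be sketched as follows; the closed-form threshold heuristic below is a common simplification and an assumption, not necessarily the authors' joint optimization of weights and threshold.

```python
# Hedged sketch: threshold-based ternarization mapping weights to {-a, 0, +a}.
import torch

def ternarize(weight: torch.Tensor, delta_scale: float = 0.7):
    delta = delta_scale * weight.abs().mean()            # threshold heuristic
    mask = (weight.abs() > delta).float()                # which weights survive
    alpha = (weight.abs() * mask).sum() / mask.sum().clamp(min=1)
    return alpha * torch.sign(weight) * mask             # ternary approximation
```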
6. Ho S, Liu M, Du L, Gao L, Xiang Y. Prototype-Guided Memory Replay for Continual Learning. IEEE Transactions on Neural Networks and Learning Systems 2024;35:10973-10983. PMID: 37028080; DOI: 10.1109/tnnls.2023.3246049.
Abstract
Continual learning (CL) is a machine learning paradigm that accumulates knowledge while learning sequentially. The main challenge in CL is catastrophic forgetting of previously seen tasks, which occurs due to shifts in the probability distribution. To retain knowledge, existing CL models often save some past examples and revisit them while learning new tasks; as a result, the number of saved samples increases dramatically as more data are seen. To address this issue, we introduce an efficient CL method that stores only a few samples yet achieves good performance. Specifically, we propose a dynamic prototype-guided memory replay (PMR) module, in which synthetic prototypes serve as knowledge representations and guide the sample selection for memory replay. This module is integrated into an online meta-learning (OML) model for efficient knowledge transfer. We conduct extensive experiments on CL benchmark text-classification datasets and examine the effect of training-set order on the performance of CL models. The experimental results demonstrate the superiority of our approach in terms of accuracy and efficiency.
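A hedged sketch of prototype-guided selection, assuming prototypes are simple class means; PMR's synthetic prototypes and its OML integration are not reproduced here.

```python
# Illustrative sketch: keep, per class, the few samples nearest its prototype.
import torch

def select_replay(features, labels, per_class=5):
    """Pick the samples nearest each class prototype for the replay buffer."""
    buffer = []
    for c in labels.unique():
        feats_c = features[labels == c]
        prototype = feats_c.mean(dim=0)                 # class-mean prototype
        dists = (feats_c - prototype).norm(dim=1)
        idx = dists.topk(min(per_class, len(feats_c)), largest=False).indices
        buffer.append(feats_c[idx])
    return torch.cat(buffer)                            # compact replay memory
```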
7. Yu H, Cong Y, Sun G, Hou D, Liu Y, Dong J. Open-Ended Online Learning for Autonomous Visual Perception. IEEE Transactions on Neural Networks and Learning Systems 2024;35:10178-10198. PMID: 37027689; DOI: 10.1109/tnnls.2023.3242448.
Abstract
Visual perception systems aim to autonomously collect consecutive visual data and perceive relevant information online, as human beings do. In comparison with classical static visual systems that focus on fixed tasks (e.g., face recognition for visual surveillance), real-world visual systems (e.g., robot visual systems) often need to handle unpredicted tasks and dynamically changing environments, and must therefore imitate human-like intelligence with an open-ended online learning ability. This survey accordingly provides a comprehensive analysis of open-ended online learning problems for autonomous visual perception. Based on "what to learn online" among visual perception scenarios, we classify open-ended online learning methods into five categories: instance incremental learning to handle changing data attributes; feature evolution learning for incremental and decremental features with dynamically changing feature dimensions; class incremental learning and task incremental learning, aimed at adding newly arriving classes or tasks online; and parallel and distributed learning for large-scale data, which reveals computational and storage advantages. We discuss the characteristics of each method and introduce several representative works. Finally, we introduce representative visual perception applications that show the enhanced performance obtained with various open-ended online learning models, followed by a discussion of several future directions.
8. Disabato S, Roveri M. Tiny Machine Learning for Concept Drift. IEEE Transactions on Neural Networks and Learning Systems 2024;35:8470-8481. PMID: 37015671; DOI: 10.1109/tnnls.2022.3229897.
Abstract
Tiny machine learning (TML) is a new research area whose goal is to design machine and deep learning (DL) techniques able to operate in embedded systems and Internet-of-Things (IoT) units, satisfying the severe constraints on memory, computation, and energy that characterize these pervasive devices. Interestingly, the related literature has mainly focused on reducing the computational and memory demands of the inference phase of machine and deep learning models, while training is typically assumed to be carried out in cloud or edge computing systems (due to the larger memory and computational requirements). This assumption results in TML solutions that may become obsolete when the process generating the data is affected by concept drift (e.g., due to periodicity or seasonality, faults or malfunctions affecting sensors or actuators, or changes in user behavior), a common situation in real-world application scenarios. For the first time in the literature, this article introduces a TML for concept drift (TML-CD) solution based on deep-learning feature extractors and a k-nearest neighbors (k-NN) classifier, integrating a hybrid adaptation module able to deal with concept drift affecting the data-generating process. This adaptation module continuously updates the knowledge base of TML-CD in a passive way and, at the same time, employs a change detection test (CDT) to actively inspect for changes, quickly adapting to concept drift by removing obsolete knowledge. Experimental results on both image and audio benchmarks show the effectiveness of the proposed solution, while the porting of TML-CD to three off-the-shelf microcontroller units (MCUs) shows the feasibility of the proposal in real-world pervasive systems.
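The hybrid passive/active adaptation can be made concrete with a small sketch; the CUSUM-style test and the eviction policy below are illustrative stand-ins for the paper's CDT and knowledge-base management, not its implementation.

```python
# Hedged sketch: a k-NN knowledge base with passive updates and an active,
# CUSUM-style drift test that evicts obsolete samples after detection.
from collections import deque
import numpy as np

class DriftAwareKNN:
    def __init__(self, k=5, capacity=500, threshold=8.0, drift_rate=0.05):
        self.k = k
        self.base = deque(maxlen=capacity)      # bounded knowledge base
        self.cusum, self.h, self.nu = 0.0, threshold, drift_rate

    def predict(self, x):
        feats = np.array([f for f, _ in self.base])
        labels = np.array([y for _, y in self.base])  # integer class labels
        nearest = np.argsort(np.linalg.norm(feats - x, axis=1))[: self.k]
        return np.bincount(labels[nearest]).argmax()

    def update(self, x, y):
        err = float(self.predict(x) != y) if self.base else 0.0
        self.base.append((x, y))                          # passive update
        self.cusum = max(0.0, self.cusum + err - self.nu) # accumulate error drift
        if self.cusum > self.h:                           # active drift reaction
            keep = max(1, len(self.base) // 10)           # drop obsolete knowledge
            self.base = deque(list(self.base)[-keep:], maxlen=self.base.maxlen)
            self.cusum = 0.0
```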
9. Zhao H, Fu Y, Kang M, Tian Q, Wu F, Li X. MgSvF: Multi-Grained Slow versus Fast Framework for Few-Shot Class-Incremental Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024;46:1576-1588. PMID: 34882547; DOI: 10.1109/tpami.2021.3133897.
Abstract
As a challenging problem, few-shot class-incremental learning (FSCIL) continually learns a sequence of tasks, confronting the dilemma between slow forgetting of old knowledge and fast adaptation to new knowledge. In this paper, we concentrate on this "slow versus fast" (SvF) dilemma, determining which knowledge components should be updated slowly and which quickly, and thereby balancing old-knowledge preservation against new-knowledge adaptation. We propose a multi-grained SvF learning strategy that copes with the SvF dilemma at two different grains: intra-space (within the same feature space) and inter-space (between two different feature spaces). The proposed strategy designs a novel frequency-aware regularization to boost the intra-space SvF capability and develops a new feature-space composition operation to enhance inter-space SvF learning performance. With the multi-grained SvF learning strategy, our method outperforms the state-of-the-art approaches by a large margin.
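The "slow versus fast" idea can be illustrated in its simplest form with per-parameter-group learning rates; the "backbone" name and the rates below are assumptions, and MgSvF's frequency-aware regularization and feature-space composition are not reproduced here.

```python
# Illustrative sketch only: old-knowledge parameters update slowly,
# new-task parameters update quickly.
import torch

def make_svf_optimizer(model, slow_lr=1e-4, fast_lr=1e-2):
    slow = [p for n, p in model.named_parameters() if "backbone" in n]
    fast = [p for n, p in model.named_parameters() if "backbone" not in n]
    return torch.optim.SGD([
        {"params": slow, "lr": slow_lr},   # preserve old knowledge
        {"params": fast, "lr": fast_lr},   # adapt to the new task
    ], momentum=0.9)
```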
10. Han H, Liu H, Qiao J. Data-Knowledge-Driven Self-Organizing Fuzzy Neural Network. IEEE Transactions on Neural Networks and Learning Systems 2024;35:2081-2093. PMID: 35802545; DOI: 10.1109/tnnls.2022.3186671.
Abstract
Fuzzy neural networks (FNNs) hold the advantages of knowledge leveraging and adaptive learning and have been widely used in nonlinear system modeling. However, it is difficult for FNNs to obtain an appropriate structure when data are insufficient, which limits their generalization performance. To solve this problem, a data-knowledge-driven self-organizing FNN (DK-SOFNN) with a structure compensation strategy and a parameter reinforcement mechanism is proposed in this article. First, the structure compensation strategy mines structural information from empirical knowledge to learn the structure of DK-SOFNN; a complete model structure can then be acquired from sufficient structural information. Second, the parameter reinforcement mechanism determines the parameter evolution direction most suitable for the current model structure; a robust model is then obtained through the interaction between the parameters and the dynamic structure. Finally, DK-SOFNN is analyzed theoretically in both the fixed-structure and dynamic-structure cases, yielding convergence conditions that can guide practical applications. The merits of DK-SOFNN are demonstrated on several benchmark problems and industrial applications.
11. Ashfahani A, Pratama M. Unsupervised Continual Learning in Streaming Environments. IEEE Transactions on Neural Networks and Learning Systems 2023;34:9992-10003. PMID: 35417356; DOI: 10.1109/tnnls.2022.3163362.
Abstract
A deep clustering network (DCN) is desirable for data streams because of its aptitude for extracting natural features, thus bypassing the laborious feature-engineering step. Automatic construction of deep networks in streaming environments remains an open issue, and it is further hindered by the expensive labeling cost of data streams, which drives the increasing demand for unsupervised approaches. This article presents an unsupervised approach to DCN construction on the fly, via simultaneous deep learning and clustering, termed autonomous DCN (ADCN). It combines a feature extraction layer with autonomous fully connected layers in which both network width and depth are self-evolved from data streams based on a bias-variance decomposition of the reconstruction loss. The self-clustering mechanism is performed in the deep embedding space of every fully connected layer, while the final output is inferred via the summation of cluster prediction scores. Furthermore, a latent-based regularization is incorporated to resolve the catastrophic-forgetting issue. A rigorous numerical study shows that ADCN produces better performance than its counterparts while offering fully autonomous construction of the ADCN structure in streaming environments, in the absence of any labeled samples for model updates. To support the reproducible-research initiative, code, supplementary material, and raw results of ADCN are available at https://github.com/andriash001/AutonomousDCN.git.
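A rough sketch of a self-evolving rule of this flavor: running bias/variance estimates of the reconstruction loss trigger growing or pruning of hidden units. The statistics and thresholds below are simplified stand-ins for ADCN's actual criteria, included only to make the mechanism concrete.

```python
# Hedged sketch: grow a unit when loss bias is high, prune when the loss
# variance is high; thresholds are illustrative assumptions.
import numpy as np

class StructureController:
    def __init__(self, kappa=1.3):
        self.n, self.mu, self.m2 = 0, 0.0, 0.0
        self.mu_min, self.sd_min = np.inf, np.inf
        self.kappa = kappa

    def observe(self, loss):
        self.n += 1
        delta = loss - self.mu
        self.mu += delta / self.n                 # running mean (bias proxy)
        self.m2 += delta * (loss - self.mu)       # Welford running variance
        sd = (self.m2 / self.n) ** 0.5
        self.mu_min = min(self.mu_min, self.mu)
        self.sd_min = min(self.sd_min, sd)
        if self.mu + sd > self.mu_min + self.kappa * self.sd_min:
            return "grow"                         # high bias: add a hidden unit
        if sd > self.kappa * self.sd_min and self.n > 30:
            return "prune"                        # high variance: remove a unit
        return "keep"
```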
12. Gao Q, Luo Z, Klabjan D, Zhang F. Efficient Architecture Search for Continual Learning. IEEE Transactions on Neural Networks and Learning Systems 2023;34:8555-8565. PMID: 35235526; DOI: 10.1109/tnnls.2022.3151511.
Abstract
Continual learning with neural networks, which aims to learn a sequence of tasks, is an important learning framework in artificial intelligence (AI). However, it often confronts three challenges: 1) overcoming the catastrophic-forgetting problem; 2) adapting the current network to new tasks; and 3) controlling model complexity. To reach these goals, we propose a novel approach named continual learning with efficient architecture search (CLEAS). CLEAS works closely with neural architecture search (NAS), leveraging reinforcement learning to search for the neural architecture that best fits a new task. In particular, we design a neuron-level NAS controller that decides which old neurons from previous tasks should be reused (knowledge transfer) and which new neurons should be added (to learn new knowledge). Such a fine-grained controller finds a very concise architecture that fits each new task well. Meanwhile, since we do not alter the weights of reused neurons, we exactly preserve the knowledge learned from previous tasks. We evaluate CLEAS on numerous sequential classification tasks, and the results demonstrate that CLEAS outperforms other state-of-the-art alternatives, achieving higher classification accuracy while using simpler neural architectures.
13. Chen L, Liang X, Feng Y, Zhang L, Yang J, Liu Z. Online Intention Recognition With Incomplete Information Based on a Weighted Contrastive Predictive Coding Model in Wargame. IEEE Transactions on Neural Networks and Learning Systems 2023;34:7515-7528. PMID: 35108210; DOI: 10.1109/tnnls.2022.3144171.
Abstract
The incomplete and imperfect nature of battlefield situations challenges the efficiency, stability, and reliability of traditional intention recognition methods. For this problem, we propose a deep learning architecture consisting of a contrastive predictive coding (CPC) model, a variable-length long short-term memory (LSTM) model, and an attention weight allocator for online intention recognition with incomplete information in wargames (W-CPCLSTM). First, based on the typical characteristics of intelligence data, a CPC model is designed to capture global structure from limited battlefield information. Then, a variable-length LSTM model classifies the learned representations into predefined intention categories. Next, a weighted approach to the training attention of CPC and LSTM is introduced to stabilize the model. Finally, performance evaluation and application analysis of the proposed model for the online intention recognition task were carried out on four different degrees of detection information and on a perfect situation with ideal conditions in a wargame. We also explored the effect of different lengths of intelligence data on recognition performance and gave application examples of the proposed model on a wargame platform. The simulation results demonstrate that our method not only improves recognition stability but also improves recognition accuracy by 7%-11%, 3%-7%, 3%-13%, and 3%-7%, and recognition speed by 6-32×, 4-18×, 13-*×, and 1-6×, compared with the traditional LSTM, classical FCN, OctConv, and OctFCN models, respectively, characterizing it as a promising reference tool for command decision-making.
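The CPC component rests on a contrastive objective. A generic InfoNCE loss, standard in CPC, is sketched below; the wargame-specific weighting between the CPC and LSTM objectives is not reproduced, so this is background illustration rather than the paper's loss.

```python
# Standard InfoNCE sketch: paired (context, future) representations should
# match each other and mismatch everything else in the batch.
import torch
import torch.nn.functional as F

def info_nce(context, future, temperature=0.1):
    """context, future: (batch, dim) paired representations."""
    logits = context @ future.t() / temperature          # similarity matrix
    targets = torch.arange(context.size(0), device=context.device)
    return F.cross_entropy(logits, targets)              # positives on diagonal
```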
14. Cai B, Sheng C, Gao C, Liu Y, Shi M, Liu Z, Feng Q, Liu G. Artificial Intelligence Enhanced Reliability Assessment Methodology With Small Samples. IEEE Transactions on Neural Networks and Learning Systems 2023;34:6578-6590. PMID: 34822332; DOI: 10.1109/tnnls.2021.3128514.
Abstract
Due to the high price of the product and the limitations of laboratory conditions, reliability tests often yield only a small number of failed samples. If these data are not handled properly, the reliability evaluation results will incur grave errors. To solve this problem, this work proposes an artificial intelligence (AI)-enhanced reliability assessment methodology combining Bayesian neural networks (BNNs) and differential evolution (DE) algorithms. First, a single-hidden-layer BNN model is constructed by fusing small samples and prior information to obtain the 95% confidence interval (CI) of the posterior distribution. Then, the DE algorithm iteratively generates optimal virtual samples based on the 95% CI and the trends of the small samples. In the last stage, a reliability assessment model is reconstructed on a double-hidden-layer BNN by combining the virtual samples and the test samples. To verify the effectiveness of the proposed method, an accelerated life test (ALT) of a subsurface electronic control unit (S-ECU) was carried out. The verification results show that the proposed method can accurately evaluate the reliability life of a product, and, compared with two existing methods, it effectively improves the accuracy of the reliability assessment of the test product.
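The virtual-sample stage can be sketched as a classic DE/rand/1/bin loop. The fitness function, which should encode the 95% CI constraint from the first-stage BNN (e.g., by returning -inf for candidates outside the credible band), is a placeholder assumption, not the paper's criterion.

```python
# Hedged sketch: differential evolution proposing virtual samples under a
# fitness that is assumed to encode the BNN's 95% CI constraint.
import numpy as np

def de_virtual_samples(fitness, bounds, pop=20, gens=50, f=0.8, cr=0.9, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    X = rng.uniform(lo, hi, size=(pop, lo.size))         # initial population
    fit = np.array([fitness(x) for x in X])
    for _ in range(gens):
        for i in range(pop):
            a, b, c = X[rng.choice(pop, size=3, replace=False)]
            mutant = np.clip(a + f * (b - c), lo, hi)    # DE/rand/1 mutation
            cross = rng.random(lo.size) < cr             # binomial crossover
            trial = np.where(cross, mutant, X[i])
            ft = fitness(trial)
            if ft > fit[i]:                              # greedy selection
                X[i], fit[i] = trial, ft
    return X[np.argsort(-fit)]                           # best candidates first
```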