1. Gu S, Xu C, Hu D, Hou C. Adaptive Learning for Dynamic Features and Noisy Labels. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:1219-1237. PMID: 39480720. DOI: 10.1109/tpami.2024.3489217.
Abstract
Applying current machine learning algorithms in complex and open environments remains challenging, especially when different changing elements are coupled and the training data is scarce. For example, in the activity recognition task, the motion sensors may change position or fall off due to the intensity of the activity, leading to changes in feature space and finally resulting in label noise. Learning from such a problem where the dynamic features are coupled with noisy labels is crucial but rarely studied, particularly when the noisy samples in new feature space are limited. In this paper, we tackle the above problem by proposing a novel two-stage algorithm, called Adaptive Learning for Dynamic features and Noisy labels (ALDN). Specifically, optimal transport is first modified to map the previously learned heterogeneous model to the prior model of the current stage. Then, to fully reuse the mapped prior model, we add a simple yet efficient regularizer as the consistency constraint to assist both the estimation of the noise transition matrix and the model training in the current stage. Finally, two implementations with direct (ALDN-D) and indirect (ALDN-ID) constraints are illustrated for better investigation. More importantly, we provide theoretical guarantees for risk minimization of ALDN-D and ALDN-ID. Extensive experiments validate the effectiveness of the proposed algorithms.
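The noise transition matrix mentioned in the abstract is a standard device in label-noise learning. As background (not the ALDN algorithm itself), a minimal NumPy sketch of the common "forward correction" use of such a matrix T, where T[i, j] = P(noisy label j | clean label i), might look like:

```python
import numpy as np

def forward_corrected_nll(probs, noisy_labels, T):
    """Cross-entropy against noisy labels after pushing the predicted
    clean-class probabilities through the noise transition matrix T,
    where T[i, j] = P(noisy label j | clean label i)."""
    noisy_probs = probs @ T            # predicted noisy-label distribution
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return -np.mean(np.log(picked + 1e-12))
```

With T as the identity this reduces to the plain negative log-likelihood; a noisier T changes the loss surface, which is why estimating T well matters.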

2. Tang J, Lai Y, Liu X. Multiview Spectral Clustering Based on Consensus Neighbor Strategy. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:18661-18673. PMID: 37819821. DOI: 10.1109/tnnls.2023.3319823.
Abstract
Multiview spectral clustering, renowned for its spatial learning capability, has garnered significant attention in the data mining field. However, existing methods assume that the optimal consensus adjacency matrix is confined within the space spanned by each view's adjacency matrix. This constraint restricts the feasible domain of the algorithm and hinders the exploration of the optimal consensus adjacency matrix. To address this limitation, we propose a novel and convex strategy, termed the consensus neighbor strategy, for learning the optimal consensus adjacency matrix. This approach constructs the optimal consensus adjacency matrix by capturing the consensus local structure of each sample across all views, thereby expanding the search space and facilitating the discovery of the optimal consensus adjacency matrix. Furthermore, we introduce the concept of a correlation measuring matrix to prevent trivial solutions. We develop an efficient iterative algorithm to solve the resulting optimization problem, benefiting from the convex nature of our model, which ensures convergence to a global optimum. Experimental results on 16 multiview datasets demonstrate that our proposed algorithm surpasses state-of-the-art methods in terms of its robust consensus representation learning capability. The code of this article is uploaded to https://github.com/PhdJiayiTang/Consensus-Neighbor-Strategy.git.

3. Li G, Cherukuri AK. Embrace open-environment machine learning for robust AI. Natl Sci Rev 2024; 11:nwad300. PMID: 39007001. PMCID: PMC11242451. DOI: 10.1093/nsr/nwad300.
Abstract
A dive into the novel OpenML paradigm, unveiling its transformative approach to robust AI in dynamic environments and shaping Automated Machine Learning with the adaptability needed for groundbreaking advances towards Artificial General Intelligence.
Affiliation(s)
- Gang Li
- The Centre for Cyber Resilience and Trust, Deakin University, Australia

4. Yu H, Cong Y, Sun G, Hou D, Liu Y, Dong J. Open-Ended Online Learning for Autonomous Visual Perception. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10178-10198. PMID: 37027689. DOI: 10.1109/tnnls.2023.3242448.
Abstract
Visual perception systems aim to autonomously collect consecutive visual data and perceive the relevant information online, as human beings do. In comparison with classical static visual systems focusing on fixed tasks (e.g., face recognition for visual surveillance), real-world visual systems (e.g., robot visual systems) often need to handle unpredicted tasks and dynamically changing environments, and thus need to imitate human-like intelligence with open-ended online learning ability. Therefore, we provide a comprehensive analysis of open-ended online learning problems for autonomous visual perception in this survey. Based on "what to learn online" among visual perception scenarios, we classify open-ended online learning methods into five categories: instance incremental learning to handle changing data attributes, feature evolution learning for incremental and decremental features with dynamically changing feature dimensions, class incremental learning and task incremental learning aiming at adding newly arriving classes/tasks online, and parallel and distributed learning for large-scale data to reveal the computational and storage advantages. We discuss the characteristics of each method and introduce several representative works as well. Finally, we introduce some representative visual perception applications that show enhanced performance when using various open-ended online learning models, followed by a discussion of several future directions.

5. Lin JQ, Li XL, Chen MS, Wang CD, Zhang H. Incomplete Data Meets Uncoupled Case: A Challenging Task of Multiview Clustering. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:8097-8110. PMID: 36459612. DOI: 10.1109/tnnls.2022.3224748.
Abstract
Incomplete multiview clustering (IMC) methods have achieved remarkable progress by exploring the complementary information and consensus representation of incomplete multiview data. However, to the best of our knowledge, none of the existing methods attempts to handle uncoupled and incomplete data simultaneously, which affects their generalization ability in real-world scenarios. For uncoupled incomplete data, the unclear and partial cross-view correlation makes it difficult to explore the complementary information between views, which results in unpromising clustering performance for existing multiview clustering methods. Besides, the presence of hyperparameters limits their applications. To fill these gaps, a novel uncoupled IMC (UIMC) method is proposed in this article. Specifically, UIMC develops a joint framework for feature inferring and recoupling. The high-order correlations of all views are explored by performing a tensor singular value decomposition (t-SVD)-based tensor nuclear norm (TNN) on recoupled and inferred self-representation matrices. Moreover, all hyperparameters of the UIMC method are updated in an exploratory manner. Extensive experiments on six widely used real-world datasets have confirmed the superiority of the proposed method in handling uncoupled incomplete multiview data compared with state-of-the-art methods.
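The t-SVD-based tensor nuclear norm (TNN) mentioned above is commonly computed by taking an FFT along the third mode and summing the singular values of the resulting frontal slices. A minimal background sketch under that common convention (the 1/n3 normalization is an assumption; conventions differ across papers):

```python
import numpy as np

def tensor_nuclear_norm(X):
    """t-SVD-based tensor nuclear norm: FFT along the third mode,
    then sum the singular values of each frontal slice, averaged by n3."""
    n1, n2, n3 = X.shape
    Xf = np.fft.fft(X, axis=2)
    total = 0.0
    for k in range(n3):
        s = np.linalg.svd(Xf[:, :, k], compute_uv=False)
        total += s.sum()
    return total / n3
```

For n3 = 1 the FFT is the identity, so the TNN reduces to the ordinary matrix nuclear norm, which is a handy sanity check.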

6. Hou C, Gu S, Xu C, Qian Y. Incremental Learning for Simultaneous Augmentation of Feature and Class. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:14789-14806. PMID: 37610915. DOI: 10.1109/tpami.2023.3307670.
Abstract
With the emergence of new data collection methods in many dynamic environment applications, samples are gathered gradually in accumulated feature spaces. The incorporation of new types of features may also bring an augmentation of the class number. For instance, in activity recognition, using the old features collected during warm-up, we can separate different warm-up exercises. With the accumulation of new attributes obtained from newly added sensors, we can better separate the newly appeared formal exercises. Learning under such simultaneous augmentation of feature and class is crucial but rarely studied, particularly when the labeled samples with full observations are limited. In this paper, we tackle this problem by proposing a novel incremental learning method for Simultaneous Augmentation of Feature and Class (SAFC) in a two-stage way. To guarantee the reusability of the model trained on previous data, we add a regularizer to the current model, which provides a solid prior for training the new classifier. We also present theoretical analyses of the generalization bound, which validate the efficiency of model inheritance. After solving the one-shot problem, we extend it to the multi-shot case. Experimental results demonstrate the effectiveness of our approaches, together with their practical utility in activity recognition applications.
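The "regularizer as a solid prior" idea can be illustrated by biased regularization in its simplest form: least squares pulled toward a prior weight vector. This is a generic sketch of the technique, not the SAFC method itself:

```python
import numpy as np

def fit_with_prior(X, y, w_prior, lam):
    """Least squares biased toward a prior weight vector:
    minimize ||X w - y||^2 + lam * ||w - w_prior||^2 (closed form)."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * w_prior
    return np.linalg.solve(A, b)
```

The closed form interpolates between the prior (large lam) and the ordinary least-squares fit (small lam), which is exactly the reuse-versus-adapt trade-off such regularizers control.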

7. Xu N, Li JY, Liu YP, Geng X. Trusted-Data-Guided Label Enhancement on Noisy Labels. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9940-9951. PMID: 35394916. DOI: 10.1109/tnnls.2022.3162316.
Abstract
Label distribution covers a certain number of labels, representing the degree to which each label describes the instance. Label enhancement (LE) is a procedure for recovering the label distribution from the logical labels in the training data, the purpose of which is to better depict label ambiguity through the label distribution. However, data annotation inevitably introduces label noise, and it is extremely challenging to implement LE on corrupted labels. To deal with this problem, one way to recover the label distribution from the corrupted labels is to be guided by a small batch of trusted data. In this article, a novel LE method named TALEN is proposed, which recovers and progressively refines the label distribution guided by trusted data. Specifically, an LE process is applied to the untrusted data to select samples with clean labels. In addition, a combined loss function is designed to train the predictive model for classification. Experiments on datasets with synthetic label noise validate the feasibility of identifying clean labels via the recovered label distribution. Furthermore, experimental results on both synthetic and real-world label noise on image datasets, and additional experiments on text datasets, show a clear advantage of TALEN over several existing noise-robust learning methods.

8. Zha Z, Wen B, Yuan X, Zhou J, Zhu C, Kot AC. Low-Rankness Guided Group Sparse Representation for Image Restoration. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7593-7607. PMID: 35130172. DOI: 10.1109/tnnls.2022.3144630.
Abstract
As a spotlighted nonlocal image representation model, group sparse representation (GSR) has demonstrated great potential in diverse image restoration tasks. Most existing GSR-based image restoration approaches exploit the nonlocal self-similarity (NSS) prior by clustering similar patches into groups and imposing sparsity on each group coefficient, which can effectively preserve image texture information. However, these methods impose only plain sparsity on each individual patch of the group while neglecting other beneficial image properties, e.g., low-rankness (LR), which leads to degraded image restoration results. In this article, we propose a novel low-rankness guided group sparse representation (LGSR) model for highly effective image restoration applications. The proposed LGSR jointly utilizes the sparsity and LR priors of each group of similar patches under a unified framework. The two priors serve as complementary priors in LGSR for effectively preserving the texture and structure information of natural images. Moreover, we apply an alternating minimization algorithm with an adaptively adjusted parameter scheme to solve the proposed LGSR-based image restoration problem. Extensive experiments demonstrate that LGSR achieves superior results compared with many popular or state-of-the-art algorithms in various image restoration tasks, including denoising, inpainting, and compressive sensing (CS).
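The two priors LGSR combines each have a classical proximal operator: soft thresholding for sparsity and singular value thresholding (SVT) for low-rankness. A background sketch of the two operators (not the LGSR algorithm itself):

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of the l1 norm: elementwise shrinkage (sparsity prior)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm
    (low-rankness prior); shrinks singular values instead of entries."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Alternating-minimization schemes like the one described typically apply operators of this kind to the group matrices in turn, one per prior.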

9. Liu Y, Fan X, Li W, Gao Y. Online Passive-Aggressive Active Learning for Trapezoidal Data Streams. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:6725-6739. PMID: 35675249. DOI: 10.1109/tnnls.2022.3178880.
Abstract
The idea of combining the active query strategy and the passive-aggressive (PA) update strategy in online learning can be credited to the PA active (PAA) algorithm, which has proven effective in learning linear classifiers from datasets with a fixed feature space. We propose a novel family of online active learning algorithms, named PAA learning for trapezoidal data streams (PAATS) and multiclass PAATS (MPAATS) (and their variants), for binary and multiclass online classification tasks on trapezoidal data streams, where the feature space may expand over time. Under the context of an ever-changing feature space, we provide a theoretical analysis of the mistake bounds for both PAATS and MPAATS. Our experiments on a wide variety of benchmark datasets have confirmed that the combination of the instance-regulated active query strategy and the PA update strategy is much more effective in learning from trapezoidal data streams. We have also compared PAATS with online learning with streaming features (OLSF), the state-of-the-art approach to learning linear classifiers from trapezoidal data streams. PAATS achieves much better classification accuracy, especially for large-scale real-world data streams.
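For context, the classical PA-I update that PAA-style methods build on can be sketched in a few lines; this is the textbook binary-classification rule, not the PAATS/MPAATS algorithms themselves:

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One PA-I step for binary classification with label y in {-1, +1}:
    suffer the hinge loss, then take the smallest weight update that
    removes it, with the step size clipped at aggressiveness C."""
    loss = max(0.0, 1.0 - y * float(w @ x))
    if loss == 0.0:
        return w                      # passive: no loss, no change
    tau = min(C, loss / float(x @ x)) # aggressive: clipped step size
    return w + tau * y * x
```

After an update on a misclassified example, the margin on that example reaches 1 (when the clip at C is inactive), so a second pass over the same point leaves the weights untouched.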

10. Hou C, Fan R, Zeng LL, Hu D. Adaptive Feature Selection With Augmented Attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:9306-9324. PMID: 37021891. DOI: 10.1109/tpami.2023.3238011.
Abstract
In many dynamic environment applications, as data collection methods evolve, the data attributes are incremental and samples are stored in gradually accumulated feature spaces. For instance, in the neuroimaging-based diagnosis of neuropsychiatric disorders, with the emergence of diverse testing methods, we obtain more brain image features over time. The accumulation of different types of features unavoidably brings difficulties in manipulating the high-dimensional data. It is challenging to design an algorithm to select valuable features in this feature incremental scenario. To address this important but rarely studied problem, we propose a novel Adaptive Feature Selection method (AFS). It enables the reusability of the feature selection model trained on previous features and adapts it automatically to fit the feature selection requirements on all features. Besides, an ideal l0-norm sparse constraint for feature selection is imposed, together with a proposed effective solving strategy. We present theoretical analyses of the generalization bound and convergence behavior. After tackling this problem in the one-shot case, we extend it to the multi-shot scenario. Extensive experimental results demonstrate the effectiveness of reusing previous features and the superiority of the l0-norm constraint in various aspects, together with its effectiveness in discriminating schizophrenic patients from healthy controls.
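An l0-norm sparse constraint is typically handled by hard thresholding: keep the k largest-magnitude weights and zero the rest. A generic iterative-hard-thresholding sketch for least squares (an illustration of the constraint, not the AFS solver):

```python
import numpy as np

def hard_threshold(w, k):
    """Project onto the l0 ball: keep the k largest-magnitude entries."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def iht_feature_selection(X, y, k, step=0.01, iters=500):
    """Iterative hard thresholding for l0-constrained least squares:
    gradient step on the squared error, then keep only k weights."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = hard_threshold(w - step * X.T @ (X @ w - y), k)
    return w
```

The support of the returned weight vector is the selected feature subset; unlike l1 relaxations, the cardinality k is enforced exactly at every iterate.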

11. Wang X, Lu Y, Lin X, Li J, Zhang Z. An Unsupervised Classification Algorithm for Heterogeneous Cryo-EM Projection Images Based on Autoencoders. Int J Mol Sci 2023; 24:8380. PMID: 37176089. PMCID: PMC10179202. DOI: 10.3390/ijms24098380.
Abstract
Heterogeneous three-dimensional (3D) reconstruction in single-particle cryo-electron microscopy (cryo-EM) is an important but very challenging technique for recovering the conformational heterogeneity of flexible biological macromolecules such as proteins in different functional states. Heterogeneous projection image classification is a feasible solution to solve the structural heterogeneity problem in single-particle cryo-EM. The majority of heterogeneous projection image classification methods are developed using supervised learning technology or require a large amount of a priori knowledge, such as the orientations or common lines of the projection images, which leads to certain limitations in their practical applications. In this paper, an unsupervised heterogeneous cryo-EM projection image classification algorithm based on autoencoders is proposed, which only needs to know the number of heterogeneous 3D structures in the dataset and does not require any labeling information of the projection images or other a priori knowledge. A simple autoencoder with multi-layer perceptrons trained in iterative mode and a complex autoencoder with residual networks trained in one-pass learning mode are implemented to convert heterogeneous projection images into latent variables. The extracted high-dimensional features are reduced to two dimensions using the uniform manifold approximation and projection dimensionality reduction algorithm, and then clustered using the spectral clustering algorithm. The proposed algorithm is applied to two heterogeneous cryo-EM datasets for heterogeneous 3D reconstruction. Experimental results show that the proposed algorithm can effectively extract category features of heterogeneous projection images and achieve high classification and reconstruction accuracy, indicating that the proposed algorithm is effective for heterogeneous 3D reconstruction in single-particle cryo-EM.
Affiliation(s)
- Xiangwen Wang
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
- Yonggang Lu
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Xianghong Lin
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
- Jianwei Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Zequn Zhang
- College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China

12. Zhao H, Wang H, Fu Y, Wu F, Li X. Memory-Efficient Class-Incremental Learning for Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5966-5977. PMID: 33939615. DOI: 10.1109/tnnls.2021.3072041.
Abstract
Under memory-resource-limited constraints, class-incremental learning (CIL) usually suffers from the "catastrophic forgetting" problem when updating the joint classification model on the arrival of newly added classes. To cope with the forgetting problem, many CIL methods transfer the knowledge of old classes by preserving some exemplar samples in a size-constrained memory buffer. To utilize the memory buffer more efficiently, we propose to keep more auxiliary low-fidelity exemplar samples rather than the original high-fidelity exemplar samples. Such a memory-efficient exemplar preserving scheme makes old-class knowledge transfer more effective. However, the low-fidelity exemplar samples are often distributed in a different domain from the original exemplar samples, that is, a domain shift. To alleviate this problem, we propose a duplet learning scheme that seeks to construct domain-compatible feature extractors and classifiers, which greatly narrows the above domain gap. As a result, these low-fidelity auxiliary exemplar samples can moderately replace the original exemplar samples at a lower memory cost. In addition, we present a robust classifier adaptation scheme, which further refines the biased classifier (learned with samples containing distillation label knowledge about old classes) with the help of samples with pure true class labels. Experimental results demonstrate the effectiveness of this work against state-of-the-art approaches. We will release the code, baselines, and training statistics for all models to facilitate future research.

13. Hou BJ, Zhang L, Zhou ZH. Prediction With Unpredictable Feature Evolution. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5706-5715. PMID: 33861713. DOI: 10.1109/tnnls.2021.3071311.
Abstract
Learning with feature evolution studies the scenario where the features of a data stream can evolve, i.e., old features vanish and new features emerge. Its goal is to keep the model performing well even when the features evolve. To tackle this problem, canonical methods assume that the old features vanish simultaneously and the new features emerge simultaneously as well. They also assume that there is an overlapping period, where old and new features both exist, when the feature space starts to change. However, in reality, the feature evolution can be unpredictable, which means that features can vanish or emerge arbitrarily, leaving the overlapping period incomplete. In this article, we propose a novel paradigm: prediction with unpredictable feature evolution (PUFE). To address this problem, we fill the incomplete overlapping period by formulating it as a new matrix completion problem. We give a theoretical bound on the least number of observed entries needed to make the overlapping period intact. With this intact overlapping period, we leverage an ensemble method to take advantage of both the old and new feature spaces without manually deciding which base models should be incorporated. Theoretical and experimental results validate that our method can always follow the best base models and, thus, realize the goal of learning with feature evolution.
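The matrix completion step can be illustrated by the simplest "hard-impute" style scheme: alternate a low-rank SVD truncation with re-imposing the observed entries. A generic sketch (not the PUFE formulation, which additionally bounds the number of observed entries needed):

```python
import numpy as np

def complete_matrix(M, mask, rank, iters=200):
    """Fill missing entries (mask == False) by alternating between a
    rank-r SVD truncation and re-imposing the observed entries."""
    X = np.where(mask, M, 0.0)         # initialize missing entries at zero
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s[rank:] = 0.0                 # truncate to the target rank
        X = U @ np.diag(s) @ Vt
        X[mask] = M[mask]              # keep observed entries fixed
    return X
```

On an exactly low-rank matrix with few missing entries, this converges to the original matrix, which is the regime the overlapping-period formulation relies on.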

14.
Abstract
Conventional machine learning studies generally assume close-environment scenarios, where important factors of the learning process hold invariant. With the great success of machine learning, nowadays more and more practical tasks, particularly those involving open-environment scenarios where important factors are subject to change, called open-environment machine learning in this article, are presented to the community. Evidently, it is a grand challenge for machine learning to turn from close environments to open environments. It becomes even more challenging since, in various big data tasks, data are usually accumulated over time, like streams, whereas it is hard to train the machine learning model only after collecting all data, as in conventional studies. This article briefly introduces some advances in this line of research, focusing on techniques concerning emerging new classes, decremental/incremental features, changing data distributions, and varied learning objectives, and discusses some theoretical issues.
Affiliation(s)
- Zhi-Hua Zhou
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

16. Zhao X, Nie F, Wang R, Li X. Improving projected fuzzy K-means clustering via robust learning. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.03.043.

17. Ye HJ, Zhan DC, Jiang Y, Zhou ZH. Heterogeneous Few-Shot Model Rectification With Semantic Mapping. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:3878-3891. PMID: 32750764. DOI: 10.1109/tpami.2020.2994749.
Abstract
Many challenges remain when applying machine learning algorithms in unknown environments, especially those with limited training data. To handle the data insufficiency and make a further step towards robust learning, we adopt the learnware notion [Z.-H. Zhou, "Learnware: On the future of machine learning," Front. Comput. Sci., vol. 10, no. 4, pp. 589-590, 2016], which equips a model with an essential reusable property: the model learned in a related task can be easily adapted to the current data-scarce environment without data sharing. To this end, we propose the REctiFy via heterOgeneous pRedictor Mapping (ReForm) framework, enabling the current model to take advantage of a related model from two kinds of heterogeneous environment, i.e., with either different sets of features or different sets of labels. By Encoding Meta InformaTion (Emit) of features and labels as the model specification, we utilize an optimal transported semantic mapping to characterize and bridge the environment changes. After fine-tuning over a few labeled examples through a biased regularization objective, the transformed heterogeneous model adapts to the current task efficiently. We apply ReForm to both synthetic and real-world tasks such as few-shot image classification with either learned or pre-defined specifications. Experimental results validate the effectiveness and practical utility of the proposed ReForm framework.
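The "optimal transported semantic mapping" relies on solving an optimal transport problem; the standard entropy-regularized solver is Sinkhorn's algorithm. A minimal sketch of Sinkhorn as generic background (not the ReForm implementation):

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, iters=500):
    """Entropy-regularized optimal transport between histograms a and b:
    alternately rescale the rows and columns of K = exp(-cost / reg)
    until the transport plan's marginals match a and b."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # the transport plan
```

The returned plan's rows sum to a and its columns to b, and mass concentrates on low-cost pairings, which is what makes it usable as a soft mapping between feature or label sets.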

18. Liu L, Kuang Z, Chen Y, Xue JH, Yang W, Zhang W. IncDet: In Defense of Elastic Weight Consolidation for Incremental Object Detection. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:2306-2319. PMID: 32598286. DOI: 10.1109/tnnls.2020.3002583.
Abstract
Elastic weight consolidation (EWC) has been successfully applied in general incremental learning to overcome the catastrophic forgetting issue. It adaptively constrains each parameter of the new model not to deviate much from its counterpart in the old model during fine-tuning on new class data sets, according to its importance weight for old tasks. However, a previous study demonstrates that it still suffers from catastrophic forgetting when directly used in object detection. In this article, we show that EWC is effective for incremental object detection given critical adaptations. First, we conduct controlled experiments to identify two core issues explaining why EWC fails when trivially applied to incremental detection: 1) the absence of old class annotations in new class images makes EWC misclassify objects of old classes in these images as background and 2) the quadratic regularization loss in EWC easily leads to gradient explosion when balancing old and new classes. Then, based on these findings, we propose the corresponding solutions: 1) utilize pseudobounding box annotations of old classes on new data sets to compensate for the absence of old class annotations and 2) adopt a novel Huber regularization instead of the original quadratic loss to prevent unstable training. Finally, we propose a general EWC-based incremental object detection framework and implement it under both Fast R-CNN and Faster R-CNN, showing its flexibility and versatility. In terms of either the final performance or the performance drop with respect to the upper bound of joint training on all seen classes, evaluations on the PASCAL VOC and COCO data sets show that our method achieves a new state of the art.
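The proposed swap of the quadratic EWC penalty for a Huber one can be sketched directly: the Huber distance is quadratic near zero but linear in the tails, so its gradient is bounded, which is the property used above to avoid gradient explosion. A minimal sketch (parameter names are illustrative, not taken from the IncDet code):

```python
import numpy as np

def huber(z, delta=1.0):
    """Quadratic for |z| <= delta, linear beyond: bounded gradients."""
    az = np.abs(z)
    return np.where(az <= delta, 0.5 * z**2, delta * (az - 0.5 * delta))

def ewc_huber_penalty(theta, theta_old, fisher, delta=1.0):
    """EWC-style consolidation penalty with a Huber distance in place of
    the quadratic, weighting each parameter by its Fisher importance."""
    return float(np.sum(fisher * huber(theta - theta_old, delta)))
```

For a parameter that has drifted far from its old value, the penalty grows linearly rather than quadratically, so its gradient magnitude saturates at fisher * delta.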

21. Bak N, Hansen LK. Data Driven Estimation of Imputation Error-A Strategy for Imputation with a Reject Option. PLoS One 2016; 11:e0164464. PMID: 27723782. PMCID: PMC5056679. DOI: 10.1371/journal.pone.0164464.
Abstract
Missing data is a common problem in many research fields and a challenge that always needs careful consideration. One approach is to impute the missing values, i.e., replace missing values with estimates. When imputation is applied, it is typically applied to all records with missing values indiscriminately. We note that the effects of imputation can depend strongly on what is missing. To help decide which records should be imputed, we propose a machine learning approach to estimate the imputation error for each case with missing data. The method is intended as a practical aid for users applying imputation once the informed choice to impute the missing data has been made. To do this, all patterns of missing values are simulated in all complete cases, enabling calculation of the "true error" in each of these new cases. The error is then estimated for each case with missing values by weighting the "true errors" by similarity. The method can also be used to test the performance of different imputation methods. A universal numerical threshold of acceptable error cannot be set, since this will differ according to the data, research question, and analysis method. The effect of the threshold can be estimated using the complete cases. The user can set an a priori relevant threshold for what is acceptable, or use cross-validation with the final analysis to choose the threshold. The choice can then be presented along with argumentation, rather than holding to conventions that might not be warranted in the specific dataset.
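The core procedure (simulate a missing pattern in the complete cases, compute the resulting "true errors", then weight them by similarity) can be sketched for the single-missing-feature, mean-imputation case; this is a deliberate simplification of the paper's general scheme, and the inverse-distance weighting is one possible similarity choice:

```python
import numpy as np

def estimate_imputation_error(complete, target, miss_j):
    """Estimate the mean-imputation error for `target` (feature miss_j
    missing) by simulating that pattern in every complete case and
    weighting the resulting true errors by similarity on the observed
    features (inverse distance, an illustrative choice)."""
    col_mean = complete[:, miss_j].mean()
    true_err = np.abs(complete[:, miss_j] - col_mean)  # error if mean-imputed
    obs = np.delete(np.arange(complete.shape[1]), miss_j)
    d = np.linalg.norm(complete[:, obs] - target[obs], axis=1)
    w = 1.0 / (d + 1e-8)                               # closer cases weigh more
    return float(np.sum(w * true_err) / np.sum(w))
```

If the returned estimate exceeds the user's threshold, the record would be rejected rather than imputed, which is the "reject option" in the title.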
Affiliation(s)
- Nikolaj Bak
- Center for Neuropsychiatric Schizophrenia Research (CNSR) & Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Psychiatric Center Glostrup, Copenhagen University Hospitals, Mental Health Services, Capital Region of Denmark, Glostrup, Denmark
- Lars K. Hansen
- Cognitive Systems, DTU Compute, Dept. Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark