1
He Y, Chen W, Wang S, Liu T, Wang M. Recalling Unknowns Without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; 34:729-742. [PMID: 39292592 DOI: 10.1109/tip.2024.3459589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/20/2024]
Abstract
Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so as to detect unknown objects and learn knowledge incrementally. Existing OWOD methods typically leverage training sets with a relatively small number of known objects. Due to the absence of generic object knowledge, they fail to comprehensively perceive objects beyond the scope of training sets. Recent advancements in large vision models (LVMs), trained on extensive large-scale data, offer a promising opportunity to harness rich generic knowledge for the fundamental advancement of OWOD. Motivated by Segment Anything Model (SAM), a prominent LVM lauded for its exceptional ability to segment generic objects, we first demonstrate the possibility to employ SAM for OWOD and establish the very first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a pioneering SAM-Guided Robust Open-world Detector (SGROD) method, which can significantly improve the recall of unknown objects without losing the precision on known objects. Specifically, the two challenges in SAM-Guided OWOD include: 1) Noisy labels caused by the class-agnostic nature of SAM; 2) Precision degradation on known objects when more unknown objects are recalled. For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, evidently reducing the noise impact. For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries of objectness and classification. Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin (~20%), but also preserves highly-competitive precision on known objects. The program codes are available at https://github.com/harrylin-hyl/SGROD.
2
Wang Y, Sun L. Energy-efficient dynamic sensor time series classification for edge health devices. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 254:108268. [PMID: 38870733 DOI: 10.1016/j.cmpb.2024.108268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/17/2024] [Accepted: 05/31/2024] [Indexed: 06/15/2024]
Abstract
BACKGROUND AND OBJECTIVE Time series data plays a crucial role in the realm of the Internet of Medical Things (IoMT). Through machine learning (ML) algorithms, online time series classification in IoMT systems enables reliable real-time disease detection. Deploying ML algorithms on edge health devices can reduce latency and safeguard patients' privacy. However, the limited computational resources of these devices underscore the need for more energy-efficient algorithms. Furthermore, online time series classification inevitably faces the challenges of concept drift (CD) and catastrophic forgetting (CF). To address these challenges, this study proposes an energy-efficient Online Time series classification algorithm that can solve CF and CD for health devices, called OTCD.
METHODS OTCD first detects the appearance of concept drift and performs prototype updates to mitigate its impact. Afterward, it standardizes the potential space distribution and selectively preserves key training parameters to address CF. This approach reduces the required memory and enhances energy efficiency. To evaluate the performance of the proposed model in real-time health monitoring tasks, we utilize electrocardiogram (ECG) and photoplethysmogram (PPG) data. By adopting various feature extractors, three arrhythmia classification models are compared. To assess the energy efficiency of OTCD, we conduct runtime tests on each dataset. Additionally, OTCD is compared with state-of-the-art (SOTA) dynamic time series classification models for performance evaluation.
RESULTS The OTCD algorithm outperforms existing SOTA time series classification algorithms in IoMT. In particular, OTCD is on average 2.77% to 14.74% more accurate than other models on the MIT-BIH arrhythmia dataset. Additionally, it consumes low memory (1 KB) and performs computations at a rate of 0.004 GFLOPs per second, leading to energy savings and high time efficiency.
CONCLUSION Our proposed algorithm, OTCD, enables efficient real-time classification of medical time series on edge health devices. Experimental results demonstrate its significant competitiveness, offering promising prospects for safe and reliable healthcare.
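The abstract above describes detecting concept drift and then updating prototypes. The sketch below illustrates that general idea only and is not the OTCD algorithm: it assumes a nearest-prototype classifier, a sliding-window error rate as the drift signal, and a larger update step while drift is flagged. The class name `PrototypeStreamClassifier` and the parameters `window`, `drift_threshold`, and `fast_rate` are hypothetical choices made for illustration.

```python
import math
from collections import deque


class PrototypeStreamClassifier:
    """Nearest-prototype classifier with a windowed error-rate drift check.

    Illustrative assumption, not the method from the paper.
    """

    def __init__(self, window=20, drift_threshold=0.5, fast_rate=0.5):
        self.prototypes = {}                 # label -> (mean vector, sample count)
        self.errors = deque(maxlen=window)   # 1 = misclassified, 0 = correct
        self.drift_threshold = drift_threshold
        self.fast_rate = fast_rate

    def predict(self, x):
        """Return the label of the closest prototype, or None if none exist."""
        if not self.prototypes:
            return None
        return min(self.prototypes,
                   key=lambda c: math.dist(x, self.prototypes[c][0]))

    def drift_detected(self):
        """Flag drift when the window is full and its error rate is high."""
        full = len(self.errors) == self.errors.maxlen
        return full and sum(self.errors) / len(self.errors) > self.drift_threshold

    def update(self, x, y):
        """Record the prediction error, then move the prototype of class y."""
        self.errors.append(int(self.predict(x) != y))
        mean, n = self.prototypes.get(y, (list(x), 0))
        # Running mean by default; a larger fixed step while drift is flagged.
        rate = self.fast_rate if self.drift_detected() else 1.0 / (n + 1)
        mean = [m + rate * (xi - m) for m, xi in zip(mean, x)]
        self.prototypes[y] = (mean, n + 1)
```

Storage in this sketch grows with the number of classes rather than the number of samples, which is one common way prototype methods keep memory small on constrained devices.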
Affiliation(s)
- Yueyuan Wang
  - Department of Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, 210044, China.
- Le Sun
  - Department of Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, 210044, China.
3
Du J, Liu P, Vong CM, Chen C, Wang T, Chen CLP. Class-Incremental Learning Method With Fast Update and High Retainability Based on Broad Learning System. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11332-11345. [PMID: 37030863 DOI: 10.1109/tnnls.2023.3259016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Machine learning aims to generate a predictive model from a training dataset of a fixed number of known classes. However, many real-world applications (such as health monitoring and elderly care) are data streams in which new data arrive continually in a short time. Such new data may even belong to previously unknown classes. Hence, class-incremental learning (CIL) is necessary, which incrementally and rapidly updates an existing model with the data of new classes while retaining the existing knowledge of old classes. However, most current CIL methods are designed based on deep models that require a computationally expensive training and update process. In addition, deep learning based CIL (DCIL) methods typically employ stochastic gradient descent (SGD) as an optimizer that forgets the old knowledge to a certain extent. In this article, a broad learning system-based CIL (BLS-CIL) method with fast update and high retainability of old class knowledge is proposed. Traditional BLS is a fast and effective shallow neural network, but it does not work well on CIL tasks. However, our proposed BLS-CIL can overcome these issues and provide the following: 1) high accuracy due to our novel class-correlation loss function that considers the correlations between old and new classes; 2) significantly short training/update time due to the newly derived closed-form solution for our class-correlation loss without iterative optimization; and 3) high retainability of old class knowledge due to our newly derived recursive update rule for CIL (RULL) that does not replay the exemplars of all old classes, as contrasted to the exemplars-replaying methods with the SGD optimizer. The proposed BLS-CIL has been evaluated over 12 real-world datasets, including seven tabular/numerical datasets and six image datasets, and the compared methods include one shallow network and seven classical or state-of-the-art DCIL methods. 
Experimental results show that our BLS-CIL improves classification performance over a shallow network by a large margin (8.80%-48.42%). It also achieves comparable or even higher accuracy than DCIL methods, but greatly reduces the training time from hours to minutes and the update time from minutes to seconds.
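To make the idea of a closed-form, replay-free update concrete, here is a minimal sketch of generic recursive least squares for ridge regression with one-hot targets. It is a standard textbook technique swapped in for illustration, not the paper's class-correlation loss or its recursive update rule. The class name `RecursiveRidge` and the parameter `reg` are assumptions. A new class is added as a zero weight column, which matches the ridge solution when every earlier sample has target 0 for that class.

```python
class RecursiveRidge:
    """Per-sample recursive least squares over one-hot class targets.

    Generic illustration only; not the BLS-CIL update rule.
    """

    def __init__(self, dim, reg=1e-2):
        self.dim = dim
        # P tracks (X^T X + reg * I)^-1; with no data it equals I / reg.
        self.P = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                  for i in range(dim)]
        self.W = [[] for _ in range(dim)]    # dim x n_classes weight matrix
        self.classes = []

    def scores(self, x):
        """Linear score per known class."""
        return [sum(self.W[i][j] * x[i] for i in range(self.dim))
                for j in range(len(self.classes))]

    def predict(self, x):
        s = self.scores(x)
        return self.classes[s.index(max(s))]

    def update(self, x, label):
        """Fold one labelled sample into the model without revisiting old data."""
        if label not in self.classes:
            self.classes.append(label)
            for row in self.W:
                row.append(0.0)              # new class starts with zero weights
        target = [1.0 if c == label else 0.0 for c in self.classes]
        Px = [sum(p * xi for p, xi in zip(row, x)) for row in self.P]
        denom = 1.0 + sum(xi * v for xi, v in zip(x, Px))
        gain = [v / denom for v in Px]
        err = [t - s for t, s in zip(target, self.scores(x))]
        for i in range(self.dim):
            for j in range(len(self.classes)):
                self.W[i][j] += gain[i] * err[j]
        # Sherman-Morrison step; P is symmetric, so x^T P equals (P x)^T.
        for i in range(self.dim):
            for j in range(self.dim):
                self.P[i][j] -= gain[i] * Px[j]
```

Each update costs a fixed number of operations in the feature dimension and needs no stored exemplars, which is the property the abstract contrasts with SGD-based replay methods.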
4
Yu H, Cong Y, Sun G, Hou D, Liu Y, Dong J. Open-Ended Online Learning for Autonomous Visual Perception. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:10178-10198. [PMID: 37027689 DOI: 10.1109/tnnls.2023.3242448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Visual perception systems aim to autonomously collect consecutive visual data and perceive the relevant information online, as human beings do. In comparison with classical static visual systems that focus on fixed tasks (e.g., face recognition for visual surveillance), real-world visual systems (e.g., a robot visual system) often need to handle unpredicted tasks and dynamically changing environments, and therefore need to imitate human-like intelligence with open-ended online learning ability. In this survey, we provide a comprehensive analysis of open-ended online learning problems for autonomous visual perception. Based on "what to online learn" among visual perception scenarios, we classify the open-ended online learning methods into five categories: instance incremental learning to handle changing data attributes, feature evolution learning for incremental and decremental features whose dimension changes dynamically, class incremental learning and task incremental learning for adding newly arriving classes/tasks online, and parallel and distributed learning for large-scale data to reveal the computational and storage advantages. We discuss the characteristics of each method and introduce several representative works as well. Finally, we introduce some representative visual perception applications to show the enhanced performance when using various open-ended online learning models, followed by a discussion of several future directions.
5
Zhao H, Fu Y, Kang M, Tian Q, Wu F, Li X. MgSvF: Multi-Grained Slow versus Fast Framework for Few-Shot Class-Incremental Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:1576-1588. [PMID: 34882547 DOI: 10.1109/tpami.2021.3133897] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
As a challenging problem, few-shot class-incremental learning (FSCIL) continually learns a sequence of tasks, confronting the dilemma between slow forgetting of old knowledge and fast adaptation to new knowledge. In this paper, we concentrate on this "slow versus fast" (SvF) dilemma to determine which knowledge components should be updated in a slow fashion and which in a fast fashion, and thereby balance old-knowledge preservation and new-knowledge adaptation. We propose a multi-grained SvF learning strategy to cope with the SvF dilemma at two different grains: intra-space (within the same feature space) and inter-space (between two different feature spaces). The proposed strategy designs a novel frequency-aware regularization to boost the intra-space SvF capability, and meanwhile develops a new feature space composition operation to enhance the inter-space SvF learning performance. With the multi-grained SvF learning strategy, our method outperforms the state-of-the-art approaches by a large margin.
6
Tian S, Li L, Li W, Ran H, Ning X, Tiwari P. A survey on few-shot class-incremental learning. Neural Netw 2024; 169:307-324. [PMID: 37922714 DOI: 10.1016/j.neunet.2023.10.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/27/2023] [Revised: 10/23/2023] [Accepted: 10/25/2023] [Indexed: 11/07/2023]
Abstract
Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks to learn new tasks from just a few labeled samples without forgetting the previously learned ones. This setup can easily lead to catastrophic forgetting and overfitting problems, severely affecting model performance. Studying FSCIL helps overcome deep learning model limitations on data volume and acquisition time, while improving practicality and adaptability of machine learning models. This paper provides a comprehensive survey on FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental learning, focusing on introducing FSCIL from two perspectives, while reviewing over 30 theoretical research studies and more than 20 applied research studies. From the theoretical perspective, we provide a novel categorization approach that divides the field into five subcategories, including traditional machine learning methods, meta learning-based methods, feature and feature space-based methods, replay-based methods, and dynamic network structure-based methods. We also evaluate the performance of recent theoretical research on benchmark datasets of FSCIL. From the application perspective, FSCIL has achieved impressive results in various fields of computer vision such as image classification, object detection, and image segmentation, as well as in natural language processing and graph. We summarize the important applications. Finally, we point out potential future research directions, including applications, problem setups, and theory development. Overall, this paper offers a comprehensive analysis of the latest advances in FSCIL from a methodological, performance, and application perspective.
Affiliation(s)
- Songsong Tian
  - Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, 100049, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, Beijing, 100083, China.
- Lusi Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
- Weijun Li
  - Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing, 100083, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, Beijing, 100083, China.
- Hang Ran
  - Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, Beijing, 100083, China.
- Xin Ning
  - Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing, 100083, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, Beijing, 100083, China.
- Prayag Tiwari
  - School of Information Technology, Halmstad University, Halmstad, 30118, Sweden.
7
Zhang J, Wang T, Ng WWY, Pedrycz W. KNNENS: A k-Nearest Neighbor Ensemble-Based Method for Incremental Learning Under Data Stream With Emerging New Classes. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:9520-9527. [PMID: 35213317 DOI: 10.1109/tnnls.2022.3149991] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/14/2023]
Abstract
In this brief, we investigate the problem of incremental learning under data stream with emerging new classes (SENC). In the literature, existing approaches encounter the following problems: 1) yielding a high false positive rate for the new class; 2) having long prediction time; and 3) having access to true labels for all instances, which is unrealistic and unacceptable in real-life streaming tasks. Therefore, we propose the k-Nearest Neighbor ENSemble-based method (KNNENS) to handle these problems. KNNENS effectively detects the new class and maintains high classification performance for known classes. It is also efficient in terms of run time and does not require true labels of new class instances for model update, which is desired in real-life streaming classification tasks. Experimental results show that KNNENS achieves the best performance on four benchmark datasets and three real-world data streams in terms of accuracy and F1-measure and has a relatively fast run time compared to four reference methods. Codes are available at https://github.com/Ntriver/KNNENS.
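As a rough illustration of detecting an emerging class with nearest neighbors, the sketch below classifies by majority vote and flags an instance as new when its neighbors are too far away. This is a single k-NN rule under assumed settings, not the KNNENS ensemble; the function name `knn_predict_or_flag`, the `"NEW"` marker, and the fixed `novelty_threshold` are hypothetical, and `memory` is assumed to be non-empty.

```python
import math
from collections import Counter


def knn_predict_or_flag(x, memory, k=3, novelty_threshold=2.0):
    """Return a known label by k-NN vote, or "NEW" if x lies far from memory.

    memory is a non-empty list of (vector, label) pairs.
    Illustrative assumption, not the KNNENS method.
    """
    # The k stored instances closest to x (fewer if memory is smaller than k).
    nearest = sorted(memory, key=lambda item: math.dist(x, item[0]))[:k]
    mean_dist = sum(math.dist(x, v) for v, _ in nearest) / len(nearest)
    if mean_dist > novelty_threshold:
        return "NEW"                         # too far from every known class
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A fixed distance threshold is the simplest possible choice; in practice the cut-off would need tuning per dataset or per class.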
8
Shen M, Chen D, Hu S, Xu G. Class incremental learning of remote sensing images based on class similarity distillation. PeerJ Comput Sci 2023; 9:e1583. [PMID: 37810339 PMCID: PMC10557500 DOI: 10.7717/peerj-cs.1583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 08/20/2023] [Indexed: 10/10/2023]
Abstract
When a well-trained model learns a new class, the data distribution differences between the new and old classes inevitably cause catastrophic forgetting as the model adapts to perform better on the new class. This behavior differs from human learning. In this article, we propose a class incremental object detection method for remote sensing images to address the problem of catastrophic forgetting caused by distribution differences among different classes. First, we introduce a class similarity distillation (CSD) loss based on the similarity between new and old class prototypes, ensuring the model's plasticity to learn new classes and stability to detect old classes. Second, to better extract class similarity features, we propose a global similarity distillation (GSD) loss that maximizes the mutual information between the new class feature and old class features. Additionally, we present a region proposal network (RPN)-based method that assigns positive and negative labels to prevent mislearning issues. Experiments demonstrate that our method is more accurate for class incremental learning on public DOTA and DIOR datasets and significantly improves training efficiency compared to state-of-the-art class incremental object detection methods.
Affiliation(s)
- Mingge Shen
  - Zhejiang College of Security Technology, College of Intelligent Equipment, Wenzhou, Zhejiang, China
  - Zhejiang College of Security Technology, Wenzhou Key Laboratory of Stereoscopic and Intelligent Monitoring and Warning of Natural Disasters, Wenzhou, Zhejiang, China
- Dehu Chen
  - Wenzhou University of Technology, College of Architecture and Energy Engineering, Wenzhou, Zhejiang, China
  - Wenzhou University of Technology, Wenzhou Key Laboratory of Intelligent Lifeline Protection and Emergency Technology for Resilient City, Wenzhou, Zhejiang, China
- Silan Hu
  - Macau University of Science and Technology, Faculty of Innovation Engineering, Macau, Macau, China
- Gang Xu
  - Zhejiang College of Security Technology, College of Intelligent Equipment, Wenzhou, Zhejiang, China
  - Zhejiang College of Security Technology, Wenzhou Key Laboratory of Stereoscopic and Intelligent Monitoring and Warning of Natural Disasters, Wenzhou, Zhejiang, China
9
Zhu X, Liu B, Ren J, Zhu X, Mao Y, Wu X, Li Y, Wu Y, Zhao L, Sun T, Ullah R, Chen Y. Optical performance monitoring using lifelong learning with confrontational knowledge distillation in 7-core fiber for elastic optical networks. OPTICS EXPRESS 2022; 30:27109-27122. [PMID: 36236888 DOI: 10.1364/oe.463490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 06/28/2022] [Indexed: 06/16/2023]
Abstract
We propose a novel optical performance monitoring (OPM) scheme, including modulation format recognition (MFR) and optical signal-to-noise ratio (OSNR) estimation, for 7-core fiber in elastic optical networks (EONs) by using the specific Stokes sectional images of the received signals. MFR and OSNR estimation in all channels can be performed by a single lightweight neural network via lifelong learning. In addition, the proposed scheme saves computational resources in real implementations through confrontational knowledge distillation, making it easy to deploy the proposed neural network at the receiving end and at intermediate nodes. Five modulation formats, including BPSK, QPSK, 8PSK, 8QAM, and 16QAM, were recognized by the proposed scheme within the OSNR range of 10-30 dB over 2 km of weakly coupled 7-core fiber. Experimental results show that 100% recognition accuracy of all five modulation formats can be achieved while the RMSE of the estimation is below 0.1 dB. Compared with conventional neural network architectures, the proposed neural network achieves better performance with a runtime of merely 20.2 ms, saving the computational resources of the optical network.
10
Yun P, Liu Y, Liu M. In Defense of Knowledge Distillation for Task Incremental Learning and Its Application in 3D Object Detection. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3060417] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
11
Shieh JL, Haq QMU, Haq MA, Karam S, Chondro P, Gao DQ, Ruan SJ. Continual Learning Strategy in One-Stage Object Detection Framework Based on Experience Replay for Autonomous Driving Vehicle. SENSORS 2020; 20:s20236777. [PMID: 33260864 PMCID: PMC7730714 DOI: 10.3390/s20236777] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Revised: 11/24/2020] [Accepted: 11/24/2020] [Indexed: 11/16/2022]
Abstract
Object detection is an important aspect of autonomous driving vehicles (ADV), which may comprise a machine learning model that detects a range of classes. As the deployment of ADV widens globally, the variety of objects to be detected may increase beyond the designated range of classes. Continual learning for object detection essentially ensures a robust adaptation of a model to detect additional classes on the fly. This study proposes a novel continual learning method for object detection that learns new object class(es) along with cumulative memory of classes from prior learning rounds to avoid any catastrophic forgetting. Results on PASCAL VOC 2007 suggest that the proposed ER method incurs a 4.3% mAP drop compared with all-classes learning, which is the lowest among prior arts.
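The idea of learning new classes alongside a cumulative memory of earlier ones can be sketched with a small replay buffer. The abstract does not say what the ER method stores or how it selects samples, so everything below is an assumption for illustration: the class name `ReplayMemory`, the `per_class` cap, and random selection of which samples to keep.

```python
import random


class ReplayMemory:
    """Bounded per-class memory whose contents are mixed into later rounds.

    Illustrative assumption, not the ER method from the paper.
    """

    def __init__(self, per_class=2, seed=0):
        self.per_class = per_class
        self.store = {}                      # label -> list of kept items
        self.rng = random.Random(seed)

    def remember(self, samples):
        """Keep at most per_class randomly chosen items for each label."""
        by_label = {}
        for item, label in samples:
            by_label.setdefault(label, []).append(item)
        for label, items in by_label.items():
            pool = self.store.get(label, []) + items
            keep = min(self.per_class, len(pool))
            self.store[label] = self.rng.sample(pool, keep)

    def training_batch(self, new_samples):
        """Return new-class samples shuffled together with replayed ones."""
        replayed = [(item, label)
                    for label, items in self.store.items()
                    for item in items]
        batch = replayed + list(new_samples)
        self.rng.shuffle(batch)
        return batch
```

A typical loop would call `training_batch` on each round's new data, train on the result, and then call `remember` on that round's data so the next round can replay it.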
Affiliation(s)
- Jeng-Lun Shieh
  - Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan
- Qazi Mazhar ul Haq
  - Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan
- Muhamad Amirul Haq
  - Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan
- Said Karam
  - Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan
- Peter Chondro
  - Information and Communications Research Laboratories, Embedded Vision and Graphics Technology Department, Division for Embedded System and SoC Technology, Industrial Technology Research Institute, Hsinchu 31057, Taiwan
- De-Qin Gao
  - Information and Communications Research Laboratories, Embedded Vision and Graphics Technology Department, Division for Embedded System and SoC Technology, Industrial Technology Research Institute, Hsinchu 31057, Taiwan
- Shanq-Jang Ruan
  - Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan