1
|
Huang W, Sun S, Lin X, Li P, Zhu L, Wang J, Chen CLP, Sheng B. Unsupervised Fusion Feature Matching for Data Bias in Uncertainty Active Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5749-5763. [PMID: 36215385 DOI: 10.1109/tnnls.2022.3209085] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Active learning (AL) aims to sample the most valuable data for model improvement from the unlabeled pool. Traditional works, especially uncertainty-based methods, are prone to a data bias issue, meaning that the selected data do not cover the entire unlabeled pool well. Although many recent works address this issue, they mainly rely on large additional training costs and hand-designed complex losses. The latter forces these methods to be redesigned for new models or tasks, which is time-consuming and laborious. This article proposes a feature-matching-based uncertainty that resamples the selected uncertain data by feature matching, thus removing similar data to alleviate the data bias issue. To ensure that our proposed method does not introduce substantial additional costs, we specially design an unsupervised fusion feature matching (UFFM), which does not require any training in our novel AL framework. In addition, we redesign several classic uncertainty methods so that they can be applied to more complex visual tasks. We conduct rigorous experiments on many standard benchmark datasets to validate our work. The experimental results show that our UFFM is better than comparable unsupervised feature matching techniques, and our proposed uncertainty calculation method outperforms random sampling, classic uncertainty approaches, and recent state-of-the-art (SOTA) uncertainty approaches.
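The general idea in this abstract, ranking unlabeled samples by uncertainty and then dropping candidates whose features closely match ones already selected, can be sketched as follows. This is a minimal illustration and not the authors' UFFM: the entropy score, the cosine-similarity test, the function name `select_diverse_uncertain`, and the `sim_threshold` parameter are all assumptions made for the example, and feature vectors are assumed to be non-zero.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_diverse_uncertain(probs, feats, budget, sim_threshold=0.9):
    """Pick up to `budget` high-entropy samples, skipping near-duplicates.

    Hypothetical helper: candidates are visited from most to least
    uncertain, and one is skipped when its cosine similarity to any
    already chosen sample exceeds `sim_threshold`.
    """
    order = np.argsort(-entropy(probs), kind="stable")
    # Normalize rows so that a dot product equals cosine similarity.
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    chosen = []
    for i in order:
        if len(chosen) == budget:
            break
        if chosen and np.max(unit[chosen] @ unit[i]) > sim_threshold:
            continue  # too similar to a sample that was already picked
        chosen.append(int(i))
    return chosen

# Toy pool: samples 0 and 1 are equally uncertain and nearly identical.
probs = np.array([[0.5, 0.5], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]])
feats = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [1.0, 1.0]])
picked = select_diverse_uncertain(probs, feats, budget=2)
```

With plain top-2 entropy selection, the two near-duplicate samples would both be chosen; the similarity filter replaces the second one with the next most uncertain sample that lies elsewhere in feature space.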
|
2
|
Aromolaran OT, Isewon I, Adedeji E, Oswald M, Adebiyi E, Koenig R, Oyelade J. Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster. PLoS One 2023; 18:e0288023. [PMID: 37556452 PMCID: PMC10411809 DOI: 10.1371/journal.pone.0288023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 06/18/2023] [Indexed: 08/11/2023] Open
Abstract
Computational prediction of absolute essential genes using machine learning has gained wide attention in recent years. However, essential genes are mostly conditional and not absolute. Experimental techniques provide a reliable approach for identifying conditionally essential genes; however, experimental methods are laborious and time- and resource-consuming, hence computational techniques have been used to complement them. Computational techniques such as supervised machine learning or flux balance analysis are severely limited by the unavailability of the data required for training the model or simulating the conditions for gene essentiality. This study developed a heuristic-enabled active machine learning method based on a light gradient boosting model to predict essential immune response and embryonic developmental genes in Drosophila melanogaster. We proposed a new sampling selection technique and introduced a heuristic function which replaces the human component in traditional active learning models. The heuristic function dynamically selects the unlabelled samples to improve the performance of the classifier in the next iteration. When tested on four benchmark datasets, the proposed model showed superior performance compared to traditional active learning models (random sampling and uncertainty sampling). Applying the model to identify conditionally essential genes, four novel essential immune response genes and a list of 48 novel genes that are essential in the embryonic developmental condition were identified. We performed functional enrichment analysis of the predicted genes to elucidate their biological processes, and the results support our predictions. Immune response and embryonic development related processes were significantly enriched in the essential immune response and embryonic developmental genes, respectively. Finally, we propose the predicted essential genes for future experimental studies and the use of the developed tool, accessible at http://heal.covenantuniversity.edu.ng, for conditional essentiality predictions.
Affiliation(s)
- Olufemi Tony Aromolaran
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Itunu Isewon
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Eunice Adedeji
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
- Marcus Oswald
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum, Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum, Jena, Germany
- Ezekiel Adebiyi
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Rainer Koenig
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum, Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum, Jena, Germany
- Jelili Oyelade
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
|
3
|
Active Learning by Extreme Learning Machine with Considering Exploration and Exploitation Simultaneously. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11089-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
|
4
|
Robust active representation via ℓ2,p-norm constraints. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
5
|
Li C, Li R, Yuan Y, Wang G, Xu D. Deep Unsupervised Active Learning via Matrix Sketching. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2021; 30:9280-9293. [PMID: 34739378 DOI: 10.1109/tip.2021.3124317] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/13/2023]
Abstract
Most existing unsupervised active learning methods aim at minimizing the data reconstruction loss by using linear models to choose representative samples for manual labeling in an unsupervised setting. These methods therefore often fail to model data with complex non-linear structure. To address this issue, we propose a new deep unsupervised Active Learning method for classification tasks, inspired by the idea of Matrix Sketching, called ALMS. Specifically, ALMS leverages a deep auto-encoder to embed data into a latent space, and then describes all the embedded data with a small-size sketch that summarizes the major characteristics of the data. In contrast to previous approaches that reconstruct the whole data matrix to select the representative samples, ALMS aims to select a representative subset of samples that approximates the sketch well, which preserves the major information of the data while significantly reducing the number of network parameters. This allows our algorithm to alleviate model overfitting and readily cope with large datasets. In fact, the sketch provides a type of self-supervised signal to guide the learning of the model. Moreover, we propose to construct an auxiliary self-supervised task by classifying real/fake samples, in order to further improve the representation ability of the encoder. We thoroughly evaluate the performance of ALMS on both single-label and multi-label classification tasks, and the results demonstrate its superior performance against the state-of-the-art methods. The code can be found at https://github.com/lrq99/ALMS.
|
6
|
Fang Q, Xu X, Tang D. Loss-based active learning via double-branch deep network. Int J Adv Robot Syst 2021. [DOI: 10.1177/17298814211044930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Because of the limits on data annotation and its ability to deal with label-efficient problems, active learning has received considerable research interest in recent years. Most existing approaches focus on designing a different selection strategy to achieve better performance on specific tasks; however, the performance of these strategies still needs to be improved. In this work, we focus on improving the performance of active learning and propose a loss-based strategy, built on a double-branch deep network, that learns to predict the target losses of unlabeled inputs in order to select the most uncertain samples. Experimental results on two visual recognition tasks show that our approach achieves state-of-the-art performance compared with previous methods. Moreover, our approach is robust to different network architectures, biased initial labels, noisy oracles, and sampling budget sizes, and its complexity is competitive, which demonstrates the effectiveness and efficiency of our proposed approach.
Affiliation(s)
- Qiang Fang
  - College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China
- Xin Xu
  - College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China
- Dengqing Tang
  - College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China
|
7
|
Li H, Wang Y, Li Y, Xiao G, Hu P, Zhao R, Li B. Learning adaptive criteria weights for active semi-supervised learning. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.01.045] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/25/2022]
|