1. Ding H, Sun Y, Wang Z, Huang N, Shen Z, Cui X. RGAN-EL: A GAN and ensemble learning-based hybrid approach for imbalanced data classification. Inf Process Manag 2023. DOI: 10.1016/j.ipm.2022.103235

2. Van der Schraelen L, Stouthuysen K, Vanden Broucke S, Verdonck T. Regularization oversampling for classification tasks: To exploit what you do not know. Inf Sci (N Y) 2023. DOI: 10.1016/j.ins.2023.03.146

3. Novel motor fault detection scheme based on one-class tensor hyperdisk. Knowl Based Syst 2023. DOI: 10.1016/j.knosys.2023.110259

4. Yan M, Hui SC, Li N. DML-PL: Deep Metric Learning Based Pseudo-Labeling Framework for Class Imbalanced Semi-Supervised Learning. Inf Sci (N Y) 2023. DOI: 10.1016/j.ins.2023.01.074

5. An imbalanced binary classification method via space mapping using normalizing flows with class discrepancy constraints. Inf Sci (N Y) 2022. DOI: 10.1016/j.ins.2022.12.029

6. Zhang Y, Li L, Ren Z, Yu Y, Li Y, Pan J, Lu Y, Feng L, Zhang W, Han Y. Plant-scale biogas production prediction based on multiple hybrid machine learning technique. Bioresour Technol 2022; 363:127899. PMID: 36075348. DOI: 10.1016/j.biortech.2022.127899
Abstract
The operating parameters of full-scale biogas plants are highly nonlinear and imbalanced, which leads to low prediction accuracy with traditional machine learning algorithms. In this study, a hybrid extreme learning machine (ELM) model was proposed to improve prediction accuracy on imbalanced data. The results showed that the best ELM model predicted the validation data well (R2 = 0.972), and the model was deployed as software (prediction error of 2.15 %). Furthermore, two parameters within certain ranges (feed volume (FV) = 23-45 m3 and total volatile fatty acids of anaerobic digestion (TVFAAD) = 1750-3000 mg/L) were identified as the most important characteristics positively affecting biogas production. This study combines machine learning with data-balancing techniques and optimization algorithms to achieve accurate predictions of plant biogas production at various loads.
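For context on the extreme learning machine named in this abstract: an ELM uses a randomly initialized hidden layer and solves only the output weights by least squares. A minimal generic sketch follows (function names and the single-hidden-layer tanh architecture are assumptions for illustration, not the authors' hybrid implementation):

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, rng=None):
    """Train a basic extreme learning machine: random, fixed hidden
    layer; output weights solved in closed form by least squares."""
    rng = np.random.default_rng(rng)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (not trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (not trained)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None) # only the output weights are fitted
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only a linear least-squares problem is solved, training is fast compared with backpropagation, which is the usual motivation for ELM-based hybrids.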
Affiliations
- Yi Zhang: State Key Laboratory of Heavy Oil Processing, Beijing Key Laboratory of Biogas Upgrading Utilization, College of New Energy and Materials, China University of Petroleum Beijing (CUPB), Beijing 102249, PR China
- Linhui Li: College of Artificial Intelligence, China University of Petroleum Beijing (CUPB), Beijing 102249, PR China
- Zhonghao Ren: State Key Laboratory of Heavy Oil Processing, Beijing Key Laboratory of Biogas Upgrading Utilization, College of New Energy and Materials, China University of Petroleum Beijing (CUPB), Beijing 102249, PR China
- Yating Yu: State Key Laboratory of Heavy Oil Processing, Beijing Key Laboratory of Biogas Upgrading Utilization, College of New Energy and Materials, China University of Petroleum Beijing (CUPB), Beijing 102249, PR China
- Yeqing Li: State Key Laboratory of Heavy Oil Processing, Beijing Key Laboratory of Biogas Upgrading Utilization, College of New Energy and Materials, China University of Petroleum Beijing (CUPB), Beijing 102249, PR China
- Junting Pan: Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, PR China
- Yanjuan Lu: Beijing Fairyland Environmental Technology Co., Ltd, Beijing 100094, PR China
- Lu Feng: NIBIO, Norwegian Institute of Bioeconomy Research, P.O. Box 115, N-1431 Ås, Norway
- Weijin Zhang: School of Energy Science and Engineering, Central South University, Changsha 410083, PR China
- Yongming Han: College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, PR China

7. Bayesian network-based over-sampling method (BOSME) with application to indirect cost-sensitive learning. Sci Rep 2022; 12:8724. PMID: 35610323. PMCID: PMC9130330. DOI: 10.1038/s41598-022-12682-8
Abstract
Traditional supervised learning algorithms do not satisfactorily solve classification on imbalanced data sets, since they tend to predict the majority class to the detriment of the minority class. In this paper, we introduce the Bayesian network-based over-sampling method (BOSME), a new over-sampling methodology based on Bayesian networks. Over-sampling methods handle imbalanced data by generating synthetic minority instances, so that classifiers learned from the more balanced data set are better able to predict the minority class. What makes BOSME different is its approach: it generates artificial instances of the minority class following the probability distribution of a Bayesian network learned from the original minority class by likelihood maximization. We compare BOSME with the benchmark synthetic minority over-sampling technique (SMOTE) through a series of experiments on indirect cost-sensitive learning, with several state-of-the-art classifiers and various data sets, showing statistical evidence in favor of BOSME with respect to the expected (misclassification) cost.
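The benchmark SMOTE method mentioned in this abstract generates synthetic minority points by interpolating between each minority point and one of its k nearest minority-class neighbors. A minimal sketch of that core idea (a generic illustration with an assumed function name, not code from the paper):

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=100, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    between minority points and their k nearest minority neighbors
    (the core idea of SMOTE, Chawla et al. 2002)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]      # k nearest neighbors per point
    base = rng.integers(0, n, size=n_new)  # pick a base point for each sample
    neigh = nn[base, rng.integers(0, k, size=n_new)]  # pick one of its neighbors
    gap = rng.random((n_new, 1))           # random position on the segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

BOSME, by contrast, samples new minority instances from a fitted Bayesian network rather than by local interpolation.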
8. An Empirical Assessment of Performance of Data Balancing Techniques in Classification Task. Appl Sci (Basel) 2022. DOI: 10.3390/app12083928
Abstract
Many real-world classification problems, such as fraud detection, intrusion detection, churn prediction, and anomaly detection, suffer from imbalanced datasets. In all such classification tasks, the imbalanced dataset must therefore be balanced before building classifiers for prediction. Several data-balancing techniques (DBT) have been discussed in the literature to address this issue; however, little work has been conducted to assess their performance. In this research paper we therefore empirically assess the performance of the data-preprocessing-level data-balancing techniques, namely: Under Sampling (US), Over Sampling (OS), Hybrid Sampling (HS), Random Over Sampling Examples (ROSE), Synthetic Minority Over Sampling (SMOTE), and Clustering-Based Under Sampling (CBUS). We used six different classifiers and twenty-five datasets with varying levels of imbalance ratio (IR) to assess the performance of DBT. The experimental results indicate that DBT help to improve classifier performance; however, no significant difference was observed among US, OS, HS, SMOTE, and CBUS. The performance of DBT was also not consistent across varying levels of IR or across classifiers.
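The two simplest techniques in this comparison, random over-sampling and random under-sampling, can be sketched as follows (a generic illustration under assumed function names, not the study's code):

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Duplicate rows of each smaller class at random until every
    class matches the size of the largest class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        ci = np.flatnonzero(y == c)
        extra = rng.choice(ci, size=n_max - len(ci), replace=True)
        idx.append(np.concatenate([ci, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def random_undersample(X, y, rng=None):
    """Drop rows of each larger class at random until every class
    matches the size of the smallest class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]
```

Over-sampling risks overfitting to duplicated minority rows, while under-sampling discards majority-class information; hybrid and synthetic methods such as SMOTE, ROSE, and CBUS aim to mitigate both effects.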