1. Xu Y, Yu Z, Chen CLP, Liu Z. Adaptive Subspace Optimization Ensemble Method for High-Dimensional Imbalanced Data Classification. IEEE Transactions on Neural Networks and Learning Systems 2023;34:2284-2297. [PMID: 34469316] [DOI: 10.1109/tnnls.2021.3106306]
Abstract
Constructing an optimal classifier for high-dimensional imbalanced data is difficult, since class skew seriously degrades classifier performance. Although many approaches, such as resampling, cost-sensitive learning, and ensemble methods, have been proposed to handle skewed data, they are limited by the noise and redundancy in high-dimensional data. In this study, we propose an adaptive subspace optimization ensemble method (ASOEM) for high-dimensional imbalanced data classification that overcomes these limitations. To construct accurate and diverse base classifiers, a novel adaptive subspace optimization (ASO) method, based on an adaptive subspace generation (ASG) process and a rotated subspace optimization (RSO) process, is designed to generate multiple robust and discriminative subspaces. A resampling scheme is then applied to each optimized subspace to build class-balanced data for each base classifier. To verify its effectiveness, ASOEM is implemented with different resampling strategies on 24 real-world high-dimensional imbalanced datasets. Experimental results demonstrate that the proposed methods outperform other mainstream imbalance learning approaches and classifier ensemble methods.
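The general recipe this abstract describes (train each base classifier on its own feature subspace, with the classes rebalanced inside that subspace) can be sketched as follows. This is only a minimal stand-in: plain random subspaces and simple undersampling replace the paper's adaptive subspace generation and rotated subspace optimization, and all names (`SubspaceEnsemble`, `balanced_undersample`) are illustrative.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def balanced_undersample(X, y):
    """Undersample every class down to the size of the rarest class."""
    n_min = min(Counter(y.tolist()).values())
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), n_min, replace=False)
        for c in np.unique(y)
    ])
    return X[idx], y[idx]

class SubspaceEnsemble:
    """Random-subspace ensemble with per-subspace class rebalancing."""

    def __init__(self, make_base, n_members=10, subspace_frac=0.5):
        self.make_base = make_base          # factory for base classifiers
        self.n_members = n_members
        self.frac = subspace_frac

    def fit(self, X, y):
        d = X.shape[1]
        k = max(1, int(self.frac * d))
        self.members = []
        for _ in range(self.n_members):
            feats = rng.choice(d, size=k, replace=False)   # random subspace
            Xb, yb = balanced_undersample(X[:, feats], y)  # class-balanced data
            self.members.append((feats, self.make_base().fit(Xb, yb)))
        return self

    def predict(self, X):
        # majority vote across the ensemble members
        votes = np.stack([m.predict(X[:, f]) for f, m in self.members])
        return np.array([Counter(col.tolist()).most_common(1)[0][0]
                         for col in votes.T])
```

Any classifier with `fit`/`predict` can be plugged in as the base learner; the point of the sketch is that each member sees a different feature subspace and a balanced sample.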
2. Liu J, Lei J, Liao Z, He J. Software defect prediction model based on improved twin support vector machines. Soft Comput 2023. [DOI: 10.1007/s00500-023-07984-6]
3. An ensemble algorithm using quantum evolutionary optimization of weighted type-II fuzzy system and staged Pegasos Quantum Support Vector Classifier with multi-criteria decision making system for diagnosis and grading of breast cancer. Soft Comput 2023. [DOI: 10.1007/s00500-023-07939-x]
4. Chen S, Wang R, Lu J. A meta-framework for multi-label active learning based on deep reinforcement learning. Neural Netw 2023;162:258-270. [PMID: 36913822] [DOI: 10.1016/j.neunet.2023.02.045]
Abstract
Multi-label active learning (MLAL) is an effective method for improving classifier performance on multi-label problems with less annotation effort, by allowing the learning system to actively select high-quality examples (example-label pairs) for labeling. Existing MLAL algorithms mainly focus on designing reasonable methods for evaluating the potential value (the aforementioned quality) of unlabeled data. These manually designed methods may show totally different results on different types of datasets, owing either to defects in the methods or to the particularities of the datasets. In this paper, instead of manually designing an evaluation method, we propose a deep reinforcement learning (DRL) model that learns a general evaluation method on several seen datasets and eventually applies it to unseen datasets within a meta-framework. In addition, a self-attention mechanism and a reward function are integrated into the DRL structure to address label correlation and data imbalance in MLAL. Comprehensive experiments show that our proposed DRL-based MLAL method produces results comparable with those of other methods reported in the literature.
Affiliation(s)
- Shuyue Chen: College of Mathematics and Statistics, Shenzhen University, Shenzhen, 518060, China.
- Ran Wang: College of Mathematics and Statistics, Shenzhen University, Shenzhen, 518060, China; Shenzhen Key Lab. of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, 518060, China; Guangdong Key Lab. of Intelligent Information Process, Shenzhen University, Shenzhen, 518060, China.
- Jian Lu: College of Mathematics and Statistics, Shenzhen University, Shenzhen, 518060, China; Shenzhen Key Lab. of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, 518060, China.
5. Pérez E, Ventura S. A framework to build accurate Convolutional Neural Network models for melanoma diagnosis. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110157]
6. Stable Matching-Based Two-Way Selection in Multi-Label Active Learning with Imbalanced Data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.182]
7. Addressing Label Ambiguity Imbalance in Candidate Labels: Measures and Disambiguation Algorithm. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.175]
8. Chen W, Yang K, Yu Z, Zhang W. Double-kernel based class-specific broad learning system for multiclass imbalance learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109535]
9. Deep Learning Algorithm for Heart Valve Diseases Assisted Diagnosis. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12083780]
Abstract
Heart sounds are mainly expressions of the opening and closing of the heart valves. Some sounds are produced when laminar blood flow is interrupted and becomes turbulent, which is explained by abnormal functioning of the valves. Analysis of phonocardiographic signals has shown that normal and pathological records differ in both temporal and spectral features. The present work describes a design and implementation, based on deep neural networks and deep learning, for the binary and multiclass classification of four common valvular pathologies and normal heart sounds. For feature extraction, three different techniques were considered: the Discrete Wavelet Transform, the Continuous Wavelet Transform, and Mel Frequency Cepstral Coefficients. Both approaches reached F1 scores above 98% and specificities in the "Normal" class of up to 99%, which accounts for the cases that can be misclassified as normal. These results place the present work as a highly competitive proposal for building assisted-diagnosis systems.
10. Jiang M, Yang Y, Qiu H. Fuzzy entropy and fuzzy support-based boosting random forests for imbalanced data. Appl Intell 2022. [DOI: 10.1007/s10489-021-02620-y]
11. R E, Jain DK, Kotecha K, Pandya S, Reddy SS, E R, Varadarajan V, Mahanti A, V S. Hybrid Deep Neural Network for Handling Data Imbalance in Precursor MicroRNA. Front Public Health 2022;9:821410. [PMID: 35004605] [PMCID: PMC8733243] [DOI: 10.3389/fpubh.2021.821410]
Abstract
Over the last decade, the field of bioinformatics has grown rapidly, and robust bioinformatics tools will play a vital role in future progress. Scientists working in bioinformatics conduct a great deal of research to extract knowledge from the available biological data. Several bioinformatics problems have emerged as a result of the creation of massive amounts of imbalanced data. The classification of precursor microRNA (pre-miRNA) from imbalanced RNA genome data is one such problem. Studies have shown that pre-miRNAs can serve as oncogenes or tumor suppressors in various cancer types. This paper introduces a Hybrid Deep Neural Network framework (H-DNN) for the classification of pre-miRNA in imbalanced data. The proposed H-DNN framework integrates Deep Artificial Neural Networks (Deep ANN) and Deep Decision Tree Classifiers: the Deep ANN extracts meaningful features, and the Deep Decision Tree Classifier classifies the pre-miRNA accurately. H-DNN was evaluated on genomes of animals, plants, humans, and Arabidopsis with imbalance ratios up to 1:5000, and on viruses with a ratio of 1:400. Experimental results showed an accuracy of more than 99% in all cases, and the time complexity of the proposed H-DNN is also much lower than that of other existing approaches.
Affiliation(s)
- Elakkiya R: School of Computing, SASTRA Deemed University, Thanjavur, India
- Deepak Kumar Jain: College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
- Ketan Kotecha: Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, India
- Sharnil Pandya: Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Rajalakshmi E: School of Computing, SASTRA Deemed University, Thanjavur, India
- Vijayakumar Varadarajan: School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
12. Fan S, Zhang X, Song Z. Reinforced knowledge distillation: Multi-class imbalanced classifier based on policy gradient reinforcement learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.040]
13. Active learning with extreme learning machine for online imbalanced multiclass classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107385]
14. Walczak S. Predicting Crime and Other Uses of Neural Networks in Police Decision Making. Front Psychol 2021;12:587943. [PMID: 34690848] [PMCID: PMC8529125] [DOI: 10.3389/fpsyg.2021.587943]
Abstract
Neural networks are a machine learning method that excels at solving classification and forecasting problems. They have also been shown to be a useful tool for big-data-oriented environments such as law enforcement. This article reviews existing research on the use of neural networks for forecasting crime and for other police decision-making problems. To demonstrate the application of neural networks to police decision making, models are developed that predict specific types of crime from location and time information, and that predict a crime's location given the crime and the time of day. These crime prediction models exploit geo-spatiality to provide immediate information on crimes and thereby enhance law enforcement decision making. The models are able to predict the type of crime being committed 16.4% of the time across 27 different types of crime, or 27.1% of the time when similar crimes are grouped into seven categories. The location prediction networks are able to predict the zip code of the location, or an adjacent location, 31.2% of the time.
Affiliation(s)
- Steven Walczak: School of Information and Florida Center for Cybersecurity, University of South Florida, Tampa, FL, United States
15. Alharthi R, Alhothali A, Moria K. A real-time deep-learning approach for filtering Arabic low-quality content and accounts on Twitter. Inform Syst 2021. [DOI: 10.1016/j.is.2021.101740]
16. Liu M, Dong M, Jing C. A modified real-value negative selection detector-based oversampling approach for multiclass imbalance problems. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.12.058]
17. Addressing the multi-label imbalance for neural networks: An approach based on stratified mini-batches. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.122]
18. Classifying multiclass imbalanced data using generalized class-specific extreme learning machine. Progress in Artificial Intelligence 2021. [DOI: 10.1007/s13748-021-00236-4]
19. SMOTE-Based Weighted Deep Rotation Forest for the Imbalanced Hyperspectral Data Classification. Remote Sensing 2021. [DOI: 10.3390/rs13030464]
Abstract
Conventional classification algorithms have shown great success on balanced hyperspectral data. However, imbalanced class distribution is a fundamental problem of hyperspectral data and is regarded as one of the great challenges in classification tasks. To solve this problem, a non-ANN-based deep learning method, namely the SMOTE-Based Weighted Deep Rotation Forest (SMOTE-WDRoF), is proposed in this paper. First, the neighboring pixels of instances are introduced as spatial information, and balanced datasets are created using the SMOTE algorithm. Second, these datasets are fed into the WDRoF model, which consists of a rotation forest and multi-level cascaded random forests. Specifically, the rotation forest is used to generate rotation feature vectors, which are input into the subsequent cascade forest. The output probability of each level and the original data are stacked as the dataset for the next level, and the sample weights are automatically adjusted according to a dynamic weight function constructed from the classification results of each level. Compared with traditional deep learning approaches, the proposed method requires much less training time. Experimental results on four public hyperspectral datasets demonstrate that the proposed method achieves better performance than support vector machine, random forest, rotation forest, SMOTE combined with rotation forest, convolutional neural network, and rotation-based deep forest in multiclass imbalance learning.
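The SMOTE step this pipeline relies on is compact enough to sketch from scratch: each synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority neighbours. This is generic SMOTE, not the paper's weighted deep rotation forest; the function name and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def smote(X_min, n_new, k=5):
    """Generate n_new synthetic minority samples by interpolating each
    chosen sample toward one of its k nearest minority neighbours."""
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise squared distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]          # k nearest neighbours per sample
    base = rng.integers(0, n, n_new)              # seed samples
    nb = nbrs[base, rng.integers(0, k, n_new)]    # one random neighbour each
    gap = rng.random((n_new, 1))                  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

Because every synthetic point is a convex combination of two real minority points, the generated samples stay inside the minority class's convex hull.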
20. Xing F, Zhang X, Cornish TC. Artificial intelligence for pathology. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00011-9]
21. Wang Z, Cao C, Zhu Y. Entropy and Confidence-Based Undersampling Boosting Random Forests for Imbalanced Problems. IEEE Transactions on Neural Networks and Learning Systems 2020;31:5178-5191. [PMID: 31995503] [DOI: 10.1109/tnnls.2020.2964585]
Abstract
In this article, we propose a novel entropy- and confidence-based undersampling boosting (ECUBoost) framework for imbalanced problems. A boosting-based ensemble is combined with a new undersampling method to improve generalization performance. To avoid losing informative samples during the data preprocessing of the boosting-based ensemble, both confidence and entropy are used in ECUBoost as benchmarks to ensure the validity and structural distribution of the majority samples during undersampling. Unlike other iterative dynamic resampling methods, the confidence-based ECUBoost can also be applied to algorithms without iterations, such as decision trees. Random forests are used as the base classifiers in ECUBoost. Experimental results on both artificial and KEEL datasets prove the effectiveness of the proposed method.
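A minimal sketch of the entropy criterion described above, assuming a preliminary model has already produced class-probability estimates for the majority samples. The paper combines entropy with a confidence criterion and embeds the selection in a boosting loop; only entropy-ranked undersampling is shown here, and the function name is illustrative.

```python
import numpy as np

def entropy_undersample(X_maj, proba_maj, n_keep):
    """Keep the n_keep majority samples whose predicted class distribution
    has the highest entropy, i.e. the samples closest to the decision
    boundary, and drop the rest."""
    p = np.clip(proba_maj, 1e-12, 1.0)       # avoid log(0)
    ent = -(p * np.log(p)).sum(axis=1)       # Shannon entropy per sample
    keep = np.argsort(ent)[::-1][:n_keep]    # most uncertain first
    return X_maj[keep], keep
```

The intuition is that confidently classified majority samples carry little information for the ensemble, so discarding them first loses the least.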
22. Gultekin S, Saha A, Ratnaparkhi A, Paisley J. MBA: Mini-Batch AUC Optimization. IEEE Transactions on Neural Networks and Learning Systems 2020;31:5561-5574. [PMID: 32142457] [DOI: 10.1109/tnnls.2020.2969527]
Abstract
The area under the receiver operating characteristic curve (AUC) is an important metric for a wide range of machine learning problems, and scalable methods for optimizing AUC have recently been proposed. However, handling very large datasets remains an open challenge. This article proposes a novel approach to AUC maximization based on sampling mini-batches of positive/negative instance pairs and computing U-statistics to approximate a global risk minimization problem. The resulting algorithm is simple, fast, and learning-rate free. We show that the number of samples required for good performance is independent of the number of pairs available, which is a quadratic function of the numbers of positive and negative instances. Extensive experiments show the practical utility of the proposed method.
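The core idea (estimate AUC from a random mini-batch of positive/negative pairs via a U-statistic, instead of enumerating all n_pos * n_neg pairs) can be sketched as below. This is only the estimator, not the paper's optimization algorithm, and the names are illustrative.

```python
import numpy as np

def minibatch_auc(scores_pos, scores_neg, batch=256,
                  rng=np.random.default_rng(2)):
    """Unbiased U-statistic estimate of AUC from `batch` randomly sampled
    positive/negative score pairs; ties count as half a correct ordering."""
    i = rng.integers(0, len(scores_pos), batch)
    j = rng.integers(0, len(scores_neg), batch)
    sp, sn = scores_pos[i], scores_neg[j]
    return float(((sp > sn) + 0.5 * (sp == sn)).mean())
```

Since each sampled pair is an unbiased draw from the full pair population, the estimate concentrates around the true AUC at a rate governed by the batch size, not by the quadratic number of available pairs.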
23. Bugnon LA, Yones C, Milone DH, Stegmayer G. Deep Neural Architectures for Highly Imbalanced Data in Bioinformatics. IEEE Transactions on Neural Networks and Learning Systems 2020;31:2857-2867. [PMID: 31170082] [DOI: 10.1109/tnnls.2019.2914471]
Abstract
In the postgenome era, many problems in bioinformatics have arisen due to the generation of large amounts of imbalanced data. In particular, the computational classification of precursor microRNA (pre-miRNA) involves a high imbalance in the classes. For this task, a classifier is trained to identify RNA sequences having the highest chance of being miRNA precursors. The big issue is that well-known pre-miRNAs are usually just a few in comparison to the hundreds of thousands of candidate sequences in a genome, which results in highly imbalanced data. This imbalance has a strong influence on most standard classifiers and, if not properly addressed, the classifier is not able to work properly in a real-life scenario. This work provides a comparative assessment of recent deep neural architectures for dealing with the large imbalanced data issue in the classification of pre-miRNAs. We present and analyze recent architectures in a benchmark framework with genomes of animals and plants, with increasing imbalance ratios up to 1:2000. We also propose a new graphical way for comparing classifiers performance in the context of high-class imbalance. The comparative results obtained show that, at a very high imbalance, deep belief neural networks can provide the best performance.
24. Rocha WFDC, do Prado CB, Blonder N. Comparison of Chemometric Problems in Food Analysis Using Non-Linear Methods. Molecules 2020;25:E3025. [PMID: 32630676] [PMCID: PMC7411792] [DOI: 10.3390/molecules25133025]
Abstract
Food analysis is a challenging analytical problem, often addressed using sophisticated laboratory methods that produce large data sets. Linear and non-linear multivariate methods can be used to process these types of datasets and to answer questions such as whether product origin is accurately labeled or whether a product is safe to eat. In this review, we present the application of non-linear methods such as artificial neural networks, support vector machines, self-organizing maps, and multi-layer artificial neural networks in the field of chemometrics related to food analysis. We discuss criteria to determine when non-linear methods are better suited for use instead of traditional methods. The principles of algorithms are described, and examples are presented for solving the problems of exploratory analysis, classification, and prediction.
Affiliation(s)
- Werickson Fortunato de Carvalho Rocha: National Institute of Metrology, Quality and Technology (INMETRO), Av. N. S. das Graças, 50, Xerém, Duque de Caxias 25250-020, RJ, Brazil; National Institute of Standards and Technology (NIST), 100 Bureau Drive, Stop 8390, Gaithersburg, MD 20899, USA
- Charles Bezerra do Prado: National Institute of Metrology, Quality and Technology (INMETRO), Av. N. S. das Graças, 50, Xerém, Duque de Caxias 25250-020, RJ, Brazil
- Niksa Blonder: National Institute of Standards and Technology (NIST), 100 Bureau Drive, Stop 8390, Gaithersburg, MD 20899, USA
25. Vong CM, Du J. Accurate and efficient sequential ensemble learning for highly imbalanced multi-class data. Neural Netw 2020;128:268-278. [PMID: 32454371] [DOI: 10.1016/j.neunet.2020.05.010]
Abstract
Multi-class classification of highly imbalanced data is a challenging task in which multiple issues must be resolved simultaneously, including (i) accuracy on highly imbalanced multi-class data; (ii) training efficiency on large data; and (iii) sensitivity to a high imbalance ratio (IR). In this paper, a novel sequential ensemble learning (SEL) framework is designed to resolve these issues simultaneously. The SEL framework has a significant advantage over traditional AdaBoost: the majority samples can be divided into multiple small, disjoint subsets for training multiple weak learners without compromising accuracy (which AdaBoost cannot do). To ensure the class balance and the majority-disjoint property of the subsets, a learning strategy called balanced and majority-disjoint subsets division (BMSD) is developed. Because it is difficult to derive a general learner combination method (LCM) for arbitrary weak learners, LCM is here designed specifically for the extreme learning machine and called LCM-ELM. The proposed SEL framework with BMSD and LCM-ELM was compared with state-of-the-art methods on 16 benchmark datasets. Under highly imbalanced multi-class data (IR up to 14K; data size up to 493K), (i) the proposed methods improve performance on measures including G-mean, macro-F, micro-F, and MAUC, and (ii) training time is significantly reduced.
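The BMSD idea (divide the majority class into disjoint chunks of roughly minority size and pair each chunk with the full minority set) can be sketched for the binary case as follows. The paper's multi-class version and the LCM-ELM combiner are not reproduced, and the function name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def majority_disjoint_subsets(X, y, majority_class):
    """Yield several small, class-balanced training sets: each one pairs a
    disjoint chunk of the (shuffled) majority class with the full minority
    set, so every majority sample is used exactly once across the subsets."""
    maj = np.flatnonzero(y == majority_class)
    mino = np.flatnonzero(y != majority_class)
    maj = rng.permutation(maj)
    n_sub = max(1, len(maj) // len(mino))    # chunks of roughly minority size
    for chunk in np.array_split(maj, n_sub):
        idx = np.concatenate([chunk, mino])
        yield X[idx], y[idx]
```

Each subset is small and balanced, so a weak learner can be trained on it cheaply; the abstract's claim is that combining such learners need not compromise accuracy.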
Affiliation(s)
- Chi-Man Vong: Department of Computer and Information Science, University of Macau, Macau SAR 999078, China.
- Jie Du: School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, 518060, China.
27. Munkhdalai L, Munkhdalai T, Ryu KH. GEV-NN: A deep neural network architecture for class imbalance problem in binary classification. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105534]
28. Automatic Semantic Segmentation with DeepLab Dilated Learning Network for Change Detection in Remote Sensing Images. Neural Process Lett 2020. [DOI: 10.1007/s11063-019-10174-x]
29. Li Y, Wei J, Kang K, Wu Z. An efficient noise-filtered ensemble model for customer churn analysis in aviation industry. Journal of Intelligent & Fuzzy Systems 2019. [DOI: 10.3233/jifs-182807]
Affiliation(s)
- Yongjun Li: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Jianshuang Wei: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Kai Kang: School of Information Science and Engineering, Central South University, Changsha, China
- Zhouyang Wu: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
30. Fernandes ER, de Carvalho AC. Evolutionary inversion of class distribution in overlapping areas for multi-class imbalanced learning. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.04.052]
31. Predicting Seminal Quality via Imbalanced Learning with Evolutionary Safe-Level Synthetic Minority Over-Sampling Technique. Cognit Comput 2019. [DOI: 10.1007/s12559-019-09657-9]
32. Malhotra R, Kamal S. An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.04.090]
33. Zhu T, Lin Y, Liu Y, Zhang W, Zhang J. Minority oversampling for imbalanced ordinal regression. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2018.12.021]
34. González-Barcenas VM, Rendón E, Alejo R, Granda-Gutiérrez EE, Valdovinos RM. Addressing the Big Data Multi-class Imbalance Problem with Oversampling and Deep Learning Neural Networks. Pattern Recognition and Image Analysis 2019. [DOI: 10.1007/978-3-030-31332-6_19]
35. Zhang C, Tan KC, Li H, Hong GS. A Cost-Sensitive Deep Belief Network for Imbalanced Classification. IEEE Transactions on Neural Networks and Learning Systems 2019;30:109-122. [PMID: 29993587] [DOI: 10.1109/tnnls.2018.2832648]
Abstract
Imbalanced data with a skewed class distribution are common in many real-world applications. The Deep Belief Network (DBN) is a machine learning technique that is effective in classification tasks, but a conventional DBN does not work well for imbalanced data classification because it assumes equal costs for each class. To deal with this problem, cost-sensitive approaches assign different misclassification costs to different classes without disrupting the true data distribution. However, for lack of prior knowledge, the misclassification costs are usually unknown and hard to choose in practice. Moreover, how cost-sensitive learning can improve DBN performance on imbalanced data has not been well studied. This paper proposes an evolutionary cost-sensitive deep belief network (ECS-DBN) for imbalanced classification. ECS-DBN uses adaptive differential evolution to optimize the misclassification costs based on the training data, which provides an effective way to incorporate the evaluation measure (i.e., G-mean) into the objective function. We first optimize the misclassification costs and then apply them to the DBN. Adaptive differential evolution automatically updates its parameters without the need for prior domain knowledge. Experiments show that the proposed approach consistently outperforms the state of the art on both benchmark datasets and a real-world dataset for fault diagnosis in tool condition monitoring.
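The G-mean objective that ECS-DBN feeds into its evolutionary search is easy to state. In the sketch below, a crude random search over a decision threshold stands in for the paper's adaptive differential evolution over misclassification costs (and there is no DBN); all names are illustrative.

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls -- the evaluation measure that
    ECS-DBN incorporates into its objective function."""
    classes = np.unique(y_true)
    recalls = [(y_pred[y_true == c] == c).mean() for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(classes)))

def best_threshold(scores, y_true, n_iter=200,
                   rng=np.random.default_rng(4)):
    """Crude stand-in for the evolutionary step: random search for the
    score threshold that maximises G-mean on the training data."""
    best_t, best_g = 0.5, -1.0
    for t in rng.random(n_iter):
        g = g_mean(y_true, (scores >= t).astype(int))
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g
```

Because G-mean multiplies the recalls of all classes, a classifier that ignores the minority class scores zero, which is exactly why it is a better objective than plain accuracy for skewed data.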
36. A robust correlation analysis framework for imbalanced and dichotomous data with uncertainty. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.08.017]
37. Vong CM, Du J, Wong CM, Cao JW. Postboosting Using Extended G-Mean for Online Sequential Multiclass Imbalance Learning. IEEE Transactions on Neural Networks and Learning Systems 2018;29:6163-6177. [PMID: 29993897] [DOI: 10.1109/tnnls.2018.2826553]
Abstract
In this paper, a novel learning method called postboosting using extended G-mean (PBG) is proposed for online sequential multiclass imbalance learning (OS-MIL) in neural networks. PBG is effective for three reasons. 1) By postadjusting the classification boundary under the extended G-mean, the challenging issue of imbalanced class distribution in sequentially arriving multiclass data can be effectively resolved. 2) A newly derived update rule for online sequential learning is proposed, which produces a high G-mean for the current model while preserving almost all the information of its previous models. 3) A dynamic adjustment mechanism provided by the extended G-mean handles the unresolved and challenging dense-majority problem, as well as two dynamic-change issues, namely dynamic changing data scarcity (DCDS) and dynamic changing data diversity (DCDD). Compared with other OS-MIL methods, PBG is highly effective at resolving DCDS, and it is the only method that resolves dense-majority and DCDD. Furthermore, PBG can directly and effectively handle unscaled data streams. Experiments were conducted on PBG and two popular OS-MIL methods for neural networks on massive binary and multiclass datasets. The analyses of the experimental results show that PBG outperforms the compared methods on all datasets in various respects, including data scarcity, dense-majority, DCDS, DCDD, and unscaled data.
Collapse
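The extended G-mean that drives PBG builds on the standard multiclass G-mean, the geometric mean of per-class recalls. A minimal pure-Python sketch of the standard metric (the extension itself is the paper's contribution and is not reproduced here; the function name is ours):

```python
from collections import defaultdict

def multiclass_g_mean(y_true, y_pred):
    """Standard multiclass G-mean: geometric mean of per-class recalls."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    g = 1.0
    for c in totals:
        g *= hits[c] / totals[c]  # recall of class c
    return g ** (1.0 / len(totals))
```

Because the metric multiplies recalls, a single class with zero recall drives the whole score to zero, which is why it is favoured for imbalanced streams.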
|
38
|
Xing F, Xie Y, Su H, Liu F, Yang L. Deep Learning in Microscopy Image Analysis: A Survey. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:4550-4568. [PMID: 29989994 DOI: 10.1109/tnnls.2017.2766168] [Citation(s) in RCA: 168] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]
Abstract
Computerized microscopy image analysis plays an important role in computer-aided diagnosis and prognosis. Machine learning techniques have powered many aspects of medical investigation and clinical practice. Recently, deep learning has emerged as a leading machine learning tool in computer vision and has attracted considerable attention in biomedical image analysis. In this paper, we provide a snapshot of this fast-growing field, specifically for microscopy image analysis. We briefly introduce the popular deep neural networks and summarize current deep learning achievements in various tasks, such as detection, segmentation, and classification in microscopy image analysis. In particular, we explain the architectures and principles of convolutional neural networks, fully convolutional networks, recurrent neural networks, stacked autoencoders, and deep belief networks, and interpret their formulations or modeling for specific tasks on various microscopy images. In addition, we discuss the open challenges and potential trends of future research in microscopy image analysis using deep learning.
Collapse
|
39
|
Abstract
Classification of data with an imbalanced class distribution poses a significant challenge to most conventional classification learning methods, which assume a relatively balanced class distribution. This paper proposes a novel classification method based on data partitioning and SMOTE for imbalanced learning. The proposed method differs from conventional ones in both the learning and prediction stages. In the learning stage, the proposed method uses the following three steps to learn a class-imbalance-oriented model: (1) partitioning the majority class into several clusters using data partition methods such as K-Means, (2) constructing a novel training set using SMOTE on each data set obtained by merging each cluster with the minority class, and (3) learning a classification model on each training set using conventional classification learning methods, including decision trees, SVMs, and neural networks. A classifier repository consisting of several classification models is thus constructed. In the prediction stage, for a given example to be classified, the proposed method uses the partition model constructed in the learning stage to select a model from the classifier repository to predict the example. Comprehensive experiments on KEEL data sets show that the proposed method outperforms several existing methods on the evaluation measures of recall, G-mean, F-measure, and AUC.
Collapse
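Step (2) above relies on SMOTE's interpolation rule: each synthetic point is placed at a random position on the segment between a minority example and one of its k nearest minority neighbours. An illustrative pure-Python sketch of that core step (the function name and the brute-force neighbour search are ours, not the paper's):

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by interpolating each
    seed point toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # brute-force k nearest neighbours of x within the minority class
        nbrs = sorted((p for p in minority if p is not x),
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = rng.choice(nbrs)
        gap = rng.random()  # random position along the segment x -> nb
        out.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return out
```

The paper applies this interpolation separately on each cluster-plus-minority merge, so the synthetic points stay local to each partition.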
|
40
|
Li L, Li W, Zou B, Wang Y, Tang YY, Han H. Learning With Coefficient-Based Regularized Regression on Markov Resampling. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:4166-4176. [PMID: 29990029 DOI: 10.1109/tnnls.2017.2757140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Big data research has become a hot topic worldwide in recent years. One of the core problems in big data learning is how to extract effective information from huge data. In this paper, we propose a Markov resampling algorithm to draw useful samples for handling the coefficient-based regularized regression (CBRR) problem. The proposed Markov resampling algorithm is a selective sampling method, which can automatically select uniformly ergodic Markov chain (u.e.M.c.) samples according to transition probabilities. Based on u.e.M.c. samples, we analyze the theoretical performance of the CBRR algorithm and generalize the existing results on independent and identically distributed observations. To be specific, when the kernel is infinitely differentiable, the learning rate depending on the sample size $m$ can be arbitrarily close to $\mathcal{O}(m^{-1})$ under a mild regularity condition on the regression function. The good generalization ability of the proposed method is validated by experiments on simulated and real data sets.
Collapse
|
41
|
Wang R, Liu T, Tao D. Multiclass Learning With Partially Corrupted Labels. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2568-2580. [PMID: 28534794 DOI: 10.1109/tnnls.2017.2699783] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Traditional classification systems rely heavily on sufficient training data with accurate labels. However, the quality of the collected data depends on the labelers, some of whom may be inexperienced and produce unexpected labels that degrade the performance of a learning system. In this paper, we investigate the multiclass classification problem where a certain proportion of the training examples are randomly labeled. Specifically, we show that this issue can be formulated as a label noise problem. To perform multiclass classification, we employ the widely used importance reweighting strategy so that learning on noisy data more closely reflects the results on noise-free data. We illustrate the applicability of this strategy to any surrogate loss function and to different classification settings. The proportion of randomly labeled examples is proved to be upper bounded and can be estimated under a mild condition. The convergence analysis ensures the consistency of the learned classifier with the optimal classifier with respect to clean data. Two instantiations of the proposed strategy are also introduced. Experiments on synthetic and real data verify that our approach yields improvements over both traditional classifiers and robust classifiers. Moreover, we empirically demonstrate that the proposed strategy is effective even on asymmetrically noisy data.
Collapse
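For the symmetric-noise special case, the importance weight β(x, y) = P_clean(y|x) / P_noisy(y|x) has a closed form, because P_noisy(y|x) = ρ + (1 − 2ρ) P_clean(y|x) for flip rate ρ < 0.5. A one-line sketch under that assumption (the paper's class-conditional setting is more general, and the function name is ours):

```python
def importance_weight(p_noisy, rho):
    """Importance weight for symmetric label noise with flip rate rho:
    beta = P_clean(y|x) / P_noisy(y|x) = (p_noisy - rho) / ((1 - 2*rho) * p_noisy),
    where p_noisy is an estimate of the noisy posterior P_noisy(y|x)."""
    return (p_noisy - rho) / ((1.0 - 2.0 * rho) * p_noisy)
```

Examples whose observed label looks confident under the noisy distribution (large p_noisy) but would be less likely after denoising are down-weighted, and vice versa; at ρ = 0 every weight is 1 and ordinary risk minimization is recovered.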
|
42
|
Hu J, Yang H, Lyu MR, King I, Man-Cho So A. Online Nonlinear AUC Maximization for Imbalanced Data Sets. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:882-895. [PMID: 28141529 DOI: 10.1109/tnnls.2016.2610465] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Classifying binary imbalanced streaming data is a significant task in both machine learning and data mining. Previously, online area under the receiver operating characteristic (ROC) curve (AUC) maximization has been proposed to seek a linear classifier. However, it is not well suited for handling nonlinearity and heterogeneity of the data. In this paper, we propose the kernelized online imbalanced learning (KOIL) algorithm, which produces a nonlinear classifier for the data by maximizing the AUC score while minimizing a functional regularizer. We address four major challenges that arise from our approach. First, to control the number of support vectors without sacrificing the model performance, we introduce two buffers with fixed budgets to capture the global information on the decision boundary by storing the corresponding learned support vectors. Second, to restrict the fluctuation of the learned decision function and achieve smooth updating, we confine the influence on a new support vector to its k-nearest opposite support vectors. Third, to avoid information loss, we propose an effective compensation scheme that is applied after a replacement is conducted when either buffer is full. With such a compensation scheme, the performance of the learned model is comparable to one learned with infinite budgets. Fourth, to determine good kernels for data similarity representation, we exploit the multiple kernel learning framework to automatically learn a set of kernels. Extensive experiments on both synthetic and real-world benchmark data sets demonstrate the efficacy of our proposed approach.
Collapse
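For intuition, the pairwise-hinge AUC update with fixed-budget buffers can be sketched for the linear special case (KOIL itself is kernelized and adds k-nearest-confined updates, a compensation scheme, and multiple kernel learning, all omitted from this illustrative sketch; names are ours):

```python
import random

def online_auc_linear(stream, dim, budget=10, lr=0.1, seed=0):
    """Online AUC maximization sketch: pairwise hinge updates of a linear
    scorer w against fixed-budget buffers of past positives and negatives."""
    rng = random.Random(seed)
    w = [0.0] * dim
    buf = {+1: [], -1: []}  # fixed-budget buffers per class
    for x, y in stream:
        for z in buf[-y]:  # pair the new example with opposite-class buffer
            # pairwise margin: positive score should exceed negative by 1
            margin = y * sum(wi * (a - b) for wi, a, b in zip(w, x, z))
            if margin < 1.0:  # hinge violated: push the pair's scores apart
                for i in range(dim):
                    w[i] += lr * y * (x[i] - z[i])
        b = buf[y]
        if len(b) < budget:
            b.append(x)
        else:  # random replacement keeps the budget fixed
            b[rng.randrange(budget)] = x
    return w
```

Storing a bounded sample of each class is what makes pairwise AUC objectives feasible in a single pass over the stream.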
|
43
|
Kang Q, Chen X, Li S, Zhou M. A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:4263-4274. [PMID: 28113413 DOI: 10.1109/tcyb.2016.2606104] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Under-sampling is a popular data preprocessing method for dealing with class imbalance problems; it balances datasets to achieve a high classification rate and avoids bias toward majority-class examples. It always uses the full minority data in a training dataset. However, some noisy minority examples may reduce the performance of classifiers. In this paper, a new under-sampling scheme is proposed that incorporates a noise filter before executing resampling. To verify its efficiency, the scheme is implemented on four popular under-sampling methods, i.e., Undersampling + AdaBoost, RUSBoost, UnderBagging, and EasyEnsemble, through benchmarks and significance analysis. Furthermore, this paper also summarizes the relationship between algorithm performance and the imbalance ratio. Experimental results indicate that the proposed scheme significantly improves the original under-sampling-based methods in terms of three popular metrics for imbalanced classification, i.e., the area under the curve (AUC), F-measure, and G-mean.
Collapse
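The scheme's two stages, filter then resample, can be sketched with an ENN-style k-NN noise filter followed by random under-sampling of the majority class (an illustrative stand-in; the paper plugs its filter into the four boosting/bagging ensembles above rather than using it standalone, and the function names are ours):

```python
import random

def nn_label(pt, data, k=3):
    """Majority vote among the k nearest neighbours of pt (labels are +/-1)."""
    nbrs = sorted((p for p in data if p is not pt),
                  key=lambda p: sum((a - b) ** 2 for a, b in zip(pt[0], p[0])))[:k]
    return 1 if sum(lab for _, lab in nbrs) > 0 else -1

def filtered_undersample(data, seed=0):
    """ENN-style noise filter (drop points their neighbours disagree with),
    then random under-sampling of the majority class (-1) down to the
    filtered minority (+1) size."""
    clean = [d for d in data if nn_label(d, data) == d[1]]
    minority = [d for d in clean if d[1] == 1]
    majority = [d for d in clean if d[1] == -1]
    rng = random.Random(seed)
    return minority + rng.sample(majority, min(len(majority), len(minority)))
```

Filtering first means a mislabeled or noisy minority point deep inside the majority region no longer forces the classifier to carve out a spurious decision region.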
|
44
|
Wankhade KK, Jondhale KC, Thool VR. A hybrid approach for classification of rare class data. Knowl Inf Syst 2017. [DOI: 10.1007/s10115-017-1114-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
45
|
Lim P, Goh CK, Tan KC. Evolutionary Cluster-Based Synthetic Oversampling Ensemble (ECO-Ensemble) for Imbalance Learning. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:2850-2861. [PMID: 27337735 DOI: 10.1109/tcyb.2016.2579658] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Class imbalance problems, where the number of samples in each class is unequal, are prevalent in numerous real-world machine learning applications. Traditional methods, which are biased toward the majority class, are ineffective due to the relative severity of misclassifying rare events. This paper proposes a novel evolutionary cluster-based oversampling ensemble framework, which combines a novel cluster-based synthetic data generation method with an evolutionary algorithm (EA) to create an ensemble. The proposed synthetic data generation method is based on contemporary ideas of identifying oversampling regions using clusters. The novel use of an EA serves the twofold purpose of optimizing the parameters of the data generation method while generating diverse examples, leveraging the characteristics of EAs to reduce overall computational cost. The proposed method is evaluated on a set of 40 imbalanced datasets obtained from the University of California, Irvine, database, and outperforms current state-of-the-art ensemble algorithms tackling class imbalance problems.
Collapse
|
46
|
Cai W, Zhang M, Zhang Y. Batch Mode Active Learning for Regression With Expected Model Change. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1668-1681. [PMID: 28113918 DOI: 10.1109/tnnls.2016.2542184] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
While active learning (AL) has been widely studied for classification problems, limited effort has been devoted to AL for regression. In this paper, we introduce a new AL framework for regression, expected model change maximization (EMCM), which aims at choosing the unlabeled data instances that result in the maximum change of the current model once labeled. The model change is quantified as the difference between the current model parameters and the updated parameters after the inclusion of the newly selected examples. In light of the stochastic gradient descent learning rule, we approximate the change as the gradient of the loss function with respect to each single candidate instance. Under the EMCM framework, we propose novel AL algorithms for linear and nonlinear regression models. In addition, by simulating the behavior of the sequential AL policy when applied for k iterations, we further extend the algorithms to batch-mode AL to simultaneously choose a set of the k most informative instances at each query time. Extensive experimental results on both UCI and StatLib benchmark data sets demonstrate that the proposed algorithms are highly effective and efficient.
Collapse
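For the linear squared-loss case, the EMCM criterion reduces to ranking candidates by the norm of the per-example gradient (w·x − y)x, with the unknown label y replaced by predictions from a bootstrap ensemble. A minimal sketch under those assumptions (the selection helper and its calling convention are ours):

```python
def emcm_select(candidates, w, ensemble_preds, k=2):
    """Rank unlabeled candidates by approximate expected model change:
    the gradient norm |w.x - y| * ||x||, averaging over an ensemble's
    label guesses for the unknown y; return the indices of the top k."""
    def change(i, x):
        wx = sum(wi * xi for wi, xi in zip(w, x))       # current prediction
        norm_x = sum(xi * xi for xi in x) ** 0.5
        preds = ensemble_preds[i]                       # bootstrap label guesses
        return sum(abs(wx - y) for y in preds) / len(preds) * norm_x
    ranked = sorted(range(len(candidates)),
                    key=lambda i: change(i, candidates[i]), reverse=True)
    return ranked[:k]  # batch of the k most informative instances
```

A candidate the ensemble already agrees with the current model on produces a near-zero gradient and is skipped, which is exactly the "maximum change once labeled" intuition from the abstract.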
|
47
|
Chen C, Dong D, Qi B, Petersen IR, Rabitz H. Quantum Ensemble Classification: A Sampling-Based Learning Control Approach. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1345-1359. [PMID: 28113872 DOI: 10.1109/tnnls.2016.2540719] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
Collapse
|
48
|
Alejo R, Monroy-de-Jesús J, Ambriz-Polo JC, Pacheco-Sánchez JH. An improved dynamic sampling back-propagation algorithm based on mean square error to face the multi-class imbalance problem. Neural Comput Appl 2017. [DOI: 10.1007/s00521-017-2938-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
49
|
Vluymans S, Triguero I, Cornelis C, Saeys Y. EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.08.026] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
50
|
A Selective Dynamic Sampling Back-Propagation Approach for Handling the Two-Class Imbalance Problem. APPLIED SCIENCES-BASEL 2016. [DOI: 10.3390/app6070200] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
|