1. Perales-Gonzalez C, Fernandez-Navarro F, Carbonero-Ruz M, Perez-Rodriguez J. Global Negative Correlation Learning: A Unified Framework for Global Optimization of Ensemble Models. IEEE Trans Neural Netw Learn Syst 2022; 33:4031-4042. [PMID: 33571099] [DOI: 10.1109/tnnls.2021.3055734]
Abstract
Ensembles are a widely implemented approach in the machine learning community, and their success is traditionally attributed to the diversity within the ensemble. Most of these approaches foster diversity by data sampling or by modifying the structure of the constituent models. In contrast, there is a family of ensemble models in which diversity is explicitly promoted in the error function of the individuals. The negative correlation learning (NCL) ensemble framework is probably the best-known algorithm within this group of methods. This article analyzes NCL and reveals that the framework actually minimizes the combination of the individuals' errors rather than the residuals of the final ensemble. We propose a novel ensemble framework, named global negative correlation learning (GNCL), which focuses on optimizing the global ensemble rather than the individual fitness of its components. An analytical solution for the parameters of the base regressors, based on the NCL framework and the proposed global error function, is also provided under the assumption of fixed basis functions (although the general framework could also be instantiated for neural networks with nonfixed basis functions). The proposed framework is evaluated in extensive experiments on regression and classification data sets. Comparisons with other state-of-the-art ensemble methods confirm that GNCL yields the best overall performance.
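A minimal numeric sketch of the contrast the abstract draws, with an averaging combiner and a λ value chosen purely for illustration (not the authors' implementation):

```python
import numpy as np

def ncl_member_losses(preds, y, lam=0.5):
    """preds: (M,) member outputs for one sample; y: scalar target."""
    d = preds - preds.mean()
    # Classic NCL: each member i minimizes its own squared error plus a
    # correlation penalty p_i = d_i * sum_{j != i} d_j.
    penalties = d * (d.sum() - d)
    return 0.5 * (preds - y) ** 2 + lam * penalties

def global_ensemble_loss(preds, y):
    # GNCL-style target: the residual of the combined (averaged) ensemble.
    return 0.5 * (preds.mean() - y) ** 2

preds = np.array([1.2, 0.8, 1.1])
print(ncl_member_losses(preds, y=1.0))   # one objective per member
print(global_ensemble_loss(preds, 1.0))  # a single global objective
```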
2. Lan G, Gao Z, Tong L, Liu T. Class binarization to neuroevolution for multiclass classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07525-6]
Abstract
Multiclass classification is a fundamental and challenging task in machine learning. Existing techniques for multiclass classification can be categorized as (1) decomposition into binary, (2) extension from binary, and (3) hierarchical classification. Decomposing a multiclass problem into a set of binary problems that can be efficiently solved by binary classifiers, called class binarization, is a popular technique for multiclass classification. Neuroevolution, a general and powerful technique for evolving the structure and weights of neural networks, has been successfully applied to binary classification. In this paper, we apply class binarization techniques to a neuroevolution algorithm, NeuroEvolution of Augmenting Topologies (NEAT), to generate neural networks for multiclass classification. We propose a new method that applies Error-Correcting Output Codes (ECOC) to design class binarization strategies for neuroevolution in multiclass classification. The ECOC strategies are compared with the One-vs-One and One-vs-All class binarization strategies on three well-known datasets: Digit, Satellite, and Ecoli. We analyse their performance from four aspects: multiclass classification degradation, accuracy, evolutionary efficiency, and robustness. The results show that NEAT with ECOC achieves high accuracy with low variance; in particular, it allows a flexible number of binary classifiers and exhibits strong robustness.
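A small sketch of ECOC decoding as used in class binarization; the code matrix and bit predictions below are toy stand-ins, not the paper's NEAT-trained binary classifiers:

```python
import numpy as np

def ecoc_predict(code_matrix, bit_predictions):
    """code_matrix: (n_classes, n_bits) in {0,1}; bit_predictions: (n_bits,)."""
    # Pick the class whose codeword is closest in Hamming distance.
    dists = np.abs(code_matrix - bit_predictions).sum(axis=1)
    return int(np.argmin(dists))

# 4 classes encoded with 6 binary problems (one column per binary classifier).
M = np.array([[0, 0, 0, 1, 1, 1],
              [0, 1, 1, 0, 0, 1],
              [1, 0, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0]])
print(ecoc_predict(M, np.array([1, 0, 1, 0, 1, 0])))  # -> 2
```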
3. Heterogeneous feature ensemble modeling with stochastic configuration networks for predicting furnace temperature of a municipal solid waste incineration process. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07271-9]
4. Prabhakararao E, Dandapat S. Multi-Scale Convolutional Neural Network Ensemble for Multi-Class Arrhythmia Classification. IEEE J Biomed Health Inform 2021; 26:3802-3812. [PMID: 34962891] [DOI: 10.1109/jbhi.2021.3138986]
Abstract
The automated analysis of electrocardiogram (ECG) signals plays a crucial role in the early diagnosis and management of cardiac arrhythmias. The diverse etiology of arrhythmia and the subtle variations in pathological ECG characteristics pose challenges for designing reliable automated methods. Existing methods mostly use a single deep convolutional neural network (DCNN) for arrhythmia classification, which may not adequately represent diverse pathological ECG characteristics. This paper presents a novel way of using an ensemble of multiple DCNN classifiers for effective arrhythmia classification, named Deep Multi-Scale Convolutional neural network Ensemble (DMSCE). Specifically, we designed multiple scale-dependent DCNN expert classifiers with different receptive fields to encode scale-specific pathological ECG characteristics and generate local predictions. A convolutional gating network computes dynamic fusion weights for the experts based on their competencies; these weights aggregate the local predictions into the final diagnosis. Moreover, a new error function with a correlation penalty is formulated to enable interaction and optimal diversity among the experts during training. The model is evaluated on the PTBXL-2020 12-lead ECG and CinC-training2017 single-lead ECG datasets and delivers state-of-the-art performance, with average F1-scores of 84.5% and 88.3%, respectively. Strong performance across various cardiac arrhythmias and good generalization across different leads make the method suitable for reliable remote or in-hospital arrhythmia monitoring.
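A hedged sketch of the competence-weighted fusion step the abstract describes; the array shapes and the softmax gate are assumptions, not the paper's gating network:

```python
import numpy as np

def fuse_experts(expert_probs, gate_logits):
    """expert_probs: (K, C) per-expert class probabilities;
       gate_logits: (K,) gating scores for the current input."""
    w = np.exp(gate_logits - gate_logits.max())
    w /= w.sum()                      # softmax fusion weights, one per expert
    return w @ expert_probs           # (C,) aggregated class probabilities

probs = np.array([[0.7, 0.2, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.2, 0.2, 0.6]])
print(fuse_experts(probs, np.array([2.0, 0.5, -1.0])))
```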
5. Parallel orthogonal deep neural network. Neural Netw 2021; 140:167-183. [PMID: 33765532] [DOI: 10.1016/j.neunet.2021.03.002]
Abstract
Ensemble learning methods combine multiple models to improve performance by exploiting their diversity. The success of these approaches relies heavily on the dissimilarity of the base models forming the ensemble. This diversity can be achieved in many ways, with well-known examples including bagging and boosting. It is the diversity of the models within an ensemble that allows the ensemble to correct the errors made by its members, and consequently leads to higher classification or regression performance. A mistake made by a base model can only be rectified if other members behave differently on that particular instance and provide the aggregator with enough information to make an informed decision. Conversely, a lack of diversity not only lowers model performance but also wastes computational resources. Nevertheless, in current state-of-the-art ensemble approaches there is no guarantee on the level of diversity achieved, and no mechanism ensures that each member will learn a decision boundary different from the others. In this paper, we propose a parallel orthogonal deep learning architecture in which diversity is enforced by design, by imposing an orthogonality constraint. Multiple deep neural networks are created in parallel. At each parallel layer, the outputs of the different base models are subject to Gram-Schmidt orthogonalization. We demonstrate that this approach leads to a high level of diversity from two perspectives: the models make different errors on different parts of the feature space, and they exhibit different levels of uncertainty in their decisions. Experimental results confirm the benefits of the proposed method, compared to standard deep learning models and well-known ensemble methods, in terms of diversity and, as a result, classification performance.
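A minimal sketch of the Gram-Schmidt step applied to the parallel models' outputs at one layer; treating each model's output as a single vector is an illustrative simplification of the batched case:

```python
import numpy as np

def gram_schmidt(outputs):
    """outputs: list of (d,) activation vectors, one per parallel model."""
    ortho = []
    for v in outputs:
        u = v.astype(float).copy()
        for q in ortho:                 # subtract components along earlier models
            u -= (u @ q) * q
        n = np.linalg.norm(u)
        if n > 1e-12:
            ortho.append(u / n)         # enforce pairwise orthogonality
    return ortho

vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])]
for q in gram_schmidt(vs):
    print(q)
```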
6. Lu J, Ding J, Dai X, Chai T. Ensemble Stochastic Configuration Networks for Estimating Prediction Intervals: A Simultaneous Robust Training Algorithm and Its Application. IEEE Trans Neural Netw Learn Syst 2020; 31:5426-5440. [PMID: 32071006] [DOI: 10.1109/tnnls.2020.2967816]
Abstract
Obtaining accurate point predictions of industrial processes' key variables is challenging due to the outliers and noise that are common in industrial data. Prediction intervals (PIs) have therefore been widely adopted to quantify the uncertainty associated with point predictions. To improve prediction accuracy and quantify this uncertainty, this article estimates PIs using ensemble stochastic configuration networks (SCNs) and the bootstrap method. The estimated PIs offer both modeling stability and computational efficiency. To encourage cooperation among the base SCNs and improve the robustness of the ensemble when the training data are contaminated with noise and outliers, a simultaneous robust training method for the ensemble SCNs is developed based on Bayesian ridge regression and M-estimation. Moreover, the hyperparameters of the assumed distributions over the noise and the output weights of the ensemble SCNs are estimated by the expectation-maximization (EM) algorithm, which results in optimal PIs and better prediction accuracy. Finally, the performance of the proposed approach is evaluated on three benchmark data sets and a real-world data set collected from a refinery. The experimental results demonstrate that the proposed approach exhibits better performance in terms of PI quality, prediction accuracy, and robustness.
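A small sketch of the bootstrap-ensemble part of the idea: the spread of an ensemble's point predictions yields an interval. The linear stand-in models and the Gaussian 95% width are assumptions, not trained SCNs or the paper's PI construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pi(models, x):
    preds = np.array([m(x) for m in models])   # one point prediction per member
    half = 1.96 * preds.std(ddof=1)            # model-uncertainty term only
    return preds.mean() - half, preds.mean() + half

# Stand-in "bootstrap" members: same slope, perturbed intercepts.
models = [lambda x, b=b: 2.0 * x + b for b in rng.normal(0, 0.1, 30)]
print(bootstrap_pi(models, 1.5))
```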
7.
8. Perales-González C, Carbonero-Ruz M, Pérez-Rodríguez J, Becerra-Alonso D, Fernández-Navarro F. Negative correlation learning in the extreme learning machine framework. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-04788-9]
9.
10. Chen H, Jiang B, Yao X. Semisupervised Negative Correlation Learning. IEEE Trans Neural Netw Learn Syst 2018; 29:5366-5379. [PMID: 29994737] [DOI: 10.1109/tnnls.2017.2784814]
Abstract
Negative correlation learning (NCL) is an ensemble learning algorithm that introduces a correlation penalty term into the cost function of each individual ensemble member, so that each member minimizes its mean square error together with its error correlation with the rest of the ensemble. This paper analyzes NCL and reveals that adopting a negative correlation term for unlabeled data is beneficial to model performance in the semisupervised learning (SSL) setting. We then propose a novel SSL algorithm, semisupervised NCL (SemiNCL), which considers negative correlation terms for both labeled and unlabeled data. To reduce computational and memory complexity, an accelerated SemiNCL is derived from the distributed least squares algorithm. In addition, we derive a bound on the two parameters of SemiNCL from an analysis of the Hessian matrix of the error function. The new algorithm is evaluated in extensive experiments with various ratios of labeled to unlabeled training data. Comparisons with other state-of-the-art supervised and semisupervised algorithms confirm that SemiNCL achieves the best overall performance.
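A schematic member objective consistent with the abstract's description; the notation, including the two penalty weights, is assumed rather than copied from the paper:

```latex
% Labeled set L, unlabeled set U, ensemble mean \bar{f}, and separate
% penalty weights \lambda_1, \lambda_2 for the two data types (schematic).
e_i = \frac{1}{2}\sum_{x \in L}\bigl(f_i(x) - y(x)\bigr)^2
      + \lambda_1 \sum_{x \in L} p_i(x)
      + \lambda_2 \sum_{x \in U} p_i(x),
\qquad
p_i(x) = \bigl(f_i(x) - \bar{f}(x)\bigr)\sum_{j \neq i}\bigl(f_j(x) - \bar{f}(x)\bigr).
```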
11. Rasti R, Rabbani H, Mehridehnavi A, Hajizadeh F. Macular OCT Classification Using a Multi-Scale Convolutional Neural Network Ensemble. IEEE Trans Med Imaging 2018; 37:1024-1034. [PMID: 29610079] [DOI: 10.1109/tmi.2017.2780115]
Abstract
Computer-aided diagnosis (CAD) of retinal pathologies is an active area in medical image analysis. Owing to the increasing use of retinal optical coherence tomography (OCT) imaging, CAD systems for retinal OCT are essential to assist ophthalmologists in the early detection of ocular diseases and in treatment monitoring. This paper presents a novel CAD system based on a multi-scale convolutional mixture of experts (MCME) ensemble model to identify the normal retina and two common types of macular pathology, namely dry age-related macular degeneration and diabetic macular edema. The proposed MCME modular model is a data-driven neural structure that employs a new cost function for discriminative and fast learning of image features by applying convolutional neural networks to multi-scale sub-images. MCME maximizes the likelihood function of the training data set and ground truth by considering a mixture model that also models the joint interaction between individual experts, using a correlated multivariate component for each expert module instead of modeling only the marginal distributions with independent Gaussian components. Two macular OCT data sets from Heidelberg devices were used to evaluate the method: a local data set of OCT images from 148 subjects and a public data set of 45 OCT acquisitions. For comparison, a wide range of classification measures was computed for the best configurations of the MCME method. With an MCME model of four scale-dependent experts, an average precision of 98.86% and an area under the receiver operating characteristic curve (AUC) of 0.9985 were obtained.
12. Sheng W, Shan P, Chen S, Liu Y, Alsaadi FE. A niching evolutionary algorithm with adaptive negative correlation learning for neural network ensemble. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.03.055]
13.
14. Fernández JC, Cruz-Ramírez M, Hervás-Martínez C. Sensitivity versus accuracy in ensemble models of Artificial Neural Networks from Multi-objective Evolutionary Algorithms. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2781-y]
15.
Abstract
Ensemble learning systems can lower the risk of overfitting that often arises with a single learning model. Unlike ensemble approaches based on re-sampling, negative correlation learning trains all learners in an ensemble simultaneously and cooperatively. However, overfitting has sometimes been observed in negative correlation learning. Two error bounds are therefore introduced into negative correlation learning to prevent overfitting. One is the upper bound of error output (UBEO), which divides the training data into two groups based on the distances between the data points and the formed decision boundary. The other is the lower bound of error rate (LBER), which acts as a learning switch. While the error rate remains higher than LBER, negative correlation learning is applied to the whole training set. As soon as the error rate falls below LBER, negative correlation learning is applied only to the group of data points whose distances to the current decision boundary are within the range set by UBEO. The data points outside this range are no longer learned, since further learning on them would make the decision boundary too complex to classify unseen data well. Experimental results explore how LBER and UBEO lead negative correlation learning towards a robust decision boundary.
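A toy sketch of the LBER/UBEO switch described above; the threshold values and the use of the error output as a proxy for boundary distance are illustrative, not the paper's definitions:

```python
def select_training_set(samples, error_rate, LBER=0.05, UBEO=0.8):
    """samples: list of (x, error_output) pairs under the current ensemble."""
    if error_rate > LBER:
        return [x for x, _ in samples]          # still learning: use everything
    # Error rate fell below LBER: keep only points whose error output
    # (a proxy for closeness to the decision boundary) is within UBEO.
    return [x for x, e in samples if e <= UBEO]

print(select_training_set([("a", 0.2), ("b", 0.95)], error_rate=0.03))  # ['a']
```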
Affiliation(s)
- Yong Liu
- School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan
16. Shim Y, Philippides A, Staras K, Husbands P. Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP. PLoS Comput Biol 2016; 12:e1005137. [PMID: 27760125] [PMCID: PMC5070787] [DOI: 10.1371/journal.pcbi.1005137]
Abstract
We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture-of-experts-type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input-timing-dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP-driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture, which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP-based ensemble architecture.
Affiliation(s)
- Yoonsik Shim
- Centre for Computational Neuroscience and Robotics, University of Sussex, Falmer, Brighton, United Kingdom
- Andrew Philippides
- Centre for Computational Neuroscience and Robotics, University of Sussex, Falmer, Brighton, United Kingdom
- Kevin Staras
- Neuroscience, School of Life Sciences, University of Sussex, Falmer, Brighton, United Kingdom
- Phil Husbands
- Centre for Computational Neuroscience and Robotics, University of Sussex, Falmer, Brighton, United Kingdom
17. Rahman MM, Islam MM, Murase K, Yao X. Layered Ensemble Architecture for Time Series Forecasting. IEEE Trans Cybern 2016; 46:270-283. [PMID: 25751882] [DOI: 10.1109/tcyb.2015.2401038]
Abstract
Time series forecasting (TSF) has been widely used in many application areas such as science, engineering, and finance. The phenomena generating time series are usually unknown, and the information available for forecasting is limited to the past values of the series. It is, therefore, necessary to use an appropriate number of past values, termed the lag, for forecasting. This paper proposes a layered ensemble architecture (LEA) for TSF problems. Our LEA consists of two layers, each of which uses an ensemble of multilayer perceptron (MLP) networks. The first ensemble layer tries to find an appropriate lag, while the second ensemble layer employs the obtained lag for forecasting. Unlike most previous work on TSF, the proposed architecture considers both the accuracy and the diversity of the individual networks in constructing an ensemble. LEA trains different networks in the ensemble on different training sets, with the aim of maintaining diversity among the networks. However, it uses the appropriate lag and combines the best trained networks to construct the ensemble, which indicates LEA's emphasis on accuracy. The proposed architecture has been tested extensively on time series data from the NN3 and NN5 forecasting competitions, as well as on several standard benchmark time series. In terms of forecasting accuracy, our experimental results clearly reveal that LEA outperforms other ensemble and non-ensemble methods.
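A minimal sketch of the lag idea behind the first layer: build supervised pairs per candidate lag so a first-stage model can be scored for each; the scoring step is a placeholder for the paper's MLP ensembles:

```python
import numpy as np

def make_lagged(series, lag):
    """Turn a series into (X, y) pairs: predict x_t from its `lag` predecessors."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

s = np.sin(np.linspace(0, 20, 200))
for lag in (2, 5, 10):
    X, y = make_lagged(s, lag)
    # e.g. fit a model per lag and keep the lag with lowest validation error
    print(lag, X.shape, y.shape)
```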
18. A spatial modelling framework for assessing climate change impacts on freshwater ecosystems: Response of brown trout (Salmo trutta L.) biomass to warming water temperature. Ecol Modell 2015. [DOI: 10.1016/j.ecolmodel.2015.06.023]
19.
20. Kokkinos Y, Margaritis KG. Confidence ratio affinity propagation in ensemble selection of neural network classifiers for distributed privacy-preserving data mining. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.07.065]
21. Chen WC, Tseng LY, Wu CS. A unified evolutionary training scheme for single and ensemble of feedforward neural network. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.05.057]
22. Bao Y, Xiong T, Hu Z. PSO-MISMO modeling strategy for multistep-ahead time series prediction. IEEE Trans Cybern 2014; 44:655-668. [PMID: 23846512] [DOI: 10.1109/tcyb.2013.2265084]
Abstract
Multistep-ahead time series prediction is one of the most challenging research topics in time series modeling and prediction, and it remains under active research. Recently, the multiple-input several multiple-outputs (MISMO) modeling strategy has been proposed as a promising alternative for multistep-ahead prediction, exhibiting advantages over the two currently dominant strategies, the iterated and the direct strategies. Building on the established MISMO strategy, this paper proposes a particle swarm optimization (PSO)-based MISMO modeling strategy, which is capable of determining the number of sub-models in a self-adaptive mode with varying prediction horizons. Rather than deriving crisp divides with equal-sized prediction horizons from the established MISMO, the proposed PSO-MISMO strategy, implemented with neural networks, employs a heuristic to create flexible divides with varying sizes of prediction horizon and to generate the corresponding sub-models, providing considerable flexibility in model construction. The strategy has been validated on simulated and real datasets.
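A toy sketch of the horizon decomposition the strategy builds on: the H-step horizon is split into contiguous blocks, one multi-output sub-model per block; the hand-picked block sizes below stand in for what PSO would search over:

```python
def split_horizon(H, block_sizes):
    """Assign each prediction step 1..H to a sub-model."""
    assert sum(block_sizes) == H, "blocks must cover the whole horizon"
    blocks, start = [], 1
    for size in block_sizes:
        blocks.append(range(start, start + size))   # steps this sub-model predicts
        start += size
    return blocks

# A 12-step-ahead task handled by three multi-output sub-models.
for b in split_horizon(12, [3, 4, 5]):
    print(list(b))
```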
23. Bao Y, Xiong T, Hu Z. Multi-step-ahead time series prediction using multiple-output support vector regression. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.09.010]
24. De Stefano C, Folino G, Fontanella F, Scotto di Freca A. Using Bayesian networks for selecting classifiers in GP ensembles. Inf Sci (N Y) 2014. [DOI: 10.1016/j.ins.2013.09.049]
25. Classification of ECG arrhythmia by a modular neural network based on Mixture of Experts and Negatively Correlated Learning. Biomed Signal Process Control 2013. [DOI: 10.1016/j.bspc.2012.10.005]
26. Fernández A, López V, Galar M, del Jesus MJ, Herrera F. Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2013.01.018]
27.
28. Combining features of negative correlation learning with mixture of experts in proposed ensemble methods. Appl Soft Comput 2012. [DOI: 10.1016/j.asoc.2012.07.022]
29. Wang S, Yao X. Multiclass Imbalance Problems: Analysis and Potential Solutions. IEEE Trans Syst Man Cybern B Cybern 2012; 42:1119-1130. [DOI: 10.1109/tsmcb.2012.2187280]
30. Ebrahimpour R, Sadeghnejad N, Sajedin A, Mohammadi N. Electrocardiogram beat classification via coupled boosting by filtering and preloaded mixture of experts. Neural Comput Appl 2012. [DOI: 10.1007/s00521-012-1063-6]
31.
32.
33. Masoudnia S, Ebrahimpour R, Arani SAAA. Incorporation of a Regularization Term to Control Negative Correlation in Mixture of Experts. Neural Process Lett 2012. [DOI: 10.1007/s11063-012-9221-5]
34. Ebrahimpour R, Arani SAAA, Masoudnia S. Improving combination method of NCL experts using gating network. Neural Comput Appl 2011. [DOI: 10.1007/s00521-011-0746-8]
35. Hybrid modeling for the prediction of leaching rate in leaching process based on negative correlation learning bagging ensemble algorithm. Comput Chem Eng 2011. [DOI: 10.1016/j.compchemeng.2011.02.012]
36.
37.
38. Akhand MAH, Shill PC, Murase K. Hybrid Ensemble Construction with Selected Neural Networks. Journal of Advanced Computational Intelligence and Intelligent Informatics 2011. [DOI: 10.20965/jaciii.2011.p0652]
Abstract
A Neural Network Ensemble (NNE) is a convenient way to improve classification performance. Among the remarkable number of NNE construction methods based on different techniques, Negative Correlation Learning (NCL), bagging, and boosting are the most popular; none of them, however, performs best on all problems. To combine the complementary strengths of the individual methods, we propose two ways to construct hybrid ensembles that couple NCL with bagging and boosting. Each produces a pool of a predefined number of networks using standard NCL and bagging (or boosting), and then uses a genetic algorithm to select an optimal subset of networks from the pool to form the NNE. Results of experiments on a suite of 25 benchmark problems confirm that our proposals consistently perform better than conventional methods while producing more concise ensembles.
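A toy sketch of the pool-then-select step: a small genetic algorithm picks the subset of pooled networks that forms the final ensemble. The fitness function (mean score of the selected members) and the GA operators are illustrative stand-ins for the paper's setup:

```python
import random

random.seed(0)
scores = [random.uniform(0.6, 0.9) for _ in range(20)]   # fake member accuracies

def fitness(mask):
    chosen = [s for s, m in zip(scores, mask) if m]
    return sum(chosen) / len(chosen) if chosen else 0.0

def ga_select(pop_size=30, gens=50, p_mut=0.05):
    pop = [[random.randint(0, 1) for _ in scores] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                    # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(scores))        # one-point crossover
            children.append([1 - g if random.random() < p_mut else g
                             for g in a[:cut] + b[cut:]]) # bit-flip mutation
        pop = parents + children
    return max(pop, key=fitness)

best = ga_select()
print(best, round(fitness(best), 3))  # 0/1 mask: which networks join the NNE
```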
39. Liu Z, Li W, Sun W. A novel method of short-term load forecasting based on multiwavelet transform and multiple neural networks. Neural Comput Appl 2011. [DOI: 10.1007/s00521-011-0715-2]
40. Castro PAD, Von Zuben FJ. Learning Ensembles of Neural Networks by Means of a Bayesian Artificial Immune System. IEEE Trans Neural Netw 2011; 22:304-316. [DOI: 10.1109/tnn.2010.2096823]
41.
42.
Abstract
Negative correlation learning (NCL) is a neural network ensemble learning algorithm that introduces a correlation penalty term into the cost function of each individual network, so that each network minimizes its mean square error (MSE) together with the correlation of the ensemble. This paper analyzes NCL and reveals that the training of NCL (when λ = 1) corresponds to training the entire ensemble as a single learning machine that minimizes the MSE without regularization. This analysis explains why NCL is prone to overfitting noise in the training set. The paper also demonstrates that tuning the correlation parameter λ in NCL by cross-validation cannot overcome the overfitting problem. We therefore propose the regularized negative correlation learning (RNCL) algorithm, which incorporates an additional regularization term for the whole ensemble. RNCL decomposes the ensemble's training objective, comprising MSE and regularization, into a set of sub-objectives, each implemented by an individual neural network. We also provide a Bayesian interpretation of RNCL and an automatic algorithm for optimizing the regularization parameters by Bayesian inference. The RNCL formulation is applicable to any nonlinear estimator minimizing the MSE. Experiments on synthetic and real-world data sets demonstrate that RNCL achieves better performance than NCL, especially when the noise level in the data is nontrivial.
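A compact restatement of the decomposition the abstract describes, in standard NCL notation; the RNCL regularizer shown is schematic:

```latex
% NCL member objective with penalty weight \lambda:
e_i = \tfrac{1}{2}(f_i - y)^2 - \tfrac{\lambda}{2}(f_i - \bar{f})^2,
\qquad \bar{f} = \tfrac{1}{M}\sum_{j=1}^{M} f_j .
% Summing over the ensemble and setting \lambda = 1 leaves only the
% unregularized error of the combined machine:
\sum_{i=1}^{M} e_i \Big|_{\lambda = 1} = \tfrac{M}{2}\,(\bar{f} - y)^2 .
% RNCL restores a per-member regularization term (schematic form):
e_i^{\mathrm{RNCL}} = \tfrac{1}{2}(f_i - y)^2 - \tfrac{1}{2}(f_i - \bar{f})^2
                      + \alpha_i \lVert \mathbf{w}_i \rVert^2 .
```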
Affiliation(s)
- Huanhuan Chen
- The Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Birmingham, UK.
43.
44. Tang K, Lin M, Minku FL, Yao X. Selective negative correlation learning approach to incremental learning. Neurocomputing 2009. [DOI: 10.1016/j.neucom.2008.09.022]
45. Dai Q. The build of a dynamic classifier selection ICBP system and its application to pattern recognition. Neural Comput Appl 2009. [DOI: 10.1007/s00521-009-0263-1]
46. Pugalenthi G, Tang K, Suganthan PN, Chakrabarti S. Identification of structurally conserved residues of proteins in absence of structural homologs using neural network ensemble. Bioinformatics 2008; 25:204-210. [PMID: 19038986] [PMCID: PMC2638999] [DOI: 10.1093/bioinformatics/btn618]
Abstract
Motivation: Various bioinformatics and machine learning techniques have so far been applied to the identification of sequence-conserved and functionally conserved residues in proteins. Although a few computational methods are available for predicting structurally conserved residues from protein structure, almost all of them require homologous structural information and structure-based alignments, which remain a bottleneck in protein structure comparison studies. In this work, we developed a neural network approach for identifying structurally important residues from a single protein structure, without homologous structural information or structural alignments. Results: A neural network ensemble (NNE) method utilizing the negative correlation learning (NCL) approach was developed to identify structurally conserved residues (SCRs) in proteins, using features that represent amino acid conservation and composition, physico-chemical properties, and structural properties. The NCL-NNE method was applied to 6042 SCRs extracted from 496 protein domains. It obtained high prediction sensitivity (92.8%) and quality (Matthews correlation coefficient 0.852) in identifying SCRs. Further benchmarking on 60 protein domains containing 1657 SCRs that were not part of the training and testing datasets shows that NCL-NNE correctly predicts SCRs with ∼90% sensitivity. These results suggest the usefulness of NCL-NNE for identifying SCRs from information derived from a single protein structure; the method could therefore be especially effective in large-scale benchmarking studies where reliable structural homologs and alignments are limited. Availability: The executable for the NCL-NNE algorithm is available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/SCR.htm. Contact: epnsugan@ntu.edu.sg; chakraba@ncbi.nlm.nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ganesan Pugalenthi
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
47. Islam MM, Yao X, Shahriar Nirjon SMS, Islam MA, Murase K. Bagging and boosting negatively correlated neural networks. IEEE Trans Syst Man Cybern B Cybern 2008; 38:771-784. [PMID: 18558541] [DOI: 10.1109/tsmcb.2008.922055]
Abstract
In this paper, we propose two cooperative ensemble learning algorithms, i.e., NegBagg and NegBoost, for designing neural network (NN) ensembles. The proposed algorithms incrementally train different individual NNs in an ensemble using the negative correlation learning algorithm. Bagging and boosting algorithms are used in NegBagg and NegBoost, respectively, to create different training sets for different NNs in the ensemble. The idea behind using negative correlation learning in conjunction with the bagging/boosting algorithm is to facilitate interaction and cooperation among NNs during their training. Both NegBagg and NegBoost use a constructive approach to automatically determine the number of hidden neurons for NNs. NegBoost also uses the constructive approach to automatically determine the number of NNs for the ensemble. The two algorithms have been tested on a number of benchmark problems in machine learning and NNs, including Australian credit card assessment, breast cancer, diabetes, glass, heart disease, letter recognition, satellite, soybean, and waveform problems. The experimental results show that NegBagg and NegBoost require a small number of training epochs to produce compact NN ensembles with good generalization.
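A hedged sketch of the NegBagg idea from the abstract: each member trains on its own bootstrap sample under an NCL-style penalty so members cooperate. The gradient updates on linear models are illustrative, not the paper's constructive NN procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

M, lam, lr = 5, 0.5, 0.01
weights = [rng.normal(scale=0.1, size=3) for _ in range(M)]
bags = [rng.integers(0, len(X), len(X)) for _ in range(M)]  # bootstrap indices

for _ in range(500):
    preds = np.array([X @ w for w in weights])              # (M, n) member outputs
    f_bar = preds.mean(axis=0)                              # ensemble output
    for i in range(M):
        idx = bags[i]                                       # member's own bag
        err = preds[i, idx] - y[idx]                        # own-error term
        div = preds[i, idx] - f_bar[idx]                    # NCL diversity term
        grad = ((err - lam * div)[:, None] * X[idx]).mean(axis=0)
        weights[i] = weights[i] - lr * grad

ens = np.mean([X @ w for w in weights], axis=0)
print(float(np.mean((ens - y) ** 2)))                       # ensemble training MSE
```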
Affiliation(s)
- Md Monirul Islam
- Bangladesh University of Engineering and Technology (BUET), Dhaka 1000, Bangladesh
48. Kim KJ, Cho SB. Ensemble classifiers based on correlation analysis for DNA microarray classification. Neurocomputing 2006. [DOI: 10.1016/j.neucom.2006.03.002]
49. Yao X, Liu Y, Li J, He J, Frayn C. Current developments and future directions of bio-inspired computation and implications for ecoinformatics. Ecol Inform 2006. [DOI: 10.1016/j.ecoinf.2005.07.001]
50. Valentini G. An Experimental Bias-Variance Analysis of SVM Ensembles Based on Resampling Techniques. IEEE Trans Syst Man Cybern B Cybern 2005; 35:1252-1271. [PMID: 16366250] [DOI: 10.1109/tsmcb.2005.850183]
Abstract
Recently, bias-variance decomposition of error has been used as a tool to study the behavior of learning algorithms and to develop new ensemble methods well suited to the bias-variance characteristics of base learners. We propose methods and procedures, based on Domingos' unified bias-variance theory, to evaluate and quantitatively measure the bias-variance decomposition of error in ensembles of learning machines. We apply these methods to study and compare the bias-variance characteristics of single support vector machines (SVMs) and ensembles of SVMs based on resampling techniques, and their relationship to the cardinality of the training samples. In particular, we present an experimental bias-variance analysis of bagged and random aggregated ensembles of SVMs in order to verify their theoretical variance-reduction properties. The experimental analysis quantitatively characterizes the relationship between bagging and random aggregating, and explains why ensembles built on small subsamples of the data work well with large databases. Our analysis also suggests new directions for research to improve on classical bagging.
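A small sketch of the measurement the abstract describes: bias and variance estimated by retraining a learner on bootstrap replicates (a polynomial fit stands in for the SVM):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * x)                        # true target function
x_test = np.linspace(0, 3, 50)

preds = []
for _ in range(200):                               # 200 bootstrap replicates
    x = rng.uniform(0, 3, 40)
    y = f(x) + rng.normal(scale=0.3, size=40)
    coeffs = np.polyfit(x, y, deg=3)               # "train" on this replicate
    preds.append(np.polyval(coeffs, x_test))
preds = np.array(preds)

bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)   # squared bias
variance = preds.var(axis=0).mean()                      # average variance
print(f"bias^2 ≈ {bias2:.4f}, variance ≈ {variance:.4f}")
```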
Affiliation(s)
- Giorgio Valentini
- Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Italy.