1. Isla-Cernadas D, Fernandez-Delgado M, Cernadas E, Sirsat MS, Maarouf H, Barro S. Closed-Form Gaussian Spread Estimation for Small and Large Support Vector Classification. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4336-4344. PMID: 40031077. DOI: 10.1109/tnnls.2024.3377370.
Abstract
The support vector machine (SVM) with a Gaussian kernel often achieves state-of-the-art performance in classification problems, but requires tuning of the kernel spread. Most optimization methods for spread tuning require training, so they are slow and not suited to large-scale datasets. We formulate an analytic expression that calculates, directly from the data and without iterative search, the spread minimizing the difference between the Gaussian and ideal kernel matrices. The proposed direct gamma tuning (DGT) matches the performance of state-of-the-art approaches while running one to two orders of magnitude faster on 30 small datasets. Combined with random sampling of training patterns, it also scales to large classification problems. In experiments with 20 large datasets of up to 31 million patterns, it is faster than and performs significantly better than the linear SVM, and it is also faster than iterative minimization. Code is available upon paper acceptance from this link: https://persoal.citius.usc.es/manuel.fernandez.delgado/papers/dgt/index.html and from CodeOcean: https://codeocean.com/capsule/4271163/tree/v1.
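For readers who want to experiment with the idea, here is a minimal sketch of the objective the abstract describes: choosing the Gaussian spread that minimizes the distance between the Gaussian kernel matrix and the ideal (label-derived) kernel matrix. It does not reproduce the paper's closed-form expression; the one-dimensional numerical search and all function names below are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's closed form): pick the Gaussian spread
# gamma that minimizes the Frobenius distance between the Gaussian kernel
# matrix and the "ideal" kernel built from the labels.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.spatial.distance import pdist, squareform

def alignment_objective(gamma, sq_dists, ideal):
    K = np.exp(-gamma * sq_dists)          # Gaussian kernel matrix
    return np.linalg.norm(K - ideal, "fro")

def tune_gamma(X, y):
    sq_dists = squareform(pdist(X, "sqeuclidean"))
    ideal = (y[:, None] == y[None, :]).astype(float)   # 1 within class, 0 across
    res = minimize_scalar(alignment_objective, bounds=(1e-4, 1e2),
                          args=(sq_dists, ideal), method="bounded")
    return res.x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = (X[:, 0] > 0).astype(int)
    print("selected gamma ~", tune_gamma(X, y))
```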
2. Glänzer L, Göpfert L, Schmitz-Rode T, Slabu I. Navigating predictions at nanoscale: a comprehensive study of regression models in magnetic nanoparticle synthesis. J Mater Chem B 2024; 12:12652-12664. PMID: 39503353. PMCID: PMC11563307. DOI: 10.1039/d4tb02052a.
Abstract
The applicability of magnetic nanoparticles (MNP) depends strongly on their physical properties, especially their size. Synthesizing MNP with a specific size is challenging because of the large number of interdependent parameters that control their properties during synthesis. In general, synthesis control cannot be described by white-box approaches (empirical, simulation or physics based). To handle synthesis control, this study presents machine learning based approaches for predicting the size of MNP during their synthesis. A dataset comprising 17 synthesis parameters and the corresponding MNP sizes was analyzed. Eight regression algorithms (ridge, lasso, elastic net, decision trees, random forest, gradient boosting, support vector regression and multilayer perceptron) were evaluated. Model performance was assessed via root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and the standard deviation of the residuals. Support vector regression (SVR) exhibited the lowest RMSE (3.44) and a residual standard deviation of 5.13, demonstrating a favorable balance between accuracy and consistency among these methods. Qualitative factors such as adaptability to online learning and robustness against outliers were additionally considered. Altogether, SVR emerged as the most suitable approach to predict MNP sizes due to its ability to continuously learn from new data and its resilience to noise, making it well suited for real-time applications with varying data quality. In this way, a feasible optimization framework for automated and self-regulated MNP synthesis was implemented. Key challenges included the limited dataset size, potential violations of modeling assumptions, and sensitivity to hyperparameters. Strategies such as data regularization, correlation analysis, and grid search for model hyperparameters were employed to mitigate these issues.
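A hedged sketch of the kind of model comparison the abstract describes: several scikit-learn regressors evaluated with RMSE, MAE and MAPE. The synthetic data, the 17-feature layout and all hyperparameters are placeholders, not the study's dataset or configuration.

```python
# Sketch of the regression-model comparison described above (synthetic data,
# placeholder hyperparameters; not the study's dataset or settings).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=17, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "ridge": Ridge(), "lasso": Lasso(), "elastic net": ElasticNet(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "MLP": make_pipeline(StandardScaler(), MLPRegressor(max_iter=2000, random_state=0)),
}

for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    mae = mean_absolute_error(y_test, pred)
    mape = mean_absolute_percentage_error(y_test, pred)
    print(f"{name:17s} RMSE={rmse:8.2f} MAE={mae:8.2f} MAPE={mape:8.2f}")
```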
Affiliation(s)
- Lukas Glänzer: Institute of Applied Medical Engineering, Helmholtz Institute, Medical Faculty, RWTH Aachen University, Germany.
- Lennart Göpfert: Institute of Applied Medical Engineering, Helmholtz Institute, Medical Faculty, RWTH Aachen University, Germany.
- Thomas Schmitz-Rode: Institute of Applied Medical Engineering, Helmholtz Institute, Medical Faculty, RWTH Aachen University, Germany.
- Ioana Slabu: Institute of Applied Medical Engineering, Helmholtz Institute, Medical Faculty, RWTH Aachen University, Germany.
3. Thakur SS, Poddar P, Roy RB. Real-time prediction of smoking activity using machine learning based multi-class classification model. Multimedia Tools and Applications 2022; 81:14529-14551. PMID: 35233178. PMCID: PMC8874745. DOI: 10.1007/s11042-022-12349-6.
Abstract
Smoking cessation efforts can be greatly influenced by providing just-in-time intervention to individuals who are trying to quit smoking. Detecting smoking accurately among the confounding activities of daily living (ADLs) monitored by a wearable device is a challenging and intriguing research problem. This study aims to develop a machine learning based modeling framework to identify smoking activity among confounding ADLs in real time using streaming data from a wrist-worn IMU (6-axis inertial measurement unit) sensor. A low-cost wrist-wearable device was designed and developed to collect raw sensor data from subjects performing the activities. A sliding-window mechanism was used to process the streaming raw sensor data and extract several time-domain, frequency-domain, and descriptive features. Hyperparameter tuning and feature selection were performed to identify the best hyperparameters and features, respectively. Subsequently, multi-class classification models were developed and validated using in-sample and out-of-sample testing. The developed models obtained predictive accuracy (area under the receiver operating characteristic curve) of up to 98.7% for predicting smoking activity. The findings of this study will lead to a novel application of wearable devices to accurately detect smoking activity in real time. It will further help healthcare professionals monitor their patients who smoke by providing just-in-time intervention to help them quit. The application of this framework can be extended to more preventive healthcare use cases and to the detection of other activities of interest. Supplementary information: the online version contains supplementary material available at 10.1007/s11042-022-12349-6.
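The following sketch illustrates the sliding-window feature-extraction step described above on a synthetic 6-axis IMU stream. The window length, overlap, feature set and classifier are illustrative assumptions, not the authors' configuration.

```python
# Sketch: sliding-window time/frequency-domain features from a 6-axis IMU
# stream, fed to a multi-class classifier (synthetic data, assumed settings).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def window_features(window):
    """Per-axis time-domain and simple frequency-domain descriptors."""
    feats = []
    for axis in window.T:                       # 6 sensor axes
        spectrum = np.abs(np.fft.rfft(axis))
        feats += [axis.mean(), axis.std(), axis.min(), axis.max(),
                  np.argmax(spectrum),          # dominant frequency bin
                  spectrum.sum()]               # spectral energy
    return np.array(feats)

def sliding_windows(stream, labels, size=128, step=64):
    X, y = [], []
    for start in range(0, len(stream) - size + 1, step):
        X.append(window_features(stream[start:start + size]))
        y.append(np.bincount(labels[start:start + size]).argmax())  # majority label
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
stream = rng.normal(size=(5000, 6))             # stand-in for raw IMU samples
labels = rng.integers(0, 4, size=5000)          # stand-in activity labels
X, y = sliding_windows(stream, labels)
clf = RandomForestClassifier(random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```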
Affiliation(s)
- Saurabh Singh Thakur: Rajendra Mishra School of Engineering Entrepreneurship, Indian Institute of Technology, Kharagpur, India.
- Pradeep Poddar: Department of Metallurgical and Materials Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India.
- Ram Babu Roy: Rajendra Mishra School of Engineering Entrepreneurship, Indian Institute of Technology, Kharagpur, India.
4. Polus JS, Bloomfield RA, Vasarhelyi EM, Lanting BA, Teeter MG. Machine Learning Predicts the Fall Risk of Total Hip Arthroplasty Patients Based on Wearable Sensor Instrumented Performance Tests. J Arthroplasty 2021; 36:573-578. PMID: 32928593. DOI: 10.1016/j.arth.2020.08.034.
Abstract
BACKGROUND: The prevalence of falls affects the well-being of aging adults and places an economic burden on the healthcare system. Integration of wearable sensors into existing fall risk assessment tools enables objective data collection that describes the functional ability of patients. In this study, supervised machine learning was applied to sensor-derived metrics to predict the fall risk of patients following total hip arthroplasty. METHODS: At preoperative, 2-week, and 6-week postoperative appointments, patients (n = 72) were instrumented with sensors while they performed the timed-up-and-go walking test. Preoperative and 2-week postoperative data were used to form the feature sets, and 6-week total times were used as labels. Support vector machine and linear discriminant analysis classifier models were developed and tested on various combinations of feature sets and feature reduction schemes. Using a 10-fold leave-some-subjects-out testing scheme, the accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were evaluated for all models. RESULTS: A high-performance model (accuracy = 0.87, sensitivity = 0.97, specificity = 0.46, AUC = 0.82) was obtained with a support vector machine classifier using sensor-derived metrics from only the preoperative appointment. An overall improved performance (accuracy = 0.90, sensitivity = 0.93, specificity = 0.59, AUC = 0.88) was achieved with a linear discriminant analysis classifier when 2-week postoperative data were added to the preoperative data. CONCLUSION: The high accuracy of the fall risk prediction models is valuable for patients, clinicians, and the healthcare system. High-risk patients can implement preventative measures, and low-risk patients can be directed to enhanced recovery care programs.
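A sketch, under assumptions, of the evaluation protocol outlined above: SVM and LDA classifiers scored with a leave-some-subjects-out split and AUC. The synthetic features, label rule and fold setup are placeholders; they are not the study's data or exact scheme.

```python
# Sketch: leave-some-subjects-out evaluation of SVM and LDA fall-risk
# classifiers (synthetic features and labels; assumed protocol details).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_patients, n_features = 72, 12
X = rng.normal(size=(n_patients, n_features))       # sensor-derived metrics (stand-in)
y = (X[:, 0] + rng.normal(scale=0.5, size=n_patients) > 0).astype(int)  # fall-risk label
groups = np.arange(n_patients)                       # one group per patient

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear", probability=True)),
    "LDA": LinearDiscriminantAnalysis(),
}
cv = GroupKFold(n_splits=10)                         # subjects never split across folds
for name, model in models.items():
    aucs = []
    for train_idx, test_idx in cv.split(X, y, groups):
        model.fit(X[train_idx], y[train_idx])
        if len(np.unique(y[test_idx])) < 2:
            continue                                 # AUC undefined for single-class folds
        scores = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    print(name, "mean AUC:", round(float(np.mean(aucs)), 3))
```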
Affiliation(s)
- Jennifer S Polus: School of Biomedical Engineering, Western University, London, Ontario, Canada; Imaging Research Laboratories, Robarts Research Institute, Western University, London, Ontario, Canada.
- Riley A Bloomfield: Imaging Research Laboratories, Robarts Research Institute, Western University, London, Ontario, Canada; Department of Electrical and Computer Engineering, Western University, London, Ontario, Canada.
- Edward M Vasarhelyi: Division of Orthopaedic Surgery, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada.
- Brent A Lanting: Division of Orthopaedic Surgery, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada.
- Matthew G Teeter: School of Biomedical Engineering, Western University, London, Ontario, Canada; Imaging Research Laboratories, Robarts Research Institute, Western University, London, Ontario, Canada; Division of Orthopaedic Surgery, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada; Surgical Innovation Program, Lawson Health Research Institute, London, Ontario, Canada.
5. Espel D, Courty S, Auda Y, Sheeren D, Elger A. Submerged macrophyte assessment in rivers: An automatic mapping method using Pléiades imagery. Water Research 2020; 186:116353. PMID: 32919140. DOI: 10.1016/j.watres.2020.116353.
Abstract
Submerged macrophyte monitoring is a major concern for hydrosystem management, particularly for understanding and preventing the potential impacts of global change on ecological functions and services. Macrophyte distribution assessments in rivers are still primarily carried out using field monitoring or manual photo-interpretation of aerial images. Given the lack of applications in fluvial environments, developing operational, low-cost and less time-consuming tools able to automatically map and monitor submerged macrophyte distribution is crucial to support effective management programs. In this study, the suitability of very fine resolution (50 cm) multispectral Pléiades satellite imagery for estimating submerged macrophyte cover, at the scale of a 1 km river section, was investigated. The performance of nonparametric regression methods (based on two reliable and well-known machine learning algorithms for remote sensing applications, Random Forest and Support Vector Regression) was compared for several spectral datasets, testing the relevance of four spectral bands (red, green, blue and near-infrared) and two vegetation indices (the Normalized Difference Vegetation Index, NDVI, and the Green-Red Vegetation Index, GRVI), and for several field sampling configurations. Both machine learning algorithms applied to a Pléiades image predicted macrophyte cover in river ecosystems reasonably well, with promising performance metrics (R² above 0.7 and RMSE around 20%). The Random Forest algorithm combined with the four spectral bands from the Pléiades image was the most efficient, particularly for extreme cover values (0% and 100%). Our study also demonstrated that a larger number of fine-scale field sampling entities clearly yielded better cover predictions than a smaller number of larger sampling entities.
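A hedged sketch of the pixel-wise regression setup the abstract describes: vegetation indices derived from four spectral bands and a Random Forest regressor predicting percent cover. The reflectance values, the synthetic cover target and the model settings are assumptions for illustration only.

```python
# Sketch: vegetation indices from 4-band imagery plus Random Forest regression
# of macrophyte cover (synthetic reflectance values; assumed settings).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pixels = 2000
red, green, blue, nir = rng.uniform(0.01, 0.6, size=(4, n_pixels))

ndvi = (nir - red) / (nir + red)        # Normalized Difference Vegetation Index
grvi = (green - red) / (green + red)    # Green-Red Vegetation Index
X = np.column_stack([red, green, blue, nir, ndvi, grvi])
# Synthetic cover target (0-100 %), loosely driven by the vegetation indices.
cover = np.clip(60 * ndvi + 20 * grvi + 30 + rng.normal(scale=10, size=n_pixels), 0, 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, cover, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("R2            :", round(r2_score(y_te, pred), 3))
print("RMSE (% cover):", round(mean_squared_error(y_te, pred) ** 0.5, 2))
```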
Affiliation(s)
- Diane Espel: Laboratoire Ecologie Fonctionnelle et Environnement, Université de Toulouse, CNRS, Toulouse, France; Adict Solutions, Toulouse, France.
- David Sheeren: Université de Toulouse, INRAE, UMR DYNAFOR, Castanet-Tolosan, France.
- Arnaud Elger: Laboratoire Ecologie Fonctionnelle et Environnement, Université de Toulouse, CNRS, Toulouse, France.
6. Ding L, Liao S, Liu Y, Liu L, Zhu F, Yao Y, Shao L, Gao X. Approximate Kernel Selection via Matrix Approximation. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:4881-4891. PMID: 31945003. DOI: 10.1109/tnnls.2019.2958922.
Abstract
Kernel selection is of fundamental importance for the generalization of kernel methods. This article proposes an approximate approach to kernel selection that exploits the approximability of kernel selection and the computational virtue of kernel matrix approximation. We define approximate consistency to measure the approximability of the kernel selection problem. Based on the analysis of approximate consistency, we solve the theoretical problem of whether, under what conditions, and at what speed the approximate criterion is close to the accurate one, establishing the foundations of approximate kernel selection. We introduce two selection criteria based on error estimation and prove the approximate consistency of the multilevel circulant matrix (MCM) approximation and the Nyström approximation under these criteria. Under the theoretical guarantees of approximate consistency, we design approximate algorithms for kernel selection, which exploit the computational advantages of the MCM and Nyström approximations to conduct kernel selection in linear or quasi-linear complexity. We experimentally validate the theoretical results for approximate consistency and evaluate the effectiveness of the proposed kernel selection algorithms.
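As a hedged illustration of the general idea only (not the article's criteria or theory), the snippet below scores candidate Gaussian kernels cheaply through a Nyström approximation and a validation-error estimate, then picks the best-scoring one. Dataset, candidate grid and estimator are assumptions.

```python
# Sketch: approximate kernel selection by scoring candidate Gaussian kernels
# with a cheap Nystrom approximation (illustrative criterion, assumed settings).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

best_gamma, best_score = None, -np.inf
for gamma in [0.001, 0.01, 0.1, 1.0]:              # candidate kernel parameters
    approx_model = make_pipeline(
        Nystroem(kernel="rbf", gamma=gamma, n_components=100, random_state=0),
        RidgeClassifier(),
    )
    score = cross_val_score(approx_model, X, y, cv=3).mean()   # cheap error estimate
    if score > best_score:
        best_gamma, best_score = gamma, score
print("selected gamma:", best_gamma, "estimated accuracy:", round(best_score, 3))
```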
10. Oneto L, Navarin N, Donini M, Ridella S, Sperduti A, Aiolli F, Anguita D. Learning With Kernels: A Local Rademacher Complexity-Based Analysis With Application to Graph Kernels. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:4660-4671. PMID: 29990207. DOI: 10.1109/tnnls.2017.2771830.
Abstract
When dealing with kernel methods, one has to decide which kernel and which hyperparameter values to use. Resampling techniques can address this issue, but these procedures are time-consuming. The problem is particularly challenging when dealing with structured data, in particular with graphs, since several kernels for graph data have been proposed in the literature but no clear relationship among them in terms of learning properties has been defined. In these cases, exhaustive search seems to be the only reasonable approach. Recently, the global Rademacher complexity (RC) and local Rademacher complexity (LRC), two powerful measures of the complexity of a hypothesis space, have been shown to be suited for studying the properties of kernels. In particular, the LRC can bound the generalization error of a hypothesis chosen in a space by disregarding those hypotheses that will not be taken into account by any learning procedure because of their high error. In this paper, we show a new approach to efficiently bound the RC of the space induced by a kernel, since its exact computation is an NP-hard problem. We then show for the first time that the RC can be used to estimate the accuracy and expressivity of different graph kernels under different parameter configurations. The authors' claims are supported by experimental results on several real-world graph data sets.
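A small, hedged illustration related to this topic: the classical trace-based bound on the empirical Rademacher complexity of a kernel-induced class with RKHS norm at most B, namely (B/n) * sqrt(trace(K)), computed for a few standard kernels. This is the textbook global bound, not the paper's new RC estimation approach, and it uses plain vector data rather than graph kernels; all settings are assumptions.

```python
# Sketch: classical trace bound on the empirical Rademacher complexity of a
# kernel class with bounded RKHS norm (textbook bound, not the paper's method).
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

def rademacher_trace_bound(K, B=1.0):
    """(B / n) * sqrt(trace(K)) for the class {x -> <w, phi(x)> : ||w|| <= B}."""
    n = K.shape[0]
    return (B / n) * np.sqrt(np.trace(K))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
kernels = {
    "linear": linear_kernel(X),
    "polynomial (deg 3)": polynomial_kernel(X, degree=3),
    "RBF (gamma=0.1)": rbf_kernel(X, gamma=0.1),
}
for name, K in kernels.items():
    print(f"{name:20s} RC bound <= {rademacher_trace_bound(K):.4f}")
```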
11. Randomized learning: Generalization performance of old and new theoretically grounded algorithms. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.10.066.
12. Pan X, Yang Z, Xu Y, Wang L. Safe Screening Rules for Accelerating Twin Support Vector Machine Classification. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1876-1887. PMID: 28422692. DOI: 10.1109/tnnls.2017.2688182.
Abstract
The twin support vector machine (TSVM) is widely used in classification problems, but it is not efficient enough for large-scale data sets. Furthermore, to get the optimal parameter, the exhaustive grid search method is applied to TSVM. It is very time-consuming, especially for multiparameter models. Although many techniques have been presented to solve these problems, all of them always affect the performance of TSVM to some extent. In this paper, we propose a safe screening rule (SSR) for linear-TSVM, and give a modified SSR (MSSR) for nonlinear TSVM, which contains multiple parameters. The SSR and MSSR can delete most training samples and reduce the scale of TSVM before solving it. Sequential versions of SSR and MSSR are further introduced to substantially accelerate the whole parameter tuning process. One important advantage of SSR and MSSR is that they are safe, i.e., we can obtain the same solution as the original problem by utilizing them. Experiments on eight real-world data sets and an imbalanced data set with different imbalanced ratios demonstrate the efficiency and safety of SSR and MSSR.
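For intuition only, here is a naive pre-screening heuristic in the same spirit as sample screening: train a cheap pilot model on a subsample, keep points near the resulting decision boundary, and retrain on the reduced set. Unlike the SSR/MSSR of the paper, this heuristic is not safe (it does not guarantee the same solution) and it uses a standard linear SVM rather than a TSVM; all settings are assumptions.

```python
# Naive screening heuristic (NOT the paper's safe rule): keep only points near
# a preliminary decision boundary before the final SVM training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

pilot = LinearSVC(dual=False).fit(X[:500], y[:500])      # cheap pilot model
margin = np.abs(pilot.decision_function(X))
keep = margin < 1.5                                      # points near the boundary
print(f"kept {keep.sum()} of {len(X)} samples")

full = LinearSVC(dual=False).fit(X, y)
screened = LinearSVC(dual=False).fit(X[keep], y[keep])
print("full-data accuracy    :", round(full.score(X, y), 3))
print("screened-data accuracy:", round(screened.score(X, y), 3))
```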
16. Gastaldo P, Bisio F, Gianoglio C, Ragusa E, Zunino R. Learning with similarity functions: A novel design for the extreme learning machine. Neurocomputing 2017. DOI: 10.1016/j.neucom.2016.05.116.
17. Oneto L, Ridella S, Anguita D. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognit Lett 2017. DOI: 10.1016/j.patrec.2017.02.006.
18. Empirical comparison of cross-validation and internal metrics for tuning SVM hyperparameters. Pattern Recognit Lett 2017. DOI: 10.1016/j.patrec.2017.01.007.
21. A local Vapnik–Chervonenkis complexity. Neural Netw 2016; 82:62-75. DOI: 10.1016/j.neunet.2016.07.002.
22. Control for Ship Course-Keeping Using Optimized Support Vector Machines. Algorithms 2016. DOI: 10.3390/a9030052.
23. Vyas BY, Das B, Maheshwari RP. Improved Fault Classification in Series Compensated Transmission Line: Comparative Evaluation of Chebyshev Neural Network Training Algorithms. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:1631-1642. PMID: 25314714. DOI: 10.1109/tnnls.2014.2360879.
Abstract
This paper presents the Chebyshev neural network (ChNN) as an improved artificial intelligence technique for power system protection studies and examines the performance of two ChNN learning algorithms for fault classification in a series compensated transmission line. The training algorithms are least-square Levenberg-Marquardt (LSLM) and the recursive least-square algorithm with forgetting factor (RLSFF). The performance of these algorithms is assessed based on their generalization capability in relating the fault current parameters to a fault event in the transmission line. The proposed algorithm is fast in response as it utilizes post-fault samples of the three phase currents measured at the relaying end corresponding to only a half-cycle duration. After being trained with only a small part of the generated fault data, the algorithms were tested over a large number of fault cases with wide variation of system and fault parameters. Based on the studies carried out in this paper, it was found that although the RLSFF algorithm is faster for training the ChNN in the fault classification application for series compensated transmission lines, the LSLM algorithm has the best accuracy in testing. The results prove that the proposed ChNN-based method is accurate, fast, easy to design, and immune to the level of compensation. Thus, it is suitable for digital relaying applications.
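A hedged sketch of the basic building block behind a Chebyshev neural network: inputs scaled to [-1, 1], expanded with Chebyshev polynomials, and mapped to class scores by a linear least-squares fit. The LSLM and RLSFF training algorithms compared in the paper, and the fault-current features, are not reproduced; the dataset and expansion order are assumptions.

```python
# Sketch: Chebyshev functional expansion + linear least-squares classifier
# (illustrates the ChNN building block only; not the paper's LSLM/RLSFF training).
import numpy as np
from numpy.polynomial.chebyshev import chebvander
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def chebyshev_expand(X, order=4):
    # Per-feature Chebyshev polynomials T_0..T_order, concatenated column-wise.
    return np.hstack([chebvander(X[:, j], order) for j in range(X.shape[1])])

X, y = make_classification(n_samples=600, n_features=6, n_classes=3,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_tr)       # Chebyshev domain
Phi_tr = chebyshev_expand(scaler.transform(X_tr))
Phi_te = chebyshev_expand(np.clip(scaler.transform(X_te), -1, 1))

T = np.eye(3)[y_tr]                                           # one-hot targets
W, *_ = np.linalg.lstsq(Phi_tr, T, rcond=None)                # least-squares weights
pred = Phi_te @ W
print("test accuracy:", round(float((pred.argmax(axis=1) == y_te).mean()), 3))
```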
24. Oneto L, Bisio F, Cambria E, Anguita D. Statistical Learning Theory and ELM for Big Social Data Analysis. IEEE Computational Intelligence Magazine 2016. DOI: 10.1109/mci.2016.2572540.
26. Prediction of Military Vehicle's Drawbar Pull Based on an Improved Relevance Vector Machine and Real Vehicle Tests. Sensors 2016; 16:351. PMID: 26978359. PMCID: PMC4813926. DOI: 10.3390/s16030351.
Abstract
The scientific and effective prediction of drawbar pull is of great importance in the evaluation of military vehicle trafficability. Nevertheless, existing prediction models have many inherent limitations. In this framework, a multiple-kernel relevance vector machine model (MkRVM) combining a Gaussian kernel and a polynomial kernel is proposed to predict drawbar pull. Nonlinear decreasing inertia weight particle swarm optimization (NDIWPSO) is employed for parameter optimization. As the relations between drawbar pull and its influencing factors had not been tested on real vehicles, a series of experimental analyses based on real vehicle test data was conducted to confirm the effective influencing factors. A dynamic testing system was applied to conduct field tests and obtain the required data. Gaussian kernel RVM, polynomial kernel RVM, a support vector machine (SVM) and a generalized regression neural network (GRNN) were also compared with the MkRVM model. The results indicate that the MkRVM model is preferable in this case. Finally, the proposed model is compared with the traditional prediction model of drawbar pull, and the results show that the MkRVM model significantly improves prediction accuracy. This indicates great potential for improved RVM models in further research on wheel-soil interaction.
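To illustrate the multiple-kernel idea only (the relevance vector machine and the NDIWPSO optimizer of the paper are not reproduced here), the sketch below combines a Gaussian and a polynomial Gram matrix with a mixing weight and feeds the precomputed kernel to a kernel ridge regressor as a stand-in predictor. The data, kernel parameters and mixing weight are assumptions.

```python
# Sketch: weighted combination of Gaussian and polynomial kernels fed to a
# precomputed-kernel regressor (stand-in for the MkRVM; assumed parameters).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def multi_kernel(A, B, weight=0.6, gamma=0.1, degree=2):
    # Convex combination of a Gaussian kernel and a polynomial kernel.
    return (weight * rbf_kernel(A, B, gamma=gamma)
            + (1 - weight) * polynomial_kernel(A, B, degree=degree))

model = KernelRidge(alpha=1.0, kernel="precomputed")
model.fit(multi_kernel(X_tr, X_tr), y_tr)
pred = model.predict(multi_kernel(X_te, X_tr))
rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
print("RMSE:", round(rmse, 2))
```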
27. Oneto L, Ridella S, Anguita D. Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Mach Learn 2015. DOI: 10.1007/s10994-015-5540-x.
28. Tsamardinos I, Rakhshani A, Lagani V. Performance-Estimation Properties of Cross-Validation-Based Protocols with Simultaneous Hyper-Parameter Optimization. International Journal on Artificial Intelligence Tools 2015. DOI: 10.1142/s0218213015400230.
Abstract
In a typical supervised data analysis task, one needs to (a) select an optimal combination of learning methods (e.g., for variable selection and classification) and tune their hyper-parameters (e.g., K in K-NN), also called model selection, and (b) provide an estimate of the performance of the final, reported model. Combining the two tasks is not trivial: when one selects the set of hyper-parameters that seem to provide the best estimated performance, this estimate is optimistic (biased/overfitted) because of the multiple statistical comparisons performed. In this paper, we discuss the theoretical properties of performance estimation when model selection is present, and we confirm that simple cross-validation with model selection is indeed optimistic (overestimates performance) in small-sample scenarios and should be avoided. We present in detail and investigate the theoretical properties of nested cross-validation and of a method by Tibshirani and Tibshirani for removing the estimation bias. In computational experiments with real datasets, both protocols provide conservative estimates of performance and should be preferred. These statements hold true even if feature selection is performed as preprocessing.
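A compact sketch of the nested cross-validation protocol discussed above, using scikit-learn: an inner loop selects hyper-parameters and an outer loop estimates the performance of the whole selection procedure. The dataset, classifier and grid are placeholders.

```python
# Sketch: nested cross-validation (inner loop = hyper-parameter selection,
# outer loop = unbiased performance estimate). Placeholder data and grid.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

inner = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)     # model selection
naive = inner.fit(X, y).best_score_                           # optimistic estimate
nested = cross_val_score(inner, X, y, cv=5).mean()            # nested CV estimate

print("CV score of the selected configuration (optimistic):", round(naive, 3))
print("nested CV estimate of the whole procedure          :", round(nested, 3))
```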
Affiliation(s)
- Ioannis Tsamardinos: Department of Computer Science, University of Crete, Crete, Greece; Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), Heraklion Campus, Voutes, Heraklion, GR-700 13, Greece.
- Amin Rakhshani: Department of Computer Science, University of Crete, Crete, Greece; Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), Heraklion Campus, Voutes, Heraklion, GR-700 13, Greece.
- Vincenzo Lagani: Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), Vassilika Vouton, Heraklion, GR-700 13, Greece.
29. Oneto L, Ghio A, Ridella S, Anguita D. Fully Empirical and Data-Dependent Stability-Based Bounds. IEEE Transactions on Cybernetics 2015; 45:1913-1926. PMID: 25347893. DOI: 10.1109/tcyb.2014.2361857.
Abstract
The purpose of this paper is to obtain a fully empirical stability-based bound on the generalization ability of a learning procedure, thus circumventing some limitations of the structural risk minimization framework. We show that assuming a desirable property of a learning algorithm is sufficient to make data dependency explicit for stability, which is instead usually bounded only in an algorithm-dependent way. In addition, we prove that a well-known and widely used classifier, the support vector machine (SVM), satisfies this condition. The obtained bound is then exploited for model selection purposes in SVM classification and tested on a series of real-world benchmarking datasets, demonstrating in practice the effectiveness of our approach.
30. Oneto L, Ghio A, Ridella S, Anguita D. Global Rademacher Complexity Bounds: From Slow to Fast Convergence Rates. Neural Process Lett 2015. DOI: 10.1007/s11063-015-9429-2.
31. Reyes-Ortiz JL, Oneto L, Anguita D. Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf. Procedia Computer Science 2015. DOI: 10.1016/j.procs.2015.07.286.
32. Fumeo E, Oneto L, Anguita D. Condition Based Maintenance in Railway Transportation Systems Based on Big Data Streaming Analysis. Procedia Computer Science 2015. DOI: 10.1016/j.procs.2015.07.321.
33. Anguita D, Ghio A, Oneto L, Ridella S. A deep connection between the Vapnik-Chervonenkis entropy and the Rademacher complexity. IEEE Transactions on Neural Networks and Learning Systems 2014; 25:2202-2211. PMID: 25420243. DOI: 10.1109/tnnls.2014.2307359.
Abstract
In this paper, we derive a deep connection between the Vapnik-Chervonenkis (VC) entropy and the Rademacher complexity. For this purpose, we first refine some previously known relationships between the two notions of complexity and then derive new results, which allow computing an admissible range for the Rademacher complexity, given a value of the VC-entropy, and vice versa. The approach adopted in this paper is new and relies on the careful analysis of the combinatorial nature of the problem. The obtained results improve the state of the art on this research topic.
34. Anguita D, Ghio A, Oneto L, Ridella S. Unlabeled patterns to tighten Rademacher complexity error bounds for kernel classifiers. Pattern Recognit Lett 2014. DOI: 10.1016/j.patrec.2013.04.027.
35. Tsamardinos I, Rakhshani A, Lagani V. Performance-Estimation Properties of Cross-Validation-Based Protocols with Simultaneous Hyper-Parameter Optimization. Artificial Intelligence: Methods and Applications 2014. DOI: 10.1007/978-3-319-07064-3_1.
36. An improved analysis of the Rademacher data-dependent bound using its self bounding property. Neural Netw 2013; 44:107-111. DOI: 10.1016/j.neunet.2013.03.017.
37. Sheng C, Zhao J, Wang W, Leung H. Prediction intervals for a noisy nonlinear time series based on a bootstrapping reservoir computing network ensemble. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:1036-1048. PMID: 24808519. DOI: 10.1109/tnnls.2013.2250299.
Abstract
Prediction intervals, which provide estimated values together with the corresponding reliability, are applied to nonlinear time series forecasting. However, constructing reliable prediction intervals for noisy time series is still a challenge. In this paper, a bootstrapping reservoir computing network ensemble (BRCNE) is proposed and a simultaneous training method based on Bayesian linear regression is developed. In addition, the structural parameters of the BRCNE, that is, the number of reservoir computing networks and the reservoir dimension, are determined off-line by 0.632 bootstrap cross-validation. To verify the effectiveness of the proposed method, two kinds of time series data are employed: the multi-superimposed oscillator problem with additive noise and a practical gas flow series from the steel industry. The experimental results indicate that the proposed approach has satisfactory prediction-interval performance for practical applications.
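A hedged sketch of the generic bootstrap-ensemble recipe for prediction intervals on a noisy series, with a plain ridge autoregressive model standing in for the reservoir computing networks of the BRCNE (whose Bayesian training and 0.632 bootstrap model selection are not reproduced). All settings are assumptions.

```python
# Sketch: bootstrap ensemble prediction intervals for a noisy time series,
# with ridge autoregression standing in for reservoir computing networks.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
t = np.arange(1200)
series = np.sin(0.05 * t) + 0.5 * np.sin(0.013 * t) + rng.normal(scale=0.2, size=t.size)

lag = 20                                              # autoregressive order
X = np.column_stack([series[i:i + len(series) - lag] for i in range(lag)])
y = series[lag:]
X_tr, y_tr, X_te, y_te = X[:900], y[:900], X[900:], y[900:]

preds = []
for _ in range(30):                                   # bootstrap ensemble
    idx = rng.integers(0, len(X_tr), size=len(X_tr))  # resample with replacement
    preds.append(Ridge(alpha=1.0).fit(X_tr[idx], y_tr[idx]).predict(X_te))
preds = np.array(preds)

# Interval from the ensemble spread (captures model uncertainty only).
lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)
coverage = np.mean((y_te >= lower) & (y_te <= upper))
print("empirical coverage of the interval:", round(float(coverage), 3))
```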
38. Wang D, Qiao H, Zhang B, Wang M. Online support vector machine based on convex hull vertices selection. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:593-609. PMID: 24808380. DOI: 10.1109/tnnls.2013.2238556.
Abstract
The support vector machine (SVM), as a promising classification technique, has been widely used in various fields due to its high efficiency. However, the SVM cannot effectively solve online classification problems since, when a new sample is misclassified, the classifier has to be retrained with all training samples plus the new sample, which is time-consuming. Exploiting the geometric characteristics of the SVM, in this paper we propose an online SVM classifier called VS-OSVM, which is based on convex hull vertex selection within each class. The VS-OSVM algorithm has two steps: 1) the sample selection process, in which a small number of skeleton samples constituting an approximate convex hull of each class of the current training samples are selected, and 2) the online updating process, in which the classifier is updated with newly arriving samples and the selected skeleton samples. From the theoretical point of view, the first d+1 selected samples (where d is the dimension of the input samples) are proved to be vertices of the convex hull. This guarantees that the selected samples retain the greatest amount of information about the convex hull. From the application point of view, the new algorithm can update the classifier without reducing its classification performance. Experimental results on benchmark data sets show the validity and effectiveness of the VS-OSVM algorithm.
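A simplified, hedged illustration of the vertex-selection idea in two dimensions: keep only the convex-hull vertices of each class and train an SVM on those points. The paper's online updating procedure and its approximate-hull construction are not reproduced; the data and settings are assumptions.

```python
# Sketch: per-class convex hull vertices as a reduced training set for an SVM
# (2-D offline illustration only; not the paper's online VS-OSVM algorithm).
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=2000, centers=2, cluster_std=2.0, random_state=0)

keep = []
for label in np.unique(y):
    idx = np.where(y == label)[0]
    hull = ConvexHull(X[idx])                 # hull of this class's points
    keep.extend(idx[hull.vertices])           # keep only the hull vertices
keep = np.array(keep)

full = SVC(kernel="linear").fit(X, y)
reduced = SVC(kernel="linear").fit(X[keep], y[keep])
print(f"training points kept: {len(keep)} of {len(X)}")
print("full-data accuracy   :", round(full.score(X, y), 3))
print("hull-vertex accuracy :", round(reduced.score(X, y), 3))
```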
39. Zoidi O, Tefas A, Pitas I. Multiplicative update rules for concurrent nonnegative matrix factorization and maximum margin classification. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:422-434. PMID: 24808315. DOI: 10.1109/tnnls.2012.2235461.
Abstract
State-of-the-art classification methods that employ nonnegative matrix factorization (NMF) use two consecutive, independent steps. The first performs data transformation (dimensionality reduction) and the second classifies the transformed data using classifiers such as nearest neighbor/centroid or support vector machines (SVMs). In the following, we focus on NMF factorization followed by SVM classification. Typically, the parameters of these two steps, e.g., the NMF bases/coefficients and the support vectors, are optimized independently, leading to suboptimal classification performance. In this paper, we merge the two steps into one by incorporating maximum margin classification constraints into the standard NMF optimization. The idea behind the proposed framework is to perform NMF while ensuring that the margin between the projected data of the two classes is maximal. The concurrent NMF factorization and support vector optimization are performed through a set of multiplicative update rules. In the same context, the maximum margin classification constraints are imposed on the NMF problem with additional discriminant constraints, and the respective multiplicative update rules are derived. The impact of the maximum margin classification constraints on the NMF factorization problem is addressed in Section VI. Experimental results on several databases indicate that incorporating the maximum margin classification constraints into the NMF and discriminant NMF objective functions improves classification accuracy.
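For context, the snippet below sketches the decoupled two-step baseline that the paper argues against and replaces with joint multiplicative updates: an NMF transform followed by an independently trained maximum-margin classifier. The dataset, rank and classifier settings are assumptions.

```python
# Sketch of the decoupled baseline discussed above: NMF dimensionality
# reduction followed by an independently trained maximum-margin classifier.
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)            # nonnegative pixel intensities
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

two_step = make_pipeline(
    NMF(n_components=30, init="nndsvda", max_iter=500, random_state=0),  # step 1: NMF
    LinearSVC(dual=False),                                               # step 2: SVM
)
two_step.fit(X_tr, y_tr)
print("two-step NMF + SVM accuracy:", round(two_step.score(X_te, y_te), 3))
```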
40. Budka M, Gabrys B. Density-preserving sampling: robust and efficient alternative to cross-validation for error estimation. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:22-34. PMID: 24808204. DOI: 10.1109/tnnls.2012.2222925.
Abstract
Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and, thus, require multiple repetitions in order to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10 times repeated CV at a fraction of the computations required by CV. This method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers.