601
|
A novel computer-aided diagnosis system for breast MRI based on feature selection and ensemble learning. Comput Biol Med 2017; 83:157-165. [DOI: 10.1016/j.compbiomed.2017.03.002] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 09/30/2016] [Revised: 02/25/2017] [Accepted: 03/01/2017] [Indexed: 11/18/2022]
|
602
|
Sanz J, Paternain D, Galar M, Fernandez J, Reyero D, Belzunegui T. A new survival status prediction system for severe trauma patients based on a multiple classifier system. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017; 142:1-8. [PMID: 28325437 DOI: 10.1016/j.cmpb.2017.02.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/21/2016] [Revised: 01/12/2017] [Accepted: 02/10/2017] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVE Severe trauma patients are those who have several injuries implying a risk of death. Prediction systems consider the severity of these injuries to predict whether the patients are likely to survive. These systems allow one to objectively compare the quality of the emergency services of trauma centres across different hospitals. However, even the most accurate existing prediction systems are based on a single model. The aim of this paper is to combine several models to make the prediction, since this methodology usually improves on the performance of single models. MATERIALS AND METHODS The two prediction systems currently used by the Hospital of Navarre, which are based on logistic regression models, are combined with the C4.5 decision tree to form our proposed multiple classifier system. The quality of the method is tested using the major trauma registry of Navarre, which stores information on 462 trauma patients. A 10x10-fold cross-validation model is applied, using as performance measures the specificity, the sensitivity and the geometric mean of the two. The results are supported by the Mann-Whitney U statistical test. RESULTS The proposed method provides 0.8908, 0.6703 and 0.7661 for sensitivity, specificity and geometric mean, respectively. It slightly decreases the sensitivity of the currently used systems but notably increases the specificity, which implies a large improvement in the geometric mean. The same behaviour is found when it is compared with four classical ensemble approaches and the random forest. The statistical analysis supports the quality of our proposal, since the obtained p-values are less than 0.01 in all cases. CONCLUSIONS The obtained results show that the multiple classifier system is the best choice among the considered methods to obtain a trade-off between sensitivity and specificity.
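The three performance measures named in this abstract can be illustrated with a minimal sketch. The function name and the confusion counts below are hypothetical and are not taken from the paper; which outcome counts as the positive class is also an assumption left to the caller.

```python
import math

def sensitivity_specificity_gmean(tp, fn, tn, fp):
    """Return (sensitivity, specificity, geometric mean) from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of positive cases correctly identified
    specificity = tn / (tn + fp)  # fraction of negative cases correctly identified
    gmean = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, gmean

# Hypothetical counts, chosen only to illustrate the calculation.
sens, spec, gmean = sensitivity_specificity_gmean(tp=90, fn=10, tn=40, fp=20)
print(round(sens, 4), round(spec, 4), round(gmean, 4))
```

Applying the same formula to the reported averages gives sqrt(0.8908 × 0.6703) ≈ 0.773, slightly above the reported 0.7661. One plausible reason, not stated in the abstract, is that the geometric mean was computed within each cross-validation fold and then averaged.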
Affiliation(s)
- José Sanz
  - Departamento de Automatica y Computacion and Institute of Smart Cities, Universidad Publica de Navarra, Campus Arrosadia s/n, P.O. Box 31006, Pamplona, Spain
- Daniel Paternain
  - Departamento de Automatica y Computacion and Institute of Smart Cities, Universidad Publica de Navarra, Campus Arrosadia s/n, P.O. Box 31006, Pamplona, Spain
- Mikel Galar
  - Departamento de Automatica y Computacion and Institute of Smart Cities, Universidad Publica de Navarra, Campus Arrosadia s/n, P.O. Box 31006, Pamplona, Spain
- Javier Fernandez
  - Departamento de Automatica y Computacion and Institute of Smart Cities, Universidad Publica de Navarra, Campus Arrosadia s/n, P.O. Box 31006, Pamplona, Spain
- Diego Reyero
  - Prehospital Emergency, Navarre Health Services, Pamplona, Spain
- Tomás Belzunegui
  - Department of Health, Universidad Publica de Navarra, Barañaín Avenue s/n, P.O. Box 31008, Pamplona, Spain; Accident and Emergency Department, Hospital of Navarre, Navarre, Spain
|
603
|
|
604
|
Bach M, Werner A, Żywiec J, Pluskiewicz W. The study of under- and over-sampling methods’ utility in analysis of highly imbalanced data on osteoporosis. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.09.038] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 10/21/2022]
|
605
|
Alejo R, Monroy-de-Jesús J, Ambriz-Polo JC, Pacheco-Sánchez JH. An improved dynamic sampling back-propagation algorithm based on mean square error to face the multi-class imbalance problem. Neural Comput Appl 2017. [DOI: 10.1007/s00521-017-2938-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
|
606
|
Deeba F, Mohammed SK, Bui FM, Wahid KA. An empirical study on the effect of imbalanced data on bleeding detection in endoscopic video. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2017; 2016:2598-2601. [PMID: 28268854 DOI: 10.1109/embc.2016.7591262] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
In biomedical applications, including classification of endoscopic videos, class imbalance is a common problem arising from the significant difference between the prior probabilities of different classes. In this paper, we investigate the performance of different classifiers under varying training data distributions for the bleeding detection problem through three experiments. In the first experiment, we analyze classifier performance for different class distributions with a fixed-size training dataset. The experiment indicates the class distribution required for optimum classification performance. In the second and third experiments, we investigate the effect of both training data size and class distribution on classification performance. From our experiments, we found that a larger dataset with moderate class imbalance yields better classification performance than a small dataset with a balanced distribution. Ensemble classifiers are more robust to variation in the training dataset than a single classifier.
|
607
|
|
608
|
A novel ensemble decision tree based on under-sampling and clonal selection for web spam detection. Pattern Anal Appl 2017. [DOI: 10.1007/s10044-017-0602-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/17/2022]
|
609
|
|
610
|
A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2017; 2017:1827016. [PMID: 28250765 PMCID: PMC5304315 DOI: 10.1155/2017/1827016] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 12/04/2016] [Revised: 12/23/2016] [Accepted: 12/28/2016] [Indexed: 11/17/2022]
Abstract
Class imbalance is ubiquitous in real life and has attracted much interest from various domains. Learning directly from an imbalanced dataset may give unsatisfying results by overfocusing on overall identification accuracy and deriving a suboptimal model. Various methodologies have been developed to tackle this problem, including sampling, cost-sensitive, and hybrid ones. However, the samples near the decision boundary, which contain more discriminative information, should be valued, and the skew of the boundary can be corrected by constructing synthetic samples. Inspired by geometric intuition, we designed a new synthetic minority oversampling technique that incorporates the borderline information. Moreover, an ensemble model tends to capture a more complicated and robust decision boundary in practice. Taking these factors into consideration, a novel ensemble method, called Bagging of Extrapolation Borderline-SMOTE SVM (BEBS), is proposed for imbalanced data learning (IDL) problems. Experiments on open access datasets showed significantly superior performance using our model, and a persuasive and intuitive explanation behind the method was illustrated. As far as we know, this is the first model combining an ensemble of SVMs with borderline information for this problem.
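For orientation, synthetic minority oversampling in its plain SMOTE-style form can be sketched as follows. This is not the extrapolation or borderline variant proposed in the paper, whose details the abstract does not give; the function name, parameters, and sample points are hypothetical.

```python
import random

def smote_like_samples(minority, k=2, n_new=5, seed=0):
    """Generate synthetic minority points by interpolating towards near neighbours.

    `minority` is a list of equal-length numeric tuples with at least two entries.
    Plain SMOTE-style interpolation only; an illustrative sketch.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical two-dimensional minority samples.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_like_samples(points, k=2, n_new=3, seed=1))
```

Because each new point lies on a segment between two existing minority samples, interpolation never leaves the region already spanned by the minority class; an extrapolation variant would relax that restriction.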
|
611
|
Nguyen TT, Hwang D, Jung JJ. Handling imbalanced classification problem: A case study on social media datasets. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2017. [DOI: 10.3233/jifs-169140] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Affiliation(s)
- Tuong Tri Nguyen
  - Department of Computer Engineering, Yeungnam University, Gyeongsan, South Korea
- Dosam Hwang
  - Department of Computer Engineering, Yeungnam University, Gyeongsan, South Korea
- Jason J. Jung
  - Department of Computer Engineering, Chung-Ang University, Seoul, South Korea
|
612
|
|
613
|
Ren F, Cao P, Li W, Zhao D, Zaiane O. Ensemble based adaptive over-sampling method for imbalanced data learning in computer aided detection of microaneurysm. Comput Med Imaging Graph 2017; 55:54-67. [DOI: 10.1016/j.compmedimag.2016.07.011] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 01/31/2016] [Revised: 06/17/2016] [Accepted: 07/29/2016] [Indexed: 10/21/2022]
|
614
|
Zhang X, Zhuang Y, Hu H, Wang W. 3-D Laser-Based Multiclass and Multiview Object Detection in Cluttered Indoor Scenes. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:177-190. [PMID: 26685265 DOI: 10.1109/tnnls.2015.2496195] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
This paper investigates the problem of multiclass and multiview 3-D object detection for service robots operating in a cluttered indoor environment. A novel 3-D object detection system using laser point clouds is proposed to deal with cluttered indoor scenes with scarce and imbalanced training data. Raw 3-D point clouds are first transformed to 2-D bearing angle images to reduce the computational cost, and then jointly trained multiple object detectors are deployed to perform the multiclass and multiview 3-D object detection. The reclassification technique is applied to each detected low-confidence bounding box in the system to reduce false alarms in the detection. The RUS-SMOTEboost algorithm is used to train a group of independent binary classifiers with imbalanced training data. Dense histograms of oriented gradients and local binary pattern features are combined as a feature set for the reclassification task. Based on the Dalian University of Technology (DUT)-3-D data set taken from various office and household environments, experimental results show the validity and good performance of the proposed method.
|
615
|
Christensen K, Nørskov S, Frederiksen L, Scholderer J. In Search of New Product Ideas: Identifying Ideas in Online Communities by Machine Learning and Text Mining. CREATIVITY AND INNOVATION MANAGEMENT 2016. [DOI: 10.1111/caim.12202] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Indexed: 11/27/2022]
|
616
|
Abstract
Ensemble learning systems can lower the risk of overfitting that often appears in a single learning model. Unlike ensemble learning approaches based on re-sampling, negative correlation learning trains all learners in an ensemble simultaneously and cooperatively. However, overfitting has sometimes been observed in negative correlation learning. Two error bounds are therefore introduced into negative correlation learning to prevent overfitting. One is the upper bound of error output (UBEO), which divides the training data into two groups based on the distances between the data and the formed decision boundary. The other is the lower bound of error rate (LBER), which is set as a learning switch. While the error rate is higher than LBER, negative correlation learning is applied to the whole training set. As soon as the error rate is lower than LBER, negative correlation learning is applied only to the group of data whose distances to the current decision boundary are within the range of UBEO. The other group of data, outside this range, is not learned anymore. Further learning on the data points in the latter group would make the learned decision boundary too complex to classify unseen data well. Experimental results explore how LBER and UBEO lead negative correlation learning towards a robust decision boundary.
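The switching rule described in this abstract can be sketched as a small selection function. The function and argument names are hypothetical, the handling of the exact boundary case (error rate equal to LBER) is an assumption, and the negative correlation learning update itself is not shown.

```python
def samples_to_train_on(distances, error_rate, lber, ubeo):
    """Return indices of the training samples that are still learned.

    distances  -- per-sample distance to the current decision boundary
    error_rate -- current error rate of the ensemble on the training set
    lber       -- lower bound of error rate, used as the learning switch
    ubeo       -- upper bound of error output, the distance cutoff
    """
    if error_rate >= lber:
        # Switch not yet triggered: learn on the whole training set.
        return list(range(len(distances)))
    # Switch triggered: keep only samples within the UBEO range.
    return [i for i, d in enumerate(distances) if abs(d) <= ubeo]

# Hypothetical distances for three samples.
print(samples_to_train_on([0.1, 0.9, 0.3], error_rate=0.20, lber=0.10, ubeo=0.5))
print(samples_to_train_on([0.1, 0.9, 0.3], error_rate=0.05, lber=0.10, ubeo=0.5))
```

The first call returns every index because the error rate is still above LBER; the second drops the sample whose distance exceeds UBEO.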
Affiliation(s)
- Yong Liu
  - School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan
|
617
|
Ibarguren I, Lasarguren A, Pérez JM, Muguerza J, Gurrutxaga I, Arbelaitz O. BFPART: Best-First PART. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.07.023] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
|
618
|
Khan N, McClean S, Zhang S, Nugent C. Optimal Parameter Exploration for Online Change-Point Detection in Activity Monitoring Using Genetic Algorithms. SENSORS 2016; 16:s16111784. [PMID: 27792177 PMCID: PMC5134443 DOI: 10.3390/s16111784] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/06/2016] [Revised: 10/17/2016] [Accepted: 10/18/2016] [Indexed: 11/16/2022]
Abstract
In recent years, smart phones with inbuilt sensors have become popular devices to facilitate activity recognition. The sensors capture a large amount of data, containing meaningful events, in a short period of time. The change points in this data are used to specify transitions to distinct events and can be used in various scenarios such as identifying change in a patient’s vital signs in the medical domain or requesting activity labels for generating real-world labeled activity datasets. Our work focuses on change-point detection to identify a transition from one activity to another. Within this paper, we extend our previous work on multivariate exponentially weighted moving average (MEWMA) algorithm by using a genetic algorithm (GA) to identify the optimal set of parameters for online change-point detection. The proposed technique finds the maximum accuracy and F_measure by optimizing the different parameters of the MEWMA, which subsequently identifies the exact location of the change point from an existing activity to a new one. Optimal parameter selection facilitates an algorithm to detect accurate change points and minimize false alarms. Results have been evaluated based on two real datasets of accelerometer data collected from a set of different activities from two users, with a high degree of accuracy from 99.4% to 99.8% and F_measure of up to 66.7%.
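As background, the textbook form of the MEWMA statistic is given here. The abstract does not state which variant or which parameters the authors optimized, so the symbols λ (smoothing weight), h (control limit), and Σ (in-control covariance of the observations X_t, assumed centred on the in-control mean) are assumptions used for illustration.

```latex
\begin{align}
  Z_t &= \lambda X_t + (1-\lambda)\, Z_{t-1}, \qquad Z_0 = 0, \quad 0 < \lambda \le 1, \\
  \Sigma_{Z_t} &= \frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]\Sigma, \\
  T_t^2 &= Z_t^{\top}\, \Sigma_{Z_t}^{-1}\, Z_t, \qquad \text{signal a change point when } T_t^2 > h .
\end{align}
```

A small λ weights history heavily and reacts slowly to change, while a large λ reacts quickly but raises more false alarms; quantities such as λ and h are the kind of parameter a genetic algorithm could search over.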
Affiliation(s)
- Naveed Khan
  - School of Computing and Information Engineering, Ulster University, Coleraine, Co., Londonderry BTT52 1SA, UK
- Sally McClean
  - School of Computing and Information Engineering, Ulster University, Coleraine, Co., Londonderry BTT52 1SA, UK
- Shuai Zhang
  - School of Computing and Mathematics, Ulster University, Jordanstown, Co., Antrim BT37 0QB, UK
- Chris Nugent
  - School of Computing and Mathematics, Ulster University, Jordanstown, Co., Antrim BT37 0QB, UK
|
619
|
Fan J, Niu Z, Liang Y, Zhao Z. Probability model selection and parameter evolutionary estimation for clustering imbalanced data without sampling. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.10.140] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
|
620
|
Papadopoulos H, Kyriacou E, Nicolaides A. Unbiased confidence measures for stroke risk estimation based on ultrasound carotid image analysis. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2590-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
|
621
|
Kipnis P, Turk BJ, Wulf DA, LaGuardia JC, Liu V, Churpek MM, Romero-Brufau S, Escobar GJ. Development and validation of an electronic medical record-based alert score for detection of inpatient deterioration outside the ICU. J Biomed Inform 2016; 64:10-19. [PMID: 27658885 DOI: 10.1016/j.jbi.2016.09.013] [Citation(s) in RCA: 110] [Impact Index Per Article: 12.2] [Received: 04/12/2016] [Revised: 08/23/2016] [Accepted: 09/18/2016] [Indexed: 10/21/2022]
Abstract
BACKGROUND Patients in general medical-surgical wards who experience unplanned transfer to the intensive care unit (ICU) show evidence of physiologic derangement 6-24 h prior to their deterioration. With increasing availability of electronic medical records (EMRs), automated early warning scores (EWSs) are becoming feasible. OBJECTIVE To describe the development and performance of an automated EWS based on EMR data. MATERIALS AND METHODS We used a discrete-time logistic regression model to obtain an hourly risk score to predict unplanned transfer to the ICU within the next 12 h. The model was based on hospitalization episodes from all adult patients (≥18 years) admitted to 21 Kaiser Permanente Northern California (KPNC) hospitals from 1/1/2010 to 12/31/2013. Eligible patients met these entry criteria: initial hospitalization occurred at a KPNC hospital; the hospitalization was not for childbirth; and the EMR had been operational at the hospital for at least 3 months. We evaluated the performance of this risk score, called Advanced Alert Monitor (AAM), and compared it against two other EWSs (eCART and NEWS) in terms of their sensitivity, specificity, negative predictive value, positive predictive value, and area under the receiver operator characteristic curve (c statistic). RESULTS A total of 649,418 hospitalization episodes involving 374,838 patients met inclusion criteria, with 19,153 of the episodes experiencing at least one outcome. The analysis data set had 48,723,248 hourly observations. Predictors included physiologic data (laboratory tests and vital signs); neurological status; severity of illness and longitudinal comorbidity indices; care directives; and health services indicators (e.g. elapsed length of stay). AAM showed better performance compared to NEWS and eCART in all the metrics and prediction intervals. The AAM AUC was 0.82 compared to 0.79 and 0.76 for eCART and NEWS, respectively.
Using a threshold that generated 1 alert per day in a unit with a patient census of 35, the sensitivity of AAM was 49% (95% CI: 47.6-50.3%) compared to the sensitivities of eCART and NEWS scores of 44% (42.3-45.1) and 40% (38.2-40.9), respectively. For all three scores, about half of alerts occurred within 12 h of the event, and almost two thirds within 24 h of the event. CONCLUSION The AAM score is an example of a score that takes advantage of multiple data streams now available in modern EMRs. It highlights the ability to harness complex algorithms to maximize signal extraction. The main challenge in the future is to develop detection approaches for patients in whom data are sparser because their baseline risk is lower.
Affiliation(s)
- Patricia Kipnis
  - Kaiser Foundation Health Plan, Inc., 1950 Franklin St., 17th Floor, Oakland, CA 94612, United States; Kaiser Permanente Northern California, Division of Research, 2000 Broadway Avenue, 032 R01, Oakland, CA 94612, United States
- Benjamin J Turk
  - Kaiser Permanente Northern California, Division of Research, 2000 Broadway Avenue, 032 R01, Oakland, CA 94612, United States
- David A Wulf
  - Kaiser Permanente Northern California, Division of Research, 2000 Broadway Avenue, 032 R01, Oakland, CA 94612, United States
- Juan Carlos LaGuardia
  - Kaiser Permanente Northern California, Division of Research, 2000 Broadway Avenue, 032 R01, Oakland, CA 94612, United States
- Vincent Liu
  - Kaiser Permanente Northern California, Division of Research, 2000 Broadway Avenue, 032 R01, Oakland, CA 94612, United States; Intensive Care Department, Kaiser Permanente Medical Center, 700 Lawrence Expressway, Santa Clara, CA 95051, United States
- Matthew M Churpek
  - Section of Pulmonary and Critical Care Medicine, Department of Medicine, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637, United States
- Santiago Romero-Brufau
  - Mayo Clinic Center for Innovation, 200 1st Street SW, Rochester, MN 55905, United States
- Gabriel J Escobar
  - Kaiser Permanente Northern California, Division of Research, 2000 Broadway Avenue, 032 R01, Oakland, CA 94612, United States; Department of Inpatient Pediatrics, Kaiser Permanente Medical Center, 1425 S. Main Street Walnut Creek, CA 94596, United States
|
622
|
A New Methodology Based on Imbalanced Classification for Predicting Outliers in Electricity Demand Time Series. ENERGIES 2016. [DOI: 10.3390/en9090752] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
|
623
|
Thyroid lesion classification in 242 patient population using Gabor transform features from high resolution ultrasound images. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.06.010] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Indexed: 12/17/2022]
|
624
|
Perez-Ortiz M, Gutierrez PA, Tino P, Hervas-Martinez C. Oversampling the Minority Class in the Feature Space. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:1947-1961. [PMID: 26316222 DOI: 10.1109/tnnls.2015.2461436] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 06/04/2023]
Abstract
The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach oversamples the minority class through convex combination of its patterns. We explore the general idea of synthetic oversampling in the feature space induced by a kernel function (as opposed to input space). If the kernel function matches the underlying problem, the classes will be linearly separable and synthetically generated patterns will lie on the minority class region. Since the feature space is not directly accessible, we use the empirical feature space (EFS) (a Euclidean space isomorphic to the feature space) for oversampling purposes. The proposed method is framed in the context of support vector machines, where the imbalanced data sets can pose a serious hindrance. The idea is investigated in three scenarios: 1) oversampling in the full and reduced-rank EFSs; 2) a kernel learning technique maximizing the data class separation to study the influence of the feature space structure (implicitly defined by the kernel function); and 3) a unified framework for preferential oversampling that spans some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced data sets.
|
625
|
Cheng F, Zhang J, Wen C. Cost-Sensitive Large margin Distribution Machine for classification of imbalanced data. Pattern Recognit Lett 2016. [DOI: 10.1016/j.patrec.2016.06.009] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Indexed: 11/16/2022]
|
626
|
Galar M, Fernández A, Barrenechea E, Bustince H, Herrera F. Ordering-based pruning for improving the performance of ensembles of classifiers in the framework of imbalanced datasets. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.02.056] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Indexed: 10/22/2022]
|
627
|
Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.05.048] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Indexed: 11/23/2022]
|
628
|
ElMoaqet H, Tilbury DM, Ramachandran SK. A new algorithm for the detection of sleep apnea events in respiration signals. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2016:3199-3202. [PMID: 28268988 DOI: 10.1109/embc.2016.7591409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Sleep apneas are the most common type of sleep-related breathing disorder and cause a patient to move from good sleep into inefficient sleep. In addition, sleep apnea widely affects the American population and is a large cost for healthcare. Traditional methods for detecting sleep apneas are complex, expensive, and invasive for most patients. Among the various physiological signals, respiration signals are relatively easy to monitor. However, few studies use the respiration signal alone, and most previous algorithms are insufficient to detect apnea events. In this paper, we propose a new algorithm based only on the respiration signal to detect apnea events during sleep and conduct experiments comparing the performance of our algorithm against two apnea detection algorithms. We use data from 20 patients, all of whom have a severe Apnea Hypopnea Index (AHI>30: over 30 events per hour). Our study shows that our algorithm outperforms the other two algorithms.
|
629
|
Lahiri A, Roy AG, Sheet D, Biswas PK. Deep neural ensemble for retinal vessel segmentation in fundus images towards achieving label-free angiography. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2016:1340-1343. [PMID: 28268573 DOI: 10.1109/embc.2016.7590955] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 06/06/2023]
Abstract
Automated segmentation of retinal blood vessels in label-free fundus images plays a pivotal role in computer-aided diagnosis of ophthalmic pathologies, viz., diabetic retinopathy, hypertensive disorders and cardiovascular diseases. The challenge remains active in medical image analysis research due to the varied distribution of blood vessels, which vary in their dimensions and physical appearance against a noisy background. In this paper we formulate the segmentation challenge as a classification task. Specifically, we employ unsupervised hierarchical feature learning using a two-level ensemble of sparsely trained stacked denoising autoencoders. First-level training with bootstrap samples ensures decoupling, and the second-level ensemble formed by different network architectures ensures architectural revision. We show that ensemble training of autoencoders fosters diversity in learning a dictionary of visual kernels for vessel segmentation. A SoftMax classifier is used for fine-tuning each member autoencoder, and multiple strategies are explored for 2-level fusion of ensemble members. On the DRIVE dataset, we achieve a maximum average accuracy of 95.33% with an impressively low standard deviation of 0.003 and a Kappa agreement coefficient of 0.708. Comparison with other major algorithms substantiates the high efficacy of our model.
|
630
|
Jones DE, Ghandehari H, Facelli JC. A review of the applications of data mining and machine learning for the prediction of biomedical properties of nanoparticles. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 132:93-103. [PMID: 27282231 PMCID: PMC4902872 DOI: 10.1016/j.cmpb.2016.04.025] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 01/11/2016] [Revised: 04/11/2016] [Accepted: 04/22/2016] [Indexed: 05/08/2023]
Abstract
This article presents a comprehensive review of applications of data mining and machine learning for the prediction of biomedical properties of nanoparticles of medical interest. The papers reviewed here present the results of research using these techniques to predict the biological fate and properties of a variety of nanoparticles relevant to their biomedical applications. These include the influence of particle physicochemical properties on cellular uptake, cytotoxicity, molecular loading, and molecular release in addition to manufacturing properties like nanoparticle size, and polydispersity. Overall, the results are encouraging and suggest that as more systematic data from nanoparticles becomes available, machine learning and data mining would become a powerful aid in the design of nanoparticles for biomedical applications. There is however the challenge of great heterogeneity in nanoparticles, which will make these discoveries more challenging than for traditional small molecule drug design.
Affiliation(s)
- David E Jones
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Hamidreza Ghandehari
  - Departments of Bioengineering and Pharmaceutics and Pharmaceutical Chemistry, University of Utah, Salt Lake City, UT 84112, USA; Utah Center for Nanomedicine, Nano Institute of Utah, University of Utah, Salt Lake City, UT 84112, USA
- Julio C Facelli
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA; Utah Center for Nanomedicine, Nano Institute of Utah, University of Utah, Salt Lake City, UT 84112, USA
|
631
|
A Selective Dynamic Sampling Back-Propagation Approach for Handling the Two-Class Imbalance Problem. APPLIED SCIENCES-BASEL 2016. [DOI: 10.3390/app6070200] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
|
632
|
Sohrabi MK, Akbari S. A comprehensive study on the effects of using data mining techniques to predict tie strength. COMPUTERS IN HUMAN BEHAVIOR 2016. [DOI: 10.1016/j.chb.2016.02.092] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/06/2023]
|
633
|
Daqi G, Ahmed D, Lili G, Zejian W, Zhe W. Pseudo-inverse linear discriminants for the improvement of overall classification accuracies. Neural Netw 2016; 81:59-71. [PMID: 27351107 DOI: 10.1016/j.neunet.2016.05.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/22/2015] [Revised: 05/17/2016] [Accepted: 05/28/2016] [Indexed: 11/19/2022]
Abstract
This paper studies the learning and generalization performance of pseudo-inverse linear discriminants (PILDs) based on the minimum sum-of-squared error (MS²E) processing criterion and the overall classification accuracy (OCA) target criterion. There is little practical significance in proving the equivalence between a PILD whose desired outputs are in inverse proportion to the number of class samples and an FLD with the totally projected mean thresholds. When the desired outputs of each class are assigned a fixed value, a PILD is partly equivalent to an FLD. With the customary desired outputs {1, -1}, a practicable threshold is obtained that depends only on sample sizes. If the desired outputs of each sample are changeable, a PILD has nothing in common with an FLD. The optimal threshold may thus be singled out from multiple empirical ones related to sizes and distributed regions. Based on the MS²E processing criterion and the actual algebraic distances, an iterative learning strategy for PILDs is proposed, whose main advantages are a limited number of epochs, no learning rate, and no risk of divergence. Extensive experimental results on benchmark datasets have verified that the iterative PILDs with optimal thresholds have good learning and generalization performance, and even reach the top OCAs among existing classifiers on some datasets.
Affiliation(s)
- Gao Daqi: Department of Computer Science, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Dastagir Ahmed: Department of Computer Science, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Guo Lili: Department of Computer Science, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Wang Zejian: Department of Computer Science, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Wang Zhe: Department of Computer Science, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
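As a rough illustration of the pseudo-inverse discriminant discussed in the abstract above, the sketch below fits a linear discriminant by least squares with desired outputs {1, -1}. It is not the authors' iterative method: the function names `fit_pild` and `predict_pild`, the toy data, and the fixed threshold of 0 are all assumptions made for illustration.

```python
import numpy as np


def fit_pild(X, y):
    """Fit a least-squares linear discriminant with the pseudo-inverse.

    X: (n_samples, n_features) array; y: desired outputs in {+1, -1}.
    Returns the augmented weight vector [w_1, ..., w_d, bias].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a constant column so the bias is learned with the weights.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    # Minimum sum-of-squared-error solution via the Moore-Penrose pseudo-inverse.
    return np.linalg.pinv(Xa) @ y


def predict_pild(w, X, threshold=0.0):
    """Label samples +1 or -1 by comparing the linear output to a threshold.

    The default threshold of 0 is an assumption, not the paper's rule.
    """
    X = np.asarray(X, dtype=float)
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xa @ w > threshold, 1, -1)


# Hypothetical toy data: two well-separated clusters.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_demo = np.array([-1, -1, -1, 1, 1, 1])

w_demo = fit_pild(X_demo, y_demo)
pred_demo = predict_pild(w_demo, X_demo)
```

Choosing a different `threshold` shifts the decision boundary without refitting, which is one simple way to experiment with the threshold selection the abstract refers to.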
|
634
|
Fan Q, Gao D. A fast BP networks with dynamic sample selection for handwritten recognition. Pattern Anal Appl 2016. [DOI: 10.1007/s10044-016-0566-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/21/2022]
|
635
|
Fernández A, Elkano M, Galar M, Sanz JA, Alshomrani S, Bustince H, Herrera F. Enhancing evolutionary fuzzy systems for multi-class problems: Distance-based relative competence weighting with truncated confidences (DRCW-TC). Int J Approx Reason 2016. [DOI: 10.1016/j.ijar.2016.02.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
|
636
|
Sun B, Chen S, Wang J, Chen H. A robust multi-class AdaBoost algorithm for mislabeled noisy data. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.03.024] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 10/22/2022]
|
637
|
Zhang Y, Liu B, Cai J, Zhang S. Ensemble weighted extreme learning machine for imbalanced data classification based on differential evolution. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2342-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 10/21/2022]
|
638
|
Jiang K, Lu J, Xia K. A Novel Algorithm for Imbalance Data Classification Based on Genetic Algorithm Improved SMOTE. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2016. [DOI: 10.1007/s13369-016-2179-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 01/01/2023]
|
639
|
Tran QD, Liatsis P. RABOC: An approach to handle class imbalance in multimodal biometric authentication. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2014.12.126] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
|
640
|
Umut İ, Çentik G. Detection of Periodic Leg Movements by Machine Learning Methods Using Polysomnographic Parameters Other Than Leg Electromyography. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2016; 2016:2041467. [PMID: 27213008 PMCID: PMC4860221 DOI: 10.1155/2016/2041467] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/29/2016] [Revised: 04/02/2016] [Accepted: 04/04/2016] [Indexed: 11/18/2022]
Abstract
The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. It also increases the risk of problems during the recording process and increases the storage volume. This study aims to detect periodic leg movement (PLM) in sleep using channels other than leg electromyography (EMG), by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with a PLM disorder diagnosis were examined retrospectively. Novel software was developed for the analysis of PSG records. The software utilizes machine learning algorithms, statistical methods, and DSP methods. To classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of the classification results showed that the K-nearest neighbour algorithm had a higher average classification rate (91.87%) and a lower average classification error (RMSE = 0.2850), while the multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error (RMSE = 0.3705). The results showed that PLM can be classified with high accuracy (91.87%) without a leg EMG record being present.
Affiliation(s)
- İlhan Umut: Department of Computer Engineering, Faculty of Engineering, Trakya University, 22030 Edirne, Turkey
- Güven Çentik: Department of Computer Engineering, Faculty of Engineering, Trakya University, 22030 Edirne, Turkey
|
641
|
|
642
|
Tutorial on practical tips of the most influential data preprocessing algorithms in data mining. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2015.12.006] [Citation(s) in RCA: 146] [Impact Index Per Article: 16.2] [Indexed: 11/19/2022]
|
643
|
Gutiérrez PA, García S. Current prospects on ordinal and monotonic classification. PROGRESS IN ARTIFICIAL INTELLIGENCE 2016. [DOI: 10.1007/s13748-016-0088-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 10/22/2022]
|
644
|
Cuendet GL, Schoettker P, Yüce A, Sorci M, Gao H, Perruchoud C, Thiran JP. Facial Image Analysis for Fully Automatic Prediction of Difficult Endotracheal Intubation. IEEE Trans Biomed Eng 2016; 63:328-39. [PMID: 26186767 DOI: 10.1109/tbme.2015.2457032] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Indexed: 11/08/2022]
Abstract
GOAL Difficult tracheal intubation is a major cause of anesthesia-related injuries with potentially life-threatening complications. Detection and anticipation of a difficult airway in the preoperative period is thus crucial for patient safety. We propose an automatic face-analysis approach to detect morphological traits related to difficult intubation and improve its prediction. METHODS For this purpose, we have collected a database of 970 patients including photos, videos, and ground truth data. Specific statistical face models have been learned using the faces in our database, providing an automated parametrization of the facial morphology. The most discriminative morphological features are selected through the importance ranking provided by the random forest algorithm. The random forest approach has also been used to train a classifier on these selected features. To tackle the inherently imbalanced nature of the database, we compare a threshold tuning method based on the class prior with two methods that learn an optimal threshold on a training set. RESULTS Our fully automated method achieves an AUC of 81.0% in a simplified experimental setup, where only easy and difficult patients are considered. A further validation on the entire database has shown that our method is applicable to real-world difficult intubation prediction, with AUC = 77.9%. CONCLUSION The system performance is in line with the state-of-the-art medical diagnosis, based on ratings provided by trained anesthesiologists, whose assessment is guided by an extensive set of criteria. SIGNIFICANCE We present the first completely automatic and noninvasive difficult intubation detection system that is suitable for use in clinical settings.
|
645
|
Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification. PLoS One 2016; 11:e0146116. [PMID: 26764911 PMCID: PMC4713117 DOI: 10.1371/journal.pone.0146116] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 11/23/2014] [Accepted: 12/13/2015] [Indexed: 11/30/2022] Open
Abstract
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble depends on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data to evaluate the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning Repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study, we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect that the proposed GA-EoC would perform consistently in other cases.
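The majority-voting step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the GA-EoC implementation: the function name `majority_vote`, the tie-breaking rule, and the example labels are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one ensemble label list.

    predictions: list of equal-length label sequences, one per base classifier.
    Ties go to the label that appears first among the votes for that sample
    (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    combined = []
    # zip(*predictions) yields, for each sample, the votes of all classifiers.
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three base classifiers on three samples.
clf_a = [1, 0, 0]
clf_b = [1, 1, 0]
clf_c = [0, 1, 0]
ensemble_labels = majority_vote([clf_a, clf_b, clf_c])
```

Using an odd number of base classifiers on a two-class problem avoids ties, so the tie-breaking rule only matters for even-sized or multi-class ensembles.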
|
646
|
Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.08.060] [Citation(s) in RCA: 153] [Impact Index Per Article: 17.0] [Indexed: 11/17/2022]
|
647
|
Bao L, Juan C, Li J, Zhang Y. Boosted Near-miss Under-sampling on SVM ensembles for concept detection in large-scale imbalanced datasets. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2014.05.096] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 10/23/2022]
|
648
|
The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers. INT J MACH LEARN CYB 2015. [DOI: 10.1007/s13042-015-0478-7] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 11/26/2022]
|
649
|
Díez-Pastor JF, Rodríguez JJ, García-Osorio CI, Kuncheva LI. Diversity techniques improve the performance of the best imbalance learning ensembles. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2015.07.025] [Citation(s) in RCA: 120] [Impact Index Per Article: 12.0] [Indexed: 11/17/2022]
|
650
|
|