451
|
Wang Z, Cao C. Cascade interpolation learning with double subspaces and confidence disturbance for imbalanced problems. Neural Netw 2019; 118:17-31. [DOI: 10.1016/j.neunet.2019.06.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/14/2018] [Revised: 04/04/2019] [Accepted: 06/03/2019] [Indexed: 10/26/2022]
|
452
|
Altalhi A, Forcén J, Pagola M, Barrenechea E, Bustince H, Takáč Z. Moderate deviation and restricted equivalence functions for measuring similarity between data. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.05.078] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
|
453
|
Chan TK, Chin CS. Health stages diagnostics of underwater thruster using sound features with imbalanced dataset. Neural Comput Appl 2019. [DOI: 10.1007/s00521-018-3407-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/17/2022]
|
454
|
Towards Real-Time Prediction of Freezing of Gait in Patients With Parkinson's Disease: Addressing the Class Imbalance Problem. SENSORS 2019; 19:s19183898. [PMID: 31509999 PMCID: PMC6767263 DOI: 10.3390/s19183898] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/15/2019] [Revised: 09/05/2019] [Accepted: 09/08/2019] [Indexed: 01/06/2023]
Abstract
Freezing of gait (FoG) is a common motor symptom in patients with Parkinson's disease (PD). FoG impairs gait initiation and walking and increases fall risk. Intelligent external cueing systems implementing FoG detection algorithms have been developed to help patients recover gait after freezing. However, predicting FoG before its occurrence enables preemptive cueing and may prevent FoG. Such prediction remains challenging given the relative infrequency of freezing compared to non-freezing events. In this study, we investigated the ability of individual and ensemble classifiers to predict FoG. We also studied the effect of the ADAptive SYNthetic (ADASYN) sampling algorithm and classification cost on classifier performance. Eighteen PD patients performed a series of daily walking tasks wearing accelerometers on their ankles, with nine experiencing FoG. The ensemble classifier formed by Support Vector Machines, K-Nearest Neighbors, and Multi-Layer Perceptron using bagging techniques demonstrated the highest performance (F1 = 90.7) when synthetic FoG samples were added to the training set and the FoG class cost was set to twice that of normal gait. The model identified 97.4% of the events, with 66.7% being predicted. This study demonstrates our algorithm's potential for accurate prediction of gait events and the provision of preventive cueing in spite of limited event frequency.
|
455
|
Classifying imbalanced data using BalanceCascade-based kernelized extreme learning machine. Pattern Anal Appl 2019. [DOI: 10.1007/s10044-019-00844-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
|
456
|
|
457
|
Classifying imbalanced data using ensemble of reduced kernelized weighted extreme learning machine. INT J MACH LEARN CYB 2019. [DOI: 10.1007/s13042-019-01001-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/26/2022]
|
458
|
Provenza NR, Paulk AC, Peled N, Restrepo MI, Cash SS, Dougherty DD, Eskandar EN, Borton DA, Widge AS. Decoding task engagement from distributed network electrophysiology in humans. J Neural Eng 2019; 16:056015. [PMID: 31419211 PMCID: PMC6765221 DOI: 10.1088/1741-2552/ab2c58] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/07/2023]
Abstract
OBJECTIVE Here, our objective was to develop a binary decoder to detect task engagement in humans during two distinct, conflict-based behavioral tasks. Effortful, goal-directed decision-making requires the coordinated action of multiple cognitive processes, including attention, working memory and action selection. That type of mental effort is often dysfunctional in mental disorders, e.g. when a patient attempts to overcome a depression or anxiety-driven habit but feels unable. If the onset of engagement in this type of focused mental activity could be reliably detected, decisional function might be augmented, e.g. through neurostimulation. However, there are no known algorithms for detecting task engagement with rapid time resolution. APPROACH We defined a new network measure, fixed canonical correlation (FCCA), specifically suited for neural decoding applications. We extracted FCCA features from local field potential recordings in human volunteers to give a temporally continuous estimate of mental effort, defined by engagement in experimental conflict tasks. MAIN RESULTS Using a small number of features per participant, we accurately decoded and distinguished task engagement from other mental activities. Further, the decoder distinguished between engagement in two different conflict-based tasks within seconds of their onset. SIGNIFICANCE These results demonstrate that network-level brain activity can detect specific types of mental efforts. This could form the basis of a responsive intervention strategy for decision-making deficits.
Affiliation(s)
- Nicole R Provenza: Brown University School of Engineering, Providence, RI, United States of America; Charles Stark Draper Laboratory, Cambridge, MA, United States of America
- Angelique C Paulk: Massachusetts General Hospital Neurosurgery Research, Boston, MA, United States of America; Massachusetts General Hospital Neurology, Boston, MA, United States of America
- Noam Peled: MGH/HST Martinos Center for Biomedical Imaging, Charlestown, MA, United States of America
- Maria I Restrepo: Center for Computation and Visualization, Brown University, Providence, RI 02912, United States of America
- Sydney S Cash: Massachusetts General Hospital Neurology, Boston, MA, United States of America
- Darin D Dougherty: Massachusetts General Hospital Psychiatry, Boston, MA, United States of America
- Emad N Eskandar: Massachusetts General Hospital Neurosurgery Research, Boston, MA, United States of America; Present affiliation: Chair, Department of Neurological Surgery, Montefiore Medical Center, New York, NY, United States of America
- David A Borton: Brown University School of Engineering, Providence, RI, United States of America; Carney Institute for Brain Science, Providence, RI, United States of America; Department of Veterans Affairs, Providence Medical Center, Center for Neurorestoration and Neurotechnology, Providence, RI, United States of America
- Alik S Widge: Massachusetts General Hospital Psychiatry, Boston, MA, United States of America; Present affiliation: Department of Psychiatry, University of Minnesota, Minneapolis, MN, United States of America
|
459
|
Alobaidi MH, Meguid MA, Chebana F. Predicting seismic-induced liquefaction through ensemble learning frameworks. Sci Rep 2019; 9:11786. [PMID: 31409827 PMCID: PMC6692379 DOI: 10.1038/s41598-019-48044-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/29/2019] [Accepted: 07/22/2019] [Indexed: 11/24/2022] [Open Access]
Abstract
The regional nature of liquefaction records and limited information available for a certain set of explanatories motivate the development of complex prediction techniques. Indirect methods are commonly applied to incidentally derive a hyperplane to this binary classification problem. Machine learning approaches offer evolutionary prediction models which can be used as direct prediction methods to liquefaction occurrence. Ensemble learning is a recent advancement in this field. According to a predefined ensemble architecture, a number of learners are trained and their inferences are integrated to produce stable and improved generalization ability. However, there is a need to consider several aspects of the ensemble learning frameworks when exploiting them for a particular application; a comprehensive evaluation of an ensemble learner’s generalization ability is required but usually overlooked. Also, the literature falls short on work utilizing ensemble learning in liquefaction prediction. To this extent, this work examines useful ensemble learning approaches for seismic-induced liquefaction prediction. A comprehensive analysis of fifteen ensemble models is performed. The results show improved prediction performance and diminishing uncertainty of ensembles, compared with single machine learning models.
Affiliation(s)
- Mohammad H Alobaidi: Civil Engineering and Applied Mechanics, McGill University, 817 Sherbrooke Street West, Montréal, QC, H3A 0C3, Canada
- Mohamed A Meguid: Civil Engineering and Applied Mechanics, McGill University, 817 Sherbrooke Street West, Montréal, QC, H3A 0C3, Canada
- Fateh Chebana: Eau Terre Environnement, Institut National de la Recherche Scientifique, 490 Rue de la Couronne, Québec, QC, G1K 9A9, Canada
|
460
|
Ensemble Bagged Tree Based Classification for Reducing Non-Technical Losses in Multan Electric Power Company of Pakistan. ELECTRONICS 2019. [DOI: 10.3390/electronics8080860] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022]
Abstract
Non-technical losses (NTLs) have been a major concern for power distribution companies (PDCs). Billions of dollars are lost each year due to fraud in billing, metering, and illegal consumer activities. Various studies have explored different methodologies for efficiently identifying fraudulent consumers. This study proposes a new approach for NTL detection in PDCs by using the ensemble bagged tree (EBT) algorithm. The bagged tree is an ensemble of many decision trees that considerably improves on the classification performance of the individual trees by combining their predictions to reach a final decision. This approach relies on consumer energy usage data to identify any abnormality in consumption that could be associated with NTL behavior. The key motive of the current study is to provide assistance to the Multan Electric Power Company (MEPCO) in Punjab, Pakistan for its campaign against energy stealers. The model developed in this study generates a list of suspicious consumers with irregularities in consumption data to be further examined on-site. The accuracy of the EBT algorithm for NTL detection is found to be 93.1%, which is considerably higher than that of conventional techniques such as support vector machine (SVM), k-nearest neighbor (KNN), decision trees (DT), and random forest (RF) algorithms.
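The bagging idea described in this abstract (train many trees on bootstrap resamples, then combine their votes) can be illustrated with a toy sketch. This is not the study's EBT model: the one-feature "stump" learner, the data, and the names `fit_stump`, `bagged_fit` and `bagged_predict` are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-feature threshold rule: predict 1 when x > threshold.

    Simplifying assumption: class 1 tends to have the larger feature values.
    """
    values = sorted(set(xs))
    if len(set(ys)) == 1 or len(values) < 2:
        # Degenerate bootstrap sample: fall back to its most common label.
        label = Counter(ys).most_common(1)[0][0]
        return lambda x: label
    best_t, best_err = None, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: int(x > best_t)

def bagged_fit(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagged_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Toy data: one feature, label 1 for the upper half of the range.
xs = [float(v) for v in range(10)]
ys = [0 if v < 5 else 1 for v in range(10)]
models = bagged_fit(xs, ys)
low_vote = bagged_predict(models, 0.5)
high_vote = bagged_predict(models, 9.5)
```

Each stump sees a slightly different resample, so individual thresholds vary; the vote smooths out that variation, which is the effect the abstract attributes to the bagged ensemble.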
|
461
|
Fernandes ER, de Carvalho AC. Evolutionary inversion of class distribution in overlapping areas for multi-class imbalanced learning. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.04.052] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 10/26/2022]
|
462
|
A fast and accurate approach for bankruptcy forecasting using squared logistics loss with GPU-based extreme gradient boosting. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.04.060] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 11/21/2022]
|
463
|
|
464
|
Lambert JPT, Childs DZ, Freckleton RP. Testing the ability of unmanned aerial systems and machine learning to map weeds at subfield scales: a test with the weed Alopecurus myosuroides (Huds). PEST MANAGEMENT SCIENCE 2019; 75:2283-2294. [PMID: 30972939 PMCID: PMC6767585 DOI: 10.1002/ps.5444] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/28/2018] [Revised: 04/03/2019] [Accepted: 04/08/2019] [Indexed: 06/09/2023]
Abstract
BACKGROUND It is important to map agricultural weed populations to improve management and maintain future food security. Advances in data collection and statistical methodology have created new opportunities to aid in the mapping of weed populations. We set out to apply these new methodologies (unmanned aerial systems; UAS) and statistical techniques (convolutional neural networks; CNN) to the mapping of black-grass, a highly impactful weed in wheat fields in the UK. We tested this by undertaking extensive UAS and field-based mapping over the course of 2 years, in total collecting multispectral image data from 102 fields, with 76 providing informative data. We used these data to construct a vegetation index (VI), which we used to train a custom CNN model from scratch. We undertook a suite of data engineering techniques, such as balancing and cleaning, to optimize performance of our metrics. We also investigate the transferability of the models from one field to another. RESULTS The results show that our data collection methodology and implementation of CNN outperform previous approaches in the literature. We show that data engineering to account for 'artefacts' in the image data increases our metrics significantly. We are not able to identify any traits that are shared between fields that result in high scores from our novel leave-one-field-out cross-validation (LOFO-CV) tests. CONCLUSION We conclude that this evaluation procedure is a better estimation of real-world predictive value when compared with past studies. We conclude that by engineering the image data set into discrete classes of data quality we increase the prediction accuracy from the baseline model by 5% to an area under the curve (AUC) of 0.825. We find that the temporal effects studied here have no effect on our ability to model weed densities. © 2019 The Authors. Pest Management Science published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.
Affiliation(s)
- James PT Lambert: Department of Animal & Plant Science, University of Sheffield, Sheffield, U.K.
- Dylan Z Childs: Department of Animal & Plant Science, University of Sheffield, Sheffield, U.K.
- Rob P Freckleton: Department of Animal & Plant Science, University of Sheffield, Sheffield, U.K.
|
465
|
The Machine Learning-Based Dropout Early Warning System for Improving the Performance of Dropout Prediction. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9153093] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022]
Abstract
A dropout early warning system enables schools to preemptively identify students who are at risk of dropping out of school, to promptly react to them, and eventually to help potential dropout students to continue their learning for a better future. However, the inherent class imbalance between dropout and non-dropout students could pose difficulty in building accurate predictive models for a dropout early warning system. The present study aimed to improve the performance of a dropout early warning system: (a) by addressing the class imbalance issue using the synthetic minority oversampling technique (SMOTE) and ensemble methods in machine learning; and (b) by evaluating the trained classifiers with both receiver operating characteristic (ROC) and precision–recall (PR) curves. To that end, we trained random forest, boosted decision tree, random forest with SMOTE, and boosted decision tree with SMOTE using the big data samples of 165,715 high school students from the National Education Information System (NEIS) in South Korea. According to our ROC and PR curve analysis, boosted decision tree showed the optimal performance.
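For readers unfamiliar with SMOTE, the core step is interpolation between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that step only, on made-up two-feature data; the function name `smote`, its parameters, and the data are illustrative assumptions, not the study's pipeline.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region the minority class already occupies, which is why SMOTE is usually applied to the training set only.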
|
466
|
Ruiz-Pérez I, Ayala F, Puerta JM, Elvira JLL, De Ste Croix M, Hernández-Sánchez S, Vera-Garcia FJ. A Bayesian Network approach to study the relationships between several neuromuscular performance measures and dynamic postural control in futsal players. PLoS One 2019; 14:e0220065. [PMID: 31344068 PMCID: PMC6657865 DOI: 10.1371/journal.pone.0220065] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/14/2019] [Accepted: 07/08/2019] [Indexed: 11/18/2022] [Open Access]
Abstract
Purpose The purpose of this study was to analyse the relationship between several parameters of neuromuscular performance with dynamic postural control using a Bayesian Network Classifiers (BN) based analysis. Methods The y-balance test (measure of dynamic postural control), isokinetic (concentric and eccentric) knee flexion and extension strength, isometric hip abduction and adduction strength, lower extremity joint range of motion (ROM) and core stability were assessed in 44 elite male futsal players. A feature selection process was carried out before building a BN (using the Tabu search algorithm) for each leg. The BN models built were used to make belief updating processes to study the individual and concurrent contributions of the selected parameters of neuromuscular performance on dynamic postural control. Results The BNs generated using the selected features by the algorithms correlation attribute evaluator and chi squared reported the highest evaluation criteria (area under the receiver operating characteristic curve [AUC]) for the dominant (AUC = 0.899) and non-dominant (AUC = 0.879) legs, respectively. Conclusions The BNs demonstrated that performance achieved in the y-balance test appears to be widely influenced by hip and knee flexion and ankle dorsiflexion ROM measures in the sagittal plane, as well as by measures of static but mainly dynamic core stability in the frontal plane. Therefore, training interventions aimed at improving or maintaining dynamic postural control in elite male futsal players should include, among other things, exercises that produce ROM scores equal or higher than 127° of hip flexion, 132.5° of knee flexion as well as 34° and 30.5° of ankle dorsiflexion with the knee flexed and extended, respectively. Likewise, these training interventions should also include exercises to maintain or improve both the static and dynamic (medial-lateral plane) core stability so that futsal players can achieve medial radial error values lower than 6.69 and 8.79 mm, respectively.
Affiliation(s)
- Iñaki Ruiz-Pérez: Department of Sport Sciences, Sports Research Centre, Miguel Hernandez University of Elche, Elche, Spain
- Francisco Ayala: Department of Sport Sciences, Sports Research Centre, Miguel Hernandez University of Elche, Elche, Spain; Postdoctoral fellow from Seneca Foundation, Murcia, Spain
- José Miguel Puerta: Department of Computer Systems, University of Castilla-La Mancha, Albacete, Spain
- Jose L. L. Elvira: Department of Sport Sciences, Sports Research Centre, Miguel Hernandez University of Elche, Elche, Spain
- Mark De Ste Croix: School of Sport and Exercise, University of Gloucestershire, Gloucester, United Kingdom
- Sergio Hernández-Sánchez: Department of Pathology and Surgery, Physiotherapy Area, Miguel Hernandez University of Elche, Alicante, Spain
- Francisco Jose Vera-Garcia: Department of Sport Sciences, Sports Research Centre, Miguel Hernandez University of Elche, Elche, Spain
|
467
|
Fahandezi Sadi M, Ansari E, Afsharchi M. Supervised word sense disambiguation using new features based on word embeddings. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-182868] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Affiliation(s)
- Majid Fahandezi Sadi: Department of Computer Engineering, University of Zanjan, University of Zanjan Blvd. Zanjan, Iran
- Ebrahim Ansari: Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran; Research Center for Basic Sciences & Modern Technologies (RBST), Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Mohsen Afsharchi: Department of Computer Engineering, University of Zanjan, University of Zanjan Blvd. Zanjan, Iran
|
468
|
Predicting Seminal Quality via Imbalanced Learning with Evolutionary Safe-Level Synthetic Minority Over-Sampling Technique. Cognit Comput 2019. [DOI: 10.1007/s12559-019-09657-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
|
469
|
|
470
|
Alaba PA, Popoola SI, Olatomiwa L, Akanle MB, Ohunakin OS, Adetiba E, Alex OD, Atayero AA, Wan Daud WMA. Towards a more efficient and cost-sensitive extreme learning machine: A state-of-the-art review of recent trend. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.03.086] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 10/27/2022]
|
471
|
Bader-El-Den M, Teitei E, Perry T. Biased Random Forest For Dealing With the Class Imbalance Problem. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:2163-2172. [PMID: 30475733 DOI: 10.1109/tnnls.2018.2878400] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 06/09/2023]
Abstract
The class imbalance issue has been a persistent problem in machine learning that hinders the accurate predictive analysis of data in many real-world applications. The class imbalance problem exists when the number of instances present in a class (or classes) is significantly fewer than the number of instances belonging to another class (or classes). Sufficiently recognizing the minority class during classification is a problem, as most algorithms employed to learn from data input are biased toward the majority class. The underlying issue is made more complex by the presence of data difficulty factors embedded in such data input. This paper presents a novel and effective ensemble-based method for dealing with the class imbalance problem. This paper is motivated by the idea of moving the oversampling from the data level to the algorithm level: instead of increasing the minority instances in the data sets, the algorithm in this paper aims to "oversample the classification ensemble" by increasing the number of classifiers that represent the minority class in the ensemble, i.e., random forest. The proposed biased random forest algorithm employs the nearest neighbor algorithm to identify the critical areas in a given data set. The standard random forest is then fed with more random trees generated based on the critical areas. The results show that the proposed algorithm is very effective in dealing with the class imbalance problem.
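The abstract does not spell out how the nearest neighbor step defines a "critical area". One plausible reading, sketched here purely as an assumption, is to collect each minority instance together with its k nearest majority-class neighbours; additional trees could then be trained on that subset. The function name `critical_area`, the parameter `k`, and the data are hypothetical.

```python
def critical_area(majority, minority, k=2):
    """Return the minority points plus, for each one, its k nearest
    majority points (squared Euclidean distance).

    Assumption: this is only one interpretation of "critical areas";
    the abstract does not give the exact rule.
    """
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    critical = set(minority)
    for m in minority:
        nearest = sorted(majority, key=lambda p: dist2(p, m))[:k]
        critical.update(nearest)
    return critical

# Toy data: one minority point sitting close to part of the majority class.
majority = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (5.0, 5.0), (6.0, 5.5)]
minority = [(1.1, 0.9)]
area = critical_area(majority, minority, k=2)
```

Under this reading the distant majority points at (5.0, 5.0) and (6.0, 5.5) are excluded, so trees grown on `area` concentrate on the region where the two classes meet.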
|
472
|
Lee HK, Jin R, Feng Y, Bain PA, Goffinet J, Baker C, Li J. An Analytical Framework for TJR Readmission Prediction and Cost-Effective Intervention. IEEE J Biomed Health Inform 2019; 23:1760-1772. [DOI: 10.1109/jbhi.2018.2859581] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/08/2022]
|
473
|
Mohamed MF, Shabayek AER, El-Gayyar M, Nassar H. An adaptive framework for real-time data reduction in AMI. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2019. [DOI: 10.1016/j.jksuci.2018.02.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/17/2022]
|
474
|
Combining clustering and classification ensembles: A novel pipeline to identify breast cancer profiles. Artif Intell Med 2019; 97:27-37. [PMID: 31202397 DOI: 10.1016/j.artmed.2019.05.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 08/06/2017] [Revised: 04/01/2019] [Accepted: 05/08/2019] [Indexed: 11/23/2022]
Abstract
Breast Cancer is one of the most common causes of cancer death in women, representing a very complex disease with varied molecular alterations. To assist breast cancer prognosis, the classification of patients into biological groups is of great significance for treatment strategies. Recent studies have used an ensemble of multiple clustering algorithms to elucidate the most characteristic biological groups of breast cancer. However, the combination of various clustering methods resulted in a number of patients remaining unclustered. Therefore, a framework still needs to be developed which can assign as many unclustered (i.e. biologically diverse) patients to one of the identified groups in order to improve classification. Therefore, in this paper we develop a novel classification framework which introduces a new ensemble classification stage after the ensemble clustering stage to target the unclustered patients. Thus, a step-by-step pipeline is introduced which couples ensemble clustering with ensemble classification for the identification of core groups, data distribution in them and improvement in final classification results by targeting the unclustered data. The proposed pipeline is employed on a novel real world breast cancer dataset and subsequently its robustness and stability are examined by testing it on standard datasets. The results show that by using the presented framework, an improved classification is obtained. Finally, the results have been verified using statistical tests, visualisation techniques, cluster quality assessment and interpretation from clinical experts.
|
475
|
Gupta K, Bhavsar A, Sao AK. Detecting mitotic cells in HEp-2 images as anomalies via one class classifier. Comput Biol Med 2019; 111:103328. [PMID: 31326866 DOI: 10.1016/j.compbiomed.2019.103328] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/12/2018] [Revised: 06/12/2019] [Accepted: 06/12/2019] [Indexed: 11/18/2022]
Abstract
We propose a novel framework for classification of mitotic vs. non-mitotic cells in a Computer Aided Diagnosis (CAD) system for Anti-Nuclear Antibodies (ANA) detection. In the proposed work, due to the unique characteristics (the rare occurrence) of the mitotic cells, their identification is posed as an anomaly detection problem. This resolves the issue of data imbalance, which can arise in the traditional binary classification paradigm for mitotic vs. non-mitotic cell image classification. Here, the characteristics of only non-mitotic/interphase cells are captured using a well-defined feature representation to characterize the non-mitotic class distribution well, and the mitotic class is posed as an anomalous class. This framework requires training data only for the majority (non-mitotic) class to build the classification model. The feature representation of the non-mitotic class includes morphology, texture, and Convolutional Neural Network (CNN) based feature representations, coupled with Bag-of-Words (BoW) and Spatial Pyramid Pooling (SPP) based summarization techniques. For classification, in this work, we employ One-Class Support Vector Machines (OC-SVM). The proposed classification framework is validated on a publicly available dataset, and across various experiments, we demonstrate comparable or better performance over binary classification, attaining 0.99 (max.) F-Score in one case. The proposed framework proves to be an effective way to solve the stated problem, in which one of the classes has few samples.
Affiliation(s)
- Krati Gupta: School of Computing & Electrical Engineering, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
- Arnav Bhavsar: School of Computing & Electrical Engineering, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
- Anil K Sao: School of Computing & Electrical Engineering, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
|
476
|
Tackling the Problem of Class Imbalance in Multi-class Sentiment Classification: An Experimental Study. FOUNDATIONS OF COMPUTING AND DECISION SCIENCES 2019. [DOI: 10.2478/fcds-2019-0009] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022] [Open Access]
Abstract
Sentiment classification is an important task that has gained extensive attention both in academia and in industry. Many issues related to this task, such as handling of negation or of sarcastic utterances, were analyzed and accordingly addressed in previous works. However, the issue of class imbalance, which often compromises the prediction capabilities of learning algorithms, has scarcely been studied. In this work, we aim to bridge the gap between imbalanced learning and sentiment analysis. An experimental study including twelve imbalanced learning preprocessing methods, four feature representations, and a dozen datasets is carried out in order to analyze the usefulness of imbalanced learning methods for sentiment classification. Moreover, the data difficulty factors commonly studied in imbalanced learning are investigated on sentiment corpora to evaluate the impact of class imbalance.
|
477
|
Tao X, Li Q, Guo W, Ren C, Li C, Liu R, Zou J. Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.02.062] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Indexed: 10/27/2022]
|
478
|
O’Brien R, Ishwaran H. A Random Forests Quantile Classifier for Class Imbalanced Data. PATTERN RECOGNITION 2019; 90:232-249. [PMID: 30765897 PMCID: PMC6370055 DOI: 10.1016/j.patcog.2019.01.036] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Indexed: 05/29/2023]
Abstract
Extending previous work on quantile classifiers (q-classifiers) we propose the q*-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds 0 < q* < 1, where q* equals the unconditional probability of observing a minority class sample. The motivation for q*-classification stems from a density-based approach and leads to the useful property that the q*-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the q*-classifier can achieve near zero risk in imbalance problems, while simultaneously optimizing true positive and true negative rates. We use random forests to apply q*-classification. This new method, which we call RFQ, is shown to outperform or be competitive with existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.
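The decision rule stated in this abstract is simple enough to sketch directly: estimate q* as the minority-class prevalence and use it, rather than 0.5, as the probability threshold. In the sketch below the probabilities are made-up stand-ins for a forest's out-of-bag or test-set estimates, and the helper names `q_star` and `q_classify` are hypothetical.

```python
def q_star(labels, minority=1):
    """Unconditional minority-class probability, estimated from the labels."""
    return sum(1 for y in labels if y == minority) / len(labels)

def q_classify(probs, threshold):
    """Assign class 1 (minority) when the estimated minority-class
    probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]

# Toy training labels: 2 minority samples out of 20, so q* = 0.1.
train_labels = [0] * 18 + [1] * 2
q = q_star(train_labels)

# Hypothetical minority-class probabilities for four new samples.
probs = [0.05, 0.15, 0.40, 0.60]
default_rule = q_classify(probs, 0.5)  # conventional 0.5 threshold
q_star_rule = q_classify(probs, q)     # threshold lowered to the prevalence
```

With the conventional threshold only the last sample is assigned to the minority class; lowering the threshold to q* also flags the second and third, which is how the rule trades some true negatives for true positives under heavy imbalance.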
Affiliation(s)
- Robert O’Brien
- Division of Biostatistics, University of Miami, Miami, FL 33136, USA
| | - Hemant Ishwaran
- Division of Biostatistics, University of Miami, Miami, FL 33136, USA
| |
|
479
|
Magge A, Sarker A, Nikfarjam A, Gonzalez-Hernandez G. Comment on: "Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts". J Am Med Inform Assoc 2019; 26:577-579. [PMID: 31087070 PMCID: PMC6515520 DOI: 10.1093/jamia/ocz013] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/28/2018] [Accepted: 01/21/2019] [Indexed: 11/12/2022] Open
Affiliation(s)
- Arjun Magge
- College of Health Solutions, Arizona State University, Scottsdale, Arizona, USA
| | - Abeed Sarker
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Azadeh Nikfarjam
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | | |
|
480
|
Fan Q, Wang Z, Gao D. Locality Density-Based Fuzzy Multiple Empirical Kernel Learning. Neural Process Lett 2019. [DOI: 10.1007/s11063-018-9881-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
481
|
Dai HJ, Wang CK. Classifying adverse drug reactions from imbalanced twitter data. Int J Med Inform 2019; 129:122-132. [PMID: 31445246 DOI: 10.1016/j.ijmedinf.2019.05.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/16/2018] [Revised: 04/07/2019] [Accepted: 05/21/2019] [Indexed: 10/26/2022]
Abstract
BACKGROUND: Nowadays, social media are often used by the general public to create and share public messages related to their health. With the global increase in social media usage, there is a trend of posting information related to adverse drug reactions (ADR). Mining social media data for this type of information will be helpful for pharmacological post-marketing surveillance and monitoring. Although the concept of using social media to facilitate pharmacovigilance is convincing, construction of automatic ADR detection systems remains a challenge because the corpora compiled from social media tend to be highly imbalanced, posing a major obstacle to the development of classifiers with reliable performance.
METHODS: Several methods have been proposed to address the challenge of imbalanced corpora. However, we are not aware of any studies that investigated the effectiveness of strategies for dealing with imbalanced data in the context of ADR detection from social media. In light of this, we evaluated a variety of techniques for imbalanced data and proposed a novel word embedding-based synthetic minority over-sampling technique (WESMOTE), which synthesizes new training examples from the sentence representation based on word embeddings. We compared the performance of all methods on two large imbalanced datasets released for the purpose of detecting ADR posts.
RESULTS: In comparison with the state-of-the-art approaches, the classifiers that incorporated imbalanced classification techniques achieved comparable or better F-scores. All of our best performing configurations combined random under-sampling with techniques including the proposed WESMOTE, boosting and ensemble, implying that an integration of these approaches with under-sampling provides a reliable solution for large imbalanced social media datasets. Furthermore, ensemble-based methods like vote-based under-sampling (VUE) and random under-sampling boosting can be alternatives to the hybrid synthetic methods because both methods increase the diversity of the created weak classifiers, leading to better recall and overall F-scores for the minority classes.
CONCLUSIONS: Data collected from social media are usually very large and highly imbalanced. To maximize the performance of a classifier trained on such data, strategies for handling imbalance are required. We considered several practical methods for handling imbalanced Twitter data along with their performance on the binary classification task with respect to ADRs. In conclusion, the following practical insights are gained: 1) When dealing with text classification, the proposed word embedding-based synthetic minority over-sampling technique is more effective than traditional synthetic-based over-sampling methods. 2) In cases where large amounts of training data are available, strategies for handling imbalance combined with under-sampling techniques are preferred. 3) Finally, employment of advanced methods does not guarantee better performance than simpler ones such as VUE, which achieved high performance with advantages like faster building time and ease of development.
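The abstract describes WESMOTE only at a high level (synthesizing minority examples from sentence representations built on word embeddings), so the following is a generic SMOTE-style sketch under stated assumptions rather than the authors' method. It assumes a sentence is represented by the mean of its word vectors and that a synthetic example is a random point on the segment between a minority sentence vector and its nearest minority neighbor. The two-dimensional `toy_embeddings`, the example posts, and all function names are hypothetical.

```python
import random

def sentence_vector(tokens, embeddings):
    """Average the word vectors of the tokens that have an embedding."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def nearest_neighbor(x, candidates):
    """Return the candidate closest to x in squared Euclidean distance."""
    return min(candidates, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, c)))

def synthesize(x, neighbor, rng):
    """SMOTE-style interpolation: a random point on the segment from x to neighbor."""
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

# Hypothetical 2-d word embeddings; real embeddings would have hundreds of dimensions.
toy_embeddings = {
    "drug": [1.0, 0.0],
    "caused": [0.0, 1.0],
    "rash": [1.0, 1.0],
    "nausea": [0.0, 0.0],
}
# Invented minority-class (ADR) posts, already tokenized.
minority_posts = [
    ["drug", "caused", "rash"],
    ["drug", "caused", "nausea"],
    ["rash", "nausea"],
]

vectors = [sentence_vector(p, toy_embeddings) for p in minority_posts]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
x = vectors[0]
neighbor = nearest_neighbor(x, vectors[1:])
synthetic = synthesize(x, neighbor, rng)
print(synthetic)
```

Standard SMOTE samples among the k nearest minority neighbors; the sketch uses a single nearest neighbor only to stay short.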
Affiliation(s)
- Hong-Jie Dai
- Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan, Republic of China; Post Baccalaureate Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan, Republic of China.
| | - Chen-Kai Wang
- Big Data laboratories of Chunghwa Telecom Laboratories, Taoyuan, Taiwan, Republic of China.
| |
|
482
|
Geng K, Shin DC, Song D, Hampson RE, Deadwyler SA, Berger TW, Marmarelis VZ. Multi-Input, Multi-Output Neuronal Mode Network Approach to Modeling the Encoding Dynamics and Functional Connectivity of Neural Systems. Neural Comput 2019; 31:1327-1355. [PMID: 31113305 DOI: 10.1162/neco_a_01204] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022]
Abstract
This letter proposes a novel method, multi-input, multi-output neuronal mode network (MIMO-NMN), for modeling encoding dynamics and functional connectivity in neural ensembles such as the hippocampus. Compared with conventional approaches such as the Volterra-Wiener model, linear-nonlinear-cascade (LNC) model, and generalized linear model (GLM), the NMN has several advantages in terms of estimation accuracy, model interpretation, and functional connectivity analysis. We point out the limitations of current neural spike modeling methods, especially the estimation biases caused by the imbalanced class problem when the number of zeros is significantly larger than the number of ones in the spike data. We use synthetic data to test the performance of NMN in comparison with the traditional methods, and the results indicate the NMN approach could reduce the imbalanced class problem and achieve better predictions. Subsequently, we apply the MIMO-NMN method to analyze data from the human hippocampus. The results indicate that the MIMO-NMN method is a promising approach to modeling neural dynamics and analyzing functional connectivity of multi-neuronal data.
Affiliation(s)
- Kunling Geng
- Department of Biomedical Engineering and Biomedical Simulations Resource Center, University of Southern California, Los Angeles, CA, 90089, U.S.A.
| | - Dae C Shin
- Department of Biomedical Engineering and Biomedical Simulations Resource Center, University of Southern California, Los Angeles, CA, 90089, U.S.A.
| | - Dong Song
- Department of Biomedical Engineering and Biomedical Simulations Resource Center, University of Southern California, Los Angeles, CA, 90089, U.S.A.
| | - Robert E Hampson
- Department of Physiology and Pharmacology, Wake Forest School of Medicine, Winston-Salem, NC, 27157, U.S.A.
| | - Samuel A Deadwyler
- Department of Physiology and Pharmacology, Wake Forest School of Medicine, Winston-Salem, NC, 27157, U.S.A.
| | - Theodore W Berger
- Department of Biomedical Engineering and Biomedical Simulations Resource Center, University of Southern California, Los Angeles, CA, 90089, U.S.A.
| | - Vasilis Z Marmarelis
- Department of Biomedical Engineering and Biomedical Simulations Resource Center, University of Southern California, Los Angeles, CA, 90089, U.S.A.
| |
|
483
|
Furxhi I, Murphy F, Mullins M, Poland CA. Machine learning prediction of nanoparticle in vitro toxicity: A comparative study of classifiers and ensemble-classifiers using the Copeland Index. Toxicol Lett 2019; 312:157-166. [PMID: 31102714 DOI: 10.1016/j.toxlet.2019.05.016] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 01/23/2019] [Revised: 04/12/2019] [Accepted: 05/13/2019] [Indexed: 01/22/2023]
Abstract
Nano-Particles (NPs) are well established as important components across a broad range of products from cosmetics to electronics. Their utilization is increasing, with their significant economic and societal potential yet to be fully realized. Inroads have been made in our understanding of the risks posed to human health and the environment by NPs, but this area will require continuous research and monitoring. In recent years, Machine Learning (ML) techniques have exploited large datasets and computation power to create breakthroughs in diverse fields from facial recognition to genomics. More recently, ML techniques have been applied to nanotoxicology with very encouraging results. In this study, categories of ML classifiers (rules, trees, lazy, functions and bayes) were compared using datasets from the Safe and Sustainable Nanotechnology (S2NANO) database to investigate their performance in predicting the in vitro toxicity of NPs. Physicochemical properties, toxicological and quantum-mechanical attributes and in vitro experimental conditions were used as input variables to predict the toxicity of NPs based on cell viability. Voting, an ensemble meta-classifier, was used to combine base models to optimize the classification prediction of toxicity. To facilitate inter-comparison, a Copeland Index was applied that ranks the classifiers according to their performance and suggests the optimal classifier. Neural Network (NN) and Random Forest (RF) showed the best performance in the majority of the datasets used in this study. However, the combination of classifiers improved prediction, with the resulting meta-classifier achieving higher indices. The proposed Copeland Index can now be used by researchers to identify and clearly prioritize classifiers in order to achieve more accurate classification predictions for NP toxicity for a given dataset.
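The abstract does not spell out how its Copeland Index is computed, so the sketch below uses the textbook Copeland rule as an assumption: each pair of classifiers is compared across datasets, the one that scores higher on more datasets wins the pairwise contest, and a classifier's index is its number of pairwise wins minus its pairwise losses. The classifier names and performance values are invented for illustration and are not results from the study.

```python
from itertools import combinations

def copeland_scores(perf):
    """Copeland rule: pairwise wins minus pairwise losses for each classifier.

    perf maps a classifier name to its scores on the same ordered list of datasets.
    """
    scores = {name: 0 for name in perf}
    for a, b in combinations(perf, 2):
        a_wins = sum(x > y for x, y in zip(perf[a], perf[b]))
        b_wins = sum(y > x for x, y in zip(perf[a], perf[b]))
        if a_wins > b_wins:
            scores[a] += 1
            scores[b] -= 1
        elif b_wins > a_wins:
            scores[b] += 1
            scores[a] -= 1
        # A tie in the pairwise contest leaves both scores unchanged.
    return scores

# Invented per-dataset performance values (e.g. accuracy) for three hypothetical classifiers.
perf = {
    "clf_a": [0.80, 0.75, 0.90],
    "clf_b": [0.78, 0.80, 0.85],
    "clf_c": [0.70, 0.72, 0.88],
}

scores = copeland_scores(perf)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Because the rule depends only on pairwise orderings, it can combine results across datasets whose raw performance values are not directly comparable.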
Affiliation(s)
- Irini Furxhi
- Dept. of Accounting and Finance, Kemmy Business School, University of Limerick, V94PH93, Ireland.
| | - Finbarr Murphy
- Dept. of Accounting and Finance, Kemmy Business School, University of Limerick, V94PH93, Ireland.
| | - Martin Mullins
- Dept. of Accounting and Finance, Kemmy Business School, University of Limerick, V94PH93, Ireland.
| | - Craig A Poland
- ELEGI/Colt Laboratory, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, Scotland, United Kingdom.
| |
|
484
|
A Gaussian mixture model based combined resampling algorithm for classification of imbalanced credit data sets. INT J MACH LEARN CYB 2019. [DOI: 10.1007/s13042-019-00953-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 10/26/2022]
|
485
|
Malhotra R, Kamal S. An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.04.090] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 10/27/2022]
|
486
|
Pang Y, Peng L, Chen Z, Yang B, Zhang H. Imbalanced learning based on adaptive weighting and Gaussian function synthesizing with an application on Android malware detection. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.01.065] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/28/2022]
|
487
|
A learning approach to image-based visual servoing with a bagging method of velocity calculations. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.12.082] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
|
488
|
Huang Z, Yang C, Chen X, Huang K, Xie Y. Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04208-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
|
489
|
Escobar GJ, Gupta NR, Walsh EM, Soltesz L, Terry SM, Kipnis P. Automated early detection of obstetric complications: theoretic and methodologic considerations. Am J Obstet Gynecol 2019; 220:297-307. [PMID: 30682365 DOI: 10.1016/j.ajog.2019.01.208] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/12/2018] [Revised: 12/20/2018] [Accepted: 01/10/2019] [Indexed: 12/01/2022]
Abstract
Compared with adults who are admitted to general medical-surgical wards, women who are admitted to labor and delivery services are at much lower risk of experiencing unexpected critical illness. Nonetheless, critical illness and other complications that put either the mother or fetus at risk do occur. One potential approach to prevention is to use automated early warning systems, such as those used for nonpregnant adults. Predictive models that use data extracted in real time from electronic records constitute the cornerstone of such systems. This article addresses several issues that are involved in the development of such predictive models: specification of temporal characteristics, choice of denominator, selection of outcomes for model calibration, potential uses of existing adult severity of illness scores, approaches to data processing, statistical considerations, validation, and options for instantiation. These have not been addressed explicitly in the obstetrics literature, which has focused on the use of manually assigned scores. In addition, this article provides some results from work in progress to develop 2 obstetric predictive models with the use of data from 262,071 women who were admitted to a labor and delivery service at 15 Kaiser Permanente Northern California hospitals between 2010 and 2017.
Affiliation(s)
- Gabriel J Escobar
- Division of Research, Systems Research Initiative, Kaiser Permanente Northern California, Oakland, CA.
| | - Neeru R Gupta
- Department of Obstetrics and Gynecology, Kaiser Permanente Medical Center, Oakland, CA
| | - Eileen M Walsh
- Division of Research, Perinatal Research Unit, Kaiser Permanente Northern California, Oakland, CA
| | - Lauren Soltesz
- Division of Research, Systems Research Initiative, Kaiser Permanente Northern California, Oakland, CA
| | - Stephanie M Terry
- Department of Obstetrics and Gynecology, Kaiser Permanente Medical Center, San Francisco, CA
| | - Patricia Kipnis
- Division of Research, Systems Research Initiative, Kaiser Permanente Northern California, Oakland, CA; Decision Support, Kaiser Foundation Hospitals, Inc, Oakland, CA
| |
|
490
|
Wang Y, Yang L, Yuan C. A robust outlier control framework for classification designed with family of homotopy loss function. Neural Netw 2019; 112:41-53. [DOI: 10.1016/j.neunet.2019.01.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/28/2018] [Revised: 12/07/2018] [Accepted: 01/22/2019] [Indexed: 10/27/2022]
|
491
|
Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2017.12.030] [Citation(s) in RCA: 167] [Impact Index Per Article: 27.8] [Indexed: 11/20/2022]
|
492
|
Removal of batch effects using stratified subsampling of metabolomic data for in vitro endocrine disruptors screening. Talanta 2019; 195:77-86. [DOI: 10.1016/j.talanta.2018.11.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/21/2018] [Revised: 10/31/2018] [Accepted: 11/05/2018] [Indexed: 01/31/2023]
|
493
|
|
494
|
Gal O, Auslander N, Fan Y, Meerzaman D. Predicting Complete Remission of Acute Myeloid Leukemia: Machine Learning Applied to Gene Expression. Cancer Inform 2019; 18:1176935119835544. [PMID: 30911218 PMCID: PMC6423478 DOI: 10.1177/1176935119835544] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/18/2019] [Accepted: 01/29/2019] [Indexed: 11/17/2022] Open
Abstract
Machine learning (ML) is a useful tool for advancing our understanding of the patterns and significance of biomedical data. Given the growing trend in the application of ML techniques in precision medicine, here we present an ML technique which predicts the likelihood of complete remission (CR) in patients diagnosed with acute myeloid leukemia (AML). In this study, we explored the question of whether ML algorithms designed to analyze gene-expression patterns obtained through RNA sequencing (RNA-seq) can be used to accurately predict the likelihood of CR in pediatric AML patients who have received induction therapy. We employed tests of statistical significance to determine which genes were differentially expressed in the samples derived from patients who achieved CR after 2 courses of treatment and the samples taken from patients who did not benefit. We tuned classifier hyperparameters to optimize performance and used multiple methods to guide our feature selection as well as our assessment of algorithm performance. To identify the model which performed best within the context of this study, we plotted receiver operating characteristic (ROC) curves. Using the top 75 genes from the k-nearest neighbors algorithm (K-NN) model (K = 27) yielded the best area-under-the-curve (AUC) score that we obtained: 0.84. When we finally evaluated the models on the previously unseen test data set, the top 50 genes yielded the best AUC = 0.81. Pathway enrichment analysis for these 50 genes showed that the guanosine diphosphate fucose (GDP-fucose) biosynthesis pathway is the most significant with an adjusted P value = .0092, which may suggest the vital role of N-glycosylation in AML.
Affiliation(s)
- Ophir Gal
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Noam Auslander
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA.,Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Yu Fan
- Center for Biomedical Informatics & Information Technology, National Cancer Institute, Rockville, MD, USA
| | - Daoud Meerzaman
- Center for Biomedical Informatics & Information Technology, National Cancer Institute, Rockville, MD, USA
| |
|
495
|
|
496
|
Tsai CF, Lin WC, Hu YH, Yao GT. Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.10.029] [Citation(s) in RCA: 124] [Impact Index Per Article: 20.7] [Indexed: 11/29/2022]
|
497
|
|
498
|
Barteld F, Biemann C, Zinsmeister H. Token-based spelling variant detection in Middle Low German texts. LANG RESOUR EVAL 2019. [DOI: 10.1007/s10579-018-09441-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
|
499
|
Instance selection improves geometric mean accuracy: a study on imbalanced data classification. PROGRESS IN ARTIFICIAL INTELLIGENCE 2019. [DOI: 10.1007/s13748-019-00172-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 11/26/2022]
|
500
|
Fotouhi S, Asadi S, Kattan MW. A comprehensive data level analysis for cancer diagnosis on imbalanced data. J Biomed Inform 2019; 90:103089. [DOI: 10.1016/j.jbi.2018.12.003] [Citation(s) in RCA: 118] [Impact Index Per Article: 19.7] [Received: 10/28/2017] [Revised: 11/02/2018] [Accepted: 12/21/2018] [Indexed: 11/15/2022]
|