Reference Citation Analysis

For: Van Hulse J, Khoshgoftaar T. Knowledge discovery from imbalanced and noisy data. DATA KNOWL ENG 2009;68:1513-42. [DOI: 10.1016/j.datak.2009.08.005] [Citation(s) in RCA: 127] [Impact Index Per Article: 7.9] [Indexed: 11/21/2022]

Cited by Other Article(s)

Ortega Vázquez C, vanden Broucke S, De Weerdt J. A two-step anomaly detection based method for PU classification in imbalanced data sets. Data Min Knowl Discov 2023. [DOI: 10.1007/s10618-023-00925-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2023]

Xia S, Chen B, Wang G, Zheng Y, Gao X, Giem E, Chen Z. mCRF and mRD: Two Classification Methods Based on a Novel Multiclass Label Noise Filtering Learning Framework. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022;33:2916-2930. [PMID: 33428577 DOI: 10.1109/tnnls.2020.3047046] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/12/2023]

Lee D, Kim K. Improved noise-filtering algorithm for AdaBoost using the inter-and intra-class variability of imbalanced datasets. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-213244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Moradi K, Aldarraji Z, Luthra M, Madison GP, Ascoli GA. Normalized unitary synaptic signaling of the hippocampus and entorhinal cortex predicted by deep learning of experimental recordings. Commun Biol 2022;5:418. [PMID: 35513471 PMCID: PMC9072429 DOI: 10.1038/s42003-022-03329-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/18/2021] [Accepted: 03/30/2022] [Indexed: 11/21/2022] [Open Access]

Chou EP, Yang SP. A virtual multi-label approach to imbalanced data classification. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2022.2049820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]

Bi W, Zhang Q. Forecasting mergers and acquisitions failure based on partial-sigmoid neural network and feature selection. PLoS One 2021;16:e0259575. [PMID: 34788332 PMCID: PMC8598039 DOI: 10.1371/journal.pone.0259575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Accepted: 10/22/2021] [Indexed: 11/19/2022] [Open Access]

Dudjak M, Martinović G. An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult. EXPERT SYSTEMS WITH APPLICATIONS 2021;182:115297. [DOI: 10.1016/j.eswa.2021.115297] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]

Shatnawi R. Software fault prediction using machine learning techniques with metric thresholds. INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS 2021. [DOI: 10.3233/kes-210061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Sabeti E, Drews J, Reamaroon N, Warner E, Sjoding MW, Gryak J, Najarian K. Learning Using Partially Available Privileged Information and Label Uncertainty: Application in Detection of Acute Respiratory Distress Syndrome. IEEE J Biomed Health Inform 2021;25:784-796. [PMID: 32750956 DOI: 10.1109/jbhi.2020.3008601] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/09/2022]

Chongomweru H, Kasem A. A novel ensemble method for classification in imbalanced datasets using split balancing technique based on instance hardness (sBal_IH). Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05570-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]

A New Under-Sampling Method to Face Class Overlap and Imbalance. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10155164] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/17/2022]

Nematzadeh Z, Ibrahim R, Selamat A. A hybrid model for class noise detection using k-means and classification filtering algorithms. SN APPLIED SCIENCES 2020. [DOI: 10.1007/s42452-020-3129-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] [Open Access]

Abdulrauf Sharifai G, Zainol Z. Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm. Genes (Basel) 2020;11:genes11070717. [PMID: 32605144 PMCID: PMC7397300 DOI: 10.3390/genes11070717] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/26/2019] [Revised: 12/19/2019] [Accepted: 01/07/2020] [Indexed: 11/16/2022] [Open Access]

Bauder RA, Khoshgoftaar TM. A study on rare fraud predictions with big Medicare claims fraud data. INTELL DATA ANAL 2020. [DOI: 10.3233/ida-184415] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]

Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019;119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 421] [Impact Index Per Article: 70.2] [Indexed: 02/05/2023]

Data preprocessing in predictive data mining. KNOWL ENG REV 2019. [DOI: 10.1017/s026988891800036x] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 11/07/2022]

Emerging topics and challenges of learning from noisy data in nonstandard classification: a survey beyond binary class noise. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1244-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 10/28/2022]

Predicting reference soil groups using legacy data: A data pruning and Random Forest approach for tropical environment (Dano catchment, Burkina Faso). Sci Rep 2018;8:9959. [PMID: 29967391 PMCID: PMC6028482 DOI: 10.1038/s41598-018-28244-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/28/2017] [Accepted: 06/18/2018] [Indexed: 12/02/2022] [Open Access]

Novel mislabeled training data detection algorithm. Neural Comput Appl 2018. [DOI: 10.1007/s00521-016-2589-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]

A bi-objective hybrid algorithm for the classification of imbalanced noisy and borderline data sets. Pattern Anal Appl 2018. [DOI: 10.1007/s10044-018-0693-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/17/2022]

Luengo J, Shim SO, Alshomrani S, Altalhi A, Herrera F. CNC-NOS: Class noise cleaning by ensemble filtering and noise scoring. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2017.10.026] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 10/18/2022]

Cost-sensitive elimination of mislabeled training data. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.03.034] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]

Xiao G, Wu F, Zhou X, Li K. Probabilistic top-k range query processing for uncertain databases. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2016. [DOI: 10.3233/jifs-169040] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]

Dealing with Data Difficulty Factors While Learning from Imbalanced Data. STUDIES IN COMPUTATIONAL INTELLIGENCE 2016. [DOI: 10.1007/978-3-319-18781-5_17] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Indexed: 12/12/2022]

Chen KH, Wang KJ, Adrian AM, Wang KM, Teng NC. Diagnosis of Brain Metastases from Lung Cancer Using a Modified Electromagnetism like Mechanism Algorithm. J Med Syst 2015;40:35. [PMID: 26573656 DOI: 10.1007/s10916-015-0367-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/14/2015] [Accepted: 10/06/2015] [Indexed: 11/26/2022]

Abstract

Brain metastases are commonly found in patients diagnosed with a primary malignancy of the lung. Lung cancer patients with brain metastasis tend to have poor survival, with a median of less than 6 months. Therefore, an early and effective detection system for this disease is needed to help prolong patients' survival and improve their quality of life. A modified electromagnetism-like mechanism (EM) algorithm, MEM-SVM, is proposed by combining the EM algorithm with a support vector machine (SVM) as the classifier and the opposite sign test (OST) as the local search technique. As a preliminary experiment, the proposed method is applied to 44 UCI and IDA datasets and 5 cancer microarray datasets. In addition, the method is tested on 4 public lung cancer microarray datasets. Further, we tested our method on a nationwide dataset of brain metastasis from lung cancer (BMLC) in Taiwan. Because real medical datasets are by nature highly imbalanced, the synthetic minority over-sampling technique (SMOTE) is used to handle this problem. The proposed method is compared against 8 other popular benchmark classifiers and feature selection methods. Performance is evaluated by accuracy and the Kappa index. For the 44 UCI and IDA datasets and 5 cancer microarray datasets, a non-parametric statistical test confirmed that MEM-SVM outperformed the other methods. For the 4 public lung cancer microarray datasets, MEM-SVM still achieved the highest mean accuracy and Kappa index. Because the real BMLC dataset is imbalanced, all methods achieve good accuracy, with no significant difference among them. However, on the balanced BMLC dataset, MEM-SVM appears to be the best method, with higher accuracy and Kappa index. We successfully developed MEM-SVM, combined with SMOTE to handle class imbalance, to predict the occurrence of brain metastasis from lung cancer. The results confirmed that MEM-SVM has good diagnostic power and can be applied as an alternative diagnostic tool alongside other medical tests for the early detection of brain metastasis from lung cancer.
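
The SMOTE step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `smote`, its parameters, and any data passed to it are hypothetical, and the code only shows the commonly described idea of creating a synthetic minority example by interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    minority: list of equal-length numeric feature lists (minority class only).
    k: number of nearest minority neighbours to choose from (hypothetical default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority samples by Euclidean distance to it.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, the minority class grows without duplicating examples exactly, which is the property that distinguishes this approach from plain random oversampling.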

Bekhuis T, Tseytlin E, Mitchell KJ. A Prototype for a Hybrid System to Support Systematic Review Teams: A Case Study of Organ Transplantation. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2015;2015:940-947. [PMID: 26855824 PMCID: PMC4742277 DOI: 10.1109/bibm.2015.7359810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]

Napierala K, Stefanowski J. Types of minority class examples and their influence on learning classifiers from imbalanced data. J Intell Inf Syst 2015. [DOI: 10.1007/s10844-015-0368-1] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 10/23/2022]

Xue JH, Hall P. Why Does Rebalancing Class-Unbalanced Data Improve AUC for Linear Discriminant Analysis? IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2015;37:1109-1112. [PMID: 26353332 DOI: 10.1109/tpami.2014.2359660] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 06/05/2023]

Pan S, Wu J, Zhu X, Zhang C. Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE TRANSACTIONS ON CYBERNETICS 2015;45:940-954. [PMID: 25167562 DOI: 10.1109/tcyb.2014.2341031] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]

Kanj S, Abdallah F, Denœux T, Tout K. Editing training data for multi-label classification with the k-nearest neighbor rule. Pattern Anal Appl 2015. [DOI: 10.1007/s10044-015-0452-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 10/24/2022]

Young WA, Nykl SL, Weckman GR, Chelberg DM. Using Voronoi diagrams to improve classification performances when modeling imbalanced datasets. Neural Comput Appl 2014. [DOI: 10.1007/s00521-014-1780-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]

Frénay B, Verleysen M. Classification in the presence of label noise: a survey. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2014;25:845-869. [PMID: 24808033 DOI: 10.1109/tnnls.2013.2292894] [Citation(s) in RCA: 360] [Impact Index Per Article: 32.7] [Indexed: 06/03/2023]

Wald R, Khoshgoftaar TM, Sloan JC. Feature Selection for Optimization of Wavelet Packet Decomposition in Reliability Analysis of Systems. INT J ARTIF INTELL T 2013. [DOI: 10.1142/s0218213013600117] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]

Parameter-free classification in multi-class imbalanced data sets. DATA KNOWL ENG 2013. [DOI: 10.1016/j.datak.2013.06.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]

Janitza S, Strobl C, Boulesteix AL. An AUC-based permutation variable importance measure for random forests. BMC Bioinformatics 2013;14:119. [PMID: 23560875 PMCID: PMC3626572 DOI: 10.1186/1471-2105-14-119] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Received: 11/23/2012] [Accepted: 03/21/2013] [Indexed: 11/30/2022] [Open Access]

Abstract

Background

The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance.

Results

We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings while both permutation VIMs have equal performance for balanced data settings.

Conclusions

The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html.
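
The idea of an AUC-based permutation importance can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the random forest procedure implemented in the R package party: the names `auc`, `auc_permutation_importance`, `score_fn`, and `n_repeats` are hypothetical, and the importance is measured for a fixed scoring function on a single dataset as the mean drop in AUC after a predictor's values are shuffled.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve: the share of (positive, negative) pairs
    in which the positive example receives the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_permutation_importance(score_fn, X, y, feature, n_repeats=20, seed=0):
    """Mean decrease in AUC after shuffling one feature column.

    score_fn maps a feature row (a list) to a numeric score for class 1.
    """
    rng = random.Random(seed)
    baseline = auc([score_fn(row) for row in X], y)
    drops = []
    for _ in range(n_repeats):
        # Shuffle the chosen column to break its link with the response.
        column = [row[feature] for row in X]
        rng.shuffle(column)
        permuted = [
            list(row[:feature]) + [value] + list(row[feature + 1:])
            for row, value in zip(X, column)
        ]
        drops.append(baseline - auc([score_fn(row) for row in permuted], y))
    return sum(drops) / n_repeats
```

Because the AUC depends only on how positives are ranked relative to negatives, it does not reward a model for simply predicting the majority class, which is the intuition behind expecting such a measure to be more robust to class imbalance than one based on the error rate.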

Zhou L. Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2012.12.007] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.0] [Indexed: 10/27/2022]

Newby D, Freitas AA, Ghafourian T. Coping with Unbalanced Class Data Sets in Oral Absorption Models. J Chem Inf Model 2013;53:461-74. [DOI: 10.1021/ci300348u] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 01/19/2023]

Tomašev N, Mladenić D. Hubness-aware shared neighbor distances for high-dimensional k-nearest neighbor classification. Knowl Inf Syst 2013. [DOI: 10.1007/s10115-012-0607-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]

Overlapping, Rare Examples and Class Decomposition in Learning Classifiers from Imbalanced Data. EMERGING PARADIGMS IN MACHINE LEARNING 2013. [DOI: 10.1007/978-3-642-28699-5_11] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 12/07/2022]

Classifying highly imbalanced ICU data. Health Care Manag Sci 2012;16:119-28. [DOI: 10.1007/s10729-012-9216-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 08/14/2012] [Accepted: 10/08/2012] [Indexed: 11/27/2022]

DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets. DATA KNOWL ENG 2012. [DOI: 10.1016/j.datak.2012.08.001] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/24/2022]

Van Hulse J, Khoshgoftaar TM, Napolitano A. Evaluating the Impact of Data Quality on Sampling. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT 2012. [DOI: 10.1142/s021964921100295x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]

Abstract

Learning from imbalanced training data can be a difficult endeavour, and the task is made even more challenging if the data is of low quality or the size of the training dataset is small. Data sampling is a commonly used method for improving learner performance when data is imbalanced. However, little effort has been put forth to investigate the performance of data sampling techniques when data is both noisy and imbalanced. In this work, we present a comprehensive empirical investigation of the impact of changes in four training dataset characteristics (dataset size, class distribution, noise level and noise distribution) on data sampling techniques. We present the performance of four common data sampling techniques using 11 learning algorithms. The results, which are based on an extensive suite of experiments for which over 15 million models were trained and evaluated, show that: (1) even for relatively clean datasets, class imbalance can still hurt learner performance, (2) data sampling, however, may not improve performance for relatively clean but imbalanced datasets, (3) data sampling can be very effective at dealing with the combined problems of noise and imbalance, (4) both the level and distribution of class noise among the classes are important, as either factor alone does not cause a significant impact, (5) when sampling does improve the learners (i.e. for noisy and imbalanced datasets), RUS and SMOTE are the most effective at improving the AUC, while SMOTE performed well relative to the F-measure, (6) there are significant differences in the empirical results depending on the performance measure used, and hence it is important to consider multiple metrics in this type of analysis, and (7) data sampling rarely hurt the AUC, but only significantly improved performance when data was at least moderately skewed or noisy, while for the F-measure, data sampling often resulted in significantly worse performance when applied to slightly skewed or noisy datasets, but did improve performance when data was either severely noisy or skewed, or contained moderate levels of both noise and imbalance.
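
As an illustration of one of the sampling techniques named in this abstract, the following minimal Python sketch shows random undersampling. It assumes that RUS stands for random undersampling, which the abstract does not spell out, and it is not the authors' experimental code: the function name `random_undersample` and its parameters are hypothetical, and the sketch balances the classes fully, whereas a study may target other class ratios.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly discard examples from larger classes until every class
    has as many examples as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    kept = []
    for label in counts:
        indices = [i for i, lab in enumerate(y) if lab == label]
        # Sampling without replacement keeps all examples of the smallest class.
        kept.extend(rng.sample(indices, target))
    kept.sort()  # preserve the original ordering of the retained examples
    return [X[i] for i in kept], [y[i] for i in kept]
```

Unlike SMOTE, which adds synthetic minority examples, this approach shrinks the training set, so it trades information from the majority class for a balanced class distribution.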

Marques I, Graña M. Face recognition with lattice independent component analysis and extreme learning machines. Soft comput 2012. [DOI: 10.1007/s00500-012-0826-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/29/2022]

García V, Sánchez J, Mollineda R. On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl Based Syst 2012. [DOI: 10.1016/j.knosys.2011.06.013] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.7] [Indexed: 10/18/2022]

Catal C, Alan O, Balkan K. Class noise detection based on software metrics and ROC curves. Inf Sci (N Y) 2011. [DOI: 10.1016/j.ins.2011.06.017] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 12/01/2022]

Segura-Bedmar I, Martínez P, de Pablo-Sánchez C. Using a shallow linguistic kernel for drug–drug interaction extraction. J Biomed Inform 2011;44:789-804. [DOI: 10.1016/j.jbi.2011.04.005] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.4] [Received: 10/08/2010] [Revised: 04/14/2011] [Accepted: 04/19/2011] [Indexed: 11/26/2022]

Kotsiantis SB. Decision trees: a recent overview. Artif Intell Rev 2011. [DOI: 10.1007/s10462-011-9272-4] [Citation(s) in RCA: 194] [Impact Index Per Article: 13.9] [Indexed: 10/18/2022]

Khoshgoftaar TM, Van Hulse J, Napolitano A. Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data. IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS - PART A: SYSTEMS AND HUMANS 2011. [DOI: 10.1109/tsmca.2010.2084081] [Citation(s) in RCA: 185] [Impact Index Per Article: 13.2] [Indexed: 11/06/2022]

Assessing the Impact of Class-Imbalanced Data for Classifying Relevant/Irrelevant Medline Documents. 2011. [DOI: 10.1007/978-3-642-19914-1_45] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]

Tang J, Zhang J. Modeling the evolution of associated data. DATA KNOWL ENG 2010. [DOI: 10.1016/j.datak.2010.03.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]