Reference Citation Analysis

Number

Cited by Other Article(s)

251

Esposito C, Landrum GA, Schneider N, Stiefl N, Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J Chem Inf Model 2021;61:2623-2640. [PMID: 34100609 DOI: 10.1021/acs.jcim.1c00160] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 12/21/2022]

252

Chennuru VK, Timmappareddy SR. Simulated annealing based undersampling (SAUS): a hybrid multi-objective optimization method to tackle class imbalance. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02369-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]

253

Fraiwan L, Hassanin O. Computer-aided identification of degenerative neuromuscular diseases based on gait dynamics and ensemble decision tree classifiers. PLoS One 2021;16:e0252380. [PMID: 34086723 PMCID: PMC8177554 DOI: 10.1371/journal.pone.0252380] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/21/2021] [Accepted: 05/15/2021] [Indexed: 11/18/2022] Open

254

Mosquera-Lopez C, Wan E, Shastry M, Folsom J, Leitschuh J, Condon J, Rajhbeharrysingh U, Hildebrand A, Cameron M, Jacobs PG. Automated Detection of Real-World Falls: Modeled From People With Multiple Sclerosis. IEEE J Biomed Health Inform 2021;25:1975-1984. [PMID: 33245698 DOI: 10.1109/jbhi.2020.3041035] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/07/2022]

255

Ng WWY, Zhang Y, Zhang J, Wang DD, Wang FL. Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2021. [DOI: 10.1109/tetci.2019.2922340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

256

Yang D, Tan K, Huang Z, Li X, Chen B, Ren G, Xiao W. An automatic method for removing empty camera trap images using ensemble learning. Ecol Evol 2021;11:7591-7601. [PMID: 34188837 PMCID: PMC8216933 DOI: 10.1002/ece3.7591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/24/2021] [Revised: 03/26/2021] [Accepted: 03/30/2021] [Indexed: 11/10/2022] Open

Abstract

Camera traps often produce massive numbers of images, and empty images that do not contain animals are usually overwhelming in number. Deep learning is a machine-learning approach widely used to identify empty camera trap images automatically. Existing methods with high accuracy are based on millions of training samples (images) and require a lot of time and personnel costs to label the training samples manually. Reducing the number of training samples can save the cost of manually labeling images. However, deep learning models based on a small dataset produce a large omission error for animal images, in that many animal images tend to be identified as empty images, which may lead to lost opportunities for discovering and observing species. Therefore, it is still a challenge to build a deep convolutional neural network (DCNN) model with small errors on a small dataset. Using deep convolutional neural networks and a small-size dataset, we proposed an ensemble learning approach based on conservative strategies to identify and remove empty images automatically. Furthermore, we proposed three automatic identifying schemes of empty images for users who accept different omission errors of animal images. Our experimental results showed that these three schemes automatically identified and removed 50.78%, 58.48%, and 77.51% of the empty images in the dataset when the omission errors were 0.70%, 1.13%, and 2.54%, respectively. The analysis showed that using our scheme to automatically identify empty images did not omit species information. It only slightly changed the frequency of species occurrence. When only a small dataset was available, our approach provided an alternative for users to automatically identify and remove empty images, which can significantly reduce the time and personnel costs required to manually remove empty images. The cost savings were comparable to the percentage of empty images removed by models.
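
The abstract does not spell out what the conservative strategies are, so the Python sketch below shows only one plausible reading: an image is removed as empty only when every ensemble member gives it an empty-class probability at or above a threshold. The function names, the unanimity rule, the 0.9 threshold, and the toy numbers are illustrative assumptions, not the authors' method.

```python
def is_confidently_empty(empty_probs, threshold=0.9):
    """Hypothetical conservative rule: flag an image as empty only if
    every model's predicted empty-probability reaches the threshold."""
    return all(p >= threshold for p in empty_probs)


def removal_and_omission(per_image_probs, has_animal, threshold=0.9):
    """Return (share of empty images removed, omission error on animal images)."""
    removed = [is_confidently_empty(p, threshold) for p in per_image_probs]
    n_empty = sum(1 for a in has_animal if not a)
    n_animal = sum(1 for a in has_animal if a)
    empty_removed = sum(1 for r, a in zip(removed, has_animal) if r and not a)
    animal_removed = sum(1 for r, a in zip(removed, has_animal) if r and a)
    removal_rate = empty_removed / n_empty if n_empty else 0.0
    omission = animal_removed / n_animal if n_animal else 0.0
    return removal_rate, omission


# Toy data (made-up numbers): three models score four images.
probs = [(0.99, 0.97, 0.95), (0.95, 0.60, 0.92), (0.20, 0.10, 0.30), (0.93, 0.91, 0.96)]
labels = [False, False, True, True]  # True means the image contains an animal
print(removal_and_omission(probs, labels))
```

Under this reading, raising the threshold trades a lower omission error for fewer removed empty images, which is the kind of trade-off the three schemes in the abstract appear to offer.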

257

Mallick C, Das AK, Nayak J, Pelusi D, Vimal S. Evolutionary Algorithm based Ensemble Extractive Summarization for Developing Smart Medical System. Interdiscip Sci 2021;13:229-259. [PMID: 33576956 DOI: 10.1007/s12539-020-00412-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/05/2020] [Revised: 12/17/2020] [Accepted: 12/21/2020] [Indexed: 11/25/2022]

Abstract

The amount of information in the scientific literature of the bio-medical domain is growing exponentially, which makes it difficult to develop a smart medical system. Summarization techniques help with efficient searching and understanding of relevant information from medical documents. In the paper, an evolutionary algorithm based ensemble extractive summarization technique is devised as a smart medical application with the idea of hybrid artificial intelligence on natural language processing. We have considered the abstracts of the target article and its cited articles as the base summaries, and a multi-objective evolutionary algorithm is applied for generating the ensemble summary of the target article. Each sentence of the base summaries is represented by a concept vector of the medical terms contained in it with the help of the Unified Medical Language System (UMLS) tool, which is widely used in various smart medical applications. These terms carry the key information of the sentence, which is very useful for finding the semantic similarity among the sentences. Fitness functions of the evolutionary algorithm are mainly defined using clustering coefficient and sparsity index, which are concepts from graph theory. After the convergence of the algorithm, the best solution of the final population gives the ensemble summary. Next, the semantic similarity of each sentence in the target article with the ensemble summary is calculated, and the sentences which are most similar to the ensemble summary are considered as the summary of the target article. The method is applied to the articles available in the PubMed MEDLINE database system and experimental results are compared with some state of the art methods applied in the bio-medical domain. Experimental results and comparative study based on the performance evaluation show that the method competes with some recently proposed summarization methods and outperforms others, which demonstrates the effectiveness of the proposed methodology. Different statistical tests have also been made to observe that the method is statistically significant.

258

How to identify early defaults in online lending: A cost-sensitive multi-layer learning framework. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106963] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]

259

Alhassan Z, Watson M, Budgen D, Alshammari R, Alessa A, Al Moubayed N. Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms With Electronic Health Records. JMIR Med Inform 2021;9:e25237. [PMID: 34028357 PMCID: PMC8185616 DOI: 10.2196/25237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/23/2020] [Revised: 01/05/2021] [Accepted: 04/22/2021] [Indexed: 01/30/2023] Open

Abstract

Background

Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems, such as diabetes. Early preventive interventions based upon advanced predictive models using electronic health records data for identifying such patients can ultimately help provide better health outcomes.

Objective

Our study investigated the performance of predictive models to forecast HbA1c elevation levels by employing several machine learning models. We also examined the use of patient electronic health record longitudinal data in the performance of the predictive models. Explainable methods were employed to interpret the decisions made by the black box models.

Methods

This study employed multiple logistic regression, random forest, support vector machine, and logistic regression models, as well as a deep learning model (multilayer perceptron) to classify patients with normal (<5.7%) and elevated (≥5.7%) levels of HbA1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and provide an understanding of the reasons behind the decisions made by the models. All models were trained and tested using a large data set from Saudi Arabia with 18,844 unique patient records.

Results

The machine learning models achieved promising results for predicting current HbA1c elevation risk. When coupled with longitudinal data, the machine learning models outperformed the multiple logistic regression model used in the comparative study. The multilayer perceptron model achieved an accuracy of 83.22% for the area under receiver operating characteristic curve when used with historical data. All models showed a close level of agreement on the contribution of random blood sugar and age variables with and without longitudinal data.

Conclusions

This study shows that machine learning models can provide promising results for the task of predicting current HbA1c levels (≥5.7% or less). Using patients’ longitudinal data improved the performance and affected the relative importance for the predictors used. The models showed results that are consistent with comparable studies.

260

Ahirwal J, Nath A, Brahma B, Deb S, Sahoo UK, Nath AJ. Patterns and driving factors of biomass carbon and soil organic carbon stock in the Indian Himalayan region. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021;770:145292. [PMID: 33736385 DOI: 10.1016/j.scitotenv.2021.145292] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/07/2020] [Revised: 01/13/2021] [Accepted: 01/14/2021] [Indexed: 06/12/2023]

Abstract

Tree-based ecosystems are critical to climate change mitigation. The study analysed carbon (C) stock patterns and examined the importance of environmental variables in predicting carbon stock in biomass and soils of the Indian Himalayan Region (IHR). We conducted a synthesis of 100 studies reporting biomass carbon stock and 67 studies on soil organic carbon (SOC) stock from four land-uses: forests, plantation, agroforest, and herbaceous ecosystem from the IHR. Machine learning techniques were used to examine the importance of various environmental variables in predicting carbon stock in biomass and soils. Despite large variations in biomass C and SOC stock (mean ± SD) within the land-uses, natural forests have the highest biomass C stock (138.5 ± 87.3 Mg C ha^-1), and plantation forests exhibited the highest SOC stock (168.8 ± 74.4 Mg C ha^-1) in the top 1-m of soils. The environmental variables (altitude, latitude, precipitation, and temperature) were not significantly correlated with carbon stock. The prediction of biomass carbon and SOC stock using different machine learning techniques (Adaboost, Bagging, Random Forest, and XGBoost) shows that the XGBoost model can predict the carbon stock for the IHR closely. Our study confirms that the carbon stock in the IHR varies on a large scale due to a diverse range of land-use and ecosystems within the region. Therefore, predicting the driver of carbon stock on a single environmental variable is impossible for the entire IHR. The IHR possesses a prominent carbon sink and biodiversity pool. Therefore, its protection is essential in fulfilling India's commitment to nationally determined contributions (NDC). Our data synthesis may also provide a baseline for the precise estimation of carbon stock, which will be vital for India's National Mission for Sustaining the Himalayan Ecosystem (NMSHE).

261

Sung SF, Hung LC, Hu YH. Developing a stroke alert trigger for clinical decision support at emergency triage using machine learning. Int J Med Inform 2021;152:104505. [PMID: 34030088 DOI: 10.1016/j.ijmedinf.2021.104505] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/25/2021] [Revised: 05/01/2021] [Accepted: 05/17/2021] [Indexed: 11/19/2022]

Abstract

BACKGROUND

Acute stroke is an urgent medical condition that requires immediate assessment and treatment. Prompt identification of patients with suspected stroke at emergency department (ED) triage followed by timely activation of code stroke systems is the key to successful management of stroke. While false negative detection of stroke may prevent patients from receiving optimal treatment, excessive false positive alarms will substantially burden stroke neurologists. This study aimed to develop a stroke-alert trigger to identify patients with suspected stroke at ED triage.

METHODS

Patients who arrived at the ED within 12 h of symptom onset and were suspected of a stroke or transient ischemic attack or triaged with a stroke-related symptom were included. Clinical features at ED triage were collected, including the presenting complaint, triage level, self-reported medical history (hypertension, diabetes, hyperlipidemia, heart disease, and prior stroke), vital signs, and presence of atrial fibrillation. Three rule-based algorithms, ie, Face Arm Speech Test (FAST) and two flavors of Balance, Eyes, FAST (BE-FAST), and six machine learning (ML) techniques with various resampling methods were used to build classifiers for identification of patients with suspected stroke. Logistic regression (LR) was used to find important features.

RESULTS

The study population consisted of 1361 patients. The values of area under the precision-recall curve (AUPRC) were 0.737, 0.710, and 0.562 for the FAST, BE-FAST-1, and BE-FAST-2 models, respectively. The values of AUPRC for the top three ML models were 0.787 for classification and regression tree with undersampling, 0.783 for LR with synthetic minority oversampling technique (SMOTE), and 0.782 for LR with class weighting. Among the ML models, logistic regression and random forest models in general achieved higher values of AUPRC, particularly when class weighting or SMOTE was used to handle the class imbalance problem. In addition to the presenting complaint and triage level, age, diastolic blood pressure, body temperature, and pulse rate were also important features for developing a stroke-alert trigger.

CONCLUSIONS

ML techniques significantly improved the performance of prediction models for identification of patients with suspected stroke. Such ML models can be embedded in the electronic triage system for clinical decision support at ED triage.
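
As a rough illustration of two ideas named in this abstract, the Python sketch below computes average precision, a common step-wise estimate of AUPRC, and a "balanced" class-weight heuristic. The abstract does not say how AUPRC was estimated or which weighting scheme was used, so both functions are assumptions for illustration; the weighting formula n_samples / (n_classes * class_count) is the same heuristic that scikit-learn applies for its "balanced" option.

```python
from collections import Counter


def average_precision(y_true, scores):
    """Average precision: mean of the precision values taken at the rank
    of each positive example. Assumes binary labels (1 = positive) and
    no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def balanced_class_weights(y_true):
    """Weight each class by n_samples / (n_classes * class_count), so the
    rarer class receives the larger weight."""
    counts = Counter(y_true)
    n, k = len(y_true), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Toy example with made-up labels and scores.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
print(balanced_class_weights([1, 0, 0, 0]))
```

Unlike the area under the ROC curve, a precision-based summary such as this one is sensitive to the share of positives, which is one reason it is often reported for imbalanced problems.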

262

A Machine Learning Classifier Improves Mortality Prediction Compared With Pediatric Logistic Organ Dysfunction-2 Score: Model Development and Validation. Crit Care Explor 2021;3:e0426. [PMID: 34036277 PMCID: PMC8133049 DOI: 10.1097/cce.0000000000000426] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022] Open

Abstract

Supplemental Digital Content is available in the text.

Objectives:

To determine whether machine learning algorithms can better predict PICU mortality than the Pediatric Logistic Organ Dysfunction-2 score.

Design:

Retrospective study.

Setting:

Quaternary care medical-surgical PICU.

Patients:

All patients admitted to the PICU from 2013 to 2019.

Interventions:

None.

Measurements and Main Results:

We investigated the performance of various machine learning algorithms using the same variables used to calculate the Pediatric Logistic Organ Dysfunction-2 score to predict PICU mortality. We used 10,194 patient records from 2013 to 2017 for training and 4,043 patient records from 2018 to 2019 as a holdout validation cohort. Mortality rate was 3.0% in the training cohort and 3.4% in the validation cohort. The best performing algorithm was a random forest model (area under the receiver operating characteristic curve, 0.867 [95% CI, 0.863–0.895]; area under the precision-recall curve, 0.327 [95% CI, 0.246–0.414]; F1, 0.396 [95% CI, 0.321–0.468]) and significantly outperformed the Pediatric Logistic Organ Dysfunction-2 score (area under the receiver operating characteristic curve, 0.761 [95% CI, 0.713–0.810]; area under the precision-recall curve, 0.239 [95% CI, 0.165–0.316]; F1, 0.284 [95% CI, 0.209–0.360]), although this difference was reduced after retraining the Pediatric Logistic Organ Dysfunction-2 logistic regression model at the study institution. The random forest model also showed better calibration than the Pediatric Logistic Organ Dysfunction-2 score, and calibration of the random forest model remained superior to the retrained Pediatric Logistic Organ Dysfunction-2 model.

Conclusions:

A machine learning model achieved better performance than a logistic regression-based score for predicting ICU mortality. Better estimation of mortality risk can improve our ability to adjust for severity of illness in future studies, although external validation is required before this method can be widely deployed.
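
The abstract reports 95% confidence intervals for each metric but does not state how they were obtained. One common way to attach an interval to a metric on a holdout cohort is the percentile bootstrap; the Python sketch below shows that approach for F1, purely as an assumption for illustration. The function names, the number of resamples, and the seed are all hypothetical choices.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement,
    recompute the metric each time, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy holdout set with made-up labels and predictions.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0]
print(f1_score(y_true, y_pred), bootstrap_ci(y_true, y_pred, f1_score))
```

With a low event rate such as the 3.4% mortality in the validation cohort, intervals for precision-based metrics tend to be wide, which is consistent with the ranges quoted in the abstract.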

263

Wang Z, Tsai CF, Lin WC. Data cleaning issues in class imbalanced datasets: instance selection and missing values imputation for one-class classifiers. DATA TECHNOLOGIES AND APPLICATIONS 2021. [DOI: 10.1108/dta-01-2021-0027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]

Abstract

Purpose

Class imbalance learning, which exists in many domain problem datasets, is an important research topic in data mining and machine learning. One-class classification techniques, which aim to identify anomalies as the minority class from the normal data as the majority class, are one representative solution for class imbalanced datasets. Since one-class classifiers are trained using only normal data to create a decision boundary for later anomaly detection, the quality of the training set, i.e. the majority class, is one key factor that affects the performance of one-class classifiers.

Design/methodology/approach

In this paper, we focus on two data cleaning or preprocessing methods to address class imbalanced datasets. The first method examines whether performing instance selection to remove some noisy data from the majority class can improve the performance of one-class classifiers. The second method combines instance selection and missing value imputation, where the latter is used to handle incomplete datasets that contain missing values.

Findings

The experimental results are based on 44 class imbalanced datasets; three instance selection algorithms, including IB3, DROP3 and the GA, the CART decision tree for missing value imputation, and three one-class classifiers, which include OCSVM, IFOREST and LOF, show that if the instance selection algorithm is carefully chosen, performing this step could improve the quality of the training data, which makes one-class classifiers outperform the baselines without instance selection. Moreover, when class imbalanced datasets contain some missing values, combining missing value imputation and instance selection, regardless of which step is first performed, can maintain similar data quality as datasets without missing values.

Originality/value

The novelty of this paper is to investigate the effect of performing instance selection on the performance of one-class classifiers, which has never been done before. Moreover, this study is the first attempt to consider the scenario of missing values that exist in the training set for training one-class classifiers. In this case, performing missing value imputation and instance selection with different orders are compared.

264

Jothi Prakash V, Karthikeyan NK. Enhanced Evolutionary Feature Selection and Ensemble Method for Cardiovascular Disease Prediction. Interdiscip Sci 2021;13:389-412. [PMID: 33988832 DOI: 10.1007/s12539-021-00430-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/25/2020] [Revised: 04/01/2021] [Accepted: 04/09/2021] [Indexed: 11/26/2022]

265

Yang PT, Wu WS, Wu CC, Shih YN, Hsieh CH, Hsu JL. Breast cancer recurrence prediction with ensemble methods and cost-sensitive learning. Open Med (Wars) 2021;16:754-768. [PMID: 34027105 PMCID: PMC8122465 DOI: 10.1515/med-2021-0282] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/07/2020] [Revised: 03/11/2021] [Accepted: 04/03/2021] [Indexed: 11/15/2022] Open

266

Accessing Imbalance Learning Using Dynamic Selection Approach in Water Quality Anomaly Detection. Symmetry (Basel) 2021. [DOI: 10.3390/sym13050818] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022] Open

Abstract

Automatic anomaly detection monitoring plays a vital role in water utilities’ distribution systems to reduce the risk posed by unclean water to consumers. One of the major problems with anomaly detection is imbalanced datasets. Dynamic selection techniques combined with ensemble models have proven to be effective for imbalanced datasets classification tasks. In this paper, water quality anomaly detection is formulated as a classification problem in the presence of class imbalance. To tackle this problem, considering the asymmetric dataset distribution between the majority and minority classes, the performance of sixteen previously proposed single and static ensemble classification methods embedded with resampling strategies is first optimised and compared. After that, six dynamic selection techniques, namely, Modified Class Rank (Rank), Local Class Accuracy (LCA), Overall-Local Accuracy (OLA), K-Nearest Oracles Eliminate (KNORA-E), K-Nearest Oracles Union (KNORA-U) and Meta-Learning for Dynamic Ensemble Selection (META-DES) in combination with homogeneous and heterogeneous ensemble models and three SMOTE-based resampling algorithms (SMOTE, SMOTE+ENN and SMOTE+Tomek Links), and one missing data method (missForest) are proposed and evaluated. A binary real-world drinking-water quality anomaly detection dataset is utilised to evaluate the models. The experimental results obtained reveal all the models benefitting from the combined optimisation of both the classifiers and resampling methods. Considering the three performance measures (balanced accuracy, F-score and G-mean), the result also shows that the dynamic classifier selection (DCS) techniques, in particular, the missForest+SMOTE+RANK and missForest+SMOTE+OLA models based on homogeneous ensemble-bagging with decision tree as the base classifier, exhibited better performances in terms of balanced accuracy and G-mean, while the Bg+mF+SMENN+LCA model based on homogeneous ensemble-bagging with random forest has a better overall F1-measure in comparison to the other models.
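
The three performance measures named in this abstract can all be derived from a binary confusion matrix. The Python sketch below uses the usual binary definitions, with G-mean taken as the geometric mean of sensitivity and specificity and the F-score taken as F1; the abstract does not give its formulas, so these definitions, the function name, and the toy counts are assumptions.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Balanced accuracy, G-mean, and F1 from binary confusion counts,
    treating the anomaly (minority) class as positive."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f1": f1,
    }


# Toy counts (made up): 10 anomalies and 100 normal samples.
print(imbalance_metrics(tp=8, fn=2, tn=90, fp=10))
```

Balanced accuracy and G-mean weigh both classes equally, while F1 ignores true negatives, so a model can rank differently under each measure, as the abstract's split result suggests.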

267

Chen T, Wong YD, Shi X, Yang Y. A data-driven feature learning approach based on Copula-Bayesian Network and its application in comparative investigation on risky lane-changing and car-following maneuvers. ACCIDENT; ANALYSIS AND PREVENTION 2021;154:106061. [PMID: 33691229 DOI: 10.1016/j.aap.2021.106061] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/06/2020] [Revised: 02/20/2021] [Accepted: 02/22/2021] [Indexed: 06/12/2023]

Abstract

The era of 'Big Data' provides opportunities for researchers to have deep insights into traffic safety. By taking advantage of 'Big Data', this study proposes a data-driven method to develop a Copula-Bayesian Network (Copula-BN) using a large-scale naturalistic driving dataset with multiple features. The Copula-BN is able to explain the causality of a risky driving maneuver. As compared with conventional BNs, the Copula-BN developed in this study has the following advantages: (1) it has a more rational and explainable structure; (2) it is less likely to over-fit and can attain more satisfactory prediction performance; and (3) it can handle not only discrete but also continuous features. In terms of technical innovations, Shapley Additive Explanation (SHAP) is used for feature selection, while the Gaussian Copula function is employed to build the dependency structure of the Copula-BN. As for applications, the Copula-BNs are used to investigate the causality of risky lane-changing (LC) and car-following (CF) maneuvers, upon which comparisons are made between the two essential but risky driving maneuvers. In this study, the Copula-BNs are developed based on the Second Highway Research Program (SHRP2) Naturalistic Driving Study (NDS) database. Upon network evaluation, the Copula-BNs for both risky LC and CF maneuvers demonstrate satisfactory structure performance and promising prediction performance. Feature inferences are conducted based on the Copula-BNs to respectively illustrate the causation of the two risky maneuvers. Several interesting findings related to features' contribution are discussed in this paper. To a certain extent, the Copula-BN developed using the data-driven method makes a trade-off between prediction and causality within the 'Big Data'. The comparison between risky LC and CF maneuvers also provides a valuable reference for crash risk evaluation, road safety policy-making, etc. In the future, the achievements of this study could be applied in Advanced Driver-Assistance System (ADAS) and accident diagnosis system to enhance road traffic safety.

268

Hoyos-Osorio J, Alvarez-Meza A, Daza-Santacoloma G, Orozco-Gutierrez A, Castellanos-Dominguez G. Relevant information undersampling to support imbalanced data classification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.01.033] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]

269

Jin Y, Liu Y, Zhang W, Zhang S, Lou Y. A novel multi-stage ensemble model with multiple K-means-based selective undersampling: An application in credit scoring. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-201954] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]

270

Wang L, Qi S, Liu Y, Lou H, Zuo X. Bagging k-dependence Bayesian network classifiers. INTELL DATA ANAL 2021. [DOI: 10.3233/ida-205125] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]

271

Wang Z, Sun Y. Optimization of SMOTE for imbalanced data based on AdaRBFNN and hybrid metaheuristics. INTELL DATA ANAL 2021. [DOI: 10.3233/ida-205176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

272

Tang X, Zhang T, Cheng N, Wang H, Zheng CH, Xia J, Zhang T. usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme. Brief Bioinform 2021;22:6236069. [PMID: 33866367 DOI: 10.1093/bib/bbab123] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/08/2021] [Revised: 03/12/2021] [Accepted: 03/15/2021] [Indexed: 12/11/2022] Open

273

Fletcher RR, Nakeshimana A, Olubeko O. Addressing Fairness, Bias, and Appropriate Use of Artificial Intelligence and Machine Learning in Global Health. Front Artif Intell 2021;3:561802. [PMID: 33981989 PMCID: PMC8107824 DOI: 10.3389/frai.2020.561802] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 05/13/2020] [Accepted: 12/10/2020] [Indexed: 11/16/2022] Open

274

Yu H, Sun H, Li J, Shi L, Bao N, Li H, Qian W, Zhou S. Effective diagnostic model construction based on discriminative breast ultrasound image regions using deep feature extraction. Med Phys 2021;48:2920-2928. [PMID: 33690962 DOI: 10.1002/mp.14832] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/18/2020] [Revised: 01/22/2021] [Accepted: 03/03/2021] [Indexed: 12/24/2022] Open

Abstract

PURPOSE

This research aims to analyze the diagnostic contribution of different discriminative regions of the breast ultrasound image and develop a more effective diagnosis method taking advantage of the discriminative regions' complementarity.

METHODS

First, the discriminative regions of the original breast ultrasound image as the inner region of the lesion, the marginal zone of the lesion, and the posterior echo region of the lesion were defined. The pretrained Inception-V3 network was used to analyze the diagnostic contribution of these discriminative regions. Then, the network was applied to extract the deep features of the original image and the other three discriminative region images. Since there are many features, principal components analysis (PCA) was used to reduce the dimensionality of the extracted deep features. The selected deep features from different discriminative regions were fused to original image features and sent to the stacking ensemble learning classifier for classification experiments. In this study, 479 cases of breast ultrasound images, including 356 benign lesions and 123 malignant ones, were collected retrospectively and randomly divided into the training and validation set.

RESULTS

Experimental results show that by using Inception-V3, the diagnostic performance of each discriminative region is different, and the diagnostic accuracy and the area under the ROC curve (AUC) of the lesion marginal zone image (78.3%, 0.798) are higher than those of the lesion inner region image (73.3%, 0.763) and the posterior echo region image (71.7%, 0.688), but lower than those of the original image (80.0%, 0.817). Furthermore, the best classification performance was obtained when all the four types of deep features (from the original image and three discriminative region images) were fused, and the ensemble learning for classification evaluation was employed. Compared with the original image, the classification accuracy and AUC increased from 80.83%, 0.818 to 85.00%, 0.872, and the classification sensitivity and specificity varied from 0.710, 0.798 to 0.871, 0.787.

CONCLUSIONS

The inner region of the lesion, the marginal zone of the lesion, and the posterior echo region of the lesion play significant roles in the diagnosis of the breast ultrasound image. Deep feature fusion of these three kinds of images and the original image can effectively improve the accuracy of diagnosis.
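
The fusion step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the features are fused, so plain concatenation in a fixed region order is an assumption here, and the function name, region keys, and feature values are all hypothetical.

```python
from typing import Dict, List


def fuse_features(region_features: Dict[str, List[float]],
                  order: List[str]) -> List[float]:
    """Concatenate per-region feature vectors in a fixed order.

    Concatenation is an assumed fusion rule, not the authors' stated method.
    """
    fused: List[float] = []
    for name in order:
        fused.extend(region_features[name])
    return fused


# Hypothetical, already dimension-reduced feature vectors for one case.
case = {
    "original": [0.12, -0.40, 0.88],
    "inner": [0.05, 0.31],
    "margin": [-0.22, 0.64],
    "posterior": [0.47, -0.09],
}
ORDER = ["original", "inner", "margin", "posterior"]

fused = fuse_features(case, ORDER)
print(len(fused))  # 9
```

Keeping the region order fixed matters because a downstream classifier treats each position in the fused vector as the same feature for every case.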

275

Ma JH, Feng Z, Wu JY, Zhang Y, Di W. Learning from imbalanced fetal outcomes of systemic lupus erythematosus in artificial neural networks. BMC Med Inform Decis Mak 2021;21:127. [PMID: 33845834 PMCID: PMC8042715 DOI: 10.1186/s12911-021-01486-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 03/31/2021] [Indexed: 11/10/2022] Open

276

Singh A, Ranjan RK, Tiwari A. Credit Card Fraud Detection under Extreme Imbalanced Data: A Comparative Study of Data-level Algorithms. J EXP THEOR ARTIF IN 2021. [DOI: 10.1080/0952813x.2021.1907795] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

277

Chen B, Xia S, Chen Z, Wang B, Wang G. RSMOTE: A self-adaptive robust SMOTE for imbalanced problems with label noise. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.10.013] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]

278

A hybrid ensemble learning method for the identification of gang-related arson cases. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106875] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

279

Mehmood Z, Asghar S. Customizing SVM as a base learner with AdaBoost ensemble to learn from multi-class problems: A hybrid approach AdaBoost-MSVM. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106845] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

280

Bi Y, Xue B, Zhang M. Genetic Programming With a New Representation to Automatically Learn Features and Evolve Ensembles for Image Classification. IEEE TRANSACTIONS ON CYBERNETICS 2021;51:1769-1783. [PMID: 32011275 DOI: 10.1109/tcyb.2020.2964566] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

281

Chen Z, Duan J, Kang L, Qiu G. A hybrid data-level ensemble to enable learning from highly imbalanced dataset. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.12.023] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

282

Moyano JM, Reyes O, Fardoun HM, Ventura S. Performing multi-target regression via gene expression programming-based ensemble models. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

283

Starovoitov VV, Golub YI. About the confusion-matrix-based assessment of the results of imbalanced data classification. INFORMATICS 2021. [DOI: 10.37661/10.37661/1816-0301-2021-18-1-61-71] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

284

Wang X, Zhai M, Ren Z, Ren H, Li M, Quan D, Chen L, Qiu L. Exploratory study on classification of diabetes mellitus through a combined Random Forest Classifier. BMC Med Inform Decis Mak 2021;21:105. [PMID: 33743696 PMCID: PMC7980612 DOI: 10.1186/s12911-021-01471-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2020] [Accepted: 03/11/2021] [Indexed: 12/23/2022] Open

Abstract

BACKGROUND

Diabetes Mellitus (DM) has become the third major chronic non-communicable disease affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and is one of the major public health problems in the world. It is therefore of great importance to identify individuals at high risk for DM in order to establish prevention strategies.

METHODS

To address the high-dimensional feature space and high feature redundancy of medical data, as well as the data imbalance such data often exhibit, this study explored different supervised classifiers, combined with SVM-SMOTE and two feature dimensionality reduction methods (logistic stepwise regression and LASSO), to classify diabetes survey sample data with unbalanced categories and complex related factors. The classification results of 4 supervised classifiers under 4 data processing methods were analyzed and discussed. Five indicators (Accuracy, Precision, Recall, F1-Score and AUC) were selected as the key indicators for evaluating the performance of the classification model.

RESULTS

According to the results, the Random Forest Classifier combined with SVM-SMOTE resampling and LASSO feature screening (Accuracy = 0.890, Precision = 0.869, Recall = 0.919, F1-Score = 0.893, AUC = 0.948) proved the best at identifying those at high risk of DM. In addition, the combined algorithm helps enhance classification performance when predicting people at high risk of DM. Age, region, heart rate, hypertension, hyperlipidemia and BMI are the top six most critical characteristic variables affecting diabetes.

CONCLUSIONS

The Random Forest Classifier combined with SVM-SMOTE and the LASSO feature reduction method performs best in identifying people at high risk of DM. The combined method proposed in the study would be a good tool for early screening of DM.
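
As a quick consistency check on the reported metrics, the F1-Score can be recomputed from the reported Precision and Recall using the standard harmonic-mean definition. This is only a sketch: the function and variable names are illustrative, and the inputs are the rounded values quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as quoted in the abstract above.
reported_precision = 0.869
reported_recall = 0.919

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.893
```

The recomputed value agrees with the F1-Score of 0.893 given in the results.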

285

ImbTreeEntropy and ImbTreeAUC: Novel R Packages for Decision Tree Learning on the Imbalanced Datasets. ELECTRONICS 2021. [DOI: 10.3390/electronics10060657] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

286

Peacock CJ, Lamont C, Sheen DA, Shen VK, Kreplak L, Frampton JP. Predicting the Mixing Behavior of Aqueous Solutions Using a Machine Learning Framework. ACS APPLIED MATERIALS & INTERFACES 2021;13:11449-11460. [PMID: 33645207 DOI: 10.1021/acsami.0c21036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]

Abstract

The most direct approach to determining if two aqueous solutions will phase-separate upon mixing is to exhaustively screen them in a pair-wise fashion. This is a time-consuming process that involves preparation of numerous stock solutions, precise transfer of highly concentrated and often viscous solutions, exhaustive agitation to ensure thorough mixing, and time-sensitive monitoring to observe the presence of emulsion characteristics indicative of phase separation. Here, we examined the pair-wise mixing behavior of 68 water-soluble compounds by observing the formation of microscopic phase boundaries and droplets of 2278 unique 2-component solutions. A series of machine learning classifiers (artificial neural network, random forest, k-nearest neighbors, and support vector classifier) were then trained on physicochemical property data associated with the 68 compounds and used to predict their miscibility upon mixing. Miscibility predictions were then compared to the experimental observations. The random forest classifier was the most successful classifier of those tested, displaying an average receiver operator characteristic area under the curve of 0.74. The random forest classifier was validated by removing either one or two compounds from the input data, training the classifier on the remaining data and then predicting the miscibility of solutions involving the removed compound(s) using the classifier. The accuracy, specificity, and sensitivity of the random forest classifier were 0.74, 0.80, and 0.51, respectively, when one of the two compounds to be examined was not represented in the training data. When asked to predict the miscibility of two compounds, neither of which were represented in the training data, the accuracy, specificity, and sensitivity values for the random forest classifier were 0.70, 0.82 and 0.29, respectively. 
Thus, there is potential for this machine learning approach to improve the design of screening experiments to accelerate the discovery of aqueous two-phase systems for numerous scientific and industrial applications.
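
Two numbers in this abstract lend themselves to a short worked example: the count of unique two-component solutions obtainable from 68 compounds, and the way accuracy, specificity and sensitivity follow from confusion-matrix counts. The sketch below is illustrative only. The confusion counts are hypothetical (the abstract does not report them) and were chosen to show how a respectable accuracy can coexist with low sensitivity when the positive class is rare.

```python
import math
from typing import Tuple


def rates(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Unordered pairs of distinct compounds: C(68, 2).
n_pairs = math.comb(68, 2)
print(n_pairs)  # 2278

# Hypothetical counts, NOT taken from the paper.
sens, spec, acc = rates(tp=30, fn=70, tn=320, fp=80)
print(sens, spec, acc)
```

The pair count matches the 2278 unique solutions stated in the abstract.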

287

Shaw SS, Ahmed S, Malakar S, Garcia-Hernandez L, Abraham A, Sarkar R. Hybridization of ring theory-based evolutionary algorithm and particle swarm optimization to solve class imbalance problem. COMPLEX INTELL SYST 2021. [DOI: 10.1007/s40747-021-00314-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

Abstract

Many real-life datasets are imbalanced in nature, which implies that the number of samples present in one class (minority class) is far smaller than the number of samples found in the other class (majority class). Hence, if a standard classifier is trained directly on such a dataset, it often overlooks the minority class samples while estimating the class-separating hyperplane(s) and, as a result, misclassifies them. To solve this problem, over the years, many researchers have followed different approaches. However, the selection of the true representative samples from the majority class is still considered an open research problem. A better solution for this problem would be helpful in many applications like fraud detection, disease prediction and text classification. Recent studies also show that the problem requires analyzing not only the disproportion between classes but also other difficulties rooted in the nature of the data, which calls for a more flexible, self-adaptable, computationally efficient and real-time method for selecting majority class samples without losing much important information. Keeping this fact in mind, we have proposed a hybrid model constituting Particle Swarm Optimization (PSO), a popular swarm intelligence-based meta-heuristic algorithm, and Ring Theory (RT)-based Evolutionary Algorithm (RTEA), a recently proposed physics-based meta-heuristic algorithm. We have named the algorithm RT-based PSO, or RTPSO for short. RTPSO can select the most representative samples from the majority class as it takes advantage of the efficient exploration and exploitation phases of its parent algorithms to strengthen the search process. We have used an AdaBoost classifier to observe the final classification results of our model. The effectiveness of our proposed method has been evaluated on 15 standard real-life datasets having low to extreme imbalance ratios. The performance of RTPSO has been compared with PSO, RTEA and other standard undersampling methods. The obtained results demonstrate the superiority of RTPSO over the state-of-the-art class imbalance problem-solvers considered here for comparison. The source code of this work is available at https://github.com/Sayansurya/RTPSO_Class_imbalance.
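
For orientation, the simplest undersampling baseline is random undersampling: keep every minority sample and draw an equally sized random subset of the majority class. The sketch below shows that baseline only. It is not RTPSO, which replaces the random draw with a meta-heuristic search. The function name and toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_undersample(X: Sequence, y: Sequence[int], majority_label: int,
                       seed: int = 0) -> Tuple[List, List[int]]:
    """Keep all minority samples plus a random, equally sized majority subset."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    # Never request more majority samples than exist.
    k = min(len(minority_idx), len(majority_idx))
    kept_majority = rng.sample(majority_idx, k)
    keep = sorted(minority_idx + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: 20 majority samples (label 0) and 4 minority samples (label 1).
X_toy = [[float(i)] for i in range(24)]
y_toy = [0] * 20 + [1] * 4

X_bal, y_bal = random_undersample(X_toy, y_toy, majority_label=0)
print(Counter(y_bal))
```

Random selection can discard informative majority samples, which is the weakness that guided selection methods such as the one in this abstract aim to address.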

288

Classifying multiclass imbalanced data using generalized class-specific extreme learning machine. PROGRESS IN ARTIFICIAL INTELLIGENCE 2021. [DOI: 10.1007/s13748-021-00236-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

289

A stacking-based ensemble learning method for earthquake casualty prediction. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.107038] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

290

Shin J, Yoon S, Kim Y, Kim T, Go B, Cha Y. Effects of class imbalance on resampling and ensemble learning for improved prediction of cyanobacteria blooms. ECOL INFORM 2021. [DOI: 10.1016/j.ecoinf.2020.101202] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

291

An automatic sampling ratio detection method based on genetic algorithm for imbalanced data classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106800] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

292

Lu Y, Cheung YM, Tang YY. Self-Adaptive Multiprototype-Based Competitive Learning Approach: A k-Means-Type Algorithm for Imbalanced Data Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2021;51:1598-1612. [PMID: 31150353 DOI: 10.1109/tcyb.2019.2916196] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

293

Pei W, Xue B, Shang L, Zhang M. Genetic programming for development of cost-sensitive classifiers for binary high-dimensional unbalanced classification. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

294

Le HL, Landa-Silva D, Galar M, Garcia S, Triguero I. EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.107033] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

295

Sainin MS, Alfred R, Ahmad F. ENSEMBLE META CLASSIFIER WITH SAMPLING AND FEATURE SELECTION FOR DATA WITH MULTICLASS IMBALANCE PROBLEM. JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY 2021. [DOI: 10.32890/jict2021.20.2.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

296

An adaptive boosting algorithm based on weighted feature selection and category classification confidence. APPL INTELL 2021. [DOI: 10.1007/s10489-020-02184-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

297

Artificial Intelligence and the Medical Physicist: Welcome to the Machine. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11041691] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

298

Windows PE Malware Detection Using Ensemble Learning. INFORMATICS 2021. [DOI: 10.3390/informatics8010010] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open

299

EnCNN-UPMWS: Waste Classification by a CNN Ensemble Using the UPM Weighting Strategy. ELECTRONICS 2021. [DOI: 10.3390/electronics10040427] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

300

Ruiz-Pérez I, López-Valenciano A, Hernández-Sánchez S, Puerta-Callejón JM, De Ste Croix M, Sainz de Baranda P, Ayala F. A Field-Based Approach to Determine Soft Tissue Injury Risk in Elite Futsal Using Novel Machine Learning Techniques. Front Psychol 2021;12:610210. [PMID: 33613389 PMCID: PMC7892460 DOI: 10.3389/fpsyg.2021.610210] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Accepted: 01/14/2021] [Indexed: 12/19/2022] Open

Abstract

Lower extremity non-contact soft tissue (LE-ST) injuries are prevalent in elite futsal. The purpose of this study was to develop robust screening models based on pre-season measures obtained from questionnaires and field-based tests to prospectively predict LE-ST injuries after having applied a range of supervised Machine Learning techniques. One hundred and thirty-nine elite futsal players underwent a pre-season screening evaluation that included individual characteristics; measures related to sleep quality, athlete burnout, psychological characteristics related to sport performance and self-reported perception of chronic ankle instability. A number of neuromuscular performance measures obtained through three field-based tests [isometric hip strength, dynamic postural control (Y-Balance) and lower extremity joints range of motion (ROM-Sport battery)] were also recorded. Injury incidence was monitored over one competitive season. There were 25 LE-ST injuries. Only those groups of measures from two of the field-based tests (ROM-Sport battery and Y-Balance), as independent data sets, were able to build robust models [area under the receiver operating characteristic curve (AUC) score ≥0.7] to identify elite futsal players at risk of sustaining a LE-ST injury. Unlike the measures obtained from the five questionnaires selected, the neuromuscular performance measures did build robust prediction models (AUC score ≥0.7). The inclusion in the same data set of the measures recorded from all the questionnaires and field-based tests did not result in models with significantly higher performance scores. The model generated by the UnderBagging technique with a cost-sensitive SMO as the base classifier and using only four ROM measures reported the best prediction performance scores (AUC = 0.767, true positive rate = 65.9% and true negative rate = 62%). 
The models developed might help coaches, physical trainers and medical practitioners in the decision-making process for injury prevention in futsal.
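
The UnderBagging technique named in this abstract trains each ensemble member on a balanced bag: all minority samples plus a fresh random subset of the majority class of the same size. The sketch below shows only bag construction and vote combination, assuming a simple majority vote. The base classifier (a cost-sensitive SMO in the study) is omitted, and the index counts are hypothetical rather than the study's data.

```python
import random
from typing import List


def make_underbags(minority_idx: List[int], majority_idx: List[int],
                   n_bags: int, seed: int = 0) -> List[List[int]]:
    """Build balanced bags: every minority index plus an equally sized
    random draw from the majority indices (assumes majority >= minority)."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        sampled = rng.sample(majority_idx, len(minority_idx))
        bags.append(sorted(minority_idx + sampled))
    return bags


def majority_vote(votes: List[int]) -> int:
    """Combine binary (0/1) predictions; ties resolve to 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0


# Hypothetical sample indices: 10 minority cases and 50 majority cases.
minority = list(range(10))
majority = list(range(10, 60))

bags = make_underbags(minority, majority, n_bags=5)
print([len(bag) for bag in bags])  # [20, 20, 20, 20, 20]
print(majority_vote([1, 1, 0]))    # 1
```

Because each bag draws a different majority subset, the ensemble as a whole sees far more of the majority class than any single balanced training set would.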
