51
|
Fu S, Su D, Li S, Sun S, Tian Y. Linear-exponential loss incorporated deep learning for imbalanced classification. ISA TRANSACTIONS 2023; 140:279-292. [PMID: 37385859 DOI: 10.1016/j.isatra.2023.06.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/10/2022] [Revised: 05/24/2023] [Accepted: 06/16/2023] [Indexed: 07/01/2023]
Abstract
Class imbalance is a common and long-standing problem. When the data distribution is unbalanced, conventional methods tend to classify minority samples as majority ones, which may cause severe consequences in practice. Coping with such problems is crucial yet challenging. In this paper, inspired by our previous work, we bring the linear-exponential (LINEX) loss function from statistics into deep learning for the first time and extend it into a multi-class form, denoted as DLINEX. Compared with existing loss functions in class imbalance learning (e.g., the weighted cross-entropy loss and the focal loss), DLINEX has an asymmetric geometric interpretation and can adaptively focus more on the minority and hard-to-classify samples by adjusting a single parameter. Besides, it simultaneously achieves between-class and within-class diversity by taking into account the inherent properties of each instance. As a result, DLINEX achieves 42.08% G-means on the CIFAR-10 dataset at the imbalance ratio of 200, 79.06% G-means on the HAM10000 dataset, 82.74% F1 on the DRIVE dataset, 83.93% F1 on the CHASEDB1 dataset and 79.55% F1 on the STARE dataset. The quantitative and qualitative experiments demonstrate that DLINEX works favorably in imbalanced classification, at both the image level and the pixel level.
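The asymmetry mentioned in the abstract can be illustrated with the classic LINEX loss from statistics, exp(a·r) − a·r − 1. This is only a sketch of that textbook scalar form: the abstract does not give the multi-class DLINEX formula, so the function name `linex`, the error variable `r`, and the parameter `a` are illustrative assumptions, not the authors' definition.

```python
import math


def linex(r: float, a: float = 1.0) -> float:
    """Classic scalar LINEX loss: exp(a*r) - a*r - 1.

    This is the textbook statistical form, NOT the paper's DLINEX.
    It is zero at r = 0 and asymmetric around it: for a > 0, positive
    errors grow roughly exponentially and negative errors roughly linearly.
    """
    return math.exp(a * r) - a * r - 1.0


# Same error magnitude, opposite signs, one shape parameter a.
over = linex(2.0, a=1.0)    # exponential side
under = linex(-2.0, a=1.0)  # near-linear side
print(f"loss(+2) = {over:.3f}, loss(-2) = {under:.3f}")
```

Flipping the sign of `a` mirrors the loss, so a single parameter decides which direction of error is penalized more heavily.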
Affiliation(s)
- Saiji Fu
- School of Economics and Management, Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Haidian District, Beijing, 100876, China.
| | - Duo Su
- School of Computer Science and Technology, University of Chinese Academy of Sciences, No. 19 (A) Yuquan Road, Shijingshan District, Beijing, 100049, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China.
| | - Shilin Li
- School of Mathematics, Renmin University of China, No. 59 Zhongguancun Street, Haidian District, Beijing, 100872, China.
| | - Shiding Sun
- School of Mathematical Sciences, University of Chinese Academy of Sciences, No. 19 (A) Yuquan Road, Shijingshan District, Beijing, 100049, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China.
| | - Yingjie Tian
- School of Economics and Management, University of Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; MOE Social Science Laboratory of Digital Economic Forecasts and Policy Simulation at UCAS, No. 3 of Zhongguancun South Street 1, Haidian District, Beijing, 100190, China.
| |
|
52
|
Yan X, Yue T, Winkler DA, Yin Y, Zhu H, Jiang G, Yan B. Converting Nanotoxicity Data to Information Using Artificial Intelligence and Simulation. Chem Rev 2023. [PMID: 37262026 DOI: 10.1021/acs.chemrev.3c00070] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 06/03/2023]
Abstract
Decades of nanotoxicology research have generated extensive and diverse data sets. However, data is not equal to information. The question is how to extract critical information buried in vast data streams. Here we show that artificial intelligence (AI) and molecular simulation play key roles in transforming nanotoxicity data into critical information, i.e., constructing the quantitative nanostructure (physicochemical properties)-toxicity relationships, and elucidating the toxicity-related molecular mechanisms. For AI and molecular simulation to realize their full impacts in this mission, several obstacles must be overcome. These include the paucity of high-quality nanomaterials (NMs) and standardized nanotoxicity data, the lack of model-friendly databases, the scarcity of specific and universal nanodescriptors, and the inability to simulate NMs at realistic spatial and temporal scales. This review provides a comprehensive and representative, but not exhaustive, summary of the current capability gaps and tools required to fill these formidable gaps. Specifically, we discuss the applications of AI and molecular simulation, which can address the large-scale data challenge for nanotoxicology research. The need for model-friendly nanotoxicity databases, powerful nanodescriptors, new modeling approaches, molecular mechanism analysis, and design of the next-generation NMs are also critically discussed. Finally, we provide a perspective on future trends and challenges.
Affiliation(s)
- Xiliang Yan
- Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| | - Tongtao Yue
- Key Laboratory of Marine Environment and Ecology, Ministry of Education, Institute of Coastal Environmental Pollution Control, Ocean University of China, Qingdao 266100, China
| | - David A Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
- School of Pharmacy, University of Nottingham, Nottingham NG7 2QL, U.K
- Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
| | - Yongguang Yin
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Hao Zhu
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
| | - Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Bing Yan
- Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| |
|
53
|
Balanced neighbor exploration for semi-supervised node classification on imbalanced graph data. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.02.064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
|
54
|
Marques HO, Swersky L, Sander J, Campello RJGB, Zimek A. On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles. Data Min Knowl Discov 2023; 37:1473-1517. [PMID: 37424877 PMCID: PMC10326160 DOI: 10.1007/s10618-023-00931-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2021] [Accepted: 02/28/2023] [Indexed: 07/11/2023]
Abstract
It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch on machine learning, pp 56-64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147-153, 2009. 10.1109/ICMLA.2009.16). In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. In contrast to previous comparison studies, where the models (algorithms, parameters) are selected by using examples from both classes (outlier and inlier), here we also study and compare different approaches for model selection in the absence of examples from the outlier class, which is more realistic for practical applications since labeled outliers are rarely available. Our results showed that, overall, SVDD and GMM are top-performers, regardless of whether the ground truth is used for parameter selection or not. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles showed better performance than individual methods in terms of accuracy, as long as the ensemble members are properly selected. Supplementary Information The online version contains supplementary material available at 10.1007/s10618-023-00931-x.
|
55
|
Wang Y, Sun P. Kernel principle component analysis and random under sampling boost based fault diagnosis method and its application to a pressurized water reactor. NUCLEAR ENGINEERING AND DESIGN 2023. [DOI: 10.1016/j.nucengdes.2023.112258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/16/2023]
|
56
|
Xu Y, Yu Z, Chen CLP, Liu Z. Adaptive Subspace Optimization Ensemble Method for High-Dimensional Imbalanced Data Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:2284-2297. [PMID: 34469316 DOI: 10.1109/tnnls.2021.3106306] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/04/2023]
Abstract
It is hard to construct an optimal classifier for high-dimensional imbalanced data, on which classifier performance degrades seriously. Although many approaches, such as resampling, cost-sensitive, and ensemble learning methods, have been proposed to deal with skewed data, they are constrained by the noise and redundancy of high-dimensional data. In this study, we propose an adaptive subspace optimization ensemble method (ASOEM) for high-dimensional imbalanced data classification to overcome the above limitations. To construct accurate and diverse base classifiers, a novel adaptive subspace optimization (ASO) method based on an adaptive subspace generation (ASG) process and a rotated subspace optimization (RSO) process is designed to generate multiple robust and discriminative subspaces. Then a resampling scheme is applied on the optimized subspace to build a class-balanced dataset for each base classifier. To verify its effectiveness, ASOEM is implemented with different resampling strategies on 24 real-world high-dimensional imbalanced datasets. Experimental results demonstrate that the proposed methods outperform other mainstream imbalance learning approaches and classifier ensemble methods.
|
57
|
Parrott JM, Parrott AJ, Rouhi AD, Parrott JS, Dumon KR. What We Are Missing: Using Machine Learning Models to Predict Vitamin C Deficiency in Patients with Metabolic and Bariatric Surgery. Obes Surg 2023:10.1007/s11695-023-06571-w. [PMID: 37060491 DOI: 10.1007/s11695-023-06571-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2022] [Revised: 03/23/2023] [Accepted: 03/28/2023] [Indexed: 04/16/2023]
Abstract
PURPOSE Vitamin C (VC) is implicated in many physiological pathways. Vitamin C deficiency (VCD) can compromise the health of patients with metabolic and bariatric surgery (patients). As symptoms of VCD are elusive and data on VCD in patients are scarce, we aim to characterize patients with measured VC levels, investigate the association of VCD with other lab abnormalities, and create predictive models of VCD using machine learning (ML). METHODS A retrospective chart review of patients seen from 2017 to 2021 at a tertiary care center in Northeastern USA was conducted. A 1:4 case mix of patients with VC measured to a random sample of patients without VC measured was created for comparative purposes. ML models (BayesNet and random forest) were used to create predictive models and estimate the prevalence of VCD patients. RESULTS Of 5946 patients reviewed, 187 (3.1%) had VC measures, and 73 (39%) of these patients had VC <23 μmol/L (VCD). When comparing patients with VCD to patients without VCD, the ML algorithms identified a higher risk of VCD in patients deficient in vitamin B1, D, calcium, potassium, iron, and blood indices. ML models reached 70% accuracy. Applied to the testing sample, a "true" VCD prevalence of ~20% was predicted, among whom ~33% had scurvy levels (VC <11 μmol/L). CONCLUSION Our models suggest that a much higher proportion of patients have VCD than is reflected in the literature. This indicates a high proportion of patients remain potentially undiagnosed for VCD and are thus at risk for postoperative morbidity and mortality.
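The percentages in the RESULTS sentence follow directly from the three counts it reports. The short check below recomputes them; the variable names are illustrative and only the counts 5946, 187, and 73 come from the abstract.

```python
# Counts taken from the abstract's RESULTS sentence.
reviewed = 5946   # patients whose charts were reviewed
measured = 187    # patients with a vitamin C measurement
deficient = 73    # measured patients classed as vitamin C deficient

# Share of reviewed patients who had vitamin C measured.
measured_share = measured / reviewed
# Share of measured patients classed as deficient.
deficient_share = deficient / measured

print(f"measured:  {100 * measured_share:.1f}% of reviewed patients")
print(f"deficient: {100 * deficient_share:.0f}% of measured patients")
```

Both figures round to the values stated in the abstract (3.1% and 39%).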
Affiliation(s)
- Julie M Parrott
- Temple University Health System, 7600 Central Avenue, Philadelphia, PA, 19111, USA.
- Department of Clinical and Preventive Nutrition Sciences, Rutgers University, 65 Bergen Street, Suite 120, Newark, NJ, 07107-1709, USA.
- Faculty of Health Sciences and Wellbeing, The University of Sunderland, Edinburgh Building, City Campus, Chester Road, Sunderland, SR1 3SD, UK.
| | - Austen J Parrott
- The Child Center of NY, 118-35 Queens Boulevard, 6th Floor, Forest Hills, New York, NY, 11375, USA
| | - Armaun D Rouhi
- Department of Surgery, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA
| | - J Scott Parrott
- School of Health Professions, Rutgers Biomedical and Health Sciences, Research Tower, 836B, 675 Hoes Lane West, Piscataway, NJ, 08854, USA
| | - Kristoffel R Dumon
- Penn Metabolic and Bariatric Surgery and Gastrointestinal Surgery, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA
| |
|
58
|
Quayson E, Ganaa ED, Zhu Q, Shen XJ. Multi-view Representation Induced Kernel Ensemble Support Vector Machine. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11250-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
|
59
|
Van der Schraelen L, Stouthuysen K, Vanden Broucke S, Verdonck T. Regularization oversampling for classification tasks: To exploit what you do not know. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.03.146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
|
60
|
Zhao Q, Shu J, Yuan X, Liu Z, Meng D. A Probabilistic Formulation for Meta-Weight-Net. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:1194-1208. [PMID: 34460386 DOI: 10.1109/tnnls.2021.3105104] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
In the last decade, deep neural networks (DNNs) have become dominant tools for various supervised learning tasks, especially classification. However, it has been demonstrated that they can easily overfit to training set biases, such as label noise and class imbalance. Example reweighting algorithms are simple and effective solutions to this issue, but most of them require manually specifying the weighting functions as well as additional hyperparameters. Recently, a meta-learning-based method, Meta-Weight-Net (MW-Net), has been proposed to automatically learn the weighting function, parameterized by an MLP, via additional unbiased metadata, which significantly improves robustness over prior art. The method, however, is formulated in a deterministic manner and lacks intrinsic statistical support. In this work, we propose a probabilistic formulation for MW-Net, probabilistic MW-Net (PMW-Net) for short, which treats the weighting function in a probabilistic way and includes the original MW-Net as a special case. With this probabilistic formulation, additional randomness is introduced while the flexibility of the weighting function can be further controlled during learning. Our experimental results on both synthetic and real datasets show that the proposed method improves the performance of the original MW-Net. Besides, the proposed PMW-Net can also be extended to fully Bayesian models to improve their robustness.
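Example reweighting with an MLP-parameterized weighting function can be sketched as follows. This is a minimal illustration under stated assumptions: the weighting function is taken to map each sample's scalar loss to a weight in (0, 1), and the parameters are fixed by hand purely for demonstration. In MW-Net they would be learned from the unbiased metadata, a step this sketch omits, and nothing here implements the probabilistic PMW-Net.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def weight_net(loss, w1, b1, w2, b2):
    """Tiny one-hidden-layer MLP (ReLU, then sigmoid): scalar loss -> weight."""
    hidden = [max(0.0, w * loss + b) for w, b in zip(w1, b1)]
    return sigmoid(sum(v * h for v, h in zip(w2, hidden)) + b2)


def reweighted_loss(losses, params):
    """Average of per-sample losses, each scaled by its predicted weight."""
    weights = [weight_net(loss, *params) for loss in losses]
    total = sum(w * loss for w, loss in zip(weights, losses))
    return total / len(losses), weights


# Hypothetical, hand-picked parameters (w1, b1, w2, b2); not learned values.
params = ([1.0, -1.0], [0.0, 1.0], [-2.0, 0.5], 1.0)
losses = [0.1, 0.5, 5.0]  # the last sample has an unusually large loss
objective, weights = reweighted_loss(losses, params)
print(objective, weights)
```

With these particular parameters the weight shrinks as the loss grows, which is the behaviour one would want under label noise; a function learned for class imbalance could instead increase the weight with the loss.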
|
61
|
Cao Q, Tang J, Huang Y, Shi M, van Rompaey A, Huang F. Modeling Production-Living-Ecological Space for Chengdu, China: An Analytical Framework Based on Machine Learning with Automatic Parameterization of Environmental Elements. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:3911. [PMID: 36900922 PMCID: PMC10001890 DOI: 10.3390/ijerph20053911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 02/01/2023] [Accepted: 02/10/2023] [Indexed: 06/18/2023]
Abstract
Cities worldwide are facing the dual pressures of growing population and land expansion, leading to the intensification of conflicts in urban productive-living-ecological spaces (PLES). Therefore, the question of "how to dynamically judge the different thresholds of different indicators of PLES" plays an indispensable role in the studies of the multi-scenario simulation of land space changes and needs to be tackled in an appropriate way, given that the process simulation of key elements that affect the evolution of urban systems is yet to achieve complete coupling with PLES utilization configuration schemes. In this paper, we developed a scenario simulation framework combining the dynamic coupling model of Bagging-Cellular Automata (Bagging-CA) to generate various environmental element configuration patterns for urban PLES development. The key merit of our analytical approach is that the weights of different key driving factors under different scenarios are obtained through the automatic parameterized adjustment process, and we enrich the study cases for the vast southwest region in China, which is beneficial for balanced development between eastern and western regions in the country. Finally, we simulate the PLES with the data of finer land use classification, combining a machine learning and multi-objective scenario. Automatic parameterization of environmental elements can help planners and stakeholders understand more comprehensively the complex land space changes caused by the uncertainty of space resources and environment changes, so as to formulate appropriate policies and effectively guide the implementation of land space planning. The multi-scenario simulation method developed in this study has offered new insights and high applicability to other regions for modeling PLES.
Affiliation(s)
- Qi Cao
- Department of Civil Engineering and Architecture, Southwest University of Science and Technology, Mianyang 621000, China
- Geography and Tourism Research Group, Department of Earth and Environmental Sciences, KU Leuven, Celestijnenlaan 200E, 3001 Heverlee, Belgium
| | - Junqing Tang
- School of Urban Planning and Design, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
- Key Laboratory of Earth Surface System and Human-Earth Relations of Ministry of Natural Resources of China, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
| | - Yudie Huang
- Department of Civil Engineering and Architecture, Southwest University of Science and Technology, Mianyang 621000, China
| | - Manjiang Shi
- Department of Civil Engineering and Architecture, Southwest University of Science and Technology, Mianyang 621000, China
| | - Anton van Rompaey
- Geography and Tourism Research Group, Department of Earth and Environmental Sciences, KU Leuven, Celestijnenlaan 200E, 3001 Heverlee, Belgium
| | - Fengjue Huang
- School of Urban Planning and Design, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
| |
|
62
|
Xu Y, Liu L, Zhang S, Xiao W. Multilayer extreme learning machine-based unsupervised deep feature representation for heartbeat classification. Soft comput 2023. [DOI: 10.1007/s00500-023-07861-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
|
63
|
Wang J, Qiu J, Zhu T, Zeng Y, Yang H, Shang Y, Yin J, Sun Y, Qu Y, Valdimarsdóttir UA, Song H. Prediction of Suicidal Behaviors in the Middle-aged Population: Machine Learning Analyses of UK Biobank. JMIR Public Health Surveill 2023; 9:e43419. [PMID: 36805366 PMCID: PMC9989910 DOI: 10.2196/43419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 01/12/2023] [Indexed: 02/22/2023] Open
Abstract
BACKGROUND Suicidal behaviors, including suicide deaths and attempts, are major public health concerns. However, previous suicide models required a huge amount of input features, resulting in limited applicability in clinical practice. OBJECTIVE We aimed to construct applicable models (ie, with limited features) for short- and long-term suicidal behavior prediction. We further validated these models among individuals with different genetic risks of suicide. METHODS Based on the prospective cohort of UK Biobank, we included 223 (0.06%) eligible cases of suicide attempts or deaths, according to hospital inpatient or death register data within 1 year from baseline and randomly selected 4460 (1.18%) controls (1:20) without such records. We similarly identified 833 (0.22%) cases of suicidal behaviors 1 to 6 years from baseline and 16,660 (4.42%) corresponding controls. Based on 143 input features, mainly including sociodemographic, environmental, and psychosocial factors; medical history; and polygenic risk scores (PRS) for suicidality, we applied a bagged balanced light gradient-boosting machine (LightGBM) with stratified 10-fold cross-validation and grid-search to construct the full prediction models for suicide attempts or deaths within 1 year or between 1 and 6 years. The Shapley Additive Explanations (SHAP) approach was used to quantify the importance of input features, and the top 20 features with the highest SHAP values were selected to train the applicable models. The external validity of the established models was assessed among 50,310 individuals who participated in UK Biobank repeated assessments both overall and by the level of PRS for suicidality. RESULTS Individuals with suicidal behaviors were on average 56 years old, with equal sex distribution. 
The application of these full models in the external validation data set demonstrated good model performance, with the area under the receiver operating characteristic (AUROC) curves of 0.919 and 0.892 within 1 year and between 1 and 6 years, respectively. Importantly, the applicable models with the top 20 most important features showed comparable external-validated performance (AUROC curves of 0.901 and 0.885) as the full models, based on which we found that individuals in the top quintile of predicted risk accounted for 91.7% (n=11) and 80.7% (n=25) of all suicidality cases within 1 year and during 1 to 6 years, respectively. We further obtained comparable prediction accuracy when applying these models to subpopulations with different genetic susceptibilities to suicidality. For example, for the 1-year risk prediction, the AUROC curves were 0.907 and 0.885 for the high (>2nd tertile of PRS) and low (<1st) genetic susceptibilities groups, respectively. CONCLUSIONS We established applicable machine learning-based models for predicting both the short- and long-term risk of suicidality with high accuracy across populations of varying genetic risk for suicide, highlighting a cost-effective method of identifying individuals with a high risk of suicidality.
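The "top quintile of predicted risk" result is a capture rate: rank individuals by predicted risk, keep the highest 20%, and ask what share of all observed cases falls in that group. A minimal sketch of that calculation is given below; the function name and the toy scores and labels are invented for illustration and are not UK Biobank data.

```python
def top_fraction_capture(scores, labels, fraction=0.2):
    """Share of all positive cases found among the highest-scoring fraction."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("no positive cases to capture")
    # Number of individuals in the top group (at least one).
    n_top = max(1, int(len(scores) * fraction))
    # Sort by predicted risk, highest first, and keep the top group.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured = sum(label for _, label in ranked[:n_top])
    return captured / positives


# Toy example: 10 individuals, 3 cases, 2 of them in the top quintile.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
capture = top_fraction_capture(scores, labels)
print(f"top-quintile capture: {capture:.1%}")
```

A capture rate far above the 20% expected from random ranking is what makes a short-feature model useful for screening.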
Affiliation(s)
- Junren Wang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Jiajun Qiu
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Ting Zhu
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Yu Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Huazhen Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Yanan Shang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Jin Yin
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Yajing Sun
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Yuanyuan Qu
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Unnur A Valdimarsdóttir
- Center of Public Health Sciences, Faculty of Medicine, University of Iceland, Reykjavík, Iceland; Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden; Department of Epidemiology, Harvard T H Chan School of Public Health, Harvard University, Boston, MA, United States
| | - Huan Song
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China; Center of Public Health Sciences, Faculty of Medicine, University of Iceland, Reykjavík, Iceland
| |
|
64
|
Wong S, Simmons A, Rivera-Villicana J, Barnett S, Sivathamboo S, Perucca P, Ge Z, Kwan P, Kuhlmann L, Vasa R, Mouzakis K, O'Brien TJ. EEG datasets for seizure detection and prediction- A review. Epilepsia Open 2023. [PMID: 36740244 DOI: 10.1002/epi4.12704] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/29/2022] [Accepted: 01/28/2023] [Indexed: 02/07/2023] Open
Abstract
Electroencephalogram (EEG) datasets from epilepsy patients have been used to develop seizure detection and prediction algorithms using machine learning (ML) techniques with the aim of implementing the learned model in a device. However, the format and structure of publicly available datasets are different from each other, and there is a lack of guidelines on the use of these datasets. This impacts the generatability, generalizability, and reproducibility of the results and findings produced by the studies. In this narrative review, we compiled and compared the different characteristics of the publicly available EEG datasets that are commonly used to develop seizure detection and prediction algorithms. We investigated the advantages and limitations of the characteristics of the EEG datasets. Based on our study, we identified 17 characteristics that make the EEG datasets unique from each other. We also briefly looked into how certain characteristics of the publicly available datasets affect the performance and outcome of a study, as well as the influences it has on the choice of ML techniques and preprocessing steps required to develop seizure detection and prediction algorithms. In conclusion, this study provides a guideline on the choice of publicly available EEG datasets to both clinicians and scientists working to develop a reproducible, generalizable, and effective seizure detection and prediction algorithm.
Affiliation(s)
- Sheng Wong
- Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
| | - Anj Simmons
- Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
| | | | - Scott Barnett
- Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
| | - Shobi Sivathamboo
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, Victoria, Australia; Department of Neurology, The Royal Melbourne Hospital, Parkville, Victoria, Australia; Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, Victoria, Australia
| | - Piero Perucca
- Department of Neurology, The Royal Melbourne Hospital, Parkville, Victoria, Australia; Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, Victoria, Australia; Department of Medicine, Austin Health, The University of Melbourne, Heidelberg, Victoria, Australia; Comprehensive Epilepsy Program, Austin Health, Heidelberg, Victoria, Australia
| | - Zongyuan Ge
- Monash eResearch Centre, Monash University, Clayton, Victoria, Australia
| | - Patrick Kwan
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, Victoria, Australia
| | - Levin Kuhlmann
- Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria, Australia; Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Victoria, Australia
| | - Rajesh Vasa
- Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
| | - Kon Mouzakis
- Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
| | - Terence J O'Brien
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, Victoria, Australia; Department of Neurology, The Royal Melbourne Hospital, Parkville, Victoria, Australia; Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, Victoria, Australia
| |
|
65
|
A machine learning approach for hierarchical classification of software requirements. MACHINE LEARNING WITH APPLICATIONS 2023. [DOI: 10.1016/j.mlwa.2023.100457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/03/2023] Open
|
66
|
Furzer J, Isabelle M, Miloucheva B, Laporte A. Public drug insurance, moral hazard and children's use of mental health medication: Latent mental health risk-specific responses to lower out-of-pocket treatment costs. HEALTH ECONOMICS 2023; 32:518-538. [PMID: 36408897 DOI: 10.1002/hec.4631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Revised: 09/09/2022] [Accepted: 11/01/2022] [Indexed: 06/16/2023]
Abstract
Studies have shown that reducing out-of-pocket costs can lead to higher medication initiation rates in childhood. Whether the cost of such initiatives is inflated by moral hazard issues remains a question of concern. This paper looks to the implementation of a public drug insurance program in Québec, Canada, to investigate potential low-benefit consumption in children. Using a nationally representative longitudinal sample, we harness machine learning techniques to predict a child's risk of developing a mental health disorder. Using difference-in-differences analyses, we then assess the impact of the drug program on children's mental health medication uptake across the distribution of predicted mental health risk. Beyond showing that eliminating out-of-pocket costs led to a 3 percentage point increase in mental health drug uptake, we show that demand responses are concentrated in the top two deciles of risk for developing mental health disorders. These higher-risk children increase take-up of mental health drugs by 7-8 percentage points. We find even stronger effects for stimulants (8-11 percentage point increases among the highest risk children). Our results suggest that reductions in out-of-pocket costs could achieve better uptake of mental health medications, without inducing substantial low-benefit care among lower-risk children.
Affiliation(s)
- Jill Furzer
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
| | - Maripier Isabelle
- Department of Economics, Université Laval, Quebec City, Quebec, Canada
- Centre de recherche CERVO, Quebec City, Quebec, Canada
- CIRANO, Montreal, Quebec, Canada
- Canadian Centre for Health Economics, University of Toronto, Toronto, Ontario, Canada
| | - Boriana Miloucheva
- Center for Health and Wellbeing, Princeton University, Princeton, New Jersey, USA
| | - Audrey Laporte
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Canadian Centre for Health Economics, University of Toronto, Toronto, Ontario, Canada
- Department of Economics, University of Toronto, Toronto, Ontario, Canada
| |
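The difference-in-differences comparison named in the abstract above can be illustrated with a minimal sketch. The function name `did_estimate` and the toy uptake figures are hypothetical and are not taken from the study; the sketch shows only the basic two-group, two-period estimator, not the authors' risk-stratified specification.

```python
from statistics import mean


def did_estimate(rows):
    """Two-group, two-period difference-in-differences.

    rows: iterable of (treated, post, outcome) tuples, where `treated`
    and `post` are booleans and `outcome` is a number.
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    # Change in the treated group minus change in the comparison group.
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Made-up medication uptake rates, for illustration only.
rows = [
    (True, False, 0.10), (True, False, 0.12),
    (True, True, 0.18), (True, True, 0.20),
    (False, False, 0.08), (False, False, 0.10),
    (False, True, 0.09), (False, True, 0.11),
]
effect = did_estimate(rows)  # about 0.07, i.e. 7 percentage points
```

Subtracting the comparison group's change removes any trend common to both groups, which is why the estimate is read as the effect of the policy only under a parallel-trends assumption.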
|
67
|
Amin A, Adnan A, Anwar S. An adaptive learning approach for customer churn prediction in the telecommunication industry using evolutionary computation and Naïve Bayes. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/24/2023]
|
68
|
Li K, Wang B, Tian Y, Qi Z. Fast and Accurate Road Crack Detection Based on Adaptive Cost-Sensitive Loss Function. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:1051-1062. [PMID: 34546935 DOI: 10.1109/tcyb.2021.3103885] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Numerous detection problems in computer vision, including road crack detection, suffer from extreme foreground-background imbalance. Modifying the loss function is a promising way to address this problem. In this article, we propose a pixel-based adaptive weighted cross-entropy (WCE) loss in conjunction with Jaccard distance to facilitate high-quality pixel-level road crack detection. Our work demonstrates the influence of loss functions on detection outcomes and sheds light on successive improvements in crack detection. Specifically, to verify the effectiveness of the proposed loss, we conduct extensive experiments on four public databases: CrackForest, AigleRN, Crack360, and BJN260. Compared to the vanilla WCE, the proposed loss significantly speeds up the training process while retaining the performance.
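A rough sketch of how a weighted cross-entropy term can be combined with a Jaccard distance term for one image is given below. The abstract does not state the authors' adaptive weighting rule, so the background-to-foreground pixel ratio used here, the mixing factor `alpha`, and the function name `wce_jaccard_loss` are all assumptions made for illustration.

```python
import math


def wce_jaccard_loss(probs, labels, pos_weight=None, alpha=1.0, eps=1e-7):
    """Weighted cross-entropy plus soft Jaccard distance over flat pixel lists.

    probs:  predicted crack probabilities in [0, 1], one per pixel
    labels: ground-truth labels, 1 for crack and 0 for background
    """
    n = len(labels)
    n_pos = sum(labels)
    if pos_weight is None:
        # Assumed adaptive rule: weight crack pixels by the
        # background-to-crack pixel ratio of this image.
        pos_weight = (n - n_pos) / max(n_pos, 1)
    wce = 0.0
    intersection = 0.0
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        wce -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
        intersection += p * y
        total += p + y
    wce /= n
    jaccard_distance = 1.0 - (intersection + eps) / (total - intersection + eps)
    return wce + alpha * jaccard_distance


labels = [0, 0, 0, 0, 0, 0, 0, 1]          # one crack pixel out of eight
good = [0.01] * 7 + [0.99]                 # confident and correct
missed = [0.01] * 8                        # crack pixel missed
false_alarm = [0.99] + [0.01] * 6 + [0.99] # one background pixel flagged
```

With this weighting, missing the single crack pixel is penalized more heavily than raising one false alarm, which is the behavior a cost-sensitive loss is meant to produce on imbalanced masks.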
|
69
|
An Efficient Two-Stage Network Intrusion Detection System in the Internet of Things. INFORMATION 2023. [DOI: 10.3390/info14020077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023] Open
Abstract
Internet of Things (IoT) devices and services provide convenience but face serious security threats. The network intrusion detection system is vital in ensuring the security of the IoT environment. For the IoT environment, we propose a novel two-stage intrusion detection model that combines machine learning and deep learning to deal with the class imbalance of network traffic data and achieve fine-grained intrusion detection on large-scale flow data. The superiority of the model is verified on the newer and larger CSE-CIC-IDS2018 dataset. In Stage-1, the LightGBM algorithm separates normal from abnormal network traffic data and is compared with six classic machine learning techniques. In Stage-2, the Convolutional Neural Network (CNN) performs fine-grained attack class detection on the samples predicted to be abnormal in Stage-1. The Stage-2 multiclass classification achieves a detection rate of 99.896%, an F1-score of 99.862%, and an MCC of 95.922%. The total training time of the two-stage model is 74.876 s. The detection time of a sample is 0.0172 milliseconds. Moreover, we set up an optional Synthetic Minority Over-sampling Technique based on the imbalance ratio (IR-SMOTE) of the dataset in Stage-2. Experimental results show that, compared with SMOTE technology, the two-stage intrusion detection model can adapt to imbalanced datasets well and shows higher efficiency and better performance when processing large-scale flow data, outperforming state-of-the-art intrusion detection systems.
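The two-stage control flow described in this abstract can be sketched as a simple cascade. The two stage models are passed in as plain callables; in the paper they are a LightGBM classifier and a CNN, whereas the threshold rules, field names, and labels below are hypothetical stand-ins used only to make the sketch runnable.

```python
def two_stage_predict(samples, is_abnormal, attack_type, normal_label="normal"):
    """Cascade: a binary filter first, then a fine-grained classifier.

    is_abnormal: callable(sample) -> bool   (Stage-1 stand-in)
    attack_type: callable(sample) -> str    (Stage-2 stand-in)
    """
    predictions = [normal_label] * len(samples)
    # Stage-1: keep only the indices flagged as abnormal.
    flagged = [i for i, sample in enumerate(samples) if is_abnormal(sample)]
    # Stage-2: run the more expensive classifier on flagged samples only.
    for i in flagged:
        predictions[i] = attack_type(samples[i])
    return predictions


# Hypothetical flow records and toy rules, for illustration only.
flows = [
    {"pkts_per_s": 20},
    {"pkts_per_s": 9000},
    {"pkts_per_s": 1500},
    {"pkts_per_s": 300},
]
stage1 = lambda f: f["pkts_per_s"] > 1000
stage2 = lambda f: "dos" if f["pkts_per_s"] > 5000 else "scan"
preds = two_stage_predict(flows, stage1, stage2)
```

Because Stage-2 only sees the samples that Stage-1 flags, the dominant normal class never reaches the multiclass model, which is one way a cascade can soften class imbalance.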
|
70
|
Ke SW, Tsai CF, Pan YY, Lin WC. Majority re-sampling via sub-class clustering for imbalanced datasets. J EXP THEOR ARTIF IN 2023. [DOI: 10.1080/0952813x.2023.2165715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Affiliation(s)
- Shih-Wen Ke
- Department of Information Management, National Central University, Taoyuan, Taiwan
| | - Chih-Fong Tsai
- Department of Information Management, National Central University, Taoyuan, Taiwan
| | - Yi-Ying Pan
- Department of Information Management, National Central University, Taoyuan, Taiwan
| | - Wei-Chao Lin
- Department of Information Management, Chang Gung University, Taoyuan, Taiwan
- Department of Thoracic Surgery, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
| |
|
71
|
Khan F, Yu X, Yuan Z, Rehman AU. ECG classification using 1-D convolutional deep residual neural network. PLoS One 2023; 18:e0284791. [PMID: 37098024 PMCID: PMC10128986 DOI: 10.1371/journal.pone.0284791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2022] [Accepted: 04/07/2023] [Indexed: 04/26/2023] Open
Abstract
An electrocardiograph (ECG) is widely used in diagnosis and prediction of cardiovascular diseases (CVDs). The traditional ECG classification methods have complex signal processing phases that lead to expensive designs. This paper provides a deep learning (DL) based system that employs convolutional neural networks (CNNs) for classification of ECG signals present in the PhysioNet MIT-BIH Arrhythmia database. The proposed system implements a 1-D convolutional deep residual neural network (ResNet) model that performs feature extraction by directly using the input heartbeats. We have used the synthetic minority oversampling technique (SMOTE) to address the class-imbalance problem in the training dataset, and the system effectively classifies the five heartbeat types in the test dataset. The classifier's performance is evaluated with ten-fold cross validation (CV) using accuracy, precision, sensitivity, F1-score, and kappa. We have obtained an average accuracy of 98.63%, precision of 92.86%, sensitivity of 92.41%, and specificity of 99.06%. The average F1-score and Kappa obtained were 92.63% and 95.5% respectively. The study shows that the proposed ResNet performs well with deep layers compared to other 1-D CNNs.
Affiliation(s)
- Fahad Khan
- School of Automation, Northwestern Polytechnical University, Xi'an, China
- Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus, Pakistan
| | - Xiaojun Yu
- School of Automation, Northwestern Polytechnical University, Xi'an, China
| | - Zhaohui Yuan
- School of Automation, Northwestern Polytechnical University, Xi'an, China
| | - Atiq Ur Rehman
- Artificial Intelligence and Intelligent Systems Research Group, School of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden
- Department of Electrical and Computer Engineering, Pak-Austria Fachhochschule Institute of Applied Sciences and Technology, Haripur, Pakistan
| |
|
72
|
Adarsh V, Arun Kumar P, Lavanya V, Gangadharan G. Fair and Explainable Depression Detection in Social Media. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
73
|
Wan M, Wu Q, Yan L, Guo J, Li W, Lin W, Lu S. Taxi drivers' traffic violations detection using random forest algorithm: A case study in China. TRAFFIC INJURY PREVENTION 2023; 24:362-370. [PMID: 36976788 DOI: 10.1080/15389588.2023.2191286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Revised: 01/29/2023] [Accepted: 03/12/2023] [Indexed: 05/23/2023]
Abstract
OBJECTIVE To effectively explore the impacts of several key factors on taxi drivers' traffic violations and provide traffic management departments with scientific decisions to reduce traffic fatalities and injuries. METHODS A total of 43,458 electronic enforcement records of taxi drivers' traffic violations in Nanchang City, Jiangxi Province, China, from July 1, 2020, to June 30, 2021, were utilized to explore the characteristics of traffic violations. A random forest algorithm was used to predict the severity of taxi drivers' traffic violations, and 11 factors affecting traffic violations, including time, road conditions, environment, and taxi companies, were analyzed using the SHapley Additive exPlanations (SHAP) framework. RESULTS Firstly, the ensemble method Balanced Bagging Classifier (BBC) was applied to balance the dataset. The results showed that the imbalance ratio (IR) of the original imbalanced dataset reduced from 6.61% to 2.60%. Moreover, a prediction model for the severity of taxi drivers' traffic violations was established by using the Random Forest, and the results showed that accuracy, m_F1, m_G-mean, m_AUC, and m_AP reached 0.877, 0.849, 0.599, 0.976, and 0.957, respectively. Compared with the algorithms of Decision Tree, XG Boost, Ada Boost, and Neural Network, the performance measures of the prediction model based on Random Forest were the best. Finally, the SHAP framework was used to improve the interpretability of the model and identify important factors affecting taxi drivers' traffic violations. The results showed that functional districts, location of the violation, and road grade had a high impact on the probability of traffic violations; their mean SHAP values were 0.39, 0.36, and 0.26, respectively. CONCLUSIONS Findings of this paper may help to discover the relationship between the influencing factors and the severity of traffic violations, and provide a theoretical basis for reducing the traffic violations of taxi drivers and improving road safety management.
Affiliation(s)
- Ming Wan
- School of Transportation Engineering, East China Jiaotong University, Nanchang, China
| | - Qian Wu
- School of Transportation Engineering, East China Jiaotong University, Nanchang, China
| | - Lixin Yan
- School of Transportation Engineering, East China Jiaotong University, Nanchang, China
| | - Junhua Guo
- School of Transportation Engineering, East China Jiaotong University, Nanchang, China
| | - Wenxia Li
- School of Transportation Engineering, East China Jiaotong University, Nanchang, China
| | - Wei Lin
- Traffic Administration Bureau of Nanchang Public Security Bureau, Nanchang, China
| | - Shan Lu
- Institute of Intelligence Science and Engineering, Shenzhen Polytechnic, Shenzhen, China
| |
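The gap between accuracy (0.877) and G-mean (0.599) reported in the abstract above is typical of imbalanced problems. The sketch below shows one common definition of the multi-class G-mean, the geometric mean of per-class recalls; whether the paper's m_G-mean follows exactly this definition is an assumption, and the labels are made up.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall of each class that appears in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in support}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (one common G-mean definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy severity labels: 8 minor violations, 2 severe ones.
y_true = ["minor"] * 8 + ["severe"] * 2
y_pred = ["minor"] * 8 + ["minor", "severe"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = g_mean(y_true, y_pred)
```

Here nine of ten predictions are correct, yet only half of the severe cases are recovered, so the G-mean falls well below the accuracy.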
|
74
|
Shokrollahi P, Chaves JMZ, Lam JPH, Sharma A, Pal D, Bahrami N, Chaudhari AS, Loening AM. Radiology Decision Support System for Selecting Appropriate CT Imaging Titles Using Machine Learning Techniques Based on Electronic Medical Records. IEEE ACCESS 2023; 11:99222-99236. [DOI: 10.1109/access.2023.3314380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2025]
Affiliation(s)
- Peyman Shokrollahi
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
| | | | - Jonathan P. H. Lam
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Avishkar Sharma
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
| | | | | | - Akshay S. Chaudhari
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Andreas M. Loening
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
| |
|
75
|
Zhao Y, Ding Y, Chekired H, Wu Y. Student adaptation to college and coping in relation to adjustment during COVID-19: A machine learning approach. PLoS One 2022; 17:e0279711. [PMID: 36584087 PMCID: PMC9803197 DOI: 10.1371/journal.pone.0279711] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2022] [Accepted: 12/12/2022] [Indexed: 12/31/2022] Open
Abstract
The COVID-19 pandemic has presented unprecedented challenges for university students, creating uncertainties for their academic careers, social lives, and mental health. Our study utilized a machine learning approach to examine the degree to which students' college adjustment and coping styles impacted their adjustment to COVID-19 disruptions. More specifically, we developed predictive models to distinguish between well-adjusted and not well-adjusted students in each of five psychological domains: academic adjustment, emotionality adjustment, social support adjustment, general COVID-19 regulations response, and discriminatory impact. The predictive features used for these models are students' individual characteristics in three psychological domains, i.e., Ways of Coping (WAYS), Adaptation to College (SACQ), and Perceived Stress Scale (PSS), assessed using established commercial and open-access questionnaires. We based our study on a proprietary survey dataset collected from 517 U.S. students during the initial peak of the pandemic. Our models achieved an average of 0.91 AUC score over the five domains. Using the SHAP method, we further identified the most relevant risk factors associated with each classification task. The findings reveal the relationship of students' general adaptation to college and coping in relation to their adjustment during COVID-19. Our results could help universities identify systemic and individualized strategies to support their students in coping with stress and to facilitate students' college adjustment in this era of challenges and uncertainties.
Affiliation(s)
- Yijun Zhao
- Computer and Information Sciences Department, Fordham University, New York, New York, United States of America
| | - Yi Ding
- Graduate School of Education, Fordham University, New York, New York, United States of America
| | - Hayet Chekired
- Computer and Information Sciences Department, Fordham University, New York, New York, United States of America
| | - Ying Wu
- Graduate School of Education, Fordham University, New York, New York, United States of America
| |
|
76
|
Yang F, Wang K, Sun L, Zhai M, Song J, Wang H. A hybrid sampling algorithm combining synthetic minority over-sampling technique and edited nearest neighbor for missed abortion diagnosis. BMC Med Inform Decis Mak 2022; 22:344. [PMID: 36581862 PMCID: PMC9801640 DOI: 10.1186/s12911-022-02075-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Accepted: 12/02/2022] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Clinical diagnosis based on machine learning usually uses case samples as training samples, and uses machine learning to construct disease prediction models characterized by descriptive texts of clinical manifestations. However, the problem of sample imbalance often exists in the medical field, which leads to a decrease in the classification performance of machine learning. METHODS To solve the problem of sample imbalance in medical datasets, we propose a hybrid sampling algorithm combining the synthetic minority over-sampling technique (SMOTE) and edited nearest neighbor (ENN). Firstly, SMOTE is used to over-sample the missed abortion and diabetes datasets, so that the number of samples of the two classes is balanced. Then, ENN is used to under-sample the over-sampled dataset to delete the "noisy samples" in the majority class. Finally, Random forest is used to model and predict the sampled missed abortion and diabetes datasets to achieve an accurate clinical diagnosis. RESULTS Experimental results show that Random forest has the best classification performance on the missed abortion and diabetes datasets after SMOTE-ENN sampling, and the MCC index is 95.6% and 90.0%, respectively. In addition, the results of pairwise comparison and multiple comparisons show that SMOTE-ENN is significantly better than other sampling algorithms. CONCLUSION Random forest significantly improves on all indexes on the missed abortion dataset after SMOTE-ENN sampling.
Affiliation(s)
- Fangyuan Yang
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, 454000 Henan China
| | - Kang Wang
- Autobio Labtec Instruments Co. Ltd., Zhengzhou, 450016 Henan China
| | - Lisha Sun
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, 454000 Henan China
| | - Mengjiao Zhai
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, 454000 Henan China
| | - Jiejie Song
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, 454000 Henan China
| | - Hong Wang
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, 454000 Henan China
| |
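The hybrid sampling steps in the METHODS of the abstract above (SMOTE over-sampling, then ENN cleaning of the majority class) can be sketched in plain Python. This is a simplified illustration on toy 2-D points, not the authors' implementation: the function names, the choice of `k`, and the data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        nearest = sorted(others, key=lambda q: math.dist(base, q))[:k]
        neighbour = rng.choice(nearest)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def enn(X, y, k=3, only_label=None):
    """Edited nearest neighbours: drop a sample unless most of its k nearest
    neighbours share its label. If only_label is set, edit only that class."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        if only_label is not None and label != only_label:
            keep.append(i)
            continue
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        agree = sum(1 for j in nearest if y[j] == label)
        if 2 * agree > len(nearest):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label, majority_label, k=3, seed=0):
    """Balance the classes with SMOTE, then clean the majority with ENN."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_major = sum(1 for label in y if label == majority_label)
    new = smote(minority, n_major - len(minority), k, random.Random(seed))
    X_all = list(X) + new
    y_all = list(y) + [minority_label] * len(new)
    return enn(X_all, y_all, k, only_label=majority_label)


# Toy data: a majority cluster near the origin, a minority cluster near
# (5, 5), and one noisy majority point sitting inside the minority cluster.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
     (5.0, 5.0),
     (5.0, 4.8), (4.8, 5.0), (5.2, 5.1)]
y = ["maj"] * 7 + ["min"] * 3
X_res, y_res = smote_enn(X, y, minority_label="min", majority_label="maj")
```

In practice a maintained library implementation would normally be preferred over a hand-rolled one; the sketch is only meant to make the two steps concrete.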
|
77
|
Automated imbalanced classification via layered learning. Mach Learn 2022. [DOI: 10.1007/s10994-022-06282-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
78
|
Akshay A, Abedi M, Shekarchizadeh N, Burkhard FC, Katoch M, Bigger-Allen A, Adam RM, Monastyrskaya K, Gheinani AH. MLcps: machine learning cumulative performance score for classification problems. Gigascience 2022; 12:giad108. [PMID: 38091508 PMCID: PMC10716825 DOI: 10.1093/gigascience/giad108] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Revised: 10/02/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Assessing the performance of machine learning (ML) models requires careful consideration of the evaluation metrics used. It is often necessary to utilize multiple metrics to gain a comprehensive understanding of a trained model's performance, as each metric focuses on a specific aspect. However, comparing the scores of these individual metrics for each model to determine the best-performing model can be time-consuming and susceptible to subjective user preferences, potentially introducing bias. RESULTS We propose the Machine Learning Cumulative Performance Score (MLcps), a novel evaluation metric for classification problems. MLcps integrates several precomputed evaluation metrics into a unified score, enabling a comprehensive assessment of the trained model's strengths and weaknesses. We tested MLcps on 4 publicly available datasets, and the results demonstrate that MLcps provides a holistic evaluation of the model's robustness, ensuring a thorough understanding of its overall performance. CONCLUSIONS By utilizing MLcps, researchers and practitioners no longer need to individually examine and compare multiple metrics to identify the best-performing models. Instead, they can rely on a single MLcps value to assess the overall performance of their ML models. This streamlined evaluation process saves valuable time and effort, enhancing the efficiency of model evaluation. MLcps is available as a Python package at https://pypi.org/project/MLcps/.
Affiliation(s)
- Akshay Akshay
- Functional Urology Research Group, Department for BioMedical Research DBMR, University of Bern, 3008 Bern, Switzerland
- Graduate School for Cellular and Biomedical Sciences, University of Bern, 3012 Bern, Switzerland
| | - Masoud Abedi
- Department of Medical Data Science, Leipzig University Medical Centre, 04107 Leipzig, Germany
| | - Navid Shekarchizadeh
- Department of Medical Data Science, Leipzig University Medical Centre, 04107 Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, 04105 Leipzig, Germany
| | - Fiona C Burkhard
- Functional Urology Research Group, Department for BioMedical Research DBMR, University of Bern, 3008 Bern, Switzerland
- Department of Urology, Inselspital University Hospital, 3010 Bern, Switzerland
| | - Mitali Katoch
- Institute of Neuropathology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany
| | - Alex Bigger-Allen
- Biological & Biomedical Sciences Program, Division of Medical Sciences, Harvard Medical School, 02115 Boston, MA, USA
- Urological Diseases Research Center, Boston Children's Hospital, 02115 Boston, MA, USA
- Department of Surgery, Harvard Medical School, 02115 Boston, MA, USA
- Broad Institute of MIT and Harvard, 02142 Cambridge, MA, USA
| | - Rosalyn M Adam
- Urological Diseases Research Center, Boston Children's Hospital, 02115 Boston, MA, USA
- Department of Surgery, Harvard Medical School, 02115 Boston, MA, USA
- Broad Institute of MIT and Harvard, 02142 Cambridge, MA, USA
| | - Katia Monastyrskaya
- Functional Urology Research Group, Department for BioMedical Research DBMR, University of Bern, 3008 Bern, Switzerland
- Department of Urology, Inselspital University Hospital, 3010 Bern, Switzerland
| | - Ali Hashemi Gheinani
- Functional Urology Research Group, Department for BioMedical Research DBMR, University of Bern, 3008 Bern, Switzerland
- Department of Urology, Inselspital University Hospital, 3010 Bern, Switzerland
- Urological Diseases Research Center, Boston Children's Hospital, 02115 Boston, MA, USA
- Department of Surgery, Harvard Medical School, 02115 Boston, MA, USA
- Broad Institute of MIT and Harvard, 02142 Cambridge, MA, USA
| |
|
79
|
Brasil J, Maitelli C, Nascimento J, Chiavone-Filho O, Galvão E. Diagnosis of Operating Conditions of the Electrical Submersible Pump via Machine Learning. SENSORS (BASEL, SWITZERLAND) 2022; 23:279. [PMID: 36616878 PMCID: PMC9823322 DOI: 10.3390/s23010279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Revised: 10/14/2022] [Accepted: 10/25/2022] [Indexed: 06/17/2023]
Abstract
In wells that operate by electrical submersible pump (ESP), the use of automation tools becomes essential in the interpretation of data. However, the fact that the wells work with automated systems does not guarantee the early diagnosis of operating conditions. The analysis of amperimetric charts is one of the ways to identify failure conditions. Generally, the analysis of these charts is performed by operators who are often overloaded, which reduces the efficiency of observing the well operating conditions. Currently, technologies based on machine learning (ML) algorithms create solutions for the early diagnosis of abnormalities in the well's operation. Thus, this work aims to provide a proposal for detecting the operating conditions of the ESP pump from electrical current data from 24 wells in the city of Mossoró, Rio Grande do Norte state, Brazil. The algorithms used were Decision Tree, Support Vector Machine, K-Nearest Neighbor and Neural Network. The algorithms were tested without and with hyperparameter tuning based on a training dataset. The results confirm that the application of ML algorithms is feasible for classifying the operating conditions of the ESP pump, as all had an accuracy greater than 87%, with the best result being the application of the SVM model, which reached an accuracy of 93%.
Affiliation(s)
- Jéssica Brasil
- Department of Chemical Engineering (DEQ), Federal University of Rio Grande do Norte (UFRN), Natal 59078-970, Brazil
| | - Carla Maitelli
- Department of Petroleum Engineering (DPET), Federal University of Rio Grande do Norte (UFRN), Natal 59078-970, Brazil
| | - João Nascimento
- Federal Institute of Education, Science and Technology of Rio Grande do Norte (IFRN), Parnamirim 59143-455, Brazil
| | - Osvaldo Chiavone-Filho
- Department of Chemical Engineering (DEQ), Federal University of Rio Grande do Norte (UFRN), Natal 59078-970, Brazil
| | - Edney Galvão
- Department of Petroleum Engineering (DPET), Federal University of Rio Grande do Norte (UFRN), Natal 59078-970, Brazil
| |
|
80
|
Wang F, Dong A, Zhang K, Qian D, Tian Y. A Quantitative Assessment Grading Study of Balance Performance Based on Lower Limb Dataset. SENSORS (BASEL, SWITZERLAND) 2022; 23:33. [PMID: 36616632 PMCID: PMC9824022 DOI: 10.3390/s23010033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Revised: 12/07/2022] [Accepted: 12/17/2022] [Indexed: 06/17/2023]
Abstract
Balance ability is one of the important factors in measuring human physical fitness and a common index for evaluating sports performance. Its quality directly affects the coordination ability of human movements and plays an important role in human productive activities. In the field of sports, balance ability is an important indicator of athletes' selection and training. How to objectively analyze balance performance becomes a problem for every non-professional sports enthusiast. Therefore, in this paper, we used a lower-limb dataset collected by inertial sensors to extract the feature parameters, then designed a RUS Boost classifier for unbalanced data, with an SVM model as its base classifier, to predict three grades of balance ability, and, finally, evaluated the performance of the new classifier by comparing it with two basic classifiers (KNN, SVM). The results showed that the new classifier could be used to evaluate the balance ability of the lower limb and outperformed the basic ones (RUS Boost: 72%; KNN: 60%; SVM: 44%). The results indicate that the established classification model could be used for quantitative assessment of balance ability in initial screening and targeted training.
|
81
|
Celik Y, Aslan MF, Sabanci K, Stuart S, Woo WL, Godfrey A. Improving Inertial Sensor-Based Activity Recognition in Neurological Populations. SENSORS (BASEL, SWITZERLAND) 2022; 22:9891. [PMID: 36560259 PMCID: PMC9783358 DOI: 10.3390/s22249891] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/24/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 06/17/2023]
Abstract
Inertial sensor-based human activity recognition (HAR) has a range of healthcare applications as it can indicate the overall health status or functional capabilities of people with impaired mobility. Typically, artificial intelligence models achieve high recognition accuracies when trained with rich and diverse inertial datasets. However, obtaining such datasets may not be feasible in neurological populations due to, e.g., impaired patient mobility to perform many daily activities. This study proposes a novel framework to overcome the challenge of creating rich and diverse datasets for HAR in neurological populations. The framework produces images from numerical inertial time-series data (initial state) and then artificially augments the number of produced images (enhanced state) to achieve a larger dataset. Here, we used convolutional neural network (CNN) architectures by utilizing image input. In addition, CNN enables transfer learning which enables limited datasets to benefit from models that are trained with big data. Initially, two benchmarked public datasets were used to verify the framework. Afterward, the approach was tested in limited local datasets of healthy subjects (HS), Parkinson's disease (PD) population, and stroke survivors (SS) to further investigate validity. The experimental results show that when data augmentation is applied, recognition accuracies have been increased in HS, SS, and PD by 25.6%, 21.4%, and 5.8%, respectively, compared to the no data augmentation state. In addition, data augmentation contributes to better detection of stair ascent and stair descent by 39.1% and 18.0%, respectively, in limited local datasets. Findings also suggest that CNN architectures that have a small number of deep layers can achieve high accuracy. The implication of this study has the potential to reduce the burden on participants and researchers where limited datasets are accrued.
Affiliation(s)
- Yunus Celik
- Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
| | - M. Fatih Aslan
- Department of Electrical and Electronics Engineering, Karamanoglu Mehmetbey University, Karaman 70100, Turkey
| | - Kadir Sabanci
- Department of Electrical and Electronics Engineering, Karamanoglu Mehmetbey University, Karaman 70100, Turkey
| | - Sam Stuart
- Department of Sport, Exercise and Rehabilitation, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
| | - Wai Lok Woo
- Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
| | - Alan Godfrey
- Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
|
82
|
Raghuwanshi BS. Class-specific extreme learning machine based on overall distribution for addressing binary imbalance problem. Soft comput 2022. [DOI: 10.1007/s00500-022-07705-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|
83
|
A Hybrid Risk Factor Evaluation Scheme for Metabolic Syndrome and Stage 3 Chronic Kidney Disease Based on Multiple Machine Learning Techniques. Healthcare (Basel) 2022; 10:healthcare10122496. [PMID: 36554020 PMCID: PMC9778302 DOI: 10.3390/healthcare10122496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Revised: 11/28/2022] [Accepted: 12/08/2022] [Indexed: 12/14/2022] Open
Abstract
With the rapid development of medicine and technology, machine learning (ML) techniques are extensively applied to medical informatics and the suboptimal health field to identify critical predictor variables and risk factors. Metabolic syndrome (MetS) and chronic kidney disease (CKD) are important risk factors for many comorbidities and complications. Existing studies that utilize different statistical or ML algorithms to perform CKD data analysis mostly analyze the early-stage subjects directly, but few studies have discussed the predictive models and important risk factors for the stage III CKD high-risk health screening population. The middle stages 3a and 3b of CKD indicate moderate renal failure. This study aims to construct an effective hybrid important risk factor evaluation scheme for subjects with MetS and stage III CKD based on ML predictive models. Six well-known ML techniques, namely random forest (RF), logistic regression (LGR), multivariate adaptive regression splines (MARS), extreme gradient boosting (XGBoost), gradient boosting with categorical features support (CatBoost), and a light gradient boosting machine (LightGBM), were used in the proposed scheme. The data were sourced from the Taiwan health examination indicators and the questionnaire responses of 71,108 members between 2005 and 2017. In total, 375 stage 3a CKD and 50 stage 3b CKD patients were enrolled, and 33 different variables were used to evaluate potential risk factors. Based on the results, the top five important variables, namely BUN, SBP, Right Intraocular Pressure (R-IOP), RBCs, and T-Cho/HDL-C (C/H), were identified as significant variables for evaluating the subjects with MetS and CKD stage 3a or 3b.
|
84
|
Sidey-Gibbons CJ, Sun C, Schneider A, Lu SC, Lu K, Wright A, Meyer L. Predicting 180-day mortality for women with ovarian cancer using machine learning and patient-reported outcome data. Sci Rep 2022; 12:21269. [PMID: 36481644 PMCID: PMC9732183 DOI: 10.1038/s41598-022-22614-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2022] [Accepted: 10/17/2022] [Indexed: 12/13/2022] Open
Abstract
Contrary to national guidelines, women with ovarian cancer often receive treatment at the end of life, potentially due to the difficulty in accurately estimating prognosis. We trained machine learning algorithms to guide prognosis by predicting 180-day mortality for women with ovarian cancer using patient-reported outcomes (PRO) data. We collected data from a single academic cancer institution in the United States. Women completed biopsychosocial PRO measures every 90 days. We randomly partitioned our dataset into training and testing samples. We used synthetic minority oversampling to reduce class imbalance in the training dataset. We fitted training data to six machine learning algorithms and combined their classifications on the testing dataset into an unweighted voting ensemble. We assessed each algorithm's accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) using testing data. We recruited 245 patients who completed 1319 PRO assessments. The final voting ensemble produced state-of-the-art results on the task of predicting 180-day mortality for ovarian cancer patients (Accuracy = 0.79, Sensitivity = 0.71, Specificity = 0.80, AUROC = 0.76). The algorithm correctly identified 25 of the 35 women in the testing dataset who died within 180 days of assessment. Machine learning algorithms trained using PRO data offer encouraging performance in predicting whether a woman with ovarian cancer will die within 180 days. This model could be used to drive data-driven end-of-life care and address current shortcomings in care delivery. Our model demonstrates the potential of biopsychosocial PROM information to make substantial contributions to oncology prediction modeling. This model could inform clinical decision-making. Future research is needed to validate these findings in a larger, more diverse sample.
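The two technical steps named in this abstract, synthetic minority oversampling and an unweighted voting ensemble, can be illustrated with a minimal sketch. It is not the authors' code: the function names `smote_like_oversample` and `majority_vote` are hypothetical, the oversampler is a simplified nearest-neighbour (k = 1) variant of the general idea, and sending ties to class 0 is an arbitrary assumption. The last line only checks that the reported sensitivity agrees with the 25-of-35 figure in the abstract.

```python
import math
import random


def smote_like_oversample(minority, n_new, rng):
    """Create n_new synthetic points by interpolating between a minority
    sample and its nearest minority neighbour (simplified, k = 1)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbour = min(others, key=lambda p: math.dist(base, p))
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def majority_vote(predictions):
    """Unweighted vote over 0/1 predictions; ties go to class 0 (assumption)."""
    return 1 if 2 * sum(predictions) > len(predictions) else 0


rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # toy minority-class points
new_points = smote_like_oversample(minority, 5, rng)

# Six classifiers voting on one test case, as in a six-model ensemble.
ensemble_label = majority_vote([1, 1, 1, 1, 0, 0])

# Consistency check: 25 of 35 deaths identified, per the abstract.
reported_sensitivity = round(25 / 35, 2)
```

In such a pipeline the oversampling would be applied to the training partition only, so that the testing data keep their original class balance.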
Affiliation(s)
- Chris J. Sidey-Gibbons
- Section of Patient-Centered Analytics, Department of Symptom Research, University of Texas MD Anderson Cancer Center, Houston, USA
| | - Charlotte Sun
- Department of Gynecologic Oncology and Reproductive Medicine, University of Texas MD Anderson Cancer Center, Houston, USA
| | - Amy Schneider
- Department of Gynecologic Oncology and Reproductive Medicine, University of Texas MD Anderson Cancer Center, Houston, USA
| | - Sheng-Chieh Lu
- Section of Patient-Centered Analytics, Department of Symptom Research, University of Texas MD Anderson Cancer Center, Houston, USA
| | - Karen Lu
- Department of Gynecologic Oncology and Reproductive Medicine, University of Texas MD Anderson Cancer Center, Houston, USA
| | - Alexi Wright
- Department of Medical Oncology, Dana Farber Cancer Institute, Boston, USA; Department of Medicine, Harvard Medical School, Boston, USA
| | - Larissa Meyer
- Department of Gynecologic Oncology and Reproductive Medicine, University of Texas MD Anderson Cancer Center, Houston, USA
|
85
|
ESMOTE: an overproduce-and-choose synthetic examples generation strategy based on evolutionary computation. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08004-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/07/2022]
|
86
|
Rangan ES, Pathinarupothi RK, Anand KJS, Snyder MP. Performance effectiveness of vital parameter combinations for early warning of sepsis-an exhaustive study using machine learning. JAMIA Open 2022; 5:ooac080. [PMID: 36267121 PMCID: PMC9566305 DOI: 10.1093/jamiaopen/ooac080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 09/07/2022] [Accepted: 09/20/2022] [Indexed: 11/15/2022] Open
Abstract
Objective: To carry out exhaustive data-driven computations for the performance of noninvasive vital signs heart rate (HR), respiratory rate (RR), peripheral oxygen saturation (SpO2), and temperature (Temp), considered both independently and in all possible combinations, for early detection of sepsis. Materials and methods: By extracting features interpretable by clinicians, we applied Gradient Boosted Decision Tree machine learning on a dataset of 2630 patients to build 240 models. Validation was performed on a geographically distinct dataset. Relative to onset, predictions were clocked as per 16 pairs of monitoring intervals and prediction times, and the outcomes were ranked. Results: The combination of HR and Temp was found to be a minimal feature set yielding maximal predictability with area under receiver operating curve 0.94, sensitivity of 0.85, and specificity of 0.90. Whereas HR and RR each directly enhance prediction, the effects of SpO2 and Temp are significant only when combined with HR or RR. In benchmarking relative to the standard methods Systemic Inflammatory Response Syndrome (SIRS), National Early Warning Score (NEWS), and quick-Sequential Organ Failure Assessment (qSOFA), Vital-SEP outperformed all 3 of them. Conclusion: Using intensive care unit data, even 2 vital signs are adequate to predict sepsis up to 6 h in advance with promising accuracy comparable to standard scoring methods and other sepsis predictive tools reported in the literature. Vital-SEP can be used for fast-track prediction, especially in limited-resource hospital settings where laboratory-based hematologic or biochemical assays may be unavailable, inaccurate, or entail clinically inordinate delays. A prospective study is essential to determine the clinical impact of the proposed sepsis prediction model and evaluate other outcomes such as mortality and duration of hospital stay.
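The model count in this abstract can be checked with a short enumeration. One reading consistent with the reported numbers, offered here as an assumption rather than the authors' stated design, is that each non-empty combination of the four vital signs is paired with each of the 16 interval and prediction-time settings.

```python
from itertools import combinations

vitals = ["HR", "RR", "SpO2", "Temp"]

# Every non-empty subset of the four vital signs: singles, pairs, triples, all four.
feature_sets = [
    combo
    for size in range(1, len(vitals) + 1)
    for combo in combinations(vitals, size)
]

n_interval_pairs = 16  # pairs of monitoring interval and prediction time, per the abstract
n_models = len(feature_sets) * n_interval_pairs
```

Under this reading there are 15 feature sets, and 15 × 16 matches the 240 models reported.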
Affiliation(s)
- Ekanath Srihari Rangan
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | | | - Kanwaljeet J S Anand
- Division of Critical Care, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
| | - Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
|
87
|
Ning Z, Jiang Z, Zhang D. Sparse projection infinite selection ensemble for imbalanced classification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
|
88
|
El Moutaouakil K, Roudani M, El Ouissari A. Optimal Entropy Genetic Fuzzy-C-Means SMOTE (OEGFCM-SMOTE). Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
89
|
Fair evaluation of classifier predictive performance based on binary confusion matrix. Comput Stat 2022. [DOI: 10.1007/s00180-022-01301-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Evaluating the ability of a classifier to make predictions on unseen data and increasing it by tweaking the learning algorithm are two of the main reasons motivating the evaluation of classifier predictive performance. In this study the behavior of Balanced $AC_1$, a novel classifier accuracy measure, is investigated under different class imbalance conditions via a Monte Carlo simulation. The behavior of Balanced $AC_1$ is compared against that of several well-known performance measures based on the binary confusion matrix. Study results reveal the suitability of Balanced $AC_1$ with both balanced and imbalanced data sets. A real example of the effects of class imbalance on the behavior of the investigated classifier performance measures is provided by comparing the performance of several machine learning algorithms in a churn prediction problem.
|
90
|
Tran LV, Tran HM, Le TM, Huynh TTM, Tran HT, Dao SVT. Application of Machine Learning in Epileptic Seizure Detection. Diagnostics (Basel) 2022; 12:diagnostics12112879. [PMID: 36428941 PMCID: PMC9689720 DOI: 10.3390/diagnostics12112879] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Revised: 11/09/2022] [Accepted: 11/13/2022] [Indexed: 11/22/2022] Open
Abstract
Epileptic seizure is a neurological condition caused by short and unexpectedly occurring electrical disruptions in the brain. It is estimated that roughly 60 million individuals worldwide have had an epileptic seizure. Experiencing an epileptic seizure can have serious consequences for the patient. Automatic seizure detection on electroencephalogram (EEG) recordings is essential due to the irregular and unpredictable nature of seizures. By thoroughly analyzing EEG records, neurophysiologists can discover important information and patterns, and proper and timely treatments can be provided for the patients. This research presents a novel machine learning-based approach for detecting epileptic seizures in EEG signals. A public EEG dataset from the University of Bonn was used to validate the approach. Meaningful statistical features were extracted from the original data using discrete wavelet transform analysis, then the relevant features were selected using feature selection based on the binary particle swarm optimizer. This reduced data dimensionality by 75% and computational time by 47%, which sped up the classification process. The selected features were then used to train different machine learning models, and hyperparameter optimization was applied to further enhance the models' performance. The results achieved up to 98.4% accuracy and showed that the proposed method was very effective and practical in detecting seizure presence in EEG signals. In clinical applications, this method could help relieve the suffering of epilepsy patients and alleviate the workload of neurologists.
Affiliation(s)
- Ly V. Tran
- School of Industrial Engineering and Management, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
| | - Hieu M. Tran
- School of Electrical Engineering, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
| | - Tuan M. Le
- School of Electrical Engineering, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
| | - Tri T. M. Huynh
- School of Electrical Engineering, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
| | - Hung T. Tran
- School of Industrial Engineering and Management, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
| | - Son V. T. Dao
- School of Industrial Engineering and Management, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- School of Science, Engineering & Technology, RMIT University Vietnam, Ho Chi Minh City 700000, Vietnam
- Correspondence: Tel.: +84-98-159-1145
|
91
|
SVM ensemble training for imbalanced data classification using multi-objective optimization techniques. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04291-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
One of the main problems with classifier training for imbalanced data is defining the correct learning criterion. On the one hand, we want the minority class to be correctly recognized, and on the other hand, we do not want to make too many mistakes in the majority class. Commonly used metrics focus either on the predictive quality of the distinguished class or propose an aggregation of simple metrics. The aggregate metrics, such as Gmean or AUC, are primarily ambiguous, i.e., they do not indicate the specific values of errors made on the minority or majority class. Additionally, improper use of aggregate metrics can lead to the selection of solutions that favor the majority class. The authors recognize that one solution to this problem is to use overall risk. However, this requires knowledge of the costs associated with errors made between classes, which is often unavailable. Hence, this paper proposes the SEMOOS algorithm, an approach based on multi-objective optimization that optimizes criteria related to the prediction quality of both minority and majority classes. SEMOOS returns a pool of non-dominated solutions from which users can choose the model that best suits their needs. Automatic solution selection formulas with a so-called Pareto front have also been proposed to compare state-of-the-art methods. The proposed approach trains an SVM classifier ensemble dedicated to the imbalanced data classification task. The experimental evaluations carried out on a large number of benchmark datasets confirm its usefulness.
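The ambiguity of aggregate metrics described in this abstract, and the idea of keeping only non-dominated solutions, can be shown with a small sketch. It is not the algorithm proposed in the paper: the helper names `g_mean` and `non_dominated` and the toy (sensitivity, specificity) pairs are assumptions chosen for illustration.

```python
import math


def g_mean(tpr, tnr):
    """Geometric mean of sensitivity (tpr) and specificity (tnr)."""
    return math.sqrt(tpr * tnr)


def non_dominated(points):
    """Keep the (tpr, tnr) pairs that no other pair matches or beats on both
    criteria while strictly beating them on at least one."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Toy models as (sensitivity, specificity); the first, second, and fourth
# share the same G-mean although their per-class errors differ sharply.
models = [(0.9, 0.4), (0.4, 0.9), (0.7, 0.7), (0.6, 0.6), (0.3, 0.3)]
scores = [g_mean(tpr, tnr) for tpr, tnr in models]
front = non_dominated(models)
```

A single G-mean value cannot distinguish the first two models, whereas the non-dominated set keeps both and leaves the trade-off to the user.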
|
92
|
Cinaroglu S. Learning from unbalanced catastrophic out-of-pocket health expenditure dataset: blending SMOTE-boosting with ensemble models. J EXP THEOR ARTIF IN 2022. [DOI: 10.1080/0952813x.2022.2143907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Affiliation(s)
- Songul Cinaroglu
- Hacettepe University, Faculty of Economics and Administrative Sciences, Department of Health Care Management, Beytepe, Ankara, Turkey
|
93
|
Factors affecting text mining based stock prediction: Text feature representations, machine learning models, and news platforms. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
94
|
An approach to multi-class imbalanced problem in ecology using machine learning. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
95
|
Zhu F, Zhu Z, Zhang Y, Zhu H, Gao Z, Liu X, Zhou G, Xu Y, Shan F. Severity detection of COVID-19 infection with machine learning of clinical records and CT images. Technol Health Care 2022; 30:1299-1314. [DOI: 10.3233/thc-220321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
BACKGROUND: Coronavirus disease 2019 (COVID-19) is a deadly viral infection that has spread rapidly around the world since its outbreak in 2019. In the worst case, a patient’s organ may fail, leading to death. Therefore, early diagnosis is crucial to provide patients with adequate and effective treatment. OBJECTIVE: This paper aims to build machine learning prediction models to automatically diagnose COVID-19 severity with clinical and computed tomography (CT) radiomics features. METHOD: P-V-Net was used to segment the lung parenchyma and then radiomics was used to extract CT radiomics features from the segmented lung parenchyma regions. Over-sampling, under-sampling, and a combination of over- and under-sampling methods were used to address the data imbalance problem. RandomForest was used to screen out the optimal number of features. Eight different machine learning classification algorithms were used to analyze the data. RESULTS: The experimental results showed that the COVID-19 mild-severe prediction model trained with clinical and CT radiomics features had the best prediction results. The accuracy of the GBDT classifier was 0.931, the ROC AUC 0.942, and the AUCPRC 0.694, which indicated it was better than other classifiers. CONCLUSION: This study can help clinicians identify patients at risk of severe COVID-19 deterioration early on and provide some treatment for these patients as soon as possible. It can also assist physicians in prognostic efficacy assessment and decision making.
Affiliation(s)
- Fubao Zhu
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan, China
| | - Zelin Zhu
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan, China
| | - Yijun Zhang
- Department of Radiology, Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
| | - Hanlei Zhu
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan, China
| | - Zhengyuan Gao
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan, China
| | - Xiaoman Liu
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan, China
| | - Guanbin Zhou
- People’s Hospital of Yicheng City, Yicheng, Hubei, China
| | - Yan Xu
- Department of Radiology, Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
| | - Fei Shan
- Department of Radiology, Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
|
96
|
Xing M, Yao F, Zhang J, Meng X, Jiang L, Bao Y. Data reconstruction of daily MODIS chlorophyll-a concentration and spatio-temporal variations in the Northwestern Pacific. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 843:156981. [PMID: 35764151 DOI: 10.1016/j.scitotenv.2022.156981] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Revised: 06/15/2022] [Accepted: 06/22/2022] [Indexed: 06/15/2023]
Abstract
Sea surface chlorophyll-a concentration (Chl-a) is a key proxy for phytoplankton biomass. Spatio-temporally continuous Chl-a data are important for understanding the mechanisms of chlorophyll occurrence and development and for tracking phytoplankton changes. However, the greatest challenge in utilizing daily Chl-a data is the large number of missing pixels caused by orbital position and cloud coverage. This study proposes the application of a spatial filling method using the machine learning-based Extreme Gradient Boosting (BST) to reconstruct missing pixels of daily MODIS Chl-a data from 2007 to 2018. The approach is applied to different trophic biogeographical subregions of the Northwestern Pacific, which has complex phytoplankton dynamics and frequent missing data. Various environmental variables are taken into consideration, including meteorological forcing, geographic and topographic features, and oceanic physical components. The BST-reconstructed Chl-a (BST Chl-a) is validated using in-situ Chl-a measurements and VIIRS and Himawari-8 Chl-a products. The results show that the BST model is highly adaptive in reconstructing Chl-a data, and it performs well in pelagic, offshore and coastal regions, with the best performance in pelagic regions. BST Chl-a improves coverage without significant quality degradation compared to the original MODIS Chl-a. BST Chl-a agrees better with in-situ data than MODIS Chl-a does, with CC of 0.742, RMSE of 0.247, MAE of 0.202 and Bias of 0.089. Cross-satellite validation using VIIRS and Himawari-8 Chl-a also shows promising results, with CC of 0.861 and 0.765, respectively, suggesting the high accuracy of BST Chl-a. The inter-annual trend of BST Chl-a decreases in coastal regions and increases in offshore and pelagic regions. BST Chl-a images present similar spatial patterns to MODIS Chl-a under different missing rates, with gradual decreases from coastal to pelagic regions. This indicates that phytoplankton bloom patterns can be identified from daily BST Chl-a images.
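The four validation statistics quoted in this abstract (CC, RMSE, MAE, Bias) can be computed as in the sketch below. The definitions are assumptions, since the abstract does not state them: CC is taken as the Pearson correlation coefficient and Bias as the mean of estimated minus observed values. The function name `validation_metrics` and the toy values are hypothetical, and any transformation of Chl-a (for example a logarithm) that the study may have applied is not modeled.

```python
import math


def validation_metrics(estimated, observed):
    """Return CC (Pearson), RMSE, MAE, and Bias for paired samples."""
    n = len(observed)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    errors = [e - o for e, o in zip(estimated, observed)]
    return {
        "CC": cov / math.sqrt(var_e * var_o),
        "RMSE": math.sqrt(sum(d * d for d in errors) / n),
        "MAE": sum(abs(d) for d in errors) / n,
        "Bias": sum(errors) / n,  # negative when the estimate runs low
    }


# Toy example: the estimate is consistently 0.5 below the observation.
metrics = validation_metrics([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```

A constant offset such as this one leaves CC at 1 while RMSE, MAE, and Bias all register the error, which is why the four statistics are usually reported together.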
Affiliation(s)
- Mingming Xing
- College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing, China; The Key Laboratory of Earth Observation of Hainan Province, Hainan Aerospace Information Research Institute, Sanya, China.
| | - Fengmei Yao
- College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing, China; The Key Laboratory of Computational Geodynamics, Chinese Academy of Sciences, Beijing, China.
| | - Jiahua Zhang
- The Key Laboratory of Earth Observation of Hainan Province, Hainan Aerospace Information Research Institute, Sanya, China; Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China.
| | - Xianglei Meng
- College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing, China.
| | - Lijun Jiang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China.
| | - Yilin Bao
- College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing, China.
|
97
|
Röhr V, Blankertz B, Radtke FM, Spies C, Koch S. Machine-learning model predicting postoperative delirium in older patients using intraoperative frontal electroencephalographic signatures. Front Aging Neurosci 2022; 14:911088. [PMID: 36313029 PMCID: PMC9614270 DOI: 10.3389/fnagi.2022.911088] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Accepted: 09/27/2022] [Indexed: 11/13/2022] Open
Abstract
Objective: In older patients receiving general anesthesia, postoperative delirium (POD) is the most frequent form of cerebral dysfunction. Early identification of patients at higher risk of developing POD could provide the opportunity to adapt intraoperative and postoperative therapy. We, therefore, propose a machine learning approach to predict the risk of POD in elderly patients, using routine intraoperative electroencephalography (EEG) and clinical data that are readily available in the operating room. Methods: We conducted a retrospective analysis of the data of a single-center study at the Charité-Universitätsmedizin Berlin, Department of Anesthesiology [ISRCTN 36437985], including 1,277 patients older than 60 years with planned surgery and general anesthesia. To deal with the class imbalance, we used balanced ensemble methods, specifically Bagging and Random Forests, and, as a performance measure, the area under the ROC curve (AUC-ROC). We trained our models including basic clinical parameters and intraoperative EEG features, in particular classical spectral and burst suppression signatures as well as multi-band covariance matrices, which were classified by taking advantage of the geometry of a Riemannian manifold. The models were validated with 10 repeats of a 10-fold cross-validation. Results: Including EEG data in the classification resulted in a robust and reliable risk evaluation for POD. The clinical parameters alone achieved an AUC-ROC score of 0.75. Including EEG signatures improved the classification when the patients were grouped by anesthetic agents and evaluated separately for each group. The spectral features alone showed an AUC-ROC score of 0.66; the covariance features showed an AUC-ROC score of 0.68. The AUC-ROC scores of EEG features relative to patient data differed by anesthetic group. The best performance was reached by combining both the EEG features and the clinical parameters. Overall, the AUC-ROC score was 0.77; for patients receiving Propofol it was 0.78, for those receiving Sevoflurane it was 0.8, and for those receiving Desflurane 0.73. Applying the trained prediction model to an independent data set of a different clinical study confirmed these results for the combined classification, while the classifier on clinical parameters alone did not generalize. Conclusion: A machine learning approach combining intraoperative frontal EEG signatures with clinical parameters could be an easily applicable tool for early identification of patients at risk of developing POD.
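The validation scheme mentioned in this abstract, 10 repeats of a 10-fold cross-validation, can be sketched as an index generator. This is an illustration only: the function name `repeated_kfold` is hypothetical, the folds are not stratified by outcome (the abstract does not say whether the study stratified), and the fixed seed is an arbitrary choice.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for each repeat
        folds = [order[i::n_splits] for i in range(n_splits)]
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test


# 1,277 patients, as reported in the abstract.
splits = list(repeated_kfold(n_samples=1277))
```

Each patient appears in exactly one test fold per repeat, so every model configuration is scored on 100 train/test splits in total.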
Affiliation(s)
- Vera Röhr
- Neurotechnology Group, Technische Universität Berlin, Berlin, Germany
- *Correspondence: Vera Röhr
| | | | - Finn M. Radtke
- Department of Anaesthesia, Hospital of Nykobing, University of Southern Denmark, Odense, Denmark
| | - Claudia Spies
- Department of Anaesthesiology and Operative Intensive Care Medicine, Charité—Universitätsmedizin Berlin, Berlin, Germany
| | - Susanne Koch
- Department of Anaesthesiology and Operative Intensive Care Medicine, Charité—Universitätsmedizin Berlin, Berlin, Germany
|
98
|
Hajek P, Abedin MZ, Sivarajah U. Fraud Detection in Mobile Payment Systems using an XGBoost-based Framework. INFORMATION SYSTEMS FRONTIERS : A JOURNAL OF RESEARCH AND INNOVATION 2022; 25:1-19. [PMID: 36258679 PMCID: PMC9560719 DOI: 10.1007/s10796-022-10346-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 09/23/2022] [Indexed: 06/16/2023]
Abstract
Mobile payment systems are becoming more popular due to the increase in the number of smartphones, which, in turn, attracts the interest of fraudsters. Extant research has therefore developed various fraud detection methods using supervised machine learning. However, sufficient labeled data are rarely available and their detection performance is negatively affected by the extreme class imbalance in financial fraud data. The purpose of this study is to propose an XGBoost-based fraud detection framework while considering the financial consequences of fraud detection systems. The framework was empirically validated on a large dataset of more than 6 million mobile transactions. To demonstrate the effectiveness of the proposed framework, we conducted a comparative evaluation of existing machine learning methods designed for modeling imbalanced data and outlier detection. The results suggest that in terms of standard classification measures, the proposed semi-supervised ensemble model integrating multiple unsupervised outlier detection algorithms and an XGBoost classifier achieves the best results, while the highest cost savings can be achieved by combining random under-sampling and XGBoost methods. This study has therefore financial implications for organizations to make appropriate decisions regarding the implementation of effective fraud detection systems.
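Random under-sampling, which this abstract reports as giving the highest cost savings when combined with XGBoost, can be illustrated on its own with a minimal sketch. The function name `random_undersample`, the toy 95-to-5 class ratio, and the fixed seed are assumptions; the classifier that would be trained on the balanced data is not shown.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_keep = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_keep):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: 95 legitimate transactions (label 0) and 5 fraudulent ones (label 1).
labels = [0] * 95 + [1] * 5
samples = list(range(100))
bal_x, bal_y = random_undersample(samples, labels)
class_counts = Counter(bal_y)
```

Under-sampling of this kind would normally be applied to the training split only, so that evaluation still reflects the original class imbalance.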
Affiliation(s)
- Petr Hajek
- Science and Research Centre, Faculty of Economics and Administration, University of Pardubice, Studentska 84, Pardubice, 532 10 Czech Republic
| | - Mohammad Zoynul Abedin
- Department of Finance, Performance & Marketing, Teesside University International Business School, Teesside University, Middlesbrough, TS1 3BX Tees Valley UK
|
99
|
Singh A, Jain A. An efficient credit card fraud detection approach using cost‐sensitive weak learner with imbalanced dataset. Comput Intell 2022. [DOI: 10.1111/coin.12555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Ajeet Singh
- University School of Information, Communication & Technology, Guru Gobind Singh Indraprastha University, Delhi, India
| | - Anurag Jain
- University School of Information, Communication & Technology, Guru Gobind Singh Indraprastha University, Delhi, India
|
100
|
Santhiappan S, Chelladurai J, Ravindran B. TOMBoost: a topic modeling based boosting approach for learning with class imbalance. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00363-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|