1
Li L, Guo D, Shi C, Zheng Y. The predictive role of sedentary behavior and physical activity on adolescent depressive symptoms: A machine learning approach. J Affect Disord 2025; 378:81-89. [PMID: 40015649 DOI: 10.1016/j.jad.2025.02.085] [Received: 09/09/2024] [Revised: 02/07/2025] [Accepted: 02/24/2025] [Indexed: 03/01/2025]
Abstract
OBJECTIVE This study aims to investigate the predictive value of sedentary behavior and physical activity for adolescent depressive symptoms. METHODS A total of 2419 adolescent students (grades 7-12) from six administrative regions in China were surveyed. Measures included the Physical Activity Rating Scale for Children (PARS-3), a self-designed questionnaire assessing sedentary behavior among Chinese children and adolescents, and the Children's Depression Inventory (CDI). Machine learning models were trained and tested to predict depressive symptoms from different types of sedentary behavior, physical activity, and other key variables. RESULTS The trained random forest model demonstrated high predictive accuracy (ACC = 90.52%), with a precision of 92.01%, a recall of 87.95%, and an F1 score of 0.90. Key predictors of depressive symptoms included sedentary behaviors such as multimedia learning, watching TV, classroom learning, and playing video games. Physical activity also emerged as a significant predictor of adolescent depressive symptoms. CONCLUSIONS The machine learning-based predictive model performed strongly, suggesting that sedentary behavior and physical activity data can effectively predict depressive symptoms in Chinese adolescents.
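As a quick consistency check on the reported metrics: the F1 score is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The minimal Python sketch below plugs in the precision and recall quoted above; the function name is illustrative and not taken from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # F1 is conventionally 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, written as fractions.
f1 = f1_from_precision_recall(0.9201, 0.8795)
print(f"F1 = {f1:.4f}")  # F1 = 0.8993
```

The result, about 0.899, rounds to the reported F1 score of 0.90.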
Affiliation(s)
- Lin Li
- Key Laboratory of Adolescent Health Assessment and Exercise Intervention of Ministry of Education, East China Normal University, Shanghai 200241, China; College of Physical Education and Health, East China Normal University, Shanghai 200241, China.
- Dongxi Guo
- Key Laboratory of Adolescent Health Assessment and Exercise Intervention of Ministry of Education, East China Normal University, Shanghai 200241, China; College of Physical Education and Health, East China Normal University, Shanghai 200241, China
- Chengchao Shi
- Key Laboratory of Adolescent Health Assessment and Exercise Intervention of Ministry of Education, East China Normal University, Shanghai 200241, China; College of Physical Education and Health, East China Normal University, Shanghai 200241, China
- Yifan Zheng
- Key Laboratory of Adolescent Health Assessment and Exercise Intervention of Ministry of Education, East China Normal University, Shanghai 200241, China; College of Physical Education and Health, East China Normal University, Shanghai 200241, China
2
Kiobia DO, Mwitta CJ, Ngimbwa PC, Schmidt JM, Lu G, Rains GC. Machine-learning approach facilitates prediction of whitefly spatiotemporal dynamics in a plant canopy. J Econ Entomol 2025; 118:732-745. [PMID: 40036620 PMCID: PMC12034313 DOI: 10.1093/jee/toaf035] [Received: 10/01/2024] [Revised: 01/11/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025]
Abstract
Plant-specific insect scouting and prediction remain challenging in most crop systems. In this article, a machine-learning algorithm is proposed to predict populations during scouting for whiteflies (Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae)) and to help determine the distribution of adult whiteflies in cotton plant canopies. The study investigated the main location of adult whiteflies relative to plant nodes (stem points where leaves or branches emerge), population variation within and between canopies, variability in whitefly density across fields, the contribution of dense nodes to overall canopy populations, and the feasibility of using machine learning for prediction. Daily scouting was conducted on 64 cotton plants not treated with pesticides, focusing on all leaves of the node with the highest whitefly count. A linear mixed-effects model assessed distribution over time, and machine-learning model selection identified a suitable model for forecasting the whole-canopy whitefly population. Findings showed that the top 3 to 5 nodes are the key habitats, with a single node potentially accounting for 44.4% of the whole-canopy whitefly population. The Bagging Ensemble Artificial Neural Network Regression model accurately predicted canopy populations (R² = 85.57%), with no significant difference between actual and predicted counts (P > 0.05). Strategic sampling of the top nodes could estimate overall plant populations when only a few samples or transects are taken across a field. The proposed machine-learning model could be integrated into computing devices and automated sensors to predict whitefly population density within the entire plant canopy in real time during scouting operations.
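The abstract names a Bagging Ensemble Artificial Neural Network regressor. The sketch below illustrates only the bagging part, under stated assumptions: each base learner is fitted to a bootstrap resample (drawn with replacement) and the ensemble prediction is the average of the learners' predictions. To stay dependency-free it substitutes a one-variable least-squares line for the neural network, and the node and canopy counts are hypothetical, not data from the study.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: every x identical
        return y_bar, 0.0
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit n_models base learners, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Bagged prediction: the average of the base learners' predictions."""
    return mean(a + b * x for a, b in models)


# Hypothetical toy data: count on the densest node (x) vs. whole-canopy count (y).
node_counts = [2, 4, 5, 7, 9, 12, 15, 18]
canopy_counts = [5, 9, 11, 16, 20, 27, 33, 41]
models = bagging_fit(node_counts, canopy_counts)
print(round(bagging_predict(models, 10), 1))
```

Swapping `fit_line` for a neural-network regressor would keep the same resample-then-average structure; averaging over bootstrap fits mainly reduces the variance of an unstable base learner.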
Affiliation(s)
- Denis O Kiobia
- College of Engineering, University of Georgia, Tifton, GA, USA
- Peter C Ngimbwa
- College of Engineering, University of Georgia, Tifton, GA, USA
- Jason M Schmidt
- Department of Entomology, University of Georgia, Tifton, GA, USA
- Guoyu Lu
- College of Engineering, University of Georgia, Tifton, GA, USA
- Glen C Rains
- College of Engineering, University of Georgia, Tifton, GA, USA
3
Garcia-Lopez YJ, Marquez PH, Morales NN. Microfinance institutions failure prediction in emerging countries, a machine learning approach. PLoS One 2025; 20:e0321989. [PMID: 40273124 PMCID: PMC12021153 DOI: 10.1371/journal.pone.0321989] [Received: 12/11/2024] [Accepted: 03/14/2025] [Indexed: 04/26/2025]
Abstract
This study addresses a question that matters: predicting when microfinance institutions might fail, especially in places where financial stability is closely linked to economic inclusion. The challenge? Creating something practical and usable. This is where the Adjusted Gross Granular Model (ARGM) comes in. It combines techniques such as granular computing and machine learning to handle messy and imbalanced data, so that the model is not just a theoretical concept but a practical tool that can be used in the real world. Data from 56 financial institutions in Peru were analyzed over almost a decade (2014-2023). The results were promising: the model detected failures with nearly 90% accuracy and correctly identified safe institutions more than 95% of the time. But what does this mean in practice? When tested, it flagged six institutions (20% of the total) as high risk. Such a tool could have a significant impact on emerging markets. With this model, financial regulators could act in advance, potentially preventing financial disasters. This is not just a theoretical exercise but a practical solution to a pressing problem in these markets, where every failure has domino effects on small businesses and clients in local communities, who may see their life savings affected or lost when these institutions fail. Ultimately, this research is not just about a machine learning model or about using statistics to evaluate results. It is about giving regulators and supervisors of financial institutions a tool they can rely on to act before it is too late, when microfinance institutions get into poor financial shape, and to make immediate decisions in the event of a possible collapse.
Affiliation(s)
- Yvan J. Garcia-Lopez
- CENTRUM Católica Graduate Business School (CCGBS), Lima, Peru
- Pontificia Universidad Católica del Perú (PUCP), Lima, Peru
- Patricia Henostroza Marquez
- CENTRUM Católica Graduate Business School (CCGBS), Lima, Peru
- Pontificia Universidad Católica del Perú (PUCP), Lima, Peru
- Nicolas Nuñez Morales
- CENTRUM Católica Graduate Business School (CCGBS), Lima, Peru
- Pontificia Universidad Católica del Perú (PUCP), Lima, Peru
4
Yaseen ZM, Alhalimi FL. Heavy metal adsorption efficiency prediction using biochar properties: a comparative analysis for ensemble machine learning models. Sci Rep 2025; 15:13434. [PMID: 40251173 PMCID: PMC12008194 DOI: 10.1038/s41598-025-96271-5] [Received: 01/04/2025] [Accepted: 03/27/2025] [Indexed: 04/20/2025]
Abstract
The contamination of water and soils with heavy metals poses a significant environmental threat, making the development of effective removal strategies a global priority. Hence, the determination of heavy metals can play an essential role in environmental monitoring and assessment. In the current research, ensemble machine learning (ML) models (i.e., Random Forest Regressor (RFR), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), HistGradientBoosting, Extreme Gradient Boosting (XGBoost), and Light Gradient-Boosting Machine (LightGBM)) were applied in an attempt to predict the adsorption efficiency of several heavy metals (i.e., Pb, Cd, Ni, Cu, and Zn) from factors including temperature, pH, and biochar characteristics. A dataset of 353 samples was compiled from the open literature. In the first stage, the data were preprocessed, including outlier removal and scaling, to make them better suited for modeling; in the second stage, the predictive models were developed. The results showed that the XGBoost model was the most accurate, achieving the highest coefficient of determination (R² = 0.92). The research was extended to a feature importance analysis, which indicated that the initial concentration ratio of metals to biochar and pH were the most influential factors for adsorption efficiency, followed by pyrolysis temperature, while other features, such as physical properties like surface area and pore structure, had a minimal effect on efficiency. These findings highlight the value of ensemble ML models in guiding heavy metal removal solutions, as they provide efficient predictions and ease the selection of environmental applications.
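The headline metric here is the coefficient of determination, R² = 1 - SS_res/SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the observations from their mean. A minimal Python sketch follows; the observed and predicted efficiencies are hypothetical, not values from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)              # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1 - ss_res / ss_tot


# Hypothetical adsorption efficiencies (%) and model predictions.
observed = [62.0, 75.5, 81.0, 90.2, 95.1]
predicted = [60.5, 77.0, 80.0, 88.9, 96.3]
print(round(r_squared(observed, predicted), 3))
```

A model that always predicts the mean of the observations scores 0, and a model worse than that scores below 0.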
Affiliation(s)
- Zaher Mundher Yaseen
- Civil and Environmental Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia.
- Farah Loui Alhalimi
- Civil and Environmental Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia
5
Hachamnia AH, Mehri A, Jamaati M. Integrating neuroscience and artificial intelligence: EEG analysis using ensemble learning for diagnosis Alzheimer's disease and frontotemporal dementia. J Neurosci Methods 2025; 416:110377. [PMID: 39894256 DOI: 10.1016/j.jneumeth.2025.110377] [Received: 10/26/2024] [Revised: 01/14/2025] [Accepted: 01/23/2025] [Indexed: 02/04/2025]
Abstract
BACKGROUND Alzheimer's disease (AD) and frontotemporal dementia (FTD) are both progressive neurological disorders that affect the elderly. Distinguishing between patients with these two diseases in the early stages can be quite challenging, and because their treatments differ, this has become an important problem. Machine learning (ML) algorithms can help here because of their ability to handle large datasets and deliver high-quality diagnostic results. NEW METHOD In this research, we integrate multiple ML algorithms into 10 ensemble learning techniques, using 7 distinct features: 3 from the time domain and 4 from the frequency domain. RESULTS These are used to achieve higher diagnostic accuracy in binary and multiclass classification of electroencephalography (EEG) signals recorded during the eye resting state from elderly patients with AD, FTD, and healthy age-matched controls (CN). COMPARISON WITH EXISTING METHODS The best results for the binary AD/CN, FTD/CN, and AD/FTD classifications, with accuracy > 95%, were obtained with the light gradient boosting machine (LGBM) method using the wavelet transform feature. CONCLUSION This combination (LGBM and wavelet) also achieves the best performance in the AD/FTD/CN multiclass classification, with accuracy > 93%.
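The abstract does not specify which wavelet or which wavelet-derived quantity was used. As an illustration only, the sketch below computes one common kind of wavelet feature, the relative energy of each sub-band of a multi-level discrete wavelet transform, using the Haar wavelet because it fits in a few lines. The sampling rate, the number of levels, and the synthetic epoch are assumptions, not details of the study.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]  # low-pass half
    detail = [(a - b) * s for a, b in pairs]  # high-pass half
    return approx, detail


def wavelet_energy_features(signal, levels=3):
    """Relative energy per sub-band: detail bands (finest first), then final approximation."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        bands.append(detail)
    bands.append(current)
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    return [e / total for e in energies] if total else [0.0] * len(energies)


fs = 128  # assumed sampling rate (Hz)
# Hypothetical single-channel epoch: a 4 Hz component plus a weaker 40 Hz component.
epoch = [math.sin(2 * math.pi * 4 * t / fs) + 0.3 * math.sin(2 * math.pi * 40 * t / fs)
         for t in range(256)]
features = wavelet_energy_features(epoch)
print([round(f, 3) for f in features])
```

Because the Haar filters are orthonormal, the band energies sum to the energy of the input, so the features form a distribution over bands; with fs = 128 Hz the three detail bands nominally cover 32-64, 16-32, and 8-16 Hz and the approximation 0-8 Hz.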
Affiliation(s)
- Amir Hossein Hachamnia
- Department of Physics, Faculty of Science, Babol Noshirvani University of Technology, Babol, Iran
- Ali Mehri
- Department of Physics, Faculty of Science, Babol Noshirvani University of Technology, Babol, Iran.
- Maryam Jamaati
- Faculty of Computer Engineering, Iranian eUniversity, Tehran, Iran
6
Wang Y, Chen S, Liu J, Zhang B, Zhu Z, Zou X, Zhou Y, Niu B. Unveiling sex difference in factors associated with suicide attempt among Chinese adolescents with depression: a machine learning-based study. J Ment Health 2025:1-11. [PMID: 40111411 DOI: 10.1080/09638237.2025.2478374] [Received: 08/08/2024] [Revised: 02/20/2025] [Accepted: 02/24/2025] [Indexed: 03/22/2025]
Abstract
BACKGROUND Adolescents with depression are at heightened risk of suicide, with a distinct sex difference in suicidal behaviour observed. This study explores the sex-specific factors influencing suicide attempts among Chinese adolescents with depression. METHODS Data were collected from 2343 depressed adolescents across 14 hospitals in 9 provinces through self-report questionnaires. The survey was conducted between December 2020 and December 2023. Thirty-six potential risk factors were selected from validated measures of psychological, sociodemographic, and social stress domains. The dataset was split by sex, and SMOTE was applied to address class imbalance. Logistic regression, elastic net regression, random forest, XGBoost, and neural networks were used to model the data, evaluated by accuracy, precision, recall, and F1 score. The optimal model was employed for SHapley Additive exPlanations (SHAP) analysis to identify key factors influencing suicide attempts. RESULTS The Random Forest model exhibited the best performance for both sexes (AUC: females 0.720, males 0.736). Non-suicidal self-injury and depression were significant predictors for both sexes. Among females, factors like difficulty identifying emotions and physical abuse had a stronger impact, while resilience and hopelessness were more predictive for males. CONCLUSIONS The study highlights sex differences in suicide attempt predictors, emphasizing the need for sex-specific prevention strategies.
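SMOTE, used in this study to address class imbalance, creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal, brute-force Python sketch of that idea, not the implementation the authors used (libraries such as imbalanced-learn provide a full `SMOTE` class); the toy points and the choice k = 2 are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point is x + u * (x_nn - x), where x is a randomly chosen
    minority sample, x_nn one of its k nearest minority-class neighbours
    (Euclidean distance), and u is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        x_nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, x_nn)))
    return synthetic


# Hypothetical two-feature minority-class samples.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=5, k=2)
print(new_points)
```

Because synthetic points are derived from the minority data, oversampling is normally applied to the training split only, after the train/test split, so that information from test samples does not leak into training.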
Affiliation(s)
- Yang Wang
- College of Management, Shenzhen University, Shenzhen, China
- Siyu Chen
- College of Management, Shenzhen University, Shenzhen, China
- Jiayao Liu
- College of Management, Shenzhen University, Shenzhen, China
- Bowen Zhang
- College of Management, Shenzhen University, Shenzhen, China
- Zhenzhen Zhu
- Shenzhen Health Development Research and Data Management Center, Shenzhen, China
- Xinwen Zou
- School of Business Informatics and Mathematics, University of Mannheim, Mannheim, Germany
- Ben Niu
- College of Management, Shenzhen University, Shenzhen, China
7
Vlontzou ME, Athanasiou M, Dalakleidi KV, Skampardoni I, Davatzikos C, Nikita K. A comprehensive interpretable machine learning framework for mild cognitive impairment and Alzheimer's disease diagnosis. Sci Rep 2025; 15:8410. [PMID: 40069342 PMCID: PMC11897299 DOI: 10.1038/s41598-025-92577-6] [Received: 11/18/2024] [Accepted: 02/28/2025] [Indexed: 03/15/2025]
Abstract
An interpretable machine learning (ML) framework is introduced to enhance the diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD) by ensuring robustness of the ML models' interpretations. The dataset used comprises volumetric measurements from brain MRI and genetic data from healthy individuals and patients with MCI/AD, obtained through the Alzheimer's Disease Neuroimaging Initiative. The existing class imbalance is addressed by an ensemble learning approach, while various attribution-based and counterfactual-based interpretability methods are leveraged towards producing diverse explanations related to the pathophysiology of MCI/AD. A unification method combining SHAP with counterfactual explanations assesses the interpretability techniques' robustness. The best performing model yielded 87.5% balanced accuracy and 90.8% F1-score. The attribution-based interpretability methods highlighted significant volumetric and genetic features related to MCI/AD risk. The unification method provided useful insights regarding those features' necessity and sufficiency, further showcasing their significance in MCI/AD diagnosis.
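Balanced accuracy, one of the two metrics reported above, is the mean of the per-class recalls, so a classifier cannot score well simply by favoring the largest class. The sketch below computes it in Python for hypothetical CN/MCI/AD labels (not data from the study) and contrasts it with plain accuracy.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (correct predictions / true members of each class)."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / n for c, n in support.items()) / len(support)


# Hypothetical labels with a large CN class and a small AD class.
y_true = ["CN"] * 8 + ["MCI"] * 4 + ["AD"] * 2
y_pred = ["CN"] * 8 + ["MCI"] * 3 + ["CN"] + ["AD", "MCI"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(plain_accuracy, 3))           # 0.857
print(balanced_accuracy(y_true, y_pred))  # 0.75 (mean of 1.0, 0.75 and 0.5)
```

Here the perfectly classified majority class lifts plain accuracy, while balanced accuracy exposes the weaker recall on the MCI and AD classes.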
Affiliation(s)
- Maria Eleftheria Vlontzou
- Faculty of Electrical and Computer Engineering, National Technical University of Athens, Athens, 15773, Greece.
- Maria Athanasiou
- Faculty of Electrical and Computer Engineering, National Technical University of Athens, Athens, 15773, Greece
- Kalliopi V Dalakleidi
- Faculty of Electrical and Computer Engineering, National Technical University of Athens, Athens, 15773, Greece
- Ioanna Skampardoni
- Faculty of Electrical and Computer Engineering, National Technical University of Athens, Athens, 15773, Greece
- Christos Davatzikos
- Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA
- Konstantina Nikita
- Faculty of Electrical and Computer Engineering, National Technical University of Athens, Athens, 15773, Greece
8
Gu Q, Patel A, Hanna MG, Lennerz JK, Garcia C, Zarella M, McClintock D, Hart SN. Bridging the Clinical-Computational Transparency Gap in Digital Pathology. Arch Pathol Lab Med 2025; 149:276-287. [PMID: 38871349 DOI: 10.5858/arpa.2023-0250-ra] [Accepted: 03/21/2024] [Indexed: 06/15/2024]
Abstract
CONTEXT.— Computational pathology combines clinical pathology with computational analysis, aiming to enhance diagnostic capabilities and improve clinical productivity. However, communication barriers between pathologists and developers often hinder the full realization of this potential. OBJECTIVE.— To propose a standardized framework that improves mutual understanding of clinical objectives and computational methodologies. The goal is to enhance the development and application of computer-aided diagnostic (CAD) tools. DESIGN.— This article suggests pivotal roles for pathologists and computer scientists in the CAD development process. It calls for increased understanding of computational terminologies, processes, and limitations among pathologists. Similarly, it argues that computer scientists should better comprehend the true use cases of the developed algorithms to avoid clinically meaningless metrics. RESULTS.— CAD tools improve pathology practice significantly. Some tools have even received US Food and Drug Administration approval. However, improved understanding of machine learning models among pathologists is essential to prevent misuse and misinterpretation. There is also a need for a more accurate representation of the algorithms' performance compared to that of pathologists. CONCLUSIONS.— A comprehensive understanding of computational and clinical paradigms is crucial for overcoming the translational gap in computational pathology. This mutual comprehension will improve patient care through more accurate and efficient disease diagnosis.
Affiliation(s)
- Qiangqiang Gu
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Gu, Patel, Garcia, Zarella, McClintock, Hart)
- Ankush Patel
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Gu, Patel, Garcia, Zarella, McClintock, Hart)
- Matthew G Hanna
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York (Hanna)
- Jochen K Lennerz
- Center for Integrated Diagnostics, Massachusetts General Hospital/Harvard Medical School, Boston (Lennerz)
- Chris Garcia
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Gu, Patel, Garcia, Zarella, McClintock, Hart)
- Mark Zarella
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Gu, Patel, Garcia, Zarella, McClintock, Hart)
- David McClintock
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Gu, Patel, Garcia, Zarella, McClintock, Hart)
- Steven N Hart
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Gu, Patel, Garcia, Zarella, McClintock, Hart)
9
Huang MW, Tsai CF, Lin WC, Lin JY. Interaction effect between data discretization and data resampling for class-imbalanced medical datasets. Technol Health Care 2025; 33:1000-1013. [PMID: 40105161 DOI: 10.1177/09287329241295874] [Indexed: 03/20/2025]
Abstract
Background Data discretization is an important preprocessing step in data mining that transforms continuous feature values into discrete ones, which allows some specific data mining algorithms to construct more effective models and facilitates the data mining process. Because many medical datasets are class imbalanced, data resampling methods, including oversampling, undersampling, and hybrid sampling, have been widely applied to rebalance the training set, facilitating effective differentiation between majority and minority classes. Objective Herein, we examine how incorporating both data discretization and data resampling into the analysis affects classifier performance on class-imbalanced medical datasets. The order in which these two steps are carried out is compared in the experiments. Methods Two experimental studies were conducted, one based on 11 two-class imbalanced medical datasets and the other on 3 multiclass imbalanced medical datasets. The two discretization algorithms employed are ChiMerge and the minimum description length principle (MDLP). The data resampling algorithms chosen for comparison are Tomek links undersampling, synthetic minority oversampling technique (SMOTE) oversampling, and SMOTE-Tomek hybrid sampling. The support vector machine (SVM), C4.5 decision tree, and random forest (RF) techniques were used to examine the classification performance of the different approaches. Results On average, the combination approaches allow the classifiers to achieve higher area under the ROC curve (AUC) rates than the best baseline approach, by approximately 0.8%-3.5% and 0.9%-2.5% for two-class and multiclass imbalanced medical datasets, respectively.
In particular, the optimal results for two-class imbalanced datasets are obtained by performing MDLP discretization first and SMOTE oversampling second, which gives the highest AUC rate and requires the least computational cost. For multiclass imbalanced datasets, performing SMOTE or SMOTE-Tomek resampling first and ChiMerge discretization second offers the best performance. Conclusions Classifiers with oversampling can outperform the baseline method without oversampling. In contrast, data discretization does not necessarily make the classifiers outperform the baselines. On average, the combination approaches have the potential to allow the classifiers to achieve higher AUC rates than the best baseline approach.
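Tomek links undersampling, one of the resampling methods compared in this study, removes borderline majority samples: a Tomek link is a pair of samples from different classes that are each other's nearest neighbour. A brute-force Python sketch for the two-class case follows; it is an illustration under stated assumptions (Euclidean distance, hypothetical toy data), not the authors' code.

```python
import math


def nearest_neighbour(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j) with i < j whose labels differ and that are mutual nearest neighbours."""
    nn = [nearest_neighbour(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def tomek_undersample(points, labels, majority):
    """Remove the majority-class member of each Tomek link (two-class case)."""
    drop = set()
    for i, j in tomek_links(points, labels):
        if labels[i] == majority:
            drop.add(i)
        elif labels[j] == majority:
            drop.add(j)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical 2-D data; class 0 is the majority class.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (5.0, 5.0)]
labels = [0, 0, 0, 1, 0]
print(tomek_links(points, labels))  # [(2, 3)]
print(tomek_undersample(points, labels, majority=0))
```

SMOTE-Tomek hybrid sampling applies SMOTE oversampling and then removes Tomek links; implementations differ on whether one or both members of each link are dropped.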
Affiliation(s)
- Min-Wei Huang
- Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung
- Department of Physical Therapy and Graduate Institute of Rehabilitation Science, China Medical University, Taichung
- School of Medicine, College of Medicine, National Sun Yat-sen University, Kaohsiung
- Chih-Fong Tsai
- Department of Information Management, National Central University, Taoyuan
- Wei-Chao Lin
- Department of Information Management, Chang Gung University, Taoyuan
- Department of Digital Financial Technology, Chang Gung University, Taoyuan
- Division of Thoracic Surgery, Chang Gung Memorial Hospital at Linkou, Taoyuan
- Jia-Yang Lin
- Department of Information Management, National Central University, Taoyuan
10
Salehi A, Khedmati M. Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification. Sci Rep 2025; 15:3460. [PMID: 39870706 PMCID: PMC11772689 DOI: 10.1038/s41598-024-84786-2] [Received: 05/30/2024] [Accepted: 12/27/2024] [Indexed: 01/29/2025]
Abstract
Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have few samples because they correspond to rare occurrences. To address this challenge, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering and separating classes into majority and minority categories, the algorithm retains as much information as possible during undersampling while generating effective synthetic samples for the minority classes. Classification is carried out using one-vs-one and one-vs-all decomposition schemes. Extensive experiments on 30 datasets were carried out to evaluate the proposed algorithm's performance, and the results were compared with those of several state-of-the-art algorithms. The proposed algorithm outperforms the competing algorithms under different scenarios. Finally, the HCBOU algorithm demonstrated robust performance across varying levels of class imbalance, highlighting its effectiveness in handling imbalanced datasets.
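The one-vs-one and one-vs-all decomposition schemes mentioned above turn a k-class problem into binary ones: one-vs-one trains a classifier per pair of classes, k(k-1)/2 in total, and combines them by voting, while one-vs-all trains k classifiers, each separating one class from the rest. The Python sketch below only enumerates the binary tasks and shows a simple majority vote; the class labels are hypothetical and the HCBOU resampling itself is not reproduced.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes: k * (k - 1) / 2 tasks."""
    return list(combinations(classes, 2))


def one_vs_all_tasks(classes):
    """One binary task per class, that class versus all the others: k tasks."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def ovo_vote(pairwise_winners):
    """Majority vote over the classes predicted by the pairwise classifiers.

    Ties go to the class encountered first (Counter.most_common keeps insertion order).
    """
    return Counter(pairwise_winners).most_common(1)[0][0]


classes = ["A", "B", "C", "D"]  # hypothetical class labels
print(len(one_vs_one_tasks(classes)), len(one_vs_all_tasks(classes)))  # 6 4
# Winners of the six pairwise classifiers (A-B, A-C, A-D, B-C, B-D, C-D) for one sample.
print(ovo_vote(["A", "A", "A", "C", "B", "C"]))  # A
```

Each one-vs-one task sees the data of only two classes, so its imbalance ratio can differ greatly from that of the full dataset, which is one reason resampling interacts with the choice of decomposition.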
Affiliation(s)
- Amirreza Salehi
- Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran
- Majid Khedmati
- Department of Industrial Engineering, Sharif University of Technology, Azadi Ave., Tehran, 1458889694, Iran.
11
Sobral PS, Carvalho T, Izadi S, Castilho A, Silva Z, Videira PA, Pereira F. Advancements in drug discovery: integrating CADD tools and drug repurposing for PD-1/PD-L1 axis inhibition. RSC Adv 2025; 15:2298-2316. [PMID: 39867321 PMCID: PMC11755407 DOI: 10.1039/d4ra08245a] [Received: 11/20/2024] [Accepted: 01/13/2025] [Indexed: 01/28/2025]
Abstract
Despite significant strides in improving cancer survival rates, the global cancer burden remains substantial, with a rise in new cases anticipated. Immune checkpoints, key regulators of immune responses, play a crucial role in cancer evasion mechanisms. The discovery of immune checkpoint inhibitors (ICIs) targeting PD-1/PD-L1 has revolutionized cancer treatment, with monoclonal antibodies (mAbs) becoming widely prescribed. However, challenges with current mAb ICIs, such as limited oral bioavailability, adverse effects, and high costs, underscore the need to explore alternative small-molecule inhibitors. In this work, we aimed to identify new potential ICIs among all FDA-approved drugs. We employed QSAR models to predict PD-1/PD-L1 inhibition, using a diverse dataset of 29 197 molecules sourced from ChEMBL, PubChem, and recent literature. Machine learning techniques, including Random Forest, Support Vector Machine, and Convolutional Neural Network, were benchmarked to assess model performance. Additionally, we undertook a drug repurposing strategy, using the best in silico model for a virtual screening campaign involving 1576 off-patent approved drugs. Only two virtual screening hits were proposed, based on the criteria established for this approach: (1) the QSAR probability of being active against PD-L1; (2) the QSAR applicability domain; and (3) the predicted affinity between PD-L1 and the ligands from molecular docking. One of the proposed hits was sonidegib, an anticancer drug featuring a biphenyl system. Sonidegib was subsequently validated in vitro for PD-1/PD-L1 binding modulation using ELISA and flow cytometry. This integrated approach, which combines computer-aided drug design (CADD) tools, QSAR modelling, drug repurposing, and molecular docking, offers a pioneering strategy to expedite drug discovery for PD-1/PD-L1 axis inhibition.
The findings underscore the potential to identify a wider range of small molecules that can contribute to ongoing efforts to advance cancer immunotherapy.
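The abstract lists the QSAR applicability domain as one screening criterion but does not say how it was defined. One widely used approach, shown here purely as an assumption, treats a query molecule as inside the domain when its highest Tanimoto similarity to any training molecule reaches a threshold. Fingerprints are represented as sets of on-bit indices; the fingerprints and the 0.3 threshold are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_applicability_domain(query, training, threshold=0.3):
    """Return (inside, best), where best is the query's highest similarity to the training set."""
    best = max(tanimoto(query, fp) for fp in training)
    return best >= threshold, best


# Hypothetical fingerprints (sets of on-bit indices) and an assumed threshold of 0.3.
training = [{1, 2, 3, 4}, {2, 3, 5, 8}, {10, 11, 12}]
print(in_applicability_domain({1, 2, 3, 9}, training))  # (True, 0.6)
print(in_applicability_domain({20, 21}, training))      # (False, 0.0)
```

In a screening pipeline like the one described, a check of this kind would sit alongside the QSAR activity probability and the docking score, so that predictions for molecules far from the training chemistry are not trusted.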
Affiliation(s)
- Patrícia S Sobral
- LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Caparica Portugal
- UCIBIO, Departamento Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Caparica Portugal
- Tiago Carvalho
- UCIBIO, Departamento Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Caparica Portugal
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa 2829-516 Caparica Portugal
- CDG & Allies - Professionals and Patient Associations International Network (CDG & Allies - PPAIN), Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa 2829-516 Caparica Portugal
- Shiva Izadi
- University of Natural Resources and Life Sciences, Department of Applied Genetics and Cell Biology Vienna Austria
- Alexandra Castilho
- University of Natural Resources and Life Sciences, Department of Applied Genetics and Cell Biology Vienna Austria
- Zélia Silva
- UCIBIO, Departamento Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Caparica Portugal
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa 2829-516 Caparica Portugal
- CDG & Allies - Professionals and Patient Associations International Network (CDG & Allies - PPAIN), Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa 2829-516 Caparica Portugal
- Paula A Videira
- UCIBIO, Departamento Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Caparica Portugal
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa 2829-516 Caparica Portugal
- CDG & Allies - Professionals and Patient Associations International Network (CDG & Allies - PPAIN), Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa 2829-516 Caparica Portugal
- Florbela Pereira
- LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Caparica Portugal
12
Wang CW, Liu TC, Lai PJ, Muzakky H, Wang YC, Yu MH, Wu CH, Chao TK. Ensemble transformer-based multiple instance learning to predict pathological subtypes and tumor mutational burden from histopathological whole slide images of endometrial and colorectal cancer. Med Image Anal 2025; 99:103372. [PMID: 39461079 DOI: 10.1016/j.media.2024.103372] [Received: 06/06/2024] [Revised: 08/30/2024] [Accepted: 10/09/2024] [Indexed: 10/29/2024]
Abstract
In endometrial cancer (EC) and colorectal cancer (CRC), in addition to microsatellite instability, tumor mutational burden (TMB) has gradually gained attention as a genomic biomarker that can be used clinically to determine which patients may benefit from immune checkpoint inhibitors. High TMB is characterized by a large number of mutated genes, which encode aberrant tumor neoantigens, and implies a better response to immunotherapy. Hence, EC and CRC patients with high TMB may be better candidates for immunotherapy. TMB has mainly been measured by whole-exome sequencing or next-generation sequencing, which is costly and difficult to apply widely across clinical cases. Therefore, an effective, efficient, low-cost, and easily accessible tool is urgently needed to determine the TMB status of EC and CRC patients. In this study, we present a deep learning framework, Ensemble Transformer-based Multiple Instance Learning with a Self-Supervised Learning Vision Transformer feature encoder (ETMIL-SSLViT), to predict pathological subtype and TMB status directly from H&E-stained whole slide images (WSIs) of EC and CRC patients, which can support both pathological classification and cancer treatment planning. Our framework was evaluated on two cancer cohorts: an EC cohort with 918 histopathology WSIs from 529 patients and a CRC cohort with 1495 WSIs from 594 patients from The Cancer Genome Atlas. The experimental results show that the proposed methods achieved excellent performance and outperformed seven state-of-the-art (SOTA) methods in cancer subtype classification and TMB prediction on both cancer datasets. Fisher's exact test further showed that the associations between the models' predictions and the actual cancer subtype or TMB status were both highly significant (p<0.001).
These promising findings show the potential of the proposed methods to guide personalized treatment decisions by accurately predicting EC and CRC subtype and TMB status, supporting effective immunotherapy planning for these patients.
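As a reminder of what the reported significance test computes, here is a minimal two-sided Fisher's exact test for a 2x2 contingency table, for example actual versus predicted TMB status. It is a standard-library sketch; the example tables are hypothetical, not the study's data, and `scipy.stats.fisher_exact` provides the same test in practice. A small p-value indicates that an association is unlikely under independence; on its own it does not quantify how strong the association is.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]].

    With row and column totals held fixed, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # P(top-left cell == x) under independence with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A tiny relative tolerance treats floating-point near-ties as ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Rows: actual high / low; columns: predicted high / low (hypothetical counts)
print(fisher_exact_two_sided([[3, 1], [1, 3]]))    # 34/70, about 0.486
print(fisher_exact_two_sided([[10, 0], [0, 10]]))  # about 1.1e-05
```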
Affiliation(s)
- Ching-Wei Wang
- Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, 10607, Taiwan
- Tzu-Chien Liu
- Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, 10607, Taiwan
- Po-Jen Lai
- Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, 10607, Taiwan
- Hikam Muzakky
- Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, 10607, Taiwan
- Yu-Chi Wang
- Department of Gynecology and Obstetrics, Tri-Service General Hospital, Taipei, 114202, Taiwan; Department of Gynecology and Obstetrics, National Defense Medical Center, Taipei, 11490, Taiwan
- Mu-Hsien Yu
- Department of Gynecology and Obstetrics, Tri-Service General Hospital, Taipei, 114202, Taiwan; Department of Gynecology and Obstetrics, National Defense Medical Center, Taipei, 11490, Taiwan
- Chia-Hua Wu
- Department of Pathology, Tri-Service General Hospital, Taipei, 114202, Taiwan
- Tai-Kuang Chao
- Department of Pathology, Tri-Service General Hospital, Taipei, 114202, Taiwan; Institute of Pathology and Parasitology, National Defense Medical Center, Taipei, 11490, Taiwan
13
Akbari F. Prediction of electron-solid interaction parameters using machine learning. Med Phys 2025; 52:652-661. [PMID: 39395202 PMCID: PMC11699995 DOI: 10.1002/mp.17445] [Received: 09/20/2023] [Revised: 04/14/2024] [Accepted: 09/18/2024] [Indexed: 10/14/2024]
Abstract
BACKGROUND The electron backscattering coefficient and electron stopping power are essential concepts in many disciplines, from radiation to materials science, semiconductor manufacturing, and space exploration. They enable precise calculations, measurements, and simulations of electron interactions with matter, which contribute to advancing science, technology, and safety in a variety of applications. The availability of these data is fundamental to scientific research for validating hypotheses, conducting experiments, and exploring new theories. A relatively novel machine learning approach has demonstrated notable success in improving data quality and completeness, significantly facilitating data discovery. PURPOSE Using fundamental material property data, a stacking ensemble machine learning (EML) technique was developed in this study to generate electron-solid interaction parameters for any target material over a wide range of energies. The final stacking EML was built from the following base and meta learners: bagging regressor (BR), k-nearest neighbors (k-NN), random forest (RF), support vector regression (SVR), and eXtreme Gradient Boosting (XGB). METHODS Two publicly available databases with a total of 4030 data points were used. The training datasets have 785 and 525 data points for the electron backscattering coefficient and stopping power, respectively, whereas the testing datasets contain 262 and 175 data points. Five features were used as input variables to train different individual algorithms and their combinations. The model was evaluated on both the training and test datasets using several error metrics: R-squared (R2), mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). RESULTS Our model evaluation tests revealed that combining RF and XGB with a k-NN meta-learner outperformed the other algorithms.
The analysis of error metrics demonstrated a very close fit to all samples in each training dataset. Furthermore, the model's predictions on unseen test data gave accurate estimates of new backscattering and stopping power data. CONCLUSIONS The developed model achieved high prediction accuracy for various target materials across a broad electron energy spectrum. The outcomes demonstrate the effectiveness of the machine learning methodology and the suitability of the chosen models for addressing substantial physics challenges.
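The four error metrics named in the METHODS section have standard definitions, sketched below with the standard library only. The example numbers are made up; MAPE is reported in percent here, whereas `sklearn.metrics.mean_absolute_percentage_error` returns a fraction.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE and MAPE (percent) for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        # MAPE is undefined when any true value is zero
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


print(regression_metrics([2.0, 4.0], [3.0, 3.0]))
# {'R2': 0.0, 'MAE': 1.0, 'RMSE': 1.0, 'MAPE': 37.5}
```

Because MAPE divides by the true value, it weights errors on small targets (for example low backscattering coefficients) more heavily than MAE or RMSE do, which is one reason to report several metrics side by side.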
Affiliation(s)
- Fatemeh Akbari
- Carleton Laboratory for Radiotherapy Physics, Department of Physics, Carleton University, Ottawa, Ontario, Canada
- Department of Radiation Oncology, University of Toledo, Toledo, Ohio, USA
14
Chang Z, Cai Y, Liu XF, Xie Z, Liu Y, Zhan Q. Anomalous Node Detection in Blockchain Networks Based on Graph Neural Networks. Sensors (Basel, Switzerland) 2024; 25:1. [PMID: 39796797 PMCID: PMC11723008 DOI: 10.3390/s25010001] [Received: 10/22/2024] [Revised: 12/09/2024] [Accepted: 12/19/2024] [Indexed: 01/13/2025]
Abstract
With the rapid development of blockchain technology, fraudulent activities have increased significantly, posing a major threat to the personal assets of blockchain users. The transaction network formed by user transactions consists of nodes and edges and can therefore be represented naturally as a graph. Fraudulent nodes in the transaction network are referred to as anomalous nodes. In recent years, the mainstream approach to detecting anomalous nodes in graphs has been graph data mining. However, anomalous nodes typically make up only a small portion of the transaction network (the minority class), while most nodes are normal (the majority class). This disparity in sample sizes produces class-imbalanced data, on which models tend to overfit the features of the majority class and neglect those of the minority class. This issue presents significant challenges for traditional graph data mining techniques. In this paper, we propose a novel graph neural network method that addresses class imbalance by improving the Graph Attention Network (GAT) and incorporating ensemble learning concepts. Our method, called SGAT-BC, combines GAT with a subtree attention mechanism and two ensemble learning methods: Bootstrap Aggregating (Bagging) and Categorical Boosting (CAT). We conducted experiments on four real-world blockchain transaction datasets, and the results demonstrate that SGAT-BC outperforms existing baseline models.
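The abstract does not detail how SGAT-BC couples bagging with the attention network, so the following is only a generic sketch of bagging under class imbalance: each bootstrap bag keeps every minority-class (anomalous) node and draws an equal number of majority-class nodes with replacement, and per-bag predictions are combined by majority vote. The function names, the 1:1 ratio and the tie-breaking rule are assumptions for illustration.

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags, seed=0):
    """Return index lists for class-balanced bootstrap bags (binary labels).

    Each bag contains every minority-class index plus the same number of
    majority-class indices sampled with replacement, so a base learner
    trained on the bag sees a 1:1 class ratio.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    rng = random.Random(seed)
    return [minority_idx + rng.choices(majority_idx, k=len(minority_idx))
            for _ in range(n_bags)]


def majority_vote(bag_predictions):
    """Combine per-bag 0/1 predictions for one node; ties are flagged as anomalous (1)."""
    return int(2 * sum(bag_predictions) >= len(bag_predictions))


labels = [0] * 95 + [1] * 5  # 1 = anomalous node (minority class)
bags = balanced_bags(labels, n_bags=3)
print([sum(labels[i] for i in bag) / len(bag) for bag in bags])  # [0.5, 0.5, 0.5]
print(majority_vote([1, 0, 1]))  # 1
```

Undersampling the majority class per bag keeps each base learner from simply predicting "normal", while the ensemble as a whole still sees many different majority-class nodes across bags.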
Affiliation(s)
- Ze Chang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
- Yunfei Cai
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
- Xiao Fan Liu
- Department of Media and Communication, City University of Hong Kong, Hong Kong SAR, China
- Zhenping Xie
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
- Yuan Liu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
- Qianyi Zhan
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
15
Ahmed U, Jiangbin Z, Almogren A, Sadiq M, Rehman AU, Sadiq MT, Choi J. Hybrid bagging and boosting with SHAP based feature selection for enhanced predictive modeling in intrusion detection systems. Sci Rep 2024; 14:30532. [PMID: 39690165 DOI: 10.1038/s41598-024-81151-1] [Received: 07/03/2024] [Accepted: 11/25/2024] [Indexed: 12/19/2024]
Abstract
The novelty and growing sophistication of cyber threats mean that highly accurate and interpretable machine learning models are needed more than ever for Intrusion Detection and Prevention Systems. This study addresses this challenge by applying Explainable AI techniques, including SHapley Additive exPlanations (SHAP)-based feature selection, to improve model performance, robustness, and transparency. The method systematically employs different classifiers and proposes a new hybrid method called Hybrid Bagging-Boosting and Boosting on Residuals. Performance is then assessed in four steps: a multistep evaluation of hybrid ensemble learning methods for binary classification, with performance fine-tuning; feature selection using SHAP values, followed by retraining the hybrid model to improve performance and reduce overfitting; generalization of the proposed model to multiclass classification; and evaluation using standard metrics such as accuracy, precision, recall, and F1-score. Key results indicate that the proposed methods outperform state-of-the-art algorithms, achieving a peak accuracy of 98.47% and an F1 score of 96.19%. These improvements stem from advanced feature selection and resampling techniques, which enhance model accuracy and balance precision and recall. Integrating SHAP-based feature selection with hybrid ensemble methods significantly boosts the predictive and explanatory power of Intrusion Detection and Prevention Systems, addressing common pitfalls in traditional cybersecurity models. This study paves the way for further research on statistical innovations to enhance the performance of Intrusion Detection and Prevention Systems.
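The SHAP-based feature selection step can be sketched as ranking features by their mean absolute SHAP value and keeping the top k. This sketch assumes a per-sample SHAP matrix has already been computed (for example with the `shap` package's `TreeExplainer`); the feature names and values are hypothetical, and the study's exact selection rule and k are not stated in the abstract.

```python
def select_top_k_by_shap(shap_values, feature_names, k):
    """Return the k feature names with the largest mean |SHAP| value.

    shap_values: one row per sample, one SHAP value per feature (binary-class case).
    """
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Sort by descending importance; ties keep the original feature order.
    ranked = sorted(feature_names, key=lambda name: -importance[name])
    return ranked[:k]


features = ["src_bytes", "duration", "flag"]  # hypothetical feature names
shap_matrix = [[0.50, -0.10, 0.00],
               [-0.70, 0.20, 0.05]]
print(select_top_k_by_shap(shap_matrix, features, k=2))  # ['src_bytes', 'duration']
```

Taking the absolute value matters: a feature that pushes predictions strongly in both directions would otherwise average out to near zero and be wrongly discarded.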
Affiliation(s)
- Usman Ahmed
- School of Software, Northwestern Polytechnical University, Xi'an, 710072, China
- Zheng Jiangbin
- School of Software, Northwestern Polytechnical University, Xi'an, 710072, China
- Ahmad Almogren
- Chair of Cyber Security, Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, 11633, Saudi Arabia
- Muhammad Sadiq
- School of Computer Science and Electronic Engineering, University of Essex, Colchester Campus, United Kingdom
- Ateeq Ur Rehman
- Applied Science Research Center, Applied Science Private University, Amman, Jordan
- Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, India
- University Center for Research and Development, Chandigarh University, Mohali, India
- M T Sadiq
- Applied Science Research Center, Applied Science Private University, Amman, Jordan
- Jaeyoung Choi
- School of Computing, Gachon University, Seongnam-si, 13120, Republic of Korea
16
Balch JA, Ruppert MM, Guan Z, Buchanan TR, Abbott KL, Shickel B, Bihorac A, Liang M, Upchurch GR, Tignanelli CJ, Loftus TJ. Risk-Specific Training Cohorts to Address Class Imbalance in Surgical Risk Prediction. JAMA Surg 2024; 159:1424-1431. [PMID: 39382865 PMCID: PMC11465118 DOI: 10.1001/jamasurg.2024.4299] [Received: 05/01/2024] [Accepted: 08/05/2024] [Indexed: 10/10/2024]
Abstract
Importance Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications. Objective To evaluate risk-prediction model performance when trained on risk-specific cohorts. Design, Setting, and Participants This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109 445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined. Exposures The model was trained de novo on separate cohorts for high-risk, medium-risk, and low-risk Current Procedural Terminology codes, defined empirically by the incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for each complication were defined by the lower third and upper third of prevalence in the dataset, except for mortality, for which the cutoffs were set at 1% or less and greater than 3%, respectively. Main Outcomes and Measures Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 score, and accuracy for each model. Results A total of 109 445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77 921 procedures [71.2%]) and Jacksonville (31 524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4).
Among the 109 445 operations, 55 646 patients were male (50.8%), and 66 495 patients (60.8%) underwent a nonemergent inpatient operation. Training on the high-risk cohort had a variable impact on AUROC but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved the F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40). Conclusion and Relevance In this cross-sectional study, training separate models using a priori knowledge of procedure-specific risk classes improved performance on standard evaluation metrics, especially for low-prevalence complications such as in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.
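One possible reading of the cohort definition in Exposures is sketched below: compute each procedure code's incidence of a complication, cut the code-level incidences at their tertiles, and apply the fixed 1% and 3% cutoffs for mortality. The tertile reading, the use of `statistics.quantiles` (default "exclusive" method) and the code names are assumptions for illustration, not the authors' implementation.

```python
import statistics


def assign_risk_tiers(incidence_by_code, complication):
    """Map each procedure code to 'low', 'medium' or 'high' risk for one complication.

    incidence_by_code: {code: incidence of the complication among that code's operations (0-1)}.
    Mortality uses the fixed cutoffs stated in the abstract (<= 1% low, > 3% high); for other
    complications the cutoffs are assumed to be the tertiles of the code-level incidences.
    """
    if complication == "mortality":
        low_cut, high_cut = 0.01, 0.03
    else:
        low_cut, high_cut = statistics.quantiles(incidence_by_code.values(), n=3)
    tiers = {}
    for code, rate in incidence_by_code.items():
        if rate <= low_cut:
            tiers[code] = "low"
        elif rate > high_cut:
            tiers[code] = "high"
        else:
            tiers[code] = "medium"
    return tiers


aki = {f"cpt_{i}": i / 100 for i in range(1, 10)}  # hypothetical codes and incidences
print(assign_risk_tiers(aki, "aki"))  # cpt_1-3 low, cpt_4-6 medium, cpt_7-9 high
print(assign_risk_tiers({"cpt_x": 0.005, "cpt_y": 0.02, "cpt_z": 0.05}, "mortality"))
# {'cpt_x': 'low', 'cpt_y': 'medium', 'cpt_z': 'high'}
```

A separate model would then be trained on the operations falling into each tier, so that the high-risk model sees a much higher event rate than the pooled data.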
Affiliation(s)
- Jeremy A. Balch
- Department of Surgery, University of Florida, Gainesville
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville
- Intelligent Clinical Care Center, College of Medicine, University of Florida, Gainesville
- Ziyuan Guan
- Intelligent Clinical Care Center, College of Medicine, University of Florida, Gainesville
- Benjamin Shickel
- Intelligent Clinical Care Center, College of Medicine, University of Florida, Gainesville
- Azra Bihorac
- Intelligent Clinical Care Center, College of Medicine, University of Florida, Gainesville
- Muxuan Liang
- College of Medicine, University of Florida, Gainesville
- Tyler J. Loftus
- Department of Surgery, University of Florida, Gainesville
- Intelligent Clinical Care Center, College of Medicine, University of Florida, Gainesville
17
Yassin MM, Lu J, Zaman A, Yang H, Cao A, Zeng X, Hassan H, Han T, Miao X, Shi Y, Guo Y, Luo Y, Kang Y. Advancing ischemic stroke diagnosis and clinical outcome prediction using improved ensemble techniques in DSC-PWI radiomics. Sci Rep 2024; 14:27580. [PMID: 39528656 PMCID: PMC11555321 DOI: 10.1038/s41598-024-78353-y] [Received: 05/16/2024] [Accepted: 10/30/2024] [Indexed: 11/16/2024]
Abstract
Ischemic stroke is a leading global cause of death and disability, and its burden is expected to rise. Current diagnostic techniques, such as CT and MRI, have limitations in distinguishing acute from chronic ischemia and in detecting early ischemia. This study investigates ensemble models based on dynamic radiomics features (DRF) derived from dynamic susceptibility contrast perfusion-weighted imaging (DSC-PWI) for ischemic stroke diagnosis, neurological impairment assessment, and modified Rankin Scale (mRS) outcome prediction. DRF are extracted from the 3D images, features are selected, and dimensionality is reduced; ensemble models are then applied. Two model structures were developed: a voting classifier with 6 bagging classifiers and a stacking classifier based on 4 bagging classifiers. The ensemble models were evaluated on three core tasks. The Stacking_ens_LR model performed best for ischemic stroke detection, the LR Bagging model for NIH Stroke Scale (NIHSS) prediction, and the NB Bagging model for outcome prediction. These outcomes illustrate the strength of ensemble models, and the work showcases the role of ensemble models and DRF in the stroke management process.
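The abstract does not say whether the 6-classifier voting ensemble uses hard or soft voting, so both rules are sketched below in plain Python (scikit-learn's `VotingClassifier` offers both via `voting="hard"` or `voting="soft"`). In a stacking ensemble, by contrast, the per-model probabilities would become input features for a meta-learner; the name Stacking_ens_LR suggests a logistic-regression meta-learner, though the abstract does not confirm it. The probabilities below are hypothetical.

```python
def soft_vote(prob_lists, weights=None):
    """Average class-probability vectors from several models; return (argmax class, averages)."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg


def hard_vote(labels):
    """Majority vote over predicted labels; ties go to the smallest label."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return min(counts, key=lambda y: (-counts[y], y))


# Three hypothetical bagging models scoring one patient: [P(no stroke), P(stroke)]
label, avg = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]])
print(label)                          # 1
print(hard_vote([0, 1, 1, 0, 1, 1]))  # 1
```

Soft voting lets a confident model outweigh two lukewarm ones, which hard voting cannot do; it requires every base model to output reasonably calibrated probabilities.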
Affiliation(s)
- Mazen M Yassin
- School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518055, China
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- Biomedical Engineering Department, Faculty of Engineering, Minia University, Menia, 61111, Egypt
- Jiaxi Lu
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- School of Applied Technology, Shenzhen University, Shenzhen, 518055, China
- Asim Zaman
- School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518055, China
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- School of Applied Technology, Shenzhen University, Shenzhen, 518055, China
- Huihui Yang
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- School of Applied Technology, Shenzhen University, Shenzhen, 518055, China
- Anbo Cao
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- School of Applied Technology, Shenzhen University, Shenzhen, 518055, China
- Xueqiang Zeng
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- School of Applied Technology, Shenzhen University, Shenzhen, 518055, China
- Haseeb Hassan
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- Taiyu Han
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- School of Applied Technology, Shenzhen University, Shenzhen, 518055, China
- Xiaoqiang Miao
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China
- Yongkang Shi
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- School of Applied Technology, Shenzhen University, Shenzhen, 518055, China
- Yingwei Guo
- School of Electrical and Information Engineering, Northeast Petroleum University, Daqing, 163318, China
- Yu Luo
- Department of Radiology, Shanghai Fourth People's Hospital Affiliated to Tongji University School of Medicine, Shanghai, 200434, China
- Yan Kang
- School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518055, China
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, 518118, China
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China
- Faculty of Data Science, City University of Macau, Macau, China
18
Wang X, Yu L, Wang H, Tsui KL, Zhao Y. Sensor-Based Multifaceted Feature Extraction and Ensemble Elastic Net Approach for Assessing Fall Risk in Community-Dwelling Older Adults. IEEE J Biomed Health Inform 2024; 28:6661-6673. [PMID: 39172618 DOI: 10.1109/jbhi.2024.3447705] [Indexed: 08/24/2024]
Abstract
Accurate identification of community-dwelling older adults at high fall risk can facilitate timely intervention and significantly reduce fall incidents. Analyzing gait and balance capabilities through feature extraction and modeling of sensor-based motion data has emerged as a viable approach to fall risk assessment. However, existing approaches for extracting key features related to fall risk are not comprehensive, giving limited consideration to the non-linear characteristics of sensor signals, such as signal complexity, self-similarity, and local stability. In this study, we developed a multifaceted feature extraction scheme employing diverse feature types, including demographic, descriptive statistical, non-linear, spatiotemporal, and spectral features, derived from three-axis accelerometer and gyroscope data. This study is the first attempt to investigate non-linear features related to fall risk in multi-task scenarios from a dynamic system perspective. Based on the extracted multifaceted features, we propose an ensemble elastic net (E-E-N) approach that handles imbalanced data and offers high model interpretability. The E-E-N uses bootstrap sampling to construct base classifiers and a weighting mechanism to aggregate them. We conducted a set of validation experiments using real-world data for a comprehensive comparative analysis. The results demonstrate that the E-E-N approach achieves superior predictive performance in fall risk classification. Our proposed approach offers a cost-effective tool for accurately assessing fall risk and alleviating the long-term burden of continuous health monitoring.
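Signal complexity is one of the non-linear characteristics highlighted above. Sample entropy is a widely used complexity measure; the abstract does not say which complexity features the study extracts, so this sketch is an illustrative assumption. It counts pairs of length-m templates within tolerance r (Chebyshev distance, self-matches excluded) and how many of those pairs still match at length m+1, giving SampEn = -ln(A/B). Here r is in the signal's own units; a common convention is r = 0.2 times the signal's standard deviation.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Sample entropy SampEn(m, r) = -ln(A / B) of a 1-D signal.

    B counts template pairs of length m whose Chebyshev distance is <= r;
    A counts the pairs that still match at length m + 1. Self-matches are
    excluded, and the same n - m start positions are used for both lengths.
    Returns inf when no matches are found (SampEn is undefined there).
    """
    n = len(x)

    def matching_pairs(length):
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    if a == 0 or b == 0:
        return math.inf
    return math.log(b / a)  # equal to -ln(A / B)


print(sample_entropy([1.0] * 20))  # 0.0 (perfectly regular signal)
print(sample_entropy([0, 1] * 10))  # 0.0 (strictly periodic signal)
```

Lower values mean the signal repeats itself more predictably; irregular gait signals tend to give higher values. The double loop is O(n^2), which is acceptable for short windows of accelerometer data.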
19
Adegbenjo AO, Ngadi MO. Handling the Imbalanced Problem in Agri-Food Data Analysis. Foods 2024; 13:3300. [PMID: 39456362 PMCID: PMC11507408 DOI: 10.3390/foods13203300] [Received: 06/07/2024] [Revised: 09/07/2024] [Accepted: 10/15/2024] [Indexed: 10/28/2024]
Abstract
Imbalanced data exist in most fields of endeavor. The problem has been identified as a major bottleneck in machine learning and data mining and is becoming a serious concern in food processing applications. Inappropriate analysis of agricultural and food processing data has been identified as limiting the robustness of predictive models built for agri-food applications. Because rare cases occur infrequently, classification rules that detect small groups are scarce, so samples belonging to small classes are largely misclassified. Most existing machine learning algorithms, including K-means, decision trees, and support vector machines (SVMs), are not optimal for handling imbalanced data. Consequently, models developed from such data are very prone to rejection and non-adoption in real industrial and commercial settings. This paper shows the reality of the imbalanced data problem in agri-food applications and proposes state-of-the-art artificial intelligence approaches for handling it, including data resampling, one-class learning, ensemble methods, feature selection, and deep learning techniques. The paper further evaluates existing and newer metrics that are well suited to imbalanced data. Properly analyzing imbalanced data from food processing research will improve the accuracy of results and model development, which in turn will enhance the acceptability and adoptability of innovations and inventions.
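A short synthetic example shows why plain accuracy misleads on imbalanced data and what imbalance-aware metrics report instead. Balanced accuracy, the geometric mean (G-mean) of sensitivity and specificity, and the Matthews correlation coefficient (MCC) are common choices; the abstract does not list which metrics the paper evaluates, so these are illustrative.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy alongside metrics that stay informative under class imbalance."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,  # 0 by convention if undefined
    }


y_true = [0] * 95 + [1] * 5  # 5% minority class, e.g. defective samples
y_pred = [0] * 100           # a "classifier" that never predicts the minority class
print(imbalance_metrics(y_true, y_pred))
# {'accuracy': 0.95, 'balanced_accuracy': 0.5, 'g_mean': 0.0, 'mcc': 0.0}
```

The trivial classifier scores 95% accuracy while detecting none of the rare cases; balanced accuracy, G-mean and MCC all expose this.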
Affiliation(s)
- Adeyemi O. Adegbenjo
- Department of Bioresource Engineering, McGill University, 21111 Lakeshore Road, Ste-Anne-de-Bellevue, Montreal, QC H9X 3V9, Canada
- Process Quality Engineering, School of Engineering and Technology, Conestoga College Institute of Technology and Advanced Learning, 299 Doon Valley Drive, Kitchener, ON N2G 4M4, Canada
- Michael O. Ngadi
- Department of Bioresource Engineering, McGill University, 21111 Lakeshore Road, Ste-Anne-de-Bellevue, Montreal, QC H9X 3V9, Canada
20
Aitcheson-Huehn N, MacPherson R, Panchuk D, Kiefer AW. Predicting Basketball Shot Outcome From Visuomotor Control Data Using Explainable Machine Learning. Journal of Sport & Exercise Psychology 2024; 46:293-300. [PMID: 39244200 DOI: 10.1123/jsep.2024-0063] [Received: 03/06/2024] [Revised: 08/10/2024] [Accepted: 08/15/2024] [Indexed: 09/09/2024]
Abstract
Quiet eye (QE), the visual fixation on a target before initiation of a critical action, is associated with improved performance. While QE is trainable, it is unclear whether QE can directly predict performance, which has implications for training interventions. This study predicted basketball shot outcome (make or miss) from visuomotor control variables using a decision tree classification approach. Twelve basketball athletes completed 200 shots from six on-court locations while wearing mobile eye-tracking glasses. Training and testing data sets were used for modeling eight predictors (shot location, arm extension time, and absolute and relative QE onset, offset, and duration) via standard and conditional inference decision trees and random forests. On average, the trees predicted over 66% of makes and over 50% of misses. The main predictor, relative QE duration, indicated success for durations over 18.4% (range: 14.5%-22.0%). Training to prolong QE duration beyond 18% may enhance shot success.
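The cut point reported for relative QE duration is the kind of threshold a decision tree finds by scanning a single predictor. The sketch below shows a CART-style split search with Gini impurity on made-up shot data; it is only an illustration, since the study also used conditional inference trees, which choose splits with permutation tests rather than Gini impurity.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (1 = make, 0 = miss)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, outcomes):
    """Find the split on one predictor that minimizes weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (threshold, weighted impurity); threshold is None if all values are equal.
    """
    pairs = sorted(zip(values, outcomes))
    n = len(pairs)
    best_score, best_thr = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score, best_thr = score, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_thr, best_score


# Hypothetical relative QE durations (% of shot duration) and outcomes
rel_qe = [10, 12, 15, 17, 19, 21, 25, 30]
made = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_threshold(rel_qe, made))  # (18.0, 0.0)
```

A full tree applies this search recursively to each resulting subset and across all eight predictors, keeping whichever predictor and threshold reduce impurity most at each node.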
Affiliation(s)
- Nikki Aitcheson-Huehn
- Human Movement Science Curriculum, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ryan MacPherson
- Department of Exercise and Sport Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Adam W Kiefer
- Human Movement Science Curriculum, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Exercise and Sport Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
21
Li J, Li Q, Luo W, Zeng L, Luo L. Rapid Color Quality Evaluation of Needle-Shaped Green Tea Using Computer Vision System and Machine Learning Models. Foods 2024; 13:2516. [PMID: 39200443 PMCID: PMC11353727 DOI: 10.3390/foods13162516] [Received: 07/09/2024] [Revised: 08/08/2024] [Accepted: 08/09/2024] [Indexed: 09/02/2024]
Abstract
Color characteristics are a crucial indicator of green tea quality, particularly for needle-shaped green tea, and are predominantly evaluated through subjective sensory analysis. An objective, precise, and efficient assessment methodology is therefore needed. In this study, 885 images of 157 samples, acquired with computer vision technology, were used to predict sensory evaluation results from the color features of the images. Three machine learning methods, Random Forest (RF), Support Vector Machine (SVM), and Decision Tree-based AdaBoost (DT-AdaBoost), were used to construct the color quality evaluation model. Notably, the DT-AdaBoost model shows significant potential for evaluating tea quality, with a correct discrimination rate (CDR) of 98.50% and a relative percent deviation (RPD) of 14.827 on the 266 samples used to verify the model's accuracy. This result indicates that integrating computer vision with machine learning models is an effective approach for assessing the color quality of needle-shaped green tea.
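DT-AdaBoost boosts decision trees by reweighting the training samples after each round so that later trees focus on the samples earlier trees got wrong. A minimal discrete AdaBoost with depth-1 trees (stumps) and labels in {-1, +1} is sketched below; the tree depth, number of rounds and the single "color feature" in the toy data are assumptions, not the study's settings (scikit-learn's `AdaBoostClassifier` is the usual production choice).

```python
import math


def fit_stump(X, y, w):
    """Return the weighted-error-minimizing stump as (feature, threshold, polarity, error)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] >= thr else -polarity) != yi)
                if best is None or err < best[3]:
                    best = (j, thr, polarity, err)
    return best


def stump_predict(stump, row):
    j, thr, polarity, _ = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[3], 1e-10), 1 - 1e-10)  # clamp so alpha stays finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if stump[3] == 0:
            break  # a perfect stump: further rounds add nothing
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]  # one normalized color feature (hypothetical)
y = [-1, -1, -1, 1, 1, 1]                       # -1 = lower grade, +1 = higher grade (hypothetical)
model = adaboost(X, y, n_rounds=5)
print([predict(model, row) for row in ([0.15], [0.85])])  # [-1, 1]
```

Each stump's vote is weighted by alpha, so rounds with lower weighted error contribute more to the final decision.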
Affiliation(s)
- Jinsong Li
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, College of Food Science, Southwest University, Chongqing 400715, China
- Chongqing Key Laboratory of Speciality Food Co-Built by Sichuan and Chongqing, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
- College of Food Science, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
- Qijun Li
- College of Computer and Information Science, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
- Wei Luo
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, College of Food Science, Southwest University, Chongqing 400715, China
- Chongqing Key Laboratory of Speciality Food Co-Built by Sichuan and Chongqing, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
- College of Food Science, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
- Liang Zeng
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, College of Food Science, Southwest University, Chongqing 400715, China
- Chongqing Key Laboratory of Speciality Food Co-Built by Sichuan and Chongqing, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
- College of Food Science, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
- Liyong Luo
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, College of Food Science, Southwest University, Chongqing 400715, China
- Chongqing Key Laboratory of Speciality Food Co-Built by Sichuan and Chongqing, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
- College of Food Science, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400715, China
22
Shen L, An J, Wang N, Wu J, Yao J, Gao Y. Artificial intelligence and machine learning applications in urinary tract infections identification and prediction: a systematic review and meta-analysis. World J Urol 2024; 42:464. [PMID: 39088072 DOI: 10.1007/s00345-024-05145-4] [Received: 02/16/2024] [Accepted: 06/23/2024] [Indexed: 08/02/2024]
Abstract
BACKGROUND Urinary tract infections (UTIs) are among the most common bacterial infections in clinical practice worldwide. Artificial intelligence (AI) and machine learning (ML) based algorithms have been increasingly applied to UTI case identification and prediction. However, the overall performance of AI/ML algorithms in identifying and predicting UTI has not been evaluated. The purpose of this paper is to quantitatively evaluate the application value of AI/ML in identifying and predicting UTI cases. METHODS The MEDLINE, EMBASE, Web of Science, and PubMed databases were systematically searched for articles published up to December 31, 2023. The Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) and the Prediction Model Risk of Bias Assessment Tool (PROBAST) were used to assess the risk of bias. Study characteristics and detailed algorithm information were extracted. Pooled sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were synthesized using a bivariate mixed-effects model. Meta-regression and subgroup analysis were conducted to explore sources of heterogeneity. RESULTS In total, 11 studies with 14 AI/ML models were included in the final meta-analysis. The overall pooled AUC was 0.89 (95%CI 0.86-0.92). Additionally, the pooled sensitivity (Sen), specificity (Spe), positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) were 0.78 (95%CI 0.71-0.84), 0.89 (95%CI 0.83-0.93), 6.99 (95%CI 4.38-11.14), 0.25 (95%CI 0.18-0.34), and 28.07 (95%CI 14.27-55.20), respectively. The meta-regression results suggested that differences in reference standard definitions might be the source of heterogeneity. CONCLUSION AI/ML algorithms appear promising for helping clinicians detect and identify patients at high risk of UTIs. However, further studies are needed to evaluate the application value of AI/ML more thoroughly.
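The likelihood ratios and DOR above are tied to sensitivity and specificity by standard identities: PLR = Sen/(1 - Spe), NLR = (1 - Sen)/Spe, and DOR = PLR/NLR. The sketch below is a worked check of those identities, not the authors' bivariate model; plugging in the pooled point estimates gives roughly 7.09, 0.25, and 28.7, close to (but not identical with) the reported 6.99, 0.25, and 28.07, because the review pooled each quantity separately.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (PLR, NLR, DOR) implied by a single sensitivity/specificity pair."""
    plr = sensitivity / (1.0 - specificity)  # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity  # negative likelihood ratio
    dor = plr / nlr                          # diagnostic odds ratio
    return plr, nlr, dor


# Pooled point estimates reported in the abstract (Sen = 0.78, Spe = 0.89).
plr, nlr, dor = likelihood_ratios(0.78, 0.89)
print(f"PLR={plr:.2f} NLR={nlr:.2f} DOR={dor:.1f}")
```

Equivalently, DOR = (Sen x Spe) / ((1 - Sen)(1 - Spe)), so a DOR near 28 means the odds of a positive prediction are about 28 times higher in true UTI cases than in non-cases.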
Affiliation(s)
- Li Shen
- Department of Infection Control, Xi'an Hospital of Traditional Chinese Medicine, No.69 Feng Cheng 8th Road, Weiyang District, Xi'an, 710021, China
- Jialu An
- Department of Information Consultation, Library of Xi'an Jiaotong University, No.76 Yan Ta West Road, Yanta District, Xi'an, 710061, China
- Nanding Wang
- Department of Cardiology, Xi'an Hospital of Traditional Chinese Medicine, No.69 Feng Cheng 8th Road, Weiyang District, Xi'an, 710021, China
- Jin Wu
- Department of Clinical Laboratory, Xi'an Hospital of Traditional Chinese Medicine, No.69 Feng Cheng 8th Road, Weiyang District, Xi'an, 710021, China
- Jia Yao
- Experimental Center, Xi'an Hospital of Traditional Chinese Medicine, No.69 Feng Cheng 8th Road, Weiyang District, Xi'an, 710021, China
- Xi'an Academy of Traditional Chinese Medicine, No.69 Feng Cheng 8th Road, Weiyang District, Xi'an, 710021, China
- Yumei Gao
- Department of Infection Control, Xi'an Hospital of Traditional Chinese Medicine, No.69 Feng Cheng 8th Road, Weiyang District, Xi'an, 710021, China
23
Parrott JM, Parrott AJ, Parrott JS, Williams NN, Dumon KR. Predicting Recurrent Deficiency and Suboptimal Monitoring of Thiamin Deficiency in Patients with Metabolic and Bariatric Surgery. Nutrients 2024; 16:2226. [PMID: 39064668 PMCID: PMC11280029 DOI: 10.3390/nu16142226] [Received: 06/26/2024] [Revised: 07/06/2024] [Accepted: 07/08/2024] [Indexed: 07/28/2024]
Abstract
INTRODUCTION Vitamin B1 (thiamine) deficiency (TD) after metabolic and bariatric surgery (MBS) is often insidious and, if unrecognized, can lead to irreversible damage or death. As TD symptoms are vague and overlap with other disorders, we aim to identify predictors of recurrent TD and failure to collect B1 labs. METHODS We analyzed a large sample of data from patients with MBS (n = 878) to identify potential predictors of TD risk. We modeled recurrent TD and failure to collect B1 labs using classical statistical and machine learning (ML) techniques. RESULTS We identified clusters of labs associated with increased risk of recurrent TD: micronutrient deficiencies, abnormal blood indices, malnutrition, and fluctuating electrolyte levels (aIRR range: 1.62-4.68). Additionally, demographic variables associated with lower socioeconomic status were predictive of recurrent TD. ML models predicting characteristics associated with failure to collect B1 labs achieved 75-81% accuracy, indicating that clinicians may fail to match symptoms with the underlying condition. CONCLUSIONS Our analysis suggests that both clinical and social factors can increase the risk of life-threatening TD episodes in some MBS patients. Identifying these indicators can help with diagnosis and treatment.
Affiliation(s)
- Julie M. Parrott
- Faculty of Health Sciences and Wellbeing, University of Sunderland, Sunderland SR1 3SD, UK
- Bariatric Surgery Program, Temple University Hospital, Philadelphia, PA 19140, USA
- Austen J. Parrott
- Behavioral Health, The Child Center of New York, New York, NY 11355, USA
- J. Scott Parrott
- School of Health Professions, Rutgers University, Newark, NJ 07102, USA
- Noel N. Williams
- Division of Gastrointestinal Surgery and Metabolic and Bariatric Surgery, Hospital of the University of Pennsylvania, Philadelphia, PA 19104, USA
- Kristoffel R. Dumon
- Division of Gastrointestinal Surgery and Metabolic and Bariatric Surgery, Hospital of the University of Pennsylvania, Philadelphia, PA 19104, USA
24
Sikri A, Jameel R, Idrees SM, Kaur H. Enhancing customer retention in telecom industry with machine learning driven churn prediction. Sci Rep 2024; 14:13097. [PMID: 38849493 PMCID: PMC11161656 DOI: 10.1038/s41598-024-63750-0] [Received: 08/19/2023] [Accepted: 05/31/2024] [Indexed: 06/09/2024]
Abstract
Customer churn remains a critical concern for businesses, highlighting the importance of retaining existing customers rather than acquiring new ones. Effective prediction of potential churners aids in devising robust retention policies and efficient customer management strategies. This study examines machine learning algorithms for predictive churn analysis, addressing the inherent challenge posed by diverse and imbalanced customer churn data distributions. The paper introduces a novel approach, the Ratio-based data balancing technique, which addresses data skewness as a pre-processing step to improve the accuracy of predictive modelling. This study fills gaps in the existing literature by highlighting the effectiveness of ensemble algorithms and the critical role of data balancing techniques in optimizing churn prediction models. While our research contributes a novel approach, avenues for further exploration remain. This work evaluates several machine learning algorithms (Perceptron, Multi-Layer Perceptron, Naive Bayes, Logistic Regression, K-Nearest Neighbour, and Decision Tree) alongside the ensemble techniques Gradient Boosting and Extreme Gradient Boosting (XGBoost), on datasets balanced with the proposed Ratio-based data balancing technique and with commonly used data resampling. Results show that the proposed Ratio-based technique notably outperforms traditional over-sampling and under-sampling methods in churn prediction accuracy. Additionally, ensemble algorithms such as Gradient Boosting and XGBoost performed better than single classifiers. Across Accuracy, Precision, Recall, and F-Score, these ensemble methods were the most effective for predicting customer churn. Specifically, a 75:25 ratio combined with the XGBoost method gave the most promising results, which are presented in this work.
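The four evaluation metrics named above all derive from a binary confusion matrix. A minimal sketch, treating churners as the positive class and using made-up counts (not data from the study):

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # churners correctly flagged
    fp: int  # loyal customers wrongly flagged as churners
    fn: int  # churners the model missed
    tn: int  # loyal customers correctly left unflagged

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)  # harmonic mean of precision and recall


cm = ConfusionMatrix(tp=8, fp=2, fn=4, tn=86)  # hypothetical counts
print(round(cm.accuracy(), 3), round(cm.precision(), 3),
      round(cm.recall(), 3), round(cm.f1(), 3))
# prints 0.94 0.8 0.667 0.727
```

With 88% non-churners in this toy matrix, accuracy (0.94) looks far better than recall (0.667), which illustrates why skewed churn data calls for balancing and for metrics beyond accuracy.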
Affiliation(s)
- Alisha Sikri
- Noida Institute of Engineering and Technology, Greater Noida, 201306, Uttar Pradesh, India
- Roshan Jameel
- Westford University College, Sharjah, United Arab Emirates
- Sheikh Mohammad Idrees
- Department of Computer Science (IDI), Norwegian University of Science and Technology, Trondheim, Norway
- Harleen Kaur
- Department of Computer Science, Jamia Hamdard, New Delhi, India
25
Karabacak M, Bhimani AD, Schupper AJ, Carr MT, Steinberger J, Margetis K. Machine learning models on a web application to predict short-term postoperative outcomes following anterior cervical discectomy and fusion. BMC Musculoskelet Disord 2024; 25:401. [PMID: 38773464 PMCID: PMC11110429 DOI: 10.1186/s12891-024-07528-5] [Received: 01/17/2024] [Accepted: 05/15/2024] [Indexed: 05/23/2024]
Abstract
BACKGROUND The frequency of anterior cervical discectomy and fusion (ACDF) has increased by up to 400% since 2011, underscoring the need to anticipate adverse postoperative outcomes preoperatively given the procedure's expanding use. Our study has two goals: first, to develop a suite of explainable machine learning (ML) models capable of predicting adverse postoperative outcomes following ACDF surgery, and second, to embed these models in a user-friendly web application, demonstrating their potential utility. METHODS We used data from the National Surgical Quality Improvement Program database to identify patients who underwent ACDF surgery. The outcomes of interest were four short-term postoperative adverse outcomes: prolonged length of stay (LOS), non-home discharges, 30-day readmissions, and major complications. We used five ML algorithms (TabPFN, TabNet, XGBoost, LightGBM, and Random Forest) coupled with the Optuna optimization library for hyperparameter tuning. To improve the interpretability of our models, we employed SHapley Additive exPlanations (SHAP) to evaluate the relative importance of predictor variables and used partial dependence plots to illustrate the impact of individual variables on the predictions of our top-performing models. We visualized model performance using receiver operating characteristic (ROC) curves and precision-recall curves (PRC). The quantitative metrics calculated were the area under the ROC curve (AUROC), balanced accuracy, weighted area under the PRC (AUPRC), weighted precision, and weighted recall. The models with the highest AUROC values were selected for inclusion in the web application. RESULTS The analysis included 57,760 patients for prolonged LOS [11.1% with prolonged LOS], 57,780 for non-home discharges [3.3% non-home discharges], 57,790 for 30-day readmissions [2.9% readmitted], and 57,800 for major complications [1.4% with major complications].
The top-performing models, all built with the Random Forest algorithm, yielded mean AUROCs of 0.776, 0.846, 0.775, and 0.747 for predicting prolonged LOS, non-home discharges, readmissions, and complications, respectively. CONCLUSIONS Our study employs advanced ML methodologies to enhance the prediction of adverse postoperative outcomes following ACDF. We designed an accessible web application to integrate these models into clinical practice. Our findings affirm that ML tools serve as vital supplements in risk stratification, facilitating the prediction of diverse outcomes and enhancing patient counseling for ACDF.
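AUROC, the model-selection criterion in this study, equals the probability that a randomly chosen patient who had the outcome receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as half. The pairwise sketch below implements that definition directly on toy scores (not study data); it is O(n_pos x n_neg), so on a cohort of roughly 57,000 patients a rank-based library routine would be the practical choice.

```python
from itertools import product


def auroc(scores: list[float], labels: list[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # concordant pair
        elif p == n:
            wins += 0.5   # tied scores count as half a concordant pair
    return wins / (len(pos) * len(neg))


# Toy predicted risks for six patients (1 = adverse outcome occurred).
result = auroc([0.9, 0.4, 0.7, 0.2, 0.4, 0.1], [1, 1, 0, 0, 0, 0])
print(result)  # 6.5 concordant pairs out of 8 -> 0.8125
```

Because AUROC only depends on the ranking, it is insensitive to the low outcome rates reported here (for example 1.4% major complications), which is one reason the authors also report precision-recall based metrics.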
Affiliation(s)
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Ave, New York, NY, 10029, USA
- Abhiraj D Bhimani
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Ave, New York, NY, 10029, USA
- Alexander J Schupper
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Ave, New York, NY, 10029, USA
- Matthew T Carr
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Ave, New York, NY, 10029, USA
- Jeremy Steinberger
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Ave, New York, NY, 10029, USA
- Konstantinos Margetis
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Ave, New York, NY, 10029, USA
26
Leiherer A, Muendlein A, Mink S, Mader A, Saely CH, Festa A, Fraunberger P, Drexel H. Machine Learning Approach to Metabolomic Data Predicts Type 2 Diabetes Mellitus Incidence. Int J Mol Sci 2024; 25:5331. [PMID: 38791370 PMCID: PMC11120685 DOI: 10.3390/ijms25105331] [Received: 03/20/2024] [Revised: 04/30/2024] [Accepted: 05/09/2024] [Indexed: 05/26/2024]
Abstract
Metabolomics, with its wealth of data, offers a valuable avenue for enhancing predictions and decision-making in diabetes. This observational study aimed to leverage machine learning (ML) algorithms to predict the 4-year risk of developing type 2 diabetes mellitus (T2DM) using targeted quantitative metabolomics data. A cohort of 279 cardiovascular risk patients who underwent coronary angiography and who were initially free of T2DM according to American Diabetes Association (ADA) criteria was analyzed at baseline, including anthropometric data and targeted metabolomics using liquid chromatography (LC)-mass spectrometry (MS) and flow injection analysis (FIA)-MS. All patients were followed for four years, during which 11.5% of the patients developed T2DM. After data preprocessing, 362 variables were used for ML, employing the caret package in R. The dataset was divided into training and test sets (75:25 ratio), and an oversampling approach was used to address the class imbalance of T2DM incidence. An additional recursive feature elimination step identified a set of 77 variables that were the most valuable for model generation; with these, a Support Vector Machine (SVM) model with a linear kernel demonstrated the most promising predictive capabilities, exhibiting an F1 score of 50%, a specificity of 93%, and balanced and unbalanced accuracies of 72% and 88%, respectively. The top-ranked features were bile acids, ceramides, amino acids, and hexoses, whereas anthropometric features such as age, sex, waist circumference, and body mass index did not contribute. In conclusion, ML analysis of metabolomics data is a promising tool for identifying individuals at risk of developing T2DM and opens avenues for personalized and early intervention strategies.
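Balanced accuracy, as reported by caret for a binary problem, is the mean of sensitivity and specificity. Assuming that definition applies here, the reported balanced accuracy (72%) and specificity (93%) imply a test-set sensitivity of roughly 0.51, subject to rounding of the published percentages. A quick check of that arithmetic:

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the per-class recalls for a binary classifier."""
    return (sensitivity + specificity) / 2


def implied_sensitivity(balanced_acc: float, specificity: float) -> float:
    """Invert balanced_accuracy() to recover sensitivity from published figures."""
    return 2 * balanced_acc - specificity


sens = implied_sensitivity(0.72, 0.93)  # reported balanced accuracy and specificity
print(round(sens, 2))                   # prints 0.51
```

This also puts the unbalanced accuracy in context: with an incidence of 11.5%, a model that always predicts "no T2DM" would score roughly 88.5% accuracy on data with the cohort's class mix, so the 88% figure alone says little, while a balanced accuracy of 72% shows genuine discrimination.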
Affiliation(s)
- Andreas Leiherer
- Vorarlberg Institute for Vascular Investigation and Treatment (VIVIT), A-6800 Feldkirch, Austria
- Central Medical Laboratories, A-6800 Feldkirch, Austria
- Faculty of Medical Sciences, Private University of the Principality of Liechtenstein, FL-9495 Triesen, Liechtenstein
- Axel Muendlein
- Vorarlberg Institute for Vascular Investigation and Treatment (VIVIT), A-6800 Feldkirch, Austria
- Sylvia Mink
- Central Medical Laboratories, A-6800 Feldkirch, Austria
- Faculty of Medical Sciences, Private University of the Principality of Liechtenstein, FL-9495 Triesen, Liechtenstein
- Arthur Mader
- Vorarlberg Institute for Vascular Investigation and Treatment (VIVIT), A-6800 Feldkirch, Austria
- Department of Internal Medicine III, Academic Teaching Hospital Feldkirch, A-6800 Feldkirch, Austria
- Christoph H. Saely
- Vorarlberg Institute for Vascular Investigation and Treatment (VIVIT), A-6800 Feldkirch, Austria
- Faculty of Medical Sciences, Private University of the Principality of Liechtenstein, FL-9495 Triesen, Liechtenstein
- Department of Internal Medicine III, Academic Teaching Hospital Feldkirch, A-6800 Feldkirch, Austria
- Andreas Festa
- Vorarlberg Institute for Vascular Investigation and Treatment (VIVIT), A-6800 Feldkirch, Austria
- Peter Fraunberger
- Central Medical Laboratories, A-6800 Feldkirch, Austria
- Faculty of Medical Sciences, Private University of the Principality of Liechtenstein, FL-9495 Triesen, Liechtenstein
- Heinz Drexel
- Vorarlberg Institute for Vascular Investigation and Treatment (VIVIT), A-6800 Feldkirch, Austria
- Faculty of Medical Sciences, Private University of the Principality of Liechtenstein, FL-9495 Triesen, Liechtenstein
- Vorarlberger Landeskrankenhausbetriebsgesellschaft, Academic Teaching Hospital Feldkirch, A-6800 Feldkirch, Austria
- Drexel University College of Medicine, Philadelphia, PA 19129, USA
27
Warad AAM, Wassif K, Darwish NR. An ensemble learning model for forecasting water-pipe leakage. Sci Rep 2024; 14:10683. [PMID: 38724568 PMCID: PMC11082134 DOI: 10.1038/s41598-024-60840-x] [Received: 01/23/2024] [Accepted: 04/28/2024] [Indexed: 05/12/2024]
Abstract
Ensemble methods such as bagging and boosting have been studied and adopted extensively in research and practice, with bagging focusing mainly on reducing variance and boosting on reducing bias. Building on these benefits, this paper presents an optimized ensemble learning-based model for water pipe leakage forecasting on a large pipe failure dataset, something not previously considered by others. Tuning the hyperparameters of each base learner within the ensemble weight optimization process is known to produce better-performing ensembles, and it effectively improves the accuracy of water pipe leakage forecasting based on the pipeline failure rate. To evaluate the proposed model, its results are compared with those of bagging and boosting ensemble models using the root-mean-square error (RMSE), the mean square error (MSE), the mean absolute error (MAE), and the coefficient of determination (R2); the R2 values of the bagging, boosting, and optimizable ensemble techniques are higher than those of the other models. The experimental results show that the optimizable ensemble model has better prediction accuracy: it achieved the best prediction of the water pipe failure rate at the 14th iteration, with the lowest RMSE (0.00231) and MAE (0.00071513).
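For reference, the four metrics used in this comparison have standard definitions. A plain-Python sketch on toy failure-rate values (not the paper's dataset):

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Return MSE, RMSE, MAE and R^2 for paired observations and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": 1 - ss_res / ss_tot}


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics)
```

Because RMSE squares residuals before averaging, it is always at least as large as MAE and weights occasional large errors more heavily, consistent with the reported RMSE (0.00231) exceeding the MAE (0.00071513).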
Affiliation(s)
- Ahmed Ali Mohamed Warad
- Department of Information Systems and Technology, Faculty of Graduate Studies for Statistical Research, Cairo University, Cairo, Egypt
- Khaled Wassif
- Department of Computer Science, Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
- Nagy Ramadan Darwish
- Department of Information Systems and Technology, Faculty of Graduate Studies for Statistical Research, Cairo University, Cairo, Egypt
28
Yang C, Huebner ES, Tian L. Prediction of suicidal ideation among preadolescent children with machine learning models: A longitudinal study. J Affect Disord 2024; 352:403-409. [PMID: 38387673 DOI: 10.1016/j.jad.2024.02.070] [Received: 08/19/2023] [Revised: 02/15/2024] [Accepted: 02/19/2024] [Indexed: 02/24/2024]
Abstract
BACKGROUND Machine learning (ML) has been widely used to predict suicidal ideation (SI) in adolescents and adults. Nevertheless, accurate and efficient SI prediction models for preadolescent children are still needed, because SI is surprisingly prevalent during the transition into adolescence. This study aimed to explore the potential of ML models to predict SI among preadolescent children. METHODS A total of 4691 Chinese children (54.89% boys, mean age = 10.92 years at baseline) and their parents completed relevant measures at baseline, and the children provided 6-month follow-up data on SI. The current study compared four ML models, Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), to predict SI and to identify variables with predictive value based on the best-performing model among Chinese preadolescent children. RESULTS The RF model achieved the highest discriminant performance, with an AUC of 0.92 and an accuracy of 0.93 (balanced accuracy = 0.88). Internalizing problems, externalizing problems, neuroticism, childhood maltreatment, and subjective well-being in school had the highest predictive value for SI. CONCLUSION The findings suggest that ML models based on the observation and assessment of children's general characteristics and everyday experiences can serve as convenient screening and evaluation tools for suicide risk assessment among Chinese preadolescent children. The findings also provide insights for early intervention.
Affiliation(s)
- Chi Yang
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents, South China Normal University, Ministry of Education, Guangzhou 510631, People's Republic of China; School of Psychology, South China Normal University, Guangzhou 510631, People's Republic of China
- E Scott Huebner
- Department of Psychology, University of South Carolina, Columbia, SC 29208, USA
- Lili Tian
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents, South China Normal University, Ministry of Education, Guangzhou 510631, People's Republic of China
29
Zhu H, Yuan J, Wan Q, Cheng F, Dong X, Xia S, Zhou C. A UV-Vis spectroscopic detection method for cobalt ions in zinc sulfate solution based on discrete wavelet transform and extreme gradient boosting. Spectrochim Acta A Mol Biomol Spectrosc 2024; 311:123982. [PMID: 38320470 DOI: 10.1016/j.saa.2024.123982] [Received: 11/20/2023] [Revised: 01/16/2024] [Accepted: 01/29/2024] [Indexed: 02/08/2024]
Abstract
Zinc is a crucial strategic metal resource. The concentration of cobalt ions in zinc refining solution significantly affects the efficiency of zinc electrolysis production. The traditional method of detecting cobalt ions in zinc solution is time-consuming, labor-intensive, and ineffective. Optical detection, by contrast, offers high efficiency and low cost, making it a potential replacement for the traditional method. In this study, the spectral curve of cobalt ions in zinc solution is measured by ultraviolet-visible (UV-Vis) spectrophotometry, and we propose a model of the concentration-absorbance relationship of cobalt ions in zinc solution based on discrete wavelet transform and extreme gradient boosting (DWT-XGBoost). First, the information region of the spectral curve is denoised using Savitzky-Golay (S-G) smoothing. Then, features are extracted from the denoised spectra through discrete wavelet transform and principal component analysis. These features are used as inputs to the XGBoost model to establish prediction models for low and high cobalt ion concentrations in zinc solution. Bayesian optimization is used to tune the model's hyperparameters, such as the learning rate and feature sampling ratio, to enhance prediction performance. Finally, when the model is applied to zinc solution samples from a zinc smelter and compared with other state-of-the-art algorithms, the DWT-XGBoost algorithm exhibits the lowest RMSE, MAE, and MAPE, with values of 0.034 mg/L, 0.025 mg/L, and 6.983% for low cobalt and 0.231 mg/L, 0.067 mg/L, and 0.472% for high cobalt. The experimental results demonstrate that the DWT-XGBoost model has significantly superior prediction performance.
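The abstract does not name the wavelet or the decomposition level. As a hedged illustration of what a single DWT level does to a spectrum, the sketch below assumes the Haar wavelet: it splits the signal into a half-length approximation (low-pass) part and a detail (high-pass) part. Approximation coefficients of this kind are the sort of compact input that could then be passed to PCA and XGBoost; the absorbance values are invented.

```python
import math


def haar_dwt_level(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar discrete wavelet transform (assumed wavelet)."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last sample
    s = 1 / math.sqrt(2)
    evens, odds = signal[0::2], signal[1::2]
    approx = [s * (a + b) for a, b in zip(evens, odds)]  # low-pass: smooth trend
    detail = [s * (a - b) for a, b in zip(evens, odds)]  # high-pass: local fluctuations
    return approx, detail


absorbance = [0.10, 0.12, 0.30, 0.34, 0.31, 0.29, 0.11, 0.10]  # toy spectrum
approx, detail = haar_dwt_level(absorbance)
print([round(a, 3) for a in approx])
print([round(d, 3) for d in detail])
```

Because this Haar transform is orthonormal, the total energy (sum of squares) of the approximation and detail coefficients equals that of the input, so no information is lost at this step; dimensionality is reduced only when detail coefficients are discarded or PCA is applied.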
Affiliation(s)
- Hongqiu Zhu
- School of Automation, Central South University, Changsha 410083, China
- Jianqiang Yuan
- School of Automation, Central South University, Changsha 410083, China
- Qilong Wan
- School of Automation, Central South University, Changsha 410083, China
- Fei Cheng
- School of Automation, Central South University, Changsha 410083, China
- Xinran Dong
- School of Automation, Central South University, Changsha 410083, China
- Sibo Xia
- School of Automation, Central South University, Changsha 410083, China
- Can Zhou
- School of Automation, Central South University, Changsha 410083, China
30
Juwara L, El-Hussuna A, El Emam K. An evaluation of synthetic data augmentation for mitigating covariate bias in health data. Patterns (N Y) 2024; 5:100946. [PMID: 38645766 PMCID: PMC11026977 DOI: 10.1016/j.patter.2024.100946] [Received: 08/21/2023] [Revised: 10/23/2023] [Accepted: 02/08/2024] [Indexed: 04/23/2024]
Abstract
Data bias is a major concern in biomedical research, especially when evaluating large-scale observational datasets. It leads to imprecise predictions and inconsistent estimates in standard regression models. We compare the performance of commonly used bias-mitigating approaches (resampling, algorithmic, and post hoc approaches) against a synthetic data-augmentation method that uses sequential boosted decision trees to synthesize under-represented groups. The approach is called synthetic minority augmentation (SMA). Through simulations and analysis of real health datasets on a logistic regression workload, the approaches are evaluated across various bias scenarios (types and severity levels). Performance was assessed based on the area under the curve, calibration (Brier score), precision of parameter estimates, confidence interval overlap, and fairness. Overall, SMA produces the results closest to the ground truth under low to medium bias (a missing proportion of 50% or less). Under high bias (a missing proportion of 80% or more), the advantage of SMA is not obvious, and no single method consistently outperforms the others.
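The Brier score used above to assess calibration is the mean squared difference between predicted probabilities and observed 0/1 outcomes; lower is better. A minimal sketch on toy predictions:

```python
def brier_score(probabilities: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and binary outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("inputs must have equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


score = brier_score([0.9, 0.2, 0.6, 0.1], [1, 0, 0, 0])  # toy predictions and outcomes
print(score)
```

A model that always predicts 0.5 scores exactly 0.25, which is a convenient reference point for an uninformative predictor; here the confident, mostly correct predictions give about 0.105, with most of that coming from the 0.6 forecast for a negative case.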
Affiliation(s)
- Lamin Juwara
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- Research Institute, Children’s Hospital of Eastern Ontario, Ottawa, ON, Canada
- Khaled El Emam
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- Research Institute, Children’s Hospital of Eastern Ontario, Ottawa, ON, Canada
- Data Science, Replica Analytics Ltd., Ottawa, ON, Canada
31
Zhu C, Liu X, Chen D. Prediction of digital transformation of manufacturing industry based on interpretable machine learning. PLoS One 2024; 19:e0299147. [PMID: 38551908 PMCID: PMC10980183 DOI: 10.1371/journal.pone.0299147] [Received: 12/03/2023] [Accepted: 02/05/2024] [Indexed: 04/01/2024]
Abstract
The enhancement of digital transformation is of paramount importance for business development. This study uses machine learning to establish a predictive model for digital transformation, investigates the crucial factors that influence it, and proposes corresponding improvement strategies. First, four commonly used machine learning algorithms are compared, revealing that the Extreme tree classification (ETC) algorithm gives the most accurate predictions. Next, correlation analysis and recursive feature elimination are used to select the key features that affect digital transformation, yielding the corresponding feature subset. Shapley Additive Explanation (SHAP) values are then used to interpret the predictive model, elucidating the effect of each key feature on digital transformation and identifying critical feature values. Lastly, informed by practical considerations, we propose a quantitative adjustment strategy to enhance the degree of digital transformation in enterprises, which provides guidance for digital development.
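SHAP values build on the Shapley value from cooperative game theory: a feature's attribution is its marginal contribution averaged over all coalitions S of the other features, weighted by |S|!(n - |S| - 1)!/n!. The brute-force sketch below computes exact Shapley values for a hypothetical three-feature value function (feature names and numbers are invented). It is exponential in the number of features and meant only to show the definition; the SHAP library instead relies on model-specific algorithms such as TreeSHAP or on sampling-based approximations.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, Sequence


def shapley_values(features: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> dict[str, float]:
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution
        result[f] = total
    return result


# Hypothetical model output when only the features in s are "known" (invented numbers).
def toy_value(s: FrozenSet[str]) -> float:
    v = 0.0
    if "rd_spend" in s:
        v += 0.3
    if "it_staff" in s:
        v += 0.2
    if "rd_spend" in s and "it_staff" in s:
        v += 0.1  # interaction effect, split equally between the two features
    return v


phi = shapley_values(["rd_spend", "it_staff", "firm_age"], toy_value)
print({k: round(v, 3) for k, v in phi.items()})
# prints {'rd_spend': 0.35, 'it_staff': 0.25, 'firm_age': 0.0}
```

The attributions sum to the full-coalition output minus the empty-coalition output (0.6 here), the "efficiency" property that lets SHAP decompose an individual prediction feature by feature.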
Affiliation(s)
- Chen Zhu
- Business School, Yangzhou University, Yangzhou, China
- School of Economics, Dongbei University of Finance and Economics, Dalian, China
- Postdoctoral Workstation of China Dalian International Economic & Technical Cooperation Group Co., Ltd, Dalian, China
- Xue Liu
- Business School, Yangzhou University, Yangzhou, China
- Dong Chen
- School of Accounting, Dongbei University of Finance and Economics, Dalian, China
- China Internal Control Research Center, Dalian, China
| |
Collapse
|
32
|
Salehi AR, Khedmati M. A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data. Sci Rep 2024; 14:5152. [PMID: 38431701 PMCID: PMC10908853 DOI: 10.1038/s41598-024-55598-1] [Received: 06/06/2023] [Accepted: 02/26/2024] [Indexed: 03/05/2024]
Abstract
In this paper, a Cluster-based Synthetic minority oversampling technique (SMOTE) Both-sampling (CSBBoost) ensemble algorithm is proposed for classifying imbalanced data. In this algorithm, a combination of over-sampling, under-sampling, and different ensemble algorithms, including Extreme Gradient Boosting (XGBoost), random forest, and bagging, is employed in order to achieve a balanced dataset and address the issues including redundancy of data after over-sampling, information loss in under-sampling, and random sample selection for sampling and sample generation. The performance of the proposed algorithm is evaluated and compared to different state-of-the-art competing algorithms based on 20 benchmark imbalanced datasets in terms of the harmonic mean of precision and recall (F1) and area under the receiver operating characteristics curve (AUC) measures. Based on the results, the proposed CSBBoost algorithm performs significantly better than the competing algorithms. In addition, a real-world dataset is used to demonstrate the applicability of the proposed algorithm.
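The oversampling step that CSBBoost builds on, SMOTE, creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. The sketch below shows only that interpolation step in plain Python. It omits the clustering, undersampling, and ensemble stages of CSBBoost, and the function names and parameters are illustrative.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _sq_dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point on the segment between x and the chosen neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_synthetic=5)
```

Every synthetic point lies on a segment between two real minority samples. This is what distinguishes SMOTE from simple duplication, and it is also why clustering first, as CSBBoost does, can keep interpolation within a coherent region of the minority class.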
Affiliation(s)
- Amir Reza Salehi
- Department of Industrial Engineering, Sharif University of Technology, 9414 Azadi Ave, P.O. Box 11155, Tehran, 1458889694, Iran
- Majid Khedmati
- Department of Industrial Engineering, Sharif University of Technology, 9414 Azadi Ave, P.O. Box 11155, Tehran, 1458889694, Iran.
33
Deng F, Zhao L, Yu N, Lin Y, Zhang L. Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer. Lab Invest 2024; 104:100320. [PMID: 38158124 DOI: 10.1016/j.labinv.2023.100320] [Received: 03/26/2023] [Revised: 12/05/2023] [Accepted: 12/20/2023] [Indexed: 01/03/2024]
Abstract
Despite the use of machine learning tools, it is challenging to properly model cause-specific deaths in colorectal cancer (CRC) patients and choose appropriate treatments. Here, we propose a feature selection framework, namely union with recursive feature elimination (U-RFE), to select the union feature sets that are crucial in CRC progression-specific mortality using The Cancer Genome Atlas (TCGA) dataset. Based on the union feature sets, we compared the performance of 5 classification algorithms, including logistic regression (LR), support vector machines (SVM), random forest (RF), eXtreme gradient boosting (XGBoost), and Stacking, to identify the best model for classifying 4-category deaths. In the first stage of U-RFE, LR, SVM, and RF were used as base estimators to obtain subsets containing the same number of features but not exactly the same specific features. Union analysis of the subsets was then performed to determine the final union feature set, effectively combining the advantages of different algorithms. We found that the U-RFE framework improved the performance of various models. Stacking outperformed LR, SVM, RF, and XGBoost in most scenarios. When the target feature number of the RFE was set to 50 and the union feature set contained 298 deterministic features, the Stacking model achieved F1_weighted, Recall_weighted, Precision_weighted, Accuracy, and Matthews correlation coefficient values of 0.851, 0.864, 0.854, 0.864, and 0.717, respectively. The performance of the minority categories was also significantly improved. Therefore, this recursive feature elimination-based feature selection approach improves the classification of CRC deaths using clinical and omics data, as well as classification using other data with high feature redundancy and imbalance.
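The two-stage U-RFE idea (run RFE once per base estimator, then take the union of the resulting subsets) can be sketched as follows. In practice, each importance function would refit LR, SVM, or RF on the current subset, for example through scikit-learn's `RFE`. Here they are replaced by hypothetical fixed scores so that the example runs without dependencies.

```python
def rfe(features, importance, n_keep):
    """Recursive feature elimination.

    `importance(subset)` must return {feature: score} for a model refit on
    `subset`; the lowest-scoring feature is dropped until n_keep remain.
    """
    selected = list(features)
    while len(selected) > n_keep:
        scores = importance(selected)
        weakest = min(selected, key=lambda f: scores[f])
        selected.remove(weakest)
    return set(selected)


def union_rfe(features, importance_fns, n_keep):
    """Run RFE once per base estimator and return the union of the subsets."""
    union = set()
    for importance in importance_fns:
        union |= rfe(features, importance, n_keep)
    return union


def static_importance(scores):
    """Hypothetical stand-in for a refit estimator: fixed score per feature."""
    return lambda subset: {f: scores[f] for f in subset}


features = ["f1", "f2", "f3", "f4"]
estimator_a = static_importance({"f1": 0.9, "f2": 0.1, "f3": 0.5, "f4": 0.3})
estimator_b = static_importance({"f1": 0.2, "f2": 0.8, "f3": 0.6, "f4": 0.1})
selected = union_rfe(features, [estimator_a, estimator_b], n_keep=2)
```

Each estimator keeps two features, but the union contains three. This mirrors how a target of 50 features per estimator can produce a larger union set.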
Affiliation(s)
- Fei Deng
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.
- Lin Zhao
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Ning Yu
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Yuxiang Lin
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Lanjing Zhang
- Department of Biological Sciences, Rutgers University, Newark, New Jersey; Department of Pathology, Princeton Medical Center, Plainsboro, New Jersey; Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey.
34
Karabacak M, Jagtiani P, Margetis K. The Predictive Abilities of Machine Learning Algorithms in Patients with Thoracolumbar Spinal Cord Injuries. World Neurosurg 2024; 182:e67-e90. [PMID: 38030070 DOI: 10.1016/j.wneu.2023.11.043] [Received: 09/01/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 12/01/2023]
Abstract
OBJECTIVES The goal of this study is to implement machine learning (ML) algorithms to predict mortality, non-home discharge, prolonged length of stay (LOS), prolonged length of intensive care unit stay (ICU-LOS), and major complications in patients diagnosed with thoracolumbar spinal cord injury, while creating a publicly accessible online tool. METHODS The American College of Surgeons Trauma Quality Program database was used to identify patients with thoracolumbar spinal cord injury. Feature selection was performed with the Least Absolute Shrinkage and Selection Operator algorithm. Five ML algorithms, including TabPFN, TabNet, XGBoost, LightGBM, and Random Forest, were used along with the Optuna optimization library for hyperparameter tuning. RESULTS A total of 147,819 patients were included in the analysis. For each outcome, we determined the best model for deployment in our web application based on the area under the receiver operating characteristic (AUROC) values. The top performing algorithms were as follows: LightGBM for mortality with an AUROC of 0.885, TabPFN for non-home discharge with an AUROC of 0.801, LightGBM for prolonged LOS with an AUROC of 0.673, Random Forest for prolonged ICU-LOS with an AUROC of 0.664, and LightGBM for major complications with an AUROC of 0.73. CONCLUSIONS ML models demonstrate good predictive ability for in-hospital mortality and non-home discharge, fair predictive ability for major complications and prolonged ICU-LOS, but poor predictive ability for prolonged LOS. We have developed a web application that allows these models to be accessed.
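Every outcome above is compared by AUROC. As an illustrative sketch (not the authors' evaluation code), AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This can be computed directly by pairwise comparison.

```python
def auroc(labels, scores):
    """AUROC as P(positive outranks negative); ties count as 1/2."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise loop is O(n_pos × n_neg). Rank-based formulas and library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently on cohorts of this size.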
Affiliation(s)
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, New York, New York, USA
- Pemla Jagtiani
- School of Medicine, SUNY Downstate Health Sciences University, New York, New York, USA
35
Lee K, Ha SM, Gurudatt NG, Heo W, Hyun KA, Kim J, Jung HI. Machine learning-powered electrochemical aptasensor for simultaneous monitoring of di(2-ethylhexyl) phthalate and bisphenol A in variable pH environments. J Hazard Mater 2024; 462:132775. [PMID: 37865074 DOI: 10.1016/j.jhazmat.2023.132775] [Received: 07/22/2023] [Revised: 09/19/2023] [Accepted: 10/11/2023] [Indexed: 10/23/2023]
Abstract
Plastic waste is a pernicious environmental pollutant that threatens ecosystems and human health by releasing contaminants including di(2-ethylhexyl) phthalate (DEHP) and bisphenol A (BPA). Therefore, a machine-learning (ML)-powered electrochemical aptasensor was developed in this study for simultaneously detecting DEHP and BPA in river waters, particularly to minimize the electrochemical signal errors caused by varying pH levels. The aptasensor leverages a straightforward and effective surface modification strategy featuring gold nanoflowers to achieve low detection limits for DEHP and BPA (0.58 and 0.59 pg/mL, respectively), excellent specificity, and stability. The least-squares boosting (LSBoost) algorithm was introduced to reliably monitor the targets regardless of pH; it employs a layer that adjusts the number of multi-indexes and the parallel learning structure of an ensemble model to accurately predict concentrations by preventing overfitting and enhancing the learning effect. The ML-powered aptasensor successfully detected targets in 12 river sites with diverse pH values, exhibiting higher accuracy and reliability. To our knowledge, the platform proposed in this study is the first attempt to utilize ML for the simultaneous assessment of DEHP and BPA. This breakthrough allows for comprehensive investigations into the effects of contamination originating from diverse plastics by eliminating external interferent-caused influences.
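Least-squares boosting builds an additive model by repeatedly fitting a weak learner to the current residuals, which are the negative gradient of the squared-error loss. The following is a minimal one-dimensional sketch using regression stumps and shrinkage. It does not reproduce the multi-index layer or the parallel ensemble structure described for the aptasensor, and all names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit the single-threshold regression stump minimising squared error."""
    best = None
    thresholds = sorted(set(xs))[:-1]  # split between consecutive distinct values
    for t in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:
        # All inputs identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def ls_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Least-squares boosting: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


model = ls_boost([1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
```

The learning rate shrinks each stump's contribution, so the fit approaches the targets gradually rather than in one step. Shrinkage is one common way boosting limits overfitting.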
Affiliation(s)
- Kyungyeon Lee
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, Republic of Korea; Department of Medical Engineering, College of Medicine, Yonsei University, Seoul 03722, Republic of Korea
- Seong Min Ha
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, Republic of Korea
- N G Gurudatt
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Woong Heo
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Kyung-A Hyun
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, Republic of Korea; Korea Electronics Technology Institute (KETI), 25 Saenari-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, 13509, Republic of Korea
- Jayoung Kim
- Department of Medical Engineering, College of Medicine, Yonsei University, Seoul 03722, Republic of Korea
- Hyo-Il Jung
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, Republic of Korea; The DABOM Inc., Seoul, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea.
36
Chen Y, Chien J, Dai B, Lin D, Chen ZS. Identifying behavioral links to neural dynamics of multifiber photometry recordings in a mouse social behavior network. bioRxiv 2024:2023.12.25.573308. [PMID: 38234793 PMCID: PMC10793434 DOI: 10.1101/2023.12.25.573308] [Indexed: 01/19/2024]
Abstract
Distributed hypothalamic-midbrain neural circuits orchestrate complex behavioral responses during social interactions. How population-averaged neural activity measured by multi-fiber photometry (MFP) for calcium fluorescence signals correlates with social behaviors is a fundamental question. We propose a state-space analysis framework to characterize mouse MFP data based on dynamic latent variable models, which include a continuous-state linear dynamical system (LDS) and a discrete-state hidden semi-Markov model (HSMM). We validate these models on extensive MFP recordings during aggressive and mating behaviors in male-male and male-female interactions, respectively. Our results show that these models are capable of capturing both temporal behavioral structure and associated neural states. Overall, these analysis approaches provide an unbiased strategy to examine neural dynamics underlying social behaviors and reveal mechanistic insights into the relevant networks.
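For a continuous-state LDS, latent states are inferred from observations with the Kalman filter. The sketch below is a deliberately simplified scalar version that assumes one latent dimension, one observation channel, and known (hypothetical) parameters a, c, q, and r. It shows neither the multichannel models, nor parameter learning, nor the HSMM.

```python
def kalman_filter(observations, a, c, q, r, m0=0.0, p0=1.0):
    """Filtered means and variances for a scalar linear dynamical system.

    Model (assumed):  z_t = a * z_{t-1} + w_t,  w_t ~ N(0, q)
                      y_t = c * z_t + v_t,      v_t ~ N(0, r)
    """
    means, variances = [], []
    m, p = m0, p0
    for y in observations:
        # Predict the next latent state from the dynamics
        m_pred = a * m
        p_pred = a * a * p + q
        # Correct the prediction with the new observation via the Kalman gain
        gain = p_pred * c / (c * c * p_pred + r)
        m = m_pred + gain * (y - c * m_pred)
        p = (1.0 - gain * c) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


means, variances = kalman_filter([1.0] * 200, a=1.0, c=1.0, q=0.01, r=0.1)
```

With a constant observation stream, the filtered mean converges to the observed level, and the posterior variance settles below the observation noise r. The gain balances trust in the dynamics (q) against trust in the measurements (r).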
Affiliation(s)
- Yibo Chen
- Department of Psychiatry, Department of Neuroscience and Physiology, New York University School of Medicine, New York, NY, USA
- Program in Artificial Intelligence, University of Science and Technology of China, Hefei, Anhui, China
- Jonathan Chien
- Department of Psychiatry, Department of Neuroscience and Physiology, New York University School of Medicine, New York, NY, USA
- Bing Dai
- Neuroscience Institute, New York University Grossman School of Medicine, New York, NY, USA
- Dayu Lin
- Department of Psychiatry, Department of Neuroscience and Physiology, New York University School of Medicine, New York, NY, USA
- Neuroscience Institute, New York University Grossman School of Medicine, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
- Zhe Sage Chen
- Department of Psychiatry, Department of Neuroscience and Physiology, New York University School of Medicine, New York, NY, USA
- Neuroscience Institute, New York University Grossman School of Medicine, New York, NY, USA
- Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY, USA
37
Fu X, Suo H, Zhang J, Chen D. Machine-learning-guided Directed Evolution for AAV Capsid Engineering. Curr Pharm Des 2024; 30:811-824. [PMID: 38445704 DOI: 10.2174/0113816128286593240226060318] [Received: 11/09/2023] [Revised: 02/07/2024] [Accepted: 02/13/2024] [Indexed: 03/07/2024]
Abstract
Targeted gene delivery is crucial to gene therapy. Adeno-associated virus (AAV) has emerged as a primary gene therapy vector due to its broad host range, long-term expression, and low pathogenicity. However, AAV vectors have some limitations, such as immunogenicity and insufficient targeting. Designing or modifying capsids is a potential method of improving the efficacy of gene delivery, but it is hindered by the weak biological basis of AAV, the complexity of the capsids, and the limitations of current screening methods. Artificial intelligence (AI), especially machine learning (ML), has great potential to accelerate and improve the optimization of capsid properties as well as decrease their development time and manufacturing costs. This review introduces the traditional methods of designing AAV capsids and the general steps of building a sequence-function ML model, highlights the applications of ML in the development workflow, and summarizes its advantages and challenges.
Affiliation(s)
- Xianrong Fu
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
- Hairui Suo
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
- Jiachen Zhang
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
- Dongmei Chen
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
38
Karabacak M, Margetis K. Prognosis at Your Fingertips: A Machine Learning-Based Web Application for Outcome Prediction in Acute Traumatic Epidural Hematoma. J Neurotrauma 2024; 41:147-160. [PMID: 37261977 DOI: 10.1089/neu.2023.0122] [Indexed: 06/03/2023]
Abstract
Traumatic brain injury (TBI) affects 69 million people worldwide each year, and acute traumatic epidural hematoma (atEDH) is a frequent and severe consequence of TBI. The aim of the study is to use machine learning (ML) algorithms to predict in-hospital death, non-home discharges, prolonged length of stay (LOS), prolonged length of intensive care unit stay (ICU-LOS), and major complications in patients with atEDH, and to incorporate the resulting ML models into a user-friendly web application for use in clinical settings. The American College of Surgeons (ACS) Trauma Quality Program (TQP) database was used to identify patients with atEDH. Four ML algorithms (XGBoost, LightGBM, CatBoost, and Random Forest) were utilized, and the best performing models were incorporated into an open-access web application to predict the outcomes of interest. The study found that the ML algorithms had high area under the receiver operating characteristic curve (AUROC) values in predicting outcomes for patients with atEDH. In particular, the algorithms had AUROC values ranging from 0.874 to 0.956 for in-hospital mortality, 0.776 to 0.798 for non-home discharges, 0.737 to 0.758 for prolonged LOS, 0.712 to 0.774 for prolonged ICU-LOS, and 0.674 to 0.733 for major complications. The following link will take users to the open-access web application designed to generate predictions for individual patients based on their characteristics: huggingface.co/spaces/MSHS-Neurosurgery-Research/TQP-atEDH. This study aimed to improve the prognostication of patients with atEDH using ML algorithms and developed a web application for easy integration in clinical practice. It found that ML algorithms can aid in risk stratification and have significant potential for predicting in-hospital outcomes. Results demonstrated excellent performance for predicting in-hospital death, fair performance for non-home discharges, prolonged LOS, and prolonged ICU-LOS, and poor performance for major complications.
Affiliation(s)
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, New York, New York, USA
39
Siarkos K, Karavasilis E, Velonakis G, Papageorgiou C, Smyrnis N, Kelekis N, Politis A. Brain multi-contrast, multi-atlas segmentation of diffusion tensor imaging and ensemble learning automatically diagnose late-life depression. Sci Rep 2023; 13:22743. [PMID: 38123613 PMCID: PMC10733280 DOI: 10.1038/s41598-023-49935-z] [Received: 08/01/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
We investigated the potential of machine learning for diagnostic classification in late-life major depression based on an advanced whole brain white matter segmentation framework. Twenty-six individuals with late-life depression and 12 never-depressed individuals aged > 55 years, matched for age, MMSE, and education, underwent brain diffusion tensor imaging and a multi-contrast, multi-atlas segmentation in MRIcloud. Fractional anisotropy volume, mean fractional anisotropy, trace, and axial and radial diffusivity (RD) extracted from 146 white matter parcels for each subject were used to train and test the AdaBoost classifier using stratified 12-fold cross validation. Performance was evaluated using various measures. The statistical significance of the classifier was assessed using a label permutation test. Statistical analysis did not yield significant differences in DTI measures between the groups. The classifier achieved a balanced accuracy of 71% and an Area Under the Receiver Operating Characteristic Curve (ROC-AUC) of 0.81 by trace, and a balanced accuracy of 70% and a ROC-AUC of 0.80 by RD, in limbic, cortico-basal ganglia-thalamo-cortical loop, brainstem, external and internal capsules, callosal and cerebellar structures. Both indices shared important structures for classification, and the fornix was the most important structure for classification by both indices. The classifier proved statistically significant, as trace and RD ROC-AUC scores after permutation were lower than those obtained with the actual data (P = 0.022 and P = 0.024, respectively). Similar results were obtained with the Gradient Boosting classifier, whereas the RBF-kernel Support Vector Machine with k-best feature selection did not exceed the chance level. Finally, AdaBoost significantly predicted the class using all features together. Limitations are discussed. The results encourage further investigation of the implemented methods for computer aided diagnostics and anatomically informed therapeutics.
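Balanced accuracy and the label-permutation test can be sketched as follows. This is an illustrative simplification. `score_fn` is a hypothetical function: in a real analysis it would rerun the full stratified cross-validation on the permuted labels, whereas here it scores fixed predictions so the example is cheap to run. The p-value uses the (count + 1) / (n_permutations + 1) correction, which is also what scikit-learn's `permutation_test_score` reports.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, robust to unequal group sizes (e.g. 26 vs 12)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def permutation_p_value(score_fn, labels, n_permutations=999, seed=0):
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


labels = [0] * 12 + [1] * 26
predictions = list(labels)  # hypothetical perfect predictions
observed, p = permutation_p_value(lambda y: balanced_accuracy(y, predictions), labels)
```

Because the smallest attainable p-value is 1 / (n_permutations + 1), the number of permutations bounds how strongly a result can be declared significant.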
Affiliation(s)
- Kostas Siarkos
- Division of Geriatric Psychiatry, First Department of Psychiatry, National and Kapodistrian University of Athens, Athens, Greece.
- Efstratios Karavasilis
- Medical School, Democritus University of Thrace, Alexandroupolis, Greece
- Second Department of Radiology, Attikon General University Hospital, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Georgios Velonakis
- Second Department of Radiology, Attikon General University Hospital, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Charalabos Papageorgiou
- University Mental Health, Neurosciences and Precision Medicine Research Institute "Costas Stefanis", Athens, Greece
- Nikolaos Smyrnis
- Second Department of Psychiatry, Attikon General University Hospital, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Nikolaos Kelekis
- Second Department of Radiology, Attikon General University Hospital, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Antonios Politis
- Division of Geriatric Psychiatry, First Department of Psychiatry, National and Kapodistrian University of Athens, Athens, Greece
- Department of Psychiatry, Division of Geriatric Psychiatry and Neuropsychiatry, Johns Hopkins Medical School, Baltimore, USA
40
Verdonk C, Duffaud AM, Longin A, Bertrand M, Zagnoli F, Trousselard M, Canini F. Posture analysis in predicting fall-related injuries during French Navy Special Forces selection course using machine learning: a proof-of-concept study. BMJ Mil Health 2023:e002542. [PMID: 38124202 DOI: 10.1136/military-2023-002542] [Received: 07/27/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]
Abstract
INTRODUCTION Injuries induced by falls represent the main cause of failure in the French Navy Special Forces selection course. In the present study, we hypothesized that probing posture might help predict the risk of fall-related injury at the individual level. METHODS Before the start of the selection course, the postural signals of 99 male soldiers were recorded using static posturography while they were instructed to maintain balance with their eyes closed. The event to be predicted was a fall-related injury during the selection course that resulted in the definitive termination of participation. Following a machine learning methodology, we designed an artificial neural network model to predict the risk of fall-related injury from descriptors of the postural signal. RESULTS The neural network model predicted the occurrence of a fall-related injury event during the selection course from the selected posture descriptors with 69.9% accuracy (95% CI 69.3-70.5). The area under the curve value was 0.731 (95% CI 0.725-0.738), the sensitivity was 56.8% (95% CI 55.2-58.4), and the specificity was 77.7% (95% CI 76.8-78.6). CONCLUSION If confirmed with a larger sample, these findings suggest that probing posture using static posturography and machine learning-based analysis might help inform risk assessment of fall-related injury during military training, and could ultimately lead to the development of novel programmes for personalised injury prevention in military populations.
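The accuracy, sensitivity, and specificity reported above all derive from the confusion-matrix counts for the injury (positive) and no-injury (negative) classes. A minimal sketch with hypothetical counts, which are not the study's confusion matrix:

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive (injury) class
    specificity = tn / (tn + fp)  # recall on the negative (no-injury) class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only
accuracy, sensitivity, specificity = binary_metrics(tp=3, fn=1, tn=8, fp=2)
prevalence = (3 + 1) / (3 + 1 + 8 + 2)
# Accuracy is the prevalence-weighted mix of sensitivity and specificity
assert abs(accuracy - (prevalence * sensitivity + (1 - prevalence) * specificity)) < 1e-12
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, it can look acceptable even when sensitivity is modest. This is why sensitivity and AUC are worth reporting alongside it.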
Affiliation(s)
- Charles Verdonk
- French Armed Forces Biomedical Research Institute, Brétigny-sur-Orge, France
- Laureate Institute for Brain Research, Tulsa, Oklahoma, USA
- VIFASOM, Université Paris Cité, Paris, France
- A M Duffaud
- French Armed Forces Biomedical Research Institute, Brétigny-sur-Orge, France
- A Longin
- 125th Medical Unit of Lann Bihoué, Lorient, France
- M Bertrand
- 6th Special Medical Unit of Orléans-Bricy, Bricy, France
- F Zagnoli
- Department of Neurology, Clermont Tonnerre Military Hospital, Brest, France
- French Military Health Academy, Paris, France
- M Trousselard
- French Armed Forces Biomedical Research Institute, Brétigny-sur-Orge, France
- French Military Health Academy, Paris, France
- F Canini
- French Armed Forces Biomedical Research Institute, Brétigny-sur-Orge, France
- French Military Health Academy, Paris, France
41
Karabacak M, Margetis K. Precision medicine for traumatic cervical spinal cord injuries: accessible and interpretable machine learning models to predict individualized in-hospital outcomes. Spine J 2023; 23:1750-1763. [PMID: 37619871 DOI: 10.1016/j.spinee.2023.08.009] [Received: 05/07/2023] [Revised: 06/28/2023] [Accepted: 08/13/2023] [Indexed: 08/26/2023]
Abstract
BACKGROUND CONTEXT A traumatic spinal cord injury (SCI) can cause temporary or permanent motor and sensory impairment, leading to serious short and long-term consequences that can result in significant morbidity and mortality. The cervical spine is the most commonly affected area, accounting for about 60% of all traumatic SCI cases. PURPOSE This study aims to employ machine learning (ML) algorithms to predict various outcomes, such as in-hospital mortality, nonhome discharges, extended length of stay (LOS), extended length of intensive care unit stay (ICU-LOS), and major complications in patients diagnosed with cervical SCI (cSCI). STUDY DESIGN Our study was a retrospective machine learning classification study aiming to predict the outcomes of interest, which were binary categorical variables, in patients diagnosed with cSCI. PATIENT SAMPLE The data for this study were obtained from the American College of Surgeons (ACS) Trauma Quality Program (TQP) database, which was queried to identify patients who suffered from cSCI between 2019 and 2021. OUTCOME MEASURES The outcomes of interest of our study were in-hospital mortality, nonhome discharges, prolonged LOS, prolonged ICU-LOS, and major complications. The study evaluated the models' performance using both graphical and numerical methods. The receiver operating characteristic (ROC) and precision-recall curves (PRC) were used to assess model performance graphically. Numerical evaluation metrics included AUROC, balanced accuracy, weighted area under PRC (AUPRC), weighted precision, and weighted recall. METHODS The study employed data from the American College of Surgeons (ACS) Trauma Quality Program (TQP) database to identify patients with cSCI. Four ML algorithms, namely XGBoost, LightGBM, CatBoost, and Random Forest, were utilized to develop predictive models. The most effective models were then incorporated into a publicly available web application designed to forecast the outcomes of interest. 
RESULTS There were 71,661 patients included in the analysis for the outcome mortality, 67,331 for the outcome nonhome discharges, 76,782 for the outcome prolonged LOS, 26,615 for the outcome prolonged ICU-LOS, and 72,132 for the outcome major complications. The algorithms exhibited an AUROC value range of 0.78 to 0.839 for in-hospital mortality, 0.806 to 0.815 for nonhome discharges, 0.679 to 0.742 for prolonged LOS, 0.666 to 0.682 for prolonged ICU-LOS, and 0.637 to 0.704 for major complications. An open access web application was developed as part of the study, which can generate predictions for individual patients based on their characteristics. CONCLUSIONS Our study suggests that ML models can be valuable in assessing risk for patients with cSCI and may have considerable potential for predicting outcomes during hospitalization. ML models demonstrated good predictive ability for in-hospital mortality and nonhome discharges, fair predictive ability for prolonged LOS, but poor predictive ability for prolonged ICU-LOS and major complications. Along with these promising results, the development of a user-friendly web application that facilitates the integration of these models into clinical practice is a significant contribution of this study. The product of this study may have significant implications in clinical settings to personalize care, anticipate outcomes, and facilitate shared decision making and informed consent processes for cSCI patients.
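Weighted precision and weighted recall average the per-class scores, with weights proportional to each class's support (its count in the true labels). A minimal sketch follows. It assumes that a class which is never predicted gets precision 0, matching scikit-learn's `zero_division=0` convention.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Support-weighted precision and recall over the classes in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    weighted_precision = weighted_recall = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        precision = tp / predicted if predicted else 0.0  # never predicted -> 0
        recall = tp / count
        weighted_precision += (count / n) * precision
        weighted_recall += (count / n) * recall
    return weighted_precision, weighted_recall


y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0]
w_precision, w_recall = weighted_precision_recall(y_true, y_pred)
```

Support-weighted recall reduces algebraically to overall accuracy. This is one reason balanced accuracy is a useful companion metric when outcomes such as mortality are imbalanced.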
Affiliation(s)
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison (Ave), New York, 10029 NY, USA
- Konstantinos Margetis
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison (Ave), New York, 10029 NY, USA
42
Huang MW, Tsai CF, Tsui SC, Lin WC. Combining data discretization and missing value imputation for incomplete medical datasets. PLoS One 2023; 18:e0295032. [PMID: 38033140 PMCID: PMC10688879 DOI: 10.1371/journal.pone.0295032] [Received: 05/20/2023] [Accepted: 11/14/2023] [Indexed: 12/02/2023]
Abstract
Data discretization aims to transform a set of continuous features into discrete features, thus simplifying the representation of information and making it easier to understand, use, and explain. In practice, users can take advantage of the discretization process to improve knowledge discovery and data analysis on medical domain problem datasets containing continuous features. However, such datasets frequently contain missing feature values, and many data-mining algorithms cannot handle incomplete datasets. In this study, we considered the use of both discretization and missing-value imputation to process incomplete medical datasets, examining how the order in which discretization and missing-value imputation are combined influences performance. The experiments used seven medical domain problem datasets, two discretizers (the minimum description length principle (MDLP) and ChiMerge), three imputation methods (mean/mode, classification and regression tree (CART), and k-nearest neighbor (KNN)), and two classifiers (support vector machines (SVM) and the C4.5 decision tree). The results show that better performance can be obtained by first performing discretization followed by imputation, rather than vice versa. Furthermore, the highest classification accuracy rate was achieved by combining ChiMerge and KNN with SVM.
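The discretize-first ordering can be sketched in plain Python. For brevity, this sketch swaps in simple equal-width binning for MDLP or ChiMerge, both of which are supervised and choose cut points using the class label. Each missing bin is then imputed as the most common bin among the k nearest rows that observe that feature, where distance is the mismatch rate on the features both rows observe. This is not the authors' implementation, and the data are hypothetical.

```python
from collections import Counter


def equal_width_bins(column, n_bins):
    """Map numeric values to bin indices 0..n_bins-1; None stays missing."""
    observed = [v for v in column if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins if hi > lo else 1.0
    return [None if v is None else min(int((v - lo) / width), n_bins - 1) for v in column]


def knn_impute_mode(rows, k=2):
    """Fill each missing cell with the most common value among the k nearest rows."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                # Mismatch rate on features observed in both rows (1.0 if none shared)
                distance = sum(a != b for a, b in shared) / len(shared) if shared else 1.0
                candidates.append((distance, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = Counter(nearest).most_common(1)[0][0]
    return imputed


def discretize_then_impute(columns, n_bins=2, k=2):
    """Discretize every column first, then impute on the discretized rows."""
    binned = [equal_width_bins(column, n_bins) for column in columns]
    rows = [list(r) for r in zip(*binned)]
    return knn_impute_mode(rows, k)


age = [20, 25, None, 60, 65]
blood_pressure = [110, 115, 112, 150, 155]
result = discretize_then_impute([age, blood_pressure])
```

Imputing after discretization means the imputer only chooses among existing bins, rather than producing continuous estimates that a discretizer must later place. This is one plausible reason the ordering matters.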
Affiliation(s)
- Min-Wei Huang
- Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung, Taiwan
- Department of Physical Therapy and Graduate Institute of Rehabilitation Science, China Medical University, Taichung, Taiwan
- Chih-Fong Tsai
- Department of Information Management, National Central University, Taoyuan, Taiwan
- Shu-Ching Tsui
- Department of Information Management, National Central University, Taoyuan, Taiwan
- Wei-Chao Lin
- Department of Digital Financial Technology, Chang Gung University, Taoyuan, Taiwan
- Department of Information Management, Chang Gung University, Taoyuan, Taiwan
- Division of Thoracic Surgery, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
43
Hiraga K, Takeuchi M, Kimura T, Yoshida S, Kawakami K. Prediction models for in-hospital deaths of patients with COVID-19 using electronic healthcare data. Curr Med Res Opin 2023; 39:1463-1471. [PMID: 37828849 DOI: 10.1080/03007995.2023.2270420] [Received: 07/15/2023] [Accepted: 10/10/2023] [Indexed: 10/14/2023]
Abstract
OBJECTIVE Many models for predicting disease prognosis have achieved high performance without laboratory test results. However, whether laboratory test results can improve performance remains unclear. This study aimed to investigate whether laboratory test results improve model performance for coronavirus disease 2019 (COVID-19). METHODS Prediction models were developed using data from an electronic healthcare record database in Japan. Patients aged ≥18 years who were hospitalized for COVID-19 after February 11, 2020, were included. Data on age, sex, comorbidities, laboratory test results, and the number of days since February 11, 2020, were collected. We developed logistic regression, XGBoost, random forest, and neural network models and compared their performance with and without laboratory test results. Performance in predicting in-hospital death was evaluated using the area under the curve (AUC). RESULTS Data from 8,288 hospitalized patients (46.5% female) were analyzed. The median patient age was 71 years. The training dataset included 6,630 patients, of whom 312 (4.7%) died. In the logistic regression model, the AUC was 0.88 (95% confidence interval [CI] = 0.83-0.93) with laboratory test results and 0.75 (95% CI = 0.68-0.81) without them. Performance did not differ fundamentally between model types, and laboratory test results improved performance in all cases. The variables useful for prediction were blood urea nitrogen, albumin, and lactate dehydrogenase. CONCLUSIONS Laboratory test results, such as blood urea nitrogen, albumin, and lactate dehydrogenase levels, together with background information, helped estimate the prognosis of patients hospitalized for COVID-19.
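The AUC used to compare the models can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch of that pairwise definition follows; the risk scores are invented for illustration and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted risks for patients who died vs. survived.
died = [0.9, 0.7, 0.4]
survived = [0.8, 0.3, 0.2, 0.1]
print(auc(died, survived))  # 10 of 12 pairs concordant, about 0.833
```

The double loop is quadratic in the number of patients; for thousands of patients a rank-based computation of the same statistic is the usual choice.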
Affiliation(s)
- Kenichi Hiraga
- Department of Pharmacoepidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan
- Masato Takeuchi
- Department of Pharmacoepidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan
- Takeshi Kimura
- Research and Analytics Department, Real World Data Co., Ltd, Kyoto, Japan
- Satomi Yoshida
- Department of Pharmacoepidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan
- Koji Kawakami
- Department of Pharmacoepidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan

44
Rodríguez-Belenguer P, March-Vila E, Pastor M, Mangas-Sanjuan V, Soria-Olivas E. Usage of model combination in computational toxicology. Toxicol Lett 2023; 389:34-44. [PMID: 37890682 DOI: 10.1016/j.toxlet.2023.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/08/2023] [Revised: 10/17/2023] [Accepted: 10/24/2023] [Indexed: 10/29/2023]
Abstract
New Approach Methodologies (NAMs) have ushered in a new era in toxicology, aiming to replace animal testing. Despite these advances, however, they are not exempt from the inherent complexities of the endpoint under study. In this review, we identify three major groups of complexity: mechanistic, chemical space, and methodological. Mechanistic complexity arises from interconnected biological processes within a network that are difficult to model in a single step. Chemical space complexity arises when compounds in the training and test series are highly dissimilar. Methodological complexity encompasses limitations of algorithms and molecular descriptors as well as typical class imbalance problems. To address these complexities, this work provides a guide to combining predictive Quantitative Structure-Activity Relationship (QSAR) models into metamodels. Combining low-level models (LLMs) enables a more precise approach to the problem by focusing on different sub-mechanisms or sub-processes. For mechanistic complexity, multiple Molecular Initiating Events (MIEs) or levels of information are combined to form a mechanistic-based metamodel. For complexity arising from chemical space, two types of approaches to constructing a fragment-based chemical space metamodel were reviewed: those with and without structure sharing. Metamodels with structure sharing use unsupervised strategies to identify patterns in the data, build a low-level model for each cluster, and then combine these models. Where structures cannot be shared because of pharmaceutical industry intellectual property, prediction-sharing and federated learning approaches were reviewed.
Lastly, to tackle methodological complexity, different algorithms are combined to overcome their individual limitations, diverse descriptors are used to improve problem definition, and balanced dataset combinations are used to address class imbalance (methodological-based metamodels). Remarkably, metamodels consistently outperformed classical QSAR models in all cases, highlighting the importance of alternatives to classical QSAR models when such complexities arise.
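One of the simplest ways to combine low-level models is to average their predicted probabilities, optionally with weights, and then apply a threshold (often called soft voting). The sketch below shows only that averaging scheme; the function name, the model outputs, and the weights are hypothetical, and it does not reproduce the mechanistic, fragment-based, or federated metamodels the review describes.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Combine per-model positive-class probabilities by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    score = sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
    return score, int(score >= threshold)

# Hypothetical probabilities from three low-level models for one compound,
# e.g. one model per molecular initiating event (illustrative only).
low_level_outputs = [0.62, 0.35, 0.71]
print(soft_vote(low_level_outputs))                     # score about 0.56, class 1
print(soft_vote(low_level_outputs, weights=[1, 3, 1]))  # score about 0.476, class 0
```

Changing the weights flips the combined call here, which is why metamodels usually learn or validate the combination rule rather than fixing it by hand.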
Affiliation(s)
- Pablo Rodríguez-Belenguer
- Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain; Department of Pharmacy and Pharmaceutical Technology and Parasitology, Universitat de València, 46100 Valencia, Spain
- Eric March-Vila
- Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain
- Manuel Pastor
- Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain
- Victor Mangas-Sanjuan
- Department of Pharmacy and Pharmaceutical Technology and Parasitology, Universitat de València, 46100 Valencia, Spain; Interuniversity Research Institute for Molecular Recognition and Technological Development, Universitat Politècnica de València, 46100 Valencia, Spain
- Emilio Soria-Olivas
- IDAL, Intelligent Data Analysis Laboratory, ETSE, Universitat de València, 46100 Valencia, Spain

45
Saiwaeo S, Arwatchananukul S, Mungmai L, Preedalikit W, Aunsri N. Human skin type classification using image processing and deep learning approaches. Heliyon 2023; 9:e21176. [PMID: 38027689 PMCID: PMC10656243 DOI: 10.1016/j.heliyon.2023.e21176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 10/11/2023] [Accepted: 10/17/2023] [Indexed: 12/01/2023]
Abstract
Cosmetics consumers need to know their skin type before purchasing products. Identifying skin type can be challenging, especially when the skin varies from oily to dry in different areas, and assessment by a skin specialist gives more accurate results. In recent years, artificial intelligence and machine learning have been used across various fields, including medicine, to assist in identification and prediction. This study developed a skin type classification model using convolutional neural network (CNN) deep learning algorithms. The dataset consisted of normal, oily, and dry skin images: 112 images of normal skin, 120 of oily skin, and 97 of dry skin. Image quality was enhanced using Contrast Limited Adaptive Histogram Equalization (CLAHE), and data augmentation by rotation was applied to increase dataset variety, yielding a total of 1,316 images. CNN architectures including MobileNet-V2, EfficientNet-V2, InceptionV2, and ResNet-V1 were optimized and evaluated. The EfficientNet-V2 architecture performed best, achieving an accuracy of 91.55% with an average loss of 22.74%. Hyperparameter tuning further improved the model, yielding an accuracy of 94.57% and a loss of 13.77%. Model performance was validated using 10-fold cross-validation and tested on unseen data, achieving an accuracy of 89.70% with a loss of 21.68%.
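The image counts are consistent with four orientations per original image: 112 + 120 + 97 = 329 originals, and 4 × 329 = 1,316. The abstract does not state the rotation angles, so the sketch below assumes rotations by 0°, 90°, 180°, and 270°, represents an image as a nested list of pixel values, and uses hypothetical helper names; it does not perform CLAHE.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def four_rotations(image):
    """Return the image at 0, 90, 180 and 270 degrees (assumed augmentation)."""
    out = [image]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out

counts = {"normal": 112, "oily": 120, "dry": 97}
originals = sum(counts.values())
print(originals, 4 * originals)   # 329 1316

tiny = [[1, 2],
        [3, 4]]
print(four_rotations(tiny)[1])    # [[3, 1], [4, 2]]
```

Because rotated copies of one photograph are strongly correlated, augmentation of this kind is normally applied after splitting data into training and test folds so that copies of the same image do not appear on both sides.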
Affiliation(s)
- Sirawit Saiwaeo
- School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand
- Sujitra Arwatchananukul
- School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand
- Integrated AgriTech Ecosystem Research Group (IATE), Mae Fah Luang University, Chiang Rai, Thailand
- Lapatrada Mungmai
- Division of Cosmetic Science, School of Pharmaceutical Sciences, University of Phayao, Phayao, Thailand
- Research and Innovation Center in Cosmetic Sciences and Natural products, Division of Cosmetic Sciences, School of Pharmaceutical Sciences, University of Phayao, Phayao, Thailand
- Weeraya Preedalikit
- Division of Cosmetic Science, School of Pharmaceutical Sciences, University of Phayao, Phayao, Thailand
- Research and Innovation Center in Cosmetic Sciences and Natural products, Division of Cosmetic Sciences, School of Pharmaceutical Sciences, University of Phayao, Phayao, Thailand
- Nattapol Aunsri
- School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand
- Integrated AgriTech Ecosystem Research Group (IATE), Mae Fah Luang University, Chiang Rai, Thailand

46
Karabacak M, Margetis K. Development of personalized machine learning-based prediction models for short-term postoperative outcomes in patients undergoing cervical laminoplasty. Eur Spine J 2023; 32:3857-3867. [PMID: 37698693 DOI: 10.1007/s00586-023-07923-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 08/16/2023] [Accepted: 08/27/2023] [Indexed: 09/13/2023]
Abstract
PURPOSE Predicting short-term postoperative outcomes before surgery could enable more accurate patient care strategies for patients undergoing cervical laminoplasty (CLP) and reduce the likelihood of adverse outcomes. In this study, we developed a series of machine learning (ML) models for predicting short-term postoperative outcomes and integrated them into an open-source online application. METHODS The National Surgical Quality Improvement Program database was used to identify individuals who had undergone CLP. The investigated outcomes were prolonged length of stay (LOS), non-home discharge, 30-day readmission, unplanned reoperation, and major complications. ML models were developed and implemented on a website to predict these five outcomes. RESULTS A total of 1740 patients who underwent CLP were included in the analysis. Performance evaluation indicated that the top-performing models for each outcome were built with the TabPFN and LightGBM algorithms. The TabPFN models yielded AUROCs of 0.830, 0.847, and 0.858 for predicting non-home discharge, unplanned reoperation, and major complications, respectively. The LightGBM models yielded AUROCs of 0.812 and 0.817 for predicting prolonged LOS and 30-day readmission, respectively. CONCLUSION ML approaches have significant potential for predicting postoperative outcomes following spine surgery. As the volume of data in spine surgery continues to grow, predictive models developed as clinically relevant decision-making tools could significantly improve risk assessment and prognosis. Here, we present an accessible predictive model for short-term postoperative outcomes following CLP, intended to meet these objectives.
Affiliation(s)
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, New York, NY, USA

47
Zhao Y, Chen X, Xue H, Weiss GM. A machine learning approach to graduate admissions and the role of letters of recommendation. PLoS One 2023; 18:e0291107. [PMID: 37878617 PMCID: PMC10599576 DOI: 10.1371/journal.pone.0291107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 08/22/2023] [Indexed: 10/27/2023]
Abstract
The graduate admissions process is time-consuming, subjective, and complicated by the need to combine information from diverse data sources. Letters of recommendation (LORs) are particularly difficult to evaluate, and it is unclear how much impact they have on admissions decisions. This study addresses these concerns by building machine learning models to predict admissions decisions for two STEM graduate programs, with a focus on the contribution of LORs to the decision-making process. We train our predictive models on information extracted from structured application forms (e.g., undergraduate GPA and standardized test scores), applicants' resumes, and LORs. A particular challenge in our study is the different modalities of application data (i.e., text vs. structured forms). To address this issue, we converted the textual LORs into features using a commercial natural language processing product and a manual rating process that we developed. By analyzing the predictive performance of models trained on different subsets of features, we show that LORs alone provide only a modest but useful predictive signal for admissions decisions; the best model used both LOR and non-LOR data and achieved 89% accuracy. Our experiments show promising results for using automated systems to assist with graduate admissions decisions. The findings confirm the value of LORs and the effectiveness of our feature engineering methods for LOR text. This study also assesses the importance of individual features using the SHAP method, providing insight into key factors affecting graduate admissions decisions.
Affiliation(s)
- Yijun Zhao
- Computer and Information Sciences Department, Fordham University, New York, NY, United States of America
- Xiaoyu Chen
- Computer and Information Sciences Department, Fordham University, New York, NY, United States of America
- Haoran Xue
- Computer and Information Sciences Department, Fordham University, New York, NY, United States of America
- Gary M. Weiss
- Computer and Information Sciences Department, Fordham University, New York, NY, United States of America

48
Juan CK, Su YH, Wu CY, Yang CS, Hsu CH, Hung CL, Chen YJ. Deep convolutional neural network with fusion strategy for skin cancer recognition: model development and validation. Sci Rep 2023; 13:17087. [PMID: 37816815 PMCID: PMC10564722 DOI: 10.1038/s41598-023-42693-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 09/13/2023] [Indexed: 10/12/2023]
Abstract
We aimed to develop an accurate and efficient skin cancer classification system using deep learning with a relatively small dataset of clinical images. We propose a novel skin cancer classification method, SkinFLNet, which uses model fusion and lifelong learning. SkinFLNet's deep convolutional neural networks were trained on a dataset of 1215 clinical images of skin tumors diagnosed at Taichung and Taipei Veterans General Hospitals between 2015 and 2020. The dataset comprised five categories: benign nevus, seborrheic keratosis, basal cell carcinoma, squamous cell carcinoma, and malignant melanoma. SkinFLNet's performance was evaluated on 463 clinical images collected between January and December 2021. SkinFLNet achieved an overall classification accuracy of 85%, precision of 85%, recall of 82%, F-score of 82%, sensitivity of 82%, and specificity of 93%, outperforming other deep convolutional neural network models. We also compared SkinFLNet's performance with that of three board-certified dermatologists; its average overall performance was comparable to, or even better than, that of the dermatologists. Our study presents an efficient skin cancer classification system based on model fusion and lifelong learning that can be trained on a relatively small dataset. This system could potentially improve skin cancer screening accuracy in clinical practice.
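For a five-class problem such as this one, overall precision, recall, and F-score are usually obtained by averaging per-class values, but the abstract does not say which averaging was used. The sketch below assumes macro averaging (an unweighted mean over classes); the function name, class abbreviations, and labels are hypothetical and are not the study's data.

```python
def macro_scores(y_true, y_pred, classes):
    """Macro-averaged precision, recall and F1 (unweighted mean over classes)."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Undefined ratios (zero denominators) are set to 0.0 by convention here.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Hypothetical labels for the five lesion categories (not study data).
classes = ["BN", "SK", "BCC", "SCC", "MM"]
y_true = ["BN", "BN", "SK", "BCC", "SCC", "MM"]
y_pred = ["BN", "SK", "SK", "BCC", "BCC", "MM"]
print(macro_scores(y_true, y_pred, classes))  # approximately (0.6, 0.7, 0.6)
```

Macro averaging gives a rare class such as malignant melanoma the same weight as a common one; a support-weighted average over the same predictions would generally give different numbers.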
Affiliation(s)
- Chao-Kuei Juan
- Department of Dermatology, Taichung Veterans General Hospital, Taichung, Taiwan
- Department of Dermatology, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yu-Hao Su
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chen-Yi Wu
- Department of Dermatology, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Department of Dermatology, Taipei Veterans General Hospital, Taipei, Taiwan
- Chi-Shun Yang
- Department of Pathology, Taichung Veterans General Hospital, Taichung, Taiwan
- Chung-Hao Hsu
- Department of Dermatology, Taichung Veterans General Hospital, Taichung, Taiwan
- Che-Lun Hung
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yi-Ju Chen
- Department of Dermatology, Taichung Veterans General Hospital, Taichung, Taiwan
- Department of Dermatology, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Department of Post-Baccalaureate Medicine, Chung-Hsing University, Taichung, Taiwan

49
Haghish EF, Czajkowski NO, von Soest T. Predicting suicide attempts among Norwegian adolescents without using suicide-related items: a machine learning approach. Front Psychiatry 2023; 14:1216791. [PMID: 37822798 PMCID: PMC10562596 DOI: 10.3389/fpsyt.2023.1216791] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/04/2023] [Accepted: 09/04/2023] [Indexed: 10/13/2023]
Abstract
Introduction Research on classification models of suicide attempts has depended predominantly on collecting sensitive suicide-related data. Gathering this type of information at the population level can be challenging, especially for adolescents. We addressed two main objectives: (1) assessing the feasibility of classifying adolescents at high risk of attempting suicide without relying on specific suicide-related survey items such as history of suicide attempts, suicide plans, or suicidal ideation, and (2) identifying the most important predictors of suicide attempts among adolescents. Methods Nationwide survey data from 173,664 Norwegian adolescents (ages 13-18) were used to train a binary classification model on 169 questionnaire items. The Extreme Gradient Boosting (XGBoost) algorithm was fine-tuned to classify adolescent suicide attempts, and the most important predictors were identified. Results XGBoost achieved a sensitivity of 77%, a specificity of 90%, an AUC of 92.1%, and an AUPRC of 47.1%. A coherent set of predictors in the domains of internalizing problems, substance use, interpersonal relationships, and victimization emerged as the items most strongly related to recent suicide attempts. Conclusion This study underscores the potential of machine learning for population-scale screening of adolescent suicide attempts without requiring sensitive suicide-related survey items. Future research on the etiology of suicidal behavior may direct particular attention to internalizing problems, interpersonal relationships, victimization, and substance use.
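The gap between the AUC (92.1%) and the AUPRC (47.1%) largely reflects class imbalance: when the outcome is rare, even a fairly specific classifier produces many false positives relative to true positives, so precision stays low. The worked sketch below applies Bayes' rule to the reported sensitivity (77%) and specificity (90%); the prevalence values are hypothetical, because the abstract does not report the share of adolescents with a recent attempt.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value (precision) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Reported operating point; prevalences below are illustrative assumptions.
for prevalence in (0.02, 0.05, 0.20):
    print(prevalence, round(ppv(0.77, 0.90, prevalence), 3))
```

At a hypothetical 5% prevalence this gives a precision of roughly 0.29, rising to roughly 0.66 at 20%, even though sensitivity and specificity are unchanged.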
Affiliation(s)
- E. F. Haghish
- Department of Psychology, Faculty of Social Sciences, University of Oslo, Oslo, Norway
- Nikolai O. Czajkowski
- Department of Psychology, Faculty of Social Sciences, University of Oslo, Oslo, Norway
- Department of Mental Disorders, Division of Mental and Physical Health, Norwegian Institute of Public Health (NIPH), Oslo, Norway
- Tilmann von Soest
- Department of Psychology, Faculty of Social Sciences, University of Oslo, Oslo, Norway
- Norwegian Social Research (NOVA), Oslo Metropolitan University, Oslo, Norway

50
Almannaa M, Zawad MN, Moshawah M, Alabduljabbar H. Investigating the effect of road condition and vacation on crash severity using machine learning algorithms. Int J Inj Contr Saf Promot 2023; 30:392-402. [PMID: 37079354 DOI: 10.1080/17457300.2023.2202660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 03/14/2023] [Accepted: 04/10/2023] [Indexed: 04/21/2023]
Abstract
Investigating the factors that contribute to traffic crash severity is an important topic in traffic safety and policy research. This research investigates the impact of 16 roadway condition features and vacations (along with spatial and temporal factors and road geometry) on crash severity on major intra-city roads in Saudi Arabia. We used a crash dataset covering four years (Oct. 2016 - Feb. 2021) with more than 59,000 crashes. Machine learning algorithms were used to predict crash severity (non-fatal/fatal) for three types of roads: single, multilane, and freeway. Features with a strong impact on crash severity were also examined. Results show that only 4 of the 16 road condition variables contributed to crash severity: paints, cat eyes, fence side, and metal cable. Vacation was also found to contribute to crash severity, meaning that crashes occurring during vacations are more severe than those on non-vacation days.
Affiliation(s)
- Mohammed Almannaa
- Department of Civil Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia
- Md Nabil Zawad
- Department of Civil Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia
- May Moshawah
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Haifa Alabduljabbar
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia