1
Tyagi A, Singh VP, Gore MM. Spatial and frequency domain-based feature fusion for accurate detection of schizophrenia using AI-driven approaches. Health Inf Sci Syst 2025; 13:32. [PMID: 40224734 PMCID: PMC11992288 DOI: 10.1007/s13755-025-00345-7] [Received: 02/05/2024] [Accepted: 02/25/2025] [Indexed: 04/15/2025] [Open access]
Abstract
Schizophrenia is a neuropsychiatric disorder that hampers brain functions and causes hallucinations, delusions, and bizarre behavior. The stigmatization associated with this disabling disorder drives the need to build highly reliable diagnostic models. Neuroimaging modalities such as structural MRI are coupled with machine learning techniques to diagnose schizophrenia with increased reliability. We investigate the structural aberrations present in structural MR images using machine learning techniques. In this study, we propose a new hybrid approach that uses spatial and frequency domain-based features for the early automated detection of schizophrenia. The spatial (texture) features are extracted using the local binary pattern method, and frequency-based features, including magnitude and phase, are extracted using the fast Fourier transform. Hybrid features, combining spatial and frequency-based features, are used for schizophrenia classification with support vector machine, random forest, and k-nearest neighbor classifiers under stratified 10-fold cross-validation. The support vector machine and random forest classifiers achieve encouraging detection performance on the hybrid feature set, with 86.5% and 85.1% accuracy, respectively. Among the three classifiers, k-nearest neighbor performs best, with an accuracy of 98.1%. The precision and recall achieved by the k-nearest neighbor classifier are 98.1% and 98.0%, respectively, reflecting accurate detection of schizophrenia by the proposed model.
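The feature-fusion idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the basic 3x3 local binary pattern variant, the naive discrete Fourier transform (standing in for an FFT routine), and the toy 4x4 "image" are all assumptions made for illustration, and the classifier stage is omitted.

```python
import cmath
import math


def lbp_histogram(img):
    """Normalised 256-bin histogram of basic 3x3 LBP codes (interior pixels only)."""
    # Eight neighbours, clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def dft2_features(img):
    """Magnitude and phase of the 2-D DFT (naive version of what an FFT computes)."""
    rows, cols = len(img), len(img[0])
    mags, phases = [], []
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * math.pi * (u * r / rows + v * c / cols)
                    acc += img[r][c] * cmath.exp(1j * angle)
            mags.append(abs(acc))
            phases.append(cmath.phase(acc))
    return mags, phases


def hybrid_features(img):
    """Concatenate spatial (LBP) and frequency (magnitude, phase) features."""
    mags, phases = dft2_features(img)
    return lbp_histogram(img) + mags + phases


# Hypothetical toy image; a real pipeline would use structural MRI slices.
toy = [[10, 20, 30, 40], [20, 30, 40, 50], [30, 40, 50, 60], [40, 50, 60, 70]]
features = hybrid_features(toy)
```

The resulting vector (256 texture bins plus one magnitude and one phase value per frequency bin) is the kind of hybrid input that would then be passed to a classifier under stratified cross-validation.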
Affiliation(s)
- Ashima Tyagi
  - Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, 211004 India
- Vibhav Prakash Singh
  - Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, 211004 India
- Manoj Madhava Gore
  - Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, 211004 India

2
Seal S, Mahale M, García-Ortegón M, Joshi CK, Hosseini-Gerami L, Beatson A, Greenig M, Shekhar M, Patra A, Weis C, Mehrjou A, Badré A, Paisley B, Lowe R, Singh S, Shah F, Johannesson B, Williams D, Rouquie D, Clevert DA, Schwab P, Richmond N, Nicolaou CA, Gonzalez RJ, Naven R, Schramm C, Vidler LR, Mansouri K, Walters WP, Wilk DD, Spjuth O, Carpenter AE, Bender A. Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World. Chem Res Toxicol 2025; 38:759-807. [PMID: 40314361 DOI: 10.1021/acs.chemrestox.5c00033] [Indexed: 05/03/2025]
Abstract
Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, toxicity-related end points have always been challenging to evaluate experimentally with respect to in vivo translation due to the required resources for human and animal studies; this has impacted data availability in the field. ML can augment or even potentially replace traditional experimental processes depending on the project phase and specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target or ineffective due to unfavorable pharmacokinetics). However, reliance on ML is not without risks, due to biases stemming from nonrepresentative training data, incompatible choice of algorithm to represent the underlying data, or poor model building and validation approaches. This might lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This perspective emphasizes the need to enhance the understanding and application of machine learning models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.
Affiliation(s)
- Srijit Seal
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
- Manas Mahale
  - Department of Pharmaceutical Chemistry, Bombay College of Pharmacy, Mumbai 400098, India
- Chaitanya K Joshi
  - Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K.
- Alex Beatson
  - Axiom Bio, San Francisco, California 94107, United States
- Matthew Greenig
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
- Mrinal Shekhar
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Adrien Badré
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Brianna Paisley
  - Eli Lilly & Company, Indianapolis, Indiana 46285, United States
- Shantanu Singh
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Falgun Shah
  - Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States
- David Rouquie
  - Toxicology Data Science, Bayer SAS Crop Science Division, Valbonne Sophia-Antipolis 06560, France
- Djork-Arné Clevert
  - Pfizer, Worldwide Research, Development and Medical, Machine Learning & Computational Sciences, Berlin 10922, Germany
- Christos A Nicolaou
  - Computational Drug Design, Digital Science & Innovation, Novo Nordisk US R&D, Lexington, Massachusetts 02421, United States
- Raymond J Gonzalez
  - Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States
- Russell Naven
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Kamel Mansouri
  - NIH/NIEHS/DTT/NICEATM, Research Triangle Park, North Carolina 27709, United States
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala 751 24, Sweden
  - Phenaros Pharmaceuticals AB, Uppsala 75239, Sweden
- Anne E Carpenter
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Andreas Bender
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
  - College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi 127788, United Arab Emirates

3
Gholamzadeh M, Safdari R, Asadi Gharabaghi M, Abtahi H. Analysis of the most influential factors affecting outcomes of lung transplant recipients: a multivariate prediction model based on UNOS Data. BMJ Open 2025; 15:e089796. [PMID: 40379311 DOI: 10.1136/bmjopen-2024-089796] [Indexed: 05/19/2025] [Open access]
Abstract
OBJECTIVES: In lung transplantation (LTx), a priority is assigned to each candidate on the waiting list. Our primary objective was to identify the key factors that influence the allocation of priorities in LTx using machine learning (ML) techniques to enhance the process of prioritising patients. DESIGN: Developing a prediction model. SETTING AND PARTICIPANTS: Our data were retrieved from the United Network for Organ Sharing (UNOS) open-source database of transplant patients between 2005 and 2023. INTERVENTIONS: After the preprocessing process, a feature engineering technique was employed to select the most relevant features. Then, six ML models with optimised hyperparameters including multiple linear regression, random forest regressor (RF), support vector machine regressor, XGBoost regressor, a multilayer perceptron model and a deep learning model were developed based on the UNOS dataset. PRIMARY AND SECONDARY OUTCOME MEASURES: The performance of each model was evaluated using R-squared (R2) and other error rate metrics. Next, the Shapley Additive Explanations (SHAP) technique was used to identify the most important features in the prediction. RESULTS: The raw dataset contains 196 270 records with 545 features in all organs. After preprocessing, 32 966 records with 15 features remain. Among various models, the RF model achieved a high R2 score. Additionally, the RF model exhibited the lowest error values, indicating its superior precision compared with other regression models. The SHAP technique in conjunction with the RF model revealed the 11 most important features for priority allocation. Subsequently, we developed a web-based decision support tool using Python and the Streamlit framework based on the best-fine-tuned model. CONCLUSION: The deployment of the ML model has the potential to act as an automated tool to aid physicians in assessing the priority of lung transplants and identifying significant factors that play a role in patient survival.
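For readers unfamiliar with the regression metrics mentioned in this abstract, the sketch below shows how R-squared can be computed alongside two common error metrics. The abstract does not name its "other error rate metrics", so the choice of mean absolute error and root mean squared error is an assumption, and the four data points are hypothetical.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical targets and predictions, not values from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

r2 = r_squared(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
```

A model with a higher R-squared and lower error values on held-out data would be preferred, which is the comparison the abstract reports for the random forest regressor.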
Affiliation(s)
- Marsa Gholamzadeh
  - Health Information Management and Medical Informatics Department, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)
- Reza Safdari
  - Health Information Management and Medical Informatics Department, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)
- Mehrnaz Asadi Gharabaghi
  - Department of Pulmonary Medicine, Faculty of Medicine, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)
- Hamidreza Abtahi
  - Pulmonary and Critical Care Medicine Department, Thoracic Research Center, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)

4
Zhao X, Wang Y, Li J, Liu W, Yang Y, Qiao Y, Liao J, Chen M, Li D, Wu B, Huang D, Wu D. A machine-learning-derived online prediction model for depression risk in COPD patients: A retrospective cohort study from CHARLS. J Affect Disord 2025; 377:284-293. [PMID: 39988142 DOI: 10.1016/j.jad.2025.02.063] [Received: 11/02/2024] [Revised: 02/14/2025] [Accepted: 02/17/2025] [Indexed: 02/25/2025]
Abstract
BACKGROUND: Depression associated with Chronic Obstructive Pulmonary Disease (COPD) is a detrimental complication that significantly impairs patients' quality of life. This study aims to develop an online predictive model to estimate the risk of depression in COPD patients. METHODS: This study included 2921 COPD patients from the 2018 China Health and Retirement Longitudinal Study (CHARLS), analyzing 36 behavioral, health, psychological, and socio-demographic indicators. LASSO regression filtered predictive factors, and six machine learning models (Logistic Regression, Support Vector Machine, Multilayer Perceptron, LightGBM, XGBoost, and Random Forest) were applied to identify the best model for predicting depression risk in COPD patients. Temporal validation used 2013 CHARLS data. We developed a personalized, interpretable risk prediction platform using SHAP. RESULTS: A total of 2921 patients with COPD were included in the analysis, of whom 1451 (49.7%) presented with depressive symptoms. Eleven variables were selected to develop the six machine learning models. Among these, the XGBoost model exhibited exceptional predictive performance in terms of discrimination, calibration, and clinical applicability, with an AUROC range of 0.747-0.811. In validation sets encompassing diverse population characteristics, XGBoost achieved the highest accuracy (70.63%), sensitivity (59.05%), and F1 score (63.17%). LIMITATIONS: The target population for the model is COPD patients, and the clinical benefits of interventions based on the prediction results remain uncertain. CONCLUSION: We developed an online prediction platform for clinical application, allowing healthcare professionals to swiftly and efficiently evaluate the risk of depression in COPD patients, facilitating timely interventions and treatments.
Affiliation(s)
- Xuanna Zhao
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Yunan Wang
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Jiahua Li
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Weiliang Liu
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Yuting Yang
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Youping Qiao
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Jinyu Liao
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Min Chen
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Dongming Li
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Bin Wu
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Dan Huang
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China
- Dong Wu
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524013, China

5
Toribio-Celestino L, San Millan A. Plasmid-bacteria associations in the clinical context. Trends Microbiol 2025:S0966-842X(25)00122-2. [PMID: 40374465 DOI: 10.1016/j.tim.2025.04.011] [Received: 02/28/2025] [Revised: 04/11/2025] [Accepted: 04/15/2025] [Indexed: 05/17/2025]
Abstract
Antimicrobial resistance (AMR) is one of the most pressing global health problems, with plasmids playing a central role in its evolution and dissemination. Over the past decades, many studies have investigated the ecoevolutionary dynamics between plasmids and their bacterial hosts. However, what drives the epidemiological success of certain plasmid-bacterium associations remains unclear. In this opinion article, we review which factors influence these associations and underline that studying plasmid-host interactions of clinical relevance is critical for understanding the evolution and spread of AMR. We also highlight the increasing importance of integrating experimental research with bioinformatics and machine learning tools to study plasmid-bacteria dynamics. This combined approach will assist researchers to dissect the molecular mechanisms underlying successful plasmid-host associations and to design strategies to prevent and predict future high-risk associations.
Affiliation(s)
- Alvaro San Millan
  - Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain; Centro de Investigación Biológica en Red de Epidemiología y Salud Pública, Instituto de Salud Carlos III, Madrid, Spain

6
Pramanick N, Mathew J, Selvarajan S, Agarwal M. Leveraging stacking machine learning models and optimization for improved cyberattack detection. Sci Rep 2025; 15:16757. [PMID: 40369010 PMCID: PMC12078668 DOI: 10.1038/s41598-025-01052-9] [Received: 12/09/2024] [Accepted: 05/02/2025] [Indexed: 05/16/2025] [Open access]
Abstract
The ever-growing number of complex cyberattacks calls for high-performance intrusion detection systems (IDS). While existing research covers traditional, hybrid, and ensemble methods for network data analysis, producing robust and highly accurate detection systems remains a serious challenge. High-dimensional network traffic is difficult to manage because current methodologies struggle with class imbalance between minority and majority classes and with high false positive rates. This study introduces a framework that addresses these persistent challenges through a novel approach to intrusion detection. The proposed method integrates two ML models, J48 and ExtraTreeClassifier, for classification. In addition, we propose an enhanced equilibrium optimizer (EEO) that modifies the original equilibrium optimizer (EO). In the EEO, the Fisher score and the accuracy score of the K-Nearest Neighbors (KNN) algorithm are used to select attributes optimally, whereas the synthetic minority oversampling technique combined with iterative partitioning filters (SMOTE-IPF) is used to provide class balancing. The KNN technique is also used for data imputation to improve overall system accuracy. The framework has been validated experimentally on two benchmark datasets, NSL-KDD and UNSW-NB15, achieving 99.7% and 98.1% accuracy and F1 scores of 99.6 and 98.0, respectively. A comparative analysis with recent state-of-the-art works shows that the proposed methodology improves feature selection precision, classification accuracy, and handling of minority class instances, while reducing storage demands and improving computational efficiency.
Affiliation(s)
- Neha Pramanick
  - Computer Science and Engineering, IIT Patna, Patna, Bihar, 801103, India
- Jimson Mathew
  - Computer Science and Engineering, IIT Patna, Patna, Bihar, 801103, India
- Shitharth Selvarajan
  - Department of Computer Science, Kebri Dehar University, 250, Kebri Dehar, Ethiopia
  - Department of Computer Science and Engineering, Chennai Institute of Technology, Chennai, India
  - Centre for Research Impact & Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, 140401, India
- Mayank Agarwal
  - Computer Science and Engineering, IIT Patna, Patna, Bihar, 801103, India

7
Mariotti F, Agostini A, Borgheresi A, Marchegiani M, Zannotti A, Giacomelli G, Pierpaoli L, Tola E, Galiffa E, Giovagnoni A. Insights into radiomics: a comprehensive review for beginners. Clin Transl Oncol 2025:10.1007/s12094-025-03939-5. [PMID: 40355777 DOI: 10.1007/s12094-025-03939-5] [Received: 03/27/2025] [Accepted: 04/16/2025] [Indexed: 05/15/2025]
Abstract
Radiomics and artificial intelligence (AI) are rapidly evolving, significantly transforming the field of medical imaging. Despite their growing adoption, these technologies remain challenging to approach due to their technical complexity. This review serves as a practical guide for early-career radiologists and researchers seeking to integrate radiomics into their studies. It provides practical insights for clinical and research applications, addressing common challenges, limitations, and future directions in the field. This work offers a structured overview of the essential steps in the radiomics workflow, focusing on concrete aspects of each step, including indicative and practical examples. It covers the main steps such as dataset definition, image acquisition and preprocessing, segmentation, feature extraction and selection, and AI model training and validation. Different methods to be considered are discussed, accompanied by summary diagrams. This review equips readers with the knowledge necessary to approach radiomics and AI in medical imaging from a hands-on research perspective.
Affiliation(s)
- Francesco Mariotti
  - Department of Clinical, Special and Dental Sciences, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
- Andrea Agostini
  - Department of Clinical, Special and Dental Sciences, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
  - Department of Radiological Sciences - Division of Clinical Radiology, University Hospital "Azienda Ospedaliero Universitaria delle Marche", Via Conca, 71, 60126, Ancona, Italy
- Alessandra Borgheresi
  - Department of Clinical, Special and Dental Sciences, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
  - Department of Radiological Sciences - Division of Clinical Radiology, University Hospital "Azienda Ospedaliero Universitaria delle Marche", Via Conca, 71, 60126, Ancona, Italy
- Marzia Marchegiani
  - School of Radiology, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
- Alice Zannotti
  - School of Radiology, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
- Gloria Giacomelli
  - School of Radiology, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
- Luca Pierpaoli
  - School of Radiology, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
- Elisabetta Tola
  - School of Radiology, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
- Elena Galiffa
  - School of Radiology, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
- Andrea Giovagnoni
  - Department of Clinical, Special and Dental Sciences, University Politecnica delle Marche, Via Tronto, 10/A, 60126, Ancona, Italy
  - Department of Radiological Sciences - Division of Clinical Radiology, University Hospital "Azienda Ospedaliero Universitaria delle Marche", Via Conca, 71, 60126, Ancona, Italy

8
Wang P, Liu L, Xie Z, Ren G, Hu Y, Shen M, Wang H, Wang J, Wang Y, Wu XT. Explainable Machine Learning Models for Prediction of Surgical Site Infection After Posterior Lumbar Fusion Surgery Based on Shapley Additive Explanations. World Neurosurg 2025; 197:123942. [PMID: 40154601 DOI: 10.1016/j.wneu.2025.123942] [Received: 03/14/2025] [Accepted: 03/18/2025] [Indexed: 04/01/2025]
Abstract
OBJECTIVE: This study aims to develop machine learning (ML) models combined with an explainable method for the prediction of surgical site infection (SSI) after posterior lumbar fusion surgery. METHODS: In this retrospective, single-center study, a total of 1016 consecutive patients who underwent posterior lumbar fusion surgery were included. A comprehensive dataset was established, encompassing demographic variables, comorbidities, preoperative evaluation, details related to diagnosed lumbar disease, preoperative laboratory tests, surgical specifics, and postoperative factors. Utilizing this dataset, six ML models were developed to predict the occurrence of SSI. Performance evaluation of the models on the testing set involved several metrics, including the receiver operating characteristic curve, the area under the receiver operating characteristic curve, accuracy, recall, F1 score, and precision. The Shapley Additive Explanations (SHAP) method was employed to generate interpretable predictions, enabling a comprehensive assessment of SSI risk and providing individualized interpretations of the model results. RESULTS: Among the 1016 retrospective cases included in the study, 36 (3.54%) experienced SSI. Out of the six models examined, the Extreme Gradient Boost model demonstrated the highest discriminatory performance on the testing set, achieving the following metrics: precision (0.9000), recall (0.8182), accuracy (0.9902), F1 score (0.8571), and area under the receiver operating characteristic curve (0.9447). By utilizing the SHAP method, several important predictors of SSI were identified, including the duration of indwelling jugular vein catheter, blood urea nitrogen levels, total protein levels, sustained fever, creatinine levels, triglyceride levels, monocyte count, diabetes mellitus, drainage time, white blood cell count, cerebral infarction, estimated blood loss, prealbumin levels, Prognostic Nutritional Index, low back pain, posterior fusion score, and osteoporosis. CONCLUSIONS: ML-based prediction tools can accurately assess the risk of SSI after posterior lumbar fusion surgery. Additionally, ML combined with SHAP could provide a clear interpretation of individualized risk prediction and give physicians an intuitive comprehension of the effects of the model's essential features.
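The relationship between the reported precision, recall, F1 score, and accuracy can be made concrete with a small worked example. The abstract does not give the confusion matrix, so the counts below (9 true positives, 1 false positive, 2 false negatives, 294 true negatives) are hypothetical values chosen only because they reproduce the reported figures after rounding; they should not be read as the study's actual test-set counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts (an assumption, not taken from the paper).
metrics = classification_metrics(tp=9, fp=1, fn=2, tn=294)
rounded = {name: round(value, 4) for name, value in metrics.items()}
```

With so few positive cases, accuracy is dominated by the true negatives, which is why precision, recall, and F1 are the more informative figures for a rare outcome such as SSI.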
Affiliation(s)
- PeiYang Wang
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- Lei Liu
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- ZhiYang Xie
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- GuanRui Ren
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- YiLi Hu
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- MeiJi Shen
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- Hui Wang
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- JiaDong Wang
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- YunTao Wang
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- Xiao-Tao Wu
  - Department of Spine Surgery, Affiliated Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China

9
Han J, Guzman JA, Chu ML. Prediction of gully erosion susceptibility through the lens of the SHapley Additive exPlanations (SHAP) method using a stacking ensemble model. J Environ Manage 2025; 383:125478. [PMID: 40286423 DOI: 10.1016/j.jenvman.2025.125478] [Received: 04/01/2024] [Revised: 03/22/2025] [Accepted: 04/19/2025] [Indexed: 04/29/2025]
Abstract
This study develops a novel explainable stacking ensemble model that combines the stacked generalization ensemble method with SHapley Additive exPlanations (SHAP) to enhance the prediction and interpretation of gully erosion susceptibility. Applied to Jefferson County, Illinois, our approach leverages Random Forest (RF), Gradient Boosting Machine (GBM), Logistic Regression (LR), and Deep Neural Networks (DNN) as both base and meta-learners in various configurations, resulting in 44 distinct stacking models. The comparative analysis demonstrated the superior predictive performance of the stacked models when evaluated at 200 randomly selected gully-site points based on LiDAR difference observations; all but three exceeded the highest area under the curve (AUC) value of 0.86 achieved by the best-performing base model (GBM). The LR stacking model, combining RF and GBM as base models with LR as the meta-learner, emerged as the most effective, achieving an AUC of 0.916. The gully erosion susceptibility map produced by the LR stacking model classified 33% of the agricultural land (89,208 ha) as the "very high" class, compared to 27%, 87%, 27%, and 55% predicted by the individual RF, LR, GBM, and DNN models, respectively. Crucially, SHAP analysis elucidated how changes in feature values influence model behavior, considering feature interactions within both the base models and the meta-learner. SHAP identified the annual leaf area index (LAI) as the most influential feature in both the RF and GBM base models. Additionally, it highlighted the greater significance of the GBM base model relative to the RF base model in the final decision-making process of the stacking model. By offering a transparent mechanism to evaluate how different features and models contribute to final decisions, this approach can be extended to broader environmental management and policy-making contexts, facilitating more informed and responsible resource allocation.
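The figure of 44 stacking models can be reproduced by a simple enumeration. The abstract does not state the enumeration rule, so the rule used here is an assumption: every subset of at least two of the four learners serves as the base layer, and any of the four learners may act as the meta-learner. The function and variable names are hypothetical.

```python
from itertools import combinations

LEARNERS = ("RF", "GBM", "LR", "DNN")


def stacking_configurations(learners=LEARNERS, min_base=2):
    """List (base_models, meta_learner) pairs under the assumed enumeration rule."""
    configs = []
    for size in range(min_base, len(learners) + 1):
        for base in combinations(learners, size):
            # Assumption: the meta-learner may be any learner, including one
            # that already appears among the base models.
            for meta in learners:
                configs.append((base, meta))
    return configs


configs = stacking_configurations()
n_configs = len(configs)  # 11 base subsets x 4 meta-learners
```

Under this rule the best configuration named in the abstract, RF and GBM as base models with LR as the meta-learner, appears as the pair `(("RF", "GBM"), "LR")`.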
Affiliation(s)
- Jeongho Han
  - Department of Agricultural and Biological Engineering, The GRAINGER College of Engineering, College of Agricultural, Consumer & Environmental Sciences, ACES, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Agriculture and Life Sciences Research Institute, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Jorge A Guzman
  - Department of Agricultural and Biological Engineering, The GRAINGER College of Engineering, College of Agricultural, Consumer & Environmental Sciences, ACES, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Maria L Chu
  - Department of Agricultural and Biological Engineering, The GRAINGER College of Engineering, College of Agricultural, Consumer & Environmental Sciences, ACES, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA

10
Haker R, Helft C, Natali Shamir E, Shahar M, Solomon H, Omer N, Blumenfeld‐Katzir T, Zlotzover S, Piontkewitz Y, Weiner I, Ben‐Eliezer N. Characterization of Brain Abnormalities in Lactational Neurodevelopmental Poly I:C Rat Model of Schizophrenia and Depression Using Machine-Learning and Quantitative MRI. J Magn Reson Imaging 2025; 61:2281-2291. [PMID: 39466009 PMCID: PMC11987781 DOI: 10.1002/jmri.29634] [Received: 01/28/2024] [Revised: 10/03/2024] [Accepted: 10/05/2024] [Indexed: 10/29/2024] [Open access]
Abstract
BACKGROUND A recent neurodevelopmental rat model utilizing lactational exposure to polyriboinosinic-polyribocytidilic acid (Poly I:C) produces behavioral phenotypes resembling schizophrenia-like symptoms in male offspring and depression-like symptoms in female offspring. PURPOSE To identify mechanisms of neuronal abnormalities in lactational Poly I:C offspring using quantitative MRI (qMRI) tools. STUDY TYPE Prospective. ANIMAL MODEL Twenty Poly I:C rats and 20 healthy control rats, aged postnatal day 130. FIELD STRENGTH/SEQUENCE 7 T. Multiflip-angle FLASH protocol for T1 mapping; multi-echo spin-echo T2-mapping protocol; echo planar imaging protocol for diffusion tensor imaging. ASSESSMENT Nursing dams were injected with the viral mimic Poly I:C or saline (control group). In adulthood, quantitative maps of T1, T2, proton density, and five diffusion metrics were generated for the offspring. Seven regions of interest (ROIs) were segmented, followed by extracting 10 quantitative features for each ROI. STATISTICAL TESTS A random forest machine learning (ML) tool was employed to identify MRI markers of disease and classify Poly I:C rats from healthy controls based on quantitative features. RESULTS Poly I:C rats were identified from controls with an accuracy of 82.5 ± 25.9% for females and 85.0 ± 24.0% for males. Poly I:C females exhibited differences mainly in diffusion-derived parameters in the thalamus and the medial prefrontal cortex (MPFC), while males displayed changes primarily in diffusion-derived parameters in the corpus callosum and MPFC. DATA CONCLUSION qMRI shows potential for identifying sex-specific brain abnormalities in the Poly I:C model of neurodevelopmental disorders. LEVEL OF EVIDENCE NA. TECHNICAL EFFICACY Stage 2.
Affiliation(s)
- Rona Haker
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
| | - Coral Helft
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
| | | | - Moni Shahar
- The AI and Data Science Center, Tel Aviv University, Tel Aviv, Israel
| | - Hadas Solomon
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
| | - Noam Omer
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
| | | | - Sharon Zlotzover
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
| | - Yael Piontkewitz
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Ina Weiner
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Noam Ben‐Eliezer
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
- Center for Advanced Imaging Innovation and Research (CAI2R), New York University School of Medicine, New York, New York, USA
| |
|
11
|
Rao A, Haydel J, Ma S, Thrift AP, Nguyen-Wenker T, El-Serag HB. A Simple, Interpretable Machine Learning Model Based on Clinical Factors Accurately Predicts Incident Dysplasia or Malignancy in Barrett's Esophagus. Dig Dis Sci 2025:10.1007/s10620-025-09069-w. [PMID: 40293634 DOI: 10.1007/s10620-025-09069-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2024] [Accepted: 04/14/2025] [Indexed: 04/30/2025]
Abstract
PURPOSE Identifying patients likely to develop dysplasia or malignancy is critical for effective surveillance in patients with Barrett's Esophagus (BE). However, current predictive models are limited. We evaluated the performance of machine learning (ML) models in predicting incident dysplasia or malignancy in a cohort of veteran patients with BE. METHODS We analyzed data from 598 patients newly diagnosed with non-dysplastic BE (NDBE), BE indefinite for dysplasia (BE-IND), and BE with non-persistent low-grade dysplasia (LGD) at the Michael DeBakey Veterans Affairs Medical Center from November 1990 to January 2019 with follow-up through January 2024. Progressors were patients who developed persistent LGD, HGD, or EAC within 5 years of index endoscopy. Six models were evaluated, encompassing regression and ensemble-based ML methods. RESULTS Of 598 qualifying patients, 61 (10.2%) progressed. Longer segments and indefinite/non-persistent LGD pathology were associated with higher risk of progression in unadjusted analyses. BE segment length remained significant on multivariate analysis (OR 1.26; 95% CI 1.17-1.36 per 1 cm increase). A decision tree (DT) model, using only segment length, achieved the highest discrimination (AUROC = 0.79) and excellent sensitivity (93.3%). The DT model also identified segment length thresholds for risk stratification: < 0.95 cm (minimal risk), 0.95-2.44 cm (low), 2.44-9.45 cm (moderate), > 9.45 cm (high). CONCLUSIONS A simple, interpretable DT model with segment length as the sole predictor outperformed regression and complex ML-based models in predicting BE progressors. Findings align with European Society of Gastrointestinal Endoscopy (ESGE) guidelines suggesting tailored surveillance based on segment length and provide actionable thresholds. These results offer a practical ML tool for BE surveillance.
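The segment-length thresholds reported in this abstract can be read as a simple lookup. The sketch below is illustrative only and is not the authors' fitted decision tree: the function name `be_risk_stratum` is hypothetical, and because the abstract does not say which stratum the boundary value 2.44 cm belongs to, the code assumes it falls in the moderate stratum.

```python
def be_risk_stratum(segment_length_cm: float) -> str:
    """Map a Barrett's segment length (cm) to the risk stratum named in the abstract."""
    if segment_length_cm < 0:
        raise ValueError("segment length must be non-negative")
    if segment_length_cm < 0.95:   # abstract: "< 0.95 cm (minimal risk)"
        return "minimal"
    if segment_length_cm < 2.44:   # assumption: exactly 2.44 cm counts as moderate
        return "low"
    if segment_length_cm <= 9.45:  # abstract: "> 9.45 cm (high)"
        return "moderate"
    return "high"


for length in (0.5, 1.0, 3.0, 10.0):
    print(length, be_risk_stratum(length))
```

A lookup like this only restates the published cut points; it carries none of the model's validation and should not be read as clinical guidance.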
Affiliation(s)
- Ashwin Rao
- Section of Gastroenterology and Hepatology, Baylor College of Medicine, Houston, TX, USA
| | - Jasmine Haydel
- Department of Internal Medicine, Baylor College of Medicine, Houston, TX, USA
| | - Samuel Ma
- Department of Internal Medicine, Baylor College of Medicine, Houston, TX, USA
- School of Medicine, Baylor College of Medicine, Houston, TX, USA
| | - Aaron P Thrift
- Section of Epidemiology and Population Sciences, Baylor College of Medicine, Houston, TX, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Theresa Nguyen-Wenker
- Section of Gastroenterology and Hepatology, Baylor College of Medicine, Houston, TX, USA
| | - Hashem B El-Serag
- Section of Gastroenterology and Hepatology, Baylor College of Medicine, Houston, TX, USA.
| |
|
12
|
Alam F, Mohammed Alnazzawi TS, Mehmood R, Al-maghthawi A. A Review of the Applications, Benefits, and Challenges of Generative AI for Sustainable Toxicology. Curr Res Toxicol 2025; 8:100232. [PMID: 40331045 PMCID: PMC12051651 DOI: 10.1016/j.crtox.2025.100232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Revised: 03/10/2025] [Accepted: 04/09/2025] [Indexed: 05/08/2025] Open
Abstract
Sustainable toxicology is vital for living species and the environment because it guarantees the safety, efficacy, and regulatory compliance of drugs, treatments, vaccines, and chemicals in living organisms and the environment. Conventional toxicological methods often lack sustainability as they are costly, time-consuming, and sometimes inaccurate. This delays the production of new drugs, vaccines, and treatments and the understanding of the adverse effects of chemicals on the environment. To address these challenges, the healthcare sector must leverage the power of the Generative-AI (GenAI) paradigm. This paper aims to help understand how the healthcare field can be revolutionized in multiple ways by using GenAI to facilitate sustainable toxicological developments. This paper first reviews the present literature and identifies the possible classes of GenAI that can be applied to toxicology. A generalized and holistic visualization of various toxicological processes powered by GenAI is presented in tandem. The paper discusses toxicological risk assessment and management, spotlighting how global agencies and organizations are forming policies to standardize and regulate AI-related development, such as GenAI, in these fields. The paper identifies and discusses the advantages and challenges of GenAI in toxicology. Further, the paper outlines how GenAI empowers Conversational-AI, which will be critical for highly tailored toxicological solutions. This review will help to develop a comprehensive understanding of the impacts and future potential of GenAI in the field of toxicology. The knowledge gained can be applied to create sustainable GenAI applications for various problems in toxicology, ultimately benefiting our societies and the environment.
Affiliation(s)
- Furqan Alam
- Faculty of Computing and Information Technology (FoCIT), Sohar University, Sohar 311, Oman
| | - Tahani Saleh Mohammed Alnazzawi
- Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah 41477, Kingdom of Saudi Arabia
| | - Rashid Mehmood
- Faculty of Computer Science and Information Systems, Islamic University Madinah, Madinah 42351, Kingdom of Saudi Arabia
| | - Ahmed Al-maghthawi
- Department of Computer Science, College of Science & Art at Mahayil, King Khalid University, Abha 62529, Kingdom of Saudi Arabia
| |
|
13
|
Jawli A, Nabi G, Huang Z, Alhusaini AJ, Wei C, Tang B. Machine Learning Model Development for Malignant Prostate Lesion Prediction Using Texture Analysis Features from Ultrasound Shear-Wave Elastography. Cancers (Basel) 2025; 17:1358. [PMID: 40282532 PMCID: PMC12026400 DOI: 10.3390/cancers17081358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2025] [Revised: 04/13/2025] [Accepted: 04/15/2025] [Indexed: 04/29/2025] Open
Abstract
Introduction: Artificial intelligence (AI) is increasingly utilized for texture analysis and the development of machine learning (ML) techniques to enhance diagnostic accuracy. ML algorithms are trained to differentiate between normal and malignant conditions based on provided data. Texture feature analysis, including first-order and second-order features, is a critical step in ML development. This study aimed to evaluate quantitative texture features of normal and prostate cancer tissues identified through ultrasound B-mode and shear-wave elastography (SWE) imaging and to develop and assess ML models for predicting and classifying normal versus malignant prostate tissues. Methodology: First-order and second-order texture features were extracted from B-mode and SWE imaging, including four reconstructed regions of interest (ROIs) from SWE images for normal and malignant tissues. A total of 94 texture features were derived, including features for intensity, Gray-Level Co-Occurrence Matrix (GLCM), Gray-Level Dependence Length Matrix (GLDLM), Gray-Level Run Length Matrix (GLRLM), and Gray-Level Size Zone Matrix (GLSZM). Five ML models were developed and evaluated using 5-fold cross-validation to predict normal and malignant tissues. Results: Data from 62 patients were analyzed. All ROIs, except those derived from B-mode imaging, exhibited statistically significant differences in features between normal and malignant tissues. Among the developed models, Support Vector Machines (SVM), Random Forest (RF), and Naive Bayes (NB) demonstrated the highest performance across all ROIs. These models consistently achieved strong predictive accuracy for classifying normal versus malignant tissues. Gray Pure SWE and Gray Reconstructed images provided the highest sensitivity and specificity in prostate cancer (PCa) prediction, of 82% and 90%, and 98% and 96%, respectively.
Conclusions: Texture analysis with machine learning on SWE-US and reconstructed images effectively differentiates malignant from benign prostate lesions, with features like contrast, entropy, and correlation playing a key role. Random Forest, SVM, and Naïve Bayes showed the highest classification performance, while grayscale reconstructions (GPSWE and GRRI) enhanced detection accuracy.
Affiliation(s)
- Adel Jawli
- Biomedical Engineering, School of Science and Engineering, Fulton Building, University of Dundee, Dundee DD1 4HN, UK
| | - Ghulam Nabi
- Division of Imaging Sciences and Technology, School of Medicine, Ninewells Hospital, University of Dundee, Dundee DD1 9SY, UK
| | - Zhihong Huang
- School of Physics, Engineering and Technology, University of York, Heslington, York YO10 5DD, UK
| | - Abeer J. Alhusaini
- Division of Imaging Sciences and Technology, School of Medicine, Ninewells Hospital, University of Dundee, Dundee DD1 9SY, UK
| | - Cheng Wei
- Biomedical Engineering, School of Science and Engineering, Fulton Building, University of Dundee, Dundee DD1 4HN, UK
| | - Benjie Tang
- Surgical Skills Centre, Dundee Institute for Healthcare Simulation Respiratory Medicine and Gastroenterology, School of Medicine, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
| |
|
14
|
Lamsaf A, Carrilho R, Neves JC, Proença H. Causality, Machine Learning, and Feature Selection: A Survey. SENSORS (BASEL, SWITZERLAND) 2025; 25:2373. [PMID: 40285063 PMCID: PMC12030831 DOI: 10.3390/s25082373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Revised: 03/17/2025] [Accepted: 03/28/2025] [Indexed: 04/29/2025]
Abstract
Causality, which involves distinguishing between cause and effect, is essential for understanding complex relationships in data. This paper provides a review of causality in two key areas: causal discovery and causal inference. Causal discovery transforms data into graphical structures that illustrate how variables influence one another, while causal inference quantifies the impact of these variables on a target outcome. Integrating causal reasoning into machine learning makes models more robust and accurate, improving applications like prediction and classification. We present various methods used in detecting causal relationships and how these can be applied in selecting or extracting relevant features, particularly from sensor datasets. When causality is used in feature selection, it supports applications like fault detection, anomaly detection, and predictive maintenance, which are critical to the maintenance of complex systems. Traditional correlation-based methods of feature selection often overlook significant causal links, leading to incomplete insights. Our research highlights how causality can be integrated to yield stronger, deeper feature selection and ultimately enable better decision making in machine learning tasks.
Affiliation(s)
- Asmae Lamsaf
- IT: Instituto de Telecomunicações, University of Beira Interior, 6200-001 Covilhã, Portugal
| | - Rui Carrilho
- IT: Instituto de Telecomunicações, University of Beira Interior, 6200-001 Covilhã, Portugal
| | - João C. Neves
- Department of Computer Science, University of Beira Interior, 6200-209 Covilhã, Portugal
| | - Hugo Proença
- IT: Instituto de Telecomunicações, University of Beira Interior, 6200-001 Covilhã, Portugal
| |
|
15
|
Amin SA, Kar S, Piotto S. pDILI_v1: A Web-Based Machine Learning Tool for Predicting Drug-Induced Liver Injury (DILI) Integrating Chemical Space Analysis and Molecular Fingerprints. ACS OMEGA 2025; 10:13502-13514. [PMID: 40224405 PMCID: PMC11983207 DOI: 10.1021/acsomega.5c00075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2025] [Revised: 03/06/2025] [Accepted: 03/18/2025] [Indexed: 04/15/2025]
Abstract
Drug-induced liver injury (DILI) represents a critical safety concern for drug development, regulatory oversight, and clinical practice, with substantial economic and public health implications. While predicting DILI risk in humans has garnered significant attention, the associated chemical space has remained insufficiently explored. This study addresses this gap through a comprehensive computational approach, leveraging machine learning (ML) to investigate structural determinants of DILI risk systematically. The study focuses on three key objectives: (i) exploring the chemical space and scaffold diversity associated with DILI; (ii) employing fragment-based approaches to identify structural alerts (SAs) that influence DILI risk; and (iii) developing supervised ML models to not only predict DILI risk but also elucidate the structural significance of molecular fingerprints. To broaden accessibility, we introduce pDILI_v1, a Python-based web application available at https://pdiliv1web.streamlit.app/. This user-friendly platform facilitates the prediction and visualization of DILI risk, enabling both experts and nonexperts to screen compounds effectively. Additional formats, including a Google Colab notebook and a graphical user interface (GUI) for Windows, ensure flexibility for diverse user needs. The proposed models demonstrate the potential for early identification of hepatotoxic risks in drug candidates, providing critical insights into drug discovery and development. By integrating ML-driven predictions with chemical space analysis, this research advances the field of drug safety evaluation, contributing to the development of safer pharmaceuticals and mitigating the risks of DILI.
Affiliation(s)
- Sk Abdul Amin
- Department of Pharmacy, Universita degli Studi di Salerno, Via Giovanni Paolo II 132, Fisciano 84084, Campania, Italy
| | - Supratik Kar
- Chemometrics and Molecular Modeling Laboratory, Department of Chemistry and Physics, Kean University, 1000 Morris Avenue, Union, New Jersey 07083, United States
| | - Stefano Piotto
- Department of Pharmacy, Universita degli Studi di Salerno, Via Giovanni Paolo II 132, Fisciano 84084, Campania, Italy
| |
|
16
|
Chaturvedi M, Rashid MA, Paliwal KK. RNA structure prediction using deep learning - A comprehensive review. Comput Biol Med 2025; 188:109845. [PMID: 39983363 DOI: 10.1016/j.compbiomed.2025.109845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 02/09/2025] [Accepted: 02/10/2025] [Indexed: 02/23/2025]
Abstract
In computational biology, accurate RNA structure prediction offers several benefits, including facilitating a better understanding of RNA functions and RNA-based drug design. Implementing deep learning techniques for RNA structure prediction has led to tremendous progress in this field, resulting in significant improvements in prediction accuracy. This comprehensive review aims to provide an overview of the diverse strategies employed in predicting RNA secondary structures, emphasizing deep learning methods. The article categorizes the discussion into three main dimensions: feature extraction methods, existing state-of-the-art learning model architectures, and prediction approaches. We present a comparative analysis of various techniques and models highlighting their strengths and weaknesses. Finally, we identify gaps in the literature, discuss current challenges, and suggest future approaches to enhance model performance and applicability in RNA structure prediction tasks. This review provides a deeper insight into the subject and paves the way for further progress in this dynamic intersection of life sciences and artificial intelligence.
Affiliation(s)
- Mayank Chaturvedi
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
| | - Mahmood A Rashid
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
| | - Kuldip K Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
| |
|
17
|
Xu Y, Wu Q. Using machine learning and single nucleotide polymorphisms for improving rheumatoid arthritis risk Prediction in postmenopausal women. PLOS DIGITAL HEALTH 2025; 4:e0000790. [PMID: 40202941 PMCID: PMC11981130 DOI: 10.1371/journal.pdig.0000790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 02/17/2025] [Indexed: 04/11/2025]
Abstract
Genetic factors contribute to 60-70% of the variability in rheumatoid arthritis (RA). However, few studies have used genetic variants to predict RA risk. This study aimed to enhance RA risk prediction by leveraging single nucleotide polymorphisms (SNPs) through machine-learning algorithms, utilizing Women's Health Initiative data. We developed four predictive models: 1) based on common RA risk factors, 2) model 1 incorporating polygenic risk scores (PRS) with principal components, 3) model 1 and SNPs after feature reduction, and 4) model 1 and SNPs with kernel principal component analysis. Each model was assessed using logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGBoost), and support vector machine (SVM). Performance metrics included the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive and negative predictive values (PPV and NPV), and F1-score. The fourth model, integrating SNPs with XGBoost, outperformed all other models. In addition, the XGBoost model that combines genomic data with conventional phenotypic predictors significantly enhanced predictive accuracy, achieving the highest AUC of 0.90 and an F1 score of 0.83. The DeLong test confirmed significant differences in AUC between this model and the others (p-values < 0.0001), particularly highlighting its efficacy in utilizing complex genetic information. These findings emphasize the advantage of combining in-depth genomic data with advanced machine learning for RA risk prediction. The most robust performance of the XGBoost model, which integrated both conventional risk factors and individual SNPs, demonstrates its potential as a tool in personalized medicine for complex diseases like RA. This approach offers a more nuanced and effective RA risk assessment strategy, underscoring the need for further studies to extend it to broader applications.
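Most of the threshold-based metrics listed in this abstract (sensitivity, specificity, PPV, NPV, F1) follow directly from a 2x2 confusion matrix. The sketch below shows those standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not results from the study. AUC is omitted because it needs predicted scores rather than counts.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics; assumes every denominator is non-zero."""
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=80, fp=20, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, sensitivity, PPV, and F1 all come out at 0.80, while specificity and NPV are 0.90.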
Affiliation(s)
- Yingke Xu
- Nevada Institute of Personalized Medicine, College of Science, University of Nevada, Las Vegas, Nevada, United States of America
- Department of Epidemiology and Biostatistics, School of Public Health, the University of Nevada Las Vegas, Las Vegas, Nevada, United States of America
| | - Qing Wu
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, United States of America
| |
|
18
|
Xie J, Ma RW, Feng YJ, Qiao Y, Zhu HY, Tao XP, Chen WJ, Liu CY, Li T, Liu K, Cheng LM. Machine learning-based risk prediction model for pertussis in children: a multicenter retrospective study. BMC Infect Dis 2025; 25:428. [PMID: 40148755 PMCID: PMC11951648 DOI: 10.1186/s12879-025-10797-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Accepted: 03/13/2025] [Indexed: 03/29/2025] Open
Abstract
BACKGROUND Pertussis is a highly contagious respiratory disease. Even though vaccination has reduced the incidence, cases have resurfaced in certain regions due to immune escape and waning vaccine efficacy. Identifying high-risk patients to mitigate transmission and avert complications promptly is imperative. Nevertheless, the current diagnostic methods, including PCR and bacterial culture, are time-consuming and expensive. Some studies have attempted to develop risk prediction models based on multivariate data, but their performance can be improved. Therefore, this study aims to further optimize and expand the risk assessment tool to more efficiently identify high-risk individuals and compensate for the shortcomings of existing diagnostic methods. OBJECTIVE The aim of this study was to develop a pertussis risk prediction model that is both efficient and has good generalization ability, applicable to different datasets. The model was constructed using machine learning techniques based on multicenter data and screened for key features. The performance and generalization ability of the model were evaluated by deploying it on an online platform. At the same time, this study aims to provide a rapid and accurate auxiliary diagnostic tool for clinical practice to help identify high-risk patients in a timely manner, optimize early intervention strategies, reduce the risk of complications and reduce transmission, thereby improving the efficiency of public health management. METHODS First, data from 1085 suspected pertussis patients from 7 centers were collected, and ten key features were analyzed using the lasso regression and Boruta algorithm: PDW-MPV-RATIO, SII, white blood cells, platelet distribution width, mean platelet volume, lymphocytes, cough duration, vaccination, fever, and lytic lymphocytes. Eight models were then trained and validated to assess their performance and to confirm their generalization ability with external datasets based on these features.
Finally, an online platform was constructed for clinicians to use the models in real time. RESULTS The random forest model demonstrated excellent discrimination ability in the validation set, with an AUC of 0.98, and an AUC of 0.97 in the external validation set. Calibration curve and decision curve analysis showed that the model had high accuracy in predicting low-to-medium risk patients, which could help clinicians avoid unnecessary interventions, especially in resource-limited settings. The application of this model can help optimize the early identification and management of high-risk patients and improve clinical decision-making. CONCLUSION The pertussis prediction model devised in this study was validated using multicenter data, exhibited high prediction performance, and was successfully implemented online. Future research should broaden the data sources and incorporate dynamic data to enhance the model's accuracy and applicability.
Affiliation(s)
- Juan Xie
- Department of Anesthesiology, Kunming Children's Hospital, Kunming City, Yunnan Province, China
| | - Run-Wei Ma
- Department of Cardiac Surgery, Fuwai Yunnan Hospital, Chinese Academy of Medical Sciences/Affiliated Cardiovascular Hospital of Kunming Medical University, Kunming City, Yunnan Province, China
| | - Yu-Jing Feng
- Comprehensive Pediatrics, Wenshan Maternal and Child Health Care Hospital, Wenshan City, Yunnan Province, China
| | - Yuan Qiao
- Comprehensive Pediatrics and Neonatology, Chuxiong Yi Autonomous Prefecture People's Hospital, Chuxiong City, Yunnan Province, China
| | - Hong-Yan Zhu
- Pediatric Respiratory Department, Qujing Maternal and Child Health Hospital, Qujing City, Yunnan Province, China
| | - Xing-Ping Tao
- Department of Pediatrics, Kaiyuan People's Hospital, Kaiyuan, China
| | - Wen-Juan Chen
- Department of Pediatrics and Emergency, Yuxi Children's Hospital, Yuxi City, Yunnan Province, China
| | - Cong-Yun Liu
- Comprehensive Pediatrics & Pulmonary and Critical Care Medicine, Baoshan People's Hospital, Baoshan City, Yunnan Province, China
| | - Tan Li
- Department of Respiratory Medicine, Kunming Children's Hospital, Kunming City, Yunnan Province, China
| | - Kai Liu
- Comprehensive Pediatrics & Pulmonary and Critical Care Medicine, Kunming Children's Hospital, Yunnan Province, Shulin Street 28, Kunming City, Yunnan Province, 650000, China.
| | - Li-Ming Cheng
- Department of Anesthesiology, Kunming Children's Hospital, Kunming City, Yunnan Province, China.
| |
|
19
|
Dai X, Liu S, Chu X, Jiang X, Chen W, Qi G, Zhao S, Zhou Y, Shi X. Evaluation and comparison of machine learning algorithms for predicting discharge against medical advice in injured inpatients. Surgery 2025; 182:109335. [PMID: 40127503 DOI: 10.1016/j.surg.2025.109335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Revised: 02/12/2025] [Accepted: 02/25/2025] [Indexed: 03/26/2025]
Abstract
BACKGROUND Whether the application of machine learning algorithms offers an advantage over logistic regression in forecasting discharge against medical advice occurrences needs to be evaluated. METHODS This retrospective study included all inpatient records from January 1, 2018, to December 31, 2023. The foundational data set (2018-2021) was divided into a training set (80%) and a test set (20%) for model construction and internal validation. The temporal validation data set (2022-2023) was used to assess the model's prospective performance. Feature selection was performed using the BorutaShap method. Techniques including random oversampling, random undersampling, synthetic minority oversampling technique, and edited nearest neighbors were applied to address data imbalance. Model performance was evaluated using metrics including the area under the receiver operating characteristic curve, accuracy, specificity, sensitivity, F1 score, and geometric mean. The Shapley Additive Explanations analysis provided interpretation for the best machine learning model. RESULTS A total of 48,394 inpatient records for injured patients met the study criteria, of which 44,119 were discharged following medical advice and 4,275 chose discharge against medical advice, resulting in a ratio of 10.32:1. Among injury inpatients, 8.8% opted for discharge against medical advice. Based on the results of feature selection and multicollinearity analysis, 16 variables were ultimately selected for the construction and evaluation of the discharge against medical advice model. The light gradient boosting machine + edited nearest neighbors model showed the best generalization, with areas under the curves of 0.820 for internal validation and 0.837 for temporal validation. The Shapley Additive Explanations method was used to interpret the model, indicating that the grade of surgery is the most important variable. 
CONCLUSIONS The study is the first to use machine learning models to predict discharge against medical advice in injured inpatients, demonstrating its feasibility. In the future, health care institutions can learn from these models to optimize patient management and reduce discharge against medical advice incidents.
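The class-imbalance figures quoted in this abstract can be checked with a few lines of arithmetic, and random undersampling, one of the resampling techniques it names, is simple enough to sketch with the standard library. This is an illustrative sketch, not the authors' pipeline: the helper `random_undersample`, the label coding (1 for the minority class), and the seed are assumptions.

```python
import random

# Counts reported in the abstract.
followed_advice = 44_119
against_advice = 4_275
total = followed_advice + against_advice

print(total)                                      # 48394
print(f"{followed_advice / against_advice:.2f}")  # 10.32
print(f"{against_advice / total:.1%}")            # 8.8%


def random_undersample(labels, seed=0):
    """Return indices keeping every minority row (label 1) and an equally
    sized random subset of majority rows (label 0).

    Assumes the minority class is no larger than the majority class.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    kept_majority = rng.sample(majority, k=len(minority))
    return sorted(minority + kept_majority)


# Hypothetical toy labels with a 5:1 imbalance.
toy_labels = [0] * 10 + [1] * 2
balanced_idx = random_undersample(toy_labels)
print([toy_labels[i] for i in balanced_idx])
```

Resampling of this kind is normally applied to the training split only, so that validation and test sets keep the real-world class ratio.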
Affiliation(s)
- Xiu Dai
- Department of Epidemiology and Health Statistics, School of Public Health, Zunyi Medical University, Zunyi, Guizhou, PR China
- Shifang Liu
- Department of Medical Record Management, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, PR China
- Xiangyuan Chu
- Department of Epidemiology and Health Statistics, School of Public Health, Zunyi Medical University, Zunyi, Guizhou, PR China
- Xuheng Jiang
- Emergency Department, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, PR China
- Weihang Chen
- Department of Epidemiology and Health Statistics, School of Public Health, Zunyi Medical University, Zunyi, Guizhou, PR China
- Guojia Qi
- Department of Epidemiology and Health Statistics, School of Public Health, Zunyi Medical University, Zunyi, Guizhou, PR China; Department of Medical Record Management, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, PR China
- Shimin Zhao
- Department of Epidemiology and Health Statistics, School of Public Health, Zunyi Medical University, Zunyi, Guizhou, PR China
- Yanna Zhou
- Department of Epidemiology and Health Statistics, School of Public Health, Zunyi Medical University, Zunyi, Guizhou, PR China; Key Laboratory of Maternal & Child Health and Exposure Science of Guizhou Higher Education Institutes, Zunyi, Guizhou, PR China
- Xiuquan Shi
- Department of Epidemiology and Health Statistics, School of Public Health, Zunyi Medical University, Zunyi, Guizhou, PR China; Key Laboratory of Maternal & Child Health and Exposure Science of Guizhou Higher Education Institutes, Zunyi, Guizhou, PR China; Center for Pediatric Trauma Research & Center for Injury Research and Policy, The Abigail Wexner Research Institute at Nationwide Children's Hospital, The Ohio State University College of Medicine, Columbus, Ohio, USA.
20
Howard KA, Anderson W, Podichetty JT, Gould R, Boyce D, Dasher P, Evans L, Kao C, Kumar VK, Hamilton C, Mathé E, Guerin PJ, Dodd K, Mehta AK, Ortman C, Patil N, Rhodes J, Robinson M, Stone H, Heavner SF. Wrangling Real-World Data: Optimizing Clinical Research Through Factor Selection with LASSO Regression. Int J Environ Res Public Health 2025; 22:464. [PMID: 40283693 PMCID: PMC12026860 DOI: 10.3390/ijerph22040464] [Received: 10/25/2024] [Revised: 01/08/2025] [Accepted: 01/13/2025] [Indexed: 04/29/2025]
Abstract
Data-driven approaches to clinical research are necessary for understanding and effectively treating infectious diseases. However, challenges such as issues with data validity, lack of collaboration, and difficult-to-treat infectious diseases (e.g., those that are rare or newly emerging) hinder research. Prioritizing innovative methods to facilitate the continued use of data generated during routine clinical care for research, but in an organized, accelerated, and shared manner, is crucial. This study investigates the potential of CURE ID, an open-source platform to accelerate drug-repurposing research for difficult-to-treat diseases, with COVID-19 as a use case. Data from eight US health systems were analyzed using least absolute shrinkage and selection operator (LASSO) regression to identify key predictors of 28-day all-cause mortality in COVID-19 patients, including demographics, comorbidities, treatments, and laboratory measurements captured during the first two days of hospitalization. Key findings indicate that age, laboratory measures, severity of illness indicators, oxygen support administration, and comorbidities significantly influenced all-cause 28-day mortality, aligning with previous studies. This work underscores the value of collaborative repositories like CURE ID in providing robust datasets for prognostic research and the importance of factor selection in identifying key variables, helping to streamline future research and drug-repurposing efforts.
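The factor selection described here relies on the LASSO penalty shrinking uninformative coefficients exactly to zero. The following is a minimal sketch of that mechanism using cyclic coordinate descent with soft-thresholding. It is an assumption-laden illustration: it uses squared-error loss for simplicity (a binary outcome such as 28-day mortality would more plausibly use a logistic loss, which the abstract does not specify), the function names are hypothetical, and the toy data are invented.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows with no all-zero column; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (zj / n)
    return beta


# Invented toy data: y depends on the first feature only.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)
```

In this toy case the irrelevant second feature receives a coefficient of exactly zero, while the informative first coefficient is shrunk from 2 towards zero by the penalty. That behaviour is what makes the method usable as a variable selector.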
Affiliation(s)
- Kerry A. Howard
- Department of Public Health Sciences, Clemson University, Clemson, SC 29634, USA; (K.A.H.); (S.F.H.)
- Center for Public Health Modeling and Response, Clemson University, Clemson, SC 29634, USA
- Wes Anderson
- Critical Path Institute, Tucson, AZ 85718, USA; (J.T.P.)
- Ruth Gould
- Centers for Disease Control and Prevention, Atlanta, GA 30329, USA
- Danielle Boyce
- Tufts University School of Medicine, Tufts University, Medford, MA 02155, USA
- Pam Dasher
- Critical Path Institute, Tucson, AZ 85718, USA; (J.T.P.)
- Laura Evans
- Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington, Seattle, WA 98195, USA
- Cindy Kao
- IR Research & Academic Systems, University of Texas Southwestern, Dallas, TX 75390, USA
- Chase Hamilton
- Society of Critical Care Medicine, Mount Prospect, IL 60056, USA
- Ewy Mathé
- National Institutes of Health National Center for Advancing Translational Sciences (NCATS), Rockville, MD 20850, USA
- Philippe J. Guerin
- Infectious Diseases Data Observatory (IDDO), Nuffield Department of Medicine, University of Oxford, Oxford, Oxfordshire OX3 LF, UK
- Kenneth Dodd
- Department of Emergency Medicine, Advocate Christ Medical Center, Oak Lawn, IL 60453, USA
- Aneesh K. Mehta
- Department of Medicine, Emory University, Atlanta, GA 30322, USA (J.R.)
- Chris Ortman
- Institute for Translational and Clinical Science, University of Iowa, Iowa City, IA 52242, USA
- Namrata Patil
- Brigham and Women’s Hospital, Boston, MA 02115, USA
- Jeselyn Rhodes
- Department of Medicine, Emory University, Atlanta, GA 30322, USA (J.R.)
- Matthew Robinson
- Division of Infectious Diseases, Johns Hopkins University, Baltimore, MD 21205, USA
- Heather Stone
- US Food and Drug Administration, Silver Spring, MD 20993, USA
- Smith F. Heavner
- Department of Public Health Sciences, Clemson University, Clemson, SC 29634, USA; (K.A.H.); (S.F.H.)
- Critical Path Institute, Tucson, AZ 85718, USA; (J.T.P.)
- Department of Biomedical Sciences, University of South Carolina School of Medicine Greenville, Greenville, SC 29605, USA
21
Sharma J, Jangale V, Shekhawat RS, Yadav P. Improving genetic variant identification for quantitative traits using ensemble learning-based approaches. BMC Genomics 2025; 26:237. [PMID: 40075256 PMCID: PMC11899862 DOI: 10.1186/s12864-025-11443-x] [Received: 10/16/2024] [Accepted: 03/04/2025] [Indexed: 03/14/2025] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) are rapidly advancing due to the improved resolution and completeness provided by Telomere-to-Telomere (T2T) and pangenome assemblies. While recent advancements in GWAS methods have primarily focused on identifying genetic variants associated with discrete phenotypes, approaches for quantitative traits (QTs) remain underdeveloped. This has often led to significant variants being overlooked due to biases from genotype multicollinearity and strict p-value thresholds. RESULTS We propose an enhanced ensemble learning approach for QT analysis that integrates regularized variant selection with machine learning-based association methods, validated through comprehensive biological enrichment analysis. We benchmarked four widely recognized single nucleotide polymorphism (SNP) feature selection methods (least absolute shrinkage and selection operator, ridge regression, elastic-net, and mutual information) alongside four association methods: linear regression, random forest, support vector regression (SVR), and XGBoost. Our approach is evaluated on simulated datasets and validated using a subset of the PennCATH real dataset, including imputed versions, focusing on low-density lipoprotein (LDL)-cholesterol levels as a QT. The combination of elastic-net with SVR outperformed other methods across all datasets. Functional annotation of the top 100 SNPs identified through this best-performing ensemble method revealed their expression in tissues involved in LDL cholesterol regulation. We also confirmed the involvement of six known genes (APOB, TRAPPC9, RAB2A, CCL24, FCHO2, and EEPD1) in cholesterol-related pathways and identified potential drug targets, including APOB, PTK2B, and PTPN12. CONCLUSIONS Our ensemble learning approach effectively identifies variants associated with QTs, and we expect its performance to improve further with the integration of T2T and pangenome references in future GWAS.
Affiliation(s)
- Jyoti Sharma
- Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India
- Vaishnavi Jangale
- Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India
- Rajveer Singh Shekhawat
- Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India
- Pankaj Yadav
- Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India.
- School of Artificial Intelligence and Data Science, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India.
22
Scott I, Aarts E, Wannan C, Gao CX, Clark S, Hartmann S, Nguyen J, Cavve B, Hartmann JA, Dwyer D, van der Tuin S, Raposo de Almeida E, Lin A, Amminger GP, Thompson A, Wood SJ, Yung AR, van den Berg D, McGorry PD, Wigman JT, Nelson B. Characterising symptomatic substates in individuals on the psychosis continuum: a hidden Markov modelling approach. Psychol Med 2025; 55:e82. [PMID: 40071550 PMCID: PMC12068681 DOI: 10.1017/s003329172500056x] [Received: 08/02/2024] [Revised: 01/30/2025] [Accepted: 02/14/2025] [Indexed: 05/13/2025]
Abstract
BACKGROUND To improve early intervention and personalise treatment for individuals early on the psychosis continuum, a greater understanding of symptom dynamics is required. We address this by identifying and evaluating the movement between empirically derived attenuated psychotic symptomatic substates, that is, clusters of symptoms that occur within individuals over time. METHODS Data came from a 90-day daily diary study evaluating attenuated psychotic and affective symptoms. The sample included 96 individuals aged 18-35 on the psychosis continuum, divided into four subgroups of increasing severity based on their psychometric risk of psychosis, with the fourth meeting ultra-high risk (UHR) criteria. A multilevel hidden Markov modelling (HMM) approach was used to characterise and determine the probability of switching between symptomatic substates. Individual substate trajectories and time spent in each substate were subsequently assessed. RESULTS Four substates of increasing psychopathological severity were identified: (1) low-grade affective symptoms with negligible psychotic symptoms; (2) low levels of nonbizarre ideas with moderate affective symptoms; (3) low levels of nonbizarre ideas and unusual thought content, with moderate affective symptoms; and (4) moderate levels of nonbizarre ideas, unusual thought content, and affective symptoms. Perceptual disturbances predominantly occurred within the third and fourth substates. UHR individuals had a reduced probability of switching out of the two most severe substates. CONCLUSIONS Findings suggest that individuals reporting unusual thought content, rather than nonbizarre ideas in isolation, may exhibit symptom dynamics with greater psychopathological severity. Individuals at a higher risk of psychosis exhibited persistently severe symptom dynamics, indicating a potential reduction in psychological flexibility.
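A reduced probability of switching out of a substate translates directly into longer expected runs in that substate. The following is a minimal sketch of that relationship for a time-homogeneous Markov chain with daily steps, where the expected dwell time in state i is 1 / (1 - p_ii). The two transition matrices are hypothetical and are not estimates from the study; the function name is also an assumption.

```python
def expected_dwell_times(transition_matrix):
    """Expected consecutive days in each state: 1 / (1 - p_ii).

    Assumes a time-homogeneous Markov chain with one step per day, so the
    run length in a state follows a geometric distribution.
    """
    times = []
    for i, row in enumerate(transition_matrix):
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must sum to 1")
        stay = row[i]
        times.append(float("inf") if stay >= 1.0 else 1.0 / (1.0 - stay))
    return times


# Hypothetical two-state matrices (rows: from-state, columns: to-state).
lower_risk = [[0.5, 0.5], [0.5, 0.5]]
higher_risk = [[0.5, 0.5], [0.2, 0.8]]

print(expected_dwell_times(lower_risk))
print(expected_dwell_times(higher_risk))
```

In this invented example, raising the self-transition probability of the second state from 0.5 to 0.8 lengthens its expected run from 2 days to about 5 days.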
Affiliation(s)
- Isabelle Scott
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Emmeke Aarts
- Department of Methodology and Statistics, Faculty of Social and Behavioural Sciences, Utrecht University, Utrecht, Netherlands
- Cassandra Wannan
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Caroline X. Gao
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Scott Clark
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- Simon Hartmann
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- Josh Nguyen
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Blake Cavve
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- Jessica A. Hartmann
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Department of Public Mental Health, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Dominic Dwyer
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Sara van der Tuin
- Department of Psychiatry, Interdisciplinary Center Psychopathology and Emotion Regulation, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Esdras Raposo de Almeida
- Department of Psychiatry, Interdisciplinary Center Psychopathology and Emotion Regulation, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Institute & Department of Psychiatry (LIM-23), Hospital das Clinicas, School of Medicine, University of Sao Paulo, Sao Paulo, Brazil
- Ashleigh Lin
- School of Population and Global Health, The University of Western Australia, Perth, WA, Australia
- G. Paul Amminger
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Andrew Thompson
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Stephen J Wood
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- School of Psychology, The University of Birmingham, Birmingham, UK
- Alison R. Yung
- Institute for Mental and Physical Health and Clinical Translation, Deakin University, Melbourne, VIC, Australia
- David van den Berg
- Department of Clinical Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Mark van der Gaag Research Centre, Parnassia Psychiatric Institute, The Hague, The Netherlands
- Patrick D. McGorry
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Johanna T.W. Wigman
- Department of Psychiatry, Interdisciplinary Center Psychopathology and Emotion Regulation, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Barnaby Nelson
- Orygen, Parkville, VIC, Australia
- Centre for Youth Mental Health, The University of Melbourne, Melbourne, VIC, Australia
23
Revelou PK, Tsakali E, Batrinou A, Strati IF. Applications of Machine Learning in Food Safety and HACCP Monitoring of Animal-Source Foods. Foods 2025; 14:922. [PMID: 40231903 PMCID: PMC11941095 DOI: 10.3390/foods14060922] [Received: 01/09/2025] [Revised: 02/26/2025] [Accepted: 03/06/2025] [Indexed: 04/16/2025] Open
Abstract
Integrating advanced computing techniques into food safety management has attracted significant attention recently. Machine learning (ML) algorithms offer innovative solutions for Hazard Analysis Critical Control Point (HACCP) monitoring by providing advanced data analysis capabilities and have proven to be powerful tools for assessing the safety of Animal-Source Foods (ASFs). Studies that link ML with HACCP monitoring in ASFs are limited. The present review provides an overview of ML, feature extraction, and selection algorithms employed for food safety. Several non-destructive techniques are presented, including spectroscopic methods, smartphone-based sensors, paper chromogenic arrays, machine vision, and hyperspectral imaging combined with ML algorithms. Prospects include enhancing predictive models for food safety with the development of hybrid Artificial Intelligence (AI) models and the automation of quality control processes using AI-driven computer vision, which could revolutionize food safety inspections. However, addressing potential biases in AI models is vital to ensuring fair and accurate risk assessments across a variety of food production settings. Moreover, improving the interpretability of ML models will make them more transparent and reliable. In conclusion, applying ML algorithms allows real-time monitoring and predictive analytics and can significantly reduce the risks associated with ASF consumption.
Affiliation(s)
- Panagiota-Kyriaki Revelou
- Department of Food Science and Technology, University of West Attica, Agiou Spyridonos, 12243 Egaleo, Greece; (E.T.); (A.B.); (I.F.S.)
24
Dong J, Jin Z, Li C, Yang J, Jiang Y, Li Z, Chen C, Zhang B, Ye Z, Hu Y, Ma J, Li P, Li Y, Wang D, Ji Z. Machine Learning Models With Prognostic Implications for Predicting Gastrointestinal Bleeding After Coronary Artery Bypass Grafting and Guiding Personalized Medicine: Multicenter Cohort Study. J Med Internet Res 2025; 27:e68509. [PMID: 40053791 PMCID: PMC11926454 DOI: 10.2196/68509] [Received: 11/07/2024] [Revised: 02/04/2025] [Accepted: 02/13/2025] [Indexed: 03/09/2025] Open
Abstract
BACKGROUND Gastrointestinal bleeding is a serious adverse event of coronary artery bypass grafting and lacks tailored risk assessment tools for personalized prevention. OBJECTIVE This study aims to develop and validate predictive models to assess the risk of gastrointestinal bleeding after coronary artery bypass grafting (GIBCG) and to guide personalized prevention. METHODS Participants were recruited from 4 medical centers, including a prospective cohort and the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. From an initial cohort of 18,938 patients, 16,440 were included in the final analysis after applying the exclusion criteria. Thirty combinations of machine learning algorithms were compared, and the optimal model was selected based on integrated performance metrics, including the area under the receiver operating characteristic curve (AUROC) and the Brier score. This model was then developed into a web-based risk prediction calculator. The Shapley Additive Explanations method was used to provide both global and local explanations for the predictions. RESULTS The model was developed using data from 3 centers and a prospective cohort (n=13,399) and validated on the Drum Tower cohort (n=2745) and the MIMIC cohort (n=296). The optimal model, based on 15 easily accessible admission features, demonstrated an AUROC of 0.8482 (95% CI 0.8328-0.8618) in the derivation cohort. In external validation, the AUROC was 0.8513 (95% CI 0.8221-0.8782) for the Drum Tower cohort and 0.7811 (95% CI 0.7275-0.8343) for the MIMIC cohort. The analysis indicated that high-risk patients identified by the model had a significantly increased mortality risk (odds ratio 2.98, 95% CI 1.784-4.978; P<.001). For these high-risk populations, preoperative use of proton pump inhibitors was an independent protective factor against the occurrence of GIBCG. By contrast, dual antiplatelet therapy and oral anticoagulants were identified as independent risk factors. 
However, in low-risk populations, the use of proton pump inhibitors (χ²₁=0.13, P=.72), dual antiplatelet therapy (χ²₁=0.38, P=.54), and oral anticoagulants (χ²₁=0.15, P=.69) were not significantly associated with the occurrence of GIBCG. CONCLUSIONS Our machine learning model accurately identified patients at high risk of GIBCG, who had a poor prognosis. This approach can aid in early risk stratification and personalized prevention. TRIAL REGISTRATION Chinese Clinical Registry Center ChiCTR2400086050; http://www.chictr.org.cn/showproj.html?proj=226129.
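The two headline metrics in this abstract measure different things: AUROC captures how well predicted risks rank cases above non-cases, while the Brier score captures how close the predicted probabilities are to the observed outcomes. The following is a minimal sketch of both computed from first principles. The function names and the four-patient example are hypothetical and unrelated to the study data.

```python
def auroc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Invented outcomes (1 = bleeding event) and predicted risks.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(auroc(y_true, y_prob))
print(brier_score(y_true, y_prob))
```

A model can rank patients well (high AUROC) and still be poorly calibrated (high Brier score), which is one reason for reporting both when selecting a model for a risk calculator.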
Affiliation(s)
- Jiale Dong
- Beijing Institute of Heart, Lung and Blood Vessel Diseases, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Department of Acute Abdomen Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Department of General Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Zhechuan Jin
- Beijing Institute of Heart, Lung and Blood Vessel Diseases, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Chengxiang Li
- Department of Hepatobiliary and Pancreaticosplenic Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Jian Yang
- Beijing Institute of Heart, Lung and Blood Vessel Diseases, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Department of Hepatobiliary and Pancreaticosplenic Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Yi Jiang
- Department of Cardiovascular Surgery, Nanjing Drum Tower Hospital, Chinese Academy of Medical Science & Peking Union Medical College, Nanjing, China
- Zeqian Li
- Department of Hepatobiliary and Pancreaticosplenic Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Cheng Chen
- Department of Cardiovascular Surgery, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Bo Zhang
- Department of Hepatobiliary and Pancreaticosplenic Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Department of General Surgery, Beijing Luhe Hospital, Capital Medical University, Beijing, China
- Zhaofei Ye
- Department of Cardiovascular Surgery, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Yang Hu
- Department of General Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Jianguo Ma
- School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
- Ping Li
- Department of Cardiovascular Surgery, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Yulin Li
- Beijing Institute of Heart, Lung and Blood Vessel Diseases, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Dongjin Wang
- Department of Cardiovascular Surgery, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Zhili Ji
- Beijing Institute of Heart, Lung and Blood Vessel Diseases, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Department of Acute Abdomen Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Department of General Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Department of Hepatobiliary and Pancreaticosplenic Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
25
Shraim R, Mooney B, Conkrite KL, Hamilton AK, Morin GB, Sorensen PH, Maris JM, Diskin SJ, Sacan A. ImmunoTar-integrative prioritization of cell surface targets for cancer immunotherapy. Bioinformatics 2025; 41:btaf060. [PMID: 39932005 PMCID: PMC11904301 DOI: 10.1093/bioinformatics/btaf060] [Received: 08/07/2024] [Revised: 12/11/2024] [Accepted: 02/07/2025] [Indexed: 02/19/2025] Open
Abstract
MOTIVATION Cancer remains a leading cause of mortality globally. Recent improvements in survival have been facilitated by the development of targeted and less toxic immunotherapies, such as chimeric antigen receptor (CAR)-T cells and antibody-drug conjugates (ADCs). These therapies, effective in treating both pediatric and adult patients with solid and hematological malignancies, rely on the identification of cancer-specific surface protein targets. While technologies like RNA sequencing and proteomics exist to survey these targets, identifying optimal targets for immunotherapies remains a challenge in the field. RESULTS To address this challenge, we developed ImmunoTar, a novel computational tool designed to systematically prioritize candidate immunotherapeutic targets. ImmunoTar integrates user-provided RNA-sequencing or proteomics data with quantitative features from multiple public databases, selected based on predefined criteria, to generate a score representing the gene's suitability as an immunotherapeutic target. We validated ImmunoTar using three distinct cancer datasets, demonstrating its effectiveness in identifying both known and novel targets across various cancer phenotypes. By compiling diverse data into a unified platform, ImmunoTar enables comprehensive evaluation of surface proteins, streamlining target identification and empowering researchers to efficiently allocate resources, thereby accelerating the development of effective cancer immunotherapies. AVAILABILITY AND IMPLEMENTATION Code and data to run and test ImmunoTar are available at https://github.com/sacanlab/immunotar.
Affiliation(s)
- Rawan Shraim
- Division of Oncology and Center for Childhood Cancer Research, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- School of Biomedical Engineering, Science and Health System, Drexel University, Philadelphia, PA 19104, United States
- Brian Mooney
- Department of Molecular Oncology, BC Cancer Research Institute, Vancouver, BC V5Z 0B4, Canada
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Vancouver, BC V5Z 4S6, Canada
- Karina L Conkrite
- Division of Oncology and Center for Childhood Cancer Research, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- Amber K Hamilton
- Division of Oncology and Center for Childhood Cancer Research, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- Gregg B Morin
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Vancouver, BC V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Poul H Sorensen
- Department of Molecular Oncology, BC Cancer Research Institute, Vancouver, BC V5Z 0B4, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- John M Maris
- Division of Oncology and Center for Childhood Cancer Research, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Sharon J Diskin
- Division of Oncology and Center for Childhood Cancer Research, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Ahmet Sacan
- School of Biomedical Engineering, Science and Health System, Drexel University, Philadelphia, PA 19104, United States
26
Thai NH, Post B, Young GJ, Noor-E-Alam M. Hospital-Physician Integration and Cardiac Rehabilitation Following Major Cardiovascular Events. JAMA Netw Open 2025; 8:e2462580. [PMID: 40029658 PMCID: PMC11877175 DOI: 10.1001/jamanetworkopen.2024.62580] [Received: 10/04/2024] [Accepted: 12/19/2024] [Indexed: 03/05/2025] Open
Abstract
Importance Cardiac rehabilitation (CR) is a medically supervised program designed to improve heart health after a cardiac event. Despite its demonstrated clinical benefits, CR participation among eligible patients remains poor due to low referral rates and individual barriers to care. Objectives To evaluate CR participation by patients who receive care from hospital-integrated physicians compared with independent physicians, and subsequently, to examine CR and recurrent cardiac hospitalizations. Design, Setting, and Participants This retrospective cohort study evaluated Medicare Part A and Part B claims data from calendar years 2016 to 2019. All analyses were conducted between January 1 and April 30, 2024. Patients were included if they had a qualifying event for CR between 2017 and 2018, and qualifying events were identified using diagnosis codes on inpatient claims and procedure codes on outpatient and carrier claims. Eligible patients also had to continuously enroll in fee-for-service Medicare for 12 months or more before and after the index event. Physicians' integration status and patients' CR participation were determined during the 12-month follow-up period. The study covariates were ascertained during the 12 months before the index event. Exposure Hospital-integration status of the treating physician during follow-up. Main Outcomes and Measures Postindex CR participation was determined by qualifying procedure codes on outpatient and carrier claims. Results The study consisted of 28 596 Medicare patients eligible for CR. Their mean (SD) age was 74.0 (9.6) years; 16 839 (58.9%) were male. A total of 9037 patients (31.6%) were treated by a hospital-integrated physician, of which 2995 (33.1%) received CR during follow-up. Logistic regression via propensity score weighting showed that having a hospital-integrated physician was associated with an 11% increase in the odds of receiving CR (odds ratio [OR], 1.11; 95% CI, 1.05-1.18). 
Additionally, CR participation was associated with a 14% decrease in the odds of recurrent cardiovascular-related hospitalizations (OR, 0.86; 95% CI, 0.81-0.91). Conclusions and Relevance The findings of this cohort study suggest that hospital integration has the potential to facilitate greater CR participation and improve heart care. Several factors may help explain this positive association, including enhanced care coordination and value-based payment policies. Further research is needed to assess the association of integration with other appropriate high-quality care activities.
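The reported odds ratios come from a logistic regression fitted with propensity score weights. The following is a simplified sketch of the idea using inverse probability weights and a weighted 2x2 table, which matches a weighted logistic regression only when treatment is the sole covariate. Everything here is an assumption for illustration: the abstract does not state which weighting scheme was used, the function names are hypothetical, and the toy data and propensity scores are invented.

```python
def ipw_odds_ratio(treated, outcome, propensity):
    """Odds ratio of the outcome (treated vs untreated) under inverse
    probability weights: 1/ps for treated, 1/(1 - ps) for untreated."""
    cells = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
    for t, y, ps in zip(treated, outcome, propensity):
        weight = 1.0 / ps if t == 1 else 1.0 / (1.0 - ps)
        cells[(t, y)] += weight
    odds_treated = cells[(1, 1)] / cells[(1, 0)]
    odds_control = cells[(0, 1)] / cells[(0, 0)]
    return odds_treated / odds_control


def odds_ratio_to_percent_change(odds_ratio):
    """Express an odds ratio as a percent change in the odds."""
    return (odds_ratio - 1.0) * 100.0


# Invented data: 1 = integrated physician / received CR.
treated = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 1, 0, 1, 0, 0, 0]
propensity = [0.5] * 8  # equal scores, so weighting leaves the crude OR unchanged

print(ipw_odds_ratio(treated, outcome, propensity))
print(odds_ratio_to_percent_change(1.11))  # OR for CR receipt in the abstract
print(odds_ratio_to_percent_change(0.86))  # OR for recurrent hospitalization
```

The last two lines show how the abstract's figures are read: an odds ratio of 1.11 corresponds to roughly 11% higher odds and 0.86 to roughly 14% lower odds. These are changes in odds, not in probability.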
Affiliation(s)
- Ngoc H. Thai
- Center for Health Policy and Healthcare Research, Northeastern University, Boston, Massachusetts
- Brady Post
- Center for Health Policy and Healthcare Research, Northeastern University, Boston, Massachusetts
- Department of Health Sciences, Bouve College of Health Sciences, Northeastern University, Boston, Massachusetts
- Gary J. Young
- Center for Health Policy and Healthcare Research, Northeastern University, Boston, Massachusetts
- Department of Health Sciences, Bouve College of Health Sciences, Northeastern University, Boston, Massachusetts
- D’Amore McKim School of Business, Northeastern University, Boston, Massachusetts
- Md. Noor-E-Alam
- Center for Health Policy and Healthcare Research, Northeastern University, Boston, Massachusetts
- Department of Mechanical and Industrial Engineering, Northeastern University, Boston, Massachusetts
27
Khanduja A, Mohanty D. SProtFP: a machine learning-based method for functional classification of small ORFs in prokaryotes. NAR Genom Bioinform 2025; 7:lqae186. [PMID: 39781515 PMCID: PMC11704790 DOI: 10.1093/nargab/lqae186] [Received: 08/16/2024] [Revised: 11/07/2024] [Accepted: 12/17/2024] [Indexed: 01/12/2025] Open
Abstract
Small proteins (≤100 amino acids) play important roles across all life forms, ranging from unicellular bacteria to higher organisms. In this study, we have developed SProtFP, a machine learning-based method for functional annotation of prokaryotic small proteins into selected functional categories. SProtFP uses independent artificial neural networks (ANNs) trained using a combination of physicochemical descriptors for classifying small proteins into antitoxin type 2, bacteriocin, DNA-binding, metal-binding, ribosomal protein, RNA-binding, type 1 toxin and type 2 toxin proteins. We have also trained a model for identification of small open reading frame (smORF)-encoded antimicrobial peptides (AMPs). Comprehensive benchmarking of SProtFP revealed an average area under the receiver operating characteristic curve (ROC-AUC) of 0.92 during 10-fold cross-validation and ROC-AUC values of 0.94 and 0.93 on held-out balanced and imbalanced test sets, respectively. Utilizing our method to annotate bacterial isolates from the human gut microbiome, we could identify thousands of remote homologs of known small protein families and assign putative functions to uncharacterized proteins. This highlights the utility of SProtFP for large-scale functional annotation of microbiome datasets, especially in cases where sequence homology is low. SProtFP is freely available at http://www.nii.ac.in/sprotfp.html and can be combined with genome annotation tools such as ProsmORF-pred to uncover the functional repertoire of novel small proteins in bacteria.
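For readers unfamiliar with the ROC-AUC metric quoted above, the following is a minimal sketch of how it can be computed from classifier scores. It uses the pairwise-ranking definition (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half). The function name and the toy labels and scores are assumptions for illustration and are not part of SProtFP.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels.
    scores: iterable of classifier scores (higher = more likely positive).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example with hypothetical scores.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

This quadratic-time form is only meant to make the definition explicit; rank-based implementations are preferable for large test sets.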
Affiliation(s)
- Akshay Khanduja
- National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi 110067, India
| | - Debasisa Mohanty
- National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi 110067, India
| |
|
28
|
Petmezas G, Papageorgiou VE, Vassilikos V, Pagourelias E, Tachmatzidis D, Tsaklidis G, Katsaggelos AK, Maglaveras N. Enhanced heart failure mortality prediction through model-independent hybrid feature selection and explainable machine learning. J Biomed Inform 2025; 163:104800. [PMID: 39956346 DOI: 10.1016/j.jbi.2025.104800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2024] [Revised: 01/02/2025] [Accepted: 01/23/2025] [Indexed: 02/18/2025]
Abstract
Heart failure (HF) remains a significant public health challenge with high mortality rates. Machine learning (ML) techniques offer a promising approach to predict HF mortality, potentially improving clinical outcomes. However, the effectiveness of these techniques heavily depends on the quality and relevance of the features used. This study introduces a novel hybrid feature selection methodology that combines Extremely Randomized Trees (Extra-Trees) and non-linear correlation measures to enhance 1-year all-cause mortality prediction in HF patients using echocardiographic and key demographic data. Unlike existing feature selection methods that are often tied to specific ML models and produce inconsistent feature sets across different algorithms, our proposed approach is model-independent, ensuring robustness and generalizability. Moreover, the optimal number of predictive features is identified through loss graph inspection, leading to a compact and highly informative subset of seven features. We trained and evaluated seven widely-used ML models on both the full feature set and the selected subset, finding that most models maintained or improved their predictive performance despite an 80% reduction in features. Model interpretability was enhanced using SHapley Additive exPlanations (SHAP), allowing for a detailed examination of how individual features influence predictions. To further assess its effectiveness, we compared our methodology against widely known feature selection techniques across all seven ML models. The results underscore the superiority of our proposed feature set in accurately predicting HF mortality over conventional methods, offering new opportunities for personalized management strategies based on a streamlined and explainable feature subset.
Affiliation(s)
- Georgios Petmezas
- 2nd Department of Obstetrics and Gynecology, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece.
| | | | - Vassilios Vassilikos
- 3rd Department of Cardiology, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - Efstathios Pagourelias
- 3rd Department of Cardiology, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - Dimitrios Tachmatzidis
- 3rd Department of Cardiology, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - George Tsaklidis
- Department of Mathematics, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - Aggelos K Katsaggelos
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA
| | - Nicos Maglaveras
- 2nd Department of Obstetrics and Gynecology, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
| |
|
29
|
Al‐Mamun HA, Danilevicz MF, Marsh JI, Gondro C, Edwards D. Exploring genomic feature selection: A comparative analysis of GWAS and machine learning algorithms in a large-scale soybean dataset. THE PLANT GENOME 2025; 18:e20503. [PMID: 39253773 PMCID: PMC11726426 DOI: 10.1002/tpg2.20503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/12/2024] [Revised: 07/15/2024] [Accepted: 07/15/2024] [Indexed: 09/11/2024]
Abstract
The surge in high-throughput technologies has empowered the acquisition of vast genomic datasets, prompting the search for genetic markers and biomarkers relevant to complex traits. However, grappling with the inherent complexities of high dimensionality and sparsity within these datasets poses formidable hurdles. The immense number of features and their potential redundancy demand efficient strategies for extracting pertinent information and identifying significant markers. Feature selection is important in large genomic data as it helps in enhancing interpretability and computational efficiency. This study focuses on addressing these challenges through a comprehensive investigation into genomic feature selection methodologies, employing a rich soybean (Glycine max L. Merr.) dataset comprising 966 lines with over 5.5 million single nucleotide polymorphisms. Emphasizing the "small n large p" dilemma prevalent in contemporary genomic studies, we compared the efficacy of traditional genome-wide association studies (GWAS) with two prominent machine learning tools, random forest and extreme gradient boosting, in pinpointing predictive features. Utilizing the expansive soybean dataset, we assessed the performance of these methodologies in selecting features that optimize predictive modeling for various phenotypes. By constructing predictive models based on the selected features, we ascertain the comparative prediction accuracies, thereby illuminating the strengths and limitations of these feature selection methodologies in the realm of genomic data analysis.
Affiliation(s)
- Hawlader A. Al‐Mamun
- Centre for Applied Bioinformatics and School of Biological Sciences, University of Western Australia, Perth, Western Australia, Australia
| | - Monica F. Danilevicz
- Centre for Applied Bioinformatics and School of Biological Sciences, University of Western Australia, Perth, Western Australia, Australia
| | - Jacob I. Marsh
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Cedric Gondro
- Department of Animal Science, Michigan State University, East Lansing, Michigan, USA
| | - David Edwards
- Centre for Applied Bioinformatics and School of Biological Sciences, University of Western Australia, Perth, Western Australia, Australia
| |
|
30
|
Wade BSC, Pindale R, Luccarelli J, Li S, Meisner RC, Seiner SJ, Camprodon JA, Henry ME. Prediction of individual treatment allocation between electroconvulsive therapy or ketamine using the Personalized Advantage Index. NPJ Digit Med 2025; 8:127. [PMID: 40016503 PMCID: PMC11868618 DOI: 10.1038/s41746-025-01523-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2024] [Accepted: 02/17/2025] [Indexed: 03/01/2025] Open
Abstract
Electroconvulsive therapy (ECT) and ketamine are effective treatments for depression; however, evidence-based guidelines are needed to inform individual treatment selection. We adapted the Personalized Advantage Index (PAI) using machine learning to predict optimal treatment assignment to ECT or ketamine using EHR data on 2506 ECT and 196 ketamine patients. Depressive symptoms were evaluated using the Quick Inventory of Depressive Symptomatology (QIDS) before and during acute treatment. Propensity score matching across treatments was used to address confounding by indication, yielding a sample of 392 patients (n = 196 per treatment). Models predicted differential minimum QIDS scores (min-QIDS) over acute treatment using pretreatment EHR measures, and SHAP values identified prescriptive predictors. Patients with large PAI scores who received their predicted optimal treatment had significantly lower min-QIDS compared to the non-optimal treatment group (mean difference = 1.19 [95% CI: 0.32, ∞], t = 2.25, q < 0.05, d = 0.26). Our model identified candidate pretreatment factors to provide actionable, effective antidepressant treatment selection guidelines.
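The idea behind the Personalized Advantage Index can be sketched in a few lines: for each patient, an outcome is predicted under each treatment, and the difference between the two predictions indicates which treatment is expected to work better. The sketch below assumes lower min-QIDS is better and uses an arbitrary sign convention (ECT minus ketamine); the function names, the tie-breaking rule, and the example predictions are hypothetical and do not reproduce the authors' model.

```python
def personalized_advantage_index(pred_min_qids_ect, pred_min_qids_ketamine):
    """PAI as the difference between predicted outcomes under two treatments.

    Assumed convention: positive values mean ketamine is predicted to give
    the lower (better) min-QIDS, negative values favor ECT.
    """
    return pred_min_qids_ect - pred_min_qids_ketamine


def recommended_treatment(pai):
    """Treatment with the lower predicted min-QIDS (ties default to ECT here)."""
    return "ketamine" if pai > 0 else "ECT"


def received_optimal(pai, received):
    """True if the treatment actually received matches the recommendation."""
    return received == recommended_treatment(pai)


if __name__ == "__main__":
    # Hypothetical predictions for one patient.
    pai = personalized_advantage_index(12.0, 9.5)
    print(pai, recommended_treatment(pai), received_optimal(pai, "ECT"))
```

The magnitude of the PAI matters as well as its sign: the abstract reports a benefit for patients with large PAI scores, for whom the predicted difference between treatments is most pronounced.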
Affiliation(s)
- Benjamin S C Wade
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
| | - Ryan Pindale
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - James Luccarelli
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Shuang Li
- Department of Psychiatry, McLean Hospital, Belmont, MA, USA
| | - Robert C Meisner
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Department of Psychiatry, McLean Hospital, Belmont, MA, USA
| | | | - Joan A Camprodon
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Michael E Henry
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| |
|
31
|
Alwakid G, Ul Haq F, Tariq N, Humayun M, Shaheen M, Alsadun M. Optimized machine learning framework for cardiovascular disease diagnosis: a novel ethical perspective. BMC Cardiovasc Disord 2025; 25:123. [PMID: 39979842 PMCID: PMC11844188 DOI: 10.1186/s12872-025-04550-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2024] [Accepted: 02/05/2025] [Indexed: 02/22/2025] Open
Abstract
Alignment of advanced cutting-edge technologies such as Artificial Intelligence (AI) has emerged as a significant driving force to achieve greater precision and timeliness in identifying cardiovascular diseases (CVDs). However, it is difficult to achieve high accuracy and reliability in CVD diagnostics due to complex clinical data and the selection and modeling process of useful features. Therefore, this paper studies advanced AI-based feature selection techniques and the application of AI technologies in CVD classification. It applies methodologies such as Chi-square, Info Gain, Forward Selection, and Backward Elimination to distill cardiovascular health indicators into a refined eight-feature subset. This study emphasizes ethical considerations, including transparency, interpretability, and bias mitigation. This is achieved by employing unbiased datasets, fair feature selection techniques, and rigorous validation metrics to ensure fairness and trustworthiness in the AI-based diagnostic process. In addition, the integration of various Machine Learning (ML) models, encompassing Random Forest (RF), XGBoost, Decision Trees (DT), and Logistic Regression (LR), facilitates a comprehensive exploration of predictive performance. Among this diverse range of models, XGBoost stands out as the top performer, achieving exceptional scores with a 99% accuracy rate, 100% recall, 99% F1-measure, and 99% precision. Furthermore, we venture into dimensionality reduction, applying Principal Component Analysis (PCA) to the eight-feature subset, effectively refining it to a compact six-attribute feature subset. Once again, XGBoost shines as the model of choice, yielding outstanding results. It achieves accuracy, recall, F1-measure, and precision scores of 98%, 100%, 98%, and 97%, respectively, when applied to the feature subset derived from the combination of Chi-square and Forward Selection methods.
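As an illustration of the Chi-square filter named above, the sketch below scores a single binary feature against a binary label using the Pearson chi-square statistic on the 2x2 contingency table. It is a simplified assumption about how such a filter might be applied: the function name and toy vectors are hypothetical, and the paper's actual preprocessing and feature encoding are not described in the abstract.

```python
def chi_square_2x2(feature, label):
    """Pearson chi-square statistic for a binary feature vs. a binary label.

    Larger values indicate a stronger association, so features can be
    ranked by this score and the top-scoring ones kept.
    """
    if len(feature) != len(label):
        raise ValueError("feature and label must have the same length")
    n = len(feature)

    # Observed counts for each (feature value, label value) cell.
    observed = {(f, l): 0 for f in (0, 1) for l in (0, 1)}
    for f, l in zip(feature, label):
        observed[(f, l)] += 1

    statistic = 0.0
    for f in (0, 1):
        for l in (0, 1):
            row_total = observed[(f, 0)] + observed[(f, 1)]
            col_total = observed[(0, l)] + observed[(1, l)]
            expected = row_total * col_total / n
            if expected > 0:
                statistic += (observed[(f, l)] - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Hypothetical toy data: the first feature tracks the label, the second does not.
    label = [1, 1, 0, 0]
    print(chi_square_2x2([1, 1, 0, 0], label))
    print(chi_square_2x2([1, 0, 1, 0], label))
```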
Affiliation(s)
- Ghadah Alwakid
- Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
| | - Farman Ul Haq
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad, Pakistan
| | - Noshina Tariq
- Department of Artificial Intelligence and Data Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan
| | - Mamoona Humayun
- Department of Computing, School of Arts Humanities and Social Sciences, University of Roehampton, London, UK.
| | - Momina Shaheen
- Department of Computing, School of Arts Humanities and Social Sciences, University of Roehampton, London, UK
| | - Marwa Alsadun
- Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
| |
|
32
|
Al-Ahmari S, Nadeem F. Improving Surgical Site Infection Prediction Using Machine Learning: Addressing Challenges of Highly Imbalanced Data. Diagnostics (Basel) 2025; 15:501. [PMID: 40002652 PMCID: PMC11854898 DOI: 10.3390/diagnostics15040501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2025] [Revised: 02/13/2025] [Accepted: 02/17/2025] [Indexed: 02/27/2025] Open
Abstract
Background: Surgical site infections (SSIs) lead to higher hospital readmission rates and healthcare costs, representing a significant global healthcare burden. Machine learning (ML) has demonstrated potential in predicting SSIs; however, the challenge of addressing imbalanced class ratios remains. Objectives: The aim of this study is to evaluate and enhance the predictive capabilities of machine learning models for SSIs by assessing the effects of feature selection, resampling techniques, and hyperparameter optimization. Methods: Using routine SSI surveillance data from multiple hospitals in Saudi Arabia, we analyzed a dataset of 64,793 surgical patients, of whom 1632 developed SSI. Seven machine learning algorithms were created and tested: Decision Tree (DT), Gaussian Naive Bayes (GNB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Stochastic Gradient Boosting (SGB), and K-Nearest Neighbors (KNN). We also improved several resampling strategies, such as undersampling and oversampling. Grid search with five-fold cross-validation was employed for comprehensive hyperparameter optimization, in conjunction with balanced sampling techniques. Features were selected using a filter method based on their relationships with the target variable. Results: Our findings revealed that RF achieves the highest performance, with an MCC of 0.72. The synthetic minority oversampling technique (SMOTE) is the best-performing resampling technique, consistently enhancing the performance of most machine learning models, except for LR and GNB. LR struggles with class imbalance due to its linear assumptions and bias toward the majority class, while GNB's reliance on feature independence and Gaussian distribution makes it unreliable for under-represented minority classes. For computational efficiency, the Instance Hardness Threshold (IHT) offers a viable alternative undersampling technique, though it may compromise performance to some extent.
Conclusions: This study underscores the potential of ML models as effective tools for assessing SSI risk, warranting further clinical exploration to improve patient outcomes. By employing advanced ML techniques and robust validation methods, these models demonstrate promising accuracy and reliability in predicting SSI events, even in the face of significant class imbalances. In addition, using MCC in this study ensures a more reliable and robust evaluation of the model's predictive performance, particularly in the presence of an imbalanced dataset, where other metrics may fail to provide an accurate evaluation.
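The point about MCC on imbalanced data can be made concrete with the cohort sizes given in the abstract (64,793 patients, 1632 with SSI). The sketch below is a minimal illustration, not the authors' evaluation code: it shows that a trivial classifier predicting "no SSI" for every patient reaches high accuracy yet an MCC of zero. Returning 0 when the MCC denominator is zero is a common convention and is assumed here.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (assumed convention),
    which is the case for a classifier that predicts a single class.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    total, positives = 64_793, 1_632      # cohort sizes from the abstract
    negatives = total - positives
    # Trivial classifier: predicts "no SSI" for everyone.
    print(accuracy(tp=0, fp=0, tn=negatives, fn=positives))
    print(mcc(tp=0, fp=0, tn=negatives, fn=positives))
```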
Affiliation(s)
- Salha Al-Ahmari
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Department of Computer and Information Systems, Applied College, King Khalid University, Abha 61421, Saudi Arabia
| | - Farrukh Nadeem
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
|
33
|
Wang H, Zhang M, Mai L, Li X, Bellou A, Wu L. An effective multi-step feature selection framework for clinical outcome prediction using electronic medical records. BMC Med Inform Decis Mak 2025; 25:84. [PMID: 39962480 PMCID: PMC11834488 DOI: 10.1186/s12911-025-02922-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2024] [Accepted: 02/05/2025] [Indexed: 02/20/2025] Open
Abstract
BACKGROUND Identifying key variables is essential for developing clinical outcome prediction models based on high-dimensional electronic medical records (EMR). However, despite the abundance of feature selection (FS) methods available, challenges remain in choosing the most appropriate method, deciding how many top-ranked variables to include, and ensuring these selections are meaningful from a medical perspective. METHODS We developed a practical multi-step feature selection (FS) framework that integrates data-driven statistical inference with a knowledge verification strategy. This framework was validated using two distinct EMR datasets targeting different clinical outcomes. The first cohort, sourced from the Medical Information Mart for Intensive Care III (MIMIC-III), focused on predicting acute kidney injury (AKI) in ICU patients. The second cohort, drawn from the MIMIC-IV Emergency Department (MIMIC-IV-ED), aimed to estimate in-hospital mortality (IHM) for patients transferred from the ED to the ICU. We employed various machine learning (ML) methods and conducted a comparative analysis considering accuracy, stability, similarity, and interpretability. The effectiveness of our FS framework was evaluated using discrimination and calibration metrics, with SHAP applied to enhance the interpretability of model decisions. RESULTS Cohort 1 comprised 48,780 ICU encounters, of which 8,883 (18.21%) developed AKI. Cohort 2 included 29,197 transfers from the ED to the ICU, with 3,219 (11.03%) resulting in IHM. Among the ten ML methods evaluated, the tree-based ensemble method achieved the highest accuracy. As the number of top-ranking features increased, the models' accuracy began to stabilize, while feature subset stability (considering sample variations) and inter-method feature similarity reached optimal levels, confirming the validity of the FS framework. 
The integration of interpretative methods and expert knowledge in the final step further improved feature interpretability. The FS framework effectively reduced the number of features (e.g., from 380 to 35 for Cohort 1, and from 273 to 54 for Cohort 2) without significantly affecting prediction performance (Delong test, p > 0.05). CONCLUSION The multi-step FS method developed in this study successfully reduces the dimensionality of features in EMR while preserving the accuracy of clinical outcome prediction. Furthermore, it improves the interpretability of risk factors by incorporating expert knowledge validation.
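The abstract does not say how feature subset stability or inter-method similarity was measured. One common choice, shown here purely as an assumption, is the Jaccard index between selected feature sets, averaged over all pairs of resampling runs (stability) or over all pairs of FS methods (similarity). The feature names below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index between two feature subsets (1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(subsets):
    """Average Jaccard index over all pairs of feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical top-ranked features from three resampled runs.
    runs = [
        {"creatinine", "urine_output", "age"},
        {"creatinine", "urine_output", "lactate"},
        {"creatinine", "age", "lactate"},
    ]
    print(mean_pairwise_jaccard(runs))
```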
Affiliation(s)
- Hongnian Wang
- School of Management, Jinan University, Guangzhou, 510632, China
- Key Laboratory of Digital-Intelligent Disease Surveillance and Health Governance, North Sichuan Medical College, Nanchong, 637100, China
| | - Mingyang Zhang
- School of Social Work, Henan Normal University, Xinxiang, 453007, China
| | - Liyi Mai
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China
| | - Xin Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China.
| | - Abdelouahab Bellou
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China.
- Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China.
- Department of Emergency Medicine, Wayne State University School of Medicine, Detroit, MI, 48201, USA.
- Global Network on Emergency Medicine, Brookline, MA, USA.
| | - Lijuan Wu
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China.
- Medical Research Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China.
| |
|
34
|
Chellappan D, Rajaguru H. Generalizability of machine learning models for diabetes detection a study with nordic islet transplant and PIMA datasets. Sci Rep 2025; 15:4479. [PMID: 39915538 PMCID: PMC11802925 DOI: 10.1038/s41598-025-87471-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2024] [Accepted: 01/20/2025] [Indexed: 02/09/2025] Open
Abstract
Diabetes Mellitus (DM) is a global health challenge, and accurate early detection is critical for effective management. The study explores the potential of machine learning for improved diabetes prediction using microarray gene expression data and the PIMA data set. The researchers use hybrid feature extraction methods, namely Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), followed by metaheuristic feature selection algorithms: Harmonic Search (HS), Dragonfly Algorithm (DFA), and Elephant Herding Algorithm (EHA). System performance is evaluated using the following classifiers: Non-Linear Regression (NLR), Linear Regression (LR), Gaussian Mixture Model (GMM), Expectation Maximization (EM), Bayesian Linear Discriminant Analysis (BLDA), Softmax Discriminant Classifier (SDC), and Support Vector Machine with Radial Basis Function kernel (SVM-RBF), on two publicly available datasets, namely the Nordic Islet Transplant Program (NITP) and the PIMA Indian Diabetes Dataset (PIDD). The findings demonstrate significant improvement in classification accuracy compared to using all genes. On the Nordic islet transplant dataset, the combined ABC-PSO feature extraction with EHO feature selection achieved the highest accuracy of 97.14%, surpassing the 94.28% accuracy obtained with ABC alone and EHO selection. Similarly, on the PIMA Indian diabetes dataset, the ABC-PSO and EHO combination achieved the best accuracy of 98.13%, exceeding the 95.45% accuracy with ABC and DFA selection. These results highlight the effectiveness of our proposed approach in identifying the most informative features for accurate diabetes prediction. It is observed that the parametric values attained for the two datasets are nearly the same. Therefore, this research indicates the robustness of the feature extraction (FE) and feature selection (FS) methods, together with the classifiers, across two different datasets.
Affiliation(s)
- Dinesh Chellappan
- Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu, 641 407, India.
| | - Harikumar Rajaguru
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, 638 401, India
| |
Collapse
|
35
|
Yang Y, Hu L, Chen Y, Gu W, Lin G, Xie Y, Nie S. Identification of Parkinson's disease using MRI and genetic data from the PPMI cohort: an improved machine learning fusion approach. Front Aging Neurosci 2025; 17:1510192. [PMID: 39968123 PMCID: PMC11832485 DOI: 10.3389/fnagi.2025.1510192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2024] [Accepted: 01/20/2025] [Indexed: 02/20/2025] Open
Abstract
Objective This study aims to leverage advanced machine learning techniques to develop and validate novel MRI imaging features and single nucleotide polymorphism (SNP) gene data fusion methodologies to enhance the early identification and diagnosis of Parkinson's disease (PD). Methods We leveraged a comprehensive dataset from the Parkinson's Progression Markers Initiative (PPMI), which includes high-resolution neuroimaging data, genetic single-nucleotide polymorphism (SNP) profiles, and detailed clinical information from individuals with early-stage PD and healthy controls. Two multi-modal fusion strategies were used: feature-level fusion, where we employed a hybrid feature selection algorithm combining Fisher discriminant analysis, an ensemble Lasso (EnLasso) method, and partial least squares (PLS) regression to identify and integrate the most informative features from neuroimaging and genetic data; and decision-level fusion, where we developed an adaptive ensemble stacking (AE_Stacking) model to synergistically integrate the predictions from multiple base classifiers trained on individual modalities. Results The AE_Stacking model achieved the highest average balanced accuracy of 95.36% and an area under the receiver operating characteristic curve (AUC) of 0.974, significantly outperforming feature-level fusion and single-modal models (p < 0.05). Furthermore, by analyzing the features selected across multiple iterations of our models, we identified stable brain region features [lh 6r (FD) and rh 46 (GI)] and key genetic markers (rs356181 and rs2736990 SNPs within the SNCA gene region; rs213202 SNP within the VPS52 gene region), highlighting their potential as reliable early diagnostic indicators for the disease. Conclusion The AE_Stacking model, trained on MRI and genetic data, demonstrates potential in distinguishing individuals with PD.
Our findings enhance understanding of the disease and advance us toward the goal of precision medicine for neurodegenerative disorders.
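Decision-level fusion, as opposed to feature-level fusion, combines the outputs of per-modality classifiers rather than their inputs. The sketch below shows weighted soft voting, one of the simplest forms of decision-level fusion. It is not the AE_Stacking model: in stacking, a meta-learner learns how to combine the base predictions, whereas the weights, modality names, and probabilities here are fixed, hypothetical values.

```python
def fuse_decisions(prob_by_modality, weights):
    """Weighted average of per-modality predicted probabilities of PD.

    prob_by_modality: dict mapping modality name -> predicted probability.
    weights: dict mapping modality name -> non-negative weight.
    """
    total_weight = sum(weights[m] for m in prob_by_modality)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[m] * p for m, p in prob_by_modality.items())
    return weighted_sum / total_weight


def classify(probability, threshold=0.5):
    """Binary decision from a fused probability (1 = PD, 0 = control)."""
    return 1 if probability >= threshold else 0


if __name__ == "__main__":
    # Hypothetical base-classifier outputs for one subject.
    fused = fuse_decisions({"mri": 0.8, "snp": 0.4}, {"mri": 0.75, "snp": 0.25})
    print(fused, classify(fused))
```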
Affiliation(s)
- Yifeng Yang
- Department of Medical Imaging, Huadong Hospital, Fudan University, Shanghai, China
| | - Liangyun Hu
- Center for Functional Neurosurgery, RuiJin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yang Chen
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Weidong Gu
- Department of Anesthesiology, Huadong Hospital, Fudan University, Shanghai, China
| | - Guangwu Lin
- Department of Medical Imaging, Huadong Hospital, Fudan University, Shanghai, China
| | - YuanZhong Xie
- Medical Imaging Center, Taian Central Hospital, Shandong, China
| | - Shengdong Nie
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| |
|
36
|
Agogo GO, Mwambi H. Application of machine learning algorithms in an epidemiologic study of mortality. Ann Epidemiol 2025; 102:36-47. [PMID: 39756630 DOI: 10.1016/j.annepidem.2024.12.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Revised: 12/20/2024] [Accepted: 12/29/2024] [Indexed: 01/07/2025]
Abstract
PURPOSE Epidemiologic studies are important in assessing risk factors of mortality. Machine learning (ML) is efficient in analyzing multidimensional data to unravel dependencies between risk factors and health outcomes. METHODS Using a representative sample from the National Health and Nutrition Examination Survey data collected from 2009 to 2016 linked to the National Death Index public-use mortality data through December 31, 2019, we applied logistic, random forests, k-Nearest Neighbors, multivariate adaptive regression splines, support vector machines, extreme gradient boosting, and super learner ML algorithms to study risk factors of all-cause mortality. We evaluated the algorithms using area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and negative predictive value (NPV), among other metrics, and interpreted the results using SHapley Additive exPlanation. RESULTS The AUC-ROC ranged from 0.80 to 0.87. The super learner had the highest AUC-ROC of 0.87 (95% CI, 0.86-0.88), sensitivity of 0.86 (95% CI, 0.84-0.88) and NPV of 0.98 (95% CI, 0.98-0.99). Key risk factors of mortality included advanced age, larger waist circumference, male sex, and systolic blood pressure. Being married, high annual household income, and high education level were linked with low risk of mortality. CONCLUSIONS Machine learning can be used to identify risk factors of mortality, which is critical for individualized targeted interventions in epidemiologic studies.
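Sensitivity and NPV, the two confusion-matrix metrics reported above, answer different questions: sensitivity is the share of actual deaths the model flags, while NPV is the share of "predicted survivor" cases that did in fact survive. A minimal sketch follows. The counts are hypothetical, chosen only so that the outputs land near the reported magnitudes; they are not study data.

```python
def sensitivity(tp, fn):
    """True positive rate: flagged deaths / all actual deaths."""
    return tp / (tp + fn)


def negative_predictive_value(tn, fn):
    """Share of negative predictions that are truly negative."""
    return tn / (tn + fn)


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts (not from the study).
    tp, fn, tn, fp = 86, 14, 700, 200
    print(sensitivity(tp, fn))
    print(negative_predictive_value(tn, fn))
```

Because NPV depends on outcome prevalence, a high NPV is easier to reach when the outcome is relatively uncommon, so it is best read alongside sensitivity rather than on its own.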
Affiliation(s)
- George O Agogo
- StatsDecide Analytics and Consulting Ltd, P.O Box 17432- 20100, Nakuru, Kenya.
| | - Henry Mwambi
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg Campus, Pietermaritzburg, South Africa
| |
|
37
|
Liu X, Chen F, Zhang W, Ma F, Xu P. Machine learning models for easily obtainable descriptors of the electrocatalytic properties of Ag-Pd-Ir nanoalloys toward the formate oxidation reaction. NANOSCALE 2025; 17:2810-2819. [PMID: 39831809 DOI: 10.1039/d4nr03735a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/22/2025]
Abstract
Direct formate fuel cells (DFFCs) have received increasing attention due to their environmentally benign and highly safe characteristics. However, the absence of highly active electrocatalysts for the formate oxidation reaction (FOR) restricts their widespread application. Currently, the design of FOR catalysts, which relies on experimental trial-and-error and high-throughput DFT calculations, is costly and time-consuming. In this study, based on a DFT dataset of FOR overpotentials for 137 Ag-Pd-Ir nanoalloy catalysts, six machine learning (ML) models were trained, where the K-nearest neighbors (KNN) model demonstrated the best performance, with an R² value of 0.94, an MAE value of 0.041 V and an RMSE value of 0.050 V. Using the KNN model, six optimal catalysts with an overpotential of 0.48 V were screened from 310 candidate catalysts, with an MAE value as low as 0.004 V compared to the DFT results, proving the accuracy of the ML model. This work provides a novel strategy to accelerate the design of high-performance catalysts.
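To show what a KNN regressor and the reported error metrics involve, here is a minimal pure-Python sketch: it predicts an overpotential as the mean of the k nearest training points in descriptor space and scores predictions with MAE and RMSE. The single descriptor, the training values, and k = 3 are hypothetical; the abstract does not list the descriptors or hyperparameters the authors used.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the mean over the k nearest training points.

    train_x: list of descriptor tuples; train_y: list of targets (V);
    query: descriptor tuple with the same length as each training tuple.
    """
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


if __name__ == "__main__":
    # Hypothetical one-descriptor training set (descriptor, overpotential in V).
    train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
    train_y = [0.5, 0.6, 0.7, 1.5]
    print(knn_predict(train_x, train_y, (1.0,), k=3))
```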
Affiliation(s)
- Xiaoqing Liu
  - State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
  - School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Fuyi Chen
  - State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
  - School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Wanxuan Zhang
  - State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
  - School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Fanzhe Ma
  - State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
  - School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Peng Xu
  - State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
  - School of Materials Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
38
Wang B, Cai J, Fang L, Ma P, Leung YF. Tensor analysis of animal behavior by matricization and feature selection. bioRxiv 2025:2025.01.28.635088. [PMID: 39975151 PMCID: PMC11838277 DOI: 10.1101/2025.01.28.635088] [Indexed: 02/21/2025]
Abstract
Contemporary neurobehavior research often collects multi-dimensional tensor (MDT) data, consisting of time-series measurements for multiple features from multiple animals subjected to various perturbations. Proper analysis of the MDT data can facilitate the dissection of the underlying neural circuitry driving the behavior. However, many common approaches for MDT analysis, such as tensor decomposition, often yield results that are difficult to interpret and not directly compatible with standard multivariate analysis (MVA), which is designed for simpler, lower-dimensional data structures. To address this issue, dimensionality reduction techniques, including matricization methods such as Index Construction and Feature Concatenation, are applied to transform all or a subset of the features in the MDT into a lower-dimensional tensor, commonly a 2-dimensional tensor (2DT), that is compatible with MVA. However, the matricization methods may exclude information from the MDT features or create too many 2DT features that introduce spurious noise to the downstream analyses. Their impacts on the downstream MVA performance remain elusive. In this study, we systematically evaluated different approaches for matricization and feature selection and their impacts on MVA performance using an MDT dataset of zebrafish visual-motor response collected from wild-types (WTs) and visually-impaired mutants. We matricized the MDT dataset using various Index Construction and Feature Concatenation methods, then identified informative 2DT features using the filter and embedded methods. To evaluate these feature-selection approaches, we conducted a classification task distinguishing WT and visually-impaired zebrafish by multiple classifiers. We then assessed classification performance with cross-validation and holdout validation. We found that most classifiers performed the best when using all 2DT features matricized by Feature Concatenation and selected by the embedded method. The results also revealed unique behavioral differences between the WTs and visually-impaired mutants that were not identified by standard MVA or MDT analysis. Our results demonstrate the utility of analyzing MDT behavioral data by matricization and feature selection.
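The abstract does not spell out how the two matricization families are implemented. One plausible reading, offered here only as a sketch under that assumption, is that Feature Concatenation lays every feature's time series end to end in a single row per animal, while Index Construction collapses each time series to one summary value. The tensor layout (animals × features × time points) and the choice of the mean as the index are both assumptions made for illustration.

```python
def feature_concatenation(tensor):
    """Flatten (animals x features x time) into one long row per animal."""
    return [[value for series in animal for value in series] for animal in tensor]


def index_construction(tensor):
    """Collapse each feature's time series to a single index (here: its mean)."""
    return [[sum(series) / len(series) for series in animal] for animal in tensor]


# Hypothetical toy tensor: 2 animals, 2 features, 3 time points each.
toy_tensor = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

concatenated = feature_concatenation(toy_tensor)  # 2 rows x 6 columns
indexed = index_construction(toy_tensor)          # 2 rows x 2 columns
print(concatenated)
print(indexed)
```

Under this reading the trade-off described in the abstract is visible in the shapes: concatenation keeps every measurement but multiplies the column count by the number of time points, whereas index construction keeps the table small at the cost of discarding temporal detail.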
39
Zemariam AB, Abate BB, Alamaw AW, Lake ES, Yilak G, Ayele M, Tilahun BD, Ngusie HS. Prediction of stunting and its socioeconomic determinants among adolescent girls in Ethiopia using machine learning algorithms. PLoS One 2025; 20:e0316452. [PMID: 39854425 PMCID: PMC11760002 DOI: 10.1371/journal.pone.0316452] [Received: 12/11/2023] [Accepted: 12/11/2024] [Indexed: 01/26/2025] Open
Abstract
BACKGROUND Stunting is a vital indicator of chronic undernutrition that reveals a failure to reach linear growth. Investigating growth and nutrition status during adolescence, in addition to infancy and childhood, is crucial. However, the available studies in Ethiopia have usually focused on early childhood and used traditional statistical methods. Therefore, this study aimed to employ multiple machine learning algorithms to identify the most effective model for the prediction of stunting among adolescent girls in Ethiopia. METHODS A total of 3156 weighted samples of adolescent girls aged 15-19 years were used from the 2016 Ethiopian Demographic and Health Survey dataset. The data were pre-processed, and 80% and 20% of the observations were used for training and testing the model, respectively. Eight machine learning algorithms were considered for model building and comparison. The performance of the predictive models was evaluated using standard evaluation metrics in Python. The synthetic minority oversampling technique was used for data balancing and the Boruta algorithm was used to identify the best features. Association rule mining using an Apriori algorithm was employed in R to generate the best rules for the association between the independent features and the target feature. RESULTS The random forest classifier (sensitivity = 81%, accuracy = 77%, precision = 75%, f1-score = 78%, AUC = 85%) outperformed the other ML algorithms considered in this study in predicting stunting. Region, poor wealth index, no formal education, unimproved toilet facility, rural residence, non-use of contraceptive methods, religion, age, no media exposure, occupation, and having one or more children were the top attributes to predict stunting. Association rule mining identified the top seven rules most frequently associated with stunting among adolescent girls in Ethiopia. CONCLUSION The random forest classifier performed best in predicting stunting and identifying its relevant predictors. The results show that machine learning algorithms can accurately predict stunting, making them potentially valuable as decision-support tools for the relevant stakeholders, and that emphasizing the identified predictors could be an important intervention to halt stunting among adolescent girls.
Affiliation(s)
- Alemu Birara Zemariam
  - Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
- Biruk Beletew Abate
  - Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
- Addis Wondmagegn Alamaw
  - Department of Emergency and Critical Care Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
- Eyob Shitie Lake
  - Department of Midwifery, School of Midwifery, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
- Gizachew Yilak
  - Department of Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
- Mulat Ayele
  - Department of Midwifery, School of Midwifery, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
- Befkad Derese Tilahun
  - Department of Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
- Habtamu Setegn Ngusie
  - Department of Health Informatics, School of Public Health, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
40
Lee H, Park MB, Won YJ. AI Machine Learning-Based Diabetes Prediction in Older Adults in South Korea: Cross-Sectional Analysis. JMIR Form Res 2025; 9:e57874. [PMID: 39838554 PMCID: PMC11779598 DOI: 10.2196/57874] [Received: 02/28/2024] [Revised: 12/09/2024] [Accepted: 12/09/2024] [Indexed: 01/23/2025] Open
Abstract
Background Diabetes is prevalent in older adults, and machine learning algorithms could help predict diabetes in this population. Objective This study determined diabetes risk factors among older adults aged ≥60 years using machine learning algorithms and selected an optimized prediction model. Methods This cross-sectional study was conducted on 3084 older adults aged ≥60 years in Seoul from January to November 2023. Data were collected using a mobile app (Gosufit) that measured depression, stress, anxiety, basal metabolic rate, oxygen saturation, heart rate, and average daily step count. Health coordinators recorded data on diabetes, hypertension, hyperlipidemia, chronic obstructive pulmonary disease, percent body fat, and percent muscle. The presence of diabetes was the target variable, with various health indicators as predictors. Machine learning algorithms, including random forest, gradient boosting model, light gradient boosting model, extreme gradient boosting model, and k-nearest neighbors, were employed for analysis. The dataset was split into 70% training and 30% testing sets. Model performance was evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC). Shapley additive explanations (SHAPs) were used for model interpretability. Results Significant predictors of diabetes included hypertension (χ²(1)=197.294; P<.001), hyperlipidemia (χ²(1)=47.671; P<.001), age (mean: diabetes group 72.66 years vs nondiabetes group 71.81 years), stress (mean: diabetes group 42.68 vs nondiabetes group 41.47; t(3082)=-2.858; P=.004), and heart rate (mean: diabetes group 75.05 beats/min vs nondiabetes group 73.14 beats/min; t(3082)=-7.948; P<.001). The extreme gradient boosting model (XGBM) demonstrated the best performance, with an accuracy of 84.88%, precision of 77.92%, recall of 66.91%, F1 score of 72.00, and AUC of 0.7957. The SHAP analysis of the top-performing XGBM revealed key predictors for diabetes: hypertension, age, percent body fat, heart rate, hyperlipidemia, basal metabolic rate, stress, and oxygen saturation. Hypertension strongly increased diabetes risk, while advanced age and elevated stress levels also showed significant associations. Hyperlipidemia and higher heart rates further heightened diabetes probability. These results highlight the importance and directional impact of specific features in predicting diabetes, providing valuable insights for risk stratification and targeted interventions. Conclusions This study focused on modifiable risk factors, providing crucial data for establishing a system for the automated collection of health information and lifelog data from older adults using digital devices at service facilities.
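The F1 score reported for the XGBM is the harmonic mean of its precision and recall, so it can be recomputed from the two percentages given in the abstract. A minimal sketch of that arithmetic, using only the reported values (the function name is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the XGBM, in percent.
xgbm_f1 = f1_score(77.92, 66.91)
print(round(xgbm_f1, 2))
```

The result rounds to the reported 72.00, and because the harmonic mean is pulled toward the smaller input, it sits below the simple average of the two values (72.42).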
Affiliation(s)
- Hocheol Lee
  - Department of Health Administration, College of Software and Digital Healthcare Convergence, Yonsei University, Changjogwan, Yonseidae-gil 1, Wonju, 26493, Republic of Korea, +82 (0) 33-760-2257
- Myung-Bae Park
  - Department of Health Administration, College of Software and Digital Healthcare Convergence, Yonsei University, Changjogwan, Yonseidae-gil 1, Wonju, 26493, Republic of Korea, +82 (0) 33-760-2257
- Young-Joo Won
  - Department of Health Administration, College of Software and Digital Healthcare Convergence, Yonsei University, Changjogwan, Yonseidae-gil 1, Wonju, 26493, Republic of Korea, +82 (0) 33-760-2257
41
Yin L, Viswanathan M, Kurmi Y, Zu Z. Improving quantification accuracy of a nuclear Overhauser enhancement signal at -1.6 ppm at 4.7 T using a machine learning approach. Phys Med Biol 2025; 70:025009. [PMID: 39774035 PMCID: PMC11740009 DOI: 10.1088/1361-6560/ada716] [Received: 09/26/2024] [Revised: 12/16/2024] [Accepted: 01/07/2025] [Indexed: 01/11/2025]
Abstract
Objective. A new nuclear Overhauser enhancement (NOE)-mediated saturation transfer MRI signal at -1.6 ppm, potentially from choline phospholipids and termed NOE(-1.6), has been reported in biological tissues at high magnetic fields. This signal shows promise for detecting brain tumors and strokes. However, its proximity to the water peak and low signal-to-noise ratio make accurate quantification challenging, especially at low fields, due to the difficulty in separating it from direct water saturation and other confounding signals. This study proposes using a machine learning (ML) method to address this challenge. Approach. The ML model was trained on a partially synthetic chemical exchange saturation transfer dataset with a curriculum learning denoising approach. The accuracy of our method in quantifying NOE(-1.6) was validated using tissue-mimicking data from Bloch simulations providing ground truth, with subsequent application to an animal tumor model at 4.7 T. The predictions from the proposed ML method were compared with outcomes from traditional Lorentzian fit and ML models trained on other data types, including measured and fully simulated data. Main results. Our tissue-mimicking validation suggests that our method offers superior accuracy compared to all other methods. The results from animal experiments show that our method, despite variations in training data size or simulation models, produces predictions within a narrower range than the ML method trained on other data types. Significance. The ML method proposed in this work significantly enhances the accuracy and robustness of quantifying NOE(-1.6), thereby expanding the potential for applications of this novel molecular imaging mechanism in low-field environments.
Affiliation(s)
- Leqi Yin
  - Vanderbilt University Institute of Imaging Science, Vanderbilt University Medical Center, Nashville, TN, United States of America
  - School of Engineering, Vanderbilt University, Nashville, TN, United States of America
- Malvika Viswanathan
  - Vanderbilt University Institute of Imaging Science, Vanderbilt University Medical Center, Nashville, TN, United States of America
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, United States of America
- Yashwant Kurmi
  - Vanderbilt University Institute of Imaging Science, Vanderbilt University Medical Center, Nashville, TN, United States of America
  - Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Zhongliang Zu
  - Vanderbilt University Institute of Imaging Science, Vanderbilt University Medical Center, Nashville, TN, United States of America
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, United States of America
  - Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, TN, United States of America
42
Venkatesan A, Basak J, Bahadur RP. pmiRScan: a LightGBM based method for prediction of animal pre-miRNAs. Funct Integr Genomics 2025; 25:9. [PMID: 39786653 DOI: 10.1007/s10142-025-01527-y] [Received: 10/28/2024] [Revised: 12/03/2024] [Accepted: 01/01/2025] [Indexed: 01/12/2025]
Abstract
MicroRNAs (miRNAs) are short endogenous non-coding RNAs that have a significant role in post-transcriptional gene regulation. Identifying new animal precursor miRNAs (pre-miRNAs) and miRNAs is crucial to understanding the role of miRNAs in various biological processes, including the development of diseases. The present study focuses on the development of a Light Gradient Boost (LGB) based method for the classification of animal pre-miRNAs using various sequence and secondary structural features. In various pre-miRNA families, distinct k-mer repeat signatures with a length of three nucleotides have been identified. Out of nine different classifiers that have been trained and tested in the present study, LGB has an overall better performance with an AUROC of 0.959. In comparison with the existing methods, our method 'pmiRScan' has an overall better performance with accuracy of 0.93, sensitivity of 0.86, specificity of 0.95 and F-score of 0.82. Moreover, pmiRScan effectively classifies pre-miRNAs from four distinct taxonomic groups: mammals, nematodes, molluscs and arthropods. We have used our classifier to predict genome-wide pre-miRNAs in humans. We find a total of 313 pre-miRNA candidates using pmiRScan. A total of 180 potential mature miRNAs belonging to 60 distinct miRNA families are extracted from the predicted pre-miRNAs, of which 128 are novel and are not reported in miRBase. These discoveries may enhance our current understanding of miRNAs and their targets in humans. pmiRScan is freely available at http://www.csb.iitkgp.ac.in/applications/pmiRScan/index.php .
Affiliation(s)
- Amrit Venkatesan
  - Computational Structural Biology Lab, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
- Jolly Basak
  - Genomics of Plant Stress Biology Lab, Department of Biotechnology, Visva-Bharati, Santiniketan, West Bengal, 731235, India
- Ranjit Prasad Bahadur
  - Computational Structural Biology Lab, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
  - Bioinformatics Centre, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
43
Wang J, Zhang Z, Wang Y. Utilizing Feature Selection Techniques for AI-Driven Tumor Subtype Classification: Enhancing Precision in Cancer Diagnostics. Biomolecules 2025; 15:81. [PMID: 39858475 PMCID: PMC11763904 DOI: 10.3390/biom15010081] [Received: 12/09/2024] [Revised: 01/02/2025] [Accepted: 01/07/2025] [Indexed: 01/27/2025] Open
Abstract
Cancer's heterogeneity presents significant challenges in accurate diagnosis and effective treatment, including the complexity of identifying tumor subtypes and their diverse biological behaviors. This review examines how feature selection techniques address these challenges by improving the interpretability and performance of machine learning (ML) models in high-dimensional datasets. Feature selection methods (such as filter, wrapper, and embedded techniques) play a critical role in enhancing the precision of cancer diagnostics by identifying relevant biomarkers. The integration of multi-omics data and ML algorithms facilitates a more comprehensive understanding of tumor heterogeneity, advancing both diagnostics and personalized therapies. However, challenges such as ensuring data quality, mitigating overfitting, and addressing scalability remain critical limitations of these methods. Artificial intelligence (AI)-powered feature selection offers promising solutions to these issues by automating and refining the feature extraction process. This review highlights the transformative potential of these approaches while emphasizing future directions, including the incorporation of deep learning (DL) models and integrative multi-omics strategies for more robust and reproducible findings.
Affiliation(s)
- Jihan Wang
  - Yan’an Medical College of Yan’an University, Yan’an 716000, China
- Zhengxiang Zhang
  - Yan’an Medical College of Yan’an University, Yan’an 716000, China
- Yangyang Wang
  - School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
44
Naved BA, Han S, Koss KM, Kando MJ, Wang JJ, Weiss C, Passman MG, Wertheim JA, Luo Y, Zhang ZJ. Multivariate description of gait changes in a mouse model of peripheral nerve injury and trauma. PLoS One 2025; 20:e0312415. [PMID: 39774494 PMCID: PMC11706367 DOI: 10.1371/journal.pone.0312415] [Received: 11/02/2023] [Accepted: 10/05/2024] [Indexed: 01/11/2025] Open
Abstract
OBJECTIVE Animal models of nerve injury are important for studying nerve injury and repair, particularly for interventions that cannot be studied in humans. However, the vast majority of gait analysis in animals has been limited to univariate analysis even though gait data are highly multi-dimensional. As a result, little is known about how various spatiotemporal components of the gait relate to each other in the context of peripheral nerve injury and trauma. We hypothesize that a multivariate characterization of gait will reveal relationships among spatiotemporal components of gait with biological relevance to peripheral nerve injury and trauma. We further hypothesize that legitimate relationships among said components will allow for more accurate classification among distinct gait phenotypes than if attempted with univariate analysis alone. METHODS DigiGait data were collected from mice across groups representing increasing degrees of damage to the neuromusculoskeletal sequence of gait; that is, (a) healthy controls, (b) nerve damage only via total nerve transection + reconnection of the femoral and sciatic nerves, and (c) nerve, muscle, and bone damage via total hind-limb transplantation. Multivariate relationships among the 30+ spatiotemporal measures were evaluated using exploratory factor analysis and forward feature selection to identify the features and latent factors that best described gait phenotypes. The identified features were then used to train classifier models, which were compared to a model trained with features identified using only univariate analysis. RESULTS 10-15 features relevant to describing gait in the context of increasing degrees of traumatic peripheral nerve injury were identified. Factor analysis uncovered relationships among the identified features and enabled the extrapolation of a set of latent factors that further described the distinct gait phenotypes. The latent factors tied to biological differences among the groups (e.g. alterations to the anatomical configuration of the limb due to transplantation or aberrant fine motor function due to peripheral nerve injury). Models trained using the identified features generated values that could be used to distinguish among pathophysiological states with high statistical significance (p < .001) and accuracy (>80%) as compared to univariate analysis alone. CONCLUSION This is the first performance evaluation of a multivariate approach to gait analysis and the first demonstration of superior performance as compared to univariate gait analysis in animals. It is also the first study to use multivariate statistics to characterize and distinguish among different gradations of gait deficit in animals. This study contributes a comprehensive, multivariate characterization pipeline for application in the study of any pathologies in which gait is a quantitative translational outcome metric.
Affiliation(s)
- Bilal A. Naved
  - Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL, United States of America
  - Comprehensive Transplant Center, Department of Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Shuling Han
  - Comprehensive Transplant Center, Department of Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Kyle M. Koss
  - Comprehensive Transplant Center, Department of Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
  - Department of Surgery, College of Medicine, University of Arizona, Tucson, Arizona, United States of America
- Mary J. Kando
  - Behavioral Phenotyping Core, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Jiao-Jing Wang
  - Comprehensive Transplant Center, Department of Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Craig Weiss
  - Behavioral Phenotyping Core, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Maya G. Passman
  - Barnard College, Columbia University, New York, NY, United States of America
- Jason A. Wertheim
  - Department of Surgery, College of Medicine, University of Arizona, Tucson, Arizona, United States of America
- Yuan Luo
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Zheng J. Zhang
  - Comprehensive Transplant Center, Department of Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
45
Nakajo M, Hirahara D, Jinguji M, Hirahara M, Tani A, Nagano H, Takumi K, Kamimura K, Kanzaki F, Yamashita M, Yoshiura T. Applying deep learning-based ensemble model to [18F]-FDG-PET-radiomic features for differentiating benign from malignant parotid gland diseases. Jpn J Radiol 2025; 43:91-100. [PMID: 39254903 PMCID: PMC11717794 DOI: 10.1007/s11604-024-01649-6] [Received: 07/19/2024] [Accepted: 08/26/2024] [Indexed: 09/11/2024]
Abstract
OBJECTIVES To develop and identify machine learning (ML) models using pretreatment 2-deoxy-2-[18F]fluoro-D-glucose ([18F]-FDG)-positron emission tomography (PET)-based radiomic features to differentiate benign from malignant parotid gland diseases (PGDs). MATERIALS AND METHODS This retrospective study included 62 patients with 63 PGDs who underwent pretreatment [18F]-FDG-PET/computed tomography (CT). The lesions were assigned to the training (n = 44) and testing (n = 19) cohorts. In total, 49 [18F]-FDG-PET-based radiomic features were utilized to differentiate benign from malignant PGDs using five different conventional ML algorithmic models (random forest, neural network, k-nearest neighbors, logistic regression, and support vector machine) and the deep learning (DL)-based ensemble ML model. In the training cohort, each conventional ML model was constructed using the five most important features selected by the recursive feature elimination method with tenfold cross-validation and the synthetic minority oversampling technique. The DL-based ensemble ML model was constructed using the five most important features of the bagging and multilayer stacking methods. The areas under the receiver operating characteristic curves (AUCs) and accuracies were used to compare predictive performances. RESULTS In total, 24 benign and 39 malignant PGDs were identified. Metabolic tumor volume and four GLSZM features (GLSZM_ZSE, GLSZM_SZE, GLSZM_GLNU, and GLSZM_ZSNU) were the five most important radiomic features. All five features except GLSZM_SZE were significantly higher in malignant PGDs than in benign ones (each p < 0.05). The DL-based ensemble ML model was the best-performing classifier in the training and testing cohorts (AUC = 1.000, accuracy = 1.000 vs AUC = 0.976, accuracy = 0.947). CONCLUSIONS The DL-based ensemble ML model using [18F]-FDG-PET-based radiomic features can be useful for differentiating benign from malignant PGDs. The DL-based ensemble ML model using [18F]-FDG-PET-based radiomic features can overcome the previously reported limitation of [18F]-FDG-PET/CT scan for differentiating benign from malignant PGDs. The DL-based ensemble ML approach using [18F]-FDG-PET-based radiomic features can provide useful information for managing PGD.
Affiliation(s)
- Masatoyo Nakajo
  - Department of Radiology, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
- Daisuke Hirahara
  - Department of Management Planning Division, Harada Academy, 2-54-4 Higashitaniyama, Kagoshima, 890-0113, Japan
- Megumi Jinguji
  - Department of Radiology, Nanpuh Hospital, 14-3 Nagata, Kagoshima, 892-8512, Japan
- Mitsuho Hirahara
  - Department of Radiology, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
- Atsushi Tani
  - Department of Radiology, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
- Hiromi Nagano
  - Department of Otolaryngology Head and Neck Surgery, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
- Koji Takumi
  - Department of Radiology, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
- Kiyohisa Kamimura
  - Department of Advanced Radiological Imaging, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
- Fumiko Kanzaki
  - Department of Radiology, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
- Masaru Yamashita
  - Department of Otolaryngology Head and Neck Surgery, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
- Takashi Yoshiura
  - Department of Radiology, Graduate School of Medical and Dental Sciences, Kagoshima University, 8-35-1 Sakuragaoka, Kagoshima, 890-8544, Japan
46
Rafiepoor H, Ghorbankhanloo A, Soleimani Dorcheh S, Angouraj Taghavi E, Ghanadan A, Shirkoohi R, Aryanian Z, Amanpour S. Diagnostic Power of MicroRNAs in Melanoma: Integrating Machine Learning for Enhanced Accuracy and Pathway Analysis. J Cell Mol Med 2025; 29:e70367. [PMID: 39823244 PMCID: PMC11740884 DOI: 10.1111/jcmm.70367] [Received: 09/20/2024] [Revised: 12/10/2024] [Accepted: 01/06/2025] [Indexed: 01/19/2025] Open
Abstract
This study identifies microRNAs (miRNAs) with significant discriminatory power in distinguishing melanoma from nevus, notably hsa-miR-26a and hsa-miR-211, which exhibited diagnostic potential with accuracies of 81% and 78%, respectively. To enhance diagnostic accuracy, we integrated miRNAs into various machine-learning (ML) models. Incorporating miRNAs with AUC scores above 0.70 significantly improved diagnostic accuracy to 94%, with a sensitivity of 91%. These findings underscore the potential of ML models to leverage miRNA data for enhanced melanoma diagnosis. Additionally, using the miRNet tool, we constructed a network of miRNA-miRNA interactions, revealing 170 key genes in melanoma pathophysiology. Protein-protein interaction network analysis via Cytoscape identified hub genes including MYC, BRCA1, JUN, AURKB, CDKN2A, DDX5, MAPK14, DDX3X, DDX6, FOXM1 and GSK3B. The identification of hub genes and their interactions with miRNAs enhances our understanding of the molecular mechanisms driving melanoma. Pathway enrichment analyses highlighted key pathways associated with differentially expressed miRNAs, including the PI3K/AKT and TGF-beta signalling pathways and cell cycle regulation. These pathways are implicated in melanoma development and progression, reinforcing the significance of our findings. The functional enrichment of miRNAs points to their critical role in modulating essential pathways in melanoma and to their potential as therapeutic targets.
Affiliation(s)
- Haniyeh Rafiepoor
  - Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
- Alireza Ghorbankhanloo
  - Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
- Elham Angouraj Taghavi
  - Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
- Alireza Ghanadan
  - Department of Dermatopathology, Razi Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Reza Shirkoohi
  - Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
  - Cancer Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
- Zeinab Aryanian
  - Autoimmune Bullous Diseases Research Center, Razi Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Saeid Amanpour
  - Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
47
Ding H, Li N, Li L, Xu Z, Xia W. Machine learning-enabled mental health risk prediction for youths with stressful life events: A modelling study. J Affect Disord 2025; 368:537-546. [PMID: 39306010 DOI: 10.1016/j.jad.2024.09.111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 09/10/2024] [Accepted: 09/15/2024] [Indexed: 09/25/2024]
Abstract
BACKGROUND Youths face significant mental health challenges exacerbated by stressful life events, particularly in the context of the COVID-19 pandemic. Immature coping strategies can worsen mental health outcomes. METHODS This study utilised a two-wave cross-sectional survey design with data collected from Chinese youth aged 14-25 years. Wave 1 (N = 3038) and Wave 2 (N = 539) datasets were used for model development and external validation, respectively. Twenty-five features, encompassing dimensions related to demographic information, stressful life events, social support, coping strategies, and emotional intelligence, were input into the model to predict the mental health status of youth, which was considered their coping outcome. Shapley additive explanation (SHAP) was used to determine the importance of each risk factor during feature selection. The intersection of the top 10 features identified by random forest and XGBoost was considered to contain the most influential predictors of mental health and was taken as the final feature set for model development. Machine learning models, including logistic regression, AdaBoost, and a backpropagation neural network (BPNN), were trained to predict the outcomes. The optimum model was selected according to its performance in both internal and external validation. RESULTS This study identified six key features that were significantly associated with mental health outcomes: punishment, adaptation issues, self-regulation of emotions, learning pressure, use of social support, and recognition of others' emotions. The BPNN model, optimized through feature selection methods like SHAP, demonstrated superior performance in internal validation (C-index [95% CI] = 0.9120 [0.9111, 0.9129], F-score [95% CI] = 0.8861 [0.8853, 0.8869]). Additionally, external validation showed the model had strong discrimination (C-index = 0.9749, F-score = 0.8442) and calibration (Brier score = 0.029) capabilities.
LIMITATIONS Although the clinical prediction model performed well, the study is still limited by self-reported data and the representativeness of the samples. Causal relationships need to be established to interpret the coping mechanism from multiple perspectives. Also, the limited data on minority groups may lead to algorithmic unfairness. CONCLUSIONS Machine learning models effectively identified and predicted mental health outcomes among youths, with the SHAP+BPNN model showing promising clinical applicability. These findings emphasise the importance and effectiveness of targeted interventions with the help of a clinical prediction model.
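Two quantities in this abstract lend themselves to a short illustration: the intersection of the top-ranked features from two models, and the Brier score used for calibration. The sketch below is a minimal version under stated assumptions: the importance dictionaries stand in for mean absolute SHAP values, and the feature names `f1` to `f5`, their scores, and the predicted probabilities are hypothetical toy values, not data from the study.

```python
def top_k(importances, k=10):
    """Names of the k features with the largest importance, best first."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def intersect_top_k(imp_a, imp_b, k=10):
    """Features that appear in the top k of both rankings, ordered by the first."""
    top_b = set(top_k(imp_b, k))
    return [name for name in top_k(imp_a, k) if name in top_b]


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical importance scores from two models (stand-ins for mean |SHAP|).
rf_importance = {"f1": 0.30, "f2": 0.25, "f3": 0.20, "f4": 0.15, "f5": 0.10}
xgb_importance = {"f1": 0.10, "f2": 0.35, "f3": 0.05, "f4": 0.30, "f5": 0.20}
shared = intersect_top_k(rf_importance, xgb_importance, k=3)
print(shared)

# Hypothetical predictions and observed outcomes.
score = brier_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0])
print(round(score, 4))
```

A Brier score of 0 means perfect probabilistic predictions, and lower values indicate better calibration and accuracy, which is the sense in which the reported 0.029 is read as strong.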
Affiliation(s)
- Hexiao Ding
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China; Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hunghom, Hong Kong SAR, China.
- Na Li
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China.
- Lishan Li
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China.
- Ziruo Xu
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China.
- Wei Xia
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China.
48
Shahin-Shamsabadi A, Cappuccitti J. Proteomics and machine learning: Leveraging domain knowledge for feature selection in a skeletal muscle tissue meta-analysis. Heliyon 2024; 10:e40772. [PMID: 39720035 PMCID: PMC11667615 DOI: 10.1016/j.heliyon.2024.e40772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 10/22/2024] [Accepted: 11/27/2024] [Indexed: 12/26/2024] Open
Abstract
Omics techniques, such as proteomics, contain crucial data for understanding biological processes, but they remain underutilized due to their high dimensionality. Typically, proteomics research focuses narrowly on a limited number of datasets, hindering cross-study comparisons, a problem that can potentially be addressed by machine learning. Despite this potential, machine learning has seen limited adoption in the field of proteomics. Here, skeletal muscle proteomics datasets from five separate studies were combined. These studies included conditions such as in vitro models (both 2D and 3D), in vivo skeletal muscle tissue, and adjacent tissues such as tendons. The collected data were preprocessed using MaxQuant and then enriched using a Python script that fetched structural and compositional details from the UniProt and Ensembl databases. This enrichment was used to handle the high-dimensional and sparsely labeled dataset by breaking it down into five smaller categories using cellular composition information and then training a Random Forest model for each category separately. Using biological context to interpret the data improved model performance and made tailored analysis possible by reducing the dimensionality, increasing the signal-to-noise ratio, and preserving only biologically relevant features in each category. This integration of domain knowledge into data analysis and model training facilitated the discovery of new patterns while ensuring the retention of critical details that are often overlooked when blind feature selection methods are used to exclude proteins with minimal expression or variance. This approach was shown to be suitable for performing diverse analyses on individual as well as combined datasets within a broader biological context, ultimately leading to the identification of biologically relevant patterns.
Besides generating new biological insights, this approach can be used to perform tasks such as biomarker discovery, cluster analysis, classification, and anomaly detection more accurately, but incorporation of more datasets is needed to further expand the computational capabilities of such models in clinical settings.
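The category-wise split described in the abstract (dividing the protein features into smaller groups by cellular composition before training one model per group) can be sketched as below. This covers only the partitioning step; the Random Forest training is omitted. The protein identifiers `P1` to `P3`, the category labels, the abundance values, and the choice to fill proteins missing from a sample with 0.0 are all assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def split_by_category(samples, protein_category, fill=0.0):
    """Split per-sample protein abundances into one feature table per category.

    samples: list of dicts mapping protein ID -> abundance.
    protein_category: dict mapping protein ID -> category label.
    Returns a dict mapping category -> list of per-sample dicts restricted
    to the proteins annotated with that category.
    """
    columns = defaultdict(list)
    for protein, category in protein_category.items():
        columns[category].append(protein)
    return {
        category: [{p: sample.get(p, fill) for p in proteins} for sample in samples]
        for category, proteins in columns.items()
    }


# Hypothetical annotations and abundances.
protein_category = {"P1": "membrane", "P2": "nucleus", "P3": "membrane"}
samples = [
    {"P1": 1.0, "P2": 2.0, "P3": 3.0},
    {"P1": 4.0, "P2": 5.0},  # P3 not detected in this sample
]
subsets = split_by_category(samples, protein_category)
print(subsets["membrane"])
print(subsets["nucleus"])
```

Each entry of `subsets` is a lower-dimensional table that a separate classifier could be fitted on, which is the dimensionality reduction the abstract attributes to using biological context.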
49
Özkahraman A, Ölmez T, Dokur Z. Performance Improvement with Reduced Number of Channels in Motor Imagery BCI System. SENSORS (BASEL, SWITZERLAND) 2024; 25:120. [PMID: 39796911 PMCID: PMC11723053 DOI: 10.3390/s25010120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Revised: 12/19/2024] [Accepted: 12/23/2024] [Indexed: 01/13/2025]
Abstract
Classifying Motor Imagery (MI) Electroencephalogram (EEG) signals is of vital importance for Brain-Computer Interface (BCI) systems, but challenges remain. A key challenge is to reduce the number of channels to improve flexibility, portability, and computational efficiency, especially in multi-class scenarios where more channels are needed for accurate classification. This study demonstrates that combining Electrooculogram (EOG) channels with a reduced set of EEG channels is more effective than relying on a large number of EEG channels alone. EOG channels provide useful information for MI signal classification, countering the notion that they only introduce eye-related noise. The study uses advanced deep learning techniques, including multiple 1D convolution blocks and depthwise-separable convolutions, to optimize classification accuracy. The findings are tested on two datasets: dataset 1, the BCI Competition IV Dataset IIa (4-class MI), and dataset 2, the Weibo dataset (7-class MI). Dataset 1, using 3 EEG and 3 EOG channels (6 channels in total), reaches an accuracy of 83%, while dataset 2, with 3 EEG and 2 EOG channels (5 channels in total), reaches an accuracy of 61%, demonstrating the effectiveness of the proposed channel reduction method and deep learning model.
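The computational appeal of the depthwise-separable convolutions mentioned in the abstract can be seen from a simple parameter count. The sketch below compares a standard 1D convolution with a depthwise-separable one (a per-channel depthwise filter followed by a 1x1 pointwise mix), assuming a depth multiplier of 1. The layer sizes are hypothetical: the abstract does not report the network's filter counts or kernel lengths, and treating the 6 recorded channels as the convolution's input channels is an assumption.

```python
def standard_conv1d_params(c_in, c_out, kernel, bias=True):
    """Weights of a standard 1D convolution: every output sees every input channel."""
    return c_in * c_out * kernel + (c_out if bias else 0)


def separable_conv1d_params(c_in, c_out, kernel, bias=True):
    """Weights of a depthwise-separable 1D convolution (depth multiplier 1)."""
    depthwise = c_in * kernel + (c_in if bias else 0)   # one filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)   # 1x1 mix across channels
    return depthwise + pointwise


# Hypothetical layer: 6 input channels (3 EEG + 3 EOG), 32 filters, kernel length 16.
standard = standard_conv1d_params(6, 32, 16)
separable = separable_conv1d_params(6, 32, 16)
print(standard, separable)  # 3104 326
```

With these assumed sizes the separable layer needs roughly a tenth of the weights, which is consistent with the abstract's emphasis on computational efficiency.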
Affiliation(s)
- Ali Özkahraman
  - Department of Electronics and Communication Engineering, Istanbul Technical University, 34467 Istanbul, Istanbul, Turkey
  - Department of Electrical and Electronics Engineering, Iskenderun Technical University, 31200 Iskenderun, Hatay, Turkey
- Tamer Ölmez
  - Department of Electronics and Communication Engineering, Istanbul Technical University, 34467 Istanbul, Istanbul, Turkey
- Zümray Dokur
  - Department of Electronics and Communication Engineering, Istanbul Technical University, 34467 Istanbul, Istanbul, Turkey
50
Lisik D, Basna R, Dinh T, Hennig C, Shah SA, Wennergren G, Goksör E, Nwaru BI. Artificial intelligence in pediatric allergy research. Eur J Pediatr 2024; 184:98. [PMID: 39706990 PMCID: PMC11662037 DOI: 10.1007/s00431-024-05925-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 12/06/2024] [Accepted: 12/11/2024] [Indexed: 12/23/2024]
Abstract
Atopic dermatitis, food allergy, allergic rhinitis, and asthma are among the most common diseases in childhood. They are heterogeneous diseases, can co-exist in their development, and manifest complex associations with other disorders and environmental and hereditary factors. Elucidating these intricacies by identifying clinically distinguishable groups and actionable risk factors will allow for better understanding of the diseases, which will enhance clinical management and benefit society and affected individuals and families. Artificial intelligence (AI) is a promising tool in this context, enabling discovery of meaningful patterns in complex data. Numerous studies within pediatric allergy have used and continue to use AI, primarily to characterize disease endotypes/phenotypes and to develop models to predict future disease outcomes. However, most implementations have used relatively simplistic data from one source, such as questionnaires. In addition, methodological approaches and reporting are lacking. This review provides a practical hands-on guide for conducting AI-based studies in pediatric allergy, including (1) an introduction to essential AI concepts and techniques, (2) a blueprint for structuring analysis pipelines (from selection of variables to interpretation of results), and (3) an overview of common pitfalls and remedies. Furthermore, the state of the art in the implementation of AI in pediatric allergy research, as well as implications and future perspectives, are discussed.
CONCLUSION AI-based solutions will undoubtedly transform pediatric allergy research, as showcased by promising findings and innovative technical solutions, but to fully harness the potential, methodologically robust implementation of more advanced techniques on richer data will be needed.
WHAT IS KNOWN
• Pediatric allergies are heterogeneous and common, inflicting substantial morbidity and societal costs.
• The field of artificial intelligence is undergoing rapid development, with increasing implementation in various fields of medicine and research.
WHAT IS NEW
• Promising applications of AI in pediatric allergy have been reported, but implementation largely lags behind other fields, particularly in regard to use of advanced algorithms and non-tabular data. Furthermore, insufficient reporting on computational approaches hampers evidence synthesis and critical appraisal.
• Multi-center collaborations with multi-omics and rich unstructured data, as well as utilization of deep learning algorithms, are lacking and will likely provide the most impactful discoveries.
Affiliation(s)
- Daniil Lisik
  - Krefting Research Centre, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Box 424, 405 30, Gothenburg, Sweden.
- Rani Basna
  - Krefting Research Centre, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Box 424, 405 30, Gothenburg, Sweden
  - Division of Geriatric Medicine, Department of Clinical Sciences in Malmö, Lund University, 214 28, Malmö, Sweden
- Tai Dinh
  - CMC University, No. 11, Duy Tan Street, Dich Vong Hau Ward, Cau Giay District, Hanoi, Vietnam
  - The Kyoto College of Graduate Studies for Informatics, 7 Tanaka Monzencho, Sakyo Ward, Kyoto, Japan
- Christian Hennig
  - Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Bologna, Italy
- Göran Wennergren
  - Department of Paediatrics, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Emma Goksör
  - Department of Paediatrics, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Bright I Nwaru
  - Krefting Research Centre, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Box 424, 405 30, Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
|