1
Shen D, Yang B, Li J, Zhang J, Li Y, Zhang G, Zheng Y. The potential associations between acupuncture sensation and brain functional network: a EEG study. Cogn Neurodyn 2025; 19:49. [PMID: 40099217 PMCID: PMC11910458 DOI: 10.1007/s11571-025-10233-1] [Received: 12/16/2024] [Accepted: 02/17/2025] [Indexed: 03/19/2025] Open
Abstract
Acupuncture has been widely used as an effective treatment for post-stroke rehabilitation. However, the potential association between acupuncture sensation, an important factor influencing treatment efficacy, and the brain functional network is unclear. This research sought to reveal and quantify the changes in the brain functional network associated with acupuncture sensation. Multi-channel EEG signals were collected from 30 healthy participants, and the Massachusetts General Hospital Acupuncture Sensation Scale (MASS) was used to assess their needling sensations. The Phase Lag Index (PLI) was used to construct the brain functional network, which was analyzed with graph-theoretic methods. In the needle insertion (NI) state, the MASS Index was significantly higher than in the needle retention (NR) state (P < 0.001), and the mean PLI values were significantly higher than in the Pre-Rest and NR states (P < 0.01). In the NI state, global efficiency, local efficiency, nodal efficiency, and degree centrality were significantly higher than in the Pre-Rest and NR states (P < 0.05), while the shortest path length was significantly lower (P < 0.01). Pearson correlation analysis showed a correlation between the MASS Index and the graph theory metrics (P < 0.05). Finally, Support Vector Regression (SVR) was used to predict the MASS Index, with a minimum mean absolute error of 0.65. These findings suggest that the NI state of acupuncture treatment changes the structure of the brain functional network and affects its graph theory metrics, which may serve as an objective biomarker for quantitative evaluation of acupuncture sensation. Supplementary Information: The online version contains supplementary material available at 10.1007/s11571-025-10233-1.
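As a rough illustration of how a PLI-based functional network can be built, the sketch below computes the Phase Lag Index between pairs of channels from their instantaneous phases. It is not the authors' pipeline: the function names, the FFT-based analytic signal, and the synthetic 10 Hz test signals are assumptions made for the example, and real EEG would first need band-pass filtering and artifact removal.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal built with the FFT (zero the negative frequencies)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def phase_lag_index(x, y):
    """PLI = |mean(sign(sin(phase_x - phase_y)))|, a value in [0, 1]."""
    dphi = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return float(np.abs(np.mean(np.sign(np.sin(dphi)))))

def pli_matrix(channels):
    """Symmetric channel-by-channel PLI matrix with a zero diagonal."""
    channels = np.asarray(channels, dtype=float)
    n = channels.shape[0]
    pli = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            pli[i, j] = pli[j, i] = phase_lag_index(channels[i], channels[j])
    return pli

# Synthetic example (assumed values): two 10 Hz channels with a fixed lag.
fs = 250                                   # hypothetical sampling rate in Hz
t = np.arange(4 * fs) / fs                 # 4 seconds of samples
lead = np.sin(2 * np.pi * 10 * t)
lag = np.sin(2 * np.pi * 10 * t - np.pi / 4)
network = pli_matrix([lead, lag])
```

A consistent non-zero phase lag drives the PLI towards 1, while zero-lag coupling gives 0. Thresholding such a matrix yields an adjacency matrix on which graph metrics like global efficiency or degree centrality can then be computed.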
Affiliation(s)
- Dongyang Shen
  - School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, 200444 China
- Banghua Yang
  - School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, 200444 China
  - School of Medicine, Shanghai University, Shanghai, 200444 China
- Jing Li
  - Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 200000 China
- Jiayang Zhang
  - Shanghai Shaonao Technology Co., Ltd, Shanghai, 200444 China
- Yongcong Li
  - School of Medicine, Shanghai University, Shanghai, 200444 China
- Guofu Zhang
  - School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, 200444 China
- Yanyan Zheng
  - Wenzhou People’s Hospital, Wenzhou, 325000 Zhejiang China
2
Asadollah SBHS, Safaeinia A, Jarahizadeh S, Alcalá FJ, Sharafati A, Jodar-Abellan A. Dissolved organic carbon estimation in lakes: Improving machine learning with data augmentation on fusion of multi-sensor remote sensing observations. WATER RESEARCH 2025; 277:123350. [PMID: 39999600 DOI: 10.1016/j.watres.2025.123350] [Received: 12/08/2024] [Revised: 02/18/2025] [Accepted: 02/21/2025] [Indexed: 02/27/2025]
Abstract
This paper presents a novel approach for estimating Dissolved Organic Carbon (DOC) concentrations in lakes considering both carbon sources and sink operators. Despite the critical role of DOC, the combined application of machine learning, as a robust predictor, and remote sensing technology, which reduces costly and time-intensive in-situ sampling, has been underexplored in DOC research. Focusing on lakes in the states of New York, Vermont and Maine (United States, U.S.), this study integrates in-situ DOC measurements with surface reflectance bands obtained from Landsat satellites between 2000 and 2020. Using these bands as inputs to the Random Forest (RF) predictive model, the introduced methodology aims to explore the ability of remote sensing data to support large-scale DOC simulation. Initial results indicate low accuracy metrics and significant under-estimation due to the imbalanced distribution of DOC samples. Statistical analysis showed that the mean DOC concentration was 5.37±3.37 mg/L (mean±one standard deviation), with peaks up to 25 mg/L. A distribution of chemical components that is highly skewed towards the lower ranges can lead the model to misrepresent extreme and hazardous events, because their much lower occurrence rates leave them overshadowed by the far more frequent unimportant events. To address this issue, the Synthetic Minority Over-sampling Technique (SMOTE) was applied as a key innovation, generating synthetic samples that enhance RF accuracy and reduce the associated errors. Fusion of in-situ and remote sensing data, combined with machine learning and data augmentation, significantly enhances DOC estimation accuracy, especially in the high concentration ranges that are critical for environmental health. With prediction metrics of RMSE = 1.75, MAE = 1.09, and R² = 0.74, RF-SMOTE significantly improves on the metrics obtained from stand-alone RF, particularly in estimating high DOC concentrations. Considering the product spatial resolution of 30 m, the model's output provides a potential avenue for global application in lake monitoring, even in remote regions where direct sampling is limited. This novel fusion of remote sensing, machine learning and data augmentation offers valuable insights for water quality management and understanding carbon cycling in aquatic ecosystems.
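The abstract does not describe how SMOTE was adapted to a continuous DOC target, so the following is only one plausible reading, sketched with NumPy: samples above a concentration threshold are treated as the minority group, and synthetic points are interpolated between a rare sample and one of its nearest rare neighbours. The function name, the threshold rule, the interpolation of the target value, and the random stand-in data are all assumptions for illustration.

```python
import numpy as np

def smote_like_oversample(X, y, threshold, n_new, k=3, seed=0):
    """Append n_new synthetic samples interpolated between rare (y > threshold) samples."""
    rng = np.random.default_rng(seed)
    rare = np.flatnonzero(y > threshold)
    if rare.size < 2:
        raise ValueError("need at least two rare samples to interpolate")
    X_rare, y_rare = X[rare], y[rare]

    # Pairwise Euclidean distances among rare samples; ignore self-distances.
    dist = np.linalg.norm(X_rare[:, None, :] - X_rare[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    k = min(k, rare.size - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random rare sample, one of its k neighbours, and a mixing weight.
    base = rng.integers(0, rare.size, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random(n_new)

    X_new = X_rare[base] + lam[:, None] * (X_rare[partner] - X_rare[base])
    y_new = y_rare[base] + lam * (y_rare[partner] - y_rare[base])
    return np.vstack([X, X_new]), np.concatenate([y, y_new])

# Stand-in data (not the study's): 4 reflectance-like features, skewed target.
rng = np.random.default_rng(1)
X_demo = rng.random((50, 4))
y_demo = rng.gamma(2.0, 2.5, size=50)
cutoff = np.quantile(y_demo, 0.8)
X_aug, y_aug = smote_like_oversample(X_demo, y_demo, cutoff, n_new=20)
```

The augmented arrays would then be used only for training; evaluation metrics such as RMSE or MAE should still be computed on untouched held-out measurements.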
Affiliation(s)
- Seyed Babak Haji Seyed Asadollah
  - Department of Environmental Resources Engineering, State University of New York, College of Environmental Science and Forestry, 1 Forestry Drive, Syracuse, NY 13210, USA; Department of Civil Engineering, University of Alicante, 03690 Alicante, Spain.
- Ahmadreza Safaeinia
  - Department of Environmental Resources Engineering, State University of New York, College of Environmental Science and Forestry, 1 Forestry Drive, Syracuse, NY 13210, USA.
- Sina Jarahizadeh
  - Department of Environmental Resources Engineering, State University of New York, College of Environmental Science and Forestry, 1 Forestry Drive, Syracuse, NY 13210, USA.
- Francisco Javier Alcalá
  - Departamento de Desertificación y Geo-Ecología, Estación Experimental de Zonas Áridas (EEZA-CSIC), 04120 Almería, Spain; Instituto de Ciencias Químicas Aplicadas, Facultad de Ingeniería, Universidad Autónoma de Chile, Santiago 7500138, Chile.
- Ahmad Sharafati
  - Department of Civil Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran; New Era and Development in Civil Engineering Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah, 64001, Iraq
- Antonio Jodar-Abellan
  - Soil and Water Conservation Research Group, Centre for Applied Soil Science and Biology of the Segura, Spanish National Research Council (CEBAS-CSIC), Campus de Espinardo 30100, P.O. Box 164, Murcia, Spain.
3
Zhou Y, Zhu H, Yuan Y, Song Z, Mort BC. Machine Learning Classification of Chirality and Optical Rotation Using a Simple One-Hot Encoded Cartesian Coordinate Molecular Representation. J Chem Inf Model 2025. [PMID: 40311114 DOI: 10.1021/acs.jcim.4c02374] [Indexed: 05/03/2025]
Abstract
Absolute stereochemical configurations and optical rotations were computed for 121,416 molecular structures from the QM9 quantum chemistry data set using density functional theory. A representation for the molecules was developed using Cartesian coordinate geometries and encoded atom types to serve as input for various machine learning algorithms. Classifiers were developed and trained to predict the chirality and signs of optical rotations using a variety of machine learning methods. These methods are compared, and the results demonstrate that machine learning is a viable tool for making predictions of the stereochemical properties of molecules.
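The abstract only names the representation, so the sketch below is a guess at what a one-hot encoded Cartesian coordinate input could look like: each atom becomes a row of one-hot element flags followed by its x, y, z coordinates, and molecules are zero-padded to a fixed length. The element list, the padding length of 29 atoms, the row layout, and the example geometry are assumptions for illustration and may differ from the paper's actual encoding.

```python
import numpy as np

ELEMENTS = ("H", "C", "N", "O", "F")   # assumed element vocabulary for QM9
MAX_ATOMS = 29                         # assumed padding length

def encode_molecule(symbols, coords, max_atoms=MAX_ATOMS):
    """Flatten a molecule into [one-hot element | x y z] rows, zero-padded."""
    coords = np.asarray(coords, dtype=float)
    n_features = len(ELEMENTS) + 3
    features = np.zeros((max_atoms, n_features))
    for row, (symbol, xyz) in enumerate(zip(symbols, coords)):
        features[row, ELEMENTS.index(symbol)] = 1.0   # atom-type flag
        features[row, len(ELEMENTS):] = xyz           # Cartesian position
    return features.ravel()

# Illustrative geometry (approximate water coordinates in angstroms).
symbols = ["O", "H", "H"]
coords = [[0.000, 0.000, 0.117],
          [0.000, 0.757, -0.469],
          [0.000, -0.757, -0.469]]
vector = encode_molecule(symbols, coords)

# Reflecting x gives the mirror-image geometry; only coordinate columns change.
mirrored = encode_molecule(symbols, np.asarray(coords) * [-1.0, 1.0, 1.0])
```

Because a mirror image differs only in the sign of one coordinate axis, an encoding of this kind keeps the information a classifier would need to separate enantiomers, provided the molecules are not all rotated into an orientation that hides it.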
Affiliation(s)
- Yilin Zhou
  - Center for Integrated Research Computing, University of Rochester, Rochester, New York 14627, United States
- Haoran Zhu
  - Center for Integrated Research Computing, University of Rochester, Rochester, New York 14627, United States
- Yijie Yuan
  - Center for Integrated Research Computing, University of Rochester, Rochester, New York 14627, United States
- Ziyu Song
  - Center for Integrated Research Computing, University of Rochester, Rochester, New York 14627, United States
- Brendan C Mort
  - Center for Integrated Research Computing, University of Rochester, Rochester, New York 14627, United States
4
Karakas AB, Govsa F, Ozer MA, Biceroglu H, Eraslan C, Tanir D. From pixels to prognosis: leveraging radiomics and machine learning to predict IDH1 genotype in gliomas. Neurosurg Rev 2025; 48:396. [PMID: 40299088 PMCID: PMC12040993 DOI: 10.1007/s10143-025-03515-z] [Received: 12/11/2024] [Revised: 03/17/2025] [Accepted: 04/05/2025] [Indexed: 04/30/2025]
Abstract
Gliomas are the most common primary tumors of the central nervous system, and advances in genetics and molecular medicine have significantly transformed their classification and treatment. This study aims to predict the IDH1 genotype in gliomas using radiomics and machine learning (ML) methods. Retrospective data from 108 glioma patients were analyzed, including MRI data supported by demographic details such as age, sex, and comorbidities. Tumor segmentation was manually performed using 3D Slicer software, and 112 radiomic features were extracted with the PyRadiomics library. Feature selection using the mRMR algorithm identified 17 significant radiomic features. Various ML algorithms, including KNN, Ensemble, DT, LR, Discriminant and SVM, were applied to predict the IDH1 genotype. The KNN and Ensemble models achieved the highest sensitivity (92-100%) and specificity (100%), emerging as the most successful models. Comparative analyses demonstrated that KNN achieved an accuracy of 92.59%, sensitivity of 92.38%, specificity of 100%, precision of 100%, and an F1-score of 95.02%. Similarly, the Ensemble model achieved an accuracy of 90.74%, sensitivity of 90.65%, specificity of 100%, precision of 100%, and an F1-score of 95.13%. To evaluate their effectiveness, KNN and Ensemble models were compared with commonly used machine learning approaches in glioma classification. LR, a conventional statistical approach, exhibited lower predictive performance with an accuracy of 79.63%, while SVM, a frequently utilized ML model for radiomics-based tumor classification, achieved an accuracy of 85.19%. Our findings are consistent with previous research indicating that radiomics-based ML models achieve high accuracy in IDH1 mutation prediction, with reported performances typically exceeding 80%. These findings suggest that KNN and Ensemble models are more effective in capturing the non-linear radiomic patterns associated with IDH1 status, compared to traditional ML approaches. 
Our findings indicate that radiomic analyses provide comprehensive genotypic classification by assessing the entire tumor and present a safer, faster, and more patient-friendly alternative to traditional biopsies. This study highlights the potential of radiomics and ML techniques, particularly KNN, Ensemble, and SVM, as powerful tools for predicting the molecular characteristics of gliomas and developing personalized treatment strategies.
Affiliation(s)
- Asli Beril Karakas
  - Department of Anatomy, Faculty of Medicine, Kastamonu University, Kastamonu, 37200, Turkey.
- Figen Govsa
  - Department of Anatomy, Faculty of Medicine, Ege University, Izmir, Turkey
- Mehmet Asim Ozer
  - Department of Anatomy, Faculty of Medicine, Ege University, Izmir, Turkey
- Huseyin Biceroglu
  - Department of Neurosurgery, Faculty of Medicine, Ege University, Izmir, Turkey
- Cenk Eraslan
  - Department of Radiology, Faculty of Medicine, Ege University, Izmir, Turkey
- Deniz Tanir
  - Department of Management Information Systems, Faculty of Economics and Administrative Sciences, Kafkas University, Kars, Turkey
5
Tibble H, Sheikh A, Tsanas A. Development and validation of a machine learning risk prediction model for asthma attacks in adults in primary care. NPJ Prim Care Respir Med 2025; 35:24. [PMID: 40268974 PMCID: PMC12019439 DOI: 10.1038/s41533-025-00428-8] [Received: 09/23/2024] [Accepted: 04/07/2025] [Indexed: 04/25/2025] Open
Abstract
Primary care consultations provide an opportunity for patients and clinicians to assess asthma attack risk. Using a data-driven risk prediction tool with routinely collected health records may be an efficient way to aid promotion of effective self-management and support clinical decision making. Longitudinal Scottish primary care data for 21,250 asthma patients were used to predict the risk of asthma attacks in the following year. A selection of machine learning algorithms (i.e., Naïve Bayes Classifier, Logistic Regression, Random Forests, and Extreme Gradient Boosting), hyperparameters, and training data enrichment methods were explored and validated in a random unseen data partition. Our final Logistic Regression model achieved the best performance when no training data enrichment was applied. Around 1 in 3 (36.2%) predicted high-risk patients had an attack within one year of consultation, compared to approximately 1 in 16 in the predicted low-risk group (6.7%). The model was well calibrated, with a calibration slope of 1.02 and an intercept of 0.004, and the Area under the Curve was 0.75. This model has the potential to increase the efficiency of routine asthma care by creating new personalized care pathways mapped to predicted risk of asthma attacks, such as priority ranking patients for scheduled consultations and interventions. Furthermore, it could be used to educate patients about their individual risk and risk factors, and to promote healthier lifestyle changes, use of self-management plans, and early emergency care seeking following rapid symptom deterioration.
Affiliation(s)
- Holly Tibble
  - Usher Institute, The University of Edinburgh, Edinburgh, UK.
  - Asthma UK Centre for Applied Research, Edinburgh, UK.
- Aziz Sheikh
  - Usher Institute, The University of Edinburgh, Edinburgh, UK
  - Asthma UK Centre for Applied Research, Edinburgh, UK
- Athanasios Tsanas
  - Usher Institute, The University of Edinburgh, Edinburgh, UK
  - Asthma UK Centre for Applied Research, Edinburgh, UK
6
Temesgen SA, Ahmad B, Grace-Mercure BK, Liu M, Liu L, Lin H, Deng K. Exploring species taxonomic kingdom using information entropy and nucleotide compositional features of coding sequences based on machine learning methods. Methods 2025:S1046-2023(25)00106-9. [PMID: 40280261 DOI: 10.1016/j.ymeth.2025.03.023] [Received: 01/28/2025] [Revised: 03/08/2025] [Accepted: 03/31/2025] [Indexed: 04/29/2025] Open
Abstract
The flow of genetic information from DNA to protein is governed by the central dogma of molecular biology. Genetic drift and mutations usually lead to changes in DNA composition, thereby affecting the coding sequences (CDS) that encode functional proteins. Analyzing the nucleotide distribution in the coding regions of species is crucial for understanding their evolution. In this study, we applied Markov processes to analyze codon formation in 37,031,061 CDSs across 3,735 species genomes, spanning viruses, archaea, bacteria, and eukaryotes, to explore compositional changes. Our results revealed species preferences for different nucleotides. Information entropies and Markov information densities show that eukaryotes exhibit higher redundancy, followed by viruses, suggesting more gene duplication in eukaryotes and high mutation rates in viruses. Evolutionary trends showed an increase in information entropy and a decrease in Markov entropy, with negative correlations between first- and second-order Markov information densities. Furthermore, uniform manifold approximation and projection (UMAP) was used to reduce information redundancy for revealing unique evolutionary patterns in species classification. The machine learning methods demonstrated excellent performance in species classification accuracy, providing profound insights into CDS evolution and protein synthesis.
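The abstract does not give the paper's exact definitions of information entropy or Markov information density, so the sketch below uses the textbook quantities that such features are usually built on: the Shannon entropy of the nucleotide composition and the first-order conditional (Markov) entropy of each nucleotide given the previous one. The function names and the toy sequence are assumptions for illustration.

```python
from collections import Counter
from math import log2

def shannon_entropy(seq):
    """Entropy in bits of the single-nucleotide distribution of seq."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def first_order_markov_entropy(seq):
    """Conditional entropy H(next | current) in bits over adjacent pairs."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    total_pairs = len(seq) - 1
    entropy = 0.0
    for (current, _next), count in pair_counts.items():
        joint = count / total_pairs                 # p(current, next)
        conditional = count / first_counts[current] # p(next | current)
        entropy -= joint * log2(conditional)
    return entropy

# Toy sequence: uniform composition but a fully predictable order.
toy_cds = "ACGT" * 10
composition_bits = shannon_entropy(toy_cds)
markov_bits = first_order_markov_entropy(toy_cds)
```

The toy sequence uses all four nucleotides equally, so its composition entropy is at the 2-bit maximum, yet every nucleotide is determined by its predecessor, so the conditional entropy is zero. That gap between the two values is one way to quantify redundancy in a sequence.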
Affiliation(s)
- Sebu Aboma Temesgen
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Basharat Ahmad
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Minghao Liu
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Kejun Deng
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
7
Alkan BB, Kumartas M, Kuzucuk S, Alkan N. Fast and effective assessment for individuals with special needs form optimization and prediction models. BMC Psychol 2025; 13:415. [PMID: 40264220 PMCID: PMC12016118 DOI: 10.1186/s40359-025-02768-z] [Received: 12/27/2024] [Accepted: 04/17/2025] [Indexed: 04/24/2025] Open
Abstract
The aim of this study was to determine which items in the psychological assessment forms used by counselling and research centres for individuals with special needs are effective in classifying individuals into special needs diagnostic categories. Data were obtained from the psychological assessment request forms of 1814 individuals aged 6 to 12 years who were referred to the centre between 2019 and 2023 with suspected special needs and who were classified as having special needs based on psychological and developmental assessments. In addition, we wanted to develop new predictive models using machine learning methods based on these items. Optimizing the psychological assessment application form so that it contains fewer questions may help to make the assessment process faster and more effective. It is expected that the results of this study will make an important contribution to saving time and energy for experts and individuals.
Affiliation(s)
- Bilal Baris Alkan
  - Measurement and Evaluation in Education, Faculty of Education, University of Akdeniz, Dumlupınar Boulevard, Campus, Antalya, 07058, Turkey.
- Muhammet Kumartas
  - Guidance and Psychological Counselling, Beyşehir Guidance and Research Center, Konya, 42700, Turkey
- Serafettin Kuzucuk
  - Measurement and Evaluation in Education, Faculty of Education, University of Akdeniz, Dumlupınar Boulevard, Campus, Antalya, 07058, Turkey
- Nesrin Alkan
  - Faculty of Economics and Administrative Sciences, Akdeniz University, Dumlupınar Boulevard, Campus, Antalya, 07058, Turkey
8
Gunasekera U, Alkhamis MA, Puvanendiran S, Das M, Kumarawadu PL, Sultana M, Hossain MA, Arzt J, Perez A. Ecological niche modeling for surveillance of foot-and-mouth disease in South Asia. PLoS One 2025; 20:e0320921. [PMID: 40261938 PMCID: PMC12013921 DOI: 10.1371/journal.pone.0320921] [Received: 09/30/2024] [Accepted: 02/26/2025] [Indexed: 04/24/2025] Open
Abstract
Control of transboundary diseases at the regional level is recommended over control at the country level because of their inherent complexities. The World Organization for Animal Health (WOAH) has established different zones worldwide to control contagious diseases such as foot-and-mouth disease (FMD). Controlling FMD is difficult because of the complicated connections between FMD risk factors and the deficits in surveillance activities in countries. We used an ecological niche model (ENM) that accounts for the under-reporting of outbreaks to determine FMD risk and risk factors in the South Asian countries India, Bangladesh, and Sri Lanka. Based on known outbreak information, we predicted high-risk areas using similar regional ecological features. We used a multi-algorithm machine-learning ensemble that includes random forest, support vector, and gradient boosting, together with 15 predictive variables (i.e., livestock densities, land cover, and climate) and 660 FMD outbreaks from 13 years (2009-2022) in the region, including outbreaks from India, Bangladesh, and Sri Lanka. We identified that Sri Lanka and Bangladesh appeared to have low to medium outbreak risk, in the range of 0.04 to 0.55. India was used to fit the model. The machine learning models demonstrated high predictive performance (accuracy >0.87) through cross-validation. Production systems, isothermality, cattle density (per km²), and mean diurnal range were identified as the most important predictors of FMD outbreaks. These models help to determine FMD low-risk areas, where surveillance activities can be minimized, and high-risk areas, where additional confirmatory testing can be focused, and to improve surveillance in a regional context.
Affiliation(s)
- Umanga Gunasekera
  - Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St Paul, Minnesota, United States of America
- Moh A. Alkhamis
  - Department of Epidemiology and Biostatistics, Faculty of Public Health, Kuwait University, Kuwait City, Kuwait
- Sumathy Puvanendiran
  - Department of Animal Production and Health, Veterinary Research Institute, Peradeniya, Sri Lanka
- Moumita Das
  - Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St Paul, Minnesota, United States of America
- Pradeep L. Kumarawadu
  - Department of Animal Production and Health, Animal Health Division, Peradeniya, Sri Lanka
- Munawar Sultana
  - Department of Microbiology, University of Dhaka, Dhaka, Bangladesh
- M. Anwar Hossain
  - Department of Microbiology, University of Dhaka, Dhaka, Bangladesh
- Jonathan Arzt
  - Foreign Animal Disease Research Unit, USDA-ARS, Plum Island Animal Disease Center, Southhold, New York, United States of America
- Andres Perez
  - Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St Paul, Minnesota, United States of America
9
Loizzi V, Comes MC, Arezzo F, Apostol AI, Bove S, Fanizzi A, Fruscio R, Gregorc V, Legge F, Mancari R, Marchetti C, Negri S, Russo G, Vertechy L, Scambia G, Massafra R, Cormio G. Validation of machine learning-based models to predict and explain the risk of ovarian cancer: a multicentric study on BRCA-mutated patients undergoing risk-reducing salpingo-oophorectomy. Front Oncol 2025; 15:1574037. [PMID: 40303993 PMCID: PMC12037974 DOI: 10.3389/fonc.2025.1574037] [Received: 02/10/2025] [Accepted: 03/24/2025] [Indexed: 05/02/2025] Open
Abstract
Objective: BRCA-mutated women are recommended to undergo bilateral risk-reducing salpingo-oophorectomy (RRSO) after childbearing, because effective methods for the early detection of ovarian cancer are lacking. Predictive machine learning (ML) techniques could therefore be crucial in helping clinicians identify high-risk BRCA-mutated patients and determine the appropriate timing for performing RRSO.
Methods: In this work, we addressed this task by developing explainable ML models using clinical data from a multicentric cohort of 694 BRCA-mutated patients from six Italian centers (Policlinico Gemelli, IRCCS San Gerardo, Policlinico Bari, Istituto Tumori Regina Elena, Istituto Tumori Giovanni Paolo II, Ospedale F. Miulli) who underwent salpingo-oophorectomy, of whom 39 (5.6%) showed a tumor. Data from Istituto Tumori Regina Elena and Policlinico Bari were used as the External Validation Cohort (EVC). The other data were employed as the Investigational Cohort (IC). Resampling and ensemble techniques were implemented to handle dataset imbalance. Explainable techniques enabled us to identify some protective and risk factors predicted by the models with respect to the task under study.
Results: The best ML model achieved an AUC value of 79.3% (95% CI: 75.3% - 83.0%), an accuracy value of 73.8% (95% CI: 69.6% - 78.2%), a sensitivity value of 66.7% (95% CI: 58.1% - 75.3%), a specificity value of 74.3% (95% CI: 68.7% - 80.0%), and a G-mean value of 70.4% (95% CI: 63.0% - 76.0%) on the EVC. Although the model demonstrated good overall performance, its limited sensitivity reduces its effectiveness in this high-risk population. The variables CA125, age and MatoRRSO were found to be the most significant risk factors, in agreement with the clinical perspective. Conversely, variables such as Estroprogestinuse and PregnancyNfdt acted as protective factors.
Conclusion: Our ML proposal explores the intricate relationships between multiple clinical variables, with a particular emphasis on understanding their non-linear associations. However, while our approach provides valuable insights into risk assessment for BRCA-mutated patients, its current predictive capacity does not significantly improve upon existing clinical models.
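For readers unfamiliar with the G-mean reported above, the short sketch below applies its standard definition, the geometric mean of sensitivity and specificity, to the reported point estimates. That the authors used exactly this definition is an assumption, although the arithmetic is consistent with it; the helper names are illustrative.

```python
from math import sqrt

def sensitivity(tp, fn):
    """True-positive rate from confusion-matrix counts."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate from confusion-matrix counts."""
    return tn / (tn + fp)

def g_mean(sens, spec):
    """Geometric mean of sensitivity and specificity."""
    return sqrt(sens * spec)

# Point estimates reported for the External Validation Cohort.
reported_g_mean = g_mean(0.667, 0.743)
print(f"G-mean from reported values: {reported_g_mean:.3f}")
```

Because the G-mean collapses when either rate is low, it is a common summary for imbalanced problems such as this one, where only 5.6% of patients showed a tumor and plain accuracy would be dominated by the negative class.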
Affiliation(s)
- Vera Loizzi
  - S.S.D. Ginecologia Oncologica Clinicizzata, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
  - Dipartimento di Biomedicina Traslazionale e Neuroscienze (DiBraiN), University of Bari Aldo Moro, Bari, Italy
- Maria Colomba Comes
  - Laboratorio di Biostatistica e Bioinformatica, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
- Francesca Arezzo
  - S.S.D. Ginecologia Oncologica Clinicizzata, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
- Adriana Ionelia Apostol
  - Dipartimento Scienze della Salute della Donna, del Bambino e di Sanità Pubblica, Fondazione Policlinico Universitario Agostino Gemelli, IRCCS, Rome, Italy
  - Dipartimento Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, Rome, Italy
- Samantha Bove
  - Laboratorio di Biostatistica e Bioinformatica, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
- Annarita Fanizzi
  - Laboratorio di Biostatistica e Bioinformatica, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
- Robert Fruscio
  - Department of Medicine and Surgery, University of Milan-Bicocca, Milan, Italy
  - Division of Gynecologic Surgery, IRCCS Fondazione San Gerardo dei Tintori, Monza, Italy
- Francesco Legge
  - Unità di Ginecologia Oncologica, “F. Miulli” Ospedale Generale Regionale, Bari, Italy
- Rosanna Mancari
  - Gynecologic Oncology Unit, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Claudia Marchetti
  - Dipartimento Scienze della Salute della Donna, del Bambino e di Sanità Pubblica, Fondazione Policlinico Universitario Agostino Gemelli, IRCCS, Rome, Italy
  - Dipartimento Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, Rome, Italy
- Serena Negri
  - Department of Medicine and Surgery, University of Milan-Bicocca, Milan, Italy
  - Division of Gynecologic Surgery, IRCCS Fondazione San Gerardo dei Tintori, Monza, Italy
- Giorgia Russo
  - Dipartimento Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, Rome, Italy
- Laura Vertechy
  - Dipartimento Scienze della Salute della Donna, del Bambino e di Sanità Pubblica, Fondazione Policlinico Universitario Agostino Gemelli, IRCCS, Rome, Italy
- Giovanni Scambia
  - Dipartimento Scienze della Salute della Donna, del Bambino e di Sanità Pubblica, Fondazione Policlinico Universitario Agostino Gemelli, IRCCS, Rome, Italy
  - Dipartimento Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, Rome, Italy
- Raffaella Massafra
  - Laboratorio di Biostatistica e Bioinformatica, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
- Gennaro Cormio
  - S.S.D. Ginecologia Oncologica Clinicizzata, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
  - Dipartimento Interdisciplinare di Medicina (DIM), University of Bari Aldo Moro, Bari, Italy
10
Aziz G, Hardy A. A predictive model for damp risk in english housing with explainable AI. Sci Rep 2025; 15:12658. [PMID: 40221515 PMCID: PMC11993753 DOI: 10.1038/s41598-025-96396-7] [Received: 11/22/2024] [Accepted: 03/27/2025] [Indexed: 04/14/2025] Open
Abstract
Damp in residential buildings poses risks to indoor air quality, occupant health, and structural integrity, and affects up to 27% of homes in England. This study develops a predictive model for damp risk, using 2,073 inspection records from a housing association across 125 local authorities. Homes were labelled as damp (1,630) or non-damp (443), with data supplemented by national Energy Performance Certificate (EPC) records, incorporating building characteristics and energy efficiency indicators. To evaluate model performance, both a balanced dataset (869 homes, 426 damp, 443 non-damp) and a larger imbalanced dataset (2,073 homes) were used. Seven machine learning algorithms were deployed, with the best-performing model achieving 0.636 accuracy on balanced data and 0.793 on imbalanced data. SHAP (SHapley Additive exPlanations) analysis identified heating cost, energy consumption, and wall energy efficiency as the strongest predictors of damp. Statistical tests and causal analysis were applied to interpret SHAP results, offering insights into potential damp risk and mitigations. The findings suggest that machine learning can support early identification of homes likely to develop damp, helping housing managers prioritise interventions before damp issues escalate.
Affiliation(s)
- Gulala Aziz
  - Leeds Sustainability Institute, Leeds Beckett University, Headingley Campus, Churchwood House, G02, Leeds, UK
- Adam Hardy
  - Leeds Sustainability Institute, Leeds Beckett University, Headingley Campus, Churchwood House, G02, Leeds, UK.
11
|
Bengs BD, Nde J, Dutta S, Li Y, Sardiu ME. Integrative approaches for predicting protein network perturbations through machine learning and structural characterization. J Proteomics 2025; 316:105439. [PMID: 40228603 DOI: 10.1016/j.jprot.2025.105439] [Received: 01/24/2025] [Revised: 03/14/2025] [Accepted: 04/08/2025] [Indexed: 04/16/2025]
Abstract
Chromatin remodeling complexes, such as the Saccharomyces cerevisiae INO80 complex, exemplify how dynamic protein interaction networks govern cellular function through a balance of conserved structural modules and context-dependent functional partnerships, as revealed by integrative machine learning and structural mapping approaches. In this study, we explored the INO80 complex using machine learning to predict network changes caused by genetic deletions. Tree-based models outperformed linear approaches, highlighting non-linear relationships within the interaction network. Feature selection identified key INO80 components (e.g., Arp5, Arp8) and cross-compartment features from other remodeling complexes like SWR1 and NuA4, emphasizing shared functional pathways. Perturbation patterns aligned with biological modules, particularly those linked to telomere maintenance and aging, underscoring the functional coherence of these networks. Structural mapping revealed that not all interactions are predictable through proximity alone, particularly with Arp5 and Yta7. By combining structural insights with machine learning, we enhanced predictions of genetic perturbation effects, providing a template for analyzing cross-species homologs (e.g., human INO80) and their disease-associated variants. This integrative approach bridges the gap between static structural data and dynamic functional networks, offering a pathway to disentangle conserved mechanisms from context-dependent adaptations in chromatin biology. SIGNIFICANCE: By leveraging an innovative, integrative machine learning approach, we have successfully predicted and analyzed perturbations in the INO80 network with good accuracy and depth. Our novel combination of machine learning, perturbation analysis, and structural investigation approach has provided crucial insights into the complex's structure-function relationships, shedding new light on its pivotal roles in affected pathways such as telomere maintenance. 
Our findings not only enhance our understanding of the INO80 complex but also establish a powerful framework for future studies in chromatin biology and beyond. This work represents a step forward in our understanding of chromatin remodeling complexes and their diverse cellular functions, laying the groundwork for future studies that can further refine our computational approaches and experimental techniques in this field.
Affiliation(s)
- Bethany D Bengs
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas, USA
- Jules Nde
  - Department of Cancer Biology, University of Kansas Medical Center, Kansas, USA
- Sreejata Dutta
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas, USA
- Yanming Li
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas, USA
- Mihaela E Sardiu
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas, USA; University of Kansas Cancer Center, Kansas City, USA; Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas, USA
12
Więckowska B, Kubiak KB, Guzik P. Evaluating the three-level approach of the U-smile method for imbalanced binary classification. PLoS One 2025; 20:e0321661. [PMID: 40208902 PMCID: PMC11984743 DOI: 10.1371/journal.pone.0321661] [Received: 09/05/2024] [Accepted: 03/09/2025] [Indexed: 04/12/2025] Open
Abstract
Real-life binary classification problems often involve imbalanced datasets, where the majority class outnumbers the minority class. We previously developed the U-smile method, which comprises the U-smile plot and the BA, RB and I coefficients, to assess the usefulness of a new variable added to a reference prediction model and validated it under class balance. In this study, we evaluated the U-smile method under class imbalance, proposed a three-level approach of the U-smile method, and used the I coefficients as a weighting factor for point size in the U-smile plots of the BA and RB coefficients. Using real data from the Heart Disease dataset and generated random variables, we built logistic regression models to assess four new variables added to the reference model (nested setting). These models were evaluated at seven pre-defined imbalance levels of 1%, 10%, 30%, 50%, 70%, 90% and 99% of the event class. The results of the U-smile method were compared to those of certain traditional measures: Brier skill score, net reclassification index, difference in F1-score, difference in Matthews correlation coefficient, difference in the area under the receiver operating characteristic curve of the new and reference models, and the likelihood-ratio test. The reference model overfitted to the majority class at higher imbalance levels. The BA-RB-I coefficients of the U-smile method identified informative variables across the entire imbalance range. At higher imbalance levels, the U-smile method indicated both prediction improvement in the minority class (positive BA and I coefficients) and reduction in overfitting to the majority class (negative RB coefficients). The U-smile method outperformed traditional evaluation measures across most of the imbalance range. It proved highly effective in variable selection for imbalanced binary classification, making it a useful tool for real-life problems, where imbalanced datasets are prevalent.
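One of the traditional measures listed above, the Brier skill score, can be sketched in a few lines. This is a generic textbook form, `1 - BS_new / BS_ref`, and not the U-smile method itself; which model serves as the reference in the paper's comparisons is an assumption here (the nested reference model is a natural choice), and the toy labels and probabilities are invented for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_skill_score(y_true, p_new, p_ref):
    """Relative improvement of the new model over a reference model.

    1.0 means perfect predictions, 0.0 means no improvement over the
    reference, and negative values mean the new model is worse.
    """
    return 1.0 - brier_score(y_true, p_new) / brier_score(y_true, p_ref)


# Toy data (hypothetical): 25% event rate, reference predicts the prevalence.
y = [0, 0, 0, 1]
p_reference = [0.25, 0.25, 0.25, 0.25]
p_new_model = [0.1, 0.1, 0.2, 0.7]

print(brier_skill_score(y, p_new_model, p_reference))
```

Because the score is a ratio of mean squared errors, it is sensitive to the event rate, which is one reason such measures are compared across several imbalance levels.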
Affiliation(s)
- Barbara Więckowska
  - Department of Computer Science and Statistics, Poznan University of Medical Sciences, Poznan, Poland
- Katarzyna B. Kubiak
  - Department of Computer Science and Statistics, Poznan University of Medical Sciences, Poznan, Poland
- Przemysław Guzik
  - Department of Cardiology - Intensive Therapy and Internal Medicine, Poznan University of Medical Sciences, Poznan, Poland
  - University Centre for Sports and Medical Studies, Poznan University of Medical Sciences, Poznan, Poland
13
Wang Z, Wang W, Sun C, Li J, Xie S, Xu J, Zou K, Jin Y, Yan S, Liao X, Kang Y, Coopersmith CM, Sun X. A methodological systematic review of validation and performance of sepsis real-time prediction models. NPJ Digit Med 2025; 8:190. [PMID: 40189694 PMCID: PMC11973177 DOI: 10.1038/s41746-025-01587-1] [Received: 12/13/2024] [Accepted: 03/26/2025] [Indexed: 04/09/2025] Open
Abstract
Sepsis real-time prediction models (SRPMs) provide timely alerts and may improve patient outcomes but face limited clinical adoption due to inconsistent validation methods and potential biases. Comprehensive evaluation, including external full-window validation with model- and outcome-level metrics, is crucial for real-world effectiveness, yet performance evidence remains scarce. This study systematically reviewed SRPM performance across validation methods, analyzing 91 studies from multiple databases. Only 54.9% applied full-window validation with both metric types. Performance decreased under external and full-window validation, with median AUROCs of 0.886 and 0.861 at 6- and 12-hours pre-onset, dropping to 0.783 in full-window external validation. Median Utility Scores declined from 0.381 in internal to -0.164 in external validation. Combining AUROC and Utility Score identified top-performing SRPMs in 18.7% of studies. Hand-crafted features significantly improved performance. Future research should focus on multi-center datasets, hand-crafted features, multi-metric full-window validation, and prospective trials to support clinical implementation.
Affiliation(s)
- Zichen Wang
  - Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
  - Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
  - West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
- Wen Wang
  - Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
  - Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Che Sun
  - Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
  - Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Jili Li
  - Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - West China School of Medicine, West China Hospital, Sichuan University, Chengdu, 610041, China
- Shuangyi Xie
  - Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
  - Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Jiayue Xu
  - Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
  - Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Kang Zou
  - Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
  - Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Yinghui Jin
  - Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Siyu Yan
  - Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Xuelian Liao
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yan Kang
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, 610041, China
- Craig M Coopersmith
  - Emory Critical Care Center and Department of Surgery, Emory University School of Medicine, Atlanta, GA, USA
- Xin Sun
  - Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
  - Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
  - West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
14
Kim S, Shin HE, Kim M, Won CW. Which pathway of the possible sarcopenia algorithm of the AWGS 2019 guideline is the best in Korean community-dwelling older men and women? Arch Gerontol Geriatr 2025; 131:105778. [PMID: 39955963 DOI: 10.1016/j.archger.2025.105778] [Received: 11/21/2024] [Revised: 01/27/2025] [Accepted: 02/03/2025] [Indexed: 02/18/2025]
Abstract
OBJECTIVE: To compare the diagnostic accuracy of the possible sarcopenia identification pathways suggested by the Asian Working Group for Sarcopenia (AWGS) in 2019, by gender, among Korean community-dwelling older adults. DESIGN: Cross-sectional analysis of data from 2,129 community-dwelling adults (70-84 years, 50.4% men) enrolled in the Korean Frailty and Aging Cohort Study. METHODS: Based on the AWGS 2019 guideline, possible sarcopenia was defined by low handgrip strength (HGS) or slow five-times chair stand test (5CST) time, referred to as "assessments." "Case-findings" (low calf circumference [CC], SARC-F ≥4, or SARC-CalF ≥11) were recommended for screening 'possible sarcopenia' before assessment. For the six 'possible sarcopenia' pathways (combining three case-finding and two assessment tools), area under the curve (AUC) and F1 score were compared. RESULTS: For case-finding in men, CC demonstrated the highest AUC (0.657) and F1 score (0.504) for predicting sarcopenia compared with SARC-F and SARC-CalF (p<0.001 and p=0.001, respectively). Among men with low CC, the ΔAUC between HGS and 5CST as assessments was not significant (p=0.079) (AUCs: 0.763 vs. 0.707; F1 scores: 0.713 vs. 0.650). For case-finding in women, SARC-CalF demonstrated the highest AUC (0.631) and F1 score (0.389) compared with CC and SARC-F (p=0.012 and p<0.001, respectively). Subsequently, the ΔAUC between HGS and 5CST was not significant in women (p=0.069) (AUCs: 0.566 vs. 0.636; F1 scores: 0.387 vs. 0.514). CONCLUSIONS: Based on the AWGS 2019 guideline, CC in men and SARC-CalF in women were the best case-finding tools for community-dwelling older adults. After the best case-finding in each gender, the two assessment pathways did not differ significantly in either gender. BRIEF SUMMARY: For case-finding of possible sarcopenia, using calf circumference in older men and using SARC-CalF in older women demonstrated the highest diagnostic accuracy for predicting sarcopenia. After the best case-finding in each gender, the two assessment pathways (handgrip strength and five-times chair stand test) of possible sarcopenia did not differ significantly in either gender.
Affiliation(s)
- Sohee Kim
  - College of Medicine, Kyung Hee University, Seoul, 02447, South Korea
- Hyung Eun Shin
  - Department of Biomedical Science and Technology, Kyung Hee University, Seoul, 02447, South Korea
- Miji Kim
  - Department of Health Sciences and Technology, College of Medicine, Kyung Hee University, Seoul, 02447, South Korea
- Chang Won Won
  - Department of Family Medicine, Kyung Hee University College of Medicine, Kyung Hee University Hospital, Seoul, 02447, South Korea
15
Dai Q, Wang L, Zhang J, Ding W, Chen L. GQEO: Nearest neighbor graph-based generalized quadrilateral element oversampling for class-imbalance problem. Neural Netw 2025; 184:107107. [PMID: 39778294 DOI: 10.1016/j.neunet.2024.107107] [Received: 09/04/2024] [Revised: 12/18/2024] [Accepted: 12/26/2024] [Indexed: 01/11/2025]
Abstract
The class imbalance problem is one of the main factors degrading the performance of traditional classifiers. Oversampling is the most common way to address it: oversampling techniques alleviate the impact of class imbalance on traditional machine learning by augmenting the feature representation of minority instances. However, many SMOTE-based oversampling techniques perform linear interpolation on the line segment between the anchor instance and its nearest neighbor. This type of method only uses local information and ignores the impact of the global neighborhood relationship on the anchor instance. Therefore, inspired by finite element interpolation, a novel generalized quadrilateral element oversampling technique (GQEO) based on k-nearest neighbor graphs is proposed. First, GQEO uses the k-nearest neighbor to search the global neighbor relationship and build the global neighbor relationship graph. Then, the global neighbor graph is searched for nodes forming generalized quadrilateral elements, using planar quadrilaterals as constraints. Finally, in generalized quadrilateral elements, we use one-dimensional shape functions to synthesize minority instances in quadrilateral elements. Experimental results on 30 imbalanced datasets show that GQEO can alleviate the impact of the class imbalance problem and prevent noise from participating in the synthesis process. GQEO obtains competitive results compared to state-of-the-art oversampling techniques that account for noise in the minority class.
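For contrast with the quadrilateral-element approach, the SMOTE-style baseline that the abstract criticises can be sketched as follows. This illustrates only the local linear interpolation between an anchor and its nearest minority neighbor; it is not GQEO, and the function names and toy points are hypothetical.

```python
import math
import random


def nearest_neighbor(anchor, points):
    """Return the closest point to `anchor` among the other minority points."""
    candidates = [p for p in points if p is not anchor]
    return min(candidates, key=lambda p: math.dist(anchor, p))


def interpolate(anchor, neighbor, rng):
    """Draw one synthetic point on the segment between anchor and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (n - a) for a, n in zip(anchor, neighbor)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
minority = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]  # toy minority instances

anchor = minority[0]
neighbor = nearest_neighbor(anchor, minority)
synthetic = interpolate(anchor, neighbor, rng)
print(synthetic)
```

The synthetic point depends only on one anchor and one neighbor, which is the "local information only" limitation the abstract describes; the distant point `[5.0, 5.0]` has no influence on the result.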
Affiliation(s)
- Qi Dai
  - College of Science, North China University of Science and Technology, Tangshan, 063210, China
- Longhui Wang
  - College of Science, North China University of Science and Technology, Tangshan, 063210, China
- Jing Zhang
  - College of Science, North China University of Science and Technology, Tangshan, 063210, China
- Weiping Ding
  - School of Artificial Intelligence and Computer Science, Nantong University, Nantong, 226019, China
- Lifang Chen
  - College of Science, North China University of Science and Technology, Tangshan, 063210, China
16
Liu Y, Liu Y, Zhang S, Zeng C, Zhang Q, Jiang Y, Yang X, Zheng L, Ge Q, Zhang Y, Chen Y, Lu M, Liu H. Using explainable machine learning to predict the irritation and corrosivity of chemicals on eyes and skin. Toxicol Lett 2025; 408:1-12. [PMID: 40180199 DOI: 10.1016/j.toxlet.2025.03.008] [Received: 11/27/2024] [Revised: 02/20/2025] [Accepted: 03/22/2025] [Indexed: 04/05/2025]
Abstract
Contact with specific chemicals often results in corrosive and irritative responses in the eyes and skin, playing a pivotal role in assessing the potential hazards of personal care products, cosmetics, and industrial chemicals to human health. While traditional animal testing can provide valuable information, its high costs, ethical controversies, and significant demand for animals limit its extensive use, particularly during preliminary screening stages. To address these issues, we adopted a computational modeling approach, integrating 3316 experimental data points on eye irritation and 3080 data points on skin irritation, to develop various machine learning and deep learning models. Under the evaluation of the external validation set, the best-performing models for the two tasks achieved balanced accuracies (BAC) of 73.0 % and 75.1 %, respectively. Furthermore, interpretability analyses were conducted at the dataset level, molecular level, and atomic level to provide insights into the prediction outcomes. Analysis of substructure frequencies identified structural alert fragments within the datasets. This information serves as a reference for identifying potentially irritating chemicals. Additionally, a user-friendly visualization interface was developed, enabling non-specialists to easily predict eye and skin irritation potential. In summary, our study provides a new avenue for the assessment of irritancy potential in chemicals used in pesticides, cosmetics, and ophthalmic drugs.
Affiliation(s)
- Yingxu Liu
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Yang Liu
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Simeng Zhang
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Chen Zeng
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Qing Zhang
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Yunya Jiang
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Xi Yang
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Lidan Zheng
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Qian Ge
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Yanmin Zhang
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Yadong Chen
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
- Mengyi Lu
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, PR China
- Haichun Liu
  - School of Science, China Pharmaceutical University, Nanjing 210009, PR China
17
Chen YY, Yen HK, Hsu JY, Lin TC, Lin HC, Chen CW, Hu MH, Groot OQ, Schwab JH. International external validation of the SORG machine learning algorithm for predicting sustained postoperative opioid prescription after anterior cervical discectomy and fusion using a Taiwanese cohort of 1,037 patients. Spine J 2025:S1529-9430(25)00171-8. [PMID: 40158632 DOI: 10.1016/j.spinee.2025.03.022] [Received: 08/01/2024] [Revised: 01/22/2025] [Accepted: 03/23/2025] [Indexed: 04/02/2025]
Abstract
BACKGROUND CONTEXT: Anterior cervical discectomy and fusion (ACDF) is widely performed for cervical spine disorders, with opioids commonly prescribed postoperatively for pain management. However, prolonged opioid use carries significant risks such as dependency and adverse health effects. Predictive models like the SORG machine learning algorithm (SORG-MLA) have been developed to forecast prolonged opioid use post-ACDF. External validation is essential to ensure their effectiveness across different healthcare settings and populations. PURPOSE: The study aimed to assess the generalizability of the SORG-MLA to a Taiwanese patient cohort for predicting prolonged opioid use after ACDF. STUDY DESIGN: Retrospective cohort study utilizing data from a tertiary care center in Taiwan. PATIENT SAMPLE: 1,037 patients who underwent ACDF between 2010 and 2018 were included. OUTCOME MEASURES: The primary outcome was sustained postoperative opioid prescription, defined as continuous opioid use for at least 90 days following ACDF. METHODS: The performance of the SORG-MLA in the validation cohort was assessed using discrimination measures (area under the receiver operating characteristic curve [AUROC] and the area under the precision-recall curve [AUPRC]), calibration, overall performance (Brier Score), and decision curve analysis. Comparing the validation cohort to the development cohort revealed significant differences in demographic profiles, medicolegal frameworks, ethnic and cultural contexts, and key predictors of postoperative opioid use identified by the SORG-MLA. The Taiwanese cohort was characterized by an older age demographic, a lower proportion of female participants, higher smoking prevalence, higher incidence of preoperative myelopathy and radiculopathy, and more frequent use of antidepressants prior to surgery. Conversely, these patients were less likely to have extended preoperative opioid prescriptions beyond 180 days, undergo multilevel ACDF procedures, or be treated with concurrent medications such as Beta-2 agonists, Gabapentin, and ACE inhibitors. This study had no funding source or conflict of interests. RESULTS: The model demonstrated good discriminative ability, with an AUROC of 0.78 and an AUPRC of 0.35. Calibration curves indicated that the model overestimated the risk of prolonged opioid use. This discrepancy may be attributed to the significantly higher incidence of sustained opioid consumption in the American development cohort, spanning from 2000 to 2018, which was threefold higher than that in the Taiwanese validation cohort between 2010 and 2018 (9.9% [270/2737] vs. 3.3% [34/1037]; p < .01). The Brier score was 0.033, which improved upon the null model's score of 0.040, indicating robust overall performance. Decision curve analysis confirmed the model's clinical utility, demonstrating net benefits across various decision thresholds. CONCLUSIONS: The SORG-MLA has demonstrated robust discriminative abilities and overall performance when applied to a unique Taiwanese cohort. However, the model overestimated the risk of prolonged opioid use, suggesting the need for recalibration with more contemporary data to reflect current opioid prescription practices, ethnic and cultural differences, and opioid regulations. Following recalibration, integration and prospective validation within the electronic healthcare system should be pursued. This will enable clinicians to proactively identify patients at heightened risk of prolonged opioid use following ACDF.
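The overestimation described above is what one would expect when a model trained at one outcome rate is applied to a population with a much lower rate. A common textbook remedy is an intercept-only update ("recalibration in the large"), sketched below with the event counts quoted in the abstract. This is an illustration of the general idea, not the recalibration procedure the authors propose, and the function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Outcome rates from the counts quoted in the abstract.
development_rate = 270 / 2737  # American development cohort
validation_rate = 34 / 1037    # Taiwanese validation cohort


def shift_intercept(p, old_rate=development_rate, new_rate=validation_rate):
    """Shift a predicted probability on the logit scale by the difference
    in baseline log-odds between the two cohorts (assumed correction)."""
    return expit(logit(p) + logit(new_rate) - logit(old_rate))


print(f"rate ratio: {development_rate / validation_rate:.2f}")
print(f"0.20 -> {shift_intercept(0.20):.3f}")
```

The shift leaves the ranking of patients unchanged, so discrimination (AUROC) is unaffected; only the absolute risk estimates move. A full recalibration would normally also re-estimate the calibration slope on local data.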
Affiliation(s)
- Yu-Yung Chen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei City, Taiwan; Department of Medical Education, National Taiwan University Hospital, Taipei City, Taiwan
- Hung-Kuan Yen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei City, Taiwan; Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu City, Taiwan
- Jui-Yo Hsu
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei City, Taiwan; Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu City, Taiwan
- Ta-Chun Lin
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Hao-Chen Lin
  - Department of Medical Education, National Taiwan University Hospital, Taipei City, Taiwan
- Chih-Wei Chen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei City, Taiwan
- Ming-Hsiao Hu
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei City, Taiwan
- Olivier Q Groot
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, The Netherlands; Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, United States
- Joseph H Schwab
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, United States; Department of Orthopedic Surgery, Cedars-Sinai Medical Center, Los Angeles, United States
18
Li T, Zhang Y, Su D, Liu M, Ge M, Chen L, Li C, Tang J. Knowledge Graph-Based Few-Shot Learning for Label of Medical Imaging Reports. Acad Radiol 2025:S1076-6332(25)00189-8. [PMID: 40140273 DOI: 10.1016/j.acra.2025.02.045] [Received: 11/07/2024] [Revised: 02/23/2025] [Accepted: 02/25/2025] [Indexed: 03/28/2025]
Abstract
BACKGROUND: The application of artificial intelligence (AI) in the field of automatic imaging report labeling faces the challenge of manually labeling large datasets. PURPOSE: To propose a data augmentation method using a knowledge graph (KG) and few-shot learning. METHODS: A KG of lumbar spine X-ray images was constructed, and 2000 data were annotated based on the KG, which were divided into training, validation, and test sets in a ratio of 7:2:1. The training dataset was augmented based on the synonym/replacement attributes of the KG, and the augmented data were input into the BERT (Bidirectional Encoder Representations from Transformers) model for automatic annotation training. The performance of the model under different augmentation ratios (1:10, 1:100, 1:1000) and augmentation methods (synonyms only, replacements only, combination of synonyms and replacements) was evaluated using the precision and F1 scores. In addition, with the augmentation ratio fixed, iterative experiments were performed by supplementing the data of nodes that performed poorly in the validation set to further improve the model's performance. RESULTS: Prior to data augmentation, the precision was 0.728 and the F1 score was 0.666. By adjusting the augmentation ratio, the precision increased from 0.912 at a 1:10 augmentation ratio to 0.932 at a 1:100 augmentation ratio (P<.05), while the F1 score improved from 0.853 at a 1:10 augmentation ratio to 0.881 at a 1:100 augmentation ratio (P<.05). Additionally, the effectiveness of various augmentation methods was compared at a 1:100 augmentation ratio. The augmentation method that combined synonyms and replacements (F1=0.881) was superior to the methods that used only synonyms (F1=0.815) or only replacements (F1=0.753) (P<.05). For nodes that exhibited suboptimal performance on the validation set, supplementing the training set with target data improved model performance, increasing the average F1 score to 0.979 (P<.05). CONCLUSION: Based on the KG, this study trained an automatic labeling model of radiology reports using a few-shot data set. This method effectively reduces the workload of manual labeling, improves the efficiency and accuracy of image data labeling, and provides an important research strategy for the application of AI in the domain of automatic labeling of image reports.
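The synonym-based augmentation step can be pictured with a small sketch. The abstract does not describe the implementation, so everything below is an assumption made for illustration: the `SYNONYMS` table stands in for the synonym attributes of the KG, the terms in it are invented examples, and `augment` simply enumerates every combination of substitutions for a tokenised report phrase.

```python
from itertools import product

# Hypothetical synonym table standing in for the KG's synonym attributes.
SYNONYMS = {
    "vertebral": ["vertebral", "spinal"],
    "narrowing": ["narrowing", "stenosis"],
}


def augment(tokens, synonyms):
    """Return every phrase obtainable by swapping tokens for listed synonyms.

    Tokens without an entry in the table are kept unchanged.
    """
    options = [synonyms.get(token, [token]) for token in tokens]
    return [" ".join(choice) for choice in product(*options)]


variants = augment(["vertebral", "canal", "narrowing"], SYNONYMS)
for phrase in variants:
    print(phrase)
```

One labelled phrase yields four variants here, all sharing the original label. The number of variants grows multiplicatively with the number of substitutable tokens, which is consistent with the large augmentation ratios (1:10 to 1:1000) mentioned in the abstract, although how the authors sample from the possible variants is not stated.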
Affiliation(s)
- Tiancheng Li
  - The First Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei 230032, China (T.L., D.S., J.T.); Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China (T.L., D.S., C.L., J.T.)
- Yuxuan Zhang
  - College of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, China (Y.Z., M.G., L.C., C.L.)
- Deyu Su
  - The First Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei 230032, China (T.L., D.S., J.T.); Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China (T.L., D.S., C.L., J.T.)
- Ming Liu
  - College of Artificial Intelligence, Anhui University, Hefei, China (M.L.)
- Mingxin Ge
  - College of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, China (Y.Z., M.G., L.C., C.L.)
- Linyu Chen
  - College of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, China (Y.Z., M.G., L.C., C.L.)
- Chuanfu Li
  - College of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, China (Y.Z., M.G., L.C., C.L.); First Clinical Medical College, Anhui University of Traditional Chinese Medicine, Hefei, China (C.L.); Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China (T.L., D.S., C.L., J.T.)
- Jin Tang
  - The First Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei 230032, China (T.L., D.S., J.T.); Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China (T.L., D.S., C.L., J.T.)
19
Ma Y, Lv H, Ma Y, Wang X, Lv L, Liang X, Wang L. Advancing preeclampsia prediction: a tailored machine learning pipeline integrating resampling and ensemble models for handling imbalanced medical data. BioData Min 2025; 18:25. [PMID: 40128863 PMCID: PMC11934807 DOI: 10.1186/s13040-025-00440-1] [Received: 12/25/2024] [Accepted: 03/12/2025] [Indexed: 03/26/2025] Open
Abstract
BACKGROUND Constructing a predictive model is challenging for imbalanced medical datasets (such as those for preeclampsia), particularly when employing ensemble machine learning algorithms. OBJECTIVE This study aims to develop a robust pipeline that enhances the predictive performance of ensemble machine learning models for the early prediction of preeclampsia in an imbalanced dataset. METHODS Our research establishes a comprehensive pipeline optimized for early preeclampsia prediction in imbalanced medical datasets. We gathered electronic health records from pregnant women at the People's Hospital of Guangxi from 2015 to 2020, with additional external validation using three public datasets. This extensive data collection facilitated the systematic assessment of various resampling techniques, varied minority-to-majority ratios, and ensemble machine learning algorithms through a structured evaluation process. We analyzed 4,608 combinations of model settings against performance metrics such as G-mean, MCC, AP, and AUC to determine the most effective configurations. Advanced statistical analyses including OLS regression, ANOVA, and Kruskal-Wallis tests were utilized to fine-tune these settings, enhancing model performance and robustness for clinical application. RESULTS Our analysis confirmed the significant impact of systematic sequential optimization of variables on the predictive performance of our models. The most effective configuration utilized the Inverse Weighted Gaussian Mixture Model for resampling, combined with the Gradient Boosting Decision Trees algorithm and an optimized minority-to-majority ratio of 0.09, achieving a Geometric Mean of 0.6694 (95% confidence interval: 0.5855-0.7557). This configuration significantly outperformed the baseline across all evaluated metrics, demonstrating substantial improvements in model performance.
CONCLUSIONS This study establishes a robust pipeline that significantly enhances the predictive performance of models for preeclampsia within imbalanced datasets. Our findings underscore the importance of a strategic approach to variable optimization in medical diagnostics, offering potential for broad application in various medical contexts where class imbalance is a concern.
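The G-mean and MCC named in the abstract above are standard imbalance-aware metrics. As a minimal sketch (not the authors' pipeline), both can be computed from a binary confusion matrix. The function name `gmean_and_mcc` and the example counts are hypothetical and are not taken from the study.

```python
import math


def gmean_and_mcc(tp, fp, tn, fn):
    """Return (G-mean, MCC) for a binary confusion matrix.

    G-mean is the geometric mean of sensitivity and specificity.
    MCC is the Matthews correlation coefficient.
    Degenerate cases (empty class or zero denominator) return 0.0 by convention.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return g_mean, mcc


# Hypothetical imbalanced example: 50 positives, 1000 negatives.
g_mean, mcc = gmean_and_mcc(tp=30, fp=200, tn=800, fn=20)

# A classifier that always predicts the majority class scores 0 on both,
# even though its plain accuracy would be above 95%.
g_major, mcc_major = gmean_and_mcc(tp=0, fp=0, tn=1000, fn=50)
```

Both metrics penalize a model that ignores the minority class, which is one reason they are commonly preferred over plain accuracy when classes are imbalanced.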
Affiliation(s)
- Yinyao Ma
- Department of Obstetrics, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, 530016, China
| | | | - Yanhua Ma
- Department of Obstetrics, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, 530016, China
| | | | | | - Xuxia Liang
- Department of Obstetrics, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, 530016, China.
| | - Lei Wang
- BGI Research, Wuhan, 430074, China.
- Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen, 518083, China.
| |
|
20
|
Yu X, Zhu D, Guo H, Zhou C, Elhassan MAM, Wang M. DASNet: A Convolutional Neural Network with SE Attention Mechanism for ccRCC Tumor Grading. Interdiscip Sci 2025:10.1007/s12539-025-00693-8. [PMID: 40126867 DOI: 10.1007/s12539-025-00693-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2024] [Revised: 01/17/2025] [Accepted: 01/21/2025] [Indexed: 03/26/2025]
Abstract
Clear cell renal cell carcinoma (ccRCC) is the most common form of renal cell carcinoma in adults, comprising approximately 80% of cases. The lethality of ccRCC rises significantly at stage III or beyond, emphasizing the need for early detection to enable timely therapeutic interventions. This study introduces a non-invasive and efficient classification method, Domain Adaptive Squeeze-and-Excitation Network (DASNet), for grading ccRCC through Computed Tomography (CT) images using advanced deep learning and machine learning techniques. The dataset is enhanced using MedAugment technology and balanced to improve generalization and classification performance. To mitigate overfitting, renal angiomyolipoma (AML) samples are incorporated, increasing data diversity and model robustness. EfficientNet and RegNet serve as foundational models, leveraging local feature extraction and Squeeze-and-Excitation (SE) attention mechanisms to enhance recognition accuracy across grades. Furthermore, Domain-Adversarial Neural Networks (DANNs) are employed to maintain consistency between source and target domains, bolstering the model's generalization ability. The proposed model achieves a classification accuracy of 97.50%, demonstrating efficacy in early ccRCC grade identification. These findings not only offer valuable clinical insights but also establish a foundation for broader application of deep learning in tumor detection.
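The Squeeze-and-Excitation (SE) mechanism mentioned above reweights feature channels in three steps: squeeze (global average pooling), excitation (a small bottleneck network ending in a sigmoid), and scale. The following is a minimal pure-Python sketch of a generic SE block, assuming bias-free fully connected layers. It is not DASNet's actual implementation, and the function name, weights, and toy feature map are hypothetical.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply a generic Squeeze-and-Excitation step.

    feature_map: list of C channels, each a list of H rows of W floats.
    w1: bottleneck weights, shape (C_reduced x C).
    w2: expansion weights, shape (C x C_reduced).
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2x2, reduced to 1 hidden unit (all hypothetical).
toy_map = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
toy_w1 = [[1.0, -1.0]]
toy_w2 = [[1.0], [-1.0]]
scaled_map, channel_gates = se_block(toy_map, toy_w1, toy_w2)
```

In a trained network the weights are learned, so that informative channels receive gates near 1 and less useful channels are suppressed.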
Affiliation(s)
- Xiaoyi Yu
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China
| | - Donglin Zhu
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China
| | - Hongjie Guo
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China.
| | - Changjun Zhou
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China
| | - Mohammed A M Elhassan
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China
| | - Mengzhen Wang
- Department of Urology, First Affiliated Hospital of Nanchang University, Nanchang, 330006, China.
| |
|
21
|
Racioppo P, Alhasany A, Pham NV, Wang Z, Corradetti G, Mikaelian G, Paulus YM, Sadda SR, Hu Z. Automated Foveal Avascular Zone Segmentation in Optical Coherence Tomography Angiography Across Multiple Eye Diseases Using Knowledge Distillation. Bioengineering (Basel) 2025; 12:334. [PMID: 40281694 PMCID: PMC12025180 DOI: 10.3390/bioengineering12040334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2025] [Revised: 03/15/2025] [Accepted: 03/21/2025] [Indexed: 04/29/2025]
Abstract
Optical coherence tomography angiography (OCTA) is a noninvasive imaging technique used to visualize retinal blood flow and identify changes in vascular density and enlargement or distortion of the foveal avascular zone (FAZ), which are indicators of various eye diseases. Although several automated FAZ detection and segmentation algorithms have been developed for use with OCTA, their performance can vary significantly due to differences in data accessibility of OCTA in different retinal pathologies, and differences in image quality in different subjects and/or different OCTA devices. For example, data from subjects with direct macular damage, such as in age-related macular degeneration (AMD), are more readily available in eye clinics, while data on macular damage due to systemic diseases like Alzheimer's disease are often less accessible; data from healthy subjects may have better OCTA quality than subjects with ophthalmic pathologies. Typically, segmentation algorithms make use of convolutional neural networks and, more recently, vision transformers, which make use of both long-range context and fine-grained detail. However, transformers are known to be data-hungry, and may overfit small datasets, such as those common for FAZ segmentation in OCTA, to which there is limited access in clinical practice. To improve model generalization in low-data or imbalanced settings, we propose a multi-condition transformer-based architecture that uses four teacher encoders to distill knowledge into a shared base model, enabling the transfer of learned features across multiple datasets. These include intra-modality distillation using OCTA datasets from four ocular conditions: healthy aging eyes, Alzheimer's disease, AMD, and diabetic retinopathy; and inter-modality distillation incorporating color fundus photographs of subjects undergoing laser photocoagulation therapy. 
Our multi-condition model achieved a mean Dice Index of 83.8% with pretraining, outperforming single-condition models (mean of 83.1%) across all conditions. Pretraining on color fundus photocoagulation images improved the average Dice Index by a small margin on all conditions except AMD (1.1% on single-condition models, and 0.1% on multi-condition models). Our architecture demonstrates potential for broader applications in detecting and analyzing ophthalmic and systemic diseases across diverse imaging datasets and settings.
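The Dice Index reported above measures overlap between a predicted segmentation mask and a reference mask: twice the intersection divided by the sum of the two mask sizes. A minimal sketch follows, assuming binary masks stored as nested lists. The function name, the empty-mask convention, and the toy masks are hypothetical and unrelated to the study's data.

```python
def dice_index(pred, truth):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two same-shaped binary masks.

    Masks are lists of rows of 0/1 values. If both masks are empty the
    overlap is treated as perfect (1.0), which is a common convention.
    """
    intersection = pred_total = truth_total = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            pred_total += 1 if p else 0
            truth_total += 1 if t else 0
    if pred_total + truth_total == 0:
        return 1.0
    return 2.0 * intersection / (pred_total + truth_total)


# Toy 2x3 masks: each has 3 foreground pixels, 2 of which coincide.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
true_mask = [[0, 1, 0],
             [0, 1, 1]]
toy_dice = dice_index(pred_mask, true_mask)  # 2*2 / (3+3)
```

A mean Dice Index such as the 83.8% quoted in the abstract would then be the average of this per-image score over a test set, expressed as a percentage.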
Affiliation(s)
- Peter Racioppo
- Doheny Image Analysis Laboratory, Doheny Eye Institute, 150 North Orange Grove Blvd, Pasadena, CA 91103, USA
| | - Aya Alhasany
- Doheny Image Analysis Laboratory, Doheny Eye Institute, 150 North Orange Grove Blvd, Pasadena, CA 91103, USA
| | - Nhuan Vu Pham
- Doheny Image Analysis Laboratory, Doheny Eye Institute, 150 North Orange Grove Blvd, Pasadena, CA 91103, USA
| | - Ziyuan Wang
- Doheny Image Analysis Laboratory, Doheny Eye Institute, 150 North Orange Grove Blvd, Pasadena, CA 91103, USA
| | - Giulia Corradetti
- Doheny Image Reading and Research Laboratory, Doheny Eye Institute, 150 North Orange Grove Blvd, Pasadena, CA 91103, USA
| | - Gary Mikaelian
- Hedgefog Research Inc., 1891 N Gaffey St. Ste 224, San Pedro, CA 90731, USA
| | - Yannis M. Paulus
- Wilmer Eye Institute, Department of Ophthalmology, Johns Hopkins University, 1800 Orleans St, Baltimore, MD 21287, USA
| | - SriniVas R. Sadda
- Doheny Image Reading and Research Laboratory, Doheny Eye Institute, 150 North Orange Grove Blvd, Pasadena, CA 91103, USA
| | - Zhihong Hu
- Doheny Image Analysis Laboratory, Doheny Eye Institute, 150 North Orange Grove Blvd, Pasadena, CA 91103, USA
| |
|
22
|
Al-Jubouri B, Desiati I, Wijanarko W, Espallargas N. An Imbalance Regression Approach to Toxicity Prediction of Chemicals for Potential Use in Environmentally Acceptable Lubricants. ACS Applied Materials & Interfaces 2025; 17:16725-16737. [PMID: 40062875 PMCID: PMC11931492 DOI: 10.1021/acsami.4c10622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2025]
Abstract
Lubricants are complex mixtures of chemicals that help machines function at the right level of friction and wear. Lubricant formulation methods are based on empirical experience with chemical substances that have been used as lubricants for decades. In recent years, discussion of their environmental problems has triggered new legislation, resulting in the search for Environmentally Acceptable Lubricants, which should be biodegradable, minimally toxic, and nonbioaccumulative. Finding new chemicals that comply with these three criteria is a long and expensive process that can be accelerated by machine learning (ML). In this paper, we address toxicity prediction with machine learning models by exploring the application of ensemble learners to chemicals with an imbalanced data distribution. We investigated the effectiveness of sampling techniques to balance the data and improve the performance of the ensemble learning model. The model can predict toxicity for nonundersampled groups, which in our case correspond to the moderately to highly toxic groups. The results of this work are useful for lubricant formulators, since regulations accept moderately to highly toxic chemicals in lubricants if their concentration is below 20 wt %.
Affiliation(s)
- B Al-Jubouri
- The Norwegian Tribology Center, Department of Mechanical and Industrial Engineering, Norwegian University of Science and Technology (NTNU), R. Birkelandsvei 2B, Trondheim 7491, Norway
- Department of Computer Science, York St John University, Lord Mayor's Walk, York YO31 7EX, United Kingdom
| | - I Desiati
- The Norwegian Tribology Center, Department of Mechanical and Industrial Engineering, Norwegian University of Science and Technology (NTNU), R. Birkelandsvei 2B, Trondheim 7491, Norway
| | - W Wijanarko
- The Norwegian Tribology Center, Department of Mechanical and Industrial Engineering, Norwegian University of Science and Technology (NTNU), R. Birkelandsvei 2B, Trondheim 7491, Norway
| | - N Espallargas
- The Norwegian Tribology Center, Department of Mechanical and Industrial Engineering, Norwegian University of Science and Technology (NTNU), R. Birkelandsvei 2B, Trondheim 7491, Norway
| |
|
23
|
Albattah W, Khan RU. Impact of imbalanced features on large datasets. Front Big Data 2025; 8:1455442. [PMID: 40151465 PMCID: PMC11948280 DOI: 10.3389/fdata.2025.1455442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Accepted: 02/21/2025] [Indexed: 03/29/2025]
Abstract
The exponential growth of image and video data motivates the need for practical real-time content-based searching algorithms. Features play a vital role in identifying objects within images. However, feature-based classification faces a challenge due to uneven class instance distribution. Ideally, each class should have an equal number of instances and features to ensure optimal classifier performance. However, real-world scenarios often exhibit class imbalances. Thus, this article explores a classification framework based on image features, analyzing balanced and imbalanced distributions. Through extensive experimentation, we examine the impact of class imbalance on image classification performance, primarily on large datasets. The comprehensive evaluation shows that all models perform better with balancing than with an imbalanced dataset, underscoring the importance of dataset balancing for model accuracy. Distributed Gaussian (D-GA) and Distributed Poisson (D-PO) are found to be the most effective techniques, especially in improving Random Forest (RF) and SVM models. The deep learning experiments show a similar improvement.
Affiliation(s)
| | - Rehan Ullah Khan
- Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia
| |
|
24
|
Abousaber I. A Novel Explainable Attention-Based Meta-Learning Framework for Imbalanced Brain Stroke Prediction. Sensors (Basel, Switzerland) 2025; 25:1739. [PMID: 40292890 PMCID: PMC11945820 DOI: 10.3390/s25061739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 03/01/2025] [Accepted: 03/07/2025] [Indexed: 04/30/2025]
Abstract
The accurate prediction of brain stroke is critical for effective diagnosis and management, yet the imbalanced nature of medical datasets often hampers the performance of conventional machine learning models. To address this challenge, we propose a novel meta-learning framework that integrates advanced hybrid resampling techniques, ensemble-based classifiers, and explainable artificial intelligence (XAI) to enhance predictive performance and interpretability. The framework employs SMOTE and SMOTEENN for handling class imbalance, dynamic feature selection to reduce noise, and a meta-learning approach that combines predictions from Random Forest and LightGBM, which are further refined by a deep learning-based meta-classifier. The model uses SHAP (Shapley Additive Explanations) to provide transparent insights into feature contributions, increasing trust in its predictions. Evaluated on three datasets, DF-1, DF-2, and DF-3, the proposed framework consistently outperformed state-of-the-art methods, achieving accuracy and F1-Score of 0.992189 and 0.992579 on DF-1, 0.980297 and 0.981916 on DF-2, and 0.981901 and 0.983365 on DF-3. These results validate the robustness and effectiveness of the approach, significantly improving the detection of minority-class instances while maintaining overall performance. This work establishes a reliable solution for stroke prediction and provides a foundation for applying meta-learning and explainable AI to other imbalanced medical prediction tasks.
Affiliation(s)
- Inam Abousaber
- Department of Information Technology, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 47912, Saudi Arabia
| |
|
25
|
Hua H, Long W, Pan Y, Li S, Zhou J, Wang H, Chen S. scCrab: A Reference-Guided Cancer Cell Identification Method based on Bayesian Neural Networks. Interdiscip Sci 2025; 17:12-26. [PMID: 39348073 DOI: 10.1007/s12539-024-00655-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 08/20/2024] [Accepted: 08/21/2024] [Indexed: 10/01/2024]
Abstract
Cancer is a significant global public health concern, where early detection can greatly enhance curative outcomes. Therefore, the identification of cancer cells holds significant importance as the primary method for cancer diagnosis. The advancement of single-cell RNA sequencing (scRNA-seq) technology has made it possible to address the problem of cancer cell identification at the single-cell level more efficiently with computational methods, as opposed to the time-consuming and less reproducible manual identification methods. However, existing computational methods have shown suboptimal identification performance and a lack of capability to incorporate external reference data as prior information. Here, we propose scCrab, a reference-guided automatic cancer cell identification method, which performs ensemble learning based on a Bayesian neural network (BNN) with multi-head self-attention mechanisms and a linear regression model. Through a series of experiments on various datasets, we systematically validated the superior performance of scCrab in both intra- and inter-dataset predictions. Besides, we demonstrated the robustness of scCrab to dropout rate and sample size, and conducted ablation experiments to investigate the contributions of each component in scCrab. Furthermore, as a dedicated model for cancer cell identification, scCrab effectively captures cancer-related biological significance during the identification process.
Affiliation(s)
- Heyang Hua
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Wenxin Long
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Yan Pan
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Siyu Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Jianyu Zhou
- College of Software, Nankai University, Tianjin, 300071, China.
| | - Haixin Wang
- Cadre Medical Department, The 1st Clinical Center, Chinese PLA General Hospital, Beijing, 100853, China.
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
| |
|
26
|
Yang KC, Xu Y, Lin Q, Tang LL, Zhong JW, An HN, Zeng YQ, Jia K, Jin Y, Yu G, Gao F, Zhao L, Tong LS. Explainable deep learning algorithm for identifying cerebral venous sinus thrombosis-related hemorrhage (CVST-ICH) from spontaneous intracerebral hemorrhage using computed tomography. EClinicalMedicine 2025; 81:103128. [PMID: 40093990 PMCID: PMC11909457 DOI: 10.1016/j.eclinm.2025.103128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 02/07/2025] [Accepted: 02/10/2025] [Indexed: 03/19/2025]
Abstract
Background Misdiagnosis of hemorrhage secondary to cerebral venous sinus thrombosis (CVST-ICH) as arterial-origin spontaneous intracerebral hemorrhage (sICH) can lead to inappropriate treatment and the potential for severe adverse outcomes. The current practice for identifying CVST-ICH involves venography, which, despite being increasingly utilized in many centers, is not typically used as the initial imaging modality for ICH patients. The study aimed to develop an explainable deep learning model to quickly identify ICH caused by CVST based on non-contrast computed tomography (NCCT). Methods The study population included patients diagnosed with CVST-ICH and other spontaneous ICH from January 2016 to March 2023 at the Second Affiliated Hospital of Zhejiang University, Taizhou First People's Hospital, Taizhou Hospital, Quzhou Second People's Hospital, and Longyan First People's Hospital. A transfer learning-based 3D U-Net with segmentation and classification was proposed and developed only on admission plain CT. Model performance was assessed using the area under the curve (AUC), sensitivity, and specificity metrics. For further evaluation, the average diagnostic performance of nine doctors on plain CT was compared with model assistance. Interpretability methods, including Grad-CAM++, SHAP, IG, and occlusion, were employed to understand the model's attention. Findings An internal dataset was constructed using propensity score matching based on age, initially including 102 CVST-ICH patients (median age: 44 [29, 61] years) and 683 sICH patients (median age: 65 [52, 73] years). After matching, 102 CVST-ICH patients and 306 sICH patients (median age: 50 [40, 62] years) were selected. An external dataset consisted of 38 CVST-ICH and 119 sICH patients from four other hospitals. Validation showed AUC 0·94, sensitivity 0·96, and specificity 0·8 for the internal testing subset; AUC 0·85, sensitivity 0·87, and specificity 0·82 for the external dataset, respectively. 
The discrimination performance of nine doctors interpreting CT images improved significantly with the assistance of the proposed model (accuracy 0·79 vs 0·71, sensitivity 0·88 vs 0·81, specificity 0·75 vs 0·68, p < 0·05). Interpretability methods highlighted the model's attention to features of the hemorrhage edge appearance. Interpretation The present model demonstrated high-performing and robust discrimination between CVST-ICH and spontaneous ICH, and aided doctors' diagnosis in clinical practice. Prospective validation with a larger sample size is required. Funding The work was funded by the National Key R&D Program of China (2023YFE0118900), National Natural Science Foundation of China (No.81971155 and No.81471168), the Science and Technology Department of Zhejiang Province (LGJ22H180004), Medical and Health Science and Technology Project of Zhejiang Province (No.2022KY174), the 'Pioneer' R&D Program of Zhejiang (No. 2024C03006 and No. 2023C03026) and the MOE Frontier Science Center for Brain Science & Brain-Machine Integration, Zhejiang University.
Affiliation(s)
- Kai-Cheng Yang
- Neurology Department, The Second Affiliated Hospital of Zhejiang University, School of Medicine, Hangzhou, Zhejiang Province, China
| | - Yunzhi Xu
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering & Instrument Science of Zhejiang University, Hangzhou, Zhejiang Province, China
| | - Qing Lin
- Neurology Department, The First People's Hospital of Taizhou City, Taizhou, Zhejiang Province, China
| | - Li-Li Tang
- Neurology Department, The Second Affiliated Hospital of Zhejiang University, School of Medicine, Hangzhou, Zhejiang Province, China
| | - Jia-wei Zhong
- Neurology Department, Taizhou Hospital of Zhejiang Province Affiliated to Wenzhou Medical University, Linhai, Zhejiang Province, China
| | - Hong-Na An
- Neurology Department, The Second People's Hospital of Quzhou, Quzhou, Zhejiang Province, China
| | - Yan-Qin Zeng
- Neurology Department, Longyan First Hospital Affiliated to Fujian Medical University, Longyan, Fujian Province, China
| | - Ke Jia
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, Hangzhou, Zhejiang Province, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou, Zhejiang Province, China
| | - Yujia Jin
- Neurology Department, The Second Affiliated Hospital of Zhejiang University, School of Medicine, Hangzhou, Zhejiang Province, China
| | - Guoshen Yu
- Neurology Department, Haiyan People's Hospital, Jiaxing, Zhejiang Province, China
| | - Feng Gao
- Neurology Department, The Second Affiliated Hospital of Zhejiang University, School of Medicine, Hangzhou, Zhejiang Province, China
| | - Li Zhao
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering & Instrument Science of Zhejiang University, Hangzhou, Zhejiang Province, China
| | - Lu-Sha Tong
- Neurology Department, The Second Affiliated Hospital of Zhejiang University, School of Medicine, Hangzhou, Zhejiang Province, China
| |
|
27
|
Barfejani AH, Rahimi M, Safdari H, Gholizadeh S, Borzooei S, Roshanaei G, Golparian M, Tarokhian A. Thy-DAMP: deep artificial neural network model for prediction of thyroid cancer mortality. Eur Arch Otorhinolaryngol 2025; 282:1577-1583. [PMID: 39174681 DOI: 10.1007/s00405-024-08918-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]
Abstract
PURPOSE Despite the rising incidence of differentiated thyroid cancer (DTC), mortality rates have remained relatively low, yet predicting mortality remains crucial for effective patient management. This study aims to develop a deep neural network capable of predicting mortality in patients with differentiated thyroid cancer. METHODS Leveraging data from the Surveillance, Epidemiology, and End Results (SEER) database, we developed Thy-DAMP (Deep Artificial Neural Network Model for Prediction of Thyroid Cancer) to forecast mortality in DTC patients. The dataset comprised demographic, histologic, and staging information. Following data normalization and feature encoding, the dataset was partitioned into training, testing, and validation subsets, with model hyperparameters fine-tuned via cross-validation. RESULTS Among the 63,513 patients, the mean age was 48.22 years (SD = 14.8), with 77.32% being female. Papillary carcinoma emerged as the predominant subtype, representing 62.94% of cases. The majority presented with stage I disease (73.96%). Thy-DAMP demonstrated acceptable performance metrics on both the test and validation sets. Sensitivity was 83.24% (95% CI 76.95-88.40%), specificity was 93.53% (95% CI 93.01-94.02%), and accuracy stood at 93.33% (95% CI 92.82-93.83%). The model exhibited a positive predictive value of 19.76% (95% CI 18.20-21.42%) and a negative predictive value of 99.66% (95% CI 99.53-99.75%). Additionally, Thy-DAMP demonstrated a positive likelihood ratio of 12.86 (95% CI 11.62-14.23), a negative likelihood ratio of 0.18 (95% CI 0.13-0.25), and an area under the receiver operating characteristic curve (AUROC) of 0.95. The model was externally validated on a separate dataset with nearly identical performance. CONCLUSION Thy-DAMP shows considerable promise in accurately predicting mortality in DTC patients using a limited set of patient data.
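The likelihood ratios in the abstract above follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below recomputes them from the reported, rounded percentages and also illustrates why the positive predictive value can be low when the outcome is rare. The function names are hypothetical, and the 2% prevalence is an assumed illustrative value, not a figure reported in the abstract.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule: true positives over all positive calls."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity as reported in the abstract (rounded values).
lr_pos, lr_neg = likelihood_ratios(0.8324, 0.9353)

# Assumed prevalence of 2%, chosen only for illustration.
ppv_example = positive_predictive_value(0.8324, 0.9353, prevalence=0.02)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, PPV at 2% prevalence = {ppv_example:.3f}")
```

Using the rounded inputs gives LR+ of about 12.87 and LR- of about 0.18, in line with the reported 12.86 and 0.18. The small difference in LR+ is what one would expect from rounding. With a rare outcome, even 93.5% specificity leaves many false positives relative to true positives, which is consistent with a PPV near 20%.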
Affiliation(s)
| | - Mohammad Rahimi
- Student Research Committee, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Hassan Safdari
- Department of Anesthesiology and Perioperative Medicine, Tufts Medical Center, Boston, USA
| | | | - Shiva Borzooei
- Department of Endocrinology, Faculty of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Ghodratollah Roshanaei
- Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Mitra Golparian
- Medical School, Hamadan University of Medical Sciences, Pajoohesh Blvd, Hamadan, Iran
| | - Aidin Tarokhian
- Medical School, Hamadan University of Medical Sciences, Pajoohesh Blvd, Hamadan, Iran.
| |
|
28
|
Tano S, Kotani T, Ushida T, Matsuo S, Yoshihara M, Imai K, Kinoshita F, Moriyama Y, Nomoto M, Yoshida S, Yamashita M, Kishigami Y, Oguchi H, Kajiyama H. Visualizing risk modification of hypertensive disorders of pregnancy: development and validation of prediction model for personalized interpregnancy weight management. Hypertens Res 2025; 48:884-893. [PMID: 39663391 PMCID: PMC11879865 DOI: 10.1038/s41440-024-02024-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 10/08/2024] [Accepted: 11/07/2024] [Indexed: 12/13/2024]
Abstract
The growing recognition of the importance of interpregnancy weight management in reducing hypertensive disorders of pregnancy (HDP) underscores the need for effective preventive strategies. However, developing effective systems remains a challenge. We aimed to bridge this gap by constructing a prediction model. This study retrospectively analyzed the data of 1746 women who underwent two childbirths across 14 medical facilities, including both tertiary and primary facilities. Data from 2009 to 2019 were used to create a derivation cohort (n = 1746). A separate temporal-validation cohort was constructed by adding data between 2020 and 2024 (n = 365). Furthermore, the external-validation cohort was constructed using the data from another tertiary center between 2017 and 2023 (n = 340). We constructed a prediction model for HDP development in the second pregnancy by applying logistic regression analysis using five primary clinical variables: maternal age, pre-pregnancy body mass index, HDP history, pregnancy interval, and weight change velocity between pregnancies. Model performance was assessed across all three cohorts. HDP in the second pregnancy occurred in 7.3% of the derivation, 10.1% of the temporal-validation, and 7.9% of the external-validation cohorts. This model demonstrated strong discrimination, with c-statistics of 0.86, 0.88, and 0.86 for the respective cohorts. Precision-recall area under the curve values were 0.90, 0.85, and 0.91, respectively. Calibration showed favorable intercepts (-0.02 to -0.00) and slopes (0.96-1.02) for all cohorts. In conclusion, this externally validated model offers a robust basis for personalized interpregnancy weight management goals for women planning future pregnancies.
Affiliation(s)
- Sho Tano
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan.
- Department of Obstetrics, Perinatal Medical Center, TOYOTA Memorial Hospital, Toyota, Aichi, Japan.
| | - Tomomi Kotani
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan.
- Division of Perinatology, Center for Maternal-Neonatal Care, Nagoya University Hospital, Nagoya, Aichi, Japan.
| | - Takafumi Ushida
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
| | - Seiko Matsuo
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
| | - Masato Yoshihara
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
| | - Kenji Imai
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
| | - Fumie Kinoshita
- Data Science Division, Data Coordinating Center, Department of Advanced Medicine, Nagoya University Hospital, Nagoya, Aichi, Japan
| | - Yoshinori Moriyama
- Department of Obstetrics and Gynecology, Fujita Health University School of Medicine, Toyoake, Aichi, Japan
| | - Masataka Nomoto
- Department of Obstetrics and Gynecology, Ogaki Municipal Hospital, Ogaki, Gifu, Japan
| | | | | | - Yasuyuki Kishigami
- Department of Obstetrics, Perinatal Medical Center, TOYOTA Memorial Hospital, Toyota, Aichi, Japan
| | - Hidenori Oguchi
- Department of Obstetrics, Perinatal Medical Center, TOYOTA Memorial Hospital, Toyota, Aichi, Japan
| | - Hiroaki Kajiyama
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
| |
|
29
|
Li D, Xing W, Zhao J, Shi C, Wang F. Multimodal deep learning for predicting in-hospital mortality in heart failure patients using longitudinal chest X-rays and electronic health records. THE INTERNATIONAL JOURNAL OF CARDIOVASCULAR IMAGING 2025; 41:427-440. [PMID: 39786626 DOI: 10.1007/s10554-025-03322-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Accepted: 01/01/2025] [Indexed: 01/12/2025]
Abstract
Amid an aging global population, heart failure has become a leading cause of hospitalization among older people. Its high prevalence and mortality rates underscore the importance of accurate mortality prediction for swift disease progression assessment and better patient outcomes. The evolution of artificial intelligence (AI) presents new avenues for predicting heart failure mortality. Yet current research has predominantly leveraged structured data and unstructured clinical notes from electronic health records (EHR), underutilizing the prognostic value of chest X-rays (CXRs). This study aims to harness deep learning methodologies to explore the feasibility of enhancing the precision of predicting in-hospital all-cause mortality in heart failure patients using CXR data. We propose a novel multimodal deep learning network based on the spatially and temporally decoupled Transformer (MN-STDT) for in-hospital mortality prediction in heart failure by integrating longitudinal CXRs and structured EHR data. The MN-STDT captures spatial and temporal information from CXRs through a Hybrid Spatial Encoder and a Distance-Aware Temporal Encoder, ultimately fusing features from both modalities for predictive modeling. Initial pre-training of the spatial encoder was conducted on CheXpert, followed by full model training and evaluation on the MIMIC-IV and MIMIC-CXR datasets for mortality prediction tasks. Overall, the MN-STDT demonstrated the best performance, with an AUC-ROC of 0.8620, surpassing all baseline models. Comparative analysis revealed that the AUC-ROC of the multimodal model (0.8620) was significantly higher than that of models using only structured data (0.8166) or chest X-ray data alone (0.7479). This study demonstrates the value of CXRs in the prognosis of heart failure, showing that the combination of longitudinal CXRs with structured EHR data can significantly improve the accuracy of mortality prediction in heart failure. Feature importance analysis based on SHAP provides interpretable decision support, paving the way for potential clinical applications.
Affiliation(s)
- Dengao Li
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China.
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China.
- Intelligent Perception Engineering Technology Center of Shanxi, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China.
| | - Wen Xing
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China
| | - Jumin Zhao
- College of Electronic Information and Optical Engineering, Taiyuan University of Technology, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China
- Intelligent Perception Engineering Technology Center of Shanxi, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China
| | - Changcheng Shi
- College of Electronic Information and Optical Engineering, Taiyuan University of Technology, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, 30 Yingze West Street, Taiyuan, 030024, Shanxi, China
| | - Fei Wang
- Shanxi Cardiovascular Hospital, 18 Yifen Street, Taiyuan, 030024, Shanxi, China
| |
|
30
|
Zhang L, Zhang Z, Wang Y, Zhu Y, Wang Z, Wan H. Evaluation of machine learning models for predicting xerostomia in adults with head and neck cancer during proton and heavy ion radiotherapy. Radiother Oncol 2025; 204:110712. [PMID: 39798700 DOI: 10.1016/j.radonc.2025.110712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2024] [Revised: 01/01/2025] [Accepted: 01/04/2025] [Indexed: 01/15/2025]
Abstract
BACKGROUND AND PURPOSE Few studies have examined the factors associated with xerostomia during proton and carbon ion radiotherapy for head and neck cancer (HNC), which are reported to have fewer toxic effects compared to traditional photon-based radiotherapy. This study aims to evaluate the performance of machine learning approaches in predicting grade 2+ xerostomia in adults with HNC receiving proton and carbon ion radiotherapy. MATERIALS AND METHODS A retrospective study involving 1,769 adults with HNC who completed proton or carbon ion radiotherapy was conducted. Xerostomia was graded using the Radiation Therapy Oncology Group criteria. Eight machine learning models with different combinations of sampling methods and class weights were compared to identify the model with the highest balanced accuracy. RESULTS The mean age of patients was 47.8 years (range 18-80), with 33.5% female. The average total radiation dose was 71.0 GyE (SD = 5.7). Grade 1 xerostomia was recorded in 572 patients (32.3%) and grade 2 in 103 patients (5.8%). No cases of grade 3 or higher xerostomia were reported. A support vector machine with a linear kernel, a 1:2 positive-to-negative class weight, and SMOTE oversampling achieved the highest balanced accuracy (0.66) and AUC-ROC (0.69) for predicting grade 2 xerostomia, outperforming the logistic regression model (balanced accuracy: 0.50, AUC-ROC: 0.67). CONCLUSION The prevalence of grade 2 radiation-induced xerostomia during proton and carbon ion radiotherapy was low in adults with HNC, posing challenges for accurate prediction. Further research is needed to develop improved methods for predicting xerostomia during proton and carbon ion radiotherapy.
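Balanced accuracy, the selection metric named above, is the mean of sensitivity and specificity, so it does not reward a model for simply predicting the majority class. The sketch below is illustrative only: the function name and the toy data (a 6% positive rate, roughly echoing the low grade 2 prevalence) are assumptions, not the study's data or code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy cohort: 6 positives among 100 patients.
toy_true = [1] * 6 + [0] * 94
always_negative = [0] * 100  # a classifier that never predicts the event

plain_accuracy = sum(t == p for t, p in zip(toy_true, always_negative)) / 100
bal_accuracy = balanced_accuracy(toy_true, always_negative)
```

On this toy cohort the always-negative classifier reaches a plain accuracy of 0.94 but a balanced accuracy of only 0.5, which is one way a balanced accuracy of 0.50 can arise on rare-outcome data.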
Affiliation(s)
- Lijuan Zhang
- Department of Nursing, Shanghai Proton and Heavy Ion Center, Fudan University Cancer Hospital; Shanghai Key Laboratory of Radiation Oncology; and Shanghai Engineering Research Center of Proton and Heavy Ion Radiation Therapy, Shanghai 201315 China
| | - Zhihong Zhang
- Columbia University, New York City, NY 10027, United States
| | - Yiqiao Wang
- Department of Nursing, Shanghai Proton and Heavy Ion Center, Fudan University Cancer Hospital; Shanghai Key Laboratory of Radiation Oncology; and Shanghai Engineering Research Center of Proton and Heavy Ion Radiation Therapy, Shanghai 201315 China
| | - Yu Zhu
- Department of Nursing, Shanghai Proton and Heavy Ion Center, Fudan University Cancer Hospital; Shanghai Key Laboratory of Radiation Oncology; and Shanghai Engineering Research Center of Proton and Heavy Ion Radiation Therapy, Shanghai 201315 China
| | - Ziying Wang
- Department of Nursing, Shanghai Proton and Heavy Ion Center, Fudan University Cancer Hospital; Shanghai Key Laboratory of Radiation Oncology; and Shanghai Engineering Research Center of Proton and Heavy Ion Radiation Therapy, Shanghai 201315 China
| | - Hongwei Wan
- Department of Nursing, Shanghai Proton and Heavy Ion Center, Fudan University Cancer Hospital; Shanghai Key Laboratory of Radiation Oncology; and Shanghai Engineering Research Center of Proton and Heavy Ion Radiation Therapy, Shanghai 201315 China.
| |
|
31
|
Lwin TC, Zin TT, Tin P, Kino E, Ikenoue T. Advanced Predictive Analytics for Fetal Heart Rate Variability Using Digital Twin Integration. SENSORS (BASEL, SWITZERLAND) 2025; 25:1469. [PMID: 40096274 PMCID: PMC11902867 DOI: 10.3390/s25051469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2025] [Revised: 02/24/2025] [Accepted: 02/25/2025] [Indexed: 03/19/2025]
Abstract
Fetal heart rate variability (FHRV) is a critical indicator of fetal well-being and autonomic nervous system development during labor. Traditional monitoring methods often provide limited insights, potentially leading to delayed interventions and suboptimal outcomes. This study proposes an advanced predictive analytics approach by integrating approximate entropy analysis with a hidden Markov model (HMM) within a digital twin framework to enhance real-time fetal monitoring. We utilized a dataset of 469 fetal electrocardiogram (ECG) recordings, each exceeding one hour in duration, to ensure sufficient temporal information for reliable modeling. The FHRV data were preprocessed and partitioned into parasympathetic and sympathetic components based on downward and non-downward beat detection. Approximate entropy was calculated to quantify the complexity of FHRV patterns, revealing significant correlations with umbilical cord blood gas parameters, particularly pH levels. The HMM was developed with four hidden states representing discrete pH levels and eight observed states derived from FHRV data. By employing the Baum-Welch and Viterbi algorithms for training and decoding, respectively, the model effectively captured temporal dependencies and provided early predictions of the fetal acid-base status. Experimental results demonstrated that the model achieved 85% training and 79% testing accuracy on the balanced dataset distribution, improving from 78% and 71% on the imbalanced dataset. The integration of this predictive model into a digital twin framework offers significant benefits for timely clinical interventions, potentially improving prenatal outcomes.
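Approximate entropy, used above to quantify the complexity of FHRV patterns, compares how often short templates of length m that match within a tolerance r still match when extended to length m + 1. The sketch below follows the standard definition with self-matches included. The defaults m = 2 and r = 0.2 (an absolute tolerance here, often chosen in practice as a fraction of the series' standard deviation) and the two test series are assumptions, since the abstract does not state the parameters used.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a 1-D sequence (self-matches included)."""
    n = len(series)
    if n < m + 2:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        # Average log-frequency with which templates of length `dim`
        # lie within Chebyshev distance r of each other.
        count = n - dim + 1
        templates = [series[i:i + dim] for i in range(count)]
        total = 0.0
        for a in templates:
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


# Illustrative inputs only: a strictly periodic series and a chaotic one
# generated by the logistic map x -> 4x(1 - x).
regular = [1.0, 2.0] * 20
irregular = [0.3]
for _ in range(99):
    irregular.append(4.0 * irregular[-1] * (1.0 - irregular[-1]))
```

Lower values indicate more regular, predictable signals; the periodic series scores close to zero while the chaotic one scores higher.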
Affiliation(s)
- Tunn Cho Lwin
- Interdisciplinary Graduate School of Agriculture and Engineering, University of Miyazaki, Miyazaki 889-2192, Japan;
| | - Thi Thi Zin
- Graduate School of Engineering, University of Miyazaki, Miyazaki 889-2192, Japan;
| | - Pyke Tin
- Graduate School of Engineering, University of Miyazaki, Miyazaki 889-2192, Japan;
| | - Emi Kino
- Faculty of Medicine, University of Miyazaki, Miyazaki 889-1692, Japan; (E.K.); (T.I.)
| | - Tsuyomu Ikenoue
- Faculty of Medicine, University of Miyazaki, Miyazaki 889-1692, Japan; (E.K.); (T.I.)
| |
|
32
|
Cova TF, Ferreira C, Nunes SCC, Pais AACC. Structural Similarity, Activity, and Toxicity of Mycotoxins: Combining Insights from Unsupervised and Supervised Machine Learning Algorithms. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2025. [PMID: 40013497 DOI: 10.1021/acs.jafc.4c08527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2025]
Abstract
A large number of mycotoxins and related fungal metabolites have not been assessed in terms of their toxicological impacts. Current methodologies often prioritize specific target families, neglecting the complexity and presence of co-occurring compounds. This work addresses a fundamental question: Can we assess molecular similarity and predict the toxicity of mycotoxins in silico using a defined set of molecular descriptors? We propose a rapid nontarget screening approach for multiple classes of mycotoxins, integrating both unsupervised and supervised machine learning models, alongside molecular and physicochemical descriptors to enhance the understanding of structural similarity, activity, and toxicity. Clustering analyses identify natural clusters corresponding to the known mycotoxin families, indicating that mycotoxins belonging to the same cluster share similar molecular properties. However, topological descriptors play a significant role in distinguishing between acutely toxic and nonacutely toxic compounds. Random forest (RF) and neural networks (NN), combined with molecular descriptors, contribute to improved knowledge and predictive capability regarding mycotoxin toxicity profiles. RF allows the prediction of toxicity using data reflecting mainly structural features and performs well in the presence of descriptors reflecting biological activity. NN models prove to be more sensitive to biological activity descriptors than RF. The use of descriptors encompassing structural complexity and diversity, chirality and symmetry, connectivity, atomic charge, and polarizability, together with descriptors representing lipophilicity, absorption, and permeation of molecules, is crucial for predicting toxicity, facilitating broader toxicological evaluations.
Affiliation(s)
- Tânia F Cova
- Coimbra Chemistry Centre, Department of Chemistry, Institute of Molecular Sciences (IMS), Faculty of Sciences and Technology, University of Coimbra, R. Larga 2, 3004-535 Coimbra, Portugal
| | - Cláudia Ferreira
- Coimbra Chemistry Centre, Department of Chemistry, Institute of Molecular Sciences (IMS), Faculty of Sciences and Technology, University of Coimbra, R. Larga 2, 3004-535 Coimbra, Portugal
| | - Sandra C C Nunes
- Coimbra Chemistry Centre, Department of Chemistry, Institute of Molecular Sciences (IMS), Faculty of Sciences and Technology, University of Coimbra, R. Larga 2, 3004-535 Coimbra, Portugal
| | - Alberto A C C Pais
- Coimbra Chemistry Centre, Department of Chemistry, Institute of Molecular Sciences (IMS), Faculty of Sciences and Technology, University of Coimbra, R. Larga 2, 3004-535 Coimbra, Portugal
| |
|
33
|
Sadegh-Zadeh SA, Sadeghzadeh N, Sedighi B, Rahpeyma E, Nilgounbakht M, Barati MA. Curvature estimation techniques for advancing neurodegenerative disease analysis: a systematic review of machine learning and deep learning approaches. AMERICAN JOURNAL OF NEURODEGENERATIVE DISEASE 2025; 14:1-33. [PMID: 40124352 PMCID: PMC11929037 DOI: 10.62347/dznq2482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2024] [Accepted: 02/07/2025] [Indexed: 03/25/2025]
Abstract
Neurodegenerative diseases present complex challenges that demand advanced analytical techniques to decode intricate brain structures and their changes over time. Curvature estimation within datasets has emerged as a critical tool in areas like neuroimaging and pattern recognition, with significant applications in diagnosing and understanding neurodegenerative diseases. This systematic review assesses state-of-the-art curvature estimation methodologies, covering classical mathematical techniques, machine learning, deep learning, and hybrid methods. Analysing 105 research papers from 2010 to 2023, we explore how each approach enhances our understanding of structural variations in neurodegenerative pathology. Our findings highlight a shift from classical methods to machine learning and deep learning, with neural network regression and convolutional neural networks gaining traction due to their precision in handling complex geometries and data-driven modelling. Hybrid methods further demonstrate the potential to merge classical and modern techniques for robust curvature estimation. This comprehensive review aims to equip researchers and clinicians with insights into effective curvature estimation methods, supporting the development of enhanced diagnostic tools and interventions for neurodegenerative diseases.
Affiliation(s)
- Seyed-Ali Sadegh-Zadeh
- Department of Computing, School of Digital, Technologies and Arts, Staffordshire University, Stoke-on-Trent, United Kingdom
| | | | - Bahareh Sedighi
- Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
| | - Elaheh Rahpeyma
- Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
| | | | - Mohammad Amin Barati
- School of Mechanical Engineering, College of Engineering, University of Tehran, Tehran, Iran
| |
|
34
|
Wang Z, Guo Z, Wang W, Zhang Q, Song S, Xue Y, Zhang Z, Wang J. Prediction of tuberculosis treatment outcomes using biochemical makers with machine learning. BMC Infect Dis 2025; 25:229. [PMID: 39962412 PMCID: PMC11834319 DOI: 10.1186/s12879-025-10609-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Accepted: 02/06/2025] [Indexed: 02/20/2025] Open
Abstract
BACKGROUND Tuberculosis (TB) continues to pose a significant threat to global public health. Enhancing patient prognosis is essential for alleviating the disease burden. OBJECTIVE This study aims to evaluate TB prognosis by incorporating treatment discontinuation into the assessment framework, expanding beyond mortality and drug resistance. METHODS Seven feature selection methods and twelve machine learning algorithms were utilized to analyze admission test data from TB patients, identifying predictive features and building prognostic models. SHapley Additive exPlanations (SHAP) were applied to evaluate feature importance in top-performing models. RESULTS Analysis of 1,086 TB cases showed that a K-Nearest Neighbor classifier with Mutual Information feature selection achieved an area under the receiver operating characteristic curve (AUC) of 0.87 (95% CI: 0.83-0.92). Key predictors of treatment failure included elevated levels of 5'-nucleotidase, uric acid, globulin, creatinine, cystatin C, and aspartate transaminase. SHAP analysis highlighted 5'-nucleotidase, uric acid, and globulin as having the most significant influence on predicting treatment discontinuation. CONCLUSION Our model provides valuable insights into TB outcomes based on initial patient tests, potentially guiding prevention and control strategies. Elevated biomarker levels before therapy are associated with increased risk of treatment discontinuation, indicating their potential as early warning indicators.
Affiliation(s)
- Zheyue Wang
- Department of Epidemiology, Center for Global Health, School of Public Health, National Vaccine Innovation Platform, Nanjing Medical University, Nanjing, 211166, China
- Changzhou Medical Center, Nanjing Medical University, Changzhou, 213004, China
- Department of Epidemiology, Gusu School, Nanjing Medical University, Nanjing, 211166, China
| | - Zhenpeng Guo
- Department of Epidemiology, Center for Global Health, School of Public Health, National Vaccine Innovation Platform, Nanjing Medical University, Nanjing, 211166, China
- Changzhou Medical Center, Nanjing Medical University, Changzhou, 213004, China
| | - Weijia Wang
- School of Information and Software, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Qiang Zhang
- Department of Epidemiology, Center for Global Health, School of Public Health, National Vaccine Innovation Platform, Nanjing Medical University, Nanjing, 211166, China
| | - Suya Song
- Changzhou Medical Center, Nanjing Medical University, Changzhou, 213004, China
- Department of Pulmonary Diseases, The Third People's Hospital of Changzhou, Changzhou, 213001, China
| | - Yuan Xue
- Changzhou Medical Center, Nanjing Medical University, Changzhou, 213004, China
| | - Zhixin Zhang
- Changzhou Medical Center, Nanjing Medical University, Changzhou, 213004, China.
- Department of Pulmonary Diseases, The Third People's Hospital of Changzhou, Changzhou, 213001, China.
| | - Jianming Wang
- Department of Epidemiology, Center for Global Health, School of Public Health, National Vaccine Innovation Platform, Nanjing Medical University, Nanjing, 211166, China.
- Changzhou Medical Center, Nanjing Medical University, Changzhou, 213004, China.
- Department of Epidemiology, Gusu School, Nanjing Medical University, Nanjing, 211166, China.
| |
|
35
|
Das D, Teixeira ES, Morales JA. Recurrent Neural Network/Machine Learning Predictions of Reactive Channels in H+ + C2H4 at ELab = 30 eV: A Prototype of Ion Cancer Therapy Reactions. J Comput Chem 2025; 46:e70033. [PMID: 39936181 DOI: 10.1002/jcc.70033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Revised: 12/11/2024] [Accepted: 12/12/2024] [Indexed: 02/13/2025]
Abstract
We present a simplest-level electron nuclear dynamics/machine learning (SLEND/ML) approach to predict chemical properties in ion cancer therapy (ICT) reactions. SLEND is a time-dependent, variational, on-the-fly, and nonadiabatic method. In SLEND, nuclear and electronic parameters determine reactants-to-products trajectories in a quantum phase space; this establishes a mapping between reactants' initial conditions and products' properties. To accelerate simulations, SLEND/ML utilizes a modicum of SLEND trajectories to train ML methods on the aforesaid mapping and employs them to predict chemical properties. We employ SLEND/ML to predict reaction types and products' charges in H+ + C2H4 at ELab = 30 eV, a prototype of ICT reactions involving double-bonded compounds. For reaction predictions, a recurrent neural network (RNN) and k-nearest neighbor method are the best models with 98.23% and 95.13% accuracy. RNN correctly predicts frequent and infrequent reaction types and generalizes over data sets. For charge predictions, the RNN exhibits low mean absolute errors of 0.02-0.07.
Affiliation(s)
- Debojyoti Das
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas, USA
| | | | - Jorge A Morales
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas, USA
| |
|
36
|
Dai W, Ma Y, Chen J, Chen X, Li S. Tradeoffs Between Richness and Bias of Augmented Data in Long-Tail Recognition. ENTROPY (BASEL, SWITZERLAND) 2025; 27:201. [PMID: 40003198 PMCID: PMC11854239 DOI: 10.3390/e27020201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2024] [Revised: 01/27/2025] [Accepted: 02/06/2025] [Indexed: 02/27/2025]
Abstract
In long-tail scenarios, models have a very high demand for high-quality data. Information augmentation, as an important class of data-centric methods, has been proposed to improve model performance by expanding the richness and quantity of samples in tail classes. However, the underlying mechanisms behind the effectiveness of information augmentation methods remain underexplored. This has led to reliance on empirical and intricate fine-tuning in the use of information augmentation for long-tail recognition tasks. In this work, we simultaneously consider the richness gain and distribution shift introduced by information augmentation methods and propose effective information gain (EIG) to explore the mechanisms behind the effectiveness of these methods. We find that when the value of the effective information gain appropriately balances the richness gain and distribution shift, the performance of information augmentation methods is fully realized. Comprehensive experiments on long-tail benchmark datasets CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT demonstrate that using effective information gain to filter augmented data can further enhance model performance without any modifications to the model's architecture. Therefore, in addition to proposing new model architectures, data-centric approaches also hold significant potential in the field of long-tail recognition.
Affiliation(s)
- Wei Dai
- School of Telecommunications Engineering, Xidian University, Xi’an 710071, China; (W.D.); (J.C.)
| | - Yanbiao Ma
- School of Artificial Intelligence, Xidian University, Xi’an 710071, China;
| | - Jiayi Chen
- School of Telecommunications Engineering, Xidian University, Xi’an 710071, China; (W.D.); (J.C.)
| | - Xiaohua Chen
- Department of Automation, Tsinghua University, Beijing 100190, China;
| | - Shuo Li
- School of Artificial Intelligence, Xidian University, Xi’an 710071, China;
| |
|
37
|
Park S, Yi Y, Han SS, Kim TH, Kim SJ, Yoon YS, Kim S, Lee HJ, Heo Y. Development of an AI Model for Predicting Methacholine Bronchial Provocation Test Results Using Spirometry. Diagnostics (Basel) 2025; 15:449. [PMID: 40002600 PMCID: PMC11854253 DOI: 10.3390/diagnostics15040449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2024] [Revised: 01/31/2025] [Accepted: 02/08/2025] [Indexed: 02/27/2025] Open
Abstract
Background/Objectives: The methacholine bronchial provocation test (MBPT) is a diagnostic test frequently used to evaluate airway hyper-reactivity. MBPT is essential for diagnosing asthma; however, it can be time-consuming and resource-intensive. This study aimed to develop an artificial intelligence (AI) model to predict the MBPT results using forced expiratory volume in one second (FEV1) and bronchodilator test measurements from spirometry. Methods: A dataset of spirometry measurements, including Pre- and Post-bronchodilator FEV1, was used to train and validate the model. Results: Among the evaluated models, the multilayer perceptron (MLP) achieved the highest area under the curve (AUC) of 0.701 (95% CI: 0.676-0.725), an accuracy of 0.758, and an F1-score of 0.853. Logistic regression (LR) and a support vector machine (SVM) demonstrated comparable performance with AUC values of 0.688, while random forest (RF) and extreme gradient boosting (XGBoost) achieved slightly lower AUC values of 0.669 and 0.672, respectively. Feature importance analysis of the MLP model identified key contributing features, including Pre-FEF25-75 (%), Pre-FVC (L), Post FEV1/FVC, Change-FEV1 (L), and Change-FEF25-75 (%), providing insight into the interpretability and clinical applicability of the model. Conclusions: These results highlight the potential of the model to utilize readily available spirometry data, particularly FEV1 and bronchodilator responses, to accurately predict MBPT results. Our findings suggest that AI-based prediction can improve asthma diagnostic workflows by minimizing the reliance on MBPT and enabling faster and more accessible assessments.
Affiliation(s)
- SangJee Park
- Biomedical Research Institute, Kangwon National University Hospital, Chuncheon 24289, Republic of Korea;
| | - Yehyeon Yi
- Department of Internal Medicine, Seoul Medical Center, Seoul 02053, Republic of Korea; (Y.Y.); (S.K.)
| | - Seon-Sook Han
- Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea; (S.-S.H.); (T.-H.K.)
| | - Tae-Hoon Kim
- Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea; (S.-S.H.); (T.-H.K.)
| | - So Jeong Kim
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hwaseong 18450, Republic of Korea;
| | - Young Soon Yoon
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Dongguk University Ilsan Hospital, Goyang 10326, Republic of Korea;
| | - Suhyun Kim
- Department of Internal Medicine, Seoul Medical Center, Seoul 02053, Republic of Korea; (Y.Y.); (S.K.)
| | - Hyo Jin Lee
- Internal Medicine, Seoul National University Seoul Metropolitan Government Boramae Medical Center, Seoul 07061, Republic of Korea;
| | - Yeonjeong Heo
- Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea; (S.-S.H.); (T.-H.K.)
| |
|
38
|
Hemmatian J, Hajizadeh R, Nazari F. Addressing imbalanced data classification with Cluster-Based Reduced Noise SMOTE. PLoS One 2025; 20:e0317396. [PMID: 39928607 PMCID: PMC11809912 DOI: 10.1371/journal.pone.0317396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 12/29/2024] [Indexed: 02/12/2025] Open
Abstract
In recent years, the challenge of imbalanced data has become increasingly prominent in machine learning, affecting the performance of classification algorithms. This study proposes a novel data-level oversampling method called Cluster-Based Reduced Noise SMOTE (CRN-SMOTE) to address this issue. CRN-SMOTE combines SMOTE for oversampling minority classes with a novel cluster-based noise reduction technique. In this cluster-based noise reduction approach, it is crucial that samples from each category form one or two clusters, a feature that conventional noise reduction methods do not achieve. The proposed method is evaluated on four imbalanced datasets (ILPD, QSAR, Blood, and Maternal Health Risk) using five metrics: Cohen's kappa, Matthews correlation coefficient (MCC), F1-score, precision, and recall. Results demonstrate that CRN-SMOTE consistently outperformed the state-of-the-art Reduced Noise SMOTE (RN-SMOTE), SMOTE-Tomek Link, and SMOTE-ENN methods across all datasets, with particularly notable improvements observed in the QSAR and Maternal Health Risk datasets, indicating its effectiveness in enhancing imbalanced classification performance. Overall, the experimental findings indicate that CRN-SMOTE outperformed RN-SMOTE in 100% of the cases, achieving average improvements of 6.6% in kappa, 4.01% in MCC, 1.87% in F1-score, 1.7% in precision, and 2.05% in recall, with the number of SMOTE neighbors set to 5.
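The oversampling step that CRN-SMOTE builds on can be illustrated with plain SMOTE: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors. The sketch below covers only that basic interpolation and does not implement the cluster-based noise reduction, which the abstract does not describe in enough detail to reproduce. The function name, the seed, and the toy points are assumptions; k = 5 mirrors the neighbor setting quoted above.

```python
import random


def smote_samples(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical minority class: the four corners of the unit square.
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_samples(corners, n_new=10, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, noisy or mislabeled minority points get propagated too, which is the motivation for pairing SMOTE with a noise-reduction stage.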
Affiliation(s)
| | - Rassoul Hajizadeh
- Machine Learning and Deep Learning Laboratory, Faculty of Engineering Modern Technologies, Amol University of Special Modern Technologies, Amol, Iran
| | - Fakhroddin Nazari
- Faculty of Engineering Modern Technologies, Amol University of Special Modern Technologies, Amol, Iran
| |
|
39
|
Wu YF, Shu X, Wang S, Xu X, Sun PL. Accurate identification of oxygen desaturation status in COPD by using classifier ensemble. PLoS One 2025; 20:e0318837. [PMID: 39908245 DOI: 10.1371/journal.pone.0318837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Accepted: 01/23/2025] [Indexed: 02/07/2025] Open
Abstract
The accurate identification of oxygen desaturation (OD) status plays a critical role in the clinical diagnosis of chronic obstructive pulmonary disease (COPD), a common disease of the lungs and respiratory tract. This paper focuses on a specific type of OD status, exercise-induced oxygen desaturation (EIOD) in COPD, and aims to further improve the performance of EIOD status identification. We propose a new and effective EIOD status identification method based on a classifier ensemble strategy. In the proposed method, five different features are extracted from each data point of the SpO2 and pulse time series and combined to form the discriminative feature of that data point; multiple base classifiers are then trained on different balanced training subsets and integrated using the AdaBoost algorithm. Comparative computational results on the 6-min walk test (6MWT) data of the recruited participants show that the proposed method achieved the best global performance, with an area under the curve (AUC) of 0.8532, indicating that it can be used effectively to identify EIOD and could assist the clinical diagnosis of COPD.
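One common way to obtain the "different balanced training subsets" mentioned above is to keep every minority-class sample and pair it with a fresh random draw of equally many majority-class samples for each base classifier. The abstract does not say how the subsets were built, so the undersampling scheme, the function name, and the toy labels below are assumptions for illustration, and the AdaBoost integration step is not shown.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """Return index lists, each holding all minority samples plus an
    equal-sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    if not minority:
        raise ValueError("both classes must be present")
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + drawn))
    return subsets


# Hypothetical labels: 3 desaturation points among 12 samples.
toy_labels = [1] * 3 + [0] * 9
toy_subsets = balanced_subsets(toy_labels, n_subsets=4)
```

Each subset can then be used to train one base classifier, so that every learner sees all minority examples while the ensemble as a whole covers more of the majority class.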
Affiliation(s)
- Yue-Fang Wu
- Department of Internal Medicine, Nanjing University of Science and Technology Hospital, Nanjing, Jiangsu, China
| | - Xin Shu
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
| | - Shiqi Wang
- Department of Respiratory Medicine, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu, China
| | - Xiaojun Xu
- Department of Respiratory Medicine, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu, China
| | - Pei-Li Sun
- Department of Respiratory Medicine, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu, China
| |
|
40
|
Ghavidel A, Pazos P. Machine learning (ML) techniques to predict breast cancer in imbalanced datasets: a systematic review. J Cancer Surviv 2025; 19:270-294. [PMID: 37749361 DOI: 10.1007/s11764-023-01465-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 09/09/2023] [Indexed: 09/27/2023]
Abstract
Knowledge discovery in databases (KDD) is crucial in analyzing data to extract valuable insights. In medical outcome prediction, KDD is increasingly applied, particularly in diseases with high incidence, mortality, and costs, like cancer. ML techniques can develop more accurate predictive models for cancer patients' clinical outcomes, aiding informed healthcare decision-making. However, cancer prediction modeling faces challenges because of the unbalanced nature of the datasets, where there is a small minority category of patients with a cancer diagnosis compared to a majority category of cancer-free patients. Imbalanced datasets pose statistical hurdles like bias and overfitting when developing accurate prediction models. This systematic review focuses on breast cancer prediction articles published from 2008 to 2023. The objective is to examine the ML methods that address the imbalanced data problem in breast cancer prediction across three critical steps of KDD: preprocessing, data mining, and interpretation. This work synthesizes prior research in ML methods for breast cancer prediction. The findings help identify effective preprocessing strategies, including balancing and feature selection methods, robust predictive models, and evaluation metrics of those models. The study aims to inform healthcare providers and researchers about effective techniques for accurate breast cancer prediction.
Affiliation(s)
- Arman Ghavidel
- Engineering Management and Systems Engineering, Old Dominion University, Norfolk, VA, USA
| | - Pilar Pazos
- Engineering Management and Systems Engineering, Old Dominion University, Norfolk, VA, USA.
| |
|
41
|
Powla PP, Fakhri F, Jankowski S, Mansour A, Polley EC. Clinical Prediction Models in Neurocritical Care: An Overview of the Literature and Example Application to Prediction of Hospital Mortality in Traumatic Brain Injury. Neurocrit Care 2025; 42:32-38. [PMID: 39107660 DOI: 10.1007/s12028-024-02083-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 07/26/2024] [Indexed: 02/12/2025]
Abstract
Clinical prediction models serve as valuable instruments for assessing the risk of crucial outcomes and facilitating decision-making in clinical settings. Constructing these models requires nuanced analytical decisions and expertise informed by the current statistical literature. Access to and thorough understanding of such literature may be limited for neurocritical care physicians, which may hinder the interpretation of existing predictive models. The present emphasis is on narrowing this knowledge gap by providing neurocritical care specialists with methodological guidance for interpreting predictive models in neurocritical care. Presented are the statistical learning principles integral to constructing a model predicting hospital mortality (nonsurvival during hospitalization) in patients with moderate and severe blunt traumatic brain injury using components of the IMPACT-Core model. Discussion encompasses critical elements such as model flexibility, hyperparameter selection, data imbalance, cross-validation, model assessment (discrimination and calibration), prediction instability, and probability thresholds. The intricate interplay among these components, the data set, and the clinical context of neurocritical care is elaborated. Leveraging this comprehensive exploration of statistical learning can enhance comprehension of articles encompassing model generation, tailored clinical care, and, ultimately, better interpretation and clinical applicability of predictive models.
Affiliation(s)
- Plamena P Powla
- Division of Neurocritical Care, Department of Neurology, University of Chicago Medical Center, 5841 S. Maryland Ave., MC 2030, Chicago, IL, 60637-1470, USA.
| | - Farima Fakhri
- Division of Neurocritical Care, Department of Neurology, University of Chicago Medical Center, 5841 S. Maryland Ave., MC 2030, Chicago, IL, 60637-1470, USA
| | - Samantha Jankowski
- Division of Neurocritical Care, Department of Neurology, University of Chicago Medical Center, 5841 S. Maryland Ave., MC 2030, Chicago, IL, 60637-1470, USA
| | - Ali Mansour
- Division of Neurocritical Care, Department of Neurology, University of Chicago Medical Center, 5841 S. Maryland Ave., MC 2030, Chicago, IL, 60637-1470, USA
| | - Eric C Polley
- Department of Public Health Sciences, University of Chicago Medical Center, Chicago, IL, USA
| |
|
42
|
Scroggins JK, Hulchafo II, Harkins S, Scharp D, Moen H, Davoudi A, Cato K, Tadiello M, Topaz M, Barcelona V. Identifying stigmatizing and positive/preferred language in obstetric clinical notes using natural language processing. J Am Med Inform Assoc 2025; 32:308-317. [PMID: 39569431 PMCID: PMC11756426 DOI: 10.1093/jamia/ocae290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 10/23/2024] [Accepted: 11/06/2024] [Indexed: 11/22/2024] Open
Abstract
OBJECTIVE To identify stigmatizing language in obstetric clinical notes using natural language processing (NLP). MATERIALS AND METHODS We analyzed electronic health records from birth admissions in the Northeast United States in 2017. We annotated 1771 clinical notes to generate the initial gold standard dataset. Annotators labeled exemplars of 5 stigmatizing and 1 positive/preferred language categories. We used a semantic similarity-based search approach to expand the initial dataset with additional exemplars, composing an enhanced dataset. We employed traditional classifiers (Support Vector Machine, Decision Trees, and Random Forest) and transformer-based models, ClinicalBERT (Bidirectional Encoder Representations from Transformers) and BERT base. Models were trained and validated on the initial and enhanced datasets and were tested on the enhanced testing dataset. RESULTS In the initial dataset, we annotated 963 exemplars as stigmatizing or positive/preferred. The most frequently identified category was marginalized language/identities (n = 397, 41%), and the least frequent was questioning patient credibility (n = 51, 5%). After employing a semantic similarity-based search approach, 502 additional exemplars were added, increasing the number of exemplars in low-frequency categories. All NLP models also showed improved performance, with Decision Trees demonstrating the greatest improvement (21%). ClinicalBERT outperformed other models, with the highest average F1-score of 0.78. DISCUSSION ClinicalBERT seems to most effectively capture the nuanced and context-dependent stigmatizing language found in obstetric clinical notes, demonstrating its potential clinical applications for real-time monitoring and alerts to prevent the use of stigmatizing language and reduce healthcare bias. Future research should explore stigmatizing language in diverse geographic locations and clinical settings to further contribute to high-quality and equitable perinatal care. CONCLUSION ClinicalBERT effectively captures the nuanced stigmatizing language in obstetric clinical notes. Our semantic similarity-based search approach to rapidly extract additional exemplars enhanced model performance while reducing the need for labor-intensive annotation.
Affiliation(s)
| | - Ismael I Hulchafo
- School of Nursing, Columbia University, New York, NY 10032, United States
| | - Sarah Harkins
- School of Nursing, Columbia University, New York, NY 10032, United States
| | - Danielle Scharp
- Icahn School of Medicine, Mount Sinai, NY 10029, United States
| | - Hans Moen
- Department of Computer Science, Aalto University, Espoo 02150, Finland
| | | | - Kenrick Cato
- School of Nursing, University of Pennsylvania, Philadelphia, PA 19104, United States
| | - Michele Tadiello
- Center for Community-Engaged Health Informatics and Data Science, Columbia University Irving Medical Center, New York, NY 10032, United States
| | - Maxim Topaz
- School of Nursing, Columbia University, New York, NY 10032, United States
| | - Veronica Barcelona
- School of Nursing, Columbia University, New York, NY 10032, United States
| |
|
43
|
Imre A, Balogh B, Mándity I. GraphCPP: The new state-of-the-art method for cell-penetrating peptide prediction via graph neural networks. Br J Pharmacol 2025; 182:495-509. [PMID: 39568115 DOI: 10.1111/bph.17388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 08/07/2024] [Accepted: 10/07/2024] [Indexed: 11/22/2024] Open
Abstract
BACKGROUND AND PURPOSE Cell-penetrating peptides (CPPs) are short amino acid sequences that can penetrate cell membranes and deliver molecules into cells. Several models have been developed for their discovery, yet these models often face challenges in accurately predicting membrane penetration due to the complex nature of peptide-cell interactions. Hence, there is a need for innovative approaches that can enhance predictive performance. EXPERIMENTAL APPROACH In this study, we present GraphCPP, a novel graph neural network (GNN) application for predicting the membrane penetration capability of peptides. KEY RESULTS A new comprehensive dataset, dubbed CPP1708, was constructed, resulting in the largest reliable database of CPPs to date. Comparative analyses with previous methods, such as MLCPP2, C2Pred, CellPPD and CellPPD-Mod, demonstrated the superior predictive performance of our model. Upon testing against other published methods, GraphCPP performs exceptionally, achieving 0.5787 Matthews correlation coefficient and 0.8459 area under the curve (AUC) values on one dataset. This means a 92.8% and 23.3% improvement in Matthews correlation coefficient and AUC measures, respectively, compared with the next best model. The capability of the model to effectively learn peptide representations was demonstrated through t-distributed stochastic neighbour embedding plots. Additionally, the uncertainty analysis revealed that GraphCPP maintains high confidence in predictions for peptides shorter than 40 amino acids. The source code is available at https://github.com/attilaimre99/GraphCPP. CONCLUSION AND IMPLICATIONS These findings indicate the potential of GNN-based models to improve CPP penetration prediction, and they may contribute towards the development of more efficient drug delivery systems.
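For readers unfamiliar with the Matthews correlation coefficient (MCC) quoted in the abstract, the following is a minimal sketch of its standard definition from a binary confusion matrix, together with the relative-improvement arithmetic. It assumes the reported 92.8% and 23.3% are relative improvements over the next best model; the back-calculated baseline values are derived here for illustration and are not stated in the abstract. The function names are hypothetical.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_improvement(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100.0


# Back-calculated baselines, ASSUMING the reported percentages are relative
# improvements over the next best model (not stated explicitly in the abstract).
implied_baseline_mcc = 0.5787 / (1 + 0.928)
implied_baseline_auc = 0.8459 / (1 + 0.233)
```

Under that assumption, the next best model would sit at an MCC of roughly 0.30 and an AUC of roughly 0.69.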
Affiliation(s)
- Attila Imre
- Department of Organic Chemistry, Faculty of Pharmacy, Semmelweis University, Budapest, Hungary
- Center for Health Technology Assessment, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
| | - Balázs Balogh
- Department of Organic Chemistry, Faculty of Pharmacy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
| | - István Mándity
- Department of Organic Chemistry, Faculty of Pharmacy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- Artificial Transporters Research Group, Research Centre for Natural Sciences, Budapest, Hungary
| |
|
44
|
Kumar P, Moomtaheen F, Malec SA, Yang JJ, Bologa CG, Schneider KA, Zhu Y, Tohen M, Villarreal G, Perkins DJ, Fielstein EM, Davis SE, Matheny ME, Lambert CG. Detecting Opioid Use Disorder in Health Claims Data With Positive Unlabeled Learning. IEEE J Biomed Health Inform 2025; 29:750-757. [PMID: 40030473 PMCID: PMC11971012 DOI: 10.1109/jbhi.2024.3515805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Accurate detection and prevalence estimation of behavioral health conditions, such as opioid use disorder (OUD), are crucial for identifying at-risk individuals, determining treatment needs, monitoring prevention and intervention efforts, and recruiting treatment-naive participants for clinical trials. The availability of extensive health data, combined with advancements in machine learning (ML) frameworks, has enabled researchers to employ various ML techniques to predict or identify OUD within patient health data. Ideally, we could directly estimate the prevalence, or the proportion of a population with a condition over time. However, underdiagnosis and undercoding of conditions in patient health records make it challenging to determine the true prevalence of these conditions and to identify at-risk patients with less severe conditions who are more likely to be missed. Consequently, patients without diagnoses may comprise both positive and negative examples for a given condition. Treating all undiagnosed (uncoded) patients as negative when applying ML methods can introduce bias into models, affecting their predictive power. To address this issue, we employed Positive Unlabeled Learning Selected Not At Random (PULSNAR), a Positive and Unlabeled (PU) learning technique, to estimate the probability of a given patient having OUD during a time window and the overall population prevalence of OUD. In a sample of 3,342,044 commercially insured US patients with at least one opioid prescription filled, PULSNAR estimated a cumulative OUD prevalence of 5.08% over a 2-5 year observation period, compared with the 1.35% of patients with a recorded OUD diagnosis, with 73.5% of cases not diagnosed/coded. The prevalence estimates provided by PULSNAR are consistent with those reported in other studies.
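The relationship between the three percentages in the abstract can be checked with simple arithmetic. The sketch below uses only the rounded figures quoted above; the variable names are illustrative, and the small gap between the computed share and the reported 73.5% is presumably due to rounding of the published inputs.

```python
# Figures quoted in the abstract (rounded as published).
n_patients = 3_342_044
estimated_prevalence_pct = 5.08   # PULSNAR estimate
recorded_prevalence_pct = 1.35    # patients with a recorded OUD diagnosis

# Share of estimated cases that have no recorded diagnosis.
undiagnosed_share_pct = (
    (estimated_prevalence_pct - recorded_prevalence_pct)
    / estimated_prevalence_pct
    * 100.0
)

# Approximate patient counts implied by the two prevalence figures.
estimated_cases = round(n_patients * estimated_prevalence_pct / 100.0)
recorded_cases = round(n_patients * recorded_prevalence_pct / 100.0)

print(f"undiagnosed share: {undiagnosed_share_pct:.1f}%")
print(f"estimated cases: {estimated_cases}, recorded cases: {recorded_cases}")
```

With the rounded inputs, the undiagnosed share comes out slightly below the 73.5% reported in the abstract.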
|
45
|
Hu Y, Zhong L, Liu H, Ding W, Wang L, Xing Z, Wan L. Lung CT-based multi-lesion radiomic model to differentiate between nontuberculous mycobacteria and Mycobacterium tuberculosis. Med Phys 2025; 52:1086-1095. [PMID: 39607908 DOI: 10.1002/mp.17537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 10/28/2024] [Accepted: 11/10/2024] [Indexed: 11/30/2024] Open
Abstract
BACKGROUND Nontuberculous mycobacterial lung disease (NTM-LD) and Mycobacterium tuberculosis lung disease (MTB-LD) are difficult to distinguish based on conventional imaging examinations. In recent years, radiomics has been used to discriminate them. However, existing radiomic methods mainly focus on specific lesion types, and have limitations in handling the presence of multiple lesion types that vary among different patients. PURPOSE We aimed to establish a radiomic model based on multiple lesion types in the patient's CT scans, and to analyze the importance of different lesion types in distinguishing the two diseases. METHODS 120 NTM-LD and 120 MTB-LD patients were retrospectively enrolled in this study and randomly split into the training (168) and testing (72) sets. A total of 1037 radiomic features were extracted separately for each lesion type. Univariate analysis and the least absolute shrinkage and selection operator (LASSO) were used to select the significant radiomic features. The radiomic signature score (Radscore) from each lesion type was estimated and aggregated to construct the multi-lesion feature vector for each patient. A multi-lesion radiomic (MLR) model was then established using the random forest classifier, which can estimate importance coefficients for different lesion types. The performances of the MLR model and single radiomic models were investigated using the receiver operating characteristic (ROC) curve. The impact of the predicted lesion importance was also evaluated in subjective imaging diagnosis. RESULTS The MLR model achieved an area under the curve (AUC) of 90.2% (95% CI: 86.2%-94.1%) in differentiating NTM-LD and MTB-LD, outperforming the models using specific lesion types following existing radiomic models by 1% to 13%. Among different lesion types, tree-in-bud pattern demonstrated the highest distinguishing value, followed by consolidation, nodules, and lymph node enlargement. Given the estimated lesion importance, two senior radiologists exhibited improved accuracy in diagnosis, with an increased accuracy of 8.33% and 8.34%, respectively. CONCLUSIONS This is the first radiomic study to use multiple lesion types to distinguish NTM-LD and MTB-LD. The developed MLR model performed well in differentiating the two diseases, and the lesion types with high importance exhibited the potential to assist experienced radiologists in clinical decision-making.
Affiliation(s)
- Yanlin Hu
- Academy of medical engineering and translational medicine, Tianjin University, Tianjin, People's Republic of China
| | - Lingshan Zhong
- Department of radiology, Tianjin Haihe Hospital, TCM Key Research Laboratory for Infectious Disease Prevention for State Administration of Traditional Chinese Medicine, Tianjin Institute of Respiratory Diseases, Haihe Hospital, Tianjin University, Tianjin, People's Republic of China
| | - Hongying Liu
- Academy of medical engineering and translational medicine, Tianjin University, Tianjin, People's Republic of China
| | - Wenlong Ding
- Department of radiology, Tianjin Haihe Hospital, TCM Key Research Laboratory for Infectious Disease Prevention for State Administration of Traditional Chinese Medicine, Tianjin Institute of Respiratory Diseases, Haihe Hospital, Tianjin University, Tianjin, People's Republic of China
| | - Li Wang
- Department of radiology, Tianjin Haihe Hospital, TCM Key Research Laboratory for Infectious Disease Prevention for State Administration of Traditional Chinese Medicine, Tianjin Institute of Respiratory Diseases, Haihe Hospital, Tianjin University, Tianjin, People's Republic of China
| | - Zhiheng Xing
- Department of radiology, Tianjin Haihe Hospital, TCM Key Research Laboratory for Infectious Disease Prevention for State Administration of Traditional Chinese Medicine, Tianjin Institute of Respiratory Diseases, Haihe Hospital, Tianjin University, Tianjin, People's Republic of China
| | - Liang Wan
- Academy of medical engineering and translational medicine, Tianjin University, Tianjin, People's Republic of China
| |
|
46
|
Teimouri H, Ghoreyshi ZS, Kolomeisky AB, George JT. Feature selection enhances peptide binding predictions for TCR-specific interactions. Front Immunol 2025; 15:1510435. [PMID: 39916960 PMCID: PMC11799297 DOI: 10.3389/fimmu.2024.1510435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2024] [Accepted: 12/24/2024] [Indexed: 02/09/2025] Open
Abstract
Introduction T-cell receptors (TCRs) play a critical role in the immune response by recognizing specific ligand peptides presented by major histocompatibility complex (MHC) molecules. Accurate prediction of peptide binding to TCRs is essential for advancing immunotherapy, vaccine design, and understanding mechanisms of autoimmune disorders. Methods This study presents a theoretical approach that explores the impact of feature selection techniques on enhancing the predictive accuracy of peptide binding models tailored for specific TCRs. To evaluate our approach across different TCR systems, we utilized a dataset that includes peptide libraries tested against three distinct murine TCRs. A broad range of physicochemical properties, including amino acid composition, dipeptide composition, and tripeptide features, were integrated into the machine learning-based feature selection framework to identify key properties contributing to binding affinity. Results Our analysis reveals that leveraging optimized feature subsets not only simplifies the model complexity but also enhances predictive performance, enabling more precise identification of TCR peptide interactions. The results of our feature selection method are consistent with findings from hybrid approaches that utilize both sequence and structural data as input as well as experimental data. Discussion Our theoretical approach highlights the role of feature selection in peptide-TCR interactions, providing a quantitative tool for uncovering the molecular mechanisms of the T-cell response and assisting in the design of more advanced targeted therapeutics.
Affiliation(s)
- Hamid Teimouri
- Department of Chemistry, Rice University, Houston, TX, United States
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States
| | - Zahra S. Ghoreyshi
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States
- Department of Biomedical Engineering, Texas A&M University, College Station, TX, United States
| | - Anatoly B. Kolomeisky
- Department of Chemistry, Rice University, Houston, TX, United States
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, TX, United States
| | - Jason T. George
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States
- Department of Biomedical Engineering, Texas A&M University, College Station, TX, United States
- Department of Hematopoietic Biology and Malignancy, MD Anderson Cancer Center, Houston, TX, United States
- Department of Translational Medical Sciences, Texas A&M Health Science Center, Houston, TX, United States
| |
|
47
|
Alshamrani K, Alshamrani HA. An Efficient Dual-Sampling Approach for Chest CT Diagnosis. J Multidiscip Healthc 2025; 18:239-253. [PMID: 39839996 PMCID: PMC11748922 DOI: 10.2147/jmdh.s472170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 08/12/2024] [Indexed: 01/23/2025] Open
Abstract
Background This paper aimed to enhance the diagnostic process of lung abnormalities in computed tomography (CT) images, particularly in distinguishing cancer cells from normal chest tissue. The rapid and uneven growth of cancer cells, presenting with variable symptoms, necessitates an advanced approach for accurate identification. Objective To develop a dual-sampling network targeting lung infection regions to address the diagnostic challenge. The network was designed to adapt to the uneven distribution of infection areas, which could be predominantly minor or major in different regions. Methods A total of 150 CT images were analyzed using the dual-sampling network. Two sampling approaches were compared: the proposed dual-sampling technique and a uniform sampling method. Results The dual-sampling network demonstrated superior performance in detecting lung abnormalities compared to uniform sampling. With the uniform sampling method, the network achieved an F1-score of 94.2%, accuracy of 94.5%, sensitivity of 93.5%, specificity of 95.4%, and an area under the curve (AUC) of 98.4%. With the proposed dual-sampling method, the network reached an F1-score of 94.9%, accuracy of 95.2%, specificity of 96.1%, sensitivity of 94.2%, and an AUC of 95.5%. Conclusion This study suggests that the proposed dual-sampling network significantly improves the precision of lung abnormality diagnosis in CT images. This advancement has the potential to aid radiologists in making more accurate diagnoses, ultimately benefiting patient treatment and contributing to better overall population health. The efficiency and effectiveness of the dual-sampling approach in managing the uneven distribution of lung infection areas are key to its success.
Affiliation(s)
- Khalaf Alshamrani
- Radiology Sciences Department, College of Medical Sciences, Najran University, Najran, Saudi Arabia
- School of Medicine and Population Health, University of Sheffield, Sheffield, UK
| | - Hassan A Alshamrani
- Radiology Sciences Department, College of Medical Sciences, Najran University, Najran, Saudi Arabia
| |
|
48
|
Jia C, Li X, Hu S, Liu G, Fang J, Zhou X, Yan X, Yan B. Advanced Mass-Spectra-Based Machine Learning for Predicting the Toxicity of Traditional Chinese Medicines. Anal Chem 2025; 97:783-792. [PMID: 39704481 DOI: 10.1021/acs.analchem.4c05311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
Traditional Chinese medicine (TCM) has been a cornerstone of health care for centuries, valued for its preventive and therapeutic properties. However, recent decades have revealed significant toxicological concerns associated with TCMs due to their complex chemical compositions. Traditional QSAR (quantitative structure-activity relationships) models, which predict toxicity based on chemical structures, face challenges with the intricate nature of TCM compounds. In this study, we effectively resolved this issue by correlating the toxicity of TCMs with advanced analytical descriptors from electron ionization mass spectra (EI-MS) data. The optimal classification model achieved a balanced accuracy of over 0.74. Through interpretable machine learning models, we identified specific toxic components, such as 13-hexyloxacyclotridec-10-en-2-one and loliolide. We applied molecular dynamics (MD) simulations to explore the interactions of identified toxic components with crucial protein targets, using hepatic cytochrome P450 3A4 as an example. This novel approach not only enhances our understanding of the toxicological profiles of TCMs but also maximizes their therapeutic benefits while minimizing adverse effects. More importantly, our findings support the application of analytical descriptor-based machine learning in predicting the toxicity of unknown mixtures in the real environment.
Affiliation(s)
- Chen Jia
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| | - Xiaofang Li
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| | - Song Hu
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| | - Guohong Liu
- School of Health, Guangzhou Vocational and Technical University of Science and Technology, Guangzhou 510555, China
| | - Jiansong Fang
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, 12 Jichang Road, Guangzhou 510405, China
| | - Xiaoxia Zhou
- National-Regional Joint Engineering Research Center for Soil Pollution Control and Remediation in South China, Guangdong Key Laboratory of Integrated Agro-Environmental Pollution Control and Management, Institute of Eco-Environmental and Soil Sciences, Guangdong Academy of Sciences, Guangzhou 510650, China
| | - Xiliang Yan
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Bing Yan
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| |
|
49
|
Wang H, Liu W, Chen J, Ji S. Transfer Learning with a Graph Attention Network and Weighted Loss Function for Screening of Persistent, Bioaccumulative, Mobile, and Toxic Chemicals. Environ Sci Technol 2025; 59:578-590. [PMID: 39680085 DOI: 10.1021/acs.est.4c11085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
In silico methods for screening hazardous chemicals are necessary for sound management. Persistent, bioaccumulative, mobile, and toxic (PBMT) chemicals persist in the environment and have high mobility in aquatic environments, posing risks to human and ecological health. However, lack of experimental data for the vast number of chemicals hinders identification of PBMT chemicals. Through an extensive search of measured chemical mobility data, as well as persistent, bioaccumulative, and toxic (PBT) chemical inventories, this study constructed comprehensive data sets on PBMT chemicals. To address the limited volume of the PBMT chemical data set, a transfer learning (TL) framework based on graph attention network (GAT) architecture was developed to construct models for screening PBMT chemicals, designating the PBT chemical inventories as source domains and the PBMT chemical data set as target domains. A weighted loss (LW) function was proposed and proved to mitigate the negative impact of imbalanced data on the model performance. Results indicate the TL-GAT models outperformed GAT models, along with large coverage of applicability domains and interpretability. The constructed models were employed to identify PBMT chemicals from inventories consisting of about 1 × 10^6 chemicals. The developed TL-GAT framework with the LW function holds broad applicability across diverse tasks, especially those involving small and imbalanced data sets.
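The abstract does not give the form of the weighted loss (LW) function. As a generic illustration of how a class-weighted loss can offset imbalance, the sketch below implements a weighted binary cross-entropy with inverse-class-frequency weights. This weighting scheme is an assumption chosen for illustration and is not necessarily the paper's LW; the function names and toy data are hypothetical.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights (assumed scheme): n / (2 * class count)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return n / (2 * n_neg), n / (2 * n_pos)  # (w_neg, w_pos)


def weighted_bce(labels, probs, w_neg, w_pos, eps=1e-12):
    """Mean binary cross-entropy with separate weights per class."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy imbalanced set: 1 positive (e.g. a PBMT chemical) among 9 negatives.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
w_neg, w_pos = class_weights(labels)

# Same confidence error on one sample of each class.
miss_positive = weighted_bce(labels, [0.1] + [0.1] * 9, w_neg, w_pos)
miss_negative = weighted_bce(labels, [0.9] + [0.9] + [0.1] * 8, w_neg, w_pos)
```

With these weights, the single positive contributes as much to the loss as all nine negatives combined, so misclassifying the rare class is penalized more heavily than misclassifying one majority-class sample.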
Affiliation(s)
- Haobo Wang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Wenjia Liu
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Shengshe Ji
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
|
50
|
Orouji S, Taschereau-Dumouchel V, Cortese A, Odegaard B, Cushing C, Cherkaoui M, Kawato M, Lau H, Peters MAK. Task relevant autoencoding enhances machine learning for human neuroscience. Sci Rep 2025; 15:1365. [PMID: 39779744 PMCID: PMC11711280 DOI: 10.1038/s41598-024-83867-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025] Open
Abstract
In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, and so are prone to overfitting on human neuroimaging data that often possess few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience are precisely those relevant to subjects' behavior rather than noise or other irrelevant factors. We thus developed a Task-Relevant Autoencoder via Classifier Enhancement (TRACE) designed to identify behaviorally-relevant target neural patterns. We benchmarked TRACE against a standard autoencoder and other models for two severely truncated machine learning datasets (to match the data typically available from functional magnetic resonance imaging [fMRI] for an individual subject), then evaluated all models on fMRI data from 59 subjects who observed animals and objects. TRACE outperformed alternative models almost uniformly, showing up to 12% increased classification accuracy and up to 56% improvement in discovering "cleaner", task-relevant representations. These results showcase TRACE's potential for a wide variety of data related to human behavior.
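The abstract names the idea (an autoencoder enhanced by a classifier) but not the objective, so the sketch below only illustrates the generic pattern such a model might use: a reconstruction term plus a weighted classification term computed from the same bottleneck. The function names, the mean-squared-error and softmax cross-entropy choices, and the `task_weight` parameter are all assumptions for illustration, not the published TRACE formulation.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def softmax_cross_entropy(logits, label):
    """Cross-entropy of the true label under softmax(logits), computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def joint_loss(x, x_hat, logits, label, task_weight=1.0):
    """Assumed joint objective: reconstruction loss + task_weight * classification loss."""
    return mse(x, x_hat) + task_weight * softmax_cross_entropy(logits, label)

# Hypothetical toy values: perfect reconstruction, uninformative classifier.
example = joint_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], label=0)
```

Setting `task_weight` to zero recovers a plain autoencoder objective; larger values push the bottleneck toward features that separate the behavioral classes.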
Affiliation(s)
- Seyedmehdi Orouji
  - Department of Cognitive Sciences, University of California, 2201 Social & Behavioral Sciences Gateway, Irvine, CA, 92697, USA
- Vincent Taschereau-Dumouchel
  - Department of Psychiatry and Addictology, Université de Montréal, Montreal, H3C 3J7, Canada
  - Centre de Recherche de L'institut Universitaire en Santé Mentale de Montréal, Montréal, Canada
- Aurelio Cortese
  - ATR Computational Neuroscience Laboratories, Kyoto, 619-0288, Japan
- Brian Odegaard
  - Department of Psychology, University of Florida, Gainesville, FL, 32603, USA
- Cody Cushing
  - Department of Psychology, University of California Los Angeles, Los Angeles, 90095, USA
- Mouslim Cherkaoui
  - Department of Psychology, University of California Los Angeles, Los Angeles, 90095, USA
- Mitsuo Kawato
  - ATR Computational Neuroscience Laboratories, Kyoto, 619-0288, Japan
- Hakwan Lau
  - RIKEN Center for Brain Science, Tokyo, Japan
- Megan A K Peters
  - Department of Cognitive Sciences, University of California, 2201 Social & Behavioral Sciences Gateway, Irvine, CA, 92697, USA
  - Center for the Neurobiology of Learning and Memory, University of California, Irvine, Irvine, CA, 92697, USA
|