1
Ma XH, Chen ZG, Liu S, Liu JM, Tian XS. Wavelength selection method for near-infrared spectroscopy based on the combination of mutual information and genetic algorithm. Talanta 2025; 286:127573. [PMID: 39809072 DOI: 10.1016/j.talanta.2025.127573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 01/04/2025] [Accepted: 01/10/2025] [Indexed: 01/16/2025]
Abstract
Near-infrared (NIR) spectroscopy has become a widely used analytical tool in various fields because of its convenience and efficiency. However, as instrument precision has improved, spectra can now span hundreds of dimensions. This expansion makes modeling time-consuming and can degrade model performance. Hence, it is crucial to choose representative features carefully before constructing models. This paper addresses a limitation of filter algorithms, which can only rank features and cannot directly determine the best feature subset. A hybrid method combining the Max-Relevance Min-Redundancy (mRMR) algorithm with the Genetic Algorithm (GA), that is, a filter method with a wrapper method, is proposed to select appropriate features automatically. During each iteration of the GA, the hybrid algorithm retains the features in each individual that the mRMR algorithm considers to have strong correlation and low redundancy, and deletes the features regarded as having little correlation or high redundancy. Through iteration, the feature subset is continuously optimized. We apply the proposed hybrid method to select features on two datasets and build various models to verify it. The experimental results indicate that the feature selection approach combining mRMR with the GA retains the advantages of both feature selection methods and selects features that show good predictive performance. Compared with other common feature selection methods, such as Uninformative Variable Elimination (UVE), Competitive Adaptive Reweighted Sampling (CARS), the Successive Projections Algorithm (SPA), Iteratively Retains Informative Variables (IRIV), and GA, the hybrid algorithm selects a larger number of feature variables that are both representative and informative. Additionally, it significantly enhances the predictive performance of the model.
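The mRMR criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes discrete (already binned) features and uses the common "relevance minus mean redundancy" score; the feature names and toy data are hypothetical, and the GA wrapper stage of the paper's hybrid method is not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, target, k):
    """Greedy mRMR: repeatedly add the feature with the highest
    relevance to the target minus mean redundancy with chosen features."""
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    while len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Hypothetical toy data: "b" duplicates "a"; "c" adds independent information.
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [2 * a + c for a, c in zip(features["a"], features["c"])]

print(mrmr_rank(features, target, 2))
```

In this toy case the duplicate feature "b" is as relevant as "a" but fully redundant with it, so the second pick goes to "c". A filter of this kind only ranks features, which is the limitation the abstract says the GA stage is meant to address.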
Affiliation(s)
- Xiao-Hui Ma
- College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing, 163319, China
- Zheng-Guang Chen
- College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing, 163319, China.
- Shuo Liu
- College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing, 163319, China
- Jin-Ming Liu
- College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing, 163319, China
- Xue-Song Tian
- Daqing Oilfield Shale Oil Exploration and Development Headquarters, Daqing, 163455, China
2
Han J, Guzman JA, Chu ML. Prediction of gully erosion susceptibility through the lens of the SHapley Additive exPlanations (SHAP) method using a stacking ensemble model. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2025; 383:125478. [PMID: 40286423 DOI: 10.1016/j.jenvman.2025.125478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 03/22/2025] [Accepted: 04/19/2025] [Indexed: 04/29/2025]
Abstract
This study develops a novel explainable stacking ensemble model that combines the stacked generalization ensemble method with SHapley Additive exPlanations (SHAP) to enhance the prediction and interpretation of gully erosion susceptibility. Applied to Jefferson County, Illinois, our approach leverages Random Forest (RF), Gradient Boosting Machine (GBM), Logistic Regression (LR), and Deep Neural Networks (DNN) as both base and meta-learners in various configurations, resulting in 44 distinct stacking models. The comparative analysis demonstrated the superior predictive performance of the stacked models when evaluated at 200 randomly selected gully site points based on LiDAR difference observations; all but three exceeded the highest area under the curve (AUC) value of 0.86 achieved by the best-performing base model (GBM). The LR stacking model, combining RF and GBM as base models with LR as the meta-learner, emerged as the most effective, achieving an AUC of 0.916. The gully erosion susceptibility map produced by the LR stacking model classified 33 % of the agricultural land (89,208 ha) as the "very high" class, compared to 27 %, 87 %, 27 %, and 55 % predicted by the individual RF, LR, GBM, and DNN models, respectively. Crucially, SHAP analysis elucidated how changes in feature values influence model behavior, considering feature interactions within both the base models and the meta-learner. SHAP identified the annual leaf area index (LAI) as the most influential feature in both the RF and GBM base models. Additionally, it highlighted the greater significance of the GBM base model relative to the RF base model in the final decision-making process of the stacking model. By offering a transparent mechanism to evaluate how different features and models contribute to final decisions, this approach can be extended to broader environmental management and policy-making contexts, facilitating more informed and responsible resource allocation.
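The Shapley attribution underlying SHAP can be sketched for a tiny case by brute force. The value function below is a hypothetical toy with one interaction term, not the study's model; "lai" echoes the feature named in the abstract, while "slope" and "rain" are invented for illustration. Practical SHAP implementations rely on approximations rather than full enumeration.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff: two additive effects plus one interaction."""
    v = 0.0
    if "lai" in coalition:
        v += 3.0
    if "slope" in coalition:
        v += 1.0
    if {"lai", "slope"} <= coalition:
        v += 2.0  # interaction is split evenly between the two features
    return v


phi = shapley_values(toy_value, ["lai", "slope", "rain"])
print(phi)
```

The attributions sum to the payoff of the full feature set, and the interaction term is shared equally between the two interacting features, which is how feature interactions end up reflected in per-feature contributions.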
Affiliation(s)
- Jeongho Han
- Department of Agricultural and Biological Engineering, The GRAINGER College of Engineering, College of Agricultural, Consumer & Environmental Sciences, ACES, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Agriculture and Life Sciences Research Institute, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Jorge A Guzman
- Department of Agricultural and Biological Engineering, The GRAINGER College of Engineering, College of Agricultural, Consumer & Environmental Sciences, ACES, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- Maria L Chu
- Department of Agricultural and Biological Engineering, The GRAINGER College of Engineering, College of Agricultural, Consumer & Environmental Sciences, ACES, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
3
Zeng QB, Peng EL, Zhou Y, Lin QW, Zhong LC, He LP, Zhang NQ, Song JC. Explainable machine learning model for predicting septic shock in critically sepsis patients based on coagulation indexes: A multicenter cohort study. Chin J Traumatol 2025:S1008-1275(25)00032-X. [PMID: 40246624 DOI: 10.1016/j.cjtee.2024.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 07/19/2024] [Accepted: 08/23/2024] [Indexed: 04/19/2025] Open
Abstract
PURPOSE: Septic shock is associated with high mortality and poor outcomes among sepsis patients with coagulopathy. Although traditional statistical methods and machine learning (ML) algorithms have been proposed to predict septic shock, these approaches have never been systematically compared. The present work aimed to develop and compare models to predict septic shock among patients with sepsis. METHODS: This is a retrospective cohort study of 484 patients with sepsis who were admitted to our intensive care units between May 2018 and November 2022. Patients from the 908th Hospital of Chinese PLA Logistic Support Force and Nanchang Hongdu Hospital of Traditional Chinese Medicine were allocated to the training (n=311) and validation (n=173) sets, respectively. All clinical and laboratory data of sepsis patients characterized by comprehensive coagulation indexes were collected. We developed 5 models based on ML algorithms and 1 model based on a traditional statistical method to predict septic shock in the training cohort. The performance of all models was assessed using the area under the receiver operating characteristic curve and calibration plots. Decision curve analysis was used to evaluate the net benefit of the models. The validation set was used to verify the predictive accuracy of the models. This study also used the SHapley Additive exPlanations method to assess variable importance and explain the predictions made by an ML algorithm. RESULTS: Among all patients, 37.2% experienced septic shock. The areas under the receiver operating characteristic curve of the 6 models ranged from 0.833 to 0.962 in the training set and from 0.630 to 0.744 in the validation set. The model with the best prediction performance was based on the support vector machine (SVM) algorithm and was constructed from age, tissue plasminogen activator-inhibitor complex, prothrombin time, international normalized ratio, white blood cells, and platelet counts. The SVM model showed good calibration and discrimination and a greater net benefit in decision curve analysis. CONCLUSION: The SVM algorithm may be superior to other ML and traditional statistical algorithms for predicting septic shock. Physicians can better understand the reliability of the predictive model through SHapley Additive exPlanations value analysis.
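For readers less familiar with the discrimination metric used here, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way; the labels and scores are made-up illustration values, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case is scored higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = septic shock) and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
print(roc_auc(labels, scores))
```

Because the measure depends only on the ranking of scores, a gap between training and validation AUC, as reported in the abstract, reflects how well that ranking transfers to a different cohort.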
Affiliation(s)
- Qing-Bo Zeng
- Intensive Care Unit, The 908th Hospital of Chinese PLA Logistic Support Force, Nanchang, 330002, China; Intensive Care Unit, Nanchang Hongdu Hospital of Traditional Chinese Medicine, Nanchang, 330038, China
- En-Lan Peng
- Department of Critical Care Medicine, Changcheng Hospital Affiliated to Nanchang University, Nanchang, 330002, China
- Ye Zhou
- Intensive Care Unit, The 908th Hospital of Chinese PLA Logistic Support Force, Nanchang, 330002, China; Department of Critical Care Medicine, Changcheng Hospital Affiliated to Nanchang University, Nanchang, 330002, China
- Qing-Wei Lin
- Intensive Care Unit, The 908th Hospital of Chinese PLA Logistic Support Force, Nanchang, 330002, China; Department of Critical Care Medicine, Changcheng Hospital Affiliated to Nanchang University, Nanchang, 330002, China
- Lin-Cui Zhong
- Intensive Care Unit, The 908th Hospital of Chinese PLA Logistic Support Force, Nanchang, 330002, China; Department of Critical Care Medicine, Changcheng Hospital Affiliated to Nanchang University, Nanchang, 330002, China
- Long-Ping He
- Intensive Care Unit, The 908th Hospital of Chinese PLA Logistic Support Force, Nanchang, 330002, China; Department of Critical Care Medicine, Changcheng Hospital Affiliated to Nanchang University, Nanchang, 330002, China
- Nian-Qing Zhang
- Intensive Care Unit, Nanchang Hongdu Hospital of Traditional Chinese Medicine, Nanchang, 330038, China
- Jing-Chun Song
- Intensive Care Unit, The 908th Hospital of Chinese PLA Logistic Support Force, Nanchang, 330002, China; Department of Critical Care Medicine, Changcheng Hospital Affiliated to Nanchang University, Nanchang, 330002, China.
4
Qirtas MM, Zafeiridi E, White EB, Pesch D. Unmasking Nuances Affecting Loneliness: Using Digital Behavioural Markers to Understand Social and Emotional Loneliness in College Students. SENSORS (BASEL, SWITZERLAND) 2025; 25:1903. [PMID: 40293076 PMCID: PMC11945615 DOI: 10.3390/s25061903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Revised: 02/23/2025] [Accepted: 03/13/2025] [Indexed: 04/30/2025]
Abstract
Loneliness is a global issue that is particularly prevalent among college students, where it poses risks to mental health and academic success. Chronic loneliness can manifest in two primary forms: social loneliness, which is defined by a lack of belonging or a social network, and emotional loneliness, which comes from the absence of deep, meaningful connections. Differentiating between these forms is crucial for designing personalized and targeted interventions. Passive sensing technology offers a promising, unobtrusive approach to detecting loneliness by using behavioural data collected from smartphones and wearables. This study investigates behavioural patterns associated with social and emotional loneliness using passively sensed data from a student population. Our objectives were to (1) identify behavioural patterns linked to social and emotional loneliness, (2) evaluate the predictive power of these patterns for classifying loneliness types, and (3) determine the most significant digital markers used by machine learning models in loneliness prediction. Using statistical analysis, machine learning, and SHAP-based feature importance methods, we identified significant differences in behaviours between socially and emotionally lonely students. Specifically, there were distinct differences in phone use and location-based features. Our machine learning analysis shows a strong ability to classify types of loneliness accurately. The XGBoost model achieved the highest accuracy (78.48%) in predicting loneliness. Feature importance analysis highlighted the critical role of phone usage and location-based features in distinguishing between social and emotional loneliness.
Affiliation(s)
- Malik Muhammad Qirtas
- School of Computer Science and Information Technology, University College Cork, T12 K8AF Cork, Ireland
- Evi Zafeiridi
- School of Computer Science and Information Technology, University College Cork, T12 K8AF Cork, Ireland
- Eleanor Bantry White
- School of Applied Social Studies, University College Cork, T12 K8AF Cork, Ireland
- Dirk Pesch
- School of Computer Science and Information Technology, University College Cork, T12 K8AF Cork, Ireland
5
Mohanasundaram V, Rangaswamy B. Elastic net with Bayesian Density Estimation model for feature selection for photovoltaic energy prediction. Sci Rep 2025; 15:8736. [PMID: 40082495 PMCID: PMC11906738 DOI: 10.1038/s41598-025-92633-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2024] [Accepted: 03/03/2025] [Indexed: 03/16/2025] Open
Abstract
Accurate forecasting of photovoltaic (PV) generated electricity is essential for efficiently managing and integrating Renewable Energy (RE) into electricity distribution systems. This study optimizes Feature Selection (FS) and prediction results for PV energy prediction by applying Bayesian Density Estimation (BDE) with Elastic Net (ELNET) regression analysis. Unsatisfactory outcomes are prevalent when conventional regression algorithms are applied to datasets with predictor multicollinearity. ELNET, which integrates the best features of Ridge and Lasso regression, makes improved FS and multicollinearity control feasible; it addresses these challenges by applying both L1 and L2 penalties. BDE, a non-parametric prediction method, provides comprehensive information about residual distributions and predictor impacts. By combining ELNET's regularisation and FS abilities with BDE's statistical prediction and adaptability, the proposed ELNET-BDE model aims to attain more accurate and reliable predictions. The technique was applied to large datasets from Visakhapatnam, India, comprising historical PV energy generation combined with specific Meteorological Factors (MF). With comprehensive data preprocessing, FS, and validation, ELNET-BDE outperforms existing methods. The experiments demonstrate that the ELNET-BDE model attains significantly lower Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) than competing Machine Learning (ML) algorithms such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Machines (GBM). Compared with other FS techniques, RMSE is reduced by up to 15% and MAE by up to 20%. The findings indicate a substantial improvement in prediction accuracy, showing how the model can be used to improve solar power grid integration and RE management.
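As background for the L1 and L2 penalties mentioned above, the elastic net estimate can be written in one common parameterization (the form used by several standard implementations). The symbols $\lambda$ (overall penalty strength), $\alpha$ (mixing weight), $X$, $y$, $\beta$, and $n$ are assumptions for illustration; the abstract does not state which formulation the authors used.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \left( \alpha \,\lVert \beta \rVert_1
  \;+\; \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right),
  \qquad \lambda \ge 0,\; 0 \le \alpha \le 1 .
```

Setting $\alpha = 1$ recovers the Lasso and $\alpha = 0$ recovers Ridge regression. The L1 term can drive coefficients exactly to zero, which is what performs feature selection, while the L2 term stabilizes the estimates when predictors are correlated.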
Affiliation(s)
- Venkatachalam Mohanasundaram
- Department of Electrical and Electronics Engineering, Kongu Engineering College, Tamil Nadu, Perundurai, 638060, India.
- Balamurugan Rangaswamy
- Department of Electrical and Electronics Engineering, K.S.Rangasamy College of Technology, Tamil Nadu, Tiruchengode, 637215, India
6
Nejadshamsi S, Karami V, Ghourchian N, Armanfard N, Bergman H, Grad R, Wilchesky M, Khanassov V, Vedel I, Abbasgholizadeh Rahimi S. Development and Feasibility Study of HOPE Model for Prediction of Depression Among Older Adults Using Wi-Fi-based Motion Sensor Data: Machine Learning Study. JMIR Aging 2025; 8:e67715. [PMID: 40053734 PMCID: PMC11914842 DOI: 10.2196/67715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 12/12/2024] [Accepted: 12/19/2024] [Indexed: 03/09/2025] Open
Abstract
BACKGROUND: Depression, characterized by persistent sadness and loss of interest in daily activities, greatly reduces quality of life. Early detection is vital for effective treatment and intervention. While many studies use wearable devices to classify depression based on physical activity, these often rely on intrusive methods. Additionally, most depression classification studies involve large participant groups and use single-stage classifiers without explainability. OBJECTIVE: This study aims to assess the feasibility of classifying depression from nonintrusive Wi-Fi-based motion sensor data using a novel machine learning model on a limited number of participants. We also conduct an explainability analysis to interpret the model's predictions and identify key features associated with depression classification. METHODS: In this study, we recruited adults aged 65 years and older through web-based and in-person methods, supported by a McGill University health care facility directory. Participants provided consent, and we collected 6 months of activity and sleep data via nonintrusive Wi-Fi-based sensors, along with Edmonton Frailty Scale and Geriatric Depression Scale data. For depression classification, we proposed a HOPE (Home-Based Older Adults' Depression Prediction) machine learning model with feature selection, dimensionality reduction, and classification stages, evaluating various model combinations using accuracy, sensitivity, precision, and F1-score. SHapley Additive exPlanations and local interpretable model-agnostic explanations were used to explain the model's predictions. RESULTS: A total of 6 participants were enrolled in this study; however, 2 participants later withdrew because of internet connectivity issues. Among the 4 remaining participants, 3 were classified as not having depression, while 1 was identified as having depression. The most accurate classification model, which combined sequential forward selection for feature selection, principal component analysis for dimensionality reduction, and a decision tree for classification, achieved an accuracy of 87.5%, sensitivity of 90%, and precision of 88.3%, effectively distinguishing individuals with and those without depression. The explainability analysis revealed that the most influential features in depression classification, in order of importance, were "average sleep duration," "total number of sleep interruptions," "percentage of nights with sleep interruptions," "average duration of sleep interruptions," and "Edmonton Frailty Scale." CONCLUSIONS: The findings from this preliminary study demonstrate the feasibility of using Wi-Fi-based motion sensors for depression classification and highlight the effectiveness of our proposed HOPE machine learning model, even with a small sample size. These results suggest the potential for further research with a larger cohort for more comprehensive validation. Additionally, the nonintrusive data collection method and model architecture proposed in this study offer promising applications in remote health monitoring, particularly for older adults who may face challenges in using wearable devices. Furthermore, the importance of sleep patterns identified in our explainability analysis aligns with findings from previous research, emphasizing the need for more in-depth studies on the role of sleep in mental health, as suggested in the explainable machine learning study.
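The four evaluation metrics named in this abstract all derive from the same confusion-matrix counts. The sketch below shows the standard definitions; the counts are hypothetical and are not intended to reproduce the figures reported in the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),  # also called recall
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=9, fp=2, fn=1, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With very few participants, as in the study described above, each of these ratios rests on a small number of underlying cases, so they are best read alongside the raw counts.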
Affiliation(s)
- Shayan Nejadshamsi
- Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada
- Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Vania Karami
- Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada
- Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Narges Armanfard
- Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada
- Department of Electrical and Computer Engineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
- Howard Bergman
- Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Roland Grad
- Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Machelle Wilchesky
- Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Donald Berman Maimonides Centre for Research in Aging, Montreal, QC, Canada
- Vladimir Khanassov
- Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Isabelle Vedel
- Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Samira Abbasgholizadeh Rahimi
- Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada
- Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Faculty of Dental Medicine and Oral Health Sciences, McGill University, Montreal, Canada
7
Sarridis I, Koutlis C, Papadopoulos S, Diou C. FLAC: Fairness-Aware Representation Learning by Suppressing Attribute-Class Associations. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:1148-1160. [PMID: 39466859 DOI: 10.1109/tpami.2024.3487254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2024]
Abstract
Bias in computer vision systems can perpetuate or even amplify discrimination against certain populations. Considering that bias is often introduced by biased visual datasets, many recent research efforts focus on training fair models using such data. However, most of them heavily rely on the availability of protected attribute labels in the dataset, which limits their applicability, while label-unaware approaches, i.e., approaches operating without such labels, exhibit considerably lower performance. To overcome these limitations, this work introduces FLAC, a methodology that minimizes mutual information between the features extracted by the model and a protected attribute, without the use of attribute labels. To do that, FLAC proposes a sampling strategy that highlights underrepresented samples in the dataset, and casts the problem of learning fair representations as a probability matching problem that leverages representations extracted by a bias-capturing classifier. It is theoretically shown that FLAC can indeed lead to fair representations, that are independent of the protected attributes. FLAC surpasses the current state-of-the-art on Biased-MNIST, CelebA, and UTKFace, by 29.1%, 18.1%, and 21.9%, respectively. Additionally, FLAC exhibits 2.2% increased accuracy on ImageNet-A and up to 4.2% increased accuracy on Corrupted-Cifar10. Finally, in most experiments, FLAC even outperforms the bias label-aware state-of-the-art methods.
8
Huang L, Liu P, Huang X. InterDIA: Interpretable prediction of drug-induced autoimmunity through ensemble machine learning approaches. Toxicology 2025; 511:154064. [PMID: 39870155 DOI: 10.1016/j.tox.2025.154064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Revised: 01/15/2025] [Accepted: 01/23/2025] [Indexed: 01/29/2025]
Abstract
Drug-induced autoimmunity (DIA) is a non-IgE immune-related adverse drug reaction that poses substantial challenges in predictive toxicology due to its idiosyncratic nature, complex pathogenesis, and diverse clinical manifestations. To address these challenges, we developed InterDIA, an interpretable machine learning framework for predicting DIA toxicity based on molecular physicochemical properties. Multi-strategy feature selection and advanced ensemble resampling approaches were integrated to enhance prediction accuracy and overcome data imbalance. The optimized Easy Ensemble Classifier achieved robust performance in both 10-fold cross-validation (AUC value of 0.8836 and accuracy of 82.81 %) and external validation (AUC value of 0.8930 and accuracy of 85.00 %). Paired case studies of hydralazine/phthalazine and procainamide/N-acetylprocainamide demonstrated the model's capacity to discriminate between structurally similar compounds with distinct immunogenic potentials. Mechanistic interpretation through SHAP (SHapley Additive exPlanations) analysis revealed critical physicochemical determinants of DIA, including molecular lipophilicity, partial charge distribution, electronic states, polarizability, and topological features. These molecular signatures were mechanistically linked to key processes in DIA pathogenesis, such as membrane permeability and tissue distribution, metabolic bioactivation susceptibility, immune protein recognition and binding specificity. SHAP dependence plots analysis identified specific threshold values for key molecular features, providing novel insights into structure-toxicity relationships in DIA. To facilitate practical application, we developed an open-access web platform enabling batch prediction with real-time visualization of molecular feature contributions through SHAP waterfall plots. 
This integrated framework not only advances our mechanistic understanding of DIA pathogenesis from a molecular perspective but also provides a valuable tool for early assessment of autoimmune toxicity risk during drug development.
Affiliation(s)
- Lina Huang
- Department of Clinical Pharmacy, Jieyang People's Hospital 522000, China
- Peineng Liu
- Department of Clinical Pharmacy, Jieyang People's Hospital 522000, China
- Xiaojie Huang
- Department of Clinical Pharmacy, Jieyang People's Hospital 522000, China.
9
Ji S, Wu J, An F, Lou M, Zhang T, Guo J, Wu P, Zhu Y, Wu R. Umami-gcForest: Construction of a predictive model for umami peptides based on deep forest. Food Chem 2025; 464:141826. [PMID: 39522377 DOI: 10.1016/j.foodchem.2024.141826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 10/07/2024] [Accepted: 10/27/2024] [Indexed: 11/16/2024]
Abstract
Umami peptides have recently gained attention for their ability to enhance umami flavor, reduce salt content, and provide nutritional benefits. However, traditional wet laboratory methods to identify them are time-consuming, laborious, and costly. Therefore, we developed the Umami-gcForest model using the deep forest algorithm. It constructs amino acid feature matrices using ProtBERT, amino acid composition, composition-transition-distribution, and pseudo amino acid composition, and applies mutual information for feature selection to optimize dimensions. Compared with other machine learning baselines, existing umami peptide prediction models, and composite models, Umami-gcForest demonstrated outstanding predictive accuracy on different test sets. Using SHapley Additive exPlanations to calculate feature contributions, we found that the key features of Umami-gcForest were hydrophobicity, charge, and polarity. Based on this, an online platform was developed to facilitate its use. In conclusion, Umami-gcForest serves as a powerful tool, providing a solid foundation for the efficient and accurate screening of umami peptides.
Affiliation(s)
- Shuaiqi Ji
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
- Junrui Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
- Feiyu An
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China
- Mengxue Lou
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
- Taowei Zhang
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
- Jiawei Guo
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
- Penggong Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China
- Yi Zhu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
- Rina Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China.
10
Zhao M, Cai M, Lei F, Yuan X, Liu Q, Fang Y, Zhu B. AI-driven feature selection and epigenetic pattern analysis: A screening strategy of CpGs validated by pyrosequencing for body fluid identification. Forensic Sci Int 2025; 367:112339. [PMID: 39729807 DOI: 10.1016/j.forsciint.2024.112339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 12/01/2024] [Accepted: 12/06/2024] [Indexed: 12/29/2024]
Abstract
Identifying body fluid stains at crime scenes is an important task in forensic evidence analysis. Body fluid-specific CpGs detected by DNA methylation microarray screening have been widely studied for forensic body fluid identification. However, some CpGs have limited ability to distinguish certain body fluid types, so there is an ongoing need to discover novel methylation markers and fully validate them to enhance their evidentiary strength in complex forensic scenarios. This research gathered forensic-related DNA methylation microarray data from the Gene Expression Omnibus (GEO) database. A novel screening strategy for marker selection was developed, combining feature selection algorithms (elastic net, information gain ratio, feature importance based on Random Forest, and mutual information coefficient) with epigenetic pattern analysis, to identify CpG markers for body fluid identification. The selected CpGs were validated through pyrosequencing on peripheral blood, saliva, semen, vaginal secretions, and menstrual blood samples, and machine learning classification models were constructed based on the sequencing results. Pyrosequencing results revealed 14 CpGs with high specificity in five types of body fluid samples. A machine learning classification model, developed based on the pyrosequencing results, could effectively distinguish five types of body fluid samples, achieving 100 % accuracy on the test set. Utilizing six CpG markers, it was also feasible to attain ideal efficacy in identifying body fluid stains. Our research proposes a systematic and scientific strategy for screening body fluid-specific CpGs, contributing new insights and methods to forensic body fluid identification.
Affiliation(s)
- Ming Zhao
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| | - Meiming Cai
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| | - Fanzhang Lei
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| | - Xi Yuan
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| | - Qinglin Liu
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| | - Yating Fang
- School of Basic Medical Science, Anhui Medical University, Hefei 230031, China.
| | - Bofeng Zhu
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China.
|
11
|
Gao Y, Cheng L. Enhanced Polar Lights Optimization with Cryptobiosis and Differential Evolution for Global Optimization and Feature Selection. Biomimetics (Basel) 2025; 10:53. [PMID: 39851769 PMCID: PMC11761853 DOI: 10.3390/biomimetics10010053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Revised: 01/13/2025] [Accepted: 01/13/2025] [Indexed: 01/26/2025] Open
Abstract
Optimization algorithms play a crucial role in solving complex problems across various fields, including global optimization and feature selection (FS). This paper presents the enhanced polar lights optimization with cryptobiosis and differential evolution (CPLODE), a novel improvement upon the original polar lights optimization (PLO) algorithm. CPLODE integrates a cryptobiosis mechanism and differential evolution (DE) operators to enhance PLO's search capabilities. The original PLO's particle collision strategy is replaced with DE's mutation and crossover operators, enabling a more effective global exploration and using a dynamic crossover rate to improve convergence. Furthermore, a cryptobiosis mechanism records and reuses historically successful solutions, thereby improving the greedy selection process. The experimental results on 29 CEC 2017 benchmark functions demonstrate CPLODE's superior performance compared to eight classical optimization algorithms, with higher average ranks and faster convergence. Moreover, CPLODE achieved competitive results in feature selection on ten real-world datasets, outperforming several well-known binary metaheuristic algorithms in classification accuracy and feature reduction. These results highlight CPLODE's effectiveness for both global optimization and feature selection.
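The abstract says that PLO's particle collision step is replaced by DE mutation and crossover, followed by greedy selection. A minimal sketch of one classic DE/rand/1/bin generation is given below for orientation. It is plain differential evolution, not CPLODE: the dynamic crossover rate and the cryptobiosis archive are not reproduced, and the names `de_step`, `sphere`, `f`, and `cr` are illustrative assumptions.

```python
import random


def de_step(population, fitness, f=0.5, cr=0.9, rng=None):
    """One generation of classic DE/rand/1/bin, minimising `fitness`.

    Assumption: fixed scale factor `f` and crossover rate `cr`;
    the paper describes a dynamic crossover rate instead.
    """
    rng = rng or random.Random(0)
    n, dim = len(population), len(population[0])
    next_pop = []
    for i, target in enumerate(population):
        # Pick three distinct individuals, none of them the target.
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        # Mutation: base vector plus a scaled difference vector.
        mutant = [
            population[a][k] + f * (population[b][k] - population[c][k])
            for k in range(dim)
        ]
        # Binomial crossover; j_rand guarantees at least one mutant gene.
        j_rand = rng.randrange(dim)
        trial = [
            mutant[k] if (rng.random() < cr or k == j_rand) else target[k]
            for k in range(dim)
        ]
        # Greedy selection: keep the trial only if it is no worse.
        next_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return next_pop


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    rng = random.Random(42)
    pop = [[rng.uniform(-5, 5) for _ in range(4)] for _ in range(10)]
    for _ in range(50):
        pop = de_step(pop, sphere, rng=rng)
    print(min(sphere(p) for p in pop))
```

Because selection is greedy, no individual's fitness can get worse from one generation to the next, which is the property the cryptobiosis mechanism is described as building on.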
Affiliation(s)
| | - Liang Cheng
- School of Petroleum Engineering, Yangtze University, Wuhan 430100, China
|
12
|
Mu G, Li J, Liu Z, Dai J, Qu J, Li X. MSBKA: A Multi-Strategy Improved Black-Winged Kite Algorithm for Feature Selection of Natural Disaster Tweets Classification. Biomimetics (Basel) 2025; 10:41. [PMID: 39851757 PMCID: PMC11763058 DOI: 10.3390/biomimetics10010041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Revised: 01/03/2025] [Accepted: 01/08/2025] [Indexed: 01/26/2025] Open
Abstract
With the advancement of the Internet, social media platforms have gradually become powerful channels for spreading crisis-related content. Identifying informative tweets associated with natural disasters is beneficial for rescue operations. When faced with massive text data, choosing the pivotal features, reducing the computational expense, and increasing classification performance are significant challenges. Therefore, this study proposes a multi-strategy improved black-winged kite algorithm (MSBKA) for feature selection in natural disaster tweet classification, based on the wrapper method's principle. First, BKA is improved by utilizing the enhanced Circle mapping, integrating hierarchical reverse learning, and introducing the Nelder-Mead method. Then, MSBKA is combined with an SVM classifier (RBF kernel function) to construct a hybrid model. Finally, the MSBKA-SVM model performs the feature selection and tweet classification tasks. The empirical analysis of data from four natural disasters shows that the proposed model achieves an accuracy of 0.8822. Compared with GA, PSO, SSA, and BKA, the accuracy is increased by 4.34%, 2.13%, 2.94%, and 6.35%, respectively. This research shows that the MSBKA-SVM model can play a supporting role in reducing disaster risk.
Affiliation(s)
- Guangyu Mu
- School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, China
- Key Laboratory of Financial Technology of Jilin Province, Changchun 130117, China
| | - Jiaxue Li
- School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, China
| | - Zhanhui Liu
- Changchun Community Official Staff College of Jilin Province, Changchun 130052, China
| | - Jiaxiu Dai
- School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, China
| | - Jiayi Qu
- School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, China
| | - Xiurong Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
|
13
|
Venkatesan A, Basak J, Bahadur RP. pmiRScan: a LightGBM based method for prediction of animal pre-miRNAs. Funct Integr Genomics 2025; 25:9. [PMID: 39786653 DOI: 10.1007/s10142-025-01527-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 12/03/2024] [Accepted: 01/01/2025] [Indexed: 01/12/2025]
Abstract
MicroRNAs (miRNAs) are short endogenous non-coding RNAs that play a significant role in post-transcriptional gene regulation. Identifying new animal precursor miRNAs (pre-miRNAs) and miRNAs is crucial to understanding the role of miRNAs in various biological processes, including the development of diseases. The present study focuses on the development of a Light Gradient Boost (LGB) based method for the classification of animal pre-miRNAs using various sequence and secondary structural features. In various pre-miRNA families, distinct k-mer repeat signatures with a length of three nucleotides have been identified. Out of nine different classifiers that have been trained and tested in the present study, LGB has an overall better performance with an AUROC of 0.959. In comparison with the existing methods, our method 'pmiRScan' has an overall better performance with accuracy of 0.93, sensitivity of 0.86, specificity of 0.95 and F-score of 0.82. Moreover, pmiRScan effectively classifies pre-miRNAs from four distinct taxonomic groups: mammals, nematodes, molluscs and arthropods. We have used our classifier to predict genome-wide pre-miRNAs in humans. We find a total of 313 pre-miRNA candidates using pmiRScan. A total of 180 potential mature miRNAs belonging to 60 distinct miRNA families are extracted from the predicted pre-miRNAs, of which 128 are novel and not reported in miRBase. These discoveries may enhance our current understanding of miRNAs and their targets in humans. pmiRScan is freely available at http://www.csb.iitkgp.ac.in/applications/pmiRScan/index.php .
Affiliation(s)
- Amrit Venkatesan
- Computational Structural Biology Lab, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
| | - Jolly Basak
- Genomics of Plant Stress Biology Lab, Department of Biotechnology, Visva-Bharati, Santiniketan, West Bengal, 731235, India
| | - Ranjit Prasad Bahadur
- Computational Structural Biology Lab, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India.
- Bioinformatics Centre, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India.
|
14
|
Huang X, Chen J, Liu P. Assessing chemical exposure risk in breastfeeding infants: An explainable machine learning model for human milk transfer prediction. Ecotoxicol Environ Saf 2025; 289:117707. [PMID: 39799920 DOI: 10.1016/j.ecoenv.2025.117707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 12/28/2024] [Accepted: 01/07/2025] [Indexed: 01/15/2025]
Abstract
Breast milk is essential for infant health, but the transfer of xenobiotic chemicals poses significant risks. Ethical challenges in clinical trials necessitate the use of in vitro predictive models to assess chemical exposure risks in breastfeeding infants. This study introduces an explainable machine learning model to predict the risk of chemical transfer through human milk. Our novel framework integrates ensemble resampling methods with advanced feature selection techniques, addressing data imbalance and enhancing predictive accuracy. The balanced random forest classifier, optimized using the genetic algorithm for feature selection, achieved an area under the receiver operating characteristic curve (AUC) of 0.8708 and an accuracy of 82.67 % on the internal test set, with an accuracy of 86.36 % on the external validation set. The integration of the SHapley Additive exPlanations approach provided deeper insights by revealing how specific chemical properties influence the transfer of high-risk compounds into breast milk. This enhanced interpretability offers a clearer understanding of the associated risks and informs strategies for their mitigation. Structural alert analysis further identified molecular fragments linked to high-risk chemicals, enabling targeted risk assessments. Additionally, the model was applied to evaluate the transfer risks of FDA-approved drugs from 2019 to 2024, identifying several with high transfer probabilities. To broaden its application, we developed an online prediction tool that offers real-time risk assessments, providing an accessible resource for healthcare professionals and researchers. These contributions present a robust, ethically sound tool for assessing chemical exposure risks in breastfeeding infants, supporting informed decisions on drug use and environmental contaminant exposure.
Affiliation(s)
- Xiaojie Huang
- Department of Pharmacy, Jieyang People's Hospital, Jieyang, China.
| | - Jiajia Chen
- Department of Pharmacy, Jieyang People's Hospital, Jieyang, China
| | - Peineng Liu
- Department of Pharmacy, Jieyang People's Hospital, Jieyang, China
|
15
|
Liu P, Yuan H, Ning Y, Chakraborty B, Liu N, Peres MA. A modified and weighted Gower distance-based clustering analysis for mixed type data: a simulation and empirical analyses. BMC Med Res Methodol 2024; 24:305. [PMID: 39696017 DOI: 10.1186/s12874-024-02427-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Accepted: 11/26/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND Traditional clustering techniques are typically restricted to either continuous or categorical variables. However, most real-world clinical data are mixed type. This study aims to introduce a clustering technique specifically designed for datasets containing both continuous and categorical variables to offer better clustering compatibility, adaptability, and interpretability than other mixed type techniques. METHODS This paper proposed a modified Gower distance incorporating feature importance as weights to maintain equal contributions between continuous and categorical features. The algorithm (DAFI) was evaluated using five simulated datasets with varying proportions of important features and real-world datasets from the 2011-2014 National Health and Nutrition Examination Survey (NHANES). Effectiveness was demonstrated through comparisons with 13 clustering techniques. Clustering performance was assessed using the adjusted Rand index (ARI) for accuracy in simulation studies and the silhouette score for cohesion and separation in NHANES. Additionally, multivariable logistic regression estimated the association between periodontitis (PD) and cardiovascular diseases (CVDs), adjusting for clusters in NHANES. RESULTS In simulation studies, the DAFI-Gower algorithm consistently performs better than baseline methods according to the adjusted Rand index in settings investigated, especially on datasets with more redundant features. In NHANES, 3,760 people were analyzed. DAFI-Gower achieves the highest silhouette score (0.79). Four distinct clusters with diverse health profiles were identified. By incorporating feature importance, we found that cluster formations were more strongly influenced by CVD-related factors. The association between periodontitis and cardiovascular diseases, after adjusting for clusters, reveals significant insights (adjusted OR 1.95, 95% CI 1.50 to 2.55, p = 0.012), highlighting severe periodontitis as a potential risk factor for cardiovascular diseases. CONCLUSIONS DAFI performed better than classic clustering baselines on both simulated and real-world datasets. It effectively captures cluster characteristics by considering feature importance, which is crucial in clinical settings where many variables may be similar or irrelevant. We envisage that DAFI offers an effective solution for mixed type clustering.
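To make the idea of a feature-weighted Gower distance concrete, the sketch below computes the standard weighted form: range-normalised absolute difference for continuous features, simple mismatch for categorical ones, combined as a weighted average. This is the textbook construction, not the authors' DAFI modification, whose exact weighting scheme the abstract does not give. The function name, argument layout, and example record are assumptions.

```python
def weighted_gower(x, y, is_categorical, ranges, weights):
    """Weighted Gower distance between two mixed-type records.

    x, y            : sequences of feature values
    is_categorical  : True where the feature is categorical
    ranges          : observed range (max - min) of each continuous
                      feature; ignored for categorical features
    weights         : non-negative per-feature weights, e.g. feature
                      importances (assumption: supplied by the caller)
    """
    num = 0.0
    den = 0.0
    for xi, yi, cat, r, w in zip(x, y, is_categorical, ranges, weights):
        if cat:
            # Categorical: 0 if equal, 1 otherwise.
            d = 0.0 if xi == yi else 1.0
        else:
            # Continuous: absolute difference scaled to [0, 1].
            d = abs(xi - yi) / r if r > 0 else 0.0
        num += w * d
        den += w
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical record layout: (age in years, sex, HbA1c in %).
    a = (30, "M", 5.5)
    b = (50, "F", 5.5)
    print(weighted_gower(a, b, (False, True, False), (40, None, 2.0), (3, 1, 0)))
```

Setting a weight to zero removes that feature from the distance entirely, which is one simple way importance weights can suppress redundant or irrelevant variables.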
Affiliation(s)
- Pinyan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore.
| | - Han Yuan
- Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
| | - Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
| | - Bibhas Chakraborty
- Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Institute of Data Science, National University of Singapore, Singapore, Singapore
| | - Marco Aurélio Peres
- Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore, Singapore
|
16
|
Kasińska J, Malinowski P, Matusiewicz P, Makieła W, Barwicki L, Bolibruchova D. A Novel Objective Method for Steel Degradation Rate Evaluation. Materials (Basel) 2024; 17:6074. [PMID: 39769673 PMCID: PMC11678804 DOI: 10.3390/ma17246074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Revised: 12/01/2024] [Accepted: 12/07/2024] [Indexed: 01/11/2025]
Abstract
This article introduces a novel approach for assessing microstructure, particularly its degradation after extended operation. The authors focus on creep processes in power plant components, highlighting the importance of diagnostics in this field. This article emphasizes the value of combining traditional microstructure observation techniques with image analysis. A non-destructive method of evaluating microstructure parameters (matrix replicas) is presented, and its accuracy is evaluated against the conventional destructive method. The assessment utilizes quantitative data derived from classical stereological principles and image analysis. Parameters like mean chord length, relative surface area, mean cross-sectional area, and mean equivalent diameter are compared for replica and metallographic specimens. The results show that the replica method accurately reproduces the microstructure. In their conclusions, the authors highlight the importance of developing visual methods alongside the application of artificial intelligence while indicating the challenges in achieving this goal.
Affiliation(s)
- Justyna Kasińska
- Faculty of Mechatronics and Mechanical Engineering, Department of Metal Science and Materials Technology, Kielce University of Technology, Al. Tysiąclecia Państwa Polskiego 7, 25-314 Kielce, Poland
| | - Paweł Malinowski
- Faculty of Foundry Engineering, Department of Foundry Processes Engineering, AGH University of Krakow, Al. Mickiewicza 30, 30-059 Krakow, Poland
| | - Piotr Matusiewicz
- Faculty of Metals Engineering and Industrial Computer Science, Department of Physical and Powder Metallurgy, AGH University of Krakow, Al. Mickiewicza 30, 30-059 Krakow, Poland
| | - Włodzimierz Makieła
- Faculty of Mechatronics and Mechanical Engineering, Department of Manufacturing Engineering and Metrology, Kielce University of Technology, Al. Tysiąclecia Państwa Polskiego 7, 25-314 Kielce, Poland
| | - Leopold Barwicki
- ENREM-POŁANIEC Sp. z o.o., Tursko Małe 107, 28-230 Połaniec, Poland
| | - Dana Bolibruchova
- Department of Technological Engineering, Faculty of Mechanical Engineering, University of Zilina, Univerzitná 8215/1, 010 26 Zilina, Slovakia
|
17
|
Morís DI, de Moura J, Marcos PJ, Míguez Rey E, Novo J, Ortega M. Efficient clinical decision-making process via AI-based multimodal data fusion: A COVID-19 case study. Heliyon 2024; 10:e38642. [PMID: 39640748 PMCID: PMC11619951 DOI: 10.1016/j.heliyon.2024.e38642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Accepted: 09/26/2024] [Indexed: 12/07/2024] Open
Abstract
COVID-19 is an infectious disease that caused a global pandemic in 2020. In the critical moments of such healthcare emergencies, medical staff must make important decisions in a context of limited resources that must be carefully managed. To this end, computer-aided diagnosis methods are extremely powerful and help staff to better recognize evidence of high-risk patients. This can be done with the support of relevant information extracted from electronic health records, lab tests and imaging studies. In this work, we present a novel, fully automatic and efficient method to help the clinical decision-making process in the context of COVID-19 risk estimation, using multimodal data fusion of clinical features and deep features extracted from chest X-ray images. The risk estimation is studied in two of the most relevant and critical scenarios encountered: the risk of hospitalization and the risk of mortality. This study shows which features are the most important for each scenario, the ratio of clinical and imaging features present in the top ranking, and the performance of the machine learning models used. The results demonstrate strong performance by the classifiers, estimating the risk of hospitalization with an AUC-ROC of 0.8452 ± 0.0133 and the risk of death with an AUC-ROC of 0.8285 ± 0.0210, using only a subset of the original features. They also highlight the significant contribution of imaging features to hospitalization risk assessment, while clinical features become more crucial for mortality risk evaluation. Furthermore, multimodal data fusion can outperform approaches that use a single data source. Despite the model's complexity, it requires fewer features, an advantage in scenarios with limited computational resources. This streamlined, fully automated method shows promising potential to improve the clinical decision-making process and better manage medical resources, not only in the context of COVID-19, but also in other clinical scenarios.
Affiliation(s)
- Daniel I. Morís
- Varpa Group, Biomedical Research Institute A Coruña (INIBIC), University of A Coruña, 15006, A Coruña, Spain
- Department of Computer Science and Information Technologies, University of A Coruña, 15071, A Coruña, Spain
| | - Joaquim de Moura
- Varpa Group, Biomedical Research Institute A Coruña (INIBIC), University of A Coruña, 15006, A Coruña, Spain
- Department of Computer Science and Information Technologies, University of A Coruña, 15071, A Coruña, Spain
| | - Pedro J. Marcos
- Dirección Asistencial y Servicio de Neumología, Complejo Hospitalario Universitario de A Coruña (CHUAC), Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Sergas, 15006 A Coruña, Spain
| | - Enrique Míguez Rey
- Grupo de Investigación en Virología Clínica, Sección de Enfermedades Infecciosas, Servicio de Medicina Interna, Instituto de Investigación Biomédica de A Coruña (INIBIC), Área Sanitaria A Coruña y CEE (ASCC), SERGAS, 15006 A Coruña, Spain
| | - Jorge Novo
- Varpa Group, Biomedical Research Institute A Coruña (INIBIC), University of A Coruña, 15006, A Coruña, Spain
- Department of Computer Science and Information Technologies, University of A Coruña, 15071, A Coruña, Spain
| | - Marcos Ortega
- Varpa Group, Biomedical Research Institute A Coruña (INIBIC), University of A Coruña, 15006, A Coruña, Spain
- Department of Computer Science and Information Technologies, University of A Coruña, 15071, A Coruña, Spain
|
18
|
Henderson J, Nagano Y, Milighetti M, Tiffeau-Mayer A. Limits on inferring T cell specificity from partial information. Proc Natl Acad Sci U S A 2024; 121:e2408696121. [PMID: 39374400 PMCID: PMC11494314 DOI: 10.1073/pnas.2408696121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Accepted: 09/03/2024] [Indexed: 10/09/2024] Open
Abstract
A key challenge in molecular biology is to decipher the mapping of protein sequence to function. To perform this mapping requires the identification of sequence features most informative about function. Here, we quantify the amount of information (in bits) that T cell receptor (TCR) sequence features provide about antigen specificity. We identify informative features by their degree of conservation among antigen-specific receptors relative to null expectations. We find that TCR specificity synergistically depends on the hypervariable regions of both receptor chains, with a degree of synergy that strongly depends on the ligand. Using a coincidence-based approach to measuring information enables us to directly bound the accuracy with which TCR specificity can be predicted from partial matches to reference sequences. We anticipate that our statistical framework will be of use for developing machine learning models for TCR specificity prediction and for optimizing TCRs for cell therapies. The proposed coincidence-based information measures might find further applications in bounding the performance of pairwise classifiers in other fields.
Affiliation(s)
- James Henderson
- Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
- Institute for the Physics of Living Systems, University College London, London WC1E 6BT, United Kingdom
| | - Yuta Nagano
- Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
- Division of Medicine, University College London, London WC1E 6BT, United Kingdom
| | - Martina Milighetti
- Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
- Cancer Institute, University College London, London WC1E 6DD, United Kingdom
| | - Andreas Tiffeau-Mayer
- Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
- Institute for the Physics of Living Systems, University College London, London WC1E 6BT, United Kingdom
|
19
|
Robert Vincent ACS, Sengan S. Effective clinical decision support implementation using a multi filter and wrapper optimisation model for Internet of Things based healthcare data. Sci Rep 2024; 14:21820. [PMID: 39294200 PMCID: PMC11410983 DOI: 10.1038/s41598-024-71726-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 08/30/2024] [Indexed: 09/20/2024] Open
Abstract
Feature Selection (FS) is essential in Internet of Things (IoT)-based Clinical Decision Support Systems (CDSS) to improve the accuracy and efficiency of the system. With the increasing number of sensors and devices used in healthcare, the volume of data generated is vast and complex. Selecting relevant features from this data is crucial for reducing computational overhead, improving the system's interpretability, and enhancing the quality of the Decision-Making System (DMS). FS also helps address data redundancy and noise, which can negatively impact the system's performance. FS is therefore critical to developing practical and dependable CDSS in IoT-based healthcare sectors. This research proposes a two-phase FS model. Phase-I employs an ensemble of five Filter Methods (FM), followed by a Pearson Correlation Method (PCM). Phase-II uses the Binary Optimized Genetic Grey Wolf Optimization Algorithm (BOGGWOA) as a Wrapper Method (WM). The proposed model integrates the most valuable features identified by each filter. It then uses the Pearson Correlation Coefficient (PCC) to remove redundant features, a Support Vector Machine (SVM) to estimate classification accuracy (CA), and BOGGWOA as the WM to select the most essential features with the best CA.
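The Pearson correlation step of Phase-I can be pictured as a redundancy filter that walks a ranked feature list and drops any feature too strongly correlated with one already kept. The sketch below is a generic illustration under stated assumptions: the ranking is taken as given (standing in for the output of the filter ensemble), and the 0.9 threshold, function names, and sensor-style feature names are hypothetical rather than taken from the paper.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # A constant column has no defined correlation; treat it as 0.
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def drop_correlated(features, ranked_names, threshold=0.9):
    """Keep features in rank order, skipping near-duplicates.

    features     : dict mapping feature name -> list of numeric values
    ranked_names : names ordered best-first (assumption: produced by
                   an upstream filter ensemble)
    threshold    : |r| at or above which a feature counts as redundant
    """
    kept = []
    for name in ranked_names:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    data = {
        "heart_rate": [60, 70, 80, 90],
        "heart_rate_copy": [61, 71, 81, 91],
        "temperature": [36.5, 37.2, 36.4, 37.0],
    }
    print(drop_correlated(data, ["heart_rate", "heart_rate_copy", "temperature"]))
```

The surviving features would then be handed to the wrapper stage, which the abstract describes as BOGGWOA scored by an SVM.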
Affiliation(s)
| | - Sudhakar Sengan
- Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu, 627451, India.
|
20
|
Zhang Q, Mao D, Tu Y, Wu YY. A New Fingerprint and Graph Hybrid Neural Network for Predicting Molecular Properties. J Chem Inf Model 2024; 64:5853-5866. [PMID: 39052623 DOI: 10.1021/acs.jcim.4c00586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Machine learning plays a role in accelerating drug discovery, and the design of effective machine learning models is crucial for accurately predicting molecular properties. Characterizing molecules typically involves the use of molecular fingerprints and molecular graphs. These are input into a multilayer perceptron (MLP) and variants of graph neural networks, such as graph attention networks (GATs). Due to the diverse types and large dimension of fingerprints, models may contain many features that are relatively irrelevant or redundant; meanwhile, although the GAT excels in handling heterogeneous graph tasks, it lacks the ability to extract collaborative information from neighboring nodes, which is crucial in scenarios where it cannot capture the joint influence of adjacent groups on atoms. To overcome these challenges, we introduce a hybrid model, combining improved GAT and MLP. In GAT, the recurrent neural network is employed to capture collaborative information. To address the dimensionality issue, we propose a feature selection algorithm, which is based on the principle of maximizing relevance while minimizing redundancy. Through experiments on 13 public data sets and 14 breast cell lines, our model demonstrates superior performance compared to state-of-the-art deep learning and traditional machine learning algorithms. Additionally, a series of ablation experiments were conducted to demonstrate the advantages of our improved version, as well as its antinoise capability and interpretability. These results indicate that our model holds promising prospects for practical applications.
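The principle of maximizing relevance while minimizing redundancy is commonly realised as a greedy mRMR selection over mutual information. The sketch below shows that generic form for discrete features, which fits binary fingerprint bits. It is not the authors' algorithm, whose details the abstract does not give; the function names and the toy bit columns are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items()
    )


def mrmr(features, target, k):
    """Greedy max-relevance min-redundancy selection of k feature names.

    features : dict mapping feature name -> list of discrete values
    target   : list of discrete labels
    Score    : MI(feature; target) minus the mean MI between the
               feature and the features already selected.
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    while len(selected) < min(k, len(features)):

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max((n for n in features if n not in selected), key=score)
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Hypothetical fingerprint bits; the label depends on bit_a and bit_b.
    bits = {
        "bit_a": [0, 0, 0, 0, 1, 1, 1, 1],
        "bit_a_dup": [0, 0, 0, 0, 1, 1, 1, 1],
        "bit_b": [0, 0, 1, 1, 0, 0, 1, 1],
    }
    label = [0, 0, 1, 1, 2, 2, 3, 3]
    print(mrmr(bits, label, 2))
```

In this toy case the duplicated bit is as relevant as the original but fully redundant with it, so the second pick goes to the independent bit instead.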
Affiliation(s)
- Qingtian Zhang
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
| | - Dangxin Mao
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
| | - Yusong Tu
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
| | - Yuan-Yan Wu
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
|
21
|
Zhang M, Cui Q, Lü Y, Li W. A feature-aware multimodal framework with auto-fusion for Alzheimer's disease diagnosis. Comput Biol Med 2024; 178:108740. [PMID: 38901184 DOI: 10.1016/j.compbiomed.2024.108740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 05/02/2024] [Accepted: 06/08/2024] [Indexed: 06/22/2024]
Abstract
Alzheimer's disease (AD), one of the most common dementias, has about 4.6 million new cases yearly worldwide. Given the large number of suspected AD patients, early screening for the disease has become particularly important. AD diagnosis data are diverse, including cognitive tests, images, and risk factors, yet many prior investigations have concentrated on integrating only high-dimensional features with simple fusion by concatenation, resulting in suboptimal outcomes for AD diagnosis. We therefore propose an enhanced multimodal AD diagnostic framework comprising a feature-aware module and an automatic model fusion strategy (AMFS). To preserve the correlation and significance of features within a low-dimensional space, the feature-aware module employs a low-dimensional SHapley Additive exPlanation (SHAP) boosting feature selection as the initial step. Following this analysis, diverse tiers of low-dimensional features are extracted from patients' biological data. In the high-dimensional stage, the feature-aware module integrates cross-modal attention mechanisms to capture subtle relationships among different cognitive domains, neuroimaging modalities, and risk factors. Subsequently, we integrate the feature-aware module with graph convolutional networks (GCN) to address heterogeneous data in multimodal AD while also perceiving relationships between different modalities. Lastly, the proposed AMFS autonomously learns optimal parameters for aligning two sub-models. Validation tests on two ADNI datasets show accuracies of 95.9% and 91.9%, respectively, in AD diagnosis. The methods efficiently select features from multimodal AD data, optimizing model fusion for potential clinical assistance in diagnostics.
Affiliation(s)
- Meiwei Zhang
- College of Electrical Engineering, Chongqing University, Chongqing, 400030, China
| | - Qiushi Cui
- College of Electrical Engineering, Chongqing University, Chongqing, 400030, China.
| | - Yang Lü
- Department of Geriatrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Wenyuan Li
- College of Electrical Engineering, Chongqing University, Chongqing, 400030, China
| |
|
22
|
Janyasupab P, Singhanat K, Warnnissorn M, Thuwajit P, Suratanee A, Plaimas K, Thuwajit C. Identification of Tumor Budding-Associated Genes in Breast Cancer through Transcriptomic Profiling and Network Diffusion Analysis. Biomolecules 2024; 14:896. [PMID: 39199284 PMCID: PMC11352152 DOI: 10.3390/biom14080896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 07/23/2024] [Accepted: 07/23/2024] [Indexed: 09/01/2024] Open
Abstract
Breast cancer has the highest diagnosis rate among all cancers. Tumor budding (TB) is recognized as a recent prognostic marker. Identifying genes specific to high-TB samples is crucial for hindering tumor progression and metastasis. In this study, we utilized an RNA sequencing technique, called TempO-Seq, to profile transcriptomic data from breast cancer samples, aiming to identify biomarkers for high-TB cases. Through differential expression analysis and mutual information, we identified seven genes (NOL4, STAR, C8G, NEIL1, SLC46A3, FRMD6, and SCARF2) that are potential biomarkers in breast cancer. To gain more relevant proteins, further investigation based on a protein-protein interaction network and the network diffusion technique revealed enrichment in the Hippo signaling and Wnt signaling pathways, promoting tumor initiation, invasion, and metastasis in several cancer types. In conclusion, these novel genes, recognized as overexpressed in high-TB samples, along with their associated pathways, offer promising therapeutic targets, thus advancing treatment and diagnosis for breast cancer.
Affiliation(s)
- Panisa Janyasupab
- Advance Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand;
| | - Kodchanan Singhanat
- Department of Immunology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand; (K.S.); (P.T.)
| | - Malee Warnnissorn
- Department of Pathology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand;
| | - Peti Thuwajit
- Department of Immunology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand; (K.S.); (P.T.)
| | - Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand;
- Intelligent and Nonlinear Dynamics Innovations Research Center, Science and Technology Research Institute, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
| | - Kitiporn Plaimas
- Advance Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand;
| | - Chanitra Thuwajit
- Department of Immunology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand; (K.S.); (P.T.)
| |
|
23
|
Kamińska D, Kamińska O, Sochacka M, Sokół-Szawłowska M. The Role of Selected Speech Signal Characteristics in Discriminating Unipolar and Bipolar Disorders. Sensors (Basel) 2024; 24:4721. [PMID: 39066117 PMCID: PMC11281009 DOI: 10.3390/s24144721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 06/23/2024] [Accepted: 07/12/2024] [Indexed: 07/28/2024]
Abstract
OBJECTIVE The objective of this study is to explore and enhance the diagnostic process of unipolar and bipolar disorders. The primary focus is on leveraging automated processes to improve the accuracy and accessibility of diagnosis. The study aims to introduce an audio corpus collected from patients diagnosed with these disorders, annotated using the Clinical Global Impressions Scale (CGI) by psychiatrists. METHODS AND PROCEDURES Traditional diagnostic methods rely on the clinician's expertise and consideration of co-existing mental disorders. However, this study proposes the implementation of automated processes in the diagnosis, providing quantitative measures and enabling prolonged observation of patients. The paper introduces a speech signal pipeline for CGI state classification, with a specific focus on selecting the most discriminative features. Acoustic features such as prosodies, MFCC, and LPC coefficients are examined in the study. The classification process utilizes common machine learning methods. RESULTS The results of the study indicate promising outcomes for the automated diagnosis of bipolar and unipolar disorders using the proposed speech signal pipeline. The audio corpus annotated with CGI by psychiatrists achieved a classification accuracy of 95% for the two-class classification. For the four- and seven-class classifications, the results were 77.3% and 73%, respectively, demonstrating the potential of the developed method in distinguishing different states of the disorders.
Affiliation(s)
- Dorota Kamińska
- Institute of Mechatronics and Information Systems, Lodz University of Technology, 116 Żeromskiego Street, 90-924 Lodz, Poland
| | - Olga Kamińska
- Systems Research Institute, Polish Academy of Sciences, 01-447 Warsaw, Poland;
| | | | - Marlena Sokół-Szawłowska
- Outpatient Psychiatric Clinic, Institute of Psychiatry and Neurology, 9 Jana III Sobieskiego Street, 02-957 Warsaw, Poland;
| |
|
24
|
Jain R, Ganesan RA. Effective diagnosis of sleep disorders using EEG and EOG signals. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-4. [PMID: 40039043 DOI: 10.1109/embc53108.2024.10782470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2025]
Abstract
This work focuses on the diagnosis of various sleep disorders such as insomnia, narcolepsy, periodic leg movement (PLM), nocturnal frontal lobe epilepsy, bruxism, REM behavior disorder, and sleep-disordered breathing. We use a support vector machine (SVM) to distinguish each sleep disorder from healthy controls. The proposed approach is evaluated on the publicly available CAP dataset comprising 108 overnight recordings from healthy controls and patients with sleep disorders. A single feature called gridded distribution entropy, derived from Poincaré plots of the EEG signal, provides 100% accuracy in distinguishing healthy controls from each pathology except insomnia and PLM. With the EOG channel, we are able to classify these two groups as well with 100% accuracy, demonstrating the effectiveness of EOG in disambiguating insomnia and PLM from controls. Clinical relevance: Diagnosis of sleep disorders is important to facilitate appropriate treatment. It is challenging due to the diverse nature and inter-subject variation of the physiological symptoms. Automated sleep disorder detection can improve cost efficiency and reduce variability.
|
25
|
Bohm BC, Borges FEDM, Silva SCM, Soares AT, Ferreira DD, Belo VS, Lignon JS, Bruhn FRP. Utilization of machine learning for dengue case screening. BMC Public Health 2024; 24:1573. [PMID: 38862945 PMCID: PMC11167742 DOI: 10.1186/s12889-024-19083-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 06/07/2024] [Indexed: 06/13/2024] Open
Abstract
Dengue causes approximately 10,000 deaths and 100 million symptomatic infections annually worldwide, making it a significant public health concern. To address this, artificial intelligence tools like machine learning can play a crucial role in developing more effective strategies for control, diagnosis, and treatment. This study identifies relevant variables for the screening of dengue cases through machine learning models and evaluates the accuracy of the models. Data from reported dengue cases in the states of Rio de Janeiro and Minas Gerais for the years 2016 and 2019 were obtained through the National Notifiable Diseases Surveillance System (SINAN). The mutual information technique was used to assess which variables were most related to laboratory-confirmed dengue cases. Next, 10,000 confirmed cases and 10,000 discarded cases were randomly selected, and the dataset was divided into training (70%) and testing (30%) sets. Machine learning models were then tested to classify the cases. The logistic regression model with 10 variables (gender, age, fever, myalgia, headache, vomiting, nausea, back pain, rash, retro-orbital pain) and the Decision Tree and Multilayer Perceptron (MLP) models achieved the best results in decision metrics, with an accuracy of 98%. Therefore, a tree-based model would be suitable for building an application and implementing it on smartphones. This resource would be available to healthcare professionals such as doctors and nurses.
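The variable-ranking step described in this abstract can be illustrated with a minimal sketch. It assumes binary (present/absent) variables and a plug-in estimate of mutual information from observed frequencies; the abstract does not say which estimator the authors used. The data below are made up for illustration, and the "cough" column is a hypothetical variable that is not among those reported in the study.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

def rank_features(columns, labels):
    """Return (name, MI) pairs sorted by decreasing MI with the label."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy, invented data: 1 = symptom present / case confirmed, 0 = absent / discarded.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "fever":    [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the label
    "headache": [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "cough":    [1, 0, 1, 0, 1, 0, 1, 0],  # statistically independent of the label
}
ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

A filter of this kind only orders the variables. Choosing how many to keep, and which classifier to train on them, is a separate step that the study handled with a 70/30 train/test split.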
Affiliation(s)
- Bianca Conrad Bohm
- Laboratory of Veterinary Epidemiology, Postgraduate Program in Veterinary, Federal University of Pelotas (UFPel), Capão do Leão, RS, Brazil.
| | | | - Suellen Caroline Matos Silva
- Laboratory of Veterinary Epidemiology, Postgraduate Program in Veterinary, Federal University of Pelotas (UFPel), Capão do Leão, RS, Brazil
| | - Alessandra Talaska Soares
- Laboratory of Veterinary Epidemiology, Graduate Program in Microbiology and Parasitology, Federal University of Pelotas, Capão do Leão, Rio Grande do Sul, Brazil
| | | | - Vinícius Silva Belo
- Federal University of São João del-Rei, Midwest Dona Lindu campus, Divinópolis, Minas Gerais, Brazil
| | - Julia Somavilla Lignon
- Laboratory of Veterinary Epidemiology, Postgraduate Program in Veterinary, Federal University of Pelotas (UFPel), Capão do Leão, RS, Brazil
| | - Fábio Raphael Pascoti Bruhn
- Laboratory of Veterinary Epidemiology, Preventive Veterinary Department, Federal University of Pelotas, Capão do Leão, Rio Grande do Sul, Brazil
| |
|
26
|
Zamanian H, Shalbaf A. Estimation of non-alcoholic steatohepatitis (NASH) disease using clinical information based on the optimal combination of intelligent algorithms for feature selection and classification. Comput Methods Biomech Biomed Engin 2024; 27:964-979. [PMID: 37254745 DOI: 10.1080/10255842.2023.2217978] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/24/2022] [Accepted: 05/12/2023] [Indexed: 06/01/2023]
Abstract
The early diagnosis of NASH disease can decrease the risk of proceeding elements and treatment costs for patients. This study aims to present an optimal combination of intelligent algorithms using advanced machine learning methods, including different feature selections and classifications based on clinical data and blood factors. In this work, data were collected from 176 patients to investigate NASH disease, and 19 features were extracted. We then sought the best combination of features using different feature selection algorithms: Feature Forward Selection (FFS), Minimum Redundancy Maximum Relevance (MRMR), and Mutual Information (MI). Finally, we used nine classifier frameworks with different mathematical mechanisms, including random forest (RF), logistic regression (LR), Linear Discriminant Analysis (LDA), AdaBoost, K nearest neighbors (KNN), multilayer perceptron model (MLP), support vector machine (SVM), and decision tree (DT), to estimate NASH disease. Our investigation revealed that the combination of dominant features, namely body mass index (BMI), glutamic pyruvic transaminase (GPT), total cholesterol (TC), high-density lipoprotein (HDL), Ezetimibe, lipoprotein level Lp(a), log_e(Lp(a)), total triglyceride (TG), Creatinine (Cre), HbA1c, Fibrate, and Sex, selected by the MRMR algorithm and classified by the RF method, provides the best trade-off between computational effort and performance, with accuracy, AUC, precision, and recall of 81.51 ± 9.35, 82.53 ± 11.24, 85.28 ± 9.68, and 89.49 ± 7.92, respectively. This study investigated the configuration of feature selection and classifier that is most suitable for classifying NASH disease based on clinical data and blood factors. The proposed intelligent algorithm based on MRMR and the RF classifier can automatically diagnose NASH disease with appropriate performance and present an initial report without any further invasive methods. It also clarifies the diagnostic process and, as a result, the continuation of their prevention and treatment cycle.
Affiliation(s)
- Hamed Zamanian
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Ahmad Shalbaf
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| |
|
27
|
Shukla S, Deo BS, Vishwakarma C, Mishra S, Ahirwar S, Sah AN, Pandey K, Singh S, Prasad SN, Padhi AK, Pal M, Panigrahi PK, Pradhan A. A smartphone-based standalone fluorescence spectroscopy tool for cervical precancer diagnosis in clinical conditions. J Biophotonics 2024; 17:e202300468. [PMID: 38494870 DOI: 10.1002/jbio.202300468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/07/2024] [Accepted: 02/07/2024] [Indexed: 03/19/2024]
Abstract
Real-time prediction of the severity of noncommunicable diseases like cancers is a boon for early diagnosis and timely cure. Optical techniques, due to their minimally invasive nature, provide better alternatives in this context than conventional techniques. The present study describes a standalone, field-portable smartphone-based device that can classify different grades of cervical cancer on the basis of the spectral differences captured in their intrinsic fluorescence spectra, using AI/ML techniques. In this study, a total of 75 patients and volunteers from hospitals at different geographical locations in India have been tested and classified with this device. A classification approach employing a hybrid mutual information long short-term memory model has been applied to categorize various subject groups, resulting in an average accuracy, specificity, and sensitivity of 96.56%, 96.76%, and 94.37%, respectively, using 10-fold cross-validation. This exploratory study demonstrates the potential of combining smartphone-based technology with fluorescence spectroscopy and artificial intelligence as a diagnostic screening approach that could enhance the detection and screening of cervical cancer.
Affiliation(s)
- Shivam Shukla
- Center for Lasers and Photonics, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
| | - Bhaswati Singha Deo
- Center for Lasers and Photonics, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
| | - Chaitanya Vishwakarma
- Center for Lasers and Photonics, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
| | - Subrata Mishra
- Center for Lasers and Photonics, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
| | - Shikha Ahirwar
- PhotoSpIMeDx Pvt. Ltd., Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
| | - Amar Nath Sah
- Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
| | - Kiran Pandey
- Obstetrics and Gynecology Department, GSVM Medical College Kanpur, Kanpur, Uttar Pradesh, India
| | - Sweta Singh
- Department of Obstetrics and Gynecology, AIIMS Bhubaneswar, Bhubaneswar, Odisha, India
| | - S N Prasad
- Radiation Oncology Department, J.K. Cancer Institute Kanpur, Kanpur, Uttar Pradesh, India
| | - Ashok Kumar Padhi
- Gynecologic Oncology Department, Acharya Harihar Regional Cancer Research Centre, Cuttack, Odisha, India
| | - Mayukha Pal
- ABB Ability Innovation Center, Asea Brown Boveri Company, Hyderabad, India
| | - Prasanta K Panigrahi
- Department of Physical Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, West Bengal, India
- Centre for Quantum Science and Technology, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, India
| | - Asima Pradhan
- Center for Lasers and Photonics, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
- PhotoSpIMeDx Pvt. Ltd., Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
- Department of Physics, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
| |
|
28
|
Jia Y, Hu X, Kang W, Dong X. Unveiling Microbial Nitrogen Metabolism in Rivers using a Machine Learning Approach. Environ Sci Technol 2024; 58:6605-6615. [PMID: 38566483 DOI: 10.1021/acs.est.3c09653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Microbial nitrogen metabolism is a complicated and key process in mediating environmental pollution and greenhouse gas emissions in rivers. However, the interactive drivers of microbial nitrogen metabolism in rivers have not been identified. Here, we analyze the microbial nitrogen metabolism patterns in 105 rivers in China driven by 26 environmental and socioeconomic factors using an interpretable causal machine learning (ICML) framework. ICML better recognizes the complex relationships between factors and microbial nitrogen metabolism than traditional linear regression models. Furthermore, tipping points and concentration windows were proposed to precisely regulate microbial nitrogen metabolism. For example, concentrations of dissolved organic carbon (DOC) below tipping points of 6.2 and 4.2 mg/L easily reduce bacterial denitrification and nitrification, respectively. The concentration windows for NO₃⁻-N (15.9-18.0 mg/L) and DOC (9.1-10.8 mg/L) enabled the highest abundance of denitrifying bacteria on a national scale. The integration of ICML models and field data clarifies the important drivers of microbial nitrogen metabolism, supporting the precise regulation of nitrogen pollution and river ecological management.
Affiliation(s)
- Yuying Jia
- Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
| | - Xiangang Hu
- Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
| | - Weilu Kang
- Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
| | - Xu Dong
- Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
| |
|
29
|
Zhang J, Lin Y, Jiang M, Li S, Tang Y, Long J, Weng J, Tan KC. Fast Multilabel Feature Selection via Global Relevance and Redundancy Optimization. IEEE Trans Neural Netw Learn Syst 2024; 35:5721-5734. [PMID: 36215379 DOI: 10.1109/tnnls.2022.3208956] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Information-theoretic methods have attracted great attention in recent years and achieved promising results for multilabel feature selection (MLFS). Nevertheless, most existing methods search for important features heuristically, and they may also fail to fully utilize labeling information. Thus, they are likely to deliver a suboptimal result with a heavy computational burden. In this article, we propose a general optimization framework, global relevance and redundancy optimization (GRRO), to solve the learning problem. The main technical contribution of GRRO is a formulation for MLFS that takes feature relevance, label relevance (i.e., label correlation), and feature redundancy into account, which avoids repetitive entropy calculations and obtains a global optimal solution efficiently. To further improve efficiency, we extend GRRO to filter out inessential labels and features, thus facilitating fast MLFS. We call the extension GRROfast, in which the key insights are twofold: 1) promising labels and related relevant features are investigated to reduce ineffective calculations in terms of features, even labels, and 2) the framework of GRRO is reconstructed to generate the optimal result with an ensemble. Moreover, our proposed algorithms have an excellent mechanism for exploiting the inherent properties of multilabel data; specifically, we provide a formulation to enhance the proposal with label-specific features. Extensive experiments clearly reveal the effectiveness and efficiency of our proposed algorithms.
|
30
|
Bhamidipati K, Muppidi S, Reddy PVB, Merugula S. Soil Moisture and Heat Level Prediction for Plant Health Monitoring Using Deep Learning with Gannet Namib Beetle Optimization in IoT. Appl Biochem Biotechnol 2024; 196:2289-2317. [PMID: 37535216 DOI: 10.1007/s12010-023-04636-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/01/2023] [Indexed: 08/04/2023]
Abstract
Plant health monitoring is crucial in ensuring a constant food supply to satisfy the growing demand for food. Hence, it is essential to monitor plant health to maximize the yield and minimize the risk of various diseases. Soil moisture and temperature are of critical importance in plant growth, and predicting them enables farmers to take preventive actions, thereby mitigating the issues affecting plant health. This work presents a plant health monitoring approach by forecasting soil moisture and heat levels by collecting data in an Internet of Things (IoT) environment. Here, for transmitting the soil data acquired by the IoT nodes, a cluster head (CH) selection and routing technique using Gannet Namib beetle optimization (GNBO) is used. The data is routed to a prediction module, wherein soil moisture and heat levels are predicted by Convolutional long short term memory (Conv-LSTM). Furthermore, the hyperparameters of the Conv-LSTM are optimized by the GNBO algorithm. The efficiency of the GNBO-Conv-LSTM is examined based on link life time (LLT), energy, delay, distance, negative predictive value (NPV), positive predictive value (PPV), and true negative rate (TNR) and is observed to have achieved values of 0.675, 0.478 J, 0.092 ms, 50.200 m, 0.885, 0.882, and 0.875, correspondingly.
Affiliation(s)
- Kishore Bhamidipati
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India.
| | - Satish Muppidi
- Department of Computer Science and Engineering, GMR Institute of Technology, Rajam, Andhra Pradesh, India
| | - P V Bhaskar Reddy
- School of Computer Science and Engineering, REVA University, Bangalore, India
| | - Suneetha Merugula
- Department of CSE, GITAM School of Technology, GITAM (Deemed to be University), Visakhapatnam, India
| |
|
31
|
Burton T, Fathieh F, Nemati N, Gillins HR, Shadforth IP, Ramchandani S, Bridges CR. Development of a Non-Invasive Machine-Learned Point-of-Care Rule-Out Test for Coronary Artery Disease. Diagnostics (Basel) 2024; 14:719. [PMID: 38611631 PMCID: PMC11012183 DOI: 10.3390/diagnostics14070719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 03/22/2024] [Accepted: 03/25/2024] [Indexed: 04/14/2024] Open
Abstract
The current standard of care for coronary artery disease (CAD) requires an intake of radioactive or contrast enhancement dyes, radiation exposure, and stress and may take days to weeks for referral to gold-standard cardiac catheterization. The CAD diagnostic pathway would greatly benefit from a test to assess for CAD that enables the physician to rule it out at the point of care, thereby enabling the exploration of other diagnoses more rapidly. We sought to develop a test using machine learning to assess for CAD with a rule-out profile, using an easy-to-acquire signal (without stress/radiation) at the point of care. Given the historic disparate outcomes between sexes and urban/rural geographies in cardiology, we targeted equal performance across sexes in a geographically accessible test. Noninvasive photoplethysmogram and orthogonal voltage gradient signals were simultaneously acquired in a representative clinical population of subjects before invasive catheterization for those with CAD (gold-standard for the confirmation of CAD) and coronary computed tomographic angiography for those without CAD (excellent negative predictive value). Features were measured from the signal and used in machine learning to predict CAD status. The machine-learned algorithm achieved a sensitivity of 90% and specificity of 59%. The rule-out profile was maintained across both sexes, as well as all other relevant subgroups. A test to assess for CAD using machine learning on a noninvasive signal has been successfully developed, showing high performance and rule-out ability. Confirmation of the performance on a large clinical, blinded, enrollment-gated dataset is required before implementation of the test in clinical practice.
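A short worked example may help relate the reported sensitivity and specificity to the "rule-out" framing. The sketch below assumes a hypothetical cohort of 100 subjects with CAD and 100 without; these counts are invented to reproduce the 90% and 59% figures and are not the study's data. Negative predictive value (NPV) depends on disease prevalence, so the NPV computed here holds only for this assumed 50% prevalence.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics for a binary diagnostic test."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of diseased subjects flagged positive
        "specificity": tn / (tn + fp),  # fraction of healthy subjects flagged negative
        "npv": tn / (tn + fn),          # probability of no disease given a negative test
        "ppv": tp / (tp + fp),          # probability of disease given a positive test
    }

# Hypothetical cohort (assumption): 100 with CAD, 100 without.
m = diagnostic_metrics(tp=90, fn=10, tn=59, fp=41)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With high sensitivity, few diseased subjects receive a negative result, which is what makes a negative test useful for ruling the disease out. The modest specificity means many positives still need confirmatory testing.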
Affiliation(s)
- Timothy Burton
- Analytics for Life, Toronto, ON M5X 1C9, Canada; (T.B.); (F.F.); (N.N.)
| | - Farhad Fathieh
- Analytics for Life, Toronto, ON M5X 1C9, Canada; (T.B.); (F.F.); (N.N.)
| | - Navid Nemati
- Analytics for Life, Toronto, ON M5X 1C9, Canada; (T.B.); (F.F.); (N.N.)
| | | | | | - Shyam Ramchandani
- Analytics for Life, Toronto, ON M5X 1C9, Canada; (T.B.); (F.F.); (N.N.)
| | | |
|
32
|
Luo M, Zhu J, Jia J, Zhang H, Zhao J. Progress on network modeling and analysis of gut microecology: a review. Appl Environ Microbiol 2024; 90:e0009224. [PMID: 38415584 PMCID: PMC11207142 DOI: 10.1128/aem.00092-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024] Open
Abstract
The gut microecological network is a complex microbial community within the human body that plays a key role in linking dietary nutrition and host physiology. To understand the complex relationships among microbes and their functions within this community, network analysis has emerged as a powerful tool. By representing the interactions between microbes and their associated omics data as a network, we can gain a comprehensive understanding of the ecological mechanisms that drive the human gut microbiota. In addition, the network-based approach provides a more intuitive analysis of the gut microbiota, simplifying the study of its complex dynamics and interdependencies. This review provides a comprehensive overview of the methods used to construct and analyze networks in the context of gut microecological background. We discuss various types of network modeling approaches, including co-occurrence networks, causal networks, dynamic networks, and multi-omics networks, and describe the analytical techniques used to identify important network properties. We also highlight the challenges and limitations of network modeling in this area, such as data scarcity and heterogeneity, and provide future research directions to overcome these limitations. By exploring these network-based methods, researchers can gain valuable insights into the intricate relationships and functional roles of microbial communities within the gut, ultimately advancing our understanding of the gut microbiota's impact on human health.
Affiliation(s)
- Meng Luo
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
| | - Jinlin Zhu
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
| | - Jiajia Jia
- Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, China
| | - Hao Zhang
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi, Jiangsu, China
- Wuxi Translational Medicine Research Center, Jiangsu Translational Medicine Research Institute Wuxi Branch, Wuxi, China
- (Yangzhou) Institute of Food Biotechnology, Jiangnan University, Yangzhou, China
| | - Jianxin Zhao
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- Wuxi Translational Medicine Research Center, Jiangsu Translational Medicine Research Institute Wuxi Branch, Wuxi, China
- (Yangzhou) Institute of Food Biotechnology, Jiangnan University, Yangzhou, China
| |
|
33
|
Rodrigues J, Amin A, Chandra S, Mulla NJ, Nayak GS, Rai S, Ray S, Mahato KK. Machine Learning Enabled Photoacoustic Spectroscopy for Noninvasive Assessment of Breast Tumor Progression In Vivo: A Preclinical Study. ACS Sens 2024; 9:589-601. [PMID: 38288735 PMCID: PMC10897932 DOI: 10.1021/acssensors.3c01085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 11/25/2023] [Accepted: 01/17/2024] [Indexed: 02/24/2024]
Abstract
Breast cancer causes more cancer-related deaths among women than any other cancer. However, early diagnosis of the disease can help increase survival rates. The existing breast cancer diagnosis tools do not support the early diagnosis of the disease. Therefore, there is a great need to develop early diagnostic tools for this cancer. Photoacoustic spectroscopy (PAS), being very sensitive to biochemical changes, can be relied upon for its application in detecting breast tumors in vivo. With this motivation, in the current study, an aseptic chamber integrated photoacoustic (PA) probe was designed and developed to monitor breast tumor progression in vivo, established in nude mice. The device served the dual purpose of transporting tumor-bearing animals to the laboratory from the animal house and performing PA experiments in the same chamber, maintaining sterility. Breast tumors were induced in the nude mice by injection of MCF-7 cells, and the corresponding PA spectra were recorded at different time points (day 0, 5, 10, 15, and 20) of tumor progression in vivo in the same animals. The recorded photoacoustic spectra were subsequently preprocessed, wavelet-transformed, and subjected to a filter-based feature selection algorithm. The top 20 features selected by the minimum redundancy maximum relevance (mRMR) algorithm were then used to build an input feature matrix for machine learning (ML)-based classification of the data. The classification models demonstrated 100% specificity, with sensitivities of 95, 100, 92.5, and 85% for the time points day 5, 10, 15, and 20, respectively. These results suggest the potential of PA signal-based classification of breast tumor progression in a preclinical model. The PA signal contains information on the biochemical changes associated with disease progression, emphasizing its translational strength toward early disease diagnosis.
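The filter step named in this abstract (ranking features by minimum redundancy maximum relevance and keeping the top ones) can be illustrated with a small sketch. This is not the authors' implementation: it assumes discrete feature values, uses the difference form of the mRMR criterion (relevance minus mean redundancy), and the names `mutual_information`, `mrmr`, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(features, labels, k):
    """Greedy mRMR over a list of feature columns; returns the indices of k columns."""
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:

        def score(j):
            # Relevance to the labels minus mean redundancy with already-selected columns.
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)  # ties resolve to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): column 1 duplicates column 0, column 2 carries new information.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
selected = mrmr(features, labels, 2)
print(selected)
```

In this toy case all three columns are equally relevant on their own, so the redundancy term is what makes the second pick skip the duplicate column. Continuous features such as wavelet coefficients would need a different mutual-information estimator (for example binning), which this sketch leaves out.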
Affiliation(s)
- Jackson Rodrigues
- Department of Biophysics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Karnataka, Manipal 576104, India
| | - Ashwini Amin
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
| | - Subhash Chandra
- Department of Biophysics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Karnataka, Manipal 576104, India
| | - Nitufa J. Mulla
- Department of Biophysics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Karnataka, Manipal 576104, India
| | - G. Subramanya Nayak
- Department of Electronics and Communication, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
| | - Sharada Rai
- Department of Pathology, Kasturba Medical College Mangalore, Manipal Academy of Higher Education, Karnataka, Manipal 576104, India
| | - Satadru Ray
- Department of Surgery, Kasturba Medical College, Manipal Academy of Higher Education, Karnataka, Manipal 576104, India
| | - Krishna Kishore Mahato
- Department of Biophysics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Karnataka, Manipal 576104, India
| |
|
34
|
Kreuze JF, Ramírez DA, Fuentes S, Loayza H, Ninanya J, Rinza J, David M, Gamboa S, De Boeck B, Diaz F, Pérez A, Silva L, Campos H. High-throughput characterization and phenotyping of resistance and tolerance to virus infection in sweetpotato. Virus Res 2024; 339:199276. [PMID: 38006786 PMCID: PMC10751700 DOI: 10.1016/j.virusres.2023.199276] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/24/2023] [Revised: 11/14/2023] [Accepted: 11/16/2023] [Indexed: 11/27/2023]
Abstract
Breeders have made important efforts to develop genotypes able to resist virus attacks in sweetpotato, a major crop providing food security and poverty alleviation to smallholder farmers in many regions of Sub-Saharan Africa, Asia and Latin America. However, a lack of accurate objective quantitative methods for this selection target in sweetpotato prevents a consistent and extensive assessment of large breeding populations. In this study, an approach to characterize and classify resistance in sweetpotato was established by assessing total yield loss and virus load after infection with the three most common viruses (SPFMV, SPCSV, SPLCV). Twelve sweetpotato genotypes with contrasting reactions to virus infection were grown in the field under three different treatments: pre-infected by the three viruses, un-infected and protected from re-infection, and un-infected but exposed to natural infection. Virus loads were assessed using ELISA, (RT-)qPCR, and loop-mediated isothermal amplification (LAMP) methods, and also through multispectral reflectance and canopy temperature collected using an unmanned aerial vehicle. Total yield reduction compared to control and the arithmetic sum of (RT-)qPCR relative expression ratios were used to classify genotypes into four categories: resistant, tolerant, susceptible, and sensitive. Using 14 remote sensing predictors, machine learning algorithms were trained to classify all plots under these categories. The study found that remotely sensed predictors were effective in discriminating the different virus response categories. The results suggest that machine learning on remotely sensed data, complemented by fast and sensitive LAMP assays to confirm predicted classifications, could be used as a high-throughput approach to support virus resistance phenotyping in sweetpotato breeding.
Affiliation(s)
- Jan F Kreuze
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - David A Ramírez
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Segundo Fuentes
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Hildo Loayza
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru; Programa academico de ingenieria ambiental, Universidad de Huanuco, Jr. Hermilio Valdizan N° 871, Huanuco, Peru.
| | - Johan Ninanya
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Javier Rinza
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Maria David
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Soledad Gamboa
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Bert De Boeck
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Federico Diaz
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Ana Pérez
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Luis Silva
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| | - Hugo Campos
- International Potato Center (CIP), Headquarters, P.O. Box 1558, Lima 15024, Peru.
| |
|
35
|
Urmi WF, Uddin MN, Uddin MA, Talukder MA, Hasan MR, Paul S, Chanda M, Ayoade J, Khraisat A, Hossen R, Imran F. A stacked ensemble approach to detect cyber attacks based on feature selection techniques. International Journal of Cognitive Computing in Engineering 2024; 5:316-331. [DOI: 10.1016/j.ijcce.2024.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2024]
|
36
|
Abdelwahab MM, Al-Karawi KA, Semary HE. Deep Learning-Based Prediction of Alzheimer's Disease Using Microarray Gene Expression Data. Biomedicines 2023; 11:3304. [PMID: 38137524 PMCID: PMC10741889 DOI: 10.3390/biomedicines11123304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/02/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023] Open
Abstract
Alzheimer's disease is a genetically complex disorder, and microarray technology provides valuable insights into it. However, the high dimensionality of microarray datasets and small sample sizes pose challenges. Gene selection techniques have emerged as a promising solution to this challenge, potentially revolutionizing AD diagnosis. The study aims to investigate deep learning techniques, specifically neural networks, in predicting Alzheimer's disease using microarray gene expression data. The goal is to develop a reliable predictive model for early detection and diagnosis, potentially improving patient care and intervention strategies. This study employed gene selection techniques, including Singular Value Decomposition (SVD) and Principal Component Analysis (PCA), to pinpoint pertinent genes within microarray datasets. Leveraging deep learning principles, we harnessed a Convolutional Neural Network (CNN) as our classifier for Alzheimer's disease (AD) prediction. Our approach involved the utilization of a seven-layer CNN with diverse configurations to process the dataset. Empirical outcomes on the AD dataset underscored the effectiveness of the PCA-CNN model, yielding an accuracy of 96.60% and a loss of 0.3503. Likewise, the SVD-CNN model showcased remarkable accuracy, attaining 97.08% and a loss of 0.2466. These results accentuate the potential of our method for gene dimension reduction and classification accuracy enhancement by selecting a subset of pertinent genes. Integrating gene selection methodologies with deep learning architectures presents a promising framework for elevating AD prediction and promoting precision medicine in neurodegenerative disorders. Ongoing research endeavors aim to generalize this approach for diverse applications, explore alternative gene selection techniques, and investigate a variety of deep learning architectures.
Affiliation(s)
- Mahmoud M. Abdelwahab
- Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia;
- Department of Basic Sciences, Higher Institute of Administrative Sciences, Belbeis 44621, Egypt
| | - Khamis A. Al-Karawi
- School of Science, Engineering and Environment, Salford University, Salford M5 4WT, UK;
- College of Veterinary Medicine, Diyala University, Baquba 32001, Iraq
| | - Hatem E. Semary
- Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia;
- Department of Statistics and Insurance, Faculty of Commerce, Zagazig University, Zagazig 44519, Egypt
| |
|
37
|
Hossain MA, Islam MS. A novel hybrid feature selection and ensemble-based machine learning approach for botnet detection. Sci Rep 2023; 13:21207. [PMID: 38040793 PMCID: PMC10692109 DOI: 10.1038/s41598-023-48230-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 11/23/2023] [Indexed: 12/03/2023] Open
Abstract
In the age of sophisticated cyber threats, botnet detection remains a crucial yet complex security challenge. Existing detection systems are continually outmaneuvered by the relentless advancement of botnet strategies, necessitating a more dynamic and proactive approach. Our research introduces a ground-breaking solution to the persistent botnet problem through a strategic amalgamation of Hybrid Feature Selection methods (Categorical Analysis, Mutual Information, and Principal Component Analysis) and a robust ensemble of machine learning techniques. We uniquely combine these feature selection tools to refine the input space, enhancing the detection capabilities of the ensemble learners. Extra Trees, as the ensemble technique of choice, exhibits exemplary performance, culminating in a near-perfect 99.99% accuracy rate in botnet classification across varied datasets. Our model not only surpasses previous benchmarks but also demonstrates exceptional adaptability to new botnet phenomena, ensuring persistent accuracy in a landscape of evolving threats. Detailed comparative analyses manifest our model's superiority, consistently achieving over 99% True Positive Rates and an unprecedented False Positive Rate close to 0.00%, thereby setting a new precedent for reliability in botnet detection. This research signifies a transformative step in cybersecurity, offering unprecedented precision and resilience against botnet infiltrations, and providing an indispensable blueprint for the development of next-generation security frameworks.
Affiliation(s)
- Md Alamgir Hossain
- Institute of Information and Communication Technology (IICT), Bangladesh University of Engineering and Technology (BUET), Dhaka, 1000, Bangladesh.
| | - Md Saiful Islam
- Institute of Information and Communication Technology (IICT), Bangladesh University of Engineering and Technology (BUET), Dhaka, 1000, Bangladesh
| |
|
38
|
Felefly T, Roukoz C, Fares G, Achkar S, Yazbeck S, Meyer P, Kordahi M, Azoury F, Nasr DN, Nasr E, Noël G, Francis Z. An Explainable MRI-Radiomic Quantum Neural Network to Differentiate Between Large Brain Metastases and High-Grade Glioma Using Quantum Annealing for Feature Selection. J Digit Imaging 2023; 36:2335-2346. [PMID: 37507581 PMCID: PMC10584786 DOI: 10.1007/s10278-023-00886-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 06/11/2023] [Accepted: 07/17/2023] [Indexed: 07/30/2023] Open
Abstract
Solitary large brain metastases (LBM) and high-grade gliomas (HGG) are sometimes hard to differentiate on MRI. The management differs significantly between these two entities, and non-invasive methods that help differentiate between them are eagerly needed to avoid potentially morbid biopsies and surgical procedures. We explore herein the performance and interpretability of an MRI-radiomics variational quantum neural network (QNN) using a quantum-annealing mutual-information (MI) feature selection approach. We retrospectively included 423 patients with HGG and LBM (> 2 cm) who had a contrast-enhanced T1-weighted (CE-T1) MRI between 2012 and 2019. After exclusion, 72 HGG and 129 LBM were kept. Tumors were manually segmented, and a 5-mm peri-tumoral ring was created. MRI images were pre-processed, and 1813 radiomic features were extracted. A set of best features based on MI was selected. MI and conditional MI were embedded into a quadratic unconstrained binary optimization (QUBO) formulation that was mapped to an Ising model and submitted to D-Wave's quantum annealer to solve for the best combination of 10 features. The 10 selected features were embedded into a 2-qubit QNN using the PennyLane library. The model was evaluated for balanced accuracy (bACC) and area under the receiver operating characteristic curve (ROC-AUC) on the test set. The model performance was benchmarked against two classical models: dense neural networks (DNN) and extreme gradient boosting (XGB). Shapley values were calculated to interpret sample-wise predictions on the test set. The best 10-feature combination included 6 tumor and 4 ring features. For QNN, DNN, and XGB, respectively, training ROC-AUC was 0.86, 0.95, and 0.94; test ROC-AUC was 0.76, 0.75, and 0.79; and test bACC was 0.74, 0.73, and 0.72. The two most influential features were tumor Laplacian-of-Gaussian-GLRLM-Entropy and sphericity.
We developed an accurate interpretable QNN model with quantum-informed feature selection to differentiate between LBM and HGG on CE-T1 brain MRI. The model performance is comparable to state-of-the-art classical models.
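To make the QUBO idea in this abstract concrete, the sketch below poses "pick k features" as a quadratic unconstrained binary optimization and solves a tiny instance by exhaustive search. It is an illustration under stated assumptions, not the paper's formulation: the abstract uses MI and conditional MI and a quantum annealer, whereas here the objective is a simpler relevance-minus-redundancy form, the cardinality constraint is a quadratic penalty, the solver is brute force, and every number and name (`build_qubo`, `energy`, `solve_brute_force`) is made up.

```python
from itertools import product


def build_qubo(relevance, redundancy, k, penalty):
    """Upper-triangular QUBO for: minimize -sum(rel_i x_i) + sum_{i<j}(red_ij x_i x_j)
    + penalty * (sum(x) - k)^2, with the constant penalty * k^2 dropped."""
    n = len(relevance)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Linear terms sit on the diagonal because x_i^2 == x_i for binary x_i.
        Q[i][i] = -relevance[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            Q[i][j] = redundancy[i][j] + 2 * penalty
    return Q


def energy(Q, x):
    """QUBO energy x^T Q x for an upper-triangular Q and a binary tuple x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def solve_brute_force(Q):
    """Exhaustive minimization; only feasible for a handful of variables."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: energy(Q, x))


# Hypothetical scores: features 0 and 1 are both relevant but highly redundant.
relevance = [0.9, 0.8, 0.5, 0.1]
redundancy = [
    [0.0, 0.7, 0.1, 0.0],
    [0.7, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
Q = build_qubo(relevance, redundancy, k=2, penalty=5.0)
best = solve_brute_force(Q)
print(best)
```

The penalty weight has to be large enough that violating the "exactly k" constraint never pays off; the value 5.0 is an arbitrary choice that satisfies this for these toy numbers. An annealer would replace `solve_brute_force`, since exhaustive search over 1813 features is out of reach.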
Affiliation(s)
- Tony Felefly
- Radiation Oncology Department, Hôtel-Dieu de France Hospital, Saint Joseph University, Beirut, Lebanon.
- ICube Laboratory, University of Strasbourg, Strasbourg, France.
- Radiation Oncology Department, Hôtel-Dieu de Lévis, Lévis, QC, Canada.
| | - Camille Roukoz
- Radiation Oncology Department, Hôtel-Dieu de France Hospital, Saint Joseph University, Beirut, Lebanon
| | - Georges Fares
- Radiation Oncology Department, Hôtel-Dieu de France Hospital, Saint Joseph University, Beirut, Lebanon
- Physics Department, Saint Joseph University, Beirut, Lebanon
| | - Samir Achkar
- Radiation Oncology Department, Gustave Roussy Cancer Campus, 94805, Villejuif, France
| | - Sandrine Yazbeck
- Department of Radiology, University of Maryland School of Medicine, 655 W Baltimore St S, Baltimore, MD, 21201, USA
| | - Philippe Meyer
- Medical Physics Department, Institut de Cancérologie de Strasbourg (ICANS), 67200, Strasbourg, France
- IMAGeS Unit, IRIS Platform, ICube, University of Strasbourg, 67085, Strasbourg Cedex, France
| | | | - Fares Azoury
- Radiation Oncology Department, Hôtel-Dieu de France Hospital, Saint Joseph University, Beirut, Lebanon
| | - Dolly Nehme Nasr
- Radiation Oncology Department, Hôtel-Dieu de France Hospital, Saint Joseph University, Beirut, Lebanon
| | - Elie Nasr
- Radiation Oncology Department, Hôtel-Dieu de France Hospital, Saint Joseph University, Beirut, Lebanon
| | - Georges Noël
- Radiotherapy Department, Institut de Cancérologie de Strasbourg (ICANS), 67200, Strasbourg, France
- Radiobiology Department, IMIS Unit, IRIS Platform, ICube, University of Strasbourg, 67085, Strasbourg Cedex, France
- Faculty of Medicine, University of Strasbourg, 67000, Strasbourg, France
| | - Ziad Francis
- Physics Department, Saint Joseph University, Beirut, Lebanon
| |
|
39
|
Lee H, Lee Y, Jo M, Nam S, Jo J, Lee C. Enhancing Diagnosis of Rotating Elements in Roll-to-Roll Manufacturing Systems through Feature Selection Approach Considering Overlapping Data Density and Distance Analysis. Sensors (Basel, Switzerland) 2023; 23:7857. [PMID: 37765913 PMCID: PMC10534779 DOI: 10.3390/s23187857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/01/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Roll-to-roll manufacturing systems, which use thin and flexible substrates, have been widely adopted for their cost-effectiveness, eco-friendliness, and mass-production capabilities. However, in these systems, defects in rotating components such as the rollers and bearings can result in severe defects in the functional layers. Therefore, the development of an intelligent diagnostic model is crucial for effectively identifying these rotating component defects. This study proposes feature partial density, a quantitative feature-selection method for developing high-efficiency diagnostic models. The feature combinations extracted from the measured signals were evaluated based on the partial density, which is the density of the remaining data excluding the highest class in overlapping regions, and the Mahalanobis distance by class, to assess the classification performance of the models. The validity of the proposed algorithm was verified through the construction of ranked model groups and comparison with existing feature-selection methods. The high-ranking group selected by the algorithm outperformed the other groups in terms of training time, accuracy, and positive predictive value. Moreover, the top feature combination demonstrated superior performance across all indicators compared to existing methods.
Affiliation(s)
- Haemi Lee
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
| | - Yoonjae Lee
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
| | - Minho Jo
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
| | - Sanghoon Nam
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Jeongdai Jo
- Department of Printed Electronics, Korea Institute of Machinery and Materials, 156, Gajeongbuk-ro, Yuseong-gu, Daejeon 34103, Republic of Korea
| | - Changwoo Lee
- Department of Mechanical and Aerospace Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
| |
|
40
|
Statsenko Y, Babushkin V, Talako T, Kurbatova T, Smetanina D, Simiyu GL, Habuza T, Ismail F, Almansoori TM, Gorkom KNV, Szólics M, Hassan A, Ljubisavljevic M. Automatic Detection and Classification of Epileptic Seizures from EEG Data: Finding Optimal Acquisition Settings and Testing Interpretable Machine Learning Approach. Biomedicines 2023; 11:2370. [PMID: 37760815 PMCID: PMC10525492 DOI: 10.3390/biomedicines11092370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 07/13/2023] [Accepted: 07/21/2023] [Indexed: 09/29/2023] Open
Abstract
Deep learning (DL) is emerging as a successful technique for automatic detection and differentiation of spontaneous seizures that may otherwise be missed or misclassified. Herein, we propose a system architecture based on top-performing DL models for binary and multigroup classifications with the non-overlapping window technique, which we tested on the TUSZ dataset. The system accurately detects seizure episodes (87.7% Sn, 91.16% Sp) and carefully distinguishes eight seizure types (95-100% Acc). An increase in EEG sampling rate from 50 to 250 Hz boosted model performance: the precision of seizure detection rose by 5%, and seizure differentiation by 7%. A low sampling rate is a reasonable solution for training reliable models with EEG data. Decreasing the number of EEG electrodes from 21 to 8 did not affect seizure detection but worsened seizure differentiation significantly: 98.24 ± 0.17 vs. 85.14 ± 3.14% recall. In detecting epileptic episodes, all electrodes provided equally informative input, but in seizure differentiation, their informative value varied. We improved model explainability with interpretable ML. Activation maximization highlighted the presence of EEG patterns specific to eight seizure types. Cortical projection of epileptic sources depicted differences between generalized and focal seizures. Interpretable ML techniques confirmed that our system recognizes biologically meaningful features as indicators of epileptic activity in EEG.
Affiliation(s)
- Yauhen Statsenko
- Radiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
- Medical Imaging Platform, ASPIRE Precision Medicine Research Institute Abu Dhabi, Al Ain P.O. Box 15551, United Arab Emirates
- Big Data Analytics Center, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Vladimir Babushkin
- Radiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Tatsiana Talako
- Radiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
- Department of Oncohematology, Minsk Scientific and Practical Center for Surgery, Transplantology and Hematology, 220089 Minsk, Belarus
| | - Tetiana Kurbatova
- Radiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Darya Smetanina
- Radiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Gillian Lylian Simiyu
- Radiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Tetiana Habuza
- Big Data Analytics Center, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
- Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Fatima Ismail
- Pediatric Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Taleb M. Almansoori
- Radiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Klaus N.-V. Gorkom
- Radiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Miklós Szólics
- Neurology Division, Medicine Department, Tawam Hospital, Al Ain P.O. Box 15258, United Arab Emirates
- Internal Medicine Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Ali Hassan
- Neurology Division, Medicine Department, Tawam Hospital, Al Ain P.O. Box 15258, United Arab Emirates
| | - Milos Ljubisavljevic
- Physiology Department, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates;
- Neuroscience Platform, ASPIRE Precision Medicine Research Institute Abu Dhabi, Al Ain P.O. Box 15551, United Arab Emirates
| |
|
41
|
Habib MA, O’Sullivan JJ, Abolfathi S, Salauddin M. Enhanced wave overtopping simulation at vertical breakwaters using machine learning algorithms. PLoS One 2023; 18:e0289318. [PMID: 37585387 PMCID: PMC10431617 DOI: 10.1371/journal.pone.0289318] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/29/2023] [Accepted: 07/16/2023] [Indexed: 08/18/2023] Open
Abstract
Accurate prediction of wave overtopping at sea defences remains central to the protection of lives, livelihoods, and infrastructural assets in coastal zones. In addressing the increased risks of rising sea levels and more frequent storm surges, robust methods for assessing and predicting overtopping are increasingly important. Methods for predicting overtopping have typically relied on empirical relations based on physical modelling and numerical simulation data. In recent years, with advances in computational efficiency, data-driven techniques including advanced Machine Learning (ML) methods have become more readily applicable. However, the methodological appropriateness and performance evaluation of ML techniques for predicting wave overtopping at vertical seawalls have not been extensively studied. This study examines the predictive performance of four ML techniques, namely Random Forest (RF), Gradient Boosted Decision Trees (GBDT), Support Vector Machines-Regression (SVR), and Artificial Neural Network (ANN), for overtopping discharge at vertical seawalls. The ML models are developed using data from the EurOtop (2018) database. Hyperparameter tuning is performed to tailor the algorithms to the intrinsic features of the dataset. Feature Transformation and advanced Feature Selection methods are adopted to reduce data redundancy and overfitting. Comprehensive statistical analysis shows superior performance of the RF method, followed in turn by the GBDT, SVR, and ANN models. In addition, Decision Tree (DT) based methods such as GBDT and RF are shown to be more computationally efficient than SVR and ANN, with GBDT performing simulations more rapidly than the other methods. This study shows that ML approaches can be adopted as a reliable and computationally effective method for evaluating wave overtopping at vertical seawalls across a wide range of hydrodynamic and structural conditions.
Affiliation(s)
- M. A. Habib
- UCD Dooge Centre for Water Resources Research, School of Civil Engineering, University College Dublin, Dublin, Ireland
| | - J. J. O’Sullivan
- UCD Dooge Centre for Water Resources Research, School of Civil Engineering, University College Dublin, Dublin, Ireland
- UCD Earth Institute, University College Dublin, Dublin, Ireland
| | - S. Abolfathi
- School of Engineering, University of Warwick, Coventry, United Kingdom
| | - M. Salauddin
- UCD Dooge Centre for Water Resources Research, School of Civil Engineering, University College Dublin, Dublin, Ireland
- UCD Earth Institute, University College Dublin, Dublin, Ireland
| |
|
42
|
Fabietti M, Mahmud M, Lotfi A, Leparulo A, Fontana R, Vassanelli S, Fasolato C. Early Detection of Alzheimer's Disease From Cortical and Hippocampal Local Field Potentials Using an Ensembled Machine Learning Model. IEEE Trans Neural Syst Rehabil Eng 2023; 31:2839-2848. [PMID: 37347628 DOI: 10.1109/tnsre.2023.3288835] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/24/2023]
Abstract
Early diagnosis of Alzheimer's disease (AD) is a very challenging problem and has been attempted through data-driven methods in recent years. However, considering the inherent complexity in decoding higher cognitive functions from spontaneous neuronal signals, these data-driven methods benefit from the incorporation of multimodal data. This work proposes an ensembled machine learning model with explainability (EXML) to detect subtle patterns in cortical and hippocampal local field potential signals (LFPs) that can be considered as a potential marker for AD in the early stage of the disease. The LFPs acquired from healthy and two types of AD animal models (n = 10 each) using linear multielectrode probes were endorsed by electrocardiogram and respiration signals for their veracity. Feature sets were generated from LFPs in temporal, spatial and spectral domains and were fed into selected machine-learning models for each domain. Using late fusion, the EXML model achieved an overall accuracy of 99.4%. This provided insights into the amyloid plaque deposition process as early as 3 months of the disease onset by identifying the subtle patterns in the network activities. Lastly, the individual and ensemble models were found to be robust when evaluated by randomly masking channels to mimic the presence of artefacts.
|
43
|
Abd Elaziz M, Ouadfel S, Ibrahim RA. Boosting capuchin search with stochastic learning strategy for feature selection. Neural Comput Appl 2023; 35:14061-14080. [DOI: 10.1007/s00521-023-08400-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 02/13/2023] [Indexed: 09/02/2023]
Abstract
The technological revolution has made available a large amount of data with many irrelevant and noisy features that alter the analysis process and increase processing time. Therefore, feature selection (FS) approaches are used to select the smallest subset of relevant features. Feature selection is viewed as an optimization process for which meta-heuristics have been successfully applied. Thus, in this paper, a new feature selection approach is proposed based on an enhanced version of the Capuchin search algorithm (CapSA). In the developed FS approach, named ECapSA, three modifications have been introduced to avoid the lack of diversity and premature convergence of the basic CapSA: (1) the inertia weight is adjusted using the logistic map, (2) sine cosine acceleration coefficients are added to improve convergence, and (3) a stochastic learning strategy and a Lévy random walk are used to add more diversity to the movement of the capuchins. To demonstrate the performance of ECapSA, different datasets are used, and it is compared with other well-known FS methods. The results provide evidence of the superiority of ECapSA across the tested datasets and over competing methods in terms of performance metrics.
|
44
|
Wang Y, Gao X, Ru X, Sun P, Wang J. The Weight-Based Feature Selection (WBFS) Algorithm Classifies Lung Cancer Subtypes Using Proteomic Data. Entropy (Basel, Switzerland) 2023; 25:1003. [PMID: 37509950 PMCID: PMC10378569 DOI: 10.3390/e25071003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 06/27/2023] [Accepted: 06/28/2023] [Indexed: 07/30/2023]
Abstract
Feature selection plays an important role in improving the performance of classification or reducing the dimensionality of high-dimensional datasets, such as high-throughput genomics/proteomics data in bioinformatics. As a popular approach with computational efficiency and scalability, information theory has been widely incorporated into feature selection. In this study, we propose a weight-based feature selection (WBFS) algorithm that assesses selected features and candidate features to identify the key protein biomarkers for classifying lung cancer subtypes from The Cancer Proteome Atlas (TCPA) database. We further explored the survival analysis between selected biomarkers and subtypes of lung cancer. Results show good performance of the combination of our WBFS method and a Bayesian network for mining potential biomarkers. These candidate signatures have valuable biological significance in tumor classification and patient survival analysis. Taken together, this study proposes the WBFS method, which helps to explore candidate biomarkers from biomedical datasets and provides useful information for tumor diagnosis or therapy strategies.
Affiliation(s)
- Yangyang Wang
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
| | - Xiaoguang Gao
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
| | - Xinxin Ru
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
| | - Pengzhan Sun
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
| | - Jihan Wang
- Xi'an Key Laboratory of Stem Cell and Regenerative Medicine, Institute of Medical Research, Northwestern Polytechnical University, Xi'an 710072, China
| |
|
45
|
Apte S, Falbriard M, Meyer F, Millet GP, Gremeaux V, Aminian K. Estimation of horizontal running power using foot-worn inertial measurement units. Front Bioeng Biotechnol 2023; 11:1167816. [PMID: 37425358 PMCID: PMC10324974 DOI: 10.3389/fbioe.2023.1167816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 06/02/2023] [Indexed: 07/11/2023] Open
Abstract
Feedback of power during running is a promising tool for training and determining pacing strategies. However, current power estimation methods show low validity and are not customized for running on different slopes. To address this issue, we developed three machine-learning models to estimate peak horizontal power for level, uphill, and downhill running using gait spatiotemporal parameters, accelerometer, and gyroscope signals extracted from foot-worn IMUs. The prediction was compared to reference horizontal power obtained during running on a treadmill with an embedded force plate. For each model, we trained an elastic net and a neural network and validated it with a dataset of 34 active adults across a range of speeds and slopes. For the uphill and level running, the concentric phase of the gait cycle was considered, and the neural network model led to the lowest error (median ± interquartile range) of 1.7% ± 12.5% and 3.2% ± 13.4%, respectively. The eccentric phase was considered relevant for downhill running, wherein the elastic net model provided the lowest error of 1.8% ± 14.1%. Results showed a similar performance across a range of different speed/slope running conditions. The findings highlighted the potential of using interpretable biomechanical features in machine learning models for estimating horizontal power. The simplicity of the models makes them suitable for implementation on embedded systems with limited processing and energy storage capacity. The proposed method meets the requirements for applications needing accurate near real-time feedback and complements existing gait analysis algorithms based on foot-worn IMUs.
Affiliation(s)
- Salil Apte
- Laboratory of Movement Analysis and Measurement, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Mathieu Falbriard
- Laboratory of Movement Analysis and Measurement, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Frédéric Meyer
- Digital Signal Processing Group, Department of Informatics, University of Oslo, Oslo, Norway
- Institute of Sport Sciences, University of Lausanne, Lausanne, Switzerland
| | - Grégoire P. Millet
- Institute of Sport Sciences, University of Lausanne, Lausanne, Switzerland
| | - Vincent Gremeaux
- Institute of Sport Sciences, University of Lausanne, Lausanne, Switzerland
- Sport Medicine Unit, Division of Physical Medicine and Rehabilitation, Swiss Olympic Medical Center, Lausanne University Hospital, Lausanne, Switzerland
| | - Kamiar Aminian
- Laboratory of Movement Analysis and Measurement, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| |
|
46
|
Rezapour M, Niazi MKK, Gurcan MN. Machine learning-based analytics of the impact of the Covid-19 pandemic on alcohol consumption habit changes among United States healthcare workers. Sci Rep 2023; 13:6003. [PMID: 37046069 PMCID: PMC10092930 DOI: 10.1038/s41598-023-33222-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/18/2023] [Accepted: 04/10/2023] [Indexed: 04/14/2023] Open
Abstract
The COVID-19 pandemic is a global health concern that has spread around the globe. Machine learning is promising in the fight against the COVID-19 pandemic. Machine learning and artificial intelligence have been employed by various healthcare providers, scientists, and clinicians in medical industries in the fight against COVID-19 disease. In this paper, we discuss the impact of the COVID-19 pandemic on alcohol consumption habit changes among healthcare workers in the United States during the first wave of the pandemic. We utilize multiple supervised and unsupervised machine learning methods and models, such as decision trees, logistic regression, support vector machines, multilayer perceptron, XGBoost, CatBoost, LightGBM, AdaBoost, the Chi-Squared Test, mutual information, KModes clustering and the synthetic minority oversampling technique, on mental health survey data obtained from the University of Michigan Inter-University Consortium for Political and Social Research to investigate the links between COVID-19-related deleterious effects and changes in alcohol consumption habits among healthcare workers. Through the interpretation of the supervised and unsupervised methods, we have concluded that healthcare workers whose children stayed home during the first wave in the US consumed more alcohol. We also found that the work schedule changes due to the COVID-19 pandemic led to a change in alcohol use habits. Changes in food consumption, age, gender, geographical characteristics, changes in sleep habits, the amount of news consumption, and screen time are also important predictors of an increase in alcohol use among healthcare workers in the United States.
Affiliation(s)
- Mostafa Rezapour
- Center for Biomedical Informatics, Wake Forest University School of Medicine, Winston-Salem, NC, USA.
| | | | - Metin Nafi Gurcan
- Center for Biomedical Informatics, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| |
|
47
|
Uddin MG, Nash S, Rahman A, Olbert AI. A sophisticated model for rating water quality. The Science of the Total Environment 2023; 868:161614. [PMID: 36669667 DOI: 10.1016/j.scitotenv.2023.161614] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 10/20/2022] [Revised: 01/04/2023] [Accepted: 01/10/2023] [Indexed: 06/17/2023]
Abstract
Here, we present the Irish Water Quality Index (IEWQI) model for assessing transitional and coastal water quality in an effort to improve the method and develop a tool that can be used by environmental regulators to abate water pollution in Ireland. The developed model has been associated with the adoption of water quality standards formulated for coastal and transitional waterbodies according to the Water Framework Directive legislation by the environmental regulator of Irish water. The model consists of five components: (i) an indicator selection technique to select the crucial water quality indicators; (ii) a sub-index (SI) function for rescaling various water quality indicators' information into a uniform scale; (iii) an indicator weighting method for estimating the weight values based on the relative significance of real-time information on water quality; (iv) an aggregation function for computing the water quality index (WQI) score; and (v) a score interpretation scheme for assessing the state of water quality. The IEWQI model was developed based on Cork Harbour, Ireland. The developed IEWQI model was applied to four coastal waterbodies in Ireland, for assessing water quality using 2021 water quality data for the summer and winter seasons in order to evaluate model sensitivity in terms of spatio-temporal resolution of various waterbodies. The model efficiency and uncertainty were also analysed in this research. In terms of different spatio-temporal magnitudes of various domains, the model shows higher sensitivity in four application domains during the summer and winter. In addition, the results of uncertainty reveal that the IEWQI model architecture may be effective for reducing model uncertainty in order to avoid model eclipsing and ambiguity problems.
The findings of this study reveal that the IEWQI model could be an efficient and reliable technique for the assessment of transitional and coastal water quality more accurately in any geospatial domain.
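The sub-index, weighting, aggregation, and interpretation steps named in the abstract above can be illustrated with a minimal sketch. It is not the IEWQI model itself: the linear rescaling, the weighted arithmetic mean, the class thresholds, and all function names are assumptions chosen only to show how such components typically fit together.

```python
def sub_index(value, worst, best):
    """Linearly rescale a raw indicator value to the range [0, 100].

    `worst` maps to 0 and `best` maps to 100; values outside the
    range are clamped. Linear rescaling is an assumption here.
    """
    score = 100.0 * (value - worst) / (best - worst)
    return max(0.0, min(100.0, score))


def aggregate(sub_indices, weights):
    """Combine sub-index scores with a weighted arithmetic mean (assumed form)."""
    total_weight = sum(weights[name] for name in sub_indices)
    weighted_sum = sum(weights[name] * sub_indices[name] for name in sub_indices)
    return weighted_sum / total_weight


def interpret(score):
    """Map a WQI score to a class label; the thresholds are hypothetical."""
    if score >= 80.0:
        return "good"
    if score >= 50.0:
        return "fair"
    return "poor"


# Hypothetical indicators: (measured value, worst value, best value).
readings = {
    "dissolved_oxygen": (8.0, 0.0, 10.0),   # higher is better
    "turbidity": (2.0, 10.0, 0.0),          # lower is better
}
weights = {"dissolved_oxygen": 0.6, "turbidity": 0.4}

scores = {name: sub_index(*args) for name, args in readings.items()}
wqi = aggregate(scores, weights)
label = interpret(wqi)
```

Swapping `worst` and `best` handles indicators for which lower values are better, so one rescaling function covers both directions.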
Affiliation(s)
- Md Galal Uddin
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland.
| | - Stephen Nash
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland
| | - Azizur Rahman
- School of Computing, Mathematics and Engineering, Charles Sturt University, Wagga Wagga, Australia; The Gulbali Institute of Agriculture, Water and Environment, Charles Sturt University, Wagga Wagga, Australia
| | - Agnieszka I Olbert
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland
| |
|
48
|
Balakarthikeyan V, Jais R, Vijayarangan S, Sreelatha Premkumar P, Sivaprakasam M. Heart Rate Variability Based Estimation of Maximal Oxygen Uptake in Athletes Using Supervised Regression Models. Sensors (Basel, Switzerland) 2023; 23:3251. [PMID: 36991963 PMCID: PMC10054075 DOI: 10.3390/s23063251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 03/04/2023] [Accepted: 03/09/2023] [Indexed: 06/19/2023]
Abstract
Wearable heart rate monitors are used in sports to provide physiological insights into athletes' well-being and performance. Their unobtrusive nature and ability to provide reliable heart rate measurements facilitate the estimation of cardiorespiratory fitness of athletes, as quantified by maximal oxygen uptake. Previous studies have employed data-driven models which use heart rate information to estimate the cardiorespiratory fitness of athletes. This signifies the physiological relevance of heart rate and heart rate variability for the estimation of maximal oxygen uptake. In this work, the heart rate variability features that were extracted from both exercise and recovery segments were fed to three different machine learning models to estimate maximal oxygen uptake of 856 athletes performing Graded Exercise Testing. A total of 101 features from exercise and 30 features from recovery segments were given as input to three feature selection methods to avoid overfitting of the models and to obtain relevant features. This resulted in an increase in the models' accuracy of 5.7% for exercise and 4.3% for recovery. Further, post-modelling analysis was performed to remove the deviant points in two cases, initially in both the training and testing sets and then only in the training set, using k-Nearest Neighbour. In the former case, the removal of deviant points led to a reduction of 19.3% and 18.0% in overall estimation error for exercise and recovery, respectively. In the latter case, which mimicked the real-world scenario, the average R value of the models was observed to be 0.72 and 0.70 for exercise and recovery, respectively. From the above experimental approach, the utility of heart rate variability to estimate maximal oxygen uptake of a large population of athletes was validated. Additionally, the proposed work contributes to the utility of cardiorespiratory fitness assessment of athletes through wearable heart rate monitors.
Affiliation(s)
- Vaishali Balakarthikeyan
- Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600036, India
- Healthcare Technology Innovation Centre (HTIC), Chennai 600113, India
| | - Rohan Jais
- Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600036, India
| | - Sricharan Vijayarangan
- Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600036, India
- Healthcare Technology Innovation Centre (HTIC), Chennai 600113, India
| | | | - Mohanasankar Sivaprakasam
- Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600036, India
- Healthcare Technology Innovation Centre (HTIC), Chennai 600113, India
| |
|
49
|
Mutual information based weighted variance approach for uncertainty quantification of climate projections. MethodsX 2023; 10:102063. [PMID: 36851983 PMCID: PMC9958507 DOI: 10.1016/j.mex.2023.102063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Accepted: 02/04/2023] [Indexed: 02/07/2023] Open
Abstract
Future climate projections are a vital source of information that aid in deriving effective mitigation and adaptation measures. Due to the inherent uncertainty in these climate projections, quantification of uncertainty is essential for increasing their credibility in policymaking. While quantifying the uncertainty, the possible dependency between the General Circulation Models (GCMs), which arises from their shared model code, literature, ideas of representation processes, parameterization schemes, evaluation datasets, etc., is often ignored. As this will lead to wrong conclusions, the inter-model dependency and the respective independence weights need to be considered for a realistic quantification of uncertainty. Here, we present the detailed step-wise methodology of a "mutual information based independence weight" framework that accounts for the linear and nonlinear dependence between GCMs and the equitability property.
• A brief illustration of the utility of this method is provided by applying it to the multi-model ensemble of 20 GCMs.
• The weighted variance approach seemingly reduces the uncertainty about one GCM given the knowledge of another.
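As a rough illustration of what a weighted variance over an ensemble looks like, the sketch below computes it from per-model projections and per-model weights. The abstract does not state how the independence weights are derived from mutual information, so the weights here are hypothetical inputs, and the function name and example numbers are assumptions.

```python
def weighted_variance(values, weights):
    """Weighted variance of ensemble projections.

    `values` holds one projection per model and `weights` one
    non-negative weight per model. Weights are normalised to sum
    to 1, so equal weights give the ordinary population variance.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = [w / total for w in weights]
    mean = sum(w * v for w, v in zip(norm, values))
    return sum(w * (v - mean) ** 2 for w, v in zip(norm, values))


# Hypothetical projections from four models (e.g. warming in degrees C).
projections = [2.0, 2.2, 2.1, 3.5]

# Equal weights versus hypothetical weights that down-weight the
# first three models, assumed here to be mutually dependent.
equal = weighted_variance(projections, [1.0, 1.0, 1.0, 1.0])
adjusted = weighted_variance(projections, [0.5, 0.5, 0.5, 1.0])
```

Comparing `equal` with `adjusted` shows how the spread attributed to the ensemble changes once dependent models no longer count as fully separate evidence.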
|
50
|
Abu-Aqil G, Suleiman M, Sharaha U, Lapidot I, Huleihel M, Salman A. Instant detection of extended-spectrum β-lactamase-producing bacteria from the urine of patients using infrared spectroscopy combined with machine learning. Analyst 2023; 148:1130-1140. [PMID: 36727471 DOI: 10.1039/d2an01897g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
Antibiotics are considered the most effective treatment against bacterial infections. However, most bacteria have already developed resistance to a broad spectrum of commonly used antibiotics, mainly due to their uncontrolled use. Extended-spectrum beta-lactamase (ESBL)-producing bacteria are an essential class of multidrug-resistant (MDR) bacteria. It is of extreme urgency to develop a method that can detect ESBL-producing bacteria rapidly for the effective treatment of patients with bacterial infectious diseases. Fourier transform infrared (FTIR) microscopy is a sensitive method that can rapidly detect cellular molecular changes. In this study, we examined the potential of FTIR spectroscopy-based machine learning algorithms for the rapid detection of ESBL-producing bacteria obtained directly from a patient's urine. Using 591 ESBL-producing and 1658 non-ESBL-producing samples of Escherichia coli (E. coli) and Klebsiella pneumoniae, our results show that the FTIR spectroscopy-based machine learning approach can identify ESBL-producing bacteria within 40 minutes of receiving a patient's urine sample, with a success rate of 80%.
Affiliation(s)
- George Abu-Aqil
- Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel.
| | - Manal Suleiman
- Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel.
| | - Uraib Sharaha
- Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel; Department of Biology, Science and Technology College, Hebron University, Hebron P760, Palestine
| | - Itshak Lapidot
- Department of Electrical and Electronics Engineering, ACLP-Afeka Center for Language Processing, Afeka Tel-Aviv Academic College of Engineering, Tel-Aviv 69107, Israel
| | - Mahmoud Huleihel
- Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel.
| | - Ahmad Salman
- Department of Physics, SCE - Shamoon College of Engineering, Beer-Sheva 84100, Israel.
| |
|