51
Ding H, Li N, Li L, Xu Z, Xia W. Machine learning-enabled mental health risk prediction for youths with stressful life events: A modelling study. J Affect Disord 2025; 368:537-546. [PMID: 39306010 DOI: 10.1016/j.jad.2024.09.111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 09/10/2024] [Accepted: 09/15/2024] [Indexed: 09/25/2024]
Abstract
BACKGROUND Youths face significant mental health challenges exacerbated by stressful life events, particularly in the context of the COVID-19 pandemic. Immature coping strategies can worsen mental health outcomes. METHODS This study utilised a two-wave cross-sectional survey design with data collected from Chinese youth aged 14-25 years. Wave 1 (N = 3038) and Wave 2 (N = 539) datasets were used for model development and external validation, respectively. Twenty-five features, encompassing dimensions related to demographic information, stressful life events, social support, coping strategies, and emotional intelligence, were input into the model to predict the mental health status of youth, which was considered their coping outcome. Shapley additive explanation (SHAP) was used to determine the importance of each risk factor during feature selection. The intersection of the top 10 features identified by random forest and XGBoost was considered the set of most influential predictors of mental health and was taken as the final feature set for model development. Machine learning models, including logistic regression, AdaBoost, and a backpropagation neural network (BPNN), were trained to predict the outcomes. The optimum model was selected according to performance in both internal and external validation. RESULTS This study identified six key features that were significantly associated with mental health outcomes: punishment, adaptation issues, self-regulation of emotions, learning pressure, use of social support, and recognition of others' emotions. The BPNN model, optimized through feature selection methods like SHAP, demonstrated superior performance in internal validation (C-index [95% CI] = 0.9120 [0.9111, 0.9129], F-score [95% CI] = 0.8861 [0.8853, 0.8869]). Additionally, external validation showed the model had strong discrimination (C-index = 0.9749, F-score = 0.8442) and calibration (Brier score = 0.029) capabilities. LIMITATIONS Although the clinical prediction model performed well, the study is still limited by self-reported data and the representativeness of the samples. Causal relationships need to be established to interpret the coping mechanism from multiple perspectives. Also, the limited data on minority groups may lead to algorithmic unfairness. CONCLUSIONS Machine learning models effectively identified and predicted mental health outcomes among youths, with the SHAP+BPNN model showing promising clinical applicability. These findings emphasise the importance and effectiveness of targeted interventions with the help of a clinical prediction model.
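The consensus step in the abstract above (keeping only features ranked in the top 10 by both models) and the Brier score used for calibration can be illustrated with a minimal Python sketch. The importance values, the predictions, and the function names `top_k`, `consensus_features`, and `brier_score` are hypothetical and are not taken from the study; the study's own SHAP computation is not reproduced here.

```python
def top_k(importance, k):
    """Return the k feature names with the largest importance values."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])


def consensus_features(importance_a, importance_b, k=10):
    """Features that appear in the top k of both rankings, sorted by name."""
    return sorted(top_k(importance_a, k) & top_k(importance_b, k))


def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


# Hypothetical mean |SHAP| values per feature (illustrative only).
rf_importance = {"punishment": 0.31, "learning_pressure": 0.22,
                 "adaptation": 0.18, "age": 0.05}
xgb_importance = {"punishment": 0.28, "adaptation": 0.25,
                  "age": 0.20, "learning_pressure": 0.10}

# k=3 here only because the toy example has four features.
selected = consensus_features(rf_importance, xgb_importance, k=3)
print(selected)

# Hypothetical predicted probabilities and observed outcomes.
print(brier_score([0.1, 0.9, 0.8, 0.3], [0, 1, 1, 0]))
```

A lower Brier score indicates that predicted probabilities sit closer to the observed outcomes; a perfectly calibrated and perfectly confident model would score 0.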
Affiliation(s)
- Hexiao Ding
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China; Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hunghom, Hong Kong SAR, China.
- Na Li
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China.
- Lishan Li
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China.
- Ziruo Xu
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China.
- Wei Xia
  - School of Nursing, Sun Yat-Sen University, No. 74, 2nd Yat-Sen Rd, Yuexiu District, Guangzhou City, Guangdong Province, China.
52
Shahin-Shamsabadi A, Cappuccitti J. Proteomics and machine learning: Leveraging domain knowledge for feature selection in a skeletal muscle tissue meta-analysis. Heliyon 2024; 10:e40772. [PMID: 39720035 PMCID: PMC11667615 DOI: 10.1016/j.heliyon.2024.e40772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 10/22/2024] [Accepted: 11/27/2024] [Indexed: 12/26/2024] Open
Abstract
Omics techniques, such as proteomics, contain crucial data for understanding biological processes, but they remain underutilized due to their high dimensionality. Typically, proteomics research focuses narrowly on a limited number of datasets, hindering cross-study comparisons, a problem that can potentially be addressed by machine learning. Despite this potential, machine learning has seen limited adoption in the field of proteomics. Here, skeletal muscle proteomics datasets from five separate studies were combined. These studies included conditions such as in vitro models (both 2D and 3D), in vivo skeletal muscle tissue, and adjacent tissues such as tendons. The collected data were preprocessed using MaxQuant and then enriched using a Python script fetching structural and compositional details from the UniProt and Ensembl databases. This information was used to handle the high-dimensional and sparsely labeled dataset by breaking it down into five smaller categories based on cellular composition and then training a Random Forest model for each category separately. Using biological context for interpreting the data resulted in improved model performance and made tailored analysis possible by reducing the dimensionality, increasing the signal-to-noise ratio, and preserving only biologically relevant features in each category. This integration of domain knowledge into data analysis and model training facilitated the discovery of new patterns while ensuring the retention of critical details, which are often overlooked when blind feature selection methods are used to exclude proteins with minimal expression or variance. This approach was shown to be suitable for performing diverse analyses on individual as well as combined datasets within a broader biological context, ultimately leading to the identification of biologically relevant patterns. Besides generating new biological insights, this approach can be used to perform tasks such as biomarker discovery, cluster analysis, classification, and anomaly detection more accurately, but incorporation of more datasets is needed to further expand the computational capabilities of such models in clinical settings.
53
Özkahraman A, Ölmez T, Dokur Z. Performance Improvement with Reduced Number of Channels in Motor Imagery BCI System. Sensors (Basel, Switzerland) 2024; 25:120. [PMID: 39796911 PMCID: PMC11723053 DOI: 10.3390/s25010120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Revised: 12/19/2024] [Accepted: 12/23/2024] [Indexed: 01/13/2025]
Abstract
Classifying Motor Imagery (MI) Electroencephalogram (EEG) signals is of vital importance for Brain-Computer Interface (BCI) systems, but challenges remain. A key challenge is to reduce the number of channels to improve flexibility, portability, and computational efficiency, especially in multi-class scenarios where more channels are needed for accurate classification. This study demonstrates that combining Electrooculogram (EOG) channels with a reduced set of EEG channels is more effective than relying on a large number of EEG channels alone. EOG channels provide useful information for MI signal classification, countering the notion that they only introduce eye-related noise. The study uses advanced deep learning techniques, including multiple 1D convolution blocks and depthwise-separable convolutions, to optimize classification accuracy. The findings in this study are tested on two datasets: dataset 1, the BCI Competition IV Dataset IIa (4-class MI), and dataset 2, the Weibo dataset (7-class MI). Dataset 1, utilizing 3 EEG and 3 EOG channels (6 channels total), reaches 83% accuracy, while dataset 2, with 3 EEG and 2 EOG channels (5 channels total), achieves an accuracy of 61%, demonstrating the effectiveness of the proposed channel reduction method and deep learning model.
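To give a sense of why depthwise-separable convolutions suit compact models like the one described above, the following Python sketch compares weight counts for a standard 1D convolution and a depthwise-separable one. The layer sizes (16 output channels, kernel length 8) are assumptions chosen for illustration and are not the authors' architecture; biases are ignored.

```python
def standard_conv1d_params(c_in, c_out, kernel):
    """Weights in a standard 1D convolution: every output channel
    has its own kernel for every input channel."""
    return c_in * c_out * kernel


def separable_conv1d_params(c_in, c_out, kernel):
    """Weights in a depthwise-separable 1D convolution: one kernel per
    input channel (depthwise), then a 1x1 convolution (pointwise)."""
    depthwise = c_in * kernel
    pointwise = c_in * c_out
    return depthwise + pointwise


# 6 input channels mirrors the 3 EEG + 3 EOG setup; the other sizes are assumed.
c_in, c_out, kernel = 6, 16, 8
print(standard_conv1d_params(c_in, c_out, kernel))
print(separable_conv1d_params(c_in, c_out, kernel))
```

For these assumed sizes the separable layer needs 144 weights instead of 768; the saving grows with kernel length and channel count.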
Affiliation(s)
- Ali Özkahraman
  - Department of Electronics and Communication Engineering, Istanbul Technical University, 34467 Istanbul, Istanbul, Turkey
  - Department of Electrical and Electronics Engineering, Iskenderun Technical University, 31200 Iskenderun, Hatay, Turkey
- Tamer Ölmez
  - Department of Electronics and Communication Engineering, Istanbul Technical University, 34467 Istanbul, Istanbul, Turkey
- Zümray Dokur
  - Department of Electronics and Communication Engineering, Istanbul Technical University, 34467 Istanbul, Istanbul, Turkey
54
Lisik D, Basna R, Dinh T, Hennig C, Shah SA, Wennergren G, Goksör E, Nwaru BI. Artificial intelligence in pediatric allergy research. Eur J Pediatr 2024; 184:98. [PMID: 39706990 PMCID: PMC11662037 DOI: 10.1007/s00431-024-05925-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 12/06/2024] [Accepted: 12/11/2024] [Indexed: 12/23/2024]
Abstract
Atopic dermatitis, food allergy, allergic rhinitis, and asthma are among the most common diseases in childhood. They are heterogeneous diseases, can co-exist in their development, and manifest complex associations with other disorders and environmental and hereditary factors. Elucidating these intricacies by identifying clinically distinguishable groups and actionable risk factors will allow for better understanding of the diseases, which will enhance clinical management and benefit society and affected individuals and families. Artificial intelligence (AI) is a promising tool in this context, enabling discovery of meaningful patterns in complex data. Numerous studies within pediatric allergy have used and continue to use AI, primarily to characterize disease endotypes/phenotypes and to develop models to predict future disease outcomes. However, most implementations have used relatively simplistic data from one source, such as questionnaires. In addition, methodological approaches and reporting are lacking. This review provides a practical hands-on guide for conducting AI-based studies in pediatric allergy, including (1) an introduction to essential AI concepts and techniques, (2) a blueprint for structuring analysis pipelines (from selection of variables to interpretation of results), and (3) an overview of common pitfalls and remedies. Furthermore, the state of the art in the implementation of AI in pediatric allergy research, as well as implications and future perspectives, are discussed.
CONCLUSION AI-based solutions will undoubtedly transform pediatric allergy research, as showcased by promising findings and innovative technical solutions, but to fully harness the potential, methodologically robust implementation of more advanced techniques on richer data will be needed.
WHAT IS KNOWN
• Pediatric allergies are heterogeneous and common, inflicting substantial morbidity and societal costs.
• The field of artificial intelligence is undergoing rapid development, with increasing implementation in various fields of medicine and research.
WHAT IS NEW
• Promising applications of AI in pediatric allergy have been reported, but implementation largely lags behind other fields, particularly in regard to use of advanced algorithms and non-tabular data. Furthermore, inadequate reporting on computational approaches hampers evidence synthesis and critical appraisal.
• Multi-center collaborations with multi-omics and rich unstructured data as well as utilization of deep learning algorithms are lacking and will likely provide the most impactful discoveries.
Affiliation(s)
- Daniil Lisik
  - Krefting Research Centre, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Box 424, 405 30, Gothenburg, Sweden.
- Rani Basna
  - Krefting Research Centre, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Box 424, 405 30, Gothenburg, Sweden
  - Division of Geriatric Medicine, Department of Clinical Sciences in Malmö, Lund University, 214 28, Malmö, Sweden
- Tai Dinh
  - CMC University, No. 11, Duy Tan Street, Dich Vong Hau Ward, Cau Giay District, Hanoi, Vietnam
  - The Kyoto College of Graduate Studies for Informatics, 7 Tanaka Monzencho, Sakyo Ward, Kyoto, Japan
- Christian Hennig
  - Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Bologna, Italy
- Göran Wennergren
  - Department of Paediatrics, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Emma Goksör
  - Department of Paediatrics, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Bright I Nwaru
  - Krefting Research Centre, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Box 424, 405 30, Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
55
Xie Z, Zhang Q, Zhang R, Zhao Y, Zhang W, Song Y, Yu D, Lin J, Li X, Suo S, Zhou Y. Identification of D842V mutation in gastrointestinal stromal tumors based on CT radiomics: a multi-center study. Cancer Imaging 2024; 24:169. [PMID: 39707515 DOI: 10.1186/s40644-024-00815-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 12/10/2024] [Indexed: 12/23/2024] Open
Abstract
BACKGROUND Gastrointestinal stromal tumors (GISTs) are the most common mesenchymal tumors of the gastrointestinal tract. The recent advent of tyrosine kinase inhibitors (TKIs) has significantly improved the prognosis of GIST patients. However, responses to TKI therapy can vary depending on the specific gene mutation. D842V, which is the most common mutation in platelet-derived growth factor receptor alpha exon 18, shows no response to imatinib and sunitinib. Radiomics features based on venous-phase contrast-enhanced computed tomography (CECT) have shown potential in non-invasive prediction of GIST genotypes. This study sought to determine whether radiomics features could help distinguish GISTs with D842V mutations. METHODS A total of 872 pathologically confirmed GIST patients with CECT data available from three independent centers were included and divided into the training cohort (n = 487) and the external validation cohort (n = 385). Clinical features including age, sex, tumor size and location were collected. Radiomics features on the largest axial image of venous-phase CECT were analyzed, and two radiomics features were retained after feature selection. Random forest models trained on non-radiomics features only (the non-radiomics model) and on both non-radiomics and radiomics features (the combined model) were compared. RESULTS The combined model showed better average precision (0.250 vs. 0.102, p = 0.039) and F1 score (0.253 vs. 0.155, p = 0.012) than the non-radiomics model. There was no significant difference in ROC-AUC (0.728 vs. 0.737, p = 0.836) or geometric mean (0.737 vs. 0.681, p = 0.352). CONCLUSIONS This study demonstrated the potential of radiomics features based on venous-phase CECT images to identify D842V mutation in GISTs. Our model may provide an alternative approach to guide TKI therapy for patients without access to sequence variant testing, potentially improving treatment outcomes for GIST patients, especially in resource-limited settings.
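The abstract above reports the F1 score and the geometric mean alongside ROC-AUC. As a brief reminder of how the first two are derived from a confusion matrix, here is a Python sketch; the counts are invented for illustration and do not come from the study. The geometric mean is taken here as the square root of sensitivity times specificity, which is the usual definition for imbalanced classification but is an assumption about the authors' usage.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in count form."""
    return 2 * tp / (2 * tp + fp + fn)


def geometric_mean(tp, fp, fn, tn):
    """Square root of sensitivity times specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts for a rare positive class (10 positives in 100 cases).
tp, fp, fn, tn = 8, 12, 2, 78
print(round(f1_score(tp, fp, fn), 3))
print(round(geometric_mean(tp, fp, fn, tn), 3))
```

With a rare positive class, F1 depends on precision and therefore on the number of false positives relative to true positives, whereas the geometric mean depends only on the two per-class rates. This is one reason the two metrics can move differently for the same model.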
Affiliation(s)
- Zhenhui Xie
  - Department of Radiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Pujian Road 160, Pudong District, 200127, Shanghai, China
- Qingwei Zhang
  - Division of Gastroenterology and Hepatology, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Institute of Digestive Disease, Pujian Road 160, Pudong District, 200127, Shanghai, China
- Ranying Zhang
  - Department of Radiology, Zhongshan Hospital, Fudan University, and Shanghai Institute of Medical Imaging, 108 Fenglin Road, 200032, Shanghai, China
- Yuxuan Zhao
  - Department of Radiology, Qilu Hospital of Shandong University, 107 Wenhuaxi Road, 250012, Jinan, Shandong, China
- Wang Zhang
  - Department of Radiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Pujian Road 160, Pudong District, 200127, Shanghai, China
- Yang Song
  - MR Research Collaboration Team, Siemens Healthineers Ltd., 399 Haiyang West Road, 200126, Shanghai, China
- Dexin Yu
  - Department of Radiology, Qilu Hospital of Shandong University, 107 Wenhuaxi Road, 250012, Jinan, Shandong, China.
- Jiang Lin
  - Department of Radiology, Zhongshan Hospital, Fudan University, and Shanghai Institute of Medical Imaging, 108 Fenglin Road, 200032, Shanghai, China.
- Xiaobo Li
  - Division of Gastroenterology and Hepatology, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Institute of Digestive Disease, Pujian Road 160, Pudong District, 200127, Shanghai, China.
- Shiteng Suo
  - Department of Radiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Pujian Road 160, Pudong District, 200127, Shanghai, China.
- Yan Zhou
  - Department of Radiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Pujian Road 160, Pudong District, 200127, Shanghai, China.
56
Yan Y, Wang J, Wang Y, Wu W, Chen W. Research on Lipidomic Profiling and Biomarker Identification for Osteonecrosis of the Femoral Head. Biomedicines 2024; 12:2827. [PMID: 39767733 PMCID: PMC11673004 DOI: 10.3390/biomedicines12122827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Revised: 12/03/2024] [Accepted: 12/10/2024] [Indexed: 01/11/2025] Open
Abstract
Objectives: Abnormal lipid metabolism is increasingly recognized as a contributing factor to the development of osteonecrosis of the femoral head (ONFH). This study aimed to explore the lipidomic profiles of ONFH patients, focusing on distinguishing between traumatic ONFH (TONFH) and non-traumatic ONFH (NONFH) subtypes and identifying potential biomarkers for diagnosis and understanding pathogenesis. Methods: Plasma samples were collected from 92 ONFH patients (divided into TONFH and NONFH subtypes) and 33 healthy normal control (NC) participants. Lipidomic profiling was performed using ultra-high performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS). Data analysis incorporated a machine learning-based feature selection method, least absolute shrinkage and selection operator (LASSO) regression, to identify significant lipid biomarkers. Results: Distinct lipidomic signatures were observed in both TONFH and NONFH groups compared to the NC group. LASSO regression identified 11 common lipid biomarkers that signify shared metabolic disruptions in both ONFH subtypes, several of which exhibited strong diagnostic performance with areas under the curve (AUCs) > 0.7. Additionally, subtype-specific lipid markers unique to TONFH and NONFH were identified, providing insights into the differential pathophysiological mechanisms underlying these subtypes. Conclusions: This study highlights the importance of lipidomic profiling in understanding ONFH-associated metabolic disorders and demonstrates the utility of machine learning approaches, such as LASSO regression, in high-dimensional data analysis. These findings not only improve disease characterization but also facilitate the discovery of diagnostic and mechanistic biomarkers, paving the way for more personalized therapeutic strategies in ONFH.
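LASSO performs feature selection because its L1 penalty drives the coefficients of weakly informative features exactly to zero. The following Python sketch shows this with a small coordinate-descent solver for the objective (1/(2n))·||y − Xw||² + α·||w||₁. It is a teaching sketch on made-up numbers with no intercept term, not the pipeline used in the study; the names `lasso_cd` and `soft_threshold` are introduced here for illustration.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/(2n))*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction from all features except j.
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Toy data: the response depends only on the first column.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]

weights = lasso_cd(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

In this toy example the second coefficient is set exactly to zero, so only the first column is retained, while the first coefficient is shrunk slightly below its unpenalized value of 2.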
Affiliation(s)
- Yuzhu Yan
  - Department of Laboratory Medicine, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
  - Clinical Laboratory of Honghui Hospital, Xi’an Jiaotong University, Xi’an 710054, China
- Jihan Wang
  - Institute of Medical Research, Northwestern Polytechnical University, Xi’an 710072, China
- Yangyang Wang
  - School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
- Wenjing Wu
  - Department of Laboratory Medicine, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
- Wei Chen
  - Department of Laboratory Medicine, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
57
Han S, Li R, Wang H, Wang L, Gao Y, Wen Y, Gong T, Ruan S, Li H, Gao P. Early Diagnosis of Bloodstream Infections Using Serum Metabolomic Analysis. Metabolites 2024; 14:685. [PMID: 39728466 PMCID: PMC11676852 DOI: 10.3390/metabo14120685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Revised: 11/30/2024] [Accepted: 12/03/2024] [Indexed: 12/28/2024] Open
Abstract
BACKGROUND Bloodstream infections (BSIs) pose a great challenge to treating patients, especially those with underlying diseases, such as immunodeficiency diseases. Early diagnosis helps to direct precise empirical antibiotic administration and proper clinical management. This study carried out a serum metabolomic analysis using blood specimens sampled from patients with a suspected infection whose routine culture results were later demonstrated to be positive. METHODS A liquid chromatography-mass spectrometry-based metabolomic analysis was carried out to profile the BSI serum samples. The serum metabolomics data could be used to successfully differentiate BSIs from non-BSIs. RESULTS The major classes of the isolated pathogens (e.g., Gram-positive and Gram-negative bacteria) could be differentiated using our optimized statistical algorithms. In addition, by using different machine-learning algorithms, the isolated pathogens could also be classified at the species level (e.g., Escherichia coli and Klebsiella pneumoniae) or according to their specific antibiotic-resistant phenotypes (e.g., extended-spectrum β-lactamase-producing and non-producing phenotypes) if needed. CONCLUSIONS This study provides an early diagnosis method that could be an alternative to the traditional time-consuming culture process to identify BSIs. Moreover, this metabolomics strategy was less affected by several risk factors (e.g., antibiotic administration) that could produce false culture results.
Affiliation(s)
- Shuang Han
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China
- Ruihua Li
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China
- Hao Wang
  - School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
- Lin Wang
  - School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
- Yiming Gao
  - School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
- Yaolin Wen
  - School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
- Tianyang Gong
  - School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
- Shiyu Ruan
  - School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
- Hui Li
  - School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
- Peng Gao
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China
58
Cong R, Deng O, Nishimura S, Ogihara A, Jin Q. Multiple feature selection based on an optimization strategy for causal analysis of health data. Health Inf Sci Syst 2024; 12:52. [PMID: 39534650 PMCID: PMC11554952 DOI: 10.1007/s13755-024-00312-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Accepted: 10/14/2024] [Indexed: 11/16/2024] Open
Abstract
Purpose Recent advancements in information technology and wearable devices have revolutionized healthcare through health data analysis. Identifying significant relationships in complex health data enhances healthcare and public health strategies. In health analytics, causal graphs are important for investigating the relationships among health features. However, they face challenges owing to the large number of features, complexity, and computational demands. Feature selection methods are useful for addressing these challenges. In this paper, we present a framework for multiple feature selection based on an optimization strategy for causal analysis of health data. Methods We select multiple health features based on an optimization strategy. First, we define a Weighted Total Score (WTS) index to assess the feature importance after the combination of different feature selection methods. To explore an optimal set of weights for each method, we design a multiple feature selection algorithm integrated with the greedy algorithm. The features are then ranked according to their WTS, enabling selection of the most important ones. After that, causal graphs are constructed based on the selected features, and the statistical significance of the paths is assessed. Furthermore, evaluation experiments are conducted on an experiment dataset collected for this study and an open dataset for diabetes. Results The results demonstrate that our approach outperforms baseline models by reducing the number of features while improving model performance. Moreover, the statistical significance of the relationships between features uncovered through causal graphs is validated for both datasets. Conclusion By using the proposed framework for multiple feature selection based on an optimization strategy for causal analysis, the number of features is reduced and the causal relationships are uncovered and validated.
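The abstract above does not give the formula for the Weighted Total Score (WTS), so the sketch below assumes one plausible form: each method's scores are min-max normalized and then combined as a weighted sum. The weights, feature names, and scores are hypothetical, and the greedy search over weights described in the abstract is not reproduced.

```python
def min_max(scores):
    """Rescale one method's scores to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def weighted_total_score(method_scores, weights):
    """Assumed WTS: weighted sum of normalized per-method feature scores."""
    normalized = {m: min_max(s) for m, s in method_scores.items()}
    features = list(next(iter(method_scores.values())))
    return {f: sum(weights[m] * normalized[m][f] for m in method_scores)
            for f in features}


def rank_features(wts):
    """Feature names ordered from highest to lowest WTS."""
    return sorted(wts, key=wts.get, reverse=True)


# Hypothetical scores from two feature selection methods on different scales.
method_scores = {
    "filter": {"bmi": 0.9, "steps": 0.3, "sleep": 0.1},
    "embedded": {"bmi": 10.0, "steps": 40.0, "sleep": 20.0},
}
weights = {"filter": 0.5, "embedded": 0.5}  # assumed; the paper searches for these

ranking = rank_features(weighted_total_score(method_scores, weights))
print(ranking)
```

Normalizing before weighting keeps a method with a large numeric scale from dominating the combined score.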
Affiliation(s)
- Ruichen Cong
  - Graduate School of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa, 359-1192 Saitama, Japan
- Ou Deng
  - Graduate School of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa, 359-1192 Saitama, Japan
- Shoji Nishimura
  - Faculty of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa, 359-1192 Saitama, Japan
- Atsushi Ogihara
  - Faculty of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa, 359-1192 Saitama, Japan
- Qun Jin
  - Faculty of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa, 359-1192 Saitama, Japan
59
De Velasco MA, Sakai K, Mitani S, Kura Y, Minamoto S, Haeno T, Hayashi H, Nishio K. A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin. Int J Clin Oncol 2024; 29:1795-1810. [PMID: 39292320 PMCID: PMC11588780 DOI: 10.1007/s10147-024-02617-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 08/28/2024] [Indexed: 09/19/2024]
Abstract
BACKGROUND Genome DNA methylation profiling is a promising yet costly method for cancer classification, involving substantial data. We developed an ensemble learning model to identify cancer types using methylation profiles from a limited number of CpG sites. METHODS Analyzing methylation data from 890 samples across 10 cancer types from the TCGA database, we utilized ANOVA and Gain Ratio to select the most significant CpG sites, then employed Gradient Boosting to reduce these to just 100 sites. RESULTS This approach maintained high accuracy across multiple machine learning models, with classification accuracy rates between 87.7% and 93.5% for methods including Extreme Gradient Boosting, CatBoost, and Random Forest. This method effectively minimizes the number of features needed without losing performance, helping to classify primary organs and uncover subgroups within specific cancers like breast and lung. CONCLUSIONS Using a gradient boosting feature selector shows potential for streamlining methylation-based cancer classification.
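The first filtering stage described above ranks CpG sites by how well their methylation level separates cancer types. A one-way ANOVA F-statistic does this by comparing between-group variance to within-group variance. The Python sketch below computes it for two invented sites; the site names and beta values are hypothetical, and the Gain Ratio and Gradient Boosting stages are not shown.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of numeric values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf")  # groups are perfectly separated with no spread
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical methylation beta values, grouped by cancer type.
sites = {
    "cg_informative": [[0.10, 0.20, 0.15], [0.80, 0.85, 0.90]],
    "cg_flat": [[0.50, 0.60, 0.55], [0.55, 0.50, 0.60]],
}

ranked = sorted(sites, key=lambda s: anova_f(sites[s]), reverse=True)
print(ranked)
```

Sites with a high F-statistic would be kept for the later stages; the cut-off is a design choice not specified here.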
Affiliation(s)
- Marco A De Velasco
  - Department of Genome Biology, Faculty of Medicine, Kindai University, Ohnohigashi 377-2, Osaka-Sayama, 589-9511, Japan
- Kazuko Sakai
  - Department of Genome Biology, Faculty of Medicine, Kindai University, Ohnohigashi 377-2, Osaka-Sayama, 589-9511, Japan
- Seiichiro Mitani
  - Department of Medical Oncology, Faculty of Medicine, Kindai University, Osaka-Sayama, Japan
- Yurie Kura
  - Department of Genome Biology, Faculty of Medicine, Kindai University, Ohnohigashi 377-2, Osaka-Sayama, 589-9511, Japan
- Shuji Minamoto
  - Department of Molecular Tumor Pathobiology, Kindai University Graduate School of Medical Sciences, Osaka-Sayama, Japan
- Takahiro Haeno
  - Department of Molecular Tumor Pathobiology, Kindai University Graduate School of Medical Sciences, Osaka-Sayama, Japan
- Hidetoshi Hayashi
  - Department of Medical Oncology, Faculty of Medicine, Kindai University, Osaka-Sayama, Japan
- Kazuto Nishio
  - Department of Genome Biology, Faculty of Medicine, Kindai University, Ohnohigashi 377-2, Osaka-Sayama, 589-9511, Japan.
  - Department of Molecular Tumor Pathobiology, Kindai University Graduate School of Medical Sciences, Osaka-Sayama, Japan.
60
Halder RK, Uddin MN, Uddin MA, Aryal S, Saha S, Hossen R, Ahmed S, Rony MAT, Akter MF. ML-CKDP: Machine learning-based chronic kidney disease prediction with smart web application. J Pathol Inform 2024; 15:100371. [PMID: 38510072 PMCID: PMC10950726 DOI: 10.1016/j.jpi.2024.100371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 02/07/2024] [Accepted: 02/17/2024] [Indexed: 03/22/2024] Open
Abstract
Chronic kidney diseases (CKDs) are a significant public health issue with potential for severe complications such as hypertension, anemia, and renal failure. Timely diagnosis is crucial for effective management. Leveraging machine learning within healthcare offers promising advancements in predictive diagnostics. In this paper, we developed a machine learning-based kidney diseases prediction (ML-CKDP) model with dual objectives: to enhance dataset preprocessing for CKD classification and to develop a web-based application for CKD prediction. The proposed model involves a comprehensive data preprocessing protocol, converting categorical variables to numerical values, imputing missing data, and normalizing via Min-Max scaling. Feature selection is executed using a variety of techniques including Correlation, Chi-Square, Variance Threshold, Recursive Feature Elimination, Sequential Forward Selection, Lasso Regression, and Ridge Regression to refine the datasets. The model employs seven classifiers: Random Forest (RF), AdaBoost (AdaB), Gradient Boosting (GB), XgBoost (XgB), Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT), to predict CKDs. The effectiveness of the models is assessed by measuring their accuracy, analyzing confusion matrix statistics, and calculating the Area Under the Curve (AUC) specifically for the classification of positive cases. Random Forest (RF) and AdaBoost (AdaB) achieve a 100% accuracy rate, evident across various validation methods including data splits of 70:30, 80:20, and K-Fold set to 10 and 15. RF and AdaB consistently reach perfect AUC scores of 100% across multiple datasets, under different splitting ratios. Moreover, Naive Bayes (NB) stands out for its efficiency, recording the lowest training and testing times across all datasets and split ratios. Additionally, we present a real-time web-based application to operationalize the model, enhancing accessibility for healthcare practitioners and stakeholders. 
Web app link: https://rajib-research-kedney-diseases-prediction.onrender.com/.
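Illustrative sketch (not taken from the paper): the preprocessing steps named in the abstract, namely converting categorical values to numbers, imputing missing data, and Min-Max scaling, can be outlined in plain Python. Mean imputation, the first-appearance integer coding, the function names, and the toy `blood_pressure` column are all assumptions made for illustration; the abstract does not say which imputation or encoding scheme the authors used.

```python
from statistics import mean


def encode_categorical(values):
    """Map each distinct category to an integer code (order of first appearance).

    Missing entries (None) are passed through unchanged.
    """
    codes = {}
    encoded = []
    for v in values:
        if v is None:
            encoded.append(None)
        else:
            encoded.append(codes.setdefault(v, len(codes)))
    return encoded


def impute_mean(values):
    """Replace None with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy column with one missing reading.
blood_pressure = [80, None, 70, 90]
scaled_bp = min_max_scale(impute_mean(blood_pressure))
print(scaled_bp)
```

In a real pipeline the scaling bounds would be learned on the training split only and then reused on the test split, so that no information leaks across the 70:30 or 80:20 splits mentioned above.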
Affiliation(s)
- Rajib Kumar Halder
  - Dept. of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Mohammed Nasir Uddin
  - Dept. of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Ashraf Uddin
  - School of Information Technology, Deakin University, Geelong 3125, Australia
- Sunil Aryal
  - School of Information Technology, Deakin University, Geelong 3125, Australia
- Sajeeb Saha
  - Dept. of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Rakib Hossen
  - Dept. of Cyber Security, Bangabandhu Sheikh Mujibur Rahman Digital University, Kaliakoir, Gazipur 1750, Bangladesh
- Sabbir Ahmed
  - Dept. of Educational Technology, Bangabandhu Sheikh Mujibur Rahman Digital University, Kaliakoir, Gazipur 1750, Bangladesh
- Mosammat Farida Akter
  - Dept. of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh

61
Ming A, Schubert T, Marr V, Hötzsch J, Stober S, Mertens PR. Video game-based application for fall risk assessment: a proof-of-concept cohort study. EClinicalMedicine 2024; 78:102947. [PMID: 39677357 PMCID: PMC11638629 DOI: 10.1016/j.eclinm.2024.102947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 10/31/2024] [Accepted: 11/04/2024] [Indexed: 12/17/2024] Open
Abstract
Background Falls are a significant cause of morbidity and mortality, especially among elderly people with polyneuropathy and cognitive decline. Conventional fall risk assessment tools are prone to low predictive values and do not address specific vulnerabilities. This study seeks to advance the development of an innovative, engaging fall prediction tool for a high-risk cohort diagnosed with diabetes. Methods In this proof-of-concept cohort study, between July 01, 2020, and May 31, 2022, 152 participants with diabetes underwent clinical examinations to estimate individual fall risk (timed "up and go" (TUG) test, dynamic gait index (DGI), Berg Balance Scale (BBS)) and participated in a video game-based fall risk assessment with sensor-equipped insoles as steering units. The participants engaged in four distinct video games, each designed to address capabilities pertinent to preventing falls: skillfulness, reaction time, sensation, endurance, balance, and muscle strength. Data were collected during both seated and standing gaming sessions. Binary machine learning models were used to classify participants, and the classification was compared with actual fall events reported for the past 24 months. Findings Overall, 22 out of 152 participants (14.5%) had experienced at least one fall during the past 24 months. Adjusted risk classification accuracies of TUG, DGI, and BBS reached 58.7%, 58.3%, and 47.5%, respectively. Data analyses from gaming sessions in seated and standing positions yielded two models with six predictors from the four games with accuracies of 82.8% and 88.6% (area under the receiver-operating-characteristic curve 0.84 (95% confidence interval (CI): 0.77-0.91) and 0.91 (95% CI: 0.85-0.97), respectively). Key capabilities that were distinctly different between the groups related to endurance (0.6 ± 0.1 vs. 0.5 ± 0.2; p = 0.03) and balance (0.7 ± 0.2 vs. 0.6 ± 0.2; p = 0.05). The AI-driven analysis allowed extraction of a list of game features that showed highly significant predictive values, e.g., reaction times in specific tasks, deviation from ideal steering routes in courses, and pressure-related parameters. Interpretation Video game-based assessment of fall risk surpasses traditional clinical assessment tools and scores (e.g., TUG, DGI, and BBS) and may open a novel resource for patient evaluation in the future. Further research with larger, heterogeneous cohorts is needed to validate these findings and especially to predict future fall risk probabilities in clinical as well as outpatient settings. Funding This project was funded by the Ministry of Science, Economics, and Digitalization of the State of Saxony-Anhalt and the European Fund for Regional Development under the Autonomy in Old Age Program (Funding No: ZS/2016/05/78615, ZS/2018/12/95325) and Healthy Cognition and Nerve function (HeyCoNer, ZS/2023/12/183088).
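Illustrative sketch (not the authors' code): the area under the receiver-operating-characteristic curve reported above can be read as the probability that a randomly chosen faller receives a higher risk score than a randomly chosen non-faller. The function name, labels, and scores below are invented toy values; only the 22-of-152 prevalence figure comes from the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = reported a fall, 0 = no fall.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

# Prevalence of falls stated in the abstract: 22 of 152 participants.
fall_prevalence_pct = round(100 * 22 / 152, 1)
print(toy_auc, fall_prevalence_pct)
```

With only about one participant in seven reporting a fall, raw accuracy can look good for a model that rarely predicts a fall, which is one reason a ranking measure such as AUC is reported alongside it.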
Affiliation(s)
- Antao Ming
  - University Clinic for Nephrology and Hypertension, Diabetes and Endocrinology, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Tanja Schubert
  - University Clinic for Nephrology and Hypertension, Diabetes and Endocrinology, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Vanessa Marr
  - University Clinic for Nephrology and Hypertension, Diabetes and Endocrinology, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Jaqueline Hötzsch
  - University Clinic for Nephrology and Hypertension, Diabetes and Endocrinology, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Sebastian Stober
  - Artificial Intelligence Lab, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Peter R. Mertens
  - University Clinic for Nephrology and Hypertension, Diabetes and Endocrinology, Otto-von-Guericke University Magdeburg, Magdeburg, Germany

62
Ngusie HS, Tesfa GA, Taddese AA, Enyew EB, Alene TD, Abebe GK, Walle AD, Zemariam AB. Predicting place of delivery choice among childbearing women in East Africa: a comparative analysis of advanced machine learning techniques. Front Public Health 2024; 12:1439320. [PMID: 39664535 PMCID: PMC11631870 DOI: 10.3389/fpubh.2024.1439320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Accepted: 11/11/2024] [Indexed: 12/13/2024] Open
Abstract
Background Sub-Saharan Africa faces high neonatal and maternal mortality rates due to limited access to skilled healthcare during delivery. This study aims to improve the classification of health facility and home deliveries using advanced machine learning techniques and to explore factors influencing women's choices of delivery locations in East Africa. Method The study focused on 86,009 childbearing women in East Africa. A comparative analysis of 12 advanced machine learning algorithms was conducted, utilizing various data balancing techniques and hyperparameter optimization methods to enhance model performance. Result The prevalence of health facility delivery in East Africa was found to be 83.71%. The support vector machine (SVM) and CatBoost algorithms performed best in predicting the place of delivery: after tuning with Bayesian optimization, both achieved an accuracy of 95% and an AUC of 0.98, with no significant difference between them across the performance metrics assessed. Factors associated with facility-based deliveries were identified using association rule mining, including parental education levels, timing of initial antenatal care (ANC) check-ups, wealth status, marital status, mobile phone ownership, religious affiliation, media accessibility, and birth order. Conclusion This study underscores the vital role of machine learning algorithms in predicting health facility deliveries. A slight decline in facility deliveries from previous reports highlights the urgent need for targeted interventions to meet Sustainable Development Goals (SDGs), particularly in maternal health. The study recommends promoting facility-based deliveries through measures that include raising awareness about skilled birth attendance, encouraging early ANC check-ups, addressing financial barriers through targeted support programs, implementing culturally sensitive interventions, and utilizing media campaigns and mobile health initiatives. Interventions should also be tailored to the birth order of the child, recognizing that mothers may have different informational needs depending on whether it is their first or subsequent delivery. Furthermore, we recommend that researchers explore a variety of techniques and validate findings using more recent data.
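Illustrative sketch (not the authors' analysis): association rule mining, which the abstract uses to identify factors linked to facility delivery, scores a rule "antecedent -> consequent" by its support, confidence, and lift. The function name, item labels, and five toy records below are hypothetical and do not come from the study data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is a set of items describing one record.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy records; item names are invented for illustration.
toy_records = [
    {"early_anc", "facility"},
    {"early_anc", "owns_phone", "facility"},
    {"owns_phone", "home"},
    {"early_anc", "home"},
    {"facility"},
]
support, confidence, lift = rule_metrics(toy_records, {"early_anc"}, {"facility"})
print(support, confidence, lift)
```

A lift above 1 means the consequent is more common among records matching the antecedent than in the data overall; it describes association only and does not by itself establish a causal effect.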
Affiliation(s)
- Habtamu Setegn Ngusie
  - Department of Health Informatics, School of Public Health, College of Medicine and Health Sciences, Woldia University, Woldia, Ethiopia
- Getanew Aschalew Tesfa
  - School of Public Health, College of Medicine and Health Science, Dilla University, Dilla, Ethiopia
- Asefa Adimasu Taddese
  - Department of Sport, Physical Education and Health (SPEH), Academy of Wellness and Human Development, Faculty of Arts and Social Sciences, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China
- Ermias Bekele Enyew
  - Department of Health Informatics, College of Medicine and Health Science, Wollo University, Dessie, Ethiopia
- Tilahun Dessie Alene
  - Department of Pediatric and Child Health, School of Medicine, College of Medicine and Health Science, Wollo University, Dessie, Ethiopia
- Gebremeskel Kibret Abebe
  - Department of Emergency and Critical Care Nursing, School of Nursing, College of Medicine and Health Sciences, Woldia University, Woldia, Ethiopia
- Agmasie Damtew Walle
  - Department of Health Informatics, College of Medicine and Health Science, Debre Berhan University, Debre Berhan, Ethiopia
- Alemu Birara Zemariam
  - Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia

63
Howard A, Hughes DM, Green PL, Velluva A, Gerada A, Maskell S, Buchan IE, Hope W. Personalised antimicrobial susceptibility testing with clinical prediction modelling informs appropriate antibiotic use. Nat Commun 2024; 15:9924. [PMID: 39572574 PMCID: PMC11582675 DOI: 10.1038/s41467-024-54192-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Accepted: 11/04/2024] [Indexed: 11/24/2024] Open
Abstract
Antimicrobial susceptibility testing is a key weapon against antimicrobial resistance. Diagnostic microbiology laboratories use one-size-fits-all testing approaches that are often imprecise, inefficient, and inequitable. Here, we report a personalised approach that adapts laboratory testing for urinary tract infection to maximise the number of appropriate treatment options for each patient. We develop and assess susceptibility prediction models for 12 antibiotics on real-world healthcare data using an individual-level simulation study. When combined with decision thresholds that prioritise selection of World Health Organisation Access category antibiotics (those least likely to induce antimicrobial resistance), the personalised approach delivers more susceptible results (results that encourage prescription of that antibiotic) per specimen for Access category antibiotics than a standard testing approach, without compromising provision of susceptible results overall. Here, we show that personalised antimicrobial susceptibility testing could help tackle antimicrobial resistance by safely providing more Access category antibiotic treatment options to clinicians managing urinary tract infection.
Affiliation(s)
- Alex Howard
  - Department of Clinical Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, William Henry Duncan Building, 6 West Derby Street, University of Liverpool, Liverpool, L7 8TX, UK
  - Liverpool University Hospitals NHS Foundation Trust, Mount Vernon Street, Liverpool, L7 8YE, UK
- David M Hughes
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building Block B, Brownlow Street, Liverpool, L69 3GF, UK
- Peter L Green
  - Civic Health Innovation Labs, University of Liverpool, Liverpool Science Park, 131 Mount Pleasant, Liverpool, L3 5TF, UK
  - Department of Mechanical and Aerospace Engineering, School of Engineering, University of Liverpool, The Quadrangle, Brownlow Hill, L69 3GH, Liverpool, UK
- Anoop Velluva
  - Department of Clinical Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, William Henry Duncan Building, 6 West Derby Street, University of Liverpool, Liverpool, L7 8TX, UK
- Alessandro Gerada
  - Department of Clinical Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, William Henry Duncan Building, 6 West Derby Street, University of Liverpool, Liverpool, L7 8TX, UK
  - Liverpool University Hospitals NHS Foundation Trust, Mount Vernon Street, Liverpool, L7 8YE, UK
- Simon Maskell
  - Department of Electrical Engineering and Electronics, School of Electrical Engineering, Electronics, and Computer Science, University of Liverpool, The Quadrangle, Brownlow Hill, L69 3GH, Liverpool, UK
- Iain E Buchan
  - Civic Health Innovation Labs, University of Liverpool, Liverpool Science Park, 131 Mount Pleasant, Liverpool, L3 5TF, UK
  - Department of Public Health, Policy & Systems, Institute of Population Health, University of Liverpool, Waterhouse Building Block B, Brownlow Street, Liverpool, L69 3GF, UK
- William Hope
  - Department of Clinical Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, William Henry Duncan Building, 6 West Derby Street, University of Liverpool, Liverpool, L7 8TX, UK
  - Liverpool University Hospitals NHS Foundation Trust, Mount Vernon Street, Liverpool, L7 8YE, UK

64
Rayan RA, Suruliandi A, Raja SP. Modified mutual information feature selection algorithm to predict COVID-19 using clinical data. Comput Methods Biomech Biomed Engin 2024:1-21. [PMID: 39568329 DOI: 10.1080/10255842.2024.2429012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 05/11/2024] [Accepted: 11/08/2024] [Indexed: 11/22/2024]
Abstract
The COVID-19 pandemic has profoundly impacted health, emphasizing the need for timely disease detection. Blood tests have become key diagnostic tools due to the virus's effects on blood composition. Accurate COVID-19 prediction through machine learning requires selecting relevant features, as irrelevant features can lower classification accuracy. This study proposes Modified Mutual Information (MMI) for feature selection, ranking features by relevance and using backtracking to find the optimal subset. Support Vector Machines (SVM) are then used for classification. Results show that MMI with SVM achieves 95% accuracy, outperforming other methods, and demonstrates strong generalizability on various benchmark datasets.
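Illustrative sketch (not the authors' MMI algorithm): the abstract does not define the modification or the backtracking step, so the code below shows only the standard mutual information between a discrete feature and a class label, followed by a simple relevance ranking. The feature names and toy values are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical binarised blood-test features and a toy COVID-19 label.
toy_label = [1, 1, 0, 0]
toy_features = {
    "lymphocytes_low": [1, 1, 0, 0],  # perfectly aligned with the label
    "age_over_50": [1, 0, 1, 0],      # independent of the label
}
mi_scores = {name: mutual_information(col, toy_label) for name, col in toy_features.items()}
ranking = sorted(mi_scores, key=mi_scores.get, reverse=True)
print(mi_scores, ranking)
```

Ranking by relevance alone can keep redundant features that carry the same information, which is presumably part of what a subset search such as the backtracking step described above is meant to address.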
Affiliation(s)
- R Ame Rayan
  - Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, India
- A Suruliandi
  - Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, India
- S P Raja
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

65
Díaz de la Guardia-Bolívar E, Martínez Manjón JE, Pérez-Filgueiras D, Zwir I, del Val C. Explainable Machine Learning Models Using Robust Cancer Biomarkers Identification from Paired Differential Gene Expression. Int J Mol Sci 2024; 25:12419. [PMID: 39596491 PMCID: PMC11594711 DOI: 10.3390/ijms252212419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 11/15/2024] [Accepted: 11/15/2024] [Indexed: 11/28/2024] Open
Abstract
In oncology, there is a critical need for robust biomarkers that can be easily translated into the clinic. We introduce a novel approach using paired differential gene expression analysis for biological feature selection in machine learning models, enhancing robustness and interpretability while accounting for patient variability. This method compares primary tumor tissue with the same patient's healthy tissue, improving gene selection by eliminating individual-specific artifacts. We focused on carcinoma because of its prevalence and the availability of data; we aim to identify biomarkers involved in general carcinoma progression, including less-researched types. Our findings identified 27 pivotal genes that can distinguish between healthy and carcinoma tissue, even in unseen carcinoma types. Additionally, the panel could precisely identify the tissue of origin in the eight carcinoma types used in the discovery phase. Notably, in a proof of concept, the model accurately identified the primary tissue origin in metastatic samples despite limited sample availability. Functional annotation reveals these genes' involvement in cancer hallmarks, detecting subtle variations across carcinoma types. We propose paired differential gene expression analysis as a reference method for the discovery of robust biomarkers.
Affiliation(s)
- Elisa Díaz de la Guardia-Bolívar
  - Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18071 Granada, Spain
- Juan Emilio Martínez Manjón
  - Instituto de Investigación Biosanitaria ibs.GRANADA, Complejo Hospitales Universitarios de Granada, Universidad de Granada, 18012 Granada, Spain (D.P.-F.)
- David Pérez-Filgueiras
  - Instituto de Investigación Biosanitaria ibs.GRANADA, Complejo Hospitales Universitarios de Granada, Universidad de Granada, 18012 Granada, Spain (D.P.-F.)
- Igor Zwir
  - Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18071 Granada, Spain
  - Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18016 Granada, Spain
- Coral del Val
  - Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18071 Granada, Spain
  - Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18016 Granada, Spain

66
Ali SH, Shehata M. A New Breast Cancer Discovery Strategy: A Combined Outlier Rejection Technique and an Ensemble Classification Method. Bioengineering (Basel) 2024; 11:1148. [PMID: 39593808 PMCID: PMC11591806 DOI: 10.3390/bioengineering11111148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 11/08/2024] [Accepted: 11/12/2024] [Indexed: 11/28/2024] Open
Abstract
Annually, many people worldwide lose their lives to breast cancer, one of the most prevalent cancers in the world. Since the disease is becoming more common, early detection of breast cancer is essential to avoiding serious complications and possibly death. This research provides a novel Breast Cancer Discovery (BCD) strategy to aid patients by providing prompt and sensitive detection of breast cancer. The BCD consists of two primary steps: the Pre-processing Step (P2S) and the Breast Cancer Discovery Step (BCDS). In the P2S, non-informative data are filtered out using three primary operations: data normalization, feature selection, and outlier rejection. Only then is the diagnostic model in the BCDS trained for precise diagnosis. The primary contribution of this research is the novel outlier rejection technique known as the Combined Outlier Rejection Technique (CORT). CORT is divided into two primary phases: (i) the Quick Rejection Phase (QRP), a fast phase that uses a statistical method, and (ii) the Accurate Rejection Phase (ARP), a precise phase that uses an optimization method. Outliers are rapidly eliminated during the QRP using the standard deviation, and the remaining outliers are thoroughly eliminated during the ARP via Binary Harris Hawk Optimization (BHHO). In the P2S, data normalization is used to bring numeric values in the datasets into a predetermined range. Information Gain (IG) is then used to choose the optimal subset of features, and CORT is used to reject incorrect training data. Furthermore, based on the filtered data from the P2S, an Ensemble Classification Method (ECM) is utilized in the BCDS to identify breast cancer patients. This method consists of three classifiers: Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The Wisconsin Breast Cancer Database (WBCD) dataset, which contains digital images of fine-needle aspiration samples collected from patients' breast masses, is used herein to compare the BCD strategy against several contemporary strategies. According to the outcomes of the experiment, the suggested method is very competitive. It achieves 0.987 accuracy, 0.013 error, 0.98 recall, 0.984 precision, and a run time of 3 s, outperforming all other methods from the literature.
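Illustrative sketch (not the authors' CORT implementation): the abstract says only that the Quick Rejection Phase removes outliers "using the standard deviation". One common reading is to drop values that lie more than k standard deviations from the mean. The cutoff `k = 2.0`, the function name, and the toy values below are assumptions; the BHHO-based Accurate Rejection Phase is not sketched here.

```python
from statistics import mean, pstdev


def reject_outliers(values, k=2.0):
    """Keep values within k population standard deviations of the mean.

    k is an assumed cutoff; the abstract does not state the one used.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


# Hypothetical toy feature column with one extreme value.
toy_values = [10, 11, 9, 10, 12, 50]
kept = reject_outliers(toy_values)
print(kept)
```

Because the mean and standard deviation are themselves pulled toward extreme values, a single pass of this rule can miss some outliers, which fits the abstract's description of a second, more precise phase that handles the remainder.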
Affiliation(s)
- Shereen H. Ali
  - Communications & Electronics Engineering Department, Delta Higher Institute for Engineering & Technology, Mansoura 35511, Egypt
- Mohamed Shehata
  - Department of Bioengineering, Speed School of Engineering, University of Louisville, Louisville, KY 40292, USA

67
Hudon A, Beaudoin M, Phraxayavong K, Potvin S, Dumais A. Exploring the Intersection of Schizophrenia, Machine Learning, and Genomics: Scoping Review. JMIR Bioinformatics and Biotechnology 2024; 5:e62752. [PMID: 39546776 PMCID: PMC11607571 DOI: 10.2196/62752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 10/06/2024] [Accepted: 10/16/2024] [Indexed: 11/17/2024]
Abstract
BACKGROUND An increasing body of literature highlights the integration of machine learning with genomic data in psychiatry, particularly for complex mental health disorders such as schizophrenia. These advanced techniques offer promising potential for uncovering various facets of these disorders. A comprehensive review of the current applications of machine learning in conjunction with genomic data within this context can significantly enhance our understanding of the current state of research and its future directions. OBJECTIVE This study aims to conduct a systematic scoping review of the use of machine learning algorithms with genomic data in the field of schizophrenia. METHODS To conduct a systematic scoping review, a search was performed in the electronic databases MEDLINE, Web of Science, PsycNet (PsycINFO), and Google Scholar from 2013 to 2024. Studies at the intersection of schizophrenia, genomic data, and machine learning were evaluated. RESULTS The literature search identified 2437 eligible articles after removing duplicates. Following abstract screening, 143 full-text articles were assessed, and 121 were subsequently excluded. Therefore, 21 studies were thoroughly assessed. Various machine learning algorithms were used in the identified studies, with support vector machines being the most common. The studies notably used genomic data to predict schizophrenia, identify schizophrenia features, discover drugs, classify schizophrenia amongst other mental health disorders, and predict the quality of life of patients. CONCLUSIONS Several high-quality studies were identified. Yet, the application of machine learning with genomic data in the context of schizophrenia remains limited. Future research is essential to further evaluate the portability of these models and to explore their potential clinical applications.
Affiliation(s)
- Alexandre Hudon
  - Department of psychiatry and addictology, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
  - Centre de recherche de l'Institut universitaire en santé mentale de Montréal, Montréal, QC, Canada
  - Institut universitaire en santé mentale de Montréal, Montréal, QC, Canada
- Mélissa Beaudoin
  - Department of psychiatry and addictology, Université de Montréal, Montréal, QC, Canada
  - Faculty of Medicine, McGill University, Montréal, QC, Canada
- Stéphane Potvin
  - Centre de recherche de l'Institut universitaire en santé mentale de Montréal, Montréal, QC, Canada
  - Department of psychiatry and addictology, Université de Montréal, Montréal, QC, Canada
- Alexandre Dumais
  - Centre de recherche de l'Institut universitaire en santé mentale de Montréal, Montréal, QC, Canada
  - Department of psychiatry and addictology, Université de Montréal, Montréal, QC, Canada
  - Services et Recherches Psychiatriques AD, Montréal, QC, Canada
  - Institut nationale de psychiatrie légale Philippe-Pinel, Montréal, QC, Canada

68
Restini FCF, Torfeh T, Aouadi S, Hammoud R, Al-Hammadi N, Starling MTM, Sousa CFPM, Mancini A, Brito LH, Yoshimoto FH, Lima-Júnior NF, Queiroz MM, Passos UL, Amancio CT, Takahashi JT, De Souza Delgado D, Hanna SA, Marta GN, Neves-Junior WFP. AI tool for predicting MGMT methylation in glioblastoma for clinical decision support in resource limited settings. Sci Rep 2024; 14:27995. [PMID: 39543155 PMCID: PMC11564566 DOI: 10.1038/s41598-024-78189-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Accepted: 10/29/2024] [Indexed: 11/17/2024] Open
Abstract
Glioblastoma is an aggressive brain cancer with a poor prognosis. The O6-methylguanine-DNA methyltransferase (MGMT) gene methylation status is crucial for treatment stratification, yet economic constraints often limit access to testing. This study aims to develop an artificial intelligence (AI) framework for predicting MGMT methylation. Diagnostic magnetic resonance (MR) images in public repositories were used for training. The resulting algorithm was validated on data from a single institution. All images were segmented according to widely used guidelines for radiotherapy planning and combined with clinical evaluations from neuroradiology experts. Radiomic features and clinical impressions were extracted, tabulated, and used for modeling. Feature selection methods were used to identify relevant phenotypes. A total of 100 patients were used for training and 46 for validation. A total of 343 features were extracted. Eight feature selection methods produced seven independent predictive frameworks. The top-performing ML model, built after Least Absolute Shrinkage and Selection Operator (LASSO) feature selection, reached an accuracy (ACC) of 0.82, an area under the curve (AUC) of 0.81, a recall of 0.75, and a precision of 0.75. This study demonstrates that integrating clinical and radiotherapy-derived AI-driven phenotypes can predict MGMT methylation. The framework addresses constraints that limit molecular diagnosis access.
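Illustrative sketch (not derived from the study's results): accuracy, recall, and precision, the metrics quoted above, all follow from the four cells of a binary confusion matrix. The counts below are invented toy numbers and are not the study's validation confusion matrix, which the abstract does not report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, and precision from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
    Assumes at least one actual positive and one predicted positive.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "recall": tp / (tp + fn),        # share of actual positives that were found
        "precision": tp / (tp + fp),     # share of predicted positives that were right
    }


# Hypothetical counts for a 20-patient toy validation set.
toy_metrics = classification_metrics(tp=6, fp=2, fn=2, tn=10)
print(toy_metrics)
```

Recall and precision coincide here only because the toy example has equal numbers of false positives and false negatives; in general they move independently.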
Affiliation(s)
- Felipe Cicci Farinha Restini
  - Department of Radiation Oncology, Hospital Sírio-Libanês, Rua Batataes, 523, Jardim Paulista, Distrito Federal, São Paulo, Brasília, 01423-010, Brazil
- Tarraf Torfeh
  - Department of Radiation Oncology, National Center for Cancer Care and Research, Doha, Qatar
- Souha Aouadi
  - Department of Radiation Oncology, National Center for Cancer Care and Research, Doha, Qatar
- Rabih Hammoud
  - Department of Radiation Oncology, National Center for Cancer Care and Research, Doha, Qatar
- Noora Al-Hammadi
  - Department of Radiation Oncology, National Center for Cancer Care and Research, Doha, Qatar
- Anselmo Mancini
  - Department of Radiation Oncology, Hospital Sírio-Libanês, São Paulo, Brazil

69
Gorelik MG, Gorelik AJ, Fishbein SRS, Fehlmann T, Deepak P, Bogdan R, Dantas G, Jain U. Improving Differentiation of Crohn's Disease and Ulcerative Colitis Proteomes through Protein-Wide Association Study Feature Selection in Machine Learning. medRxiv 2024:2024.11.13.24316854. [PMID: 39606394 PMCID: PMC11601736 DOI: 10.1101/2024.11.13.24316854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Background and Aims Diagnostic differentiation between Crohn's disease (CD) and ulcerative colitis (UC) is crucial for timely and suitable therapeutic measures. The current gold standard for differentiating between CD and UC involves endoscopy and histology, which are invasive and costly. We aimed to identify blood plasma proteomic signatures using a Protein-Wide Association Study (PWAS) approach to differentiate CD from UC and evaluate the efficacy of these signatures as features in machine learning (ML) classifiers. Methods Among participants (n=1,106; nCD=636; nUC=470) of the Study of a Prospective Adult Research Cohort with IBD (SPARC), plasma protein (n=2,920) levels were estimated using Olink proteomics. A PWAS with Bonferroni correction for multiple testing was used to identify proteins associated with disease states after controlling for age, sex, and disease severity. ML classifiers examined the diagnostic utility of these models. Feature importance was determined via SHapley Additive exPlanations (SHAP) analysis. Results Thirteen proteins were significantly differentially abundant in CD vs UC (all |β|s > 0.22, all adjusted p values < 8.42E-06). Random forest models of proteins differentiated between CD and UC, with models trained only on PWAS-identified proteins (average ROC-AUC 0.73) outperforming models trained on the full proteome (average ROC-AUC 0.62). SHAP analysis revealed that Granzyme B, insulin-like peptide 5 (INSL5), and interleukin-12 subunit beta (IL-12B) were the most important features. Conclusions Our findings demonstrate that PWAS-based feature selection approaches are a powerful method to identify features in complex, noisy datasets. Importantly, we have identified novel peptide-based biomarkers, such as INSL5, that can potentially be used to complement existing strategies to differentiate between CD and UC.
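Illustrative sketch (not the authors' pipeline): Bonferroni correction, which the abstract applies across the proteins tested, divides the family-wise significance level by the number of tests. The significance level `alpha = 0.05` is an assumption, since the abstract does not state it, and the toy p-values are invented; only the count of 2,920 proteins comes from the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which raw p-values pass the Bonferroni-corrected cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p < cutoff for p in p_values]


# Cutoff for the 2,920 proteins in the abstract, under an assumed alpha of 0.05.
protein_cutoff = bonferroni_threshold(0.05, 2920)

# Hypothetical toy p-values for four tests (cutoff 0.05 / 4 = 0.0125).
toy_flags = bonferroni_significant([0.001, 0.02, 0.04, 0.0005])
print(protein_cutoff, toy_flags)
```

The correction is deliberately conservative: it limits false discoveries across thousands of proteins at the cost of possibly missing weaker true associations.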
Collapse
Affiliation(s)
- Mark G Gorelik
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Aaron J Gorelik
- Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
| | - Skye R S Fishbein
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Tara Fehlmann
- Crohn's and Colitis Foundation, New York, New York, USA
| | - Parakkal Deepak
- Division of Gastroenterology, John T. Milliken Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Ryan Bogdan
- Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
| | - Gautam Dantas
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Biomedical Engineering, Washington University in St Louis, St. Louis, MO, USA
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA
| | - Umang Jain
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
| |
|
70
|
Teshale AB, Htun HL, Owen AJ, Ryan J, Baker JR, Vered M, Reid CM, Woods RL, Berk M, Tonkin A, Neumann JT, Kilkenny MF, Phyo AZZ, Nelson MR, Stocks N, Britt C, Freak-Poli R. Gender-specific aspects of socialisation and risk of cardiovascular disease among community-dwelling older adults: a prospective cohort study using machine learning algorithms and a conventional method. J Epidemiol Community Health 2024; 78:737-744. [PMID: 38839108 DOI: 10.1136/jech-2023-221860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Accepted: 05/21/2024] [Indexed: 06/07/2024]
Abstract
BACKGROUND Gender influences cardiovascular disease (CVD) through norms, social relations, roles and behaviours. This study identified gender-specific aspects of socialisation associated with CVD. METHODS A longitudinal study was conducted, involving 9936 (5231 women and 4705 men) initially healthy, community-dwelling Australians aged 70 years or more from the ASPirin in Reducing Events in the Elderly (ASPREE) study and ASPREE Longitudinal Study of Older Persons, with a median follow-up time of 6.4 years. Variable categorisation, variable selection (using machine learning (ML) models; Elastic Net and extreme gradient boosting) and Cox regression were employed separately by binary gender to identify socialisation factors (n=25 considered) associated with CVD. RESULTS Different socialisation factors were identified using the ML models. In the Cox model, for both genders, being married/partnered was associated with a reduced risk of CVD (men: HR 0.76, 95% CI 0.60 to 0.96; women: HR 0.67, 95% CI 0.58 to 0.95). For men, having 3-8 relatives they felt close to and could call on for help (HR 0.76, 95% CI 0.58 to 0.99; reference <3 relatives), having 3-8 relatives they felt at ease talking with about private matters (HR 0.70, 95% CI 0.55 to 0.90; reference <3 relatives) or playing games such as chess or cards (HR 0.82, 95% CI 0.67 to 1.00) was associated with reduced risk of CVD. For women, living with others (HR 0.71, 95% CI 0.55 to 0.91) or having ≥3 friends they felt at ease talking with about private matters (HR 0.74, 95% CI 0.58 to 0.95; reference <3 friends) was associated with a lower risk of CVD. CONCLUSIONS This study demonstrates the need to prioritise gender-specific social factors to improve cardiovascular health in older adults.
Affiliation(s)
| | - Htet Lin Htun
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
| | - Alice J Owen
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
| | - Joanne Ryan
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
| | - J R Baker
- Primary & Community Care Services Ltd, Thornleigh, New South Wales, Australia
| | - Mor Vered
- Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton, Victoria, Australia
| | - Christopher M Reid
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- School of Public Health, Curtin University, Bentley, Western Australia, Australia
| | - Robyn L Woods
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
| | - Michael Berk
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Institute for Mental and Physical Health and Clinical Translation (IMPACT), Food & Mood Centre, School of Medicine, Deakin University and Barwon Health, Geelong, Victoria, Australia
| | - Andrew Tonkin
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
| | - Johannes T Neumann
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Department of Cardiology, University Heart & Vascular Centre, Hamburg, Germany
- German Centre for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Monique F Kilkenny
- Stroke and Ageing Research, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, Victoria, Australia
- Stroke Division, The Florey Institute of Neuroscience and Mental Health, University of Melbourne, The University of Melbourne, Heidelberg, Victoria, Australia
| | - Aung Zaw Zaw Phyo
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
| | - Mark R Nelson
- Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
| | - Nigel Stocks
- Discipline of General Practice, Adelaide Medical School, University of Adelaide, Adelaide, South Australia, Australia
| | - Carlene Britt
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
| | - Rosanne Freak-Poli
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- School of Clinical Sciences at Monash Health, Monash University, Clayton, Victoria, Australia
| |
|
71
|
Mylona E, Zaridis DI, Kalantzopoulos CΝ, Tachos NS, Regge D, Papanikolaou N, Tsiknakis M, Marias K, Fotiadis DI. Optimizing radiomics for prostate cancer diagnosis: feature selection strategies, machine learning classifiers, and MRI sequences. Insights Imaging 2024; 15:265. [PMID: 39495422 PMCID: PMC11535140 DOI: 10.1186/s13244-024-01783-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 06/27/2024] [Indexed: 11/05/2024] Open
Abstract
OBJECTIVES Radiomics-based analyses encompass multiple steps, leading to ambiguity regarding the optimal approaches for enhancing model performance. This study compares the effect of several feature selection methods, machine learning (ML) classifiers, and sources of radiomic features, on models' performance for the diagnosis of clinically significant prostate cancer (csPCa) from bi-parametric MRI. METHODS Two multi-centric datasets, with 465 and 204 patients each, were used to extract 1246 radiomic features per patient and MRI sequence. Ten feature selection methods, such as Boruta, mRMRe, ReliefF, recursive feature elimination (RFE), random forest (RF) variable importance, L1-lasso, etc., four ML classifiers, namely SVM, RF, LASSO, and boosted generalized linear model (GLM), and three sets of radiomics features, derived from T2w images, ADC maps, and their combination, were used to develop predictive models of csPCa. Their performance was evaluated in a nested cross-validation and externally, using seven performance metrics. RESULTS In total, 480 models were developed. In nested cross-validation, the best model combined Boruta with Boosted GLM (AUC = 0.71, F1 = 0.76). In external validation, the best model combined L1-lasso with boosted GLM (AUC = 0.71, F1 = 0.47). Overall, Boruta, RFE, L1-lasso, and RF variable importance were the top-performing feature selection methods, while the choice of ML classifier didn't significantly affect the results. The ADC-derived features showed the highest discriminatory power with T2w-derived features being less informative, while their combination did not lead to improved performance. CONCLUSION The choice of feature selection method and the source of radiomic features have a profound effect on the models' performance for csPCa diagnosis. 
CRITICAL RELEVANCE STATEMENT This work may guide future radiomic research, paving the way for the development of more effective and reliable radiomic models; not only for advancing prostate cancer diagnostic strategies, but also for informing broader applications of radiomics in different medical contexts. KEY POINTS Radiomics is a growing field that can still be optimized. Feature selection method impacts radiomics models' performance more than ML algorithms. Best feature selection methods: RFE, LASSO, RF, and Boruta. ADC-derived radiomic features yield more robust models compared to T2w-derived radiomic features.
Affiliation(s)
- Eugenia Mylona
- Biomedical Research Institute, FORTH, GR 45110, Ioannina, Greece
- Unit of Medical Technology Intelligent Information Systems, University of Ioannina, Ioannina, Greece
| | - Dimitrios I Zaridis
- Biomedical Research Institute, FORTH, GR 45110, Ioannina, Greece
- Unit of Medical Technology Intelligent Information Systems, University of Ioannina, Ioannina, Greece
- Biomedical Engineering Laboratory, School of Electrical & Computer Engineering, National Technical University of Athens, Athens, Greece
| | - Charalampos Ν Kalantzopoulos
- Biomedical Research Institute, FORTH, GR 45110, Ioannina, Greece
- Unit of Medical Technology Intelligent Information Systems, University of Ioannina, Ioannina, Greece
| | - Nikolaos S Tachos
- Biomedical Research Institute, FORTH, GR 45110, Ioannina, Greece
- Unit of Medical Technology Intelligent Information Systems, University of Ioannina, Ioannina, Greece
| | - Daniele Regge
- Department of Radiology, Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Italy
| | | | - Manolis Tsiknakis
- Computational Biomedicine Laboratory, Institute of Computer Science, FORTH, GR 70013, Heraklion, Greece
- Department of Electrical and Computer Engineering, Hellenic Mediterranean University, GR 71004, Heraklion, Greece
| | - Kostas Marias
- Computational Biomedicine Laboratory, Institute of Computer Science, FORTH, GR 70013, Heraklion, Greece
- Department of Electrical and Computer Engineering, Hellenic Mediterranean University, GR 71004, Heraklion, Greece
| | - Dimitrios I Fotiadis
- Biomedical Research Institute, FORTH, GR 45110, Ioannina, Greece.
- Unit of Medical Technology Intelligent Information Systems, University of Ioannina, Ioannina, Greece.
| |
|
72
|
Yue Z, McCormick NP, Ezeala OM, Durham SH, Westrick SC. EMSIG: Uncovering Factors Influencing COVID-19 Vaccination Across Different Subgroups Characterized by Embedding-Based Spatial Information Gain. Vaccines (Basel) 2024; 12:1253. [PMID: 39591156 PMCID: PMC11599077 DOI: 10.3390/vaccines12111253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 10/24/2024] [Accepted: 10/30/2024] [Indexed: 11/28/2024] Open
Abstract
Background/Objectives: COVID-19 and its variants continue to pose significant threats to public health, with considerable uncertainty surrounding their impact. As of September 2024, the total number of deaths reached 8.8 million worldwide. Vaccination remains the most effective strategy for preventing COVID-19. However, vaccination rates in the Deep South, U.S., are notably lower than the national average due to various factors. Methods: To address this challenge, we developed the Embedding-based Spatial Information Gain (EMSIG) method, an innovative tool using machine learning techniques for subgroup modeling. EMSIG helps identify subgroups where participants share similar perceptions but exhibit high variance in COVID-19 vaccine doses. It introduces spatial information gain (SIG) to screen regions of interest (ROI) subgroups and reveals their specific concerns. Results: We analyzed survey data from 1020 participants in Alabama. EMSIG identified 16 factors encompassing COVID-19 hesitancy and trust in medical doctors, pharmacists, and public health authorities and revealed four distinct ROI subgroups. Five factors (COVID-19 perceived detriment, fear, skepticism, side effects related to COVID-19, and communication with pharmacists) were shared across at least three subgroups. A subgroup primarily composed of Democrats with a high flu-shot rate expressed concerns about pharmacist communication, government fairness, and responsibility. Another subgroup, characterized by older, white Republicans with a relatively low flu-shot rate, expressed concerns about doctor trust and the intelligence of public health authorities. Conclusions: EMSIG enhances our understanding of specific concerns across different demographics, characterizes these demographics, and informs targeted interventions to increase vaccination uptake and ensure equitable prevention strategies.
Affiliation(s)
- Zongliang Yue
- Department of Health Outcomes Research and Policy, Harrison College of Pharmacy, Auburn University, Auburn, AL 36849, USA; (Z.Y.); (N.P.M.); (O.M.E.)
| | - Nicholas P. McCormick
- Department of Health Outcomes Research and Policy, Harrison College of Pharmacy, Auburn University, Auburn, AL 36849, USA; (Z.Y.); (N.P.M.); (O.M.E.)
| | - Oluchukwu M. Ezeala
- Department of Health Outcomes Research and Policy, Harrison College of Pharmacy, Auburn University, Auburn, AL 36849, USA; (Z.Y.); (N.P.M.); (O.M.E.)
| | - Spencer H. Durham
- Department of Pharmacy Practice, Harrison College of Pharmacy, Auburn University, Auburn, AL 36849, USA;
| | - Salisa C. Westrick
- Department of Health Outcomes Research and Policy, Harrison College of Pharmacy, Auburn University, Auburn, AL 36849, USA; (Z.Y.); (N.P.M.); (O.M.E.)
| |
|
73
|
Montevechi AA, Miranda RDC, Medeiros AL, Montevechi JAB. Advancing credit risk modelling with Machine Learning: A comprehensive review of the state-of-the-art. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE 2024; 137:109082. [DOI: 10.1016/j.engappai.2024.109082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2025]
|
74
|
Shelley B, Shaw M. Machine learning and preoperative risk prediction: the machines are coming. Br J Anaesth 2024; 133:925-930. [PMID: 39209700 DOI: 10.1016/j.bja.2024.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 07/18/2024] [Accepted: 07/18/2024] [Indexed: 09/04/2024] Open
Abstract
Preoperative risk prediction is an important component of perioperative medicine. Machine learning is a powerful tool that could lead to increasingly complex risk prediction models with improved predictive performance. Careful consideration is required to guide the machine learning approach to ensure appropriate decisions are made with regard to what we are trying to predict, when we are trying to predict it, and what we seek to do with the results.
Affiliation(s)
- Ben Shelley
- Department of Cardiothoracic Anaesthesia and Intensive Care, Golden Jubilee National Hospital, Clydebank, UK; Anaesthesia, Perioperative Medicine and Critical Care Research Group, University of Glasgow, Glasgow, UK.
| | - Martin Shaw
- Anaesthesia, Perioperative Medicine and Critical Care Research Group, University of Glasgow, Glasgow, UK; Department of Clinical Physics and Bioengineering, NHS Greater Glasgow and Clyde, Glasgow, UK
| |
|
75
|
Chen T, Yi Y. Multi-Strategy Enhanced Parrot Optimizer: Global Optimization and Feature Selection. Biomimetics (Basel) 2024; 9:662. [PMID: 39590234 PMCID: PMC11591862 DOI: 10.3390/biomimetics9110662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Revised: 10/24/2024] [Accepted: 10/30/2024] [Indexed: 11/28/2024] Open
Abstract
Optimization algorithms are pivotal in addressing complex problems across diverse domains, including global optimization and feature selection (FS). In this paper, we introduce the Enhanced Crisscross Parrot Optimizer (ECPO), an improved version of the Parrot Optimizer (PO), designed to address these challenges effectively. The ECPO incorporates a sophisticated strategy selection mechanism that allows individuals to retain successful behaviors from prior iterations and shift to alternative strategies in case of update failures. Additionally, the integration of a crisscross (CC) mechanism promotes more effective information exchange among individuals, enhancing the algorithm's exploration capabilities. The proposed algorithm's performance is evaluated through extensive experiments on the CEC2017 benchmark functions, where it is compared with ten other conventional optimization algorithms. Results demonstrate that the ECPO consistently outperforms these algorithms across various fitness landscapes. Furthermore, a binary version of the ECPO is developed and applied to FS problems on ten real-world datasets, demonstrating its ability to achieve competitive error rates with reduced feature subsets. These findings suggest that the ECPO holds promise as an effective approach for both global optimization and feature selection.
Affiliation(s)
| | - Yuanyuan Yi
- College of Geophysics and Petroleum Resources, Yangtze University, Wuhan 430100, China;
| |
|
76
|
Breimann S, Frishman D. AAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales. BIOINFORMATICS ADVANCES 2024; 4:vbae165. [PMID: 39544628 PMCID: PMC11562964 DOI: 10.1093/bioadv/vbae165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/10/2024] [Accepted: 10/23/2024] [Indexed: 11/17/2024]
Abstract
Summary Amino acid scales are crucial for sequence-based protein prediction tasks, yet no gold standard scale set or simple scale selection methods exist. We developed AAclust, a wrapper for clustering models that require a pre-defined number of clusters k, such as k-means. AAclust obtains redundancy-reduced scale sets by clustering and selecting one representative scale per cluster, where k can either be optimized by AAclust or defined by the user. The utility of AAclust scale selections was assessed by applying machine learning models to 24 protein benchmark datasets. We found that top-performing scale sets were different for each benchmark dataset and significantly outperformed scale sets used in previous studies. Noteworthy is the strong dependence of the model performance on the scale set size. AAclust enables a systematic optimization of scale-based feature engineering in machine learning applications. Availability and implementation The AAclust algorithm is part of AAanalysis, a Python-based framework for interpretable sequence-based protein prediction, which is documented and accessible at https://aaanalysis.readthedocs.io/en/latest and https://github.com/breimanntools/aaanalysis.
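The idea of keeping one representative scale per cluster can be sketched as follows. This is an illustrative stand-in, not the AAclust algorithm or the AAanalysis API: the abstract does not state how representatives are chosen, so the sketch assumes that cluster labels are already available and that the representative is the member closest (in Euclidean distance) to the cluster centroid. All names and values are hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of equal-length numeric vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_representatives(scales, labels):
    """Pick one representative per cluster.

    scales: dict mapping scale name -> list of numeric values.
    labels: dict mapping scale name -> cluster id (assumed precomputed,
            e.g. by k-means; the clustering itself is not shown here).
    Returns a dict mapping cluster id -> name of the member closest to
    the cluster centroid. Ties are broken alphabetically by name.
    """
    clusters = defaultdict(list)
    for name, label in labels.items():
        clusters[label].append(name)

    representatives = {}
    for label, names in clusters.items():
        center = centroid([scales[name] for name in names])
        representatives[label] = min(
            sorted(names), key=lambda name: math.dist(scales[name], center)
        )
    return representatives


# Hypothetical toy scales, for illustration only.
toy_scales = {
    "s1": [0.0, 0.0],
    "s2": [1.0, 0.0],
    "s3": [0.4, 0.0],
    "s4": [10.0, 10.0],
}
toy_labels = {"s1": 0, "s2": 0, "s3": 0, "s4": 1}
reps = select_representatives(toy_scales, toy_labels)
print(reps)
```

Selecting an actual member rather than the centroid itself keeps the reduced set made of real, interpretable scales, which is the point of a redundancy-reduced selection.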
Affiliation(s)
- Stephan Breimann
- Department of Bioinformatics, School of Life Sciences, Technical University of Munich (TUM), Freising, 85354, Germany
- Division of Metabolic Biochemistry, Biomedical Center (BMC), LMU Munich, Munich, 81377, Germany
- Biochemistry of γ-Secretase, German Center for Neurodegenerative Diseases (DZNE), Munich, 81377, Germany
| | - Dmitrij Frishman
- Department of Bioinformatics, School of Life Sciences, Technical University of Munich (TUM), Freising, 85354, Germany
| |
|
77
|
Abdi B, Kolo K, Shahabi H. Assessment of land degradation susceptibility within the Shaqlawa subregion of Northern Iraq-Kurdistan Region via synergistic application of remotely acquired datasets and advanced predictive models. ENVIRONMENTAL MONITORING AND ASSESSMENT 2024; 196:1103. [PMID: 39453413 DOI: 10.1007/s10661-024-13284-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Accepted: 10/16/2024] [Indexed: 10/26/2024]
Abstract
Land degradation (LD) is the decline in a land's functional capacity and productive potential, arising from various anthropogenic and natural drivers. This study focuses on three primary manifestations of LD, namely soil erosion, landslides, and rockfalls, which are the most prevalent in the Shaqlawa district. A set of 22 LD conditioning factors, encompassing curvature, lithology, aspect, river density, soil type, lineament density, river distance, elevation, road distance, length slope (LS), land use land cover (LULC), stream power index (SPI), valley depth, profile curvature, slope, solar radiation, road density, lineament distance, rainfall, topographic wetness index (TWI), plan curvature, and normalized difference vegetation index (NDVI), were integrated into the analysis. Variance inflation factors (VIF) and tolerance (TOL) values from linear regression indicate that most LD factors have acceptable levels of multicollinearity. The Information Gain Ratio (IGR) identified three key variables (TWI, NDVI, and lithology) as pivotal factors for predicting LD. Additionally, the study evaluated degradation factors using various machine learning (ML) algorithms, including random forest (RF), Naive Bayes, logistic regression, rotation forest, forest penalized attributes (FPA), and Fisher's Linear discriminant analysis (FLDA). This facilitated categorizing the study area into five susceptibility categories. The FLDA model assigned the largest share of the area (26.72%) to the very high degradation risk category, emphasizing the varied insights each algorithm brought to characterizing the degradation risk. Additionally, receiver operating characteristic (ROC) curves were employed for model validation, identifying RF as the most successful model in the training dataset with an area under the curve (AUC) of 0.882, while FLDA performed best in the testing dataset with an AUC of 0.883. The identified LD-prone areas will help land-use planners and emergency management officials apply effective mitigation strategies for similar terrains.
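The AUC used above for model validation can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The following minimal sketch computes it by direct pairwise comparison; the function name and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: iterable of model scores, higher meaning more likely positive.
    Each positive/negative pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise; the AUC is the mean over all pairs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.8, 0.3, 0.6, 0.2]
auc = roc_auc(toy_labels, toy_scores)
print(round(auc, 4))
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; rank-based formulas give the same value more efficiently on large datasets.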
Affiliation(s)
- Badeea Abdi
- Department of Petroleum Geoscience, Faculty of Science, Soran University, Soran, Erbil, Iraq.
| | - Kamal Kolo
- Department of Biogeosciences, Scientific Research Center, Soran University, Soran, Iraq
| | - Himan Shahabi
- Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran
- Division of Geochronology and Environmental Isotopes, Institute of Physics, Silesian University of Technology, 44-100, Gliwice, Poland
| |
|
78
|
Sun B, Xu Y, Kat S, Sun A, Yin T, Zhao L, Su X, Chen J, Wang H, Gong X, Liu Q, Han G, Peng S, Li X, Liu J. Exploring the most discriminative brain structural abnormalities in ASD with multi-stage progressive feature refinement approach. Front Psychiatry 2024; 15:1463654. [PMID: 39483728 PMCID: PMC11524921 DOI: 10.3389/fpsyt.2024.1463654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 09/23/2024] [Indexed: 11/03/2024] Open
Abstract
Objective Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by increasing prevalence, diverse impairments, and unclear origins and mechanisms. To gain a better grasp of the origins of ASD, it is essential to identify the most distinctive structural brain abnormalities in individuals with ASD. Methods A Multi-Stage Progressive Feature Refinement Approach was employed to identify the most pivotal structural magnetic resonance imaging (MRI) features that distinguish individuals with ASD from typically developing (TD) individuals. The study included 175 individuals with ASD and 69 TD individuals, all aged between 7 and 18 years, matched in terms of age and gender. Both cortical and subcortical features were integrated, with a particular focus on hippocampal subfields. Results Out of 317 features, 9 had the most significant impact on distinguishing ASD from TD individuals. These structural features, which include a specific hippocampal subfield, are closely related to the brain areas associated with the reward system. Conclusion Structural irregularities in the reward system may play a crucial role in the pathophysiology of ASD, and specific hippocampal subfields may also contribute uniquely, warranting further investigation.
Affiliation(s)
- Bingxi Sun
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Yingying Xu
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Siuching Kat
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Anlan Sun
- Yizhun Medical AI Co., Ltd, Algorithm and Development Department, Beijing, China
| | - Tingni Yin
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Liyang Zhao
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Xing Su
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Jialu Chen
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Hui Wang
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Xiaoyun Gong
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Qinyi Liu
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Gangqiang Han
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Shuchen Peng
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Xue Li
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| | - Jing Liu
- Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
| |
|
79
|
Ngusie HS, Enyew EB, Walle AD, Tilahun Assaye B, Kasaye MD, Tesfa GA, Zemariam AB. Employing machine learning techniques for prediction of micronutrient supplementation status during pregnancy in East African Countries. Sci Rep 2024; 14:23827. [PMID: 39394461 PMCID: PMC11470067 DOI: 10.1038/s41598-024-75455-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 10/04/2024] [Indexed: 10/13/2024] Open
Abstract
Micronutrient deficiencies, known as "hidden hunger" or "hidden malnutrition," pose a significant health risk to pregnant women, particularly in low-income countries like the East Africa region. This study employed eight advanced machine learning algorithms to predict the status of micronutrient supplementation among pregnant women in 12 East African countries, using recent demographic health survey (DHS) data. The analysis involved 138,426 study samples, and algorithm performance was evaluated using accuracy, area under the ROC curve (AUC), specificity, precision, recall, and F1-score. Among the algorithms tested, the random forest classifier emerged as the top performer in predicting micronutrient supplementation status, exhibiting excellent evaluation scores (AUC = 0.892 and accuracy = 94.0%). By analyzing mean SHAP values and performing association rule mining, we gained valuable insights into the importance of different variables and their combined impact, revealing hidden patterns within the data. Key predictors of micronutrient supplementation were the mother's education level, employment status, number of antenatal care (ANC) visits, access to media, number of children, and religion. By harnessing the power of machine learning algorithms, policymakers and healthcare providers can develop targeted strategies to improve the uptake of micronutrient supplementation. Key intervention components involve enhancing education, strengthening ANC services, and implementing comprehensive media campaigns that emphasize the importance of micronutrient supplementation. It is also crucial to consider cultural and religious sensitivities when designing interventions to ensure their effectiveness and acceptance within the specific population. 
Furthermore, researchers are encouraged to explore and experiment with various techniques to optimize algorithm performance, leading to the identification of the most effective predictors and enhanced accuracy in predicting micronutrient supplementation status.
Affiliation(s)
- Habtamu Setegn Ngusie
- Department of Health Informatics, School of Public Health, College of Medicine and Health Sciences, Woldia University, PO Box 400, Woldia, Amhara, Ethiopia.
| | - Ermias Bekele Enyew
- Department of Health Informatics, College of Medicine and Health Science, Wollo University, Desie, Ethiopia
| | - Agmasie Damtew Walle
- Department of Health Informatics, College of Medicine and Health Science, Debre Berhan University, Debre Berhan, Ethiopia
| | - Bayou Tilahun Assaye
- Department of Health Informatics, College of Health Science, Debre Markos University, Debre Markos, Ethiopia
| | - Mulugeta Desalegn Kasaye
- Department of Health Informatics, College of Medicine and Health Science, Wollo University, Desie, Ethiopia
| | | | - Alemu Birara Zemariam
- Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
| |
Collapse
|
80
|
Kim Y, Kang H, Seo H, Choi H, Kim M, Han J, Kee G, Park S, Ko S, Jung H, Kim B, Jun TJ, Roh JH, Kim YH. Development and transfer learning of self-attention model for major adverse cardiovascular events prediction across hospitals. Sci Rep 2024; 14:23443. [PMID: 39379478 PMCID: PMC11461710 DOI: 10.1038/s41598-024-74366-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 09/25/2024] [Indexed: 10/10/2024]
Abstract
Predicting major adverse cardiovascular events (MACE) is crucial because of their high readmission rate and severe sequelae. Current risk-scoring models for MACE are based on a few features of a patient's status at a single time point. We developed a self-attention-based model to predict MACE within 3 years from time-series data, utilizing numerous features in electronic medical records (EMRs). In addition, we demonstrated transfer learning for hospitals with insufficient data through code mapping and feature selection based on importance calculated using XGBoost. We established operational definitions and categories for diagnoses, medications, and laboratory tests to streamline scattered codes, enhancing clinical interpretability across hospitals. This resulted in reduced feature size and improved data quality for transfer learning. The pre-trained model demonstrated an increase in AUROC after transfer learning, from 0.564 to 0.821. Furthermore, to validate the effectiveness of the predicted scores, we analyzed the data using traditional survival analysis, which confirmed an elevated hazard ratio for the group with high scores.
Affiliation(s)
- Yunha Kim
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Heejun Kang
  - Division of Cardiology, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Hyeram Seo
  - Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Heejung Choi
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Minkyoung Kim
  - Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- JiYe Han
  - Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Gaeun Kee
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Seohyun Park
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Soyoung Ko
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- HyoJe Jung
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Byeolhee Kim
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
- Tae Joon Jun
  - Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea.
- Jae-Hyung Roh
  - Department of Internal Medicine, Chungnam National University College of Medicine, Chungnam National University Sejong Hospital, 20, Bodeum 7-Ro, Sejong-Si, Sejong, 30099, Republic of Korea
- Young-Hak Kim
  - Division of Cardiology, Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-ro 43-gil, Songpa-gu, Songpagu, Seoul, 05505, Republic of Korea.
81
Kaur D, Arora A, Vigneshwar P, Raghava GPS. Prediction of peptide hormones using an ensemble of machine learning and similarity-based methods. Proteomics 2024; 24:e2400004. [PMID: 38803012 DOI: 10.1002/pmic.202400004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 04/29/2024] [Accepted: 05/13/2024] [Indexed: 05/29/2024]
Abstract
Peptide hormones serve as genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. Initially, we developed similarity-based methods utilizing BLAST and MERCI software. Although these similarity-based methods provided a high probability of correct prediction, they had limitations, such as no hits or prediction of limited sequences. To overcome these limitations, we further developed machine and deep learning-based models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent/validation dataset. To harness the power of similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96 with an accuracy of 89.79% and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To facilitate researchers in predicting and designing hormone peptides, we developed a web-based server called HOPPred. This server offers a unique feature that allows the identification of hormone-associated motifs within hormone peptides. The server can be accessed at: https://webs.iiitd.edu.in/raghava/hoppred/.
Affiliation(s)
- Dashleen Kaur
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Akanksha Arora
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Palani Vigneshwar
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
82
Yousef H, Malagurski Tortei B, Castiglione F. Predicting multiple sclerosis disease progression and outcomes with machine learning and MRI-based biomarkers: a review. J Neurol 2024; 271:6543-6572. [PMID: 39266777 PMCID: PMC11447111 DOI: 10.1007/s00415-024-12651-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 08/16/2024] [Accepted: 08/17/2024] [Indexed: 09/14/2024]
Abstract
Multiple sclerosis (MS) is a demyelinating neurological disorder with a highly heterogeneous clinical presentation and course of progression. Disease-modifying therapies are the only available treatment, as there is no known cure for the disease. Careful selection of suitable therapies is necessary, as they can be accompanied by serious risks and adverse effects such as infection. Magnetic resonance imaging (MRI) plays a central role in the diagnosis and management of MS, though MRI lesions have displayed only moderate associations with MS clinical outcomes, known as the clinico-radiological paradox. With the advent of machine learning (ML) in healthcare, the predictive power of MRI can be improved by leveraging both traditional and advanced ML algorithms capable of analyzing increasingly complex patterns within neuroimaging data. The purpose of this review was to examine the application of MRI-based ML for prediction of MS disease progression. Studies were divided into five main categories: predicting the conversion of clinically isolated syndrome to MS, cognitive outcome, EDSS-related disability, motor disability and disease activity. The performance of ML models is discussed along with highlighting the influential MRI-derived biomarkers. Overall, MRI-based ML presents a promising avenue for MS prognosis. However, integration of imaging biomarkers with other multimodal patient data shows great potential for advancing personalized healthcare approaches in MS.
Affiliation(s)
- Hibba Yousef
  - Technology Innovation Institute, Biotechnology Research Center, P.O.Box: 9639, Masdar City, Abu Dhabi, United Arab Emirates.
- Brigitta Malagurski Tortei
  - Technology Innovation Institute, Biotechnology Research Center, P.O.Box: 9639, Masdar City, Abu Dhabi, United Arab Emirates
- Filippo Castiglione
  - Technology Innovation Institute, Biotechnology Research Center, P.O.Box: 9639, Masdar City, Abu Dhabi, United Arab Emirates
  - Institute for Applied Computing (IAC), National Research Council of Italy, Rome, Italy
83
Shao W, Lin X, Huang Y, Qu L, Zhuo W, Liu H. Rapid patient-specific organ dose estimation in computed tomography scans via integration of radiomics features and neural networks. Quant Imaging Med Surg 2024; 14:7379-7391. [PMID: 39429608 PMCID: PMC11485356 DOI: 10.21037/qims-24-645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 08/22/2024] [Indexed: 10/22/2024]
Abstract
Background: Computed tomography (CT) offers detailed cross-sectional images of internal anatomy for disease detection but carries a risk of solid cancer or blood malignancies due to exposure to X-ray radiation. This study aimed to develop a new method to quickly predict patient-specific organ doses from CT examinations by training neural networks (NNs) based on radiomics features. Methods: CT Digital Imaging and Communications in Medicine (DICOM) image data were exported to DeepViewer, a clinical autosegmentation software, to segment the regions of interest (ROIs) for patient organs. Radiomics feature extraction was performed based on the selected CT data and ROIs. Reference organ doses were computed using Monte Carlo (MC) simulations. Patient-specific organ doses were predicted by training a NN model based on radiomics features and reference doses. For the dose prediction performance, the relative root mean squared error (RRMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²) were evaluated on the test sets. The robustness of the NN model was evaluated via the random rearrangement of patient samples in the training and test sets. Results: The maximal difference between the reference and predicted doses was less than 1 mGy for all investigated organs. The range of MAPE was 1.68% to 5.2% for head organs, 11.42% to 15.2% for chest organs, and 5.0% to 8.0% for abdominal organs; the maximal R² values were 0.93, 0.86, and 0.89 for the head, chest, and abdominal organs, respectively. Conclusions: The radiomics feature-based NN model can achieve accurate prediction of patient-specific organ doses at a high speed of less than 1 second using a single central processing unit, which supports its use as a user-friendly online clinical application.
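The three error measures named in this abstract can be illustrated with a short sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: RRMSE is taken as the RMSE divided by the mean reference dose (other normalisations exist), MAPE is expressed in percent, and the function name `dose_metrics` and the example doses are hypothetical.

```python
import numpy as np

def dose_metrics(reference, predicted):
    """Return (RRMSE, MAPE in %, R^2) for predicted vs. reference doses."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - ref

    # Assumed convention: RMSE normalised by the mean reference dose.
    rrmse = np.sqrt(np.mean(err ** 2)) / np.mean(ref)

    # Mean absolute percentage error, in percent.
    mape = 100.0 * np.mean(np.abs(err) / np.abs(ref))

    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((ref - ref.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return rrmse, mape, r2

# Hypothetical organ doses in mGy, for illustration only.
rrmse, mape, r2 = dose_metrics([10.0, 20.0, 30.0], [11.0, 19.0, 33.0])
print(f"RRMSE={rrmse:.4f}  MAPE={mape:.2f}%  R2={r2:.3f}")
```

All three values are computed on the same held-out predictions, which is why the abstract can report them side by side for each organ group.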
Affiliation(s)
- Wencheng Shao
  - Institute of Radiation Medicine, Fudan University, Shanghai, China
- Xin Lin
  - Institute of Radiation Medicine, Fudan University, Shanghai, China
- Ying Huang
  - Institute of Modern Physics, Fudan University, Shanghai, China
- Liangyong Qu
  - Department of Radiology, Shanghai Zhongye Hospital, Shanghai, China
- Weihai Zhuo
  - Institute of Radiation Medicine, Fudan University, Shanghai, China
- Haikuan Liu
  - Institute of Radiation Medicine, Fudan University, Shanghai, China
84
Liu S, Shi T, Yu J, Li R, Lin H, Deng K. Research on Bitter Peptides in the Field of Bioinformatics: A Comprehensive Review. Int J Mol Sci 2024; 25:9844. [PMID: 39337334 PMCID: PMC11432553 DOI: 10.3390/ijms25189844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 09/06/2024] [Accepted: 09/09/2024] [Indexed: 09/30/2024]
Abstract
Bitter peptides are small molecular peptides produced by the hydrolysis of proteins under acidic, alkaline, or enzymatic conditions. These peptides can enhance food flavor and offer various health benefits, with attributes such as antihypertensive, antidiabetic, antioxidant, antibacterial, and immune-regulating properties. They show significant potential in the development of functional foods and the prevention and treatment of diseases. This review introduces the diverse sources of bitter peptides and discusses the mechanisms of bitterness generation and their physiological functions in the taste system. Additionally, it emphasizes the application of bioinformatics in bitter peptide research, including the establishment and improvement of bitter peptide databases, the use of quantitative structure-activity relationship (QSAR) models to predict bitterness thresholds, and the latest advancements in classification prediction models built using machine learning and deep learning algorithms for bitter peptide identification. Future research directions include enhancing databases, diversifying models, and applying generative models to deepen bitter peptide research and uncover more practical applications.
Affiliation(s)
- Hao Lin
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
- Kejun Deng
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
85
Hernandez-Laredo E, Estévez-Pedraza ÁG, Santiago-Fuentes LM, Parra-Rodríguez L. Optimizing Fall Risk Diagnosis in Older Adults Using a Bayesian Classifier and Simulated Annealing. Bioengineering (Basel) 2024; 11:908. [PMID: 39329650 PMCID: PMC11429116 DOI: 10.3390/bioengineering11090908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Revised: 09/06/2024] [Accepted: 09/06/2024] [Indexed: 09/28/2024]
Abstract
The aim of this study was to improve the diagnostic ability of fall risk classifiers using a Bayesian approach and the Simulated Annealing (SA) algorithm. A total of 47 features from 181 records (40 Center of Pressure (CoP) indices and 7 patient descriptive variables) were analyzed. The wrapper method of feature selection using the SA algorithm was applied to optimize the cost function based on the difference of the mean minus the standard deviation of the Area Under the Curve (AUC) of the fall risk classifiers across multiple dimensions. A stratified 60-20-20% hold-out method was used for train, test, and validation sets, respectively. The results showed that although the highest performance was observed with 31 features (0.815 ± 0.110), lower variability and higher explainability were achieved with only 15 features (0.780 ± 0.055). These findings suggest that the SA algorithm is a valuable tool for feature selection for acceptable fall risk diagnosis. This method offers an alternative or complementary resource in situations where clinical tools are difficult to apply.
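The cost function described in this abstract (mean AUC minus its standard deviation) and a wrapper search driven by simulated annealing can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the single-bit-flip neighbourhood, the geometric cooling schedule, and the use of the population standard deviation are all hypothetical choices, and `evaluate` stands in for whatever routine trains the classifier on a feature subset and returns a list of AUC values.

```python
import math
import random

def mean_minus_std(aucs):
    """Cost to maximise: mean AUC minus its (population) standard deviation."""
    n = len(aucs)
    mean = sum(aucs) / n
    var = sum((a - mean) ** 2 for a in aucs) / n
    return mean - math.sqrt(var)

def anneal_features(n_features, evaluate, n_iter=500, t0=0.1, cooling=0.99, seed=0):
    """Search for a boolean feature mask that maximises mean_minus_std(evaluate(mask))."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    if not any(mask):
        mask[0] = True  # keep at least one feature selected
    score = mean_minus_std(evaluate(mask))
    best_mask, best_score = mask[:], score
    temp = t0
    for _ in range(n_iter):
        # Neighbour: flip the selection state of one randomly chosen feature.
        cand = mask[:]
        i = rng.randrange(n_features)
        cand[i] = not cand[i]
        if any(cand):
            cand_score = mean_minus_std(evaluate(cand))
            delta = cand_score - score
            # Always accept improvements; accept worse moves with
            # probability exp(delta / temp), which shrinks as temp cools.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                mask, score = cand, cand_score
                if score > best_score:
                    best_mask, best_score = mask[:], score
        temp *= cooling
    return best_mask, best_score
```

If the reported "±" values are read as standard deviations, which the abstract does not state explicitly, this cost gives 0.815 - 0.110 = 0.705 for the 31-feature set and 0.780 - 0.055 = 0.725 for the 15-feature set, which is consistent with the smaller set being favoured for its lower variability.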
Affiliation(s)
- Enrique Hernandez-Laredo
  - Tianguistenco Professional Academic Unit, Autonomous University of the State of Mexico, Tianguistenco 52640, Mexico
- Ángel Gabriel Estévez-Pedraza
  - Tianguistenco Professional Academic Unit, Autonomous University of the State of Mexico, Tianguistenco 52640, Mexico
86
Miller C, Portlock T, Nyaga DM, O'Sullivan JM. A review of model evaluation metrics for machine learning in genetics and genomics. FRONTIERS IN BIOINFORMATICS 2024; 4:1457619. [PMID: 39318760 PMCID: PMC11420621 DOI: 10.3389/fbinf.2024.1457619] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/01/2024] [Accepted: 08/27/2024] [Indexed: 09/26/2024]
Abstract
Machine learning (ML) has shown great promise in genetics and genomics where large and complex datasets have the potential to provide insight into many aspects of disease risk, pathogenesis of genetic disorders, and prediction of health and wellbeing. However, with this possibility there is a responsibility to exercise caution against biases and inflation of results that can have harmful unintended impacts. Therefore, researchers must understand the metrics used to evaluate ML models which can influence the critical interpretation of results. In this review we provide an overview of ML metrics for clustering, classification, and regression and highlight the advantages and disadvantages of each. We also detail common pitfalls that occur during model evaluation. Finally, we provide examples of how researchers can assess and utilise the results of ML models, specifically from a genomics perspective.
Affiliation(s)
- Catriona Miller
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
- Theo Portlock
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
- Denis M Nyaga
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
- Justin M O'Sullivan
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
  - The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
  - MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
  - Singapore Institute for Clinical Sciences, Agency for Science Technology and Research, Singapore, Singapore
87
Khan A, Zubair S, Shuaib M, Sheneamer A, Alam S, Assiri B. Development of a robust parallel and multi-composite machine learning model for improved diagnosis of Alzheimer's disease: correlation with dementia-associated drug usage and AT(N) protein biomarkers. Front Neurosci 2024; 18:1391465. [PMID: 39308946 PMCID: PMC11412962 DOI: 10.3389/fnins.2024.1391465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Accepted: 08/12/2024] [Indexed: 09/25/2024]
Abstract
Introduction: Machine learning (ML) algorithms and statistical modeling offer a potential solution to offset the challenge of diagnosing early Alzheimer's disease (AD) by leveraging multiple data sources and combining information on neuropsychological, genetic, and biomarker indicators. Among others, statistical models are a promising tool to enhance the clinical detection of early AD. In the present study, early AD was diagnosed by taking into account characteristics related to whether or not a patient was taking specific drugs and a significant protein as a predictor of Amyloid-Beta (Aβ), tau, and ptau [AT(N)] levels among participants. Methods: In this study, the optimization of predictive models for the diagnosis of AD pathologies was carried out using a set of baseline features. The model performance was improved by incorporating additional variables associated with patient drugs and protein biomarkers into the model. The diagnostic group consisted of five categories (cognitively normal, significant subjective memory concern, early mildly cognitively impaired, late mildly cognitively impaired, and AD), resulting in a multinomial classification challenge. In particular, we examined the relationship between AD diagnosis and the use of various drugs (calcium and vitamin D supplements, blood-thinning drugs, cholesterol-lowering drugs, and cognitive drugs). We propose a hybrid-clinical model that runs multiple ML models in parallel and then takes a majority vote, enhancing the accuracy. We also assessed the significance of three cerebrospinal fluid biomarkers, Aβ, tau, and ptau, in the diagnosis of AD. We proposed that a hybrid-clinical model be used to simulate the MRI-based data, with five diagnostic groups of individuals, with further refinement that includes preclinical characteristics of the disorder. The proposed design builds a Meta-Model for four different sets of criteria. The four sets are: diagnosis from baseline features; from baseline and drug features; from baseline and protein features; and from baseline, drug, and protein features. Results: We were able to attain a maximum accuracy of 97.60% for baseline and protein data. We observed that the constructed model functioned effectively when all five drugs were included and when any single drug was used to diagnose the response variable. Interestingly, the constructed Meta-Model worked well when all three protein biomarkers were included, as well as when a single protein biomarker was utilized to diagnose the response variable. Discussion: It is noteworthy that we aimed to construct a pipeline design that incorporates comprehensive methodologies to detect Alzheimer's over wide-ranging input values and variables in the current study. Thus, the model that we developed could be used by clinicians and medical experts to advance Alzheimer's diagnosis and as a starting point for future research into AD and other neurodegenerative syndromes.
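The voting step described in the Methods, where several models run in parallel and the majority decides, can be sketched in a few lines. This is an illustrative sketch only: the abstract does not say which base models are combined or how ties are resolved, so the function name, the tie rule (the label seen first among the tied ones wins, which favours earlier models in the list), and the class abbreviations (CN, SMC, EMCI, LMCI, AD for the five diagnostic groups) are assumptions.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard majority vote.

    predictions_per_model: one equal-length list of labels per model.
    Ties go to the label encountered first, i.e. earlier models win.
    """
    voted = []
    for labels in zip(*predictions_per_model):
        # most_common(1) returns the (label, count) pair with the top count.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Three hypothetical models predicting for three patients.
model_outputs = [
    ["CN", "AD", "EMCI"],
    ["CN", "LMCI", "EMCI"],
    ["SMC", "AD", "AD"],
]
print(majority_vote(model_outputs))  # ['CN', 'AD', 'EMCI']
```

Hard voting of this kind needs only the predicted labels, so it works across heterogeneous base models without requiring calibrated probabilities.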
Affiliation(s)
- Afreen Khan
  - Department of Computer Application, Faculty of Engineering & IT, Integral University, Lucknow, India
- Swaleha Zubair
  - Department of Computer Science, Faculty of Science, Aligarh Muslim University, Aligarh, India
- Mohammed Shuaib
  - Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
- Abdullah Sheneamer
  - Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
- Shadab Alam
  - Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
- Basem Assiri
  - Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
88
Selwyn JD, Despard BA, Vollmer MV, Trytten EC, Vollmer SV. Identification of putative coral pathogens in endangered Caribbean staghorn coral using machine learning. Environ Microbiol 2024; 26:e16700. [PMID: 39289821 DOI: 10.1111/1462-2920.16700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 08/27/2024] [Indexed: 09/19/2024]
Abstract
Coral diseases contribute to the rapid decline in coral reefs worldwide, and yet coral bacterial pathogens have proved difficult to identify because 16S rRNA gene surveys typically identify tens to hundreds of disease-associated bacteria as putative pathogens. An example is white band disease (WBD), which has killed up to 95% of the now-endangered Caribbean Acropora corals since 1979, yet the pathogen is still unknown. The 16S rRNA gene surveys have identified hundreds of WBD-associated bacterial amplicon sequencing variants (ASVs) from at least nine bacterial families with little consensus across studies. We conducted a multi-year, multi-site 16S rRNA gene sequencing comparison of 269 healthy and 143 WBD-infected Acropora cervicornis and used machine learning modelling to accurately predict disease outcomes and identify the top ASVs contributing to disease. Our ensemble ML models predicted disease with greater than 97% accuracy and identified 19 disease-associated ASVs and five healthy-associated ASVs that were consistently differentially abundant across sampling periods. Using a tank-based transmission experiment, we tested whether the 19 disease-associated ASVs met the assumption of a pathogen and identified two pathogenic candidate ASVs, ASV25 (Cysteiniphilum litorale) and ASV8 (Vibrio sp.), to target for future isolation, cultivation, and confirmation of Henle-Koch's postulates via transmission assays.
Affiliation(s)
- Jason D Selwyn
  - Marine Science Center, Northeastern University, Nahant, Massachusetts, USA
  - Department of Marine and Environmental Sciences, Northeastern University, Boston, Massachusetts, USA
- Brecia A Despard
  - Marine Science Center, Northeastern University, Nahant, Massachusetts, USA
  - Department of Marine and Environmental Sciences, Northeastern University, Boston, Massachusetts, USA
- Miles V Vollmer
  - Marine Science Center, Northeastern University, Nahant, Massachusetts, USA
  - Department of Marine and Environmental Sciences, Northeastern University, Boston, Massachusetts, USA
- Emily C Trytten
  - Marine Science Center, Northeastern University, Nahant, Massachusetts, USA
  - Department of Marine and Environmental Sciences, Northeastern University, Boston, Massachusetts, USA
- Steven V Vollmer
  - Marine Science Center, Northeastern University, Nahant, Massachusetts, USA
  - Department of Marine and Environmental Sciences, Northeastern University, Boston, Massachusetts, USA
89
Elkahwagy DMAS, Kiriacos CJ, Mansour M. Logistic regression and other statistical tools in diagnostic biomarker studies. Clin Transl Oncol 2024; 26:2172-2180. [PMID: 38530558 PMCID: PMC11333519 DOI: 10.1007/s12094-024-03413-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 02/16/2024] [Indexed: 03/28/2024]
Abstract
A biomarker is a measured indicator of a variety of processes, and is often used as a clinical tool for the diagnosis of diseases. While the developmental process of biomarkers from lab to clinic is complex, initial exploratory stages often focus on characterizing the potential of biomarkers through utilizing various statistical methods that can be used to assess their discriminatory performance, establish an appropriate cut-off that transforms continuous data to apt binary responses of confirming or excluding a diagnosis, or establish a robust association when tested against confounders. This review aims to provide a gentle introduction to the most common tools found in diagnostic biomarker studies used to assess the performance of biomarkers with an emphasis on logistic regression.
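To make the idea of a cut-off that turns a continuous biomarker into a binary call more concrete, here is a small sketch using Youden's J statistic (sensitivity + specificity - 1). The abstract does not name a specific cut-off method, so Youden's J is an assumption chosen because it is a widely used criterion; the function name and the example values are hypothetical, and the sketch assumes that higher biomarker values indicate disease and that both classes are present.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    values: continuous biomarker measurements.
    labels: 1 for diseased, 0 for non-diseased.
    A sample is called positive when its value is >= cutoff.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -2.0  # J is always >= -1
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical, perfectly separated biomarker values.
print(youden_cutoff([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1]))  # (4.0, 1.0)
```

In practice the chosen cut-off should be validated on data not used to select it, since a threshold tuned on one sample tends to look better than it will perform elsewhere.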
Affiliation(s)
- Caroline Joseph Kiriacos
  - Pharmaceutical Biology Department, Faculty of Pharmacy and Biotechnology, German University in Cairo, Cairo, 11835, Egypt
- Manar Mansour
  - Pharmaceutical Biology Department, Faculty of Pharmacy and Biotechnology, German University in Cairo, Cairo, 11835, Egypt
90
Yoon SB, Lee JM, Jung CW, Suh KS, Lee KW, Yi NJ, Hong SK, Choi Y, Hong SY, Lee HC. Machine-learning model to predict the tacrolimus concentration and suggest optimal dose in liver transplantation recipients: a multicenter retrospective cohort study. Sci Rep 2024; 14:19996. [PMID: 39198694 PMCID: PMC11358263 DOI: 10.1038/s41598-024-71032-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 08/23/2024] [Indexed: 09/01/2024]
Abstract
Titrating tacrolimus concentration in liver transplantation recipients remains a challenge in the early post-transplant period. This multicenter retrospective cohort study aimed to develop and validate a machine-learning algorithm to predict tacrolimus concentration. Data from 443 patients undergoing liver transplantation between 2017 and 2020 at an academic hospital in South Korea were collected to train machine-learning models. Long short-term memory (LSTM) and gradient-boosted regression tree (GBRT) models were developed using time-series doses and concentrations of tacrolimus with covariates of age, sex, weight, height, liver enzymes, total bilirubin, international normalized ratio, albumin, serum creatinine, and hematocrit. We conducted performance comparisons with linear regression and population pharmacokinetic models, followed by external validation using the eICU Collaborative Research Database collected in the United States between 2014 and 2015. In the external validation, the LSTM outperformed the GBRT, linear regression, and population pharmacokinetic models in terms of median performance error (8.8%, 25.3%, 13.9%, and -11.4%, respectively; P < 0.001) and median absolute performance error (22.3%, 33.1%, 26.8%, and 23.4%, respectively; P < 0.001). Dosing based on the LSTM model's suggestions achieved therapeutic concentrations more frequently on the chi-square test (P < 0.001). Patients who received doses outside the suggested range had ICU stays that were longer by an average of 2.5 days (P = 0.042). In conclusion, machine learning models showed excellent performance in predicting tacrolimus concentration in liver transplantation recipients and can be useful for concentration titration in these patients.
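The two summary measures compared in this abstract, median performance error and median absolute performance error, can be illustrated as below. The abstract does not define the performance error, so this sketch assumes the convention common in pharmacokinetic evaluation, PE = 100 x (measured - predicted) / predicted; the function names and the example concentrations are hypothetical.

```python
import statistics

def performance_errors(measured, predicted):
    """Per-sample performance error in percent (assumed convention)."""
    return [100.0 * (m - p) / p for m, p in zip(measured, predicted)]

def median_performance_error(measured, predicted):
    """Signed median: indicates bias (over- or under-prediction)."""
    return statistics.median(performance_errors(measured, predicted))

def median_absolute_performance_error(measured, predicted):
    """Median of absolute errors: indicates overall inaccuracy."""
    return statistics.median(abs(e) for e in performance_errors(measured, predicted))

# Hypothetical concentrations in ng/mL, for illustration only.
measured = [8.0, 10.0, 12.0]
predicted = [10.0, 10.0, 10.0]
print(median_performance_error(measured, predicted))           # 0.0
print(median_absolute_performance_error(measured, predicted))  # 20.0
```

The example shows why both numbers are reported: errors of opposite sign cancel in the signed median, so a model can look unbiased while still being inaccurate.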
Affiliation(s)
- Soo Bin Yoon
  - Department of Anesthesiology and Pain Medicine, Seoul National University College of Medicine, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea
- Jeong-Moo Lee
  - Department of Surgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Chul-Woo Jung
  - Department of Anesthesiology and Pain Medicine, Seoul National University College of Medicine, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea
- Kyung-Suk Suh
  - Department of Surgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Kwang-Woong Lee
  - Department of Surgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Nam-Joon Yi
  - Department of Surgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Suk Kyun Hong
  - Department of Surgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- YoungRok Choi
  - Department of Surgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Su Young Hong
  - Department of Surgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Hyung-Chul Lee
  - Department of Anesthesiology and Pain Medicine, Seoul National University College of Medicine, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea.
91
Borah K, Das HS, Seth S, Mallick K, Rahaman Z, Mallik S. A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis. Funct Integr Genomics 2024; 24:139. [PMID: 39158621 DOI: 10.1007/s10142-024-01415-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/20/2024]
Abstract
Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the bulk and density of data. High-dimensional NGS data, characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples, present significant challenges for reducing feature dimensionality. The high dimensionality of NGS data poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Feature selection and feature extraction can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of various statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored for the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. Various techniques, including deep learning architectures, machine learning algorithms, and statistical methods, have been explored for the microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets surveyed here. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data. This review provides better insights for readers to apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
Affiliation(s)
- Kasmika Borah
  - Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India
- Himanish Shekhar Das
  - Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India
- Soumita Seth
  - Department of Computer Science and Engineering, Future Institute of Engineering and Management, Narendrapur, Kolkata, 700150, West Bengal, India
- Koushik Mallick
  - Department of Computer Science and Engineering, RCC Institute of Information Technology, Canal S Rd, Beleghata, Kolkata, 700015, West Bengal, India
- Saurav Mallik
  - Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, 02115, USA
  - Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ, 85721, USA
92
Mohtasham F, Pourhoseingholi M, Hashemi Nazari SS, Kavousi K, Zali MR. Comparative analysis of feature selection techniques for COVID-19 dataset. Sci Rep 2024; 14:18627. [PMID: 39128991 PMCID: PMC11317481 DOI: 10.1038/s41598-024-69209-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 08/01/2024] [Indexed: 08/13/2024] Open
Abstract
In the context of early disease detection, machine learning (ML) has emerged as a vital tool. Feature selection (FS) algorithms play a crucial role in ensuring the accuracy of predictive models by identifying the most influential variables. This study, focusing on a retrospective cohort of 4778 COVID-19 patients from Iran, explores the performance of various FS methods, including filter, embedded, and hybrid approaches, in predicting mortality outcomes. The researchers leveraged 115 routine clinical, laboratory, and demographic features and employed 13 ML models to assess the effectiveness of these FS methods based on classification accuracy, predictive accuracy, and statistical tests. The results indicate that a Hybrid Boruta-VI model combined with the Random Forest algorithm demonstrated superior performance, achieving an accuracy of 0.89, an F1 score of 0.76, and an AUC value of 0.95 on test data. Key variables identified as important predictors of adverse outcomes include age, oxygen saturation levels, albumin levels, neutrophil counts, platelet levels, and markers of kidney function. These findings highlight the potential of advanced FS techniques and ML models in enhancing early disease detection and informing clinical decision-making.
Affiliation(s)
- Farideh Mohtasham
  - Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- MohamadAmin Pourhoseingholi
  - Hearing Sciences, Mental Health and Clinical Neurosciences, School of Medicine, National Institute for Health and Care Research (NIHR) Nottingham Biomedical Research Center, University of Nottingham, Nottingham, UK
- Seyed Saeed Hashemi Nazari
  - Department of Epidemiology, School of Public Health & Safety, Shahid Beheshti University of Medical Sciences (SBMU), Tehran, Iran
- Kaveh Kavousi
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Mohammad Reza Zali
  - Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
93
Pudjihartono N, Ho D, O’Sullivan JM. Integrative analysis reveals novel insights into juvenile idiopathic arthritis pathogenesis and shared molecular pathways with associated traits. Front Genet 2024; 15:1448363. [PMID: 39175752 PMCID: PMC11338781 DOI: 10.3389/fgene.2024.1448363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024] Open
Abstract
Background Juvenile idiopathic arthritis (JIA) is an autoimmune joint disease that frequently co-occurs with other complex phenotypes, including cancers and other autoimmune diseases. Despite the identification of numerous risk variants through genome-wide association studies (GWAS), the affected genes, their connection to JIA pathogenesis, and their role in the development of associated traits remain unclear. This study aims to address these gaps by elucidating the gene-regulatory mechanisms underlying JIA pathogenesis and exploring its potential role in the emergence of associated traits. Methods A two-sample Mendelian Randomization (MR) analysis was conducted to identify blood-expressed genes causally linked to JIA. A curated protein interaction network was subsequently used to identify sets of single-nucleotide polymorphisms (i.e., spatial eQTL SNPs) that regulate the expression of JIA causal genes and their protein interaction partners. These SNPs were cross-referenced against the GWAS catalog to identify statistically enriched traits associated with JIA. Results The two-sample MR analysis identified 52 genes whose expression changes in the blood are putatively causal for JIA. These genes (e.g., HLA, LTA, LTB, IL6ST) participate in a range of immune-related pathways (e.g., antigen presentation, cytokine signalling) and demonstrate cell type-specific regulatory patterns across different immune cell types (e.g., PPP1R11 in CD4+ T cells). The spatial eQTLs that regulate JIA causal genes and their interaction partners were statistically enriched for GWAS SNPs linked with 95 other traits, including both known and novel JIA-associated traits. This integrative analysis identified genes whose dysregulation may explain the links between JIA and associated traits, such as autoimmune/inflammatory diseases (genes at 6p22.1 locus), Hodgkin lymphoma (genes at 6p21.3 [FKBPL, PBX2, AGER]), and chronic lymphocytic leukemia (BAK1). 
Conclusion Our approach provides a significant advance in understanding the genetic architecture of JIA and associated traits. The results suggest that the burden of associated traits may differ among JIA patients, influenced by their combined genetic risk across different clusters of traits. Future experimental validation of the identified connections could pave the way for refined patient stratification, the discovery of new biomarkers, and shared therapeutic targets.
Affiliation(s)
- N. Pudjihartono
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
- D. Ho
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
- J. M. O’Sullivan
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
  - The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
  - MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
  - Australian Parkinsons Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia
  - A*STAR Singapore Institute for Clinical Sciences, Singapore, Singapore
94
Yang J, Liang N, Pitts BJ, Prakah-Asante K, Curry R, Yu D. An Eye-Fixation Related Electroencephalography Technique for Predicting Situation Awareness: Implications for Driver State Monitoring Systems. Hum Factors 2024; 66:2138-2153. [PMID: 37851849 DOI: 10.1177/00187208231204570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2023]
Abstract
OBJECTIVE This study developed a fixation-related electroencephalography band power (FRBP) approach for situation awareness (SA) assessment in automated driving. BACKGROUND Maintaining good SA in Level 3 automated vehicles is crucial to drivers' takeover performance when the automated system fails. A multimodal fusion approach that enables the analysis of the visual behavioral and cognitive processes of SA can facilitate real-time assessment of SA in future driver state monitoring systems. METHOD Thirty participants performed three simulated automated driving tasks. After each task, the Situation Awareness Global Assessment Technique (SAGAT) was deployed to capture their SA about key elements that could affect their takeover task performance. Participants' eye movements and brain activities were recorded. Data on their brain activity after each eye fixation on the key elements were extracted and labeled according to the correctness of the SAGAT. Mixed-effects models were used to identify brain regions that were indicative of SA, and machine learning models for SA assessment were developed based on the identified brain regions. RESULTS Participants' alpha and theta oscillations at frontal and temporal areas are indicative of SA. In addition, the FRBP technique can be used to predict drivers' SA with an accuracy of 88% using a neural network model. CONCLUSION The FRBP technique, which incorporates eye movements and brain activities, can provide a more comprehensive evaluation of SA. Findings highlight the potential of utilizing FRBP to monitor drivers' SA in real time. APPLICATION The proposed framework can be expanded and applied to driver state monitoring systems to measure human SA in real-world driving.
Affiliation(s)
- Jing Yang
  - Purdue University, West Lafayette, IN, USA
- Nade Liang
  - Purdue University, West Lafayette, IN, USA
- Denny Yu
  - Purdue University, West Lafayette, IN, USA
95
Ahmed F, Mishra NK, Alghamdi OA, Khan MI, Ahmad A, Khan N, Rehan M. Deciphering KDM8 dysregulation and CpG methylation in hepatocellular carcinoma using multi-omics and machine learning. Epigenomics 2024; 16:961-983. [PMID: 39072393 PMCID: PMC11370911 DOI: 10.1080/17501911.2024.2374702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 06/25/2024] [Indexed: 07/30/2024] Open
Abstract
Aim: This study investigates the altered expression and CpG methylation patterns of histone demethylase KDM8 in hepatocellular carcinoma (HCC), aiming to uncover insights and promising diagnostic biomarkers. Materials & methods: Leveraging TCGA-LIHC multi-omics data, we employed R/Bioconductor libraries and Cytoscape to analyze and construct a gene correlation network, and LASSO regression to develop an HCC-predictive model. Results: In HCC, KDM8 downregulation is correlated with CpG hypermethylation. Differential gene correlation analysis unveiled a liver carcinoma-associated network marked by increased cell division and compromised liver-specific functions. The LASSO regression identified a highly accurate HCC prediction signature, prominently featuring CpG methylation at cg02871891. Conclusion: Our study uncovers CpG hypermethylation at cg02871891, possibly influencing KDM8 downregulation in HCC, suggesting these as promising biomarkers and targets.
Affiliation(s)
- Firoz Ahmed
  - Department of Biological Sciences, College of Science, University of Jeddah, Jeddah, Saudi Arabia
- Nitish Kumar Mishra
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38015, USA
- Othman A Alghamdi
  - Department of Biological Sciences, College of Science, University of Jeddah, Jeddah, Saudi Arabia
- Mohammad Imran Khan
  - Research Center, King Faisal Specialist Hospital & Research Centre, Jeddah, Saudi Arabia
  - Department of Biochemistry & Molecular Medicine, College of Medicine, Al-Faisal University, Riyadh, Saudi Arabia
- Aamir Ahmad
  - Translational Research Institute, Academic Health System, Hamad Medical Corporation, Doha, 3050, Qatar
- Nargis Khan
  - Snyder Institute of Chronic Diseases, Health Research & Innovation Center, Cumming School of Medicine, University of Calgary, Alberta, Canada
  - Department of Microbiology, Immunology & Infectious Diseases, Cumming School of Medicine, University of Calgary, Alberta, Canada
- Mohammad Rehan
  - Snyder Institute of Chronic Diseases, Health Research & Innovation Center, Cumming School of Medicine, University of Calgary, Alberta, Canada
  - Department of Microbiology, Immunology & Infectious Diseases, Cumming School of Medicine, University of Calgary, Alberta, Canada
96
Pradhan UK, Meher PK, Naha S, Sharma NK, Agarwal A, Gupta A, Parsad R. DBPMod: a supervised learning model for computational recognition of DNA-binding proteins in model organisms. Brief Funct Genomics 2024; 23:363-372. [PMID: 37651627 DOI: 10.1093/bfgp/elad039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 08/09/2023] [Accepted: 08/15/2023] [Indexed: 09/02/2023] Open
Abstract
DNA-binding proteins (DBPs) play critical roles in many biological processes, including gene expression, DNA replication, recombination and repair. Understanding the molecular mechanisms underlying these processes depends on the precise identification of DBPs. In recent times, several computational methods have been developed to identify DBPs. However, because these models are generic, they are unable to identify species-specific DBPs with high accuracy. Therefore, a species-specific computational model is needed to predict species-specific DBPs. In this paper, we introduce the computational DBPMod method, which makes use of a machine learning approach to identify species-specific DBPs. For prediction, both shallow learning algorithms and deep learning models were used, with shallow learning models achieving higher accuracy. Additionally, the evolutionary features outperformed sequence-derived features in terms of accuracy. Five model organisms, Caenorhabditis elegans, Drosophila melanogaster, Escherichia coli, Homo sapiens and Mus musculus, were used to assess the performance of DBPMod. Five-fold cross-validation and independent test set analyses were used to evaluate the prediction accuracy in terms of area under receiver operating characteristic curve (auROC) and area under precision-recall curve (auPRC), which were found to be ~89-92% and ~89-95%, respectively. The comparative results demonstrate that DBPMod outperforms 12 current state-of-the-art computational approaches in identifying the DBPs for all five model organisms. We further developed a web server for DBPMod to make it easier for researchers to detect DBPs; it is publicly available at https://iasri-sg.icar.gov.in/dbpmod/. DBPMod is expected to be an invaluable tool for discovering DBPs, supplementing the current experimental and computational methods.
Affiliation(s)
- Upendra K Pradhan
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Prabina K Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Sanchita Naha
  - Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Nitesh K Sharma
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA
- Aarushi Agarwal
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh 201313, India
- Ajit Gupta
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Rajender Parsad
  - ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
97
Nussinov R, Yavuz BR, Demirel HC, Arici MK, Jang H, Tuncbag N. Review: Cancer and neurodevelopmental disorders: multi-scale reasoning and computational guide. Front Cell Dev Biol 2024; 12:1376639. [PMID: 39015651 PMCID: PMC11249571 DOI: 10.3389/fcell.2024.1376639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 06/10/2024] [Indexed: 07/18/2024] Open
Abstract
The connection and causality between cancer and neurodevelopmental disorders have been puzzling. How can the same cellular pathways, proteins, and mutations lead to pathologies with vastly different clinical presentations? And why do individuals with neurodevelopmental disorders, such as autism and schizophrenia, face higher chances of cancer emerging throughout their lifetime? Our broad review emphasizes the multi-scale aspect of this type of reasoning. As these examples demonstrate, rather than focusing on a specific organ system or disease, we aim at the new understanding that can be gained. Within this framework, our review calls attention to computational strategies which can be powerful in discovering connections, causalities, predicting clinical outcomes, and are vital for drug discovery. Thus, rather than centering on the clinical features, we draw on the rapidly increasing data on the molecular level, including mutations, isoforms, three-dimensional structures, and expression levels of the respective disease-associated genes. Their integrated analysis, together with chromatin states, can delineate how, despite being connected, neurodevelopmental disorders and cancer differ, and how the same mutations can lead to different clinical symptoms. Here, we seek to uncover the emerging connection between cancer, including pediatric tumors, and neurodevelopmental disorders, and the tantalizing questions that this connection raises.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, United States
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
- Bengi Ruken Yavuz
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, United States
- M. Kaan Arici
  - Graduate School of Informatics, Middle East Technical University, Ankara, Türkiye
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, United States
- Nurcan Tuncbag
  - Department of Chemical and Biological Engineering, Koc University, Istanbul, Türkiye
  - School of Medicine, Koc University, Istanbul, Türkiye
  - Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Türkiye
98
Azriel D, Rinott Y, Tal O, Abbou B, Rappoport N. Surgery Duration Prediction Using Multi-Task Feature Selection. IEEE J Biomed Health Inform 2024; 28:4216-4223. [PMID: 38457316 DOI: 10.1109/jbhi.2024.3374783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2024]
Abstract
Efficient optimization of operating room (OR) activity poses a significant challenge for hospital managers due to the complex and risky nature of the environment. The traditional "one size fits all" approach to OR scheduling is no longer practical, and personalized medicine is required to meet the diverse needs of patients, care providers, medical procedures, and system constraints within limited resources. This paper aims to introduce a scientific and practical tool for predicting surgery durations and improving OR performance for maximum benefit to patients and the hospital. Previous works used machine-learning models for surgery duration prediction based on preoperative data. The models consider covariates known to the medical staff at the time of scheduling the surgery. Given a large number of covariates, model selection becomes crucial, and the number of covariates used for prediction depends on the available sample size. Our proposed approach utilizes multi-task regression to select a common subset of predicting covariates for all tasks with the same sample size while allowing the model's coefficients to vary between them. A regression task can refer to a single surgeon or operation type or the interaction between them. By considering these diverse factors, our method provides an overall more accurate estimation of the surgery durations, and the selected covariates that enter the model may help to identify the resources required for a specific surgery. We found that when the regression tasks were surgeon-based or based on the pair of operation type and surgeon, our suggested approach outperformed the compared baseline suggested in a previous study. However, our approach failed to reach the baseline for an operation-type-based task. By accurately estimating surgery durations, hospital managers can provide care to a greater number of patients, optimize resource allocation and utilization, and reduce waste. 
This research contributes to the advancement of personalized medicine and provides a valuable tool for improving operational efficiency in the dynamic world of medicine.
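The idea of selecting one common subset of covariates while letting coefficients differ by task can be sketched with a group-lasso (L2,1) penalty solved by proximal gradient descent. This is a generic stand-in chosen for illustration: the abstract does not specify the penalty or solver, and the function names, the penalty weight `lam`, and the toy data are assumptions rather than the authors' implementation.

```python
import math


def multitask_group_lasso(tasks, lam, n_iter=500):
    """Fit one linear model per task with a shared sparsity pattern.

    tasks: list of (X, y) pairs; X is a list of rows with p covariates each.
    Returns B, where B[j][t] is the coefficient of covariate j in task t.
    """
    p = len(tasks[0][0][0])
    n_tasks = len(tasks)
    # Conservative step size: the squared Frobenius norm of X divided by n
    # bounds the Lipschitz constant of each task's squared-error gradient.
    lipschitz = max(sum(v * v for row in X for v in row) / len(X) for X, _ in tasks)
    step = 1.0 / lipschitz
    B = [[0.0] * n_tasks for _ in range(p)]
    for _ in range(n_iter):
        # Gradient step on each task's mean squared error.
        for t, (X, y) in enumerate(tasks):
            n = len(X)
            resid = [yi - sum(x * B[j][t] for j, x in enumerate(row))
                     for row, yi in zip(X, y)]
            grad = [-sum(row[j] * r for row, r in zip(X, resid)) / n
                    for j in range(p)]
            for j in range(p):
                B[j][t] -= step * grad[j]
        # Proximal step: shrink each covariate's coefficients across all tasks
        # as a group, so a covariate is kept or dropped for every task at once.
        for j in range(p):
            norm = math.sqrt(sum(b * b for b in B[j]))
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            B[j] = [shrink * b for b in B[j]]
    return B


def selected_covariates(B):
    """Indices of covariates with a nonzero coefficient in at least one task."""
    return [j for j, row in enumerate(B) if any(b != 0.0 for b in row)]


# Toy data: two tasks (say, two hypothetical surgeons) share covariate 0
# as the only relevant predictor, but with different effect sizes.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
tasks = [(X, [2.0 * row[0] for row in X]), (X, [-3.0 * row[0] for row in X])]
B = multitask_group_lasso(tasks, lam=0.5)
print(selected_covariates(B))  # [0]
```

Because the penalty acts on each covariate's coefficients across tasks jointly, the selected set is shared while the per-task coefficients (here of opposite sign) remain free to differ.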
99
Vens C, van Luijk P, Vogelius RI, El Naqa I, Humbert-Vidan L, von Neubeck C, Gomez-Roman N, Bahn E, Brualla L, Böhlen TT, Ecker S, Koch R, Handeland A, Pereira S, Possenti L, Rancati T, Todor D, Vanderstraeten B, Van Heerden M, Ullrich W, Jackson M, Alber M, Marignol L. A joint physics and radiobiology DREAM team vision - Towards better response prediction models to advance radiotherapy. Radiother Oncol 2024; 196:110277. [PMID: 38670264 DOI: 10.1016/j.radonc.2024.110277] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/17/2024] [Revised: 03/21/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
Radiotherapy developed empirically through experience balancing tumour control and normal tissue toxicities. Early simple mathematical models formalized this practical knowledge and enabled effective cancer treatment to date. Remarkable advances in technology, computing, and experimental biology now create opportunities to incorporate this knowledge into enhanced computational models. The ESTRO DREAM (Dose Response, Experiment, Analysis, Modelling) workshop brought together experts across disciplines to pursue the vision of personalized radiotherapy for optimal outcomes through advanced modelling. The ultimate vision is leveraging quantitative models dynamically during therapy to ultimately achieve truly adaptive and biologically guided radiotherapy at the population as well as individual patient-based levels. This requires the generation of models that inform response-based adaptations, individually optimized delivery and enable biological monitoring to provide decision support to clinicians. The goal is expanding to models that can drive the realization of personalized therapy for optimal outcomes. This position paper provides their propositions that describe how innovations in biology, physics, mathematics, and data science including AI could inform models and improve predictions. It consolidates the DREAM team's consensus on scientific priorities and organizational requirements. Scientifically, it stresses the need for rigorous, multifaceted model development, comprehensive validation and clinical applicability and significance. Organizationally, it reinforces the prerequisites of interdisciplinary research and collaboration between physicians, medical physicists, radiobiologists, and computational scientists throughout model development. Solely by a shared understanding of clinical needs, biological mechanisms, and computational methods, more informed models can be created. 
Future research environment and support must facilitate this integrative method of operation across multiple disciplines.
Affiliation(s)
- C Vens
  - School of Cancer Science, University of Glasgow, Glasgow, UK; Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
- P van Luijk
  - Department of Biomedical Sciences of Cells and Systems, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; Department of Radiation Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- R I Vogelius
  - Department of Oncology, Rigshospitalet, Copenhagen, Denmark; Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- I El Naqa
  - Department of Radiation Oncology, University of Michigan, Ann Arbor, MI 48103, United States
- L Humbert-Vidan
  - University of Texas MD Anderson Cancer Centre, Houston, TX, United States; Department of Medical Physics, Guy's and St Thomas' NHS Foundation Trust, London, UK; School of Cancer and Pharmaceutical Sciences, Comprehensive Cancer Centre, King's College London, London, UK
- C von Neubeck
  - Department of Particle Therapy, University Hospital Essen, University of Duisburg-Essen, Essen 45147, Germany
- N Gomez-Roman
  - Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- E Bahn
  - Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany; Heidelberg Institute of Radiation Oncology (HIRO), Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany; Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- L Brualla
  - West German Proton Therapy Centre Essen (WPE), Essen, Germany; Faculty of Medicine, University of Duisburg-Essen, Germany
- T T Böhlen
  - Institute of Radiation Physics, Lausanne University Hospital and Lausanne University, Lausanne, Switzerland
- S Ecker
  - Department of Radiation Oncology, Medical University of Wien, Austria
- R Koch
  - Department of Particle Therapy, University Hospital Essen, University of Duisburg-Essen, Essen 45147, Germany
- A Handeland
  - Department of Oncology and Medical Physics, Haukeland University Hospital, Bergen, Norway; Department of Physics and Technology, University of Bergen, Bergen, Norway
- S Pereira
  - Neolys Diagnostics, 7 Allée de l'Europe, 67960 Entzheim, France
- L Possenti
  - Data Science Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- T Rancati
  - Data Science Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- D Todor
  - Department of Radiation Oncology, Virginia Commonwealth University, United States
- B Vanderstraeten
  - Department of Radiotherapy-Oncology, Ghent University Hospital, Gent, Belgium; Department of Human Structure and Repair, Ghent University, Gent, Belgium
- M Van Heerden
  - Center for Proton Therapy, Paul Scherrer Institute, Villigen, Switzerland
- M Jackson
  - School of Cancer Science, University of Glasgow, Glasgow, UK
- M Alber
  - Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany; Heidelberg Institute of Radiation Oncology (HIRO), Heidelberg, Germany
- L Marignol
  - Applied Radiation Therapy Trinity (ARTT), Discipline of Radiation Therapy, School of Medicine, Trinity St. James's Cancer Institute, Trinity College Dublin, University of Dublin, Dublin, Ireland
100
Zemariam AB, Abey W, Kassaw AK, Yimer A. Comparative analysis of machine learning algorithms for predicting diarrhea among under-five children in Ethiopia: Evidence from 2016 EDHS. Health Informatics J 2024; 30:14604582241285769. [PMID: 39270135 DOI: 10.1177/14604582241285769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2024]
Abstract
Background: Diarrhea is a major cause of mortality and morbidity in under-5 children globally, especially in developing countries like Ethiopia. Limited research has used machine learning to predict childhood diarrhea. This study aimed to compare the predictive performance of ML algorithms for diarrhea in under-5 children in Ethiopia. Methods: The study utilized a dataset of 9501 under-5 children from the Ethiopia Demographic and Health Survey 2016. Five ML algorithms were used to build and compare predictive models. The model performance was evaluated using various metrics in Python. Boruta feature selection was employed, and data balancing techniques such as under-sampling, over-sampling, adaptive synthetic sampling, and synthetic minority oversampling, as well as hyperparameter tuning methods, were explored. Association rule mining was conducted using the Apriori algorithm in R to determine relationships between independent and target variables. Results: Overall, 10.2% of children had diarrhea. The Random Forest model had the best performance with 93.2% accuracy, 98.4% sensitivity, 85.5% specificity, and 0.916 AUC. The top predictors were residence, wealth index, child age, number of living children, deworming, wasting, mother's occupation, and education. Association rule mining identified the top 7 rules most associated with under-5 diarrhea in Ethiopia. Conclusion: The RF achieved the highest performance for predicting childhood diarrhea. Policymakers and healthcare providers can use these findings to develop targeted interventions to reduce diarrhea. Customizing strategies based on the identified association rules has the potential to improve child health and decrease the impact of diarrhea in Ethiopia.
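The accuracy, sensitivity, and specificity figures quoted in this abstract are all derived from a binary confusion matrix. The sketch below shows the standard definitions on a small made-up label set with a minority positive class; the function name and the example labels are illustrative assumptions and are unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


# Hypothetical labels: 2 positives among 10 children.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'sensitivity': 0.5, 'specificity': 0.875}
```

With an imbalanced outcome, accuracy can stay high even when half of the true cases are missed, which is one reason sensitivity and specificity are usually reported alongside it.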
Affiliation(s)
- Alemu Birara Zemariam
  - Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
- Wondosen Abey
  - Departments of Public Health, College of Health Sciences, Woldia University, Woldia, Ethiopia
- Abdulaziz Kebede Kassaw
  - Department of Health Informatics, School of Public Health, College of Medicine and Health Sciences, Wollo University, Dessie, Ethiopia
- Ali Yimer
  - Departments of Public Health, College of Health Sciences, Woldia University, Woldia, Ethiopia