Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

58
(from Reference Citation Analysis)

Article PDFs (5)

Cited by > 0 (23)

Searched Name

Feature importance


Wu C, Liang Y, Jiang S, Shi Z. Mechanistic and data-driven perspectives on plant uptake of organic pollutants. Sci Total Environ 2024;929:172415. [PMID: 38631647 DOI: 10.1016/j.scitotenv.2024.172415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 04/09/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024]

Abstract

Establishing reliable predictive models for plant uptake of organic pollutants is crucial for environmental risk assessment and guiding phytoremediation efforts. This study compiled an expanded dataset of plant cuticle-water partition coefficients (Kcw), a useful indicator for plant uptake, comprising 371 data points for 148 unique compounds and various plant species. Quantum/computational chemistry software and tools were used to compute various molecular descriptors, aiming to comprehensively characterize the properties and structures of each compound. Three types of models were developed to predict Kcw: a mechanism-driven pp-LFER model, a data-driven machine learning model, and an integrated mechanism-data-driven model. The mechanism-data-driven GBRT-ppLFER model exhibited superior performance, achieving RMSEtrain = 0.133 and RMSEtest = 0.301 while maintaining interpretability. The Shapley Additive Explanation analysis indicated that pp-LFER parameters, ESPI, FwRadicalmax, ExtFP607, and RDF70s are the key factors influencing plant uptake in the GBRT-ppLFER model. Overall, the pp-LFER parameter, ESPI, and ExtFP607 show positive effects, while the remaining factors exhibit negative effects. Partial dependence analysis further indicated that plant uptake is not determined by individual factors alone but by the combined interactions of multiple factors. Specifically, compounds with pp-LFER parameter > 4, ESPI > -25.5, 0.098 < FwRadicalmax < 0.132, and 2 < RDF70s < 3 are generally more readily taken up by plants. In addition, the Kcw values predicted by the GBRT-ppLFER model were used to estimate the plant-water partition coefficients and bioconcentration factors across different plant species and growth media (water, sand, and soil), achieving an RMSE of 0.497. This study provides effective tools for assessing plant uptake of organic pollutants and deepens our understanding of plant-environment-compound interactions.
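
The Shapley Additive Explanation analysis mentioned in this abstract attributes a prediction to individual features. As a minimal sketch of the underlying idea (not the authors' pipeline, which the abstract does not describe in detail), the following computes exact Shapley values by enumerating feature subsets. The function names, the toy model, and the all-zero baseline are hypothetical; practical SHAP tooling relies on faster approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline input."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi


def toy_model(z):
    # Hypothetical model with one interaction term (z[0] * z[2]).
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference point
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is shared equally between the two features involved.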


Schwarz L, Sobania D, Rothlauf F. On relevant features for the recurrence prediction of urothelial carcinoma of the bladder. Int J Med Inform 2024;186:105414. [PMID: 38531255 DOI: 10.1016/j.ijmedinf.2024.105414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/16/2024] [Accepted: 03/11/2024] [Indexed: 03/28/2024]

Abstract

BACKGROUND

Urothelial bladder cancer (UBC) is characterized by a high recurrence rate, which is predicted by scoring systems. However, recent studies show the superiority of Machine Learning (ML) models. Nevertheless, these ML approaches are rarely used in medical practice because most of them are black-box models that cannot adequately explain how a prediction is made.

OBJECTIVE

We investigate the global feature importance of different ML models. By providing information on the most relevant features, we can facilitate the use of ML in everyday medical practice.

DESIGN, SETTING, AND PARTICIPANTS

The data is provided by the cancer registry Rhineland-Palatinate gGmbH, Germany. It consists of numerical and categorical features of 1,944 patients with UBC. We retrospectively predict 2-year recurrence through ML models using Support Vector Machine, Gradient Boosting, and Artificial Neural Network. We then determine the global feature importance using performance-based Permutation Feature Importance (PFI) and variance-based Feature Importance Ranking Measure (FIRM).

RESULTS

We show reliable recurrence prediction of UBC with 82.02% to 83.89% F1-Score, 83.95% to 84.49% Precision, and an overall performance of 69.20% to 70.82% AUC on testing data, depending on the model. Gradient Boosting performs best among all black-box models with an average F1-Score (83.89%), AUC (70.82%), and Precision (83.95%). Furthermore, we show consistency across PFI and FIRM by identifying the same features as relevant across the different models. These features are exclusively therapeutic measures and are consistent with findings from both medical research and clinical trials.

CONCLUSIONS

We confirm the superiority of ML black-box models in predicting UBC recurrence compared to more traditional logistic regression. In addition, we present an approach that increases the explanatory power of black-box models by identifying the underlying influence of input features, thus facilitating the use of ML in clinical practice and therefore providing improved recurrence prediction through the application of black-box models.
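
Performance-based Permutation Feature Importance, as named in this abstract, measures how much a model's score drops when one feature column is shuffled. The sketch below illustrates that idea with the standard library only; the data, the rule-based stand-in for a trained model, and the choice of accuracy as the score are assumptions, not details taken from the study.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: the label depends on feature 0 only.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def model(row):
    # Stand-in for a fitted black-box classifier.
    return int(row[0] > 0.5)


imp = permutation_importance(model, X, y)
print(imp)
```

Shuffling the feature the model ignores leaves the score unchanged, so its importance is exactly zero, while shuffling the informative feature removes most of the accuracy above chance.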


Kurotani A, Miyamoto H, Kikuchi J. Validation of causal inference data using DirectLiNGAM in an environmental small-scale model and calculation settings. MethodsX 2024;12:102528. [PMID: 38274701 PMCID: PMC10809110 DOI: 10.1016/j.mex.2023.102528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 12/19/2023] [Indexed: 01/27/2024] Open

Wang Y, Liu J, Chen S, Zheng C, Zou X, Zhou Y. Exploring risk factors and their differences on suicidal ideation and suicide attempts among depressed adolescents based on decision tree model. J Affect Disord 2024;352:87-100. [PMID: 38360368 DOI: 10.1016/j.jad.2024.02.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 02/04/2024] [Accepted: 02/11/2024] [Indexed: 02/17/2024]

Abstract

BACKGROUND

Suicide has been recognized as a major global public health issue. Depressed adolescents are more prone to experiencing it. We explore risk factors and their differences on suicidal ideation and suicide attempts to further enhance our understanding of suicidal behavior.

METHODS

A total of 2343 depressed adolescents aged 12-18 from 9 provinces/cities in China participated in this cross-sectional study. We used a decision tree model incorporating 32 factors encompassing participants' suicidal behavior. The feature importance of each factor was measured using Gini coefficients.

RESULTS

The decision tree model demonstrated a good fit, with high accuracy (SI = 0.86, SA = 0.85) and F-score (SI = 0.85, SA = 0.83). The predictive importance of each factor varied between groups with suicidal ideation and with suicide attempts. The most significant risk factor in both groups was depression (SI = 16.7 %, SA = 19.8 %). However, factors such as academic stress (SI = 7.2 %, SA = 1.6 %), hopelessness (SI = 9.1 %, SA = 5.0 %), and age (SI = 7.1 %, SA = 3.2 %) were more closely associated with suicidal ideation than suicide attempts. Factors related to schooling status (SI = 3.5 %, SA = 10.1 %), total years of education (SI = 2.6 %, SA = 8.6 %), and loneliness (SI = 2.3 %, SA = 7.4 %) were relatively more important in the suicide attempt stage compared to suicidal ideation.

LIMITATIONS

The cross-sectional design limited the ability to capture changes in suicidal behavior among depressed adolescents over time. Possible bias may exist in the measurement of suicidal ideation.

CONCLUSION

The relative importance of each risk factor for suicidal ideation and attempted suicide varies. These findings provide further empirical evidence for understanding suicide behavior. Targeted treatment measures should be taken for different stages of suicide in clinical interventions.
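
The abstract reports feature importance "using Gini coefficients". Assuming this refers to the usual decision tree criterion (the decrease in Gini impurity produced by a split, an interpretation the abstract does not spell out), the quantity can be sketched as follows. The labels and the two candidate splits are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, left_mask):
    """Impurity decrease when rows with left_mask True go left, the rest right."""
    left = [lab for lab, m in zip(labels, left_mask) if m]
    right = [lab for lab, m in zip(labels, left_mask) if not m]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Hypothetical outcome labels and two candidate binary splits.
labels = [1, 1, 0, 0, 1, 1, 0, 0]
split_a = [True, True, False, False, True, True, False, False]  # separates classes
split_b = [True, False, True, False, True, False, True, False]  # uninformative

decrease_a = gini_decrease(labels, split_a)
decrease_b = gini_decrease(labels, split_b)
print(decrease_a, decrease_b)
```

In a fitted tree, a feature's importance is typically the sum of such decreases over all nodes that split on it, weighted by node size and normalized so that the importances total 100 %, which matches the percentage form of the values quoted in the results.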


Bhandari P, Lee TG. Using machine learning and partial dependence to evaluate robustness of best linear unbiased prediction (BLUP) for phenotypic values. J Appl Genet 2024;65:283-286. [PMID: 38170439 DOI: 10.1007/s13353-023-00815-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 11/17/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]

Bhattarai P, Thakuri DS, Nie Y, Chand GB. Explainable AI-based Deep-SHAP for mapping the multivariate relationships between regional neuroimaging biomarkers and cognition. Eur J Radiol 2024;174:111403. [PMID: 38452732 DOI: 10.1016/j.ejrad.2024.111403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 01/16/2024] [Accepted: 03/01/2024] [Indexed: 03/09/2024]

Su G, Jiang P. Machine learning models for predicting biochar properties from lignocellulosic biomass torrefaction. Bioresour Technol 2024;399:130519. [PMID: 38437964 DOI: 10.1016/j.biortech.2024.130519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/14/2024] [Accepted: 02/29/2024] [Indexed: 03/06/2024]

Meinke C, Lueken U, Walter H, Hilbert K. Predicting treatment outcome based on resting-state functional connectivity in internalizing mental disorders: A systematic review and meta-analysis. Neurosci Biobehav Rev 2024;160:105640. [PMID: 38548002 DOI: 10.1016/j.neubiorev.2024.105640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 02/29/2024] [Accepted: 03/21/2024] [Indexed: 04/07/2024]

Meng H, Wagner C, Triguero I. SEGAL time series classification - Stable explanations using a generative model and an adaptive weighting method for LIME. Neural Netw 2024;176:106345. [PMID: 38733798 DOI: 10.1016/j.neunet.2024.106345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 04/23/2024] [Accepted: 04/25/2024] [Indexed: 05/13/2024]

Zhou W, Yan Z, Zhang L. A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction. Sci Rep 2024;14:5905. [PMID: 38467662 PMCID: PMC10928191 DOI: 10.1038/s41598-024-55243-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 02/21/2024] [Indexed: 03/13/2024] Open

Abstract

To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. Using branching data (phenotype) of 1918 soybean accessions and 42k SNP (Single Nucleotide Polymorphism) polymorphic data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoders regression, and MLP (multilayer perceptron) regression) and seven machine learning models (SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). Evaluation with four metrics, R2 (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), showed that SVR, Polynomial Regression, DBN, and Autoencoder outperformed the other models and achieved better accuracy in phenotype prediction. In the assessment of deep learning approaches, we exemplified the SVR model, conducting analyses on feature importance and gene ontology (GO) enrichment to provide comprehensive support. A comparison of four feature importance algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, revealed no notable distinction in their feature importance ranking scores, but the SHAP value could provide rich information on genes with negative contributions, and SHAP importance was therefore chosen for feature selection. The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and the transition from experience-based to data-based breeding.


Sylvester S, Sagehorn M, Gruber T, Atzmueller M, Schöne B. SHAP value-based ERP analysis (SHERPA): Increasing the sensitivity of EEG signals with explainable AI methods. Behav Res Methods 2024:10.3758/s13428-023-02335-7. [PMID: 38453828 DOI: 10.3758/s13428-023-02335-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/27/2023] [Indexed: 03/09/2024]

Zhang Y, Shangguan C, Zhang X, Ma J, He J, Jia M, Chen N. Computer-Aided Diagnosis of Complications After Liver Transplantation Based on Transfer Learning. Interdiscip Sci 2024;16:123-140. [PMID: 37875773 DOI: 10.1007/s12539-023-00588-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 09/20/2023] [Accepted: 09/22/2023] [Indexed: 10/26/2023]

Rios Fuck JV, Cechinel MAP, Neves J, Campos de Andrade R, Tristão R, Spogis N, Riella HG, Soares C, Padoin N. Predicting effluent quality parameters for wastewater treatment plant: A machine learning-based methodology. Chemosphere 2024;352:141472. [PMID: 38382719 DOI: 10.1016/j.chemosphere.2024.141472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 02/05/2024] [Accepted: 02/14/2024] [Indexed: 02/23/2024]

Sun K, Lan T, Goh YM, Safiena S, Huang YH, Lytle B, He Y. An interpretable clustering approach to safety climate analysis: Examining driver group distinctions. Accid Anal Prev 2024;196:107420. [PMID: 38159513 DOI: 10.1016/j.aap.2023.107420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 11/23/2023] [Accepted: 12/01/2023] [Indexed: 01/03/2024]

Abstract

The transportation industry, particularly the trucking sector, is prone to workplace accidents and fatalities. Accidents involving large trucks accounted for a considerable percentage of overall traffic fatalities. Recognizing the crucial role of safety climate in accident prevention, researchers have sought to understand its factors and measure its impact within organizations. While existing data-driven safety climate studies have made remarkable progress, clustering employees based on their safety climate perception is innovative and has not been extensively used in research. Identifying clusters of drivers based on their safety climate perception allows the organization to profile its workforce and devise more impactful interventions. The limited use of clustering could be due to difficulties in interpreting or explaining the factors that influence employees' cluster membership. Moreover, existing safety-related studies did not compare multiple clustering algorithms, resulting in potential bias. To address these problems, this study introduces an interpretable clustering approach for safety climate analysis. This study compares five algorithms for clustering truck drivers based on their safety climate perceptions. It also proposes a novel method for quantitatively evaluating partial dependence plots (QPDP). Then, to better interpret the clustering results, this study introduces different interpretable machine learning measures (Shapley additive explanations, permutation feature importance, and QPDP). The Python code used in this study is available at https://github.com/NUS-DBE/truck-driver-safety-climate. This study explains the clusters based on the importance of different safety climate factors. Drawing on data collected from more than 7,000 American truck drivers, this study significantly contributes to the scientific literature. It highlights the critical role of supervisory care promotion in distinguishing various driver groups. Moreover, it showcases the advantages of employing machine learning techniques, such as cluster analysis, to enrich the scientific knowledge in this field. Future studies could involve experimental methods to assess strategies for enhancing supervisory care promotion, as well as integrating deep learning clustering techniques with safety climate evaluation.
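
Partial dependence plots, which this abstract evaluates quantitatively, show the average model output as one feature is varied while the others keep their observed values. The sketch below illustrates plain partial dependence only; it does not reproduce the QPDP method proposed in the paper, and the model, data, and grid are hypothetical.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction over the data with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value  # overwrite one feature, keep the rest
            total += predict(modified)
        curve.append(total / len(X))
    return curve


def toy_model(z):
    # Hypothetical additive model used only for illustration.
    return 3.0 * z[0] + z[1]


X = [[0.0, 1.0], [0.0, 3.0]]
grid = [0.0, 1.0, 2.0]
curve = partial_dependence(toy_model, X, feature=0, grid=grid)
print(curve)
```

For an additive model such as this one, the curve is the feature's own effect shifted by the average contribution of the remaining features.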


Wang J, Sourlos N, Heuvelmans M, Prokop M, Vliegenthart R, van Ooijen P. Explainable machine learning model based on clinical factors for predicting the disappearance of indeterminate pulmonary nodules. Comput Biol Med 2024;169:107871. [PMID: 38154157 DOI: 10.1016/j.compbiomed.2023.107871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 11/01/2023] [Accepted: 12/17/2023] [Indexed: 12/30/2023]

Lee S, Yoo S. InterDILI: interpretable prediction of drug-induced liver injury through permutation feature importance and attention mechanism. J Cheminform 2024;16:1. [PMID: 38173043 PMCID: PMC10765872 DOI: 10.1186/s13321-023-00796-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 12/17/2023] [Indexed: 01/05/2024] Open

Abstract

Safety is one of the important factors constraining the distribution of clinical drugs on the market. Drug-induced liver injury (DILI) is the leading cause of safety problems produced by drug side effects. Therefore, the DILI risk of approved drugs and potential drug candidates should be assessed. Currently, in vivo and in vitro methods are used to test DILI risk, but both methods are labor-intensive, time-consuming, and expensive. To overcome these problems, many in silico methods for DILI prediction have been suggested. Previous studies have shown that DILI prediction models can be utilized as prescreening tools, and they achieved a good performance. However, there are still limitations in interpreting the prediction results. Therefore, this study focused on interpreting the model prediction to analyze which features could potentially cause DILI. For this, five publicly available datasets were collected to train and test the model. Then, various machine learning methods were applied using substructure and physicochemical descriptors as inputs and the DILI label as the output. The interpretation of feature importance was analyzed by recognizing the following general-to-specific patterns: (i) identifying general important features of the overall DILI predictions, and (ii) highlighting specific molecular substructures which were highly related to the DILI prediction for each compound. The results indicated that the model not only captured the previously known properties to be related to DILI but also proposed a new DILI potential substructural of physicochemical properties. The models for the DILI prediction achieved an area under the receiver operating characteristic (AUROC) of 0.88-0.97 and an area under the Precision-Recall curve (AUPRC) of 0.81-0.95. From this, we hope the proposed models can help identify the potential DILI risk of drug candidates at an early stage and offer valuable insights for drug development.


Chen J, Kuhn LA, Raschka S. Techniques for Developing Reliable Machine Learning Classifiers Applied to Understanding and Predicting Protein:Protein Interaction Hot Spots. Methods Mol Biol 2024;2714:235-268. [PMID: 37676603 DOI: 10.1007/978-1-0716-3441-7_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]

Abstract

With machine learning now transforming the sciences, successful prediction of biological structure or activity is mainly limited by the extent and quality of data available for training, the astute choice of features for prediction, and thorough assessment of the robustness of prediction on a variety of new cases. In this chapter, we address these issues while developing and sharing protocols to build a robust dataset and rigorously compare several predictive classifiers using the open-source Python machine learning library, scikit-learn. We show how to evaluate whether enough data has been used for training and whether the classifier has been overfit to training data. The most telling experiment is 500-fold repartitioning of the training and test sets, followed by prediction, which gives a good indication of whether a classifier performs consistently well on different datasets. An intuitive method is used to quantify which features are most important for correct prediction. The resulting well-trained classifier, hotspotter, can robustly predict the small subset of amino acid residues on the surface of a protein that are energetically most important for binding a protein partner: the interaction hot spots. Hotspotter has been trained and tested here on a curated dataset assembled from 1046 non-redundant alanine scanning mutation sites with experimentally measured change in binding free energy values from 97 different protein complexes; this dataset is available to download. The accessible surface area of the wild-type residue at a given site and its degree of evolutionary conservation proved the most important features to identify hot spots. A variant classifier was trained and validated for proteins where only the amino acid sequence is available, augmented by secondary structure assignment. This version of hotspotter requiring fewer features is almost as robust as the structure-based classifier. Application to the ACE2 (angiotensin converting enzyme 2) receptor, which mediates COVID-19 virus entry into human cells, identified the critical hot spot triad of ACE2 residues at the center of the small interface with the CoV-2 spike protein. Hotspotter results can be used to guide the strategic design of protein interfaces and ligands and also to identify likely interfacial residues for protein:protein docking.
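
The 500-fold repartitioning described in this abstract can be pictured as repeated random hold-out evaluation: split, fit, score, and look at the spread of scores. The sketch below uses only the standard library rather than scikit-learn, which the chapter itself uses. The one-feature threshold classifier, the synthetic data, and the 30% test fraction are assumptions made for illustration.

```python
import random
import statistics


def fit_threshold(train):
    """Fit a one-feature classifier: midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def accuracy(threshold, test):
    """Fraction of test points on the correct side of the threshold."""
    return sum((x > threshold) == bool(y) for x, y in test) / len(test)


def repeated_holdout(data, n_repeats=500, test_fraction=0.3, seed=0):
    """Mean and standard deviation of test accuracy over random repartitions."""
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    scores = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(accuracy(fit_threshold(train), test))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical balanced data: class 1 centred at +1, class 0 at -1.
data_rng = random.Random(1)
data = [(data_rng.gauss(1.0 if y else -1.0, 1.0), y) for y in [0, 1] * 100]

mean_acc, sd_acc = repeated_holdout(data)
print(round(mean_acc, 3), round(sd_acc, 3))
```

A small standard deviation across repartitions suggests that performance does not hinge on one fortunate split, which is the consistency check the abstract describes.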


Cottin A, Zulian M, Pécuchet N, Guilloux A, Katsahian S. MS-CPFI: A model-agnostic Counterfactual Perturbation Feature Importance algorithm for interpreting black-box Multi-State models. Artif Intell Med 2024;147:102741. [PMID: 38184354 DOI: 10.1016/j.artmed.2023.102741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 11/17/2023] [Accepted: 11/28/2023] [Indexed: 01/08/2024]

Abstract

Multi-state processes (Webster, 2019) are commonly used to model the complex clinical evolution of diseases where patients progress through different states. In recent years, machine learning and deep learning algorithms have been proposed to improve the accuracy of these models' predictions (Wang et al., 2019). However, acceptability by patients and clinicians, as well as regulatory compliance, requires interpretability of these algorithms' predictions. Existing methods, such as the Permutation Feature Importance algorithm, have been adapted for interpreting predictions in black-box models for 2-state processes (corresponding to survival analysis). To generalize these methods to multi-state models, we introduce a novel model-agnostic interpretability algorithm called Multi-State Counterfactual Perturbation Feature Importance (MS-CPFI) that computes feature importance scores for each transition of a general multi-state model, including survival, competing-risks, and illness-death models. MS-CPFI uses a new counterfactual perturbation method that allows interpreting feature effects while capturing non-linear effects and potentially capturing time-dependent effects. Experimental results on simulations show that MS-CPFI increases model interpretability in the case of non-linear effects. Additionally, results on a real-world dataset for patients with breast cancer confirm that MS-CPFI can detect clinically important features and provide information on the disease progression by displaying features that are protective factors versus features that are risk factors for each stage of the disease. Overall, MS-CPFI is a promising model-agnostic interpretability algorithm for multi-state models, which can improve the interpretability of machine learning and deep learning algorithms in healthcare.


Shi X, Cui Y, Wang S, Pan Y, Wang B, Lei M. Development and validation of a web-based artificial intelligence prediction model to assess massive intraoperative blood loss for metastatic spinal disease using machine learning techniques. Spine J 2024;24:146-160. [PMID: 37704048 DOI: 10.1016/j.spinee.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 09/01/2023] [Accepted: 09/02/2023] [Indexed: 09/15/2023]

Abstract

BACKGROUND CONTEXT

Intraoperative blood loss is a significant concern in patients with metastatic spinal disease. Early identification of patients at high risk of experiencing massive intraoperative blood loss is crucial as it allows for the development of appropriate surgical plans and facilitates timely interventions. However, accurate prediction of intraoperative blood loss remains limited based on prior studies.

PURPOSE

The purpose of this study was to develop and validate a web-based artificial intelligence (AI) model to predict massive intraoperative blood loss during surgery for metastatic spinal disease.

STUDY DESIGN/SETTING

An observational cohort study.

PATIENT SAMPLE

Two hundred seventy-six patients with metastatic spinal tumors undergoing decompressive surgery from two hospitals were included for analysis. Of these, 200 patients were assigned to the derivation cohort for model development and internal validation, while the remaining 76 were allocated to the external validation cohort.

OUTCOME MEASURES

The primary outcome was massive intraoperative blood loss defined as an estimated blood loss of 2,500 cc or more.

METHODS

Data on patients' demographics, tumor conditions, oncological therapies, surgical strategies, and laboratory examinations were collected in the derivation cohort. SMOTETomek resampling (a combination of Synthetic Minority Oversampling Technique and Tomek Links Undersampling) was performed to balance the classes of the dataset and obtain an expanded dataset. The patients were randomly divided into two groups in a proportion of 7:3, with the larger group used for model development and the remainder for internal validation. External validation was performed in another cohort of 76 patients with metastatic spinal tumors undergoing decompressive surgery from a teaching hospital. A logistic regression (LR) model and five machine learning models, including K-Nearest Neighbor (KNN), Decision Tree (DT), XGBoosting Machine (XGBM), Random Forest (RF), and Support Vector Machine (SVM), were used to develop prediction models. Model prediction performance was evaluated using area under the curve (AUC), recall, specificity, F1 score, Brier score, and log loss. A scoring system incorporating 10 evaluation metrics was developed to comprehensively evaluate the prediction performance.

RESULTS

The incidence of massive intraoperative blood loss was 23.50% (47/200). The model features comprised five clinical variables: tumor type, smoking status, Eastern Cooperative Oncology Group (ECOG) score, surgical process, and preoperative platelet level. The XGBM model performed the best in AUC (0.857 [95% CI: 0.827, 0.877]), accuracy (0.771), recall (0.854), F1 score (0.787), Brier score (0.150), and log loss (0.461), and the RF model ranked second in AUC (0.826 [95% CI: 0.793, 0.861]) and precision (0.705), whereas the AUC of the LR model was only 0.710 (95% CI: 0.665, 0.771), the accuracy was 0.627, the recall was 0.610, and the F1 score was 0.617. According to the scoring system, the XGBM model obtained the highest total score of 55, which signifies the best predictive performance among the evaluated models. External validation showed that the AUC of the XGBM model reached 0.809 (95% CI: 0.778, 0.860) and the accuracy was 0.733. The XGBM model was further deployed online and can be freely accessed at https://starxueshu-massivebloodloss-main-iudy71.streamlit.app/.

CONCLUSIONS

The XGBM model may be a useful AI tool to assess the risk of intraoperative blood loss in patients with metastatic spinal disease undergoing decompressive surgery.
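
The threshold-based and probability-based metrics named in this abstract (accuracy, recall, specificity, F1 score, Brier score, log loss) can be illustrated with a minimal sketch. The function name `binary_metrics`, the 0.5 threshold, and the toy labels are assumptions for illustration only; they are not taken from the study.

```python
import math

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute common binary-classification metrics from labels and probabilities.

    The 0.5 threshold is an illustrative assumption, not the study's setting.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n

    # Log loss: clip probabilities so log(0) never occurs.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                    for t, p in zip(y_true, clipped)) / n

    return {
        "accuracy": (tp + tn) / n,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "brier": brier,
        "log_loss": log_loss,
    }

# Toy data (hypothetical): two positives, two negatives.
metrics = binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
```

Lower Brier score and log loss indicate better-calibrated probabilities, whereas the other metrics depend on the chosen classification threshold.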


Bhattarai P, Taha A, Soni B, Thakuri DS, Ritter E, Chand GB. Predicting cognitive dysfunction and regional hubs using Braak staging amyloid-beta biomarkers and machine learning. Brain Inform 2023;10:33. [PMID: 38043122 PMCID: PMC10694120 DOI: 10.1186/s40708-023-00213-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 11/21/2023] [Indexed: 12/05/2023] Open

Kopitar L, Kokol P, Stiglic G. Hybrid visualization-based framework for depressive state detection and characterization of atypical patients. J Biomed Inform 2023;147:104535. [PMID: 37926393 DOI: 10.1016/j.jbi.2023.104535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 10/30/2023] [Accepted: 11/02/2023] [Indexed: 11/07/2023]

Okagbue HI, Ijezie OA, Ugwoke PO, Adeyemi-Kayode TM, Jonathan O. Single-label machine learning classification revealed some hidden but inter-related causes of five psychotic disorder diseases. Heliyon 2023;9:e19422. [PMID: 37674848 PMCID: PMC10477489 DOI: 10.1016/j.heliyon.2023.e19422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 08/04/2023] [Accepted: 08/22/2023] [Indexed: 09/08/2023] Open

Abstract

Psychotic disorder diseases (PDD), or mental illnesses, are a group of illnesses that affect the mind, impair cognitive ability, retard emotional ability, and obstruct communication and relationships with others; they are characterized by delusions, hallucinations and disoriented or disordered patterns of thinking. Prognosis of PDD is not sufficient because of the nature of the diseases, and as such an adequate form of diagnosis is required to detect, manage and treat the illness. This paper applied the single-label classification (SLC) machine learning approach to mining electronic health records of people with PDD in Nigeria, using eleven independent (demographic) variables and five PDDs as target variables. The five PDDs are Insomnia, Schizophrenia, Minimal Brain Dysfunction (MBD), which is also known as Attention-Deficit/Hyperactivity Disorder (ADHD), Vascular Dementia (VD) and Bipolar Disorder (BD). The aim of using SLC is that it would be easier to detect some PDDs that are related to each other without the loss of information, which is an advantage over multi-label classification (MLC). The ReliefF algorithm was used in each experiment to establish the order of importance of the independent variables, and redundant variables were excluded from the analysis. The order of the variables in feature selection was matched with feature importance after the classifications and quantified using the Spearman rank correlation coefficient. The data were divided into 70% for training and 30% for testing. Four new performance metrics adapted from the root mean square error (RMSE) were proposed and used to measure the differences between the performance results of the 10 machine learning models, first between training and testing and second with and without feature selection. The new metrics are close to zero, which indicates that the use of feature selection and cross-validation may not greatly affect the accuracy of the SLC. When the PDDs are included as predictors for classifying others, there was a tremendous improvement as revealed by the four new metrics for classification accuracy (CA), precision and recall. Analysis of variance showed that the four metrics differ significantly for classification accuracy (CA) and precision. However, there was no significant difference between CA and precision when the two are compared across the four evaluation metrics at a p value less than 0.05.
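
The comparison of a feature-selection ordering with a post-classification feature-importance ordering via the Spearman rank correlation coefficient can be sketched as follows. This is a minimal pure-Python illustration; the helper names and the example rankings are hypothetical and do not come from the paper.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / (sx * sy)

# Hypothetical positions of five variables in two orderings.
selection_order = [1, 2, 3, 4, 5]    # e.g. order from a feature-selection step
importance_order = [2, 1, 4, 3, 5]   # e.g. order from model feature importance
rho = spearman(selection_order, importance_order)
```

A value of `rho` near 1 means the two orderings largely agree; a value near 0 means they are unrelated.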


Huang G, Liu H, Gong S, Ge Y. Survival Prediction After Transarterial Chemoembolization for Hepatocellular Carcinoma: a Deep Multitask Survival Analysis Approach. J Healthc Inform Res 2023;7:332-358. [PMID: 37637721 PMCID: PMC10449707 DOI: 10.1007/s41666-023-00139-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 02/20/2023] [Accepted: 07/16/2023] [Indexed: 08/29/2023]

Abstract

The accurate prediction of postoperative survival time of patients with Barcelona Clinic Liver Cancer (BCLC) stage B hepatocellular carcinoma (HCC) is important for postoperative health care. Survival analysis is a common method used to predict the occurrence time of events of interest in the medical field. At present, the mainstream survival analysis models, such as the Cox proportional hazards model, must make strict assumptions about the underlying random process in order to handle censored data, thus potentially limiting their application in clinical practice. In this paper, we propose a novel deep multitask survival model (DMSM) to analyze HCC survival data. Specifically, DMSM transforms the traditional survival time prediction problem of patients with HCC into a survival probability prediction problem at multiple time points and applies entropy regularization and ranking loss to optimize a multitask neural network. Compared with traditional methods that delete censored data or impose strong assumptions, DMSM makes full use of all the information in the censored data without needing to make any such assumption. In addition, we identify the risk factors affecting the prognosis of patients with HCC and visualize the importance ranking of these factors. On the basis of the analysis of a real dataset of patients with BCLC stage B HCC, experimental results on three different validation datasets show that the DMSM achieves competitive performance with concordance indices of 0.779, 0.727, and 0.780 and integrated Brier scores (IBS) of 0.172, 0.138, and 0.135, respectively. Our DMSM has a comparatively small standard deviation (0.002, 0.002, and 0.003) for IBS over 100 bootstrap repetitions. The DMSM we proposed can be utilized as an effective survival analysis model and provide an important means for the accurate prediction of postoperative survival time of patients with BCLC stage B HCC.
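
The concordance index reported above measures how often a model ranks patients correctly by risk when censoring is taken into account. The following is a minimal sketch of the commonly used pairwise (Harrell-style) definition, not the authors' implementation; the function name, the handling of tied times (ignored here), and the toy data are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Pairwise concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] is truthy) and a strictly shorter time than subject j.
    The pair is concordant when i also has the higher predicted risk.
    Ties in risk count as half; ties in time are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.5, 0.1]  # higher score = higher predicted risk
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives context for the reported values between 0.727 and 0.780.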


Feng Y, Park J. Using machine learning-based binary classifiers for predicting organizational members' user satisfaction with collaboration software. PeerJ Comput Sci 2023;9:e1481. [PMID: 37547399 PMCID: PMC10403168 DOI: 10.7717/peerj-cs.1481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 06/14/2023] [Indexed: 08/08/2023]

Abstract

Background

In today's digital economy, enterprises are adopting collaboration software to facilitate digital transformation. However, if employees are not satisfied with the collaboration software, it can hinder enterprises from achieving the expected benefits. Although existing literature has contributed to user satisfaction after the introduction of collaboration software, there are gaps in predicting user satisfaction before its implementation. To address this gap, this study offers a machine learning-based forecasting method.

Methods

We utilized national public data provided by the national information society agency of South Korea. To enable the data to be used in a machine learning-based binary classifier, we discretized the predictor variable. We then validated the effectiveness of our prediction model by calculating feature importance scores and prediction accuracy.

Results

We identified 10 key factors that can predict user satisfaction. Furthermore, our analysis indicated that the naive Bayes (NB) classifier achieved the highest prediction accuracy rate of 0.780, followed by logistic regression (LR) at 0.767, extreme gradient boosting (XGBoost) at 0.744, support vector machine (SVM) at 0.744, K-nearest neighbor (KNN) at 0.707, and decision tree (DT) at 0.637.

Conclusions

This research identifies essential indicators that can predict user satisfaction with collaboration software across four levels: institutional guidance, information and communication technology (ICT) environment, company culture, and demographics. Enterprises can use this information to evaluate their current collaboration status and develop strategies for introducing collaboration software. Furthermore, this study presents a novel approach to predicting user satisfaction and confirms the effectiveness of the proposed machine learning-based prediction method, adding to the existing knowledge on the subject.


Guo C, Wan D, Li Y, Zhu Q, Luo Y, Luo W, Cui Y. Quantitative prediction of the hydraulic performance of free water surface constructed wetlands by integrating numerical simulation and machine learning. J Environ Manage 2023;337:117745. [PMID: 36965370 DOI: 10.1016/j.jenvman.2023.117745] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Revised: 02/24/2023] [Accepted: 03/13/2023] [Indexed: 06/18/2023]

Abstract

Quantitative prediction of the design parameter-influenced hydraulic performance is significant for optimizing free water surface constructed wetlands (FWS CWs) to reduce point and non-point source pollution and improve land utilization. However, owing to limitations of the test conditions and data scale, a quantitative prediction model of the hydraulic performance under multiple design parameters has not yet been established. In this study, we integrated field test data, a mechanism model, statistical regression, and machine learning (ML) to construct such quantitative prediction models. A FWS CW numerical model was established by integrating 13 groups of trace data from field tests. Subsequently, training, test and extension datasets comprising 125 (5³), 25 (L₂₅(5⁶)) and 16 (L₁₆(4⁴)) data points, respectively, were generated via numerical simulation of multi-level value combinations of three quantitative design parameters, namely, water depth, hydraulic loading rate (HLR), and aspect ratio. The short circuit index (φ₁₀), Morrill dispersion index (MDI), hydraulic efficiency (λ) and moment index (MI) were used as representative hydraulic performance indicators. The large-sample training set was analyzed to determine the variation rules of the different hydraulic indicators. Based on the control variable method, φ₁₀, λ, and MI grew exponentially with increasing aspect ratio whereas MDI showed a decreasing trend; with increasing water depth, φ₁₀, λ, and MI showed polynomial decreases whereas MDI increased; with increasing HLR, φ₁₀, λ, and MI slowly increased linearly whereas MDI showed the opposite trend. Finally, we constructed models based on multivariate nonlinear regression (MNLR) and ML (random forest (RF), multilayer perceptron (MLP), and support vector regression). The coefficients of determination (R²) of the MNLR and ML models fitting the training and test sets were all greater than 0.9; however, the generalization abilities of different models in the extension set were different. The most robust MLP, MNLR without interaction term, and RF models were recommended as the preferred models for hydraulic performance prediction. The extreme importance of aspect ratio in hydraulic performance was revealed. Thus, gaps in the current understanding of multivariate quantitative prediction of the hydraulic performance of FWS CWs are addressed while providing an avenue for researching FWS CWs in different regions according to local conditions.
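
The training set size of 125 follows from a full factorial design: three design parameters at five levels each give 5 × 5 × 5 combinations. A minimal sketch of enumerating such a design is shown below. The level values are placeholders invented for illustration (the abstract does not list them), and the orthogonal arrays used for the test and extension sets are not reproduced here.

```python
from itertools import product

# Hypothetical level values; the abstract does not report the actual levels.
water_depth_levels = [0.2, 0.3, 0.4, 0.5, 0.6]    # assumed, in metres
hlr_levels = [0.1, 0.2, 0.3, 0.4, 0.5]            # assumed, in m/day
aspect_ratio_levels = [1, 2, 3, 4, 5]             # assumed, length:width

# Full factorial design: every combination of every level.
training_design = [
    {"water_depth": depth, "hlr": hlr, "aspect_ratio": ratio}
    for depth, hlr, ratio in product(
        water_depth_levels, hlr_levels, aspect_ratio_levels
    )
]

n_runs = len(training_design)  # 5 ** 3 simulation runs
```

Each dictionary in `training_design` would correspond to one numerical simulation run whose outputs are the four hydraulic indicators.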


Wallace ML, Mentch L, Wheeler BJ, Tapia AL, Richards M, Zhou S, Yi L, Redline S, Buysse DJ. Use and misuse of random forest variable importance metrics in medicine: demonstrations through incident stroke prediction. BMC Med Res Methodol 2023;23:144. [PMID: 37337173 DOI: 10.1186/s12874-023-01965-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 06/06/2023] [Indexed: 06/21/2023] Open

Abstract

BACKGROUND

Machine learning tools such as random forests provide important opportunities for modeling large, complex modern data generated in medicine. Unfortunately, when it comes to understanding why machine learning models are predictive, applied research continues to rely on 'out of bag' (OOB) variable importance metrics (VIMPs) that are known within the statistics community to have considerable shortcomings. After explaining the limitations of OOB VIMPs, including bias towards correlated features and limited interpretability, we describe a modern approach called 'knockoff VIMPs' and explain its advantages.

METHODS

We first evaluate current VIMP practices through an in-depth literature review of 50 recent random forest manuscripts. Next, we recommend organized and interpretable strategies for analysis with knockoff VIMPs, including computing them for groups of features and considering multiple model performance metrics. To demonstrate methods, we develop a random forest to predict 5-year incident stroke in the Sleep Heart Health Study and compare results based on OOB and knockoff VIMPs.

RESULTS

Nearly all papers in the literature review contained substantial limitations in their use of VIMPs. In our demonstration, using OOB VIMPs for individual variables suggested two highly correlated lung function variables (forced expiratory volume, forced vital capacity) as the best predictors of incident stroke, followed by age and height. Using an organized analytic approach that considered knockoff VIMPs of both groups of features and individual features, the largest contributions to model sensitivity were medications (especially cardiovascular) and measured medical risk factors, while the largest contributions to model specificity were age, diastolic blood pressure, self-reported medical risk factors, polysomnography features, and pack-years of smoking. Thus, we reach very different conclusions about stroke risk factors using OOB VIMPs versus knockoff VIMPs.

CONCLUSIONS

The near-ubiquitous reliance on OOB VIMPs may provide misleading results for researchers who use such methods to guide their research. Given the rapid pace of scientific inquiry using machine learning, it is essential to bring modern knockoff VIMPs that are interpretable and unbiased into widespread applied practice to steer researchers using random forest machine learning toward more meaningful results.


Jung CR, Chen WT, Young LH, Hsiao TC. A hybrid model for estimating the number concentration of ultrafine particles based on machine learning algorithms in central Taiwan. Environ Int 2023;175:107937. [PMID: 37088007 DOI: 10.1016/j.envint.2023.107937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2023] [Revised: 04/12/2023] [Accepted: 04/13/2023] [Indexed: 05/03/2023]

Alfeo AL, Zippo AG, Catrambone V, Cimino MGCA, Toschi N, Valenza G. From local counterfactuals to global feature importance: efficient, robust, and model-agnostic explanations for brain connectivity networks. Comput Methods Programs Biomed 2023;236:107550. [PMID: 37086584 DOI: 10.1016/j.cmpb.2023.107550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 04/14/2023] [Accepted: 04/14/2023] [Indexed: 05/03/2023]

Abstract

BACKGROUND

Explainable artificial intelligence (XAI) is a technology that can enhance trust in mental state classifications by providing explanations for the reasoning behind the outputs of artificial intelligence (AI) models, especially for high-dimensional and highly correlated brain signals. Feature importance and counterfactual explanations are two common approaches to generate these explanations, but both have drawbacks. While feature importance methods, such as Shapley additive explanations (SHAP), can be computationally expensive and sensitive to feature correlation, counterfactual explanations only explain a single outcome instead of the entire model.

METHODS

To overcome these limitations, we propose a new procedure for computing global feature importance that involves aggregating local counterfactual explanations. This approach is specifically tailored to fMRI signals and is based on the hypothesis that instances close to the decision boundary and their counterfactuals mainly differ in the features identified as most important for the downstream classification task. We refer to this proposed feature importance measure as Boundary Crossing Solo Ratio (BoCSoR), since it quantifies the frequency with which a change in each feature in isolation leads to a change in classification outcome, i.e., the crossing of the model's decision boundary.

RESULTS AND CONCLUSIONS

Experimental results on synthetic data and real publicly available fMRI data from the Human Connect project show that the proposed BoCSoR measure is more robust to feature correlation and less computationally expensive than state-of-the-art methods. Additionally, it is equally effective in providing an explanation for the behavior of any AI model for brain signals. These properties are crucial for medical decision support systems, where many different features are often extracted from the same physiological measures and a gold standard is absent. Consequently, computing feature importance may become computationally expensive, and there may be a high probability of mutual correlation among features, leading to unreliable results from state-of-the-art XAI methods.
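
The idea behind BoCSoR, as described in the abstract, is to count how often changing a single feature in isolation flips the classifier's decision. The sketch below is one possible reading of that sentence, not the authors' algorithm: it assumes that counterfactuals are already available for each instance, substitutes one counterfactual feature value at a time, and normalizes by the number of instances. The function name, the normalization, and the toy classifier are all assumptions.

```python
def solo_crossing_ratio(predict, instances, counterfactuals):
    """Fraction of instance/counterfactual pairs in which changing only
    feature j to its counterfactual value changes the predicted class.

    `predict` maps a feature list to a class label. The normalization by
    the number of pairs is an assumption made for this sketch.
    """
    n_features = len(instances[0])
    crossings = [0] * n_features
    for x, cf in zip(instances, counterfactuals):
        base_label = predict(x)
        for j in range(n_features):
            if x[j] == cf[j]:
                continue  # this feature was not changed by the counterfactual
            probe = list(x)
            probe[j] = cf[j]  # change feature j in isolation
            if predict(probe) != base_label:
                crossings[j] += 1
    return [count / len(instances) for count in crossings]

# Toy classifier (hypothetical): the decision depends only on feature 0.
def toy_predict(x):
    return int(x[0] > 0.5)

instances = [[0.4, 1.0], [0.6, 0.0], [0.45, 2.0]]
counterfactuals = [[0.6, 1.5], [0.4, 0.2], [0.55, 2.0]]
ratios = solo_crossing_ratio(toy_predict, instances, counterfactuals)
```

In this toy case feature 0 crosses the decision boundary for every pair and feature 1 never does, so the ratios single out feature 0 as the globally important one.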


Zou B, Mi X, Stone E, Zou F. A deep neural network framework to derive interpretable decision rules for accurate traumatic brain injury identification of infants. BMC Med Inform Decis Mak 2023;23:58. [PMID: 37024858 PMCID: PMC10080782 DOI: 10.1186/s12911-023-02155-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 03/15/2023] [Indexed: 04/08/2023] Open

Ekundayo TC, Ijabadeniyi OA, Igbinosa EO, Okoh AI. Using machine learning models to predict the effects of seasonal fluxes on Plesiomonas shigelloides population density. Environ Pollut 2023;317:120734. [PMID: 36455774 DOI: 10.1016/j.envpol.2022.120734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 11/21/2022] [Accepted: 11/22/2022] [Indexed: 06/17/2023]

Abstract

Seasonal variations (SVs) affect the population density (PD), fate, and fitness of pathogens in environmental water resources and the public health impacts. Therefore, this study is aimed at applying machine learning intelligence (MLI) to predict the impacts of SVs on P. shigelloides population density (PDP) in the aquatic milieu. Physicochemical events (PEs) and PDP from three rivers acquired via standard microbiological and instrumental techniques across seasons were fitted to MLI algorithms (linear regression (LR), multiple linear regression (MR), random forest (RF), gradient boosted machine (GBM), neural network (NN), K-nearest neighbour (KNN), boosted regression tree (BRT), extreme gradient boosting (XGB) regression, support vector regression (SVR), decision tree regression (DTR), M5 pruned regression (M5P), artificial neural network (ANN) regression (with one 10-node hidden layer (ANN10), two 6- and 4-node hidden layers (ANN64), and two 5- and 5-node hidden layers (ANN55)), and elastic net regression (ENR)) to assess the implications of the SVs of PEs on aquatic PDP. The results showed that SVs significantly influenced PDP and PEs in the water (p < 0.0001), exhibiting a site-specific pattern. While MLI algorithms predicted PDP with differing absolute flux magnitudes for the contributing variables, DTR predicted the highest PDP value of 1.707 log unit, followed by XGB (1.637 log unit), but XGB (mean-squared-error (MSE) = 0.0025; root-mean-squared-error (RMSE) = 0.0501; R² = 0.998; median absolute deviation (MAD) = 0.0275) outperformed other models in terms of regression metrics. Temperature and total suspended solids (TSS) ranked first and second as significant factors in predicting PDP in 53.3% (8/15) and 40% (6/15), respectively, of the models, based on the RMSE loss after permutations. Additionally, season ranked third among the 7 models, and turbidity (TBS) ranked fourth at 26.7% (4/15), as the primary significant factor for predicting PDP in the aquatic milieu. The results of this investigation demonstrated that MLI predictive modelling techniques can promisingly be exploited to complement the repetitive laboratory-based monitoring of PDP and other pathogens, especially in low-resource settings, in response to seasonal fluxes and can provide insights into the potential public health risks of emerging pathogens and TSS pollution (e.g., nanoparticles and micro- and nanoplastics) in the aquatic milieu. The model outputs provide low-cost and effective early warning information to assist watershed managers and fish farmers in making appropriate decisions about water resource protection, aquaculture management, and sustainable public health protection.
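
Ranking variables "based on the RMSE loss after permutations" refers to permutation importance: a feature's values are shuffled and the resulting increase in RMSE is taken as its importance. The sketch below illustrates the general technique under stated assumptions; the function names, the number of repeats, and the toy model are hypothetical and are not the study's code.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    `predict` maps one row (a list of feature values) to a prediction.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = rmse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy model (hypothetical): the prediction uses feature 0 and ignores feature 1.
def toy_model(row):
    return 2.0 * row[0]

X = [[float(i), float((i * 7) % 10)] for i in range(10)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(toy_model, X, y)
```

Shuffling the ignored feature leaves the error unchanged, so its importance is zero, while shuffling the used feature raises the RMSE.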


Zhou L, Zheng W, Huang S, Yang X. Integrated radiomics, dose-volume histogram criteria and clinical features for early prediction of saliva amount reduction after radiotherapy in nasopharyngeal cancer patients. Discov Oncol 2022;13:145. [PMID: 36581739 PMCID: PMC9800672 DOI: 10.1007/s12672-022-00606-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 12/15/2022] [Indexed: 12/31/2022] Open

Abstract

PURPOSE

Previously, the evaluation of xerostomia depended on subjective grading systems, rather than the accurate saliva amount reduction. Our aim was to quantify acute xerostomia with reduced saliva amount, and apply radiomics, dose-volume histogram (DVH) criteria and clinical features to predict saliva amount reduction by machine learning techniques.

MATERIAL AND METHODS

Computed tomography (CT) of parotid glands, DVH, and clinical data of 52 patients were collected to extract radiomics, DVH criteria and clinical features, respectively. Firstly, radiomics, DVH criteria and clinical features were divided into 3 groups for feature selection, in order to alleviate the masking effect of the number of features in different groups. Secondly, the top features in the 3 groups composed the integrated features, and feature selection was performed again on the integrated features. In this study, feature selection used a combination of eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP) to alleviate multicollinearity. Finally, 6 machine learning techniques were used for predicting saliva amount reduction. Meanwhile, top radiomics features were modeled using the same machine learning techniques for comparison.

RESULT

Seventeen integrated features (10 radiomics, 4 clinical, 3 DVH criteria) were selected to predict saliva amount reduction, with a mean square error (MSE) of 0.6994 and an R² score of 0.9815. The top 17 and top 10 selected radiomics features predicted saliva amount reduction with MSEs of 0.7376 and 0.7519 and R² scores of 0.9805 and 0.9801, respectively.

CONCLUSION

With the same number of features, integrated features (radiomics + DVH criteria + clinical) performed better than radiomics features alone. The important DVH criteria and clinical features mainly included white blood cells (WBC), parotid_glands_Dmax, Age, parotid_glands_V15, hemoglobin (Hb), BMI and parotid_glands_V45.


Ban MJ, Lee DH, Shin SW, Kim K, Kim S, Oa SW, Kim GH, Park YJ, Jin DR, Lee M, Kang JH. Identifying the acute toxicity of contaminated sediments using machine learning models. Environ Pollut 2022;312:120086. [PMID: 36064062 DOI: 10.1016/j.envpol.2022.120086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Revised: 08/03/2022] [Accepted: 08/29/2022] [Indexed: 06/15/2023]

Kim KM, Ahn JH. Machine learning predictions of chlorophyll-a in the Han river basin, Korea. J Environ Manage 2022;318:115636. [PMID: 35777152 DOI: 10.1016/j.jenvman.2022.115636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 06/20/2022] [Accepted: 06/26/2022] [Indexed: 06/15/2023]

Usta A. Prediction of soil water contents and erodibility indices based on artificial neural networks: using topography and remote sensing. Environ Monit Assess 2022;194:794. [PMID: 36109443 DOI: 10.1007/s10661-022-10465-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Accepted: 09/08/2022] [Indexed: 06/15/2023]

Abstract

This study aimed to predict some soil water contents and soil erodibility indices with a multilayer perceptron (MLP) artificial neural network (ANN) using remote sensing data (Landsat 8 OLI TIRS) and topographic variables from a digital elevation model (DEM) in a semi-arid ecosystem. In the models, the input variables were derived from remote sensing imagery and the DEM. The output variables were field capacity, wilting point, aggregate stability index, structural stability index, dispersion ratio, and clay flocculation index. This study was conducted in the watersheds of the Koruluk dam, the Kızlarkalesi, and the Telme ponds built for agricultural irrigation in Gümüşhane-Şiran. The soil samples were obtained from two depths (0-10 cm and 10-20 cm) from 59 soil profiles. Besides field capacity, wilting point, and aggregate stability analysis, undispersed/dispersed sand, silt, clay contents, and organic matter analysis were performed due to their strong effect on soil moisture, soil water content, and erodibility indices. The correlation analysis results showed significant relationships between soil characteristics and soil water contents/soil erodibility indices. The remote sensing variables were derived from three Landsat images of 2015 (June, July, and September). The performance results of MLP ANN models predicted for soil water contents and erodibility indices ranged from 0.75 to 0.90 for R², 0.046-4.115 for root mean square error (RMSE), 4.46-6.54 for normalized root mean square error (NRMSE), and 0.042-0.186 for mean absolute error (MAE). Topography was the more significant group of variables affecting soil water contents and soil erodibility indices, and its feature importance in the prediction was over 55%. The results showed that the use of topographic variables together with remote sensing variables in MLP ANN modeling increased the performance of the models.
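
The regression metrics reported above (R², RMSE, NRMSE, MAE) can be computed as in the following minimal sketch. The abstract does not state how NRMSE was normalized; this sketch assumes normalization by the observed range, expressed as a percentage, which is only one of several conventions. The function name and the toy values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R², RMSE, NRMSE (percent of observed range; assumed) and MAE."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: normalize RMSE by the range of the observations.
    nrmse_percent = 100.0 * rmse / (max(y_true) - min(y_true))
    return {"r2": r2, "rmse": rmse, "nrmse_percent": nrmse_percent, "mae": mae}

# Toy observed and predicted values (hypothetical).
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])
```

Because RMSE and MAE carry the units of the predicted variable, the wide RMSE range in the abstract likely reflects output variables measured on different scales, which is the usual reason for also reporting a normalized error.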


Hu Y, Donald C, Giacaman N. A revised application of cognitive presence automatic classifiers for MOOCs: a new set of indicators revealed? Int J Educ Technol High Educ 2022;19:48. [PMID: 36118283 PMCID: PMC9467662 DOI: 10.1186/s41239-022-00353-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 06/10/2022] [Indexed: 06/15/2023]

Kangjae L. Analysis of changes in geographical factors affecting sales in commercial alleys after COVID-19 using machine learning techniques. Heliyon 2022;8:e10708. [PMID: 36158091 PMCID: PMC9484863 DOI: 10.1016/j.heliyon.2022.e10708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 11/29/2022] Open

Bárcenas R, Fuentes-García R. Risk assessment in COVID-19 patients: A multiclass classification approach. Inform Med Unlocked 2022;32:101023. [PMID: 35873009 PMCID: PMC9295315 DOI: 10.1016/j.imu.2022.101023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 07/09/2022] [Accepted: 07/14/2022] [Indexed: 11/30/2022] Open

Guo C, Cui Y. Machine learning exhibited excellent advantages in the performance simulation and prediction of free water surface constructed wetlands. J Environ Manage 2022;309:114694. [PMID: 35182978 DOI: 10.1016/j.jenvman.2022.114694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2021] [Revised: 01/19/2022] [Accepted: 02/06/2022] [Indexed: 06/14/2023]

Kim J, Mun S, Lee S, Jeong K, Baek Y. Prediction of metabolic and pre-metabolic syndromes using machine learning models with anthropometric, lifestyle, and biochemical factors from a middle-aged population in Korea. BMC Public Health 2022;22:664. [PMID: 35387629 PMCID: PMC8985311 DOI: 10.1186/s12889-022-13131-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/26/2021] [Accepted: 03/30/2022] [Indexed: 01/10/2023] Open

Abstract

Background

Metabolic syndrome (MetS) is a complex condition that appears as a cluster of metabolic abnormalities, and is closely associated with the prevalence of various diseases. Early prediction of the risk of MetS in the middle-aged population provides greater benefits for cardiovascular disease-related health outcomes. This study aimed to apply the latest machine learning techniques to find the optimal MetS prediction model for the middle-aged Korean population.

Methods

We retrieved 20 data types from the Korean Medicine Daejeon Citizen Cohort, a cohort study on a community-based population of adults aged 30–55 years. The data included sex, age, anthropometric data, lifestyle-related data, and blood indicators of 1991 individuals. Participants satisfying two (pre-MetS) or ≥ 3 (MetS) of the five NCEP-ATP III criteria were included in the MetS group. MetS prediction used nine machine learning models based on the following algorithms: decision tree, Gaussian Naïve Bayes, K-nearest neighbor, eXtreme gradient boosting (XGBoost), random forest, logistic regression, support vector machine, multi-layer perceptron, and 1D convolutional neural network. All analyses were performed by sequentially inputting the features in three steps according to their characteristics. The models’ performances were compared after applying the synthetic minority oversampling technique (SMOTE) to resolve data imbalance.
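As a rough illustration of the oversampling step named in the Methods, the core idea of SMOTE can be sketched as interpolation between a minority-class sample and one of its nearest minority-class neighbors. This is a minimal standard-library sketch, not the study's implementation; the function name `smote`, the parameter choices, and the toy (BMI, waist-to-hip ratio) values are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical minority-class rows: (BMI, waist-to-hip ratio)
minority = [(27.1, 0.92), (29.4, 0.95), (31.0, 0.98), (28.3, 0.90)]
new_points = smote(minority, n_new=6, k=2)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the range already covered by that class. Oversampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.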

Results

MetS was detected in 33.85% of the subjects. Among the MetS prediction models, the tree-based random forest and XGBoost models showed the best performance, which improved with the number of features used. As a measure of the models’ performance, the area under the receiver operating characteristic curve (AUC) increased by up to 0.091 when the SMOTE was applied, with XGBoost showing the highest AUC of 0.851. Body mass index and waist-to-hip ratio were identified as the most important features in the MetS prediction models for this population.
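For readers unfamiliar with the AUC metric reported here, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = MetS) and model scores
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 7 of 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.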

Conclusions

Tree-based machine learning models were useful in identifying MetS with high accuracy in middle-aged Koreans. Early diagnosis of MetS is important and requires a multidimensional approach that includes self-administered questionnaire, anthropometric, and biochemical measurements.

Navarrete M, Arthur S, Treder MS, Lewis PA. Ongoing neural oscillations predict the post-stimulus outcome of closed loop auditory stimulation during slow-wave sleep. Neuroimage 2022;253:119055. [PMID: 35276365 DOI: 10.1016/j.neuroimage.2022.119055] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/11/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 10/18/2022] Open

Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Programs Biomed 2022;214:106584. [PMID: 34942412 DOI: 10.1016/j.cmpb.2021.106584] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 09/15/2020] [Revised: 09/08/2021] [Accepted: 12/07/2021] [Indexed: 06/14/2023]

Zafari H, Langlois S, Zulkernine F, Kosowan L, Singer A. AI in predicting COPD in the Canadian population. Biosystems 2021;211:104585. [PMID: 34864143 DOI: 10.1016/j.biosystems.2021.104585] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Revised: 11/17/2021] [Accepted: 11/23/2021] [Indexed: 12/12/2022]

Jiang F, Ma J. A comprehensive study of macro factors related to traffic fatality rates by XGBoost-based model and GIS techniques. Accid Anal Prev 2021;163:106431. [PMID: 34758411 DOI: 10.1016/j.aap.2021.106431] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/13/2020] [Revised: 07/09/2021] [Accepted: 09/30/2021] [Indexed: 06/13/2023]

Teufl W, Taetz B, Miezal M, Dindorf C, Fröhlich M, Trinler U, Hogan A, Bleser G. Automated detection and explainability of pathological gait patterns using a one-class support vector machine trained on inertial measurement unit based gait data. Clin Biomech (Bristol, Avon) 2021;89:105452. [PMID: 34481198 DOI: 10.1016/j.clinbiomech.2021.105452] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/10/2020] [Revised: 08/12/2021] [Accepted: 08/13/2021] [Indexed: 02/07/2023]

Alkadri S, Ledwos N, Mirchi N, Reich A, Yilmaz R, Driscoll M, Del Maestro RF. Utilizing a multilayer perceptron artificial neural network to assess a virtual reality surgical procedure. Comput Biol Med 2021;136:104770. [PMID: 34426170 DOI: 10.1016/j.compbiomed.2021.104770] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/29/2021] [Revised: 08/12/2021] [Accepted: 08/13/2021] [Indexed: 11/18/2022]

Abstract

BACKGROUND

Virtual reality surgical simulators are a safe and efficient technology for the assessment and training of surgical skills. Simulators allow trainees to improve specific surgical techniques in risk-free environments. Recently, machine learning has been coupled to simulators to classify performance. However, most studies fail to extract meaningful observations behind the classifications and the impact of specific surgical metrics on the performance. One benefit from integrating machine learning algorithms, such as Artificial Neural Networks, to simulators is the ability to extract novel insights into the composites of the surgical performance that differentiate levels of expertise.

OBJECTIVE

This study aims to demonstrate the benefits of artificial neural network algorithms in assessing and analyzing virtual surgical performances. This study applies the algorithm on a virtual reality simulated annulus incision task during an anterior cervical discectomy and fusion scenario.

DESIGN

An artificial neural network algorithm was developed and integrated. Participants performed the simulated surgical procedure on the Sim-Ortho simulator. Data from the annulus incision task were extracted to generate 157 surgical performance metrics that spanned three categories (motion, safety, and efficiency).

SETTING

Musculoskeletal Biomechanics Research Lab; Neurosurgical Simulation and Artificial Intelligence Learning Center, McGill University, Montreal, Canada.

PARTICIPANTS

Twenty-three participants were recruited and divided into 3 groups: 11 post-residents, 5 senior and 7 junior residents.

RESULTS

An artificial neural network model was trained on nine selected surgical metrics spanning all three categories, and it achieved 80% testing accuracy.

CONCLUSIONS

This study outlines the benefits of integrating artificial neural networks to virtual reality surgical simulators in understanding composites of expertise performance.

Jia Y, Kaul C, Lawton T, Murray-Smith R, Habli I. Prediction of weaning from mechanical ventilation using Convolutional Neural Networks. Artif Intell Med 2021;117:102087. [PMID: 34127233 DOI: 10.1016/j.artmed.2021.102087] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/18/2020] [Revised: 05/03/2021] [Accepted: 05/03/2021] [Indexed: 10/21/2022]

Ali Shah SM, Taju SW, Ho QT, Nguyen TTD, Ou YY. GT-Finder: Classify the family of glucose transporters with pre-trained BERT language models. Comput Biol Med 2021;131:104259. [PMID: 33581474 DOI: 10.1016/j.compbiomed.2021.104259] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/04/2020] [Revised: 02/04/2021] [Accepted: 02/04/2021] [Indexed: 12/14/2022]

Ho LV, Aczon M, Ledbetter D, Wetzel R. Interpreting a recurrent neural network's predictions of ICU mortality risk. J Biomed Inform 2021;114:103672. [PMID: 33422663 DOI: 10.1016/j.jbi.2021.103672] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/03/2020] [Revised: 12/30/2020] [Accepted: 01/03/2021] [Indexed: 12/25/2022]

Doppalapudi S, Qiu RG, Badr Y. Lung cancer survival period prediction and understanding: Deep learning approaches. Int J Med Inform 2020;148:104371. [PMID: 33461009 DOI: 10.1016/j.ijmedinf.2020.104371] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/18/2020] [Revised: 12/16/2020] [Accepted: 12/27/2020] [Indexed: 11/29/2022]

Abstract

INTRODUCTION

Survival period prediction through early diagnosis of cancer has many benefits. It allows both patients and caregivers to plan resources, time and intensity of care to provide the best possible treatment path for the patients. In this paper, by focusing on lung cancer patients, we build several survival prediction models using deep learning techniques to tackle both cancer survival classification and regression problems. We also conduct feature importance analysis to understand how lung cancer patients' relevant factors impact their survival periods. We contribute to identifying an approach to estimating survivability that is commonly and practically appropriate for medical use.

METHODOLOGIES

We compared performance across three of the most popular deep learning architectures, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), and also compared the performance of the deep learning models against traditional machine learning models. The data were obtained from the lung cancer section of the Surveillance, Epidemiology, and End Results (SEER) cancer registry.

RESULTS

The deep learning models outperformed traditional machine learning models across both classification and regression approaches. For the classification approach, with patients' survival periods segmented into classes of '<=6 months', '0.5 - 2 years' and '>2 years', the deep learning models reached a best accuracy of 71.18%. For the regression approach, they reached a Root Mean Squared Error (RMSE) of 13.5% and an R² value of 0.5. The traditional machine learning models saturated at 61.12% classification accuracy and 14.87% RMSE in regression.
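The two regression metrics quoted above can be computed as in the following minimal sketch. It uses the standard definitions of RMSE and the coefficient of determination; the function names and the toy survival values (in months) are hypothetical and are not taken from the paper, which reports RMSE as a percentage.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted survival periods, in months
y_true = [6, 12, 24, 36]
y_pred = [8, 10, 26, 34]
error = rmse(y_true, y_pred)     # every residual is +/-2, so RMSE is 2.0
fit = r_squared(y_true, y_pred)  # 1 - 16/531
```

RMSE is expressed in the units of the target, while R² is unitless and compares the model against always predicting the mean.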

CONCLUSIONS

This approach can be a baseline for early prediction with predictions that can be further improved with more temporal treatment information collected from treated patients. In addition, we evaluated the feature importance to investigate the model interpretability, gaining further insight into the survival analysis models and the factors that are important in cancer survival period prediction.

Bedi S, Samal A, Ray C, Snow D. Comparative evaluation of machine learning models for groundwater quality assessment. Environ Monit Assess 2020;192:776. [PMID: 33219864 DOI: 10.1007/s10661-020-08695-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/27/2020] [Accepted: 10/20/2020] [Indexed: 06/11/2023]