1
Askar M, Garcia BH, Svendsen K. Exploring Multimorbidity Patterns in older hospitalized Norwegian patients using Network Analysis modularity. Int J Med Inform 2025; 201:105954. [PMID: 40300484 DOI: 10.1016/j.ijmedinf.2025.105954] [Received: 12/02/2024] [Revised: 03/19/2025] [Accepted: 04/23/2025] [Indexed: 05/01/2025]
Abstract
BACKGROUND Understanding Multimorbidity Patterns (MPs) is crucial for planning healthcare interventions, allocating resources, and improving patient outcomes. OBJECTIVE We aim to demonstrate the use of Network Analysis (NA) to explore the MPs in hospitalized Norwegian older patients. METHODS We utilized data from the Norwegian Patient Registry (NPR) on all admissions between 2017 and 2019. The study population included patients ≥ 65 years old with two or more different conditions. Multimorbidity was defined as the co-occurrence of two or more associated chronic conditions. Chronic conditions were identified using the Chronic Condition Indicator Refined (CCIR) list. The association between chronic conditions was determined by calculating Relative Risk (RR) and Phi-correlation to detect pairs of conditions that co-occur beyond chance. A multimorbidity network was created, and MPs were detected using the Louvain method for community detection. We suggested a clinical interpretation for these MPs. RESULTS A total of 539 chronic conditions were used to create a multimorbidity network revealing several MPs. These modules included patterns of vision and hearing disorders, cardiorenal syndrome, metabolic and cardiovascular disorders, respiratory disorders, endocrine and skin conditions, autoimmune and musculoskeletal disorders, as well as mental and behavioral disorders. Using NA centrality measures, we identified the most influential conditions in each module. An interactive network and sunburst graphs for each module are publicly available. CONCLUSION The study demonstrates the use of NA modularity detection in identifying MPs. The findings highlight the complex interaction of chronic conditions in the elderly and the potential of NA methodology in exploring these relationships.
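The abstract names Relative Risk and Phi-correlation as the pairwise association measures but does not give the formulas. The sketch below uses the definitions commonly seen in comorbidity-network work, which is an assumption about what the authors computed; the function names and the example counts are hypothetical.

```python
import math


def relative_risk(c_ij: int, p_i: int, p_j: int, n: int) -> float:
    """Observed co-occurrence divided by the co-occurrence expected by chance.

    c_ij: patients with both conditions, p_i / p_j: patients with each
    condition, n: total patients. Values above 1 suggest the pair
    co-occurs more often than independence would predict.
    """
    return (c_ij * n) / (p_i * p_j)


def phi_correlation(c_ij: int, p_i: int, p_j: int, n: int) -> float:
    """Pearson correlation between two binary condition indicators."""
    numerator = c_ij * n - p_i * p_j
    denominator = math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    n, p_i, p_j, c_ij = 1000, 100, 200, 40
    print("RR  =", relative_risk(c_ij, p_i, p_j, n))
    print("phi =", round(phi_correlation(c_ij, p_i, p_j, n), 4))
```

Pairs that pass a chosen threshold on these measures would become weighted edges in the network on which community detection is then run; the thresholds used in the study are not stated in the abstract.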
Affiliation(s)
- Mohsen Askar
- Department of Pharmacy, Faculty of Health Sciences, UiT, The Arctic University of Norway, Norway.
- Beate Hennie Garcia
- Department of Pharmacy, Faculty of Health Sciences, UiT, The Arctic University of Norway, Norway
- Kristian Svendsen
- Department of Pharmacy, Faculty of Health Sciences, UiT, The Arctic University of Norway, Norway

2
Zheng P, He P, Guo Y, Wang Y, Wang Q. Interpretable machine learning model for prediction functional cure in chronic hepatitis B patients receiving Peg-IFN therapy: A multi-center study. Int J Med Inform 2025; 201:105916. [PMID: 40300485 DOI: 10.1016/j.ijmedinf.2025.105916] [Received: 03/04/2025] [Revised: 04/06/2025] [Accepted: 04/07/2025] [Indexed: 05/01/2025]
Abstract
BACKGROUND Functional cure is the ideal treatment goal for chronic hepatitis B (CHB). We developed and validated machine learning (ML) models to predict functional cure in CHB patients. METHODS This study retrospectively recruited 534 CHB patients who received Peg-IFN treatment to construct the models and 269 patients for external validation. We analyzed three strategies: baseline, week 12, and week 24. Seven ML models were constructed using variables selected by the Boruta and least absolute shrinkage and selection operator regression algorithms, and performance metrics, including area under the curve (AUC), sensitivity, specificity, and F1 score, were applied to determine the best model. We utilized SHapley Additive exPlanation to visualize and interpret the best model and built a website to conveniently predict functional cure of CHB. RESULTS A total of 272 participants were cured in our study. Compared with the baseline and week 12 strategies, the week 24 strategy using a Support Vector Machine (SVM) model better predicted functional cure of CHB, with reliable predictive performance (AUC = 0.981), calibration and clinical applicability in the external validation cohort. Age, ALT ratio at week 12, HBsAg at week 24 and HBsAg ratio at week 24 were important features. To enhance the clinical convenience and effectiveness of the constructed model, a web-based dynamic nomogram was created (Dynamic Nomogram (shinyapps.io)). CONCLUSION This study developed an SVM model to predict functional cure in CHB patients treated with Peg-IFN. We also built a website with which clinicians can make individualized predictions of the efficacy of Peg-IFN therapy in CHB patients.
Affiliation(s)
- Peiyu Zheng
- Department of Infectious Diseases, The First Hospital of Shanxi Medical University, Taiyuan, China; Graduate School of Shanxi Medical University, Taiyuan, China
- Peifeng He
- School of Management, Shanxi Medical University, Taiyuan, China; Shanxi Key Laboratory of Big Data for Clinical Decision Research (Shanxi Medical University), Jinzhong, China
- Ying Guo
- Department of Liver diseases, Taiyuan Infectious Diseases Hospital, Taiyuan, China
- Yan Wang
- Department of Infectious Diseases, Shanxi Bethune Hospital, Taiyuan, China
- Qinying Wang
- Department of Infectious Diseases, The First Hospital of Shanxi Medical University, Taiyuan, China.

3
Migiddorj B, Batterham M, Win KT. Systematic literature review on the application of explainable artificial intelligence in palliative care studies. Int J Med Inform 2025; 200:105914. [PMID: 40250167 DOI: 10.1016/j.ijmedinf.2025.105914] [Received: 11/26/2024] [Revised: 03/02/2025] [Accepted: 04/07/2025] [Indexed: 04/20/2025]
Abstract
BACKGROUND As machine learning models become increasingly prevalent in palliative care, explainability has become a critical factor in their successful deployment in this sensitive field, where decisions can profoundly impact patient health and quality of life. To address these concerns, Explainable AI (XAI) aims to make complex AI models more understandable and trustworthy. OBJECTIVE This study aims to assess the current state of machine learning models in palliative care, specifically focusing on their compliance with the principles of XAI. METHODS A comprehensive literature search in four databases was conducted to identify articles on machine learning in palliative care studies published until May 2024, following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis guideline. The Checklist for Assessment of Medical Artificial Intelligence was used to evaluate the quality of the studies. RESULTS Mortality and survival prediction were the primary focus areas in 15 (54%) of the included 28 studies. Regarding data explainability, 20 studies (71%) documented their data preprocessing methods. However, a notable concern is that 45% of the studies did not address handling missing data. Across these studies, 74 machine learning algorithms were employed. Complex models, including Random Forest, Support Vector Machines, Gradient Boosting Machines, and Deep Neural Networks, were predominantly used (64%) due to their high predictive power, achieving AUC values between 0.82 and 0.96. Post-hoc explanation techniques were applied in only 11 studies, using seven different XAI techniques, focusing on global explanations to enhance understanding of model behavior. CONCLUSION Given the critical role of AI-driven decisions in patient care, adopting XAI techniques is essential for fostering trust and usability. Although progress has been made, significant gaps persist. 
A main challenge remains the trade-off between model performance and interpretability, as highly accurate models often lack the transparency required to build trust in clinical settings. Additionally, complex models frequently provide inadequate explanations for their outputs, lack consistent documentation, and have limited XAI applications, reducing the interpretability of machine learning studies for clinicians and decision-makers.
Affiliation(s)
- Battushig Migiddorj
- Faculty of Engineering and Information Science, University of Wollongong, Australia.
- Marijka Batterham
- Faculty of Engineering and Information Science, University of Wollongong, Australia
- Khin Than Win
- Faculty of Engineering and Information Science, University of Wollongong, Australia

4
Sharan RV, Xiong H. Wet and dry cough classification using cough sound characteristics and machine learning: A systematic review. Int J Med Inform 2025; 199:105912. [PMID: 40203586 DOI: 10.1016/j.ijmedinf.2025.105912] [Received: 04/25/2024] [Revised: 03/10/2025] [Accepted: 04/03/2025] [Indexed: 04/11/2025]
Abstract
BACKGROUND Distinguishing between productive (wet) and non-productive (dry) cough types is important for evaluating respiratory health, assisting in differential diagnosis, and monitoring disease progression. However, assessing cough type through the perception of cough sounds in clinical settings poses challenges due to its subjectivity. Employing objective cough sound analysis holds promise for aiding diagnostic assessments and guiding the management of respiratory conditions. This systematic review aims to assess and summarize the predictive capabilities of machine learning algorithms in analyzing cough sounds to determine cough type. METHOD A systematic search of the Scopus, Medline, and Embase databases conducted on March 8, 2025, yielded three studies that met the inclusion criteria. The quality assessment of these studies was conducted using the checklist for the assessment of medical artificial intelligence (ChAMAI). RESULTS The inter-rater agreement for annotating wet and dry coughs ranged from 0.22 to 0.81 across the three studies. Furthermore, these studies employed diverse inputs for their machine learning algorithms, including different cough sound features and time-frequency representations. The algorithms used ranged from conventional classifiers like logistic regression to neural networks. While the classification accuracy for identifying wet and dry coughs ranged from 78% to 87% across these studies, none of them assessed their algorithms through external validation. CONCLUSION The high variability in inter-rater agreement highlights the subjectivity in manually interpreting cough sounds and underscores the need for objective cough sound analysis methods. The predictive ability of cough-type classification algorithms shows promise in the small number of studies analyzed in this systematic review. However, more studies are needed, particularly those validating their models on independent and external datasets.
Affiliation(s)
- Roneel V Sharan
- School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, United Kingdom.
- Hao Xiong
- Australian Institute of Health Innovation, Macquarie University, Sydney, NSW 2109, Australia

5
Boakye NF, O'Toole CC, Jalali A, Hannigan A. Comparing logistic regression and machine learning for obesity risk prediction: A systematic review and meta-analysis. Int J Med Inform 2025; 199:105887. [PMID: 40157246 DOI: 10.1016/j.ijmedinf.2025.105887] [Received: 11/04/2024] [Revised: 01/28/2025] [Accepted: 03/19/2025] [Indexed: 04/01/2025]
Abstract
BACKGROUND Logistic regression (LR) has traditionally been the standard method used for predicting binary health outcomes; however, machine learning (ML) methods are increasingly popular. OBJECTIVE This study aimed to compare the performance of ML and LR for obesity risk prediction, identify how LR and ML were being compared, and identify the commonly used ML methods. METHODS We conducted comprehensive searches in PubMed, Scopus, Embase, IEEE Xplore, and Web of Science databases on 24th November 2023, with no restrictions on publication dates. Meta-analyses were performed to quantify the overall predictive performance of the methods using the area under the curve (AUC) for LR, AUC for the best performing ML, as well as the difference in the AUC between the two approaches as the effect measures. RESULTS We included 28 studies out of 913 abstracts screened. Accuracy and sensitivity were the most commonly used performance measures. More than half of the studies used AUC, with no calibration assessment conducted in any of the studies. Decision trees followed by boosting algorithms were the most commonly used ML methods. Seventy-five percent of the studies were at high risk of bias. There were 14 included studies in the meta-analysis. The pooled AUC for LR was 0.75 (95% CI 0.70 to 0.80) and the pooled AUC for ML was 0.76 (95% CI 0.70 to 0.82). The pooled difference in logit(AUC) between ML and LR was 0.13 (95% CI -0.11 to 0.37). CONCLUSION We conclude that there is no significant difference in the performance of ML and LR for obesity risk prediction. However, there is a need for improved quality of reporting of studies, the use of more performance measures particularly calibration, and to validate models in different populations.
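The pooled difference above is reported on the logit(AUC) scale, which can be hard to read at a glance. The following is a minimal sketch of the transform and of the check behind the "no significant difference" reading of the reported interval; the helper names are hypothetical, and the snippet does not reproduce the meta-analysis itself, which pools study-level differences.

```python
import math


def logit(p: float) -> float:
    """Map a proportion such as an AUC from (0, 1) onto the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map a logit-scale value back to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))


def ci_includes_zero(lower: float, upper: float) -> bool:
    """True when a confidence interval for a difference contains 0."""
    return lower <= 0.0 <= upper


if __name__ == "__main__":
    # Pooled AUCs quoted in the abstract, shown on the logit scale.
    print("logit(0.75) =", round(logit(0.75), 3))
    print("logit(0.76) =", round(logit(0.76), 3))
    # Reported 95% CI for the pooled difference in logit(AUC).
    print("CI contains 0:", ci_includes_zero(-0.11, 0.37))
```

Note that subtracting the two pooled logit values does not recover the reported pooled difference of 0.13, since that estimate comes from pooling within-study differences rather than from the two pooled AUCs.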
Affiliation(s)
- Nancy Fosua Boakye
- Research Ireland Centre for Research Training in Foundations of Data Science, Department of Mathematics and Statistics, University of Limerick, Ireland; Health Research Institute (HRI), University of Limerick, Limerick, V94T9PX, Ireland.
- Ciarán Courtney O'Toole
- School of Medicine, University of Limerick, Ireland; Health Research Institute (HRI), University of Limerick, Limerick, V94T9PX, Ireland
- Amirhossein Jalali
- School of Medicine, University of Limerick, Ireland; Health Research Institute (HRI), University of Limerick, Limerick, V94T9PX, Ireland
- Ailish Hannigan
- School of Medicine, University of Limerick, Ireland; Health Research Institute (HRI), University of Limerick, Limerick, V94T9PX, Ireland

6
Zhou Q, He R, Li H, Gu M. Development and validation of a nomogram to predict the risk of in-hospital MACE for emergence NSTE-ACS: A retrospective multicenter study based on the Chinese population. Int J Med Inform 2025; 199:105884. [PMID: 40147416 DOI: 10.1016/j.ijmedinf.2025.105884] [Received: 10/24/2024] [Revised: 03/04/2025] [Accepted: 03/19/2025] [Indexed: 03/29/2025]
Abstract
PURPOSE Our study aims to develop and validate an effective in-hospital major adverse cardiovascular events (MACE) prediction model for patients with emergency non-ST elevation acute coronary syndrome (NSTE-ACS). METHODS We retrospectively collected data on NSTE-ACS patients in three tertiary hospitals in Chongqing. In-hospital MACE was the predicted outcome. Patients from one hospital were divided into a training set and an internal validation set at a ratio of 7:3. In addition, 662 patients from two other tertiary hospitals were used for external validation. Patient information, including demographics, laboratory test results and disease course records, was used for comprehensive analysis. Finally, LASSO was used to identify the predictors and develop the model. This model was subsequently visualized as a nomogram, followed by both internal and external validation. The receiver operating characteristic curve, calibration curve and clinical decision curve analysis were used to assess the model's discrimination, calibration and clinical applicability, respectively. RESULTS A total of 3,308 patients were included, 375 of whom developed in-hospital MACE. The LR model demonstrated that length of stay, neutrophils, myoglobin, NYHA, CCI, NT-proBNP, LVEF and respiratory failure were risk factors for in-hospital MACE in emergency NSTE-ACS patients. In the training set, the AUC was 0.860 (95% CI: 0.831-0.889). In external validation, the AUC was 0.855 (95% CI: 0.808-0.902), and both the calibration curve and DCA in the validation set also revealed stable predictive accuracy and clinical validity. Additionally, the MACE risk can be calculated online via the web page (https://cocozhou99.shinyapps.io/DynNomapp/). CONCLUSION The prediction model we constructed has good predictive performance and can help healthcare professionals accurately assess the risk of in-hospital MACE in emergency NSTE-ACS patients.
Affiliation(s)
- Qianhui Zhou
- Department of Nursing, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Rui He
- Department of Cardiothoracic Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Hong Li
- Department of Nursing, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Manping Gu
- Department of Nursing, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.

7
Zantvoort K, Matthiesen JJ, Bjurner P, Bendix M, Brefeld U, Funk B, Kaldo V. The promise and challenges of computer mouse trajectories in DMHIs - A feasibility study on pre-treatment dropout predictions. Internet Interv 2025; 40:100828. [PMID: 40271204 PMCID: PMC12017972 DOI: 10.1016/j.invent.2025.100828] [Received: 09/03/2024] [Revised: 04/05/2025] [Accepted: 04/08/2025] [Indexed: 04/25/2025]
Abstract
With the impetus of Digital Mental Health Interventions (DMHIs), complex data can be leveraged to improve and personalize mental health care. However, most approaches rely on a very limited number of often costly features. Computer mouse trajectories can be unobtrusively and cost-efficiently gathered and seamlessly integrated into current baseline processes. Empirical evidence suggests that mouse movements hold information on user motivation and attention, both valuable aspects otherwise difficult to measure at scale. Further, mouse trajectories can already be collected on pre-treatment questionnaires, making them a promising candidate for early predictions informing treatment allocation. Therefore, this paper discusses how to collect and process mouse trajectory data on questionnaires in DMHIs. Covering different complexity levels, we combine hand-crafted features with non-sequential machine learning models, as well as spatiotemporal raw mouse data with state-of-the-art sequential neural networks. The data processing pipeline for the latter includes task-specific pre-processing to convert the variable length trajectories into a single prediction per user. As a feasibility study, we collected mouse trajectory data from 183 patients filling out a pre-intervention depression questionnaire. While the hand-crafted features slightly improve baseline predictions, the spatiotemporal models underperform. However, considering our small data set size, we propose more research to investigate the potential value of this novel and promising data type and provide the necessary steps and open-source code to do so.
Affiliation(s)
- Kirsten Zantvoort
- Institute of Information Systems, Leuphana University, Lüneburg, Germany
- Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, & Stockholm Health Care Services, Region Stockholm, Sweden
- Jennifer J. Matthiesen
- Institute of Information Systems, Leuphana University, Lüneburg, Germany
- Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, & Stockholm Health Care Services, Region Stockholm, Sweden
- Pontus Bjurner
- Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, & Stockholm Health Care Services, Region Stockholm, Sweden
- Marie Bendix
- Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, & Stockholm Health Care Services, Region Stockholm, Sweden
- Department of Clinical Sciences, Division of Psychiatry, Umeå University, Umeå, Sweden
- Ulf Brefeld
- Institute of Information Systems, Leuphana University, Lüneburg, Germany
- Burkhardt Funk
- Institute of Information Systems, Leuphana University, Lüneburg, Germany
- Viktor Kaldo
- Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, & Stockholm Health Care Services, Region Stockholm, Sweden
- Department of Psychology, Faculty of Health and Life Sciences, Linnaeus University, Växjö, Sweden

8
Vos G, Ebrahimpour M, van Eijk L, Sarnyai Z, Rahimi Azghadi M. Stress monitoring using low-cost electroencephalogram devices: A systematic literature review. Int J Med Inform 2025; 198:105859. [PMID: 40056845 DOI: 10.1016/j.ijmedinf.2025.105859] [Received: 02/27/2024] [Revised: 02/27/2025] [Accepted: 03/01/2025] [Indexed: 03/10/2025]
Abstract
INTRODUCTION The use of low-cost, consumer-grade wearable health monitoring devices has become increasingly prevalent in mental health research, including stress studies. While cortisol response magnitude remains the gold standard for stress assessment, an expanding body of research employs low-cost EEG devices as primary tools for recording biomarker data, often combined with wrist and ring-based wearables. However, the technical variability among low-cost EEG devices, particularly in sensor count and placement according to the 10-20 Electrode Placement System, poses challenges for reproducibility in study outcomes. OBJECTIVE This review aims to provide an overview of the growing application of low-cost EEG devices and machine learning techniques for assessing brain function, with a focus on stress detection. It also highlights the strengths and weaknesses of various machine learning methods commonly used in stress research, and evaluates the reproducibility of reported findings along with sensor count and placement importance. METHODS A comprehensive review was conducted of published studies utilizing EEG devices for stress detection and their associated machine learning approaches. Searches were performed across databases including Scopus, Google Scholar, ScienceDirect, Nature, and PubMed, yielding 69 relevant articles for analysis. The selected studies were synthesized into four thematic categories: stress assessment using EEG, low-cost EEG devices, datasets for EEG-based stress measurement, and machine learning techniques for EEG-based stress analysis. For machine learning-focused studies, validation and reproducibility methods were critically assessed. Study quality was evaluated and scored using the IJMEDI checklist. RESULTS The review identified several studies employing low-cost EEG devices to monitor brain activity during stress and relaxation phases, with many reporting high predictive accuracy using various machine learning validation techniques. 
However, only 54% of the studies included health screening prior to experimentation, and 58% were categorized as low-powered due to limited sample sizes. Additionally, few studies validated their results using an independent validation set or cortisol response as a correlating biomarker and there was a lack of consensus on data pre-processing and sensor placement as a key contributor to improving model generalization and accuracy. CONCLUSION Low-cost consumer-grade wearable devices, including EEG and wrist-based monitors, are increasingly utilized in stress-related research, offering promising avenues for non-invasive biomarker monitoring. However, significant gaps remain in standardizing EEG signal processing and sensor placement, both of which are critical for enhancing model generalization and accuracy. Furthermore, the limited use of independent validation sets and cortisol response as correlating biomarkers highlights the need for more robust validation methodologies. Future research should focus on addressing these limitations and establishing consensus on data pre-processing techniques and sensor configurations to improve the reliability and reproducibility of findings in this growing field.
Affiliation(s)
- Gideon Vos
- College of Science and Engineering, James Cook University, James Cook Dr, Townsville, 4811, QLD, Australia
- Maryam Ebrahimpour
- College of Science and Engineering, James Cook University, James Cook Dr, Townsville, 4811, QLD, Australia
- Liza van Eijk
- College of Health Care Sciences, James Cook University, James Cook Dr, Townsville, 4811, QLD, Australia
- Zoltan Sarnyai
- College of Public Health, Medical, and Vet Sciences, James Cook University, James Cook Dr, Townsville, 4811, QLD, Australia
- Mostafa Rahimi Azghadi
- College of Science and Engineering, James Cook University, James Cook Dr, Townsville, 4811, QLD, Australia.

9
Zhang Y, Liu H, Huang Q, Qu W, Shi Y, Zhang T, Li J, Chen J, Shi Y, Deng R, Chen Y, Zhang Z. Predictive value of machine learning for in-hospital mortality risk in acute myocardial infarction: A systematic review and meta-analysis. Int J Med Inform 2025; 198:105875. [PMID: 40073650 DOI: 10.1016/j.ijmedinf.2025.105875] [Received: 09/04/2024] [Revised: 02/25/2025] [Accepted: 03/07/2025] [Indexed: 03/14/2025]
Abstract
BACKGROUND Machine learning (ML) models have been constructed to predict the risk of in-hospital mortality in patients with myocardial infarction (MI). Due to diverse ML models and modeling variables, along with the significant imbalance in data, the predictive accuracy of these models remains controversial. OBJECTIVE This study aimed to review the accuracy of ML in predicting in-hospital mortality risk in MI patients and to provide evidence-based advice for the development or updating of clinical tools. METHODS PubMed, Embase, Cochrane, and Web of Science databases were searched up to June 4, 2024. The PROBAST and ChAMAI checklists were used to assess the risk of bias in the included studies. Since the included studies constructed models based on severely unbalanced datasets, subgroup analyses were conducted by the type of dataset (balanced data, unbalanced data, model type). RESULTS This meta-analysis included 32 studies. In the validation set, the pooled C-index, sensitivity, and specificity of prediction models based on balanced data were 0.83 (95% CI: 0.795-0.866), 0.81 (95% CI: 0.79-0.84), and 0.82 (95% CI: 0.78-0.86), respectively. In the validation set, the pooled C-index, sensitivity, and specificity of ML models based on imbalanced data were 0.815 (95% CI: 0.789-0.842), 0.66 (95% CI: 0.60-0.72), and 0.84 (95% CI: 0.83-0.85), respectively. CONCLUSIONS ML models such as LR, SVM, and RF exhibit high sensitivity and specificity in predicting in-hospital mortality in MI patients. However, their sensitivity is not superior to well-established scoring tools. Mitigating the impact of imbalanced data on ML models remains challenging.
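To illustrate why class imbalance matters for the figures above, the sketch below computes standard confusion-matrix metrics for a hypothetical cohort. The counts are invented so that sensitivity and specificity match the pooled imbalanced-data values of 0.66 and 0.84; the 5% event rate and the function name are assumptions, not data from the review.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return common binary-classification metrics from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # share of deaths correctly flagged
        "specificity": tn / (tn + fp),   # share of survivors correctly cleared
        "accuracy": (tp + tn) / total,   # dominated by the majority class
        "ppv": tp / (tp + fp),           # share of flagged patients who died
    }


if __name__ == "__main__":
    # Hypothetical cohort: 1000 patients, 50 in-hospital deaths (5%).
    metrics = confusion_metrics(tp=33, fn=17, tn=798, fp=152)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With a rare outcome, accuracy and specificity can look reassuring while a third of deaths are missed and most flagged patients survive, which is one reason sensitivity is reported separately for balanced and imbalanced data.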
Affiliation(s)
- Yuan Zhang
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, Jilin 130000, China
- Huan Liu
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, Jilin 130000, China
- Qingxia Huang
- Research Center of Traditional Chinese Medicine, The First Affiliated Hospital of Changchun University of Chinese Medicine, Changchun, Jilin 130117, China
- Wantong Qu
- Department of Cardiology, The First Affiliated Hospital of Changchun University of Chinese Medicine, Changchun 130000 Jilin, China
- Yanyu Shi
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, Jilin 130000, China
- Tianyang Zhang
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, Jilin 130000, China
- Jing Li
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, Jilin 130000, China
- Jinjin Chen
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, Jilin 130000, China
- Yuqing Shi
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, Jilin 130000, China
- Ruixue Deng
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, Jilin 130000, China
- Ying Chen
- Department of Cardiology, The First Affiliated Hospital of Changchun University of Chinese Medicine, Changchun 130000 Jilin, China.
- Zepeng Zhang
- Research Center of Traditional Chinese Medicine, The First Affiliated Hospital of Changchun University of Chinese Medicine, Changchun, Jilin 130117, China.

10
Matsumoto K, Suzuki M, Ishihara K, Tokunaga K, Matsuda K, Chen J, Yamashiro S, Soejima H, Nakashima N, Kamouchi M. Performance of multimodal prediction models for intracerebral hemorrhage outcomes using real-world data. Int J Med Inform 2025; 202:105989. [PMID: 40412140 DOI: 10.1016/j.ijmedinf.2025.105989] [Received: 11/18/2024] [Revised: 05/05/2025] [Accepted: 05/20/2025] [Indexed: 05/27/2025]
Abstract
BACKGROUND We aimed to develop and validate multimodal models integrating computed tomography (CT) images, text, and tabular clinical data to predict poor functional outcomes and in-hospital mortality in patients with intracerebral hemorrhage (ICH). These models were designed to assist non-specialists in emergency settings with limited access to stroke specialists. METHODS A retrospective analysis of 527 patients with ICH admitted to a Japanese tertiary hospital between April 2019 and February 2022 was conducted. Deep learning techniques were used to extract features from three-dimensional CT images and unstructured data, which were then combined with tabular data to develop an L1-regularized logistic regression model to predict poor functional outcomes (modified Rankin scale score 3-6) and in-hospital mortality. The model's performance was evaluated by assessing discrimination metrics, calibration plots, and decision curve analysis (DCA) using temporal validation data. RESULTS The multimodal model utilizing both imaging and text data, such as medical interviews, exhibited the highest performance in predicting poor functional outcomes. In contrast, the model that combined imaging with tabular data, including physiological and laboratory results, demonstrated the best predictive performance for in-hospital mortality. These models exhibited high discriminative performance, with areas under the receiver operating characteristic curve (AUROCs) of 0.86 (95% CI: 0.79-0.92) and 0.91 (95% CI: 0.84-0.96) for poor functional outcomes and in-hospital mortality, respectively. Calibration was satisfactory for predicting poor functional outcomes but requires refinement for mortality prediction. The models performed similarly to or better than conventional risk scores, and DCA curves supported their clinical utility. CONCLUSION Multimodal prediction models have the potential to aid non-specialists in making informed decisions regarding ICH cases in emergency departments as part of clinical decision support systems. Enhancing real-world data infrastructure and improving model calibration are essential for successful implementation in clinical practice.
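The decision curve analysis mentioned in this abstract rests on the standard net benefit quantity, TP/n - (FP/n) x pt/(1 - pt), evaluated at a risk threshold pt. The sketch below illustrates that formula only; the function names and the toy data are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy for a decision curve: flag every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


if __name__ == "__main__":
    # Toy outcomes and predicted risks (illustrative values only).
    outcomes = [1, 1, 0, 0]
    risks = [0.9, 0.4, 0.6, 0.1]
    print(net_benefit(outcomes, risks, 0.2))
    print(treat_all_net_benefit(outcomes, 0.2))
```

A decision curve plots `net_benefit` over a range of thresholds and compares it with the treat-all and treat-none (net benefit 0) strategies.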
Affiliation(s)
- Koutarou Matsumoto
- Department of Health Care Administration and Management, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan; Institute for Medical Information Research and Analysis, Saiseikai Kumamoto Hospital, Kumamoto, Japan.
- Masahiro Suzuki
- Graduate Degree Program of Applied Data Sciences, Sophia University, Tokyo, Japan
- Kazuaki Ishihara
- Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
- Koki Tokunaga
- Department of Pharmacy, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Katsuhiko Matsuda
- Department of Radiology, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Jenhui Chen
- Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
- Shigeo Yamashiro
- Division of Neurosurgery, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Hidehisa Soejima
- Institute for Medical Information Research and Analysis, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Naoki Nakashima
- Medical Information Center, Kyushu University Hospital, Fukuoka, Japan; Department of Medical Informatics, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Masahiro Kamouchi
- Department of Health Care Administration and Management, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan; Center for Cohort Studies, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
|
11
|
Kandaswamy S, Knake LA, Dziorny AC, Hernandez SM, McCoy AB, Hess LM, Orenstein E, White MS, Kirkendall ES, Molloy MJ, Hagedorn PA, Muthu N, Murugan A, Beus JM, Mai M, Luo B, Chaparro JD. Pediatric Predictive Artificial Intelligence Implemented in Clinical Practice from 2010 to 2021: A Systematic Review. Appl Clin Inform 2025; 16:477-487. [PMID: 39837545 PMCID: PMC12119141 DOI: 10.1055/a-2521-1508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Accepted: 01/20/2025] [Indexed: 01/23/2025] Open
Abstract
To review pediatric artificial intelligence (AI) implementation studies from 2010 to 2021 and analyze reported performance measures. We searched PubMed/Medline, Embase, CINAHL, Cochrane Library CENTRAL, IEEE, and Web of Science with controlled vocabulary. Inclusion criteria: AI intervention in a pediatric clinical setting that learns from data (i.e., data-driven, as opposed to rule-based) and takes actions to make patient-specific recommendations; published between 01/2010 and 10/2021; must have agency (AI must provide guidance that affects clinical care, not merely run in the background). We extracted study characteristics, target users, implementation setting, time span, and performance measures. Of 126 articles reviewed as full text, 17 met inclusion criteria. Eight studies (47%) reported both clinical outcomes and process measures, six (35%) reported only process measures, and two (12%) reported only clinical outcomes. Five studies (30%) reported no difference in clinical outcomes with AI, four (24%) reported improvement in clinical outcomes compared with controls, two (12%) reported positive effects on clinical outcomes with use of AI but had no formal comparison or controls, and one (6%) reported poor clinical outcomes with AI. Twelve studies (71%) reported improvement in process measures, while two (12%) reported no improvement. Five studies (30%) reported on at least one human performance measure. While there are many published pediatric AI models, the number of AI implementations is minimal, with no standardized reporting of outcomes, care processes, or human performance measures. More comprehensive evaluations will help elucidate mechanisms of impact.
Affiliation(s)
- Swaminathan Kandaswamy
- Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, United States
- Lindsey A. Knake
- Division of Neonatology, Department of Pediatrics, University of Iowa, Iowa City, Iowa, United States
- Adam C. Dziorny
- Department of Pediatrics, University of Rochester, Rochester, New York, United States
- Sean M. Hernandez
- Primary Care, Miami Veteran's Affairs, Miami, Florida, United States
- Allison B. McCoy
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Lauren M. Hess
- Pediatric Hospital Medicine, Texas Children's Hospital, Houston, Texas, United States
- Division of Pediatric Hospital Medicine, Department of Pediatrics, Baylor College of Medicine, Houston, Texas, United States
- Evan Orenstein
- Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, United States
- Division of Hospital Medicine, Children's Healthcare of Atlanta, Atlanta, Georgia, United States
- Mia S. White
- Woodruff Health Sciences Center Library, Emory University, Atlanta, Georgia, United States
- Eric S. Kirkendall
- Department of Pediatrics, Wake Forest University School of Medicine, Center for Healthcare Innovation, Winston-Salem, North Carolina, United States
- Matthew J. Molloy
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center and the University of Cincinnati College of Medicine, Cincinnati, Ohio, United States
- Philip A. Hagedorn
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center and the University of Cincinnati College of Medicine, Cincinnati, Ohio, United States
- Naveen Muthu
- Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, United States
- Division of Hospital Medicine, Children's Healthcare of Atlanta, Atlanta, Georgia, United States
- Avinash Murugan
- Department of Internal Medicine, Yale New Haven Hospital, New Haven, Connecticut, United States
- Jonathan M. Beus
- Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, United States
- Division of Hospital Medicine, Children's Healthcare of Atlanta, Atlanta, Georgia, United States
- Mark Mai
- Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, United States
- Division of Hospital Medicine, Children's Healthcare of Atlanta, Atlanta, Georgia, United States
- Brooke Luo
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States
- Department of Pediatrics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, United States
- Juan D. Chaparro
- Division of Clinical Informatics, Department of Pediatrics, Nationwide Children's Hospital/Ohio State University College of Medicine, Columbus, Ohio, United States
|
12
|
Tieliwaerdi X, Manalo K, Abuduweili A, Khan S, Appiah-Kubi E, Williams BA, Oehler AC. Machine Learning-Based Prediction Models for Healthcare Outcomes in Patients Participating in Cardiac Rehabilitation: A Systematic Review. J Cardiopulm Rehabil Prev 2025:01273116-990000000-00203. [PMID: 40257822 DOI: 10.1097/hcr.0000000000000943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2025]
Abstract
PURPOSE Cardiac rehabilitation (CR) has been proven to reduce mortality and morbidity in patients with cardiovascular disease. Machine learning (ML) techniques are increasingly used to predict healthcare outcomes in various fields of medicine, including CR. This systematic review aims to critically appraise existing ML-based prognostic prediction models within CR and identify key research gaps in this area. REVIEW METHODS A systematic literature search was conducted in Scopus, PubMed, Web of Science, and Google Scholar from the inception of each database to January 28, 2024. The data extracted included clinical features, predicted outcomes, model development, and validation as well as model performance metrics. Included studies underwent quality assessments using the IJMEDI and Prediction Model Risk of Bias Assessment Tool checklists. SUMMARY A total of 22 ML-based clinical models from 7 studies across multiple phases of CR were included. Most models were developed using small patient cohorts of 41 to 227 patients, with one exception involving 2280 patients. The prediction objectives ranged from patient intention to initiate CR to graduation from outpatient CR, along with interval physiological and psychological progression in CR. The best-performing ML models reported areas under the receiver operating characteristic curve between 0.82 and 0.91, with sensitivity from 0.77 to 0.95, indicating good predictive capability. However, none of them underwent calibration or external validation. Most studies raised concerns about bias. The readiness of these models for implementation in practice is questionable. External validation of existing models and development of new models with robust methodology based on larger populations and targeting diverse clinical outcomes in CR are needed.
Affiliation(s)
- Xiarepati Tieliwaerdi
- Author Affiliations: Department of Medicine, Allegheny Health Network, Pittsburgh, Pennsylvania (Drs Tieliwaerdi, Manalo, Khan, and Appiah-Kubi); Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania (Dr Abuduweili); and Allegheny Health Network, Allegheny Health Network Cardiovascular Institute, Pittsburgh, Pennsylvania (Drs Williams and Oehler)
|
13
|
Miller HA, Valdes R. Rigorous validation of machine learning in laboratory medicine: guidance toward quality improvement. Crit Rev Clin Lab Sci 2025:1-20. [PMID: 40247648 DOI: 10.1080/10408363.2025.2488842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 01/20/2025] [Accepted: 03/31/2025] [Indexed: 04/19/2025]
Abstract
The application of artificial intelligence (AI) in laboratory medicine will revolutionize predictive modeling using clinical laboratory information. Machine learning (ML), a sub-discipline of AI, involves fitting algorithms to datasets and is broadly used for data-driven predictive modeling in various disciplines. The majority of ML studies reported in systematic reviews lack key aspects of quality assurance. In clinical laboratory medicine, it is important to consider how differences in analytical methodologies, assay calibration, harmonization, pre-analytical errors, interferences, and physiological factors affecting measured analyte concentrations may also affect the downstream robustness and reliability of ML models. In this article, we address the need for quality improvement and proper validation of ML classification models, with the goal of bringing attention to key concepts pertinent to researchers, manuscript reviewers, and journal editors within the field of pathology and laboratory medicine. Several existing predictive modeling guidelines and recommendations can be readily adapted to the development of ML models in laboratory medicine. We summarize a basic overview of ML and key points from current guidelines including advantages and pitfalls of applied ML. In addition, we draw a parallel between validation of clinical assays and ML models in the context of current regulatory frameworks. The importance of classification performance metrics, model explainability, and data quality along with recommendations for strengthening journal submission requirements are also discussed. Although the focus of this article is on the application of ML in laboratory medicine, many of these concepts extend into other areas of medicine and biomedical science as well.
Affiliation(s)
- Hunter A Miller
- Department of Pathology and Laboratory Medicine, University of Louisville, Louisville, KY, USA
- Roland Valdes
- Department of Pathology and Laboratory Medicine, University of Louisville, Louisville, KY, USA
|
14
|
Li S, Wang J, Zhang Z, Ren C, He D. Individual risk and prognostic value prediction by interpretable machine learning for distant metastasis in neuroblastoma: A population-based study and an external validation. Int J Med Inform 2025; 196:105813. [PMID: 39904180 DOI: 10.1016/j.ijmedinf.2025.105813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 12/27/2024] [Accepted: 01/23/2025] [Indexed: 02/06/2025]
Abstract
PURPOSE Neuroblastoma (NB) is a childhood malignancy with a poor prognosis and a propensity for distant metastasis (DM). We aimed to establish a machine learning (ML)-based model to accurately predict the risk of DM and the prognosis of NB patients with DM. METHODS We analyzed NB patients from the Surveillance, Epidemiology, and End Results (SEER) database between 2000 and 2020. Univariate and multivariate logistic analyses were employed to select meaningful variables. The Recursive Feature Elimination (RFE) method based on 6 ML algorithms was utilized for feature selection. To construct the predictive model, 13 ML algorithms were evaluated by area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, precision, cross-entropy, Brier score, balanced accuracy, and F-beta score. An optimal ML model was constructed to predict DM, and the predictive results were explained by the SHapley Additive exPlanations (SHAP) framework. Meanwhile, 101 ML algorithm combinations were developed to select the best model with the highest C-index to predict the prognosis of NB patients with DM. RESULTS A total of 1,668 NB patients from the SEER database were consecutively enrolled. We identified tumor primary site, grade, surgery type, regional lymph nodes, radiotherapy, and chemotherapy as significant risk factors for DM. The CatBoost model was selected as the best prediction model, with AUCs of 0.846 (95% CI: [0.804, 0.899]), 0.834 (95% CI: [0.796, 0.873]), and 0.813 (95% CI: [0.776, 0.852]) in the training, internal test, and external test sets, and with 0.777 accuracy, 0.839 sensitivity, 0.72 specificity, and 0.731 precision in the training set. Grade, chemotherapy, and radiotherapy had the greatest effects on DM according to SHAP results. For prognosis prediction, the "RSF + GBM" algorithm was the best prognostic model, with C-indices of 0.656, 0.611, and 0.629 in the training, internal test, and external test sets. CONCLUSIONS Our ML models demonstrate excellent accuracy and reliability, offering more precise personalized metastasis diagnosis and prognostic prediction to NB patients.
Affiliation(s)
- Shan Li
- Department of Urology, Children's Hospital of Chongqing Medical University, Chongqing 400014, China; Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing 400014, China; China International Science and Technology Cooperation base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Children's Hospital of Chongqing Medical University, Chongqing 400014, China
- Jinkui Wang
- Department of Urology, Children's Hospital of Chongqing Medical University, Chongqing 400014, China; Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing 400014, China; China International Science and Technology Cooperation base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Children's Hospital of Chongqing Medical University, Chongqing 400014, China
- Zhaoxia Zhang
- Department of Urology, Children's Hospital of Chongqing Medical University, Chongqing 400014, China; Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing 400014, China; China International Science and Technology Cooperation base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Children's Hospital of Chongqing Medical University, Chongqing 400014, China
- Chunnian Ren
- Department of Urology, Children's Hospital of Chongqing Medical University, Chongqing 400014, China; Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing 400014, China; China International Science and Technology Cooperation base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Children's Hospital of Chongqing Medical University, Chongqing 400014, China
- Dawei He
- Department of Urology, Children's Hospital of Chongqing Medical University, Chongqing 400014, China; Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing 400014, China; China International Science and Technology Cooperation base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Children's Hospital of Chongqing Medical University, Chongqing 400014, China.
|
15
|
Kierner S, Kierner P, Kucharski J. Combining machine learning models and rule engines in clinical decision systems: Exploring optimal aggregation methods for vaccine hesitancy prediction. Comput Biol Med 2025; 188:109749. [PMID: 39983355 DOI: 10.1016/j.compbiomed.2025.109749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Revised: 01/21/2025] [Accepted: 01/22/2025] [Indexed: 02/23/2025]
Abstract
BACKGROUND With the increasing application of artificial intelligence (AI) technologies in the healthcare sector and the emergence of new solutions, such as large language models, there is a growing need to combine medical knowledge, often expressed as clinical rules, with advances in machine learning (ML) that offer higher prediction accuracy at the expense of decision-making transparency. PURPOSE This study investigates the efficacy of various aggregation methods combining the decisions of an AI model and a clinical rule-based (RB) engine in predicting vaccine hesitancy to maximize the effectiveness of patient incentive programs. This is the first study of a parallel ensemble of rules and machine learning in a clinical context, proposing an RB confidence-led fusion of ML and RB inference. METHODS A clinical decision system for predicting hesitation to vaccinate was developed based on a differentially private set of longitudinal health records of 974,000 US patients and clinical rules obtained from the existing literature. Various approaches based on possibility theory were explored to maximize classification accuracy, capture, and hurdle rates while ensuring trustworthiness in clinical interventions. RESULTS Our findings reveal that the hybrid approach outperforms the individual models and RB systems when transparency and accuracy are critical. An RB confidence-led approach emerged as the most effective method. The aggregation of mismatched classes relies on RB results when the RB engine has high confidence (expressed as more than the median degree of membership to the vaccination hesitation output function) and on ML predictions when the RB engine exhibits lower confidence. CONCLUSIONS Implementing such an aggregation method preserves the accuracy and capture rates of a clinical decision system, while potentially improving acceptance among healthcare providers.
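The RB confidence-led aggregation described in the RESULTS can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names are hypothetical, the RB confidence is assumed to be a single membership degree per patient, and the median is assumed to be taken over the batch being scored.

```python
from statistics import median


def fuse_decision(ml_label, rb_label, rb_confidence, threshold):
    """Combine one ML prediction with one rule-based (RB) decision."""
    # When both sources agree there is nothing to arbitrate.
    if ml_label == rb_label:
        return ml_label
    # Mismatched classes: trust the rules only when their confidence
    # exceeds the threshold, otherwise fall back to the ML prediction.
    return rb_label if rb_confidence > threshold else ml_label


def fuse_batch(ml_labels, rb_labels, rb_confidences):
    """Apply the rule above with the median RB confidence as threshold."""
    # Assumption: the median is computed over this batch of patients.
    threshold = median(rb_confidences)
    return [
        fuse_decision(m, r, c, threshold)
        for m, r, c in zip(ml_labels, rb_labels, rb_confidences)
    ]


if __name__ == "__main__":
    ml = [1, 0, 1, 0]            # ML predictions (1 = hesitant)
    rb = [1, 1, 0, 0]            # rule-engine decisions
    conf = [0.9, 0.8, 0.2, 0.1]  # RB membership degrees (toy values)
    print(fuse_batch(ml, rb, conf))
```

In the toy batch, the second patient follows the rule engine (high RB confidence) and the third follows the ML model (low RB confidence), which is the behavior the abstract attributes to the confidence-led method.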
Affiliation(s)
- Slawomir Kierner
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
- Piotr Kierner
- Department of Genetics - Blavatnik Institute, Sinclair Lab, Harvard Medical School, D 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Jacek Kucharski
- Faculty of Electrical, Electronic, Computer and Control Engineering, Lodz University of Technology, 18/22 Stefanowskiego St., Łodź 90-924, Poland
|
16
|
Luo Q, Zhang Q, Liu H, Chen X, Yang S, Xu Q. Time-dependent interpretable survival prediction model for second primary NSCLC patients. Int J Med Inform 2025; 195:105771. [PMID: 39721115 DOI: 10.1016/j.ijmedinf.2024.105771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 11/23/2024] [Accepted: 12/19/2024] [Indexed: 12/28/2024]
Abstract
OBJECTIVE Accurate predictive models for second primary non-small cell lung cancer (SP-NSCLC) are limited. This study aimed to develop and validate overall survival (OS) prediction models for SP-NSCLC patients using time-dependent interpretable survival machine learning algorithms. METHODS This study utilized data from the Surveillance, Epidemiology, and End Results (SEER) database, encompassing 8 and 12 registries, to extract data on patients aged 20-89 diagnosed with SP-NSCLC between 1988 and 2020. The dataset was divided into development, external temporal validation, and external spatial validation cohorts. Predictors included demographic, clinical, pathological, and initial primary cancer-related features. Multiple survival machine learning algorithms were developed and validated, with model performance assessed using the C-index, time-dependent area under the receiver operating characteristic curve (time-AUC), and time-dependent Brier score. Time-dependent interpretability analysis was employed to explore the time-dependent feature importance of key predictors. RESULTS The Blackboost model demonstrated excellent performance (C-index: 0.7517; time-AUC: 0.8438) and good calibration (time-dependent Brier score: 0.0754). External validations and subgroup analyses demonstrated the model's robustness, generalizability, and fairness. Utilizing the optimal cutoff threshold, high-risk groups could be effectively identified. Surgery was the most critical predictor across the entire survival period. Combined stage (distant) and chemotherapy were the second most important predictors within 0 to 5 years, while age replaced them from 5 to 20 years. Additionally, we developed an online visualization tool. CONCLUSIONS The Blackboost survival model achieved accurate, fair, and robust survival prediction for SP-NSCLC patients. Surgery, combined stage (distant), chemotherapy, and age contributed differently across various survival periods. The online visualization tool facilitated personalized survival predictions.
Affiliation(s)
- Qiong Luo
- Department of Oncology Medicine, Fujian Medical University Union Hospital, Fuzhou, 350001, PR China
- Qianyuan Zhang
- Department of General Medicine, Fujian Medical University Union Hospital, Fuzhou, 350001, PR China
- Haiyu Liu
- Department of Pulmonary and Critical Care Medicine, Fujian Medical University Union Hospital, Fuzhou 350001, PR China
- Xiangqi Chen
- Department of Pulmonary and Critical Care Medicine, Fujian Medical University Union Hospital, Fuzhou 350001, PR China.
- Sheng Yang
- Department of Oncology Medicine, Fujian Medical University Union Hospital, Fuzhou, 350001, PR China.
- Qian Xu
- Department of Oncology Medicine, Fujian Medical University Union Hospital, Fuzhou, 350001, PR China.
|
17
|
Mo D, Xiong S, Ji T, Zhou Q, Zheng Q. Predicting abnormal C-reactive protein level for improving utilization by deep neural network model. Int J Med Inform 2025; 195:105726. [PMID: 39612701 DOI: 10.1016/j.ijmedinf.2024.105726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 10/29/2024] [Accepted: 11/25/2024] [Indexed: 12/01/2024]
Abstract
BACKGROUND C-reactive protein (CRP) is an inflammatory biomarker frequently used in clinical practice. However, insufficient evidence-based ordering inevitably results in its overuse or underuse. This study aims to predict normal and abnormal CRP levels using deep neural network (DNN) models, helping clinicians order this test more appropriately and intelligently. METHODS We used complete blood count (CBC) parameters as feature vectors and 10 mg/L as the cutoff value for CRP. Several models, including linear support vector classification, logistic regression, decision trees, random forests, and DNN, were developed on a dataset of 53834 medical records to predict the binary output. We externally validated the DNN models on 20723 independent samples through discrimination, calibration curves, and decision curve analysis. RESULTS The DNN models had the best area under the receiver operating characteristic curve (AUC). Learning curves revealed that the models' AUC, balanced accuracy, and F1 score did not improve significantly or continuously with increasing data volume. In internal validation, the AUC, balanced accuracy, and F1 score of 10 models were 0.818 (0.95 CI: 0.812-0.824), 0.741 (0.95 CI: 0.736-0.747), and 0.649 (0.95 CI: 0.643-0.656), respectively. In external validation, these metrics were 0.817 (0.95 CI: 0.816-0.817), 0.741 (0.95 CI: 0.740-0.742), and 0.641 (0.95 CI: 0.640-0.642), respectively. AUC and balanced accuracy showed no significant difference (P-values were 0.106 and 0.339). The CRP10-C2 model had the lowest Brier score (0.154), an AUC of 0.818, and a calibration curve formula of y=1.001x-0.010, and was identified as the target model to deploy in the app. CONCLUSIONS DNN models achieved moderate performance, surpassing baseline indices in distinguishing binary CRP levels. They generalize well and are well calibrated. The CRP-C2 model can enhance CRP utilization by informing orders appropriately and can contribute to inflammatory diagnostics in primary health care where CBC is available but the CRP test is inaccessible.
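Two of the quantities reported in this abstract, the Brier score and balanced accuracy, have simple standard definitions that the sketch below spells out, alongside the 10 mg/L binarization. The helper names and toy values are hypothetical, and treating values strictly above the cutoff as abnormal is an assumption; the abstract does not say how a value of exactly 10 mg/L is handled.

```python
CRP_CUTOFF_MG_L = 10.0  # cutoff stated in the abstract


def is_abnormal(crp_mg_l):
    """Binarize a CRP value; assumes 'abnormal' means strictly above the cutoff."""
    return int(crp_mg_l > CRP_CUTOFF_MG_L)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity; needs both classes present."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    labels = [is_abnormal(v) for v in (12.5, 3.0, 48.0)]  # toy CRP values
    print(labels)
    print(brier_score(labels, [0.9, 0.2, 0.6]))
    print(balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

A lower Brier score indicates better-calibrated and sharper probabilities, which is why the abstract uses it to pick the model for deployment.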
Affiliation(s)
- Donghua Mo
- Clinical Laboratory Medicine Department, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Shilong Xiong
- Clinical Laboratory Medicine Department, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Tianxing Ji
- Clinical Laboratory Medicine Department, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Qiang Zhou
- Clinical Laboratory Medicine Department, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Qian Zheng
- Department of Cardiovascular, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.
|
18
|
Quennelle S, Malekzadeh-Milani S, Garcelon N, Faour H, Burgun A, Faviez C, Tsopra R, Bonnet D, Neuraz A. Active learning for extracting rare adverse events from electronic health records: A study in pediatric cardiology. Int J Med Inform 2025; 195:105761. [PMID: 39689449 DOI: 10.1016/j.ijmedinf.2024.105761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 12/04/2024] [Accepted: 12/10/2024] [Indexed: 12/19/2024]
Abstract
OBJECTIVE Automate the extraction of adverse events from the text of electronic medical records of patients hospitalized for cardiac catheterization. METHODS We focused on events related to cardiac catheterization as defined by the NCDR-IMPACT registry. These events were extracted from the Necker Children's Hospital data warehouse. Electronic health records were pre-screened using regular expressions. The resulting datasets contained numerous false-positive sentences, which were annotated by a cardiologist using an active learning process. A deep learning text classifier was then trained on this active learning-annotated dataset to accurately identify patients who had suffered a serious adverse event. RESULTS The dataset included 2,980 patients. Regular expression-based extraction of adverse events related to cardiac catheterization achieved perfect recall. Due to the rarity of adverse events, the dataset obtained from this initial pre-screening step was imbalanced, containing a significant number of false positives. The active learning annotation enabled the acquisition of a representative dataset suitable for training a deep learning model. The deep learning text classifier identified patients who experienced adverse events after cardiac catheterization with a recall of 0.78 and a specificity of 0.94. CONCLUSION Our model effectively identified patients who experienced adverse events related to cardiac catheterization using real clinical data. Enabled by an active learning annotation process, it shows promise for large language model applications in clinical research, especially for rare diseases with limited annotated databases. Our model's strength lies in its development by physicians for physicians, ensuring its relevance and applicability in clinical practice.
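The pre-screening step described above favors recall over precision, which is why many retained sentences are false positives (for example, negated mentions). A minimal sketch of that idea, plus the recall and specificity metrics quoted in the RESULTS, might look like this. The keyword pattern, sentences, and function names are hypothetical English stand-ins; the study's actual expressions and data are not given in the abstract.

```python
import re

# Hypothetical high-recall keyword pattern; not the expressions used in the study.
ADVERSE_EVENT_PATTERN = re.compile(
    r"\b(tamponade|arrhythmia|cardiac arrest|embolization)\b",
    re.IGNORECASE,
)


def prescreen(sentences):
    """Keep every sentence that mentions a candidate adverse-event term."""
    return [s for s in sentences if ADVERSE_EVENT_PATTERN.search(s)]


def recall_and_specificity(y_true, y_pred):
    """Recall = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    notes = [
        "No tamponade was observed.",                  # negated: a false positive
        "Patient discharged home.",
        "Transient arrhythmia during the procedure.",
    ]
    print(prescreen(notes))
    print(recall_and_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

The negated first sentence passes the filter, illustrating why a downstream classifier trained on expert annotations is needed to separate true events from keyword matches.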
Collapse
Affiliation(s)
- Sophie Quennelle
- Inserm, UMR_S1138, Centre de Recherche des Cordeliers, Sorbonne Université, Paris, France; Inria, équipe HeKA, PariSantéCampus, Paris, France; M3C-Necker, Hôpital Universitaire Necker-Enfants malades, Assistance Publique-Hôpitaux de Paris, Paris, France; Université Paris Cité, Paris, France.
| | - Sophie Malekzadeh-Milani
- M3C-Necker, Hôpital Universitaire Necker-Enfants malades, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Nicolas Garcelon
- Inserm, UMR_S1138, Centre de Recherche des Cordeliers, Sorbonne Université, Paris, France; Data Science Platform, Imagine Institute, Université Paris Cité, Paris, France
| | - Hassan Faour
- Data Science Platform, Imagine Institute, Université Paris Cité, Paris, France
| | - Anita Burgun
- Inserm, UMR_S1138, Centre de Recherche des Cordeliers, Sorbonne Université, Paris, France; Inria, équipe HeKA, PariSantéCampus, Paris, France; Université Paris Cité, Paris, France; Service d'informatique biomédicale, Hôpital Necker Enfants Malades, Assistance Publique-Hôpitaux de Paris, F-75015 Paris, France
| | - Carole Faviez
- Inserm, UMR_S1138, Centre de Recherche des Cordeliers, Sorbonne Université, Paris, France; Inria, équipe HeKA, PariSantéCampus, Paris, France; Université Paris Cité, Paris, France
| | - Rosy Tsopra
- Inserm, UMR_S1138, Centre de Recherche des Cordeliers, Sorbonne Université, Paris, France; Inria, équipe HeKA, PariSantéCampus, Paris, France; Université Paris Cité, Paris, France; Service d'informatique biomédicale, Hôpital Necker Enfants Malades, Assistance Publique-Hôpitaux de Paris, F-75015 Paris, France
| | - Damien Bonnet
- M3C-Necker, Hôpital Universitaire Necker-Enfants malades, Assistance Publique-Hôpitaux de Paris, Paris, France; Université Paris Cité, Paris, France
- Antoine Neuraz
- Inserm, UMR_S1138, Centre de Recherche des Cordeliers, Sorbonne Université, Paris, France; Inria, équipe HeKA, PariSantéCampus, Paris, France; Service d'informatique biomédicale, Hôpital Necker Enfants Malades, Assistance Publique-Hôpitaux de Paris, F-75015 Paris, France
|
19
|
Campagner A, Agnello L, Carobene A, Padoan A, Del Ben F, Locatelli M, Plebani M, Ognibene A, Lorubbio M, De Vecchi E, Cortegiani A, Piva E, Poz D, Curcio F, Cabitza F, Ciaccio M. Complete Blood Count and Monocyte Distribution Width-Based Machine Learning Algorithms for Sepsis Detection: Multicentric Development and External Validation Study. J Med Internet Res 2025; 27:e55492. [PMID: 40009841 PMCID: PMC11904381 DOI: 10.2196/55492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 05/04/2024] [Accepted: 09/09/2024] [Indexed: 02/28/2025] Open
Abstract
BACKGROUND Sepsis is an organ dysfunction caused by a dysregulated host response to infection. Early detection is fundamental to improving the patient outcome. Laboratory medicine can play a crucial role by providing biomarkers whose alteration can be detected before the onset of clinical signs and symptoms. In particular, the relevance of monocyte distribution width (MDW) as a sepsis biomarker has emerged in the previous decade. However, despite encouraging results, MDW has poor sensitivity and positive predictive value when compared to other biomarkers. OBJECTIVE This study aims to investigate the use of machine learning (ML) to overcome these limitations by combining different parameters and thereby improving sepsis detection. However, making ML models function in clinical practice may be problematic, as their performance may suffer when deployed in contexts other than the research environment. In fact, even widely used commercially available models have been demonstrated to generalize poorly in out-of-distribution scenarios. METHODS In this multicentric study, we developed ML models whose intended use is the early detection of sepsis on the basis of MDW and complete blood count parameters. In total, data from 6 patient cohorts (encompassing 5344 patients) collected at 5 different Italian hospitals were used to train and externally validate ML models. The models were trained on a cohort of patients enrolled at the emergency department and were externally validated on 5 different cohorts of patients enrolled at both the emergency department and the intensive care unit. The cohorts were selected to exhibit a variety of data distribution shifts compared to the training set, including label, covariate, and missing data shifts, enabling a conservative validation of the developed models. To improve generalizability and robustness to different types of distribution shifts, the developed ML models combine traditional methodologies with advanced techniques inspired by controllable artificial intelligence (AI), namely cautious classification, which gives the ML models the ability to abstain from making predictions, and explainable AI, which provides health operators with useful information about the models' functioning. RESULTS The developed models achieved good performance on the internal validation (area under the receiver operating characteristic curve between 0.91 and 0.98), as well as consistent generalization performance across the external validation datasets (area under the receiver operating characteristic curve between 0.75 and 0.95), outperforming baseline biomarkers and state-of-the-art ML models for sepsis detection. Controllable AI techniques were further able to improve performance and were used to derive an interpretable set of diagnostic rules. CONCLUSIONS Our findings demonstrate how controllable AI approaches based on complete blood count and MDW may be used for the early detection of sepsis while also demonstrating how the proposed methodology can be used to develop ML models that are more resistant to different types of data distribution shifts.
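Cautious classification, as described above, lets a model abstain instead of committing to a prediction. One common way to realize this, shown here as a minimal sketch and not as the authors' method, is to abstain when the predicted probability falls inside an uncertainty band. The thresholds, function names, and example values are assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def cautious_predict(prob: float, low: float = 0.3, high: float = 0.7) -> Optional[int]:
    """Return 1 or 0 when the probability is decisive, or None to abstain.

    The band (low, high) is a hypothetical choice, not a value from the study.
    """
    if prob >= high:
        return 1
    if prob <= low:
        return 0
    return None


def coverage_and_accuracy(
    probs: Sequence[float], labels: Sequence[int], low: float = 0.3, high: float = 0.7
) -> Tuple[float, float]:
    """Coverage is the fraction of cases answered; accuracy is computed on those only."""
    decisions: List[Optional[int]] = [cautious_predict(p, low, high) for p in probs]
    answered = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(answered) / len(probs)
    accuracy = sum(d == y for d, y in answered) / len(answered) if answered else float("nan")
    return coverage, accuracy


# Invented probabilities and labels for illustration.
probs = [0.9, 0.5, 0.1, 0.8]
labels = [1, 1, 0, 0]
coverage, accuracy = coverage_and_accuracy(probs, labels)
```

Widening the band lowers coverage and typically raises accuracy on the answered cases, so the two are best reported together.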
Affiliation(s)
- Anna Carobene
- IRCCS San Raffaele Scientific Institute, Milano, Italy
- Andrea Padoan
- Department of Medicine, University of Padova, Padova, Italy
- Laboratory Medicine Unit, University-Hospital of Padova, Padova, Italy
- Fabio Del Ben
- IRCCS Centro Di Riferimento Oncologico Aviano, Aviano, Italy
- Mario Plebani
- Department of Medicine, University of Padova, Padova, Italy
- Laboratory Medicine Unit, University-Hospital of Padova, Padova, Italy
- Andrea Cortegiani
- University of Palermo, Palermo, Italy
- University Hospital Policlinico Paolo Giaccone, Palermo, Italy
- Elisa Piva
- Azienda Socio Sanitaria Territoriale di Mantova, Mantova, Italy
- Federico Cabitza
- IRCCS Ospedale Galeazzi Sant'Ambrogio, Milan, Italy
- Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milano, Italy
- Marcello Ciaccio
- University of Palermo, Palermo, Italy
- University Hospital Policlinico Paolo Giaccone, Palermo, Italy
|
20
|
Clark SL, Hartwell EE, Choi DS, Krystal JH, Messing RO, Ferguson LB. Next-generation biomarkers for alcohol consumption and alcohol use disorder diagnosis, prognosis, and treatment: A critical review. ALCOHOL, CLINICAL & EXPERIMENTAL RESEARCH 2025; 49:5-24. [PMID: 39532676 PMCID: PMC11747793 DOI: 10.1111/acer.15476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 10/04/2024] [Accepted: 10/14/2024] [Indexed: 11/16/2024]
Abstract
This critical review summarizes the current state of omics-based biomarkers in the alcohol research field. We first provide definitions and background information on alcohol and alcohol use disorder (AUD), biomarkers, and "omic" technologies. We next summarize the use of (1) genetic information as risk/prognostic biomarkers for the onset of alcohol-related problems and the progression from regular drinking to problematic drinking (including AUD), (2) epigenetic information as diagnostic biomarkers for AUD and risk biomarkers for alcohol consumption, (3) transcriptomic information as diagnostic biomarkers for AUD and risk biomarkers for alcohol consumption, and (4) metabolomic information as diagnostic biomarkers for AUD, risk biomarkers for alcohol consumption, and predictive biomarkers for response to acamprosate in subjects with AUD. In the final section, the clinical implications of the findings are discussed, and recommendations are made for future research.
Affiliation(s)
- Shaunna L. Clark
- Department of Psychiatry & Behavioral Sciences, Texas A&M University, College Station, TX, USA
- Emily E. Hartwell
- Mental Illness Research, Education and Clinical Center, Crescenz Veterans Affairs Medical Center, Philadelphia, PA, USA
- Center for Studies of Addiction, Department of Psychiatry, Perelman School of Medicine of the University of Pennsylvania, Philadelphia, PA, USA
- Doo-Sup Choi
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine and Science, Rochester, MN, USA
- Department of Psychiatry and Psychology, Mayo Clinic College of Medicine and Science, Rochester, MN, USA
- Neuroscience Program, Mayo Clinic College of Medicine and Science, Rochester, MN, USA
- John H. Krystal
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Robert O. Messing
- Waggoner Center for Alcohol and Addiction Research, University of Texas at Austin, Austin, Texas, USA
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, Texas, USA
- Department of Neuroscience, University of Texas at Austin, Austin, Texas, USA
- Laura B. Ferguson
- Waggoner Center for Alcohol and Addiction Research, University of Texas at Austin, Austin, Texas, USA
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, Texas, USA
- Department of Neuroscience, University of Texas at Austin, Austin, Texas, USA
|
21
|
Zhong W, Wang C, Wang J, Chen T. Machine learning models to further identify advantaged populations that can achieve functional cure of chronic hepatitis B virus infection after receiving Peg-IFN alpha treatment. Int J Med Inform 2025; 193:105660. [PMID: 39454328 DOI: 10.1016/j.ijmedinf.2024.105660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 10/12/2024] [Accepted: 10/18/2024] [Indexed: 10/28/2024]
Abstract
OBJECTIVE Functional cure is currently the highest goal of hepatitis B virus (HBV) treatment. Pegylated interferon (Peg-IFN) alpha is an important drug for this purpose, but even in the hepatitis B e antigen (HBeAg)-negative population, a portion of patients respond poorly to it. Therefore, it is important to explore the factors affecting the response rate to Peg-IFN alpha and to establish a prediction model to further identify advantaged populations. METHODS We retrospectively analyzed 382 patients: 297 patients were in the training set and 85 patients from another hospital were in the test set. The intersecting features were extracted from all variables using the recursive feature elimination (RFE) algorithm, the Boruta algorithm, and the least absolute shrinkage and selection operator (LASSO) regression algorithm in the training dataset. Then, we employed six machine learning (ML) algorithms, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machines (SVM), K Nearest Neighbors (KNN), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost), to develop the model. Internal 10-fold cross-validation helped determine the best-performing model, which was then tested externally. Model performance was assessed using the area under the curve (AUC) and other metrics. SHapley Additive exPlanations (SHAP) plots were used to interpret variable significance. RESULTS 138/382 (36.13%) patients achieved functional cure. HBsAg at baseline, HBsAg decline at week 12, non-alcoholic fatty liver disease (NAFLD), and age were identified as significant variables. RF performed the best, with an AUC value of 0.988, and maintained good performance in the test set. The SHAP plot highlighted HBsAg at baseline and HBsAg decline at week 12 as the top two predictors. A web calculator was designed to predict functional cure more conveniently (https://www.xsmartanalysis.com/model/list/predict/model/html?mid = 17054&symbol = 317ad245Hx628ko3uW51). CONCLUSION We developed a prediction model that can be used not only to accurately identify advantaged populations for Peg-IFN alpha but also to determine whether to continue subsequent Peg-IFN alpha treatment.
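The feature selection step above keeps only the variables chosen by all three selectors. Assuming each selector returns a set of variable names, the consensus is a plain set intersection, as in this minimal sketch. The per-method sets below are hypothetical; the abstract reports only the final significant variables, and the extra names (`alt`, `sex`) are invented for illustration.

```python
# Hypothetical outputs of three feature selectors (not taken from the study).
selected = {
    "rfe": {"hbsag_baseline", "hbsag_decline_week12", "nafld", "age", "alt"},
    "boruta": {"hbsag_baseline", "hbsag_decline_week12", "nafld", "age", "sex"},
    "lasso": {"hbsag_baseline", "hbsag_decline_week12", "nafld", "age"},
}

# Keep only the features that every method agrees on.
consensus = set.intersection(*selected.values())
consensus_sorted = sorted(consensus)
```

Taking the intersection is a conservative choice: a variable dropped by any one method is excluded, which tends to yield a small, stable feature set.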
Affiliation(s)
- Wenting Zhong
- Department of Infectious Disease, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Che Wang
- Department of Radiology Oncology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Jia Wang
- Department of Infectious Disease, The Eighth Hospital of Xi'an, Xi'an, Shaanxi, China
- Tianyan Chen
- Department of Infectious Disease, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China.
|
22
|
Oneto L, Chicco D. Eight quick tips for biologically and medically informed machine learning. PLoS Comput Biol 2025; 21:e1012711. [PMID: 39787089 PMCID: PMC11717244 DOI: 10.1371/journal.pcbi.1012711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2025] Open
Abstract
Machine learning has become a powerful tool for computational analysis in the biomedical sciences, with its effectiveness significantly enhanced by integrating domain-specific knowledge. This integration has given rise to informed machine learning, in contrast to studies that lack domain knowledge and treat all variables equally (uninformed machine learning). While the application of informed machine learning to bioinformatics and health informatics datasets has become more seamless, the likelihood of errors has also increased. To address this drawback, we present eight guidelines outlining best practices for employing informed machine learning methods in biomedical sciences. These quick tips offer recommendations on various aspects of informed machine learning analysis, aiming to assist researchers in generating more robust, explainable, and dependable results. Although we originally crafted these eight simple suggestions for novices, we believe they are relevant for expert computational researchers as well.
Affiliation(s)
- Luca Oneto
- Dipartimento di Informatica Bioingegneria Robotica e Ingegneria dei Sistemi, Università di Genova, Genoa, Italy
- Davide Chicco
- Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
|
23
|
Rojas LH, Pereira-Morales AJ, Amador W, Montenegro A, Buelvas W, de la Espriella V. Development and validation of interpretable machine learning models to predict glomerular filtration rate in chronic kidney disease Colombian patients. Ann Clin Biochem 2025; 62:57-66. [PMID: 39242084 DOI: 10.1177/00045632241285528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2024]
Abstract
BACKGROUND Machine learning (ML) predictive models have shown their capability to improve risk prediction and assist medical decision-making. Nevertheless, there is a lack of accurate systems for the early identification of future rapid chronic kidney disease (CKD) progressors in Colombia and even in South America. OBJECTIVE The purpose of this study was to develop a series of interpretable ML models that predict glomerular filtration rate (GFR) at 6 months, 9 months, and 12 months. STUDY DESIGN AND SETTING Over 29,000 CKD patients stage 1 to 3b (estimated GFR, <60 mL/min/1.73 m²) with an average of 3 years of follow-up data were included. We used extreme gradient boosting (XGBoost) to build three models to predict the next estimated GFR (eGFR). Models were internally and externally validated. In addition, we included SHapley Additive exPlanation (SHAP) values to offer interpretable global and local prediction models. RESULTS All models showed good performance in development and external validation. However, the 6-month XGBoost prediction model showed the best performance in internal validation (MAE average = 6.07; RMSE = 78.87) and in external validation (MAE average = 6.45; RMSE = 18.94). The top 3 most influential features that pushed the predicted eGFR value to lower values were the interpolated values for eGFR and creatinine, and eGFR at baseline. CONCLUSION In the current study, we have developed and validated machine learning models to predict the next eGFR value at different intervals. Furthermore, we attempted to address the need for explainable predictions by offering transparent predictions.
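The two regression error metrics reported above, mean absolute error (MAE) and root mean squared error (RMSE), can be computed as in the following minimal sketch. The eGFR values are invented for illustration and are not study data. For the same set of predictions, RMSE is never smaller than MAE, because squaring gives large errors more weight.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented observed and predicted eGFR values (mL/min/1.73 m²).
observed = [60.0, 45.0, 30.0]
predicted = [57.0, 49.0, 30.0]

mae_value = mae(observed, predicted)    # (3 + 4 + 0) / 3
rmse_value = rmse(observed, predicted)  # sqrt((9 + 16 + 0) / 3)
```

Both metrics are expressed in the units of the predicted quantity, which makes them directly comparable to clinically meaningful changes in eGFR.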
|
24
|
Wang Z, Wu Y, Zhu J, Fang Y. Machine learning-based prediction of sarcopenia in community-dwelling middle-aged and older adults: findings from the CHARLS. Psychogeriatrics 2025; 25:e13205. [PMID: 39444246 DOI: 10.1111/psyg.13205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 09/16/2024] [Accepted: 10/04/2024] [Indexed: 10/25/2024]
Abstract
BACKGROUND Sarcopenia is a prominent issue among aging populations and associated with poor health outcomes. This study aimed to examine the predictive value of questionnaire and biomarker data for sarcopenia, and to further develop a user-friendly calculator for community-dwelling middle-aged and older adults. METHODS We used two waves (2011 and 2013) of the China Health and Retirement Longitudinal Study (CHARLS) to predict sarcopenia, defined by the Asian Working Group for Sarcopenia 2019 criteria. We restricted the analytical sample to adults aged 45 or above (N = 2934). Five machine learning models were used to construct Q-based (only questionnaire variables), Bio-based (only biomarker variables), and combined (questionnaire plus biomarker variables) models. Area under the receiver operating characteristic curve (AUROC) was used for performance assessment. Temporal external validation was performed based on two datasets from CHARLS. Important predictors were identified by Shapley values and coefficients. RESULTS Extreme gradient boosting (XGBoost), considering both questionnaire and biomarker characteristics, emerged as the optimal model, and its AUROC was 0.759 (95% CI: 0.747-0.771) at a decision threshold of 0.20 on the test set. Models also performed well on the external datasets. We found that cognitive function was the most important predictor in both Q-based and combined models, and blood urea nitrogen was the most important predictor in the Bio-based model. Other key predictors included education, haematocrit, total cholesterol, drinking, number of chronic diseases, and instrumental activities of daily living score. CONCLUSIONS Our findings offer a potential for early screening and targeted prevention of sarcopenia among middle-aged and older adults in the community setting.
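The AUROC used above for performance assessment can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting as one half. The sketch below computes it by direct pairwise comparison. It is an illustrative implementation with invented scores, not the study's code, and it is quadratic in the number of cases, so it suits small examples only.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Count positive-negative pairs ranked correctly; ties contribute one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented model scores and true labels.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
auc_value = auroc(scores, labels)
```

Because this estimate depends only on how cases are ranked, it summarizes performance across all thresholds. A decision threshold matters for threshold-dependent metrics such as sensitivity and specificity.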
Affiliation(s)
- Zongjie Wang
- School of Public Health, Xiamen University, Xiamen, China
- Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, China
- Yafei Wu
- School of Public Health, Xiamen University, Xiamen, China
- School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China
- Junmin Zhu
- School of Public Health, Xiamen University, Xiamen, China
- Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, China
- Ya Fang
- School of Public Health, Xiamen University, Xiamen, China
- Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
|
25
|
Ying Y, Ju R, Wang J, Li W, Ji Y, Shi Z, Chen J, Chen M. Accuracy of machine learning in diagnosing microsatellite instability in gastric cancer: A systematic review and meta-analysis. Int J Med Inform 2025; 193:105685. [PMID: 39515046 DOI: 10.1016/j.ijmedinf.2024.105685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 10/21/2024] [Accepted: 10/28/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND Significant challenges persist in the early identification of microsatellite instability (MSI) within current clinical practice. In recent years, with the growing utilization of machine learning (ML) in the diagnosis and management of gastric cancer (GC), numerous researchers have explored the effectiveness of ML methodologies in detecting MSI. Nevertheless, the predictive value of these approaches still lacks comprehensive evidence. Accordingly, this study was carried out to consolidate the accuracy of ML in the prompt detection of MSI in GC. METHODS PubMed, the Cochrane Library, the Web of Science, and Embase were retrieved up to March 20, 2024. The risk of bias in the encompassed studies was evaluated utilizing a risk assessment tool for predictive models. Models were then subjected to subgroup analysis based on the modeling variables. RESULTS A total of 12 studies, encompassing 11,912 patients with GC, satisfied the predefined inclusion criteria. ML models established in these studies were primarily based on pathological images, clinical features, and radiomics. The results suggested that in the validation sets, the pathological image-based models had a synthesized c-index of 0.86 [95 % CI (0.83-0.89)], with sensitivity and specificity being 0.86 [95 % CI (0.76-0.92)] and 0.83 [95 % CI (0.78-0.87)], respectively; radiomics feature-based models achieved respective values of 0.87 [95 % CI (0.81-0.92)], 0.77 [95 % CI (0.70-0.83)] and 0.81 [95 % CI (0.74-0.87)]; radiomics feature-based models + clinical feature-based models achieved respective values of 0.87 [95 % CI (0.81-0.93)], 0.78 [95 % CI (0.70-0.84)] and 0.79 [95 % CI (0.69-0.86)]. CONCLUSIONS ML has demonstrated optimal performance in detecting MSI in GC and could serve as a prospective early adjunctive detection tool for MSI in GC. Future research should contemplate minimally invasive or non-invasive, readily collectible, and efficient predictors to augment the predictive accuracy of ML.
Affiliation(s)
- Yuou Ying
- The Second Affiliated College of Zhejiang Chinese Medical University, Hangzhou 310053, Zhejiang Province, China
- Ruyi Ju
- Zhejiang Chinese Medical University, Hangzhou 310053, Zhejiang Province, China
- Jieyi Wang
- The Basic Medical College of Zhejiang Chinese Medical University, Hangzhou 310053, Zhejiang Province, China
- Wenkai Li
- Zhejiang Chinese Medical University, Hangzhou 310053, Zhejiang Province, China
- Yuan Ji
- The Second Affiliated College of Zhejiang Chinese Medical University, Hangzhou 310053, Zhejiang Province, China
- Zhenyu Shi
- The Second Affiliated College of Zhejiang Chinese Medical University, Hangzhou 310053, Zhejiang Province, China
- Jinhan Chen
- The Second Affiliated College of Zhejiang Chinese Medical University, Hangzhou 310053, Zhejiang Province, China
- Mingxian Chen
- Department of Gastroenterology, Tongde Hospital of Zhejiang Province, Street Gucui No. 234, Region Xihu, Hangzhou 310012, Zhejiang Province, China.
|
26
|
Cisotto G, Zancanaro A, Zoppis IF, Manzoni SL. hvEEGNet: a novel deep learning model for high-fidelity EEG reconstruction. Front Neuroinform 2024; 18:1459970. [PMID: 39759760 PMCID: PMC11695360 DOI: 10.3389/fninf.2024.1459970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 11/27/2024] [Indexed: 01/07/2025] Open
Abstract
Introduction Modeling multi-channel electroencephalographic (EEG) time series is a challenging task, even for the most recent deep learning approaches. In this work, we focused on the high-fidelity reconstruction of this type of data, as this is of key relevance for several applications such as classification, anomaly detection, automatic labeling, and brain-computer interfaces. Methods We analyzed the most recent works and found that high-fidelity reconstruction is seriously challenged by the complex dynamics of EEG signals and the large inter-subject variability. So far, previous works have provided either high-fidelity reconstruction of single-channel signals or poor-quality reconstruction of multi-channel datasets. Therefore, in this paper, we present a novel deep learning model, called hvEEGNet, designed as a hierarchical variational autoencoder and trained with a new loss function. We tested it on the benchmark Dataset 2a (including 22-channel EEG data from 9 subjects). Results We show that it is able to reconstruct all EEG channels with high fidelity, quickly (in a few tens of epochs), and with high consistency across different subjects. We also investigated the relationship between reconstruction fidelity and training duration and, using hvEEGNet as an anomaly detector, we spotted some corrupted data in the benchmark dataset that had never been highlighted before. Discussion Thus, hvEEGNet could be very useful in several applications where automatic labeling of large EEG datasets is needed but time-consuming. At the same time, this work opens new fundamental research questions about (1) the effectiveness of training deep learning models (for EEG data) and (2) the need for a systematic characterization of the input EEG data to ensure robust modeling.
Affiliation(s)
- Giulia Cisotto
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Department of Information Engineering, University of Padova, Padova, Italy
- Alberto Zancanaro
- Department of Information Engineering, University of Padova, Padova, Italy
- Italo F. Zoppis
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Sara L. Manzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
|
27
|
Hu Q, Li J, Li X, Zou D, Xu T, He Z. Machine learning to predict adverse drug events based on electronic health records: a systematic review and meta-analysis. J Int Med Res 2024; 52:3000605241302304. [PMID: 39668733 PMCID: PMC11639029 DOI: 10.1177/03000605241302304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Accepted: 11/07/2024] [Indexed: 12/14/2024] Open
Abstract
OBJECTIVE This systematic review aimed to provide a comprehensive overview of the application of machine learning (ML) in predicting multiple adverse drug events (ADEs) using electronic health record (EHR) data. METHODS Systematic searches were conducted using PubMed, Web of Science, Embase, and IEEE Xplore from database inception until 21 November 2023. Studies that developed ML models for predicting multiple ADEs based on EHR data were included. RESULTS Ten studies met the inclusion criteria. Twenty ML methods were reported, most commonly random forest (RF, n = 9), followed by AdaBoost (n = 4), eXtreme Gradient Boosting (n = 3), and support vector machine (n = 3). The mean area under the summary receiver operator characteristics curve (AUC) was 0.76 (95% confidence interval [CI] = 0.26-0.95). RF combined with resampling-based approaches achieved high AUCs (0.9448-0.9457). The common risk factors of ADEs included the length of hospital stay, number of prescribed drugs, and admission type. The pooled estimated AUC was 0.72 (95% CI = 0.68-0.75). CONCLUSIONS Future studies should adhere to more rigorous reporting standards and consider new ML methods to facilitate the application of ML models in clinical practice.
Affiliation(s)
- Qiaozhi Hu
- Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- West China School of Medicine, Sichuan University, Chengdu, Sichuan, China
- Jiafeng Li
- Mental Health Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Xiaoqi Li
- West China School of Medicine, Sichuan University, Chengdu, Sichuan, China
- Dan Zou
- Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Ting Xu
- Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Zhiyao He
- Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy, Sichuan University, Chengdu, Sichuan, China
|
28
|
Chavosh Nejad M, Vestergaard Matthiesen R, Dukovska-Popovska I, Jakobsen T, Johansen J. Machine learning for predicting duration of surgery and length of stay: A literature review on joint arthroplasty. Int J Med Inform 2024; 192:105631. [PMID: 39293161 DOI: 10.1016/j.ijmedinf.2024.105631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 08/15/2024] [Accepted: 09/13/2024] [Indexed: 09/20/2024]
Abstract
INTRODUCTION In recent years, factors such as population aging have caused escalating demand for hip and knee arthroplasty, straining already limited hospital resources. To address this challenge, the focus has been on improving medical and operational efficiency. This includes an increased use of machine learning (ML) to predict duration of surgery (DOS) and length of stay (LOS) for total knee and total hip arthroplasty, which can be used to optimize resource allocation within medical and operational constraints. This paper explores the development and performance of ML models in predicting DOS and LOS. METHODS A systematic search of publications from 2010 to 2023 was conducted following PRISMA guidelines. After applying the inclusion and exclusion criteria, 28 of the 722 papers gathered from PubMed, Web of Science, and manual search were included in the study. Descriptive statistics were used to analyze the extracted data regarding data preprocessing, model development, and model performance assessment. RESULTS Most of the papers treat LOS as a binary variable. Patient age was the most frequently used variable and the one most often reported as important for predicting DOS and LOS. Among the 28 included papers, more than 71% of models reached good to perfect performance based on the area under the receiver operating characteristic curve (AUC), with artificial neural networks and ensemble learning models accounting for the largest share of the best-performing models. CONCLUSION The use of ML models is increasing in the literature. The current performance level indicates that ML models can potentially become powerful tools for predicting DOS and LOS for different purposes. However, the literature is not yet mature in reporting real-life applications. Future studies can focus on model specification and validation by considering empirical application.
Affiliation(s)
- Mohammad Chavosh Nejad
- Department of Materials and Production, Aalborg University, Fibigerstræde 16, 2-109, Aalborg Ø 9220, Denmark.
- Iskra Dukovska-Popovska
- Department of Materials and Production, Aalborg University, Fibigerstræde 16, 2-107, Aalborg Ø 9220, Denmark.
- Thomas Jakobsen
- Department of Orthopaedics, Aalborg University Hospital, Hobrovej 18-22, Aalborg Universitetshospital, Aalborg Syd 9000, Denmark.
- John Johansen
- Department of Materials and Production, Aalborg University, Fibigerstræde 16, 2-114, Aalborg Ø 9220, Denmark.
|
29
|
Cannarozzi AL, Latiano A, Massimino L, Bossa F, Giuliani F, Riva M, Ungaro F, Guerra M, Brina ALD, Biscaglia G, Tavano F, Carparelli S, Fiorino G, Danese S, Perri F, Palmieri O. Inflammatory bowel disease genomics, transcriptomics, proteomics and metagenomics meet artificial intelligence. United European Gastroenterol J 2024; 12:1461-1480. [PMID: 39215755 PMCID: PMC11652336 DOI: 10.1002/ueg2.12655] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/16/2024] [Accepted: 08/15/2024] [Indexed: 09/04/2024] Open
Abstract
Various extrinsic and intrinsic factors such as drug exposures, antibiotic treatments, smoking, lifestyle, genetics, immune responses, and the gut microbiome characterize ulcerative colitis and Crohn's disease, collectively called inflammatory bowel disease (IBD). All these factors contribute to the complexity and heterogeneity of the disease etiology and pathogenesis leading to major challenges for the scientific community in improving management, medical treatments, genetic risk, and exposome impact. Understanding the interaction(s) among these factors and their effects on the immune system in IBD patients has prompted advances in multi-omics research, the development of new tools as part of system biology, and more recently, artificial intelligence (AI) approaches. These innovative approaches, supported by the availability of big data and large volumes of digital medical datasets, hold promise in better understanding the natural histories, predictors of disease development, severity, complications and treatment outcomes in complex diseases, providing decision support to doctors, and promising to bring us closer to the realization of the "precision medicine" paradigm. This review aims to provide an overview of current IBD omics based on both individual (genomics, transcriptomics, proteomics, metagenomics) and multi-omics levels, highlighting how AI can facilitate the integration of heterogeneous data to summarize our current understanding of the disease and to identify current gaps in knowledge to inform upcoming research in this field.
Affiliation(s)
- Anna Lucia Cannarozzi
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Anna Latiano
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Luca Massimino
  - Gastroenterology and Digestive Endoscopy Department, IRCCS Ospedale San Raffaele, Milan, Italy
- Fabrizio Bossa
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Francesco Giuliani
  - Innovation & Research Unit, Fondazione IRCCS “Casa Sollievo della Sofferenza”, San Giovanni Rotondo, Italy
- Matteo Riva
  - Gastroenterology and Digestive Endoscopy Department, IRCCS Ospedale San Raffaele, Milan, Italy
- Federica Ungaro
  - Gastroenterology and Digestive Endoscopy Department, IRCCS Ospedale San Raffaele, Milan, Italy
- Maria Guerra
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Anna Laura Di Brina
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Giuseppe Biscaglia
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Francesca Tavano
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Sonia Carparelli
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Gionata Fiorino
  - Gastroenterology and Digestive Endoscopy, San Camillo‐Forlanini Hospital, Rome, Italy
- Silvio Danese
  - Faculty of Medicine, Università Vita‐Salute San Raffaele, Milan, Italy
- Francesco Perri
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Orazio Palmieri
  - Division of Gastroenterology and Endoscopy, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy

30
Brandl L, Jansen-Kosterink S, Brodbeck J, Jacinto S, Mooser B, Heylen D. Moving Toward Meaningful Evaluations of Monitoring in e-Mental Health Based on the Case of a Web-Based Grief Service for Older Mourners: Mixed Methods Study. JMIR Form Res 2024; 8:e63262. [PMID: 39608005 PMCID: PMC11620699 DOI: 10.2196/63262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2024] [Revised: 10/06/2024] [Accepted: 10/10/2024] [Indexed: 11/30/2024] Open
Abstract
Background Artificial intelligence (AI) tools hold much promise for mental health care by increasing the scalability and accessibility of care. However, current development and evaluation practices of AI tools limit their meaningfulness for health care contexts and therefore also the practical usefulness of such tools for professionals and clients alike. Objective The aim of this study is to demonstrate the evaluation of an AI monitoring tool that detects the need for more intensive care in a web-based grief intervention for older mourners who have lost their spouse, with the goal of moving toward meaningful evaluation of AI tools in e-mental health. Methods We leveraged the insights from three evaluation approaches: (1) the F1-score evaluated the tool's capacity to classify user monitoring parameters as either in need of more intensive support or recommendable to continue using the web-based grief intervention as is; (2) we used linear regression to assess the predictive value of users' monitoring parameters for clinical changes in grief, depression, and loneliness over the course of a 10-week intervention; and (3) we collected qualitative experience data from e-coaches (N=4) who incorporated the monitoring in their weekly email guidance during the 10-week intervention. Results Based on n=174 binary recommendation decisions, the F1-score of the monitoring tool was 0.91. Due to minimal change in depression and loneliness scores after the 10-week intervention, only 1 linear regression was conducted. The difference score in grief before and after the intervention was included as a dependent variable. Participants' (N=21) mean score on the self-report monitoring and the estimated slope of individually fitted growth curves and its standard error (ie, participants' response pattern to the monitoring questions) were used as predictors. Only the mean monitoring score exhibited predictive value for the observed change in grief (R2=1.19, SE 0.33; t16=3.58, P=.002). 
The e-coaches appreciated the monitoring tool as an opportunity to confirm their initial impression about intervention participants, personalize their email guidance, and detect when participants' mental health deteriorated during the intervention. Conclusions The monitoring tool evaluated in this paper identified a need for more intensive support reasonably well in a nonclinical sample of older mourners, had some predictive value for the change in grief symptoms during a 10-week intervention, and was appreciated as an additional source of mental health information by e-coaches who supported mourners during the intervention. Each evaluation approach in this paper came with its own set of limitations, including (1) skewed class distributions in prediction tasks based on real-life health data and (2) choosing meaningful statistical analyses based on clinical trial designs that are not targeted at evaluating AI tools. However, combining multiple evaluation methods facilitates drawing meaningful conclusions about the clinical value of AI monitoring tools for their intended mental health context.
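The F1-score used in evaluation approach (1) is the harmonic mean of precision and recall. A minimal sketch follows, assuming hypothetical confusion-matrix counts: the abstract reports only the number of decisions (n=174) and the resulting F1-score (0.91), not the underlying counts.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not taken from the study.
example_f1 = f1_score(tp=50, fp=5, fn=5)
```

True negatives do not enter the formula, which is one reason the skewed class distributions mentioned in the limitations matter when interpreting a single F1 value.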
Affiliation(s)
- Lena Brandl
  - Human Media Interaction group, University of Twente, Drienerlolaan 5, Enschede, 7522NB, Netherlands, 31 534893740
  - Roessingh Research and Development, Enschede, Netherlands
- Stephanie Jansen-Kosterink
  - Roessingh Research and Development, Enschede, Netherlands
  - Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
- Jeannette Brodbeck
  - Institute for Psychology, University of Bern, Bern, Switzerland
  - School of Social Work, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland
- Sofia Jacinto
  - Institute for Psychology, University of Bern, Bern, Switzerland
  - School of Social Work, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland
  - Centro de Investigação e Intervenção Social, Instituto Universitário de Lisboa, Lisboa, Portugal
- Bettina Mooser
  - Institute for Psychology, University of Bern, Bern, Switzerland
- Dirk Heylen
  - Human Media Interaction group, University of Twente, Drienerlolaan 5, Enschede, 7522NB, Netherlands, 31 534893740

31
Oliveira ACD, Bessa RF, Teles AS. Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study. CAD SAUDE PUBLICA 2024; 40:e00028824. [PMID: 39607132 DOI: 10.1590/0102-311xen028824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 07/03/2024] [Indexed: 11/29/2024] Open
Abstract
Artificial intelligence can detect suicidal ideation manifestations in texts. Studies demonstrate that BERT-based models achieve better performance in text classification problems. Large language models (LLMs) answer free-text queries without being specifically trained. This work aims to compare the performance of three variations of BERT models and LLMs (Google Bard, Microsoft Bing/GPT-4, and OpenAI ChatGPT-3.5) for identifying suicidal ideation from nonclinical texts written in Brazilian Portuguese. A dataset labeled by psychologists consisted of 2,691 sentences without suicidal ideation and 1,097 with suicidal ideation, of which 100 sentences were selected for testing. We applied data preprocessing techniques, hyperparameter optimization, and hold-out cross-validation for training and testing BERT models. When evaluating LLMs, we used zero-shot prompting engineering. Each test sentence was labeled if it contained suicidal ideation, according to the chatbot's response. Bing/GPT-4 achieved the best performance, with 98% across all metrics. Fine-tuned BERT models outperformed the other LLMs: BERTimbau-Large performed the best with a 96% accuracy, followed by BERTimbau-Base with 94%, and BERT-Multilingual with 87%. Bard performed the worst with 62% accuracy, whereas ChatGPT-3.5 achieved 81%. The high recall capacity of the models suggests a low misclassification rate of at-risk patients, which is crucial to prevent missed interventions by professionals. However, despite their potential in supporting suicidal ideation detection, these models have not been validated in a patient monitoring clinical setting. Therefore, caution is advised when using the evaluated models as tools to assist healthcare professionals in detecting suicidal ideation.
Affiliation(s)
- Adonias Caetano de Oliveira
  - Instituto Federal de Educação, Ciência e Tecnologia do Ceará, Fortaleza, Brasil
  - Universidade Federal do Delta do Parnaíba, Parnaíba, Brasil
- Ariel Soares Teles
  - Universidade Federal do Delta do Parnaíba, Parnaíba, Brasil
  - Instituto Federal do Maranhão, São Luís, Brasil

32
Chan PZ, Jin E, Jansson M, Chew HSJ. AI-Based Noninvasive Blood Glucose Monitoring: Scoping Review. J Med Internet Res 2024; 26:e58892. [PMID: 39561353 PMCID: PMC11615544 DOI: 10.2196/58892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/24/2024] [Accepted: 10/08/2024] [Indexed: 11/21/2024] Open
Abstract
BACKGROUND Current blood glucose monitoring (BGM) methods are often invasive and require repetitive pricking of a finger to obtain blood samples, predisposing individuals to pain, discomfort, and infection. Noninvasive blood glucose monitoring (NIBGM) is ideal for minimizing discomfort, reducing the risk of infection, and increasing convenience. OBJECTIVE This review aimed to map the use cases of artificial intelligence (AI) in NIBGM. METHODS A systematic scoping review was conducted according to the Arksey O'Malley five-step framework. Eight electronic databases (CINAHL, Embase, PubMed, Web of Science, Scopus, The Cochrane-Central Library, ACM Digital Library, and IEEE Xplore) were searched from inception until February 8, 2023. Study selection was conducted by 2 independent reviewers, descriptive analysis was conducted, and findings were presented narratively. Study characteristics (author, country, type of publication, study design, population characteristics, mean age, types of noninvasive techniques used, and application, as well as characteristics of the BGM systems) were extracted independently and cross-checked by 2 investigators. Methodological quality appraisal was conducted using the Checklist for assessment of medical AI. RESULTS A total of 33 papers were included, representing studies from Asia, the United States, Europe, the Middle East, and Africa published between 2005 and 2023. Most studies used optical techniques (n=19, 58%) to estimate blood glucose levels (n=27, 82%). Others used electrochemical sensors (n=4), imaging (n=2), mixed techniques (n=2), and tissue impedance (n=1). Accuracy ranged from 35.56% to 94.23% and Clarke error grid (A+B) ranged from 86.91% to 100%. The most popular machine learning algorithm used was random forest (n=10) and the most popular deep learning model was the artificial neural network (n=6). 
The mean overall checklist for assessment of medical AI score on the included papers was 33.5 (SD 3.09), suggesting an average of medium quality. The studies reviewed demonstrate that some AI techniques can accurately predict glucose levels from noninvasive sources while enhancing comfort and ease of use for patients. However, the overall range of accuracy was wide due to the heterogeneity of models and input data. CONCLUSIONS Efforts are needed to standardize and regulate the use of AI technologies in BGM, as well as develop consensus guidelines and protocols to ensure the quality and safety of AI-assisted monitoring systems. The use of AI for NIBGM is a promising area of research that has the potential to revolutionize diabetes management.
Affiliation(s)
- Pin Zhong Chan
  - Department of Nursing, Ng Teng Fong General Hospital, Singapore, Singapore
- Eric Jin
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Miia Jansson
  - Research Unit of Health Sciences and Technology, University of Oulu, Oulu, Finland
- Han Shi Jocelyn Chew
  - Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore

33
Hu Q, Chen Y, Zou D, He Z, Xu T. Predicting adverse drug event using machine learning based on electronic health records: a systematic review and meta-analysis. Front Pharmacol 2024; 15:1497397. [PMID: 39605909 PMCID: PMC11600142 DOI: 10.3389/fphar.2024.1497397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Accepted: 10/21/2024] [Indexed: 11/29/2024] Open
Abstract
Introduction Adverse drug events (ADEs) pose a significant challenge in current clinical practice. Machine learning (ML) has been increasingly used to predict specific ADEs using electronic health record (EHR) data. This systematic review provides a comprehensive overview of the application of ML in predicting specific ADEs based on EHR data. Methods A systematic search of PubMed, Web of Science, Embase, and IEEE Xplore was conducted to identify relevant articles published from the inception to 20 May 2024. Studies that developed ML models for predicting specific ADEs or ADEs associated with particular drugs were included using EHR data. Results A total of 59 studies met the inclusion criteria, covering 15 drugs and 15 ADEs. In total, 38 machine learning algorithms were reported, with random forest (RF) being the most frequently used, followed by support vector machine (SVM), eXtreme gradient boosting (XGBoost), decision tree (DT), and light gradient boosting machine (LightGBM). The performance of the ML models was generally strong, with an average area under the curve (AUC) of 76.68% ± 10.73, accuracy of 76.00% ± 11.26, precision of 60.13% ± 24.81, sensitivity of 62.35% ± 20.19, specificity of 75.13% ± 16.60, and an F1 score of 52.60% ± 21.10. The combined sensitivity, specificity, diagnostic odds ratio (DOR), and AUC from the summary receiver operating characteristic (SROC) curve using a random effects model were 0.65 (95% CI: 0.65-0.66), 0.89 (95% CI: 0.89-0.90), 12.11 (95% CI: 8.17-17.95), and 0.8069, respectively. The risk factors associated with different drugs and ADEs varied. Discussion Future research should focus on improving standardization, conducting multicenter studies that incorporate diverse data types, and evaluating the impact of artificial intelligence predictive models in real-world clinical settings. Systematic Review Registration https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42024565842, identifier CRD42024565842.
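The diagnostic odds ratio (DOR) reported above relates sensitivity and specificity in a single number. A minimal sketch of the standard single-study definition follows; the function name `diagnostic_odds_ratio` and the example inputs are hypothetical. A DOR pooled with a random-effects model is estimated across studies, so it need not equal the value obtained by plugging pooled sensitivity and specificity into this formula.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    # Odds of a positive prediction among patients who had the event.
    odds_given_event = sensitivity / (1.0 - sensitivity)
    # Odds of a positive prediction among patients who did not.
    odds_given_no_event = (1.0 - specificity) / specificity
    return odds_given_event / odds_given_no_event


# Hypothetical single-model values for illustration only.
example_dor = diagnostic_odds_ratio(sensitivity=0.8, specificity=0.9)
```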
Affiliation(s)
- Qiaozhi Hu
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
  - West China School of Medicine, Sichuan University, Chengdu, Sichuan, China
- Yuxian Chen
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Dan Zou
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Zhiyao He
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy, Sichuan University, Chengdu, Sichuan, China
- Ting Xu
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China

34
Javier Gil-Terrón F, Ferri P, Montosa-I-Micó V, Gómez Mahiques M, Lopez-Mateu C, Martí P, García-Gómez JM, Fuster-Garcia E. Exploring the Trade-Off between generalist and specialized Models: A center-based comparative analysis for glioblastoma segmentation. Int J Med Inform 2024; 191:105604. [PMID: 39154600 DOI: 10.1016/j.ijmedinf.2024.105604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 08/08/2024] [Accepted: 08/14/2024] [Indexed: 08/20/2024]
Abstract
INTRODUCTION Inherent variations between inter-center data can undermine the robustness of segmentation models when applied at a specific center (dataset shift). We investigated whether specialized center-specific models are more effective compared to generalist models based on multi-center data, and how center-specific data could enhance the performance of generalist models within a particular center using a fine-tuning transfer learning approach. For this purpose, we studied the dataset shift at center level and conducted a comparative analysis to assess the impact of data source on glioblastoma segmentation models. METHODS & MATERIALS The three key components of dataset shift were studied: prior probability shift-variations in tumor size or tissue distribution among centers; covariate shift-inter-center MRI alterations; and concept shift-different criteria for tumor segmentation. BraTS 2021 dataset was used, which includes 1251 cases from 23 centers. Thereafter, 155 deep-learning models were developed and compared, including 1) generalist models trained with multi-center data, 2) specialized models using only center-specific data, and 3) fine-tuned generalist models using center-specific data. RESULTS The three key components of dataset shift were characterized. The amount of covariate shift was substantial, indicating large variations in MR imaging between different centers. Glioblastoma segmentation models tend to perform best when using data from the application center. Generalist models, trained with over 700 samples, achieved a median Dice score of 88.98%. Specialized models surpassed this with 200 cases, while fine-tuned models outperformed with 50 cases. CONCLUSIONS The influence of dataset shift on model performance is evident. Fine-tuned and specialized models, utilizing data from the evaluated center, outperform generalist models, which rely on data from other centers. 
These approaches could encourage medical centers to develop customized models for their local use, enhancing the accuracy and reliability of glioblastoma segmentation in a context where dataset shift is inevitable.
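The Dice score used to compare the segmentation models measures the overlap between a predicted mask and a reference mask. A minimal sketch on flattened binary masks follows; the function name `dice_score`, the list-based mask representation, and the convention of returning 1.0 for two empty masks are assumptions for illustration, not details from the study.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 4-voxel masks: one overlapping voxel, three labelled in total.
example_dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

A reported value such as 88.98% corresponds to a score of about 0.89 on this 0-to-1 scale.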
Affiliation(s)
- F Javier Gil-Terrón
  - Biomedical Data Science Laboratory, ITACA Institute, Universitat Politècnica de València, València, Spain
- Pablo Ferri
  - Biomedical Data Science Laboratory, ITACA Institute, Universitat Politècnica de València, València, Spain
- Víctor Montosa-I-Micó
  - Biomedical Data Science Laboratory, ITACA Institute, Universitat Politècnica de València, València, Spain
- María Gómez Mahiques
  - Biomedical Data Science Laboratory, ITACA Institute, Universitat Politècnica de València, València, Spain
- Carles Lopez-Mateu
  - Biomedical Data Science Laboratory, ITACA Institute, Universitat Politècnica de València, València, Spain
- Pau Martí
  - Departament d'Enginyeria Industrial i Construcció, Àrea d'Enginyeria Agroforestal, Universitat de les Illes Balears, Palma, Spain
- Juan M García-Gómez
  - Biomedical Data Science Laboratory, ITACA Institute, Universitat Politècnica de València, València, Spain
- Elies Fuster-Garcia
  - Biomedical Data Science Laboratory, ITACA Institute, Universitat Politècnica de València, València, Spain

35
Wu Y, Ye Z, Wang Z, Duan S, Zhu J, Fang Y. Examining individual and contextual predictors of disability in Chinese older adults: A machine learning approach. Int J Med Inform 2024; 191:105552. [PMID: 39068893 DOI: 10.1016/j.ijmedinf.2024.105552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 06/12/2024] [Accepted: 07/14/2024] [Indexed: 07/30/2024]
Abstract
BACKGROUND There is a large gap of understanding the determinants of disability, especially the contextual characteristics. Therefore, this study aimed to examine the important predictors of disability in Chinese older adults based on the social ecological framework. METHODS We used the China Health and Retirement Longitudinal Study to examine predictors of disability, defined as self-report of any difficulty for six activity of daily living items. We restricted analytical sample to older adults aged 65 or above (N=1816). We considered 44 predictors, including personal-, behavioral-, interpersonal-, community-, and policy-level characteristics. The built-in variable importance measure (VIM) of random forest and SHapley Additive exPlanations (SHAP) were applied to assess key predictors of disability. A multilevel logit regression was further used to examine the associations of individual and contextual characteristics with disability. RESULTS The mean age of study sample was 72.62 years old (standard deviation: 5.77). During a 2-year of follow-up, 518 (28.52 %) of them developed into disability. Walking speed, age, and peak expiratory flow were the top important predictors in both VIM and SHAP. Contextual characteristics such as humidity, PM2.5, temperature, normalized difference vegetation index, and landscape also showed promise in predicting disability. Multilevel logit regression showed that people with male gender, arthritis, vision impairment, unable to finish semi tandem, no social activity, lower grip strength, and higher waist circumference, had much higher risk of disability. CONCLUSION Disability prevention strategies should specifically focus on multilevel factors such as individual and contextual characteristics, although the latter is warranted to be verified in future studies.
Affiliation(s)
- Yafei Wu
  - School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, Fujian, China; School of Nursing, Faculty of Health and Social Sciences, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Zirong Ye
  - School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, Fujian, China
- Zongjie Wang
  - School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, Fujian, China
- Siyu Duan
  - School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, Fujian, China
- Junmin Zhu
  - School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, Fujian, China
- Ya Fang
  - School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, Xiamen University, Xiamen, Fujian, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China.

36
Rough K, Rashidi ES, Tai CG, Lucia RM, Mack CD, Largent JA. Core Concepts in Pharmacoepidemiology: Principled Use of Artificial Intelligence and Machine Learning in Pharmacoepidemiology and Healthcare Research. Pharmacoepidemiol Drug Saf 2024; 33:e70041. [PMID: 39500844 DOI: 10.1002/pds.70041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 08/20/2024] [Accepted: 10/04/2024] [Indexed: 11/17/2024]
Abstract
Artificial intelligence (AI) and machine learning (ML) are important tools across many fields of health and medical research. Pharmacoepidemiologists can bring essential methodological rigor and study design expertise to the design and use of these technologies within healthcare settings. AI/ML-based tools also play a role in pharmacoepidemiology research, as we may apply them to answer our own research questions, take responsibility for evaluating medical devices with AI/ML components, or participate in interdisciplinary research to create new AI/ML algorithms. While epidemiologic expertise is essential to deploying AI/ML responsibly and ethically, the rapid advancement of these technologies in the past decade has resulted in a knowledge gap for many in the field. This article provides a brief overview of core AI/ML concepts, followed by a discussion of potential applications of AI/ML in pharmacoepidemiology research, and closes with a review of important concepts across application areas, including interpretability and fairness. This review is intended to provide an accessible, practical overview of AI/ML for pharmacoepidemiology research, with references to further, more detailed resources on fundamental topics.
Affiliation(s)
- Caroline G Tai
  - Real World Solutions, IQVIA, Durham, North Carolina, USA
- Rachel M Lucia
  - Real World Solutions, IQVIA, Durham, North Carolina, USA
- Joan A Largent
  - Real World Solutions, IQVIA, Durham, North Carolina, USA

37
Koochakpour K, Pant D, Westbye OS, Røst TB, Leventhal B, Koposov R, Clausen C, Skokauskas N, Nytrø Ø. Ability of clinical data to predict readmission in Child and Adolescent Mental Health Services. PeerJ Comput Sci 2024; 10:e2367. [PMID: 39650424 PMCID: PMC11622991 DOI: 10.7717/peerj-cs.2367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 09/07/2024] [Indexed: 12/11/2024]
Abstract
This study addresses the challenge of predicting readmissions in Child and Adolescent Mental Health Services (CAMHS) by analyzing the predictability of readmissions over short, medium, and long term periods. Using health records spanning 35 years, which included 22,643 patients and 30,938 episodes of care, we focused on the episode of care as a central unit, defined as a referral-discharge cycle that incorporates assessments and interventions. Data pre-processing involved handling missing values, normalizing, and transforming data, while resolving issues related to overlapping episodes and correcting registration errors where possible. Readmission prediction was inferred from electronic health records (EHR), as this variable was not directly recorded. A binary classifier distinguished between readmitted and non-readmitted patients, followed by a multi-class classifier to categorize readmissions based on timeframes: short (within 6 months), medium (6 months - 2 years), and long (more than 2 years). Several predictive models were evaluated based on metrics like AUC, F1-score, precision, and recall, and the K-prototype algorithm was employed to explore similarities between episodes through clustering. The optimal binary classifier (Oversampled Gradient Boosting) achieved an AUC of 0.7005, while the multi-class classifier (Oversampled Random Forest) reached an AUC of 0.6368. The K-prototype resulted in three clusters as optimal (SI: 0.256, CI: 4473.64). Despite identifying relationships between care intensity, case complexity, and readmission risk, generalizing these findings proved difficult, partly because clinicians often avoid discharging patients likely to be readmitted. Overall, while this dataset offers insights into patient care and service patterns, predicting readmissions remains challenging, suggesting a need for improved analytical models that consider patient development, disease progression, and intervention effects.
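The multi-class target described above can be sketched as a simple binning of the time between discharge and the next referral. This is an illustrative sketch only: the function name `readmission_class`, the use of months as the unit, the label strings, and the treatment of the exact 6-month and 2-year boundaries are assumptions, since the abstract does not specify them.

```python
from typing import Optional


def readmission_class(months_to_readmission: Optional[float]) -> str:
    """Map time to readmission onto the short/medium/long classes."""
    if months_to_readmission is None:
        # No later episode of care was found for this patient.
        return "not_readmitted"
    if months_to_readmission < 0:
        raise ValueError("time to readmission cannot be negative")
    if months_to_readmission <= 6:
        return "short"   # within 6 months (boundary handling assumed)
    if months_to_readmission <= 24:
        return "medium"  # 6 months to 2 years (boundary handling assumed)
    return "long"        # more than 2 years


# Hypothetical gaps between discharge and the next referral, in months.
labels = [readmission_class(m) for m in (None, 3, 12, 40)]
```

In the two-stage design the abstract describes, the binary classifier would separate `not_readmitted` from the rest, and the multi-class classifier would then distinguish the three remaining labels.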
Affiliation(s)
- Kaban Koochakpour
  - Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Dipendra Pant
  - Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Child and Adolescent Psychiatry, Clinic of Mental Health Care, St. Olav University Hospital, Trondheim, Norway
- Odd Sverre Westbye
  - Department of Child and Adolescent Psychiatry, Clinic of Mental Health Care, St. Olav University Hospital, Trondheim, Norway
  - Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Thomas Brox Røst
  - Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
  - Vivit AS, Trondheim, Norway
- Roman Koposov
  - Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU North), UiT The Arctic University of Norway, Tromsø, Norway
- Carolyn Clausen
  - Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Norbert Skokauskas
  - Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Øystein Nytrø
  - Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Child and Adolescent Psychiatry, Clinic of Mental Health Care, St. Olav University Hospital, Trondheim, Norway
  - Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway

38
Omobolaji Alabi R, Mäkitie RE. Machine Learning for Treatment Management Prediction in Laryngeal Fractures. J Voice 2024:S0892-1997(24)00322-9. [PMID: 39419705 DOI: 10.1016/j.jvoice.2024.09.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 09/17/2024] [Accepted: 09/17/2024] [Indexed: 10/19/2024]
Abstract
OBJECTIVES Laryngeal fractures are rare but potentially life-threatening traumas. Complications, such as airway obstruction and disrupted laryngeal anatomy, are associated with significant morbidity. Early identification of at-risk patients and optimal management remain crucial for improved outcomes. Recently, machine learning (ML) has attracted considerable attention as a technique for evaluating complex nonlinear relationships between multiple observations to create a predictive model with greater accuracy. This study aimed to demonstrate the potential of ML in predicting airway and surgical management of laryngeal fracture patients and to identify key contributing parameters for the predictive performance of the ML models. METHODS The ML models were developed using a patient series managed at the Helsinki University Hospital during 2005-2019. The developed models were further evaluated independently using a different cohort collected from the same institution between 1995 and 2004. RESULTS The ML models showed a weighted area under the curve (AUC) of 0.93 and an accuracy of 0.86 following training for airway management. Likewise, for treatment approach, the weighted AUC was 0.85 and the accuracy 0.78. Injury type, Schaefer-Fuhrman grade (SF gr), age at incident, cause of injury, and fracture of the cricoid, in decreasing order of significance, were the most prominent features for the model's predictive performance for airway management. Similarly, our model identified SF gr, fracture of the cricoid, injury type, age at incident, and cause of injury as the most significant predictors for surgical treatment approach. CONCLUSIONS The proposed prediction of management approach by an ML technique can provide accurate predictions and thus aid clinicians in administering early and personalized interventions. The model may serve as a supporting tool in recognizing at-risk patients and in timely decision-making. Further independent external validation is warranted for model generalizability.
Affiliation(s)
- Rasheed Omobolaji Alabi
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland; Department of Industrial Digitalization, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
- Riikka E Mäkitie
  - Department of Otorhinolaryngology - Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland; Clinicum, Faculty of Medicine, University of Helsinki, Helsinki, Finland
39
|
Koçak B, Keleş A, Köse F. Meta-research on reporting guidelines for artificial intelligence: are authors and reviewers encouraged enough in radiology, nuclear medicine, and medical imaging journals? Diagn Interv Radiol 2024; 30:291-298. [PMID: 38375627 PMCID: PMC11590734 DOI: 10.4274/dir.2024.232604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 02/10/2024] [Indexed: 02/21/2024]
Abstract
PURPOSE To determine how radiology, nuclear medicine, and medical imaging journals encourage and mandate the use of reporting guidelines for artificial intelligence (AI) in their author and reviewer instructions. METHODS The primary source of journal information and associated citation data used was the Journal Citation Reports (June 2023 release for 2022 citation data; Clarivate Analytics, UK). The first- and second-quartile journals indexed in the Science Citation Index Expanded and the Emerging Sources Citation Index were included. The author and reviewer instructions were evaluated by two independent readers, followed by an additional reader for consensus, with the assistance of automatic annotation. Encouragement and submission requirements were systematically analyzed. The reporting guidelines were grouped as AI-specific, related to modeling, and unrelated to modeling. RESULTS Out of 102 journals, 98 were included in this study, and all of them had author instructions. Only five journals (5%) encouraged the authors to follow AI-specific reporting guidelines. Among these, three required a filled-out checklist. Reviewer instructions were found in 16 journals (16%), among which one journal (6%) encouraged the reviewers to follow AI-specific reporting guidelines without submission requirements. The proportions of author and reviewer encouragement for AI-specific reporting guidelines were statistically significantly lower compared with those for other types of guidelines (P < 0.05 for all). CONCLUSION The findings indicate that AI-specific guidelines are not commonly encouraged and mandated (i.e., requiring a filled-out checklist) by these journals, compared with guidelines related to modeling and unrelated to modeling, leaving vast space for improvement. This meta-research study hopes to contribute to the awareness of the imaging community for AI reporting guidelines and ignite large-scale group efforts by all stakeholders, making AI research less wasteful. 
CLINICAL SIGNIFICANCE This meta-research highlights the need for improved encouragement of AI-specific guidelines in radiology, nuclear medicine, and medical imaging journals. This can potentially foster greater awareness among the AI community and motivate various stakeholders to collaborate to promote more efficient and responsible AI research reporting practices.
Affiliation(s)
- Burak Koçak
  - University of Health Sciences, Başakşehir Çam and Sakura City Hospital, Clinic of Radiology, İstanbul, Türkiye
- Ali Keleş
  - University of Health Sciences, Başakşehir Çam and Sakura City Hospital, Clinic of Radiology, İstanbul, Türkiye
- Fadime Köse
  - University of Health Sciences, Başakşehir Çam and Sakura City Hospital, Clinic of Radiology, İstanbul, Türkiye
40
|
Cisotto G, Chicco D. Ten quick tips for clinical electroencephalographic (EEG) data acquisition and signal processing. PeerJ Comput Sci 2024; 10:e2256. [PMID: 39314688 PMCID: PMC11419606 DOI: 10.7717/peerj-cs.2256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 07/22/2024] [Indexed: 09/25/2024]
Abstract
Electroencephalography (EEG) is a medical engineering technique aimed at recording the electric activity of the human brain. Brain signals derived from an EEG device can be processed and analyzed by computer using digital signal processing, computational statistics, and machine learning techniques, which can lead to scientifically relevant results about how the brain works. In recent decades, the spread of EEG devices and the greater availability of EEG data, computational resources, and software packages for electroencephalography analysis have made EEG signal processing easier and faster to perform for any researcher worldwide. This increased ease of carrying out computational analyses of EEG data, however, has also made it easier to make mistakes, and these mistakes, if unnoticed or treated wrongly, can in turn lead to wrong results or misleading outcomes, with worrisome consequences for patients and for the advancement of knowledge about the human brain. To tackle this problem, we present here our ten quick tips for performing electroencephalography signal processing analyses while avoiding common mistakes: a short list of guidelines designed for beginners on what to do, how to do it, and what not to do when analyzing EEG data with a computer. We believe that following our quick recommendations can lead to better, more reliable, and more robust results and outcomes in clinical neuroscientific research.
Affiliation(s)
- Giulia Cisotto
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Milan, Italy
  - Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padua, Padua, Italy
- Davide Chicco
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Milan, Italy
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
41
|
Adebanji AO, Asare C, Gyamerah SA. Predictive analysis on the factors associated with birth Outcomes: A machine learning perspective. Int J Med Inform 2024; 189:105529. [PMID: 38905958 DOI: 10.1016/j.ijmedinf.2024.105529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 06/11/2024] [Accepted: 06/18/2024] [Indexed: 06/23/2024]
Abstract
BACKGROUND Recent studies reveal that around 1.9 million stillbirths occur annually worldwide, with Sub-Saharan Africa having among the highest numbers of cases. Some Sub-Saharan African countries, including Ghana, failed to meet Millennium Development Goal 5 (MDG5) by 2015 and may struggle to meet Sustainable Development Goal 3 (SDG3) despite maternal healthcare interventions. Concerns arise about Ghana's ability to achieve the World Health Organization's neonatal mortality goal of 12 per 1000 live births by 2030. This study aims to identify key factors influencing childbirth outcomes and create a predictive method for high-risk pregnancies. METHODS We compared four machine learning classifiers (Extreme Gradient Boosting, Random Forest, Logistic Regression, and Artificial Neural Network) in predicting childbirth outcomes using data from a tertiary health facility in Ghana. To address class imbalance, we employed the Synthetic Minority Over-sampling Technique (SMOTE). RESULTS Our findings show that fetal heartbeat and gestational age at birth are the most influential factors on birth outcome (stillbirth or live birth), while there is no significant association with maternal age, number of babies, or type of delivery method. Among the machine learning models considered, Random Forest emerged as the optimal model, achieving accuracy, F1-score, and AUC values of approximately 0.98, 0.99, and 0.90, respectively. CONCLUSION Our study identifies key factors affecting childbirth outcomes and highlights the potential of machine learning for early high-risk pregnancy detection in clinical settings. These findings are crucial for Ghana and other Sub-Saharan African countries striving to meet maternal and neonatal healthcare goals. Further research and policy initiatives can use these results to improve healthcare in the region and work toward the World Health Organization's objectives by 2030.
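For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that idea with the standard library only; it is not the authors' pipeline and not the imbalanced-learn implementation, and the function name, the value of `k`, and the two-feature data points are all made up for illustration.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Hypothetical helper: interpolate between minority samples and neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Made-up minority-class points with two scaled features each.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_sketch(minority, n_new=4)
print(synthetic)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.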
Affiliation(s)
- Atinuke Olusola Adebanji
  - Department of Statistics and Actuarial Science, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
- Clement Asare
  - Department of Statistics and Actuarial Science, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
- Samuel Asante Gyamerah
  - Department of Statistics and Actuarial Science, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana; Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
42
|
Klempíř O, Krupička R. Analyzing Wav2Vec 1.0 Embeddings for Cross-Database Parkinson's Disease Detection and Speech Features Extraction. Sensors (Basel, Switzerland) 2024; 24:5520. [PMID: 39275431 PMCID: PMC11398018 DOI: 10.3390/s24175520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 08/22/2024] [Accepted: 08/24/2024] [Indexed: 09/16/2024]
Abstract
Advancements in deep learning speech representations have facilitated the effective use of extensive unlabeled speech datasets for Parkinson's disease (PD) modeling with minimal annotated data. This study employs the non-fine-tuned wav2vec 1.0 architecture to develop machine learning models for PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics. The primary aim is to analyze overlapping components within the embeddings on both classification and regression tasks, investigating whether latent speech representations in PD are shared across models, particularly for related tasks. Firstly, evaluation using three multi-language PD datasets showed that wav2vec accurately detected PD based on speech, outperforming feature extraction using mel-frequency cepstral coefficients in the proposed cross-database classification scenarios. In cross-database scenarios using Italian and English-read texts, wav2vec demonstrated performance comparable to intra-dataset evaluations. We also compared our cross-database findings against those of other related studies. Secondly, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to articulation and aging. Ultimately, subsequent analysis of important features examined the presence of significant overlaps between classification and regression models. The feature importance experiments discovered shared features across trained models, with increased sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. The study proposes wav2vec embeddings as a next promising step toward a speech-based universal model to assist in the evaluation of PD.
Affiliation(s)
- Radim Krupička
  - Department of Biomedical Informatics, Faculty of Biomedical Engineering, Czech Technical University in Prague, 16000 Prague, Czech Republic
43
|
Ruchonnet-Métrailler I, Siebert JN, Hartley MA, Lacroix L. Automated Interpretation of Lung Sounds by Deep Learning in Children With Asthma: Scoping Review and Strengths, Weaknesses, Opportunities, and Threats Analysis. J Med Internet Res 2024; 26:e53662. [PMID: 39178033 PMCID: PMC11380063 DOI: 10.2196/53662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 03/28/2024] [Accepted: 07/10/2024] [Indexed: 08/24/2024] Open
Abstract
BACKGROUND The interpretation of lung sounds plays a crucial role in the appropriate diagnosis and management of pediatric asthma. Applying artificial intelligence (AI) to this task has the potential to better standardize assessment and may even improve its predictive potential. OBJECTIVE This study aims to objectively review the literature on AI-assisted lung auscultation for pediatric asthma and provide a balanced assessment of its strengths, weaknesses, opportunities, and threats. METHODS A scoping review on AI-assisted lung sound analysis in children with asthma was conducted across 4 major scientific databases (PubMed, MEDLINE Ovid, Embase, and Web of Science), supplemented by a gray literature search on Google Scholar, to identify relevant studies published from January 1, 2000, until May 23, 2023. The search strategy incorporated a combination of keywords related to AI, pulmonary auscultation, children, and asthma. The quality of eligible studies was assessed using the ChAMAI (Checklist for the Assessment of Medical Artificial Intelligence). RESULTS The search identified 7 relevant studies out of 82 (9%) to be included through an academic literature search, while 11 of 250 (4.4%) studies from the gray literature search were considered but not included in the subsequent review and quality assessment. All had poor to medium ChAMAI scores, mostly due to the absence of external validation. Identified strengths were improved predictive accuracy of AI to allow for prompt and early diagnosis, personalized management strategies, and remote monitoring capabilities. Weaknesses were the heterogeneity between studies and the lack of standardization in data collection and interpretation. Opportunities were the potential of coordinated surveillance, growing data sets, and new ways of collaboratively learning from distributed data. 
Threats were both generic to the field of medical AI (loss of interpretability) and specific to the use case, as clinicians might lose the skill of auscultation. CONCLUSIONS To realize the opportunities of automated lung auscultation, there is a need to address weaknesses and threats with large-scale coordinated data collection in globally representative populations and by leveraging new approaches to collaborative learning.
Affiliation(s)
- Isabelle Ruchonnet-Métrailler
  - Pediatric Pulmonology Unit, Department of Pediatrics, Geneva Children's Hospital, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Johan N Siebert
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Division of Pediatric Emergency Medicine, Department of Pediatrics, Geneva Children's Hospital, Geneva University Hospitals, Geneva, Switzerland
- Mary-Anne Hartley
  - Intelligent Global Health Research Group, Machine Learning and Optimization Laboratory, Swiss Federal Institute of Technology, Lausanne, Switzerland
  - Laboratory of Intelligent Global Health Technologies, Bioinformatics and Data Science, Yale School of Medicine, New Haven, CT, United States
- Laurence Lacroix
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Division of Pediatric Emergency Medicine, Department of Pediatrics, Geneva Children's Hospital, Geneva University Hospitals, Geneva, Switzerland
44
|
Zhen J, Liu C, Zhang J, Liao F, Xie H, Tan C, An P, Liu Z, Jiang C, Shi J, Wu K, Dong W. Evaluating Inflammatory Bowel Disease-Related Quality of Life Using an Interpretable Machine Learning Approach: A Multicenter Study in China. J Inflamm Res 2024; 17:5271-5283. [PMID: 39139580 PMCID: PMC11321795 DOI: 10.2147/jir.s470197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 07/30/2024] [Indexed: 08/15/2024] Open
Abstract
Purpose Impaired quality of life (QOL) is common in patients with inflammatory bowel disease (IBD). A tool to more quickly identify IBD patients at high risk of impaired QOL improves opportunities for earlier intervention and improves long-term prognosis. The purpose of this study was to use a machine learning (ML) approach to develop risk stratification models for evaluating IBD-related QOL impairments. Patients and Methods An online questionnaire was used to collect clinical data on 2478 IBD patients from 42 hospitals distributed across 22 provinces in China from September 2021 to May 2022. Eight ML models used to predict the risk of IBD-related QOL impairments were developed and validated. Model performance was evaluated using a set of indexes and the best ML model was explained using a Local Interpretable Model-Agnostic Explanations (LIME) algorithm. Results The support vector machine (SVM) classifier algorithm-based model outperformed other ML models with an area under the receiver operating characteristic curve (AUC) and an accuracy of 0.80 and 0.71, respectively. The feature importance calculated by the SVM classifier algorithm revealed that glucocorticoid use, anxiety, abdominal pain, sleep disorders, and more severe disease contributed to a higher risk of impaired QOL, while longer disease course and the use of biological agents and immunosuppressants were associated with a lower risk. Conclusion An ML approach for assessing IBD-related QOL impairments is feasible and effective. This mechanism is a promising tool for gastroenterologists to identify IBD patients at high risk of impaired QOL.
Affiliation(s)
- Junhai Zhen
  - Department of General Practice, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, 430060, People’s Republic of China
- Chuan Liu
  - Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, 430060, People’s Republic of China
- Jixiang Zhang
  - Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, 430060, People’s Republic of China
- Fei Liao
  - Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, 430060, People’s Republic of China
- Huabing Xie
  - Department of General Practice, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, 430060, People’s Republic of China
- Cheng Tan
  - Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, 430060, People’s Republic of China
- Ping An
  - Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, 430060, People’s Republic of China
- Zhongchun Liu
  - Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan, 430060, People’s Republic of China
- Changqing Jiang
  - Department of Clinical Psychology, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, People’s Republic of China
- Jie Shi
  - Department of Medical Psychology, Chinese People’s Liberation Army Rocket Army Characteristic Medical Center, Beijing, 100032, People’s Republic of China
- Kaichun Wu
  - Department of Gastroenterology, Xijing Hospital, Air Force Medical University, Xi’an, 710032, People’s Republic of China
- Weiguo Dong
  - Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, 430060, People’s Republic of China
45
|
Kocak B, Akinci D'Antonoli T, Ates Kus E, Keles A, Kala A, Kose F, Kadioglu M, Solak S, Sunman S, Temiz ZH. Self-reported checklists and quality scoring tools in radiomics: a meta-research. Eur Radiol 2024; 34:5028-5040. [PMID: 38180530 DOI: 10.1007/s00330-023-10487-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 11/11/2023] [Accepted: 11/24/2023] [Indexed: 01/06/2024]
Abstract
OBJECTIVE To evaluate the use of reporting checklists and quality scoring tools for self-reporting purposes in radiomics literature. METHODS A literature search was conducted in PubMed (date, April 23, 2023). The radiomics literature was sampled at random after a sample size calculation with a priori power analysis. A systematic assessment for self-reporting, including the use of documentation such as completed checklists or quality scoring tools, was conducted in original research papers. These eligible papers underwent independent evaluation by a panel of nine readers, with three readers assigned to each paper. Automatic annotation was used to assist in this process. Then, a detailed item-by-item confirmation analysis was carried out on papers with checklist documentation, with independent evaluation by two readers. RESULTS The sample size calculation yielded 117 papers. Most of the included papers were retrospective (94%; 110/117), single-center (68%; 80/117), based on their private data (89%; 104/117), and lacked external validation (79%; 93/117). Only seven papers (6%) had at least one self-reported document (Radiomics Quality Score (RQS), Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD), or Checklist for Artificial Intelligence in Medical Imaging (CLAIM)), with a statistically significant binomial test (p < 0.001). Median rate of confirmed items for all three documents was 81% (interquartile range, 6). For quality scoring tools, documented scores were higher than suggested scores, with a mean difference of -7.2 (standard deviation, 6.8). CONCLUSION Radiomic publications often lack self-reported checklists or quality scoring tools. Even when such documents are provided, it is essential to be cautious, as the accuracy of the reported items or scores may be questionable. 
CLINICAL RELEVANCE STATEMENT Current state of radiomic literature reveals a notable absence of self-reporting with documentation and inaccurate reporting practices. This critical observation may serve as a catalyst for motivating the radiomics community to adopt and utilize such tools appropriately, thereby fostering rigor, transparency, and reproducibility of their research, moving the field forward. KEY POINTS • In radiomics literature, there has been a notable absence of self-reporting with documentation. • Even if such documents are provided, it is critical to exercise caution because the accuracy of the reported items or scores may be questionable. • Radiomics community needs to be motivated to adopt and appropriately utilize the reporting checklists and quality scoring tools.
Affiliation(s)
- Burak Kocak
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, 34480, Turkey
- Tugba Akinci D'Antonoli
  - Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland
- Ece Ates Kus
  - Department of Neuroradiology, Klinikum Lippe, Lemgo, Germany
- Ali Keles
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, 34480, Turkey
- Ahmet Kala
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, 34480, Turkey
- Fadime Kose
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, 34480, Turkey
- Mehmet Kadioglu
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, 34480, Turkey
- Sila Solak
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, 34480, Turkey
- Seyma Sunman
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, 34480, Turkey
- Zisan Hayriye Temiz
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, 34480, Turkey
46
|
Li Y, Zhang H, Sun Y, Fan Q, Wang L, Ji C, Gu H, Chen B, Zhao S, Wang D, Yu P, Li J, Yang S, Zhang C, Wang X. Deep learning-based platform performs high detection sensitivity of intracranial aneurysms in 3D brain TOF-MRA: An external clinical validation study. Int J Med Inform 2024; 188:105487. [PMID: 38761459 DOI: 10.1016/j.ijmedinf.2024.105487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/06/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024]
Abstract
PURPOSE To evaluate the diagnostic efficacy of a developed artificial intelligence (AI) platform incorporating deep learning algorithms for the automated detection of intracranial aneurysms in time-of-flight (TOF) magnetic resonance angiography (MRA). METHOD This retrospective study encompassed 3D TOF MRA images acquired between January 2023 and June 2023, aiming to validate the presence of intracranial aneurysms via our developed AI platform. The manual segmentation results by experienced neuroradiologists served as the "gold standard". Following annotation of MRA images by neuroradiologists using InferScholar software, the AI platform conducted automatic segmentation of intracranial aneurysms. Various metrics including accuracy (ACC), balanced ACC, area under the curve (AUC), sensitivity (SE), specificity (SP), F1 score, Brier Score, and Net Benefit were utilized to evaluate the generalization of the AI platform. Comparison of intracranial aneurysm identification performance was conducted between the AI platform and six radiologists with experience ranging from 3 to 12 years in interpreting MR images. Additionally, a comparative analysis was carried out between radiologists' detection performance based on independent visual diagnosis and AI-assisted diagnosis. Subgroup analyses were also performed based on the size and location of the aneurysms to explore factors impacting aneurysm detectability. RESULTS A total of 510 patients were enrolled, including 215 patients (42.16 %) with intracranial aneurysms and 295 patients (57.84 %) without aneurysms. Compared with six radiologists, the AI platform showed competitive discrimination power (AUC, 0.96), acceptable calibration (Brier Score loss, 0.08), and clinical utility (Net Benefit, 86.96 %). 
The AI platform demonstrated superior performance in detecting aneurysms with an overall SE, SP, ACC, balanced ACC, and F1 score of 91.63 %, 92.20 %, 91.96 %, 91.92 %, and 90.57 %, respectively, outperforming the two resident radiologists. For subgroup analysis based on aneurysm size and location, we observed that the SE of the AI platform for identifying tiny (diameter < 3 mm), small (3 mm ≤ diameter < 5 mm), medium (5 mm ≤ diameter < 7 mm), and large aneurysms (diameter ≥ 7 mm) was 87.80 %, 93.14 %, 95.45 %, and 100 %, respectively. Furthermore, the SE for detecting aneurysms in the anterior circulation was higher than that in the posterior circulation. With AI assistance, six radiologists (i.e., two residents, two attendings, and two professors) achieved statistically significant improvements in mean SE (residents: 71.40 % vs. 88.37 %; attendings: 82.79 % vs. 93.26 %; professors: 90.07 % vs. 97.44 %; P < 0.05) and ACC (residents: 85.29 % vs. 94.12 %; attendings: 91.76 % vs. 97.06 %; professors: 95.29 % vs. 98.82 %; P < 0.05), while no statistically significant change was observed in SP. Overall, the radiologists' mean SE increased by 11.40 %, mean SP by 1.86 %, mean ACC by 5.88 %, mean balanced ACC by 6.63 %, mean F1 score by 7.89 %, and Net Benefit by 12.52 %, while the mean Brier score decreased by 0.06. CONCLUSIONS The deep learning algorithms implemented in the AI platform effectively detected intracranial aneurysms on TOF-MRA and notably enhanced the diagnostic capabilities of radiologists. This indicates that the AI-based auxiliary diagnosis model can provide dependable and precise prediction to improve the diagnostic capacity of radiologists.
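The patient-level metrics reported in this abstract are all functions of one confusion matrix, which makes them easy to cross-check. The sketch below assumes per-patient counts that are not stated in the abstract: 197 true positives and 272 true negatives were back-calculated here from the reported SE and SP and the 215/295 split, so they are an inference, not published figures. The function name is hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard metric definitions from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Assumed counts: 215 patients with aneurysms, 295 without (from the abstract);
# tp and tn are back-calculated from the reported SE and SP, not published.
metrics = confusion_metrics(tp=197, fn=18, tn=272, fp=23)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f} %")
```

Under these assumed counts, the five values round to the percentages given in the abstract, which suggests the reported figures are mutually consistent at the patient level.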
Collapse
Affiliation(s)
- Yuanyuan Li
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China; Department of Radiology, Liaocheng People's Hospital, Shandong First Medical University & Shandong Academy of Medical Sciences, China; Department of Radiology, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, China
| | - Huiling Zhang
- Institute of Research, Infervision Medical Technology Co., Ltd, China
| | - Yun Sun
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China
| | - Qianrui Fan
- Institute of Research, Infervision Medical Technology Co., Ltd, China
| | - Long Wang
- Department of Cardiovascular Surgery, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China
| | - Congshan Ji
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China
| | - Hui Gu
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China; Department of Radiology, Liaocheng People's Hospital, Shandong First Medical University & Shandong Academy of Medical Sciences, China; Department of Radiology, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, China
| | - Baojin Chen
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China
| | - Shuo Zhao
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China; Department of Radiology, Liaocheng People's Hospital, Shandong First Medical University & Shandong Academy of Medical Sciences, China; Department of Radiology, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, China
| | - Dawei Wang
- Institute of Research, Infervision Medical Technology Co., Ltd, China
| | - Pengxin Yu
- Institute of Research, Infervision Medical Technology Co., Ltd, China
| | - Junchen Li
- Department of Radiology, Changshu Hospital Affiliated to Nanjing University of Chinese Medicine, China
| | - Shifeng Yang
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China.
| | - Chuanchen Zhang
- Department of Radiology, Liaocheng People's Hospital, Shandong First Medical University & Shandong Academy of Medical Sciences, China.
| | - Ximing Wang
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, China; Department of Radiology, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, China.
| |
|
47
|
Shyr D, Zhang BM, Saini G, Brewer SC. Exploring Pattern of Relapse in Pediatric Patients with Acute Lymphocytic Leukemia and Acute Myeloid Leukemia Undergoing Stem Cell Transplant Using Machine Learning Methods. J Clin Med 2024; 13:4021. [PMID: 39064061 PMCID: PMC11277799 DOI: 10.3390/jcm13144021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2024] [Revised: 07/02/2024] [Accepted: 07/05/2024] [Indexed: 07/28/2024] Open
Abstract
Background. Leukemic relapse remains the primary cause of treatment failure and death after allogeneic hematopoietic stem cell transplant. Changes in post-transplant donor chimerism have been identified as a predictor of relapse. A better predictive model of relapse incorporating donor chimerism has the potential to improve leukemia-free survival by allowing earlier initiation of post-transplant treatment in individual patients. We explored the use of machine learning, a suite of analytical methods focusing on pattern recognition, to improve post-transplant relapse prediction. Methods. Using a cohort of 63 pediatric patients with acute lymphocytic leukemia (ALL) and 46 patients with acute myeloid leukemia (AML) who underwent stem cell transplant at a single institution, we built predictive models of leukemic relapse with both pre-transplant and post-transplant patient variables (specifically lineage-specific chimerism) using the random forest classifier. Local Interpretable Model-Agnostic Explanations, an interpretable machine learning tool, was used to confirm our random forest classification result. Results. Our analysis showed that a random forest model using these hyperparameter values achieved 85% accuracy, 85% sensitivity, and 89% specificity for ALL, and 81% accuracy, 75% sensitivity, and 100% specificity for AML, at predicting relapses within 24 months post-HSCT in cross validation. The Local Interpretable Model-Agnostic Explanations tool was able to confirm many variables that the random forest classifier identified as important for the relapse prediction. Conclusions. Machine learning methods can reveal the interaction of different risk factors of post-transplant leukemic relapse, and robust predictions can be obtained even with a modest clinical dataset. The random forest classifier distinguished different important predictive factors between ALL and AML in our relapse models, consistent with previous knowledge, lending increased confidence to adopting machine learning prediction in clinical management.
Affiliation(s)
- David Shyr
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, Section of Stem Cell Transplant, Stanford University, Stanford, CA 94305, USA
| | - Bing M. Zhang
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Gopin Saini
- Stem Cell and Gene Therapy Clinical Trial Program, Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
| | - Simon C. Brewer
- Department of Geography, University of Utah, Salt Lake City, UT 84112, USA
| |
|
48
|
Makarov V, Chabbert C, Koletou E, Psomopoulos F, Kurbatova N, Ramirez S, Nelson C, Natarajan P, Neupane B. Good machine learning practices: Learnings from the modern pharmaceutical discovery enterprise. Comput Biol Med 2024; 177:108632. [PMID: 38788373 DOI: 10.1016/j.compbiomed.2024.108632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2024] [Revised: 05/07/2024] [Accepted: 05/18/2024] [Indexed: 05/26/2024]
Abstract
Machine Learning (ML) and Artificial Intelligence (AI) have become an integral part of the drug discovery and development value chain. Many teams in the pharmaceutical industry nevertheless report the challenges associated with the timely, cost effective and meaningful delivery of ML and AI powered solutions for their scientists. We sought to better understand what these challenges were and how to overcome them by performing an industry wide assessment of the practices in AI and Machine Learning. Here we report results of the systematic business analysis of the personas in the modern pharmaceutical discovery enterprise in relation to their work with the AI and ML technologies. We identify 23 common business problems that individuals in these roles face when they encounter AI and ML technologies at work, and describe best practices (Good Machine Learning Practices) that address these issues.
Affiliation(s)
- Vladimir Makarov
- The Pistoia Alliance, 401 Edgewater Place, Suite 600, Wakefield, MA, 01880, USA.
|
49
|
Lee TY, Price D, Yadav CP, Roy R, Lim LHM, Wang E, Wechsler ME, Jackson DJ, Busby J, Heaney LG, Pfeffer PE, Mahboub B, Perng Steve DW, Cosio BG, Perez-de-Llano L, Al-Lehebi R, Larenas-Linnemann D, Al-Ahmad M, Rhee CK, Iwanaga T, Heffler E, Canonica GW, Costello R, Papadopoulos NG, Papaioannou AI, Porsbjerg CM, Torres-Duque CA, Christoff GC, Popov TA, Hew M, Peters M, Gibson PG, Maspero J, Bergeron C, Cerda S, Contreras-Contreras EA, Chen W, Sadatsafavi M. International Variation in Severe Exacerbation Rates in Patients With Severe Asthma. Chest 2024; 166:28-38. [PMID: 38395297 DOI: 10.1016/j.chest.2024.02.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2023] [Revised: 12/07/2023] [Accepted: 02/19/2024] [Indexed: 02/25/2024] Open
Abstract
BACKGROUND Exacerbation frequency strongly influences treatment choices in patients with severe asthma. RESEARCH QUESTION What is the extent of the variability of exacerbation rate across countries and its implications in disease management? STUDY DESIGN AND METHODS We retrieved data from the International Severe Asthma Registry, an international observational cohort of patients with a clinical diagnosis of severe asthma. We identified patients aged ≥ 18 years who did not initiate any biologics prior to baseline visit. A severe exacerbation was defined as the use of oral corticosteroids for ≥ 3 days or asthma-related hospitalization/ED visit. A series of negative binomial models were applied to estimate country-specific severe exacerbation rates during 365 days of follow-up, starting from a naive model with country as the only variable to an adjusted model with country as a random-effect term and patient and disease characteristics as independent variables. RESULTS The final sample included 7,510 patients from 17 countries (56% from the United States), contributing to 1,939 severe exacerbations (0.27/person-year). There was large between-country variation in observed severe exacerbation rate (minimum, 0.04 [Argentina]; maximum, 0.88 [Saudi Arabia]; interquartile range, 0.13-0.54), which remained substantial after adjusting for patient characteristics and sampling variability (interquartile range, 0.16-0.39). INTERPRETATION Individuals with similar patient characteristics but coming from different jurisdictions have varied severe exacerbation risks, even after controlling for patient and disease characteristics. This suggests unknown patient factors or system-level variations at play. Disease management guidelines should recognize such between-country variability. Risk prediction models that are calibrated for each jurisdiction will be needed to optimize treatment strategies.
Affiliation(s)
- Tae Yoon Lee
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore; Respiratory Evaluation Sciences Program, Faculty of Pharmaceutical Sciences, University of British Columbia, Canada
| | - David Price
- Optimum Patient Care Global, Cambridge, England; Observational and Pragmatic Research Institute, Singapore, Singapore; Centre of Academic Primary Care, Division of Applied Health Sciences, University of Aberdeen, Aberdeen, Scotland
| | | | - Rupsa Roy
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Laura Huey Mien Lim
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Eileen Wang
- Division of Allergy & Clinical Immunology, Department of Medicine, National Jewish Health, Denver, CO; Division of Allergy & Clinical Immunology, Department of Medicine, University of Colorado School of Medicine, Aurora, CO
| | - Michael E Wechsler
- NJH Cohen Family Asthma Institute, Department of Medicine, National Jewish Health, Denver, CO
| | - David J Jackson
- UK Severe Asthma Network and National Registry, Guy's and St Thomas' NHS Trust, London, England; School of Immunology & Microbial Sciences, King's College London, London, England
| | - John Busby
- Centre for Public Health, Queen's University Belfast, Belfast, Northern Ireland
| | - Liam G Heaney
- Wellcome-Wolfson Centre for Experimental Medicine, Queen's University Belfast, Belfast, Northern Ireland
| | - Paul E Pfeffer
- Department of Respiratory Medicine, Barts Health NHS Trust, London, England; Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, England
| | - Bassam Mahboub
- College of Medicine, University of Sharjah, Sharjah, United Arab Emirates; Rashid Hospital, Dubai Health Authority, Dubai, United Arab Emirates
| | - Diahn-Warng Perng Steve
- Division of Clinical Respiratory, Physiology Chest Department, Taipei Veterans General Hospital, Taipei City, Taiwan; COPD Assembly of the Asian Pacific Society of Respirology, Tokyo, Japan
| | - Borja G Cosio
- Son Espases University Hospital-IdISBa-Ciberes, Mallorca, Spain
| | - Luis Perez-de-Llano
- Pneumology Service, Lucus Augusti University Hospital, EOXI Lugo, Monforte, Cervo, Spain; Biodiscovery Research Group, Health Research Institute of Santiago de Compostela, Spain
| | - Riyad Al-Lehebi
- Department of Pulmonology, King Fahad Medical City, Riyadh, Saudi Arabia; College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
| | | | - Mona Al-Ahmad
- Microbiology Department, Faculty of Medicine, Kuwait University, Al-Rashed Allergy Center, Ministry of Health, Kuwait
| | - Chin Kook Rhee
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
| | - Takashi Iwanaga
- Center for General Medical Education and Clinical Training, Kindai University Hospital, Osakasayama, Japan
| | - Enrico Heffler
- Personalized Medicine, Asthma and Allergy, Humanitas Clinical and Research Center IRCCS, Rozzano, Milan, Italy; Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
| | - Giorgio Walter Canonica
- Personalized Medicine, Asthma and Allergy, Humanitas Clinical and Research Center IRCCS, Rozzano, Milan, Italy; Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
| | - Richard Costello
- Clinical Research Centre, Smurfit Building Beaumont Hospital, Department of Respiratory Medicine, RCSI, Dublin, Ireland
| | - Nikolaos G Papadopoulos
- Division of Infection, Immunity & Respiratory Medicine, University of Manchester, Manchester, England; Allergy Department, 2nd Pediatric Clinic, University of Athens, Athens, Greece
| | - Andriana I Papaioannou
- 2nd Respiratory Medicine Department, National and Kapodistrian University of Athens Medical School, Attikon University Hospital, Athens, Greece
| | - Celeste M Porsbjerg
- Respiratory Research Unit, Bispebjerg University Hospital, Copenhagen, Denmark
| | - Carlos A Torres-Duque
- CINEUMO, Respiratory Research Center, Fundación Neumológica Colombiana, Bogotá, Colombia
| | | | - Todor A Popov
- Clinic of Occupational Diseases, University Hospital "Sv. Ivan Rilski", Sofia, Bulgaria
| | - Mark Hew
- Allergy, Asthma & Clinical Immunology Service, Alfred Health, Melbourne, Australia; Public Health and Preventive Medicine, Monash University, Melbourne, Australia
| | - Matthew Peters
- Department of Thoracic Medicine, Concord Hospital, Sydney, Australia
| | - Peter G Gibson
- Australian Severe Asthma Network, Priority Research Centre for Healthy Lungs, University of Newcastle, Newcastle, Australia; Hunter Medical Research Institute, Department of Respiratory and Sleep Medicine, John Hunter Hospital, New Lambton Heights, Australia
| | - Jorge Maspero
- Clinical Research for Allergy and Respiratory Medicine, CIDEA Foundation, Buenos Aires, Argentina; University Career of Specialists in Allergy and Clinical Immunology, Buenos Aires University School of Medicine, Buenos Aires, Argentina
| | - Celine Bergeron
- Centre for Lung Health, Vancouver General Hospital, University of British Columbia, Vancouver, BC, Canada
| | - Saraid Cerda
- Medical Specialties Unit, Secretary of National Defense, Mexico City, Mexico
| | - Elvia Angelica Contreras-Contreras
- Mexican Council of Clinical Immunology and Allergy, Mexico City Office, Mexico City, Mexico; Department of Allergy and Clinical Immunology, Lic. Adolfo López Mateos Regional Hospital of the Institute of Security and Social Services for State Workers (ISSSTE), Mexico City, Mexico
| | - Wenjia Chen
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore.
| | - Mohsen Sadatsafavi
- Respiratory Evaluation Sciences Program, Faculty of Pharmaceutical Sciences, University of British Columbia, Canada
| |
|
50
|
Wang Y, Fu W, Zhang Y, Wang D, Gu Y, Wang W, Xu H, Ge X, Ye C, Fang J, Su L, Wang J, He W, Zhang X, Feng R. Constructing and implementing a performance evaluation indicator set for artificial intelligence decision support systems in pediatric outpatient clinics: an observational study. Sci Rep 2024; 14:14482. [PMID: 38914707 PMCID: PMC11196575 DOI: 10.1038/s41598-024-64893-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Accepted: 06/13/2024] [Indexed: 06/26/2024] Open
Abstract
Artificial intelligence (AI) decision support systems in pediatric healthcare have a complex application background. As an AI decision support system (AI-DSS) can be costly, once applied, it is crucial to focus on its performance, interpret its success, and then monitor and update it to ensure consistent ongoing success. Therefore, a set of evaluation indicators was explicitly developed for AI-DSS in pediatric healthcare, enabling continuous and systematic performance monitoring. The study unfolded in two stages. The first stage encompassed establishing the evaluation indicator set through a literature review, a focus group interview, and expert consultation using the Delphi method. In the second stage, weight analysis was conducted. Subjective weights were calculated based on expert opinions through the analytic hierarchy process, while objective weights were determined using the entropy weight method. Subsequently, subjective and objective weights were synthesized to form the combined weight. In the two rounds of expert consultation, the authority coefficients were 0.834 and 0.846; Kendall's coordination coefficient was 0.135 in Round 1 and 0.312 in Round 2. The final evaluation indicator set has three first-class indicators, fifteen second-class indicators, and forty-seven third-class indicators. Indicator I-1 (Organizational performance) carries the highest weight, followed by Indicator I-2 (Societal performance) and Indicator I-3 (User experience performance) in the objective and combined weights. Conversely, 'Societal performance' holds the most weight among the subjective weights, followed by 'Organizational performance' and 'User experience performance'. In this study, a comprehensive and specialized set of evaluation indicators for the AI-DSS in the pediatric outpatient clinic was established and then implemented. Continuous evaluation still requires long-term data collection to optimize the weight proportions of the established indicators.
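As a rough illustration of the objective weighting step, the sketch below implements the textbook entropy weight method on a small matrix of positive indicator scores. The abstract does not state how subjective and objective weights were synthesized, so the multiplicative rule in `combine_weights` is purely an assumption for illustration; the function names and the input layout (rows as observations, columns as indicators) are likewise hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method for a matrix of positive scores.

    matrix: list of rows (observations), each a list of indicator values.
    Assumes all values are positive and at least one indicator varies.
    """
    n = len(matrix)
    m = len(matrix[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [x / total for x in column]  # share of each observation
        # Shannon entropy normalized to [0, 1] by ln(n)
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)    # more variation, more weight
    scale = sum(diversities)
    return [d / scale for d in diversities]


def combine_weights(subjective, objective):
    """Assumed multiplicative synthesis, renormalized to sum to 1."""
    products = [a * b for a, b in zip(subjective, objective)]
    total = sum(products)
    return [p / total for p in products]


# Illustrative scores: the first indicator never varies, so it gets zero weight.
objective = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
combined = combine_weights([0.5, 0.5], [0.2, 0.8])
```

An indicator whose scores are identical across observations has maximal entropy and therefore receives zero objective weight, which is why expert-derived subjective weights are often combined with entropy weights rather than replaced by them.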
Affiliation(s)
- Yingwen Wang
- Nursing Department, Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Weijia Fu
- Medical Information Center, Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Yuejie Zhang
- School of Computer Science, Fudan University, Shanghai, 200438, China
| | - Daoyang Wang
- School of Public Health, Fudan University, Shanghai, 200032, China
| | - Ying Gu
- Nursing Department, Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Weibing Wang
- School of Public Health, Fudan University, Shanghai, 200032, China
| | - Hong Xu
- Nephrology Department, Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Xiaoling Ge
- Statistical and Data Management Center, Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Chengjie Ye
- Medical Information Center, Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Jinwu Fang
- School of Public Health, Fudan University, Shanghai, 200032, China
| | - Ling Su
- Statistical and Data Management Center, Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Jiayu Wang
- National Health Commission Key Laboratory of Neonatal Diseases (Fudan University), Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Wen He
- Respiratory Department, Children's Hospital of Fudan University, Shanghai, 201102, China
| | - Xiaobo Zhang
- Respiratory Department, Children's Hospital of Fudan University, Shanghai, 201102, China.
| | - Rui Feng
- School of Computer Science, Fudan University, Shanghai, 200438, China.
- School of Computer Science, Fudan University, 2005 Songhu Road, Shanghai, 200438, China.
| |
|