1
Schöler LM, Graf L, Airola A, Ritzi A, Simon M, Peltonen LM. Determining the ground truth for the prediction of delirium in adult patients in acute care: a scoping review. JAMIA Open 2025; 8:ooaf037. [PMID: 40421319 PMCID: PMC12105575 DOI: 10.1093/jamiaopen/ooaf037] [Received: 02/10/2025] [Revised: 04/10/2025] [Accepted: 04/25/2025] [Indexed: 05/28/2025]
Abstract
Objective Delirium is a severe condition, often underreported and linked to adverse outcomes such as increased mortality and prolonged hospitalization. Despite its significance, delirium prediction is often hindered by underreporting and inconsistent labeling, highlighting the need for models trained on reliably labeled data (ground truth). This review examines (i) practices for determining labels in delirium prediction models and (ii) how study designs affect label quality, aiming to identify key considerations for improving model reliability. Materials and Methods A search of Cochrane, PubMed, and IEEE identified 120 studies that met the inclusion criteria. Results To establish the ground truth, 40.8% of studies used routine data, while 42.5% used primary data. The Confusion Assessment Method (CAM) was the most widely used assessment tool (60.0%). Label and data leakage occurred in 35.0% of studies. High risk of bias (RoB) was a recurring issue, with 31.7% of studies lacking sufficient reporting and 36.7% showing inadequate outcome determination. Studies using primary data had lower RoB, whereas those with unclear label sources had higher RoB. Discussion Our findings underscore the importance of careful planning in determining the ground truth, a step frequently neglected in existing studies. To address these challenges, we provide a decision support flowchart to guide the development of more accurate and reliable prediction models. Conclusion This review uncovers significant variability in labeling methods and discusses how this may affect the reliability of delirium prediction models. It highlights the importance of addressing underreporting bias and provides guidance for developing more robust models.
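The review found label and data leakage in 35.0% of studies. The following Python sketch illustrates two common safeguards against leakage; it is not the review's own procedure. It splits admissions at the patient level, so the same patient never appears in both training and test data, and it keeps only predictor events recorded before the prediction time, so information generated at or after the outcome assessment cannot leak into the features. The record layout (a `patient_id` key per admission, a numeric `time` key per event) and both function names are hypothetical.

```python
import random
from collections import defaultdict


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split admission records so that no patient appears in both train and test.

    Each record is assumed to be a dict with a hypothetical "patient_id" key.
    """
    by_patient = defaultdict(list)
    for record in records:
        by_patient[record["patient_id"]].append(record)

    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)  # reproducible shuffle of patients, not rows
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [r for pid in patient_ids if pid not in test_ids for r in by_patient[pid]]
    test = [r for pid in patient_ids if pid in test_ids for r in by_patient[pid]]
    return train, test


def events_before(events, prediction_time):
    """Keep only events recorded strictly before the prediction time.

    Events are assumed to be dicts with a hypothetical numeric "time" key; anything
    recorded at or after prediction_time (e.g. a delirium assessment) is dropped.
    """
    return [e for e in events if e["time"] < prediction_time]


# Seven hypothetical admissions from five patients (a and c have two each).
admissions = [{"patient_id": pid, "admission": i} for i, pid in enumerate("aabccde")]
train, test = patient_level_split(admissions, test_fraction=0.4, seed=1)
assert not {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
print(len(train) + len(test))  # 7
```

Library utilities such as scikit-learn's `GroupShuffleSplit` serve the same purpose as `patient_level_split`; the standard-library version keeps the sketch self-contained.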
Affiliation(s)
- Lili M Schöler
- Department of Nursing, Medical Center—University of Freiburg, Freiburg 79106, Germany
- Department of Nursing Science, University of Turku, Turku 20520, Finland
- Lisa Graf
- Department of Neurology, Medical Center—University of Freiburg, Freiburg 79106, Germany
- Neurorobotics Lab, Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
- Antti Airola
- Department of Computing, University of Turku, Turku 20500, Finland
- Alexander Ritzi
- Department of Nursing, Medical Center—University of Freiburg, Freiburg 79106, Germany
- Centre for Geriatric Medicine and Gerontology (ZGGF), Medical Center—University of Freiburg, Freiburg 79106, Germany
- Michael Simon
- Institute of Nursing Science, Department of Public Health, University of Basel, Basel 4056, Switzerland
- Laura-Maria Peltonen
- Department of Nursing Science, University of Turku, Turku 20520, Finland
- Research Services, The Wellbeing Services County of Southwest Finland, Turku 20521, Finland
2
Pabón J, Gómez D, Cerón JD, Salazar-Cabrera R, López DM, Blobel B. A Comprehensive Dataset for Activity of Daily Living (ADL) Research Compiled by Unifying and Processing Multiple Data Sources. J Pers Med 2025; 15:210. [PMID: 40423081 DOI: 10.3390/jpm15050210] [Received: 02/20/2025] [Revised: 05/12/2025] [Accepted: 05/15/2025] [Indexed: 05/28/2025]
Abstract
Background: Activities of Daily Living (ADLs) are essential tasks performed at home, and their assessment is used in healthcare to monitor sedentary behavior, track rehabilitation therapy, and monitor chronic obstructive pulmonary disease. The Barthel Index, used by healthcare professionals, has limitations due to its subjectivity. Human activity recognition (HAR), which uses Information and Communication Technologies (ICTs), offers a more accurate way to assess ADLs. This work aims to create a single, adaptable, and heterogeneous ADL dataset that integrates information from various sources, ensuring a rich representation of different individuals and environments. Methods: A literature review was conducted in Scopus, the University of California Irvine (UCI) Machine Learning Repository, Google Dataset Search, and the University of Cauca Repository to obtain datasets related to ADLs. Inclusion criteria were defined, and a list of dataset characteristics was compiled to guide the integration of multiple datasets. Twenty-nine datasets were identified, including data from various accelerometers, gyroscopes, inclinometers, and heart rate monitors. These datasets were classified and analyzed. Tasks such as dataset selection, categorization, analysis, cleaning, normalization, and data integration were performed. Results: The resulting unified dataset contained 238,990 samples, 56 activities, and 52 columns. The integrated dataset features a wealth of information from diverse individuals and environments, improving its adaptability for various applications. Conclusions: The dataset can be used in various data science projects related to ADL and HAR and, owing to the integration of diverse data sources, is potentially useful for addressing bias and improving the generalizability of machine learning models.
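The abstract lists cleaning, normalization, and integration steps without detailing them. As a minimal sketch, assuming two hypothetical sources whose column names and activity labels differ, integration can be expressed as renaming columns to a shared schema, harmonizing label spellings, tagging each row with its source, and rescaling sensor columns. `COLUMN_MAPS`, `ACTIVITY_MAP`, the column names, and the choice of min-max scaling are all illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical mapping from each source's column names to a shared schema.
COLUMN_MAPS = {
    "source_a": {"acc_x": "acc_x", "acc_y": "acc_y", "label": "activity"},
    "source_b": {"AccX": "acc_x", "AccY": "acc_y", "Activity": "activity"},
}

# Hypothetical harmonization of activity labels that differ only in spelling.
ACTIVITY_MAP = {"walk": "walking", "Walking": "walking", "sit": "sitting", "Sitting": "sitting"}


def unify(sources):
    """Rename columns, harmonize labels and tag each row with its source dataset.

    Columns not listed in a source's mapping are dropped.
    """
    rows = []
    for name, records in sources.items():
        mapping = COLUMN_MAPS[name]
        for record in records:
            row = {mapping[k]: v for k, v in record.items() if k in mapping}
            if "activity" in row:
                row["activity"] = ACTIVITY_MAP.get(row["activity"], row["activity"])
            row["source"] = name
            rows.append(row)
    return rows


def min_max_normalize(rows, columns):
    """Rescale each numeric column to [0, 1] across all rows that have a value."""
    for col in columns:
        values = [r[col] for r in rows if r.get(col) is not None]
        if not values:
            continue
        lo, hi = min(values), max(values)
        for r in rows:
            if r.get(col) is not None:
                # A constant column has no spread; map it to 0.0 instead of dividing by zero.
                r[col] = 0.0 if hi == lo else (r[col] - lo) / (hi - lo)
    return rows


data = {
    "source_a": [{"acc_x": 1.0, "acc_y": 2.0, "label": "walk"}],
    "source_b": [{"AccX": 3.0, "AccY": 2.0, "Activity": "Sitting"}],
}
print(min_max_normalize(unify(data), ["acc_x", "acc_y"]))
```

Keeping a `source` column makes it possible to check later whether a model's errors concentrate in particular datasets. When the unified table is used for model training, the scaling bounds would normally be computed on the training split only.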
Affiliation(s)
- Jaime Pabón
- Telematics Engineering Research Group, Telematics Department, Universidad del Cauca, Popayán 190002, Colombia
- Daniel Gómez
- Telematics Engineering Research Group, Telematics Department, Universidad del Cauca, Popayán 190002, Colombia
- Jesús D Cerón
- Telematics Engineering Research Group, Telematics Department, Universidad del Cauca, Popayán 190002, Colombia
- Ricardo Salazar-Cabrera
- Telematics Engineering Research Group, Telematics Department, Universidad del Cauca, Popayán 190002, Colombia
- Diego M López
- Telematics Engineering Research Group, Telematics Department, Universidad del Cauca, Popayán 190002, Colombia
- Bernd Blobel
- Medical Faculty, University of Regensburg, 93053 Regensburg, Germany
- eHealth Competence Center Bavaria, Deggendorf Institute of Technology, 94469 Deggendorf, Germany
- First Medical Faculty, Charles University Prague, 12800 Prague, Czech Republic
3
Friedman JI, Parchure P, Cheng FY, Fu W, Cheertirala S, Timsina P, Raut G, Reina K, Joseph-Jimerson J, Mazumdar M, Freeman R, Reich DL, Kia A. Machine Learning Multimodal Model for Delirium Risk Stratification. JAMA Netw Open 2025; 8:e258874. [PMID: 40332938 PMCID: PMC12059973 DOI: 10.1001/jamanetworkopen.2025.8874] [Received: 11/25/2024] [Accepted: 03/05/2025] [Indexed: 05/08/2025]
Abstract
Importance Automating the identification of risk for developing hospital delirium with models that use machine learning (ML) could facilitate more rapid prevention, identification, and treatment of delirium. However, there are very few reports on the performance of ML models for delirium risk stratification in live clinical practice. Objective To report on the development, operationalization, and validation of a multimodal ML model for delirium risk stratification in live clinical practice and its associations with workflow and clinical outcomes. Design, Setting, and Participants This quality improvement study developed an ML model supported by automated electronic medical records to stratify the risk of non-intensive care unit delirium in live clinical practice, using the Confusion Assessment Method as the diagnostic reference standard and an iterative model update method. Data from patients aged at least 60 years admitted to non-intensive care units at Mount Sinai Hospital between January 2016 and January 2020 were used to train and test the ML model. The model was validated in live clinical practice from March 2023 to March 2024. Analysis of the model's associations with workflow and clinical outcomes was conducted retrospectively in 2024, comparing hospitalized patients before deployment of any model version (pre-ML cohort) and during clinical deployment of the model (post-ML cohort). Main Outcomes and Measures Outcomes of interest were area under the receiver operating characteristic curve, monthly delirium detection rates, median length of hospital stay, and daily doses of opiate, benzodiazepine, and antipsychotic medications administered. Results The overall sample included 32 284 inpatient admissions (mean [SD] age, 73.56 [9.67] years; 15 157 [46.9%] women).
A total of 25 261 inpatient admissions of older patients with both medical and surgical primary diagnoses represented the combined model testing and training cohort (median [IQR] age, 73.37 [66.42-81.36] years) and live clinical deployment validation cohort (median [IQR] age, 72.11 [62.26-78.97] years), while 7023 inpatient admissions of older patients with both medical and surgical primary diagnoses represented the combined pre-ML (median [IQR] age, 74.00 [68.00-81.00] years) and post-ML (median [IQR] age, 75.33 [68.34-82.91] years) cohorts. The model fuses electronic medical record patient data features with clinical note features processed by natural language processing. In live clinical practice, model validation yielded an area under the curve of 0.94 (95% CI, 0.93-0.95). Median (IQR) monthly delirium detection rates of inpatients assessed for delirium with the Confusion Assessment Method increased from 4.42% (95% CI, 3.70%-5.14%) in the pre-ML cohort to 17.17% (95% CI, 15.54%-18.80%) in the post-ML cohort (P < .001). Compared with the pre-ML cohort, the post-ML cohort received lower daily doses of benzodiazepines (median [IQR], 0.93 [0.42-2.28] diazepam dose equivalents vs 1.60 [0.66-4.27] diazepam dose equivalents; P < .001) and olanzapine (median [IQR], 1.09 [0.38-2.46] mg vs 2.50 [1.17-6.65] mg; P < .001). Conclusions and Relevance This quality improvement study demonstrates the feasibility of a novel multimodal ML model to automate delirium risk stratification in live clinical practice. The model demonstrated acceptable performance in live clinical practice and may facilitate resource allocation to enhance delirium identification and care.
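The reported area under the curve (0.94) can be read as the probability that a randomly chosen admission with delirium receives a higher risk score than a randomly chosen admission without it. The sketch below computes AUROC directly from that definition by comparing every positive-negative pair and counting ties as one half. The scores and labels are hypothetical; for cohorts of the size reported here, a rank-based implementation such as scikit-learn's `roc_auc_score` is far more efficient.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as 0.5).

    This pairwise version is O(n_pos * n_neg) and meant for illustration only.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for four admissions (1 = CAM-positive delirium).
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Three of the four positive-negative pairs are ranked correctly in the example, hence 0.75.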
Affiliation(s)
- Joseph I. Friedman
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York
- Prathamesh Parchure
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- Fu-Yuan Cheng
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- Weijia Fu
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- Satyanarayana Cheertirala
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- Prem Timsina
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- Ganesh Raut
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- Katherine Reina
- Nursing Administration, Mount Sinai Morningside Hospital, New York, New York
- Madhu Mazumdar
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York
- Robert Freeman
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- David L. Reich
- Department of Anesthesiology, Perioperative, and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, New York
- Arash Kia
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York
- Department of Anesthesiology, Perioperative, and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, New York
4
Heikal M, Saad H, Ghanime PM, Bou Dargham T, Bizri M, Kobeissy F, El Hajj W, Talih F. Using Machine Learning and Electronic Health Records to Identify Neuropsychiatric Risk Scores for Delirium in ICU and General Hospital Settings. Neuropsychiatr Dis Treat 2024; 20:1861-1876. [PMID: 39372875 PMCID: PMC11456270 DOI: 10.2147/ndt.s479756] [Received: 07/31/2024] [Accepted: 09/24/2024] [Indexed: 10/08/2024]
Abstract
Objective Delirium is a common and acute neuropsychiatric syndrome that requires timely intervention to prevent its associated morbidity and mortality. Yet its diagnosis and symptoms are often overlooked owing to its variable clinical presentation and fluctuating nature. In this study, we address barriers to delirium diagnosis with a machine learning-based predictive algorithm for incident delirium that relies on archived electronic health record (EHR) data. Methods We used the Medical Information Mart for Intensive Care (MIMIC) database to create a detailed dataset for identifying delirium in intensive care unit (ICU) patients. Our approach involved training machine learning models on this dataset to pinpoint critical clinical features for delirium detection. These features were then refined and applied to non-ICU patients using EHRs from the American University of Beirut Medical Center (AUBMC). Results We assessed Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Classification and Regression Trees (CART), Random Forest (RF), Neural Oblivious Decision Ensembles (NODE), and Logistic Regression (LR) models for delirium detection in different clinical settings. The CatBoost model performed best in ICU settings, with an F1 score of 89.2%, while XGBoost performed best in general hospital settings, with an F1 score of 75.4%. Interpretations using Tabular Local Interpretable Model-agnostic Explanations (LIME) revealed critical indicators such as prothrombin time and hematocrit levels, enhancing model transparency and clinical applicability. These clinical insights help differentiate delirium predictors between ICU and general hospital patients, as ICU patients are often sensitive to various factors.
Conclusion The proposed predictive algorithm improves delirium detection rates and the efficiency of hospital electronic systems, enabling prompt interventions to prevent delirium progression and associated complications. The clinical indicators for delirium that we identified in general hospital and ICU settings can greatly help healthcare professionals identify potential causes of delirium and reduce misdiagnosis.
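The F1 scores reported for CatBoost (89.2%) and XGBoost (75.4%) are the harmonic mean of precision and recall. A minimal sketch from confusion-matrix counts follows; the counts in the example are hypothetical and not taken from the study, and returning 0 when there are no true positives is one common convention rather than a universal rule.

```python
def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall (sensitivity).

    True negatives do not enter the formula, so F1 reflects performance on the
    positive (delirium) class rather than overall accuracy.
    """
    if tp == 0:
        return 0.0  # precision and recall are both 0 or undefined; report 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts: precision 0.75, recall 0.5.
print(round(f1_score(tp=6, fp=2, fn=6), 3))  # 0.6
```

Because true negatives are ignored, F1 is less flattered than accuracy by the many correctly classified non-delirium patients, which is one reason it is often preferred for imbalanced outcomes.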
Affiliation(s)
- Mariam Heikal
- Department of Computer Science, American University of Beirut, Beirut, Lebanon
- Halim Saad
- Department of Psychiatry, Faculty of Medicine, American University of Beirut, Beirut, Lebanon
- Pia Maria Ghanime
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Tarek Bou Dargham
- Department of Neurosurgery, Duke University Medical Center, Durham, NC, USA
- Maya Bizri
- Department of Psychiatry and Psychology, Cleveland Clinic, Cleveland, OH, USA
- Firas Kobeissy
- Department of Neurobiology, Morehouse School of Medicine, Atlanta, GA, USA
- Wassim El Hajj
- Department of Computer Science, American University of Beirut, Beirut, Lebanon
- Farid Talih
- Department of Psychiatry, Faculty of Medicine, American University of Beirut, Beirut, Lebanon
5
Weidmann AE, Watson EW. Novel opportunities for clinical pharmacy research: development of a machine learning model to identify medication related causes of delirium in different patient groups. Int J Clin Pharm 2024; 46:992-995. [PMID: 38594470 PMCID: PMC11286716 DOI: 10.1007/s11096-024-01707-z] [Received: 01/02/2024] [Accepted: 01/22/2024] [Indexed: 04/11/2024]
Abstract
The advent of artificial intelligence (AI) technologies took the world of science by storm in 2023. The opportunities this easy-to-access technology offers for clinical pharmacy research are yet to be fully understood. As a proof of concept, a custom-made large language model (LLM), DELSTAR, was developed to help identify and summarise medication-related information on delirium; it was trained on a wide range of internationally recognised scientific publication databases, pharmacovigilance sites and international product characteristics. Its development identified new facilitators and barriers for robust clinical pharmacy practice research. This technology holds great promise for the development of much more comprehensive prescribing guidelines and practice support applications for clinical pharmacy, for increased patient and prescribing safety, and for the resulting implications for healthcare costs. The challenge will be to ensure its methodologically robust use and the detailed and transparent verification of the accuracy of its information.
Affiliation(s)
- Anita Elaine Weidmann
- Department of Clinical Pharmacy, Institute of Pharmacy, Innsbruck University, Innrain 80, 6020, Innsbruck, Austria
- Edward William Watson
- Department of Media and Learning Technology, Innsbruck University, Innrain 52, 6020, Innsbruck, Austria
6
Snigurska UA, Liu Y, Ser SE, Macieira TGR, Ansell M, Lindberg D, Prosperi M, Bjarnadottir RI, Lucero RJ. Risk of bias in prognostic models of hospital-induced delirium for medical-surgical units: A systematic review. PLoS One 2023; 18:e0285527. [PMID: 37590196 PMCID: PMC10434879 DOI: 10.1371/journal.pone.0285527] [Received: 10/12/2022] [Accepted: 04/25/2023] [Indexed: 08/19/2023]
Abstract
PURPOSE The purpose of this systematic review was to assess risk of bias in existing prognostic models of hospital-induced delirium for medical-surgical units. METHODS APA PsycInfo, CINAHL, MEDLINE, and Web of Science Core Collection were searched on July 8, 2022, to identify original studies that developed and validated prognostic models of hospital-induced delirium for adult patients hospitalized in medical-surgical units. The Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies was used for data extraction. The Prediction Model Risk of Bias Assessment Tool was used to assess risk of bias across four domains: participants, predictors, outcome, and analysis. RESULTS Thirteen studies were included in the qualitative synthesis: ten model development and validation studies and three validation-only studies. All studies were rated at high overall risk of bias. The methods of statistical analysis were the greatest source of bias. External validity of models in the included studies was tested at low levels of transportability. CONCLUSIONS Our findings highlight the ongoing scientific challenge of developing a valid prognostic model of hospital-induced delirium for medical-surgical units to tailor preventive interventions to patients who are at high risk of this iatrogenic condition. With limited knowledge about generalizable prognosis of hospital-induced delirium in medical-surgical units, existing prognostic models should be used with caution when creating clinical practice policies. Future research protocols must include robust study designs that take into account the perspectives of clinicians to identify and validate risk factors of hospital-induced delirium for accurate and generalizable prognosis in medical-surgical units.
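The finding that every study was at high overall risk of bias, with analysis as the largest contributor, follows from how PROBAST combines its domains: a single high-risk domain makes the overall judgement high. The sketch below encodes that combination rule for the four domains named in the abstract. It is a simplification; PROBAST guidance also allows, for example, downgrading a model developed without external validation even when all domains are rated low, which this sketch does not model. The `RoB` enum and the function name are illustrative.

```python
from enum import Enum


class RoB(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# The four PROBAST domains named in the abstract.
DOMAINS = ("participants", "predictors", "outcome", "analysis")


def overall_risk_of_bias(ratings):
    """Combine domain ratings: any HIGH -> HIGH; all LOW -> LOW; otherwise UNCLEAR."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    values = [ratings[d] for d in DOMAINS]
    if any(v is RoB.HIGH for v in values):
        return RoB.HIGH
    if all(v is RoB.LOW for v in values):
        return RoB.LOW
    return RoB.UNCLEAR


# A study rated low everywhere except the analysis domain is high risk overall.
study = {"participants": RoB.LOW, "predictors": RoB.LOW, "outcome": RoB.LOW, "analysis": RoB.HIGH}
print(overall_risk_of_bias(study))  # RoB.HIGH
```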
Affiliation(s)
- Urszula A. Snigurska
- Department of Family, Community, and Health Systems Science, College of Nursing, University of Florida, Gainesville, FL, United States of America
- Yiyang Liu
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, United States of America
- Sarah E. Ser
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, United States of America
- Tamara G. R. Macieira
- Department of Family, Community, and Health Systems Science, College of Nursing, University of Florida, Gainesville, FL, United States of America
- Margaret Ansell
- Health Science Center Libraries, George A. Smathers Libraries, University of Florida, Gainesville, FL, United States of America
- David Lindberg
- Department of Statistics, College of Liberal Arts and Sciences, University of Florida, Gainesville, FL, United States of America
- Mattia Prosperi
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, United States of America
- Ragnhildur I. Bjarnadottir
- Department of Family, Community, and Health Systems Science, College of Nursing, University of Florida, Gainesville, FL, United States of America
- Robert J. Lucero
- Department of Family, Community, and Health Systems Science, College of Nursing, University of Florida, Gainesville, FL, United States of America
- School of Nursing, University of California Los Angeles, Los Angeles, CA, United States of America
7
Strating T, Shafiee Hanjani L, Tornvall I, Hubbard R, Scott IA. Navigating the machine learning pipeline: a scoping review of inpatient delirium prediction models. BMJ Health Care Inform 2023; 30:e100767. [PMID: 37407226 PMCID: PMC10335592 DOI: 10.1136/bmjhci-2023-100767] [Received: 03/21/2023] [Accepted: 06/12/2023] [Indexed: 07/07/2023]
Abstract
OBJECTIVES Early identification of inpatients at risk of developing delirium and implementation of preventive measures could avoid up to 40% of delirium cases. Machine learning (ML)-based prediction models may enable risk stratification and targeted intervention, but establishing their current stage of development requires a scoping review of recent literature. METHODS We searched ten databases up to June 2022 for studies of ML-based delirium prediction models. Eligibility criteria comprised use of at least one ML prediction method in an adult hospital inpatient population, publication in English, and reporting of at least one performance measure (area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive or negative predictive value). Included models were categorised by their stage of maturation and assessed for performance, utility and user acceptance in clinical practice. RESULTS Among 921 screened studies, 39 met eligibility criteria. In-silico performance was consistently high (median AUROC: 0.85); however, only six articles (15.4%) reported external validation, revealing degraded performance (median AUROC: 0.75). Three studies (7.7%) of models deployed within clinical workflows reported high accuracy (median AUROC: 0.92) and high user acceptance. DISCUSSION ML models have potential to identify inpatients at risk of developing delirium before symptom onset. However, few models were externally validated and even fewer underwent prospective evaluation in clinical settings. CONCLUSION This review confirms a rapidly growing body of research into using ML for predicting delirium risk in hospital settings. Our findings offer insights for both developers and clinicians into the strengths and limitations of current ML delirium prediction applications, which aim to support but not usurp clinician decision-making.
Affiliation(s)
- Tom Strating
- Centre for Health Services Research, The University of Queensland Faculty of Medicine, Brisbane, Queensland, Australia
- Leila Shafiee Hanjani
- Centre for Health Services Research, The University of Queensland Faculty of Medicine, Brisbane, Queensland, Australia
- Ida Tornvall
- Centre for Health Services Research, The University of Queensland Faculty of Medicine, Brisbane, Queensland, Australia
- Ruth Hubbard
- Centre for Health Services Research, The University of Queensland Faculty of Medicine, Brisbane, Queensland, Australia
- Ian A Scott
- Centre for Health Services Research, The University of Queensland Faculty of Medicine, Brisbane, Queensland, Australia
- Internal Medicine and Clinical Epidemiology, Princess Alexandra Hospital, Woolloongabba, Queensland, Australia
8
Tornero-Costa R, Martinez-Millana A, Azzopardi-Muscat N, Lazeri L, Traver V, Novillo-Ortiz D. Methodological and Quality Flaws in the Use of Artificial Intelligence in Mental Health Research: Systematic Review. JMIR Ment Health 2023; 10:e42045. [PMID: 36729567 PMCID: PMC9936371 DOI: 10.2196/42045] [Received: 08/19/2022] [Revised: 11/02/2022] [Accepted: 11/20/2022] [Indexed: 02/03/2023]
Abstract
BACKGROUND Artificial intelligence (AI) is giving rise to a revolution in medicine and health care. Mental health conditions are highly prevalent in many countries, and the COVID-19 pandemic has increased the risk of further erosion of mental well-being in the population. Therefore, it is relevant to assess the current status of the application of AI to mental health research to identify trends, gaps, opportunities, and challenges. OBJECTIVE This study aims to perform a systematic overview of AI applications in mental health in terms of methodologies, data, outcomes, performance, and quality. METHODS A systematic search of the PubMed, Scopus, IEEE Xplore, and Cochrane databases was conducted to collect records of use cases of AI for mental health disorder studies from January 2016 to November 2021. Records were considered eligible if they described a practical implementation of AI in clinical trials involving mental health conditions. Records of AI study cases were evaluated and categorized according to the International Classification of Diseases 11th Revision (ICD-11). Data related to trial settings, collection methodology, features, outcomes, and model development and evaluation were extracted following the CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) guideline. Risk of bias was also evaluated. RESULTS A total of 429 nonduplicated records were retrieved from the databases, and 129 were included for full assessment, 18 of which were manually added. The distribution of AI applications in mental health was unbalanced across ICD-11 mental health categories. Predominant categories were Depressive disorders (n=70) and Schizophrenia or other primary psychotic disorders (n=26). Most interventions were based on randomized controlled trials (n=62), followed by prospective cohorts (n=24) among observational studies.
AI was typically applied to evaluate the quality of treatments (n=44) or to stratify patients into subgroups and clusters (n=31). Models usually applied a combination of questionnaires and scales to assess symptom severity using electronic health records (n=49) as well as medical images (n=33). Quality assessment revealed important flaws in the process of AI application and in data preprocessing pipelines. One-third of the studies (n=56) did not report any preprocessing or data preparation. One-fifth of the models were developed by comparing several methods (n=35) without assessing their suitability in advance, and a small proportion reported external validation (n=21). Only 1 paper reported a second assessment of a previous AI model. Risk of bias and transparent reporting yielded low scores due to poor reporting of the strategy for adjusting hyperparameters and coefficients and of the explainability of the models. International collaboration was rare (n=17), and data and developed models mostly remained private (n=126). CONCLUSIONS These significant shortcomings, alongside the lack of information to ensure reproducibility and transparency, are indicative of the challenges that AI in mental health needs to face before contributing to a solid base for knowledge generation and for being a support tool in mental health management.
Affiliation(s)
- Roberto Tornero-Costa
- Instituto Universitario de Investigación de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Valencia, Spain
- Antonio Martinez-Millana
- Instituto Universitario de Investigación de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Valencia, Spain
- Natasha Azzopardi-Muscat
- Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Copenhagen, Denmark
- Ledia Lazeri
- Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Copenhagen, Denmark
- Vicente Traver
- Instituto Universitario de Investigación de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Valencia, Spain
- David Novillo-Ortiz
- Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Copenhagen, Denmark
9
Song Y, Yang X, Luo Y, Ouyang C, Yu Y, Ma Y, Li H, Lou J, Liu Y, Chen Y, Cao J, Mi W. Comparison of logistic regression and machine learning methods for predicting postoperative delirium in elderly patients: A retrospective study. CNS Neurosci Ther 2022; 29:158-167. [PMID: 36217732 PMCID: PMC9804041 DOI: 10.1111/cns.13991] [Received: 06/30/2022] [Revised: 09/16/2022] [Accepted: 09/23/2022] [Indexed: 02/06/2023]
Abstract
AIMS To compare the performance of logistic regression and machine learning methods in predicting postoperative delirium (POD) in elderly patients. METHOD This was a retrospective study of perioperative medical data from patients over 65 years old who underwent non-cardiac and non-neurological surgery between January 2014 and August 2019. Forty-six perioperative variables were used to predict POD. A traditional logistic regression model and five machine learning models (Random Forest, GBM, AdaBoost, XGBoost, and a stacking ensemble model) were compared by the area under the receiver operating characteristic curve (AUC-ROC), sensitivity, specificity, and precision. RESULTS In total, 29,756 patients were enrolled, and the incidence of POD was 3.22% after variable screening. AUCs were 0.783 (0.765-0.8) for the logistic regression method, 0.78 for random forest, 0.76 for GBM, 0.74 for AdaBoost, 0.73 for XGBoost, and 0.77 for the stacking ensemble model. The respective sensitivities for the 6 aforementioned models were 74.2%, 72.2%, 76.8%, 63.6%, 71.6%, and 67.4%. The respective specificities for the 6 aforementioned models were 70.7%, 99.8%, 96.5%, 98.8%, 96.5%, and 96.1%. The respective precision values for the 6 aforementioned models were 7.8%, 52.3%, 55.6%, 57%, 54.5%, and 56.4%. CONCLUSIONS Because of its better sensitivity, fewer variables, and easier interpretability compared with the machine learning models, the optimal application of the logistic regression model could provide quick and convenient POD risk identification to help improve the perioperative management of surgical patients.
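The precision values show how strongly a low outcome incidence affects positive predictive value (PPV). By Bayes' theorem, PPV depends on sensitivity, specificity, and prevalence; at a POD incidence of 3.22%, the logistic regression model's 29.3% false-positive rate (100% minus its 70.7% specificity) produces far more false alarms than true cases. The sketch below is not the authors' analysis, and it assumes the 3.22% incidence also holds in the evaluation data, which the abstract does not state explicitly. Under that assumption, the logistic regression figures give a PPV of about 7.8%, in line with the reported precision.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Precision (PPV) via Bayes' theorem:

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    true_pos = sensitivity * prevalence                   # expected fraction of true positives
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # expected fraction of false positives
    if true_pos + false_pos == 0.0:
        raise ValueError("no predicted positives; PPV is undefined")
    return true_pos / (true_pos + false_pos)


# Logistic regression values from the abstract, assuming the 3.22% incidence
# also applies to the evaluation data.
print(round(positive_predictive_value(0.742, 0.707, 0.0322), 3))  # 0.078
```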
Affiliation(s)
- Yu‐xiang Song
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Medical School of Chinese People's Liberation Army, Beijing, China
- Xiao‐dong Yang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Yun‐gen Luo
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Medical School of Chinese People's Liberation Army, Beijing, China
- Chun‐lei Ouyang
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Yao Yu
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Yu‐long Ma
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Hao Li
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Jing‐sheng Lou
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Yan‐hong Liu
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Yi‐qiang Chen
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Jiang‐bei Cao
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Wei‐dong Mi
- Department of Anesthesiology, The First Medical Center of Chinese PLA General Hospital, Beijing, China
10
Xie Q, Wang XL, Pei JH, Wu YP, Guo Q, Su YJ, Yan H, Nan RL, Chen HX, Dou XM. Machine Learning-Based Prediction Models for Delirium: A Systematic Review and Meta-Analysis. J Am Med Dir Assoc 2022; 23:1655-1668.e6. [DOI: 10.1016/j.jamda.2022.06.020] [Received: 12/21/2021] [Revised: 05/22/2022] [Accepted: 06/18/2022] [Indexed: 10/16/2022]