1
Qian L, Lu X, Haris P, Zhu J, Li S, Yang Y. Enhancing clinical trial outcome prediction with artificial intelligence: a systematic review. Drug Discov Today 2025;30:104332. PMID: 40097090. DOI: 10.1016/j.drudis.2025.104332.
Abstract
Clinical trials are pivotal in drug development yet fraught with uncertainty and resource-intensive demands. Applying AI models to forecast trial outcomes could mitigate failures and expedite the drug discovery process. This review discusses AI methodologies for clinical trial outcome prediction, focusing on clinical text embedding, multimodal trial learning, and prediction techniques, while addressing practical challenges and opportunities.
Affiliation(s)
- Long Qian
- Faculty of Computing Engineering Media, De Montfort University, Leicester, UK
- Xin Lu
- Faculty of Computing Engineering Media, De Montfort University, Leicester, UK
- Parvez Haris
- Faculty of Health & Life Sciences, De Montfort University, Leicester, UK
- Shuo Li
- Faculty of Computing Engineering Media, De Montfort University, Leicester, UK
- Yingjie Yang
- Faculty of Computing Engineering Media, De Montfort University, Leicester, UK
2
El-Hay T, Reps JM, Yanover C. Extensive benchmarking of a method that estimates external model performance from limited statistical characteristics. NPJ Digit Med 2025;8:59. PMID: 39870920. PMCID: PMC11772677. DOI: 10.1038/s41746-024-01414-z.
Abstract
Predictive model performance may deteriorate when a model is applied to data sources that were not used for training; external validation is therefore a key step in successful model deployment. As access to patient-level external data sources is typically limited, we recently proposed a method that estimates external model performance using only external summary statistics. Here, we benchmark the proposed method on multiple tasks using five large, heterogeneous US data sources, where each in turn plays the role of the internal source and the remaining sources serve as external ones. Results showed accurate estimations for all metrics: the 95th error percentiles for the area under the receiver operating characteristic curve (discrimination), calibration-in-the-large (calibration), and the Brier and scaled Brier scores (overall accuracy) were 0.03, 0.08, 0.0002, and 0.07, respectively. These results demonstrate the feasibility of estimating the transportability of prediction models using an internal cohort and external statistics, which may become an important accelerator of model deployment.
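The four performance metrics benchmarked in this study have simple closed forms. The sketch below shows one common formulation of each, computed on synthetic labels and predictions; it illustrates the metrics only, not the paper's summary-statistics estimation method.

```python
import numpy as np

def auc(y, p):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity.
    Assumes no tied scores (adequate for this sketch)."""
    order = np.argsort(p)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(p) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def calibration_in_the_large(y, p):
    """Observed minus mean predicted event rate (one common formulation;
    it can also be reported as an intercept on the log-odds scale)."""
    return y.mean() - p.mean()

def brier(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return np.mean((p - y) ** 2)

def scaled_brier(y, p):
    """Brier score scaled against an uninformative model that always
    predicts the event rate: 1 - Brier / (rate * (1 - rate))."""
    rate = y.mean()
    return 1.0 - brier(y, p) / (rate * (1.0 - rate))

y = np.array([0, 0, 0, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.7, 0.9])
print(auc(y, p))  # 1.0 for perfectly ranked predictions
```

Under this convention, a well-calibrated, accurate model has calibration-in-the-large near 0, a low Brier score, and a scaled Brier score approaching 1.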
Affiliation(s)
- Tal El-Hay
- KI Research Institute, Kfar Malal, Israel
- Jenna M Reps
- Janssen Research and Development, Raritan, NJ, USA
3
Yordanov TR, Ravelli ACJ, Amiri S, Vis M, Houterman S, Van der Voort SR, Abu-Hanna A. Performance of federated learning-based models in the Dutch TAVI population was comparable to central strategies and outperformed local strategies. Front Cardiovasc Med 2024;11:1399138. PMID: 39036502. PMCID: PMC11257923. DOI: 10.3389/fcvm.2024.1399138.
Abstract
Background: Federated learning (FL) is a technique for learning prediction models without sharing records between hospitals. Compared to centralized training approaches, the adoption of FL could negatively impact model performance.
Aim: This study aimed to evaluate four multicenter model development strategies for predicting 30-day mortality in patients undergoing transcatheter aortic valve implantation (TAVI): (1) central, learning one model from a centralized dataset of all hospitals; (2) local, learning one model per hospital; (3) federated averaging (FedAvg), averaging the local models' coefficients; and (4) ensemble, aggregating the local models' predictions.
Methods: Data from all 16 Dutch TAVI hospitals from 2013 to 2021 in the Netherlands Heart Registration (NHR) were used. All approaches were internally validated; for the central and federated approaches, external geographic validation was also performed. Predictive performance was measured as discrimination [area under the ROC curve (AUC)] and calibration (intercept, slope, and calibration graph).
Results: The dataset comprised 16,661 TAVI records with a 30-day mortality rate of 3.4%. In internal validation, the AUCs of the central, local, FedAvg, and ensemble models were 0.68, 0.65, 0.67, and 0.67, respectively. The central and local models were miscalibrated in slope, while the FedAvg and ensemble models were miscalibrated in intercept. In external geographic validation, the central, FedAvg, and ensemble models all achieved a mean AUC of 0.68, with miscalibration observed in 44%, 44%, and 38% of the hospitals, respectively.
Conclusion: Compared to centralized training approaches, FL techniques such as FedAvg and ensemble demonstrated comparable AUC and calibration. The use of FL techniques should be considered a viable option for clinical prediction model development.
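For a logistic model, the FedAvg and ensemble strategies compared above reduce to averaging coefficient vectors versus averaging predicted probabilities. A minimal numpy sketch, where the hospital names and coefficient values are illustrative placeholders rather than estimates from the NHR data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical locally fitted logistic coefficients (intercept + 3 predictors);
# values are illustrative, not derived from any real TAVI registry.
local_coefs = {
    "hospital_a": np.array([-3.2, 0.8, 0.1, 0.5]),
    "hospital_b": np.array([-3.0, 0.7, 0.2, 0.4]),
    "hospital_c": np.array([-3.5, 0.9, 0.0, 0.6]),
}

def fedavg(coef_dict, weights=None):
    """FedAvg for a linear model: (weighted) average of local coefficient vectors."""
    return np.average(np.stack(list(coef_dict.values())), axis=0, weights=weights)

def ensemble_predict(coef_dict, X):
    """Ensemble: average the local models' predicted probabilities."""
    X1 = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    return np.mean([sigmoid(X1 @ c) for c in coef_dict.values()], axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
X1 = np.hstack([np.ones((5, 1)), X])
p_fedavg = sigmoid(X1 @ fedavg(local_coefs))
p_ensemble = ensemble_predict(local_coefs, X)
```

Because the sigmoid is nonlinear, averaging coefficients and then applying the sigmoid generally yields different probabilities than averaging each model's sigmoid outputs, which is one reason the two strategies can calibrate differently.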
Affiliation(s)
- Tsvetan R. Yordanov
- Department of Medical Informatics, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Anita C. J. Ravelli
- Department of Medical Informatics, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Saba Amiri
- Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
- Marije Vis
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Department of Cardiology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Cardiovascular Sciences Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Sebastian R. Van der Voort
- Department of Medical Informatics, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Ameen Abu-Hanna
- Department of Medical Informatics, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
4
Zhang F, Kreuter D, Chen Y, Dittmer S, Tull S, Shadbahr T, Preller J, Rudd JH, Aston JA, Schönlieb CB, Gleadall N, Roberts M. Recent methodological advances in federated learning for healthcare. Patterns (N Y) 2024;5:101006. PMID: 39005485. PMCID: PMC11240178. DOI: 10.1016/j.patter.2024.101006.
Abstract
For healthcare datasets, it is often impossible to combine data samples from multiple sites due to ethical, privacy, or logistical concerns. Federated learning allows for the utilization of powerful machine learning algorithms without requiring the pooling of data. Healthcare data have many simultaneous challenges, such as highly siloed data, class imbalance, missing data, distribution shifts, and non-standardized variables, that require new methodologies to address. Federated learning adds significant methodological complexity to conventional centralized machine learning, requiring distributed optimization, communication between nodes, aggregation of models, and redistribution of models. In this systematic review, we consider all papers on Scopus published between January 2015 and February 2023 that describe new federated learning methodologies for addressing challenges with healthcare data. We reviewed 89 papers meeting these criteria. Significant systemic issues were identified throughout the literature, compromising many methodologies reviewed. We give detailed recommendations to help improve methodology development for federated learning in healthcare.
Affiliation(s)
- Fan Zhang
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Daniel Kreuter
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Yichen Chen
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Sören Dittmer
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- ZeTeM, University of Bremen, Bremen, Germany
- Samuel Tull
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Tolou Shadbahr
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jacobus Preller
- Addenbrooke’s Hospital, Cambridge University Hospitals NHS Trust, Cambridge, UK
- James H.F. Rudd
- Department of Medicine, University of Cambridge, Cambridge, UK
- John A.D. Aston
- Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
- Carola-Bibiane Schönlieb
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
5
Ahmadi N, Nguyen QV, Sedlmayr M, Wolfien M. A comparative patient-level prediction study in OMOP CDM: applicative potential and insights from synthetic data. Sci Rep 2024;14:2287. PMID: 38280887. PMCID: PMC10821926. DOI: 10.1038/s41598-024-52723-y.
Abstract
The emergence of collaborations that standardize and combine multiple clinical databases across different regions provides a rich source of data, which is fundamental for clinical prediction models such as patient-level prediction. With the aid of such large data pools, researchers can develop clinical prediction models for improved disease classification, risk assessment, and beyond. To fully utilize this potential, machine learning (ML) methods are commonly required to process these large amounts of data on disease-specific patient cohorts. Consequently, the Observational Health Data Sciences and Informatics (OHDSI) collaborative has developed a framework that facilitates the application of ML models to these standardized patient datasets via the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). In this study, we compare the feasibility of current web-based OHDSI approaches, namely ATLAS and "Patient-Level Prediction" (PLP), against a native (R-based) solution for conducting such ML-based patient-level prediction analyses in OMOP, enabling potential users to select the most suitable approach for their investigation. Each ML solution was used to solve the same patient-level prediction task, and both approaches underwent an exemplary benchmarking analysis to assess the weaknesses and strengths of the PLP R package. The performance of this package was then compared against the commonly used native R package Machine Learning in R 3 (mlr3) and its sub-packages. The approaches were evaluated on performance, execution time, and ease of model implementation. The results show that the PLP package has shorter execution times, indicating good scalability, as well as intuitive code implementation and numerous possibilities for visualization. However, limitations relative to native packages appeared in the implementation of specific ML classifiers (e.g., Lasso), which may result in decreased performance on real-world prediction problems. These findings contribute to the overall effort of developing ML-based prediction models at clinical scale and provide a snapshot for future studies that explicitly aim to develop patient-level prediction models in the OMOP CDM.
Affiliation(s)
- Najia Ahmadi
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307, Dresden, Germany
- Quang Vu Nguyen
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307, Dresden, Germany
- Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307, Dresden, Germany
- Markus Wolfien
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307, Dresden, Germany
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden, Germany
6
Eickelberg G, Sanchez-Pinto LN, Kline AS, Luo Y. Transportability of bacterial infection prediction models for critically ill patients. J Am Med Inform Assoc 2023;31:98-108. PMID: 37647884. PMCID: PMC10746321. DOI: 10.1093/jamia/ocad174.
Abstract
Objective: Bacterial infections (BIs) are common, costly, and potentially life-threatening in critically ill patients. Patients with suspected BIs may require empiric multidrug antibiotic regimens and may therefore be exposed to prolonged and unnecessary antibiotics. We previously developed a BI risk model to augment practice and help shorten the duration of unnecessary antibiotics to improve patient outcomes. Here, we performed a transportability assessment of this BI risk model in two tertiary intensive care unit (ICU) settings and a community ICU setting, and additionally explored how simple multisite learning techniques affected model transportability.
Methods: Patients with suspected community-acquired BIs were identified in three datasets: Medical Information Mart for Intensive Care III (MIMIC), Northwestern Medicine Tertiary (NM-T) ICUs, and NM "community-based" ICUs. ICU encounters from the MIMIC and NM-T datasets were split into 70/30 train and test sets. Models developed on the training data were evaluated against the NM-T and MIMIC test sets, as well as the NM community validation data.
Results: In internal validation, the models achieved AUROCs of 0.78 (MIMIC) and 0.81 (NM-T) and were well calibrated. In the external community ICU validation, the NM-T model transported robustly (AUROC 0.81) while the MIMIC model transported less favorably (AUROC 0.74), likely due to case-mix differences. Multisite learning provided no significant discrimination benefit in internal validation but offered more stability during transport across all evaluation datasets.
Discussion: These results suggest that our BI risk models maintain predictive utility when transported to external cohorts.
Conclusion: Our findings highlight the importance of performing external model validation on myriad clinically relevant populations prior to implementation.
Affiliation(s)
- Garrett Eickelberg
- Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, IL 60611, United States
- Lazaro Nelson Sanchez-Pinto
- Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, IL 60611, United States
- Department of Pediatrics (Critical Care), Chicago, IL 60611, United States
- Adrienne Sarah Kline
- Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, IL 60611, United States
- Yuan Luo
- Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, IL 60611, United States
7
Hurley NC, Dhruva SS, Desai NR, Ross JR, Ngufor CG, Masoudi F, Krumholz HM, Mortazavi BJ. Clinical phenotyping with an outcomes-driven mixture of experts for patient matching and risk estimation. ACM Trans Comput Healthc 2023;4:1-18. PMID: 37908872. PMCID: PMC10613929. DOI: 10.1145/3616021.
Abstract
Observational medical data present unique opportunities for analyzing medical outcomes and treatment decision making. However, because these datasets lack the strict pairing of randomized controlled trials, matching techniques are used to draw comparisons among patients. A key limitation of such techniques is verifying that the variables used to model treatment decision making are also relevant to identifying the risk of major adverse events. This article explores a deep mixture-of-experts approach to jointly learn how to match patients and model their risk of major adverse events. Although trained with information on treatment and outcomes, the proposed model is, after training, decomposable into a network that clusters patients into phenotypes using only information available before treatment. The model is validated on a dataset of patients with acute myocardial infarction complicated by cardiogenic shock. The mixture-of-experts approach predicts mortality with an area under the receiver operating characteristic curve of 0.85 ± 0.01 while jointly discovering five potential phenotypes of interest. The technique and its interpretation allow clinically relevant phenotypes to be identified both for outcomes modeling and, potentially, for evaluating individualized treatment effects.
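The core of such a mixture-of-experts risk model, a gating network that softly assigns patients to phenotypes plus per-phenotype risk models whose outputs are blended by the gate, can be sketched in a few lines of numpy. Random weights stand in for trained networks and the dimensions are illustrative; this is a structural sketch, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, n_experts = 4, 5                          # 5 experts mirrors the 5 phenotypes reported
W_gate = rng.normal(size=(n_features, n_experts))     # gating network: clusters patients
W_expert = rng.normal(size=(n_experts, n_features))   # one risk model per phenotype

def moe_risk(X):
    gate = softmax(X @ W_gate)              # soft phenotype assignment; rows sum to 1
    expert_risk = sigmoid(X @ W_expert.T)   # each expert's adverse-event risk estimate
    risk = (gate * expert_risk).sum(axis=1) # gate-weighted blend of expert risks
    phenotype = gate.argmax(axis=1)         # hard phenotype, usable before treatment
    return risk, phenotype

X = rng.normal(size=(6, n_features))
risk, phenotype = moe_risk(X)
```

After training, the gating network alone yields the phenotype clustering, which is what makes the model decomposable into a pre-treatment patient-matching component.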
8
Banda JM, Shah NH, Periyakoil VS. Characterizing subgroup performance of probabilistic phenotype algorithms within older adults: a case study for dementia, mild cognitive impairment, and Alzheimer's and Parkinson's diseases. JAMIA Open 2023;6:ooad043. PMID: 37397506. PMCID: PMC10307941. DOI: 10.1093/jamiaopen/ooad043.
Abstract
Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults.
Materials and methods: We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions, allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as the reference for evaluating probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation framework.
Results: We demonstrate that some algorithms show performance variations of 3% to 30% across populations, even when race is not used as an input variable. While subgroup performance differences are not present for all phenotypes, they do affect some phenotypes and groups disproportionately more than others.
Discussion: Our analysis establishes the need for a robust framework for evaluating subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance in model features compared with the phenotypes showing little to no difference.
Conclusion: We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms, using ADRD as a use case. Differences in subgroup performance are neither widespread nor consistent, which highlights the need for careful ongoing monitoring to evaluate, measure, and try to mitigate such differences.
Affiliation(s)
- Juan M Banda (corresponding author)
- Department of Computer Science, College of Arts and Sciences, Georgia State University, 25 Park Place, Suite 752, Atlanta, GA 30303, USA
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, USA
- Vyjeyanthi S Periyakoil
- Stanford Department of Medicine, Palo Alto, California, USA
- VA Palo Alto Health Care System, Palo Alto, California, USA