1
Qian L, Lu X, Haris P, Zhu J, Li S, Yang Y. Enhancing clinical trial outcome prediction with artificial intelligence: a systematic review. Drug Discov Today 2025;30:104332. PMID: 40097090. DOI: 10.1016/j.drudis.2025.104332.
Abstract
Clinical trials are pivotal in drug development yet fraught with uncertainty and resource-intensive demands. Applying AI models to forecast trial outcomes could mitigate failures and expedite the drug discovery process. This review discusses AI methodologies for clinical trial outcome prediction, focusing on clinical text embedding, multimodal trial learning, and prediction techniques, while addressing practical challenges and opportunities.
Affiliation(s)
- Long Qian
- Faculty of Computing Engineering Media, De Montfort University, Leicester, UK
- Xin Lu
- Faculty of Computing Engineering Media, De Montfort University, Leicester, UK
- Parvez Haris
- Faculty of Health & Life Sciences, De Montfort University, Leicester, UK
- Shuo Li
- Faculty of Computing Engineering Media, De Montfort University, Leicester, UK
- Yingjie Yang
- Faculty of Computing Engineering Media, De Montfort University, Leicester, UK
2
El-Hay T, Reps JM, Yanover C. Extensive benchmarking of a method that estimates external model performance from limited statistical characteristics. NPJ Digit Med 2025;8:59. PMID: 39870920. PMCID: PMC11772677. DOI: 10.1038/s41746-024-01414-z.
Abstract
Predictive model performance may deteriorate when a model is applied to data sources that were not used for training; external validation is therefore a key step in successful model deployment. As access to patient-level external data sources is typically limited, we recently proposed a method that estimates external model performance using only external summary statistics. Here, we benchmark the proposed method on multiple tasks using five large, heterogeneous US data sources, where each in turn plays the role of the internal source and the remaining sources serve as external ones. Results showed accurate estimations for all metrics: the 95th error percentiles for the area under the receiver operating characteristic curve (discrimination), calibration-in-the-large (calibration), and the Brier and scaled Brier scores (overall accuracy) were 0.03, 0.08, 0.0002, and 0.07, respectively. These results demonstrate the feasibility of estimating the transportability of prediction models using an internal cohort and external statistics, which may become an important accelerator of model deployment.
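The four performance metrics benchmarked in this study have simple closed forms. The sketch below shows one common formulation of each, computed on synthetic labels and predictions; it illustrates the metrics only, not the paper's summary-statistics estimation method.

```python
import numpy as np

def auc(y, p):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity.
    Assumes no tied scores (adequate for this sketch)."""
    order = np.argsort(p)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(p) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def calibration_in_the_large(y, p):
    """Observed minus mean predicted event rate (one common formulation;
    it can also be reported as an intercept on the log-odds scale)."""
    return y.mean() - p.mean()

def brier(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return np.mean((p - y) ** 2)

def scaled_brier(y, p):
    """Brier score scaled against an uninformative model that always
    predicts the event rate: 1 - Brier / (rate * (1 - rate))."""
    rate = y.mean()
    return 1.0 - brier(y, p) / (rate * (1.0 - rate))

y = np.array([0, 0, 0, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.7, 0.9])
print(auc(y, p))  # 1.0 for perfectly ranked predictions
```

Under this convention, a well-calibrated, accurate model has calibration-in-the-large near 0, a low Brier score, and a scaled Brier score approaching 1.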
Affiliation(s)
- Tal El-Hay
- KI Research Institute, Kfar Malal, Israel
- Jenna M Reps
- Janssen Research and Development, Raritan, NJ, USA
3
Yordanov TR, Ravelli ACJ, Amiri S, Vis M, Houterman S, Van der Voort SR, Abu-Hanna A. Performance of federated learning-based models in the Dutch TAVI population was comparable to central strategies and outperformed local strategies. Front Cardiovasc Med 2024;11:1399138. PMID: 39036502. PMCID: PMC11257923. DOI: 10.3389/fcvm.2024.1399138.
Abstract
Background: Federated learning (FL) is a technique for learning prediction models without sharing records between hospitals. Compared to centralized training approaches, the adoption of FL could negatively impact model performance.
Aim: This study aimed to evaluate four multicenter model development strategies for predicting 30-day mortality in patients undergoing transcatheter aortic valve implantation (TAVI): (1) central, learning one model from a centralized dataset of all hospitals; (2) local, learning one model per hospital; (3) federated averaging (FedAvg), averaging the local models' coefficients; and (4) ensemble, aggregating the local models' predictions.
Methods: Data from all 16 Dutch TAVI hospitals from 2013 to 2021 in the Netherlands Heart Registration (NHR) were used. All approaches were internally validated; for the central and federated approaches, external geographic validation was also performed. Predictive performance was measured as discrimination [area under the ROC curve (AUC)] and calibration (intercept, slope, and calibration graph).
Results: The dataset comprised 16,661 TAVI records with a 30-day mortality rate of 3.4%. In internal validation, the AUCs of the central, local, FedAvg, and ensemble models were 0.68, 0.65, 0.67, and 0.67, respectively. The central and local models were miscalibrated in slope, while the FedAvg and ensemble models were miscalibrated in intercept. In external geographic validation, the central, FedAvg, and ensemble models all achieved a mean AUC of 0.68, with miscalibration observed in 44%, 44%, and 38% of the hospitals, respectively.
Conclusion: Compared to centralized training approaches, FL techniques such as FedAvg and ensemble demonstrated comparable AUC and calibration. The use of FL techniques should be considered a viable option for clinical prediction model development.
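For a logistic model, the FedAvg and ensemble strategies compared above reduce to averaging coefficient vectors versus averaging predicted probabilities. A minimal numpy sketch, where the hospital names and coefficient values are illustrative placeholders rather than estimates from the NHR data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical locally fitted logistic coefficients (intercept + 3 predictors);
# values are illustrative, not derived from any real TAVI registry.
local_coefs = {
    "hospital_a": np.array([-3.2, 0.8, 0.1, 0.5]),
    "hospital_b": np.array([-3.0, 0.7, 0.2, 0.4]),
    "hospital_c": np.array([-3.5, 0.9, 0.0, 0.6]),
}

def fedavg(coef_dict, weights=None):
    """FedAvg for a linear model: (weighted) average of local coefficient vectors."""
    return np.average(np.stack(list(coef_dict.values())), axis=0, weights=weights)

def ensemble_predict(coef_dict, X):
    """Ensemble: average the local models' predicted probabilities."""
    X1 = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    return np.mean([sigmoid(X1 @ c) for c in coef_dict.values()], axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
X1 = np.hstack([np.ones((5, 1)), X])
p_fedavg = sigmoid(X1 @ fedavg(local_coefs))
p_ensemble = ensemble_predict(local_coefs, X)
```

Because the sigmoid is nonlinear, averaging coefficients and then applying the sigmoid generally yields different probabilities than averaging each model's sigmoid outputs, which is one reason the two strategies can calibrate differently.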
Affiliation(s)
- Tsvetan R. Yordanov
- Department of Medical Informatics, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Anita C. J. Ravelli
- Department of Medical Informatics, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Saba Amiri
- Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
- Marije Vis
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Department of Cardiology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Cardiovascular Sciences Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Sebastian R. Van der Voort
- Department of Medical Informatics, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Ameen Abu-Hanna
- Department of Medical Informatics, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
4
Zhang F, Kreuter D, Chen Y, Dittmer S, Tull S, Shadbahr T, Preller J, Rudd JH, Aston JA, Schönlieb CB, Gleadall N, Roberts M. Recent methodological advances in federated learning for healthcare. Patterns (N Y) 2024;5:101006. PMID: 39005485. PMCID: PMC11240178. DOI: 10.1016/j.patter.2024.101006.
Abstract
For healthcare datasets, it is often impossible to combine data samples from multiple sites due to ethical, privacy, or logistical concerns. Federated learning allows for the utilization of powerful machine learning algorithms without requiring the pooling of data. Healthcare data have many simultaneous challenges, such as highly siloed data, class imbalance, missing data, distribution shifts, and non-standardized variables, that require new methodologies to address. Federated learning adds significant methodological complexity to conventional centralized machine learning, requiring distributed optimization, communication between nodes, aggregation of models, and redistribution of models. In this systematic review, we consider all papers on Scopus published between January 2015 and February 2023 that describe new federated learning methodologies for addressing challenges with healthcare data. We reviewed 89 papers meeting these criteria. Significant systemic issues were identified throughout the literature, compromising many methodologies reviewed. We give detailed recommendations to help improve methodology development for federated learning in healthcare.
Affiliation(s)
- Fan Zhang
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Daniel Kreuter
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Yichen Chen
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Sören Dittmer
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- ZeTeM, University of Bremen, Bremen, Germany
- Samuel Tull
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Tolou Shadbahr
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jacobus Preller
- Addenbrooke’s Hospital, Cambridge University Hospitals NHS Trust, Cambridge, UK
- James H.F. Rudd
- Department of Medicine, University of Cambridge, Cambridge, UK
- John A.D. Aston
- Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
- Carola-Bibiane Schönlieb
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
5
Ahmadi N, Nguyen QV, Sedlmayr M, Wolfien M. A comparative patient-level prediction study in OMOP CDM: applicative potential and insights from synthetic data. Sci Rep 2024;14:2287. PMID: 38280887. PMCID: PMC10821926. DOI: 10.1038/s41598-024-52723-y.
Abstract
The emergence of collaborations that standardize and combine multiple clinical databases across different regions provides a rich source of data, which is fundamental for clinical prediction models such as patient-level prediction. With the aid of such large data pools, researchers can develop clinical prediction models for improved disease classification, risk assessment, and beyond. To fully utilize this potential, machine learning (ML) methods are commonly required to process these large amounts of data on disease-specific patient cohorts. Consequently, the Observational Health Data Sciences and Informatics (OHDSI) collaborative has developed a framework that facilitates the application of ML models to these standardized patient datasets via the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). In this study, we compare the feasibility of current web-based OHDSI approaches, namely ATLAS and "Patient-Level Prediction" (PLP), against a native (R-based) solution for conducting such ML-based patient-level prediction analyses in OMOP, enabling potential users to select the most suitable approach for their investigation. Each ML solution was used to solve the same patient-level prediction task, and both approaches underwent an exemplary benchmarking analysis to assess the weaknesses and strengths of the PLP R package. The performance of this package was then compared against the commonly used native R package Machine Learning in R 3 (mlr3) and its sub-packages. The approaches were evaluated on performance, execution time, and ease of model implementation. The results show that the PLP package has shorter execution times, indicating good scalability, as well as intuitive code implementation and numerous possibilities for visualization. However, limitations relative to native packages appeared in the implementation of specific ML classifiers (e.g., Lasso), which may result in decreased performance on real-world prediction problems. These findings contribute to the overall effort of developing ML-based prediction models at clinical scale and provide a snapshot for future studies that explicitly aim to develop patient-level prediction models in the OMOP CDM.
Affiliation(s)
- Najia Ahmadi
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307, Dresden, Germany
- Quang Vu Nguyen
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307, Dresden, Germany
- Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307, Dresden, Germany
- Markus Wolfien
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307, Dresden, Germany
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden, Germany
6
Eickelberg G, Sanchez-Pinto LN, Kline AS, Luo Y. Transportability of bacterial infection prediction models for critically ill patients. J Am Med Inform Assoc 2023;31:98-108. PMID: 37647884. PMCID: PMC10746321. DOI: 10.1093/jamia/ocad174.
Abstract
Objective: Bacterial infections (BIs) are common, costly, and potentially life-threatening in critically ill patients. Patients with suspected BIs may require empiric multidrug antibiotic regimens and may therefore be exposed to prolonged and unnecessary antibiotics. We previously developed a BI risk model to augment practice and help shorten the duration of unnecessary antibiotics to improve patient outcomes. Here, we performed a transportability assessment of this BI risk model in two tertiary intensive care unit (ICU) settings and a community ICU setting, and additionally explored how simple multisite learning techniques affected model transportability.
Methods: Patients with suspected community-acquired BIs were identified in three datasets: Medical Information Mart for Intensive Care III (MIMIC), Northwestern Medicine Tertiary (NM-T) ICUs, and NM "community-based" ICUs. ICU encounters from the MIMIC and NM-T datasets were split into 70/30 train and test sets. Models developed on the training data were evaluated against the NM-T and MIMIC test sets, as well as the NM community validation data.
Results: In internal validation, the models achieved AUROCs of 0.78 (MIMIC) and 0.81 (NM-T) and were well calibrated. In the external community ICU validation, the NM-T model transported robustly (AUROC 0.81) while the MIMIC model transported less favorably (AUROC 0.74), likely due to case-mix differences. Multisite learning provided no significant discrimination benefit in internal validation but offered more stability during transport across all evaluation datasets.
Discussion: These results suggest that our BI risk models maintain predictive utility when transported to external cohorts.
Conclusion: Our findings highlight the importance of performing external model validation on myriad clinically relevant populations prior to implementation.
Affiliation(s)
- Garrett Eickelberg
- Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, IL 60611, United States
- Lazaro Nelson Sanchez-Pinto
- Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, IL 60611, United States
- Department of Pediatrics (Critical Care), Chicago, IL 60611, United States
- Adrienne Sarah Kline
- Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, IL 60611, United States
- Yuan Luo
- Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, IL 60611, United States
7
Hurley NC, Dhruva SS, Desai NR, Ross JR, Ngufor CG, Masoudi F, Krumholz HM, Mortazavi BJ. Clinical phenotyping with an outcomes-driven mixture of experts for patient matching and risk estimation. ACM Trans Comput Healthc 2023;4:1-18. PMID: 37908872. PMCID: PMC10613929. DOI: 10.1145/3616021.
Abstract
Observational medical data present unique opportunities for analyzing medical outcomes and treatment decision making. However, because these datasets lack the strict pairing of randomized controlled trials, matching techniques are used to draw comparisons among patients. A key limitation of such techniques is verifying that the variables used to model treatment decision making are also relevant to identifying the risk of major adverse events. This article explores a deep mixture-of-experts approach to jointly learn how to match patients and model their risk of major adverse events. Although trained with information on treatment and outcomes, the proposed model is, after training, decomposable into a network that clusters patients into phenotypes using only information available before treatment. The model is validated on a dataset of patients with acute myocardial infarction complicated by cardiogenic shock. The mixture-of-experts approach predicts mortality with an area under the receiver operating characteristic curve of 0.85 ± 0.01 while jointly discovering five potential phenotypes of interest. The technique and its interpretation allow clinically relevant phenotypes to be identified both for outcomes modeling and, potentially, for evaluating individualized treatment effects.
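The core of such a mixture-of-experts risk model, a gating network that softly assigns patients to phenotypes plus per-phenotype risk models whose outputs are blended by the gate, can be sketched in a few lines of numpy. Random weights stand in for trained networks and the dimensions are illustrative; this is a structural sketch, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, n_experts = 4, 5                          # 5 experts mirrors the 5 phenotypes reported
W_gate = rng.normal(size=(n_features, n_experts))     # gating network: clusters patients
W_expert = rng.normal(size=(n_experts, n_features))   # one risk model per phenotype

def moe_risk(X):
    gate = softmax(X @ W_gate)              # soft phenotype assignment; rows sum to 1
    expert_risk = sigmoid(X @ W_expert.T)   # each expert's adverse-event risk estimate
    risk = (gate * expert_risk).sum(axis=1) # gate-weighted blend of expert risks
    phenotype = gate.argmax(axis=1)         # hard phenotype, usable before treatment
    return risk, phenotype

X = rng.normal(size=(6, n_features))
risk, phenotype = moe_risk(X)
```

After training, the gating network alone yields the phenotype clustering, which is what makes the model decomposable into a pre-treatment patient-matching component.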
8
Banda JM, Shah NH, Periyakoil VS. Characterizing subgroup performance of probabilistic phenotype algorithms within older adults: a case study for dementia, mild cognitive impairment, and Alzheimer's and Parkinson's diseases. JAMIA Open 2023;6:ooad043. PMID: 37397506. PMCID: PMC10307941. DOI: 10.1093/jamiaopen/ooad043.
Abstract
Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults.
Materials and methods: We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions, allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as the reference for evaluating probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation framework.
Results: We demonstrate that some algorithms show performance variations of 3% to 30% across populations, even when race is not used as an input variable. While subgroup performance differences are not present for all phenotypes, they do affect some phenotypes and groups disproportionately more than others.
Discussion: Our analysis establishes the need for a robust framework for evaluating subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance in model features compared with the phenotypes showing little to no difference.
Conclusion: We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms, using ADRD as a use case. Differences in subgroup performance are neither widespread nor consistent, which highlights the need for careful ongoing monitoring to evaluate, measure, and try to mitigate such differences.
Affiliation(s)
- Juan M Banda (corresponding author)
- Department of Computer Science, College of Arts and Sciences, Georgia State University, 25 Park Place, Suite 752, Atlanta, GA 30303, USA
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, USA
- Vyjeyanthi S Periyakoil
- Stanford Department of Medicine, Palo Alto, California, USA
- VA Palo Alto Health Care System, Palo Alto, California, USA