1
Moons KGM, Damen JAA, Kaul T, Hooft L, Andaur Navarro C, Dhiman P, Beam AL, Van Calster B, Celi LA, Denaxas S, Denniston AK, Ghassemi M, Heinze G, Kengne AP, Maier-Hein L, Liu X, Logullo P, McCradden MD, Liu N, Oakden-Rayner L, Singh K, Ting DS, Wynants L, Yang B, Reitsma JB, Riley RD, Collins GS, van Smeden M. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ 2025;388:e082505. [PMID: 40127903; PMCID: PMC11931409; DOI: 10.1136/bmj-2024-082505]
Affiliations
- Karel G M Moons: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Johanna A A Damen: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands; Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Tabea Kaul: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Lotty Hooft: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands; Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Constanza Andaur Navarro: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Paula Dhiman: Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Andrew L Beam: Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA
- Ben Van Calster: Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Leo Anthony Celi: Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Spiros Denaxas: Institute of Health Informatics, University College London, London, UK; British Heart Foundation Data Science Centre, Health Data Research UK, London, UK
- Marzyeh Ghassemi: Department of Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Georg Heinze: Institute of Clinical Biometrics, Centre for Medical Data Science, Medical University of Vienna, Vienna, Austria
- Lena Maier-Hein: Division of Intelligent Medical Systems, German Cancer Research Centre (DKFZ), Heidelberg, Germany; National Centre for Tumour Diseases (NCT) Heidelberg, Heidelberg, Germany
- Xiaoxuan Liu: College of Medicine and Health, University of Birmingham, Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; NIHR Birmingham Biomedical Research Centre, Birmingham, UK
- Patricia Logullo: Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Melissa D McCradden: Department of Bioethics, The Hospital for Sick Children, Toronto, ON, Canada
- Nan Liu: Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Lauren Oakden-Rayner: Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Karandeep Singh: Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
- Daniel S Ting: Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore; AI Office, Singapore Health Service, Duke-NUS Medical School, Singapore, Singapore
- Laure Wynants: Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
- Bada Yang: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands; Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Johannes B Reitsma: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Richard D Riley: School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; NIHR Birmingham Biomedical Research Centre, Birmingham, UK
- Gary S Collins: Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Maarten van Smeden: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
2
Kaul T, Damen JAA, Wynants L, Van Calster B, van Smeden M, Hooft L, Moons KGM. Assessing the quality of prediction models in health care using the Prediction model Risk Of Bias ASsessment Tool (PROBAST): an evaluation of its use and practical application. J Clin Epidemiol 2025;181:111732. [PMID: 40010583; DOI: 10.1016/j.jclinepi.2025.111732]
Abstract
BACKGROUND AND OBJECTIVES: Since 2019, the Prediction model Risk Of Bias ASsessment Tool (PROBAST; www.probast.org) has supported methodological quality assessments of prediction model studies. Most prediction model studies are rated as having a "High" risk of bias (ROB), and researchers report low interrater reliability (IRR) when using PROBAST. We aimed to (1) assess the IRR of PROBAST ratings between assessors of the same study and understand reasons for discrepancies, (2) determine which items contribute most to domain-level ROB ratings, and (3) explore the impact of consensus meetings.
STUDY DESIGN AND SETTING: We used PROBAST assessments from a systematic review of diagnostic and prognostic COVID-19 prediction models as a case study. Assessors were international experts in prediction model studies or their reviews. We assessed IRR using the prevalence-adjusted bias-adjusted kappa (PABAK) before consensus meetings, examined bias ratings against domain-level ROB judgments, and evaluated the impact of consensus meetings by identifying rating changes after discussion.
RESULTS: We analyzed 2167 PROBAST assessments from 27 assessor pairs covering 760 prediction models: 384 developments, 242 validations, and 134 mixed assessments (including both). IRR measured with PABAK was higher for overall ROB judgments (development: 0.82 [0.76; 0.89]; validation: 0.78 [0.68; 0.88]) than for domain- and item-level judgments. Some PROBAST items frequently drove domain-level ROB judgments, eg, 3.5 Outcome blinding and 4.1 Sample size. Consensus discussions mainly led to item-level changes and never to changes in the overall ROB rating.
CONCLUSION: Within this case study, PROBAST assessments showed high IRR at the overall ROB level, with some variation at the item and domain levels. To reduce variability, PROBAST assessors should standardize item- and domain-level judgments and hold well-structured consensus meetings between assessors of the same study.
PLAIN LANGUAGE SUMMARY: The Prediction model Risk Of Bias ASsessment Tool (PROBAST; www.probast.org) provides a set of items to assess the quality of medical studies on so-called prediction tools, which calculate an individual's probability of having or developing a certain disease or health outcome. Previous research found low interrater reliability (IRR; ie, how consistently two assessors rate aspects of the same study) when using PROBAST. To understand why, we conducted a large study involving more than 30 experts from around the world, all of whom applied PROBAST to the same set of prediction tool studies. Based on more than 2150 PROBAST assessments, we identified which PROBAST items led to the most disagreements between raters, explored reasons for these disagreements, and examined whether so-called consensus meetings (ie, meetings in which different assessors of the same study discuss their ratings and decide on a final rating) affected PROBAST ratings. We found that IRR between different assessors of the same study was higher than previously reported. One explanation for the better agreement may be preplanning how to assess certain PROBAST aspects before starting the assessments, together with holding well-structured consensus meetings. These improvements led to more effective use of PROBAST in evaluating the trustworthiness and quality of prediction tools in health care.
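The PABAK statistic used above has a simple closed form worth keeping in mind when reading the agreement figures: for k rating categories and observed proportion of agreement Po, PABAK = (k·Po − 1)/(k − 1), which for dichotomous ratings reduces to 2·Po − 1. A minimal sketch (the function name and data layout are illustrative, not from the paper):

```python
def pabak(ratings_a, ratings_b):
    """Prevalence-adjusted bias-adjusted kappa for two raters.

    PABAK = (k * Po - 1) / (k - 1), where Po is the observed proportion
    of agreement and k the number of rating categories; for dichotomous
    ratings this is the familiar 2 * Po - 1.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("two equal-length, non-empty rating lists required")
    k = len(set(ratings_a) | set(ratings_b))
    po = sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)
    return (k * po - 1) / (k - 1)

# Two assessors rating ten models as low/high ROB, agreeing on 9 of 10:
a = ["high"] * 9 + ["low"]
b = ["high"] * 8 + ["low", "low"]
print(round(pabak(a, b), 2))  # 0.8
```

Unlike Cohen's kappa, PABAK does not penalize skewed marginals, which is one reason it can come out higher on data where "High" ROB dominates.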
Affiliations
- Tabea Kaul: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Johanna A A Damen: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Laure Wynants: Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, The Netherlands; Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Ben Van Calster: Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Maarten van Smeden: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Karel G M Moons: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
3
Kaul T, Kellerhuis BE, Damen JAA, Schuit E, Jenniskens K, van Smeden M, Reitsma JB, Hooft L, Moons KGM, Yang B. Methodological quality assessment tools for diagnosis and prognosis research: overview and guidance. J Clin Epidemiol 2025;177:111609. [PMID: 39536993; DOI: 10.1016/j.jclinepi.2024.111609]
Abstract
BACKGROUND AND OBJECTIVES: Multiple tools exist for assessing the methodological quality of diagnosis and prognosis research, and it can be challenging to decide when to use which tool. We aimed to provide an overview of existing methodological quality assessment (QA) tools for diagnosis and prognosis studies, highlight the overlap and differences among these tools, and provide guidance for choosing the appropriate tool.
STUDY DESIGN AND SETTING: We performed a methodological review of tools designed for assessing risk of bias, applicability, or other aspects of methodological quality in studies investigating tests/factors/markers/models for classifying or predicting a current (diagnosis) and/or future (prognosis) health state. Tools focusing exclusively on causal research or on reporting quality were excluded. Guidance was subsequently developed to assist in choosing an appropriate QA tool.
RESULTS: We identified 14 QA tools: eight developed for the assessment of diagnosis studies, four for prognosis studies, and two addressing both. We propose a set of five questions to guide the choice of a QA tool based on the purpose or question of the user: whether the focus is on (1) diagnosis, prognosis, or another domain; (2) a prediction model vs a test/factor/marker; (3) evaluating simply the performance of a test/factor/marker vs assessing its added value over other variables; (4) comparing two or more tests/factors/markers/models; and (5) whether the user aims to assess only risk of bias or also other quality aspects.
CONCLUSION: Existing QA tools for appraising diagnosis and prognosis studies vary in purpose, scope, and contents. Our guidance may help researchers, systematic reviewers, health policy makers, and guideline developers specify their purpose and question and select the most appropriate QA tool for their assessment.
PLAIN LANGUAGE SUMMARY: Methodological quality assessment (QA) tools provide a set of criteria to evaluate how well a medical study was done and how trustworthy its results are. To accurately assess a study's quality, it is important to use a QA tool that matches the type of medical study. However, with many QA tools available for different study types, choosing the right one can be challenging, especially for diagnosis and prognosis studies (ie, studies that evaluate tests, factors, markers, and models used for diagnosis and prognosis). To assist in selecting the best QA tool for diagnostic and prognostic studies, we created an overview of available tools and practical tips for choosing the most appropriate one. After searching online databases and consulting experts in the field, we identified 14 QA tools specific to diagnostic and prognostic studies. Additionally, we developed five key questions to guide users in choosing the best tool for their study. While the 14 QA tools differ in focus and content, our guidance simplifies the process of choosing the right tool and helps users refine their research question.
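The five questions amount to a small decision procedure. Purely as an illustration of that flow, and not the paper's actual guidance (which spans all five questions and 14 tools), here is a hedged sketch routing a few tools named elsewhere in this list by their widely known scopes; the class and function names are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class AssessmentPurpose:
    domain: str             # question 1: "diagnosis" or "prognosis"
    prediction_model: bool  # question 2: prediction model vs single test/factor/marker
    comparative: bool       # question 4: comparing two or more tests/models?

def suggest_tool(p: AssessmentPurpose) -> str:
    # Illustrative routing only, using the commonly stated scopes of four tools.
    if p.prediction_model:
        return "PROBAST"    # prediction model studies, diagnostic or prognostic
    if p.domain == "diagnosis":
        return "QUADAS-C" if p.comparative else "QUADAS-2"
    return "QUAPAS"         # prognostic accuracy of a single test

print(suggest_tool(AssessmentPurpose("diagnosis", prediction_model=False, comparative=True)))
# -> QUADAS-C
```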
Affiliations
- Tabea Kaul: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Bas E Kellerhuis: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Johanna A A Damen: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Ewoud Schuit: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Kevin Jenniskens: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten van Smeden: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Johannes B Reitsma: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Karel G M Moons: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Bada Yang: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
4
van Boekel AM, van der Meijden SL, Arbous SM, Nelissen RGHH, Veldkamp KE, Nieswaag EB, Jochems KFT, Holtz J, Veenstra AVIJ, Reijman J, de Jong Y, van Goor H, Wiewel MA, Schoones JW, Geerts BF, de Boer MGJ. Systematic evaluation of machine learning models for postoperative surgical site infection prediction. PLoS One 2024;19:e0312968. [PMID: 39666725; PMCID: PMC11637340; DOI: 10.1371/journal.pone.0312968]
Abstract
BACKGROUND: Surgical site infections (SSIs) lead to increased mortality and morbidity, as well as increased healthcare costs. Multiple models for the prediction of this serious surgical complication have been developed, with increasing use of machine learning (ML) tools.
OBJECTIVE: The aim of this systematic review was to assess the performance and the methodological quality of validated ML models for the prediction of SSIs.
METHODS: A systematic search of PubMed, Embase and the Cochrane Library was performed from inception until July 2023. Exclusion criteria were the absence of reported model validation, SSIs as part of a composite adverse outcome, and pediatric populations. ML performance measures were evaluated, and ML performance was compared to regression-based methods for studies that reported both. Risk of bias (ROB) was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).
RESULTS: Of the 4,377 studies screened, 24 were included in this review, describing 85 ML models. Most models were only internally validated (81%). The C-statistic was the most frequently used performance measure (reported in 96% of the studies), and only two studies reported calibration metrics. A total of 116 different predictors were described, of which age, steroid use, sex, diabetes, and smoking were incorporated most frequently (75% to 100%). Thirteen studies compared ML models to regression-based models and showed similar performance for both modelling methods. The overall ROB was high or unclear for all included studies.
CONCLUSIONS: A multitude of ML models for the prediction of SSIs are available, with large variability in performance. However, most models lacked external validation, performance reporting was limited, and the risk of bias was high. In studies describing both ML models and regression-based models, neither modelling method outperformed the other.
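Since the C-statistic dominates the performance reporting here, a concrete definition helps: it is the probability that a randomly chosen patient who develops an SSI receives a higher predicted risk than a randomly chosen patient who does not, with ties counting half. A minimal sketch (illustrative naming; the quadratic pairwise form is fine for small examples):

```python
def c_statistic(y_true, y_prob):
    """Concordance probability, equivalent to the area under the ROC curve.

    Over all event/non-event pairs, count how often the event receives the
    higher predicted risk; ties contribute 1/2.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    nonevents = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return concordant / (len(events) * len(nonevents))

y = [1, 0, 1, 0, 0, 1]
p = [0.9, 0.2, 0.6, 0.4, 0.6, 0.8]
print(round(c_statistic(y, p), 3))  # 0.944
```

A C-statistic of 0.5 is chance-level discrimination and 1.0 is perfect rank separation, which is why the review can compare models on this single number while still finding calibration reporting almost absent.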
Affiliations
- Anna M. van Boekel: Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands
- Siri L. van der Meijden: Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands; Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Sesmu M. Arbous: Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands
- Rob G. H. H. Nelissen: Department of Orthopedic Surgery, Leiden University Medical Center, Leiden, The Netherlands
- Karin E. Veldkamp: Department of Medical Microbiology and Infection Control, Leiden University Medical Center, Leiden, The Netherlands
- Emma B. Nieswaag: Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands; Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Kim F. T. Jochems: Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands; Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Jeroen Holtz: Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands; Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Annekee van IJlzinga Veenstra: Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands; Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Jeroen Reijman: Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands; Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Ype de Jong: Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands; Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Harry van Goor: Department of Surgery, Radboud UMC, Nijmegen, The Netherlands
- Jan W. Schoones: Walaeus Medical Library, Leiden University Medical Center, Leiden, The Netherlands
- Mark G. J. de Boer: Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
5
Tomlinson E, Cooper C, Davenport C, Rutjes AWS, Leeflang M, Mallett S, Whiting P. Common challenges and suggestions for risk of bias tool development: a systematic review of methodological studies. J Clin Epidemiol 2024;171:111370. [PMID: 38670243; DOI: 10.1016/j.jclinepi.2024.111370]
Abstract
OBJECTIVES: To review the findings of studies that have evaluated the design and/or usability of key risk of bias (RoB) tools for the assessment of RoB in primary studies, as categorized by the Library of Assessment Tools and InsTruments Used to assess Data validity in Evidence Synthesis Network (a searchable library of RoB tools for evidence synthesis): the Prediction model Risk Of Bias ASsessment Tool (PROBAST), Risk of Bias-2 (RoB2), Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I), Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2), Quality Assessment of Diagnostic Accuracy Studies-Comparative (QUADAS-C), Quality Assessment of Prognostic Accuracy Studies (QUAPAS), Risk Of Bias in Non-randomised Studies of Exposures (ROBINS-E), and the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) RoB checklist.
STUDY DESIGN AND SETTING: Systematic review of methodological studies. We conducted a forward citation search from the primary report of each tool to identify primary studies that aimed to evaluate the design and/or usability of the tool. Two reviewers assessed studies for inclusion. We extracted tool features into Microsoft Word and used NVivo for document analysis, combining deductive and inductive approaches. We summarized findings within each tool and explored common findings across tools.
RESULTS: We identified 13 tool evaluations meeting our inclusion criteria: PROBAST (3), RoB2 (3), ROBINS-I (4), and QUADAS-2 (3). We identified no evaluations for the other tools. Evaluations varied in clinical topic area, methodology, approach to bias assessment, and tool-user background. Some had limitations affecting generalizability. We identified common findings across tools for 6 of 14 themes: (1) challenging items (eg, the RoB2/ROBINS-I "deviations from intended interventions" domain), (2) overall RoB judgment (concerns with overall risk calculation in PROBAST/ROBINS-I), (3) tool usability (concerns about complexity), (4) time to complete the tool (varying demands on time, eg, depending on the number of outcomes assessed), (5) user agreement (varied across tools), and (6) recommendations for future use (eg, piloting) and development (add an intermediate domain answer to QUADAS-2/PROBAST; provide clearer guidance for all tools). Of the other eight themes, seven had findings only for the QUADAS-2 tool, limiting comparison across tools, and one ("reorganization of questions") had no findings.
CONCLUSION: Evaluations of key RoB tools have identified common challenges and recommendations for tool use and development. These findings may be helpful to people who use or develop RoB tools. Guidance is needed to support the design and implementation of future RoB tool evaluations.
Affiliations
- Eve Tomlinson: Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Chris Cooper: Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Clare Davenport: Test and Prediction Group, Institute of Applied Health Research, University of Birmingham, Birmingham B15 2TT, UK; NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham B15 2TT, UK
- Anne W S Rutjes: Department of Medical and Surgical Sciences for Children and Adults (SMECHIMAI), University of Modena and Reggio Emilia, Modena, Italy
- Mariska Leeflang: Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands
- Sue Mallett: Centre for Medical Imaging, University College London, London, UK
- Penny Whiting: Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
6
Hassan A, Critelli B, Lahooti I, Lahooti A, Matzko N, Adams JN, Liss L, Quion J, Restrepo D, Nikahd M, Culp S, Noh L, Tong K, Park JS, Akshintala V, Windsor JA, Mull NK, Papachristou GI, Celi LA, Lee PJ. Critical appraisal of machine learning prognostic models for acute pancreatitis: protocol for a systematic review. Diagn Progn Res 2024;8:6. [PMID: 38561864; PMCID: PMC10986113; DOI: 10.1186/s41512-024-00169-1]
Abstract
Acute pancreatitis (AP) is a common and costly acute inflammatory disorder whose incidence is increasing worldwide, with over 300,000 hospitalizations occurring yearly in the United States alone. As its course and outcomes vary widely, a critical knowledge gap in the field has been the lack of accurate prognostic tools for forecasting the outcomes of patients with AP. Despite several published studies over the last three decades, the predictive performance of published prognostic models has been found to be suboptimal. Recently, non-regression machine learning (ML) models have garnered intense interest in medicine for their potential for better predictive performance, and an increasing number of AP models is published each year. However, their methodological quality, in terms of transparent reporting and risk of bias in study design, has never been systematically appraised. Therefore, through a collaboration between clinicians and data scientists with appropriate content expertise, we will perform a systematic review of papers published between January 2021 and December 2023 that contain artificial intelligence prognostic models for AP. To assess these studies systematically, we will use the CHARMS checklist, the PROBAST tool for risk of bias assessment, and the most current version of TRIPOD-AI. (Research Registry: reviewregistry1727.)
Affiliations
- Amier Hassan: Division of Gastroenterology and Hepatology, Weill Cornell Medical College, New York, USA
- Brian Critelli: Division of Gastroenterology and Hepatology, Weill Cornell Medical College, New York, USA
- Ila Lahooti: Division of Gastroenterology and Hepatology, Ohio State University Wexner Medical Center, Columbus, OH, USA
- Ali Lahooti: Division of Gastroenterology and Hepatology, Weill Cornell Medical College, New York, USA
- Nate Matzko: Division of Gastroenterology and Hepatology, Weill Cornell Medical College, New York, USA
- Jan Niklas Adams: Division of Process and Data Science, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen, Germany
- Lukas Liss: Division of Process and Data Science, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen, Germany
- Justin Quion: Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, USA
- David Restrepo: Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, USA
- Melica Nikahd: Division of Bioinformatics, Ohio State University Wexner Medical Center, Columbus, USA
- Stacey Culp: Division of Bioinformatics, Ohio State University Wexner Medical Center, Columbus, USA
- Lydia Noh: Northeast Ohio Medical School, Rootstown, USA
- Kathleen Tong: Division of Gastroenterology and Hepatology, Ohio State University Wexner Medical Center, Columbus, OH, USA
- Jun Sung Park: Division of Gastroenterology and Hepatology, Ohio State University Wexner Medical Center, Columbus, OH, USA
- Venkata Akshintala: Division of Gastroenterology, Johns Hopkins Medical Center, Baltimore, USA
- John A Windsor: Department of Surgery, University of Auckland, Auckland, New Zealand
- Nikhil K Mull: Division of Hospital Medicine and Penn Medicine Center for Evidence-based Practice, University of Pennsylvania, Philadelphia, USA
- Georgios I Papachristou: Division of Gastroenterology and Hepatology, Ohio State University Wexner Medical Center, Columbus, OH, USA
- Leo Anthony Celi: Department of Surgery, University of Auckland, Auckland, New Zealand; Division of Critical Care, Beth Israel Deaconess Medical Center, Boston, USA
- Peter J Lee: Division of Gastroenterology and Hepatology, Ohio State University Wexner Medical Center, Columbus, OH, USA
7
Kunonga TP, Kenny RPW, Astin M, Bryant A, Kontogiannis V, Coughlan D, Richmond C, Eastaugh CH, Beyer FR, Pearson F, Craig D, Lovat P, Vale L, Ellis R. Predictive accuracy of risk prediction models for recurrence, metastasis and survival for early-stage cutaneous melanoma: a systematic review. BMJ Open 2023;13:e073306. [PMID: 37770261; PMCID: PMC10546114; DOI: 10.1136/bmjopen-2023-073306]
Abstract
OBJECTIVES: To identify prognostic models for melanoma survival, recurrence and metastasis among American Joint Committee on Cancer stage I and II patients postsurgery, and to evaluate model performance, including overall survival (OS) prediction.
DESIGN: Systematic review and narrative synthesis.
DATA SOURCES: MEDLINE, Embase, CINAHL, the Cochrane Library, Science Citation Index and grey literature sources, including cancer and guideline websites, searched from 2000 to September 2021.
ELIGIBILITY CRITERIA: Studies on risk prediction models for stage I and II melanoma in adults ≥18 years. Outcomes included OS, recurrence, metastases and model performance. No language or country of publication restrictions were applied.
DATA EXTRACTION AND SYNTHESIS: Two pairs of reviewers independently screened studies, extracted data and assessed the risk of bias using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) checklist and the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Heterogeneous predictors prevented statistical synthesis.
RESULTS: From 28,967 records, 15 studies reporting 20 models were included; 8 (stage I), 2 (stage II), 7 (stages I-II) and 7 (stages not reported) but clearly applicable to early stages. Clinicopathological predictors per model ranged from 3 to 10; the most common were ulceration, Breslow thickness/depth, sociodemographic status and site. Where reported, discriminatory values were ≥0.7, and calibration measures showed good matches between predicted and observed rates. None of the studies assessed the clinical usefulness of the models. Risk of bias was high in eight models, unclear in nine and low in three. Seven models were internally and externally cross-validated, six were externally validated and eight were internally validated.
CONCLUSIONS: All models performed well on the reported measures; however, the low quality of the evidence raises concern as to whether current follow-up recommendations after surgical treatment are adequate. Future models should incorporate biomarkers for improved accuracy.
PROSPERO REGISTRATION NUMBER: CRD42018086784.
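"Good matches between predicted and observed rates" is usually demonstrated by grouping patients by predicted risk and comparing the mean predicted risk with the observed event rate in each group. A minimal sketch of such a grouped calibration check (decile grouping and names are illustrative; a formal assessment would also report a calibration intercept and slope):

```python
def calibration_groups(y_true, y_prob, n_groups=10):
    """Sort by predicted risk, split into roughly equal-sized groups, and
    return (mean predicted risk, observed event rate, group size) per group.
    In a well calibrated model the first two numbers track each other."""
    pairs = sorted(zip(y_prob, y_true))
    size = max(1, len(pairs) // n_groups)
    rows = []
    for i in range(0, len(pairs), size):
        chunk = pairs[i:i + size]
        mean_pred = sum(prob for prob, _ in chunk) / len(chunk)
        obs_rate = sum(outcome for _, outcome in chunk) / len(chunk)
        rows.append((round(mean_pred, 3), round(obs_rate, 3), len(chunk)))
    return rows
```

Discrimination (the ≥0.7 values above) and calibration answer different questions: a model can rank patients well yet systematically over- or under-estimate absolute risk, which is what a table like this would expose.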
Affiliations
- Tafadzwa Patience Kunonga: Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; NIHR Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- R P W Kenny: Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; NIHR Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Margaret Astin: Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Andrew Bryant: Biostatistics Research Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Vasileios Kontogiannis: Health Economics Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Diarmuid Coughlan: Health Economics Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Catherine Richmond: Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; NIHR Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Claire H Eastaugh: Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; NIHR Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Fiona R Beyer: Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; NIHR Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Fiona Pearson: Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; NIHR Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Dawn Craig: Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; NIHR Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; Health Economics Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Penny Lovat: Dermatological Sciences, Translation and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK; AMLo Biosciences, The Biosphere, Newcastle Helix, Newcastle upon Tyne, UK
- Luke Vale: Health Economics Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Robert Ellis: Dermatological Sciences, Translation and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK; AMLo Biosciences, The Biosphere, Newcastle Helix, Newcastle upon Tyne, UK; Department of Dermatology, South Tees Hospitals NHS Foundation Trust, Middlesbrough, UK
8
Langenhuijsen LFS, Janse RJ, Venema E, Kent DM, van Diepen M, Dekker FW, Steyerberg EW, de Jong Y. Systematic metareview of prediction studies demonstrates stable trends in bias and low PROBAST inter-rater agreement. J Clin Epidemiol 2023;159:159-173. [PMID: 37142166; DOI: 10.1016/j.jclinepi.2023.04.012]
Abstract
OBJECTIVES: To (1) explore trends in risk of bias (ROB) in prediction research over time following key methodological publications, using the Prediction model Risk Of Bias ASsessment Tool (PROBAST), and (2) assess the inter-rater agreement of the PROBAST.
STUDY DESIGN AND SETTING: PubMed and Web of Science were searched for reviews with extractable PROBAST scores at the domain and signaling-question (SQ) level. ROB trends were visually correlated with yearly citations of key publications. Inter-rater agreement was assessed using Cohen's kappa.
RESULTS: One hundred and thirty-nine systematic reviews were included: 85 reviews (containing 2,477 single studies) at the domain level and 54 reviews (containing 2,458 single studies) at the SQ level. High ROB was prevalent, especially in the Analysis domain, and overall trends in ROB remained relatively stable over time. Inter-rater agreement was low at both the domain level (kappa 0.04 to 0.26) and the SQ level (kappa -0.14 to 0.49).
CONCLUSION: Prediction model studies are at high ROB, and time trends in ROB as assessed with the PROBAST remain relatively stable. These results might be explained by the key publications having had no influence on ROB, or by their recency. Moreover, the trend estimates may suffer from the low inter-rater agreement and the ceiling effect of the PROBAST. Inter-rater agreement could potentially be improved by altering the PROBAST or by training assessors in how to apply it.
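For context on the kappa values reported above: Cohen's kappa corrects observed agreement Po for the chance agreement Pe implied by each rater's marginal rating frequencies, kappa = (Po − Pe)/(1 − Pe). When one ROB category dominates, as with the prevalent "High" ratings here, Pe approaches Po and kappa can be near zero despite high raw agreement; this is the ceiling effect the conclusion mentions, and the prevalence problem that PABAK-style adjustment (see entry 2) sidesteps. A minimal sketch (illustrative naming, not from the paper):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items:
    kappa = (Po - Pe) / (1 - Pe), with Pe from the raters' marginals."""
    n = len(ratings_a)
    po = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    pe = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a.keys() | freq_b.keys())
    return (po - pe) / (1 - pe)

# Skewed prevalence: 90% raw agreement, yet kappa collapses to 0.
a = ["high"] * 9 + ["low"]
b = ["high"] * 10
print(round(cohens_kappa(a, b), 2))  # 0.0
```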
Affiliations
- Roemer J Janse: Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Esmee Venema: Department of Public Health, Erasmus MC University Medical Center, Rotterdam, The Netherlands; Department of Emergency Medicine, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- David M Kent: Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Merel van Diepen: Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Friedo W Dekker: Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Ewout W Steyerberg: Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Ype de Jong: Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands