1. Collins GS. Innovative solutions are needed to overcome implementation barriers to using reporting guidelines. BMJ 2025; 389:r718. [PMID: 40228827 DOI: 10.1136/bmj.r718]
Affiliation(s)
- Gary S Collins
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK
2. Smiley A, Villarreal-Zegarra D, Reategui-Rivera CM, Escobar-Agreda S, Finkelstein J. Methodological and reporting quality of machine learning studies on cancer diagnosis, treatment, and prognosis. Front Oncol 2025; 15:1555247. [PMID: 40297817 PMCID: PMC12034563 DOI: 10.3389/fonc.2025.1555247]
Abstract
This study aimed to evaluate the quality and transparency of reporting in studies using machine learning (ML) in oncology, focusing on adherence to the Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Models (CREMLS), TRIPOD+AI (the artificial intelligence extension of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis statement), and PROBAST (Prediction model Risk Of Bias ASsessment Tool). The literature search included primary studies published between February 1, 2024, and January 31, 2025, that developed or tested ML models for cancer diagnosis, treatment, or prognosis. To reflect the current state of the rapidly evolving landscape of ML applications in oncology, the fifteen most recent articles in each category were selected for evaluation. Two independent reviewers screened studies and extracted data on study characteristics, reporting quality (CREMLS and TRIPOD+AI), risk of bias (PROBAST), and ML performance metrics. The most frequently studied cancer types were breast cancer (n=7/45; 15.6%), lung cancer (n=7/45; 15.6%), and liver cancer (n=5/45; 11.1%). The findings indicate several deficiencies in reporting quality as assessed by CREMLS and TRIPOD+AI, primarily relating to sample size calculation, reporting on data quality, strategies for handling outliers, documentation of ML model predictors, access to training or validation data, and reporting on model performance heterogeneity. The methodological quality assessment using PROBAST revealed that 89% of the included studies exhibited a low overall risk of bias, and all studies showed a low risk of bias in terms of applicability. Among the AI models identified as best-performing, Random Forest (RF) and XGBoost were the most frequently reported, each used in 17.8% of the studies (n=8). Additionally, our study outlines the specific areas where reporting is deficient, providing researchers with guidance to improve reporting quality in these sections and, consequently, reduce the risk of bias in their studies.
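As a concrete illustration of the two model families this review most often found to be best-performing, the sketch below benchmarks a random forest and a gradient-boosted ensemble with stratified cross-validated AUC. All data are synthetic, and scikit-learn's GradientBoostingClassifier stands in for the XGBoost package so the example has no third-party dependency; this is an illustrative stand-in, not the review's own analysis.

```python
# Hypothetical benchmark of the two most frequently best-performing
# model families in the review (random forest and XGBoost-style
# gradient boosting), scored by stratified cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced binary classification data as a stand-in.
X, y = make_classification(n_samples=500, n_features=30, weights=[0.8], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    # sklearn's gradient boosting stands in for the xgboost package here.
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUC = {aucs.mean():.3f} (+/- {aucs.std():.3f})")
```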
Affiliation(s)
- Aref Smiley
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States
- Joseph Finkelstein
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States
3. Tsegaye B, Snell KIE, Archer L, Kirtley S, Riley RD, Sperrin M, Van Calster B, Collins GS, Dhiman P. Larger sample sizes are needed when developing a clinical prediction model using machine learning in oncology: methodological systematic review. J Clin Epidemiol 2025; 180:111675. [PMID: 39814217 DOI: 10.1016/j.jclinepi.2025.111675]
Abstract
BACKGROUND AND OBJECTIVES Having a sufficient sample size is crucial when developing a clinical prediction model. We reviewed details of sample size in studies developing prediction models for binary outcomes using machine learning (ML) methods within oncology and compared the sample size used to develop the models with the minimum sample size required when developing a regression-based model (Nmin). METHODS We searched the Medline (via OVID) database for studies developing a prediction model using ML methods published in December 2022. We reviewed how sample size was justified, calculated Nmin, and compared it with the sample size used to develop the models. RESULTS Only one of the 36 included studies justified their sample size. We were able to calculate Nmin for 17 (47%) studies. Five of these 17 studies met Nmin, which is sufficient to precisely estimate the overall risk and to minimize overfitting. There was a median deficit of 302 participants with the event (n = 17; range: -21,331 to 2,298) when developing the ML models. An additional three of the 17 studies met the sample size required to precisely estimate the overall risk only. CONCLUSION Studies developing a prediction model using ML in oncology seldom justified their sample size, and sample sizes were often smaller than Nmin. As ML models almost certainly require a larger sample size than regression models, the deficit is likely larger. We recommend that researchers consider and report their sample size and at least meet the minimum sample size required when developing a regression-based model.
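For readers unfamiliar with Nmin, the following minimal sketch implements the three published Riley et al. criteria for the minimum sample size of a regression-based model with a binary outcome. The input values (number of predictor parameters, anticipated Cox-Snell R-squared, outcome prevalence) are invented for illustration and are not taken from this review.

```python
# Sketch of the Nmin criteria of Riley et al. for a regression-based
# model with a binary outcome. Inputs are illustrative assumptions.
import math

def riley_nmin(p, r2_cs, prevalence, shrinkage=0.9, delta=0.05):
    # Criterion 1: expected global shrinkage factor of at least `shrinkage`.
    n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    # Criterion 2: small optimism in apparent R^2 (<= delta), using the
    # maximum possible Cox-Snell R^2 for this outcome prevalence.
    ln_lnull = prevalence * math.log(prevalence) + (1 - prevalence) * math.log(1 - prevalence)
    r2_max = 1 - math.exp(2 * ln_lnull)
    s2 = r2_cs / (r2_cs + delta * r2_max)
    n2 = p / ((s2 - 1) * math.log(1 - r2_cs / s2))
    # Criterion 3: estimate the overall risk to within +/- delta.
    n3 = (1.96 / delta) ** 2 * prevalence * (1 - prevalence)
    return max(math.ceil(n1), math.ceil(n2), math.ceil(n3))

# Hypothetical model: 20 predictor parameters, anticipated R^2_CS of
# 0.1, outcome prevalence of 20%.
print(riley_nmin(p=20, r2_cs=0.1, prevalence=0.2))
```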
Affiliation(s)
- Biruk Tsegaye
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK.
- Kym I E Snell
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK; Institute of Translational Medicine, National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Lucinda Archer
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK; Institute of Translational Medicine, National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Shona Kirtley
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK; Institute of Translational Medicine, National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Matthew Sperrin
- Division of Imaging, Informatics and Data Science, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PL, UK
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands; Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
4. Moons KGM, Damen JAA, Kaul T, Hooft L, Andaur Navarro C, Dhiman P, Beam AL, Van Calster B, Celi LA, Denaxas S, Denniston AK, Ghassemi M, Heinze G, Kengne AP, Maier-Hein L, Liu X, Logullo P, McCradden MD, Liu N, Oakden-Rayner L, Singh K, Ting DS, Wynants L, Yang B, Reitsma JB, Riley RD, Collins GS, van Smeden M. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ 2025; 388:e082505. [PMID: 40127903 PMCID: PMC11931409 DOI: 10.1136/bmj-2024-082505]
Affiliation(s)
- Karel G M Moons
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Johanna A A Damen
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Tabea Kaul
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Lotty Hooft
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Constanza Andaur Navarro
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Paula Dhiman
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Andrew L Beam
- Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Leo Anthony Celi
- Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Spiros Denaxas
- Institute of Health Informatics, University College London, London, UK
- British Heart Foundation Data Science Centre, Health Data Research UK, London, UK
- Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Georg Heinze
- Institute of Clinical Biometrics, Centre for Medical Data Science, Medical University of Vienna, Vienna, Austria
- Lena Maier-Hein
- Division of Intelligent Medical Systems, German Cancer Research Centre (DKFZ), Heidelberg, Germany
- National Centre for Tumour Diseases (NCT) Heidelberg, Heidelberg, Germany
- Xiaoxuan Liu
- College of Medicine and Health, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK
- NIHR Birmingham Biomedical Research Centre, Birmingham, UK
- Patricia Logullo
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Melissa D McCradden
- Department of Bioethics, The Hospital for Sick Children, Toronto, ON, Canada
- Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Karandeep Singh
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
- Daniel S Ting
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- AI Office, Singapore Health Service, Duke-NUS Medical School, Singapore, Singapore
- Laure Wynants
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
- Bada Yang
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Johannes B Reitsma
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
- Richard D Riley
- School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK
- NIHR Birmingham Biomedical Research Centre, Birmingham, UK
- Gary S Collins
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Maarten van Smeden
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, 3508 GA Utrecht, Netherlands
5. Maru S, Kuwatsuru R, Matthias MD, Simpson RJ. Public Disclosure of Results From Artificial Intelligence/Machine Learning Research in Health Care: Comprehensive Analysis of ClinicalTrials.gov, PubMed, and Scopus Data (2010-2023). J Med Internet Res 2025; 27:e60148. [PMID: 40117574 PMCID: PMC11971578 DOI: 10.2196/60148]
Abstract
BACKGROUND Despite the rapid growth of research in artificial intelligence/machine learning (AI/ML), little is known about how often study results are disclosed in the years after study completion. OBJECTIVE We aimed to estimate the proportion of AI/ML research that reported results through ClinicalTrials.gov or peer-reviewed publications indexed in PubMed or Scopus. METHODS Using data from the Clinical Trials Transformation Initiative Aggregate Analysis of ClinicalTrials.gov, we identified studies initiated and completed between January 2010 and December 2023 that contained AI/ML-specific terms in the official title, brief summary, interventions, conditions, detailed descriptions, primary outcomes, or keywords. For 842 completed studies, we searched PubMed and Scopus for publications containing study identifiers and AI/ML-specific terms in relevant fields, such as the title, abstract, and keywords. We calculated disclosure rates within 3 years of study completion and median times to disclosure, measured from the "primary completion date" to the "results first posted date" on ClinicalTrials.gov or the earliest date of journal publication. RESULTS When restricted to studies completed before 2021, ensuring at least 3 years of follow-up in which to report results, 7.0% (22/316) disclosed results on ClinicalTrials.gov, 16.5% (52/316) in journal publications, and 20.6% (65/316) through either route within 3 years of completion. Disclosure rates were higher for trials: 11.0% (15/136) on ClinicalTrials.gov, 25.0% (34/136) in journal publications, and 30.1% (41/136) through either route. Randomized controlled trials had even higher disclosure rates: 12.2% (9/74) on ClinicalTrials.gov, 31.1% (23/74) in journal publications, and 36.5% (27/74) through either route. Nevertheless, most study findings (79.4%; 251/316) remained undisclosed 3 years after study completion. Trials using randomization (vs nonrandomized) or masking (vs open label) had higher disclosure rates and shorter times to disclosure. Most trials (85%; 305/357) had sample sizes of ≤1000, yet larger trials (n>1000) had higher publication rates (30.8%; 16/52) than smaller trials (17.4%; 53/305). Hospitals (12.4%; 42/340), academia (15.1%; 39/259), and industry (13.7%; 20/146) accounted for most publications. High-income countries accounted for 82.4% (89/108) of all published studies. Of studies with disclosed results, the median times to report through ClinicalTrials.gov and in journal publications were 505 days (IQR 399-676) and 407 days (IQR 257-674), respectively. Open-label trials were common (60%; 214/357), and single-center designs were prevalent in both trials (83.3%; 290/348) and observational studies (82.3%; 377/458). CONCLUSIONS For nearly 80% of completed studies, findings remained undisclosed 3 years after completion, raising questions about the representativeness of publicly available evidence. While methodological rigor was generally associated with higher publication rates, the predominance of single-center designs and high-income countries may limit the generalizability of the results currently accessible.
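The headline disclosure proportions above are simple binomial estimates; as a worked illustration, the sketch below reproduces one of them with a Wilson 95% confidence interval. The counts (65 of 316) come from the abstract; the choice of the Wilson interval is an assumption for illustration, as the abstract does not state an interval method.

```python
# Minimal sketch: disclosure proportion with a Wilson 95% CI,
# using the 65/316 "either route within 3 years" count reported above.
import math

def wilson_ci(k, n, z=1.96):
    p = k / n
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

lo, hi = wilson_ci(65, 316)
print(f"65/316 = {65/316:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```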
Affiliation(s)
- Shoko Maru
- Real-World Evidence and Data Assessment (READS), Graduate School of Medicine, Juntendo University, Tokyo, Japan
- Ryohei Kuwatsuru
- Real-World Evidence and Data Assessment (READS), Graduate School of Medicine, Juntendo University, Tokyo, Japan
- Department of Radiology, School of Medicine, Juntendo University, Tokyo, Japan
- Ross J Simpson
- Division of Cardiology, School of Medicine, University of North Carolina Chapel Hill, Chapel Hill, NC, United States
6. Jabbari M, Barati M, Kalhori A, Eini-Zinab H, Zayeri F, Poustchi H, Pourshams A, Hekmatdoost A, Malekzadeh R. Development of a Digestive Cancer Risk Score Based on Nutritional Predictors: A Risk Prediction Model in the Golestan Cohort Study. Nutr Cancer 2025; 77:518-529. [PMID: 40055926 DOI: 10.1080/01635581.2025.2474264]
Abstract
This study aimed to develop a simple, practical, non-laboratory scoring system to predict the risk of incident digestive cancers within the healthcare and clinical framework in the Iranian population. The study was conducted on data collected in the Golestan Cohort Study, in which 49,173 participants aged 37-80 years were recruited from Gonbad City and 326 rural villages in Iran and followed from 2004 to 2021. A non-laboratory model predicting the 15-year risk of digestive cancers from dietary predictors was developed and formulated as a simple scoring system. A total of 43,550 participants (25,249 women and 18,301 men) were included in the final analysis. The model's discrimination and calibration were assessed by the concordance statistic (C-statistic) and calibration plot, respectively. The model had acceptable discrimination in both the derivation (C-statistic: 0.76) and validation (C-statistic: 0.70) samples (p < 0.001), and its calibration in the derivation and validation datasets was 0.88 and 0.91, respectively. As an assessment tool, the established nutritional risk score is suitable for motivating at-risk individuals to change their lifestyles and dietary patterns to reduce future risks and prevent health problems.
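The two validation measures reported here, discrimination (C-statistic) and calibration, can be computed as in the minimal sketch below. The data are simulated placeholders; the logistic recalibration yielding a slope and intercept is one common way to summarize calibration and is an assumption, since the abstract reports calibration from a plot rather than stating this method.

```python
# Sketch: C-statistic plus calibration slope/intercept from a logistic
# recalibration of a score's linear predictor. Data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
lp = rng.normal(-2.0, 1.0, 5000)              # linear predictor of the score
y = rng.binomial(1, 1 / (1 + np.exp(-lp)))    # simulated long-term outcomes

c_statistic = roc_auc_score(y, lp)
# C=1e6 makes the fit effectively unpenalized.
recal = LogisticRegression(C=1e6).fit(lp.reshape(-1, 1), y)
print(f"C-statistic: {c_statistic:.2f}")
print(f"calibration slope: {recal.coef_[0][0]:.2f}, intercept: {recal.intercept_[0]:.2f}")
```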
Affiliation(s)
- Masoumeh Jabbari
- Department of Community Nutrition, Faculty of Nutrition Sciences and Food Technology, National Nutrition and Food Technology Research Institute, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Meisam Barati
- Department of Cellular and Molecular Nutrition, School of Nutrition Sciences and Dietetics, Tehran University of Medical Sciences, Tehran, Iran
- Ali Kalhori
- Department of Food Science and Technology, Nutritional Science, The Ohio State University, Columbus, Ohio, USA
- Hassan Eini-Zinab
- Department of Community Nutrition, Faculty of Nutrition Sciences and Food Technology, National Nutrition and Food Technology Research Institute, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farid Zayeri
- Proteomics Research Center and Department of Biostatistics, Faculty of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hossein Poustchi
- Liver and Pancreaticobiliary Disease Research Center, Digestive Diseases Research Institute, Shariati Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Akram Pourshams
- Digestive Oncology Research Center, Digestive Diseases Research Institute, Shariati Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Azita Hekmatdoost
- Department of Clinical Nutrition and Dietetics, Faculty of Nutrition and Food Technology, National Nutrition and Food Technology Research Institute, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Reza Malekzadeh
- Digestive Oncology Research Center, Digestive Diseases Research Institute, Shariati Hospital, Tehran University of Medical Sciences, Tehran, Iran
7. Zhong J, Liu X, Lu J, Yang J, Zhang G, Mao S, Chen H, Yin Q, Cen Q, Jiang R, Song Y, Lu M, Chu J, Xing Y, Hu Y, Ding D, Ge X, Zhang H, Yao W. Overlooked and underpowered: a meta-research addressing sample size in radiomics prediction models for binary outcomes. Eur Radiol 2025; 35:1146-1156. [PMID: 39789271 PMCID: PMC11835977 DOI: 10.1007/s00330-024-11331-0]
Abstract
OBJECTIVES To investigate how studies determine the sample size when developing radiomics prediction models for binary outcomes, and whether the sample size meets the estimates obtained by using established criteria. METHODS We identified radiomics studies published from 01 January 2023 to 31 December 2023 in seven leading peer-reviewed radiological journals. We reviewed the sample size justification methods and the actual sample sizes used, and compared the actual sample sizes to the estimates obtained by using the three established criteria proposed by Riley et al. We also investigated which study characteristics were associated with a sufficient sample size, that is, one meeting those estimates. RESULTS We included 116 studies. Eleven of the 116 studies justified the sample size, of which six performed an a priori sample size calculation. The median (first and third quartiles, Q1, Q3) total sample size was 223 (130, 463), and the median sample size for training was 150 (90, 288). The median (Q1, Q3) difference between the total sample size and the minimum sample size according to the established criteria was -100 (-216, 183), and the difference between the total sample size and a more restrictive approach based on the established criteria was -268 (-427, -157). The presence of external testing and the specialty of the topic were associated with a sufficient sample size. CONCLUSION Radiomics studies are often designed without sample size justification, and their sample sizes may be too small to avoid overfitting. Sample size justification is encouraged when developing a radiomics model. KEY POINTS Question: Sample size justification helps minimize overfitting when developing a radiomics model, but it is overlooked and underpowered in radiomics research. Findings: Few radiomics models justified, calculated, or reported their sample size, and most did not meet recent formal sample size criteria. Clinical relevance: Radiomics models are often designed without sample size justification; consequently, many are developed on samples too small to avoid overfitting. Justifying, performing, and reporting sample size considerations should be encouraged when developing radiomics models.
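The review's core comparison, actual sample size versus the minimum required, reduces to a per-study deficit summarized as a median (Q1, Q3). The sketch below illustrates that summary step; the numbers in the table are invented and are not the review's data.

```python
# Hypothetical illustration of summarizing sample-size deficits
# (actual total n minus the criterion-based minimum Nmin).
import pandas as pd

df = pd.DataFrame({
    "n_used": [223, 130, 463, 150, 90],   # invented actual sample sizes
    "n_min":  [400, 250, 300, 500, 120],  # invented criterion-based minimums
})
df["deficit"] = df["n_used"] - df["n_min"]  # negative = underpowered
q1, med, q3 = df["deficit"].quantile([0.25, 0.50, 0.75])
print(f"median deficit {med:.0f} (Q1 {q1:.0f}, Q3 {q3:.0f})")
```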
Affiliation(s)
- Jingyu Zhong
- Laboratory of Key Technology and Materials in Minimally Invasive Spine Surgery, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Center for Spinal Minimally Invasive Research, Shanghai Jiao Tong University, Shanghai, China.
- Department of Imaging, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Xianwei Liu
- Department of Imaging, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Junjie Lu
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Jiarui Yang
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Guangcheng Zhang
- Department of Orthopedics, Shanghai Sixth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shiqi Mao
- Department of Medical Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Haoda Chen
- Department of General Surgery, Pancreatic Disease Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Qian Yin
- Department of Pathology, Shanghai Sixth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Qingqing Cen
- Department of Dermatology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Run Jiang
- Department of Pharmacovigilance, SciClone Pharmaceuticals (Holdings) Ltd., Shanghai, China
- Yang Song
- MR Scientific Marketing, Siemens Healthineers Ltd., Shanghai, China
- Minda Lu
- MR Application, Siemens Healthineers Ltd., Shanghai, China
- Jingshen Chu
- Editorial Office of Journal of Diagnostics Concepts & Practice, Department of Science and Technology Development, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yue Xing
- Department of Imaging, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yangfan Hu
- Department of Imaging, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Defang Ding
- Department of Imaging, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiang Ge
- Department of Imaging, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Huan Zhang
- Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Weiwu Yao
- Laboratory of Key Technology and Materials in Minimally Invasive Spine Surgery, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Center for Spinal Minimally Invasive Research, Shanghai Jiao Tong University, Shanghai, China.
- Department of Imaging, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
8. Yan P, Xu Z, Hui X, Chu X, Chen Y, Yang C, Xu S, Cui H, Zhang L, Zhang W, Wang L, Zou Y, Ren Y, Liao J, Zhang Q, Yang K, Zhang L, Liu Y, Li J, Yang C, Yao Y, Liu Z, Jiang X, Zhang B. The reporting quality and methodological quality of dynamic prediction models for cancer prognosis. BMC Med Res Methodol 2025; 25:58. [PMID: 40025462 PMCID: PMC11872325 DOI: 10.1186/s12874-025-02516-2]
Abstract
BACKGROUND To evaluate the reporting quality and methodological quality of dynamic prediction model (DPM) studies on cancer prognosis. METHODS An extensive search for DPM studies on cancer prognosis was conducted in the MEDLINE, EMBASE, and Cochrane Library databases. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) were used to assess reporting quality and methodological quality, respectively. RESULTS A total of 34 DPM studies were identified since the first publication in 2005; the main modeling methods were the landmark model and the joint model. Regarding reporting quality, the median overall TRIPOD adherence score was 75%. Several TRIPOD items were poorly reported, especially the title (23.53%), model specification, including presentation (55.88%) and interpretation (50%) of DPM usage, and implications for clinical use and future research (29.41%). Concerning methodological quality, most studies were rated as low quality (n = 30) or unclear (n = 3), mainly due to issues in the statistical analysis. CONCLUSIONS The landmark model and the joint model show potential for dynamic prediction. The suboptimal reporting and methodological quality of current DPM studies should be improved to facilitate clinical application.
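An overall TRIPOD adherence score of the kind reported here is typically computed as the share of applicable checklist items that a study reports. The sketch below shows that arithmetic on a made-up toy item matrix; the exact scoring rules of this review are not restated in the abstract, so this is an illustrative assumption.

```python
# Sketch of a per-study TRIPOD adherence score: reported items divided
# by applicable items. Toy data: 1 = reported, 0 = not reported,
# None = not applicable to that study.
items = {
    "study_A": [1, 1, 0, None, 1, 0],
    "study_B": [1, 0, 0, 0, None, None],
}
for study, scored in items.items():
    applicable = [v for v in scored if v is not None]
    adherence = sum(applicable) / len(applicable)
    print(f"{study}: {adherence:.0%} of applicable TRIPOD items reported")
```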
Affiliation(s)
- Peijing Yan
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Clinical Research Center, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Zhengxing Xu
- Department of Epidemiology and Health Statistics, School of Public Health, Southwest Medical University, Luzhou, Sichuan, China
- Xu Hui
- Evidence Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, Gansu, China
- Evidence-Based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu, China
- Xiajing Chu
- Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, ON, Canada
- Yizhuo Chen
- The Second Clinical Medical Hospital, Lanzhou University, Lanzhou, Gansu, China
- Chao Yang
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Shixi Xu
- Department of Preventive Medicine, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Huijie Cui
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Li Zhang
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Wenqiang Zhang
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Liqun Wang
- Department of Hygienic Toxicology, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Yanqiu Zou
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Yan Ren
- Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Jiaqiang Liao
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Qin Zhang
- Department of Occupational and Environmental Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Kehu Yang
- Evidence Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, Gansu, China
- Evidence-Based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu, China
- Key Laboratory of Evidence Based Medicine and Knowledge Translation of Gansu Province, Lanzhou, Gansu, China
- Ling Zhang
- Department of Iatrical Polymer Material and Artificial Apparatus, School of Polymer Science and Engineering, Sichuan University, Chengdu, Sichuan, China
- Yunjie Liu
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Jiayuan Li
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Chunxia Yang
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Yuqin Yao
- Department of Hygienic Toxicology, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China
- Zhenmi Liu
- Department of Maternal, Child and Adolescent Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China.
- West China School of Public Health and West China Fourth Hospital, Sichuan University, No. 16, Section 3, South Renmin Road, Wuhou District, Chengdu, 610041, China.
- Xia Jiang
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China.
- Department of Nutrition and Food Hygiene, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China.
- West China School of Public Health and West China Fourth Hospital, Sichuan University, No. 16, Section 3, South Renmin Road, Wuhou District, Chengdu, 610041, China.
- Ben Zhang
- Department of Epidemiology and Biostatistics, Institute of Systems Epidemiology and West China-PUMC C. C. Chen Institute of Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China.
- Department of Occupational and Environmental Health, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, China.
- West China School of Public Health and West China Fourth Hospital, Sichuan University, No. 16, Section 3, South Renmin Road, Wuhou District, Chengdu, 610041, China.
9. Santos CS, Amorim-Lopes M. Externally validated and clinically useful machine learning algorithms to support patient-related decision-making in oncology: a scoping review. BMC Med Res Methodol 2025; 25:45. [PMID: 39984835 PMCID: PMC11843972 DOI: 10.1186/s12874-025-02463-y]
Abstract
BACKGROUND This scoping review systematically maps externally validated machine learning (ML)-based models in cancer patient care, quantifying their performance and clinical utility and examining relationships between models, cancer types, and clinical decisions. By synthesizing evidence, this study identifies strengths, limitations, and areas requiring further research. METHODS The review followed the Joanna Briggs Institute's methodology, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews guidelines, and the Population, Concept, and Context mnemonic. Searches were conducted across Embase, IEEE Xplore, PubMed, Scopus, and Web of Science (January 2014-September 2022), targeting English-language quantitative studies in Q1 journals (SCImago Journal Rank > 1) that used ML to evaluate clinical outcomes for human cancer patients with commonly available data. Eligible models required external validation, clinical utility assessment, and performance metric reporting. Studies involving genetics, synthetic patients, plants, or animals were excluded. Results were presented in tabular, graphical, and descriptive form. RESULTS From 4023 deduplicated abstracts and 636 full-text reviews, 56 studies (2018-2022) met the inclusion criteria, covering diverse cancer types and applications. Convolutional neural networks were most prevalent and demonstrated high performance, followed by gradient- and decision tree-based algorithms. Other algorithms, though underrepresented, showed promise. Lung and digestive system cancers were most frequently studied, with a focus on diagnosis and outcome prediction. Most studies were retrospective and multi-institutional, primarily using image-based data, followed by text-based and hybrid approaches. Clinical utility assessments involved 499 clinicians and 12 tools, indicating improved clinician performance with AI assistance and performance superior to standard clinical systems. DISCUSSION Interest in ML-based clinical decision-making has grown in recent years alongside increased multi-institutional collaboration. However, small sample sizes likely impacted data quality and generalizability. Persistent challenges include limited international validation across ethnicities, inconsistent data sharing, disparities in validation metrics, and insufficient calibration reporting, hindering reliable model comparison. CONCLUSION Successful integration of ML in oncology decision-making requires standardized data and methodologies, larger sample sizes, greater transparency, and robust validation and clinical utility assessments. OTHER Financed by FCT-Fundação para a Ciência e a Tecnologia (Portugal, project LA/P/0063/2020, grant 2021.09040.BD) as part of CSS's PhD. This work was not registered.
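The utility assessments this review describes were clinician reader studies; a common purely quantitative complement is decision curve analysis, which summarizes clinical utility as net benefit at a chosen threshold probability. The sketch below implements the standard net benefit formula on simulated predictions; it is an illustrative alternative approach, not a method the review itself applied.

```python
# Sketch of net benefit at threshold probability pt:
# NB = TP/n - FP/n * pt / (1 - pt). Predictions and outcomes simulated.
import numpy as np

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.3, 1000)                              # simulated outcomes
pred = np.clip(y * 0.2 + rng.uniform(0, 0.8, 1000), 0, 1)   # noisy risk estimates

def net_benefit(y, pred, pt):
    treat = pred >= pt                  # "treat" everyone above the threshold
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    n = len(y)
    return tp / n - fp / n * pt / (1 - pt)

for pt in (0.1, 0.2, 0.3):
    print(f"threshold {pt:.1f}: net benefit {net_benefit(y, pred, pt):.3f}")
```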
Affiliation(s)
- Catarina Sousa Santos
- Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Porto, Portugal.
- Mário Amorim-Lopes
- Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Porto, Portugal
10. Meijerink LM, Dunias ZS, Leeuwenberg AM, de Hond AAH, Jenkins DA, Martin GP, Sperrin M, Peek N, Spijker R, Hooft L, Moons KGM, van Smeden M, Schuit E. Updating methods for artificial intelligence-based clinical prediction models: a scoping review. J Clin Epidemiol 2025; 178:111636. [PMID: 39662644 DOI: 10.1016/j.jclinepi.2024.111636]
Abstract
OBJECTIVES To give an overview of methods for updating artificial intelligence (AI)-based clinical prediction models based on new data. STUDY DESIGN AND SETTING We comprehensively searched Scopus and Embase up to August 2022 for articles that addressed developments, descriptions, or evaluations of prediction model updating methods. We specifically focused on articles in the medical domain involving AI-based prediction models that were updated based on new data, excluding regression-based updating methods as these have been extensively discussed elsewhere. We categorized and described the identified methods used to update the AI-based prediction models, as well as the use cases in which they were used. RESULTS We included 78 articles. The majority of the included articles discussed updating for neural network methods (93.6%), with medical images as input data (65.4%). In many articles (51.3%), existing pretrained models for broad tasks were updated to perform specialized clinical tasks. Other common reasons for model updating were to address changes in the data over time and cross-center differences; however, more unique use cases were also identified, such as updating a model from a broad population to a specific individual. We categorized the identified model updating methods into four categories: neural network-specific methods (described in 92.3% of the articles), ensemble-specific methods (2.5%), model-agnostic methods (9.0%), and other (1.3%). Variations of neural network-specific methods are further categorized based on (1) the part of the original neural network that is kept, (2) whether and how the original neural network is extended with new parameters, and (3) to what extent the original neural network parameters are adjusted to the new data. The most frequently occurring method (n = 30) involved selecting the first layer(s) of an existing neural network, appending new, randomly initialized layers, and then optimizing the entire neural network. CONCLUSION We identified many ways to adjust or update AI-based prediction models based on new data, within a large variety of use cases. Updating methods for AI-based prediction models other than neural networks (eg, random forest) appear to be underexplored in clinical prediction research. PLAIN LANGUAGE SUMMARY AI-based prediction models are increasingly used in health care, helping clinicians with diagnosing diseases, guiding treatment decisions, and informing patients. However, these prediction models do not always work well when applied to hospitals, patient populations, or times different from those used to develop the models. Developing new models for every situation is neither practical nor desired, as it wastes resources, time, and existing knowledge. A more efficient approach is to adjust existing models to new contexts ('updating'), but there is limited guidance on how to do this for AI-based clinical prediction models. To address this, we reviewed 78 studies in detail to understand how researchers are currently updating AI-based clinical prediction models and the types of situations in which these updating methods are used. Our findings provide a comprehensive overview of the available methods to update existing models, intended to serve as guidance and inspiration for researchers. Ultimately, this can lead to better reuse of existing models and improve the quality and efficiency of AI-based prediction models in health care.
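The most frequent updating pattern the review identified (keep the first layers, append new randomly initialized layers, optimize the whole network) is sketched below in PyTorch. The pattern itself comes from the abstract; the tiny architecture, hyperparameters, and random data are placeholders for illustration.

```python
# Sketch of the review's most frequent updating pattern. The network
# and data below are invented placeholders.
import torch
import torch.nn as nn

pretrained = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
# 1. Keep the first layer(s) of the existing network ...
backbone = nn.Sequential(*list(pretrained.children())[:4])
# 2. ... append new, randomly initialized layers ...
updated = nn.Sequential(backbone, nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 1))
# 3. ... then optimize the entire network on the new data.
optimiser = torch.optim.Adam(updated.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

x_new, y_new = torch.randn(256, 32), torch.randint(0, 2, (256, 1)).float()
for _ in range(5):
    optimiser.zero_grad()
    loss = loss_fn(updated(x_new), y_new)
    loss.backward()
    optimiser.step()
```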
Affiliation(s)
- Lotta M Meijerink
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
- Zoë S Dunias
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Artuur M Leeuwenberg
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Anne A H de Hond
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- David A Jenkins
- Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom
- Glen P Martin
- Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom
- Matthew Sperrin
- Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom
- Niels Peek
- Department of Public Health and Primary Care, The Healthcare Improvement Studies Institute, University of Cambridge, Cambridge, United Kingdom
- René Spijker
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Ewoud Schuit
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
11. Gonzalez R, Saha A, Campbell CJ, Nejat P, Lokker C, Norgan AP. Seeing the random forest through the decision trees. Supporting learning health systems from histopathology with machine learning models: challenges and opportunities. J Pathol Inform 2024; 15:100347. [PMID: 38162950 PMCID: PMC10755052 DOI: 10.1016/j.jpi.2023.100347]
Abstract
This paper discusses some overlooked challenges faced when working with machine learning (ML) models for histopathology and presents a novel opportunity to support "Learning Health Systems" with them. First, the authors elaborate on these challenges, separating them by mitigation strategy: those that need innovative approaches, time, or future technological capabilities, and those that require a conceptual reappraisal from a critical perspective. They then present an opportunity to support "Learning Health Systems" by integrating hidden information extracted by ML models from digitized histopathology slides with other healthcare big data.
Affiliation(s)
- Ricardo Gonzalez
- DeGroote School of Business, McMaster University, Hamilton, Ontario, Canada
- Division of Computational Pathology and Artificial Intelligence, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
- Ashirbani Saha
- Department of Oncology, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Escarpment Cancer Research Institute, McMaster University and Hamilton Health Sciences, Hamilton, Ontario, Canada
- Clinton J.V. Campbell
- William Osler Health System, Brampton, Ontario, Canada
- Department of Pathology and Molecular Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Peyman Nejat
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Cynthia Lokker
- Health Information Research Unit, Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
- Andrew P. Norgan
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
12. Rosengaard LO, Andersen MZ, Rosenberg J, Fonnes S. Several methods for assessing research waste in reviews with a systematic search: a scoping review. PeerJ 2024; 12:e18466. [PMID: 39575170 PMCID: PMC11580664 DOI: 10.7717/peerj.18466]
Abstract
Background Research waste is present in all study designs and can have significant consequences for science, including reducing the reliability of research findings and contributing to the inefficient use of resources. Estimates suggest that as much as 85% of all biomedical research is wasted. However, it is uncertain how avoidable research waste is assessed in specific types of study designs and what methods could be used to examine different aspects of research waste. We aimed to investigate which methods systematic reviews, scoping reviews, and overviews of reviews discussing research waste have used to assess avoidable research waste. Materials and Methods We prospectively published a protocol in the Open Science Framework (https://osf.io/2fbp4). We searched PubMed and Embase with a 30-year limit (January 1993-August 2023). The concept examined was how research waste and related terms (e.g., unnecessary, redundant, or duplicate research) were assessed in reviews with a systematic search: systematic reviews, scoping reviews, or overviews of reviews. We extracted data on the methods each review used to examine research waste and the study designs to which those methods were applied. Results The search identified 4,285 records, of which 93 reviews with systematic searches were included. The reviews examined a median of 90 (range 10-6,781) studies; the study designs most commonly included were randomized controlled trials (48%) and systematic reviews (33%). In the last ten years, the number of reports assessing research waste has increased. More than 50% of the examined reviews reported evaluating methodological research waste among included studies, typically using tools such as one of the Cochrane risk of bias tools (n = 8) for randomized controlled trials or AMSTAR 1 or 2 (n = 12) for systematic reviews. One-fourth of the reviews assessed adherence to reporting guidelines, e.g., CONSORT (n = 4) for randomized controlled trials or PRISMA (n = 6) for systematic reviews. Conclusion Reviews with systematic searches focus on methodological quality and reporting guideline adherence when examining research waste. However, this scoping review revealed that a wide range of tools are used, which may pose difficulties in comparing examinations and performing meta-research. This review aids researchers in selecting methodologies and contributes to the ongoing discourse on optimizing research efficiency.
Affiliation(s)
- Louise Olsbro Rosengaard
- Center for Perioperative Optimization, Department of Surgery, Copenhagen University Hospital - Herlev and Gentofte, Denmark
- Mikkel Zola Andersen
- Center for Perioperative Optimization, Department of Surgery, Copenhagen University Hospital - Herlev and Gentofte, Denmark
- Jacob Rosenberg
- Center for Perioperative Optimization, Department of Surgery, Copenhagen University Hospital - Herlev and Gentofte, Denmark
- Siv Fonnes
- Center for Perioperative Optimization, Department of Surgery, Copenhagen University Hospital - Herlev and Gentofte, Denmark
13. Krepper D, Cesari M, Hubel NJ, Zelger P, Sztankay MJ. Machine learning models including patient-reported outcome data in oncology: a systematic literature review and analysis of their reporting quality. J Patient Rep Outcomes 2024; 8:126. [PMID: 39499409 PMCID: PMC11538124 DOI: 10.1186/s41687-024-00808-7]
Abstract
PURPOSE To critically examine the current state of machine learning (ML) models including patient-reported outcome measure (PROM) scores in cancer research, by investigating the reporting quality of currently available studies and proposing areas of improvement for future use of ML in the field. METHODS PubMed and Web of Science were systematically searched for publications of studies on patients with cancer applying ML models with PROM scores as either predictors or outcomes. The reporting quality of the applied ML models was assessed using an adapted version of the MI-CLAIM (Minimum Information about CLinical Artificial Intelligence Modelling) checklist. The key variables of the checklist are study design, data preparation, model development, optimization, performance, and examination; reproducibility and transparency complement these reporting quality criteria. RESULTS The literature search yielded 1634 hits, of which 52 (3.2%) were eligible. Thirty-six (69.2%) publications included PROM scores as a predictor and 32 (61.5%) as an outcome. The appraisal indicates potential for improvement, especially in the area of model examination: by the standards of the MI-CLAIM checklist, the reporting quality of ML models in the included studies was low. Only nine (17.3%) publications discussed the clinical applicability of the developed model, and only three (5.8%) provided code to reproduce the model and its results. CONCLUSION This critical examination of the status quo of ML models including PROM scores in published oncological studies allowed the identification of areas of improvement for reporting and future use of ML in the field.
Affiliation(s)
- Daniela Krepper
- Department of Psychiatry, Psychotherapy, Psychosomatics and Medical Psychology, University Hospital of Psychiatry II, Medical University of Innsbruck, Innsbruck, Austria.
- Matteo Cesari
- Department of Neurology and Neurosurgery, Medical University of Innsbruck, Innsbruck, Austria
- Niclas J Hubel
- Department of Psychiatry, Psychotherapy, Psychosomatics and Medical Psychology, University Hospital of Psychiatry II, Medical University of Innsbruck, Innsbruck, Austria
- Philipp Zelger
- University Hospital for Hearing, Speech & Voice Disorders, Medical University of Innsbruck, Innsbruck, Austria
- Monika J Sztankay
- Department of Psychiatry, Psychotherapy, Psychosomatics and Medical Psychology, University Hospital of Psychiatry II, Medical University of Innsbruck, Innsbruck, Austria
14. Feller D, Wingbermuhle R, Berg B, Vigdal ØN, Innocenti T, Grotle M, Ostelo R, Chiarotto A. Improvements Are Needed in the Adherence to the TRIPOD Statement for Clinical Prediction Models for Patients With Spinal Pain or Osteoarthritis: A Metaresearch Study. J Pain 2024; 25:104624. [PMID: 39002741 DOI: 10.1016/j.jpain.2024.104624]
Abstract
This metaresearch study aimed to evaluate the completeness of reporting of prediction model studies in patients with spinal pain or osteoarthritis (OA), in terms of adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. We searched for prognostic and diagnostic prediction models in patients with spinal pain or OA in MEDLINE, Embase, Web of Science, and CINAHL. Using a standardized assessment form, we assessed the adherence of the included studies to the TRIPOD. Two independent reviewers performed the study selection and data extraction phases. We included 66 studies. Approximately 35% of the studies declared that they had used the TRIPOD. The median adherence to the TRIPOD was 59% overall (interquartile range [IQR]: 21.8), with the items of the methods and results sections being the worst reported. Studies on neck pain had better adherence to the TRIPOD than studies on back pain and OA (medians of 76.5%, 59%, and 53%, respectively). External validation studies had the highest total adherence (median: 79.5%, IQR: 12.8) of all study types. The median overall adherence was 4 percentage points higher in studies that declared TRIPOD use than in those that did not. Finally, we did not observe any improvement in adherence over the years. The adherence to the TRIPOD of prediction models in the spinal and OA fields is low, with the methods and results sections being the most poorly reported. Future studies on prediction models in spinal pain and OA should follow the TRIPOD to improve their reporting completeness. PERSPECTIVE: This article provides data on adherence to the TRIPOD statement in 66 prediction model studies for spinal pain or OA. Adherence was found to be low (median 59%). This inadequate reporting may negatively impact the effective use of these models in clinical practice.
Affiliation(s)
- Daniel Feller
- Department of Rehabilitation, Provincial Agency for Health of the Autonomous Province of Trento, Trento, Italy; Department of Human Resources, Provincial Agency for Health of the Autonomous Province of Trento, Trento, Italy; Department of General Practice, Erasmus MC, University Medical Center, Rotterdam, the Netherlands
- Roel Wingbermuhle
- Department of General Practice, Erasmus MC, University Medical Center, Rotterdam, the Netherlands; Department of Physiotherapy and Rehabilitation sciences, SOMT University of Physiotherapy, Amersfoort, the Netherlands
- Bjørnar Berg
- Centre for Intelligent Musculoskeletal Health, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
- Ørjan Nesse Vigdal
- Department of Rehabilitation Science and Health Technology, Faculty of Health Science, OsloMet - Oslo Metropolitan University, Oslo, Norway
- Tiziano Innocenti
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences, Amsterdam, the Netherlands; GIMBE Foundation, Bologna, Italy
- Margreth Grotle
- Centre for Intelligent Musculoskeletal Health, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway; Division of Clinical Neuroscience, Department of Research and Innovation, Oslo University Hospital, Oslo, Norway
- Raymond Ostelo
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences, Amsterdam, the Netherlands; Department of Epidemiology and Data Science, Amsterdam UMC, Vrije Universiteit & Amsterdam Movement Sciences, Musculoskeletal Health, Amsterdam, the Netherlands
- Alessandro Chiarotto
- Department of General Practice, Erasmus MC, University Medical Center, Rotterdam, the Netherlands

15
White N, Parsons R, Borg D, Collins G, Barnett A. Planned but ever published? A retrospective analysis of clinical prediction model studies registered on clinicaltrials.gov since 2000. J Clin Epidemiol 2024; 173:111433. [PMID: 38897482 DOI: 10.1016/j.jclinepi.2024.111433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Revised: 06/10/2024] [Accepted: 06/12/2024] [Indexed: 06/21/2024]
Abstract
OBJECTIVES To describe the characteristics and publication outcomes of clinical prediction model studies registered on clinicaltrials.gov since 2000. STUDY DESIGN AND SETTING We analyzed observational studies registered on clinicaltrials.gov between January 1, 2000, and March 2, 2022, describing the development of a new clinical prediction model or the validation of an existing model for predicting individual-level prognostic or diagnostic risk. Eligible clinicaltrials.gov records were classified by modeling study type (development, validation) and the model outcome being predicted (prognostic, diagnostic). Recorded characteristics included study status, sample size information, Medical Subject Headings, and plans to share individual participant data. Publication outcomes were analyzed by linking National Clinical Trial numbers for eligible records with PubMed abstracts. RESULTS Nine hundred twenty-eight records were analyzed from a possible 89,896 observational study records. Publication searches found 170 matching peer-reviewed publications for 137 clinicaltrials.gov records. The estimated proportion of records with one or more matching publications, after accounting for time since study start, was 2.8% at 2 years (95% CI 1.7% to 3.9%), 12.3% at 5 years (95% CI 9.8% to 14.9%), and 27% at 10 years (95% CI 23% to 33%). Stratifying records by study start year indicated that publication proportions improved over time. Records tended to prioritize the development of new prediction models (76%; 704/928) over the validation of existing models (24%; 182/928). At the time of download, 27% of records were marked as complete, 35% were still recruiting, and 14.7% had unknown status. Only 7.4% of records stated plans to share individual participant data. CONCLUSION Published clinical prediction model studies are only a fraction of overall research efforts, with many studies planned but not completed or published. Improving the uptake of study preregistration and follow-up will increase the visibility of planned research. Introducing additional registry features and guidance may improve the identification of clinical prediction model studies posted to clinical registries.
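The "proportion published by t years" figures are a time-to-event estimate. Below is a minimal sketch of that approach using the Kaplan-Meier estimator from the lifelines library; the durations, event indicator, and all numbers are illustrative assumptions, not the study's data.

```python
# Sketch of estimating the proportion of registered studies with at least
# one publication by t years after study start: publication is treated as
# an event, and the cumulative proportion is one minus the Kaplan-Meier
# "still unpublished" curve. Toy data only.
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
years_followed = rng.uniform(1, 20, size=500)   # time from study start
published = rng.random(500) < 0.2               # True = publication found

kmf = KaplanMeierFitter()
kmf.fit(durations=years_followed, event_observed=published)

for t in (2, 5, 10):
    print(f"estimated proportion published by {t} years:"
          f" {1 - kmf.predict(t):.1%}")
```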
Affiliation(s)
- Nicole White
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, Kelvin Grove, Queensland, Australia
- Rex Parsons
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, Kelvin Grove, Queensland, Australia
- David Borg
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, Kelvin Grove, Queensland, Australia; School of Exercise and Nutrition Sciences, Queensland University of Technology, Kelvin Grove, Queensland, Australia
- Gary Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Oxford, United Kingdom
- Adrian Barnett
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, Kelvin Grove, Queensland, Australia

16
Clift AK, Mahon H, Khan G, Boardman-Pretty F, Worker A, Marchini E, Buendia O, Fish P, Khan MS. Identifying patients with undiagnosed small intestinal neuroendocrine tumours in primary care using statistical and machine learning: model development and validation study. Br J Cancer 2024; 131:305-311. [PMID: 38831012 PMCID: PMC11263687 DOI: 10.1038/s41416-024-02736-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 05/10/2024] [Accepted: 05/23/2024] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND Neuroendocrine tumours (NETs) are increasing in incidence and are often diagnosed at advanced stages; individuals may experience years of diagnostic delay, particularly when tumours arise from the small intestine (SI). Clinical prediction models could present novel opportunities for case finding in primary care. METHODS An open cohort of adults (18+ years) contributing data to the Optimum Patient Care Research Database between 1st Jan 2000 and 30th March 2023 was identified. This database collects de-identified data from general practices in the UK. Model development approaches comprised logistic regression, penalised regression, and XGBoost. Performance (discrimination and calibration) was assessed using internal-external cross-validation. Decision curve analysis was used to compare clinical utility. RESULTS Of 11.7 million individuals, 382 had recorded SI NET diagnoses (0.003%). The XGBoost model had the highest AUC (0.869, 95% confidence interval [CI]: 0.841-0.898) but was mildly miscalibrated (slope 1.165, 95% CI: 1.088-1.243; calibration-in-the-large 0.010, 95% CI: -0.164 to 0.185). Clinical utility was similar across all models. DISCUSSION Multivariable prediction models may have clinical utility in identifying individuals with undiagnosed SI NETs using information in their primary care records. Further evaluation, including external validation and health economics modelling, may identify cost-effective strategies for case finding for this uncommon tumour.
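The calibration slope and calibration-in-the-large quoted here are commonly estimated by regressing the observed outcome on the logit of the predicted risks. A minimal sketch under that assumption, using simulated data and statsmodels (not the authors' code):

```python
# Calibration slope: coefficient from a logistic regression of the outcome
# on the logit of the predicted risks. Calibration-in-the-large: intercept
# from the same regression with that logit entered as an offset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
p_hat = rng.uniform(0.01, 0.99, 2000)     # a model's predicted risks
y = rng.binomial(1, p_hat)                # observed binary outcomes
lp = np.log(p_hat / (1 - p_hat))          # linear predictor (logit scale)

slope_fit = sm.GLM(y, sm.add_constant(lp),
                   family=sm.families.Binomial()).fit()
citl_fit = sm.GLM(y, np.ones((len(lp), 1)),
                  family=sm.families.Binomial(), offset=lp).fit()

print(f"calibration slope: {slope_fit.params[1]:.3f}")        # ~1 is ideal
print(f"calibration-in-the-large: {citl_fit.params[0]:.3f}")  # ~0 is ideal
```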
Affiliation(s)
- Peter Fish
- Mendelian, The Trampery Old Street, London, UK
- Mohid S Khan
- South Wales Neuroendocrine Cancer Service, University Hospital of Wales, Cardiff and Vale University Health Board, Heath Park, Cardiff, UK
- Cardiff University, School of Medicine, University Hospital of Wales, Cardiff, UK

17
Perez M, Palnaes Hansen C, Burdio F, Sanchez-Velázquez P, Giuliani A, Lancellotti F, de Liguori-Carino N, Malleo G, Marchegiani G, Podda M, Pisanu A, De Luca GM, Anselmo A, Siragusa L, Kobbelgaard Burgdorf S, Tschuor C, Cacciaguerra AB, Koh YX, Masuda Y, Hao Xuan MY, Seeger N, Breitenstein S, Grochola FL, Di Martino M, Secanella L, Busquets J, Dorcaratto D, Mora-Oliver I, Ingallinella S, Salvia R, Abu Hilal M, Aldrighetti L, Ielpo B. A machine learning predictive model for recurrence of resected distal cholangiocarcinoma: Development and validation of predictive model using artificial intelligence. Eur J Surg Oncol 2024; 50:108375. [PMID: 38795677 DOI: 10.1016/j.ejso.2024.108375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Revised: 04/20/2024] [Accepted: 04/27/2024] [Indexed: 05/28/2024]
Abstract
INTRODUCTION Distal cholangiocarcinoma (dCCA) presents a challenge in hepatobiliary oncology that requires nuanced post-resection prognostic modeling. Conventional staging criteria may oversimplify the complexities of dCCA, prompting the exploration of novel prognostic factors and methodologies, including machine learning algorithms. This study aimed to develop a machine learning predictive model for recurrence after resected dCCA. MATERIAL AND METHODS This retrospective multicentric observational study included patients with dCCA from 13 international centers who underwent curative pancreaticoduodenectomy (PD). A LASSO-regularized Cox regression model was used for feature selection, to examine the coefficient paths, and to create a model to predict recurrence. Internal and external validation and model performance were assessed using the C-index. Additionally, a web application was developed to enhance the clinical use of the algorithm. RESULTS Among 654 patients, LNR (lymph node ratio) 15, neural invasion, N stage, surgical radicality, and differentiation grade emerged as significant predictors of disease-free survival (DFS). The model showed the best discrimination capacity with a C-index of 0.80 (95% CI 0.77 to 0.86) and highlighted LNR15 as the most influential factor. Internal and external validations showed the model's robustness and discriminative ability, with areas under the curve of 92.4% (95% CI 88.2% to 94.4%) and 91.5% (95% CI 88.4% to 93.5%), respectively. The predictive model is available at https://imim.shinyapps.io/LassoCholangioca/. CONCLUSIONS This study pioneers the integration of machine learning into prognostic modeling for dCCA, yielding a robust predictive model for DFS following PD. The tool can provide information to both patients and healthcare providers, enhancing tailored treatments and follow-up.
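A hedged sketch of a LASSO-regularised Cox model with a concordance (C-index) check, in the spirit of the approach described above. The lifelines library, the penalizer value, and all column names and data are our own illustrative assumptions, not the authors' implementation.

```python
# Penalised Cox regression on synthetic recurrence data. With l1_ratio=1.0
# the penalty is pure L1 (LASSO); the penalizer strength would normally be
# tuned, e.g. by cross-validation over a path of values.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "lnr": rng.uniform(0, 0.6, n),           # lymph node ratio (toy)
    "neural_invasion": rng.integers(0, 2, n),
    "grade": rng.integers(1, 4, n),
    "time": rng.exponential(24, n),          # months to recurrence/censoring
    "event": rng.integers(0, 2, n),          # 1 = recurrence observed
})

cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="time", event_col="event")
print(cph.params_)                            # shrunk coefficients
print(f"C-index: {cph.concordance_index_:.2f}")
```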
Affiliation(s)
- Marc Perez
- Hepato Pancreato Biliary Division, Hospital Del Mar, Universitat Pompeu Fabra, Barcelona, Spain
- Fernando Burdio
- Hepato Pancreato Biliary Division, Hospital Del Mar, Universitat Pompeu Fabra, Barcelona, Spain
- Antonio Giuliani
- Unit of General Surgery, San Giuseppe Moscati Hospital, Aversa, Italy
- Francesco Lancellotti
- Department of Hepato-Pancreato-Biliary Surgery, Manchester Royal Infirmary, University of Manchester, Manchester, United Kingdom
- Nicola de Liguori-Carino
- Department of Hepato-Pancreato-Biliary Surgery, Manchester Royal Infirmary, University of Manchester, Manchester, United Kingdom
- Giuseppe Malleo
- Unit of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, Italy
- Giovanni Marchegiani
- Hepato Biliary Pancreatic (HPB) and Liver Transplant Surgery, Department of Surgery, Oncology and Gastroenterology (DiSCOG), Padova University, Padova, Italy
- Mauro Podda
- Department of Surgical Science, University of Cagliari, Cagliari, Italy
- Adolfo Pisanu
- Department of Surgical Science, University of Cagliari, Cagliari, Italy
- Giuseppe Massimiliano De Luca
- University of Bari "A. Moro", Department of Biomedical Sciences and Human Oncology, Unit of Academic General Surgery "V. Bonomo", Bari, Italy
- Alessandro Anselmo
- Department of Surgery, HPB and Transplant Surgery Unit, Policlinico Tor Vergata, Rome, Italy
- Leandro Siragusa
- Division of Colon and Rectal Surgery, IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
- Christoph Tschuor
- Department of Surgery, Rigshospitalet, University of Copenhagen, Denmark
- Andrea Benedetti Cacciaguerra
- HPB Surgery and Transplantation Unit, Department of Clinical and Experimental Medicine, Polytechnic University of Marche, Ancona, Italy
- Ye Xin Koh
- Department of Hepatopancreatobiliary and Transplant Surgery, Singapore General Hospital, Singapore
- Yoshio Masuda
- Department of Hepatopancreatobiliary and Transplant Surgery, Singapore General Hospital, Singapore
- Mark Yeo Hao Xuan
- HPB Unit, Department of Surgery, Cantonal Hospital of Winterthur, Winterthur, Switzerland
- Nico Seeger
- HPB Unit, Department of Surgery, Cantonal Hospital of Winterthur, Winterthur, Switzerland
- Stefan Breitenstein
- HPB Unit, Department of Surgery, Cantonal Hospital of Winterthur, Winterthur, Switzerland
- Filip Lukasz Grochola
- HPB Unit, Department of Surgery, Cantonal Hospital of Winterthur, Winterthur, Switzerland
- Marcello Di Martino
- Department of Health Sciences, University of Piemonte Orientale, Novara, Italy
- Dimitri Dorcaratto
- Department of General Surgery, Biomedical Research Institute INCLIVA, Hospital Clínico Universitario, University of Valencia, Spain
- Isabel Mora-Oliver
- Department of General Surgery, Biomedical Research Institute INCLIVA, Hospital Clínico Universitario, University of Valencia, Spain
- Roberto Salvia
- Unit of General and Pancreatic Surgery, The Pancreas Institute, University of Verona Hospital Trust, Italy
- Benedetto Ielpo
- Hepato Pancreato Biliary Division, Hospital Del Mar, Universitat Pompeu Fabra, Barcelona, Spain

18
Lyman GH, Kuderer NM. Artificial Intelligence in Cancer Clinical Research: II. Development and Validation of Clinical Prediction Models. Cancer Invest 2024; 42:447-451. [PMID: 38775011 DOI: 10.1080/07357907.2024.2354991] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/13/2024]
Affiliation(s)
- Gary H Lyman
- Editor-in-Chief, Cancer Investigation; Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Nicole M Kuderer
- Deputy Editor, Cancer Investigation; Advanced Cancer Research Group, Kirkland, WA, USA

19
Dhiman P, Ma J, Kirtley S, Mouka E, Waldron CM, Whittle R, Collins GS. Prediction model protocols indicate better adherence to recommended guidelines for study conduct and reporting. J Clin Epidemiol 2024; 169:111287. [PMID: 38387617 DOI: 10.1016/j.jclinepi.2024.111287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2023] [Revised: 02/07/2024] [Accepted: 02/15/2024] [Indexed: 02/24/2024]
Abstract
BACKGROUND AND OBJECTIVE Protocols are invaluable documents for any research study, especially for prediction model studies. However, the mere existence of a protocol is insufficient if key details are omitted. We reviewed the reporting content and the details of the proposed design and methods in published protocols for prediction model research. METHODS We searched MEDLINE, Embase, and the Web of Science Core Collection for protocols of studies developing or validating a diagnostic or prognostic model using any modeling approach in any clinical area. We screened protocols published between Jan 1, 2022 and June 30, 2022. We used the abstract, introduction, methods, and discussion sections of the Transparent Reporting of a multivariable prediction model of Individual Prognosis Or Diagnosis (TRIPOD) statement to inform data extraction. RESULTS We identified 30 protocols, of which 28 described plans for model development and six for model validation (some covered both). All protocols were open access, one of them as a preprint. Fifteen protocols reported prospectively collecting data. Twenty-one protocols planned to use clustered data, of which one-third planned methods to account for the clustering. A planned sample size was reported for 93% of development and 67% of validation analyses. Sixteen protocols reported details of study registration, and all protocols reported a statement on ethics approval. Plans for data sharing were reported in 13 protocols. CONCLUSION Protocols for prediction model studies are uncommon, and few are made publicly available. Those that are available were reasonably well reported and often described methods following current prediction model research recommendations, likely leading to better reporting and methods in the actual study.
Affiliation(s)
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Jie Ma
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Shona Kirtley
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Elizabeth Mouka
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Caitlin M Waldron
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Rebecca Whittle
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK; NIHR Blood and Transplant Research Unit in Data Driven Transfusion Practice, Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK

20
Collins GS. Making the black box more transparent: improving the reporting of artificial intelligence studies in healthcare. BMJ 2024; 385:q832. [PMID: 38626954 DOI: 10.1136/bmj.q832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/19/2024]
Affiliation(s)
- Gary S Collins
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK

21
Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, van Smeden M, Boulesteix AL, Camaradou JC, Celi LA, Denaxas S, Denniston AK, Glocker B, Golub RM, Harvey H, Heinze G, Hoffman MM, Kengne AP, Lam E, Lee N, Loder EW, Maier-Hein L, Mateen BA, McCradden MD, Oakden-Rayner L, Ordish J, Parnell R, Rose S, Singh K, Wynants L, Logullo P. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024; 385:e078378. [PMID: 38626948 PMCID: PMC11019967 DOI: 10.1136/bmj-2023-078378] [Citation(s) in RCA: 260] [Impact Index Per Article: 260.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/17/2024] [Indexed: 04/19/2024]
Affiliation(s)
- Gary S Collins
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Karel G M Moons
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Paula Dhiman
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Andrew L Beam
- Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Science, Leiden University Medical Centre, Leiden, Netherlands
- Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Xiaoxuan Liu
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Johannes B Reitsma
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University of Munich and Munich Centre of Machine Learning, Germany
- Jennifer Catherine Camaradou
- Patient representative, Health Data Research UK patient and public involvement and engagement group
- Patient representative, University of East Anglia, Faculty of Health Sciences, Norwich Research Park, Norwich, UK
- Leo Anthony Celi
- Beth Israel Deaconess Medical Center, Boston, MA, USA
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA
- Spiros Denaxas
- Institute of Health Informatics, University College London, London, UK
- British Heart Foundation Data Science Centre, London, UK
- Alastair K Denniston
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Ben Glocker
- Department of Computing, Imperial College London, London, UK
- Robert M Golub
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Georg Heinze
- Section for Clinical Biometrics, Centre for Medical Data Science, Medical University of Vienna, Vienna, Austria
- Michael M Hoffman
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Emily Lam
- Patient representative, Health Data Research UK patient and public involvement and engagement group
- Naomi Lee
- National Institute for Health and Care Excellence, London, UK
- Elizabeth W Loder
- The BMJ, London, UK
- Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Lena Maier-Hein
- Department of Intelligent Medical Systems, German Cancer Research Centre, Heidelberg, Germany
- Bilal A Mateen
- Institute of Health Informatics, University College London, London, UK
- Wellcome Trust, London, UK
- Alan Turing Institute, London, UK
- Melissa D McCradden
- Department of Bioethics, Hospital for Sick Children, Toronto, ON, Canada
- Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Johan Ordish
- Medicines and Healthcare products Regulatory Agency, London, UK
- Richard Parnell
- Patient representative, Health Data Research UK patient and public involvement and engagement group
- Sherri Rose
- Department of Health Policy and Center for Health Policy, Stanford University, Stanford, CA, USA
- Karandeep Singh
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
- Laure Wynants
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
- Patricia Logullo
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK

22
Dos Santos AL, Pinhati C, Perdigão J, Galante S, Silva L, Veloso I, Simões E Silva AC, Oliveira EA. Machine learning algorithms to predict outcomes in children and adolescents with COVID-19: A systematic review. Artif Intell Med 2024; 150:102824. [PMID: 38553164 DOI: 10.1016/j.artmed.2024.102824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Revised: 11/10/2023] [Accepted: 02/21/2024] [Indexed: 04/02/2024]
Abstract
BACKGROUND AND OBJECTIVES We aimed to analyze the study designs, modeling approaches, and performance evaluation metrics in studies using machine learning techniques to develop clinical prediction models for children and adolescents with COVID-19. METHODS We searched four databases for articles published between 01/01/2020 and 10/25/2023 describing the development of multivariable prediction models, using any machine learning technique, for predicting outcomes in children and adolescents who had COVID-19. RESULTS We included ten articles: six (60%; 95% confidence interval [CI] 31% to 83%) were diagnostic prediction models and four (40%; 95% CI 17% to 69%) were prognostic models. All models were developed to predict a binary outcome (n=10/10, 100%; 95% CI 72% to 100%). The most frequently predicted outcome was disease detection (n=3/10, 30%; 95% CI 11% to 60%). The most commonly used machine learning models were tree-based (n=12/33, 36.3%; 95% CI 17% to 47%) and neural networks (n=9/27, 33.2%; 95% CI 15% to 44%). CONCLUSION Our review revealed that attention is required to address several problems: small sample sizes, inconsistent reporting of data preparation, biases in data sources, and missing metrics such as calibration and discrimination, hyperparameters, and other details that would allow reproduction by other researchers and might improve the methodology.
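The bracketed intervals above are binomial confidence intervals for proportions. The counts quoted in the abstract reproduce them under a Wilson score interval, which we assume here (the review does not state its method). A minimal sketch with statsmodels:

```python
# Proportion with 95% Wilson score confidence interval, for counts taken
# from the abstract above; the choice of the Wilson method is our assumption.
from statsmodels.stats.proportion import proportion_confint

for label, count, nobs in [("diagnostic models", 6, 10),
                           ("prognostic models", 4, 10),
                           ("binary outcome", 10, 10)]:
    lo, hi = proportion_confint(count, nobs, alpha=0.05, method="wilson")
    print(f"{label}: {count}/{nobs} = {count/nobs:.0%} "
          f"(95% CI {lo:.0%} to {hi:.0%})")
```

Running this gives 31% to 83% for 6/10, 17% to 69% for 4/10, and 72% to 100% for 10/10, matching the intervals quoted in the abstract.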
Affiliation(s)
- Adriano Lages Dos Santos
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil; Federal Institute of Education, Science and Technology of Minas Gerais (IFMG), Belo Horizonte, Brazil
- Clara Pinhati
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Jonathan Perdigão
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Stella Galante
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Ludmilla Silva
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Isadora Veloso
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Ana Cristina Simões E Silva
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Eduardo Araújo Oliveira
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil

23
Talimtzi P, Ntolkeras A, Kostopoulos G, Bougioukas KI, Pagkalidou E, Ouranidis A, Pataka A, Haidich AB. The reporting completeness and transparency of systematic reviews of prognostic prediction models for COVID-19 was poor: a methodological overview of systematic reviews. J Clin Epidemiol 2024; 167:111264. [PMID: 38266742 DOI: 10.1016/j.jclinepi.2024.111264] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 01/08/2024] [Accepted: 01/13/2024] [Indexed: 01/26/2024]
Abstract
OBJECTIVES To conduct a methodological overview of reviews evaluating the reporting completeness and transparency of systematic reviews (SRs) of prognostic prediction models (PPMs) for COVID-19. STUDY DESIGN AND SETTING MEDLINE, Scopus, the Cochrane Database of Systematic Reviews, and Epistemonikos (epistemonikos.org) were searched for SRs of PPMs for COVID-19 until December 31, 2022. The Risk Of Bias In Systematic reviews (ROBIS) tool was used to assess risk of bias. The protocol for this overview was uploaded to the Open Science Framework (https://osf.io/7y94c). RESULTS Ten SRs were retrieved; none of them synthesized their results in a meta-analysis. Most of the studies lacked a predefined protocol and were missing information on study selection, the data collection process, and the reporting of the primary studies and models included; only one SR made its data publicly available. In addition, for the majority of the SRs, the overall risk of bias was judged to be high. The overall corrected covered area was 6.3%, indicating a small degree of overlap among the SRs. CONCLUSION The reporting completeness and transparency of SRs of PPMs for COVID-19 were poor. Guidance is urgently required, with increased awareness and education on minimum reporting standards and quality criteria. Specific attention to predefined protocols, information on study selection and the data collection process, and the reporting of findings is needed to improve the quality of SRs of PPMs for COVID-19.
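For reference, the corrected covered area (CCA) quoted here is computed from the citation matrix of reviews versus primary studies (Pieper et al., 2014): CCA = (N - r) / (rc - r), where N is the total number of primary-study inclusions summed over all reviews, r the number of unique primary studies, and c the number of reviews. A small sketch with an invented matrix:

```python
# Corrected covered area from a reviews-by-studies inclusion matrix.
# The toy matrix below is illustrative, not the overview's data.
import numpy as np

# Rows: unique primary studies; columns: systematic reviews.
inclusion = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

N = inclusion.sum()        # inclusions counted with multiplicity
r, c = inclusion.shape     # unique studies, number of reviews
cca = (N - r) / (r * c - r)
print(f"CCA = {cca:.1%}")  # 12.5% for this toy matrix
```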
Affiliation(s)
- Persefoni Talimtzi
- Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Antonios Ntolkeras
- School of Biology, Aristotle University of Thessaloniki, University Campus, 54636, Thessaloniki, Greece
- Konstantinos I Bougioukas
- Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Eirini Pagkalidou
- Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Andreas Ouranidis
- Department of Pharmaceutical Technology, School of Pharmacy, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
- Athanasia Pataka
- Department of Respiratory Deficiency, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Anna-Bettina Haidich
- Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece

24
Evans RP, Bryant LD, Russell G, Absolom K. Trust and acceptability of data-driven clinical recommendations in everyday practice: A scoping review. Int J Med Inform 2024; 183:105342. [PMID: 38266426 DOI: 10.1016/j.ijmedinf.2024.105342] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2023] [Revised: 12/08/2023] [Accepted: 01/14/2024] [Indexed: 01/26/2024]
Abstract
BACKGROUND Increasing attention is being given to the analysis of large health datasets to derive new clinical decision support systems (CDSS). However, few data-driven CDSS are being adopted into clinical practice. Trust in these tools is believed to be fundamental for acceptance and uptake, but to date little attention has been given to defining or evaluating trust in clinical settings. OBJECTIVES A scoping review was conducted to explore how and where the acceptability and trustworthiness of data-driven CDSS have been assessed from the health professional's perspective. METHODS Medline, Embase, PsycInfo, Web of Science, Scopus, ACM Digital Library, IEEE Xplore, and Google Scholar were searched in March 2022 using terms expanded from: "data-driven" AND "clinical decision support" AND "acceptability". Included studies focused on healthcare practitioner-facing data-driven CDSS relating directly to clinical care, and included trust or a proxy as an outcome or in the discussion. The Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) was followed in the reporting of this review. RESULTS 3291 papers were screened, with 85 primary research studies eligible for inclusion. Studies covered a diverse range of clinical specialisms and intended contexts, but hypothetical systems (24) outnumbered those in clinical use (18). Twenty-five studies measured trust, via a wide variety of quantitative, qualitative, and mixed methods. A further 24 discussed themes of trust without explicitly evaluating it, and from these, transparency, explainability, and supporting evidence were identified as factors influencing healthcare practitioner trust in data-driven CDSS. CONCLUSION There is a growing body of research on data-driven CDSS, but few studies have explored stakeholder perceptions in depth, and focused research on trustworthiness is limited. Further research on healthcare practitioner acceptance, including requirements for transparency and explainability, should inform clinical implementation.
Affiliation(s)
- Ruth P Evans
- University of Leeds, Woodhouse Lane, Leeds LS2 9JT, UK
- Gregor Russell
- Bradford District Care Trust, Bradford, New Mill, Victoria Rd, BD18 3LD, UK
- Kate Absolom
- University of Leeds, Woodhouse Lane, Leeds LS2 9JT, UK

25
Barreñada L, Ledger A, Dhiman P, Collins G, Wynants L, Verbakel JY, Timmerman D, Valentin L, Van Calster B. ADNEX risk prediction model for diagnosis of ovarian cancer: systematic review and meta-analysis of external validation studies. BMJ Med 2024; 3:e000817. [PMID: 38375077 PMCID: PMC10875560 DOI: 10.1136/bmjmed-2023-000817] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 01/25/2024] [Indexed: 02/21/2024]
Abstract
Objectives To conduct a systematic review of studies externally validating the ADNEX (Assessment of Different Neoplasias in the adnexa) model for diagnosis of ovarian cancer and to present a meta-analysis of its performance. Design Systematic review and meta-analysis of external validation studies. Data sources Medline, Embase, Web of Science, Scopus, and Europe PMC, from 15 October 2014 to 15 May 2023. Eligibility criteria for selecting studies All external validation studies of the performance of ADNEX, with any study design and any study population of patients with an adnexal mass. Two independent reviewers extracted the data. Disagreements were resolved by discussion. Reporting quality of the studies was scored with the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) reporting guideline, and methodological conduct and risk of bias with PROBAST (Prediction model Risk Of Bias Assessment Tool). Random effects meta-analyses were performed of the area under the receiver operating characteristic curve (AUC), of sensitivity and specificity at the 10% risk of malignancy threshold, and of net benefit and relative utility at the 10% risk of malignancy threshold. Results 47 studies (17 007 tumours) were included, with a median study sample size of 261 (range 24-4905). On average, 61% of TRIPOD items were reported. Handling of missing data, justification of sample size, and model calibration were rarely described. 91% of validations were at high risk of bias, mainly because of the unexplained exclusion of incomplete cases, small sample size, or no assessment of calibration. The summary AUC to distinguish benign from malignant tumours in patients who underwent surgery was 0.93 (95% confidence interval 0.92 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX with the serum biomarker, cancer antigen 125 (CA125), as a predictor (9202 tumours, 43 centres, 18 countries, and 21 studies) and 0.93 (95% confidence interval 0.91 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX without CA125 (6309 tumours, 31 centres, 13 countries, and 12 studies). The estimated probability that the model is clinically useful in a new centre was 95% (with CA125) and 91% (without CA125). When restricting analysis to studies with a low risk of bias, summary AUC values were 0.93 (with CA125) and 0.91 (without CA125), and the estimated probabilities of clinical usefulness were 89% (with CA125) and 87% (without CA125). Conclusions The results of the meta-analysis indicated that ADNEX performed well in distinguishing between benign and malignant tumours in populations from different countries and settings, regardless of whether the serum biomarker, CA125, was used as a predictor. A key limitation was that calibration was rarely assessed. Systematic review registration PROSPERO CRD42022373182.
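The summary AUC comes from a random-effects meta-analysis across validation studies. Below is a minimal sketch of standard DerSimonian-Laird pooling of AUCs on the logit scale; the AUC values and standard errors are invented, and the review's actual analysis was more elaborate.

```python
# DerSimonian-Laird random-effects pooling of validation AUCs,
# logit-transformed with delta-method standard errors. Toy inputs only.
import numpy as np

auc = np.array([0.91, 0.94, 0.89, 0.95, 0.92])
se_auc = np.array([0.02, 0.01, 0.03, 0.015, 0.02])

theta = np.log(auc / (1 - auc))            # logit-transformed AUCs
se = se_auc / (auc * (1 - auc))            # delta-method SEs on logit scale

w = 1 / se**2                              # fixed-effect weights
theta_fe = np.sum(w * theta) / np.sum(w)
Q = np.sum(w * (theta - theta_fe) ** 2)    # Cochran's Q
k = len(auc)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1 / (se**2 + tau2)                  # random-effects weights
theta_re = np.sum(w_re * theta) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))

pooled = 1 / (1 + np.exp(-theta_re))       # back-transform to AUC scale
lo = 1 / (1 + np.exp(-(theta_re - 1.96 * se_re)))
hi = 1 / (1 + np.exp(-(theta_re + 1.96 * se_re)))
print(f"summary AUC {pooled:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```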
Affiliation(s)
- Lasai Barreñada
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Ashleigh Ledger
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford Centre for Statistics in Medicine, Oxford, UK
- Gary Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford Centre for Statistics in Medicine, Oxford, UK
- Laure Wynants
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Epidemiology, Universiteit Maastricht Care and Public Health Research Institute, Maastricht, Netherlands
- Jan Y Verbakel
- Department of Public Health and Primary care, KU Leuven, Leuven, Belgium
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Dirk Timmerman
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Obstetrics and Gynaecology, UZ Leuven Campus Gasthuisberg, Leuven, Belgium
- Lil Valentin
- Department of Obstetrics and Gynaecology, Skåne University Hospital, Malmö, Sweden
- Department of Clinical Sciences Malmö, Lund University, Lund, Sweden
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands

26
O'Connor S, Vercell A, Wong D, Yorke J, Fallatah FA, Cave L, Anny Chen LY. The application and use of artificial intelligence in cancer nursing: A systematic review. Eur J Oncol Nurs 2024; 68:102510. [PMID: 38310664 DOI: 10.1016/j.ejon.2024.102510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Revised: 01/07/2024] [Accepted: 01/10/2024] [Indexed: 02/06/2024]
Abstract
PURPOSE Artificial intelligence is being applied in oncology to improve patient and service outcomes. Yet there is a limited understanding of how these advanced computational techniques are employed in cancer nursing to inform clinical practice. This review aimed to identify and synthesise evidence on artificial intelligence in cancer nursing. METHODS CINAHL, MEDLINE, PsycINFO, and PubMed were searched using key terms between January 2010 and December 2022. Titles, abstracts, and then full texts were screened against eligibility criteria, resulting in twenty studies being included. Critical appraisal was undertaken, and relevant data were extracted and analysed. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed. RESULTS Artificial intelligence was used in numerous areas, including breast, colorectal, liver, and ovarian cancer care, among others. Algorithms were trained and tested on primary and secondary datasets to build predictive models of health problems related to cancer. Studies reported that this led to improvements in the accuracy of predicting health outcomes or identified variables that improved outcome prediction. While nurses led most studies, few deployed an artificial intelligence based digital tool with cancer nurses in a real-world setting, as studies largely focused on developing and validating predictive models. CONCLUSION Electronic cancer nursing datasets should be established to enable artificial intelligence techniques to be tested and, if effective, implemented in digital prediction and other AI-based tools. Cancer nurses need more education on machine learning and natural language processing, so they can lead and contribute to artificial intelligence developments in oncology.
Affiliation(s)
- Siobhan O'Connor
- Florence Nightingale Faculty of Nursing, Midwifery and Palliative Care, King's College London, London, United Kingdom
- Amy Vercell
- Florence Nightingale Faculty of Nursing, Midwifery and Palliative Care, King's College London, London, United Kingdom; The Christie NHS Foundation Trust, Wilmslow Rd, Manchester, M20 4BX, United Kingdom
- David Wong
- Leeds Institute for Health Informatics, University of Leeds, Leeds, United Kingdom
- Janelle Yorke
- Florence Nightingale Faculty of Nursing, Midwifery and Palliative Care, King's College London, London, United Kingdom; The Christie NHS Foundation Trust, Wilmslow Rd, Manchester, M20 4BX, United Kingdom
- Fatmah Abdulsamad Fallatah
- Department of Nursing Affairs, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Louise Cave
- NHS Transformation Directorate, NHS England, England, United Kingdom
- Lu-Yen Anny Chen
- Institute of Clinical Nursing, College of Nursing, National Yang Ming Chiao Tung University, Taipei, Taiwan

27
Tong C, Du X, Chen Y, Zhang K, Shan M, Shen Z, Zhang H, Zheng J. Machine learning prediction model of major adverse outcomes after pediatric congenital heart surgery-a retrospective cohort study. Int J Surg 2024; 110:01279778-990000000-01006. [PMID: 38265429 PMCID: PMC11020051 DOI: 10.1097/js9.0000000000001112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024]
Abstract
BACKGROUND Major adverse postoperative outcomes (APOs) can greatly affect mortality, hospital stay, care management and planning, and quality of life. This study aimed to evaluate the performance of five machine learning (ML) algorithms for predicting four major APOs after pediatric congenital heart surgery, together with clinically meaningful model interpretations. METHODS Between August 2014 and December 2021, 23,000 consecutive pediatric patients undergoing congenital heart surgery were enrolled. Based on a split date of 1 January 2019, we selected 13,927 participants for the training cohort and 9,073 participants for the testing cohort. Four predefined major APOs were investigated: low cardiac output syndrome (LCOS), pneumonia, renal failure, and deep venous thrombosis (DVT). Thirty-nine clinical and laboratory features were input into five ML models: light gradient boosting machine (LightGBM), logistic regression (LR), support vector machine, random forest, and CatBoost. The performance and interpretations of the ML models were evaluated using the area under the receiver operating characteristic curve (AUC) and Shapley Additive Explanations (SHAP). RESULTS In the training cohort, CatBoost outperformed the other algorithms, with mean AUCs of 0.908 for LCOS and 0.957 for renal failure, while LightGBM and LR achieved the best mean AUCs of 0.886 for pneumonia and 0.942 for DVT, respectively. In the testing cohort, the best-performing ML model for each major APO achieved the following mean AUCs: LCOS (LightGBM), 0.893 (95% confidence interval [CI], 0.884-0.895); pneumonia (LR), 0.929 (95% CI, 0.926-0.931); renal failure (LightGBM), 0.963 (95% CI, 0.947-0.979); and DVT (LightGBM), 0.970 (95% CI, 0.953-0.982). The performance of ML models using only clinical variables was slightly lower than that of models using the combined data, with mean AUCs of 0.873 for LCOS, 0.894 for pneumonia, 0.953 for renal failure, and 0.933 for DVT. SHAP showed that mechanical ventilation time was the most important contributor to all four major APOs. CONCLUSIONS In pediatric congenital heart surgery, the established ML models can accurately predict the risk of four major APOs, providing reliable interpretations for identifying high-risk contributors and making informed clinical decisions.
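A hedged sketch of the modelling-plus-interpretation pattern reported here: a gradient-boosted classifier evaluated by AUC, with SHAP values ranking feature contributions. The data, feature set, and hyperparameters below are synthetic placeholders, not the study's.

```python
# Train a LightGBM classifier on synthetic binary-outcome data, report
# test AUC, and rank features by mean absolute SHAP value.
import numpy as np
import lightgbm as lgb
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_tr, y_tr)
print(f"test AUC: {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.3f}")

# TreeExplainer gives per-feature, per-patient contributions; averaging
# absolute SHAP values ranks overall feature importance.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
vals = shap_values[1] if isinstance(shap_values, list) else shap_values
ranking = np.argsort(np.mean(np.abs(vals), axis=0))[::-1]
print("most influential features:", ranking[:3])
```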
Affiliation(s)
- Xinwei Du
- Pediatric Thoracic and Cardiovascular Surgery, Shanghai Children’s Medical Center, School of Medicine and National Children’s Medical Center, Shanghai Jiao Tong University
- Ziyun Shen
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, People’s Republic of China
- Haibo Zhang
- Pediatric Thoracic and Cardiovascular Surgery, Shanghai Children’s Medical Center, School of Medicine and National Children’s Medical Center, Shanghai Jiao Tong University

28
Collins GS, Dhiman P, Ma J, Schlussel MM, Archer L, Van Calster B, Harrell FE, Martin GP, Moons KGM, van Smeden M, Sperrin M, Bullock GS, Riley RD. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 2024; 384:e074819. [PMID: 38191193 PMCID: PMC10772854 DOI: 10.1136/bmj-2023-074819] [Citation(s) in RCA: 102] [Impact Index Per Article: 102.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 09/04/2023] [Indexed: 01/10/2024]
Affiliation(s)
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Michael M Schlussel
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Lucinda Archer
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK
- Ben Van Calster
- KU Leuven, Department of Development and Regeneration, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
- EPI-Centre, KU Leuven, Belgium
- Frank E Harrell
- Department of Biostatistics, Vanderbilt University, Nashville, TN, USA
- Glen P Martin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Karel G M Moons
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Matthew Sperrin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK

29
Collins GS, Whittle R, Bullock GS, Logullo P, Dhiman P, de Beyer JA, Riley RD, Schlussel MM. Open science practices need substantial improvement in prognostic model studies in oncology using machine learning. J Clin Epidemiol 2024; 165:111199. [PMID: 37898461 DOI: 10.1016/j.jclinepi.2023.10.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 10/06/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]
Abstract
OBJECTIVE To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine learning methods in the field of oncology. STUDY DESIGN AND SETTING We conducted a systematic review, searching the MEDLINE database for studies published between December 1, 2022, and December 31, 2022, that developed a multivariable prognostic model using machine learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted open science practices. RESULTS We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported availability of a study protocol, and only one study was registered. Funding statements and conflict of interest statements were common. Thirty-five studies (76%) provided data sharing statements, with 21 (46%) indicating data were available on request to the authors and seven declaring that data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: eight studies (18%) mentioned using a reporting guideline, with 4 (10%) using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis statement, 1 (2%) using Minimum Information About Clinical Artificial Intelligence Modeling and Consolidated Standards Of Reporting Trials-Artificial Intelligence, 1 (2%) using Strengthening The Reporting Of Observational Studies In Epidemiology, 1 (2%) using Standards for Reporting Diagnostic Accuracy Studies, and 1 (2%) using Transparent Reporting of Evaluations with Nonrandomized Designs. CONCLUSION The adoption of open science principles in oncology studies developing prognostic models using machine learning methods is poor. Guidance and an increased awareness of the benefits and best practices of open science are needed for prediction research in oncology.
Affiliation(s)
- Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Rebecca Whittle
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA; Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, United Kingdom
- Patricia Logullo
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Jennifer A de Beyer
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
- Michael M Schlussel
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom

30
Le JP, Shashikumar SP, Malhotra A, Nemati S, Wardi G. Making the Improbable Possible: Generalizing Models Designed for a Syndrome-Based, Heterogeneous Patient Landscape. Crit Care Clin 2023; 39:751-768. [PMID: 37704338 PMCID: PMC10758922 DOI: 10.1016/j.ccc.2023.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/15/2023]
Abstract
Syndromic conditions, such as sepsis, are commonly encountered in the intensive care unit. Although these conditions are easy for clinicians to grasp, they may limit the performance of machine-learning algorithms. Individual hospital practice patterns may limit external generalizability, and data missingness is a further barrier to optimal algorithm performance; various strategies exist to mitigate it. Recent advances in data science, such as transfer learning, conformal prediction, and continual learning, may improve the generalizability of machine-learning algorithms in critically ill patients. Randomized trials of these approaches are now indicated to demonstrate improvements in patient-centered outcomes.
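Of the approaches named here, conformal prediction is the most self-contained to illustrate. Below is a minimal sketch of split conformal prediction on synthetic data; everything in it, including the score function and the 90% coverage target, is an illustrative assumption rather than the authors' method.

```python
# Split conformal prediction: a held-out calibration set converts any
# classifier's scores into prediction sets with a finite-sample coverage
# guarantee (~90% here). Implemented from scratch on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=8, random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_new, y_cal, y_new = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# Nonconformity score: 1 - predicted probability of the true class.
cal_scores = 1 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]
alpha = 0.1
n = len(cal_scores)
qhat = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n,
                   method="higher")

# Prediction set: every class whose score is within the threshold.
proba_new = clf.predict_proba(X_new)
pred_sets = (1 - proba_new) <= qhat
coverage = pred_sets[np.arange(len(y_new)), y_new].mean()
print(f"empirical coverage: {coverage:.1%} (target {1 - alpha:.0%})")
```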
Affiliation(s)
- Joshua Pei Le
- School of Medicine, University of Limerick, Castletroy, Co. Limerick V94 T9PX, Ireland
- Atul Malhotra
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA, USA
- Shamim Nemati
- Division of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
- Gabriel Wardi
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA, USA; Department of Emergency Medicine, University of California San Diego, 200 W Arbor Drive, San Diego, CA 92103, USA

31
Reeve K, On BI, Havla J, Burns J, Gosteli-Peter MA, Alabsawi A, Alayash Z, Götschi A, Seibold H, Mansmann U, Held U. Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis. Cochrane Database Syst Rev 2023; 9:CD013606. [PMID: 37681561 PMCID: PMC10486189 DOI: 10.1002/14651858.cd013606.pub2] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
BACKGROUND Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system that affects millions of people worldwide. The disease course varies greatly across individuals and many disease-modifying treatments with different safety and efficacy profiles have been developed recently. Prognostic models evaluated and shown to be valid in different settings have the potential to support people with MS and their physicians during the decision-making process for treatment or disease/life management, allow stratified and more precise interpretation of interventional trials, and provide insights into disease mechanisms. Many researchers have turned to prognostic models to help predict clinical outcomes in people with MS; however, to our knowledge, no widely accepted prognostic model for MS is being used in clinical practice yet. OBJECTIVES To identify and summarise multivariable prognostic models, and their validation studies for quantifying the risk of clinical disease progression, worsening, and activity in adults with MS. SEARCH METHODS We searched MEDLINE, Embase, and the Cochrane Database of Systematic Reviews from January 1996 until July 2021. We also screened the reference lists of included studies and relevant reviews, and references citing the included studies. SELECTION CRITERIA We included all statistically developed multivariable prognostic models aiming to predict clinical disease progression, worsening, and activity, as measured by disability, relapse, conversion to definite MS, conversion to progressive MS, or a composite of these in adult individuals with MS. We also included any studies evaluating the performance of (i.e. validating) these models. There were no restrictions based on language, data source, timing of prognostication, or timing of outcome. DATA COLLECTION AND ANALYSIS Pairs of review authors independently screened titles/abstracts and full texts, extracted data using a piloted form based on the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), assessed risk of bias using the Prediction Model Risk Of Bias Assessment Tool (PROBAST), and assessed reporting deficiencies based on the checklist items in Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD). The characteristics of the included models and their validations are described narratively. We planned to meta-analyse the discrimination and calibration of models with at least three external validations outside the model development study but no model met this criterion. We summarised between-study heterogeneity narratively but again could not perform the planned meta-regression. MAIN RESULTS We included 57 studies, from which we identified 75 model developments, 15 external validations corresponding to only 12 (16%) of the models, and six author-reported validations. Only two models were externally validated multiple times. None of the identified external validations were performed by researchers independent of those that developed the model. The outcome was related to disease progression in 39 (41%), relapses in 8 (8%), conversion to definite MS in 17 (18%), and conversion to progressive MS in 27 (28%) of the 96 models or validations. The disease and treatment-related characteristics of included participants, and definitions of considered predictors and outcome, were highly heterogeneous amongst the studies. 
Based on the publication year, we observed an increase in the percent of participants on treatment, diversification of the diagnostic criteria used, an increase in consideration of biomarkers or treatment as predictors, and increased use of machine learning methods over time. USABILITY AND REPRODUCIBILITY All identified models contained at least one predictor requiring the skills of a medical specialist for measurement or assessment. Most of the models (44; 59%) contained predictors that require specialist equipment likely to be absent from primary care or standard hospital settings. Over half (52%) of the developed models were not accompanied by model coefficients, tools, or instructions, which hinders their application, independent validation or reproduction. The data used in model developments were made publicly available or reported to be available on request only in a few studies (two and six, respectively). RISK OF BIAS We rated all but one of the model developments or validations as having high overall risk of bias. The main reason for this was the statistical methods used for the development or evaluation of prognostic models; we rated all but two of the included model developments or validations as having high risk of bias in the analysis domain. None of the model developments that were externally validated or these models' external validations had low risk of bias. There were concerns related to applicability of the models to our research question in over one-third (38%) of the models or their validations. REPORTING DEFICIENCIES Reporting was poor overall and there was no observable increase in the quality of reporting over time. The items that were unclearly reported or not reported at all for most of the included models or validations were related to sample size justification, blinding of outcome assessors, details of the full model or how to obtain predictions from it, amount of missing data, and treatments received by the participants. Reporting of preferred model performance measures of discrimination and calibration was suboptimal. AUTHORS' CONCLUSIONS The current evidence is not sufficient for recommending the use of any of the published prognostic prediction models for people with MS in clinical routine today due to lack of independent external validations. The MS prognostic research community should adhere to the current reporting and methodological guidelines and conduct many more state-of-the-art external validation studies for the existing or newly developed models.
Collapse
Affiliation(s)
- Kelly Reeve
- Epidemiology, Biostatistics and Prevention Institute, University of Zürich, Zurich, Switzerland
| | - Begum Irmak On
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Joachim Havla
- Institute of Clinical Neuroimmunology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Jacob Burns
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Pettenkofer School of Public Health, Munich, Germany
| | | | - Albraa Alabsawi
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Zoheir Alayash
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Institute of Health Services Research in Dentistry, University of Münster, Muenster, Germany
| | - Andrea Götschi
- Epidemiology, Biostatistics and Prevention Institute, University of Zürich, Zurich, Switzerland
| | | | - Ulrich Mansmann
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Pettenkofer School of Public Health, Munich, Germany
| | - Ulrike Held
- Epidemiology, Biostatistics and Prevention Institute, University of Zürich, Zurich, Switzerland
| |
Collapse
|
32
|
Vasey B, Collins GS. Invited Commentary: Transparent reporting of artificial intelligence models development and evaluation in surgery: The TRIPOD and DECIDE-AI checklists. Surgery 2023; 174:727-729. [PMID: 37244769 DOI: 10.1016/j.surg.2023.04.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Accepted: 04/27/2023] [Indexed: 05/29/2023]
Affiliation(s)
- Baptiste Vasey
- Nuffield Department of Surgical Sciences, University of Oxford, UK; Department of Surgery, Geneva University Hospital, Switzerland.
| | - Gary S Collins
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, UK. http://www.twitter.com/GSCollins
| |
Collapse
|
33
|
Dhiman P, Ma J, Qi C, Bullock G, Sergeant JC, Riley RD, Collins GS. Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review. BMC Med Res Methodol 2023; 23:188. [PMID: 37598153 PMCID: PMC10439652 DOI: 10.1186/s12874-023-02008-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2023] [Accepted: 08/04/2023] [Indexed: 08/21/2023] Open
Abstract
BACKGROUND Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. METHODS We searched PubMed for studies published between 01/07/2020 and 30/07/2020 and reviewed the sample size calculations used to develop the prediction models. Using the available information, we calculated the minimum sample size that would be needed to estimate overall risk and minimise overfitting in each study and summarised the difference between the calculated and used sample size. RESULTS A total of 119 studies were included, of which nine studies provided sample size justification (8%). The recommended minimum sample size could be calculated for 94 studies: 73% (95% CI: 63-82%) used sample sizes lower than required to estimate overall risk and minimise overfitting, including 26% of studies that used sample sizes lower than required to estimate overall risk only. A similar number of studies did not meet the ≥10 EPV criteria (75%, 95% CI: 66-84%). The median deficit in the number of events used to develop a model was 75 [IQR: 234 lower to 7 higher], which reduced to 63 if the total available data (before any data splitting) were used [IQR: 225 lower to 7 higher]. Studies that met the minimum required sample size had a median c-statistic of 0.84 (IQR: 0.80 to 0.9) and studies where the minimum sample size was not met had a median c-statistic of 0.83 (IQR: 0.75 to 0.9). Studies that met the ≥10 EPV criteria had a median c-statistic of 0.80 (IQR: 0.73 to 0.84). CONCLUSIONS Prediction models are often developed with no sample size calculation; as a consequence, many are too small to precisely estimate the overall risk. We encourage researchers to justify, perform and report sample size calculations when developing a prediction model.
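The minimum sample size referred to in this abstract comes from the criteria of Riley et al., implemented in the pmsampsize package for R and Stata. The sketch below is a minimal Python transcription of those criteria under stated assumptions (anticipated Cox-Snell R-squared, outcome prevalence, and number of candidate predictor parameters); the input values are illustrative, not from the review.

```python
# Minimal sketch of the Riley et al. minimum sample size criteria for
# developing a prediction model with a binary outcome. Input values
# below are illustrative assumptions only.
import math

def min_sample_size_binary(p, prevalence, r2_cs, shrinkage=0.9, margin=0.05):
    """p: number of candidate predictor parameters;
    prevalence: anticipated outcome proportion;
    r2_cs: anticipated Cox-Snell R-squared of the model."""
    # Criterion 1: expected uniform shrinkage >= `shrinkage` (e.g. 0.9).
    n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    # Criterion 2: small optimism (<= margin) in apparent R-squared.
    s2 = r2_cs / (r2_cs + margin)
    n2 = p / ((s2 - 1) * math.log(1 - r2_cs / s2))
    # Criterion 3: estimate overall risk to within +/- 0.05.
    n3 = (1.96 / 0.05) ** 2 * prevalence * (1 - prevalence)
    n = max(n1, n2, n3)
    return math.ceil(n), n * prevalence / p  # required n and implied EPV

n, epv = min_sample_size_binary(p=20, prevalence=0.3, r2_cs=0.15)
print(f"Minimum n = {n}, implied events per predictor = {epv:.1f}")
```

With these inputs the shrinkage criterion dominates and the required n is far above what a crude 10 EPV rule would suggest, which is exactly the deficit pattern the review reports.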
Collapse
Affiliation(s)
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
| | - Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
| | - Cathy Qi
- Population Data Science, Faculty of Medicine, Health and Life Science, Swansea University Medical School, Swansea University, Singleton Park, Swansea, SA2 8PP, UK
| | - Garrett Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK
| | - Jamie C Sergeant
- Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, Manchester, M13 9PL, UK
- Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester Academic Health Science Centre, Manchester, M13 9PT, UK
| | - Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, B15 2TT, Birmingham, UK
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
| |
Collapse
|
34
|
Logullo P, MacCarthy A, Dhiman P, Kirtley S, Ma J, Bullock G, Collins GS. Artificial intelligence in lung cancer diagnostic imaging: a review of the reporting and conduct of research published 2018-2019. BJR Open 2023; 5:20220033. [PMID: 37389003 PMCID: PMC10301715 DOI: 10.1259/bjro.20220033] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Revised: 04/04/2023] [Accepted: 04/04/2023] [Indexed: 07/01/2023] Open
Abstract
Objective This study aimed to describe the methodologies used to develop and evaluate models that use artificial intelligence (AI) to analyse lung images in order to detect, segment (outline borders of), or classify pulmonary nodules as benign or malignant. Methods In October 2019, we systematically searched the literature for original studies published between 2018 and 2019 that described prediction models using AI to evaluate human pulmonary nodules on diagnostic chest images. Two evaluators independently extracted information from studies, such as study aims, sample size, AI type, patient characteristics, and performance. We summarised data descriptively. Results The review included 153 studies: 136 (89%) development-only studies, 12 (8%) development and validation, and 5 (3%) validation-only. CT scans were the most common image type used (83%), often acquired from public databases (58%). Eight studies (5%) compared model outputs with biopsy results. 41 studies (26.8%) reported patient characteristics. The models were based on different units of analysis, such as patients, images, nodules, or image slices or patches. Conclusion The methods used to develop and evaluate prediction models using AI to detect, segment, or classify pulmonary nodules in medical imaging vary, are poorly reported, and are therefore difficult to evaluate. Transparent and complete reporting of methods, results and code would fill the gaps in information we observed in the study publications. Advances in knowledge We reviewed the methodology of AI models detecting nodules on lung images and found that the models were poorly reported, often lacking a description of patient characteristics, with just a few comparing model outputs with biopsy results. When lung biopsy is not available, Lung-RADS could help standardise comparisons between the human radiologist and the machine. The field of radiology should not abandon the principles of diagnostic accuracy studies, such as choosing the correct ground truth, just because AI is used. Clear and complete reporting of the reference standard used would help radiologists trust the performance that AI models claim to have. This review presents clear recommendations about the essential methodological aspects of diagnostic models that should be incorporated in studies using AI to help detect or segment lung nodules. The manuscript also reinforces the need for more complete and transparent reporting, which the recommended reporting guidelines can support.
Collapse
Affiliation(s)
| | | | | | | | | | - Garrett Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States
| | | |
Collapse
|
35
|
Clift AK, Dodwell D, Lord S, Petrou S, Brady M, Collins GS, Hippisley-Cox J. Development and internal-external validation of statistical and machine learning models for breast cancer prognostication: cohort study. BMJ 2023; 381:e073800. [PMID: 37164379 PMCID: PMC10170264 DOI: 10.1136/bmj-2022-073800] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 03/28/2023] [Indexed: 05/12/2023]
Abstract
OBJECTIVE To develop a clinically useful model that estimates the 10 year risk of breast cancer related mortality in women (self-reported female sex) with breast cancer of any stage, comparing results from regression and machine learning approaches. DESIGN Population based cohort study. SETTING QResearch primary care database in England, with individual level linkage to the national cancer registry, Hospital Episodes Statistics, and national mortality registers. PARTICIPANTS 141 765 women aged 20 years and older with a diagnosis of invasive breast cancer between 1 January 2000 and 31 December 2020. MAIN OUTCOME MEASURES Four model building strategies comprising two regression (Cox proportional hazards and competing risks regression) and two machine learning (XGBoost and an artificial neural network) approaches. Internal-external cross validation was used for model evaluation. Random effects meta-analysis that pooled estimates of discrimination and calibration metrics, calibration plots, and decision curve analysis were used to assess model performance, transportability, and clinical utility. RESULTS During a median 4.16 years (interquartile range 1.76-8.26) of follow-up, 21 688 breast cancer related deaths and 11 454 deaths from other causes occurred. Restricting to 10 years maximum follow-up from breast cancer diagnosis, 20 367 breast cancer related deaths occurred during a total of 688 564.81 person years. The crude breast cancer mortality rate was 295.79 per 10 000 person years (95% confidence interval 291.75 to 299.88). Predictors varied for each regression model, but both Cox and competing risks models included age at diagnosis, body mass index, smoking status, route to diagnosis, hormone receptor status, cancer stage, and grade of breast cancer. The Cox model's random effects meta-analysis pooled estimate for Harrell's C index was the highest of any model at 0.858 (95% confidence interval 0.853 to 0.864, and 95% prediction interval 0.843 to 0.873). It appeared acceptably calibrated on calibration plots. The competing risks regression model had good discrimination: pooled Harrell's C index 0.849 (0.839 to 0.859, and 0.821 to 0.876), and evidence of systematic miscalibration on summary metrics was lacking. The machine learning models had acceptable discrimination overall (Harrell's C index: XGBoost 0.821 (0.813 to 0.828, and 0.805 to 0.837); neural network 0.847 (0.835 to 0.858, and 0.816 to 0.878)), but had more complex patterns of miscalibration and more variable regional and stage specific performance. Decision curve analysis suggested that the Cox and competing risks regression models tested may have higher clinical utility than the two machine learning approaches. CONCLUSION In women with breast cancer of any stage, using the predictors available in this dataset, regression based methods had better and more consistent performance compared with machine learning approaches and may be worthy of further evaluation for potential clinical use, such as for stratified follow-up.
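The decision curve analysis used here to compare clinical utility rests on a simple quantity, the net benefit at a threshold probability pt: NB = TP/n - (FP/n) x pt/(1 - pt). The sketch below computes it for a simulated, well-calibrated model against a treat-all strategy; all data are simulated assumptions, not the QResearch cohort.

```python
# Minimal sketch of decision curve analysis: net benefit across
# threshold probabilities, on simulated predictions (assumptions only).
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients with predicted risk >= threshold."""
    treat = y_prob >= threshold
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

rng = np.random.default_rng(0)
y_prob = rng.uniform(0, 1, 5000)
y_true = rng.binomial(1, y_prob)  # simulated, perfectly calibrated model

for t in (0.1, 0.2, 0.3):
    nb_model = net_benefit(y_true, y_prob, t)
    nb_all = net_benefit(y_true, np.ones_like(y_prob), t)  # treat-all strategy
    print(f"threshold {t:.1f}: model {nb_model:.3f} vs treat-all {nb_all:.3f}")
```

A model is clinically useful at a threshold only if its net benefit exceeds both treat-all and treat-none (net benefit 0), which is the comparison behind the conclusion quoted above.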
Collapse
Affiliation(s)
- Ash Kieran Clift
- Cancer Research UK Oxford Centre, Oxford, UK
- Nuffield Department of Primary Care Health Sciences, Radcliffe Primary Care Building, Radcliffe Observatory Quarter, University of Oxford, Oxford OX2 6GG, UK
| | - David Dodwell
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Simon Lord
- Department of Oncology, University of Oxford, Oxford, UK
| | - Stavros Petrou
- Nuffield Department of Primary Care Health Sciences, Radcliffe Primary Care Building, Radcliffe Observatory Quarter, University of Oxford, Oxford OX2 6GG, UK
| | - Michael Brady
- Department of Oncology, University of Oxford, Oxford, UK
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Julia Hippisley-Cox
- Nuffield Department of Primary Care Health Sciences, Radcliffe Primary Care Building, Radcliffe Observatory Quarter, University of Oxford, Oxford OX2 6GG, UK
| |
Collapse
|
36
|
Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J Clin Epidemiol 2023; 157:120-133. [PMID: 36935090 PMCID: PMC11913775 DOI: 10.1016/j.jclinepi.2023.03.012] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Revised: 03/03/2023] [Accepted: 03/14/2023] [Indexed: 03/19/2023]
Abstract
OBJECTIVES In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction. STUDY DESIGN AND SETTING We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning published between 1st January, 2019, and 5th September, 2019. We used existing spin frameworks and described areas of highly suggestive spin practices. RESULTS We included 62 publications (including 152 developed models; 37 validated models). Reporting was inconsistent between the methods and the results in 27% of studies, owing to additional analyses and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion. CONCLUSION The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.
Collapse
Affiliation(s)
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
| | - Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
| | - Constanza L Andaur Navarro
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Benjamin Speich
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; Meta-Research Centre, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Garrett Bullock
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Johanna A A Damen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Shona Kirtley
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK, ST5 5BG
| | - Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; EPI-centre, KU Leuven, Leuven, Belgium
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| |
Collapse
|
37
|
Truchot A, Raynaud M, Kamar N, Naesens M, Legendre C, Delahousse M, Thaunat O, Buchler M, Crespo M, Linhares K, Orandi BJ, Akalin E, Pujol GS, Silva HT, Gupta G, Segev DL, Jouven X, Bentall AJ, Stegall MD, Lefaucheur C, Aubert O, Loupy A. Machine learning does not outperform traditional statistical modelling for kidney allograft failure prediction. Kidney Int 2023; 103:936-948. [PMID: 36572246 DOI: 10.1016/j.kint.2022.12.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Revised: 11/04/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
Machine learning (ML) models have recently shown potential for predicting kidney allograft outcomes. However, their ability to outperform traditional approaches remains poorly investigated. Therefore, using large cohorts of kidney transplant recipients from 14 centers worldwide, we developed ML-based prediction models for kidney allograft survival and compared their prediction performances to those achieved by a validated Cox-Based Prognostication System (CBPS). In a French derivation cohort of 4000 patients, candidate determinants of allograft failure including donor, recipient and transplant-related parameters were used as predictors to develop tree-based models (RSF, RSF-ERT, CIF), Support Vector Machine models (LK-SVM, AK-SVM) and a gradient boosting model (XGBoost). Models were externally validated with cohorts of 2214 patients from Europe, 1537 from North America, and 671 from South America. Among these 8422 kidney transplant recipients, 1081 (12.84%) lost their grafts after a median post-transplant follow-up time of 6.25 years (interquartile range 4.33-8.73). At seven years post-risk evaluation, the ML models achieved a C-index of 0.788 (95% bootstrap percentile confidence interval 0.736-0.833), 0.779 (0.724-0.825), 0.786 (0.735-0.832), 0.527 (0.456-0.602), 0.704 (0.648-0.759) and 0.767 (0.711-0.815) for RSF, RSF-ERT, CIF, LK-SVM, AK-SVM and XGBoost respectively, compared with 0.808 (0.792-0.829) for the CBPS. In the validation cohorts, the ML models' discrimination performance was in a similar range to that of the CBPS. Calibration of the ML models was similar to or less accurate than that of the CBPS. Thus, when using a transparent methodological pipeline in validated international cohorts, ML models, despite overall good performance, do not outperform a traditional CBPS in predicting kidney allograft failure. Hence, our current study supports the continued use of traditional statistical approaches for kidney graft prognostication.
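The 95% bootstrap percentile confidence intervals reported above for the C-index can be reproduced generically. The sketch below does so on simulated survival data using lifelines' concordance_index; the data-generating process is an illustrative assumption, not the study's pipeline.

```python
# Minimal sketch of a bootstrap percentile confidence interval for a
# survival C-index, on simulated data (assumptions only).
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 500
risk = rng.normal(size=n)                    # model's linear predictor
time = rng.exponential(scale=np.exp(-risk))  # higher risk -> shorter time
event = rng.binomial(1, 0.85, size=n)        # ~15% censoring, for illustration

# concordance_index expects scores where higher = longer survival,
# so pass the negated risk score.
c_hat = concordance_index(time, -risk, event)

boot = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)         # resample patients with replacement
    boot.append(concordance_index(time[idx], -risk[idx], event[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"C-index {c_hat:.3f} (95% bootstrap percentile CI {lo:.3f}-{hi:.3f})")
```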
Collapse
Affiliation(s)
- Agathe Truchot
- Université de Paris, INSERM, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France
| | - Marc Raynaud
- Université de Paris, INSERM, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France
| | - Nassim Kamar
- Université Paul Sabatier, INSERM, Department of Nephrology and Organ Transplantation, CHU Rangueil and Purpan, Toulouse, France
| | - Maarten Naesens
- Department of Microbiology, Immunology and Transplantation, KU Leuven, Leuven, Belgium
| | - Christophe Legendre
- Université de Paris, INSERM, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France; Kidney Transplant Department, Necker Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Michel Delahousse
- Department of Transplantation, Nephrology and Clinical Immunology, Foch Hospital, Suresnes, France
| | - Olivier Thaunat
- Department of Transplantation, Nephrology and Clinical Immunology, Hospices Civils de Lyon, Lyon, France
| | - Matthias Buchler
- Nephrology and Immunology Department, Bretonneau Hospital, Tours, France
| | - Marta Crespo
- Department of Nephrology, Hospital del Mar Barcelona, Barcelona, Spain
| | - Kamilla Linhares
- Hospital do Rim, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
| | - Babak J Orandi
- University of Alabama at Birmingham Heersink School of Medicine, Birmingham, Alabama, USA
| | - Enver Akalin
- Renal Division, Montefiore Medical Centre, Kidney Transplantation Program, Albert Einstein College of Medicine, New York, New York, USA
| | - Gervacio Soler Pujol
- Unidad de Trasplante Renopancreas, Centro de Educacion Medica e Investigaciones Clinicas Buenos Aires, Buenos Aires, Argentina
| | - Helio Tedesco Silva
- Hospital do Rim, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
| | - Gaurav Gupta
- Division of Nephrology, Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia, USA
| | - Dorry L Segev
- Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Xavier Jouven
- Université de Paris, INSERM, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France; Cardiology Department, European Georges Pompidou Hospital, Paris, France
| | - Andrew J Bentall
- William J von Liebig Centre for Transplantation and Clinical Regeneration, Mayo Clinic, Rochester, Minnesota, USA
| | - Mark D Stegall
- William J von Liebig Centre for Transplantation and Clinical Regeneration, Mayo Clinic, Rochester, Minnesota, USA
| | - Carmen Lefaucheur
- Université de Paris, INSERM, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France; Kidney Transplant Department, Saint-Louis Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Olivier Aubert
- Université de Paris, INSERM, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France; Kidney Transplant Department, Necker Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Alexandre Loupy
- Université de Paris, INSERM, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France; Kidney Transplant Department, Necker Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France.
| |
Collapse
|
38
|
Chen Z, Liu X, Yang Q, Wang YJ, Miao K, Gong Z, Yu Y, Leonov A, Liu C, Feng Z, Chuan-Peng H. Evaluation of Risk of Bias in Neuroimaging-Based Artificial Intelligence Models for Psychiatric Diagnosis: A Systematic Review. JAMA Netw Open 2023; 6:e231671. [PMID: 36877519 PMCID: PMC9989906 DOI: 10.1001/jamanetworkopen.2023.1671] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 03/07/2023] Open
Abstract
IMPORTANCE Neuroimaging-based artificial intelligence (AI) diagnostic models have proliferated in psychiatry. However, their clinical applicability and reporting quality (ie, feasibility) for clinical practice have not been systematically evaluated. OBJECTIVE To systematically assess the risk of bias (ROB) and reporting quality of neuroimaging-based AI models for psychiatric diagnosis. EVIDENCE REVIEW PubMed was searched for peer-reviewed, full-length articles published between January 1, 1990, and March 16, 2022. Studies aimed at developing or validating neuroimaging-based AI models for clinical diagnosis of psychiatric disorders were included. Reference lists were further searched for suitable original studies. Data extraction followed the CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies) and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines. A closed-loop cross-sequential design was used for quality control. The PROBAST (Prediction Model Risk of Bias Assessment Tool) and modified CLEAR (Checklist for Evaluation of Image-Based Artificial Intelligence Reports) benchmarks were used to systematically evaluate ROB and reporting quality. FINDINGS A total of 517 studies presenting 555 AI models were included and evaluated. Of these models, 461 (83.1%; 95% CI, 80.0%-86.2%) were rated as having a high overall ROB based on the PROBAST. The ROB was particularly high in the analysis domain, including inadequate sample size (398 of 555 models [71.7%; 95% CI, 68.0%-75.6%]), poor examination of model performance (with 100% of models lacking calibration examination), and lack of handling of data complexity (550 of 555 models [99.1%; 95% CI, 98.3%-99.9%]). None of the AI models was perceived to be applicable to clinical practice. Overall reporting completeness (ie, number of reported items/number of total items) for the AI models was 61.2% (95% CI, 60.6%-61.8%), and the completeness was poorest for the technical assessment domain with 39.9% (95% CI, 38.8%-41.1%). CONCLUSIONS AND RELEVANCE This systematic review found that the clinical applicability and feasibility of neuroimaging-based AI models for psychiatric diagnosis were challenged by a high ROB and poor reporting quality. Particularly in the analysis domain, ROB in AI diagnostic models should be addressed before clinical application.
Collapse
Affiliation(s)
- Zhiyi Chen
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
| | - Xuerong Liu
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
| | - Qingwu Yang
- Department of Neurology, Daping Hospital, Third Military Medical University, Chongqing, China
| | - Yan-Jiang Wang
- Department of Neurology, Daping Hospital, Third Military Medical University, Chongqing, China
| | - Kuan Miao
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
| | - Zheng Gong
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
| | - Yang Yu
- School of Psychology, Third Military Medical University, Chongqing, China
| | - Artemiy Leonov
- Department of Psychology, Clark University, Worcester, Massachusetts
| | - Chunlei Liu
- School of Psychology, Qufu Normal University, Qufu, China
| | - Zhengzhi Feng
- School of Psychology, Third Military Medical University, Chongqing, China
- Experimental Research Center for Medical and Psychological Science, Third Military Medical University, Chongqing, China
| | - Hu Chuan-Peng
- School of Psychology, Nanjing Normal University, Nanjing, China
| |
Collapse
|
39
|
Kantidakis G, Putter H, Litière S, Fiocco M. Statistical models versus machine learning for competing risks: development and validation of prognostic models. BMC Med Res Methodol 2023; 23:51. [PMID: 36829145 PMCID: PMC9951458 DOI: 10.1186/s12874-023-01866-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Accepted: 02/13/2023] [Indexed: 02/26/2023] Open
Abstract
BACKGROUND In health research, several chronic diseases are susceptible to competing risks (CRs). Initially, statistical models (SM) were developed to estimate the cumulative incidence of an event in the presence of CRs. As there is growing interest in applying machine learning (ML) for clinical prediction, these techniques have also been extended to model CRs, but the literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low dimensional setting). METHODS A dataset with 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate model-predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. ML models include an original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel specifications in terms of architecture (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years. RESULTS Based on the original eSTS data, 100 bootstrapped training datasets are drawn. Performance of the final models is assessed on validation data (left out samples) by employing as measures the Brier score and the Area Under the Curve (AUC) with CRs. Miscalibration (absolute accuracy error) is also estimated. Results show that the ML models reach performance comparable to the SM at 2, 5, and 10 years for both the Brier score and the AUC (95% confidence intervals overlapped). However, the SM are frequently better calibrated. CONCLUSIONS Overall, ML techniques are less practical as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas regression methods can perform well without the additional workload of model training. As such, for non-complex real life survival data, these techniques should be applied only as exploratory tools complementary to SM. More attention to model calibration is urgently needed.
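As a concrete illustration of the competing-risks setup described in this abstract (disease progression as the event of interest, death as the competing event), the sketch below estimates a cumulative incidence function with the Aalen-Johansen estimator from lifelines. The simulated data are illustrative assumptions, not the eSTS cohort.

```python
# Minimal sketch of cumulative incidence under competing risks via the
# Aalen-Johansen estimator. Simulated data; assumptions only.
import numpy as np
from lifelines import AalenJohansenFitter

rng = np.random.default_rng(0)
n = 1000
t_progression = rng.exponential(scale=8.0, size=n)   # years to progression
t_death = rng.exponential(scale=12.0, size=n)        # years to death
t_censor = rng.uniform(0, 15, size=n)

time = np.minimum.reduce([t_progression, t_death, t_censor])
event = np.select(
    [t_progression == time, t_death == time],
    [1, 2],                                          # 1 = progression, 2 = death
    default=0,                                       # 0 = censored
)

ajf = AalenJohansenFitter()
ajf.fit(time, event, event_of_interest=1)

cif = ajf.cumulative_density_
for horizon in (2, 5, 10):  # the time points of interest in the abstract
    value = cif[cif.index <= horizon].iloc[-1, 0]    # step-function lookup
    print(f"Cumulative incidence of progression at {horizon}y: {value:.3f}")
```

Treating death as simple censoring here (a Kaplan-Meier analysis) would overestimate the incidence of progression, which is why CR-specific estimators and models are needed.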
Collapse
Affiliation(s)
- Georgios Kantidakis
- Mathematical Institute (MI), Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands; Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands; Department of Statistics, European Organisation for Research and Treatment of Cancer (EORTC) Headquarters, Ave E. Mounier 83/11, 1200, Brussels, Belgium.
| | - Hein Putter
- Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
| | - Saskia Litière
- Department of Statistics, European Organisation for Research and Treatment of Cancer (EORTC) Headquarters, Ave E. Mounier 83/11, 1200, Brussels, Belgium
| | - Marta Fiocco
- Mathematical Institute (MI), Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands; Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands; Trial and Data Center, Princess Máxima Center for pediatric oncology (PMC), Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
| |
Collapse
|
40
|
Chen F, Kantagowit P, Nopsopon T, Chuklin A, Pongpirul K. Prediction and diagnosis of chronic kidney disease development and progression using machine-learning: Protocol for a systematic review and meta-analysis of reporting standards and model performance. PLoS One 2023; 18:e0278729. [PMID: 36821539 PMCID: PMC9949618 DOI: 10.1371/journal.pone.0278729] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2022] [Accepted: 02/06/2023] [Indexed: 02/24/2023] Open
Abstract
Chronic kidney disease (CKD) is an important yet under-recognized contributor to morbidity and mortality globally. Machine-learning (ML) based decision support tools have been developed across many aspects of CKD care. Notably, algorithms developed in the prediction and diagnosis of CKD development and progression may help to facilitate early disease prevention, assist with early planning of renal replacement therapy, and offer potential clinical and economic benefits to patients and health systems. Clinical implementation can be affected by the uncertainty surrounding the methodological rigor and performance of ML-based models. This systematic review aims to evaluate the application of prognostic and diagnostic ML tools in CKD development and progression. The protocol has been prepared using the Preferred Reporting Items for Systematic Review and Meta-analysis Protocols (PRISMA-P) guidelines. The systematic review protocols for CKD prediction and diagnosis have been registered with the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42022356704, CRD42022372378). A systematic search will be undertaken of PubMed, Embase, the Cochrane Central Register of Controlled Trials (CENTRAL), the Web of Science, and the IEEE Xplore digital library. Studies in which ML has been applied to predict and diagnose CKD development and progression will be included. The primary outcome will be the comparison of the performance of ML-based models with non-ML-based models. Secondary analysis will consist of model use cases, model construct, and model reporting quality. This systematic review will offer valuable insight into the performance and reporting quality of ML-based models in CKD diagnosis and prediction. This will inform clinicians and technical specialists of the current development of ML in CKD care, as well as direct future model development and standardization.
Collapse
Affiliation(s)
- Fangyue Chen
- Global Health Partnerships, Health Education England, London, United Kingdom
- School of Public Health, Faculty of Medicine, Imperial College London, London, United Kingdom
- Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
| | | | - Tanawin Nopsopon
- Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Arisa Chuklin
- Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
| | - Krit Pongpirul
- Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Clinical Research Center, Bumrungrad International Hospital, Bangkok, Thailand
| |
Collapse
|
41
|
Andaur Navarro CL, Damen JAA, van Smeden M, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, Hooft L. Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. J Clin Epidemiol 2023; 154:8-22. [PMID: 36436815 DOI: 10.1016/j.jclinepi.2022.11.015] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 10/09/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
BACKGROUND AND OBJECTIVES We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques. METHODS We searched PubMed for articles published between 01/01/2018 and 31/12/2019, describing the development or the development with external validation of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes. RESULTS We included 152 studies, of which 58 (38.2% [95% CI 30.8-46.1]) were diagnostic and 94 (61.8% [95% CI 53.9-69.2]) were prognostic studies. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3-91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8-90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4-87.5]). The most common algorithms used were support vector machines (n = 86/522, 16.5% [95% CI 13.5-19.9]) and random forests (n = 73/522, 14% [95% CI 11.3-17.2]). Values for the area under the receiver operating characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missing (n = 494/522, 94.6% [95% CI 92.4-96.3]). CONCLUSION Our review revealed that focus is required on the handling of missing values, methods for internal validation, and reporting of calibration to improve the methodological conduct of studies on machine learning-based prediction models. SYSTEMATIC REVIEW REGISTRATION PROSPERO, CRD42019161764.
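The calibration metrics this review found were usually missing are inexpensive to compute. The sketch below estimates calibration-in-the-large and the calibration slope by regressing the outcome on the logit of the predicted risks; the predictions and outcomes are simulated assumptions, and statsmodels is assumed available.

```python
# Minimal sketch of calibration-in-the-large and calibration slope for
# a binary-outcome model, on simulated predictions (assumptions only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
p_hat = np.clip(rng.beta(2, 5, size=2000), 1e-6, 1 - 1e-6)  # model predictions
y = rng.binomial(1, np.clip(p_hat * 1.2, 0, 1))             # miscalibrated truth

lp = np.log(p_hat / (1 - p_hat))  # logit of the predictions

# Calibration slope: logistic regression of outcome on the logit (ideal 1).
slope_fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
# Calibration-in-the-large: intercept with the logit as fixed offset (ideal 0).
citl_fit = sm.GLM(y, np.ones_like(lp), offset=lp, family=sm.families.Binomial()).fit()

print(f"Calibration slope: {slope_fit.params[1]:.2f}")
print(f"Calibration-in-the-large: {citl_fit.params[0]:.2f}")
```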
Collapse
Affiliation(s)
- Constanza L Andaur Navarro
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
| | - Johanna A A Damen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Toshihiko Takada
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Steven W J Nijman
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Paula Dhiman
- Center for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Jie Ma
- Center for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Gary S Collins
- Center for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Ram Bajpai
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| |
Collapse
|
42
|
Binuya MAE, Engelhardt EG, Schats W, Schmidt MK, Steyerberg EW. Methodological guidance for the evaluation and updating of clinical prediction models: a systematic review. BMC Med Res Methodol 2022; 22:316. [PMID: 36510134 PMCID: PMC9742671 DOI: 10.1186/s12874-022-01801-8] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2022] [Accepted: 11/22/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Clinical prediction models are often not evaluated properly in specific settings or updated, for instance, with information from new markers. These key steps are needed so that models are fit for purpose and remain relevant in the long term. We aimed to present an overview of methodological guidance for the evaluation (i.e., validation and impact assessment) and updating of clinical prediction models. METHODS We systematically searched nine databases from January 2000 to January 2022 for articles in English with methodological recommendations for the post-derivation stages of interest. Qualitative analysis was used to summarize the 70 selected guidance papers. RESULTS Key aspects for validation are the assessment of statistical performance using measures for discrimination (e.g., C-statistic) and calibration (e.g., calibration-in-the-large and calibration slope). For assessing impact or usefulness in clinical decision-making, recent papers advise using decision-analytic measures (e.g., the Net Benefit) over simplistic classification measures that ignore clinical consequences (e.g., accuracy, overall Net Reclassification Index). Commonly recommended methods for model updating are recalibration (i.e., adjustment of intercept or baseline hazard and/or slope), revision (i.e., re-estimation of individual predictor effects), and extension (i.e., addition of new markers). Additional methodological guidance is needed for newer types of updating (e.g., meta-model and dynamic updating) and machine learning-based models. CONCLUSION Substantial guidance was found for model evaluation and more conventional updating of regression-based models. An important development in model evaluation is the introduction of a decision-analytic framework for assessing clinical usefulness. Consensus is emerging on methods for model updating.
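The two simplest updating methods summarized here, recalibration of the intercept and recalibration of intercept plus slope, amount to refitting a logistic model on the old model's linear predictor in the new setting. The sketch below illustrates both; the drifted setting and all values are assumptions for demonstration, not from any guidance paper.

```python
# Minimal sketch of model recalibration: intercept-only update and
# intercept-plus-slope update. Simulated data; assumptions only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
p_old = np.clip(rng.beta(2, 4, size=1500), 1e-6, 1 - 1e-6)  # old model's risks
lp = np.log(p_old / (1 - p_old))                            # old linear predictor
y_new = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 0.7 * lp))))  # drifted setting

# Update 1: re-estimate the intercept, keeping the old slope via an offset.
m1 = sm.GLM(y_new, np.ones_like(lp), offset=lp, family=sm.families.Binomial()).fit()
# Update 2: re-estimate intercept and slope together.
m2 = sm.GLM(y_new, sm.add_constant(lp), family=sm.families.Binomial()).fit()

a = m1.params[0]
b0, b1 = m2.params
p_update2 = 1 / (1 + np.exp(-(b0 + b1 * lp)))  # recalibrated risks
print(f"Intercept update: alpha = {a:.2f}")
print(f"Intercept+slope update: alpha = {b0:.2f}, beta = {b1:.2f}")
print(f"Mean updated risk {p_update2.mean():.3f} vs observed {y_new.mean():.3f}")
```

Revision and extension follow the same pattern but add individual predictors or new markers alongside the linear predictor.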
Collapse
Affiliation(s)
- M. A. E. Binuya
- Division of Molecular Pathology, the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands; Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - E. G. Engelhardt
- Division of Molecular Pathology, the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands; Division of Psychosocial Research and Epidemiology, the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
| | - W. Schats
- Scientific Information Service, The Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
| | - M. K. Schmidt
- Division of Molecular Pathology, the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands; Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - E. W. Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| |
Collapse
|
43
|
Yang Q, Fan X, Cao X, Hao W, Lu J, Wei J, Tian J, Yin M, Ge L. Reporting and risk of bias of prediction models based on machine learning methods in preterm birth: A systematic review. Acta Obstet Gynecol Scand 2022; 102:7-14. [PMID: 36397723 PMCID: PMC9780725 DOI: 10.1111/aogs.14475] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 09/27/2022] [Accepted: 10/04/2022] [Indexed: 11/19/2022]
Abstract
INTRODUCTION There was limited evidence on the quality of reporting and the methodological quality of prediction models using machine learning methods in preterm birth. This systematic review aimed to assess the reporting quality and risk of bias of machine learning-based prediction models in preterm birth. MATERIAL AND METHODS We conducted a systematic review, searching PubMed, Embase, the Cochrane Library, China National Knowledge Infrastructure, China Biology Medicine disk, VIP Database, and WanFang Data from inception to September 27, 2021. Studies that developed (or validated) a prediction model using machine learning methods in preterm birth were included. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and the Prediction model Risk of Bias Assessment Tool (PROBAST) to evaluate the reporting quality and the risk of bias of included studies, respectively. Findings were summarized using descriptive statistics and visual plots. The protocol was registered in PROSPERO (no. CRD42022301623). RESULTS Twenty-nine studies met the inclusion criteria, with 24 development-only studies and 5 development-with-validation studies. Overall, TRIPOD adherence per study ranged from 17% to 79%, with a median adherence of 49%. The reporting of title, abstract, blinding of predictors, sample size justification, explanation of the model, and model performance was mostly poor, with TRIPOD adherence ranging from 4% to 17%. Of all included studies, 79% had a high overall risk of bias, and 21% had an unclear overall risk of bias. The analysis domain was most commonly rated as high risk of bias in included studies, mainly as a result of small effective sample size, selection of predictors based on univariable analysis, and lack of calibration evaluation. CONCLUSIONS Reporting and methodological quality of machine learning-based prediction models in preterm birth were poor. It is urgent to improve the design, conduct, and reporting of such studies to boost the application of machine learning-based prediction models for preterm birth in clinical practice.
Collapse
Affiliation(s)
- Qiuyu Yang
- Evidence-Based Nursing Center, School of Nursing, Lanzhou University, Lanzhou, China
| | - Xia Fan
- Department of Obstetrics and Gynecology, The Second School of Clinical Medicine, Shanxi University of Chinese Medicine, Shanxi, China
| | - Xiao Cao
- Evidence-Based Nursing Center, School of Nursing, Lanzhou University, Lanzhou, China
| | - Weijie Hao
- Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
| | - Jiale Lu
- Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
| | - Jia Wei
- Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
| | - Jinhui Tian
- Key Laboratory of Evidence Based Medicine and Knowledge Translation of Gansu Province, Lanzhou, China; Evidence-Based Medicine Center, School of Basic Medicine Science, Lanzhou University, Lanzhou, China
| | - Min Yin
- Health Examination Center, The First Hospital of Lanzhou University, Lanzhou, China
| | - Long Ge
- Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China; Department of Social Medicine and Health Management, and Evidence Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
| |
Collapse
|
44
|
Halilaj I, Oberije C, Chatterjee A, van Wijk Y, Rad NM, Galganebanduge P, Lavrova E, Primakov S, Widaatalla Y, Wind A, Lambin P. Open Source Repository and Online Calculator of Prediction Models for Diagnosis and Prognosis in Oncology. Biomedicines 2022; 10:2679. [PMID: 36359199 PMCID: PMC9687260 DOI: 10.3390/biomedicines10112679] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2022] [Revised: 10/18/2022] [Accepted: 10/20/2022] [Indexed: 04/05/2025] Open
Abstract
(1) Background: The main aim was to develop a prototype application serving as an open-source repository for a curated subset of predictive and prognostic models in oncology, with a user-friendly interface allowing online calculation for the included models. The focus of the application is on providing physicians and health professionals with patient-specific information regarding treatment plans, survival rates, and side effects for different expected treatments. (2) Methods: The models included were primarily those previously developed by our research group. This selection was complemented by models addressing the same cancer types but focusing on other outcomes, identified through a literature search in the PubMed and Medline databases. All selected models were publicly available and had undergone TRIPOD (Transparent Reporting of studies on prediction models for Individual Prognosis Or Diagnosis) type 3 or type 2b validation. (3) Results: The open-source repository currently incorporates 18 models from different research groups, evaluated on datasets from different countries. Model types include logistic regression, Cox regression, and recursive partitioning analysis (decision trees). (4) Conclusions: An application was developed to enable physicians to complement their clinical judgment with user-friendly, patient-specific predictions from models that have received internal/external validation. Additionally, the platform enables researchers to display their work, enhancing the use and exposure of their models.
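To illustrate the kind of "online calculation" such a repository exposes, here is a minimal sketch of scoring a patient against a published logistic regression model; the coefficients and predictor names are hypothetical placeholders, not taken from any model hosted on the platform.

```python
import math

# Hypothetical published logistic regression coefficients (placeholders only).
COEFFS = {"intercept": -3.2, "age": 0.04, "tumour_size_mm": 0.06}

def predicted_risk(age: float, tumour_size_mm: float) -> float:
    """Linear predictor followed by the inverse-logit, giving a probability."""
    lp = (COEFFS["intercept"]
          + COEFFS["age"] * age
          + COEFFS["tumour_size_mm"] * tumour_size_mm)
    return 1.0 / (1.0 + math.exp(-lp))

# Example: a 62-year-old patient with an 18 mm tumour (illustrative inputs).
print(f"{predicted_risk(age=62, tumour_size_mm=18):.2f}")
```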
Collapse
Affiliation(s)
- Iva Halilaj
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
- Health Innovation Ventures, 6229 EV Maastricht, The Netherlands
| | - Cary Oberije
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Avishek Chatterjee
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Yvonka van Wijk
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Nastaran Mohammadian Rad
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Prabash Galganebanduge
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Elizaveta Lavrova
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Sergey Primakov
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Yousif Widaatalla
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Anke Wind
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| | - Philippe Lambin
- The D-Lab, Department of Precision Medicine, GROW-School for Oncology, Maastricht University, 6211 LK Maastricht, The Netherlands
| |
Collapse
|
45
|
Bullock GS, Mylott J, Hughes T, Nicholson KF, Riley RD, Collins GS. Just How Confident Can We Be in Predicting Sports Injuries? A Systematic Review of the Methodological Conduct and Performance of Existing Musculoskeletal Injury Prediction Models in Sport. Sports Med 2022; 52:2469-2482. [PMID: 35689749 DOI: 10.1007/s40279-022-01698-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/24/2022] [Indexed: 10/18/2022]
Abstract
BACKGROUND An increasing number of musculoskeletal injury prediction models are being developed and implemented in sports medicine. Prediction model quality needs to be evaluated so that clinicians can be informed of their potential usefulness. OBJECTIVE To evaluate the methodological conduct and completeness of reporting of musculoskeletal injury prediction models in sport. METHODS A systematic review was performed from inception to June 2021. Studies were included if they: (1) predicted sport injury; (2) used regression, machine learning, or deep learning models; (3) were written in English; and (4) were peer reviewed. RESULTS Thirty studies (204 models) were included; 60% of studies used only regression methods, 13% only machine learning, and 27% both regression and machine learning approaches. All studies developed a prediction model, and none externally validated one. Two percent of models (7% of studies) were at low risk of bias, and 98% of models (93% of studies) were at high or unclear risk of bias. Three studies (10%) performed an a priori sample size calculation; 14 (47%) performed internal validation. Nineteen studies (63%) reported discrimination and two (7%) reported calibration. Four studies (13%) reported model equations for statistical predictions, and no machine learning study reported code or hyperparameters. CONCLUSION Existing sport musculoskeletal injury prediction models were poorly developed and are at high risk of bias. No model could be recommended for use in practice. Most models were developed with small sample sizes, had inadequate assessment of model performance, and were poorly reported. To create clinically useful sports musculoskeletal injury prediction models, considerable improvements in methodology and reporting are urgently required.
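As a pointer to the two performance aspects the review found under-reported, here is a brief sketch of estimating discrimination (c-statistic) and the calibration slope from predicted risks; the data are simulated and the approach shown is one common convention, not the review's own code.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
p_hat = rng.uniform(0.05, 0.60, 500)   # model-predicted injury risks (simulated)
y = rng.binomial(1, p_hat)             # outcomes drawn to be well calibrated

auc = roc_auc_score(y, p_hat)          # discrimination: c-statistic / AUC

# Calibration slope: regress the outcome on the logit of the predicted risk;
# a slope near 1 suggests predictions are neither over- nor under-fitted.
logit_p = np.log(p_hat / (1 - p_hat))
fit = sm.Logit(y, sm.add_constant(logit_p)).fit(disp=0)
intercept, slope = fit.params

print(f"c-statistic={auc:.2f}, calibration slope={slope:.2f}")
```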
Collapse
Affiliation(s)
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, 475 Vine St, Bowman Gray Medical Building, Winston-Salem, NC, 27101, USA.
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK.
| | - Joseph Mylott
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, 475 Vine St, Bowman Gray Medical Building, Winston-Salem, NC, 27101, USA
- Baltimore Orioles Baseball Club, Baltimore, USA
| | - Tom Hughes
- Manchester United Football Club, Manchester, UK
- Department of Health Professions, Manchester Metropolitan University, Manchester, UK
| | - Kristen F Nicholson
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, 475 Vine St, Bowman Gray Medical Building, Winston-Salem, NC, 27101, USA
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| |
Collapse
|
46
|
Neural Networks for Survival Prediction in Medicine Using Prognostic Factors: A Review and Critical Appraisal. Comput Math Methods Med 2022; 2022:1176060. [PMID: 36238497 PMCID: PMC9553343 DOI: 10.1155/2022/1176060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/25/2021] [Revised: 08/26/2022] [Accepted: 09/13/2022] [Indexed: 11/17/2022]
Abstract
Survival analysis deals with the expected duration of time until one or more events of interest occur. The time to the event of interest may be unobserved, a phenomenon commonly known as right censoring, which makes the analysis of these data challenging. Over the years, machine learning algorithms have been developed and adapted to right-censored data. Neural networks have been repeatedly employed to build clinical prediction models in healthcare, with a focus on cancer and cardiology. We present the first large-scale review of survival neural networks (SNNs) with prognostic factors for clinical prediction in medicine. This work provides a comprehensive overview of the literature (24 studies from 1990 to August 2021, identified through a global search in PubMed). Relevant manuscripts are classified as methodological/technical (novel methodology or new theoretical model; 13 studies) or applications (11 studies). We investigate how researchers have used neural networks to fit survival data for prediction. There are two methodological trends: either time is added as part of the input features and a single output node is specified, or multiple output nodes are defined, one for each time interval. A critical appraisal of model aspects that should be designed and reported more carefully is performed. We identify key characteristics of the prediction models (i.e., number of patients/predictors, evaluation measures, calibration) and compare the SNNs' predictive performance to that of the Cox proportional hazards model. The median sample size is 920 patients, and the median number of predictors is 7. Major findings include poor reporting (e.g., regarding missing data and hyperparameters) as well as flawed model development/validation. Calibration is neglected in more than half of the studies, Cox models are not developed to their full potential, and claims about the performance of SNNs are exaggerated. Light is shed on the current state of the art of SNNs with prognostic factors in medicine. Recommendations are made for the reporting of clinical prediction models. Limitations are discussed, and future directions are proposed for researchers who seek to extend the existing methodology.
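A minimal PyTorch sketch of the two methodological trends described above, assuming arbitrary layer sizes; this illustrates the architectures only and is not code from any reviewed study.

```python
import torch
import torch.nn as nn

n_predictors = 7  # the review's median number of predictors

# Trend 1: time is an extra input feature and a single output node estimates
# the event probability at that time (in the spirit of Biganzoli's PLANN).
single_output = nn.Sequential(
    nn.Linear(n_predictors + 1, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),
)

# Trend 2: one output node per time interval, each predicting the
# (conditional) event probability within its interval.
n_intervals = 10
multi_output = nn.Sequential(
    nn.Linear(n_predictors, 16), nn.ReLU(),
    nn.Linear(16, n_intervals), nn.Sigmoid(),
)

x = torch.randn(4, n_predictors)   # 4 hypothetical patients
t = torch.rand(4, 1)               # scaled follow-up times
print(single_output(torch.cat([x, t], dim=1)).shape)  # -> (4, 1)
print(multi_output(x).shape)                          # -> (4, 10)
```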
Collapse
|
47
|
Integration of Machine Learning Algorithms and Discrete-Event Simulation for the Cost of Healthcare Resources. Healthcare (Basel) 2022; 10:healthcare10101920. [PMID: 36292372 PMCID: PMC9601943 DOI: 10.3390/healthcare10101920] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Revised: 09/27/2022] [Accepted: 09/28/2022] [Indexed: 12/23/2022] Open
Abstract
Healthcare resource allocation plays a vital role in the number of patients treated (pnt) and the patient waiting time (wt) in healthcare institutions. This study aimed to estimate pnt and wt as output variables given the number of healthcare resources employed, and to analyze the cost of health resources to the hospital as a function of the cost coefficient (δi) in an emergency department (ED). An integration of a discrete-event simulation (DES) model and machine learning (ML) algorithms, namely random forest (RF), gradient boosting (GB), and AdaBoost (AB), was used to estimate the output variables depending on the δi of resource cost. Based on the results of the analysis, the AB algorithm performed best in almost all scenarios. For the AB algorithm with δ0.0, δ0.1, δ0.2, and δ0.3, accuracy was 0.9838, 0.9843, 0.9838, and 0.9846 for pnt and 0.9514, 0.9517, 0.9514, and 0.9514 for wt, respectively, in the training stage. The GB algorithm had the best performance in the test stage, except for δ0.2 (where AB had a better accuracy of 0.8709 for pnt). For the AB algorithm with δ0.0, δ0.1, δ0.2, and δ0.3, accuracy was 0.7956, 0.9298, 0.8288, and 0.7394 for pnt and 0.8820, 0.8821, 0.8819, and 0.8818 for wt, respectively, in the testing stage. All scenarios created by the δi coefficient should be preferred for the ED, since the income provided to the hospital by the pnt value exceeded the cost of healthcare resources. In contrast, the wt estimation results of the ML algorithms differed by δi coefficient: although wt values for all ML algorithms with the δ0.0 and δ0.1 coefficients reduced the hospital's cost, wt values based on δ0.2 and δ0.3 increased it.
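A toy sketch of the DES-plus-ML surrogate idea described above: generate (staffing level → waiting time) pairs with a simple multi-server queue, then fit an ensemble regressor to the simulated outputs. All arrival and service parameters are assumptions for illustration, not the paper's ED model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

def simulate(n_servers: int, n_patients: int = 300) -> float:
    """Mean waiting time in a toy multi-server FIFO queue."""
    arrivals = np.cumsum(rng.exponential(5.0, n_patients))  # ~1 arrival per 5 min
    service = rng.exponential(20.0, n_patients)             # 20 min mean service
    free_at = np.zeros(n_servers)                           # when each server frees up
    waits = []
    for a, s in zip(arrivals, service):
        i = int(np.argmin(free_at))          # assign the earliest-free server
        start = max(a, free_at[i])
        waits.append(start - a)
        free_at[i] = start + s
    return float(np.mean(waits))

# Run the DES across staffing levels, then train an ML surrogate on its output.
X = np.array([[c] for c in range(2, 12) for _ in range(20)])
y = np.array([simulate(int(c)) for (c,) in X])

model = GradientBoostingRegressor().fit(X, y)
print(model.predict([[6]]))  # predicted mean wait with 6 servers
```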
Collapse
|
48
|
Wilson D, Tweedie F, Rumball-Smith J, Ross K, Kazemi A, Galvin V, Dobbie G, Dare T, Brown P, Blakey J. Lessons learned from developing a COVID-19 algorithm governance framework in Aotearoa New Zealand. J R Soc N Z 2022; 53:82-94. [PMID: 39439990 PMCID: PMC11459790 DOI: 10.1080/03036758.2022.2121290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Accepted: 08/20/2022] [Indexed: 10/14/2022]
Abstract
Aotearoa New Zealand's response to the COVID-19 pandemic has included the use of algorithms that could aid decision making. Te Pokapū Hātepe o Aotearoa, the New Zealand Algorithm Hub, was established to evaluate and host COVID-19 related models and algorithms, and provide a central and secure infrastructure to support the country's pandemic response. A critical aspect of the Hub was the formation of an appropriate governance group to ensure that algorithms being deployed underwent cross-disciplinary scrutiny prior to being made available for quick and safe implementation. This framework necessarily canvassed a broad range of perspectives, including from data science, clinical, Māori, consumer, ethical, public health, privacy, legal and governmental perspectives. To our knowledge, this is the first implementation of national algorithm governance of this type, building upon broad local and global discussion of guidelines in recent years. This paper describes the experiences and lessons learned through this process from the perspective of governance group members, emphasising the role of robust governance processes in building a high-trust platform that enables rapid translation of algorithms from research to practice.
Collapse
Affiliation(s)
- Daniel Wilson
- School of Computer Science, Waipapa Taumata Rau/University of Auckland, Auckland, New Zealand
| | | | | | - Kevin Ross
- Precision Driven Health, Auckland, New Zealand
| | - Alex Kazemi
- Critical Care Complex, Middlemore Hospital, Auckland, New Zealand
| | | | - Gillian Dobbie
- School of Computer Science, Waipapa Taumata Rau/University of Auckland, Auckland, New Zealand
| | - Tim Dare
- Department of Philosophy, Waipapa Taumata Rau/University of Auckland, Auckland, New Zealand
| | | | - Judy Blakey
- Precision Driven Health Independent Advisory Group, Auckland, New Zealand
| |
Collapse
|
49
|
Parr H, Hall E, Porta N. Joint models for dynamic prediction in localised prostate cancer: a literature review. BMC Med Res Methodol 2022; 22:245. [PMID: 36123621 PMCID: PMC9487103 DOI: 10.1186/s12874-022-01709-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2021] [Accepted: 08/10/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prostate cancer is a highly prevalent disease in men. Patients are monitored regularly during and after treatment with repeated assessment of prostate-specific antigen (PSA) levels. The prognosis of localised prostate cancer is generally good after treatment, and the risk of recurrence is usually estimated from factors measured at diagnosis. Incorporating PSA measurements over time in a dynamic prediction joint model enables a patient's risk to be updated as new information becomes available. We review joint model strategies that have been applied to model time-dependent PSA trajectories to predict time-to-event outcomes in localised prostate cancer. METHODS We identify articles from the last two decades that developed joint models for prediction of localised prostate cancer recurrence. We report, compare, and summarise the methodological approaches and applications that use joint modelling accounting for two processes: the longitudinal model (PSA) and the time-to-event process (clinical failure). The methods explored differ in how they specify the association between these two processes. RESULTS Twelve relevant articles were identified. A range of methodological frameworks were found, and we describe in detail shared-parameter joint models (9 of 12, 75%) and joint latent class models (3 of 12, 25%). Within each framework, these articles presented model development, estimation of dynamic predictions, and model validation. CONCLUSIONS Each framework has its unique principles, with corresponding advantages and differing interpretations. Regardless of the framework used, dynamic prediction models enable real-time prediction of individual patient prognosis. They utilise all available longitudinal information in addition to baseline prognostic risk factors, and are superior to traditional baseline-only prediction models.
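For readers unfamiliar with the shared-parameter framework mentioned above, here is the canonical formulation in standard notation (e.g., as in Rizopoulos's framework; the symbols are illustrative, not taken from the review). A longitudinal submodel for the PSA trajectory is linked to the hazard of clinical failure through the current true level m_i(t):

```latex
\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t), & \varepsilon_i(t) &\sim N(0,\sigma^2),\\
m_i(t) &= \mathbf{x}_i(t)^\top\boldsymbol{\beta} + \mathbf{z}_i(t)^\top\mathbf{b}_i, & \mathbf{b}_i &\sim N(\mathbf{0},\mathbf{D}),\\
h_i(t) &= h_0(t)\,\exp\{\boldsymbol{\gamma}^\top\mathbf{w}_i + \alpha\, m_i(t)\},
\end{aligned}
```

where the association parameter α quantifies how the current (error-free) PSA level shifts the instantaneous risk of clinical failure; α = 0 decouples the two submodels.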
Collapse
Affiliation(s)
- Harry Parr
- Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK
| | - Emma Hall
- Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK
| | - Nuria Porta
- Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK
| |
Collapse
|
50
|
van Smeden M, Heinze G, Van Calster B, Asselbergs FW, Vardas PE, Bruining N, de Jaegere P, Moore JH, Denaxas S, Boulesteix AL, Moons KGM. Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease. Eur Heart J 2022; 43:2921-2930. [PMID: 35639667 PMCID: PMC9443991 DOI: 10.1093/eurheartj/ehac238] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Revised: 03/29/2022] [Accepted: 04/26/2022] [Indexed: 11/12/2022] Open
Abstract
The medical field has seen a rapid increase in the development of artificial intelligence (AI)-based prediction models. With the introduction of AI-based prediction model tools and software into cardiovascular patient care, cardiovascular researchers and healthcare professionals are challenged to understand the opportunities as well as the limitations of AI-based predictions. In this article, we present 12 critical questions for cardiovascular health professionals to ask when confronted with an AI-based prediction model. We aim to support medical professionals in distinguishing the AI-based prediction models that can add value to patient care from those that do not.
Collapse
Affiliation(s)
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands
| | - Georg Heinze
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
| | - Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- EPI Centre, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
| | - Folkert W Asselbergs
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, UK
- Health Data Research UK and Institute of Health Informatics, University College London, London, UK
| | - Panos E Vardas
- Department of Cardiology, Heraklion University Hospital, Heraklion, Greece
- Heart Sector, Hygeia Hospitals Group, Athens, Greece
| | - Nico Bruining
- Department of Cardiology, Erasmus MC, Thorax Center, Rotterdam, The Netherlands
| | - Peter de Jaegere
- Department of Cardiology, Erasmus MC, Thorax Center, Rotterdam, The Netherlands
| | - Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Spiros Denaxas
- Health Data Research UK and Institute of Health Informatics, University College London, London, UK
- The Alan Turing Institute, London, UK
| | - Anne Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Germany
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands
| |
Collapse
|