Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Janisch M, Apfaltrer G, Hržić F, Castellani C, Mittl B, Singer G, Lindbichler F, Pilhatsch A, Sorantin E, Tschauner S. Pediatric radius torus fractures in x-rays-how computer vision could render lateral projections obsolete. Front Pediatr 2022;10:1005099. [PMID: 36589159 PMCID: PMC9794847 DOI: 10.3389/fped.2022.1005099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2022] [Accepted: 11/29/2022] [Indexed: 12/15/2022] Open

Cited by Other Article(s)

Till T, Scherkl M, Stranger N, Singer G, Hankel S, Flucher C, Hržić F, Štajduhar I, Tschauner S. Impact of test set composition on AI performance in pediatric wrist fracture detection in X-rays. Eur Radiol 2025:10.1007/s00330-025-11669-z. [PMID: 40379941 DOI: 10.1007/s00330-025-11669-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Revised: 02/24/2025] [Accepted: 04/14/2025] [Indexed: 05/19/2025]

Abstract

OBJECTIVES

To evaluate how different test set sampling strategies (random selection and balanced sampling) affect the performance of artificial intelligence (AI) models in pediatric wrist fracture detection using radiographs, aiming to highlight the need for standardization in test set design.

MATERIALS AND METHODS

This retrospective study utilized the open-sourced GRAZPEDWRI-DX dataset of 6091 pediatric wrist radiographs. Two test sets, each containing 4588 images, were constructed: one using a balanced approach based on case difficulty, projection type, and fracture presence and the other a random selection. EfficientNet and YOLOv11 models were trained and validated on 18,762 radiographs and tested on both sets. Binary classification and object detection tasks were evaluated using metrics such as precision, recall, F1 score, AP50, and AP50-95. Statistical comparisons between test sets were performed using nonparametric tests.
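
The two sampling strategies named above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how balancing was implemented, so equal quotas per stratum are an assumption, and the record fields (`difficulty`, `projection`, `fracture`), the function names, and the toy pool are all hypothetical.

```python
import random
from collections import defaultdict

def random_test_set(records, n, seed=0):
    """Draw n records uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, n)

def balanced_test_set(records, n, keys, seed=0):
    """Draw about n records with an equal quota per stratum.

    A stratum is one combination of the values of `keys`. A stratum with
    fewer records than the quota contributes everything it has, so the
    result can be smaller than n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)
    quota = n // len(strata)
    chosen = []
    for _, members in sorted(strata.items()):
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return chosen

# Hypothetical toy pool in which easy cases dominate (about 1 in 7 is hard).
pool = [
    {
        "id": i,
        "difficulty": "hard" if i % 7 == 0 else "easy",
        "projection": "lateral" if i % 2 else "ap",
        "fracture": i % 3 == 0,
    }
    for i in range(1000)
]

keys = ("difficulty", "projection", "fracture")
rand_set = random_test_set(pool, 80)
bal_set = balanced_test_set(pool, 80, keys)

hard_in_random = sum(r["difficulty"] == "hard" for r in rand_set)
hard_in_balanced = sum(r["difficulty"] == "hard" for r in bal_set)
print(hard_in_random, hard_in_balanced)
```

In this toy pool the balanced set is half hard cases by construction, while the random set mirrors the pool, where hard cases are a small minority. That shift in case mix is the kind of difference the study compares.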

RESULTS

Performance metrics significantly decreased in the balanced test set with more challenging cases. For example, the precision for YOLOv11 models decreased from 0.95 in the random set to 0.83 in the balanced set. Similar trends were observed for recall, accuracy, and F1 score, indicating that models trained on easy-to-recognize cases performed poorly on more complex ones. These results were consistent across all model variants tested.
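
For readers less familiar with the metrics reported here, the following sketch shows how precision, recall, and F1 score follow from counts of true positives, false positives, and false negatives. The counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    Each ratio is defined as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts: 80 fractures found, 20 false alarms, 20 fractures missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Precision falls when false alarms rise and recall falls when fractures are missed. A test set with more difficult cases can therefore lower either metric, or both, for the same model.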

CONCLUSION

AI models for pediatric wrist fracture detection exhibit reduced performance when tested on balanced datasets containing more difficult cases, compared to randomly selected cases. This highlights the importance of constructing representative and standardized test sets that account for clinical complexity to ensure robust AI performance in real-world settings.

KEY POINTS

Question: Do different sampling strategies based on sample complexity influence the performance of deep learning models in fracture detection?
Findings: AI performance in pediatric wrist fracture detection drops significantly when tested on balanced datasets with more challenging cases, compared to randomly selected cases.
Clinical relevance: Without standardized and validated test datasets for AI that reflect clinical complexities, performance metrics may be overestimated, limiting the utility of AI in real-world settings.

Suen K, Zhang R, Kutaiba N. Accuracy of wrist fracture detection on radiographs by artificial intelligence compared to human clinicians. A systematic review and meta-analysis. Eur J Radiol 2024;178:111593. [PMID: 38981178 DOI: 10.1016/j.ejrad.2024.111593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 06/23/2024] [Accepted: 06/28/2024] [Indexed: 07/11/2024]

Abstract

PURPOSE

The aim of the study is to perform a systematic review and meta-analysis comparing the diagnostic performance of artificial intelligence (AI) and human readers in the detection of wrist fractures.

METHOD

This study conducted a systematic review following PRISMA guidelines. Medline and Embase databases were searched for relevant articles published up to August 14, 2023. All included studies reported the diagnostic performance of AI to detect wrist fractures, with or without comparison to human readers. A meta-analysis was performed to calculate the pooled sensitivity and specificity of AI and human experts in detecting distal radius and scaphoid fractures, respectively.

RESULTS

Of 213 identified records, 20 studies were included after abstract screening and full-text review. Nine articles examined distal radius fractures, while eight studies examined scaphoid fractures. One study included distal radius and scaphoid fractures, and two studies examined paediatric distal radius fractures. The pooled sensitivity and specificity for AI in detecting distal radius fractures were 0.92 (95% CI 0.88-0.95) and 0.89 (0.84-0.92), respectively. The corresponding values for human readers were 0.95 (0.91-0.97) and 0.94 (0.91-0.96). For scaphoid fractures, pooled sensitivity and specificity for AI were 0.85 (0.73-0.92) and 0.83 (0.76-0.89), while human experts exhibited 0.71 (0.66-0.76) and 0.93 (0.90-0.95), respectively.

CONCLUSION

The results indicate comparable diagnostic accuracy between AI and human readers, especially for distal radius fractures. For the detection of scaphoid fractures, the human readers were similarly sensitive but more specific. These findings underscore the potential of AI to enhance fracture detection accuracy and improve clinical workflow, rather than to replace human intelligence.

Nowroozi A, Salehi MA, Shobeiri P, Agahi S, Momtazmanesh S, Kaviani P, Kalra MK. Artificial intelligence diagnostic accuracy in fracture detection from plain radiographs and comparing it with clinicians: a systematic review and meta-analysis. Clin Radiol 2024;79:579-588. [PMID: 38772766 DOI: 10.1016/j.crad.2024.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/09/2024] [Accepted: 04/15/2024] [Indexed: 05/23/2024]

Oeding JF, Kunze KN, Messer CJ, Pareek A, Fufa DT, Pulos N, Rhee PC. Diagnostic Performance of Artificial Intelligence for Detection of Scaphoid and Distal Radius Fractures: A Systematic Review. J Hand Surg Am 2024;49:411-422. [PMID: 38551529 DOI: 10.1016/j.jhsa.2024.01.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 01/19/2024] [Accepted: 01/31/2024] [Indexed: 05/05/2024]

Abstract

PURPOSE

To review the existing literature to (1) determine the diagnostic efficacy of artificial intelligence (AI) models for detecting scaphoid and distal radius fractures and (2) compare the efficacy to human clinical experts.

METHODS

PubMed, OVID/Medline, and Cochrane libraries were queried for studies investigating the development, validation, and analysis of AI for the detection of scaphoid or distal radius fractures. Data regarding study design, AI model development and architecture, prediction accuracy/area under the receiver operator characteristic curve (AUROC), and imaging modalities were recorded.

RESULTS

A total of 21 studies were identified, of which 12 (57.1%) used AI to detect fractures of the distal radius, and nine (42.9%) used AI to detect fractures of the scaphoid. AI models demonstrated good diagnostic performance on average, with AUROC values ranging from 0.77 to 0.96 for scaphoid fractures and from 0.90 to 0.99 for distal radius fractures. Accuracy of AI models ranged from 72.0% to 90.3% for scaphoid fractures and from 89.0% to 98.0% for distal radius fractures. When compared to clinical experts, 13 of 14 (92.9%) studies reported that AI models demonstrated comparable or better performance. The type of fracture influenced model performance, with worse overall performance on occult scaphoid fractures; however, models trained specifically on occult fractures demonstrated substantially improved performance when compared to humans.

CONCLUSIONS

AI models demonstrated excellent performance for detecting scaphoid and distal radius fractures, with the majority demonstrating comparable or better performance compared with human experts. Worse performance was demonstrated on occult fractures. However, when trained specifically on difficult fracture patterns, AI models demonstrated improved performance.

CLINICAL RELEVANCE

AI models can help detect commonly missed occult fractures while enhancing workflow efficiency for distal radius and scaphoid fracture diagnoses. As performance varies based on fracture type, future studies focused on wrist fracture detection should clearly define whether the goal is to (1) identify difficult-to-detect fractures or (2) improve workflow efficiency by assisting in routine tasks.

Till T, Tschauner S, Singer G, Lichtenegger K, Till H. Development and optimization of AI algorithms for wrist fracture detection in children using a freely available dataset. Front Pediatr 2023;11:1291804. [PMID: 38188914 PMCID: PMC10768054 DOI: 10.3389/fped.2023.1291804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 12/05/2023] [Indexed: 01/09/2024] Open

Abstract

Introduction

In the field of pediatric trauma, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems have emerged, offering a promising avenue for improved patient care. Children with wrist fractures in particular may benefit from machine learning (ML) solutions, since some of these lesions may be overlooked on conventional X-ray due to minimal compression without dislocation or mistaken for cartilaginous growth plates. In this article, we describe the development and optimization of AI algorithms for wrist fracture detection in children.

Methods

A team of IT specialists, pediatric radiologists and pediatric surgeons used the freely available GRAZPEDWRI-DX dataset containing annotated pediatric trauma wrist radiographs of 6,091 patients, a total number of 10,643 studies (20,327 images). First, a basic object detection model, a You Only Look Once object detector of the seventh generation (YOLOv7), was trained and tested on these data. Then, team decisions were taken to adjust data preparation, image sizes used for training and testing, and configuration of the detection model. Furthermore, we investigated each of these models using an Explainable Artificial Intelligence (XAI) method called Gradient-weighted Class Activation Mapping (Grad-CAM). This method uses saliency maps to visualize where a model directs its attention before classifying and regressing a certain class.

Results

Mean average precision (mAP) improved when applying optimizations pre-processing the dataset images (maximum increases of +25.51% mAP@0.5 and +39.78% mAP@[0.5:0.95]), as well as the object detection model itself (maximum increases of +13.36% mAP@0.5 and +27.01% mAP@[0.5:0.95]). Generally, when analyzing the resulting models using XAI methods, higher-scoring model variations in terms of mAP paid more attention to broader regions of the image, prioritizing detection accuracy over precision compared to the less accurate models.
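
The suffixes on mAP refer to intersection-over-union (IoU) thresholds. Assuming the abstract follows the common COCO-style convention, mAP@0.5 counts a predicted box as correct when its IoU with a ground-truth box is at least 0.5, and mAP@[0.5:0.95] averages over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below only illustrates the IoU test at each threshold, not a full mAP computation; the box coordinates and function names are hypothetical.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Thresholds 0.50, 0.55, ..., 0.95 (assumed COCO-style convention).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def matches_at_thresholds(pred_box, truth_box):
    """Map each IoU threshold to whether the prediction counts as a match."""
    overlap = iou(pred_box, truth_box)
    return {t: overlap >= t for t in IOU_THRESHOLDS}

# Hypothetical boxes: the prediction is shifted 2 px to the right of the truth.
pred = (2, 0, 12, 10)
truth = (0, 0, 10, 10)
overlap = iou(pred, truth)
matched = matches_at_thresholds(pred, truth)
print(round(overlap, 3), sum(matched.values()))
```

A loosely placed box can pass at the 0.5 threshold and fail at the stricter ones, which is why a change in localization quality can move mAP@[0.5:0.95] more than mAP@0.5.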

Discussion

This paper supports the implementation of ML solutions for pediatric trauma care. Optimization of a large X-ray dataset and the YOLOv7 model improves the model's ability to detect objects and provide valid diagnostic support to health care specialists. Such optimization protocols must be understood and advocated before comparing ML performance against health care specialists.
