1
Polis B, Zawadzka-Fabijan A, Fabijan R, Kosińska R, Nowosławska E, Fabijan A. Comparative Evaluation of Large Language and Multimodal Models in Detecting Spinal Stabilization Systems on X-Ray Images. J Clin Med 2025; 14:3282. [PMID: 40429276 PMCID: PMC12112668 DOI: 10.3390/jcm14103282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2025] [Revised: 04/25/2025] [Accepted: 05/06/2025] [Indexed: 05/29/2025] Open
Abstract
Background/Objectives: Open-source AI models are increasingly applied in medical imaging, yet their effectiveness in detecting and classifying spinal stabilization systems remains underexplored. This study compares ChatGPT-4o (a large language model) and BiomedCLIP (a multimodal model) in their analysis of posturographic X-ray images (AP projection) to assess their accuracy in identifying the presence, type (growing vs. non-growing), and specific system (MCGR vs. PSF). Methods: A dataset of 270 X-ray images (93 without stabilization, 80 with MCGR, and 97 with PSF) was analyzed manually by neurosurgeons and evaluated using a three-stage AI-based questioning approach. Performance was assessed via classification accuracy, Gwet's Agreement Coefficient (AC1) for inter-rater reliability, and a two-tailed z-test for statistical significance (p < 0.05). Results: The results indicate that GPT-4o demonstrates high accuracy in detecting spinal stabilization systems, achieving near-perfect recognition (97-100%) for the presence or absence of stabilization. However, its consistency is reduced when distinguishing complex growing-rod (MCGR) configurations, with agreement scores dropping significantly (AC1 = 0.32-0.50). In contrast, BiomedCLIP displays greater response consistency (AC1 = 1.00) but struggles with detailed classification, particularly in recognizing PSF (11% accuracy) and MCGR (4.16% accuracy). Sensitivity analysis revealed GPT-4o's superior stability in hierarchical classification tasks, while BiomedCLIP excelled in binary detection but showed performance deterioration as the classification complexity increased. Conclusions: These findings highlight GPT-4o's robustness in clinical AI-assisted diagnostics, particularly for detailed differentiation of spinal stabilization systems, whereas BiomedCLIP's precision may require further optimization to enhance its applicability in complex radiographic evaluations.
Affiliation(s)
- Bartosz Polis
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Agnieszka Zawadzka-Fabijan
- Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland
- Róża Kosińska
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Emilia Nowosławska
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Artur Fabijan
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
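
The abstract of entry 1 reports response consistency as Gwet's AC1. As a rough illustration of how that coefficient behaves, here is a minimal sketch of the standard two-rater form for nominal categories. The function name `gwet_ac1` and the toy ratings are assumptions made for illustration; the study's own computation (number of raters, software used) is not described in the abstract.

```python
from collections import Counter
from typing import Hashable, Sequence


def gwet_ac1(ratings_a: Sequence[Hashable],
             ratings_b: Sequence[Hashable],
             categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for two raters over a fixed set of nominal categories.

    Illustrative two-rater form only; not taken from the cited study.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    n = len(ratings_a)

    # Observed agreement: share of items on which both raters agree.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # pi_k: average share of items placed in category k by the two raters.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    pis = [(count_a[k] / n + count_b[k] / n) / 2 for k in categories]

    # Chance agreement under Gwet's model.
    p_e = sum(pi * (1 - pi) for pi in pis) / (q - 1)

    return (p_a - p_e) / (1 - p_e)


# Hypothetical example: 1 = stabilization present, 0 = absent.
first_pass = [1, 1, 0, 0]
second_pass = [1, 0, 0, 0]
print(round(gwet_ac1(first_pass, second_pass, categories=[0, 1]), 3))
```

Identical rating sequences give AC1 = 1.00, which is the value the abstract reports for BiomedCLIP.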
2
Fabijan A, Zawadzka-Fabijan A, Fabijan R, Zakrzewski K, Nowosławska E, Polis B. Assessing the Accuracy of Artificial Intelligence Models in Scoliosis Classification and Suggested Therapeutic Approaches. J Clin Med 2024; 13:4013. [PMID: 39064053 PMCID: PMC11278075 DOI: 10.3390/jcm13144013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 06/30/2024] [Accepted: 07/06/2024] [Indexed: 07/28/2024] Open
Abstract
Background: Open-source artificial intelligence models (OSAIMs) are increasingly being applied in various fields, including IT and medicine, offering promising solutions for diagnostic and therapeutic interventions. In response to the growing interest in AI for clinical diagnostics, we evaluated several OSAIMs (ChatGPT 4, Microsoft Copilot, Gemini, PopAi, You Chat, Claude, and the specialized PMC-LLaMA 13B), assessing their abilities to classify scoliosis severity and recommend treatments based on radiological descriptions from AP radiographs. Methods: Our study employed a two-stage methodology, where descriptions of single-curve scoliosis were analyzed by AI models following their evaluation by two independent neurosurgeons. Statistical analysis involved the Shapiro-Wilk test for normality, with non-normal distributions described using medians and interquartile ranges. Inter-rater reliability was assessed using Fleiss' kappa, and performance metrics, like accuracy, sensitivity, specificity, and F1 scores, were used to evaluate the AI systems' classification accuracy. Results: The analysis indicated that although some AI systems, like ChatGPT 4, Copilot, and PopAi, accurately reflected the recommended Cobb angle ranges for disease severity and treatment, others, such as Gemini and Claude, required further calibration. Particularly, PMC-LLaMA 13B expanded the classification range for moderate scoliosis, potentially influencing clinical decisions and delaying interventions. Conclusions: These findings highlight the need for the continuous refinement of AI models to enhance their clinical applicability.
Affiliation(s)
- Artur Fabijan
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Agnieszka Zawadzka-Fabijan
- Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland
- Krzysztof Zakrzewski
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Emilia Nowosławska
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Bartosz Polis
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
3
Fabijan A, Zawadzka-Fabijan A, Fabijan R, Zakrzewski K, Nowosławska E, Polis B. Artificial Intelligence in Medical Imaging: Analyzing the Performance of ChatGPT and Microsoft Bing in Scoliosis Detection and Cobb Angle Assessment. Diagnostics (Basel) 2024; 14:773. [PMID: 38611686 PMCID: PMC11011528 DOI: 10.3390/diagnostics14070773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/24/2024] [Accepted: 04/04/2024] [Indexed: 04/14/2024] Open
Abstract
Open-source artificial intelligence models (OSAIM) find free applications in various industries, including information technology and medicine. Their clinical potential, especially in supporting diagnosis and therapy, is the subject of increasingly intensive research. Due to the growing interest in artificial intelligence (AI) for diagnostic purposes, we conducted a study evaluating the capabilities of AI models, including ChatGPT and Microsoft Bing, in the diagnosis of single-curve scoliosis based on posturographic radiological images. Two independent neurosurgeons assessed the degree of spinal deformation, selecting 23 cases of severe single-curve scoliosis. Each posturographic image was uploaded separately to each of the mentioned platforms using a set of formulated questions, starting from 'What do you see in the image?' and ending with a request to determine the Cobb angle. In the responses, we focused on how these AI models identify and interpret spinal deformations and how accurately they recognize the direction and type of scoliosis as well as vertebral rotation. The Intraclass Correlation Coefficient (ICC) with a 'two-way' model was used to assess the consistency of Cobb angle measurements, and its confidence intervals were determined using the F test. Differences in Cobb angle measurements between human assessments and the AI ChatGPT model were analyzed using metrics such as RMSEA, MSE, MPE, MAE, RMSLE, and MAPE, allowing for a comprehensive assessment of AI model performance from various statistical perspectives. The ChatGPT model achieved 100% effectiveness in detecting scoliosis in X-ray images, while the Bing model did not detect any scoliosis. However, ChatGPT had limited effectiveness (43.5%) in assessing Cobb angles, showing significant inaccuracy and discrepancy compared to human assessments. This model also had limited accuracy in determining the direction of spinal curvature, classifying the type of scoliosis, and detecting vertebral rotation. Overall, although ChatGPT demonstrated potential in detecting scoliosis, its abilities in assessing Cobb angles and other parameters were limited and inconsistent with expert assessments. These results underscore the need for comprehensive improvement of AI algorithms, including broader training with diverse X-ray images and advanced image processing techniques, before they can be considered as auxiliary in diagnosing scoliosis by specialists.
Affiliation(s)
- Artur Fabijan
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Agnieszka Zawadzka-Fabijan
- Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland
- Krzysztof Zakrzewski
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Emilia Nowosławska
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Bartosz Polis
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
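
The abstract of entry 3 compares human and ChatGPT Cobb angle measurements using several error metrics. The sketch below shows the textbook definitions of MAE, MSE, MPE, MAPE, and RMSLE. The function name `cobb_error_metrics`, the sign convention chosen for MPE, and the sample angles are assumptions for illustration, not values or code from the study. RMSEA is left out because the abstract does not say how it was computed.

```python
import math
from typing import Dict, Sequence


def cobb_error_metrics(reference: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """Textbook error metrics between reference and predicted angles.

    Assumes every reference angle is strictly positive (needed for the
    percentage errors) and every predicted angle is non-negative.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    pairs = list(zip(reference, predicted))

    mae = sum(abs(p - r) for r, p in pairs) / n
    mse = sum((p - r) ** 2 for r, p in pairs) / n
    # MPE keeps the sign, so over- and under-estimates can cancel out.
    # Convention assumed here: (reference - predicted) / reference.
    mpe = 100.0 * sum((r - p) / r for r, p in pairs) / n
    mape = 100.0 * sum(abs((r - p) / r) for r, p in pairs) / n
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(r)) ** 2 for r, p in pairs) / n
    )
    return {"MAE": mae, "MSE": mse, "MPE": mpe, "MAPE": mape, "RMSLE": rmsle}


# Hypothetical angles in degrees, not data from the study.
human = [50.0, 60.0]
model = [45.0, 66.0]
for name, value in cobb_error_metrics(human, model).items():
    print(f"{name}: {value:.3f}")
```

In this toy example the signed MPE is close to zero while MAPE is 10%, which is one reason to report several metrics side by side rather than a single one.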
4
Fabijan A, Fabijan R, Zawadzka-Fabijan A, Nowosławska E, Zakrzewski K, Polis B. Evaluating Scoliosis Severity Based on Posturographic X-ray Images Using a Contrastive Language-Image Pretraining Model. Diagnostics (Basel) 2023; 13:2142. [PMID: 37443536 DOI: 10.3390/diagnostics13132142] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/11/2023] [Revised: 06/10/2023] [Accepted: 06/20/2023] [Indexed: 07/15/2023] Open
Abstract
Assessing severe scoliosis requires the analysis of posturographic X-ray images. One way to analyse these images may involve the use of open-source artificial intelligence models (OSAIMs), such as the contrastive language-image pretraining (CLIP) system, which was designed to combine images with text. This study aims to determine whether the CLIP model can recognise visible severe scoliosis in posturographic X-ray images. This study used 23 posturographic images of patients diagnosed with severe scoliosis that were evaluated by two independent neurosurgery specialists. Subsequently, the X-ray images were input into the CLIP system, where they were subjected to a series of questions with varying levels of difficulty and comprehension. The predictions obtained using the CLIP models in the form of probabilities ranging from 0 to 1 were compared with the actual data. To evaluate the quality of image recognition, true positives, false negatives, and sensitivity were determined. The results of this study show that the CLIP system can perform a basic assessment of X-ray images showing visible severe scoliosis with a high level of sensitivity. It can be assumed that, in the future, OSAIMs dedicated to image analysis may become commonly used to assess X-ray images, including those of scoliosis.
Affiliation(s)
- Artur Fabijan
- Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Emilia Nowosławska
- Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Krzysztof Zakrzewski
- Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Bartosz Polis
- Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland
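
The abstract of entry 4 describes turning CLIP output probabilities into true positives, false negatives, and sensitivity. A minimal sketch of that bookkeeping follows, assuming every image is a confirmed positive case (as in the 23 severe-scoliosis images) and assuming a decision threshold of 0.5. The threshold, the function name, and the sample probabilities are illustrative assumptions; the abstract does not state them.

```python
from typing import Sequence


def sensitivity_from_probabilities(probabilities: Sequence[float],
                                   threshold: float = 0.5) -> float:
    """Sensitivity when every image is a known positive case.

    A probability at or above `threshold` counts as a true positive,
    anything below it as a false negative. The 0.5 default is an
    assumption, not a value taken from the study.
    """
    if not probabilities:
        raise ValueError("at least one probability is required")
    true_positives = sum(p >= threshold for p in probabilities)
    false_negatives = len(probabilities) - true_positives
    return true_positives / (true_positives + false_negatives)


# Hypothetical model outputs for four positive images.
print(sensitivity_from_probabilities([0.9, 0.8, 0.4, 0.7]))
```

Because only positive cases are included, this setup yields sensitivity but says nothing about specificity, which would need images without scoliosis.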
5
Lyu J, Ling SH, Banerjee S, Zheng JY, Lai KL, Yang D, Zheng YP, Bi X, Su S, Chamoli U. Ultrasound volume projection image quality selection by ranking from convolutional RankNet. Comput Med Imaging Graph 2021; 89:101847. [PMID: 33476927 DOI: 10.1016/j.compmedimag.2020.101847] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/15/2020] [Revised: 11/15/2020] [Accepted: 12/11/2020] [Indexed: 01/16/2023]
Abstract
Periodic inspection and assessment are important for scoliosis patients. 3D ultrasound imaging has become an important means of scoliosis assessment as it is a real-time, cost-effective and radiation-free imaging technique. With the generation of a 3D ultrasound volume projection spine image using our Scolioscan system, a series of 2D coronal ultrasound images are produced at different depths with different qualities. Selecting a high-quality image from these 2D images is the crucial task for further scoliosis measurement. However, adjacent images are similar and difficult to distinguish. To learn the nuances between these images, we propose selecting the best image automatically, based on their quality rankings. Here, the ranking algorithm we use is a pairwise learning-to-rank network, RankNet. Then, to extract more efficient features of input images and to improve the discriminative ability of the model, we adopt the convolutional neural network as the backbone due to its high power of image exploration. Finally, by inputting the images in pairs into the proposed convolutional RankNet, we can select the best images from each case based on the output ranking orders. The experimental result shows that convolutional RankNet achieves better than 95.5% top-3 accuracy, and we prove that this performance is beyond the experience of a human expert.
Affiliation(s)
- Juan Lyu
- College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
- Sai Ho Ling
- School of Biomedical Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
- S Banerjee
- School of Biomedical Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
- J Y Zheng
- Department of Computer Science, Imperial College London, UK
- K L Lai
- Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- D Yang
- Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Y P Zheng
- Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Xiaojun Bi
- College of Information and Communication Engineering, Harbin Engineering University, Harbin, China; College of Information Engineering, Minzu University of China, Beijing, China
- Steven Su
- School of Biomedical Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Uphar Chamoli
- School of Biomedical Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
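
The abstract of entry 5 describes ranking images in pairs with RankNet and then picking the best ones per case. The sketch below shows the usual RankNet pairwise loss (binary cross-entropy on the sigmoid of a score difference) and a top-k selection from per-image scores. In the paper the scores come from a convolutional backbone; here they are plain floats, and the function names and sample scores are illustrative assumptions rather than the authors' code.

```python
import math
from typing import List, Sequence


def ranknet_pair_loss(score_i: float, score_j: float, target: float) -> float:
    """RankNet pairwise loss for one image pair.

    `target` is 1.0 if image i should rank above image j, 0.0 if below,
    and 0.5 for a tie. The model probability is sigmoid(score_i - score_j);
    the loss is its binary cross-entropy against `target`, written in a
    numerically stable form.
    """
    diff = score_i - score_j
    return max(diff, 0.0) - diff * target + math.log1p(math.exp(-abs(diff)))


def top_k(scores: Sequence[float], k: int = 3) -> List[int]:
    """Indices of the k highest-scoring images, best first."""
    order = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    return order[:k]


# Hypothetical quality scores for four coronal images of one case.
scores = [0.1, 0.9, 0.5, 0.7]
print(top_k(scores))                       # candidate images, best first
print(ranknet_pair_loss(0.9, 0.1, 1.0))    # small loss: pair ordered correctly
print(ranknet_pair_loss(0.1, 0.9, 1.0))    # larger loss: pair ordered wrongly
```

Top-3 accuracy, the figure quoted in the abstract, would then count a case as correct when the expert-chosen image appears among the three indices returned by `top_k`.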
6
The effect of added fat on the accuracy of Cobb angle measurements in CT SPR images: A phantom study. Radiography (Lond) 2020; 26 Suppl 2:S88-S93. [PMID: 32340911 DOI: 10.1016/j.radi.2020.04.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/07/2020] [Revised: 03/02/2020] [Accepted: 04/06/2020] [Indexed: 11/20/2022]
Abstract
INTRODUCTION Adolescent idiopathic scoliosis (AIS) is a spinal deformity that mostly affects females aged between 10 and 17 years old. Cobb's method is the gold standard for assessing AIS. Being overweight is a common characteristic in AIS patients; therefore, the aim of this study is to investigate the effect fat mass has on the accuracy of Cobb angle measurements in 10-year-old female AIS patients. METHODS A purpose-built phantom representing an AIS patient was scanned after adding several thicknesses of lard fat (0, 2, 4 and 8 cm). The phantom was scanned in an antero-posterior position using the scout mode of the CT scanner. 18 observers performed Cobb angle measurements on the images. RESULTS The average Cobb angle at 0 cm of fat was 10.83° (SD = 3.06), at 2 cm it was 10.90° (SD = 3.16), at 4 cm it was 10.64° (SD = 3.06) and at 8 cm it was 10.88° (SD = 3.02). No significant difference was observed between the measurements at these thicknesses. CONCLUSION Cobb angle measurements are not affected by the presence of fat. IMPLICATIONS FOR PRACTICE When assessing overweight AIS patients, it is not necessary to manipulate the acquisition parameters, which could lead to increased patient dose, in order to get more accurate Cobb angle measurements.
7
Scoliosis imaging: An analysis of radiation risk in the CT scan projection radiograph and a comparison with projection radiography and EOS. Radiography (Lond) 2019; 25:e68-e74. [PMID: 31301794 DOI: 10.1016/j.radi.2019.02.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/26/2018] [Revised: 01/21/2019] [Accepted: 02/04/2019] [Indexed: 11/23/2022]
Abstract
INTRODUCTION Scoliosis is defined as a deformity of the spine with lateral curvature in the coronal plane. It requires regular X-ray imaging to monitor the progress of the disorder; therefore, scoliotic patients are frequently exposed to radiation. It is important to lower the risk from these exposures for young patients. The aim of this work is to compare organ dose (OD) values resulting from Scan Projection Radiograph (SPR) mode in CT against projection radiography and EOS® imaging system when assessing scoliosis. METHODS A dosimetry phantom was used to represent a 10-year-old child. Thermoluminescent dosimetry detectors were used for measuring OD. The phantom was imaged with CT in SPR mode using 27 imaging parameters; projection radiography and EOS machines using local scoliosis imaging procedures. Imaging was performed in anteroposterior, posteroanterior and lateral positions. RESULTS 17 protocols delivered significantly lower radiation dose than projection radiography (p < 0.05). OD values from the CT SPR imaging protocols and projection radiography were statistically significantly higher than the results from EOS. No statistically significant differences in OD were observed between 10 imaging protocols and those from projection radiography and EOS imaging protocols (p > 0.05). CONCLUSION EOS has the lowest dose. Where this technology is not available we suggest there is a potential for OD reduction in scoliosis imaging using CT SPR compared to projection radiography. Further work is required to investigate image quality in relation to the measurement of Cobb angle with CT SPR.