Tabanli A, Demirkiran ND. Comparing ChatGPT 3.5 and 4.0 in Low Back Pain Patient Education: Addressing Strengths, Limitations, and Psychosocial Challenges. World Neurosurg 2025;196:123755. [PMID: 39952398 DOI: 10.1016/j.wneu.2025.123755]
[Received: 10/18/2024] [Revised: 01/29/2025] [Accepted: 01/29/2025] [Indexed: 02/17/2025]
Abstract
BACKGROUND
Artificial intelligence tools like ChatGPT have gained attention for their potential to support patient education by providing accessible, evidence-based information. This study compares the performance of ChatGPT 3.5 and ChatGPT 4.0 in answering common patient questions about low back pain, focusing on response quality, readability, and adherence to clinical guidelines, while also addressing the models' limitations in managing psychosocial concerns.
METHODS
Thirty frequently asked patient questions about low back pain were categorized into 4 groups: Diagnosis, Treatment, Psychosocial Factors, and Management Approaches. Responses generated by ChatGPT 3.5 and 4.0 were evaluated on 3 key metrics: 1) response quality: rated on a scale of 1 (excellent) to 4 (unsatisfactory); 2) DISCERN criteria: evaluating reliability and adherence to clinical guidelines, with scores ranging from 1 (low reliability) to 5 (high reliability); and 3) readability: assessed using 7 readability formulas, including the Flesch-Kincaid Grade Level and the Gunning Fog Index.
RESULTS
ChatGPT 4.0 significantly outperformed ChatGPT 3.5 in response quality across all categories, with a mean score of 1.03 compared to 2.07 for ChatGPT 3.5 (P < 0.001). ChatGPT 4.0 also demonstrated higher DISCERN scores (4.93 vs. 4.00, P < 0.001). However, both versions struggled with psychosocial factor questions, where responses were rated lower than for Diagnosis, Treatment, and Management questions (P = 0.04).
CONCLUSIONS
The limitations of ChatGPT 3.5 and 4.0 in addressing psychosocial concerns highlight the need for clinician oversight, particularly for emotionally sensitive issues. Enhancing artificial intelligence's capability to manage the psychosocial aspects of patient care should be a priority in future iterations.