1. Lone MR, Sohail SS. Comment on "Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT". Clin Imaging 2024; 114:110272. [PMID: 39243497] [DOI: 10.1016/j.clinimag.2024.110272]
Affiliation(s)
- Mohd Rafi Lone
- VIT Bhopal University, Bhopal-Indore Highway, Kothrikalan, Sehore, Madhya Pradesh 466114, India.
- Shahab Saquib Sohail
- VIT Bhopal University, Bhopal-Indore Highway, Kothrikalan, Sehore, Madhya Pradesh 466114, India
2. Villarreal-Espinosa JB, Berreta RS, Allende F, Garcia JR, Ayala S, Familiari F, Chahla J. Accuracy assessment of ChatGPT responses to frequently asked questions regarding anterior cruciate ligament surgery. Knee 2024; 51:84-92. [PMID: 39241674] [DOI: 10.1016/j.knee.2024.08.014]
Abstract
BACKGROUND The emergence of artificial intelligence (AI) has given users chat-like access to large sources of information. We therefore sought to evaluate the accuracy of ChatGPT-4's responses to the 10 patient questions most frequently asked (FAQs) regarding anterior cruciate ligament (ACL) surgery. METHODS A list of the top 10 FAQs pertaining to ACL surgery was created after conducting a search through all Sports Medicine Fellowship Institutions listed on the Arthroscopy Association of North America (AANA) and American Orthopaedic Society for Sports Medicine (AOSSM) websites. Two sports medicine fellowship-trained surgeons graded response accuracy on a Likert scale, and Cohen's kappa was used to assess inter-rater agreement. Reproducibility of the responses over time was also assessed. RESULTS Five of the 10 responses were graded 'completely accurate' by both fellowship-trained surgeons, and three additional replies received a 'completely accurate' grade from at least one. Moreover, the inter-rater reliability assessment revealed moderate agreement between the fellowship-trained attending physicians (weighted kappa = 0.57, 95% confidence interval 0.15-0.99). Additionally, 80% of the responses were reproducible over time. CONCLUSION ChatGPT can be considered an accurate additional tool for answering general patient questions regarding ACL surgery. Nonetheless, patient-surgeon interaction should not be deferred and must remain the driving force for information retrieval; the general recommendation is therefore to address any questions in the presence of a qualified specialist.
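The weighted kappa reported above is a standard ordinal-agreement statistic and can be reproduced with common tooling. A minimal sketch, assuming the two raters' Likert grades are encoded as integers (the example data below are hypothetical, not the study's):

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical Likert accuracy grades (1-4) from two raters for 10 responses
    rater_a = [4, 4, 3, 4, 2, 4, 3, 4, 3, 2]
    rater_b = [4, 3, 3, 4, 3, 4, 4, 4, 2, 2]

    # Weighted kappa penalizes larger disagreements more heavily;
    # linear or quadratic weights are common choices for ordinal scales.
    kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
    print(f"Weighted Cohen's kappa: {kappa:.2f}")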
Affiliation(s)
- Felicitas Allende
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- José Rafael Garcia
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- Salvador Ayala
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- Jorge Chahla
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA.
3. Warrier A, Singh R, Haleem A, Zaki H, Eloy JA. The Comparative Diagnostic Capability of Large Language Models in Otolaryngology. Laryngoscope 2024; 134:3997-4002. [PMID: 38563415] [DOI: 10.1002/lary.31434]
Abstract
OBJECTIVES To evaluate and compare the ability of large language models (LLMs) to diagnose various ailments in otolaryngology. METHODS We collected all 100 clinical vignettes from the second edition of Otolaryngology Cases-The University of Cincinnati Clinical Portfolio by Pensak et al. With the addition of the prompt "Provide a diagnosis given the following history," we prompted ChatGPT-3.5, Google Bard, and Bing-GPT4 to provide a diagnosis for each vignette. These diagnoses were compared against the portfolio for accuracy and recorded. All queries were run in June 2023. RESULTS ChatGPT-3.5 was the most accurate model (89% success rate), followed by Google Bard (82%) and Bing-GPT4 (74%). A chi-squared test revealed a significant difference between the three LLMs in providing correct diagnoses (p = 0.023). Of the 100 vignettes, seven required additional testing results (e.g., biopsy, non-contrast CT) for accurate clinical diagnosis. When these vignettes were omitted, the revised success rates were 95.7% for ChatGPT-3.5, 88.17% for Google Bard, and 78.72% for Bing-GPT4 (p = 0.002). CONCLUSIONS ChatGPT-3.5 offers the most accurate diagnoses when given established clinical vignettes, as compared to Google Bard and Bing-GPT4. LLMs may accurately offer assessments for common otolaryngology conditions but currently require detailed prompt information and critical supervision from clinicians. There is vast potential in the clinical applicability of LLMs; however, practitioners should be wary of possible "hallucinations" and misinformation in responses. LEVEL OF EVIDENCE 3.
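The chi-squared comparison above can be sketched directly from the reported success rates. A minimal example with SciPy, assuming correct/incorrect counts out of 100 vignettes per model:

    from scipy.stats import chi2_contingency

    # Correct vs. incorrect diagnosis counts per model (from the reported rates)
    #        ChatGPT-3.5  Google Bard  Bing-GPT4
    table = [[89, 82, 74],   # correct
             [11, 18, 26]]   # incorrect

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")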
Affiliation(s)
- Akshay Warrier
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Rohan Singh
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Afash Haleem
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Haider Zaki
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Jean Anderson Eloy
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Center for Skull Base and Pituitary Surgery, Neurological Institute of New Jersey, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
4. Gupta M, Gupta P, Ho C, Wood J, Guleria S, Virostko J. Can generative AI improve the readability of patient education materials at a radiology practice? Clin Radiol 2024:S0009-9260(24)00431-8. [PMID: 39266371] [DOI: 10.1016/j.crad.2024.08.019]
Abstract
AIM This study evaluated the readability of existing patient education materials and explored the potential of generative AI tools, such as ChatGPT-4 and Google Gemini, to simplify these materials to a sixth-grade reading level, in accordance with guidelines. MATERIALS AND METHODS Seven patient education documents were selected from a major radiology group. ChatGPT-4 and Gemini were provided with the documents and asked to reformulate them to target a sixth-grade reading level. Average reading level (ARL) and proportional word count (PWC) change were calculated, and 1-sample t-tests were conducted (significance threshold p=0.05). Three radiologists assessed the materials on a Likert scale for appropriateness, relevance, clarity, and information retention. RESULTS The original materials had an ARL of 11.72. ChatGPT ARL was 7.32 ± 0.76 (6/7 significant) and Gemini ARL was 6.55 ± 0.51 (7/7 significant). ChatGPT reduced word count by 15% ± 7%, with 95% retaining at least 75% of information; Gemini reduced word count by 33% ± 7%, with 68% retaining at least 75% of information. ChatGPT outputs were rated more appropriate (95% vs. 57%), clearer (92% vs. 67%), and more relevant (95% vs. 76%) than Gemini's. Interrater agreement differed substantially between ChatGPT (0.91) and Gemini (0.46). CONCLUSION Generative AI significantly enhances the readability of patient education materials, which in their original form did not achieve the recommended sixth-grade ARL. Radiologist evaluations confirmed the appropriateness and relevance of the AI-simplified texts. This study emphasizes the capabilities of generative AI tools and the necessity for ongoing expert review to maintain content accuracy and suitability.
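The ARL comparison described above reduces to a one-sample t-test of the simplified documents' reading levels against the sixth-grade target. A minimal SciPy sketch, with hypothetical per-document grade levels standing in for the study's measurements:

    from scipy.stats import ttest_1samp

    # Hypothetical average reading levels of the seven simplified documents
    simplified_arl = [7.1, 6.8, 8.3, 7.5, 6.5, 7.9, 7.2]

    # Test whether the mean ARL differs from the sixth-grade target
    t_stat, p_value = ttest_1samp(simplified_arl, popmean=6.0)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")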
Affiliation(s)
- M Gupta
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA.
- P Gupta
- The University of Texas at Austin, Austin, TX, USA
- C Ho
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA
- J Wood
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA
- S Guleria
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA
- J Virostko
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA; The University of Texas at Austin, Dell Medical School, Livestrong Cancer Institutes, USA; The University of Texas at Austin, Dell Medical School, Department of Oncology, USA; The University of Texas at Austin, Oden Institute for Computational Engineering and Sciences, USA
5. Spina A, Andalib S, Flores D, Vermani R, Halaseh FF, Nelson AM. Evaluation of Generative Language Models in Personalizing Medical Information: Instrument Validation Study. JMIR AI 2024; 3:e54371. [PMID: 39137416] [DOI: 10.2196/54371]
Abstract
BACKGROUND Although uncertainties exist regarding implementation, artificial intelligence-driven generative language models (GLMs) have enormous potential in medicine. Deployment of GLMs could improve patient comprehension of clinical texts and address low health literacy. OBJECTIVE The goal of this study was to evaluate the potential of ChatGPT-3.5 and GPT-4 to tailor the complexity of medical information to a patient-specified education level, which is crucial if they are to serve as tools for addressing low health literacy. METHODS Input templates related to 2 prevalent chronic diseases-type II diabetes and hypertension-were designed. Each clinical vignette was adjusted for hypothetical patient education levels to evaluate output personalization. To assess the success of a GLM (GPT-3.5 and GPT-4) in tailoring output writing, the readability of pre- and posttransformation outputs was quantified using the Flesch reading ease score (FKRE) and the Flesch-Kincaid grade level (FKGL). RESULTS Responses (n=80) were generated using GPT-3.5 and GPT-4 across 2 clinical vignettes. For GPT-3.5, FKRE means were 57.75 (SD 4.75), 51.28 (SD 5.14), 32.28 (SD 4.52), and 28.31 (SD 5.22) for 6th grade, 8th grade, high school, and bachelor's, respectively; FKGL mean scores were 9.08 (SD 0.90), 10.27 (SD 1.06), 13.4 (SD 0.80), and 13.74 (SD 1.18). GPT-3.5 aligned with the prespecified education level only at the bachelor's degree. Conversely, GPT-4's FKRE mean scores were 74.54 (SD 2.6), 71.25 (SD 4.96), 47.61 (SD 6.13), and 13.71 (SD 5.77), with FKGL mean scores of 6.3 (SD 0.73), 6.7 (SD 1.11), 11.09 (SD 1.26), and 17.03 (SD 1.11) for the same respective education levels. GPT-4 met the target readability for all groups except the 6th-grade FKRE average. Both GLMs produced outputs with statistically significant differences between mean FKRE and FKGL across input education levels (FKRE: 6th grade P<.001; 8th grade P<.001; high school P<.001; bachelor's P=.003; FKGL: 6th grade P=.001; 8th grade P<.001; high school P<.001; bachelor's P<.001). CONCLUSIONS GLMs can change the structure and readability of medical text outputs according to the input-specified education level. However, GLMs categorize the input education designation into 3 broad tiers of output readability: easy (6th and 8th grade), medium (high school), and difficult (bachelor's degree). This is the first result to suggest that there are broad boundaries in the success of GLMs at output text simplification. Future research must establish how GLMs can reliably personalize medical texts to prespecified education levels to enable a broader impact on health care literacy.
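Both readability metrics above are closed-form functions of word, sentence, and syllable counts. A minimal sketch of the standard Flesch formulas, using a naive vowel-group syllable counter (an approximation; production tools use dictionaries or better heuristics):

    import re

    def count_syllables(word: str) -> int:
        # Naive approximation: count groups of consecutive vowels
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_scores(text: str) -> tuple[float, float]:
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        wps = len(words) / sentences   # words per sentence
        spw = syllables / len(words)   # syllables per word
        reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
        grade_level = 0.39 * wps + 11.8 * spw - 15.59
        return reading_ease, grade_level

    ease, grade = flesch_scores("The doctor will check your blood pressure. It is quick.")
    print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")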
Affiliation(s)
- Aidin Spina
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Saman Andalib
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Daniel Flores
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Rishi Vermani
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Faris F Halaseh
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Ariana M Nelson
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Department of Anesthesiology and Perioperative Care, University of California, Irvine, Irvine, CA, United States
6. Yong LPX, Tung JYM, Lee ZY, Kuan WS, Chua MT. Performance of Large Language Models in Patient Complaint Resolution: Web-Based Cross-Sectional Survey. J Med Internet Res 2024; 26:e56413. [PMID: 39121468] [PMCID: PMC11344182] [DOI: 10.2196/56413]
Abstract
BACKGROUND Patient complaints are a perennial challenge faced by health care institutions globally, requiring extensive time and effort from health care workers. Despite these efforts, patient dissatisfaction remains high. Recent studies on the use of large language models (LLMs), such as the GPT models developed by OpenAI, in the health care sector have shown great promise, with the ability to provide more detailed and empathetic responses than physicians. LLMs could potentially be used in responding to patient complaints to improve patient satisfaction and complaint response time. OBJECTIVE This study aims to evaluate the performance of LLMs in addressing patient complaints received by a tertiary health care institution, with the goal of enhancing patient satisfaction. METHODS Anonymized patient complaint emails and the associated responses from the patient relations department were obtained. ChatGPT-4.0 (OpenAI, Inc) was provided with the same complaint email and tasked to generate a response. The complaints and the respective responses were uploaded onto a web-based questionnaire. Respondents were asked to rate both responses on a 10-point Likert scale for 4 items: appropriateness, completeness, empathy, and satisfaction. Participants were also asked to choose a preferred response at the end of each scenario. RESULTS There was a total of 188 respondents, of whom 115 (61.2%) were health care workers. A majority of the respondents, including both health care and non-health care workers, preferred the replies from ChatGPT (ranging from 164/188, 87.2% to 183/188, 97.3% across scenarios). GPT-4.0 responses were rated higher on all 4 assessed items, with median scores of 8 (IQR 7-9) for each, compared to human responses (appropriateness 5, IQR 3-7; empathy 4, IQR 3-6; quality 5, IQR 3-6; satisfaction 5, IQR 3-6; P<.001), and had higher average word counts (238 vs 76 words). Regression analyses showed that a higher word count was a statistically significant predictor of a higher score on all 4 items, with every 1-word increment associated with an increase in scores of between 0.015 and 0.019 (all P<.001). However, on subgroup analysis by authorship, this held true only for responses written by patient relations department staff and not for those generated by ChatGPT, which received consistently high scores irrespective of response length. CONCLUSIONS This study provides significant evidence supporting the effectiveness of LLMs in the resolution of patient complaints. ChatGPT demonstrated superiority in terms of response appropriateness, empathy, quality, and overall satisfaction when compared against actual human responses to patient complaints. Future research can measure the degree of improvement that artificial intelligence generated responses can bring in terms of time savings, cost-effectiveness, patient satisfaction, and stress reduction for the health care system.
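The word-count regression described above is a simple linear model of item score on response length. A minimal sketch with statsmodels, using hypothetical ratings rather than the study's data:

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical response word counts and satisfaction ratings (1-10)
    word_counts = np.array([60, 75, 90, 120, 150, 180, 210, 240])
    scores = np.array([4, 5, 5, 6, 7, 7, 8, 9])

    # Fit: score = intercept + slope * word_count
    X = sm.add_constant(word_counts)
    model = sm.OLS(scores, X).fit()
    print(model.params)    # the slope estimates the per-word score increment
    print(model.pvalues)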
Affiliation(s)
- Lorraine Pei Xian Yong
- Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore
- Zi Yao Lee
- Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore
- Win Sen Kuan
- Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore
- Mui Teng Chua
- Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore
7. Swisher AR, Wu AW, Liu GC, Lee MK, Carle TR, Tang DM. Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model. Otolaryngol Head Neck Surg 2024. [PMID: 39105460] [DOI: 10.1002/ohn.927]
Abstract
OBJECTIVE To use an artificial intelligence (AI)-powered large language model (LLM) to improve the readability of patient handouts. STUDY DESIGN Review of online material modified by AI. SETTING Academic center. METHODS Five handout materials obtained from the American Rhinologic Society (ARS) and the American Academy of Facial Plastic and Reconstructive Surgery websites were assessed using validated readability metrics. The handouts were inputted into OpenAI's ChatGPT-4 with the prompt: "Rewrite the following at a 6th-grade reading level." The understandability and actionability of both native and LLM-revised versions were evaluated using the Patient Education Materials Assessment Tool (PEMAT). Results were compared using Wilcoxon rank-sum tests. RESULTS The mean readability scores of the standard (ARS, American Academy of Facial Plastic and Reconstructive Surgery) materials corresponded to "difficult," with reading categories ranging between high school and university grade levels. Conversely, the LLM-revised handouts had an average seventh-grade reading level. LLM-revised handouts had better readability on nearly all metrics tested: Flesch-Kincaid Reading Ease (70.8 vs 43.9; P < .05), Gunning Fog Score (10.2 vs 14.42; P < .05), Simple Measure of Gobbledygook (9.9 vs 13.1; P < .05), Coleman-Liau (8.8 vs 12.6; P < .05), and Automated Readability Index (8.2 vs 10.7; P = .06). PEMAT scores were significantly higher in the LLM-revised handouts for understandability (91 vs 74%; P < .05), with similar actionability (42 vs 34%; P = .15), when compared to the standard materials. CONCLUSION Patient-facing handouts can be augmented by ChatGPT with simple prompting to tailor information with improved readability. This study demonstrates the utility of LLMs in rewriting patient handouts; such models may serve as a tool to help optimize education materials. LEVEL OF EVIDENCE Level VI.
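The pre/post comparisons above rely on the Wilcoxon rank-sum test. A minimal SciPy sketch, with hypothetical Flesch Reading Ease scores for the original and revised handouts:

    from scipy.stats import ranksums

    # Hypothetical Flesch Reading Ease scores (higher = easier to read)
    original = [43.1, 40.5, 47.2, 44.8, 42.9]
    revised = [69.5, 72.3, 70.1, 68.7, 73.4]

    stat, p_value = ranksums(revised, original)
    print(f"rank-sum statistic = {stat:.2f}, p = {p_value:.4f}")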
Affiliation(s)
- Austin R Swisher
- Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona, USA
- Arthur W Wu
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Gene C Liu
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Matthew K Lee
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Taylor R Carle
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Dennis M Tang
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
8. Triana Rodriguez GA, Rojas-Rojas MM, Sotomayor K, Ovalle JP, Cardona Ortegón JD. Generative Artificial Intelligence: A Promising Instrument for Daily Living and Clinical Practice. J Am Coll Radiol 2024; 21:1158-1159. [PMID: 38237881] [DOI: 10.1016/j.jacr.2023.12.030]
Affiliation(s)
- María M Rojas-Rojas
- Radiologist, Centro Hospitalario Serena del Mar, Cartagena de Indias, Provincia de Cartagena, Bolívar
- Katherine Sotomayor
- Radiologist, Centro Hospitalario Serena del Mar, Cartagena de Indias, Provincia de Cartagena, Bolívar
- Juan P Ovalle
- Radiologist, Centro Hospitalario Serena del Mar, Cartagena de Indias, Provincia de Cartagena, Bolívar
9. Gumilar KE, Indraprasta BR, Hsu YC, Yu ZY, Chen H, Irawan B, Tambunan Z, Wibowo BM, Nugroho H, Tjokroprawiro BA, Dachlan EG, Mulawardhana P, Rahestyningtyas E, Pramuditya H, Putra VGE, Waluyo ST, Tan NR, Folarin R, Ibrahim IH, Lin CH, Hung TY, Lu TF, Chen YF, Shih YH, Wang SJ, Huang J, Yates CC, Lu CH, Liao LN, Tan M. Disparities in medical recommendations from AI-based chatbots across different countries/regions. Sci Rep 2024; 14:17052. [PMID: 39048640] [PMCID: PMC11269683] [DOI: 10.1038/s41598-024-67689-0]
Abstract
This study explores disparities and opportunities in healthcare information provided by AI chatbots. We focused on recommendations for adjuvant therapy in endometrial cancer, analyzing responses across four regions (Indonesia, Nigeria, Taiwan, USA) and three platforms (Bard, Bing, ChatGPT-3.5). Utilizing previously published cases, we asked identical questions to chatbots from each location within a 24-h window. Responses were evaluated in a double-blinded manner on relevance, clarity, depth, focus, and coherence by ten experts in endometrial cancer. Our analysis revealed significant variations across different countries/regions (p < 0.001). Interestingly, Bing's responses in Nigeria consistently outperformed others (p < 0.05), excelling in all evaluation criteria (p < 0.001). Bard also performed better in Nigeria compared to other regions (p < 0.05), consistently surpassing them across all categories (p < 0.001, with relevance reaching p < 0.01). Notably, Bard's overall scores were significantly higher than those of ChatGPT-3.5 and Bing in all locations (p < 0.001). These findings highlight disparities and opportunities in the quality of AI-powered healthcare information based on user location and platform. This emphasizes the necessity for more research and development to guarantee equal access to trustworthy medical information through AI technologies.
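The cross-region comparison described above is a multi-group test on ordinal expert ratings, for which the Kruskal-Wallis H-test is a standard choice (the paper's exact test is not stated here; this is a sketch, and the per-region ratings below are hypothetical):

    from scipy.stats import kruskal

    # Hypothetical expert ratings (1-10) of chatbot responses per region
    indonesia = [6, 7, 5, 6, 7, 6, 5, 6, 7, 6]
    nigeria = [8, 9, 8, 9, 8, 9, 9, 8, 8, 9]
    taiwan = [6, 6, 7, 5, 6, 7, 6, 6, 5, 7]
    usa = [7, 6, 7, 7, 6, 7, 6, 7, 7, 6]

    h_stat, p_value = kruskal(indonesia, nigeria, taiwan, usa)
    print(f"H = {h_stat:.2f}, p = {p_value:.4f}")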
Affiliation(s)
- Khanisyah E Gumilar
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan.
- Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia.
- Birama R Indraprasta
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Yu-Cheng Hsu
- Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Zih-Ying Yu
- Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC
- Hong Chen
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Budi Irawan
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Zulkarnain Tambunan
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Bagus M Wibowo
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Hari Nugroho
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Brahmana A Tjokroprawiro
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Erry G Dachlan
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Pungky Mulawardhana
- Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia
- Eccita Rahestyningtyas
- Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia
- Herlangga Pramuditya
- Department of Obstetrics and Gynecology, Dr. Ramelan Naval Hospital, Surabaya, Indonesia
- Very Great E Putra
- Department of Obstetrics and Gynecology, Dr. Kariadi Central General Hospital, Semarang, Indonesia
- Setyo T Waluyo
- Department of Obstetrics and Gynecology, Ulin General Hospital, Banjarmasin, Indonesia
- Nathan R Tan
- Department of Modern and Classical Languages and Literature, University of South Alabama, Mobile, AL, USA
- Royhaan Folarin
- Department of Anatomy, Faculty of Basic Medical Sciences, Olabisi Onabanjo University, Sagamu, Nigeria
- Ibrahim H Ibrahim
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Cheng-Han Lin
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Tai-Yu Hung
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Ting-Fang Lu
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC
- Yen-Fu Chen
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC
- Yu-Hsiang Shih
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC
- Shao-Jing Wang
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC
- Jingshan Huang
- School of Computing and College of Medicine, University of South Alabama, Mobile, AL, USA
- Clayton C Yates
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Chien-Hsing Lu
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC.
- Li-Na Liao
- Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC.
- Ming Tan
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan.
- Institute of Biochemistry and Molecular Biology, Graduate Institute of Biomedical Sciences, China Medical University (Taiwan), No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC.
10. Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024; 105:251-265. [PMID: 38679540] [DOI: 10.1016/j.diii.2024.04.003]
Abstract
PURPOSE The purpose of this study was to systematically review the reported performance of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications. MATERIALS AND METHODS After a comprehensive review of the PubMed, Web of Science, Embase, and Google Scholar databases, a cohort of published studies utilizing ChatGPT for clinical radiology applications was identified up to January 1, 2024. RESULTS Of the 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported quantitative measures of ChatGPT's performance. Among these, 19 (19/24; 79.2%) recorded a median accuracy of 70.5%, and five (5/24; 20.8%) recorded a median agreement of 83.6% between ChatGPT outcomes and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), GPT-4 outperformed GPT-3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks. CONCLUSION Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.
Affiliation(s)
- Pedram Keshavarz
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; School of Science and Technology, The University of Georgia, Tbilisi 0171, Georgia
- Sara Bagherieh
- Independent Clinical Radiology Researcher, Los Angeles, CA 90024, USA
- Hamid Chalian
- Department of Radiology, Cardiothoracic Imaging, University of Washington, Seattle, WA 98195, USA
- Amir Ali Rahsepar
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Grace Hyun J Kim
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Radiological Sciences, Center for Computer Vision and Imaging Biomarkers, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Cameron Hassani
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Steven S Raman
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Arash Bedayat
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA.
11. Cardona Ortegón JD, Serrano S, Romero Cortes D. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024; 86:e22. [PMID: 38644147] [DOI: 10.1016/j.eururo.2024.02.033]
Affiliation(s)
- José David Cardona Ortegón
- Department of Diagnostic Imaging, Fundación Santa Fe de Bogotá, Bogotá, Colombia; School of Medicine, El Bosque University, Bogotá, Colombia.
- Samuel Serrano
- School of Medicine, El Bosque University, Bogotá, Colombia; Department of Urology, El Bosque University, Bogotá, Colombia
- Daniel Romero Cortes
- School of Medicine, El Bosque University, Bogotá, Colombia; Department of Urology, El Bosque University, Bogotá, Colombia
12. Ye H. Other possible perspectives for solving the negative outcome penalty paradox in the application of artificial intelligence in clinical diagnostics. J Med Ethics 2024:jme-2024-109968. [PMID: 38871400] [DOI: 10.1136/jme-2024-109968]
Abstract
Artificial intelligence (AI), represented by machine learning, artificial neural networks, and deep learning, is affecting all areas of medicine, including translational research (from bench to bedside to health policy), clinical medicine (including diagnosis, treatment, prognosis, and healthcare resource allocation), and public health. At a time when almost everyone is focused on how to better realise the promise of AI to transform the entire healthcare system, Dr Appel calls for public attention to AI in medicine and the negative outcome penalty paradox. Raising this topic has deepened our thinking about the application of AI in clinical diagnostics and prompted us to look for ways to integrate AI more effectively into future clinical practice. In addition to Dr Appel's insightful advice, I hope to offer three other possible perspectives, namely changing public perceptions, re-engineering clinical practice processes, and introducing more stakeholders, to further the discussion on this topic.
Affiliation(s)
- Hongnan Ye
- Beijing Alumni Association of China Medical University, Beijing, China
13. Marti-Aguado D, Pazó J, Diaz-Gonzalez A, de Las Heras Páez de la Cadena B, Conthe A, Gallego Duran R, Rodríguez-Gandía MA, Turnes J, Romero-Gomez M. LiverAI: New tool in the landscape for liver health. Gastroenterol Hepatol 2024; 47:646-648. [PMID: 38582150] [DOI: 10.1016/j.gastrohep.2024.04.001]
Affiliation(s)
- David Marti-Aguado
- Digestive Disease Department, Clinic University Hospital, INCLIVA Health Research Institute, Valencia, Spain.
- Javier Pazó
- AI and IT Solutions Manager, Spanish Association for the Study of the Liver (AEEH), Spain
- Alvaro Diaz-Gonzalez
- Gastroenterology and Hepatology Department, Clinical and Translational Research in Digestive Diseases Group, Valdecilla Research Institute (IDIVAL), Marqués de Valdecilla University Hospital, Santander, Spain
- Andres Conthe
- Department of Gastroenterology and Hepatology, Hospital General Universitario Gregorio Marañón, Madrid, Spain
- Rocio Gallego Duran
- Digestive Diseases Unit and CIBERehd, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville (HUVR/CSIC/US), University of Seville, Seville, Spain
- Miguel A Rodríguez-Gandía
- Department of Gastroenterology and Hepatology, Hospital Universitario Ramón y Cajal, Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain
- Juan Turnes
- Department of Gastroenterology and Hepatology, Complejo Hospitalario Universitario Pontevedra & IIS Galicia Sur, Spain
- Manuel Romero-Gomez
- Digestive Diseases Unit and CIBERehd, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville (HUVR/CSIC/US), University of Seville, Seville, Spain
14. Tu W, Joe BN. The Era of ChatGPT and Large Language Models: Can We Advance Patient-centered Communications Appropriately and Safely? Radiol Imaging Cancer 2024; 6:e240038. [PMID: 38668641] [PMCID: PMC11148828] [DOI: 10.1148/rycan.240038]
Affiliation(s)
- Wendy Tu
- From the Department of Medical Imaging, University of Alberta, 116th St & 85th Ave, Edmonton, AB, Canada T6G 2R3; and Department of Radiology and Biomedical Imaging, University of California at San Francisco, San Francisco, Calif
- Bonnie N. Joe
- From the Department of Medical Imaging, University of Alberta, 116th St & 85th Ave, Edmonton, AB, Canada T6G 2R3; and Department of Radiology and Biomedical Imaging, University of California at San Francisco, San Francisco, Calif
15. Kaba E, Hürsoy N, Solak M, Çeliker FB. Accuracy of Large Language Models in Thyroid Nodule-Related Questions Based on the Korean Thyroid Imaging Reporting and Data System (K-TIRADS). Korean J Radiol 2024; 25:499-500. [PMID: 38685738] [PMCID: PMC11058430] [DOI: 10.3348/kjr.2024.0229]
Affiliation(s)
- Esat Kaba
- Department of Radiology, Recep Tayyip Erdogan University, Rize, Turkey.
- Nur Hürsoy
- Department of Radiology, Recep Tayyip Erdogan University, Rize, Turkey
- Merve Solak
- Department of Radiology, Recep Tayyip Erdogan University, Rize, Turkey
16. Humbsch P, Horn E, Bohm K, Gintrowicz R. [ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models: Between hype and reality around artificial intelligence in medical use]. Die Anaesthesiologie 2024; 73:324-335. [PMID: 38691128] [PMCID: PMC11076380] [DOI: 10.1007/s00101-024-01403-7]
Abstract
BACKGROUND The use of AI language models in education and academia is currently a subject of research, and applications in clinical settings are also being tested. Studies by various research groups have demonstrated that language models can answer medical board examination questions, and there are potential applications of these models in medical education as well. RESEARCH QUESTION This study investigates the extent to which current language models prove effective for addressing medical inquiries, their potential utility in medical education, and the challenges that still exist in the functioning of AI language models. METHODS ChatGPT, based on GPT-3.5, had to answer 1025 questions from the second part (M2) of the medical board examination. The study examined whether errors occurred and, if so, what types. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and the supplementary qualification in emergency medicine; these essays were then analyzed and checked for errors and anomalies. RESULTS ChatGPT was able to answer the questions correctly with an accuracy rate exceeding 69%, even when the questions included references to visual aids. This represented an improvement in accuracy on board examination questions compared with a study conducted in March; when it came to generating essays, however, a high error rate was observed. DISCUSSION Considering the current pace of improvement in AI language models, widespread clinical implementation, especially in emergency departments as well as emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights that support medical professionals in their work, as long as clinicians do not rely solely on the language model. Although the use of these models in education holds promise, it currently requires a significant amount of supervision. Owing to hallucinations caused by inadequate training environments for the language model, the generated texts might deviate from the current state of scientific knowledge. Direct deployment in patient care settings without permanent physician supervision does not yet appear to be achievable.
Affiliation(s)
- Philipp Humbsch
- Pépinière INP gGmbH, Frankfurt (Oder), Germany.
- Department of Anesthesiology, Naemi-Wilke-Stift, Guben, Germany.
- Department of Anesthesiology, Intensive Care Medicine and Perioperative Pain Therapy, Helios Klinikum Bad Saarow, Pieskower Straße 33, 15526 Bad Saarow, Germany.
- Institute of Health and Nursing Science, Charité Berlin, Berlin, Germany.
- Evelyn Horn
- Department of Anesthesiology, Naemi-Wilke-Stift, Guben, Germany
- Konrad Bohm
- Pépinière INP gGmbH, Frankfurt (Oder), Germany
- Robert Gintrowicz
- Department of Anesthesiology with a focus on Operative Intensive Care Medicine, and Dean's Office for Study and Teaching, Charité Universitätsmedizin, Berlin, Germany
17. Doğan L, Özçakmakcı GB, Yılmaz İE. The Performance of Chatbots and the AAPOS Website as a Tool for Amblyopia Education. J Pediatr Ophthalmol Strabismus 2024:1-7. [PMID: 38661309] [DOI: 10.3928/01913913-20240409-01]
Abstract
PURPOSE To evaluate the understandability, actionability, and readability of responses provided by the website of the American Association for Pediatric Ophthalmology and Strabismus (AAPOS), ChatGPT-3.5, Bard, and Bing Chat about amblyopia, and the appropriateness of the responses generated by the chatbots. METHODS Twenty-five questions provided by the AAPOS website were directed three times each to fresh ChatGPT-3.5, Bard, and Bing Chat interfaces. Two experienced pediatric ophthalmologists categorized the chatbots' responses in terms of their appropriateness. The Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index (CLI) were used to evaluate the readability of the responses of the AAPOS website and the chatbots. Furthermore, understandability scores were evaluated using the Patient Education Materials Assessment Tool (PEMAT). RESULTS The appropriateness of the chatbots' responses was 84.0% for ChatGPT-3.5 and Bard and 80% for Bing Chat (P > .05). For understandability (mean PEMAT-U score: AAPOS website 81.5%, Bard 77.6%, ChatGPT-3.5 76.1%, and Bing Chat 71.5%; P < .05) and actionability (mean PEMAT-A score: AAPOS website 74.6%, Bard 69.2%, ChatGPT-3.5 67.8%, and Bing Chat 64.8%; P < .05), the AAPOS website scored better than the chatbots. Three readability analyses showed that Bard had the highest mean score, followed by the AAPOS website, Bing Chat, and ChatGPT-3.5, and these scores were more challenging than the recommended level. CONCLUSIONS Chatbots have the potential to provide detailed and appropriate responses at acceptable levels. The AAPOS website has the advantage of providing information that is more understandable and actionable. The AAPOS website and the chatbots, especially ChatGPT, provided difficult-to-read material for patient education regarding amblyopia. [J Pediatr Ophthalmol Strabismus. 20XX;X(X):XXX-XXX.].
18. Till T, Tschauner S, Singer G, Lichtenegger K, Till H. Development and optimization of AI algorithms for wrist fracture detection in children using a freely available dataset. Front Pediatr 2023; 11:1291804. [PMID: 38188914] [PMCID: PMC10768054] [DOI: 10.3389/fped.2023.1291804]
Abstract
Introduction In the field of pediatric trauma, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems have emerged, offering a promising avenue for improved patient care. Children with wrist fractures may particularly benefit from machine learning (ML) solutions, since some of these lesions may be overlooked on conventional X-ray due to minimal compression without dislocation, or mistaken for cartilaginous growth plates. In this article, we describe the development and optimization of AI algorithms for wrist fracture detection in children. Methods A team of IT specialists, pediatric radiologists, and pediatric surgeons used the freely available GRAZPEDWRI-DX dataset containing annotated pediatric trauma wrist radiographs of 6,091 patients, comprising 10,643 studies (20,327 images). First, a basic object detection model, a You Only Look Once object detector of the seventh generation (YOLOv7), was trained and tested on these data. Then, team decisions were taken to adjust data preparation, the image sizes used for training and testing, and the configuration of the detection model. Furthermore, we investigated each of these models using an Explainable Artificial Intelligence (XAI) method called Gradient Class Activation Mapping (Grad-CAM), which uses saliency maps to visualize where a model directs its attention before classifying and regressing a certain class. Results Mean average precision (mAP) improved when applying optimizations to the pre-processing of the dataset images (maximum increases of +25.51% mAP@0.5 and +39.78% mAP@[0.5:0.95]), as well as to the object detection model itself (maximum increases of +13.36% mAP@0.5 and +27.01% mAP@[0.5:0.95]). Generally, when analyzed with XAI methods, model variations that scored higher in terms of mAP paid attention to broader regions of the image, prioritizing detection accuracy over precision compared with the less accurate models. Discussion This paper supports the implementation of ML solutions for pediatric trauma care. Optimization of a large X-ray dataset and the YOLOv7 model improves the model's ability to detect objects and provide valid diagnostic support to health care specialists. Such optimization protocols must be understood and advocated before comparing ML performance against health care specialists.
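The mAP@0.5 and mAP@[0.5:0.95] figures above are built on intersection-over-union (IoU) matching between predicted and ground-truth boxes. A minimal sketch of the underlying IoU computation (boxes assumed to be in [x1, y1, x2, y2] pixel coordinates; not the study's code):

    def iou(box_a, box_b):
        # Boxes as [x1, y1, x2, y2]; compute the overlap rectangle first
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    # At mAP@0.5 a prediction counts as a true positive when IoU >= 0.5
    print(iou([10, 10, 50, 50], [20, 20, 60, 60]))  # ~0.39, below the threshold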
Affiliation(s)
- Tristan Till
- Department of Applied Computer Sciences, FH JOANNEUM - University of Applied Sciences, Graz, Austria
- Division of Pediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
- Sebastian Tschauner
- Division of Pediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
- Georg Singer
- Department of Pediatric and Adolescent Surgery, Medical University of Graz, Graz, Austria
- Klaus Lichtenegger
- Department of Applied Computer Sciences, FH JOANNEUM - University of Applied Sciences, Graz, Austria
- Holger Till
- Department of Pediatric and Adolescent Surgery, Medical University of Graz, Graz, Austria
19. Coraci D, Maccarone MC, Regazzo G, Accordi G, Papathanasiou JV, Masiero S. ChatGPT in the development of medical questionnaires. The example of the low back pain. Eur J Transl Myol 2023; 33:12114. [PMID: 38112605] [PMCID: PMC10811646] [DOI: 10.4081/ejtm.2023.12114]
Abstract
Over the last year, Chat Generative Pre-Trained Transformer (ChatGPT), a web-based software application built on artificial intelligence, has shown high potential in every field of knowledge. In medicine, its possible applications are the subject of many studies with promising results. We performed the current study to investigate the possible usefulness of ChatGPT in assessing low back pain. We asked ChatGPT to generate a questionnaire about this clinical condition, and we compared the questions and the results obtained with those from validated questionnaires: the Oswestry Disability Index, the Quebec Back Pain Disability Scale, the Roland-Morris Disability Questionnaire, and the Numeric Rating Scale for pain. We enrolled 20 subjects with low back pain and found substantial consistency among the validated questionnaires. The ChatGPT questionnaire showed an acceptable, significant correlation only with the Oswestry Disability Index and the Quebec Back Pain Disability Scale. ChatGPT showed some peculiarities, especially in the assessment of quality of life and of medical consultation and treatments. Our study shows that ChatGPT can help evaluate patients, including from multiple perspectives. However, its power is limited, and further research and validation are required.
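Agreement between a new questionnaire and validated instruments is commonly checked with rank correlation. A minimal SciPy sketch, using hypothetical total scores (the study enrolled 20 subjects; only a few example values are shown here):

    from scipy.stats import spearmanr

    # Hypothetical total scores per subject on two questionnaires
    chatgpt_scores = [12, 25, 18, 30, 22, 15, 28, 20]
    oswestry_scores = [14, 27, 16, 33, 21, 17, 30, 19]

    rho, p_value = spearmanr(chatgpt_scores, oswestry_scores)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")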
Affiliation(s)
- Daniele Coraci
- Department of Neuroscience, Section of Rehabilitation, University of Padova, Padua.
- Gianluca Regazzo
- Department of Neuroscience, Section of Rehabilitation, University of Padova, Padua.
- Giorgia Accordi
- Department of Neuroscience, Section of Rehabilitation, University of Padova, Padua.
- Jannis V Papathanasiou
- Department of Kinesiotherapy, Faculty of Public Health, Medical University of Sofia, Sofia, Bulgaria; Department of Medical Imaging, Allergology and Physiotherapy, Faculty of Dental Medicine, Medical University of Plovdiv, Plovdiv.
- Stefano Masiero
- Department of Neuroscience, Section of Rehabilitation, University of Padova, Padua.