1. Cardona Ortegón JD, Serrano S, Romero Cortes D. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024;86:e22. PMID: 38644147. DOI: 10.1016/j.eururo.2024.02.033.
Affiliation(s)
- José David Cardona Ortegón: Department of Diagnostic Imaging, Fundación Santa Fe de Bogotá, Bogotá, Colombia; School of Medicine, El Bosque University, Bogotá, Colombia
- Samuel Serrano: School of Medicine, El Bosque University, Bogotá, Colombia; Department of Urology, El Bosque University, Bogotá, Colombia
- Daniel Romero Cortes: School of Medicine, El Bosque University, Bogotá, Colombia; Department of Urology, El Bosque University, Bogotá, Colombia
2. Pozzi E, Velasquez DA, Varnum AA, Kava BR, Ramasamy R. Artificial Intelligence Modeling and Priapism. Curr Urol Rep 2024. PMID: 38886246. DOI: 10.1007/s11934-024-01221-9.
Abstract
PURPOSE OF REVIEW: This narrative review outlines the current evidence, challenges, and future perspectives of artificial intelligence (AI) in the diagnosis and management of priapism, a condition marked by prolonged and often painful erections that presents unique diagnostic and therapeutic challenges.
RECENT FINDINGS: Recent advancements in AI offer promising solutions to the challenges of diagnosing and treating priapism. AI models have demonstrated the potential to predict the need for surgical intervention and to improve diagnostic accuracy, and integrating AI models into medical decision-making for priapism can also help predict long-term consequences. AI is currently being implemented in urology to enhance diagnostics and treatment work-up for various conditions, including priapism. Traditional diagnostic approaches rely heavily on history-based assessment, leading to potential delays in treatment with possible long-term sequelae. To date, the role of AI in the management of priapism remains understudied, and dependable, effective models that can reliably assist physicians in diagnostic and treatment decisions have yet to be achieved.
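To make the kind of model the review describes concrete, here is a minimal, hypothetical sketch of a classifier predicting the need for surgical intervention from presentation features. The features, data, and case values are invented for illustration and are not drawn from any study in the review.

```python
# Hypothetical sketch of an AI model predicting need for surgical
# intervention in priapism; all features and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy features: [hours since onset, pain score 0-10, prior episodes]
X = np.array([[2, 3, 0], [6, 5, 1], [12, 7, 0], [24, 9, 2],
              [36, 9, 1], [4, 4, 0], [48, 10, 3], [8, 6, 1]])
y = np.array([0, 0, 0, 1, 1, 0, 1, 0])  # 1 = surgical intervention needed

model = LogisticRegression().fit(X, y)
# Predicted risk of needing surgery for a new (invented) presentation
print(model.predict_proba([[30, 8, 1]])[0, 1])
```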
Affiliation(s)
- Edoardo Pozzi: Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA; University Vita-Salute San Raffaele, Milan, Italy; Division of Experimental Oncology, Unit of Urology, URI, IRCCS Ospedale San Raffaele, Milan, Italy
- David A Velasquez: Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Alexandra Aponte Varnum: Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Bruce R Kava: Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
- Ranjith Ramasamy: Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA
3. Puerto Nino AK, Garcia Perez V, Secco S, De Nunzio C, Lombardo R, Tikkinen KAO, Elterman DS. Can ChatGPT provide high-quality patient information on male lower urinary tract symptoms suggestive of benign prostate enlargement? Prostate Cancer Prostatic Dis 2024. PMID: 38871841. DOI: 10.1038/s41391-024-00847-7.
Abstract
BACKGROUND: ChatGPT has recently emerged as a novel resource for patients' disease-specific inquiries, but there is limited evidence assessing the quality of the information it provides. We evaluated the accuracy and quality of ChatGPT's responses on male lower urinary tract symptoms (LUTS) suggestive of benign prostate enlargement (BPE) against two reference resources.
METHODS: Using patient information websites from the European Association of Urology and the American Urological Association as reference material, we formulated 88 BPE-centric questions for ChatGPT 4.0+. Independently and in duplicate, we compared ChatGPT's responses with the reference material, calculating accuracy through F1 score, precision, and recall metrics. We used a 5-point Likert scale for quality rating, evaluated examiner agreement using the intraclass correlation coefficient (ICC), and assessed the difference in quality scores with the Wilcoxon signed-rank test.
RESULTS: ChatGPT addressed all 88 LUTS/BPE-related questions. Across the 88 questions, the F1 score was 0.79 (range 0-1), precision 0.66 (range 0-1), recall 0.97 (range 0-1), and the median quality score was 4 (range 1-5). Examiners showed good agreement (ICC = 0.86). We found no statistically significant difference between the examiners' scores for the overall quality of the responses (p = 0.72).
DISCUSSION: ChatGPT demonstrated potential utility in educating patients about BPE/LUTS, its prognosis, and treatment in a way that could support decision-making, but prudence is warranted before recommending it as a sole information source. Additional studies are needed to understand the full extent of AI's efficacy in delivering patient education in urology.
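As a reference point for the metrics named in the methods, here is a minimal Python sketch; the annotation scheme, counts, and ratings below are hypothetical assumptions, not the authors' data or code.

```python
# Minimal sketch of the reported metrics. Assumes each ChatGPT answer was
# annotated against the reference material with counts of reference facts
# covered (TP), unsupported claims (FP), and reference facts missed (FN).
from scipy.stats import wilcoxon

def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one annotated question."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(prf1(tp=8, fp=4, fn=1))  # e.g. high recall, lower precision

# Hypothetical paired 5-point Likert quality ratings from two examiners;
# the Wilcoxon signed-rank test checks for a systematic difference.
examiner_a = [4, 5, 3, 4, 4, 5, 2, 4]
examiner_b = [4, 4, 3, 5, 4, 5, 3, 4]
print(wilcoxon(examiner_a, examiner_b))
```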
Affiliation(s)
- Angie K Puerto Nino: Faculty of Medicine, University of Helsinki, Helsinki, Finland; Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Silvia Secco: Department of Urology, Niguarda Hospital, Milan, Italy
- Cosimo De Nunzio: Urology Unit, Ospedale Sant'Andrea, La Sapienza University of Rome, Rome, Italy
- Riccardo Lombardo: Urology Unit, Ospedale Sant'Andrea, La Sapienza University of Rome, Rome, Italy
- Kari A O Tikkinen: Faculty of Medicine, University of Helsinki, Helsinki, Finland; Department of Urology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland; Department of Surgery, South Karelian Central Hospital, Lappeenranta, Finland; Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada
- Dean S Elterman: Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
4. Kıyak YS, Emekli E. ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: a literature review. Postgrad Med J 2024. PMID: 38840505. DOI: 10.1093/postmj/qgae065.
Abstract
ChatGPT's role in creating multiple-choice questions (MCQs) is growing, but the validity of these AI-generated questions is unclear. This literature review addresses the urgent need to understand the application of ChatGPT in generating MCQs for medical education. After a database search and screening of 1920 studies, we found 23 relevant studies. We extracted the prompts used for MCQ generation and assessed the validity evidence of the resulting MCQs. The findings showed that prompts varied, including referencing specific exam styles and adopting specific personas, tactics that align with recommended prompt engineering practice. The validity evidence covered various domains and showed mixed accuracy rates: some studies indicated quality comparable to human-written questions, while others highlighted differences in difficulty and discrimination, alongside a significant reduction in question creation time. Despite this efficiency, we highlight the necessity of careful review and the need for further research to optimize the use of ChatGPT in question generation. A sketch of these prompt tactics follows the main messages below.
Main messages:
- Ensure high-quality outputs by using well-designed prompts; medical educators should prioritize detailed, clear ChatGPT prompts when generating MCQs.
- Avoid using ChatGPT-generated MCQs directly in examinations without thorough review, to prevent inaccuracies and ensure relevance.
- Leverage ChatGPT's potential to streamline the test development process, enhancing efficiency without compromising quality.
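As a concrete illustration of the two tactics the review found (exam-style reference and persona adoption), here is a minimal sketch using the OpenAI Python SDK. The prompt wording, topic, and model name are illustrative assumptions, not prompts taken from the reviewed studies, and any generated item would still need the expert review the authors call for.

```python
# Illustrative persona + exam-style MCQ-generation prompt via the OpenAI SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "You are an experienced medical educator writing USMLE-style items. "  # persona
    "Write one single-best-answer multiple-choice question on the "
    "management of acute urinary retention, with a clinical vignette stem, "
    "five options (A-E), the correct answer, and a brief explanation of "
    "why each distractor is wrong."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```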
Affiliation(s)
- Yavuz Selim Kıyak: Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara 06500, Turkey
- Emre Emekli: Department of Radiology, Faculty of Medicine, Eskişehir Osmangazi University, Eskişehir 26040, Turkey
5. Hershenhouse JS, Mokhtar D, Eppler MB, Rodler S, Storino Ramacciotti L, Ganjavi C, Hom B, Davis RJ, Tran J, Russo GI, Cocci A, Abreu A, Gill I, Desai M, Cacciamani GE. Accuracy, readability, and understandability of large language models for prostate cancer information to the public. Prostate Cancer Prostatic Dis 2024. PMID: 38744934. DOI: 10.1038/s41391-024-00826-y.
Abstract
BACKGROUND: Generative pretrained transformer (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions, but to date no study has assessed the quality of ChatGPT outputs to prostate cancer-related questions from both the physician and public perspectives while optimizing outputs for patient consumption.
METHODS: Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5 and the responses were recorded. The responses were then re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey among urology providers (urologists and urologists in training) rated the original ChatGPT responses for accuracy, completeness, and clarity on a 5-point Likert scale. Two independent reviewers evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk); participants rated clarity and demonstrated their understanding through a multiple-choice question.
RESULTS: GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across the 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Readability of the layperson summaries was better than that of the original GPT outputs (original v. simplified ChatGPT, mean (SD): Flesch Reading Ease 36.5 (9.1) v. 70.2 (11.2), p < 0.0001; Gunning Fog 15.8 (1.7) v. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level 12.8 (1.2) v. 7.4 (1.7), p < 0.0001; Coleman-Liau 13.7 (2.1) v. 8.6 (2.4), p = 0.0002; SMOG Index 11.8 (1.2) v. 6.7 (1.8), p < 0.0001; Automated Readability Index 13.1 (1.4) v. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%).
CONCLUSION: GPT shows promise for correct patient education on prostate cancer-related content, but the technology is not designed for delivering patient information. Prompting the model to respond with accuracy, completeness, clarity, and readability may enhance its utility in GPT-powered medical chatbots.
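For illustration, here is a sketch of this kind of readability scoring using the Python textstat package, one common implementation of the validated indices named above; the sample texts are invented, and the study does not state which software it used.

```python
# Scoring an original and a simplified text on the six readability indices
# reported in the results. Sample texts are illustrative only.
import textstat

original = ("Prostate-specific antigen screening should be individualized "
            "based on patient comorbidities and estimated life expectancy.")
simplified = ("A PSA test is a simple blood test. You and your doctor can "
              "decide together if it is right for you.")

for label, text in [("original", original), ("simplified", simplified)]:
    print(label,
          textstat.flesch_reading_ease(text),       # higher = easier
          textstat.gunning_fog(text),               # lower = easier
          textstat.flesch_kincaid_grade(text),
          textstat.coleman_liau_index(text),
          textstat.smog_index(text),
          textstat.automated_readability_index(text))
```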
Affiliation(s)
- Jacob S Hershenhouse: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Daniel Mokhtar: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Michael B Eppler: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Severin Rodler: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Lorenzo Storino Ramacciotti: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Conner Ganjavi: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Brian Hom: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Ryan J Davis: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- John Tran: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Andrea Cocci: Urology Section, University of Florence, Florence, Italy
- Andre Abreu: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir Gill: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Mihir Desai: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Giovanni E Cacciamani: USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
6. Tsai CY, Hsieh SJ, Huang HH, Deng JH, Huang YY, Cheng PY. Performance of ChatGPT on the Taiwan urology board examination: insights into current strengths and shortcomings. World J Urol 2024;42:250. PMID: 38652322. DOI: 10.1007/s00345-024-04957-8.
Abstract
PURPOSE: To compare the performance of ChatGPT-4 and ChatGPT-3.5 on the Taiwan urology board examination (TUBE), focusing on answer accuracy, explanation consistency, and uncertainty-management tactics to minimize score penalties from incorrect responses across 12 urology domains.
METHODS: 450 multiple-choice questions from the TUBE (2020-2022) were presented to the two models. Three urologists assessed the correctness and consistency of each response. Accuracy quantifies the proportion of correct answers; consistency assesses the logic and coherence of explanations out of total responses. A penalty-reduction experiment with prompt variations was also conducted, and univariate logistic regression was applied for subgroup comparisons.
RESULTS: ChatGPT-4 showed strengths in urology, achieving an overall accuracy of 57.8%, with annual accuracies of 64.7% (2020), 58.0% (2021), and 50.7% (2022), significantly surpassing ChatGPT-3.5 (33.8%; OR = 2.68, 95% CI 2.05-3.52). It could have passed the TUBE written exams on accuracy alone but failed on the final score because of penalties, and it displayed a declining accuracy trend over time. Accuracy varied across the 12 urological domains, with more frequently updated knowledge domains showing lower accuracy (53.2% vs. 62.2%, OR = 0.69, p = 0.05). A high consistency rate of 91.6% in explanations across all domains indicates reliable delivery of coherent and logical information. A simple prompt outperformed strategy-based prompts in accuracy (60% vs. 40%, p = 0.016), highlighting ChatGPT's inability to accurately self-assess uncertainty and its tendency toward overconfidence, which may hinder medical decision-making.
CONCLUSIONS: ChatGPT-4's high accuracy and consistent explanations on a urology board examination demonstrate its potential in medical information processing. However, its limitations in self-assessment and its overconfidence necessitate caution in application, especially for inexperienced users. These insights call for ongoing advancement of urology-specific AI tools.
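To make the reported model comparison concrete, here is a minimal sketch of a univariate logistic regression yielding an odds ratio with a 95% CI, with per-question correctness simulated at roughly the reported accuracies; this is not the authors' code or data.

```python
# Univariate logistic regression: model version (GPT-4 vs. GPT-3.5)
# predicting whether a question was answered correctly. Data simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 450  # questions per model, as in the study
is_gpt4 = np.repeat([1, 0], n)  # 1 = ChatGPT-4, 0 = ChatGPT-3.5
p_correct = np.where(is_gpt4 == 1, 0.578, 0.338)  # reported accuracies
correct = rng.binomial(1, p_correct)

X = sm.add_constant(is_gpt4)
fit = sm.Logit(correct, X).fit(disp=0)
or_ = np.exp(fit.params[1])
ci = np.exp(fit.conf_int()[1])
print(f"OR = {or_:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```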
Affiliation(s)
- Chung-You Tsai: Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan; Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
- Shang-Ju Hsieh: Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
- Hung-Hsiang Huang: Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
- Juinn-Horng Deng: Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
- Yi-You Huang: Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
- Pai-Yu Cheng: Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan; Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
7. Chao-Yang, Bao YY, Yang YY, Mao CK. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024:S0302-2838(24)02309-1. PMID: 38644140. DOI: 10.1016/j.eururo.2024.02.029.
Affiliation(s)
- Chao-Yang: Department of Urology, Anhui Provincial Children's Hospital, Hefei, China
- Yuan-Yuan Bao: Department of Electrocardiography, Anhui Maternal and Child Health Hospital, Hefei, China
- Yuan-Yuan Yang: Department of Electrocardiography, Anhui Maternal and Child Health Hospital, Hefei, China
- Chang-Kun Mao: Department of Urology, Anhui Provincial Children's Hospital, Hefei, China
8. Ni Z, Peng R, Zheng X, Xie P. Embracing the future: Integrating ChatGPT into China's nursing education system. Int J Nurs Sci 2024;11:295-299. PMID: 38707690. PMCID: PMC11064564. DOI: 10.1016/j.ijnss.2024.03.006.
Abstract
This article examines the role of ChatGPT within the rapidly evolving field of artificial intelligence, highlighting its significant potential in nursing education. The paper first presents the notable advances ChatGPT has achieved in facilitating interactive learning and providing real-time feedback, along with the academic community's growing interest in the technology. It then summarizes research on ChatGPT's applications in nursing education across various clinical disciplines and scenarios, showcasing its potential for multidisciplinary education and for addressing clinical issues. Comparing the performance of several large language models (LLMs) on China's National Nursing Licensure Examination, we observed that ChatGPT demonstrated a higher accuracy rate than its counterparts, providing a solid foundation for its application in Chinese nursing education and clinical settings. Educational institutions should establish a targeted and effective regulatory framework to leverage ChatGPT in localized nursing education while assuming corresponding responsibilities. Through standardized training for users and adjustments to existing educational assessment methods aimed at preventing potential misuse and abuse, ChatGPT's full potential as an innovative auxiliary tool in China's nursing education system can be realized, in line with the developmental needs of modern teaching methodologies.
Affiliation(s)
- Zhengxin Ni: School of Nursing, Yangzhou University, Yangzhou, China
- Rui Peng: Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Xiaofei Zheng: Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Ping Xie: Department of External Cooperation, Northern Jiangsu People's Hospital, Nanjing, China
9. Pinto VBP, de Azevedo MF, Wroclawski ML, Gentile G, Jesus VLM, de Bessa Junior J, Nahas WC, Sacomani CAR, Sandhu JS, Gomes CM. Conformity of ChatGPT recommendations with the AUA/SUFU guideline on postprostatectomy urinary incontinence. Neurourol Urodyn 2024;43:935-941. PMID: 38451040. DOI: 10.1002/nau.25442.
Abstract
INTRODUCTION: Artificial intelligence (AI) shows immense potential in medicine, and ChatGPT has been used for various purposes in the field. However, it may not match the complexity and nuance of certain medical scenarios. This study evaluates the accuracy of ChatGPT 3.5 and 4 in providing recommendations on the management of postprostatectomy urinary incontinence (PPUI), considering the Incontinence After Prostate Treatment AUA/SUFU Guideline as the best-practice benchmark.
MATERIALS AND METHODS: A set of questions based on the AUA/SUFU Guideline was prepared, comprising 10 conceptual questions and 10 case-based questions. All questions were open-ended and entered into ChatGPT with a request to limit each answer to 200 words, for greater objectivity. Responses were graded as correct (1 point), partially correct (0.5 points), or incorrect (0 points). Performance of ChatGPT versions 3.5 and 4 was analyzed overall and separately for the conceptual and case-based questions.
RESULTS: ChatGPT 3.5 scored 11.5 of 20 points (57.5% accuracy), while ChatGPT 4 scored 18 (90.0%; p = 0.031). On the conceptual questions, ChatGPT 3.5 gave six correct, one partially correct, and three incorrect answers, for a score of 6.5; ChatGPT 4 gave eight correct and two partially correct answers, scoring 9.0. On the case-based questions, ChatGPT 3.5 scored 5.0 and ChatGPT 4 scored 9.0. The domains where ChatGPT performed worst were evaluation, treatment options, surgical complications, and special situations.
CONCLUSION: ChatGPT 4 outperformed ChatGPT 3.5 in providing recommendations for the management of PPUI, using the AUA/SUFU Guideline as a benchmark. Continuous monitoring is essential for evaluating the development and precision of AI-generated medical information.
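For illustration, here is a minimal sketch of the grading arithmetic described in the methods; the per-question grades below are hypothetical placeholders chosen so the total reproduces the reported 57.5% for ChatGPT 3.5, not the authors' actual gradings.

```python
# Grading scheme from the methods: 1 point (correct), 0.5 (partially
# correct), 0 (incorrect); accuracy = points earned / maximum points.
GRADE_POINTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}

def accuracy(grades: list[str]) -> float:
    """Total points earned divided by the maximum possible points."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

# Hypothetical grades for 10 conceptual + 10 case-based questions
conceptual = ["correct"] * 6 + ["partial"] + ["incorrect"] * 3        # 6.5
case_based = ["correct"] * 4 + ["partial"] * 2 + ["incorrect"] * 4    # 5.0
print(f"{accuracy(conceptual + case_based):.1%}")  # 57.5%
```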
Affiliation(s)
- Vicktor B P Pinto: Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
- Matheus F de Azevedo: Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
- Marcelo L Wroclawski: Division of Urology, ABC Medical School, Sao Paulo, Brazil; Department of Urology, Albert Einstein Jewish Hospital, Sao Paulo, Brazil; Department of Urologic Oncology, BP-a Beneficência Portuguesa de São Paulo, Sao Paulo, Brazil
- Guilherme Gentile: Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
- Vinicius L M Jesus: Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
- William C Nahas: Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
- Carlos A R Sacomani: Innovation and Information Technology Sector, AC Camargo Cancer Hospital, Sao Paulo, Brazil
- Jaspreet S Sandhu: Department of Surgery/Urology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Cristiano M Gomes: Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
10. Huang Y, Wu R, He J, Xiang Y. Evaluating ChatGPT-4.0's data analytic proficiency in epidemiological studies: A comparative analysis with SAS, SPSS, and R. J Glob Health 2024;14:04070. PMID: 38547497. PMCID: PMC10978058. DOI: 10.7189/jogh.14.04070.
Abstract
Background: OpenAI's Chat Generative Pre-trained Transformer 4.0 (ChatGPT-4), an emerging artificial intelligence (AI)-based large language model (LLM), has been receiving increasing attention from the medical research community for its innovative 'Data Analyst' feature. We aimed to compare the capabilities of ChatGPT-4 against traditional biostatistical software (SAS, SPSS, R) in the statistical analysis of epidemiological research data.
Methods: We used a data set from the China Health and Nutrition Survey comprising 9317 participants and 29 variables (e.g. gender, age, educational level, marital status, income, occupation, weekly working hours, survival status). Two researchers independently evaluated the data analysis capabilities of ChatGPT-4's 'Data Analyst' feature against SAS, SPSS, and R across three commonly used epidemiological analysis methods: descriptive statistics, intergroup analysis, and correlation analysis. Using an internally developed evaluation scale, we assessed and compared consistency of results, analytical efficiency of coding or operations, user-friendliness, and overall performance across the four tools.
Results: In descriptive statistics, ChatGPT-4 showed high consistency of results, greater analytical efficiency of code or operations, and more intuitive user-friendliness than SAS, SPSS, and R. In intergroup comparisons and correlation analyses, despite minor discrepancies with SAS, SPSS, and R in the statistical outcomes of certain tasks, ChatGPT-4 maintained high analytical efficiency and exceptional user-friendliness. Employing ChatGPT-4 can thus significantly lower the operational threshold for epidemiological data analysis while remaining consistent with traditional biostatistical software, requiring only specific, clear analysis instructions without additional operations or code writing.
Conclusions: We found ChatGPT-4 to be a powerful auxiliary tool for statistical analysis in epidemiological research, though it showed limitations in result consistency and in applying more advanced statistical methods. We therefore advocate its use to support researchers with intermediate experience in data analysis. As AI technologies like LLMs advance rapidly, their integration with data analysis platforms promises to lower operational barriers, enabling researchers to dedicate greater focus to the nuanced interpretation of analysis results. This development is likely to significantly advance epidemiological and medical research.
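As a reference point for the three benchmarked analysis types, here is a minimal Python (pandas/SciPy) sketch; the study itself compared ChatGPT-4's Data Analyst against SAS, SPSS, and R, so this is only an illustration of the analyses, with invented column names and data.

```python
# The three analysis types the study benchmarked, on toy survey-style data:
# descriptive statistics, an intergroup comparison, and a correlation.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "age": [34, 45, 29, 52, 41, 38, 60, 47],
    "weekly_hours": [40, 38, 45, 50, 36, 42, 30, 55],
    "income": [3200, 4100, 2800, 5000, 3900, 3500, 2600, 5200],
})

# 1. Descriptive statistics
print(df.describe())

# 2. Intergroup analysis: weekly working hours by gender (t-test)
men = df.loc[df["gender"] == "M", "weekly_hours"]
women = df.loc[df["gender"] == "F", "weekly_hours"]
print(stats.ttest_ind(men, women))

# 3. Correlation analysis: income vs. weekly working hours (Pearson)
print(stats.pearsonr(df["income"], df["weekly_hours"]))
```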
Affiliation(s)
- Yeen Huang: School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Ruipeng Wu: Key Laboratory for Molecular Genetic Mechanisms and Intervention Research on High Altitude Disease of Tibet Autonomous Region, School of Medicine, Xizang Minzu University, Xianyang, China; Key Laboratory of High Altitude Hypoxia Environment and Life Health, School of Medicine, Xizang Minzu University, Xianyang, China; Key Laboratory of Environmental Medicine and Engineering of Ministry of Education, Department of Nutrition and Food Hygiene, School of Public Health, Southeast University, Nanjing, Jiangsu, China
- Juntao He: Physical and Chemical Testing Institute, Shenzhen Prevention and Treatment Center for Occupational Diseases, Shenzhen, Guangdong, China
- Yingping Xiang: Occupational Hazard Assessment Institute, Shenzhen Prevention and Treatment Center for Occupational Diseases, Shenzhen, Guangdong, China
11. Wu RC, Li DX, Feng DC. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024;85:e87-e88. PMID: 38151444. DOI: 10.1016/j.eururo.2023.11.029.
Affiliation(s)
- Rui-Cheng Wu: Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China
- Deng-Xiong Li: Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China
- De-Chao Feng: Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China
12. Zhao Z, Li Z, Yu N. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024;85:e83-e84. PMID: 38143217. DOI: 10.1016/j.eururo.2023.12.006.
Affiliation(s)
- Zhongwei Zhao: Department of Urology, Qilu Hospital of Shandong University, Jinan, China
- Zhenye Li: Cheeloo College of Medicine, Shandong University, Jinan, China
- Nengwang Yu: Department of Urology, Qilu Hospital of Shandong University, Jinan, China
13. Eppler M, Ganjavi C, Abreu A, Gill I, Cacciamani GE. Reply to Rui-Cheng Wu, Deng-Xiong Li, and De-Chao Feng's Letter to the Editor re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024;85:e85-e86. PMID: 38182492. DOI: 10.1016/j.eururo.2023.12.007.
Affiliation(s)
- Michael Eppler: USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Conner Ganjavi: USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Andre Abreu: USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir Gill: USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Giovanni E Cacciamani: USC Institute of Urology, Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; AI Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
14. Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Ethical Dilemmas in Using AI for Academic Writing and an Example Framework for Peer Review in Nephrology Academia: A Narrative Review. Clin Pract 2023;14:89-105. PMID: 38248432. PMCID: PMC10801601. DOI: 10.3390/clinpract14010008.
Abstract
The emergence of artificial intelligence (AI) has greatly propelled progress across various sectors, including nephrology academia. This advancement, however, has also given rise to ethical challenges, notably in scholarly writing. AI's capacity to automate labor-intensive tasks like literature reviews and data analysis has created opportunities for unethical practices, with scholars incorporating AI-generated text into their manuscripts and potentially undermining academic integrity. The resulting ethical dilemmas not only question the authenticity of contemporary academic endeavors but also challenge the credibility of the peer-review process and the integrity of editorial oversight. Instances of this misconduct span from lesser-known journals to reputable ones, and even infiltrate graduate theses and grant applications. This subtle AI intrusion hints at a systemic vulnerability within the academic publishing domain, exacerbated by the publish-or-perish mentality. Solutions aimed at mitigating the unethical employment of AI in academia include the adoption of sophisticated AI-driven plagiarism detection systems, a robust augmentation of the peer-review process with an "AI scrutiny" phase, comprehensive training for academics on ethical AI usage, and the promotion of a culture of transparency that acknowledges AI's role in research. This review underscores the pressing need for collaborative efforts among academic nephrology institutions to foster an environment of ethical AI application, preserving academic integrity in the face of rapid technological advancements. It also calls for rigorous research to assess the extent of AI's involvement in the academic literature, evaluate the effectiveness of AI-enhanced plagiarism detection tools, and understand the long-term consequences of AI utilization for academic integrity. An example framework is proposed that outlines a comprehensive approach to integrating AI into nephrology academic writing and peer review. Through proactive initiatives and rigorous evaluations, a harmonious environment that harnesses AI's capabilities while upholding stringent academic standards can be envisioned.
Affiliation(s)
- Jing Miao: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Charat Thongprayoon: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Supawadee Suppadungsuk: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bang Phli 10540, Samut Prakan, Thailand
- Oscar A. Garcia Valencia: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Fawad Qureshi: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Wisit Cheungpasitporn: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA