1. Tseng LW, Lu YC, Tseng LC, Chen YC, Chen HY. Performance of ChatGPT-4 on Taiwanese Traditional Chinese Medicine Licensing Examinations: Cross-Sectional Study. JMIR Medical Education 2025;11:e58897. [PMID: 40106227; PMCID: PMC11939018; DOI: 10.2196/58897]
Abstract
Background: The integration of artificial intelligence (AI), notably ChatGPT, into medical education has shown promising results in various medical fields. Nevertheless, its efficacy in traditional Chinese medicine (TCM) examinations remains understudied.
Objective: This study aims to (1) assess the performance of ChatGPT on the TCM licensing examination in Taiwan and (2) evaluate the model's explainability in answering TCM-related questions to determine its suitability as a TCM learning tool.
Methods: We used the GPT-4 model to respond to 480 questions from the 2022 TCM licensing examination. This study compared the performance of the model against that of licensed TCM doctors using 2 approaches, namely direct answer selection and provision of explanations before answer selection. The accuracy and consistency of the AI-generated responses were analyzed. Moreover, a breakdown of question characteristics was performed based on cognitive level, depth of knowledge, question type, vignette style, and question polarity.
Results: ChatGPT achieved an overall accuracy of 43.9%, which was lower than that of the 2 human participants (70% and 78.4%). The analysis did not reveal a significant correlation between the accuracy of the model and the characteristics of the questions. An in-depth examination indicated that errors predominantly resulted from misunderstanding of TCM concepts (55.3%), underscoring the limitations of the model's TCM knowledge base and reasoning capability.
Conclusions: Although ChatGPT shows promise as an educational tool, its current performance on TCM licensing examinations is lacking. This highlights the need to enhance AI models with specialized TCM training and suggests a cautious approach to utilizing AI for TCM education. Future research should focus on model improvement and the development of tailored educational applications to support TCM learning.
Affiliation(s)
- Liang-Wei Tseng
- Division of Chinese Acupuncture and Traumatology, Center of Traditional Chinese Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Yi-Chin Lu
- Division of Chinese Internal Medicine, Center for Traditional Chinese Medicine, Chang Gung Memorial Hospital, No. 123, Dinghu Rd, Gueishan Dist, Taoyuan, 33378, Taiwan, 886 3 3196200 ext 2611, 886 3 3298995
- Yu-Chun Chen
- School of Medicine, Faculty of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan
- Taipei Veterans General Hospital, Yuli Branch, Taipei, Taiwan
- Institute of Hospital and Health Care Administration, National Yang-Ming Chiao Tung University, Taipei, Taiwan
- Hsing-Yu Chen
- Division of Chinese Internal Medicine, Center for Traditional Chinese Medicine, Chang Gung Memorial Hospital, No. 123, Dinghu Rd, Gueishan Dist, Taoyuan, 33378, Taiwan, 886 3 3196200 ext 2611, 886 3 3298995
- School of Traditional Chinese Medicine, College of Medicine, Chang Gung University, Taoyuan, Taiwan
2. Aster A, Laupichler MC, Rockwell-Kollmann T, Masala G, Bala E, Raupach T. ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review. Medical Science Educator 2025;35:555-567. [PMID: 40144083; PMCID: PMC11933646; DOI: 10.1007/s40670-024-02206-6]
Abstract
This review aims to provide a summary of all scientific publications on the use of large language models (LLMs) in medical education over the first year of their availability. A scoping literature review was conducted in accordance with the PRISMA recommendations for scoping reviews. Five scientific literature databases were searched using predefined search terms. The search yielded 1509 initial results, of which 145 studies were ultimately included. Most studies assessed LLMs' capabilities in passing medical exams. Some studies discussed advantages, disadvantages, and potential use cases of LLMs. Very few studies conducted empirical research, and many published studies lack methodological rigor. We therefore propose a research agenda to improve the quality of studies on LLMs.
Affiliation(s)
- Alexandra Aster
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Matthias Carl Laupichler
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Tamina Rockwell-Kollmann
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Gilda Masala
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Ebru Bala
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Tobias Raupach
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
3. Jin HK, Kim E. Performance of GPT-3.5 and GPT-4 on the Korean Pharmacist Licensing Examination: Comparison Study. JMIR Medical Education 2024;10:e57451. [PMID: 39630413; PMCID: PMC11633516; DOI: 10.2196/57451]
Abstract
Background: ChatGPT, a recently developed artificial intelligence chatbot and a notable large language model, has demonstrated improved performance on medical field examinations. However, there is currently little research on its efficacy in languages other than English or in pharmacy-related examinations.
Objective: This study aimed to evaluate the performance of GPT models on the Korean Pharmacist Licensing Examination (KPLE).
Methods: We evaluated the percentage of correct answers provided by 2 different versions of ChatGPT (GPT-3.5 and GPT-4) for all multiple-choice single-answer KPLE questions, excluding image-based questions. In total, 320, 317, and 323 questions from the 2021, 2022, and 2023 KPLEs, respectively, were included in the final analysis, which consisted of 4 units: Biopharmacy, Industrial Pharmacy, Clinical and Practical Pharmacy, and Medical Health Legislation.
Results: The 3-year average percentage of correct answers was 86.5% (830/960) for GPT-4 and 60.7% (583/960) for GPT-3.5. GPT model accuracy was highest in Biopharmacy (GPT-3.5: 77/96, 80.2% in 2022; GPT-4: 87/90, 96.7% in 2021) and lowest in Medical Health Legislation (GPT-3.5: 8/20, 40% in 2022; GPT-4: 12/20, 60% in 2022). Additionally, when comparing the performance of artificial intelligence with that of human participants, pharmacy students outperformed GPT-3.5 but not GPT-4.
Conclusions: Over the last 3 years, GPT models have performed very close to or above the passing threshold for the KPLE. This study demonstrates the potential of large language models in the pharmacy domain; however, extensive research is needed to evaluate their reliability and ensure their secure application in pharmacy contexts due to several inherent challenges. Addressing these limitations could make GPT models more effective auxiliary tools for pharmacy education.
Affiliation(s)
- Hye Kyung Jin
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- EunYoung Kim
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, Seoul, Republic of Korea
- Division of Licensing of Medicines and Regulatory Science, The Graduate School of Pharmaceutical Management and Regulatory Science Policy, The Graduate School of Pharmaceutical Regulatory Sciences, Chung-Ang University, 84 Heukseok-Ro, Dongjak-gu, Seoul, 06974, Republic of Korea, 82 2-820-5791, 82 2-816-7338
4. Kleib M, Darko EM, Akingbade O, Kennedy M, Majekodunmi P, Nickel E, Vogelsang L. Current trends and future implications in the utilization of ChatGPT in nursing: A rapid review. International Journal of Nursing Studies Advances 2024;7:100252. [PMID: 39584012; PMCID: PMC11583729; DOI: 10.1016/j.ijnsa.2024.100252]
Abstract
Background: The past decade has witnessed a surge in the development of artificial intelligence (AI)-based technology systems for healthcare. Launched in November 2022, ChatGPT (Generative Pre-trained Transformer), an AI-based chatbot, is being utilized in nursing education, research, and practice. However, little is known about its pattern of usage, which prompted this study.
Objective: To provide a concise overview of the existing literature on the application of ChatGPT in nursing education, practice, and research.
Methods: A rapid review based on the Cochrane methodology was applied to synthesize the existing literature. We conducted systematic searches in several databases, including CINAHL, Ovid Medline, Embase, Web of Science, Scopus, Education Search Complete, ERIC, and Cochrane CENTRAL, to ensure no publications were missed. All types of primary and secondary research studies published in English and focused on the use of ChatGPT in nursing education, research, and practice were included, encompassing qualitative, quantitative, mixed methods, and literature reviews. Dissertations or theses, conference proceedings, government and other organizational reports, white papers, discussion papers, opinion pieces, editorials, commentaries, and published review protocols were excluded, as were studies involving other healthcare professionals and/or students without nursing participants, studies exploring other language models without comparison to ChatGPT, and studies examining the technical specifications of ChatGPT. Data screening was completed in two stages (title and abstract, then full-text review), followed by data extraction and quality appraisal. Descriptive analysis and narrative synthesis were applied to summarize and categorize the findings.
Results: Seventeen studies were included: 15 (88.2%) focused on nursing education and one each on nursing practice and research. Of the 17 included studies, 5 (29.4%) were evaluation studies, 3 (17.6%) were narrative reviews, 3 (17.6%) were cross-sectional studies, 2 (11.8%) were descriptive studies, and 1 each (5.9%) was a randomized controlled trial, a quasi-experimental study, a case study, and a qualitative study.
Conclusion: This study has provided a snapshot of ChatGPT usage in nursing education, research, and practice. Although the evidence is inconclusive, integration of ChatGPT should address ethical concerns and include ongoing education on ChatGPT usage. Further research, specifically interventional studies, is recommended to ascertain and track the impact of ChatGPT in different contexts.
Affiliation(s)
- Manal Kleib
- Faculty of Nursing, University of Alberta, Edmonton, Alberta, Canada
- Megan Kennedy
- Library and Museums - Faculty Engagement (Health Sciences), University of Alberta, Edmonton, Alberta, Canada
- Emma Nickel
- Alberta Health Services, Calgary, Alberta, Canada
- Laura Vogelsang
- Faculty of Health Sciences, University of Lethbridge, Lethbridge, Alberta, Canada
5. Ho CN, Tian T, Ayers AT, Aaron RE, Phillips V, Wolf RM, Mathioudakis N, Dai T, Klonoff DC. Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review. BMC Medical Informatics and Decision Making 2024;24:357. [PMID: 39593074; PMCID: PMC11590327; DOI: 10.1186/s12911-024-02757-z]
Abstract
BACKGROUND: Large language models (LLMs) released since November 30, 2022, most notably ChatGPT, have shifted attention to their use in medicine, particularly for supporting clinical decision-making. However, there is little consensus in the medical community on how LLM performance in clinical contexts should be evaluated.
METHODS: We performed a literature review of PubMed to identify publications between December 1, 2022, and April 1, 2024, that discussed assessments of LLM-generated diagnoses or treatment plans.
RESULTS: We selected 108 relevant articles from PubMed for analysis. The most frequently used LLMs were GPT-3.5, GPT-4, Bard, LLaMa/Alpaca-based models, and Bing Chat. The five most frequently used criteria for scoring LLM outputs were "accuracy", "completeness", "appropriateness", "insight", and "consistency".
CONCLUSIONS: The most frequently used criteria for defining high-quality LLM outputs have been consistently selected by researchers over the past 1.5 years. We identified a high degree of variation in how studies reported their findings and assessed LLM performance. Standardized reporting of qualitative evaluation metrics that assess the quality of LLM outputs can be developed to facilitate research studies on LLMs in healthcare.
Affiliation(s)
- Cindy N Ho
- Diabetes Technology Society, Burlingame, CA, USA
- Tiffany Tian
- Diabetes Technology Society, Burlingame, CA, USA
- Vidith Phillips
- School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Risa M Wolf
- Division of Pediatric Endocrinology, The Johns Hopkins Hospital, Baltimore, MD, USA
- Hopkins Business of Health Initiative, Johns Hopkins University, Washington, DC, USA
- Tinglong Dai
- Hopkins Business of Health Initiative, Johns Hopkins University, Washington, DC, USA
- Carey Business School, Johns Hopkins University, Baltimore, MD, USA
- School of Nursing, Johns Hopkins University, Baltimore, MD, USA
- David C Klonoff
- Diabetes Research Institute, Mills-Peninsula Medical Center, 100 South San Mateo Drive, Room 1165, San Mateo, CA, 94401, USA.
6. Luo Y, Miao Y, Zhao Y, Li J, Wu Y. Exploring the Current Applications and Effectiveness of ChatGPT in Nursing: An Integrative Review. Journal of Advanced Nursing 2024. [PMID: 39555676; DOI: 10.1111/jan.16628]
Abstract
AIMS: To systematically review the current application status of ChatGPT in nursing and explore its application effects.
DESIGN: An integrative review.
METHODS: Following the inclusion and exclusion criteria, two researchers summarised the selected literature and conducted a quality appraisal, followed by narrative synthesis.
DATA SOURCES: PubMed, Web of Science, and Scopus were searched from January 2022 to June 2024.
RESULTS: A total of 31 papers met the inclusion criteria. Fifteen empirical studies were rated as grade 5 and five as grade 4; one minireview relied on references that were not recently published and lacked ChatGPT-related articles, and one systematic review was of low quality. The review focused on three main topics: (1) the subsidiary role of ChatGPT in nursing; (2) comparison of the effectiveness of different models; and (3) existing challenges.
CONCLUSIONS: While adopting new technologies such as ChatGPT, it is important to maintain a balanced perspective on both their benefits and limitations. Nursing professionals must actively address these deficiencies and explore solutions to improve ChatGPT's utility in the field.
IMPLICATIONS FOR THE PROFESSION AND PATIENT CARE: This review synthesised evidence on ChatGPT's application and highlighted existing challenges in nursing. Nursing researchers, educators, and practitioners can build on these findings to explore its potential in various aspects of nursing practice.
IMPACT: For researchers, ChatGPT can enhance language quality and summarise findings effectively, but adherence to research standards is crucial. For educators, ChatGPT can serve as an effective information source for students, though caution should be taken to avoid overreliance. For practitioners, ChatGPT can offer useful suggestions for clinical practice, but these should be critically evaluated and not followed blindly, as issues of inaccuracy must be addressed.
REPORTING METHOD: This review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
PATIENT OR PUBLIC CONTRIBUTION: No patient or public contribution.
Affiliation(s)
- Yuan Luo
- School of Nursing, Capital Medical University, Beijing, China
- Yiqun Miao
- School of Nursing, Capital Medical University, Beijing, China
- Yuhan Zhao
- School of Nursing, Capital Medical University, Beijing, China
- Jiawei Li
- School of Nursing, Capital Medical University, Beijing, China
- Ying Wu
- School of Nursing, Capital Medical University, Beijing, China
7. Gunawan J, Aungsuroch Y, Montayre J. ChatGPT integration within nursing education and its implications for nursing students: A systematic review and text network analysis. Nurse Education Today 2024;141:106323. [PMID: 39068726; DOI: 10.1016/j.nedt.2024.106323]
Abstract
OBJECTIVE: This study aimed to explore prevalent topics related to integrating ChatGPT into nursing education, specifically focusing on its impact on nursing students.
DESIGN: A systematic review and text network analysis were conducted.
DATA SOURCES: Full-text articles from reputable scientific databases, including PubMed, Scopus, and ScienceDirect, along with relevant articles identified through Google Scholar, were utilized; the search period ranged from January 20 to January 23, 2024.
REVIEW METHODS: The review centered on the main text of the articles, which were initially converted from PDFs to TXT files. Text cleaning and preprocessing were conducted using Python 3.11, followed by text analysis performed with InfraNodus.
RESULTS: Of 145 articles, 46 full-text articles were included in the final analysis. Four key topical clusters were identified: Academic Writing, Healthcare Simulation, Data Modeling, and Personal Development. Sentiments regarding the integration of ChatGPT into nursing education and its impact on nursing students were primarily positive (48%), with a notable percentage expressing negative views (31%) and a smaller proportion indicating neutrality (21%).
CONCLUSION: This study highlights the transformative potential of ChatGPT in nursing education, advocating for its responsible integration to empower nursing students with advanced skills and uphold ethical standards in artificial intelligence (AI) utilization. The four identified topics and the associated sentiments are crucial for guiding educators, researchers, and practitioners in nursing education as they navigate the integration of AI tools. Further research and exploration in these areas can contribute to the ongoing discourse on the intersection of technology and nursing education.
Affiliation(s)
- Joko Gunawan
- Faculty of Nursing, Chulalongkorn University, Bangkok, Thailand.
- Jed Montayre
- School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China.
8. Jin HK, Lee HE, Kim E. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis. BMC Medical Education 2024;24:1013. [PMID: 39285377; PMCID: PMC11406751; DOI: 10.1186/s12909-024-05944-8]
Abstract
BACKGROUND: ChatGPT, a recently developed artificial intelligence (AI) chatbot, has demonstrated improved performance on examinations in the medical field. However, an overall evaluation of the potential of the ChatGPT models (ChatGPT-3.5 and GPT-4) across a variety of national health licensing examinations is thus far lacking. This study aimed to provide a comprehensive assessment of the ChatGPT models' performance in national licensing examinations for medicine, pharmacy, dentistry, and nursing through a meta-analysis.
METHODS: Following the PRISMA protocol, full-text articles from MEDLINE/PubMed, EMBASE, ERIC, Cochrane Library, Web of Science, and key journals were reviewed from the time of ChatGPT's introduction to February 27, 2024. Studies were eligible if they evaluated the performance of a ChatGPT model (ChatGPT-3.5 or GPT-4); related to national licensing examinations in the fields of medicine, pharmacy, dentistry, or nursing; involved multiple-choice questions; and provided data that enabled the calculation of effect size. Two reviewers independently completed data extraction, coding, and quality assessment. The JBI Critical Appraisal Tools were used to assess the quality of the selected articles. The overall effect size and 95% confidence intervals (CIs) were calculated using a random-effects model.
RESULTS: A total of 23 studies were considered for this review, which evaluated the accuracy of four types of national licensing examinations. The selected articles were in the fields of medicine (n = 17), pharmacy (n = 3), nursing (n = 2), and dentistry (n = 1). They reported varying accuracy levels, ranging from 36% to 77% for ChatGPT-3.5 and from 64.4% to 100% for GPT-4. The overall effect size for the percentage of accuracy was 70.1% (95% CI 65%-74.8%), which was statistically significant (p < 0.001). Subgroup analyses revealed that GPT-4 demonstrated significantly higher accuracy in providing correct responses than its earlier version, ChatGPT-3.5. Additionally, in the context of health licensing examinations, the ChatGPT models exhibited greater proficiency in the following order: pharmacy, medicine, dentistry, and nursing. However, the lack of a broader set of questions, including open-ended and scenario-based questions, and significant heterogeneity were limitations of this meta-analysis.
CONCLUSIONS: This study sheds light on the accuracy of ChatGPT models in four national health licensing examinations across various countries and provides a practical basis and theoretical support for future research. Further studies are needed to explore their utilization in medical and health education by including a broader and more diverse range of questions, along with more advanced versions of AI chatbots.
Affiliation(s)
- Hye Kyung Jin
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
- Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
- Ha Eun Lee
- Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
- EunYoung Kim
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea.
- Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea.
- Division of Licensing of Medicines and Regulatory Science, The Graduate School of Pharmaceutical Management, and Regulatory Science Policy, The Graduate School of Pharmaceutical Regulatory Sciences, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea.
9. Liu CH, Wang PH. Winners of the 2023 honor awards for excellence at the annual meeting of the Chinese Medical Association-Taipei: Part IV. Journal of the Chinese Medical Association 2024;87:817-818. [PMID: 38965650; DOI: 10.1097/jcma.0000000000001130]
Affiliation(s)
- Chia-Hao Liu
- Department of Obstetrics and Gynecology, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Institute of Clinical Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Peng-Hui Wang
- Department of Obstetrics and Gynecology, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Institute of Clinical Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Female Cancer Foundation, Taipei, Taiwan, ROC
10. Huang CH, Hsiao HJ, Yeh PC, Wu KC, Kao CH. Performance of ChatGPT on Stage 1 of the Taiwanese medical licensing exam. Digital Health 2024;10:20552076241233144. [PMID: 38371244; PMCID: PMC10874144; DOI: 10.1177/20552076241233144]
Abstract
Introduction: Since its release by OpenAI in November 2022, numerous studies have subjected ChatGPT to various tests to evaluate its performance on medical exams. The objective of this study was to evaluate ChatGPT's accuracy and logical reasoning across all 10 subjects featured in Stage 1 of the Senior Professional and Technical Examinations for Medical Doctors (SPTEMD) in Taiwan, with questions encompassing both Chinese and English.
Methods: We tasked ChatGPT-4 with completing SPTEMD Stage 1. The model was presented with multiple-choice questions extracted from three separate tests conducted in February 2022, July 2022, and February 2023. These questions encompass 10 subjects, namely biochemistry and molecular biology, anatomy, embryology and developmental biology, histology, physiology, microbiology and immunology, parasitology, pharmacology, pathology, and public health. Subsequently, we analyzed the model's accuracy for each subject.
Results: In all three tests, ChatGPT achieved scores surpassing the 60% passing threshold, with an overall average score of 87.8%. Its best performance was in biochemistry, where it garnered an average score of 93.8%. Conversely, its performance in anatomy, parasitology, and embryology was not as good, and its scores were highly variable in embryology and parasitology.
Conclusion: This study has demonstrated ChatGPT's competence across the various subjects of SPTEMD Stage 1 and suggests that it could be a helpful tool for learning and exam preparation for medical students and professionals. ChatGPT thus has the potential not only to facilitate exam preparation but also to improve the accessibility of medical education and support continuing education for medical professionals.
Affiliation(s)
- Han-Jung Hsiao
- Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung
- Pei-Chun Yeh
- Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung
- Kuo-Chen Wu
- Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei
- Chia-Hung Kao
- Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung
- Graduate Institute of Biomedical Sciences, School of Medicine, College of Medicine, China Medical University, Taichung
- Department of Nuclear Medicine and PET Center, China Medical University Hospital, Taichung
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung
11. Scott-Herring M. Artificial intelligence in academic writing: a detailed examination. International Journal of Nursing Education Scholarship 2024;21:ijnes-2024-0050. [PMID: 39686885; DOI: 10.1515/ijnes-2024-0050]
Abstract
INTRODUCTION: As AI tools have become popular in academia, concerns about their impact on student originality and academic integrity have arisen.
METHODS: This quality improvement project examined first-year nurse anesthesiology students' use of AI for an academic writing assignment. Students generated, edited, and reflected on AI-produced content. Their work was analyzed for commonalities related to perceived ease of use, accuracy, and overall impressions.
RESULTS: Students found AI tools easy to use, with fast results, but reported concerns with inaccuracies, superficiality, and unreliable citations and formatting. Despite these issues, some saw potential in AI for brainstorming and proofreading.
IMPLICATIONS FOR INTERNATIONAL AUDIENCE: Clear guidelines are necessary for AI use in academia. Further research should explore AI's long-term impact on academic writing and learning outcomes.
CONCLUSIONS: While AI tools offer speed and convenience, they currently lack the depth required for rigorous academic work.
Affiliation(s)
- Mary Scott-Herring
- Doctor of Nurse Anesthesia Practice (DNAP) Program, Georgetown University, 3700 Reservoir Rd, NW, St Mary's Hall, 4th Floor, Room 427, Washington, DC, 20057, USA