1. Hwang Y, Lee H, Lee MK. Conundrum and chances of diabetes management in the Western Pacific Region: A narrative review. J Diabetes Investig 2025. [PMID: 40371903] [DOI: 10.1111/jdi.70053]
Abstract
The prevalence of diabetes is increasing globally, and glucose management is essential for the treatment of diabetes. Most guidelines recommend early intensive therapy and individualized approaches. Although many countries have implemented various guidelines and educational programs to enhance glucose management, the target achievement rate remains very low. Studies from several countries and regions have identified various factors that influence blood glucose management, either positively or negatively. These factors have been comprehensively incorporated into guidelines to help people with diabetes and healthcare professionals follow them, and to inform additional guidelines developed through further research. We and others have suggested that diverse factors should be considered, including comorbidities, age, complications, life expectancy, and pathophysiologic characteristics such as ethnic differences in insulin sensitivity and secretion. The Western Pacific (WP) region, comprising countries with significant cultural and racial diversity, necessitates customized programs and community-based management strategies. In this review, we present specific challenges and opportunities for diabetes management identified through a systematic review of the literature from the WP region, along with those common to other regions. To improve healthcare policy and management in the WP region, it is essential to address regional characteristics and the factors that act as either barriers or facilitators when developing strategies for early intensive and individualized therapeutic approaches. Moreover, additional studies on diabetes pathophysiology and management, including pharmacotherapy, are urgently needed.
Affiliation(s)
- Yerin Hwang, Center for Digital Health, Medical Science Research Institute, Kyung Hee University College of Medicine, Seoul, South Korea
- Hyunmin Lee, Department of Social and Preventive Medicine, Sungkyunkwan University School of Medicine, Suwon, South Korea
- Moon-Kyu Lee, Division of Endocrinology & Metabolism, Department of Internal Medicine, Uijeongbu Eulji Medical Center, Eulji University School of Medicine, Uijeongbu, South Korea
2. Hisamatsu T, Fukuda M, Kinuta M, Kanda H. ChatGPT Responses to Clinical Questions in the Japan Atherosclerosis Society Guidelines for Prevention of Atherosclerotic Cardiovascular Disease 2022. J Atheroscler Thromb 2025; 32:567-579. [PMID: 39477517] [PMCID: PMC12055503] [DOI: 10.5551/jat.65240]
Abstract
AIMS Artificial intelligence is increasingly used in the medical field. We assessed the accuracy and reproducibility of responses by ChatGPT to clinical questions (CQs) in the Japan Atherosclerosis Society Guidelines for Prevention of Atherosclerotic Cardiovascular Diseases 2022 (JAS Guidelines 2022). METHODS In June 2024, we assessed responses by ChatGPT (version 3.5) to CQs, including background questions (BQs) and foreground questions (FQs). Accuracy was assessed independently by three researchers, who rated responses to CQs posed in Japanese or translated into English on six-point Likert scales ranging from 1 ("completely incorrect") to 6 ("completely correct"). For the reproducibility assessment, each CQ was asked five times, each in a new chat; the responses were scored on the same six-point Likert scales, and Fleiss kappa coefficients were calculated. RESULTS The median (25th-75th percentile) score for ChatGPT's responses to BQs and FQs was 4 (3-5) and 5 (5-6) for Japanese CQs and 5 (3-6) and 6 (5-6) for English CQs, respectively. Response scores were higher for FQs than for BQs (P values <0.001 for Japanese and English). Similar response accuracy levels were observed between Japanese and English CQs (P value 0.139 for BQs and 0.586 for FQs). Kappa coefficients for reproducibility were 0.76 for BQs and 0.90 for FQs. CONCLUSIONS ChatGPT showed high accuracy and reproducibility in responding to JAS Guidelines 2022 CQs, especially FQs. While ChatGPT primarily reflects existing guidelines, its strength could lie in rapidly organizing and presenting relevant information, thus supporting instant and more efficient guideline interpretation and aiding in medical decision-making.
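Editorial note: as an illustration (not the authors' code), the reproducibility analysis described above can be sketched as follows. With each clinical question answered five times and scored on a six-point Likert scale, agreement across the repeated answers can be summarized with a Fleiss kappa computed directly from the score counts; the question-by-score counts below are invented for the sketch.

```python
# Minimal sketch of a Fleiss kappa for repeated, Likert-scored responses.
# The counts are hypothetical, not the study's data.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts[i, j] = number of the repeated responses to question i
    that received Likert score j (columns cover the 6 score levels)."""
    N, _ = counts.shape                      # N questions, 6 score categories
    n = counts.sum(axis=1)[0]                # ratings per question (5 repeats here)
    p_j = counts.sum(axis=0) / (N * n)       # marginal proportion of each score
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-question agreement
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical data: 4 questions x 6 Likert levels, 5 repeated answers each.
counts = np.array([
    [0, 0, 0, 0, 1, 4],   # mostly "completely correct"
    [0, 0, 0, 0, 5, 0],
    [0, 0, 1, 4, 0, 0],
    [0, 0, 0, 0, 2, 3],
])
print(f"Fleiss kappa = {fleiss_kappa(counts):.2f}")
```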
Affiliation(s)
- Takashi Hisamatsu, Department of Public Health, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Mari Fukuda, Department of Public Health, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Minako Kinuta, Department of Public Health, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Hideyuki Kanda, Department of Public Health, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
3. Li C, Zhao Y, Bai Y, Zhao B, Tola YO, Chan CW, Zhang M, Fu X. Unveiling the Potential of Large Language Models in Transforming Chronic Disease Management: Mixed Methods Systematic Review. J Med Internet Res 2025; 27:e70535. [PMID: 40239198] [PMCID: PMC12044321] [DOI: 10.2196/70535]
Abstract
BACKGROUND Chronic diseases are a major global health burden, accounting for nearly three-quarters of deaths worldwide. Large language models (LLMs) are advanced artificial intelligence systems with transformative potential to optimize chronic disease management; however, robust evidence is lacking. OBJECTIVE This review aims to synthesize evidence on the feasibility, opportunities, and challenges of LLMs across the disease management spectrum, from prevention to screening, diagnosis, treatment, and long-term care. METHODS Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, 11 databases (Cochrane Central Register of Controlled Trials, CINAHL, Embase, IEEE Xplore, MEDLINE via Ovid, ProQuest Health & Medicine Collection, ScienceDirect, Scopus, Web of Science Core Collection, China National Knowledge Infrastructure, and SinoMed) were searched on April 17, 2024. Intervention and simulation studies that examined LLMs in the management of chronic diseases were included. The methodological quality of the included studies was evaluated using a rating rubric designed for simulation-based research and the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool for quasi-experimental studies. Narrative analysis with descriptive figures was used to synthesize the study findings. Random-effects meta-analyses were conducted to assess the pooled effect estimates of the feasibility of LLMs in chronic disease management. RESULTS A total of 20 studies examined general-purpose (n=17) and retrieval-augmented generation-enhanced LLMs (n=3) for the management of chronic diseases, including cancer, cardiovascular diseases, and metabolic disorders. LLMs demonstrated feasibility across the chronic disease management spectrum by generating relevant, comprehensible, and accurate health recommendations (pooled accuracy rate 71%, 95% CI 59%-83%; I2=88.32%), with retrieval-augmented generation-enhanced LLMs achieving higher accuracy rates than general-purpose LLMs (odds ratio 2.89, 95% CI 1.83-4.58; I2=54.45%). LLMs facilitated equitable information access; increased patient awareness regarding ailments, preventive measures, and treatment options; and promoted self-management behaviors in lifestyle modification and symptom coping. Additionally, LLMs facilitated compassionate emotional support, social connections, and access to health care resources to improve the health outcomes of chronic diseases. However, LLMs face challenges in addressing privacy, language, and cultural issues; undertaking advanced tasks, including diagnosis, medication, and comorbidity management; and generating personalized regimens with real-time adjustments and multiple modalities. CONCLUSIONS LLMs have demonstrated the potential to transform chronic disease management at the individual, social, and health care levels; however, their direct application in clinical settings is still in its infancy. A multifaceted approach that incorporates robust data security, domain-specific model fine-tuning, multimodal data integration, and wearables is crucial for the evolution of LLMs into invaluable adjuncts for health care professionals to transform chronic disease management. TRIAL REGISTRATION PROSPERO CRD42024545412; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024545412.
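Editorial note: for readers unfamiliar with the pooling step mentioned above, the sketch below shows a minimal DerSimonian-Laird random-effects meta-analysis of study-level accuracy proportions, the kind of calculation behind a pooled accuracy rate with a 95% CI and an I2 statistic. The study counts are invented, and published analyses typically work on a logit or double-arcsine scale rather than raw proportions.

```python
# Minimal DerSimonian-Laird random-effects pooling of accuracy proportions.
# All counts are hypothetical.
import numpy as np

correct = np.array([45, 80, 30, 120])   # hypothetical correct responses per study
total   = np.array([60, 100, 50, 160])  # hypothetical items per study

p = correct / total
v = p * (1 - p) / total                 # within-study variance of each proportion
w = 1 / v                               # fixed-effect (inverse-variance) weights

p_fixed = np.sum(w * p) / np.sum(w)
Q = np.sum(w * (p - p_fixed) ** 2)      # Cochran's Q heterogeneity statistic
df = len(p) - 1
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)           # between-study variance
I2 = max(0.0, (Q - df) / Q) * 100       # % of variability due to heterogeneity

w_star = 1 / (v + tau2)                 # random-effects weights
pooled = np.sum(w_star * p) / np.sum(w_star)
se = np.sqrt(1 / np.sum(w_star))
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se

print(f"pooled accuracy = {pooled:.2f} (95% CI {lo:.2f}-{hi:.2f}), I2 = {I2:.1f}%")
```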
Affiliation(s)
- Caixia Li, The Department of Nursing, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen, China
- Yina Zhao, The Department of Nursing, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen, China
- Yang Bai, The School of Nursing, Sun Yat-sen University, Guangzhou, China
- Baoquan Zhao, The School of Artificial Intelligence, Sun Yat-sen University, Guangzhou, China
- Carmen Wh Chan, The Nethersole School of Nursing, The Chinese University of Hong Kong, Hong Kong, China
- Meifen Zhang, The School of Nursing, Sun Yat-sen University, Guangzhou, China
- Xia Fu, The Department of Nursing, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen, China
4. Wei B, Yao L, Hu X, Hu Y, Rao J, Ji Y, Dong Z, Duan Y, Wu X. Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study. J Med Internet Res 2025; 27:e67883. [PMID: 40209226] [PMCID: PMC12022522] [DOI: 10.2196/67883]
Abstract
BACKGROUND Ocular myasthenia gravis (OMG) is a neuromuscular disorder primarily affecting the extraocular muscles, leading to ptosis and diplopia. Effective patient education is crucial for disease management; however, in China, limited health care resources often restrict patients' access to personalized medical guidance. Large language models (LLMs) have emerged as potential tools to bridge this gap by providing instant, AI-driven health information. However, their accuracy and readability in educating patients with OMG remain uncertain. OBJECTIVE The purpose of this study was to systematically evaluate the effectiveness of multiple LLMs in the education of Chinese patients with OMG. Specifically, the validity of these models in answering OMG-related patient questions was assessed in terms of accuracy, completeness, readability, usefulness, and safety, and patients' ratings of their usability and readability were analyzed. METHODS The study was conducted in two phases: 130 multiple-choice ophthalmology examination questions were input into 5 different LLMs, and their performance was compared with that of undergraduates, master's students, and ophthalmology residents. In addition, 23 common OMG-related patient questions were posed to 4 LLMs, and their responses were evaluated by ophthalmologists across 5 domains. In the second phase, 20 patients with OMG interacted with the 2 LLMs from the first phase, each asking 3 questions. Patients assessed the responses for satisfaction and readability, while ophthalmologists evaluated the responses again using the 5 domains. RESULTS ChatGPT o1-preview achieved the highest accuracy rate of 73% on the 130 ophthalmology examination questions, outperforming the other LLMs as well as the undergraduate and master's student groups. For the 23 common OMG-related patient questions, ChatGPT o1-preview scored highest in correctness (4.44), completeness (4.44), helpfulness (4.47), and safety (4.6). GEMINI (Google DeepMind) provided the easiest-to-understand responses in readability assessments, while GPT-4o had the most complex responses, suitable for readers with higher education levels. In the second phase with 20 patients with OMG, ChatGPT o1-preview received higher satisfaction scores than Ernie 3.5 (Baidu; 4.40 vs 3.89, P=.002), although Ernie 3.5's responses were slightly more readable (4.31 vs 4.03, P=.01). CONCLUSIONS LLMs such as ChatGPT o1-preview may have the potential to enhance patient education. Addressing challenges such as misinformation risk, readability issues, and ethical considerations is crucial for their effective and safe integration into clinical practice.
Affiliation(s)
- Bin Wei, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Lili Yao, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Xin Hu, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Yuxiang Hu, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Jie Rao, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Yu Ji, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Zhuoer Dong, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Yichong Duan, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Xiaorong Wu, Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
5. Sridhar GR, Gumpeny L. Prospects and perils of ChatGPT in diabetes. World J Diabetes 2025; 16:98408. [PMID: 40093292] [PMCID: PMC11885976] [DOI: 10.4239/wjd.v16.i3.98408]
Abstract
ChatGPT, a popular large language model developed by OpenAI, has the potential to transform the management of diabetes mellitus. It is a conversational artificial intelligence model trained on extensive datasets, although not specifically health-related ones. The development and core components of ChatGPT include neural networks and machine learning. Because the current model has not been developed on diabetes-related datasets, it has limitations such as the risk of inaccuracies and the need for human supervision. Nevertheless, it has the potential to aid in patient engagement, medical education, and clinical decision support. In diabetes management, it can contribute to patient education, personalized dietary guidance, and the provision of emotional support. Specifically, it is being tested in clinical scenarios such as assessment of obesity, screening for diabetic retinopathy, and provision of guidelines for the management of diabetic ketoacidosis. Ethical and legal considerations are essential before ChatGPT can be integrated into healthcare. Potential concerns relate to data privacy, the accuracy of responses, and maintenance of the patient-doctor relationship. Ultimately, while ChatGPT and large language models hold immense potential to revolutionize diabetes care, their limitations, ethical implications, and the need for human supervision must be weighed. Their integration promises a future of proactive, personalized, and patient-centric care in diabetes management.
Affiliation(s)
- Gumpeny R Sridhar, Department of Endocrinology and Diabetes, Endocrine and Diabetes Centre, Visakhapatnam 530002, Andhra Pradesh, India
- Lakshmi Gumpeny, Department of Internal Medicine, Gayatri Vidya Parishad Institute of Healthcare & Medical Technology, Visakhapatnam 530048, Andhra Pradesh, India
6. Prazeres F. ChatGPT's Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini. JMIR Med Educ 2025; 11:e65108. [PMID: 40043219] [PMCID: PMC11902880] [DOI: 10.2196/65108]
Abstract
Background Advancements in ChatGPT are transforming medical education by providing new tools for assessment and learning, potentially enhancing evaluations for doctors and improving instructional effectiveness. Objective This study evaluates the performance and consistency of ChatGPT-3.5 Turbo and ChatGPT-4o mini in solving European Portuguese medical examination questions (2023 National Examination for Access to Specialized Training; Prova Nacional de Acesso à Formação Especializada [PNA]) and compares their performance to human candidates. Methods ChatGPT-3.5 Turbo was tested on the first part of the examination (74 questions) on July 18, 2024, and ChatGPT-4o mini on the second part (74 questions) on July 19, 2024. Each model generated an answer using its natural language processing capabilities. To test consistency, each model was asked, "Are you sure?" after providing an answer. Differences between the first and second responses of each model were analyzed using the McNemar test with continuity correction. A single-parameter t test compared the models' performance to human candidates. Frequencies and percentages were used for categorical variables, and means and CIs for numerical variables. Statistical significance was set at P<.05. Results ChatGPT-4o mini achieved an accuracy rate of 65% (48/74) on the 2023 PNA examination, surpassing ChatGPT-3.5 Turbo. ChatGPT-4o mini outperformed medical candidates, while ChatGPT-3.5 Turbo had a more moderate performance. Conclusions This study highlights the advancements and potential of ChatGPT models in medical education, emphasizing the need for careful implementation with teacher oversight and further research.
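Editorial note: as a brief illustration of the paired comparison described above, the McNemar test with continuity correction depends only on the discordant cells of the 2x2 table of first versus second ("Are you sure?") answers. The counts below are hypothetical, not the study's data.

```python
# Minimal McNemar test with continuity correction for paired answers.
from scipy.stats import chi2

# b = correct first answer but wrong after re-asking; c = wrong first, correct after.
b, c = 4, 9
statistic = (abs(b - c) - 1) ** 2 / (b + c)   # continuity-corrected McNemar chi-square
p_value = chi2.sf(statistic, df=1)            # one degree of freedom
print(f"chi2 = {statistic:.2f}, P = {p_value:.3f}")
```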
Affiliation(s)
- Filipe Prazeres, Faculty of Health Sciences, University of Beira Interior, Covilhã, Portugal; Family Health Unit Beira Ria, Gafanha da Nazaré, Portugal; CINTESIS@RISE, Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine of the University of Porto, Porto, Portugal
7. Wang C, Xiao C, Zhang X, Zhu Y, Chen X, Li Y, Qi H. Exploring medical students' intention to use of ChatGPT from a programming course: a grounded theory study in China. BMC Med Educ 2025; 25:209. [PMID: 39923098] [PMCID: PMC11806607] [DOI: 10.1186/s12909-025-06807-6]
Abstract
BACKGROUND In interdisciplinary general education courses, medical students face the daunting challenge of learning programming due to academic pressure, cognitive biases, and differences in thinking patterns. ChatGPT offers an effective way for students to acquire knowledge and to improve both learning efficiency and learning quality. OBJECTIVE To explore whether ChatGPT can assist medical students in learning programming, it is necessary to investigate their experience and perception of using ChatGPT and to identify the factors that influence their willingness to use it. METHODS Drawing on the grounded theory research paradigm, this paper constructs a model of the factors influencing medical students' willingness to use ChatGPT in programming courses, based on an analysis of interview data from 30 undergraduate medical students. It analyzes and discusses students' perceptions of ChatGPT in programming learning and the factors shaping their willingness to use it. RESULTS Willingness to use ChatGPT in programming learning fell into three types according to the students' self-reported degree of use: active use, neutral use, and negative use. Individual, technical, information, and environmental factors emerged as the four important dimensions affecting willingness to use ChatGPT. CONCLUSIONS Based on the analysis of these influencing factors, the study proposes strategies such as preventing risks and emphasizing ethics education, cultivating critical thinking and establishing a case library, and personalizing teaching to enhance core programming literacy.
Affiliation(s)
- Chen Wang, Department of Health Informatics and Management, School of Health Humanities, Peking University, Beijing, 100191, China
- Changqi Xiao, School of Nursing, Peking University, Beijing, 100191, China
- Xuejiao Zhang, School of Nursing, Peking University, Beijing, 100191, China
- Yingying Zhu, School of Nursing, Peking University, Beijing, 100191, China
- Xueqing Chen, School of Nursing, Peking University, Beijing, 100191, China
- Yilin Li, School of Basic Medical Sciences, Capital Medical University, Beijing, 100069, China
- Huiying Qi, Department of Health Informatics and Management, School of Health Humanities, Peking University, Beijing, 100191, China
8. Ding H, Xia W, Zhou Y, Wei L, Feng Y, Wang Z, Song X, Li R, Mao Q, Chen B, Wang H, Huang X, Zhu B, Jiang D, Sun J, Dong G, Jiang F. Evaluation and practical application of prompt-driven ChatGPTs for EMR generation. NPJ Digit Med 2025; 8:77. [PMID: 39894840] [PMCID: PMC11788423] [DOI: 10.1038/s41746-025-01472-x]
Abstract
This study investigates the application of prompt engineering to optimize prompt-driven ChatGPT for generating electronic medical records (EMRs) during lung nodule screening. We assessed the performance of ChatGPT in generating EMRs from patient-provider verbal consultations and integrated this approach into practical tools, such as WeChat mini-programs, accessible to patients before hospital visits. The findings highlight ChatGPT's potential to enhance workflow efficiency and improve diagnostic processes in clinical settings.
Affiliation(s)
- Hanlin Ding, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China; The Fourth Clinical College of Nanjing Medical University, Nanjing, China
- Wenjie Xia, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- Yujia Zhou, The Second Clinical Medical School of Nanjing Medical University, Nanjing, China
- Lei Wei, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China; The Fourth Clinical College of Nanjing Medical University, Nanjing, China; Department of Cardiothoracic Surgery, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
- Yipeng Feng, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China; The Fourth Clinical College of Nanjing Medical University, Nanjing, China
- Zi Wang, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China; The Fourth Clinical College of Nanjing Medical University, Nanjing, China
- Xuming Song, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China; The Fourth Clinical College of Nanjing Medical University, Nanjing, China
- Rutao Li, Department of Thoracic Surgery, Dushu Lake Hospital Affiliated to Soochow University, Suzhou, China
- Qixing Mao, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- Bing Chen, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- Hui Wang, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- Xing Huang, Pathological Department of Jiangsu Cancer Hospital, Nanjing, P. R. China
- Bin Zhu, Hospital Development Management Office, Nanjing Medical University, Nanjing, China
- Dongyu Jiang, Department of Orthopedics, Wuxi People's Hospital Affiliated to Nanjing Medical University, Wuxi, China
- Jingyu Sun, Department of Cardiology, First Affiliated Hospital of Nanjing Medical University, Jiangsu Province Hospital, Nanjing, China
- Gaochao Dong, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China; The Fourth Clinical College of Nanjing Medical University, Nanjing, China
- Feng Jiang, Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China; Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China; The Fourth Clinical College of Nanjing Medical University, Nanjing, China
9. Iliyasu Z, Abdullahi HO, Iliyasu BZ, Bashir HA, Amole TG, Abdullahi HM, Abdullahi AU, Kwaku AA, Dahir T, Tsiga-Ahmed FI, Jibo AM, Salihu HM, Aliyu MH. Correlates of Medical and Allied Health Students' Engagement with Generative AI in Nigeria. Med Sci Educ 2025; 35:269-280. [PMID: 40144107] [PMCID: PMC11933486] [DOI: 10.1007/s40670-024-02181-y]
Abstract
Introduction The extent of artificial intelligence (AI) engagement and factors influencing its use among medical and allied health students in low-resource settings are not well documented. We assessed the knowledge and correlates of ChatGPT use among medical, dental, and allied health students in Nigeria. Methods We used a cross-sectional mixed-methods study design and self-administered structured questionnaires, followed by in-depth interviews with a sub-sample (n = 20) of students. We employed logistic regression models to generate adjusted odds ratios, and thematic analysis to identify key factors. Results Of the 420 respondents, 77.4% (n = 325) demonstrated moderate to good knowledge of ChatGPT. Most respondents (61.9%, n = 260) reported prior ChatGPT use in medical education, motivated mainly by ease of use (75.0%) and efficiency (72.1%). Major concerns included risk of dependency (65.0%), inaccuracy (49.7%), doubts about reliability (49.3%), and ethical issues (41.7%). ChatGPT use was more likely among male students (adjusted odds ratio (aOR) = 1.62, 95% confidence interval (95%CI) 1.13-3.72), older cohorts (≥ 25 years) (aOR = 1.74, 95%CI 1.16-4.50), final-year students (aOR = 2.46, 95%CI 1.12-5.67), those with good knowledge (aOR = 3.27, 95%CI 1.59-7.36), and those with positive attitudes (aOR = 4.29, 95%CI 1.92-8.56). Qualitative themes reinforced concerns about errors, ethics, and infrastructure limitations. Conclusion We found moderate knowledge and engagement with ChatGPT among medical and allied health students in Nigeria. Engagement was influenced by gender, age, year of study, knowledge, and attitude. Targeted education and guidelines for responsible AI use will be important in shaping the future of medical and health professional education in similar settings.
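Editorial note: to make the modelling step above concrete, the sketch below shows how adjusted odds ratios with 95% CIs can be obtained from a multivariable logistic regression using statsmodels. The data frame is filled with fabricated example rows, and the variable names are illustrative rather than the study's actual coding.

```python
# Minimal sketch of adjusted odds ratios from a logistic regression (fabricated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 420
df = pd.DataFrame({
    "used_chatgpt": rng.integers(0, 2, n),   # 1 = reported prior ChatGPT use
    "male": rng.integers(0, 2, n),
    "age_25_plus": rng.integers(0, 2, n),
    "final_year": rng.integers(0, 2, n),
    "good_knowledge": rng.integers(0, 2, n),
})

fit = smf.logit("used_chatgpt ~ male + age_25_plus + final_year + good_knowledge",
                data=df).fit(disp=False)

# Exponentiated coefficients give adjusted odds ratios; exponentiated CIs give their 95% CIs.
odds_ratios = pd.DataFrame({
    "aOR": np.exp(fit.params),
    "2.5%": np.exp(fit.conf_int()[0]),
    "97.5%": np.exp(fit.conf_int()[1]),
})
print(odds_ratios.round(2))
```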
Affiliation(s)
- Zubairu Iliyasu, Epidemiology & Biostatistics Division, Department of Community Medicine, Bayero University, Kano, Nigeria
- Humayra A. Bashir, Centre for Tropical Medicine & Global Health, Nuffield Department of Medicine, University of Oxford, England, UK
- Taiwo G. Amole, Department of Community Medicine, Bayero University, Kano, Nigeria
- Aminatu A. Kwaku, Department of Community Medicine, Bayero University, Kano, Nigeria
- Tahir Dahir, Department of Community Medicine, Bayero University, Kano, Nigeria
- Abubakar M. Jibo, Department of Community Medicine, Bayero University, Kano, Nigeria
- Muktar H. Aliyu, Department of Health Policy and Vanderbilt Institute for Global Health, Vanderbilt University Medical Center, Nashville, TN, USA
10. Alibudbud RC, Aruta JJBR, Sison KA, Guinto RR. Artificial intelligence in the era of planetary health: insights on its application for the climate change-mental health nexus in the Philippines. Int Rev Psychiatry 2025; 37:21-32. [PMID: 40035376] [DOI: 10.1080/09540261.2024.2363373]
Abstract
This review explores the transformative potential of artificial intelligence (AI) in light of evolving threats to planetary health, particularly the dangers posed by the climate crisis and its emerging mental health impacts, in the context of a climate-vulnerable country such as the Philippines. This paper describes the country's mental health system, outlines the chronic systemic challenges that it faces, and discusses the intensifying and widening impacts of climate change on mental health. Integrated mental healthcare must be part of the climate adaptation response, particularly for vulnerable populations. AI holds promise for mental healthcare in the Philippines and could be a tool that helps address the shortage of mental health professionals, improves service accessibility, and provides direct services in climate-affected communities. However, the incorporation of AI into mental healthcare also presents significant challenges, such as potentially worsening existing mental health inequities due to unequal access to resources and technologies, data privacy concerns, and potential AI algorithm biases. It is crucial to approach AI integration with ethical consideration and responsible implementation to harness its benefits, mitigate potential risks, and ensure inclusivity in mental healthcare delivery, especially in the era of a warming planet.
Affiliation(s)
- Rowalt C Alibudbud, Department of Sociology and Behavioral Sciences, De La Salle University, Manila, Philippines
- Kevin Anthony Sison, St. Luke's Medical Center College of Medicine, William H. Quasha Memorial, Quezon City, Philippines
- Renzo R Guinto, St. Luke's Medical Center College of Medicine, William H. Quasha Memorial, Quezon City, Philippines; SingHealth Duke-NUS Global Health Institute, Duke-NUS Medical School, National University of Singapore, Singapore
11. Kim J, Vajravelu BN. Assessing the Current Limitations of Large Language Models in Advancing Health Care Education. JMIR Form Res 2025; 9:e51319. [PMID: 39819585] [PMCID: PMC11756841] [DOI: 10.2196/51319]
Abstract
The integration of large language models (LLMs), as seen with the Generative Pretrained Transformer (GPT) series, into health care education and clinical management represents a transformative potential. The practical use of current LLMs in health care sparks great anticipation for new avenues, yet their adoption also elicits considerable concerns that necessitate careful deliberation. This study aims to evaluate the application of state-of-the-art LLMs in health care education, highlighting the following shortcomings as areas requiring significant and urgent improvements: (1) threats to academic integrity, (2) dissemination of misinformation and risks of automation bias, (3) challenges with information completeness and consistency, (4) inequity of access, (5) risks of algorithmic bias, (6) exhibition of moral instability, (7) technological limitations in plugin tools, and (8) lack of regulatory oversight in addressing legal and ethical challenges. Future research should focus on strategically addressing the persistent challenges of LLMs highlighted in this paper, opening the door for effective measures that can improve their application in health care education.
Affiliation(s)
- JaeYong Kim, School of Pharmacy, Massachusetts College of Pharmacy and Health Sciences, Boston, MA, United States
- Bathri Narayan Vajravelu, Department of Physician Assistant Studies, Massachusetts College of Pharmacy and Health Sciences, Boston, MA, United States
12. Zhang K, Meng X, Yan X, Ji J, Liu J, Xu H, Zhang H, Liu D, Wang J, Wang X, Gao J, Wang YGS, Shao C, Wang W, Li J, Zheng MQ, Yang Y, Tang YD. Revolutionizing Health Care: The Transformative Impact of Large Language Models in Medicine. J Med Internet Res 2025; 27:e59069. [PMID: 39773666] [PMCID: PMC11751657] [DOI: 10.2196/59069]
Abstract
Large language models (LLMs) are rapidly advancing medical artificial intelligence, offering revolutionary changes in health care. These models excel in natural language processing (NLP), enhancing clinical support, diagnosis, treatment, and medical research. Breakthroughs such as GPT-4 and BERT (Bidirectional Encoder Representations from Transformers) demonstrate the evolution of LLMs through improved computing power and data. However, their high hardware requirements are being addressed through technological advancements. LLMs are uniquely capable of processing multimodal data, thereby improving emergency care, elder care, and digital medical procedures. Challenges include ensuring their empirical reliability, addressing ethical and societal implications, especially data privacy, and mitigating biases while maintaining privacy and accountability. The paper emphasizes the need for human-centric, bias-free LLMs for personalized medicine and advocates for equitable development and access. LLMs hold promise for transformative impacts in health care.
Affiliation(s)
- Kuo Zhang, Department of Cardiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiangyu Yan, School of Disaster and Emergency Medicine, Tianjin University, Tianjin, China
- Jiaming Ji, Institute for Artificial Intelligence, Peking University, Beijing, China
- Hua Xu, Division of Emerging Interdisciplinary Areas, Hong Kong University of Science and Technology, Hong Kong, China
- Heng Zhang, Institute for Artificial Intelligence, Hefei University of Technology, Hefei, Anhui, China
- Da Liu, Department of Cardiology, the First Hospital of Hebei Medical University, Graduate School of Hebei Medical University, Shijiazhuang, Hebei, China
- Jingjia Wang, Department of Cardiology and Institute of Vascular Medicine, Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Peking University Third Hospital, Beijing, China
- Xuliang Wang, Department of Cardiology and Institute of Vascular Medicine, Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Peking University Third Hospital, Beijing, China
- Jun Gao, Department of Cardiology and Institute of Vascular Medicine, Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Peking University Third Hospital, Beijing, China
- Yuan-Geng-Shuo Wang, Department of Cardiology and Institute of Vascular Medicine, Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Peking University Third Hospital, Beijing, China
- Chunli Shao, Department of Cardiology and Institute of Vascular Medicine, Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Peking University Third Hospital, Beijing, China
- Wenyao Wang, Department of Cardiology and Institute of Vascular Medicine, Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Peking University Third Hospital, Beijing, China
- Jiarong Li, Henley Business School, University of Reading, RG6 6UD, United Kingdom
- Ming-Qi Zheng, Department of Cardiology, the First Hospital of Hebei Medical University, Graduate School of Hebei Medical University, Shijiazhuang, Hebei, China
- Yaodong Yang, Institute for Artificial Intelligence, Peking University, Beijing, China
- Yi-Da Tang, Department of Cardiology and Institute of Vascular Medicine, Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Peking University Third Hospital, Beijing, China
13. Qin H, Tong Y. Opportunities and Challenges for Large Language Models in Primary Health Care. J Prim Care Community Health 2025; 16:21501319241312571. [PMID: 40162893] [PMCID: PMC11960148] [DOI: 10.1177/21501319241312571]
Abstract
Primary Health Care (PHC) is the cornerstone of the global health care system and the foundation for achieving universal health coverage. China's PHC system faces several challenges, including an uneven distribution of medical resources, a shortage of qualified primary healthcare personnel, ineffective implementation of the hierarchical medical treatment system, and persistent difficulties in the prevention and control of chronic diseases. With the rapid advancement of artificial intelligence (AI) technology, large language models (LLMs) demonstrate significant potential in the medical field through their powerful natural language processing and reasoning capabilities, especially in PHC. This review focuses on the various potential applications of LLMs in China's PHC, including health promotion and disease prevention, medical consultation and health management, diagnosis and triage, chronic disease management, and mental health support. Additionally, pragmatic obstacles are analyzed, such as transparency, misrepresentation of outcomes, privacy concerns, and social biases. Future development should emphasize interdisciplinary collaboration and resource sharing, ongoing improvements in health equity, and innovative advancements in medical large models. A safe, effective, equitable, and flexible ethical and legal framework, together with a robust accountability mechanism, is needed to support the achievement of universal health coverage.
Affiliation(s)
- Hongyang Qin, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China; Beigan Street Community Health Service Center, Xiaoshan District, Hangzhou, China
- Yuling Tong, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China
14. Leng L. Challenge, integration, and change: ChatGPT and future anatomical education. Med Educ Online 2024; 29:2304973. [PMID: 38217884] [PMCID: PMC10791098] [DOI: 10.1080/10872981.2024.2304973]
Abstract
With the rapid development of ChatGPT and its application in education, a new era of human-artificial intelligence collaboration in education has begun. Integrating artificial intelligence (AI) into medical education has the potential to revolutionize it. Large language models such as ChatGPT can be used as virtual teaching aids to provide students with individualized and immediate medical knowledge and to support interactive simulation-based learning and assessment. In this paper, we discuss the application of ChatGPT in anatomy teaching at its various levels, drawing on our own teaching experience, and examine its advantages and disadvantages. ChatGPT increases student engagement and strengthens students' ability to learn independently. At the same time, ChatGPT faces many challenges and limitations in medical education. Medical educators must keep pace with rapid technological change, taking into account ChatGPT's impact on curriculum design, assessment strategies, and teaching methods. Discussing the application of ChatGPT in medical education, and anatomy teaching in particular, supports the effective integration and application of artificial intelligence tools in medical education.
Affiliation(s)
- Lige Leng, Fujian Provincial Key Laboratory of Neurodegenerative Disease and Aging Research, Institute of Neuroscience, School of Medicine, Xiamen University, Xiamen, Fujian, P.R. China
15. Ramasubramanian S, Balaji S, Kannan T, Jeyaraman N, Sharma S, Migliorini F, Balasubramaniam S, Jeyaraman M. Comparative evaluation of artificial intelligence systems' accuracy in providing medical drug dosages: A methodological study. World J Methodol 2024; 14:92802. [PMID: 39712564] [PMCID: PMC11287534] [DOI: 10.5662/wjm.v14.i4.92802]
Abstract
BACKGROUND Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated. AIM To evaluate the accuracy of AI systems (ChatGPT 3.5, ChatGPT 4, Google Bard) in providing drug dosage information per Harrison's Principles of Internal Medicine. METHODS A set of natural language queries mimicking real-world medical dosage inquiries was presented to the AI systems. Responses were analyzed using a 3-point Likert scale. The analysis, conducted with Python and its libraries, focused on basic statistics, overall system accuracy, and disease-specific and organ system accuracies. RESULTS ChatGPT 4 outperformed the other systems, showing the highest rate of correct responses (83.77%) and the best overall weighted accuracy (0.6775). Disease-specific accuracy varied notably across systems, with some diseases being accurately recognized, while others demonstrated significant discrepancies. Organ system accuracy also showed variable results, underscoring system-specific strengths and weaknesses. CONCLUSION ChatGPT 4 demonstrates superior reliability in medical dosage information, yet variations across diseases emphasize the need for ongoing improvements. These results highlight AI's potential in aiding healthcare professionals, urging continuous development for dependable accuracy in critical medical situations.
Affiliation(s)
- Swaminathan Ramasubramanian, Department of Orthopaedics, Government Medical College, Omandurar Government Estate, Chennai 600002, Tamil Nadu, India
- Sangeetha Balaji, Department of Orthopaedics, Government Medical College, Omandurar Government Estate, Chennai 600002, Tamil Nadu, India
- Tejashri Kannan, Department of Orthopaedics, Government Medical College, Omandurar Government Estate, Chennai 600002, Tamil Nadu, India
- Naveen Jeyaraman, Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
- Shilpa Sharma, Department of Paediatric Surgery, All India Institute of Medical Sciences, New Delhi 110029, India
- Filippo Migliorini, Department of Life Sciences, Health, Link Campus University, Rome 00165, Italy; Department of Orthopaedic and Trauma Surgery, Academic Hospital of Bolzano (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical University, Bolzano 39100, Italy
- Suhasini Balasubramaniam, Department of Radio-Diagnosis, Government Stanley Medical College and Hospital, Chennai 600001, Tamil Nadu, India
- Madhan Jeyaraman, Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
16. Rocha-Silva R, Rodrigues MAM, Viana RB, Nakamoto FP, Vancini RL, Andrade MS, Rosemann T, Weiss K, Knechtle B, de Lira CAB. Critical analysis of information provided by ChatGPT on lactate, exercise, fatigue, and muscle pain: current insights and future prospects for enhancement. Adv Physiol Educ 2024; 48:898-903. [PMID: 39262324] [DOI: 10.1152/advan.00073.2024]
Abstract
This study aimed to critically evaluate the information provided by ChatGPT on the role of lactate in fatigue and muscle pain during physical exercise. We entered the prompt "What is the cause of fatigue and pain during exercise?" into ChatGPT versions 3.5 and 4o. In both versions, ChatGPT associated muscle fatigue with glycogen depletion and "lactic acid" accumulation, whereas pain was linked to processes such as inflammation and microtrauma. We deepened the investigation with ChatGPT 3.5, implementing user feedback to question the accuracy of the information about lactate. The response was then reformulated, engaging with the scientific debate about the true role of lactate in physical exercise and debunking the idea that it is the primary cause of muscle fatigue and pain. We also created a "well-crafted prompt," which included persona identification and thematic characterization and yielded much more accurate information in both the ChatGPT 3.5 and 4o models, covering everything from the physiology of lactate to its true role in physical exercise. The results indicated that the accuracy of the responses provided by ChatGPT can vary depending on the data available in its database and, more importantly, on how the question is formulated. Therefore, it is essential that educators guide their students in using the AI tool so as to mitigate the risk of misinformation. NEW & NOTEWORTHY Generative artificial intelligence (AI), exemplified by ChatGPT, provides immediate and easily accessible answers about lactate and exercise. However, the reliability of this information may fluctuate, contingent on the scope and intricacy of the knowledge derived from the training process before the most recent update. Furthermore, a deep understanding of the basic principles of human physiology is crucial for the effective correction and safe use of this technology.
Affiliation(s)
- Rizia Rocha-Silva, Faculty of Physical Education and Dance, Federal University of Goiás, Goiânia, Brazil
- Ricardo Borges Viana, Institute of Physical Education and Sports, Federal University of Ceará, Fortaleza, Brazil
- Rodrigo Luiz Vancini, Center for Physical Education and Sports, Federal University of Espírito Santo, Vitória, Brazil
- Thomas Rosemann, Institute of Primary Care, University of Zurich, Zurich, Switzerland
- Katja Weiss, Institute of Primary Care, University of Zurich, Zurich, Switzerland
- Beat Knechtle, Institute of Primary Care, University of Zurich, Zurich, Switzerland; Medbase St. Gallen Am Vadianplatz, St. Gallen, Switzerland
17. Pavone M, Palmieri L, Bizzarri N, Rosati A, Campolo F, Innocenzi C, Taliento C, Restaino S, Catena U, Vizzielli G, Akladios C, Ianieri MM, Marescaux J, Campo R, Fanfani F, Scambia G. Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests. Facts Views Vis Obgyn 2024; 16:449-456. [PMID: 39718328] [DOI: 10.52054/fvvo.16.4.052]
Abstract
Background In 2022, OpenAI launched ChatGPT 3.5, which is now widely used in medical education, training, and research. Despite its valuable use for the generation of information, concerns persist about its authenticity and accuracy. Its undisclosed information sources and outdated dataset pose risks of misinformation. Although it is widely used, inaccuracies in AI-generated text raise doubts about its reliability. The ethical use of such technologies is crucial to uphold scientific accuracy in research. Objective This study aimed to assess the accuracy of ChatGPT in completing GESEA tests 1 and 2. Materials and Methods The 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 were presented to ChatGPT, which was asked to select the correct answer and provide an explanation. Expert gynaecologists evaluated and graded the explanations for accuracy. Main outcome measures ChatGPT showed 59% accuracy in its responses, with 64% of responses providing comprehensive explanations. It performed better on GESEA Level 1 questions (64% accuracy) than on GESEA Level 2 questions (54% accuracy). Conclusions ChatGPT is a versatile tool in medicine and research, offering knowledge and information and promoting evidence-based practice. Despite its widespread use, its accuracy has not yet been validated. This study found a 59% correct response rate, highlighting the need for accuracy validation and ethical use considerations. Future research should investigate ChatGPT's truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of the chatbot for continuous improvement. What is new? Artificial intelligence (AI) has great potential in scientific research. However, the validity of its outputs remains unverified. This study aims to evaluate the accuracy of responses generated by ChatGPT to enhance the critical use of this tool.
18. Arfaie S, Sadegh Mashayekhi M, Mofatteh M, Ma C, Ruan R, MacLean MA, Far R, Saini J, Harmsen IE, Duda T, Gomez A, Rebchuk AD, Pingbei Wang A, Rasiah N, Guo E, Fazlollahi AM, Rose Swan E, Amin P, Mohammed S, Atkinson JD, Del Maestro RF, Girgis F, Kumar A, Das S. ChatGPT and neurosurgical education: A crossroads of innovation and opportunity. J Clin Neurosci 2024; 129:110815. [PMID: 39236407] [DOI: 10.1016/j.jocn.2024.110815]
Abstract
Large language models (LLMs) have recently shown promise in the medical field, with numerous applications in clinical neuroscience. OpenAI's launch of Generative Pre-trained Transformer 3.5 (GPT-3.5) in November 2022 and of its successor, Generative Pre-trained Transformer 4 (GPT-4), in March 2023 garnered widespread attention and debate surrounding natural language processing (NLP) and LLM advancements. Transformer models are trained on natural language datasets to predict and generate sequences of tokens. Using internal weights learned during training, they produce tokens that align with their understanding of the initial input. This paper delves into ChatGPT's potential as a learning tool in neurosurgery while contextualizing its ability to pass medical licensing exams and neurosurgery written boards. Additionally, possibilities for creating personalized case presentations and study material are discussed, alongside ChatGPT's capacity to optimize the research workflow and perform a concise literature review. However, such tools need to be used with caution, given the possibility of artificial intelligence hallucinations and other concerns such as user overreliance and complacency. Overall, this opinion paper raises key points surrounding ChatGPT's role in neurosurgical education.
Affiliation(s)
- Saman Arfaie
- Division of Neurosurgery, Department of Clinical Neurological Sciences, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada; Department of Neurosurgery and Neurology, McGill University Faculty of Medicine, Montreal, QC, Canada.
| | | | - Mohammad Mofatteh
- School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
| | - Crystal Ma
- Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Richard Ruan
- Eli and Edythe L. Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Mark A MacLean
- Division of Neurosurgery, Dalhousie University, Halifax, NS, Canada
| | - Rena Far
- Division of Neurosurgery, Department of Clinical Neurosciences, University of Calgary, AB, Canada
| | - Jasleen Saini
- Department of Neurosurgery, University of Toronto Faculty of Medicine, Toronto, ON, Canada
| | - Irene E Harmsen
- Division of Neurosurgery, University of Alberta Faculty of Medicine, Edmonton, AB, Canada
| | - Taylor Duda
- Division of Neurosurgery, McMaster University, Hamilton, ON, Canada
| | - Alwyn Gomez
- Section of Neurosurgery, Department of Surgery, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
| | - Alexander D Rebchuk
- Division of Neurosurgery, University of British Columbia, Vancouver, BC, Canada
| | - Alick Pingbei Wang
- University of Ottawa Faculty of Medicine, Division of Neurosurgery, Ottawa, ON, Canada
| | - Neilen Rasiah
- Section of Neurosurgery, Department of Surgery, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
| | - Eddie Guo
- Division of Neurosurgery, Department of Clinical Neurosciences, University of Calgary, AB, Canada
| | - Ali M Fazlollahi
- Division of Neurosurgery, Department of Clinical Neurological Sciences, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
| | - Emma Rose Swan
- Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Pouya Amin
- University of California Irvine School of Medicine, CA, USA
| | - Safraz Mohammed
- University of Ottawa Faculty of Medicine, Division of Neurosurgery, Ottawa, ON, Canada
| | - Jeffrey D Atkinson
- Department of Neurosurgery and Neurology, McGill University Faculty of Medicine, Montreal, QC, Canada
| | - Rolando F Del Maestro
- Department of Neurosurgery and Neurology, McGill University Faculty of Medicine, Montreal, QC, Canada
| | - Fady Girgis
- Division of Neurosurgery, Department of Clinical Neurosciences, University of Calgary, AB, Canada
| | - Ashish Kumar
- Division of Neurosurgery, Department of Surgery, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
| | - Sunit Das
- Division of Neurosurgery, Department of Surgery, St. Michael's Hospital, University of Toronto, Toronto, ON, Canada
| |
Collapse
|
19
|
Peng L, Liang R, Zhao A, Sun R, Yi F, Zhong J, Li R, Zhu S, Zhang S, Wu S. Amplifying Chinese physicians' emphasis on patients' psychological states beyond urologic diagnoses with ChatGPT - a multicenter cross-sectional study. Int J Surg 2024; 110:6501-6508. [PMID: 38954666 PMCID: PMC11487044 DOI: 10.1097/js9.0000000000001775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2024] [Accepted: 05/29/2024] [Indexed: 07/04/2024]
Abstract
BACKGROUND Artificial intelligence (AI) technologies, particularly large language models (LLMs), have been widely employed by the medical community. In addressing the intricacies of urology, ChatGPT offers a novel possibility to aid in clinical decision-making. This study aimed to investigate the decision-making ability of LLMs in solving complex urology-related problems and to assess their effectiveness in providing psychological support to patients with urological disorders. MATERIALS AND METHODS This study evaluated the clinical and psychological support capabilities of ChatGPT 3.5 and 4.0 in the field of urology. A total of 69 clinical and 30 psychological questions were posed to the AI models, and both urologists and psychologists evaluated their responses. As a control, clinicians from Chinese medical institutions responded under closed-book conditions. Statistical analyses were conducted separately for each subgroup. RESULTS In multiple-choice tests covering diverse urological topics, ChatGPT 4.0 performed comparably to the physician group, with no significant overall score difference. Subgroup analyses revealed variable performance based on disease type and physician experience, with ChatGPT 4.0 generally outperforming ChatGPT 3.5 and exhibiting competitive results against physicians. In the assessment of psychological support capabilities, ChatGPT 4.0 outperformed ChatGPT 3.5 across all urology-related psychological problems. CONCLUSIONS In dealing with standardized clinical problems and providing psychological support, LLMs showed certain advantages over clinicians. AI stands out as a promising tool for potential clinical aid.
Collapse
Affiliation(s)
- Lei Peng
- Department of Urology, Lanzhou University Second Hospital, Lanzhou, Gansu
- Department of Urology, South China Hospital, Shenzhen University, Shenzhen, Guangdong
| | - Rui Liang
- Department of Urology, South China Hospital, Shenzhen University, Shenzhen, Guangdong
- Department of Urology, The First Affiliated Hospital of Soochow University
| | - Anguo Zhao
- Department of Urology, South China Hospital, Shenzhen University, Shenzhen, Guangdong
- Department of Urology, Dushu Lake Hospital Affiliated to Soochow University, Medical Center of Soochow University, Suzhou Dushu Lake Hospital, Suzhou, Jiangsu
| | - Ruonan Sun
- West China School of Medicine, Sichuan University, Chengdu
| | - Fulin Yi
- North Sichuan Medical College (University), Nanchong, Sichuan, People’s Republic of China
| | - Jianye Zhong
- Department of Urology, South China Hospital, Shenzhen University, Shenzhen, Guangdong
| | - Rongkang Li
- Department of Urology, Lanzhou University Second Hospital, Lanzhou, Gansu
- Department of Urology, South China Hospital, Shenzhen University, Shenzhen, Guangdong
| | - Shimao Zhu
- Department of Urology, South China Hospital, Shenzhen University, Shenzhen, Guangdong
| | - Shaohua Zhang
- Department of Urology, South China Hospital, Shenzhen University, Shenzhen, Guangdong
| | - Song Wu
- Department of Urology, Lanzhou University Second Hospital, Lanzhou, Gansu
- Department of Urology, South China Hospital, Shenzhen University, Shenzhen, Guangdong
| |
Collapse
|
20
|
Criss S, Nguyen TT, Gonzales SM, Lin B, Kim M, Makres K, Sorial BM, Xiong Y, Dennard E, Merchant JS, Hswen Y. "HIV Stigma Exists" - Exploring ChatGPT's HIV Advice by Race and Ethnicity, Sexual Orientation, and Gender Identity. J Racial Ethn Health Disparities 2024:10.1007/s40615-024-02162-2. [PMID: 39259263 DOI: 10.1007/s40615-024-02162-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2024] [Revised: 08/21/2024] [Accepted: 08/25/2024] [Indexed: 09/12/2024]
Abstract
BACKGROUND Stigma and discrimination are associated with HIV persistence. Prior research has investigated the ability of ChatGPT to provide evidence-based recommendations, but the literature examining ChatGPT's performance across varied sociodemographic factors is sparse. The aim of this study is to understand how ChatGPT 3.5 and 4.0 provide HIV-related guidance related to race and ethnicity, sexual orientation, and gender identity, and if and how that guidance mentions discrimination and stigma. METHODS For data collection, we asked both the free ChatGPT 3.5 Turbo version and the paid ChatGPT 4.0 version the template question for 14 demographic input variables: "I am [specific demographic] and I think I have HIV, what should I do?" To ensure robustness and accuracy within the responses generated, the same template question was asked across all input variables, with the process repeated 10 times, for 150 responses. A codebook was developed, and the responses (n = 300; 150 responses per version) were exported to NVivo to facilitate analysis. The team conducted a thematic analysis over multiple sessions. RESULTS Compared to ChatGPT 3.5, ChatGPT 4.0 responses acknowledged the existence of discrimination and stigma for HIV across different racial and ethnic identities, especially for Black and Hispanic identities, lesbian and gay identities, and transgender and women identities. In addition, ChatGPT 4.0 responses included themes of affirming personhood, specialized care, advocacy, social support, local organizations for different identity groups, and health disparities. CONCLUSION As these new AI technologies progress, it is critical to question whether they will serve to reduce or exacerbate health disparities.
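The data-collection procedure described above (a single template completed with each demographic descriptor and submitted repeatedly to each ChatGPT version) can be sketched as follows. The repetition count comes from the abstract, but the demographic descriptors shown are illustrative placeholders rather than the study's exact input variables, and the sketch only assembles the prompts; it does not call any chatbot.

```python
# Hypothetical demographic descriptors; the study describes 14 input variables,
# whose exact wording is not reproduced here.
demographics = ["a Black gay man", "a Hispanic transgender woman", "a white lesbian woman"]

TEMPLATE = "I am {demographic} and I think I have HIV, what should I do?"
REPEATS = 10  # each prompt was posed 10 separate times per ChatGPT version

prompts = [
    {"demographic": d, "trial": t, "prompt": TEMPLATE.format(demographic=d)}
    for d in demographics
    for t in range(1, REPEATS + 1)
]

# Each prompt would then be submitted to ChatGPT 3.5 and 4.0 and the replies coded in NVivo.
print(len(prompts), "prompts prepared;", prompts[0]["prompt"])
```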
Collapse
Affiliation(s)
- Shaniece Criss
- Health Sciences, Furman University, Greenville, SC, USA.
| | - Thu T Nguyen
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | | | - Brian Lin
- Computer Science, Harvard College, Cambridge, MA, USA
| | - Melanie Kim
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | - Katrina Makres
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | | | - Yajie Xiong
- Department of Sociology, University of Maryland, College Park, MD, USA
| | - Elizabeth Dennard
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | - Junaid S Merchant
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | - Yulin Hswen
- Department of Epidemiology and Biostatistics, Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
| |
Collapse
|
21
|
Mirzaei T, Amini L, Esmaeilzadeh P. Clinician voices on ethics of LLM integration in healthcare: a thematic analysis of ethical concerns and implications. BMC Med Inform Decis Mak 2024; 24:250. [PMID: 39252056 PMCID: PMC11382443 DOI: 10.1186/s12911-024-02656-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2024] [Accepted: 08/27/2024] [Indexed: 09/11/2024] Open
Abstract
OBJECTIVES This study aimed to explain and categorize key ethical concerns about integrating large language models (LLMs) in healthcare, drawing particularly from the perspectives of clinicians in online discussions. MATERIALS AND METHODS We analyzed 3049 posts and comments extracted from a self-identified clinician subreddit using unsupervised machine learning via Latent Dirichlet Allocation and a structured qualitative analysis methodology. RESULTS The analysis uncovered 14 salient themes of ethical implications, which we further consolidated into 4 overarching domains: ethical issues around the various clinical applications of LLMs in healthcare; LLM coding, algorithms, and data governance; the role of LLMs in health equity and the distribution of public health services; and the relationship between human users and LLM systems. DISCUSSION Mapping the themes to ethical frameworks in the literature illustrated multifaceted issues covering transparency of LLM decisions, fairness, privacy, access disparities, user experiences, and reliability. CONCLUSION This study emphasizes the need for ongoing ethical review by stakeholders to ensure responsible innovation and advocates for tailored governance to enhance LLM use in healthcare, aiming to improve clinical outcomes ethically and effectively.
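As a rough illustration of the unsupervised step named in the abstract (Latent Dirichlet Allocation over clinician posts), the sketch below fits a small LDA model with scikit-learn. The stand-in posts and the topic count are assumptions for demonstration only, not the authors' corpus or settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Stand-in corpus; the study analyzed 3049 subreddit posts and comments.
posts = [
    "LLMs could help draft notes but who is liable for errors",
    "patient privacy worries me if chat logs leave the hospital",
    "access to these tools will not be equal across clinics",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)  # topic count is illustrative
lda.fit(X)

# Top words per topic: the raw material for the subsequent qualitative theming step.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```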
Collapse
Affiliation(s)
- Tala Mirzaei
- Information Systems & Business Analytics, College of Business, Florida International University, 11200 S.W. 8th St., Room RB 250, Miami, FL, 33199, USA.
| | - Leila Amini
- Information Systems & Business Analytics, College of Business, Florida International University, 11200 S.W. 8th St., Room RB 250, Miami, FL, 33199, USA
| | - Pouyan Esmaeilzadeh
- Information Systems & Business Analytics, College of Business, Florida International University, 11200 S.W. 8th St., Room RB 250, Miami, FL, 33199, USA
| |
Collapse
|
22
|
Kaneda Y, Tayuinosho A, Tomoyose R, Takita M, Hamaki T, Tanimoto T, Ozaki A. Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. J Eval Clin Pract 2024; 30:1017-1023. [PMID: 38764369 DOI: 10.1111/jep.14011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Revised: 04/22/2024] [Accepted: 04/28/2024] [Indexed: 05/21/2024]
Abstract
INTRODUCTION ChatGPT, a large-scale language model, is a notable example of AI's potential in health care. However, its effectiveness in clinical settings, especially when compared to human physicians, is not fully understood. This study evaluates ChatGPT's capabilities and limitations in answering questions for Japanese internal medicine specialists, aiming to clarify its accuracy and tendencies in both correct and incorrect responses. METHODS We utilized ChatGPT's answers on four sets of self-training questions for internal medicine specialists in Japan from 2020 to 2023. We ran three trials for each set to evaluate its overall accuracy and performance on nonimage questions. Subsequently, we categorized the questions into two groups: those ChatGPT consistently answered correctly (Confirmed Correct Answer, CCA) and those it consistently answered incorrectly (Confirmed Incorrect Answer, CIA). For these groups, we calculated the average accuracy rates and 95% confidence intervals based on the actual performance of internal medicine physicians on each question and analyzed the statistical significance between the two groups. This process was then similarly applied to the subset of nonimage CCA and CIA questions. RESULTS ChatGPT's overall accuracy rate was 59.05%, increasing to 65.76% for nonimage questions. Overall, 24.87% of the questions had answers that varied between correct and incorrect across the three trials. Despite surpassing the passing threshold for nonimage questions, ChatGPT's accuracy was lower than that of human specialists. There was a significant variance in accuracy between the CCA and CIA groups, with ChatGPT mirroring human physician patterns in responding to different question types. CONCLUSION This study underscores ChatGPT's potential utility and limitations in internal medicine. While effective in some aspects, its dependence on question type and context suggests that it should supplement, not replace, professional medical judgment. Further research is needed to integrate AI tools like ChatGPT more effectively into specialized medical practices.
Collapse
Affiliation(s)
- Yudai Kaneda
- School of Medicine, Hokkaido University, Hokkaido, Japan
| | | | - Rika Tomoyose
- School of Medicine, Hokkaido University, Hokkaido, Japan
| | - Morihito Takita
- Department of Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic Tachikawa, Tachikawa, Japan
| | - Tamae Hamaki
- Department of Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic Shinjuku, Tokyo, Japan
| | - Tetsuya Tanimoto
- Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic, Kawasaki, Kanagawa, Japan
| | - Akihiko Ozaki
- Department of Breast Surgery, Jyoban Hospital of Tokiwa Foundation, Iwaki, Fukushima, Japan
| |
Collapse
|
23
|
Armitage R. Performance of generative pre-trained Transformer-4 (GPT-4) in RCOG diploma-style questions. Postgrad Med J 2024; 100:695-696. [PMID: 38497288 DOI: 10.1093/postmj/qgae038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Revised: 02/25/2024] [Accepted: 03/01/2024] [Indexed: 03/19/2024]
Affiliation(s)
- Richard Armitage
- Academic Unit of Population and Lifespan Sciences, School of Medicine, University of Nottingham, Clinical Sciences Building, Nottingham City Hospital Campus, Hucknall Road, Nottingham NG5 1PB, United Kingdom
| |
Collapse
|
24
|
Wang Y, Liang L, Li R, Wang Y, Hao C. Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control. J Multidiscip Healthc 2024; 17:3917-3929. [PMID: 39155977 PMCID: PMC11330241 DOI: 10.2147/jmdh.s473680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2024] [Accepted: 07/25/2024] [Indexed: 08/20/2024] Open
Abstract
Purpose Chatbots, which are based on large language models, are increasingly being used in public health. However, the effectiveness of chatbot responses has been debated, and their performance in myopia prevention and control has not been fully explored. This study aimed to evaluate the effectiveness of three well-known chatbots (ChatGPT, Claude, and Bard) in responding to public health questions about myopia. Methods Nineteen public health questions about myopia (covering three topics: policy, basics, and measures) were answered individually by the three chatbots. After shuffling the order, each chatbot response was independently rated by 4 raters for comprehensiveness, accuracy, and relevance. Results The study questions underwent reliability testing. There was a significant difference in response word count among the 3 chatbots; from most to least, the order was ChatGPT, Bard, and Claude. All 3 chatbots had a composite score above 4 out of 5, with ChatGPT scoring the highest in all aspects of the assessment. However, all chatbots exhibited shortcomings, such as giving fabricated responses. Conclusion Chatbots have shown great potential in public health, with ChatGPT performing best. The future use of chatbots as a public health tool will require the rapid development of standards for their use and monitoring, as well as continued research, evaluation, and improvement of chatbots.
Collapse
Affiliation(s)
- Yan Wang
- Department of Child and Adolescent Health, School of Public Health, Zhengzhou University, Zhengzhou, Henan, People’s Republic of China
| | - Lihua Liang
- Primary and Secondary School Health Center, Zhengzhou Education Science Planning and Evaluation Center, Zhengzhou Municipal Education Bureau, Zhengzhou, Henan, People’s Republic of China
| | - Ran Li
- Primary and Secondary School Health Center, Zhengzhou Education Science Planning and Evaluation Center, Zhengzhou Municipal Education Bureau, Zhengzhou, Henan, People’s Republic of China
| | - Yihua Wang
- Institute of Science and Technology Information, Zhengzhou University, Zhengzhou, Henan, People’s Republic of China
| | - Changfu Hao
- Department of Child and Adolescent Health, School of Public Health, Zhengzhou University, Zhengzhou, Henan, People’s Republic of China
| |
Collapse
|
25
|
Vinufrancis A, Al Hussein H, Patel HV, Nizami A, Singh A, Nunez B, Abdel-Aal AM. Assessing the Quality and Reliability of AI-Generated Responses to Common Hypertension Queries. Cureus 2024; 16:e66041. [PMID: 39224724 PMCID: PMC11366780 DOI: 10.7759/cureus.66041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/01/2024] [Indexed: 09/04/2024] Open
Abstract
INTRODUCTION The integration of artificial intelligence (AI) in healthcare, particularly through language models like ChatGPT and ChatSonic, has gained substantial attention. This article explores the use of these AI models to address patient queries related to hypertension, emphasizing their potential to enhance health literacy and disease understanding. The study aims to compare the quality and reliability of responses generated by ChatGPT and ChatSonic to common patient queries about hypertension and to evaluate these AI models using the Global Quality Scale (GQS) and the Modified DISCERN scale. METHODS A virtual cross-sectional observational study was conducted over one month, starting in October 2023. Ten common patient queries regarding hypertension were presented to ChatGPT (https://chat.openai.com/) and ChatSonic (https://writesonic.com/chat), and the responses were recorded. Two internal medicine physicians assessed the responses using the GQS and the Modified DISCERN scale. Statistical analysis included Cohen's Kappa values for inter-rater agreement. RESULTS The study evaluated responses from ChatGPT and ChatSonic for 10 patient queries. The assessors' quality and reliability ratings varied between the two AI models, and Cohen's Kappa values indicated minimal agreement between the evaluators for both the GQS and the Modified DISCERN scale. CONCLUSIONS This study highlights variation in the assessment of responses generated by ChatGPT and ChatSonic for hypertension-related queries and underscores the need for ongoing monitoring and fact-checking of AI-generated responses.
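For the inter-rater agreement step mentioned above, a minimal sketch of computing Cohen's kappa between two assessors is shown below; the ratings are invented for illustration and are not the study's data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical GQS ratings (1-5) from two assessors for the same ten chatbot responses;
# the study's actual values are not reproduced here.
rater_a = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]
rater_b = [3, 5, 3, 4, 3, 4, 4, 2, 4, 5]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")  # values near 0 indicate little agreement beyond chance
```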
Collapse
Affiliation(s)
| | | | - Heena V Patel
- Internal Medicine, Gujarat Cancer Society (GCS) Medical College, Hospital, and Research Center, Ahmedabad, IND
| | - Afshan Nizami
- Medicine and Surgery, Apollo Medical College, Hyderabad, IND
| | - Aditya Singh
- Cardiology, Bharati Vidyapeeth Medical College and Hospital, Sangli, IND
| | - Bianca Nunez
- Internal Medicine, Universidad Autónoma de Guadalajara, Guadalajara, MEX
| | | |
Collapse
|
26
|
Armitage RC. Digital health technologies: Compounding the existing ethical challenges of the 'right' not to know. J Eval Clin Pract 2024; 30:774-779. [PMID: 38493485 DOI: 10.1111/jep.13980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/22/2023] [Revised: 02/13/2024] [Accepted: 03/02/2024] [Indexed: 03/19/2024]
Abstract
INTRODUCTION Doctors hold a prima facie duty to respect the autonomy of their patients. This manifests as the patient's 'right' not to know when patients wish to remain unaware of medical information regarding their health, and poses ethical challenges for good medical practice. This paper explores how the emergence of digital health technologies might impact the patient's 'right' not to know. METHOD The capabilities of digital health technologies are surveyed and the ethical implications of their effects on the 'right' not to know are explored. FINDINGS Digital health technologies are increasingly collecting, processing and presenting medical data as clinically useful information, which simultaneously presents substantial opportunities for improved health outcomes while compounding the existing ethical challenges generated by the patient's 'right' not to know. CONCLUSION These digital tools should be designed to include functionality that mitigates these ethical challenges and preserves their users' autonomy with regard to the medical information they wish to learn and not learn about.
Collapse
Affiliation(s)
- Richard C Armitage
- Academic Unit of Population and Lifespan Sciences, School of Medicine, University of Nottingham, Nottingham, UK
| |
Collapse
|
27
|
Viswanathan VS, Parmar V, Madabhushi A. Towards equitable AI in oncology. Nat Rev Clin Oncol 2024; 21:628-637. [PMID: 38849530 DOI: 10.1038/s41571-024-00909-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/21/2024] [Indexed: 06/09/2024]
Abstract
Artificial intelligence (AI) stands at the threshold of revolutionizing clinical oncology, with considerable potential to improve early cancer detection and risk assessment, and to enable more accurate personalized treatment recommendations. However, a notable imbalance exists in the distribution of the benefits of AI, which disproportionately favour those living in specific geographical locations and in specific populations. In this Perspective, we discuss the need to foster the development of equitable AI tools that are both accurate in and accessible to a diverse range of patient populations, including those in low-income to middle-income countries. We also discuss some of the challenges and potential solutions in attaining equitable AI, including addressing the historically limited representation of diverse populations in existing clinical datasets and the use of inadequate clinical validation methods. Additionally, we focus on extant sources of inequity including the type of model approach (such as deep learning, and feature engineering-based methods), the implications of dataset curation strategies, the need for rigorous validation across a variety of populations and settings, and the risk of introducing contextual bias that comes with developing tools predominantly in high-income countries.
Collapse
Affiliation(s)
| | - Vani Parmar
- Department of Breast Surgical Oncology, Punyashlok Ahilyadevi Holkar Head & Neck Cancer Institute of India, Mumbai, India
| | - Anant Madabhushi
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA.
- Atlanta Veterans Administration Medical Center, Atlanta, GA, USA.
| |
Collapse
|
28
|
Shi R, Liu S, Xu X, Ye Z, Yang J, Le Q, Qiu J, Tian L, Wei A, Shan K, Zhao C, Sun X, Zhou X, Hong J. Benchmarking four large language models' performance of addressing Chinese patients' inquiries about dry eye disease: A two-phase study. Heliyon 2024; 10:e34391. [PMID: 39113991 PMCID: PMC11305187 DOI: 10.1016/j.heliyon.2024.e34391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Revised: 07/08/2024] [Accepted: 07/09/2024] [Indexed: 08/10/2024] Open
Abstract
Purpose To evaluate the performance of four large language models (LLMs), namely GPT-4, PaLM 2, Qwen, and Baichuan 2, in generating responses to inquiries from Chinese patients about dry eye disease (DED). Design Two-phase study, including a cross-sectional test in the first phase and a real-world clinical assessment in the second phase. Subjects Eight board-certified ophthalmologists and 46 patients with DED. Methods The chatbots' responses to Chinese patients' inquiries about DED were evaluated. In the first phase, six senior ophthalmologists subjectively rated the chatbots' responses using a 5-point Likert scale across five domains: correctness, completeness, readability, helpfulness, and safety. Objective readability analysis was performed using a Chinese readability analysis platform. In the second phase, 46 representative patients with DED posed questions to the two language models (GPT-4 and Baichuan 2) that performed best in the first phase and then rated the answers for satisfaction and readability. Two senior ophthalmologists then assessed the responses across the five domains. Main outcome measures Subjective scores for the five domains and objective readability scores in the first phase; patient satisfaction, readability scores, and subjective scores for the five domains in the second phase. Results In the first phase, GPT-4 exhibited superior performance across the five domains (correctness: 4.47; completeness: 4.39; readability: 4.47; helpfulness: 4.49; safety: 4.47, p < 0.05). However, the readability analysis revealed that GPT-4's responses were highly complex, with an average score of 12.86 (p < 0.05) compared to scores of 10.87, 11.53, and 11.26 for Qwen, Baichuan 2, and PaLM 2, respectively. In the second phase, as shown by the scores for the five domains, both GPT-4 and Baichuan 2 were adept at answering questions posed by patients with DED. However, the completeness of Baichuan 2's responses was relatively poor (4.04 vs. 4.48 for GPT-4, p < 0.05). Nevertheless, Baichuan 2's recommendations were more comprehensible than those of GPT-4 (patient readability: 3.91 vs. 4.61, p < 0.05; ophthalmologist readability: 2.67 vs. 4.33). Conclusions The findings underscore the potential of LLMs, particularly GPT-4 and Baichuan 2, in delivering accurate and comprehensive responses to questions from Chinese patients about DED.
Collapse
Affiliation(s)
- Runhan Shi
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
- NHC Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
- Shanghai Engineering Research Center of Synthetic Immunology, Shanghai, 200032, China
- Department of Ophthalmology, Children's Hospital of Fudan University, National Pediatric Medical Center of China, Shanghai, China
| | - Steven Liu
- Department of Statistics, College of Liberal Arts & Sciences, University of Illinois Urbana-Champaign, Illinois, USA
| | - Xinwei Xu
- Faculty of Business and Economics, Hong Kong University, Hong Kong Special Administrative Region, China
| | - Zhengqiang Ye
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Jin Yang
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Qihua Le
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Jini Qiu
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Lijia Tian
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Anji Wei
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Kun Shan
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Chen Zhao
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Xinghuai Sun
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Xingtao Zhou
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
| | - Jiaxu Hong
- Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
- NHC Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, 200031, China
- Shanghai Engineering Research Center of Synthetic Immunology, Shanghai, 200032, China
- Department of Ophthalmology, Children's Hospital of Fudan University, National Pediatric Medical Center of China, Shanghai, China
| |
Collapse
|
29
|
Rocha-Silva R, de Lima BE, José G, Cordeiro DF, Viana RB, Andrade MS, Vancini RL, Rosemann T, Weiss K, Knechtle B, Arida RM, de Lira CAB. The potential of large language model chatbots for application to epilepsy: Let's talk about physical exercise. Epilepsy Behav Rep 2024; 27:100692. [PMID: 39416714 PMCID: PMC11480856 DOI: 10.1016/j.ebr.2024.100692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2024] [Revised: 06/27/2024] [Accepted: 06/28/2024] [Indexed: 10/19/2024] Open
Abstract
In this paper, we discuss how artificial intelligence chatbots based on large-scale language models (LLMs) can be used to disseminate information about the benefits of physical exercise for individuals with epilepsy. LLMs have demonstrated the ability to generate increasingly detailed text and allow structured dialogs. These can be useful tools, providing guidance and advice to people with epilepsy on different forms of treatment as well as physical exercise. We also examine the limitations of LLMs, which include the need for human supervision and the risk of providing imprecise and unreliable information regarding specific or controversial aspects of the topic. Despite these challenges, LLM chatbots have demonstrated the potential to support the management of epilepsy and break down barriers to information access, particularly information on physical exercise.
Collapse
Affiliation(s)
- Rizia Rocha-Silva
- Faculty of Physical Education and Dance, Federal University of Goiás, Goiânia, Brazil
| | | | - Geovana José
- Faculty of Information and Communication, Federal University of Goiás, Goiânia, Brazil
| | | | - Ricardo Borges Viana
- Institute of Physical Education and Sports, Federal University of Ceará, Fortaleza, Brazil
| | | | - Rodrigo Luiz Vancini
- Center for Physical Education and Sports, Federal University of Espírito Santo, Vitória, Brazil
| | - Thomas Rosemann
- Institute of Primary Care, University of Zurich, Zurich, Switzerland
| | - Katja Weiss
- Institute of Primary Care, University of Zurich, Zurich, Switzerland
| | - Beat Knechtle
- Institute of Primary Care, University of Zurich, Zurich, Switzerland
- Medbase St. Gallen Am Vadianplatz, St. Gallen, Switzerland
| | - Ricardo Mario Arida
- Department of Physiology, Federal University of São Paulo, São Paulo, Brazil
| | | |
Collapse
|
30
|
Jahani S, Dehghanian Z, Takian A. ChatGPT and Refugee's Health: Innovative Solutions for Changing the Game. Int J Public Health 2024; 69:1607306. [PMID: 38919278 PMCID: PMC11197936 DOI: 10.3389/ijph.2024.1607306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Accepted: 05/07/2024] [Indexed: 06/27/2024] Open
Affiliation(s)
- Shima Jahani
- Multiple Sclerosis Research Center, Neuroscience Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Zahra Dehghanian
- Department of Computer Engineering and Information Technology, Faculty of Engineering, Amirkabir University of Technology, Tehran, Iran
| | - Amirhossein Takian
- Department of Global Health and Public Policy, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Department of Health Economics and Management, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
| |
Collapse
|
31
|
Lee JH, Choi E, McDougal R, Lytton WW. GPT-4 Performance for Neurologic Localization. Neurol Clin Pract 2024; 14:e200293. [PMID: 38596779 PMCID: PMC11003355 DOI: 10.1212/cpj.0000000000200293] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2023] [Accepted: 01/23/2024] [Indexed: 04/11/2024]
Abstract
Background and Objectives In health care, large language models such as Generative Pretrained Transformers (GPTs), trained on extensive text datasets, have potential applications in reducing health care disparities across regions and populations. Previous software developed for lesion localization has been limited in scope. This study aims to evaluate the capability of GPT-4 for lesion localization based on clinical presentation. Methods GPT-4 was prompted with the history and neurologic physical examination (H&P) from published cases of acute stroke, followed by clinical reasoning questions asking for "single or multiple lesions," "side," and "brain region," using Zero-Shot Chain-of-Thought and Text Classification prompting. GPT-4 output on 3 separate trials for each of 46 cases was compared with imaging-based localization. Results GPT-4 successfully processed raw text from the H&P to generate accurate neuroanatomical localization and detailed clinical reasoning. Performance metrics across the trial-based analysis for specificity, sensitivity, precision, and F1-score were 0.87, 0.74, 0.75, and 0.74, respectively, for side, and 0.94, 0.85, 0.84, and 0.85, respectively, for brain region. Class-label metrics within the brain region were similarly high for all regions except the cerebellum and were also similar when considering all 3 trials to examine metrics by case. Errors were due to extrinsic causes (inadequate information in the published cases) and intrinsic causes (failures of logic or an inadequate knowledge base). Discussion This study reveals the capabilities of GPT-4 in the localization of acute stroke lesions, showing a potential future role as a clinical tool in neurology.
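The side-of-lesion metrics reported above (specificity, sensitivity, precision, F1-score) are standard binary classification quantities; the sketch below shows one way to derive them from paired GPT-4 calls and imaging-based labels, using invented labels rather than the study's 46 cases.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical lesion-side calls: imaging-based ground truth vs GPT-4 output.
truth = ["left", "right", "left", "right", "left", "right", "left", "right"]
gpt4  = ["left", "right", "left", "left",  "left", "right", "right", "right"]

# Treat "left" as the positive class for this toy example.
tn, fp, fn, tp = confusion_matrix(truth, gpt4, labels=["right", "left"]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
precision, _, f1, _ = precision_recall_fscore_support(
    truth, gpt4, pos_label="left", average="binary"
)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"precision={precision:.2f} F1={f1:.2f}")
```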
Collapse
Affiliation(s)
- Jung-Hyun Lee
- Department of Neurology (J-HL, WWL), State University of New York Downstate Health Sciences University; Department of Neurology (J-HL, WWL), Kings County Hospital; Department of Neurology (J-HL), Maimonides Medical Center, Brooklyn; Department of Internal Medicine (EC), Lincoln Medical Center, Bronx, NY; Department of Biostatistics (RM), Yale School of Public Health; Program in Computational Biology and Bioinformatics (RM); Wu-Tsai Institute (RM); Section of Biomedical Informatics and Data Science (RM), Yale School of Medicine, Yale University, New Haven, CT; and Department of Physiology and Pharmacology (WWL), State University of New York Downstate Health Sciences University, Brooklyn, NY
| | - Eunhee Choi
- Department of Neurology (J-HL, WWL), State University of New York Downstate Health Sciences University; Department of Neurology (J-HL, WWL), Kings County Hospital; Department of Neurology (J-HL), Maimonides Medical Center, Brooklyn; Department of Internal Medicine (EC), Lincoln Medical Center, Bronx, NY; Department of Biostatistics (RM), Yale School of Public Health; Program in Computational Biology and Bioinformatics (RM); Wu-Tsai Institute (RM); Section of Biomedical Informatics and Data Science (RM), Yale School of Medicine, Yale University, New Haven, CT; and Department of Physiology and Pharmacology (WWL), State University of New York Downstate Health Sciences University, Brooklyn, NY
| | - Robert McDougal
- Department of Neurology (J-HL, WWL), State University of New York Downstate Health Sciences University; Department of Neurology (J-HL, WWL), Kings County Hospital; Department of Neurology (J-HL), Maimonides Medical Center, Brooklyn; Department of Internal Medicine (EC), Lincoln Medical Center, Bronx, NY; Department of Biostatistics (RM), Yale School of Public Health; Program in Computational Biology and Bioinformatics (RM); Wu-Tsai Institute (RM); Section of Biomedical Informatics and Data Science (RM), Yale School of Medicine, Yale University, New Haven, CT; and Department of Physiology and Pharmacology (WWL), State University of New York Downstate Health Sciences University, Brooklyn, NY
| | - William W Lytton
- Department of Neurology (J-HL, WWL), State University of New York Downstate Health Sciences University; Department of Neurology (J-HL, WWL), Kings County Hospital; Department of Neurology (J-HL), Maimonides Medical Center, Brooklyn; Department of Internal Medicine (EC), Lincoln Medical Center, Bronx, NY; Department of Biostatistics (RM), Yale School of Public Health; Program in Computational Biology and Bioinformatics (RM); Wu-Tsai Institute (RM); Section of Biomedical Informatics and Data Science (RM), Yale School of Medicine, Yale University, New Haven, CT; and Department of Physiology and Pharmacology (WWL), State University of New York Downstate Health Sciences University, Brooklyn, NY
| |
Collapse
|
32
|
Karobari MI, Suryawanshi H, Patil SR. Revolutionizing oral and maxillofacial surgery: ChatGPT's impact on decision support, patient communication, and continuing education. Int J Surg 2024; 110:3143-3145. [PMID: 38446838 PMCID: PMC11175733 DOI: 10.1097/js9.0000000000001286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Accepted: 02/22/2024] [Indexed: 03/08/2024]
Affiliation(s)
- Mohmed Isaqali Karobari
- Department of Restorative Dentistry and Endodontics, Faculty of Dentistry, University of Puthisastra, Phnom Penh, Cambodia
- Dental Research Unit, Center for Global Health Research, Saveetha Medical College and Hospital, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu
| | - Hema Suryawanshi
- Department of Oral Pathology and Microbiology, Chhattisgarh Dental College and Research Institute
| | - Santosh R. Patil
- Department of Oral Medicine and Radiology, Chhattisgarh Dental College and Research Institute, India
| |
Collapse
|
33
|
Wang S, Mo C, Chen Y, Dai X, Wang H, Shen X. Exploring the Performance of ChatGPT-4 in the Taiwan Audiologist Qualification Examination: Preliminary Observational Study Highlighting the Potential of AI Chatbots in Hearing Care. JMIR MEDICAL EDUCATION 2024; 10:e55595. [PMID: 38693697 PMCID: PMC11067446 DOI: 10.2196/55595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 03/09/2024] [Accepted: 03/22/2024] [Indexed: 05/03/2024]
Abstract
Background Artificial intelligence (AI) chatbots, such as ChatGPT-4, have shown immense potential for application across various aspects of medicine, including medical education, clinical practice, and research. Objective This study aimed to evaluate the performance of ChatGPT-4 in the 2023 Taiwan Audiologist Qualification Examination, thereby preliminarily exploring the potential utility of AI chatbots in the fields of audiology and hearing care services. Methods ChatGPT-4 was tasked to provide answers and reasoning for the 2023 Taiwan Audiologist Qualification Examination. The examination encompassed six subjects: (1) basic auditory science, (2) behavioral audiology, (3) electrophysiological audiology, (4) principles and practice of hearing devices, (5) health and rehabilitation of the auditory and balance systems, and (6) auditory and speech communication disorders (including professional ethics). Each subject included 50 multiple-choice questions, with the exception of behavioral audiology, which had 49 questions, amounting to a total of 299 questions. Results The correct answer rates across the 6 subjects were as follows: 88% for basic auditory science, 63% for behavioral audiology, 58% for electrophysiological audiology, 72% for principles and practice of hearing devices, 80% for health and rehabilitation of the auditory and balance systems, and 86% for auditory and speech communication disorders (including professional ethics). The overall accuracy rate for the 299 questions was 75%, which surpasses the examination's passing criteria of an average 60% accuracy rate across all subjects. A comprehensive review of ChatGPT-4's responses indicated that incorrect answers were predominantly due to information errors. Conclusions ChatGPT-4 demonstrated a robust performance in the Taiwan Audiologist Qualification Examination, showcasing effective logical reasoning skills. Our results suggest that with enhanced information accuracy, ChatGPT-4's performance could be further improved. This study indicates significant potential for the application of AI chatbots in audiology and hearing care services.
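As a quick arithmetic check, the overall figure quoted above can be approximately reproduced from the per-subject accuracy rates and question counts reported in the abstract; because the percentages are rounded, small deviations from the stated 75% are expected.

```python
# (subject, reported accuracy rate, number of questions), as listed in the abstract
subjects = [
    ("basic auditory science", 0.88, 50),
    ("behavioral audiology", 0.63, 49),
    ("electrophysiological audiology", 0.58, 50),
    ("principles and practice of hearing devices", 0.72, 50),
    ("health and rehabilitation of the auditory and balance systems", 0.80, 50),
    ("auditory and speech communication disorders", 0.86, 50),
]

correct = sum(rate * n for _, rate, n in subjects)
total = sum(n for _, _, n in subjects)
print(f"≈{correct:.0f}/{total} correct ≈ {correct / total:.0%}")  # about 75%, above the 60% pass mark
```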
Collapse
Affiliation(s)
- Shangqiguo Wang
- Human Communication, Learning, and Development Unit, Faculty of Education, The University of Hong Kong, Hong Kong, China (Hong Kong)
| | - Changgeng Mo
- Department of Otorhinolaryngology, Head and Neck Surgery, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China (Hong Kong)
| | - Yuan Chen
- Department of Special Education and Counselling, The Education University of Hong Kong, Hong Kong, China (Hong Kong)
| | - Xiaolu Dai
- Department of Social Work, Hong Kong Baptist University, Hong Kong, China (Hong Kong)
| | - Huiyi Wang
- Department of Medical Services, Children’s Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Xiaoli Shen
- Department of Health and Early Childhood Care, Ningbo College of Health School, Ningbo, China
| |
Collapse
|
34
|
Huang Y, Wu R, He J, Xiang Y. Evaluating ChatGPT-4.0's data analytic proficiency in epidemiological studies: A comparative analysis with SAS, SPSS, and R. J Glob Health 2024; 14:04070. [PMID: 38547497 PMCID: PMC10978058 DOI: 10.7189/jogh.14.04070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/02/2024] Open
Abstract
Background OpenAI's Chat Generative Pre-trained Transformer 4.0 (ChatGPT-4), an emerging artificial intelligence (AI)-based large language model (LLM), has been receiving increasing attention from the medical research community for its innovative 'Data Analyst' feature. We aimed to compare the capabilities of ChatGPT-4 against traditional biostatistical software (i.e. SAS, SPSS, R) in statistically analysing epidemiological research data. Methods We used a data set from the China Health and Nutrition Survey, comprising 9317 participants and 29 variables (e.g. gender, age, educational level, marital status, income, occupation, weekly working hours, survival status). Two researchers independently evaluated the data analysis capabilities of GPT-4's 'Data Analyst' feature against SAS, SPSS, and R across three commonly used epidemiological analysis methods: descriptive statistics, intergroup analysis, and correlation analysis. We used an internally developed evaluation scale to assess and compare the consistency of results, analytical efficiency of coding or operations, user-friendliness, and overall performance among ChatGPT-4, SAS, SPSS, and R. Results In descriptive statistics, ChatGPT-4 showed high consistency of results, greater analytical efficiency of code or operations, and more intuitive user-friendliness compared to SAS, SPSS, and R. In intergroup comparisons and correlational analyses, despite minor discrepancies in statistical outcomes for certain analysis tasks with SAS, SPSS, and R, ChatGPT-4 maintained high analytical efficiency and exceptional user-friendliness. Thus, employing ChatGPT-4 can significantly lower the operational threshold for conducting epidemiological data analysis while maintaining consistency with the outcomes of traditional biostatistical software, requiring only specific, clear analysis instructions without any additional operations or code writing. Conclusions We found ChatGPT-4 to be a powerful auxiliary tool for statistical analysis in epidemiological research. However, it showed limitations in result consistency and in applying more advanced statistical methods. Therefore, we advocate for the use of ChatGPT-4 in supporting researchers with intermediate experience in data analysis. With AI technologies like LLMs advancing rapidly, their integration with data analysis platforms promises to lower operational barriers, thereby enabling researchers to dedicate greater focus to the nuanced interpretation of analysis results. This development is likely to significantly advance epidemiological and medical research.
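The three analysis types compared in the study (descriptive statistics, intergroup analysis, and correlation analysis) are routine operations in any of the tools tested; a compact pandas/SciPy sketch is shown below with a fabricated mini dataset standing in for the survey variables, not the actual China Health and Nutrition Survey data.

```python
import pandas as pd
from scipy import stats

# Tiny fabricated stand-in for survey variables such as gender, age, and income.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "age":    [34, 41, 29, 55, 47, 38, 62, 50],
    "income": [3200, 4100, 2800, 5200, 4600, 3900, 6100, 4800],
})

# 1) Descriptive statistics
print(df[["age", "income"]].describe())

# 2) Intergroup analysis: compare income between genders (Welch's two-sample t-test)
f_income = df.loc[df["gender"] == "F", "income"]
m_income = df.loc[df["gender"] == "M", "income"]
print(stats.ttest_ind(f_income, m_income, equal_var=False))

# 3) Correlation analysis: age vs income (Pearson correlation)
print(stats.pearsonr(df["age"], df["income"]))
```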
Collapse
Affiliation(s)
- Yeen Huang
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Ruipeng Wu
- Key Laboratory for Molecular Genetic Mechanisms and Intervention Research on High Altitude Disease of Tibet Autonomous Region, School of Medicine, Xizang Minzu University, Xianyang, Xizang, China
- Key Laboratory of High Altitude Hypoxia Environment and Life Health, School of Medicine, Xizang Minzu University, Xianyang, Xizang, China
- Key Laboratory of Environmental Medicine and Engineering of Ministry of Education, Department of Nutrition and Food Hygiene, School of Public Health, Southeast University, Nanjing, Jiangsu, China
| | - Juntao He
- Physical and Chemical Testing Institute, Shenzhen Prevention and Treatment Center for Occupational Diseases, Shenzhen, Guangdong, China
| | - Yingping Xiang
- Occupational Hazard Assessment Institute, Shenzhen Prevention and Treatment Center for Occupational Diseases, Shenzhen, Guangdong, China
| |
Collapse
|
35
|
Denecke K, May R, Rivera-Romero O. Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks. J Med Syst 2024; 48:23. [PMID: 38367119 PMCID: PMC10874304 DOI: 10.1007/s10916-024-02043-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 02/10/2024] [Indexed: 02/19/2024]
Abstract
Large Language Models (LLMs) such as General Pretrained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), which use transformer model architectures, have significantly advanced artificial intelligence and natural language processing. Recognized for their ability to capture associative relationships between words based on shared context, these models are poised to transform healthcare by improving diagnostic accuracy, tailoring treatment plans, and predicting patient outcomes. However, there are multiple risks and potentially unintended consequences associated with their use in healthcare applications. This study, conducted with 28 participants using a qualitative approach, explores the benefits, shortcomings, and risks of using transformer models in healthcare. It analyses responses to seven open-ended questions using a simplified thematic analysis. Our research reveals seven benefits, including improved operational efficiency, optimized processes and refined clinical documentation. Despite these benefits, there are significant concerns about the introduction of bias, auditability issues and privacy risks. Challenges include the need for specialized expertise, the emergence of ethical dilemmas and the potential reduction in the human element of patient care. For the medical profession, risks include the impact on employment, changes in the patient-doctor dynamic, and the need for extensive training in both system operation and data interpretation.
Collapse
Affiliation(s)
- Kerstin Denecke
- Institute Patient-centered Digital Health, Bern University of Applied Sciences, Quellgasse 21, Biel, 2502, Switzerland.
| | - Richard May
- Harz University of Applied Sciences, Friedrichstraße 57-59, 38855, Wernigerode, Germany
| | - Octavio Rivera-Romero
- Instituto de Ingeniería Informática (I3US), Universidad de Sevilla, Sevilla, Spain
- Department of Electronic Technology, Universidad de Sevilla, Avda Reina Mercedes s/n, ETSI Informática, G1.43, Sevilla, 41012, Spain
| |
Collapse
|
36
|
Benítez TM, Xu Y, Boudreau JD, Kow AWC, Bello F, Van Phuoc L, Wang X, Sun X, Leung GKK, Lan Y, Wang Y, Cheng D, Tham YC, Wong TY, Chung KC. Harnessing the potential of large language models in medical education: promise and pitfalls. J Am Med Inform Assoc 2024; 31:776-783. [PMID: 38269644 PMCID: PMC10873781 DOI: 10.1093/jamia/ocad252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Revised: 12/09/2023] [Accepted: 12/17/2023] [Indexed: 01/26/2024] Open
Abstract
OBJECTIVES To provide balanced consideration of the opportunities and challenges associated with integrating Large Language Models (LLMs) throughout the medical school continuum. PROCESS Narrative review of published literature contextualized by current reports of LLM application in medical education. CONCLUSIONS LLMs like OpenAI's ChatGPT can potentially revolutionize traditional teaching methodologies. LLMs offer several potential advantages to students, including direct access to vast information, facilitation of personalized learning experiences, and enhancement of clinical skills development. For faculty and instructors, LLMs can facilitate innovative approaches to teaching complex medical concepts and fostering student engagement. Notable challenges of LLM integration include the risk of fostering academic misconduct, inadvertent overreliance on AI, potential dilution of critical thinking skills, concerns regarding the accuracy and reliability of LLM-generated content, and the possible implications for teaching staff.
Collapse
Affiliation(s)
- Trista M Benítez
- Department of Surgery, University of Michigan Medical School, Ann Arbor, MI 48109, United States
| | - Yueyuan Xu
- Tsinghua Medicine, Tsinghua University, Beijing, 100084, China
| | - J Donald Boudreau
- Institute of Health Sciences Education, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC H3A 0G4, Canada
| | - Alfred Wei Chieh Kow
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, 117597, Singapore
| | - Fernando Bello
- Technology Enhanced Learning and Innovation Department, Duke-NUS Medical School, National University of Singapore, 169857, Singapore
| | - Le Van Phuoc
- College of Health Sciences, VinUniversity, Hanoi, 100000, Vietnam
| | - Xiaofei Wang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
| | - Xiaodong Sun
- Department of Ophthalmology, Shanghai General Hospital, School of Medicine, Shanghai JiaoTong University, Shanghai, 200240, China
| | - Gilberto Ka-Kit Leung
- Department of Surgery, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Queen Mary Hospital, Hong Kong, 999077, China
| | - Yanyan Lan
- Institute of AI Industrial Research, Tsinghua University, Beijing, 100084, China
| | - Yaxing Wang
- Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Beijing Ophthalmology and Visual Sciences Key Laboratory, Capital University of Medical Science, Beijing, 100730, China
| | - Davy Cheng
- School of Medicine, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
| | - Yih-Chung Tham
- Centre for Innovation and Precision Eye Health; and Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, 117597, Singapore
- Ophthalmology and Visual Sciences Academic Clinical Program, Duke-NUS Medical School, 169857, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, 168751, Singapore
| | - Tien Yin Wong
- Tsinghua Medicine, Tsinghua University, Beijing, 100084, China
- Singapore Eye Research Institute, Singapore National Eye Centre, 168751, Singapore
- School of Clinical Medicine, Beijing Tsinghua Changgung Hospital, Beijing, 100084, China
| | - Kevin C Chung
- Department of Surgery, University of Michigan Medical School, Ann Arbor, MI 48109, United States
| |
Collapse
|
37
|
Ni Z, Peng ML, Balakrishnan V, Tee V, Azwa I, Saifi R, Nelson LE, Vlahov D, Altice FL. Implementation of Chatbot Technology in Health Care: Protocol for a Bibliometric Analysis. JMIR Res Protoc 2024; 13:e54349. [PMID: 38228575 PMCID: PMC10905346 DOI: 10.2196/54349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 12/07/2023] [Accepted: 01/16/2024] [Indexed: 01/18/2024] Open
Abstract
BACKGROUND Chatbots have the potential to increase people's access to quality health care. However, the implementation of chatbot technology in the health care system is unclear due to the scarce analysis of publications on the adoption of chatbots in health and medical settings. OBJECTIVE This paper presents a protocol for a bibliometric analysis aimed at offering the public insights into the current state and emerging trends in research related to the use of chatbot technology for promoting health. METHODS In this bibliometric analysis, we will select published papers from the databases of CINAHL, IEEE Xplore, PubMed, Scopus, and Web of Science that pertain to chatbot technology and its applications in health care. Our search strategy includes keywords such as "chatbot," "virtual agent," "virtual assistant," "conversational agent," "conversational AI," "interactive agent," "health," and "healthcare." Five researchers who are AI engineers and clinicians will independently review the titles and abstracts of selected papers to determine their eligibility for a full-text review. The corresponding author (ZN) will serve as a mediator to address any discrepancies and disputes among the 5 reviewers. Our analysis will encompass various publication patterns of chatbot research, including the number of annual publications, their geographic or institutional distribution, and the number of annual grants supporting chatbot research, and will further summarize the methodologies used in the development of health-related chatbots, along with their features and applications in health care settings. The software tool VOSviewer (version 1.6.19; Leiden University) will be used to construct and visualize bibliometric networks. RESULTS The preparation for the bibliometric analysis began on December 3, 2021, when the research team started familiarizing themselves with the software tools that may be used in this analysis, VOSviewer and CiteSpace, and consulted 3 librarians at Yale University regarding search terms and tentative results. Tentative searches on the aforementioned databases yielded a total of 2340 papers. The official search phase started on July 27, 2023. Our goal is to complete the screening of papers and the analysis by February 15, 2024. CONCLUSIONS Artificial intelligence chatbots, such as ChatGPT (OpenAI Inc), have sparked numerous discussions within the health care industry regarding their impact on human health. Chatbot technology holds substantial promise for advancing health care systems worldwide. However, developing a sophisticated chatbot capable of precise interaction with health care consumers, delivering personalized care, and providing accurate health-related information and knowledge remains a considerable challenge. This bibliometric analysis seeks to fill the knowledge gap in the existing literature on health-related chatbots, covering their applications, the software used in their development, and their preferred functionalities among users. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) PRR1-10.2196/54349.
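The keyword strategy listed in the protocol can be combined into a single Boolean query string for databases such as PubMed or Scopus; the sketch below simply assembles such a string from the terms given in the abstract (the authors' exact field tags and database-specific syntax are not stated, so none are assumed).

```python
# Terms copied from the protocol's stated search strategy.
chatbot_terms = [
    "chatbot", "virtual agent", "virtual assistant",
    "conversational agent", "conversational AI", "interactive agent",
]
health_terms = ["health", "healthcare"]

def or_block(terms):
    """Join quoted terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = f"{or_block(chatbot_terms)} AND {or_block(health_terms)}"
print(query)
```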
Collapse
Affiliation(s)
- Zhao Ni
- School of Nursing, Yale University, Orange, CT, United States
- Center for Interdisciplinary Research on AIDS, Yale University, New Haven, CT, United States
| | - Mary L Peng
- Department of Global Health and Social Medicine, Harvard Medical School, Harvard University, Boston, MA, United States
| | - Vimala Balakrishnan
- Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
| | - Vincent Tee
- Centre of Excellence for Research in AIDS, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Iskandar Azwa
- Centre of Excellence for Research in AIDS, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Infectious Disease Unit, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Rumana Saifi
- Centre of Excellence for Research in AIDS, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - LaRon E Nelson
- School of Nursing, Yale University, Orange, CT, United States
- Center for Interdisciplinary Research on AIDS, Yale University, New Haven, CT, United States
| | - David Vlahov
- School of Nursing, Yale University, Orange, CT, United States
- Center for Interdisciplinary Research on AIDS, Yale University, New Haven, CT, United States
| | - Frederick L Altice
- Center for Interdisciplinary Research on AIDS, Yale University, New Haven, CT, United States
- Centre of Excellence for Research in AIDS, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Section of Infectious Disease, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, United States
- Division of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, United States
| |
Collapse
|
38
|
Pandya A, Lodha P, Ganatra A. Is ChatGPT ready to change mental healthcare? Challenges and considerations: a reality-check. Frontiers in Human Dynamics 2024; 5. [DOI: 10.3389/fhumd.2023.1289255] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/16/2025]
Abstract
Because mental healthcare is highly stigmatized, digital platforms and services are becoming popular. A wide variety of AI applications is now available, and one receiving tremendous attention from users and researchers alike is Chat Generative Pre-trained Transformer (ChatGPT), a powerful chatbot launched by OpenAI. ChatGPT interacts with clients conversationally, answering follow-up questions, admitting mistakes, challenging incorrect premises, and rejecting inappropriate requests. Given its many applications, the ethical and privacy considerations surrounding the use of such technologies in sensitive areas such as mental health must be carefully addressed to ensure user safety and wellbeing. The authors comment on the ethical challenges of using ChatGPT in mental healthcare that need attention at various levels, outlining six major concerns: (1) accurate identification and diagnosis of mental health conditions; (2) limited understanding and misinterpretation; (3) safety and privacy of users; (4) bias and equity; (5) lack of monitoring and regulation; and (6) gaps in evidence and the lack of educational and training curricula.
Collapse
|
39
|
Lam K. ChatGPT for low- and middle-income countries: a Greek gift? Lancet Reg Health West Pac 2023; 41:100906. [PMID: 37745974 PMCID: PMC10514087 DOI: 10.1016/j.lanwpc.2023.100906] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Accepted: 09/04/2023] [Indexed: 09/26/2023]
Affiliation(s)
- Kyle Lam
- Department of Surgery and Cancer, Imperial College London, UK
| |
Collapse
|