1
Ortaç M, Ergül RB, Yazılı HB, Özervarlı MF, Tonyalı Ş, Sarılar O, Özgör F. ChatGPT's competence in responding to urological emergencies. Ulus Travma Acil Cerrahi Derg 2025; 31:291-295. [PMID: 40052309 PMCID: PMC11894229 DOI: 10.14744/tjtes.2024.03377]
Abstract
BACKGROUND In recent years, artificial intelligence (AI) applications have been increasingly used as sources of medical information, alongside their applications in many other fields. This study is the first to evaluate ChatGPT's performance in addressing urological emergencies (UE).
METHODS The study included frequently asked questions (FAQs) posed by the public regarding UE, as well as UE-related questions formulated based on the European Association of Urology (EAU) guidelines. The FAQs were selected from questions posed by patients to doctors and hospital accounts on social media platforms (Facebook, Instagram, and X) and on websites. All questions were presented to ChatGPT 4 (premium version) in English, and the responses were recorded. Two urologists assessed the quality of the responses using a Global Quality Score (GQS) on a scale of 1 to 5.
RESULTS Of the 73 total FAQs, 53 (72.6%) received a GQS score of 5, while only two (2.7%) received a GQS score of 1. The questions with a GQS score of 1 pertained to priapism and urosepsis. The topic with the highest proportion of responses receiving a GQS score of 5 was urosepsis (82.3%), whereas the lowest scores were observed in questions related to renal trauma (66.7%) and postrenal acute kidney injury (66.7%). A total of 42 questions were formulated based on the EAU guidelines, of which 23 (54.8%) received a GQS score of 5 from the physicians. The mean GQS score for FAQs was 4.38±1.14, significantly higher (p=0.009) than the mean GQS score for EAU guideline-based questions (3.88±1.47).
CONCLUSION This study demonstrated for the first time that nearly three out of four FAQs were answered accurately and satisfactorily by ChatGPT. However, the accuracy and proficiency of ChatGPT's responses decreased significantly when addressing guideline-based questions on UE.
Affiliation(s)
- Mazhar Ortaç
- Department of Urology, Istanbul Faculty of Medicine, Istanbul University, Istanbul-Türkiye
- Rıfat Burak Ergül
- Department of Urology, Istanbul Faculty of Medicine, Istanbul University, Istanbul-Türkiye
- Şenol Tonyalı
- Department of Urology, Istanbul Faculty of Medicine, Istanbul University, Istanbul-Türkiye
- Omer Sarılar
- Department of Urology, Haseki Training and Research Hospital, Istanbul-Türkiye
- Faruk Özgör
- Department of Urology, Haseki Training and Research Hospital, Istanbul-Türkiye
2
Tangsrivimol JA, Darzidehkalani E, Virk HUH, Wang Z, Egger J, Wang M, Hacking S, Glicksberg BS, Strauss M, Krittanawong C. Benefits, limits, and risks of ChatGPT in medicine. Front Artif Intell 2025; 8:1518049. [PMID: 39949509 PMCID: PMC11821943 DOI: 10.3389/frai.2025.1518049]
Abstract
ChatGPT represents a transformative technology in healthcare, with demonstrated impacts across clinical practice, medical education, and research. Studies show significant efficiency gains, including a 70% reduction in administrative time for discharge summaries and achievement of medical professional-level performance on standardized tests (60% accuracy on USMLE, 78.2% on PubMedQA). ChatGPT offers personalized learning platforms, automated scoring, and instant access to vast medical knowledge in medical education, addressing resource limitations and enhancing training efficiency. It streamlines clinical workflows by supporting triage processes, generating discharge summaries, and alleviating administrative burdens, allowing healthcare professionals to focus more on patient care. Additionally, ChatGPT facilitates remote monitoring and chronic disease management, providing personalized advice, medication reminders, and emotional support, thus bridging gaps between clinical visits. Its ability to process and synthesize vast amounts of data accelerates research workflows, aiding in literature reviews, hypothesis generation, and clinical trial designs. This paper aims to gather and analyze published studies involving ChatGPT, focusing on exploring its advantages and disadvantages within the healthcare context. To aid in understanding and progress, our analysis is organized into six key areas: (1) Information and Education, (2) Triage and Symptom Assessment, (3) Remote Monitoring and Support, (4) Mental Healthcare Assistance, (5) Research and Decision Support, and (6) Language Translation. Realizing ChatGPT's full potential in healthcare requires addressing key limitations, such as its lack of clinical experience, inability to process visual data, and absence of emotional intelligence. Ethical, privacy, and regulatory challenges further complicate its integration.
Future improvements should focus on enhancing accuracy, developing multimodal AI models, improving empathy through sentiment analysis, and safeguarding against artificial hallucination. While not a replacement for healthcare professionals, ChatGPT can serve as a powerful assistant, augmenting their expertise to improve efficiency, accessibility, and quality of care. This collaboration ensures responsible adoption of AI in transforming healthcare delivery. While ChatGPT demonstrates significant potential in healthcare transformation, systematic evaluation of its implementation across different healthcare settings reveals varying levels of evidence quality, from robust randomized trials in medical education to preliminary observational studies in clinical practice. This heterogeneity in evidence quality necessitates a structured approach to future research and implementation.
Affiliation(s)
- Jonathan A. Tangsrivimol
- Department of Neurosurgery and Neuroscience, Weill Cornell Medicine, NewYork-Presbyterian Hospital, New York, NY, United States
- Department of Neurosurgery, Chulabhorn Hospital, Chulabhorn Royal Academy, Bangkok, Thailand
- Erfan Darzidehkalani
- MIT Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, United States
- Hafeez Ul Hassan Virk
- Harrington Heart & Vascular Institute, University Hospitals Cleveland Medical Center, Case Western Reserve University, Cleveland, OH, United States
- Zhen Wang
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
- Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Jan Egger
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Essen, Germany
- Michelle Wang
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Sean Hacking
- Department of Pathology, NYU Grossman School of Medicine, New York, NY, United States
- Benjamin S. Glicksberg
- Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Markus Strauss
- Department of Cardiology I, Coronary and Peripheral Vascular Disease, Heart Failure Medicine, University Hospital Muenster, Muenster, Germany
- Department of Cardiology, Sector Preventive Medicine, Health Promotion, Faculty of Health, School of Medicine, University Witten/Herdecke, Hagen, Germany
- Chayakrit Krittanawong
- Cardiology Division, New York University Langone Health, New York University School of Medicine, New York, NY, United States
- HumanX, Delaware, DE, United States
3
Sahin Ozdemir M, Ozdemir YE. Comparison of the performances between ChatGPT and Gemini in answering questions on viral hepatitis. Sci Rep 2025; 15:1712. [PMID: 39799203 PMCID: PMC11724965 DOI: 10.1038/s41598-024-83575-1]
Abstract
This is the first study to evaluate the adequacy and reliability of the ChatGPT and Gemini chatbots on viral hepatitis. A total of 176 questions were composed from three different categories. The first group comprises "questions and answers (Q&As) for the public" determined by the Centers for Disease Control and Prevention (CDC). The second group comprises strong recommendations of international guidelines. The third group comprises frequently asked questions on social media platforms. The answers of the chatbots were evaluated by two infectious diseases specialists on a scoring scale from 1 to 4. Cohen's kappa coefficient was calculated to assess inter-rater reliability. The reproducibility and correlation of answers generated by ChatGPT and Gemini were analyzed. ChatGPT's and Gemini's mean scores (3.55 ± 0.83 vs. 3.57 ± 0.89, p = 0.260) and completely correct response rates (71.0% vs. 78.4%, p = 0.111) were similar. Likewise, in subgroup analyses, the rates of completely correct answers were similar for the CDC questions (90.1% vs. 91.9%, p = 0.752), the guideline questions (49.4% vs. 61.4%, p = 0.140), and the social media platform questions (82.5% vs. 90%, p = 0.335). There was a moderate positive correlation between the two chatbots' answers (r = 0.633, p < 0.001). Reproducibility rates of answers were 91.3% for ChatGPT and 92% for Gemini (p = 0.710). According to Cohen's kappa test, there was substantial inter-rater agreement for both ChatGPT (κ = 0.720) and Gemini (κ = 0.704). ChatGPT and Gemini successfully answered the CDC and social media platform questions, but their correct answer rates were insufficient for the guideline questions.
Affiliation(s)
- Meryem Sahin Ozdemir
- Department of Infectious Diseases and Clinical Microbiology, Basaksehir Cam and Sakura City Hospital, Istanbul, 34480, Turkey
- Yusuf Emre Ozdemir
- Department of Infectious Diseases and Clinical Microbiology, Bakirkoy Dr Sadi Konuk Training and Research Hospital, Istanbul, 34140, Turkey
4
Başaran M, Duman C. Dialogues with artificial intelligence: Exploring medical students' perspectives on ChatGPT. Med Teach 2024:1-10. [PMID: 39692300 DOI: 10.1080/0142159X.2024.2438766]
Abstract
ChatGPT has initiated a new era of inquiry into sources of information within the scientific community. Studies leveraging ChatGPT in the medical field have demonstrated notable performance in academic processes and healthcare applications. This study reports how medical students have benefited from ChatGPT during their education and the challenges they encountered, based on their personal experiences. The methodological framework adheres to the stages of qualitative research: an explanatory case study, a qualitative research method, was adopted to determine user experiences with ChatGPT. Content analysis of student experiences with ChatGPT indicates that it may offer advantages in health education as a resource for scientific research activities. However, adverse reports were also identified, including ethical issues, lack of personal data protection, and potential misuse in scientific research. This study emphasizes the need for comprehensive steps to effectively integrate AI tools such as ChatGPT into medical education as a new technology.
Affiliation(s)
- Mehmet Başaran
- Curriculum and Instruction, Gaziantep University, Gaziantep, Turkey
- Cevahir Duman
- Curriculum and Instruction, Gaziantep University, Gaziantep, Turkey
5
Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton E, Malin B, Yin Z. Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review. J Med Internet Res 2024; 26:e22769. [PMID: 39509695 PMCID: PMC11582494 DOI: 10.2196/22769]
Abstract
BACKGROUND The launch of ChatGPT (OpenAI) in November 2022 attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including health care. Numerous studies have since been conducted on how to use state-of-the-art LLMs in health-related scenarios.
OBJECTIVE This review aims to summarize applications of and concerns regarding conversational LLMs in health care and provide an agenda for future research in this field.
METHODS We used PubMed, ACM, and the IEEE digital libraries as primary sources for this review. We followed the guidance of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to screen and select peer-reviewed research articles that (1) were related to health care applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection. We investigated these papers and classified them according to their applications and concerns.
RESULTS Our search initially identified 820 papers according to targeted keywords, of which 65 (7.9%) met our criteria and were included in the review. The most popular conversational LLM was ChatGPT (60/65, 92% of papers), followed by Bard (Google LLC; 1/65, 2% of papers), LLaMA (Meta; 1/65, 2% of papers), and other LLMs (6/65, 9% of papers). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction (eg, diagnosis, treatment recommendation, and drug synergy), and (4) administration (eg, documentation and information collection), and four categories of concerns: (1) reliability (eg, training data quality, accuracy, interpretability, and consistency in responses), (2) bias, (3) privacy, and (4) public acceptability. There were 49 (75%) papers using LLMs for either summarization or medical knowledge inquiry, or both, and 58 (89%) papers expressing concerns about either reliability or bias, or both. We found that conversational LLMs exhibited promising results in summarization and in providing general medical knowledge to patients with relatively high accuracy. However, conversational LLMs such as ChatGPT are not always able to provide reliable answers to complex health-related tasks (eg, diagnosis) that require specialized domain expertise. While bias or privacy issues are often noted as concerns, no experiments in our reviewed papers thoroughly examined how conversational LLMs lead to these issues in health care research.
CONCLUSIONS Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms by which LLM applications introduce bias and privacy issues. Considering the broad accessibility of LLMs, legal, social, and technical efforts are all needed to address these concerns and to promote, improve, and regulate the application of LLMs in health care.
Affiliation(s)
- Leyao Wang
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Zhiyu Wan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Congning Ni
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Qingyuan Song
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Yang Li
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Ellen Clayton
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, United States
- School of Law, Vanderbilt University Medical Center, Nashville, TN, United States
- Bradley Malin
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
- Zhijun Yin
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
6
Sacoransky E, Kwan BYM, Soboleski D. ChatGPT and assistive AI in structured radiology reporting: A systematic review. Curr Probl Diagn Radiol 2024; 53:728-737. [PMID: 39004580 DOI: 10.1067/j.cpradiol.2024.07.007]
Abstract
INTRODUCTION The rise of transformer-based large language models (LLMs), such as ChatGPT, has captured global attention with recent advancements in artificial intelligence (AI). ChatGPT demonstrates growing potential in structured radiology reporting, a field where AI has traditionally focused on image analysis.
METHODS A comprehensive search of MEDLINE and Embase was conducted from inception through May 2024, and primary studies discussing ChatGPT's role in structured radiology reporting were selected based on their content.
RESULTS Of the 268 articles screened, eight were ultimately included in this review. These articles explored various applications of ChatGPT, such as generating structured reports from unstructured reports, extracting data from free text, generating impressions from radiology findings, and creating structured reports from imaging data. All studies demonstrated optimism regarding ChatGPT's potential to aid radiologists, though common critiques included data privacy concerns, reliability, medical errors, and lack of medical-specific training.
CONCLUSION ChatGPT and assistive AI have significant potential to transform radiology reporting, enhancing accuracy and standardization while optimizing healthcare resources. Future developments may involve integrating dynamic few-shot prompting, ChatGPT, and Retrieval Augmented Generation (RAG) into diagnostic workflows. Continued research, development, and ethical oversight are crucial to fully realize AI's potential in radiology.
Affiliation(s)
- Ethan Sacoransky
- Queen's University School of Medicine, 15 Arch St, Kingston, ON K7L 3L4, Canada
- Benjamin Y M Kwan
- Queen's University School of Medicine, 15 Arch St, Kingston, ON K7L 3L4, Canada
- Department of Diagnostic Radiology, Kingston Health Sciences Centre, Kingston, ON, Canada
- Donald Soboleski
- Queen's University School of Medicine, 15 Arch St, Kingston, ON K7L 3L4, Canada
- Department of Diagnostic Radiology, Kingston Health Sciences Centre, Kingston, ON, Canada
7
Oliveira JA, Eskandar K, Kar E, de Oliveira FR, Filho ALDS. Understanding AI's Role in Endometriosis Patient Education and Evaluating Its Information and Accuracy: Systematic Review. JMIR AI 2024; 3:e64593. [PMID: 39476855 PMCID: PMC11561426 DOI: 10.2196/64593]
Abstract
BACKGROUND Endometriosis is a chronic gynecological condition that affects a significant portion of women of reproductive age, leading to debilitating symptoms such as chronic pelvic pain and infertility. Despite advancements in diagnosis and management, patient education remains a critical challenge. With the rapid growth of digital platforms, artificial intelligence (AI) has emerged as a potential tool to enhance patient education and access to information.
OBJECTIVE This systematic review aims to explore the role of AI in facilitating education and improving information accessibility for individuals with endometriosis.
METHODS This review followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines to ensure rigorous and transparent reporting. We conducted a comprehensive search of PubMed; Embase; the Regional Online Information System for Scientific Journals of Latin America, the Caribbean, Spain and Portugal (LATINDEX); Latin American and Caribbean Literature in Health Sciences (LILACS); Institute of Electrical and Electronics Engineers (IEEE) Xplore; and the Cochrane Central Register of Controlled Trials using the terms "endometriosis" and "artificial intelligence." Studies were selected based on their focus on AI applications in patient education or information dissemination regarding endometriosis. We included studies that evaluated AI-driven tools for assessing patient knowledge and addressed frequently asked questions related to endometriosis. Data extraction and quality assessment were conducted independently by 2 authors, with discrepancies resolved through consensus.
RESULTS Out of 400 initial search results, 11 studies met the inclusion criteria and were fully reviewed. We ultimately included 3 studies, 1 of which was an abstract. The studies examined the use of AI models, such as ChatGPT (OpenAI), machine learning, and natural language processing, in providing educational resources and answering common questions about endometriosis. The findings indicated that AI tools, particularly large language models, offer accurate responses to frequently asked questions with varying degrees of sufficiency across different categories. AI's integration with social media platforms also highlights its potential to identify patients' needs and enhance information dissemination.
CONCLUSIONS AI holds promise in advancing patient education and information access for endometriosis, providing accurate and comprehensive answers to common queries, and facilitating a better understanding of the condition. However, challenges remain in ensuring ethical use, equitable access, and maintaining accuracy across diverse patient populations. Future research should focus on developing standardized approaches for evaluating AI's impact on patient education and exploring its integration into clinical practice to enhance support for individuals with endometriosis.
Affiliation(s)
- Karine Eskandar
- Department of Medicine, Pontifical Catholic University of Paraná, Curitiba, Brazil
- Emre Kar
- Department of Obstetrics & Gynecology, Cam and Sakura City Hospital, Istanbul, Turkey
8
Ahn S. The transformative impact of large language models on medical writing and publishing: current applications, challenges and future directions. Korean J Physiol Pharmacol 2024; 28:393-401. [PMID: 39198220 PMCID: PMC11362003 DOI: 10.4196/kjpp.2024.28.5.393]
Abstract
Large language models (LLMs) are rapidly transforming medical writing and publishing. This review article focuses on experimental evidence to provide a comprehensive overview of the current applications, challenges, and future implications of LLMs at various stages of the academic research and publishing process. Global surveys reveal a high prevalence of LLM usage in scientific writing, with both potential benefits and challenges associated with its adoption. LLMs have been successfully applied in literature search, research design, writing assistance, quality assessment, citation generation, and data analysis. LLMs have also been used in peer review and publication processes, including manuscript screening, generating review comments, and identifying potential biases. To ensure the integrity and quality of scholarly work in the era of LLM-assisted research, responsible artificial intelligence (AI) use is crucial. Researchers should prioritize verifying the accuracy and reliability of AI-generated content, maintain transparency in the use of LLMs, and develop collaborative human-AI workflows. Reviewers should focus on higher-order reviewing skills and be aware of the potential use of LLMs in manuscripts. Editorial offices should develop clear policies and guidelines on AI use and foster open dialogue within the academic community. Future directions include addressing the limitations and biases of current LLMs, exploring innovative applications, and continuously updating policies and practices in response to technological advancements. Collaborative efforts among stakeholders are necessary to harness the transformative potential of LLMs while maintaining the integrity of medical writing and publishing.
Affiliation(s)
- Sangzin Ahn
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 47392, Korea
- Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 47392, Korea
9
Takahashi H, Shikino K, Kondo T, Komori A, Yamada Y, Saita M, Naito T. Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study. JMIR Med Educ 2024; 10:e59133. [PMID: 39137031 PMCID: PMC11350316 DOI: 10.2196/59133]
Abstract
BACKGROUND Evaluating the accuracy and educational utility of artificial intelligence-generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored.
OBJECTIVE This study aimed to assess the educational utility of ChatGPT-4-generated clinical vignettes and their applicability in educational settings.
METHODS Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility, namely information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians' experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases.
RESULTS Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory, with these responses based on binary data. The average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty, based on a 5-point Likert scale. Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations.
CONCLUSIONS ChatGPT-4-generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4's value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.
Affiliation(s)
- Hiromizu Takahashi
- Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
- Kiyoshi Shikino
- Department of Community-Oriented Medical Education, Chiba University Graduate School of Medicine, Chiba, Japan
- Takeshi Kondo
- Center for Postgraduate Clinical Training and Career Development, Nagoya University Hospital, Aichi, Japan
- Akira Komori
- Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
- Department of Emergency and Critical Care Medicine, Tsukuba Memorial Hospital, Tsukuba, Japan
- Yuji Yamada
- Brookdale Department of Geriatrics and Palliative Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Mizue Saita
- Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
- Toshio Naito
- Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
10
Xian X, Chang A, Xiang YT, Liu MT. Debate and Dilemmas Regarding Generative AI in Mental Health Care: Scoping Review. Interact J Med Res 2024; 13:e53672. [PMID: 39133916 PMCID: PMC11347908 DOI: 10.2196/53672]
Abstract
BACKGROUND Mental disorders have ranked among the top 10 prevalent causes of burden on a global scale. Generative artificial intelligence (GAI) has emerged as a promising and innovative technological advancement that has significant potential in the field of mental health care. Nevertheless, there is a scarcity of research dedicated to examining and understanding the application landscape of GAI within this domain.
OBJECTIVE This review aims to inform the current state of GAI knowledge and identify its key uses in the mental health domain by consolidating relevant literature.
METHODS Records were searched within 8 reputable sources, including the Web of Science, PubMed, IEEE Xplore, medRxiv, bioRxiv, Google Scholar, CNKI, and Wanfang databases, between 2013 and 2023. Our focus was on original, empirical research published in English or Chinese that uses GAI technologies to benefit mental health. For an exhaustive search, we also checked the studies cited by relevant literature. Two reviewers were responsible for the data selection process, and all the extracted data were synthesized and summarized for brief and in-depth analyses depending on the GAI approaches used (traditional retrieval and rule-based techniques vs advanced GAI techniques).
RESULTS In this review of 144 articles, 44 (30.6%) met the inclusion criteria for detailed analysis. Six key uses of advanced GAI emerged: mental disorder detection, counseling support, therapeutic application, clinical training, clinical decision-making support, and goal-driven optimization. Advanced GAI systems have mainly focused on therapeutic applications (n=19, 43%) and counseling support (n=13, 30%), with clinical training being the least common. Most studies (n=28, 64%) focused broadly on mental health, while specific conditions such as anxiety (n=1, 2%), bipolar disorder (n=2, 5%), eating disorders (n=1, 2%), posttraumatic stress disorder (n=2, 5%), and schizophrenia (n=1, 2%) received limited attention. Despite prevalent use, the efficacy of ChatGPT in the detection of mental disorders remains insufficient. In addition, 100 articles on traditional GAI approaches were found, indicating diverse areas where advanced GAI could enhance mental health care.
CONCLUSIONS This study provides a comprehensive overview of the use of GAI in mental health care, which serves as a valuable guide for future research, practical applications, and policy development in this domain. While GAI demonstrates promise in augmenting mental health care services, its inherent limitations emphasize its role as a supplementary tool rather than a replacement for trained mental health providers. A conscientious and ethical integration of GAI techniques is necessary, ensuring a balanced approach that maximizes benefits while mitigating potential challenges in mental health care practices.
Affiliation(s)
- Xuechang Xian
- Department of Communication, Faculty of Social Sciences, University of Macau, Macau SAR, China
- Department of Publicity, Zhaoqing University, Zhaoqing City, China
- Angela Chang
- Department of Communication, Faculty of Social Sciences, University of Macau, Macau SAR, China
- Institute of Communication and Health, Lugano University, Lugano, Switzerland
- Yu-Tao Xiang
- Department of Public Health and Medicinal Administration, Faculty of Health Sciences, University of Macau, Macau SAR, China
11
Samuel DJ, Sermet Y, Cwiertny D, Demir I. Integrating vision-based AI and large language models for real-time water pollution surveillance. Water Environment Research 2024; 96:e11092. [PMID: 39129273] [DOI: 10.1002/wer.11092]
Abstract
Water pollution has become a major concern in recent years, affecting over 2 billion people worldwide, according to UNESCO. This pollution can arise either naturally, as with algal blooms, or from human activity, when toxic substances are released into water bodies such as lakes, rivers, springs, and oceans. To address this issue and monitor surface-level water pollution in local water bodies, an informative real-time vision-based surveillance system has been developed in conjunction with large language models (LLMs). The system has an integrated camera connected to a Raspberry Pi for processing input frames and is linked to LLMs for generating contextual information regarding the type, causes, and impact of pollutants on both human health and the environment. This multi-model setup enables local authorities to monitor water pollution and take the necessary steps to mitigate it. To train the vision model, seven major types of pollutants found in water bodies (algal blooms, synthetic foams, dead fish, oil spills, wooden logs, industrial waste runoff, and trash) were used to achieve accurate detection. The ChatGPT API has been integrated with the model to generate contextual information about the pollution detected. Thus, the multi-model system can conduct surveillance over water bodies and autonomously alert local authorities to take immediate action, eliminating the need for human intervention. PRACTITIONER POINTS: Combines cameras and LLMs with a Raspberry Pi for processing and generating pollutant information. Uses YOLOv5 to detect algal blooms, synthetic foams, dead fish, oil spills, and industrial waste. Supports various modules and environments, including drones and mobile apps, for broad monitoring. Educates on environmental health and alerts authorities about water pollution.
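The abstract describes a detector-to-LLM handoff: the vision model emits pollutant labels, which are then turned into a prompt asking the LLM for causes and impacts. A minimal sketch of that handoff follows; the prompt wording and the detection dictionaries are illustrative assumptions, not the authors' implementation, and the actual ChatGPT API call is omitted.

```python
# Sketch of the detector-to-LLM handoff described in the abstract.
# The pollutant classes come from the paper; the prompt text is an
# illustrative assumption, not the authors' code.

POLLUTANT_CLASSES = [
    "algal bloom", "synthetic foam", "dead fish", "oil spill",
    "wooden log", "industrial waste runoff", "trash",
]

def build_pollution_prompt(detections):
    """Turn vision-model detections into an LLM prompt asking for
    the type, causes, and impact of each detected pollutant."""
    labels = sorted({d["label"] for d in detections if d["label"] in POLLUTANT_CLASSES})
    if not labels:
        return None  # nothing detected; skip the LLM call entirely
    return (
        "The following pollutants were detected in a local water body: "
        + ", ".join(labels)
        + ". For each, briefly describe its likely causes and its impact "
        "on human health and the environment."
    )

# In the deployed system this prompt would be sent to the ChatGPT API;
# here we only construct it from hypothetical YOLO-style detections.
detections = [
    {"label": "oil spill", "confidence": 0.91},
    {"label": "dead fish", "confidence": 0.84},
]
prompt = build_pollution_prompt(detections)
```

Keeping prompt construction separate from the API call lets the alerting logic be tested without network access.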
Affiliation(s)
- Yusuf Sermet
- IIHR Hydroscience and Engineering, University of Iowa, Iowa City, Iowa, USA
- David Cwiertny
- IIHR Hydroscience and Engineering, University of Iowa, Iowa City, Iowa, USA
- Department of Civil and Environmental Engineering, University of Iowa, Iowa City, Iowa, USA
- Department of Chemistry, University of Iowa, Iowa City, Iowa, USA
- Center for Health Effects of Environmental Contamination, University of Iowa, Iowa City, Iowa, USA
- Ibrahim Demir
- IIHR Hydroscience and Engineering, University of Iowa, Iowa City, Iowa, USA
- Department of Civil and Environmental Engineering, University of Iowa, Iowa City, Iowa, USA
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, Iowa, USA
12
Yilmaz Muluk S, Olcucu N. Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-3.5 and GoogleBard in Identifying Red Flags of Low Back Pain. Cureus 2024; 16:e63580. [PMID: 39087174] [PMCID: PMC11290316] [DOI: 10.7759/cureus.63580]
Abstract
BACKGROUND Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions marked by 'red flags' (RF), such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed the accuracy of ChatGPT-3.5 (OpenAI, San Francisco, CA, USA) and GoogleBard (Google, Mountain View, CA, USA) in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition. METHODS We created 70 questions on RF-related symptoms and diseases following the LBP guidelines. Among them, 58 involved a single symptom (SS) and 12 involved multiple symptoms (MS) of LBP. Questions were posed to ChatGPT and GoogleBard, and responses were assessed by two authors for accuracy, completeness, and relevance (ACR) using a 5-point rubric. RESULTS Cohen's kappa values (0.60-0.81) indicated substantial agreement between the authors. The average scores for responses ranged from 3.47 to 3.85 for ChatGPT-3.5 and from 3.36 to 3.76 for GoogleBard for the 58 SS questions, and from 4.04 to 4.29 for ChatGPT-3.5 and from 3.50 to 3.71 for GoogleBard for the 12 MS questions. The ratings for these responses ranged from 'good' to 'excellent'. Most SS responses effectively conveyed the severity of the situation (93.1% for ChatGPT-3.5, 94.8% for GoogleBard), and all MS responses did so. No statistically significant differences were found between ChatGPT-3.5 and GoogleBard scores (p>0.05). CONCLUSIONS In an era characterized by widespread online health information seeking, artificial intelligence (AI) systems play a vital role in delivering precise medical information. These technologies may hold promise in the field of health information if they continue to improve.
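The inter-rater agreement reported above is Cohen's kappa. As a reference for how such values are obtained, here is a minimal pure-Python computation of the unweighted statistic for two raters; the study does not publish its rating data, so the example ratings below are invented for illustration only.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.
    kappa = (p_observed - p_expected) / (1 - p_expected)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Invented 5-point rubric scores from two hypothetical raters.
kappa = cohens_kappa([5, 4, 3, 5, 4], [5, 4, 3, 4, 4])
```

Perfect agreement yields kappa = 1, and agreement no better than chance yields kappa = 0; values of 0.60-0.81, as in the study, are conventionally read as substantial agreement.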
Affiliation(s)
- Nazli Olcucu
- Physical Medicine and Rehabilitation, Antalya Ataturk State Hospital, Antalya, TUR
13
Alkhamees A. Evaluation of Artificial Intelligence as a Search Tool for Patients: Can ChatGPT-4 Provide Accurate Evidence-Based Orthodontic-Related Information? Cureus 2024; 16:e65820. [PMID: 39219978] [PMCID: PMC11363007] [DOI: 10.7759/cureus.65820]
Abstract
INTRODUCTION Artificial intelligence (AI) is already a part of our reality. Many people have started using ChatGPT in their daily lives, replacing existing web browsers. The confidence people place in ChatGPT's ability to provide accurate medical information is increasing. With that, proper tools for assessing the safety and reliability of ChatGPT are also crucial. OBJECTIVE This study aimed to assess the accuracy, reliability, and quality of information provided by ChatGPT-4 on three specific orthodontic topics, namely, impacted canines, interceptive orthodontic treatment, and orthognathic surgery, as evaluated by five experienced orthodontists using a Likert scale ranking method. MATERIALS AND METHODS Using ChatGPT version 4, the 20 most commonly asked questions were generated and answered on the following topics: impacted canines, interceptive treatment, and orthognathic surgery. The quality of the answers provided was evaluated by five experienced orthodontists using the Likert scale ranking method. RESULTS The quality of the answers generated by the conversational AI system (ChatGPT-4) was evaluated by five experienced orthodontists for three topics: impacted canines, interceptive orthodontics, and orthognathic surgery. The evaluators rated each question-answer pair on a five-point scale from "very poor" to "very good." The results showed that the AI system produced generally good-quality information for all topics, with no significant difference between them. The inter-rater agreement among the experts was low, indicating some variability in their judgments. CONCLUSION This study demonstrates that ChatGPT-4 can provide generally good information on impacted canines, interceptive treatment, and orthognathic surgery. However, the answers provided should be handled with caution due to variability and lack of reliability, and should not be considered a substitute for professional opinion.
Affiliation(s)
- Amani Alkhamees
- Department of Orthodontics and Pediatric Dentistry, College of Dentistry, Qassim University, Buraydah, SAU
14
Gurbuz DC, Varis E. Is ChatGPT knowledgeable of acute coronary syndromes and pertinent European Society of Cardiology Guidelines? Minerva Cardiol Angiol 2024; 72:299-303. [PMID: 38391252] [DOI: 10.23736/s2724-5683.24.06517-7]
Abstract
BACKGROUND Advancements in artificial intelligence are being seen in multiple fields, including medicine, and this trend is likely to continue. This study aimed to analyze the accuracy and reproducibility of ChatGPT's answers about acute coronary syndromes (ACS). METHODS The questions posed to ChatGPT were prepared in two categories: a list of frequently asked questions (FAQs) compiled from inquiries made by the public, and a scientific question list prepared using the 2023 European Society of Cardiology (ESC) Guidelines for the management of ACS and the ESC Clinical Practice Guidelines. The accuracy and reproducibility of ChatGPT's responses about ACS were evaluated by two cardiologists with ten years of experience using the Global Quality Score (GQS). RESULTS Eventually, 72 FAQs related to ACS met the study inclusion criteria. In total, 65 (90.3%) ChatGPT answers scored GQS 5, indicating the highest accuracy and proficiency. None of the ChatGPT responses to FAQs about ACS scored GQS 1. The highest accuracy and reliability of ChatGPT answers was obtained for the prevention and lifestyle section, with GQS 5 for 19 (95%) answers and GQS 4 for 1 (5%) answer. In contrast, the accuracy and proficiency of ChatGPT answers were lowest for the treatment and management section. Moreover, 68 (88.3%) ChatGPT responses to guideline-based questions scored GQS 5. The reproducibility of ChatGPT answers was 94.4% for FAQs and 90.9% for ESC guideline questions. CONCLUSIONS This study shows for the first time that ChatGPT can give accurate and sufficient responses to more than 90% of FAQs about ACS. In addition, the proficiency and correctness of ChatGPT's answers to questions based on the ESC guidelines were also substantial.
Affiliation(s)
- Dogac C Gurbuz
- Department of Cardiology, Gurlife Hospital, Eskisehir, Türkiye
- Eser Varis
- Department of Cardiology, Private Hospital, Istanbul, Türkiye
15
Bumgardner VKC, Mullen A, Armstrong SE, Hickey C, Marek V, Talbert J. Local Large Language Models for Complex Structured Tasks. AMIA Joint Summits on Translational Science Proceedings 2024; 2024:105-114. [PMID: 38827047] [PMCID: PMC11141822]
Abstract
This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.
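A locally fine-tuned model instructed to emit structured condition codes still needs its generations parsed and validated before use. A minimal sketch of that post-processing step follows; the instruction format, JSON-list output convention, code vocabulary, and example generation are assumptions for illustration, not the authors' actual schema.

```python
import json
import re

# Hypothetical multi-label code vocabulary; the paper's real condition
# codes are not listed in the abstract.
VALID_CODES = {"C50.9", "D05.1", "N63"}

def parse_condition_codes(generation):
    """Extract and validate condition codes from a model generation that
    was instructed to answer with a JSON list of code strings."""
    match = re.search(r"\[.*?\]", generation, re.DOTALL)
    if match is None:
        return []  # model did not follow the output format
    try:
        codes = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []  # malformed JSON in the generation
    # Keep only in-vocabulary codes, preserving generation order.
    return [c for c in codes if c in VALID_CODES]

raw = 'Condition codes: ["C50.9", "N63", "X99"]'
codes = parse_condition_codes(raw)  # "X99" is dropped as out-of-vocabulary
```

Validating against a fixed vocabulary is one simple guard against hallucinated labels in structured generative tasks like the one the paper describes.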
16
Yilmaz Muluk S. Enhancing Musculoskeletal Injection Safety: Evaluating Checklists Generated by Artificial Intelligence and Revising the Preformed Checklist. Cureus 2024; 16:e59708. [PMID: 38841023] [PMCID: PMC11150897] [DOI: 10.7759/cureus.59708]
Abstract
Background Musculoskeletal disorders are a significant global health issue, necessitating advanced management strategies such as intra-articular and extra-articular injections to alleviate pain, inflammation, and mobility challenges. As the adoption of these interventions by physicians grows, the importance of robust safety protocols becomes paramount. This study evaluates the effectiveness of conversational artificial intelligence (AI), particularly versions 3.5 and 4 of Chat Generative Pre-trained Transformer (ChatGPT), in creating patient safety checklists for managing musculoskeletal injections to enhance the preparation of safety documentation. Methodology A quantitative analysis was conducted to evaluate AI-generated safety checklists against a preformed checklist adapted from reputable medical sources. Adherence of the generated checklists to the preformed checklist was calculated and classified. The Wilcoxon signed-rank test was used to assess the performance differences between ChatGPT versions 3.5 and 4. Results ChatGPT-4 showed superior adherence to the preformed checklist compared to ChatGPT-3.5, with both versions classified as very good in safety protocol creation. Although no significant differences were present in the sign-in and sign-out parts of the checklists of both versions, ChatGPT-4 had significantly higher scores in the procedure planning part (p = 0.007), and its overall performance was also higher (p < 0.001). Subsequently, the preformed checklist was revised to incorporate new contributions from ChatGPT. Conclusions ChatGPT, especially version 4, proved effective in generating patient safety checklists for musculoskeletal injections, highlighting the potential of AI to streamline clinical practices. Further enhancements are necessary to fully meet the medical standards.
17
Sawamura S, Bito T, Ando T, Masuda K, Kameyama S, Ishida H. Evaluation of the accuracy of ChatGPT's responses to and references for clinical questions in physical therapy. J Phys Ther Sci 2024; 36:234-239. [PMID: 38694019] [PMCID: PMC11060764] [DOI: 10.1589/jpts.36.234]
Abstract
[Purpose] This study evaluated the accuracy of ChatGPT's responses to and references for five clinical questions in physical therapy based on the Physical Therapy Guidelines and assessed this language model's potential as a tool for supporting clinical decision-making in the rehabilitation field. [Participants and Methods] Five clinical questions from the "Stroke", "Musculoskeletal disorders", and "Internal disorders" sections of the Physical Therapy Guidelines, released by the Japanese Society of Physical Therapy, were presented to ChatGPT. ChatGPT was instructed to provide responses in Japanese accompanied by references such as PubMed IDs or digital object identifiers. The accuracy of the generated content and references was evaluated by two assessors with expertise in their respective sections by using a 4-point scale, and comments were provided for point deductions. The inter-rater agreement was evaluated using weighted kappa coefficients. [Results] ChatGPT demonstrated adequate accuracy in generating content for clinical questions in physical therapy. However, the accuracy of the references was poor, with a significant number of references being non-existent or misinterpreted. [Conclusion] ChatGPT has limitations in reference selection and reliability. While ChatGPT can offer accurate responses to clinical questions in physical therapy, it should be used with caution because it is not a completely reliable model.
Affiliation(s)
- Shogo Sawamura
- Department of Rehabilitation, Heisei College of Health Sciences: 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Takanobu Bito
- Department of Rehabilitation, Gifu University Hospital, Japan
- Takahiro Ando
- Department of Rehabilitation, Gifu University Hospital, Japan
- Kento Masuda
- Department of Rehabilitation, Gifu University Hospital, Japan
- Sakiko Kameyama
- Department of Rehabilitation, Heisei College of Health Sciences: 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Hiroyasu Ishida
- Department of Rehabilitation, Heisei College of Health Sciences: 180 Kurono, Gifu City, Gifu 501-1131, Japan
18
Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton EW, Malin BA, Yin Z. A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare. medRxiv [Preprint] 2024:2024.04.26.24306390. [PMID: 38712148] [PMCID: PMC11071576] [DOI: 10.1101/2024.04.26.24306390]
Abstract
Background The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including healthcare. Numerous studies have since been conducted regarding how to employ state-of-the-art LLMs in health-related scenarios to assist patients, doctors, and public health administrators. Objective This review aims to summarize the applications and concerns of applying conversational LLMs in healthcare and provide an agenda for future research on LLMs in healthcare. Methods We utilized PubMed, ACM, and IEEE digital libraries as primary sources for this review. We followed the guidance of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to screen and select peer-reviewed research articles that (1) were related to both healthcare applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection and screening. We investigated these papers and classified them according to their applications and concerns. Results Our search initially identified 820 papers according to targeted keywords, out of which 65 papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT from OpenAI (60 papers), followed by Bard from Google (1), Large Language Model Meta AI (LLaMA) from Meta (1), and other LLMs (5). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction, and (4) administration, and four categories of concerns: (1) reliability, (2) bias, (3) privacy, and (4) public acceptability. In total, 49 (75%) research papers used LLMs for summarization and/or medical knowledge inquiry, and 58 (89%) research papers expressed concerns about reliability and/or bias.
We found that conversational LLMs exhibit promising results in summarization and in providing medical knowledge to patients with relatively high accuracy. However, conversational LLMs like ChatGPT are not able to provide reliable answers to complex health-related tasks that require specialized domain expertise. Additionally, no experiments in our reviewed papers were conducted to thoughtfully examine how conversational LLMs lead to bias or privacy issues in healthcare research. Conclusions Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating how LLM applications introduce bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs and to promote, improve, and regulate their application in healthcare.
Affiliation(s)
- Leyao Wang
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Zhiyu Wan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
- Congning Ni
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Qingyuan Song
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Yang Li
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Ellen Wright Clayton
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA, 37203
- Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, Tennessee, USA, 37203
- Bradley A. Malin
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
- Department of Biostatistics, Vanderbilt University Medical Center, TN, USA, 37203
- Zhijun Yin
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
19
Wang S, Mo C, Chen Y, Dai X, Wang H, Shen X. Exploring the Performance of ChatGPT-4 in the Taiwan Audiologist Qualification Examination: Preliminary Observational Study Highlighting the Potential of AI Chatbots in Hearing Care. JMIR Medical Education 2024; 10:e55595. [PMID: 38693697] [PMCID: PMC11067446] [DOI: 10.2196/55595]
Abstract
Background Artificial intelligence (AI) chatbots, such as ChatGPT-4, have shown immense potential for application across various aspects of medicine, including medical education, clinical practice, and research. Objective This study aimed to evaluate the performance of ChatGPT-4 in the 2023 Taiwan Audiologist Qualification Examination, thereby preliminarily exploring the potential utility of AI chatbots in the fields of audiology and hearing care services. Methods ChatGPT-4 was tasked to provide answers and reasoning for the 2023 Taiwan Audiologist Qualification Examination. The examination encompassed six subjects: (1) basic auditory science, (2) behavioral audiology, (3) electrophysiological audiology, (4) principles and practice of hearing devices, (5) health and rehabilitation of the auditory and balance systems, and (6) auditory and speech communication disorders (including professional ethics). Each subject included 50 multiple-choice questions, with the exception of behavioral audiology, which had 49 questions, amounting to a total of 299 questions. Results The correct answer rates across the 6 subjects were as follows: 88% for basic auditory science, 63% for behavioral audiology, 58% for electrophysiological audiology, 72% for principles and practice of hearing devices, 80% for health and rehabilitation of the auditory and balance systems, and 86% for auditory and speech communication disorders (including professional ethics). The overall accuracy rate for the 299 questions was 75%, which surpasses the examination's passing criteria of an average 60% accuracy rate across all subjects. A comprehensive review of ChatGPT-4's responses indicated that incorrect answers were predominantly due to information errors. Conclusions ChatGPT-4 demonstrated a robust performance in the Taiwan Audiologist Qualification Examination, showcasing effective logical reasoning skills. 
Our results suggest that with enhanced information accuracy, ChatGPT-4's performance could be further improved. This study indicates significant potential for the application of AI chatbots in audiology and hearing care services.
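The overall pass/fail arithmetic above can be reproduced from the per-subject figures. In the sketch below, the correct-answer counts are back-calculated from the rounded per-subject percentages in the abstract, so they are approximations rather than the study's raw data.

```python
# (subject, questions, correct) -- correct counts back-calculated from the
# rounded per-subject accuracy rates reported in the abstract.
subjects = [
    ("basic auditory science", 50, 44),           # 88%
    ("behavioral audiology", 49, 31),             # 63%
    ("electrophysiological audiology", 50, 29),   # 58%
    ("hearing devices", 50, 36),                  # 72%
    ("auditory/balance rehabilitation", 50, 40),  # 80%
    ("communication disorders", 50, 43),          # 86%
]

total_q = sum(q for _, q, _ in subjects)        # 299 questions in total
total_correct = sum(c for _, _, c in subjects)
overall = round(100 * total_correct / total_q)  # ~75%, above the 60% bar
passed = overall >= 60                          # the exam's passing criterion
```

This reproduces the abstract's totals: 299 questions and an overall accuracy of about 75%, comfortably above the 60% passing threshold.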
Affiliation(s)
- Shangqiguo Wang
- Human Communication, Learning, and Development Unit, Faculty of Education, The University of Hong Kong, Hong Kong, China (Hong Kong)
- Changgeng Mo
- Department of Otorhinolaryngology, Head and Neck Surgery, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China (Hong Kong)
- Yuan Chen
- Department of Special Education and Counselling, The Education University of Hong Kong, Hong Kong, China (Hong Kong)
- Xiaolu Dai
- Department of Social Work, Hong Kong Baptist University, Hong Kong, China (Hong Kong)
- Huiyi Wang
- Department of Medical Services, Children’s Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiaoli Shen
- Department of Health and Early Childhood Care, Ningbo College of Health School, Ningbo, China
20
Sarma G, Kashyap H, Medhi PP. ChatGPT in Head and Neck Oncology-Opportunities and Challenges. Indian J Otolaryngol Head Neck Surg 2024; 76:1425-1429. [PMID: 38440617] [PMCID: PMC10908741] [DOI: 10.1007/s12070-023-04201-6]
Abstract
Head and neck oncology represents a complex and challenging field, encompassing the diagnosis, treatment, and management of various malignancies affecting the intricate anatomical structures of the head and neck region. With advancements in artificial intelligence (AI), chatbot applications have emerged as a promising tool to revolutionize the field of head and neck oncology. ChatGPT is a cutting-edge language model developed by OpenAI that can help the oncologist in the clinic with scheduling appointments, establishing a clinical diagnosis, making a treatment plan, and following up. ChatGPT also plays an essential role in telemedicine consultations, medical documentation, scientific writing, and research. ChatGPT carries inherent drawbacks too: it raises significant ethical concerns related to authorship, accountability, transparency, bias, and the potential for misinformation. ChatGPT's training data is limited to September 2021; thus, regular updates are required to keep pace with rapidly evolving medical research and advancements. Therefore, a judicious approach to using ChatGPT is of utmost importance. Head and neck oncologists can reap the maximum benefit of this technology in terms of patient care, education, and research to improve clinical outcomes.
Affiliation(s)
- Gautam Sarma
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
- Hrishikesh Kashyap
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
- Partha Pratim Medhi
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
21
Yapar D, Demir Avcı Y, Tokur Sonuvar E, Eğerci ÖF, Yapar A. ChatGPT's potential to support home care for patients in the early period after orthopedic interventions and enhance public health. Jt Dis Relat Surg 2024; 35:169-176. [PMID: 38108178] [PMCID: PMC10746912] [DOI: 10.52312/jdrs.2023.1402]
Abstract
OBJECTIVES This study presents the first investigation into the potential of ChatGPT to provide medical consultation for patients undergoing orthopedic interventions, with the primary objective of evaluating ChatGPT's effectiveness in supporting patient self-management during the essential early recovery phase at home. MATERIALS AND METHODS Seven scenarios, representative of common situations in orthopedics and traumatology, were presented to ChatGPT version 4.0 to obtain advice. These scenarios and ChatGPT's responses were then evaluated by 68 expert orthopedists (67 males, 1 female; mean age: 37.9±5.9 years; range, 30 to 59 years), 40 of whom had at least four years of orthopedic experience, while 28 were associate or full professors. Expert orthopedists used a rubric on a scale of 1 to 5 to evaluate ChatGPT's advice based on accuracy, applicability, comprehensiveness, and clarity. Those who gave ChatGPT a score of 4 or higher considered its performance as above average or excellent. RESULTS In all scenarios, the median evaluation scores were at least 4 across accuracy, applicability, comprehensiveness, and communication. As for mean scores, accuracy was the highest-rated dimension at 4.2±0.8, while mean comprehensiveness was slightly lower at 3.9±0.8. Orthopedist characteristics, such as academic title and prior use of ChatGPT, did not influence their evaluation (all p>0.05). Across all scenarios, ChatGPT demonstrated an accuracy of 79.8%, with applicability at 75.2%, comprehensiveness at 70.6%, and a 75.6% rating for communication clarity. CONCLUSION This study emphasizes ChatGPT's strengths in accuracy and applicability for home care after orthopedic intervention but underscores a need for improved comprehensiveness. This focused evaluation not only sheds light on ChatGPT's potential in specialized medical advice but also suggests its potential to play a broader role in the advancement of public health.
Affiliation(s)
- Aliekber Yapar
- Antalya Eğitim ve Araştırma Hastanesi, Ortopedi ve Travmatoloji Kliniği, 07100 Muratpaşa, Antalya, Türkiye.
22
Ojeda Meixueiro VH, Pérez-Campos Mayoral L, Hernández Huerta MT, Matias-Cervantes CA, Pérez Campos Mayoral E, Cruz Parada E, Pérez-Campos E. Relevance of a Customized Version of ChatGPT Explaining Laboratory Test Results in Patient Education. Journal of Medical Education and Curricular Development 2024; 11:23821205241260239. [PMCID: PMC11268000] [DOI: 10.1177/23821205241260239]
Abstract
ChatGPT is an artificial intelligence (AI) chatbot application. In this study, we explore the creation and use of a customized version of ChatGPT designed specifically for patient education, called "Lab Explainer." Lab Explainer aims to simplify and clarify the results of complex laboratory tests for patients; using the sophisticated natural-language-processing capabilities of AI, it analyses various laboratory test data and provides clear explanations and contextual information. The approach involved adapting OpenAI's ChatGPT model specifically to analyze laboratory test data. The results suggest that Lab Explainer has the potential to improve patient understanding by interpreting laboratory tests. In conclusion, Lab Explainer can assist patient education by providing intelligible interpretations of laboratory tests.
Affiliation(s)
- Laura Pérez-Campos Mayoral
- Centro de Investigación Facultad de Medicina UNAM-UABJO, Facultad de Medicina y Cirugía, Universidad Autónoma “Benito Juárez” de Oaxaca, Oaxaca, México
- Eduardo Pérez Campos Mayoral
- Centro de Investigación Facultad de Medicina UNAM-UABJO, Facultad de Medicina y Cirugía, Universidad Autónoma “Benito Juárez” de Oaxaca, Oaxaca, México
- Eduardo Pérez-Campos
- Tecnológico Nacional de México/IT Oaxaca, Oaxaca, México
- Laboratorio de Patología Clínica “Dr Eduardo Pérez Ortega”, Oaxaca, México
23
Sperlich B, Düking P, Leppich R, Holmberg HC. Strengths, weaknesses, opportunities, and threats associated with the application of artificial intelligence in connection with sport research, coaching, and optimization of athletic performance: a brief SWOT analysis. Front Sports Act Living 2023; 5:1258562. [PMID: 37920303 PMCID: PMC10618674 DOI: 10.3389/fspor.2023.1258562]
Abstract
Here, we performed a non-systematic analysis of the strengths, weaknesses, opportunities, and threats (SWOT) associated with the application of artificial intelligence (AI) to sports research, coaching, and optimization of athletic performance. The strengths of AI in these areas involve the automation of time-consuming tasks, the processing and analysis of large amounts of data, and the recognition of complex patterns and relationships. However, it is also essential to be aware of the weaknesses associated with integrating AI into this field. For instance, it is imperative that the data employed to train an AI system be diverse and complete, and as unbiased as possible with respect to factors such as the gender, level of performance, and experience of an athlete. Other challenges include limited adaptability to novel situations and the cost and other resources required. Opportunities include the possibility of monitoring athletes both long-term and in real time, the potential discovery of novel indicators of performance, and the prediction of risk for future injury. Leveraging these opportunities can transform athletic development and the practice of sports science in general. Threats include over-dependence on technology, reduced involvement of human expertise, risks to data privacy, breaches of data integrity and manipulation of data, and resistance to adopting such new technology. Understanding and addressing these SWOT factors is essential for maximizing the benefits of AI while mitigating its risks, thereby paving the way for its successful integration into sport science research, coaching, and optimization of athletic performance.
Affiliation(s)
- Billy Sperlich
- Integrative and Experimental Training Science, Institute of Sport Sciences, University of Würzburg, Würzburg, Germany
- Peter Düking
- Department of Sports Science and Movement Pedagogy, Technische Universität Braunschweig, Braunschweig, Germany
- Robert Leppich
- Software Engineering Group, Department of Computer Science, University of Würzburg, Würzburg, Germany
- Hans-Christer Holmberg
- Department of Health Sciences, Luleå University of Technology, Luleå, Sweden
- Department of Physiology and Pharmacology, Biomedicum C5, Karolinska Institutet, Stockholm, Sweden
24
Yanagita Y, Yokokawa D, Uchida S, Tawara J, Ikusaka M. Accuracy of ChatGPT on Medical Questions in the National Medical Licensing Examination in Japan: Evaluation Study. JMIR Form Res 2023; 7:e48023. [PMID: 37831496 PMCID: PMC10612006 DOI: 10.2196/48023]
Abstract
BACKGROUND ChatGPT (OpenAI) has gained considerable attention because of its natural and intuitive responses. ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers, a limitation stated by OpenAI itself. However, considering that ChatGPT is an interactive AI that has been trained to reduce the output of unethical sentences, the reliability of the training data is high and the usefulness of the output content is promising. In March 2023, a new version of ChatGPT, GPT-4, was released, which, according to internal evaluations, was expected to increase the likelihood of producing factual responses by 40% compared with its predecessor, GPT-3.5. The usefulness of this version of ChatGPT in English is widely appreciated, and it is also increasingly being evaluated as a system for obtaining medical information in languages other than English. Although it does not reach a passing score on the national medical examination in Chinese, its accuracy is expected to gradually improve. Evaluation of ChatGPT with Japanese input is limited, although there have been reports on the accuracy of ChatGPT's answers to clinical questions regarding the Japanese Society of Hypertension guidelines and on its performance on the National Nursing Examination. OBJECTIVE The objective of this study is to evaluate whether ChatGPT can provide accurate diagnoses and medical knowledge for Japanese input. METHODS Questions from the National Medical Licensing Examination (NMLE) in Japan, administered by the Japanese Ministry of Health, Labour and Welfare in 2022, were used. All 400 questions were considered; questions containing figures and tables, which ChatGPT could not recognize, were excluded, and only text questions were extracted. We input the Japanese questions into GPT-3.5 and GPT-4 as written and asked each model to output the correct answer for each question. The output of ChatGPT was verified by 2 general practice physicians. In case of discrepancies, the answers were checked by another physician to make a final decision. Overall performance was evaluated by calculating the percentage of correct answers output by GPT-3.5 and GPT-4. RESULTS Of the 400 questions, 292 were analyzed after exclusion of chart-containing questions, which ChatGPT does not support. The correct response rate for GPT-4 was 81.5% (237/292), significantly higher than the rate for GPT-3.5 at 42.8% (125/292). Moreover, GPT-4 surpassed the passing standard (>72%) for the NMLE, indicating its potential as a diagnostic and therapeutic decision aid for physicians. CONCLUSIONS GPT-4 reached the passing standard for the NMLE in Japan when questions were entered in Japanese, although this is limited to written questions. As the accelerated progress of the past few months has shown, the performance of the AI will improve as the large language model continues to learn, and it may well become a decision support system for medical professionals by providing more accurate information.
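The headline figures above reduce to simple rate arithmetic against the exam's passing standard; a minimal sketch (function names are ours, counts taken from the abstract):

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Correct-answer rate as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

def passes_nmle(correct: int, total: int, threshold: float = 72.0) -> bool:
    """True if the correct-answer rate clears the passing standard (percent)."""
    return accuracy_pct(correct, total) > threshold

# Counts reported in the abstract: 292 text-only questions analyzed.
print(accuracy_pct(125, 292))   # GPT-3.5 -> 42.8
print(passes_nmle(237, 292))    # GPT-4 clears the >72% standard -> True
```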
Affiliation(s)
- Yasutaka Yanagita
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Daiki Yokokawa
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Shun Uchida
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Junsuke Tawara
- Department of Internal Medicine, Sanmu Medical Center, Chiba, Japan
- Masatomi Ikusaka
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
25
Li J, Zhong J, Li Z, Xiao Y, Wang S. Ectopic Pituitary Neuroendocrine Tumor: A Case Report Written With the Help of ChatGPT. Cureus 2023; 15:e46999. [PMID: 37965416 PMCID: PMC10641033 DOI: 10.7759/cureus.46999]
Abstract
ChatGPT (Chat Generative Pre-trained Transformer) is a chat robot program developed by OpenAI (OpenAI Incorporated, Mission District, San Francisco, USA). It is a natural-language processing tool driven by artificial intelligence technology that can complete tasks such as writing emails, video scripts, copywriting, translation, coding, and papers. Several studies have been conducted on the use of this tool for academic writing, so we report a rare case of a growth hormone-secreting suprasellar ectopic pituitary neuroendocrine tumor (PitNET) in a 53-year-old female patient with facial changes and hypertrophy of her hands and feet for six years. When asked about unusual topics, ChatGPT can provide answers but not specific ones, and, as with most machine learning solutions, verifying the correctness of its information is a problem. Question asked of ChatGPT: Summarize ectopic pituitary neuroendocrine tumor. "Suprasellar heterotopic pituitary neuroendocrine tumors refer to tumors formed by the heterotopic growth of pituitary neuroendocrine tumor cells in the sellar region other than the pituitary glands. This tumor is common in middle-aged women, and it often manifests as headache, visual impairment, and endocrine dysfunction. Diagnosis is usually made by combining history, physical examination, blood tests, and imaging. Treatment is primarily performed through surgical resection of the tumor, radiotherapy, and drug therapy, and the prognosis is mostly good." The content in quotation marks is generated by ChatGPT.
Affiliation(s)
- Jun Li
- Neurosurgery, Department of Neurosurgery, Fuzhou 900th Hospital, Fuzong Clinical Medical College of Fujian Medical University, Fuzhou, CHN
- Jiansheng Zhong
- Neurosurgery, Department of Neurosurgery, Fuzhou 900th Hospital, Fuzong Clinical Medical College of Fujian Medical University, Fuzhou, CHN
- Ziqi Li
- Neurosurgery, Department of Neurosurgery, Oriental Hospital Affiliated to Xiamen University, Fuzhou, CHN
- Yong Xiao
- Neurosurgery, Central Institute for Mental Health, University of Heidelberg, Heidelberg, DEU
- Shousen Wang
- Neurosurgery, Department of Neurosurgery, Oriental Hospital Affiliated to Xiamen University, Fuzhou, CHN
26
Irfan B, Yaqoob A. ChatGPT's Epoch in Rheumatological Diagnostics: A Critical Assessment in the Context of Sjögren's Syndrome. Cureus 2023; 15:e47754. [PMID: 38022092 PMCID: PMC10676288 DOI: 10.7759/cureus.47754]
Abstract
INTRODUCTION The rise of artificial intelligence in medical practice is reshaping clinical care. Large language models (LLMs) like ChatGPT have the potential to assist in rheumatology by personalizing scientific information retrieval, particularly in the context of Sjögren's Syndrome. This study aimed to evaluate the efficacy of ChatGPT in providing insights into Sjögren's Syndrome and differentiating it from other rheumatological conditions. MATERIALS AND METHODS A database of peer-reviewed articles and clinical guidelines focused on Sjögren's Syndrome was compiled. Clinically relevant questions were presented to ChatGPT, with responses assessed for accuracy, relevance, and comprehensiveness. Techniques such as blinding, random control queries, and temporal analysis ensured unbiased evaluation. ChatGPT's responses were also assessed using the 15-question DISCERN tool. RESULTS ChatGPT effectively highlighted key immunopathological and histopathological characteristics of Sjögren's Syndrome, though some crucial data and citation inconsistencies were noted. For a given clinical vignette, ChatGPT correctly identified potential etiological considerations, with Sjögren's Syndrome being prominent. DISCUSSION LLMs like ChatGPT offer rapid access to vast amounts of data, beneficial for both patients and providers. While they democratize information, limitations such as potential oversimplification and reference inaccuracies were observed. The balance between LLM insights and clinical judgment, as well as continuous model refinement, is crucial. CONCLUSION LLMs like ChatGPT offer significant potential in rheumatology, providing swift and broad medical insights. However, a cautious approach is vital, ensuring rigorous training and ethical application for optimal patient care and clinical practice.
Affiliation(s)
- Bilal Irfan
- Microbiology and Immunology, University of Michigan, Ann Arbor, USA
27
Garg RK, Urs VL, Agarwal AA, Chaudhary SK, Paliwal V, Kar SK. Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review. Health Promot Perspect 2023; 13:183-191. [PMID: 37808939 PMCID: PMC10558973 DOI: 10.34172/hpp.2023.22]
Abstract
BACKGROUND ChatGPT is an artificial intelligence-based tool developed by OpenAI (California, USA). This systematic review examines the potential of ChatGPT in patient care and its role in medical research. METHODS The systematic review was conducted according to the PRISMA guidelines. The Embase, Scopus, PubMed, and Google Scholar databases were searched, as were preprint databases. The search aimed to identify all kinds of publications, without any restrictions, on ChatGPT and its application in medical research, medical publishing, and patient care, using the search term "ChatGPT". We reviewed all publication types, including original articles, reviews, editorials/commentaries, and letters to the editor. Each selected record was analysed using ChatGPT, and the responses generated were compiled in a table. The Word table was converted to a PDF and further analysed using ChatPDF. RESULTS We reviewed the full texts of 118 articles. ChatGPT can assist with patient enquiries, note writing, decision-making, trial enrolment, data management, decision support, research support, and patient education. However, the solutions it offers are often insufficient and contradictory, raising questions about their originality, privacy, correctness, bias, and legality. Because it lacks human-like qualities, ChatGPT's legitimacy as an author is questioned when it is used for academic writing, and ChatGPT-generated content raises concerns about bias and possible plagiarism. CONCLUSION Although ChatGPT can help with patient treatment and research, there are issues with accuracy, authorship, and bias. ChatGPT can serve as a "clinical assistant" and aid research and scholarly writing.
Affiliation(s)
- Vijeth L Urs
- Department of Neurology, King George’s Medical University, Lucknow, India
- Vimal Paliwal
- Department of Neurology, Sanjay Gandhi Institute of Medical Sciences, Lucknow, India
- Sujita Kumar Kar
- Department of Psychiatry, King George’s Medical University, Lucknow, India
28
Abstract
INTRODUCTION This study evaluates ChatGPT's knowledge about osteoporosis. METHODS Osteoporosis-related frequently asked questions (FAQs) were created by examining the websites frequently visited by patients, the official websites of hospitals, and social media. Questions based on scientific data were prepared in accordance with the National Osteoporosis Guideline Group guidelines. Raters scored each ChatGPT answer from 1 to 4 (1, the information was completely correct; 2, the information was correct but insufficient; 3, the answer contained both correct and incorrect information; and 4, the answer consisted of completely incorrect information). The reproducibility of ChatGPT's responses on osteoporosis was assessed by asking each question twice; an answer was considered repeatable if it received the same score both times. RESULTS ChatGPT responded to 72 FAQs with an accuracy rate of 80.6%. The highest accuracy in ChatGPT's answers about osteoporosis was in the prevention category (91.7%), followed by the general knowledge category (85.8%). Only 19 of the 31 (61.3%) questions prepared according to the National Osteoporosis Guideline Group guidelines were answered correctly by ChatGPT, and two answers (6.4%) were categorized as grade 4. The reproducibility rate of ChatGPT's answers was 86.1% for the 72 FAQs and 83.9% for the National Osteoporosis Guideline Group questions. CONCLUSION The present study showed for the first time that ChatGPT provided adequate answers to more than 80% of FAQs about osteoporosis. However, the accuracy of ChatGPT's answers to inquiries based on the National Osteoporosis Guideline Group guidelines decreased to 61.3%.
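The repeatability criterion described above (an answer counts as reproducible if it receives the same 1-4 grade on both passes) can be sketched as follows; the grade pairs here are hypothetical, for illustration only:

```python
# Hypothetical grades for eight questions, each asked twice; grades run 1-4
# (1 = completely correct ... 4 = completely incorrect), per the rubric above.
first_pass  = [1, 1, 2, 3, 1, 2, 4, 1]
second_pass = [1, 2, 2, 3, 1, 2, 4, 2]

def reproducibility(a, b):
    """Share of questions receiving the same grade on both passes, in percent."""
    same = sum(x == y for x, y in zip(a, b))
    return round(100 * same / len(a), 1)

print(reproducibility(first_pass, second_pass))  # 6 of 8 matched -> 75.0
```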
Affiliation(s)
- Cigdem Cinar
- Department of Interventional Physiatry, Biruni University, Istanbul, TUR
29
Bhatt P, Sethi A, Tasgaonkar V, Shroff J, Pendharkar I, Desai A, Sinha P, Deshpande A, Joshi G, Rahate A, Jain P, Walambe R, Kotecha K, Jain NK. Machine learning for cognitive behavioral analysis: datasets, methods, paradigms, and research directions. Brain Inform 2023; 10:18. [PMID: 37524933 PMCID: PMC10390406 DOI: 10.1186/s40708-023-00196-6]
Abstract
Human behaviour reflects cognitive abilities. Human cognition is fundamentally linked to different experiences or characteristics of consciousness and emotion, such as joy, grief, and anger, which assist in effective communication with others. Detecting and differentiating between thoughts, feelings, and behaviours is paramount in learning to control our emotions and respond more effectively in stressful circumstances. The ability to perceive, analyse, process, interpret, remember, and retrieve information while making judgments in order to respond correctly is referred to as cognitive behaviour. Having made a significant mark in emotion analysis, the field is now turning to deception detection as a key area connecting human behaviour, mainly in the forensic domain. The detection of lies, deception, malicious intent, abnormal behaviour, emotions, stress, etc., plays a significant role in the advanced stages of behavioural science. Artificial intelligence and machine learning (AI/ML) have helped a great deal in pattern recognition, data extraction and analysis, and interpretation. The goal of using AI and ML in the behavioural sciences is to infer human behaviour, mainly for mental health or forensic investigations. The presented work provides an extensive review of research on cognitive behaviour analysis. A parametric study is presented based on different physical characteristics, emotional behaviours, data-collection sensing mechanisms, unimodal and multimodal datasets, AI/ML modelling methods, challenges, and future research directions.
Affiliation(s)
- Priya Bhatt
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Amanrose Sethi
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Vaibhav Tasgaonkar
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Jugal Shroff
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Isha Pendharkar
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Aditya Desai
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Pratyush Sinha
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Aditya Deshpande
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Gargi Joshi
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Anil Rahate
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India
- Priyanka Jain
- Centre for Development of Advanced Computing (C-DAC), Delhi, India
- Rahee Walambe
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India.
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International Deemed University, Pune, India.
- Ketan Kotecha
- Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India.
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International Deemed University, Pune, India.
- UCSI University, Kuala Lumpur, Malaysia.
- N K Jain
- Centre for Development of Advanced Computing (C-DAC), Delhi, India
30
Grewal H, Dhillon G, Monga V, Sharma P, Buddhavarapu VS, Sidhu G, Kashyap R. Radiology Gets Chatty: The ChatGPT Saga Unfolds. Cureus 2023; 15:e40135. [PMID: 37425598 PMCID: PMC10329466 DOI: 10.7759/cureus.40135]
Abstract
As artificial intelligence (AI) continues to evolve and mature, it is increasingly finding applications in healthcare, particularly in specialties like radiology that are data-heavy and image-focused. Large language models (LLMs) such as OpenAI's Generative Pre-trained Transformer-4 (GPT-4) are new to the field of medicine, and given their novelty, there is a paucity of literature regarding their possible utilities. We aim to present an in-depth exploration of the role of GPT-4, an advanced language model, in radiology. We gave the GPT-4 model prompts covering report generation, template generation, clinical decision support, suggesting captivating titles for research articles, patient communication, and education; its output can occasionally be quite generic and, at times, factually incorrect, which could lead to errors. The responses were then analyzed in detail for their potential utility in the day-to-day radiologist workflow, patient education, and research processes. Further research is required to evaluate LLMs' accuracy and safety in clinical practice and to develop comprehensive guidelines for their implementation.
Affiliation(s)
- Harpreet Grewal
- Radiology, Florida State University College of Medicine, Pensacola, USA
- Gagandeep Dhillon
- Internal Medicine, Baltimore Washington Medical Center, Glen Burnie, USA
- Pranjal Sharma
- Nephrology, Northeast Ohio Medical University, Rootstown, USA
- Rahul Kashyap
- Medicine, Drexel University College of Medicine, Philadelphia, USA
- Global Clinical Scholars Research Training, Harvard Medical School, Boston, USA
- Research, Global Remote Research Scholars Program, Saint Paul, USA
- Critical Care Medicine, Mayo Clinic, Rochester, USA