1
Mace MI, Lala-Trindade A, Fendler TJ, Sauer AJ. Emerging use of pulmonary artery and cardiac pressure sensing technology in the management of worsening heart failure events. Heart Fail Rev 2025:10.1007/s10741-025-10513-2. [PMID: 40343668 DOI: 10.1007/s10741-025-10513-2] [Accepted: 02/19/2025] [Indexed: 05/11/2025]
Abstract
Unplanned admissions for worsening heart failure (WHF) are the largest resource cost in heart failure (HF) management. Despite advances in pharmacological agents and interventional therapy, HF remains a global epidemic. One crucial, and costly, gap in HF management is the inability to obtain objective information to identify and quantify congestion and personalize treatment plans to effectively manage WHF events without resorting to expensive, invasive methods. Although the causes of WHF are varied and complex, the universal effect of HF decompensation is the significant decline in quality of life due to symptoms of hypervolemic congestion and the resultant reduction in cardiac output, which can be quantified via increased pulmonary venous congestion due to high intracardiac filling pressures. Accessible and reliable markers of congestion could more precisely quantify the severity of WHF events and stabilize patients earlier by interrupting and reversing this process with timely introduction or modification of evidence-based treatments. Pulmonary artery and cardiac pressure sensing tools have gained evidential credence and increased clinical uptake in recent years for the prevention and treatment of WHF, as studies of implantable hemodynamic devices have iteratively and reliably demonstrated substantial reductions in WHF events. Recent advances in sensing technologies have ranged from single-parameter invasive pulmonary artery monitors to completely non-invasive multi-parameter devices incorporating multi-sensor concept technologies aided by machine learning or artificial intelligence, although many remain investigational. This review aims to evaluate the potential for novel pulmonary artery and cardiac pressure sensing technology to reshape the management of WHF from within the hospitalized and ambulatory care environments.
Affiliation(s)
- Matthew I Mace
- Academy for Health Care Science (AHCS), 6 The Terrace, Rugby Road, Lutterworth, Leicestershire, LE17 4BW, UK.
- , 54 State St, STE 804 #13308, Albany, NY, 12207, USA.
- Anuradha Lala-Trindade
- Zena and Michael A. Wiener Cardiovascular Institute and Department of Population Health Science and Policy, Mount Sinai, New York, NY, USA
- Timothy J Fendler
- Saint Luke's Mid America Heart Institute, Kansas City, MO, USA
- University of Missouri-Kansas City, Kansas City, MO, USA
- Andrew J Sauer
- Saint Luke's Mid America Heart Institute, Kansas City, MO, USA
- University of Missouri-Kansas City, Kansas City, MO, USA
2
Alessandro L, Crema S, Castiglione JI, Dossi D, Eberbach F, Kohler A, Laffue A, Marone A, Nagel V, Pastor Rueda JM, Varela F, Fernandez Slezak D, Rodríguez Murúa S, Debasa C, Claudio P, Farez MF. Validation of an Artificial Intelligence-Powered Virtual Assistant for Emergency Triage in Neurology. Neurologist 2025; 30:155-163. [PMID: 39912331 DOI: 10.1097/nrl.0000000000000594] [Indexed: 02/07/2025]
Abstract
OBJECTIVES Neurological emergencies pose significant challenges in medical care in resource-limited countries. Artificial intelligence (AI), particularly health chatbots, offers a promising solution. Rigorous validation is required to ensure safety and accuracy. Our objective is to evaluate the diagnostic safety and effectiveness of an AI-powered virtual assistant (VA) designed for the triage of neurological pathologies. METHODS The performance of an AI-powered VA for emergency neurological triage was tested. Ten patients over 18 years old with urgent neurological pathologies were selected. In the first stage, 9 neurologists assessed the safety of the VA using their clinical records. In the second stage, the assistant's accuracy when used by patients was evaluated. Finally, VA performance was compared with ChatGPT 3.5 and 4. RESULTS In stage 1, neurologists agreed with the VA in 98.5% of the cases for syndromic diagnosis, and in all cases, the definitive diagnosis was among the top 5 differentials. In stage 2, neurologists agreed with all diagnostic parameters and recommendations suggested by the assistant to patients. The average use time was 5.5 minutes (average of 16.5 questions). The VA showed superiority over both versions of ChatGPT in all evaluated diagnostic and safety aspects (P<0.0001). In 57.8% of the evaluations, neurologists rated the VA as "excellent" (suggesting adequate utility). CONCLUSIONS In this study, the VA showcased promising diagnostic accuracy and user satisfaction, bolstering confidence in further development. These outcomes encourage proceeding to a comprehensive phase 1/2 trial with 100 patients to thoroughly assess its "real-time" application in emergency neurological triage.
Affiliation(s)
- Diego Fernandez Slezak
- Entelai
- Department of Computing, Faculty of Exact and Natural Sciences, University of Buenos Aires (UBA)
- Institute of Research in Computer Science (ICC), CONICET-UBA, Buenos Aires, Argentina
- Mauricio F Farez
- Center for Research in Neuroimmunological Diseases (CIEN), Fleni
- Entelai
3
Lin C, Kuo CF. Roles and Potential of Large Language Models in Healthcare: A Comprehensive Review. Biomed J 2025:100868. [PMID: 40311872 DOI: 10.1016/j.bj.2025.100868] [Received: 11/20/2024] [Revised: 04/14/2025] [Accepted: 04/28/2025] [Indexed: 05/03/2025]
Abstract
Large Language Models (LLMs) are capable of transforming healthcare by demonstrating remarkable capabilities in language understanding and generation. They have matched or surpassed human performance in standardized medical examinations and assisted in diagnostics across specialties like dermatology, radiology, and ophthalmology. LLMs can enhance patient education by providing accurate, readable, and empathetic responses, and they can streamline clinical workflows through efficient information extraction from unstructured data such as clinical notes. Integrating LLMs into clinical practice involves user interface design, clinician training, and effective collaboration between Artificial Intelligence (AI) systems and healthcare professionals. Users must possess a solid understanding of generative AI and domain knowledge to assess the generated content critically. Ethical considerations, including patient privacy, data security, bias mitigation, and transparency, are critical for responsible deployment. Future directions for LLMs in healthcare include interdisciplinary collaboration, developing new benchmarks that incorporate safety and ethical measures, advancing multimodal LLMs that integrate text and imaging data, creating LLM-based medical agents capable of complex decision-making, addressing underrepresented specialties like rare diseases, and integrating LLMs with robotic systems to enhance precision in procedures. Emphasizing patient safety, ethical integrity, and human-centered implementation is essential for maximizing the benefits of LLMs, while mitigating potential risks, thereby helping to ensure that these AI tools enhance rather than replace human expertise and compassion in healthcare.
Affiliation(s)
- Chihung Lin
- Center for Artificial Intelligence in Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Chang-Fu Kuo
- Center for Artificial Intelligence in Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan; Division of Rheumatology, Allergy, and Immunology, Chang Gung Memorial Hospital, Taoyuan, Taiwan; Division of Rheumatology, Orthopaedics and Dermatology, School of Medicine, University of Nottingham, Nottingham, UK.
4
Srivastava SP, Chauhan S, Singh A, Tiwari SK, Kudi SR, Gupta A. Nursing students' attitudes toward cross-gender care: a cross-sectional study. BMC Res Notes 2025; 18:191. [PMID: 40269962 PMCID: PMC12020158 DOI: 10.1186/s13104-025-07254-8] [Received: 08/15/2024] [Accepted: 04/11/2025] [Indexed: 04/25/2025]
Abstract
OBJECTIVES This study aimed to assess nursing students' attitudes toward providing cross-gender care and to identify the factors influencing these attitudes. RESULTS A cross-sectional study of 338 nursing students in Northern India found that over half (50.9%) had unfavorable attitudes toward cross-gender care. Female students demonstrated significantly more positive attitudes than male students (p < 0.01). Multiple regression analysis identified gender (β = 0.246, p < 0.001), academic year (β = -0.150, p = 0.009), and prior experience with cross-gender care (β = 0.100, p = 0.048) as significant predictors of attitudes. The regression model explained 19.9% of the variance in attitudes (R² = 0.199, p < 0.001). Male students expressed concerns about providing quality physical care and emotional support for patients and felt inadequately prepared to provide physical and emotional support to opposite-gender patients. These findings highlight the need for enhanced gender-sensitive training in nursing education to improve attitudes and competencies in cross-gender care provision.
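As an aside on the effect sizes reported above: in the special case of a single standardized predictor, the standardized β coefficient reduces to Pearson's r and the variance explained is r². The sketch below uses hypothetical scores (not the study's data) to illustrate that relationship; the study's model has several predictors, so its β values are partial coefficients and this is only the simplest case.

```python
# With one standardized predictor, standardized beta = Pearson's r
# and R^2 = r^2. Scores below are hypothetical, not the study's data.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

experience = [1, 2, 3, 4]   # hypothetical prior-experience scores
attitude = [1, 3, 2, 4]     # hypothetical attitude scores
r = pearson_r(experience, attitude)
print(round(r, 3), round(r ** 2, 3))  # → 0.8 0.64
```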
Affiliation(s)
- Saumya P Srivastava
- College of Nursing, Dr. Ram Manohar Lohia Institute of Medical Sciences, Lucknow, Uttar Pradesh, India
- Soni Chauhan
- Yatharth Nursing College and Paramedical Institute, Chandauli, Uttar Pradesh, India
- Anuj Singh
- Career College of Nursing, Lucknow, Uttar Pradesh, India
- Surya Kant Tiwari
- College of Nursing, All India Institute of Medical Sciences, Raebareli, Uttar Pradesh, India.
- Surat Ram Kudi
- College of Nursing, All India Institute of Medical Sciences, Vijaypur, Jammu, India
- Anchal Gupta
- Faculty of Nursing, Uttar Pradesh University of Medical Sciences, Saifai, Etawah, Uttar Pradesh, India
5
Suárez A, Arena S, Herranz Calzada A, Castillo Varón AI, Diaz-Flores García V, Freire Y. Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment. Comput Struct Biotechnol J 2025; 28:141-147. [PMID: 40271108 PMCID: PMC12017887 DOI: 10.1016/j.csbj.2025.04.010] [Received: 02/24/2025] [Revised: 04/08/2025] [Accepted: 04/09/2025] [Indexed: 04/25/2025]
Abstract
The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) for assessment of lower third molars, simulating real patient requests for tooth extraction. Thirty OPGs were analyzed, each paired with a standardized prompt submitted to ChatGPT-4o, generating 900 responses (30 per radiograph). Two oral surgery experts independently evaluated the responses using a three-point Likert scale (correct, partially correct/incomplete, incorrect), with disagreements resolved by a third expert. ChatGPT-4o achieved an accuracy rate of 38.44% (95% CI: 35.27%-41.62%). The percentage agreement among repeated responses was 82.7%, indicating high consistency, though Gwet's coefficient of agreement (60.4%) suggested only moderate repeatability. While the model correctly identified general features in some cases, it frequently provided incomplete or fabricated information, particularly in complex radiographs involving overlapping structures or underdeveloped roots. These findings highlight ChatGPT-4o's current limitations in dental radiographic interpretation. Although it demonstrated some capability in analyzing OPGs, its accuracy and reliability remain insufficient for unsupervised clinical use. Professional oversight is essential to prevent diagnostic errors. Further refinement and specialized training of AI models are needed to enhance their performance and ensure safe integration into dental practice, especially in patient-facing applications.
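The reported confidence interval is consistent with a normal-approximation (Wald) interval on 346 correct responses out of the 900 generated; the abstract does not state the authors' exact method, so the following is an illustrative check rather than their code:

```python
# Reproducing the reported accuracy CI (38.44%, 95% CI 35.27%-41.62%)
# with a Wald interval: p ± 1.96 * sqrt(p(1-p)/n). Assumes 346/900
# correct responses, which matches the reported point estimate.
from math import sqrt

correct, total = 346, 900
p = correct / total
half_width = 1.96 * sqrt(p * (1 - p) / total)
lo, hi = p - half_width, p + half_width
print(f"{p:.2%} (95% CI: {lo:.2%}-{hi:.2%})")  # → 38.44% (95% CI: 35.27%-41.62%)
```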
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
- Stefania Arena
- Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
- Alberto Herranz Calzada
- Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
- Department of Pre-Clinic Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
- Ana Isabel Castillo Varón
- Department of Medicine. Faculty of Medicine, Health and Sports. Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
- Victor Diaz-Flores García
- Department of Pre-Clinic Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
6
Khalaf WS, Morgan RN, Elkhatib WF. Clinical microbiology and artificial intelligence: Different applications, challenges, and future prospects. J Microbiol Methods 2025; 232-234:107125. [PMID: 40188989 DOI: 10.1016/j.mimet.2025.107125] [Received: 11/07/2024] [Revised: 03/10/2025] [Accepted: 04/03/2025] [Indexed: 04/10/2025]
Abstract
Conventional clinical microbiological techniques are being enhanced by the introduction of artificial intelligence (AI). Comprehensive data processing and analysis have enabled the development of curated datasets that have been effectively used to train different AI algorithms. Recently, a number of machine learning (ML) and deep learning (DL) algorithms have been developed and evaluated using diverse microbiological datasets. These datasets include spectral analyses (Raman and MALDI-TOF spectroscopy), microscopic images (Gram and acid-fast stains), and genomic and protein sequences (whole-genome sequencing (WGS) and protein data banks (PDBs)). The primary objective of these algorithms is to minimize the time, effort, and expenses linked to conventional analytical methods. Furthermore, AI algorithms have been incorporated with quantitative structure-activity relationship (QSAR) models to predict novel antimicrobial agents that address the continuing surge of antimicrobial resistance. During the COVID-19 pandemic, AI algorithms played a crucial role in vaccine development and the discovery of new antiviral agents, and introduced potential drug candidates via drug repurposing. However, despite their significant benefits, the implementation of AI encounters various challenges, including ethical considerations, the potential for bias, and errors related to data training. This review seeks to provide an overview of the most recent applications of artificial intelligence in clinical microbiology, with the intention of educating a wider audience of clinical practitioners regarding the current uses of machine learning algorithms and encouraging their implementation. Furthermore, it discusses the challenges related to the incorporation of AI into clinical microbiology laboratories and examines future opportunities for AI within the realm of infectious disease epidemiology.
Affiliation(s)
- Wafaa S Khalaf
- Department of Microbiology and Immunology, Faculty of Pharmacy (Girls), Al-Azhar University, Nasr city, Cairo 11751, Egypt.
- Radwa N Morgan
- National Centre for Radiation Research and Technology (NCRRT), Drug Radiation Research Department, Egyptian Atomic Energy Authority (EAEA), Cairo 11787, Egypt.
- Walid F Elkhatib
- Department of Microbiology & Immunology, Faculty of Pharmacy, Galala University, New Galala City, Suez, Egypt; Microbiology and Immunology Department, Faculty of Pharmacy, Ain Shams University, African Union Organization St., Abbassia, Cairo 11566, Egypt.
7
Zhu J, Dong A, Wang C, Veldhuizen S, Abdelwahab M, Brown A, Selby P, Rose J. The Impact of ChatGPT Exposure on User Interactions With a Motivational Interviewing Chatbot: Quasi-Experimental Study. JMIR Form Res 2025; 9:e56973. [PMID: 40117496 PMCID: PMC11952273 DOI: 10.2196/56973] [Received: 02/02/2024] [Revised: 02/20/2025] [Accepted: 02/21/2025] [Indexed: 03/23/2025]
Abstract
Background The worldwide introduction of ChatGPT in November 2022 may have changed how its users perceive and interact with other chatbots. This possibility may confound the comparison of responses to pre-ChatGPT and post-ChatGPT iterations of pre-existing chatbots, in turn affecting the direction of their evolution. Before the release of ChatGPT, we created a therapeutic chatbot, MIBot, whose goal is to use motivational interviewing to guide smokers toward making the decision to quit smoking. We were concerned that measurements going forward would not be comparable to those in the past, impacting the evaluation of future changes to the chatbot. Objective The aim of the study is to explore changes in how users interact with MIBot after the release of ChatGPT and examine the relationship between these changes and users' familiarity with ChatGPT. Methods We compared user interactions with MIBot prior to ChatGPT's release and 6 months after the release. Participants (N=143) were recruited through a web-based platform in November of 2022, prior to the release of ChatGPT, to converse with MIBot, in an experiment we refer to as MIBot (version 5.2). In May 2023, a different set of participants (n=129) was recruited to interact with the same version of MIBot and asked additional questions about their familiarity with ChatGPT, in the experiment called MIBot (version 5.2A). We used the Mann-Whitney U test to compare metrics between cohorts and Spearman rank correlation to assess relationships between familiarity with ChatGPT and other metrics within the MIBot (version 5.2A) cohort. Results In total, 83 (64.3%) participants in the MIBot (version 5.2A) cohort had used ChatGPT, with 66 (51.2%) using it on a regular basis. Satisfaction with MIBot was significantly lower in the post-ChatGPT cohort (U=11,331.0; P=.001), driven by a decrease in perceived empathy as measured by the Average Consultation and Relational Empathy Measure (U=10,838.0; P=.01). Familiarity with ChatGPT was positively correlated with average response length (ρ=0.181; P=.04) and change in perceived importance of quitting smoking (ρ=0.296; P<.001). Conclusions The widespread reach of ChatGPT has changed how users interact with MIBot. Post-ChatGPT users are less satisfied with MIBot overall, particularly in terms of perceived empathy. However, users with greater familiarity with ChatGPT provide longer responses and demonstrated a greater increase in their perceived importance of quitting smoking after a session with MIBot. These findings suggest the need for chatbot developers to adapt to evolving user expectations in the era of advanced generative artificial intelligence.
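For readers unfamiliar with the Mann-Whitney U test used for the cohort comparison above: U simply counts, across all cross-cohort pairs, how often a value from one group exceeds a value from the other, with ties counting one half. A toy sketch with hypothetical satisfaction ratings (not the study's data):

```python
# Mann-Whitney U by direct pair counting: for every (x, y) pair across
# the two groups, add 1 if x > y and 0.5 on ties. Toy data only.
def mann_whitney_u(a, b):
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

pre = [5, 4, 4, 3]    # hypothetical ratings, pre-ChatGPT cohort
post = [3, 2, 4, 1]   # hypothetical ratings, post-ChatGPT cohort
print(mann_whitney_u(pre, post))  # → 13.5
```

Note that the two directions always sum to the number of pairs: here mann_whitney_u(post, pre) is 2.5 and 13.5 + 2.5 = 4 × 4.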
Affiliation(s)
- Jiading Zhu
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Alec Dong
- Department of Mechanical & Industrial Engineering, University of Toronto, Toronto, ON, Canada
- Cindy Wang
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Scott Veldhuizen
- INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Institute for Mental Health Policy Research, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
- Mohamed Abdelwahab
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Andrew Brown
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Peter Selby
- INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Jonathan Rose
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada
8
Wang R, Situ X, Sun X, Zhan J, Liu X. Assessing AI in Various Elements of Enhanced Recovery After Surgery (ERAS)-Guided Ankle Fracture Treatment: A Comparative Analysis with Expert Agreement. J Multidiscip Healthc 2025; 18:1629-1638. [PMID: 40130076 PMCID: PMC11930842 DOI: 10.2147/jmdh.s508511] [Received: 11/25/2024] [Accepted: 03/06/2025] [Indexed: 03/26/2025]
Abstract
Objective This study aimed to assess and compare the performance of ChatGPT and iFlytek Spark, two AI-powered large language models (LLMs), in generating clinical recommendations aligned with expert consensus on Enhanced Recovery After Surgery (ERAS)-guided ankle fracture treatment, and to determine the applicability and reliability of AI in supporting ERAS protocols for optimized patient outcomes. Methods A qualitative comparative analysis was conducted using 35 structured clinical questions derived from the Expert Consensus on Optimizing Ankle Fracture Treatment Protocols under ERAS Principles. Questions covered preoperative preparation, intraoperative management, postoperative pain control and rehabilitation, and complication management. Responses from ChatGPT and iFlytek Spark were independently evaluated by two experienced trauma orthopedic specialists based on clinical relevance, consistency with expert consensus, and depth of reasoning. Results ChatGPT demonstrated higher alignment with expert consensus (29/35 questions, 82.9%), particularly in comprehensive perioperative recommendations, detailed medical rationales, and structured treatment plans. However, discrepancies were noted in intraoperative blood pressure management and preoperative antiemetic selection. iFlytek Spark aligned with expert consensus in 22/35 questions (62.9%), but responses were often more generalized, less clinically detailed, and occasionally inconsistent with best practices. Agreement between ChatGPT and iFlytek Spark was observed in 23/35 questions (65.7%), with ChatGPT generally exhibiting greater specificity, timeliness, and precision in its recommendations. Conclusion AI-powered LLMs, particularly ChatGPT, show promise in supporting clinical decision-making for ERAS-guided ankle fracture management. While ChatGPT provided more accurate and contextually relevant responses, inconsistencies with expert consensus highlight the need for further refinement, validation, and clinical integration. iFlytek Spark's lower conformity suggests potential differences in training data and underlying algorithms, underscoring the variability in AI-generated medical advice. To optimize AI's role in orthopedic care, future research should focus on enhancing AI alignment with medical guidelines, improving model transparency, and integrating physician oversight to ensure safe and effective clinical applications.
Affiliation(s)
- Rui Wang
- Department of Orthopaedic, Zhongshan City Orthopaedic Hospital, Zhongshan, Guangdong Province, People’s Republic of China
- Xuanming Situ
- Department of Orthopaedic, Zhongshan City Orthopaedic Hospital, Zhongshan, Guangdong Province, People’s Republic of China
- Xu Sun
- Department of Orthopaedic Trauma, Beijing Jishuitan Hospital, Beijing, People’s Republic of China
- Jinchang Zhan
- Department of Orthopaedic, Zhongshan City Orthopaedic Hospital, Zhongshan, Guangdong Province, People’s Republic of China
- Xi Liu
- Department of Sports, Sun Yat-sen Memorial Primary School, Zhongshan, Guangdong Province, People’s Republic of China
9
Şişman AÇ, Acar AH. Artificial intelligence-based chatbot assistance in clinical decision-making for medically complex patients in oral surgery: a comparative study. BMC Oral Health 2025; 25:351. [PMID: 40055745 PMCID: PMC11887094 DOI: 10.1186/s12903-025-05732-w] [Received: 12/20/2024] [Accepted: 02/27/2025] [Indexed: 05/13/2025]
Abstract
AIM This study aims to evaluate the potential of AI-based chatbots in assisting with clinical decision-making in the management of medically complex patients in oral surgery. MATERIALS AND METHODS A team of oral and maxillofacial surgeons developed a pool of open-ended questions de novo. The validity of the questions was assessed using Lawshe's Content Validity Index. The questions, which focused on systemic diseases and common conditions that may raise concerns during oral surgery, were presented to ChatGPT 3.5 and Claude-instant in two separate sessions, spaced one week apart. Two experienced maxillofacial surgeons, blinded to the chatbots, assessed the responses for quality, accuracy, and completeness using a modified DISCERN tool and Likert scale. Intraclass correlation, Mann-Whitney U test, skewness, and kurtosis coefficients were employed to compare the performances of the chatbots. RESULTS Most responses were high quality: 86% and 79.6% for ChatGPT, and 81.25% and 89% for Claude-instant in sessions 1 and 2, respectively. In terms of accuracy, ChatGPT had 92% and 93.4% of its responses rated as completely correct in sessions 1 and 2, respectively, while Claude-instant had 95.2% and 89%. For completeness, ChatGPT had 88.5% and 86.8% of its responses rated as adequate or comprehensive in sessions 1 and 2, respectively, while Claude-instant had 95.2% and 86%. CONCLUSION Ongoing software developments and the increasing acceptance of chatbots among healthcare professionals hold promise that these tools can provide rapid solutions to the high demand for medical care, ease professionals' workload, reduce costs, and save time.
Affiliation(s)
- Alanur Çiftçi Şişman
- Hamidiye Faculty of Dental Medicine, Department of Oral and Maxillofacial Surgery, University of Health Sciences, Istanbul, Türkiye.
- Ahmet Hüseyin Acar
- Faculty of Dentistry, Department of Oral and Maxillofacial Surgery, Istanbul Medeniyet University, Istanbul, Türkiye
10
Chow JCL, Li K. Developing Effective Frameworks for Large Language Model-Based Medical Chatbots: Insights From Radiotherapy Education With ChatGPT. JMIR Cancer 2025; 11:e66633. [PMID: 39965195 PMCID: PMC11888077 DOI: 10.2196/66633] [Received: 09/18/2024] [Revised: 12/15/2024] [Accepted: 01/16/2025] [Indexed: 02/20/2025]
Abstract
This Viewpoint proposes a robust framework for developing a medical chatbot dedicated to radiotherapy education, emphasizing accuracy, reliability, privacy, ethics, and future innovations. By analyzing existing research, the framework evaluates chatbot performance and identifies challenges such as content accuracy, bias, and system integration. The findings highlight opportunities for advancements in natural language processing, personalized learning, and immersive technologies. When designed with a focus on ethical standards and reliability, large language model-based chatbots could significantly impact radiotherapy education and health care delivery, positioning them as valuable tools for future developments in medical education globally.
Affiliation(s)
- James C L Chow
- Department of Medical Physics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Radiation Oncology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Kay Li
- Department of English, Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada
11
Yüce A, Yerli M, Misir A. Can Chat-GPT assist orthopedic surgeons in evaluating the quality of rotator cuff surgery patient information videos? J Shoulder Elbow Surg 2025; 34:141-146. [PMID: 38852711 DOI: 10.1016/j.jse.2024.04.021] [Received: 01/23/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 06/11/2024]
Abstract
BACKGROUND Patients and healthcare professionals extensively rely on the internet for medical information. Low-quality videos can significantly impact the patient-doctor relationship, potentially affecting consultation efficiency and the decision-making process. Chat Generative Pre-Trained Transformer (ChatGPT) is an artificial intelligence application with the potential to improve medical reports, provide medical information, and supplement orthopedic knowledge acquisition. This study aimed to assess the ability of ChatGPT-4 to detect deficiencies in these videos, assuming it would be successful in identifying such deficiencies. MATERIALS AND METHODS YouTube was searched for "rotator cuff surgery" and "rotator cuff surgery clinic" videos. A total of 90 videos were evaluated, with 40 included in the study after exclusions. Using the Google Chrome extension "YouTube Summary with ChatGPT & Claude," transcripts of these videos were accessed. Two senior orthopedic surgeons and ChatGPT-4 evaluated the videos using the rotator cuff surgery YouTube score (RCSS) system and DISCERN criteria. RESULTS ChatGPT-4's evaluations were comparable to those of the observers in 25% of instances for RCSS and 40% for DISCERN. The interobserver agreement between human observers and ChatGPT-4 was fair (AC1: 0.575 for DISCERN and AC1: 0.516 for RCSS). Even after correcting ChatGPT-4's incorrect answers, the agreement did not change significantly. ChatGPT-4 tended to give higher scores than the observers, particularly in sections related to anatomy, surgical technique, and indications for surgery. CONCLUSION The use of ChatGPT-4 as an observer in evaluating rotator cuff surgery-related videos and identifying deficiencies is not currently recommended. Future studies with trained ChatGPT models may address these deficiencies and enable ChatGPT to evaluate videos at a human observer level.
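The AC1 values reported above are Gwet's first-order agreement coefficient, which corrects observed agreement for chance using the mean marginal rating rates and, unlike Cohen's kappa, stays stable when one rating category dominates. A minimal two-rater, binary-rating sketch with hypothetical judgments (not the study's ratings):

```python
# Gwet's AC1 for two raters and two categories: chance agreement is
# pe = 2 * pi * (1 - pi), where pi is the mean marginal "positive" rate
# across raters, and AC1 = (pa - pe) / (1 - pe). Hypothetical data only.
def gwet_ac1(r1, r2):
    n = len(r1)
    pa = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
    pi = (sum(r1) + sum(r2)) / (2 * n)             # mean marginal rate
    pe = 2 * pi * (1 - pi)                         # chance agreement
    return (pa - pe) / (1 - pe)

rater1 = [1, 1, 1, 0, 1, 1, 0, 1]   # hypothetical "adequate" judgments
rater2 = [1, 1, 0, 0, 1, 1, 1, 1]
print(round(gwet_ac1(rater1, rater2), 3))  # → 0.6
```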
Affiliation(s)
- Ali Yüce
- Department of Orthopedic and Traumatology, Prof. Dr. Cemil Taşcıoğlu City Hospital, Istanbul, Turkey
- Mustafa Yerli
- Department of Orthopedic and Traumatology, Prof. Dr. Cemil Taşcıoğlu City Hospital, Istanbul, Turkey
- Abdulhamit Misir
- Department of Orthopedic and Traumatology, Private Sefa Hospital, İstanbul, Turkey
12
Battisti ES, Roman MK, Bellei EA, Kirsten VR, De Marchi ACB, Da Silva Leal GV. A virtual assistant for primary care's food and nutrition surveillance system: Development and validation study in Brazil. Patient Educ Couns 2025; 130:108461. [PMID: 39413720] [DOI: 10.1016/j.pec.2024.108461]
Abstract
OBJECTIVE The study aimed to develop and validate a conversational agent (chatbot) designed to support Food and Nutrition Surveillance (FNS) practices in primary health care settings. METHODS This mixed-methods research was conducted in three stages. Initially, the study identified barriers and challenges in FNS practices through a literature review and feedback from 655 health professionals and FNS experts across Brazil. A participatory design approach was then employed to develop and validate the chatbot's content, and the final stage evaluated the chatbot's user experience with FNS experts. RESULTS The chatbot could accurately understand and respond to 60 different intents, or keywords, related to FNS. Themes such as training, guidance, and access emerged as crucial for guiding FNS initiatives and addressing implementation challenges, primarily related to human resources. The chatbot achieved a Global Content Validation Index of 0.88. CONCLUSION The developed chatbot represents a significant advancement in supporting FNS practices within primary health care. PRACTICE IMPLICATIONS By providing an innovative, interactive, educational tool that is both accessible and reliable, this digital assistant has the potential to facilitate the operationalization of FNS practices, addressing the critical need for effective training and counseling in developing countries.
Affiliation(s)
- Eliza Sella Battisti
- Graduate Program in Human Aging, Institute of Health, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil; Graduate Program in Gerontology, Department of Foods and Nutrition, Federal University of Santa Maria (UFSM), Palmeira das Missões, RS, Brazil
- Mateus Klein Roman
- Graduate Program in Applied Computing, Institute of Technology, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil
- Ericles Andrei Bellei
- Graduate Program in Human Aging, Institute of Health, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil
- Vanessa Ramos Kirsten
- Graduate Program in Gerontology, Department of Foods and Nutrition, Federal University of Santa Maria (UFSM), Palmeira das Missões, RS, Brazil
- Ana Carolina Bertoletti De Marchi
- Graduate Program in Human Aging, Institute of Health, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil; Graduate Program in Applied Computing, Institute of Technology, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil
- Greisse Viero Da Silva Leal
- Graduate Program in Gerontology, Department of Foods and Nutrition, Federal University of Santa Maria (UFSM), Palmeira das Missões, RS, Brazil
13
Pagano S, Strumolo L, Michalk K, Schiegl J, Pulido LC, Reinhard J, Maderbacher G, Renkawitz T, Schuster M. Evaluating ChatGPT, Gemini and other Large Language Models (LLMs) in orthopaedic diagnostics: A prospective clinical study. Comput Struct Biotechnol J 2024; 28:9-15. [PMID: 39850460] [PMCID: PMC11754967] [DOI: 10.1016/j.csbj.2024.12.013]
Abstract
Background Large Language Models (LLMs) such as ChatGPT are gaining attention for their potential applications in healthcare. This study aimed to evaluate the diagnostic sensitivity of various LLMs in detecting hip or knee osteoarthritis (OA) using only patient-reported data collected via a structured questionnaire, without prior medical consultation. Methods A prospective observational study was conducted at an orthopaedic outpatient clinic specialized in hip and knee OA treatment. A total of 115 patients completed a paper-based questionnaire covering symptoms, medical history, and demographic information. The diagnostic performance of the evaluated LLMs (four versions of ChatGPT, two of Gemini, plus Llama, Gemma 2, and Mistral-Nemo) was analysed. Model-generated diagnoses were compared against those provided by experienced orthopaedic clinicians, which served as the reference standard. Results GPT-4o achieved the highest diagnostic sensitivity at 92.3%, significantly outperforming the other LLMs. The completeness of patient responses to symptom-related questions was the strongest predictor of accuracy for GPT-4o (p < 0.001). Inter-model agreement was moderate among the GPT-4 versions, whereas models such as Llama-3.1 demonstrated notably lower accuracy and concordance. Conclusions GPT-4o demonstrated high accuracy and consistency in diagnosing OA based solely on patient-reported questionnaires, underscoring its potential as a supplementary diagnostic tool in clinical settings. Nevertheless, the reliance on patient-reported data without direct physician involvement highlights the critical need for medical oversight to ensure diagnostic accuracy. Further research is needed to refine LLM capabilities and expand their utility in broader diagnostic applications.
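Diagnostic sensitivity of the kind reported here is simply the fraction of clinician-confirmed positive cases that the model also flags. A minimal sketch (the function name, labels, and data are illustrative, not the study's actual pipeline):

```python
def diagnostic_sensitivity(model_dx, reference_dx, target="OA"):
    """Sensitivity: of the cases the clinicians labelled `target`,
    what fraction did the model also label `target`?

    model_dx / reference_dx: parallel lists of diagnosis labels.
    """
    # Keep only the model's calls on clinician-positive cases.
    positives = [m for m, r in zip(model_dx, reference_dx) if r == target]
    if not positives:
        raise ValueError("no positive cases in the reference standard")
    return sum(m == target for m in positives) / len(positives)
```

Note that a sensitivity-only comparison says nothing about false positives; specificity would need the complementary calculation over clinician-negative cases.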
Affiliation(s)
- Stefano Pagano
- Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Luigi Strumolo
- Freelance health consultant & senior data analyst, Avellino, Italy
- Katrin Michalk
- Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Julia Schiegl
- Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Loreto C. Pulido
- Department of Orthopaedics Hospital of Trauma Surgery, Marktredwitz Hospital, Marktredwitz, Germany
- Jan Reinhard
- Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Guenther Maderbacher
- Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Tobias Renkawitz
- Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Marie Schuster
- Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
14
Suárez A, Jiménez J, Llorente de Pedro M, Andreu-Vázquez C, Díaz-Flores García V, Gómez Sánchez M, Freire Y. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery. Comput Struct Biotechnol J 2024; 24:46-52. [PMID: 38162955] [PMCID: PMC10755495] [DOI: 10.1016/j.csbj.2023.11.058]
Abstract
AI has revolutionized the way we interact with technology. Noteworthy advances in AI algorithms and large language models (LLMs) have led to the development of natural generative language (NGL) systems such as ChatGPT. Although these LLMs can simulate human conversations and generate content in real time, they face challenges related to the topicality and accuracy of the information they generate. This study aimed to assess whether ChatGPT-4 could provide accurate and reliable answers to general dentists in the field of oral surgery, and thus to explore its potential as an intelligent virtual assistant for clinical decision making in oral surgery. Thirty questions related to oral surgery were posed to ChatGPT-4, each repeated 30 times, yielding a total of 900 responses. Two surgeons graded the answers according to the guidelines of the Spanish Society of Oral Surgery, using a three-point Likert scale (correct, partially correct/incomplete, and incorrect). Disagreements were arbitrated by an experienced oral surgeon, who provided the final grade. Accuracy was found to be 71.7%, and the consistency of the experts' grading across iterations ranged from moderate to almost perfect. ChatGPT-4, with its potential capabilities, will inevitably be integrated into dental disciplines, including oral surgery. In the future, it could be considered an auxiliary intelligent virtual assistant, though it would never replace oral surgery experts. Proper training and expert-verified information will remain vital to the implementation of the technology. More comprehensive research is needed to ensure the safe and successful application of AI in oral surgery.
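The repeated-prompt design (30 questions x 30 iterations = 900 graded responses) lends itself to simple summary statistics. A hedged sketch of how overall accuracy and per-question grading consistency might be tallied (the data structure and grade labels are illustrative, not the study's code):

```python
from collections import Counter

def accuracy_and_consistency(grades):
    """grades: dict mapping question_id -> list of grades
    ('correct', 'partial', 'incorrect') over repeated iterations
    of the same prompt.
    """
    # Overall accuracy: share of all graded responses marked correct.
    all_grades = [g for gs in grades.values() for g in gs]
    accuracy = all_grades.count("correct") / len(all_grades)
    # Per-question consistency: share of iterations that match the
    # modal (most frequent) grade for that question.
    consistency = {
        q: Counter(gs).most_common(1)[0][1] / len(gs)
        for q, gs in grades.items()
    }
    return accuracy, consistency
```

The study's formal consistency analysis used chance-corrected agreement statistics rather than this raw modal share, but the tally above conveys the shape of the computation.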
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Jaime Jiménez
- Department of Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Cristina Andreu-Vázquez
- Department of Veterinary Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
15
Johnvictor AC, Poonkodi M, Prem Sankar N, VS T. TinyML-Based Lightweight AI Healthcare Mobile Chatbot Deployment. J Multidiscip Healthc 2024; 17:5091-5104. [PMID: 39539515] [PMCID: PMC11559246] [DOI: 10.2147/jmdh.s483247]
Abstract
Introduction In healthcare applications, AI-driven innovations are set to revolutionise patient interactions and care, with the aim of improving patient satisfaction. Recent advancements in artificial intelligence have significantly affected nursing, assistive management, medical diagnoses, and other critical medical procedures. Purpose Many artificial intelligence (AI) solutions operate online, posing potential risks to patient data security. To address these security concerns and ensure swift operation, this study developed a chatbot tailored for hospital environments, running on a local server and using TinyML to process patient data. Patients and Methods Edge computing technology enables secure on-site data processing. The implementation includes patient identification using Histogram of Oriented Gradients (HOG)-based classification, followed by basic patient care tasks such as temperature measurement and demographic recording. Results The classification accuracy of patient detection was 95.8%. An autonomous temperature-sensing unit equipped with a medical-grade infrared temperature scanner detected and recorded patient temperature. Following the temperature assessment, the TinyML-powered chatbot engaged patients in a series of questions customised by doctors to train the model for diagnostic scenarios. Patients' responses, recorded as "yes" or "no", were stored and printed in their case sheets. The accuracy of the TinyML model is 95.3%, and the on-device processing time is 217 ms. The implemented TinyML model uses only 8.8 KB of RAM and 50.3 KB of Flash memory, with a latency of only 4 ms. Conclusion Each patient was assigned a unique ID, and their data were securely stored for further consultation and diagnosis via hospital management. This research demonstrates faster patient data recording and increased security compared with existing AI-based healthcare solutions, as all processing occurs on the local host.
Affiliation(s)
- M Poonkodi
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
- N Prem Sankar
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
- Thinesh VS
- Arista Networks Pvt Ltd, Bangalore, India
16
Chow JCL, Li K. Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models. JMIR Bioinformatics and Biotechnology 2024; 5:e64406. [PMID: 39321336] [PMCID: PMC11579624] [DOI: 10.2196/64406]
Abstract
The integration of chatbots in oncology underscores the pressing need for human-centered artificial intelligence (AI) that addresses patient and family concerns with empathy and precision. Human-centered AI emphasizes ethical principles, empathy, and user-centric approaches, ensuring technology aligns with human values and needs. This review critically examines the ethical implications of using large language models (LLMs) like GPT-3 and GPT-4 (OpenAI) in oncology chatbots, considering how these models replicate human-like language patterns and how that shapes the design of ethical AI systems. The paper identifies key strategies for ethically developing oncology chatbots, focusing on potential biases arising from extensive datasets and neural networks. Specific datasets, such as those sourced from predominantly Western medical literature and patient interactions, may introduce biases by overrepresenting certain demographic groups. Moreover, the training methodologies of LLMs, including fine-tuning processes, can exacerbate these biases, leading to outputs that may disproportionately favor affluent or Western populations while neglecting marginalized communities. By providing examples of biased outputs in oncology chatbots, the review highlights the ethical challenges LLMs present and the need for mitigation strategies. The study emphasizes integrating human-centric values into AI to mitigate these biases, ultimately advocating for the development of oncology chatbots that are aligned with ethical principles and capable of serving diverse patient populations equitably.
Affiliation(s)
- James C L Chow
- Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Kay Li
- Department of English, University of Toronto, Toronto, ON, Canada
17
Keating M, Bollard SM, Potter S. Assessing the Quality, Readability, and Acceptability of AI-Generated Information in Plastic and Aesthetic Surgery. Cureus 2024; 16:e73874. [PMID: 39697940] [PMCID: PMC11652792] [DOI: 10.7759/cureus.73874]
Abstract
INTRODUCTION Within plastic surgery, the internet is the most commonly used first point of information for patients before consulting a surgeon. Free-to-use artificial intelligence (AI) websites like ChatGPT (Chat Generative Pre-trained Transformer) are attractive sources of patient information because of their ability to answer almost any query instantaneously. Although relatively new, ChatGPT is now one of the most popular AI conversational software tools. The aim of this study was to evaluate the quality and readability of information given by ChatGPT-4 on key areas of plastic and reconstructive surgery. METHODS The ten plastic and aesthetic surgery topics with the highest worldwide search volume over the past 15 years were identified and rephrased into question format to create nine individual questions. These questions were then input into ChatGPT-4. Response quality was assessed using the DISCERN instrument, and the readability and grade reading level of the responses were calculated using the Flesch-Kincaid Reading Ease (FKRE) index and the Coleman-Liau (CL) index. Twelve physicians working in a plastic and reconstructive surgery unit were asked to rate the clarity and accuracy of the answers on a scale of 1-10 and to state (yes or no) whether they would share the generated response with a patient. RESULTS All answers were scored as poor or very poor according to the DISCERN tool, with a mean DISCERN score of 34 across all questions. The responses also scored low in readability and understandability: the mean FKRE index was 33.6, and the mean CL index was 15.6. Clinicians rated the answers well in clarity and accuracy, with a mean clarity score of 7.38 and a mean accuracy score of 7.4. CONCLUSION This study found that, according to validated quality assessment tools, ChatGPT-4 produced low-quality information when asked about popular queries relating to plastic and aesthetic surgery. Furthermore, the information produced was pitched at a high reading level. However, the responses were still rated well in clarity and accuracy by clinicians working in plastic surgery. Although improvements need to be made, this study suggests that language models such as ChatGPT could be a useful starting point when developing written health information. With the expansion of AI, improvements in content quality are anticipated.
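Of the two readability measures used in the study, the Coleman-Liau index is the easier to reproduce, since it needs only letter, word, and sentence counts (no syllable estimation). A rough sketch, with a naive tokenizer that real implementations would refine:

```python
import re

def coleman_liau(text):
    """Coleman-Liau index: estimated US grade level from average
    letters per 100 words (L) and sentences per 100 words (S).
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    letters = sum(len(w.replace("'", "")) for w in words)
    # Crude sentence split on terminal punctuation runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    L = letters / len(words) * 100    # letters per 100 words
    S = sentences / len(words) * 100  # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8
```

The Flesch-Kincaid Reading Ease index follows the same counting pattern but additionally requires a per-word syllable estimate, which is the error-prone part of most implementations.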
Affiliation(s)
- Muireann Keating
- Department of Plastic and Reconstructive Surgery, St James's Hospital, Dublin, IRL
- Stephanie M Bollard
- School of Medicine, University College Dublin, Dublin, IRL
- Department of Plastic and Reconstructive Surgery, St James's Hospital, Dublin, IRL
- Shirley Potter
- Plastic and Reconstructive Surgery, Mater Misericordiae University Hospital, Dublin, IRL
- School of Medicine, University College Dublin, Dublin, IRL
18
Huo B, Marfo N, Sylla P, Calabrese E, Kumar S, Slater BJ, Walsh DS, Vosburg W. Clinical artificial intelligence: teaching a large language model to generate recommendations that align with guidelines for the surgical management of GERD. Surg Endosc 2024; 38:5668-5677. [PMID: 39134725] [DOI: 10.1007/s00464-024-11155-5]
Abstract
BACKGROUND Large Language Models (LLMs) provide clinical guidance with inconsistent accuracy due to limitations of their training datasets, but LLMs are "teachable" through customization. We compared the ability of the generic ChatGPT-4 model and a customized version of ChatGPT-4 to provide recommendations for the surgical management of gastroesophageal reflux disease (GERD) to both surgeons and patients. METHODS Sixty patient cases were developed using eligibility criteria from the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) & United European Gastroenterology (UEG)-European Association of Endoscopic Surgery (EAES) guidelines for the surgical management of GERD. Standardized prompts were engineered for physicians as the end user, with separate layperson prompts for patients. A customized GPT, called the GERD Tool for Surgery (GTS), was developed to generate recommendations based on the guidelines. Both the GTS and the generic ChatGPT-4 were queried on July 21, 2024. Model performance was evaluated by comparing responses with the SAGES & UEG-EAES guideline recommendations. Outcome data are presented using descriptive statistics, including counts and percentages. RESULTS The GTS provided accurate recommendations for the surgical management of GERD for 60/60 (100.0%) surgeon inquiries and 40/40 (100.0%) patient inquiries based on guideline recommendations. The generic ChatGPT-4 model generated accurate guidance for 40/60 (66.7%) surgeon inquiries and 19/40 (47.5%) patient inquiries. The GTS produced recommendations based on the 2021 SAGES & UEG-EAES guidelines on the surgical management of GERD, while the generic ChatGPT-4 model generated guidance without citing evidence to support its recommendations. CONCLUSION ChatGPT-4 can be customized to overcome the limitations of its training dataset and provide recommendations for the surgical management of GERD with reliable accuracy and consistency. Such training of LLMs can help integrate this efficient technology into the creation of robust and accurate information for both surgeons and patients. Prospective data are needed to assess its effectiveness in a pragmatic clinical environment.
Affiliation(s)
- Bright Huo
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada
- Nana Marfo
- Ross University School of Medicine, Miramar, FL, USA
- Patricia Sylla
- Division of Colon and Rectal Surgery, Department of Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sunjay Kumar
- Department of General Surgery, Thomas Jefferson University Hospital, Philadelphia, PA, USA
- Danielle S Walsh
- Department of Surgery, University of Kentucky, Lexington, KY, USA
- Wesley Vosburg
- Department of Surgery, Mount Auburn Hospital, Harvard Medical School, Cambridge, MA, USA
19
Mondal H, De R, Mondal S, Juhi A. A large language model in solving primary healthcare issues: A potential implication for remote healthcare and medical education. J Educ Health Promot 2024; 13:362. [PMID: 39679030] [PMCID: PMC11639534] [DOI: 10.4103/jehp.jehp_688_23]
Abstract
BACKGROUND AND AIM Access to quality health care is essential, particularly in remote areas where the availability of healthcare professionals may be limited. The advancement of artificial intelligence (AI) and natural language processing (NLP) has led to the development of large language models (LLMs) that exhibit capabilities in understanding and generating human-like text. This study aimed to evaluate the performance of an LLM, ChatGPT, in addressing primary healthcare issues. MATERIALS AND METHODS This study was conducted in May 2023 with the ChatGPT May 12 version. A total of 30 multiple-choice questions (MCQs) related to primary health care were selected to test the proficiency of ChatGPT. These MCQs covered various topics commonly encountered in primary healthcare practice. ChatGPT answered each question in two segments: first choosing the single best answer to the MCQ, then providing supporting text for that answer. The answers to the MCQs were compared with predefined answer keys, and the justifications were rated by two primary healthcare professionals on a 5-point Likert-type scale. The data are presented as numbers and percentages. RESULTS Among the 30 questions, ChatGPT provided correct responses for 28, yielding an accuracy of 93.33%. The mean score for the explanations supporting the answers was 4.58 ± 0.85. There was an inter-item correlation of 0.896, and the average-measures intraclass correlation coefficient (ICC) was 0.94 (95% confidence interval 0.88-0.97), indicating a high level of interobserver agreement. CONCLUSION LLMs such as ChatGPT show promising potential in addressing primary healthcare issues. The high accuracy rate achieved by ChatGPT in answering primary healthcare-related MCQs underscores the value of these models as resources for patients and healthcare providers in remote healthcare settings. This can also support self-directed learning by medical students.
Affiliation(s)
- Himel Mondal
- Department of Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India
- Rajesh De
- Department of Community Medicine, Malda Medical College and Hospital, Malda, West Bengal, India
- Shaikat Mondal
- Department of Physiology, Raiganj Government Medical College and Hospital, Raiganj, West Bengal, India
- Ayesha Juhi
- Department of Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India
20
Haran C, Allan P, Dholakia J, Lai S, Lim E, Xu W, Hart O, Cain J, Narayanan A, Khashram M. The application and uses of telemedicine in vascular surgery: A narrative review. Semin Vasc Surg 2024; 37:290-297. [PMID: 39277344] [DOI: 10.1053/j.semvascsurg.2024.07.004]
Abstract
Technological advances over the past century have accelerated the pace and breadth of medical and surgical care. From the initial delivery of "telemedicine" over the radio in the 1920s, the delivery of medicine and surgery in the 21st century is no longer limited by connectivity. The COVID-19 pandemic hastened the uptake of telemedicine to ensure that health care can be maintained despite limited face-to-face contact. Like other areas of medicine, vascular surgery has adopted telemedicine, although its role is not well described in the literature. This narrative review explores how telemedicine has been delivered in vascular surgery. Specific themes of telemedicine are outlined with real-world examples, including consultation, triaging, collaboration, mentoring, monitoring and surveillance, mobile health, and education. This review also explores possible future advances in telemedicine and issues around equity of care. Finally, important ethical considerations and limitations related to the applications of telemedicine are outlined.
Affiliation(s)
- Cheyaanthan Haran
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Philip Allan
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- Jhanvi Dholakia
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- Simon Lai
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Eric Lim
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- William Xu
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- Odette Hart
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Justin Cain
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- Anantha Narayanan
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Manar Khashram
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
21
Lee JW, Yoo IS, Kim JH, Kim WT, Jeon HJ, Yoo HS, Shin JG, Kim GH, Hwang S, Park S, Kim YJ. Development of AI-generated medical responses using the ChatGPT for cancer patients. Comput Methods Programs Biomed 2024; 254:108302. [PMID: 38996805] [DOI: 10.1016/j.cmpb.2024.108302]
Abstract
BACKGROUND AND OBJECTIVE To develop a healthcare chatbot service (the AI-guide bot) that conducts real-time conversations using large language models to provide accurate health information to patients. METHODS To provide accurate and specialized medical responses, we integrated several cancer practice guidelines; the integrated meta-dataset comprised 1.17 million tokens. The integrated and classified metadata were extracted, transformed into text, segmented into chunks of a specific character length, and vectorized using the embedding model. The AI-guide bot was implemented in Python 3.9. To enhance scalability and incorporate the integrated dataset, we combined the AI-guide bot with OpenAI and the LangChain framework. To generate user-friendly conversations, a language model was developed based on Chat Generative Pre-trained Transformer (ChatGPT), an interactive conversational chatbot powered by GPT-3.5. The AI-guide bot was implemented using ChatGPT-3.5 from September 2023 to January 2024. RESULTS The AI-guide bot allowed users to select their desired cancer type and language for conversational interactions, and was designed to expand its capabilities to encompass multiple major cancer types. The performance score of the AI-guide bot's responses was 90.98 ± 4.02 (obtained by summing the Likert scores). CONCLUSIONS The AI-guide bot can provide medical information quickly and accurately to patients with cancer who are concerned about their health.
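The pipeline described (segmenting guideline text to fixed character lengths, embedding the chunks, and retrieving relevant context for the chat model) is a standard retrieval-augmented generation pattern. A toy, dependency-free sketch; the bag-of-words `embed` below is a stand-in for the real embedding model the authors accessed via OpenAI and LangChain:

```python
import math
from collections import Counter

def chunk(text, size=200):
    """Segment guideline text into fixed-length character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy bag-of-words vector; a production pipeline would call an
    embedding model here instead of counting tokens."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v) or 1.0)

def retrieve(query, chunks, top_k=2):
    """Return the chunks most similar to the query; these would be
    injected into the chat model's prompt as grounding context."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

In the actual service, the retrieved chunks would be passed to ChatGPT-3.5 as context rather than returned to the user directly.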
Affiliation(s)
- Jae-Woo Lee
- Department of Family Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Family Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- In-Sang Yoo
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- Ji-Hye Kim
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Won Tae Kim
- Department of Urology, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Urology, Chungbuk National University College of Medicine, 1 Chungdae-ro, Seowon-gu, Cheongju, Chungcheongbuk-do 28644, Republic of Korea
- Hyun Jeong Jeon
- Department of Internal Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Internal Medicine, College of Medicine, Chungbuk National University, Cheongju, Republic of Korea
- Hyo-Sun Yoo
- Department of Family Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Jae Gwang Shin
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Geun-Hyeong Kim
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- ShinJi Hwang
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Seung Park
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- Yong-June Kim
- Department of Urology, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Urology, Chungbuk National University College of Medicine, 1 Chungdae-ro, Seowon-gu, Cheongju, Chungcheongbuk-do 28644, Republic of Korea
|
22
|
Mihalache A, Grad J, Patil NS, Huang RS, Popovic MM, Mallipatna A, Kertes PJ, Muni RH. Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment. Eye (Lond) 2024; 38:2530-2535. [PMID: 38615098] [PMCID: PMC11383935] [DOI: 10.1038/s41433-024-03067-4]
Abstract
PURPOSE With the popularization of ChatGPT (Open AI, San Francisco, California, United States) in recent months, understanding the potential of artificial intelligence (AI) chatbots in a medical context is important. Our study aims to evaluate Google Gemini and Bard's (Google, Mountain View, California, United States) knowledge in ophthalmology. METHODS In this study, we evaluated Google Gemini and Bard's performance on EyeQuiz, a platform containing ophthalmology board certification examination practice questions, when used from the United States (US). Accuracy, response length, response time, and provision of explanations were evaluated. Subspecialty-specific performance was noted. A secondary analysis was conducted using Bard from Vietnam, and Gemini from Vietnam, Brazil, and the Netherlands. RESULTS Overall, Google Gemini and Bard both had accuracies of 71% across 150 text-based multiple-choice questions. The secondary analysis revealed an accuracy of 67% using Bard from Vietnam, with 32 questions (21%) answered differently than when using Bard from the US. Moreover, the Vietnam version of Gemini achieved an accuracy of 74%, with 23 (15%) answered differently than the US version of Gemini. While the Brazil (68%) and Netherlands (65%) versions of Gemini performed slightly worse than the US version, differences in performance across the various country-specific versions of Bard and Gemini were not statistically significant. CONCLUSION Google Gemini and Bard had an acceptable performance in responding to ophthalmology board examination practice questions. Subtle variability was noted in the performance of the chatbots across different countries. The chatbots also tended to provide a confident explanation even when providing an incorrect answer.
Affiliation(s)
- Andrew Mihalache: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Justin Grad: Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Nikhil S Patil: Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Ryan S Huang: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Marko M Popovic: Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Ashwin Mallipatna: Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada; Department of Ophthalmology, Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
- Peter J Kertes: Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada; John and Liz Tory Eye Centre, Sunnybrook Health Sciences Centre, Toronto, ON, Canada
- Rajeev H Muni: Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada; Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, ON, Canada

23
Cherrez-Ojeda I, Gallardo-Bastidas JC, Robles-Velasco K, Osorio MF, Velez Leon EM, Leon Velastegui M, Pauletto P, Aguilar-Díaz FC, Squassi A, González Eras SP, Cordero Carrasco E, Chavez Gonzalez KL, Calderon JC, Bousquet J, Bedbrook A, Faytong-Haro M. Understanding Health Care Students' Perceptions, Beliefs, and Attitudes Toward AI-Powered Language Models: Cross-Sectional Study. JMIR Med Educ 2024; 10:e51757. [PMID: 39137029] [DOI: 10.2196/51757]
Abstract
BACKGROUND ChatGPT was not intended for use in health care, but it has potential benefits that depend on end-user understanding and acceptability, which is where health care students become crucial. There is still a limited amount of research in this area. OBJECTIVE The primary aim of our study was to assess the frequency of ChatGPT use, the perceived level of knowledge, the perceived risks associated with its use, and the ethical issues, as well as attitudes toward the use of ChatGPT in the context of education in the field of health. In addition, we aimed to examine whether there were differences across groups based on demographic variables. The second part of the study aimed to assess the association between the frequency of use, the level of perceived knowledge, the level of risk perception, and the level of perception of ethics as predictive factors for participants' attitudes toward the use of ChatGPT. METHODS A cross-sectional survey was conducted from May to June 2023 encompassing students of medicine, nursing, dentistry, nutrition, and laboratory science across the Americas. The study used descriptive analysis, chi-square tests, and ANOVA to assess statistical significance across different categories. The study used several ordinal logistic regression models to analyze the impact of predictive factors (frequency of use, perception of knowledge, perception of risk, and ethics perception scores) on attitude as the dependent variable. The models were adjusted for gender, institution type, major, and country. Stata was used to conduct all the analyses. RESULTS Of 2661 health care students, 42.99% (n=1144) were unaware of ChatGPT. The median score of knowledge was "minimal" (median 2.00, IQR 1.00-3.00). Most respondents (median 2.61, IQR 2.11-3.11) regarded ChatGPT as neither ethical nor unethical. 
Most participants (median 3.89, IQR 3.44-4.34) "somewhat agreed" that ChatGPT (1) benefits health care settings, (2) provides trustworthy data, (3) is a helpful tool for accessing clinical and educational medical information, and (4) makes work easier. In total, 70% (7/10) of users used it for homework. As perceived knowledge of ChatGPT increased, attitudes toward it tended to become more favorable. Higher ethical consideration perception ratings increased the likelihood of considering ChatGPT a source of trustworthy health care information (odds ratio [OR] 1.620, 95% CI 1.498-1.752), beneficial in medical issues (OR 1.495, 95% CI 1.452-1.539), and useful for medical literature (OR 1.494, 95% CI 1.426-1.564; P<.001 for all results). CONCLUSIONS Over 40% of health care students in the Americas (1144/2661, 42.99%) were unaware of ChatGPT despite its extensive use in the health field. Our data revealed positive attitudes toward ChatGPT and a desire to learn more about it. Medical educators must explore how chatbots may be included in undergraduate health care education programs.
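For readers unfamiliar with how odds ratios like those above arise from ordinal logistic regression, a minimal sketch of the standard conversion: a fitted coefficient and its standard error on the log-odds scale become an OR with a Wald 95% confidence interval via exponentiation. The coefficient values below are illustrative choices (selected so the output lands near the first OR reported above), not the study's actual data.

```python
import math

def odds_ratio_ci(coef, se, z=1.96):
    """Convert a logistic-regression coefficient (log-odds scale) and its
    standard error into an odds ratio with a Wald 95% confidence interval."""
    point = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return point, lower, upper

# Hypothetical coefficient and standard error, for illustration only.
or_point, lo, hi = odds_ratio_ci(coef=0.482, se=0.040)
```

An OR above 1 with a confidence interval excluding 1 (as in all three results quoted above) indicates that higher scores on the predictor are associated with higher odds of the favorable response.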
Affiliation(s)
- Ivan Cherrez-Ojeda: Universidad Espiritu Santo, Samborondon, Ecuador; Respiralab Research Group, Guayaquil, Ecuador
- Karla Robles-Velasco: Universidad Espiritu Santo, Samborondon, Ecuador; Respiralab Research Group, Guayaquil, Ecuador
- María F Osorio: Universidad Espiritu Santo, Samborondon, Ecuador; Respiralab Research Group, Guayaquil, Ecuador
- F C Aguilar-Díaz: Departamento Salud Pública, Escuela Nacional de Estudios Superiores, Universidad Nacional Autónoma de México, Guanajuato, Mexico
- Aldo Squassi: Universidad de Buenos Aires, Facultad de Odontologìa, Cátedra de Odontología Preventiva y Comunitaria, Buenos Aires, Argentina
- Erita Cordero Carrasco: Departamento de cirugía y traumatología bucal y maxilofacial, Universidad de Chile, Santiago, Chile
- Juan C Calderon: Universidad Espiritu Santo, Samborondon, Ecuador; Respiralab Research Group, Guayaquil, Ecuador
- Jean Bousquet: Institute of Allergology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Allergology and Immunology, Berlin, Germany; MASK-air, Montpellier, France
- Marco Faytong-Haro: Respiralab Research Group, Guayaquil, Ecuador; Universidad Estatal de Milagro, Cdla Universitaria "Dr. Rómulo Minchala Murillo", Milagro, Ecuador; Ecuadorian Development Research Lab, Daule, Ecuador

24
Takahashi H, Shikino K, Kondo T, Komori A, Yamada Y, Saita M, Naito T. Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study. JMIR Med Educ 2024; 10:e59133. [PMID: 39137031] [PMCID: PMC11350316] [DOI: 10.2196/59133]
Abstract
BACKGROUND Evaluating the accuracy and educational utility of artificial intelligence-generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored. OBJECTIVE This study aimed to assess the educational utility of ChatGPT-4-generated clinical vignettes and their applicability in educational settings. METHODS Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility: information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians' experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases. RESULTS Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory; these two items were rated on a binary scale. The average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty, based on a 5-point Likert scale.
Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations. CONCLUSIONS ChatGPT-4-generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4's value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.
Affiliation(s)
- Hiromizu Takahashi: Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
- Kiyoshi Shikino: Department of Community-Oriented Medical Education, Chiba University Graduate School of Medicine, Chiba, Japan
- Takeshi Kondo: Center for Postgraduate Clinical Training and Career Development, Nagoya University Hospital, Aichi, Japan
- Akira Komori: Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan; Department of Emergency and Critical Care Medicine, Tsukuba Memorial Hospital, Tsukuba, Japan
- Yuji Yamada: Brookdale Department of Geriatrics and Palliative Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Mizue Saita: Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
- Toshio Naito: Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan

25
Sharma H, Ruikar M. Artificial intelligence at the pen's edge: Exploring the ethical quagmires in using artificial intelligence models like ChatGPT for assisted writing in biomedical research. Perspect Clin Res 2024; 15:108-115. [PMID: 39140014] [PMCID: PMC11318783] [DOI: 10.4103/picr.picr_196_23]
Abstract
Chat generative pretrained transformer (ChatGPT) is a conversational language model powered by artificial intelligence (AI). It is a sophisticated language model that employs deep learning methods to generate human-like text outputs in response to natural language inputs. This narrative review aims to shed light on ethical concerns about using AI models like ChatGPT for writing assistance in the health care and medical domains. Currently, AI models like ChatGPT are in their infancy; risks include inaccuracy of the generated content, lack of contextual understanding, dynamic knowledge gaps, limited discernment, lack of responsibility and accountability, issues of privacy, data security, transparency, and bias, and lack of nuance and originality. Other issues, such as authorship, unintentional plagiarism, falsified and fabricated content, and the threat of being red-flagged as AI-generated content, highlight the need for regulatory compliance, transparency, and disclosure. If these legitimate issues are proactively considered and addressed, the potential applications of AI models as writing assistants could be rewarding.
Affiliation(s)
- Hunny Sharma: Department of Community and Family Medicine, All India Institute of Medical Sciences, Raipur, Chhattisgarh, India
- Manisha Ruikar: Department of Community and Family Medicine, All India Institute of Medical Sciences, Raipur, Chhattisgarh, India

26
Lahat A, Sharif K, Zoabi N, Shneor Patt Y, Sharif Y, Fisher L, Shani U, Arow M, Levin R, Klang E. Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4. J Med Internet Res 2024; 26:e54571. [PMID: 38935937] [PMCID: PMC11240076] [DOI: 10.2196/54571]
Abstract
BACKGROUND Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. OBJECTIVE This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types. METHODS A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. RESULTS Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. CONCLUSIONS ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. 
While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.
Affiliation(s)
- Adi Lahat: Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel; Department of Gastroenterology, Samson Assuta Ashdod Medical Center, Affiliated with Ben Gurion University of the Negev, Be'er Sheva, Israel
- Kassem Sharif: Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel; Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Narmin Zoabi: Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Yousra Sharif: Department of Internal Medicine C, Hadassah Medical Center, Jerusalem, Israel
- Lior Fisher: Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Uria Shani: Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Mohamad Arow: Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Roni Levin: Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Eyal Klang: Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States

27
Hussain T, Wang D, Li B. The influence of the COVID-19 pandemic on the adoption and impact of AI ChatGPT: Challenges, applications, and ethical considerations. Acta Psychol (Amst) 2024; 246:104264. [PMID: 38626597] [DOI: 10.1016/j.actpsy.2024.104264]
Abstract
DESIGN/METHODOLOGY/APPROACH This article employs qualitative thematic modeling to gather insights from 30 informants. The study explores various aspects related to the impact of the COVID-19 pandemic on AI ChatGPT technologies. PURPOSE The purpose of this research is to examine how the COVID-19 pandemic has influenced the increased usage and adoption of AI ChatGPT. It aims to explore the pandemic's impact on AI ChatGPT and its applications in specific domains, as well as the challenges and opportunities it presents. FINDINGS The findings highlight that the pandemic has led to a surge in online activities, resulting in a heightened demand for AI ChatGPT. It has been widely used in areas such as healthcare, mental health support, remote collaboration, and personalized customer experiences. The article showcases examples of AI ChatGPT's application during the pandemic. STRENGTH OF STUDY This qualitative framework enables the study to delve deeply into the multifaceted dimensions of AI ChatGPT's role during the pandemic, capturing the diverse experiences and insights of users, practitioners, and experts. By embracing the qualitative nature of inquiry, this research offers a comprehensive understanding of the challenges, opportunities, and ethical considerations associated with the adoption and utilization of AI ChatGPT in crisis contexts. PRACTICAL IMPLICATIONS The insights from this research have practical implications for policymakers, developers, and researchers. This research emphasizes the need for responsible and ethical implementation of AI ChatGPT to fully harness its potential in addressing societal needs during and beyond the pandemic. SOCIAL IMPLICATIONS The increased reliance on AI ChatGPT during the pandemic has led to changes in user behavior, expectations, and interactions. However, it has also unveiled ethical considerations and potential risks.
Addressing societal and ethical concerns, such as user impact and autonomy, privacy and security, bias and fairness, and transparency and accountability, is crucial for the responsible deployment of AI ChatGPT. ORIGINALITY/VALUE This research contributes to the understanding of the novel role of AI ChatGPT in times of crisis, particularly in the era of COVID-19 pandemic. It highlights the necessity of responsible and ethical implementation of AI ChatGPT and provides valuable insights for the development and application of AI technology in the future.
Affiliation(s)
- Talib Hussain: School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China; Department of Media Management, University of Religions and Denominations, Qom 37491-13357, Iran
- Dake Wang: School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China
- Benqian Li: School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China

28
Denecke K, May R, Rivera Romero O. Potential of Large Language Models in Health Care: Delphi Study. J Med Internet Res 2024; 26:e52399. [PMID: 38739445] [PMCID: PMC11130776] [DOI: 10.2196/52399]
Abstract
BACKGROUND A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. They allow the model to relate words together through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. OBJECTIVE The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. METHODS We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third round, the participants scored these items. RESULTS The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. 
The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. CONCLUSIONS Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice.
Affiliation(s)
- Richard May: Harz University of Applied Sciences, Wernigerode, Germany
- Octavio Rivera Romero: Instituto de Ingeniería Informática (I3US), Universidad de Sevilla, Sevilla, Spain; Department of Electronic Technology, Universidad de Sevilla, Sevilla, Spain

29
Pinto DS, Noronha SM, Saigal G, Quencer RM. Comparison of an AI-Generated Case Report With a Human-Written Case Report: Practical Considerations for AI-Assisted Medical Writing. Cureus 2024; 16:e60461. [PMID: 38883028] [PMCID: PMC11179998] [DOI: 10.7759/cureus.60461]
Abstract
INTRODUCTION The utility of ChatGPT has recently caused consternation in the medical world. While it has been utilized to write manuscripts, only a few studies have evaluated the quality of manuscripts generated by AI (artificial intelligence). OBJECTIVE We evaluate the ability of ChatGPT to write a case report when provided with a framework. We also provide practical considerations for manuscript writing using AI. METHODS We compared a manuscript written by a blinded human author (10 years of medical experience) with a manuscript written by ChatGPT on a rare presentation of a common disease. We used multiple iterations of the manuscript generation request to derive the best ChatGPT output. PARTICIPANTS, OUTCOMES, AND MEASURES 22 human reviewers compared the manuscripts using parameters that characterize human writing and relevant standard manuscript assessment criteria, viz., scholarly impact quotient (SIQ). We also compared the manuscripts using the "average perplexity score" (APS), "burstiness score" (BS), and "highest perplexity of a sentence" (GPTZero parameters for detecting AI-generated content). RESULTS The human manuscript had a significantly higher quality of presentation and nuanced writing (p<0.05). Both manuscripts had a logical flow. 12/22 reviewers were able to identify the AI-generated manuscript (p<0.05), but 4/22 reviewers wrongly identified the human-written manuscript as AI-generated. GPTZero software erroneously identified four sentences of the human-written manuscript to be AI-generated. CONCLUSION Though AI showed an ability to highlight the novelty of the case report and project a logical flow comparable to the human manuscript, it could not outperform the human writer on all parameters. The human manuscript showed a better quality of presentation and more nuanced writing. The practical considerations we provide for AI-assisted medical writing will help to better utilize AI in manuscript writing.
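For context on the GPTZero parameters mentioned above: "perplexity" measures how predictable a text is under a language model (AI-generated text tends to score lower), and "burstiness" captures how much that predictability varies across sentences. GPTZero's actual models are proprietary; the following toy unigram-model perplexity, with made-up example strings, only illustrates the underlying idea and is not the tool's method.

```python
import math
from collections import Counter

def unigram_perplexity(text, corpus):
    """Perplexity of `text` under a unigram model estimated from `corpus`,
    with add-one smoothing. Lower values mean the text is more predictable,
    which is the intuition AI-text detectors build on."""
    counts = Counter(corpus.lower().split())
    vocab = len(counts) + 1          # +1 slot for unseen words
    total = sum(counts.values())
    tokens = text.lower().split()
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))

# Hypothetical reference corpus and test strings, for illustration only.
corpus = "the patient was admitted the patient improved the patient was discharged"
common = unigram_perplexity("the patient was admitted", corpus)
rare = unigram_perplexity("zebra quantum holiday", corpus)
```

A crude "burstiness" analogue would be the variance of this per-sentence perplexity across a document: human writing tends to mix predictable and surprising sentences, while model output is more uniform.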
Affiliation(s)
- Gaurav Saigal: Radiology, University of Miami Miller School of Medicine, Miami, USA
- Robert M Quencer: Radiology, University of Miami Miller School of Medicine, Miami, USA

30
Lang S, Vitale J, Fekete TF, Haschtmann D, Reitmeir R, Ropelato M, Puhakka J, Galbusera F, Loibl M. Are large language models valid tools for patient information on lumbar disc herniation? The spine surgeons' perspective. Brain Spine 2024; 4:102804. [PMID: 38706800] [PMCID: PMC11067000] [DOI: 10.1016/j.bas.2024.102804]
Abstract
Introduction Generative AI is revolutionizing patient education in healthcare, particularly through chatbots that offer personalized, clear medical information. Reliability and accuracy are vital in AI-driven patient education. Research question How effective are Large Language Models (LLM), such as ChatGPT and Google Bard, in delivering accurate and understandable patient education on lumbar disc herniation? Material and methods Ten Frequently Asked Questions about lumbar disc herniation were selected from 133 questions and were submitted to three LLMs. Six experienced spine surgeons rated the responses on a scale from "excellent" to "unsatisfactory," and evaluated the answers for exhaustiveness, clarity, empathy, and length. Statistical analysis involved Fleiss Kappa, Chi-square, and Friedman tests. Results Out of the responses, 27.2% were excellent, 43.9% satisfactory with minimal clarification, 18.3% satisfactory with moderate clarification, and 10.6% unsatisfactory. There were no significant differences in overall ratings among the LLMs (p = 0.90); however, inter-rater reliability was not achieved, and large differences among raters were detected in the distribution of answer frequencies. Overall, ratings varied among the 10 answers (p = 0.043). The average ratings for exhaustiveness, clarity, empathy, and length were above 3.5/5. Discussion and conclusion LLMs show potential in patient education for lumbar spine surgery, with generally positive feedback from evaluators. The new EU AI Act, enforcing strict regulation on AI systems, highlights the need for rigorous oversight in medical contexts. In the current study, the variability in evaluations and occasional inaccuracies underline the need for continuous improvement. Future research should involve more advanced models to enhance patient-physician communication.
Affiliation(s)
- Siegmund Lang
- Department of Trauma Surgery, University Hospital Regensburg, Regensburg, Germany
- Jacopo Vitale
- Spine Center, Schulthess Klinik, Zurich, Switzerland
- Jani Puhakka
- Spine Center, Schulthess Klinik, Zurich, Switzerland
- Markus Loibl
- Spine Center, Schulthess Klinik, Zurich, Switzerland

31
Valentini M, Szkandera J, Smolle MA, Scheipl S, Leithner A, Andreou D. Artificial intelligence large language model ChatGPT: is it a trustworthy and reliable source of information for sarcoma patients? Front Public Health 2024; 12:1303319. [PMID: 38584922 PMCID: PMC10995284 DOI: 10.3389/fpubh.2024.1303319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Accepted: 03/06/2024] [Indexed: 04/09/2024] Open
Abstract
Introduction Since its introduction in November 2022, the artificial intelligence large language model ChatGPT has taken the world by storm. Among other applications, it can be used by patients as a source of information on diseases and their treatments. However, little is known about the quality of the sarcoma-related information ChatGPT provides. We therefore aimed to analyze how sarcoma experts evaluate the quality of ChatGPT's responses to sarcoma-related inquiries and to assess the bot's answers on specific evaluation metrics. Methods The ChatGPT responses to a sample of 25 sarcoma-related questions (5 definitions, 9 general questions, and 11 treatment-related inquiries) were evaluated by 3 independent sarcoma experts. Each response was compared with authoritative resources and international guidelines and graded on 5 different metrics using a 5-point Likert scale: completeness, misleadingness, accuracy, being up-to-date, and appropriateness. This resulted in a maximum of 25 and a minimum of 5 points per answer, with higher scores indicating higher response quality. Scores ≥21 points were rated as very good, scores between 16 and 20 as good, while scores ≤15 points were classified as poor (11-15) or very poor (≤10). Results The median score that ChatGPT's answers achieved was 18.3 points (interquartile range [IQR], 12.3-20.3 points). Six answers were classified as very good and 9 as good, while 5 answers each were rated as poor and very poor. The best scores were documented in the evaluation of how appropriate the response was for patients (median, 3.7 points; IQR, 2.5-4.2 points), which was significantly higher than the accuracy scores (median, 3.3 points; IQR, 2.0-4.2 points; p = 0.035). ChatGPT fared considerably worse with treatment-related questions, with only 45% of its responses classified as good or very good, compared to general questions (78% good/very good) and definitions (60% good/very good). Discussion The answers ChatGPT provided on a rare disease such as sarcoma were found to be of very inconsistent quality, with some answers classified as very good and others as very poor. Sarcoma physicians should be aware of the risks of misinformation that ChatGPT poses and advise their patients accordingly.
Affiliation(s)
- Marisa Valentini
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
- Joanna Szkandera
- Division of Oncology, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Maria Anna Smolle
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
- Susanne Scheipl
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
- Andreas Leithner
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
- Dimosthenis Andreou
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria

32
Mu Y, He D. The Potential Applications and Challenges of ChatGPT in the Medical Field. Int J Gen Med 2024; 17:817-826. [PMID: 38476626 PMCID: PMC10929156 DOI: 10.2147/ijgm.s456659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
ChatGPT, an AI-driven conversational large language model (LLM), has garnered significant scholarly attention since its inception, owing to its manifold applications in the realm of medical science. This study primarily examines the merits, limitations, anticipated developments, and practical applications of ChatGPT in clinical practice, healthcare, medical education, and medical research. It underscores the necessity for further research and development to enhance its performance and deployment. Moreover, future research avenues encompass ongoing enhancements and standardization of ChatGPT, mitigating its limitations, and exploring its integration and applicability in translational and personalized medicine. Reflecting the narrative nature of this review, a focused literature search was performed to identify relevant publications on ChatGPT's use in medicine. This process was aimed at gathering a broad spectrum of insights to provide a comprehensive overview of the current state and future prospects of ChatGPT in the medical domain. The objective is to aid healthcare professionals in understanding the groundbreaking advancements associated with the latest artificial intelligence tools, while also acknowledging the opportunities and challenges presented by ChatGPT.
Affiliation(s)
- Yonglin Mu
- Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Dawei He
- Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China

33
Bellini V, Semeraro F, Montomoli J, Cascella M, Bignami E. Between human and AI: assessing the reliability of AI text detection tools. Curr Med Res Opin 2024; 40:353-358. [PMID: 38265047 DOI: 10.1080/03007995.2024.2310086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 01/18/2024] [Accepted: 01/22/2024] [Indexed: 01/25/2024]
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT-4 have raised critical questions regarding their distinguishability from human-generated content. In this research, we evaluated the effectiveness of online detection tools in identifying ChatGPT-4 vs human-written text. METHODS Two texts produced by ChatGPT-4 using differing prompts and one text created by a human author were assessed using the following online detection tools: GPTZero, ZeroGPT, Writer ACD, and Originality. RESULTS The findings revealed a notable variance in the capabilities of the employed detection tools. GPTZero and ZeroGPT exhibited inconsistent assessments regarding the AI origin of the texts. Writer ACD predominantly identified texts as human-written, whereas Originality consistently recognized the AI-generated content in both samples from ChatGPT-4, highlighting Originality's enhanced sensitivity to patterns characteristic of AI-generated text. CONCLUSION The study demonstrates that while automatic detection tools may discern texts generated by ChatGPT-4, significant variability exists in their accuracy. There is an urgent need for more refined detection methodologies to ensure the authenticity and integrity of content, especially in scientific and academic research, and to prevent the misdetection of human-written content as AI-generated and vice versa.
Affiliation(s)
- Valentina Bellini
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy
- Federico Semeraro
- Department of Anesthesia, Intensive Care and Prehospital Emergency, Maggiore Hospital Carlo Alberto Pizzardi, Bologna, Italy
- Jonathan Montomoli
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Rimini, Italy
- Marco Cascella
- Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana", University of Salerno, Baronissi, Italy
- Elena Bignami
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy

34
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
35
Ajagunde J, Das NK. ChatGPT Versus Medical Professionals. Health Serv Insights 2024; 17:11786329241230161. [PMID: 38322596 PMCID: PMC10845989 DOI: 10.1177/11786329241230161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2024] Open
Affiliation(s)
- Jyoti Ajagunde
- Department of Microbiology, Dr. D Y Patil Medical College, Dr. D Y Patil Vidyapeeth, Pimpri, Pune, Maharashtra, India
- Nikunja Kumar Das
- Department of Microbiology, Dr. D Y Patil Medical College, Dr. D Y Patil Vidyapeeth, Pimpri, Pune, Maharashtra, India

36
Yan S, Du D, Liu X, Dai Y, Kim MK, Zhou X, Wang L, Zhang L, Jiang X. Assessment of the Reliability and Clinical Applicability of ChatGPT's Responses to Patients' Common Queries About Rosacea. Patient Prefer Adherence 2024; 18:249-253. [PMID: 38313827 PMCID: PMC10838492 DOI: 10.2147/ppa.s444928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 01/22/2024] [Indexed: 02/06/2024] Open
Abstract
Objective Artificial intelligence chatbots, particularly ChatGPT (Chat Generative Pre-trained Transformer), can analyze human input and generate human-like responses, which shows their potential application in healthcare. People with rosacea often have questions about alleviating symptoms and daily skin care, which makes these queries well suited to ChatGPT responses. This study aims to assess the reliability and clinical applicability of ChatGPT 3.5 in responding to patients' common queries about rosacea and to evaluate the extent of ChatGPT's coverage of dermatology resources. Methods Based on a qualitative analysis of the literature on queries from rosacea patients, we extracted the 20 questions of greatest concern to patients, covering four main categories: treatment, triggers and diet, skincare, and special manifestations of rosacea. Each question was input into ChatGPT separately for three rounds of question-and-answer conversations. The generated answers were evaluated by three experienced dermatologists, each with a postgraduate degree and over five years of clinical experience in dermatology, to assess their reliability and applicability for clinical practice. Results The reviewers unanimously agreed that ChatGPT achieved a high reliability of 92.22% to 97.78% in responding to patients' common queries about rosacea. Additionally, almost all answers were applicable for supporting rosacea patient education, with clinical applicability ranging from 98.61% to 100.00%. The expert ratings were significantly consistent (all significance levels below 0.05), with a consistency coefficient of 0.404 for content reliability and 0.456 for clinical practicality, indicating a high level of agreement among the raters. Conclusion ChatGPT 3.5 exhibits excellent reliability and clinical applicability in responding to patients' common queries about rosacea. This artificial intelligence tool is applicable for supporting rosacea patient education.
Affiliation(s)
- Sihan Yan
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Dan Du
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Xu Liu
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Yingying Dai
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Min-Kyu Kim
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Xinyu Zhou
- Department of Dermatology, Nanbu County People’s Hospital, Nanbu County, Nanchong, Sichuan, People’s Republic of China
- Lian Wang
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Lu Zhang
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Xian Jiang
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China

37
|
Zaleski AL, Berkowsky R, Craig KJT, Pescatello LS. Comprehensiveness, Accuracy, and Readability of Exercise Recommendations Provided by an AI-Based Chatbot: Mixed Methods Study. JMIR MEDICAL EDUCATION 2024; 10:e51308. [PMID: 38206661 PMCID: PMC10811574 DOI: 10.2196/51308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 10/05/2023] [Accepted: 12/11/2023] [Indexed: 01/12/2024]
Abstract
BACKGROUND Regular physical activity is critical for health and disease prevention. Yet, health care providers and patients face barriers to implementing evidence-based lifestyle recommendations. The potential to augment care with the increased availability of artificial intelligence (AI) technologies is limitless; however, the suitability of AI-generated exercise recommendations has yet to be explored. OBJECTIVE The purpose of this study was to assess the comprehensiveness, accuracy, and readability of individualized exercise recommendations generated by a novel AI chatbot. METHODS A coding scheme was developed to score AI-generated exercise recommendations across ten categories informed by gold-standard exercise recommendations, including (1) health condition-specific benefits of exercise, (2) exercise preparticipation health screening, (3) frequency, (4) intensity, (5) time, (6) type, (7) volume, (8) progression, (9) special considerations, and (10) references to the primary literature. The AI chatbot was prompted to provide individualized exercise recommendations for 26 clinical populations using an open-source application programming interface. Two independent reviewers coded AI-generated content for each category and calculated comprehensiveness (%) and factual accuracy (%) on a scale of 0%-100%. Readability was assessed using the Flesch-Kincaid formula. Qualitative analysis identified and categorized themes from AI-generated output. RESULTS AI-generated exercise recommendations were 41.2% (107/260) comprehensive and 90.7% (146/161) accurate, with the majority (8/15, 53%) of inaccuracy related to the need for exercise preparticipation medical clearance. The average readability level of AI-generated exercise recommendations was at the college level (mean 13.7, SD 1.7), with an average Flesch reading ease score of 31.1 (SD 7.7). Recurring themes in the AI-generated output included concern for liability and safety, a preference for aerobic exercise, and potential bias and direct discrimination against certain age groups and individuals with disabilities. CONCLUSIONS There were notable gaps in the comprehensiveness, accuracy, and readability of AI-generated exercise recommendations. Exercise and health care professionals should be aware of these limitations when using and endorsing AI-based technologies as a tool to support lifestyle change involving exercise.
Affiliation(s)
- Amanda L Zaleski
- Clinical Evidence Development, Aetna Medical Affairs, CVS Health Corporation, Hartford, CT, United States
- Department of Preventive Cardiology, Hartford Hospital, Hartford, CT, United States
- Rachel Berkowsky
- Department of Kinesiology, University of Connecticut, Storrs, CT, United States
- Kelly Jean Thomas Craig
- Clinical Evidence Development, Aetna Medical Affairs, CVS Health Corporation, Hartford, CT, United States
- Linda S Pescatello
- Department of Kinesiology, University of Connecticut, Storrs, CT, United States

38
Younis HA, Eisa TAE, Nasser M, Sahib TM, Noor AA, Alyasiri OM, Salisu S, Hayder IM, Younis HA. A Systematic Review and Meta-Analysis of Artificial Intelligence Tools in Medicine and Healthcare: Applications, Considerations, Limitations, Motivation and Challenges. Diagnostics (Basel) 2024; 14:109. [PMID: 38201418 PMCID: PMC10802884 DOI: 10.3390/diagnostics14010109] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2023] [Revised: 12/02/2023] [Accepted: 12/04/2023] [Indexed: 01/12/2024] Open
Abstract
Artificial intelligence (AI) has emerged as a transformative force in various sectors, including medicine and healthcare. Large language models like ChatGPT showcase AI's potential by generating human-like text through prompts. ChatGPT's adaptability holds promise for reshaping medical practices, improving patient care, and enhancing interactions among healthcare professionals, patients, and data. In pandemic management, ChatGPT rapidly disseminates vital information. It serves as a virtual assistant in surgical consultations, aids dental practices, simplifies medical education, and aids in disease diagnosis. A total of 82 papers were categorised into eight major areas, which are G1: treatment and medicine, G2: buildings and equipment, G3: parts of the human body and areas of the disease, G4: patients, G5: citizens, G6: cellular imaging, radiology, pulse and medical images, G7: doctors and nurses, and G8: tools, devices and administration. Balancing AI's role with human judgment remains a challenge. A systematic literature review using the PRISMA approach explored AI's transformative potential in healthcare, highlighting ChatGPT's versatile applications, limitations, motivation, and challenges. In conclusion, ChatGPT's diverse medical applications demonstrate its potential for innovation, serving as a valuable resource for students, academics, and researchers in healthcare. Additionally, this study serves as a guide, assisting students, academics, and researchers in the field of medicine and healthcare alike.
Affiliation(s)
- Hussain A. Younis
- College of Education for Women, University of Basrah, Basrah 61004, Iraq
- Maged Nasser
- Computer & Information Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
- Thaeer Mueen Sahib
- Kufa Technical Institute, Al-Furat Al-Awsat Technical University, Kufa 54001, Iraq
- Ameen A. Noor
- Computer Science Department, College of Education, University of Almustansirya, Baghdad 10045, Iraq
- Sani Salisu
- Department of Information Technology, Federal University Dutse, Dutse 720101, Nigeria
- Israa M. Hayder
- Qurna Technique Institute, Southern Technical University, Basrah 61016, Iraq
- Hameed AbdulKareem Younis
- Department of Cybersecurity, College of Computer Science and Information Technology, University of Basrah, Basrah 61016, Iraq

39
Malik S, Zaheer S. ChatGPT as an aid for pathological diagnosis of cancer. Pathol Res Pract 2024; 253:154989. [PMID: 38056135 DOI: 10.1016/j.prp.2023.154989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 11/26/2023] [Accepted: 11/27/2023] [Indexed: 12/08/2023]
Abstract
Diagnostic workup of cancer patients relies heavily on the science of pathology, using cytopathology, histopathology, and ancillary techniques like immunohistochemistry and molecular cytogenetics. Data processing and learning by means of artificial intelligence (AI) has become a spearhead for the advancement of medicine, and pathology and laboratory medicine are no exceptions. ChatGPT, an AI-based chatbot recently launched by OpenAI, is currently the talk of the town, and its role in cancer diagnosis is also being explored meticulously. Integrating digital slides, advanced algorithms, and computer-aided diagnostic techniques into the pathology workflow extends the frontiers of the pathologist's view beyond the microscopic slide and enables effective integration, assimilation, and utilization of knowledge beyond human limits and boundaries. Despite its numerous advantages in the pathological diagnosis of cancer, this approach comes with several challenges, such as integrating digital slides with input language parameters, problems of bias, and legal issues, which must be addressed soon so that pathologists diagnosing malignancies stay on the same bandwagon and don't miss the train.
Affiliation(s)
- Shaivy Malik
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India
- Sufian Zaheer
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India

40
Alotaibi SS, Rehman A, Hasnain M. Revolutionizing ocular cancer management: a narrative review on exploring the potential role of ChatGPT. Front Public Health 2023; 11:1338215. [PMID: 38192545 PMCID: PMC10773849 DOI: 10.3389/fpubh.2023.1338215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024] Open
Abstract
This paper pioneers the exploration of ocular cancer and its management with the help of artificial intelligence (AI) technology. Existing literature reports a significant increase in new eye cancer cases in 2023, with a higher incidence rate. Extensive research was conducted using online databases such as PubMed, ACM Digital Library, ScienceDirect, and Springer. The review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Of the 62 studies collected, only 20 documents met the inclusion criteria. The review identifies seven ocular cancer types and highlights important challenges associated with ocular cancer, including limited awareness of eye cancer, restricted healthcare access, financial barriers, and insufficient infrastructure support. Financial barriers are among the most widely examined ocular cancer challenges in the literature. The potential role and limitations of ChatGPT are discussed, emphasizing its usefulness in providing general information to physicians while noting its inability to deliver up-to-date information. The paper concludes by presenting potential future applications of ChatGPT to advance research on ocular cancer globally.
Affiliation(s)
- Saud S. Alotaibi
- Information Systems Department, Umm Al-Qura University, Makkah, Saudi Arabia
- Amna Rehman
- Department of Computer Science, Lahore Leads University, Lahore, Pakistan
- Muhammad Hasnain
- Department of Computer Science, Lahore Leads University, Lahore, Pakistan

41
Chatterjee S, Bhattacharya M, Pal S, Lee SS, Chakraborty C. ChatGPT and large language models in orthopedics: from education and surgery to research. J Exp Orthop 2023; 10:128. [PMID: 38038796 PMCID: PMC10692045 DOI: 10.1186/s40634-023-00700-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023] Open
Abstract
ChatGPT has quickly gained popularity since its release in November 2022. Currently, large language models (LLMs) and ChatGPT are being applied in various domains of medical science, including cardiology, nephrology, orthopedics, ophthalmology, gastroenterology, and radiology. Researchers are exploring the potential of LLMs and ChatGPT for clinicians and surgeons in every domain. This study discusses how ChatGPT can help orthopedic clinicians and surgeons perform various medical tasks. LLMs and ChatGPT can help the patient community by providing suggestions and diagnostic guidelines. In this study, the use of LLMs and ChatGPT to enhance and expand the field of orthopedics, including orthopedic education, surgery, and research, is explored. Present LLMs have several shortcomings, which are discussed herein. However, next-generation and future domain-specific LLMs are expected to be more potent and to transform patients' quality of life.
Affiliation(s)
- Srijan Chatterjee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, 756020, Odisha, India
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India

42
Au K, Yang W. Auxiliary use of ChatGPT in surgical diagnosis and treatment. Int J Surg 2023; 109:3940-3943. [PMID: 37678271 PMCID: PMC10720849 DOI: 10.1097/js9.0000000000000686] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Accepted: 08/09/2023] [Indexed: 09/09/2023]
Abstract
ChatGPT can be used as an auxiliary tool in surgical diagnosis and treatment in several ways. One of its greatest values is its ability to quickly process large amounts of data and provide relatively accurate information to healthcare workers. Owing to this capacity, ChatGPT has been widely used in the healthcare industry for tasks such as assisting medical diagnosis, predicting the course of some diseases, and analyzing medical cases. In surgical diagnosis and treatment, it can serve as an auxiliary tool that helps healthcare professionals process large amounts of medical data, provides real-time guidance and feedback, and increases the overall speed and quality of healthcare. Despite its growing acceptance, it still faces issues concerning ethics, patient privacy, data security, law, trustworthiness, and accuracy. This study aimed to explore the auxiliary use of ChatGPT in surgical diagnosis and treatment.
Affiliation(s)
- Kahei Au
- School of Medicine, Jinan University
- Wah Yang
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, People’s Republic of China

43
Abu-Farha R, Fino L, Al-Ashwal FY, Zawiah M, Gharaibeh L, Harahsheh MM, Darwish Elhajji F. Evaluation of community pharmacists' perceptions and willingness to integrate ChatGPT into their pharmacy practice: A study from Jordan. J Am Pharm Assoc (2003) 2023; 63:1761-1767.e2. [PMID: 37648157 DOI: 10.1016/j.japh.2023.08.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2023] [Revised: 08/10/2023] [Accepted: 08/22/2023] [Indexed: 09/01/2023]
Abstract
OBJECTIVES This study aimed to examine the extent of community pharmacists' awareness of Chat Generative Pretrained Transformer (ChatGPT), their willingness to embrace this new development in artificial intelligence (AI), and the barriers to incorporating this nonconventional source of information into pharmacy practice. METHODS A cross-sectional study was conducted among community pharmacists in Jordanian cities between April 26, 2023, and May 10, 2023. Convenience and snowball sampling techniques were used to select study participants owing to resource and time constraints. The questionnaire was distributed by research assistants through popular social media platforms. Logistic regression analysis was used to assess predictors of pharmacists' willingness to use this service in the future. RESULTS A total of 221 community pharmacists participated in the study (a response rate was not calculated because opt-in recruitment strategies were used). Remarkably, nearly half of the pharmacists (n = 107, 48.4%) indicated a willingness to incorporate ChatGPT into their pharmacy practice. Nearly half (n = 105, 47.5%) demonstrated a high perceived-benefit score for ChatGPT, whereas approximately 37% (n = 81) expressed a high concern score. More than 70% of pharmacists (n = 168) believed that ChatGPT lacks the ability to apply human judgment and make complicated ethical judgments in its responses. Finally, logistic regression analysis showed that pharmacists with previous experience using ChatGPT were more willing to integrate it into their pharmacy practice than those without such experience (odds ratio 2.312, P = 0.035). CONCLUSION Although pharmacists show a willingness to incorporate ChatGPT into their practice, especially those with previous experience, there are major concerns. These mainly revolve around the tool's ability to make human-like judgments and ethical decisions. These findings are crucial for the future development and integration of AI tools in pharmacy practice.
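The odds ratio reported above (OR 2.312) comes from a logistic regression with a single binary predictor (previous ChatGPT experience). In that special case, the odds ratio reduces to the cross-product ratio of the 2×2 table, and the regression coefficient is its natural log. A minimal sketch with hypothetical counts (illustrative only, not the study's raw data):

```python
import math

# Hypothetical 2x2 table (illustrative counts, not the study's data):
# rows: prior ChatGPT experience yes/no; columns: willing / not willing
willing_exp, unwilling_exp = 40, 20      # pharmacists with prior experience
willing_noexp, unwilling_noexp = 20, 40  # pharmacists without prior experience

# For a single binary predictor, the logistic-regression odds ratio
# equals the cross-product ratio of the 2x2 table: OR = (a*d)/(b*c).
odds_ratio = (willing_exp * unwilling_noexp) / (unwilling_exp * willing_noexp)

# The fitted logistic-regression coefficient beta is log(OR).
log_odds = math.log(odds_ratio)

print(odds_ratio)  # → 4.0
```

With these invented counts, the odds of willingness are 2.0 in the experienced group and 0.5 in the inexperienced group, giving OR = 4.0; the study's fitted model produced OR = 2.312 on its own data.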
|
44
|
Karakas C, Brock D, Lakhotia A. Leveraging ChatGPT in the Pediatric Neurology Clinic: Practical Considerations for Use to Improve Efficiency and Outcomes. Pediatr Neurol 2023; 148:157-163. [PMID: 37725885 DOI: 10.1016/j.pediatrneurol.2023.08.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/15/2023] [Revised: 08/17/2023] [Accepted: 08/25/2023] [Indexed: 09/21/2023]
Abstract
BACKGROUND Artificial intelligence (AI) is progressively influencing healthcare sectors, including pediatric neurology. This paper aims to investigate the potential and limitations of using ChatGPT, a large language model (LLM) developed by OpenAI, in an outpatient pediatric neurology clinic. The analysis focuses on the tool's capabilities in enhancing clinical efficiency, productivity, and patient education. METHOD This is an opinion-based exploration supplemented with practical examples. We assessed ChatGPT's utility in administrative and educational tasks such as drafting medical necessity letters and creating patient educational materials. RESULTS ChatGPT showed efficacy in streamlining administrative work, particularly in drafting administrative letters and formulating personalized patient education materials. However, the model has limitations in performing higher-order tasks like formulating nuanced differential diagnoses. Additionally, ethical and legal concerns, including data privacy and the potential dissemination of misinformation, warrant cautious implementation. CONCLUSIONS The integration of AI tools like ChatGPT in pediatric neurology clinics has demonstrated promising results in boosting efficiency and patient education, despite present limitations and ethical concerns. As technology advances, we anticipate future applications may extend to more complex clinical tasks like precise differential diagnoses and treatment strategy guidance. Careful, patient-centered implementation is essential for leveraging the potential benefits of AI in pediatric neurology effectively.
Affiliation(s)
- Cemal Karakas
- Division of Pediatric Neurology, Department of Neurology, University of Louisville, Louisville, Kentucky; Norton Neuroscience Institute, Louisville, Kentucky.
- Dylan Brock
- Division of Pediatric Neurology, Department of Neurology, University of Louisville, Louisville, Kentucky; Norton Neuroscience Institute, Louisville, Kentucky
- Arpita Lakhotia
- Division of Pediatric Neurology, Department of Neurology, University of Louisville, Louisville, Kentucky; Norton Neuroscience Institute, Louisville, Kentucky
|
45
|
Mese I, Taslicay CA, Sivrioglu AK. Improving radiology workflow using ChatGPT and artificial intelligence. Clin Imaging 2023; 103:109993. [PMID: 37812965 DOI: 10.1016/j.clinimag.2023.109993] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 08/19/2023] [Accepted: 09/28/2023] [Indexed: 10/11/2023]
Abstract
Artificial intelligence is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. One of its branches is natural language processing, which studies the interaction between computers and human language. ChatGPT is a sophisticated natural language processing tool that can understand and respond to complex questions and commands in natural language. Radiology is a vital aspect of modern medicine that uses imaging technologies to diagnose and treat medical conditions. Artificial intelligence, including ChatGPT, can be integrated into radiology workflows to improve efficiency, accuracy, and patient care. ChatGPT can streamline various radiology workflow steps, including patient registration, scheduling, patient check-in, image acquisition, interpretation, and reporting. While ChatGPT has the potential to transform radiology workflows, limitations of the technology, such as the potential for bias in artificial intelligence algorithms and ethical concerns, must be addressed. As the technology continues to advance, ChatGPT is likely to become an increasingly important tool in radiology, and in healthcare more broadly.
Affiliation(s)
- Ismail Mese
- Department of Radiology, Health Sciences University, Erenkoy Mental Health and Neurology Training and Research Hospital, 19 Mayıs, Sinan Ercan Cd. No: 23, Kadıköy/Istanbul 34736, Turkey.
- Ali Kemal Sivrioglu
- Department of Radiology, Liv Hospital Vadistanbul, Ayazağa Mahallesi, Kemerburgaz Caddesi, Vadistanbul Park Etabı, 7F Blok, 34396 Sarıyer/İstanbul, Turkey
|
46
|
Chakraborty C, Pal S, Bhattacharya M, Dash S, Lee SS. Overview of Chatbots with special emphasis on artificial intelligence-enabled ChatGPT in medical science. Front Artif Intell 2023; 6:1237704. [PMID: 38028668 PMCID: PMC10644239 DOI: 10.3389/frai.2023.1237704] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 10/05/2023] [Indexed: 12/01/2023] Open
Abstract
The release of ChatGPT has initiated new thinking about AI-based chatbots and their applications and has drawn huge public attention worldwide. In recent months, researchers and doctors have begun considering the promise and applications of AI-based large language models in medicine. This comprehensive review provides an overview of chatbots and ChatGPT and their current role in medicine. First, the general idea of chatbots, their evolution, architecture, and medical uses are discussed. Second, ChatGPT is discussed with special emphasis on its applications in medicine, its architecture and training methods, its use in medical diagnosis and treatment, research ethics issues, and a comparison of ChatGPT with other NLP models. The article also discusses the limitations and prospects of ChatGPT. These large language models hold immense promise for the future of healthcare, but more research is needed in this direction.
Affiliation(s)
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, India
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Snehasish Dash
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Sang-Soo Lee
- Institute for Skeletal Aging and Orthopedic Surgery, Hallym University Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do, Republic of Korea
|
47
|
Irfan B, Yaqoob A. ChatGPT's Epoch in Rheumatological Diagnostics: A Critical Assessment in the Context of Sjögren's Syndrome. Cureus 2023; 15:e47754. [PMID: 38022092 PMCID: PMC10676288 DOI: 10.7759/cureus.47754] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/26/2023] [Indexed: 12/01/2023] Open
Abstract
INTRODUCTION The rise of artificial intelligence in medical practice is reshaping clinical care. Large language models (LLMs) like ChatGPT have the potential to assist in rheumatology by personalizing scientific information retrieval, particularly in the context of Sjögren's syndrome. This study aimed to evaluate the efficacy of ChatGPT in providing insights into Sjögren's syndrome and differentiating it from other rheumatological conditions. MATERIALS AND METHODS A database of peer-reviewed articles and clinical guidelines focused on Sjögren's syndrome was compiled. Clinically relevant questions were presented to ChatGPT, with responses assessed for accuracy, relevance, and comprehensiveness. Techniques such as blinding, random control queries, and temporal analysis ensured unbiased evaluation. ChatGPT's responses were also assessed using the 15-question DISCERN tool. RESULTS ChatGPT effectively highlighted key immunopathological and histopathological characteristics of Sjögren's syndrome, though some crucial data and citation inconsistencies were noted. For a given clinical vignette, ChatGPT correctly identified potential etiological considerations, with Sjögren's syndrome prominent among them. DISCUSSION LLMs like ChatGPT offer rapid access to vast amounts of data, beneficial for both patients and providers. While they democratize information, limitations such as potential oversimplification and reference inaccuracies were observed. The balance between LLM insights and clinical judgment, as well as continuous model refinement, is crucial. CONCLUSION LLMs like ChatGPT offer significant potential in rheumatology, providing swift and broad medical insights. However, a cautious approach is vital, ensuring rigorous training and ethical application for optimal patient care and clinical practice.
Affiliation(s)
- Bilal Irfan
- Microbiology and Immunology, University of Michigan, Ann Arbor, USA
|
48
|
Turner JH. Cancer Care by Committee to be Superseded by Personal Physician-Patient Partnership Informed by Artificial Intelligence. Cancer Biother Radiopharm 2023; 38:497-505. [PMID: 37366774 DOI: 10.1089/cbr.2023.0058] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/28/2023] Open
Abstract
Multidisciplinary tumor boards (MTBs) have become the reference standard of cancer management, founded upon randomized controlled trial (RCT) evidence-based guidelines. The inordinate delays inherent in awaiting formal regulatory agency approvals of novel therapeutic agents, and the rigidities and nongeneralizability of this regimented approach, often deny cancer patients timely access to effective innovative treatment. Reluctance of MTBs to accept theranostic care of patients with advanced neuroendocrine tumors (NETs) and metastatic castrate-resistant prostate cancer resulted in decades of delay in the incorporation of 177Lu-octreotate and 177Lu-prostate-specific membrane antigen (PSMA) into routine clinical oncology practice. Recent developments in immunotherapy and molecular targeted precision therapy, based on N-of-One individual multifactorial genome analyses, have greatly increased the complexity of decision-making. Burgeoning specialist workload and tight time frames now threaten to overwhelm the logistically, and emotionally, demanding MTB system. It is hypothesized that the advent of advanced artificial intelligence technology and Chatbot natural language algorithms will shift the cancer care paradigm from a MTB management model toward a personal physician-patient shared-care partnership for real-world practice of precision individualized holistic oncology.
Affiliation(s)
- J Harvey Turner
- Department of Nuclear Medicine, Fiona Stanley Fremantle Hospitals Group, The University of Western Australia, Murdoch, Australia
|
49
|
Chou YH, Lin C, Lee SH, Chang Chien YW, Cheng LC. Potential Mobile Health Applications for Improving the Mental Health of the Elderly: A Systematic Review. Clin Interv Aging 2023; 18:1523-1534. [PMID: 37727447 PMCID: PMC10506600 DOI: 10.2147/cia.s410396] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2023] [Accepted: 09/05/2023] [Indexed: 09/21/2023] Open
Abstract
The rapid aging of the global population presents challenges in providing mental health care resources for older adults aged 65 years and above. The COVID-19 pandemic further exacerbated psychological distress in the global population through social isolation and distancing. Thus, there is an urgent need to update scholarly knowledge on the effectiveness of mobile health (mHealth) applications for improving older people's mental health. This systematic review summarizes recent literature on chatbots aimed at enhancing mental health and well-being. Sixteen papers describing six apps or prototypes were reviewed, indicating the practicality, feasibility, and acceptance of chatbots for promoting mental health in older adults. Engaging with chatbots led to improvements in well-being, reductions in stress, and decreases in depressive symptoms. The mobile health applications described in these studies are categorized for reference.
Affiliation(s)
- Ya-Hsin Chou
- Department of Psychiatry, Taoyuan Chang Gung Memorial Hospital, Taoyuan County, Taiwan
- Chemin Lin
- College of Medicine, Chang Gung University, Taoyuan County, Taiwan
- Department of Psychiatry, Keelung Chang Gung Memorial Hospital, Keelung City, Taiwan
- Community Medicine Research Center, Chang Gung Memorial Hospital, Keelung, Taiwan
- Shwu-Hua Lee
- College of Medicine, Chang Gung University, Taoyuan County, Taiwan
- Department of Psychiatry, Linkou Chang Gung Memorial Hospital, Taoyuan County, Taiwan
- Ya-Wen Chang Chien
- Department of Photography and Virtual Reality Design, Huafan University, New Taipei, Taiwan
- Li-Chen Cheng
- Department of Information and Finance Management, National Taipei University of Technology, Taipei, Taiwan
|
50
|
Stanbrook MB, Weinhold M, Kelsall D. [New policy on the use of artificial intelligence tools in manuscripts submitted to CMAJ]. CMAJ 2023; 195:E1168-E1169. [PMID: 37669792 PMCID: PMC10479997 DOI: 10.1503/cmaj.230949-f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/07/2023] Open
|