1. Huo B, Boyle A, Marfo N, Tangamornsuksan W, Steen JP, McKechnie T, Lee Y, Mayol J, Antoniou SA, Thirunavukarasu AJ, Sanger S, Ramji K, Guyatt G. Large Language Models for Chatbot Health Advice Studies: A Systematic Review. JAMA Netw Open 2025;8:e2457879. PMID: 39903463; PMCID: PMC11795331; DOI: 10.1001/jamanetworkopen.2024.57879.
Abstract
Importance: There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain.
Objective: To perform a systematic review examining reporting variability among peer-reviewed studies that evaluated the performance of generative artificial intelligence (AI)-driven chatbots in summarizing evidence and providing health advice, in order to inform the development of the Chatbot Assessment Reporting Tool (CHART).
Evidence Review: A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, conducted with the help of a health sciences librarian, yielded 7752 articles. Two reviewers screened articles by title and abstract, followed by full-text review, to identify primary studies evaluating the clinical accuracy of generative AI-driven chatbots in providing health advice (chatbot health advice studies). Two reviewers then performed data extraction for 137 eligible studies.
Findings: A total of 137 studies were included. Studies examined topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Many studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most studies (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information to identify the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Most studies (136 [99.3%]) did not describe a prompt engineering phase. The date of LLM querying was reported in 54 (39.4%) studies. Most studies (89 [65.0%]) used subjective means to define successful chatbot performance, while fewer than one-third addressed the ethical, regulatory, and patient safety implications of the clinical integration of LLMs.
Conclusions and Relevance: In this systematic review of 137 chatbot health advice studies, reporting quality was heterogeneous; these findings may inform the development of the CHART reporting standards. Ethical, regulatory, and patient safety considerations are crucial as interest grows in the clinical integration of LLMs.
Affiliation(s)
- Bright Huo: Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Amy Boyle: Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
- Nana Marfo: H. Ross University School of Medicine, Miramar, Florida
- Wimonchat Tangamornsuksan: Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Jeremy P. Steen: Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Tyler McKechnie: Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Yung Lee: Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Julio Mayol: Hospital Clinico San Carlos, IdISSC, Universidad Complutense de Madrid, Madrid, Spain
- Stephanie Sanger: Health Science Library, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Karim Ramji: Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Gordon Guyatt: Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada

2. Lu L, Zhu Y, Yang J, Yang Y, Ye J, Ai S, Zhou Q. Healthcare professionals and the public sentiment analysis of ChatGPT in clinical practice. Sci Rep 2025;15:1223. PMID: 39774168; PMCID: PMC11707298; DOI: 10.1038/s41598-024-84512-y.
Abstract
To explore the attitudes of healthcare professionals and the public toward applying ChatGPT in clinical practice. The successful application of ChatGPT in clinical practice depends not only on its technical performance but, critically, on the attitudes and perceptions of both healthcare professionals and the public. This qualitative study, based on artificial intelligence methods, was divided into five steps: data collection, data cleaning, validation of relevance, sentiment analysis, and content analysis using the K-means algorithm. The study comprised 3130 comments amounting to 1,593,650 words. Dictionary-based sentiment analysis identified positive and negative emotions, including anger, disgust, fear, sadness, surprise, good, and happy. Healthcare professionals prioritized ChatGPT's efficiency but raised ethical and accountability concerns, while the public valued its accessibility and emotional support but expressed worries about privacy and misinformation. Bridging these perspectives by improving reliability, safeguarding privacy, and clearly defining ChatGPT's role is essential for its practical and ethical integration into clinical practice.
Affiliation(s)
- Lizhen Lu: Integrated Traditional and Western Medicine Hospital of Linping District, Hangzhou, 311100, China
- Yueli Zhu: Integrated Traditional and Western Medicine Hospital of Linping District, Hangzhou, 311100, China
- Jiekai Yang: Department of Nursing, the Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, 322000, China
- Yuting Yang: Department of Nursing, the Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, 322000, China
- Junwei Ye: Integrated Traditional and Western Medicine Hospital of Linping District, Hangzhou, 311100, China
- Shanshan Ai: Integrated Traditional and Western Medicine Hospital of Linping District, Hangzhou, 311100, China
- Qi Zhou: Integrated Traditional and Western Medicine Hospital of Linping District, Hangzhou, 311100, China

3. Ma Y, Zeng Y, Liu T, Sun R, Xiao M, Wang J. Integrating large language models in mental health practice: a qualitative descriptive study based on expert interviews. Front Public Health 2024;12:1475867. PMID: 39559378; PMCID: PMC11571062; DOI: 10.3389/fpubh.2024.1475867.
Abstract
Background: Progress in developing artificial intelligence (AI) products represented by large language models (LLMs), such as OpenAI's ChatGPT, has sparked enthusiasm for their potential use in mental health practice. However, perspectives on integrating LLMs into mental health practice remain an underreported topic. This study therefore aimed to explore how mental health and AI experts conceptualize LLMs and perceive their integration into mental health practice.
Method: In February-April 2024, online semi-structured interviews were conducted with 21 experts (12 psychiatrists, 7 mental health nurses, and 2 researchers in medical artificial intelligence) from four provinces in China, recruited through snowball and purposive sampling. Respondents' perspectives on and expectations of integrating LLMs in mental health were analyzed with conventional content analysis.
Results: Four themes and eleven sub-themes emerged. Participants first discussed (1) the practice and application reform brought by LLMs to mental health (fair access to mental health services, enhancement of patient participation, improvement in work efficiency and quality), and then analyzed (2) the technological-mental health gap (misleading information, lack of professional nuance and depth, user risk). Based on these points, they outlined (3) prerequisites for integrating LLMs into mental health (training and competence, guidelines for use and management, patient engagement and transparency) and expressed (4) expectations for future developments (reasonable allocation of workload, upgrades and revamps of LLMs).
Conclusion: These findings provide valuable insights into integrating LLMs within mental health practice, offering guidance for institutions to effectively implement, manage, and optimize these tools, thereby enhancing the quality and accessibility of mental health services.
Affiliation(s)
- Mingzhao Xiao: Department of Nursing, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Jun Wang: Department of Nursing, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China

4. Guo Z, Lai A, Thygesen JH, Farrington J, Keen T, Li K. Large Language Models for Mental Health Applications: Systematic Review. JMIR Ment Health 2024;11:e57400. PMID: 39423368; PMCID: PMC11530718; DOI: 10.2196/57400.
Abstract
Background: Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate.
Objective: This systematic review aims to critically assess the use of LLMs in mental health, focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. By systematically collating and assessing the evidence from current studies, the review analyzes models, methodologies, data sources, and outcomes, highlighting the potential of LLMs in mental health, the challenges they present, and the prospects for their clinical use.
Methods: Adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed via PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. Keywords used were (mental health OR mental illness OR mental disorder OR psychiatry) AND (large language models). The review included articles published between January 1, 2017, and April 30, 2024, and excluded articles published in languages other than English.
Results: In total, 40 articles were evaluated, including 15 (38%) on detecting mental health conditions and suicidal ideation through text analysis, 7 (18%) on the use of LLMs as mental health conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. LLMs show good effectiveness in detecting mental health issues and providing accessible, destigmatized eHealth services. However, assessments also indicate that the current risks associated with clinical use might outweigh the benefits. These risks include inconsistencies in generated text, the production of hallucinations, and the absence of a comprehensive, benchmarked ethical framework.
Conclusions: This systematic review examines the clinical applications of LLMs in mental health, highlighting their potential and inherent risks. It identifies several issues: the lack of multilingual datasets annotated by experts, concerns regarding the accuracy and reliability of generated content, challenges in interpretability due to the "black box" nature of LLMs, and ongoing ethical dilemmas. These ethical concerns include the absence of a clear, benchmarked ethical framework; data privacy issues; and the potential for overreliance on LLMs by both physicians and patients, which could compromise traditional medical practices. As a result, LLMs should not be considered substitutes for professional mental health services. However, the rapid development of LLMs underscores their potential as valuable clinical aids, emphasizing the need for continued research and development in this area.
Trial Registration: PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617
Affiliation(s)
- Zhijun Guo: Institute of Health Informatics, University College London, London, United Kingdom
- Alvina Lai: Institute of Health Informatics, University College London, London, United Kingdom
- Johan H Thygesen: Institute of Health Informatics, University College London, London, United Kingdom
- Joseph Farrington: Institute of Health Informatics, University College London, London, United Kingdom
- Thomas Keen: Institute of Health Informatics, University College London, London, United Kingdom; Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Kezhi Li: Institute of Health Informatics, University College London, London, United Kingdom

5. Xian X, Chang A, Xiang YT, Liu MT. Debate and Dilemmas Regarding Generative AI in Mental Health Care: Scoping Review. Interact J Med Res 2024;13:e53672. PMID: 39133916; PMCID: PMC11347908; DOI: 10.2196/53672.
Abstract
Background: Mental disorders rank among the top 10 causes of disease burden globally. Generative artificial intelligence (GAI) has emerged as a promising and innovative technological advancement with significant potential in the field of mental health care. Nevertheless, research dedicated to examining and understanding the application landscape of GAI within this domain remains scarce.
Objective: This review aims to map the current state of GAI knowledge and identify its key uses in the mental health domain by consolidating relevant literature.
Methods: Records were searched within 8 sources (Web of Science, PubMed, IEEE Xplore, medRxiv, bioRxiv, Google Scholar, CNKI, and Wanfang) for publications between 2013 and 2023. The focus was on original, empirical research published in English or Chinese that used GAI technologies to benefit mental health. For an exhaustive search, studies cited by the relevant literature were also checked. Two reviewers were responsible for data selection, and all extracted data were synthesized and summarized for brief and in-depth analyses depending on the GAI approaches used (traditional retrieval and rule-based techniques vs advanced GAI techniques).
Results: Of 144 articles reviewed, 44 (30.6%) met the inclusion criteria for detailed analysis. Six key uses of advanced GAI emerged: mental disorder detection, counseling support, therapeutic application, clinical training, clinical decision-making support, and goal-driven optimization. Advanced GAI systems mainly focused on therapeutic applications (n=19, 43%) and counseling support (n=13, 30%), with clinical training being the least common. Most studies (n=28, 64%) focused broadly on mental health, while specific conditions such as anxiety (n=1, 2%), bipolar disorder (n=2, 5%), eating disorders (n=1, 2%), posttraumatic stress disorder (n=2, 5%), and schizophrenia (n=1, 2%) received limited attention. Despite its prevalent use, the efficacy of ChatGPT in detecting mental disorders remains insufficient. In addition, 100 articles on traditional GAI approaches were found, indicating diverse areas where advanced GAI could enhance mental health care.
Conclusions: This study provides a comprehensive overview of the use of GAI in mental health care, serving as a guide for future research, practical applications, and policy development in this domain. While GAI demonstrates promise in augmenting mental health care services, its inherent limitations emphasize its role as a supplementary tool rather than a replacement for trained mental health providers. A conscientious and ethical integration of GAI techniques is necessary, ensuring a balanced approach that maximizes benefits while mitigating potential challenges in mental health care practices.
Affiliation(s)
- Xuechang Xian: Department of Communication, Faculty of Social Sciences, University of Macau, Macau SAR, China; Department of Publicity, Zhaoqing University, Zhaoqing City, China
- Angela Chang: Department of Communication, Faculty of Social Sciences, University of Macau, Macau SAR, China; Institute of Communication and Health, Lugano University, Lugano, Switzerland
- Yu-Tao Xiang: Department of Public Health and Medicinal Administration, Faculty of Health Sciences, University of Macau, Macau SAR, China

6. Ferrario A, Sedlakova J, Trachsel M. The Role of Humanization and Robustness of Large Language Models in Conversational Artificial Intelligence for Individuals With Depression: A Critical Analysis. JMIR Ment Health 2024;11:e56569. PMID: 38958218; PMCID: PMC11231450; DOI: 10.2196/56569.
Abstract
Large language model (LLM)-powered services are gaining popularity in various applications due to their exceptional performance in many tasks, such as sentiment analysis and question answering. Recently, research has begun exploring their potential use in digital health contexts, particularly in the mental health domain. However, implementing LLM-enhanced conversational artificial intelligence (CAI) presents significant ethical, technical, and clinical challenges. In this viewpoint paper, we discuss two challenges that affect the use of LLM-enhanced CAI for individuals with mental health issues, focusing on the use case of patients with depression: the tendency to humanize LLM-enhanced CAI and its lack of contextualized robustness. Our approach is interdisciplinary, drawing on considerations from philosophy, psychology, and computer science. We argue that the humanization of LLM-enhanced CAI hinges on reflection about what it means to simulate "human-like" features with LLMs and what role these systems should play in interactions with humans. Further, ensuring the contextualized robustness of LLMs requires considering the specificities of language production in individuals with depression, as well as its evolution over time. Finally, we provide a series of recommendations to foster the responsible design and deployment of LLM-enhanced CAI for the therapeutic support of individuals with depression.
Affiliation(s)
- Andrea Ferrario: Institute of Biomedical Ethics and History of Medicine, University of Zurich, Zurich, Switzerland; Mobiliar Lab for Analytics at ETH, ETH Zurich, Zurich, Switzerland
- Jana Sedlakova: Institute of Biomedical Ethics and History of Medicine, University of Zurich, Zurich, Switzerland; Digital Society Initiative, University of Zurich, Zurich, Switzerland; Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
- Manuel Trachsel: University of Basel, Basel, Switzerland; University Hospital Basel, Basel, Switzerland; University Psychiatric Clinics Basel, Basel, Switzerland

7. Liu J. ChatGPT: perspectives from human-computer interaction and psychology. Front Artif Intell 2024;7:1418869. PMID: 38957452; PMCID: PMC11217544; DOI: 10.3389/frai.2024.1418869.
Abstract
The release of GPT-4 has garnered widespread attention across various fields, signaling the impending widespread adoption and application of large language models (LLMs). However, previous research has predominantly focused on the technical principles of ChatGPT and its social impact, overlooking its effects on human-computer interaction and user psychology. This paper explores the multifaceted impacts of ChatGPT on human-computer interaction, psychology, and society through a literature review. The author examines ChatGPT's technical foundation, including its Transformer architecture and reinforcement learning from human feedback (RLHF) process, which enable it to generate human-like responses. In terms of human-computer interaction, the author studies the significant improvements GPT models bring to conversational interfaces. The analysis extends to psychological impacts, weighing the potential of ChatGPT to mimic human empathy and support learning against the risks of reduced interpersonal connection. In the commercial and social domains, the paper discusses the applications of ChatGPT in customer service and social services, highlighting improvements in efficiency as well as challenges such as privacy issues. Finally, the author offers predictions and recommendations for ChatGPT's future development and its impact on social relationships.
Affiliation(s)
- Jiaxi Liu: Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore, Singapore

8. Sohail SS. A Promising Start and Not a Panacea: ChatGPT's Early Impact and Potential in Medical Science and Biomedical Engineering Research. Ann Biomed Eng 2024;52:1131-1135. PMID: 37540292; DOI: 10.1007/s10439-023-03335-6.
Abstract
The advent of artificial intelligence (AI) has catalyzed a revolutionary transformation across various industries, including healthcare. Medical applications of ChatGPT, a powerful language model based on the generative pre-trained transformer (GPT) architecture, include conversational agents capable of accessing and generating medical information from multiple sources and formats. This study investigates the research trends of large language models such as ChatGPT, GPT-4, and Google Bard, comparing their publication trends with early COVID-19 research. The findings underscore the current prominence of AI research and its potential implications for biomedical engineering. A search of the Scopus database on July 23, 2023, yielded 1,096 articles related to ChatGPT, of which approximately 26% were related to medical science. Keywords related to artificial intelligence, natural language processing (NLP), LLMs, and generative AI dominate ChatGPT research, while the medical science research within it emphasizes biomedical research and engineering. This analysis serves as a call to action for researchers, healthcare professionals, and policymakers to recognize and harness AI's potential in healthcare, particularly in the realm of biomedical research.
Affiliation(s)
- Shahab Saquib Sohail: Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, 110062, India

9. Farhat F, Chaudhry BM, Nadeem M, Sohail SS, Madsen DØ. Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard. JMIR Med Educ 2024;10:e51523. PMID: 38381486; PMCID: PMC10918540; DOI: 10.2196/51523.
Abstract
Background: Large language models (LLMs) have revolutionized natural language processing with their ability to generate human-like text through extensive training on large datasets. These models, including GPT-3.5 (OpenAI), GPT-4 (OpenAI), and Bard (Google LLC), find applications beyond natural language processing, attracting interest from academia and industry. Students are actively leveraging LLMs to enhance learning experiences and prepare for high-stakes exams, such as the National Eligibility cum Entrance Test (NEET) in India.
Objective: This comparative analysis aims to evaluate the performance of GPT-3.5, GPT-4, and Bard in answering NEET-2023 questions.
Methods: We evaluated the performance of the 3 mainstream LLMs, namely GPT-3.5, GPT-4, and Google Bard, in answering questions from the NEET-2023 exam. The NEET questions were provided to these artificial intelligence models, and the responses were recorded and compared against the correct answers from the official answer key. Consensus was used to evaluate the performance of all 3 models.
Results: GPT-4 passed the entrance test (300/700, 42.9%), showcasing the strongest performance. GPT-3.5 managed to meet the qualifying criteria, but with a substantially lower score (145/700, 20.7%), while Bard (115/700, 16.4%) failed to meet the qualifying criteria and did not pass the test. GPT-4 demonstrated consistent superiority over Bard and GPT-3.5 in all 3 subjects, achieving accuracy rates of 73% (29/40) in physics, 44% (16/36) in chemistry, and 51% (50/99) in biology. Conversely, GPT-3.5 attained accuracy rates of 45% (18/40) in physics, 33% (13/26) in chemistry, and 34% (34/99) in biology. The accuracy consensus metric showed that matching responses between GPT-4 and Bard, and between GPT-4 and GPT-3.5, had higher incidences of being correct, at 0.56 and 0.57, respectively, compared with matching responses between Bard and GPT-3.5, which stood at 0.42. When all 3 models were considered together, their matching responses reached the highest accuracy consensus of 0.59.
Conclusions: The findings provide valuable insights into the performance of GPT-3.5, GPT-4, and Bard in answering NEET-2023 questions. GPT-4 emerged as the most accurate model, highlighting its potential for educational applications. Cross-checking responses across models may cause confusion, as the compared models (as duos or a trio) tend to agree on only a little over half of the correct responses; using GPT-4 as one of the compared models results in higher accuracy consensus. The results underscore the suitability of LLMs for high-stakes exams and their positive impact on education. Additionally, the study establishes a benchmark for evaluating and enhancing LLMs' performance in educational tasks, promoting responsible and informed use of these models in diverse learning environments.
Affiliation(s)
- Faiza Farhat: Department of Zoology, Aligarh Muslim University, Aligarh, India
- Beenish Moalla Chaudhry: School of Computing and Informatics, The University of Louisiana, Lafayette, LA, United States
- Mohammad Nadeem: Department of Computer Science, Aligarh Muslim University, Aligarh, India
- Shahab Saquib Sohail: School of Computing Science and Engineering, VIT Bhopal University, Sehore, India
- Dag Øivind Madsen: School of Business, University of South-Eastern Norway, Hønefoss, Norway

10. Alanezi F. Assessing the Effectiveness of ChatGPT in Delivering Mental Health Support: A Qualitative Study. J Multidiscip Healthc 2024;17:461-471. PMID: 38314011; PMCID: PMC10838501; DOI: 10.2147/jmdh.s447368.
Abstract
Background: Artificial intelligence (AI) applications are widely researched for their potential to improve healthcare operations and disease management. However, research also shows that these applications can have significant negative implications for service delivery.
Purpose: To assess the use of ChatGPT for mental health support.
Methods: Given the novelty and unfamiliarity of the ChatGPT technology, a quasi-experimental design was chosen for this study. Outpatients from a public hospital were included in the sample. A two-week experiment was conducted in which participants used ChatGPT for mental health support, followed by semi-structured interviews with 24 individuals with mental health conditions.
Results: Eight positive factors (psychoeducation, emotional support, goal setting and motivation, referral and resource information, self-assessment and monitoring, cognitive behavioral therapy, crisis interventions, and psychotherapeutic exercises) and four negative factors (ethical and legal considerations, accuracy and reliability, limited assessment capabilities, and cultural and linguistic considerations) were associated with the use of ChatGPT for mental health support.
Conclusion: It is important to carefully consider the ethical, reliability, accuracy, and legal challenges and to develop appropriate mitigation strategies in order to ensure the safe and effective use of AI-based applications like ChatGPT in mental health support.
Affiliation(s)
- Fahad Alanezi: College of Business Administration, Department of Management Information Systems, Imam Abdulrahman Bin Faisal University, Dammam, 31441, Saudi Arabia

11. Alam S, Sohail SS. Integrating ChatGPT: Enhancing postpartum mental healthcare with artificial intelligence (AI) support. Digit Health 2024;10:20552076241295565. PMID: 39655059; PMCID: PMC11626650; DOI: 10.1177/20552076241295565.
Abstract
We are writing to extend the discourse on the innovative work published in Digital Health on the feasibility of videoconferencing-based therapy groups for postpartum depression and anxiety. The pragmatic evaluation demonstrated promising outcomes in terms of acceptability, appropriateness, and group process, suggesting that this modality is a viable alternative to traditional in-person therapy, especially in addressing the challenges faced by new mothers. Building on this study, we propose considering the integration of artificial intelligence (AI)-driven tools such as ChatGPT into such group therapy settings. ChatGPT, a large language model developed by OpenAI, has demonstrated considerable potential in generating therapeutic dialogs, offering empathetic responses, and assisting in therapeutic guidance. It can be employed as a supplementary tool to enhance the therapeutic process by providing personalized, real-time responses during or between sessions.
Affiliation(s)
- Sultan Alam: School of Computing Science and Engineering, VIT Bhopal University, Sehore, India
- Shahab Saquib Sohail: School of Computing Science and Engineering, VIT Bhopal University, Sehore, India

12. Guest PC, Vasilevska V, Al-Hamadi A, Eder J, Falkai P, Steiner J. Digital technology and mental health during the COVID-19 pandemic: a narrative review with a focus on depression, anxiety, stress, and trauma. Front Psychiatry 2023;14:1227426. PMID: 38188049; PMCID: PMC10766703; DOI: 10.3389/fpsyt.2023.1227426.
Abstract
The sudden appearance and devastating effects of the COVID-19 pandemic created the need for multiple adaptive changes in societies, business operations, and healthcare systems across the world. This review describes the development and increased use of digital technologies such as chatbots, electronic diaries, online questionnaires, and even video gameplay to maintain effective treatment standards for individuals with mental health conditions such as depression, anxiety, and post-traumatic stress disorder. We describe how these approaches have been applied to help meet the challenges of the pandemic in delivering mental healthcare solutions. The main focus of this narrative review is how these digital platforms have been used in diagnostics, patient monitoring, and as treatment options for the general public, as well as for frontline medical staff experiencing mental health issues.
Affiliation(s)
- Paul C. Guest: Department of Psychiatry, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany; Laboratory of Translational Psychiatry, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany; Laboratory of Neuroproteomics, Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
- Veronika Vasilevska: Department of Psychiatry, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany; Laboratory of Translational Psychiatry, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
- Ayoub Al-Hamadi: Department of Neuro-Information Technology, Institute for Information Technology and Communications, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Julia Eder: Department of Psychiatry and Psychotherapy, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Peter Falkai: Department of Psychiatry and Psychotherapy, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Johann Steiner: Department of Psychiatry, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany; Laboratory of Translational Psychiatry, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany; Center for Health and Medical Prevention (CHaMP), Magdeburg, Germany; German Center for Mental Health (DZPG), Center for Intervention and Research on Adaptive and Maladaptive Brain Circuits Underlying Mental Health (C-I-R-C), Halle-Jena-Magdeburg, Magdeburg, Germany; Center for Behavioral Brain Sciences (CBBS), Magdeburg, Germany

13. Azeez MA, Siddiqui ZH, Sohail SS. Correspondence to ChatGPT: A Double-Edged Sword? Health Serv Insights 2023;16:11786329231212857. PMID: 38028124; PMCID: PMC10647918; DOI: 10.1177/11786329231212857.
Affiliation(s)
- Mohammad Anas Azeez: Department of Computer Science and Engineering, SEST, Jamia Hamdard, New Delhi, India
- Zohaib Hasan Siddiqui: Department of Computer Science and Engineering, SEST, Jamia Hamdard, New Delhi, India
- Shahab Saquib Sohail: Department of Computer Science and Engineering, SEST, Jamia Hamdard, New Delhi, India

14. Talyshinskii A, Naik N, Hameed BMZ, Zhanbyrbekuly U, Khairli G, Guliev B, Juilebø-Jones P, Tzelves L, Somani BK. Expanding horizons and navigating challenges for enhanced clinical workflows: ChatGPT in urology. Front Surg 2023;10:1257191. PMID: 37744723; PMCID: PMC10512827; DOI: 10.3389/fsurg.2023.1257191.
Abstract
Purpose of review: ChatGPT has emerged as a potential tool for facilitating doctors' workflows, but few studies have examined its application in a urological context. Our objective was therefore to analyze the pros and cons of ChatGPT use and how it can be exploited and used by urologists.
Recent findings: ChatGPT can facilitate clinical documentation and note-taking, patient communication and support, medical education, and research. In urology, ChatGPT has shown potential as a virtual healthcare aide for benign prostatic hyperplasia, an educational and prevention tool for prostate cancer, educational support for urological residents, and an assistant in writing urological papers and academic work. However, several concerns about its use remain, such as the lack of web crawling, the risk of accidental plagiarism, and concerns about patient data privacy.
Summary: The existing limitations highlight the need for further improvement of ChatGPT, including ensuring the privacy of patient data, expanding the learning dataset to include medical databases, and developing guidance on its appropriate use. Urologists can also help by conducting studies to determine the effectiveness of ChatGPT in clinical scenarios and nosologies beyond those previously listed.
Affiliation(s)
- Ali Talyshinskii: Department of Urology, Astana Medical University, Astana, Kazakhstan
- Nithesh Naik: Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Gafur Khairli: Department of Urology, Astana Medical University, Astana, Kazakhstan
- Bakhman Guliev: Department of Urology, Mariinsky Hospital, St Petersburg, Russia
- Lazaros Tzelves: Department of Urology, National and Kapodistrian University of Athens, Sismanogleion Hospital, Athens, Marousi, Greece
- Bhaskar Kumar Somani: Department of Urology, University Hospital Southampton NHS Trust, Southampton, United Kingdom