1
Guo S, Li G, Du W, Situ F, Li Z, Lei J. The performance of ChatGPT and ERNIE Bot in surgical resident examinations. Int J Med Inform 2025; 200:105906. [PMID: 40220627] [DOI: 10.1016/j.ijmedinf.2025.105906]
Abstract
STUDY PURPOSE: To assess the application of two large language models (LLMs), ChatGPT-4.0 and ERNIE Bot-4.0, to surgical resident examinations and to compare the performance of these LLMs with that of human residents.
STUDY DESIGN: In this study, 596 questions with a total of 183,556 responses were first included from the Medical Vision World, an authoritative medical education platform in China. Both prompted and non-prompted Chinese questions were input into ChatGPT-4.0 and ERNIE Bot-4.0 to compare their performance on a Chinese question database. Additionally, we screened another 210 surgical questions with detailed response results from 43 residents to compare the performance of the residents and the two LLMs.
RESULTS: There were no significant differences in the correctness of either LLM's responses to the 596 questions with or without prompts (ChatGPT-4.0: 68.96% [without prompts] vs. 71.14% [with prompts], p = 0.411; ERNIE Bot-4.0: 78.36% [without prompts] vs. 78.86% [with prompts], p = 0.832), but ERNIE Bot-4.0 displayed higher correctness than ChatGPT-4.0 (with prompts: p = 0.002; without prompts: p < 0.001). On the additional 210 questions with prompts, the two LLMs, especially ERNIE Bot-4.0 (ranking in the top 95% of the 43 residents' scores), significantly outperformed the residents.
CONCLUSIONS: ERNIE Bot-4.0 outperformed both ChatGPT-4.0 and human residents on surgical resident examinations drawn from a Chinese question database.
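The between-model comparisons reported above are standard tests of two proportions. As a minimal, illustrative sketch (not the authors' code), the no-prompt comparison could be reproduced from the reported percentages as follows:

```python
# Illustrative two-proportion chi-square test comparing LLM correctness on
# the same 596-question bank. Counts are reconstructed from the reported
# percentages (68.96% vs. 78.36% without prompts); not the authors' code.
from scipy.stats import chi2_contingency

n = 596
chatgpt_correct = round(0.6896 * n)  # ~411 correct answers
ernie_correct = round(0.7836 * n)    # ~467 correct answers

table = [
    [chatgpt_correct, n - chatgpt_correct],  # ChatGPT-4.0: correct, incorrect
    [ernie_correct, n - ernie_correct],      # ERNIE Bot-4.0: correct, incorrect
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")  # consistent with the reported p < 0.001
```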
Affiliation(s)
- Siyin Guo: Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China; The Laboratory of Thyroid and Parathyroid Disease, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China.
- Genpeng Li: Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China; The Laboratory of Thyroid and Parathyroid Disease, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China.
- Wei Du: Beijing Medical Vision Times Technology Development Company Limited, Beijing, China.
- Fangzhi Situ: Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China.
- Zhihui Li: Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China; The Laboratory of Thyroid and Parathyroid Disease, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China.
- Jianyong Lei: Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China; The Laboratory of Thyroid and Parathyroid Disease, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China.
2
Biassoni F, Gnerre M. Exploring ChatGPT's communication behaviour in healthcare interactions: A psycholinguistic perspective. Patient Educ Couns 2025; 134:108663. [PMID: 39854890] [DOI: 10.1016/j.pec.2025.108663]
Abstract
OBJECTIVES: Conversational artificial agents such as ChatGPT are commonly used by people seeking healthcare information. This study investigates whether ChatGPT exhibits distinct communicative behaviors in healthcare settings based on the nature of the disorder (medical or psychological) and the user's communication style (neutral vs. expressing concern).
METHOD: Queries were conducted with ChatGPT to gather information on the diagnosis and treatment of two conditions (arthritis and anxiety) using different styles (neutral vs. expressing concern). ChatGPT's responses were analyzed using Linguistic Inquiry and Word Count (LIWC) to identify linguistic markers of the agent's adjustment to different inquiries and interaction modes. Statistical analyses, including repeated measures ANOVA and k-means cluster analysis, identified patterns in ChatGPT's responses.
RESULTS: ChatGPT used more engaging language in treatment contexts and psychological inquiries. It exhibited more analytical thinking in neutral contexts while demonstrating higher levels of empathy for psychological conditions and when the user expressed concern. Wellness-related language was more prevalent in psychological and treatment contexts, whereas illness-related language was more common in diagnostic interactions for physical conditions. Cluster analysis revealed two distinct patterns: high empathy and engagement in psychological/expressing-concern scenarios, and lower empathy and engagement in neutral/physical-disease contexts.
CONCLUSIONS: These findings suggest that ChatGPT's responses vary according to disorder type and interaction context, potentially improving its effectiveness in patient engagement.
PRACTICE IMPLICATIONS: By adapting its language to the context and to user concern, ChatGPT can enhance patient engagement.
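The clustering step described here groups responses by their psycholinguistic feature profiles. A minimal sketch of that kind of analysis (feature values are invented stand-ins for LIWC output, which is produced by commercial software; this is not the authors' code):

```python
# Illustrative k-means clustering of responses by psycholinguistic profile,
# mirroring the cluster analysis described above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# rows = responses; columns = e.g. [analytic thinking, empathy markers, wellness terms]
features = np.array([
    [85.0, 1.2, 0.8],  # neutral query about a physical condition
    [80.5, 1.5, 1.1],
    [42.0, 6.8, 4.2],  # concern-expressing query about a psychological condition
    [38.7, 7.4, 3.9],
])

scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(labels)  # separates low- vs. high-empathy/engagement response profiles
```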
Affiliation(s)
- Federica Biassoni: Department of Psychology, Catholic University of the Sacred Heart, Largo Gemelli 1, Milan, Italy; Research Center in Communication Psychology, Catholic University of the Sacred Heart, Milan 20123, Italy.
- Martina Gnerre: Department of Psychology, Catholic University of the Sacred Heart, Largo Gemelli 1, Milan, Italy.
|
3
Mansoor M, Hamide A, Tran T. Conversational AI in Pediatric Mental Health: A Narrative Review. Children (Basel) 2025; 12:359. [PMID: 40150640] [PMCID: PMC11941195] [DOI: 10.3390/children12030359]
Abstract
BACKGROUND/OBJECTIVES: Mental health disorders among children and adolescents represent a significant global health challenge, with approximately 50% of conditions emerging before age 14. Despite substantial investment in services, persistent barriers such as provider shortages, stigma, and accessibility issues continue to limit effective care delivery. This narrative review examines the emerging application of conversational artificial intelligence (AI) in pediatric mental health contexts, mapping the current evidence base, identifying therapeutic mechanisms, and exploring unique developmental considerations required for implementation.
METHODS: We searched multiple electronic databases (PubMed/MEDLINE, PsycINFO, ACM Digital Library, IEEE Xplore, and Scopus) for literature published between January 2010 and February 2025 that addressed conversational AI applications relevant to pediatric mental health. We employed a narrative synthesis approach with thematic analysis to organize findings across technological approaches, therapeutic applications, developmental considerations, implementation contexts, and ethical frameworks.
RESULTS: The review identified promising applications for conversational AI in pediatric mental health, particularly for common conditions like anxiety and depression, psychoeducation, skills practice, and bridging to traditional care. However, most robust empirical research has focused on adult populations, with pediatric applications only beginning to receive dedicated investigation. Key therapeutic mechanisms identified include reduced barriers to self-disclosure, cognitive change, emotional validation, and behavioral activation. Developmental considerations emerged as fundamental challenges, necessitating age-appropriate adaptations across cognitive, emotional, linguistic, and ethical dimensions rather than simple modifications of adult-oriented systems.
CONCLUSIONS: Conversational AI has potential to address significant unmet needs in pediatric mental health as a complement to, rather than replacement for, human-delivered care. Future research should prioritize developmental validation, longitudinal outcomes, implementation science, safety monitoring, and equity-focused design. Interdisciplinary collaboration involving children and families is essential to ensure these technologies effectively address the unique mental health needs of young people while mitigating potential risks.
Affiliation(s)
- Masab Mansoor: Edward Via College of Osteopathic Medicine—Louisiana Campus, Monroe, LA 71203, USA (affiliation shared by co-authors A.H. and T.T.).
4
Frisch-Aviram N, Spanghero Lotta G, Jordão de Carvalho L. "Chat-Up": The role of competition in street-level bureaucrats' willingness to break technological rules and use generative pre-trained transformers (GPTs). Public Adm Rev 2025; 85:468-485. [DOI: 10.1111/puar.13824]
Abstract
Organizations worldwide are concerned about workers using generative pre-trained transformers (GPTs), which can generate human-like text in seconds, at work. These organizations are setting rules on how and when to use GPTs. This article focuses on street-level bureaucrats' (SLBs') intentions to use GPTs even when their public organization does not allow their use (tech rule-breaking). Based on a mixed-methods exploratory design using focus groups (N = 14) and a survey experiment (N = 279), we demonstrate that SLBs intend to break the rules and use GPTs when their competitors from the private sector have access to artificial intelligence (AI) tools. We discuss these findings in the context of hybrid forms of public management and the Promethean moment of GPTs.
5
de Ruiter EJ, Eimermann VM, Rijcken C, Taxis K, Borgsteede SD. The extent and type of use, opportunities and concerns of ChatGPT in community pharmacy: A survey of community pharmacy staff. Explor Res Clin Soc Pharm 2025; 17:100575. [PMID: 40026321] [PMCID: PMC11872116] [DOI: 10.1016/j.rcsop.2025.100575]
Abstract
Background: Since the widespread availability of Chat Generative Pre-Trained Transformer (ChatGPT), the public has had access to readily usable artificial intelligence tools. There is limited knowledge of the use, concerns and opportunities of ChatGPT in pharmacy practice in the Netherlands.
Objectives: The aims of this study were to explore the extent and type of use of ChatGPT in community pharmacy and to identify concerns and opportunities for pharmacy practice.
Methods: A questionnaire was developed, tested and distributed to professionals who work in community pharmacy. The answers were analysed descriptively using frequency tables.
Results: Of all participants (n = 106), 50.9% had used ChatGPT, and 38.7% (n = 24) of these users had used it in pharmacy practice. Participants saw opportunities for using ChatGPT as a writing assistant or for quickly answering clinical questions. Concerns included not knowing what ChatGPT could be used for in pharmacy and not knowing what ChatGPT's answers are based on.
Conclusions: This research shows that using ChatGPT as a writing assistant is valuable and can free up time. Although answering clinical questions seems promising, ChatGPT's answers are currently too unreliable and do not meet the required quality standards for good pharmaceutical care. If ChatGPT is used to answer clinical questions, cross-referencing with reliable sources is recommended.
Affiliation(s)
- Emma Janske de Ruiter: Health Base Foundation, Department of Clinical Decision Support, Houten, Netherlands.
- Vesna Maria Eimermann: Health Base Foundation, Department of Clinical Decision Support, Houten, Netherlands.
- Katja Taxis: Groningen Research Institute of Pharmacy, Unit of Pharmacotherapy, -Epidemiology & -Economics, University of Groningen, Groningen, Netherlands.
6
Arbanas G, Periša A, Biliškov I, Sušac J, Badurina M, Arbanas D. Patients prefer human psychiatrists over chatbots: a cross-sectional study. Croat Med J 2025; 66:13-19. [PMID: 40047157] [PMCID: PMC11947973]
Abstract
AIM: To rate the level of patients' satisfaction with responses to questions regarding mental health provided by human psychiatrists, pharmacists, and chatbot platforms.
METHODS: This cross-sectional study enrolled 89 patients who were being pharmacologically treated for a mental disorder at one institution in Croatia and one in Bosnia and Herzegovina during October 2023. They asked psychiatrists, pharmacists, ChatGPT, and one Croatian chatbot questions about their mental disorder and medications and rated their satisfaction with the responses.
RESULTS: Almost half of the patients had used ChatGPT before the study, and only 12.4% had used the Croatian platform. The patients were most satisfied with the information provided by psychiatrists (4.67 out of 5 about their mental disorder and 4.51 about medications), followed by pharmacists (3.94 about medications), ChatGPT (3.66 about their mental disorder and 3.45 about medications), and the Croatian platform (3.66 about their mental disorder and 3.44 about medications). Almost half of the participants found it easier to put a question to a psychiatrist than to a chatbot, and only 10% claimed it was easier to ask ChatGPT.
CONCLUSION: Patients with mental health disorders were more satisfied with responses from their psychiatrists than from chatbots, and satisfaction with the chatbots' knowledge of mental disorders and medications was still too low to justify their use in these patients.
Affiliation(s)
- Goran Arbanas: Vrapče University Psychiatric Hospital, Bolnička cesta 32, 10000 Zagreb, Croatia. goran.arbanas@bolnica-vrapce.hr
7
Hasei J, Hanzawa M, Nagano A, Maeda N, Yoshida S, Endo M, Yokoyama N, Ochi M, Ishida H, Katayama H, Fujiwara T, Nakata E, Nakahara R, Kunisada T, Tsukahara H, Ozaki T. Empowering pediatric, adolescent, and young adult patients with cancer utilizing generative AI chatbots to reduce psychological burden and enhance treatment engagement: a pilot study. Front Digit Health 2025; 7:1543543. [PMID: 40070545] [PMCID: PMC11893593] [DOI: 10.3389/fdgth.2025.1543543]
Abstract
Background: Pediatric and adolescent/young adult (AYA) cancer patients face profound psychological challenges, exacerbated by limited access to continuous mental health support. While conventional therapeutic interventions often follow structured protocols, the potential of generative artificial intelligence (AI) chatbots to provide continuous conversational support remains unexplored. This study evaluates the feasibility and impact of AI chatbots in alleviating psychological distress and enhancing treatment engagement in this vulnerable population.
Methods: Two age-appropriate AI chatbots, leveraging GPT-4, were developed to provide natural, empathetic conversations without structured therapeutic protocols. Five pediatric and AYA cancer patients participated in a two-week intervention, engaging with the chatbots via a messaging platform. Pre- and post-intervention anxiety and stress levels were self-reported, and usage patterns were analyzed to assess the chatbots' effectiveness.
Results: Four out of five participants reported significant reductions in anxiety and stress levels post-intervention. Participants engaged with the chatbot every 2-3 days, with sessions lasting approximately 10 minutes. All participants noted improved treatment motivation, with 80% disclosing personal concerns to the chatbot that they had not shared with healthcare providers. The 24/7 availability particularly benefited patients experiencing nighttime anxiety.
Conclusions: This pilot study demonstrates the potential of generative AI chatbots to complement traditional mental health services by addressing unmet psychological needs in pediatric and AYA cancer patients. The findings suggest these tools can serve as accessible, continuous support systems. Further large-scale studies are warranted to validate these promising results.
Affiliation(s)
- Joe Hasei: Department of Medical Information and Assistive Technology Development, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.
- Mana Hanzawa: Department of Pediatrics, Okayama University Hospital, Okayama, Japan.
- Akihito Nagano: Department of Orthopedic Surgery, Gifu University Graduate School of Medicine, Gifu, Japan.
- Naoko Maeda: Department of Pediatrics, National Hospital Organization Nagoya Medical Center, Nagoya, Japan.
- Shinichirou Yoshida: Department of Orthopedic Surgery, Tohoku University Graduate School of Medicine, Sendai, Japan.
- Makoto Endo: Department of Orthopedic Surgery, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan.
- Nobuhiko Yokoyama: Department of Orthopedic Surgery, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan.
- Motoharu Ochi: Department of Pediatrics, Okayama University Hospital, Okayama, Japan.
- Hisashi Ishida: Department of Pediatrics, Okayama University Hospital, Okayama, Japan.
- Hideki Katayama: Department of Palliative and Supportive Care, Okayama University Hospital, Okayama, Japan.
- Tomohiro Fujiwara: Science of Functional Recovery and Reconstruction, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.
- Eiji Nakata: Science of Functional Recovery and Reconstruction, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.
- Ryuichi Nakahara: Science of Functional Recovery and Reconstruction, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.
- Toshiyuki Kunisada: Science of Functional Recovery and Reconstruction, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.
- Hirokazu Tsukahara: Department of Pediatrics, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.
- Toshifumi Ozaki: Science of Functional Recovery and Reconstruction, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.
8
Garcia-Rudolph A, Sánchez-Pinsach D, Gilabert A, Saurí J, Soler MD, Opisso E. Building Trust with AI: How Essential is Validating AI Models in the Therapeutic Triad of Therapist, Patient, and Artificial Third? Comment on "What is the Current and Future Status of Digital Mental Health Interventions?". Span J Psychol 2025; 28:e3. [PMID: 39988913] [DOI: 10.1017/sjp.2024.32]
Abstract
Since the publication of "What is the Current and Future Status of Digital Mental Health Interventions?", the exponential growth and widespread adoption of ChatGPT have underscored the importance of reassessing its utility in digital mental health interventions. This review critically examined the potential of ChatGPT, focusing on its application within clinical psychology settings as the technology continued evolving through 2023 and 2024. In addition, we conducted a literature review spanning US Medical Licensing Examination (USMLE) validations, assessments of the capacity to interpret human emotions, and analyses concerning the identification of depression and its determinants at treatment initiation, and we report our findings here. Our review evaluated the capabilities of GPT-3.5 and GPT-4.0 separately in clinical psychology settings, highlighting the potential of conversational AI to overcome traditional barriers such as stigma and accessibility in mental health treatment. Each model displayed different levels of proficiency, indicating a promising yet cautious pathway for integrating AI into mental health practices.
Affiliation(s)
- Alejandro Garcia-Rudolph, David Sánchez-Pinsach, Anna Gilabert, Joan Saurí, Maria Dolors Soler, Eloy Opisso: Universitat Autònoma de Barcelona, Spain; Fundació Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol, Spain.
9
Ogunwale A, Smith A, Fakorede O, Ogunlesi AO. Artificial intelligence and forensic mental health in Africa: a narrative review. Int Rev Psychiatry 2025; 37:3-13. [PMID: 40035373] [DOI: 10.1080/09540261.2024.2405174]
Abstract
This narrative review examines the integration of Artificial Intelligence (AI) tools into forensic psychiatry in Africa, highlighting possible opportunities and challenges. Specifically, AI may have the potential to augment screening in prisons, risk assessment/management, and forensic-psychiatric treatment, alongside offering benefits for training and research purposes. These use-cases may be particularly advantageous in contexts of forensic practice in Africa, where there remains a need for capacity building and service improvements in jurisdictions affected by distinctive sociolegal and socioeconomic challenges. However, AI can also entail ethical risks associated with misinformation, privacy concerns, and an overreliance on automated systems that need to be considered within implementation and policy planning. Equally, the political and regulatory backdrop surrounding AI in countries in Africa needs to be carefully scrutinised (and, where necessary, strengthened). Accordingly, this review calls for rigorous feasibility studies and the development of training programmes to ensure the effective application of AI in enhancing forensic-psychiatric services in Africa.
Affiliation(s)
- A Ogunwale: Forensic Unit, Department of Clinical Services, Neuropsychiatric Hospital, Aro, Abeokuta, Nigeria; Department of Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology and Neuroscience, King's College, London, UK.
- A Smith: Department of Forensic Psychiatry, University of Bern, Bern, Switzerland.
- O Fakorede: Department of Mental Health & Behavioural Medicine, Federal Medical Centre, Abeokuta, Nigeria.
- A O Ogunlesi: Retired forensic psychiatrist and former Provost/Medical Director, Neuropsychiatric Hospital, Abeokuta, Nigeria.
10
Krysta K, Cullivan R, Brittlebank A, Dragasek J, Hermans M, Strkalj Ivezics S, van Veelen N, Casanova Dias M. Artificial Intelligence in Healthcare and Psychiatry. Acad Psychiatry 2025; 49:10-12. [PMID: 39313674] [DOI: 10.1007/s40596-024-02036-z]
Affiliation(s)
- Krzysztof Krysta: Faculty of Medical Sciences in Katowice, Medical University of Silesia in Katowice, Katowice, Poland.
- Rachael Cullivan: Cavan/Monaghan Mental Health Services Ireland, Monaghan, Ireland.
- Andrew Brittlebank: Cumbria, Northumberland, Tyne and Wear NHS Foundation Trust, Cumbria, UK.
- Jozef Dragasek: Faculty of Medicine, University Hospital of Louis Pasteur and Pavol Jozef Safarik University, Trieda, Kosice, Slovak Republic.
- Marc Hermans: European Union of Medical Specialists, Brussels, Belgium.
- Nicoletta van Veelen: Brain Center, Psychiatry, Diagnostic and Early Psychosis, Universitair Medisch Centrum Utrecht, Utrecht, the Netherlands.
11
Li L, Kong S, Zhao H, Li C, Teng Y, Wang Y. Chain of Risks Evaluation (CORE): A framework for safer large language models in public mental health. Psychiatry Clin Neurosci 2025. [PMID: 39853828] [DOI: 10.1111/pcn.13781]
Abstract
Large language models (LLMs) have gained significant attention for their capabilities in natural language understanding and generation. However, their widespread adoption potentially raises public mental health concerns, including issues related to inequity, stigma, dependence, medical risks, and security threats. This review offers a perspective within the actor-network framework, exploring the technical architectures, linguistic dynamics, and psychological effects underlying human-LLM interactions. Based on this theoretical foundation, we propose four categories of risks, presenting increasing challenges in identification and mitigation: universal, context-specific, user-specific, and user-context-specific risks. Correspondingly, we introduce CORE: Chain of Risks Evaluation, a structured conceptual framework for assessing and mitigating the risks associated with LLMs in public mental health contexts. Our approach suggests viewing the development of responsible LLMs as a continuum from technical to public efforts. We summarize technical approaches and potential contributions from mental health practitioners that could help evaluate and regulate risks in human-LLM interactions. We propose that mental health practitioners could play a crucial role in this emerging field by collaborating with LLM developers, conducting empirical studies to better understand the psychological impacts of human-LLM interactions, developing guidelines for LLM use in mental health contexts, and engaging in public education.
Affiliation(s)
- Lingyu Li: Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China; Shanghai Mental Health Center, Shanghai, China.
- Shuqi Kong: Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China; Shanghai Mental Health Center, Shanghai, China.
- Haiquan Zhao: Shanghai Artificial Intelligence Laboratory, Shanghai, China.
- Chunbo Li: Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai Mental Health Center, Shanghai, China.
- Yan Teng: Shanghai Artificial Intelligence Laboratory, Shanghai, China.
- Yingchun Wang: Shanghai Artificial Intelligence Laboratory, Shanghai, China.
12
Omar M, Nassar S, Sharif K, Glicksberg BS, Nadkarni GN, Klang E. Emerging applications of NLP and large language models in gastroenterology and hepatology: a systematic review. Front Med (Lausanne) 2025; 11:1512824. [PMID: 39917263] [PMCID: PMC11799763] [DOI: 10.3389/fmed.2024.1512824]
Abstract
Background and aim: In recent years, natural language processing (NLP) has been transformed by the introduction of large language models (LLMs). This review provides an update on NLP and LLM applications and challenges in gastroenterology and hepatology.
Methods: Registered with PROSPERO (CRD42024542275) and adhering to PRISMA guidelines, we searched six databases for relevant studies published from 2003 to 2024, ultimately including 57 studies.
Results: Our review of 57 studies notes an increase in relevant publications in 2023-2024 compared with previous years, reflecting growing interest in newer models such as GPT-3 and GPT-4. The results demonstrate that NLP models have enhanced data extraction from electronic health records and other unstructured medical data sources. Key findings include high precision in identifying disease characteristics from unstructured reports and ongoing improvement in clinical decision-making. Risk-of-bias assessments using the ROBINS-I, QUADAS-2, and PROBAST tools confirmed the methodological robustness of the included studies.
Conclusion: NLP and LLMs can enhance diagnosis and treatment in gastroenterology and hepatology. They enable extraction of data from unstructured medical records, such as endoscopy reports and patient notes, and can enhance clinical decision-making. Despite these advancements, integrating these tools into routine practice is still challenging. Future work should prospectively demonstrate real-world value.
Affiliation(s)
- Mahmud Omar: Maccabi Health Services, Tel Aviv, Israel; Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States.
- Kassem Sharif: Department of Gastroenterology, Sheba Medical Center, Tel HaShomer, Israel.
- Benjamin S. Glicksberg: Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States.
- Girish N. Nadkarni: Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States.
- Eyal Klang: Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States.
13
Hara K, Tachibana R, Kumashiro R, Ichihara K, Uemura T, Maeda H, Yamaguchi M, Inoue T. Emotional analysis of operating room nurses in acute care hospitals in Japan: insights using ChatGPT. BMC Nurs 2025; 24:30. [PMID: 39789556] [PMCID: PMC11716517] [DOI: 10.1186/s12912-024-02655-9]
Abstract
AIM: This study aimed to explore the emotions of operating room nurses in Japan towards perioperative nursing using generative AI and human analysis, and to identify factors contributing to burnout and turnover.
METHODS: A single-center cross-sectional study was conducted from February 2023 to February 2024, involving semi-structured interviews with 10 operating room nurses from a national hospital in Japan. Interview transcripts were analyzed by generative AI (ChatGPT-4o) and by human researchers for thematic, emotional, and subjectivity analysis. A comparison between AI and human analysis was performed, and data visualization techniques, including keyword co-occurrence networks and cluster analysis, were employed to identify patterns and relationships.
RESULTS: Key themes such as patient care, surgical safety, and nursing skills were identified through thematic analysis. Emotional analysis revealed a range of tones, with AI providing an efficient overview and human researchers capturing nuanced emotional insights. High subjectivity scores indicated deeply personal reflections. Keyword co-occurrence networks and cluster analysis highlighted connections between themes and distinct emotional experiences.
CONCLUSIONS: Combining generative AI with human expertise offered nuanced insights into the emotions of operating room nurses. The findings emphasize the importance of emotional support, effective communication, and safety protocols in improving nurse well-being and job satisfaction. This hybrid approach can help address emotional challenges, reduce burnout, and enhance retention rates. Future research with larger and more diverse samples is needed to validate these findings and explore the broader applications of AI in healthcare.
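The keyword co-occurrence network mentioned in the methods can be built by counting how often theme keywords appear in the same transcript. A small illustrative sketch (keywords are invented examples, not study data):

```python
# Illustrative keyword co-occurrence network of interview themes, of the
# kind described in the methods above.
import itertools
import networkx as nx

transcripts = [
    ["patient care", "surgical safety", "communication"],
    ["surgical safety", "nursing skills", "communication"],
    ["patient care", "nursing skills"],
]

G = nx.Graph()
for keywords in transcripts:
    # every pair of keywords appearing in the same transcript gets an edge
    for a, b in itertools.combinations(sorted(set(keywords)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

for a, b, data in G.edges(data=True):
    print(f"{a} -- {b}: co-occurs {data['weight']} time(s)")
```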
Affiliation(s)
- Kentaro Hara: Department of Operation Center and Department of Nursing, Chiba University Hospital; National Hospital Organization Nagasaki Medical Center; Nagasaki University Graduate School of Biomedical Sciences, Kubara 2-1001-1, Omura, Nagasaki 856-8562, Japan.
- Reika Tachibana: Department of Operation Center, National Hospital Organization Nagasaki Medical Center, Kubara 2-1001-1, Omura, Nagasaki 856-8562, Japan.
- Ryosuke Kumashiro: Department of Operation Center, National Hospital Organization Nagasaki Medical Center, Kubara 2-1001-1, Omura, Nagasaki 856-8562, Japan.
- Kodai Ichihara: Department of Operation Center, National Hospital Organization Nagasaki Medical Center, Kubara 2-1001-1, Omura, Nagasaki 856-8562, Japan.
- Takahiro Uemura: Department of Operation Center, National Hospital Organization Nagasaki Medical Center, Kubara 2-1001-1, Omura, Nagasaki 856-8562, Japan.
- Hiroshi Maeda: Department of Operation Center, Juntendo University School of Medicine Juntendo Hospital, Hongo, Bunkyo-ku, Tokyo 113-8431, Japan.
- Michiko Yamaguchi: Department of Anesthesiology, National Hospital Organization Nagasaki Medical Center, Kubara 2-1001-1, Omura, Nagasaki 856-8562, Japan.
- Takahiro Inoue: Department of Healthcare Management Research Center, Chiba University Hospital, 1-8-1 Inohana, Chuo-ku, Chiba 260-8677, Japan.
14
Cheng HY. ChatGPT's Attitude, Knowledge, and Clinical Application in Geriatrics Practice and Education: Exploratory Observational Study. JMIR Form Res 2025; 9:e63494. [PMID: 39752214] [PMCID: PMC11742095] [DOI: 10.2196/63494]
Abstract
BACKGROUND: The increasing use of ChatGPT in clinical practice and medical education necessitates the evaluation of its reliability, particularly in geriatrics.
OBJECTIVE: This study aimed to evaluate ChatGPT's trustworthiness in geriatrics through 3 distinct approaches: evaluating its geriatrics attitude, knowledge, and clinical application with 2 vignettes of geriatric syndromes (polypharmacy and falls).
METHODS: We used the validated University of California, Los Angeles (UCLA) geriatrics attitude and knowledge instruments to evaluate ChatGPT's geriatrics attitude and knowledge and compared its performance with that of medical students, residents, and geriatrics fellows from results reported in the literature. We also evaluated ChatGPT's application to 2 vignettes of geriatric syndromes (polypharmacy and falls).
RESULTS: ChatGPT's mean total score on geriatrics attitude was significantly lower than that of trainees (medical students, internal medicine residents, and geriatric medicine fellows; 2.7 vs 3.7 on a scale from 1 to 5, where 1 = strongly disagree and 5 = strongly agree). Its mean subscore on positive geriatrics attitude was higher than that of the trainees (medical students, internal medicine residents, and neurologists; 4.1 vs 3.7 on a scale from 1 to 5, where a higher score means a more positive attitude toward older adults), and its mean subscore on negative geriatrics attitude was lower than that of the trainees and neurologists (1.8 vs 2.8 on a scale from 1 to 5, where a lower subscore means a less negative attitude toward aging). On the UCLA geriatrics knowledge test, ChatGPT outperformed all medical students, internal medicine residents, and geriatric medicine fellows from validated studies (14.7 vs 11.3, with a score range of -18 to +18, where +18 means all questions were answered correctly). In the polypharmacy vignette, ChatGPT not only demonstrated solid knowledge of potentially inappropriate medications but also accurately identified 7 common potentially inappropriate medications and 5 drug-drug and 3 drug-disease interactions; however, it missed 5 drug-disease and 1 drug-drug interaction and produced 2 hallucinations. In the fall vignette, ChatGPT answered 3 of 5 pretests correctly and 2 of 5 partially correctly, identified 6 categories of fall risk, followed fall guidelines correctly, listed 6 key physical examinations, and recommended 6 categories of fall prevention methods.
CONCLUSIONS: This study suggests that ChatGPT can be a valuable supplemental tool in geriatrics, offering reliable information with less age bias, robust geriatrics knowledge, and comprehensive recommendations for managing 2 common geriatric syndromes (polypharmacy and falls) that are consistent with evidence from guidelines, systematic reviews, and other types of studies. ChatGPT's potential as an educational and clinical resource could significantly benefit trainees, health care providers, and laypeople. Further research using GPT-4o, larger geriatrics question sets, and more geriatric syndromes is needed to expand and confirm these findings before adopting ChatGPT widely for geriatrics education and practice.
Affiliation(s)
- Huai Yong Cheng: Minneapolis VA Health Care System, Minneapolis, MN, United States.
15
Liu I, Liu F, Xiao Y, Huang Y, Wu S, Ni S. Investigating the Key Success Factors of Chatbot-Based Positive Psychology Intervention with Retrieval- and Generative Pre-Trained Transformer (GPT)-Based Chatbots. Int J Hum Comput Interact 2025; 41:341-352. [DOI: 10.1080/10447318.2023.2300015]
Affiliation(s)
- Ivan Liu: Department of Psychology, Faculty of Arts and Science, Beijing Normal University at Zhuhai; Faculty of Psychology, Beijing Normal University.
- Fangyuan Liu: Department of Psychology, Faculty of Arts and Science, Beijing Normal University at Zhuhai.
- Yuting Xiao: Faculty of Psychology, Beijing Normal University.
- Yajia Huang: Faculty of Psychology, Beijing Normal University.
- Shuming Wu: Faculty of Psychology, Beijing Normal University.
- Shiguang Ni: Shenzhen International Graduate School, Tsinghua University.
16
Liu Y, Kauttonen J, Zhao B, Li X, Peng W. Editorial: Towards Emotion AI to next generation healthcare and education. Front Psychol 2024; 15:1533053. [PMID: 39749281] [PMCID: PMC11694222] [DOI: 10.3389/fpsyg.2024.1533053]
Affiliation(s)
- Yang Liu: Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland.
- Janne Kauttonen: RDI & Competences, Haaga-Helia University of Applied Sciences, Helsinki, Finland.
- Bowen Zhao: Guangzhou Institute of Technology, Xidian University, Guangzhou, China.
- Xiaobai Li: School of Cyber Science and Technology, Zhejiang University, Hangzhou, China.
- Wei Peng: Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, United States.
17
Oliveira ACD, Bessa RF, Teles AS. Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study. Cad Saude Publica 2024; 40:e00028824. [PMID: 39607132] [DOI: 10.1590/0102-311xen028824]
Abstract
Artificial intelligence can detect manifestations of suicidal ideation in texts. Studies demonstrate that BERT-based models achieve better performance on text classification problems. Large language models (LLMs) answer free-text queries without being specifically trained. This work compares the performance of three variations of BERT models and three LLMs (Google Bard, Microsoft Bing/GPT-4, and OpenAI ChatGPT-3.5) for identifying suicidal ideation in nonclinical texts written in Brazilian Portuguese. A dataset labeled by psychologists consisted of 2,691 sentences without suicidal ideation and 1,097 with suicidal ideation, of which 100 sentences were selected for testing. We applied data preprocessing techniques, hyperparameter optimization, and hold-out validation for training and testing the BERT models. When evaluating the LLMs, we used zero-shot prompt engineering: each test sentence was labeled as containing suicidal ideation or not according to the chatbot's response. Bing/GPT-4 achieved the best performance, with 98% across all metrics. Fine-tuned BERT models outperformed the other LLMs: BERTimbau-Large performed best with 96% accuracy, followed by BERTimbau-Base with 94% and BERT-Multilingual with 87%. Bard performed worst, with 62% accuracy, whereas ChatGPT-3.5 achieved 81%. The high recall of the models suggests a low rate of misclassifying at-risk patients, which is crucial to prevent missed interventions by professionals. However, despite their potential in supporting suicidal ideation detection, these models have not been validated in a clinical patient-monitoring setting. Therefore, caution is advised when using the evaluated models as tools to assist healthcare professionals in detecting suicidal ideation.
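For context, fine-tuning a BERTimbau-style classifier follows the standard Hugging Face pattern. A minimal sketch under stated assumptions (the model ID is the public BERTimbau-Base checkpoint; texts, labels, and training settings are placeholders, not the study's data or hyperparameters):

```python
# Illustrative fine-tuning of a BERTimbau-style binary classifier for
# suicidal-ideation detection; not the authors' code.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "neuralmind/bert-base-portuguese-cased"  # BERTimbau-Base
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["frase de exemplo sem ideação", "frase de exemplo com ideação"]  # placeholders
labels = [0, 1]  # 0 = no suicidal ideation, 1 = suicidal ideation
enc = tokenizer(texts, truncation=True, padding=True)

class TextDataset(torch.utils.data.Dataset):
    """Wraps tokenized inputs and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

args = TrainingArguments(output_dir="bert_si", num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=TextDataset(enc, labels)).train()
```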
Affiliation(s)
- Adonias Caetano de Oliveira: Instituto Federal de Educação, Ciência e Tecnologia do Ceará, Fortaleza, Brazil; Universidade Federal do Delta do Parnaíba, Parnaíba, Brazil.
- Ariel Soares Teles: Universidade Federal do Delta do Parnaíba, Parnaíba, Brazil; Instituto Federal do Maranhão, São Luís, Brazil.
18
Chang Y, Su CY, Liu YC. Assessing the Performance of Chatbots on the Taiwan Psychiatry Licensing Examination Using the Rasch Model. Healthcare (Basel) 2024; 12:2305. [PMID: 39595502] [PMCID: PMC11594248] [DOI: 10.3390/healthcare12222305]
Abstract
BACKGROUND/OBJECTIVES: The potential and limitations of chatbots in medical education and clinical decision support, particularly in specialized fields like psychiatry, remain unknown. Using the Rasch model, our study aimed to evaluate the performance of various state-of-the-art chatbots on psychiatry licensing exam questions to explore their strengths and weaknesses.
METHODS: We assessed the performance of 22 leading chatbots, selected based on LMArena benchmark rankings, using 100 multiple-choice questions from the 2024 Taiwan psychiatry licensing examination, a nationally standardized test required for psychiatric licensure in Taiwan. Chatbot responses were scored for correctness, and we used the Rasch model to evaluate chatbot ability.
RESULTS: Chatbots released after February 2024 passed the exam, with ChatGPT-o1-preview achieving the highest score of 85. ChatGPT-o1-preview showed a statistically significant superiority in ability (p < 0.001), with a 1.92-logit improvement over the passing threshold. It demonstrated strengths in complex psychiatric problems and ethical understanding, yet it presented limitations in up-to-date legal knowledge and specialized psychiatry topics, such as recent amendments to the Mental Health Act, psychopharmacology, and advanced neuroimaging.
CONCLUSIONS: Chatbot technology could be a valuable tool for medical education and clinical decision support in psychiatry, and as technology continues to advance, these models are likely to play an increasingly integral role in psychiatric practice.
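Under the dichotomous Rasch model used here, the probability that a chatbot with ability theta answers an item of difficulty b correctly is P(correct) = 1 / (1 + exp(-(theta - b))). A minimal sketch of a maximum-likelihood ability estimate given already-calibrated item difficulties (the numbers below are hypothetical, not the paper's calibration):

```python
# Illustrative maximum-likelihood ability estimate under the dichotomous
# Rasch model. Item difficulties are hypothetical; the study calibrated
# them from the licensing-exam response data.
import numpy as np

def rasch_ability(responses, difficulties, iters=50):
    """Newton-Raphson MLE of ability (in logits) from 0/1 responses."""
    theta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))  # P(correct) per item
        grad = np.sum(responses - p)    # d log-likelihood / d theta
        hess = -np.sum(p * (1.0 - p))   # second derivative (always negative)
        theta -= grad / hess
    return theta

difficulties = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])  # hypothetical item logits
responses = np.array([1, 1, 1, 1, 0, 1])                   # one chatbot's answers
print(f"estimated ability: {rasch_ability(responses, difficulties):.2f} logits")
```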
Affiliation(s)
- Yu Chang: Department of Psychiatry, Changhua Christian Hospital, Changhua 500, Taiwan.
- Chu-Yun Su: Taichung Municipal Taichung Special Education School for The Hearing Impaired, Taichung 407, Taiwan.
- Yi-Chun Liu: Department of Psychiatry, Changhua Christian Hospital, Changhua 500, Taiwan; Department of Psychiatry, Changhua Christian Children’s Hospital, Changhua 500, Taiwan.
19
Owen D, Lynham AJ, Smart SE, Pardiñas AF, Camacho Collados J. AI for Analyzing Mental Health Disorders Among Social Media Users: Quarter-Century Narrative Review of Progress and Challenges. J Med Internet Res 2024; 26:e59225. [PMID: 39546783] [DOI: 10.2196/59225]
Abstract
BACKGROUND: Mental health disorders are currently the main contributor to poor quality of life and years lived with disability. Symptoms common to many mental health disorders lead to impairments or changes in the use of language, which are observable in the routine use of social media. Detection of these linguistic cues has been explored throughout the last quarter century, but interest and methodological development have burgeoned following the COVID-19 pandemic. The next decade may see the development of reliable methods for predicting mental health status using social media data. This might have implications for clinical practice and public health policy, particularly in the context of early intervention in mental health care.
OBJECTIVE: This study aims to examine the state of the art in methods for predicting the mental health statuses of social media users. Our focus is the development of artificial intelligence-driven methods, particularly natural language processing, for analyzing large volumes of written text. This study details constraints affecting research in this area. These include the dearth of high-quality public datasets for methodological benchmarking and the need to adopt ethical and privacy frameworks acknowledging the stigma experienced by those with a mental illness.
METHODS: A Google Scholar search yielded peer-reviewed articles dated between 1999 and 2024. We manually grouped the articles by 4 primary areas of interest: datasets on social media and mental health, methods for predicting mental health status, longitudinal analyses of mental health, and ethical aspects of the data and analysis of mental health. Selected articles from these groups formed our narrative review.
RESULTS: Larger datasets with precise dates of participants' diagnoses are needed to support the development of methods for predicting mental health status, particularly in severe disorders such as schizophrenia. Inviting users to donate their social media data for research purposes could help overcome widespread ethical and privacy concerns. In any event, multimodal methods for predicting mental health status appear likely to provide advancements that may not be achievable using natural language processing alone.
CONCLUSIONS: Multimodal methods for predicting mental health status from voice, image, and video-based social media data need to be further developed before they may be considered for adoption in health care, medical support, or as consumer-facing products. Such methods are likely to garner greater public confidence in their efficacy than those that rely on text alone. To achieve this, more high-quality social media datasets need to be made available and privacy concerns regarding the use of these data must be formally addressed. A social media platform feature that invites users to share their data upon publication is a possible solution. Finally, a review of literature studying the effects of social media use on a user's depression and anxiety is merited.
Affiliation(s)
- David Owen: School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom.
- Amy J Lynham: Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom.
- Sophie E Smart: Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom.
- Antonio F Pardiñas: Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom.
- Jose Camacho Collados: School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom.
20
So JH, Chang J, Kim E, Na J, Choi J, Sohn JY, Kim BH, Chu SH. Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study. JMIR Form Res 2024; 8:e58418. [PMID: 39447159] [PMCID: PMC11544339] [DOI: 10.2196/58418]
Abstract
BACKGROUND: Recent advancements in large language models (LLMs) have accelerated their use across various domains. Psychiatric interviews, which are goal-oriented and structured, represent a significantly underexplored area where LLMs can provide substantial value. In this study, we explore the application of LLMs to enhance psychiatric interviews by analyzing counseling data from North Korean defectors who have experienced traumatic events and mental health issues.
OBJECTIVE: This study aims to investigate whether LLMs can (1) delineate parts of the conversation that suggest psychiatric symptoms and identify those symptoms, and (2) summarize stressors and symptoms based on the interview dialogue transcript.
METHODS: Given the interview transcripts, we align the LLMs to perform 3 tasks: (1) extracting stressors from the transcripts, (2) delineating symptoms and their indicative sections, and (3) summarizing the patients based on the extracted stressors and symptoms. These 3 tasks address the 2 objectives, where delineating symptoms is based on the output from the second task, and generating the summary of the interview incorporates the outputs from all 3 tasks. In this context, the transcript data were labeled by mental health experts for the training and evaluation of the LLMs.
RESULTS: First, we present the performance of LLMs in estimating (1) the transcript sections related to psychiatric symptoms and (2) the names of the corresponding symptoms. In the zero-shot inference setting using the GPT-4 Turbo model, 73 out of 102 transcript segments demonstrated a recall mid-token distance d<20 for estimating the sections associated with the symptoms. For evaluating the names of the corresponding symptoms, the fine-tuning method demonstrates a performance advantage over the zero-shot inference setting of the GPT-4 Turbo model. On average, the fine-tuning method achieves an accuracy of 0.82, a precision of 0.83, a recall of 0.82, and an F1-score of 0.82. Second, the transcripts are used to generate summaries for each interviewee using LLMs. This generative task was evaluated using metrics such as Generative Evaluation (G-Eval) and Bidirectional Encoder Representations from Transformers Score (BERTScore). The summaries generated by the GPT-4 Turbo model, utilizing both symptom and stressor information, achieve high average G-Eval scores: coherence of 4.66, consistency of 4.73, fluency of 2.16, and relevance of 4.67. Furthermore, it is noted that the use of retrieval-augmented generation did not lead to a significant improvement in performance.
CONCLUSIONS: LLMs, using either (1) appropriate prompting techniques or (2) fine-tuning methods with data labeled by mental health experts, achieved an accuracy of over 0.8 for the symptom delineation task when measured across all segments in the transcript. Additionally, they attained a G-Eval score of over 4.6 for coherence in the summarization task. This research contributes to the emerging field of applying LLMs in psychiatric interviews and demonstrates their potential effectiveness in assisting mental health practitioners.
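One of the summary-quality metrics mentioned, BERTScore, compares candidate and reference texts via contextual embeddings. A minimal sketch using the public bert-score package (texts are placeholders, not study data):

```python
# Illustrative BERTScore comparison of a model-generated interview summary
# against an expert-written reference. Requires `pip install bert-score`.
from bert_score import score

candidates = ["The patient reports insomnia and intrusive memories of the event."]
references = ["The interviewee describes trauma-related intrusive memories and sleep loss."]

# returns per-pair precision, recall, and F1 tensors
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```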
Affiliation(s)
- Jae-Hee So: Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea.
- Joonhwan Chang: Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea.
- Eunji Kim: Department of Psychiatry, Yonsei University College of Medicine, Seoul, Republic of Korea; Institute of Behavioral Sciences in Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea.
- Junho Na: Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea.
- JiYeon Choi: Department of Nursing, Mo-Im Kim Nursing Research Institute, Yonsei University College of Nursing, Seoul, Republic of Korea; Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea.
- Jy-Yong Sohn: Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea.
- Byung-Hoon Kim: Department of Psychiatry, Yonsei University College of Medicine, Seoul, Republic of Korea; Institute of Behavioral Sciences in Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea; Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea; Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea.
- Sang Hui Chu: Department of Nursing, Mo-Im Kim Nursing Research Institute, Yonsei University College of Nursing, Seoul, Republic of Korea; Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea.
21
Gargari OK, Fatehi F, Mohammadi I, Firouzabadi SR, Shafiee A, Habibi G. Diagnostic accuracy of large language models in psychiatry. Asian J Psychiatr 2024; 100:104168. [PMID: 39111087] [DOI: 10.1016/j.ajp.2024.104168]
Abstract
INTRODUCTION Medical decision-making is crucial for effective treatment, especially in psychiatry where diagnosis often relies on subjective patient reports and a lack of high-specificity symptoms. Artificial intelligence (AI), particularly Large Language Models (LLMs) like GPT, has emerged as a promising tool to enhance diagnostic accuracy in psychiatry. This comparative study explores the diagnostic capabilities of several AI models, including Aya, GPT-3.5, GPT-4, GPT-3.5 clinical assistant (CA), Nemotron, and Nemotron CA, using clinical cases from the DSM-5. METHODS We curated 20 clinical cases from the DSM-5 Clinical Cases book, covering a wide range of psychiatric diagnoses. Four advanced AI models (GPT-3.5 Turbo, GPT-4, Aya, Nemotron) were tested using prompts to elicit detailed diagnoses and reasoning. The models' performances were evaluated based on accuracy and quality of reasoning, with additional analysis using the Retrieval Augmented Generation (RAG) methodology for models accessing the DSM-5 text. RESULTS The AI models showed varied diagnostic accuracy, with GPT-3.5 and GPT-4 performing notably better than Aya and Nemotron in terms of both accuracy and reasoning quality. While models struggled with specific disorders such as cyclothymic and disruptive mood dysregulation disorders, others excelled, particularly in diagnosing psychotic and bipolar disorders. Statistical analysis highlighted significant differences in accuracy and reasoning, emphasizing the superiority of the GPT models. DISCUSSION The application of AI in psychiatry offers potential improvements in diagnostic accuracy. The superior performance of the GPT models can be attributed to their advanced natural language processing capabilities and extensive training on diverse text data, enabling more effective interpretation of psychiatric language. However, models like Aya and Nemotron showed limitations in reasoning, indicating a need for further refinement in their training and application. CONCLUSION AI holds significant promise for enhancing psychiatric diagnostics, with certain models demonstrating high potential in interpreting complex clinical descriptions accurately. Future research should focus on expanding the dataset and integrating multimodal data to further enhance the diagnostic capabilities of AI in psychiatry.
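As a concrete illustration of the RAG setup mentioned in the methods, the sketch below retrieves the most similar criteria passages by embedding similarity and prepends them to a diagnostic prompt. The chunking, embedding model, and prompt wording are assumptions for illustration, not the study's actual pipeline.

```python
# Sketch of a retrieval-augmented generation (RAG) loop of the kind the study
# describes for models with access to the DSM-5 text.
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# dsm5_chunks would be passages split from the diagnostic-criteria text.
dsm5_chunks = ["Criteria for cyclothymic disorder ...",
               "Criteria for bipolar I disorder ..."]
chunk_vecs = embed(dsm5_chunks)

def diagnose(case_text, k=2):
    # Retrieve the k most similar criteria passages by cosine similarity.
    q = embed([case_text])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(dsm5_chunks[i] for i in np.argsort(sims)[-k:])
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system",
                   "content": "You are a psychiatrist. Use the provided criteria."},
                  {"role": "user",
                   "content": f"Criteria:\n{context}\n\nCase:\n{case_text}\n\n"
                              "Give the most likely diagnosis and your reasoning."}],
    )
    return resp.choices[0].message.content

# Example (requires API access):
# print(diagnose("A 29-year-old with alternating hypomanic and depressive periods ..."))
```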
Collapse
Affiliation(s)
- Omid Kohandel Gargari
- Farzan Artificial Intelligence Team, Farzan Clinical Research Institute, Tehran, Islamic Republic of Iran
| | - Farhad Fatehi
- Centre for Health Services Research, Faculty of Medicine, The University of Queensland, Brisbane, Australia; School of Psychological Sciences, Monash University, Melbourne, Australia
| | - Ida Mohammadi
- Farzan Artificial Intelligence Team, Farzan Clinical Research Institute, Tehran, Islamic Republic of Iran
| | - Shahryar Rajai Firouzabadi
- Farzan Artificial Intelligence Team, Farzan Clinical Research Institute, Tehran, Islamic Republic of Iran
| | - Arman Shafiee
- Farzan Artificial Intelligence Team, Farzan Clinical Research Institute, Tehran, Islamic Republic of Iran
| | - Gholamreza Habibi
- Farzan Artificial Intelligence Team, Farzan Clinical Research Institute, Tehran, Islamic Republic of Iran.
| |
Collapse
|
22
|
Salah M, Abdelfattah F, Al Halbusi H. The good, the bad, and the GPT: Reviewing the impact of generative artificial intelligence on psychology. Curr Opin Psychol 2024; 59:101872. [PMID: 39197407 DOI: 10.1016/j.copsyc.2024.101872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2024] [Revised: 07/10/2024] [Accepted: 08/12/2024] [Indexed: 09/01/2024]
Abstract
This review explores the impact of Generative Artificial Intelligence (GenAI)-a technology capable of autonomously creating new content, ideas, or solutions by learning from extensive data-on psychology. GenAI is changing research methodologies, diagnostics, and treatments by enhancing diagnostic accuracy, personalizing therapeutic interventions, and providing deeper insights into cognitive processes. However, these advancements come with significant ethical concerns, including privacy, bias, and the risk of depersonalization in therapy. By focusing on the current capabilities of GenAI, this study aims to provide a balanced understanding and guide the ethical integration of AI into psychological practices and research. We argue that while GenAI presents profound opportunities, its integration must be approached cautiously using robust ethical frameworks.
Collapse
Affiliation(s)
- Mohammed Salah
- Management Department, College of Business Administration (COBA), A'Sharqiyah University (ASU), Ibra, Oman; Modern College of Business and Science (MCBS), Muscat, Oman.
| | | | | |
Collapse
|
23
|
Grosshans M, Paul T, Fischer SKM, Lotzmann N, List H, Haag C, Mutschler J. Conversation-based AI for anxiety disorders might lower the threshold for traditional medical assistance: a case report. Front Public Health 2024; 12:1399702. [PMID: 39371214 PMCID: PMC11449728 DOI: 10.3389/fpubh.2024.1399702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2024] [Accepted: 09/11/2024] [Indexed: 10/08/2024] Open
Abstract
Artificial intelligence (AI) offers a wealth of opportunities for medicine, provided we also bear in mind the risks associated with this technology. In recent years, the potential future integration of AI with medicine has been the subject of much debate, although practical clinical experience of relevant cases is still largely absent. This case study examines one patient's experience with different forms of care. Initially, the patient communicated with a conversation (chat)-based AI (CAI) for self-treatment. Over time, however, she found herself increasingly drawn to a low-threshold internal company support system grounded in an existing, more traditional human-based care structure. This pattern of treatment may represent a useful addition to existing care structures, particularly for patients receptive to technology.
Collapse
Affiliation(s)
- Martin Grosshans
- Department of Global Health, Safety and Well-being, SAP SE, Walldorf, Germany
| | - Torsten Paul
- Department of Global Health, Safety and Well-being, SAP SE, Walldorf, Germany
| | - Sebastian Karl Maximilian Fischer
- Psychiatric Services Lucerne, Lucerne, Switzerland
- Institute of General Practice and Family Medicine, University Hospital of the Ludwig-Maximilians University of Munich, Munich, Germany
| | - Natalie Lotzmann
- Department of Global Health, Safety and Well-being, SAP SE, Walldorf, Germany
| | - Hannah List
- Department of Global Health, Safety and Well-being, SAP SE, Walldorf, Germany
| | - Christina Haag
- Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
- Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
| | | |
Collapse
|
24
|
Shin D, Kim H, Lee S, Cho Y, Jung W. Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study. J Med Internet Res 2024; 26:e54617. [PMID: 39292502 PMCID: PMC11447422 DOI: 10.2196/54617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2023] [Revised: 05/17/2024] [Accepted: 08/11/2024] [Indexed: 09/19/2024] Open
Abstract
BACKGROUND Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection of and intervention for clinically significant depression have gained attention; however, existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Therefore, researchers are identifying objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advancements in large language models (LLMs) such as ChatGPT. OBJECTIVE We aimed to detect depression from user-generated diary text collected through an emotional diary writing app, using an LLM, and to validate the value of semistructured diary text as an EMA data source. METHODS Participants were assessed for depression using the Patient Health Questionnaire, and suicide risk was evaluated using the Beck Scale for Suicide Ideation, before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The model comparison used chain-of-thought and zero-shot prompting to analyze the text structure and content. RESULTS We used 428 diaries from 91 participants; fine-tuned GPT-3.5 demonstrated superior performance in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, balanced accuracy was highest (0.844) for GPT-3.5 without fine-tuning or prompting techniques, with a recall of 0.929. CONCLUSIONS Both GPT-3.5 and GPT-4 demonstrated reasonable performance in recognizing depression risk from diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should increasingly emphasize qualitative digital expression.
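For reference, the metrics reported above (accuracy, specificity, balanced accuracy, recall) can be computed from binary labels as in this sketch; the labels below are hypothetical stand-ins for the PHQ-derived ground truth and the LLM's classifications, not the study's data.

```python
# Sketch of the evaluation metrics reported above, computed on hypothetical
# binary depression labels (1 = depression risk per PHQ cutoff).
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             recall_score, confusion_matrix)

y_true = [1, 0, 0, 1, 0, 1, 0, 0]   # PHQ-derived ground truth (hypothetical)
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]   # model classifications (hypothetical)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:", accuracy_score(y_true, y_pred))
print("specificity:", tn / (tn + fp))                    # true-negative rate
print("recall:", recall_score(y_true, y_pred))           # sensitivity
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
```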
Collapse
Affiliation(s)
- Daun Shin
- Department of Psychiatry, Anam Hospital, Korea University, Seoul, Republic of Korea
- Doctorpresso, Seoul, Republic of Korea
| | | | | | - Younhee Cho
- Doctorpresso, Seoul, Republic of Korea
- Department of Design, Seoul National University, Seoul, Republic of Korea
| | | |
Collapse
|
25
|
Pool J, Indulska M, Sadiq S. Large language models and generative AI in telehealth: a responsible use lens. J Am Med Inform Assoc 2024; 31:2125-2136. [PMID: 38441296 PMCID: PMC11339524 DOI: 10.1093/jamia/ocae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 02/05/2024] [Accepted: 02/14/2024] [Indexed: 08/23/2024] Open
Abstract
OBJECTIVE This scoping review aims to assess the current research landscape of the application and use of large language models (LLMs) and generative Artificial Intelligence (AI), through tools such as ChatGPT in telehealth. Additionally, the review seeks to identify key areas for future research, with a particular focus on AI ethics considerations for responsible use and ensuring trustworthy AI. MATERIALS AND METHODS Following the scoping review methodological framework, a search strategy was conducted across 6 databases. To structure our review, we employed AI ethics guidelines and principles, constructing a concept matrix for investigating the responsible use of AI in telehealth. Using the concept matrix in our review enabled the identification of gaps in the literature and informed future research directions. RESULTS Twenty studies were included in the review. Among the included studies, 5 were empirical, and 15 were reviews and perspectives focusing on different telehealth applications and healthcare contexts. Benefit and reliability concepts were frequently discussed in these studies. Privacy, security, and accountability were peripheral themes, with transparency, explainability, human agency, and contestability lacking conceptual or empirical exploration. CONCLUSION The findings emphasized the potential of LLMs, especially ChatGPT, in telehealth. They provide insights into understanding the use of LLMs, enhancing telehealth services, and taking ethical considerations into account. By proposing three future research directions with a focus on responsible use, this review further contributes to the advancement of this emerging phenomenon of healthcare AI.
Collapse
Affiliation(s)
- Javad Pool
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane 4072, Australia
| | - Marta Indulska
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- Business School, The University of Queensland, Brisbane 4072, Australia
| | - Shazia Sadiq
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane 4072, Australia
| |
Collapse
|
26
|
Alliende LM, Sands BR, Mittal VA. Chatbots and Stigma in Schizophrenia: The Need for Transparency. Schizophr Bull 2024; 50:957-960. [PMID: 38917476 PMCID: PMC11348995 DOI: 10.1093/schbul/sbae105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 06/27/2024]
Affiliation(s)
| | - Beckett Ryden Sands
- Weinberg College, Department of Psychology, Northwestern University, Evanston, IL, USA
| | | |
Collapse
|
27
|
Sawamura S, Kohiyama K, Takenaka T, Sera T, Inoue T, Nagai T. Performance of ChatGPT 4.0 on Japan's National Physical Therapist Examination: A Comprehensive Analysis of Text and Visual Question Handling. Cureus 2024; 16:e67347. [PMID: 39310431 PMCID: PMC11413471 DOI: 10.7759/cureus.67347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/20/2024] [Indexed: 09/25/2024] Open
Abstract
INTRODUCTION ChatGPT 4.0, a large-scale language model (LLM) developed by OpenAI, has demonstrated the capability to pass Japan's national medical examination and other medical assessments. However, the impact of imaging-based questions and different question types on its performance has not been thoroughly examined. This study evaluated ChatGPT 4.0's performance on Japan's national examination for physical therapists, particularly its ability to handle complex questions involving images and tables. The study also assessed the model's potential in the field of rehabilitation and its performance with Japanese language inputs. METHODS The evaluation utilized 1,000 questions from the 54th to 58th national exams for physical therapists in Japan, comprising 160 general questions and 40 practical questions per exam. All questions were input in Japanese and included additional information such as images or tables. The answers generated by ChatGPT were then compared with the official correct answers. ANALYSIS ChatGPT's performance was evaluated based on accuracy rates using various criteria: general and practical questions were analyzed with Fisher's exact test, A-type (single correct answer) and X2-type (two correct answers) questions, text-only questions versus questions with images and tables, and different question lengths using Student's t-test. RESULTS ChatGPT 4.0 met the passing criteria with an overall accuracy of 73.4%. The accuracy rates for general and practical questions were 80.1% and 46.6%, respectively. No significant difference was found between the accuracy rates for A-type (74.3%) and X2-type (67.4%) questions. However, a significant difference was observed between the accuracy rates for text-only questions (80.5%) and questions with images and tables (35.4%). DISCUSSION The results indicate that ChatGPT 4.0 satisfies the passing criteria for the national exam and demonstrates adequate knowledge and application skills. However, its performance on practical questions and those with images and tables is lower, indicating areas for improvement. The effective handling of Japanese inputs suggests its potential use in non-English-speaking regions. CONCLUSION ChatGPT 4.0 can pass the national examination for physical therapists, particularly with text-based questions. However, improvements are needed for specialized practical questions and those involving images and tables. The model shows promise for supporting clinical rehabilitation and medical education in Japanese-speaking contexts, though further enhancements are required for a comprehensive application.
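The Fisher's exact comparison of general versus practical questions can be reproduced in outline as below; the counts are reconstructed approximately from the stated totals (800 general and 200 practical items over five exams) and accuracy rates (80.1% and 46.6%), so they are illustrative rather than the paper's raw data.

```python
# Sketch of the Fisher's exact test comparison of general vs practical
# questions. Counts are approximate reconstructions, not the paper's raw data.
from scipy.stats import fisher_exact

#             correct  incorrect
general   = [641, 159]   # ~80.1% of 800 general questions
practical = [93, 107]    # ~46.6% of 200 practical questions

odds_ratio, p_value = fisher_exact([general, practical])
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3g}")
```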
Collapse
Affiliation(s)
- Shogo Sawamura
- Department of Rehabilitation, Heisei College of Health Sciences, Gifu, JPN
| | - Kengo Kohiyama
- Department of Rehabilitation, Heisei College of Health Sciences, Gifu, JPN
| | - Takahiro Takenaka
- Department of Rehabilitation, Heisei College of Health Sciences, Gifu, JPN
| | - Tatsuya Sera
- Department of Rehabilitation, Heisei College of Health Sciences, Gifu, JPN
| | - Tadatoshi Inoue
- Department of Rehabilitation, Heisei College of Health Sciences, Gifu, JPN
| | - Takashi Nagai
- Department of Rehabilitation, Heisei College of Health Sciences, Gifu, JPN
| |
Collapse
|
28
|
Su Z, Tang G, Huang R, Qiao Y, Zhang Z, Dai X. Based on Medicine, The Now and Future of Large Language Models. Cell Mol Bioeng 2024; 17:263-277. [PMID: 39372551 PMCID: PMC11450117 DOI: 10.1007/s12195-024-00820-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2024] [Accepted: 09/08/2024] [Indexed: 10/08/2024] Open
Abstract
OBJECTIVES This review explores the potential applications of large language models (LLMs) such as ChatGPT, GPT-3.5, and GPT-4 in the medical field, aiming to encourage their prudent use, provide professional support, and develop accessible medical AI tools that adhere to healthcare standards. METHODS This paper examines the impact of technologies such as OpenAI's Generative Pre-trained Transformers (GPT) series, including GPT-3.5 and GPT-4, and other large language models (LLMs) in medical education, scientific research, clinical practice, and nursing. Specifically, it includes supporting curriculum design, acting as personalized learning assistants, creating standardized simulated patient scenarios in education; assisting with writing papers, data analysis, and optimizing experimental designs in scientific research; aiding in medical imaging analysis, decision-making, patient education, and communication in clinical practice; and reducing repetitive tasks, promoting personalized care and self-care, providing psychological support, and enhancing management efficiency in nursing. RESULTS LLMs, including ChatGPT, have demonstrated significant potential and effectiveness in the aforementioned areas, yet their deployment in healthcare settings is fraught with ethical complexities, potential lack of empathy, and risks of biased responses. CONCLUSION Despite these challenges, significant medical advancements can be expected through the proper use of LLMs and appropriate policy guidance. Future research should focus on overcoming these barriers to ensure the effective and ethical application of LLMs in the medical field.
Collapse
Affiliation(s)
- Ziqing Su
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
| | - Guozhang Tang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The Second Clinical College of Anhui Medical University, Hefei, 230032 Anhui P.R. China
| | - Rui Huang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
| | - Yang Qiao
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
| | - Zheng Zhang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
| | - Xingliang Dai
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Research & Development, East China Institute of Digital Medical Engineering, Shangrao, 334000 P.R. China
| |
Collapse
|
29
|
Yang W. Beyond algorithms: The human touch machine-generated titles for enhancing click-through rates on social media. PLoS One 2024; 19:e0306639. [PMID: 38995930 PMCID: PMC11244827 DOI: 10.1371/journal.pone.0306639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 06/20/2024] [Indexed: 07/14/2024] Open
Abstract
Artificial intelligence (AI) has the potential to revolutionize various domains by automating language-driven tasks. This study evaluates the effectiveness of an AI-assisted methodology, called the "POP Title AI Five-Step Optimization Method," in optimizing content titles on the RED social media platform. By leveraging advancements in natural language generation, this methodology aims to enhance the impact of titles by incorporating emotional sophistication and cultural proficiency, addressing existing gaps in AI capabilities. The methodology entails training generative models using human-authored examples that align with the aspirations of the target audience. By incorporating popular keywords derived from user searches, the relevance and discoverability of titles are enhanced. Audience-centric filtering is subsequently employed to further refine the generated outputs. Furthermore, human oversight is introduced to provide essential intuition that AI systems alone may lack. A total of 1,000 AI-generated titles underwent linguistic and engagement analyses. Qualitatively, 65% of the titles exhibited intrigue and conveyed meaning comparable to titles generated by humans. However, attaining full emotional sophistication remained a challenge. Quantitatively, titles emphasizing curiosity and contrast demonstrated positive correlations with user interactions, validating the efficacy of these techniques. Consequently, the machine-generated titles achieved coherence on par with 65% of human-generated titles, marking substantial progress with room for further refinement. Nevertheless, achieving sociocultural awareness is vital to matching human understanding across diverse contexts, presenting a critical avenue for future improvement of the methodology. Continuous advancements in AI can enhance adaptability and reduce subjectivity by promoting flexibility rather than relying solely on manual reviews. As AI gains a deeper understanding of human behavior, opportunities emerge for its application across various industries through experiential reasoning. This case study exemplifies the nurturing of AI's potential by refining its skills through an evolutionary process.
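A minimal sketch of the generate-then-filter step described above (keyword-conditioned title generation followed by audience-centric filtering) is given below; the prompt, keyword list, and filtering rule are illustrative assumptions, not the study's actual "POP Title" pipeline.

```python
# Sketch of keyword-aware title generation plus a simple filtering pass.
# The model choice and keywords are placeholders for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
popular_keywords = ["glow-up", "budget", "5-minute"]  # e.g., from user-search data

def generate_titles(topic, n=5):
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Write {n} short social-media titles about {topic}, "
                              f"using at least one of: {', '.join(popular_keywords)}."}],
    )
    return [t.strip("- ").strip() for t in
            resp.choices[0].message.content.splitlines() if t.strip()]

def keyword_filter(titles):
    # Audience-centric filtering, reduced here to a keyword check;
    # the study additionally applied human review.
    return [t for t in titles if any(k in t.lower() for k in popular_keywords)]

# Example (requires API access):
# print(keyword_filter(generate_titles("skincare routines")))
```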
Collapse
Affiliation(s)
- Wenyu Yang
- Foki Media Co., Ltd. Hangzhou, Hangzhou, Zhejiang Province, China
| |
Collapse
|
30
|
Ferrario A, Sedlakova J, Trachsel M. The Role of Humanization and Robustness of Large Language Models in Conversational Artificial Intelligence for Individuals With Depression: A Critical Analysis. JMIR Ment Health 2024; 11:e56569. [PMID: 38958218 PMCID: PMC11231450 DOI: 10.2196/56569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/19/2024] [Revised: 04/27/2024] [Accepted: 04/27/2024] [Indexed: 07/04/2024] Open
Abstract
Large language model (LLM)-powered services are gaining popularity in various applications due to their exceptional performance in many tasks, such as sentiment analysis and answering questions. Recently, research has been exploring their potential use in digital health contexts, particularly in the mental health domain. However, implementing LLM-enhanced conversational artificial intelligence (CAI) presents significant ethical, technical, and clinical challenges. In this viewpoint paper, we discuss 2 challenges that affect the use of LLM-enhanced CAI for individuals with mental health issues, focusing on the use case of patients with depression: the tendency to humanize LLM-enhanced CAI and their lack of contextualized robustness. Our approach is interdisciplinary, relying on considerations from philosophy, psychology, and computer science. We argue that the humanization of LLM-enhanced CAI hinges on the reflection of what it means to simulate "human-like" features with LLMs and what role these systems should play in interactions with humans. Further, ensuring the contextualization of the robustness of LLMs requires considering the specificities of language production in individuals with depression, as well as its evolution over time. Finally, we provide a series of recommendations to foster the responsible design and deployment of LLM-enhanced CAI for the therapeutic support of individuals with depression.
Collapse
Affiliation(s)
- Andrea Ferrario
- Institute Biomedical Ethics and History of Medicine, University of Zurich, Zurich, Switzerland
- Mobiliar Lab for Analytics at ETH, ETH Zurich, Zurich, Switzerland
| | - Jana Sedlakova
- Institute Biomedical Ethics and History of Medicine, University of Zurich, Zurich, Switzerland
- Digital Society Initiative, University of Zurich, Zurich, Switzerland
- Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
| | - Manuel Trachsel
- University of Basel, Basel, Switzerland
- University Hospital Basel, Basel, Switzerland
- University Psychiatric Clinics Basel, Basel, Switzerland
| |
Collapse
|
31
|
Omar M, Soffer S, Charney AW, Landi I, Nadkarni GN, Klang E. Applications of large language models in psychiatry: a systematic review. Front Psychiatry 2024; 15:1422807. [PMID: 38979501 PMCID: PMC11228775 DOI: 10.3389/fpsyt.2024.1422807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/24/2024] [Accepted: 06/05/2024] [Indexed: 07/10/2024] Open
Abstract
Background With their unmatched ability to interpret and engage with human language and context, large language models (LLMs) hint at the potential to bridge AI and human cognitive processes. This review explores the current application of LLMs, such as ChatGPT, in the field of psychiatry. Methods We followed PRISMA guidelines and searched through PubMed, Embase, Web of Science, and Scopus, up until March 2024. Results From 771 retrieved articles, we included 16 that directly examine LLMs' use in psychiatry. LLMs, particularly ChatGPT and GPT-4, showed diverse applications in clinical reasoning, social media, and education within psychiatry. They can assist in diagnosing mental health issues, managing depression, evaluating suicide risk, and supporting education in the field. However, our review also points out their limitations, such as difficulties with complex cases and potential underestimation of suicide risks. Conclusion Early research in psychiatry reveals LLMs' versatile applications, from diagnostic support to educational roles. Given the rapid pace of advancement, future investigations are poised to explore the extent to which these models might redefine traditional roles in mental health care.
Collapse
Affiliation(s)
- Mahmud Omar
- Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
| | - Shelly Soffer
- Internal Medicine B, Assuta Medical Center, Ashdod, Israel
- Ben-Gurion University of the Negev, Be'er Sheva, Israel
| | | | - Isotta Landi
- Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Girish N Nadkarni
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Eyal Klang
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| |
Collapse
|
32
|
Maggio MG, Tartarisco G, Cardile D, Bonanno M, Bruschetta R, Pignolo L, Pioggia G, Calabrò RS, Cerasa A. Exploring ChatGPT's potential in the clinical stream of neurorehabilitation. Front Artif Intell 2024; 7:1407905. [PMID: 38903157 PMCID: PMC11187276 DOI: 10.3389/frai.2024.1407905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2024] [Accepted: 05/13/2024] [Indexed: 06/22/2024] Open
Abstract
In several medical fields, generative AI tools such as ChatGPT have achieved optimal performance in identifying correct diagnoses solely by evaluating narrative clinical descriptions of cases. The most active fields of application include oncology and COVID-19-related symptoms, with relevant preliminary results also in the psychiatric and neurological domains. This scoping review aims to introduce the arrival of ChatGPT applications in neurorehabilitation practice, where such AI-driven solutions have the potential to revolutionize patient care and assistance. First, a comprehensive overview of ChatGPT, including its design and potential applications in medicine, is provided. Second, the remarkable natural language processing skills and limitations of these models are examined, with a focus on their use in neurorehabilitation. In this context, we present two case scenarios to evaluate ChatGPT's ability to resolve higher-order clinical reasoning. Overall, we present preliminary evidence that generative AI can be meaningfully integrated as a facilitator into neurorehabilitation practice, aiding physicians in defining more efficacious diagnostic and personalized prognostic plans.
Collapse
Affiliation(s)
| | - Gennaro Tartarisco
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
| | | | | | - Roberta Bruschetta
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
| | | | - Giovanni Pioggia
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
| | | | - Antonio Cerasa
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- S. Anna Institute, Crotone, Italy
- Pharmacotechnology Documentation and Transfer Unit, Preclinical and Translational Pharmacology, Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Rende, Italy
| |
Collapse
|
33
|
Monosov IE, Zimmermann J, Frank MJ, Mathis MW, Baker JT. Ethological computational psychiatry: Challenges and opportunities. Curr Opin Neurobiol 2024; 86:102881. [PMID: 38696972 PMCID: PMC11162904 DOI: 10.1016/j.conb.2024.102881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2023] [Revised: 04/02/2024] [Accepted: 04/03/2024] [Indexed: 05/04/2024]
Abstract
Studying the intricacies of individual subjects' moods and cognitive processing over extended periods of time presents a formidable challenge in medicine. While much of systems neuroscience appropriately focuses on the link between neural circuit functions and well-constrained behaviors over short timescales (e.g., trials, hours), many mental health conditions involve complex interactions of mood and cognition that are non-stationary across behavioral contexts and evolve over extended timescales. Here, we discuss opportunities, challenges, and possible future directions in computational psychiatry to quantify non-stationary continuously monitored behaviors. We suggest that this exploratory effort may contribute to a more precision-based approach to treating mental disorders and facilitate a more robust reverse translation across animal species. We conclude with ethical considerations for any field that aims to bridge artificial intelligence and patient monitoring.
Collapse
Affiliation(s)
- Ilya E. Monosov
- Departments of Neuroscience, Biomedical Engineering, Electrical Engineering, and Neurosurgery, Washington University School of Medicine, St. Louis, MO, USA
| | - Jan Zimmermann
- Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA
| | - Michael J. Frank
- Carney Center for Computational Brain Science, Brown University, Providence, RI, USA
| | | | | |
Collapse
|
34
|
Li DJ, Kao YC, Tsai SJ, Bai YM, Yeh TC, Chu CS, Hsu CW, Cheng SW, Hsu TW, Liang CS, Su KP. Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists. Psychiatry Clin Neurosci 2024; 78:347-352. [PMID: 38404249 DOI: 10.1111/pcn.13656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/14/2023] [Revised: 12/08/2023] [Accepted: 02/05/2024] [Indexed: 02/27/2024]
Abstract
AIM Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, the potential of their application in the psychiatric domain has not been well studied. METHOD In the first step, we compared the performance of ChatGPT GPT-4, Bard, and Llama-2 in the 2022 Taiwan Psychiatric Licensing Examination, conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists on 10 advanced clinical scenario questions designed for psychiatric differential diagnosis. RESULT Only GPT-4 passed the 2022 Taiwan Psychiatric Licensing Examination (scoring 69, with ≥60 considered a passing grade), while Bard scored 36 and Llama-2 scored 25. GPT-4 outperformed Bard and Llama-2, especially in the areas of 'Pathophysiology & Epidemiology' (χ2 = 22.4, P < 0.001) and 'Psychopharmacology & Other therapies' (χ2 = 15.8, P < 0.001). In the differential diagnosis, the mean score of the 24 experienced psychiatrists (mean 6.1, standard deviation 1.9) was higher than that of GPT-4 (5), Bard (3), and Llama-2 (1). CONCLUSION Compared with Bard and Llama-2, GPT-4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. Moreover, GPT-4's differential diagnosis ability closely approached that of the experienced psychiatrists. Among the three LLMs, GPT-4 showed promising potential as a valuable tool in psychiatric practice.
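For the chi-square comparisons reported above, the test itself runs as in this sketch; the per-model correct/incorrect counts are hypothetical placeholders, since the abstract reports only the test statistics.

```python
# Sketch of a chi-square test comparing the three models on one exam
# subscale. Counts are hypothetical; the paper reports only chi2 and P.
from scipy.stats import chi2_contingency

#            correct  incorrect   (hypothetical subscale counts)
table = [[28, 12],   # GPT-4
         [15, 25],   # Bard
         [10, 30]]   # Llama-2

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.4f}")
```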
Collapse
Affiliation(s)
- Dian-Jeng Li
- Department of Addiction Science, Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung, Taiwan
- Department of Nursing, Meiho University, Pingtung, Taiwan
| | - Yu-Chen Kao
- Department of Psychiatry, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Department of Psychiatry, Tri-Service General Hospital, Beitou branch, Taipei, Taiwan
| | - Shih-Jen Tsai
- Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan
- Department of Psychiatry, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Ya-Mei Bai
- Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan
- Department of Psychiatry, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Ta-Chuan Yeh
- Department of Psychiatry, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
| | - Che-Sheng Chu
- Center for Geriatric and Gerontology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
- Non-invasive Neuromodulation Consortium for Mental Disorders, Society of Psychophysiology, Taipei, Taiwan
- Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Psychiatry, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
| | - Chih-Wei Hsu
- Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
| | - Szu-Wei Cheng
- Department of General Medicine, Chi Mei Medical Center, Tainan, Taiwan
- Mind-Body Interface Laboratory (MBI-Lab) and Department of Psychiatry, China Medical University Hospital, Taichung, Taiwan
| | - Tien-Wei Hsu
- Department of Psychiatry, E-DA Dachang Hospital, I-Shou University, Kaohsiung, Taiwan
- Department of Psychiatry, E-DA Hospital, I-Shou University, Kaohsiung, Taiwan
| | - Chih-Sung Liang
- Department of Psychiatry, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Department of Psychiatry, Tri-Service General Hospital, Beitou branch, Taipei, Taiwan
| | - Kuan-Pin Su
- Mind-Body Interface Laboratory (MBI-Lab) and Department of Psychiatry, China Medical University Hospital, Taichung, Taiwan
- College of Medicine, China Medical University, Taichung, Taiwan
- An-Nan Hospital, China Medical University, Tainan, Taiwan
| |
Collapse
|
35
|
Treder MS, Lee S, Tsvetanov KA. Introduction to Large Language Models (LLMs) for dementia care and research. FRONTIERS IN DEMENTIA 2024; 3:1385303. [PMID: 39081594 PMCID: PMC11285660 DOI: 10.3389/frdem.2024.1385303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/12/2024] [Accepted: 04/23/2024] [Indexed: 08/02/2024]
Abstract
Introduction Dementia is a progressive neurodegenerative disorder that affects cognitive abilities including memory, reasoning, and communication skills, leading to gradual decline in daily activities and social engagement. In light of the recent advent of Large Language Models (LLMs) such as ChatGPT, this paper aims to thoroughly analyse their potential applications and usefulness in dementia care and research. Method To this end, we offer an introduction into LLMs, outlining the key features, capabilities, limitations, potential risks, and practical considerations for deployment as easy-to-use software (e.g., smartphone apps). We then explore various domains related to dementia, identifying opportunities for LLMs to enhance understanding, diagnostics, and treatment, with a broader emphasis on improving patient care. For each domain, the specific contributions of LLMs are examined, such as their ability to engage users in meaningful conversations, deliver personalized support, and offer cognitive enrichment. Potential benefits encompass improved social interaction, enhanced cognitive functioning, increased emotional well-being, and reduced caregiver burden. The deployment of LLMs in caregiving frameworks also raises a number of concerns and considerations. These include privacy and safety concerns, the need for empirical validation, user-centered design, adaptation to the user's unique needs, and the integration of multimodal inputs to create more immersive and personalized experiences. Additionally, ethical guidelines and privacy protocols must be established to ensure responsible and ethical deployment of LLMs. Results We report the results on a questionnaire filled in by people with dementia (PwD) and their supporters wherein we surveyed the usefulness of different application scenarios of LLMs as well as the features that LLM-powered apps should have. Both PwD and supporters were largely positive regarding the prospect of LLMs in care, although concerns were raised regarding bias, data privacy and transparency. Discussion Overall, this review corroborates the promising utilization of LLMs to positively impact dementia care by boosting cognitive abilities, enriching social interaction, and supporting caregivers. The findings underscore the importance of further research and development in this field to fully harness the benefits of LLMs and maximize their potential for improving the lives of individuals living with dementia.
Collapse
Affiliation(s)
- Matthias S. Treder
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
| | - Sojin Lee
- Olive AI Limited, London, United Kingdom
| | - Kamen A. Tsvetanov
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, United Kingdom
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
36
|
Bartal A, Jagodnik KM, Chan SJ, Dekel S. AI and narrative embeddings detect PTSD following childbirth via birth stories. Sci Rep 2024; 14:8336. [PMID: 38605073 PMCID: PMC11009279 DOI: 10.1038/s41598-024-54242-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 02/10/2024] [Indexed: 04/13/2024] Open
Abstract
Free-text analysis using machine learning (ML)-based natural language processing (NLP) shows promise for diagnosing psychiatric conditions. Chat Generative Pre-trained Transformer (ChatGPT) has demonstrated preliminary feasibility for this purpose; however, whether it can accurately assess mental illness remains to be determined. This study evaluates the effectiveness of ChatGPT and the text-embedding-ada-002 (ADA) model in detecting post-traumatic stress disorder following childbirth (CB-PTSD), a maternal postpartum mental illness affecting millions of women annually, with no standard screening protocol. Using a sample of 1295 women who gave birth within the last six months and were aged 18 years or older, recruited through hospital announcements, social media, and professional organizations, we explore ChatGPT's and ADA's potential to screen for CB-PTSD by analyzing maternal childbirth narratives. The PTSD Checklist for DSM-5 (PCL-5; cutoff 31) was used to assess CB-PTSD. By developing an ML model that utilizes the numerical vector representations of the ADA model, we identify CB-PTSD via narrative classification. Our model outperformed (F1 score: 0.81) ChatGPT and six previously published large text-embedding models trained on mental health or clinical-domain data, suggesting that the ADA model can be harnessed to identify CB-PTSD. Our modeling approach could be generalized to assess other mental health disorders.
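A minimal sketch of the embedding-plus-classifier approach described above follows: ADA vectors for each narrative feed a supervised classifier trained against PCL-5-derived labels. The classifier choice, split, and placeholder texts are assumptions, not the authors' model.

```python
# Sketch: embed childbirth narratives with text-embedding-ada-002, then
# train a classifier on PCL-5-derived labels (cutoff 31) and report F1.
from openai import OpenAI
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(texts):
    # Numerical vector representation of each narrative via the ADA model.
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# Placeholder narratives; the study used 1295 maternal childbirth stories.
narratives = [f"Birth story {i} ..." for i in range(6)]
labels = np.array([1, 0, 1, 0, 1, 0])  # 1 = PCL-5 >= 31 (provisional CB-PTSD)

X = embed(narratives)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=1/3, stratify=labels, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```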
Collapse
Affiliation(s)
- Alon Bartal
- The School of Business Administration, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Kathleen M Jagodnik
- The School of Business Administration, Bar-Ilan University, Ramat Gan, 5290002, Israel
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
| | - Sabrina J Chan
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Sharon Dekel
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, 02114, USA.
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA.
| |
Collapse
|
37
|
Sathyam PKR, Surapaneni KM. Assessing the performance of ChatGPT in psychiatry: A study using clinical cases from foreign medical graduate examination (FMGE). Indian J Psychiatry 2024; 66:408-410. [PMID: 38778847 PMCID: PMC11107915 DOI: 10.4103/indianjpsychiatry.indianjpsychiatry_919_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/04/2023] [Revised: 01/01/2024] [Accepted: 02/08/2024] [Indexed: 05/25/2024] Open
Affiliation(s)
- Praveen Kumar Ratavarapu Sathyam
- Department of Psychiatry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu, India
| | - Krishna Mohan Surapaneni
- Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu, India
- Department of Medical Education, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu, India
| |
Collapse
|
38
|
Cheng J. Applications of Large Language Models in Pathology. Bioengineering (Basel) 2024; 11:342. [PMID: 38671764 PMCID: PMC11047860 DOI: 10.3390/bioengineering11040342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024] Open
Abstract
Large language models (LLMs) are transformer-based neural networks that can provide human-like responses to questions and instructions. LLMs can generate educational material, summarize text, extract structured data from free text, create reports, write programs, and potentially assist in case sign-out. LLMs combined with vision models can assist in interpreting histopathology images. LLMs have immense potential in transforming pathology practice and education, but these models are not infallible, so any artificial intelligence generated content must be verified with reputable sources. Caution must be exercised on how these models are integrated into clinical practice, as these models can produce hallucinations and incorrect results, and an over-reliance on artificial intelligence may lead to de-skilling and automation bias. This review paper provides a brief history of LLMs and highlights several use cases for LLMs in the field of pathology.
Collapse
Affiliation(s)
- Jerome Cheng
- Department of Pathology, University of Michigan, Ann Arbor, MI 48105, USA
| |
Collapse
|
39
|
Liu XQ, Zhang ZR. Potential use of large language models for mitigating students' problematic social media use: ChatGPT as an example. World J Psychiatry 2024; 14:334-341. [PMID: 38617990 PMCID: PMC11008388 DOI: 10.5498/wjp.v14.i3.334] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Revised: 01/15/2024] [Accepted: 02/05/2024] [Indexed: 03/19/2024] Open
Abstract
The problematic use of social media has numerous negative impacts on individuals' daily lives, interpersonal relationships, physical and mental health, and more. Currently, there are few methods and tools to alleviate problematic social media use, and their potential is yet to be fully realized. Emerging large language models (LLMs) are becoming increasingly popular for providing information and assistance to people and are being applied in many aspects of life. In mitigating problematic social media use, LLMs such as ChatGPT can play a positive role by serving as conversational partners and outlets for users, providing personalized information and resources, monitoring and intervening in problematic social media use, and more. In this process, we should recognize both the enormous potential and endless possibilities of LLMs such as ChatGPT, leveraging their advantages to better address problematic social media use, while also acknowledging their limitations and potential pitfalls, such as errors, limited ability to resolve issues, privacy and security concerns, and potential overreliance. When we leverage the advantages of LLMs to address issues in social media usage, we must adopt a cautious and ethical approach, remaining vigilant about the adverse effects LLMs may have in addressing problematic social media use, so that technology better serves individuals and society.
Collapse
Affiliation(s)
- Xin-Qiao Liu
- School of Education, Tianjin University, Tianjin 300350, China
| | - Zi-Ru Zhang
- School of Education, Tianjin University, Tianjin 300350, China
| |
Collapse
|
40
|
Dimitriadis F, Alkagiet S, Tsigkriki L, Kleitsioti P, Sidiropoulos G, Efstratiou D, Askalidi T, Tsaousidis A, Siarkos M, Giannakopoulou P, Mavrogianni AD, Zarifis J, Koulaouzidis G. ChatGPT and Patients With Heart Failure. Angiology 2024:33197241238403. [PMID: 38451243 DOI: 10.1177/00033197241238403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/08/2024]
Abstract
ChatGPT (Generative Pre-trained Transformer) is a large-scale language processing model with potential for professional yet patient-friendly patient support. The aim of the study was to examine the accuracy and reproducibility of ChatGPT in answering questions about knowledge and management of heart failure (HF). First, we recorded the 47 questions most frequently asked by patients about HF. The answers of ChatGPT to these questions were independently assessed by two researchers. ChatGPT was able to render the definition of the disease in a very simple and explanatory way. It listed a number of the most important causes of HF and the most important risk factors for its occurrence. It provided correct answers about the most important diagnostic tests and why they are recommended. In addition, it answered health and dietary questions, such as those concerning daily fluid and alcohol intake. ChatGPT listed the most important classes of drugs in HF and their mechanisms of action. It also gave reasoned answers to questions about patients' sex lives and whether they could work, drive, or travel by plane. Overall, ChatGPT's performance was rated as very good, as it adequately answered all questions posed to it.
Collapse
Affiliation(s)
- Fotis Dimitriadis
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | - Stelina Alkagiet
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | - Lamprini Tsigkriki
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | | | - George Sidiropoulos
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | - Dimitris Efstratiou
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | - Taisa Askalidi
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | - Adam Tsaousidis
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | - Michail Siarkos
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | | | | | - John Zarifis
- Cardiology Department, General Hospital G. Papanikolaou, Thessaloniki, Greece
| | - George Koulaouzidis
- Department of Biochemical Sciences, Pomeranian Medical University, Szczecin, Poland
| |
Collapse
|
41
|
Bartal A, Jagodnik KM, Chan SJ, Dekel S. OpenAI's Narrative Embeddings Can Be Used for Detecting Post-Traumatic Stress Following Childbirth Via Birth Stories. RESEARCH SQUARE 2024:rs.3.rs-3428787. [PMID: 37886525 PMCID: PMC10602164 DOI: 10.21203/rs.3.rs-3428787/v2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 04/30/2024]
Abstract
Free-text analysis using Machine Learning (ML)-based Natural Language Processing (NLP) shows promise for diagnosing psychiatric conditions. Chat Generative Pre-trained Transformer (ChatGPT) has demonstrated preliminary feasibility for this purpose; however, whether it can accurately assess mental illness remains to be determined. This study evaluates the effectiveness of ChatGPT and the text-embedding-ada-002 (ADA) model in detecting post-traumatic stress disorder following childbirth (CB-PTSD), a maternal postpartum mental illness affecting millions of women annually, with no standard screening protocol. Using a sample of 1,295 women who gave birth within the last six months and were aged 18 years or older, recruited through hospital announcements, social media, and professional organizations, we explore ChatGPT's and ADA's potential to screen for CB-PTSD by analyzing maternal childbirth narratives. The PTSD Checklist for DSM-5 (PCL-5; cutoff 31) was used to assess CB-PTSD. By developing an ML model that utilizes the numerical vector representations of the ADA model, we identify CB-PTSD via narrative classification. Our model outperformed (F1 score: 0.82) ChatGPT and six previously published large language models (LLMs) trained on mental health or clinical-domain data, suggesting that the ADA model can be harnessed to identify CB-PTSD. Our modeling approach could be generalized to assess other mental health disorders.
Collapse
Affiliation(s)
- Alon Bartal
- The School of Business Administration, Bar-Ilan University, Max and Anna Web, Ramat Gan, 5290002, Israel
| | - Kathleen M. Jagodnik
- The School of Business Administration, Bar-Ilan University, Max and Anna Web, Ramat Gan, 5290002, Israel
- Department of Psychiatry, Massachusetts General Hospital, 55 Fruit St., Boston, 02114, Massachusetts, USA
- Department of Psychiatry, Harvard Medical School, 25 Shattuck St., Boston, 02115, Massachusetts, USA
| | - Sabrina J. Chan
- Department of Psychiatry, Massachusetts General Hospital, 55 Fruit St., Boston, 02114, Massachusetts, USA
| | - Sharon Dekel
- Department of Psychiatry, Massachusetts General Hospital, 55 Fruit St., Boston, 02114, Massachusetts, USA
- Department of Psychiatry, Harvard Medical School, 25 Shattuck St., Boston, 02115, Massachusetts, USA
| |
Collapse
|
42
|
Kalam KT, Rahman JM, Islam MR, Dewan SMR. ChatGPT and mental health: Friends or foes? Health Sci Rep 2024; 7:e1912. [PMID: 38361805 PMCID: PMC10867692 DOI: 10.1002/hsr2.1912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Revised: 11/30/2023] [Accepted: 01/31/2024] [Indexed: 02/17/2024] Open
Abstract
Background ChatGPT is an artificial intelligence (AI) language model that has gained popularity as a virtual assistant because of its exceptional capacity to solve problems and make decisions. However, technological misuse and incorrect interpretations can have potentially hazardous consequences for a user's mental health. Discussion Because it lacks real-time fact-checking capabilities, ChatGPT may create misleading or erroneous information. Because AI technology has the potential to influence a person's thinking, we anticipate ChatGPT's future repercussions on mental health by examining instances in which inappropriate usage may lead to mental disorders. While several studies have demonstrated how the AI model may transform mental health care and therapy, certain drawbacks, including bias and privacy violations, have also been identified. Conclusion Educating people and organizing workshops on AI technology usage, strengthening privacy measures, and updating ethical standards are crucial initiatives to prevent misuse and the resulting harms to mental health. Future longitudinal research on the potential of these platforms to affect a variety of mental health problems is recommended.
Collapse
Affiliation(s)
| | - Jannatul Mabia Rahman
- Department of Electrical and Electronic Engineering, University of Asia Pacific, Dhaka, Bangladesh
| | | | | |
Collapse
|
43
|
Bekbolatova M, Mayer J, Ong CW, Toma M. Transformative Potential of AI in Healthcare: Definitions, Applications, and Navigating the Ethical Landscape and Public Perspectives. Healthcare (Basel) 2024; 12:125. [PMID: 38255014] [PMCID: PMC10815906] [DOI: 10.3390/healthcare12020125]
Abstract
Artificial intelligence (AI) has emerged as a crucial tool in healthcare, with the primary aim of improving patient outcomes and optimizing healthcare delivery. By harnessing machine learning algorithms, natural language processing, and computer vision, AI enables the analysis of complex medical data. The integration of AI into healthcare systems aims to support clinicians, personalize patient care, and enhance population health, all while addressing the challenges posed by rising costs and limited resources. As a subfield of computer science, AI focuses on the development of advanced algorithms capable of performing complex tasks that once relied on human intelligence. The ultimate goal is to achieve human-level performance with improved efficiency and accuracy in problem-solving and task execution, thereby reducing the need for human intervention. Various industries, including engineering, media and entertainment, finance, and education, have already reaped significant benefits by incorporating AI systems into their operations. The healthcare sector in particular has seen rapid growth in the use of AI technology, yet there remains untapped potential for AI to truly transform the industry. Despite concerns about job displacement, AI in healthcare should not be viewed as a threat to human workers. Instead, AI systems are designed to augment and support healthcare professionals: by automating routine and repetitive tasks, AI can alleviate the burden on clinicians, allowing them to dedicate more attention to patient care and meaningful interactions. However, legal and ethical challenges must be addressed when adopting AI technology in medicine, alongside comprehensive public education to ensure widespread acceptance.
Affiliation(s)
- Molly Bekbolatova: Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
- Jonathan Mayer: Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
- Chi Wei Ong: School of Chemistry, Chemical Engineering, and Biotechnology, Nanyang Technological University, 62 Nanyang Drive, Singapore 637459, Singapore
- Milan Toma: Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
44
Cho CH, Lee HJ, Kim YK. The New Emerging Treatment Choice for Major Depressive Disorders: Digital Therapeutics. Adv Exp Med Biol 2024; 1456:307-331. [PMID: 39261436] [DOI: 10.1007/978-981-97-4402-2_16]
Abstract
The chapter provides an in-depth analysis of digital therapeutics (DTx) as a revolutionary approach to managing major depressive disorder (MDD). It discusses the evolution and definition of DTx, their application across various medical fields, regulatory considerations, and their benefits and limitations. This chapter extensively covers DTx for MDD, including smartphone applications, virtual reality interventions, cognitive-behavioral therapy (CBT) platforms, artificial intelligence (AI) and chatbot therapies, biofeedback, wearable technologies, and serious games. It evaluates the effectiveness of these digital interventions, comparing them with traditional treatments and examining patient perspectives, compliance, and engagement. The integration of DTx into clinical practice is also explored, along with the challenges and barriers to their adoption, such as technological limitations, data privacy concerns, ethical considerations, reimbursement issues, and the need for improved digital literacy. This chapter concludes by looking at the future direction of DTx in mental healthcare, emphasizing the need for personalized treatment plans, integration with emerging modalities, and the expansion of access to these innovative solutions globally.
Affiliation(s)
- Chul-Hyun Cho: Department of Psychiatry, Korea University College of Medicine, Seoul, Republic of Korea
- Heon-Jeong Lee: Department of Psychiatry, Korea University College of Medicine, Seoul, Republic of Korea
- Yong-Ku Kim: Department of Psychiatry, Korea University College of Medicine, Seoul, Republic of Korea
45
Di H, Wen Y. Generalist medical artificial intelligence: Embracing the future with flexible interactions. Psychiatry Clin Neurosci 2023; 77:625-626. [PMID: 37671749] [DOI: 10.1111/pcn.13595]
Affiliation(s)
- Huajie Di: Department of Pediatrics, Xuzhou Medical University, Xuzhou, China; Evidence-Based Medicine Research Center, Xuzhou Medical University, Xuzhou, China; Department of Pediatric Urology, The Affiliated Xuzhou Children's Hospital of Xuzhou Medical University, Xuzhou, China
- Yi Wen: Department of Pediatrics, Xuzhou Medical University, Xuzhou, China; Department of Pediatric Urology, The Affiliated Xuzhou Children's Hospital of Xuzhou Medical University, Xuzhou, China
46
Randhawa J, Khan A. A Conversation With ChatGPT About the Usage of Lithium in Pregnancy for Bipolar Disorder. Cureus 2023; 15:e46548. [PMID: 37933339] [PMCID: PMC10625495] [DOI: 10.7759/cureus.46548]
Abstract
This conversation with ChatGPT explores the use of lithium in pregnancy for bipolar disorder, a topic of significant importance in psychiatry. Bipolar disorder is characterized by extreme mood swings, and its prevalence varies globally. ChatGPT provides valuable information on bipolar disorder, its prevalence, age of onset, and gender differences. It also discusses the use of lithium during pregnancy, emphasizing the need for individualized decisions, close monitoring, and careful weighing of potential risks and benefits. However, ChatGPT's responses lack specific references, raising concerns about the reliability of the information provided. Further research is needed to quantify the accuracy and dependability of ChatGPT-generated answers in the healthcare context.
Affiliation(s)
- Jaismeen Randhawa: Psychiatry, Sri Guru Ram Das Institute of Medical Sciences and Research, Amritsar, IND
- Aadil Khan: Trauma Surgery, OSF Saint Francis Medical Center, University of Illinois Chicago, Peoria, USA; Cardiology, University of Illinois Chicago, Illinois, USA; Internal Medicine, Lala Lajpat Rai Hospital, Kanpur, IND
47
Perera Molligoda Arachchige AS, Chebaro K, Jelmoni AJM. Advances in large language models: ChatGPT expands the horizons of neuroscience. STEM Education 2023; 3:263-272. [DOI: 10.3934/steme.2023016]
Abstract
The field of neuroscience has been significantly impacted by the emergence of artificial intelligence (AI), particularly language models like ChatGPT. ChatGPT, developed by OpenAI, is a powerful conversational AI tool with the ability to communicate in multiple languages and process vast amounts of data. The commentary explores the significant impact of ChatGPT on the field of neuroscience, emphasizing its potential contributions, challenges, and ethical considerations. ChatGPT has shown promise in various aspects of neuroscience research, including hypothesis generation, data analysis, literature review, collaboration, and education. However, it is not without limitations, particularly in terms of accuracy, potential bias, and ethical concerns. The commentary highlights the potential applications of ChatGPT in the context of child and adolescent mental health, where it could revolutionize assessment and treatment processes. By analyzing text from young patients, ChatGPT can identify patterns related to mental health issues, enhancing diagnostic accuracy and treatment planning. It can also improve communication between patients and healthcare professionals, offering real-time insights and educational resources. While ChatGPT presents exciting opportunities, the commentary acknowledges the need for careful oversight and control to address privacy concerns, biases, and potential misuse. Ethical considerations surrounding the model's impact on emotions, behavior, and biases require ongoing scrutiny and safeguards. In conclusion, ChatGPT offers transformative potential in neuroscience and mental health, but it must be harnessed responsibly, with a focus on ethical considerations and scientific rigor to ensure its positive impact on research and clinical practice.
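The text-screening idea in this commentary can be made concrete with a short, hypothetical sketch. The model name ("gpt-4"), prompt wording, and function name below are illustrative assumptions, not the commentary's method, and any real deployment would require clinical validation, consent, and privacy safeguards.

# Hypothetical sketch of the commentary's idea: prompting a chat model to
# describe mental-health-relevant language patterns in a text sample.
# This is an observation aid for researchers, not a diagnostic tool.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_patterns(sample: str) -> str:
    """Return the model's non-diagnostic observations about the sample."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": ("You assist mental-health researchers. List language "
                         "patterns in the text that may warrant clinical "
                         "follow-up. Do not diagnose.")},
            {"role": "user", "content": sample},
        ],
    )
    return resp.choices[0].message.content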