51
Safrai M, Orwig KE. Utilizing artificial intelligence in academic writing: an in-depth evaluation of a scientific review on fertility preservation written by ChatGPT-4. J Assist Reprod Genet 2024; 41:1871-1880. [PMID: 38619763] [PMCID: PMC11263262] [DOI: 10.1007/s10815-024-03089-7]
Abstract
PURPOSE To evaluate the ability of ChatGPT-4 to generate a biomedical review article on fertility preservation. METHODS ChatGPT-4 was prompted to create an outline for a review on fertility preservation in men and prepubertal boys. The outline provided by ChatGPT-4 was subsequently used to prompt ChatGPT-4 to write the different parts of the review and provide five references for each section. The different parts of the article and the references provided were combined to create a single scientific review that was evaluated by the authors, who are experts in fertility preservation. The experts assessed the article and the references for accuracy and checked for plagiarism using online tools. In addition, both experts independently scored the relevance, depth, and currentness of ChatGPT-4's article using a scoring matrix ranging from 0 to 5, where higher scores indicate higher quality. RESULTS ChatGPT-4 successfully generated a relevant scientific article with references. Among 27 statements needing citations, four were inaccurate. Of 25 references, 36% were accurate, 48% had correct titles but other errors, and 16% were completely fabricated. Plagiarism was minimal (mean = 3%). Experts rated the article's relevance highly (5/5) but gave lower scores for depth (2-3/5) and currentness (3/5). CONCLUSION ChatGPT-4 can produce a scientific review on fertility preservation with minimal plagiarism. Although generally precise in content, it showed factual and contextual inaccuracies and inconsistent reference reliability. These issues limit ChatGPT-4 as a sole tool for scientific writing but suggest its potential as an aid in the writing process.
Affiliation(s)
- Myriam Safrai
- Department of Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA.
- Department of Obstetrics and Gynecology, Chaim Sheba Medical Center (Tel Hashomer), Sackler Faculty of Medicine, Tel Aviv University, 52621, Tel Aviv, Israel.
- Kyle E Orwig
- Department of Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
52
Aghamaliyev U, Karimbayli J, Giessen-Jung C, Matthias I, Unger K, Andrade D, Hofmann FO, Weniger M, Angele MK, Benedikt Westphalen C, Werner J, Renz BW. ChatGPT's Gastrointestinal Tumor Board Tango: A limping dance partner? Eur J Cancer 2024; 205:114100. [PMID: 38729055] [DOI: 10.1016/j.ejca.2024.114100]
Abstract
OBJECTIVES This study aimed to assess the consistency and replicability of treatment recommendations provided by ChatGPT 3.5 compared to gastrointestinal tumor cases presented at multidisciplinary tumor boards (MTBs). It also aimed to distinguish between general and case-specific responses and to investigate the precision of ChatGPT's recommendations in replicating exact treatment plans, particularly regarding chemotherapy regimens and follow-up protocols. MATERIAL AND METHODS A retrospective study was carried out on 115 cases of gastrointestinal malignancies, selected from 448 patients reviewed in MTB meetings. A senior resident fed patient data into ChatGPT 3.5 to produce treatment recommendations, which were then evaluated against the tumor board's decisions by senior oncology fellows. RESULTS In 19% of the examined cases, ChatGPT 3.5 provided general information about the malignancy without considering individual patient characteristics. In the remaining 81% of cases, ChatGPT generated responses that were specific to the individual clinical scenario. In the subset of case-specific responses, 83% of recommendations exhibited overall treatment strategy concordance between ChatGPT and MTB. However, the exact treatment concordance dropped to 65%, notably lower in recommending specific chemotherapy regimens. Cases recommended for surgery showed the highest concordance rates, while those involving chemotherapy recommendations faced challenges in precision. CONCLUSIONS ChatGPT 3.5 demonstrates potential in aligning conceptual approaches to treatment strategies with MTB guidelines. However, it falls short in accurately duplicating specific treatment plans, especially concerning chemotherapy regimens and follow-up procedures. Ethical concerns and challenges in achieving exact replication necessitate prudence when considering ChatGPT 3.5 for direct clinical decision-making in MTBs.
Affiliation(s)
- Ughur Aghamaliyev
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Javad Karimbayli
- Division of Molecular Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, National Cancer Institute, Aviano, Italy
- Clemens Giessen-Jung
- Comprehensive Cancer Center Munich & Department of Medicine III, LMU University Hospital, LMU Munich, Germany
- Ilmer Matthias
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Kristian Unger
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; Department of Radiation Oncology, University Hospital, LMU Munich, 81377; Bavarian Cancer Research Center (BZKF), Munich, Germany
- Dorian Andrade
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Felix O Hofmann
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Maximilian Weniger
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Martin K Angele
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- C Benedikt Westphalen
- Comprehensive Cancer Center Munich & Department of Medicine III, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Jens Werner
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Bernhard W Renz
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany.
53
Herzog I, Mendiratta D, Para A, Berg A, Kaushal N, Vives M. Assessing the potential role of ChatGPT in spine surgery research. J Exp Orthop 2024; 11:e12057. [PMID: 38873173] [PMCID: PMC11170336] [DOI: 10.1002/jeo2.12057]
Abstract
PURPOSE Since its release in November 2022, Chat Generative Pre-Trained Transformer 3.5 (ChatGPT), a complex machine learning model, has garnered more than 100 million users worldwide. The aim of this study is to determine how well ChatGPT can generate novel systematic review ideas on topics within spine surgery. METHODS ChatGPT was instructed to give ten novel systematic review ideas for each of five popular topics in spine surgery literature: microdiscectomy, laminectomy, spinal fusion, kyphoplasty and disc replacement. A comprehensive literature search was conducted in PubMed, CINAHL, EMBASE and Cochrane. The numbers of nonsystematic review articles and systematic review papers that had been published on each ChatGPT-generated idea were recorded. RESULTS Overall, ChatGPT had a 68% accuracy rate in creating novel systematic review ideas. More specifically, the accuracy rates were 80%, 80%, 40%, 70% and 70% for microdiscectomy, laminectomy, spinal fusion, kyphoplasty and disc replacement, respectively. However, there was a 32% rate of ChatGPT generating ideas for which there were 0 nonsystematic review articles published. The rates of generating novel systematic review ideas for which nonsystematic reviews had also been published were 71.4%, 50%, 22.2%, 50%, 62.5% and 51.2% for microdiscectomy, laminectomy, spinal fusion, kyphoplasty, disc replacement and overall, respectively. CONCLUSIONS ChatGPT generated novel systematic review ideas at an overall rate of 68%. ChatGPT can help identify knowledge gaps in spine research that warrant further investigation, when used under the supervision of an experienced spine specialist. This technology can be erroneous and lacks intrinsic logic, so it should never be used in isolation. LEVEL OF EVIDENCE Not applicable.
Affiliation(s)
- Isabel Herzog
- Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Ashok Para
- Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Ari Berg
- Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Neil Kaushal
- Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Michael Vives
- Rutgers New Jersey Medical School, Newark, New Jersey, USA
54
Carnino JM, Pellegrini WR, Willis M, Cohen MB, Paz-Lansberg M, Davis EM, Grillone GA, Levi JR. Assessing ChatGPT's Responses to Otolaryngology Patient Questions. Ann Otol Rhinol Laryngol 2024; 133:658-664. [PMID: 38676440] [DOI: 10.1177/00034894241249621]
Abstract
OBJECTIVE This study aims to evaluate ChatGPT's performance in addressing real-world otolaryngology patient questions, focusing on accuracy, comprehensiveness, and patient safety, to assess its suitability for integration into healthcare. METHODS A cross-sectional study was conducted using patient questions from the public online forum Reddit's r/AskDocs, where medical advice is sought from healthcare professionals. Patient questions were input into ChatGPT (GPT-3.5), and responses were reviewed by 5 board-certified otolaryngologists. The evaluation criteria included difficulty, accuracy, comprehensiveness, and bedside manner/empathy. Statistical analysis explored the relationship between patient question characteristics and ChatGPT response scores. Potentially dangerous responses were also identified. RESULTS Patient questions averaged 224.93 words, while ChatGPT responses were longer at 414.93 words. The accuracy scores for ChatGPT responses were 3.76/5, comprehensiveness scores were 3.59/5, and bedside manner/empathy scores were 4.28/5. Longer patient questions did not correlate with higher response ratings. However, longer ChatGPT responses scored higher in bedside manner/empathy. Higher question difficulty correlated with lower comprehensiveness. Five responses were flagged as potentially dangerous. CONCLUSION While ChatGPT exhibits promise in addressing otolaryngology patient questions, this study demonstrates its limitations, particularly in accuracy and comprehensiveness. The identification of potentially dangerous responses underscores the need for a cautious approach to AI in medical advice. Responsible integration of AI into healthcare necessitates thorough assessments of model performance and ethical considerations for patient safety.
Affiliation(s)
- Jonathan M Carnino
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- William R Pellegrini
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Megan Willis
- Department of Biostatistics, Boston University, Boston, MA, USA
- Michael B Cohen
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Marianella Paz-Lansberg
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Elizabeth M Davis
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Gregory A Grillone
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Jessica R Levi
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
55
Shiraishi M, Tomioka Y, Miyakuni A, Ishii S, Hori A, Park H, Ohba J, Okazaki M. Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis. Aesthetic Plast Surg 2024; 48:2389-2398. [PMID: 38684536] [DOI: 10.1007/s00266-024-04005-1]
Abstract
BACKGROUND ChatGPT is a free artificial intelligence (AI) language model developed and released by OpenAI in late 2022. This study aimed to evaluate the performance of ChatGPT in accurately answering clinical questions (CQs) on the Guideline for the Management of Blepharoptosis published by the American Society of Plastic Surgeons (ASPS) in 2022. METHODS CQs in the guideline were used as question sources in both English and Japanese. For each question, ChatGPT provided an answer to the CQ along with the evidence quality, recommendation strength, and supporting references; answer word counts were also recorded. We compared the performance of ChatGPT in each component between English and Japanese queries. RESULTS A total of 11 questions were included in the final analysis, and ChatGPT answered 61.3% of these correctly. ChatGPT answered the English CQs more accurately than the Japanese CQs (76.4% versus 46.4%; p = 0.004) and produced longer answers in English (123 versus 35.9 words; p = 0.004). No statistical differences were noted for evidence quality, recommendation strength, and reference match. A total of 697 references were proposed, but only 216 of them (31.0%) existed. CONCLUSIONS ChatGPT demonstrates potential as an adjunctive tool in the management of blepharoptosis. However, it is crucial to recognize that the existing AI model has distinct limitations, and its primary role should be to complement the expertise of medical professionals. LEVEL OF EVIDENCE V Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Yoko Tomioka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Ami Miyakuni
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Saaya Ishii
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Asei Hori
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Hwayoung Park
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Ohba
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
56
Heinke A, Radgoudarzi N, Huang BB, Baxter SL. A review of ophthalmology education in the era of generative artificial intelligence. Asia Pac J Ophthalmol (Phila) 2024; 13:100089. [PMID: 39134176] [PMCID: PMC11934932] [DOI: 10.1016/j.apjo.2024.100089]
Abstract
PURPOSE To explore the integration of generative AI, specifically large language models (LLMs), in ophthalmology education and practice, addressing their applications, benefits, challenges, and future directions. DESIGN A literature review and analysis of current AI applications and educational programs in ophthalmology. METHODS Analysis of published studies, reviews, articles, websites, and institutional reports on AI use in ophthalmology. Examination of educational programs incorporating AI, including curriculum frameworks, training methodologies, and evaluations of AI performance on medical examinations and clinical case studies. RESULTS Generative AI, particularly LLMs, shows potential to improve diagnostic accuracy and patient care in ophthalmology. Applications include aiding in patient, physician, and medical students' education. However, challenges such as AI hallucinations, biases, lack of interpretability, and outdated training data limit clinical deployment. Studies revealed varying levels of accuracy of LLMs on ophthalmology board exam questions, underscoring the need for more reliable AI integration. Several educational programs nationwide provide AI and data science training relevant to clinical medicine and ophthalmology. CONCLUSIONS Generative AI and LLMs offer promising advancements in ophthalmology education and practice. Addressing challenges through comprehensive curricula that include fundamental AI principles, ethical guidelines, and updated, unbiased training data is crucial. Future directions include developing clinically relevant evaluation metrics, implementing hybrid models with human oversight, leveraging image-rich data, and benchmarking AI performance against ophthalmologists. Robust policies on data privacy, security, and transparency are essential for fostering a safe and ethical environment for AI applications in ophthalmology.
Affiliation(s)
- Anna Heinke
- Division of Ophthalmology Informatics and Data Science, The Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, 9415 Campus Point Drive, La Jolla, CA 92037, USA; Jacobs Retina Center, 9415 Campus Point Drive, La Jolla, CA 92037, USA
- Niloofar Radgoudarzi
- Division of Ophthalmology Informatics and Data Science, The Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, 9415 Campus Point Drive, La Jolla, CA 92037, USA; Division of Biomedical Informatics, Department of Medicine, University of California San Diego Health System, University of California San Diego, La Jolla, CA, USA
- Bonnie B Huang
- Division of Ophthalmology Informatics and Data Science, The Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, 9415 Campus Point Drive, La Jolla, CA 92037, USA; Division of Biomedical Informatics, Department of Medicine, University of California San Diego Health System, University of California San Diego, La Jolla, CA, USA; Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, The Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, 9415 Campus Point Drive, La Jolla, CA 92037, USA; Division of Biomedical Informatics, Department of Medicine, University of California San Diego Health System, University of California San Diego, La Jolla, CA, USA.
57
Shemer A, Cohen M, Altarescu A, Atar-Vardi M, Hecht I, Dubinsky-Pertzov B, Shoshany N, Zmujack S, Or L, Einan-Lifshitz A, Pras E. Diagnostic capabilities of ChatGPT in ophthalmology. Graefes Arch Clin Exp Ophthalmol 2024; 262:2345-2352. [PMID: 38183467] [DOI: 10.1007/s00417-023-06363-z]
Abstract
PURPOSE The purpose of this study is to assess the diagnostic accuracy of ChatGPT in the field of ophthalmology. METHODS This is a retrospective cohort study conducted in one academic tertiary medical center. We reviewed data of patients admitted to the ophthalmology department from 06/2022 to 01/2023. We then created two clinical cases for each patient: the first based on the medical history alone (Hx), and the second with the addition of the clinical examination findings (Hx and Ex). For each case, we asked for the three most likely diagnoses from ChatGPT, residents, and attendings. Then, we compared the accuracy rates (at least one correct diagnosis) of all groups. Additionally, we compared the total time needed to complete the task across the groups. RESULTS ChatGPT, residents, and attendings evaluated 126 cases from 63 patients (history only or history and exam findings for each patient). ChatGPT achieved a significantly lower accurate diagnosis rate (54%) for the Hx cases, as compared to the residents (75%; p < 0.01) and attendings (71%; p < 0.01). After adding the clinical examination findings, the diagnosis rate of ChatGPT was 68%, whereas for the residents and the attendings, it increased to 94% (p < 0.01) and 86% (p < 0.01), respectively. ChatGPT was 4 to 5 times faster than the attendings and residents. CONCLUSIONS AND RELEVANCE ChatGPT showed low diagnostic rates in ophthalmology cases compared to residents and attendings based on patient history alone or with additional clinical examination findings. However, ChatGPT completed the task faster than the physicians.
Affiliation(s)
- Asaf Shemer
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel.
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
- Michal Cohen
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Health Science, Ben-Gurion University of the Negev, South District, Beer-Sheva, Israel
- Aya Altarescu
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Maya Atar-Vardi
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Idan Hecht
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Biana Dubinsky-Pertzov
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nadav Shoshany
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Sigal Zmujack
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Lior Or
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Adi Einan-Lifshitz
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eran Pras
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- The Matlow's Ophthalmo-Genetics Laboratory, Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
58
Kim HJ, Yang JH, Chang DG, Lenke LG, Pizones J, Castelein R, Watanabe K, Trobisch PD, Mundis GM, Suh SW, Suk SI. Assessing the Reproducibility of the Structured Abstracts Generated by ChatGPT and Bard Compared to Human-Written Abstracts in the Field of Spine Surgery: Comparative Analysis. J Med Internet Res 2024; 26:e52001. [PMID: 38924787] [PMCID: PMC11237793] [DOI: 10.2196/52001]
Abstract
BACKGROUND Due to recent advances in artificial intelligence (AI), language model applications can generate logical text output that is difficult to distinguish from human writing. ChatGPT (OpenAI) and Bard (subsequently rebranded as "Gemini"; Google AI) were developed using distinct approaches, but little has been studied about the difference in their capability to generate the abstract. The use of AI to write scientific abstracts in the field of spine surgery is the center of much debate and controversy. OBJECTIVE The objective of this study is to assess the reproducibility of the structured abstracts generated by ChatGPT and Bard compared to human-written abstracts in the field of spine surgery. METHODS In total, 60 abstracts dealing with spine sections were randomly selected from 7 reputable journals and used as ChatGPT and Bard input statements to generate abstracts based on supplied paper titles. A total of 174 abstracts, divided into human-written abstracts, ChatGPT-generated abstracts, and Bard-generated abstracts, were evaluated for compliance with the structured format of journal guidelines and consistency of content. The likelihood of plagiarism and AI output was assessed using the iThenticate and ZeroGPT programs, respectively. A total of 8 reviewers in the spinal field evaluated 30 randomly extracted abstracts to determine whether they were produced by AI or human authors. RESULTS The proportion of abstracts that met journal formatting guidelines was greater among ChatGPT abstracts (34/60, 56.6%) compared with those generated by Bard (6/54, 11.1%; P<.001). However, a higher proportion of Bard abstracts (49/54, 90.7%) had word counts that met journal guidelines compared with ChatGPT abstracts (30/60, 50%; P<.001). The similarity index was significantly lower among ChatGPT-generated abstracts (20.7%) compared with Bard-generated abstracts (32.1%; P<.001). The AI-detection program predicted that 21.7% (13/60) of the human group, 63.3% (38/60) of the ChatGPT group, and 87% (47/54) of the Bard group were possibly generated by AI, with an area under the curve value of 0.863 (P<.001). The mean detection rate by human reviewers was 53.8% (SD 11.2%), achieving a sensitivity of 56.3% and a specificity of 48.4%. A total of 56.3% (63/112) of the actual human-written abstracts and 55.9% (62/128) of AI-generated abstracts were recognized as human-written and AI-generated by human reviewers, respectively. CONCLUSIONS Both ChatGPT and Bard can be used to help write abstracts, but most AI-generated abstracts are currently considered unethical due to high plagiarism and AI-detection rates. ChatGPT-generated abstracts appear to be superior to Bard-generated abstracts in meeting journal formatting guidelines. Because humans are unable to accurately distinguish abstracts written by humans from those produced by AI programs, it is crucial to exercise special caution and examine the ethical boundaries of using AI programs, including ChatGPT and Bard.
Affiliation(s)
- Hong Jin Kim
- Department of Orthopedic Surgery, Inje University Sanggye Paik Hospital, College of Medicine, Inje University, Seoul, Republic of Korea
- Jae Hyuk Yang
- Department of Orthopedic Surgery, Korea University Anam Hospital, College of Medicine, Korea University, Seoul, Republic of Korea
- Dong-Gune Chang
- Department of Orthopedic Surgery, Inje University Sanggye Paik Hospital, College of Medicine, Inje University, Seoul, Republic of Korea
- Lawrence G Lenke
- Department of Orthopedic Surgery, The Daniel and Jane Och Spine Hospital, Columbia University, New York, NY, United States
- Javier Pizones
- Department of Orthopedic Surgery, Hospital Universitario La Paz, Madrid, Spain
- René Castelein
- Department of Orthopedic Surgery, University Medical Centre Utrecht, Utrecht, Netherlands
- Kota Watanabe
- Department of Orthopedic Surgery, Keio University School of Medicine, Tokyo, Japan
- Per D Trobisch
- Department of Spine Surgery, Eifelklinik St. Brigida, Simmerath, Germany
- Gregory M Mundis
- Department of Orthopaedic Surgery, Scripps Clinic, La Jolla, CA, United States
- Seung Woo Suh
- Department of Orthopedic Surgery, Korea University Guro Hospital, College of Medicine, Korea University, Seoul, Republic of Korea
- Se-Il Suk
- Department of Orthopedic Surgery, Inje University Sanggye Paik Hospital, College of Medicine, Inje University, Seoul, Republic of Korea
59
Costa ICP, do Nascimento MC, Treviso P, Chini LT, Roza BDA, Barbosa SDFF, Mendes KDS. Using the Chat Generative Pre-trained Transformer in academic writing in health: a scoping review. Rev Lat Am Enfermagem 2024; 32:e4194. [PMID: 38922265] [PMCID: PMC11182606] [DOI: 10.1590/1518-8345.7133.4194]
Abstract
OBJECTIVE to map the scientific literature regarding the use of the Chat Generative Pre-trained Transformer, ChatGPT, in academic writing in health. METHOD this was a scoping review, following the JBI methodology. Conventional databases and gray literature were included. Study selection was carried out after removal of duplicates, with individual and paired evaluation. Data were extracted based on a structured script and presented in descriptive, tabular and graphical formats. RESULTS the analysis of the 49 selected articles revealed that ChatGPT is a versatile tool, contributing to scientific production, description of medical procedures and preparation of summaries aligned with the standards of scientific journals. Its application has been shown to improve the clarity of writing and benefits areas such as innovation and automation. Risks were also identified, such as the possibility of a lack of originality and ethical issues. Future perspectives highlight the need for adequate regulation, agile adaptation and the search for an ethical balance in incorporating ChatGPT into academic writing. CONCLUSION ChatGPT presents transformative potential in academic writing in health. However, its adoption requires rigorous human supervision, solid regulation, and transparent guidelines to ensure its responsible and beneficial use by the scientific community.
Affiliation(s)
- Patrícia Treviso
- Universidade do Vale do Rio dos Sinos, Escola de Saúde, São Leopoldo, RS, Brazil
- Karina Dal Sasso Mendes
- Universidade de São Paulo, Escola de Enfermagem de Ribeirão Preto, PAHO/WHO Collaborating Centre for Nursing Research Development, Ribeirão Preto, SP, Brazil
60
Ren D, Tagg AJ, Wilcox H, Roland D. Identification of Human-Generated vs AI-Generated Research Abstracts by Health Care Professionals. JAMA Pediatr 2024; 178:625-626. [PMID: 38683595] [PMCID: PMC11059037] [DOI: 10.1001/jamapediatrics.2024.0760]
Abstract
This survey study assesses the ability of health care professionals to discern whether abstracts were written by investigators or by an artificial intelligence (AI) chatbot.
Affiliation(s)
- Dennis Ren
- Division of Emergency Medicine, Children’s National Hospital, Washington, DC
- Helena Wilcox
- St George’s University Hospitals, National Health Service Foundation Trust, London, United Kingdom
- Damian Roland
- SAPPHIRE Group, Population Health Sciences, Leicester University, Leicester, United Kingdom
- Paediatric Emergency Medicine Leicester Academic (PEMLA) Group, Children's Emergency Department, Leicester Royal Infirmary, Leicester, United Kingdom
61
Shiraishi M, Tanigawa K, Tomioka Y, Miyakuni A, Moriwaki Y, Yang R, Oba J, Okazaki M. Blepharoptosis Consultation with Artificial Intelligence: Aesthetic Surgery Advice and Counseling from Chat Generative Pre-Trained Transformer (ChatGPT). Aesthetic Plast Surg 2024; 48:2057-2063. [PMID: 38589561] [DOI: 10.1007/s00266-024-04002-4]
Abstract
BACKGROUND Chat generative pre-trained transformer (ChatGPT) is a publicly available extensive artificial intelligence (AI) language model that leverages deep learning to generate text that mimics human conversations. In this study, ChatGPT's performance was assessed by its ability to offer insightful and precise answers to a series of fictional questions emulating a preliminary consultation on blepharoplasty. METHODS ChatGPT was posed with questions derived from a blepharoplasty checklist provided by the American Society of Plastic Surgeons. Board-certified plastic surgeons and non-medical staff members evaluated the responses for accuracy, informativeness, and accessibility. RESULTS Nine questions were used in this study. Regarding informativeness, the average score given by board-certified plastic surgeons was significantly lower than that given by non-medical staff members (2.89 ± 0.72 vs 4.41 ± 0.71; p = 0.042). No statistically significant differences were observed in accuracy (p = 0.56) or accessibility (p = 0.11). CONCLUSIONS Our results emphasize the effectiveness of ChatGPT in simulating doctor-patient conversations during blepharoplasty consultations. Non-medical individuals found its responses more informative than the surgeons did. Although limited in terms of specialized guidance, ChatGPT offers foundational surgical information. Further exploration is warranted to elucidate the broader role of AI in esthetic surgical consultations. LEVEL OF EVIDENCE V Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Koji Tanigawa
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yoko Tomioka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Ami Miyakuni
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yuta Moriwaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Rui Yang
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Oba
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
62
Nachalon Y, Broer M, Nativ-Zeltzer N. Using ChatGPT to Generate Research Ideas in Dysphagia: A Pilot Study. Dysphagia 2024; 39:407-411. [PMID: 37907728] [DOI: 10.1007/s00455-023-10623-9]
Abstract
Current research in dysphagia faces challenges due to the rapid growth of scientific literature and the interdisciplinary nature of the field. To address this, the study evaluates ChatGPT, an AI language model, as a supplementary resource to assist clinicians and researchers in generating research ideas for dysphagia, utilizing recent advancements in natural language processing and machine learning. The research ideas were generated by prompting ChatGPT to explore diverse aspects of dysphagia. A web-based survey was conducted in which 45 dysphagia experts were asked to rank each study on a scale of 1 to 5 according to feasibility, novelty, clinical implications, and relevance to current practice. A total of 26 experts (58%) completed the survey. The mean (± sd) rankings of research ideas were 4.03 (± 0.17) for feasibility, 3.5 (± 0.17) for potential impact on the field, 3.84 (± 0.12) for clinical relevance, and 3.08 (± 0.36) for novelty and innovation. Results of this study suggest that ChatGPT offers a promising approach to generating research ideas in dysphagia. While its current capability to generate innovative ideas appears limited, it can serve as a supplementary resource for researchers.
Affiliation(s)
- Yuval Nachalon
- Department of Otolaryngology, Head and Neck Surgery and Maxillofacial Surgery, Tel-Aviv Sourasky Medical Center, 6 Weizman Street, 6423906, Tel-Aviv, Israel.
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
- Maya Broer
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
63
Shen SA, Perez-Heydrich CA, Xie DX, Nellis JC. ChatGPT vs. web search for patient questions: what does ChatGPT do better? Eur Arch Otorhinolaryngol 2024; 281:3219-3225. [PMID: 38416195] [PMCID: PMC11410109] [DOI: 10.1007/s00405-024-08524-0]
Abstract
PURPOSE Chat generative pretrained transformer (ChatGPT) has the potential to significantly impact how patients acquire medical information online. Here, we characterize the readability and appropriateness of ChatGPT responses to a range of patient questions compared to results from traditional web searches. METHODS Patient questions related to the published Clinical Practice Guidelines by the American Academy of Otolaryngology-Head and Neck Surgery were sourced from existing online posts. Questions were categorized using a modified Rothwell classification system into (1) fact, (2) policy, and (3) diagnosis and recommendations. These were queried using ChatGPT and traditional web search. All results were evaluated on readability (Flesch Reading Ease and Flesch-Kincaid Grade Level) and understandability (Patient Education Materials Assessment Tool). Accuracy was assessed by two blinded clinical evaluators using a three-point ordinal scale. RESULTS 54 questions were organized into fact (37.0%), policy (37.0%), and diagnosis (25.8%). The average readability for ChatGPT responses was lower than traditional web search (FRE: 42.3 ± 13.1 vs. 55.6 ± 10.5, p < 0.001), while the PEMAT understandability was equivalent (93.8% vs. 93.5%, p = 0.17). ChatGPT scored higher than web search for questions in the 'Diagnosis' category (p < 0.01); there was no difference in questions categorized as 'Fact' (p = 0.15) or 'Policy' (p = 0.22). Additional prompting improved ChatGPT response readability (FRE 55.6 ± 13.6, p < 0.01). CONCLUSIONS ChatGPT outperforms web search in answering patient questions related to symptom-based diagnoses and is equivalent in providing medical facts and established policy. Appropriate prompting can further improve readability while maintaining accuracy. Further patient education is needed to relay the benefits and limitations of this technology as a source of medical information.
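For readers unfamiliar with the readability metrics named in this abstract, the following minimal Python sketch (illustrative only, not taken from the paper) shows how Flesch Reading Ease and Flesch-Kincaid Grade Level are commonly computed; the vowel-group syllable counter and the sample sentence are assumptions for demonstration purposes.

import re

def count_syllables(word: str) -> int:
    # Rough approximation: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)          # words per sentence
    spw = syllables / len(words)               # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch Reading Ease: higher = easier to read
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # Flesch-Kincaid Grade Level (approximate US grade)
    return fre, fkgl

fre, fkgl = readability("Sinusitis is inflammation of the sinuses. It often follows a cold.")
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")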
Affiliation(s)
- Sarek A Shen
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA.
- Deborah X Xie
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA
- Jason C Nellis
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA
64
Chelli M, Descamps J, Lavoué V, Trojani C, Azar M, Deckert M, Raynier JL, Clowez G, Boileau P, Ruetsch-Chelli C. Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis. J Med Internet Res 2024; 26:e53164. [PMID: 38776130] [PMCID: PMC11153973] [DOI: 10.2196/53164]
Abstract
BACKGROUND Large language models (LLMs) have raised both interest and concern in the academic community. They offer the potential for automating literature search and synthesis for systematic reviews but raise concerns regarding their reliability, as the tendency to generate unsupported (hallucinated) content persists. OBJECTIVE The aim of the study is to assess the ability of LLMs such as ChatGPT and Bard (subsequently rebranded Gemini) to produce references in the context of scientific writing. METHODS The performance of ChatGPT and Bard in replicating the results of human-conducted systematic reviews was assessed. Using systematic reviews pertaining to shoulder rotator cuff pathology, these LLMs were tested by providing the same inclusion criteria and comparing the results with original systematic review references, serving as gold standards. The study used 3 key performance metrics: recall, precision, and F1-score, alongside the hallucination rate. Papers were considered "hallucinated" if at least 2 of the following were wrong: title, first author, or year of publication. RESULTS In total, 11 systematic reviews across 4 fields yielded 33 prompts to LLMs (3 LLMs × 11 reviews), with 471 references analyzed. Precision rates for GPT-3.5, GPT-4, and Bard were 9.4% (13/139), 13.4% (16/119), and 0% (0/104) respectively (P<.001). Recall rates were 11.9% (13/109) for GPT-3.5 and 13.7% (15/109) for GPT-4, with Bard failing to retrieve any relevant papers (P<.001). Hallucination rates stood at 39.6% (55/139) for GPT-3.5, 28.6% (34/119) for GPT-4, and 91.4% (95/104) for Bard (P<.001). Further analysis of nonhallucinated papers retrieved by GPT models revealed significant differences in identifying various criteria, such as randomized studies, participant criteria, and intervention criteria. The study also noted the geographical and open-access biases in the papers retrieved by the LLMs. CONCLUSIONS Given their current performance, it is not recommended for LLMs to be deployed as the primary or exclusive tool for conducting systematic reviews. Any references generated by such models warrant thorough validation by researchers. The high occurrence of hallucinations in LLMs highlights the necessity for refining their training and functionality before confidently using them for rigorous academic purposes.
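As a rough illustration of the retrieval metrics reported in this abstract, here is a minimal Python sketch (not the authors' pipeline) that scores an LLM-generated reference list against the references of the original human-conducted systematic review; matching on a normalized (first author, year, title) triple is an assumed simplification, and the study's hallucination rule (at least 2 of title, first author, and year wrong) would additionally require checking each generated reference against a bibliographic database.

def normalize(ref):
    # Assumed record format: {"first_author": ..., "year": ..., "title": ...}
    return (ref["first_author"].strip().lower(), ref["year"], ref["title"].strip().lower())

def retrieval_metrics(llm_refs, gold_refs):
    retrieved = {normalize(r) for r in llm_refs}
    gold = {normalize(r) for r in gold_refs}
    true_pos = len(retrieved & gold)
    precision = true_pos / len(retrieved) if retrieved else 0.0  # share of generated refs that match the gold standard
    recall = true_pos / len(gold) if gold else 0.0               # share of gold-standard refs that were recovered
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example with the GPT-4 counts from the abstract: precision 16/119, recall 15/109.
print(round(16 / 119, 3), round(15 / 109, 3))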
Affiliation(s)
- Mikaël Chelli
- Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France
- Jules Descamps
- Orthopedic and Traumatology Unit, Hospital Lariboisière, Assistance Publique-Hôpitaux de Paris, Paris, France
- Vincent Lavoué
- Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France
- Christophe Trojani
- Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France
- Michel Azar
- Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France
- Marcel Deckert
- Université Côte d'Azur, INSERM, C3M, Team Microenvironment, Signalling and Cancer, Nice, France
- Jean-Luc Raynier
- Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France
- Gilles Clowez
- Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France
- Pascal Boileau
- Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France
- Caroline Ruetsch-Chelli
- Université Côte d'Azur, INSERM, C3M, Team Microenvironment, Signalling and Cancer, Nice, France
65
Levin G, Pareja R, Viveros-Carreño D, Sanchez Diaz E, Yates EM, Zand B, Ramirez PT. Association of reviewer experience with discriminating human-written versus ChatGPT-written abstracts. Int J Gynecol Cancer 2024; 34:669-674. [PMID: 38627032] [DOI: 10.1136/ijgc-2023-005162]
Abstract
OBJECTIVE To determine if reviewer experience impacts the ability to discriminate between human-written and ChatGPT-written abstracts. METHODS Thirty reviewers (10 seniors, 10 juniors, and 10 residents) were asked to differentiate between 10 ChatGPT-written and 10 human-written (fabricated) abstracts. For the study, 10 gynecologic oncology abstracts were fabricated by the authors. For each human-written abstract, we generated a matching ChatGPT abstract using the same title and the fabricated results of the human-written abstract. A web-based questionnaire was used to gather demographic data and to record the reviewers' evaluation of the 20 abstracts. Comparative statistics and multivariable regression were used to identify factors associated with a higher correct identification rate. RESULTS Each of the 30 reviewers evaluated the 20 abstracts, giving a total of 600 abstract evaluations. The reviewers were able to correctly identify 300/600 (50%) of the abstracts: 139/300 (46.3%) of the ChatGPT-generated abstracts and 161/300 (53.7%) of the human-written abstracts (p=0.07). Human-written abstracts had a higher rate of correct identification (median (IQR) 56.7% (49.2-64.1%) vs 45.0% (43.2-48.3%), p=0.023). Senior reviewers had a higher correct identification rate (60%) than junior reviewers and residents (45% each; p=0.043 and p=0.002, respectively). In a linear regression model including the experience level of the reviewers, familiarity with artificial intelligence (AI) and the country in which the majority of medical training was achieved (English speaking vs non-English speaking), the experience of the reviewer (β=10.2 (95% CI 1.8 to 18.7)) and familiarity with AI (β=7.78 (95% CI 0.6 to 15.0)) were independently associated with the correct identification rate (p=0.019 and p=0.035, respectively). In a correlation analysis, the number of publications by the reviewer was positively correlated with the correct identification rate (r(28)=0.61, p<0.001). CONCLUSION A total of 46.3% of abstracts written by ChatGPT were detected by reviewers. The correct identification rate increased with reviewer and publication experience.
Affiliation(s)
- Gabriel Levin
- Division of Gynecologic Oncology, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
- Rene Pareja
- Gynecologic Oncology, Clinica ASTORGA, Medellin, and Instituto Nacional de Cancerología, Bogotá, Colombia
- David Viveros-Carreño
- Unidad Ginecología Oncológica, Grupo de Investigación GIGA, Centro de Tratamiento e Investigación sobre Cáncer Luis Carlos Sarmiento Angulo - CTIC, Bogotá, Colombia
- Department of Gynecologic Oncology, Clínica Universitaria Colombia, Bogotá, Colombia
- Emmanuel Sanchez Diaz
- Universidad Pontificia Bolivariana Clinica Universitaria Bolivariana, Medellin, Colombia
- Elise Mann Yates
- Obstetrics and Gynecology, Houston Methodist Hospital, Houston, Texas, USA
- Behrouz Zand
- Gynecologic Oncology, Houston Methodist, Shenandoah, Texas, USA
- Pedro T Ramirez
- Department of Obstetrics and Gynecology, Houston Methodist Hospital, Houston, Texas, USA
66
Harskamp RE, De Clercq L. Performance of ChatGPT as an AI-assisted decision support tool in medicine: a proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2). Acta Cardiol 2024; 79:358-366. [PMID: 38348835] [DOI: 10.1080/00015385.2024.2303528]
Abstract
BACKGROUND It is thought that ChatGPT, an advanced language model developed by OpenAI, may in the future serve as an AI-assisted decision support tool in medicine. OBJECTIVE To evaluate the accuracy of ChatGPT's recommendations on medical questions related to common cardiac symptoms or conditions. METHODS We tested ChatGPT's ability to address medical questions in two ways. First, we assessed its accuracy in correctly answering cardiovascular trivia questions (n = 50), based on quizzes for medical professionals. Second, we entered 20 clinical case vignettes on the ChatGPT platform and evaluated its accuracy compared to expert opinion and clinical course. Lastly, we compared the latest research version (v3.5; 27 September 2023) with a prior version (v3.5; 30 January 2023) to evaluate improvement over time. RESULTS We found that the latest version of ChatGPT correctly answered 92% of the trivia questions, with slight variation in accuracy across the domains of coronary artery disease (100%), pulmonary and venous thrombotic embolism (100%), atrial fibrillation (90%), heart failure (90%) and cardiovascular risk management (80%). In the 20 case vignettes, ChatGPT's response matched the actual advice given in 17 (85%) of the cases. Straightforward patient-to-physician questions were all answered correctly (10/10). In more complex cases, where physicians (general practitioners) asked other physicians (cardiologists) for assistance or decision support, ChatGPT was correct in 70% of cases, and otherwise provided incomplete, inconclusive, or inappropriate recommendations when compared with expert consultation. ChatGPT showed significant improvement over time: the January version had correctly answered only 74% (vs 92%) of trivia questions (p = 0.031) and a mere 50% of the complex cases. CONCLUSIONS Our study suggests that ChatGPT has potential as an AI-assisted decision support tool in medicine, particularly for straightforward, low-complexity medical questions, but further research is needed to fully evaluate its potential.
Affiliation(s)
- Ralf E Harskamp
- Department of General Practice, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health, Personalized Medicine, Amsterdam, The Netherlands
- Lukas De Clercq
- Department of General Practice, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health, Personalized Medicine, Amsterdam, The Netherlands
67
Fournier A, Fallet C, Sadeghipour F, Perrottet N. Assessing the applicability and appropriateness of ChatGPT in answering clinical pharmacy questions. Ann Pharm Fr 2024; 82:507-513. [PMID: 37992892] [DOI: 10.1016/j.pharma.2023.11.001]
Abstract
OBJECTIVES Clinical pharmacists rely on different scientific references to ensure appropriate, safe, and cost-effective drug use. Tools based on artificial intelligence (AI) such as ChatGPT (Generative Pre-trained Transformer) could offer valuable support. The objective of this study was to assess ChatGPT's capacity to correctly respond to clinical pharmacy questions asked by healthcare professionals in our university hospital. MATERIAL AND METHODS ChatGPT's capacity to respond correctly to the last 100 consecutive questions recorded in our clinical pharmacy database was assessed. Questions were copied from our FileMaker Pro database and pasted into ChatGPT March 14 version online platform. The generated answers were then copied verbatim into an Excel file. Two blinded clinical pharmacists reviewed all the questions and the answers given by the software. In case of disagreements, a third blinded pharmacist intervened to decide. RESULTS Documentation-related issues (n=36) and questions on mode of drug administration (n=30) predominated. Among 69 applicable questions, the rate of correct answers varied from 30% to 57.1% depending on question type, with an overall rate of 44.9%. Of the inappropriate answers (n=38), 20 were incorrect, 18 gave no answer, and 8 were incomplete, with 8 answers falling into 2 different categories. In no case did ChatGPT provide a better answer than the pharmacists. CONCLUSIONS ChatGPT demonstrated mixed performance in answering clinical pharmacy questions. It should not replace human expertise, as a high rate of inappropriate answers was observed. Future studies should focus on the optimization of ChatGPT for specific clinical pharmacy questions and explore the potential benefits and limitations of integrating this technology into clinical practice.
Affiliation(s)
- A Fournier
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- C Fallet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- F Sadeghipour
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland; Center for Research and Innovation in Clinical Pharmaceutical Sciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- N Perrottet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland.
68
|
Vlachopoulos C, Antonopoulos A, Terentes-Printzios D. Generative artificial intelligence tools in scientific writing: entering a brave new world? Hellenic J Cardiol 2024; 77:120-121. [PMID: 38797284 DOI: 10.1016/j.hjc.2024.05.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2024] [Accepted: 05/22/2024] [Indexed: 05/29/2024] Open
|
69
|
Baladrón C, Sevilla T, Carrasco-Moraleja M, Gómez-Salvador I, Peral-Oliveira J, San Román JA. Assessing the accuracy of ChatGPT as a decision support tool in cardiology. REVISTA ESPANOLA DE CARDIOLOGIA (ENGLISH ED.) 2024; 77:433-435. [PMID: 38056773 DOI: 10.1016/j.rec.2023.11.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 11/21/2023] [Indexed: 12/08/2023]
Affiliation(s)
- Carlos Baladrón
- Servicio de Cardiología, Hospital Clínico Universitario de Valladolid, Valladolid, Spain; Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Spain. https://twitter.com/@cbalzor
| | - Teresa Sevilla
- Servicio de Cardiología, Hospital Clínico Universitario de Valladolid, Valladolid, Spain; Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Spain.
| | - Manuel Carrasco-Moraleja
- Servicio de Cardiología, Hospital Clínico Universitario de Valladolid, Valladolid, Spain; Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Spain
| | - Itziar Gómez-Salvador
- Servicio de Cardiología, Hospital Clínico Universitario de Valladolid, Valladolid, Spain; Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Spain
| | - Julio Peral-Oliveira
- Servicio de Cardiología, Hospital Clínico Universitario de Valladolid, Valladolid, Spain
| | - José Alberto San Román
- Servicio de Cardiología, Hospital Clínico Universitario de Valladolid, Valladolid, Spain; Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Spain
| |
Collapse
|
70
|
Abstract
OBJECTIVES This study aimed to assess the capabilities of ChatGPT (Chat Generative Pre-Trained Transformer) in generating informative content related to bipolar disorders. The objectives were to evaluate its ability to provide accurate information on symptoms, classification, causes, and management of bipolar disorder and to explore its creativity in generating topic-related songs. METHODS ChatGPT3 was used for the study, and a series of clinically relevant questions were asked to test its knowledge and creativity. Questions ranged from common symptom descriptions to more artistic requests for songs related to bipolar disorder. RESULTS ChatGPT demonstrated the capacity to provide basic and informative material on bipolar disorders, including descriptions of symptoms, classification types, causes, and treatment options. It also showed creativity in generating songs that capture the nuances of bipolar symptoms, both during high and low states. CONCLUSIONS While ChatGPT3 can offer superficial information on psychiatric topics like bipolar disorder, its inability to provide accurate and up-to-date references limits its utility for creating a comprehensive review article for scientific journals. However, it may be helpful in generating educational material and assisting in component tasks for those with bipolar disorder or other psychiatric conditions. As newer versions of AI models are continually developed, their capabilities in producing more accurate and advanced content will need further evaluation.
Collapse
Affiliation(s)
- Gordon Parker
- Discipline of Psychiatry and Mental Health, School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
| | - Michael J Spoelma
- Discipline of Psychiatry and Mental Health, School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Black Dog Institute, Sydney, New South Wales, Australia
| |
Collapse
|
71
|
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. [PMID: 38172581 PMCID: PMC11076576 DOI: 10.1038/s41433-023-02915-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 11/23/2023] [Accepted: 12/20/2023] [Indexed: 01/05/2024] Open
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges in incorporating LLMs into research without allowing "AI-plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been generated in the past few years. We discuss recent literature evaluating the role of these language models in medicine with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are being generated rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Collapse
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | | | - Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
| | - Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
| |
Collapse
|
72
|
Cappellani F, Card KR, Shields CL, Pulido JS, Haller JA. Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients. Eye (Lond) 2024; 38:1368-1373. [PMID: 38245622 PMCID: PMC11076805 DOI: 10.1038/s41433-023-02906-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Revised: 12/05/2023] [Accepted: 12/13/2023] [Indexed: 01/22/2024] Open
Abstract
PURPOSE To assess the accuracy of ophthalmic information provided by an artificial intelligence chatbot (ChatGPT). METHODS Five diseases from each of 8 subspecialties of ophthalmology were assessed by ChatGPT version 3.5. Three questions were posed to ChatGPT for each disease: what is x?; how is x diagnosed?; how is x treated? (x = name of the disease). Responses were graded by comparing them to the American Academy of Ophthalmology (AAO) guidelines for patients, with scores ranging from -3 (unvalidated and potentially harmful to a patient's health or well-being if they pursue such a suggestion) to 2 (correct and complete). MAIN OUTCOMES Accuracy of ChatGPT's responses to prompts on ophthalmic health information, expressed as scores on a scale from -3 to 2. RESULTS Of the 120 questions, 93 (77.5%) scored ≥ 1 and 27 (22.5%) scored ≤ -1; among the latter, 9 (7.5%) obtained a score of -3. The overall median score across all subspecialties was 2 for the question "What is x", 1.5 for "How is x diagnosed", and 1 for "How is x treated", though this difference did not reach significance by Kruskal-Wallis testing. CONCLUSIONS Despite the positive scores, ChatGPT on its own still provides incomplete, incorrect, and potentially harmful information about common ophthalmic conditions, defined as the recommendation of invasive procedures or other interventions with potential for adverse sequelae that are not supported by the AAO for the disease in question. ChatGPT may be a valuable adjunct to patient education, but it is currently not sufficient without concomitant human medical supervision.
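As an illustration of the kind of comparison described above (median scores across the three question types, tested with Kruskal-Wallis), the following Python sketch shows how such a test can be run with scipy. The score vectors are invented placeholders on the study's -3 to 2 scale, not the study's actual grades.

```python
from scipy.stats import kruskal

# Placeholder score vectors (scale -3 to 2), one per question type;
# these are illustrative values, not the study's data.
what_is = [2, 2, 1, 2, -1, 2, 1, 2]
how_diagnosed = [2, 1, 1, 2, 0, -1, 2, 1]
how_treated = [1, 1, 0, 2, -2, 1, -1, 1]

h_stat, p_value = kruskal(what_is, how_diagnosed, how_treated)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
```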
Collapse
Affiliation(s)
- Francesco Cappellani
- Retina Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
| | - Kevin R Card
- Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
| | - Carol L Shields
- Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
| | - Jose S Pulido
- Retina Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
| | - Julia A Haller
- Retina Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA.
| |
Collapse
|
73
|
Chalhoub R, Mouawad A, Aoun M, Daher M, El-Sett P, Kreichati G, Kharrat K, Sebaaly A. Will ChatGPT be Able to Replace a Spine Surgeon in the Clinical Setting? World Neurosurg 2024; 185:e648-e652. [PMID: 38417624 DOI: 10.1016/j.wneu.2024.02.101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 02/17/2024] [Accepted: 02/19/2024] [Indexed: 03/01/2024]
Abstract
OBJECTIVE This study evaluates ChatGPT's performance in diagnosing and managing spinal pathologies. METHODS Patients were evaluated both by two spine surgeons, who discussed each case and reached a consensus, and by ChatGPT. Patient data, including demographics, symptoms, and available imaging reports, were collected using a standardized form. This information was then processed by ChatGPT for diagnosis and management recommendations. The study assessed ChatGPT's diagnostic and management accuracy through descriptive statistics, comparing its performance to that of experienced spine specialists. RESULTS A total of 97 patients with various spinal pathologies participated in the study (40 males and 57 females). ChatGPT achieved a 70% diagnostic accuracy rate and provided suitable management recommendations for 95% of patients. However, it struggled with certain pathologies, misdiagnosing 100% of vertebral trauma and facet joint syndrome cases, 40% of spondylolisthesis, stenosis, and scoliosis cases, and 22% of disc-related pathologies. Furthermore, ChatGPT's management recommendations were poor in 53% of cases, often failing to suggest the most appropriate treatment options and occasionally providing incomplete advice. CONCLUSIONS While helpful in the medical field, ChatGPT falls short in providing reliable management recommendations, with a 30% misdiagnosis rate and a 53% mismanagement rate in our study. Its limitations, including reliance on outdated data and the inability to gather patient information interactively, must be acknowledged. Surgeons should use ChatGPT cautiously as a supplementary tool rather than a substitute for their clinical expertise, as the complexities of healthcare demand human judgment and interaction.
Collapse
Affiliation(s)
- Ralph Chalhoub
- Saint Joseph University, Faculty of medicine, Beirut, Lebanon
| | - Antoine Mouawad
- Saint Joseph University, Faculty of medicine, Beirut, Lebanon
| | - Marven Aoun
- Saint Joseph University, Faculty of medicine, Beirut, Lebanon
| | - Mohammad Daher
- Saint Joseph University, Faculty of medicine, Beirut, Lebanon; Department of Orthopedic Surgery, Brown University, Providence, Rhode Island, USA
| | - Pierre El-Sett
- Saint Joseph University, Faculty of medicine, Beirut, Lebanon; Department of Orthopedic Surgery, Hotel Dieu de France Hospital, Beirut, Lebanon
| | - Gaby Kreichati
- Saint Joseph University, Faculty of medicine, Beirut, Lebanon; Department of Orthopedic Surgery, Hotel Dieu de France Hospital, Beirut, Lebanon
| | - Khalil Kharrat
- Saint Joseph University, Faculty of medicine, Beirut, Lebanon; Department of Orthopedic Surgery, Hotel Dieu de France Hospital, Beirut, Lebanon
| | - Amer Sebaaly
- Saint Joseph University, Faculty of medicine, Beirut, Lebanon; Department of Orthopedic Surgery, Hotel Dieu de France Hospital, Beirut, Lebanon.
| |
Collapse
|
74
|
Carobene A, Padoan A, Cabitza F, Banfi G, Plebani M. Rising adoption of artificial intelligence in scientific publishing: evaluating the role, risks, and ethical implications in paper drafting and review process. Clin Chem Lab Med 2024; 62:835-843. [PMID: 38019961 DOI: 10.1515/cclm-2023-1136] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND In the rapidly evolving landscape of artificial intelligence (AI), scientific publishing is experiencing significant transformations. AI tools, while offering unparalleled efficiencies in paper drafting and peer review, also introduce notable ethical concerns. CONTENT This study delineates AI's dual role in scientific publishing: as a co-creator in the writing and review of scientific papers and as an ethical challenge. We first explore the potential of AI as an enhancer of efficiency, efficacy, and quality in creating scientific papers. A critical assessment follows, evaluating the risks vs. rewards for researchers, especially those early in their careers, emphasizing the need to maintain a balance between AI's capabilities and fostering independent reasoning and creativity. Subsequently, we delve into the ethical dilemmas of AI's involvement, particularly concerning originality, plagiarism, and preserving the genuine essence of scientific discourse. The evolving dynamics further highlight an overlooked aspect: the inadequate recognition of human reviewers in the academic community. With the increasing volume of scientific literature, tangible metrics and incentives for reviewers are proposed as essential to ensure a balanced academic environment. SUMMARY AI's incorporation in scientific publishing is promising yet comes with significant ethical and operational challenges. The role of human reviewers is accentuated, ensuring authenticity in an AI-influenced environment. OUTLOOK As the scientific community treads the path of AI integration, a balanced symbiosis between AI's efficiency and human discernment is pivotal. Emphasizing human expertise, while exploiting artificial intelligence responsibly, will determine the trajectory of an ethically sound and efficient AI-augmented future in scientific publishing.
Collapse
Affiliation(s)
- Anna Carobene
- Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Andrea Padoan
- Department of Medicine-DIMED, University of Padova, Padova, Italy
- Laboratory Medicine Unit, University Hospital of Padova, Padova, Italy
| | - Federico Cabitza
- DISCo, Università Degli Studi di Milano-Bicocca, Milan, Italy
- IRCCS Ospedale Galeazzi - Sant'Ambrogio, Milan, Italy
| | - Giuseppe Banfi
- IRCCS Ospedale Galeazzi - Sant'Ambrogio, Milan, Italy
- University Vita-Salute San Raffaele, Milan, Italy
| | - Mario Plebani
- Laboratory Medicine Unit, University Hospital of Padova, Padova, Italy
- University of Padova, Padova, Italy
| |
Collapse
|
75
|
Raman R, Lathabai HH, Mandal S, Das P, Kaur T, Nedungadi P. ChatGPT: Literate or intelligent about UN sustainable development goals? PLoS One 2024; 19:e0297521. [PMID: 38656952 PMCID: PMC11042716 DOI: 10.1371/journal.pone.0297521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Accepted: 01/05/2024] [Indexed: 04/26/2024] Open
Abstract
Generative AI tools, such as ChatGPT, are progressively transforming numerous sectors, demonstrating a capacity to impact human life dramatically. This research seeks to evaluate the UN Sustainable Development Goals (SDGs) literacy of ChatGPT, which is crucial for diverse stakeholders involved in SDG-related policies. Experimental outcomes from two widely used sustainability assessment tests, the UN SDG Fitness Test and the Sustainability Literacy Test (SULITEST), suggest that ChatGPT exhibits high SDG literacy, yet its comprehensive SDG intelligence needs further exploration. The Fitness Test gauges eight vital competencies across introductory, intermediate, and advanced levels. Accurate mapping of these to the test questions is essential for a partial evaluation of SDG intelligence. To assess SDG intelligence, the questions from both tests were mapped to the 17 SDGs and eight cross-cutting SDG core competencies, but both questionnaires were found to be insufficient: SULITEST could satisfactorily map only 5 of the 8 competencies, whereas the Fitness Test managed to map 6 of the 8. Regarding coverage of the 17 SDGs, both tests fell short; most SDGs were underrepresented in both instruments, and certain SDGs were not represented at all. Consequently, both tools proved ineffective in assessing SDG intelligence through SDG coverage. The study recommends that future versions of ChatGPT enhance competencies such as collaboration, critical thinking, systems thinking, and others to support achievement of the SDGs. It concludes that while AI models like ChatGPT hold considerable potential in sustainable development, their use must be approached carefully, considering current limitations and ethical implications.
Collapse
Affiliation(s)
- Raghu Raman
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India
| | | | - Santanu Mandal
- Amrita School of Business, Amaravati, Andhra Pradesh, India
| | - Payel Das
- Amrita School of Business, Amaravati, Andhra Pradesh, India
| | - Tavleen Kaur
- Fortune Institute of International Business, New Delhi, India
| | | |
Collapse
|
76
|
Zangrossi P, Martini M, Guerrini F, DE Bonis P, Spena G. Large language model, AI and scientific research: why ChatGPT is only the beginning. J Neurosurg Sci 2024; 68:216-224. [PMID: 38261307 DOI: 10.23736/s0390-5616.23.06171-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]
Abstract
ChatGPT, a conversational artificial intelligence model based on the generative pre-trained transformer (GPT) architecture, has garnered widespread attention due to its user-friendly nature and diverse capabilities. This technology enables users of all backgrounds to effortlessly engage in human-like conversations and receive coherent and intelligible responses. Beyond casual interactions, ChatGPT offers compelling prospects for scientific research, facilitating tasks like literature review and content summarization, ultimately expediting and enhancing the academic writing process. In medicine and surgery, it has already shown broad potential across many tasks: enhancing decision-making processes, aiding in surgical planning and simulation, providing real-time assistance during surgery, improving postoperative care and rehabilitation, and contributing to training, education, research, and development. However, it is crucial to acknowledge the model's limitations, encompassing knowledge constraints and the potential for erroneous responses, as well as ethical and legal considerations. This paper explores the potential benefits and pitfalls of these innovative technologies in scientific research, shedding light on their transformative impact while addressing concerns surrounding their use.
Collapse
Affiliation(s)
- Pietro Zangrossi
- Department of Neurosurgery, Sant'Anna University Hospital, Ferrara, Italy
- Department of Translational Medicine, University of Ferrara, Ferrara, Italy
| | - Massimo Martini
- R&D Department, Gate-away.com, Grottammare, Ascoli Piceno, Italy
| | - Francesco Guerrini
- Department of Neurosurgery, San Matteo Polyclinic IRCCS Foundation, Pavia, Italy
| | - Pasquale DE Bonis
- Department of Neurosurgery, Sant'Anna University Hospital, Ferrara, Italy
- Department of Translational Medicine, University of Ferrara, Ferrara, Italy
- Unit of Minimally Invasive Neurosurgery, Ferrara University Hospital, Ferrara, Italy
| | - Giannantonio Spena
- Department of Neurosurgery, San Matteo Polyclinic IRCCS Foundation, Pavia, Italy
| |
Collapse
|
77
|
Feng Y, Han J, Lan X. After one year of ChatGPT's launch: reflections on artificial intelligence in scientific writing. Eur J Nucl Med Mol Imaging 2024; 51:1203-1204. [PMID: 38236428 DOI: 10.1007/s00259-023-06579-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2024]
Affiliation(s)
- Yuan Feng
- Department of Nuclear Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Key Laboratory of Molecular Imaging, Wuhan, China
- Key Laboratory of Biological Targeted Therapy, The Ministry of Education, Wuhan, 430022, China
| | | | - Xiaoli Lan
- Department of Nuclear Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
- Hubei Key Laboratory of Molecular Imaging, Wuhan, China.
- Key Laboratory of Biological Targeted Therapy, The Ministry of Education, Wuhan, 430022, China.
| |
Collapse
|
78
|
Bajaj S, Gandhi D, Nayar D. Potential Applications and Impact of ChatGPT in Radiology. Acad Radiol 2024; 31:1256-1261. [PMID: 37802673 DOI: 10.1016/j.acra.2023.08.039] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Revised: 08/15/2023] [Accepted: 08/28/2023] [Indexed: 10/08/2023]
Abstract
Radiology has always gone hand in hand with technology, and artificial intelligence (AI) is not new to the field. While various AI devices and algorithms have already been integrated into the daily clinical practice of radiology, with applications ranging from scheduling patient appointments to detecting and diagnosing certain clinical conditions on imaging, the use of natural language processing and large language model-based software has been under discussion for a long time. Algorithms like ChatGPT can help improve patient outcomes, increase the efficiency of radiology interpretation, and aid the overall workflow of radiologists; here we discuss some of its potential applications.
Collapse
Affiliation(s)
- Suryansh Bajaj
- Department of Radiology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205 (S.B.)
| | - Darshan Gandhi
- Department of Diagnostic Radiology, University of Tennessee Health Science Center, Memphis, Tennessee 38103 (D.G.).
| | - Divya Nayar
- Department of Neurology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205 (D.N.)
| |
Collapse
|
79
|
Cheng J. Applications of Large Language Models in Pathology. Bioengineering (Basel) 2024; 11:342. [PMID: 38671764 PMCID: PMC11047860 DOI: 10.3390/bioengineering11040342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024] Open
Abstract
Large language models (LLMs) are transformer-based neural networks that can provide human-like responses to questions and instructions. LLMs can generate educational material, summarize text, extract structured data from free text, create reports, write programs, and potentially assist in case sign-out. LLMs combined with vision models can assist in interpreting histopathology images. LLMs have immense potential in transforming pathology practice and education, but these models are not infallible, so any artificial intelligence generated content must be verified with reputable sources. Caution must be exercised on how these models are integrated into clinical practice, as these models can produce hallucinations and incorrect results, and an over-reliance on artificial intelligence may lead to de-skilling and automation bias. This review paper provides a brief history of LLMs and highlights several use cases for LLMs in the field of pathology.
Collapse
Affiliation(s)
- Jerome Cheng
- Department of Pathology, University of Michigan, Ann Arbor, MI 48105, USA
| |
Collapse
|
80
|
Zampatti S, Peconi C, Megalizzi D, Calvino G, Trastulli G, Cascella R, Strafella C, Caltagirone C, Giardina E. Innovations in Medicine: Exploring ChatGPT's Impact on Rare Disorder Management. Genes (Basel) 2024; 15:421. [PMID: 38674356 PMCID: PMC11050022 DOI: 10.3390/genes15040421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024] Open
Abstract
Artificial intelligence (AI) is rapidly transforming the field of medicine, heralding a new era of innovation and efficiency. Among AI programs designed for general use, ChatGPT holds a prominent position, using an innovative language model developed by OpenAI. Thanks to its use of deep learning techniques, ChatGPT stands out as an exceptionally capable tool, renowned for generating human-like responses to queries. Various medical specialties, including rheumatology, oncology, psychiatry, internal medicine, and ophthalmology, have been explored for ChatGPT integration, with pilot studies and trials revealing each field's potential benefits and challenges. However, the field of genetics and genetic counseling, as well as that of rare disorders, remains an area ripe for exploration, given its complex datasets and the need for personalized patient care. In this review, we synthesize the wide range of potential applications for ChatGPT in the medical field, highlighting its benefits and limitations. We pay special attention to rare and genetic disorders, aiming to shed light on the future roles of AI-driven chatbots in healthcare. Our goal is to pave the way for a healthcare system that is more knowledgeable, efficient, and centered on patient needs.
Collapse
Affiliation(s)
- Stefania Zampatti
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
| | - Cristina Peconi
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
| | - Domenica Megalizzi
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of Science, Roma Tre University, 00146 Rome, Italy
| | - Giulia Calvino
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of Science, Roma Tre University, 00146 Rome, Italy
| | - Giulia Trastulli
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of System Medicine, Tor Vergata University, 00133 Rome, Italy
| | - Raffaella Cascella
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of Chemical-Toxicological and Pharmacological Evaluation of Drugs, Catholic University Our Lady of Good Counsel, 1000 Tirana, Albania
| | - Claudia Strafella
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
| | - Carlo Caltagirone
- Department of Clinical and Behavioral Neurology, IRCCS Fondazione Santa Lucia, 00179 Rome, Italy;
| | - Emiliano Giardina
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
| |
Collapse
|
81
|
Bagenal J. Generative artificial intelligence and scientific publishing: urgent questions, difficult answers. Lancet 2024; 403:1118-1120. [PMID: 38460530 DOI: 10.1016/s0140-6736(24)00416-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/11/2024]
|
82
|
Popkov AA, Barrett TS. AI vs academia: Experimental study on AI text detectors' accuracy in behavioral health academic writing. Account Res 2024:1-17. [PMID: 38516933 DOI: 10.1080/08989621.2024.2331757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2023] [Accepted: 03/13/2024] [Indexed: 03/23/2024]
Abstract
Artificial Intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, the number of academic journals in medicine and healthcare which have explored policies regarding AI-generated text has increased. The implementation of such policies requires accurate AI detection tools. Inaccurate detectors risk unnecessary penalties for human authors and/or may compromise the effective enforcement of guidelines against AI-generated content. Yet, the accuracy of AI text detection tools in identifying human-written versus AI-generated content has been found to vary across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates from both free and paid AI detection tools. The study assessed 100 research articles from 2016-2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by "ChatGPT" and 100 by "Claude"). The free AI detector showed a median of 27.2% for the proportion of academic text identified as AI-generated, while commercial software Originality.AI demonstrated better performance but still had limitations, especially in detecting texts generated by Claude. These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.
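The error rates discussed above reduce to standard confusion-matrix quantities. The Python sketch below shows one way to compute false positive and false negative rates from per-document detector verdicts; the labels and predictions are invented for illustration and are not the study's data.

```python
# Illustrative labels: 1 = AI-generated, 0 = human-written.
# The values below are invented; they are not the study's data.
true_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
detector_says_ai = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

false_positives = sum(1 for t, p in zip(true_labels, detector_says_ai) if t == 0 and p == 1)
false_negatives = sum(1 for t, p in zip(true_labels, detector_says_ai) if t == 1 and p == 0)
n_human = true_labels.count(0)
n_ai = true_labels.count(1)

print(f"false positive rate = {false_positives / n_human:.2f}")   # human text flagged as AI
print(f"false negative rate = {false_negatives / n_ai:.2f}")      # AI text passed as human
```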
Collapse
Affiliation(s)
- Andrey A Popkov
- Highmark Health, Pittsburgh, PA, USA
- Contigo Health, LLC, a subsidiary of Premier, Inc, Charlotte, NC, USA
| | | |
Collapse
|
83
|
Li J, Dada A, Puladi B, Kleesiek J, Egger J. ChatGPT in healthcare: A taxonomy and systematic review. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 245:108013. [PMID: 38262126 DOI: 10.1016/j.cmpb.2024.108013] [Citation(s) in RCA: 64] [Impact Index Per Article: 64.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Revised: 12/29/2023] [Accepted: 01/08/2024] [Indexed: 01/25/2024]
Abstract
The recent release of ChatGPT, a chatbot research project/product in natural language processing (NLP) by OpenAI, stirred up a sensation among both the general public and medical professionals, amassing a phenomenally large user base in a short time. This is a typical example of the 'productization' of cutting-edge technologies, which allows the general public without a technical background to gain firsthand experience of artificial intelligence (AI), similar to the AI hype created by AlphaGo (DeepMind Technologies, UK) and self-driving cars (Google, Tesla, etc.). However, it is crucial, especially for healthcare researchers, to remain prudent amidst the hype. This work provides a systematic review of existing publications on the use of ChatGPT in healthcare, elucidating the 'status quo' of ChatGPT in medical applications for general readers, healthcare professionals, and NLP scientists. The large biomedical literature database PubMed is used to retrieve published works on this topic using the keyword 'ChatGPT'. An inclusion criterion and a taxonomy are further proposed to filter the search results and categorize the selected publications, respectively. The review finds that the current release of ChatGPT has achieved only moderate or 'passing' performance in a variety of tests and is unreliable for actual clinical deployment, since it is not intended for clinical applications by design. We conclude that specialized NLP models trained on (bio)medical datasets still represent the right direction to pursue for critical clinical applications.
Collapse
Affiliation(s)
- Jianning Li
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany
| | - Amin Dada
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany
| | - Behrus Puladi
- Institute of Medical Informatics, University Hospital RWTH Aachen, Pauwelsstraße 30, 52074 Aachen, Germany; Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Pauwelsstraße 30, 52074 Aachen, Germany
| | - Jens Kleesiek
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany; TU Dortmund University, Department of Physics, Otto-Hahn-Straße 4, 44227 Dortmund, Germany
| | - Jan Egger
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany; Center for Virtual and Extended Reality in Medicine (ZvRM), University Hospital Essen, University Medicine Essen, Hufelandstraße 55, 45147 Essen, Germany.
| |
Collapse
|
84
|
Posner KM, Bakus C, Basralian G, Chester G, Zeiman M, O'Malley GR, Klein GR. Evaluating ChatGPT's Capabilities on Orthopedic Training Examinations: An Analysis of New Image Processing Features. Cureus 2024; 16:e55945. [PMID: 38601421 PMCID: PMC11005479 DOI: 10.7759/cureus.55945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/11/2024] [Indexed: 04/12/2024] Open
Abstract
Introduction The efficacy of integrating artificial intelligence (AI) models like ChatGPT into the medical field, specifically orthopedic surgery, has yet to be fully determined. The most recent adaptation of ChatGPT that has yet to be explored is its image analysis capabilities. This study assesses ChatGPT's performance in answering Orthopedic In-Training Examination (OITE) questions, including those that require image analysis. Methods Questions from the 2014, 2015, 2021, and 2022 AAOS OITE were screened for inclusion. All questions without images were entered into ChatGPT 3.5 and 4.0 twice. Questions that necessitated the use of images were only entered into ChatGPT 4.0 twice, as this is the only version of the system that can analyze images. The responses were recorded and compared to AAOS's correct answers, evaluating the AI's accuracy and precision. Results A total of 940 questions were included in the final analysis (457 questions with images and 483 questions without images). ChatGPT 4.0 performed significantly better on questions that did not require image analysis (67.81% vs 47.59%, p<0.001). Discussion While the use of AI in orthopedics is an intriguing possibility, this evaluation demonstrates how, even with the addition of image processing capabilities, ChatGPT still falls short in terms of its accuracy. As AI technology evolves, ongoing research is vital to harness AI's potential effectively, ensuring it complements rather than attempts to replace the nuanced skills of orthopedic surgeons.
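The reported accuracy gap (67.81% vs 47.59%, p<0.001) is a comparison of two proportions. The Python sketch below illustrates one common way to test such a difference with a chi-square test; the counts are only approximated from the reported percentages and question totals, and the abstract does not state which test the authors actually used.

```python
from scipy.stats import chi2_contingency

# Approximate counts reconstructed from the abstract (assumption):
# 483 text-only questions at ~68% correct; 457 image questions at ~48% correct.
text_only = [328, 483 - 328]     # correct, incorrect
with_images = [218, 457 - 218]   # correct, incorrect

chi2, p_value, dof, expected = chi2_contingency([text_only, with_images])
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.2e}")
```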
Collapse
Affiliation(s)
- Kevin M Posner
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
| | - Cassandra Bakus
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
| | - Grace Basralian
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
| | - Grace Chester
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
| | - Mallery Zeiman
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
| | - Geoffrey R O'Malley
- Department of Orthopedic Surgery, Hackensack University Medical Center, Hackensack, USA
| | - Gregg R Klein
- Department of Orthopedic Surgery, Hackensack University Medical Center, Hackensack, USA
| |
Collapse
|
85
|
Mihalache A, Huang RS, Popovic MM, Muni RH. ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination. MEDICAL TEACHER 2024; 46:366-372. [PMID: 37839017 DOI: 10.1080/0142159x.2023.2249588] [Citation(s) in RCA: 42] [Impact Index Per Article: 42.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/17/2023]
Abstract
PURPOSE ChatGPT-4 is an upgraded version of an artificial intelligence chatbot. The performance of ChatGPT-4 on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess the performance of ChatGPT-4 at responding to USMLE Step 1, Step 2CK, and Step 3 practice questions. METHOD Practice multiple-choice questions for the USMLE Step 1, Step 2CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21st, 2023. Our primary outcome was the performance of ChatGPT-4 for the practice USMLE Step 1, Step 2CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean length of questions and responses provided by ChatGPT-4. RESULTS ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent 30.8 ± 11.8 s on average responding to practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions that were answered correctly and incorrectly by ChatGPT-4 was similar (difference = 17.48 characters, SE = 59.75, 95%CI = [-100.09,135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of incorrect responses (difference = 79.58 characters, SE = 35.42, 95%CI = [9.89,149.28], t = 2.25, p = 0.03). CONCLUSIONS ChatGPT-4 answered a remarkably high proportion of practice questions correctly for USMLE examinations. ChatGPT-4 performed substantially better at USMLE practice questions than previous models of the same AI chatbot.
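The length comparison reported above (correct vs incorrect response lengths, t = 2.25, p = 0.03) is an independent-samples t-test. A minimal Python sketch of that kind of comparison is shown below; the character counts are placeholders rather than the study's data.

```python
from scipy.stats import ttest_ind

# Placeholder character counts for ChatGPT-4 responses (not the study's data).
correct_response_lengths = [812, 903, 788, 950, 865, 920, 840, 799]
incorrect_response_lengths = [1010, 955, 988, 1042, 897, 1005]

t_stat, p_value = ttest_ind(correct_response_lengths, incorrect_response_lengths)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```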
Collapse
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada
| |
Collapse
|
86
|
Xu X, Su Y, Zhang Y, Wu Y, Xu X. Understanding learners' perceptions of ChatGPT: A thematic analysis of peer interviews among undergraduates and postgraduates in China. Heliyon 2024; 10:e26239. [PMID: 38420484 PMCID: PMC10900412 DOI: 10.1016/j.heliyon.2024.e26239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Revised: 02/02/2024] [Accepted: 02/08/2024] [Indexed: 03/02/2024] Open
Abstract
ChatGPT, an artificial intelligence (AI)-driven language model engineered by OpenAI, has experienced a substantial upsurge in adoption within higher education due to its versatile applications and sophisticated capabilities. Although prevailing research on ChatGPT has predominantly concentrated on its technological aspects and pedagogical ramifications, a comprehensive understanding of students' perceptions and experiences regarding ChatGPT remains elusive. To address this gap, this study employed a peer interview methodology, conducting a thematic analysis of the perceptions of 106 first-year undergraduate and 81 first-year postgraduate students from diverse disciplines at a comprehensive university in East China. The data analysis revealed that among the four factors examined (grade, age, gender, and major), grade emerged as the most influential determinant, followed by age and major. Postgraduate students demonstrated heightened awareness of the potential limitations of ChatGPT in addressing academic challenges and exhibited greater concern for security issues associated with its application. This research offers essential insights into students' perceptions and experiences with ChatGPT, emphasizing the importance of recognizing potential limitations and ethical concerns associated with ChatGPT usage. In particular, students noted the importance of responsible data handling and academic integrity in ChatGPT usage, underscoring the need for ethical guidance in AI utilization. Moreover, further research is essential to optimize AI use in education, aiming to improve learning outcomes effectively.
Collapse
Affiliation(s)
- Xiaoshu Xu
- School of Foreign Studies, Wenzhou University, Wenzhou City, Zhejiang Province, China
- Stamford International University Thailand
- Macao Polytechnic University
| | - Yujie Su
- School of Foreign Studies, Wenzhou University, Wenzhou City, Zhejiang Province, China
| | - Yunfeng Zhang
- The Faculty of Languages and Translation, R. de Luís Gonzaga Gomes, Macao Polytechnic University
| | - Yunyang Wu
- School of Foreign Studies, Wenzhou University, Wenzhou City, Zhejiang Province, China
| | - Xinyu Xu
- School of Foreign Studies, Wenzhou University, Wenzhou City, Zhejiang Province, China
| |
Collapse
|
87
|
Abdelhafiz AS, Ali A, Maaly AM, Ziady HH, Sultan EA, Mahgoub MA. Knowledge, Perceptions and Attitude of Researchers Towards Using ChatGPT in Research. J Med Syst 2024; 48:26. [PMID: 38411833 PMCID: PMC10899415 DOI: 10.1007/s10916-024-02044-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Accepted: 02/10/2024] [Indexed: 02/28/2024]
Abstract
INTRODUCTION ChatGPT, a recently released chatbot from OpenAI, has found applications in various aspects of life, including academic research. This study investigated the knowledge, perceptions, and attitudes of researchers towards using ChatGPT and other chatbots in academic research. METHODS A pre-designed, self-administered survey using Google Forms was employed to conduct the study. The questionnaire assessed participants' knowledge of ChatGPT and other chatbots, their awareness of current chatbot and artificial intelligence (AI) applications, and their attitudes towards ChatGPT and its potential research uses. RESULTS Two hundred researchers participated in the survey. A majority were female (57.5%), and over two-thirds belonged to the medical field (68%). While 67% had heard of ChatGPT, only 11.5% had employed it in their research, primarily for rephrasing paragraphs and finding references. Interestingly, over one-third supported the notion of listing ChatGPT as an author in scientific publications. Concerns emerged regarding AI's potential to automate researcher tasks, particularly in language editing, statistics, and data analysis. Additionally, roughly half expressed ethical concerns about using AI applications in scientific research. CONCLUSION The increasing use of chatbots in academic research necessitates thoughtful regulation that balances potential benefits with inherent limitations and potential risks. Chatbots should not be considered authors of scientific publications but rather assistants to researchers during manuscript preparation and review. Researchers should be equipped with proper training to utilize chatbots and other AI tools effectively and ethically.
Collapse
Affiliation(s)
- Ahmed Samir Abdelhafiz
- Department of Clinical pathology, National Cancer Institute, Cairo University, Kasr Al-Aini Street, from Elkhalig Square, Cairo, 11796, Egypt.
| | - Asmaa Ali
- Department of Pulmonary Medicine, Abbassia Chest Hospital, Ministry of Health and Population, Cairo, Egypt
| | - Ayman Mohamed Maaly
- Department of Anaesthesia and Surgical Intensive Care, Faculty of Medicine, Alexandria University, Alexandria, Egypt
| | - Hany Hassan Ziady
- Department of Community Medicine, Faculty of Medicine, Alexandria University, Alexandria, Egypt
| | - Eman Anwar Sultan
- Department of Community Medicine, Faculty of Medicine, Alexandria University, Alexandria, Egypt
| | - Mohamed Anwar Mahgoub
- Department of Microbiology, High Institute of Public Health, Alexandria University, Alexandria, Egypt
| |
Collapse
|
88
|
van Woudenberg R, Ranalli C, Bracker D. Authorship and ChatGPT: a Conservative View. PHILOSOPHY & TECHNOLOGY 2024; 37:34. [PMID: 38419827 PMCID: PMC10896910 DOI: 10.1007/s13347-024-00715-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Accepted: 02/05/2024] [Indexed: 03/02/2024]
Abstract
Is ChatGPT an author? Given its capacity to generate something that reads like human-written text in response to prompts, it might seem natural to ascribe authorship to ChatGPT. However, we argue that ChatGPT is not an author. ChatGPT fails to meet the criteria of authorship because it lacks the ability to perform illocutionary speech acts such as promising or asserting, lacks the fitting mental states like knowledge, belief, or intention, and cannot take responsibility for the texts it produces. Three perspectives are compared: liberalism (which ascribes authorship to ChatGPT), conservatism (which denies ChatGPT's authorship for normative and metaphysical reasons), and moderatism (which treats ChatGPT as if it possesses authorship without committing to the existence of mental states like knowledge, belief, or intention). We conclude that conservatism provides a more nuanced understanding of authorship in AI than liberalism and moderatism, without denying the significant potential, influence, or utility of AI technologies such as ChatGPT.
Collapse
Affiliation(s)
| | - Chris Ranalli
- Department of Philosophy, Vrije Universiteit, Amsterdam, Netherlands
| | - Daniel Bracker
- Department of Philosophy, Vrije Universiteit, Amsterdam, Netherlands
| |
Collapse
|
89
|
Clark SC. Can ChatGPT transform cardiac surgery and heart transplantation? J Cardiothorac Surg 2024; 19:108. [PMID: 38409178 PMCID: PMC10898059 DOI: 10.1186/s13019-024-02541-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Accepted: 01/28/2024] [Indexed: 02/28/2024] Open
Abstract
Artificial intelligence (AI) is a transformative technology with many benefits, but also risks when applied to healthcare, and to cardiac surgery in particular. Surgeons must be aware of AI and its application through generative pre-trained transformers (GPT/ChatGPT) to fully understand what this offers to clinical care, decision making, training, research and education. Clinicians must appreciate that the advantages and potential for transformative change in practice are balanced by risks typified by validation issues, ethical challenges and medicolegal concerns. ChatGPT should be seen as a tool to support and enhance the skills of surgeons, rather than a replacement for their experience and judgment. Human oversight and intervention will always be necessary to ensure patient safety and to make complex decisions that may require a refined understanding of individual patient circumstances.
Collapse
Affiliation(s)
- S C Clark
- Cardiothoracic Surgery and Transplantation Freeman Hospital, Newcastle upon Tyne, NE7 7DN, UK.
| |
Collapse
|
90
|
Shah PS, Acharya G. Artificial intelligence/machine learning and journalology: Challenges and opportunities. Acta Obstet Gynecol Scand 2024; 103:196-198. [PMID: 38284152 PMCID: PMC10823383 DOI: 10.1111/aogs.14772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Accepted: 12/16/2023] [Indexed: 01/30/2024]
Affiliation(s)
- Prakesh S. Shah
- Department of PediatricsMount Sinai Hospital, University of TorontoTorontoOntarioCanada
| | - Ganesh Acharya
- Department of Clinical Science, Intervention and Technology (CLINTEC)Karolinska Institutet and Center for Fetal Medicine, Karolinska University HospitalStockholmSweden
- Women's Health and Perinatology Research GroupUiT – The Arctic University of NorwayTromsøNorway
| |
Collapse
|
91
|
McFayden TC, Bristol S, Putnam O, Harrop C. ChatGPT: Artificial Intelligence as a Potential Tool for Parents Seeking Information About Autism. CYBERPSYCHOLOGY, BEHAVIOR AND SOCIAL NETWORKING 2024; 27:135-148. [PMID: 38181176 PMCID: PMC11071095 DOI: 10.1089/cyber.2023.0202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/07/2024]
Abstract
Autism Spectrum Disorder has seen a drastic increase in prevalence over the past two decades, along with discourse rife with debates and misinformation. This discourse has primarily taken place online, the main source of information for parents seeking information about autism. One potential tool for navigating information is ChatGPT-4, an artificial intelligence question and answer-style communication program. Although ChatGPT shows great promise, no empirical work has evaluated its viability as a tool for providing information about autism to caregivers. The current study evaluated answers provided by ChatGPT, including basic information about autism, myths/misconceptions, and resources. Our results suggested that ChatGPT was largely correct, concise, and clear, but did not provide much actionable advice, which was further limited by inaccurate references and hyperlinks. The authors conclude that ChatGPT-4 is a viable tool for parents seeking accurate information about autism, with opportunities for improvement in actionability and reference accuracy.
Collapse
Affiliation(s)
- Tyler C. McFayden
- Carolina Institute for Developmental Disabilities, University of North Carolina at Chapel Hill, Carrboro, North Carolina, USA
| | - Stephanie Bristol
- Department of Health Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Orla Putnam
- Department of Health Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Clare Harrop
- Department of Health Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- UNC TEACCH Autism Program, University of North Carolina at Chapel Hill, Carrboro, North Carolina, USA
| |
Collapse
|
92
|
Dallari V, Sacchetto A, Saetti R, Calabrese L, Vittadello F, Gazzini L. Is artificial intelligence ready to replace specialist doctors entirely? ENT specialists vs ChatGPT: 1-0, ball at the center. Eur Arch Otorhinolaryngol 2024; 281:995-1023. [PMID: 37962570 DOI: 10.1007/s00405-023-08321-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023]
Abstract
PURPOSE The purpose of this study is to evaluate ChatGPT's responses to Ear, Nose and Throat (ENT) clinical cases and compare them with the responses of ENT specialists. METHODS We devised 10 scenarios based on everyday ENT practice, each centered on a single primary symptom, and constructed 20 clinical cases, 2 for each scenario. We presented them to 3 ENT specialists and to ChatGPT. The difficulty of the clinical cases was assessed by the 5 ENT authors of this article. The responses of ChatGPT were evaluated by the 5 ENT authors of this article for correctness and consistency with the responses of the 3 ENT experts. To verify the stability of ChatGPT's responses, we repeated the queries from the same account on 5 consecutive days. RESULTS Among the 20 cases, 8 were rated as low complexity, 6 as moderate complexity and 6 as high complexity. The overall mean correctness and consistency scores of ChatGPT's responses were 3.80 (SD 1.02) and 2.89 (SD 1.24), respectively. We did not find a statistically significant difference in ChatGPT's mean correctness and consistency scores according to case complexity. The total intraclass correlation coefficients (ICC) for the stability of ChatGPT's correctness and consistency were 0.763 (95% confidence interval [CI] 0.553-0.895) and 0.837 (95% CI 0.689-0.927), respectively. CONCLUSIONS Our results revealed the potential usefulness of ChatGPT in ENT diagnosis. Instability in its responses and the inability to recognise certain clinical elements are its main limitations.
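The response-stability analysis above relies on the intraclass correlation coefficient. The sketch below shows how an ICC with a 95% confidence interval can be computed in Python, assuming the pingouin package is available; the case-by-day scores are invented placeholders, not the study's ratings.

```python
import pandas as pd
import pingouin as pg  # assumes pingouin is installed

# Placeholder data: correctness scores for 4 cases rated on 3 different days
# (invented values; the study used 20 cases queried on 5 consecutive days).
df = pd.DataFrame({
    "case":  ["c1", "c2", "c3", "c4"] * 3,
    "day":   ["d1"] * 4 + ["d2"] * 4 + ["d3"] * 4,
    "score": [4, 3, 5, 2,  4, 3, 4, 2,  5, 3, 5, 1],
})

icc = pg.intraclass_corr(data=df, targets="case", raters="day", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```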
Collapse
Affiliation(s)
- Virginia Dallari
- Young Confederation of European ORL-HNS, Y-CEORL-HNS, Dublin, Ireland
- Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Piazzale L.A. Scuro 10, 37134, Verona, Italy
| | - Andrea Sacchetto
- Young Confederation of European ORL-HNS, Y-CEORL-HNS, Dublin, Ireland.
- Department of Otolaryngology, Ospedale San Bortolo, AULSS 8 Berica, Vicenza, Italy.
| | - Roberto Saetti
- Department of Otolaryngology, Ospedale San Bortolo, AULSS 8 Berica, Vicenza, Italy
| | - Luca Calabrese
- Department of Otorhinolaryngology-Head and Neck Surgery, Hospital of Bolzano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Bolzano-Bozen, Italy
| | | | - Luca Gazzini
- Young Confederation of European ORL-HNS, Y-CEORL-HNS, Dublin, Ireland
- Department of Otorhinolaryngology-Head and Neck Surgery, Hospital of Bolzano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Bolzano-Bozen, Italy
| |
Collapse
|
93
|
Taloni A, Scorcia V, Giannaccare G. Modern threats in academia: evaluating plagiarism and artificial intelligence detection scores of ChatGPT. Eye (Lond) 2024; 38:397-400. [PMID: 37532832 PMCID: PMC10810838 DOI: 10.1038/s41433-023-02678-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Accepted: 07/18/2023] [Indexed: 08/04/2023] Open
Affiliation(s)
- Andrea Taloni
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Vincenzo Scorcia
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Giuseppe Giannaccare
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy.
| |
Collapse
|
94
|
Jairoun AA, El-Dahiyat F, ElRefae GA, Al-Hemyari SS, Shahwan M, Zyoud SH, Hammour KA, Babar ZUD. Detecting manuscripts written by generative AI and AI-assisted technologies in the field of pharmacy practice. J Pharm Policy Pract 2024; 17:2303759. [PMID: 38229951 PMCID: PMC10791078 DOI: 10.1080/20523211.2024.2303759] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2024] Open
Abstract
Generative AI can be a powerful research tool, but researchers must employ it ethically and transparently. This commentary addresses how the editors of pharmacy practice journals can identify manuscripts generated by generative AI and AI-assisted technologies. Editors and reviewers must stay well-informed about developments in AI technologies to effectively recognise AI-written papers. Editors should safeguard the reliability of journal publishing and sustain industry standards for pharmacy practice by implementing the crucial strategies outlined in this editorial. Although obstacles, including limited awareness, time constraints, and rapidly evolving AI writing strategies, might hinder detection efforts, several facilitators can help overcome them. Pharmacy practice journal editors and reviewers would benefit from educational programmes, collaborations with AI experts, and sophisticated plagiarism-detection techniques geared toward accurately identifying AI-generated text. Academics and practitioners can further uphold the integrity of published research through transparent reporting and ethical standards. Pharmacy practice journal staff can sustain academic rigour and guarantee the validity of scholarly work by recognising and addressing the relevant barriers and utilising the proper enablers. Navigating the changing world of AI-generated content and preserving standards of excellence in pharmaceutical research and practice requires a proactive strategy of constant learning and community participation.
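As an illustrative aside (not a technique described in the commentary itself), one building block of the text-similarity screening mentioned above can be sketched as a TF-IDF cosine-similarity check between a submitted passage and a reference corpus; the example passages, the threshold, and the variable names below are hypothetical.

```python
# Toy sketch of similarity-based screening: flag a submitted passage whose
# TF-IDF cosine similarity to any reference passage exceeds a chosen threshold.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference_corpus = [
    "Generative AI can support literature searching in pharmacy practice.",
    "Editors should verify that reported references exist and are cited accurately.",
]
submission = "Generative AI can support literature searching in pharmacy practice research."

vectorizer = TfidfVectorizer().fit(reference_corpus + [submission])
ref_vectors = vectorizer.transform(reference_corpus)
sub_vector = vectorizer.transform([submission])

similarities = cosine_similarity(sub_vector, ref_vectors)[0]
THRESHOLD = 0.8  # arbitrary illustrative cut-off, not an editorial standard
for text, sim in zip(reference_corpus, similarities):
    flag = "FLAG" if sim >= THRESHOLD else "ok"
    print(f"{flag}  similarity={sim:.2f}  vs: {text[:60]}")
```

Real editorial screening combines such lexical checks with metadata review and human judgement; a single similarity score is not, on its own, evidence of AI authorship.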
Collapse
Affiliation(s)
- Ammar Abdulrahman Jairoun
- Health and Safety Department, Dubai Municipality, Dubai, UAE
- Discipline of Clinical Pharmacy, School of Pharmaceutical Sciences, Universiti Sains Malaysia (USM), George Town, Malaysia
| | - Faris El-Dahiyat
- Clinical Pharmacy Program, College of Pharmacy, Al Ain University, Al Ain, UAE
- Artificial Intelligence Research Center, Al Ain University, Al Ain, UAE
| | - Ghaleb A. ElRefae
- Artificial Intelligence Research Center, Al Ain University, Al Ain, UAE
| | - Sabaa Saleh Al-Hemyari
- Discipline of Clinical Pharmacy, School of Pharmaceutical Sciences, Universiti Sains Malaysia (USM), George Town, Malaysia
- Pharmacy Department, Emirates Health Services, Dubai, UAE
| | - Moyad Shahwan
- Centre of Medical and Bio-allied Health Sciences Research, Ajman University, Ajman, UAE
- Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, UAE
| | - Samer H. Zyoud
- Department of Mathematics and Sciences, Ajman University, Ajman, UAE
| | - Khawla Abu Hammour
- Department of Biopharmaceutics and Clinical Pharmacy, Faculty of Pharmacy, The University of Jordan, Amman, Jordan
| | - Zaheer-Ud-Din Babar
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Huddersfield, UK
| |
Collapse
|
95
|
Jerry JK. Exploring polycystic disease solutions with ChatGPT: the role of AI in patient support and empowerment. Qatar Med J 2024; 2023:35. [PMID: 38204561 PMCID: PMC10776889 DOI: 10.5339/qmj.2023.35] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Accepted: 11/09/2023] [Indexed: 01/12/2024] Open
Affiliation(s)
- Jackson Keefer Jerry
- Internal Medicine, PSG Institute of Medical Sciences and Research, India
| |
Collapse
|
96
|
Farhat F, Silva ES, Hassani H, Madsen DØ, Sohail SS, Himeur Y, Alam MA, Zafar A. The scholarly footprint of ChatGPT: a bibliometric analysis of the early outbreak phase. Front Artif Intell 2024; 6:1270749. [PMID: 38249789 PMCID: PMC10797012 DOI: 10.3389/frai.2023.1270749] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2023] [Accepted: 12/08/2023] [Indexed: 01/23/2024] Open
Abstract
This paper presents a comprehensive analysis of the scholarly footprint of ChatGPT, an AI language model, using bibliometric and scientometric methods. The study focuses on the early outbreak phase, from the launch of ChatGPT in November 2022 to early June 2023. It aims to understand the evolution of research output, citation patterns, collaborative networks, application domains, and future research directions related to ChatGPT. Data retrieved from the Scopus database yielded 533 relevant articles for analysis. The findings reveal the prominent publication venues, influential authors, and countries contributing to ChatGPT research. Collaborative networks among researchers and institutions are visualized, highlighting patterns of co-authorship. The application domains of ChatGPT, such as customer support and content generation, are examined. Moreover, the study identifies emerging keywords and potential research areas for future exploration. The methodology includes data extraction, bibliometric analysis using various indicators, and visualization techniques such as Sankey diagrams. The analysis provides valuable insights into ChatGPT's early footprint in academia and offers researchers guidance for further advancements. The study is intended to stimulate discussions, collaborations, and innovations that enhance ChatGPT's capabilities and impact across domains.
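To give a concrete sense of the kind of visualization the bibliometric study describes, the sketch below builds a minimal Sankey diagram with plotly from made-up country-to-application-domain counts; the labels and values are placeholders, not figures from the paper.

```python
# Minimal Sankey sketch: flows from contributing countries to application
# domains, in the spirit of the bibliometric visualizations described above.
# All labels and counts are invented placeholders.
import plotly.graph_objects as go

labels = ["USA", "India", "UK", "Education", "Customer support", "Content generation"]
source = [0, 0, 1, 1, 2, 2]   # indices into `labels` (countries)
target = [3, 4, 3, 5, 4, 5]   # indices into `labels` (domains)
value  = [12, 5, 9, 7, 4, 6]  # hypothetical publication counts

fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=15, thickness=20),
    link=dict(source=source, target=target, value=value),
))
fig.update_layout(title_text="Hypothetical ChatGPT research flows (illustrative)")
fig.show()
```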
Collapse
Affiliation(s)
- Faiza Farhat
- Department of Zoology, Aligarh Muslim University, Aligarh, India
| | - Emmanuel Sirimal Silva
- Department of Economics and Law, Glasgow School for Business and Society, Glasgow Caledonian University, Glasgow, United Kingdom
| | - Hossein Hassani
- The Research Institute of Energy Management and Planning (RIEMP), University of Tehran, Tehran, Iran
| | - Dag Øivind Madsen
- USN School of Business, University of South-Eastern Norway, Hønefoss, Norway
| | - Shahab Saquib Sohail
- Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, India
| | - Yassine Himeur
- College of Engineering and Information Technology, University of Dubai, Dubai, United Arab Emirates
| | - M. Afshar Alam
- Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, India
| | - Aasim Zafar
- Department of Computer Science, Aligarh Muslim University, Aligarh, India
| |
Collapse
|
97
|
Ting DSJ, Tan TF, Ting DSW. ChatGPT in ophthalmology: the dawn of a new era? Eye (Lond) 2024; 38:4-7. [PMID: 37369764 PMCID: PMC10764795 DOI: 10.1038/s41433-023-02619-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2023] [Revised: 05/22/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023] Open
Affiliation(s)
- Darren Shu Jeng Ting
- Birmingham and Midland Eye Centre, Birmingham, UK
- Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
- Academic Ophthalmology, School of Medicine, University of Nottingham, Nottingham, UK
| | - Ting Fang Tan
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Singapore National Eye Centre, Singapore, Singapore
| | - Daniel Shu Wei Ting
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore.
- Singapore National Eye Centre, Singapore, Singapore.
- Department of Ophthalmology and Visual Sciences, Duke-National University of Singapore Medical School, Singapore, Singapore.
| |
Collapse
|
98
|
Di Ieva A, Stewart C, Suero Molina E. Large Language Models in Neurosurgery. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2024; 1462:177-198. [PMID: 39523266 DOI: 10.1007/978-3-031-64892-2_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2024]
Abstract
A large language model (LLM), in the context of natural language processing and artificial intelligence, refers to a sophisticated neural network that has been trained on a massive amount of text data to understand and generate human-like language. These models are typically built on architectures like transformers. The term "large" indicates that the neural network has a significant number of parameters, making it more powerful and capable of capturing complex patterns in language. One notable example of a large language model is ChatGPT. ChatGPT is a large language model developed by OpenAI that uses deep learning techniques to generate human-like text. It can be applied to a variety of tasks, such as language translation, question answering, and text completion. One of the key features of ChatGPT is its ability to understand and respond to natural language inputs. This makes it a powerful tool for generating a wide range of text, including medical reports, surgical notes, and even poetry. Additionally, the model has been trained on a large corpus of text, which allows it to generate text that is both grammatically correct and semantically meaningful. In terms of applications in neurosurgery, ChatGPT can be used to generate detailed and accurate surgical reports, which can be very useful for sharing information about a patient's case with other members of the medical team. Additionally, the model can be used to generate detailed surgical notes, which can be very useful for training and educating residents and medical students. Overall, LLMs have the potential to be a valuable tool in the field of neurosurgery. Indeed, this abstract was generated by ChatGPT within a few seconds. Potential applications and pitfalls of LLMs are discussed in this paper.
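As a hedged illustration of how such text generation is typically wired up in practice (not code from the chapter), the snippet below asks a chat-style LLM to draft an operative-note template via the OpenAI Python client; the model name, prompt wording, and the assumption that an API key is configured in the environment are placeholders, and any generated draft would still require review by the surgical team.

```python
# Illustrative sketch: drafting an operative note template with a chat LLM.
# Assumes the `openai` package (>=1.0) is installed and OPENAI_API_KEY is set;
# the model name and prompts are placeholders, and output must be clinician-reviewed.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You draft concise neurosurgical operative note templates "
                    "with clearly marked fields for the surgeon to complete."},
        {"role": "user",
         "content": "Draft a template operative note for an elective lumbar "
                    "microdiscectomy, leaving bracketed placeholders for all "
                    "patient-specific details."},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
```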
Collapse
Affiliation(s)
- Antonio Di Ieva
- Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, NSW, Australia.
- Macquarie Neurosurgery & Spine, MQ Health, Macquarie University Hospital, Sydney, NSW, Australia.
- Department of Neurosurgery, Nepean Blue Mountains Local Health District, Penrith, NSW, Australia.
- Centre for Applied Artificial Intelligence, School of Computing, Macquarie University, Sydney, NSW, Australia.
| | - Caleb Stewart
- Department of Neurosurgery, Louisiana State University Health Sciences Shreveport, Shreveport, LA, USA
| | - Eric Suero Molina
- Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, NSW, Australia
- Department of Neurosurgery, University Hospital of Münster, Münster, Germany
| |
Collapse
|
99
|
Osama M. Artificial Intelligence in scientific writing and research publication: A paradigm shift in language inclusivity. J Back Musculoskelet Rehabil 2024; 37:249-251. [PMID: 38517774 DOI: 10.3233/bmr-245001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 03/24/2024]
|
100
|
Mannstadt I, Mehta B. Large language models and the future of rheumatology: assessing impact and emerging opportunities. Curr Opin Rheumatol 2024; 36:46-51. [PMID: 37729050 DOI: 10.1097/bor.0000000000000981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/22/2023]
Abstract
PURPOSE OF REVIEW Large language models (LLMs) have grown rapidly in size and capabilities as more training data and compute power have become available. Since the release of ChatGPT in late 2022, there has been growing interest in and exploration of potential applications of LLM technology. Numerous examples and pilot studies demonstrating the capabilities of these tools have emerged across several domains. For rheumatology professionals and patients, LLMs have the potential to transform current practices in medicine. RECENT FINDINGS Recent studies have begun exploring capabilities of LLMs that can assist rheumatologists in clinical practice, research, and medical education, though applications are still emerging. In clinical settings, LLMs have shown promise in assisting healthcare professionals, enabling more personalized medicine and generating routine documentation such as notes and letters. Challenges remain around integrating LLMs into clinical workflows, ensuring the accuracy of their outputs, and protecting patient data confidentiality. In research, early experiments demonstrate that LLMs can assist with the analysis of datasets, with quality control as a critical component. Lastly, LLMs could supplement medical education by providing personalized learning experiences and integration into established curricula. SUMMARY As these powerful tools continue to evolve at a rapid pace, rheumatology professionals should stay informed about how they may impact the field.
Collapse
Affiliation(s)
| | - Bella Mehta
- Weill Cornell Medicine
- Hospital for Special Surgery, New York, New York, USA
| |
Collapse
|