101
Warren E, Hurley ET, Park CN, Crook BS, Lorentz S, Levin JM, Anakwenze O, MacDonald PB, Klifto CS. Evaluation of information from artificial intelligence on rotator cuff repair surgery. JSES Int 2024;8:53-57. PMID: 38312282; PMCID: PMC10837709; DOI: 10.1016/j.jseint.2023.09.009.
Abstract
Purpose The purpose of this study was to analyze the quality and readability of information about rotator cuff repair surgery provided by an online AI tool. Methods An open AI model (ChatGPT) was used to answer 24 questions commonly asked by patients about rotator cuff repair. Questions were stratified into one of three categories based on the Rothwell classification system: fact, policy, or value. The answers for each category were evaluated for reliability, quality, and readability using the Journal of the American Medical Association (JAMA) Benchmark criteria, the DISCERN score, and the Flesch-Kincaid Reading Ease Score and Grade Level. Results The JAMA Benchmark criteria score for all three categories was 0, the lowest possible score, indicating that no reliable resources were cited. The DISCERN score was 51 for fact, 53 for policy, and 55 for value questions, all of which are considered good scores. Across question categories, the reliability portion of the DISCERN score was low owing to the lack of cited resources. The Flesch-Kincaid Reading Ease Score (and Grade Level) was 48.3 (10.3) for the fact class, 42.0 (10.9) for the policy class, and 38.4 (11.6) for the value class. Conclusion The quality of information provided by the open AI chat system was generally high across all question types but had significant shortcomings in reliability because no source material was cited. The DISCERN scores of the AI-generated responses matched or exceeded previously published results from studies evaluating the quality of online information about rotator cuff repair. The responses were written at a U.S. 10th-grade reading level or higher, above the AMA and NIH recommendation of a 6th-grade reading level for patient materials. The AI tool commonly referred the user to seek advice from orthopedic surgeons to improve their chances of a successful outcome.
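Both readability metrics above are closed-form formulas over word, sentence, and syllable counts. A minimal sketch, assuming the caller supplies those counts (the sample numbers below are invented for illustration):

```python
# Standard Flesch-Kincaid formulas; word/sentence/syllable counts must be
# supplied by the caller (syllable counting is the hard part in practice).
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Higher is easier to read; roughly 60-70 corresponds to plain English.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Approximate U.S. school grade level needed to follow the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Invented example: 1000 words, 50 sentences, 1700 syllables.
print(round(flesch_reading_ease(1000, 50, 1700), 1))   # 42.7 -> "difficult"
print(round(flesch_kincaid_grade(1000, 50, 1700), 1))  # 12.3 -> college entry
```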
Affiliation(s)
- Eric Warren: Duke University School of Medicine, Duke University, Durham, NC, USA
- Eoghan T. Hurley: Department of Orthopaedic Surgery, Duke University, Durham, NC, USA
- Caroline N. Park: Department of Orthopaedic Surgery, Duke University, Durham, NC, USA
- Bryan S. Crook: Department of Orthopaedic Surgery, Duke University, Durham, NC, USA
- Samuel Lorentz: Department of Orthopaedic Surgery, Duke University, Durham, NC, USA
- Jay M. Levin: Department of Orthopaedic Surgery, Duke University, Durham, NC, USA
- Oke Anakwenze: Department of Orthopaedic Surgery, Duke University, Durham, NC, USA
- Peter B. MacDonald: Section of Orthopaedic Surgery & The Pan Am Clinic, University of Manitoba, Winnipeg, MB, Canada
102
Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Ethical Dilemmas in Using AI for Academic Writing and an Example Framework for Peer Review in Nephrology Academia: A Narrative Review. Clin Pract 2023;14:89-105. PMID: 38248432; PMCID: PMC10801601; DOI: 10.3390/clinpract14010008.
Abstract
The emergence of artificial intelligence (AI) has greatly propelled progress across various sectors, including nephrology academia. However, this advancement has also given rise to ethical challenges, notably in scholarly writing. AI's capacity to automate labor-intensive tasks like literature reviews and data analysis has created opportunities for unethical practices, with scholars incorporating AI-generated text into their manuscripts, potentially undermining academic integrity. This situation gives rise to a range of ethical dilemmas that not only question the authenticity of contemporary academic endeavors but also challenge the credibility of the peer-review process and the integrity of editorial oversight. Instances of this misconduct are highlighted, spanning from lesser-known journals to reputable ones, and even infiltrating graduate theses and grant applications. This subtle AI intrusion hints at a systemic vulnerability within the academic publishing domain, exacerbated by the publish-or-perish mentality. Solutions aimed at mitigating the unethical use of AI in academia include the adoption of sophisticated AI-driven plagiarism detection systems, robust augmentation of the peer-review process with an "AI scrutiny" phase, comprehensive training for academics on ethical AI usage, and the promotion of a culture of transparency that acknowledges AI's role in research. This review underscores the pressing need for collaborative efforts among academic nephrology institutions to foster an environment of ethical AI application, preserving academic integrity in the face of rapid technological advancement. It also calls for rigorous research to assess the extent of AI's involvement in the academic literature, evaluate the effectiveness of AI-enhanced plagiarism detection tools, and understand the long-term consequences of AI use for academic integrity. An example framework is proposed to outline a comprehensive approach to integrating AI into nephrology academic writing and peer review. Through proactive initiatives and rigorous evaluation, a harmonious environment that harnesses AI's capabilities while upholding stringent academic standards can be envisioned.
Affiliation(s)
- Jing Miao: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Charat Thongprayoon: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Supawadee Suppadungsuk: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bang Phli 10540, Samut Prakan, Thailand
- Oscar A. Garcia Valencia: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Fawad Qureshi: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Wisit Cheungpasitporn: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
103
Liang J, Wang L, Luo J, Yan Y, Fan C. The relationship between student interaction with generative artificial intelligence and learning achievement: serial mediating roles of self-efficacy and cognitive engagement. Front Psychol 2023;14:1285392. PMID: 38187430; PMCID: PMC10766754; DOI: 10.3389/fpsyg.2023.1285392.
Abstract
Generative artificial intelligence (GAI) shocked the world with its unprecedented capabilities and raised significant tensions in the education field. Educators will inevitably transition to an educational future that embraces GAI rather than shuns it. Understanding the mechanism linking students' interaction with GAI tools to their achievement is important for educators and schools, but relevant empirical evidence is relatively scarce. Given the personalization and real-time interactivity of GAI tools, we propose that student-GAI interaction affects learning achievement through the serial mediators of self-efficacy and cognitive engagement. Based on questionnaire surveys of 389 participants, this study finds that: (1) overall, there is a significantly positive relationship between student-GAI interaction and learning achievement; (2) this positive relationship is mediated by self-efficacy, with a significant mediation effect of 0.015; (3) cognitive engagement also acts as a mediator between student-GAI interaction and learning achievement, with a significant and relatively strong mediating effect of 0.046; and (4) self-efficacy and cognitive engagement in series mediate this positive association, with a serial mediating effect of 0.011, which is comparatively small but still significant. In addition, the propensity score matching (PSM) method is applied to alleviate self-selection bias, reinforcing the validity of the results. The findings offer empirical evidence for the incorporation of GAI in teaching and learning.
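For readers unfamiliar with serial mediation, the three effect values reported above map onto the standard two-mediator path decomposition; the notation below is ours, not the authors'. With X the student-GAI interaction, M1 self-efficacy, M2 cognitive engagement, and Y achievement:

```latex
M_1 = a_1 X + e_1, \qquad
M_2 = a_2 X + d_{21} M_1 + e_2, \qquad
Y = c' X + b_1 M_1 + b_2 M_2 + e_3
% Indirect effects: a_1 b_1 (via M1 alone, 0.015), a_2 b_2 (via M2 alone,
% 0.046), and the serial path a_1 d_{21} b_2 (0.011). The total effect is
% c = c' + a_1 b_1 + a_2 b_2 + a_1 d_{21} b_2.
```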
Affiliation(s)
- Jing Liang: College of Management Science, Chengdu University of Technology, Chengdu, China
- Lili Wang: School of Logistics, Chengdu University of Information Technology, Chengdu, China
- Jia Luo: Business School, Chengdu University, Chengdu, China
- Yufei Yan: Business School, Southwest Minzu University, Chengdu, China
- Chao Fan: College of Management Science, Chengdu University of Technology, Chengdu, China
104
Ferreira RM. New evidence-based practice: Artificial intelligence as a barrier breaker. World J Methodol 2023;13:384-389. PMID: 38229944; PMCID: PMC10789101; DOI: 10.5662/wjm.v13.i5.384.
Abstract
The concept of evidence-based practice has persisted over several years and remains a cornerstone of clinical practice, representing the gold standard for optimal patient care. However, despite widespread recognition of its significance, its practical application faces various challenges and barriers, including a lack of skills in interpreting studies, limited resources, time constraints, linguistic competencies, and more. Recently, we have witnessed the emergence of a groundbreaking technological revolution known as artificial intelligence. Although artificial intelligence has become increasingly integrated into our daily lives, some reluctance persists among certain segments of the public. This article explores the potential of artificial intelligence as a solution to some of the main barriers encountered in the application of evidence-based practice. It highlights how artificial intelligence can assist in staying updated with the latest evidence, enhancing clinical decision-making, addressing patient misinformation, and mitigating time constraints in clinical practice. The integration of artificial intelligence into evidence-based practice has the potential to revolutionize healthcare, leading to more precise diagnoses, personalized treatment plans, and improved doctor-patient interactions. This proposed synergy between evidence-based practice and artificial intelligence may necessitate adjustments to the core concept of evidence-based practice, heralding a new era in healthcare.
Affiliation(s)
- Ricardo Maia Ferreira: Department of Sports and Exercise, Polytechnic Institute of Maia (N2i), Maia 4475-690, Porto, Portugal; Department of Physiotherapy, Polytechnic Institute of Coimbra, Coimbra Health School, Coimbra 3046-854, Coimbra, Portugal; Department of Physiotherapy, Polytechnic Institute of Castelo Branco, Dr. Lopes Dias Health School, Castelo Branco 6000-767, Castelo Branco, Portugal; Sport Physical Activity and Health Research & Innovation Center, Polytechnic Institute of Viana do Castelo, Melgaço 4960-320, Viana do Castelo, Portugal
105
Semrl N, Feigl S, Taumberger N, Bracic T, Fluhr H, Blockeel C, Kollmann M. AI language models in human reproduction research: exploring ChatGPT's potential to assist academic writing. Hum Reprod 2023;38:2281-2288. PMID: 37833847; DOI: 10.1093/humrep/dead207.
Abstract
Artificial intelligence (AI)-driven language models have the potential to serve as an educational tool, facilitate clinical decision-making, and support research and academic writing. The benefits of their use are yet to be evaluated and concerns have been raised regarding the accuracy, transparency, and ethical implications of using this AI technology in academic publishing. At the moment, Chat Generative Pre-trained Transformer (ChatGPT) is one of the most powerful and widely debated AI language models. Here, we discuss its feasibility to answer scientific questions, identify relevant literature, and assist writing in the field of human reproduction. With consideration of the scarcity of data on this topic, we assessed the feasibility of ChatGPT in academic writing, using data from six meta-analyses published in a leading journal of human reproduction. The text generated by ChatGPT was evaluated and compared to the original text by blinded reviewers. While ChatGPT can produce high-quality text and summarize information efficiently, its current ability to interpret data and answer scientific questions is limited, and it cannot be relied upon for a literature search or accurate source citation due to the potential spread of incomplete or false information. We advocate for open discussions within the reproductive medicine research community to explore the advantages and disadvantages of implementing this AI technology. Researchers and reviewers should be informed about AI language models, and we encourage authors to transparently disclose their use.
Affiliation(s)
- N Semrl: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- S Feigl: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- N Taumberger: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- T Bracic: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- H Fluhr: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- C Blockeel: Centre for Reproductive Medicine, Universitair Ziekenhuis Brussel (UZ Brussel), Brussels, Belgium
- M Kollmann: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
106
Levin G, Brezinov Y, Meyer R. Exploring the use of ChatGPT in OBGYN: a bibliometric analysis of the first ChatGPT-related publications. Arch Gynecol Obstet 2023;308:1785-1789. PMID: 37222839; DOI: 10.1007/s00404-023-07081-x.
Abstract
PURPOSE Little is known about the scientific literature on the new revolutionary tool, ChatGPT. We aimed to perform a bibliometric analysis identifying ChatGPT-related publications in obstetrics and gynecology (OBGYN). STUDY DESIGN A bibliometric study of the PubMed database. We mined all ChatGPT-related publications using the search term "ChatGPT". Bibliometric data were obtained from the iCite database. We performed a descriptive analysis and further compared the impact factor (IF) of publications describing a study vs. other publications. RESULTS Overall, 42 ChatGPT-related publications appeared across 26 different journals during 69 days. Most publications were editorials (52%) and news/briefings (22%), with only one (2%) research article identified. Five (12%) publications described a study that was performed. No ChatGPT-related publications in OBGYN were found. The leading journal by number of publications was Nature (24%), followed by Lancet Digital Health and Radiology (7% each). The main subjects of publications were ChatGPT's scientific writing quality (26%) and descriptions of ChatGPT (26%), followed by tested performance of ChatGPT (14%) and authorship and ethical issues (10% each). In a comparison of publications describing a study (n = 5) vs. other publications (n = 37), mean IF was lower for the study publications (6.25 ± 0 vs. 25.4 ± 21.6, p < 0.001). CONCLUSIONS This study highlights the main trends in ChatGPT-related publications. OBGYN is yet to be represented in this literature.
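The kind of PubMed mining this study describes can be reproduced against NCBI's public E-utilities endpoint. A sketch under assumptions (the retmax value and result handling are illustrative, not the authors' code):

```python
# Query PubMed's esearch E-utility for all records matching "ChatGPT" and
# print the total count plus the first few PMIDs.
import requests

resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": "ChatGPT", "retmax": 200, "retmode": "json"},
    timeout=30,
)
result = resp.json()["esearchresult"]
print(result["count"], "records;", result["idlist"][:5], "...")
```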
Affiliation(s)
- Gabriel Levin: The Department of Gynecologic Oncology, Hadassah-Hebrew University Medical Center, Jerusalem, Israel; Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Quebec, Canada
- Yoav Brezinov: Experimental Surgery, McGill University, Quebec, Canada
- Raanan Meyer: Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, CA, USA; The Dr. Pinchas Bornstein Talpiot Medical Leadership Program, Sheba Medical Center, Tel Hashomer, Ramat-Gan, Israel
107
Rossettini G, Cook C, Palese A, Pillastrini P, Turolla A. Pros and Cons of Using Artificial Intelligence Chatbots for Musculoskeletal Rehabilitation Management. J Orthop Sports Phys Ther 2023;53:728-734. PMID: 37707390; DOI: 10.2519/jospt.2023.12000.
Abstract
SYNOPSIS: Artificial intelligence (AI), specifically large language models (LLMs), which focus on the interaction between computers and human language, can influence musculoskeletal rehabilitation management. AI chatbots (eg, ChatGPT, Microsoft Bing, and Google Bard) are a form of LLM designed to understand, interpret, and generate text similar to that produced by humans. Since their release, chatbots have triggered controversy in the international scientific community, including by passing university exams, generating credible scientific abstracts, and showing potential to replace humans in scientific roles. The controversies extend to the field of musculoskeletal rehabilitation. In this Viewpoint, we describe the potential applications, limitations, and recommended actions for education, clinical practice, and research when using AI chatbots for musculoskeletal rehabilitation management, aspects that may have similar implications for the broader health care community. J Orthop Sports Phys Ther 2023;53(12):1-7. Epub 14 September 2023. doi:10.2519/jospt.2023.12000.
108
Xie Y, Seth I, Rozen WM, Hunter-Smith DJ. Evaluation of the Artificial Intelligence Chatbot on Breast Reconstruction and Its Efficacy in Surgical Research: A Case Study. Aesthetic Plast Surg 2023;47:2360-2369. PMID: 37314466; PMCID: PMC10784397; DOI: 10.1007/s00266-023-03443-7.
Abstract
BACKGROUND ChatGPT is an open-source artificial intelligence (AI) chatbot that uses deep learning to produce human-like text dialogue. Its potential applications in the scientific community are vast; however, its efficacy in performing comprehensive literature searches, data analysis, and report writing on aesthetic plastic surgery topics remains unknown. This study aims to evaluate both the accuracy and comprehensiveness of ChatGPT's responses to assess its suitability for use in aesthetic plastic surgery research. METHODS Six questions on post-mastectomy breast reconstruction were posed to ChatGPT. The first two questions focused on the current evidence and options for breast reconstruction post-mastectomy, and the remaining four focused specifically on autologous breast reconstruction. Using the Likert framework, the responses were qualitatively assessed for accuracy and information content by two specialist plastic surgeons with extensive experience in the field. RESULTS ChatGPT provided relevant, accurate information; however, it lacked depth. It could provide no more than a superficial overview in response to more esoteric questions, and it generated incorrect references: it created non-existent references and cited the wrong journal and date, which poses a significant challenge to maintaining academic integrity and warrants caution in its academic use. CONCLUSION While ChatGPT demonstrated proficiency in summarizing existing knowledge, it created fictitious references, which poses a significant concern for its use in academia and healthcare. Caution should be exercised in interpreting its responses in the aesthetic plastic surgery field, and it should only be used for such purposes with sufficient oversight. LEVEL OF EVIDENCE IV This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
Affiliation(s)
- Yi Xie: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria 3199, Australia
- Ishith Seth: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria 3199, Australia; Faculty of Medicine, Monash University, Melbourne, Victoria 3004, Australia
- Warren M Rozen: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria 3199, Australia; Faculty of Medicine, Monash University, Melbourne, Victoria 3004, Australia
- David J Hunter-Smith: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria 3199, Australia; Faculty of Medicine, Monash University, Melbourne, Victoria 3004, Australia
109
Kurian N, James D, Varghese VS, Cherian JM, Varghese KG. Artificial intelligence in scientific publications. J Am Dent Assoc 2023;154:1041-1043. PMID: 37140497; DOI: 10.1016/j.adaj.2023.03.015.
110
Dinis-Oliveira RJ, Azevedo RMS. ChatGPT in forensic sciences: a new Pandora's box with advantages and challenges to pay attention. Forensic Sci Res 2023;8:275-279. PMID: 38405625; PMCID: PMC10894065; DOI: 10.1093/fsr/owad039.
Abstract
ChatGPT is a variant of the generative pre-trained transformer (GPT) language model that uses large amounts of text-based training data and a transformer architecture to generate human-like text tailored to the prompts it receives. ChatGPT offers several advantages in the forensic sciences, for example as a virtual assistant to help lawyers, judges, and victims manage and interpret forensic expert data. But what would happen if ChatGPT began to be used to produce forensic expert reports? Despite its potential applications, the use of ChatGPT and other large language models and artificial intelligence tools in forensic writing also poses ethical and legal concerns, which are discussed in this perspective together with some expected future developments.
Affiliation(s)
- Ricardo J Dinis-Oliveira: 1H-TOXRUN, One Health Toxicology Research Unit, University Institute of Health Sciences (IUCS-CESPU), CESPU, CRL, Gandra, Portugal; Department of Public Health and Forensic Sciences and Medical Education, Faculty of Medicine, University of Porto, Porto, Portugal; UCIBIO/REQUIMTE, Laboratory of Toxicology, Faculty of Pharmacy, University of Porto, R. Jorge Viterbo Ferreira, Porto, Portugal; FOREN, Forensic Science Experts, Dr. Mário Moutinho Avenue, n.° 33-A, Lisbon, Portugal
- Rui M S Azevedo: 1H-TOXRUN, One Health Toxicology Research Unit, University Institute of Health Sciences (IUCS-CESPU), CESPU, CRL, Gandra, Portugal
111
Odri GA, Ji Yun Yoon D. Detecting generative artificial intelligence in scientific articles: Evasion techniques and implications for scientific integrity. Orthop Traumatol Surg Res 2023;109:103706. PMID: 37838021; DOI: 10.1016/j.otsr.2023.103706.
Abstract
BACKGROUND Artificial intelligence (AI) tools, although beneficial for data collection and analysis, can also facilitate scientific fraud. AI detectors can help address this problem, but their effectiveness depends on their ability to keep pace with AI progress. In addition, many methods of evading AI detection exist, and their constantly evolving sophistication can make the task more difficult. Thus, starting from an AI-generated text, we wanted to: (1) evaluate AI detection sites on a text generated entirely by the AI, (2) test the methods described for evading AI detection, and (3) evaluate the effectiveness of these methods at evading AI detection on the sites tested previously. HYPOTHESIS Not all AI detection tools are equally effective in detecting AI-generated text, and some techniques used to evade AI detection can make an AI-produced text almost undetectable. MATERIALS AND METHODS We created a text with ChatGPT-4 (Chat Generative Pre-trained Transformer) and submitted it to 11 AI detection web tools (Originality, ZeroGPT, Writer, Copyleaks, Crossplag, GPTZero, Sapling, Content at Scale, Corrector, Writefull, and Quill), before and after applying strategies to minimize AI detection. The strategies used to minimize AI detection were improving the command messages in ChatGPT, introducing minor grammatical errors such as comma deletion, paraphrasing, and substituting Latin letters with similar Cyrillic letters (a and o), a method also used elsewhere to evade plagiarism detection. We also tested the effectiveness of these tools in correctly identifying a scientific text written by a human in 1960. RESULTS For the initial text generated by the AI, 7 of the 11 detectors concluded that the text was mainly written by humans. Subsequently, the introduction of simple modifications, such as removing commas or paraphrasing, effectively reduced AI detection and made the text appear human to all detectors. In addition, replacing certain Latin letters with Cyrillic letters can make an AI text completely undetectable. Finally, we observed that, paradoxically, certain sites detect a significant proportion of AI in a text written by a human in 1960. DISCUSSION AI detectors have low efficiency, and simple modifications can allow even the most robust detectors to be easily bypassed. The rapid development of generative AI raises questions about the future of scientific writing but also about the detection of scientific fraud, such as data fabrication. LEVEL OF EVIDENCE III Case-control study.
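The letter-substitution evasion the study tested is trivial to implement, which underlines the concern. A minimal sketch (the sample sentence is invented) swapping Latin "a"/"o" for their visually identical Cyrillic counterparts, U+0430 and U+043E:

```python
# Replace Latin "a"/"o" with visually identical Cyrillic letters; the text
# looks unchanged to a human reader but differs at the byte level.
HOMOGLYPHS = str.maketrans({"a": "\u0430", "o": "\u043e"})

def obfuscate(text: str) -> str:
    return text.translate(HOMOGLYPHS)

sample = "This passage was generated by a language model."
evasive = obfuscate(sample)
print(evasive)            # renders identically in most fonts
print(sample == evasive)  # False: the strings no longer match
```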
Affiliation(s)
- Guillaume-Anthony Odri: Service de chirurgie orthopédique et traumatologique, centre hospitalier universitaire Lariboisière, 2, rue Ambroise-Paré, 75010 Paris, France; Inserm U1132 BIOSCAR, université Paris-Cité, 75010 Paris, France
112
Kuşcu O, Pamuk AE, Sütay Süslü N, Hosal S. Is ChatGPT accurate and reliable in answering questions regarding head and neck cancer? Front Oncol 2023;13:1256459. PMID: 38107064; PMCID: PMC10722294; DOI: 10.3389/fonc.2023.1256459.
Abstract
Background and objective Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence (AI)-based language processing model that uses deep learning to create human-like text dialogue. It has become a popular source of information covering a vast number of topics, including medicine. Patient education in head and neck cancer (HNC) is crucial to enhance patients' understanding of their medical condition, diagnosis, and treatment options. This study therefore examines the accuracy and reliability of ChatGPT in answering questions regarding HNC. Methods 154 head and neck cancer-related questions were compiled from sources including professional societies, institutions, patient support groups, and social media. These questions were categorized into topics such as basic knowledge, diagnosis, treatment, recovery, operative risks, complications, follow-up, and cancer prevention. ChatGPT was queried with each question, and two experienced head and neck surgeons independently assessed each response for accuracy and reproducibility. Responses were rated on a scale: (1) comprehensive/correct, (2) incomplete/partially correct, (3) a mix of accurate and inaccurate/misleading, and (4) completely inaccurate/irrelevant. Discrepancies in grading were resolved by a third reviewer. Reproducibility was evaluated by repeating questions and analyzing grading consistency. Results ChatGPT yielded "comprehensive/correct" responses to 133 of 154 (86.4%) questions, whereas the rates of "incomplete/partially correct" and "mixed accurate and inaccurate/misleading" responses were 11% and 2.6%, respectively. There were no "completely inaccurate/irrelevant" responses. By category, the model provided "comprehensive/correct" answers to 80.6% of questions regarding "basic knowledge", 92.6% related to "diagnosis", 88.9% related to "treatment", 80% related to "recovery - operative risks - complications - follow-up", 100% related to "cancer prevention", and 92.9% related to "other". There was no significant difference between the categories in the grades of ChatGPT responses (p=0.88). The reproducibility rate was 94.1% (145 of 154 questions). Conclusion ChatGPT generated substantially accurate and reproducible information in response to diverse medical queries related to HNC. Despite its limitations, it can be a useful source of information for both patients and medical professionals. With further development, ChatGPT could also play a crucial role in clinical decision support, providing clinicians with up-to-date information.
Affiliation(s)
- Oğuz Kuşcu: Department of Otorhinolaryngology, School of Medicine, Hacettepe University, Ankara, Türkiye
- A. Erim Pamuk: Department of Otorhinolaryngology, School of Medicine, Hacettepe University, Ankara, Türkiye
- Sefik Hosal: Department of Otorhinolaryngology, School of Medicine, Atılım University, Ankara, Türkiye
113
Østergaard SD. Will Generative Artificial Intelligence Chatbots Generate Delusions in Individuals Prone to Psychosis? Schizophr Bull 2023;49:1418-1419. PMID: 37625027; PMCID: PMC10686326; DOI: 10.1093/schbul/sbad128.
Affiliation(s)
- Søren Dinesen Østergaard: Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark
114
Wong RSY, Ming LC, Raja Ali RA. The Intersection of ChatGPT, Clinical Medicine, and Medical Education. JMIR Med Educ 2023;9:e47274. PMID: 37988149; DOI: 10.2196/47274.
Abstract
As we progress deeper into the digital age, the robust development and application of advanced artificial intelligence (AI) technology, specifically generative language models like ChatGPT (OpenAI), have potential implications in all sectors including medicine. This viewpoint article aims to present the authors' perspective on the integration of AI models such as ChatGPT in clinical medicine and medical education. The unprecedented capacity of ChatGPT to generate human-like responses, refined through Reinforcement Learning with Human Feedback, could significantly reshape the pedagogical methodologies within medical education. Through a comprehensive review and the authors' personal experiences, this viewpoint article elucidates the pros, cons, and ethical considerations of using ChatGPT within clinical medicine and notably, its implications for medical education. This exploration is crucial in a transformative era where AI could potentially augment human capability in the process of knowledge creation and dissemination, potentially revolutionizing medical education and clinical practice. The importance of maintaining academic integrity and professional standards is highlighted. The relevance of establishing clear guidelines for the responsible and ethical use of AI technologies in clinical medicine and medical education is also emphasized.
Affiliation(s)
- Rebecca Shin-Yee Wong: Department of Medical Education, School of Medical and Life Sciences, Sunway University, Selangor, Malaysia; Faculty of Medicine, Nursing and Health Sciences, SEGi University, Petaling Jaya, Malaysia
- Long Chiau Ming: School of Medical and Life Sciences, Sunway University, Selangor, Malaysia
- Raja Affendi Raja Ali: School of Medical and Life Sciences, Sunway University, Selangor, Malaysia; GUT Research Group, Faculty of Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
115
Tanaka OM, Gasparello GG, Hartmann GC, Casagrande FA, Pithon MM. Assessing the reliability of ChatGPT: a content analysis of self-generated and self-answered questions on clear aligners, TADs and digital imaging. Dental Press J Orthod 2023;28:e2323183. PMID: 37937680; PMCID: PMC10627416; DOI: 10.1590/2177-6709.28.5.e2323183.oar.
Abstract
INTRODUCTION Artificial intelligence (AI) is a tool that is already part of our reality, and this is an opportunity to understand how it can be useful in interacting with patients and providing valuable information about orthodontics. OBJECTIVE This study evaluated the accuracy of ChatGPT in providing accurate, high-quality answers to questions on clear aligners, temporary anchorage devices, and digital imaging in orthodontics. METHODS Forty-five questions and answers were generated by ChatGPT 4.0 and analyzed separately by five orthodontists. The evaluators independently rated the quality of the information provided on a Likert scale, in which higher scores indicated greater quality of information (1 = very poor; 2 = poor; 3 = acceptable; 4 = good; 5 = very good). The Kruskal-Wallis H test (p < 0.05) and post-hoc pairwise comparisons with the Bonferroni correction were performed. RESULTS Of the 225 evaluations from the five evaluators, 11 (4.9%) were rated very poor, 4 (1.8%) poor, and 15 (6.7%) acceptable. The majority were rated good [34 (15.1%)] and very good [161 (71.6%)]. Regarding evaluators' scores, only slight agreement was observed, with a Fleiss kappa of 0.004. CONCLUSIONS ChatGPT proved effective in providing quality answers related to clear aligners, temporary anchorage devices, and digital imaging within the context of orthodontics.
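Both statistics named above are available off the shelf in scipy and statsmodels. A sketch with an invented 45 x 5 score matrix (questions x raters); the random data are for illustration only:

```python
# Kruskal-Wallis H across the three topic groups and Fleiss' kappa across
# the five raters, mirroring the analyses named in the abstract.
import numpy as np
from scipy.stats import kruskal
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(45, 5))  # Likert 1-5; 45 questions, 5 raters

groups = np.split(scores.mean(axis=1), 3)  # 15 questions per topic group
H, p = kruskal(*groups)

table, _ = aggregate_raters(scores)        # per-question category counts
kappa = fleiss_kappa(table, method="fleiss")
print(f"H = {H:.2f}, p = {p:.3f}, Fleiss kappa = {kappa:.3f}")
```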
116
Pantanowitz J, Pantanowitz L. Implications of ChatGPT for cytopathology and recommendations for updating JASC guidelines on the responsible use of artificial intelligence. J Am Soc Cytopathol 2023;12:389-394. PMID: 37714732; DOI: 10.1016/j.jasc.2023.07.001.
Affiliation(s)
- Joshua Pantanowitz: Department of Pathology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Liron Pantanowitz: Department of Pathology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
117
Daher M, Koa J, Boufadel P, Singh J, Fares MY, Abboud JA. Breaking barriers: can ChatGPT compete with a shoulder and elbow specialist in diagnosis and management? JSES Int 2023;7:2534-2541. PMID: 37969495; PMCID: PMC10638599; DOI: 10.1016/j.jseint.2023.07.018.
Abstract
Background ChatGPT is an artificial intelligence (AI) language processing model that uses deep learning to generate human-like responses to natural language inputs. Its potential use in health care has raised questions, and several studies have assessed its effectiveness in writing articles, clinical reasoning, and solving complex questions. This study investigates ChatGPT's capabilities in diagnosing and managing patients with new shoulder and elbow complaints in a private clinical setting, to provide insight into its potential use as a diagnostic tool for patients and a first-consultation resource for primary physicians. Methods In a private clinical setting, patients were assessed by ChatGPT after being seen by a shoulder and elbow specialist for shoulder and elbow symptoms. For each patient, a research fellow filled out a standardized form (including age, gender, major comorbidities, symptoms with their localization, natural history, and duration, any associated symptoms or movement deficit, aggravating/relieving factors, and the x-ray/imaging report if present). This form was submitted through the ChatGPT portal, and the AI model was asked for a diagnosis and the best management modality. Results A total of 29 patients (15 males and 14 females) were included in this study. The AI model correctly chose the diagnosis and management in 93% (27/29) and 83% (24/29) of the patients, respectively. Furthermore, of the 24 patients managed correctly, ChatGPT did not specify the appropriate management in 6 and chose only one management option in 5 where both options were applicable and dependent on the patient's choice. Counting these alongside the 5 incorrectly managed patients, 55% (16/29) of ChatGPT's management responses were poor. Conclusion ChatGPT made a worthy opponent; however, in its current form it will not be able to replace a shoulder and elbow specialist in diagnosing and treating patients, for reasons including misdiagnosis, poor management, lack of empathy and interaction with patients, dependence on magnetic resonance imaging reports, and lack of new knowledge.
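The study submitted its standardized form through the ChatGPT web portal; a scripted equivalent might look like the sketch below. The model name, form fields, and prompt wording are our assumptions, not the study's protocol:

```python
# Hypothetical scripted version of the standardized-form workflow, using the
# openai Python client; all field names and the prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

form = {
    "age": 54, "gender": "female", "comorbidities": "hypertension",
    "symptoms": "right shoulder pain for 4 months, worse overhead",
    "deficit": "limited active abduction",
    "imaging": "MRI report: full-thickness supraspinatus tear",
}
prompt = ("Based on this standardized intake form, give the most likely "
          "diagnosis and the best management: "
          + "; ".join(f"{k}: {v}" for k, v in form.items()))

reply = client.chat.completions.create(
    model="gpt-4", messages=[{"role": "user", "content": prompt}]
)
print(reply.choices[0].message.content)
```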
Affiliation(s)
- Jonathan Koa: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
- Peter Boufadel: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
- Jaspal Singh: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
- Mohamad Y. Fares: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
- Joseph A. Abboud: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
118
Bogdanovich B, Patel PA, Kavian JA, Boyd CJ, Rodriguez ED. ChatGPT for the Modern Plastic Surgeon. Plast Reconstr Surg 2023;152:969e-970e. PMID: 37871032; DOI: 10.1097/prs.0000000000010794.
Affiliation(s)
- Parth A Patel: Medical College of Georgia, Augusta University, Augusta, GA
- Joseph Abraham Kavian: Hansjörg Wyss Department of Plastic Surgery, New York University Langone Health, New York, NY
- Carter J Boyd: Hansjörg Wyss Department of Plastic Surgery, New York University Langone Health, New York, NY
- Eduardo D Rodriguez: Hansjörg Wyss Department of Plastic Surgery, New York University Langone Health, New York, NY
119
Fleming AM, Phillips AL, Drake JA, Murphy AJ, Yakoub D, Shibata D, Wood EH. Sugarbaker Versus Keyhole Repair for Parastomal Hernia: Results of an Artificial Intelligence Large Language Model Post Hoc Analysis. J Gastrointest Surg 2023;27:2567-2570. PMID: 37353657; DOI: 10.1007/s11605-023-05749-y.
Affiliation(s)
- Andrew M Fleming: Department of Surgery, The University of Tennessee Health Science Center, Memphis, TN, USA; Department of Surgery, St. Jude Children's Research Hospital, Memphis, TN, USA
- Alisa L Phillips: Department of Surgery, The University of Tennessee Health Science Center, Memphis, TN, USA
- Justin A Drake: Division of Gastrointestinal Oncology, H. Lee Moffitt Cancer Center, Tampa, FL, USA
- Andrew J Murphy: Department of Surgery, The University of Tennessee Health Science Center, Memphis, TN, USA; Department of Surgery, St. Jude Children's Research Hospital, Memphis, TN, USA
- Danny Yakoub: Division of Surgical Oncology, Augusta University Medical Center, Augusta, GA, USA
- David Shibata: Department of Surgery, The University of Tennessee Health Science Center, Memphis, TN, USA
- Elizabeth H Wood: Department of Surgery, The University of Tennessee Health Science Center, Memphis, TN, USA
120
Chatelan A, Clerc A, Fonta PA. ChatGPT and Future Artificial Intelligence Chatbots: What may be the Influence on Credentialed Nutrition and Dietetics Practitioners? J Acad Nutr Diet 2023;123:1525-1531. PMID: 37544375; DOI: 10.1016/j.jand.2023.08.001.
Affiliation(s)
- Angeline Chatelan: Department of Nutrition and Dietetics, Geneva School of Health Sciences, HES-SO University of Applied Sciences and Arts Western Switzerland, Geneva, Switzerland
- Aurélien Clerc: Department of Nutrition and Dietetics, Geneva School of Health Sciences, HES-SO University of Applied Sciences and Arts Western Switzerland, Geneva, Switzerland; HFR Fribourg University Training Hospital, Fribourg, Switzerland
121
Huang H. Performance of ChatGPT on Registered Nurse License Exam in Taiwan: A Descriptive Study. Healthcare (Basel) 2023;11:2855. PMID: 37958000; PMCID: PMC10649156; DOI: 10.3390/healthcare11212855.
Abstract
(1) Background: AI (artificial intelligence) chatbots have been widely applied. ChatGPT could enhance individual learning capabilities and clinical reasoning skills and facilitate students' understanding of complex concepts in healthcare education. There has been less emphasis on its application in nursing education, so its use there needs to be verified. (2) Methods: A descriptive study was used to analyze the scores of ChatGPT on the registered nurse license exam (RNLE) in 2022-2023 and to explore its responses and explanations. Data measurement encompassed input sourcing, encoding methods, and statistical analysis. (3) Results: ChatGPT responded promptly within seconds. Its average score across the four exams ranged from roughly 51.6 to 63.75, and it passed the first 2022 and second 2023 exams. However, ChatGPT may generate misleading or inaccurate explanations, hallucinate, become confused by complicated scenarios, or exhibit language bias. (4) Conclusions: ChatGPT may have the potential to assist with nursing education because of its advantages. It is recommended to integrate ChatGPT into different nursing courses and to assess its limitations and effectiveness through a variety of tools and methods.
Affiliation(s)
- Huiman Huang: School of Nursing, College of Nursing, Tzu Chi University of Science and Technology, Hualien 970302, Taiwan
122
Taloni A, Borselli M, Scarsi V, Rossi C, Coco G, Scorcia V, Giannaccare G. Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology. Sci Rep 2023;13:18562. PMID: 37899405; PMCID: PMC10613606; DOI: 10.1038/s41598-023-45837-2.
Abstract
To compare the performance of humans, GPT-4.0, and GPT-3.5 in answering multiple-choice questions from the American Academy of Ophthalmology (AAO) Basic and Clinical Science Course (BCSC) self-assessment program, available at https://www.aao.org/education/self-assessments. In June 2023, text-based multiple-choice questions were submitted to GPT-4.0 and GPT-3.5. The AAO provides the percentage of humans who selected the correct answer, which was analyzed for comparison. All questions were classified by 10 subspecialties and 3 practice areas (diagnostics/clinics, medical treatment, surgery). Out of 1023 questions, GPT-4.0 achieved the best score (82.4%), followed by humans (75.7%) and GPT-3.5 (65.9%), with significant differences in accuracy rates (always P < 0.0001). Both GPT-4.0 and GPT-3.5 showed the worst results on surgery-related questions (74.6% and 57.0%, respectively). For difficult questions (answered incorrectly by >50% of humans), both GPT models compared favorably with humans, without reaching significance. The word count of answers provided by GPT-4.0 was significantly lower than that of GPT-3.5 (160 ± 56 and 206 ± 77, respectively, P < 0.0001); however, incorrect responses were longer (P < 0.02). GPT-4.0 represented a substantial improvement over GPT-3.5, achieving better performance than humans on an AAO BCSC self-assessment test. However, ChatGPT is still limited by inconsistency across practice areas, especially when it comes to surgery.
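The headline GPT-4.0-versus-humans gap can be sanity-checked from the reported percentages alone. A sketch treating the two accuracy rates as independent proportions in a 2x2 table; the paper's own statistical test may differ, and the counts are reconstructed by rounding:

```python
# Contingency-table check of GPT-4.0 (82.4%) vs. humans (75.7%) over 1023
# questions; counts are derived from the abstract's percentages.
from scipy.stats import chi2_contingency

n = 1023
gpt4, human = round(0.824 * n), round(0.757 * n)
table = [[gpt4, n - gpt4],      # GPT-4.0: correct, incorrect
         [human, n - human]]    # humans:  correct, incorrect
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.1e}")  # significant, p < 0.001
```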
Affiliation(s)
- Andrea Taloni: Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Massimiliano Borselli: Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Valentina Scarsi: Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Costanza Rossi: Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Giulia Coco: Department of Clinical Sciences and Translational Medicine, University of Rome Tor Vergata, Rome, Italy
- Vincenzo Scorcia: Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Giuseppe Giannaccare: Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy; Department of Surgical Sciences, Eye Clinic, University of Cagliari, Via Università 40, 09124 Cagliari, Italy
123
Klang E, Portugez S, Gross R, Kassif Lerner R, Brenner A, Gilboa M, Ortal T, Ron S, Robinzon V, Meiri H, Segal G. Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: a medical education pilot study with GPT-4. BMC Med Educ 2023;23:772. PMID: 37848913; PMCID: PMC10580534; DOI: 10.1186/s12909-023-04752-w.
Abstract
BACKGROUND Writing multiple-choice question examinations for medical students is a complex, time-consuming task that requires significant effort from clinical staff and faculty. Applying artificial intelligence algorithms in this field of medical education may be advisable. METHODS During March to April 2023, we utilized GPT-4, an OpenAI application, to write a 210-question multiple-choice examination based on an existing exam template, and specialist physicians who were blinded to the source of the questions thoroughly investigated the output. Algorithm mistakes and inaccuracies identified by the specialists were classified as stemming from age, gender, or geographical insensitivities. RESULTS After inputting a detailed prompt, GPT-4 produced the test rapidly and effectively. Only 1 question (0.5%) was deemed false; 15% of questions necessitated revisions. Errors in the AI-generated questions included the use of outdated or inaccurate terminology and age-sensitive, gender-sensitive, and geographically sensitive inaccuracies. Questions disqualified on methodological grounds included elimination-based questions and questions that did not integrate knowledge with clinical reasoning. CONCLUSION GPT-4 can be used as an adjunctive tool in creating multiple-choice medical examinations, yet rigorous inspection by specialist physicians remains pivotal.
Affiliation(s)
- Klang E: The Sami Sagol AI Hub, ARC Innovation Center, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Portugez S: Silesia Medical University, Katowice, Poland
- Gross R: Division of Psychiatry, the Chaim Sheba Medical Center, Tel-Hashomer, Ramat Gan, Israel, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Kassif Lerner R: Department of Pediatric Intensive Care, The Edmond and Lily Safra Children's Hospital, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Brenner A: Obstetrics and Gynecology Division, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Gilboa M: Infection Prevention and Control Unit, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Ortal T: Education Authority, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Ron S: Education Authority, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Robinzon V: Education Authority, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Meiri H: Department of Surgery and Transplantation B, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
- Segal G: Infection Prevention and Control Unit, Chaim Sheba Medical Center, affiliated with the Faculty of Medicine, Tel-Aviv University, Ramat Aviv, Israel
124
Sharma A, Kumar R, Vinjamuri S. Artificial Intelligence Chatbots: addressing the stochastic parrots in medical science. Nucl Med Commun 2023;44:831-833. PMID: 37578315; DOI: 10.1097/mnm.0000000000001739.
Affiliation(s)
- Anshul Sharma: Department of Nuclear Medicine, Homi Bhabha Cancer Hospital and Research Centre (Tata Memorial Centre), New Chandigarh, India
- Rakesh Kumar: Department of Nuclear Medicine, All India Institute of Medical Sciences, New Delhi, India
- Sobhan Vinjamuri: Department of Nuclear Medicine, Royal Liverpool and Broadgreen University Hospitals NHS Trust, Liverpool, UK
125
Benichou L. The role of using ChatGPT AI in writing medical scientific articles. J Stomatol Oral Maxillofac Surg 2023;124:101456. PMID: 36966950; DOI: 10.1016/j.jormas.2023.101456.
Abstract
The use of artificial intelligence (AI) in medical research is on the rise. This article explores the role of ChatGPT, a language model developed by OpenAI, in writing medical scientific articles. The methods included a comparative analysis of medical scientific articles produced with and without the use of ChatGPT. The results suggest that ChatGPT can be a useful tool for scientists to increase the production of higher-quality medical scientific articles, but it is important to note that AI cannot fully replace human authors. In conclusion, scientists should consider ChatGPT as an additional tool for producing higher-quality medical scientific articles more quickly.
Affiliation(s)
- L Benichou: Service de chirurgie maxillo-faciale et stomatologie, Groupe Hospitalier Paris St-Joseph, 185 rue Raymond Losserand, 75014 Paris, France
126
Krüger L, Krotsetis S, Nydahl P. [ChatGPT: curse or blessing in nursing care?]. Med Klin Intensivmed Notfmed 2023;118:534-539. PMID: 37401955; DOI: 10.1007/s00063-023-01038-3.
Abstract
Artificial intelligence (AI) has been used in healthcare for some years for risk detection, diagnostics, documentation, education and training, and other purposes. A new, openly accessible AI application is ChatGPT. The use of ChatGPT as an AI tool in education, training, or studies is currently being discussed from many perspectives. It is questionable whether ChatGPT can and should also support the nursing professions in health care. The aim of this review article is to present and critically discuss possible areas of application of ChatGPT in theory and practice, with a focus on nursing practice, pedagogy, nursing research, and nursing development.
Collapse
Affiliation(s)
- Lars Krüger
- Herz- und Diabeteszentrum NRW, Universitätsklinikum der Ruhr-Universität Bochum, Bad Oeynhausen, Deutschland
| | - Susanne Krotsetis
- Pflegeentwicklung und Pflegewissenschaft angegliedert der Pflegedirektion, des Universitätsklinikums Schleswig-Holstein, Campus Lübeck, Lübeck, Deutschland
| | - Peter Nydahl
- Pflegeforschung und -entwicklung, Pflegedirektion, Universitätsklinikum Schleswig-Holstein, Haus V40, Arnold-Heller-Str. 3, 24105, Kiel, Deutschland.
- Universitätsinstitut für Pflegewissenschaft und -praxis, Paracelsus Medizinische Privatuniversität, Salzburg, Österreich.
| |
Collapse
|
127
|
Schlam I, Saad Menezes MC, Corti C, Tan A, Abuali I, Tolaney SM. Artificial intelligence as an adjunct tool for breast oncologists - are we there yet? ESMO Open 2023; 8:101643. [PMID: 37703594 PMCID: PMC10502370 DOI: 10.1016/j.esmoop.2023.101643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/15/2023] Open
Affiliation(s)
- I Schlam
- Department of Hematology and Oncology, Tufts Medical Center, Boston; Harvard T.H. Chan School of Public Health, Boston.
| | - M C Saad Menezes
- Harvard T.H. Chan School of Public Health, Boston; Department of Biomedical Informatics, Harvard Medical School, Boston, USA
| | - C Corti
- Division of New Drugs and Early Drug Development for Innovative Therapies, European Institute of Oncology, IRCCS, Milan; Department of Oncology and Hemato-Oncology (DIPO), University of Milan, Milan, Italy
| | - A Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, USA
| | - I Abuali
- Department of Hematology and Oncology, Massachusetts General Hospital, Boston
| | - S M Tolaney
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston; Breast Oncology Program, Dana-Farber Brigham Cancer Center, Boston; Harvard Medical School, Boston, USA
| |
Collapse
|
128
|
Eggmann F, Weiger R, Zitzmann NU, Blatz MB. Implications of large language models such as ChatGPT for dental medicine. J ESTHET RESTOR DENT 2023; 35:1098-1102. [PMID: 37017291 DOI: 10.1111/jerd.13046] [Citation(s) in RCA: 73] [Impact Index Per Article: 36.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Revised: 03/25/2023] [Accepted: 03/28/2023] [Indexed: 04/06/2023]
Abstract
OBJECTIVE This article provides an overview of the implications of ChatGPT and other large language models (LLMs) for dental medicine. OVERVIEW ChatGPT, an LLM trained on massive amounts of textual data, is adept at fulfilling various language-related tasks. Despite its impressive capabilities, ChatGPT has serious limitations, such as occasionally giving incorrect answers, producing nonsensical content, and presenting misinformation as fact. Dental practitioners, assistants, and hygienists are not likely to be significantly impacted by LLMs. However, LLMs could affect the work of administrative personnel and the provision of dental telemedicine. LLMs offer potential for clinical decision support, text summarization, efficient writing, and multilingual communication. As more people seek health information from LLMs, it is crucial to safeguard against inaccurate, outdated, and biased responses to health-related queries. LLMs pose challenges for patient data confidentiality and cybersecurity that must be tackled. In dental education, LLMs present fewer challenges than in other academic fields. LLMs can enhance academic writing fluency, but acceptable usage boundaries in science need to be established. CONCLUSIONS While LLMs such as ChatGPT may have various useful applications in dental medicine, they come with risks of malicious use and serious limitations, including the potential for misinformation. CLINICAL SIGNIFICANCE Along with the potential benefits of using LLMs as an additional tool in dental medicine, it is crucial to carefully consider the limitations and potential risks inherent in such artificial intelligence technologies.
Collapse
Affiliation(s)
- Florin Eggmann
- Department of Preventive and Restorative Sciences, Penn Dental Medicine, Robert Schattner Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Periodontology, Endodontology, and Cariology, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland
| | - Roland Weiger
- Department of Periodontology, Endodontology, and Cariology, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland
| | - Nicola U Zitzmann
- Department of Reconstructive Dentistry, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland
| | - Markus B Blatz
- Department of Preventive and Restorative Sciences, Penn Dental Medicine, Robert Schattner Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| |
Collapse
|
129
|
Salimi A, Saheb H. Large Language Models in Ophthalmology Scientific Writing: Ethical Considerations Blurred Lines or Not at All? Am J Ophthalmol 2023; 254:177-181. [PMID: 37348667 DOI: 10.1016/j.ajo.2023.06.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2023] [Revised: 05/12/2023] [Accepted: 06/07/2023] [Indexed: 06/24/2023]
Abstract
PURPOSE To discuss the implications of large language models (LLMs) in ophthalmology research, as well as the associated ethical considerations. DESIGN Perspective. METHODS This discussion reviews the potential uses of LLMs such as ChatGPT in ophthalmology research, highlights the associated threats and ethical considerations, and proposes solutions for the use of LLMs in ophthalmology research and scientific writing. RESULTS With the increasing interest in LLMs, such as ChatGPT, their diverse utility has been widely explored, including their application in research and scientific writing. LLMs have the potential to guide researchers throughout the different stages of their research, from idea generation to drafting a scientific piece. However, there are significant ethical concerns and challenges related to scientific integrity in ophthalmology research that should be addressed by scientific journals. Our review of the 10 highest-impact-factor ophthalmology journals revealed that the number of journals addressing this topic in their submission guidelines is rapidly increasing. Therefore, we propose certain domains that all journals should consider regarding the use of LLMs in research. CONCLUSIONS As LLMs continue to improve, their use in scientific writing will remain a contentious issue due to the ethical dilemmas involved in determining the appropriate scope of their use. This article reviews the ethical dilemmas related to the use of LLMs in ophthalmology research and calls for the prompt development of guidelines for their ethical use in manuscript writing as ophthalmology journals update their editorial policies with respect to LLMs.
Collapse
Affiliation(s)
- Ali Salimi
- From the Department of Ophthalmology and Visual Sciences, McGill University, Montreal, Quebec, Canada
| | - Hady Saheb
- From the Department of Ophthalmology and Visual Sciences, McGill University, Montreal, Quebec, Canada.
| |
Collapse
|
130
|
Diaz Milian R, Moreno Franco P, Freeman WD, Halamka JD. Revolution or Peril? The Controversial Role of Large Language Models in Medical Manuscript Writing. Mayo Clin Proc 2023; 98:1444-1448. [PMID: 37793723 DOI: 10.1016/j.mayocp.2023.07.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/23/2023] [Accepted: 07/13/2023] [Indexed: 10/06/2023]
Affiliation(s)
- Ricardo Diaz Milian
- Division of Critical Care Medicine, Mayo Clinic, Jacksonville, FL; Department of Anesthesiology, Mayo Clinic, Jacksonville, FL.
| | - Pablo Moreno Franco
- Department of Anesthesiology, Mayo Clinic, Jacksonville, FL; Department of Transplant Medicine, Mayo Clinic, Jacksonville, FL
| | - William D Freeman
- Department of Neurology and Neurosurgery, Mayo Clinic, Jacksonville, FL
| | - John D Halamka
- Department of Emergency Medicine, Mayo Clinic, Rochester, MN; Department of Internal Medicine, Mayo Clinic, Rochester, MN; Mayo Clinic Platform, Mayo Clinic, Rochester, MN
| |
Collapse
|
131
|
Kang Y, Xia Z, Zhu L. When ChatGPT Meets Plastic Surgeons. Aesthetic Plast Surg 2023; 47:2190-2193. [PMID: 37165022 DOI: 10.1007/s00266-023-03372-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Accepted: 04/23/2023] [Indexed: 05/12/2023]
Affiliation(s)
- Yuanbo Kang
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Shuaifuyuan 1#, Dongcheng District, Beijing, 100730, People's Republic of China
- Peking Union Medical College, Chinese Academy of Medical Sciences, Dongdan Santiao 9#, Dongcheng District, Beijing, 100730, People's Republic of China
| | - Zenan Xia
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Shuaifuyuan 1#, Dongcheng District, Beijing, 100730, People's Republic of China
| | - Lin Zhu
- Department of Plastic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Shuaifuyuan 1#, Dongcheng District, Beijing, 100730, People's Republic of China.
| |
Collapse
|
132
|
Jeyaraman M, Ramasubramanian S, Balaji S, Jeyaraman N, Nallakumarasamy A, Sharma S. ChatGPT in action: Harnessing artificial intelligence potential and addressing ethical challenges in medicine, education, and scientific research. World J Methodol 2023; 13:170-178. [PMID: 37771867 PMCID: PMC10523250 DOI: 10.5662/wjm.v13.i4.170] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Revised: 06/29/2023] [Accepted: 07/24/2023] [Indexed: 09/20/2023] Open
Abstract
Artificial intelligence (AI) tools, like OpenAI's Chat Generative Pre-trained Transformer (ChatGPT), hold considerable potential in healthcare, academia, and diverse industries. Evidence demonstrates its capability at a medical student level in standardized tests, suggesting utility in medical education, radiology reporting, genetics research, data optimization, and drafting repetitive texts such as discharge summaries. Nevertheless, these tools should augment, not supplant, human expertise. Despite promising applications, ChatGPT confronts limitations, including critical thinking tasks and generating false references, necessitating stringent cross-verification. Ensuing concerns, such as potential misuse, bias, blind trust, and privacy, underscore the need for transparency, accountability, and clear policies. Evaluations of AI-generated content and preservation of academic integrity are critical. With responsible use, AI can significantly improve healthcare, academia, and industry without compromising integrity and research quality. For effective and ethical AI deployment, collaboration amongst AI developers, researchers, educators, and policymakers is vital. The development of domain-specific tools, guidelines, regulations, and the facilitation of public dialogue must underpin these endeavors to responsibly harness AI's potential.
Collapse
Affiliation(s)
- Madhan Jeyaraman
- Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
| | - Swaminathan Ramasubramanian
- Department of General Medicine, Government Medical College, Omandurar Government Estate, Chennai 600018, Tamil Nadu, India
| | - Sangeetha Balaji
- Department of General Medicine, Government Medical College, Omandurar Government Estate, Chennai 600018, Tamil Nadu, India
| | - Naveen Jeyaraman
- Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
| | - Arulkumar Nallakumarasamy
- Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
| | - Shilpa Sharma
- Department of Paediatric Surgery, All India Institute of Medical Sciences, Delhi 110029, New Delhi, India
| |
Collapse
|
133
|
Laxar D, Eitenberger M, Maleczek M, Kaider A, Hammerle FP, Kimberger O. The influence of explainable vs non-explainable clinical decision support systems on rapid triage decisions: a mixed methods study. BMC Med 2023; 21:359. [PMID: 37726729 PMCID: PMC10510231 DOI: 10.1186/s12916-023-03068-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Accepted: 09/05/2023] [Indexed: 09/21/2023] Open
Abstract
BACKGROUND During the COVID-19 pandemic, a variety of clinical decision support systems (CDSS) were developed to aid patient triage. However, research focusing on the interaction between decision support systems and human experts is lacking. METHODS Thirty-two physicians were recruited to rate the survival probability of 59 critically ill patients by means of chart review. Subsequently, one of two artificial intelligence systems advised the physician of a computed survival probability. However, only one of these systems explained the reasons behind its decision-making. In the third step, physicians reviewed the chart once again to determine the final survival probability rating. We hypothesized that an explaining system would exhibit a higher impact on the physicians' second rating (i.e., higher weight-on-advice). RESULTS The survival probability rating given by the physician after receiving advice from the clinical decision support system was a median of 4 percentage points closer to the advice than the initial rating. Weight-on-advice was not significantly different (p = 0.115) between the two systems (with vs without explanation for its decision). Additionally, weight-on-advice showed no difference according to time of day or between board-qualified and not yet board-qualified physicians. When asked after the conclusion of the experiment, self-reported overall trust was a median of 5.5/10 (non-explaining system: median 4, IQR 3.5-5.5; explaining system: median 7, IQR 5.5-7.5; p = 0.007). CONCLUSIONS Although overall trust in the models was low, the median (IQR) weight-on-advice was high (0.33 (0.0-0.56)) and in line with published literature on expert advice. In contrast to the hypothesis, weight-on-advice was comparable between the explaining and non-explaining systems. In 30% of cases, weight-on-advice was 0, meaning the physician did not change their rating. The median of the remaining weight-on-advice values was 50%, suggesting that physicians either dismissed the recommendation or employed a "meeting halfway" approach. Newer technologies, such as clinical reasoning systems, may be able to augment the decision process rather than simply presenting unexplained bias.
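For readers unfamiliar with the metric, weight-on-advice is conventionally defined in the judge-advisor literature as (final - initial) / (advice - initial). The sketch below computes it under that assumption; the exact formula and data handling used by the study are not stated in the abstract, and the ratings shown are hypothetical.

```python
# Weight-on-advice (WOA) sketch: WOA = (final - initial) / (advice - initial).
# WOA = 0 means the advice was ignored; WOA = 1 means it was adopted fully.
import numpy as np

def weight_on_advice(initial, advice, final):
    initial, advice, final = (np.asarray(x, dtype=float) for x in (initial, advice, final))
    denom = advice - initial
    woa = np.full_like(denom, np.nan)            # undefined when advice == initial
    np.divide(final - initial, denom, out=woa, where=denom != 0)
    return woa

# Hypothetical survival-probability ratings (%) for three chart reviews
initial = [40, 60, 55]
advice  = [70, 50, 55]
final   = [55, 60, 55]
print(np.nanmedian(weight_on_advice(initial, advice, final)))  # median WOA
```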
Collapse
Affiliation(s)
- Daniel Laxar
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Ludwig Boltzmann Gesellschaft, Vienna, Austria
| | - Magdalena Eitenberger
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Ludwig Boltzmann Gesellschaft, Vienna, Austria
| | - Mathias Maleczek
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Vienna, Austria.
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Ludwig Boltzmann Gesellschaft, Vienna, Austria.
| | - Alexandra Kaider
- Center for Medical Data Science, Medical University of Vienna, Vienna, Austria
| | - Fabian Peter Hammerle
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Vienna, Austria
| | - Oliver Kimberger
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Ludwig Boltzmann Gesellschaft, Vienna, Austria
| |
Collapse
|
134
|
Lareyre F, Nasr B, Chaudhuri A, Di Lorenzo G, Carlier M, Raffort J. Comprehensive Review of Natural Language Processing (NLP) in Vascular Surgery. EJVES Vasc Forum 2023; 60:57-63. [PMID: 37822918 PMCID: PMC10562666 DOI: 10.1016/j.ejvsvf.2023.09.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2023] [Revised: 07/13/2023] [Accepted: 09/08/2023] [Indexed: 10/13/2023] Open
Abstract
Objective The use of Natural Language Processing (NLP) has attracted increased interest in healthcare, with various potential applications including the identification and extraction of health information and the development of chatbots and virtual assistants. The aim of this comprehensive literature review was to provide an overview of NLP applications in vascular surgery, identify current limitations, and discuss future perspectives in the field. Data sources The MEDLINE database was searched in April 2023. Review methods The database was searched using a combination of keywords to identify studies reporting the use of NLP and chatbots in three main vascular diseases. Keywords used included Natural Language Processing, chatbot, chatGPT, aortic disease, carotid, peripheral artery disease, vascular, and vascular surgery. Results Given the heterogeneity of study design, techniques, and aims, a comprehensive literature review was performed to provide an overview of NLP applications in vascular surgery. By enabling the identification and extraction of information on patients with vascular diseases, such technology could help analyse data from healthcare information systems, provide feedback on current practice, and help optimise patient care. In addition, chatbots and NLP-driven techniques have the potential to be used as virtual assistants for both health professionals and patients. Conclusion While Artificial Intelligence and NLP technology could be used to enhance care for patients with vascular diseases, many challenges remain, including the need to define guidelines and a clear consensus on how to evaluate and validate these innovations before their implementation into clinical practice.
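A minimal sketch of the kind of keyword search the review describes, using NCBI's public E-utilities endpoint for MEDLINE/PubMed; the query string is illustrative, not the authors' exact search strategy.

```python
# PubMed keyword search sketch via NCBI E-utilities; the query is illustrative.
import requests

URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": '("natural language processing" OR chatbot) AND "vascular surgery"',
    "retmode": "json",
    "retmax": 20,
}
resp = requests.get(URL, params=params, timeout=30)
resp.raise_for_status()
result = resp.json()["esearchresult"]
print(result["count"], "records; first PMIDs:", result["idlist"][:5])
```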
Collapse
Affiliation(s)
- Fabien Lareyre
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, France
- Université Côte d'Azur, Inserm, U1065, C3M, Nice, France
| | - Bahaa Nasr
- Department of Vascular and Endovascular Surgery, Brest University Hospital, Brest, France
- INSERM, UMR 1101, LaTIM, Brest, France
| | - Arindam Chaudhuri
- Bedfordshire - Milton Keynes Vascular Centre, Bedfordshire Hospitals, NHS Foundation Trust, Bedford, UK
| | - Gilles Di Lorenzo
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, France
| | - Mathieu Carlier
- Department of Urology, University Hospital of Nice, Nice, France
| | - Juliette Raffort
- Université Côte d'Azur, Inserm, U1065, C3M, Nice, France
- Institute 3IA Côte d’Azur, Université Côte d’Azur, France
- Clinical Chemistry Laboratory, University Hospital of Nice, France
| |
Collapse
|
135
|
Kuroiwa T, Sarcon A, Ibara T, Yamada E, Yamamoto A, Tsukamoto K, Fujita K. The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study. J Med Internet Res 2023; 25:e47621. [PMID: 37713254 PMCID: PMC10541638 DOI: 10.2196/47621] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 05/17/2023] [Accepted: 08/17/2023] [Indexed: 09/16/2023] Open
Abstract
BACKGROUND Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. OBJECTIVE The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. METHODS Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as either correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and the reproducibility were calculated. The reproducibility between days and between raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. RESULTS The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, -0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases "essential," "recommended," "best," and "important" were used. Specifically, "essential" occurred in 4 out of 125, "recommended" in 12 out of 125, "best" in 6 out of 125, and "important" in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. CONCLUSIONS The accuracy and reproducibility of ChatGPT in self-diagnosing the five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP chatbot to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study.
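The reproducibility analysis relies on Fleiss' κ, which generalizes Cohen's κ to more than two raters. A minimal sketch using statsmodels is shown below; the toy ratings and category coding are assumptions, not the study's data.

```python
# Fleiss' kappa sketch with statsmodels; ratings below are hypothetical.
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = repeated questions (subjects), columns = raters; coded answers:
# 0 = correct, 1 = partially correct, 2 = incorrect, 3 = differential diagnosis
ratings = [
    [0, 0, 0],
    [0, 0, 1],
    [2, 2, 2],
    [0, 1, 0],
    [3, 3, 3],
]
table, _ = aggregate_raters(ratings)  # subjects x categories count table
print(round(fleiss_kappa(table), 2))  # 1.0 = perfect agreement, <= 0 = chance or worse
```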
Collapse
Affiliation(s)
- Tomoyuki Kuroiwa
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Orthopedic Surgery Research, Mayo Clinic, Rochester, MN, United States
| | - Aida Sarcon
- Department of Surgery, Mayo Clinic, Rochester, MN, United States
| | - Takuya Ibara
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Eriku Yamada
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Akiko Yamamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Kazuya Tsukamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Koji Fujita
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Medical Design Innovations, Open Innovation Center, Institute of Research Innovation, Tokyo Medical and Dental University, Tokyo, Japan
| |
Collapse
|
136
|
Talyshinskii A, Naik N, Hameed BMZ, Zhanbyrbekuly U, Khairli G, Guliev B, Juilebø-Jones P, Tzelves L, Somani BK. Expanding horizons and navigating challenges for enhanced clinical workflows: ChatGPT in urology. Front Surg 2023; 10:1257191. [PMID: 37744723 PMCID: PMC10512827 DOI: 10.3389/fsurg.2023.1257191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Purpose of review ChatGPT has emerged as a potential tool for facilitating doctors' workflows. However, few studies have examined its application in a urological context. Our objective was therefore to analyze the pros and cons of ChatGPT use and how urologists can make use of it. Recent findings ChatGPT can facilitate clinical documentation and note-taking, patient communication and support, medical education, and research. In urology, ChatGPT has shown potential as a virtual healthcare aide for benign prostatic hyperplasia, an educational and prevention tool for prostate cancer, an educational support for urological residents, and an assistant in writing urological papers and academic work. However, several concerns about its use have been raised, such as its lack of web crawling, the risk of accidental plagiarism, and patient-data privacy. Summary The existing limitations underscore the need for further improvement of ChatGPT, such as ensuring the privacy of patient data, expanding the learning dataset to include medical databases, and developing guidance on its appropriate use. Urologists can also help by conducting studies to determine the effectiveness of ChatGPT in urology in clinical scenarios and nosologies beyond those listed above.
Collapse
Affiliation(s)
- Ali Talyshinskii
- Department of Urology, Astana Medical University, Astana, Kazakhstan
| | - Nithesh Naik
- Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | | | | | - Gafur Khairli
- Department of Urology, Astana Medical University, Astana, Kazakhstan
| | - Bakhman Guliev
- Department of Urology, Mariinsky Hospital, St Petersburg, Russia
| | | | - Lazaros Tzelves
- Department of Urology, National and Kapodistrian University of Athens, Sismanogleion Hospital, Athens, Marousi, Greece
| | - Bhaskar Kumar Somani
- Department of Urology, University Hospital Southampton NHS Trust, Southampton, United Kingdom
| |
Collapse
|
137
|
Brameier DT, Alnasser AA, Carnino JM, Bhashyam AR, von Keudell AG, Weaver MJ. Artificial Intelligence in Orthopaedic Surgery: Can a Large Language Model "Write" a Believable Orthopaedic Journal Article? J Bone Joint Surg Am 2023; 105:1388-1392. [PMID: 37437021 DOI: 10.2106/jbjs.23.00473] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 07/14/2023]
Abstract
➢ Natural language processing with large language models is a subdivision of artificial intelligence (AI) that extracts meaning from text with use of linguistic rules, statistics, and machine learning to generate appropriate text responses. Its utilization in medicine and in the field of orthopaedic surgery is rapidly growing.
➢ Large language models can be utilized in generating scientific manuscript texts of a publishable quality; however, they suffer from AI hallucinations, in which untruths or half-truths are stated with misleading confidence. Their use raises considerable concerns regarding the potential for research misconduct and for hallucinations to insert misinformation into the clinical literature.
➢ Current editorial processes are insufficient for identifying the involvement of large language models in manuscripts. Academic publishing must adapt to encourage safe use of these tools by establishing clear guidelines for their use, which should be adopted across the orthopaedic literature, and by implementing additional steps in the editorial screening process to identify the use of these tools in submitted manuscripts.
Collapse
Affiliation(s)
- Devon T Brameier
- Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
| | - Ahmad A Alnasser
- Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
| | - Jonathan M Carnino
- Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts
| | - Abhiram R Bhashyam
- Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
| | - Arvind G von Keudell
- Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Bispebjerg Hospital, University of Copenhagen, Copenhagen, Denmark
| | - Michael J Weaver
- Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
| |
Collapse
|
138
|
Currie GM. Academic integrity and artificial intelligence: is ChatGPT hype, hero or heresy? Semin Nucl Med 2023; 53:719-730. [PMID: 37225599 DOI: 10.1053/j.semnuclmed.2023.04.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 04/30/2023] [Indexed: 05/26/2023]
Abstract
Academic integrity in both higher education and scientific writing has been challenged by developments in artificial intelligence. The limitations associated with algorithms have been largely overcome by the recently released ChatGPT, a chatbot powered by GPT-3.5 that is capable of producing accurate and human-like responses to questions in real time. Despite the potential benefits, ChatGPT confronts significant limitations to its usefulness in nuclear medicine and radiology. Most notably, ChatGPT is prone to errors and fabrication of information, which poses a risk to professionalism, ethics, and integrity. These limitations simultaneously undermine the value of ChatGPT to the user by not producing outcomes at the expected standard. Nonetheless, there are a number of exciting applications of ChatGPT in nuclear medicine across the education, clinical, and research sectors. Assimilation of ChatGPT into practice requires redefining norms and re-engineering information expectations.
Collapse
Affiliation(s)
- Geoffrey M Currie
- Charles Sturt University, Wagga Wagga, NSW, Australia; Baylor College of Medicine, Houston, TX.
| |
Collapse
|
139
|
Wang H, Wu W, Dou Z, He L, Yang L. Performance and exploration of ChatGPT in medical examination, records and education in Chinese: Pave the way for medical AI. Int J Med Inform 2023; 177:105173. [PMID: 37549499 DOI: 10.1016/j.ijmedinf.2023.105173] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 07/01/2023] [Accepted: 07/08/2023] [Indexed: 08/09/2023]
Abstract
BACKGROUND Although chat generative pre-trained transformer (ChatGPT) has made several successful attempts in the medical field, most notably in answering medical questions in English, no studies have evaluated ChatGPT's performance on a medical task in a Chinese context. OBJECTIVE The aim of this study was to evaluate ChatGPT's ability to understand medical knowledge in Chinese, as well as its potential to serve as an electronic health infrastructure for medical development, by evaluating its performance in medical examinations, records, and education. METHOD The Chinese (CNMLE) and English (ENMLE) datasets of the China National Medical Licensing Examination and the Chinese dataset (NEEPM) of the China National Entrance Examination for Postgraduate Clinical Medicine Comprehensive Ability were used to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4). We assessed answer accuracy, verbal fluency, and the classification of incorrect responses owing to hallucinations on multiple occasions. In addition, we tested ChatGPT's performance on discharge summaries and group learning in a Chinese context on a small scale. RESULTS The accuracy of GPT-3.5 in CNMLE, ENMLE, and NEEPM was 56% (56/100), 76% (76/100), and 62% (62/100), respectively, compared to that of GPT-4, which was 84% (84/100), 86% (86/100), and 82% (82/100). The verbal fluency of all the ChatGPT responses exceeded 95%. Among the GPT-3.5 incorrect responses, the proportions of open-domain hallucinations were 66% (29/44), 54% (14/24), and 63% (24/38), whereas close-domain hallucinations accounted for 34% (15/44), 46% (14/24), and 37% (14/38), respectively. By contrast, GPT-4 open-domain hallucinations accounted for 56% (9/16), 43% (6/14), and 83% (15/18), while close-domain hallucinations accounted for 44% (7/16), 57% (8/14), and 17% (3/18), respectively. In the discharge summary task, ChatGPT demonstrated logical coherence; however, GPT-3.5 could not fulfill the quality requirements, while GPT-4 met them in 60% (6/10) of cases. In group learning, verbal fluency and interaction satisfaction with ChatGPT were both 100% (10/10). CONCLUSION ChatGPT based on GPT-4 is on par with Chinese medical practitioners who passed the CNMLE and meets the standard required for admission to clinical medical graduate programs in China. GPT-4 shows promising potential for discharge summarization and group learning. Additionally, it shows high verbal fluency, resulting in a positive human-computer interaction experience. GPT-4 significantly improves multiple capabilities and reduces hallucinations compared with the previous GPT-3.5 model, with a particular leap forward in the Chinese comprehension capability for medical tasks. Artificial intelligence (AI) systems face the challenges of hallucinations, legal risks, and ethical issues. Nevertheless, we found that ChatGPT has the potential to promote medical development as an electronic health infrastructure, paving the way for medical AI to become a necessity.
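The reported accuracies are proportions out of 100 questions, so their uncertainty can be summarized with a binomial confidence interval. The sketch below recomputes the CNMLE and ENMLE accuracies from the abstract and adds Wilson 95% intervals, which the abstract itself does not report.

```python
# Accuracy with Wilson 95% CIs; counts are taken from the abstract above,
# but the intervals themselves are our addition.
from statsmodels.stats.proportion import proportion_confint

results = {
    "GPT-3.5 CNMLE": (56, 100),
    "GPT-4 CNMLE": (84, 100),
    "GPT-3.5 ENMLE": (76, 100),
    "GPT-4 ENMLE": (86, 100),
}
for label, (correct, total) in results.items():
    lo, hi = proportion_confint(correct, total, alpha=0.05, method="wilson")
    print(f"{label}: {correct / total:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```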
Collapse
Affiliation(s)
- Hongyan Wang
- Department of Pain Management, Xuanwu Hospital, Capital Medical University
| | - WeiZhen Wu
- Department of Anesthesia, China-Japan Union Hospital of Jilin University
| | - Zhi Dou
- Department of Pain Management, Xuanwu Hospital, Capital Medical University
| | - Liangliang He
- Department of Pain Management, Xuanwu Hospital, Capital Medical University
| | - Liqiang Yang
- Department of Pain Management, Xuanwu Hospital, Capital Medical University.
| |
Collapse
|
140
|
Patel V, Deleonibus A, Wells MW, Bernard SL, Schwarz GS. Distinguishing Authentic Voices in the Age of ChatGPT: Comparing AI-Generated and Applicant-Written Personal Statements for Plastic Surgery Residency Application. Ann Plast Surg 2023; 91:324-325. [PMID: 37566815 DOI: 10.1097/sap.0000000000003653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/13/2023]
Abstract
BACKGROUND ChatGPT, a generative artificial intelligence model, may be used by future applicants in the plastic surgery residency match. METHODS Ten personal statements (5 generated by ChatGPT, 5 written by applicants) were rated by 10 reviewers, blinded to the source of the essay. RESULTS A total of 100 evaluations were collected. There was no significant difference in ratings for readability, originality, authenticity, and overall quality (all P > 0.05) when comparing computer-generated and applicant essays. CONCLUSION Personal statements prepared by ChatGPT are indistinguishable from essays written by actual applicants. This finding suggests that the current plastic surgery application format should be reevaluated to better aid in the holistic evaluation of students.
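A minimal sketch of how such blinded ratings might be compared statistically, assuming a nonparametric Mann-Whitney U test; the abstract does not name its test, and the scores below are hypothetical.

```python
# Comparing blinded ratings of AI vs applicant essays; the test choice is an
# assumption and the 1-5 scores below are hypothetical.
from scipy.stats import mannwhitneyu

ai_ratings = [4, 3, 5, 4, 4, 3, 5, 4, 3, 4]
human_ratings = [4, 4, 3, 5, 4, 4, 3, 4, 5, 3]
stat, p = mannwhitneyu(ai_ratings, human_ratings, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")  # p > 0.05 would mirror the reported finding
```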
Collapse
Affiliation(s)
- Viren Patel
- From the Department of Plastic Surgery, Cleveland Clinic, Cleveland, OH
| | | | | | | | | |
Collapse
|
141
|
Panthier C, Gatinel D. Success of ChatGPT, an AI language model, in taking the French language version of the European Board of Ophthalmology examination: A novel approach to medical knowledge assessment. J Fr Ophtalmol 2023; 46:706-711. [PMID: 37537126 DOI: 10.1016/j.jfo.2023.05.006] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 05/27/2023] [Accepted: 05/31/2023] [Indexed: 08/05/2023]
Abstract
PURPOSE The purpose of this study was to evaluate the performance of ChatGPT, a cutting-edge artificial intelligence (AI) language model developed by OpenAI, in successfully completing the French language version of the European Board of Ophthalmology (EBO) examination and to assess its potential role in medical education and knowledge assessment. METHODS ChatGPT, based on the GPT-4 architecture, was exposed to a series of EBO examination questions in French, covering various aspects of ophthalmology. The AI's performance was evaluated by comparing its responses with the correct answers provided by ophthalmology experts. Additionally, the study assessed the time taken by ChatGPT to answer each question as a measure of efficiency. RESULTS ChatGPT achieved a 91% success rate on the EBO examination, demonstrating a high level of competency in ophthalmology knowledge and application. The AI provided correct answers across all question categories, indicating a strong understanding of basic sciences, clinical knowledge, and clinical management. The AI model also answered the questions rapidly, taking only a fraction of the time needed by human test-takers. CONCLUSION ChatGPT's performance on the French language version of the EBO examination demonstrates its potential to be a valuable tool in medical education and knowledge assessment. Further research is needed to explore optimal ways to implement AI language models in medical education and to address the associated ethical and practical concerns.
Collapse
Affiliation(s)
- C Panthier
- Department of Ophthalmology, Rothschild Foundation Hospital, 25, rue Manin, 75019 Paris, France; Center of Expertise and Research in Optics for Vision (CEROV), Paris, France
| | - D Gatinel
- Department of Ophthalmology, Rothschild Foundation Hospital, 25, rue Manin, 75019 Paris, France; Center of Expertise and Research in Optics for Vision (CEROV), Paris, France.
| |
Collapse
|
142
|
Gravel J, D’Amours-Gravel M, Osmanlliu E. Learning to Fake It: Limited Responses and Fabricated References Provided by ChatGPT for Medical Questions. MAYO CLINIC PROCEEDINGS. DIGITAL HEALTH 2023; 1:226-234. [PMID: 40206627 PMCID: PMC11975740 DOI: 10.1016/j.mcpdig.2023.05.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 04/11/2025]
Abstract
Objective To evaluate the quality of the answers and the references provided by ChatGPT for medical questions. Patients and Methods Three researchers asked ChatGPT 20 medical questions and prompted it to provide the corresponding references. The responses were evaluated for quality of content by medical experts using a verbal numeric scale ranging from 0% to 100%. These experts were the corresponding authors of the 20 articles from which the medical questions were derived. We planned to evaluate 3 references per response for their pertinence, but this was amended on the basis of preliminary results showing that most references provided by ChatGPT were fabricated. This experimental observational study was conducted in February 2023. Results ChatGPT provided responses ranging from 53 to 244 words in length and reported 2 to 7 references per answer. Seventeen of the 20 invited raters provided feedback. The raters reported limited quality of the responses, with a median score of 60% (first and third quartiles: 50% and 85%, respectively). In addition, they identified major (n=5) and minor (n=7) factual errors among the 17 evaluated responses. Of the 59 references evaluated, 41 (69%) were fabricated, although they appeared real. Most fabricated citations used names of authors with previous relevant publications, a title that seemed pertinent, and a credible journal format. Conclusion When asked multiple medical questions, ChatGPT provided answers of limited quality for scientific publication. More importantly, ChatGPT provided deceptively real references. Users of ChatGPT should pay particular attention to the references provided before integrating them into medical manuscripts.
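One practical safeguard against the fabricated references documented above is to resolve each cited DOI against the public Crossref API before submission. The sketch below illustrates this; it is our suggestion, not the verification method the authors used, and the second DOI is deliberately made up.

```python
# DOI screening sketch against the public Crossref API; a 404 response
# suggests the DOI does not resolve to a registered work.
import requests

def doi_exists(doi: str) -> bool:
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    return resp.status_code == 200

dois = [
    "10.1016/j.mcpdig.2023.05.004",  # real DOI (this very article)
    "10.9999/fake.2023.00001",       # deliberately made-up example
]
for doi in dois:
    print(doi, "->", "found" if doi_exists(doi) else "not found")
```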
Collapse
Affiliation(s)
- Jocelyn Gravel
- Department of Pediatric Emergency Medicine, CHU Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
| | | | - Esli Osmanlliu
- Division of Pediatric Emergency Medicine, Montréal Children Hospital, McGill University, Montréal, Québec, Canada
| |
Collapse
|
143
|
Alhasan K, Raina R, Jamal A, Temsah MH. Combining human and AI could predict nephrologies future, but should be handled with care. Acta Paediatr 2023; 112:1844-1848. [PMID: 37278392 DOI: 10.1111/apa.16867] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/07/2023]
Affiliation(s)
- Khalid Alhasan
- Pediatrics Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Solid Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
| | - Rupesh Raina
- Department of Nephrology, Cleveland Clinic Akron General and Akron Children's Hospital, Akron, Ohio, USA
| | - Amr Jamal
- Department of Family and Community Medicine, College of Medicine, King Saud University, Riyadh, Saudi Arabia
| | - Mohamad-Hani Temsah
- Pediatrics Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia
| |
Collapse
|
144
|
Lundin RM, Berk M, Østergaard SD. ChatGPT on ECT: Can Large Language Models Support Psychoeducation? J ECT 2023; 39:130-133. [PMID: 37310145 DOI: 10.1097/yct.0000000000000941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Affiliation(s)
- Robert M Lundin
- From the Barwon Health MHDAS, Change to Improve Mental Health (CHIME), University Hospital Geelong, Geelong, Victoria, Australia
| | - Michael Berk
- Deakin University, Institute for Mental and Physical Health and Clinical Translation (IMPACT)
| | | |
Collapse
|
145
|
Ho WLJ, Koussayer B, Sujka J. ChatGPT: Friend or foe in medical writing? An example of how ChatGPT can be utilized in writing case reports. SURGERY IN PRACTICE AND SCIENCE 2023; 14:100185. [PMID: 39845855 PMCID: PMC11749974 DOI: 10.1016/j.sipas.2023.100185] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2023] [Revised: 05/30/2023] [Accepted: 05/31/2023] [Indexed: 01/24/2025] Open
Abstract
ChatGPT is a chatbot built on a natural language processing model that can generate human-like responses to prompts. Despite its lack of domain-specific training, ChatGPT has demonstrated remarkable accuracy in interpreting clinical information. In this article, we aim to assess what role ChatGPT can serve in medical writing. We recruited a first-year medical student with no prior experience in writing case reports to write a case report on a complex surgery with the assistance of ChatGPT. After a thorough evaluation of its responses, we believe that ChatGPT is a powerful medical writing tool that can be used to generate summaries, proofread, and provide valuable medical insight. However, ChatGPT is not a substitute for a study author due to several significant limitations and should instead be used in conjunction with the author during the writing process. As the impact of natural language processing models such as ChatGPT grows, we suggest that guidelines be established on how to better utilize this technology to improve clinical research rather than prohibiting its usage outright.
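A minimal sketch of the assistant workflow the article describes, using the OpenAI Python SDK (v1 interface); the model name, prompt, and clinical snippet are assumptions, and any real note sent to an external API would first need de-identification and institutional approval.

```python
# Drafting-assistant sketch with the OpenAI Python SDK (v1 interface).
# Model name, prompt, and note are assumptions; real clinical text must be
# de-identified before being sent to any external API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
note = "De-identified operative note: laparoscopic cholecystectomy, uneventful recovery."
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model choice
    messages=[
        {"role": "system",
         "content": "You help draft case-report summaries for author review."},
        {"role": "user",
         "content": f"Summarize this note for a case report:\n{note}"},
    ],
)
print(response.choices[0].message.content)  # a draft the human author must verify
```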
Collapse
Affiliation(s)
- Wai Lone Jonathan Ho
- USF Health Morsani College of Medicine, 560 Channelside Dr, Tampa, FL 33602, United States
| | - Bilal Koussayer
- USF Health Morsani College of Medicine, 560 Channelside Dr, Tampa, FL 33602, United States
| | - Joseph Sujka
- USF Department of General Surgery 2 Tampa General Circle, 7th Floor Tampa, FL 33606, United States
| |
Collapse
|
146
|
Inojosa H, Gilbert S, Kather JN, Proschmann U, Akgün K, Ziemssen T. Can ChatGPT explain it? Use of artificial intelligence in multiple sclerosis communication. Neurol Res Pract 2023; 5:48. [PMID: 37649106 PMCID: PMC10469796 DOI: 10.1186/s42466-023-00270-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 07/20/2023] [Indexed: 09/01/2023] Open
Affiliation(s)
- Hernan Inojosa
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus, Technische Univesität Dresden, Fetscherstr. 74, 01307, Dresden, Germany
| | - Stephen Gilbert
- Else Kröner-Fresenius Center for Digital Health, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Jakob Nikolas Kather
- Else Kröner-Fresenius Center for Digital Health, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Undine Proschmann
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus, Technische Univesität Dresden, Fetscherstr. 74, 01307, Dresden, Germany
| | - Katja Akgün
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus, Technische Univesität Dresden, Fetscherstr. 74, 01307, Dresden, Germany
| | - Tjalf Ziemssen
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus, Technische Univesität Dresden, Fetscherstr. 74, 01307, Dresden, Germany.
| |
Collapse
|
147
|
Ibrahim H, Liu F, Asim R, Battu B, Benabderrahmane S, Alhafni B, Adnan W, Alhanai T, AlShebli B, Baghdadi R, Bélanger JJ, Beretta E, Celik K, Chaqfeh M, Daqaq MF, Bernoussi ZE, Fougnie D, Garcia de Soto B, Gandolfi A, Gyorgy A, Habash N, Harris JA, Kaufman A, Kirousis L, Kocak K, Lee K, Lee SS, Malik S, Maniatakos M, Melcher D, Mourad A, Park M, Rasras M, Reuben A, Zantout D, Gleason NW, Makovi K, Rahwan T, Zaki Y. Perception, performance, and detectability of conversational artificial intelligence across 32 university courses. Sci Rep 2023; 13:12187. [PMID: 37620342 PMCID: PMC10449897 DOI: 10.1038/s41598-023-38964-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2023] [Accepted: 07/18/2023] [Indexed: 08/26/2023] Open
Abstract
The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work, a possibility that has sparked ample discussion on the integrity of student evaluation processes in the age of artificial intelligence (AI). To date, it is unclear how such tools perform compared to students on university-level courses across various disciplines. Further, students' perspectives regarding the use of such tools in school work, and educators' perspectives on treating their use as plagiarism, remain unknown. Here, we compare the performance of the state-of-the-art tool, ChatGPT, against that of students on 32 university-level courses. We also assess the degree to which its use can be detected by two classifiers designed specifically for this purpose. Additionally, we conduct a global survey across five countries, as well as a more in-depth survey at the authors' institution, to discern students' and educators' perceptions of ChatGPT's use in school work. We find that ChatGPT's performance is comparable, if not superior, to that of students in a multitude of courses. Moreover, current AI-text classifiers cannot reliably detect ChatGPT's use in school work, due both to their propensity to classify human-written answers as AI-generated and to the relative ease with which AI-generated text can be edited to evade detection. Finally, there seems to be an emerging consensus among students to use the tool, and among educators to treat its use as plagiarism. Our findings offer insights that could guide policy discussions addressing the integration of artificial intelligence into educational frameworks.
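Detector reliability of the kind measured here is usually summarized by false-positive and false-negative rates over labeled samples. The sketch below shows the standard computation with scikit-learn; the labels and predictions are hypothetical, not the study's data.

```python
# False-positive/false-negative rates for an AI-text detector; labels and
# predictions below are hypothetical.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = AI-generated, 0 = human-written
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]  # hypothetical detector output
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print("False-positive rate (humans flagged as AI):", fp / (fp + tn))
print("False-negative rate (AI text missed):", fn / (fn + tp))
```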
Collapse
Affiliation(s)
- Hazem Ibrahim
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Fengyuan Liu
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Rohail Asim
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Balaraju Battu
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | | | - Bashar Alhafni
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Wifag Adnan
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Tuka Alhanai
- Division of Engineering, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Bedoor AlShebli
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Riyadh Baghdadi
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | | | - Elena Beretta
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Kemal Celik
- Division of Engineering, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Moumena Chaqfeh
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Mohammed F Daqaq
- Division of Engineering, New York University Abu Dhabi, Abu Dhabi, UAE
| | | | - Daryl Fougnie
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | | | - Alberto Gandolfi
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Andras Gyorgy
- Division of Engineering, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Nizar Habash
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - J Andrew Harris
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Aaron Kaufman
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | | | - Korhan Kocak
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Kangsan Lee
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Seungah S Lee
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Samreen Malik
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | | | - David Melcher
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Azzam Mourad
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Minsu Park
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Mahmoud Rasras
- Division of Engineering, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Alicja Reuben
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Dania Zantout
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Nancy W Gleason
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Kinga Makovi
- Division of Social Science, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Talal Rahwan
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE.
| | - Yasir Zaki
- Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE.
| |
Collapse
|
148
|
Watters C, Lemanski MK. Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer. Front Big Data 2023; 6:1224976. [PMID: 37680954 PMCID: PMC10482048 DOI: 10.3389/fdata.2023.1224976] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2023] [Accepted: 07/10/2023] [Indexed: 09/09/2023] Open
Abstract
ChatGPT, a new language model developed by OpenAI, has garnered significant attention in various fields since its release. This literature review provides an overview of early ChatGPT literature across multiple disciplines, exploring its applications, limitations, and ethical considerations. The review encompasses Scopus-indexed publications from November 2022 to April 2023 and includes 156 articles related to ChatGPT. The findings reveal a predominance of negative sentiment across disciplines, though subject-specific attitudes must be considered. The review highlights the implications of ChatGPT in many fields, including healthcare, raising concerns about employment opportunities and ethical considerations. While ChatGPT holds promise for improved communication, further research is needed to address its capabilities and limitations. This literature review provides insights into early research on ChatGPT, informing future investigations and practical applications of chatbot technology, as well as the development and usage of generative AI.
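A minimal sketch of lexicon-based sentiment scoring of the sort a review like this might apply to abstracts, assuming NLTK's VADER analyzer; the review does not specify its sentiment method, and the snippets are invented examples.

```python
# Lexicon-based sentiment sketch with NLTK's VADER; the method choice and
# the snippets are assumptions, not the review's actual pipeline.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon fetch
sia = SentimentIntensityAnalyzer()
abstract_snippets = [
    "ChatGPT shows promise for streamlining clinical documentation.",
    "The chatbot fabricated references and misled reviewers.",
]
for text in abstract_snippets:
    score = sia.polarity_scores(text)["compound"]  # -1 (negative) to +1 (positive)
    print(f"{score:+.2f}  {text}")
```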
Collapse
Affiliation(s)
- Casey Watters
- Faculty of Law, Bond University, Gold Coast, QLD, Australia
| | | |
Collapse
|
149
|
Emsley R. ChatGPT: these are not hallucinations - they're fabrications and falsifications. SCHIZOPHRENIA (HEIDELBERG, GERMANY) 2023; 9:52. [PMID: 37598184 PMCID: PMC10439949 DOI: 10.1038/s41537-023-00379-4] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Accepted: 07/18/2023] [Indexed: 08/21/2023]
|
150
|
Levin G, Meyer R, Yasmeen A, Yang B, Guigue PA, Bar-Noy T, Tatar A, Perelshtein Brezinov O, Brezinov Y. Chat Generative Pre-trained Transformer-written obstetrics and gynecology abstracts fool practitioners. Am J Obstet Gynecol MFM 2023; 5:100993. [PMID: 37127209 DOI: 10.1016/j.ajogmf.2023.100993] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/03/2023]
Affiliation(s)
- Gabriel Levin
- Division of Cardiology, Jewish General Hospital, McGill University, Montreal, QC, Canada.
| | - Raanan Meyer
- Division of Cardiology, Jewish General Hospital, McGill University, Montreal, QC, Canada
| | - Amber Yasmeen
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Quebec, Canada
| | - Bowen Yang
- Department of Gynecology and Obstetrics, West China Second University Hospital, Sichuan University, Chengdu 610041, China; Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu 610041, China
| | - Paul-Adrien Guigue
- Division of Cardiology, Jewish General Hospital, McGill University, Montreal, QC, Canada
| | - Tomer Bar-Noy
- Division of Cardiology, Jewish General Hospital, McGill University, Montreal, QC, Canada
| | - Angela Tatar
- Division of Cardiology, Jewish General Hospital, McGill University, Montreal, QC, Canada
| | | | - Yoav Brezinov
- Experimental Surgery, McGill University, Montreal, Quebec, Canada; Lady Davis Institute, Jewish General Hospital, Montreal, Quebec, Canada; Kaplan Medical Center, Hebrew University, Jerusalem, Israel
| |
Collapse
|