1. Nield LS, Nguyen J, Nguyen E, Vallejo MC. Rate of AI-Generated Text in Medical School Applicants' Personal Comments Essays. J Gen Intern Med 2025;40:1936-1937. PMID: 39653995; PMCID: PMC12119419; DOI: 10.1007/s11606-024-09247-y.
Affiliation(s)
- Linda S Nield
- West Virginia University School of Medicine, Morgantown, WV, USA.
- Department of Medical Education, West Virginia University School of Medicine, PO Box 9111, Morgantown, WV, USA.
- John Nguyen
- West Virginia University School of Medicine, Morgantown, WV, USA
- Department of Ophthalmology and Visual Sciences, West Virginia University School of Medicine, Morgantown, WV, USA
- Emily Nguyen
- West Virginia University School of Medicine, Morgantown, WV, USA
- Manuel C Vallejo
- West Virginia University School of Medicine, Morgantown, WV, USA
- Department of Medical Education, West Virginia University School of Medicine, PO Box 9111, Morgantown, WV, USA
2. Hyatt JPK, Bienenstock EJ, Firetto CM, Woods ER, Comus RC. Using aggregated AI detector outcomes to eliminate false positives in STEM-student writing. Adv Physiol Educ 2025;49:486-495. PMID: 40105702; DOI: 10.1152/advan.00235.2024.
Abstract
Generative artificial intelligence (AI) large language models have become sufficiently accessible and user-friendly to assist students with course work, studying tactics, and written communication. AI-generated writing is almost indistinguishable from human-derived work. Instructors must rely on intuition/experience and, recently, assistance from online AI detectors to help them distinguish between student- and AI-written material. Here, we tested the veracity of AI detectors for writing samples from a fact-heavy, lower-division undergraduate anatomy and physiology course. Student participants (n = 190) completed three parts: a hand-written essay answering a prompt on the structure/function of the plasma membrane; creating an AI-generated answer to the same prompt; and a survey seeking participants' views on the quality of each essay as well as general AI use. Randomly selected (n = 50) participant-written and AI-generated essays were blindly uploaded onto four AI detectors; a separate and unique group of randomly selected essays (n = 48) was provided to human raters (n = 9) for classification assessment. For the majority of essays, human raters and the best-performing AI detectors (n = 3) similarly identified their correct origin (84-95% and 93-98%, respectively) (P > 0.05). Approximately 1.3% and 5.0% of the essays were detected as false positives (human writing incorrectly labeled as AI) by AI detectors and human raters, respectively. Surveys generally indicated that students viewed the AI-generated work as better than their own (P < 0.01). Using AI detectors in aggregate reduced the likelihood of detecting a false positive to nearly 0%, and this strategy was validated against human rater-labeled false positives. 
Taken together, our findings show that AI detectors, when used together, become a powerful tool to inform instructors.

NEW & NOTEWORTHY: We show how online artificial intelligence (AI) detectors can assist instructors in distinguishing between human- and AI-written work for written assignments. Although individual AI detectors may vary in their accuracy in correctly identifying the origin of written work, they are most effective when used in aggregate to inform instructors when human intuition gets it wrong. Using AI detectors for consensus detection reduces the false-positive rate to nearly zero.
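The aggregation strategy this abstract describes — labeling an essay AI-generated only when the detectors agree — can be sketched as follows. This is a minimal illustration of consensus detection, not the study's actual pipeline; the detector verdicts below are invented.

```python
def consensus_verdict(detector_flags):
    """Return 'AI' only when every detector flags the text.

    Requiring unanimity trades sensitivity for specificity: a human
    essay is falsely flagged only if *all* detectors err on it at once,
    which is what drives the aggregate false-positive rate toward zero.
    """
    return "AI" if all(detector_flags) else "human"

# Hypothetical verdicts from four detectors for two essays:
unanimous = [True, True, True, True]   # all four flag the essay
split = [True, False, True, True]      # one detector dissents
```

Here `consensus_verdict(unanimous)` yields `"AI"`, while a single dissent in `split` defaults the verdict to `"human"`.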
Affiliation(s)
- Jon-Philippe K Hyatt
- College of Integrative Sciences and Arts, Arizona State University, Tempe, Arizona, United States
- Elisa Jayne Bienenstock
- Watts College of Public Service and Community Solutions, Arizona State University, Tempe, Arizona, United States
- Carla M Firetto
- Mary Lou Fulton College for Teaching and Learning Innovation, Arizona State University, Tempe, Arizona, United States
- Elizabeth R Woods
- College of Integrative Sciences and Arts, Arizona State University, Tempe, Arizona, United States
- Robert C Comus
- College of Integrative Sciences and Arts, Arizona State University, Tempe, Arizona, United States
3. Ozdag Y, Mahmoud M, Klena JC, Grandizio LC. Artificial Intelligence in Personal Statements Within Orthopaedic Surgery Residency Applications. J Am Acad Orthop Surg 2025;33:554-560. PMID: 40101179; DOI: 10.5435/jaaos-d-24-01285.
Abstract
PURPOSE Artificial intelligence (AI) has been increasingly studied within medical education and clinical practice. At present, it remains uncertain whether AI is being used to write personal statements (PSs) for orthopaedic surgery residency applications. Our purpose was to analyze PSs submitted to our institution and determine the rate of AI utilization within these texts. METHODS Four groups were created for comparison: 100 PSs submitted before the release of ChatGPT (Chat Generative Pre-trained Transformer) (PRE-PS), 100 PSs submitted after ChatGPT's introduction (POST-PS), 10 AI-generated PSs (AI-PS), and 10 hybrid PSs (H-PS) containing both human-generated and AI-generated text. For each of the four groups, AI detection software (GPTZero) was used to quantify the percentage of human-generated text, "mixed" text, and AI-generated text. In addition, the detection software provided a level of confidence (highly confident, moderately confident, uncertain) with respect to the "final verdict" of human-generated versus AI-generated text. RESULTS The percentages of human-generated text in the PRE-PS, POST-PS, H-PS, and AI-PS groups were 94%, 93%, 28%, and 0%, respectively. All 200 PSs (100%) submitted to our program had a final verdict of "human" with verdict confidence of >90%. By contrast, all AI-generated statements (H-PS and AI-PS groups) had a final verdict of "AI." Verdict confidence for the AI-PS group was 100%. CONCLUSION Orthopaedic surgery residency applicants do not appear, at present, to be using AI to create the PSs included in their applications. AI detection software (GPTZero) appears able to accurately distinguish human-generated from AI-generated PSs for orthopaedic residency applications. Considering the increasing role and rapid development of AI software, future investigations should explore whether these results change over time.
As with orthopaedic journals, guidelines should be established for the use of AI in postgraduate training applications. LEVEL OF EVIDENCE V-Nonclinical.
Affiliation(s)
- Yagiz Ozdag
- From the Department of Orthopaedic Surgery, Geisinger Commonwealth School of Medicine, Geisinger Musculoskeletal Institute, Danville, PA
4. Kutler RB, Setzen SA, Tsai S, Rameau A. An Evaluation of Current Trends in AI-Generated Text in Otolaryngology Publications. Laryngoscope 2025. PMID: 40277459; DOI: 10.1002/lary.32202.
Abstract
OBJECTIVES Since the release of ChatGPT-4 in March 2023, the application of large language models (LLMs) in biomedical manuscript production has been widespread. GPT-modified text detectors, such as GPTZero, lack sensitivity and reliability and do not quantify the amount of AI-generated text. However, recent work has identified certain adjectives used more frequently by LLMs that can help identify and quantify LLM-modified text. The aim of this study is to use these adjectives to identify LLM-generated text in otolaryngology publications. STUDY DESIGN Meta-research. METHODS Twenty-five otolaryngology journals were studied between November 2022 and July 2024, encompassing 8751 published works. Articles from countries where ChatGPT-4 is not available were removed, yielding 7702 articles for study inclusion. These publications were analyzed using a Python script to determine the frequency of the top 100 adjectives disproportionately generated by ChatGPT-4. RESULTS A significant increase in the frequency of adjectives associated with GPT use was observed from November 2023 to July 2024 across all journals (p < 0.001), with a significant difference before and after the release of ChatGPT-4 in March 2023. Journals with higher impact factors had significantly lower usage of GPT-associated adjectives than those with lower impact factors (p < 0.001). There was no significant difference in GPT-associated adjective use by first authors with a doctoral degree versus those without. Publications by authors from English-speaking countries demonstrated significantly more frequent use of LLM-associated adjectives (p < 0.001). CONCLUSIONS This study suggests that ChatGPT use in otolaryngology manuscript production has increased significantly since the release of ChatGPT-4. Future research should aim to further characterize the landscape of AI-generated text in otolaryngology and to develop tools that encourage authors' transparency regarding the use of LLMs. LEVEL OF EVIDENCE NA.
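The counting approach this abstract describes — scanning article text for adjectives disproportionately favored by ChatGPT — can be sketched as below. The word list here is a short illustrative sample only; the study's actual top-100 adjective list is not reproduced, and this is not the authors' script.

```python
import re
from collections import Counter

# Illustrative sample only; the study used the top 100 adjectives
# disproportionately generated by ChatGPT-4 (list not reproduced here).
GPT_ASSOCIATED_ADJECTIVES = {"commendable", "meticulous", "intricate",
                             "notable", "versatile"}

def gpt_adjective_rate(text):
    """Return occurrences of GPT-associated adjectives per 1,000 words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    hits = sum(n for word, n in counts.items()
               if word in GPT_ASSOCIATED_ADJECTIVES)
    return 1000 * hits / len(words)
```

For example, `gpt_adjective_rate("A meticulous and intricate study")` counts 2 hits in 5 words, i.e., 400 per 1,000 words; comparing this rate across publication months is what reveals the trend the study reports.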
Affiliation(s)
- Rachel B Kutler
- Department of Otolaryngology - Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
- Sean A Setzen
- Department of Otolaryngology - Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
- Samantha Tsai
- Department of Otolaryngology - Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
- Anaïs Rameau
- Department of Otolaryngology - Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
5. Al-Rawas M, Qader OAJA, Othman NH, Ismail NH, Mamat R, Halim MS, Abdullah JY, Noorani TY. Identification of dental related ChatGPT generated abstracts by senior and young academicians versus artificial intelligence detectors and a similarity detector. Sci Rep 2025;15:11275. PMID: 40175423; PMCID: PMC11965432; DOI: 10.1038/s41598-025-95387-y.
Abstract
Several researchers have investigated the consequences of using ChatGPT in the education industry. Their findings raised doubts regarding the probable effects that ChatGPT may have on academia. As such, the present study aimed to assess the ability of three methods, namely (1) academicians (senior and young), (2) three AI detectors (GPT-2 output detector, Writefull GPT detector, and GPTZero), and (3) one plagiarism detector, to differentiate between human- and ChatGPT-written abstracts. A total of 160 abstracts were assessed by these three methods. Two senior and two young academicians used a newly developed rubric to assess the type and quality of 80 human-written and 80 ChatGPT-written abstracts. The results were statistically analysed using crosstabulation and chi-square analysis. Bivariate correlation and accuracy of the methods were assessed. The findings demonstrated that all three methods made a variety of incorrect assumptions. The level of academician experience may play a role in detection ability, with senior academician 1 demonstrating superior accuracy. The GPTZero AI and similarity detectors were very good at accurately identifying the abstracts' origin. In terms of abstract type, every variable correlated positively, except in the case of the similarity detector (p < 0.05). Human-AI collaboration may significantly benefit the identification of abstract origins.
Affiliation(s)
- Matheel Al-Rawas
- Prosthodontic Unit, School of Dental Sciences, Universiti Sains Malaysia, Health Campus, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Hospital Pakar Universiti Sains Malaysia, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Nurul Hanim Othman
- Prosthodontic Unit, School of Dental Sciences, Universiti Sains Malaysia, Health Campus, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Hospital Pakar Universiti Sains Malaysia, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Noor Huda Ismail
- Prosthodontic Unit, School of Dental Sciences, Universiti Sains Malaysia, Health Campus, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Hospital Pakar Universiti Sains Malaysia, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Rosnani Mamat
- Hospital Pakar Universiti Sains Malaysia, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Conservative Dentistry Unit, School of Dental Sciences, Universiti Sains Malaysia, Health Campus, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Mohamad Syahrizal Halim
- Hospital Pakar Universiti Sains Malaysia, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Conservative Dentistry Unit, School of Dental Sciences, Universiti Sains Malaysia, Health Campus, Kubang Kerian, Kota Bharu, Kelantan, Malaysia
- Johari Yap Abdullah
- Craniofacial Imaging Laboratory, School of Dental Sciences, Universiti Sains Malaysia, Health Campus, 16150 Kubang Kerian, Kota Bharu, Kelantan, Malaysia.
- Dental Research Unit, Center for Transdisciplinary Research (CFTR), Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, Tamil Nadu, India.
- Tahir Yusuf Noorani
- Hospital Pakar Universiti Sains Malaysia, Kubang Kerian, Kota Bharu, Kelantan, Malaysia.
- Conservative Dentistry Unit, School of Dental Sciences, Universiti Sains Malaysia, Health Campus, Kubang Kerian, Kota Bharu, Kelantan, Malaysia.
- Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, Tamil Nadu, India.
6. Stadler RD, Sudah SY, Moverman MA, Denard PJ, Duralde XA, Garrigues GE, Klifto CS, Levy JC, Namdari S, Sanchez-Sotelo J, Menendez ME. Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers. Arthroscopy 2025;41:916-924.e2. PMID: 38992513; DOI: 10.1016/j.arthro.2024.06.045.
Abstract
PURPOSE To evaluate the extent to which experienced reviewers can accurately discern between artificial intelligence (AI)-generated and original research abstracts published in the field of shoulder and elbow surgery and compare this with the performance of an AI detection tool. METHODS Twenty-five shoulder- and elbow-related articles published in high-impact journals in 2023 were randomly selected. ChatGPT was prompted with only the abstract title to create an AI-generated version of each abstract. The resulting 50 abstracts were randomly distributed to and evaluated by 8 blinded peer reviewers with at least 5 years of experience. Reviewers were tasked with distinguishing between original and AI-generated text. A Likert scale assessed reviewer confidence for each interpretation, and the primary reason guiding assessment of generated text was collected. AI output detector (0%-100%) and plagiarism (0%-100%) scores were evaluated using GPTZero. RESULTS Reviewers correctly identified 62% of AI-generated abstracts and misclassified 38% of original abstracts as being AI generated. GPTZero reported a significantly higher probability of AI output among generated abstracts (median, 56%; interquartile range [IQR], 51%-77%) compared with original abstracts (median, 10%; IQR, 4%-37%; P < .01). Generated abstracts scored significantly lower on the plagiarism detector (median, 7%; IQR, 5%-14%) relative to original abstracts (median, 82%; IQR, 72%-92%; P < .01). Correct identification of AI-generated abstracts was predominantly attributed to the presence of unrealistic data/values. The primary reason for misidentifying original abstracts as AI was attributed to writing style. CONCLUSIONS Experienced reviewers faced difficulties in distinguishing between human- and AI-generated research content within shoulder and elbow surgery.
The presence of unrealistic data facilitated correct identification of AI abstracts, whereas misidentification of original abstracts was often ascribed to writing style. CLINICAL RELEVANCE With rapidly increasing AI advancements, it is paramount that ethical standards of scientific reporting are upheld. It is therefore helpful to understand the ability of reviewers to identify AI-generated content.
Affiliation(s)
- Ryan D Stadler
- Rutgers Robert Wood Johnson Medical School, New Brunswick, New Jersey, U.S.A.
- Suleiman Y Sudah
- Department of Orthopaedic Surgery, Monmouth Medical Center, Monmouth, New Jersey, U.S.A
- Michael A Moverman
- Department of Orthopaedics, University of Utah School of Medicine, Salt Lake City, Utah, U.S.A
- Grant E Garrigues
- Midwest Orthopaedics at Rush University Medical Center, Chicago, Illinois, U.S.A
- Christopher S Klifto
- Department of Orthopaedic Surgery, Duke University School of Medicine, Durham, North Carolina, U.S.A
- Jonathan C Levy
- Levy Shoulder Center at Paley Orthopedic & Spine Institute, Boca Raton, Florida, U.S.A
- Surena Namdari
- Rothman Orthopaedic Institute at Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania, U.S.A
- Mariano E Menendez
- Department of Orthopaedics, University of California Davis, Sacramento, California, U.S.A
7. Seifert R, Hartman E, Wang K, Yildiz D. Authors must follow the editorial guidelines on the use of large language models in review papers. Naunyn-Schmiedeberg's Arch Pharmacol 2025. PMID: 40156609; DOI: 10.1007/s00210-025-04102-1.
Affiliation(s)
- Roland Seifert
- Institute of Pharmacology, Hannover Medical School, Carl-Neuberg-Str. 1, 30625, Hannover, Germany.
- Erik Hartman
- Department of Clinical Sciences, Lund University, Lund, Sweden
- KeWei Wang
- Department of Pharmacology, Qingdao University, Qingdao, China
- Daniela Yildiz
- Molecular Pharmacology, University of the Saarland, Saarbrücken, Germany
8. Kim J, Vajravelu BN. Assessing the Current Limitations of Large Language Models in Advancing Health Care Education. JMIR Form Res 2025;9:e51319. PMID: 39819585; PMCID: PMC11756841; DOI: 10.2196/51319.
Abstract
The integration of large language models (LLMs), as seen with the generative pretrained transformer series, into health care education and clinical management represents a transformative potential. The practical use of current LLMs in health care sparks great anticipation for new avenues, yet their embrace also elicits considerable concerns that necessitate careful deliberation. This study aims to evaluate the application of state-of-the-art LLMs in health care education, highlighting the following shortcomings as areas requiring significant and urgent improvements: (1) threats to academic integrity, (2) dissemination of misinformation and risks of automation bias, (3) challenges with information completeness and consistency, (4) inequity of access, (5) risks of algorithmic bias, (6) exhibition of moral instability, (7) technological limitations in plugin tools, and (8) lack of regulatory oversight in addressing legal and ethical challenges. Future research should focus on strategically addressing the persistent challenges of LLMs highlighted in this paper, opening the door for effective measures that can improve their application in health care education.
Affiliation(s)
- JaeYong Kim
- School of Pharmacy, Massachusetts College of Pharmacy and Health Sciences, Boston, MA, United States
- Bathri Narayan Vajravelu
- Department of Physician Assistant Studies, Massachusetts College of Pharmacy and Health Sciences, 179 Longwood Avenue, Boston, MA, 02115, United States, 1 6177322961
9. Lee JM. Strategies for integrating ChatGPT and generative AI into clinical studies. Blood Res 2024;59:45. PMID: 39718704; PMCID: PMC11668709; DOI: 10.1007/s44313-024-00045-3.
Abstract
Large language models, specifically ChatGPT, are revolutionizing clinical research by improving content creation and providing specific useful features. These technologies can transform clinical research, including data collection, analysis, interpretation, and results sharing. However, integrating these technologies into the academic writing workflow poses significant challenges. In this review, I investigated the integration of large-language model-based AI tools into clinical research, focusing on practical implementation strategies and addressing the ethical considerations associated with their use. Additionally, I provide examples of the safe and sound use of generative AI in clinical research and emphasize the need to ensure that AI-generated outputs are reliable and valid in scholarly writing settings. In conclusion, large language models are a powerful tool for organizing and expressing ideas efficiently; however, they have limitations. Writing an academic paper requires critical analysis and intellectual input from the authors. Moreover, AI-generated text must be carefully reviewed to reflect the authors' insights. These AI tools significantly enhance the efficiency of repetitive research tasks, although challenges related to plagiarism detection and ethical use persist.
Affiliation(s)
- Jeong-Moo Lee
- Department of Surgery, Division of HBP Surgery, Seoul National University Hospital, Seoul National University College of Medicine, 101 Daehak-ro, Jongno-Gu, Seoul, 03080, Republic of Korea.
10. Kocak Z. Publication Ethics in the Era of Artificial Intelligence. J Korean Med Sci 2024;39:e249. PMID: 39189714; PMCID: PMC11347185; DOI: 10.3346/jkms.2024.39.e249.
Abstract
The application of new technologies, such as artificial intelligence (AI), to science affects the way and methodology in which research is conducted. While the responsible use of AI brings many innovations and benefits to science and humanity, its unethical use poses a serious threat to scientific integrity and literature. Even in the absence of malicious use, the Chatbot output itself, as a software application based on AI, carries the risk of containing biases, distortions, irrelevancies, misrepresentations and plagiarism. Therefore, the use of complex AI algorithms raises concerns about bias, transparency and accountability, requiring the development of new ethical rules to protect scientific integrity. Unfortunately, the development and writing of ethical codes cannot keep up with the pace of development and implementation of technology. The main purpose of this narrative review is to inform readers, authors, reviewers and editors about new approaches to publication ethics in the era of AI. It specifically focuses on tips on how to disclose the use of AI in your manuscript, how to avoid publishing entirely AI-generated text, and current standards for retraction.
Affiliation(s)
- Zafer Kocak
- Department of Radiation Oncology, Trakya University School of Medicine, Edirne, Türkiye.
11. Alnaimat F, Al-Halaseh S, AlSamhori ARF. Evolution of Research Reporting Standards: Adapting to the Influence of Artificial Intelligence, Statistics Software, and Writing Tools. J Korean Med Sci 2024;39:e231. PMID: 39164055; PMCID: PMC11333804; DOI: 10.3346/jkms.2024.39.e231.
Abstract
Reporting standards are essential to health research as they improve accuracy and transparency. Over time, significant changes have occurred to the requirements for reporting research to ensure comprehensive and transparent reporting across a range of study domains and foster methodological rigor. The establishment of the Declaration of Helsinki, Consolidated Standards of Reporting Trials (CONSORT), Strengthening the Reporting of Observational Studies in Epidemiology (STROBE), and Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) are just a few of the historic initiatives that have increased research transparency. Through enhanced discoverability, statistical analysis facilitation, article quality enhancement, and language barrier reduction, artificial intelligence (AI), in particular large language models like ChatGPT, has transformed academic writing. However, problems with errors that could occur and the need for transparency while utilizing AI tools still exist. Modifying reporting rules to include AI-driven writing tools such as ChatGPT is ethically and practically challenging. In academic writing, precautions for truth, privacy, and responsibility are necessary due to concerns about biases, openness, data limits, and potential legal ramifications. The CONSORT-AI and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)-AI extensions expand the CONSORT and SPIRIT guidelines for AI clinical trials, and new checklists like METRICS and CLEAR help to promote transparency in AI studies. Responsible use of technology in research and writing software adoption requires interdisciplinary collaboration and ethical assessment. This study explores the impact of AI technologies, specifically ChatGPT, on past reporting standards and the need for revised guidelines for open, reproducible, and robust scientific publications.
Affiliation(s)
- Fatima Alnaimat
- Division of Rheumatology, Department of Internal Medicine, School of Medicine, University of Jordan, Amman, Jordan.
- Salameh Al-Halaseh
- Department of Internal Medicine, School of Medicine, University of Jordan, Amman, Jordan
12. Howard FM, Li A, Riffon MF, Garrett-Mayer E, Pearson AT. Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023. JCO Clin Cancer Inform 2024;8:e2400077. PMID: 38822755; PMCID: PMC11371107; DOI: 10.1200/cci.24.00077.
Abstract
PURPOSE Artificial intelligence (AI) models can generate scientific abstracts that are difficult to distinguish from the work of human authors. The use of AI in scientific writing and the performance of AI detection tools are poorly characterized. METHODS We extracted text from published scientific abstracts from the ASCO 2021-2023 Annual Meetings. Likelihood of AI content was evaluated by three detectors: GPTZero, Originality.ai, and Sapling. Optimal thresholds for AI content detection were selected using 100 abstracts from before 2020 as negative controls and 100 produced by OpenAI's GPT-3 and GPT-4 models as positive controls. Logistic regression was used to evaluate the association of predicted AI content with submission year and abstract characteristics, and adjusted odds ratios (aORs) were computed. RESULTS A total of 15,553 abstracts met inclusion criteria. Across detectors, abstracts submitted in 2023 were significantly more likely to contain AI content than those in 2021 (aORs ranging from 1.79 with Originality to 2.37 with Sapling). Online-only publication and lack of a clinical trial number were consistently associated with AI content. With optimal thresholds, 99.5%, 96%, and 97% of GPT-3/4-generated abstracts were identified by GPTZero, Originality, and Sapling, respectively, and no sampled abstracts from before 2020 were classified as AI-generated by the GPTZero and Originality detectors. Correlation between detectors was low to moderate, with Spearman correlation coefficients ranging from 0.14 for Originality and Sapling to 0.47 for Sapling and GPTZero. CONCLUSION There is an increasing signal of AI content in ASCO abstracts, coinciding with the growing popularity of generative AI models.
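The threshold-selection step this abstract describes — calibrating each detector against known-human negative controls and GPT-generated positive controls — can be sketched as follows. The scores and the zero-false-positive target below are invented for illustration; the study's actual detector scores and procedure are not reproduced here.

```python
def pick_threshold(neg_scores, pos_scores, max_false_positive_rate=0.0):
    """Return the lowest score cutoff whose false-positive rate on the
    negative controls does not exceed max_false_positive_rate, or None
    if no candidate cutoff satisfies the constraint."""
    candidates = sorted(set(neg_scores) | set(pos_scores))
    for cutoff in candidates:
        false_positives = sum(s >= cutoff for s in neg_scores)
        if false_positives / len(neg_scores) <= max_false_positive_rate:
            return cutoff
    return None

# Invented detector scores (probability of AI content, 0-1):
human_scores = [0.02, 0.05, 0.10, 0.08]  # pre-2020 abstracts (negatives)
ai_scores = [0.90, 0.95, 0.85, 0.99]     # GPT-generated abstracts (positives)

cutoff = pick_threshold(human_scores, ai_scores)
sensitivity = sum(s >= cutoff for s in ai_scores) / len(ai_scores)
```

With these toy scores the chosen cutoff is 0.85, which flags every positive control while leaving all negative controls below threshold; on real, overlapping score distributions the same procedure trades some sensitivity for the low false-positive rate the study reports.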
Affiliation(s)
- Frederick M. Howard
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL
- Anran Li
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL
- Mark F. Riffon
- Center for Research and Analytics, American Society of Clinical Oncology, Alexandria, VA
- Alexander T. Pearson
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL
13. Habibzadeh F. Plagiarism: A Bird's Eye View. J Korean Med Sci 2023;38:e373. PMID: 37987104; PMCID: PMC10659926; DOI: 10.3346/jkms.2023.38.e373.
Abstract
Plagiarism is among the most prevalent forms of misconduct reported in scientific writing and a common cause of article retraction in scholarly journals. Plagiarism of ideas is not acceptable by any means. However, plagiarism of text is a matter of debate from culture to culture. Herein, I wish to offer a bird's eye view of plagiarism, particularly plagiarism of text, in scientific writing. A text similarity score as a signal of text plagiarism is not an appropriate index, and an expert should examine the similarity with enough scrutiny. Text recycling in certain instances might be acceptable in scientific writing provided that the authors can correctly construe the text piece they borrowed. With the introduction of artificial intelligence-based tools that help authors write their manuscripts, the incidence of text plagiarism might increase. However, after a while, when a universal artificial unit takes over, no one will need to worry about text plagiarism, as the incentive to commit plagiarism will be abolished, I believe.
Affiliation(s)
- Farrokh Habibzadeh
- Past President, World Association of Medical Editors (WAME), Editorial Consultant, The Lancet, Associate Editor, Frontiers in Epidemiology.