1
Karakash WJ, Avetisian H, Ragheb JM, Wang JC, Hah RJ, Alluri RK. Artificial Intelligence vs Human Authorship in Spine Surgery Fellowship Personal Statements: Can ChatGPT Outperform Applicants? Global Spine J 2025:21925682251344248. [PMID: 40392947 DOI: 10.1177/21925682251344248]
Abstract
Study Design: A comparative analysis of AI-generated vs human-authored personal statements for spine surgery fellowship applications. Objective: To assess whether evaluators could differentiate between ChatGPT- and human-authored personal statements and determine if AI-generated statements could outperform human-authored ones in quality metrics. Summary of Background Data: Personal statements are key in fellowship admissions, but the rise of AI tools like ChatGPT raises concerns about their use. While previous studies have examined AI-generated residency statements, their role in spine fellowship applications remains unexplored. Methods: Nine personal statements (4 ChatGPT-generated, 5 human-authored) were evaluated by 8 blinded reviewers (6 attending spine surgeons and 2 fellows). ChatGPT-4o was prompted to create statements focused on 4 unique experiences. Evaluators rated each for readability, originality, quality, and authenticity (0-100 scale), determined AI authorship, and indicated interview recommendations. Results: ChatGPT-authored statements scored higher in readability (65.69 vs 56.40, P = 0.016) and quality (63.00 vs 51.80, P = 0.004) but showed no differences in originality (P = 0.339) or authenticity (P = 0.256). Reviewers could not reliably distinguish AI from human authorship (P = 1.000). Interview recommendations favored ChatGPT-generated statements (84.4% vs 62.5%, OR: 3.24 [1.08-11.17], P = 0.045). Conclusion: ChatGPT can produce high-quality, indistinguishable spine fellowship personal statements that increase interview likelihood. These findings highlight the need for nuanced guidelines regarding AI use in application processes, particularly considering its potential role in expanding access to high-quality writing assistance and editing.
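For readers who want to see the mechanics behind a result like the odds ratio above, a minimal Python sketch follows. It is illustrative only: the 2x2 counts are back-calculated assumptions from the reported percentages (84.4% of 32 ChatGPT-statement reviews and 62.5% of 40 human-statement reviews, assuming all 8 reviewers rated every statement), and the Woolf log-odds interval is one common choice, not necessarily the method used in the paper.

```python
import numpy as np

# 2x2 table of interview recommendations; counts are assumptions back-calculated
# from the reported rates (27/32 for ChatGPT-authored, 25/40 for human-authored).
recommended = np.array([[27, 5],    # ChatGPT-authored: yes, no
                        [25, 15]])  # human-authored:   yes, no

odds_ratio = (recommended[0, 0] * recommended[1, 1]) / (recommended[0, 1] * recommended[1, 0])
se_log_or = np.sqrt((1.0 / recommended).sum())        # Woolf standard error of ln(OR)
lo, hi = np.exp(np.log(odds_ratio) + np.array([-1.96, 1.96]) * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # ~3.24, CI ~1.03-10.22
```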
Affiliation(s)
- William J Karakash
- Department of Orthopaedic Surgery, Keck School of Medicine at the University of Southern California, Los Angeles, CA, USA
- Henry Avetisian
- Department of Orthopaedic Surgery, Keck School of Medicine at the University of Southern California, Los Angeles, CA, USA
- Jonathan M Ragheb
- Department of Orthopaedic Surgery, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA, USA
- Jeffrey C Wang
- Department of Orthopaedic Surgery, Keck School of Medicine at the University of Southern California, Los Angeles, CA, USA
- Raymond J Hah
- Department of Orthopaedic Surgery, Keck School of Medicine at the University of Southern California, Los Angeles, CA, USA
- Ram K Alluri
- Department of Orthopaedic Surgery, Keck School of Medicine at the University of Southern California, Los Angeles, CA, USA
2
Verghese BG, Iyer C, Borse T, Cooper S, White J, Sheehy R. Modern artificial intelligence and large language models in graduate medical education: a scoping review of attitudes, applications & practice. BMC Med Educ 2025; 25:730. [PMID: 40394586 PMCID: PMC12093616 DOI: 10.1186/s12909-025-07321-5]
Abstract
BACKGROUND Artificial intelligence (AI) holds transformative potential for graduate medical education (GME), yet a comprehensive exploration of AI's applications, perceptions, and limitations in GME is lacking. OBJECTIVE To map the current literature on AI in GME through a scoping review, identifying prevailing perceptions, applications, and research gaps to inform future research, policy discussions, and educational practices. METHODS Following the Joanna Briggs Institute guidelines and the PRISMA-ScR checklist, a comprehensive search of multiple databases up to February 2024 was performed to include studies addressing AI interventions in GME. RESULTS Out of 1734 citations, 102 studies met the inclusion criteria, conducted across 16 countries, predominantly in North America (72), Asia (14), and Europe (6). Radiology had the highest number of publications (21), followed by general surgery (11) and emergency medicine (8). The majority of studies were published in 2023. Several key thematic areas emerged from the literature. Perceptions of AI in GME were initially mixed but have increasingly shifted toward a more favorable outlook, particularly as the benefits of AI integration in education become more apparent. In assessments, AI demonstrated the ability to differentiate between skill levels and offer meaningful feedback, and it has been effective in evaluating narrative comments to assess resident performance. In recruitment, AI tools have been applied to analyze letters of recommendation, applications, and personal statements, helping identify potential biases and improve equity in candidate selection. Furthermore, large language models consistently outperformed average candidates on board certification and in-training examinations, indicating their potential utility in standardized assessments. Finally, AI tools showed promise in enhancing clinical decision-making by supporting trainees with improved diagnostic accuracy and efficiency. CONCLUSIONS This scoping review provides a comprehensive overview of the applications and limitations of AI in GME but is limited by potential biases, study heterogeneity, and the evolving nature of AI.
Affiliation(s)
- Basil George Verghese
- Education for Health Professions Program, School of Education, Johns Hopkins University, 2800 N Charles St, Baltimore, MD, 21218, USA.
- Internal Medicine Residency Program, Rochester, NY, USA.
- Charoo Iyer
- West Virginia University, Morgantown, WV, USA
- Tanvi Borse
- Internal Medicine, Parkview Health, Fort Wayne, IN, USA
- Shiamak Cooper
- Internal Medicine, Rochester General Hospital, Rochester, NY, USA
- Jacob White
- Welch Medical Library, Johns Hopkins University, Baltimore, MD, USA
- Ryan Sheehy
- School of Medicine, University of Kansas Medical Center, Salina, KS campus, Kansas City, KS, USA
3
Genovese A, Borna S, Gomez-Cabello CA, Haider SA, Prabha S, Trabilsy M, Forte AJ. The Current Landscape of Artificial Intelligence in Plastic Surgery Education and Training: A Systematic Review. J Surg Educ 2025; 82:103519. [PMID: 40378641 DOI: 10.1016/j.jsurg.2025.103519]
Abstract
OBJECTIVE Artificial intelligence (AI) shows promise in surgery, but its role in plastic surgery education remains underexplored. This review evaluates the current landscape of AI in plastic surgery education. DESIGN A systematic search was conducted on August 11, 2024, across PubMed, CINAHL, IEEE, Scopus, Web of Science, and Google Scholar using terms related to AI, plastic surgery, and education. Original research articles focusing on AI in plastic surgery education were included, excluding correspondence, reviews, book chapters, theses, corrections, and non-peer-reviewed or non-English articles. Two investigators independently screened studies and synthesized data. ROBINS-I was used to assess bias. RESULTS Fifteen studies were included, with 13 evaluating large language models (LLMs) such as ChatGPT, Microsoft Bing, and Google Bard. ChatGPT-4 outperformed other models on In-Service Examinations (average score of 72.7%) and demonstrated potential as a teaching assistant in plastic surgery education. AI-generated personal statements were comparable to human-written ones. However, ChatGPT showed inaccuracies in generating surgical protocols. ChatGPT also generated qualitative predictions, forecasting survey results that indicated limited current use of AI in plastic surgery education but support for further AI research. One study combined ChatGPT with DALL-E 2, a generative model, to create acceptable educational images. Machine learning was used in one study to evaluate surgical skill and provide real-time feedback during liposuction. Nine studies had low risk of bias, while 6 had moderate risk. CONCLUSIONS AI demonstrates potential as an educational tool in plastic surgery. However, limitations of the evidence, such as AI model uncertainties, introduce ambiguity. While AI cannot replicate the expertise of seasoned surgeons, it shows promise for foundational learning and skill assessment. Developing authenticity guidelines and enhancing AI capabilities are essential for its effective, ethical integration into plastic surgery education.
Affiliation(s)
- Ariana Genovese
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, Florida
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, Florida
- Syed Ali Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, Florida
- Maissa Trabilsy
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, Florida
- Antonio Jorge Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, Florida; Center for Digital Health, Mayo Clinic, Rochester, Minnesota.
4
Ozdag Y, Mahmoud M, Klena JC, Grandizio LC. Artificial Intelligence in Personal Statements Within Orthopaedic Surgery Residency Applications. J Am Acad Orthop Surg 2025; 33:554-560. [PMID: 40101179 DOI: 10.5435/jaaos-d-24-01285]
Abstract
PURPOSE Artificial intelligence (AI) has been increasingly studied within medical education and clinical practice. At present, it remains uncertain whether AI is being used to write personal statements (PS) for orthopaedic surgery residency applications. Our purpose was to analyze PS submitted to our institution and determine the rate of AI utilization within these texts. METHODS Four groups were created for comparison: 100 PS submitted before the release of ChatGPT (PRE-PS), 100 PS submitted after the introduction of ChatGPT (POST-PS), 10 AI-generated PS (AI-PS), and 10 hybrid PS (H-PS) containing both human-generated and AI-generated text. For each of the four groups, AI detection software (GPTZero) was used to quantify the percentage of human-generated text, "mixed" text, and AI-generated text. In addition, the detection software provided a level of confidence (highly confident, moderately confident, uncertain) for the "final verdict" of human-generated versus AI-generated text. RESULTS The percentages of human-generated text in the PRE-PS, POST-PS, H-PS, and AI-PS groups were 94%, 93%, 28%, and 0%, respectively. All 200 PS (100%) submitted to our program had a final verdict of "human" with verdict confidence of >90%. By contrast, all AI-generated statements (H-PS and AI-PS groups) had a final verdict of "AI." Verdict confidence for the AI-PS group was 100%. CONCLUSION Orthopaedic surgery residency applicants do not appear, at present, to be using AI to create the PS included in their applications. AI detection software (GPTZero) appears able to accurately distinguish human-generated from AI-generated PS for orthopaedic residency applications. Considering the increasing role and development of AI software, future investigations should explore whether these results change over time. As with orthopaedic journals, guidelines should be established regarding the use of AI in postgraduate training applications. LEVEL OF EVIDENCE V-Nonclinical.
Affiliation(s)
- Yagiz Ozdag
- From the Department of Orthopaedic Surgery, Geisinger Commonwealth School of Medicine, Geisinger Musculoskeletal Institute, Danville, PA
5
Koga S, Du W. The Balance Between Personal Tone and AI-Generated Content in Academic Communication. Ann Surg Oncol 2025; 32:3447-3448. [PMID: 39827318 DOI: 10.1245/s10434-025-16903-y]
Affiliation(s)
- Shunsuke Koga
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA.
- Wei Du
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
6
Wihlidal JGJ, Wolter NE, Propst EJ, Lin V, Au M, Amin S, Siu JM. Generative AI in Otolaryngology Residency Personal Statement Writing: A Mixed-Methods Analysis. Laryngoscope 2025. [PMID: 40227955 DOI: 10.1002/lary.32188]
Abstract
OBJECTIVE Generative Artificial Intelligence (GAI) interfaces have rapidly integrated into various societal domains. Widespread accessibility of GAI for drafting personal statements poses challenges for evaluators to gauge writing ability and personal insight. This study aims to compare the quality of GAI-generated personal statements to those written by successful applicants in OHNS residency programs, via integration of statistical and qualitative thematic analyses. METHODS Personal statements were collected from successful OHNS residency applicants. Characteristic extraction from submitted statements was used to generate GAI-written personal statements using ChatGPT 4.0. All statements were blindly reviewed by 21 experienced evaluators on a 10-point Likert scale of authenticity, readability, personability, and overall quality. Thematic analysis of qualitative reviewer comments was conducted to extract deeper insights into evaluators' perceptions. Quantitative results were compared using independent t-tests, while thematic coding was performed inductively using NVivo software. RESULTS GAI-generated personal statements significantly outperformed applicant-written statements in all assessed domains, including authenticity (7.67 vs. 7.05, p = 0.002), readability (8.03 vs. 7.49, p = 0.002), personability (7.33 vs. 6.72, p = 0.004), and overall score (7.49 vs. 6.90, p = 0.005). Thematic analysis revealed that GAI statements were seen as "well-constructed but generic," while applicant statements were often "verbose and lacked focus." Additionally, reviewers noted concerns regarding personal insight and engagement in AI-generated statements. CONCLUSION GAI-generated personal statements were rated more favorably across all domains, raising critical questions about the future of personal statements in the residency application process. While AI in medical education continues to evolve, clear guidelines on its ethical use in residency applications are essential. LEVEL OF EVIDENCE N/A.
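The quantitative arm of this comparison reduces to independent-samples t-tests on reviewer ratings. A minimal sketch follows with invented placeholder ratings; Welch's correction (equal_var=False) is shown as a reasonable default and is not necessarily the study's exact procedure.

```python
from scipy import stats

# Illustrative 10-point Likert ratings for one domain (e.g., readability);
# these numbers are placeholders, not data from the study.
gai_ratings = [8, 9, 8, 7, 8, 9, 7, 8]
applicant_ratings = [7, 8, 7, 6, 8, 7, 7, 6]

# Welch's t-test avoids assuming equal variances between the two groups.
t_stat, p_value = stats.ttest_ind(gai_ratings, applicant_ratings, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```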
Affiliation(s)
- Jacob G J Wihlidal
- Department of Otolaryngology-Head and Neck Surgery, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Nikolaus E Wolter
- Department of Otolaryngology-Head and Neck Surgery, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Otolaryngology-Head and Neck Surgery, Hospital for Sick Children, Toronto, Ontario, Canada
- Evan J Propst
- Department of Otolaryngology-Head and Neck Surgery, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Otolaryngology-Head and Neck Surgery, Hospital for Sick Children, Toronto, Ontario, Canada
- Vincent Lin
- Department of Otolaryngology-Head and Neck Surgery, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Otolaryngology-Head and Neck Surgery, Sunnybrook Hospital, Toronto, Ontario, Canada
- Michael Au
- Department of Otolaryngology, McMaster Health Sciences, Hamilton, Ontario, Canada
- Shaunak Amin
- Department of Otolaryngology, University of Washington, Seattle, Washington, USA
- Jennifer M Siu
- Department of Otolaryngology-Head and Neck Surgery, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Otolaryngology-Head and Neck Surgery, Hospital for Sick Children, Toronto, Ontario, Canada
7
Farrell MJ, Wu TC, Raldow AC. A Letter of Recommendation Regarding Impersonal Personal Statements. N Engl J Med 2025; 392:1257-1259. [PMID: 40162627 DOI: 10.1056/nejmp2414494]
Affiliation(s)
- Matthew J Farrell
- Department of Radiation Oncology, University of California, Los Angeles, Los Angeles
- Trudy C Wu
- Department of Radiation Oncology, University of California, Los Angeles, Los Angeles
- Ann C Raldow
- Department of Radiation Oncology, University of California, Los Angeles, Los Angeles
8
Smith B, Ramadoss T, D'Amario V, Shoja MM, Rajput V, Cervantes J. Utilization and perception of generative artificial intelligence by medical students in residency applications. J Investig Med 2025; 73:338-344. [PMID: 39927515 DOI: 10.1177/10815589251322102]
Abstract
After completing medical school in the United States, most students apply to residency programs to progress in their training. The residency application process contains numerous writing sections, including the personal statement, curriculum vitae, and "impactful experiences" section. This study's purpose was to investigate the perceptions of third- and fourth-year medical students on generative artificial intelligence (GenAI) and its influence on the residency application process. We developed a 13-question survey using the REDCap application to explore participants' educational background, year in school, preferred medical specialty, and perception of current or potential use of GenAI within residency applications. More than half of the respondents had already used or planned to use GenAI for assistance in developing the personal statements for their applications. A considerable percentage (43.3%) would use GenAI to edit or modify a draft of the personal statement. More than half of survey participants believed that, in the future, GenAI may alter the significance program directors place on letters of recommendation (LORs) when deciding whom to interview and select. Our survey results indicate that a number of students are either using or are receptive to the idea of using GenAI to draft or refine certain components of their residency application, such as the personal statement and impactful experiences section. As the use of GenAI expands, in-person interactions in the evaluation of candidates may become increasingly critical, and although personal statements and LORs are currently significant components of the residency application, their future roles remain in question.
Affiliation(s)
- Blake Smith
- Dr. Kiran C. Patel College of Osteopathic Medicine, Nova Southeastern University, Fort Lauderdale, FL, USA
- Tanya Ramadoss
- Dr. Kiran C. Patel College of Allopathic Medicine, Nova Southeastern University, Fort Lauderdale, FL, USA
- Vanessa D'Amario
- Dr. Kiran C. Patel College of Osteopathic Medicine, Nova Southeastern University, Fort Lauderdale, FL, USA
- Mohammadali M Shoja
- Dr. Kiran C. Patel College of Allopathic Medicine, Nova Southeastern University, Fort Lauderdale, FL, USA
- Vijay Rajput
- Dr. Kiran C. Patel College of Allopathic Medicine, Nova Southeastern University, Fort Lauderdale, FL, USA
- Jorge Cervantes
- Dr. Kiran C. Patel College of Allopathic Medicine, Nova Southeastern University, Fort Lauderdale, FL, USA
9
Patel KA, Suriano CJ, Janis JE. The Role of ChatGPT in Personal Statements for Plastic Surgery Residency Applications: Program Directors' Perspective. Plast Reconstr Surg Glob Open 2025; 13:e6698. [PMID: 40242723 PMCID: PMC12002369 DOI: 10.1097/gox.0000000000006698]
Abstract
Background Personal statements are a required component of plastic surgery residency applications but can be extremely time- and labor-intensive. Artificial intelligence (AI) programs like ChatGPT can streamline personal statement writing, but their use, especially if undisclosed, can have ethical implications. This study elucidates the perspective of plastic surgery residency program directors (PDs) regarding the importance of personal statements in reviewing applicants and whether ChatGPT should be utilized. Methods An anonymous, 6-question multiple-choice survey was designed and administered in 3 rounds via REDCap to 120 current plastic surgery residency PDs. An additional email reminder was administered by the principal investigator. Data was collected and reported in aggregate. Results The survey response rate was 28.6%. Most PDs (73.5%) reported that personal statements were somewhat important in determining interviewees and the rank list; 85.3% of PDs were not confident in their ability to determine if ChatGPT was utilized. Additionally, 85.3% of residencies reported not utilizing AI-detection software, although 11.8% plan to implement one. Only 8.8% of PDs believed ChatGPT use to be ethically appropriate in all aspects of personal statement creation, whereas others believed it was only appropriate for brainstorming (11.8%), editing (14.7%), or writing (5.9%). Finally, 58.8% of PDs believed ChatGPT use to be unethical in all parts of personal statement creation. Conclusions The utilization of AI could have a profound impact on streamlining personal statement creation, but its use has many ethical implications. Currently, the majority of surveyed PDs feel the use of ChatGPT to be unethical in any form during personal statement writing.
Affiliation(s)
- Krishna A. Patel
- From the Department of General Surgery, OhioHealth Riverside Methodist Hospital, Columbus, OH
- Carly J. Suriano
- From the Department of General Surgery, OhioHealth Riverside Methodist Hospital, Columbus, OH
- Jeffrey E. Janis
- Department of Plastic and Reconstructive Surgery, Ohio State University, Wexner Medical Center, Columbus, OH
10
Aster A, Laupichler MC, Rockwell-Kollmann T, Masala G, Bala E, Raupach T. ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review. Med Sci Educ 2025; 35:555-567. [PMID: 40144083 PMCID: PMC11933646 DOI: 10.1007/s40670-024-02206-6]
Abstract
This review aims to provide a summary of all scientific publications on the use of large language models (LLMs) in medical education over the first year of their availability. A scoping literature review was conducted in accordance with the PRISMA recommendations for scoping reviews. Five scientific literature databases were searched using predefined search terms. The search yielded 1509 initial results, of which 145 studies were ultimately included. Most studies assessed LLMs' capabilities in passing medical exams. Some studies discussed advantages, disadvantages, and potential use cases of LLMs. Very few studies conducted empirical research, and many published studies lacked methodological rigor. We therefore propose a research agenda to improve the quality of studies on LLMs.
Affiliation(s)
- Alexandra Aster
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Matthias Carl Laupichler
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Tamina Rockwell-Kollmann
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Gilda Masala
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Ebru Bala
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Tobias Raupach
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
11
Ruiz AM, Kraus MB, Arendt KW, Schroeder DR, Sharpe EE. Artificial intelligence-created personal statements compared with applicant-written personal statements: a survey of obstetric anesthesia fellowship program directors in the United States. Int J Obstet Anesth 2025; 61:104293. [PMID: 39591877 DOI: 10.1016/j.ijoa.2024.104293]
Abstract
BACKGROUND A personal statement is a common requirement in medical residency and fellowship applications. Generative artificial intelligence may be used to create a personal statement for these applications. METHODS Two personal statements were created using OpenAI's Chat Generative Pre-trained Transformer (ChatGPT) and two applicant-written statements were collected. A survey was sent to obstetric anesthesia fellowship program directors in the United States to assess the perceived readability, authenticity, and originality of the four personal statements. In addition, the survey assessed perceptions of applicants who use artificial intelligence to write a personal statement, including their integrity, work ethic, reliability, intelligence, and English proficiency. RESULTS Surveyed fellowship directors could not accurately discern whether statements were applicant-written or artificial intelligence-generated. The artificial intelligence-generated personal statements were rated as more readable and original than the applicant-written statements. Most program directors were moderately or extremely concerned about the applicant's integrity, work ethic, and reliability if they suspected the applicant utilized ChatGPT. CONCLUSIONS Program directors could not accurately discern if the statements were written by a person or artificial intelligence and would have concerns about an applicant suspected of using artificial intelligence. Medical training programs may benefit from outlining their expectations regarding applicants' use of artificial intelligence.
Affiliation(s)
- A M Ruiz
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN, United States
- M B Kraus
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Phoenix, AZ, United States
- K W Arendt
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN, United States
- D R Schroeder
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- E E Sharpe
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN, United States.
12
Kouam JS, Pak TK, Montelongo Hernandez CE. Ethics of Using Artificial Intelligence for Medical Residency Personal Statements. Acad Psychiatry 2025; 49:46-47. [PMID: 39294328 DOI: 10.1007/s40596-024-02047-w]
Affiliation(s)
- Thomas Kun Pak
- University of Texas Southwestern Medical Center, Dallas, TX, USA.
13
Goodman MA, Lee AM, Schreck Z, Hollman JH. Human or Machine? A Comparative Analysis of Artificial Intelligence-Generated Writing Detection in Personal Statements. J Phys Ther Educ 2025:00001416-990000000-00149. [PMID: 39808529 DOI: 10.1097/jte.0000000000000396]
Abstract
INTRODUCTION This study examines the ability of human readers, recurrence quantification analysis (RQA), and an online artificial intelligence (AI) detection tool (GPTZero) to distinguish between AI-generated and human-written personal statements in physical therapist education program applications. REVIEW OF LITERATURE The emergence of large language models such as ChatGPT and Google Gemini has raised concerns about the authenticity of personal statements. Previous studies have reported varying degrees of success in detecting AI-generated text. SUBJECTS Data were collected from 50 randomly selected nonmatriculated individuals who applied to the Mayo Clinic School of Health Sciences Doctor of Physical Therapy Program during the 2021-2022 application cycle. METHODS Fifty personal statements from applicants were pooled with 50 Google Gemini-generated statements, then analyzed by 2 individuals, RQA, and GPTZero. RQA provided quantitative measures of lexical sophistication, whereas GPTZero used advanced machine learning algorithms to quantify AI-specific text characteristics. RESULTS Human raters demonstrated high agreement (κ = 0.92) and accuracy (97% and 99%). RQA parameters, particularly recurrence and max line, differentiated human- from AI-generated statements (areas under receiver operating characteristic [ROC] curve = 0.768 and 0.859, respectively). GPTZero parameters including simplicity, perplexity, and readability also differentiated human- from AI-generated statements (areas under ROC curve > 0.875). DISCUSSION AND CONCLUSION The study reveals that human raters, RQA, and GPTZero offer varying levels of accuracy in differentiating human-written from AI-generated personal statements. The findings could have important implications in academic admissions processes, where distinguishing between human- and AI-generated submissions is becoming increasingly important. Future research should explore integrating these methods to enhance the robustness and reliability of personal statement content evaluation across various domains. Three strategies for managing AI's role in applications (for applicants, governing organizations, and academic institutions) are provided to promote integrity and accountability in admission processes.
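Both detection metrics reported here, inter-rater agreement (Cohen's kappa) and area under the ROC curve for a detector feature, are straightforward to compute. The sketch below uses scikit-learn with toy labels and scores purely to show the mechanics; none of the values are study data.

```python
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# 1 = AI-generated, 0 = human-written; all values below are illustrative placeholders.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
rater_a = [1, 1, 1, 1, 0, 0, 0, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 0]

# Agreement between the two human readers (the study reports kappa = 0.92).
print("kappa:", cohen_kappa_score(rater_a, rater_b))

# Discrimination of one continuous detector feature (e.g., an RQA recurrence value
# or a GPTZero perplexity-style score), summarized as area under the ROC curve.
detector_score = [0.91, 0.82, 0.74, 0.40, 0.35, 0.48, 0.22, 0.30]
print("AUC:", roc_auc_score(truth, detector_score))
```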
Affiliation(s)
- Margaret A Goodman
- Program in Physical Therapy, Mayo Clinic School of Health Sciences, Mayo Clinic College of Medicine and Science, and Department of Physical Medicine and Rehabilitation, Mayo Clinic
- Anthony M Lee
- Program in Physical Therapy, Mayo Clinic School of Health Sciences, Mayo Clinic College of Medicine and Science, and Department of Physical Medicine and Rehabilitation, Mayo Clinic
- Zachary Schreck
- Program in Physical Therapy, Mayo Clinic School of Health Sciences, Mayo Clinic College of Medicine and Science, and Department of Physical Medicine and Rehabilitation, Mayo Clinic
- John H Hollman
- Program in Physical Therapy, Mayo Clinic School of Health Sciences, Mayo Clinic College of Medicine and Science, and Department of Physical Medicine and Rehabilitation, Mayo Clinic, Siebens 7-55, 200 First Street SW, Rochester, MN 55905 (corresponding author)
14
Gordon EB, Maxfield CM, French R, Fish LJ, Romm J, Barre E, Kinne E, Peterson R, Grimm LJ. Large Language Model Use in Radiology Residency Applications: Unwelcomed but Inevitable. J Am Coll Radiol 2025; 22:33-40. [PMID: 39299618 DOI: 10.1016/j.jacr.2024.08.027]
Abstract
OBJECTIVE This study explores radiology program directors' perspectives on the impact of large language model (LLM) use among residency applicants to craft personal statements. METHODS Eight program directors from the Radiology Residency Education Research Alliance participated in a mixed-methods study, which included a survey regarding impressions of artificial intelligence (AI)-generated personal statements and focus group discussions (July 2023). Each director reviewed four personal statement variations for five applicants, anonymized to author type: the original and three Chat Generative Pre-trained Transformer-4.0 (GPT) versions generated with varying prompts, aggregated for analysis. A 5-point Likert scale surveyed the writing quality, including voice, clarity, engagement, organization, and perceived origin of each statement. An experienced qualitative researcher facilitated focus group discussions. Data analysis was performed using a rapid analytic approach with a coding template capturing key areas related to residency applications. RESULTS GPT-generated statement ratings were more often average or worse in quality (56%, 268 of 475) than ratings of human-authored statements (29%, 45 of 160). Although reviewers were not confident in their ability to distinguish the origin of personal statements, they did so reliably and consistently, identifying the human-authored personal statements at 95% (38 of 40) as probably or definitely original. Focus group discussions highlighted the inevitable use of AI in crafting personal statements and concerns about its impact on the authenticity and the value of the personal statement in residency selections. Program directors were divided on the appropriate use and regulation of AI. DISCUSSION Radiology residency program directors rated LLM-generated personal statements as lower in quality and expressed concern about the loss of the applicant's voice but acknowledged the inevitability of increased AI use in the generation of application statements.
Affiliation(s)
- Emile B Gordon
- Department of Radiology, Duke University Health System, Durham, North Carolina; Department of Radiology, University of California San Diego, La Jolla, California.
- Charles M Maxfield
- Department of Radiology, Duke University Health System, Durham, North Carolina
- Robert French
- Department of Radiology, Duke University Health System, Durham, North Carolina
- Laura J Fish
- Duke Cancer Institute, Durham, North Carolina; Department of Family Medicine and Community Health, Duke University School of Medicine, Durham, North Carolina
- Jacob Romm
- Department of Radiology, Duke University Health System, Durham, North Carolina
- Emily Barre
- Department of Radiology, Duke University Health System, Durham, North Carolina
- Erica Kinne
- Department of Radiology, Loma Linda University Medical Center, Loma Linda, California
- Ryan Peterson
- Department of Radiology and Imaging Sciences, Emory University, Atlanta, Georgia
- Lars J Grimm
- Department of Radiology, Duke University Health System, Durham, North Carolina
15
Nair V, Nayak A, Ahuja N, Weng Y, Keet K, Hosamani P, Hom J. Comparing IM Residency Application Personal Statements Generated by GPT-4 and Authentic Applicants. J Gen Intern Med 2025; 40:124-126. [PMID: 38689120 PMCID: PMC11780005 DOI: 10.1007/s11606-024-08784-w]
Affiliation(s)
- Vishnu Nair
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA.
- Ashwin Nayak
- Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Neera Ahuja
- Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Yingjie Weng
- Quantitative Sciences Unit, Stanford University, Stanford, CA, USA
- Kevin Keet
- Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Poonam Hosamani
- Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Jason Hom
- Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA
16
Hostetter L, Kelm D, Nelson D. Ethics of Writing Personal Statements and Letters of Recommendations with Large Language Models. ATS Sch 2024; 5:486-491. [PMID: 39822218 PMCID: PMC11734674 DOI: 10.34197/ats-scholar.2024-0038ps]
Abstract
Large language models are becoming ubiquitous in the editing and generation of written content and are actively being explored for their use in medical education. The use of artificial intelligence (AI) engines to generate content in academic spaces is controversial and has been met with swift responses and guidance from academic journals and publishers regarding the appropriate use, or disclosure of the use, of AI engines in professional writing. To date, there is no guidance for applicants to graduate medical education programs on using AI engines to generate application content, primarily personal statements and letters of recommendation. In this Perspective, we review perceptions of using AI to generate application content, considerations for the impact of AI on holistic application review, ethical challenges regarding plagiarism, and AI text classifiers. Finally, we include recommendations to the graduate medical education community to provide guidance on the use of AI engines in applications and to maintain the integrity of the application process in graduate medical education.
Affiliation(s)
- Logan Hostetter
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota
- Diana Kelm
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota
- Darlene Nelson
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota
17
Whitrock JN, Pratt CG, Carter MM, Chae RC, Price AD, Justiniano CF, Van Haren RM, Silski LS, Quillin RC, Shah SA. Does using artificial intelligence take the person out of personal statements? We can't tell. Surgery 2024; 176:1610-1616. [PMID: 39299851 DOI: 10.1016/j.surg.2024.08.018]
Abstract
BACKGROUND Use of artificial intelligence to generate personal statements for residency is currently not permitted but is difficult to monitor. This study sought to evaluate the ability of surgical residency application reviewers to identify artificial intelligence-generated personal statements and to understand perceptions of this practice. METHODS Three personal statements were generated using ChatGPT, and 3 were written by medical students who previously matched into surgery residency. Blinded participants at a single institution were instructed to read all personal statements and identify which were generated by artificial intelligence; they then completed a survey exploring their opinions regarding artificial intelligence use. RESULTS Of the 30 participants, 50% were faculty (n = 15) and 50% were residents (n = 15). Overall, experience ranged from 0 to 20 years (median, 2 years; interquartile range, 1-6.25 years). Artificial intelligence-derived personal statements were identified correctly only 59% of the time, with 3 (10%) participants identifying all the artificial intelligence-derived personal statements correctly. Artificial intelligence-generated personal statements were labeled as the best 60% of the time and the worst 43.3% of the time. When asked whether artificial intelligence use should be allowed in personal statements writing, 66.7% (n = 20) said no and 30% (n = 9) said yes. When asked if the use of artificial intelligence would impact their opinion of an applicant, 80% (n = 24) said yes, and 20% (n = 6) said no. When survey questions and ability to identify artificial intelligence-generated personal statements were evaluated by faculty/resident status and experience, no differences were noted (P > .05). CONCLUSION This study shows that surgical faculty and residents cannot reliably identify artificial intelligence-generated personal statements and that concerns exist regarding the impact of artificial intelligence on the application process.
Affiliation(s)
- Jenna N Whitrock
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH.
- Catherine G Pratt
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Michela M Carter
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Ryan C Chae
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Adam D Price
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Carla F Justiniano
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH; Division of Colon and Rectal Surgery, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Robert M Van Haren
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH; Division of Cardiothoracic Surgery, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Latifa S Silski
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH; Division of Transplantation, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Ralph C Quillin
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH; Division of Transplantation, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Shimul A Shah
- Cincinnati Research in Outcomes and Safety in Surgery (CROSS) Research Group, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH; Division of Transplantation, Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
18
Lewis LS, Hartman AM, Brennan-Cook J, Felsman IC, Colbert B, Ledbetter L, Gedzyk-Nieman SA. Artificial Intelligence and Admissions to Health Professions Educational Programs: A Scoping Review. Nurse Educ 2024. [PMID: 39418331 DOI: 10.1097/nne.0000000000001753]
Abstract
BACKGROUND The use of large language models (LLMs) and artificial intelligence (AI) tools to prepare health professions admissions applications is increasing. These tools can improve writing significantly but raise ethical concerns about application authenticity. PURPOSE This scoping review explored the literature on use of AI by applicants applying to health professions programs and by admission reviewers. METHODS Following Joanna Briggs Institute and Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews guidelines, a search was conducted in multiple databases, which identified 1706 citations. After screening, 18 articles were included. RESULTS Articles included in the review focused on the (1) use of AI to screen applicants or predict ranking and interview invitations, (2) ethical implications of AI-generated personal statements, (3) potential to detect AI-generated applications, and (4) use of AI to write or analyze letters of reference. CONCLUSIONS AI tools can enhance the efficiency of the admissions review process, but clear guidelines are required to address ethical issues. Further research is needed, particularly in nursing education.
Affiliation(s)
- Lisa S Lewis
- Author Affiliations: Duke University School of Nursing, Durham, North Carolina (Drs Lewis, Hartman, Brennan-Cook, and Felsman, and Ms Colbert, and Dr Gedzyk-Nieman); and Duke University Medical Center Library, Durham, North Carolina (Ms Ledbetter)
19
Bellinger JR, Kwak MW, Ramos GA, Mella JS, Mattos JL. Quantitative Comparison of Chatbots on Common Rhinology Pathologies. Laryngoscope 2024; 134:4225-4231. [PMID: 38666768 DOI: 10.1002/lary.31470]
Abstract
OBJECTIVES Understanding the strengths and weaknesses of chatbots as a source of patient information is critical for providers in the rising artificial intelligence landscape. This study is the first to quantitatively analyze and compare four of the most widely used chatbots on the treatment of common pathologies in rhinology. METHODS Questions about the treatment of epistaxis, chronic sinusitis, sinus infection, allergic rhinitis, allergies, and nasal polyps were posed to the chatbots ChatGPT, ChatGPT Plus, Google Bard, and Microsoft Bing in May 2023. Individual responses were analyzed by reviewers for readability, quality, understandability, and actionability using validated scoring metrics. Accuracy and comprehensiveness were evaluated for each response by two experts in rhinology. RESULTS ChatGPT, Plus, Bard, and Bing had Flesch Reading Ease (FRE) scores of 33.17, 35.93, 46.50, and 46.32, respectively, indicating higher readability for Bard and Bing compared to ChatGPT (p = 0.003, p = 0.008) and Plus (p = 0.025, p = 0.048). ChatGPT, Plus, and Bard had mean DISCERN quality scores of 20.42, 20.89, and 20.61, respectively, which was higher than the score for Bing of 16.97 (p < 0.001). For understandability, ChatGPT and Bing had PEMAT scores of 76.67 and 66.61, respectively, which were lower than both Plus at 92.00 (p < 0.001, p < 0.001) and Bard at 92.67 (p < 0.001, p < 0.001). ChatGPT Plus had an accuracy score of 4.39, which was higher than ChatGPT (3.97, p = 0.118), Bard (3.72, p = 0.002), and Bing (3.19, p < 0.001). CONCLUSION In aggregate across the tested domains, our results suggest ChatGPT Plus and Google Bard are currently the most patient-friendly chatbots for the treatment of common pathologies in rhinology. LEVEL OF EVIDENCE N/A Laryngoscope, 134:4225-4231, 2024.
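The FRE metric cited in these results is a fixed formula: 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word). A rough, self-contained sketch is shown below; the regex-based syllable counter is a crude approximation, so its scores will differ somewhat from dedicated readability tools.

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Approximate Flesch Reading Ease using a naive vowel-group syllable count."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

# Higher scores indicate easier reading; patient-facing text often targets 60-70.
print(round(flesch_reading_ease(
    "Saline rinses may ease mild sinus symptoms. See a doctor if pain or fever persists."), 1))
```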
Affiliation(s)
- Jeffrey R Bellinger
- Department of Otolaryngology-Head and Neck Surgery, University of Virginia School of Medicine, Charlottesville, Virginia, U.S.A
- Minhie W Kwak
- Department of Otolaryngology-Head and Neck Surgery, University of Virginia School of Medicine, Charlottesville, Virginia, U.S.A
- Gabriel A Ramos
- Department of Otolaryngology-Head and Neck Surgery, University of Virginia School of Medicine, Charlottesville, Virginia, U.S.A
- Jeffrey S Mella
- Department of Otolaryngology-Head and Neck Surgery, University of Virginia School of Medicine, Charlottesville, Virginia, U.S.A
- Jose L Mattos
- Department of Otolaryngology-Head and Neck Surgery, University of Virginia School of Medicine, Charlottesville, Virginia, U.S.A
20
Crawford LM, Hendzlik P, Lam J, Cannon LM, Qi Y, DeCaporale-Ryan L, Wilson NA. Digital Ink and Surgical Dreams: Perceptions of Artificial Intelligence-Generated Essays in Residency Applications. J Surg Res 2024; 301:504-511. [PMID: 39042979 DOI: 10.1016/j.jss.2024.06.020]
Abstract
INTRODUCTION Large language models like Chat Generative Pre-Trained Transformer (ChatGPT) are increasingly used in academic writing. Faculty may consider the use of artificial intelligence (AI)-generated responses a form of cheating. We sought to determine whether general surgery residency faculty could distinguish AI-generated from human-written responses to a text prompt, hypothesizing that faculty would not be able to do so reliably. METHODS Ten essays were generated using a text prompt, "Tell us in 1-2 paragraphs why you are considering the University of Rochester for General Surgery residency" (current trainees: n = 5, ChatGPT: n = 5). Ten blinded faculty reviewers rated the essays (ten-point Likert scale) on desire to interview, relevance to the general surgery residency, and overall impression, and judged whether each was AI- or human-generated; scores and identification error rates were compared between the groups. RESULTS There were no differences between groups for %total points (ChatGPT 66.0 ± 13.5%, human 70.0 ± 23.0%, P = 0.508) or identification error rates (ChatGPT 40.0 ± 35.0%, human 20.0 ± 30.0%, P = 0.175). Except for one, all essays were identified incorrectly by at least two reviewers. Essays identified as human-generated received higher overall impression scores (area under the curve: 0.82 ± 0.04, P < 0.01). CONCLUSIONS Whether use of AI tools for academic purposes should constitute academic dishonesty is controversial. We demonstrate that human- and AI-generated essays are similar in quality, but there is bias against presumed AI-generated essays. Faculty are not able to reliably differentiate human- from AI-generated essays, so this bias may be misdirected. AI tools are becoming ubiquitous and their use is not easily detected. Faculty must expect these tools to play increasing roles in medical education.
Affiliation(s)
- Loralai M Crawford
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
- Peter Hendzlik
- School of Medicine and Dentistry, University of Rochester, Rochester, New York
- Justine Lam
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
- Lisa M Cannon
- Department of Surgery, University of Rochester Medical Center, Rochester, New York
- Yanjie Qi
- Department of Surgery, University of Rochester Medical Center, Rochester, New York
- Lauren DeCaporale-Ryan
- Department of Surgery, University of Rochester Medical Center, Rochester, New York; Department of Psychiatry, University of Rochester Medical Center, Rochester, New York
- Nicole A Wilson
- Department of Biomedical Engineering, University of Rochester, Rochester, New York; School of Medicine and Dentistry, University of Rochester, Rochester, New York; Department of Surgery, University of Rochester Medical Center, Rochester, New York; Department of Pediatrics, University of Rochester Medical Center, Rochester, New York.
21
|
Quinonez SC, Stewart DA, Banovic N. ChatGPT and Artificial Intelligence in Graduate Medical Education Program Applications. J Grad Med Educ 2024; 16:391-394. [PMID: 39148887 PMCID: PMC11324163 DOI: 10.4300/jgme-d-23-00823.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 08/17/2024] Open
Affiliation(s)
- Shane C. Quinonez
- Shane C. Quinonez, MD, is Clinical Associate Professor, Department of Pediatrics and Internal Medicine, and Associate Program Director, Pediatrics Residency Program, University of Michigan, Ann Arbor, Michigan, USA
| | - David A. Stewart
- David A. Stewart, MD, is Clinical Assistant Professor, Department of Pediatrics, and Associate Program Director, Pediatrics Residency Program, University of Michigan, Ann Arbor, Michigan, USA; and
| | - Nikola Banovic
- Nikola Banovic, PhD, is Associate Professor, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
| |
Collapse
|
22
|
Yokokawa D, Yanagita Y, Li Y, Yamashita S, Shikino K, Noda K, Tsukamoto T, Uehara T, Ikusaka M. For any disease a human can imagine, ChatGPT can generate a fake report. Diagnosis (Berl) 2024; 11:329-332. [PMID: 38386808 DOI: 10.1515/dx-2024-0007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024]
Affiliation(s)
- Daiki Yokokawa
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
| | - Yasutaka Yanagita
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
| | - Yu Li
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
| | - Shiho Yamashita
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
| | - Kiyoshi Shikino
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Department of Community-oriented Medical Education, Chiba University Graduate School of Medicine, Chiba, Japan
| | - Kazutaka Noda
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
| | - Tomoko Tsukamoto
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
| | - Takanori Uehara
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
| | - Masatomi Ikusaka
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
| |
Collapse
|
23
|
Collins S, Baker EB. Resident Recruitment in a New Era. Int Anesthesiol Clin 2024; 62:35-46. [PMID: 38855840 DOI: 10.1097/aia.0000000000000447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/11/2024]
Abstract
This chapter focuses on resident recruitment, recent US National Resident Matching Program changes, and their impact on the evaluation and ranking of applicants within the specialty of anesthesiology. Recruitment challenges are examined, along with program strategies and potential future directions. DEI initiatives within the recruitment process are also discussed.
Collapse
Affiliation(s)
- Stephen Collins
- Department of Anesthesiology, University of Virginia Health, Charlottesville, Virginia
| | - E Brooke Baker
- Division of Regional Anesthesiology and Acute Pain Medicine, Department of Anesthesiology and Critical Care Medicine; Chief, Faculty Affairs and DEI; Executive Physician for Claims Management, UNM Hospital System
| |
Collapse
|
24
|
Chen JX, Bowe S, Deng F. Residency Applications in the Era of Generative Artificial Intelligence. J Grad Med Educ 2024; 16:254-256. [PMID: 38882414 PMCID: PMC11173008 DOI: 10.4300/jgme-d-23-00629.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 06/18/2024] Open
Affiliation(s)
- Jenny X Chen
- is Assistant Professor, Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University, Baltimore, Maryland, USA
| | - Sarah Bowe
- is Associate Professor, Department of Otolaryngology-Head & Neck Surgery, San Antonio Uniformed Services Health Education Consortium, JBSA-Fort Sam Houston, Texas, USA; and
| | - Francis Deng
- is Assistant Professor, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| |
Collapse
|
25
|
González R, Poenaru D, Woo R, Trappey AF, Carter S, Darcy D, Encisco E, Gulack B, Miniati D, Tombash E, Huang EY. ChatGPT: What Every Pediatric Surgeon Should Know About Its Potential Uses and Pitfalls. J Pediatr Surg 2024; 59:941-947. [PMID: 38336588 DOI: 10.1016/j.jpedsurg.2024.01.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 12/30/2023] [Accepted: 01/09/2024] [Indexed: 02/12/2024]
Abstract
ChatGPT - currently the most popular generative artificial intelligence system - has been revolutionizing the world and healthcare since its release in November 2022. ChatGPT is a conversational chatbot that uses machine learning algorithms to enhance its replies based on user interactions and is a part of a broader effort to develop natural language processing that can assist people in their daily lives by understanding and responding to human language in a useful and engaging way. Thus far, many potential applications within healthcare have been described, despite its relatively recent release. This manuscript offers the pediatric surgical community a primer on this new technology and discusses some initial observations about its potential uses and pitfalls. Moreover, it introduces the perspectives of medical journals and surgical societies regarding the use of this artificial intelligence chatbot. As ChatGPT and other large language models continue to evolve, it is the responsibility of the pediatric surgery community to stay abreast of these changes and play an active role in safely incorporating them into our field for the benefit of our patients. LEVEL OF EVIDENCE: V.
Collapse
Affiliation(s)
- Raquel González
- Division of Pediatric Surgery, Johns Hopkins All Children's Hospital, 501 6th Avenue S, Saint Petersburg, FL, 33701, USA.
| | - Dan Poenaru
- McGill University, 5252 Boul. de Maisonneuve O., rm. 3E.05, Montréal, QC, H4A 3S5, Canada
| | - Russell Woo
- Department of Surgery, Division of Pediatric Surgery, University of Hawai'i, John A. Burns School of Medicine, 1319 Punahou Street, Suite 600, Honolulu, HI, 96826, USA
| | - A Francois Trappey
- Pediatric General and Thoracic Surgery, Brooke Army Medical Center, 3551 Roger Brooke Dr, Fort Sam Houston, TX, 78234, USA
| | - Stewart Carter
- Division of Pediatric Surgery, University of Louisville, Norton Children's Hospital, 315 East Broadway, Suite 565, Louisville, KY, 40202, USA
| | - David Darcy
- Golisano Children's Hospital, University of Rochester Medical Center, 601 Elmwood Avenue, Box SURG, Rochester, NY, 14642, USA
| | - Ellen Encisco
- Division of Pediatric General and Thoracic Surgery, Cincinnati Children's Hospital, 3333 Burnet Ave, Cincinnati, OH, 45229, USA
| | - Brian Gulack
- Rush University Medical Center, 1653 W Congress Parkway, Kellogg, Chicago, IL, 60612, USA
| | - Doug Miniati
- Department of Pediatric Surgery, Kaiser Permanente Roseville, 1600 Eureka Road, Building C, Suite C35, Roseville, CA, 95661, USA
| | - Edzhem Tombash
- Division of Pediatric General and Thoracic Surgery, Cincinnati Children's Hospital, 3333 Burnet Ave, Cincinnati, OH, 45229, USA
| | - Eunice Y Huang
- Vanderbilt University Medical Center, Monroe Carell Jr. Children's Hospital, 2200 Children's Way, Suite 7100, Nashville, TN, 37232, USA
| |
Collapse
|