Hassan M, Ayad M, Nembhard C, Hayes-Dixon A, Lin A, Janjua M, Franko J, Tee M. Artificial Intelligence Compared to Manual Selection of Prospective Surgical Residents. Journal of Surgical Education 2025;82:103308. [PMID: 39509905; DOI: 10.1016/j.jsurg.2024.103308]
Abstract
BACKGROUND
Artificial intelligence (AI) is gaining traction as a tool for selecting residency program applicants, with the aim of screening large numbers of applications while introducing objectivity and mitigating bias in a traditionally subjective process. This study compares applicants screened by AI software with those screened by a single Program Director (PD) for interview selection.
METHODS
A single PD at an ACGME-accredited, academic general surgery program screened applicants. A parallel screen of the same applicant pool was conducted with AI software configured by the same PD. Weighted preferences were assigned in the following order: personal statement, research, medical school ranking, letters of recommendation, personal qualities, board scores, graduate degree, geographic preference, past experiences, program signal, honor society membership, and multilingualism. Statistical analyses were conducted using chi-square tests, ANOVA, and independent two-sided t-tests.
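The abstract does not disclose the actual weights or the scoring mechanism used by the AI software; purely as an illustration, the ranked preference list above could be realized as a linear weighted score. Everything in the sketch below (the weighting scheme, feature encodings, and applicant fields) is an assumption, not the study's method:

```python
# Hypothetical sketch of rank-weighted applicant screening.
# The study's actual AI software, weights, and feature encodings
# are not described in the abstract; everything here is assumed.

# Preference order from the abstract, highest priority first.
PREFERENCES = [
    "personal_statement", "research", "medical_school_ranking",
    "letters_of_recommendation", "personal_qualities", "board_scores",
    "graduate_degree", "geographic_preference", "past_experiences",
    "program_signal", "honor_society", "multilingual",
]

# One simple (assumed) weighting: linearly decreasing by rank.
WEIGHTS = {name: len(PREFERENCES) - i for i, name in enumerate(PREFERENCES)}

def score(applicant: dict) -> float:
    """Weighted sum of 0-1 feature ratings (hypothetical encoding)."""
    return sum(WEIGHTS[f] * applicant.get(f, 0.0) for f in PREFERENCES)

def select_top(applicants: list[dict], n: int = 150) -> list[dict]:
    """Rank the pool and keep the top n, mirroring the 150 AI picks."""
    return sorted(applicants, key=score, reverse=True)[:n]
```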
RESULTS
Of 1235 applications, 144 were PD-selected and 150 were AI-selected (294 total selections). Twenty applicants (7.3%) were selected by both, yielding an analysis cohort of 274 unique prospective residents. We performed two analyses: 1) PD-selected vs. AI-selected vs. both, and 2) PD-selected vs. AI-selected with the overlapping applicants censored. In the first analysis, the AI selected significantly more White/Hispanic applicants (p < 0.001), fewer program signals (p < 0.001), more AOA honor society members (p = 0.016), and applicants with more publications (p < 0.001). With the overlapping applicants censored, the AI selected significantly more White/Hispanic applicants (p < 0.001), fewer program signals (p < 0.001), more US medical graduates (p = 0.027), fewer applicants needing visa sponsorship (p = 0.01), younger applicants (p = 0.024), applicants with higher USMLE Step 2 CK scores (p < 0.001), and applicants with more publications (p < 0.001).
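For readers checking the arithmetic: 144 PD picks plus 150 AI picks is 294 selections, and the 20 applicants chosen by both reduce this to 274 unique people, so the overlap is 20/274 ≈ 7.3%. A minimal sketch of that calculation and of one chi-square comparison follows; the 2x2 cell counts are invented for illustration, since the abstract reports only p-values:

```python
from scipy.stats import chi2_contingency

# Overlap arithmetic from the abstract: 144 PD picks + 150 AI picks,
# 20 selected by both -> 274 unique applicants in the analysis cohort.
pd_selected, ai_selected, both = 144, 150, 20
unique = pd_selected + ai_selected - both          # 274
print(f"overlap: {both / unique:.1%}")             # ~7.3%

# Chi-square comparison of a categorical trait between the two groups.
# Cell counts are HYPOTHETICAL; the study's per-group counts are not
# given in the abstract.
#                 trait present  trait absent
table = [[60, 64],    # PD-selected only (124 after censoring overlap)
         [95, 35]]    # AI-selected only (130 after censoring overlap)
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4g}")
```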
CONCLUSIONS
There was only a 7% overlap between PD-selected and AI-selected applicants for interview screening in the same applicant pool. Despite the AI software having been configured by the same PD, the two selected pools differed significantly. In its present state, AI may be used as a tool in resident application screening but should not completely replace human review. We recommend that each institution carefully evaluate the performance of any AI model in its own environment, as the model may alter the group of interviewees.