Trang A, Putman K, Savani D, Chatterjee D, Zhao J, Kamel P, Jeudy JJ, Parekh VS, Yi PH. Sociodemographic biases in a commercial AI model for intracranial hemorrhage detection.
Emerg Radiol 2024;
31:713-723. [PMID:
39034382 DOI:
10.1007/s10140-024-02270-w]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2024] [Accepted: 07/10/2024] [Indexed: 07/23/2024]
Abstract
PURPOSE
To evaluate whether a commercial AI tool for intracranial hemorrhage (ICH) detection on head CT exhibited sociodemographic biases.
METHODS
Our retrospective study reviewed 9736 consecutive, adult non-contrast head CT scans performed between November 2021 and February 2022 in a single healthcare system. Each CT scan was evaluated by a commercial ICH AI tool and a board-certified neuroradiologist; ground truth was defined as final radiologist determination of ICH presence/absence. After evaluating the AI tool's aggregate diagnostic performance, sub-analyses based on sociodemographic groups (age, sex, race, ethnicity, insurance status, and Area of Deprivation Index [ADI] scores) assessed for biases. χ2 test or Fisher's exact tests evaluated for statistical significance with p ≤ 0.05.
RESULTS
Our patient population was 50% female (mean age 60 ± 19 years). The AI tool had an aggregate accuracy of 93% [9060/9736], sensitivity of 85% [1140/1338], specificity of 94% [7920/ 8398], positive predictive value (PPV) of 71% [1140/1618] and negative predictive value (NPV) of 98% [7920/8118]. Sociodemographic biases were identified, including lower PPV for patients who were females (67.3% [62,441/656] vs. 72.7% [699/962], p = 0.02), Black (66.7% [454/681] vs. 73.2% [686/937], p = 0.005), non-Hispanic/non-Latino (69.7% [1038/1490] vs. 95.4% [417/437]), p = 0.009), and who had Medicaid/Medicare (69.9% [754/1078]) or Private (66.5% [228/343]) primary insurance (p = 0.003). Lower sensitivity was seen for patients in the third quartile of national (78.8% [241/306], p = 0.001) and state ADI scores (79.0% [22/287], p = 0.001).
CONCLUSIONS
In our healthcare system, a commercial AI tool had lower performance for ICH detection than previously reported and demonstrated several sociodemographic biases.
Collapse