1
Adleberg J, Benitez CL, Primiano N, Patel A, Mogel D, Kalra R, Adhia A, Berns M, Chin C, Tanghe S, Yi P, Zech J, Kohli A, Martin-Carreras T, Corcuera-Solano I, Huang M, Ngeow J. Fully Automated Measurement of the Insall-Salvati Ratio with Artificial Intelligence. J Imaging Inform Med 2024; 37:601-610. [PMID: 38343226 PMCID: PMC11031523 DOI: 10.1007/s10278-023-00955-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 09/17/2023] [Accepted: 09/19/2023] [Indexed: 04/20/2024]
Abstract
Patella alta (PA) and patella baja (PB) affect 1-2% of the world population, but are often underreported, leading to potential complications like osteoarthritis. The Insall-Salvati ratio (ISR) is commonly used to diagnose patellar height abnormalities. Artificial intelligence (AI) keypoint models show promising accuracy in measuring and detecting these abnormalities. An AI keypoint model was developed and validated to study the Insall-Salvati ratio on a random population sample of lateral knee radiographs. A keypoint model was trained and internally validated with 689 lateral knee radiographs from five sites in a multi-hospital urban healthcare system after IRB approval. A total of 116 lateral knee radiographs from a sixth site were used for external validation. Distance error (mm), Pearson correlation, and Bland-Altman plots were used to evaluate model performance. On a random sample of 2647 different lateral knee radiographs, the mean and standard deviation were used to calculate the normal distribution of ISR. The keypoint detection model had a mean distance error of 2.57 ± 2.44 mm on internal validation data and 2.73 ± 2.86 mm on external validation data. Pearson correlation between labeled and predicted Insall-Salvati ratios was 0.82 [95% CI 0.76-0.86] on internal validation and 0.75 [95% CI 0.66-0.82] on external validation. For the population sample of 2647 patients, the mean ISR was 1.11 ± 0.21. Patellar height abnormalities were underreported in radiology reports from the population sample. AI keypoint models consistently measure ISR on knee radiographs. Future models can enable radiologists to study musculoskeletal measurements on larger population samples and enhance our understanding of normal and abnormal ranges.
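As a rough illustration of the measurement described in this abstract, the sketch below computes an Insall-Salvati ratio from three keypoints and summarizes agreement between labeled and predicted ratios as a Bland-Altman bias with 95% limits of agreement. The ratio follows the standard definition (patellar tendon length divided by patellar length). The keypoint names, the `pixel_spacing_mm` parameter, and all function names are assumptions made for illustration; the abstract does not describe the model's keypoints or its code.

```python
import math
from statistics import mean, stdev


def distance_mm(p, q, pixel_spacing_mm=1.0):
    """Euclidean distance between two (x, y) pixel coordinates, in mm."""
    return math.hypot(q[0] - p[0], q[1] - p[1]) * pixel_spacing_mm


def insall_salvati_ratio(patella_superior, patella_inferior, tibial_tubercle,
                         pixel_spacing_mm=1.0):
    """ISR = patellar tendon length / patellar length.

    The three keypoints are hypothetical names for landmarks a keypoint
    model might predict on a lateral knee radiograph.
    """
    patella_length = distance_mm(patella_superior, patella_inferior,
                                 pixel_spacing_mm)
    tendon_length = distance_mm(patella_inferior, tibial_tubercle,
                                pixel_spacing_mm)
    if patella_length == 0:
        raise ValueError("patellar length is zero; keypoints coincide")
    return tendon_length / patella_length


def bland_altman(labeled, predicted):
    """Return (bias, lower limit, upper limit) for predicted minus labeled."""
    diffs = [p - l for l, p in zip(labeled, predicted)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # 95% limits of agreement
    return bias, bias - spread, bias + spread


# Made-up keypoints (in pixels) for a single radiograph.
isr = insall_salvati_ratio((0, 0), (0, 40), (0, 84))
print(round(isr, 2))
```

Because the ratio divides one length by another, the pixel spacing cancels out; it is kept as a parameter only so that the same distance helper could also report keypoint error in millimetres.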
Affiliation(s)
- J Adleberg
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- C L Benitez
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- N Primiano
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- A Patel
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- D Mogel
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- R Kalra
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- A Adhia
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- M Berns
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- C Chin
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- S Tanghe
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- P Yi
  - University of Maryland, Baltimore, MD, USA
- J Zech
  - Columbia University Medical Center, New York, NY, USA
- A Kohli
  - UT Southwestern, Dallas, TX, USA
- I Corcuera-Solano
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- M Huang
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- J Ngeow
  - Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2
Schweikhard FP, Kosanke A, Lange S, Kromrey ML, Mankertz F, Gamain J, Kirsch M, Rosenberg B, Hosten N. Doctor's Orders-Why Radiologists Should Consider Adjusting Commercial Machine Learning Applications in Chest Radiography to Fit Their Specific Needs. Healthcare (Basel) 2024; 12:706. [PMID: 38610129 PMCID: PMC11011470 DOI: 10.3390/healthcare12070706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 03/03/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024] Open
Abstract
This retrospective study evaluated a commercial deep learning (DL) software application for chest radiographs and explored its performance in different scenarios. A total of 477 patients (284 male, 193 female, mean age 61.4 (44.7-78.1) years) were included. For the reference standard, two radiologists performed independent readings on seven diseases, reporting 226 findings in 167 patients. An autonomous DL reading was performed separately and evaluated against the gold standard regarding accuracy, sensitivity and specificity using ROC analysis. The overall average AUC was 0.84 (95% CI 0.76-0.92) with an optimized DL sensitivity of 85% and specificity of 75.4%. The best results were seen in pleural effusion, with an AUC of 0.92 (0.885-0.955) and sensitivity and specificity of 86.4% each. The data also showed a significant influence of sex, age, and comorbidity on the level of agreement between gold standard and DL reading. In the exploratory analysis, about 40% of cases could be ruled out correctly when screening for only one specific disease with a sensitivity above 95%. For the combined reading of all abnormalities at once, only marginal workload reduction could be achieved due to insufficient specificity. DL applications like this one bear the prospect of autonomous comprehensive reporting on chest radiographs but for now require human supervision. Radiologists need to consider possible bias in certain patient groups, e.g., elderly and women. By adjusting their threshold values, commercial DL applications could already be deployed for a variety of tasks, e.g., ruling out certain conditions in screening scenarios and offering high potential for workload reduction.
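The threshold adjustment mentioned in this abstract can be sketched as follows: choose the highest score threshold that still keeps sensitivity at or above a target, then count how many cases fall below it and are truly negative (that is, correctly ruled out). This is a minimal sketch under stated assumptions, not the study's procedure; the function names, the 0.95 default, and the toy scores and labels are all hypothetical.

```python
def pick_rule_out_threshold(scores, labels, min_sensitivity=0.95):
    """Highest threshold whose sensitivity is still >= min_sensitivity.

    A case is flagged as positive when its score is >= threshold.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive case")
    best = None
    # Sensitivity can only fall as the threshold rises, so stop at the
    # first candidate that drops below the target.
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels)
                       if y == 1 and s >= threshold)
        if true_pos / positives >= min_sensitivity:
            best = threshold
        else:
            break
    return best


def rule_out_stats(scores, labels, threshold):
    """Return (sensitivity, specificity, fraction correctly ruled out)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, tn / len(labels)


# Toy data: model scores for one disease and reference-standard labels.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
threshold = pick_rule_out_threshold(scores, labels)
print(threshold, rule_out_stats(scores, labels, threshold))
```

Selecting the threshold on the same data used to report performance gives optimistic estimates; in practice the threshold would be fixed on one set of cases and evaluated on another.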
Affiliation(s)
- Frank Philipp Schweikhard
  - Institute for Diagnostic Radiology and Neuroradiology, University Medicine of Greifswald, 17475 Greifswald, Germany
- Anika Kosanke
  - Institute for Diagnostic Radiology and Neuroradiology, University Medicine of Greifswald, 17475 Greifswald, Germany
- Sandra Lange
  - Institute for Psychology, University of Greifswald, 17489 Greifswald, Germany
- Marie-Luise Kromrey
  - Institute for Diagnostic Radiology and Neuroradiology, University Medicine of Greifswald, 17475 Greifswald, Germany
- Fiona Mankertz
  - Institute for Diagnostic Radiology and Neuroradiology, University Medicine of Greifswald, 17475 Greifswald, Germany
- Julie Gamain
  - Institute for Diagnostic Radiology and Neuroradiology, University Medicine of Greifswald, 17475 Greifswald, Germany
- Michael Kirsch
  - Institute for Diagnostic Radiology and Neuroradiology, University Medicine of Greifswald, 17475 Greifswald, Germany
- Britta Rosenberg
  - Institute for Diagnostic Radiology and Neuroradiology, University Medicine of Greifswald, 17475 Greifswald, Germany
- Norbert Hosten
  - Institute for Diagnostic Radiology and Neuroradiology, University Medicine of Greifswald, 17475 Greifswald, Germany
3
Kumari V, Kumar N, Kumar K S, Kumar A, Skandha SS, Saxena S, Khanna NN, Laird JR, Singh N, Fouda MM, Saba L, Singh R, Suri JS. Deep Learning Paradigm and Its Bias for Coronary Artery Wall Segmentation in Intravascular Ultrasound Scans: A Closer Look. J Cardiovasc Dev Dis 2023; 10:485. [PMID: 38132653 PMCID: PMC10743870 DOI: 10.3390/jcdd10120485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 10/15/2023] [Accepted: 11/07/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND AND MOTIVATION: Coronary artery disease (CAD) has the highest mortality rate; therefore, its diagnosis is vital. Intravascular ultrasound (IVUS) is a high-resolution imaging solution that can image coronary arteries, but the diagnosis software via wall segmentation and quantification has been evolving. In this study, a deep learning (DL) paradigm was explored along with its bias.
METHODS: Using a PRISMA model, 145 best UNet-based and non-UNet-based methods for wall segmentation were selected and analyzed for their characteristics and scientific and clinical validation. This study computed the coronary wall thickness by estimating the inner and outer borders of the coronary artery IVUS cross-sectional scans. Further, the review explored the bias in the DL system for the first time when it comes to wall segmentation in IVUS scans. Three bias methods, namely (i) ranking, (ii) radial, and (iii) regional area, were applied and compared using a Venn diagram. Finally, the study presented explainable AI (XAI) paradigms in the DL framework.
FINDINGS AND CONCLUSIONS: UNet provides a powerful paradigm for the segmentation of coronary walls in IVUS scans due to its ability to extract automated features at different scales in encoders, reconstruct the segmented image using decoders, and embed the variants in skip connections. Most of the research was hampered by a lack of motivation for XAI and pruned AI (PAI) models. None of the UNet models met the criteria for bias-free design. For clinical assessment and settings, it is necessary to move from a paper-to-practice approach.
Affiliation(s)
- Vandana Kumari
  - School of Computer Science and Engineering, Galgotias University, Greater Noida 201310, India
- Naresh Kumar
  - Department of Applied Computational Science and Engineering, G L Bajaj Institute of Technology and Management, Greater Noida 201310, India
- Sampath Kumar K
  - School of Computer Science and Engineering, Galgotias University, Greater Noida 201310, India
- Ashish Kumar
  - School of CSET, Bennett University, Greater Noida 201310, India
- Sanagala S. Skandha
  - Department of CSE, CMR College of Engineering and Technology, Hyderabad 501401, India
- Sanjay Saxena
  - Department of Computer Science and Engineering, IIT Bhubaneswar, Bhubaneswar 751003, India
- Narendra N. Khanna
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi 110076, India
- John R. Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA 94574, USA
- Narpinder Singh
  - Department of Food Science and Technology, Graphic Era, Deemed to be University, Dehradun 248002, India
- Mostafa M. Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID 83209, USA
- Luca Saba
  - Department of Radiology, Azienda Ospedaliero Universitaria (A.O.U.), 09100 Cagliari, Italy
- Rajesh Singh
  - Department of Research and Innovation, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun 248007, India
- Jasjit S. Suri
  - Stroke Diagnostics and Monitoring Division, AtheroPoint™, Roseville, CA 95661, USA
  - Department of Computer Science & Engineering, Graphic Era, Deemed to be University, Dehradun 248002, India
  - Monitoring and Diagnosis Division, AtheroPoint™, Roseville, CA 95661, USA
4
Ungless EL, Ross B, Belle V. Potential Pitfalls With Automatic Sentiment Analysis: The Example of Queerphobic Bias. Soc Sci Comput Rev 2023; 41:2211-2229. [PMID: 38026543 PMCID: PMC10654032 DOI: 10.1177/08944393231152946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2023]
Abstract
Automated sentiment analysis can help efficiently detect trends in patients' moods, consumer preferences, political attitudes and more. Unfortunately, like many natural language processing techniques, sentiment analysis can show bias against marginalised groups. We illustrate this point by showing how six popular sentiment analysis tools respond to sentences about queer identities, expanding on existing work on gender, ethnicity and disability. We find evidence of bias against several marginalised queer identities, including in the two models from Google and Amazon that seem to have been subject to superficial debiasing. We conclude with guidance on selecting a sentiment analysis tool to minimise the risk of model bias skewing results.
Affiliation(s)
- Björn Ross
  - The University of Edinburgh, Scotland, UK
5
Ghotbi N. The Ethics of Emotional Artificial Intelligence: A Mixed Method Analysis. Asian Bioeth Rev 2023; 15:417-430. [PMID: 37808444 PMCID: PMC10555972 DOI: 10.1007/s41649-022-00237-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 11/19/2022] [Accepted: 11/22/2022] [Indexed: 12/03/2022] Open
Abstract
Emotions play a significant role in human relations, decision-making, and the motivation to act on those decisions. There are ongoing attempts to use artificial intelligence (AI) to read human emotions, and to predict human behavior or actions that may follow those emotions. However, a person's emotions cannot be easily identified, measured, and evaluated by others, including automated machines and algorithms run by AI. The ethics of emotional AI is under research, and this study examined the emotional variables as well as the perception of emotional AI in two large random groups of college students at an international university in Japan, with a heavy representation of Japanese, Indonesian, Korean, Chinese, Thai, Vietnamese, and other Asian nationalities. Surveys with multiple close-ended questions and an open-ended essay question regarding emotional AI were administered for quantitative and qualitative analysis, respectively. The results demonstrate how ethically questionable results may be obtained through affective computing and by searching for correlations in a variety of factors in collected data to classify individuals into certain categories and thus aggravate bias and discrimination. Nevertheless, the qualitative study of students' essays shows a rather optimistic view of the use of emotional AI, which helps underscore the need to increase awareness about the ethical pitfalls of AI technologies in the complex field of human emotions.
Affiliation(s)
- Nader Ghotbi
- College and Graduate School of Asia Pacific Studies, Ritsumeikan Asia Pacific University, Beppu City, Japan
6
Dehkharghanian T, Bidgoli AA, Riasatian A, Mazaheri P, Campbell CJV, Pantanowitz L, Tizhoosh HR, Rahnamayan S. Biased data, biased AI: deep networks predict the acquisition site of TCGA images. Diagn Pathol 2023; 18:67. [PMID: 37198691 DOI: 10.1186/s13000-023-01355-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 05/07/2023] [Indexed: 05/19/2023] Open
Abstract
BACKGROUND: Deep learning models applied to healthcare applications including digital pathology have been increasing their scope and importance in recent years. Many of these models have been trained on The Cancer Genome Atlas (TCGA) atlas of digital images, or use it as a validation source. One crucial factor that seems to have been widely ignored is the internal bias that originates from the institutions that contributed WSIs to the TCGA dataset, and its effects on models trained on this dataset.
METHODS: 8,579 paraffin-embedded, hematoxylin and eosin stained, digital slides were selected from the TCGA dataset. More than 140 medical institutions (acquisition sites) contributed to this dataset. Two deep neural networks (DenseNet121 and KimiaNet) were used to extract deep features at 20× magnification. DenseNet was pre-trained on non-medical objects. KimiaNet has the same structure but was trained for cancer type classification on TCGA images. The extracted deep features were later used to detect each slide's acquisition site, and also for slide representation in image search.
RESULTS: DenseNet's deep features could distinguish acquisition sites with 70% accuracy whereas KimiaNet's deep features could reveal acquisition sites with more than 86% accuracy. These findings suggest that there are acquisition site specific patterns that could be picked up by deep neural networks. It has also been shown that these medically irrelevant patterns can interfere with other applications of deep learning in digital pathology, namely image search. This study shows that there are acquisition site specific patterns that can be used to identify tissue acquisition sites without any explicit training. Furthermore, it was observed that a model trained for cancer subtype classification has exploited such medically irrelevant patterns to classify cancer types.
Digital scanner configuration and noise, tissue stain variation and artifacts, and source site patient demographics are among factors that likely account for the observed bias. Therefore, researchers should be cautious of such bias when using histopathology datasets for developing and training deep networks.
Affiliation(s)
- Taher Dehkharghanian
  - University Health Network, Toronto, ON, Canada
  - Department of Pathology and Molecular Medicine, Faculty of Health Science, McMaster University, Hamilton, ON, Canada
- Azam Asilian Bidgoli
  - Nature Inspired Computational Intelligence (NICI), Ontario Tech University, Oshawa, ON, Canada
  - Nature Inspired Computational Intelligence (NICI) Lab, Department of Engineering, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
  - Bharti School of Engineering and Computer Science, Laurentian University, Sudbury, ON, Canada
- Pooria Mazaheri
  - Nature Inspired Computational Intelligence (NICI), Ontario Tech University, Oshawa, ON, Canada
- Clinton J V Campbell
  - Department of Pathology and Molecular Medicine, Faculty of Health Science, McMaster University, Hamilton, ON, Canada
  - William Osler Health System, Brampton, ON, Canada
- H R Tizhoosh
  - KIMIA Lab, University of Waterloo, Waterloo, ON, Canada
  - Rhazes Lab, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Shahryar Rahnamayan
  - Nature Inspired Computational Intelligence (NICI), Ontario Tech University, Oshawa, ON, Canada
  - Nature Inspired Computational Intelligence (NICI) Lab, Department of Engineering, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
7
Khanna NN, Maindarkar MA, Viswanathan V, Fernandes JFE, Paul S, Bhagawati M, Ahluwalia P, Ruzsa Z, Sharma A, Kolluri R, Singh IM, Laird JR, Fatemi M, Alizad A, Saba L, Agarwal V, Sharma A, Teji JS, Al-Maini M, Rathore V, Naidu S, Liblik K, Johri AM, Turk M, Mohanty L, Sobel DW, Miner M, Viskovic K, Tsoulfas G, Protogerou AD, Kitas GD, Fouda MM, Chaturvedi S, Kalra MK, Suri JS. Economics of Artificial Intelligence in Healthcare: Diagnosis vs. Treatment. Healthcare (Basel) 2022; 10. [PMID: 36554017 DOI: 10.3390/healthcare10122493] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/25/2022] [Revised: 12/03/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
Motivation: The price of medical treatment continues to rise due to (i) an increasing population; (ii) an aging population; (iii) disease prevalence; (iv) a rise in the frequency of patients that utilize health care services; and (v) an increase in prices. Objective: Artificial Intelligence (AI) is already well-known for its superiority in various healthcare applications, including the segmentation of lesions in images, speech recognition, smartphone personal assistants, navigation, ride-sharing apps, and many more. Our study is based on two hypotheses: (i) AI offers more economic solutions compared to conventional methods; (ii) AI treatment offers stronger economics compared to AI diagnosis. This novel study aims to evaluate AI technology in the context of healthcare costs, namely in the areas of diagnosis and treatment, and then compare it to the traditional or non-AI-based approaches. Methodology: PRISMA was used to select the best 200 studies for AI in healthcare with a primary focus on cost reduction, especially towards diagnosis and treatment. We defined the diagnosis and treatment architectures, investigated their characteristics, and categorized the roles that AI plays in the diagnostic and therapeutic paradigms. We experimented with various combinations of different assumptions by integrating AI and then comparing it against conventional costs. Lastly, we dwell on four powerful future concepts of AI, namely, pruning, bias, explainability, and regulatory approvals of AI systems. Conclusions: The model shows tremendous cost savings using AI tools in diagnosis and treatment. The economics of AI can be improved by incorporating pruning, reduction in AI bias, explainability, and regulatory approvals.
8
Khanna NN, Maindarkar MA, Viswanathan V, Puvvula A, Paul S, Bhagawati M, Ahluwalia P, Ruzsa Z, Sharma A, Kolluri R, Krishnan PR, Singh IM, Laird JR, Fatemi M, Alizad A, Dhanjil SK, Saba L, Balestrieri A, Faa G, Paraskevas KI, Misra DP, Agarwal V, Sharma A, Teji JS, Al-Maini M, Nicolaides A, Rathore V, Naidu S, Liblik K, Johri AM, Turk M, Sobel DW, Miner M, Viskovic K, Tsoulfas G, Protogerou AD, Mavrogeni S, Kitas GD, Fouda MM, Kalra MK, Suri JS. Cardiovascular/Stroke Risk Stratification in Diabetic Foot Infection Patients Using Deep Learning-Based Artificial Intelligence: An Investigative Study. J Clin Med 2022; 11. [PMID: 36431321 DOI: 10.3390/jcm11226844] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/10/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 11/22/2022] Open
Abstract
A diabetic foot infection (DFI) is among the most serious, incurable, and costly to treat conditions. The presence of a DFI renders machine learning (ML) systems extremely nonlinear, posing difficulties in CVD/stroke risk stratification. In addition, there is a limited number of well-explained ML paradigms due to comorbidity, sample size limits, and weak scientific and clinical validation methodologies. Deep neural networks (DNN) are potent machines for learning that generalize nonlinear situations. The objective of this article is to propose a novel investigation of deep learning (DL) solutions for predicting CVD/stroke risk in DFI patients. The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) search strategy was used for the selection of 207 studies. We hypothesize that a DFI is responsible for increased morbidity and mortality due to the worsening of atherosclerotic disease and affecting coronary artery disease (CAD). Since surrogate biomarkers for CAD, such as carotid artery disease, can be used for monitoring CVD, we can thus use a DL-based model, namely, Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) for CVD/stroke risk prediction in DFI patients, which combines covariates such as office and laboratory-based biomarkers, carotid ultrasound image phenotype (CUSIP) lesions, along with the DFI severity. We confirmed the viability of CVD/stroke risk stratification in the DFI patients. Strong designs were found in the research of the DL architectures for CVD/stroke risk stratification. Finally, we analyzed the AI bias and proposed strategies for the early diagnosis of CVD/stroke in DFI patients. Since DFI patients have an aggressive atherosclerotic disease, leading to prominent CVD/stroke risk, we, therefore, conclude that the DL paradigm is very effective for predicting the risk of CVD/stroke in DFI patients.
9
Adleberg J, Wardeh A, Doo FX, Marinelli B, Cook TS, Mendelson DS, Kagen A. Predicting Patient Demographics From Chest Radiographs With Deep Learning. J Am Coll Radiol 2022; 19:1151-1161. [PMID: 35964688 DOI: 10.1016/j.jacr.2022.06.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/09/2022] [Revised: 06/13/2022] [Accepted: 06/21/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND: Deep learning models are increasingly informing medical decision making, for instance, in the detection of acute intracranial hemorrhage and pulmonary embolism. However, many models are trained on medical image databases that poorly represent the diversity of the patients they serve. In turn, many artificial intelligence models may not perform as well on assisting providers with important medical decisions for underrepresented populations.
PURPOSE: Assessment of the ability of deep learning models to classify the self-reported gender, age, self-reported ethnicity, and insurance status of an individual patient from a given chest radiograph.
METHODS: Models were trained and tested with 55,174 radiographs in the MIMIC Chest X-ray (MIMIC-CXR) database. External validation data came from two separate databases, one from CheXpert and another from a multihospital urban health care system after institutional review board approval. Macro-averaged area under the curve (AUC) values were used to evaluate performance of models. Code used for this study is open-source and available at https://github.com/ai-bias/cxr-bias, and pixelstopatients.com/models/demographics.
RESULTS: Accuracy of models to predict gender was nearly perfect, with 0.999 (95% confidence interval: 0.99-0.99) AUC on held-out test data and 0.994 (0.99-0.99) and 0.997 (0.99-0.99) on external validation data. There was high accuracy to predict age and ethnicity, ranging from 0.854 (0.80-0.91) to 0.911 (0.88-0.94) AUC, and moderate accuracy to predict insurance status, with AUC ranging from 0.705 (0.60-0.81) on held-out test data to 0.675 (0.54-0.79) on external validation data.
CONCLUSIONS: Deep learning models can predict the age, self-reported gender, self-reported ethnicity, and insurance status of a patient from a chest radiograph. Visualization techniques are useful to ensure deep learning models function as intended and to demonstrate anatomical regions of interest. These models can be used to ensure that training data are diverse, thereby ensuring artificial intelligence models that work on diverse populations.
Affiliation(s)
- Jason Adleberg
  - Department of Radiology, Mount Sinai Health System, New York, New York
- Amr Wardeh
  - Department of Radiology, Upstate University Hospital, Syracuse, New York
- Florence X Doo
  - Department of Radiology, Mount Sinai Health System, New York, New York
- Brett Marinelli
  - Department of Radiology, Mount Sinai Health System, New York, New York
- Tessa S Cook
  - Director, 3D and Advanced Imaging Laboratory and Director, Center for Practice Transformation in Radiology, Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- David S Mendelson
  - Vice Chair, Informatics, Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York
- Alexander Kagen
  - Site Chair, Department of Radiology, Mount Sinai West and Mount Sinai St. Luke's Hospitals, Icahn School of Medicine at Mount Sinai, New York, New York