1
Weber S, Wyszynski M, Godefroid M, Plattfaut R, Niehaves B. How do medical professionals make sense (or not) of AI? A social-media-based computational grounded theory study and an online survey. Comput Struct Biotechnol J 2024;24:146-159. PMID: 38434249; PMCID: PMC10904922; DOI: 10.1016/j.csbj.2024.02.009.
Abstract
To investigate the opinions and attitudes of medical professionals toward adopting AI-enabled healthcare technologies in their daily work, we used a mixed-methods approach. Study 1 employed a qualitative computational grounded theory approach, analyzing 181 Reddit threads from the r/medicine subreddit. Using an unsupervised machine learning clustering method, we identified three key themes: (1) consequences of AI, (2) the physician-AI relationship, and (3) a proposed way forward. Reddit posts related to the first two themes indicated that medical professionals' fear of being replaced by AI, and skepticism toward AI, played a major role in the arguments. Moreover, the results suggest that this fear is driven by little or moderate knowledge about AI. Posts related to the third theme focused on factual discussions about how AI and medicine must be designed to become broadly adopted in health care. Study 2 quantitatively examined the relationship between fear of AI, knowledge about AI, and medical professionals' intention to use AI-enabled technologies in more detail. Results based on a sample of 223 medical professionals who participated in the online survey revealed that the intention to use AI technologies increases with knowledge about AI and that this effect is moderated by the fear of being replaced by AI.
Affiliation(s)
- Sebastian Weber
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
- Marc Wyszynski
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
- Marie Godefroid
- University of Siegen, Information Systems, Kohlbettstr. 15, 57072 Siegen, Germany
- Ralf Plattfaut
- University of Duisburg-Essen, Information Systems and Transformation Management, Universitätsstr. 9, 45141 Essen, Germany
- Bjoern Niehaves
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
2
Chen H, Ma X, Rives H, Serpedin A, Yao P, Rameau A. Trust in Machine Learning Driven Clinical Decision Support Tools Among Otolaryngologists. Laryngoscope 2024;134:2799-2804. PMID: 38230948; DOI: 10.1002/lary.31260.
Abstract
BACKGROUND Machine learning driven clinical decision support tools (ML-CDST) are on the verge of being integrated into clinical settings, including in Otolaryngology-Head & Neck Surgery. In this study, we investigated whether such CDST may influence otolaryngologists' diagnostic judgement. METHODS Otolaryngologists were recruited virtually across the United States for this experiment on human-AI interaction. Participants were shown 12 different video-stroboscopic exams from patients with previously diagnosed laryngopharyngeal reflux or vocal fold paresis and asked to determine the presence of disease. They were then exposed to a random diagnosis purportedly resulting from an ML-CDST and given the opportunity to revise their diagnosis. The ML-CDST output was presented with no explanation, a general explanation, or a specific explanation of its logic. The ML-CDST impact on diagnostic judgement was assessed with McNemar's test. RESULTS Forty-five participants were recruited. When participants reported less confidence (268 observations), they were significantly (p = 0.001) more likely to change their diagnostic judgement after exposure to ML-CDST output compared to when they reported more confidence (238 observations). Participants were more likely to change their diagnostic judgement when presented with a specific explanation of the CDST logic (p = 0.048). CONCLUSIONS Our study suggests that otolaryngologists are susceptible to accepting ML-CDST diagnostic recommendations, especially when less confident. Otolaryngologists' trust in ML-CDST output is increased when accompanied with a specific explanation of its logic. LEVEL OF EVIDENCE 2 Laryngoscope, 134:2799-2804, 2024.
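McNemar's test, used above to assess whether exposure to ML-CDST output changed diagnostic judgement, compares paired binary outcomes and depends only on the discordant counts of the 2x2 table. A minimal sketch of the exact (binomial) form, with made-up counts for illustration rather than the study's data:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact McNemar's test p-value for paired binary data.

    b and c are the discordant cell counts of the 2x2 table
    (cases where the two paired conditions disagree). Under H0
    the discordant outcomes are Binomial(b + c, 0.5), so the
    two-sided p-value doubles the smaller tail.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # cap at 1 for symmetric tables

# Hypothetical counts: 15 clinicians changed their diagnosis only
# after seeing the AI output, 4 changed it only without it.
print(round(mcnemar_exact(15, 4), 4))
```

With these invented counts the asymmetry is unlikely under chance, which is the same logic the study applies to its 268 low-confidence and 238 high-confidence observations.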
Affiliation(s)
- Hannah Chen
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Xiaoyue Ma
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medical College, New York, New York, USA
- Hal Rives
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Aisha Serpedin
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Peter Yao
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Anaïs Rameau
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
3
Kotter E, Pinto Dos Santos D. [Ethics and artificial intelligence]. Radiologie (Heidelb) 2024;64:498-502. PMID: 38499692; DOI: 10.1007/s00117-024-01286-0.
Abstract
The introduction of artificial intelligence (AI) into radiology promises to enhance efficiency and improve diagnostic accuracy, yet it also raises manifold ethical questions. These include data protection issues, the future role of radiologists, liability when using AI systems, and the avoidance of bias. To prevent data bias, the datasets need to be compiled carefully and to be representative of the target population. Accordingly, the upcoming European Union AI Act sets particularly high requirements for the datasets used in training medical AI systems. Cognitive bias occurs when radiologists place too much trust in the results provided by AI systems (overreliance). To date, diagnostic AI systems have been used almost exclusively as "second look" systems. If diagnostic AI systems are to be used in the future as "first look" systems or even as autonomous AI systems in order to enhance efficiency in radiology, the question of liability needs to be addressed, comparable to liability for autonomous driving. Such use of AI would also significantly change the role of radiologists.
Affiliation(s)
- Elmar Kotter
- Klinik für Diagnostische und Interventionelle Radiologie, Universitätsklinikum Freiburg, Hugstetterstr. 55, 79106, Freiburg, Deutschland
- Daniel Pinto Dos Santos
- Institut für Diagnostische und Interventionelle Radiologie, Uniklinik Köln, Kerpener Str. 62, 50937, Köln, Deutschland
- Institut für Diagnostische und Interventionelle Radiologie, Universitätsklinik Frankfurt, Theodor-Stern-Kai 7, 60596, Frankfurt am Main, Deutschland
4
Hasani AM, Singh S, Zahergivar A, Ryan B, Nethala D, Bravomontenegro G, Mendhiratta N, Ball M, Farhadi F, Malayeri A. Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports. Eur Radiol 2024;34:3566-3574. PMID: 37938381; DOI: 10.1007/s00330-023-10384-x.
Abstract
OBJECTIVE Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models like GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4 AI-generated radiology reports. METHODS A comparative study design was employed in the study, where a total of 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4, resulting in the generation of a corresponding AI-generated report. Quantitative and qualitative analysis techniques were utilized to assess similarities and differences between the two sets of reports. RESULTS The AI-generated reports showed comparable quality to radiologist-generated reports in most categories. Significant differences were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050), favoring the AI-generated reports. AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but had greater variability in sentence length. Content similarity was high, with an average Cosine Similarity of 0.85, Sequence Matcher Similarity of 0.52, BLEU Score of 0.5008, and BERTScore F1 of 0.8775. CONCLUSION The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simplified data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice. 
CLINICAL RELEVANCE STATEMENT The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to significantly contribute to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice. KEY POINTS • Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports. • Performance metrics highlighted the strong matching of word selection and order, as well as high semantic similarity between AI and radiologist-generated reports. • Large language model demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
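The content-similarity metrics reported above (Cosine Similarity, Sequence Matcher Similarity) can be illustrated with a short sketch using only the Python standard library. The two sample report sentences are invented, and a real evaluation would operate on full reports, typically with TF-IDF weighting rather than raw token counts:

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over raw token counts (bag of words)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

# Invented one-line "reports" standing in for radiologist vs. AI text.
radiologist = "No focal consolidation. Heart size is normal."
generated = "Heart size is normal. No focal consolidation is seen."

print(round(cosine_similarity(radiologist, generated), 2))
# SequenceMatcher captures order-sensitive overlap, which is why the
# paper's 0.52 sequence score sits well below its 0.85 cosine score.
print(round(SequenceMatcher(None, radiologist, generated).ratio(), 2))
```

The gap between the two numbers mirrors the paper's finding: high content overlap (cosine) can coexist with only moderate structural resemblance (sequence matching).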
Affiliation(s)
- Amir M Hasani
- Laboratory of Translation Research, National Heart Blood Lung Institute, NIH, Bethesda, MD, USA
- Shiva Singh
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Aryan Zahergivar
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Beth Ryan
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Daniel Nethala
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Neil Mendhiratta
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Mark Ball
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Faraz Farhadi
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Ashkan Malayeri
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
5
Yuan W, Du Z, Han S. Semi-supervised skin cancer diagnosis based on self-feedback threshold focal learning. Discov Oncol 2024;15:180. PMID: 38776027; PMCID: PMC11111630; DOI: 10.1007/s12672-024-01043-8.
Abstract
The worldwide prevalence of skin cancer necessitates accurate diagnosis to alleviate public health burdens. Although the application of artificial intelligence in image analysis and pattern recognition has improved the accuracy and efficiency of early skin cancer diagnosis, existing supervised learning methods are limited by their reliance on large amounts of labeled data. To overcome the limitations of data labeling and enhance the performance of diagnostic models, this study proposes a semi-supervised skin cancer diagnostic model based on Self-feedback Threshold Focal Learning (STFL), capable of using partially labeled data and a large number of unlabeled medical images to train models for unseen scenarios. The proposed model dynamically adjusts the selection threshold for unlabeled samples during training, effectively filtering reliable unlabeled samples, and uses focal learning to mitigate the impact of class imbalance in further training. The study is experimentally validated on the HAM10000 dataset, which includes images of various types of skin lesions, with experiments conducted across different scales of labeled samples. With just 500 annotated samples, the model demonstrates robust performance (0.77 accuracy, 0.6408 Kappa, 0.77 recall, 0.7426 precision, and 0.7462 F1-score), showcasing its efficiency with limited labeled data. Further, comprehensive testing validates the semi-supervised model's significant advancements in diagnostic accuracy and efficiency, underscoring the value of integrating unlabeled data. This model offers a new perspective on medical image processing and contributes robust scientific support for the early diagnosis and treatment of skin cancer.
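The abstract's core mechanism, a dynamically adjusted confidence threshold for accepting pseudo-labels plus focal loss to handle class imbalance, can be sketched roughly as follows. This is a simplified, hypothetical rendering: the update rule, constants, and sample confidences are illustrative, not the authors' exact STFL formulation.

```python
import math

def focal_loss(p: float, gamma: float = 2.0) -> float:
    """Focal loss for a correctly-labeled example with predicted
    probability p: -(1 - p)^gamma * log(p). Confident (easy)
    examples are down-weighted, focusing training on hard ones."""
    p = min(max(p, 1e-7), 1 - 1e-7)
    return -((1 - p) ** gamma) * math.log(p)

def select_pseudo_labels(confidences, threshold):
    """Keep only unlabeled samples whose max class probability
    clears the current threshold."""
    return [i for i, c in enumerate(confidences) if c >= threshold]

def update_threshold(threshold, accept_rate, target_rate=0.5, step=0.02):
    """Self-feedback step: if too many samples were accepted,
    raise the bar; otherwise lower it (clamped to [0.5, 0.99])."""
    if accept_rate > target_rate:
        threshold += step
    else:
        threshold -= step
    return min(0.99, max(0.5, threshold))

# One illustrative training step over invented model confidences.
conf = [0.95, 0.62, 0.88, 0.45, 0.99, 0.71]
thr = 0.8
kept = select_pseudo_labels(conf, thr)
thr = update_threshold(thr, len(kept) / len(conf))
print(kept, round(thr, 2))
```

The feedback loop keeps the pseudo-label pool from collapsing (threshold too high) or drowning in noise (threshold too low), while the focal term keeps majority lesion classes from dominating the loss.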
Affiliation(s)
- Weicheng Yuan
- College of Basic Medicine, Hebei Medical University, Zhongshan East, Shijiazhuang, 050017, Hebei, China
- Zeyu Du
- School of Health Science, University of Manchester, Sackville Street, Manchester, 610101, England, UK
- Shuo Han
- Department of Anatomy, Hebei Medical University, Zhongshan East, Shijiazhuang, 050017, Hebei, China
6
Rosen S, Saban M. Evaluating the reliability of ChatGPT as a tool for imaging test referral: a comparative study with a clinical decision support system. Eur Radiol 2024;34:2826-2837. PMID: 37828297; DOI: 10.1007/s00330-023-10230-0.
Abstract
OBJECTIVES As the technology continues to evolve and advance, we can expect to see artificial intelligence (AI) being used in increasingly sophisticated ways to make diagnoses and decisions, such as suggesting the most appropriate imaging referrals. We aim to explore whether Chat Generative Pretrained Transformer (ChatGPT) can provide accurate imaging referrals for clinical use that are at least as good as the ESR iGuide. METHODS A comparative study was conducted in a tertiary hospital. Data were collected from 97 consecutive cases admitted to the emergency department with abdominal complaints. We compared the imaging test referral recommendations suggested by the ESR iGuide and ChatGPT and analyzed cases of disagreement. In addition, we selected cases where ChatGPT recommended a chest abdominal pelvis (CAP) CT (n = 66) and asked four specialists to grade the appropriateness of the referral. RESULTS ChatGPT recommendations were consistent with the recommendations provided by the ESR iGuide. No statistical differences were found between the appropriateness of referrals by age or gender. In a sub-analysis of CAP cases, high agreement between ChatGPT and the specialists was found. Cases of disagreement (12.4%) were further analyzed and presented themes of vague recommendations such as "it would be advisable" and "this would help to rule out." CONCLUSIONS ChatGPT's ability to guide the selection of appropriate tests may be comparable to some degree with the ESR iGuide. Clinical, ethical, and regulatory implications still need to be addressed prior to clinical implementation. Further studies are needed to confirm these findings. CLINICAL RELEVANCE STATEMENT The article explores the potential of using advanced language models, such as ChatGPT, in healthcare as a CDS for selecting appropriate imaging tests.
Using ChatGPT can improve the efficiency of the decision-making process. KEY POINTS: • ChatGPT recommendations were highly consistent with the recommendations provided by the ESR iGuide. • ChatGPT's ability to guide the selection of appropriate tests may be comparable to some degree with the ESR iGuide's.
Affiliation(s)
- Shani Rosen
- Department of Health Technology and Policy Evaluation, Gertner Institute for Epidemiology and Health Policy, Institute of Epidemiology & Health Policy Research, Sheba Medical Center, Tel HaShomer, Ramat-Gan, Israel
- Nursing Department, School of Health Sciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Mor Saban
- Nursing Department, School of Health Sciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
7
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement From the ACR, CAR, ESR, RANZCR & RSNA. Can Assoc Radiol J 2024;75:226-244. PMID: 38251882; DOI: 10.1177/08465371231222229.
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- Data Science Institute, American College of Radiology, Reston, VA, USA
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, QC, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- American College of Radiology, Reston, VA, USA
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, SA, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
8
Cecil J, Lermer E, Hudecek MFC, Sauer J, Gaube S. Explainability does not mitigate the negative impact of incorrect AI advice in a personnel selection task. Sci Rep 2024;14:9736. PMID: 38679619; PMCID: PMC11056364; DOI: 10.1038/s41598-024-60220-5.
Abstract
Despite the rise of decision support systems enabled by artificial intelligence (AI) in personnel selection, their impact on decision-making processes is largely unknown. Consequently, we conducted five experiments (N = 1403 students and Human Resource Management (HRM) employees) investigating how people interact with AI-generated advice in a personnel selection task. In all pre-registered experiments, we presented correct and incorrect advice. In Experiments 1a and 1b, we manipulated the source of the advice (human vs. AI). In Experiments 2a, 2b, and 2c, we further manipulated the type of explainability of AI advice (2a and 2b: heatmaps and 2c: charts). We hypothesized that accurate and explainable advice improves decision-making. The independent variables were regressed on task performance, perceived advice quality and confidence ratings. The results consistently showed that incorrect advice negatively impacted performance, as people failed to dismiss it (i.e., overreliance). Additionally, we found that the effects of source and explainability of advice on the dependent variables were limited. The lack of reduction in participants' overreliance on inaccurate advice when the systems' predictions were made more explainable highlights the complexity of human-AI interaction and the need for regulation and quality standards in HRM.
Affiliation(s)
- Julia Cecil
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany
- Eva Lermer
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany
- Department of Business Psychology, Technical University of Applied Sciences Augsburg, Augsburg, Germany
- Matthias F C Hudecek
- Department of Experimental Psychology, University of Regensburg, Regensburg, Germany
- Jan Sauer
- Department of Business Administration, University of Applied Sciences Amberg-Weiden, Weiden, Germany
- Susanne Gaube
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany
- UCL Global Business School for Health, University College London, London, UK
9
Jiang T, Chen C, Zhou Y, Cai S, Yan Y, Sui L, Lai M, Song M, Zhu X, Pan Q, Wang H, Chen X, Wang K, Xiong J, Chen L, Xu D. Deep learning-assisted diagnosis of benign and malignant parotid tumors based on ultrasound: a retrospective study. BMC Cancer 2024;24:510. PMID: 38654281; PMCID: PMC11036551; DOI: 10.1186/s12885-024-12277-8.
Abstract
BACKGROUND To develop a deep learning (DL) model utilizing ultrasound images, and to evaluate its efficacy in distinguishing between benign and malignant parotid tumors (PTs), as well as its practicality in assisting clinicians with accurate diagnosis. METHODS A total of 2211 ultrasound images of 980 pathologically confirmed PTs (training set: n = 721; validation set: n = 82; internal test set: n = 89; external test set: n = 88) from 907 patients were retrospectively included in this study. Five DL networks of varying depths were constructed; the optimal model was selected, and diagnostic performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). Furthermore, radiologists of different seniority were compared in the presence of the optimal auxiliary diagnosis model. Additionally, the diagnostic confusion matrix of the optimal model was calculated, and the characteristics of misjudged cases were analyzed and summarized. RESULTS ResNet18 demonstrated superior diagnostic performance, with an AUC of 0.947, accuracy of 88.5%, sensitivity of 78.2%, and specificity of 92.7% in the internal test set, and an AUC of 0.925, accuracy of 89.8%, sensitivity of 83.3%, and specificity of 90.6% in the external test set. The PTs were subjectively assessed twice by six radiologists, with and without the assistance of the model. With the model's assistance, both junior and senior radiologists demonstrated enhanced diagnostic performance. In the internal test set, AUC values increased by 0.062 and 0.082 for junior radiologists, and by 0.066 and 0.106 for senior radiologists, respectively.
CONCLUSIONS The ultrasound-based DL model demonstrates exceptional capability in distinguishing between benign and malignant PTs, helping radiologists of varying expertise levels achieve heightened diagnostic performance, and can serve as a noninvasive adjunct diagnostic imaging method for clinical use.
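The AUC values quoted above have a concrete probabilistic reading: the chance that a randomly chosen malignant case receives a higher model score than a randomly chosen benign one. A minimal sketch of that equivalence via the Mann-Whitney pair count, using invented scores rather than the study's outputs:

```python
def auc(pos_scores, neg_scores):
    """AUC as the normalized Mann-Whitney U statistic: the fraction
    of (positive, negative) pairs where the positive case is scored
    higher, counting ties as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores for malignant (positive) and benign cases.
malignant = [0.91, 0.78, 0.64, 0.88]
benign = [0.35, 0.52, 0.64, 0.12, 0.40]
print(round(auc(malignant, benign), 3))
```

An AUC of 0.947, as reported for ResNet18's internal test set, therefore means the model ranks a malignant tumor above a benign one in roughly 95% of such pairs.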
Collapse
Affiliation(s)
- Tian Jiang
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
| | - Chen Chen
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, TaiZhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Yahan Zhou
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, TaiZhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Shenzhou Cai
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, TaiZhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Yuqi Yan
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, TaiZhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Lin Sui
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, TaiZhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Min Lai
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
- Second Clinical College, Zhejiang University of Traditional Chinese Medicine, 310022, Hangzhou, Zhejiang, China
| | - Mei Song
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
| | - Xi Zhu
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, TaiZhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Qianmeng Pan
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Hui Wang
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Xiayi Chen
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, TaiZhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
| | - Kai Wang
- Dongyang Hospital Affiliated to Wenzhou Medical University, 322100, Jinhua, Zhejiang, China
| | - Jing Xiong
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 518000, Shenzhen, Guangdong, China
| | - Liyu Chen
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China.
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China.
| | - Dong Xu
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China.
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China.
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China.
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China.
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China.
| |
10
Montomoli J, Bitondo MM, Cascella M, Rezoagli E, Romeo L, Bellini V, Semeraro F, Gamberini E, Frontoni E, Agnoletti V, Altini M, Benanti P, Bignami EG. Algor-ethics: charting the ethical path for AI in critical care. J Clin Monit Comput 2024:10.1007/s10877-024-01157-y. [PMID: 38573370 DOI: 10.1007/s10877-024-01157-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2023] [Accepted: 03/22/2024] [Indexed: 04/05/2024]
Abstract
The integration of Clinical Decision Support Systems (CDSS) based on artificial intelligence (AI) in healthcare is a groundbreaking evolution with enormous potential, but its development and ethical implementation present unique challenges, particularly in critical care, where physicians often deal with life-threatening conditions requiring rapid action and with patients unable to participate in the decision-making process. Moreover, the development of AI-based CDSS is complex and should address different sources of bias, including data acquisition, health disparities, domain shifts during clinical use, and cognitive biases in decision-making. In this scenario, algor-ethics is mandatory; it emphasizes the integration of 'Human-in-the-Loop' and 'Algorithmic Stewardship' principles and the benefits of advanced data engineering. The establishment of Clinical AI Departments (CAID) is necessary to lead AI innovation in healthcare, ensuring ethical integrity and human-centered development in this rapidly evolving field.
Affiliation(s)
- Jonathan Montomoli
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy.
- Health Services Research, Evaluation and Policy Unit, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy.
| | - Maria Maddalena Bitondo
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy
| | - Marco Cascella
- Unit of Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana," University of Salerno, Baronissi, Salerno, Italy
| | - Emanuele Rezoagli
- School of Medicine and Surgery, University of Milano-Bicocca, Via Cadore, 48, Monza, 20900, Italy
- Dipartimento di Emergenza e Urgenza, Terapia intensiva e Semintensiva adulti e pediatrica, Fondazione IRCCS San Gerardo dei Tintori, Via Pergolesi, 33, Monza, 20900, Italy
| | - Luca Romeo
- Department of Economics and Law, University of Macerata, Macerata, 62100, Italy
| | - Valentina Bellini
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Via Gramsci 14, Parma, 43125, Italy
| | - Federico Semeraro
- Department of Anesthesia, Intensive Care and Prehospital Emergency, Ospedale Maggiore Carlo Alberto Pizzardi, Largo Bartolo Nigrisoli, 2, Bologna, 40133, Italy
| | - Emiliano Gamberini
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy
| | - Emanuele Frontoni
- Department of Political Sciences, Communication and International Relations, University of Macerata, Macerata, 62100, Italy
| | - Vanni Agnoletti
- Department of Surgery and Trauma, Anesthesia and Intensive Care Unit, Maurizio Bufalini Hospital, Romagna Local Health Authority, Viale Giovanni Ghirotti, 286, Cesena, 47521, Italy
| | - Mattia Altini
- Hospital Care Sector, Emilia-Romagna Region, Via Aldo Moro, 21, Bologna, 40127, Italy
| | - Paolo Benanti
- Pontifical Gregorian University, Piazza della Pilotta 4, Roma, 00187, Italy
| | - Elena Giovanna Bignami
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Via Gramsci 14, Parma, 43125, Italy
| |
11
Vaidya A, Chen RJ, Williamson DFK, Song AH, Jaume G, Yang Y, Hartvigsen T, Dyer EC, Lu MY, Lipkova J, Shaban M, Chen TY, Mahmood F. Demographic bias in misdiagnosis by computational pathology models. Nat Med 2024; 30:1174-1190. [PMID: 38641744 DOI: 10.1038/s41591-024-02885-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2023] [Accepted: 02/23/2024] [Indexed: 04/21/2024]
Abstract
Despite increasing numbers of regulatory approvals, deep learning-based computational pathology systems often overlook the impact of demographic factors on performance, potentially leading to biases. This concern is all the more important as computational pathology has leveraged large public datasets that underrepresent certain demographic groups. Using publicly available data from The Cancer Genome Atlas and the EBRAINS brain tumor atlas, as well as internal patient data, we show that whole-slide image classification models display marked performance disparities across different demographic groups when used to subtype breast and lung carcinomas and to predict IDH1 mutations in gliomas. For example, when using common modeling approaches, we observed performance gaps (in area under the receiver operating characteristic curve) between white and Black patients of 3.0% for breast cancer subtyping, 10.9% for lung cancer subtyping and 16.0% for IDH1 mutation prediction in gliomas. We found that richer feature representations obtained from self-supervised vision foundation models reduce performance variations between groups. These representations provide improvements upon weaker models even when those weaker models are combined with state-of-the-art bias mitigation strategies and modeling choices. Nevertheless, self-supervised vision foundation models do not fully eliminate these discrepancies, highlighting the continuing need for bias mitigation efforts in computational pathology. Finally, we demonstrate that our results extend to other demographic factors beyond patient race. Given these findings, we encourage regulatory and policy agencies to integrate demographic-stratified evaluation into their assessment guidelines.
Affiliation(s)
- Anurag Vaidya
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Health Sciences and Technology, Harvard-MIT, Cambridge, MA, USA
| | - Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA, USA
| | - Andrew H Song
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Guillaume Jaume
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Yuzhe Yang
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
| | - Thomas Hartvigsen
- School of Data Science, University of Virginia, Charlottesville, VA, USA
| | - Emma C Dyer
- T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
| | - Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
| | - Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Muhammad Shaban
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA.
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.
| |
12
Balagopalan A, Baldini I, Celi LA, Gichoya J, McCoy LG, Naumann T, Shalit U, van der Schaar M, Wagstaff KL. Machine learning for healthcare that matters: Reorienting from technical novelty to equitable impact. PLOS DIGITAL HEALTH 2024; 3:e0000474. [PMID: 38620047 PMCID: PMC11018283 DOI: 10.1371/journal.pdig.0000474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/06/2023] [Accepted: 02/18/2024] [Indexed: 04/17/2024]
Abstract
Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare, but also to structural issues in the machine learning for healthcare (MLHC) community, which broadly rewards technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large and offered a series of clearly defined "Impact Challenges" to which the field should orient itself. Drawing on the expertise of a diverse and international group of authors, we engage in a narrative review and examine issues in the research background environment, training processes, evaluation metrics, and deployment protocols which act to limit the real-world applicability of MLHC. Broadly, we seek to distinguish between machine learning ON healthcare data and machine learning FOR healthcare: the former sees healthcare as merely a source of interesting technical challenges, while the latter regards ML as a tool in service of meeting tangible clinical needs. We offer specific recommendations for a series of stakeholders in the field, from ML researchers and clinicians to the institutions in which they work and the governments which regulate their data access.
Affiliation(s)
- Aparna Balagopalan
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
| | - Ioana Baldini
- IBM Research; Yorktown Heights, New York, United States of America
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
- Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center; Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health; Boston, Massachusetts, United States of America
| | - Judy Gichoya
- Department of Radiology and Imaging Sciences, School of Medicine, Emory University; Atlanta, Georgia, United States of America
| | - Liam G. McCoy
- Division of Neurology, Department of Medicine, University of Alberta; Edmonton, Alberta, Canada
| | - Tristan Naumann
- Microsoft Research; Redmond, Washington, United States of America
| | - Uri Shalit
- The Faculty of Data and Decision Sciences, Technion; Haifa, Israel
| | - Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge; Cambridge, United Kingdom
- The Alan Turing Institute; London, United Kingdom
| | | |
13
Simmons C, DeGrasse J, Polakovic S, Aibinder W, Throckmorton T, Noerdlinger M, Papandrea R, Trenhaile S, Schoch B, Gobbato B, Routman H, Parsons M, Roche CP. Initial clinical experience with a predictive clinical decision support tool for anatomic and reverse total shoulder arthroplasty. EUROPEAN JOURNAL OF ORTHOPAEDIC SURGERY & TRAUMATOLOGY : ORTHOPEDIE TRAUMATOLOGIE 2024; 34:1307-1318. [PMID: 38095688 DOI: 10.1007/s00590-023-03796-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Accepted: 11/19/2023] [Indexed: 04/02/2024]
Abstract
PURPOSE Clinical decision support tools (CDSTs) are software that generate patient-specific assessments that can be used to better inform healthcare provider decision making. Machine learning (ML)-based CDSTs have recently been developed for anatomic (aTSA) and reverse (rTSA) total shoulder arthroplasty to facilitate more data-driven, evidence-based decision making. Using this shoulder CDST as an example, this external validation study provides an overview of how ML-based algorithms are developed and discusses the limitations of these tools. METHODS An external validation of a novel CDST was conducted on 243 patients (120F/123M) who received a personalized prediction prior to surgery and had short-term clinical follow-up from 3 months to 2 years after primary aTSA (n = 43) or rTSA (n = 200). The outcome score and active range of motion predictions were compared to each patient's actual result at each timepoint, with accuracy quantified by the mean absolute error (MAE). RESULTS The results of this external validation demonstrate that the CDST's accuracy is similar to (within 10%) or better than the MAEs from the published internal validation. A few predictive models were observed to have substantially lower MAEs than in the internal validation, specifically Constant (31.6% better), active abduction (22.5% better), global shoulder function (20.0% better), active external rotation (19.0% better), and active forward elevation (16.2% better), which is encouraging; however, the sample size was small. CONCLUSION A greater understanding of the limitations of ML-based CDSTs will facilitate more responsible use and build trust and confidence, potentially leading to greater adoption. As CDSTs evolve, we anticipate greater shared decision making between the patient and surgeon, with the aim of achieving even better outcomes and greater levels of patient satisfaction.
Affiliation(s)
- Chelsey Simmons
- University of Florida, PO Box 116250, Gainesville, FL, 32605, USA
- Exactech, 2320 NW 66th Court, Gainesville, FL, 32653, USA
| | | | | | - William Aibinder
- University of Michigan, 1500 E. Medical Center Drive, Ann Arbor, MI, 48109, USA
| | | | - Mayo Noerdlinger
- Atlantic Orthopaedics and Sports Medicine, 1900 Lafayette Road, Portsmouth, NH, USA
| | | | | | - Bradley Schoch
- Mayo Clinic, Florida, 4500 San Pablo Rd., Jacksonville, FL, 32224, USA
| | - Bruno Gobbato
- R. José Emmendoerfer, 1449, Nova Brasília, Jaraguá do Sul, SC, 89252-278, Brazil
| | - Howard Routman
- Atlantis Orthopedics, 900 Village Square Crossing, #170, Palm Beach Gardens, FL, 33410, USA
| | - Moby Parsons
- 333 Borthwick Ave Suite #301, Portsmouth, NH, 03801, USA
| | | |
14
Ciet P, Eade C, Ho ML, Laborie LB, Mahomed N, Naidoo J, Pace E, Segal B, Toso S, Tschauner S, Vamyanmane DK, Wagner MW, Shelmerdine SC. The unintended consequences of artificial intelligence in paediatric radiology. Pediatr Radiol 2024; 54:585-593. [PMID: 37665368 DOI: 10.1007/s00247-023-05746-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 09/05/2023]
Abstract
Over the past decade, there has been a dramatic rise in interest in the application of artificial intelligence (AI) in radiology. Originally, only 'narrow' AI tasks were possible; however, with the increasing availability of data, combined with easy access to powerful computer processing capabilities, we are becoming more able to generate complex and nuanced prediction models and elaborate solutions for healthcare. Nevertheless, these AI models are not without their failings, and sometimes the intended use for these solutions may not lead to predictable impacts for patients, society or those working within the healthcare profession. In this article, we provide an overview of the latest opinions regarding AI ethics, bias, limitations, challenges and considerations that we should all contemplate in this exciting and expanding field, with special attention to how this applies to the unique aspects of a paediatric population. By embracing AI technology and fostering a multidisciplinary approach, it is hoped that we can harness the power AI brings whilst minimising harm and ensuring a beneficial impact on radiology practice.
Affiliation(s)
- Pierluigi Ciet
- Department of Radiology and Nuclear Medicine, Erasmus MC - Sophia's Children's Hospital, Rotterdam, The Netherlands
- Department of Medical Sciences, University of Cagliari, Cagliari, Italy
| | | | - Mai-Lan Ho
- University of Missouri, Columbia, MO, USA
| | - Lene Bjerke Laborie
- Department of Radiology, Section for Paediatrics, Haukeland University Hospital, Bergen, Norway
- Department of Clinical Medicine, University of Bergen, Bergen, Norway
| | - Nasreen Mahomed
- Department of Radiology, University of Witwatersrand, Johannesburg, South Africa
| | - Jaishree Naidoo
- Paediatric Diagnostic Imaging, Dr J Naidoo Inc., Johannesburg, South Africa
- Envisionit Deep AI Ltd, Coveham House, Downside Bridge Road, Cobham, UK
| | - Erika Pace
- Department of Diagnostic Radiology, The Royal Marsden NHS Foundation Trust, London, UK
| | - Bradley Segal
- Department of Radiology, University of Witwatersrand, Johannesburg, South Africa
| | - Seema Toso
- Pediatric Radiology, Children's Hospital, University Hospitals of Geneva, Geneva, Switzerland
| | - Sebastian Tschauner
- Division of Paediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
| | - Dhananjaya K Vamyanmane
- Department of Pediatric Radiology, Indira Gandhi Institute of Child Health, Bangalore, India
| | - Matthias W Wagner
- Department of Diagnostic Imaging, Division of Neuroradiology, The Hospital for Sick Children, Toronto, Canada
- Department of Medical Imaging, University of Toronto, Toronto, ON, Canada
- Department of Neuroradiology, University Hospital Augsburg, Augsburg, Germany
| | - Susan C Shelmerdine
- Department of Clinical Radiology, Great Ormond Street Hospital for Children NHS Foundation Trust, Great Ormond Street, London, WC1H 3JH, UK.
- Great Ormond Street Hospital for Children, UCL Great Ormond Street Institute of Child Health, London, UK.
- NIHR Great Ormond Street Hospital Biomedical Research Centre, 30 Guilford Street, Bloomsbury, London, UK.
- Department of Clinical Radiology, St George's Hospital, London, UK.
| |
15
Anderson JW, Visweswaran S. Algorithmic Individual Fairness and Healthcare: A Scoping Review. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.25.24304853. [PMID: 38585746 PMCID: PMC10996729 DOI: 10.1101/2024.03.25.24304853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/09/2024]
Abstract
Objective Statistical and artificial intelligence algorithms are increasingly being developed for use in healthcare. These algorithms may reflect biases that magnify disparities in clinical care, and there is a growing need for understanding how algorithmic biases can be mitigated in pursuit of algorithmic fairness. Individual fairness constrains algorithms to the notion that "similar individuals should be treated similarly." We conducted a scoping review on algorithmic individual fairness to understand the current state of research on the metrics and methods developed to achieve individual fairness and on its applications in healthcare. Methods We searched three databases, PubMed, ACM Digital Library, and IEEE Xplore, for algorithmic individual fairness metrics, algorithmic bias mitigation, and healthcare applications. Our search was restricted to articles published between January 2013 and September 2023. We identified 1,886 articles through database searches and manually identified one additional article; from these, we included 30 articles in the review. Data from the selected articles were extracted, and the findings were synthesized. Results Based on the 30 articles in the review, we identified several themes, including philosophical underpinnings of fairness, individual fairness metrics, mitigation methods for achieving individual fairness, implications of achieving individual fairness on group fairness and vice versa, fairness metrics that combine individual fairness and group fairness, software for measuring and optimizing individual fairness, and applications of individual fairness in healthcare. Conclusion While there has been significant work on algorithmic individual fairness in recent years, the definition, use, and study of individual fairness remain in their infancy, especially in healthcare. Future research is needed to apply and evaluate individual fairness in healthcare comprehensively.
Affiliation(s)
| | - Shyam Visweswaran
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
| |
16
Wei ML, Tada M, So A, Torres R. Artificial intelligence and skin cancer. Front Med (Lausanne) 2024; 11:1331895. [PMID: 38566925 PMCID: PMC10985205 DOI: 10.3389/fmed.2024.1331895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 02/26/2024] [Indexed: 04/04/2024] Open
Abstract
Artificial intelligence is poised to rapidly reshape many fields, including that of skin cancer screening and diagnosis, both as a disruptive and assistive technology. Together with the collection and availability of large medical data sets, artificial intelligence will become a powerful tool that can be leveraged by physicians in their diagnoses and treatment plans for patients. This comprehensive review focuses on current progress toward AI applications for patients, primary care providers, dermatologists, and dermatopathologists, explores the diverse applications of image and molecular processing for skin cancer, and highlights AI's potential for patient self-screening and improving diagnostic accuracy for non-dermatologists. We additionally delve into the challenges and barriers to clinical implementation, paths forward for implementation and areas of active research.
Affiliation(s)
- Maria L. Wei
- Department of Dermatology, University of California, San Francisco, San Francisco, CA, United States
- Dermatology Service, San Francisco VA Health Care System, San Francisco, CA, United States
| | - Mikio Tada
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, United States
| | - Alexandra So
- School of Medicine, University of California, San Francisco, San Francisco, CA, United States
| | - Rodrigo Torres
- Dermatology Service, San Francisco VA Health Care System, San Francisco, CA, United States
| |
17
Campion JR, O'Connor DB, Lahiff C. Human-artificial intelligence interaction in gastrointestinal endoscopy. World J Gastrointest Endosc 2024; 16:126-135. [PMID: 38577646 PMCID: PMC10989254 DOI: 10.4253/wjge.v16.i3.126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/31/2023] [Revised: 01/18/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024] Open
Abstract
The number and variety of applications of artificial intelligence (AI) in gastrointestinal (GI) endoscopy is growing rapidly. New technologies based on machine learning (ML) and convolutional neural networks (CNNs) are at various stages of development and deployment to assist patients and endoscopists in preparing for endoscopic procedures, in detection, diagnosis and classification of pathology during endoscopy and in confirmation of key performance indicators. Platforms based on ML and CNNs require regulatory approval as medical devices. Interactions between humans and the technologies we use are complex and are influenced by design, behavioural and psychological elements. Due to the substantial differences between AI and prior technologies, important differences may be expected in how we interact with advice from AI technologies. Human–AI interaction (HAII) may be optimised by developing AI algorithms to minimise false positives and designing platform interfaces to maximise usability. Human factors influencing HAII may include automation bias, alarm fatigue, algorithm aversion, learning effect and deskilling. Each of these areas merits further study in the specific setting of AI applications in GI endoscopy and professional societies should engage to ensure that sufficient emphasis is placed on human-centred design in development of new AI technologies.
Affiliation(s)
- John R Campion
- Department of Gastroenterology, Mater Misericordiae University Hospital, Dublin D07 AX57, Ireland
- School of Medicine, University College Dublin, Dublin D04 C7X2, Ireland
| | - Donal B O'Connor
- Department of Surgery, Trinity College Dublin, Dublin D02 R590, Ireland
| | - Conor Lahiff
- Department of Gastroenterology, Mater Misericordiae University Hospital, Dublin D07 AX57, Ireland
- School of Medicine, University College Dublin, Dublin D04 C7X2, Ireland
| |
18
Topff L, Steltenpool S, Ranschaert ER, Ramanauskas N, Menezes R, Visser JJ, Beets-Tan RGH, Hartkamp NS. Artificial intelligence-assisted double reading of chest radiographs to detect clinically relevant missed findings: a two-centre evaluation. Eur Radiol 2024:10.1007/s00330-024-10676-w. [PMID: 38466390 DOI: 10.1007/s00330-024-10676-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 01/21/2024] [Accepted: 02/01/2024] [Indexed: 03/13/2024]
Abstract
OBJECTIVES To evaluate an artificial intelligence (AI)-assisted double reading system for detecting clinically relevant missed findings on routinely reported chest radiographs. METHODS A retrospective study was performed in two institutions, a secondary care hospital and tertiary referral oncology centre. Commercially available AI software performed a comparative analysis of chest radiographs and radiologists' authorised reports using a deep learning and natural language processing algorithm, respectively. The AI-detected discrepant findings between images and reports were assessed for clinical relevance by an external radiologist, as part of the commercial service provided by the AI vendor. The selected missed findings were subsequently returned to the institution's radiologist for final review. RESULTS In total, 25,104 chest radiographs of 21,039 patients (mean age 61.1 years ± 16.2 [SD]; 10,436 men) were included. The AI software detected discrepancies between imaging and reports in 21.1% (5289 of 25,104). After review by the external radiologist, 0.9% (47 of 5289) of cases were deemed to contain clinically relevant missed findings. The institution's radiologists confirmed 35 of 47 missed findings (74.5%) as clinically relevant (0.1% of all cases). Missed findings consisted of lung nodules (71.4%, 25 of 35), pneumothoraces (17.1%, 6 of 35) and consolidations (11.4%, 4 of 35). CONCLUSION The AI-assisted double reading system was able to identify missed findings on chest radiographs after report authorisation. The approach required an external radiologist to review the AI-detected discrepancies. The number of clinically relevant missed findings by radiologists was very low. CLINICAL RELEVANCE STATEMENT The AI-assisted double reader workflow was shown to detect diagnostic errors and could be applied as a quality assurance tool. Although clinically relevant missed findings were rare, there is potential impact given the common use of chest radiography. 
KEY POINTS • A commercially available double reading system supported by artificial intelligence was evaluated to detect reporting errors in chest radiographs (n=25,104) from two institutions. • Clinically relevant missed findings were found in 0.1% of chest radiographs and consisted of unreported lung nodules, pneumothoraces and consolidations. • Applying AI software as a secondary reader after report authorisation can assist in reducing diagnostic errors without interrupting the radiologist's reading workflow. However, the number of AI-detected discrepancies was considerable and required review by a radiologist to assess their relevance.
Affiliation(s)
- Laurens Topff
- Department of Radiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Sanne Steltenpool
- Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Department of Radiology, Elisabeth-TweeSteden Hospital, Tilburg, The Netherlands
- Erik R Ranschaert
- Department of Radiology, St. Nikolaus Hospital, Eupen, Belgium
- Ghent University, Ghent, Belgium
- Naglis Ramanauskas
- Oxipit UAB, Vilnius, Lithuania
- Department of Radiology, Nuclear Medicine and Medical Physics, Institute of Biomedical Sciences, Faculty of Medicine, Vilnius University, Vilnius, Lithuania
- Renee Menezes
- Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Jacob J Visser
- Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Regina G H Beets-Tan
- Department of Radiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Nolan S Hartkamp
- Department of Radiology, Elisabeth-TweeSteden Hospital, Tilburg, The Netherlands
19
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Pinto Dos Santos D, Tang A, Wald C, Slavotinek J. Developing, purchasing, implementing and monitoring AI tools in radiology: Practical considerations. A multi-society statement from the ACR, CAR, ESR, RANZCR & RSNA. J Med Imaging Radiat Oncol 2024; 68:7-26. [PMID: 38259140 DOI: 10.1111/1754-9485.13612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Accepted: 11/23/2023] [Indexed: 01/24/2024]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, Alabama, USA
- American College of Radiology Data Science Institute, Reston, Virginia, USA
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, California, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, California, USA
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, South Australia, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montreal, Quebec, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, Massachusetts, USA
- Tufts University Medical School, Boston, Massachusetts, USA
- Commission on Informatics, and Member, Board of Chancellors, American College of Radiology, Reston, Virginia, USA
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, South Australia, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia
20
Groh M, Badri O, Daneshjou R, Koochek A, Harris C, Soenksen LR, Doraiswamy PM, Picard R. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nat Med 2024; 30:573-583. [PMID: 38317019 PMCID: PMC10878981 DOI: 10.1038/s41591-023-02728-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2023] [Accepted: 11/16/2023] [Indexed: 02/07/2024]
Abstract
Although advances in deep learning systems for image-based medical diagnosis demonstrate their potential to augment clinical decision-making, the effectiveness of physician-machine partnerships remains an open question, in part because physicians and algorithms are both susceptible to systematic errors, especially for diagnosis of underrepresented populations. Here we present results from a large-scale digital experiment involving board-certified dermatologists (n = 389) and primary-care physicians (n = 459) from 39 countries to evaluate the accuracy of diagnoses submitted by physicians in a store-and-forward teledermatology simulation. In this experiment, physicians were presented with 364 images spanning 46 skin diseases and asked to submit up to four differential diagnoses. Specialists and generalists achieved diagnostic accuracies of 38% and 19%, respectively, but both specialists and generalists were four percentage points less accurate for the diagnosis of images of dark skin as compared to light skin. Fair deep learning system decision support improved the diagnostic accuracy of both specialists and generalists by more than 33%, but exacerbated the gap in the diagnostic accuracy of generalists across skin tones. These results demonstrate that well-designed physician-machine partnerships can enhance the diagnostic accuracy of physicians, illustrating that success in improving overall diagnostic accuracy does not necessarily address bias.
Affiliation(s)
- Matthew Groh
- Northwestern University Kellogg School of Management, Evanston, IL, USA
- MIT Media Lab, Cambridge, MA, USA
- Omar Badri
- Northeast Dermatology Associates, Beverly, MA, USA
- Roxana Daneshjou
- Stanford Department of Biomedical Data Science, Stanford, CA, USA
- Stanford Department of Dermatology, Redwood City, CA, USA
- Luis R Soenksen
- Wyss Institute for Bioinspired Engineering at Harvard, Boston, MA, USA
- P Murali Doraiswamy
- MIT Media Lab, Cambridge, MA, USA
- Duke University School of Medicine, Durham, NC, USA
21
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Pinto Dos Santos D, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement From the ACR, CAR, ESR, RANZCR & RSNA. J Am Coll Radiol 2024:S1546-1440(23)01020-7. [PMID: 38276923 DOI: 10.1016/j.jacr.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2024]
Abstract
Artificial intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, Alabama; American College of Radiology Data Science Institute, Reston, Virginia
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, California; Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, California
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, California
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany; Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, Massachusetts; Tufts University Medical School, Boston, Massachusetts; Commission on Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia; College of Medicine and Public Health, Flinders University, Adelaide, Australia
22
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, purchasing, implementing and monitoring AI tools in radiology: practical considerations. A multi-society statement from the ACR, CAR, ESR, RANZCR & RSNA. Insights Imaging 2024; 15:16. [PMID: 38246898 PMCID: PMC10800328 DOI: 10.1186/s13244-023-01541-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2024] Open
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Key points
• The incorporation of artificial intelligence (AI) in radiological practice demands increased monitoring of its utility and safety.
• Cooperation between developers, clinicians, and regulators will allow all involved to address ethical issues and monitor AI performance.
• AI can fulfil its promise to advance patient well-being if all steps from development to integration in healthcare are rigorously evaluated.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- American College of Radiology Data Science Institute, Reston, VA, USA
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, USA
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- Commission on Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia, USA
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, Australia
23
Nguyen T. ChatGPT in Medical Education: A Precursor for Automation Bias? JMIR MEDICAL EDUCATION 2024; 10:e50174. [PMID: 38231545 PMCID: PMC10831594 DOI: 10.2196/50174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Accepted: 12/11/2023] [Indexed: 01/18/2024]
Abstract
Artificial intelligence (AI) in health care has the promise of providing accurate and efficient results. However, AI can also be a black box, where the logic behind its results is nonrational. There are concerns if these questionable results are used in patient care. As physicians have the duty to provide care based on their clinical judgment in addition to their patients' values and preferences, it is crucial that physicians validate the results from AI. Yet, there are some physicians who exhibit a phenomenon known as automation bias, where there is an assumption from the user that AI is always right. This is a dangerous mindset, as users exhibiting automation bias will not validate the results, given their trust in AI systems. Several factors impact a user's susceptibility to automation bias, such as inexperience or being born in the digital age. In this editorial, I argue that these factors and a lack of AI education in the medical school curriculum cause automation bias. I also explore the harms of automation bias and why prospective physicians need to be vigilant when using AI. Furthermore, it is important to consider what attitudes are being taught to students when introducing ChatGPT, which could be some students' first time using AI, prior to their use of AI in the clinical setting. Therefore, in attempts to avoid the problem of automation bias in the long-term, in addition to incorporating AI education into the curriculum, as is necessary, the use of ChatGPT in medical education should be limited to certain tasks. Otherwise, having no constraints on what ChatGPT should be used for could lead to automation bias.
Affiliation(s)
- Tina Nguyen
- The University of Texas Medical Branch, Galveston, TX, United States
24
Day TG, Matthew J, Budd SF, Venturini L, Wright R, Farruggia A, Vigneswaran TV, Zidere V, Hajnal JV, Razavi R, Simpson JM, Kainz B. Interaction between clinicians and artificial intelligence to detect fetal atrioventricular septal defects on ultrasound: how can we optimize collaborative performance? ULTRASOUND IN OBSTETRICS & GYNECOLOGY : THE OFFICIAL JOURNAL OF THE INTERNATIONAL SOCIETY OF ULTRASOUND IN OBSTETRICS AND GYNECOLOGY 2024. [PMID: 38197584 DOI: 10.1002/uog.27577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 12/19/2023] [Accepted: 12/30/2023] [Indexed: 01/11/2024]
Abstract
OBJECTIVES Artificial intelligence (AI) has shown promise in improving the performance of fetal ultrasound screening in detecting congenital heart disease (CHD). The effect of giving AI advice to human operators has not been studied in this context. Giving additional information about AI model workings, such as confidence scores for AI predictions, may be a way of further improving performance. Our aims were to investigate whether AI advice improved overall diagnostic accuracy (using a single CHD lesion as an exemplar), and to determine what, if any, additional information given to clinicians optimized the overall performance of the clinician-AI team. METHODS An AI model was trained to classify a single fetal CHD lesion (atrioventricular septal defect (AVSD)), using a retrospective cohort of 121 130 cardiac four-chamber images extracted from 173 ultrasound scan videos (98 with normal hearts, 75 with AVSD); a ResNet50 model architecture was used. Temperature scaling of model prediction probability was performed on a validation set, and gradient-weighted class activation maps (grad-CAMs) produced. Ten clinicians (two consultant fetal cardiologists, three trainees in pediatric cardiology and five fetal cardiac sonographers) were recruited from a center of fetal cardiology to participate. Each participant was shown 2000 fetal four-chamber images in a random order (1000 normal and 1000 AVSD). The dataset comprised 500 images, each shown in four conditions: (1) image alone without AI output; (2) image with binary AI classification; (3) image with AI model confidence; and (4) image with grad-CAM image overlays. The clinicians were asked to classify each image as normal or AVSD. RESULTS A total of 20 000 image classifications were recorded from 10 clinicians. 
The AI model alone achieved an accuracy of 0.798 (95% CI, 0.760-0.832), a sensitivity of 0.868 (95% CI, 0.834-0.902) and a specificity of 0.728 (95% CI, 0.702-0.754), and the clinicians without AI achieved an accuracy of 0.844 (95% CI, 0.834-0.854), a sensitivity of 0.827 (95% CI, 0.795-0.858) and a specificity of 0.861 (95% CI, 0.828-0.895). Showing a binary (normal or AVSD) AI model output resulted in significant improvement in accuracy to 0.865 (P < 0.001). This effect was seen in both experienced and less-experienced participants. Giving incorrect AI advice resulted in a significant deterioration in overall accuracy, from 0.761 to 0.693 (P < 0.001), which was driven by an increase in both Type-I and Type-II errors by the clinicians. This effect was worsened by showing model confidence (accuracy, 0.649; P < 0.001) or grad-CAM (accuracy, 0.644; P < 0.001). CONCLUSIONS AI has the potential to improve performance when used in collaboration with clinicians, even if the model performance does not reach expert level. Giving additional information about model workings such as model confidence and class activation map image overlays did not improve overall performance, and actually worsened performance for images for which the AI model was incorrect. © 2024 The Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology.
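The "model confidence" shown to clinicians in condition 3 was produced by temperature scaling, which calibrates an overconfident network by dividing its logits by a single scalar T fitted on a validation set. A minimal NumPy sketch on synthetic data (not the study's code; the toy logits and grid search below are assumptions for illustration):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: T > 1 softens the probabilities.
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the true labels at temperature T.
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    # Choose the single scalar T that minimises validation NLL.
    return min(grid, key=lambda T: nll(logits, labels, T))

# Toy validation set for a binary normal-vs-AVSD classifier: the model is
# always confident (logit margin 6) but wrong on exactly 20% of cases.
labels = np.arange(200) % 2
logits = np.zeros((200, 2))
logits[np.arange(200), labels] = 6.0
wrong = np.arange(200) % 5 == 0
logits[wrong] = logits[wrong][:, ::-1]

T = fit_temperature(logits, labels)
before = softmax(logits).max(axis=1).mean()
after = softmax(logits, T).max(axis=1).mean()
print(round(T, 2), round(before, 3), round(after, 3))
```

With an always-confident model that is wrong 20% of the time, the fitted T exceeds 1, pulling the average confidence down toward the model's actual accuracy of 0.8.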
Affiliation(s)
- T G Day
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- J Matthew
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- S F Budd
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- L Venturini
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- R Wright
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- A Farruggia
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- T V Vigneswaran
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- V Zidere
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Harris Birthright Research Centre, King's College London NHS Foundation Trust, London, UK
- J V Hajnal
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- R Razavi
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- J M Simpson
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- B Kainz
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
- Department of Computing, Faculty of Engineering, Imperial College London, London, UK
25
Dot G, Gajny L, Ducret M. [The challenges of artificial intelligence in odontology]. Med Sci (Paris) 2024; 40:79-84. [PMID: 38299907 DOI: 10.1051/medsci/2023199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2024] Open
Abstract
Artificial intelligence has numerous potential applications in dentistry, as these algorithms aim to improve the efficiency and safety of several clinical situations. While the first commercial solutions are being proposed, most of these algorithms have not been sufficiently validated for clinical use. This article describes the challenges surrounding the development of these new tools, to help clinicians to keep a critical eye on this technology.
Affiliation(s)
- Gauthier Dot
- UFR odontologie, université Paris Cité, Paris, France
- AP-HP, hôpital Pitié-Salpêtrière, service de médecine bucco-dentaire, Paris, France
- Institut de biomécanique humaine Georges Charpak, école nationale supérieure d'Arts et Métiers, Paris, France
- Laurent Gajny
- Institut de biomécanique humaine Georges Charpak, école nationale supérieure d'Arts et Métiers, Paris, France
- Maxime Ducret
- Faculté d'odontologie, université Claude Bernard Lyon 1, hospices civils de Lyon, Lyon, France
26
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement from the ACR, CAR, ESR, RANZCR and RSNA. Radiol Artif Intell 2024; 6:e230513. [PMID: 38251899 PMCID: PMC10831521 DOI: 10.1148/ryai.230513] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2024]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools. This article is simultaneously published in Insights into Imaging (DOI 10.1186/s13244-023-01541-3), Journal of Medical Imaging and Radiation Oncology (DOI 10.1111/1754-9485.13612), Canadian Association of Radiologists Journal (DOI 10.1177/08465371231222229), Journal of the American College of Radiology (DOI 10.1016/j.jacr.2023.12.005), and Radiology: Artificial Intelligence (DOI 10.1148/ryai.230513). Keywords: Artificial Intelligence, Radiology, Automation, Machine Learning Published under a CC BY 4.0 license. ©The Author(s) 2024. Editor's Note: The RSNA Board of Directors has endorsed this article. It has not undergone review or editing by this journal.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- American College of Radiology Data Science Institute, Reston, VA, USA
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, USA
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
- Daniel Pinto dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- Commission on Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia, USA
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, Australia
27
Teneggi J, Yi PH, Sulam J. Examination-Level Supervision for Deep Learning-based Intracranial Hemorrhage Detection on Head CT Scans. Radiol Artif Intell 2024; 6:e230159. [PMID: 38294324 PMCID: PMC10831525 DOI: 10.1148/ryai.230159] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2023] [Revised: 11/02/2023] [Accepted: 12/05/2023] [Indexed: 02/01/2024]
Abstract
Purpose To compare the effectiveness of weak supervision (ie, with examination-level labels only) and strong supervision (ie, with image-level labels) in training deep learning models for detection of intracranial hemorrhage (ICH) on head CT scans. Materials and Methods In this retrospective study, an attention-based convolutional neural network was trained with either local (ie, image level) or global (ie, examination level) binary labels on the Radiological Society of North America (RSNA) 2019 Brain CT Hemorrhage Challenge dataset of 21 736 examinations (8876 [40.8%] ICH) and 752 422 images (107 784 [14.3%] ICH). The CQ500 (436 examinations; 212 [48.6%] ICH) and CT-ICH (75 examinations; 36 [48.0%] ICH) datasets were employed for external testing. Performance in detecting ICH was compared between weak (examination-level labels) and strong (image-level labels) learners as a function of the number of labels available during training. Results On examination-level binary classification, strong and weak learners did not have different area under the receiver operating characteristic curve values on the internal validation split (0.96 vs 0.96; P = .64) and the CQ500 dataset (0.90 vs 0.92; P = .15). Weak learners outperformed strong ones on the CT-ICH dataset (0.95 vs 0.92; P = .03). Weak learners had better section-level ICH detection performance when more than 10 000 labels were available for training (average f1 = 0.73 vs 0.65; P < .001). Weakly supervised models trained on the entire RSNA dataset required 35 times fewer labels than equivalent strong learners. Conclusion Strongly supervised models did not achieve better performance than weakly supervised ones, which could reduce radiologist labor requirements for prospective dataset curation. Keywords: CT, Head/Neck, Brain/Brain Stem, Hemorrhage Supplemental material is available for this article. © RSNA, 2023 See also commentary by Wahid and Fuentes in this issue.
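Examination-level (weak) supervision of the kind compared here is typically implemented with attention-based pooling over per-image features, so a single label per examination suffices for training. A minimal NumPy sketch of the pooling step with random weights (the shapes, dimensions, and names are illustrative assumptions, not the authors' architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(H, w, v):
    """Pool per-image feature vectors H (n_images x d) into one
    examination-level vector via a softmax over learned attention scores."""
    scores = np.tanh(H @ w) @ v          # one scalar score per image
    a = np.exp(scores - scores.max())
    a /= a.sum()                         # attention weights sum to 1
    return a @ H, a

d, k = 8, 4                              # feature and attention dimensions
w, v = rng.normal(size=(d, k)), rng.normal(size=k)

# One head CT examination = a variable-length bag of per-image features.
# Only the examination carries a label (ICH present/absent), not the images.
exam = rng.normal(size=(30, d))          # 30 sections, 8-dim features each
z, attn = attention_pool(exam, w, v)
print(z.shape, attn.shape)               # (8,) (30,)
```

During training, gradients from the single examination-level loss flow back through the attention weights, which is what lets a weakly supervised model localize positive sections without any image-level labels.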
Affiliation(s)
- Jacopo Teneggi
- From the Department of Computer Science (J.T.), Department of Biomedical Engineering (J.S.), and Mathematical Institute for Data Science (MINDS) (J.S., J.T.), Johns Hopkins University, 3400 N Charles St, Clark Hall, Suite 320, Baltimore, MD 21218; and University of Maryland Medical Intelligent Imaging Center (UM2ii), Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, Md (P.H.Y.)
- Paul H. Yi
- Jeremias Sulam
28
Jabbour S, Fouhey D, Shepard S, Valley TS, Kazerooni EA, Banovic N, Wiens J, Sjoding MW. Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study. JAMA 2023; 330:2275-2284. [PMID: 38112814 PMCID: PMC10731487 DOI: 10.1001/jama.2023.22295]
Abstract
Importance Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established. Objectives To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors. Design, Setting, and Participants Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants. Interventions Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions. Main Outcomes and Measures Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease. Results Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations, and 226 randomized to AI model predictions with explanations. 
Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2) and by 4.4 percentage points (95% CI, 2.0 to 6.9) when clinicians were also shown AI model explanations. Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline and providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, representing a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) compared with the systematically biased AI model. Conclusions and Relevance Although standard AI models improve diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect. Trial Registration ClinicalTrials.gov Identifier: NCT06098950.
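The reported effects can be reproduced as simple percentage-point arithmetic on the baseline accuracy. These are rounded point estimates only; the confidence intervals and the 2.3-point gap come from the trial's statistical model, not from this arithmetic.

```python
BASELINE = 73.0  # clinicians' baseline diagnostic accuracy, in percent

def accuracy_under(effect_pp, baseline=BASELINE):
    """Accuracy under a condition, given its percentage-point change from baseline."""
    return round(baseline + effect_pp, 1)

standard_ai     = accuracy_under(+2.9)   # standard AI, no explanations
standard_ai_xai = accuracy_under(+4.4)   # standard AI with explanations
biased_ai       = accuracy_under(-11.3)  # systematically biased AI
biased_ai_xai   = accuracy_under(-9.1)   # biased AI with explanations
# Explanations recovered little of the harm from biased predictions
# (the trial's model-based estimate of that gap is 2.3 points, nonsignificant).
```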
Affiliation(s)
- Sarah Jabbour
- Computer Science and Engineering, University of Michigan, Ann Arbor
- David Fouhey
- Computer Science and Engineering, University of Michigan, Ann Arbor
- Now with Computer Science, Courant Institute, New York University, New York
- Now with Electrical and Computer Engineering, Tandon School of Engineering, New York University, New York
- Thomas S. Valley
- Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor
- Ella A. Kazerooni
- Department of Radiology, University of Michigan Medical School, Ann Arbor
- Nikola Banovic
- Computer Science and Engineering, University of Michigan, Ann Arbor
- Jenna Wiens
- Computer Science and Engineering, University of Michigan, Ann Arbor
- Michael W. Sjoding
- Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor
29
Smith CM, Weathers AL, Lewis SL. An overview of clinical machine learning applications in neurology. J Neurol Sci 2023; 455:122799. [PMID: 37979413 DOI: 10.1016/j.jns.2023.122799]
Abstract
Machine learning techniques for clinical applications are evolving, and the potential impact this will have on clinical neurology is important to recognize. By providing a broad overview on this growing paradigm of clinical tools, this article aims to help healthcare professionals in neurology prepare to navigate both the opportunities and challenges brought on through continued advancements in machine learning. This narrative review first elaborates on how machine learning models are organized and implemented. Machine learning tools are then classified by clinical application, with examples of uses within neurology described in more detail. Finally, this article addresses limitations and considerations regarding clinical machine learning applications in neurology.
Affiliation(s)
- Colin M Smith
- Lehigh Valley Fleming Neuroscience Institute, 1250 S Cedar Crest Blvd., Allentown, PA 18103, USA
- Allison L Weathers
- Cleveland Clinic Information Technology Division, 9500 Euclid Ave., Cleveland, OH 44195, USA
- Steven L Lewis
- Lehigh Valley Fleming Neuroscience Institute, 1250 S Cedar Crest Blvd., Allentown, PA 18103, USA
30
Funer F, Liedtke W, Tinnemeyer S, Klausen AD, Schneider D, Zacharias HU, Langanke M, Salloch S. Responsibility and decision-making authority in using clinical decision support systems: an empirical-ethical exploration of German prospective professionals' preferences and concerns. J Med Ethics 2023; 50:6-11. [PMID: 37217277 PMCID: PMC10803986 DOI: 10.1136/jme-2022-108814]
Abstract
Machine learning-driven clinical decision support systems (ML-CDSSs) seem impressively promising for future routine and emergency care. However, reflection on their clinical implementation reveals a wide array of ethical challenges. The preferences, concerns and expectations of professional stakeholders remain largely unexplored. Empirical research, however, may help to clarify the conceptual debate and its aspects in terms of their relevance for clinical practice. This study explores, from an ethical point of view, future healthcare professionals' attitudes to potential changes of responsibility and decision-making authority when using ML-CDSS. Twenty-seven semistructured interviews were conducted with German medical students and nursing trainees. The data were analysed based on qualitative content analysis according to Kuckartz. Interviewees' reflections are presented under three themes the interviewees describe as closely related: (self-)attribution of responsibility, decision-making authority and need of (professional) experience. The results illustrate the conceptual interconnectedness of professional responsibility and its structural and epistemic preconditions to be able to fulfil clinicians' responsibility in a meaningful manner. The study also sheds light on the four relata of responsibility understood as a relational concept. The article closes with concrete suggestions for the ethically sound clinical implementation of ML-CDSS.
Affiliation(s)
- Florian Funer
- Institute of Ethics, History and Philosophy of Medicine, Hannover Medical School, Hannover, Germany
- Institute of Ethics and History of Medicine, Eberhard Karls University Tübingen, Tübingen, Germany
- Wenke Liedtke
- Department of Social Work, Protestant University of Applied Sciences RWL, Bochum, Germany
- Sara Tinnemeyer
- Institute of Ethics, History and Philosophy of Medicine, Hannover Medical School, Hannover, Germany
- Diana Schneider
- Competence Center Emerging Technologies, Fraunhofer Institute for Systems and Innovation Research ISI, Karlsruhe, Germany
- Helena U Zacharias
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Hannover, Germany
- Martin Langanke
- Department of Social Work, Protestant University of Applied Sciences RWL, Bochum, Germany
- Sabine Salloch
- Institute of Ethics, History and Philosophy of Medicine, Hannover Medical School, Hannover, Germany
31
Banerji CRS, Chakraborti T, Harbron C, MacArthur BD. Clinical AI tools must convey predictive uncertainty for each individual patient. Nat Med 2023; 29:2996-2998. [PMID: 37821686 DOI: 10.1038/s41591-023-02562-7]
Affiliation(s)
- Christopher R S Banerji
- The Alan Turing Institute, London, UK
- University College London Hospitals NHS Foundation Trust, London, UK
- UCL Cancer Institute, Faculty of Medical Sciences, University College London, London, UK
- Tapabrata Chakraborti
- The Alan Turing Institute, London, UK
- UCL Cancer Institute, Faculty of Medical Sciences, University College London, London, UK
- Ben D MacArthur
- The Alan Turing Institute, London, UK
- Faculty of Medicine, University of Southampton, Southampton, UK
- Mathematical Sciences, University of Southampton, Southampton, UK
32
Kostick-Quenet K, Lang BH, Smith J, Hurley M, Blumenthal-Barby J. Trust criteria for artificial intelligence in health: normative and epistemic considerations. J Med Ethics 2023:jme-2023-109338. [PMID: 37979976 PMCID: PMC11101592 DOI: 10.1136/jme-2023-109338]
Abstract
Rapid advancements in artificial intelligence and machine learning (AI/ML) in healthcare raise pressing questions about how much users should trust AI/ML systems, particularly for high stakes clinical decision-making. Ensuring that user trust is properly calibrated to a tool's computational capacities and limitations has both practical and ethical implications, given that overtrust or undertrust can influence over-reliance or under-reliance on algorithmic tools, with significant implications for patient safety and health outcomes. It is, thus, important to better understand how variability in trust criteria across stakeholders, settings, tools and use cases may influence approaches to using AI/ML tools in real settings. As part of a 5-year, multi-institutional Agency for Health Care Research and Quality-funded study, we identify trust criteria for a survival prediction algorithm intended to support clinical decision-making for left ventricular assist device therapy, using semistructured interviews (n=40) with patients and physicians, analysed via thematic analysis. Findings suggest that physicians and patients share similar empirical considerations for trust, which were primarily epistemic in nature, focused on accuracy and validity of AI/ML estimates. Trust evaluations considered the nature, integrity and relevance of training data rather than the computational nature of algorithms themselves, suggesting a need to distinguish 'source' from 'functional' explainability. To a lesser extent, trust criteria were also relational (endorsement from others) and sometimes based on personal beliefs and experience. We discuss implications for promoting appropriate and responsible trust calibration for clinical decision-making using AI/ML.
Affiliation(s)
- Kristin Kostick-Quenet
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Benjamin H Lang
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Department of Philosophy, University of Oxford, Oxford, Oxfordshire, UK
- Jared Smith
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Meghan Hurley
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
33
Kenny R, Fischhoff B, Davis A, Canfield C. Improving Social Bot Detection Through Aid and Training. Hum Factors 2023:187208231210145. [PMID: 37963198 DOI: 10.1177/00187208231210145]
Abstract
OBJECTIVE We test the effects of three aids on individuals' ability to detect social bots among Twitter personas: a bot indicator score, a training video, and a warning. BACKGROUND Detecting social bots can prevent online deception. We use a simulated social media task to evaluate three aids. METHOD Lay participants judged whether each of 60 Twitter personas was a human or social bot in a simulated online environment, using agreement between three machine learning algorithms to estimate the probability of each persona being a bot. Experiment 1 compared a control group and two intervention groups, one provided a bot indicator score for each tweet; the other provided a warning about social bots. Experiment 2 compared a control group and two intervention groups, one receiving the bot indicator scores and the other a training video, focused on heuristics for identifying social bots. RESULTS The bot indicator score intervention improved predictive performance and reduced overconfidence in both experiments. The training video was also effective, although somewhat less so. The warning had no effect. Participants rarely reported willingness to share content for a persona that they labeled as a bot, even when they agreed with it. CONCLUSIONS Informative interventions improved social bot detection; warning alone did not. APPLICATION We offer an experimental testbed and methodology that can be used to evaluate and refine interventions designed to reduce vulnerability to social bots. We show the value of two interventions that could be applied in many settings.
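The study's bot indicator was derived from agreement between three machine-learning algorithms. A minimal stand-in for such an agreement score, assuming a simple mean of detector probabilities (the function names and threshold here are hypothetical, not the study's actual scoring rule), looks like this:

```python
def bot_indicator(detector_probs):
    """Toy agreement score: average several detectors' P(bot) estimates.
    The study itself combined the outputs of three specific algorithms."""
    if not detector_probs:
        raise ValueError("need at least one detector output")
    return sum(detector_probs) / len(detector_probs)

def label_persona(detector_probs, threshold=0.5):
    # Classify a persona from the pooled indicator score.
    return "bot" if bot_indicator(detector_probs) >= threshold else "human"
```

For example, three detectors reporting 0.9, 0.8 and 0.7 pool to an indicator of 0.8, labeling the persona a bot.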
Affiliation(s)
- Ryan Kenny
- United States Army, Fayetteville, NC, USA
- Alex Davis
- Carnegie Mellon University, Pittsburgh, PA, USA
- Casey Canfield
- Missouri University of Science and Technology, Rolla, MO, USA
34
Nagendran M, Festor P, Komorowski M, Gordon AC, Faisal AA. Quantifying the impact of AI recommendations with explanations on prescription decision making. NPJ Digit Med 2023; 6:206. [PMID: 37935953 PMCID: PMC10630476 DOI: 10.1038/s41746-023-00955-z]
Abstract
The influence of AI recommendations on physician behaviour remains poorly characterised. We assess how clinicians' decisions may be influenced by additional information more broadly, and how this influence can be modified both by the source of the information (human peers or AI) and by the presence or absence of an AI explanation (XAI, here using simple feature importance). We used a modified between-subjects design in which intensive care doctors (N = 86) were presented on a computer, for each of 16 trials, with a patient case and prompted to prescribe continuous values for two drugs. We used a multi-factorial experimental design with four arms, where each clinician experienced all four arms on different subsets of our 24 patients. The four arms were (i) baseline (control), (ii) a peer scenario showing what doses other doctors had prescribed, (iii) an AI suggestion and (iv) an XAI suggestion. We found that additional information (peer, AI or XAI) had a strong influence on prescriptions (significantly so for AI, but not for peers), yet simple XAI did not have a greater influence than AI alone. Neither attitudes towards AI nor clinical experience correlated with the AI-supported decisions, and doctors' self-reports of how useful they found the XAI did not correlate with whether the XAI actually influenced their prescriptions. Our findings suggest that the marginal impact of simple XAI was low in this setting, and they cast doubt on the utility of self-reports as a valid metric for assessing XAI in clinical experts.
Affiliation(s)
- Myura Nagendran
- UKRI Centre for Doctoral Training in AI for Healthcare, Imperial College London, London, UK
- Division of Anaesthetics, Pain Medicine, and Intensive Care, Imperial College London, London, UK
- Brain and Behaviour Lab, Imperial College London, London, UK
- Paul Festor
- UKRI Centre for Doctoral Training in AI for Healthcare, Imperial College London, London, UK
- Brain and Behaviour Lab, Imperial College London, London, UK
- Department of Computing, Imperial College London, London, UK
- Matthieu Komorowski
- Division of Anaesthetics, Pain Medicine, and Intensive Care, Imperial College London, London, UK
- Anthony C Gordon
- Division of Anaesthetics, Pain Medicine, and Intensive Care, Imperial College London, London, UK
- Aldo A Faisal
- UKRI Centre for Doctoral Training in AI for Healthcare, Imperial College London, London, UK
- Brain and Behaviour Lab, Imperial College London, London, UK
- Department of Computing, Imperial College London, London, UK
- Institute of Artificial & Human Intelligence, University of Bayreuth, Bayreuth, Germany
35
Li MD, Little BP. Appropriate Reliance on Artificial Intelligence in Radiology Education. J Am Coll Radiol 2023; 20:1126-1130. [PMID: 37392983 DOI: 10.1016/j.jacr.2023.04.019]
Abstract
Users of artificial intelligence (AI) can become overreliant on AI, negatively affecting the performance of human-AI teams. For a future in which radiologists use interpretive AI tools routinely in clinical practice, radiology education will need to evolve to provide radiologists with the skills to use AI appropriately and wisely. In this work, we examine how overreliance on AI may develop in radiology trainees and explore how this problem can be mitigated, including through the use of AI-augmented education. Radiology trainees will still need to develop the perceptual skills and mastery of knowledge fundamental to radiology to use AI safely. We propose a framework for radiology trainees to use AI tools with appropriate reliance, drawing on lessons from human-AI interactions research.
Affiliation(s)
- Matthew D Li
- Department of Radiology and Diagnostic Imaging, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, Alberta, Canada
- Brent P Little
- Mayo Clinic College of Medicine and Science, Department of Radiology, Division of Cardiothoracic Imaging, Mayo Clinic Florida, Florida; Committee Member, ACR Appropriateness Criteria Thoracic Imaging
36
Ghassemi M. Presentation matters for AI-generated clinical advice. Nat Hum Behav 2023; 7:1833-1835. [PMID: 37985904 DOI: 10.1038/s41562-023-01721-7]
Affiliation(s)
- Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Vector Institute, Toronto, Ontario, Canada
37
Schlicker N, Langer M, Hirsch MC. [How trustworthy is artificial intelligence? A model for the conflict between objectivity and subjectivity]. Inn Med (Heidelb) 2023; 64:1051-1057. [PMID: 37737496 DOI: 10.1007/s00108-023-01602-1]
Abstract
For the integration of artificial intelligence (AI) systems into medical processes, it is decisive to address both the trustworthiness of these systems and the trust that physicians and patients have in them. Too much trust can result in physicians uncritically relying on the technology, while too little trust may result in physicians not taking advantage of the full potential of AI-based technology in making decisions. To strike a balance between these extremes, it is crucial to assess the trustworthiness of a system correctly; only then is it possible to decide whether or not the system can be trusted. This article describes these relationships for the medical context. We show why trustworthiness and trust are important in the use of AI-based systems and how individuals can come to an accurate assessment of the trustworthiness of AI-based systems.
Affiliation(s)
- Nadine Schlicker
- Institut für Künstliche Intelligenz in der Medizin, Philipps-Universität Marburg, Baldingerstr., 35043 Marburg, Germany
- Markus Langer
- Fachbereich Psychologie, Arbeitseinheit Digitalisierung in psychologischen Handlungsfeldern, Philipps-Universität Marburg, Marburg, Germany
- Martin C Hirsch
- Institut für Künstliche Intelligenz in der Medizin, Philipps-Universität Marburg, Baldingerstr., 35043 Marburg, Germany
38
Vijayakumar S, Lee VV, Leong QY, Hong SJ, Blasiak A, Ho D. Physicians' Perspectives on AI in Clinical Decision Support Systems: Interview Study of the CURATE.AI Personalized Dose Optimization Platform. JMIR Hum Factors 2023; 10:e48476. [PMID: 37902825 PMCID: PMC10644191 DOI: 10.2196/48476]
Abstract
BACKGROUND Physicians play a key role in integrating new clinical technology into care practices through user feedback and growth propositions to developers of the technology. As physicians are stakeholders involved throughout the technology iteration process, understanding their roles as users can provide nuanced insights into the workings of these technologies. Therefore, understanding physicians' perceptions can be critical to clinical validation, implementation, and downstream adoption. Given the increasing prevalence of clinical decision support systems (CDSSs), there remains a need to gain an in-depth understanding of physicians' perceptions and expectations toward their downstream implementation. This paper explores physicians' perceptions of integrating CURATE.AI, a novel artificial intelligence (AI)-based, clinical-stage personalized dosing CDSS, into clinical practice. OBJECTIVE This study aims to understand physicians' perspectives on integrating CURATE.AI into clinical work and to gather insights on considerations for the implementation of AI-based CDSS tools. METHODS A total of 12 participants completed semistructured interviews examining their knowledge, experience, attitudes, risks, and future course of the personalized combination therapy dosing platform, CURATE.AI. Interviews were audio recorded, transcribed verbatim, and coded manually. The data were thematically analyzed. RESULTS Overall, 3 broad themes and 9 subthemes were identified through thematic analysis. The themes covered considerations that physicians perceived as significant across various stages of new technology development, including trial, clinical implementation, and mass adoption. CONCLUSIONS The study laid out the various ways physicians interpreted an AI-based personalized dosing CDSS, CURATE.AI, for their clinical practice.
The research pointed out that physicians' expectations during the different stages of technology exploration can be nuanced and layered with expectations of implementation that are relevant for technology developers and researchers.
Affiliation(s)
- Smrithi Vijayakumar
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- V Vien Lee
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Qiao Ying Leong
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Soo Jung Hong
- Department of Communications and New Media, National University of Singapore, Singapore, Singapore
- Agata Blasiak
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
- The Institute for Digital Medicine (WisDM), Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Dean Ho
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
- The Institute for Digital Medicine (WisDM), Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
39
Joo H, Mathis MR, Tam M, James C, Han P, Mangrulkar RS, Friedman CP, Vydiswaran VGV. Applying AI and Guidelines to Assist Medical Students in Recognizing Patients With Heart Failure: Protocol for a Randomized Trial. JMIR Res Protoc 2023; 12:e49842. [PMID: 37874618 PMCID: PMC10630872 DOI: 10.2196/49842]
Abstract
BACKGROUND The integration of artificial intelligence (AI) into health care is transforming both clinical practice and medical education. AI-based systems aim to improve the efficacy of clinical tasks, enhancing diagnostic accuracy and tailoring treatment delivery. As AI becomes increasingly prevalent in health care, it is critical for health care providers to use these systems responsibly to mitigate bias, ensure effective outcomes, and provide safe clinical practices. In this study, the clinical task is the identification of heart failure (HF) prior to surgery, with the intention of enhancing clinical decision-making skills. HF is a common and severe disease, but detection remains challenging due to its subtle manifestation, often concurrent with other medical conditions, and the absence of a simple and effective diagnostic test. While advanced HF algorithms have been developed, the use of these AI-based systems to enhance clinical decision-making in medical education remains understudied. OBJECTIVE This research protocol demonstrates our study design, the systematic procedure for selecting surgical cases from electronic health records, and the interventions. The primary objective of this study is to measure the effectiveness of interventions aimed at improving HF recognition before surgery; the second objective is to evaluate the impact of inaccurate AI recommendations; and the third is to explore the relationship between the inclination to accept AI recommendations and their accuracy. METHODS Our study used a 3 × 2 factorial design (intervention type × order of pre-post sets) for this randomized trial with medical students. The student participants are asked to complete a 30-minute e-learning module that includes key information about the intervention and a 5-question quiz, and a 60-minute review of 20 surgical cases to determine the presence of HF.
To mitigate selection bias in the pre- and posttests, we adopted a feature-based systematic sampling procedure. From a pool of 703 expert-reviewed surgical cases, 20 were selected based on features such as case complexity, model performance, and positive and negative labels. This study comprises three interventions: (1) a direct AI-based recommendation with a predicted HF score, (2) an indirect AI-based recommendation gauged through the area under the curve metric, and (3) an HF guideline-based intervention. RESULTS As of July 2023, 62 of the enrolled medical students have fulfilled this study's participation, including the completion of a short quiz and the review of 20 surgical cases. The subject enrollment commenced in August 2022 and will end in December 2023, with the goal of recruiting 75 medical students in years 3 and 4 with clinical experience. CONCLUSIONS We demonstrated a study protocol for the randomized trial, measuring the effectiveness of interventions using AI and HF guidelines among medical students to enhance HF recognition in preoperative care with electronic health record data. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/49842.
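The feature-based systematic sampling step — drawing 20 test cases from a pool of 703 so that features such as complexity and label are represented — can be sketched as seeded stratified sampling. This is a generic sketch under stated assumptions, not the authors' exact procedure; the stratum keys and counts are illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(cases, stratum_of, n_total, seed=42):
    """Draw an equal number of cases from each stratum (e.g., defined by
    case complexity and HF label) to reduce selection bias in a test set."""
    rng = random.Random(seed)  # fixed seed -> reproducible selection
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    per_stratum = n_total // len(strata)
    sample = []
    for key in sorted(strata):  # sorted for deterministic ordering
        pool = strata[key]
        sample.extend(rng.sample(pool, min(per_stratum, len(pool))))
    return sample
```

With four strata (two complexity levels × positive/negative label) and n_total = 20, this yields five cases per stratum.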
Affiliation(s)
- Hyeon Joo
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Michael R Mathis
- Department of Anesthesiology, University of Michigan, Ann Arbor, MI, United States
- Marty Tam
- Department of Internal Medicine, Cardiology, University of Michigan, Ann Arbor, MI, United States
- Cornelius James
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Department of Pediatrics, University of Michigan, Ann Arbor, MI, United States
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Peijin Han
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Rajesh S Mangrulkar
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Charles P Friedman
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- School of Information, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- School of Information, University of Michigan, Ann Arbor, MI, United States
40
Vicente L, Matute H. Humans inherit artificial intelligence biases. Sci Rep 2023;13:15737. PMID: 37789032; PMCID: PMC10547752; DOI: 10.1038/s41598-023-42384-8.
Abstract
Artificial intelligence recommendations are sometimes erroneous and biased. In our research, we hypothesized that people who perform a (simulated) medical diagnostic task assisted by a biased AI system will reproduce the model's bias in their own decisions, even when they move to a context without AI support. In three experiments, participants completed a medical-themed classification task with or without the help of a biased AI system. The biased recommendations by the AI influenced participants' decisions. Moreover, when those participants, assisted by the AI, moved on to perform the task without assistance, they made the same errors as the AI had made during the previous phase. Thus, participants' responses mimicked AI bias even when the AI was no longer making suggestions. These results provide evidence of human inheritance of AI bias.
Affiliation(s)
- Lucía Vicente
- Department of Psychology, Deusto University, Avenida Universidades 24, 48007, Bilbao, Spain
- Helena Matute
- Department of Psychology, Deusto University, Avenida Universidades 24, 48007, Bilbao, Spain
41
Carboni C, Wehrens R, van der Veen R, de Bont A. Eye for an AI: More-than-seeing, fauxtomation, and the enactment of uncertain data in digital pathology. Soc Stud Sci 2023;53:712-737. PMID: 37154611; PMCID: PMC10543128; DOI: 10.1177/03063127231167589.
Abstract
Artificial Intelligence (AI) tools are being developed to assist with increasingly complex diagnostic tasks in medicine. This produces epistemic disruption in diagnostic processes, even in the absence of AI itself, through the datafication and digitalization encouraged by the promissory discourses around AI. In this study of the digitization of an academic pathology department, we mobilize Barad's agential realist framework to examine these epistemic disruptions. Narratives and expectations around AI-assisted diagnostics, which are inextricable from material changes, enact specific types of organizational change and produce epistemic objects that facilitate the emergence of some epistemic practices and subjects but hinder others. Agential realism allows us to study epistemic, ethical, and ontological changes enacted through digitization efforts simultaneously, while keeping a close eye on the attendant organizational changes. Based on ethnographic analysis of pathologists' changing work processes, we identify three types of uncertainty produced by digitization: sensorial, intra-active, and fauxtomated uncertainty. Sensorial and intra-active uncertainty stem from the ontological otherness of digital objects, materialized in their affordances, and result in digital slides' partial illegibility. Fauxtomated uncertainty stems from quasi-automated digital slide-making, which complicates the question of responsibility for epistemic objects and related knowledge by marginalizing the human.
Affiliation(s)
- Chiara Carboni
- Erasmus University Rotterdam, Rotterdam, The Netherlands
- Rik Wehrens
- Erasmus University Rotterdam, Rotterdam, The Netherlands
42
McCradden M, Hui K, Buchman DZ. Evidence, ethics and the promise of artificial intelligence in psychiatry. J Med Ethics 2023;49:573-579. PMID: 36581457; PMCID: PMC10423547; DOI: 10.1136/jme-2022-108447.
Abstract
Researchers are studying how artificial intelligence (AI) can be used to better detect, prognosticate and subgroup diseases. The idea that AI might advance medicine's understanding of biological categories of psychiatric disorders, as well as provide better treatments, is appealing given the historical challenges with prediction, diagnosis and treatment in psychiatry. Given the power of AI to analyse vast amounts of information, some clinicians may feel obligated to align their clinical judgements with the outputs of the AI system. However, a potential epistemic privileging of AI in clinical judgements may lead to unintended consequences that could negatively affect patient treatment, well-being and rights. The implications are also relevant to precision medicine, digital twin technologies and predictive analytics generally. We propose that a commitment to epistemic humility can help promote judicious clinical decision-making at the interface of big data and AI in psychiatry.
Affiliation(s)
- Melissa McCradden
- Joint Centre for Bioethics, University of Toronto Dalla Lana School of Public Health, Toronto, Ontario, Canada
- Bioethics, The Hospital for Sick Children, Toronto, Ontario, Canada
- Genetics & Genome Biology, Peter Gilgan Centre for Research and Learning, Toronto, Ontario, Canada
- Katrina Hui
- Everyday Ethics Lab, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
- Daniel Z Buchman
- Joint Centre for Bioethics, University of Toronto Dalla Lana School of Public Health, Toronto, Ontario, Canada
- Everyday Ethics Lab, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
43
Day TG, Matthew J, Budd S, Hajnal JV, Simpson JM, Razavi R, Kainz B. Sonographer interaction with artificial intelligence: collaboration or conflict? Ultrasound Obstet Gynecol 2023;62:167-174. PMID: 37523514; DOI: 10.1002/uog.26238.
Affiliation(s)
- T G Day
- Department of Congenital Cardiology, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Faculty of Life Sciences and Medicine, School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- J Matthew
- Faculty of Life Sciences and Medicine, School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- S Budd
- Faculty of Life Sciences and Medicine, School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- J V Hajnal
- Faculty of Life Sciences and Medicine, School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- J M Simpson
- Department of Congenital Cardiology, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Faculty of Life Sciences and Medicine, School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- R Razavi
- Department of Congenital Cardiology, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Faculty of Life Sciences and Medicine, School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- B Kainz
- Faculty of Life Sciences and Medicine, School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Department of Computing, Faculty of Engineering, Imperial College London, London, UK
44
van Leeuwen K, Becks M, Grob D, de Lange F, Rutten J, Schalekamp S, Rutten M, van Ginneken B, de Rooij M, Meijer F. AI-support for the detection of intracranial large vessel occlusions: One-year prospective evaluation. Heliyon 2023;9:e19065. PMID: 37636476; PMCID: PMC10458691; DOI: 10.1016/j.heliyon.2023.e19065.
Abstract
Purpose Few studies have evaluated the real-world performance of radiological AI tools in clinical practice. Over one year, we prospectively evaluated the use of AI software to support the detection of intracranial large vessel occlusions (LVO) on CT angiography (CTA). Method Quantitative measures (user log-in attempts, AI standalone performance) and qualitative data (user surveys) were reviewed by a key-user group at three timepoints. A total of 491 CTA studies of 460 patients were included for analysis. Results The overall accuracy of the AI tool for LVO detection and localization was 87.6%, with a sensitivity of 69.1% and a specificity of 91.2%. Of 81 LVOs, 31 of 34 (91%) M1 occlusions were detected correctly, 19 of 38 (50%) M2 occlusions, and 6 of 9 (67%) ICA occlusions. The product was considered user-friendly. Users' diagnostic confidence for LVO detection remained unchanged over the year. The last measured net promoter score was -56%. Use of the AI tool fluctuated over the year with a declining trend. Conclusions Our pragmatic approach to evaluating the AI tool in clinical practice helped us monitor usage, estimate the added value perceived by users, and make an informed decision about continuing use of the tool.
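The reported overall accuracy is consistent with the stated sensitivity, specificity, and case mix. A minimal sketch of the arithmetic; the confusion-matrix counts below are back-calculated approximations from the published percentages, not the study's raw data:

```python
# Diagnostic metrics from a 2x2 confusion matrix. Counts are approximate
# reconstructions from the reported figures (81 LVO-positive of 491 CTA
# studies; sensitivity 69.1%, specificity 91.2%) -- illustrative only.
tp, fn = 56, 25    # ~69.1% of the 81 positives detected
tn, fp = 374, 36   # ~91.2% of the 410 negatives correctly ruled out

sensitivity = tp / (tp + fn)                 # ~0.691
specificity = tn / (tn + fp)                 # ~0.912
accuracy = (tp + tn) / (tp + fn + tn + fp)   # ~0.876, matching the reported 87.6%

print(f"sens={sensitivity:.1%} spec={specificity:.1%} acc={accuracy:.1%}")
```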
Affiliation(s)
- K.G. van Leeuwen
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- M.J. Becks
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- D. Grob
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- F. de Lange
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- J.H.E. Rutten
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- S. Schalekamp
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- M.J.C.M. Rutten
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- Department of Radiology, Jeroen Bosch Hospital, ‘s-Hertogenbosch, the Netherlands
- B. van Ginneken
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- M. de Rooij
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- F.J.A. Meijer
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
45
Alarcón ÁS, Madrid NM, Seepold R, Ortega JA. Obstructive sleep apnea event detection using explainable deep learning models for a portable monitor. Front Neurosci 2023;17:1155900. PMID: 37521695; PMCID: PMC10375719; DOI: 10.3389/fnins.2023.1155900.
Abstract
Background Polysomnography (PSG) is the gold standard for detecting obstructive sleep apnea (OSA). However, this technique has many disadvantages when used outside the hospital or for daily monitoring. Portable monitors (PMs) aim to streamline the OSA detection process through deep learning (DL). Materials and methods We studied how to detect OSA events and calculate the apnea-hypopnea index (AHI) using deep learning models intended for implementation on PMs. Several deep learning models are presented after being trained on polysomnography data from the National Sleep Research Resource (NSRR) repository. The best hyperparameters for the DL architecture are presented. In addition, emphasis is placed on model explainability techniques, specifically Gradient-weighted Class Activation Mapping (Grad-CAM). Results The results for the best DL model are presented and analyzed. The interpretability of the DL model is also analyzed by studying the regions of the signals that are most relevant to its decisions. The best-performing model is a one-dimensional convolutional neural network (1D-CNN) with 84.3% accuracy. Conclusion The use of PMs with machine learning techniques for detecting OSA events still has a long way to go. However, our method for developing explainable DL models demonstrates that PMs appear to be a promising alternative to PSG for the future detection of obstructive apnea events and the automatic calculation of the AHI.
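For context, the AHI these models are asked to reproduce is a simple rate: respiratory events per hour of sleep (an AHI of 5 or more is the conventional threshold for at least mild OSA). A trivial sketch with invented numbers:

```python
# Apnea-hypopnea index: (apneas + hypopneas) per hour of sleep.
# The event counts and sleep duration below are invented for illustration.
def ahi(apneas, hypopneas, hours_of_sleep):
    return (apneas + hypopneas) / hours_of_sleep

print(ahi(10, 22, 6.4))  # 32 events over 6.4 h -> AHI of 5.0
```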
Affiliation(s)
- Ángel Serrano Alarcón
- School of Informatics, Reutlingen University, Reutlingen, Germany
- Computer Languages and Systems, University of Seville, Sevilla, Spain
- Ralf Seepold
- Computer Science, HTWG Konstanz, Konstanz, Germany
46
Wehkamp K, Krawczak M, Schreiber S. The Quality and Utility of Artificial Intelligence in Patient Care. Dtsch Arztebl Int 2023;120:463-469. PMID: 37218054; PMCID: PMC10487679; DOI: 10.3238/arztebl.m2023.0124.
Abstract
BACKGROUND Artificial intelligence (AI) is increasingly being used in patient care. In the future, physicians will need to understand not only the basic functioning of AI applications, but also their quality, utility, and risks. METHODS This article is based on a selective review of the literature on the principles, quality, limitations, and benefits of AI applications in patient care, along with examples of individual applications. RESULTS The number of AI applications in patient care is rising, with more than 500 approvals in the United States to date. Their quality and utility rest on a number of interdependent factors, including the real-life setting, the type and amount of data collected, the choice of variables used by the application, the algorithms used, and the goal and implementation of each application. Bias (which may be hidden) and errors can arise at all of these levels. Any evaluation of the quality and utility of an AI application must therefore be conducted according to the scientific principles of evidence-based medicine, a requirement that is often hampered by a lack of transparency. CONCLUSION AI has the potential to improve patient care while meeting the challenge of an ever-increasing surfeit of information and data in medicine with limited human resources. The limitations and risks of AI applications require critical and responsible consideration. This can best be achieved through a combination of scientific.
Affiliation(s)
- Kai Wehkamp
- Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Lübeck, Kiel, Germany
- Department for Medical Management, MSH Medical School Hamburg, Hamburg, Germany
- Michael Krawczak
- Institute of Medical Informatics and Statistics, Christian-Albrechts-University of Kiel, University Medical Center Schleswig-Holstein Campus Kiel, Germany
- Stefan Schreiber
- Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Lübeck, Kiel, Germany
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, University Medical Center Schleswig-Holstein Campus Kiel, Germany
47
Jiang LY, Liu XC, Nejatian NP, Nasir-Moin M, Wang D, Abidin A, Eaton K, Riina HA, Laufer I, Punjabi P, Miceli M, Kim NC, Orillac C, Schnurman Z, Livia C, Weiss H, Kurland D, Neifert S, Dastagirzada Y, Kondziolka D, Cheung ATM, Yang G, Cao M, Flores M, Costa AB, Aphinyanaphongs Y, Cho K, Oermann EK. Health system-scale language models are all-purpose prediction engines. Nature 2023;619:357-362. PMID: 37286606; PMCID: PMC10338337; DOI: 10.1038/s41586-023-06160-y.
Abstract
Physicians make critical time-constrained decisions every day. Clinical predictive models can help physicians and administrators make decisions by forecasting clinical and operational events. Existing structured data-based clinical predictive models have limited use in everyday practice owing to complexity in data processing, as well as model development and deployment1-3. Here we show that unstructured clinical notes from the electronic health record can enable the training of clinical language models, which can be used as all-purpose clinical predictive engines with low-resistance development and deployment. Our approach leverages recent advances in natural language processing4,5 to train a large language model for medical language (NYUTron) and subsequently fine-tune it across a wide range of clinical and operational predictive tasks. We evaluated our approach within our health system for five such tasks: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length of stay prediction, and insurance denial prediction. We show that NYUTron has an area under the curve (AUC) of 78.7-94.9%, with an improvement of 5.36-14.7% in the AUC compared with traditional models. We additionally demonstrate the benefits of pretraining with clinical text, the potential for increasing generalizability to different sites through fine-tuning and the full deployment of our system in a prospective, single-arm trial. These results show the potential for using clinical language models in medicine to read alongside physicians and provide guidance at the point of care.
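The AUC figures quoted here have a simple rank interpretation: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. A small sketch of that interpretation; the scores are invented toy values, not NYUTron outputs:

```python
# ROC AUC via the Mann-Whitney pairwise-ranking interpretation: the
# fraction of (positive, negative) pairs ordered correctly, ties worth half.
def roc_auc(pos_scores, neg_scores):
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

print(roc_auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2]))  # 8/9: one of nine pairs misordered
```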
Affiliation(s)
- Lavender Yao Jiang
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Center for Data Science, New York University, New York, NY, USA
- Xujin Chris Liu
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Electrical and Computer Engineering, Tandon School of Engineering, New York, NY, USA
- Duo Wang
- Predictive Analytics Unit, NYU Langone Health, New York, NY, USA
- Kevin Eaton
- Department of Internal Medicine, NYU Langone Health, New York, NY, USA
- Ilya Laufer
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Paawan Punjabi
- Department of Internal Medicine, NYU Langone Health, New York, NY, USA
- Madeline Miceli
- Department of Internal Medicine, NYU Langone Health, New York, NY, USA
- Nora C Kim
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Cordelia Orillac
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Zane Schnurman
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Hannah Weiss
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- David Kurland
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Sean Neifert
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Grace Yang
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Center for Data Science, New York University, New York, NY, USA
- Ming Cao
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Center for Data Science, New York University, New York, NY, USA
- Yindalon Aphinyanaphongs
- Predictive Analytics Unit, NYU Langone Health, New York, NY, USA
- Department of Population Health, NYU Langone Health, New York, NY, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, NY, USA
- Prescient Design, Genentech, New York, NY, USA
- Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Eric Karl Oermann
- Department of Neurosurgery, NYU Langone Health, New York, NY, USA
- Center for Data Science, New York University, New York, NY, USA
- Department of Radiology, NYU Langone Health, New York, NY, USA
48
Šuster S, Baldwin T, Verspoor K. Analysis of predictive performance and reliability of classifiers for quality assessment of medical evidence revealed important variation by medical area. J Clin Epidemiol 2023;159:58-69. PMID: 37120028; DOI: 10.1016/j.jclinepi.2023.04.006.
Abstract
OBJECTIVES Reliability is a major obstacle to the deployment of models for automated quality assessment. We analyze the models' calibration and selective classification performance. STUDY DESIGN AND SETTING We examine two systems for assessing the quality of medical evidence, EvidenceGRADEr and RobotReviewer, both developed from the Cochrane Database of Systematic Reviews (CDSR), which measure the strength of bodies of evidence and the risk of bias (RoB) of individual studies, respectively. We report their calibration error and Brier scores, present their reliability diagrams, and analyze the risk-coverage trade-off in selective classification. RESULTS The models are reasonably well calibrated on most quality criteria (expected calibration error [ECE] 0.04-0.09 for EvidenceGRADEr, 0.03-0.10 for RobotReviewer). However, both calibration and predictive performance vary significantly by medical area. This has ramifications for the application of such models in practice, as average performance is a poor indicator of group-level performance (e.g., health and safety at work, allergy and intolerance, and public health see much worse performance than cancer, pain and anesthesia, and neurology). We explore the reasons behind this disparity. CONCLUSION Practitioners adopting automated quality assessment should expect large fluctuations in system reliability and predictive performance depending on the medical area. Prospective indicators of such behavior should be researched further.
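For readers unfamiliar with the two reliability measures used here: the Brier score is the mean squared error of predicted probabilities, and expected calibration error bins predictions by confidence and averages the gap between each bin's confidence and its accuracy. A minimal sketch on invented toy data, not the paper's models or datasets:

```python
# Brier score: mean squared error between predicted probabilities and
# binary outcomes. Lower is better.
def brier(probs, labels):
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

# Expected calibration error (ECE): bin predictions into equal-width
# confidence bins, then take the weighted average of |confidence - accuracy|.
def ece(probs, labels, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    err = 0.0
    for b in bins:
        if b:
            conf = sum(p for p, _ in b) / len(b)
            acc = sum(y for _, y in b) / len(b)
            err += (len(b) / len(probs)) * abs(conf - acc)
    return err

probs, labels = [0.9, 0.9, 0.1, 0.1], [1, 0, 0, 0]
print(brier(probs, labels), ece(probs, labels))  # ~0.21 and ~0.25 on this toy data
```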
Affiliation(s)
- Simon Šuster
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia.
- Timothy Baldwin
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Karin Verspoor
- School of Computing Technologies, RMIT University, Melbourne, Australia; School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
49
Dvijotham KD, Winkens J, Barsbey M, Ghaisas S, Stanforth R, Pawlowski N, Strachan P, Ahmed Z, Azizi S, Bachrach Y, Culp L, Daswani M, Freyberg J, Kelly C, Kiraly A, Kohlberger T, McKinney S, Mustafa B, Natarajan V, Geras K, Witowski J, Qin ZZ, Creswell J, Shetty S, Sieniek M, Spitz T, Corrado G, Kohli P, Cemgil T, Karthikesalingam A. Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians. Nat Med 2023;29:1814-1820. PMID: 37460754; DOI: 10.1038/s41591-023-02437-x.
Abstract
Predictive artificial intelligence (AI) systems based on deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings, but can make errors in cases accurately diagnosed by clinicians and vice versa. We developed Complementarity-Driven Deferral to Clinical Workflow (CoDoC), a system that can learn to decide between the opinion of a predictive AI model and a clinical workflow. CoDoC enhances accuracy relative to clinician-only or AI-only baselines in clinical workflows that screen for breast cancer or tuberculosis (TB). For breast cancer screening, compared to double reading with arbitration in a screening program in the UK, CoDoC reduced false positives by 25% at the same false-negative rate, while achieving a 66% reduction in clinician workload. For TB triaging, compared to standalone AI and clinical workflows, CoDoC achieved a 5-15% reduction in false positives at the same false-negative rate for three of five commercially available predictive AI systems. To facilitate the deployment of CoDoC in future clinical settings, we present results showing that CoDoC's performance gains are sustained across several axes of variation (imaging modality, clinical setting and predictive AI system) and discuss the limitations of our evaluation and where further validation would be needed. We provide an open-source implementation to encourage further research and application.
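The deferral idea can be caricatured with a simple confidence-band rule. This sketch is an invented illustration of "deciding between an AI opinion and a clinical workflow", not the published CoDoC model, which learns its deferral behaviour from data rather than using a fixed band:

```python
# Toy deferral policy: act on the AI score when it is confident, and
# defer to the clinician's read inside an uncertain band. The band
# (0.2, 0.8) is invented for illustration only.
def decide(ai_score, clinician_read, lower=0.2, upper=0.8):
    if ai_score >= upper:
        return 1                 # confident AI positive
    if ai_score <= lower:
        return 0                 # confident AI negative
    return clinician_read        # uncertain: defer to the clinician

print(decide(0.95, 0), decide(0.5, 1), decide(0.05, 1))  # 1 1 0
```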
Affiliation(s)
- Laura Culp
- Google DeepMind, Toronto, Ontario, Canada
- Jan Witowski
- NYU Grossman School of Medicine, New York, NY, USA
50
Gasmi I, Calinghen A, Parienti JJ, Belloy F, Fohlen A, Pelage JP. Comparison of diagnostic performance of a deep learning algorithm, emergency physicians, junior radiologists and senior radiologists in the detection of appendicular fractures in children. Pediatr Radiol 2023;53:1675-1684. PMID: 36877239; DOI: 10.1007/s00247-023-05621-w.
Abstract
BACKGROUND Advances have been made in the use of artificial intelligence (AI) in diagnostic imaging, particularly in the detection of fractures on conventional radiographs. Few studies have examined fracture detection in the pediatric population, yet anatomical variation and skeletal development with the child's age call for studies specific to this group. Failure to diagnose fractures early in children may have serious consequences for growth. OBJECTIVE To evaluate the performance of an AI algorithm based on deep neural networks in detecting traumatic appendicular fractures in a pediatric population, and to compare the sensitivity, specificity, positive predictive value and negative predictive value of different readers and the AI algorithm. MATERIALS AND METHODS This retrospective study of 878 patients younger than 18 years evaluated conventional radiographs obtained after recent non-life-threatening trauma. All radiographs of the shoulder, arm, elbow, forearm, wrist, hand, leg, knee, ankle and foot were evaluated. The diagnostic performance of a consensus of radiology experts in pediatric imaging (reference standard) was compared with that of pediatric radiologists, emergency physicians, senior residents and junior residents. The predictions made by the AI algorithm were compared with the annotations made by the different physicians. RESULTS The algorithm predicted 174 fractures out of 182, corresponding to a sensitivity of 95.6%, a specificity of 91.64% and a negative predictive value of 98.76%. The AI predictions were close to those of pediatric radiologists (sensitivity 98.35%) and senior residents (95.05%) and above those of emergency physicians (81.87%) and junior residents (90.1%). The algorithm identified 3 (1.6%) fractures not initially seen by pediatric radiologists. CONCLUSION This study suggests that deep learning algorithms can be useful in improving the detection of fractures in children.
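The three reported metrics fit together arithmetically. A minimal sketch; the confusion-matrix counts below are approximate reconstructions from the reported figures, not the study's raw data:

```python
# Sensitivity, specificity, and negative predictive value from a 2x2
# confusion matrix. Counts are back-calculated approximations from the
# reported results (174 of 182 fractures detected; specificity 91.64%;
# NPV 98.76%) -- illustrative only.
tp, fn = 174, 8   # 174 of 182 fractures predicted
tn, fp = 636, 58  # reconstructed from the stated specificity and NPV

sensitivity = tp / (tp + fn)   # ~0.956
specificity = tn / (tn + fp)   # ~0.9164
npv = tn / (tn + fn)           # ~0.9876
```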
Affiliation(s)
- Idriss Gasmi
- Department of Radiology, Caen University Medical Center, 14033 Cedex 9, Caen, France
- Arvin Calinghen
- Department of Radiology, Caen University Medical Center, 14033 Cedex 9, Caen, France
- Jean-Jacques Parienti
- GRAM 2.0 EA2656 UNICAEN Normandie, University Hospital, Caen, France
- Department of Clinical Research, Caen University Hospital, Caen, France
- Frederique Belloy
- Department of Radiology, Caen University Medical Center, 14033 Cedex 9, Caen, France
- Audrey Fohlen
- Department of Radiology, Caen University Medical Center, 14033 Cedex 9, Caen, France
- UNICAEN CEA CNRS ISTCT- CERVOxy, Normandie University, 14000, Caen, France
- Jean-Pierre Pelage
- Department of Radiology, Caen University Medical Center, 14033 Cedex 9, Caen, France
- UNICAEN CEA CNRS ISTCT- CERVOxy, Normandie University, 14000, Caen, France