1
Buchlak QD, Tang CHM, Seah JCY, Johnson A, Holt X, Bottrell GM, Wardman JB, Samarasinghe G, Dos Santos Pinheiro L, Xia H, Ahmad HK, Pham H, Chiang JI, Ektas N, Milne MR, Chiu CHY, Hachey B, Ryan MK, Johnston BP, Esmaili N, Bennett C, Goldschlager T, Hall J, Vo DT, Oakden-Rayner L, Leveque JC, Farrokhi F, Abramson RG, Jones CM, Edelstein S, Brotchie P. Effects of a comprehensive brain computed tomography deep learning model on radiologist detection accuracy. Eur Radiol 2024; 34:810-822. [PMID: 37606663] [PMCID: PMC10853361] [DOI: 10.1007/s00330-023-10074-8]
Abstract
OBJECTIVES: Non-contrast computed tomography of the brain (NCCTB) is commonly used to detect intracranial pathology but is subject to interpretation errors. Machine learning can augment clinical decision-making and improve NCCTB scan interpretation. This retrospective detection accuracy study assessed the performance of radiologists assisted by a deep learning model and compared the standalone performance of the model with that of unassisted radiologists.

METHODS: A deep learning model was trained on 212,484 NCCTB scans drawn from a private radiology group in Australia. Scans from inpatient, outpatient, and emergency settings were included. Scan inclusion criteria were age ≥ 18 years and series slice thickness ≤ 1.5 mm. Thirty-two radiologists reviewed 2848 scans with and without the assistance of the deep learning system and rated their confidence in the presence of each finding using a 7-point scale. Differences in the area under the receiver operating characteristic curve (AUC) and the Matthews correlation coefficient (MCC) were calculated against a ground-truth gold standard.

RESULTS: The model demonstrated an average AUC of 0.93 across 144 NCCTB findings and significantly improved radiologist interpretation performance. Assisted and unassisted radiologists demonstrated an average AUC of 0.79 and 0.73, respectively, across 22 grouped parent findings, and 0.72 and 0.68 across 189 child findings. When assisted by the model, radiologist AUC was significantly improved for 91 findings (158 findings were non-inferior), and reading time was significantly reduced.

CONCLUSIONS: The assistance of a comprehensive deep learning model significantly improved radiologist detection accuracy across a wide range of clinical findings and demonstrated the potential to improve NCCTB interpretation.

CLINICAL RELEVANCE STATEMENT: This study evaluated a comprehensive CT brain deep learning model, which performed strongly, improved radiologist performance, and reduced interpretation time. The model may reduce errors, improve efficiency, facilitate triage, and better enable the delivery of timely patient care.

KEY POINTS:
• A comprehensive deep learning system assisted radiologists in detecting a wide range of abnormalities on non-contrast brain computed tomography scans.
• The deep learning model demonstrated an average AUC of 0.93 across 144 findings and significantly improved radiologist interpretation performance.
• The assistance of the comprehensive deep learning model significantly reduced the time radiologists required to interpret computed tomography scans of the brain.
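The 7-point reader ratings lend themselves directly to the two metrics named in the METHODS. The following is a minimal Python sketch of a per-finding AUC and MCC computation; the data are simulated, and the ≥ 5 threshold used to binarise ratings for MCC is an assumption for illustration, not the study's operating point.

```python
# Minimal sketch: per-finding AUC and MCC from 7-point reader confidence
# ratings against ground truth. Data are simulated; the >= 5 threshold
# used to binarise ratings for MCC is an assumption, not the study's.
import numpy as np
from sklearn.metrics import roc_auc_score, matthews_corrcoef

rng = np.random.default_rng(0)

# Hypothetical ground truth (1 = finding present) and 1-7 confidence
# ratings for a single finding across 2848 scans.
y_true = rng.integers(0, 2, size=2848)
ratings = np.clip(np.round(rng.normal(3 + 2 * y_true, 1.5)), 1, 7)

# AUC uses the rating directly as an ordinal score, which is why a
# graded confidence scale is collected rather than a binary call.
auc = roc_auc_score(y_true, ratings)

# MCC needs a binary decision; here a rating >= 5 is read as "present".
mcc = matthews_corrcoef(y_true, (ratings >= 5).astype(int))

print(f"AUC: {auc:.3f}  MCC: {mcc:.3f}")
```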
Affiliation(s)
- Quinlan D Buchlak
- Annalise.ai, Sydney, NSW, Australia
- School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia
- Department of Neurosurgery, Monash Health, Clayton, VIC, Australia
- Jarrel C Y Seah
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, Alfred Health, Melbourne, VIC, Australia
- Hung Pham
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, University Medical Center, University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
- Jason I Chiang
- Annalise.ai, Sydney, NSW, Australia
- Department of General Practice, University of Melbourne, Melbourne, VIC, Australia
- Westmead Applied Research Centre, University of Sydney, Sydney, NSW, Australia
- Nazanin Esmaili
- School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia
- Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
- Christine Bennett
- School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia
- Tony Goldschlager
- Department of Neurosurgery, Monash Health, Clayton, VIC, Australia
- Department of Surgery, Monash University, Clayton, VIC, Australia
- Jonathan Hall
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, St Vincent's Health Australia, Melbourne, VIC, Australia
- Department of Radiology, Austin Hospital, Melbourne, VIC, Australia
- Duc Tan Vo
- Department of Radiology, University Medical Center, University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA, Australia
- Farrokh Farrokhi
- Center for Neurosciences and Spine, Virginia Mason Franciscan Health, Seattle, WA, USA
- Catherine M Jones
- Annalise.ai, Sydney, NSW, Australia
- I-MED Radiology Network, Brisbane, QLD, Australia
- School of Public and Preventive Health, Monash University, Clayton, VIC, Australia
- Department of Clinical Imaging Science, University of Sydney, Sydney, NSW, Australia
- Simon Edelstein
- Annalise.ai, Sydney, NSW, Australia
- I-MED Radiology Network, Brisbane, QLD, Australia
- Department of Radiology, Monash Health, Clayton, VIC, Australia
- Peter Brotchie
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, St Vincent's Health Australia, Melbourne, VIC, Australia
2
Tang CHM, Seah JCY, Ahmad HK, Milne MR, Wardman JB, Buchlak QD, Esmaili N, Lambert JF, Jones CM. Analysis of Line and Tube Detection Performance of a Chest X-ray Deep Learning Model to Evaluate Hidden Stratification. Diagnostics (Basel) 2023; 13:2317. [PMID: 37510062] [PMCID: PMC10378683] [DOI: 10.3390/diagnostics13142317]
Abstract
This retrospective case-control study evaluated the diagnostic performance of a commercially available chest radiography deep convolutional neural network (DCNN) in identifying the presence and position of central venous catheters, enteric tubes, and endotracheal tubes, with a subgroup analysis across different types of lines and tubes. A held-out test dataset of 2568 studies was sourced from community radiology clinics and hospitals in Australia and the USA, and was ground-truth labelled for the presence, position, and type of line or tube by consensus of a thoracic specialist radiologist and an intensive care clinician. DCNN performance in identifying and assessing the positioning of central venous catheters, enteric tubes, and endotracheal tubes was evaluated over the entire dataset and within each subgroup, using the area under the receiver operating characteristic curve (AUC). The DCNN algorithm displayed high performance in detecting the presence of lines and tubes in the test dataset, with AUCs > 0.99, and good position classification performance over the subpopulation of ground-truth-positive cases, with AUCs of 0.86-0.91. The subgroup analysis showed that model performance was robust across the various subtypes of lines and tubes, although position classification performance for peripherally inserted central catheters was relatively lower. Our findings indicated that the DCNN algorithm performed well in the detection and position classification of lines and tubes, supporting its use as an assistant for clinicians. Further work is required to evaluate performance in rarer scenarios and in less common subgroups.
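The hidden-stratification concern in the title is essentially that a strong overall AUC can mask weak performance within a subgroup. The sketch below illustrates that check on simulated data with hypothetical subtype labels (not the study's dataset or model): compute the overall AUC, then recompute it within each line/tube subtype.

```python
# Minimal sketch of a hidden-stratification check: overall AUC versus
# per-subgroup AUCs. Data, scores, and subtype labels are simulated and
# hypothetical, not the study's dataset or model.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
subtypes = ["CVC", "PICC", "enteric tube", "endotracheal tube"]

n = 2000
y_true = rng.integers(0, 2, size=n)
group = rng.choice(subtypes, size=n)

# Simulated model scores; PICC cases are made noisier to mimic the
# relatively lower performance reported for that subgroup above.
noise = np.where(group == "PICC", 1.2, 0.5)
y_score = y_true + rng.normal(0, noise)

print(f"overall AUC: {roc_auc_score(y_true, y_score):.3f}")
for s in subtypes:
    mask = group == s
    print(f"{s:>18} AUC: {roc_auc_score(y_true[mask], y_score[mask]):.3f}")
```

On the simulated data, the PICC subgroup shows a visibly lower AUC even though the overall figure looks strong, which mirrors the kind of gap a subgroup analysis is designed to expose.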
Affiliation(s)
- Cyril H M Tang
- Annalise.ai, Sydney, NSW 2000, Australia
- Intensive Care Unit, Gosford Hospital, Sydney, NSW 2250, Australia
- Jarrel C Y Seah
- Annalise.ai, Sydney, NSW 2000, Australia
- Department of Radiology, Alfred Health, Melbourne, VIC 3004, Australia
- Quinlan D Buchlak
- Annalise.ai, Sydney, NSW 2000, Australia
- School of Medicine, The University of Notre Dame Australia, Sydney, NSW 2007, Australia
- Department of Neurosurgery, Monash Health, Melbourne, VIC 3168, Australia
- Nazanin Esmaili
- School of Medicine, The University of Notre Dame Australia, Sydney, NSW 2007, Australia
- Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Catherine M Jones
- Annalise.ai, Sydney, NSW 2000, Australia
- I-MED Radiology Network, Brisbane, QLD 4006, Australia
- School of Public and Preventive Health, Monash University, Clayton, VIC 3800, Australia
- Department of Clinical Imaging Science, University of Sydney, Sydney, NSW 2006, Australia
3
Seah JCY, Tang CHM, Buchlak QD, Holt XG, Wardman JB, Aimoldin A, Esmaili N, Ahmad H, Pham H, Lambert JF, Hachey B, Hogg SJF, Johnston BP, Bennett C, Oakden-Rayner L, Brotchie P, Jones CM. Effect of a comprehensive deep-learning model on the accuracy of chest x-ray interpretation by radiologists: a retrospective, multireader multicase study. Lancet Digit Health 2021; 3:e496-e506. [PMID: 34219054] [DOI: 10.1016/s2589-7500(21)00106-0]
Abstract
BACKGROUND: Chest x-rays are widely used in clinical practice; however, interpretation can be hindered by human error and a lack of experienced thoracic radiologists. Deep learning has the potential to improve the accuracy of chest x-ray interpretation. We therefore aimed to assess the accuracy of radiologists with and without the assistance of a deep-learning model.

METHODS: In this retrospective study, a deep-learning model was trained on 821,681 images (284,649 patients) from five datasets from Australia, Europe, and the USA. The test dataset comprised 2568 enriched chest x-ray cases from adult patients (≥16 years) who had at least one frontal chest x-ray; cases were representative of inpatient, outpatient, and emergency settings. Twenty radiologists reviewed cases with and without the assistance of the deep-learning model, with a 3-month washout period. We assessed the change in accuracy of chest x-ray interpretation across 127 clinical findings when the deep-learning model was used as a decision support, by calculating the area under the receiver operating characteristic curve (AUC) for each radiologist with and without the deep-learning model. We also compared AUCs for the model alone with those of unassisted radiologists. If the lower bound of the adjusted 95% CI of the difference in AUC between the model and the unassisted radiologists was greater than -0.05, the model was considered non-inferior for that finding; if the lower bound exceeded 0, the model was considered superior.

FINDINGS: Unassisted radiologists had a macroaveraged AUC of 0.713 (95% CI 0.645-0.785) across the 127 clinical findings, compared with 0.808 (0.763-0.839) when assisted by the model. The deep-learning model significantly improved the classification accuracy of radiologists for 102 (80%) of 127 clinical findings, was statistically non-inferior for 19 (15%) findings, and no findings showed a decrease in accuracy when radiologists used the model. The model alone achieved a macroaveraged AUC of 0.957 (0.954-0.959) across all findings. Model classification alone was significantly more accurate than unassisted radiologists for 117 (94%) of the 124 clinical findings predicted by the model and was non-inferior to unassisted radiologists for all other clinical findings.

INTERPRETATION: This study shows the potential of a comprehensive deep-learning model to improve chest x-ray interpretation across a large breadth of clinical practice.

FUNDING: Annalise.ai.
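The non-inferiority rule in the METHODS reduces to a margin check on a confidence interval. The sketch below bootstraps a 95% CI for the per-finding AUC difference (model minus unassisted reader) on simulated data and applies the -0.05 margin; this is illustrative only, since the paper derived adjusted CIs from a multireader multicase analysis rather than a simple case bootstrap.

```python
# Minimal sketch of the margin check described above: bootstrap a 95% CI
# for the AUC difference (model minus unassisted reader) on simulated
# data, then apply the -0.05 non-inferiority margin. Illustrative only:
# the paper used adjusted CIs from a multireader multicase analysis.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 2568  # matches the study's test set size, for flavour only

y_true = rng.integers(0, 2, size=n)            # hypothetical ground truth
model_score = y_true + rng.normal(0, 0.5, n)   # hypothetical model output
reader_score = y_true + rng.normal(0, 0.9, n)  # hypothetical reader rating

diffs = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)           # resample cases
    if len(np.unique(y_true[idx])) < 2:
        continue                               # AUC needs both classes
    diffs.append(roc_auc_score(y_true[idx], model_score[idx])
                 - roc_auc_score(y_true[idx], reader_score[idx]))

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"AUC difference 95% CI: ({lo:.3f}, {hi:.3f})")
if lo > 0:
    print("superior (CI lower bound > 0)")
elif lo > -0.05:
    print("non-inferior (CI lower bound > -0.05)")
else:
    print("neither superior nor non-inferior")
```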
Affiliation(s)
- Jarrel C Y Seah
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, Alfred Health, Melbourne, VIC, Australia
- Nazanin Esmaili
- School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia
- Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
- Christine Bennett
- School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia
- Luke Oakden-Rayner
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA, Australia
- Peter Brotchie
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, St Vincent's Health Australia, Melbourne, VIC, Australia