1. Yahyatabar M, Jouvet P, Cheriet F. Joint classification and segmentation for an interpretable diagnosis of acute respiratory distress syndrome from chest x-rays. J Med Imaging (Bellingham) 2023;10:054504. PMID: 37854097. PMCID: PMC10581023. DOI: 10.1117/1.jmi.10.5.054504.
Abstract
Purpose Acute respiratory distress syndrome (ARDS) is a life-threatening condition that can cause a dramatic drop in blood oxygen levels due to widespread lung inflammation. Chest radiography is widely used as a primary modality to detect ARDS because x-ray images can be obtained promptly and play a crucial role in diagnosing the syndrome. However, despite the extensive literature on chest x-ray (CXR) image analysis, research on ARDS diagnosis remains limited due to the scarcity of ARDS-labeled datasets. Additionally, many machine learning approaches achieve high performance in pulmonary disease diagnosis, but their decisions are often not easily interpretable, which can hinder clinical acceptance. This work aims to develop a clinically interpretable method for detecting signs of ARDS in CXR images. Approach To achieve this goal, an ARDS-labeled dataset of chest radiography images was gathered and annotated for training and evaluation of the proposed approach. The proposed deep classification-segmentation model, Dense-Ynet, provides an interpretable framework for automatically diagnosing ARDS in CXR images and takes advantage of lung segmentation in doing so. By definition, ARDS causes bilateral diffuse infiltrates throughout the lungs. To account for the local involvement of lung areas, each lung is divided into upper and lower halves, and the model classifies the resulting lung quadrants. Results The quadrant-based classification strategy yields an area under the receiver operating characteristic curve of 95.1% (95% CI 93.5 to 96.1), and the per-quadrant outputs provide a reference for the model's predictions. In terms of segmentation, the model accurately identifies lung regions in CXR images even when lung boundaries are unclear in abnormal images. Conclusions This study provides an interpretable decision system for diagnosing ARDS by following the definition clinicians use to diagnose ARDS from CXR images.
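The abstract describes quadrant-based classification but not its implementation; a minimal pure-Python sketch of the idea (function names, bounding-box representation, and the toy area threshold are all hypothetical, not taken from Dense-Ynet):

```python
def lung_quadrants(box):
    """Split one lung bounding box (x0, y0, x1, y1) into upper and lower halves."""
    x0, y0, x1, y1 = box
    y_mid = (y0 + y1) // 2
    return [(x0, y0, x1, y_mid), (x0, y_mid, x1, y1)]

def quadrant_predictions(lung_boxes, classify):
    """Classify all four quadrants of a left/right lung pair; per-quadrant
    outputs make the bilateral-involvement criterion for ARDS traceable."""
    return [classify(q) for box in lung_boxes for q in lung_quadrants(box)]

# Toy stand-in classifier: marks a quadrant "involved" if its area exceeds 50 px.
preds = quadrant_predictions(
    [(0, 0, 10, 20), (12, 0, 22, 20)],
    classify=lambda q: (q[2] - q[0]) * (q[3] - q[1]) > 50,
)
```

In the real model the per-quadrant classifier is a trained network; here the lambda only illustrates how four quadrant-level decisions aggregate into one interpretable prediction.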
Affiliation(s)
- Mohammad Yahyatabar
- Polytechnique Montréal, Department of Computer and Software Engineering, Montreal, Quebec, Canada
- Philippe Jouvet
- University of Montréal, Department of Pediatrics, Faculty of Medicine, Montréal, Quebec, Canada
- Farida Cheriet
- Polytechnique Montréal, Department of Computer and Software Engineering, Montreal, Quebec, Canada
2. Nikulin P, Zschaeck S, Maus J, Cegla P, Lombardo E, Furth C, Kaźmierska J, Rogasch JMM, Holzgreve A, Albert NL, Ferentinos K, Strouthos I, Hajiyianni M, Marschner SN, Belka C, Landry G, Cholewinski W, Kotzerke J, Hofheinz F, van den Hoff J. A convolutional neural network with self-attention for fully automated metabolic tumor volume delineation of head and neck cancer in [18F]FDG PET/CT. Eur J Nucl Med Mol Imaging 2023;50:2751-2766. PMID: 37079128. PMCID: PMC10317885. DOI: 10.1007/s00259-023-06197-1.
Abstract
PURPOSE PET-derived metabolic tumor volume (MTV) and total lesion glycolysis of the primary tumor are known to be prognostic of clinical outcome in head and neck cancer (HNC). Including evaluation of lymph node metastases can further increase the prognostic value of PET, but accurate manual delineation and classification of all lesions is time-consuming and prone to interobserver variability. Our goal, therefore, was the development and evaluation of an automated tool for MTV delineation/classification of primary tumor and lymph node metastases in PET/CT investigations of HNC patients. METHODS Automated lesion delineation was performed with a residual 3D U-Net convolutional neural network (CNN) incorporating a multi-head self-attention block. 698 [18F]FDG PET/CT scans from 3 different sites and 5 public databases were used for network training and testing. An external dataset of 181 [18F]FDG PET/CT scans from 2 additional sites was employed to assess the generalizability of the network. In these data, primary tumor and lymph node (LN) metastases were interactively delineated and labeled by two experienced physicians. Performance of the trained network models was assessed by 5-fold cross-validation in the main dataset and by pooling results from the 5 developed models in the external dataset. The Dice similarity coefficient (DSC) for individual delineation tasks and the primary tumor/metastasis classification accuracy were used as evaluation metrics. Additionally, a survival analysis using univariate Cox regression was performed, comparing the group separation achieved with manual and automated delineation, respectively. RESULTS In the cross-validation experiment, delineation of all malignant lesions with the trained U-Net models achieves DSCs of 0.885, 0.805, and 0.870 for primary tumor, LN metastases, and the union of both, respectively. In external testing, the DSC reaches 0.850, 0.724, and 0.823 for primary tumor, LN metastases, and the union of both, respectively. The voxel classification accuracy was 98.0% and 97.9% in cross-validation and external data, respectively. Univariate Cox analysis in cross-validation and external testing reveals that manually and automatically derived total MTVs are both highly prognostic with respect to overall survival, yielding essentially identical hazard ratios in both settings. CONCLUSION To the best of our knowledge, this work presents the first CNN model for successful MTV delineation and lesion classification in HNC. In the vast majority of patients, the network performs satisfactory delineation and classification of primary tumor and lymph node metastases and only rarely requires more than minimal manual correction. It can thus massively facilitate study data evaluation in large patient groups and also has clear potential for supervised clinical application.
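The Dice similarity coefficient used as the delineation metric above has a simple closed form; a minimal sketch over flat binary masks (the convention of returning 1.0 when both masks are empty is an assumption, not stated in the paper):

```python
def dice(pred, truth):
    """Dice similarity coefficient between two binary masks (flat 0/1 lists):
    twice the overlap divided by the total foreground of both masks."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0  # both empty: perfect agreement

score = dice([1, 1, 1, 0], [1, 1, 0, 0])  # 2*2 / (3+2) = 0.8
```

A DSC of 0.885 for the primary tumor thus means the automated and manual contours overlap in roughly 89% of their combined foreground voxels.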
Affiliation(s)
- Pavel Nikulin
- Helmholtz-Zentrum Dresden-Rossendorf, PET Center, Institute of Radiopharmaceutical Cancer Research, Bautzner Landstrasse 400, 01328, Dresden, Germany.
- Sebastian Zschaeck
- Department of Radiation Oncology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Jens Maus
- Helmholtz-Zentrum Dresden-Rossendorf, PET Center, Institute of Radiopharmaceutical Cancer Research, Bautzner Landstrasse 400, 01328, Dresden, Germany
- Paulina Cegla
- Department of Nuclear Medicine, Greater Poland Cancer Centre, Poznan, Poland
- Elia Lombardo
- Department of Radiation Oncology, University Hospital, Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- Christian Furth
- Department of Nuclear Medicine, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Joanna Kaźmierska
- Electroradiology Department, University of Medical Sciences, Poznan, Poland
- Radiotherapy Department II, Greater Poland Cancer Centre, Poznan, Poland
- Julian M M Rogasch
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Department of Nuclear Medicine, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Adrien Holzgreve
- Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- Nathalie L Albert
- Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- Konstantinos Ferentinos
- Department of Radiation Oncology, German Oncology Center, European University Cyprus, Limassol, Cyprus
- Iosif Strouthos
- Department of Radiation Oncology, German Oncology Center, European University Cyprus, Limassol, Cyprus
- Marina Hajiyianni
- Department of Radiation Oncology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sebastian N Marschner
- Department of Radiation Oncology, University Hospital, Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- Claus Belka
- Department of Radiation Oncology, University Hospital, Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Guillaume Landry
- Department of Radiation Oncology, University Hospital, Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- Witold Cholewinski
- Department of Nuclear Medicine, Greater Poland Cancer Centre, Poznan, Poland
- Electroradiology Department, University of Medical Sciences, Poznan, Poland
- Jörg Kotzerke
- Department of Nuclear Medicine, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Frank Hofheinz
- Helmholtz-Zentrum Dresden-Rossendorf, PET Center, Institute of Radiopharmaceutical Cancer Research, Bautzner Landstrasse 400, 01328, Dresden, Germany
- Jörg van den Hoff
- Helmholtz-Zentrum Dresden-Rossendorf, PET Center, Institute of Radiopharmaceutical Cancer Research, Bautzner Landstrasse 400, 01328, Dresden, Germany
- Department of Nuclear Medicine, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
3. Arias-Garzón D, Tabares-Soto R, Bernal-Salcedo J, Ruz GA. Biases associated with database structure for COVID-19 detection in X-ray images. Sci Rep 2023;13:3477. PMID: 36859430. PMCID: PMC9975856. DOI: 10.1038/s41598-023-30174-1.
Abstract
Several artificial intelligence algorithms have been developed for COVID-19-related topics. A common application is COVID-19 diagnosis from chest X-rays, where the eagerness to obtain early results triggered the construction of a series of datasets in which bias management has not been thorough with respect to patient information, capture conditions, class imbalance, and careless mixtures of multiple datasets. This paper analyses 19 datasets of COVID-19 chest X-ray images, identifying potential biases. Moreover, computational experiments were conducted using one of the most popular datasets in this domain, which achieves 96.19% classification accuracy on the complete dataset. Nevertheless, when evaluated with the ethical tool Aequitas, it fails on all the metrics. Ethical tools, enhanced with some distribution and image-quality considerations, are the key to developing or choosing a dataset with fewer bias issues. We aim to provide broad research on dataset problems, tools, and suggestions for future dataset developments and COVID-19 applications using chest X-ray images.
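Aequitas-style audits compare error rates across subgroups such as source hospital or capture device; the paper's exact metrics are not reproduced here, but a minimal sketch of one such metric, the per-group false-positive rate (the group labels and toy data are hypothetical), looks like:

```python
def false_positive_rate_by_group(labels, preds, groups):
    """False-positive rate per subgroup. Large disparities between groups
    (e.g. source datasets) are the kind of bias a fairness audit flags."""
    rates = {}
    for g in set(groups):
        # Predictions on the true-negative cases belonging to group g.
        negatives = [p for l, p, gg in zip(labels, preds, groups) if gg == g and l == 0]
        rates[g] = sum(negatives) / len(negatives) if negatives else 0.0
    return rates

# Toy audit: healthy images from source "A" are flagged far more often.
rates = false_positive_rate_by_group(
    labels=[0, 0, 1, 0, 0, 1],
    preds=[1, 1, 1, 0, 0, 1],
    groups=["A", "A", "A", "B", "B", "B"],
)
```

A model can score high overall accuracy while showing exactly this kind of disparity, which is why aggregate accuracy alone is an insufficient acceptance criterion.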
Affiliation(s)
- Daniel Arias-Garzón
- Departamento de Electrónica y Automatización, Universidad Autónoma de Manizales, Manizales, 170001, Colombia
- Reinel Tabares-Soto
- Departamento de Electrónica y Automatización, Universidad Autónoma de Manizales, Manizales, 170001, Colombia
- Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, 7941169, Santiago, Chile
- Departamento de Sistemas e Informática, Universidad de Caldas, Manizales, 170001, Colombia
- Joshua Bernal-Salcedo
- Departamento de Electrónica y Automatización, Universidad Autónoma de Manizales, Manizales, 170001, Colombia
- Gonzalo A. Ruz
- Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, 7941169, Santiago, Chile
- Center of Applied Ecology and Sustainability (CAPES), 8331150, Santiago, Chile
- Data Observatory Foundation, 7941169, Santiago, Chile
4. A Web-Based Platform for the Automatic Stratification of ARDS Severity. Diagnostics (Basel) 2023;13:933. PMID: 36900077. PMCID: PMC10000955. DOI: 10.3390/diagnostics13050933.
Abstract
Acute respiratory distress syndrome (ARDS), including severe pulmonary COVID infection, is associated with a high mortality rate. It is crucial to detect ARDS early, as a late diagnosis may lead to serious complications in treatment. One of the challenges in ARDS diagnosis is chest X-ray (CXR) interpretation: ARDS causes diffuse infiltrates throughout the lungs that must be identified using chest radiography. In this paper, we present a web-based platform leveraging artificial intelligence (AI) to automatically assess pediatric ARDS (PARDS) from CXR images. Our system computes a severity score to identify and grade ARDS in CXR images. Moreover, the platform provides an image highlighting the lung fields, which can be utilized by prospective AI-based systems. A deep learning (DL) approach is employed to analyze the input data. A novel DL model, named Dense-Ynet, is trained using a CXR dataset in which clinical specialists previously labelled the two halves (upper and lower) of each lung. The assessment results show that our platform achieves a recall of 95.25% and a precision of 88.02%. The web platform, named PARDS-CxR, assigns severity scores to input CXR images that are compatible with current definitions of ARDS and PARDS. Once it has undergone external validation, PARDS-CxR will serve as an essential component in a clinical AI framework for diagnosing ARDS.
5. Goldgof GM, Sun S, Van Cleave J, Wang L, Lucas F, Brown L, Spector JD, Boiocchi L, Baik J, Zhu M, Ardon O, Lu CM, Dogan A, Goldgof DB, Carmichael I, Prakash S, Butte AJ. DeepHeme: a generalizable, bone marrow classifier with hematopathologist-level performance. bioRxiv [Preprint] 2023:2023.02.20.528987. PMID: 36865216. PMCID: PMC9979993. DOI: 10.1101/2023.02.20.528987.
Abstract
Morphology-based classification of cells in the bone marrow aspirate (BMA) is a key step in the diagnosis and management of hematologic malignancies. However, it is time-intensive and must be performed by expert hematopathologists and laboratory professionals. We curated a large, high-quality dataset of 41,595 hematopathologist consensus-annotated single-cell images extracted from BMA whole slide images (WSIs) containing 23 morphologic classes from the clinical archives of the University of California, San Francisco. We trained a convolutional neural network, DeepHeme, to classify images in this dataset, achieving a mean area under the curve (AUC) of 0.99. DeepHeme was then externally validated on WSIs from Memorial Sloan Kettering Cancer Center, with a similar AUC of 0.98, demonstrating robust generalization. When compared to individual hematopathologists from three different top academic medical centers, the algorithm outperformed all three. Finally, DeepHeme reliably identified cell states such as mitosis, paving the way for image-based quantification of mitotic index in a cell-specific manner, which may have important clinical applications.
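The AUC figures reported above can be computed without plotting an ROC curve at all, via the Mann-Whitney rank identity; a minimal sketch (not the paper's evaluation code, and assuming binary labels for one cell class at a time):

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney identity: the probability
    that a randomly chosen positive outscores a randomly chosen negative,
    with ties counted as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])  # 1.0: positives always outscore
```

For a 23-class problem like DeepHeme's, the reported mean AUC would be the average of such one-vs-rest AUCs across classes.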
Affiliation(s)
- Gregory M. Goldgof
- Department of Laboratory Medicine, University of California, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Shenghuan Sun
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Jacob Van Cleave
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Linlin Wang
- Department of Laboratory Medicine, University of California, San Francisco, CA, USA
- Fabienne Lucas
- Department of Pathology, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, USA
- Laura Brown
- Department of Laboratory Medicine, University of California, San Francisco, CA, USA
- Jacob D. Spector
- Department of Laboratory Medicine, Boston Children’s Hospital/Harvard Medical School, Boston, MA, USA
- Leonardo Boiocchi
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Jeeyeon Baik
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Menglei Zhu
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Orly Ardon
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Chuanyi M. Lu
- Department of Laboratory Medicine, University of California, San Francisco, CA, USA
- Department of Laboratory Medicine, Veterans Affairs Medical Center, San Francisco, CA, USA
- Ahmet Dogan
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dmitry B. Goldgof
- Department of Computer Science, University of South Florida, Tampa, FL, USA
- Iain Carmichael
- Department of Statistics, University of California, Berkeley, CA, USA
- Sonam Prakash
- Department of Laboratory Medicine, University of California, San Francisco, CA, USA
- Atul J. Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
6. New patch-based strategy for COVID-19 automatic identification using chest x-ray images. Health and Technology 2022;12:1117-1132. PMCID: PMC9647770. DOI: 10.1007/s12553-022-00704-4.
Abstract
Purpose The development of a robust model for automatic identification of COVID-19 from chest x-rays has been a widely addressed topic over the last couple of years; however, the scarcity of good-quality image sets, and their limited size, have proven to be an important obstacle to obtaining reliable models. In fact, models proposed so far have suffered from overfitting to spurious features instead of learning lung features, a phenomenon known as shortcut learning. In this research, a new image classification methodology is proposed that attempts to mitigate this problem. Methods To this end, a set of images was annotated by expert radiologists. The lung region was then segmented, and a new classification strategy is proposed, based on a patch partitioning that improves the effective resolution available to the convolutional neural network. In addition, a set of native images, used as an external evaluation set, is released. Results The best results were obtained for the 6-patch splitting variant, with 0.887 accuracy, 0.85 recall, and 0.848 F1-score on the external validation set. Conclusion The results show that the proposed strategy maintains similar values between internal and external validation, which indicates generalization power and makes the model suitable for use in hospital settings. Supplementary Information The online version contains supplementary material available at 10.1007/s12553-022-00704-4.
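The patch-partitioning step amounts to tiling the segmented lung image into a fixed grid; a minimal sketch (the 3x2 grid layout for the "6-patch" variant is an assumption, as the abstract does not specify the grid shape):

```python
def split_patches(img, rows, cols):
    """Split a 2D image (list of pixel rows) into a rows*cols grid of
    equally sized patches, each classified independently downstream."""
    h, w = len(img), len(img[0])
    ph, pw = h // rows, w // cols  # patch height and width
    return [[row[c * pw:(c + 1) * pw] for row in img[r * ph:(r + 1) * ph]]
            for r in range(rows) for c in range(cols)]

# A 6x4 toy "image" split into a 6-patch (assumed 3x2) grid.
img = [[r * 4 + c for c in range(4)] for r in range(6)]
patches = split_patches(img, rows=3, cols=2)
```

Feeding each small patch to the network at full input resolution is what gives the CNN more effective pixels per lung region than classifying the whole downscaled radiograph.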
7. Anilkumar B, Srividya K, Mary Sowjanya A. COVID-19 classification using sigmoid based hyper-parameter modified DNN for CT scans and chest X-rays. Multimedia Tools and Applications 2022;82:12513-12536. PMID: 36157352. PMCID: PMC9485800. DOI: 10.1007/s11042-022-13783-2.
Abstract
Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Its diagnosis from computed tomography (CT) and chest X-ray (CXR) images is hampered by overfitting, the need for early diagnosis, and mode collapse. In this work, we classify COVID-19 in CT and CXR images. First, the images in the dataset are pre-processed with an adaptive Gaussian filter to denoise them. The pre-processed images are then passed to a sigmoid-based hyper-parameter-modified DNN (SHMDNN), whose hyper-parameter modification uses the adaptive grey wolf optimization (AGWO) algorithm. Finally, the CT and CXR images are classified into 3 categories: normal, pneumonia, and COVID-19. An accuracy of 99.9% is reached, outperforming other DNN architectures.
Affiliation(s)
- B Anilkumar
- Department of ECE, GMR Institute of Technology, Rajam, India
- K Srividya
- Department of CSE, GMR Institute of Technology, Rajam, India
- A Mary Sowjanya
- Department of CS&SE, Andhra University College of Engineering, Visakhapatnam, India
8. Generalizability assessment of COVID-19 3D CT data for deep learning-based disease detection. Comput Biol Med 2022;145:105464. PMID: 35390746. PMCID: PMC8971071. DOI: 10.1016/j.compbiomed.2022.105464.
Abstract
BACKGROUND Artificial intelligence technologies for classification/detection of COVID-19 positive cases suffer from poor generalizability. Moreover, accessing and preparing another large dataset is time-consuming and not always feasible. Several studies have combined smaller COVID-19 CT datasets into "supersets" to maximize the number of training samples. This study aims to assess generalizability by splitting datasets into different portions based on 3D CT images using deep learning. METHOD Two large datasets, comprising 1110 3D CT images, were split into five segments of 20% each. The first 20% segment of each dataset was separated as a holdout test set. 3D-CNN training was performed with the remaining 80% from each dataset. Two small external datasets were also used to independently evaluate the trained models. RESULTS The combination of 80% of each dataset achieved an accuracy of 91% on the Iranmehr and 83% on the Moscow holdout test datasets. Results indicated that 80% of the primary datasets is adequate for fully training a model. Additional fine-tuning using 40% of a secondary dataset helps the model generalize to a third, unseen dataset. The highest accuracy achieved through transfer learning was 85% on the LDCT dataset and 83% on the Iranmehr holdout test set when retrained on 80% of the Iranmehr dataset. CONCLUSION While the combination of both datasets produced the best results, different combinations and transfer learning still produced generalizable results. Adopting the proposed methodology may help to obtain satisfactory results when external datasets are limited.
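The five-segment split with a fixed holdout fold described above can be sketched in a few lines (the function name and the choice of fold 0 as holdout are illustrative; the study's actual partitioning code is not given in this record):

```python
def fold_split(items, n_folds=5, holdout_fold=0):
    """Partition a dataset into n_folds equal segments and reserve one
    segment as the holdout test set; the rest form the training pool."""
    k = len(items) // n_folds
    folds = [items[i * k:(i + 1) * k] for i in range(n_folds)]
    holdout = folds[holdout_fold]
    train = [x for i, f in enumerate(folds) if i != holdout_fold for x in f]
    return train, holdout

# Ten toy scan IDs: fold 0 (20%) becomes the holdout, the other 80% trains.
train, holdout = fold_split(list(range(10)))
```

Keeping the holdout segment fixed across all experiments is what lets the study compare different 20%/40%/80% training combinations on identical test data.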
9. Dhont J, Wolfs C, Verhaegen F. Automatic coronavirus disease 2019 diagnosis based on chest radiography and deep learning - Success story or dataset bias? Med Phys 2022;49:978-987. PMID: 34951033. PMCID: PMC9015341. DOI: 10.1002/mp.15419.
Abstract
PURPOSE Over the last 2 years, the artificial intelligence (AI) community has presented several automatic screening tools for coronavirus disease 2019 (COVID-19) based on chest radiography (CXR), with reported accuracies often well over 90%. However, it has been noted that many of these studies have likely suffered from dataset bias, leading to overly optimistic results. The purpose of this study was to thoroughly investigate to what extent biases have influenced the performance of a range of previously proposed and promising convolutional neural networks (CNNs), and to determine what performance can be expected with current CNNs on a realistic and unbiased dataset. METHODS Five CNNs for COVID-19 positive/negative classification were implemented for evaluation, namely VGG19, ResNet50, InceptionV3, DenseNet201, and COVID-Net. To perform both internal and cross-dataset evaluations, four datasets were created. The first dataset, drawn from the Valencian Region Medical Image Bank (BIMCV), followed strict reverse transcriptase-polymerase chain reaction (RT-PCR) test criteria and was created from a single reliable open-access databank, while the second dataset (COVIDxB8) was created through a combination of six online CXR repositories. The third and fourth datasets were created by combining the opposing classes from the BIMCV and COVIDxB8 datasets. To decrease inter-dataset variability, a pre-processing workflow of resizing, normalization, and histogram equalization was applied to all datasets. Classification performance was evaluated on unseen test sets using precision and recall. A qualitative sanity check was performed by evaluating saliency maps displaying the top 5%, 10%, and 20% most salient segments in the input CXRs, to evaluate whether the CNNs were using relevant information for decision making. In an additional experiment, and to further investigate the origin of potential dataset bias, all pixel values outside the lungs were set to zero through automatic lung segmentation before training and testing. RESULTS When trained and evaluated on the single-source dataset (BIMCV), the performance of all CNNs is relatively low (precision: 0.65-0.72, recall: 0.59-0.71) but remains relatively consistent during external evaluation (precision: 0.58-0.82, recall: 0.57-0.72). On the contrary, when trained and internally evaluated on the combinatory datasets, all CNNs performed well across all metrics (precision: 0.94-1.00, recall: 0.77-1.00). However, when subsequently evaluated cross-dataset, results dropped substantially (precision: 0.10-0.61, recall: 0.04-0.80). For all datasets, saliency maps revealed that the CNNs rarely focus on areas inside the lungs for their decision-making. Moreover, even when setting all pixel values outside the lungs to zero, classification performance does not change and dataset bias remains. CONCLUSIONS Results in this study confirm that when trained on a combinatory dataset, CNNs tend to learn the origin of the CXRs rather than the presence or absence of disease, a behavior known as shortcut learning. The bias is shown to originate from differences in overall pixel values rather than embedded text or symbols, despite consistent image pre-processing. When trained on a reliable and realistic single-source dataset in which non-lung pixels have been masked, CNNs currently show limited sensitivity (<70%) for COVID-19 infection in CXR, questioning their use as a reliable automatic screening tool.
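The lung-masking probe described in the methods is conceptually a single element-wise operation; a minimal sketch over toy nested lists (the study worked on full radiographs with automatic segmentation, which is not reproduced here):

```python
def mask_outside_lungs(img, lung_mask):
    """Zero every pixel outside a binary lung mask so a classifier cannot
    exploit background cues such as embedded text, markers, or exposure
    differences between source datasets."""
    return [[px if keep else 0 for px, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(img, lung_mask)]

masked = mask_outside_lungs(
    img=[[5, 7], [9, 3]],
    lung_mask=[[1, 0], [0, 1]],
)
```

If performance is unchanged after this operation, as the study found, the shortcut signal must live in the overall pixel statistics rather than in any specific background region.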
Affiliation(s)
- Jennifer Dhont
- Department of Radiation Oncology (Maastro), GROW School for Oncology, Maastricht University Medical Centre+, Maastricht, the Netherlands
- Cecile Wolfs
- Department of Radiation Oncology (Maastro), GROW School for Oncology, Maastricht University Medical Centre+, Maastricht, the Netherlands
- Frank Verhaegen
- Department of Radiation Oncology (Maastro), GROW School for Oncology, Maastricht University Medical Centre+, Maastricht, the Netherlands
10. Karki M, Kantipudi K, Yang F, Yu H, Wang YXJ, Yaniv Z, Jaeger S. Generalization Challenges in Drug-Resistant Tuberculosis Detection from Chest X-rays. Diagnostics (Basel) 2022;12:188. PMID: 35054355. PMCID: PMC8775073. DOI: 10.3390/diagnostics12010188.
Abstract
Classification of drug-resistant tuberculosis (DR-TB) and drug-sensitive tuberculosis (DS-TB) from chest radiographs remains an open problem. Our previous cross-validation work on publicly available chest X-ray (CXR) data, combined with image augmentation and the addition of synthetically generated and publicly available images, achieved a performance of 85% AUC with a deep convolutional neural network (CNN). However, when we evaluated the CNN model trained to classify DR-TB and DS-TB on unseen data, significant performance degradation was observed (65% AUC). Hence, in this paper, we investigate the generalizability of our models on images from a held-out country's dataset. We explore the extent of the problem and the possible reasons behind the lack of good generalization. A comparison of radiologist-annotated lesion locations in the lung with the trained model's localization of areas of interest, using GradCAM, did not show much overlap. Using the same network architecture, a multi-country classifier was able to identify the country of origin of an X-ray with high accuracy (86%), suggesting that image acquisition differences and the distribution of non-pathological and non-anatomical aspects of the images are affecting the generalization and localization of the drug-resistance classification model as well. Even when CXR images were severely corrupted, the performance on the validation set was still better than 60% AUC; the model overfitted to the data from countries in the cross-validation set but did not generalize to the held-out country. Finally, we applied a multi-task approach that uses prior TB lesion location information to guide the classifier network's attention, improving generalization performance on the held-out country's set to 68% AUC.
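The overlap comparison between GradCAM localizations and radiologist annotations is typically quantified as intersection-over-union of binarized masks; a minimal sketch (the paper does not state its exact overlap metric, so IoU here is an assumption):

```python
def iou(a, b):
    """Intersection-over-union of two flat binary masks, e.g. a thresholded
    GradCAM saliency map versus a radiologist's lesion annotation."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 1.0  # both masks empty: trivially agree

overlap = iou([1, 1, 0, 0], [0, 1, 1, 0])  # 1 shared pixel of 3 covered overall
```

Low values of this kind of overlap, as reported above, indicate the model's attention sits largely outside the annotated lesions, consistent with the country-of-origin shortcut the authors identify.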
Affiliation(s)
- Manohar Karki
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD 20894, USA
- Karthik Kantipudi
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20894, USA
- Feng Yang
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD 20894, USA
- Hang Yu
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD 20894, USA
- Yi Xiang J. Wang
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD 20894, USA
- Department of Imaging and Interventional Radiology, Faculty of Medicine, The Chinese University of Hong Kong, Prince of Wales Hospital, New Territories, Hong Kong
- Ziv Yaniv
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20894, USA
- Stefan Jaeger
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD 20894, USA
11
Riahi A, Elharrouss O, Al-Maadeed S. BEMD-3DCNN-based method for COVID-19 detection. Comput Biol Med 2021; 142:105188. [PMID: 34998222 PMCID: PMC8717690 DOI: 10.1016/j.compbiomed.2021.105188] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2021] [Revised: 12/27/2021] [Accepted: 12/27/2021] [Indexed: 12/23/2022]
Abstract
The coronavirus outbreak continues to spread around the world, and no one knows when it will stop. Therefore, from the first identification of the virus in Wuhan, China, scientists have launched numerous research projects to understand the nature of the virus, how to detect it, and how to find the most effective medicine to help and protect patients. Importantly, a rapid diagnostic and detection system is a priority and should be developed to stop COVID-19 from spreading, and medical imaging techniques have been used for this purpose. Current research focuses on exploiting different backbones such as VGG, ResNet, and DenseNet, or combinations of them, to detect COVID-19. These backbones alone, however, cannot capture aspects such as the spatial and contextual information in the images, even though this information can support more robust detection performance. In this paper, we used a 3D representation of the data as input for the proposed 3DCNN-based deep learning model. The process uses the Bi-dimensional Empirical Mode Decomposition (BEMD) technique to decompose the original image into intrinsic mode functions (IMFs) and then builds a video from these IMF images. The resulting video is used as input for the 3DCNN model to classify and detect COVID-19. The 3DCNN model consists of a 3D VGG-16 backbone followed by a context-aware attention (CAA) module and then fully connected layers for classification. Each CAA module takes the feature maps of different blocks of the backbone, which allows learning from different feature maps. In our experiments, we used 6484 X-ray images, of which 1802 were COVID-19 positive cases, 1910 normal cases, and 2772 pneumonia cases. The experimental results showed that the proposed technique achieved the desired results on the selected dataset, and that combining the 3DCNN model with contextual information processing through CAA modules yielded better performance.
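The data flow described here, decomposing an image into IMF "frames" and stacking them into a video-like tensor for a 3D CNN, can be sketched as below. True BEMD requires 2D envelope sifting; the successive-smoothing decomposition used here is only a stand-in to show the pipeline shape, and all names are illustrative:

```python
import numpy as np

def pseudo_imfs(img, n_imfs=3):
    """Stand-in for BEMD: peel off detail layers as (residue - smoothed),
    keeping the final residue as the last frame. The frames sum back to
    the original image, as IMFs plus residue do in true EMD."""
    frames, residue = [], img.astype(float)
    for _ in range(n_imfs):
        smooth = sum(np.roll(residue, s, axis=a)
                     for a in (0, 1) for s in (-1, 1)) / 4.0
        smooth = (smooth + residue) / 2.0       # crude local averaging
        frames.append(residue - smooth)         # detail "IMF" frame
        residue = smooth
    frames.append(residue)                      # final residue frame
    return np.stack(frames)                     # (n_imfs + 1, H, W)

def to_3dcnn_input(img, n_imfs=3):
    """Arrange IMF frames as a single-channel video: (1, D, H, W)."""
    return pseudo_imfs(img, n_imfs)[None]

img = np.random.rand(8, 8)
video = to_3dcnn_input(img)
print(video.shape)  # (1, 4, 8, 8)
```

The depth dimension D plays the role of time in an ordinary video, which is what lets a 3D convolution mix information across IMF scales.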
Affiliation(s)
- Ali Riahi
- Department of Computer Science and Engineering, Qatar University, Doha, Qatar.
- Omar Elharrouss
- Department of Computer Science and Engineering, Qatar University, Doha, Qatar.
- Somaya Al-Maadeed
- Department of Computer Science and Engineering, Qatar University, Doha, Qatar.
12
Garcia Santa Cruz B, Bossa MN, Sölter J, Husch AD. Public Covid-19 X-ray datasets and their impact on model bias - A systematic review of a significant problem. Med Image Anal 2021. [PMID: 34597937 DOI: 10.1101/2021.02.15.21251775] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]
Abstract
Computer-aided diagnosis and stratification of COVID-19 based on chest X-ray suffer from weak bias assessment and limited quality control. Undetected bias, induced by inappropriate use of datasets and improper consideration of confounders, prevents the translation of prediction models into clinical practice. By adapting established tools for model evaluation to the task of evaluating datasets, this study provides a systematic appraisal of publicly available COVID-19 chest X-ray datasets, determining their potential use and evaluating potential sources of bias. Only 9 of the more than one hundred identified datasets met the criteria for proper assessment of risk of bias and could be analysed in detail. Remarkably, most of the datasets utilised in the 201 papers published in peer-reviewed journals are not among these 9 datasets, leading to models with a high risk of bias. This raises concerns about the suitability of such models for clinical use. This systematic review highlights the limited description of the datasets employed for modelling and helps researchers select the most suitable datasets for their task.
Affiliation(s)
- Beatriz Garcia Santa Cruz
- Centre Hospitalier de Luxembourg, 4, Rue Ernest Barble, Luxembourg L-1210, Luxembourg; Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, Avenue des Hauts Fourneaux, Esch-sur-Alzette L-4362, Luxembourg.
- Matías Nicolás Bossa
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, Avenue des Hauts Fourneaux, Esch-sur-Alzette L-4362, Luxembourg; Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB), Pleinlaan 2, Brussels B-1050, Belgium
- Jan Sölter
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, Avenue des Hauts Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Andreas Dominik Husch
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, Avenue des Hauts Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
13
Horry MJ, Chakraborty S, Pradhan B, Fallahpoor M, Chegeni H, Paul M. Factors determining generalization in deep learning models for scoring COVID-CT images. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:9264-9293. [PMID: 34814345 DOI: 10.3934/mbe.2021456] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
The COVID-19 pandemic has inspired unprecedented data collection and computer vision modelling efforts worldwide, focused on the diagnosis of COVID-19 from medical images. However, these models have found limited, if any, clinical application, due in part to unproven generalization to datasets beyond their source training corpus. This study investigates the generalizability of deep learning models using publicly available COVID-19 computed tomography data through cross-dataset validation. The predictive ability of these models for COVID-19 severity is assessed using an independent dataset that is stratified for COVID-19 lung involvement. Each inter-dataset study is performed using histogram equalization and contrast-limited adaptive histogram equalization, with and without a learned Gabor filter. We show that, under certain conditions, deep learning models can generalize well to an external dataset, with F1 scores up to 86%. The best-performing model shows predictive accuracy of between 75% and 96% for lung involvement scoring against an external expertly stratified dataset. From these results we identify the key factors promoting deep learning generalization: primarily, the uniform acquisition of training images, and secondly, diversity in CT slice position.
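Global histogram equalization, one of the preprocessing steps compared in this study, can be sketched in a few lines; the CLAHE and Gabor-filter variants follow the same input/output contract. This is a generic equalizer written for illustration, not the study's exact preprocessing code:

```python
import numpy as np

def hist_equalize(img, levels=256):
    """Global histogram equalization for an 8-bit grayscale image:
    map each gray level through the normalized cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]                                  # normalize to [0, 1]
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)
    return lut[img]

# A low-contrast two-level image gets spread toward the full range.
img = np.array([[10, 10], [20, 20]], dtype=np.uint8)
print(np.unique(hist_equalize(img)))  # [128 255]
```

Normalizing intensity distributions in this way is one means of reducing the acquisition differences between datasets that the study identifies as the primary factor limiting generalization.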
Affiliation(s)
- Michael James Horry
- Center for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
- Subrata Chakraborty
- Center for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
- Biswajeet Pradhan
- Center for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
- Center of Excellence for Climate Change Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Earth Observation Center, Institute of Climate Change, Universiti Kebangsaan Malaysia, Selangor 43600, Malaysia
- Maryam Fallahpoor
- Center for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
- Hossein Chegeni
- Fellowship of Interventional Radiology Imaging Center, IranMehr General Hospital, Iran
- Manoranjan Paul
- Machine Vision and Digital Health (MaViDH), School of Computing, Mathematics, and Engineering, Charles Sturt University, Australia
14
López-Cabrera JD, Orozco-Morales R, Portal-Díaz JA, Lovelle-Enríquez O, Pérez-Díaz M. Current limitations to identify covid-19 using artificial intelligence with chest x-ray imaging (part ii). The shortcut learning problem. HEALTH AND TECHNOLOGY 2021; 11:1331-1345. [PMID: 34660166 PMCID: PMC8502237 DOI: 10.1007/s12553-021-00609-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2021] [Accepted: 10/05/2021] [Indexed: 12/12/2022]
Abstract
Since the outbreak of the COVID-19 pandemic, computer vision researchers have been working on the automatic identification of this disease using radiological images. The results achieved by automatic classification methods far exceed those of human specialists, with sensitivity as high as 100% being reported. However, prestigious radiology societies have stated that the use of this type of imaging alone is not recommended as a diagnostic method, and according to some experts the patterns presented in these images are unspecific and subtle, overlapping with other viral pneumonias. This report evaluates the robustness and generalizability of different approaches using artificial intelligence, deep learning, and computer vision to identify COVID-19 from chest X-ray images. We also seek to alert researchers and reviewers to the issue of "shortcut learning", and recommendations are presented for identifying whether COVID-19 automatic classification models are being affected by it. First, papers using explainable artificial intelligence methods are reviewed. The results of applying external validation sets are then evaluated to determine the generalizability of these methods. Finally, studies that apply traditional computer vision methods to perform the same task are considered. It is evident that, whether the whole chest X-ray image or a bounding box of the lungs is used, the image regions that contribute most to the classification often lie outside the lung region, which is not clinically plausible. In addition, in investigations that evaluated their models on datasets external to the training set, effectiveness decreased significantly; such external validation likely provides a more realistic picture of how a model will perform in the clinic. The results indicate that, so far, existing models often involve shortcut learning, which makes their use less appropriate in the clinical setting.
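One of the checks this review recommends, whether a model's explanation mass falls outside the lungs, can be quantified directly from a saliency map and a lung mask. The function name, toy arrays, and the interpretation threshold are illustrative assumptions:

```python
import numpy as np

def saliency_outside_lungs(saliency, lung_mask):
    """Fraction of total saliency mass outside the lung mask; values
    near 1 suggest the model relies on non-pulmonary shortcuts such
    as annotations, positioning markers, or acquisition artifacts."""
    total = saliency.sum()
    return float(saliency[~lung_mask].sum() / total) if total > 0 else 0.0

# Toy check: uniform saliency with lungs covering half the image -> 0.5
saliency = np.ones((4, 4))
lungs = np.zeros((4, 4), dtype=bool); lungs[:, :2] = True
print(saliency_outside_lungs(saliency, lungs))  # 0.5
```

Reporting this fraction alongside classification metrics is one concrete way for reviewers to spot the shortcut-learning pattern the authors describe.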
Affiliation(s)
- José Daniel López-Cabrera
- Centro de Investigaciones de La Informática, Facultad de Matemática, Física y Computación, Universidad Central “Marta Abreu” de Las Villas, Villa Clara, Santa Clara, Cuba
- Rubén Orozco-Morales
- Departamento de Control Automático, Facultad de Ingeniería Eléctrica, Universidad Central “Marta Abreu” de Las Villas, Villa Clara, Santa Clara, Cuba
- Jorge Armando Portal-Díaz
- Departamento de Control Automático, Facultad de Ingeniería Eléctrica, Universidad Central “Marta Abreu” de Las Villas, Villa Clara, Santa Clara, Cuba
- Orlando Lovelle-Enríquez
- Departamento de Imagenología, Hospital Comandante Manuel Fajardo Rivero, Villa Clara, Santa Clara, Cuba
- Marlén Pérez-Díaz
- Departamento de Control Automático, Facultad de Ingeniería Eléctrica, Universidad Central “Marta Abreu” de Las Villas, Villa Clara, Santa Clara, Cuba
15
El Naqa I, Li H, Fuhrman J, Hu Q, Gorre N, Chen W, Giger ML. Lessons learned in transitioning to AI in the medical imaging of COVID-19. J Med Imaging (Bellingham) 2021; 8:010902-10902. [PMID: 34646912 PMCID: PMC8488974 DOI: 10.1117/1.jmi.8.s1.010902] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Accepted: 09/20/2021] [Indexed: 12/12/2022] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has wreaked havoc across the world. It also created a need for the urgent development of efficacious predictive diagnostics, specifically artificial intelligence (AI) methods applied to medical imaging. This has led to a convergence of experts from multiple disciplines, including clinicians, medical physicists, imaging scientists, computer scientists, and informatics experts, bringing to bear the best of these fields to solve the challenges of the COVID-19 pandemic. However, such a convergence over a very brief period of time has had unintended consequences and created its own challenges. As part of the Medical Imaging Data and Resource Center initiative, we discuss the lessons learned from career transitions across the three involved disciplines (radiology, medical imaging physics, and computer science) and draw recommendations from these experiences by analyzing the challenges associated with each of the three transition types: (1) AI of non-imaging data to AI of medical imaging data, (2) medical imaging clinician to AI of medical imaging, and (3) AI of medical imaging to AI of COVID-19 imaging. The diffusion of knowledge among these disciplines can be accomplished more effectively by recognizing the intricacies associated with each transition. These lessons learned in transitioning to AI in the medical imaging of COVID-19 can inform and enhance future AI applications, making the whole of the transitions more than the sum of each discipline, whether confronting an emergency like the COVID-19 pandemic or solving emerging problems in biomedicine.
Affiliation(s)
- Issam El Naqa
- Moffitt Cancer Center, Department of Machine Learning, Tampa, Florida, United States
- The University of Chicago, Medical Imaging Data and Resource Center, Chicago, Illinois, United States
- Hui Li
- The University of Chicago, Medical Imaging Data and Resource Center, Chicago, Illinois, United States
- The University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Jordan Fuhrman
- The University of Chicago, Medical Imaging Data and Resource Center, Chicago, Illinois, United States
- The University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Qiyuan Hu
- The University of Chicago, Medical Imaging Data and Resource Center, Chicago, Illinois, United States
- The University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Naveena Gorre
- Moffitt Cancer Center, Department of Machine Learning, Tampa, Florida, United States
- The University of Chicago, Medical Imaging Data and Resource Center, Chicago, Illinois, United States
- Weijie Chen
- The University of Chicago, Medical Imaging Data and Resource Center, Chicago, Illinois, United States
- US FDA, CDRH, Office of Science and Engineering Laboratories, Division of Imaging, Diagnosis, and Software Reliability, Silver Spring, Maryland, United States
- Maryellen L. Giger
- The University of Chicago, Medical Imaging Data and Resource Center, Chicago, Illinois, United States
- The University of Chicago, Department of Radiology, Chicago, Illinois, United States