1. Rickard D, Kabir MA, Homaira N. Machine learning-based approaches for distinguishing viral and bacterial pneumonia in paediatrics: A scoping review. Computer Methods and Programs in Biomedicine 2025; 268:108802. [PMID: 40349546 DOI: 10.1016/j.cmpb.2025.108802]
Abstract
BACKGROUND AND OBJECTIVE: Pneumonia is the leading cause of hospitalisation and mortality among children under five, particularly in low-resource settings. Accurate differentiation between viral and bacterial pneumonia is essential for guiding appropriate treatment, yet it remains challenging due to overlapping clinical and radiographic features. Advances in machine learning (ML), particularly deep learning (DL), have shown promise in classifying pneumonia using chest X-ray (CXR) images. This scoping review summarises the evidence on ML techniques for classifying viral and bacterial pneumonia using CXR images in paediatric patients.

METHODS: This scoping review was conducted following the Joanna Briggs Institute methodology and the PRISMA-ScR guidelines. A comprehensive search was performed in PubMed, Embase, and Scopus to identify studies involving children (0-18 years) with pneumonia diagnosed through CXR, using ML models for binary or multiclass classification. Data extraction included ML models, dataset characteristics, and performance metrics.

RESULTS: A total of 35 studies, published between 2018 and 2025, were included in this review. Of these, 31 studies used the publicly available Kermany dataset, raising concerns about overfitting and limited generalisability to broader, real-world clinical populations. Most studies (n=33) used convolutional neural networks (CNNs) for pneumonia classification. While many models demonstrated promising performance, significant variability was observed due to differences in methodologies, dataset sizes, and validation strategies, complicating direct comparisons. For binary classification (viral vs bacterial pneumonia), a median accuracy of 92.3% (range: 80.8% to 97.9%) was reported. For multiclass classification (healthy, viral pneumonia, and bacterial pneumonia), the median accuracy was 91.8% (range: 76.8% to 99.7%).

CONCLUSIONS: Current evidence is constrained by a predominant reliance on a single dataset and variability in methodologies, which limit the generalisability and clinical applicability of findings. To address these limitations, future research should focus on developing diverse and representative datasets while adhering to standardised reporting guidelines. Such efforts are essential to improve the reliability, reproducibility, and translational potential of machine learning models in clinical settings.
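Most of the included studies follow a transfer-learning recipe of roughly this shape. The sketch below is illustrative only, assuming an ImageNet-pretrained ResNet-18 backbone and synthetic tensors in place of the Kermany (or any other) paediatric CXR dataset; it is not taken from any reviewed paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Synthetic tensors stand in for preprocessed paediatric CXRs; grayscale films are
# usually replicated to three channels to match ImageNet-pretrained backbones.
images = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 2, (32,))            # 0 = viral, 1 = bacterial (toy labels)
loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # replace the 1000-class head with 2 classes
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for x, y in loader:                            # one toy epoch of fine-tuning
    optimiser.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimiser.step()
```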
Affiliation(s)
- Declan Rickard: School of Clinical Medicine, UNSW Sydney, Kensington, NSW, 2052, Australia.
- Muhammad Ashad Kabir: School of Computing, Mathematics and Engineering, Charles Sturt University, Bathurst, NSW, 2795, Australia; Artificial Intelligence and Cyber Futures Institute, Charles Sturt University, Bathurst, NSW, 2795, Australia.
- Nusrat Homaira: School of Clinical Medicine, UNSW Sydney, Kensington, NSW, 2052, Australia; Discipline of Pediatrics and Child Health, UNSW Sydney, Randwick, NSW, 2031, Australia; Respiratory Department, Sydney Children's Hospital, Randwick, NSW, 2031, Australia.
2. Shih YC, Ko CL, Wang SY, Chang CY, Lin SS, Huang CW, Cheng MF, Chen CM, Wu YW. Cross-institutional validation of a polar map-free 3D deep learning model for obstructive coronary artery disease prediction using myocardial perfusion imaging: insights into generalizability and bias. Eur J Nucl Med Mol Imaging 2025:10.1007/s00259-025-07243-w. [PMID: 40198356 DOI: 10.1007/s00259-025-07243-w]
Abstract
PURPOSE: Deep learning (DL) models for predicting obstructive coronary artery disease (CAD) using myocardial perfusion imaging (MPI) have shown potential for enhancing diagnostic accuracy. However, their ability to maintain consistent performance across institutions and demographics remains uncertain. This study aimed to investigate the generalizability and potential biases of an in-house MPI DL model between two hospital-based cohorts.

METHODS: We retrospectively included patients from two medical centers in Taiwan who underwent stress/redistribution thallium-201 MPI followed by invasive coronary angiography within 90 days as the reference standard. A polar map-free 3D DL model trained on 928 MPI images from one center to predict obstructive CAD was tested on internal (933 images) and external (3234 images from the other center) validation sets. Diagnostic performance, assessed using area under receiver operating characteristic curves (AUCs), was compared between the internal and external cohorts, demographic groups, and with the performance of stress total perfusion deficit (TPD).

RESULTS: The model showed significantly lower performance in the external cohort compared to the internal cohort in both patient-based (AUC: 0.713 vs. 0.813) and vessel-based (AUC: 0.733 vs. 0.782) analyses, but still outperformed stress TPD (all p < 0.001). The performance was lower in patients who underwent treadmill stress MPI in the internal cohort and in patients over 70 years old in the external cohort.

CONCLUSIONS: This study demonstrated adequate performance but also limitations in the generalizability of the DL-based MPI model, along with biases related to stress type and patient age. Thorough validation is essential before the clinical implementation of DL MPI models.
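For readers who want to reproduce this kind of internal-versus-external comparison, a hedged sketch follows: synthetic labels and scores stand in for the CAD ground truth and model probabilities, and the percentile bootstrap interval is one common way (not necessarily the authors' stated method) to quantify uncertainty around each AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def bootstrap_auc(y, s, n_boot=1000):
    """Point AUC plus a percentile 95% bootstrap confidence interval."""
    aucs, idx = [], np.arange(len(y))
    for _ in range(n_boot):
        b = rng.choice(idx, size=len(idx), replace=True)
        if len(np.unique(y[b])) < 2:          # need both classes in the resample
            continue
        aucs.append(roc_auc_score(y[b], s[b]))
    return roc_auc_score(y, s), np.percentile(aucs, [2.5, 97.5])

# Synthetic stand-ins sized like the study's validation sets (933 internal, 3234 external).
y_int, s_int = rng.integers(0, 2, 933), rng.random(933)
y_ext, s_ext = rng.integers(0, 2, 3234), rng.random(3234)

for name, (y, s) in {"internal": (y_int, s_int), "external": (y_ext, s_ext)}.items():
    auc, ci = bootstrap_auc(y, s)
    print(f"{name}: AUC={auc:.3f} (95% CI {ci[0]:.3f}-{ci[1]:.3f})")
```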
Affiliation(s)
- Yu-Cheng Shih: Department of Nuclear Medicine, Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Chi-Lun Ko: Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan; Department of Nuclear Medicine, National Taiwan University Hospital, Taipei, Taiwan; College of Medicine, National Taiwan University, Taipei, Taiwan
- Shan-Ying Wang: Department of Nuclear Medicine, Far Eastern Memorial Hospital, New Taipei City, Taiwan; Electrical and Communication Engineering College, Yuan Ze University, Taoyuan, Taiwan
- Chen-Yu Chang: Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Shau-Syuan Lin: Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Cheng-Wen Huang: Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Mei-Fang Cheng: Department of Nuclear Medicine, National Taiwan University Hospital, Taipei, Taiwan; College of Medicine, National Taiwan University, Taipei, Taiwan
- Chung-Ming Chen: Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Yen-Wen Wu: Department of Nuclear Medicine, Far Eastern Memorial Hospital, New Taipei City, Taiwan; Division of Cardiology, Cardiovascular Center, Far Eastern Memorial Hospital, No. 21, Sec. 2, Nanya S. Rd., Banqiao Dist, New Taipei City, 220216, Taiwan; School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan; Graduate Institute of Medicine, Yuan Ze University, Taoyuan City, Taiwan.
3. Ganatra HA. Machine Learning in Pediatric Healthcare: Current Trends, Challenges, and Future Directions. J Clin Med 2025; 14:807. [PMID: 39941476 PMCID: PMC11818243 DOI: 10.3390/jcm14030807]
Abstract
Background/Objectives: Artificial intelligence (AI) and machine learning (ML) are transforming healthcare by enabling predictive, diagnostic, and therapeutic advancements. Pediatric healthcare presents unique challenges, including limited data availability, developmental variability, and ethical considerations. This narrative review explores the current trends, applications, challenges, and future directions of ML in pediatric healthcare. Methods: A systematic search of the PubMed database was conducted using the query: ("artificial intelligence" OR "machine learning") AND ("pediatric" OR "paediatric"). Studies were reviewed to identify key themes, methodologies, applications, and challenges. Gaps in the research and ethical considerations were also analyzed to propose future research directions. Results: ML has demonstrated promise in diagnostic support, prognostic modeling, and therapeutic planning for pediatric patients. Applications include the early detection of conditions like sepsis, improved diagnostic imaging, and personalized treatment strategies for chronic conditions such as epilepsy and Crohn's disease. However, challenges such as data limitations, ethical concerns, and lack of model generalizability remain significant barriers. Emerging techniques, including federated learning and explainable AI (XAI), offer potential solutions. Despite these advancements, research gaps persist in data diversity, model interpretability, and ethical frameworks. Conclusions: ML offers transformative potential in pediatric healthcare by addressing diagnostic, prognostic, and therapeutic challenges. While advancements highlight its promise, overcoming barriers such as data limitations, ethical concerns, and model trustworthiness is essential for its broader adoption. Future efforts should focus on enhancing data diversity, developing standardized ethical guidelines, and improving model transparency to ensure equitable and effective implementation in pediatric care.
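The PubMed query quoted in the Methods can be run programmatically. Below is a hedged sketch using Biopython's Entrez utilities; the email address and the retmax value are placeholders rather than details from the review.

```python
from Bio import Entrez

Entrez.email = "your.name@example.org"   # NCBI asks for a contact address; placeholder
query = '("artificial intelligence" OR "machine learning") AND ("pediatric" OR "paediatric")'

handle = Entrez.esearch(db="pubmed", term=query, retmax=50)
record = Entrez.read(handle)
handle.close()

print("Total hits:", record["Count"])
print("First PMIDs:", record["IdList"][:10])
```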
Affiliation(s)
- Hammad A Ganatra: Pediatric Critical Care Medicine, Cleveland Clinic Children's, 9500 Euclid Ave, Cleveland, OH 44195, USA
4. Rajaraman S, Liang Z, Xue Z, Antani S. Addressing Class Imbalance with Latent Diffusion-based Data Augmentation for Improving Disease Classification in Pediatric Chest X-rays. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2024; 2024:5059-5066. [PMID: 40134830 PMCID: PMC11936509 DOI: 10.1109/bibm62325.2024.10822172]
Abstract
Deep learning (DL) has transformed medical image classification; however, its efficacy is often limited by significant data imbalance due to far fewer cases (minority class) compared to controls (majority class). It has been shown that synthetic image augmentation techniques can simulate clinical variability, leading to enhanced model performance. We hypothesize that they could also mitigate the challenge of data imbalance, thereby addressing overfitting to the majority class and enhancing generalization. Recently, latent diffusion models (LDMs) have shown promise in synthesizing high-quality medical images. This study evaluates the effectiveness of a text-guided image-to-image LDM in synthesizing disease-positive chest X-rays (CXRs) and augmenting a pediatric CXR dataset to improve classification performance. We first establish baseline performance by fine-tuning an ImageNet-pretrained Inception-V3 model on class-imbalanced data for two tasks-normal vs. pneumonia and normal vs. bronchopneumonia. Next, we fine-tune individual text-guided image-to-image LDMs to generate CXRs showing signs of pneumonia and bronchopneumonia. The Inception-V3 model is retrained on an updated data set that includes these synthesized images as part of augmented training and validation sets. Classification performance is compared using balanced accuracy, sensitivity, specificity, F-score, Matthews correlation coefficient (MCC), Kappa, and Youden's index against the baseline performance. Results show that the augmentation significantly improves Youden's index (p<0.05) and markedly enhances other metrics, indicating that data augmentation using LDM-synthesized images is an effective strategy for addressing class imbalance in medical image classification.
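All of the reported metrics can be derived from a binary confusion matrix. The toy sketch below (invented labels, not the study's data) computes them with scikit-learn where possible and Youden's index by hand.

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score, matthews_corrcoef)

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])   # toy labels (1 = disease-positive)
y_pred = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])   # toy predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print("sensitivity:", sensitivity)
print("specificity:", specificity)
print("F-score:", f1_score(y_true, y_pred))
print("MCC:", matthews_corrcoef(y_true, y_pred))
print("kappa:", cohen_kappa_score(y_true, y_pred))
print("Youden's index:", sensitivity + specificity - 1)
```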
Affiliation(s)
- Sivaramakrishnan Rajaraman, Zhaohui Liang, Zhiyun Xue, Sameer Antani: Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
5. Santomartino SM, Zech JR, Hall K, Jeudy J, Parekh V, Yi PH, Weintraub E. Evaluating the Performance and Bias of Natural Language Processing Tools in Labeling Chest Radiograph Reports. Radiology 2024; 313:e232746. [PMID: 39436298 PMCID: PMC11535863 DOI: 10.1148/radiol.232746]
Abstract
Background: Natural language processing (NLP) is commonly used to annotate radiology datasets for training deep learning (DL) models. However, the accuracy and potential biases of these NLP methods have not been thoroughly investigated, particularly across different demographic groups.

Purpose: To evaluate the accuracy and demographic bias of four NLP radiology report labeling tools on two chest radiograph datasets.

Materials and Methods: This retrospective study, performed between April 2022 and April 2024, evaluated chest radiograph report labeling using four NLP tools (CheXpert [rule-based], RadReportAnnotator [RRA; DL-based], OpenAI's GPT-4 [DL-based], cTAKES [hybrid]) on a subset of the Medical Information Mart for Intensive Care (MIMIC) chest radiograph dataset balanced for representation of age, sex, and race and ethnicity (n = 692) and the entire Indiana University (IU) chest radiograph dataset (n = 3665). Three board-certified radiologists annotated the chest radiograph reports for 14 thoracic disease labels. NLP tool performance was evaluated using several metrics, including accuracy and error rate. Bias was evaluated by comparing performance between demographic subgroups using the Pearson χ2 test.

Results: The IU dataset included 3665 patients (mean age, 49.7 years ± 17 [SD]; 1963 female), while the MIMIC dataset included 692 patients (mean age, 54.1 years ± 23.1; 357 female). All four NLP tools demonstrated high accuracy across findings in the IU and MIMIC datasets, as follows: CheXpert (92.6% [47 516 of 51 310], 90.2% [8742 of 9688]), RRA (82.9% [19 746 of 23 829], 92.2% [2870 of 3114]), GPT-4 (94.3% [45 586 of 48 342], 91.6% [6721 of 7336]), and cTAKES (84.7% [43 436 of 51 310], 88.7% [8597 of 9688]). RRA and cTAKES had higher accuracy (P < .001) on the MIMIC dataset, while CheXpert and GPT-4 had higher accuracy on the IU dataset. Differences (P < .001) in error rates were observed across age groups for all NLP tools except RRA on the MIMIC dataset, with the highest error rates for CheXpert, RRA, and cTAKES in patients older than 80 years (mean, 15.8% ± 5.0) and the highest error rate for GPT-4 in patients 60-80 years of age (8.3%).

Conclusion: Although commonly used NLP tools for chest radiograph report annotation are accurate when evaluating reports in aggregate, demographic subanalyses showed significant bias, with poorer performance in older patients. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Cai in this issue.
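The subgroup bias check described above amounts to a Pearson chi-square test on error/correct counts per demographic group. The sketch below uses invented counts purely to show the mechanics; it does not reproduce the study's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows are age groups, columns are [errors, correct labels] for one hypothetical tool.
counts = np.array([
    [ 52, 948],   # <40 years
    [ 71, 929],   # 40-60 years
    [ 83, 917],   # 60-80 years
    [158, 842],   # >80 years
])
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.2g}")
```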
Affiliation(s)
- Samantha M. Santomartino, John R. Zech, Kent Hall, Jean Jeudy, Vishwa Parekh, Paul H. Yi, Elizabeth Weintraub
- From Drexel University College of Medicine, Philadelphia, Pa (S.M.S.); Department of Radiology, Columbia University Irving Medical Center, New York, NY (J.R.Z.); Department of Radiology, Wake Forest University Health Sciences Center, Winston-Salem, NC (K.H.); Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, Md (J.J., V.P.); and Department of Diagnostic Imaging, St. Jude Children's Research Hospital, 262 Danny Thomas Plc, Memphis, TN 38105-3678 (P.H.Y.)
6. Siddiqi R, Javaid S. Deep Learning for Pneumonia Detection in Chest X-ray Images: A Comprehensive Survey. J Imaging 2024; 10:176. [PMID: 39194965 DOI: 10.3390/jimaging10080176]
Abstract
This paper addresses the significant problem of identifying the relevant background and contextual literature related to deep learning (DL) as an evolving technology in order to provide a comprehensive analysis of the application of DL to the specific problem of pneumonia detection via chest X-ray (CXR) imaging, which is the most common and cost-effective imaging technique available worldwide for pneumonia diagnosis. This paper in particular addresses the key period associated with COVID-19, 2020-2023, to explain, analyze, and systematically evaluate the limitations of approaches and determine their relative levels of effectiveness. The context in which DL is applied as both an aid to and an automated substitute for existing expert radiography professionals, who often have limited availability, is elaborated in detail. The rationale for the undertaken research is provided, along with a justification of the resources adopted and their relevance. This explanatory text and the subsequent analyses are intended to provide sufficient detail of the problem being addressed, existing solutions, and the limitations of these, ranging in detail from the specific to the more general. Indeed, our analysis and evaluation agree with the generally held view that the use of transformers, specifically, vision transformers (ViTs), is the most promising technique for obtaining further effective results in the area of pneumonia detection using CXR images. However, ViTs require extensive further research to address several limitations, specifically the following: biased CXR datasets, data and code availability, the ease with which a model can be explained, systematic methods of accurate model comparison, the notion of class imbalance in CXR datasets, and the possibility of adversarial attacks, the latter of which remains an area of fundamental research.
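As a concrete illustration of the ViT-based direction the survey highlights, the hedged sketch below swaps the classification head of an ImageNet-pretrained vision transformer for a two-class pneumonia output; it does not reproduce any specific surveyed model, and the input is a random tensor standing in for a preprocessed CXR.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
model.heads.head = nn.Linear(model.heads.head.in_features, 2)   # normal vs pneumonia

x = torch.randn(1, 3, 224, 224)        # stand-in for a CXR resized/replicated to 3x224x224
with torch.no_grad():
    logits = model(x)
print(logits.shape)                     # torch.Size([1, 2])
```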
Affiliation(s)
- Raheel Siddiqi, Sameena Javaid: Computer Science Department, Karachi Campus, Bahria University, Karachi 73500, Pakistan
7. Wu D, Smith D, VanBerlo B, Roshankar A, Lee H, Li B, Ali F, Rahman M, Basmaji J, Tschirhart J, Ford A, VanBerlo B, Durvasula A, Vannelli C, Dave C, Deglint J, Ho J, Chaudhary R, Clausdorff H, Prager R, Millington S, Shah S, Buchanan B, Arntfield R. Improving the Generalizability and Performance of an Ultrasound Deep Learning Model Using Limited Multicenter Data for Lung Sliding Artifact Identification. Diagnostics (Basel) 2024; 14:1081. [PMID: 38893608 PMCID: PMC11172006 DOI: 10.3390/diagnostics14111081]
Abstract
Deep learning (DL) models for medical image classification frequently struggle to generalize to data from outside institutions. Additional clinical data are also rarely collected to comprehensively assess and understand model performance amongst subgroups. Following the development of a single-center model to identify the lung sliding artifact on lung ultrasound (LUS), we pursued a validation strategy using external LUS data. As annotated LUS data are relatively scarce compared to other medical imaging data, we adopted a novel technique to optimize the use of limited external data to improve model generalizability. Externally acquired LUS data from three tertiary care centers, totaling 641 clips from 238 patients, were used to assess the baseline generalizability of our lung sliding model. We then employed our novel Threshold-Aware Accumulative Fine-Tuning (TAAFT) method to fine-tune the baseline model and determine the minimum amount of data required to achieve predefined performance goals. A subgroup analysis was also performed and Grad-CAM++ explanations were examined. The final model was fine-tuned on one-third of the external dataset to achieve 0.917 sensitivity, 0.817 specificity, and 0.920 area under the receiver operator characteristic curve (AUC) on the external validation dataset, exceeding our predefined performance goals. Subgroup analyses identified LUS characteristics that most greatly challenged the model's performance. Grad-CAM++ saliency maps highlighted clinically relevant regions on M-mode images. We report a multicenter study that exploits limited available external data to improve the generalizability and performance of our lung sliding model while identifying poorly performing subgroups to inform future iterative improvements. This approach may contribute to efficiencies for DL researchers working with smaller quantities of external validation data.
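The external-validation step can be summarised as computing sensitivity, specificity, and AUC and checking them against predefined goals. The sketch below uses synthetic predictions and hypothetical target values; it illustrates only this evaluation step and is not the TAAFT procedure itself.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 641)                              # toy labels for 641 clips
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.25, 641), 0, 1)
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
achieved = {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "AUC": roc_auc_score(y_true, y_score)}
goals = {"sensitivity": 0.90, "specificity": 0.80, "AUC": 0.90}   # hypothetical targets

for metric, target in goals.items():
    status = "met" if achieved[metric] >= target else "not met"
    print(f"{metric}: {achieved[metric]:.3f} (goal {target}) -> {status}")
```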
Affiliation(s)
- Derek Wu: Department of Medicine, Western University, London, ON N6A 5C1, Canada
- Delaney Smith: Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Blake VanBerlo: Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Amir Roshankar: Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Hoseok Lee: Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Brian Li: Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Faraz Ali: Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Marwan Rahman: Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- John Basmaji: Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
- Jared Tschirhart: Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Alex Ford: Independent Researcher, London, ON N6A 1L8, Canada
- Bennett VanBerlo: Faculty of Engineering, Western University, London, ON N6A 5C1, Canada
- Ashritha Durvasula: Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Claire Vannelli: Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Chintan Dave: Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
- Jason Deglint: Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Jordan Ho: Department of Family Medicine, Western University, London, ON N6A 5C1, Canada
- Rushil Chaudhary: Department of Medicine, Western University, London, ON N6A 5C1, Canada
- Hans Clausdorff: Departamento de Medicina de Urgencia, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile
- Ross Prager: Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
- Scott Millington: Department of Critical Care Medicine, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Samveg Shah: Department of Medicine, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Brian Buchanan: Department of Critical Care, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Robert Arntfield: Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
8. Rajaraman S, Zamzmi G, Yang F, Liang Z, Xue Z, Antani S. Uncovering the effects of model initialization on deep model generalization: A study with adult and pediatric chest X-ray images. PLOS Digital Health 2024; 3:e0000286. [PMID: 38232121 DOI: 10.1371/journal.pdig.0000286]
Abstract
Model initialization techniques are vital for improving the performance and reliability of deep learning models in medical computer vision applications. While much literature exists on non-medical images, the impacts on medical images, particularly chest X-rays (CXRs), are less understood. Addressing this gap, our study explores three deep model initialization techniques: Cold-start, Warm-start, and Shrink and Perturb start, focusing on adult and pediatric populations. We specifically focus on scenarios with periodically arriving data for training, thereby embracing the real-world scenarios of ongoing data influx and the need for model updates. We evaluate these models for generalizability against external adult and pediatric CXR datasets. We also propose novel ensemble methods: F-score-weighted Sequential Least-Squares Quadratic Programming (F-SLSQP) and Attention-Guided Ensembles with Learnable Fuzzy Softmax to aggregate weight parameters from multiple models to capitalize on their collective knowledge and complementary representations. We perform statistical significance tests with 95% confidence intervals and p-values to analyze model performance. Our evaluations indicate models initialized with ImageNet-pretrained weights demonstrate superior generalizability over randomly initialized counterparts, contradicting some findings for non-medical images. Notably, ImageNet-pretrained models exhibit consistent performance during internal and external testing across different training scenarios. Weight-level ensembles of these models show significantly higher recall (p<0.05) during testing compared to individual models. Thus, our study accentuates the benefits of ImageNet-pretrained weight initialization, especially when used with weight-level ensembles, for creating robust and generalizable deep learning solutions.
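Of the three initialization schemes studied, "Shrink and Perturb" is the least self-explanatory. A minimal sketch follows, assuming the usual recipe of scaling existing weights toward zero and adding small Gaussian noise before retraining on newly arrived data; the shrink factor and noise scale are illustrative choices, not the paper's settings.

```python
import torch
import torch.nn as nn

def shrink_and_perturb(model: nn.Module, shrink: float = 0.4, sigma: float = 0.01) -> nn.Module:
    """Scale existing weights toward zero and add Gaussian noise before further training."""
    with torch.no_grad():
        for p in model.parameters():
            p.mul_(shrink).add_(torch.randn_like(p) * sigma)
    return model

warm_model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2))
# ...assume warm_model was already trained on the previously available data...
warm_model = shrink_and_perturb(warm_model)   # ready for fine-tuning on newly arrived data
```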
Affiliation(s)
- Sivaramakrishnan Rajaraman, Ghada Zamzmi, Feng Yang, Zhaohui Liang, Zhiyun Xue, Sameer Antani: Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
9. Rajaraman S, Yang F, Zamzmi G, Xue Z, Antani S. Can Deep Adult Lung Segmentation Models Generalize to the Pediatric Population? Expert Systems with Applications 2023; 229:120531. [PMID: 37397242 PMCID: PMC10310063 DOI: 10.1016/j.eswa.2023.120531]
Abstract
Lung segmentation in chest X-rays (CXRs) is an important prerequisite for improving the specificity of diagnoses of cardiopulmonary diseases in a clinical decision support system. Current deep learning models for lung segmentation are trained and evaluated on CXR datasets in which the radiographic projections are captured predominantly from the adult population. However, the shape of the lungs is reported to be significantly different across the developmental stages from infancy to adulthood. This might result in age-related data domain shifts that would adversely impact lung segmentation performance when the models trained on the adult population are deployed for pediatric lung segmentation. In this work, our goal is to (i) analyze the generalizability of deep adult lung segmentation models to the pediatric population and (ii) improve performance through a stage-wise, systematic approach consisting of CXR modality-specific weight initializations, stacked ensembles, and an ensemble of stacked ensembles. To evaluate segmentation performance and generalizability, novel evaluation metrics consisting of mean lung contour distance (MLCD) and average hash score (AHS) are proposed in addition to the multi-scale structural similarity index measure (MS-SSIM), the intersection over union (IoU), Dice score, 95% Hausdorff distance (HD95), and average symmetric surface distance (ASSD). Our results showed a significant improvement (p < 0.05) in cross-domain generalization through our approach. This study could serve as a paradigm to analyze the cross-domain generalizability of deep segmentation models for other medical imaging modalities and applications.
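Two of the overlap metrics listed above, Dice and IoU, reduce to simple set arithmetic on binary masks. A toy sketch follows, with random masks standing in for predicted and ground-truth lungs.

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.random((256, 256)) > 0.5     # predicted lung mask (toy)
gt = rng.random((256, 256)) > 0.5       # ground-truth lung mask (toy)

intersection = np.logical_and(pred, gt).sum()
union = np.logical_or(pred, gt).sum()
dice = 2 * intersection / (pred.sum() + gt.sum())
iou = intersection / union
print(f"Dice={dice:.3f}, IoU={iou:.3f}")
```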
Affiliation(s)
- Sivaramakrishnan Rajaraman, Feng Yang, Ghada Zamzmi, Zhiyun Xue, Sameer Antani: Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
10. Krokos G, MacKewn J, Dunn J, Marsden P. A review of PET attenuation correction methods for PET-MR. EJNMMI Phys 2023; 10:52. [PMID: 37695384 PMCID: PMC10495310 DOI: 10.1186/s40658-023-00569-0]
Abstract
Although thirteen years have passed since the installation of the first PET-MR system, these scanners constitute a very small proportion of the total hybrid PET systems installed. This is in stark contrast to the rapid expansion of the PET-CT scanner, which quickly established its importance in patient diagnosis within a similar timeframe. One of the main hurdles is the development of an accurate, reproducible and easy-to-use method for attenuation correction. Quantitative discrepancies in PET images between the manufacturer-provided MR methods and the more established CT- or transmission-based attenuation correction methods have led the scientific community into a continuous effort to develop a robust and accurate alternative. These approaches can be divided into four broad categories: (i) MR-based, (ii) emission-based, (iii) atlas-based, and (iv) machine learning-based attenuation correction, which is rapidly gaining momentum. The first is based on segmenting the MR images into various tissues and allocating a predefined attenuation coefficient for each tissue. Emission-based attenuation correction methods aim to utilise the PET emission data by simultaneously reconstructing the radioactivity distribution and the attenuation image. Atlas-based attenuation correction methods aim to predict a CT or transmission image given an MR image of a new patient, by using databases containing CT or transmission images from the general population. Finally, in machine learning methods, a model that could predict the required image given the acquired MR or non-attenuation-corrected PET image is developed by exploiting the underlying features of the images. Deep learning methods are the dominant approach in this category. Compared to the more traditional machine learning, which uses structured data for building a model, deep learning makes direct use of the acquired images to identify underlying features. This up-to-date review goes through the literature of attenuation correction approaches in PET-MR after categorising them. The various approaches in each category are described and discussed. After exploring each category separately, a general overview is given of the current status and potential future approaches along with a comparison of the four outlined categories.
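A hedged sketch of the first category (segmentation-based MR attenuation correction): each voxel's tissue label is mapped to a predefined 511 keV linear attenuation coefficient. The coefficients below are approximate textbook values, and the random label volume is a placeholder for a real MR segmentation.

```python
import numpy as np

MU_511KEV = {0: 0.0,      # air
             1: 0.0224,   # lung (approximate)
             2: 0.096,    # soft tissue / water
             3: 0.151}    # cortical bone (approximate)

labels = np.random.randint(0, 4, size=(128, 128, 96))             # toy MR tissue segmentation
mu_map = np.vectorize(MU_511KEV.get)(labels).astype(np.float32)   # attenuation map in cm^-1
print(mu_map.shape, float(mu_map.min()), float(mu_map.max()))
```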
Affiliation(s)
- Georgios Krokos, Jane MacKewn, Joel Dunn, Paul Marsden: School of Biomedical Engineering and Imaging Sciences, The PET Centre at St Thomas' Hospital London, King's College London, 1st Floor Lambeth Wing, Westminster Bridge Road, London, SE1 7EH, UK
11. Beheshtian E, Putman K, Santomartino SM, Parekh VS, Yi PH. Generalizability and Bias in a Deep Learning Pediatric Bone Age Prediction Model Using Hand Radiographs. Radiology 2023; 306:e220505. [PMID: 36165796 DOI: 10.1148/radiol.220505]
Abstract
Background: Although deep learning (DL) models have demonstrated expert-level ability for pediatric bone age prediction, they have shown poor generalizability and bias in other use cases.

Purpose: To quantify generalizability and bias in a bone age DL model measured by performance on external versus internal test sets and performance differences between different demographic groups, respectively.

Materials and Methods: The winning DL model of the 2017 RSNA Pediatric Bone Age Challenge was retrospectively evaluated and trained on 12 611 pediatric hand radiographs from two U.S. hospitals. The DL model was tested from September 2021 to December 2021 on an internal validation set and an external test set of pediatric hand radiographs with diverse demographic representation. Images reporting ground-truth bone age were included for study. Mean absolute difference (MAD) between ground-truth bone age and the model prediction bone age was calculated for each set. Generalizability was evaluated by comparing MAD between internal and external evaluation sets with use of t tests. Bias was evaluated by comparing MAD and clinically significant error rate (rate of errors changing the clinical diagnosis) between demographic groups with use of t tests or analysis of variance and χ2 tests, respectively (statistically significant difference defined as P < .05).

Results: The internal validation set had images from 1425 individuals (773 boys), and the external test set had images from 1202 individuals (mean age, 133 months ± 60 [SD]; 614 boys). The bone age model generalized well to the external test set, with no difference in MAD (6.8 months in the validation set vs 6.9 months in the external set; P = .64). Model predictions would have led to clinically significant errors in 194 of 1202 images (16%) in the external test set. The MAD was greater for girls than boys in the internal validation set (P = .01) and in the subcategories of age and Tanner stage in the external test set (P < .001 for both).

Conclusion: A deep learning (DL) bone age model generalized well to an external test set, although clinically significant sex-, age-, and sexual maturity-based biases in DL bone age were identified. © RSNA, 2022 Online supplemental material is available for this article. See also the editorial by Larson in this issue.
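The core generalizability test here is a comparison of mean absolute differences between the internal and external sets. The sketch below uses synthetic per-image errors (scaled so the means land near the reported values) purely to illustrate the mechanics of the unpaired t test.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
abs_err_internal = np.abs(rng.normal(0, 8.5, 1425))   # months; toy values, not study data
abs_err_external = np.abs(rng.normal(0, 8.6, 1202))

mad_int, mad_ext = abs_err_internal.mean(), abs_err_external.mean()
t, p = ttest_ind(abs_err_internal, abs_err_external, equal_var=False)
print(f"MAD internal={mad_int:.1f} mo, external={mad_ext:.1f} mo, p={p:.2f}")
```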
Affiliation(s)
- Elham Beheshtian, Kristin Putman, Samantha M Santomartino, Vishwa S Parekh, Paul H Yi
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
12. Chua M, Kim D, Choi J, Lee NG, Deshpande V, Schwab J, Lev MH, Gonzalez RG, Gee MS, Do S. Tackling prediction uncertainty in machine learning for healthcare. Nat Biomed Eng 2022:10.1038/s41551-022-00988-x. [PMID: 36581695 DOI: 10.1038/s41551-022-00988-x]
Abstract
Predictive machine-learning systems often do not convey the degree of confidence in the correctness of their outputs. To prevent unsafe prediction failures from machine-learning models, the users of the systems should be aware of the general accuracy of the model and understand the degree of confidence in each individual prediction. In this Perspective, we convey the need of prediction-uncertainty metrics in healthcare applications, with a focus on radiology. We outline the sources of prediction uncertainty, discuss how to implement prediction-uncertainty metrics in applications that require zero tolerance to errors and in applications that are error-tolerant, and provide a concise framework for understanding prediction uncertainty in healthcare contexts. For machine-learning-enabled automation to substantially impact healthcare, machine-learning models with zero tolerance for false-positive or false-negative errors must be developed intentionally.
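One simple, widely used uncertainty proxy consistent with this Perspective's framing (though not a method the authors prescribe) is the entropy of the softmax output, which can be thresholded to defer low-confidence predictions to a human reader. The threshold below is a hypothetical operating point.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[4.0, -2.0],     # confident prediction
                       [0.2,  0.1]])    # ambiguous prediction
probs = F.softmax(logits, dim=1)
entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

DEFER_THRESHOLD = 0.5                    # hypothetical operating point (nats)
for p, h in zip(probs, entropy):
    action = "defer to a human reader" if h > DEFER_THRESHOLD else "auto-report"
    print(f"p={[round(v, 3) for v in p.tolist()]}, entropy={h:.3f} -> {action}")
```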
Affiliation(s)
- Michelle Chua: Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Doyun Kim: Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Jongmun Choi: Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Nahyoung G Lee: Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
- Vikram Deshpande: Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
- Joseph Schwab: Department of Orthopedic Surgery, Massachusetts General Hospital, Boston, MA, USA
- Michael H Lev: Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Ramon G Gonzalez: Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Michael S Gee: Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Synho Do: Department of Radiology, Massachusetts General Hospital, Boston, MA, USA; Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
13. Can images crowdsourced from the internet be used to train generalizable joint dislocation deep learning algorithms? Skeletal Radiol 2022; 51:2121-2128. [PMID: 35624310 DOI: 10.1007/s00256-022-04077-7]
Abstract
OBJECTIVE: Deep learning has the potential to automatically triage orthopedic emergencies, such as joint dislocations. However, due to the rarity of these injuries, collecting large numbers of images to train algorithms may be infeasible for many centers. We evaluated whether the Internet could be used as a source of images to train convolutional neural networks (CNNs) for joint dislocations that would generalize well to real-world clinical cases.

METHODS: We collected datasets from online radiology repositories of 100 radiographs each (50 dislocated, 50 located) for four joints: native shoulder, elbow, hip, and total hip arthroplasty (THA). We trained a variety of CNN binary classifiers using both on-the-fly and static data augmentation to identify the various joint dislocations. The best-performing classifier for each joint was evaluated on an external test set of 100 corresponding radiographs (50 dislocations) from three hospitals. CNN performance was evaluated using area under the ROC curve (AUROC). To determine areas emphasized by the CNN for decision-making, class activation map (CAM) heatmaps were generated for test images.

RESULTS: The best-performing CNNs for elbow, hip, shoulder, and THA dislocation achieved high AUROCs on both internal and external test sets (internal/external AUC): elbow (1.0/0.998), hip (0.993/0.880), shoulder (1.0/0.993), THA (1.0/0.950). Heatmaps demonstrated appropriate emphasis of joints for both located and dislocated joints.

CONCLUSION: With modest numbers of images, radiographs from the Internet can be used to train clinically generalizable CNNs for joint dislocations. Given the rarity of joint dislocations at many centers, online repositories may be a viable source for CNN-training data.
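A basic class activation map of the kind described can be formed by weighting the final convolutional feature maps with the classifier weights of the predicted class. The hedged sketch below uses an off-the-shelf ResNet-18 and a random tensor in place of the study's radiographs; it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
features = {}
model.layer4.register_forward_hook(lambda m, i, o: features.update(maps=o))

x = torch.randn(1, 3, 224, 224)                      # stand-in for a preprocessed radiograph
with torch.no_grad():
    logits = model(x)
cls = logits.argmax(dim=1).item()

weights = model.fc.weight[cls]                                   # (512,)
cam = (weights[:, None, None] * features["maps"][0]).sum(0)      # (7, 7) raw activation map
cam = F.relu(cam)
cam = F.interpolate(cam[None, None], size=(224, 224), mode="bilinear", align_corners=False)[0, 0]
print(cam.shape)   # upsampled heatmap, ready to overlay on the radiograph
```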