1.
Shih YC, Ko CL, Wang SY, Chang CY, Lin SS, Huang CW, Cheng MF, Chen CM, Wu YW. Cross-institutional validation of a polar map-free 3D deep learning model for obstructive coronary artery disease prediction using myocardial perfusion imaging: insights into generalizability and bias. Eur J Nucl Med Mol Imaging 2025. [PMID: 40198356] [DOI: 10.1007/s00259-025-07243-w]
Abstract
PURPOSE Deep learning (DL) models for predicting obstructive coronary artery disease (CAD) using myocardial perfusion imaging (MPI) have shown potential for enhancing diagnostic accuracy. However, their ability to maintain consistent performance across institutions and demographics remains uncertain. This study aimed to investigate the generalizability and potential biases of an in-house MPI DL model between two hospital-based cohorts. METHODS We retrospectively included patients from two medical centers in Taiwan who underwent stress/redistribution thallium-201 MPI followed by invasive coronary angiography within 90 days as the reference standard. A polar map-free 3D DL model trained on 928 MPI images from one center to predict obstructive CAD was tested on internal (933 images) and external (3234 images from the other center) validation sets. Diagnostic performance, assessed using area under receiver operating characteristic curves (AUCs), was compared between the internal and external cohorts, demographic groups, and with the performance of stress total perfusion deficit (TPD). RESULTS The model showed significantly lower performance in the external cohort compared to the internal cohort in both patient-based (AUC: 0.713 vs. 0.813) and vessel-based (AUC: 0.733 vs. 0.782) analyses, but still outperformed stress TPD (all p < 0.001). The performance was lower in patients who underwent treadmill stress MPI in the internal cohort and in patients over 70 years old in the external cohort. CONCLUSIONS This study demonstrated adequate performance but also limitations in the generalizability of the DL-based MPI model, along with biases related to stress type and patient age. Thorough validation is essential before the clinical implementation of DL MPI models.
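Several entries in this list report diagnostic performance as the area under the receiver operating characteristic curve (AUC). As a quick reference, a minimal rank-based AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, ties counting half) can be sketched as follows; this is a generic illustration, not code from any of the studies listed here:

```python
import numpy as np

def auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Rank-based AUC, equivalent to the normalized Mann-Whitney U statistic."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # count pairwise wins of positives over negatives; ties are worth half
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

print(auc(np.array([0.9, 0.8, 0.3, 0.2]), np.array([1, 1, 0, 0])))  # → 1.0
```

An AUC of 0.5 corresponds to chance-level discrimination, which is why comparisons such as 0.713 vs. 0.813 above represent a meaningful performance drop.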
Affiliation(s)
- Yu-Cheng Shih
  - Department of Nuclear Medicine, Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Chi-Lun Ko
  - Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
  - Department of Nuclear Medicine, National Taiwan University Hospital, Taipei, Taiwan
  - College of Medicine, National Taiwan University, Taipei, Taiwan
- Shan-Ying Wang
  - Department of Nuclear Medicine, Far Eastern Memorial Hospital, New Taipei City, Taiwan
  - Electrical and Communication Engineering College, Yuan Ze University, Taoyuan, Taiwan
- Chen-Yu Chang
  - Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Shau-Syuan Lin
  - Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Cheng-Wen Huang
  - Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Mei-Fang Cheng
  - Department of Nuclear Medicine, National Taiwan University Hospital, Taipei, Taiwan
  - College of Medicine, National Taiwan University, Taipei, Taiwan
- Chung-Ming Chen
  - Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Yen-Wen Wu
  - Department of Nuclear Medicine, Far Eastern Memorial Hospital, New Taipei City, Taiwan
  - Division of Cardiology, Cardiovascular Center, Far Eastern Memorial Hospital, No. 21, Sec. 2, Nanya S. Rd., Banqiao Dist, New Taipei City, 220216, Taiwan
  - School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
  - Graduate Institute of Medicine, Yuan Ze University, Taoyuan City, Taiwan
2.
Xie W, Liu Z, Zhao L, Wang M, Tian J, Liu J. DIFLF: A domain-invariant features learning framework for single-source domain generalization in mammogram classification. Comput Methods Programs Biomed 2025; 261:108592. [PMID: 39813937] [DOI: 10.1016/j.cmpb.2025.108592]
Abstract
BACKGROUND AND OBJECTIVE Single-source domain generalization (SSDG) aims to generalize a deep learning (DL) model trained on one source dataset to multiple unseen datasets. This is important for the clinical application of DL-based models to breast cancer screening, wherein a model is commonly developed at one institute and then tested at others. One challenge of SSDG is to alleviate domain shifts using only one domain dataset. METHODS The present study proposes a domain-invariant features learning framework (DIFLF) for SSDG. Specifically, a style-augmentation module (SAM) and a content-style disentanglement module (CSDM) are proposed in DIFLF. SAM includes two different color jitter transforms, which transform each mammogram in the source domain into two synthesized mammograms with new styles. It can thus greatly increase the feature diversity of the source domain, reducing overfitting of the trained model. CSDM includes three feature disentanglement units, which extract domain-invariant content (DIC) features by disentangling them from domain-specific style (DSS) features, reducing the influence of domain shifts resulting from different feature distributions. Our code is available for open access on GitHub (https://github.com/85675/DIFLF). RESULTS DIFLF was trained on a private dataset (PRI1) and tested first on another private dataset (PRI2), whose feature distribution is similar to that of PRI1, and then on two public datasets (INbreast and MIAS), whose feature distributions differ greatly from PRI1. The experimental results show that DIFLF performs well in classifying mammograms in the unseen target datasets: accuracy and AUC were 0.917 and 0.928 in PRI2, 0.882 and 0.893 in INbreast, and 0.767 and 0.710 in MIAS, respectively. CONCLUSIONS DIFLF can alleviate the influence of domain shifts using only one source dataset, and it achieves strong mammogram classification performance even on unseen datasets whose feature distributions differ greatly from the training dataset.
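The style-augmentation idea described in this abstract (two color jitter transforms producing two new-style views of each source mammogram) can be illustrated with a minimal sketch. All parameter values and function names here are illustrative assumptions, not taken from the DIFLF code:

```python
import numpy as np

rng = np.random.default_rng(0)

def color_jitter(img: np.ndarray, brightness: float, contrast: float) -> np.ndarray:
    """Random multiplicative brightness and contrast perturbation (a simple 'style' change)."""
    b = rng.uniform(1 - brightness, 1 + brightness)  # brightness scale
    c = rng.uniform(1 - contrast, 1 + contrast)      # contrast scale around the mean
    mean = img.mean()
    return np.clip((img * b - mean) * c + mean, 0.0, 1.0)

def style_augment(img: np.ndarray):
    """Two differently parameterized jitters -> two synthesized views with new styles."""
    return color_jitter(img, 0.1, 0.1), color_jitter(img, 0.4, 0.4)

img = rng.random((64, 64)).astype(np.float32)  # stand-in for a normalized mammogram
view_a, view_b = style_augment(img)
```

Training on such synthesized views exposes the model to style variation that a single-institution dataset lacks, which is the mechanism the abstract credits for reduced overfitting.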
Affiliation(s)
- Wanfang Xie
  - School of Engineering Medicine, Beihang University, Beijing 100191, PR China; Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of the People's Republic of China, Beijing 100191, PR China
- Zhenyu Liu
  - CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, PR China; University of Chinese Academy of Sciences, Beijing 100080, PR China
- Litao Zhao
  - School of Engineering Medicine, Beihang University, Beijing 100191, PR China; Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of the People's Republic of China, Beijing 100191, PR China
- Meiyun Wang
  - Department of Medical Imaging, Henan Provincial People's Hospital & People's Hospital of Zhengzhou University, Zhengzhou 450003, PR China
- Jie Tian
  - School of Engineering Medicine, Beihang University, Beijing 100191, PR China; Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of the People's Republic of China, Beijing 100191, PR China
- Jiangang Liu
  - School of Engineering Medicine, Beihang University, Beijing 100191, PR China; Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of the People's Republic of China, Beijing 100191, PR China; Beijing Engineering Research Center of Cardiovascular Wisdom Diagnosis and Treatment, Beijing 100029, PR China
3.
Luo L, Wang X, Lin Y, Ma X, Tan A, Chan R, Vardhanabhuti V, Chu WC, Cheng KT, Chen H. Deep Learning in Breast Cancer Imaging: A Decade of Progress and Future Directions. IEEE Rev Biomed Eng 2025; 18:130-151. [PMID: 38265911] [DOI: 10.1109/rbme.2024.3357877]
Abstract
Breast cancer has reached the highest incidence rate worldwide among all malignancies since 2020. Breast imaging plays a significant role in early diagnosis and intervention to improve the outcome of breast cancer patients. In the past decade, deep learning has shown remarkable progress in breast cancer imaging analysis, holding great promise in interpreting the rich information and complex context of breast imaging modalities. Considering the rapid improvement in deep learning technology and the increasing severity of breast cancer, it is critical to summarize past progress and identify future challenges to be addressed. This paper provides an extensive review of deep learning-based breast cancer imaging research, covering studies on mammograms, ultrasound, magnetic resonance imaging, and digital pathology images over the past decade. The major deep learning methods and applications on imaging-based screening, diagnosis, treatment response prediction, and prognosis are elaborated and discussed. Drawn from the findings of this survey, we present a comprehensive discussion of the challenges and potential avenues for future research in deep learning-based breast cancer imaging.
4.
Uwimana A, Gnecco G, Riccaboni M. Artificial intelligence for breast cancer detection and its health technology assessment: A scoping review. Comput Biol Med 2025; 184:109391. [PMID: 39579663] [DOI: 10.1016/j.compbiomed.2024.109391]
Abstract
BACKGROUND Recent healthcare advancements highlight the potential of Artificial Intelligence (AI), and especially its subfield Machine Learning (ML), in enhancing Breast Cancer (BC) clinical care, leading to improved patient outcomes and increased radiologists' efficiency. While medical imaging techniques have significantly contributed to BC detection and diagnosis, their synergy with AI algorithms has consistently demonstrated superior diagnostic accuracy, reduced False Positives (FPs), and enabled personalized treatment strategies. Despite the burgeoning enthusiasm for leveraging AI for early and effective BC clinical care, its widespread integration into clinical practice is yet to be realized, and the evaluation of AI-based health technologies in terms of health and economic outcomes remains an ongoing endeavor. OBJECTIVES This scoping review aims to investigate AI (and especially ML) applications that have been implemented and evaluated across diverse clinical tasks or decisions in breast imaging and to explore the current state of evidence concerning the assessment of AI-based technologies for BC clinical care within the context of Health Technology Assessment (HTA). METHODS We conducted a systematic literature search following the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) checklist in PubMed and Scopus to identify relevant studies on AI (and particularly ML) applications in BC detection and diagnosis. We limited our search to studies published from January 2015 to October 2023. The Minimum Information about CLinical Artificial Intelligence Modeling (MI-CLAIM) checklist was used to assess the quality of AI algorithm development, evaluation, and reporting in the reviewed articles.
The HTA Core Model® was also used to analyze the comprehensiveness, robustness, and reliability of the reported results and evidence in AI-system evaluations, to ensure rigorous assessment of AI systems' utility and cost-effectiveness in clinical practice. RESULTS Of the 1652 initially identified articles, 104 were deemed eligible for inclusion in the review. Most studies examined the clinical effectiveness of AI-based systems (78.84%, n = 82), with one study focusing on safety in clinical settings and 13.46% (n = 14) focusing on patients' benefits. Of the studies, 31.73% (n = 33) were ethically approved to be carried out in clinical practice, whereas 25% (n = 26) evaluated AI systems legally approved for clinical use. Notably, none of the studies addressed the organizational implications of AI systems in clinical practice. Of the 104 studies, only two focused on cost-effectiveness analysis; these were analyzed separately. For the other 102 AI-based studies, average quality-assessment scores based on the MI-CLAIM checklist criteria were 84.12%, 83.92%, 83.98%, 74.51%, and 14.7% for study design, data and optimization, model performance, model examination, and reproducibility, respectively. Notably, 20.59% (n = 21) of these studies relied on large-scale representative real-world breast screening datasets, with only 10.78% (n = 11) demonstrating the robustness and generalizability of the evaluated AI systems. CONCLUSION In bridging the gap between cutting-edge developments and seamless integration of AI systems into clinical workflows, persistent challenges encompass data quality and availability, ethical and legal considerations, robustness and trustworthiness, scalability, and alignment with existing radiologists' workflows. These hurdles impede the synthesis of comprehensive, robust, and reliable evidence to substantiate these systems' clinical utility, relevance, and cost-effectiveness in real-world clinical workflows.
Consequently, evaluating AI-based health technologies through established HTA methodologies becomes complicated. We also highlight various factors that may significantly influence AI systems' effectiveness, such as operational dynamics, organizational structure, the application context of AI systems, and practices in breast screening or in how radiologists read examinations with AI support tools. Furthermore, we emphasize the substantial reciprocal influences between AI systems and radiologists' decision-making processes. Thus, we advocate for an adapted assessment framework specifically designed to address these potential influences on AI systems' effectiveness, one that addresses system-level transformative implications of AI systems rather than focusing solely on technical performance and task-level evaluations.
Affiliation(s)
- Massimo Riccaboni
  - IMT School for Advanced Studies, Lucca, Italy; IUSS University School for Advanced Studies, Pavia, Italy
5.
Wu DY, Vo DT, Seiler SJ. For the busy clinical-imaging professional in an AI world: Gaining intuition about deep learning without math. J Med Imaging Radiat Sci 2025; 56:101762. [PMID: 39437625] [DOI: 10.1016/j.jmir.2024.101762]
Abstract
Medical diagnostics comprise recognizing patterns in images, tissue slides, and symptoms. Deep learning algorithms (DLs) are well suited to such tasks, but they are black boxes in various ways. To explain DL computer-aided diagnosis (CAD) results and their accuracy to patients, to manage or drive the direction of future medical DLs, and to make better decisions with CAD, clinical professionals may benefit from hands-on, under-the-hood lessons about medical DL. For those who already have some high-level knowledge of DL, the next step is a more fundamental understanding, which may help illuminate the inside of the boxes. Toward the objectives of this Continuing Medical Education (CME) article, better understanding can come from relatable medical analogies and from personally experiencing quick simulations to observe deep learning in action, akin to the way clinicians are trained to perform other tasks. We developed readily implementable demonstrations and simulation exercises, framed using analogies to breast cancer, malignancy, and cancer stage as example diagnostic applications. The simulations revealed a nuanced relationship between DL output accuracy and the quantity and nature of the data, providing lessons learned and implications for the clinical world. Although we focused on DLs for diagnosis, they are similar to DLs for treatment (e.g., radiotherapy), so treatment providers may also benefit from this tutorial.
Affiliation(s)
- Dolly Y Wu
  - Volunteer Services, UT Southwestern Medical Center, Dallas, TX, USA
- Dat T Vo
  - Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX, USA
- Stephen J Seiler
  - Department of Radiology, UT Southwestern Medical Center, Dallas, TX, USA
6.
Grande-Barreto J, Lopez-Armas GC, Sanchez-Tiro JA, Peregrina-Barreto H. A Short Breast Imaging Reporting and Data System-Based Description for Classification of Breast Mass Grade. Life (Basel) 2024; 14:1634. [PMID: 39768342] [PMCID: PMC11677739] [DOI: 10.3390/life14121634]
Abstract
Identifying breast masses is relevant in early cancer detection, and automatic identification using computational methods helps assist medical experts with this task. Although high values have been reported for breast mass classification from digital mammograms, most results have focused on a general benign/malignant classification. According to the BI-RADS standard, masses are associated with cancer risk by grade, depending on their specific shape, margin, and density characteristics. This work tests several descriptors on the INbreast dataset to identify those best related to clinical assessment. The analysis provides a BI-RADS-based description for mass classification by combining neural networks and image processing. The results show that masses associated with grades BI-RADS-2 to BI-RADS-5 can be identified, reaching an overall accuracy and sensitivity of 0.88 ± 0.07. While this initial study is limited to a single dataset, it demonstrates the possibility of generating a description for automatic classification that is directly linked to the information analyzed by medical experts in clinical practice.
Affiliation(s)
- Jonas Grande-Barreto
  - Tecnologías de la Información, Universidad Politécnica de Puebla, Cuanalá, Puebla 72640, Mexico
- Jose Antonio Sanchez-Tiro
  - Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, San Andres Cholula 72840, Mexico
- Hayde Peregrina-Barreto
  - Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, San Andres Cholula 72840, Mexico
7.
Wu DY, Vo DT, Seiler SJ. Long overdue national big data policies hinder accurate and equitable cancer detection AI systems. J Med Imaging Radiat Sci 2024; 55:101387. [PMID: 38443215] [DOI: 10.1016/j.jmir.2024.02.012]
Affiliation(s)
- Dolly Y Wu
  - Volunteer Services, UT Southwestern Medical Center, Dallas, TX, USA
- Dat T Vo
  - Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX, USA
- Stephen J Seiler
  - Department of Radiology, UT Southwestern Medical Center, Dallas, TX, USA
8.
Liao L, Aagaard EM. An open codebase for enhancing transparency in deep learning-based breast cancer diagnosis utilizing CBIS-DDSM data. Sci Rep 2024; 14:27318. [PMID: 39516557] [PMCID: PMC11549440] [DOI: 10.1038/s41598-024-78648-0]
Abstract
Accessible mammography datasets and innovative machine learning techniques are at the forefront of computer-aided breast cancer diagnosis. However, the opacity surrounding private datasets and the unclear methodology behind the selection of subset images from publicly available databases for model training and testing, coupled with the arbitrary incompleteness or inaccessibility of code, markedly intensify the obstacles to replicating and validating a model's efficacy. These challenges, in turn, erect barriers for subsequent researchers striving to learn and advance this field. To address these limitations, we provide a pilot codebase covering the entire pipeline from image preprocessing to model development and evaluation, utilizing the publicly available Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) mass subset, including both full images and regions of interest (ROIs). We found that increasing the input size can improve the detection accuracy of malignant cases within each set of models. Collectively, our efforts hold promise for accelerating global software development for breast cancer diagnosis by leveraging our codebase and structure, while also integrating other advancements in the field.
Affiliation(s)
- Ling Liao
  - Biomedical Deep Learning LLC, St. Louis, MO, USA
  - Computational and Systems Biology, Washington University in St. Louis, St. Louis, MO, USA
- Eva M Aagaard
  - Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
9.
Vasilev YA, Vladzymyrskyy AV, Alymova YA, Akhmedzyanova DA, Blokhin IA, Romanenko MO, Seradzhi SR, Suchilova MM, Shumskaya YF, Reshetnikov RV. Development and Validation of a Questionnaire to Assess the Radiologists' Views on the Implementation of Artificial Intelligence in Radiology (ATRAI-14). Healthcare (Basel) 2024; 12:2011. [PMID: 39408191] [PMCID: PMC11476276] [DOI: 10.3390/healthcare12192011]
Abstract
Introduction: Artificial Intelligence (AI) is becoming an essential part of modern radiology. However, available evidence highlights issues in the real-world applicability of AI tools and mixed acceptance among radiologists. We aimed to develop and validate a questionnaire to evaluate the attitude of radiologists toward radiology AI (ATRAI-14). Materials and Methods: We generated items based on the European Society of Radiology questionnaire. Item reduction yielded 23 items, 12 of which contribute to scoring. The items were allocated into four domains ("Familiarity", "Trust", "Implementation Perspective", and "Hopes and Fears") and a part related to the respondent's demographics and professional background. As a pre-test method, we conducted cognitive interviews with 20 radiologists. Pilot testing with reliability and validity assessment was carried out on a representative sample of 90 respondents. Construct validity was assessed via confirmatory factor analysis (CFA). Results: CFA confirmed the feasibility of the four-domain structure. ATRAI-14 demonstrated acceptable internal consistency (Cronbach's alpha 0.78, 95% CI [0.68, 0.83]), good test-retest reliability (ICC = 0.89, 95% CI [0.67, 0.96], p < 0.05), and acceptable criterion validity (Spearman's rho 0.73, p < 0.001). Conclusions: The questionnaire is useful for obtaining detailed AI-acceptance measurements to inform management decisions when implementing AI in radiology.
Affiliation(s)
- Dina A. Akhmedzyanova
  - Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department, 127051 Moscow, Russia (Y.A.V.; A.V.V.; Y.A.A.; I.A.B.; M.O.R.; S.R.S.; M.M.S.; Y.F.S.; R.V.R.)
10.
Geng J, Sui X, Du R, Feng J, Wang R, Wang M, Yao K, Chen Q, Bai L, Wang S, Li Y, Wu H, Hu X, Du Y. Localized fine-tuning and clinical evaluation of deep-learning based auto-segmentation (DLAS) model for clinical target volume (CTV) and organs-at-risk (OAR) in rectal cancer radiotherapy. Radiat Oncol 2024; 19:87. [PMID: 38956690] [PMCID: PMC11221028] [DOI: 10.1186/s13014-024-02463-0]
Abstract
BACKGROUND AND PURPOSE Various deep learning auto-segmentation (DLAS) models have been proposed, some of which have been commercialized. However, performance degradation is a notable issue when pretrained models are deployed in the clinic. This study aims to enhance the precision of a popular commercial DLAS product for rectal cancer radiotherapy through localized fine-tuning, addressing challenges of practicality and generalizability in real-world clinical settings. MATERIALS AND METHODS A total of 120 Stage II/III mid-low rectal cancer patients were retrospectively enrolled and divided into three datasets: training (n = 60), external validation (ExVal, n = 30), and generalizability evaluation (GenEva, n = 30). Scans in the training and ExVal datasets were acquired on the same CT simulator, while those in GenEva were acquired on a different CT simulator. The commercial DLAS software first underwent localized fine-tuning (LFT) for the clinical target volume (CTV) and organs-at-risk (OAR) using the training data, and was then validated on ExVal and GenEva respectively. Performance evaluation involved comparing the LFT model with the vendor-provided pretrained model (VPM) against ground-truth contours, using metrics including the Dice similarity coefficient (DSC), 95th percentile Hausdorff distance (95HD), sensitivity, and specificity. RESULTS LFT significantly improved CTV delineation accuracy (p < 0.05), with the LFT model outperforming the VPM in target volume, DSC, 95HD, and specificity. Both models exhibited adequate accuracy for the bladder and femoral heads, and the LFT model demonstrated significant enhancement in segmenting the more complex small intestine. We did not identify performance degradation when the LFT and VPM models were applied to the GenEva dataset. CONCLUSIONS The necessity and potential benefits of localized fine-tuning of DLAS toward institution-specific model adaptation are underscored. The commercial DLAS software exhibits superior accuracy once localized fine-tuned, and is highly robust to imaging equipment changes.
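The Dice similarity coefficient (DSC) used above for contour comparison has a standard definition, 2|A∩B| / (|A| + |B|), which can be sketched for binary masks as follows (a generic illustration, not the study's evaluation pipeline):

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """DSC = 2|A∩B| / (|A| + |B|) for boolean masks; defined as 1.0 for two empty masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom

m1 = np.zeros((4, 4), dtype=bool); m1[:2] = True   # 8 pixels (rows 0-1)
m2 = np.zeros((4, 4), dtype=bool); m2[1:3] = True  # 8 pixels (rows 1-2), overlap 4
print(dice(m1, m2))  # → 0.5
```

DSC rewards volumetric overlap, which is why studies like this one pair it with a boundary-sensitive metric such as the 95th percentile Hausdorff distance.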
Affiliation(s)
- Jianhao Geng
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Xin Sui
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Rongxu Du
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Jialin Feng
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Ruoxi Wang
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Meijiao Wang
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Kaining Yao
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Qi Chen
  - Research and Development Department, MedMind Technology Co., Ltd, Beijing, 100083, China
- Lu Bai
  - Research and Development Department, MedMind Technology Co., Ltd, Beijing, 100083, China
- Shaobin Wang
  - Research and Development Department, MedMind Technology Co., Ltd, Beijing, 100083, China
- Yongheng Li
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Hao Wu
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, 100191, China
- Xiangmin Hu
  - Beijing Key Lab of Nanophotonics and Ultrafine Optoelectronic Systems, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Yi Du
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, 100142, China
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, 100191, China
11.
Condon JJJ, Trinh V, Hall KA, Reintals M, Holmes AS, Oakden-Rayner L, Palmer LJ. Impact of Transfer Learning Using Local Data on Performance of a Deep Learning Model for Screening Mammography. Radiol Artif Intell 2024; 6:e230383. [PMID: 38717291] [PMCID: PMC11294949] [DOI: 10.1148/ryai.230383]
Abstract
Purpose To investigate the issues of generalizability and replication of deep learning models by assessing performance of a screening mammography deep learning system developed at New York University (NYU) on a local Australian dataset. Materials and Methods In this retrospective study, all individuals with biopsy or surgical pathology-proven lesions and age-matched controls were identified from a South Australian public mammography screening program (January 2010 to December 2016). The primary outcome was deep learning system performance-measured with area under the receiver operating characteristic curve (AUC)-in classifying invasive breast cancer or ductal carcinoma in situ (n = 425) versus no malignancy (n = 490) or benign lesions (n = 44). The NYU system, including models without (NYU1) and with (NYU2) heatmaps, was tested in its original form, after training from scratch (without transfer learning), and after retraining with transfer learning. Results The local test set comprised 959 individuals (mean age, 62.5 years ± 8.5 [SD]; all female). The original AUCs for the NYU1 and NYU2 models were 0.83 (95% CI: 0.82, 0.84) and 0.89 (95% CI: 0.88, 0.89), respectively. When NYU1 and NYU2 were applied in their original form to the local test set, the AUCs were 0.76 (95% CI: 0.73, 0.79) and 0.84 (95% CI: 0.82, 0.87), respectively. After local training without transfer learning, the AUCs were 0.66 (95% CI: 0.62, 0.69) and 0.86 (95% CI: 0.84, 0.88). After retraining with transfer learning, the AUCs were 0.82 (95% CI: 0.80, 0.85) and 0.86 (95% CI: 0.84, 0.88). Conclusion A deep learning system developed using a U.S. dataset showed reduced performance when applied "out of the box" to an Australian dataset. Local retraining with transfer learning using available model weights improved model performance. Keywords: Screening Mammography, Convolutional Neural Network (CNN), Deep Learning Algorithms, Breast Cancer Supplemental material is available for this article. 
© RSNA, 2024 See also commentary by Cadrin-Chênevert in this issue.
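The AUC values compared throughout this entry are rank statistics: the AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case (the Mann-Whitney formulation). A minimal illustrative sketch with toy scores (not any study's data or code):

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive case scores
    higher, counting ties as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy example: model scores for cancer cases vs. healthy controls
cancer = [0.9, 0.8, 0.6]
controls = [0.7, 0.4, 0.2]
print(auc_from_scores(cancer, controls))  # 8/9 ≈ 0.889
```

An AUC of 0.5 corresponds to chance-level ranking, and 1.0 to perfect separation of cases from controls, which is why the drop from 0.89 to 0.84 "out of the box" is meaningful.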
Affiliation(s)
- James J. J. Condon, Vincent Trinh, Lauren Oakden-Rayner, Lyle J. Palmer: Australian Institute for Machine Learning and School of Public Health, University of Adelaide, N Terrace, Adelaide, South Australia 5005, Australia
- Kelly A. Hall: School of Public Health, University of Adelaide, Adelaide, South Australia, Australia
- Michelle Reintals, Andrew S. Holmes: BreastScreen SA, Adelaide, South Australia, Australia

12
Bhalla D, Rangarajan K, Chandra T, Banerjee S, Arora C. Reproducibility and Explainability of Deep Learning in Mammography: A Systematic Review of Literature. Indian J Radiol Imaging 2024; 34:469-487. [PMID: 38912238 PMCID: PMC11188703 DOI: 10.1055/s-0043-1775737]
Abstract
Background Although abundant literature is currently available on the use of deep learning for breast cancer detection in mammography, the quality of such literature is widely variable. Purpose To evaluate published literature on breast cancer detection in mammography for reproducibility and to ascertain best practices for model design. Methods The PubMed and Scopus databases were searched to identify records that described the use of deep learning to detect lesions or classify images into cancer or noncancer. A modified Quality Assessment of Diagnostic Accuracy Studies tool (mQUADAS-2) was developed for this review and applied to the included studies. Results of the reported studies (area under the receiver operating characteristic curve [AUC], sensitivity, specificity) were recorded. Results A total of 12,123 records were screened, of which 107 fit the inclusion criteria. Training and test datasets, the key idea behind each model architecture, and results were recorded for these studies. Based on the mQUADAS-2 assessment, 103 studies had a high risk of bias due to nonrepresentative patient selection. Four studies were of adequate quality, of which three trained their own model and one used a commercial network. Ensemble models were used in two of these. Common strategies used for model training included patch classifiers, image classification networks (ResNet in 67%), and object detection networks (RetinaNet in 67%). The highest reported AUC was 0.927 ± 0.008 on a screening dataset, while it reached 0.945 (0.919-0.968) on an enriched subset. Higher values of AUC (0.955) and specificity (98.5%) were reached when combined radiologist and artificial intelligence readings were used than with either alone. None of the studies provided explainability beyond localization accuracy. None of the studies examined the interaction between AI and radiologists in a real-world setting.
Conclusion While deep learning holds much promise in mammography interpretation, evaluation in a reproducible clinical setting and explainable networks are the need of the hour.
Affiliation(s)
- Deeksha Bhalla, Krithika Rangarajan, Tany Chandra: Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India
- Subhashis Banerjee, Chetan Arora: Department of Computer Science and Engineering, Indian Institute of Technology, New Delhi, India

13
Larson DB, Doo FX, Allen B, Mongan J, Flanders AE, Wald C. Proceedings From the 2022 ACR-RSNA Workshop on Safety, Effectiveness, Reliability, and Transparency in AI. J Am Coll Radiol 2024; 21:1119-1129. [PMID: 38354844 DOI: 10.1016/j.jacr.2024.01.024]
Abstract
Despite the surge in artificial intelligence (AI) development for health care applications, particularly for medical imaging applications, there has been limited adoption of such AI tools into clinical practice. During a 1-day workshop in November 2022, co-organized by the ACR and the RSNA, participants outlined experiences and problems with implementing AI in clinical practice, defined the needs of various stakeholders in the AI ecosystem, and elicited potential solutions and strategies related to the safety, effectiveness, reliability, and transparency of AI algorithms. Participants included radiologists from academic and community radiology practices, informatics leaders responsible for AI implementation, regulatory agency employees, and specialty society representatives. The major themes that emerged fell into two categories: (1) AI product development and (2) implementation of AI-based applications in clinical practice. In particular, participants highlighted key aspects of AI product development to include clear clinical task definitions; well-curated data from diverse geographic, economic, and health care settings; standards and mechanisms to monitor model reliability; and transparency regarding model performance, both in controlled and real-world settings. For implementation, participants emphasized the need for strong institutional governance; systematic evaluation, selection, and validation methods conducted by local teams; seamless integration into the clinical workflow; performance monitoring and support by local teams; performance monitoring by external entities; and alignment of incentives through credentialing and reimbursement. Participants predicted that clinical implementation of AI in radiology will continue to be limited until the safety, effectiveness, reliability, and transparency of such tools are more fully addressed.
Affiliation(s)
- David B Larson: Executive Vice Chair, Department of Radiology, Stanford University Medical Center, Stanford, California; Chair, Quality and Safety Commission, ACR; Member, ACR Board of Chancellors
- Florence X Doo: Director of Innovation, University of Maryland Medical Intelligent Imaging (UM2ii) Center, Baltimore, Maryland
- Bibb Allen: Department of Radiology, Grandview Medical Center, Birmingham, Alabama; Chief Medical Officer, ACR Data Science Institute
- John Mongan: Associate Chair for Translational Informatics and Director of the Center for Intelligent Imaging, Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, California
- Adam E Flanders: Vice Chair for Informatics, Department of Radiology, Thomas Jefferson University, Philadelphia, Pennsylvania; Member, RSNA Board of Directors
- Christoph Wald: Chair, Department of Radiology, Lahey Hospital and Medical Center, Boston, Massachusetts; Chair, Informatics Commission, ACR; Member, ACR Board of Chancellors

14
Baum K, Baumann A, Batzel K. Investigating Innovation Diffusion in Gender-Specific Medicine: Insights from Social Network Analysis. Business & Information Systems Engineering 2024; 66:335-355. [DOI: 10.1007/s12599-024-00875-6]
Abstract
The field of healthcare is characterized by constant innovation, with gender-specific medicine emerging as a new subfield that addresses sex and gender disparities in clinical manifestations, outcomes, treatment, and prevention of disease. Despite its importance, the adoption of gender-specific medicine remains understudied, posing potential risks to patient outcomes due to a lack of awareness of the topic. Building on the Innovation Decision Process Theory, this study examines the spread of information about gender-specific medicine in online networks. The study applies social network analysis to a Twitter dataset reflecting online discussions about the topic to gain insights into its adoption by health professionals and patients online. Results show that the network has a community structure with limited information exchange between sub-communities and that mainly medical experts dominate the discussion. The findings suggest that the adoption of gender-specific medicine might be in its early stages, focused on knowledge exchange. Understanding the diffusion of gender-specific medicine among medical professionals and patients may facilitate its adoption and ultimately improve health outcomes.
15
Wu D, Smith D, VanBerlo B, Roshankar A, Lee H, Li B, Ali F, Rahman M, Basmaji J, Tschirhart J, Ford A, VanBerlo B, Durvasula A, Vannelli C, Dave C, Deglint J, Ho J, Chaudhary R, Clausdorff H, Prager R, Millington S, Shah S, Buchanan B, Arntfield R. Improving the Generalizability and Performance of an Ultrasound Deep Learning Model Using Limited Multicenter Data for Lung Sliding Artifact Identification. Diagnostics (Basel) 2024; 14:1081. [PMID: 38893608 PMCID: PMC11172006 DOI: 10.3390/diagnostics14111081]
Abstract
Deep learning (DL) models for medical image classification frequently struggle to generalize to data from outside institutions. Additional clinical data are also rarely collected to comprehensively assess and understand model performance amongst subgroups. Following the development of a single-center model to identify the lung sliding artifact on lung ultrasound (LUS), we pursued a validation strategy using external LUS data. As annotated LUS data are relatively scarce compared with other medical imaging data, we adopted a novel technique to optimize the use of limited external data to improve model generalizability. Externally acquired LUS data from three tertiary care centers, totaling 641 clips from 238 patients, were used to assess the baseline generalizability of our lung sliding model. We then employed our novel Threshold-Aware Accumulative Fine-Tuning (TAAFT) method to fine-tune the baseline model and determine the minimum amount of data required to achieve predefined performance goals. A subgroup analysis was also performed, and Grad-CAM++ explanations were examined. The final model was fine-tuned on one-third of the external dataset to achieve 0.917 sensitivity, 0.817 specificity, and 0.920 area under the receiver operating characteristic curve (AUC) on the external validation dataset, exceeding our predefined performance goals. Subgroup analyses identified the LUS characteristics that most challenged the model's performance. Grad-CAM++ saliency maps highlighted clinically relevant regions on M-mode images. We report a multicenter study that exploits limited available external data to improve the generalizability and performance of our lung sliding model while identifying poorly performing subgroups to inform future iterative improvements. This approach may contribute to efficiencies for DL researchers working with smaller quantities of external validation data.
Affiliation(s)
- Derek Wu, Rushil Chaudhary: Department of Medicine, Western University, London, ON N6A 5C1, Canada
- Delaney Smith, Blake VanBerlo, Hoseok Lee: Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Amir Roshankar, Brian Li, Faraz Ali, Marwan Rahman, Jason Deglint: Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- John Basmaji, Chintan Dave, Ross Prager, Robert Arntfield: Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
- Jared Tschirhart, Ashritha Durvasula, Claire Vannelli: Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Alex Ford: Independent Researcher, London, ON N6A 1L8, Canada
- Bennett VanBerlo: Faculty of Engineering, Western University, London, ON N6A 5C1, Canada
- Jordan Ho: Department of Family Medicine, Western University, London, ON N6A 5C1, Canada
- Hans Clausdorff: Departamento de Medicina de Urgencia, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile
- Scott Millington: Department of Critical Care Medicine, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Samveg Shah: Department of Medicine, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Brian Buchanan: Department of Critical Care, University of Alberta, Edmonton, AB T6G 2R3, Canada

16
Yue Y, Jiang M, Zhang X, Xu J, Ye H, Zhang F, Li Z, Li Y. Mpox-AISM: AI-mediated super monitoring for mpox and like-mpox. iScience 2024; 27:109766. [PMID: 38711448 PMCID: PMC11070687 DOI: 10.1016/j.isci.2024.109766]
Abstract
Swift and accurate diagnosis of earlier-stage monkeypox (mpox) patients is crucial to avoiding its spread. However, the similarities between common skin disorders and mpox, and the need for professional diagnosis, unavoidably impaired the diagnosis of earlier-stage mpox patients and contributed to the mpox outbreak. To address this challenge, we proposed "Super Monitoring", a real-time visualization technique employing artificial intelligence (AI) and Internet technology to diagnose earlier-stage mpox cheaply, conveniently, and quickly. Concretely, AI-mediated "Super Monitoring" (mpox-AISM) integrates deep learning models, data augmentation, self-supervised learning, and cloud services. On publicly accessible datasets, mpox-AISM's precision, recall, specificity, and F1-score in diagnosing mpox reach 99.3%, 94.1%, 99.9%, and 96.6%, respectively, and it achieves 94.51% accuracy in diagnosing mpox, six like-mpox skin disorders, and normal skin. With the Internet and communication terminals, mpox-AISM has the potential to perform real-time and accurate diagnosis of earlier-stage mpox in real-world scenarios, thereby helping to prevent mpox outbreaks.
Affiliation(s)
- Yubiao Yue, Xinyue Zhang, Jialong Xu, Huacong Ye, Yang Li: School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Minghua Jiang, Fan Zhang: Department of Science and Education, Dermatological Department, Foshan Sanshui District People's Hospital, Foshan 528199, China
- Zhenzhang Li: School of Mathematics and Systems Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China; School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China

17
Fogleman BM, Goldman M, Holland AB, Dyess G, Patel A. Charting Tomorrow's Healthcare: A Traditional Literature Review for an Artificial Intelligence-Driven Future. Cureus 2024; 16:e58032. [PMID: 38738104 PMCID: PMC11088287 DOI: 10.7759/cureus.58032]
Abstract
Electronic health record (EHR) systems have developed over time in parallel with general advancements in mainstream technology. As artificially intelligent (AI) systems rapidly impact multiple societal sectors, it has become apparent that medicine is not immune from the influences of this powerful technology. Particularly appealing is how AI may aid in improving healthcare efficiency with note-writing automation. This literature review explores the current state of EHR technologies in healthcare, specifically focusing on possibilities for addressing EHR challenges through the automation of dictation and note-writing processes with AI integration. This review offers a broad understanding of existing capabilities and potential advancements, emphasizing innovations such as voice-to-text dictation, wearable devices, and AI-assisted procedure note dictation. The primary objective is to provide researchers with valuable insights, enabling them to generate new technologies and advancements within the healthcare landscape. By exploring the benefits, challenges, and future of AI integration, this review encourages the development of innovative solutions, with the goal of enhancing patient care and healthcare delivery efficiency.
Affiliation(s)
- Brody M Fogleman: Internal Medicine, Edward Via College of Osteopathic Medicine - Carolinas, Spartanburg, USA
- Matthew Goldman: Neurological Surgery, Houston Methodist Hospital, Houston, USA
- Alexander B Holland: General Surgery, Edward Via College of Osteopathic Medicine - Carolinas, Spartanburg, USA
- Garrett Dyess: Medicine, University of South Alabama College of Medicine, Mobile, USA
- Aashay Patel: Neurological Surgery, University of Florida College of Medicine, Gainesville, USA

18
Wu DY, Vo DT, Seiler SJ. Opinion: Big Data Elements Key to Medical Imaging Machine Learning Tool Development. J Breast Imaging 2024; 6:217-219. [PMID: 38271153 DOI: 10.1093/jbi/wbad102]
Affiliation(s)
- Dolly Y Wu: UT Southwestern Medical Center, Volunteer Services, Dallas, TX, USA
- Dat T Vo: Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX, USA
- Stephen J Seiler: Department of Radiology, UT Southwestern Medical Center, Dallas, TX, USA

19
Africano G, Arponen O, Rinta-Kiikka I, Pertuz S. Transfer learning for the generalization of artificial intelligence in breast cancer detection: a case-control study. Acta Radiol 2024; 65:334-340. [PMID: 38115699 DOI: 10.1177/02841851231218960]
Abstract
BACKGROUND Some researchers have questioned whether artificial intelligence (AI) systems maintain their performance when used for women from populations not considered during the development of the system. PURPOSE To evaluate the impact of transfer learning as a way of improving the generalization of AI systems in the detection of breast cancer. MATERIAL AND METHODS This retrospective Finnish case-control study involved 191 women diagnosed with breast cancer and 191 matched healthy controls. We selected a state-of-the-art AI system for breast cancer detection trained using a large US dataset. The selected baseline system was evaluated in two experimental settings. First, we examined our private Finnish sample as an independent test set that had not been considered in the development of the system (unseen population). Second, the baseline system was retrained to attempt to improve its performance in the unseen population by means of transfer learning. To analyze performance, we used areas under the receiver operating characteristic curve (AUCs) with DeLong's test. RESULTS Two versions of the baseline system were considered: ImageOnly and Heatmaps. The ImageOnly and Heatmaps versions yielded mean AUC values of 0.82±0.008 and 0.88±0.003 in the US dataset, and 0.56 (95% CI=0.50-0.62) and 0.72 (95% CI=0.67-0.77) when evaluated in the unseen population, respectively. The retrained systems achieved AUC values of 0.61 (95% CI=0.55-0.66) and 0.69 (95% CI=0.64-0.75), respectively. There was no statistically significant difference between the baseline and retrained systems. CONCLUSION Transfer learning with a small study sample did not yield a significant improvement in the generalization of the system.
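Confidence intervals like those reported above can be computed in several ways; this study uses DeLong's method, while a simple, commonly used alternative is the percentile bootstrap, resampling cases and controls and recomputing the AUC each time. A hedged sketch with made-up scores (not the study's data, code, or method):

```python
import random

def auc(pos, neg):
    """Mann-Whitney AUC: P(random positive scores above random negative),
    counting ties as half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(pos, neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling the positive
    and negative groups independently with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        p = [rng.choice(pos) for _ in pos]
        n = [rng.choice(neg) for _ in neg]
        stats.append(auc(p, n))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical scores for cancer cases vs. healthy controls
cases = [0.9, 0.8, 0.75, 0.6, 0.55]
controls = [0.7, 0.5, 0.4, 0.35, 0.2]
print(auc(cases, controls))            # point estimate: 0.92
print(bootstrap_auc_ci(cases, controls))
```

With samples this tiny the interval is very wide; DeLong's method gives an analytic variance estimate instead of resampling, which is why it is preferred for formal AUC comparisons.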
Affiliation(s)
- Gerson Africano, Said Pertuz: School of Electrical, Electronics and Telecommunications Engineering, Universidad Industrial de Santander, Bucaramanga, Colombia
- Otso Arponen, Irina Rinta-Kiikka: Department of Radiology, Tampere University Hospital, Tampere, Finland; Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland

20
Wu DY, Fang YV, Vo DT, Spangler A, Seiler SJ. Detailed Image Data Quality and Cleaning Practices for Artificial Intelligence Tools for Breast Cancer. JCO Clin Cancer Inform 2024; 8:e2300074. [PMID: 38552191 PMCID: PMC10994436 DOI: 10.1200/cci.23.00074]
Abstract
Standardizing image-data preparation practices to improve accuracy/consistency of AI diagnostic tools.
Affiliation(s)
- Dolly Y. Wu: Volunteer Services, UT Southwestern Medical Center, Dallas, TX
- Yisheng V. Fang: Department of Pathology, UT Southwestern Medical Center, Dallas, TX
- Dat T. Vo: Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX
- Ann Spangler: Retired, Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX

21
Pertuz S, Ortega D, Suarez É, Cancino W, Africano G, Rinta-Kiikka I, Arponen O, Paris S, Lozano A. Saliency of breast lesions in breast cancer detection using artificial intelligence. Sci Rep 2023; 13:20545. [PMID: 37996504 PMCID: PMC10667547 DOI: 10.1038/s41598-023-46921-3]
Abstract
The analysis of mammograms using artificial intelligence (AI) has shown great potential for assisting breast cancer screening. We use saliency maps to study the role of breast lesions in the decision-making process of AI systems for breast cancer detection in screening mammograms. We retrospectively collected mammograms from 191 women with screen-detected breast cancer and 191 healthy controls matched by age and mammographic system. Two radiologists manually segmented the breast lesions in the mammograms from CC and MLO views. We estimated the detection performance of four deep learning-based AI systems using the area under the ROC curve (AUC) with a 95% confidence interval (CI). We used automatic thresholding on saliency maps from the AI systems to identify the areas of interest on the mammograms. Finally, we measured the overlap between these areas of interest and the segmented breast lesions using Dice's similarity coefficient (DSC). The detection performance of the AI systems ranged from low to moderate (AUCs from 0.525 to 0.694). The overlap between the areas of interest and the breast lesions was low for all the studied methods (median DSC from 4.2% to 38.0%). The AI system with the highest cancer detection performance (AUC = 0.694, CI 0.662-0.726) showed the lowest overlap (DSC = 4.2%) with breast lesions. The areas of interest found by saliency analysis of the AI systems showed poor overlap with breast lesions. These results suggest that AI systems with the highest performance do not solely rely on localized breast lesions for their decision-making in cancer detection; rather, they incorporate information from large image regions. This work contributes to the understanding of the role of breast lesions in cancer detection using AI.
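Dice's similarity coefficient (DSC), used above to quantify overlap between saliency-derived areas of interest and radiologist-segmented lesions, is twice the size of the intersection divided by the sum of the two region sizes. A small illustrative sketch on toy pixel-coordinate sets (not the study's implementation):

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks,
    each given as a set of pixel coordinates: 2|A∩B| / (|A| + |B|)."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks overlap trivially
    intersection = len(mask_a & mask_b)
    return 2.0 * intersection / (len(mask_a) + len(mask_b))

# Toy example: a thresholded saliency region vs. a segmented lesion
saliency = {(0, 0), (0, 1), (1, 0), (1, 1)}
lesion = {(1, 1), (1, 2), (2, 1), (2, 2)}
print(dice_coefficient(saliency, lesion))  # 2*1 / (4+4) = 0.25
```

A DSC of 1.0 means identical regions and 0.0 means no overlap, so the median values of 4.2% to 38.0% reported above indicate that the models' salient regions largely fall outside the annotated lesions.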
Affiliation(s)
- Said Pertuz, David Ortega, Érika Suarez, William Cancino, Gerson Africano: Escuela de Ingenierías Eléctrica, Electrónica y de Telecomunicaciones, Universidad Industrial de Santander, Bucaramanga, Colombia
- Irina Rinta-Kiikka, Otso Arponen: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Department of Radiology, Tampere University Hospital, Tampere, Finland
- Sara Paris, Alfonso Lozano: Departamento de Imágenes Diagnósticas, Universidad Nacional de Colombia, Bogotá, Colombia

22
Nguyen T, Nguyen P, Tran D, Pham H, Nguyen Q, Le T, Van H, Do B, Tran P, Le V, Nguyen T, Tran L, Pham H. Ensemble learning of myocardial displacements for myocardial infarction detection in echocardiography. Front Cardiovasc Med 2023; 10:1185172. [PMID: 37900571 PMCID: PMC10613081 DOI: 10.3389/fcvm.2023.1185172]
Abstract
Background Early detection and localization of myocardial infarction (MI) can reduce the severity of cardiac damage through timely treatment interventions. In recent years, deep learning techniques have shown promise for detecting MI in echocardiographic images. Existing attempts typically formulate this task as classification and rely on a single segmentation model to estimate myocardial segment displacements. However, there has been no examination of how segmentation accuracy affects MI classification performance or the potential benefits of using ensemble learning approaches. Our study investigates this relationship and introduces a robust method that combines features from multiple segmentation models to improve MI classification performance by leveraging ensemble learning. Materials and Methods Our method combines myocardial segment displacement features from multiple segmentation models, which are then input into a typical classifier to estimate the risk of MI. We validated the proposed approach on two datasets: the public HMC-QU dataset (109 echocardiograms) for training and validation, and an E-Hospital dataset (60 echocardiograms) from a local clinical site in Vietnam for independent testing. Model performance was evaluated based on accuracy, sensitivity, and specificity. Results The proposed approach demonstrated excellent performance in detecting MI. It achieved an F1 score of 0.942, corresponding to an accuracy of 91.4%, a sensitivity of 94.1%, and a specificity of 88.3%. The results showed that the proposed approach outperformed the state-of-the-art feature-based method, which had a precision of 85.2%, a specificity of 70.1%, a sensitivity of 85.9%, an accuracy of 85.5%, and an accuracy of 80.2% on the HMC-QU dataset. On the external validation set, the proposed model still performed well, with an F1 score of 0.8, an accuracy of 76.7%, a sensitivity of 77.8%, and a specificity of 75.0%. 
Conclusions Our study demonstrated the ability to accurately predict MI in echocardiograms by combining information from several segmentation models. Further research is necessary to determine its potential use in clinical settings as a tool to assist cardiologists and technicians with objective assessments and reduce dependence on operator subjectivity. Our research codes are available on GitHub at https://github.com/vinuni-vishc/mi-detection-echo.
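The ensemble step described above (displacement features from several segmentation models concatenated and passed to a single downstream classifier) can be sketched as follows. The array shapes, the six-segment layout, and the nearest-centroid classifier are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical displacement features from three different segmentation
# models, for 8 echocardiograms x 6 myocardial segments each.
features_per_model = [rng.normal(size=(8, 6)) for _ in range(3)]

# Ensemble step: concatenate per-model displacement features so the
# classifier sees all of them at once (8 samples x 18 features).
X = np.concatenate(features_per_model, axis=1)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # toy MI / no-MI labels

# Any "typical classifier" can sit on top; a nearest-centroid rule keeps
# the sketch dependency-free.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
accuracy = (pred == y).mean()
```

In practice the concatenated feature matrix would be fed to whichever classifier the pipeline uses; the concatenation itself is the ensemble-learning ingredient.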
Affiliation(s)
- Tuan Nguyen
- VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam
- College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam
- Phi Nguyen
- Institute for Artificial Intelligence, VNU University of Engineering and Technology, Hanoi, Vietnam
- Dai Tran
- Cardiovascular Center, E Hospital, Hanoi, Vietnam
- Hung Pham
- Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam
- Quang Nguyen
- Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam
- Thanh Le
- Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam
- Hanh Van
- Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam
- Bach Do
- Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam
- Phuong Tran
- Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam
- Vinh Le
- Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
- Thuy Nguyen
- Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
- Long Tran
- Institute for Artificial Intelligence, VNU University of Engineering and Technology, Hanoi, Vietnam
- Hieu Pham
- VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam
- College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam
23
Xing X, Liang G, Wang C, Jacobs N, Lin AL. Self-Supervised Learning Application on COVID-19 Chest X-ray Image Classification Using Masked AutoEncoder. Bioengineering (Basel) 2023; 10:901. [PMID: 37627786 PMCID: PMC10451788 DOI: 10.3390/bioengineering10080901]
Abstract
The COVID-19 pandemic has underscored the urgent need for rapid and accurate diagnosis facilitated by artificial intelligence (AI), particularly in computer-aided diagnosis using medical imaging. However, this context presents two notable challenges: high diagnostic accuracy demands and limited availability of medical data for training AI models. To address these issues, we proposed the implementation of a Masked AutoEncoder (MAE), an innovative self-supervised learning approach, for classifying 2D chest X-ray images. Our approach involved performing image reconstruction using a Vision Transformer (ViT) model as the feature encoder, paired with a custom-defined decoder. Additionally, we fine-tuned the pretrained ViT encoder, serving as the backbone, using a labeled medical dataset. To evaluate our approach, we conducted a comparative analysis of three distinct training methods: training from scratch, transfer learning, and MAE-based training, all employing COVID-19 chest X-ray images. The results demonstrate that MAE-based training produces superior performance, achieving an accuracy of 0.985 and an AUC of 0.9957. We explored the influence of the mask ratio on MAE and found that a ratio of 0.4 gave the best performance. Furthermore, we illustrate that MAE is remarkably label-efficient, delivering comparable performance while using only 30% of the original training dataset. Overall, our findings highlight the significant performance enhancement achieved by using MAE, particularly when working with limited datasets. This approach holds profound implications for future disease diagnosis, especially in scenarios where imaging information is scarce.
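The masking step at the core of MAE pre-training can be illustrated in a few lines. This sketch only tokenizes a synthetic image into ViT-style patches and masks a 0.4 fraction of them (the ratio the study found best); the actual encoder/decoder are Transformer networks and are not shown.

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy 224x224 "chest X-ray" split into 16x16 patches (14x14 = 196 patches),
# as a ViT-style tokenizer would do. The image itself is random noise here.
image = rng.random((224, 224))
patch = 16
patches = image.reshape(224 // patch, patch, 224 // patch, patch)
patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)  # (196, 256)

# MAE masks a fixed fraction of patches; only the visible ones reach the encoder.
mask_ratio = 0.4
n_masked = int(len(patches) * mask_ratio)
masked_idx = rng.choice(len(patches), size=n_masked, replace=False)

visible = np.delete(patches, masked_idx, axis=0)  # encoder input
# The decoder would then reconstruct the masked patches from the visible tokens.
```

The self-supervised objective is the reconstruction error on the masked patches, so no labels are needed during this stage.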
Affiliation(s)
- Xin Xing
- Department of Computer Science, University of Kentucky, Lexington, KY 40506, USA
- Department of Radiology, University of Missouri, Columbia, MO 65212, USA
- Gongbo Liang
- Department of Computing and Cyber Security, Texas A&M University-San Antonio, San Antonio, TX 78224, USA
- Chris Wang
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Nathan Jacobs
- Department of Computer Science & Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
- Ai-Ling Lin
- Department of Radiology, University of Missouri, Columbia, MO 65212, USA
- Department of Biological Sciences, University of Missouri, Columbia, MO 65211, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA
24
Rozwag C, Valentini F, Cotten A, Demondion X, Preux P, Jacques T. Elbow trauma in children: development and evaluation of radiological artificial intelligence models. Research in Diagnostic and Interventional Imaging 2023; 6:100029. [PMID: 39077546 PMCID: PMC11265386 DOI: 10.1016/j.redii.2023.100029]
Abstract
Rationale and Objectives To develop a model using artificial intelligence (A.I.) able to detect post-traumatic injuries on pediatric elbow X-rays, then to evaluate its performance in silico and its impact on radiologists' interpretation in clinical practice. Material and Methods A total of 1956 pediatric elbow radiographs performed following a trauma were retrospectively collected from 935 patients aged between 0 and 18 years. Deep convolutional neural networks were trained on these X-rays. The two best models were selected, then evaluated on an external test set involving 120 patients, whose X-rays were performed on different radiological equipment in a different time period. Eight radiologists interpreted this external test set without, then with, the help of the A.I. models. Results Two models stood out: model 1 had an accuracy of 95.8% and an AUROC of 0.983, and model 2 had an accuracy of 90.5% and an AUROC of 0.975. On the external test set, model 1 kept a good accuracy of 82.5% and an AUROC of 0.916, while model 2's accuracy dropped to 69.2% and its AUROC to 0.793. Model 1 significantly improved radiologists' sensitivity (0.82 to 0.88, P = 0.016) and accuracy (0.86 to 0.88, P = 0.047), while model 2 significantly decreased readers' specificity (0.86 to 0.83, P = 0.031). Conclusion End-to-end development of a deep learning model to assess post-traumatic injuries on elbow X-rays in children was feasible, and showed that models with close in-silico metrics can unpredictably lead radiologists to either improve or lower their performance in clinical settings.
Affiliation(s)
- Clémence Rozwag
- Université de Lille, Lille, France
- Centre hospitalier universitaire de Lille, Lille, France
- Franck Valentini
- Université de Lille, Lille, France
- Inria Lille – Nord Europe, équipe Scool, Lille, France
- CNRS UMR 9189 – CRIStAL, Lille, France
- École Centrale de Lille, Lille, France
- Anne Cotten
- Université de Lille, Lille, France
- Centre hospitalier universitaire de Lille, Lille, France
- Xavier Demondion
- Université de Lille, Lille, France
- Centre hospitalier universitaire de Lille, Lille, France
- Philippe Preux
- Université de Lille, Lille, France
- Inria Lille – Nord Europe, équipe Scool, Lille, France
- CNRS UMR 9189 – CRIStAL, Lille, France
- École Centrale de Lille, Lille, France
- Thibaut Jacques
- Université de Lille, Lille, France
- Centre hospitalier universitaire de Lille, Lille, France
25
Shifat-E-Rabbi M, Zhuang Y, Li S, Rubaiyat AHM, Yin X, Rohde GK. Invariance encoding in sliced-Wasserstein space for image classification with limited training data. Pattern Recognition 2023; 137:109268. [PMID: 36713887 PMCID: PMC9879373 DOI: 10.1016/j.patcog.2022.109268]
Abstract
Deep convolutional neural networks (CNNs) are broadly considered to be state-of-the-art generic end-to-end image classification systems. However, they are known to underperform when training data are limited and thus require data augmentation strategies that render the method computationally expensive and not always effective. Rather than using a data augmentation strategy to encode invariances as typically done in machine learning, here we propose to mathematically augment a nearest subspace classification model in sliced-Wasserstein space by exploiting certain mathematical properties of the Radon Cumulative Distribution Transform (R-CDT), a recently introduced image transform. We demonstrate that for a particular type of learning problem, our mathematical solution has advantages over data augmentation with deep CNNs in terms of classification accuracy and computational complexity, and is particularly effective under a limited training data setting. The method is simple, effective, computationally efficient, non-iterative, and requires no parameters to be tuned. Python code implementing our method is available at https://github.com/rohdelab/mathematical_augmentation. Our method is integrated as a part of the software package PyTransKit, which is available at https://github.com/rohdelab/PyTransKit.
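The classifier at the heart of this method is a nearest-subspace rule: each class is represented by a low-dimensional subspace fitted to its (transform-space) training samples, and a test sample is assigned to the class whose subspace reconstructs it best. A minimal NumPy sketch on synthetic data follows; the R-CDT transform itself lives in PyTransKit and is not reproduced here, and the rank-3 subspaces and dimensions are arbitrary illustrative choices.

```python
import numpy as np

# Synthetic stand-in for transform-space data: each class lies near a
# low-rank subspace of a 50-dimensional space.
def make_class(seed, n=20, dim=50, rank=3):
    r = np.random.default_rng(seed)
    basis = r.normal(size=(rank, dim))
    return r.normal(size=(n, rank)) @ basis + 0.01 * r.normal(size=(n, dim))

train = {0: make_class(10), 1: make_class(20)}

# Per class, the top right-singular vectors span the class subspace.
bases = {c: np.linalg.svd(Xc, full_matrices=False)[2][:3]
         for c, Xc in train.items()}

def classify(x):
    # Assign to the class whose subspace reconstructs x with least residual.
    residuals = {c: np.linalg.norm(x - (x @ B.T) @ B) for c, B in bases.items()}
    return min(residuals, key=residuals.get)

predicted = classify(make_class(10, n=1)[0])  # a fresh class-0-style sample
```

Because the subspaces are fixed by one SVD per class, the classifier is non-iterative and has no tunable parameters, matching the abstract's description.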
Affiliation(s)
- Mohammad Shifat-E-Rabbi
- Imaging and Data Science Laboratory, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Yan Zhuang
- Imaging and Data Science Laboratory, University of Virginia, Charlottesville, VA 22908, USA
- Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Shiying Li
- Imaging and Data Science Laboratory, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Abu Hasnat Mohammad Rubaiyat
- Imaging and Data Science Laboratory, University of Virginia, Charlottesville, VA 22908, USA
- Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Xuwang Yin
- Imaging and Data Science Laboratory, University of Virginia, Charlottesville, VA 22908, USA
- Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Gustavo K. Rohde
- Imaging and Data Science Laboratory, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22908, USA
26
Feng Y, Sim Zheng Ting J, Xu X, Bee Kun C, Ong Tien En E, Irawan Tan Wee Jun H, Ting Y, Lei X, Chen WX, Wang Y, Li S, Cui Y, Wang Z, Zhen L, Liu Y, Siow Mong Goh R, Tan CH. Deep Neural Network Augments Performance of Junior Residents in Diagnosing COVID-19 Pneumonia on Chest Radiographs. Diagnostics (Basel) 2023; 13:1397. [PMID: 37189498 DOI: 10.3390/diagnostics13081397]
Abstract
Chest X-rays (CXRs) are essential in the preliminary radiographic assessment of patients affected by COVID-19. Junior residents, as the first point of contact in the diagnostic process, are expected to interpret these CXRs accurately. We aimed to assess the effectiveness of a deep neural network in distinguishing COVID-19 from other types of pneumonia, and to determine its potential contribution to improving the diagnostic precision of less experienced residents. A total of 5051 CXRs were utilized to develop and assess an artificial intelligence (AI) model capable of performing three-class classification, namely non-pneumonia, non-COVID-19 pneumonia, and COVID-19 pneumonia. Additionally, an external dataset comprising 500 distinct CXRs was examined by three junior residents with differing levels of training. The CXRs were evaluated both with and without AI assistance. The AI model demonstrated strong performance, with an area under the ROC curve (AUC) of 0.9518 on the internal test set and 0.8594 on the external test set, improving on the AUC of the current state-of-the-art algorithms by 1.25% and 4.26%, respectively. When assisted by the AI model, the performance of the junior residents improved in a manner that was inversely proportional to their level of training. Among the three junior residents, two showed significant improvement with the assistance of AI. This research highlights the novel development of an AI model for three-class CXR classification and its potential to augment junior residents' diagnostic accuracy, with validation on external data to demonstrate real-world applicability. In practical use, the AI model effectively supported junior residents in interpreting CXRs, boosting their confidence in diagnosis. While the AI model improved junior residents' performance, a decline in performance was observed on the external test set compared to the internal test set. This suggests a domain shift between the patient dataset and the external dataset, highlighting the need for future research on test-time training domain adaptation to address this issue.
Affiliation(s)
- Yangqin Feng
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Jordan Sim Zheng Ting
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Xinxing Xu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Chew Bee Kun
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Edward Ong Tien En
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Hendra Irawan Tan Wee Jun
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Yonghan Ting
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Xiaofeng Lei
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Wen-Xiang Chen
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Yan Wang
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Shaohua Li
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Yingnan Cui
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Zizhou Wang
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Liangli Zhen
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Yong Liu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Rick Siow Mong Goh
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Cher Heng Tan
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 11, Jalan Tan Tock Seng, Singapore 308433, Singapore
- Lee Kong Chian School of Medicine, 11, Mandalay Road, Singapore 308232, Singapore
27
Cai J, Guo L, Zhu L, Xia L, Qian L, Lure YMF, Yin X. Impact of localized fine tuning in the performance of segmentation and classification of lung nodules from computed tomography scans using deep learning. Front Oncol 2023; 13:1140635. [PMID: 37056345 PMCID: PMC10088514 DOI: 10.3389/fonc.2023.1140635]
Abstract
Background Algorithm malfunction may occur when there is a performance mismatch between the dataset with which an algorithm was developed and the dataset on which it is deployed. Methods A baseline segmentation algorithm and a baseline classification algorithm were developed using the public Lung Image Database Consortium dataset to detect benign and malignant nodules, and two additional external datasets (i.e., HB and XZ), including 542 cases and 486 cases respectively, were used for independent validation of these two algorithms. To explore the impact of localized fine-tuning on the individual segmentation and classification processes, the baseline algorithms were fine-tuned with CT scans of the HB and XZ datasets, respectively, and the performance of the fine-tuned algorithms was tested and compared with the baseline algorithms. Results The proposed baseline algorithms for both segmentation and classification experienced a performance drop when directly deployed on the external HB and XZ datasets. Compared with the baseline validation results in nodule segmentation, the fine-tuned segmentation algorithm obtained better performance in Dice coefficient, Intersection over Union, and Average Surface Distance in the HB dataset (0.593 vs. 0.444; 0.450 vs. 0.348; 0.283 vs. 0.304) and the XZ dataset (0.601 vs. 0.486; 0.482 vs. 0.378; 0.225 vs. 0.358). Similarly, compared with the baseline validation results in benign and malignant nodule classification, the fine-tuned classification algorithm had improved area under the receiver operating characteristic curve, accuracy, and F1 score in the HB dataset (0.851 vs. 0.812; 0.813 vs. 0.769; 0.852 vs. 0.822) and the XZ dataset (0.724 vs. 0.668; 0.696 vs. 0.617; 0.737 vs. 0.668). Conclusions The localized fine-tuned algorithms outperformed the baseline algorithms in external validation for both segmentation and classification, which shows that localized fine-tuning may be an effective way to enable a baseline algorithm to generalize to site-specific use.
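The general pattern of localized fine-tuning (freeze the pretrained feature extractor, update only a small task head on site-specific data) can be sketched without any deep learning framework. Here a fixed random projection stands in for the frozen backbone, and the site data, sizes, and learning rate are all invented for illustration; the paper's actual algorithms are CT segmentation/classification networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random projection standing in for the
# pretrained feature extractor. It is never updated below.
W_backbone = rng.normal(size=(32, 8))

def features(x):
    # Scaled tanh keeps activations in a trainable range.
    return np.tanh(x @ W_backbone / np.sqrt(32))

# Hypothetical site-specific data standing in for an external hospital set.
X_site = rng.normal(size=(200, 32))
y_site = (X_site[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)

# Localized fine-tuning: gradient descent on the classification head only.
w, b = np.zeros(8), 0.0
F = features(X_site)
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))   # sigmoid head
    grad = p - y_site                    # logistic-loss gradient
    w -= 0.5 * F.T @ grad / len(y_site)
    b -= 0.5 * grad.mean()

site_accuracy = ((F @ w + b > 0) == (y_site == 1)).mean()
```

Only `w` and `b` change during tuning, which is why a modest amount of site data can suffice: the number of trainable parameters is tiny compared with the full network.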
Affiliation(s)
- Jingwei Cai
- Radiology Department, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Clinical Medical College, Hebei University, Baoding, Hebei, China
- Lin Guo
- Shenzhen Zhiying Medical Imaging, Shenzhen, Guangdong, China
- Litong Zhu
- Department of Medicine, Queen Mary Hospital, University of Hong Kong, Hong Kong SAR, China
- Li Xia
- Shenzhen Zhiying Medical Imaging, Shenzhen, Guangdong, China
- Lingjun Qian
- Shenzhen Zhiying Medical Imaging, Shenzhen, Guangdong, China
- Xiaoping Yin
- Radiology Department, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Correspondence: Xiaoping Yin
28
Beheshtian E, Putman K, Santomartino SM, Parekh VS, Yi PH. Generalizability and Bias in a Deep Learning Pediatric Bone Age Prediction Model Using Hand Radiographs. Radiology 2023; 306:e220505. [PMID: 36165796 DOI: 10.1148/radiol.220505]
Abstract
Background Although deep learning (DL) models have demonstrated expert-level ability for pediatric bone age prediction, they have shown poor generalizability and bias in other use cases. Purpose To quantify generalizability and bias in a bone age DL model, measured by performance on external versus internal test sets and by performance differences between demographic groups, respectively. Materials and Methods The winning DL model of the 2017 RSNA Pediatric Bone Age Challenge, trained on 12 611 pediatric hand radiographs from two U.S. hospitals, was retrospectively evaluated. The DL model was tested from September 2021 to December 2021 on an internal validation set and an external test set of pediatric hand radiographs with diverse demographic representation. Images reporting ground-truth bone age were included for study. Mean absolute difference (MAD) between ground-truth bone age and the model-predicted bone age was calculated for each set. Generalizability was evaluated by comparing MAD between internal and external evaluation sets with use of t tests. Bias was evaluated by comparing MAD and clinically significant error rate (rate of errors changing the clinical diagnosis) between demographic groups with use of t tests or analysis of variance and χ2 tests, respectively (statistically significant difference defined as P < .05). Results The internal validation set had images from 1425 individuals (773 boys), and the external test set had images from 1202 individuals (mean age, 133 months ± 60 [SD]; 614 boys). The bone age model generalized well to the external test set, with no difference in MAD (6.8 months in the validation set vs 6.9 months in the external set; P = .64). Model predictions would have led to clinically significant errors in 194 of 1202 images (16%) in the external test set. The MAD was greater for girls than boys in the internal validation set (P = .01) and in the subcategories of age and Tanner stage in the external test set (P < .001 for both). Conclusion A deep learning (DL) bone age model generalized well to an external test set, although clinically significant sex-, age-, and sexual maturity-based biases in DL bone age prediction were identified. © RSNA, 2022. Online supplemental material is available for this article. See also the editorial by Larson in this issue.
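The two evaluation metrics in this study are easy to state concretely. The sketch below computes MAD and a clinically significant error rate on synthetic bone ages; the group labels, error magnitudes, and the 12-month threshold are all invented for illustration, not values from the paper.

```python
import numpy as np

# Hypothetical ground-truth and predicted bone ages (months) for two
# demographic groups; all numbers are synthetic stand-ins.
rng = np.random.default_rng(3)
truth_a = rng.uniform(60, 200, size=500)
truth_b = rng.uniform(60, 200, size=500)
pred_a = truth_a + rng.normal(0, 8, size=500)    # smaller model error
pred_b = truth_b + rng.normal(0, 10, size=500)   # larger model error

# Mean absolute difference (MAD), the generalizability metric.
mad_a = np.abs(pred_a - truth_a).mean()
mad_b = np.abs(pred_b - truth_b).mean()

# Clinically significant error rate: share of predictions whose error is
# large enough to change the diagnosis (threshold assumed here).
threshold = 12.0
cse_rate_b = (np.abs(pred_b - truth_b) > threshold).mean()
```

Group-wise comparison of these quantities (via t tests or χ2 tests, as in the study) is what surfaces demographic bias.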
Affiliation(s)
- Elham Beheshtian
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
- Kristin Putman
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
- Samantha M Santomartino
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
- Vishwa S Parekh
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
- Paul H Yi
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
29
Jiménez-Sánchez A, Tardy M, González Ballester MA, Mateus D, Piella G. Memory-aware curriculum federated learning for breast cancer classification. Computer Methods and Programs in Biomedicine 2023; 229:107318. [PMID: 36592580 DOI: 10.1016/j.cmpb.2022.107318]
Abstract
BACKGROUND AND OBJECTIVE For early breast cancer detection, regular screening with mammography imaging is recommended. Routine examinations result in datasets with a predominant amount of negative samples. The limited representativeness of positive cases can be problematic for learning Computer-Aided Diagnosis (CAD) systems. Collecting data from multiple institutions is a potential solution to mitigate this problem. Recently, federated learning has emerged as an effective tool for collaborative learning. In this setting, local models perform computation on their private data to update the global model. The order and the frequency of local updates influence the final global model. In the context of federated adversarial learning to improve multi-site breast cancer classification, we investigate the role of the order in which samples are locally presented to the optimizers. METHODS We define a novel memory-aware curriculum learning method for the federated setting. We aim to improve the consistency of the local models by penalizing inconsistent predictions, i.e., forgotten samples. Our curriculum controls the order of the training samples, prioritizing those that are forgotten after the deployment of the global model. Our approach is combined with unsupervised domain adaptation to deal with domain shift while preserving data privacy. RESULTS Two classification metrics, the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC), were used to evaluate the performance of the proposed method. Our method was evaluated with three clinical datasets from different vendors. An ablation study showed the improvement contributed by each component of our method. The ROC-AUC and PR-AUC improved on average by 5% and 6%, respectively, compared to the conventional federated setting. CONCLUSIONS We demonstrated the benefits of curriculum learning for the first time in a federated setting. Our results verified the effectiveness of memory-aware curriculum federated learning for multi-site breast cancer classification. Our code is publicly available at: https://github.com/ameliajimenez/curriculum-federated-learning.
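The curriculum's core bookkeeping, tracking which samples flip from correct to wrong after a global-model deployment and moving them to the front of the local training order, can be sketched in plain Python. The sample IDs and prediction histories below are invented for illustration; the real method applies this ordering inside federated mammography training.

```python
# Per-sample correctness history across training rounds (invented data):
# a "forget event" is a correct -> wrong transition after a global update.
history = {
    "a": [True, True, True],
    "b": [True, False, False],   # forgotten once
    "c": [False, False, True],
    "d": [True, False, True],    # forgotten once, then re-learned
    "e": [True, True, False],    # forgotten in the latest round
}

def forget_events(seq):
    # Count correct -> wrong transitions across consecutive rounds.
    return sum(1 for prev, cur in zip(seq, seq[1:]) if prev and not cur)

# Curriculum: most-forgotten samples first; ties keep insertion order
# because Python's sort is stable.
order = sorted(history, key=lambda s: -forget_events(history[s]))
```

Presenting forgotten samples earlier penalizes inconsistency between the local model and the deployed global model, which is the consistency objective the abstract describes.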
Affiliation(s)
- Amelia Jiménez-Sánchez
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain; IT University of Copenhagen, Copenhagen, Denmark.
- Mickael Tardy
- École Centrale Nantes, LS2N, UMR 6004, Nantes, France; Hera-MI SAS, Nantes, France
- Miguel A González Ballester
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain; ICREA, Barcelona, Spain
- Diana Mateus
- École Centrale Nantes, LS2N, UMR 6004, Nantes, France
- Gemma Piella
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
30
Castro E, Costa Pereira J, Cardoso JS. Symmetry-based regularization in deep breast cancer screening. Med Image Anal 2023; 83:102690. [PMID: 36446314 DOI: 10.1016/j.media.2022.102690]
Abstract
Breast cancer is the most common and lethal form of cancer in women. Recent efforts have focused on developing accurate neural network-based computer-aided diagnosis systems for screening to help anticipate this disease. The ultimate goal is to reduce mortality and improve quality of life after treatment. Due to the difficulty in collecting and annotating data in this domain, data scarcity is - and will continue to be - a limiting factor. In this work, we present a unified view of different regularization methods that incorporate domain-known symmetries in the model. Three general strategies were followed: (i) data augmentation, (ii) invariance promotion in the loss function, and (iii) the use of equivariant architectures. Each of these strategies encodes different priors on the functions learned by the model and can be readily introduced in most settings. Empirically we show that the proposed symmetry-based regularization procedures improve generalization to unseen examples. This advantage is verified in different scenarios, datasets and model architectures. We hope that both the principle of symmetry-based regularization and the concrete methods presented can guide development towards more data-efficient methods for breast cancer screening as well as other medical imaging domains.
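Of the three strategies listed, strategy (ii), promoting invariance in the loss function, is the easiest to show in miniature: add a penalty that charges the model for scoring an image and its mirrored copy differently. The linear "model", the 8x8 image, and the weights below are illustrative stand-ins, not the paper's architectures.

```python
import numpy as np

rng = np.random.default_rng(7)

def score(img, w):
    # Toy linear "model": flatten the image and take a dot product.
    return float(img.reshape(-1) @ w)

def flip_penalty(img, w):
    # Invariance-promoting term: squared difference between the score of
    # an image and the score of its horizontal mirror.
    return (score(img, w) - score(np.fliplr(img), w)) ** 2

img = rng.random((8, 8))
W = rng.normal(size=64)
penalty = flip_penalty(img, W)           # generic weights get penalized

# A mirror-symmetric weight map is exactly flip-invariant, so the
# penalty vanishes for it.
W_map = W.reshape(8, 8)
W_sym = ((W_map + np.fliplr(W_map)) / 2).reshape(-1)
penalty_sym = flip_penalty(img, W_sym)
```

Strategy (i) would instead feed the flipped image as an extra training sample, and strategy (iii) would build the symmetry into the architecture so the penalty is zero by construction.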
Affiliation(s)
- Eduardo Castro
- INESC TEC, Campus da Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal; Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal.
- Jose Costa Pereira
- INESC TEC, Campus da Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal; Huawei Technologies R&D, Noah's Ark Lab, Gridiron building, 1 Pancras Square, 5th floor, London N1C 4AG, United Kingdom
- Jaime S Cardoso
- INESC TEC, Campus da Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal; Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
31
Walsh R, Tardy M. A Comparison of Techniques for Class Imbalance in Deep Learning Classification of Breast Cancer. Diagnostics (Basel) 2022; 13:67. [PMID: 36611358 PMCID: PMC9818528 DOI: 10.3390/diagnostics13010067]
Abstract
Tools based on deep learning models have been created in recent years to aid radiologists in the diagnosis of breast cancer from mammograms. However, the datasets used to train these models may suffer from class imbalance, i.e., there are often fewer malignant samples than benign or healthy cases, which can bias the model towards the healthy class. In this study, we systematically evaluate several popular techniques to deal with this class imbalance, namely, class weighting, over-sampling, and under-sampling, as well as a synthetic lesion generation approach to increase the number of malignant samples. These techniques are applied when training on three diverse Full-Field Digital Mammography datasets, and tested on in-distribution and out-of-distribution samples. The experiments show that a greater imbalance is associated with a greater bias towards the majority class, which can be counteracted by any of the standard class imbalance techniques. On the other hand, these methods provide no benefit to model performance with respect to the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and indeed under-sampling leads to a reduction of 0.066 in AUC in the case of a 19:1 benign-to-malignant imbalance. Our synthetic lesion methodology leads to better performance in most cases, with increases of up to 0.07 in AUC on out-of-distribution test sets over the next best experiment.
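Two of the standard imbalance techniques compared above, class weighting and over-sampling, can be sketched concretely. The 19:1 ratio mirrors the one discussed in the abstract, but the labels are synthetic and the inverse-frequency weighting formula is one common convention, not necessarily the authors' exact choice.

```python
import numpy as np

# Toy label set with a 19:1 benign-to-malignant imbalance (synthetic).
y = np.array([0] * 190 + [1] * 10)

# Class weighting: inverse-frequency weights, n_samples / (n_classes * count),
# so the rare malignant class contributes more to the loss per sample.
classes, counts = np.unique(y, return_counts=True)
weights = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}

# Over-sampling alternative: repeat minority-class indices until the two
# classes are balanced in the resampled index list.
minority = np.flatnonzero(y == 1)
oversampled = np.concatenate([np.flatnonzero(y == 0),
                              np.tile(minority, counts[0] // counts[1])])
```

Under-sampling would instead discard majority-class indices down to the minority count, which throws away data and is consistent with the AUC drop the study reports for that technique.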
Affiliation(s)
- Ricky Walsh
- ISTIC, Campus Beaulieu, Université de Rennes 1, 35700 Rennes, France
- Hera-MI SAS, 44800 Saint-Herblain, France
- Mickael Tardy
- Hera-MI SAS, 44800 Saint-Herblain, France
- Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, 44000 Nantes, France

32
Stember JN, Shalu H. Reinforcement learning using Deep Q networks and Q learning accurately localizes brain tumors on MRI with very small training sets. BMC Med Imaging 2022; 22:224. [PMID: 36564724 PMCID: PMC9784281 DOI: 10.1186/s12880-022-00919-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2021] [Accepted: 10/22/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Supervised deep learning in radiology suffers from notorious inherent limitations: (1) it requires large, hand-annotated data sets; (2) it is non-generalizable; and (3) it lacks explainability and intuition. It has recently been proposed that reinforcement learning addresses all three of these limitations. Notable prior work applied deep reinforcement learning to localize brain tumors with radiologist eye tracking points, which limits the state-action space. Here, we generalize Deep Q Learning to a gridworld-based environment so that only the images and image masks are required. METHODS We trained a Deep Q network on 30 two-dimensional image slices from the BraTS brain tumor database. Each image contained one lesion. We then tested the trained Deep Q network on a separate set of 30 testing set images. For comparison, we also trained and tested a keypoint detection supervised deep learning network on the same set of training/testing images. RESULTS Whereas the supervised approach quickly overfit the training data and predictably performed poorly on the testing set (11% accuracy), the Deep Q learning approach showed progressively improved generalizability to the testing set over training time, reaching 70% accuracy. CONCLUSION We have successfully applied reinforcement learning to localize brain tumors on 2D contrast-enhanced MRI brain images. This represents a generalization of recent work to a gridworld setting naturally suitable for analyzing medical images. We have shown that reinforcement learning does not overfit small training sets and can generalize to a separate testing set.
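The gridworld formulation described here can be illustrated with a tabular Q-learning toy. The paper itself uses a Deep Q network on image slices; this sketch substitutes a lookup table and a reward of 1 at a fixed "lesion" cell, and all names and hyperparameters are illustrative assumptions, not the authors' setup:

```python
import random

def train_q_gridworld(size=4, target=(3, 3), episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a size x size grid; reward 1 for reaching `target`, else 0."""
    rng = random.Random(seed)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    q = {(r, c): [0.0, 0.0, 0.0, 0.0] for r in range(size) for c in range(size)}

    def step(s, a):
        ns = (min(max(s[0] + moves[a][0], 0), size - 1),
              min(max(s[1] + moves[a][1], 0), size - 1))
        return ns, (1.0 if ns == target else 0.0), ns == target

    for _ in range(episodes):
        s = (0, 0)
        for _ in range(4 * size * size):  # cap episode length
            if rng.random() < eps:        # explore
                a = rng.randrange(4)
            else:                         # exploit, breaking ties randomly
                best = max(q[s])
                a = rng.choice([i for i in range(4) if q[s][i] == best])
            ns, reward, done = step(s, a)
            q[s][a] += alpha * (reward + gamma * max(q[ns]) - q[s][a])
            s = ns
            if done:
                break
    return q

def greedy_path(q, size=4, target=(3, 3)):
    """Follow the learned greedy policy from (0, 0) and return the visited cells."""
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    s, path = (0, 0), [(0, 0)]
    for _ in range(4 * size):
        a = max(range(4), key=lambda i: q[s][i])
        s = (min(max(s[0] + moves[a][0], 0), size - 1),
             min(max(s[1] + moves[a][1], 0), size - 1))
        path.append(s)
        if s == target:
            break
    return path
```

After training, the greedy policy walks from the corner to the target cell; the same loop structure is what a Deep Q variant replaces the `q` table with a network over image patches.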
Affiliation(s)
- J. N. Stember
- Department of Radiology, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, Box 29, New York, NY 10065 USA
- H. Shalu
- Department of Aerospace Engineering, Indian Institute of Technology Madras, Chennai, 600 036 India

33
Zhang A, Xing L, Zou J, Wu JC. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng 2022; 6:1330-1345. [PMID: 35788685 PMCID: PMC12063568 DOI: 10.1038/s41551-022-00898-y] [Citation(s) in RCA: 123] [Impact Index Per Article: 41.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2021] [Accepted: 05/03/2022] [Indexed: 01/14/2023]
Abstract
In the past decade, the application of machine learning (ML) to healthcare has helped drive the automation of physician tasks as well as enhancements in clinical capabilities and access to care. This progress has emphasized that, from model development to model deployment, data play central roles. In this Review, we provide a data-centric view of the innovations and challenges that are defining ML for healthcare. We discuss deep generative models and federated learning as strategies to augment datasets for improved model performance, as well as the use of the more recent transformer models for handling larger datasets and enhancing the modelling of clinical text. We also discuss data-focused problems in the deployment of ML, emphasizing the need to efficiently deliver data to ML models for timely clinical predictions and to account for natural data shifts that can deteriorate model performance.
Affiliation(s)
- Angela Zhang
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA.
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA.
- Greenstone Biosciences, Palo Alto, CA, USA.
- Department of Computer Science, Stanford University, Stanford, CA, USA.
- Lei Xing
- Department of Radiation Oncology, School of Medicine, Stanford University, Stanford, CA, USA
- James Zou
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Biomedical Informatics, School of Medicine, Stanford University, Stanford, CA, USA
- Joseph C Wu
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA.
- Greenstone Biosciences, Palo Alto, CA, USA.
- Departments of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA.
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA.

34
Miao S, Jia H, Cheng K, Hu X, Li J, Huang W, Wang R. Deep learning radiomics under multimodality explore association between muscle/fat and metastasis and survival in breast cancer patients. Brief Bioinform 2022; 23:6748489. [PMID: 36198668 DOI: 10.1093/bib/bbac432] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Revised: 09/03/2022] [Accepted: 09/06/2022] [Indexed: 12/14/2022] Open
Abstract
Sarcopenia is correlated with poor clinical outcomes in breast cancer (BC) patients. However, there is no precise quantitative study on the correlation between body composition changes and BC metastasis and survival. The present study proposed a deep learning radiomics (DLR) approach to investigate the effects of muscle and fat on distant metastasis and death outcomes in BC patients. Image feature extraction was performed on 4th thoracic vertebra (T4) and 11th thoracic vertebra (T11) on computed tomography (CT) image levels by DLR, and image features were combined with clinical information to predict distant metastasis in BC patients. Clinical information combined with DLR significantly predicted distant metastasis in BC patients. In the test cohort, the area under the curve of model performance on clinical information combined with DLR was 0.960 (95% CI: 0.942-0.979, P < 0.001). The patients with distant metastases had a lower pectoral muscle index in T4 (PMI/T4) than in patients without metastases. PMI/T4 and visceral fat tissue area in T11 (VFA/T11) were independent prognostic factors for the overall survival in BC patients. The pectoralis muscle area in T4 (PMA/T4) and PMI/T4 is an independent prognostic factor for distant metastasis-free survival in BC patients. The current study further confirmed that muscle/fat of T4 and T11 levels have a significant effect on the distant metastasis of BC. Appending the network features of T4 and T11 to the model significantly enhances the prediction performance of distant metastasis of BC, providing a valuable biomarker for the early treatment of BC patients.
Affiliation(s)
- Shidi Miao
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Haobo Jia
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Ke Cheng
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaohui Hu
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Jing Li
- Department of Geriatrics, the Second Affiliated Hospital, Harbin Medical University, Harbin, China
- Wenjuan Huang
- Department of Internal Medicine, Harbin Medical University Cancer Hospital, Harbin Medical University, Harbin, China
- Ruitao Wang
- Department of Internal Medicine, Harbin Medical University Cancer Hospital, Harbin Medical University, Harbin, China

35
Retrospective analysis and prospective validation of an AI-based software for intracranial haemorrhage detection at a high-volume trauma centre. Sci Rep 2022; 12:19885. [PMID: 36400834 PMCID: PMC9674833 DOI: 10.1038/s41598-022-24504-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2022] [Accepted: 11/16/2022] [Indexed: 11/19/2022] Open
Abstract
Rapid detection of intracranial haemorrhage (ICH) is crucial for assessing patients with neurological symptoms. Prioritising these urgent scans for reporting presents a challenge for radiologists. Artificial intelligence (AI) offers a solution to enable radiologists to triage urgent scans and reduce reporting errors. This study aims to evaluate the accuracy of an ICH-detection AI software and whether it benefits a high-volume trauma centre in terms of triage and reducing diagnostic errors. A peer review of head CT scans performed prior to the implementation of the AI was conducted to identify the department's current miss-rate. Once implemented, the AI software was validated using CT scans performed over one month, and was reviewed by a neuroradiologist. The turn-around-time (TAT) was calculated as the time taken from scan completion to report finalisation. 2916 head CT scans and reports were reviewed as part of the audit. The AI software flagged 20 cases that were negative-by-report. Two of these were true-misses that had no follow-up imaging. Both patients were followed up and exhibited no long-term neurological sequelae. For ICH-positive scans, there was an increase in TAT in the total sample (35.6%), and a statistically non-significant decrease in TAT in the emergency (-5.1%) and outpatient (-14.2%) cohorts. The AI software was tested on a sample of real-world data from a high-volume Australian centre. The diagnostic accuracy was comparable to that reported in literature. The study demonstrated the institution's low miss-rate and short reporting time, therefore any improvements from the use of AI would be marginal and challenging to measure.
36
Kelly BS, Judge C, Bollard SM, Clifford SM, Healy GM, Aziz A, Mathur P, Islam S, Yeom KW, Lawlor A, Killeen RP. Radiology artificial intelligence: a systematic review and evaluation of methods (RAISE). Eur Radiol 2022; 32:7998-8007. [PMID: 35420305 PMCID: PMC9668941 DOI: 10.1007/s00330-022-08784-6] [Citation(s) in RCA: 88] [Impact Index Per Article: 29.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2021] [Revised: 03/17/2022] [Accepted: 03/26/2022] [Indexed: 01/07/2023]
Abstract
OBJECTIVE There has been a large amount of research in the field of artificial intelligence (AI) as applied to clinical radiology. However, these studies vary in design and quality, and systematic reviews of the entire field are lacking. This systematic review aimed to identify all papers that used deep learning in radiology, to survey the literature and to evaluate their methods. We aimed to identify the key questions being addressed in the literature and to identify the most effective methods employed. METHODS We followed the PRISMA guidelines and performed a systematic review of studies of AI in radiology published from 2015 to 2019. Our published protocol was prospectively registered. RESULTS Our search yielded 11,083 results. Seven hundred sixty-seven full texts were reviewed, and 535 articles were included. Ninety-eight percent were retrospective cohort studies. The median number of patients included was 460. Most studies involved MRI (37%). Neuroradiology was the most common subspecialty. Eighty-eight percent used supervised learning. The majority of studies undertook a segmentation task (39%). Performance comparison was with a state-of-the-art model in 37%. The most used established architecture was UNet (14%). The median performance for the most utilised evaluation metrics was a Dice of 0.89 (range 0.49-0.99), an AUC of 0.903 (range 0.61-1.00) and an accuracy of 89.4 (range 70.2-100). Of the 77 studies that externally validated their results and allowed for direct comparison, performance on average decreased by 6% at external validation (range: 4% increase to 44% decrease). CONCLUSION This systematic review has surveyed the major advances in AI as applied to clinical radiology. KEY POINTS
• While there are many papers reporting expert-level results by using deep learning in radiology, most apply only a narrow range of techniques to a narrow selection of use cases.
• The literature is dominated by retrospective cohort studies with limited external validation and high potential for bias.
• The recent advent of AI extensions to systematic reporting guidelines and prospective trial registration, along with a focus on external validation and explanations, shows potential for translating the hype surrounding AI from code to clinic.
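AUC, the external-validation metric highlighted in these key points, reduces to the Mann-Whitney statistic: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. A minimal, generic sketch (not code from the reviewed studies):

```python
def auc(labels, scores):
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four positive-negative pairs are ranked correctly.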
Affiliation(s)
- Brendan S Kelly
- St Vincent's University Hospital, Dublin, Ireland.
- Insight Centre for Data Analytics, UCD, Dublin, Ireland.
- Wellcome Trust - HRB, Irish Clinical Academic Training, Dublin, Ireland.
- School of Medicine, University College Dublin, Dublin, Ireland.
- HRB-Clinical Research Facility, NUI Galway, Galway, Ireland.
- Conor Judge
- Wellcome Trust - HRB, Irish Clinical Academic Training, Dublin, Ireland
- Lucille Packard Children's Hospital at Stanford, Stanford, CA, USA
- Stephanie M Bollard
- Wellcome Trust - HRB, Irish Clinical Academic Training, Dublin, Ireland
- School of Medicine, University College Dublin, Dublin, Ireland
- Awsam Aziz
- School of Medicine, University College Dublin, Dublin, Ireland
- Shah Islam
- Division of Brain Sciences, Imperial College London, GN1 Commonwealth Building, Hammersmith Hospital, Du Cane Road, London, W12 0HS, UK
- Kristen W Yeom
- HRB-Clinical Research Facility, NUI Galway, Galway, Ireland
- Ronan P Killeen
- St Vincent's University Hospital, Dublin, Ireland
- School of Medicine, University College Dublin, Dublin, Ireland

37
Bahl M. Artificial Intelligence in Clinical Practice: Implementation Considerations and Barriers. JOURNAL OF BREAST IMAGING 2022; 4:632-639. [PMID: 36530476 PMCID: PMC9741727 DOI: 10.1093/jbi/wbac065] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2022] [Indexed: 09/06/2023]
Abstract
The rapid growth of artificial intelligence (AI) in radiology has led to Food and Drug Administration clearance of more than 20 AI algorithms for breast imaging. The steps involved in the clinical implementation of an AI product include identifying all stakeholders, selecting the appropriate product to purchase, evaluating it with a local data set, integrating it into the workflow, and monitoring its performance over time. Despite the potential benefits of improved quality and increased efficiency with AI, several barriers, such as high costs and liability concerns, may limit its widespread implementation. This article lists currently available AI products for breast imaging, describes the key elements of clinical implementation, and discusses barriers to clinical implementation.
Affiliation(s)
- Manisha Bahl
- Massachusetts General Hospital, Department of Radiology, Boston, MA, USA

38
Liu L, Zhang P, Liang G, Xiong S, Wang J, Zheng G. A spatiotemporal correlation deep learning network for brain penumbra disease. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
39
Garrucho L, Kushibar K, Jouide S, Diaz O, Igual L, Lekadir K. Domain generalization in deep learning based mass detection in mammography: A large-scale multi-center study. Artif Intell Med 2022; 132:102386. [PMID: 36207090 DOI: 10.1016/j.artmed.2022.102386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2022] [Revised: 08/07/2022] [Accepted: 08/19/2022] [Indexed: 11/02/2022]
Abstract
Computer-aided detection systems based on deep learning have shown great potential in breast cancer detection. However, the lack of domain generalization of artificial neural networks is an important obstacle to their deployment in changing clinical environments. In this study, we explored the domain generalization of deep learning methods for mass detection in digital mammography and analyzed in-depth the sources of domain shift in a large-scale multi-center setting. To this end, we compared the performance of eight state-of-the-art detection methods, including Transformer based models, trained in a single domain and tested in five unseen domains. Moreover, a single-source mass detection training pipeline was designed to improve the domain generalization without requiring images from the new domain. The results show that our workflow generalized better than state-of-the-art transfer learning based approaches in four out of five domains while reducing the domain shift caused by the different acquisition protocols and scanner manufacturers. Subsequently, an extensive analysis was performed to identify the covariate shifts with the greatest effects on detection performance, such as those due to differences in patient age, breast density, mass size, and mass malignancy. Ultimately, this comprehensive study provides key insights and best practices for future research on domain generalization in deep learning based breast cancer detection.
Affiliation(s)
- Lidia Garrucho
- Artificial Intelligence in Medicine Lab (BCN-AIM), Faculty of Mathematics and Computer Science, University of Barcelona, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain.
- Kaisar Kushibar
- Artificial Intelligence in Medicine Lab (BCN-AIM), Faculty of Mathematics and Computer Science, University of Barcelona, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain
- Socayna Jouide
- Artificial Intelligence in Medicine Lab (BCN-AIM), Faculty of Mathematics and Computer Science, University of Barcelona, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain
- Oliver Diaz
- Artificial Intelligence in Medicine Lab (BCN-AIM), Faculty of Mathematics and Computer Science, University of Barcelona, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain
- Laura Igual
- Artificial Intelligence in Medicine Lab (BCN-AIM), Faculty of Mathematics and Computer Science, University of Barcelona, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain
- Karim Lekadir
- Artificial Intelligence in Medicine Lab (BCN-AIM), Faculty of Mathematics and Computer Science, University of Barcelona, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain

40
Automatic Classification of Simulated Breast Tomosynthesis Whole Images for the Presence of Microcalcification Clusters Using Deep CNNs. J Imaging 2022; 8:jimaging8090231. [PMID: 36135397 PMCID: PMC9503015 DOI: 10.3390/jimaging8090231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Revised: 07/26/2022] [Accepted: 08/04/2022] [Indexed: 11/30/2022] Open
Abstract
Microcalcification clusters (MCs) are among the most important biomarkers for breast cancer, especially in cases of nonpalpable lesions. The vast majority of deep learning studies on digital breast tomosynthesis (DBT) are focused on detecting and classifying lesions, especially soft-tissue lesions, in small regions of interest previously selected. Only about 25% of the studies are specific to MCs, and all of them are based on the classification of small preselected regions. Classifying the whole image according to the presence or absence of MCs is a difficult task due to the size of MCs and all the information present in an entire image. A completely automatic and direct classification, which receives the entire image, without prior identification of any regions, is crucial for the usefulness of these techniques in a real clinical and screening environment. The main purpose of this work is to implement and evaluate the performance of convolutional neural networks (CNNs) regarding an automatic classification of a complete DBT image for the presence or absence of MCs (without any prior identification of regions). In this work, four popular deep CNNs are trained and compared with a new architecture proposed by us. The main task of these trainings was the classification of DBT cases by absence or presence of MCs. A public database of realistic simulated data was used, and the whole DBT image was taken into account as input. DBT data were considered without and with preprocessing (to study the impact of noise reduction and contrast enhancement methods on the evaluation of MCs with CNNs). The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance. Very promising results were achieved with a maximum AUC of 94.19% for GoogLeNet. The second-best AUC value was obtained with a newly implemented network, CNN-a, with 91.17%. This CNN had the particularity of also being the fastest, thus becoming a very interesting model to be considered in other studies. With this work, encouraging outcomes were achieved in this regard, obtaining similar results to other studies for the detection of larger lesions such as masses. Moreover, given the difficulty of visualizing the MCs, which are often spread over several slices, this work may have an important impact on the clinical analysis of DBT images.
41
Begum AS, Kalaiselvi T, Rahimunnisa K. A Computer Aided Breast Cancer Detection Using Unit-Linking Pulse Coupled Neural Network & Multiphase Level Set Method. J BIOMATER TISS ENG 2022. [DOI: 10.1166/jbt.2022.3091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Breast cancer is one of the lethal diseases with high mortality rates among women. Early detection and diagnosis of the disease can help increase the survival rate. Distinguishing normal breast tissue from cancerous tissue can be ambiguous for a radiologist, and a computer-aided system can support better and more efficient diagnosis. This paper aims at the detection and classification of benign and malignant mammogram images with a Unit-linking Pulse Coupled Neural Network (PCNN) combined with the Multiphase Level Set method. While the Unit-linking PCNN helps in coarse feature extraction, the Multiphase Level Set method helps in extracting minute details and hence enables better classification. The proposed method is tested with images from the MIAS open-source database. Performance is measured using sensitivity, accuracy, specificity and false positive rate. Experiments show that the proposed method gives satisfactory results when compared to state-of-the-art methods: a sensitivity of 95.16%, an accuracy of 96.76%, a False Positive Rate (FPR) as low as 0.85% and a specificity of 97.12%.
Affiliation(s)
- A. Sumaiya Begum
- Department of Electronics and Communication Engineering, R.M.D Engineering College, Chennai 601206, Tamilnadu, India
- T. Kalaiselvi
- Department of Electronics and Instrumentation Engineering, Easwari Engineering College, Chennai 600089, Tamilnadu, India
- K. Rahimunnisa
- Department of Electronics and Communication Engineering, Easwari Engineering College, Chennai 600089, Tamilnadu, India

42
Wu Y, Koyuncu CF, Toro P, Corredor G, Feng Q, Buzzy C, Old M, Teknos T, Connelly ST, Jordan RC, Lang Kuhs KA, Lu C, Lewis JS, Madabhushi A. A machine learning model for separating epithelial and stromal regions in oral cavity squamous cell carcinomas using H&E-stained histology images: A multi-center, retrospective study. Oral Oncol 2022; 131:105942. [PMID: 35689952 DOI: 10.1016/j.oraloncology.2022.105942] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2022] [Revised: 04/12/2022] [Accepted: 05/24/2022] [Indexed: 01/30/2023]
Abstract
OBJECTIVE Tissue slides from Oral cavity squamous cell carcinoma (OC-SCC), particularly the epithelial regions, hold morphologic features that are both diagnostic and prognostic. Yet, previously developed approaches for automated epithelium segmentation in OC-SCC have not been independently tested in a multi-center setting. In this study, we aimed to investigate the effectiveness and applicability of a convolutional neural network (CNN) model to perform epithelial segmentation using digitized H&E-stained diagnostic slides from OC-SCC patients in a multi-center setting. METHODS A CNN model was developed to segment the epithelial regions of digitized slides (n = 810), retrospectively collected from five different centers. Deep learning models were trained and validated using well-annotated tissue microarray (TMA) images (n = 212) at various magnifications. The best performing model was locked down and used for independent testing with a total of 478 whole-slide images (WSIs). Manually annotated epithelial regions were used as the reference standard for evaluation. We also compared the model generated results with IHC-stained epithelium (n = 120) as the reference. RESULTS The locked-down CNN model trained on the TMA image training cohorts with 10x magnification achieved the best segmentation performance. The locked-down model performed consistently and yielded Pixel Accuracy, Recall Rate, Precision Rate, and Dice Coefficient that ranged from 95.8% to 96.6%, 79.1% to 93.8%, 85.7% to 89.3%, and 82.3% to 89.0%, respectively for the three independent testing WSI cohorts. CONCLUSION The automated model achieved a consistently accurate performance for automated epithelial region segmentation compared to manual annotations. This model could be integrated into a computer-aided diagnosis or prognosis system.
Affiliation(s)
- Yuxin Wu
- Shandong Junteng Medical Technology Co., Ltd, Jinan, China; College of Computer Science, Shaanxi Normal University, Xian, China
- Can F Koyuncu
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA; Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA
- Paula Toro
- Department of Pathology, Cleveland Clinic, OH, USA
- German Corredor
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA; Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA
- Qianyu Feng
- College of Computer Science, Shaanxi Normal University, Xian, China
- Christina Buzzy
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
- Matthew Old
- Department of Otolaryngology, Ohio State University Medical Center, OH, USA
- Theodoros Teknos
- Department of Otolaryngology, University Hospitals Cleveland Medical Center, Cleveland, OH, USA
- Stephen Thaddeus Connelly
- Department of Oral and Maxillofacial Surgery, San Francisco Veterans Affairs Health Care System, University of California, San Francisco, San Francisco, CA, USA
- Richard C Jordan
- Departments of Orofacial Sciences, Pathology and Radiation Oncology, University of California San Francisco, CA, USA
- Krystle A Lang Kuhs
- Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, KY, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Cheng Lu
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA.
- James S Lewis
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Otolaryngology - Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA.
- Anant Madabhushi
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA; Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA.

43
Characterization of Nuclear Pleomorphism and Tubules in Histopathological Images of Breast Cancer. SENSORS 2022; 22:s22155649. [PMID: 35957203 PMCID: PMC9371191 DOI: 10.3390/s22155649] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 07/24/2022] [Accepted: 07/26/2022] [Indexed: 11/17/2022]
Abstract
Breast cancer (BC) diagnosis is made by a pathologist who analyzes a portion of the breast tissue under the microscope and performs a histological evaluation. This evaluation aims to determine the grade of cellular differentiation and the aggressiveness of the tumor by the Nottingham Grade Classification System (NGS). Nowadays, digital pathology is an innovative tool for pathologists in diagnosis and acquiring new learning. However, a recurring problem in health services is the excessive workload in all medical services. For this reason, computational tools that assist histological evaluation are required. This work proposes a methodology for the quantitative analysis of BC tissue that follows NGS. The proposed methodology is based on digital image processing techniques through which the BC tissue can be characterized automatically. Moreover, the proposed nuclei characterization was helpful for grade differentiation in carcinoma images of the BC tissue, reaching an accuracy of 0.84. In addition, a metric was proposed to assess the likelihood of a structure in the tissue corresponding to a tubule by considering spatial and geometrical characteristics between lumina and its surrounding nuclei, reaching an accuracy of 0.83. Tests were performed on different databases and under various magnification and staining contrast conditions, showing that the methodology is reliable for histological breast tissue analysis.
44
Wang Y, Zhang L, Shu X, Feng Y, Yi Z, Lv Q. Feature-Sensitive Deep Convolutional Neural Network for Multi-Instance Breast Cancer Detection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2241-2251. [PMID: 33600319 DOI: 10.1109/tcbb.2021.3060183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
To obtain a well-performing computer-aided detection model for breast cancer, an effective and efficient algorithm and a well-labeled dataset to train it are usually needed. In this paper, first, a multi-instance mammography clinic dataset was constructed. Each case in the dataset includes a different number of instances captured from different views; it is labeled according to the pathological report, and all the instances of one case share one label. Nevertheless, the instances captured from different views may have various levels of contribution to the category of the target case. Motivated by this observation, a feature-sensitive deep convolutional neural network with an end-to-end training manner is proposed to detect breast cancer. The proposed method first uses a pre-trained model with some custom layers to extract image features. Then, it adopts a feature fusion module to learn to compute the weight of each feature vector, so that the different instances of each case have different sensibility on the classifier. Lastly, a classifier module is used to classify the fused features. The experimental results on both our constructed clinic dataset and two public datasets have demonstrated the effectiveness of the proposed method.
45
Qu Y, Yan D, Xing E, Zheng F, Zhang J, Liu L, Liang G. Beware the Black-Box of Medical Image Generation: an Uncertainty Analysis by the Learned Feature Space. Annu Int Conf IEEE Eng Med Biol Soc 2022;2022:3849-3853. [PMID: 36085751] [DOI: 10.1109/embc48229.2022.9871921]
Abstract
Deep neural networks (DNNs) are the primary driving force behind current medical imaging analysis tools and often deliver exciting performance on various tasks. However, such results are usually reported as overall performance, such as the peak signal-to-noise ratio (PSNR) or mean squared error (MSE) for image generation tasks. As black boxes, DNNs usually produce relatively stable performance on the same task across multiple training trials, while the learned feature spaces can differ significantly. We believe additional insightful analysis, such as uncertainty analysis of the learned feature space, is equally important, if not more so. In this work, we evaluate the learned feature spaces of multiple U-Net architectures for image generation tasks using computational and clustering analysis methods. We demonstrate that the learned feature spaces are easily separable between different training trials of the same architecture with the same hyperparameter settings, indicating that the models use different criteria for the same task. This phenomenon naturally raises the question of which criteria are the correct ones to use. Our work thus suggests that assessments beyond overall performance are needed before applying a DNN model in real-world practice.
46
Ardestani A, Li MD, Chea P, Wortman JR, Medina A, Kalpathy-Cramer J, Wald C. External COVID-19 Deep Learning Model Validation on ACR AI-LAB: It's a Brave New World. J Am Coll Radiol 2022;19:891-900. [PMID: 35483438] [PMCID: PMC8989698] [DOI: 10.1016/j.jacr.2022.03.013]
Abstract
PURPOSE Deploying external artificial intelligence (AI) models locally can be logistically challenging. We aimed to use the ACR AI-LAB software platform for local testing of a chest radiograph (CXR) algorithm for COVID-19 lung disease severity assessment. METHODS An externally developed deep learning model for COVID-19 radiographic lung disease severity assessment was loaded into the AI-LAB platform at an independent academic medical center, separate from the institution in which the model was trained. The dataset consisted of CXR images from 141 patients with reverse transcription-polymerase chain reaction-confirmed COVID-19, which were routed to AI-LAB for model inference. The model calculated a Pulmonary X-ray Severity (PXS) score for each image. This score was correlated with a radiologist-based assessment of severity, the modified Radiographic Assessment of Lung Edema score, averaged over independent interpretations by three radiologists. The associations between the PXS score and patient admission and intubation or death were assessed. RESULTS The PXS score deployed in AI-LAB correlated with the radiologist-determined modified Radiographic Assessment of Lung Edema score (r = 0.80). The PXS score was significantly higher in patients who were admitted (4.0 versus 1.3, P < .001) or intubated or died within 3 days (5.5 versus 3.3, P = .001). CONCLUSIONS AI-LAB was successfully used to test an external COVID-19 CXR AI algorithm on local data with relative ease, demonstrating generalizability of the PXS score model. For AI models to scale and be clinically useful, software tools that facilitate local testing, like the freely available AI-LAB, will be important for crossing the AI implementation gap in health care systems.
Affiliation(s)
- Ali Ardestani
- Department of Radiology, Lahey Hospital and Medical Center, Tufts Medical School, Burlington, Massachusetts
- Matthew D Li
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Pauley Chea
- Department of Radiology, Lahey Hospital and Medical Center, Tufts Medical School, Burlington, Massachusetts
- Jeremy R Wortman
- Vice Chair, Research and Radiology Residency Program Director, Department of Radiology, Lahey Hospital and Medical Center, Tufts Medical School, Burlington, Massachusetts
- Adam Medina
- Department of Radiology, Lahey Hospital and Medical Center, Tufts Medical School, Burlington, Massachusetts
- Jayashree Kalpathy-Cramer
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Christoph Wald
- Chair, Department of Radiology, Lahey Hospital and Medical Center, Tufts Medical School, Burlington, Massachusetts; and Chair, Informatics Commission, ACR.
47
Yu AC, Mohajer B, Eng J. External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review. Radiol Artif Intell 2022;4:e210064. [PMID: 35652114] [DOI: 10.1148/ryai.210064]
Abstract
Purpose To assess the generalizability of published deep learning (DL) algorithms for radiologic diagnosis. Materials and Methods In this systematic review, the PubMed database was searched for peer-reviewed studies of DL algorithms for image-based radiologic diagnosis that included external validation, published from January 1, 2015, through April 1, 2021. Studies using nonimaging features or incorporating non-DL methods for feature extraction or classification were excluded. Two reviewers independently evaluated studies for inclusion, and any discrepancies were resolved by consensus. Internal and external performance measures and pertinent study characteristics were extracted, and relationships among these data were examined using nonparametric statistics. Results Eighty-three studies reporting 86 algorithms were included. The vast majority (70 of 86, 81%) reported at least some decrease in external performance compared with internal performance, with nearly half (42 of 86, 49%) reporting at least a modest decrease (≥0.05 on the unit scale) and nearly a quarter (21 of 86, 24%) reporting a substantial decrease (≥0.10 on the unit scale). No study characteristics were found to be associated with the difference between internal and external performance. Conclusion Among published external validation studies of DL algorithms for image-based radiologic diagnosis, the vast majority demonstrated diminished algorithm performance on the external dataset, with some reporting a substantial performance decrease. Keywords: Meta-Analysis, Computer Applications-Detection/Diagnosis, Neural Networks, Computer Applications-General (Informatics), Epidemiology, Technology Assessment, Diagnosis, Informatics. Supplemental material is available for this article. © RSNA, 2022.
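The review's ≥0.05 and ≥0.10 unit-scale cutoffs for "modest" and "substantial" performance drops can be expressed as a small hypothetical helper; the function name and the sub-0.05 labels below are ours, not the review's.

```python
def gap_category(internal_auc, external_auc):
    """Bucket the internal-to-external performance drop using the
    review's cutoffs: >=0.10 substantial, >=0.05 modest.
    The "slight"/"none" labels below 0.05 are our own additions."""
    # Round to dodge float noise exactly at the cutoffs.
    drop = round(internal_auc - external_auc, 3)
    if drop >= 0.10:
        return "substantial"
    if drop >= 0.05:
        return "modest"
    if drop > 0:
        return "slight"
    return "none"

gap_category(0.81, 0.70)  # drop of 0.11 -> "substantial"
```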
Affiliation(s)
- Alice C Yu
- Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, 1800 Orleans St, Baltimore, MD 21287
- Bahram Mohajer
- Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, 1800 Orleans St, Baltimore, MD 21287
- John Eng
- Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, 1800 Orleans St, Baltimore, MD 21287
48
The Use of Internet of Things and Cloud Computing Technology in the Performance Appraisal Management of Innovation Capability of University Scientific Research Team. Comput Intell Neurosci 2022;2022:9423718. [PMID: 35440942] [PMCID: PMC9013565] [DOI: 10.1155/2022/9423718]
Abstract
This study aims to speed up the progress of scientific research projects in colleges and universities, continuously improve the innovation ability of university scientific research teams, and optimize current methods for managing the performance appraisal of innovation capability. Firstly, the requirements of the innovation performance evaluation system are analyzed, and a corresponding innovation performance evaluation index system for scientific research teams is constructed. Secondly, the Internet of Things (IoT) is combined with a Field Programmable Gate Array (FPGA) to build a performance appraisal management terminal for innovation capability. Thirdly, a lightweight deep network is built into the performance assessment management network for the innovation ability of university scientific research teams, linked to the evaluation index system. Finally, the system performance is tested. The results show that the proposed method compresses MobileNet to different degrees, significantly reducing network computation while retaining the original recognition ability. Models whose floating-point operations (FLOPs) are reduced by 70% to 90% have 3.6 to 14.3 times fewer parameters. Under different pruning rates, the proposed model achieves a higher compression rate and recognition accuracy than other models. The results also show that the output is closely related to the interests of the research team: the academic influence score of Team 1 is 0.17, the highest among the six groups in this experiment, indicating that Team 1 has the most significant academic influence. These results provide data support and a methodological reference for evaluating the innovation ability of university scientific research teams and contribute to their comprehensive development.
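The FLOPs and parameter reductions quoted above come from network pruning. As a rough illustration of the general idea (not this paper's method, whose pruning criterion is not detailed here), magnitude-based pruning zeroes the smallest-magnitude weights of a layer:

```python
import numpy as np

def magnitude_prune(weights, rate):
    """Zero out the fraction `rate` of weights with the smallest
    magnitude. Illustrative sketch of one common pruning criterion."""
    flat = np.abs(weights).ravel()
    k = int(rate * flat.size)
    if k == 0:
        return weights.copy()
    thresh = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= thresh, 0.0, weights)

rng = np.random.default_rng(1)
layer = rng.normal(size=(64, 64))      # toy dense layer
pruned = magnitude_prune(layer, 0.7)   # 70% sparsity
sparsity = (pruned == 0).mean()        # fraction of zeroed weights
```

Real compression pipelines then remove the zeroed weights or channels so that the FLOP and parameter counts actually shrink.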
49
Liang G, Greenwell C, Zhang Y, Xing X, Wang X, Kavuluru R, Jacobs N. Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging. IEEE J Biomed Health Inform 2022;26:1640-1649. [PMID: 34495856] [PMCID: PMC9242687] [DOI: 10.1109/jbhi.2021.3110805]
Abstract
A key challenge in training neural networks for a given medical imaging task is the difficulty of obtaining a sufficient number of manually labeled examples. In contrast, textual imaging reports are often readily available in medical records and contain rich but unstructured interpretations written by experts as part of standard clinical practice. We propose using these textual reports as a form of weak supervision to improve the image interpretation performance of a neural network without requiring additional manually labeled examples. We use an image-text matching task to train a feature extractor and then fine-tune it in a transfer learning setting for a supervised task using a small labeled dataset. The end result is a neural network that automatically interprets imagery without requiring textual reports during inference. We evaluate our method on three classification tasks and find consistent performance improvements, reducing the need for labeled data by 67%-98%.
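The image-text matching pre-training objective described above can be sketched with a standard symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. The paper's exact formulation may differ, so treat this as a generic numpy illustration:

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.1):
    """Symmetric contrastive loss for image-text matching.

    img_emb, txt_emb: (n, d) L2-normalized embeddings; row i of each
    matrix is assumed to come from the same exam (the positive pair).
    """
    logits = (img_emb @ txt_emb.T) / temperature  # (n, n) similarities
    n = logits.shape[0]

    def xent(l):
        # Cross-entropy with the matching pair (the diagonal) as target.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(2)
emb = rng.normal(size=(8, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows
loss_matched = info_nce_loss(emb, emb)                     # pairs aligned
loss_shuffled = info_nce_loss(emb, np.roll(emb, 1, axis=0))  # misaligned
```

Matched pairs sit on the diagonal, so minimizing the loss pulls each image embedding toward its own report and away from the other reports in the batch; the aligned batch here scores a lower loss than the shuffled one.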
50
Liang G, Ganesh H, Steffe D, Liu L, Jacobs N, Zhang J. Development of CNN models for the enteral feeding tube positioning assessment on a small scale data set. BMC Med Imaging 2022;22:52. [PMID: 35317725] [PMCID: PMC8939093] [DOI: 10.1186/s12880-022-00766-w]
Abstract
BACKGROUND Enteral nutrition through feeding tubes serves as the primary method of nutritional supplementation for patients unable to feed themselves. Plain radiographs are routinely used to confirm the position of nasoenteric feeding tubes following insertion and before the commencement of tube feeds. Convolutional neural networks (CNNs) have shown encouraging results in assisting tube positioning assessment. However, robust CNNs are often trained on large amounts of manually annotated data, which makes it challenging to apply CNNs to enteral feeding tube positioning assessment. METHOD We build a CNN model for feeding tube positioning assessment by pre-training the model in a weakly supervised fashion on large quantities of radiographs. Since most of the model is pre-trained, only a small amount of labeled data is needed when fine-tuning it for tube positioning assessment. We demonstrate the proposed method on a small dataset of 175 radiographs. RESULT The experimental results show that the proposed model improves the area under the receiver operating characteristic curve (AUC) by up to 35.71%, from 0.56 to 0.76, and the accuracy by 14.49%, from 0.69 to 0.79, compared with a model without pre-training. The proposed method also has up to 40% less error when estimating its prediction confidence. CONCLUSION Our evaluation shows that the proposed model achieves higher prediction accuracy and a more accurate estimated prediction confidence than the non-pre-trained model and other baseline models. The proposed method can potentially be used for assessing enteral tube positioning and provides a strong baseline for future studies.
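The percentage gains quoted in the RESULT section are relative improvements over the baseline metric, which a one-line check reproduces:

```python
def relative_gain(baseline, improved):
    """Relative improvement expressed as a fraction of the baseline."""
    return (improved - baseline) / baseline

# Reproduces the percentages quoted in the abstract:
auc_gain = relative_gain(0.56, 0.76)  # ~0.3571 -> "up to 35.71%"
acc_gain = relative_gain(0.69, 0.79)  # ~0.1449 -> "14.49%"
```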
Affiliation(s)
- Jie Zhang
- University of Kentucky, Lexington, KY, USA.