1
Coroller T, Sahiner B, Amatya A, Gossmann A, Karagiannis K, Moloney C, Samala RK, Santana-Quintero L, Solovieff N, Wang C, Amiri-Kordestani L, Cao Q, Cha KH, Charlab R, Cross FH, Hu T, Huang R, Kraft J, Krusche P, Li Y, Li Z, Mazo I, Paul R, Schnakenberg S, Serra P, Smith S, Song C, Su F, Tiwari M, Vechery C, Xiong X, Zarate JP, Zhu H, Chakravartty A, Liu Q, Ohlssen D, Petrick N, Schneider JA, Walderhaug M, Zuber E. Methodology for Good Machine Learning with Multi-Omics Data. Clin Pharmacol Ther 2024; 115:745-757. [PMID: 37965805 DOI: 10.1002/cpt.3105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 10/20/2023] [Indexed: 11/16/2023]
Abstract
In 2020, Novartis Pharmaceuticals Corporation and the U.S. Food and Drug Administration (FDA) started a 4-year scientific collaboration to approach complex new data modalities and advanced analytics. The scientific question was to find novel radio-genomics-based prognostic and predictive factors for HR+/HER2- metastatic breast cancer under a Research Collaboration Agreement. This collaboration has been providing valuable insights to help successfully implement future scientific projects, particularly those using artificial intelligence and machine learning. This tutorial aims to provide tangible guidelines for a multi-omics project that includes multidisciplinary expert teams spanning different institutions. We cover key ideas, such as "maintaining effective communication" and "following good data science practices," followed by the four steps of exploratory projects, namely (1) plan, (2) design, (3) develop, and (4) disseminate. We break each step into smaller concepts with strategies for implementation and provide illustrations from our collaboration to give readers further actionable guidance.
Affiliation(s)
- Berkman Sahiner
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Anup Amatya
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Alexej Gossmann
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Konstantinos Karagiannis
  - Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Ravi K Samala
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Luis Santana-Quintero
  - Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Nadia Solovieff
  - Novartis Pharmaceutical Company, East Hanover, New Jersey, USA
- Craig Wang
  - Novartis Pharma AG, Rotkreuz, Switzerland
- Laleh Amiri-Kordestani
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Qian Cao
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Kenny H Cha
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Rosane Charlab
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Frank H Cross
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Tingting Hu
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Ruihao Huang
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Jeffrey Kraft
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Yutong Li
  - Novartis Pharmaceutical Company, East Hanover, New Jersey, USA
- Zheng Li
  - Novartis Pharmaceutical Company, East Hanover, New Jersey, USA
- Ilya Mazo
  - Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Rahul Paul
  - Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Paolo Serra
  - Novartis Pharmaceutical Company, East Hanover, New Jersey, USA
- Sean Smith
  - Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Chi Song
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Fei Su
  - Novartis Pharmaceutical Company, East Hanover, New Jersey, USA
- Mohit Tiwari
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Colin Vechery
  - Novartis Pharmaceutical Company, East Hanover, New Jersey, USA
- Xin Xiong
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Hao Zhu
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Qi Liu
  - Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- David Ohlssen
  - Novartis Pharmaceutical Company, East Hanover, New Jersey, USA
- Nicholas Petrick
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Julie A Schneider
  - Oncology Center of Excellence, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Mark Walderhaug
  - Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
2
Sobiecki A, Hadjiiski LM, Chan HP, Samala RK, Zhou C, Stojanovska J, Agarwal PP. Detection of Severe Lung Infection on Chest Radiographs of COVID-19 Patients: Robustness of AI Models across Multi-Institutional Data. Diagnostics (Basel) 2024; 14:341. [PMID: 38337857 PMCID: PMC10855789 DOI: 10.3390/diagnostics14030341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 01/24/2024] [Accepted: 01/30/2024] [Indexed: 02/12/2024]
Abstract
The diagnosis of severe COVID-19 lung infection is important because it carries a higher risk for the patient and requires prompt treatment with oxygen therapy and hospitalization, while patients with less severe lung infection often remain under observation. Patients with severe infections are also more likely to have long-standing residual changes in their lungs and may need follow-up imaging. We have developed deep learning neural network models for classifying severe vs. non-severe lung infections in COVID-19 patients on chest radiographs (CXR). A deep learning U-Net model was developed to segment the lungs. Inception-v1 and Inception-v4 models were trained for the classification of severe vs. non-severe COVID-19 infection. Four CXR datasets from multi-country and multi-institutional sources were used to develop and evaluate the models. The combined dataset consisted of 5748 cases and 6193 CXR images with physicians' severity ratings as reference standard. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. We studied the reproducibility of classification performance using different combinations of training and validation data sets. We also evaluated the generalizability of the trained deep learning models using both independent internal and external test sets. On the independent test sets, the Inception-v1 based models achieved AUCs ranging from 0.81 ± 0.02 to 0.84 ± 0.0, while the Inception-v4 models achieved AUCs ranging from 0.85 ± 0.06 to 0.89 ± 0.01. These results demonstrate the promise of using deep learning models in differentiating COVID-19 patients with severe from non-severe lung infection on chest radiographs.
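As a reading aid for the metric used in this abstract: the AUC can be interpreted as the probability that a randomly chosen severe case receives a higher model score than a randomly chosen non-severe case. The sketch below computes it that way in plain Python. The function name and the scores are hypothetical illustrations and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs, for illustration only.
severe_scores = [0.9, 0.7, 0.4]
non_severe_scores = [0.6, 0.3, 0.2]
example_auc = auc(severe_scores, non_severe_scores)  # 8 of 9 pairs ordered correctly
```

The pairwise form is quadratic in the number of cases, which is fine for a small illustration; rank-based implementations are the usual choice for datasets of the size described here.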
Affiliation(s)
- André Sobiecki
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Lubomir M. Hadjiiski
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Heang-Ping Chan
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Ravi K. Samala
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD 20993, USA
- Chuan Zhou
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Prachi P. Agarwal
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
3
Mahmood U, Shukla-Dave A, Chan HP, Drukker K, Samala RK, Chen Q, Vergara D, Greenspan H, Petrick N, Sahiner B, Huo Z, Summers RM, Cha KH, Tourassi G, Deserno TM, Grizzard KT, Näppi JJ, Yoshida H, Regge D, Mazurchuk R, Suzuki K, Morra L, Huisman H, Armato SG, Hadjiiski L. Artificial intelligence in medicine: mitigating risks and maximizing benefits via quality assurance, quality control, and acceptance testing. BJR Artif Intell 2024; 1:ubae003. [PMID: 38476957 PMCID: PMC10928809 DOI: 10.1093/bjrai/ubae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 01/08/2024] [Accepted: 01/12/2024] [Indexed: 03/14/2024]
Abstract
The adoption of artificial intelligence (AI) tools in medicine poses challenges to existing clinical workflows. This commentary discusses the necessity of context-specific quality assurance (QA), emphasizing the need for robust QA measures with quality control (QC) procedures that encompass (1) acceptance testing (AT) before clinical use, (2) continuous QC monitoring, and (3) adequate user training. The discussion also covers essential components of AT and QA, illustrated with real-world examples. We also highlight what we see as the shared responsibility of manufacturers or vendors, regulators, healthcare systems, medical physicists, and clinicians to enact appropriate testing and oversight to ensure a safe and equitable transformation of medicine through AI.
Affiliation(s)
- Usman Mahmood
  - Department of Medical Physics, Memorial Sloan-Kettering Cancer Center, New York, NY, 10065, United States
- Amita Shukla-Dave
  - Department of Medical Physics, Memorial Sloan-Kettering Cancer Center, New York, NY, 10065, United States
  - Department of Radiology, Memorial Sloan-Kettering Cancer Center, New York, NY, 10065, United States
- Heang-Ping Chan
  - Department of Radiology, University of Michigan, Ann Arbor, MI, 48109, United States
- Karen Drukker
  - Department of Radiology, University of Chicago, Chicago, IL, 60637, United States
- Ravi K Samala
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD, 20993, United States
- Quan Chen
  - Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ, 85054, United States
- Daniel Vergara
  - Department of Radiology, University of Washington, Seattle, WA, 98195, United States
- Hayit Greenspan
  - Biomedical Engineering and Imaging Institute, Department of Radiology, Icahn School of Medicine at Mt Sinai, New York, NY, 10029, United States
- Nicholas Petrick
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD, 20993, United States
- Berkman Sahiner
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD, 20993, United States
- Zhimin Huo
  - Tencent America, Palo Alto, CA, 94306, United States
- Ronald M Summers
  - Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, MD, 20892, United States
- Kenny H Cha
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD, 20993, United States
- Georgia Tourassi
  - Computing and Computational Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, United States
- Thomas M Deserno
  - Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, Braunschweig, Niedersachsen, 38106, Germany
- Kevin T Grizzard
  - Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, 06510, United States
- Janne J Näppi
  - 3D Imaging Research, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, United States
- Hiroyuki Yoshida
  - 3D Imaging Research, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, United States
- Daniele Regge
  - Radiology Unit, Candiolo Cancer Institute, FPO-IRCCS, Candiolo, 10060, Italy
  - Department of Translational Research and of New Surgical and Medical Technologies, University of Pisa, Pisa, 56126, Italy
- Richard Mazurchuk
  - Division of Cancer Prevention, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, United States
- Kenji Suzuki
  - Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa, 226-8503, Japan
- Lia Morra
  - Department of Control and Computer Engineering, Politecnico di Torino, Torino, Piemonte, 10129, Italy
- Henkjan Huisman
  - Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, Gelderland, 6525 GA, Netherlands
- Samuel G Armato
  - Department of Radiology, University of Chicago, Chicago, IL, 60637, United States
- Lubomir Hadjiiski
  - Department of Radiology, University of Michigan, Ann Arbor, MI, 48109, United States
4
Burgon A, Sahiner B, Petrick N, Pennello G, Cha KH, Samala RK. Decision region analysis for generalizability of artificial intelligence models: estimating model generalizability in the case of cross-reactivity and population shift. J Med Imaging (Bellingham) 2024; 11:014501. [PMID: 38283653 PMCID: PMC10810180 DOI: 10.1117/1.jmi.11.1.014501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 12/14/2023] [Accepted: 12/28/2023] [Indexed: 01/30/2024]
Abstract
Purpose: Understanding an artificial intelligence (AI) model's ability to generalize to its target population is critical to ensuring the safe and effective usage of AI in medical devices. A traditional generalizability assessment relies on the availability of large, diverse datasets, which are difficult to obtain in many medical imaging applications. We present an approach for enhanced generalizability assessment by examining the decision space beyond the available testing data distribution.
Approach: Vicinal distributions of virtual samples are generated by interpolating between triplets of test images. The generated virtual samples leverage the characteristics already in the test set, increasing the sample diversity while remaining close to the AI model's data manifold. We demonstrate the generalizability assessment approach on the non-clinical tasks of classifying patient sex, race, COVID status, and age group from chest x-rays.
Results: Decision region composition analysis for generalizability indicated that a disproportionately large portion of the decision space belonged to a single "preferred" class for each task, despite comparable performance on the evaluation dataset. Evaluation using cross-reactivity and population shift strategies indicated a tendency to overpredict samples as belonging to the preferred class (e.g., COVID negative) for patients whose subgroup was not represented in the model development data.
Conclusions: An analysis of an AI model's decision space has the potential to provide insight into model generalizability. Our approach uses the analysis of composition of the decision space to obtain an improved assessment of model generalizability in the case of limited test data.
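The interpolation step described in this abstract can be pictured with a minimal sketch. It assumes that virtual samples are convex combinations of three test inputs with weights drawn uniformly from the simplex, and that decision region composition is the share of virtual samples assigned to each class. These assumptions, all function names, and the toy classifier are hypothetical; this is not the authors' implementation.

```python
import random
from collections import Counter


def sample_barycentric(rng):
    """Draw weights (w1, w2, w3) uniformly from the 2-simplex."""
    a, b = sorted((rng.random(), rng.random()))
    return a, b - a, 1.0 - b


def interpolate_triplet(x1, x2, x3, weights):
    """Convex combination of three equally sized feature vectors."""
    w1, w2, w3 = weights
    return [w1 * p + w2 * q + w3 * r for p, q, r in zip(x1, x2, x3)]


def decision_region_composition(classify, triplet, n_samples=1000, seed=0):
    """Fraction of virtual samples that `classify` assigns to each class."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        virtual = interpolate_triplet(*triplet, sample_barycentric(rng))
        counts[classify(virtual)] += 1
    return {label: n / n_samples for label, n in counts.items()}


def toy_classifier(x):
    """Hypothetical stand-in for a trained model: thresholds the mean value."""
    return "positive" if sum(x) / len(x) > 0.5 else "negative"


# Three tiny made-up "images" flattened to vectors.
triplet = ([0.0, 0.0], [1.0, 1.0], [0.2, 0.2])
composition = decision_region_composition(toy_classifier, triplet)
```

A composition dominated by one label across many triplets would correspond to the "preferred" class behaviour the abstract reports.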
Affiliation(s)
- Alexis Burgon
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, Maryland, United States
- Berkman Sahiner
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, Maryland, United States
- Nicholas Petrick
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, Maryland, United States
- Gene Pennello
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, Maryland, United States
- Kenny H. Cha
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, Maryland, United States
- Ravi K. Samala
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, Maryland, United States
5
Sahiner B, Chen W, Samala RK, Petrick N. Data drift in medical machine learning: implications and potential remedies. Br J Radiol 2023; 96:20220878. [PMID: 36971405 PMCID: PMC10546450 DOI: 10.1259/bjr.20220878] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/14/2022] [Revised: 02/16/2023] [Accepted: 02/20/2023] [Indexed: 03/29/2023]
Abstract
Data drift refers to differences between the data used in training a machine learning (ML) model and that applied to the model in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and used in clinical operation, differences between medical practices or context of use between training and clinical use, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in ML literature related to data drift, define distinct types of drift, and discuss in detail potential causes within the context of medical applications with an emphasis on medical imaging. We then review the recent literature regarding the effects of data drift on medical ML systems, which overwhelmingly shows that data drift can be a major cause of performance deterioration. We then discuss methods for monitoring data drift and mitigating its effects with an emphasis on pre- and post-deployment techniques. Some of the potential methods for drift detection and issues around model retraining when drift is detected are included. Based on our review, we find that data drift is a major concern in medical ML deployment and that more research is needed so that ML models can identify drift early, incorporate effective mitigation strategies, and resist performance decay.
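To make the idea of drift monitoring concrete, the sketch below compares one numeric feature between a reference (training-time) sample and a current (deployment-time) sample using the two-sample Kolmogorov-Smirnov statistic. This is one common choice picked here for illustration; the abstract does not name a specific test. The function names, the asymptotic threshold, and the data are assumptions.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical cumulative distributions."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    for v in ref + cur:
        f_ref = bisect_right(ref, v) / len(ref)
        f_cur = bisect_right(cur, v) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds its large-sample critical value."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Made-up feature values: the "current" sample is shifted by 0.5.
reference_values = [i / 100 for i in range(100)]
current_values = [x + 0.5 for x in reference_values]
shifted = drift_detected(reference_values, current_values)
unchanged = drift_detected(reference_values, reference_values)
```

A per-feature test like this only looks at input distributions; it does not by itself show that model performance has degraded.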
Affiliation(s)
- Berkman Sahiner
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Weijie Chen
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Ravi K. Samala
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Nicholas Petrick
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
6
Petrick N, Chen W, Delfino JG, Gallas BD, Kang Y, Krainak D, Sahiner B, Samala RK. Regulatory considerations for medical imaging AI/ML devices in the United States: concepts and challenges. J Med Imaging (Bellingham) 2023; 10:051804. [PMID: 37361549 PMCID: PMC10289177 DOI: 10.1117/1.jmi.10.5.051804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 05/22/2023] [Accepted: 05/30/2023] [Indexed: 06/28/2023]
Abstract
Purpose: To introduce developers to medical device regulatory processes and data considerations in artificial intelligence and machine learning (AI/ML) device submissions and to discuss ongoing AI/ML-related regulatory challenges and activities.
Approach: AI/ML technologies are being used in an increasing number of medical imaging devices, and the fast evolution of these technologies presents novel regulatory challenges. We provide AI/ML developers with an introduction to U.S. Food and Drug Administration (FDA) regulatory concepts, processes, and fundamental assessments for a wide range of medical imaging AI/ML device types.
Results: The device type for an AI/ML device and appropriate premarket regulatory pathway is based on the level of risk associated with the device and informed by both its technological characteristics and intended use. AI/ML device submissions contain a wide array of information and testing to facilitate the review process with the model description, data, nonclinical testing, and multi-reader multi-case testing being critical aspects of the AI/ML device review process for many AI/ML device submissions. The agency is also involved in AI/ML-related activities that support guidance document development, good machine learning practice development, AI/ML transparency, AI/ML regulatory research, and real-world performance assessment.
Conclusion: FDA's AI/ML regulatory and scientific efforts support the joint goals of ensuring patients have access to safe and effective AI/ML devices over the entire device lifecycle and stimulating medical AI/ML innovation.
Affiliation(s)
- Nicholas Petrick
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Labs, Silver Spring, Maryland, United States
- Weijie Chen
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Labs, Silver Spring, Maryland, United States
- Jana G. Delfino
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Labs, Silver Spring, Maryland, United States
- Brandon D. Gallas
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Labs, Silver Spring, Maryland, United States
- Yanna Kang
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Product Evaluation and Quality, Silver Spring, Maryland, United States
- Daniel Krainak
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Product Evaluation and Quality, Silver Spring, Maryland, United States
- Berkman Sahiner
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Labs, Silver Spring, Maryland, United States
- Ravi K. Samala
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Labs, Silver Spring, Maryland, United States
7
Sun D, Hadjiiski L, Alva A, Zakharia Y, Joshi M, Chan HP, Garje R, Pomerantz L, Elhag D, Cohan RH, Caoili EM, Kerr WT, Cha KH, Kirova-Nedyalkova G, Davenport MS, Shankar PR, Francis IR, Shampain K, Meyer N, Barkmeier D, Woolen S, Palmbos PL, Weizer AZ, Samala RK, Zhou C, Matuszak M. Computerized Decision Support for Bladder Cancer Treatment Response Assessment in CT Urography: Effect on Diagnostic Accuracy in Multi-Institution Multi-Specialty Study. Tomography 2022; 8:644-656. [PMID: 35314631 PMCID: PMC8938803 DOI: 10.3390/tomography8020054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/15/2022] [Revised: 02/17/2022] [Accepted: 02/28/2022] [Indexed: 11/22/2022]
Abstract
This observer study investigates the effect of a computerized artificial intelligence (AI)-based decision support system (CDSS-T) on physicians’ diagnostic accuracy in assessing bladder cancer treatment response. The performance of 17 observers was evaluated when assessing bladder cancer treatment response without and with CDSS-T using pre- and post-chemotherapy CTU scans in 123 patients having 157 pre- and post-treatment cancer pairs. The impact of cancer case difficulty, observers’ clinical experience, institution affiliation, specialty, and the assessment times on the observers’ diagnostic performance with and without using CDSS-T was analyzed. It was found that the average performance of the 17 observers was significantly improved (p = 0.002) when aided by the CDSS-T. The cancer case difficulty, institution affiliation, specialty, and the assessment times influenced the observers’ performance without CDSS-T. The AI-based decision support system has the potential to improve the diagnostic accuracy in assessing bladder cancer treatment response and result in more consistent performance among all physicians.
Affiliation(s)
- Di Sun
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Lubomir Hadjiiski
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Ajjai Alva
  - Department of Internal Medicine-Hematology/Oncology, University of Michigan, Ann Arbor, MI 48109, USA
- Yousef Zakharia
  - Department of Internal Medicine-Hematology/Oncology, University of Iowa, Iowa, IA 52242, USA
- Monika Joshi
  - Department of Internal Medicine-Hematology/Oncology, Pennsylvania State University, Hershey, PA 16801, USA
- Heang-Ping Chan
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Rohan Garje
  - Department of Internal Medicine-Hematology/Oncology, University of Iowa, Iowa, IA 52242, USA
- Lauren Pomerantz
  - Department of Internal Medicine-Hematology/Oncology, Pennsylvania State University, Hershey, PA 16801, USA
- Dean Elhag
  - Department of Internal Medicine-Hematology/Oncology, University of Iowa, Iowa, IA 52242, USA
- Richard H. Cohan
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Elaine M. Caoili
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Wesley T. Kerr
  - Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA
- Kenny H. Cha
  - U.S. Food and Drug Administration, Center for Devices and Radiological Health, Silver Spring, MD 20993, USA
- Matthew S. Davenport
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Urology, University of Michigan, Ann Arbor, MI 48109, USA
- Prasad R. Shankar
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Isaac R. Francis
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Kimberly Shampain
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Nathaniel Meyer
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Daniel Barkmeier
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Sean Woolen
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Phillip L. Palmbos
  - Department of Internal Medicine-Hematology/Oncology, University of Michigan, Ann Arbor, MI 48109, USA
- Alon Z. Weizer
  - Department of Urology, University of Michigan, Ann Arbor, MI 48109, USA
- Ravi K. Samala
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Chuan Zhou
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Martha Matuszak
  - Department of Radiation Oncology, University of Michigan, Ann Arbor, MI 48109, USA
8
Chan HP, Helvie MA, Klein KA, McLaughlin C, Neal CH, Oudsema R, Rahman WT, Roubidoux MA, Hadjiiski LM, Zhou C, Samala RK. Effect of Dose Level on Radiologists' Detection of Microcalcifications in Digital Breast Tomosynthesis: An Observer Study with Breast Phantoms. Acad Radiol 2022; 29 Suppl 1:S42-S49. [PMID: 32950384 DOI: 10.1016/j.acra.2020.07.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2020] [Revised: 07/30/2020] [Accepted: 07/30/2020] [Indexed: 11/16/2022]
Abstract
OBJECTIVES To compare radiologists' sensitivity, confidence level, and reading efficiency of detecting microcalcifications in digital breast tomosynthesis (DBT) at two clinically relevant dose levels. MATERIALS AND METHODS Six 5-cm-thick heterogeneous breast phantoms embedded with a total of 144 simulated microcalcification clusters of four speck sizes were imaged at two dose modes by a clinical DBT system. The DBT volumes at the two dose levels were read independently by six MQSA radiologists and one fellow with 1-33 years (median 12 years) of experience in a fully-crossed counter-balanced manner. The radiologist located each potential cluster and rated its conspicuity and his/her confidence that the marked location contained a cluster. The differences in the results between the two dose modes were analyzed by two-tailed paired t-test. RESULTS Compared to the lower-dose mode, the average glandular dose in the higher-dose mode for the 5-cm phantoms increased from 1.34 to 2.07 mGy. The detection sensitivity increased for all speck sizes and significantly for the two smaller sizes (p <0.05). An average of 13.8% fewer false positive clusters was marked. The average conspicuity rating and the radiologists' confidence level were higher for all speck sizes and reached significance (p <0.05) for the three larger sizes. The average reading time per detected cluster reduced significantly (p <0.05) by an average of 13.2%. CONCLUSION For a 5-cm-thick breast, an increase in average glandular dose from 1.34 to 2.07 mGy for DBT imaging increased the conspicuity of microcalcifications, improved the detection sensitivity by radiologists, increased their confidence levels, reduced false positive detections, and increased the reading efficiency.
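The paired comparison described in this abstract (the same readers scored under both dose modes, with the differences tested by a two-tailed paired t-test) can be sketched as follows. This is a minimal illustration using only the Python standard library: the per-reader sensitivities are hypothetical placeholders, not study data, and the function name `paired_t` is an assumption made for this example. The standard library has no Student t distribution, so the sketch compares the statistic with the tabulated two-tailed critical value for 6 degrees of freedom instead of computing a p-value.

```python
import math
import statistics

def paired_t(low, high):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(low) != len(high) or len(low) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [h - l for l, h in zip(low, high)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical per-reader sensitivities for 7 readers, NOT values from the study.
low_dose = [0.60, 0.55, 0.62, 0.58, 0.65, 0.57, 0.61]
high_dose = [0.68, 0.61, 0.66, 0.67, 0.70, 0.63, 0.69]

t_stat, dof = paired_t(low_dose, high_dose)
T_CRIT_DF6 = 2.447  # two-tailed critical value of t at alpha = 0.05, 6 d.o.f.
print(f"t = {t_stat:.2f}, df = {dof}, significant = {abs(t_stat) > T_CRIT_DF6}")
```

Pairing by reader removes between-reader variability from the comparison, which is why the test operates on the per-reader differences rather than on the two groups separately.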
Affiliation(s)
- Heang-Ping Chan
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842.
| | - Mark A Helvie
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - Katherine A Klein
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - Carol McLaughlin
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - Colleen H Neal
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - Rebecca Oudsema
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - W Tania Rahman
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - Marilyn A Roubidoux
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - Lubomir M Hadjiiski
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - Chuan Zhou
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| | - Ravi K Samala
- Department of Radiology, University of Michigan, 1500 E. Medical Center Dr., Med Inn Building C477, Ann Arbor, MI 48109-5842
| |
|
9
|
Hadjiiski LM, Cha KH, Cohan RH, Chan HP, Caoili EM, Davenport MS, Samala RK, Weizer AZ, Alva A, Kirova-Nedyalkova G, Shampain K, Meyer N, Barkmeier D, Woolen SA, Shankar PR, Francis IR, Palmbos PL. Intraobserver Variability in Bladder Cancer Treatment Response Assessment With and Without Computerized Decision Support. Tomography 2021; 6:194-202. [PMID: 32548296 PMCID: PMC7289252 DOI: 10.18383/j.tom.2020.00013] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
We evaluated the intraobserver variability of physicians aided by a computerized decision-support system for treatment response assessment (CDSS-T) to identify patients who show complete response to neoadjuvant chemotherapy for bladder cancer, and the effects of the intraobserver variability on physicians' assessment accuracy. A CDSS-T tool was developed that uses a combination of deep learning neural network and radiomic features from computed tomography (CT) scans to detect bladder cancers that have fully responded to neoadjuvant treatment. Pre- and postchemotherapy CT scans of 157 bladder cancers from 123 patients were collected. In a multireader, multicase observer study, physician-observers estimated the likelihood of pathologic T0 disease by viewing paired pre/posttreatment CT scans placed side by side on an in-house-developed graphical user interface. Five abdominal radiologists, 4 diagnostic radiology residents, 2 oncologists, and 1 urologist participated as observers. They first provided an estimate without CDSS-T and then with CDSS-T. A subset of cases was evaluated twice to study the intraobserver variability and its effects on observer consistency. The mean areas under the curves for assessment of pathologic T0 disease were 0.85 for CDSS-T alone, 0.76 for physicians without CDSS-T and improved to 0.80 for physicians with CDSS-T (P = .001) in the original evaluation, and 0.78 for physicians without CDSS-T and improved to 0.81 for physicians with CDSS-T (P = .010) in the repeated evaluation. The intraobserver variability was significantly reduced with CDSS-T (P < .0001). The CDSS-T can significantly reduce physicians' variability and improve their accuracy for identifying complete response of muscle-invasive bladder cancer to neoadjuvant chemotherapy.
Affiliation(s)
| | | | | | | | | | | | | | | | - Ajjai Alva
- Internal Medicine, Division of Hematology-Oncology, University of Michigan, Ann Arbor, MI
| | | | | | | | | | - Sean A Woolen
- Department of Radiology, University of California, San Francisco, Medical Center, San Francisco, CA
| | | | | | - Phillip L Palmbos
- Internal Medicine, Division of Hematology-Oncology, University of Michigan, Ann Arbor, MI
| |
|
10
|
Samala RK, Chan HP, Hadjiiski L, Helvie MA. Risks of feature leakage and sample size dependencies in deep feature extraction for breast mass classification. Med Phys 2021; 48:2827-2837. [PMID: 33368376 PMCID: PMC8601676 DOI: 10.1002/mp.14678] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/24/2020] [Revised: 11/27/2020] [Accepted: 12/06/2020] [Indexed: 12/20/2022]
Abstract
PURPOSE Transfer learning is commonly used in deep learning for medical imaging to alleviate the problem of limited available data. In this work, we studied the risk of feature leakage and its dependence on sample size when using a pretrained deep convolutional neural network (DCNN) as a feature extractor for classification of breast masses in mammography. METHODS Feature leakage occurs when the training set is used for feature selection and classifier modeling while the cost function is guided by the validation performance or informed by the test performance. The high-dimensional feature space extracted from a pretrained DCNN suffers from the curse of dimensionality; feature subsets that can provide excessively optimistic performance can be found for the validation set or test set if the latter is allowed for unlimited reuse during algorithm development. We designed a simulation study to examine feature leakage when using a DCNN as a feature extractor for mass classification in mammography. Four thousand five hundred and seventy-seven unique mass lesions were partitioned by patient into three sets: 3222 for training, 508 for validation, and 847 for independent testing. Three pretrained DCNNs, AlexNet, GoogLeNet, and VGG16, were first compared using a training set in fourfold cross validation and one was selected as the feature extractor. To assess generalization errors, the independent test set was sequestered as truly unseen cases. A training set of a range of sizes from 10% to 75% was simulated by random drawing from the available training set in addition to 100% of the training set. Three commonly used feature classifiers, the linear discriminant, the support vector machine, and the random forest, were evaluated. A sequential feature selection method was used to find feature subsets that could achieve high classification performance in terms of the area under the receiver operating characteristic curve (AUC) in the validation set.
The extent of feature leakage and the impact of training set size were analyzed by comparison to the performance in the unseen test set. RESULTS All three classifiers showed large generalization error between the validation set and the independent sequestered test set at all sample sizes. The generalization error decreased as the sample size increased. At 100% of the sample size, one classifier achieved an AUC as high as 0.91 on the validation set while the corresponding performance on the unseen test set only reached an AUC of 0.72. CONCLUSIONS Our results demonstrate that large generalization errors can occur in AI tools due to feature leakage. Without evaluation on unseen test cases, optimistically biased performance may be reported inadvertently, and can lead to unrealistic expectations and reduce confidence for clinical implementation.
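The leakage mechanism described in this abstract can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the study's pipeline: it replaces DCNN features with pure-noise features, replaces the three classifiers with a simple mean-difference linear score, and uses hypothetical set sizes and function names (`leakage_demo`, `mean_difference_weights`). Because no real signal exists, any validation AUC well above 0.5 comes entirely from letting the validation set guide sequential feature selection.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum_pos = sum(rank + 1 for rank, i in enumerate(order) if labels[i] == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def make_noise_set(rng, n_cases, n_features):
    """Pure-noise features with alternating labels: there is nothing to learn."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_cases)]
    y = [i % 2 for i in range(n_cases)]
    return x, y

def mean_difference_weights(x, y):
    """Per-feature weight = class-mean difference, estimated on the training set."""
    weights = []
    for j in range(len(x[0])):
        pos = [row[j] for row, lab in zip(x, y) if lab == 1]
        neg = [row[j] for row, lab in zip(x, y) if lab == 0]
        weights.append(sum(pos) / len(pos) - sum(neg) / len(neg))
    return weights

def score(x, weights, subset):
    """Linear score restricted to the selected feature subset."""
    return [sum(weights[j] * row[j] for j in subset) for row in x]

def leakage_demo(seed=0, n_features=300, n_select=5):
    rng = random.Random(seed)
    train_x, train_y = make_noise_set(rng, 200, n_features)
    val_x, val_y = make_noise_set(rng, 60, n_features)
    test_x, test_y = make_noise_set(rng, 1000, n_features)  # sequestered, scored once
    weights = mean_difference_weights(train_x, train_y)
    chosen = []
    for _ in range(n_select):  # sequential forward selection
        best_j, best_auc = None, -1.0
        for j in range(n_features):
            if j in chosen:
                continue
            # The leak: candidate subsets are ranked by validation AUC.
            a = auc(score(val_x, weights, chosen + [j]), val_y)
            if a > best_auc:
                best_j, best_auc = j, a
        chosen.append(best_j)
    val_auc = auc(score(val_x, weights, chosen), val_y)
    test_auc = auc(score(test_x, weights, chosen), test_y)
    return val_auc, test_auc

if __name__ == "__main__":
    v, t = leakage_demo()
    print(f"validation AUC = {v:.2f}, sequestered test AUC = {t:.2f}")
```

The gap between the two printed values is the optimistic bias; the sequestered set gives an honest estimate only because it played no part in choosing the features.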
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI, USA
| | - Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, MI, USA
| | | | - Mark A Helvie
- Department of Radiology, University of Michigan, Ann Arbor, MI, USA
| |
|
11
|
Chan HP, Hadjiiski LM, Samala RK. Computer-aided diagnosis in the era of deep learning. Med Phys 2021; 47:e218-e227. [PMID: 32418340 DOI: 10.1002/mp.13764] [Citation(s) in RCA: 82] [Impact Index Per Article: 27.3] [Received: 01/29/2019] [Revised: 05/13/2019] [Accepted: 05/13/2019] [Indexed: 12/15/2022]
Abstract
Computer-aided diagnosis (CAD) has been a major field of research for the past few decades. CAD uses machine learning methods to analyze imaging and/or nonimaging patient data and makes assessment of the patient's condition, which can then be used to assist clinicians in their decision-making process. The recent success of the deep learning technology in machine learning spurs new research and development efforts to improve CAD performance and to develop CAD for many other complex clinical tasks. In this paper, we discuss the potential and challenges in developing CAD tools using deep learning technology or artificial intelligence (AI) in general, the pitfalls and lessons learned from CAD in screening mammography and considerations needed for future implementation of CAD or AI in clinical use. It is hoped that the past experiences and the deep learning technology will lead to successful advancement and lasting growth in this new era of CAD, thereby enabling CAD to deliver intelligent aids to improve health care.
Affiliation(s)
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, MI, 48109-5842, USA
| | - Lubomir M Hadjiiski
- Department of Radiology, University of Michigan, Ann Arbor, MI, 48109-5842, USA
| | - Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI, 48109-5842, USA
| |
|
12
|
Samala RK, Chan HP, Hadjiiski LM, Helvie MA, Richter CD. Generalization error analysis for deep convolutional neural network with transfer learning in breast cancer diagnosis. Phys Med Biol 2020; 65:105002. [PMID: 32208369 DOI: 10.1088/1361-6560/ab82e8] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/19/2022]
Abstract
The deep convolutional neural network (DCNN), now popularly called artificial intelligence (AI), has shown the potential to improve over previous computer-assisted tools in medical imaging developed in the past decades. A DCNN has millions of free parameters that need to be trained, but the training sample set is limited in size for most medical imaging tasks, so transfer learning is typically used. Automatic data mining may be an efficient way to enlarge the collected data set, but the mined data can be noisy, for example containing incorrect labels or even the wrong type of image. In this work we studied the generalization error of DCNN with transfer learning in medical imaging for the task of classifying malignant and benign masses on mammograms. With a finite available data set, we simulated a training set containing corrupted data or noisy labels. The balance between learning and memorization of the DCNN was manipulated by varying the proportion of corrupted data in the training set. The generalization error of DCNN was analyzed by the area under the receiver operating characteristic curve for the training and test sets and the weight changes after transfer learning. The study demonstrates that the transfer learning strategy of DCNN for such tasks needs to be designed properly, taking into consideration the constraints of the available training set having limited size and quality for the classification task at hand, to minimize memorization and improve generalizability.
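One way to simulate the noisy-label condition mentioned in this abstract is to flip a controlled fraction of binary labels. The sketch below is an assumption about how such corruption could be implemented, not the procedure used in the study; the function name `corrupt_labels`, the label encoding (0 = benign, 1 = malignant), and the fixed seed are all hypothetical choices for illustration.

```python
import random

def corrupt_labels(labels, fraction, seed=0):
    """Return a copy of binary labels with `fraction` of them flipped at random."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(fraction * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]

# Hypothetical toy labels: 0 = benign, 1 = malignant.
clean = [0, 1] * 50
for frac in (0.0, 0.1, 0.2, 0.5):
    noisy = corrupt_labels(clean, frac)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"requested {frac:.0%} corruption -> {changed} of {len(clean)} labels flipped")
```

Sweeping `fraction` while tracking training and test AUC separately is one way to observe the shift from learning toward memorization that the abstract refers to.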
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, United States of America
| | | | | | | | | |
|
13
|
Chan HP, Samala RK, Hadjiiski LM. CAD and AI for breast cancer-recent development and challenges. Br J Radiol 2020; 93:20190580. [PMID: 31742424 PMCID: PMC7362917 DOI: 10.1259/bjr.20190580] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 07/02/2019] [Revised: 11/13/2019] [Accepted: 11/17/2019] [Indexed: 12/15/2022]
Abstract
Computer-aided diagnosis (CAD) has been a popular area of research and development in the past few decades. In CAD, machine learning methods and multidisciplinary knowledge and techniques are used to analyze the patient information and the results can be used to assist clinicians in their decision making process. CAD may analyze imaging information alone or in combination with other clinical data. It may provide the analyzed information directly to the clinician or correlate the analyzed results with the likelihood of certain diseases based on statistical modeling of the past cases in the population. CAD systems can be developed to provide decision support for many applications in the patient care processes, such as lesion detection, characterization, cancer staging, treatment planning and response assessment, recurrence and prognosis prediction. The new state-of-the-art machine learning technique, known as deep learning (DL), has revolutionized speech and text recognition as well as computer vision. The potential of major breakthrough by DL in medical image analysis and other CAD applications for patient care has brought about unprecedented excitement of applying CAD, or artificial intelligence (AI), to medicine in general and to radiology in particular. In this paper, we will provide an overview of the recent developments of CAD using DL in breast imaging and discuss some challenges and practical issues that may impact the advancement of artificial intelligence and its integration into clinical workflow.
Affiliation(s)
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, MI, United States
| | - Ravi K. Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI, United States
| | | |
|
14
|
Abstract
Deep learning is the state-of-the-art machine learning approach. The success of deep learning in many pattern recognition applications has brought excitement and high expectations that deep learning, or artificial intelligence (AI), can bring revolutionary changes in health care. Early studies of deep learning applied to lesion detection or classification have reported superior performance compared to those by conventional techniques or even better than radiologists in some tasks. The potential of applying deep-learning-based medical image analysis to computer-aided diagnosis (CAD), thus providing decision support to clinicians and improving the accuracy and efficiency of various diagnostic and treatment processes, has spurred new research and development efforts in CAD. Despite the optimism in this new era of machine learning, the development and implementation of CAD or AI tools in clinical practice face many challenges. In this chapter, we will discuss some of these issues and efforts needed to develop robust deep-learning-based CAD tools and integrate these tools into the clinical workflow, thereby advancing towards the goal of providing reliable intelligent aids for patient care.
Affiliation(s)
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, MI, USA.
| | - Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI, USA
| | | | - Chuan Zhou
- Department of Radiology, University of Michigan, Ann Arbor, MI, USA
| |
|
15
|
Cha KH, Hadjiiski LM, Cohan RH, Chan HP, Caoili EM, Davenport MS, Samala RK, Weizer AZ, Alva A, Kirova-Nedyalkova G, Shampain K, Meyer N, Barkmeier D, Woolen S, Shankar PR, Francis IR, Palmbos P. Diagnostic Accuracy of CT for Prediction of Bladder Cancer Treatment Response with and without Computerized Decision Support. Acad Radiol 2019; 26:1137-1145. [PMID: 30424999 DOI: 10.1016/j.acra.2018.10.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/13/2018] [Revised: 09/23/2018] [Accepted: 10/09/2018] [Indexed: 10/27/2022]
Abstract
RATIONALE AND OBJECTIVES To evaluate whether a computed tomography (CT)-based computerized decision-support system for muscle-invasive bladder cancer treatment response assessment (CDSS-T) can improve identification of patients who have responded completely to neoadjuvant chemotherapy. MATERIALS AND METHODS Following Institutional Review Board approval, pre-chemotherapy and post-chemotherapy CT scans of 123 subjects with 157 muscle-invasive bladder cancer foci were collected retrospectively. CT data were analyzed with a CDSS-T that uses a combination of deep-learning convolutional neural network and radiomic features to distinguish muscle-invasive bladder cancers that have fully responded to neoadjuvant treatment from those that have not. Leave-one-case-out cross-validation was used to minimize overfitting. Five attending abdominal radiologists, four diagnostic radiology residents, two attending oncologists, and one attending urologist estimated the likelihood of pathologic T0 disease (complete response) by viewing paired pre/post-treatment CT scans placed side-by-side on an internally-developed graphical user interface. The observers provided an estimate without use of CDSS-T and then were permitted to revise their estimate after a CDSS-T-derived likelihood score was displayed. Observer estimates were analyzed with multi-reader, multi-case receiver operating characteristic methodology. The area under the curve (AUC) and the statistical significance of the difference were estimated. RESULTS The mean AUCs for assessment of pathologic T0 disease were 0.80 for CDSS-T alone, 0.74 for physicians not using CDSS-T, and 0.77 for physicians using CDSS-T. The increase in the physicians' performance was statistically significant (P < .05). CONCLUSION CDSS-T improves physician performance for identifying complete response of muscle-invasive bladder cancer to neoadjuvant chemotherapy.
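The leave-one-case-out cross-validation mentioned in this abstract can be sketched as a grouped split: every lesion belonging to the held-out case leaves the training set together, so no case contributes to both sides of a fold. The abstract does not define what constitutes a case; the sketch assumes one case per patient, and the identifiers and function name `leave_one_case_out` are hypothetical.

```python
from collections import defaultdict

def leave_one_case_out(case_ids):
    """Yield (held_out_case, train_indices, test_indices), one fold per case."""
    by_case = defaultdict(list)
    for idx, case in enumerate(case_ids):
        by_case[case].append(idx)
    for case, test_idx in by_case.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(case_ids)) if i not in held_out]
        yield case, train_idx, test_idx

# Hypothetical toy data: 5 lesions from 3 patients.
lesion_case_ids = ["p1", "p1", "p2", "p3", "p3"]
folds = list(leave_one_case_out(lesion_case_ids))
for case, train_idx, test_idx in folds:
    print(f"hold out {case}: train on {train_idx}, test on {test_idx}")
```

Splitting by case rather than by lesion matters when one patient contributes several lesions, since correlated lesions on both sides of a split would inflate the estimated performance.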
|
16
|
Wu E, Hadjiiski LM, Samala RK, Chan HP, Cha KH, Richter C, Cohan RH, Caoili EM, Paramagul C, Alva A, Weizer AZ. Deep Learning Approach for Assessment of Bladder Cancer Treatment Response. Tomography 2019; 5:201-208. [PMID: 30854458 PMCID: PMC6403041 DOI: 10.18383/j.tom.2018.00036] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 12/12/2022]
Abstract
We compared the performance of different deep learning-convolutional neural network (DL-CNN) models for bladder cancer treatment response assessment based on transfer learning by freezing different DL-CNN layers and varying the DL-CNN structure. Pre- and posttreatment computed tomography scans of 123 patients (cancers, 129; pre- and posttreatment cancer pairs, 158) undergoing chemotherapy were collected. After chemotherapy 33% of patients had T0 stage cancer (complete response). Regions of interest in pre- and posttreatment scans were extracted from the segmented lesions and combined into hybrid pre-post image pairs (h-ROIs). Training (pairs, 94; h-ROIs, 6209), validation (10 pairs) and test sets (54 pairs) were obtained. The DL-CNN consisted of 2 convolution (C1-C2), 2 locally connected (L3-L4), and 1 fully connected layer. The DL-CNN was trained with h-ROIs to classify cancers as fully responding (stage T0) or not fully responding to chemotherapy. Two radiologists estimated each lesion's likelihood of being stage T0 posttreatment. The test area under the ROC curve (AUC) was 0.73 for T0 prediction by the base DL-CNN structure with randomly initialized weights. The base DL-CNN structure with pretrained weights and transfer learning (no frozen layers) achieved test AUC of 0.79. The test AUCs for 3 modified DL-CNN structures (different C1-C2 max pooling filter sizes, strides, and padding, with transfer learning) were 0.72, 0.86, and 0.69. For the base DL-CNN with (C1) frozen, (C1-C2) frozen, and (C1-C2-L3) frozen, the test AUCs were 0.81, 0.78, and 0.71, respectively. The radiologists' AUCs were 0.76 and 0.77. DL-CNN performed better with pretrained than randomly initialized weights.
Affiliation(s)
| | | | | | | | | | | | | | | | | | - Ajjai Alva
- Internal Medicine-Hematology/Oncology, and
| | | |
|
17
|
Samala RK, Hadjiiski L, Helvie MA, Richter CD, Cha KH. Breast Cancer Diagnosis in Digital Breast Tomosynthesis: Effects of Training Sample Size on Multi-Stage Transfer Learning Using Deep Neural Nets. IEEE Trans Med Imaging 2019; 38:686-696. [PMID: 31622238 PMCID: PMC6812655 DOI: 10.1109/tmi.2018.2870343] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Indexed: 05/27/2023]
Abstract
In this paper, we developed a deep convolutional neural network (CNN) for the classification of malignant and benign masses in digital breast tomosynthesis (DBT) using a multi-stage transfer learning approach that utilized data from similar auxiliary domains for intermediate-stage fine-tuning. Breast imaging data from DBT, digitized screen-film mammography, and digital mammography totaling 4039 unique regions of interest (1797 malignant and 2242 benign) were collected. Using cross validation, we selected the best transfer network from six transfer networks by varying the level up to which the convolutional layers were frozen. In a single-stage transfer learning approach, knowledge from CNN trained on the ImageNet data was fine-tuned directly with the DBT data. In a multi-stage transfer learning approach, knowledge learned from ImageNet was first fine-tuned with the mammography data and then fine-tuned with the DBT data. Two transfer networks were compared for the second-stage transfer learning by freezing most of the CNN structures versus freezing only the first convolutional layer. We studied the dependence of the classification performance on training sample size for various transfer learning and fine-tuning schemes by varying the training data from 1% to 100% of the available sets. The area under the receiver operating characteristic curve (AUC) was used as a performance measure. The view-based AUC on the test set for single-stage transfer learning was 0.85 ± 0.05 and improved significantly (p < 0.05) to 0.91 ± 0.03 for multi-stage learning. This paper demonstrated that, when the training sample size from the target domain is limited, an additional stage of transfer learning using data from a similar auxiliary domain is advantageous.
|
18
|
Gordon MN, Hadjiiski LM, Cha KH, Samala RK, Chan HP, Cohan RH, Caoili EM. Deep-learning convolutional neural network: Inner and outer bladder wall segmentation in CT urography. Med Phys 2019; 46:634-648. [PMID: 30520055 DOI: 10.1002/mp.13326] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/05/2018] [Revised: 09/30/2018] [Accepted: 11/15/2018] [Indexed: 11/08/2022]
Abstract
PURPOSE We are developing a computerized segmentation tool for the inner and outer bladder wall as a part of an image analysis pipeline for CT urography (CTU). MATERIALS AND METHODS A data set of 172 CTU cases was collected retrospectively with Institutional Review Board (IRB) approval. The data set was randomly split into two independent sets of training (81 cases) and testing (92 cases) which were manually outlined for both the inner and outer wall. We trained a deep-learning convolutional neural network (DL-CNN) to distinguish the bladder wall from the inside and outside of the bladder using neighborhood information. Approximately, 240 000 regions of interest (ROIs) of 16 × 16 pixels in size were extracted from regions in the training cases identified by the manually outlined inner and outer bladder walls to form a training set for the DL-CNN; half of the ROIs were selected to include the bladder wall and the other half were selected to exclude the bladder wall with some of these ROIs being inside the bladder and the rest outside the bladder entirely. The DL-CNN trained on these ROIs was applied to the cases in the test set slice-by-slice to generate a bladder wall likelihood map where the gray level of a given pixel represents the likelihood that a given pixel would belong to the bladder wall. We then used the DL-CNN likelihood map as an energy term in the energy equation of a cascaded level sets method to segment the inner and outer bladder wall. The DL-CNN segmentation with level sets was compared to the three-dimensional (3D) hand-segmented contours as a reference standard. RESULTS For the inner wall contour, the training set achieved the average volume intersection, average volume error, average absolute volume error, and average distance of 90.0 ± 8.7%, -4.2 ± 18.4%, 12.9 ± 13.9%, and 3.0 ± 1.6 mm, respectively. The corresponding values for the test set were 86.9 ± 9.6%, -8.3 ± 37.7%, 18.4 ± 33.8%, and 3.4 ± 1.8 mm, respectively. 
For the outer wall contour, the training set achieved the values of 93.7 ± 3.9%, -7.8 ± 11.4%, 10.3 ± 9.3%, and 3.0 ± 1.2 mm, respectively. The corresponding values for the test set were 87.5 ± 9.9%, -1.2 ± 20.8%, 11.9 ± 17.0%, and 3.5 ± 2.3 mm, respectively. CONCLUSIONS Our study demonstrates that DL-CNN-assisted level sets can effectively segment bladder walls from the inner bladder and outer structures despite a lack of consistent distinctions along the inner wall. However, even with the addition of level sets, the inner and outer walls may still be over-segmented and the DL-CNN-assisted level sets may incorrectly segment parts of the prostate that overlap with the outer bladder wall. The outer wall segmentation was improved compared to our previous method and the DL-CNN-assisted level sets were also able to segment the inner bladder wall with similar performance. This study shows the DL-CNN-assisted level set segmentation tool can effectively segment the inner and outer wall of the bladder.
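The volume-based measures reported in this abstract can be illustrated with voxel sets. The abstract does not give the formulas, so the definitions below are assumptions: volume intersection is taken as the share of the reference volume covered by the segmentation, and volume error as the signed difference relative to the reference volume, so that a negative value indicates over-segmentation. The average distance measure is omitted, and the function name `volume_metrics` is hypothetical.

```python
def volume_metrics(reference, segmented):
    """Compare two sets of voxel coordinates; all results are percentages.

    Assumed definitions (not taken from the abstract):
      intersection     = |reference AND segmented| / |reference|
      volume error     = (|reference| - |segmented|) / |reference|
      abs volume error = absolute value of the volume error
    """
    if not reference:
        raise ValueError("reference volume must be non-empty")
    v_ref, v_seg = len(reference), len(segmented)
    intersection = 100.0 * len(reference & segmented) / v_ref
    volume_error = 100.0 * (v_ref - v_seg) / v_ref  # negative => over-segmentation
    return intersection, volume_error, abs(volume_error)

# Hypothetical toy volumes: a 4 x 4 slab and two imperfect segmentations of it.
ref = {(x, y, 0) for x in range(4) for y in range(4)}
shifted = {(x + 1, y, 0) for x in range(4) for y in range(4)}  # same size, offset
larger = {(x, y, 0) for x in range(5) for y in range(4)}       # over-segmented

print(volume_metrics(ref, shifted))
print(volume_metrics(ref, larger))
```

The shifted example shows why the measures are reported together: a segmentation can have zero volume error while still missing part of the reference, which only the intersection measure reveals.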
Affiliation(s)
- Marshall N Gordon
- Department of Radiology, The University of Michigan, Ann Arbor, MI, 48109-0904, USA
| | - Lubomir M Hadjiiski
- Department of Radiology, The University of Michigan, Ann Arbor, MI, 48109-0904, USA
| | - Kenny H Cha
- Department of Radiology, The University of Michigan, Ann Arbor, MI, 48109-0904, USA
| | - Ravi K Samala
- Department of Radiology, The University of Michigan, Ann Arbor, MI, 48109-0904, USA
| | - Heang-Ping Chan
- Department of Radiology, The University of Michigan, Ann Arbor, MI, 48109-0904, USA
| | - Richard H Cohan
- Department of Radiology, The University of Michigan, Ann Arbor, MI, 48109-0904, USA
| | - Elaine M Caoili
- Department of Radiology, The University of Michigan, Ann Arbor, MI, 48109-0904, USA
| |
|
19
|
Samala RK, Chan HP, Hadjiiski LM, Helvie MA, Richter C, Cha K. Evolutionary pruning of transfer learned deep convolutional neural network for breast cancer diagnosis in digital breast tomosynthesis. Phys Med Biol 2018; 63:095005. [PMID: 29616660 PMCID: PMC5967610 DOI: 10.1088/1361-6560/aabb5b] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 11/11/2022]
Abstract
Deep learning models are highly parameterized, resulting in difficulty in inference and transfer learning for image recognition tasks. In this work, we propose a layered pathway evolution method to compress a deep convolutional neural network (DCNN) for classification of masses in digital breast tomosynthesis (DBT). The objective is to prune the number of tunable parameters while preserving the classification accuracy. In the first stage transfer learning, 19 632 augmented regions-of-interest (ROIs) from 2454 mass lesions on mammograms were used to fine-tune a DCNN pre-trained on ImageNet. In the second stage transfer learning, the DCNN was used as a feature extractor followed by feature selection and random forest classification. The pathway evolution was performed using genetic algorithm in an iterative approach with tournament selection driven by count-preserving crossover and mutation. The second stage was trained with 9120 DBT ROIs from 228 mass lesions using leave-one-case-out cross-validation. The DCNN was reduced by 87% in the number of neurons, 34% in the number of parameters, and 95% in the number of multiply-and-add operations required in the convolutional layers. The test AUCs on 89 mass lesions from 94 independent DBT cases before and after pruning were 0.88 and 0.90, respectively, and the difference was not statistically significant (p > 0.05). The proposed DCNN compression approach can reduce the number of required operations by 95% while maintaining the classification performance. The approach can be extended to other deep neural networks and imaging tasks where transfer learning is appropriate.
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, United States of America
|
20
|
Li S, Wei J, Chan HP, Helvie MA, Roubidoux MA, Lu Y, Zhou C, Hadjiiski LM, Samala RK. Computer-aided assessment of breast density: comparison of supervised deep learning and feature-based statistical learning. Phys Med Biol 2018; 63:025005. [PMID: 29210358 DOI: 10.1088/1361-6560/aa9f87] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 11/12/2022]
Abstract
Breast density is one of the most significant factors associated with cancer risk. In this study, our purpose was to develop a supervised deep learning approach for automated estimation of percentage density (PD) on digital mammograms (DMs). The input 'for processing' DM was first log-transformed, enhanced by a multi-resolution preprocessing scheme, and subsampled to a pixel size of 800 µm × 800 µm from 100 µm × 100 µm. A deep convolutional neural network (DCNN) was trained to estimate a probability map of breast density (PMD) by using a domain adaptation resampling method. The PD was estimated as the ratio of the dense area to the breast area based on the PMD. The DCNN approach was compared to a feature-based statistical learning approach. Gray level, texture and morphological features were extracted and a least absolute shrinkage and selection operator was used to combine the features into a feature-based PMD. With approval of the Institutional Review Board, we retrospectively collected a training set of 478 DMs and an independent test set of 183 DMs from patient files in our institution. Two experienced Mammography Quality Standards Act (MQSA) radiologists interactively segmented PD as the reference standard. Ten-fold cross-validation was used for model selection and evaluation with the training set. With cross-validation, DCNN obtained a Dice's coefficient (DC) of 0.79 ± 0.13 and Pearson's correlation (r) of 0.97, whereas feature-based learning obtained DC = 0.72 ± 0.18 and r = 0.85. For the independent test set, DCNN achieved DC = 0.76 ± 0.09 and r = 0.94, while feature-based learning achieved DC = 0.62 ± 0.21 and r = 0.75. Our DCNN approach was significantly better and more robust than the feature-based learning approach for automated PD estimation on DMs, demonstrating its potential use for automated density reporting as well as for model-based risk prediction.
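The two quantities reported in this abstract, percentage density and Dice's coefficient, can be illustrated with a minimal sketch. It assumes the probability map has already been thresholded into a binary dense-tissue mask (the abstract does not state how that is done), and the function names and toy masks are hypothetical.

```python
def percent_density(dense_mask, breast_mask):
    """Dense area inside the breast divided by breast area, as a percentage."""
    breast_area = sum(1 for row in breast_mask for v in row if v)
    if breast_area == 0:
        raise ValueError("breast mask is empty")
    dense_area = sum(
        1
        for d_row, b_row in zip(dense_mask, breast_mask)
        for d, b in zip(d_row, b_row)
        if d and b
    )
    return 100.0 * dense_area / breast_area


def dice_coefficient(mask_a, mask_b):
    """Dice's coefficient: 2 * |A and B| / (|A| + |B|) for two binary masks."""
    size_a = sum(1 for row in mask_a for v in row if v)
    size_b = sum(1 for row in mask_b for v in row if v)
    if size_a + size_b == 0:
        raise ValueError("both masks are empty")
    overlap = sum(
        1
        for a_row, b_row in zip(mask_a, mask_b)
        for a, b in zip(a_row, b_row)
        if a and b
    )
    return 2.0 * overlap / (size_a + size_b)


# Toy 2 x 4 example (hypothetical values, not data from the study).
breast = [[1, 1, 1, 1], [1, 1, 1, 1]]
dense_auto = [[1, 1, 0, 0], [0, 0, 0, 0]]  # automated dense-tissue mask
dense_ref = [[1, 1, 1, 0], [0, 0, 0, 0]]   # radiologist reference mask

pd_auto = percent_density(dense_auto, breast)
dice = dice_coefficient(dense_auto, dense_ref)
```

In this toy case the automated mask covers 2 of 8 breast pixels, and it shares 2 pixels with a 3-pixel reference mask.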
Affiliation(s)
- Songfeng Li
- School of Mathematics, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China. School of Data and Computer Science, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
|
21
|
Samala RK, Chan HP, Hadjiiski LM, Helvie MA, Cha KH, Richter CD. Multi-task transfer learning deep convolutional neural network: application to computer-aided diagnosis of breast cancer on mammograms. Phys Med Biol 2017; 62:8894-8908. [PMID: 29035873 PMCID: PMC5859950 DOI: 10.1088/1361-6560/aa93d4] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Indexed: 11/12/2022]
Abstract
Transfer learning in deep convolutional neural networks (DCNNs) is an important step in its application to medical imaging tasks. We propose a multi-task transfer learning DCNN with the aim of translating the 'knowledge' learned from non-medical images to medical diagnostic tasks through supervised training and increasing the generalization capabilities of DCNNs by simultaneously learning auxiliary tasks. We studied this approach in an important application: classification of malignant and benign breast masses. With Institutional Review Board (IRB) approval, digitized screen-film mammograms (SFMs) and digital mammograms (DMs) were collected from our patient files and additional SFMs were obtained from the Digital Database for Screening Mammography. The data set consisted of 2242 views with 2454 masses (1057 malignant, 1397 benign). In single-task transfer learning, the DCNN was trained and tested on SFMs. In multi-task transfer learning, SFMs and DMs were used to train the DCNN, which was then tested on SFMs. N-fold cross-validation with the training set was used for training and parameter optimization. On the independent test set, the multi-task transfer learning DCNN was found to have significantly (p = 0.007) higher performance compared to the single-task transfer learning DCNN. This study demonstrates that multi-task transfer learning may be an effective approach for training DCNN in medical imaging applications when training samples from a single modality are limited.
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, United States of America
|
22
|
Lu Y, Chan HP, Wei J, Hadjiiski LM, Samala RK. Improving image quality for digital breast tomosynthesis: an automated detection and diffusion-based method for metal artifact reduction. Phys Med Biol 2017; 62:7765-7783. [PMID: 28832336 DOI: 10.1088/1361-6560/aa8803] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
Abstract
In digital breast tomosynthesis (DBT), the high-attenuation metallic clips marking a previous biopsy site in the breast cause errors in the estimation of attenuation along the ray paths intersecting the markers during reconstruction, which result in interplane and inplane artifacts obscuring the visibility of subtle lesions. We proposed a new metal artifact reduction (MAR) method to improve image quality. Our method uses automatic detection and segmentation to generate a marker location map for each projection (PV). A voting technique based on the geometric correlation among different PVs is designed to reduce false positives (FPs) and to label the pixels on the PVs and the voxels in the imaged volume that represent the location and shape of the markers. An iterative diffusion method replaces the labeled pixels on the PVs with estimated tissue intensity from the neighboring regions while preserving the original pixel values in the neighboring regions. The inpainted PVs are then used for DBT reconstruction. The markers are repainted on the reconstructed DBT slices for radiologists' information. The MAR method is independent of reconstruction techniques or acquisition geometry. For the training set, the method achieved 100% success rate with one FP in 19 views. For the test set, the success rate by view was 97.2% for core biopsy microclips and 66.7% for clusters of large post-lumpectomy markers with a total of 10 FPs in 58 views. All FPs were large dense benign calcifications that also generated artifacts if they were not corrected by MAR. For the views with successful detection, the metal artifacts were reduced to a level that was not visually apparent in the reconstructed slices. The visibility of breast lesions obscured by the reconstruction artifacts from the metallic markers was restored.
|
23
|
Samala RK, Chan HP, Hadjiiski L, Helvie MA, Wei J, Cha K. Mass detection in digital breast tomosynthesis: Deep convolutional neural network with transfer learning from mammography. Med Phys 2017; 43:6654. [PMID: 27908154 DOI: 10.1118/1.4967345] [Citation(s) in RCA: 192] [Impact Index Per Article: 27.4] [Indexed: 01/01/2023]
Abstract
PURPOSE Develop a computer-aided detection (CAD) system for masses in digital breast tomosynthesis (DBT) volume using a deep convolutional neural network (DCNN) with transfer learning from mammograms. METHODS A data set containing 2282 digitized film and digital mammograms and 324 DBT volumes was collected with IRB approval. The mass of interest on the images was marked by an experienced breast radiologist as reference standard. The data set was partitioned into a training set (2282 mammograms with 2461 masses and 230 DBT views with 228 masses) and an independent test set (94 DBT views with 89 masses). For DCNN training, the region of interest (ROI) containing the mass (true positive) was extracted from each image. False positive (FP) ROIs were identified at prescreening by their previously developed CAD systems. After data augmentation, a total of 45 072 mammographic ROIs and 37 450 DBT ROIs were obtained. Data normalization and reduction of non-uniformity in the ROIs across heterogeneous data were achieved using a background correction method applied to each ROI. A DCNN with four convolutional layers and three fully connected (FC) layers was first trained on the mammography data. Jittering and dropout techniques were used to reduce overfitting. After training with the mammographic ROIs, all weights in the first three convolutional layers were frozen, and only the last convolutional layer and the FC layers were randomly initialized again and trained using the DBT training ROIs. The authors compared the performances of two CAD systems for mass detection in DBT: one used the DCNN-based approach and the other used their previously developed feature-based approach for FP reduction. The prescreening stage was identical in both systems, passing the same set of mass candidates to the FP reduction stage.
For the feature-based CAD system, 3D clustering and active contour method was used for segmentation; morphological, gray level, and texture features were extracted and merged with a linear discriminant classifier to score the detected masses. For the DCNN-based CAD system, ROIs from five consecutive slices centered at each candidate were passed through the trained DCNN and a mass likelihood score was generated. The performances of the CAD systems were evaluated using free-response ROC curves and the performance difference was analyzed using a non-parametric method. RESULTS Before transfer learning, the DCNN trained only on mammograms with an AUC of 0.99 classified DBT masses with an AUC of 0.81 in the DBT training set. After transfer learning with DBT, the AUC improved to 0.90. For breast-based CAD detection in the test set, the sensitivity for the feature-based and the DCNN-based CAD systems was 83% and 91%, respectively, at 1 FP/DBT volume. The difference between the performances for the two systems was statistically significant (p-value < 0.05). CONCLUSIONS The image patterns learned from the mammograms were transferred to the mass detection on DBT slices through the DCNN. This study demonstrated that large data sets collected from mammography are useful for developing new CAD systems for DBT, alleviating the problem and effort of collecting entirely new large data sets for the new modality.
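The operating point quoted in the results (sensitivity at 1 FP per DBT volume) is read off a free-response ROC curve. The sketch below shows one simple way such a point can be computed from scored candidates. It is an illustration only: the candidate format, the rule that a lesion counts once however many candidates hit it, and all names and numbers are assumptions, not the authors' evaluation code.

```python
def froc_curve(candidates, n_lesions, n_volumes):
    """Return (FPs per volume, sensitivity) at every score threshold.

    candidates: list of (score, lesion_id) pairs, where lesion_id is None
    for a false positive and an identifier for a candidate that hits a lesion.
    """
    points = []
    thresholds = sorted({score for score, _ in candidates}, reverse=True)
    for threshold in thresholds:
        kept = [(s, lid) for s, lid in candidates if s >= threshold]
        false_positives = sum(1 for _, lid in kept if lid is None)
        detected = {lid for _, lid in kept if lid is not None}  # each lesion once
        points.append((false_positives / n_volumes, len(detected) / n_lesions))
    return points


def sensitivity_at(points, max_fp_per_volume):
    """Highest sensitivity reachable without exceeding the given FP rate."""
    eligible = [sens for fp_rate, sens in points if fp_rate <= max_fp_per_volume]
    return max(eligible, default=0.0)


# Hypothetical candidates from 2 volumes containing 4 lesions in total.
candidates = [
    (0.9, "L1"), (0.8, None), (0.7, "L2"), (0.6, None),
    (0.5, "L1"), (0.4, None), (0.3, "L3"),
]
points = froc_curve(candidates, n_lesions=4, n_volumes=2)
sens_at_1fp = sensitivity_at(points, max_fp_per_volume=1.0)
```

With these toy inputs, two of the four lesions are detected before the false-positive rate exceeds 1 per volume.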
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
- Lubomir Hadjiiski
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
- Mark A Helvie
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
- Jun Wei
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
- Kenny Cha
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
|
24
|
Cha KH, Hadjiiski L, Samala RK, Chan HP, Caoili EM, Cohan RH. Urinary bladder segmentation in CT urography using deep-learning convolutional neural network and level sets. Med Phys 2016; 43:1882. [PMID: 27036584 DOI: 10.1118/1.4944498] [Citation(s) in RCA: 171] [Impact Index Per Article: 21.4] [Indexed: 11/07/2022]
Abstract
PURPOSE The authors are developing a computerized system for bladder segmentation in CT urography (CTU) as a critical component for computer-aided detection of bladder cancer. METHODS A deep-learning convolutional neural network (DL-CNN) was trained to distinguish between the inside and the outside of the bladder using 160 000 regions of interest (ROI) from CTU images. The trained DL-CNN was used to estimate the likelihood of an ROI being inside the bladder for ROIs centered at each voxel in a CTU case, resulting in a likelihood map. Thresholding and hole-filling were applied to the map to generate the initial contour for the bladder, which was then refined by 3D and 2D level sets. The segmentation performance was evaluated using 173 cases: 81 cases in the training set (42 lesions, 21 wall thickenings, and 18 normal bladders) and 92 cases in the test set (43 lesions, 36 wall thickenings, and 13 normal bladders). The computerized segmentation accuracy using the DL likelihood map was compared to that using a likelihood map generated by Haar features and a random forest classifier, and that using our previous conjoint level set analysis and segmentation system (CLASS) without using a likelihood map. All methods were evaluated relative to the 3D hand-segmented reference contours. RESULTS With DL-CNN-based likelihood map and level sets, the average volume intersection ratio, average percent volume error, average absolute volume error, average minimum distance, and the Jaccard index for the test set were 81.9% ± 12.1%, 10.2% ± 16.2%, 14.0% ± 13.0%, 3.6 ± 2.0 mm, and 76.2% ± 11.8%, respectively. With the Haar-feature-based likelihood map and level sets, the corresponding values were 74.3% ± 12.7%, 13.0% ± 22.3%, 20.5% ± 15.7%, 5.7 ± 2.6 mm, and 66.7% ± 12.6%, respectively. With our previous CLASS with local contour refinement (LCR) method, the corresponding values were 78.0% ± 14.7%, 16.5% ± 16.8%, 18.2% ± 15.0%, 3.8 ± 2.3 mm, and 73.9% ± 13.5%, respectively. 
CONCLUSIONS The authors demonstrated that the DL-CNN can overcome the strong boundary between two regions that have large difference in gray levels and provides a seamless mask to guide level set segmentation, which has been a problem for many gradient-based segmentation methods. Compared to our previous CLASS with LCR method, which required two user inputs to initialize the segmentation, DL-CNN with level sets achieved better segmentation performance while using a single user input. Compared to the Haar-feature-based likelihood map, the DL-CNN-based likelihood map could guide the level sets to achieve better segmentation. The results demonstrate the feasibility of our new approach of using DL-CNN in combination with level sets for segmentation of the bladder.
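The overlap and volume measures listed in the results can be sketched for a single case as follows. The abstract does not give the formulas, so the definitions here (all normalized by the reference volume, with the Jaccard index as intersection over union) are assumed conventions, and the function name and voxel sets are hypothetical.

```python
def segmentation_metrics(reference, computed):
    """Compare two segmentations given as sets of (z, y, x) voxel indices.

    Assumed definitions for one case:
      volume intersection ratio = |ref and cmp| / |ref|
      percent volume error      = (|ref| - |cmp|) / |ref|   (signed)
      absolute volume error     = abs(|ref| - |cmp|) / |ref|
      Jaccard index             = |ref and cmp| / |ref or cmp|
    """
    if not reference:
        raise ValueError("reference segmentation is empty")
    intersection = len(reference & computed)
    union = len(reference | computed)
    volume_difference = len(reference) - len(computed)
    return {
        "volume_intersection_ratio": intersection / len(reference),
        "percent_volume_error": volume_difference / len(reference),
        "absolute_volume_error": abs(volume_difference) / len(reference),
        "jaccard_index": intersection / union,
    }


# Toy example: 10 reference voxels, 9 computed voxels, 8 in common.
reference = {(0, 0, x) for x in range(10)}
computed = {(0, 0, x) for x in range(2, 10)} | {(0, 1, 0)}
metrics = segmentation_metrics(reference, computed)
```

Averaging each value over all test cases would give summary figures of the kind reported above; the minimum-distance measure is omitted because it needs contour geometry rather than voxel counts.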
Affiliation(s)
- Kenny H Cha
- Department of Radiology, The University of Michigan, Ann Arbor, Michigan 48109-0904
- Lubomir Hadjiiski
- Department of Radiology, The University of Michigan, Ann Arbor, Michigan 48109-0904
- Ravi K Samala
- Department of Radiology, The University of Michigan, Ann Arbor, Michigan 48109-0904
- Heang-Ping Chan
- Department of Radiology, The University of Michigan, Ann Arbor, Michigan 48109-0904
- Elaine M Caoili
- Department of Radiology, The University of Michigan, Ann Arbor, Michigan 48109-0904
- Richard H Cohan
- Department of Radiology, The University of Michigan, Ann Arbor, Michigan 48109-0904
|
25
|
Cha KH, Hadjiiski LM, Samala RK, Chan HP, Cohan RH, Caoili EM, Paramagul C, Alva A, Weizer AZ. Bladder Cancer Segmentation in CT for Treatment Response Assessment: Application of Deep-Learning Convolution Neural Network-A Pilot Study. Tomography 2016; 2:421-429. [PMID: 28105470 PMCID: PMC5241049 DOI: 10.18383/j.tom.2016.00184] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Indexed: 11/24/2022]
Abstract
Assessing the response of bladder cancer to neoadjuvant chemotherapy is crucial for reducing morbidity and increasing quality of life of patients. Changes in tumor volume during treatment are generally used to predict treatment outcome. We are developing a method for bladder cancer segmentation in CT using a pilot data set of 62 cases. 65 000 regions of interest were extracted from pre-treatment CT images to train a deep-learning convolution neural network (DL-CNN) for tumor boundary detection using leave-one-case-out cross-validation. The results were compared to our previous AI-CALS method. For all lesions in the data set, the longest diameter and its perpendicular were measured by two radiologists, and 3D manual segmentation was obtained from one radiologist. The World Health Organization (WHO) criteria and the Response Evaluation Criteria In Solid Tumors (RECIST) were calculated, and the prediction accuracy of complete response to chemotherapy was estimated by the area under the receiver operating characteristic curve (AUC). The AUCs were 0.73 ± 0.06, 0.70 ± 0.07, and 0.70 ± 0.06, respectively, for the volume change calculated using DL-CNN segmentation, the AI-CALS and the manual contours. The differences did not achieve statistical significance. The AUCs using the WHO criteria were 0.63 ± 0.07 and 0.61 ± 0.06, while the AUCs using RECIST were 0.65 ± 0.07 and 0.63 ± 0.06 for the two radiologists, respectively. Our results indicate that DL-CNN can produce accurate bladder cancer segmentation for calculation of tumor size change in response to treatment. The volume change performed better than the estimations from the WHO criteria and RECIST for the prediction of complete response.
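The three size-change estimates compared in this abstract can be illustrated for a single lesion. The sketch assumes the usual conventions: the WHO measure is the product of the longest diameter and its perpendicular, the RECIST measure is the longest diameter alone, and the volume measure is the segmented volume. The abstract does not spell out the formulas, so the function names and the numbers below are hypothetical.

```python
def percent_change(pre, post):
    """Relative change from pre- to post-treatment, as a percentage."""
    if pre <= 0:
        raise ValueError("pre-treatment measurement must be positive")
    return 100.0 * (post - pre) / pre


def who_change(pre_long, pre_perp, post_long, post_perp):
    """Bidimensional (WHO-style) change: longest diameter times its perpendicular."""
    return percent_change(pre_long * pre_perp, post_long * post_perp)


def recist_change(pre_long, post_long):
    """Unidimensional (RECIST-style) change: longest diameter only."""
    return percent_change(pre_long, post_long)


# Hypothetical lesion: 40 mm x 30 mm before treatment, 20 mm x 15 mm after,
# with segmented volumes of 20000 mm^3 and 2500 mm^3.
who = who_change(40, 30, 20, 15)
recist = recist_change(40, 20)
volume = percent_change(20000, 2500)
```

For this lesion, halving both diameters gives a 50% decrease by the unidimensional measure, 75% by the bidimensional one, and 87.5% by volume, which shows how the same shrinkage is weighted differently by each estimate.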
Affiliation(s)
- Kenny H Cha
- Department of Radiology, University of Michigan, Ann Arbor, Michigan
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, Michigan
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, Michigan
- Richard H Cohan
- Department of Radiology, University of Michigan, Ann Arbor, Michigan
- Elaine M Caoili
- Department of Radiology, University of Michigan, Ann Arbor, Michigan
- Ajjai Alva
- Department of Internal Medicine, Hematology-Oncology, University of Michigan, Ann Arbor, Michigan
- Alon Z Weizer
- Department of Urology, Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan
|
26
|
Samala RK, Chan HP, Hadjiiski LM, Helvie MA. Analysis of computer-aided detection techniques and signal characteristics for clustered microcalcifications on digital mammography and digital breast tomosynthesis. Phys Med Biol 2016; 61:7092-7112. [PMID: 27648708 DOI: 10.1088/0031-9155/61/19/7092] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/08/2023]
Abstract
With IRB approval, digital breast tomosynthesis (DBT) images of human subjects were collected using a GE GEN2 DBT prototype system. Corresponding digital mammograms (DMs) of the same subjects were collected retrospectively from patient files. The data set contained a total of 237 views of DBT and an equal number of DM views from 120 human subjects, each included 163 views with microcalcification clusters (MCs) and 74 views without MCs. The data set was separated into training and independent test sets. The pre-processing, object prescreening and segmentation, false positive reduction and clustering strategies for MC detection by three computer-aided detection (CADe) systems designed for DM, DBT, and a planar projection image generated from DBT were analyzed. Receiver operating characteristic (ROC) curves based on features extracted from microcalcifications and free-response ROC (FROC) curves based on scores from MCs were used to quantify the performance of the systems. Jackknife FROC (JAFROC) and non-parametric analysis methods were used to determine the statistical difference between the FROC curves. The difference between the CAD_DM and CAD_DBT systems when the false positive rate was estimated from cases without MCs did not reach statistical significance. The study indicates that the large search space in DBT may not be a limiting factor for CADe to achieve similar performance as that observed in DM.
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, USA
|
27
|
Samala RK, Chan HP, Lu Y, Hadjiiski LM, Wei J, Helvie MA. Computer-aided detection system for clustered microcalcifications in digital breast tomosynthesis using joint information from volumetric and planar projection images. Phys Med Biol 2015; 60:8457-79. [PMID: 26464355 DOI: 10.1088/0031-9155/60/21/8457] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/11/2022]
Abstract
We propose a novel approach for the detection of microcalcification clusters (MCs) using joint information from digital breast tomosynthesis (DBT) volume and planar projection (PPJ) image. A data set of 307 DBT views was collected with IRB approval using a prototype DBT system. The system acquires 21 projection views (PVs) from a wide tomographic angle of 60° (60°-21PV) at about twice the dose of a digital mammography (DM) system, which allows us the flexibility of simulating other DBT acquisition geometries using a subset of the PVs. In this study, we simulated a 30° DBT geometry using the central 11 PVs (30°-11PV). The narrower tomographic angle is closer to DBT geometries commercially available or under development and the dose is matched approximately to that of a DM. We developed a new joint-CAD system for detection of clustered microcalcifications. The DBT volume was reconstructed with a multiscale bilateral filtering regularized method and a PPJ image was generated from the reconstructed volume. Task-specific detection strategies were designed to combine information from the DBT volume and the PPJ image. The data set was divided into a training set (127 views with MCs) and an independent test set (104 views with MCs and 76 views without MCs). The joint-CAD system outperformed the individual CAD systems for DBT volume or PPJ image alone; the differences in the test performances were statistically significant (p < 0.05) using JAFROC analysis.
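The simulated narrow-angle geometry described here amounts to keeping the central projection views of the wide-angle scan. A minimal sketch of that selection is given below, assuming the 21 views are evenly spaced and symmetric about 0°; the function names are hypothetical.

```python
def projection_angles(n_views, total_angle_deg):
    """Evenly spaced projection angles, symmetric about 0 degrees (assumed layout)."""
    step = total_angle_deg / (n_views - 1)
    return [-total_angle_deg / 2 + i * step for i in range(n_views)]


def central_subset(angles, n_keep):
    """Keep the n_keep central views; requires a symmetric trim on both sides."""
    if n_keep > len(angles) or (len(angles) - n_keep) % 2:
        raise ValueError("cannot take a symmetric central subset of this size")
    start = (len(angles) - n_keep) // 2
    return angles[start:start + n_keep]


wide = projection_angles(n_views=21, total_angle_deg=60)  # 60 degrees, 21 PVs
narrow = central_subset(wide, n_keep=11)                   # central 11 PVs
narrow_span = narrow[-1] - narrow[0]
```

Under the even-spacing assumption the views are 3° apart, so the central 11 span 30°, matching the 30°-11PV geometry named in the abstract. Using 11 of 21 views is also consistent with roughly halving the dose.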
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, USA
|
28
|
Lu Y, Chan HP, Wei J, Hadjiiski LM, Samala RK. Multiscale bilateral filtering for improving image quality in digital breast tomosynthesis. Med Phys 2015; 42:182-95. [PMID: 25563259 DOI: 10.1118/1.4903283] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Abstract
PURPOSE Detection of subtle microcalcifications in digital breast tomosynthesis (DBT) is a challenging task because of the large, noisy DBT volume. It is important to enhance the contrast-to-noise ratio (CNR) of microcalcifications in DBT reconstruction. Most regularization methods depend on local gradient and may treat the ill-defined margins or subtle spiculations of masses and subtle microcalcifications as noise because of their small gradient. The authors developed a new multiscale bilateral filtering (MSBF) regularization method for the simultaneous algebraic reconstruction technique (SART) to improve the CNR of microcalcifications without compromising the quality of masses. METHODS The MSBF exploits a multiscale structure of DBT images to suppress noise and selectively enhance high frequency structures. At the end of each SART iteration, every DBT slice is decomposed into several frequency bands via Laplacian pyramid decomposition. No regularization is applied to the low frequency bands so that subtle edges of masses and structured background are preserved. Bilateral filtering is applied to the high frequency bands to enhance microcalcifications while suppressing noise. The regularized DBT images are used for updating in the next SART iteration. The new MSBF method was compared with the nonconvex total p-variation (TpV) method for noise regularization with SART. A GE GEN2 prototype DBT system was used for acquisition of projections at 21 angles in 3° increments over a ± 30° range. The reconstruction image quality with no regularization (NR) and that with the two regularization methods were compared using the DBT scans of a heterogeneous breast phantom and several human subjects with masses and microcalcifications. The CNR and the full width at half maximum (FWHM) of the line profiles of microcalcifications and across the spiculations within their in-focus DBT slices were used as image quality measures. 
RESULTS The MSBF method reduced contouring artifacts and enhanced the CNR of microcalcifications compared to the TpV method, thus preserving the image quality of the structured background. The MSBF method achieved the highest CNR of microcalcifications among the three methods. The FWHM of the microcalcifications and mass spiculations resulting from the MSBF method was comparable to that without regularization, and superior to that of the TpV method. CONCLUSIONS The SART regularized by the multiscale bilateral filtering method enhanced the CNR of microcalcifications and preserved the sharpness of microcalcifications and spiculated masses. The MSBF method provided better image quality of the structured background and was superior to TpV and NR for enhancing microcalcifications while preserving the appearance of mass margins.
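The contrast-to-noise ratio used as an image quality measure can be sketched with one common definition: the difference between mean signal and mean background intensity, divided by the background standard deviation. The abstract does not state the exact formula the authors used, so this definition, the function name, and the pixel values are assumptions for illustration.

```python
from statistics import mean, pstdev


def contrast_to_noise_ratio(signal_pixels, background_pixels):
    """CNR = (mean signal - mean background) / background standard deviation."""
    noise = pstdev(background_pixels)
    if noise == 0:
        raise ValueError("background has zero variance; CNR is undefined")
    return (mean(signal_pixels) - mean(background_pixels)) / noise


# Hypothetical pixel values for one microcalcification and its surroundings.
signal = [130, 134, 132]
background = [100, 104, 96, 100]
cnr = contrast_to_noise_ratio(signal, background)
```

A regularization method that suppresses background noise without lowering the signal peak reduces the denominator and therefore raises this ratio.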
Affiliation(s)
- Yao Lu
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
- Jun Wei
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109
|
29
|
Samala RK, Chan HP, Lu Y, Hadjiiski LM, Wei J, Helvie MA. Digital breast tomosynthesis: computer-aided detection of clustered microcalcifications on planar projection images. Phys Med Biol 2014; 59:7457-77. [PMID: 25393654 DOI: 10.1088/0031-9155/59/23/7457] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/11/2022]
Abstract
This paper describes a new approach to detect microcalcification clusters (MCs) in digital breast tomosynthesis (DBT) via its planar projection (PPJ) image. With IRB approval, two-view (cranio-caudal and mediolateral oblique views) DBTs of human subject breasts were obtained with a GE GEN2 prototype DBT system that acquires 21 projection angles spanning 60° in 3° increments. A data set of 307 volumes (154 human subjects) was divided by case into independent training (127 with MCs) and test sets (104 with MCs and 76 free of MCs). A simultaneous algebraic reconstruction technique with multiscale bilateral filtering (MSBF) regularization was used to enhance microcalcifications and suppress noise. During the MSBF regularized reconstruction, the DBT volume was separated into high frequency (HF) and low frequency components representing microcalcifications and larger structures. At the final iteration, maximum intensity projection was applied to the regularized HF volume to generate a PPJ image that contained MCs with increased contrast-to-noise ratio (CNR) and reduced search space. High CNR objects in the PPJ image were extracted and labeled as microcalcification candidates. A convolutional neural network trained to recognize the image pattern of microcalcifications was used to classify the candidates into true calcifications versus tissue structures and artifacts. The remaining microcalcification candidates were grouped into MCs by dynamic conditional clustering based on adaptive CNR threshold and radial distance criteria. False positive (FP) clusters were further reduced using the number of candidates in a cluster, CNR and size of microcalcification candidates. At 85% sensitivity, FP rates of 0.71 and 0.54 were achieved for view- and case-based sensitivity, respectively, compared to 2.16 and 0.85 achieved in DBT. The improvement was significant (p-value = 0.003) by JAFROC analysis.
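The step that turns the high-frequency volume into a planar projection image is a maximum intensity projection. A minimal sketch follows, assuming the projection is taken along the slice (depth) axis of a volume stored as nested lists; the function name and the toy volume are hypothetical.

```python
def maximum_intensity_projection(volume):
    """Collapse volume[z][y][x] to image[y][x] by taking the maximum over z."""
    if not volume or not volume[0]:
        raise ValueError("volume must contain at least one non-empty slice")
    n_rows, n_cols = len(volume[0]), len(volume[0][0])
    return [
        [max(slice_[y][x] for slice_ in volume) for x in range(n_cols)]
        for y in range(n_rows)
    ]


# Toy volume: 3 slices of 2 x 2 pixels (hypothetical values).
hf_volume = [
    [[0, 1], [2, 0]],
    [[5, 0], [1, 1]],
    [[1, 3], [0, 4]],
]
ppj_image = maximum_intensity_projection(hf_volume)
```

Because each output pixel keeps only the brightest voxel along its column, small bright objects from any depth appear in a single 2D image, which is why the search space is reduced.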
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, USA
|
30
|
Samala RK, Chan HP, Lu Y, Hadjiiski L, Wei J, Sahiner B, Helvie MA. Computer-aided detection of clustered microcalcifications in multiscale bilateral filtering regularized reconstructed digital breast tomosynthesis volume. Med Phys 2014; 41:021901. [PMID: 24506622 PMCID: PMC3977832 DOI: 10.1118/1.4860955] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 07/31/2013] [Revised: 12/18/2013] [Accepted: 12/18/2013] [Indexed: 01/03/2023]
Abstract
PURPOSE: Develop a computer-aided detection (CADe) system for clustered microcalcifications in digital breast tomosynthesis (DBT) volumes enhanced with multiscale bilateral filtering (MSBF) regularization.
METHODS: With Institutional Review Board approval and written informed consent, two-view DBT images of 154 breasts, of which 116 had biopsy-proven microcalcification (MC) clusters and 38 were free of MCs, were acquired with a General Electric GEN2 prototype DBT system. The DBT volumes were reconstructed with the MSBF-regularized simultaneous algebraic reconstruction technique (SART), which was designed to enhance MCs and reduce background noise while preserving the quality of other tissue structures. The contrast-to-noise ratio (CNR) of MCs was further improved with enhancement-modulated calcification response (EMCR) preprocessing, which combined multiscale Hessian response to enhance MCs by shape and bandpass filtering to remove the low-frequency structured background. MC candidates were then located in the EMCR volume using iterative thresholding and segmented by adaptive region growing. Two sets of potential MC objects, cluster centroid objects and MC seed objects, were generated, and the CNR of each object was calculated. The number of candidates in each set was controlled based on the breast volume. Dynamic clustering around the centroid objects grouped the MC candidates to form clusters. Adaptive criteria were designed to reduce false positive (FP) clusters based on the size, CNR values, and number of MCs in the cluster, the cluster shape, and the cluster-based maximum intensity projection. Free-response receiver operating characteristic (FROC) and jackknife alternative FROC (JAFROC) analyses were used to assess the performance and compare it with that of a previous study.
RESULTS: An unpaired two-tailed t-test showed a significant increase (p < 0.0001) in the ratio of CNRs for MCs with and without MSBF regularization compared to similar ratios for FPs. For view-based detection, a sensitivity of 85% was achieved at an FP rate of 2.16 per DBT volume. For case-based detection, a sensitivity of 85% was achieved at an FP rate of 0.85 per DBT volume. JAFROC analysis showed a significant improvement in the performance of the current CADe system compared to that of our previous system (p = 0.003).
CONCLUSIONS: MSBF-regularized SART reconstruction enhances MCs. The enhancement in the signals, in combination with properly designed adaptive threshold criteria, effective MC feature analysis, and false positive reduction techniques, leads to a significant improvement in the detection of clustered MCs in DBT.
Affiliation(s)
- Ravi K Samala, Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109-5842
- Heang-Ping Chan, Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109-5842
- Yao Lu, Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109-5842
- Lubomir Hadjiiski, Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109-5842
- Jun Wei, Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109-5842
- Berkman Sahiner, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Maryland 20993
- Mark A Helvie, Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109-5842
|
31
|
Gun'ko VM, Blitz JP, Zarko VI, Turov VV, Pakhlov EM, Oranska OI, Goncharuk EV, Gornikov YI, Sergeev VS, Kulik TV, Palyanytsya BB, Samala RK. Structural and adsorption characteristics and catalytic activity of titania and titania-containing nanomaterials. J Colloid Interface Sci 2008; 330:125-37. [PMID: 18996539 DOI: 10.1016/j.jcis.2008.10.049] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 07/17/2008] [Revised: 09/22/2008] [Accepted: 10/11/2008] [Indexed: 11/18/2022]
Abstract
Morphological, structural, adsorption, and catalytic properties of highly disperse titania prepared using sulfate and pyrogenic methods, and fumed titania-containing mixed oxides, were studied using XRD, TG/DTA, nitrogen adsorption, ¹H NMR, FTIR, microcalorimetry on immersion of oxides in water and decane, thermally stimulated depolarization current (TSDC), and catalytic photodecomposition of methylene blue (MB). Phase composition and aggregation characteristics of nanoparticles (pore size distribution) of sulfate and pyrogenically prepared titania are very different; temperature-dependent structural properties are thus very different. Catalytic activity for the photodecomposition of MB is greatest (per gram of TiO₂ for the pure oxide materials) for non-treated ultrafine titania PC-500, which has the largest S_BET value and smallest particle size of the materials studied. However, this activity calculated per m² is higher for PC-105, which has a much smaller S_BET value than PC-500. The activity per unit surface area of titania is greatest for the fumed silica-titania mixed oxide ST20. Calcination of PC-500 at 650 °C leads to enhancement of anatase content and catalytic activity, but heating at 800 and 900 °C lowers the anatase content (since rutile appears) and diminishes catalytic activity, as well as the specific surface area because of nanoparticle sintering.
Affiliation(s)
- V M Gun'ko, Institute of Surface Chemistry, 17 General Naumov Street, 03164 Kiev, Ukraine
|