1
Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study. EClinicalMedicine 2024; 70:102479. PMID: 38685924; PMCID: PMC11056401; DOI: 10.1016/j.eclinm.2024.102479.
Abstract
Background Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.
Methods Here, we propose a methodology, complementary to existing fairness metrics, to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted by primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes than for others. The likelihood that AI performance was anticorrelated with pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient ("R") was greater than zero. Positive values of R indicate that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as P(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes than others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.
Findings Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance for racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0%, respectively, for prioritising AI performance for sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared with a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance for age subpopulations based on DALYs.
Interpretation Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.
Funding Google LLC.
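The HEAL metric described in this abstract is concrete enough to sketch. The following is a minimal illustration, not the authors' implementation: the subgroup labels, burden values, and correctness indicators are invented toy data. It bootstraps cases, computes per-subgroup AI performance, and estimates the probability that the negated Spearman correlation between health burden and performance is positive.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Toy per-case data: subgroup label, subgroup health burden (e.g. DALYs,
# higher = worse outcomes), and per-case AI correctness (top-3 agreement).
subgroups = np.array(["A", "A", "B", "B", "C", "C", "A", "B", "C", "C"])
burden = {"A": 10.0, "B": 25.0, "C": 40.0}
correct = np.array([1, 0, 1, 1, 1, 1, 0, 1, 1, 1])

def heal_metric(subgroups, correct, burden, n_boot=2000):
    labels = sorted(burden)
    xs = [burden[g] for g in labels]
    positive = 0
    for _ in range(n_boot):
        idx = rng.integers(0, len(correct), len(correct))  # resample cases
        s, c = subgroups[idx], correct[idx]
        if not all((s == g).any() for g in labels):
            continue  # a subgroup vanished from this resample
        perf = [c[s == g].mean() for g in labels]
        # Negated Spearman correlation: R > 0 means subgroups with worse
        # outcomes get better AI performance.
        rho = spearmanr(xs, perf)[0]
        positive += (-rho) > 0
    return positive / n_boot

v = heal_metric(subgroups, correct, burden)
print(f"estimated HEAL metric P(R > 0): {v:.1%}")
```

The bootstrap resamples cases rather than subgroups, which is one reasonable reading of the abstract; the paper may resample differently.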
2
An End-to-End Platform for Digital Pathology Using Hyperspectral Autofluorescence Microscopy and Deep Learning-Based Virtual Histology. Mod Pathol 2024; 37:100377. PMID: 37926422; DOI: 10.1016/j.modpat.2023.100377.
Abstract
Conventional histopathology involves expensive and labor-intensive processes that often consume tissue samples, rendering them unavailable for other analyses. We present a novel end-to-end workflow for pathology powered by hyperspectral microscopy and deep learning. First, we developed a custom hyperspectral microscope to nondestructively image the autofluorescence of unstained tissue sections. We then trained a deep learning model to use autofluorescence to generate virtual histologic stains, which avoids the cost and variability of chemical staining procedures and conserves tissue samples. We showed that the virtual images reproduce the histologic features present in the real-stained images using a randomized nonalcoholic steatohepatitis (NASH) scoring comparison study, where both real and virtual stains are scored by pathologists (D.T., A.D.B., R.K.P.). The test showed moderate-to-good concordance between pathologists' scoring on corresponding real and virtual stains. Finally, we developed deep learning-based models for automated NASH Clinical Research Network score prediction. We showed that the end-to-end automated pathology platform is comparable with an independent panel of pathologists for NASH Clinical Research Network scoring when evaluated against the expert pathologist consensus scores. This study provides proof of concept for this virtual staining strategy, which could improve cost, efficiency, and reliability in pathology and enable novel approaches to spatial biology research.
3
Assessment of Pathology Domain-Specific Knowledge of ChatGPT and Comparison to Human Performance. Arch Pathol Lab Med 2024:498573. PMID: 38244054; DOI: 10.5858/arpa.2023-0296-oa.
Abstract
CONTEXT.— Artificial intelligence algorithms hold the potential to fundamentally change many aspects of society. Application of these tools, including the publicly available ChatGPT, has demonstrated impressive domain-specific knowledge in many areas, including medicine.
OBJECTIVES.— To understand the level of pathology domain-specific knowledge for ChatGPT using different underlying large language models, GPT-3.5 and the updated GPT-4.
DESIGN.— An international group of pathologists (n = 15) was recruited to generate pathology-specific questions at a level similar to those that could be seen on licensing (board) examinations. The questions (n = 15) were answered by GPT-3.5, GPT-4, and a staff pathologist who had recently passed their Canadian pathology licensing exams. Participants were instructed to score answers on a 5-point scale and to predict which answer was written by ChatGPT.
RESULTS.— GPT-3.5 performed at a level similar to the staff pathologist, while GPT-4 outperformed both. The overall score for both GPT-3.5 and GPT-4 was within the range of meeting expectations for a trainee writing licensing examinations. In all but one question, the reviewers were able to correctly identify the answers generated by GPT-3.5.
CONCLUSIONS.— By demonstrating the ability of ChatGPT to answer pathology-specific questions at a level similar to (GPT-3.5) or exceeding (GPT-4) that of a trained pathologist, this study highlights the potential of large language models to be transformative in this space. In the future, more advanced iterations of these algorithms with increased domain-specific knowledge may have the potential to assist pathologists and enhance pathology resident training.
4
Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat Biomed Eng 2023:10.1038/s41551-023-01049-7. PMID: 37291435; DOI: 10.1038/s41551-023-01049-7.
Abstract
Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates this 'out-of-distribution' performance problem and improves model robustness and training efficiency. The strategy, which we named REMEDIS (for 'Robust and Efficient Medical Imaging with Self-supervision'), combines large-scale supervised transfer learning on natural images with intermediate contrastive self-supervised learning on medical images, and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies by up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1-33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.
5
Predicting lymph node metastasis from primary tumor histology and clinicopathologic factors in colorectal cancer using deep learning. Communications Medicine 2023; 3:59. PMID: 37095223; PMCID: PMC10125969; DOI: 10.1038/s43856-023-00282-0.
Abstract
BACKGROUND Presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors.
METHODS Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathologic variables. We then analyze the performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables.
RESULTS The extracted machine-learned features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III).
CONCLUSION This work demonstrates an effective approach to combining deep learning with established clinicopathologic factors in order to identify independently informative features associated with LNM. Further work building on these specific results may have important impact in prognostication and therapeutic decision-making for LNM. Additionally, this general computational approach may prove useful in other contexts.
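The feature-construction step this abstract describes (k-means over patch embeddings, then logistic regression over cluster features plus baseline variables) can be sketched as follows. All data here is synthetic, and the feature dimensions and cluster count are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_cases, patches_per_case, dim, k = 60, 30, 16, 5

# Synthetic patch embeddings per case, a binary LNM label, and
# (assumed) 6 baseline clinicopathologic variables per case.
embeddings = rng.normal(size=(n_cases, patches_per_case, dim))
y = rng.integers(0, 2, n_cases)
baseline = rng.normal(size=(n_cases, 6))

# Cluster all patches with k-means, then summarize each case by its
# cluster-occupancy frequencies ("machine-learned features").
km = KMeans(n_clusters=k, n_init=10, random_state=0)
km.fit(embeddings.reshape(-1, dim))
assignments = km.predict(embeddings.reshape(-1, dim)).reshape(n_cases, -1)
cluster_feats = np.stack(
    [np.bincount(row, minlength=k) / patches_per_case for row in assignments]
)

# Logistic regression on baseline variables plus machine-learned features.
X = np.hstack([baseline, cluster_feats])
model = LogisticRegression(max_iter=1000).fit(X, y)
print("in-sample accuracy:", model.score(X, y))
```

The paper additionally selects only the clusters that add predictive value (via a likelihood ratio test); that selection step is omitted here for brevity.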
6
Abstract
IMPORTANCE Identifying new prognostic features in colon cancer has the potential to refine histopathologic review and inform patient care. Although prognostic artificial intelligence systems have recently demonstrated significant risk stratification for several cancer types, studies have not yet shown that the machine learning-derived features associated with these prognostic artificial intelligence systems are both interpretable and usable by pathologists.
OBJECTIVE To evaluate whether pathologist scoring of a histopathologic feature previously identified by machine learning is associated with survival among patients with colon cancer.
DESIGN, SETTING, AND PARTICIPANTS This prognostic study used deidentified, archived colorectal cancer cases from January 2013 to December 2015 from the University of Milano-Bicocca. All available histologic slides from 258 consecutive colon adenocarcinoma cases were reviewed from December 2021 to February 2022 by 2 pathologists, who conducted semiquantitative scoring for tumor adipose feature (TAF), which was previously identified via a prognostic deep learning model developed with an independent colorectal cancer cohort.
MAIN OUTCOMES AND MEASURES Prognostic value of TAF for overall survival and disease-specific survival as measured by univariable and multivariable regression analyses. Interpathologist agreement in TAF scoring was also evaluated.
RESULTS A total of 258 colon adenocarcinoma histopathologic cases from 258 patients (138 men [53%]; median age, 67 years [IQR, 65-81 years]) with stage II (n = 119) or stage III (n = 139) cancer were included. Tumor adipose feature was identified in 120 cases (widespread in 63 cases, multifocal in 31, and unifocal in 26). For overall survival analysis after adjustment for tumor stage, TAF was independently prognostic in 2 ways: TAF as a binary feature (presence vs absence: hazard ratio [HR] for presence of TAF, 1.55 [95% CI, 1.07-2.25]; P = .02) and TAF as a semiquantitative categorical feature (HR for widespread TAF, 1.87 [95% CI, 1.23-2.85]; P = .004). Interpathologist agreement for widespread TAF vs lower categories (absent, unifocal, or multifocal) was 90%, corresponding to a κ metric at this threshold of 0.69 (95% CI, 0.58-0.80).
CONCLUSIONS AND RELEVANCE In this prognostic study, pathologists were able to learn and reproducibly score for TAF, providing significant risk stratification on this independent data set. Although additional work is warranted to understand the biological significance of this feature and to establish broadly reproducible TAF scoring, this work represents the first validation to date of human expert learning from machine learning in pathology. Specifically, this validation demonstrates that a computationally identified histologic feature can represent a human-identifiable, prognostic feature with the potential for integration into pathology practice.
7
Deep Learning Detection of Active Pulmonary Tuberculosis at Chest Radiography Matched the Clinical Performance of Radiologists. Radiology 2023; 306:124-137. PMID: 36066366; DOI: 10.1148/radiol.212213.
Abstract
Background The World Health Organization (WHO) recommends chest radiography to facilitate tuberculosis (TB) screening. However, chest radiograph interpretation expertise remains limited in many regions.
Purpose To develop a deep learning system (DLS) to detect active pulmonary TB on chest radiographs and compare its performance to that of radiologists.
Materials and Methods A DLS was trained and tested using retrospective chest radiographs (acquired between 1996 and 2020) from 10 countries. To improve generalization, large-scale chest radiograph pretraining, attention pooling, and semisupervised learning ("noisy-student") were incorporated. The DLS was evaluated in a four-country test set (China, India, the United States, and Zambia) and in a mining population in South Africa, with positive TB confirmed with microbiological tests or nucleic acid amplification testing (NAAT). The performance of the DLS was compared with that of 14 radiologists. The authors studied the efficacy of the DLS compared with that of nine radiologists using the Obuchowski-Rockette-Hillis procedure. Given WHO targets of 90% sensitivity and 70% specificity, the operating point of the DLS (0.45) was prespecified to favor sensitivity.
Results A total of 165,754 images in 22,284 subjects (mean age, 45 years; 21% female) were used for model development and testing. In the four-country test set (1236 subjects, 17% with active TB), the receiver operating characteristic (ROC) curve of the DLS was higher than those for all nine India-based radiologists, with an area under the ROC curve of 0.89 (95% CI: 0.87, 0.91). Compared with these radiologists, at the prespecified operating point, the DLS sensitivity was higher (88% vs 75%, P < .001) and specificity was noninferior (79% vs 84%, P = .004). Trends were similar within other patient subgroups, in the South Africa data set, and across various TB-specific chest radiograph findings. In simulations, the use of the DLS to identify likely TB-positive chest radiographs for NAAT confirmation reduced the cost by 40%-80% per TB-positive patient detected.
Conclusion A deep learning method was found to be noninferior to radiologists for the determination of active tuberculosis on digital chest radiographs. © RSNA, 2022. Online supplemental material is available for this article. See also the editorial by van Ginneken in this issue.
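Evaluating a detector at a prespecified operating point against the WHO targets quoted above (90% sensitivity, 70% specificity) is a simple computation; a sketch follows. The scores and labels are synthetic, and only the 0.45 threshold comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cohort: 1 = confirmed TB; scores lean higher for positives.
labels = rng.integers(0, 2, 500)
scores = np.clip(labels * 0.5 + rng.normal(0.3, 0.2, 500), 0.0, 1.0)

def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity of thresholded scores vs binary labels."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fn = np.sum(~pred & (labels == 1))
    tn = np.sum(~pred & (labels == 0))
    fp = np.sum(pred & (labels == 0))
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sens_spec(labels, scores, threshold=0.45)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print("meets WHO targets:", sens >= 0.90 and spec >= 0.70)
```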
8
Abstract
Functional magnetic resonance imaging (fMRI) offers a rich source of data for studying the neural basis of cognition. Here, we describe the Brain Imaging Analysis Kit (BrainIAK), an open-source, free Python package that provides computationally optimized solutions to key problems in advanced fMRI analysis. A variety of techniques are presently included in BrainIAK: intersubject correlation (ISC) and intersubject functional connectivity (ISFC), functional alignment via the shared response model (SRM), full correlation matrix analysis (FCMA), a Bayesian version of representational similarity analysis (BRSA), event segmentation using hidden Markov models, topographic factor analysis (TFA), inverted encoding models (IEMs), an fMRI data simulator that uses noise characteristics from real data (fmrisim), and some emerging methods. These techniques have been optimized to leverage the efficiencies of high-performance computing (HPC) clusters, and the same code can be seamlessly transferred from a laptop to a cluster. For each of the aforementioned techniques, we describe the data analysis problem that the technique is meant to solve and how it solves that problem; we also include an example Jupyter notebook for each technique and an annotated bibliography of papers that have used and/or described that technique. In addition to the sections describing various analysis techniques in BrainIAK, we have included sections describing the future applications of BrainIAK to real-time fMRI, tutorials that we have developed and shared online to facilitate learning the techniques in BrainIAK, computational innovations in BrainIAK, and how to contribute to BrainIAK. We hope that this manuscript helps readers to understand how BrainIAK might be useful in their research.
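Of the analyses listed above, leave-one-out intersubject correlation (ISC) is simple enough to sketch directly in NumPy rather than via BrainIAK's own API: each subject's time series is correlated with the mean of all other subjects' time series. The data here is synthetic (a shared stimulus-driven signal plus subject-specific noise); for real use, BrainIAK's optimized implementation is the appropriate tool.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_timepoints = 8, 200

# Synthetic data: a shared stimulus-driven signal plus subject noise.
shared = rng.normal(size=n_timepoints)
data = shared + 0.5 * rng.normal(size=(n_subjects, n_timepoints))

def loo_isc(data):
    """Leave-one-out ISC: correlate each subject with the others' mean."""
    iscs = []
    for s in range(data.shape[0]):
        others = np.delete(data, s, axis=0).mean(axis=0)
        iscs.append(np.corrcoef(data[s], others)[0, 1])
    return np.array(iscs)

print("mean leave-one-out ISC:", loo_isc(data).mean().round(3))
```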
9
Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat Med 2022; 28:154-163. PMID: 35027755; PMCID: PMC8799467; DOI: 10.1038/s41591-021-01620-2.
Abstract
Artificial intelligence (AI) has shown promise for diagnosing prostate cancer in biopsies. However, results have been limited to individual studies, lacking validation in multinational settings. Competitions have been shown to be accelerators for medical imaging innovations, but their impact is hindered by lack of reproducibility and independent validation. With this in mind, we organized the PANDA challenge, the largest histopathology competition to date, joined by 1,290 developers, to catalyze development of reproducible AI algorithms for Gleason grading using 10,616 digitized prostate biopsies. We validated that a diverse set of submitted algorithms reached pathologist-level performance on independent cross-continental cohorts, fully blinded to the algorithm developers. On United States and European external validation sets, the algorithms achieved agreements with expert uropathologists of 0.862 (quadratically weighted κ; 95% confidence interval [CI], 0.840-0.884) and 0.868 (95% CI, 0.835-0.900), respectively. Successful generalization across different patient populations, laboratories and reference standards, achieved by a variety of algorithmic approaches, warrants evaluating AI-based Gleason grading in prospective clinical trials.
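The agreement statistic this abstract reports, quadratically weighted κ, is directly available in scikit-learn. A minimal sketch, assuming illustrative ISUP grade-group labels rather than challenge data:

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative grade groups (0 = benign, 1-5 = GG1-GG5), not PANDA data.
reference = [0, 1, 1, 2, 3, 4, 5, 2, 1, 0]  # uropathologist consensus
predicted = [0, 1, 2, 2, 3, 4, 4, 2, 1, 0]  # algorithm output

# Quadratic weighting penalizes disagreements by the squared distance
# between grades, so off-by-one errors cost far less than gross errors.
kappa = cohen_kappa_score(reference, predicted, weights="quadratic")
print(f"quadratically weighted kappa: {kappa:.3f}")
```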
10
Evaluation of artificial intelligence on a reference standard based on subjective interpretation. Lancet Digit Health 2021; 3:e693-e695. PMID: 34561202; DOI: 10.1016/s2589-7500(21)00216-8.
11
Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19. Sci Rep 2021; 11:15523. PMID: 34471144; PMCID: PMC8410908; DOI: 10.1038/s41598-021-93967-2.
Abstract
Chest radiography (CXR) is the most widely used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to detect every possible condition by building multiple separate systems, each of which detects one or more pre-specified conditions. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For training and tuning the system, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system trained using a large dataset containing a diverse array of CXR abnormalities generalizes to new patient populations and unseen diseases. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases was reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist. Lastly, to facilitate the continued development of AI models for CXR, we release our collected labels for the publicly available dataset.
12
Comparative analysis of machine learning approaches to classify tumor mutation burden in lung adenocarcinoma using histopathology images. Sci Rep 2021; 11:16605. PMID: 34400666; PMCID: PMC8368039; DOI: 10.1038/s41598-021-95747-4.
Abstract
Both histologic subtypes and tumor mutation burden (TMB) represent important biomarkers in lung cancer, with implications for patient prognosis and treatment decisions. Typically, TMB is evaluated by comprehensive genomic profiling but this requires use of finite tissue specimens and costly, time-consuming laboratory processes. Histologic subtype classification represents an established component of lung adenocarcinoma histopathology, but can be challenging and is associated with substantial inter-pathologist variability. Here we developed a deep learning system to both classify histologic patterns in lung adenocarcinoma and predict TMB status using de-identified Hematoxylin and Eosin (H&E) stained whole slide images. We first trained a convolutional neural network to map histologic features across whole slide images of lung cancer resection specimens. On evaluation using an external data source, this model achieved patch-level area under the receiver operating characteristic curve (AUC) of 0.78–0.98 across nine histologic features. We then integrated the output of this model with clinico-demographic data to develop an interpretable model for TMB classification. The resulting end-to-end system was evaluated on 172 held out cases from TCGA, achieving an AUC of 0.71 (95% CI 0.63–0.80). The benefit of using histologic features in predicting TMB is highlighted by the significant improvement this approach offers over using the clinical features alone (AUC of 0.63 [95% CI 0.53–0.72], p = 0.002). Furthermore, we found that our histologic subtype-based approach achieved performance similar to that of a weakly supervised approach (AUC of 0.72 [95% CI 0.64–0.80]). Together these results underscore that incorporating histologic patterns in biomarker prediction for lung cancer provides informative signals, and that interpretable approaches utilizing these patterns perform comparably with less interpretable, weakly supervised approaches.
13
Determining breast cancer biomarker status and associated morphological features using deep learning. Communications Medicine 2021; 1:14. PMID: 35602213; PMCID: PMC9037318; DOI: 10.1038/s43856-021-00013-3.
Abstract
Background
Breast cancer management depends on biomarkers including estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (ER/PR/HER2). Though existing scoring systems are widely used and well-validated, they can involve costly preparation and variable interpretation. Additionally, discordances between histology and expected biomarker findings can prompt repeat testing to address biological, interpretative, or technical reasons for unexpected results.
Methods
We developed three independent deep learning systems (DLS) to directly predict ER/PR/HER2 status for both focal tissue regions (patches) and slides using hematoxylin-and-eosin-stained (H&E) images as input. Models were trained and evaluated using pathologist annotated slides from three data sources. Areas under the receiver operator characteristic curve (AUCs) were calculated for test sets at both a patch-level (>135 million patches, 181 slides) and slide-level (n = 3274 slides, 1249 cases, 37 sites). Interpretability analyses were performed using Testing with Concept Activation Vectors (TCAV), saliency analysis, and pathologist review of clustered patches.
Results
The patch-level AUCs are 0.939 (95% CI 0.936–0.941), 0.938 (0.936–0.940), and 0.808 (0.802–0.813) for ER/PR/HER2, respectively. At the slide level, AUCs are 0.86 (95% CI 0.84–0.87), 0.75 (0.73–0.77), and 0.60 (0.56–0.64) for ER/PR/HER2, respectively. Interpretability analyses show known biomarker-histomorphology associations including associations of low-grade and lobular histology with ER/PR positivity, and increased inflammatory infiltrates with triple-negative staining.
Conclusions
This study presents rapid breast cancer biomarker estimation from routine H&E slides and builds on prior advances by prioritizing interpretability of computationally learned features in the context of existing pathological knowledge.
14
Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens. JAMA Oncol 2021; 6:1372-1380. PMID: 32701148; PMCID: PMC7378872; DOI: 10.1001/jamaoncol.2020.2485.
Abstract
Question How does a deep learning system for assessing prostate biopsy specimens compare with interpretations determined by specialists in urologic pathology and by general pathologists?
Findings In a validation data set of 752 biopsy specimens obtained from 2 independent medical laboratories and a tertiary teaching hospital, this study found that the rate of agreement with subspecialists was significantly higher for the deep learning system than it was for a cohort of general pathologists.
Meaning The deep learning system warrants evaluation as an assistive tool for improving prostate cancer diagnosis and treatment decisions, especially where subspecialist expertise is unavailable.
Importance For prostate cancer, Gleason grading of the biopsy specimen plays a pivotal role in determining case management. However, Gleason grading is associated with substantial interobserver variability, resulting in a need for decision support tools to improve the reproducibility of Gleason grading in routine clinical practice.
Objective To evaluate the ability of a deep learning system (DLS) to grade diagnostic prostate biopsy specimens.
Design, Setting, and Participants The DLS was evaluated using 752 deidentified digitized images of formalin-fixed paraffin-embedded prostate needle core biopsy specimens obtained from 3 institutions in the United States, including 1 institution not used for DLS development. To obtain the Gleason grade group (GG), each specimen was first reviewed by 2 expert urologic subspecialists from a multi-institutional panel of 6 individuals (years of experience: mean, 25 years; range, 18-34 years). A third subspecialist reviewed discordant cases to arrive at a majority opinion. To reduce diagnostic uncertainty, all subspecialists had access to an immunohistochemical-stained section and 3 histologic sections for every biopsied specimen. Their review was conducted from December 2018 to June 2019.
Main Outcomes and Measures The frequency of the exact agreement of the DLS with the majority opinion of the subspecialists in categorizing each tumor-containing specimen as 1 of 5 categories: nontumor, GG1, GG2, GG3, or GG4-5. For comparison, the rate of agreement of 19 general pathologists' opinions with the subspecialists' majority opinions was also evaluated.
Results For grading tumor-containing biopsy specimens in the validation set (n = 498), the rate of agreement with subspecialists was significantly higher for the DLS (71.7%; 95% CI, 67.9%-75.3%) than for general pathologists (58.0%; 95% CI, 54.5%-61.4%) (P < .001). In subanalyses of biopsy specimens from an external validation set (n = 322), the Gleason grading performance of the DLS remained similar. For distinguishing nontumor from tumor-containing biopsy specimens (n = 752), the rate of agreement with subspecialists was 94.3% (95% CI, 92.4%-95.9%) for the DLS and similar at 94.7% (95% CI, 92.8%-96.3%) for general pathologists (P = .58).
Conclusions and Relevance In this study, the DLS showed higher proficiency than general pathologists at Gleason grading prostate needle core biopsy specimens and generalized to an independent institution. Future research is necessary to evaluate the potential utility of using the DLS as a decision support tool in clinical workflows and to improve the quality of prostate cancer grading for therapy decisions.
15
Evaluation of the Use of Combined Artificial Intelligence and Pathologist Assessment to Review and Grade Prostate Biopsies. JAMA Netw Open 2020; 3:e2023267. [PMID: 33180129 PMCID: PMC7662146 DOI: 10.1001/jamanetworkopen.2020.23267] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
IMPORTANCE Expert-level artificial intelligence (AI) algorithms for prostate biopsy grading have recently been developed. However, the potential impact of integrating such algorithms into pathologist workflows remains largely unexplored. OBJECTIVE To evaluate an expert-level AI-based assistive tool when used by pathologists for the grading of prostate biopsies. DESIGN, SETTING, AND PARTICIPANTS This diagnostic study used a fully crossed multiple-reader, multiple-case design to evaluate an AI-based assistive tool for prostate biopsy grading. Retrospective grading of prostate core needle biopsies from 2 independent medical laboratories in the US was performed between October 2019 and January 2020. A total of 20 general pathologists reviewed 240 prostate core needle biopsies from 240 patients. Each pathologist was randomized to 1 of 2 study cohorts. Each cohort reviewed every case in the modality opposite to that of the other cohort (with AI assistance vs without AI assistance), with the modality switching after every 10 cases. After a minimum 4-week washout period for each batch, the pathologists reviewed the cases for a second time using the opposite modality. The pathologist-provided grade group for each biopsy was compared with the majority opinion of urologic pathology subspecialists. EXPOSURE An AI-based assistive tool for Gleason grading of prostate biopsies. MAIN OUTCOMES AND MEASURES Agreement between pathologists and subspecialists with and without the use of an AI-based assistive tool for the grading of all prostate biopsies and Gleason grade group 1 biopsies. RESULTS Biopsies from 240 patients (median age, 67 years; range, 39-91 years) with a median prostate-specific antigen level of 6.5 ng/mL (range, 0.6-97.0 ng/mL) were included in the analyses.
Artificial intelligence-assisted review by pathologists was associated with a 5.6% increase (95% CI, 3.2%-7.9%; P < .001) in agreement with subspecialists (from 69.7% for unassisted reviews to 75.3% for assisted reviews) across all biopsies and a 6.2% increase (95% CI, 2.7%-9.8%; P = .001) in agreement with subspecialists (from 72.3% for unassisted reviews to 78.5% for assisted reviews) for grade group 1 biopsies. A secondary analysis indicated that AI assistance was also associated with improvements in tumor detection, mean review time, mean self-reported confidence, and interpathologist agreement. CONCLUSIONS AND RELEVANCE In this study, the use of an AI-based assistive tool for the review of prostate biopsies was associated with improvements in the quality, efficiency, and consistency of cancer detection and grading.
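Because the fully crossed design compares the same cases with and without assistance, uncertainty in the agreement gain is naturally estimated by resampling whole cases while preserving the pairing. A minimal sketch, with invented toy grades and function names (not the authors' analysis code):

```python
import random

def paired_agreement_gain(assisted, unassisted, reference, n_boot=2000, seed=0):
    """Assisted-minus-unassisted agreement with a case-level bootstrap 95% CI.

    Resampling whole cases keeps each case's assisted and unassisted reads
    together, matching the paired (fully crossed) study design.
    """
    cases = list(zip(assisted, unassisted, reference))

    def gain(sample):
        hits_a = sum(a == r for a, _, r in sample)
        hits_u = sum(u == r for _, u, r in sample)
        return (hits_a - hits_u) / len(sample)

    observed = gain(cases)
    rng = random.Random(seed)
    diffs = sorted(gain([rng.choice(cases) for _ in cases]) for _ in range(n_boot))
    return observed, diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

# Toy grade groups for 10 cases (reference = subspecialist majority opinion).
reference  = [1, 2, 1, 3, 2, 1, 4, 2, 3, 1]
unassisted = [1, 2, 2, 3, 1, 1, 4, 3, 3, 2]   # 6/10 agree with the reference
assisted   = [1, 2, 1, 3, 2, 1, 4, 3, 3, 2]   # 8/10 agree with the reference
obs, lo, hi = paired_agreement_gain(assisted, unassisted, reference)
print(obs)  # 0.2
```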
16
Closing the translation gap: AI applications in digital pathology. Biochim Biophys Acta Rev Cancer 2020; 1875:188452. [PMID: 33065195 DOI: 10.1016/j.bbcan.2020.188452] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2019] [Revised: 07/31/2020] [Accepted: 10/09/2020] [Indexed: 02/07/2023]
Abstract
Recent advances in artificial intelligence show tremendous promise to improve the accuracy, reproducibility, and availability of medical diagnostics across a number of medical subspecialities. This is especially true in the field of digital pathology, which has recently witnessed a surge in publications describing state-of-the-art performance for machine learning models across a wide range of diagnostic applications. Nonetheless, despite this promise, there remain significant gaps in translating applications for any of these technologies into actual clinical practice. In this review, we will first give a brief overview of the recent progress in applying AI to digitized pathology images, focusing on how these tools might be applied in clinical workflows in the near term to improve the accuracy and efficiency of pathologists. Then we define and describe in detail the various factors that need to be addressed in order to successfully close the "translation gap" for AI applications in digital pathology.
17
Abstract 2096: A deep learning system to predict disease-specific survival in stage II and stage III colorectal cancer. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-2096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Accurate prognosis in colorectal cancer can have important implications for clinical management. Here, we develop a deep learning system (DLS) to first identify invasive cancer and then directly predict disease-specific survival (DSS) for stage II and stage III colorectal cancer using only digitized histopathology whole-slide images. The DLS was trained using slides from 1173 stage II and 1266 stage III cases (18,304 total slides) and was evaluated on a held-out test set of 601 stage II and 638 stage III cases (9,340 total slides). The area under the receiver operating characteristic curve (AUC) for 5-year DSS prediction was 68.0 for stage II (95% CI 62.2-73.1) and 65.5 for stage III (95% CI 61.1-70.0). For stage II, 5-year DSS was 64% for DLS-predicted high-risk cases versus 89% for DLS-predicted low-risk cases (upper and lower risk quartiles; p<0.001, log-rank test). For stage III, 5-year DSS was 35% for DLS-predicted high-risk cases versus 66% for DLS-predicted low-risk cases (upper and lower risk quartiles; p<0.001, log-rank test). In a multivariable Cox model, the DLS prediction remained significantly associated with DSS after adjusting for T-category, N-category, age, gender, tumor grade, and lymphovascular invasion (stage II: adjusted hazard ratio 1.55, 95% CI 1.33-1.81, p<0.0001; stage III: adjusted hazard ratio 1.35, 95% CI 1.21-1.51, p<0.0001). Finally, a combined proportional-hazards model using the DLS along with baseline clinicopathologic information provided better risk prediction than the DLS or baseline information alone, increasing 5-year AUC over the baseline-only model by 8.9 points (95% CI 3.9-13.6) and 5.3 points (95% CI 2.3-8.4) for stages II and III, respectively. Taken together, these findings demonstrate that the DLS provides significant prognostic value and risk stratification in both stage II and stage III colorectal cancer, and can be combined with known risk features to further improve prognostic accuracy.
This represents novel work to train a DLS to directly predict patient outcomes using whole-slide images and weakly supervised learning. The ability to use non-annotated slides as input has important implications for possible clinical applications and the features learned by the model may also help to identify new prognosis-associated morphologic factors in colorectal cancer. Additional work is ongoing to confirm the utility of these findings, such as validation in additional datasets and interpretability experiments to better understand the features learned by the DLS for these predictions.
Citation Format: Ellery Wulczyn, David F. Steiner, Melissa Moran, Markus Plass, Robert Reihs, Heimo Mueller, Apaar Sadhwani, Yuannan Cai, Isabelle Flament, Po-Hsuan Cameron Chen, Yun Liu, Martin C. Stumpe, Zhaoyang Xu, Kurt Zatloukal, Craig H. Mermel. A deep learning system to predict disease-specific survival in stage II and stage III colorectal cancer [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 2096.
18
Current and future applications of artificial intelligence in pathology: a clinical perspective. J Clin Pathol 2020; 74:409-414. [PMID: 32763920 DOI: 10.1136/jclinpath-2020-206908] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2020] [Accepted: 07/07/2020] [Indexed: 12/17/2022]
Abstract
During the last decade, a dramatic rise has occurred in the development and application of artificial intelligence (AI) tools for use in pathology services. This trend is expected to continue and to reshape the field of pathology in the coming years. The deployment of computational pathology and applications of AI tools can be considered a paradigm shift that will change pathology services, making them more efficient and capable of meeting the needs of this era of precision medicine. Despite the success of AI models, the translational process from discovery to clinical application has been slow. The gap between self-contained research and the clinical environment may be too wide, and it has been largely neglected. In this review, we cover the current and prospective applications of AI in pathology. We examine its applications in diagnosis and prognosis, and we offer insights into considerations that could improve the clinical applicability of these tools. We then discuss its potential to improve workflow efficiency and its benefits in pathologist education. Finally, we review the factors that could influence adoption in clinical practice and the associated regulatory processes.
19
Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS One 2020; 15:e0233678. [PMID: 32555646 PMCID: PMC7299324 DOI: 10.1371/journal.pone.0233678] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2020] [Accepted: 05/10/2020] [Indexed: 12/12/2022] Open
Abstract
Providing prognostic information at the time of cancer diagnosis has important implications for treatment and monitoring. Although cancer staging, histopathological assessment, molecular features, and clinical variables can provide useful prognostic insights, improving risk stratification remains an active research area. We developed a deep learning system (DLS) to predict disease specific survival across 10 cancer types from The Cancer Genome Atlas (TCGA). We used a weakly-supervised approach without pixel-level annotations, and tested three different survival loss functions. The DLS was developed using 9,086 slides from 3,664 cases and evaluated using 3,009 slides from 1,216 cases. In multivariable Cox regression analysis of the combined cohort including all 10 cancers, the DLS was significantly associated with disease specific survival (hazard ratio of 1.58, 95% CI 1.28–1.70, p<0.0001) after adjusting for cancer type, stage, age, and sex. In a per-cancer adjusted subanalysis, the DLS remained a significant predictor of survival in 5 of 10 cancer types. Compared to a baseline model including stage, age, and sex, the c-index of the model demonstrated an absolute 3.7% improvement (95% CI 1.0–6.5) in the combined cohort. Additionally, our models stratified patients within individual cancer stages, particularly stage II (p = 0.025) and stage III (p<0.001). By developing and evaluating prognostic models across multiple cancer types, this work represents one of the most comprehensive studies exploring the direct prediction of clinical outcomes using deep learning and histopathology images. Our analysis demonstrates the potential for this approach to provide significant prognostic information in multiple cancer types, and even within specific pathologic stages. 
However, given the relatively small number of cases and observed clinical events for a deep learning task of this type, we observed wide confidence intervals for model performance, highlighting that future work will benefit from larger datasets assembled for the purpose of survival modeling.
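The c-index reported in this entry is Harrell's concordance index, which scores how often the model ranks the patient with the earlier event as higher risk, counting only pairs that are comparable under right-censoring. A self-contained sketch with invented toy data (not the study's cohort):

```python
def harrell_c_index(risk, time, event):
    """Harrell's concordance index for right-censored survival data.

    risk:  model risk score (higher = predicted worse outcome)
    time:  observed follow-up time
    event: 1 if the death was observed, 0 if the patient was censored
    A pair (i, j) is comparable when i's observed event precedes j's time;
    it is concordant when the earlier-event patient has the higher risk score.
    """
    concordant, tied, comparable = 0, 0, 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            # i must experience the event strictly before j's observed time
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable

# Toy cohort of 5 patients: a perfectly ranked model scores 1.0.
time  = [2, 5, 7, 9, 12]
event = [1, 1, 0, 1, 0]
risk  = [0.9, 0.7, 0.4, 0.5, 0.1]
print(harrell_c_index(risk, time, event))  # 1.0
```

An "absolute 3.7% improvement" in c-index then simply means the combined model's concordance probability is 0.037 higher than the baseline model's.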
20
Artificial intelligence in digital breast pathology: Techniques and applications. Breast 2019; 49:267-273. [PMID: 31935669 PMCID: PMC7375550 DOI: 10.1016/j.breast.2019.12.007] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2019] [Accepted: 12/12/2019] [Indexed: 12/16/2022] Open
Abstract
Breast cancer is the most common cancer and second leading cause of cancer-related death worldwide. The mainstay of breast cancer workup is histopathological diagnosis, which guides therapy and prognosis. However, emerging knowledge about the complex nature of cancer and the availability of tailored therapies have exposed opportunities for improvements in diagnostic precision. In parallel, advances in artificial intelligence (AI) along with the growing digitization of pathology slides for the primary diagnosis are a promising approach to meet the demand for more accurate detection, classification and prediction of behaviour of breast tumours. In this article, we cover the current and prospective uses of AI in digital pathology for breast cancer, review the basics of digital pathology and AI, and outline outstanding challenges in the field.
21
Whole-Slide Image Focus Quality: Automatic Assessment and Impact on AI Cancer Detection. J Pathol Inform 2019; 10:39. [PMID: 31921487 PMCID: PMC6939343 DOI: 10.4103/jpi.jpi_11_19] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2019] [Accepted: 09/29/2019] [Indexed: 12/24/2022] Open
Abstract
Background Digital pathology enables remote access or consults and powerful image analysis algorithms. However, the slide digitization process can create artifacts such as out-of-focus (OOF) regions. OOF is often only detected on careful review, potentially causing rescanning and workflow delays. Although scan-time operator screening for whole-slide OOF is feasible, manual screening for OOF affecting only parts of a slide is impractical. Methods We developed a convolutional neural network (ConvFocus) to exhaustively localize and quantify the severity of OOF regions on digitized slides. ConvFocus was developed using our refined semi-synthetic OOF data generation process and evaluated using seven slides spanning three different tissue types and three different stain types, each of which was digitized using two different whole-slide scanner models. ConvFocus's predictions were compared with pathologist-annotated focus quality grades across 514 distinct regions representing 37,700 35 μm × 35 μm image patches, and 21 digitized "z-stack" whole-slide images (WSIs) that contain known OOF patterns. Results When compared to pathologist-graded focus quality, ConvFocus achieved Spearman rank coefficients of 0.81 and 0.94 on two scanners and reproduced the expected OOF patterns from z-stack scanning. We also evaluated the impact of OOF on the accuracy of a state-of-the-art metastatic breast cancer detector and saw a consistent decrease in performance with increasing OOF. Conclusions Comprehensive whole-slide OOF categorization could enable rescans before pathologist review, potentially reducing the impact of digitization focus issues on the clinical workflow. We show that the algorithm trained on our semi-synthetic OOF data generalizes well to real OOF regions across tissue types, stains, and scanners. Finally, quantitative OOF maps can flag regions that might otherwise be misclassified by image analysis algorithms, preventing OOF-induced errors.
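The Spearman rank coefficients above are the Pearson correlation of the two variables' rank vectors, which makes them invariant to any monotone rescaling of either grading scale. A minimal stdlib implementation (illustrative only; the paper's exact tie-handling is not published, so average ranks are assumed here):

```python
def _ranks(values):
    """Average ranks (1-based), with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Invariant to monotone transforms: squaring y does not change the ranks.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0
```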
22
Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation. Radiology 2019; 294:421-431. [PMID: 31793848 DOI: 10.1148/radiol.2019191293] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
Abstract
Background Deep learning has the potential to augment the use of chest radiography in clinical radiology, but challenges include poor generalizability, spectrum bias, and difficulty comparing across studies. Purpose To develop and evaluate deep learning models for chest radiograph interpretation by using radiologist-adjudicated reference standards. Materials and Methods Deep learning models were developed to detect four findings (pneumothorax, opacity, nodule or mass, and fracture) on frontal chest radiographs. This retrospective study used two data sets. Data set 1 (DS1) consisted of 759 611 images from a multicity hospital network; ChestX-ray14 is a publicly available data set with 112 120 images. Natural language processing and expert review of a subset of images provided labels for 657 954 training images. Test sets consisted of 1818 and 1962 images from DS1 and ChestX-ray14, respectively. Reference standards were defined by radiologist-adjudicated image review. Performance was evaluated by area under the receiver operating characteristic curve analysis, sensitivity, specificity, and positive predictive value. Four radiologists reviewed test set images for performance comparison. Inverse probability weighting was applied to DS1 to account for positive radiograph enrichment and estimate population-level performance. Results In DS1, population-adjusted areas under the receiver operating characteristic curve for pneumothorax, nodule or mass, airspace opacity, and fracture were, respectively, 0.95 (95% confidence interval [CI]: 0.91, 0.99), 0.72 (95% CI: 0.66, 0.77), 0.91 (95% CI: 0.88, 0.93), and 0.86 (95% CI: 0.79, 0.92).
With ChestX-ray14, areas under the receiver operating characteristic curve were 0.94 (95% CI: 0.93, 0.96), 0.91 (95% CI: 0.89, 0.93), 0.94 (95% CI: 0.93, 0.95), and 0.81 (95% CI: 0.75, 0.86), respectively. Conclusion Expert-level models for detecting clinically relevant chest radiograph findings were developed for this study by using adjudicated reference standards and with population-level performance estimation. Radiologist-adjudicated labels for 2412 ChestX-ray14 validation set images and 1962 test set images are provided. © RSNA, 2019. Online supplemental material is available for this article. See also the editorial by Chang in this issue.
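The inverse probability weighting used here to undo positive-radiograph enrichment amounts to weighting each test image by the reciprocal of its stratum's sampling probability, so over-enriched strata are shrunk back to their population share. A hedged sketch (the stratum names, sampling rates, and toy indicators below are invented, not the study's):

```python
def ipw_mean(values, strata, sampling_prob):
    """Population-level mean of per-image indicators from an enriched test set.

    values:        per-image 0/1 indicators (e.g. "model call was correct")
    strata:        stratum label for each image (e.g. "pos"/"neg" radiograph)
    sampling_prob: probability that an image from a stratum entered the set
    Each image gets weight 1 / P(sampled), down-weighting enriched strata.
    """
    weights = [1.0 / sampling_prob[s] for s in strata]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical enriched set: 4 positive + 4 negative images, with positives
# sampled 5x as often as negatives in this toy design.
correct = [1, 1, 1, 0,  1, 1, 1, 1]
labels  = ["pos"] * 4 + ["neg"] * 4
probs   = {"pos": 0.05, "neg": 0.01}
print(ipw_mean(correct, labels, probs))  # ~0.958, vs a naive mean of 0.875
```

Because the model errs mainly on the over-sampled positives, the population-adjusted estimate is higher than the naive test-set average.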
23
Erratum: Publisher Correction: Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med 2019; 2:113. [PMID: 31754638 PMCID: PMC6864046 DOI: 10.1038/s41746-019-0196-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
24
Abstract
In recent years, many new clinical diagnostic tools have been developed using complicated machine learning methods. Irrespective of how a diagnostic tool is derived, it must be evaluated using a 3-step process of deriving, validating, and establishing the clinical effectiveness of the tool. Machine learning-based tools should also be assessed for the type of machine learning model used and its appropriateness for the input data type and data set size. Machine learning models also generally have additional prespecified settings called hyperparameters, which must be tuned on a data set independent of the validation set. On the validation set, the outcome against which the model is evaluated is termed the reference standard. The rigor of the reference standard must be assessed, such as against a universally accepted gold standard or expert grading.
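The separation described above, in which hyperparameters are tuned on a data set independent of the validation set, can be made concrete with a simple split helper. This is a generic sketch under assumed fractions (the function name and 60/20/20 split are illustrative, not from the article):

```python
import random

def derive_tune_validate_split(cases, frac_derive=0.6, frac_tune=0.2, seed=0):
    """Three-way split: derive the model, tune hyperparameters, then evaluate.

    Hyperparameters are chosen only on the tuning set; the validation set is
    held out and touched once, for the final comparison against the
    reference standard.
    """
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    n_derive = int(frac_derive * len(shuffled))
    n_tune = int(frac_tune * len(shuffled))
    derive = shuffled[:n_derive]
    tune = shuffled[n_derive:n_derive + n_tune]
    validate = shuffled[n_derive + n_tune:]
    return derive, tune, validate

derive, tune, validate = derive_tune_validate_split(list(range(100)))
print(len(derive), len(tune), len(validate))  # 60 20 20
```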
25
Reply: 'The importance of study design in the application of artificial intelligence methods in medicine'. NPJ Digit Med 2019; 2:100. [PMID: 31646182 PMCID: PMC6800435 DOI: 10.1038/s41746-019-0175-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2019] [Accepted: 08/19/2019] [Indexed: 11/30/2022] Open
26
An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis. Nat Med 2019; 25:1453-1457. [PMID: 31406351 DOI: 10.1038/s41591-019-0539-7] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Accepted: 07/02/2019] [Indexed: 12/17/2022]
Abstract
The microscopic assessment of tissue samples is instrumental for the diagnosis and staging of cancer, and thus guides therapy. However, these assessments demonstrate considerable variability and many regions of the world lack access to trained pathologists. Though artificial intelligence (AI) promises to improve the access and quality of healthcare, the costs of image digitization in pathology and difficulties in deploying AI solutions remain as barriers to real-world use. Here we propose a cost-effective solution: the augmented reality microscope (ARM). The ARM overlays AI-based information onto the current view of the sample in real time, enabling seamless integration of AI into routine workflows. We demonstrate the utility of ARM in the detection of metastatic breast cancer and the identification of prostate cancer, with latency compatible with real-time use. We anticipate that the ARM will remove barriers towards the use of AI designed to improve the accuracy and efficiency of cancer diagnosis.
27
Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med 2019; 2:48. [PMID: 31304394 PMCID: PMC6555810 DOI: 10.1038/s41746-019-0112-2] [Citation(s) in RCA: 167] [Impact Index Per Article: 33.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Accepted: 04/15/2019] [Indexed: 12/20/2022] Open
Abstract
For prostate cancer patients, the Gleason score is one of the most important prognostic factors, potentially determining treatment independent of the stage. However, Gleason scoring is based on subjective microscopic examination of tumor morphology and suffers from poor reproducibility. Here we present a deep learning system (DLS) for Gleason scoring whole-slide images of prostatectomies. Our system was developed using 112 million pathologist-annotated image patches from 1226 slides, and evaluated on an independent validation dataset of 331 slides. Compared to a reference standard provided by genitourinary pathology experts, the mean accuracy among 29 general pathologists was 0.61 on the validation set. The DLS achieved a significantly higher diagnostic accuracy of 0.70 (p = 0.002) and trended towards better patient risk stratification in correlations to clinical follow-up data. Our approach could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. The DLS also goes beyond the current Gleason system to more finely characterize and quantitate tumor morphology, providing opportunities for refinement of the Gleason system itself.