1
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. Annual Review of Pathology 2024; 19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 10/25/2023]
Abstract
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond. We explain limitations including the black-box character of conventional AI and describe solutions to make machine learning decisions more transparent with so-called explainable AI. The purpose of the review is to foster a mutual understanding of both the biomedical and the AI side. To that end, in addition to providing an overview of the relevant foundations in pathology and machine learning, we present worked-through examples for a better practical understanding of what AI can achieve and how it should be done.
Affiliation(s)
- Frederick Klauschen
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Philipp Jurmeister
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Maximilian Alber
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Aignostics, Berlin, Germany
- Grégoire Montavon
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Seoul, Korea
  - Max Planck Institute for Informatics, Saarbrücken, Germany
2
Ma W, Wu H, Chen Y, Xu H, Jiang J, Du B, Wan M, Ma X, Chen X, Lin L, Su X, Bao X, Shen Y, Xu N, Ruan J, Jiang H, Ding Y. New techniques to identify the tissue of origin for cancer of unknown primary in the era of precision medicine: progress and challenges. Brief Bioinform 2024; 25:bbae028. [PMID: 38343328 PMCID: PMC10859692 DOI: 10.1093/bib/bbae028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 12/10/2023] [Accepted: 01/11/2024] [Indexed: 02/15/2024] Open
Abstract
Cancer of unknown primary (CUP) is a rare metastatic malignancy whose tissue of origin (TOO) remains unidentified despite a standardized diagnostic examination. Patients diagnosed with CUP are typically treated with empiric chemotherapy, although their prognosis is worse than that of patients with metastatic cancer of a known origin. TOO identification of CUP has been employed in precision medicine, and subsequent site-specific therapy is clinically helpful. For example, molecular profiling, including genomic profiling, gene expression profiling, epigenetics and proteins, has facilitated TOO identification. Moreover, machine learning has improved identification accuracy, and non-invasive methods, such as liquid biopsy and image omics, are gaining momentum. However, the heterogeneity in prediction accuracy, sample requirements and technical fundamentals among the various techniques is noteworthy. Accordingly, we systematically reviewed the development and limitations of novel TOO identification methods, compared their pros and cons and assessed their potential clinical usefulness. Our study may help patients shift from empirical to customized care and improve their prognoses.
Affiliation(s)
- Wenyuan Ma
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Hui Wu
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yiran Chen
  - Department of Surgical Oncology, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Hongxia Xu
  - Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), Zhejiang University School of Medicine, Zhejiang University, Haining, China
- Junjie Jiang
  - Department of Gastroenterology, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Bang Du
  - Real Doctor AI Research Centre, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Mingyu Wan
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiaolu Ma
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiaoyu Chen
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Lili Lin
  - Department of Nuclear Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xinhui Su
  - Department of Nuclear Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xuanwen Bao
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yifei Shen
  - Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Nong Xu
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jian Ruan
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Haiping Jiang
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yongfeng Ding
  - Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
3
Yi F, Yang H, Chen D, Qin Y, Han H, Cui J, Bai W, Ma Y, Zhang R, Yu H. XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer's disease. BMC Med Inform Decis Mak 2023; 23:137. [PMID: 37491248 PMCID: PMC10369804 DOI: 10.1186/s12911-023-02238-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/09/2022] [Accepted: 07/13/2023] [Indexed: 07/27/2023] Open
Abstract
BACKGROUND Due to the class imbalance issue faced when Alzheimer's disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnosis performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD. METHODS We obtained patient data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and they were then included in the XGBoost algorithm. Due to the imbalance among the three classes, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework utilized attribution ideas that quantified the impacts of model predictions into numerical values and analysed them based on their directions and sizes. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and their performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer's Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset. RESULTS Compared to the RF, Bagging, AdaBoost, NB and XGBoost (unweighted), the interpretable framework had higher classification performance with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. 
The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, the higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, the higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset. CONCLUSIONS The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs.
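The abstract above says the sample weight distribution was changed to handle class imbalance, but it does not say which weighting scheme was used. As a minimal sketch, assuming inverse-frequency ("balanced") weights, which is one common choice and not necessarily the authors' method, per-sample weights could be computed as follows. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample: w_c = n_samples / (n_classes * n_c).

    Assumption: this inverse-frequency scheme is illustrative only; the
    abstract does not state the weighting the authors applied.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Hypothetical toy labels: 6 NC, 3 MCI, 1 AD (not the ADNI class counts).
labels = ["NC"] * 6 + ["MCI"] * 3 + ["AD"] * 1
weights = balanced_sample_weights(labels)
```

With this scheme every class contributes the same total weight, so the rarest class gets the largest per-sample weight. Weights of this kind can typically be passed to a gradient boosting library through a per-sample weight argument at fit time.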
Affiliation(s)
- Fuliang Yi
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Hui Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Durong Chen
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Yao Qin
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Hongjuan Han
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Jing Cui
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Wenlin Bai
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Yifei Ma
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Rong Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Hongmei Yu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
4
Zhang J, Yang X, Chen J, Han J, Chen X, Fan Y, Zheng H. Construction of a diagnostic classifier for cervical intraepithelial neoplasia and cervical cancer based on XGBoost feature selection and random forest model. J Obstet Gynaecol Res 2023; 49:296-303. [PMID: 36220631 DOI: 10.1111/jog.15458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 08/18/2022] [Accepted: 09/23/2022] [Indexed: 01/19/2023]
Abstract
BACKGROUND The pathological phenotype of early-stage cervical cancer (CC) is similar to that of cervical intraepithelial neoplasia (CIN), which makes the diagnosis of cervical precancerous lesions challenging. Meanwhile, the existing diagnostic methods have certain subjectivity and limitations, resulting in the possibility of misdiagnosis or missed diagnosis. Hence, methods are needed to assist the diagnosis of CC and CIN. METHODS Based on the CIN and CC data in the Gene Expression Omnibus (GEO) dataset, the eXtreme Gradient Boosting (XGBoost) algorithm was used to screen the feature genes between CIN and CC for constructing the classifier. An incremental feature selection (IFS) curve was also used for screening. The classifier was validated for reliability using principal component analysis (PCA) dimensionality reduction and heat map analysis of gene expression. Then, differentially expressed genes of CIN and CC were intersected with the classifier genes. Genes in the intersection were used as seeds for protein-protein interaction network construction and random walk with restart analysis. The genes with the top 50 affinity coefficients were selected for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses to observe the biological functions that differ between CIN and CC. RESULTS The peripheral blood genes of CIN and CC were analyzed, and seven genes were screened. Using these genes for classifier construction, IFS curve screening revealed that the three-feature gene classifier constructed according to the random forest model had the best effect. The results of PCA dimensionality reduction and gene expression heat map analysis showed that the three-gene classifier could effectively distinguish CIN from CC. CONCLUSION A three-gene diagnostic classifier can effectively distinguish CIN patients from CC patients and provide a reference for the clinical diagnosis of early CC.
Affiliation(s)
- Jing Zhang
  - Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China
- Xiuqing Yang
  - Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China
- Jia Chen
  - Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China
- Jing Han
  - Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China
- Xiaofeng Chen
  - Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China
- Yueping Fan
  - Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China
- Hui Zheng
  - Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China
5
Predicting Overall Survival in Patients with Nonmetastatic Gastric Signet Ring Cell Carcinoma: A Machine Learning Approach. Computational and Mathematical Methods in Medicine 2022; 2022:4862376. [PMID: 36148015 PMCID: PMC9489421 DOI: 10.1155/2022/4862376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 08/16/2022] [Accepted: 08/24/2022] [Indexed: 11/30/2022]
Abstract
Background and Aims Accurate survival prediction is essential for patients with nonmetastatic gastric signet ring cell carcinoma (GSRC) and for medical decision-making. Current models rely on prespecified variables, which limits their performance and makes them unsuitable for individual patients. Our study aimed to develop a more precise model for predicting 1-, 3-, and 5-year overall survival (OS) in patients with nonmetastatic GSRC based on a machine learning approach. Methods We selected 2127 GSRC patients diagnosed from 2004 to 2014 from the Surveillance, Epidemiology, and End Results (SEER) database and then randomly partitioned them into training and validation cohorts. We compared the performance of several machine learning-based models and finally chose the eXtreme gradient boosting (XGBoost) model as the optimal method to predict OS in patients with nonmetastatic GSRC. The model was assessed using the receiver operating characteristic (ROC) curve. Results In the training cohort, the AUCs of the XGBoost model for predicting 1-, 3-, and 5-year OS were 0.842, 0.831, and 0.838, respectively, while in the testing cohort, the AUCs for 1-, 3-, and 5-year OS were 0.749, 0.823, and 0.829, respectively. In addition, the XGBoost model performed better than the American Joint Committee on Cancer (AJCC) stage. The performance of the model remained stable when stratified by age and ethnicity. Conclusion The XGBoost-based model accurately predicts the 1-, 3-, and 5-year OS in patients with nonmetastatic GSRC. Machine learning is a promising way to predict the survival outcomes of tumor patients.
6
Lu Q, Chen F, Li Q, Chen L, Tong L, Tian G, Zhou X. A Machine Learning Method to Trace Cancer Primary Lesion Using Microarray-Based Gene Expression Data. Front Oncol 2022; 12:832567. [PMID: 35530331 PMCID: PMC9071249 DOI: 10.3389/fonc.2022.832567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 03/21/2022] [Indexed: 11/17/2022] Open
Abstract
Cancer of unknown primary site (CUP) is a heterogeneous group of cancers whose tissue of origin remains unknown after detailed investigation by conventional clinical methods. CUP accounts for roughly 3%–5% of all human malignancies. CUP patients are usually treated with broad-spectrum chemotherapy, which often leads to a poor prognosis. Recent studies suggest that treatment targeting the primary lesion of CUP will significantly improve the prognosis of the patient. Therefore, it is urgent to develop an efficient method to accurately detect the tissue of origin of CUP in clinical cancer research. In this work, we developed a novel framework that uses Extreme Gradient Boosting (XGBoost) to trace the primary site of CUP based on microarray-based gene expression data. First, we downloaded the microarray-based gene expression profiles of 59,385 genes for 57,08 samples from The Cancer Genome Atlas (TCGA) and 6,364 genes for 3,101 samples from the Gene Expression Omnibus (GEO). Both datasets were divided into training and independent testing data at a ratio of 4:1. Then, from the training data we selected 200 and 290 genes from the TCGA and GEO datasets, respectively, to train XGBoost models for identifying the primary site of CUP. The overall 5-fold cross-validation accuracies of our methods were 96.9% and 95.3% on the TCGA and GEO training datasets, respectively. Meanwhile, the macro-precision on the independent datasets reached 96.75% and 98.8% on TCGA and GEO, respectively. Experimental results demonstrated that the XGBoost framework not only can reduce the cost of clinical cancer traceability but also has high efficiency, which might be useful in clinical practice.
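The abstract above reports macro-precision on the independent test sets. As a brief illustration of what that metric measures, here is a minimal pure-Python sketch: precision is computed separately for each class and then averaged without weighting by class size. The function name, the zero-precision convention for classes that are never predicted, and the toy labels are assumptions for illustration, not details taken from the paper.

```python
def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions = []
    for c in classes:
        # True labels of every sample that the model assigned to class c.
        truths_for_c = [t for t, p in zip(y_true, y_pred) if p == c]
        if not truths_for_c:
            # Assumed convention: a class that is never predicted scores 0.
            precisions.append(0.0)
            continue
        true_positives = sum(1 for t in truths_for_c if t == c)
        precisions.append(true_positives / len(truths_for_c))
    return sum(precisions) / len(precisions)


# Hypothetical toy example with three primary sites.
y_true = ["lung", "lung", "breast", "colon", "colon", "colon"]
y_pred = ["lung", "breast", "breast", "colon", "colon", "lung"]
score = macro_precision(y_true, y_pred)
```

Because every class counts equally, a rare primary site that is classified poorly lowers the macro average as much as a common one would.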
Affiliation(s)
- Qingfeng Lu
  - Oncology Department, Daqing Oilfield General Hospital, Daqing, China
- Fengxia Chen
  - Department of Thoracic Surgery, Hainan General Hospital, Haikou, China
- Qianyue Li
  - Department of R&D, Geneis (Beijing) Co., Ltd., Beijing, China
- Lihong Chen
  - Department of Emergency, Qingdao Eighth People's Hospital, Qingdao, China
- Ling Tong
  - Department of Pathology, Chifeng Municipal Hospital, Chifeng Clinical Medical School of Inner Mongolia Medical University, Chifeng, China
- Geng Tian
  - Department of R&D, Geneis (Beijing) Co., Ltd., Beijing, China
- Xiaohong Zhou
  - Second Division of Cancer, Jiamusi Cancer Hospital, Jiamusi, China
7
Li Q, Yang H, Wang P, Liu X, Lv K, Ye M. XGBoost-based and tumor-immune characterized gene signature for the prediction of metastatic status in breast cancer. J Transl Med 2022; 20:177. [PMID: 35436939 PMCID: PMC9014628 DOI: 10.1186/s12967-022-03369-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/10/2022] [Accepted: 03/26/2022] [Indexed: 12/23/2022] Open
Abstract
Background Breast cancer has long been a leading cancer diagnosed in women worldwide, and approximately 90% of cancer-related deaths are caused by metastasis. For this reason, finding new biomarkers related to metastasis is an urgent task in order to predict the metastatic status of breast cancer and provide new therapeutic targets. Methods In this research, an efficient eXtreme Gradient Boosting (XGBoost) model optimized by a grid search algorithm was established to support the identification of metastatic breast tumors based on gene expression. Estimated by ten-fold cross-validation, the optimized XGBoost classifier achieved a higher overall mean AUC of 0.82 compared with other classifiers such as DT, SVM, KNN, LR, and RF. Results A novel 6-gene signature (SQSTM1, GDF9, LINC01125, PTGS2, GVINP1, and TMEM64) was selected by feature importance ranking, and a series of in vitro experiments was conducted to verify the potential role of each biomarker. In general, the effect of SQSTM1 in tumor cells is classed as a risk factor, while the effects of the other 5 genes (GDF9, LINC01125, PTGS2, GVINP1, and TMEM64) in immune cells are classed as protective factors. Conclusions Our findings will allow for a more accurate prediction of the metastatic status of breast cancer and will benefit the mining of breast cancer metastasis-related biomarkers. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-022-03369-9.
8
Ding Y, Jiang J, Xu J, Chen Y, Zheng Y, Jiang W, Mao C, Jiang H, Bao X, Shen Y, Li X, Teng L, Xu N. Site-specific therapy in cancers of unknown primary site: a systematic review and meta-analysis. ESMO Open 2022; 7:100407. [PMID: 35248824 PMCID: PMC8897579 DOI: 10.1016/j.esmoop.2022.100407] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/17/2022] [Revised: 01/22/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022] Open
Abstract
Background Cancer of unknown primary site (CUP) is a term applied to characterize pathologically confirmed metastatic cancer with unknown primary tumor origin. It remains uncertain whether patients with CUP benefit from site-specific therapy guided by molecular profiling. Patients and methods A systematic search in PubMed, Web of Science, Embase, Cochrane Library, and ClinicalTrials.gov, and of conference abstracts from January 1976 to January 2021 was performed to identify studies investigating the efficacy of site-specific therapy on patients with CUP. The quality of included studies was evaluated using the Cochrane risk of bias tool and Newcastle–Ottawa scale. Eligible studies were weighted and pooled for meta-analysis. Hazard ratios (HRs) for overall survival (OS) and progression-free survival (PFS) were assessed to compare the efficacy of site-specific therapy with empiric therapy in patients with CUP. In addition, subgroup analyses were conducted. Results Five studies comprising 1114 patients were identified, of which 454 patients received site-specific therapy, and 660 patients received empiric therapy. Our meta-analysis revealed that site-specific therapy was not significantly associated with improved PFS [HR 0.93, 95% confidence interval (CI) 0.74-1.17, P = 0.534] and OS (HR 0.75, 95% CI 0.55-1.03, P = 0.069), compared with empiric therapy. However, during subgroup analysis significantly improved OS was associated with site-specific therapy in the high-accuracy predictive assay subgroup (HR 0.46, 95% CI 0.26-0.81, P = 0.008) compared with the low accuracy predictive assay subgroup (HR 0.93, 95% CI 0.75-1.15, P = 0.509). Furthermore, compared with patients with less responsive tumor types, more survival benefit from site-specific therapy was found in patients with more responsive tumors (HR 0.67, 95% CI 0.46-0.97, P = 0.037). 
Conclusions Our results suggest that site-specific therapy is not significantly associated with improved survival outcomes; however, it might benefit patients with CUP with responsive tumor types. Studies evaluating the role of site-specific therapy guided by molecular profiling in CUP provided contradictory results. Site-specific therapy is not significantly associated with improved survival outcomes in the overall CUP population. Molecularly defined site-specific therapy may improve OS only when high-accuracy assays assign CUP to responsive tumor types.
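The hazard ratios above are reported with 95% confidence intervals and P values. As a rough consistency check, a two-sided P value can be approximately recovered from a hazard ratio and its interval. This is a minimal sketch, assuming a Wald-type interval that is symmetric on the log scale; the abstract does not state how the intervals were derived, and the function name is hypothetical.

```python
import math


def p_value_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate z statistic and two-sided P value from an HR and its 95% CI.

    Assumption: the interval is exp(log(HR) +/- z_crit * SE), i.e. a Wald
    interval on the log hazard ratio scale.
    """
    log_hr = math.log(hr)
    # Width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# High-accuracy assay subgroup from the abstract: HR 0.46, 95% CI 0.26-0.81.
z_high, p_high = p_value_from_hr_ci(0.46, 0.26, 0.81)
# Overall OS comparison from the abstract: HR 0.75, 95% CI 0.55-1.03.
z_os, p_os = p_value_from_hr_ci(0.75, 0.55, 1.03)
```

Because the published estimates are rounded to two decimals, the recovered values should land close to, but not exactly on, the reported P values.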
Affiliation(s)
- Y Ding
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- J Jiang
  - Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- J Xu
  - Department of Thoracic Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Chen
  - Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Zheng
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- W Jiang
  - Department of Colorectal Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- C Mao
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- H Jiang
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- X Bao
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Shen
  - Centre of Clinical Laboratory, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
  - Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, Hangzhou, China
  - Institute of Laboratory Medicine, Zhejiang University, Hangzhou, China
- X Li
  - Department of Surgery, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- L Teng
  - Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- N Xu
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
9
Gong X, Zheng B, Xu G, Chen H, Chen C. Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer. J Thorac Dis 2022; 13:6240-6251. [PMID: 34992804 PMCID: PMC8662490 DOI: 10.21037/jtd-21-1107] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/05/2021] [Accepted: 09/24/2021] [Indexed: 01/15/2023]
Abstract
Background Accurate prognostic estimation for esophageal cancer (EC) patients plays an important role in the process of clinical decision-making. The objective of this study was to develop an effective model to predict the 5-year survival status of EC patients using machine learning (ML) algorithms. Methods We retrieved the information of patients diagnosed with EC between 2010 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) Program, including 24 features. A total of 8 ML models were applied to the selected dataset to classify the EC patients in terms of 5-year survival status: 3 newly developed gradient boosting models (GBM): XGBoost, CatBoost, and LightGBM; 2 commonly used tree-based models: gradient boosting decision trees (GBDT) and random forest (RF); and 3 other ML models: artificial neural networks (ANN), naive Bayes (NB), and support vector machines (SVM). A 5-fold cross-validation was used to measure model performance. Results After excluding records with missing data, the final study population comprised 10,588 patients. Feature selection was conducted based on the χ² test; however, the experimental results showed that the complete dataset provided better prediction of outcomes than the dataset with non-significant features removed. Among the 8 models, XGBoost had the best performance [area under the receiver operating characteristic (ROC) curve (AUC): 0.852 for XGBoost, 0.849 for CatBoost, 0.850 for LightGBM, 0.846 for GBDT, 0.838 for RF, 0.844 for ANN, 0.833 for NB, and 0.789 for SVM]. The accuracy and logistic loss of XGBoost were 0.875 and 0.301, respectively, which were also the best among the models. In the XGBoost model, the SHapley Additive exPlanations (SHAP) values were calculated, and the results indicated that four features (reason no cancer-directed surgery, Surg Prim Site, age, and stage group) had the greatest impact on predicting the outcomes.
Conclusions The XGBoost model and the complete dataset can be used to construct an accurate prognostic model for patients diagnosed with EC which may be applicable in clinical practice in the future.
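The abstract above compares models by AUC and logistic loss. The following minimal pure-Python sketch shows how these two metrics are defined for a binary outcome. It assumes labels coded as 0/1 and predicted probabilities for the positive class; the function names and the toy data are hypothetical and unrelated to the SEER cohort.

```python
import math


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the predictions."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical toy predictions.
y_true = [1, 0, 1, 0, 1]
probs = [0.9, 0.2, 0.6, 0.7, 0.8]
auc = roc_auc(y_true, probs)
loss = log_loss(y_true, probs)
```

AUC depends only on how the predictions rank the patients, while logistic loss also penalizes poorly calibrated probabilities, which is one reason a study may report both.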
Affiliation(s)
- Xian Gong
- Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
- Bin Zheng
- Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
- Guobing Xu
- Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
- Hao Chen
- Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
- Chun Chen
- Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
10
Zhao Y, Li X, Li S, Dong M, Yu H, Zhang M, Chen W, Li P, Yu Q, Liu X, Gao Z. Using Machine Learning Techniques to Develop Risk Prediction Models for the Risk of Incident Diabetic Retinopathy Among Patients With Type 2 Diabetes Mellitus: A Cohort Study. Front Endocrinol (Lausanne) 2022; 13:876559. [PMID: 35655800 PMCID: PMC9152028 DOI: 10.3389/fendo.2022.876559] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/15/2022] [Accepted: 04/12/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVE To construct and validate prediction models for the risk of diabetic retinopathy (DR) in patients with type 2 diabetes mellitus. METHODS Data on patients with type 2 diabetes mellitus hospitalized between January 2010 and September 2018 were retrospectively collected. Eighteen baseline demographic and clinical characteristics were used as predictors to train five machine-learning models. The model that showed favorable predictive efficacy was evaluated at annual follow-ups. Multi-point data of the patients in the test set were used to further evaluate the model's performance. We also assessed the relative prognostic importance of the selected risk factors for DR outcomes. RESULTS Of 7943 patients, 1692 (21.30%) developed DR during follow-up. Among the five models, the XGBoost model achieved the highest predictive performance, with an AUC, accuracy, sensitivity, and specificity of 0.803, 88.9%, 74.0%, and 81.1%, respectively. The XGBoost model's AUCs in the different follow-up periods ranged from 0.834 to 0.966. In addition to the classical risk factors of DR, serum uric acid (SUA), low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), estimated glomerular filtration rate (eGFR), and triglyceride (TG) were also identified as important and strong predictors of the disease. Compared with the clinical diagnosis method of DR, the XGBoost model identified DR an average of 2.895 years before the first diagnosis. CONCLUSION The proposed model achieved high performance in predicting the risk of DR among patients with type 2 diabetes mellitus at each time point. This study established the potential of the XGBoost model to help clinicians identify high-risk patients and make type 2 diabetes management-related decisions.
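The accuracy, sensitivity, and specificity quoted above are all derived from the same confusion matrix. The sketch below shows the standard definitions on made-up labels; the function name `binary_metrics` and the toy vectors are assumptions for illustration and do not reproduce the study's data or results.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels (1 = disease, 0 = no disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the model catches
        "specificity": tn / (tn + fp),  # share of non-cases the model clears
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: 4 patients with the outcome, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Because accuracy weights the two classes by their prevalence, it should be read alongside sensitivity and specificity whenever the outcome is uncommon in the cohort.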
Affiliation(s)
- Yuedong Zhao
- Department of Endocrinology, Dalian Municipal Central Hospital, Dalian, China
- Xinyu Li
- Department of Endocrinology, Dalian Municipal Central Hospital, Dalian, China
- Shen Li
- Department of Endocrinology, Dalian Municipal Central Hospital, Dalian, China
- Han Yu
- Graduate School of Art and Science, Yale University, New Haven, CT, United States
- Mengxian Zhang
- Department of Endocrinology, Dalian Municipal Central Hospital, Dalian, China
- Weidao Chen
- Infervision Institute of Research, Beijing, China
- Peihua Li
- Department of Endocrinology, Dalian Municipal Central Hospital, Dalian, China
- Qing Yu
- Department of Endocrinology, Dalian Municipal Central Hospital, Dalian, China
- Xuhan Liu
- Department of Endocrinology, Dalian Municipal Central Hospital, Dalian, China
- *Correspondence: Xuhan Liu; Zhengnan Gao
- Zhengnan Gao
- Department of Endocrinology, Dalian Municipal Central Hospital, Dalian, China
- *Correspondence: Xuhan Liu; Zhengnan Gao
11
Pang H, Zhang G, Yan N, Lang J, Liang Y, Xu X, Cui Y, Wu X, Li X, Shan M, Wang X, Meng X, Liu J, Tian G, Cai L, Yuan D, Wang X. Evaluating the Risk of Breast Cancer Recurrence and Metastasis After Adjuvant Tamoxifen Therapy by Integrating Polymorphisms in Cytochrome P450 Genes and Clinicopathological Characteristics. Front Oncol 2021; 11:738222. [PMID: 34868931 PMCID: PMC8639703 DOI: 10.3389/fonc.2021.738222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 10/25/2021] [Indexed: 11/13/2022]
Abstract
Tamoxifen (TAM) is the most commonly used adjuvant endocrine drug for hormone receptor-positive (HR+) breast cancer patients. However, how to accurately evaluate the risk of breast cancer recurrence and metastasis after adjuvant TAM therapy is still a major concern. In recent years, many studies have shown that the clinical outcomes of TAM-treated breast cancer patients are influenced by the activity of some cytochrome P450 (CYP) enzymes that catalyze the formation of active TAM metabolites like endoxifen and 4-hydroxytamoxifen. In this study, we aimed to first develop and validate an algorithm combining polymorphisms in CYP genes and clinicopathological signatures to identify a subpopulation of breast cancer patients who might benefit most from TAM adjuvant therapy and meanwhile evaluate major risk factors related to TAM resistance. Specifically, a total of 256 patients with invasive breast cancer who received adjuvant endocrine therapy were selected. The genotypes at 10 loci from three TAM metabolism-related CYP genes were detected by time-of-flight mass spectrometry and multiplex long PCR. Combining the 10 loci with nine clinicopathological characteristics, we obtained 19 important features whose association with cancer recurrence was assessed by importance score via random forests. After that, a logistic regression model was trained to calculate TAM risk-of-recurrence score (TAM RORs), which is adopted to assess a patient's risk of recurrence after TAM treatment. The sensitivity and specificity of the model in an independent test cohort were 86.67% and 64.56%, respectively. This study showed that breast cancer patients with high TAM RORs were less sensitive to TAM treatment and manifested more invasive characteristics, whereas those with low TAM RORs were highly sensitive to TAM treatment, and their conditions were stable during the follow-up period. There were some risk factors that had a significant effect on the efficacy of TAM. 
They were tissue classification (tumor Grade < 2 vs. Grade ≥ 2, p = 2.2e-16), the number of lymph node metastases (Node-Negative vs. Node < 4, p = 5.3e-07; Node < 4 vs. Node ≥ 4, p = 0.003; Node-Negative vs. Node ≥ 4, p = 7.2e-15), and the expression levels of estrogen receptor (ER) and progesterone receptor (PR) (ER < 50% vs. ER ≥ 50%, p = 1.3e-12; PR < 50% vs. PR ≥ 50%, p = 2.6e-08). Notably, different genotypes of CYP2D6*10 (C188T) showed significant differences in prediction function (CYP2D6*10 CC vs. TT, p < 0.019; CYP2D6*10 CT vs. TT, p < 0.037). More than 50% of Chinese individuals carry the CYP2D6*10 mutation, so the genotype of CYP2D6*10 (C188T) should be tested before TAM therapy.
Affiliation(s)
- Hui Pang
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Guoqiang Zhang
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Na Yan
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Jidong Lang
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Yuebin Liang
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Xinyuan Xu
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Yaowen Cui
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Xueya Wu
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Xianjun Li
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Ming Shan
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Xiaoqin Wang
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Xiangzhi Meng
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jiaxiang Liu
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Geng Tian
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Li Cai
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Dawei Yuan
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Xin Wang
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
12
Wei Q, Ramsey SA. Predicting chemotherapy response using a variational autoencoder approach. BMC Bioinformatics 2021; 22:453. [PMID: 34551729 PMCID: PMC8456615 DOI: 10.1186/s12859-021-04339-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2021] [Accepted: 08/17/2021] [Indexed: 01/14/2023]
Abstract
Background Multiple studies have shown the utility of transcriptome-wide RNA-seq profiles as features for machine learning-based prediction of response to chemotherapy in cancer. While tumor transcriptome profiles are publicly available for thousands of tumors for many cancer types, a relatively modest number of tumor profiles are clinically annotated for response to chemotherapy. The paucity of labeled examples and the high dimension of the feature data limit performance for predicting therapeutic response using fully-supervised classification methods. Recently, multiple studies have established the utility of a deep neural network approach, the variational autoencoder (VAE), for generating meaningful latent features from original data. Here, we report the first study of a semi-supervised approach using VAE-encoded tumor transcriptome features and regularized gradient boosted decision trees (XGBoost) to predict chemotherapy drug response for five cancer types: colon, pancreatic, bladder, breast, and sarcoma. Results We found: (1) VAE-encoding of the tumor transcriptome preserves the cancer type identity of the tumor, suggesting preservation of biologically relevant information; and (2) as a feature-set for supervised classification to predict response-to-chemotherapy, the unsupervised VAE encoding of the tumor’s gene expression profile leads to better area under the receiver operating characteristic curve and area under the precision-recall curve classification performance than the original gene expression profile or the PCA principal components or the ICA components of the gene expression profile, in four out of five cancer types that we tested. 
Conclusions Given high-dimensional “omics” data, the VAE is a powerful tool for obtaining a nonlinear low-dimensional embedding; it yields features that retain biological patterns that distinguish between different types of cancer and that enable more accurate tumor transcriptome-based prediction of response to chemotherapy than would be possible using the original data or their principal components. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04339-6.
Affiliation(s)
- Qi Wei
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Stephen A Ramsey
- Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
13
Feature Selection on Elite Hybrid Binary Cuckoo Search in Binary Label Classification. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5588385. [PMID: 34055039 PMCID: PMC8133872 DOI: 10.1155/2021/5588385] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/19/2021] [Accepted: 04/22/2021] [Indexed: 01/10/2023]
Abstract
To address the low optimization accuracy of the cuckoo search algorithm, a new search algorithm, the Elite Hybrid Binary Cuckoo Search (EHBCS) algorithm, is proposed that improves on it through feature weighting and an elite strategy. The EHBCS algorithm is designed for feature selection and was evaluated with an SVM classifier on a series of binary classification datasets, including both low-dimensional and high-dimensional samples. The experimental results show that the EHBCS algorithm achieves better classification performance than the binary genetic algorithm and the binary particle swarm optimization algorithm. In addition, we explain its superiority in terms of standard deviation, sensitivity, specificity, precision, and F-measure.
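Binary variants of continuous metaheuristics need some way to turn a real-valued position into an include/exclude decision per feature. One common generic choice, sketched below, is an S-shaped (sigmoid) transfer function that sets each bit to 1 with probability sigmoid(x). This is an assumption for illustration only: it does not reproduce the feature weighting or elite strategy of EHBCS, and the names `binarize` and `select_features` and the toy position vector are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng):
    """Map a continuous position to a 0/1 feature mask: bit j is 1 with probability sigmoid(position[j])."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def select_features(row, mask):
    """Keep only the values of one sample whose mask bit is 1."""
    return [value for value, keep in zip(row, mask) if keep]


rng = random.Random(42)
# Hypothetical position of one candidate solution over four features.
# Large positive entries are almost surely kept, large negative ones almost surely dropped,
# and an entry at 0.0 is kept with probability 0.5.
position = [40.0, -40.0, 0.0, 40.0]
mask = binarize(position, rng)
subset = select_features([5.1, 3.5, 1.4, 0.2], mask)
```

In a wrapper-style feature selection loop, each mask would be scored by training a classifier on the selected columns, and the search would update the continuous positions based on those scores.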