1
Alataş E, Tanyıldızı Kökkülünk H, Tanyıldızı H, Alcın G. Treatment prediction with machine learning in prostate cancer patients. Comput Methods Biomech Biomed Engin 2023:1-9. [PMID: 38148626 DOI: 10.1080/10255842.2023.2298364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 12/16/2023] [Indexed: 12/28/2023]
Abstract
Prostate cancer has a high incidence, and several treatment modalities are available for it. This study aimed to use machine learning to predict the optimal treatment option for prostate cancer patients. The study included 88 male patients diagnosed with prostate cancer. The independent variables were Gleason score, biopsy, PSA, SUVmax, and age. The dependent variable was the treatment: hormone therapy (n = 30), radiotherapy (n = 28), or radiotherapy + hormone therapy (n = 30). Machine learning was carried out in Python with SVM, RF, DT, ETC, and XGBoost. Accuracy, ROC curves, and AUC were used to evaluate the performance of the multi-class predictions. XGBoost was the model with the highest number of successful predictions. False negative rates for hormone therapy, radiotherapy, and radiotherapy + hormone therapy were 12.5%, 33.3%, and 0%, respectively. The accuracy values were 0.61, 0.83, 0.83, 0.72, and 0.89 for SVM, RF, DT, ETC, and XGBoost, respectively. The three features with the greatest influence on the XGBoost treatment prediction were, in order, biopsy, Gleason score (3 + 3), and PSA level. According to AUC, ROC, and accuracy, XGBoost was the model that best predicted prostate cancer treatment. Biopsy, Gleason score, and PSA level were identified as the key variables for treatment prediction.
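As an illustration of the metrics named in this abstract, the following minimal sketch computes per-class false negative rates and overall accuracy from a three-class confusion matrix. The counts and the names `CONFUSION`, `false_negative_rates`, and `accuracy` are invented for illustration; the abstract does not report the study's confusion matrix, and this is not the authors' code.

```python
# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class.
# The counts are invented for illustration and are NOT the study's data.
LABELS = ["hormone", "radiotherapy", "radiotherapy+hormone"]
CONFUSION = [
    [8, 2, 0],
    [1, 6, 1],
    [0, 0, 10],
]


def false_negative_rates(matrix):
    """Per-class FNR: cases of a class that were missed / all true cases of that class."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append((total - row[i]) / total if total else 0.0)
    return rates


def accuracy(matrix):
    """Overall accuracy: correctly classified cases (the diagonal) / all cases."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    for label, rate in zip(LABELS, false_negative_rates(CONFUSION)):
        print(f"{label}: FNR = {rate:.1%}")
    print(f"accuracy = {accuracy(CONFUSION):.2f}")
```

In a multi-class setting the false negative rate is computed one class at a time (one-vs-rest), which is why the abstract reports three separate rates alongside a single accuracy.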
Affiliation(s)
- Emre Alataş
  - Management Information Systems, Faculty of Economics and Administrative Sciences, Beykent University, Istanbul, Turkey
  - Management Information Systems, Institute of Science and Technology, Kadir Has University, Istanbul, Turkey
- Hilal Tanyıldızı
  - International Trade and Finance, Faculty of Economics and Administrative Sciences, Beykent University, Istanbul, Turkey
  - Business Administration, Institute of Social Sciences, Istanbul University, Istanbul, Turkey
- Goksel Alcın
  - Department of Nuclear Medicine, Istanbul Education and Research Hospital, Istanbul, Turkey
2
Ben-Assuli O, Ramon-Gonen R, Heart T, Jacobi A, Klempfner R. Utilizing shared frailty with the Cox proportional hazards regression: Post discharge survival analysis of CHF patients. J Biomed Inform 2023; 140:104340. [PMID: 36935013 DOI: 10.1016/j.jbi.2023.104340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2022] [Revised: 02/02/2023] [Accepted: 03/13/2023] [Indexed: 03/19/2023]
Abstract
Understanding patients' survival probability, as well as the factors affecting it, constitutes a significant concern for researchers and practitioners, in particular for patients with severe chronic illnesses such as congestive heart failure (CHF). CHF is a clinical syndrome characterized by comorbidities and adverse medical events. Risk stratification to identify patients most likely to die shortly after hospital discharge can improve the quality of care by better allocating organizational resources and personalized interventions. Probability assessment improves clinical decision-making, contributes to personalized care, and saves costs. Although one of the most informative indices is the time to an adverse event for each patient, commonly analyzed using survival analysis methods, these are often challenging to implement due to the complexity of the medical data. Numerous studies have used the Cox proportional hazards (PH) regression method to generate the survival distribution pattern and factors affecting survival. This model, although advantageous for survival analysis, assumes the homogeneity of the hazard ratio across patients and independence of the observations in terms of survival time. These assumptions are often violated in real-world data, especially when the dataset is composed of readmission data for chronically ill patients, since these recurring observations are inherently dependent. This study ran the Cox PH regression on a feature set selected by machine learning algorithms from a rich hospital dataset. The event modeled here was patient mortality within 90 days post-hospital discharge. The sample was composed of medical records of patients hospitalized in the Israeli Sheba Medical Center more than once, with CHF as the primary diagnosis. We modeled the survival of CHF patients using the Cox PH regression with and without the shared frailty correction that addresses the shortcomings of the Cox model. The results of the two Cox PH regression models, with and without the shared frailty correction, were compared. The results demonstrate that the shared frailty correction, which was statistically significant in our analysis, improved the performance of the basic Cox PH model. While this is the main contribution, we also show that this model outperforms two commonly used measures (ADHERE and EFFECT) for predicting early mortality of CHF patients. Thus, the results illustrate how applying advanced analytics can outperform traditional methods. An additional contribution is the feature set selected using machine-learning methods, which differs from those used in the extant literature.
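For reference, a shared frailty extension of the Cox model is commonly written as below, where i indexes patients, j indexes a patient's repeated admissions, h_0(t) is the baseline hazard, x_ij are the covariates, and z_i is an unobserved patient-level multiplier shared by all of that patient's admissions. The gamma distribution for z_i is an assumption made here for illustration; the abstract does not state which frailty distribution the authors used.

```latex
h_{ij}(t \mid z_i) = z_i \, h_0(t) \, \exp\!\left(x_{ij}^{\top}\beta\right),
\qquad z_i \sim \mathrm{Gamma}\!\left(\tfrac{1}{\theta}, \tfrac{1}{\theta}\right),
\quad \mathbb{E}[z_i] = 1,\ \operatorname{Var}(z_i) = \theta
```

Setting z_i = 1 for every patient (equivalently, letting θ go to 0) recovers the basic Cox PH model, so a frailty variance θ that differs significantly from zero indicates dependence among a patient's readmissions.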
Affiliation(s)
- Ofir Ben-Assuli
  - Faculty of Business Administration, Ono Academic College, 104 Zahal Street, Kiryat Ono 55000, Israel.
- Roni Ramon-Gonen
  - The Graduate School of Business Administration, Bar-Ilan University, Ramat-Gan, Israel.
- Tsipi Heart
  - Faculty of Business Administration, Ono Academic College, 104 Zahal Street, Kiryat Ono 55000, Israel.
- Arie Jacobi
  - Faculty of Business Administration, Ono Academic College, 104 Zahal Street, Kiryat Ono 55000, Israel; Peres Academic Center, 10 Shimon Peres Street, Rehovot, Israel.
- Robert Klempfner
  - The Leviev Heart Center, Sheba Medical Center, Ramat-Gan, Israel.
3
Feng Y, Leung AA, Lu X, Liang Z, Quan H, Walker RL. Personalized prediction of incident hospitalization for cardiovascular disease in patients with hypertension using machine learning. BMC Med Res Methodol 2022; 22:325. [PMID: 36528631 PMCID: PMC9758895 DOI: 10.1186/s12874-022-01814-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND Prognostic information for patients with hypertension is largely based on population averages. The purpose of this study was to compare the performance of four machine learning approaches for personalized prediction of incident hospitalization for cardiovascular disease among newly diagnosed hypertensive patients. METHODS Using province-wide linked administrative health data in Alberta, we analyzed a cohort of 259,873 newly diagnosed hypertensive patients from 2009 to 2015 who collectively had 11,863 incident hospitalizations for heart failure, myocardial infarction, and stroke. Linear multi-task logistic regression, neural multi-task logistic regression, random survival forest, and Cox proportional hazards models were used to determine the number of event-free survivors at each time point and to construct individual event-free survival probability curves. Predictive performance was evaluated by root mean squared error, mean absolute error, concordance index, and the Brier score. RESULTS The random survival forest model had the lowest root mean squared error (33.94) and the lowest mean absolute error (28.37). The machine learning methods provided similar discrimination and calibration in the personalized survival prediction of hospitalizations for cardiovascular events in patients with hypertension. The neural multi-task logistic regression model had the highest concordance index (0.8149) and the lowest Brier score (0.0242) for the personalized survival prediction. CONCLUSIONS This is the first personalized survival prediction for cardiovascular diseases among hypertensive patients using administrative data. The four models tested in this analysis showed similar discrimination and calibration in personalized survival prediction for patients with hypertension.
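Two of the metrics named above can be sketched in a few lines. The code below is a minimal pure-Python illustration of Harrell's concordance index and of a simplified Brier score at a fixed time horizon. The function names and the toy data are hypothetical, and the Brier score here simply drops subjects censored before the horizon rather than applying the inverse-probability-of-censoring weights used in practice; it is not the authors' implementation.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the subject who
    fails earlier was given the higher risk score (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score(times, events, surv_probs, horizon):
    """Simplified Brier score at `horizon`: mean squared gap between the
    predicted survival probability and the observed status (1 = event-free).
    Subjects censored before the horizon are skipped (no IPCW weighting)."""
    squared_errors = []
    for t, e, s in zip(times, events, surv_probs):
        if t <= horizon and not e:
            continue  # status at the horizon is unknown
        observed = 1.0 if t > horizon else 0.0
        squared_errors.append((observed - s) ** 2)
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]           # 1 = hospitalized, 0 = censored
    risks = [0.9, 0.3, 0.5, 0.2]    # higher = predicted to fail sooner
    surv_at_5 = [0.1, 0.4, 0.7, 0.9]
    print(concordance_index(times, events, risks))
    print(brier_score(times, events, surv_at_5, horizon=5))
```

The concordance index measures discrimination (ranking), with higher values being better, while the Brier score reflects both discrimination and calibration, with lower values being better; this is why the abstract reports the highest C-index and the lowest Brier score for the same model.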
Affiliation(s)
- Yuanchao Feng
  - Centre for Health Informatics, Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Libin Cardiovascular Institute, University of Calgary, Calgary, AB, Canada
- Alexander A. Leung
  - Centre for Health Informatics, Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Libin Cardiovascular Institute, University of Calgary, Calgary, AB, Canada
  - Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Xuewen Lu
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB, Canada
- Zhiying Liang
  - Centre for Health Informatics, Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Libin Cardiovascular Institute, University of Calgary, Calgary, AB, Canada
- Hude Quan
  - Centre for Health Informatics, Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Libin Cardiovascular Institute, University of Calgary, Calgary, AB, Canada
  - O’Brien Institute for Public Health and Alberta Health Services, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
- Robin L. Walker
  - Centre for Health Informatics, Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - O’Brien Institute for Public Health and Alberta Health Services, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
4
He QE, Zhu JX, Wang LY, Ding EC, Song K. DNA methylation loci identification for pan-cancer early-stage diagnosis and prognosis using a new distributed parallel partial least squares method. Front Genet 2022; 13:940214. [PMID: 36338981 PMCID: PMC9626520 DOI: 10.3389/fgene.2022.940214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 09/30/2022] [Indexed: 11/17/2022]
Abstract
Aberrant methylation is one of the early detectable events in many tumors, which makes it very promising for pan-cancer early-stage diagnosis and prognosis. To efficiently analyze large pan-cancer methylation data and to overcome the co-methylation phenomenon, a MapReduce-based distributed and parallel partial least squares approach was proposed. The large-scale, high-dimensional methylation data were first decomposed into distributed blocks according to their genome locations. A distributed and parallel data processing strategy was proposed based on the MapReduce framework, and latent variables were then extracted for each distributed block. A set of pan-cancer signatures was further identified from their gene expression profiles through a differential co-expression network followed by statistical tests. In total, 15 TCGA and 3 GEO datasets were used as the training and testing data, respectively, to verify our method. As a result, 22,000 potential methylation loci were selected as highly related to early-stage pan-cancer diagnosis. Of these, 67 methylation loci were further identified as pan-cancer signatures when their gene expression was also considered. Survival analysis and pathway enrichment analysis of these loci show that they may serve as potential drug targets and that the proposed method may serve as a uniform framework for signature identification with big data.
Affiliation(s)
- Qi-en He
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Jun-xuan Zhu
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Li-yan Wang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- En-ci Ding
  - Tianjin First Central Hospital, Tianjin, China
- Kai Song
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
  - Correspondence: Kai Song
5
Yin Q, Chen W, Zhang C, Wei Z. A convolutional neural network model for survival prediction based on prognosis-related cascaded Wx feature selection. Lab Invest 2022; 102:1064-1074. [PMID: 35810236 DOI: 10.1038/s41374-022-00801-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/17/2022] [Revised: 04/22/2022] [Accepted: 04/26/2022] [Indexed: 12/14/2022]
Abstract
Great advances in deep learning have provided effective solutions for prediction tasks in the biomedical field. However, accurate prognosis prediction using cancer genomics data remains challenging because of the severe overfitting caused by the curse of dimensionality inherent to high-throughput sequencing data. Moreover, survival analysis poses unique challenges, arising from the difficulty of utilizing censored samples whose events of interest are not observed. Convolutional neural network (CNN) models offer the opportunity to extract meaningful hierarchical features that characterize cancer subtype and prognosis outcomes. Feature selection, in turn, can mitigate overfitting and reduce the subsequent model training burden by screening out significant genes from redundant genes. To simplify the model, we developed a concise and efficient survival analysis model, named the CNN-Cox model, which combines a special CNN framework with prognosis-related feature selection (cascaded Wx) and has the advantage of a lower computational demand owing to its small number of training parameters. Experimental results show that the CNN-Cox model achieved consistently higher C-index values and better survival prediction performance than existing state-of-the-art survival analysis methods across seven cancer type datasets in The Cancer Genome Atlas cohort: bladder carcinoma, head and neck squamous cell carcinoma, kidney renal cell carcinoma, brain low-grade glioma, lung adenocarcinoma (LUAD), lung squamous cell carcinoma, and skin cutaneous melanoma. As an illustration of model interpretation, we examined potential prognostic gene signatures of the LUAD dataset using the proposed CNN-Cox model. We conducted protein-protein interaction network analysis to identify potential prognostic genes and further analyzed the biological function of 13 hub genes (ANLN, RACGAP1, KIF4A, KIF20A, KIF14, ASPM, CDK1, SPC25, NCAPG, MKI67, HJURP, EXO1, and HMMR) whose high expression is significantly associated with poor survival of LUAD patients. These findings confirmed that the CNN-Cox model is effective in extracting not only prognosis factors but also biologically meaningful gene features. The code is available on GitHub: https://github.com/wangwangCCChen/CNN-Cox .
Affiliation(s)
- Qingyan Yin
  - School of Science, Xi'an University of Architecture and Technology, Xi'an, Shaanxi, 710055, China
- Wangwang Chen
  - School of Science, Xi'an University of Architecture and Technology, Xi'an, Shaanxi, 710055, China
- Chunxia Zhang
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Zhi Wei
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
6
Nsugbe E, Ser HL, Ong HF, Ming LC, Goh KW, Goh BH, Lee WL. On an Affordable Approach towards the Diagnosis and Care for Prostate Cancer Patients Using Urine, FTIR and Prediction Machines. Diagnostics (Basel) 2022; 12:2099. [PMID: 36140500 PMCID: PMC9497845 DOI: 10.3390/diagnostics12092099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 08/23/2022] [Accepted: 08/25/2022] [Indexed: 11/16/2022]
Abstract
Prostate cancer is a widespread form of cancer that affects patients globally and is challenging to diagnose, especially in its early stages. The common means of diagnosing cancer are mostly invasive, such as the use of the patient’s blood as well as digital biopsies, which are relatively expensive and require considerable expertise. Studies have shown that various cancer biomarkers can be present in urine samples from patients with prostate cancer; this paper aimed to build on that finding by using urine samples from a group of patients together with FTIR analysis for the prediction of prostate cancer. The investigation used three sets of data in which all spectra were preprocessed with the linear series decomposition learner (LSDL) and post-processed using signal processing methods, and it compared nine machine-learning models. The results showed that the proposed modeling approach has the potential to be used for clinical prediction of prostate cancer. This would allow a much more affordable and high-throughput means of active prediction and associated care for patients with prostate cancer. Further investigations on the prediction of cancer stage (i.e., early or late stage) obtained high prediction accuracy across the various metrics investigated, further showing the promise and capability of urine sample analysis with the proposed modeling approaches.
Affiliation(s)
- Ejay Nsugbe
  - Nsugbe Research Labs, Swindon SN1 3LG, UK
  - Correspondence: (E.N.); (K.-W.G.); (W.-L.L.); Tel.: +603-551-46098 (W.-L.L.)
- Hooi-Leng Ser
  - Department of Biological Sciences, School of Medical and Life Sciences, Sunway University, Bandar Sunway 47500, Malaysia
- Huey-Fang Ong
  - School of Information Technology, Monash University Malaysia, Bandar Sunway 47500, Malaysia
- Long Chiau Ming
  - PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Gadong BE-1410, Brunei
- Khang-Wen Goh
  - Faculty of Data Science and Information Technology, INTI International University, Nilai 71800, Malaysia
- Bey-Hing Goh
  - Biofunctional Molecule Exploratory (BMEX) Research Group, School of Pharmacy, Monash University Malaysia, Subang Jaya 47500, Malaysia
  - College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
- Wai-Leng Lee
  - School of Science, Monash University Malaysia, Subang Jaya 47500, Malaysia
7
Smith H, Sweeting M, Morris T, Crowther MJ. A scoping methodological review of simulation studies comparing statistical and machine learning approaches to risk prediction for time-to-event data. Diagn Progn Res 2022; 6:10. [PMID: 35650647 PMCID: PMC9161606 DOI: 10.1186/s41512-022-00124-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/01/2021] [Accepted: 03/01/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND There is substantial interest in the adaptation and application of so-called machine learning approaches to prognostic modelling of censored time-to-event data. These methods must be compared and evaluated against existing methods in a variety of scenarios to determine their predictive performance. A scoping review of how machine learning methods have been compared to traditional survival models is important to identify the comparisons that have been made and issues where they are lacking, biased towards one approach or misleading. METHODS We conducted a scoping review of research articles published between 1 January 2000 and 2 December 2020 using PubMed. Eligible articles were those that used simulation studies to compare statistical and machine learning methods for risk prediction with a time-to-event outcome in a medical/healthcare setting. We focus on data-generating mechanisms (DGMs), the methods that have been compared, the estimands of the simulation studies, and the performance measures used to evaluate them. RESULTS A total of ten articles were identified as eligible for the review. Six of the articles evaluated a method that was developed by the authors, four of which were machine learning methods, and the results almost always stated that this developed method's performance was equivalent to or better than the other methods compared. Comparisons were often biased towards the novel approach, with the majority only comparing against a basic Cox proportional hazards model, and in scenarios where it is clear it would not perform well. In many of the articles reviewed, key information was unclear, such as the number of simulation repetitions and how performance measures were calculated. CONCLUSION It is vital that method comparisons are unbiased and comprehensive, and this should be the goal even if realising it is difficult. 
Fully assessing how newly developed methods perform and how they compare to a variety of traditional statistical methods for prognostic modelling is imperative as these methods are already being applied in clinical contexts. Evaluations of the performance and usefulness of recently developed methods for risk prediction should be continued and reporting standards improved as these methods become increasingly popular.
Affiliation(s)
- Hayley Smith
  - Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
- Michael Sweeting
  - Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
  - Statistical Innovation, Oncology Biometrics, Oncology R&D, AstraZeneca, Cambridge, UK
- Tim Morris
  - MRC Clinical Trials Unit at UCL, 90 High Holborn, London, WC1V 6LJ, UK
- Michael J. Crowther
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
8
Prediction of Trypanosoma evansi infection in dromedaries using artificial neural network (ANN). Vet Parasitol 2022; 306:109716. [DOI: 10.1016/j.vetpar.2022.109716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Revised: 05/05/2022] [Accepted: 05/06/2022] [Indexed: 11/20/2022]
9
Dag AZ, Akcam Z, Kibis E, Simsek S, Delen D. A probabilistic data analytics methodology based on Bayesian belief network for predicting and understanding breast cancer survival. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
10
Vijayakumar S, Magazzù G, Moon P, Occhipinti A, Angione C. A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling. Methods Mol Biol 2022; 2399:87-122. [PMID: 35604554 DOI: 10.1007/978-1-0716-1831-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. In order to accommodate the volume and heterogeneity of such diverse data types and aid in their interpretation when they are combined with a multi-scale predictive model, machine learning is a useful tool that can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, the utilization of GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. Firstly, we focus on the merits of adopting an integrative systems biology led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for the combination of machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM .
Affiliation(s)
- Supreeta Vijayakumar
  - Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Giuseppe Magazzù
  - Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Pradip Moon
  - Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Annalisa Occhipinti
  - Computational Systems Biology and Data Analytics Research Group, Middlesbrough, UK
  - Centre for Digital Innovation, Teesside University, Middlesbrough, UK
- Claudio Angione
  - Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
  - Centre for Digital Innovation, Teesside University, Middlesbrough, UK
  - Healthcare Innovation Centre, Teesside University, Middlesbrough, UK
11
Survival analysis with semi-supervised predictive clustering trees. Comput Biol Med 2021; 141:105001. [PMID: 34782112 DOI: 10.1016/j.compbiomed.2021.105001] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/18/2021] [Revised: 10/26/2021] [Accepted: 10/27/2021] [Indexed: 11/21/2022]
Abstract
Many clinical studies follow patients over time and record the time until the occurrence of an event of interest (e.g., recovery, death, …). When patients drop out of the study or when their event did not happen before the study ended, the collected dataset is said to contain censored observations. Given the rise of personalized medicine, clinicians are often interested in accurate risk prediction models that predict, for unseen patients, a survival profile, including the expected time until the event. Survival analysis methods are used to detect associations or compare subpopulations of patients in this context. In this article, we propose to cast the time-to-event prediction task as a multi-target regression task, with censored observations modeled as partially labeled examples. We then apply semi-supervised learning to the resulting data representation. More specifically, we use semi-supervised predictive clustering trees and ensembles thereof. Empirical results over eleven real-life datasets demonstrate superior or equivalent predictive performance of the proposed approach as compared to three competitor methods. Moreover, smaller models are obtained compared to random survival forests, another tree ensemble method. Finally, we illustrate the informative feature selection mechanism of our method, by interpreting the splits induced by a single tree model when predicting survival for amyotrophic lateral sclerosis patients.
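The idea of recasting time-to-event prediction as multi-target regression with partially labeled examples can be sketched as follows. This is one plausible encoding, assumed here for illustration: each patient gets one binary target per point on a time grid, and targets that fall after a censoring time are left unlabeled (`None`). The function name `encode_survival_targets` and the grid are hypothetical; the abstract does not specify the authors' exact representation.

```python
def encode_survival_targets(time, event, grid):
    """Encode one patient as a vector of per-time-point targets.

    1    = known to be event-free at that grid point
    0    = event known to have occurred by that grid point
    None = unknown, because the patient was censored before that point
    """
    targets = []
    for t in grid:
        if time > t:
            targets.append(1)       # still under observation and event-free
        elif event:
            targets.append(0)       # event observed at or before t
        else:
            targets.append(None)    # censored: label missing from here on
    return targets


if __name__ == "__main__":
    grid = [2, 4, 6, 8]  # hypothetical follow-up checkpoints
    print(encode_survival_targets(5, True, grid))   # event observed at time 5
    print(encode_survival_targets(5, False, grid))  # censored at time 5
```

Under this encoding a censored patient still contributes the labels known before dropout, and a semi-supervised learner can use the remaining unlabeled targets instead of discarding the record.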
12
Dumas D, Dong Y, Grajzel K, Forthmann B, Doherty M. Understanding ideational fluency as a survival process. Br J Educ Psychol 2021; 92:e12469. [PMID: 34693984 DOI: 10.1111/bjep.12469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2021] [Revised: 09/01/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND When students generate ideas, important inter-individual variance exists both in the quantity and the quality of ideas they are able to produce (e.g., perfectionists who have few highly creative ideas or mass producers who produce a lot of uncreative ideas). In educational psychology research on creativity, the relation between the quantity and quality of ideas has not been well understood, limiting progress in this area. AIMS We conceptualized Ideational Fluency as a phenomenon that requires participants to 'survive' to produce more ideas, and where dropping out of the ideational process was analogous to 'dying'. Using this novel paradigm, we aimed to test the relations among Fluency (as a dependent variable); and creative Expertise, Originality and self-reported Personality attributes (as independent variables). SAMPLE AND METHOD Participants were drawn from three groups: those with demonstrated expertise in stage or screen acting (n = 104); undergraduates being trained in the same domain (n = 100), and adults with no acting training or experience (n = 92). Participants responded to the Alternate Uses Task; Non-parametric and semi-parametric survival models were fit to their Ideational Fluency; and average and maximum Originality scores, as well as self-reported Personality attributes, were used as covariates. RESULTS Across all participants, the Ideational Fluency survival function showed an S-shape, but the Expertise grouping interacted with that pattern. The survival rate of professional actors decreased more rapidly during the first few ideas, but after the 5th idea, professional actors displayed a clear advantage in survival rate. Participants who were less original on average but who showed a high maximum Originality, as well as those participants who reported more Assertiveness and less Industriousness, also survived further into the Ideational process. 
CONCLUSIONS Contrary to our hypothesis, professional actors' advantage in Fluency did not manifest in the survival model until after the 5th idea generated. A quantity-quality trade-off was observed with average Originality being associated with shorter survival, but that trade-off was not observed with maximum Originality, which was associated with longer survival.
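To make the "survival" framing concrete, the sketch below estimates a survival function over the number of ideas produced using the Kaplan-Meier estimator, the standard non-parametric method. That the authors used this particular estimator, and that a participant cut off by a time limit is treated as censored, are assumptions made for illustration; the data and the name `kaplan_meier` are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (t, S(t)) at each distinct event time.

    times  : idea count at which each participant left the task
    events : 1 if the participant stopped on their own ("died"),
             0 if they were cut off while still producing (censored)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        stopped = 0
        removed = 0
        # Gather everyone who left the task at this idea count.
        while i < len(data) and data[i][0] == t:
            stopped += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if stopped:
            surv *= 1 - stopped / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy data, invented for illustration.
    ideas = [3, 5, 5, 8, 10]
    stopped = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(ideas, stopped):
        print(f"S({t}) = {s:.2f}")
```

Comparing such curves between groups (for example, professional actors versus untrained adults) is what allows statements like an advantage in survival rate appearing only after the 5th idea.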
Affiliation(s)
- Denis Dumas
- Department of Research Methods and Information Science, University of Denver, Colorado, USA
- Yixiao Dong
- Department of Research Methods and Information Science, University of Denver, Colorado, USA
- Katalin Grajzel
- Department of Research Methods and Information Science, University of Denver, Colorado, USA
- Boris Forthmann
- Institute for Psychology in Education, University of Münster, Münster, Germany
|
13
|
Le NQK, Kha QH, Nguyen VH, Chen YC, Cheng SJ, Chen CY. Machine Learning-Based Radiomics Signatures for EGFR and KRAS Mutations Prediction in Non-Small-Cell Lung Cancer. Int J Mol Sci 2021; 22:ijms22179254. [PMID: 34502160 PMCID: PMC8431041 DOI: 10.3390/ijms22179254] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 07/31/2021] [Revised: 08/22/2021] [Accepted: 08/25/2021] [Indexed: 12/25/2022] Open
Abstract
Early identification of epidermal growth factor receptor (EGFR) and Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations is crucial for selecting a therapeutic strategy for patients with non-small-cell lung cancer (NSCLC). We proposed a machine learning-based model for feature selection and prediction of EGFR and KRAS mutations in patients with NSCLC by including the least number of the most semantic radiomics features. We included a cohort of 161 patients from 211 patients with NSCLC from The Cancer Imaging Archive (TCIA) and analyzed 161 low-dose computed tomography (LDCT) images for detecting EGFR and KRAS mutations. A total of 851 radiomics features, which were classified into 9 categories, were obtained through manual segmentation and radiomics feature extraction from LDCT. We evaluated our models using a validation set consisting of 18 patients derived from the same TCIA dataset. The results showed that the genetic algorithm plus XGBoost classifier exhibited the most favorable performance, with an accuracy of 0.836 and 0.86 for detecting EGFR and KRAS mutations, respectively. We demonstrated that a noninvasive machine learning-based model including the least number of the most semantic radiomics signatures could robustly predict EGFR and KRAS mutations in patients with NSCLC.
Affiliation(s)
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
- Correspondence: N.Q.K.L. and S.-J.C.; Tel.: +886-02-66382736 (ext. 1992) (N.Q.K.L.)
- Quang Hien Kha
- International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Van Hiep Nguyen
- International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Oncology Center, Bai Chay Hospital, Quang Ninh 20000, Vietnam
- Yung-Chieh Chen
- Department of Medical Imaging, Taipei Medical University Hospital, Taipei 11031, Taiwan
- Sho-Jen Cheng
- Department of Medical Imaging, Taipei Medical University Hospital, Taipei 11031, Taiwan
- Correspondence: N.Q.K.L. and S.-J.C.; Tel.: +886-02-66382736 (ext. 1992) (N.Q.K.L.)
- Cheng-Yu Chen
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Department of Medical Imaging, Taipei Medical University Hospital, Taipei 11031, Taiwan
- Department of Radiology, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
|
14
|
Pellegrini M. Accurate prediction of breast cancer survival through coherent voting networks with gene expression profiling. Sci Rep 2021; 11:14645. [PMID: 34282236 PMCID: PMC8289832 DOI: 10.1038/s41598-021-94243-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/21/2021] [Accepted: 07/07/2021] [Indexed: 02/06/2023] Open
Abstract
For a patient affected by breast cancer, after tumor removal, it is necessary to decide which adjuvant therapy can prevent tumor relapse and the formation of metastases. Predicting the outcome of adjuvant therapy tailored to the patient is hard because of the heterogeneous nature of the disease. We devised a methodology for predicting 5-year survival based on the new machine learning paradigm of coherent voting networks, with improved accuracy over state-of-the-art prediction methods. The 'coherent voting communities' metaphor provides a certificate justifying the survival prediction for an individual patient, thus facilitating its acceptability in practice, in the vein of explainable Artificial Intelligence. The method we propose is quite flexible and applicable to other types of cancer.
Affiliation(s)
- Marco Pellegrini
- Institute of Informatics and Telematics (IIT), CNR, 56124, Pisa, Italy
|
15
|
Gonçalves DM, Henriques R, Costa RS. Predicting Postoperative Complications in Cancer Patients: A Survey Bridging Classical and Machine Learning Contributions to Postsurgical Risk Analysis. Cancers (Basel) 2021; 13:cancers13133217. [PMID: 34203189 PMCID: PMC8269422 DOI: 10.3390/cancers13133217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2021] [Revised: 06/04/2021] [Accepted: 06/22/2021] [Indexed: 02/05/2023] Open
Abstract
Simple Summary: Structured survey on the predictive analysis of postoperative complications in oncology, bridging classic risk scores with machine learning advances, and further establishing principles to guide the design of cohort studies and the predictive modeling of postsurgical risks. Abstract: Postoperative complications can impose a significant burden, increasing morbidity, mortality, and the in-hospital length of stay. Today, the number of studies available on the prognostication of postsurgical complications in cancer patients is growing and has already created a considerable set of dispersed contributions. This work provides a comprehensive survey on postoperative risk analysis, integrating principles from classic risk scores and machine-learning approaches within a coherent frame. A qualitative comparison is offered, taking into consideration the available cohort data and the targeted postsurgical outcomes of morbidity (such as the occurrence, nature or severity of postsurgical complications and hospitalization needs) and mortality. This work further establishes a taxonomy to assess the adequacy of cohort studies and guide the development and assessment of new learning approaches for the study and prediction of postoperative complications.
Affiliation(s)
- Daniel M. Gonçalves
- IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
- INESC-ID, Lisboa Portugal and Instituto Superior Técnico, Universidade de Lisboa, R. Alves Redol 9, 1000-029 Lisboa, Portugal
- Rui Henriques
- INESC-ID, Lisboa Portugal and Instituto Superior Técnico, Universidade de Lisboa, R. Alves Redol 9, 1000-029 Lisboa, Portugal
- Correspondence: Tel.: +351-21-310-0300
- Rafael S. Costa
- IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
- LAQV-REQUIMTE, NOVA School of Science and Technology, Campus Caparica, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
|
16
|
Talatian Azad S, Ahmadi G, Rezaeipanah A. An intelligent ensemble classification method based on multi-layer perceptron neural network and evolutionary algorithms for breast cancer diagnosis. J EXP THEOR ARTIF IN 2021. [DOI: 10.1080/0952813x.2021.1938698] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/21/2022]
Affiliation(s)
- Gholamreza Ahmadi
- Department of Computer Engineering, Persian Gulf University, Bushehr, Iran
- Amin Rezaeipanah
- Department of Computer Engineering, Persian Gulf University, Bushehr, Iran
|
17
|
Doyle PW, Kavoussi NL. Machine learning applications to enhance patient specific care for urologic surgery. World J Urol 2021; 40:679-686. [PMID: 34047826 DOI: 10.1007/s00345-021-03738-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/01/2021] [Accepted: 05/17/2021] [Indexed: 11/24/2022] Open
Abstract
PURPOSE As computational power has improved over the past 20 years, the application of machine learning methods has become more prevalent in daily life. Additionally, there is increasing interest in the clinical application of machine learning techniques. We sought to review the current literature regarding machine learning applications for patient-specific urologic surgical care. METHODS We performed a broad search of the current literature via the PubMed-Medline and Google Scholar databases up to Dec 2020. The search terms "urologic surgery" as well as "artificial intelligence", "machine learning", "neural network", and "automation" were used. RESULTS The focus of machine learning applications for patient counseling is disease-specific. For stone disease, multiple studies focused on the prediction of stone-free rate based on preoperative clinical and imaging data. For kidney cancer, many studies focused on advanced imaging analysis to predict renal mass pathology preoperatively. Machine learning applications in prostate cancer could support treatment counseling as well as prediction of disease-specific outcomes. Furthermore, for bladder cancer, the reviewed studies focus on staging via imaging, to better counsel patients towards neoadjuvant chemotherapy. Additionally, there have been many efforts to automatically segment preoperative imaging and match it with intraoperative anatomy. CONCLUSION Machine learning techniques can be implemented to assist patient-centered surgical care and increase patient engagement within their decision-making processes. As data sets improve and expand, especially with the transition to large-scale EHR usage, these tools will improve in efficacy and be utilized more frequently.
Affiliation(s)
- Patrick W Doyle
- Department of Urology, Vanderbilt University Medical Center, 3823 The Vanderbilt Clinic, Nashville, Tennessee, 37232, USA
- Nicholas L Kavoussi
- Department of Urology, Vanderbilt University Medical Center, 3823 The Vanderbilt Clinic, Nashville, Tennessee, 37232, USA
|
18
|
Tan X, Yu Y, Duan K, Zhang J, Sun P, Sun H. Current Advances and Limitations of Deep Learning in Anticancer Drug Sensitivity Prediction. Curr Top Med Chem 2021; 20:1858-1867. [PMID: 32648840 DOI: 10.2174/1568026620666200710101307] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/11/2020] [Revised: 04/02/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023]
Abstract
Anticancer drug screening can accelerate drug discovery to save the lives of cancer patients, but cancer heterogeneity makes this screening challenging. The prediction of anticancer drug sensitivity is useful for anticancer drug development and the identification of biomarkers of drug sensitivity. Deep learning, as a branch of machine learning, is an important aspect of in silico research. Its outstanding computational performance means that it has been used for many biomedical purposes, such as medical image interpretation, biological sequence analysis, and drug discovery. Several studies have predicted anticancer drug sensitivity based on deep learning algorithms. The field of deep learning has made progress regarding model performance and multi-omics data integration. However, deep learning is limited by the number of studies performed and data sources available, so it is not perfect as a pre-clinical approach for use in the anticancer drug screening process. Improving the performance of deep learning models is a pressing issue for researchers. In this review, we introduce the research of anticancer drug sensitivity prediction and the use of deep learning in this research area. To provide a reference for future research, we also review some common data sources and machine learning methods. Lastly, we discuss the advantages and disadvantages of deep learning, as well as the limitations and future perspectives regarding this approach.
Affiliation(s)
- Xian Tan
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yang Yu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Kaiwen Duan
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jingbo Zhang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Pingping Sun
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Hui Sun
- College of Humanities and Sciences of Northeast Normal University, Changchun 130117, China
|
19
|
Banegas-Luna AJ, Peña-García J, Iftene A, Guadagni F, Ferroni P, Scarpato N, Zanzotto FM, Bueno-Crespo A, Pérez-Sánchez H. Towards the Interpretability of Machine Learning Predictions for Medical Applications Targeting Personalised Therapies: A Cancer Case Survey. Int J Mol Sci 2021; 22:4394. [PMID: 33922356 PMCID: PMC8122817 DOI: 10.3390/ijms22094394] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/18/2022] Open
Abstract
Artificial Intelligence is providing astonishing results, with medicine being one of its favourite playgrounds. Machine Learning and, in particular, Deep Neural Networks are behind this revolution. Among the most challenging targets of interest in medicine are cancer diagnosis and therapies but, to start this revolution, software tools need to be adapted to cover the new requirements. In this sense, learning tools are becoming a commodity but, to be able to assist doctors on a daily basis, it is essential to fully understand how models can be interpreted. In this survey, we analyse current machine learning models and other in-silico tools as applied to medicine, specifically to cancer research, and we discuss their interpretability, performance and the input data they are fed with. Artificial neural networks (ANN), logistic regression (LR) and support vector machines (SVM) have been observed to be the preferred models. In addition, convolutional neural networks (CNNs), supported by the rapid development of graphics processing units (GPUs) and high-performance computing (HPC) infrastructures, are gaining importance when image processing is feasible. However, the interpretability of machine learning predictions, so that doctors can understand them, trust them and gain useful insights for clinical practice, is still rarely considered. This needs to improve in order to enhance doctors' predictive capacity and achieve individualised therapies in the near future.
Affiliation(s)
- Antonio Jesús Banegas-Luna
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Jorge Peña-García
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Adrian Iftene
- Faculty of Computer Science, Universitatea Alexandru Ioan Cuza (UAIC), 700505 Jashi, Romania
- Fiorella Guadagni
- Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Patrizia Ferroni
- Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Noemi Scarpato
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Fabio Massimo Zanzotto
- Dipartimento di Ingegneria dell’Impresa “Mario Lucertini”, University of Rome Tor Vergata, 00133 Rome, Italy
- Andrés Bueno-Crespo
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Horacio Pérez-Sánchez
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
|
20
|
Kim H, Lee SJ, Park SJ, Choi IY, Hong SH. Machine Learning Approach to Predict the Probability of Recurrence of Renal Cell Carcinoma After Surgery: Prediction Model Development Study. JMIR Med Inform 2021; 9:e25635. [PMID: 33646127 PMCID: PMC7961397 DOI: 10.2196/25635] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/11/2020] [Revised: 01/23/2021] [Accepted: 01/29/2021] [Indexed: 12/15/2022] Open
Abstract
Background: Renal cell carcinoma (RCC) has a high recurrence rate of 20% to 30% after nephrectomy for clinically localized disease, and more than 40% of patients eventually die of the disease, making regular monitoring and constant management of utmost importance. Objective: The objective of this study was to develop an algorithm that predicts the probability of recurrence of RCC within 5 and 10 years of surgery. Methods: Data from 6849 Korean patients with RCC were collected from eight tertiary care hospitals listed in the KOrean Renal Cell Carcinoma (KORCC) web-based database. To predict RCC recurrence, analytical data from 2814 patients were extracted from the database. Eight machine learning algorithms were used to predict the probability of RCC recurrence, and the results were compared. Results: Within 5 years of surgery, the highest area under the receiver operating characteristic curve (AUROC) was obtained from the naïve Bayes (NB) model, with a value of 0.836. Within 10 years of surgery, the highest AUROC was obtained from the NB model, with a value of 0.784. Conclusions: An algorithm was developed that predicts the probability of RCC recurrence within 5 and 10 years using the KORCC database, a large-scale RCC cohort in Korea. It is expected that the developed algorithm will help clinicians manage prognosis and establish customized treatment strategies for patients with RCC after surgery.
Affiliation(s)
- HyungMin Kim
- Department of Medical Informatics, College of Medicine, The Catholic University, Seoul, Republic of Korea
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University, Seoul, Republic of Korea
- Sun Jung Lee
- Department of Medical Informatics, College of Medicine, The Catholic University, Seoul, Republic of Korea
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University, Seoul, Republic of Korea
- So Jin Park
- Department of Medical Informatics, College of Medicine, The Catholic University, Seoul, Republic of Korea
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University, Seoul, Republic of Korea
- In Young Choi
- Department of Medical Informatics, College of Medicine, The Catholic University, Seoul, Republic of Korea
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University, Seoul, Republic of Korea
- Sung-Hoo Hong
- Department of Urology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University, Seoul, Republic of Korea
|
21
|
Chen JB, Yang HS, Moi SH, Chuang LY, Yang CH. Identification of mortality-risk-related missense variant for renal clear cell carcinoma using deep learning. Ther Adv Chronic Dis 2021; 12:2040622321992624. [PMID: 33643601 PMCID: PMC7890720 DOI: 10.1177/2040622321992624] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/20/2020] [Accepted: 01/13/2021] [Indexed: 11/24/2022] Open
Abstract
Introduction: Kidney renal clear cell carcinoma (KIRCC) is a highly heterogeneous and lethal cancer that can arise in patients with renal disease. DeepSurv combines a deep feed-forward neural network with a Cox proportional hazards function and could provide optimized survival results compared with conventional survival analysis. Methods: This study used an improved DeepSurv algorithm to identify the candidate genes to be targeted for treatment on the basis of the overall mortality status of KIRCC subjects. All the somatic mutation missense variants of KIRCC subjects were abstracted from the TCGA-KIRC database. Results: The improved DeepSurv model (95.1%) achieved greater balanced accuracy than the DeepSurv model (75%), and identified 610 high-risk variants associated with overall mortality. The results of gene differential expression analysis also indicated nine KIRCC mortality-risk-related pathways, namely the tRNA charging pathway, the D-myo-inositol-5-phosphate metabolism pathway, the DNA double-strand break repair by nonhomologous end-joining pathway, the superpathway of inositol phosphate compounds, the 3-phosphoinositide degradation pathway, the production of nitric oxide and reactive oxygen species in macrophages pathway, the synaptic long-term depression pathway, the sperm motility pathway, and the role of JAK2 in hormone-like cytokine signaling pathway. The biological findings in this study indicate the KIRCC mortality-risk-related pathways were more likely to be associated with cancer cell growth, cancer cell differentiation, and immune response inhibition. Conclusion: The results proved that the improved DeepSurv model effectively classified mortality-related high-risk variants and identified the candidate genes. In the context of KIRCC overall mortality, the proposed model effectively recognized mortality-related high-risk variants for KIRCC.
Affiliation(s)
- Jin-Bor Chen
- Division of Nephrology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung
- Huai-Shuo Yang
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung
- Sin-Hua Moi
- Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung
- Li-Yeh Chuang
- Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, 415 Jiangong Road, San-Min District, Kaohsiung, 82444
|
22
|
Sargos P, Leduc N, Giraud N, Gandaglia G, Roumiguié M, Ploussard G, Rozet F, Soulié M, Mathieu R, Artus PM, Niazi T, Vinh-Hung V, Beauval JB. Deep Neural Networks Outperform the CAPRA Score in Predicting Biochemical Recurrence After Prostatectomy. Front Oncol 2021; 10:607923. [PMID: 33643910 PMCID: PMC7906005 DOI: 10.3389/fonc.2020.607923] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/18/2020] [Accepted: 12/14/2020] [Indexed: 01/16/2023] Open
Abstract
Background: Predictive models for biochemical recurrence (BCR) are gaining attention in prostate cancer (PCa). BCR occurs in approximately 20–40% of patients five years after radical prostatectomy (RP), and the ability to predict BCR may help clinicians to make better treatment decisions. We aim to investigate the accuracy of the CAPRA score compared to other models in predicting the 3-year BCR of PCa patients. Material and Methods: A total of 5043 men who underwent RP were analyzed retrospectively. The CAPRA score, Cox regression analysis, logistic regression, K-nearest neighbor (KNN), random forest (RF) and a densely connected feed-forward neural network (DNN) classifier were compared in terms of 3-year BCR predictive value. The area under the receiver operating characteristic curve (AUC) was the main measure used to assess the performance of the predictive models. Pre-operative data such as PSA level, Gleason grade, and T stage were included in the multivariate analysis. To measure potential improvements in model performance due to additional data, each model was trained once more with an additional set of post-operative surgical data from definitive pathology. Results: Using the CAPRA score variables, the DNN predictive model showed the highest AUC value of 0.7, compared with 0.63, 0.63, 0.55, 0.64, and 0.64 for the CAPRA score, logistic regression, KNN, RF, and Cox regression, respectively. After the post-operative variables were added to the model, the AUC values for KNN, RF, Cox regression, and DNN improved to 0.77, 0.74, 0.75, and 0.84, respectively. Conclusions: Our results showed that the DNN has the potential to predict the 3-year BCR and outperformed the CAPRA score and other predictive models.
Affiliation(s)
- Paul Sargos
- Department of Radiation Oncology, Institut Bergonié, Bordeaux, France
- Nicolas Leduc
- Department of Radiation Oncology, Institut Bergonié, Bordeaux, France
- Nicolas Giraud
- Division of Radiation Oncology, Department of Oncology, McGill University, Montreal, QC, Canada
- Giorgio Gandaglia
- Division of Oncology, Unit of Urology, Urological Research Institute, IRCCS Ospedale San Raffaele, Milan, Italy
- Francois Rozet
- Department of Urology, Institut Mutualiste Montsouris, Paris, France
- Michel Soulié
- Department of Urology, CHU de Toulouse, Toulouse, France
- Tamim Niazi
- Division of Radiation Oncology, Department of Oncology, McGill University, Montreal, QC, Canada
- Vincent Vinh-Hung
- Department of Radiation Oncology, Hôpital Clarac, CHU de la Martinique, Fort-de-France, France
|
23
|
Wang J, Chen N, Guo J, Xu X, Liu L, Yi Z. SurvNet: A Novel Deep Neural Network for Lung Cancer Survival Analysis With Missing Values. Front Oncol 2021; 10:588990. [PMID: 33552965 PMCID: PMC7855857 DOI: 10.3389/fonc.2020.588990] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/30/2020] [Accepted: 12/04/2020] [Indexed: 02/05/2023] Open
Abstract
Survival analysis is important for guiding further treatment and improving lung cancer prognosis. It is a challenging task because of the poor distinguishability of features and the missing values in practice. A novel multi-task based neural network, SurvNet, is proposed in this paper. The proposed SurvNet model is trained in a multi-task learning framework to jointly learn across three related tasks: input reconstruction, survival classification, and Cox regression. It uses an input reconstruction mechanism cooperating with incomplete-aware reconstruction loss for latent feature learning of incomplete data with missing values. Besides, the SurvNet model introduces a context gating mechanism to bridge the gap between survival classification and Cox regression. A new real-world dataset of 1,137 patients with IB-IIA stage non-small cell lung cancer is collected to evaluate the performance of the SurvNet model. The proposed SurvNet achieves a higher concordance index than the traditional Cox model and Cox-Net. The difference between high-risk and low-risk groups obtained by SurvNet is more significant than that of high-risk and low-risk groups obtained by the other models. Moreover, the SurvNet outperforms the other models even though the input data is randomly cropped and it achieves better generalization performance on the Surveillance, Epidemiology, and End Results Program (SEER) dataset.
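The concordance index used for comparison in this abstract is not defined there. As a rough illustration only, the sketch below follows the commonly used Harrell formulation, in which a pair of patients is comparable when the one with the shorter time had an observed event. The function name `concordance_index` and the toy values are hypothetical and are not drawn from the SurvNet study.

```python
def concordance_index(times, events, risk):
    """Fraction of comparable pairs whose predicted risks are correctly ordered.

    times:  follow-up time for each patient
    events: True if the event (e.g. death) was observed, False if censored
    risk:   model output, where a higher value means shorter expected survival
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable if i had an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data for four hypothetical patients.
times = [5, 10, 12, 20]
events = [True, False, True, True]
risk = [0.9, 0.4, 0.1, 0.6]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the sense in which one model's concordance index can be called higher than another's.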
Affiliation(s)
- Jianyong Wang
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
- Nan Chen
- Department of Thoracic Surgery, West China Hospital and West China School of Medicine, Sichuan University, Chengdu, China
- Jixiang Guo
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
- Xiuyuan Xu
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
- Lunxu Liu
- Department of Thoracic Surgery, West China Hospital and West China School of Medicine, Sichuan University, Chengdu, China
- Zhang Yi
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
|
24
|
Momenzadeh N, Hafezalseheh H, Nayebpour M, Fathian M, Noorossana R. A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100763] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022] Open
|
25
|
Vittrant B, Leclercq M, Martin-Magniette ML, Collins C, Bergeron A, Fradet Y, Droit A. Identification of a Transcriptomic Prognostic Signature by Machine Learning Using a Combination of Small Cohorts of Prostate Cancer. Front Genet 2020; 11:550894. [PMID: 33324443 PMCID: PMC7723980 DOI: 10.3389/fgene.2020.550894] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/05/2020] [Accepted: 10/29/2020] [Indexed: 01/31/2023] Open
Abstract
Determining which treatment to provide to men with prostate cancer (PCa) is a major challenge for clinicians. Currently, clinical risk stratification for PCa is based on clinico-pathological variables such as Gleason grade, stage and prostate specific antigen (PSA) levels. Transcriptomic data have the potential to enable more precise approaches to predicting the evolution of the disease. However, high quality RNA sequencing (RNA-seq) datasets with clinical data and follow-up long enough to allow the discovery of biochemical recurrence (BCR) biomarkers are small and rare. In this study, we propose a machine learning approach that is robust to batch effect and enables the discovery of highly predictive signatures despite using small datasets. Gene expression data were extracted from three RNA-Seq datasets totaling 171 PCa patients. Data were re-analyzed using a single pipeline to ensure uniformity. A total of 14 classifiers were tested with various parameters to identify the best model and gene signature to predict BCR. Using a random forest model, we identified a signature composed of only three genes (JUN, HES4, PPDPF) predicting BCR with better accuracy [74.2%, balanced error rate (BER) = 27%] than the clinico-pathological variables (69.2%, BER = 32%) currently in use to predict PCa evolution. This score is in the range of studies that predicted BCR in a single cohort with a higher number of patients. We showed that it is possible to merge and analyze different small and heterogeneous datasets together to obtain a better signature than if they were analyzed individually, thus reducing the need for very large cohorts. This study demonstrates the feasibility of combining different small datasets into one larger dataset to identify a predictive genomic signature that would benefit PCa patients.
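The abstract reports a balanced error rate (BER) alongside accuracy without defining it. The sketch below assumes the usual definition, the mean of the per-class error rates, which equals one minus balanced accuracy. The function name `balanced_error_rate` and the labels are made up for illustration and are not the study's data.

```python
def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates (assumed definition of BER)."""
    per_class_error = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        wrong = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        per_class_error.append(wrong / total)
    return sum(per_class_error) / len(per_class_error)


# Toy labels: 1 = recurrence, 0 = no recurrence (hypothetical, imbalanced).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
ber = balanced_error_rate(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}, BER = {ber:.4f}")
```

In this toy case the plain error rate (0.25) is lower than the BER because the minority class is misclassified more often, which is why the two figures are usually reported together on imbalanced cohorts.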
Affiliation(s)
- Benjamin Vittrant
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
- Mickael Leclercq
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
- Marie-Laure Martin-Magniette
  - Universities of Paris Saclay, Paris, Evry, CNRS, INRAE, Institute of Plant Sciences Paris Saclay (IPS2), 91192, Gif sur Yvette, France; UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
- Colin Collins
  - Vancouver Prostate Cancer Centre, Vancouver, BC, Canada; Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- Alain Bergeron
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Chirurgie, Oncology Axis, Université Laval, Québec, QC, Canada
- Yves Fradet
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Chirurgie, Oncology Axis, Université Laval, Québec, QC, Canada
- Arnaud Droit
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
|
26
|
Abstract
The re-kindled fascination in machine learning (ML), observed over the last few decades, has also percolated into natural sciences and engineering. ML algorithms are now used in scientific computing, as well as in data-mining and processing. In this paper, we provide a review of the state-of-the-art in ML for computational science and engineering. We discuss ways of using ML to speed up or improve the quality of simulation techniques such as computational fluid dynamics, molecular dynamics, and structural analysis. We explore the ability of ML to produce computationally efficient surrogate models of physical applications that circumvent the need for the more expensive simulation techniques entirely. We also discuss how ML can be used to process large amounts of data, using as examples many different scientific fields, such as engineering, medicine, astronomy and computing. Finally, we review how ML has been used to create more realistic and responsive virtual reality applications.
|
27
|
Brunese L, Mercaldo F, Reginelli A, Santone A. An ensemble learning approach for brain cancer detection exploiting radiomic features. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020; 185:105134. [PMID: 31675644 DOI: 10.1016/j.cmpb.2019.105134] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 04/09/2019] [Revised: 09/27/2019] [Accepted: 10/15/2019] [Indexed: 05/03/2023]
Abstract
BACKGROUND AND OBJECTIVE: Brain cancer is one of the most aggressive tumours: 70% of the patients diagnosed with this malignant cancer will not survive. Early detection of brain tumours can be fundamental to increasing survival rates. Brain cancers are classified into four grades (I, II, III and IV) according to how normal or abnormal the brain cells look. This work aims to recognize the different brain cancer grades by analysing brain magnetic resonance images. METHODS: A method to identify the components of an ensemble learner is proposed. The ensemble learner focuses on discriminating between brain cancer grades using non-invasive radiomic features. The radiomic features considered belong to five groups: First Order, Shape, Gray Level Co-occurrence Matrix, Gray Level Run Length Matrix and Gray Level Size Zone Matrix. We evaluate the effectiveness of the features through hypothesis testing, decision boundaries, performance analysis and calibration plots, and thus select the best candidate classifiers for the ensemble learner. RESULTS: We evaluate the proposed method with 111,205 brain magnetic resonances belonging to two data-sets freely available for research purposes. The results are encouraging: we obtain an accuracy of 99% in detecting benign grade I and malignant grade II, III and IV brain cancer. CONCLUSION: The experimental results confirm that the ensemble learner designed with the proposed method outperforms the current state-of-the-art approaches in brain cancer grade detection from magnetic resonance images.
Affiliation(s)
- Luca Brunese
  - Department of Medicine and Health Sciences "Vincenzo Tiberio", University of Molise, Campobasso, Italy
- Francesco Mercaldo
  - Institute for Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy; Department of Biosciences and Territory, University of Molise, Pesche (IS), Italy
- Alfonso Reginelli
  - Department of Precision Medicine, University of Campania "Luigi Vanvitelli", Napoli, Italy
- Antonella Santone
  - Department of Biosciences and Territory, University of Molise, Pesche (IS), Italy
|
28
|
Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak 2019; 19:281. [PMID: 31864346 PMCID: PMC6925840 DOI: 10.1186/s12911-019-1004-8] [Citation(s) in RCA: 363] [Impact Index Per Article: 72.6] [Received: 01/28/2019] [Accepted: 12/11/2019] [Indexed: 12/17/2022]
Abstract
Background: Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently emerged as a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. Methods: In this study, extensive research efforts were made to identify studies that applied more than one supervised machine learning algorithm to the prediction of a single disease. Two databases (Scopus and PubMed) were searched for different types of search items. We thus selected 48 articles in total for the comparison among variant supervised machine learning algorithms for disease prediction. Results: We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which topped 41% of the studies in which it was considered. Conclusion: This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This information on relative performance can be used to aid researchers in selecting an appropriate supervised machine learning algorithm for their studies.
Affiliation(s)
- Shahadat Uddin
  - Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Room 524, SIT Building (J12), Darlington, NSW, 2008, Australia
- Arif Khan
  - Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Room 524, SIT Building (J12), Darlington, NSW, 2008, Australia; Health Market Quality Research Stream, Capital Markets CRC, Level 3, 55 Harrington Street, Sydney, NSW, Australia
- Md Ekramul Hossain
  - Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Room 524, SIT Building (J12), Darlington, NSW, 2008, Australia
- Mohammad Ali Moni
  - Faculty of Medicine and Health, School of Medical Sciences, The University of Sydney, Camperdown, NSW, 2006, Australia
|
29
|
Sohail A. INFERENCE OF BIOMEDICAL DATA SETS USING BAYESIAN MACHINE LEARNING. BIOMEDICAL ENGINEERING: APPLICATIONS, BASIS AND COMMUNICATIONS 2019. [DOI: 10.4015/s1016237219500303] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]
Abstract
Owing to advances in data collection and maintenance strategies, current clinical databases around the globe are rich in the sense that they contain detailed information not only about an individual's medical conditions but also about the environmental features associated with that individual. Classification within these data could provide new medical insights. Data mining technology has attracted researchers because of its effectiveness and efficacy in biomedical research. Because of the diverse structure of such data sets, only a few successful techniques and easy-to-use software packages are available in the literature. A Bayesian analysis provides a more intuitive statement of the probability that a hypothesis is true. The Bayesian approach uses all available information and can answer complex questions more accurately. This means that Bayesian methods include prior information. In Bayesian analysis, no relevant information is excluded, as the prior represents all the available information apart from the data itself. Bayesian techniques are specifically used for decision making. Uncertainty is the main hurdle in making decisions: when information about relevant parameters is lacking, there is uncertainty about a given decision. Bayesian methods measure these uncertainties using probability. In this study, selected techniques of biostatistical Bayesian inference (the probability-based inference approach for identifying uncertainty in databases) are discussed. To show the efficiency of a hybrid technique, its application to two distinct data sets is presented in a novel way.
Affiliation(s)
- Ayesha Sohail
  - Department of Mathematics, Comsats Institute of Information Technology, Lahore 54000, Pakistan
  - Department of Mathematics, The University of Sheffield, Hounsfield Road, S3 7RH UK
|
30
|
Li X, Duan F, Bennett I, Mba D. Canonical variate analysis, probability approach and support vector regression for fault identification and failure time prediction. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2018. [DOI: 10.3233/jifs-169550] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Affiliation(s)
- Xiaochuan Li
  - Faculty of Technology, De Montfort University, Leicester, UK
- Fang Duan
  - School of Engineering, London South Bank University, London, UK
- Ian Bennett
  - Department of Rotating Equipment, Royal Dutch Shell, Hague, AN, The Netherlands
- David Mba
  - Faculty of Technology, De Montfort University, Leicester, UK
|
31
|
Sikora M, Wróbel Ł. Censoring Weighted Separate-and-Conquer Rule Induction from Survival Data. Methods Inf Med 2018; 53:137-48. [DOI: 10.3414/me13-01-0046] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/24/2013] [Accepted: 12/20/2013] [Indexed: 11/09/2022]
Abstract
Summary. Objectives: Rule induction is one of the major methods of machine learning. Rule-based models can be easily read and interpreted by humans, which makes them particularly useful in survival studies, as they can help clinicians better understand the analysed data and make informed decisions about patient treatment. Despite this usefulness, there is still little research on rule learning in survival analysis. In this paper we take a step towards rule-based analysis of survival data. Methods: We investigate the so-called covering or separate-and-conquer method of rule induction in combination with a weighting scheme for handling censored observations. We also focus on rule quality measures, which are one of the key elements differentiating particular implementations of separate-and-conquer rule induction algorithms. We examine 15 rule quality measures that guide the rule induction process and reflect a wide range of rule learning heuristics. Results: The algorithm is extensively tested on a collection of 20 real survival datasets and compared with state-of-the-art survival tree and random survival forest algorithms. Most of the rule quality measures outperform the Kaplan-Meier estimate and perform at least as well as tree-based algorithms. Conclusions: Separate-and-conquer rule induction in combination with a weighting scheme is an effective technique for building rule-based models of survival data which, in terms of predictive accuracy, are competitive with tree-based representations.
|
32
|
Complete hazard ranking to analyze right-censored data: An ALS survival study. PLoS Comput Biol 2017; 13:e1005887. [PMID: 29253881 PMCID: PMC5749893 DOI: 10.1371/journal.pcbi.1005887] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 06/02/2017] [Revised: 01/02/2018] [Accepted: 11/21/2017] [Indexed: 12/11/2022]
Abstract
Survival analysis represents an important outcome measure in clinical research and clinical trials; further, survival ranking may offer additional advantages in clinical trials. In this study, we developed GuanRank, a non-parametric ranking-based technique to transform patients' survival data into a linear space of hazard ranks. The transformation enables the use of machine learning base-learners, including Gaussian process regression, Lasso, and random forest, on survival data. The method was submitted to the DREAM Amyotrophic Lateral Sclerosis (ALS) Stratification Challenge. Ranked first, the model gave more accurate ranking predictions on the PRO-ACT ALS dataset than the Cox proportional hazards model. By utilizing right-censored data in its training process, the method demonstrated state-of-the-art predictive power in ALS survival ranking. Its feature selection identified multiple important factors, some of which conflict with previous studies.
|
33
|
Abstract
Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15–25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.
Affiliation(s)
- Joseph A. Cruz
  - Departments of Biological Science and Computing Science, University of Alberta Edmonton, AB, Canada T6G 2E8
- David S. Wishart
  - Departments of Biological Science and Computing Science, University of Alberta Edmonton, AB, Canada T6G 2E8
|
34
|
Attallah O, Karthikesalingam A, Holt PJ, Thompson MM, Sayers R, Bown MJ, Choke EC, Ma X. Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection. Proc Inst Mech Eng H 2017; 231:1048-1063. [PMID: 28925817 DOI: 10.1177/0954411917731592] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/15/2022]
Abstract
Feature selection is essential in the medical field; however, the process becomes complicated in the presence of censoring, which is the distinctive characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, even though machine learning classifiers are preferred. Such classifiers are less often employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that have employed machine learning classifiers, the partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased prediction results, especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed that presents a solution to high-level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The new hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed that the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
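The abstract above combines several classifiers through simple and weighted majority voting. A minimal sketch of the two voting rules follows, assuming each classifier emits one label per patient. The labels and weights are hypothetical; the abstract says the paper's weights come from a survival metric but does not give them, so the values below are placeholders only.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# One patient, three classifiers (e.g. SVM, neural network, k-NN), in that order.
votes = ["high", "low", "low"]

# Hypothetical per-classifier weights; placeholders, not values from the paper.
weights = [0.9, 0.3, 0.3]

simple = majority_vote(votes)                       # two of three say "low"
weighted = weighted_majority_vote(votes, weights)   # 0.9 for "high" vs 0.6 for "low"
print(simple, weighted)
```

The example is chosen so that the two rules disagree: a single heavily weighted classifier can overrule two lightly weighted ones, which is the practical difference between the schemes.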
Affiliation(s)
- Omneya Attallah
  - Department of Electronics and Communications, College of Engineering and Technology, Arab Academy for Science and Technology, Alexandria, Egypt; School of Engineering and Applied Science, Aston University, Birmingham, UK
- Alan Karthikesalingam
  - St George's Vascular Institute, St George's University Hospitals NHS Foundation Trust, London, UK
- Peter Je Holt
  - St George's Vascular Institute, St George's University Hospitals NHS Foundation Trust, London, UK
- Matthew M Thompson
  - St George's Vascular Institute, St George's University Hospitals NHS Foundation Trust, London, UK
- Rob Sayers
  - NIHR Leicester Cardiovascular Biomedical Research Unit and Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Matthew J Bown
  - NIHR Leicester Cardiovascular Biomedical Research Unit and Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Eddie C Choke
  - NIHR Leicester Cardiovascular Biomedical Research Unit and Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Xianghong Ma
  - School of Engineering and Applied Science, Aston University, Birmingham, UK
|
35
|
Gómez I, Ribelles N, Franco L, Alba E, Jerez JM. Supervised discretization can discover risk groups in cancer survival analysis. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 136:11-19. [PMID: 27686699 DOI: 10.1016/j.cmpb.2016.08.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/25/2015] [Revised: 07/07/2016] [Accepted: 08/12/2016] [Indexed: 06/06/2023]
Abstract
Discretization of continuous variables is a common practice in medical research to identify patient risk groups. This work compares the performance of gold-standard categorization procedures (TNM+A protocol) with that of three supervised discretization methods from machine learning (CAIM, ChiM and DTree) in the stratification of patients with breast cancer. The performance of the discretization algorithms was evaluated based on the results obtained after applying standard survival analysis procedures such as Kaplan-Meier curves, Cox regression and predictive modelling. The results show that applying alternative discretization algorithms could give clinicians valuable information for the diagnosis and outcome of the disease. Patient data were collected from the Medical Oncology Service of the Hospital Clínico Universitario (Málaga, Spain), considering a follow-up period from 1982 to 2008.
Affiliation(s)
- Iván Gómez
  - Computer Science Department, University of Málaga, Campus de Teatinos S/N, 29071 Málaga, Spain; Málaga Biomedical Research Institute (IBIMA), Málaga, Spain
- Nuria Ribelles
  - Málaga Biomedical Research Institute (IBIMA), Málaga, Spain; Virgen de la Victoria Oncology Service, Málaga, Campus de Teatinos S/N, 29071 Málaga, Spain
- Leonardo Franco
  - Computer Science Department, University of Málaga, Campus de Teatinos S/N, 29071 Málaga, Spain; Málaga Biomedical Research Institute (IBIMA), Málaga, Spain
- Emilio Alba
  - Málaga Biomedical Research Institute (IBIMA), Málaga, Spain; Virgen de la Victoria Oncology Service, Málaga, Campus de Teatinos S/N, 29071 Málaga, Spain
- José M Jerez
  - Computer Science Department, University of Málaga, Campus de Teatinos S/N, 29071 Málaga, Spain; Málaga Biomedical Research Institute (IBIMA), Málaga, Spain
|
36
|
Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards. Crit Care Med 2016; 44:368-74. [PMID: 26771782 DOI: 10.1097/ccm.0000000000001571] [Citation(s) in RCA: 339] [Impact Index Per Article: 42.4] [Indexed: 12/20/2022]
Abstract
OBJECTIVE: Machine learning methods are flexible prediction algorithms that may be more accurate than conventional regression. We compared the accuracy of different techniques for detecting clinical deterioration on the wards in a large, multicenter database. DESIGN: Observational cohort study. SETTING: Five hospitals, from November 2008 until January 2013. PATIENTS: Hospitalized ward patients. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Demographic variables, laboratory values, and vital signs were utilized in a discrete-time survival analysis framework to predict the combined outcome of cardiac arrest, intensive care unit transfer, or death. Two logistic regression models (one using linear predictor terms and a second utilizing restricted cubic splines) were compared to several different machine learning methods. The models were derived in the first 60% of the data by date and then validated in the next 40%. For model derivation, each event time window was matched to a non-event window. All models were compared to each other and to the Modified Early Warning score, a commonly cited early warning score, using the area under the receiver operating characteristic curve (AUC). A total of 269,999 patients were admitted, and 424 cardiac arrests, 13,188 intensive care unit transfers, and 2,840 deaths occurred in the study. In the validation dataset, the random forest model was the most accurate model (AUC, 0.80 [95% CI, 0.80-0.80]). The logistic regression model with spline predictors was more accurate than the model utilizing linear predictors (AUC, 0.77 vs 0.74; p < 0.01), and all models were more accurate than the MEWS (AUC, 0.70 [95% CI, 0.70-0.70]). CONCLUSIONS: In this multicenter study, we found that several machine learning methods more accurately predicted clinical deterioration than logistic regression. Use of detection algorithms derived from these techniques may result in improved identification of critically ill patients on the wards.
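The abstract above derives models on the first 60% of the data by date and validates them on the remaining 40%. A minimal sketch of such a chronological split is given below. The function name, the record layout and the `admitted` field are assumptions for illustration; the abstract does not describe the study's actual data format.

```python
def chronological_split(records, date_key, derivation_fraction=0.6):
    """Sort records by date and split them into earlier and later portions."""
    ordered = sorted(records, key=lambda record: record[date_key])
    cut = round(len(ordered) * derivation_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical admissions with ISO-formatted dates (these sort correctly as strings).
admissions = [
    {"id": i, "admitted": f"2012-{month:02d}-01"}
    for i, month in enumerate([5, 1, 9, 3, 12, 7, 2, 10, 4, 6])
]

derivation, validation = chronological_split(admissions, "admitted")
print(len(derivation), len(validation))
print(derivation[-1]["admitted"], validation[0]["admitted"])
```

Splitting by date rather than at random means every validation record is later than every derivation record, so the validation set mimics applying a model to future patients.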
|
37
|
Taslimitehrani V, Dong G, Pereira NL, Panahiazar M, Pathak J. Developing EHR-driven heart failure risk prediction models using CPXR(Log) with the probabilistic loss function. J Biomed Inform 2016; 60:260-9. [PMID: 26844760 PMCID: PMC4886658 DOI: 10.1016/j.jbi.2016.01.009] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 05/04/2015] [Revised: 01/12/2016] [Accepted: 01/20/2016] [Indexed: 11/30/2022]
Abstract
Computerized survival prediction in healthcare, which identifies the risk of disease mortality, helps healthcare providers manage their patients effectively by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) with the probabilistic loss function, to develop and validate prognostic risk models to predict 1-, 2-, and 5-year survival in heart failure (HF) using data from electronic health records (EHRs) at Mayo Clinic. CPXR(Log) constructs a pattern-aided logistic regression model defined by several patterns and corresponding local logistic regression models. One of the models generated by CPXR(Log) achieved an AUC and accuracy of 0.94 and 0.91, respectively, and significantly outperformed prognostic models reported in prior studies. Data extracted from EHRs allowed incorporation of patient co-morbidities into our models, which helped improve the performance of the CPXR(Log) models (15.9% AUC improvement), although it did not improve the accuracy of the models built by other classifiers. We also propose a probabilistic loss function to determine the large-error and small-error instances. The new loss function used in the algorithm outperforms other functions used in previous studies, with a 1% improvement in the AUC. This study revealed that using EHR data to build prediction models can be very challenging with existing classification methods because of the high dimensionality and complexity of EHR data. The risk models developed by CPXR(Log) also reveal that HF is a highly heterogeneous disease, i.e., different subgroups of HF patients require different types of considerations in their diagnosis and treatment. Our risk models provided two valuable insights for the application of predictive modeling techniques in biomedicine: logistic risk models often make systematic prediction errors, and it is prudent to use subgroup-based prediction models such as those given by CPXR(Log) when investigating heterogeneous diseases.
Affiliation(s)
- Vahid Taslimitehrani
  - Department of Computer Science and Engineering, Kno.e.sis Center, Wright State University, Dayton, OH, USA; Division of Health Informatics, Weill Cornell Medical College, New York, NY, USA
- Guozhu Dong
  - Department of Computer Science and Engineering, Kno.e.sis Center, Wright State University, Dayton, OH, USA
- Naveen L Pereira
  - Division of Cardiovascular Diseases and Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- Maryam Panahiazar
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, CA, USA
- Jyotishman Pathak
  - Division of Health Informatics, Weill Cornell Medical College, New York, NY, USA
|
38
|
Early-Stage Event Prediction for Longitudinal Data. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING 2016. [DOI: 10.1007/978-3-319-31753-3_12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]
|
39
|
Wolfson J, Bandyopadhyay S, Elidrisi M, Vazquez-Benitez G, Vock DM, Musgrove D, Adomavicius G, Johnson PE, O'Connor PJ. A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data. Stat Med 2015; 34:2941-57. [PMID: 25980520 PMCID: PMC4523419 DOI: 10.1002/sim.6526] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 01/31/2013] [Revised: 03/24/2015] [Accepted: 04/19/2015] [Indexed: 01/08/2023]
Abstract
Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system.
Affiliation(s)
- Julian Wolfson
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, U.S.A
- Sunayan Bandyopadhyay
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, U.S.A
- Mohamed Elidrisi
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, U.S.A
- David M Vock
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, U.S.A
- Donald Musgrove
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, U.S.A
- Gediminas Adomavicius
  - Department of Information and Decision Sciences, Carlson School of Management, University of Minnesota, Minneapolis, MN, U.S.A
- Paul E Johnson
  - Department of Information and Decision Sciences, Carlson School of Management, University of Minnesota, Minneapolis, MN, U.S.A
- Patrick J O'Connor
  - HealthPartners Institute for Education and Research, Minneapolis, MN, U.S.A
|
40
|
Attallah O, Ma X. Bayesian neural network approach for determining the risk of re-intervention after endovascular aortic aneurysm repair. Proc Inst Mech Eng H 2014; 228:857-66. [PMID: 25212212 DOI: 10.1177/0954411914549980] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/17/2022]
Abstract
This article proposes a Bayesian neural network approach to determine the risk of re-intervention after endovascular aortic aneurysm repair surgery. The aim of the proposed technique is to determine which patients have a high chance of re-intervention (high-risk patients) and which do not (low-risk patients) 5 years after the surgery. Two censored datasets relating to the clinical conditions of aortic aneurysms were collected from two different vascular centers in the United Kingdom. A Bayesian network was first employed to solve the censoring issue in the datasets. Then, a back-propagation neural network model was built using the uncensored data of the first center to predict re-intervention at the second center and classify the patients into high-risk and low-risk groups. Kaplan-Meier curves were plotted for each group of patients separately to show whether there is a significant difference between the two risk groups. Finally, the logrank test was applied to determine whether the neural network model was capable of predicting and distinguishing between the two risk groups. The results show that the Bayesian network used for uncensoring the data improved the performance of the neural networks that were built for the two centers separately. More importantly, the neural network trained with the uncensored data of the first center was able to predict and discriminate between groups at low risk and high risk of re-intervention 5 years after endovascular aortic aneurysm surgery at center 2 (p = 0.0037 in the logrank test).
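The abstract above plots Kaplan-Meier curves for each risk group. As a minimal sketch of the standard Kaplan-Meier product-limit estimator, S(t) is the product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk. The follow-up times and event flags below are made up for illustration and are unrelated to the study's datasets.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # everyone (events and censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Toy follow-up times (years) and event flags (1 = re-intervention, 0 = censored).
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is how the estimator uses incomplete follow-up without discarding it.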
Affiliation(s)
- Omneya Attallah
- Department of Electronics and Communications Engineering, Arab Academy for Science, Technology & Maritime Transport, Alexandria, Egypt; School of Engineering and Applied Science, Aston University, Birmingham, UK
- Xianghong Ma
- School of Engineering and Applied Science, Aston University, Birmingham, UK
|
41
|
Clinical prognostic methods: Trends and developments. J Biomed Inform 2014; 48:1-4. [DOI: 10.1016/j.jbi.2014.02.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/25/2014] [Accepted: 02/28/2014] [Indexed: 02/04/2023]
|
42
|
Abstract
The vast majority of the literature evaluates the performance of classification models using only the criterion of predictive accuracy. This paper reviews the case for considering also the comprehensibility (interpretability) of classification models, and discusses the interpretability of five types of classification models, namely decision trees, classification rules, decision tables, nearest neighbors and Bayesian network classifiers. We discuss both interpretability issues which are specific to each of those model types and more generic interpretability issues, namely the drawbacks of using model size as the only criterion to evaluate the comprehensibility of a model, and the use of monotonicity constraints to improve the comprehensibility and acceptance of classification models by users.
|
43
|
A gradient boosting algorithm for survival analysis via direct optimization of concordance index. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:873595. [PMID: 24348746 PMCID: PMC3853154 DOI: 10.1155/2013/873595] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 09/09/2013] [Accepted: 10/08/2013] [Indexed: 01/15/2023]
Abstract
Survival analysis focuses on modeling and predicting the time to an event of interest. Many statistical models have been proposed for survival analysis. They often impose strong assumptions on hazard functions, which describe how the risk of an event changes over time depending on covariates associated with each individual. In particular, the prevalent proportional hazards model assumes that covariates are multiplicatively related to the hazard. Here we propose a nonparametric model for survival analysis that does not explicitly assume particular forms of hazard functions. Our nonparametric model utilizes an ensemble of regression trees to determine how the hazard function varies according to the associated covariates. The ensemble model is trained using a gradient boosting method to optimize a smoothed approximation of the concordance index, which is one of the most widely used metrics in survival model performance evaluation. We implemented our model in a software package called GBMCI (gradient boosting machine for concordance index) and benchmarked the performance of our model against other popular survival models with a large-scale breast cancer prognosis dataset. Our experiment shows that GBMCI consistently outperforms other methods based on a number of covariate settings. GBMCI is implemented in R and is freely available online.
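To make the concordance index in the abstract above concrete, here is a minimal Python sketch of Harrell's C for right-censored data, together with a generic sigmoid-smoothed surrogate. This is an illustration only: the function names, the `alpha` sharpness parameter, and the toy data are assumptions, and the smoothing shown is not necessarily the exact formulation used in GBMCI.

```python
import math
from itertools import combinations


def _comparable_pairs(times, events):
    """Yield (a, b) where a has the strictly shorter time and an observed event."""
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tie in time, or the earlier subject was censored
        yield a, b


def concordance_index(times, events, risk):
    """Fraction of comparable pairs where the earlier event has the higher risk score."""
    score, n = 0.0, 0
    for a, b in _comparable_pairs(times, events):
        n += 1
        if risk[a] > risk[b]:
            score += 1.0
        elif risk[a] == risk[b]:
            score += 0.5  # ties in predicted risk count as half
    return score / n


def smoothed_concordance(times, events, risk, alpha=1.0):
    """Differentiable surrogate: the step function is replaced by a sigmoid."""
    score, n = 0.0, 0
    for a, b in _comparable_pairs(times, events):
        n += 1
        score += 1.0 / (1.0 + math.exp(-alpha * (risk[a] - risk[b])))
    return score / n


# Hypothetical data: the third subject is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.1, 0.2, 0.7]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

As `alpha` grows, the smoothed value approaches the exact index, while smaller values give a smoother objective that a gradient-based method can optimize.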
|
44
|
Hijazi H, Chan C. A classification framework applied to cancer gene expression profiles. JOURNAL OF HEALTHCARE ENGINEERING 2013; 4:255-83. [PMID: 23778014 DOI: 10.1260/2040-2295.4.2.255] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/03/2022]
Abstract
Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or normal versus cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or normal versus cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) on 5 cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally improve the classification performance over other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy as compared to using gene expression alone.
Affiliation(s)
- Hussein Hijazi
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.
|
45
|
Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2012.09.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 12/22/2022]
|
46
|
Learning Bayesian networks from survival data using weighting censored instances. J Biomed Inform 2010; 43:613-22. [DOI: 10.1016/j.jbi.2010.03.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 09/29/2009] [Revised: 01/05/2010] [Accepted: 03/16/2010] [Indexed: 11/24/2022]
|
47
|
Using Decision Trees for the Semi-automatic Development of Medical Data Patterns: A Computer-Supported Framework. ACTA ACUST UNITED AC 2010. [DOI: 10.1007/978-1-4419-1274-9_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
|
48
|
Štajduhar I, Dalbelo-Bašić B, Bogunović N. Impact of censoring on learning Bayesian networks in survival modelling. Artif Intell Med 2009; 47:199-217. [DOI: 10.1016/j.artmed.2009.08.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 06/19/2008] [Revised: 01/12/2009] [Accepted: 08/28/2009] [Indexed: 02/06/2023]
|
49
|
Wu P, Koistinen H, Finne P, Zhang W, Zhu L, Leinonen J, Stenman U. Advances in Prostate‐Specific Antigen Testing. Adv Clin Chem 2006; 41:231-261. [DOI: 10.1016/s0065-2423(05)41007-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
50
|
|