1
Wilson SB, Ward J, Munjal V, Lam CSA, Patel M, Zhang P, Xu DS, Chakravarthy VB. Machine Learning in Spine Oncology: A Narrative Review. Global Spine J 2025; 15:210-227. [PMID: 38860699 PMCID: PMC11571526 DOI: 10.1177/21925682241261342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]
Abstract
STUDY DESIGN Narrative Review. OBJECTIVE Machine learning (ML) is one of the latest advancements in artificial intelligence used in medicine and surgery, with the potential to significantly change the way physicians diagnose, prognosticate, and treat spine tumors. In spine oncology, ML is used to analyze and interpret medical imaging and to classify tumors with high accuracy. The authors present a narrative review that specifically addresses the use of machine learning in spine oncology. METHODS This study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology. A systematic review of the literature in the PubMed, EMBASE, Web of Science, Scopus, and Cochrane Library databases from inception was performed to identify all clinical studies matching the search terms '[[Machine Learning] OR [Artificial Intelligence]] AND [[Spine Oncology] OR [Spine Cancer]]'. Extracted data included the algorithms used, training and test set sizes, and reported outcomes. Studies were grouped by the type of tumor investigated with the machine learning algorithms: primary, metastatic, both, and intradural. At least 2 independent reviewers conducted the study appraisal, data abstraction, and quality assessment. RESULTS Forty-five studies met inclusion criteria out of 480 references screened from the initial search. Studies were grouped by metastatic, primary, and intradural tumors. Most ML studies relevant to spine oncology used a mixture of clinical and imaging features to stratify the risk of mortality and frailty. Overall, these studies showed that ML is a helpful tool for tumor detection, differentiation, and segmentation, and for predicting survival and readmission rates in patients with primary, metastatic, or intradural spine tumors.
CONCLUSION Specialized neural networks and deep learning algorithms have been shown to be highly effective at predicting the probability of malignancy and at aiding diagnosis. ML algorithms can predict the risk of tumor recurrence or progression from imaging and clinical features. Additionally, ML can optimize treatment planning, for example by predicting radiotherapy dose distribution to the tumor and surrounding normal tissue, or by assisting surgical resection planning. ML has the potential to significantly enhance the accuracy and efficiency of health care delivery, leading to improved patient outcomes.
Affiliation(s)
- Seth B. Wilson
- Department of Neurosurgery, The Ohio State University, Columbus, OH, USA
- Jacob Ward
- Department of Neurosurgery, The Ohio State University, Columbus, OH, USA
- Vikas Munjal
- Department of Neurosurgery, The Ohio State University, Columbus, OH, USA
- Mayur Patel
- Department of Neurosurgery, The Ohio State University, Columbus, OH, USA
- Ping Zhang
- Department of Computer Science and Engineering, The Ohio State University College of Engineering, Columbus, OH, USA
- Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH, USA
- David S. Xu
- Department of Neurosurgery, The Ohio State University, Columbus, OH, USA

2
Luu HS. Laboratory Data as a Potential Source of Bias in Healthcare Artificial Intelligence and Machine Learning Models. Ann Lab Med 2025; 45:12-21. [PMID: 39444135 PMCID: PMC11609702 DOI: 10.3343/alm.2024.0323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 09/10/2024] [Accepted: 10/18/2024] [Indexed: 10/25/2024]
Abstract
Artificial intelligence (AI) and machine learning (ML) are anticipated to transform the practice of medicine. As one of the largest sources of digital data in healthcare, laboratory results can strongly influence AI and ML algorithms that require large sets of healthcare data for training. Embedded bias introduced into AI and ML models not only has disastrous consequences for quality of care but also may perpetuate and exacerbate health disparities. The lack of test harmonization, which is defined as the ability to produce comparable results and the same interpretation irrespective of the method or instrument platform used to produce the result, may introduce aggregation bias into algorithms with potential adverse outcomes for patients. Limited interoperability of laboratory results at the technical, syntactic, semantic, and organizational levels is a source of embedded bias that limits the accuracy and generalizability of algorithmic models. Population-specific issues, such as inadequate representation in clinical trials and inaccurate race attribution, not only affect the interpretation of laboratory results but also may perpetuate erroneous conclusions based on AI and ML models in the healthcare literature.
Affiliation(s)
- Hung S. Luu
- Department of Pathology, UT Southwestern Medical Center, Dallas, TX, USA

3
Li X, Wu Q, Chen Y, Jin Y, Ma J, Yang J. Memristor-based Bayesian spiking neural network for IBD diagnosis. Knowl Based Syst 2024; 300:112099. [DOI: 10.1016/j.knosys.2024.112099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2024]

4
Herskovits AZ, Newman T, Nicholas K, Colorado-Jimenez CF, Perry CE, Valentino A, Wagner I, Egan B, Gorenshteyn D, Vickers AJ, Pessin MS. Comparing Clinician Estimates versus a Statistical Tool for Predicting Risk of Death within 45 Days of Admission for Cancer Patients. Appl Clin Inform 2024; 15:489-500. [PMID: 38925539 PMCID: PMC11208110 DOI: 10.1055/s-0044-1787185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 04/29/2024] [Indexed: 06/28/2024]
Abstract
OBJECTIVES Although clinical practice guidelines recommend that oncologists discuss goals of care with patients who have advanced cancer, it is estimated that fewer than 20% of individuals admitted to the hospital with high-risk cancers have end-of-life discussions with their providers. While there has been interest in developing mortality prediction models to trigger such discussions, few studies have compared such models with clinical judgment in determining a patient's mortality risk. METHODS This study is a prospective analysis of 1,069 solid tumor medical oncology hospital admissions (n = 911 unique patients) from February 7 to June 7, 2022, at Memorial Sloan Kettering Cancer Center. Electronic surveys were sent to hospitalists, advanced practice providers, and medical oncologists on the first afternoon following a hospital admission, asking them to estimate the probability that the patient would die within 45 days. Provider estimates of mortality were compared with those from a predictive model developed using a supervised machine learning methodology that incorporated routine laboratory, demographic, biometric, and admission data. The area under the receiver operating characteristic curve (AUC), calibration, and decision curves were compared between clinician estimates and model predictions. RESULTS Within 45 days of hospital admission, 229 (25%) of 911 patients died. The model performed better than the clinician estimates (AUC 0.834 vs. 0.753, p < 0.0001). Integrating clinician predictions with the model's estimates further increased the AUC to 0.853 (p < 0.0001). Clinicians overestimated risk, whereas the model was extremely well calibrated. The model demonstrated net benefit over a wide range of threshold probabilities.
CONCLUSION The inpatient prognosis at admission model is a robust tool to assist clinical providers in evaluating mortality risk, and it has recently been implemented in the electronic medical record at our institution to improve end-of-life care planning for hospitalized cancer patients.
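Decision curve analysis, one of the comparisons described above, scores a risk model by its net benefit at a chosen threshold probability p_t: true positives per patient minus false positives per patient, with each false positive weighted by the odds p_t/(1 − p_t). The sketch below implements that standard formula in plain Python; the function names and the eight outcomes and risks are hypothetical illustrations, not the study's model or data.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Net benefit of intervening on every patient whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # A false positive is penalised by the odds at the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: intervene on everyone (treating no one has net benefit 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical 45-day outcomes (1 = died) and model risks for eight admissions.
outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
risks = [0.90, 0.20, 0.70, 0.40, 0.10, 0.35, 0.60, 0.05]

for pt in (0.1, 0.3, 0.5):
    print(f"p_t={pt:.1f}  model={net_benefit(outcomes, risks, pt):.3f}  "
          f"treat-all={net_benefit_treat_all(outcomes, pt):.3f}  treat-none=0.000")
```

A model is useful at a given p_t when its net benefit exceeds both reference strategies; plotting these values across many thresholds gives the decision curve.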
Affiliation(s)
- Adrianna Z. Herskovits
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Tiffanny Newman
- Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Kevin Nicholas
- Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Cesar F. Colorado-Jimenez
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Claire E. Perry
- Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Alisa Valentino
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Isaac Wagner
- Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Barbara Egan
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Andrew J. Vickers
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Melissa S. Pessin
- Department of Pathology, University of Chicago, Chicago, Illinois, United States

5
Chae S, Street WN, Ramaraju N, Gilbertson-White S. Prediction of Cancer Symptom Trajectory Using Longitudinal Electronic Health Record Data and Long Short-Term Memory Neural Network. JCO Clin Cancer Inform 2024; 8:e2300039. [PMID: 38471054 PMCID: PMC10948138 DOI: 10.1200/cci.23.00039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 12/08/2023] [Accepted: 01/26/2024] [Indexed: 03/14/2024]
Abstract
PURPOSE The ability to predict symptom severity and progression across treatment trajectories would allow clinicians to provide timely intervention and treatment planning. However, such predictions are difficult because of sparse and inconsistent assessment, and simplistic measures such as the last observed symptom severity are often used. The purpose of this study was to develop a model for predicting future cancer symptom experiences on the basis of past symptom experiences. PATIENTS AND METHODS We performed a retrospective, longitudinal analysis using records of patients with cancer (n = 208) hospitalized between 2008 and 2014. A long short-term memory (LSTM)-based recurrent neural network, a linear regression model, and a random forest model were trained on previously experienced symptoms and used to predict future symptom trajectories. RESULTS We found that at least one of the three tested models (LSTM, linear regression, and random forest) outperformed predictions based solely on the previous clinical observation. LSTM models significantly outperformed linear regression and random forest models in predicting nausea (P < .1) and psychosocial status (P < .01). Linear regression outperformed all other models in predicting oral health (P < .01), while random forest outperformed all other models in predicting mobility (P < .01) and nutrition (P < .01). CONCLUSION We can successfully predict patients' symptom trajectories with a prediction model built from sparse assessment data in routinely collected nursing documentation. The results of this project can be applied to better individualize symptom management and support the quality of life of patients with cancer.
Affiliation(s)
- Sena Chae
- The University of Iowa College of Nursing, Iowa City, IA
- W. Nick Street
- The University of Iowa Tippie College of Business, Iowa City, IA
- Naveenkumar Ramaraju
- University of Illinois Urbana-Champaign, Gies College of Business, Champaign, IL

6
Galadima H, Anson-Dwamena R, Johnson A, Bello G, Adunlin G, Blando J. Machine Learning as a Tool for Early Detection: A Focus on Late-Stage Colorectal Cancer across Socioeconomic Spectrums. Cancers (Basel) 2024; 16:540. [PMID: 38339293 PMCID: PMC10854986 DOI: 10.3390/cancers16030540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 01/19/2024] [Accepted: 01/23/2024] [Indexed: 02/12/2024]
Abstract
PURPOSE To assess the efficacy of various machine learning (ML) algorithms in predicting late-stage colorectal cancer (CRC) diagnoses against the backdrop of socioeconomic and regional healthcare disparities. METHODS An innovative theoretical framework was developed to integrate individual- and census tract-level social determinants of health (SDOH) with sociodemographic factors. The ML models were compared using key performance metrics, such as the AUC-ROC, to evaluate their predictive accuracy. Spatio-temporal analysis was used to identify disparities in late-stage CRC diagnosis probabilities. RESULTS Gradient boosting emerged as the best-performing model, with the top predictors of late-stage CRC diagnosis being anatomic site, year of diagnosis, age, proximity to Superfund sites, and primary payer. Spatio-temporal clusters highlighted geographic areas with a statistically significantly elevated probability of late-stage diagnosis, emphasizing the need for targeted healthcare interventions. CONCLUSIONS This research underlines the potential of ML to enhance prognostic prediction in oncology, particularly in CRC. The gradient boosting model, with its robust performance, holds promise for deployment in healthcare systems to aid early detection and to help formulate localized cancer prevention strategies. The study's methodology represents a significant step toward using AI in public health to mitigate disparities and improve cancer care outcomes.
Affiliation(s)
- Hadiza Galadima
- School of Community and Environmental Health, Old Dominion University, Norfolk, VA 23529, USA
- Rexford Anson-Dwamena
- School of Community and Environmental Health, Old Dominion University, Norfolk, VA 23529, USA
- Ashley Johnson
- School of Community and Environmental Health, Old Dominion University, Norfolk, VA 23529, USA
- Ghalib Bello
- Department of Environmental Medicine & Public Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Georges Adunlin
- Department of Pharmaceutical, Social and Administrative Sciences, Samford University, Birmingham, AL 35229, USA
- James Blando
- School of Community and Environmental Health, Old Dominion University, Norfolk, VA 23529, USA

7
Treleaven L, Komesaroff P, La Brooy C, Olver I, Kerridge I, Philip J. A review of the utility of prognostic tools in predicting 6-month mortality in cancer patients, conducted in the context of voluntary assisted dying. Intern Med J 2023; 53:2180-2197. [PMID: 37029711 DOI: 10.1111/imj.16081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 03/07/2023] [Indexed: 04/09/2023]
Abstract
BACKGROUND Eligibility to access the Victorian voluntary assisted dying (VAD) legislation requires that people have a prognosis of 6 months or less (or 12 months or less in the setting of a neurodegenerative diagnosis). Yet prognostic determination is frequently inaccurate and causes clinician discomfort. Prognostic tools based on functional capacity and clinical and biochemical markers have been developed to increase the accuracy of life expectancy predictions. AIMS This review of prognostic tools explores their accuracy in determining 6-month mortality in adults receiving palliative care with a primary diagnosis of cancer (the diagnosis of a large proportion of people requesting VAD). METHODS A systematic search of the literature was performed in the electronic databases Medline, Embase and CINAHL. RESULTS Limitations of prognostication identified include the following: (i) prognostic tools still provide uncertain prognoses; (ii) prognostic tools are more accurate at predicting shorter prognoses, such as weeks to months, than at predicting 6 months; and (iii) functional status was often weighted heavily when calculating prognoses. Challenges of prognostication identified include the following: (i) the area under the curve (a value that represents how well a model can distinguish between two outcomes) cannot be directly interpreted clinically and (ii) there are difficulties in determining appropriate thresholds of accuracy in this context. CONCLUSIONS Prognostication is a significant aspect of VAD; the utility of the currently available prognostic tools appears limited, but they may prompt discussions about prognosis and about alternative means (other than prognostic estimates) of identifying those eligible for VAD.
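The area under the curve mentioned above has a concrete probabilistic reading: it is the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen patient who survived, with ties counting one half. The plain-Python sketch below computes it directly from that pairwise definition; the function name and scores are hypothetical. It also illustrates why the value is hard to use clinically: the AUC depends only on how patients are ranked, so shrinking every prediction leaves it unchanged even though the stated risks change.

```python
from itertools import product
from typing import Sequence


def auc(died: Sequence[int], risk: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: P(risk of a death > risk of a survivor)."""
    pos = [r for d, r in zip(died, risk) if d == 1]
    neg = [r for d, r in zip(died, risk) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for p, q in product(pos, neg):
        if p > q:
            score += 1.0
        elif p == q:
            score += 0.5  # tied predictions count as half-correct
    return score / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died within 6 months) and predicted risks.
died = [1, 1, 0, 0, 0]
risk = [0.80, 0.40, 0.50, 0.20, 0.10]

print(auc(died, risk))                   # 5 of 6 death/survivor pairs ranked correctly
print(auc(died, [r / 4 for r in risk]))  # same ranking, so the same AUC
```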
Affiliation(s)
- Lydia Treleaven
- Department of Medicine, The University of Melbourne, Melbourne, Victoria, Australia
- Paul Komesaroff
- School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Victoria, Australia
- Department of Medicine, Alfred Hospital, Melbourne, Victoria, Australia
- Camille La Brooy
- School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Victoria, Australia
- Ian Olver
- School of Psychology, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, South Australia, Australia
- Ian Kerridge
- Department of Medicine, Royal North Shore Hospital, St Leonards, New South Wales, Australia
- Sydney Health Ethics, The University of Sydney, Camperdown, New South Wales, Australia
- Jennifer Philip
- Department of Medicine, The University of Melbourne, Melbourne, Victoria, Australia
- Palliative Care Service, St Vincent's Hospital, Melbourne, Victoria, Australia
- Palliative Care Service, Peter MacCallum Cancer Centre, Royal Melbourne Hospital, Melbourne, Victoria, Australia

8
Llorián-Salvador Ó, Akhgar J, Pigorsch S, Borm K, Münch S, Bernhardt D, Rost B, Andrade-Navarro MA, Combs SE, Peeken JC. The importance of planning CT-based imaging features for machine learning-based prediction of pain response. Sci Rep 2023; 13:17427. [PMID: 37833283 PMCID: PMC10576053 DOI: 10.1038/s41598-023-43768-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 09/28/2023] [Indexed: 10/15/2023]
Abstract
Patients suffering from painful spinal bone metastases (PSBMs) often undergo palliative radiation therapy (RT), which is effective in approximately two thirds of patients. In this exploratory investigation, we assessed the effectiveness of machine learning (ML) models trained on radiomics, semantic and clinical features in estimating complete pain response. Gross tumour volumes (GTV) and clinical target volumes (CTV) of 261 PSBMs were segmented on planning computed tomography (CT) scans. Radiomics, semantic and clinical features were collected for all patients. Random forest (RFC) and support vector machine (SVM) classifiers were compared using repeated nested cross-validation. The best radiomics classifier was trained on the CTV, with an area under the receiver operating characteristic curve (AUROC) of 0.62 ± 0.01 (RFC; 95% confidence interval). The semantic model achieved a comparable AUROC of 0.63 ± 0.01 (RFC), significantly below the clinical model (SVM, AUROC: 0.80 ± 0.01) and slightly below the spinal instability neoplastic score (SINS; LR, AUROC: 0.65 ± 0.01). A combined model did not improve performance (AUROC: 0.74 ± 0.01). We demonstrated that radiomics and semantic analyses of planning CTs allowed only limited prediction of response to palliative RT. ML predictions based on established clinical parameters achieved the best results.
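Repeated nested cross-validation, the comparison scheme named above, separates hyperparameter tuning (inner folds) from performance estimation (outer folds), so the reported AUROC is never computed on data that influenced tuning. The sketch below assumes scikit-learn and uses random data from `make_classification` as a stand-in for the radiomics, semantic and clinical features; the grids, fold counts, repeat count and the `repeated_nested_cv` helper are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 261 samples (one per lesion) with random features, NOT study data.
X, y = make_classification(n_samples=261, n_features=20, n_informative=5, random_state=0)

# Hypothetical candidate models and hyperparameter grids.
CANDIDATES = {
    "RFC": (RandomForestClassifier(random_state=0),
            {"n_estimators": [25, 50], "max_depth": [3, None]}),
    "SVM": (make_pipeline(StandardScaler(), SVC()),
            {"svc__C": [0.1, 1.0, 10.0]}),
}


def repeated_nested_cv(estimator, grid, X, y, n_repeats=3, outer_k=5, inner_k=3):
    """Return one outer-fold AUROC per (repeat, outer fold)."""
    scores = []
    for seed in range(n_repeats):
        inner = StratifiedKFold(n_splits=inner_k, shuffle=True, random_state=seed)
        outer = StratifiedKFold(n_splits=outer_k, shuffle=True, random_state=1000 + seed)
        # Inner loop: the grid search only ever sees the outer-training portion.
        search = GridSearchCV(estimator, grid, cv=inner, scoring="roc_auc")
        # Outer loop: the tuned model is scored on folds untouched by tuning.
        scores.extend(cross_val_score(search, X, y, cv=outer, scoring="roc_auc"))
    return np.asarray(scores)


results = {name: repeated_nested_cv(est, grid, X, y)
           for name, (est, grid) in CANDIDATES.items()}
for name, s in results.items():
    print(f"{name}: mean AUROC {s.mean():.3f} (sd {s.std():.3f}, n={s.size})")
```

Summarising the outer-fold scores across repeats yields a mean with a spread; the abstract labels its ± values as a 95% confidence interval, and how that interval was derived is not stated there.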
Affiliation(s)
- Óscar Llorián-Salvador
- Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
- Department for Bioinformatics and Computational Biology, Informatik 12, Technical University of Munich (TUM), Boltzmannstraße 3, 85748, Garching, Germany
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Joachim Akhgar
- Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
- Steffi Pigorsch
- Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
- Kai Borm
- Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
- Stefan Münch
- Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
- Denise Bernhardt
- Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
- Department of Radiation Sciences (DRS), Institute of Radiation Medicine (IRM), Helmholtz Zentrum, 85764, München, Germany
- Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, 69120, Heidelberg, Germany
- Burkhard Rost
- Department for Bioinformatics and Computational Biology, Informatik 12, Technical University of Munich (TUM), Boltzmannstraße 3, 85748, Garching, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Stephanie E Combs
- Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
- Department of Radiation Sciences (DRS), Institute of Radiation Medicine (IRM), Helmholtz Zentrum, 85764, München, Germany
- Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, 69120, Heidelberg, Germany
- Jan C Peeken
- Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
- Department of Radiation Sciences (DRS), Institute of Radiation Medicine (IRM), Helmholtz Zentrum, 85764, München, Germany
- Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, 69120, Heidelberg, Germany

9
Feng Y, McGuire N, Walton A, Fox S, Papa A, Lakhani SR, McCart Reed AE. Predicting breast cancer-specific survival in metaplastic breast cancer patients using machine learning algorithms. J Pathol Inform 2023; 14:100329. [PMID: 37664452 PMCID: PMC10470383 DOI: 10.1016/j.jpi.2023.100329] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/11/2023] [Revised: 08/03/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023]
Abstract
Metaplastic breast cancer (MpBC) is a rare and aggressive subtype of breast cancer for which data on prognostic factors and survival prediction are only now emerging. This study aimed to develop machine learning models to predict breast cancer-specific survival (BCSS) in patients with MpBC, using a dataset of 160 patients with clinical, pathological, and biological variables. An in-depth variable selection process using gain ratio and correlation-based methods yielded 10 variables for model estimation. Five models (decision tree with bagging, logistic regression, multilayer perceptron, naïve Bayes, and random forest) were evaluated using 10-fold cross-validation. Despite the constraints posed by the absence of therapeutic information, the random forest model performed best in predicting BCSS, with an ROC area of 0.808. This study highlights the potential of machine learning algorithms to predict prognosis in complex and heterogeneous cancer subtypes from clinical datasets and to contribute to patient management. Further research incorporating additional variables, such as treatment response, and more advanced machine learning techniques will likely enhance the predictive power of MpBC prognostic models.
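Gain ratio, one of the variable selection criteria named above, divides a feature's information gain about the outcome by its split information (the entropy of the feature's own values), so that features with many distinct values, such as identifiers, are not automatically favoured. A minimal sketch of the textbook C4.5 definition for categorical variables follows; the toy data and function names are hypothetical, and the abstract does not describe the study's software or settings.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], outcome: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio = information gain / split information."""
    n = len(outcome)
    groups: dict = {}
    for x, y in zip(feature, outcome):
        groups.setdefault(x, []).append(y)
    # Expected outcome entropy remaining after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(outcome) - conditional
    # Split information is the entropy of the feature's own value distribution;
    # it penalises features that fragment the data into many small groups.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


outcome = ["died", "died", "alive", "alive"]
print(gain_ratio(["high", "high", "low", "low"], outcome))  # perfectly informative
print(gain_ratio(["a", "b", "a", "b"], outcome))           # uninformative
print(gain_ratio(["p1", "p2", "p3", "p4"], outcome))       # patient ID: gain is penalised
```

The patient-ID feature has the same information gain as the perfectly informative one, but its larger split information halves its gain ratio.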
Affiliation(s)
- Yufan Feng
- UQ Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Brisbane 4029, Australia
- Natasha McGuire
- UQ Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Brisbane 4029, Australia
- Alexandra Walton
- UQ Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Brisbane 4029, Australia
- Pathology Queensland, The Royal Brisbane and Women’s Hospital, Brisbane 4029, Australia
- Stephen Fox
- Peter MacCallum Cancer Centre and University of Melbourne, Melbourne 3000, Australia
- Antonella Papa
- Monash Biomedicine Discovery Institute, Monash University, Melbourne 3800, Australia
- Sunil R. Lakhani
- UQ Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Brisbane 4029, Australia
- Pathology Queensland, The Royal Brisbane and Women’s Hospital, Brisbane 4029, Australia
- Amy E. McCart Reed
- UQ Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Brisbane 4029, Australia

10
Zarean Shahraki S, Azizmohammad Looha M, Mohammadi kazaj P, Aria M, Akbari A, Emami H, Asadi F, Akbari ME. Time-related survival prediction in molecular subtypes of breast cancer using time-to-event deep-learning-based models. Front Oncol 2023; 13:1147604. [PMID: 37342184 PMCID: PMC10277681 DOI: 10.3389/fonc.2023.1147604] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/18/2023] [Accepted: 05/19/2023] [Indexed: 06/22/2023]
Abstract
Background Breast cancer (BC) survival prediction can be a helpful tool for identifying important prognostic factors, selecting effective treatments, and reducing mortality rates. This study aims to predict the time-related survival probability of BC patients in different molecular subtypes over 30 years of follow-up. Materials and methods This study retrospectively analyzed 3580 patients diagnosed with invasive BC from 1991 to 2021 at the Cancer Research Center of Shahid Beheshti University of Medical Sciences. The dataset contained 18 predictor variables and two dependent variables: the survival status of patients and the time patients survived from diagnosis. Feature importance analysis was performed using the random forest algorithm to identify significant prognostic factors. Time-to-event deep-learning-based models, including Nnet-survival, DeepHit, DeepSurv, N-MTLR and Cox-time, were developed using a grid search approach, first with all variables and then with only the most important variables selected by feature importance. The C-index and integrated Brier score (IBS) were used to determine the best-performing model. Additionally, the dataset was clustered by molecular receptor status (i.e., luminal A, luminal B, HER2-enriched, and triple-negative), and the best-performing prediction model was used to estimate survival probability for each molecular subtype. Results The random forest method identified tumor state, age at diagnosis, and lymph node status as the best subset of variables for predicting BC survival probabilities. All models performed very similarly, with Nnet-survival (C-index=0.77, IBS=0.13) performing slightly better using either all 18 variables or the three most important variables. Luminal A had the highest predicted BC survival probabilities, while triple-negative and HER2-enriched had the lowest predicted survival probabilities over time.
Additionally, the luminal B subtype followed a similar trend to luminal A for the first five years, after which its predicted survival probability decreased steadily at the 10- and 15-year intervals. Conclusion This study provides valuable insight into the survival probability of patients based on their molecular receptor status, particularly for HER2-positive patients. Healthcare providers can use this information to make informed decisions regarding the appropriateness of medical interventions for high-risk patients. Future clinical trials should further explore the response of different molecular subtypes to treatment in order to optimize the efficacy of breast cancer treatments.
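The C-index used above to rank the survival models extends the AUC idea to censored time-to-event data: among pairs of patients whose survival order is known, it is the fraction in which the patient who died first had the higher predicted risk. A minimal sketch of Harrell's C-index follows, assuming that higher scores mean earlier expected death (a model that outputs survival probabilities would need its scores negated first) and, for simplicity, ignoring pairs with tied times; the data and function name are hypothetical. The integrated Brier score, the other metric named, additionally requires predicted survival curves and censoring weights and is not shown.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Fraction of comparable pairs whose risk ordering matches their survival ordering."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient's death time is unknown, so it cannot anchor a pair
        for j in range(len(times)):
            if times[i] < times[j]:  # j was still under observation when i died
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (years), event indicator (1 = died, 0 = censored), model risk score.
times = [2.0, 5.0, 7.0, 10.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.2]
print(harrell_c_index(times, events, risk))  # 4 of 5 comparable pairs concordant
```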
Affiliation(s)
- Saba Zarean Shahraki
- Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mehdi Azizmohammad Looha
- Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Pooya Mohammadi kazaj
- Geographic Information Systems Department, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran
- Mehrad Aria
- Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University, Tehran, Iran
- Atieh Akbari
- Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hassan Emami
- Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farkhondeh Asadi
- Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

11
Liu Y, Lyu X, Yang B, Fang Z, Hu D, Shi L, Wu B, Tian Y, Zhang E, Yang Y. Early Triage of Critically Ill Adult Patients With Mushroom Poisoning: Machine Learning Approach. JMIR Form Res 2023; 7:e44666. [PMID: 36943366 PMCID: PMC10131621 DOI: 10.2196/44666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/23/2023]
Abstract
BACKGROUND Early triage of patients with mushroom poisoning is essential for administering precise treatment and reducing mortality. To our knowledge, no method has been established to triage patients with mushroom poisoning based on clinical data. OBJECTIVE The purpose of this work was to construct a triage system to identify critically ill patients with mushroom poisoning from clinical indicators using several machine learning approaches and to assess the prediction accuracy of these strategies. METHODS In all, 567 patients from 5 primary care hospitals and facilities in Enshi, Hubei Province, China, were included and divided into 2 groups: 322 patients from 2 hospitals formed the training cohort, and 245 patients from 3 hospitals formed the test cohort. Four machine learning algorithms were used to construct the triage model for patients with mushroom poisoning. Performance was assessed using the area under the receiver operating characteristic curve (AUC), decision curves, sensitivity, specificity, and other representative statistics. Feature contributions were evaluated using Shapley additive explanations. RESULTS Among the machine learning algorithms tested, extreme gradient boosting (XGBoost) showed the best discriminative ability in 5-fold cross-validation (AUC=0.83, 95% CI 0.77-0.90) and in the test set (AUC=0.90, 95% CI 0.83-0.96). In the test set, the XGBoost model had a sensitivity of 0.93 (95% CI 0.81-0.99) and a specificity of 0.79 (95% CI 0.73-0.85), whereas the physicians' assessment had a sensitivity of 0.86 (95% CI 0.72-0.95) and a specificity of 0.66 (95% CI 0.59-0.73). CONCLUSIONS The 14-factor XGBoost model for the early triage of mushroom poisoning can rapidly and accurately identify critically ill patients and may serve as an important basis for selecting treatment options and referring patients, potentially reducing patient mortality and improving clinical outcomes.
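Sensitivity and specificity, as reported above for both the XGBoost model and the physicians, describe a yes/no decision, so a probabilistic model must first be dichotomised at some cutoff before it can be compared with a clinician's binary call. A minimal sketch with invented probabilities follows; the function name and the 0.5 and 0.25 cutoffs are illustrative assumptions, and the abstract does not state which cutoff the study used.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(critical: Sequence[int], prob: Sequence[float],
                            cutoff: float = 0.5) -> Tuple[float, float]:
    """Flag a patient as critically ill when prob >= cutoff; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for y, p in zip(critical, prob):
        flagged = p >= cutoff
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels (1 = critically ill) and model probabilities for seven patients.
critical = [1, 1, 1, 0, 0, 0, 0]
prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]

for cutoff in (0.5, 0.25):
    sens, spec = sensitivity_specificity(critical, prob, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the cutoff trades specificity for sensitivity, so a single sensitivity/specificity pair describes one operating point rather than the model as a whole.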
Affiliation(s)
- Yuxuan Liu
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
- Xiaoguang Lyu
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, China
- Bo Yang
- Department of Internal Medicine, Renmin Hospital of Xianfeng, Enshi, China
- Zhixiang Fang
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
- Dejun Hu
- Department of Internal Medicine, Renmin Hospital of Xianfeng, Enshi, China
- Lei Shi
- Department of Nephrology, Minda Hospital of Hubei Minzu University, Enshi, China
- Bisheng Wu
- Department of General Surgery, Renmin Hospital of Xianfeng, Enshi, China
- Yong Tian
- Department of Internal Medicine, Renmin Hospital of Laifeng, Enshi, China
- Enli Zhang
- Department of General Surgery, Central Hospital of Hefeng, Enshi, China
- YuanChao Yang
- Department of Gastroenterology, Renmin Hospital of Xuanen, Enshi, China
12
Kokabi M, Sui J, Gandotra N, Pournadali Khamseh A, Scharfe C, Javanmard M. Nucleic Acid Quantification by Multi-Frequency Impedance Cytometry and Machine Learning. BIOSENSORS 2023; 13:bios13030316. [PMID: 36979528 PMCID: PMC10046493 DOI: 10.3390/bios13030316] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/11/2023] [Revised: 02/15/2023] [Accepted: 02/20/2023] [Indexed: 06/10/2023]
Abstract
Determining nucleic acid concentrations in a sample is an important step prior to proceeding with downstream analysis in molecular diagnostics. Given the need to test DNA amount and purity in many samples, including samples with very small input DNA, novel machine learning approaches for accurate, high-throughput DNA quantification are useful. Here, we demonstrated the ability of a neural network to predict DNA amounts coupled to paramagnetic beads. To this end, a custom-made microfluidic chip was applied to detect DNA molecules bound to beads by measuring the impedance peak response (IPR) at multiple frequencies. We leveraged electrical measurements, including the frequency and the imaginary and real parts of the peak intensity within a microfluidic channel, as the input of deep learning models to predict DNA concentration. Specifically, 10 different deep learning architectures were examined. The results of the proposed regression model indicate that an R² of 97% with a slope of 0.68 is achievable. Consequently, machine learning models can be a suitable, fast, and accurate method to measure nucleic acid concentration in a sample. The results presented in this study demonstrate the ability of the proposed neural network to use the information embedded in raw impedance data to predict DNA concentration.
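The abstract reports an R² and a slope but does not say how they were computed. The sketch below assumes both come from an ordinary least-squares line of predicted against measured concentration. Under that assumption, a high R² with a slope below 1 would mean the predictions are tightly linear in the true value but compressed in range. The data are invented for illustration.

```python
def fit_line(measured, predicted):
    """Least-squares line predicted = intercept + slope * measured.

    Returns (slope, intercept, r_squared), where r_squared is the share of the
    variance in `predicted` explained by the fitted line."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(measured, predicted))
    ss_tot = sum((y - my) ** 2 for y in predicted)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented example: predictions that track the truth closely but are compressed.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.7, 2.4, 3.0, 3.8]
slope, intercept, r2 = fit_line(measured, predicted)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```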
Affiliation(s)
- Mahtab Kokabi
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Jianye Sui
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Neeru Gandotra
- Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
- Curt Scharfe
- Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
- Mehdi Javanmard
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA
13
Kotevski DP, Smee RI, Vajdic CM, Field M. Empirical comparison of routinely collected electronic health record data for head and neck cancer-specific survival in machine-learnt prognostic models. Head Neck 2023; 45:365-379. [PMID: 36369773 PMCID: PMC10100433 DOI: 10.1002/hed.27241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 09/21/2022] [Accepted: 11/02/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Knowledge of the prognostic factors and performance of machine learning predictive models for 2-year cancer-specific survival (CSS) is limited in the head and neck cancer (HNC) population. METHODS Data from our facilities' oncology information system (OIS) collected for routine practice (OIS dataset, n = 430 patients) and research purposes (research dataset, n = 529 patients) were extracted on adults diagnosed between 2000 and 2017 with squamous cell carcinoma of the head and neck. RESULTS Machine learning demonstrated excellent performance (area under the curve, AUC) in the whole cohort (AUC = 0.97, research dataset), larynx cohort (AUC = 0.98, both datasets), and oropharynx cohort (AUC = 0.99, both datasets). Tumor site and T classification were identified as predictors of 2-year CSS in both datasets. Hypothyroidism and fitness for operation were further identified in the research dataset. CONCLUSIONS Datasets extracted from an OIS for routine clinical practice and research purposes demonstrated high utility for informing 2-year head and neck CSS.
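The AUC reported above has an equivalent rank-based reading: it is the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it, with ties counting as half. The minimal sketch below computes it that way. It assumes the outcome is coded 1 for cancer-specific death within 2 years, and the scores are invented. It is not the authors' modelling code.

```python
def auc_mann_whitney(labels, scores):
    """AUC = probability that a randomly chosen positive case scores higher than a
    randomly chosen negative case (ties count as half). O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = died of head and neck cancer within 2 years.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc_mann_whitney(labels, scores))  # 0.75
```

The pairwise loop is quadratic. For large cohorts, a sort-based rank computation gives the same value faster.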
Affiliation(s)
- Damian P Kotevski
- Department of Radiation Oncology, Prince of Wales Hospital and Community Health Services, Sydney, New South Wales, Australia; Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Robert I Smee
- Department of Radiation Oncology, Prince of Wales Hospital and Community Health Services, Sydney, New South Wales, Australia; Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia; Department of Radiation Oncology, Tamworth Base Hospital, Tamworth, New South Wales, Australia
- Claire M Vajdic
- Centre for Big Data Research in Health, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia; Kirby Institute, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Matthew Field
- South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Sydney, New South Wales, Australia
14
Comparing machine learning approaches to incorporate time-varying covariates in predicting cancer survival time. Sci Rep 2023; 13:1370. [PMID: 36697455 PMCID: PMC9877029 DOI: 10.1038/s41598-023-28393-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/19/2022] [Accepted: 01/18/2023] [Indexed: 01/26/2023] Open
Abstract
The Cox proportional hazards model is commonly used in evaluating risk factors in cancer survival data. The model assumes an additive, linear relationship between the risk factors and the log hazard. However, this assumption may be too simplistic. Further, failure to take time-varying covariates into account, if present, may lower prediction accuracy. In this retrospective, population-based, prognostic study of data from patients diagnosed with cancer from 2008 to 2015 in Ontario, Canada, we applied machine learning-based time-to-event prediction methods and compared their predictive performance in two sets of analyses: (1) yearly-cohort-based time-invariant and (2) fully time-varying covariates analysis. Machine learning-based methods (gradient boosting model [gbm], random survival forest [rsf], elastic net [enet], lasso, and ridge) were compared to the traditional Cox proportional hazards (coxph) model and the prior study which used the yearly-cohort-based time-invariant analysis. Using Harrell's C index as our primary measure, we found that using both machine learning techniques and incorporating time-dependent covariates can improve predictive performance. Gradient boosting machine showed the best performance on test data in both time-invariant and time-varying covariates analysis.
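Harrell's C index, the primary measure above, counts how often a model ranks pairs of patients in the right order. A pair is usable only when the patient with the shorter follow-up actually had the event. The pair is concordant when that patient also has the higher predicted risk. The minimal sketch below assumes higher risk scores mean earlier death and, for simplicity, ignores tied event times. The data are invented.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index.

    times:  observed follow-up times
    events: 1 if death was observed, 0 if the patient was censored
    risk:   predicted risk scores (higher = expected to die sooner)
    Tied event times are not handled in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Invented example: the third patient is censored at 12 months.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.2]
print(harrell_c(times, events, risk))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.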
15
Lin FPY, Salih OS, Scott N, Jameson MB, Epstein RJ. Development and Validation of a Machine Learning Approach Leveraging Real-World Clinical Narratives as a Predictor of Survival in Advanced Cancer. JCO Clin Cancer Inform 2022; 6:e2200064. [DOI: 10.1200/cci.22.00064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
Abstract
PURPOSE Predicting short-term mortality in patients with advanced cancer remains challenging. Whether digitalized clinical text can be used to build models to enhance survival prediction in this population is unclear. MATERIALS AND METHODS We conducted a single-center retrospective cohort study in patients with advanced solid tumors. Clinical correspondence authored by oncologists at the first patient encounter was extracted from the electronic medical records. Machine learning (ML) models were trained using narratives from the derivation cohort, before being tested on a temporal validation cohort at the same site. Performance was benchmarked against Eastern Cooperative Oncology Group performance status (PS), comparing ML models alone (comparison 1) or in combination with PS (comparison 2), assessed by areas under receiver operating characteristic curves (AUCs) for predicting vital status at 11 time points from 2 to 52 weeks. RESULTS ML models were built on the derivation cohort (4,791 patients from 2001 to April 2017) and tested on the validation cohort of 726 patients (May 2017-June 2019). In 441 patients (61%) where clinical narratives were available and PS was documented, ML models outperformed PS alone (mean AUC improvement, 0.039, P < .001, comparison 1). Inclusion of both clinical text and PS in ML models resulted in further improvement in prediction accuracy over PS, with a mean AUC improvement of 0.050 (P < .001, comparison 2); the AUC was > 0.80 at all assessed time points for models incorporating clinical text. Exploratory analysis of oncologists' narratives revealed recurring descriptors correlating with survival, including referral patterns, mobility, physical functions, and concomitant medications. CONCLUSION Applying ML to oncologists' narratives, with or without the patient's PS, significantly improved survival prediction up to 12 months, suggesting the utility of clinical text in building prognostic support tools.
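Scoring "vital status at 11 time points" requires one binary label per patient for each horizon. The abstract does not say how censored patients were handled. The sketch below assumes a common convention: patients censored before a horizon are excluded (marked `None`) for that horizon only. The function name and data are hypothetical.

```python
def horizon_labels(follow_up_weeks, died, horizon):
    """Binary vital-status labels at a fixed horizon (in weeks).

    Returns 1 (died by the horizon), 0 (known alive at the horizon), or None
    (censored before the horizon, so status is unknown and the patient is excluded)."""
    labels = []
    for t, d in zip(follow_up_weeks, died):
        if d and t <= horizon:
            labels.append(1)
        elif t >= horizon:
            labels.append(0)
        else:
            labels.append(None)
    return labels


# Invented cohort: follow-up in weeks and whether death was observed.
follow_up = [3, 10, 60, 20]
died = [1, 0, 1, 0]
print(horizon_labels(follow_up, died, horizon=12))  # [1, None, 0, 0]
```

Each horizon then yields its own labelled subset, and an AUC can be computed separately per horizon.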
Affiliation(s)
- Frank Po-Yen Lin
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
- NHMRC Clinical Trials Centre, Sydney University, Camperdown, Australia
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- School of Clinical Medicine, University of New South Wales, Sydney, Australia
- Osama S.M. Salih
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- Auckland City Hospital, Auckland, New Zealand
- Nina Scott
- Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand
- Michael B. Jameson
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand
- Richard J. Epstein
- School of Clinical Medicine, University of New South Wales, Sydney, Australia
- Cancer Research Division, Garvan Institute of Medical Research, Sydney, Australia
- New Hope Cancer Centre, Beijing United Hospital, Beijing, China
16
Hu D, Zhang H, Li S, Duan H, Wu N, Lu X. An ensemble learning with active sampling to predict the prognosis of postoperative non-small cell lung cancer patients. BMC Med Inform Decis Mak 2022; 22:245. [PMID: 36123745 PMCID: PMC9487160 DOI: 10.1186/s12911-022-01960-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2021] [Accepted: 08/02/2022] [Indexed: 11/12/2022] Open
Abstract
Background Lung cancer is the leading cause of cancer death worldwide. Prognostic prediction plays a vital role in the decision-making process for postoperative non-small cell lung cancer (NSCLC) patients. However, the high imbalance ratio of prognostic data limits the development of effective prognostic prediction models. Methods In this study, we present a novel approach, namely ensemble learning with active sampling (ELAS), to tackle the imbalanced data problem in NSCLC prognostic prediction. ELAS first applies an active sampling mechanism to query the most informative samples to update the base classifier to give it a new perspective. This training process is repeated until too few samples are queried. Next, an internal validation set is employed to evaluate the base classifiers, and the ones with the best performances are integrated as the ensemble model. In addition, we set up multiple initial training data seeds and internal validation sets to ensure the stability and generalization of the model. Results We verified the effectiveness of the ELAS on a real clinical dataset containing 1848 postoperative NSCLC patients. Experimental results showed that the ELAS achieved the best average AUROC (0.736) and AUPRC (0.453) across 6 prognostic tasks and obtained significant improvements in comparison with SVM, AdaBoost, Bagging, SMOTE and TomekLinks. Conclusions We conclude that the ELAS can effectively alleviate the imbalanced data problem in NSCLC prognostic prediction and demonstrates good potential for future postoperative NSCLC prognostic prediction. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-022-01960-0.
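The abstract does not specify ELAS's query criterion or stopping threshold. The sketch below substitutes uncertainty sampling, one common active-sampling criterion: it queries unlabeled samples whose predicted probability lies close to 0.5 and stops once too few remain. The `band` and `min_queries` values are hypothetical. Retraining the base classifier and building the ensemble are omitted.

```python
def select_queries(probabilities, band=0.1):
    """Indices of unlabeled samples whose predicted positive-class probability lies
    within `band` of 0.5 (the classifier's least certain cases), most uncertain first."""
    candidates = [i for i, p in enumerate(probabilities) if abs(p - 0.5) <= band]
    return sorted(candidates, key=lambda i: abs(probabilities[i] - 0.5))


def should_stop(queried, min_queries=2):
    """Stopping rule: end the sampling loop once too few informative samples remain."""
    return len(queried) < min_queries


# Invented predicted probabilities for five unlabeled patients.
probs = [0.9, 0.48, 0.1, 0.55, 0.7]
queried = select_queries(probs)
print(queried, should_stop(queried))  # [1, 3] False
```

In a full loop, the queried samples would be added to the training data and the base classifier refit before the next round.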
Affiliation(s)
- Danqing Hu
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, China
- Huanyao Zhang
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, China
- Shaolei Li
- Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China
- Huilong Duan
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, China
- Nan Wu
- Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China
- Xudong Lu
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, China
17
Yang YS, Kimm H, Jung KJ, Moon S, Lee S, Jee SH. Prediction of cancer survivors' mortality risk in Korea: a 25-year nationwide prospective cohort study. Epidemiol Health 2022; 44:e2022075. [PMID: 36108669 PMCID: PMC9943637 DOI: 10.4178/epih.e2022075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/09/2022] [Accepted: 09/13/2022] [Indexed: 11/09/2022] Open
Abstract
OBJECTIVES This study aimed to investigate the factors affecting cancer survival and develop a mortality prediction model for Korean cancer survivors. Our study identified lifestyle and mortality risk factors and attempted to determine whether health-promoting lifestyles affect mortality. METHODS Among the 1,637,287 participants in the Korean Cancer Prevention Study (KCPS) cohort, 200,834 cancer survivors who were alive after cancer diagnosis were analyzed. Discrimination and calibration for predicting the 10-year mortality risk were evaluated. A prediction model was derived using the Cox model coefficients, mean risk factor values, and mean mortality from the cancer survivors in the KCPS cohort. RESULTS During the 21.6-year follow-up, the all-cause mortality rates of cancer survivors were 57.2% and 39.4% in men and women, respectively. Male sex, older age, current smoking, and a history of diabetes were high-risk factors for mortality, while exercise habits and a family history of cancer were associated with reduced risk. The prediction model discrimination in the validation dataset for both KCPS all-cause mortality and KCPS cancer mortality was shown by C-statistics of 0.69 and 0.68, respectively. Based on the constructed prediction models, when the modifiable factors of exercise status and smoking status were changed, the cancer survivors' risk of mortality decreased linearly. CONCLUSIONS A mortality prediction model for cancer survivors was developed that may be helpful in supporting a healthy life. Lifestyle modifications in cancer survivors may affect their risk of mortality in the future.
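Deriving a model from "Cox model coefficients, mean risk factor values, and mean mortality" matches the familiar centred-risk formula $\text{risk} = 1 - S_0^{\exp(\sum_k \beta_k (x_k - \bar{x}_k))}$. The sketch below assumes this form and takes $S_0 = 1 - \text{mean 10-year mortality}$ as the survival of a person with average risk factors. The coefficients and means are hypothetical and are not the KCPS estimates.

```python
import math


def ten_year_risk(coefs, x, x_mean, mean_mortality):
    """Absolute 10-year mortality risk from Cox coefficients, centred on cohort means.

    risk = 1 - S0 ** exp(sum(b_k * (x_k - mean_k))), with S0 = 1 - mean_mortality
    taken as the 10-year survival of a person with average risk factors (an assumption)."""
    baseline_survival = 1.0 - mean_mortality
    linear_predictor = sum(b * (xi - m) for b, xi, m in zip(coefs, x, x_mean))
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients for [current smoker (0/1), exercises regularly (0/1)];
# these are NOT the KCPS estimates.
coefs = [0.40, -0.20]
x_mean = [0.30, 0.40]
mean_mortality = 0.25
smoker_no_exercise = ten_year_risk(coefs, [1, 0], x_mean, mean_mortality)
quit_and_exercise = ten_year_risk(coefs, [0, 1], x_mean, mean_mortality)
print(f"{smoker_no_exercise:.3f} -> {quit_and_exercise:.3f}")
```

Changing the modifiable inputs (smoking, exercise) and recomputing is how such a model can show the lifestyle effects described in the abstract.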
Affiliation(s)
- Yeun Soo Yang
- Department of Public Health, Yonsei University Graduate School, Seoul, Korea; Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
- Heejin Kimm
- Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
- Keum Ji Jung
- Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
- Seulji Moon
- Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
- Sunmi Lee
- Health Insurance Policy Research Institute, National Health Insurance Service, Wonju, Korea
- Sun Ha Jee
- Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
18
Field M, Thwaites DI, Carolan M, Delaney GP, Lehmann J, Sykes J, Vinod S, Holloway L. Infrastructure platform for privacy-preserving distributed machine learning development of computer-assisted theragnostics in cancer. J Biomed Inform 2022; 134:104181. [PMID: 36055639 DOI: 10.1016/j.jbi.2022.104181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2022] [Revised: 04/29/2022] [Accepted: 08/20/2022] [Indexed: 11/26/2022]
Abstract
INTRODUCTION Emerging evidence suggests that data-driven support tools have found their way into clinical decision-making in a number of areas, including cancer care. Improving them and widening their availability across different clinical scenarios, including for prognostic models derived from retrospective data, requires co-ordinated data sharing between clinical centres, secondary analyses of large multi-institutional clinical trial data, or distributed (federated) learning infrastructures. A systematic approach to utilizing routinely collected data across cancer care clinics remains a significant challenge due to privacy, administrative and political barriers. METHODS An information technology infrastructure and web service software was developed and implemented which uses machine learning to construct clinical decision support systems in a privacy-preserving manner across datasets geographically distributed in different hospitals. The infrastructure was deployed in a network of Australian hospitals. A harmonized set of lung cancer databases, linked to an international ontology, was built with the routine clinical and imaging data at each centre. The infrastructure was demonstrated with the development of logistic regression models to predict major cardiovascular events following radiation therapy. RESULTS The infrastructure implemented forms the basis of the Australian computer-assisted theragnostics (AusCAT) network for radiation oncology data extraction, reporting and distributed learning. Four radiation oncology departments (across seven hospitals) in New South Wales (NSW) participated in this demonstration study. Infrastructure was deployed at each centre and used to develop a model predicting cardiovascular admission within a year of receiving curative radiotherapy for non-small cell lung cancer. A total of 10,417 lung cancer patients were identified, of whom 802 were eligible for the model.
Twenty features were chosen for analysis from the clinical record and linked registries. After selection, 8 features were included and a logistic regression model achieved an area under the receiver operating characteristic (AUROC) curve of 0.70 and C-index of 0.65 on out-of-sample data. CONCLUSION The infrastructure developed was demonstrated to be usable in practice between clinical centres to harmonize routinely collected oncology data and develop models with federated learning. It provides a promising approach to enable further research studies in radiation oncology using real world clinical data.
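The abstract does not describe AusCAT's exact learning protocol. The sketch below shows one simple way a logistic regression can be trained across sites without pooling patient rows. Each hypothetical site computes the gradient of its own log-loss, and a coordinator sums these gradients and updates shared weights. The summed gradient equals the one that pooled data would give. The site data, learning rate, and round count are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(weights, rows, labels):
    """Gradient of the summed logistic log-loss on one hospital's data.
    In this sketch only this vector, not the patient rows, is sent to the coordinator."""
    grad = [0.0] * len(weights)
    for row, y in zip(rows, labels):
        p = sigmoid(sum(w * v for w, v in zip(weights, row)))
        for k, v in enumerate(row):
            grad[k] += (p - y) * v
    return grad


def federated_fit(sites, n_features, lr=0.5, rounds=500):
    """Gradient descent where each step sums per-site gradients at a coordinator.
    sites: list of (rows, labels); every row starts with 1.0 for the intercept."""
    weights = [0.0] * n_features
    n_total = sum(len(labels) for _, labels in sites)
    for _ in range(rounds):
        total = [0.0] * n_features
        for rows, labels in sites:
            site_grad = local_gradient(weights, rows, labels)
            total = [a + g for a, g in zip(total, site_grad)]
        weights = [w - lr * g / n_total for w, g in zip(weights, total)]
    return weights


def mean_log_loss(weights, rows, labels):
    """Average logistic log-loss, used here only to check that training helped."""
    loss = 0.0
    for row, y in zip(rows, labels):
        p = sigmoid(sum(w * v for w, v in zip(weights, row)))
        loss -= math.log(p) if y == 1 else math.log(1 - p)
    return loss / len(labels)


# Two invented "hospitals", each holding rows of [intercept, one feature].
site_a = ([[1.0, 0.0], [1.0, 1.0]], [0, 1])
site_b = ([[1.0, 2.0], [1.0, 0.5]], [1, 0])
weights = federated_fit([site_a, site_b], n_features=2)
all_rows, all_labels = site_a[0] + site_b[0], site_a[1] + site_b[1]
print(weights, mean_log_loss(weights, all_rows, all_labels))
```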
Affiliation(s)
- Matthew Field
- South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, NSW, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia; Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia.
- David I Thwaites
- Institute of Medical Physics, School of Physics, University of Sydney, NSW, Australia
- Martin Carolan
- Illawarra Cancer Care Centre, Wollongong, NSW, Australia
- Geoff P Delaney
- South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, NSW, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia; Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
- Joerg Lehmann
- Institute of Medical Physics, School of Physics, University of Sydney, NSW, Australia; Department of Radiation Oncology, Calvary Mater Newcastle, NSW, Australia
- Jonathan Sykes
- Institute of Medical Physics, School of Physics, University of Sydney, NSW, Australia; Blacktown Haematology and Oncology Cancer Care Centre, Blacktown Hospital, Blacktown, NSW, Australia; Crown Princess Mary Cancer Centre, Westmead Hospital, Westmead, NSW, Australia
- Shalini Vinod
- South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, NSW, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia; Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
- Lois Holloway
- South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, NSW, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia; Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia; Institute of Medical Physics, School of Physics, University of Sydney, NSW, Australia
19
Lei H, Li X, Ma W, Hong N, Liu C, Zhou W, Zhou H, Gong M, Wang Y, Wang G, Wu Y. Comparison of nomogram and machine-learning methods for predicting the survival of non-small cell lung cancer patients. CANCER INNOVATION 2022; 1:135-145. [PMID: 38090651 PMCID: PMC10686174 DOI: 10.1002/cai2.24] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Revised: 05/28/2022] [Accepted: 06/29/2022] [Indexed: 10/15/2024]
Abstract
BACKGROUND Most patients with advanced non-small cell lung cancer (NSCLC) have a poor prognosis. Predicting overall survival using clinical data would benefit cancer patients by allowing providers to design an optimum treatment plan. We compared the performance of nomograms with machine-learning models at predicting the overall survival of NSCLC patients. This comparison benefits the development and selection of models during the clinical decision-making process for NSCLC patients. METHODS Multiple machine-learning models were used in a retrospective cohort of 6586 patients. First, we modeled and validated a nomogram to predict the overall survival of NSCLC patients. Subsequently, five machine-learning models (logistic regression, random forest, XGBoost, decision tree, and light gradient boosting machine) were used to predict survival status. Next, we evaluated the performance of the models. Finally, the machine-learning model with the highest accuracy was chosen for comparison with the nomogram at predicting survival status by observing a novel performance measure: time-dependent prediction accuracy. RESULTS Among the five machine-learning models, the random forest model had the highest accuracy. When time-dependent prediction accuracy was compared over follow-up times ranging from 12 to 60 months, the prediction accuracies of both the nomogram and the machine-learning model varied with time. The nomogram reached a maximum prediction accuracy of 0.85 in the 60th month, and the random forest algorithm reached a maximum prediction accuracy of 0.74 in the 13th month. CONCLUSIONS Overall, the nomogram provided more reliable prognostic assessments of NSCLC patients than machine-learning models over our observation period. Although machine-learning methods have been widely adopted for predicting clinical prognoses in recent studies, the conventional nomogram was competitive.
In real clinical applications, a comprehensive model that combines these two methods may demonstrate superior capabilities.
Affiliation(s)
- Haike Lei
- Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing, China
- Xiaosheng Li
- Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing, China
- Wuren Ma
- Digital Health China Technologies, Co., Ltd., Beijing, China
- Na Hong
- Digital Health China Technologies, Co., Ltd., Beijing, China
- Chun Liu
- Digital Health China Technologies, Co., Ltd., Beijing, China
- Wei Zhou
- Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing, China
- Hong Zhou
- Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing, China
- Mengchun Gong
- Digital Health China Technologies, Co., Ltd., Beijing, China
- Ying Wang
- Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing, China
- Guixue Wang
- MOE Key Lab for Biorheological Science and Technology, State and Local Joint Engineering Laboratory for Vascular Implants, College of Bioengineering, Chongqing University, Chongqing, China
- Yongzhong Wu
- Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing, China
20
Kananura RM. Machine learning predictive modelling for identification of predictors of acute respiratory infection and diarrhoea in Uganda's rural and urban settings. PLOS GLOBAL PUBLIC HEALTH 2022; 2:e0000430. [PMID: 36962243 PMCID: PMC10021828 DOI: 10.1371/journal.pgph.0000430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 04/07/2022] [Indexed: 11/19/2022]
Abstract
Despite widely known preventive interventions, the dyad of acute respiratory infections (ARI) and diarrhoea remains among the top global causes of mortality in children under 5 years of age. Studies on child morbidity have largely applied "traditional" statistical techniques that have limitations in handling high-dimensional data, which leads to the exclusion of some variables. Machine Learning (ML) models appear to perform better on high-dimensional data (datasets in which the number of features p, usually correlated, is larger than the number of observations N). Using Uganda's 2006-2016 DHS pooled data on children aged 6-59 months, I applied ML techniques to identify rural-urban differentials in the predictors of child's diarrhoea and ARI. I also used ML to identify other omitted variables in the current child morbidity frameworks. The predictors were grouped into four categories: child characteristics, maternal characteristics, household characteristics and immunisation. I used 90% of the dataset as the training set (the data used to fit, or train, the prediction model), which was tested or validated on 10% and 30% datasets (pseudo-new data used to evaluate the model's performance on a new dataset). The measure of prediction was based on 10-fold cross-validation (a resampling technique). The gradient-boosted machine (an ML technique) was selected as the best model for the identification of the predictors of ARI (Accuracy: 100%-rural and 100%-urban) and diarrhoea (Accuracy: 70%-rural and 100%-urban). The identified predictors relate to the household's structure and composition, which is characterised by poor hygiene and sanitation and poor household environments that make children more susceptible to developing these diseases; maternal socio-economic factors such as education, occupation, and fertility (birth order); individual risk factors such as child age, birth weight and nutritional status; and protective interventions (immunisation).
The study findings confirm the notion that ARI and diarrhoea risk factors overlap. The results highlight the need for a holistic approach with multisectoral emphasis in addressing the occurrence of ARI and diarrhoea among children. In particular, the results provide an insight into the importance of implementing interventions that are responsive to the unique structure and composition of the household. Finally, alongside traditional models, machine learning could be applied in generating research hypotheses and providing insight into the selection of key variables that should be considered in the model.
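The 10-fold cross-validation mentioned above splits the data into ten disjoint parts. Each part serves once as the validation set while the other nine are used for training. The following is a generic sketch with hypothetical function names and a fixed shuffle seed. It does not reproduce the study's reported accuracies. A stratified split, which keeps the outcome ratio similar in every fold, is a common refinement for rare outcomes and is not shown here.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and deal it into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_splits(n, k=10, seed=0):
    """Yield (train, validation) index lists; each fold is held out exactly once."""
    folds = k_fold_indices(n, k, seed)
    for f, val in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, val


for train, val in cv_splits(25, k=10):
    print(len(train), len(val))
```

Averaging a metric over the ten held-out folds gives the cross-validated estimate.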
Affiliation(s)
- Rornald Muhumuza Kananura
- London School of Economics and Political Science, Department of International Development, London, United Kingdom
- Makerere University School of Public Health, Department of Health Policy Planning and Management, Kampala, Uganda
21
Yang X, Mu D, Peng H, Li H, Wang Y, Wang P, Wang Y, Han S. Research and Application of Artificial Intelligence Based on Electronic Health Records of Patients With Cancer: Systematic Review. JMIR Med Inform 2022; 10:e33799. [PMID: 35442195 PMCID: PMC9069295 DOI: 10.2196/33799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2021] [Revised: 01/24/2022] [Accepted: 03/14/2022] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND With the accumulation of electronic health records and the development of artificial intelligence, patients with cancer urgently need new evidence of more personalized clinical and demographic characteristics and more sophisticated treatment and prevention strategies. However, no research has systematically analyzed the application and significance of artificial intelligence based on electronic health records in cancer care. OBJECTIVE The aim of this study was to conduct a review to introduce the current state and limitations of artificial intelligence based on electronic health records of patients with cancer and to summarize the performance of artificial intelligence in mining electronic health records and its impact on cancer care. METHODS Three databases were systematically searched to retrieve potentially relevant papers published from January 2009 to October 2020. Four principal reviewers assessed the quality of the papers and reviewed them for eligibility based on the inclusion criteria in the extracted data. The summary measures used in this analysis were the number and frequency of occurrence of the themes. RESULTS Of the 1034 papers considered, 148 papers met the inclusion criteria. Cancer care, especially cancers of female organs and digestive organs, could benefit from artificial intelligence based on electronic health records through cancer emergencies and prognostic estimates, cancer diagnosis and prediction, tumor stage detection, cancer case detection, and treatment pattern recognition. The models can always achieve an area under the curve of 0.7. Ensemble methods and deep learning are on the rise. In addition, electronic medical records in the existing studies are mainly in English and from private institutional databases. CONCLUSIONS Artificial intelligence based on electronic health records performed well and could be useful for cancer care. 
Improving the performance of artificial intelligence can help patients receive more scientific-based and accurate treatments. There is a need for the development of new methods and electronic health record data sharing and for increased passion and support from cancer specialists.
Affiliation(s)
- Xinyu Yang
- Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Dongmei Mu
- Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Hao Peng
- Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Hua Li
- Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Ying Wang
- Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Ping Wang
- Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Yue Wang
- Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Siqi Han
- Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
22
Machine learning outperforms clinical experts in classification of hip fractures. Sci Rep 2022; 12:2058. [PMID: 35136091 PMCID: PMC8825848 DOI: 10.1038/s41598-022-06018-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/20/2021] [Accepted: 01/18/2022] [Indexed: 11/12/2022]
Abstract
Hip fractures are a major cause of morbidity and mortality in the elderly, and incur high health and social care costs. Given projected population ageing, the number of incident hip fractures is predicted to increase globally. As fracture classification strongly determines the chosen surgical treatment, differences in fracture classification influence patient outcomes and treatment costs. We aimed to create a machine learning method for identifying and classifying hip fractures, and to compare its performance to experienced human observers. We used 3659 hip radiographs, classified by at least two expert clinicians. The machine learning method was able to classify hip fractures with 19% greater accuracy than humans, achieving overall accuracy of 92%.
23
Chicco D, Oneto L. Computational intelligence identifies alkaline phosphatase (ALP), alpha-fetoprotein (AFP), and hemoglobin levels as most predictive survival factors for hepatocellular carcinoma. Health Informatics J 2021; 27:1460458220984205. [PMID: 33504243 DOI: 10.1177/1460458220984205] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/22/2023]
Abstract
Liver cancer kills approximately 800 thousand people annually worldwide, and its most common subtype is hepatocellular carcinoma (HCC), which usually affects people with cirrhosis. Predicting survival of patients with HCC remains an important challenge, especially because technologies needed for this scope are not available in all hospitals. In this context, machine learning applied to medical records can be a fast, low-cost tool to predict survival and detect the most predictive features from health records. In this study, we analyzed medical data of 165 patients with HCC: we employed computational intelligence to predict their survival, and to detect the most relevant clinical factors able to discriminate survived from deceased cases. Afterwards, we compared our data mining results with those obtained through statistical tests and scientific literature findings. Our analysis revealed that blood levels of alkaline-phosphatase (ALP), alpha-fetoprotein (AFP), and hemoglobin are the most effective prognostic factors in this dataset. We found literature supporting association of these three factors with hepatoma, even though only AFP has been used in a prognostic index. Our results suggest that ALP and hemoglobin can be candidates for future HCC prognostic indexes, and that physicians could focus on ALP, AFP, and hemoglobin when studying HCC records.
Affiliation(s)
- Luca Oneto
- Università di Genova, Italy; ZenaByte Srl
24
Karadaghy OA, Shew M, New J, Bur AM. Development and Assessment of a Machine Learning Model to Help Predict Survival Among Patients With Oral Squamous Cell Carcinoma. JAMA Otolaryngol Head Neck Surg 2019; 145:1115-1120. [PMID: 31045212 DOI: 10.1001/jamaoto.2019.0981] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Indexed: 12/15/2022]
Abstract
Importance Predicting survival of oral squamous cell carcinoma through the use of prediction modeling has been underused, and the development of prediction models would augment clinicians' ability to provide absolute risk estimates for individual patients. Objectives To develop a prediction model using machine learning for 5-year overall survival among patients with oral squamous cell carcinoma and compare this model with a prediction model created from the TNM (Tumor, Node, Metastasis) clinical and pathologic stage. Design, Setting, and Participants A retrospective cohort study was conducted of 33 065 patients with oral squamous cell carcinoma from the National Cancer Data Base between January 1, 2004, and December 31, 2011. Patients were excluded if the treatment was considered palliative, staging demonstrated T0 or Tis, or survival or staging data were missing. Patient, tumor, treatment, and outcome information were obtained from the National Cancer Data Base. The data were split into a distribution of 80% for training and 20% for testing. The model was created using 2-class decision forest architecture. Permutation feature importance scores were used to determine the variables that were used in the model's prediction and their order of significance. Statistical analysis was conducted from August 1, 2018, to January 10, 2019. Main Outcomes and Measures Ability to predict 5-year overall survival assessed through area under the curve, accuracy, precision, and recall. Results Among the 33 065 patients in the study, the mean (SD) age was 64.6 (14.0) years, 19 791 were men (59.9%), 13 274 were women (40.1%), and 29 783 (90.1%) were white. At 60 months, there were 16 745 deaths (50.6%). The median time of follow-up was 56.8 months (range, 0-155.6 months). Age, pathologic T stage, positive margins at the time of surgery, lymph node size, and institutional identification were identified among the most significant variables. 
The calculated area under the curve for this machine learning model was 0.80 (95% CI, 0.79-0.81), accuracy was 71%, precision was 71%, and recall was 68%. In comparison, the calculated area under the curve of the TNM staging system was 0.68 (95% CI, 0.67-0.70), accuracy was 65%, precision was 69%, and recall was 52%. Conclusions and Relevance Using machine learning algorithms, a prediction model was created based on patient social, demographic, clinical, and pathologic features. The developed prediction model proved to be better than a prediction model that exclusively used TNM pathologic and clinical stage according to all performance metrics. This study highlights the role that machine learning may play in individual patient risk estimation in the era of big data.
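Permutation feature importance, which Karadaghy et al. used to rank the variables in their decision forest, measures how much a performance metric drops when the values of one feature are randomly shuffled across patients. The sketch below is a minimal, hypothetical illustration in plain Python. It assumes any classifier exposed as a function from a feature row to a 0/1 label and uses accuracy as the metric; the function names (`permutation_importance`, `accuracy`) and the toy data are illustrative assumptions, not the study's code or data.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: List[Row], y: List[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn.

    Hypothetical helper for illustration; in practice it should be run on
    held-out data rather than on the rows the model was trained on.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by the shuffled value.
            X_perm = [list(row[:j]) + [value] + list(row[j + 1:])
                      for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy data: only feature 0 carries signal; feature 1 is constant.
    X = [[float(i), 0.0] for i in range(20)]
    y = [1 if i >= 10 else 0 for i in range(20)]
    threshold_model = lambda row: 1 if row[0] >= 10 else 0
    print(permutation_importance(threshold_model, X, y))
```

In this toy run, feature 1 receives an importance of exactly 0 because shuffling a constant column changes nothing, while feature 0 receives a positive score. One caveat of the technique: correlated predictors can share or mask each other's importance.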
Affiliation(s)
- Omar A Karadaghy
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City
- Matthew Shew
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City
- Jacob New
- University of Kansas Medical Center, School of Medicine, Kansas City
- Andrés M Bur
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City
25
Prediction of Incident Cancers in the Lifelines Population-Based Cohort. Cancers (Basel) 2021; 13:cancers13092133. [PMID: 33925159 PMCID: PMC8125183 DOI: 10.3390/cancers13092133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 04/23/2021] [Indexed: 12/23/2022]
Abstract
Simple Summary The accurate prediction of incident cancers could be relevant to understanding and reducing cancer incidence. The aim of this study was to develop machine learning (ML) models that could predict an incident diagnosis of cancer. Data were available for 116,188 cancer-free participants and 4232 incident cancer cases. The main outcome was an incident cancer (excluding skin cancer) during follow-up assessment in a population-based cohort. The performance of three ML algorithms was evaluated using supervised binary classification to identify incident cancers among participants. An overall area under the receiver operating characteristic curve (AUC) below 0.75 was obtained; the highest AUC was for prostate cancer (AUC > 0.80). Linear and non-linear ML algorithms including socioeconomic, lifestyle, and clinical variables produced a moderate predictive performance of incident cancers in the Lifelines cohort.
Abstract Cancer incidence is rising, and accurate prediction of incident cancers could be relevant to understanding and reducing cancer incidence. The aim of this study was to develop machine learning (ML) models that could predict an incident diagnosis of cancer. Participants without any history of cancer within the Lifelines population-based cohort were followed for a median of 7 years. Data were available for 116,188 cancer-free participants and 4232 incident cancer cases. At baseline, socioeconomic, lifestyle, and clinical variables were assessed. The main outcome was an incident cancer during follow-up (excluding skin cancer), based on linkage with the national pathology registry. The performance of three ML algorithms was evaluated using supervised binary classification to identify incident cancers among participants. Elastic net regularization and the Gini index were used for variable selection. An overall AUC below 0.75 was obtained; the highest AUC was for prostate cancer (random forest AUC = 0.82, 95% CI 0.77–0.87; logistic regression AUC = 0.81, 95% CI 0.76–0.86; support vector machines AUC = 0.83, 95% CI 0.78–0.88), and age was the most important predictor in these models. Linear and non-linear ML algorithms including socioeconomic, lifestyle, and clinical variables produced a moderate predictive performance of incident cancers in the Lifelines cohort.
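Most abstracts in this list summarize discrimination as an area under the ROC curve. As a hedged illustration of what such a number means (not how the Lifelines authors computed it), the AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the example scores below are hypothetical.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two negatives (scores 0.1, 0.4) and two positives (0.35, 0.8):
    # 3 of the 4 positive-negative pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form costs O(n·m) comparisons; a rank-based formulation gives the same value and scales better to cohorts the size of Lifelines. An AUC of 0.5 corresponds to chance-level ranking.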
26
Veiga RV, Schuler-Faccini L, França GVA, Andrade RFS, Teixeira MG, Costa LC, Paixão ES, Costa MDCN, Barreto ML, Oliveira JF, Oliveira WK, Cardim LL, Rodrigues MS. Classification algorithm for congenital Zika Syndrome: characterizations, diagnosis and validation. Sci Rep 2021; 11:6770. [PMID: 33762667 PMCID: PMC7990918 DOI: 10.1038/s41598-021-86361-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/19/2020] [Accepted: 03/09/2021] [Indexed: 11/09/2022]
Abstract
Zika virus was responsible for the microcephaly epidemic in Brazil which began in October 2015 and brought great challenges to the scientific community and health professionals in terms of diagnosis and classification. Due to the difficulties in correctly identifying Zika cases, it is necessary to develop an automatic procedure to classify the probability of a congenital Zika syndrome (CZS) case from clinical data. This work presents a machine learning algorithm capable of achieving this from structured and unstructured available data. The proposed algorithm reached 83% accuracy with textual information in medical records and image reports and 76% accuracy in classifying data without textual information. Therefore, the proposed algorithm has the potential to classify CZS cases in order to clarify the real effects of this epidemic, as well as to contribute to health surveillance in monitoring possible future epidemics.
Affiliation(s)
- Rafael V Veiga
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil; Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia, Brazil
- Roberto F S Andrade
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil; Instituto de Física, Universidade Federal da Bahia, Salvador, Bahia, Brazil
- Maria Glória Teixeira
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil; Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador, Bahia, Brazil
- Larissa C Costa
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil
- Enny S Paixão
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil; London School of Hygiene and Tropical Medicine, London, England, United Kingdom
- Maria da Conceição N Costa
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil; Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador, Bahia, Brazil
- Maurício L Barreto
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil
- Juliane F Oliveira
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil; Department of Mathematics, Centre of Mathematics of the University of Porto (CMUP), Porto, Portugal
- Wanderson K Oliveira
- Hospital das Forças Armadas, Ministério da Defesa, Distrito Federal, Brasília, Brazil
- Luciana L Cardim
- Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Bahia, Brazil
27
Predicting Survival in Veterans with Follicular Lymphoma Using Structured Electronic Health Record Information and Machine Learning. Int J Environ Res Public Health 2021; 18:ijerph18052679. [PMID: 33799968 PMCID: PMC7967359 DOI: 10.3390/ijerph18052679] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/07/2021] [Revised: 03/01/2021] [Accepted: 03/02/2021] [Indexed: 11/26/2022]
Abstract
The most accurate prognostic approach for follicular lymphoma (FL), progression of disease at 24 months (POD24), requires two years’ observation after initiating first-line therapy (L1) to predict outcomes. We applied machine learning to structured electronic health record (EHR) data to predict individual survival at L1 initiation. We grouped 523 observations and 1933 variables from a nationwide cohort of FL patients diagnosed 2006–2014 in the Veterans Health Administration into traditionally used prognostic variables (“curated”), commonly measured labs (“labs”), and International Classification of Diseases diagnostic codes (“ICD”) sets. We compared performance of random survival forests (RSF) vs. traditional Cox model using four datasets: curated, curated + labs, curated + ICD, and curated + ICD + labs, also using Cox on curated + POD24. We evaluated variable importance and partial dependence plots with area under the receiver operating characteristic curve (AUC). RSF with curated + labs performed best, with mean AUC 0.73 (95% CI: 0.71–0.75). It approximated, but did not surpass, Cox with POD24 (mean AUC 0.74 [95% CI: 0.71–0.77]). RSF using EHR data achieved better performance than traditional prognostic variables, setting the foundation for the incorporation of our algorithm into the EHR. It also provides for possible future scenarios in which clinicians could be provided an EHR-based tool which approximates the predictive ability of the most accurate known indicator, using information available 24 months earlier.
28
Hossain ME, Khan A, Moni MA, Uddin S. Use of Electronic Health Data for Disease Prediction: A Comprehensive Literature Review. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:745-758. [PMID: 31478869 DOI: 10.1109/tcbb.2019.2937862] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 05/20/2023]
Abstract
Disease prediction has the potential to benefit stakeholders such as the government and health insurance companies. It can identify patients at risk of disease or health conditions. Clinicians can then take appropriate measures to avoid or minimize the risk and in turn, improve quality of care and avoid potential hospital admissions. Due to the recent advancement of tools and techniques for data analytics, disease risk prediction can leverage large amounts of semantic information, such as demographics, clinical diagnosis and measurements, health behaviours, laboratory results, prescriptions and care utilisation. In this regard, electronic health data can be a potential choice for developing disease prediction models. A significant number of such disease prediction models have been proposed in the literature over time utilizing large-scale electronic health databases, different methods, and healthcare variables. The goal of this comprehensive literature review was to discuss different risk prediction models that have been proposed based on electronic health data. Search terms were designed to find relevant research articles that utilized electronic health data to predict disease risks. Online scholarly databases were searched to retrieve results, which were then reviewed and compared in terms of the method used, disease type, and prediction accuracy. This paper provides a comprehensive review of the use of electronic health data for risk prediction models. A comparison of the results from different techniques for three frequently modelled diseases using electronic health data was also discussed in this study. In addition, the advantages and disadvantages of different risk prediction models, as well as their performance, were presented. Electronic health data have been widely used for disease prediction. A few modelling approaches show very high accuracy in predicting different diseases using such data. 
These modelling approaches have been used to inform the clinical decision process to achieve better outcomes.
29
Abstract
COVID-19 is a disease currently ravaging the world, bringing unprecedented health and economic challenges to several nations. There are presently close to five million reported cases in over 200 countries with fatalities numbering over 300,000 persons. This study presents machine-learning models for the prediction and visualization of the significant factors that determine the survivability of COVID-19 patients. This study develops prediction models using a decision tree, logistic regression (LR), gradient boosting, and LR algorithms to identify the significant factors and predict the survivability of COVID-19 patients. The results of the simulation showed that the LR model had the lowest prediction accuracy. The other three showed over 95% correct accuracy and indicated that the essential factors in determining patients' survivability were underlying health conditions and age. The findings of this study agreed with the medical claims that patients with underlying health challenges and those advanced in age are liable to have complications; hence, providing a research-based credence to this belief. This proposed model thus serves as a decision support system for the management of COVID-19 patients, as well as predicts a patient’s chances of survival at the first presentation at the hospitals.
30
Fridrichova I, Kalinkova L, Karhanek M, Smolkova B, Machalekova K, Wachsmannova L, Nikolaieva N, Kajo K. miR-497-5p Decreased Expression Associated with High-Risk Endometrial Cancer. Int J Mol Sci 2020; 22:E127. [PMID: 33374439 PMCID: PMC7795869 DOI: 10.3390/ijms22010127] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/09/2020] [Revised: 12/16/2020] [Accepted: 12/19/2020] [Indexed: 12/12/2022]
Abstract
The current guidelines for diagnosis, prognosis, and treatment of endometrial cancer (EC), based on clinicopathological factors, are insufficient for numerous reasons; therefore, we investigated the relevance of miRNA expression profiles for the discrimination of different EC subtypes. Among the miRNAs previously predicted to allow distinguishing of endometrioid ECs (EECs) according to different grades (G) and from serous subtypes (SECs), we verified the utility of miR-497-5p. In ECs, we observed downregulated miR-497-5p levels that were significantly decreased in SECs, clear cell carcinomas (CCCs), and carcinosarcomas (CaSas) compared to EECs, thereby distinguishing EEC from SEC and rare EC subtypes. Significantly reduced miR-497-5p expression was found in high-grade ECs (EEC G3, SEC, CaSa, and CCC) compared to low-grade carcinomas (EEC G1 and mucinous carcinoma) and ECs classified as being in advanced FIGO (International Federation of Gynecology and Obstetrics) stages, that is, with loco-regional and distant spread compared to cancers located only in the uterus. Based on immunohistochemical features, lower miR-497-5p levels were observed in hormone-receptor-negative, p53-positive, and highly Ki-67-expressing ECs. Using a machine learning method, we showed that consideration of miR-497-5p expression, in addition to the traditional clinical and histopathologic parameters, slightly improves the prediction accuracy of EC diagnosis. Our results demonstrate that changes in miR-497-5p expression influence endometrial tumorigenesis and its evaluation may contribute to more precise diagnoses.
Affiliation(s)
- Ivana Fridrichova
- Department of Genetics, Cancer Research Institute, Biomedical Research Center of Slovak Academy of Sciences, 84505 Bratislava, Slovakia
- Lenka Kalinkova
- Department of Genetics, Cancer Research Institute, Biomedical Research Center of Slovak Academy of Sciences, 84505 Bratislava, Slovakia
- Miloslav Karhanek
- Laboratory of Bioinformatics, Biomedical Research Center of Slovak Academy of Sciences, 84505 Bratislava, Slovakia
- Bozena Smolkova
- Department of Molecular Oncology, Cancer Research Institute, Biomedical Research Center of Slovak Academy of Sciences, 84505 Bratislava, Slovakia
- Katarina Machalekova
- Department of Pathology, St. Elisabeth Cancer Institute, 81250 Bratislava, Slovakia
- Lenka Wachsmannova
- Department of Genetics, Cancer Research Institute, Biomedical Research Center of Slovak Academy of Sciences, 84505 Bratislava, Slovakia
- Nataliia Nikolaieva
- Department of Genetics, Cancer Research Institute, Biomedical Research Center of Slovak Academy of Sciences, 84505 Bratislava, Slovakia
- Karol Kajo
- Department of Pathology, St. Elisabeth Cancer Institute, 81250 Bratislava, Slovakia
31
Alkhadar H, Macluskey M, White S, Ellis I, Gardner A. Comparison of machine learning algorithms for the prediction of five-year survival in oral squamous cell carcinoma. J Oral Pathol Med 2020; 50:378-384. [PMID: 33220109 DOI: 10.1111/jop.13135] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/02/2020] [Revised: 10/15/2020] [Accepted: 10/25/2020] [Indexed: 12/15/2022]
Abstract
BACKGROUND/AIM Machine learning analyses of cancer outcomes for oral cancer remain sparse compared to other types of cancer like breast or lung. The purpose of the present study was to compare the performance of machine learning algorithms in the prediction of global, recurrence-free five-year survival in oral cancer patients based on clinical and histopathological data. METHODS Data were gathered retrospectively from 416 patients with oral squamous cell carcinoma. The data set was divided into training and test data sets (75:25 split). Training performance of five machine learning algorithms (Logistic regression, K-nearest neighbours, Naïve Bayes, Decision tree and Random forest classifiers) for prediction was assessed by k-fold cross-validation. Variables used in the machine learning models were age, sex, pain symptoms, grade of lesion, lymphovascular invasion, extracapsular extension, perineural invasion, bone invasion and type of treatment. Variable importance was assessed, and model performance on the testing data was assessed using receiver operating characteristic curves, accuracy, sensitivity, specificity and F1 score. RESULTS The best performing model was the Decision tree classifier, followed by the Logistic Regression model (accuracy 76% and 60%, respectively). The Naïve Bayes model did not display any predictive value, with 0% specificity. CONCLUSIONS Machine learning presents a promising and accessible toolset for improving prediction of oral cancer outcomes. Our findings add to a growing body of evidence that Decision tree models are useful in predicting OSCC outcomes. We would advise that future similar studies explore a variety of machine learning models including Logistic regression to help evaluate model performance.
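The test-set metrics reported here (accuracy, sensitivity, specificity, F1 score) all derive from the four cells of a binary confusion matrix. The sketch below, with hypothetical names and counts, shows one way a model can end up with 0% specificity: if it never predicts the negative class, it misses every true negative no matter how many positives it catches. This is an illustrative assumption about how such a figure can arise, not a description of the study's Naïve Bayes model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Counts from a binary confusion matrix (positive = event of interest)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def summary_metrics(c: Confusion) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision and F1 from raw counts."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # A degenerate classifier that labels every one of 50 cases positive.
    print(summary_metrics(Confusion(tp=30, fp=20, tn=0, fn=0)))
```

In this example the classifier still reaches 60% accuracy and perfect sensitivity, which is why specificity (or a threshold-free measure such as the AUC) is needed to expose a model that has learned nothing about the negative class.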
Affiliation(s)
- Huda Alkhadar
- Unit of Cell and Molecular Biology, Dundee Dental School, University of Dundee, Dundee, UK
- Michaelina Macluskey
- Department of Oral Surgery, Medicine and Pathology, Dundee Dental School, University of Dundee, Dundee, UK
- Sharon White
- Department of Oral Surgery, Medicine and Pathology, Dundee Dental School, University of Dundee, Dundee, UK
- Ian Ellis
- Unit of Cell and Molecular Biology, Dundee Dental School, University of Dundee, Dundee, UK
- Alexander Gardner
- Department of Restorative Dentistry, Dundee Dental School, University of Dundee, Dundee, UK
32
Development of Machine Learning Model to Predict the 5-Year Risk of Starting Biologic Agents in Patients with Inflammatory Bowel Disease (IBD): K-CDM Network Study. J Clin Med 2020; 9:jcm9113427. [PMID: 33114505 PMCID: PMC7693158 DOI: 10.3390/jcm9113427] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/07/2020] [Revised: 10/14/2020] [Accepted: 10/21/2020] [Indexed: 12/17/2022]
Abstract
Background: The incidence and global burden of inflammatory bowel disease (IBD) have steadily increased in the past few decades. Improved methods to stratify risk and predict disease-related outcomes are required for IBD. Aim: The aim of this study was to develop and validate a machine learning (ML) model to predict the 5-year risk of starting biologic agents in IBD patients. Method: We applied an ML method to the database of the Korean common data model (K-CDM) network, a data sharing consortium of tertiary centers in Korea, to develop a model to predict the 5-year risk of starting biologic agents in IBD patients. The records analyzed were those of patients diagnosed with IBD between January 2006 and June 2017 at Gil Medical Center (GMC; n = 1299) or present in the K-CDM network (n = 3286). The ML algorithm was developed to predict the 5-year risk of starting biologic agents in IBD patients using data from GMC and externally validated with the K-CDM network database. Result: The ML model for prediction of IBD-related outcomes at 5 years after diagnosis yielded an area under the curve (AUC) of 0.86 (95% CI: 0.82–0.92) in an internal validation study carried out at GMC. The model performed consistently across a range of other datasets, including that of the K-CDM network (AUC = 0.81; 95% CI: 0.80–0.85), in an external validation study. Conclusion: The ML-based prediction model can be used to identify IBD-related outcomes in patients at risk, enabling physicians to perform close follow-up based on the patient’s risk level, estimated through the ML algorithm.
33
34
Laios A, Gryparis A, DeJong D, Hutson R, Theophilou G, Leach C. Predicting complete cytoreduction for advanced ovarian cancer patients using nearest-neighbor models. J Ovarian Res 2020; 13:117. [PMID: 32993745 PMCID: PMC7526140 DOI: 10.1186/s13048-020-00700-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/12/2020] [Accepted: 08/04/2020] [Indexed: 12/12/2022]
Abstract
Background The foundation of modern ovarian cancer care is cytoreductive surgery to remove all macroscopic disease (R0). Identification of R0 resection patients may help individualise treatment. Machine learning and AI have been shown to be effective systems for classification and prediction. For a disease as heterogeneous as ovarian cancer, they could potentially outperform conventional predictive algorithms for routine clinical use. We investigated the performance of an AI system, the k-nearest neighbor (k-NN) classifier, to predict R0, comparing it with logistic regression. Patients diagnosed with advanced-stage, high-grade serous ovarian, tubal and primary peritoneal cancer who underwent surgical cytoreduction from 2015 to 2019 were selected from the ovarian database. Performance variables included age, BMI, Charlson Comorbidity Index, timing of surgery, surgical complexity and disease score. The k-NN algorithm classified R0 vs non-R0 patients using 3–20 nearest neighbors. Prediction accuracy was estimated as the percentage of observations in the training set correctly classified. Results 154 patients were identified, with a mean age of 64.4 ± 10.5 years, mean BMI of 27.2 ± 5.8 and mean surgical complexity score (SCS) of 3 ± 1 (1–8). Complete and optimal cytoreduction were achieved in 62% and 88% of patients, respectively. The mean predictive accuracy was 66%. R0 resection prediction of true negatives was as high as 90% using k = 20 neighbors. Conclusions The k-NN algorithm is a promising and versatile tool for R0 resection prediction. It slightly outperforms logistic regression and is expected to improve accuracy with data expansion.
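A k-NN classifier assigns a new patient the majority label among the k most similar patients in the training data. The following is a minimal sketch in plain Python, assuming numeric feature vectors, Euclidean distance, and the hypothetical labels `"R0"` and `"non-R0"`; it is not the authors' implementation, and the toy two-dimensional points merely stand in for variables such as age and BMI.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def knn_predict(train: List[Example], query: Sequence[float], k: int) -> str:
    """Return the majority label among the k training points nearest to query."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training examples by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-seen order among equal counts, so on a tie the
    # label whose closest representative is nearest to the query wins.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy training set: two R0 cases and three non-R0 cases.
    train = [((0.0, 0.0), "R0"), ((0.0, 1.0), "R0"),
             ((5.0, 5.0), "non-R0"), ((6.0, 5.0), "non-R0"), ((5.0, 6.0), "non-R0")]
    print(knn_predict(train, (0.5, 0.5), k=3))  # R0
```

Because Euclidean distance is scale-sensitive, features measured on different scales (for example age in years and the Charlson Comorbidity Index) would normally be standardized before a classifier like this is applied. Varying `k` from 3 to 20, as in the study, only changes how many neighbors vote.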
Affiliation(s)
- Alexandros Laios
- Department of Gynaecological Oncology, St James's University Hospital, Leeds Teaching Hospitals, Leeds, LS9 7TF, UK
- Alexandros Gryparis
- Unit of Endocrinology, Diabetes Mellitus and Metabolism, Aretaion Hospital, National and Kapodistrian University of Athens School of Medicine, Athens, Greece
- Diederick DeJong
- Department of Gynaecological Oncology, St James's University Hospital, Leeds Teaching Hospitals, Leeds, LS9 7TF, UK
- Richard Hutson
- Department of Gynaecological Oncology, St James's University Hospital, Leeds Teaching Hospitals, Leeds, LS9 7TF, UK
- Georgios Theophilou
- Department of Gynaecological Oncology, St James's University Hospital, Leeds Teaching Hospitals, Leeds, LS9 7TF, UK
- Chris Leach
- School of Human & Health Sciences, University of Huddersfield, Huddersfield, HD1 3DH, UK; Department of Psychology Services, South West Yorkshire Mental Health NHS Foundation Trust, The Laura Mitchell Health & Wellbeing Centre, Halifax, HX1 1YR, UK
35
Sim JA, Kim YA, Kim JH, Lee JM, Kim MS, Shim YM, Zo JI, Yun YH. The major effects of health-related quality of life on 5-year survival prediction among lung cancer survivors: applications of machine learning. Sci Rep 2020; 10:10693. [PMID: 32612283 PMCID: PMC7329866 DOI: 10.1038/s41598-020-67604-3] [Received: 11/15/2019] [Accepted: 06/01/2020]
Abstract
The primary goal of this study was to evaluate the major role of health-related quality of life (HRQOL) in a 5-year lung cancer survival prediction model built with machine learning techniques (MLTs). The predictive performance of the models was compared using data from 809 survivors who underwent lung cancer surgery. Each modeling technique was applied to two feature sets: feature set 1 included clinical and sociodemographic variables, and feature set 2 added HRQOL factors to the variables of feature set 1. A prediction model was trained with each of the decision tree (DT), logistic regression (LR), bagging, random forest (RF), and adaptive boosting (AdaBoost) methods, and the best algorithm for modeling was then determined. The models' performances were compared using fivefold cross-validation. For feature set 1, there were no significant differences in model accuracy (ranging from 0.647 to 0.713). Among the models in feature set 2, the AdaBoost and RF models outperformed the other prognostic models in the test set [area under the curve (AUC) = 0.850, 0.898, 0.981, 0.966, and 0.949 for the DT, LR, bagging, RF, and AdaBoost models, respectively]. Overall, 5-year disease-free lung cancer survival prediction models built with MLTs that included HRQOL as well as clinical variables showed improved predictive performance.
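The comparison above rests on fivefold cross-validation scored by AUC. The sketch below shows that evaluation loop in plain Python under stated assumptions: `fit` is a hypothetical placeholder for any of the five learners (it must return a scoring function), folds are contiguous and unshuffled for brevity (a real analysis would shuffle and stratify so every fold contains both outcomes), and AUC is computed as the probability that a random positive case outscores a random negative one.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_auc(X, y, fit, k=5):
    """Mean held-out AUC over k folds; `fit(train_X, train_y)` must return a scoring function."""
    aucs = []
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = fit(train_X, train_y)
        scores = [model(X[i]) for i in test_idx]
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)
```

Running `cross_validated_auc` once per feature set, with the same `fit`, is one way to reproduce the kind of feature set 1 versus feature set 2 comparison the abstract reports.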
Affiliation(s)
- Jin-Ah Sim
- Department of Biomedical Science, Seoul National University College of Medicine, Seoul, Korea
- Young Ae Kim
- National Cancer Control Institute, National Cancer Center, Goyang, Korea
- Ju Han Kim
- Department of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea
- Jong Mog Lee
- Center for Lung Cancer, National Cancer Center, Goyang, Korea
- Moon Soo Kim
- Center for Lung Cancer, National Cancer Center, Goyang, Korea
- Young Mog Shim
- Lung and Esophageal Cancer Center, Samsung Comprehensive Cancer Center, Samsung Medical Center, Seoul, Korea
- Jae Ill Zo
- Lung and Esophageal Cancer Center, Samsung Comprehensive Cancer Center, Samsung Medical Center, Seoul, Korea
- Young Ho Yun
- Department of Biomedical Science, Seoul National University College of Medicine, Seoul, Korea
- Department of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea
- Department of Family Medicine, Seoul National University College of Medicine, Seoul, Korea
36
Wingrove P, Liaw W, Weiss J, Petterson S, Maier J, Bazemore A. Using Machine Learning to Predict Primary Care and Advance Workforce Research. Ann Fam Med 2020; 18:334-340. [PMID: 32661034 PMCID: PMC7358033 DOI: 10.1370/afm.2550] [Received: 02/22/2019] [Revised: 11/27/2019] [Accepted: 01/06/2020]
Abstract
PURPOSE To develop and test a machine-learning-based model to predict primary care and other specialties using Medicare claims data. METHODS We used 2014-2016 prescription and procedure Medicare data to train 3 sets of random forest classifiers (prescription only, procedure only, and combined) to predict specialty. Self-reported specialties were condensed to 27 categories. Physicians were assigned to testing and training cohorts, and random forest models were trained and then applied to 2014-2016 data sets for the testing cohort to generate a series of specialty predictions. Comparing the predicted specialty to self-report, we assessed performance with F1 scores and area under the receiver operating characteristic curve (AUROC) values. RESULTS A total of 564,986 physicians were included. The combined model had a greater aggregate (macro) F1 score (0.876) than the prescription-only (0.745; P <.01) or procedure-only (0.821; P <.01) model. Mean F1 scores across specialties in the combined model ranged from 0.533 to 0.987. The mean F1 score was 0.920 for primary care. The mean AUROC value for the combined model was 0.992, with values ranging from 0.982 to 0.999. The AUROC value for primary care was 0.982. CONCLUSIONS This novel approach showed high performance and provides a near real-time assessment of current primary care practice. These findings have important implications for primary care workforce research in the absence of accurate data.
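The headline comparison uses an aggregate (macro) F1 score across 27 specialty categories. As an illustration only, the sketch below computes per-class one-vs-rest F1 and its unweighted mean from predicted versus self-reported specialty labels; the label strings are hypothetical, and the macro average here runs over every label that appears in either list, which may differ from the authors' exact convention.

```python
def per_class_f1(y_true, y_pred):
    """One-vs-rest F1 = 2TP / (2TP + FP + FN) for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so small specialties count as much as large ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Because every class carries equal weight, a macro F1 can sit well below accuracy when rare specialties are misclassified, which is why per-specialty scores (here ranging from 0.533 to 0.987) are reported alongside it.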
Affiliation(s)
- Peter Wingrove
- University of Pittsburgh, School of Medicine, Pittsburgh, Pennsylvania
- Robert Graham Center, Washington, DC
- Winston Liaw
- Robert Graham Center, Washington, DC
- University of Houston, College of Medicine, Department of Health Systems and Population Health Sciences, Houston, Texas
- Jeremy Weiss
- Carnegie Mellon University, Pittsburgh, Pennsylvania
- John Maier
- University of Pittsburgh, Department of Biomedical Informatics, Pittsburgh, Pennsylvania
37
Establishment and evaluation of a multicenter collaborative prediction model construction framework supporting model generalization and continuous improvement: A pilot study. Int J Med Inform 2020; 141:104173. [PMID: 32531725 DOI: 10.1016/j.ijmedinf.2020.104173] [Received: 01/16/2020] [Revised: 04/10/2020] [Accepted: 05/09/2020]
Abstract
BACKGROUND AND OBJECTIVE In recent years, an increasing number of clinical prediction models have been developed to support clinical care. Building data-driven prediction models from large-scale electronic health record (EHR) data can provide a more empirical basis for clinical decision making. However, model generalization and continuous improvement have received insufficient research attention, which hinders the application and evaluation of prediction models in real clinical environments. This study therefore proposes a multicenter collaborative prediction model construction framework for building prediction models with greater generalizability and continuous improvement capabilities while preserving patient data security and privacy. MATERIALS AND METHODS The framework is built on a multicenter collaborative research network such as Observational Health Data Sciences and Informatics (OHDSI). Following the idea of multi-source transfer learning, a base classifier was trained in each source hospital according to the model research setting. Then, in the target hospital, which lacked calibration data, a prediction model was built by weighted integration of the base classifiers from the source hospitals based on the smoothness assumption. In addition, a passive-aggressive online learning algorithm was used to continuously improve the prediction model, helping it maintain high predictive performance and thus provide reliable support for clinical decision making. To evaluate the proposed framework, a prototype system for colorectal cancer prognosis prediction was developed. To evaluate model performance, 70,906 patients were screened, including 70,090 from 5 US hospital-specific datasets and 816 from a Chinese hospital-specific dataset.
The area under the receiver operating characteristic curve (AUC) and the estimated calibration index (ECI) were used to evaluate model discrimination and calibration. RESULTS For colorectal cancer prognosis prediction in the prototype system, compared with the reference models, our model achieved better calibration (ECI = 9.294 [9.146, 9.441]) and similar discrimination (AUC = 0.783 [0.780, 0.786]). Furthermore, the online learning process provided in this study can continuously improve the performance of the prediction model as labeled patient data arrive (after 650 labeled patient instances from the Chinese hospital arrived, the AUC increased from 0.709 to 0.715 and the ECI decreased from 13.013 to 9.634), allowing the prediction model to maintain good predictive performance during clinical application. CONCLUSIONS This study proposes and evaluates a multicenter collaborative prediction model construction framework that supports building prediction models with better generalizability and continuous improvement capabilities without the need to aggregate multicenter patient-level data.
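The abstract names a passive-aggressive online learning algorithm for updating the model as labeled patients arrive, without specifying the variant. As an assumption, the sketch below uses the standard PA-I update for a linear binary classifier with labels in {-1, +1}: on each new labeled case it computes the hinge loss, and if that loss is positive it moves the weights just enough to fix the error, capped by an aggressiveness parameter `C`. The weighted multi-source integration of base classifiers is not shown.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(w, x))


class OnlinePassiveAggressive:
    """Linear binary classifier updated one labeled example at a time (PA-I rule, assumed variant)."""

    def __init__(self, n_features, C=1.0):
        self.w = [0.0] * n_features
        self.C = C  # caps the step size; smaller C means more conservative updates

    def decision(self, x):
        return dot(self.w, x)

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def update(self, x, y):
        """Apply one PA-I step for label y in {-1, +1}; return the hinge loss before the update."""
        loss = max(0.0, 1.0 - y * self.decision(x))
        sq_norm = dot(x, x)
        if loss > 0 and sq_norm > 0:
            tau = min(self.C, loss / sq_norm)  # passive if loss is 0, aggressive otherwise
            self.w = [wi + tau * y * xi for wi, xi in zip(self.w, x)]
        return loss
```

Each arriving labeled case costs one `update` call, so the model can be refreshed continuously at the target hospital without retraining on, or pooling, historical patient-level data.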
38
Lu CC, Li JL, Wang YF, Ko BS, Tang JL, Lee CC. A BLSTM with Attention Network for Predicting Acute Myeloid Leukemia Patient's Prognosis using Comprehensive Clinical Parameters. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2019:2455-2458. [PMID: 31946395 DOI: 10.1109/embc.2019.8856524]
Abstract
Prognosis management is crucial for high-risk diseases such as acute myeloid leukemia (AML) in order to support clinical treatment decisions. However, accurate and consistent forecasting is challenging because of the high variability of disease outcomes and the complexity of the multiple clinical measurements available over the course of treatment. To capture the multi-dimensional and longitudinal aspects of these comprehensive clinical parameters, we use an attention-based bidirectional long short-term memory (Att-BLSTM) network to predict AML patients' survival and relapse. Specifically, we gathered 10 years' worth of real patient clinical data, including blood tests, medication, hematopoietic stem cell transplantation (HSCT) status, and gene mutation information. Our proposed Att-BLSTM framework achieves AUCs of 77.1% and 67.3% in predicting 2-year mortality and disease relapse, respectively, with these comprehensive clinical parameters, and further analysis shows that predictions over the next 0 to 3 months perform comparably, with AUCs of 74.8% and 67% for mortality and relapse, respectively.
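The attention component of an Att-BLSTM collapses the sequence of per-visit hidden states into one vector by weighting each time step. The sketch below shows a simplified form of that pooling step only: each hidden vector `h_t` (assumed to be the concatenated forward and backward LSTM states) is scored against a single hypothetical learned vector `v`, the scores are normalized with a softmax, and the weighted sum is returned. The BLSTM itself is not implemented, and the authors' network may use an additional tanh projection before scoring.

```python
import math


def softmax(scores):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, v):
    """Weight each time step's hidden vector by softmax(v . h_t) and sum into one context vector."""
    scores = [sum(a * b for a, b in zip(h, v)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights
```

The returned `weights` indicate which time points in the treatment course the model relied on, and the `context` vector would feed a final classifier for mortality or relapse.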
39
Franzo G, Corso B, Tucciarone CM, Drigo M, Caldin M, Cecchinato M. Comparison and validation of different models and variable selection methods for predicting survival after canine parvovirus infection. Vet Rec 2020; 187:e76. [PMID: 32169946 DOI: 10.1136/vr.105283] [Received: 11/22/2018] [Revised: 02/04/2020] [Accepted: 03/01/2020]
Abstract
BACKGROUND Canine parvovirus (CPV) infection is one of the major infections in dogs. While supportive therapy significantly reduces mortality, other approaches have been reported to provide significant benefits. Unfortunately, the high cost of these treatments is typically a limiting factor. Consequently, a reliable prognostic tool allowing an informed therapeutic approach would be of great interest. However, current methods are essentially based on 'a priori' selection of predictive variables, which could limit their predictive potential. METHODS In the present study, the performance of an operator-validated logistic regression in predicting survival after CPV enteritis was compared with that of more flexible methods featuring automatic variable selection. Several anamnestic, clinical, haematological and biochemical parameters were collected from 134 dogs at admission to a veterinary practice. Animal status was monitored until discharge or death (mortality = 21.6%). RESULTS The best automatic variable selection method (random forest) showed excellent discriminatory capability (AUC = 0.997, sensitivity = 0.941 and specificity = 1) compared with the logistic regression model (AUC = 0.831, sensitivity = 0.882 and specificity = 0.652) when evaluated on a fully independent test data set. These approaches identified antithrombin, serum aspartate aminotransferase, serum lipase, and monocyte and lymphocyte counts as the combination of clinical parameters with the highest predictive capability, thus limiting the panel of required tests. CONCLUSION The model validated in the present study allows prompt prediction of disease severity at admission and provides objective, reliable criteria to support the clinician in selecting a therapeutic approach.
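Sensitivity and specificity, as reported for the random forest and logistic regression models, come from a confusion matrix of predicted versus observed outcomes at a chosen probability cutoff. A small sketch follows; which outcome the authors treated as the positive class, and the 0.5 cutoff, are assumptions made here for illustration.

```python
def apply_threshold(probabilities, cutoff=0.5):
    """Turn predicted probabilities into 0/1 class predictions (cutoff is an assumed default)."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity), treating `positive` as the event class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

Unlike AUC, both values depend on the cutoff, so a model can trade sensitivity for specificity by moving `cutoff` without changing its ranking of patients.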
Affiliation(s)
- Giovanni Franzo
- Animal Medicine, Production and Health, Università degli Studi di Padova, Scuola di Agraria e Medicina Veterinaria, Legnaro, Padova, Italy
- Barbara Corso
- Biomedical Sciences, Neuroscience Institute, National Research Council, Padova, Italy
- Claudia Maria Tucciarone
- Animal Medicine, Production and Health, Università degli Studi di Padova, Scuola di Agraria e Medicina Veterinaria, Legnaro, Padova, Italy
- Michele Drigo
- Animal Medicine, Production and Health, Università degli Studi di Padova, Scuola di Agraria e Medicina Veterinaria, Legnaro, Padova, Italy
- Marco Caldin
- San Marco Private Veterinary Clinic, Veggiano, Padova, Italy
- Mattia Cecchinato
- Animal Medicine, Production and Health, Università degli Studi di Padova, Scuola di Agraria e Medicina Veterinaria, Legnaro, Padova, Italy
40
Hossain ME, Uddin S, Khan A, Moni MA. A Framework to Understand the Progression of Cardiovascular Disease for Type 2 Diabetes Mellitus Patients Using a Network Approach. Int J Environ Res Public Health 2020; 17:E596. [PMID: 31963383 PMCID: PMC7013570 DOI: 10.3390/ijerph17020596] [Received: 01/09/2020] [Accepted: 01/14/2020]
Abstract
The prevalence of chronic disease comorbidity has increased worldwide. Comorbidity (i.e., the presence of multiple chronic diseases) is associated with adverse health outcomes in terms of mobility and quality of life, as well as financial burden. Understanding the progression of comorbidities can provide valuable insights for the prevention and better management of chronic diseases. Administrative data can be used for this purpose because they contain semantic information on patients' health conditions. Most studies in this field focus on the progression of a single chronic disease rather than multiple diseases. This study aims to understand the progression of two chronic diseases in the Australian health context. It focuses specifically on the comorbidity progression of cardiovascular disease (CVD) in patients with type 2 diabetes mellitus (T2DM), as both diseases are highly prevalent in Australia. A research framework is proposed to understand and represent the progression of CVD in patients with T2DM using graph theory and social network analysis techniques. Two study cohorts (patients with both T2DM and CVD, and patients with T2DM only) were selected from an administrative dataset obtained from an Australian health insurance company. Two baseline disease networks were constructed from these cohorts, and a final disease network was then generated from them through normalized weight adjustment. The prevalence of renal failure, fluid and electrolyte disorders, hypertension and obesity was significantly higher in patients with both CVD and T2DM than in patients with T2DM only, indicating that these chronic diseases occurred frequently during the progression of CVD in patients with T2DM. The proposed network-based model may help healthcare providers understand high-risk diseases and the progression patterns between the recurrence of T2DM and CVD.
Also, the framework could be useful for stakeholders including governments and private health insurers to adopt appropriate preventive health management programs for patients at a high risk of developing multiple chronic diseases.
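The framework builds one baseline disease network per cohort and then derives a final network by normalized weight adjustment. The abstract does not give the exact formula, so the sketch below is a hypothetical reading: an edge weight is the fraction of patients in a cohort who carry both diagnoses, and the final network keeps only the excess weight of the T2DM-plus-CVD cohort over the T2DM-only cohort. Diagnosis labels in the usage are illustrative.

```python
from collections import Counter
from itertools import combinations


def cohort_network(patients):
    """Edge weight = fraction of patients in the cohort carrying both diagnoses (assumed normalization)."""
    pair_counts = Counter()
    for diagnoses in patients:
        for pair in combinations(sorted(diagnoses), 2):
            pair_counts[pair] += 1
    n = len(patients)
    return {pair: count / n for pair, count in pair_counts.items()}


def adjusted_network(case_patients, control_patients):
    """Keep edges that are more prevalent in the case cohort than in the control cohort."""
    case = cohort_network(case_patients)
    control = cohort_network(control_patients)
    adjusted = {}
    for pair, weight in case.items():
        excess = weight - control.get(pair, 0.0)
        if excess > 0:
            adjusted[pair] = excess
    return adjusted
```

Dividing by cohort size before subtracting keeps the two networks comparable even when one cohort is much larger, which is the main reason some normalization step is needed before the baseline networks can be combined.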
Affiliation(s)
- Md Ekramul Hossain
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
- Shahadat Uddin
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
- Arif Khan
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
- Mohammad Ali Moni
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
41
Mihaylov I, Kańduła M, Krachunov M, Vassilev D. A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models. Biol Direct 2019; 14:22. [PMID: 31752974 PMCID: PMC6868770 DOI: 10.1186/s13062-019-0249-6] [Received: 10/23/2018] [Accepted: 09/20/2019]
Abstract
Background High-throughput technologies have recently been used extensively alongside clinical tests to study various types of cancer. Data generated in such large-scale studies are heterogeneous, of different types and formats. Given the lack of effective integration strategies, novel models are needed for efficient and operative data integration, in which clinical and molecular information can be joined for storage, access and ease of use. Such models, combined with machine learning methods for accurate prediction of survival time in cancer studies, can yield novel insights into disease development and lead to precise personalized therapies. Results We developed an approach for intelligent data integration of two cancer datasets (breast cancer and neuroblastoma) provided in the CAMDA 2018 ‘Cancer Data Integration Challenge’ and compared models for prediction of survival time. We developed a novel semantic network-based data integration framework that uses NoSQL databases, in which we combined clinical and expression profile data using both raw data records and external knowledge sources. Using the integrated data, we introduced the Tumor Integrated Clinical Feature (TICF), a new feature for accurate prediction of patient survival time. Finally, we applied and validated several machine learning models for survival time prediction. Conclusion We developed a framework for semantic integration of clinical and omics data that can borrow information across multiple cancer studies. By linking data with external domain knowledge sources, our approach enriches the studied data through discovery of internal relations. The proposed and validated machine learning models for survival time prediction yielded accurate results. Reviewers This article was reviewed by Eran Elhaik, Wenzhong Xiao and Carlos Loucera.
Affiliation(s)
- Iliyan Mihaylov
- Faculty of Mathematics and Informatics, Sofia University, "St. Kliment Ohridski", 5 James Bourchier Blvd., Sofia, 1164, Bulgaria
- Maciej Kańduła
- Department of Biotechnology, Boku University, Vienna, 1180, Austria; Institute for Machine Learning, Johannes Kepler University, Linz, 4040, Austria
- Milko Krachunov
- Faculty of Mathematics and Informatics, Sofia University, "St. Kliment Ohridski", 5 James Bourchier Blvd., Sofia, 1164, Bulgaria
- Dimitar Vassilev
- Faculty of Mathematics and Informatics, Sofia University, "St. Kliment Ohridski", 5 James Bourchier Blvd., Sofia, 1164, Bulgaria
42
Bertsimas D, Dunn J, Pawlowski C, Silberholz J, Weinstein A, Zhuo YD, Chen E, Elfiky AA. Applied Informatics Decision Support Tool for Mortality Predictions in Patients With Cancer. JCO Clin Cancer Inform 2019; 2:1-11. [PMID: 30652575 DOI: 10.1200/cci.18.00003]
Abstract
PURPOSE With rapidly evolving treatment options in cancer, the complexity of clinical decision making for oncologists is a growing challenge, magnified by oncologists' reliance on intuition-based assessment of treatment risks and overall mortality. Given the unmet need for accurate prognostication with a meaningful clinical rationale, we developed a highly interpretable prediction tool to identify patients at high mortality risk before the start of treatment regimens. METHODS We obtained electronic health record data from 2004 to 2014 from a large national cancer center and extracted 401 predictors, including demographics, diagnosis, gene mutations, treatment history, comorbidities, resource utilization, vital signs, and laboratory test results. We built an actionable tool using recent developments in machine learning to predict 60-, 90-, and 180-day mortality from the start of an anticancer regimen. The model was validated against benchmark models on unseen data. RESULTS We identified 23,983 patients who initiated 46,646 anticancer treatment lines, with a median survival of 514 days. Our prediction models achieved significantly higher estimation quality on unseen data (area under the curve, 0.83 to 0.86) than the benchmark models. We identified key predictors of mortality, such as change in weight and albumin levels. The results are presented in an interactive and interpretable tool (www.oncomortality.com). CONCLUSION Our fully transparent prediction model was able to distinguish with high precision between the highest- and lowest-risk patients. Given the rich data available in electronic health records and advances in machine learning methods, this tool could have significant implications for value-based shared decision making at the point of care and for personalized goals-of-care management, helping to catalyze practice reforms.
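Predicting 60-, 90-, and 180-day mortality from the start of a regimen requires a binary label per horizon. The sketch below shows one common way to build such labels from dates; the field layout and the choice to mark patients lost to follow-up before a horizon as unknown (`None`) rather than alive are assumptions for illustration, not the authors' documented procedure.

```python
from datetime import date


def mortality_labels(regimen_start, last_contact, died, horizons=(60, 90, 180)):
    """Death-within-horizon labels; `last_contact` is the death date if died, else last follow-up."""
    days = (last_contact - regimen_start).days
    labels = {}
    for h in horizons:
        if died and days <= h:
            labels[h] = 1      # death observed within h days of regimen start
        elif days >= h:
            labels[h] = 0      # known to be alive at day h
        else:
            labels[h] = None   # follow-up ended before day h: outcome unknown
    return labels


if __name__ == "__main__":
    # Hypothetical patient who died 73 days after starting a regimen.
    print(mortality_labels(date(2010, 1, 1), date(2010, 3, 15), died=True))
```

Treating censored cases as unknown, rather than silently labeling them as survivors, avoids biasing the longer horizons toward lower apparent mortality.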
Affiliation(s)
- Dimitris Bertsimas, Jack Dunn, Colin Pawlowski, John Silberholz, Alexander Weinstein, Ying Daisy Zhuo, Eddy Chen, Aymen A Elfiky
- Dimitris Bertsimas, Jack Dunn, Colin Pawlowski, John Silberholz, Alexander Weinstein, and Ying Daisy Zhuo, Massachusetts Institute of Technology, Cambridge; Eddy Chen, Massachusetts General Hospital Cancer Center; Harvard Medical School; Aymen A. Elfiky, Dana-Farber Cancer Institute; Brigham and Women's Hospital; Harvard Medical School, Boston, MA
43
Wang L, Sha L, Lakin JR, Bynum J, Bates DW, Hong P, Zhou L. Development and Validation of a Deep Learning Algorithm for Mortality Prediction in Selecting Patients With Dementia for Earlier Palliative Care Interventions. JAMA Netw Open 2019; 2:e196972. [PMID: 31298717 PMCID: PMC6628612 DOI: 10.1001/jamanetworkopen.2019.6972] [Received: 02/18/2019] [Accepted: 05/20/2019]
Abstract
Importance Early palliative care interventions drive high-value care but currently are underused. Health care professionals face challenges in identifying patients who may benefit from palliative care. Objective To develop a deep learning algorithm using longitudinal electronic health records to predict mortality risk as a proxy indicator for identifying patients with dementia who may benefit from palliative care. Design, Setting, and Participants In this retrospective cohort study, 6-month, 1-year, and 2-year mortality prediction models with recurrent neural networks used patient demographic information and topics generated from clinical notes within Partners HealthCare System, an integrated health care delivery system in Boston, Massachusetts. This study included 26 921 adult patients with dementia who visited the health care system from January 1, 2011, through December 31, 2017. The models were trained using a data set of 24 229 patients and validated using another data set of 2692 patients. Data were analyzed from September 18, 2018, to May 15, 2019. Main Outcomes and Measures The area under the receiver operating characteristic curve (AUC) for 6-month and 1- and 2-year mortality prediction models and the factors contributing to the predictions. Results The study cohort included 26 921 patients (16 263 women [60.4%]; mean [SD] age, 74.6 [13.5] years). For the 24 229 patients in the training data set, mean (SD) age was 74.8 (13.2) years and 14 632 (60.4%) were women. For the 2692 patients in the validation data set, mean (SD) age was 75.0 (12.6) years and 1631 (60.6%) were women. The 6-month model reached an AUC of 0.978 (95% CI, 0.977-0.978); the 1-year model, 0.956 (95% CI, 0.955-0.956); and the 2-year model, 0.943 (95% CI, 0.942-0.944). 
The top-ranked latent topics associated with 6-month and 1- and 2-year mortality in patients with dementia include palliative and end-of-life care, cognitive function, delirium, testing of cholesterol levels, cancer, pain, use of health care services, arthritis, nutritional status, skin care, family meeting, shock, respiratory failure, and swallowing function. Conclusions and Relevance A deep learning algorithm based on patient demographic information and longitudinal clinical notes appeared to show promising results in predicting mortality among patients with dementia in different time frames. Further research is necessary to determine the feasibility of applying this algorithm in clinical settings for identifying unmet palliative care needs earlier.
Affiliation(s)
- Liqin Wang
- Harvard Medical School, Boston, Massachusetts
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts
- Long Sha
- Michtom School of Computer Science, Brandeis University, Waltham, Massachusetts
- Joshua R. Lakin
- Harvard Medical School, Boston, Massachusetts
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts
- Division of Palliative Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Julie Bynum
- Division of Geriatrics and Palliative Care, Department of Medicine, University of Michigan School of Medicine, Ann Arbor
- David W. Bates
- Harvard Medical School, Boston, Massachusetts
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts
- Pengyu Hong
- Michtom School of Computer Science, Brandeis University, Waltham, Massachusetts
- Li Zhou
- Harvard Medical School, Boston, Massachusetts
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts
44
Abstract
Applying machine learning models to the prediction and prognosis of disease development has become an integral part of cancer studies aimed at improving subsequent therapy and patient management. The main objective of the presented study is the application of machine learning models for accurate prediction of survival time in breast cancer on the basis of clinical data. The paper discusses an approach in which the main predictor of survival time is the originally developed tumor-integrated clinical feature, which combines tumor stage, tumor size, and age at diagnosis. Two datasets from corresponding breast cancer studies are united through a data integration approach based on horizontal and vertical integration, using document-oriented and graph databases that performed well with no data loss. In addition to data normalization and classification, the applied machine learning methods provide promising accuracy in survival time prediction. The experiments show an advantage for linear Support Vector Regression, Lasso regression, Kernel Ridge regression, k-nearest neighbors regression, and Decision Tree regression: these models achieve the most accurate survival prognosis results. Cross-validation of accuracy confirms the best performance of the same models on the studied breast cancer data. To support the proposed approach, a Python-based workflow has been developed, and plans for its further improvement are discussed at the end of the paper.
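Horizontal integration stacks records of the same kind (patients) from different studies, while vertical integration joins different kinds of data (for example, clinical and molecular) for the same patient. The paper uses document-oriented and graph databases for this; the sketch below only illustrates the two operations with plain dictionaries, and the field names (`patient_id`, `age`, `stage`, the gene column) are hypothetical.

```python
def horizontal_integrate(studies, fields):
    """Stack records from several studies of the same entity type, keeping a shared set of fields."""
    merged = []
    for study_name, records in studies:
        for record in records:
            row = {f: record.get(f) for f in fields}  # fields missing in a study become None
            row["source_study"] = study_name          # keep provenance of each row
            merged.append(row)
    return merged


def vertical_integrate(clinical, molecular, key="patient_id"):
    """Attach each patient's molecular record (if any) to the clinical record sharing `key`."""
    molecular_by_id = {m[key]: m for m in molecular}
    return [{**c, **molecular_by_id.get(c[key], {})} for c in clinical]
```

Keeping a `source_study` column after horizontal stacking makes it possible to check later whether a survival model is simply learning differences between the two source studies.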
45
Bartholomai JA, Frieboes HB. Lung Cancer Survival Prediction via Machine Learning Regression, Classification, and Statistical Techniques. PROCEEDINGS OF THE ... IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY. IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY 2019; 2018:632-637. [PMID: 31312809 DOI: 10.1109/isspit.2018.8642753] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/10/2022]
Abstract
A regression model is developed to predict survival time in months for lung cancer patients. It was previously shown that predictive models perform accurately for short survival times of less than 6 months; however, model accuracy decreases when predicting longer survival times. This study employs an approach in which regression models are used in combination with a classification model to predict survival time. A set of de-identified lung cancer patient data was obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The models use a subset of factors selected by ANOVA. Model accuracy is measured by a confusion matrix for classification and by root mean square error (RMSE) for regression. Random Forests (RF) are used for classification, while general linear regression, Gradient Boosted Machines (GBM), and RF are used for regression. The regression results show that RF performed best for survival times ≤6 and >24 months (RMSE 10.52 and 20.51, respectively), while GBM performed best for 7-24 months (RMSE 15.65). Comparison plots further indicate that the regression models perform better for shorter survival times than the RMSE values alone reflect.
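The lung cancer study above first classifies each patient and then applies a regression model suited to the predicted survival range, scoring the regressors by RMSE. A minimal sketch of that routing, assuming the classifier and the per-range regressors are already-trained callables, is shown below. The range labels reuse the abstract's ≤6, 7-24, and >24 month bins; the function names, and the choice to report RMSE grouped by the *observed* survival range, are assumptions for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def survival_bin(months: float) -> str:
    """Map a survival time (whole months) to the abstract's three ranges."""
    if months <= 6:
        return "<=6"
    if months <= 24:
        return "7-24"
    return ">24"


def two_stage_predict(x: Features,
                      classify: Callable[[Features], str],
                      regressors: Dict[str, Callable[[Features], float]]) -> float:
    """Stage 1 picks a survival range; stage 2 applies that range's regressor."""
    return regressors[classify(x)](x)


def rmse_by_bin(xs: List[Features], y_true: List[float],
                classify: Callable[[Features], str],
                regressors: Dict[str, Callable[[Features], float]]) -> Dict[str, float]:
    """RMSE of the two-stage predictions, grouped by observed survival range."""
    groups: Dict[str, Tuple[List[float], List[float]]] = {}
    for x, y in zip(xs, y_true):
        pred = two_stage_predict(x, classify, regressors)
        observed, predicted = groups.setdefault(survival_bin(y), ([], []))
        observed.append(y)
        predicted.append(pred)
    return {label: rmse(t, p) for label, (t, p) in groups.items()}
```

In practice `classify` would wrap a trained Random Forest classifier and each entry in `regressors` a model fitted only on patients from that range; constant-valued lambdas are enough to exercise the routing.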
46
Nguyen D, Luo W, Phung D, Venkatesh S. LTARM: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.07.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022]
47
Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med 2018; 284:603-619. [PMID: 30102808 DOI: 10.1111/joim.12822] [Citation(s) in RCA: 461] [Impact Index Per Article: 65.9] [Indexed: 12/15/2022]
Abstract
Machine learning (ML) is a burgeoning field in medicine, with substantial resources being devoted to applying computer science and statistics to medical problems. Proponents of ML extol its ability to handle the large, complex, and disparate data often found in medicine, and see ML as the future of biomedical research, personalized medicine, and computer-aided diagnosis, with the potential to significantly advance global health care. However, the concepts of ML are unfamiliar to many medical professionals, and there is untapped potential in the use of ML as a research tool. In this article, we provide an overview of the theory behind ML, explore the common ML algorithms used in medicine, including their pitfalls, and discuss the potential future of ML in medicine.
Affiliation(s)
- H K Kok
- Interventional Radiology Service, Northern Hospital Radiology, Epping, Vic, Australia
- R V Chandra
- Interventional Neuroradiology Service, Monash Imaging, Monash Health, Clayton, Vic, Australia; Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Vic, Australia
- A H Razavi
- School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada; BCE Corporate Security, Ottawa, ON, Canada
- M J Lee
- Department of Radiology, Beaumont Hospital and Royal College of Surgeons in Ireland, Dublin, Ireland
- H Asadi
- Interventional Neuroradiology Service, Monash Imaging, Monash Health, Clayton, Vic, Australia; Department of Radiology, Interventional Neuroradiology Service, Austin Health, Heidelberg, Vic, Australia; School of Medicine, Faculty of Health, Deakin University, Waurn Ponds, Vic, Australia
48
|
Resteghini C, Trama A, Borgonovi E, Hosni H, Corrao G, Orlandi E, Calareso G, De Cecco L, Piazza C, Mainardi L, Licitra L. Big Data in Head and Neck Cancer. Curr Treat Options Oncol 2018; 19:62. [DOI: 10.1007/s11864-018-0585-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 12/20/2022]
49
Ahn E, Kim J, Rahman K, Baldacchino T, Baird C. Development of a risk predictive scoring system to identify patients at risk of representation to emergency department: a retrospective population-based analysis in Australia. BMJ Open 2018; 8:e021323. [PMID: 30287606 PMCID: PMC6173240 DOI: 10.1136/bmjopen-2017-021323] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/10/2023]
Abstract
OBJECTIVE To examine the characteristics of frequent visitors (FVs) to emergency departments (EDs) and develop a predictive model to identify those at high risk of future representation to the ED among the younger and general population (aged ≤70 years). DESIGN AND SETTING A retrospective analysis of ED data for younger and general patients (aged ≤70 years), collected between 1 January 2009 and 30 June 2016 at a public hospital in Australia. PARTICIPANTS A total of 343 014 ED presentations were identified from 170 134 individual patients. MAIN OUTCOME MEASURES Proportion of FVs (those attending four or more times annually), demographic characteristics (age, sex, indigenous and marital status), mode of separation (eg, admitted to ward), triage categories, time of arrival to ED, referral on departure and clinical conditions. Statistical estimates from a mixed-effects model were used to develop a risk predictive scoring system. RESULTS FVs were predominantly young adult (32.53%) to late-middle-aged (26.07%) patients, with a higher proportion of indigenous (5.7%) and mental health-related presentations (10.92%). They were also more likely to arrive by ambulance (36.95%) and to leave at their own risk without completing treatment (9.8%). They were also strongly associated with socially disadvantaged groups, such as people who were divorced, widowed or separated (12.81%). These findings were then used to develop a predictive model to identify potential FVs. The derived risk predictive model performed favourably, with an area under the receiver operating characteristic curve (ie, C-statistic) of 65.7%. CONCLUSION Developing a demographic and clinical profile of FVs, coupled with a predictive model, can highlight gaps in interventions and identify new opportunities for better health outcomes and planning.
Affiliation(s)
- Euijoon Ahn
- School of Information Technologies, University of Sydney, Sydney, New South Wales, Australia
- Jinman Kim
- School of Information Technologies, University of Sydney, Sydney, New South Wales, Australia
- Nepean Telehealth Technology Centre, Nepean Hospital, Penrith, New South Wales, Australia
- Khairunnessa Rahman
- Integrated Care Initiative, Nepean Hospital, Penrith, New South Wales, Australia
- Tanya Baldacchino
- Nepean Telehealth Technology Centre, Nepean Hospital, Penrith, New South Wales, Australia
- Christine Baird
- Integrated Care Initiative, Nepean Hospital, Penrith, New South Wales, Australia
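The ED study above reports discrimination as the area under the ROC curve (C-statistic) of 65.7%. For a binary outcome, the C-statistic equals the fraction of (event, non-event) patient pairs in which the patient who had the event received the higher risk score, with tied scores counted as one half. A minimal pairwise sketch of that definition follows; the function name `c_statistic` is hypothetical, and the quadratic pair loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Concordance (ROC AUC) for a binary outcome by direct pair counting.

    For each (positive, negative) pair: add 1 if the positive case scored
    higher, 0.5 for a tie, 0 otherwise; return the mean over all pairs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))
```

Read this way, a C-statistic of 65.7% means the model ranked a frequent visitor above a non-frequent visitor in roughly 66% of such pairs, where 50% would be chance.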
50
Madan C, Chopra KK, Satyanarayana S, Surie D, Chadha V, Sachdeva KS, Khanna A, Deshmukh R, Dutta L, Namdeo A, Shukla A, Sagili K, Chauhan LS. Developing a model to predict unfavourable treatment outcomes in patients with tuberculosis and human immunodeficiency virus co-infection in Delhi, India. PLoS One 2018; 13:e0204982. [PMID: 30281679 PMCID: PMC6169917 DOI: 10.1371/journal.pone.0204982] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/25/2018] [Accepted: 09/18/2018] [Indexed: 11/29/2022]
Abstract
BACKGROUND Tuberculosis (TB) patients with human immunodeficiency virus (HIV) co-infection have worse TB treatment outcomes than patients with TB alone. The distribution of unfavourable treatment outcomes differs by socio-demographic and clinical characteristics, allowing for early identification of patients at risk. OBJECTIVE To develop a statistical model that can provide individual probabilities of unfavourable outcomes based on demographic and clinical characteristics of TB-HIV co-infected patients. METHODOLOGY We used data from all TB patients with known HIV-positive test results (aged ≥15 years) registered for first-line anti-TB treatment (ATT) in 2015 under the Revised National TB Control Programme (RNTCP) in Delhi, India. We included variables on demographics and pre-treatment clinical characteristics routinely recorded and reported to RNTCP and the National AIDS Control Organization. Binomial logistic regression was used to develop a statistical model to estimate probabilities of unfavourable TB treatment outcomes (i.e., death, loss to follow-up, treatment failure, transfer out of program, and a switch to a drug-resistant regimen). RESULTS Of 55,260 TB patients registered for ATT in 2015 in Delhi, 928 (2%) had known HIV-positive test results. Of these, 816 (88%) had drug-sensitive TB and were aged ≥15 years. Among the 816 TB-HIV patients included, 157 (19%) had unfavourable TB treatment outcomes. We developed a model for predicting unfavourable outcomes using age, sex, disease classification (pulmonary versus extra-pulmonary), TB treatment category (new or previously treated case), sputum smear grade, known HIV status at TB diagnosis, antiretroviral treatment at TB diagnosis, and CD4 cell count at ATT initiation. The Hosmer-Lemeshow chi-square p-value for model calibration was 0.15. Model discrimination, measured as the area under the receiver operating characteristic (ROC) curve, was 0.78.
CONCLUSION The model had good internal validity, but should be validated with an independent cohort of TB-HIV co-infected patients to assess its performance before clinical or programmatic use.
Affiliation(s)
- Diya Surie
- Division of Global HIV and TB, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Vineet Chadha
- National Tuberculosis Institute, Bangalore, Karnataka, India
- Lopamudra Dutta
- The United Nations Children's Fund (UNICEF), New Delhi, India
- Amit Namdeo
- The United Nations Children's Fund (UNICEF), New Delhi, India
- Ajay Shukla
- Uttar Pradesh State AIDS Control Society, Lucknow, Uttar Pradesh, India
- Karuna Sagili
- International Union Against Tuberculosis and Lung Disease, New Delhi, India
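The TB-HIV model above reports calibration with the Hosmer-Lemeshow test (p = 0.15). A common formulation sorts patients by predicted probability, splits them into g groups (often deciles), and computes H = Σ_g (O_g − E_g)² / [E_g(1 − E_g/n_g)], where O_g is the observed event count, E_g the sum of predicted probabilities, and n_g the group size; H is referred to a chi-square distribution with g − 2 degrees of freedom. The sketch below assumes nearly equal-size groups and an even number of degrees of freedom, which allows the closed-form chi-square tail P(X > x) = e^(−x/2) Σ_{i<df/2} (x/2)^i / i! without a statistics library. These simplifications and the function names are assumptions, not necessarily how the authors computed their p-value.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > x) for a positive even df,
    using exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> Tuple[float, int, float]:
    """Return (H statistic, degrees of freedom, p-value).

    Patients are sorted by predicted probability and split into `groups`
    nearly equal-size groups (deciles when groups=10).
    """
    if len(probs) != len(outcomes) or len(probs) < groups:
        raise ValueError("need matching inputs with at least one patient per group")
    ranked = sorted(zip(probs, outcomes))
    n = len(ranked)
    h = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        variance = expected * (1.0 - expected / n_g)
        if variance <= 0:
            raise ValueError("predicted probabilities must lie strictly between 0 and 1")
        h += (observed - expected) ** 2 / variance
    df = groups - 2
    return h, df, chi2_sf_even_df(h, df)
```

A large p-value, like the reported 0.15, means the observed and expected counts per group do not differ more than chance would suggest; it does not by itself establish good calibration in an external cohort, which is why the abstract calls for independent validation.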