1
Lin Q, Guan Q, Chen D, Li L, Lin Y. Peritoneal cytology predicting distant metastasis in uterine carcinosarcoma: machine learning model development and validation. World J Surg Oncol 2025; 23:167. [PMID: 40287676 PMCID: PMC12034135 DOI: 10.1186/s12957-025-03771-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2024] [Accepted: 03/23/2025] [Indexed: 04/29/2025] Open
Abstract
OBJECTIVE This study develops and validates a machine learning model using peritoneal cytology to predict distant metastasis in uterine carcinosarcoma, aiding clinical decision-making. METHODS This study utilized detailed clinical data and peritoneal cytology findings from uterine carcinosarcoma patients in the SEER database. Eight machine learning algorithms (Logistic Regression, SVM, GBM, Neural Network, Random Forest, KNN, AdaBoost, and LightGBM) were applied to predict distant metastasis. Model performance was assessed using AUC, calibration curves, DCA, confusion matrices, sensitivity, and specificity. The Logistic Regression model was visualized with a nomogram, and its results were analyzed. SHAP values were used to interpret the best-performing machine learning model. RESULTS Peritoneal cytology, T stage, age, and tumor size were key factors influencing distant metastasis in uterine carcinosarcoma patients. Peritoneal cytology had significant weight in the prediction models. The logistic regression model demonstrated excellent predictive performance with an AUC of 0.882 in the training set and 0.881 in the internal test set. The model was visualized and interpreted using a nomogram. In comprehensive evaluations, GBM was identified as the best-performing model and was explained using SHAP values. Additionally, calibration and DCA curves indicated that both models have significant potential clinical utility. CONCLUSION This study introduces the first effective tool for predicting distant metastasis in uterine carcinosarcoma patients by integrating peritoneal cytology features into model construction. It aids in early identification of high-risk patients, enhancing follow-up and monitoring during tumor development, and supports the optimization of personalized treatment strategies.
Affiliation(s)
- Qiaoming Lin
  - Department of Gynecology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 420 Fuma Road, Fuzhou, Fujian, 350014, China
  - Fujian Medical University, Fuzhou, Fujian, 350122, China
- Qi Guan
  - Department of Gynecology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 420 Fuma Road, Fuzhou, Fujian, 350014, China
  - Fujian Medical University, Fuzhou, Fujian, 350122, China
- Danru Chen
  - Department of Gynecology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 420 Fuma Road, Fuzhou, Fujian, 350014, China
  - Fujian Medical University, Fuzhou, Fujian, 350122, China
- Lilan Li
  - Department of Gynecology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 420 Fuma Road, Fuzhou, Fujian, 350014, China
  - Fujian Medical University, Fuzhou, Fujian, 350122, China
- Yibin Lin
  - Department of Gynecology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 420 Fuma Road, Fuzhou, Fujian, 350014, China
  - Fujian Medical University, Fuzhou, Fujian, 350122, China
2
Schwabe D, Becker K, Seyferth M, Klaß A, Schaeffter T. The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review. NPJ Digit Med 2024; 7:203. [PMID: 39097662 PMCID: PMC11297942 DOI: 10.1038/s41746-024-01196-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 07/12/2024] [Indexed: 08/05/2024] Open
Abstract
The adoption of machine learning (ML) and, more specifically, deep learning (DL) applications into all major areas of our lives is underway. The development of trustworthy AI is especially important in medicine due to the large implications for patients' lives. While trustworthiness concerns various aspects including ethical, transparency and safety requirements, we focus on the importance of data quality (training/test) in DL. Since data quality dictates the behaviour of ML products, evaluating data quality will play a key part in the regulatory approval of medical ML products. We perform a systematic review following PRISMA guidelines using the databases Web of Science, PubMed and ACM Digital Library. We identify 5408 studies, out of which 120 records fulfil our eligibility criteria. From this literature, we synthesise the existing knowledge on data quality frameworks and combine it with the perspective of ML applications in medicine. As a result, we propose the METRIC-framework, a specialised data quality framework for medical training data comprising 15 awareness dimensions, along which developers of medical ML applications should investigate the content of a dataset. This knowledge helps to reduce biases as a major source of unfairness, increase robustness, facilitate interpretability and thus lays the foundation for trustworthy AI in medicine. The METRIC-framework may serve as a base for systematically assessing training datasets, establishing reference datasets, and designing test datasets which has the potential to accelerate the approval of medical ML products.
Affiliation(s)
- Daniel Schwabe
  - Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
- Katinka Becker
  - Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
- Martin Seyferth
  - Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
- Andreas Klaß
  - Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
- Tobias Schaeffter
  - Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
  - Department of Medical Engineering, Technical University Berlin, Berlin, Germany
  - Einstein Centre for Digital Future, Berlin, Germany
3
Field M, Vinod S, Delaney GP, Aherne N, Bailey M, Carolan M, Dekker A, Greenham S, Hau E, Lehmann J, Ludbrook J, Miller A, Rezo A, Selvaraj J, Sykes J, Thwaites D, Holloway L. Federated Learning Survival Model and Potential Radiotherapy Decision Support Impact Assessment for Non-small Cell Lung Cancer Using Real-World Data. Clin Oncol (R Coll Radiol) 2024; 36:e197-e208. [PMID: 38631978 DOI: 10.1016/j.clon.2024.03.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 02/07/2024] [Accepted: 03/11/2024] [Indexed: 04/19/2024]
Abstract
AIMS The objective of this study was to develop a two-year overall survival model for inoperable stage I-III non-small cell lung cancer (NSCLC) patients using routine radiation oncology data over a federated (distributed) learning network and evaluate the potential of decision support for curative versus palliative radiotherapy. METHODS A federated infrastructure of data extraction, de-identification, standardisation, image analysis, and modelling was installed for seven clinics to obtain clinical and imaging features and survival information for patients treated in 2011-2019. A logistic regression model was trained for the 2011-2016 curative patient cohort and validated for the 2017-2019 cohort. Features were selected with univariate and model-based analysis and optimised using bootstrapping. System performance was assessed by the receiver operating characteristic (ROC) and corresponding area under curve (AUC), C-index, calibration metrics and Kaplan-Meier survival curves, with risk groups defined by model probability quartiles. Decision support was evaluated using a case-control analysis using propensity matching between treatment groups. RESULTS 1655 patient datasets were included. The overall model AUC was 0.68. Fifty-eight percent of patients treated with palliative radiotherapy had a low-to-moderate risk prediction according to the model, with survival times not significantly different (p = 0.87 and 0.061) from patients treated with curative radiotherapy classified as high-risk by the model. When survival was simulated by risk group and model-indicated treatment, there was an estimated 11% increase in survival rate at two years (p < 0.01). CONCLUSION Federated learning over multiple institution data can be used to develop and validate decision support systems for lung cancer while quantifying the potential impact of their use in practice. 
This paves the way for personalised medicine, where decisions can be based more closely on individual patient details from routine care.
Affiliation(s)
- M Field
  - South Western Sydney Clinical Campus, School of Clinical Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, New South Wales, Australia
- S Vinod
  - South Western Sydney Clinical Campus, School of Clinical Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, New South Wales, Australia
- G P Delaney
  - South Western Sydney Clinical Campus, School of Clinical Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, New South Wales, Australia
- N Aherne
  - Mid North Coast Cancer Institute, Coffs Harbour, New South Wales, Australia; Rural Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- M Bailey
  - Illawarra Cancer Care Centre, Wollongong, New South Wales, Australia
- M Carolan
  - Illawarra Cancer Care Centre, Wollongong, New South Wales, Australia
- A Dekker
  - Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, The Netherlands
- S Greenham
  - Mid North Coast Cancer Institute, Coffs Harbour, New South Wales, Australia
- E Hau
  - Sydney West Radiation Oncology Network, Sydney, Australia; Westmead Clinical School, University of Sydney, Sydney, New South Wales, Australia
- J Lehmann
  - School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, New South Wales, Australia; Department of Radiation Oncology, Calvary Mater, Newcastle, New South Wales, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
- J Ludbrook
  - Department of Radiation Oncology, Calvary Mater, Newcastle, New South Wales, Australia
- A Miller
  - Illawarra Cancer Care Centre, Wollongong, New South Wales, Australia
- A Rezo
  - Canberra Health Services, Canberra, Australian Capital Territory, Australia
- J Selvaraj
  - South Western Sydney Clinical Campus, School of Clinical Medicine, UNSW, Sydney, New South Wales, Australia; Canberra Health Services, Canberra, Australian Capital Territory, Australia
- J Sykes
  - Sydney West Radiation Oncology Network, Sydney, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
- D Thwaites
  - Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia; Radiotherapy Research Group, Leeds Institute for Medical Research, St James's Hospital and the University of Leeds, Leeds, UK
- L Holloway
  - South Western Sydney Clinical Campus, School of Clinical Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, New South Wales, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
4
Timilsina M, Tandan M, Nováček V. Machine learning approaches for predicting the onset time of the adverse drug events in oncology. MACHINE LEARNING WITH APPLICATIONS 2022. [DOI: 10.1016/j.mlwa.2022.100367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
5
Field M, Thwaites DI, Carolan M, Delaney GP, Lehmann J, Sykes J, Vinod S, Holloway L. Infrastructure platform for privacy-preserving distributed machine learning development of computer-assisted theragnostics in cancer. J Biomed Inform 2022; 134:104181. [PMID: 36055639 DOI: 10.1016/j.jbi.2022.104181] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/13/2022] [Revised: 04/29/2022] [Accepted: 08/20/2022] [Indexed: 11/26/2022]
Abstract
INTRODUCTION Emerging evidence suggests that data-driven support tools have found their way into clinical decision-making in a number of areas, including cancer care. Improving them and widening their scope of availability in various differing clinical scenarios, including for prognostic models derived from retrospective data, requires co-ordinated data sharing between clinical centres, secondary analyses of large multi-institutional clinical trial data, or distributed (federated) learning infrastructures. A systematic approach to utilizing routinely collected data across cancer care clinics remains a significant challenge due to privacy, administrative and political barriers. METHODS An information technology infrastructure and web service software was developed and implemented which uses machine learning to construct clinical decision support systems in a privacy-preserving manner across datasets geographically distributed in different hospitals. The infrastructure was deployed in a network of Australian hospitals. A harmonized, international ontology-linked, set of lung cancer databases were built with the routine clinical and imaging data at each centre. The infrastructure was demonstrated with the development of logistic regression models to predict major cardiovascular events following radiation therapy. RESULTS The infrastructure implemented forms the basis of the Australian computer-assisted theragnostics (AusCAT) network for radiation oncology data extraction, reporting and distributed learning. Four radiation oncology departments (across seven hospitals) in New South Wales (NSW) participated in this demonstration study. Infrastructure was deployed at each centre and used to develop a model predicting for cardiovascular admission within a year of receiving curative radiotherapy for non-small cell lung cancer. A total of 10417 lung cancer patients were identified with 802 being eligible for the model. 
Twenty features were chosen for analysis from the clinical record and linked registries. After selection, 8 features were included and a logistic regression model achieved an area under the receiver operating characteristic (AUROC) curve of 0.70 and C-index of 0.65 on out-of-sample data. CONCLUSION The infrastructure developed was demonstrated to be usable in practice between clinical centres to harmonize routinely collected oncology data and develop models with federated learning. It provides a promising approach to enable further research studies in radiation oncology using real world clinical data.
Affiliation(s)
- Matthew Field
  - South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, NSW, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia; Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
- David I Thwaites
  - Institute of Medical Physics, School of Physics, University of Sydney, NSW, Australia
- Martin Carolan
  - Illawarra Cancer Care Centre, Wollongong, NSW, Australia
- Geoff P Delaney
  - South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, NSW, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia; Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
- Joerg Lehmann
  - Institute of Medical Physics, School of Physics, University of Sydney, NSW, Australia; Department of Radiation Oncology, Calvary Mater Newcastle, NSW, Australia
- Jonathan Sykes
  - Institute of Medical Physics, School of Physics, University of Sydney, NSW, Australia; Blacktown Haematology and Oncology Cancer Care Centre, Blacktown Hospital, Blacktown, NSW, Australia; Crown Princess Mary Cancer Centre, Westmead Hospital, Westmead, NSW, Australia
- Shalini Vinod
  - South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, NSW, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia; Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
- Lois Holloway
  - South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, NSW, Australia; South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia; Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia; Institute of Medical Physics, School of Physics, University of Sydney, NSW, Australia
6
Mohamed SK, Walsh B, Timilsina M, Torrente M, Franco F, Provencio M, Janik A, Costabello L, Minervini P, Stenetorp P, Nováček V. On Predicting Recurrence in Early Stage Non-small Cell Lung Cancer. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2021:853-862. [PMID: 35308971 PMCID: PMC8861763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Early detection and mitigation of disease recurrence in non-small cell lung cancer (NSCLC) patients is a nontrivial problem that is typically addressed either by rather generic follow-up screening guidelines, self-reporting, simple nomograms, or by models that predict relapse risk in individual patients using statistical analysis of retrospective data. We posit that machine learning models trained on patient data can provide an alternative approach that allows for more efficient development of many complementary models at once, superior accuracy, less dependency on the data collection protocols and increased support for explainability of the predictions. In this preliminary study, we describe an experimental suite of various machine learning models applied on a patient cohort of 2442 early stage NSCLC patients. We discuss the promising results achieved, as well as the lessons we learned while developing this baseline for further, more advanced studies in this area.
Affiliation(s)
- Sameh K Mohamed
  - Data Science Institute, NUI Galway, Galway, Ireland
  - Insight Centre for Data Analytics, NUI Galway, Galway, Ireland
- Brian Walsh
  - Data Science Institute, NUI Galway, Galway, Ireland
  - Insight Centre for Data Analytics, NUI Galway, Galway, Ireland
- Mohan Timilsina
  - Data Science Institute, NUI Galway, Galway, Ireland
  - Insight Centre for Data Analytics, NUI Galway, Galway, Ireland
- Maria Torrente
  - Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain
- Fabio Franco
  - Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain
- Mariano Provencio
  - Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain
- Vít Nováček
  - Data Science Institute, NUI Galway, Galway, Ireland
  - Faculty of Informatics, Masaryk University, Brno, Czech Republic
7
Field M, Vinod S, Aherne N, Carolan M, Dekker A, Delaney G, Greenham S, Hau E, Lehmann J, Ludbrook J, Miller A, Rezo A, Selvaraj J, Sykes J, Holloway L, Thwaites D. Implementation of the Australian Computer-Assisted Theragnostics (AusCAT) network for radiation oncology data extraction, reporting and distributed learning. J Med Imaging Radiat Oncol 2021; 65:627-636. [PMID: 34331748 DOI: 10.1111/1754-9485.13287] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/09/2021] [Accepted: 06/29/2021] [Indexed: 12/28/2022]
Abstract
INTRODUCTION There is significant potential to analyse and model routinely collected data for radiotherapy patients to provide evidence to support clinical decisions, particularly where clinical trials evidence is limited or non-existent. However, in practice there are administrative, ethical, technical, logistical and legislative barriers to having coordinated data analysis platforms across radiation oncology centres. METHODS A distributed learning network of computer systems is presented, with software tools to extract and report on oncology data and to enable statistical model development. A distributed or federated learning approach keeps data in the local centre, but models are developed from the entire cohort. RESULTS The feasibility of this approach is demonstrated across six Australian oncology centres, using routinely collected lung cancer data from oncology information systems. The infrastructure was used to validate and develop machine learning for model-based clinical decision support and for one centre to assess patient eligibility criteria for two major lung cancer radiotherapy clinical trials (RTOG-9410, RTOG-0617). External validation of a 2-year overall survival model for non-small cell lung cancer (NSCLC) gave an AUC of 0.65 and C-index of 0.62 across the network. For one centre, 65% of Stage III NSCLC patients did not meet eligibility criteria for either of the two practice-changing clinical trials, and these patients had poorer survival than eligible patients (10.6 m vs. 15.8 m, P = 0.024). CONCLUSION Population-based studies on routine data are possible using a distributed learning approach. This has the potential for decision support models for patients for whom supporting clinical trial evidence is not applicable.
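The idea that data stay in the local centre while a model is fitted on the entire cohort can be illustrated with a minimal sketch. It assumes a plain logistic regression trained by gradient descent, where each centre returns only a summed gradient and a row count. The function names, the toy data and the training settings are hypothetical and are not taken from the AusCAT software.

```python
import math

def local_gradient(weights, X, y):
    """Runs inside one centre: summed log-loss gradient and row count.
    Only these aggregates leave the centre, never the patient rows."""
    grad = [0.0] * len(weights)
    for xi, yi in zip(X, y):
        z = sum(w * x for w, x in zip(weights, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, x in enumerate(xi):
            grad[j] += (p - yi) * x
    return grad, len(y)

def federated_fit(centres, n_features, lr=0.5, rounds=200):
    """Coordinator loop: sum the per-centre gradients, then take one step."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        total, n = [0.0] * n_features, 0
        for X, y in centres:
            g, k = local_gradient(weights, X, y)
            total = [a + b for a, b in zip(total, g)]
            n += k
        weights = [w - lr * t / n for w, t in zip(weights, total)]
    return weights

def predict(weights, x):
    """Predicted probability for one feature vector."""
    return 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(weights, x))))

# Toy data: the first feature is a constant intercept term (hypothetical values).
centre_a = ([[1.0, 0.0], [1.0, 1.0]], [0, 0])
centre_b = ([[1.0, 2.0], [1.0, 3.0]], [1, 1])
weights = federated_fit([centre_a, centre_b], n_features=2)
```

Because the summed per-centre gradients equal the gradient computed on the pooled rows, this particular scheme reproduces centralised gradient descent step for step. Real deployments add secure transport, governance and convergence checks that the sketch leaves out.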
Affiliation(s)
- Matthew Field
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia
- Shalini Vinod
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; Liverpool and Macarthur Cancer Therapy Centres, Liverpool, New South Wales, Australia
- Noel Aherne
  - Mid North Coast Cancer Institute, Coffs Harbour, New South Wales, Australia; Rural Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Martin Carolan
  - Illawarra Cancer Care Centre, Wollongong, New South Wales, Australia
- Andre Dekker
  - Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, The Netherlands
- Geoff Delaney
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; Liverpool and Macarthur Cancer Therapy Centres, Liverpool, New South Wales, Australia
- Stuart Greenham
  - Mid North Coast Cancer Institute, Coffs Harbour, New South Wales, Australia
- Eric Hau
  - Sydney West Radiation Oncology Network, Sydney, Australia; Westmead Clinical School, University of Sydney, Sydney, New South Wales, Australia
- Joerg Lehmann
  - School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, New South Wales, Australia; Department of Radiation Oncology, Calvary Mater, Newcastle, New South Wales, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
- Joanna Ludbrook
  - Department of Radiation Oncology, Calvary Mater, Newcastle, New South Wales, Australia
- Andrew Miller
  - Illawarra Cancer Care Centre, Wollongong, New South Wales, Australia
- Angela Rezo
  - Canberra Health Services, Canberra, Australian Capital Territory, Australia
- Jothybasu Selvaraj
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Canberra Health Services, Canberra, Australian Capital Territory, Australia
- Jonathan Sykes
  - Sydney West Radiation Oncology Network, Sydney, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
- Lois Holloway
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; Liverpool and Macarthur Cancer Therapy Centres, Liverpool, New South Wales, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
- David Thwaites
  - Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
8
Zhang H, Guo Y, Prosperi M, Bian J. An ontology-based documentation of data discovery and integration process in cancer outcomes research. BMC Med Inform Decis Mak 2020; 20:292. [PMID: 33317497 PMCID: PMC7734720 DOI: 10.1186/s12911-020-01270-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/09/2020] [Accepted: 09/17/2020] [Indexed: 01/24/2023] Open
Abstract
Background To reduce cancer mortality and improve cancer outcomes, it is critical to understand the various cancer risk factors (RFs) across different domains (e.g., genetic, environmental, and behavioral risk factors) and levels (e.g., individual, interpersonal, and community levels). However, prior research on RFs of cancer outcomes has primarily focused on individual-level RFs due to the lack of integrated datasets that contain multi-level, multi-domain RFs. Further, the lack of consensus and proper guidance on systematically identifying RFs also increases the difficulty of RF selection from heterogeneous data sources in a multi-level integrative data analysis (mIDA) study. More importantly, as mIDA studies require integrating heterogeneous data sources, the data integration processes in the limited number of existing mIDA studies are inconsistently performed and poorly documented, threatening transparency and reproducibility. Methods Informed by the National Institute on Minority Health and Health Disparities (NIMHD) research framework, we (1) reviewed existing reporting guidelines from the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) network and (2) developed a theory-driven reporting guideline to guide the RF variable selection, data source selection, and data integration process. Then, we developed an ontology to standardize the documentation of the RF selection and data integration process in mIDA studies. Results We summarized the review results and created a reporting guideline, ATTEST, for reporting the variable selection and data source selection and integration process. We provided an ATTEST checklist to help researchers annotate and clearly document each step of their mIDA studies to ensure transparency and reproducibility. We used ATTEST to report two mIDA case studies and further transformed the annotation results into semantic triples, so that the relationships among variables, data sources and integration processes are explicitly standardized and modeled using the classes and properties from OD-ATTEST. Conclusion Our ontology-based reporting guideline solves some key challenges in current mIDA studies for cancer outcomes research by providing (1) theory-driven guidance for multi-level and multi-domain RF variable and data source selection; and (2) standardized documentation of the data selection and integration processes powered by an ontology, and thus a way to enable sharing of mIDA study reports among researchers.
Affiliation(s)
- Hansi Zhang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, 2197 Mowry Road, Suite 122, PO Box 100177, Gainesville, FL, 32610-0177, USA
- Yi Guo
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, 2197 Mowry Road, Suite 122, PO Box 100177, Gainesville, FL, 32610-0177, USA; Cancer Informatics & eHealth Core, University of Florida Health Cancer Center, Gainesville, FL, USA
- Mattia Prosperi
  - Department of Epidemiology, College of Medicine & College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, 2197 Mowry Road, Suite 122, PO Box 100177, Gainesville, FL, 32610-0177, USA; Cancer Informatics & eHealth Core, University of Florida Health Cancer Center, Gainesville, FL, USA
9
Nearest Neighbour Propensity Score Matching and Bootstrapping for Estimating Binary Patient Response in Oncology: A Monte Carlo Simulation. Sci Rep 2020; 10:964. [PMID: 31969627 PMCID: PMC6976708 DOI: 10.1038/s41598-020-57799-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/19/2019] [Accepted: 12/09/2019] [Indexed: 01/08/2023] Open
Abstract
Nearest Neighbour (NN) propensity score (PS) matching methods are commonly used in pharmacoepidemiology to estimate treatment response using observational data. Unfortunately, there is limited evidence on the optimal approach for accurately estimating binary treatment response and, even more so, for estimating its variance. Bootstrapping, although commonly used to accurately estimate variance, is rarely used together with PS matching. In this Monte Carlo simulation-based study, we examined the performance of bootstrapping used in conjunction with PS matching, as opposed to different NN matching techniques, on a simulated dataset exhibiting varying levels of real-world complexity. Thus, an experimental design was set up that independently varied the proportion of patients treated, the proportion of outcomes censored and the number of PS matches used. Simulation results were externally validated on a real observational dataset obtained from the Belgian Cancer Registry. We found all investigated PS methods to be stable and concordant, with k-NN matching dealing best with the censoring problem typically present in chronic cancer-related datasets, whilst being the least computationally expensive. In contrast, bootstrapping used in conjunction with PS matching, being the most computationally expensive, only showed superior results in small patient populations with long-term largely unobserved treatment effects.
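As a rough illustration of the two ingredients named in this abstract, the sketch below pairs each treated patient with the control whose propensity score is closest (1-NN matching with replacement) and then bootstraps the matched estimate to get a standard error. It assumes the propensity scores have already been estimated and ignores censoring. All names and the toy data are hypothetical, and this is not the simulation design used in the study.

```python
import random

def nn_match(treated, controls):
    """1-nearest-neighbour matching with replacement on the propensity score.
    Each unit is a (propensity_score, binary_outcome) tuple."""
    pairs = []
    for t in treated:
        nearest = min(controls, key=lambda c: abs(c[0] - t[0]))
        pairs.append((t, nearest))
    return pairs

def response_difference(pairs):
    """Response rate of treated units minus that of their matched controls."""
    n = len(pairs)
    return sum(t[1] for t, _ in pairs) / n - sum(c[1] for _, c in pairs) / n

def bootstrap_se(treated, controls, n_boot=200, seed=0):
    """Resample both groups with replacement, rematch, and report the
    standard deviation of the resulting estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        t_sample = [rng.choice(treated) for _ in treated]
        c_sample = [rng.choice(controls) for _ in controls]
        estimates.append(response_difference(nn_match(t_sample, c_sample)))
    mean = sum(estimates) / n_boot
    return (sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)) ** 0.5

# Toy data: (propensity score, responded?) per patient, values are made up.
treated = [(0.8, 1), (0.6, 1), (0.4, 0)]
controls = [(0.75, 0), (0.55, 1), (0.35, 0), (0.1, 0)]
effect = response_difference(nn_match(treated, controls))
se = bootstrap_se(treated, controls)
```

Matching with k neighbours instead of one would replace the `min` call with a sort on score distance and an average over the k closest controls.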
10
Imputation techniques on missing values in breast cancer treatment and fertility data. Health Inf Sci Syst 2019; 7:19. [PMID: 31656592 DOI: 10.1007/s13755-019-0082-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/11/2019] [Accepted: 09/18/2019] [Indexed: 10/25/2022] Open
Abstract
Clinical decision support using data mining techniques has, in the last few years, offered a more intelligent way to reduce decision error. However, clinical datasets often suffer from high missingness, which adversely impacts the quality of modelling if handled improperly. Imputing missing values provides an opportunity to resolve the issue. Conventional imputation methods adopt simple statistical analysis, such as mean imputation or discarding missing cases, which have many limitations and thus degrade the performance of learning. This study examines a series of machine learning based imputation methods and suggests an efficient approach to preparing a good-quality breast cancer (BC) dataset, in order to find the relationship between BC treatment and chemotherapy-related amenorrhoea, where performance is evaluated by the accuracy of the prediction. To this end, the reliability and robustness of six well-known imputation methods are evaluated. Our results show that imputation leads to a significant boost in classification performance compared to model prediction based on listwise deletion. Furthermore, the results reveal that most methods achieve strong robustness and discriminant power even when the dataset has a high missing rate (> 50%).
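The two conventional baselines mentioned in this abstract, discarding incomplete cases and mean imputation, can be sketched in a few lines. The example assumes numeric columns with `None` marking a missing value. The helper names and the toy rows are hypothetical, and the machine learning based imputers evaluated in the study are not reproduced here.

```python
from statistics import mean

def listwise_delete(rows):
    """Keep only the rows that have no missing value."""
    return [r for r in rows if all(v is not None for v in r)]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values
    in the same column."""
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]

# Toy dataset with two numeric features (values are made up).
rows = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 30.0]]
complete = listwise_delete(rows)
imputed = mean_impute(rows)
```

Here listwise deletion keeps two of the four rows, while mean imputation keeps all four, which shows how quickly deletion shrinks the usable sample as missingness grows.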
11
Siuly S, Huang R, Daneshmand M. Guest editorial: special issue on "Artificial Intelligence in Health and Medicine". Health Inf Sci Syst 2018; 6:2. [PMID: 29354261 DOI: 10.1007/s13755-017-0040-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023] Open
Affiliation(s)
- Siuly Siuly
  - Centre for Applied Informatics, College of Engineering and Science, Victoria University Melbourne, Melbourne, VIC 8001 Australia