Utility of a Clinically Guided Data-Driven Approach for Predicting Breast Cancer Complications: An Application Using a Population-Based Claims Data Set.
JCO Clin Cancer Inform 2022;6:e2100191. [PMID: 36417684 DOI: 10.1200/cci.21.00191]
[Indexed: 11/24/2022]
Abstract
PURPOSE
With earlier detection and an increasing number of breast cancer (BCa) survivors, more women are living with side effects of BCa treatment. A predictive approach to studying treatment-related adverse events (AEs) may generate proactive strategies; however, many studies are descriptive in nature. Focusing on short-term AEs, we determine the performance of prediction models of disease- or treatment-related AEs among women diagnosed with BCa.
METHODS
We used administrative claims data from the Blue Health Intelligence National Data Repository. The study sample included female individuals age 18 years and older who were diagnosed with BCa and received cancer-directed treatment between January 1, 2014, and August 1, 2019. Using the information available in the claims data, we constructed longitudinal patient histories and identified disease- and treatment-related AEs occurring within 6 months of treatment. The following prediction models were developed: logistic regression, Lasso regression, gradient boosted tree (GBT), and random forest (RF). We compared models using the area under the receiver operating characteristic curve and its CI, among other metrics.
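As an illustration of the comparison metric, the following is a minimal sketch of the area under the receiver operating characteristic curve with a confidence interval. The abstract does not say how the CI was obtained, so the percentile bootstrap here is an assumption, and the names `auc` and `bootstrap_auc_ci` and their parameters are hypothetical rather than taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not
    necessarily the one used in the study)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap the AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy labels (1 = AE occurred) and predicted risks; not study data.
    y = [0, 0, 1, 1, 1, 0, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20, 0.70, 0.60]
    print(auc(y, p), bootstrap_auc_ci(y, p))
```

The pairwise AUC is quadratic in the number of cases, which is fine for a demonstration; a cohort of this size would call for a rank-based implementation.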
RESULTS
Data were extracted for 267,473 members meeting study inclusion criteria. The area under the curve for the logistic regression model was 0.82 (0.82-0.86), compared with 0.89 (0.87-0.90) for the Lasso, 0.91 (0.89-0.93) for the GBT, and 0.90 (0.89-0.93) for the RF models. The sensitivity was 0.96 for the GBT, Lasso, and RF models, whereas the specificity was 0.42, 0.44, and 0.39 for the GBT, Lasso, and RF models, respectively. Positive predictive values were 0.96 across all three models.
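For readers less familiar with the threshold-based metrics reported above, the sketch below shows how sensitivity, specificity, and positive predictive value follow from a confusion matrix. The `Confusion` class and the counts are hypothetical and chosen only for illustration; they are not the study's data. Because positive predictive value depends on how common the outcome is, it can stay high even when specificity is modest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing predicted AEs with observed AEs."""
    tp: int  # predicted AE, AE occurred
    fp: int  # predicted AE, no AE occurred
    tn: int  # predicted no AE, no AE occurred
    fn: int  # predicted no AE, AE occurred

    @property
    def sensitivity(self) -> float:
        # Share of true AE cases that the model flags.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of non-AE cases that the model correctly clears.
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Share of flagged cases that truly had an AE.
        return self.tp / (self.tp + self.fp)


if __name__ == "__main__":
    # Illustrative counts only (assumption): the outcome is common.
    c = Confusion(tp=96, fp=6, tn=4, fn=4)
    print(f"sensitivity={c.sensitivity:.2f}")
    print(f"specificity={c.specificity:.2f}")
    print(f"ppv={c.ppv:.2f}")
```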
CONCLUSION
Prediction models developed using big data methods and grounded in a clinically guided framework have the potential to reliably predict short-term treatment-related AEs among women diagnosed with BCa.