1
Yang S, Su H, Zhang N, Han Y, Ge Y, Fei Y, Liu Y, Hilowle A, Xu P, Zhang J. Discretizing multiple continuous predictors with U-shaped relationships with lnOR: introducing the recursive gradient scanning method in clinical and epidemiological research. BMC Med Res Methodol 2025; 25:70. [PMID: 40075286 PMCID: PMC11900475 DOI: 10.1186/s12874-025-02522-4] [Received: 07/13/2024] [Accepted: 02/25/2025] [Indexed: 03/14/2025]
Abstract
BACKGROUND: Assuming a linear relationship between continuous predictors and outcomes in clinical prediction models is often inappropriate. Truly linear relationships are rare, and the assumption can produce biased estimates and inaccurate conclusions. Our research group previously addressed the case of a single U-shaped independent variable. Multiple U-shaped predictors can improve predictive accuracy by capturing nuanced relationships, but they also introduce challenges such as increased model complexity and potential overfitting. This study extends our previous results to these more common multi-predictor scenarios, enabling more comprehensive and practical investigations.

METHODS: We proposed a novel approach, the Recursive Gradient Scanning (RGS) method, for discretizing multiple continuous variables that exhibit U-shaped relationships with the natural logarithm of the odds ratio (lnOR). RGS proceeds in two steps: first, it conducts fine screening from the 2.5th to the 97.5th percentiles of the lnOR; then, it iteratively compares Akaike information criterion (AIC) values to identify the optimal categorical variables. We conducted a Monte Carlo simulation study to evaluate RGS under different correlation levels, sample sizes, missing rates, and degrees of symmetry of the U-shaped relationships. To compare RGS with other common approaches (the median, Q1-Q3, and minimum P-value methods), we assessed both the predictive ability (e.g., AUC) and the goodness of fit (e.g., AIC) of logistic regression models with variables discretized at different cut-points in a real dataset.

RESULTS: The simulation and empirical studies consistently demonstrated the effectiveness of RGS. In simulations, RGS outperformed other common discretization methods in the discrimination and overall performance of logistic regression models across all U-shaped scenarios considered (varying correlation levels, sample sizes, missing rates, and symmetry of the U-shaped relationships). Similarly, in the empirical study, the optimal cut-points identified by RGS yielded better clinical predictive power, as measured by metrics such as AUC, than traditional methods.

CONCLUSIONS: The simulation and empirical studies demonstrated that RGS outperformed other common discretization methods in goodness of fit and predictive ability. Future work will address challenges related to separation or missing binary responses, and more data are required to validate the method.
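The abstract describes RGS only at a high level. To make the core idea concrete (scan candidate cut-points over a percentile range and keep the categorization with the lowest AIC), here is a minimal Python sketch for a single U-shaped predictor. It is not the authors' RGS implementation. It assumes two cut-points that define low, middle, and high groups; it scans percentiles of the predictor values themselves in 2.5-point steps; it skips any split that leaves a group smaller than `min_size` or completely separated; and it omits the recursive multi-predictor step. All function names are hypothetical.

```python
import math
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def group_loglik(events, n):
    """Bernoulli log-likelihood of one group evaluated at its MLE p = events / n."""
    ll = 0.0
    if events > 0:
        ll += events * math.log(events / n)
    if events < n:
        ll += (n - events) * math.log((n - events) / n)
    return ll


def aic_three_groups(x, y, c1, c2, min_size):
    """AIC of a logistic model whose only term is a 3-level factor:
    low (x < c1), middle (c1 <= x <= c2), high (x > c2).

    With a single factor, the logistic MLE reproduces each group's observed
    event rate, so the log-likelihood is the sum of per-group Bernoulli terms.
    Returns None if a group is smaller than min_size or completely separated.
    """
    counts = [[0, 0], [0, 0], [0, 0]]  # [events, n] for low, middle, high
    for xi, yi in zip(x, y):
        g = 0 if xi < c1 else (1 if xi <= c2 else 2)
        counts[g][0] += yi
        counts[g][1] += 1
    for events, n in counts:
        if n < min_size or events == 0 or events == n:
            return None
    loglik = sum(group_loglik(e, n) for e, n in counts)
    k = 3  # intercept + two dummy coefficients
    return 2 * k - 2 * loglik


def scan_cutpoints(x, y, step=2.5, lo_q=2.5, hi_q=97.5, min_size=10):
    """Grid-search pairs of percentile cut-points; return (aic, c1, c2) with lowest AIC."""
    xs = sorted(x)
    n_steps = int(round((hi_q - lo_q) / step))
    qs = [lo_q + i * step for i in range(n_steps + 1)]
    candidates = sorted(set(percentile(xs, q) for q in qs))
    best = None
    for i, c1 in enumerate(candidates):
        for c2 in candidates[i + 1:]:
            aic = aic_three_groups(x, y, c1, c2, min_size)
            if aic is not None and (best is None or aic < best[0]):
                best = (aic, c1, c2)
    return best


random.seed(1)
x = [random.uniform(-3, 3) for _ in range(1000)]
# Hypothetical U-shaped truth: log-odds = -2 + 0.6 * x^2, lowest risk near x = 0.
y = [1 if random.random() < 1 / (1 + math.exp(2 - 0.6 * xi * xi)) else 0 for xi in x]
aic, c1, c2 = scan_cutpoints(x, y)
print(f"AIC={aic:.1f}, cut-points=({c1:.2f}, {c2:.2f})")
```

Because the model contains only the discretized factor, its fitted probabilities equal the observed group event rates, so no iterative logistic fit is needed. That shortcut holds only for this single-factor case; with several discretized predictors or other covariates, each candidate would need a full logistic regression fit.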
Affiliation(s)
- Shuo Yang
  - Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
- Huaan Su
  - Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
  - The People's Hospital of Jiangmen, No. 172 Gaodi Li, Pengjiang District, Jiangmen, Guangdong, 529000, China
- Nanxiang Zhang
  - Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
- Yuduan Han
  - Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
- Yingfeng Ge
  - Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
- Yi Fei
  - Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
- Ying Liu
  - Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
- Abdullahi Hilowle
  - Department of Cardiology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, 510630, China
- Peng Xu
  - The First Affiliated Hospital of Bengbu Medical University, Bengbu, Anhui, 233004, China
- Jinxin Zhang
  - Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
2
Huang X, Gajewski BJ. Comparison of hierarchical EMAX and NDLM models in dose-response for early phase clinical trials. BMC Med Res Methodol 2020; 20:194. [PMID: 32690004 PMCID: PMC7370408 DOI: 10.1186/s12874-020-01071-2] [Received: 12/20/2019] [Accepted: 07/01/2020] [Indexed: 11/29/2022]
Abstract
Background: Phase II clinical trials primarily aim to find the optimal dose and to investigate the relationship between dose and efficacy relative to standard of care (control). The most effective dose therefore needs to be identified before moving forward to a phase III confirmatory trial.

Methods: The primary endpoint of a phase II trial is typically binary (success or failure). The EMAX model, ubiquitous in pharmacology research, was fit to many compounds and described the data well, except for a single compound with a nonmonotone dose-response (Thomas et al., Stat Biopharmaceutical Res. 6:302-317 2014). One alternative that mitigates the risk of a nonmonotone dose-response is a Bayesian hierarchical EMAX model (Gajewski et al., Stat Med. 38:3123-3138 2019), which adapts to the observed dose-response shape.

Results: When the dose-response curve is monotonic, the hierarchical EMAX model enjoys the efficiency of EMAX; when the curve is nonmonotonic, its additional random-effect hyperprior makes it more adjustable and flexible. The normal dynamic linear model (NDLM) is also useful for exploring dose-response relationships, because it lets the efficacy at the current dose depend on the efficacy at the previous dose(s). Previous research has compared EMAX with the hierarchical EMAX (Gajewski et al., Stat Med. 38:3123-3138 2019) and EMAX with the NDLM (Liu et al., BMC Med Res Method 17:149 2017); however, the hierarchical EMAX has not been directly compared with the NDLM.

Conclusions: This paper compares these models and discusses the relative merits of each for an ongoing early phase dose selection study.
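The abstract contrasts two ways of structuring a dose-response. The Python sketch below writes out their textbook forms under stated assumptions: a 3-parameter EMAX curve (no Hill coefficient) acting on the logit of the binary success probability, and a first-order NDLM, i.e. a Gaussian random walk across ordered doses. It is not the hierarchical EMAX or NDLM specification used by the authors; the function names, dose levels, and parameter values are hypothetical.

```python
import math


def emax_logit(dose, e0, emax, ed50):
    """3-parameter EMAX curve, assumed here to act on the logit scale:
    E(d) = E0 + Emax * d / (ED50 + d)."""
    return e0 + emax * dose / (ed50 + dose)


def inv_logit(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def ndlm_prior_moments(n_doses, m0, c0, w):
    """Prior mean and variance of theta_j under a first-order NDLM:
    theta_1 ~ N(m0, c0), theta_j = theta_{j-1} + omega_j, omega_j ~ N(0, w)."""
    means, variances = [], []
    m, c = m0, c0
    for _ in range(n_doses):
        means.append(m)  # random walk: prior mean stays at m0
        variances.append(c)
        c += w  # each step adds independent innovation variance w
    return means, variances


doses = [0.0, 10.0, 25.0, 50.0, 100.0]  # hypothetical dose levels
probs = [inv_logit(emax_logit(d, e0=-1.0, emax=2.0, ed50=20.0)) for d in doses]
print([round(p, 3) for p in probs])

means, variances = ndlm_prior_moments(len(doses), m0=0.0, c0=1.0, w=0.5)
print(variances)  # [1.0, 1.5, 2.0, 2.5, 3.0]
```

With `emax > 0`, the EMAX curve can only increase with dose, which is why a compound with a nonmonotone dose-response is poorly described by it. The NDLM imposes no parametric shape, only dependence between neighbouring doses, and its prior variance grows by `w` at each successive dose.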
Affiliation(s)
- Xiaqing Huang
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Mail Stop 1026, 3901 Rainbow Blvd., Kansas City, KS, 66160, USA
- Byron J Gajewski
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Mail Stop 1026, 3901 Rainbow Blvd., Kansas City, KS, 66160, USA