1
Cardoso P, McDonald TJ, Patel KA, Pearson ER, Hattersley AT, Shields BM, McKinley TJ. Comparison of Bayesian approaches for developing prediction models in rare disease: application to the identification of patients with Maturity-Onset Diabetes of the Young. BMC Med Res Methodol 2024; 24:128. [PMID: 38834992 PMCID: PMC11149229 DOI: 10.1186/s12874-024-02239-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 05/06/2024] [Indexed: 06/06/2024] Open
Abstract
BACKGROUND Clinical prediction models can help identify high-risk patients and facilitate timely interventions. However, developing such models for rare diseases presents challenges due to the scarcity of affected patients for developing and calibrating models. Methods that pool information from multiple sources can help with these challenges. METHODS We compared three approaches for developing clinical prediction models for population screening based on an example of discriminating a rare form of diabetes (Maturity-Onset Diabetes of the Young - MODY) in insulin-treated patients from the more common Type 1 diabetes (T1D). Two datasets were used: a case-control dataset (278 T1D, 177 MODY) and a population-representative dataset (1418 patients, 96 MODY tested with biomarker testing, 7 MODY positive). To build a population-level prediction model, we compared three methods for recalibrating models developed in case-control data. These were prevalence adjustment ("offset"), shrinkage recalibration in the population-level dataset ("recalibration"), and a refitting of the model to the population-level dataset ("re-estimation"). We then developed a Bayesian hierarchical mixture model combining shrinkage recalibration with additional informative biomarker information only available in the population-representative dataset. We developed a method for dealing with missing biomarker and outcome information using prior information from the literature and other data sources to ensure the clinical validity of predictions for certain biomarker combinations. RESULTS The offset, re-estimation, and recalibration methods showed good calibration in the population-representative dataset. The offset and recalibration methods displayed the lowest predictive uncertainty due to borrowing information from the fitted case-control model. 
We demonstrate the potential of a mixture model for incorporating informative biomarkers, which significantly enhanced the model's predictive accuracy, reduced uncertainty, and showed higher stability in all ranges of predictive outcome probabilities. CONCLUSION We have compared several approaches that could be used to develop prediction models for rare diseases. Our findings highlight the recalibration mixture model as the optimal strategy if a population-level dataset is available. This approach offers the flexibility to incorporate additional predictors and informed prior probabilities, contributing to enhanced prediction accuracy for rare diseases. It also allows predictions without these additional tests, providing additional information on whether a patient should undergo further biomarker testing before genetic testing.
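The prevalence adjustment ("offset") mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common intercept correction for case-control sampling, in which the sample log-odds of being a case are swapped for the population log-odds; the paper's own Bayesian implementation may differ. The function names and the prevalence value are hypothetical, and only the case-control counts (177 MODY, 278 T1D) come from the abstract.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def offset_adjust(p_case_control: float, n_cases: int, n_controls: int,
                  prevalence: float) -> float:
    """Shift a case-control probability to an assumed population prevalence.

    The shift removes the log-odds implied by the case-control sampling
    ratio and adds the log-odds implied by the target prevalence.
    """
    sample_log_odds = math.log(n_cases / n_controls)
    target_log_odds = logit(prevalence)
    return expit(logit(p_case_control) + target_log_odds - sample_log_odds)


# Hypothetical prevalence, chosen only for illustration (not from the paper).
ASSUMED_PREVALENCE = 0.01

for p in (0.2, 0.5, 0.9):
    adjusted = offset_adjust(p, n_cases=177, n_controls=278,
                             prevalence=ASSUMED_PREVALENCE)
    print(f"case-control probability {p:.2f} -> adjusted {adjusted:.4f}")
```

A useful sanity check on this kind of adjustment is that a patient whose case-control probability equals the sample case fraction is mapped exactly to the assumed prevalence.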
Affiliation(s)
- Pedro Cardoso
- University of Exeter Medical School. Address: Clinical and Biomedical Sciences, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- Timothy J McDonald
- University of Exeter Medical School. Address: Clinical and Biomedical Sciences, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- Kashyap A Patel
- University of Exeter Medical School. Address: Clinical and Biomedical Sciences, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- Ewan R Pearson
- University of Dundee. Address: Division of Population Health & Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD1 9SY, UK
- Andrew T Hattersley
- University of Exeter Medical School. Address: Clinical and Biomedical Sciences, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- Beverley M Shields
- University of Exeter Medical School. Address: Clinical and Biomedical Sciences, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- Trevelyan J McKinley
- University of Exeter Medical School. Address: Clinical and Biomedical Sciences, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK.
2
Chhoa H, Chabriat H, Anato AJ, Bamba M, Zittoun F, Chevret S, Biard L. Improvement of an External Predictive Model Based on New Information Using a Synthetic Data Approach: Application to CADASIL. Neurol Genet 2023; 9:e200091. [PMID: 38235365 PMCID: PMC10691224 DOI: 10.1212/nxg.0000000000200091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 06/07/2023] [Indexed: 01/19/2024]
Abstract
Background and Objectives Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) is the most frequent hereditary cerebral small vessel disease. It is caused by mutations of the NOTCH3 gene. The disease evolves progressively over decades leading to stroke, disability, cognitive decline, and functional dependency. The course and clinical severity of CADASIL seem heterogeneous. Predictive models are thus needed to improve prognostic evaluation and inform future clinical trials. A predictive model of the 3-year variation in the Mattis Dementia Rating Scale (MDRS), which reflects the global cognitive performance of patients with CADASIL, was previously proposed. This model made predictions based on demographic, clinical, and MRI data. We aimed to improve this existing predictive model by integrating a new potential factor, the location of the genetic mutation in the different epidermal growth factor (EGFr) domains of the NOTCH3 gene, dichotomized into EGFr domains 1 to 6 or 7 to 34. Methods We used a new synthetic data approach to improve the initial predictive model by incorporating additional genetic information. This method combined the predicted outcomes from the previous model and 5 "synthetic" data sets with the observed outcome in a new data set. We then applied a multiple imputation method for missing data on the mutation location. Results The new data set included 367 patients who were followed up for 30 to 42 months. In the multivariable model with synthetic data, patients with NOTCH3 mutations in EGFr domains 7 to 34 had an additional average decrease of -1.4 points (standard error 0.67, p = 0.035) in their MDRS score variation over 3 years compared with patients with mutations located in EGFr domains 1 to 6. Cross-validation results highlighted the improved predictive performance of the enhanced model. Moreover, the model estimation was found to be more robust than fitting a model without synthetic data. 
Discussion The use of synthetic data improved the predictive model of MDRS change over 3 years in CADASIL. The predictive performance and estimation robustness of the predictive model were enhanced using this approach, whether or not genetic information was used. A statistically significant association between the location of the mutation in the NOTCH3 gene and the 3-year MDRS score variation was detected.
Affiliation(s)
- Henri Chhoa
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
- Hugues Chabriat
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
- Adelina Joanita Anato
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
- Mamadou Bamba
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
- Florent Zittoun
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
- Sylvie Chevret
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
- Lucie Biard
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
3
Su PF, Zhong J, Liu YC, Lin TH, Ou HT. Efficient estimation of a Cox model when integrating the subgroup incidence rate information. J Appl Stat 2022; 50:2151-2170. [PMID: 37434630 PMCID: PMC10332198 DOI: 10.1080/02664763.2022.2068512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2021] [Accepted: 04/16/2022] [Indexed: 10/18/2022]
Abstract
Incidence rates for diseases are widely used in the field of medical research because they lead to clear and simple physical and clinical interpretations. In this study, we propose an efficient estimation method that incorporates auxiliary subgroup information related to the incidence rate into the estimation of the Cox proportional hazard model. The results show that utilizing the incidence rate information improves the efficiency of the estimation of regression parameters based on the double empirical likelihood method compared to that for conventional models that do not incorporate such information. We show that estimators of regression parameters asymptotically follow a multivariate normal distribution with a variance-covariance matrix that can be consistently estimated. Simulation results indicate that the proposed estimators significantly increase efficiency. Finally, an example of the effects of type 2 diabetes on stroke is used to demonstrate the proposed method.
Affiliation(s)
- Pei-Fang Su
- Department of Statistics, National Cheng Kung University, Tainan, Taiwan
- Junjiang Zhong
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, People's Republic of China
- Yi-Chia Liu
- The Center for Quantitative Sciences, Clinical Medicine Research Center, National Cheng Kung University Hospital, Tainan, Taiwan
- Tzu-Hsuan Lin
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Huang-Tz Ou
- Institute of Clinical Pharmacy and Pharmaceutical Sciences, Department of Pharmacy, College of Medicine, National Cheng Kung University, Tainan, Taiwan
4
Kerpel-Fronius A, Tammemägi M, Cavic M, Henschke C, Jiang L, Kazerooni E, Lee CT, Ventura L, Yang D, Lam S, Huber RM. Screening for Lung Cancer in Individuals Who Never Smoked: An International Association for the Study of Lung Cancer Early Detection and Screening Committee Report. J Thorac Oncol 2021; 17:56-66. [PMID: 34455065 DOI: 10.1016/j.jtho.2021.07.031] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 04/08/2021] [Revised: 07/15/2021] [Accepted: 07/27/2021] [Indexed: 12/17/2022]
Abstract
Screening with low-dose computed tomography of high-risk individuals with a smoking history reduces lung cancer mortality. Current screening guidelines and eligibility criteria can miss more than 50% of lung cancers, and in some geographic areas, such as East Asia, a large proportion of the missed lung cancers are in never-smokers. Although randomized trials revealed the benefits of screening for people who smoke, these trials generally excluded never-smokers. Thus, the feasibility and effectiveness of lung cancer screening of individuals who never smoked are uncertain. Several known and suspected risk factors for lung cancers in never-smokers such as exposure to secondhand smoke, occupational carcinogens, radon, air pollution, and pulmonary diseases, such as chronic obstructive pulmonary disease and interstitial lung diseases, and intrinsic factors, such as age, are well noted. In this regard, knowledge of risk factors may make possible quantification and prediction of lung cancer risk in never-smokers. It is worth considering if and how never-smokers could be included in population-based screening programs. As the implementation of these programs is challenging in many countries owing to multiple factors and the epidemiologic differences by global regions, these issues will need to be evaluated in each country taking into account various factors, including accuracy of risk assessment and cost-effectiveness of screening in never-smokers. This report aims to outline current knowledge on risk factors for lung cancer in never-smokers to propose research strategies for this topic and initiate a broader discussion on lung cancer screening of never-smokers. Similar considerations can be made in current and ex-smokers who do not fulfill the current screening inclusion criteria, but otherwise are at increased risk. 
Although screening of never smokers may in the future be effectively conducted, current evidence to support widespread implementation of this practice is lacking.
Affiliation(s)
- Anna Kerpel-Fronius
- Országos Korányi Pulmonológiai Intézet, National Korányi Institute for Pulmonology, Budapest, Hungary.
- Martin Tammemägi
- Prevention and Cancer Control, Ontario Health (Cancer Care Ontario), Toronto, Ontario, Canada; Department of Health Sciences, Brock University, St. Catharines, Ontario, Canada
- Milena Cavic
- Department of Experimental Oncology, Institute of Oncology and Radiology of Serbia, Belgrade, Serbia
- Claudia Henschke
- Department of Radiology, Icahn School of Medicine, Mount Sinai Hospital, New York, New York
- Long Jiang
- Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Ella Kazerooni
- Division of Cardiothoracic Radiology and Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan; Division of Pulmonary and Critical Care Medicine, University of Michigan Medical School, Ann Arbor, Michigan
- Choon-Taek Lee
- Division of Pulmonology and Critical Care Medicine, Department of Internal Medicine, Seoul National University College of Medicine, Seoul, South Korea; Department of Internal Medicine and Respiratory Center, Seoul National University Bundang Hospital, Seongnam, South Korea
- Luigi Ventura
- Thoracic Surgery, Department of Medicine and Surgery, University of Parma, Parma, Italy
- Dawei Yang
- Department of Pulmonary Medicine and Critical Care, Zhongshan Hospital, Fudan University, Shanghai, China
- Stephen Lam
- Department of Integrative Oncology, British Columbia Cancer Research Institute, Vancouver, British Columbia, Canada; Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Rudolf M Huber
- Division of Respiratory Medicine and Thoracic Oncology, Department of Internal Medicine V Thoracic Oncology Centre Munich University of Munich-Campus Innenstadt Munich, Germany, member of the German Center for Lung Research (DZL - CPC-M)
5
Pal Choudhury P, Wilcox AN, Brook MN, Zhang Y, Ahearn T, Orr N, Coulson P, Schoemaker MJ, Jones ME, Gail MH, Swerdlow AJ, Chatterjee N, Garcia-Closas M. Comparative Validation of Breast Cancer Risk Prediction Models and Projections for Future Risk Stratification. J Natl Cancer Inst 2020; 112:278-285. [PMID: 31165158 DOI: 10.1093/jnci/djz113] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 10/09/2018] [Revised: 01/31/2019] [Accepted: 05/29/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND External validation of risk models is critical for risk-stratified breast cancer prevention. We used the Individualized Coherent Absolute Risk Estimation (iCARE) as a flexible tool for risk model development and comparative model validation and to make projections for population risk stratification. METHODS Performance of two recently developed models, one based on the Breast and Prostate Cancer Cohort Consortium analysis (iCARE-BPC3) and another based on a literature review (iCARE-Lit), were compared with two established models (Breast Cancer Risk Assessment Tool and International Breast Cancer Intervention Study Model) based on classical risk factors in a UK-based cohort of 64 874 white non-Hispanic women (863 patients) age 35-74 years. Risk projections in a target population of US white non-Hispanic women age 50-70 years assessed potential improvements in risk stratification by adding mammographic breast density (MD) and polygenic risk score (PRS). RESULTS The best calibrated models were iCARE-Lit (expected to observed number of cases [E/O] = 0.98, 95% confidence interval [CI] = 0.87 to 1.11) for women younger than 50 years, and iCARE-BPC3 (E/O = 1.00, 95% CI = 0.93 to 1.09) for women 50 years or older. Risk projections using iCARE-BPC3 indicated classical risk factors can identify approximately 500 000 women at moderate to high risk (>3% 5-year risk) in the target population. Addition of MD and a 313-variant PRS is expected to increase this number to approximately 3.5 million women, and among them, approximately 153 000 are expected to develop invasive breast cancer within 5 years. CONCLUSIONS iCARE models based on classical risk factors perform similarly to or better than BCRAT or IBIS in white non-Hispanic women. Addition of MD and PRS can lead to substantial improvements in risk stratification. However, these integrated models require independent prospective validation before broad clinical applications.
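The expected-to-observed (E/O) calibration ratio reported in the abstract can be illustrated with a short sketch. It assumes that E is the sum of predicted absolute risks and O is the count of observed cases, and it uses a log-scale Poisson approximation for the interval, which is one common choice and not necessarily the method used in the paper. The function name and the toy data are hypothetical.

```python
import math


def expected_observed_ratio(predicted_risks, outcomes, z=1.96):
    """Return (E/O, lower, upper) for a set of predicted risks.

    predicted_risks: per-person predicted probability of the event.
    outcomes: per-person 0/1 indicator of the observed event.
    The interval treats the observed count as Poisson and is built on
    the log scale (an assumption of this sketch).
    """
    expected = sum(predicted_risks)
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E/O is undefined when no events are observed")
    ratio = expected / observed
    half_width = z / math.sqrt(observed)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


# Toy cohort: 100 people, each with a predicted risk of 2%, 2 observed cases.
toy_risks = [0.02] * 100
toy_outcomes = [1, 1] + [0] * 98

ratio, lower, upper = expected_observed_ratio(toy_risks, toy_outcomes)
print(f"E/O = {ratio:.2f} (approximate 95% interval {lower:.2f} to {upper:.2f})")
```

A ratio near 1 indicates that the model predicts about as many cases as were observed; values above 1 suggest overprediction and values below 1 suggest underprediction.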
Affiliation(s)
- Amber N Wilcox
- Johns Hopkins University, Baltimore, MD; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda
- Yan Zhang
- Department of Biostatistics, Bloomberg School of Public Health
- Thomas Ahearn
- Johns Hopkins University, Baltimore, MD; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda
- Nick Orr
- Department of Biostatistics, Bloomberg School of Public Health; Department of Oncology, School of Medicine; Division of Breast Cancer Research, The Institute of Cancer Research, London, UK; Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, UK
- Mitchell H Gail
- Johns Hopkins University, Baltimore, MD; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda
- Anthony J Swerdlow
- Division of Genetics and Epidemiology; Division of Breast Cancer Research, The Institute of Cancer Research, London, UK
- Montserrat Garcia-Closas
- Johns Hopkins University, Baltimore, MD; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda
6
Boonstra PS, Barbaro RP. Incorporating historical models with adaptive Bayesian updates. Biostatistics 2020; 21:e47-e64. [PMID: 30247557 PMCID: PMC7868052 DOI: 10.1093/biostatistics/kxy053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/19/2018] [Accepted: 08/15/2018] [Indexed: 01/17/2023] Open
Abstract
This article considers Bayesian approaches for incorporating information from a historical model into a current analysis when the historical model includes only a subset of covariates currently of interest. The statistical challenge is 2-fold. First, the parameters in the nested historical model are not generally equal to their counterparts in the larger current model, neither in value nor interpretation. Second, because the historical information will not be equally informative for all parameters in the current analysis, additional regularization may be required beyond that provided by the historical information. We propose several novel extensions of the so-called power prior that adaptively combine a prior based upon the historical information with a variance-reducing prior that shrinks parameter values toward zero. The ideas are directly motivated by our work building mortality risk prediction models for pediatric patients receiving extracorporeal membrane oxygenation (ECMO). We have developed a model on a registry-based cohort of ECMO patients and now seek to expand this model with additional biometric measurements, not available in the registry, collected on a small auxiliary cohort. Our adaptive priors are able to use the information in the original model and identify novel mortality risk factors. We support this with a simulation study, which demonstrates the potential for efficiency gains in estimation under a variety of scenarios.
Affiliation(s)
- Philip S Boonstra
- Department of Biostatistics, University of Michigan, 1415 Washington Hts, SPHII, Ann Arbor, MI, USA
- Ryan P Barbaro
- Division of Pediatric Critical Care and Child Health Evaluation and Research Unit, University of Michigan, 1500 East Medical Center Drive, Mott, Ann Arbor, MI, USA
7
Gu T, Taylor JMG, Cheng W, Mukherjee B. Synthetic data method to incorporate external information into a current study. Can J Stat 2019; 47:580-603. [PMID: 32773922 DOI: 10.1002/cjs.11513] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/21/2022]
Abstract
We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X,B that uses both the available individual level data and some summary information obtained from the known model for Y|X. We propose a synthetic data approach, which consists of creating m additional synthetic data observations, and then analyzing the combined dataset of size n+m to estimate the parameters of the Y|X, B model. This combined dataset of size n+m now has missing values of B for m of the observations, and is analyzed using methods that can handle missing data (e.g. multiple imputation). We present simulation studies and illustrate the method using data from the Prostate Cancer Prevention Trial. Though the synthetic data method is applicable to a general regression context, to provide some justification, we show in two special cases that the asymptotic variances of the parameter estimates in the Y|X, B model are identical to those from an alternative constrained maximum likelihood estimation approach. This correspondence in special cases and the method's broad applicability make it appealing for use across diverse scenarios.
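The data-construction step described in the abstract can be sketched as follows. This covers only the creation of the combined dataset of size n+m; the subsequent missing-data analysis (for example multiple imputation) is not shown. The sketch assumes a known linear model for Y|X with Gaussian noise and draws synthetic X values by resampling the observed ones. Both choices, along with all names and numbers, are illustrative assumptions and are not taken from the paper.

```python
import random


def make_combined_dataset(observed, known_intercept, known_slope, known_sd,
                          m, seed=0):
    """Append m synthetic rows, with B missing, to the observed rows.

    observed: list of dicts with keys "x", "b", "y" (the n real records).
    Synthetic Y values are drawn from the known Y|X model (assumed here
    to be linear with Gaussian noise); B is set to None because the
    known model carries no information about it.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(m):
        x = rng.choice(observed)["x"]  # resample X from the observed data
        y = known_intercept + known_slope * x + rng.gauss(0.0, known_sd)
        synthetic.append({"x": x, "b": None, "y": y})
    return observed + synthetic


# Toy data: n = 3 observed records that include the new variable B.
observed = [
    {"x": 0.5, "b": 1.2, "y": 2.1},
    {"x": 1.5, "b": 0.7, "y": 4.3},
    {"x": 2.5, "b": 1.9, "y": 6.0},
]

combined = make_combined_dataset(observed, known_intercept=1.0,
                                 known_slope=2.0, known_sd=0.5, m=6)
n_missing_b = sum(row["b"] is None for row in combined)
print(len(combined), "rows in total,", n_missing_b, "with B missing")
```

The combined list would then be passed to a method that handles the missing B values, which is where the information from the known Y|X model enters the fit of the Y|X, B model.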
Affiliation(s)
- Tian Gu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48105, U.S.A
- Jeremy M G Taylor
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48105, U.S.A
- Wenting Cheng
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48105, U.S.A
- Bhramar Mukherjee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48105, U.S.A
8
Garcia-Closas M, Chatterjee N. Assessment of breast cancer risk: which tools to use? Lancet Oncol 2019; 20:463-464. [PMID: 30799258 PMCID: PMC8211385 DOI: 10.1016/s1470-2045(19)30071-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/07/2019] [Accepted: 02/07/2019] [Indexed: 01/15/2023]
Affiliation(s)
- Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Shady Grove Campus, Rockville, MD 20850.
- Nilanjan Chatterjee
- Bloomberg School of Public Health and Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
9
Cheng W, Taylor JMG, Gu T, Tomlins SA, Mukherjee B. Informing a Risk Prediction Model for Binary Outcomes with External Coefficient Information. J R Stat Soc Ser C Appl Stat 2018; 68:121-139. [PMID: 31105344 DOI: 10.1111/rssc.12306] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 01/21/2023]
Abstract
We consider a situation where there is rich historical data available for the coefficients and their standard errors in an established regression model describing the association between a binary outcome variable Y and a set of predicting factors X, from a large study. We would like to utilize this summary information for improving estimation and prediction in an expanded model of interest, Y| X, B. The additional variable B is a new biomarker, measured on a small number of subjects in a new dataset. We develop and evaluate several approaches for translating the external information into constraints on regression coefficients in a logistic regression model of Y| X, B. Borrowing from the measurement error literature we establish an approximate relationship between the regression coefficients in the models Pr(Y = 1| X , β), Pr(Y = 1| X, B, γ) and E(B| X, θ ) for a Gaussian distribution of B. For binary B we propose an alternate expression. The simulation results comparing these methods indicate that historical information on Pr(Y = 1| X , β) can improve the efficiency of estimation and enhance the predictive power in the regression model of interest Pr(Y = 1| X, B, γ). We illustrate our methodology by enhancing the High-grade Prostate Cancer Prevention Trial Risk Calculator, with two new biomarkers prostate cancer antigen 3 and TMPRSS2:ERG.
Affiliation(s)
- Tian Gu
- University of Michigan, Ann Arbor, Michigan, USA
10
Estes JP, Mukherjee B, Taylor JMG. Empirical Bayes Estimation and Prediction Using Summary-Level Information From External Big Data Sources Adjusting for Violations of Transportability. Statistics in Biosciences 2018; 10:568-586. [PMID: 31123532 DOI: 10.1007/s12561-018-9217-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Large external data sources may be available to augment studies that collect data to address a specific research objective. In this article we consider the problem of building regression models for prediction based on individual-level data from an "internal" study while incorporating summary information from an "external" big data source. We extend the work of Chatterjee et al (2016a) by introducing an adaptive empirical Bayes shrinkage estimator that uses the external summary-level information and the internal data to trade bias with variance for protection against departures in the conditional probability distribution of the outcome given a set of covariates between the two populations. We use simulation studies and a real data application using external summary information from the Prostate Cancer Prevention Trial to assess the performance of the proposed methods in contrast to maximum likelihood estimation and the constrained maximum likelihood (CML) method developed by Chatterjee et al (2016a). Our simulation studies show that the CML method can be biased and inefficient when the assumption of a transportable covariate distribution between the external and internal populations is violated, and our empirical Bayes estimator provides protection against bias and loss of efficiency.