1
Ma S, Yu K, Tang ML, Pan J, Härdle WK, Tian M. A Bayesian multistage spatio-temporally dependent model for spatial clustering and variable selection. Stat Med 2023; 42:4794-4823. [PMID: 37652405 DOI: 10.1002/sim.9889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 06/30/2023] [Accepted: 08/13/2023] [Indexed: 09/02/2023]
Abstract
In spatio-temporal epidemiological analysis, it is of critical importance to identify the significant covariates and estimate the associated time-varying effects on the health outcome. Due to the heterogeneity of spatio-temporal data, the subsets of important covariates may vary across space and the temporal trends of covariate effects could be locally different. However, many spatial models neglect these potential local variation patterns, leading to inappropriate inference. Thus, this article proposes a flexible Bayesian hierarchical model to simultaneously identify spatial clusters of regression coefficients with common temporal trends, select significant covariates for each spatial group by introducing binary entry parameters, and estimate spatio-temporally varying disease risks. A multistage strategy is employed to reduce the confounding bias caused by spatially structured random components. A simulation study demonstrates that the proposed method outperforms several alternatives under different assessment criteria. The methodology is motivated by two important case studies. The first concerns the low birth weight incidence data in 159 counties of Georgia, USA, for the years 2007 to 2018 and investigates the time-varying effects of potential contributing covariates in different cluster regions. The second concerns the circulatory disease risks across 323 local authorities in England over 10 years and explores the underlying spatial clusters and associated important risk factors.
Affiliation(s)
- Shaopei Ma
  - School of Statistics, University of International Business and Economics, Beijing, China
- Keming Yu
  - Mathematical Sciences, Brunel University, Uxbridge, London, UK
- Man-Lai Tang
  - Mathematical Sciences, Brunel University, Uxbridge, London, UK
- Jianxin Pan
  - Research Center for Mathematics, Beijing Normal University, Zhuhai, China
  - Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai, China
- Wolfgang Karl Härdle
  - School of Business and Economics, Humboldt-Universität zu Berlin, Berlin, Germany
- Maozai Tian
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
2
Maranzano P, Otto P, Fassò A. Adaptive LASSO estimation for functional hidden dynamic geostatistical models. Stochastic Environmental Research and Risk Assessment: Research Journal 2023; 37:1-23. [PMID: 37362848 PMCID: PMC10189237 DOI: 10.1007/s00477-023-02466-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/01/2023] [Indexed: 06/28/2023]
Abstract
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HDGM). These models employ a classic mixed-effect regression structure with embedded spatiotemporal dynamics to model georeferenced data observed in a functional domain. Thus, the regression coefficients are functions. The algorithm simultaneously selects the relevant spline basis functions and regressors that are used to model the fixed effects. In this way, it automatically shrinks to zero irrelevant parts of the functional coefficients or the entire function for an irrelevant regressor. The algorithm is based on an adaptive LASSO penalty function, with weights obtained from the unpenalised f-HDGM maximum likelihood estimators. The computational burden of maximisation is drastically reduced by a local quadratic approximation of the log-likelihood. A Monte Carlo simulation study provides insight into prediction ability and parameter estimate precision, considering increasing spatiotemporal dependence and cross-correlations among predictors. Further, the algorithm behaviour is investigated when modelling air quality functional data with several weather and land cover covariates. Within this application, we also explore some scalability properties of our algorithm. Both simulations and empirical results show that the prediction ability of the penalised estimates is equivalent to that provided by the maximum likelihood estimates. However, adopting the so-called one-standard-error rule, we obtain estimates closer to the real ones, as well as simpler and more interpretable models. Supplementary Information: The online version contains supplementary material available at 10.1007/s00477-023-02466-5.
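The adaptive LASSO idea mentioned in this abstract (penalty weights taken from an unpenalised fit) can be illustrated with a minimal sketch. This is not the authors' f-HDGM algorithm: it assumes an ordinary linear regression, uses ordinary least squares in place of the unpenalised maximum likelihood estimator, and solves the weighted problem by coordinate descent rather than by a local quadratic approximation. The function names `soft_threshold` and `adaptive_lasso`, the penalty level `lam`, and the simulated data are all hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def adaptive_lasso(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| by coordinate descent.

    The weights w_j = 1 / |b_ols_j| come from the unpenalised fit
    (OLS here, standing in for an unpenalised MLE; an assumption).
    """
    n, p = X.shape
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1.0 / (np.abs(beta_ols) + 1e-12)   # small constant guards against division by zero
    beta = beta_ols.copy()
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j' x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes the contribution of coefficient j
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam * w[j]) / col_sq[j]
    return beta

# Hypothetical simulated data: two relevant and three irrelevant regressors.
rng = np.random.default_rng(0)
n = 200
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
X = rng.normal(size=(n, 5))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat = adaptive_lasso(X, y, lam=0.1)
print(np.round(beta_hat, 3))
```

Because the weights are large for coefficients whose unpenalised estimates are near zero, those coefficients are penalised heavily and set exactly to zero, while large coefficients are only mildly shrunk.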
Affiliation(s)
- Paolo Maranzano
  - Department of Economics, Management and Statistics (DEMS), University of Milano-Bicocca, Piazza dell’Ateneo Nuovo 1, 20126 Milano, Italy
  - Fondazione Eni Enrico Mattei (FEEM), Corso Magenta 63, 20123 Milano, Italy
- Philipp Otto
  - Institute of Cartography and Geoinformatics (IKG), Leibniz University of Hannover, Appelstrasse 9a, 30167 Hannover, Lower Saxony, Germany
- Alessandro Fassò
  - Department of Economics, University of Bergamo, Via dei Caniana 2, 24127 Bergamo, Italy
3
Rotejanaprasert C, Lawson AB, Maude RJ. Spatiotemporal reproduction number with Bayesian model selection for evaluation of emerging infectious disease transmissibility: an application to COVID-19 national surveillance data. BMC Med Res Methodol 2023; 23:62. [PMID: 36915077 PMCID: PMC10010957 DOI: 10.1186/s12874-023-01870-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 02/20/2023] [Indexed: 03/16/2023] Open
Abstract
BACKGROUND: To control emerging diseases, governments often have to make decisions based on limited evidence. The effective or temporal reproductive number is used to estimate the expected number of new cases caused by an infectious person in a partially susceptible population. While the temporal dynamic is captured in the temporal reproduction number, the dominant approach is currently based on modeling that implicitly treats people within a population as geographically well mixed. METHODS: In this study we aimed to develop a generic and robust methodology for estimating spatiotemporal dynamic measures that can be instantaneously computed for each location and time within a Bayesian model selection and averaging framework. A simulation study was conducted to demonstrate robustness of the method. A case study was provided of a real-world application to COVID-19 national surveillance data in Thailand. RESULTS: Overall, the proposed method allowed for estimation of different scenarios of reproduction numbers in the simulation study. The model selection chose the true serial interval when included in our study whereas model averaging yielded the weighted outcome which could be less accurate than model selection. In the case study of COVID-19 in Thailand, the best model based on model selection and averaging criteria had a similar trend to real data and was consistent with previously published findings in the country. CONCLUSIONS: The method yielded robust estimation in several simulated scenarios of force of transmission with computing flexibility and practical benefits. Thus, this development can be suitable and practically useful for surveillance applications especially for newly emerging diseases. As new outbreak waves continue to develop and the risk changes on both local and global scales, our work can facilitate policymaking for timely disease control.
Affiliation(s)
- Chawarat Rotejanaprasert
  - Department of Tropical Hygiene, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
  - Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Andrew B Lawson
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Richard J Maude
  - Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
  - Harvard T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
  - Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - The Open University, Milton Keynes, UK
4
Wang F, Li H, Wang H, Li Y. Spatial correlated incidence modeling with zero inflation. Biom J 2023; 65:e2200090. [PMID: 36732909 DOI: 10.1002/bimj.202200090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 09/19/2022] [Accepted: 09/20/2022] [Indexed: 02/04/2023]
Abstract
Disease mapping models are widely used to model disease incidence with spatial correlation. In these models, zero inflation is an important issue that often arises in disease incidence datasets with high proportions of zero counts. It originates from limited survey coverage or inadequate testing equipment, which leaves some regions with no observed patients. The excess zeros recorded in the dataset then distort the true distribution of disease incidence and lead to inaccurate estimates. To address this issue, a zero-inflated disease mapping model is developed in this work. In this model, a zero-inflated process using Bernoulli indicators is assumed to characterize whether the zero inflation occurs for each region. For regions without zero inflation, a coherent and generative disease mapping model is applied for mapping the spatially correlated disease incidence. Independent spatial random effects are incorporated in both processes to account for the spatial patterns of zero inflation and disease incidence. External covariates are also considered in both processes to better explain the disease count data. To estimate the model, a Markov chain Monte Carlo algorithm is proposed. We evaluate model performance via a variety of simulation experiments. Finally, a Lyme disease dataset of Virginia is analyzed to illustrate the application of the proposed model.
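As a rough illustration of the zero-inflation mechanism this abstract describes (a Bernoulli indicator deciding whether a region's count is a structural zero), the sketch below evaluates the probability mass function of a standard zero-inflated Poisson distribution. The Poisson count model, the function name `zip_pmf`, and the parameter values are assumptions made for illustration; the paper's own model additionally includes spatial random effects and covariates, which are omitted here.

```python
import math

def zip_pmf(k, pi_zero, mu):
    """P(Y = k) under a zero-inflated Poisson distribution.

    With probability pi_zero the count is a structural zero (the Bernoulli
    indicator fires); otherwise it follows a Poisson distribution with mean mu.
    """
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson

# Hypothetical values: 30% structural zeros, mean count 2 otherwise.
p0_plain = math.exp(-2.0)            # P(Y = 0) under a plain Poisson(2)
p0_zip = zip_pmf(0, 0.3, 2.0)        # P(Y = 0) with zero inflation
total = sum(zip_pmf(k, 0.3, 2.0) for k in range(60))

print(p0_plain, p0_zip, total)
```

The zero-inflated probability of a zero count exceeds the plain Poisson value, which is the excess-zero pattern that an uninflated model would misattribute to low disease incidence.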
Affiliation(s)
- Feifei Wang
  - Center for Applied Statistics, Renmin University of China, Beijing, China
  - School of Statistics, Renmin University of China, Beijing, China
- Haofeng Li
  - School of Statistics, Renmin University of China, Beijing, China
- Han Wang
  - Chengdu Center for Disease Prevention and Control, Chengdu, China
- Yang Li
  - Center for Applied Statistics, Renmin University of China, Beijing, China
  - School of Statistics, Renmin University of China, Beijing, China
5
Wah W, Ahern S, Earnest A. A systematic review of Bayesian spatial-temporal models on cancer incidence and mortality. Int J Public Health 2020; 65:673-682. [PMID: 32449006 DOI: 10.1007/s00038-020-01384-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/06/2019] [Revised: 04/26/2020] [Accepted: 05/02/2020] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVES: This study aimed to review the types and applications of fully Bayesian (FB) spatial-temporal models and covariates used to study cancer incidence and mortality. METHODS: This systematic review searched articles published within Medline, Embase, Web-of-Science and Google Scholar between 2014 and 2018. RESULTS: A total of 38 studies were included in our study. All studies applied Bayesian spatial-temporal models to explore spatial patterns over time, and over half assessed the association with risk factors. Studies used different modelling approaches and prior distributions for spatial, temporal and spatial-temporal interaction effects depending on the nature of data, outcomes and applications. The most common Bayesian spatial-temporal model was a generalized linear mixed model. These models adjusted for covariates at the patient, area or temporal level, and through standardization. CONCLUSIONS: Few studies (4 of 38, 11%) modelled patient-level clinical characteristics, and the applications of an FB approach in the forecasting of spatial-temporally aligned cancer data were limited. This review highlighted the need for Bayesian spatial-temporal models to incorporate patient-level prognostic characteristics through the multi-level framework and forecast future cancer incidence and outcomes for cancer prevention and control strategies.
Affiliation(s)
- Win Wah
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Susannah Ahern
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Arul Earnest
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
6
Carroll R, Lawson AB, Zhao S. Temporally dependent accelerated failure time model for capturing the impact of events that alter survival in disease mapping. Biostatistics 2019; 20:666-680. [PMID: 29939209 DOI: 10.1093/biostatistics/kxy023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/23/2017] [Revised: 03/08/2018] [Accepted: 04/24/2018] [Indexed: 11/15/2022] Open
Abstract
The introduction of spatial and temporal frailty parameters in survival models furnishes a way to represent unmeasured confounding in the outcome of interest. Using a Bayesian accelerated failure time model, we are able to flexibly explore a wide range of spatial and temporal options for structuring frailties as well as examine the benefits of using these different structures in certain settings. A setting of particular interest for this work involved using temporal frailties to capture the impact of events of interest on breast cancer survival. Our results suggest that it is important to include these temporal frailties when there is a true temporal structure to the outcome and including them when a true temporal structure is absent does not sacrifice model fit. Additionally, the frailties are able to correctly recover the truth imposed on simulated data without affecting the fixed effect estimates. In the case study involving Louisiana breast cancer-specific mortality, the temporal frailty played an important role in representing the unmeasured confounding related to improvements in knowledge, education, and disease screenings as well as the impacts of Hurricane Katrina and the passing of the Affordable Care Act. In conclusion, the incorporation of temporal, in addition to spatial, frailties in survival analysis can lead to better fitting models and improved inference by representing both spatially and temporally varying unmeasured risk factors and confounding that could impact survival. Specifically, we successfully estimated changes in survival around the time of events of interest.
Affiliation(s)
- Rachel Carroll
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 TW Alexander Dr., Research Triangle Park, NC, USA
- Andrew B Lawson
  - Department of Public Health Sciences, Medical University of South Carolina, 135 Cannon St., Charleston, SC, USA
- Shanshan Zhao
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 TW Alexander Dr., Research Triangle Park, NC, USA
7
Abstract
Dengue fever (DF) is one of the world's most disabling mosquito-borne diseases, with a variety of approaches available to model its spatial and temporal dynamics. This paper aims to identify and compare the different spatial and spatio-temporal Bayesian modelling methods that have been applied to DF and examine influential covariates that have been reportedly associated with the risk of DF. A systematic search was performed in December 2017, using Web of Science, Scopus, ScienceDirect, PubMed, ProQuest and Medline (via Ebscohost) electronic databases. The search was restricted to refereed journal articles published in English from January 2000 to November 2017. Thirty-one articles met the inclusion criteria. Using a modified quality assessment tool, the median quality score across studies was 14/16. The most popular Bayesian statistical approach to dengue modelling was a generalised linear mixed model with spatial random effects described by a conditional autoregressive prior. A limited number of studies included spatio-temporal random effects. Temperature and precipitation were shown to often influence the risk of dengue. Developing spatio-temporal random-effect models, considering other priors, using a dataset that covers an extended time period, and investigating other covariates would help to better understand and control DF transmission.
8
Lawson AB, Carroll R, Faes C, Kirby RS, Aregay M, Watjou K. Spatiotemporal multivariate mixture models for Bayesian model selection in disease mapping. Environmetrics 2017; 28:e2465. [PMID: 29230091 PMCID: PMC5722237 DOI: 10.1002/env.2465] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 05/31/2023]
Abstract
Researchers often wish to simultaneously explore the behavior of, and estimate overall risk for, multiple related diseases with varying rarity while accounting for potential spatial and/or temporal correlation. In this paper, we propose a flexible class of multivariate spatio-temporal mixture models to fill this role. Further, these models offer flexibility with the potential for model selection as well as the ability to accommodate lifestyle, socio-economic, and physical environmental variables with spatial, temporal, or both structures. Here, we explore the capability of this approach via a large-scale simulation study and examine a motivating data example involving three cancers in South Carolina. The results, which focus on four model variants, suggest that all models are able to recover the simulation ground truth and display improved model fit over two baseline Knorr-Held spatio-temporal interaction model variants in a real data application.
Affiliation(s)
- AB Lawson
  - Department of Public Health Sciences, Medical University of South Carolina
- R Carroll
  - Department of Public Health Sciences, Medical University of South Carolina
- C Faes
  - Interuniversity Institute for Statistics and Statistical Bioinformatics, Hasselt University
- RS Kirby
  - Department of Community and Family Health, University of South Florida
- M Aregay
  - Department of Public Health Sciences, Medical University of South Carolina
- K Watjou
  - Interuniversity Institute for Statistics and Statistical Bioinformatics, Hasselt University
9
Extensions to Multivariate Space Time Mixture Modeling of Small Area Cancer Data. International Journal of Environmental Research and Public Health 2017; 14:ijerph14050503. [PMID: 28486417 PMCID: PMC5451954 DOI: 10.3390/ijerph14050503] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/06/2017] [Revised: 05/03/2017] [Accepted: 05/05/2017] [Indexed: 11/16/2022]
Abstract
Oral cavity and pharynx cancer, even when considered together, is a fairly rare disease. Implementation of multivariate modeling with lung and bronchus cancer, as well as melanoma cancer of the skin, could lead to better inference for oral cavity and pharynx cancer. The multivariate structure of these models is accomplished via the use of shared random effects, as well as other multivariate prior distributions. The results in this paper indicate that care should be taken when executing these types of models, and that multivariate mixture models may not always be the ideal option, depending on the data of interest.
10
Carroll R, Lawson AB, Kirby RS, Faes C, Aregay M, Watjou K. Space-time variation of respiratory cancers in South Carolina: a flexible multivariate mixture modeling approach to risk estimation. Ann Epidemiol 2017; 27:42-51. [PMID: 27653555 PMCID: PMC5272780 DOI: 10.1016/j.annepidem.2016.08.014] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/09/2016] [Revised: 08/17/2016] [Accepted: 08/23/2016] [Indexed: 10/21/2022]
Abstract
PURPOSE: Many types of cancer have an underlying spatiotemporal distribution. Spatiotemporal mixture modeling can offer a flexible approach to risk estimation via the inclusion of latent variables. METHODS: In this article, we examine the application and benefits of using four different spatiotemporal mixture modeling methods in the modeling of cancer of the lung and bronchus as well as "other" respiratory cancer incidences in the state of South Carolina. RESULTS: Of the methods tested, no single method outperforms the other methods; which method is best depends on the cancer under consideration. The lung and bronchus cancer incidence outcome is best described by the univariate modeling formulation, whereas the "other" respiratory cancer incidence outcome is best described by the multivariate modeling formulation. CONCLUSIONS: Spatiotemporal multivariate mixture methods can aid in the modeling of cancers with small and sparse incidences when including information from a related, more common type of cancer.
Affiliation(s)
- Rachel Carroll
  - Department of Public Health, Medical University of South Carolina, Charleston
- Andrew B Lawson
  - Department of Public Health, Medical University of South Carolina, Charleston
- Russell S Kirby
  - Department of Community and Family Health, University of South Florida, Tampa
- Christel Faes
  - Interuniversity Institute for Statistics and Statistical Bioinformatics, Hasselt University, Agoralaan 1, Diepenbeek, Belgium
- Mehreteab Aregay
  - Department of Public Health, Medical University of South Carolina, Charleston
- Kevin Watjou
  - Interuniversity Institute for Statistics and Statistical Bioinformatics, Hasselt University, Agoralaan 1, Diepenbeek, Belgium