1
Assessing the dynamics and impact of COVID-19 vaccination on disease spread: A data-driven approach. Infect Dis Model 2024; 9:527-556. [PMID: 38525308 PMCID: PMC10958481 DOI: 10.1016/j.idm.2024.02.010] [Received: 11/26/2023] [Revised: 02/23/2024] [Accepted: 02/23/2024] [Indexed: 03/26/2024]
Abstract
The COVID-19 pandemic has significantly impacted global health, social, and economic conditions since its emergence in December 2019. The primary focus of this study is to propose a distinct vaccination policy and assess its impact on controlling COVID-19 transmission in Malaysia using a Bayesian data-driven approach, concentrating on the year 2021. We employ a compartmental Susceptible-Exposed-Infected-Recovered-Vaccinated (SEIRV) model, incorporating a time-varying transmission rate estimated with a data-driven Exploratory Data Analysis (EDA) approach. Because no vaccine guarantees total immunity and vaccine-induced immunity wanes over time, it is critical to estimate vaccine efficacy accurately and to include a constant waning factor, so that the dynamics of vaccine-induced protection are simulated realistically. Based on the distribution and effectiveness of vaccines, we integrated a data-driven estimate of vaccine efficacy, calculated at 75% for Malaysia, underscoring the model's realism and relevance to the country's specific context. A Bayesian inference framework is used to assimilate various data sources and account for underlying uncertainties in model parameters. The model is fitted to real-world data from Malaysia to analyze disease spread trends and evaluate the effectiveness of our proposed vaccination policy. Our findings reveal that this vaccination policy, which emphasizes an accelerated vaccination rate during the initial stages of the program, is highly effective in mitigating the spread of COVID-19 and substantially reducing the pandemic peak and new infections. The study found that vaccinating 57-66% of the population (as opposed to the 76% observed in the real data) under a better-planned policy such as the one proposed here can significantly reduce the number of new infections and, ultimately, the associated costs.
The study contributes a robust and informative representation of COVID-19 transmission and vaccination, offering valuable insights for policymakers on the potential benefits and limitations of different vaccination policies, and particularly highlighting the importance of a well-planned and efficient vaccination rollout strategy. While the methodology is applied here to national data from Malaysia, its successful application to local regions within Malaysia, such as Selangor and Johor, indicates its adaptability and potential for broader use. This demonstrates the model's suitability for policy assessment and improvement across various demographic and epidemiological landscapes, and suggests its usefulness for similar datasets from other geographical regions.
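The compartmental flow described in this abstract can be sketched as a toy discrete-time SEIRV simulation. This is a minimal illustration with invented parameter values (beta, sigma, gamma, nu, eff below are placeholders, not the paper's Bayesian estimates), showing why front-loading the vaccination rate lowers the epidemic peak:

```python
# Toy discrete-time SEIRV model; all parameter values are illustrative
# placeholders, not the paper's fitted, time-varying estimates.
def peak_infected(days=300, beta=0.35, sigma=0.2, gamma=1/7,
                  nu=0.002, eff=0.75):
    S, E, I, R, V = 0.9999, 0.0, 0.0001, 0.0, 0.0
    peak = 0.0
    for _ in range(days):
        new_exp = beta * S * I          # new exposures per day
        vac = nu * S * eff              # effectively immunized per day
        S += -new_exp - vac
        E += new_exp - sigma * E
        I += sigma * E - gamma * I
        R += gamma * I
        V += vac
        peak = max(peak, I)
    return peak

# Accelerating the rollout (higher nu) flattens the epidemic peak:
slow, fast = peak_infected(nu=0.002), peak_infected(nu=0.01)
print(slow > fast)  # the faster rollout yields the lower peak
```

The daily Euler step is crude but sufficient to reproduce the qualitative finding that an accelerated early rollout suppresses the peak.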
2
A multi-fidelity approach for reliability-based risk assessment of single-vehicle crashes. Accident Analysis & Prevention 2024; 195:107391. [PMID: 38007876 DOI: 10.1016/j.aap.2023.107391] [Received: 03/24/2023] [Revised: 11/02/2023] [Accepted: 11/16/2023] [Indexed: 11/28/2023]
Abstract
Road vehicles are highly susceptible to single-vehicle crashes (SVCs) under complex road geometry and inclement weather, which can significantly threaten the safety and mobility of the whole traffic system. Most existing studies involve various simplifications and approximations to assess the associated SVC risks promptly, and the assessment accuracy is therefore often compromised. A novel multi-fidelity approach is developed for the reliability-based risk assessment of SVCs to balance simulation accuracy and efficiency. Specifically, a high-fidelity transient dynamic vehicle model is introduced for a robust estimation of the vehicle dynamics under various driving environments, assisted by a low-fidelity simplified physics-based vehicle model to improve computational efficiency. Based on the simulations of the two models, a new multi-fidelity improved cross-entropy-based importance sampling (MFICE) algorithm is proposed for integrating multi-fidelity information and facilitating accurate and efficient reliability analysis. Five demonstrative cases are studied to evaluate the performance of the proposed approach, including comparisons with existing representative approaches. The results show that the proposed multi-fidelity approach can evaluate the reliability of SVCs both accurately and efficiently, with clearly superior performance over typical state-of-the-art counterparts. The proposed approach therefore bears great potential for developing proactive and near real-time intelligent traffic operation and management strategies against SVCs in both normal and hazardous conditions.
3
Forecasting Pathogen Dynamics with Bayesian Model-Averaging: Application to Xylella fastidiosa. Bull Math Biol 2023; 85:67. [PMID: 37300801 PMCID: PMC10257384 DOI: 10.1007/s11538-023-01169-w] [Received: 08/26/2022] [Accepted: 05/15/2023] [Indexed: 06/12/2023]
Abstract
Forecasting invasive-pathogen dynamics is paramount for anticipating eradication and containment strategies. Such predictions can be obtained using a model grounded on partial differential equations (PDEs; often exploited to model invasions) and fitted to surveillance data. This framework allows the construction of phenomenological but concise models relying on mechanistic hypotheses and real observations. However, it may lead to models with overly rigid behavior and possible data-model mismatches. Hence, to avoid drawing a forecast grounded on a single PDE-based model that would be prone to errors, we propose to apply Bayesian model averaging (BMA), which allows us to account for both parameter and model uncertainties. Thus, we propose a set of competing PDE-based models for representing the pathogen dynamics; use an adaptive multiple importance sampling (AMIS) algorithm to estimate the parameters of each competing model from surveillance data in a mechanistic-statistical framework; evaluate the posterior probabilities of the models by comparing different approaches proposed in the literature; and apply BMA to draw posterior distributions of parameters and a posterior forecast of the pathogen dynamics. This approach is applied to predict the extent of Xylella fastidiosa, a phytopathogenic bacterium first detected in situ in Europe less than 10 years ago (Italy, 2013; France, 2015), in South Corsica, France. Separating the data into training and validation sets, we show that the BMA forecast outperforms competing forecast approaches.
4
An intuitive framework for Bayesian posterior simulation methods. Global Epidemiology 2021; 3:100060. [PMID: 37635729 PMCID: PMC10445998 DOI: 10.1016/j.gloepi.2021.100060] [Received: 02/17/2021] [Revised: 08/10/2021] [Accepted: 08/10/2021] [Indexed: 10/20/2022]
Abstract
Purpose Bayesian inference has become popular. It offers several pragmatic approaches to accounting for uncertainty in inferential decision-making. Various estimation methods have been introduced to implement Bayesian methods. Although these algorithms are powerful, they are not always easy to grasp for non-statisticians. This paper aims to provide an intuitive framework covering four essential Bayesian computational methods for epidemiologists and other health researchers. We do not cover an extensive mathematical discussion of these approaches, but instead offer a non-quantitative description of the algorithms and provide some illuminating examples. Materials and methods Four Bayesian computational methods, namely importance sampling, rejection sampling, Markov chain Monte Carlo (MCMC), and data augmentation, are presented. Results and conclusions The substantial amount of research published on Bayesian inference highlights its popularity among researchers, yet the basic concepts are not always straightforward for interested learners. We show that alternative approaches, such as a weighted-prior approach, which are intuitively appealing and easy to understand, work well for low-dimensional problems with appropriate prior information; in other cases, MCMC is a trouble-free tool.
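The weighted-prior idea mentioned in the conclusions can be shown in a minimal sketch, assuming a toy normal model with invented data: draws from the prior are simply reweighted by their likelihood to approximate the posterior.

```python
import math
import random

random.seed(0)

# Toy model (invented data): y_i ~ N(theta, 1) with prior theta ~ N(0, 2^2).
y = [1.2, 0.8, 1.5, 1.1]

def log_lik(theta):
    return -0.5 * sum((yi - theta) ** 2 for yi in y)

# Weighted-prior importance sampling: draw from the prior and weight
# each draw by its likelihood.
draws = [random.gauss(0.0, 2.0) for _ in range(20000)]
logw = [log_lik(t) for t in draws]
m = max(logw)
w = [math.exp(lw - m) for lw in logw]          # stabilized weights
post_mean = sum(wi * ti for wi, ti in zip(w, draws)) / sum(w)
print(post_mean)  # close to the conjugate answer 4.6/4.25, about 1.08
```

This works well here precisely because the problem is low-dimensional and the prior overlaps the posterior; in higher dimensions the weights degenerate, which is when MCMC becomes the more reliable tool.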
5
Comparing the performance of first-order conditional estimation (FOCE) and different expectation-maximization (EM) methods in NONMEM: real data experience with complex nonlinear parent-metabolite pharmacokinetic model. J Pharmacokinet Pharmacodyn 2021; 48:581-595. [PMID: 33884580 DOI: 10.1007/s10928-021-09753-0] [Received: 11/25/2020] [Accepted: 03/24/2021] [Indexed: 11/25/2022]
Abstract
First-order conditional estimation (FOCE) has been the most frequently used estimation method in NONMEM, a leading program for population pharmacokinetic/pharmacodynamic modeling. However, with growing data complexity, the performance of FOCE is challenged by long run times, convergence problems and model instability. In NONMEM 7, expectation-maximization (EM) estimation methods and FOCE with the FAST option (FOCE FAST) were introduced. In this study, we compared the performance of FOCE, FOCE FAST, and two EM methods, namely importance sampling (IMP) and stochastic approximation expectation-maximization (SAEM), utilizing the rich pharmacokinetic data of oxfendazole and its two metabolites obtained from the first-in-human single ascending dose study in healthy adults. All methods yielded similar parameter estimates, but great differences were observed in parameter precision and modeling time. For simpler models (i.e., models of oxfendazole and/or oxfendazole sulfone), FOCE and FOCE FAST were more efficient than the EM methods, with shorter run times and comparable parameter precision. FOCE FAST was about two times faster than FOCE, but it was prone to premature termination. For the most complex model (i.e., the model of all three analytes, one of which had a high proportion of data below the quantification limit), FOCE failed to reliably assess parameter precision, while the parameter precision obtained by IMP and SAEM was similar, with SAEM being the faster method. IMP was more sensitive to model misspecification; without pre-systemic metabolism, the IMP analysis failed to converge. With parallel computing introduced in NONMEM 7.2, modeling speed increased less than proportionally with the increase in the number of CPUs from 1 to 16.
6
Accurate confidence intervals for risk difference in meta-analysis with rare events. BMC Med Res Methodol 2020; 20:98. [PMID: 32349702 PMCID: PMC7191692 DOI: 10.1186/s12874-020-00954-8] [Received: 09/26/2019] [Accepted: 03/17/2020] [Indexed: 12/29/2022]
Abstract
BACKGROUND Meta-analysis provides a useful statistical tool to effectively estimate treatment effects from multiple studies. When the outcome is binary and rare (e.g., safety data in clinical trials), the traditionally used methods may have unsatisfactory performance. METHODS We propose using importance sampling to compute confidence intervals for the risk difference in meta-analysis with rare events. The proposed intervals are not exact, but they often have coverage probabilities close to the nominal level. We compare the proposed accurate intervals with the existing intervals from the fixed- or random-effects models and the interval of Tian et al. (2009). RESULTS We conduct extensive simulation studies to compare the methods with regard to coverage probability and average length, when data are simulated under the homogeneity or heterogeneity assumption of study effects. CONCLUSIONS The proposed accurate interval based on the random-effects model for sample-space ordering generally has satisfactory performance under the heterogeneity assumption, while the traditionally used interval based on the fixed-effects model works well when the studies are homogeneous.
7
Allele frequency spectra in structured populations: Novel-allele probabilities under the labelled coalescent. Theor Popul Biol 2020; 133:130-140. [PMID: 32142714 DOI: 10.1016/j.tpb.2020.01.002] [Received: 08/11/2019] [Revised: 01/01/2020] [Accepted: 01/03/2020] [Indexed: 10/24/2022]
Abstract
We address the effect of population structure on key properties of the Ewens sampling formula. We use our previously-introduced inductive method for determining exact allele frequency spectrum (AFS) probabilities under the infinite-allele model of mutation and population structure for samples of arbitrary size. Fundamental to the sampling distribution is the novel-allele probability, the probability that given the pattern of variation in the present sample, the next gene sampled belongs to an as-yet-unobserved allelic class. Unlike the case for panmictic populations, the novel-allele probability depends on the AFS of the present sample. We derive a recursion that directly provides the marginal novel-allele probability across AFSs, obviating the need first to determine the probability of each AFS. Our explorations suggest that the marginal novel-allele probability tends to be greater for initial samples comprising fewer alleles and for sampling configurations in which the next-observed gene derives from a deme different from that of the majority of the present sample. Comparison to the efficient importance sampling proposals developed by De Iorio and Griffiths and colleagues indicates that their approximation for the novel-allele probability generally agrees with the true marginal, although it may tend to overestimate the marginal in cases in which the novel-allele probability is high and migration rates are low.
8
Bayesian model discrimination for partially-observed epidemic models. Math Biosci 2019; 317:108266. [PMID: 31589881 DOI: 10.1016/j.mbs.2019.108266] [Received: 04/30/2019] [Revised: 08/22/2019] [Accepted: 09/27/2019] [Indexed: 10/25/2022]
Abstract
An efficient method for Bayesian model selection is presented for a broad class of continuous-time Markov chain models and is subsequently applied to two important problems in epidemiology. The first problem is to identify the shape of the infectious period distribution; the second is to determine whether individuals display symptoms before, at the same time as, or after they become infectious. In both cases we show that the correct model can be identified, in the majority of cases, from symptom onset data generated from multiple outbreaks in small populations. The method works by evaluating the likelihood using a particle filter that incorporates a novel importance sampling algorithm designed for partially-observed continuous-time Markov chains. This is combined with another importance sampling method to unbiasedly estimate the model evidence. Both come with estimates of precision, which allow stopping criteria to be employed. Our method is general and can be applied to a wide range of model selection problems in biological and epidemiological systems with intractable likelihood functions.
9
Dataset for reservoir impoundment operation coupling parallel dynamic programming with importance sampling and successive approximation. Data Brief 2019; 26:104440. [PMID: 31516958 PMCID: PMC6736769 DOI: 10.1016/j.dib.2019.104440] [Received: 07/25/2019] [Revised: 08/10/2019] [Accepted: 08/20/2019] [Indexed: 11/29/2022]
Abstract
The dataset contains reservoir characteristic parameters, streamflow series of reservoirs in the upper Yangtze River, and the standard operating rules (SORs) and seasonal top-of-buffer pools (seasonal TBPs) for these reservoirs, which were provided by the Yangtze River Commission. Moreover, the annual hydropower of these reservoirs is tested to evaluate operation performance. These research materials are related to the research article in Advances in Water Resources entitled 'Optimal impoundment operation for cascade reservoirs coupling parallel dynamic programming with importance sampling and successive approximation' (He et al., 2019). The dataset can be used to derive optimal operating rules and to explore the potential benefits of water resources via our proposed algorithm (importance sampling - parallel dynamic programming, IS-PDP) under different runoff scenarios. It can also be applied more broadly in water resources management and by other potential users.
10
Data-Driven Method for Efficient Characterization of Rare Event Probabilities in Biochemical Systems. Bull Math Biol 2018; 81:3097-3120. [PMID: 30225593 PMCID: PMC6677716 DOI: 10.1007/s11538-018-0509-0] [Received: 01/21/2018] [Accepted: 09/07/2018] [Indexed: 11/24/2022]
Abstract
As mathematical models and computational tools become sophisticated and powerful enough to accurately depict system dynamics, numerical methods that were previously considered computationally impractical have started being utilized for large-scale simulations. Methods that characterize a rare event in biochemical systems are part of this phenomenon, as many of them are computationally expensive and require high-performance computing. In this paper, we introduce an enhanced version of the doubly weighted stochastic simulation algorithm (dwSSA) (Daigle et al. in J Chem Phys 134:044110, 2011), called dwSSA++, that significantly improves the speed of convergence to the rare event of interest when the conventional multilevel cross-entropy method in dwSSA is either unable to converge or converges very slowly. This achievement is enabled by a novel polynomial leaping method that uses past data to detect slow convergence and attempts to push the system toward the rare event. We demonstrate the performance of dwSSA++ on two systems—a susceptible–infectious–recovered–susceptible disease dynamics model and a yeast polarization model—and compare its computational efficiency to that of dwSSA.
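For readers unfamiliar with biased rare-event simulation, a minimal importance-sampling sketch (far simpler than dwSSA's state-dependent biasing, and using an assumed Gaussian toy target) conveys the core idea of sampling from a tilted proposal and reweighting:

```python
import math
import random

random.seed(1)

# Estimate the rare tail probability p = P(Z > 4), Z ~ N(0, 1), by sampling
# from the shifted proposal N(mu, 1) and reweighting by the density ratio
# phi(x) / phi(x - mu) = exp(-mu * x + mu**2 / 2).
mu, n = 4.0, 100_000
total = 0.0
for _ in range(n):
    x = random.gauss(mu, 1.0)
    if x > 4.0:
        total += math.exp(-mu * x + mu * mu / 2.0)
p_hat = total / n
print(p_hat)  # near the true value of about 3.2e-5, which a naive
              # estimator would need vastly more samples to pin down
```

Half of the proposal draws land in the rare region, so the estimator has low variance; methods like dwSSA++ automate the search for an analogous bias in high-dimensional stochastic chemical kinetics.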
11
Abstract
Frequentist standard errors are a measure of uncertainty of an estimator and the basis for statistical inferences. Frequentist standard errors can also be derived for Bayes estimators. However, except in special cases, computing the standard error of a Bayesian estimator requires bootstrapping, which in combination with Markov chain Monte Carlo (MCMC) can be highly time-consuming. We discuss an alternative approach for computing frequentist standard errors of Bayesian estimators, including importance sampling. Through several numerical examples we show that our approach can be much more computationally efficient than the standard bootstrap.
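One such alternative can be sketched under an assumed conjugate normal model (not necessarily the construction used in the article): draw a single posterior sample, then importance-reweight it for each bootstrap replicate instead of refitting by MCMC every time.

```python
import math
import random
import statistics

random.seed(4)

# Toy conjugate model: y_i ~ N(theta, 1) with an essentially flat prior,
# so the posterior can be sampled directly once.
n = 100
y = [random.gauss(1.0, 1.0) for _ in range(n)]
ybar = sum(y) / n
draws = [random.gauss(ybar, 1 / math.sqrt(n)) for _ in range(4000)]

def reweighted_posterior_mean(data):
    # log-weight = log p(data | t) - log p(y | t); for this normal model
    # the quadratic terms cancel, leaving t * (sum(data) - sum(y)).
    diff = sum(data) - sum(y)
    logw = [t * diff for t in draws]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    return sum(wi * ti for wi, ti in zip(w, draws)) / sum(w)

# Bootstrap the data, reusing the single posterior sample via reweighting
reps = []
for _ in range(200):
    boot = [random.choice(y) for _ in range(n)]
    reps.append(reweighted_posterior_mean(boot))
se = statistics.stdev(reps)
print(se)  # should sit near the analytic SE 1/sqrt(100) = 0.1
```

The per-replicate cost drops from a full posterior fit to one reweighting pass, which is the source of the computational savings the abstract describes.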
12
Abstract
BACKGROUND Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for analysis of genomics data as well as data from other scientific fields. Owing to the ever larger data sets encountered in genomics and the multiple-testing issues accompanying them, accurate significance evaluation is of great importance. We here address the problem of evaluating the statistical significance of observations from factor graph models. RESULTS Two novel numerical approximations for the evaluation of statistical significance are presented: first, a method using importance sampling; second, a method based on a saddlepoint approximation. We develop algorithms to efficiently compute the approximations and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both from a theoretical viewpoint and with simulations. A guideline for choosing between the normal approximation, the saddlepoint approximation and importance sampling is also provided. Finally, the applicability of the methods is demonstrated with examples from cancer genomics, motif analysis and phylogenetics. CONCLUSIONS The applicability of the saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods we can substantially reduce computational cost without compromising accuracy. This contribution allows analyses of large datasets in the general factor graph framework.
13
Resampling: An improvement of importance sampling in varying population size models. Theor Popul Biol 2016; 114:70-87. [PMID: 27712980 DOI: 10.1016/j.tpb.2016.09.002] [Received: 01/31/2016] [Revised: 09/01/2016] [Accepted: 09/06/2016] [Indexed: 11/21/2022]
Abstract
Sequential importance sampling algorithms have been defined to estimate likelihoods in models of ancestral population processes. However, these algorithms are based on features of the models with constant population size, and become inefficient when the population size varies in time, making likelihood-based inferences difficult in many demographic situations. In this work, we modify a previous sequential importance sampling algorithm to improve the efficiency of the likelihood estimation. Our procedure is still based on features of the model with constant size, but uses a resampling technique with a new resampling probability distribution that depends on the pairwise composite likelihood. We tested our algorithm, called sequential importance sampling with resampling (SISR), on simulated data sets under different demographic cases. In most cases, we divided the computational cost by two for the same accuracy of inference, and in some cases even by one hundred. This study provides the first assessment of the impact of such resampling techniques on parameter inference using sequential importance sampling, and extends the range of situations where likelihood inferences can easily be performed.
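The role of resampling in sequential importance sampling can be illustrated on a toy Gaussian state-space model (a deliberately simple stand-in for the coalescent setting; all values below are invented). Resampling discards low-weight trajectories so the particle cloud does not degenerate:

```python
import math
import random

random.seed(2)

# Toy random-walk state-space model: x_t = x_{t-1} + N(0,1),
# y_t = x_t + N(0,1); simulate a synthetic observation sequence.
T, N = 30, 500
x, ys = 0.0, []
for _ in range(T):
    x += random.gauss(0, 1)
    ys.append(x + random.gauss(0, 1))

# Sequential importance sampling with multinomial resampling
particles = [0.0] * N
loglik = 0.0
for y in ys:
    particles = [p + random.gauss(0, 1) for p in particles]   # propagate
    w = [math.exp(-0.5 * (y - p) ** 2) for p in particles]    # weight
    s = sum(w)
    loglik += math.log(s / N) - 0.5 * math.log(2 * math.pi)   # evidence term
    particles = random.choices(particles, weights=w, k=N)     # resample
print(loglik)
```

Without the `random.choices` resampling step, nearly all weight would concentrate on a handful of trajectories after a few dozen observations; SISR's contribution is choosing a smarter resampling distribution (via pairwise composite likelihood) rather than the plain multinomial scheme shown here.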
14
Two new methods to fit models for network meta-analysis with random inconsistency effects. BMC Med Res Methodol 2016; 16:87. [PMID: 27465416 PMCID: PMC4964019 DOI: 10.1186/s12874-016-0184-5] [Received: 03/23/2016] [Accepted: 07/03/2016] [Indexed: 11/10/2022]
Abstract
Background Meta-analysis is a valuable tool for combining evidence from multiple studies. Network meta-analysis is becoming more widely used as a means to compare multiple treatments in the same analysis. However, a network meta-analysis may exhibit inconsistency, whereby the treatment effect estimates do not agree across all trial designs, even after taking between-study heterogeneity into account. We propose two new estimation methods for network meta-analysis models with random inconsistency effects. Methods The model we consider is an extension of the conventional random-effects model for meta-analysis to the network meta-analysis setting and allows for potential inconsistency using random inconsistency effects. Our first new estimation method uses a Bayesian framework with empirically-based prior distributions for both the heterogeneity and the inconsistency variances. We fit the model using importance sampling and thereby avoid some of the difficulties that might be associated with using Markov Chain Monte Carlo (MCMC). However, we confirm the accuracy of our importance sampling method by comparing the results to those obtained using MCMC as the gold standard. The second new estimation method we describe uses a likelihood-based approach, implemented in the metafor package, which can be used to obtain (restricted) maximum-likelihood estimates of the model parameters and profile likelihood confidence intervals of the variance components. Results We illustrate the application of the methods using two contrasting examples. The first uses all-cause mortality as an outcome, and shows little evidence of between-study heterogeneity or inconsistency. The second uses "ear discharge" as an outcome, and exhibits substantial between-study heterogeneity and inconsistency. Both new estimation methods give results similar to those obtained using MCMC. Conclusions The extent of heterogeneity and inconsistency should be assessed and reported in any network meta-analysis.
Our two new methods can be used to fit models for network meta-analysis with random inconsistency effects. They are easily implemented using the accompanying R code in Additional file 1. Using these estimation methods, the extent of inconsistency can be assessed and reported. Electronic supplementary material The online version of this article (doi:10.1186/s12874-016-0184-5) contains supplementary material, which is available to authorized users.
15
Abstract
We use exponential tilting to obtain versions of asymptotic formulae for Bayesian computation that do not involve conditional maxima of the likelihood function, yielding a more stable computational procedure and significantly reducing computational time. In particular we present an alternative version of the Laplace approximation for a marginal posterior density. Implementation of the asymptotic formulae and a modified signed root based importance sampler are illustrated with an example.
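As a point of reference for the approximation being modified here, a bare-bones numerical Laplace sketch on an assumed toy density (not the paper's tilted, signed-root version) shows the basic recipe: expand the log-density to second order around its mode and integrate the resulting Gaussian analytically.

```python
import math

# Toy unnormalized log-density h(t) = log(t^2 * (1 - t)) on (0, 1);
# its exact normalizing integral is B(3, 2) = 1/12, about 0.0833.
def h(t):
    return 2 * math.log(t) + math.log(1 - t)

# Laplace approximation: locate the mode, estimate the curvature there,
# and integrate the matching Gaussian in closed form.
mode = max((i / 10000 for i in range(1, 10000)), key=h)  # crude grid search
eps = 1e-4
curv = (h(mode + eps) - 2 * h(mode) + h(mode - eps)) / eps ** 2
laplace = math.exp(h(mode)) * math.sqrt(2 * math.pi / -curv)
print(laplace)  # roughly 0.10, versus the exact 1/12: right order,
                # with the typical error of a plain Laplace expansion
```

The gap between 0.10 and 1/12 is exactly the kind of error that refinements such as exponential tilting and signed-root importance samplers aim to shrink while keeping the computation stable.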
16
Inferring the demographic history from DNA sequences: An importance sampling approach based on non-homogeneous processes. Theor Popul Biol 2016; 111:16-27. [PMID: 27241900 DOI: 10.1016/j.tpb.2016.05.004] [Received: 12/13/2015] [Revised: 05/14/2016] [Accepted: 05/19/2016] [Indexed: 11/25/2022]
Abstract
In Ait Kaci Azzou et al. (2015) we introduced an importance sampling (IS) approach for estimating the demographic history of a sample of DNA sequences, the skywis plot. More precisely, we proposed a new nonparametric estimate of a population size that changes over time. We showed on simulated data that the skywis plot can work well in typical situations where the effective population size does not undergo very steep changes. In this paper, we introduce an iterative procedure which extends the previous method and gives good estimates under such rapid variations. In the iterative calibrated skywis plot we approximate the effective population size by a piecewise constant function, whose values are re-estimated at each step. These piecewise constant functions are used to generate the waiting times of non-homogeneous Poisson processes related to a coalescent process with mutation under a variable population size model. Moreover, the present IS procedure is based on a modified version of the Stephens and Donnelly (2000) proposal distribution. Finally, we apply the iterative calibrated skywis plot method to a data set simulated from a rapidly expanding exponential model, and we show that the method based on this new IS strategy correctly reconstructs the demographic history.
17
Accelerated failure time model under general biased sampling scheme. Biostatistics 2016; 17:576-88. [PMID: 26941240 DOI: 10.1093/biostatistics/kxw008] [Received: 11/18/2014] [Accepted: 01/28/2016] [Indexed: 11/13/2022]
Abstract
Right-censored time-to-event data are sometimes observed from a (sub)cohort of patients whose survival times can be subject to outcome-dependent sampling schemes. In this paper, we propose a unified estimation method for semiparametric accelerated failure time models under general biased sampling schemes. The proposed estimator of the regression coefficients is developed upon a bias-offsetting weighting scheme and is proved to be consistent and asymptotically normally distributed. Large-sample properties of the estimator are also derived. Using rank-based monotone estimating functions for the regression parameters, we find that the estimating equations can be easily solved via convex optimization. The methods are confirmed through simulations and illustrated by application to real datasets with various sampling schemes, including length-biased sampling, the case-cohort design and its variants.
18
Coalescent: an open-science framework for importance sampling in coalescent theory. PeerJ 2015; 3:e1203. [PMID: 26312189 PMCID: PMC4548476 DOI: 10.7717/peerj.1203] [Received: 05/25/2014] [Accepted: 07/30/2015] [Indexed: 11/20/2022]
Abstract
Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve the statistical efficiency of computations, but unfortunately, in the absence of general computer frameworks on importance sampling, researchers often struggle to translate new sampling schemes computationally or benchmark against different schemes, in a manner that is reliable and maintainable. Moreover, most studies use computer programs lacking a convenient user interface or the flexibility to meet the current demands of open science. In particular, current computer frameworks can only evaluate the efficiency of a single importance sampling scheme or compare the efficiencies of different schemes in an ad hoc manner. Results. We have designed a general framework (http://coalescent.sourceforge.net; language: Java; License: GPLv3) for importance sampling that computes likelihoods under the standard neutral coalescent model of a single, well-mixed population of constant size over time, following the infinite-sites model of mutation. The framework models the necessary core concepts, comes integrated with several data sets of varying size, implements the standard competing proposals, and integrates tightly with our previous framework for calculating exact probabilities. For a given dataset, it computes the likelihood and provides the maximum likelihood estimate of the mutation parameter. Well-known benchmarks in the coalescent literature validate the accuracy of the framework. The framework provides an intuitive user interface with minimal clutter. For performance, the framework switches automatically to modern multicore hardware, if available. It runs on three major platforms (Windows, Mac and Linux). Extensive tests and coverage make the framework reliable and maintainable. Conclusions.
In coalescent theory, many studies of computational efficiency consider only effective sample size. Here, we evaluate proposals in the coalescent literature, to discover that the order of efficiency among the three importance sampling schemes changes when one considers running time as well as effective sample size. We also describe a computational technique called "just-in-time delegation" available to improve the trade-off between running time and precision by constructing improved importance sampling schemes from existing ones. Thus, our systems approach is a potential solution to the "2(8) programs problem" highlighted by Felsenstein, because it provides the flexibility to include or exclude various features of similar coalescent models or importance sampling schemes.
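The framework itself is written in Java; as a language-agnostic sketch of the core computation the abstract assumes — estimating a quantity by drawing from a proposal density and weighting by the target-to-proposal ratio, then judging statistical efficiency by effective sample size — here is a minimal importance sampler in Python. The function names and the rare-event example are illustrative assumptions, not part of the framework.

```python
import math
import random

def importance_sample(f, target_pdf, proposal_pdf, draw, n=100_000):
    """Estimate E_p[f(X)] by drawing from q and weighting by p(x)/q(x).
    Returns the estimate and the effective sample size of the weights."""
    weights, values = [], []
    for _ in range(n):
        x = draw()
        w = target_pdf(x) / proposal_pdf(x)
        weights.append(w)
        values.append(w * f(x))
    estimate = sum(values) / n
    ess = sum(weights) ** 2 / sum(w * w for w in weights)  # Kong's ESS
    return estimate, ess

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Rare-event example: P(X > 3) for X ~ N(0, 1), using an N(3, 1) proposal
# centred on the region of interest (true value is about 0.00135).
random.seed(0)
est, ess = importance_sample(
    f=lambda x: 1.0 if x > 3.0 else 0.0,
    target_pdf=normal_pdf,
    proposal_pdf=lambda x: normal_pdf(x, mu=3.0),
    draw=lambda: random.gauss(3.0, 1.0),
)
```

Note that this toy run also illustrates the abstract's caution: the raw effective sample size here is small even though the rare-event estimate is accurate, because the large weights occur where the integrand is zero.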
|
19
|
Likelihood-free simulation-based optimal design with an application to spatial extremes. Stoch Environ Res Risk Assess 2015; 30:481-492. [PMID: 27563280 PMCID: PMC4981187 DOI: 10.1007/s00477-015-1067-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
In this paper we employ a novel method to find the optimal design for problems where the likelihood is not available analytically, but simulation from the likelihood is feasible. To approximate the expected utility we make use of approximate Bayesian computation methods. We detail the approach for a model on spatial extremes, where the goal is to find the optimal design for efficiently estimating the parameters determining the dependence structure. The method is applied to determine the optimal design of weather stations for modeling maximum annual summer temperatures.
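The paper's setting is optimal design for spatial extremes; as a generic illustration of its likelihood-free ingredient — approximating posterior quantities when one can simulate from the model but cannot evaluate its likelihood — here is a minimal ABC rejection sampler in Python on a toy problem. All names and the toy model are illustrative assumptions, not the authors' implementation.

```python
import random

def abc_rejection(obs_summary, prior_draw, simulate, summary, eps, n_draws=50_000):
    """Likelihood-free posterior sampling: keep a parameter draw whenever
    its simulated summary statistic lands within eps of the observed one."""
    kept = []
    for _ in range(n_draws):
        theta = prior_draw()
        if abs(summary(simulate(theta)) - obs_summary) < eps:
            kept.append(theta)
    return kept

# Toy problem: infer the mean of a normal with known unit variance,
# using the sample mean as the summary statistic.
random.seed(1)
data = [random.gauss(2.0, 1.0) for _ in range(50)]
obs = sum(data) / len(data)

posterior = abc_rejection(
    obs_summary=obs,
    prior_draw=lambda: random.uniform(-5.0, 5.0),
    simulate=lambda th: [random.gauss(th, 1.0) for _ in range(50)],
    summary=lambda xs: sum(xs) / len(xs),
    eps=0.1,
)
posterior_mean = sum(posterior) / len(posterior)
```

In an optimal-design loop such as the paper's, an approximation like this would be evaluated once per candidate design to score its expected utility.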
|
20
|
Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling. Proc SIAM Int Conf Data Min 2014:722-730. [PMID: 25568835 DOI: 10.1137/1.9781611973440.83] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022]
Abstract
Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied, with many interesting algorithms developed, but only a few approaches reported in the literature address the intersection of the two fields, owing to their complex interplay. In this work, we propose an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams with imbalanced distributions. Two tightly incorporated components address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups, with each sub-classifier (a single classifier or an ensemble) weighted by its discriminative power and stability. The uneven class distribution, in turn, is handled by the sub-classifier built on a specific feature group, with the underlying distribution rebalanced by importance sampling. We derive a theoretical upper bound on the generalization error of the proposed algorithm. We also study the empirical performance of our method on a set of benchmark synthetic and real-world data sets, achieving significant improvement over competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
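DFGW-IS itself is not spelled out in the abstract; as a sketch of just its rebalancing ingredient — importance sampling with inverse class-frequency weights so the resampled data behave as if drawn from a balanced class distribution — here is a minimal Python illustration. The helper names and toy stream chunk are hypothetical.

```python
import random
from collections import Counter

def class_balance_weights(labels):
    """Inverse-frequency importance weights: under these weights each
    class contributes equal total mass, as if the stream were balanced."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]

def resample_balanced(data, labels, n_out, seed=0):
    """Draw a rebalanced sample by weighted resampling."""
    rng = random.Random(seed)
    weights = class_balance_weights(labels)
    idx = rng.choices(range(len(data)), weights=weights, k=n_out)
    return [data[i] for i in idx], [labels[i] for i in idx]

# Toy chunk of a stream: 95 majority-class and 5 minority-class examples.
data = list(range(100))
labels = [0] * 95 + [1] * 5
_, resampled_labels = resample_balanced(data, labels, n_out=10_000)
minority_share = sum(resampled_labels) / len(resampled_labels)  # ≈ 0.5
```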
|
21
|
Abstract
For the analysis of longitudinal data with nonignorable and nonmonotone missing responses, a full likelihood method often requires intensive computation, especially when there are many follow-up times. The authors propose and explore a Monte Carlo method, based on importance sampling, for approximating the maximum likelihood estimators. The finite-sample properties of the proposed estimators are studied using simulations. An application of the proposed method is also provided, using longitudinal data on peptide intensities obtained from a proteomics experiment on trauma patients.
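The abstract does not give the estimator's details; as a toy illustration of the general idea — approximating an observed-data likelihood by importance sampling over the missing response, then maximizing — here is a sketch in Python under simplified assumptions (a single missing scalar, a normal model, a fixed proposal). Everything here is a hypothetical stand-in, not the authors' method.

```python
import math
import random

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def is_loglik(theta, y_obs, n_mc, rng):
    """Observed-data log-likelihood with the single missing response
    integrated out by importance sampling: draw it from a fixed N(0, 3)
    proposal and weight by the model density under theta."""
    total = 0.0
    for _ in range(n_mc):
        y_mis = rng.gauss(0.0, 3.0)
        total += normal_pdf(y_mis, theta) / normal_pdf(y_mis, 0.0, 3.0)
    mis_part = math.log(total / n_mc)
    obs_part = sum(math.log(normal_pdf(y, theta)) for y in y_obs)
    return mis_part + obs_part

# In this toy model the missing response integrates out entirely, so the
# grid MLE should land near the mean of the observed responses.
rng = random.Random(2)
y_obs = [rng.gauss(1.5, 1.0) for _ in range(40)]
grid = [i / 10 for i in range(-10, 41)]
mle = max(grid, key=lambda th: is_loglik(th, y_obs, n_mc=2000, rng=rng))
```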
|
22
|
A Unified Approach to Semiparametric Transformation Models under General Biased Sampling Schemes. J Am Stat Assoc 2013; 108:217-227. [PMID: 23667280 DOI: 10.1080/01621459.2012.746073] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 10/27/2022]
Abstract
We propose a unified estimation method for semiparametric linear transformation models under general biased sampling schemes. The new estimator is obtained from a set of counting process-based unbiased estimating equations, developed through introducing a general weighting scheme that offsets the sampling bias. The usual asymptotic properties, including consistency and asymptotic normality, are established under suitable regularity conditions. A closed-form formula is derived for the limiting variance, and the plug-in estimator is shown to be consistent. We demonstrate the unified approach through the special cases of left truncation, length-biased sampling, the case-cohort design, and variants thereof. Simulation studies and applications to real data sets are presented.
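The abstract's key device is a weighting scheme that offsets the sampling bias. A classical special case is length-biased sampling, where each unit is observed with probability proportional to its size, and weighting observations by 1/x recovers population quantities. A minimal Python sketch, with the Uniform(1, 3) toy population an illustrative assumption rather than anything from the paper:

```python
import math
import random

def mean_under_length_bias(sample):
    """Weights proportional to 1/x offset the size bias: the weighted
    (harmonic-mean) estimator recovers the population mean."""
    return len(sample) / sum(1.0 / x for x in sample)

# Length-biased draws from a Uniform(1, 3) population (true mean 2):
# sampling probability ∝ x gives density x/4 on (1, 3), sampled here by
# inverting its CDF (x² − 1)/8.
rng = random.Random(3)
biased = [math.sqrt(1.0 + 8.0 * rng.random()) for _ in range(100_000)]

naive_mean = sum(biased) / len(biased)            # ≈ 2.17, biased upward
corrected_mean = mean_under_length_bias(biased)   # ≈ 2.00, the true mean
```

The same idea — inverse-probability weights inside estimating equations — is what makes the paper's single formulation cover left truncation, case-cohort designs, and their variants.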
|