1. Kennedy JC, Henderson DA, Wilson KJ. Multilevel emulation for stochastic computer models with application to large offshore wind farms. J R Stat Soc Ser C Appl Stat 2023. [DOI: 10.1093/jrsssc/qlad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
Abstract
Renewable energy projects, such as large offshore wind farms, are critical to achieving low-emission targets set by governments. Stochastic computer models allow us to explore future scenarios to aid decision making while considering the most relevant uncertainties. Complex stochastic computer models can be prohibitively slow, and thus an emulator may be constructed and deployed to allow for efficient computation. We present a novel heteroscedastic Gaussian Process emulator that exploits cheap approximations to a stochastic offshore wind farm simulator. We also conduct a probabilistic sensitivity analysis to understand the influence of key parameters in the wind farm model, which will help us to plan a probability elicitation in the future.

2. Baker E, Barbillon P, Fadikar A, Gramacy RB, Herbei R, Higdon D, Huang J, Johnson LR, Ma P, Mondal A, Pires B, Sacks J, Sokolov V. Analyzing Stochastic Computer Models: A Review with Opportunities. Stat Sci 2022. [DOI: 10.1214/21-sts822] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
Affiliation(s)
- Evan Baker is Postdoctoral Research Fellow, Living Systems Institute, University of Exeter, Stocker Road, Exeter, EX4 4QD, UK
- Pierre Barbillon is Associate Professor, Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA-Paris, 16 rue Claude Bernard, 75231 Paris Cedex 05, France
- Arindam Fadikar is Postdoctoral Appointee, Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Ave., Lemont, Illinois 60439, USA
- Robert B. Gramacy is Professor, Department of Statistics, Virginia Tech, 250 Drillfield Drive, Blacksburg, Virginia 24061, USA
- Radu Herbei is Professor of Statistics, Department of Statistics, College of Arts and Sciences, The Ohio State University, 1958 Neil Ave., Columbus, Ohio 43210, USA
- David Higdon is Professor, Department of Statistics, Virginia Tech, MC0439, Virginia Tech, Blacksburg, Virginia 24061, USA
- Jiangeng Huang is Senior Statistical Scientist, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, USA
- Leah R. Johnson is Associate Professor, Department of Statistics, Computational Modeling and Data Analytics (CMDA), Virginia Tech, Hutcheson Hall, RM 409-B, 250 Drillfield Drive, Blacksburg, Virginia 24061, USA
- Pulong Ma is Postdoctoral Fellow, Duke University and Statistical and Applied Mathematical Sciences Institute, 19 T.W. Alexander Drive, P.O. Box 110207, Durham, North Carolina 27709, USA
- Anirban Mondal is Assistant Professor, Department of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University, 10900 Euclid Avenue, Yost Hall Room 337, Cleveland, Ohio 44106-7058, USA
- Bianica Pires is Lead Modeling & Simulation Engineer, The MITRE Corporation, 7515 Colshire Dr, McLean, Virginia 22102, USA
- Jerome Sacks is Ph.D., NISS, 1460 N. Sandburg Ter, Apt 2902, Chicago, Illinois 60610, USA
- Vadim Sokolov is Assistant Professor, Systems Engineering and Operations Research, George Mason University, Nguyen Engineering Building MS 4A6, Fairfax, Virginia 22302, USA

3. Fisher HF, Boys RJ, Gillespie CS, Proctor CJ, Golightly A. Parameter inference for a stochastic kinetic model of expanded polyglutamine proteins. Biometrics 2021; 78:1195-1208. [PMID: 33837525 DOI: 10.1111/biom.13467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2019] [Revised: 03/21/2021] [Accepted: 03/24/2021] [Indexed: 11/30/2022]
Abstract
The presence of protein aggregates in cells is a known feature of many human age-related diseases, such as Huntington's disease. Simulations using fixed parameter values in a model of the dynamic evolution of expanded polyglutamine (PolyQ) proteins in cells have been used to gain a better understanding of the biological system. However, there is considerable uncertainty about the values of some of the parameters governing the system. Currently, appropriate values are chosen by ad hoc attempts to tune the parameters so that the model output matches experimental data. The problem is further complicated by the fact that the data only offer a partial insight into the underlying biological process: the data consist only of the proportions of cell death and of cells with inclusion bodies at a few time points, corrupted by measurement error. Developing inference procedures to estimate the model parameters in this scenario is a significant task. The model probabilities corresponding to the observed proportions cannot be evaluated exactly, and so they are estimated within the inference algorithm by repeatedly simulating realizations from the model. In general such an approach is computationally very expensive, and we therefore construct Gaussian process emulators for the key quantities and reformulate our algorithm around these fast stochastic approximations. We conclude by highlighting appropriate values of the model parameters leading to new insights into the underlying biological processes.
Affiliation(s)
- H F Fisher: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne, UK; Population Health Sciences Institute, Newcastle University, Newcastle Upon Tyne, UK
- R J Boys: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne, UK
- C S Gillespie: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne, UK
- C J Proctor: Institute of Cellular Medicine, Newcastle University, Newcastle Upon Tyne, UK
- A Golightly: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne, UK

4. Hooten M, Wikle C, Schwob M. Statistical Implementations of Agent-Based Demographic Models. Int Stat Rev 2020; 88:441-461. [PMID: 32834401 PMCID: PMC7436772 DOI: 10.1111/insr.12399] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/19/2020] [Revised: 06/30/2020] [Accepted: 07/01/2020] [Indexed: 11/28/2022]
Abstract
A variety of demographic statistical models exist for studying population dynamics when individuals can be tracked over time. In cases where data are missing due to imperfect detection of individuals, the associated measurement error can be accommodated under certain study designs (e.g. those that involve multiple surveys or replication). However, the interaction of the measurement error and the underlying dynamic process can complicate the implementation of statistical agent-based models (ABMs) for population demography. In a Bayesian setting, traditional computational algorithms for fitting hierarchical demographic models can be prohibitively cumbersome to construct. Thus, we discuss a variety of approaches for fitting statistical ABMs to data and demonstrate how to use multi-stage recursive Bayesian computing and statistical emulators to fit models in such a way that alleviates the need to have analytical knowledge of the ABM likelihood. Using two examples, a demographic model for survival and a compartment model for COVID-19, we illustrate statistical procedures for implementing ABMs. The approaches we describe are intuitive and accessible for practitioners and can be parallelised easily for additional computational efficiency.
Affiliation(s)
- Mevin Hooten: U.S. Geological Survey, Colorado Cooperative Fish and Wildlife Research Unit, Department of Fish, Wildlife, and Conservation Biology, Department of Statistics, Colorado State University, Fort Collins, CO 80523-1484, USA
- Christopher Wikle: Department of Statistics, University of Missouri, Columbia, MO 65211-6100, USA
- Michael Schwob: Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154-9900, USA

5. Lawless C, Greaves L, Reeve AK, Turnbull DM, Vincent AE. The rise and rise of mitochondrial DNA mutations. Open Biol 2020; 10:200061. [PMID: 32428418 PMCID: PMC7276526 DOI: 10.1098/rsob.200061] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 03/10/2020] [Accepted: 04/23/2020] [Indexed: 12/24/2022]
Abstract
How mitochondrial DNA mutations clonally expand in an individual cell is a question that has perplexed mitochondrial biologists for decades. A growing body of literature indicates that mitochondrial DNA mutations play a major role in ageing, metabolic diseases, neurodegenerative diseases, neuromuscular disorders and cancers. Importantly, this process of clonal expansion occurs for both inherited and somatic mitochondrial DNA mutations. To complicate matters further there are fundamental differences between mitochondrial DNA point mutations and deletions, and between mitotic and post-mitotic cells, that impact this pathogenic process. These differences, along with the challenges of investigating a longitudinal process occurring over decades in humans, have so far hindered progress towards understanding clonal expansion. Here we summarize our current understanding of the clonal expansion of mitochondrial DNA mutations in different tissues and highlight key unanswered questions. We then discuss the various existing biological models, along with their advantages and disadvantages. Finally, we explore what has been achieved with mathematical modelling so far and suggest future work to advance this important area of research.
Affiliation(s)
- Doug M. Turnbull: Wellcome Centre for Mitochondrial Research, Clinical and Translational Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle NE2 4HH, UK
- Amy E. Vincent: Wellcome Centre for Mitochondrial Research, Clinical and Translational Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle NE2 4HH, UK

6. Seo YA, Lee Y, Park JS. Iterative method for tuning complex simulation code. Commun Stat Simul Comput 2020. [DOI: 10.1080/03610918.2020.1728317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Yun Am Seo: AI Weather Forecast Research Team, National Institute of Meteorological Science (NIMS), Seogwipo, Korea
- Youngsaeng Lee: Digital Transformation Department, Korea Electric Power Corporation, Seoul, Korea
- Jeong-Soo Park: Department of Statistics, Chonnam National University, Gwangju, Korea

7. Grosskopf M, Bingham D, Adams ML, Hawkins WD, Perez-Nunez D. Generalized Computer Model Calibration for Radiation Transport Simulation. Technometrics 2020. [DOI: 10.1080/00401706.2019.1701557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Derek Bingham: Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
- Marvin L. Adams: Department of Nuclear Engineering, Texas A&M University, College Station, TX
- W. Daryl Hawkins: Department of Nuclear Engineering, Texas A&M University, College Station, TX
- Delia Perez-Nunez: Department of Nuclear Engineering, Texas A&M University, College Station, TX

8. Alden K, Cosgrove J, Coles M, Timmis J. Using Emulation to Engineer and Understand Simulations of Biological Systems. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:302-315. [PMID: 29994223 DOI: 10.1109/tcbb.2018.2843339] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Modeling and simulation techniques have demonstrated success in studying biological systems. As the drive to better capture biological complexity leads to more sophisticated simulators, it becomes challenging to perform statistical analyses that help translate predictions into increased understanding. These analyses may require repeated executions and extensive sampling of high-dimensional parameter spaces: analyses that may become intractable due to time and resource limitations. Significant reduction in these requirements can be obtained using surrogate models, or emulators, that can rapidly and accurately predict the output of an existing simulator. We apply emulation to evaluate and enrich understanding of a previously published agent-based simulator of lymphoid tissue organogenesis, showing an ensemble of machine learning techniques can reproduce results obtained using a suite of statistical analyses within seconds. This performance improvement permits incorporation of previously intractable analyses, including multi-objective optimization to obtain parameter sets that yield a desired response, and Approximate Bayesian Computation to assess parametric uncertainty. To facilitate exploitation of emulation in simulation-focused studies, we extend our open source statistical package, spartan, to provide a suite of tools for emulator development, validation, and application. Overcoming resource limitations permits enriched evaluation and refinement, easing translation of simulator insights into increased biological understanding.

9. Pope CA, Gosling JP, Barber S, Johnson JS, Yamaguchi T, Feingold G, Blackwell PG. Gaussian Process Modeling of Heterogeneity and Discontinuities Using Voronoi Tessellations. Technometrics 2019. [DOI: 10.1080/00401706.2019.1692696] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/25/2022]
Affiliation(s)
- Stuart Barber: School of Mathematics, University of Leeds, Leeds, UK
- Jill S. Johnson: School of Earth and Environment, University of Leeds, Leeds, UK
- Takanobu Yamaguchi: Chemical Sciences Division, Earth System Research Laboratory, National Ocean and Atmospheric Administration, Boulder, CO
- Graham Feingold: Chemical Sciences Division, Earth System Research Laboratory, National Ocean and Atmospheric Administration, Boulder, CO
- Paul G. Blackwell: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK

10. Chen YC, Choe Y. Importance sampling and its optimality for stochastic simulation models. Electron J Stat 2019. [DOI: 10.1214/19-ejs1604] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]

11. Wilson KJ, Henderson DA, Quigley J. Emulation of Utility Functions Over a Set of Permutations: Sequencing Reliability Growth Tasks. Technometrics 2018. [DOI: 10.1080/00401706.2017.1377637] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
Affiliation(s)
- Kevin J. Wilson: School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne, United Kingdom
- Daniel A. Henderson: School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne, United Kingdom
- John Quigley: Department of Management Science, University of Strathclyde, Glasgow, United Kingdom

12. Boys RJ, Ainsworth HF, Gillespie CS. Bayesian inference for a partially observed birth-death process using data on proportions. Aust N Z J Stat 2018. [DOI: 10.1111/anzs.12230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Richard J. Boys: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne NE1 7RU, UK
- Holly F. Ainsworth: Institute of Health and Society, Newcastle University, Newcastle Upon Tyne NE1 4AX, UK
- Colin S. Gillespie: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne NE1 7RU, UK

13. Drovandi CC, Moores MT, Boys RJ. Accelerating pseudo-marginal MCMC using Gaussian processes. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2017.09.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/26/2022]

14. McKinley TJ, Vernon I, Andrianakis I, McCreesh N, Oakley JE, Nsubuga RN, Goldstein M, White RG. Approximate Bayesian Computation and Simulation-Based Inference for Complex Stochastic Epidemic Models. Stat Sci 2018. [DOI: 10.1214/17-sts618] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 11/19/2022]

15. Vernon I, Liu J, Goldstein M, Rowe J, Topping J, Lindsey K. Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions. BMC Syst Biol 2018; 12:1. [PMID: 29291750 PMCID: PMC5748965 DOI: 10.1186/s12918-017-0484-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 07/28/2017] [Accepted: 11/09/2017] [Indexed: 11/26/2022]
Abstract
Background: Many mathematical models have now been employed across every area of systems biology. These models increasingly involve large numbers of unknown parameters, have complex structure which can result in substantial evaluation time relative to the needs of the analysis, and need to be compared to observed data of various forms. The correct analysis of such models usually requires a global parameter search, over a high dimensional parameter space, that incorporates and respects the most important sources of uncertainty. This can be an extremely difficult task, but it is essential for any meaningful inference or prediction to be made about any biological system. It hence represents a fundamental challenge for the whole of systems biology. Methods: Bayesian statistical methodology for the uncertainty analysis of complex models is introduced, which is designed to address the high dimensional global parameter search problem. Bayesian emulators that mimic the systems biology model but which are extremely fast to evaluate are embedded within an iterative history match: an efficient method to search high dimensional spaces within a more formal statistical setting, while incorporating major sources of uncertainty. Results: The approach is demonstrated via application to a model of hormonal crosstalk in Arabidopsis root development, which has 32 rate parameters, for which we identify the sets of rate parameter values that lead to acceptable matches between model output and observed trend data. The multiple insights into the model's structure that this analysis provides are discussed. The methodology is applied to a second related model, and the biological consequences of the resulting comparison, including the evaluation of gene functions, are described. Conclusions: Bayesian uncertainty analysis for complex models using both emulators and history matching is shown to be a powerful technique that can greatly aid the study of a large class of systems biology models. It both provides insight into model behaviour and identifies the sets of rate parameters of interest.
Affiliation(s)
- Ian Vernon: Department of Mathematical Sciences, Durham University, South Road, Durham, DH1 3LE, UK
- Junli Liu: Department of Biosciences, Durham University, South Road, Durham, DH1 3LE, UK
- Michael Goldstein: Department of Mathematical Sciences, Durham University, South Road, Durham, DH1 3LE, UK
- James Rowe: Department of Biosciences, Durham University, South Road, Durham, DH1 3LE, UK; current address: Department of Molecular Biology and Biotechnology, University of Sheffield, Firth Court, Western Bank, Sheffield, S10 2TN, UK
- Jen Topping: Department of Biosciences, Durham University, South Road, Durham, DH1 3LE, UK
- Keith Lindsey: Department of Biosciences, Durham University, South Road, Durham, DH1 3LE, UK

16. Oakley JE, Youngman BD. Calibration of Stochastic Computer Simulators Using Likelihood Emulation. Technometrics 2017. [DOI: 10.1080/00401706.2015.1125391] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 10/22/2022]
Affiliation(s)
- Jeremy E. Oakley: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK, S3 7RH
- Benjamin D. Youngman: Department of Mathematics and Computer Science, University of Exeter, Exeter, UK, EX4 4QE

17. Ling MH, Wong SY, Tsui KL. Efficient heterogeneous sampling for stochastic simulation with an illustration in health care applications. Commun Stat Simul Comput 2017. [DOI: 10.1080/03610918.2014.977914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- M. H. Ling: Department of Mathematics and Information Technology, The Education University of Hong Kong, Tai Po, Hong Kong SAR, China
- S. Y. Wong: Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China; Center for Clinical Epidemiology, Graduate School of Public Health Planning Office, St. Luke's International University, Tokyo, Japan
- K. L. Tsui: Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China

18. Andrianakis I, Vernon I, McCreesh N, McKinley TJ, Oakley JE, Nsubuga RN, Goldstein M, White RG. History matching of a complex epidemiological model of human immunodeficiency virus transmission by using variance emulation. J R Stat Soc Ser C Appl Stat 2016; 66:717-740. [PMID: 28781386 PMCID: PMC5516248 DOI: 10.1111/rssc.12198] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/29/2022]
Abstract
Complex stochastic models are commonplace in epidemiology, but their utility depends on their calibration to empirical data. History matching is a (pre)calibration method that has been applied successfully to complex deterministic models. In this work, we adapt history matching to stochastic models, by emulating the variance in the model outputs, and therefore accounting for its dependence on the model's input values. The method proposed is applied to a real complex epidemiological model of human immunodeficiency virus in Uganda with 22 inputs and 18 outputs, and is found to increase the efficiency of history matching, requiring 70% of the time and 43% fewer simulator evaluations compared with a previous variant of the method. The insight gained into the structure of the human immunodeficiency virus model, and the constraints placed on it, are then discussed.
Affiliation(s)
- N McCreesh: London School of Hygiene and Tropical Medicine, UK
- R N Nsubuga: Medical Research Council Uganda, Kampala, Uganda

19. Andrianakis I, Vernon IR, McCreesh N, McKinley TJ, Oakley JE, Nsubuga RN, Goldstein M, White RG. Bayesian history matching of complex infectious disease models using emulation: a tutorial and a case study on HIV in Uganda. PLoS Comput Biol 2015; 11:e1003968. [PMID: 25569850 PMCID: PMC4288726 DOI: 10.1371/journal.pcbi.1003968] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 02/07/2014] [Accepted: 10/08/2014] [Indexed: 12/03/2022]
Abstract
Advances in scientific computing have allowed the development of complex models that are being routinely applied to problems in disease epidemiology, public health and decision making. The utility of these models depends in part on how well they can reproduce empirical data. However, fitting such models to real world data is greatly hindered both by large numbers of input and output parameters, and by long run times, such that many modelling studies lack a formal calibration methodology. We present a novel method that has the potential to improve the calibration of complex infectious disease models (hereafter called simulators). We present this in the form of a tutorial and a case study where we history match a dynamic, event-driven, individual-based stochastic HIV simulator, using extensive demographic, behavioural and epidemiological data available from Uganda. The tutorial describes history matching and emulation. History matching is an iterative procedure that reduces the simulator's input space by identifying and discarding areas that are unlikely to provide a good match to the empirical data. History matching relies on the computational efficiency of a Bayesian representation of the simulator, known as an emulator. Emulators mimic the simulator's behaviour, but are often several orders of magnitude faster to evaluate. In the case study, we use a 22 input simulator, fitting its 18 outputs simultaneously. After 9 iterations of history matching, a non-implausible region of the simulator input space was identified that was times smaller than the original input space. Simulator evaluations made within this region were found to have a 65% probability of fitting all 18 outputs. History matching and emulation are useful additions to the toolbox of infectious disease modellers. Further research is required to explicitly address the stochastic nature of the simulator as well as to account for correlations between outputs. 
An increasing number of scientific disciplines, and biology in particular, rely on complex computational models. The utility of these models depends on how well they are fitted to empirical data. Fitting is achieved by searching for suitable values for the models' input parameters, in a process known as calibration. Modern computer models typically have a large number of input and output parameters, and long running times, a consequence of their increasing computational complexity. The above two things hinder the calibration process. In this work, we propose a method that can help the calibration of models with long running times and several inputs and outputs. We apply this method on an individual based, dynamic and stochastic HIV model, using HIV data from Uganda. The final system has a 65% probability of selecting an input parameter set that fits all 18 model outputs.
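The discard step that this abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the toy emulator (mean x squared, constant variance), the implausibility measure, the cutoff of 3, and all function names are assumptions chosen for illustration.

```python
import math

def implausibility(z, em_mean, em_var, obs_var, disc_var=0.0):
    """Standardised distance between observation z and the emulator mean.

    The denominator pools emulator, observation and (optional) model
    discrepancy variances. This form is an assumption for the sketch.
    """
    return abs(z - em_mean) / math.sqrt(em_var + obs_var + disc_var)

def toy_emulator(x):
    """Hypothetical stand-in for a fitted emulator: returns (mean, variance)."""
    return x * x, 0.01

def history_match_wave(candidates, z, obs_var, cutoff=3.0):
    """Keep the inputs whose implausibility does not exceed the cutoff."""
    kept = []
    for x in candidates:
        mean, var = toy_emulator(x)
        if implausibility(z, mean, var, obs_var) <= cutoff:
            kept.append(x)
    return kept

# One wave over a coarse grid of inputs on [0, 4], with observed value z = 4.
candidates = [i / 10 for i in range(0, 41)]
kept = history_match_wave(candidates, z=4.0, obs_var=0.03)
print(kept)
```

In a full history match the surviving inputs would seed new simulator runs, a refitted emulator, and a further wave; the sketch shows a single wave only.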
Affiliation(s)
- Ioannis Andrianakis: Dept. of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Ian R. Vernon: Dept. of Mathematical Sciences, Durham University, Durham, United Kingdom
- Nicky McCreesh: School of Medicine, Pharmacy and Health, Durham University, Durham, United Kingdom
- Jeremy E. Oakley: School of Mathematics and Statistics, University of Sheffield, Sheffield, United Kingdom
- Rebecca N. Nsubuga: Medical Research Council/Uganda Virus Research Institute, Uganda Research Unit on AIDS, Entebbe, Uganda
- Michael Goldstein: Dept. of Mathematical Sciences, Durham University, Durham, United Kingdom
- Richard G. White: Dept. of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, United Kingdom

20. Farah M, Birrell P, Conti S, Angelis DD. Bayesian Emulation and Calibration of a Dynamic Epidemic Model for A/H1N1 Influenza. J Am Stat Assoc 2014. [DOI: 10.1080/01621459.2014.934453] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 10/25/2022]

21. Plumlee M, Tuo R. Building Accurate Emulators for Stochastic Simulations via Quantile Kriging. Technometrics 2014. [DOI: 10.1080/00401706.2013.860919] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]

22. Gupta A, Rawlings JB. Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology. AIChE J 2014; 60:1253-1268. [PMID: 27429455 DOI: 10.1002/aic.14409] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
Stochastic chemical kinetics has become a staple for mechanistically modeling various phenomena in systems biology. These models, even more so than their deterministic counterparts, pose a challenging problem in the estimation of kinetic parameters from experimental data. As a result of the inherent randomness involved in stochastic chemical kinetic models, the estimation methods tend to be statistical in nature. Three classes of estimation methods are implemented and compared in this paper. The first is the exact method, which uses the continuous-time Markov chain representation of stochastic chemical kinetics and is tractable only for a very restricted class of problems. The next class of methods is based on Markov chain Monte Carlo (MCMC) techniques. The third method, termed conditional density importance sampling (CDIS), is a new method introduced in this paper. The use of these methods is demonstrated on two examples taken from systems biology, one of which is a new model of single-cell viral infection. The applicability, strengths and weaknesses of the three classes of estimation methods are discussed. Using simulated data for the two examples, some guidelines are provided on experimental design to obtain more information from a limited number of measurements.
Affiliation(s)
- Ankur Gupta: Dept. of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI 53705
- James B. Rawlings: Dept. of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI 53705

23. Parameter estimation in stochastic chemical kinetic models using derivative free optimization and bootstrapping. Comput Chem Eng 2014; 63:152-158. [PMID: 24920866 DOI: 10.1016/j.compchemeng.2014.01.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
Abstract
Recent years have seen increasing popularity of stochastic chemical kinetic models due to their ability to explain and model several critical biological phenomena. Several developments in high resolution fluorescence microscopy have enabled researchers to obtain protein and mRNA data at the single-cell level. The availability of these data, along with the knowledge that the system is governed by a stochastic chemical kinetic model, leads to the problem of parameter estimation. This paper develops a new method of parameter estimation for stochastic chemical kinetic models. There are three components of the new method. First, we propose a new expression for the likelihood of the experimental data. Second, we use sample path optimization along with UOBYQA-Fit, a variant of Powell's unconstrained optimization by quadratic approximation, for optimization. Third, we use a variant of Efron's percentile bootstrapping method to estimate the confidence regions for the parameter estimates. We apply the parameter estimation method in an RNA dynamics model of E. coli. We test the parameter estimates obtained and the confidence regions in this model. The testing of the parameter estimation method demonstrates the efficiency, reliability, and accuracy of the new method.
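For readers unfamiliar with the third component, the following is a minimal sketch of the standard percentile bootstrap for a scalar estimate. It is not the variant used in the paper, whose details the abstract does not give; the function name `percentile_bootstrap_ci` and its parameters are illustrative assumptions.

```python
import random
import statistics


def percentile_bootstrap_ci(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Plain percentile bootstrap interval (illustrative, not the paper's variant).

    Resamples `data` with replacement, recomputes `estimator` on each
    resample, and returns the empirical alpha/2 and 1 - alpha/2 quantiles.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical measurements, used only to exercise the function.
    demo_rng = random.Random(42)
    sample = [demo_rng.gauss(10.0, 2.0) for _ in range(100)]
    print(percentile_bootstrap_ci(sample, statistics.mean))
```

In a stochastic kinetic setting the estimator would be a full parameter fit rather than a sample mean, which makes each bootstrap replicate far more expensive than in this toy case.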
|
24
|
Boukouvalas A, Cornford D, Stehlík M. Optimal design for correlated processes with input-dependent noise. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.09.024] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/25/2022]
|
25
|
Jandarov R, Haran M, Bjørnstad O, Grenfell B. Emulating a gravity model to infer the spatiotemporal dynamics of an infectious disease. J R Stat Soc Ser C Appl Stat 2013. [DOI: 10.1111/rssc.12042] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022]
Affiliation(s)
- Murali Haran
- Pennsylvania State University; University Park USA
|
26
|
Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models. Biomed Res Int 2013; 2013:210646. [PMID: 24288668 PMCID: PMC3830866 DOI: 10.1155/2013/210646] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 04/30/2013] [Revised: 07/16/2013] [Accepted: 07/20/2013] [Indexed: 11/18/2022]
Abstract
Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the “go-to” option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example.
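As a rough illustration of the dependence on summary statistics described above, here is a minimal sketch of ABC rejection sampling. It is a generic textbook form, not the procedure from the paper; the toy Gaussian simulator, the uniform prior, the tolerance, and all function names are assumptions made for the example.

```python
import random
import statistics


def abc_rejection(observed, simulate, summary, prior_sample, tol,
                  n_draws=5000, seed=0):
    """Generic ABC rejection sampler (illustrative sketch).

    Keeps a prior draw `theta` when the summary of data simulated at
    `theta` lies within `tol` of the summary of the observed data.
    """
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= tol:
            accepted.append(theta)
    return accepted


def toy_simulate(theta, rng, n=50):
    """Hypothetical stochastic model: n Gaussian draws with mean theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def toy_prior(rng):
    """Hypothetical prior: uniform on [-5, 5]."""
    return rng.uniform(-5.0, 5.0)


if __name__ == "__main__":
    observed = toy_simulate(2.0, random.Random(1))
    draws = abc_rejection(observed, toy_simulate, statistics.mean,
                          toy_prior, tol=0.1)
    print(len(draws), statistics.mean(draws))
```

Swapping `statistics.mean` for a summary that carries little information about `theta` (for example, the sample variance in this toy model) leaves the accepted draws close to the prior, which is the sensitivity the abstract refers to.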
|
27
|
Overstall AM, Woods DC. A strategy for Bayesian inference for computationally expensive models with application to the estimation of stem cell properties. Biometrics 2013; 69:458-68. [PMID: 23421643 DOI: 10.1111/biom.12017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/01/2011] [Revised: 10/01/2012] [Accepted: 12/01/2012] [Indexed: 11/28/2022]
Abstract
Bayesian inference is considered for statistical models that depend on the evaluation of a computationally expensive computer code or simulator. For such situations, the number of evaluations of the likelihood function, and hence of the unnormalized posterior probability density function, is determined by the available computational resource and may be extremely limited. We present a new example of such a simulator that describes the properties of human embryonic stem cells using data from optical trapping experiments. This application is used to motivate a novel strategy for Bayesian inference which exploits a Gaussian process approximation of the simulator and allows computationally efficient Markov chain Monte Carlo inference. The advantages of this strategy over previous methodology are that it is less reliant on the determination of tuning parameters and allows the application of model diagnostic procedures that require no additional evaluations of the simulator. We show the advantages of our method on synthetic examples and demonstrate its application on stem cell experiments.
Affiliation(s)
- Antony M Overstall
- School of Mathematics and Statistics, University of St Andrews, St Andrews, KY16 9SS, UK.
|
28
|
Baggaley AW, Boys RJ, Golightly A, Sarson GR, Shukurov A. Inference for population dynamics in the Neolithic period. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas579] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
|
29
|
|
30
|
Hartig F, Calabrese JM, Reineking B, Wiegand T, Huth A. Statistical inference for stochastic simulation models - theory and application. Ecol Lett 2011; 14:816-27. [DOI: 10.1111/j.1461-0248.2011.01640.x] [Citation(s) in RCA: 269] [Impact Index Per Article: 20.7] [Indexed: 11/28/2022]
|
31
|
Drovandi CC, Pettitt AN, Faddy MJ. Approximate Bayesian computation using indirect inference. J R Stat Soc Ser C Appl Stat 2011. [DOI: 10.1111/j.1467-9876.2010.00747.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/28/2022]
|
32
|
Wang Y, Christley S, Mjolsness E, Xie X. Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Syst Biol 2010; 4:99. [PMID: 20663171 PMCID: PMC2914651 DOI: 10.1186/1752-0509-4-99] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 02/12/2010] [Accepted: 07/21/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Stochastic effects can be important for the behavior of processes involving small population numbers, so the study of stochastic models has become an important topic in the burgeoning field of computational systems biology. However, analysis techniques for stochastic models have tended to lag behind their deterministic cousins due to the heavier computational demands of the statistical approaches for fitting the models to experimental data. There is a continuing need for more effective and efficient algorithms. In this article we focus on the parameter inference problem for stochastic kinetic models of biochemical reactions given discrete time-course observations of either some or all of the molecular species. RESULTS: We propose an algorithm for inference of kinetic rate parameters based upon maximum likelihood using stochastic gradient descent (SGD). We derive a general formula for the gradient of the likelihood function given discrete time-course observations. The formula applies to any explicit functional form of the kinetic rate laws such as mass-action, Michaelis-Menten, etc. Our algorithm estimates the gradient of the likelihood function by reversible jump Markov chain Monte Carlo sampling (RJMCMC), and a gradient descent method is then employed to obtain the maximum likelihood estimates of the parameter values. Furthermore, we utilize flux balance analysis and show how to automatically construct reversible jump samplers for arbitrary biochemical reaction models. We provide RJMCMC sampling algorithms for both fully observed and partially observed time-course observation data. Our methods are illustrated with two examples: a birth-death model and an auto-regulatory gene network. We find good agreement of the inferred parameters with the actual parameters in both models. CONCLUSIONS: The SGD method proposed in the paper presents a general framework for inferring parameters for stochastic kinetic models. The method is computationally efficient and is effective for both partially and fully observed systems. Automatic construction of reversible jump samplers and general formulation of the likelihood gradient function make our method applicable to a wide range of stochastic models. Furthermore, our derivations can be useful for other purposes such as using the gradient information for parametric sensitivity analysis or using the reversible jump samplers for full Bayesian inference. The software implementing the algorithms is publicly available at http://cbcl.ics.uci.edu/sgd.
Affiliation(s)
- Yuanfeng Wang
- Department of Physics and Astronomy, University of California, Irvine, 92617, USA
|
33
|
Chen Y, Lawless C, Gillespie CS, Wu J, Boys RJ, Wilkinson DJ. CaliBayes and BASIS: integrated tools for the calibration, simulation and storage of biological simulation models. Brief Bioinform 2010; 11:278-89. [DOI: 10.1093/bib/bbp072] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
|
34
|
Daigle BJ, Srinivasan BS, Flannick JA, Novak AF, Batzoglou S. Current Progress in Static and Dynamic Modeling of Biological Networks. Systems Biology for Signaling Networks 2010. [DOI: 10.1007/978-1-4419-5797-9_2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/20/2023]
|
35
|
Henderson DA, Boys RJ, Wilkinson DJ. Bayesian calibration of a stochastic kinetic computer model using multiple data sources. Biometrics 2009; 66:249-56. [PMID: 19397580 DOI: 10.1111/j.1541-0420.2009.01245.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
In this article, we describe a Bayesian approach to the calibration of a stochastic computer model of chemical kinetics. As with many applications in the biological sciences, the data available to calibrate the model come from different sources. Furthermore, these data appear to provide somewhat conflicting information about the model parameters. We describe a modeling framework that allows us to synthesize this conflicting information and arrive at a consensus inference. In particular, we show how random effects can be incorporated into the model to account for between-individual heterogeneity that may be the source of the apparent conflict.
Affiliation(s)
- D A Henderson
- School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, U.K.
|