1
|
Mixed Poisson Regression Models with Varying Dispersion Arising from Non-Conjugate Mixing Distributions. Algorithms 2021. [DOI: 10.3390/a15010016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
In this article we present a class of mixed Poisson regression models with varying dispersion arising from non-conjugate to the Poisson mixing distributions for modelling overdispersed claim counts in non-life insurance. The proposed family of models combined with the adopted modelling framework can provide sufficient flexibility for dealing with different levels of overdispersion. For illustrative purposes, the Poisson-lognormal regression model with regression structures on both its mean and dispersion parameters is employed for modelling claim count data from a motor insurance portfolio. Maximum likelihood estimation is carried out via an expectation-maximization type algorithm, which is developed for the proposed family of models and is demonstrated to perform satisfactorily.
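As an illustration of the Poisson-lognormal model named in this abstract, the following is a minimal sketch of its probability mass function, computed by Gauss-Hermite quadrature because the mixture integral has no closed form. The parametrization is an assumption, not taken from the article: Y | Z ~ Poisson(mu * Z) with log Z ~ N(-sigma^2/2, sigma^2), so that E[Z] = 1. The function name and the number of quadrature nodes are hypothetical choices. This is not the expectation-maximization algorithm the authors develop.

```python
import math
import numpy as np

def poisson_lognormal_pmf(k, mu, sigma, n_nodes=64):
    """Approximate P(Y = k) for a Poisson-lognormal mixture.

    Assumed parametrization (not from the article):
    Y | Z ~ Poisson(mu * Z), log Z ~ N(-sigma^2 / 2, sigma^2), hence E[Z] = 1.
    """
    # Nodes and weights for integrals of the form  int exp(-x^2) f(x) dx.
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables: log Z = -sigma^2/2 + sqrt(2) * sigma * x.
    log_z = -0.5 * sigma**2 + math.sqrt(2.0) * sigma * x
    lam = mu * np.exp(log_z)
    # Poisson log-probability of k at each quadrature node.
    log_pois = -lam + k * np.log(lam) - math.lgamma(k + 1)
    return float(np.sum(w * np.exp(log_pois)) / math.sqrt(math.pi))
```

Under this parametrization the mean is mu and the variance is mu + mu^2 (exp(sigma^2) - 1), so any sigma > 0 gives overdispersion relative to the Poisson. Summing `k * poisson_lognormal_pmf(k, mu, sigma)` over a long enough range of `k` is one way to check this numerically.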
|
2
|
Mselmi F. Generalized linear model for subordinated Lévy processes. Scand Stat Theory Appl 2021. [DOI: 10.1111/sjos.12538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Farouk Mselmi
- Euromed University of Fes (UEMF), Fez, Morocco, and Sfax University, Sfax, Tunisia
|
3
|
Paynter A, Willis AD. Tuning parameter selection for a penalized estimator of species richness. J Appl Stat 2020; 48:1053-1070. [PMID: 33967371 PMCID: PMC8098713 DOI: 10.1080/02664763.2020.1754359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
Our goal is to estimate the true number of classes in a population, called the species richness. We consider the case where multiple frequency count tables have been collected from a homogeneous population, and investigate a penalized maximum likelihood estimator under a negative binomial model. Because high probabilities of unobserved classes increase the variance of species richness estimates, our method penalizes the probability of a class being unobserved. Tuning the penalization parameter is challenging because the true species richness is never known, and so we propose and validate four novel methods for tuning the penalization parameter. We illustrate and contrast the performance of the proposed methods by estimating the strain-level microbial diversity of Lake Champlain over 3 consecutive years, and global human host-associated species-level microbial richness.
Affiliation(s)
- Alex Paynter
- Department of Biostatistics, University of Washington, Health Sciences Building, Box 357232, 1705 NE Pacific St., Seattle, WA 98195
- Amy D Willis
- Department of Biostatistics, University of Washington, Health Sciences Building, Box 357232, 1705 NE Pacific St., Seattle, WA 98195
|
4
|
Heller GZ, Couturier DL, Heritier SR. Beyond mean modelling: Bias due to misspecification of dispersion in Poisson-inverse Gaussian regression. Biom J 2018; 61:333-342. [PMID: 30003579 DOI: 10.1002/bimj.201700218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/31/2017] [Revised: 06/16/2018] [Accepted: 06/19/2018] [Indexed: 11/08/2022]
Abstract
In clinical trials one traditionally models the effect of treatment on the mean response. The underlying assumption is that treatment affects the response distribution through a mean location shift on a suitable scale, with other aspects of the distribution (shape/dispersion/variance) remaining the same. This work is motivated by a trial in Parkinson's disease patients in which one of the endpoints is the number of falls during a 10-week period. Inspection of the data reveals that the Poisson-inverse Gaussian (PiG) distribution is appropriate, and that the experimental treatment reduces not only the mean, but also the variability, substantially. The conventional analysis assumes a treatment effect on the mean, either adjusted or unadjusted for covariates, and a constant dispersion parameter. On our data, this analysis yields a non-significant treatment effect. However, if we model a treatment effect on both mean and dispersion parameters, both effects are highly significant. A simulation study shows that if a treatment effect exists on the dispersion and is ignored in the modelling, estimation of the treatment effect on the mean can be severely biased. We show further that if we use an orthogonal parametrization of the PiG distribution, estimates of the mean model are robust to misspecification of the dispersion model. We also discuss inferential aspects that are more difficult than anticipated in this setting. These findings have implications in the planning of statistical analyses for count data in clinical trials.
Affiliation(s)
- Gillian Z Heller
- Department of Statistics, Macquarie University, Sydney, Australia
- Stephane R Heritier
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
|
5
|
Gómez-Déniz E, Ghitany ME, Gupta RC. Poisson-mixed Inverse Gaussian Regression Model and Its Application. Commun Stat-Simul C 2016. [DOI: 10.1080/03610918.2014.925924] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/24/2022]
|
6
|
On zero-truncating and mixing Poisson distributions. Adv Appl Probab 2016. [DOI: 10.1017/s000186780000450x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
The distributions that result from zero-truncating mixed Poisson (ZTMP) distributions and those obtained from mixing zero-truncated Poisson (MZTP) distributions are characterised based on their probability generating functions. One consequence is that every ZTMP distribution is an MZTP distribution, but not vice versa. These characterisations also indicate that the size-biased version of a Poisson mixture and, under certain regularity conditions, the shifted version of a Poisson mixture are neither ZTMP distributions nor MZTP distributions.
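For reference, the two constructions compared in this abstract can be written out in terms of probability generating functions. The notation here is an assumption introduced for illustration and is not taken from the paper: $F$ and $H$ are mixing distributions on $(0,\infty)$, and $G$ is the probability generating function of the mixed Poisson distribution with mixing distribution $F$. The last two lines sketch why every ZTMP distribution is also an MZTP distribution.

```latex
\begin{align*}
G(s) &= \int_0^\infty e^{\lambda (s-1)} \, dF(\lambda)
  && \text{(mixed Poisson)} \\
G_{\mathrm{ZTMP}}(s) &= \frac{G(s) - G(0)}{1 - G(0)}
  && \text{(mix first, then truncate at zero)} \\
G_{\mathrm{MZTP}}(s) &= \int_0^\infty \frac{e^{\lambda s} - 1}{e^{\lambda} - 1} \, dH(\lambda)
  && \text{(truncate at zero first, then mix)} \\
\text{Choose } dH(\lambda) &= \frac{1 - e^{-\lambda}}{1 - G(0)} \, dF(\lambda)
  && \text{(a probability measure, since } \textstyle\int_0^\infty (1 - e^{-\lambda}) \, dF(\lambda) = 1 - G(0)\text{)} \\
G_{\mathrm{MZTP}}(s) &= \frac{1}{1 - G(0)} \int_0^\infty \left( e^{\lambda (s-1)} - e^{-\lambda} \right) dF(\lambda)
  = G_{\mathrm{ZTMP}}(s)
  && \text{(using } \tfrac{1 - e^{-\lambda}}{e^{\lambda} - 1} = e^{-\lambda}\text{)}
\end{align*}
```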
|
7
|
|
8
|
|
9
|
Analysis of discrete data by Conway–Maxwell Poisson distribution. AStA Advances in Statistical Analysis 2014. [DOI: 10.1007/s10182-014-0226-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 10/25/2022]
|
10
|
Koziol J, Griffin N, Long F, Li Y, Latterich M, Schnitzer J. On protein abundance distributions in complex mixtures. Proteome Sci 2013; 11:5. [PMID: 23360617 PMCID: PMC3599228 DOI: 10.1186/1477-5956-11-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/04/2011] [Accepted: 05/15/2012] [Indexed: 11/20/2022] [Open access]
Abstract
Mass spectrometry, an analytical technique that measures the mass-to-charge ratio of ionized atoms or molecules, dates back more than 100 years, and has both qualitative and quantitative uses for determining chemical and structural information. Quantitative proteomic mass spectrometry on biological samples focuses on identifying the proteins present in the samples, and establishing the relative abundances of those proteins. Such protein inventories create the opportunity to discover novel biomarkers and disease targets. We have previously introduced a normalized, label-free method for quantification of protein abundances under a shotgun proteomics platform (Griffin et al., 2010). The introduction of this method for quantifying and comparing protein levels leads naturally to the issue of modeling protein abundances in individual samples. We here report that protein abundance levels from two recent proteomics experiments conducted by the authors can be adequately represented by Sichel distributions. Mathematically, Sichel distributions are mixtures of Poisson distributions with a rather complex mixing distribution, and have been previously and successfully applied to linguistics and species abundance data. The Sichel model can provide a direct measure of the heterogeneity of protein abundances, and can reveal protein abundance differences that simpler models fail to show.
Affiliation(s)
- Ja Koziol
- The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla, CA, 92037, USA.
|
11
|
Affiliation(s)
- J. Bunge
- Department of Economic and Social Statistics, New York State School of Industrial and Labor Relations, Cornell University, Ithaca, NY 14853-3901
- M. Fitzpatrick
- Department of Economic and Social Statistics, New York State School of Industrial and Labor Relations, Cornell University, Ithaca, NY 14853-3901
|
12
|
Hu XS, Simila J, Platz SS, Moore SS, Plastow G, Meghen CN. Estimating animal abundance in ground beef batches assayed with molecular markers. PLoS One 2012; 7:e34191. [PMID: 22479559 PMCID: PMC3316629 DOI: 10.1371/journal.pone.0034191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/16/2011] [Accepted: 02/24/2012] [Indexed: 11/18/2022] [Open access]
Abstract
Estimating animal abundance in industrial scale batches of ground meat is important for mapping meat products through the manufacturing process and for effectively tracing the finished product during a food safety recall. The processing of ground beef involves a potentially large number of animals from diverse sources in a single product batch, which produces a high heterogeneity in capture probability. In order to estimate animal abundance through DNA profiling of ground beef constituents, two parameter-based statistical models were developed for incidence data. Simulations were applied to evaluate the maximum likelihood estimate (MLE) of a joint likelihood function from multiple surveys, showing superiority in the presence of high capture heterogeneity with small sample sizes, or comparable estimation in the presence of low capture heterogeneity with a large sample size when compared to other existing models. Our model employs the full information on the pattern of the capture-recapture frequencies from multiple samples. We applied the proposed models to estimate animal abundance in six manufacturing beef batches, genotyped using 30 single nucleotide polymorphism (SNP) markers, from a large scale beef grinding facility. Results show that between 411 and 1367 animals were present in the six manufacturing beef batches. These estimates are informative as a reference for improving recall processes and tracing finished meat products back to source.
Affiliation(s)
- Xin-Sheng Hu
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Canada.
|
13
|
Valero J, Ginebra J, Pérez-Casany M. Extended Truncated Tweedie-Poisson Model. Methodol Comput Appl Probab 2012. [DOI: 10.1007/s11009-012-9277-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
|
14
|
Rempala GA, Seweryn M, Ignatowicz L. Model for comparative analysis of antigen receptor repertoires. J Theor Biol 2010; 269:1-15. [PMID: 20955715 DOI: 10.1016/j.jtbi.2010.10.001] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 03/02/2010] [Revised: 09/09/2010] [Accepted: 10/04/2010] [Indexed: 11/30/2022]
Abstract
In modern molecular biology one of the standard ways of analyzing a vertebrate immune system is to sequence and compare the counts of specific antigen receptor clones (either immunoglobulins or T-cell receptors) derived from various tissues under different experimental or clinical conditions. The resulting statistical challenges are difficult and do not fit readily into the standard statistical framework of contingency tables primarily due to the serious under-sampling of the receptor populations. This under-sampling is caused, on one hand, by the extreme diversity of antigen receptor repertoires maintained by the immune system and, on the other, by the high cost and labor intensity of the receptor data collection process. In most of the recent immunological literature the differences across antigen receptor populations are examined via non-parametric statistical measures of the species overlap and diversity borrowed from ecological studies. While this approach is robust in a wide range of situations, it seems to provide little insight into the underlying clonal size distribution and the overall mechanism differentiating the receptor populations. As a possible alternative, the current paper presents a parametric method that adjusts for the data under-sampling as well as provides a unifying approach to a simultaneous comparison of multiple receptor groups by means of the modern statistical tools of unsupervised learning. The parametric model is based on a flexible multivariate Poisson-lognormal distribution and is seen to be a natural generalization of the univariate Poisson-lognormal models used in the ecological studies of biodiversity patterns. The procedure for evaluating a model's fit is described along with the public domain software developed to perform the necessary diagnostics. The model-driven analysis is seen to compare favorably vis a vis traditional methods when applied to the data from T-cell receptors in transgenic mice populations.
Affiliation(s)
- Grzegorz A Rempala
- Department of Biostatistics and the Cancer Center, Medical College of Georgia, Augusta, GA 30912, USA.
|
15
|
Abstract
The inverse Gaussian–Poisson mixture model is very useful when modelling highly skewed non-negative integer data in fields as diverse as linguistics, ecology, market research, bibliometry, engineering and insurance. When using this statistical model on the frequency of word or species frequency data, one typically truncates its sample space at zero to accommodate for the ignorance about the number of words or species that are not observed. In this paper, we show that by truncating the sample space of the inverse Gaussian–Poisson model, one is allowed to extend its parameter space and in that way improve its fit when the frequency of one is larger and the right tail is heavier than is allowed by the unextended model. By fitting the extended model to word frequency count data, we find many instances where the maximum likelihood estimates fall in the extension of the parameter space.
|
16
|
Abstract
We consider parametric distributions intended to model heterogeneity in population size estimation, especially parametric stochastic abundance models for species richness estimation. We briefly review (conditional) maximum likelihood estimation of the number of species, and summarize the results of fitting 7 candidate models to frequency-count data, from a database of >40000 such instances, mostly arising from microbial ecology. We consider error estimation, goodness-of-fit assessment, data subsetting, and other practical matters. We find that, although the array of candidate models can be improved, finite mixtures of a small number of components (point masses or simple diffuse distributions) represent a promising direction. Finally we consider the connections between parametric models for abundance and incidence data, again noting the usefulness of finite mixture models.
Affiliation(s)
- John Bunge
- Department of Statistical Science, Cornell University, Ithaca, NY 14853, USA.
|
17
|
Mao CX, Colwell RK. Estimation of species richness: mixture models, the role of rare species, and inferential challenges. Ecology 2005. [DOI: 10.1890/04-1078] [Citation(s) in RCA: 100] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022]
|
18
|
|
19
|
Chao A, Shen TJ. Nonparametric prediction in species sampling. Journal of Agricultural, Biological and Environmental Statistics 2004. [DOI: 10.1198/108571104x3262] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Indexed: 11/21/2022]
|
20
|
|
21
|
Abstract
Consider a stochastic abundance model in which the species arrive in the sample according to independent Poisson processes, where the abundance parameters of the processes follow a gamma distribution. We propose a new estimator of the number of species for this model. The estimator takes the form of the number of duplicated species (i.e., species represented by two or more individuals) divided by an estimated duplication fraction. The duplication fraction is estimated from all frequencies including singleton information. The new estimator is closely related to the sample coverage estimator presented by Chao and Lee (1992, Journal of the American Statistical Association 87, 210-217). We illustrate the procedure using the Malayan butterfly data discussed by Fisher, Corbet, and Williams (1943, Journal of Animal Ecology 12, 42-58) and a 1989 Christmas Bird Count dataset collected in Florida, U.S.A. Simulation studies show that this estimator compares well with maximum likelihood estimators (i.e., empirical Bayes estimators from the Bayesian viewpoint) for which an iterative numerical procedure is needed and may be infeasible.
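The sampling model described in this abstract can be simulated with a short sketch such as the one below. It draws gamma-distributed abundances, generates Poisson counts per species, and tabulates the frequency counts along with the number of duplicated species. The function names, parameter values, and the NumPy random generator are assumptions made for illustration. The estimator itself is not implemented, because the abstract does not give the formula for the estimated duplication fraction.

```python
import numpy as np

def simulate_frequency_counts(n_species, shape, scale, seed=0):
    """Simulate the gamma-mixed Poisson abundance model.

    Returns freq, where freq[j] is the number of species observed
    exactly j times (freq[0] is always 0: unseen species are not recorded).
    """
    rng = np.random.default_rng(seed)
    rates = rng.gamma(shape, scale, size=n_species)  # one abundance per species
    counts = rng.poisson(rates)                      # individuals sampled per species
    observed = counts[counts > 0]                    # zero counts are unobservable
    return np.bincount(observed, minlength=2)

def summarize(freq):
    """Return (observed species, singletons, duplicated species)."""
    observed = int(freq.sum())
    singletons = int(freq[1])
    duplicated = observed - singletons  # species seen two or more times
    return observed, singletons, duplicated
```

For example, `summarize(simulate_frequency_counts(500, 0.5, 2.0))` returns the counts that an estimator of this kind takes as input, and the true number of species (500 here) is known, so candidate estimators can be compared against it.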
Affiliation(s)
- Anne Chao
- Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan.
|
22
|
Sormani MP, Bruzzi P, Rovaris M, Barkhof F, Comi G, Miller DH, Cutter GR, Filippi M. Modelling new enhancing MRI lesion counts in multiple sclerosis. Mult Scler 2001; 7:298-304. [PMID: 11724445 DOI: 10.1177/135245850100700505] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
Magnetic resonance imaging (MRI) has been established as the most relevant paraclinical tool for diagnosing and monitoring multiple sclerosis (MS). In this context, counting the number of new enhancing lesions on monthly MRI scans is widely used as a surrogate marker of MS activity when evaluating the effect of treatments. In this study, we investigated whether parametric models based on mixed Poisson distributions (the Negative Binomial (NB) and the Poisson-Inverse Gaussian (P-IG) distributions) were able to provide adequate fitting of new enhancing lesion counts in MS. We found that the NB model gave good approximations in relapsing-remitting and secondary progressive MS patients not selected for baseline MRI activity, whereas the P-IG distribution modelled better new enhancing lesion counts in relapsing-remitting MS patients selected for baseline activity. This study shows that parametric modelling for MS new enhancing lesion counts is feasible. This approach should provide more targeted tools for the design and the analysis of MRI monitored clinical trials in MS.
Affiliation(s)
- M P Sormani
- Unit of Clinical Epidemiology and Trials, National Institute for Cancer Research, Genoa, Italy
|
23
|
Ong S. A note on the mixed Poisson formulation of the Poisson-inverse Gaussian distribution. Commun Stat-Simul C 1998. [DOI: 10.1080/03610919808813465] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
|
24
|
|
25
|
|
26
|
Whitmore GA, Lee MLT. A Multivariate Survival Distribution Generated by an Inverse Gaussian Mixture of Exponentials. Technometrics 1991. [DOI: 10.1080/00401706.1991.10484768] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.5] [Indexed: 10/28/2022]
|
27
|
|
28
|
Stein GZ, Juritz JM. Linear models with an inverse Gaussian Poisson error distribution. Commun Stat-Theor M 1988. [DOI: 10.1080/03610928808829640] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
|