1
Philipson P, Huang A. A fast look-up method for Bayesian mean-parameterised Conway-Maxwell-Poisson regression models. STATISTICS AND COMPUTING 2023; 33:81. [PMID: 37220636 PMCID: PMC10193358 DOI: 10.1007/s11222-023-10244-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 04/09/2023] [Indexed: 05/25/2023]
Abstract
Count data that are subject to both under- and overdispersion at some hierarchical level cannot be readily accommodated by classic models such as Poisson or negative binomial regression models. The mean-parameterised Conway-Maxwell-Poisson distribution allows for both types of dispersion within the same model, but is doubly intractable with an embedded normalising constant. We propose a look-up method where pre-computing values of the rate parameter dramatically reduces computing times and renders the proposed model a practicable alternative when faced with such bidispersed data. The approach is demonstrated and verified using a simulation study and applied to three datasets: an underdispersed small dataset on takeover bids, a medium dataset on yellow cards issued by referees in the English Premier League prior to and during the Covid-19 pandemic, and a large Test match cricket bowling dataset, the latter two of which each exhibit over- and underdispersion at the individual level.
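The look-up idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grid ranges, the nearest-grid-point rule, the truncation point `max_y`, the bisection bracket and all function names are assumptions made for illustration. The sketch solves for the Conway-Maxwell-Poisson rate that yields a target mean, stores the solutions on a grid of (mean, dispersion) pairs once, and then reads rates from the table instead of re-solving.

```python
import math
from bisect import bisect_left


def cmp_mean(lam, nu, max_y=200):
    """Mean of a CMP(lam, nu) variable, truncating the infinite sums at max_y."""
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    shift = max(log_w)  # subtract the largest log-weight for numerical stability
    w = [math.exp(v - shift) for v in log_w]
    return sum(y * wy for y, wy in enumerate(w)) / sum(w)


def solve_rate(mu, nu, tol=1e-10):
    """Bisection on log(lam) until the CMP mean matches mu (mean increases with lam)."""
    lo, hi = -20.0, 20.0  # assumed bracket for log(lam); adequate for the grid below
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cmp_mean(math.exp(mid), nu) < mu:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def build_table(mu_grid, nu_grid):
    """Pre-compute the rate for every (mu, nu) pair on the grid."""
    return [[solve_rate(mu, nu) for nu in nu_grid] for mu in mu_grid]


def nearest_index(grid, value):
    """Index of the grid point closest to value (grid must be sorted)."""
    i = bisect_left(grid, value)
    if i == 0:
        return 0
    if i == len(grid):
        return len(grid) - 1
    return i if grid[i] - value < value - grid[i - 1] else i - 1


def lookup_rate(table, mu_grid, nu_grid, mu, nu):
    """Read a pre-computed rate from the table instead of solving again."""
    return table[nearest_index(mu_grid, mu)][nearest_index(nu_grid, nu)]


# Hypothetical grid: means 0.5..10.0 and dispersions 0.5..2.0.
mu_grid = [0.5 * k for k in range(1, 21)]
nu_grid = [0.25 * k for k in range(2, 9)]
table = build_table(mu_grid, nu_grid)

rate = lookup_rate(table, mu_grid, nu_grid, 3.0, 1.0)
print(rate)  # with nu = 1 the CMP reduces to the Poisson, so the rate is close to the mean
```

The table is built once, so the cost of the root-finding step is paid up front; a finer grid or interpolation between grid points would trade memory for accuracy.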
Affiliation(s)
- Pete Philipson
- School of Mathematics, Statistics & Physics, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
- Alan Huang
- School of Mathematics and Physics, University of Queensland, St Lucia, Queensland 4066 Australia
2
da Silva GP, Laureano HA, Petterle RR, Ribeiro Jr PJ, Bonat WH. Multivariate generalized linear mixed models for underdispersed count data. J STAT COMPUT SIM 2023. [DOI: 10.1080/00949655.2023.2184474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Affiliation(s)
- Guilherme Parreira da Silva
- Laboratory of Statistics and Geoinformation, Department of Statistics, Paraná Federal University, Curitiba, Brazil
- Henrique Aparecido Laureano
- Laboratory of Statistics and Geoinformation, Department of Statistics, Paraná Federal University, Curitiba, Brazil
- Paulo Justiniano Ribeiro Jr
- Laboratory of Statistics and Geoinformation, Department of Statistics, Paraná Federal University, Curitiba, Brazil
- Wagner Hugo Bonat
- Laboratory of Statistics and Geoinformation, Department of Statistics, Paraná Federal University, Curitiba, Brazil
3
Haslett J, Parnell AC, Hinde J, Andrade Moral R. Modelling Excess Zeros in Count Data: A New Perspective on Modelling Approaches. Int Stat Rev 2021. [DOI: 10.1111/insr.12479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- John Haslett
- School of Computer Science and Statistics Trinity College Dublin Dublin Ireland
- Andrew C. Parnell
- Hamilton Institute, Insight Centre for Data Analytics Maynooth University Maynooth Ireland
- John Hinde
- School of Mathematics, Statistics and Applied Mathematics NUI Galway Galway Ireland
4
A simple and useful regression model for underdispersed count data based on Bernoulli–Poisson convolution. Stat Pap (Berl) 2021. [DOI: 10.1007/s00362-021-01253-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
5
Bayesian regression models for ecological count data in PyMC3. ECOL INFORM 2021. [DOI: 10.1016/j.ecoinf.2021.101301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
6
On the Discretization of Continuous Probability Distributions Using a Probabilistic Rounding Mechanism. MATHEMATICS 2021. [DOI: 10.3390/math9050555] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Most existing flexible count distributions allow only approximate inference when used in a regression context. This work proposes a new framework to provide an exact and flexible alternative for modeling and simulating count data with various types of dispersion (equi-, under-, and over-dispersion). The new method, referred to as “balanced discretization”, consists of discretizing continuous probability distributions while preserving expectations. It is easy to generate pseudo random variates from the resulting balanced discrete distribution since it has a simple stochastic representation (probabilistic rounding) in terms of the continuous distribution. For illustrative purposes, we develop the family of balanced discrete gamma distributions that can model equi-, under-, and over-dispersed count data. This family of count distributions is appropriate for building flexible count regression models because the expectation of the distribution has a simple expression in terms of the parameters of the distribution. Using the Jensen–Shannon divergence measure, we show that under the equidispersion restriction, the family of balanced discrete gamma distributions is similar to the Poisson distribution. Based on this, we conjecture that while covering all types of dispersions, a count regression model based on the balanced discrete gamma distribution will allow recovering a near Poisson distribution model fit when the data are Poisson distributed.
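The expectation-preserving property described in this abstract can be checked with a short Python sketch. It uses one common form of probabilistic rounding (round up with probability equal to the fractional part); whether this matches the paper's exact construction is an assumption, and the gamma shape and scale values are hypothetical.

```python
import math
import random


def probabilistic_round(x, rng):
    """Round x down or up at random so that the expected result equals x."""
    lower = math.floor(x)
    frac = x - lower  # probability of rounding up
    return lower + (1 if rng.random() < frac else 0)


def conditional_mean(x):
    """Exact expectation of the rounded value given x."""
    lower = math.floor(x)
    frac = x - lower
    return lower * (1 - frac) + (lower + 1) * frac


rng = random.Random(0)
shape, scale = 2.0, 1.5  # hypothetical gamma parameters; continuous mean is shape * scale
draws = [probabilistic_round(rng.gammavariate(shape, scale), rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean, shape * scale)  # the two values should be close
```

Because the rounding is unbiased conditional on the continuous draw, the discrete variable inherits the mean of the continuous distribution, which is what makes the mean easy to express in a regression setting.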
7
Machado FS, Moura AS, Mariano RF, Santos RMD, Garcia PO, Oliveira IRC, Fontes MAL. Small mammals in high fragmented landscape in Cerrado/Atlantic Forest ecotone, Southeastern Brazil. IHERINGIA. SERIE ZOOLOGIA 2021. [DOI: 10.1590/1678-4766e2021022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Exploratory human activities have resulted in small fragments inserted into a matrix which is inhospitable to small non-flying mammals. The effects of landscape changes alter the distribution patterns of species. Landscape fragmentation patterns for small mammals are controversial, especially considering small fragments and ecotonal regions. Based on these arguments, we investigated the diversity patterns of small mammals in small fragments in the ecotonal Cerrado/Atlantic Forest region. A total of 24 fragments (<40 ha) were studied using tomahawk, sherman and pitfall traps. We found low species richness (11 species, six marsupials and five rodents), which was not expected because it is an ecotonal region. It was verified that composition and community structure are simplified by the marked presence of generalist species and with the increase of species turnover. The small forest fragments present a microhabitat structure with lianas and streams as main environmental filters of groups with ecological similarities. Our findings suggest that these fragments must be managed in order to conserve the local biodiversity and maintain the needed characteristics to enable the occurrence of different ecological groups.
Affiliation(s)
- Felipe S. Machado
- Governo do Estado de Minas Gerais, Brazil; Universidade Federal de Lavras, Brazil
8
Vanegas LH, Rondon LM. A data transformation to deal with constant under/over-dispersion in Poisson and binomial regression models. J STAT COMPUT SIM 2020. [DOI: 10.1080/00949655.2020.1749276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Luis Hernando Vanegas
- Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia, South America
- Luz Marina Rondon
- Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia, South America
9
Ribeiro EE, Zeviani WM, Bonat WH, Demetrio CGB, Hinde J. Reparametrization of COM–Poisson regression models with applications in the analysis of experimental data. STAT MODEL 2019. [DOI: 10.1177/1471082x19838651] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
The COM–Poisson distribution is a two-parameter generalization of the Poisson distribution that can deal with under-, equi- and overdispersed count data. Unfortunately, its location parameter does not correspond to the expectation, which complicates the parameter interpretation. In this article, we propose a straightforward reparametrization of the COM–Poisson distribution based on an approximation to the expectation. Estimation and inference are done using the likelihood paradigm. Simulation studies show that the maximum likelihood estimators are unbiased and consistent for both regression and dispersion parameters. In addition, the nature of the deviance surfaces suggests that these parameters are also orthogonal for most of the parameter space, which is advantageous for interpretation, inference and computational efficiency. Study of the distribution’s properties, through a consideration of dispersion, zero-inflation and heavy tail indexes, together with the results of data analyses show the flexibility over standard approaches. The computational routines and datasets are available in the supplementary material.
Affiliation(s)
- Eduardo E Ribeiro
- Department of Exact Sciences, University of São Paulo – ESALQ, Piracicaba, SP, Brazil
- Walmes M Zeviani
- Department of Statistics, Paraná Federal University, Curitiba, PR, Brazil
- Wagner H Bonat
- Department of Statistics, Paraná Federal University, Curitiba, PR, Brazil
- Clarice GB Demetrio
- Department of Exact Sciences, University of São Paulo – ESALQ, Piracicaba, SP, Brazil
- John Hinde
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
10
Petterle RR, Bonat WH, Kokonendji CC, Seganfredo JC, Moraes A, da Silva MG. Double Poisson-Tweedie Regression Models. Int J Biostat 2019; 15. [PMID: 30998501 DOI: 10.1515/ijb-2018-0119] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/14/2018] [Accepted: 04/02/2019] [Indexed: 11/15/2022]
Abstract
In this paper, we further extend the recently proposed Poisson-Tweedie regression models to include a linear predictor for the dispersion as well as for the expectation of the count response variable. The family of the considered models is specified using only second-moment assumptions, where the variance of the count response has the form $\mu + \phi \mu^p$, where $\mu$ is the expectation, and $\phi$ and $p$ are the dispersion and power parameters, respectively. Parameter estimation is carried out using an estimating function approach obtained by combining the quasi-score and Pearson estimating functions. The performance of the fitting algorithm is investigated through simulation studies. The results showed that our estimating function approach provides consistent estimators for both mean and dispersion parameters. The class of models is motivated by a data set concerning CD4 counting in HIV-positive pregnant women assisted in a public hospital in Curitiba, Paraná, Brazil. Specifically, we investigate the effects of a set of covariates in both expectation and dispersion structures. Our results showed that women living outside the capital Curitiba, with viral load equal to or larger than 1000 copies and with a previous diagnosis of HIV infection, present lower levels of CD4 cell count. Furthermore, we detected that the time to initiate the antiretroviral therapy decreases the data dispersion. The data set and R code are available as supplementary materials.
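The variance form stated in this abstract can be made concrete with a small Python sketch. The function names and the numeric values are hypothetical; only the variance expression itself comes from the abstract.

```python
def poisson_tweedie_variance(mu, phi, p):
    """Variance mu + phi * mu**p from the second-moment specification."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return mu + phi * mu ** p


def dispersion_index(mu, phi, p):
    """Variance-to-mean ratio: 1 is equidispersion, below 1 is underdispersion."""
    return poisson_tweedie_variance(mu, phi, p) / mu


# Hypothetical values: a negative, zero and positive dispersion parameter at mean 2.
for phi in (-0.2, 0.0, 0.5):
    print(phi, dispersion_index(2.0, phi, 2.0))
```

Setting `phi` to zero recovers the Poisson variance, while positive and negative values move the variance above or below the mean.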
Affiliation(s)
- Ricardo R Petterle
- Sector of Health Sciences, Medical School, Paraná Federal University, Curitiba, Brazil
- Wagner H Bonat
- Department of Statistics, Paraná Federal University, Curitiba, Brazil
- Célestin C Kokonendji
- Laboratoire de Mathématiques de Besançon, Bourgogne Franche-Comté University, Besançon, France
- Atamai Moraes
- Departamento de Saúde Comunitária, Paraná Federal University, Curitiba, Brazil
- Monica G da Silva
- Departamento de Saúde Comunitária, Paraná Federal University, Curitiba, Brazil
11
Abstract
The Rasch Poisson Counts Model is the oldest Rasch model, developed by the Danish mathematician Georg Rasch in 1952. Nevertheless, the model has had limited applications in psychoeducational assessment. With the rise of neurocognitive and psychomotor testing, there is more room for new applications of the model where other item response theory models cannot be applied. In this paper, we give a general introduction to the Rasch Poisson Counts Model and then, using data from an attention test, walk the reader through how to use the “lme4” package in R to estimate the model and interpret the outputs.
Affiliation(s)
- Purya Baghaei
- English Department, Mashhad Branch, Islamic Azad University, Mashhad, Iran
- Philipp Doebler
- Department of Statistics, TU Dortmund University, Dortmund, Germany
12
Luyts M, Molenberghs G, Verbeke G, Matthijs K, Ribeiro Jr EE, Demétrio CGB, Hinde J. A Weibull-count approach for handling under- and overdispersed longitudinal/clustered data structures. STAT MODEL 2018. [DOI: 10.1177/1471082x18789992] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
A Weibull-model-based approach is examined to handle under- and overdispersed discrete data in a hierarchical framework. This methodology was first introduced by Nakagawa and Osaki (1975, IEEE Transactions on Reliability, 24, 300–301), and later examined for under- and overdispersion by Klakattawi et al. (2018, Entropy, 20, 142) in the univariate case. Extensions to hierarchical approaches with under- and overdispersion were left unnoted, even though they can be obtained in a simple manner. This is of particular interest when analysing clustered/longitudinal data structures, where the underlying correlation structure is often more complex compared to cross-sectional studies. In this article, a random-effects extension of the Weibull-count model is proposed and applied to two motivating case studies, originating from the clinical and sociological research fields. A goodness-of-fit evaluation of the model is provided through a comparison of some well-known count models, that is, the negative binomial, Conway–Maxwell–Poisson and double Poisson models. Empirical results show that the proposed extension flexibly fits the data, more specifically, for heavy-tailed, zero-inflated, overdispersed and correlated count data. Discrete left-skewed time-to-event data structures are also flexibly modelled using the approach, with the ability to derive direct interpretations on the median scale, provided the complementary log–log link is used. Finally, a large simulated set of data is created to examine other characteristics such as computational ease and orthogonality properties of the model, with the conclusion that the approach behaves best for highly overdispersed cases.
Affiliation(s)
- Martial Luyts
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, KU Leuven and Universiteit Hasselt, Leuven, Belgium
- Geert Molenberghs
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, KU Leuven and Universiteit Hasselt, Leuven, Belgium
- Geert Verbeke
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, KU Leuven and Universiteit Hasselt, Leuven, Belgium
- Koen Matthijs
- Family and Population Studies, KU Leuven, Leuven, Belgium
- John Hinde
- School of Mathematics, Statistics and Applied Mathematics, NUI Galway, Galway, Ireland
13
Bonat WH, Jørgensen B, Kokonendji CC, Hinde J, Demétrio CGB. Extended Poisson–Tweedie: Properties and regression models for count data. STAT MODEL 2017. [DOI: 10.1177/1471082x17715718] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Indexed: 11/15/2022]
Abstract
We propose a new class of discrete generalized linear models based on the class of Poisson–Tweedie factorial dispersion models with variance of the form $\mu + \phi \mu^p$, where $\mu$ is the mean and $\phi$ and $p$ are the dispersion and Tweedie power parameters, respectively. The models are fitted by using an estimating function approach obtained by combining the quasi-score and Pearson estimating functions for the estimation of the regression and dispersion parameters, respectively. This provides a flexible and efficient regression methodology for a comprehensive family of count models including Hermite, Neyman Type A, Pólya–Aeppli, negative binomial and Poisson-inverse Gaussian. The estimating function approach allows us to extend the Poisson–Tweedie distributions to deal with underdispersed count data by allowing negative values for the dispersion parameter $\phi$. Furthermore, the Poisson–Tweedie family can automatically adapt to highly skewed count data with excessive zeros, without the need to introduce zero-inflated or hurdle components, by the simple estimation of the power parameter. Thus, the proposed models offer a unified framework to deal with under-, equi-, overdispersed, zero-inflated and heavy-tailed count data. The computational implementation of the proposed models is fast, relying only on a simple Newton scoring algorithm. Simulation studies showed that the estimating function approach provides unbiased and consistent estimators for both regression and dispersion parameters. We highlight the ability of the Poisson–Tweedie distributions to deal with count data through a consideration of dispersion, zero-inflated and heavy tail indices, and illustrate its application with four data analyses. We provide an R implementation and the datasets as supplementary materials.
Affiliation(s)
- Wagner H. Bonat
- Laboratory of Statistics and Geoinformation, Department of Statistics, Paraná Federal University, Curitiba, Brazil
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Bent Jørgensen
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Célestin C. Kokonendji
- Laboratoire de Mathématiques de Besançon, Bourgogne Franche-Comté University, Besançon, France
- John Hinde
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
- Clarice G. B. Demétrio
- Departamento de Ciências Exatas, Escola Superior de Agricultura Luiz de Queiroz, São Paulo University, Piracicaba, Brazil
14
Liu S, Baret F, Allard D, Jin X, Andrieu B, Burger P, Hemmerlé M, Comar A. A method to estimate plant density and plant spacing heterogeneity: application to wheat crops. PLANT METHODS 2017; 13:38. [PMID: 28529535 PMCID: PMC5436426 DOI: 10.1186/s13007-017-0187-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/25/2016] [Accepted: 05/02/2017] [Indexed: 05/23/2023]
Abstract
BACKGROUND: Plant density and its non-uniformity drive the competition among plants as well as with weeds. They thus need to be estimated with small uncertainties. An optimal sampling method is proposed to estimate the plant density in wheat crops from plant counting and reach a given precision. RESULTS: Three experiments were conducted in 2014, resulting in 14 plots across varied sowing density, cultivars and environmental conditions. The coordinates of the plants along the row were measured on high-resolution RGB images taken from ground level. Results show that the spacings between consecutive plants along the row direction are independent and follow a gamma distribution under the varied conditions experienced. A gamma count model was then derived to define the optimal sample size required to estimate plant density for a given precision. Results suggest that measuring the length of segments containing 90 plants will achieve a precision better than 10%, independently of the plant density. This approach appears more efficient than the usual method based on fixed-length segments where the number of plants is counted: the optimal length for a given precision on the density estimation will depend on the actual plant density. The gamma count model parameters may also be used to quantify the heterogeneity of plant spacing along the row by exploiting the variability between replicated samples. Results show that to achieve a 10% precision on the estimates of the 2 parameters of the gamma model, 200 elementary samples corresponding to the spacing between 2 consecutive plants should be measured. CONCLUSIONS: This method provides an optimal sampling strategy to estimate the plant density and quantify the plant spacing heterogeneity along the row.
Affiliation(s)
- Shouyang Liu
- INRA, UMR-EMMAH, UMT-CAPTE, UAPV, 228 Route de l'aérodrome CS 40509, 84914 Avignon, France
- Fred Baret
- INRA, UMR-EMMAH, UMT-CAPTE, UAPV, 228 Route de l'aérodrome CS 40509, 84914 Avignon, France
- Xiuliang Jin
- INRA, UMR-EMMAH, UMT-CAPTE, UAPV, 228 Route de l'aérodrome CS 40509, 84914 Avignon, France
- Bruno Andrieu
- UMR ECOSYS, INRA, AgroParisTech, Université Paris-Saclay, 78850 Thiverval-Grignon, France
15
Abstract
Conway–Maxwell–Poisson (CMP) distributions are flexible generalizations of the Poisson distribution for modelling overdispersed or underdispersed counts. The main hindrance to their wider use in practice seems to be the inability to directly model the mean of counts, making them not compatible with nor comparable to competing count regression models, such as the log-linear Poisson, negative-binomial or generalized Poisson regression models. This note illustrates how CMP distributions can be parametrized via the mean, so that simpler and more easily interpretable mean-models can be used, such as a log-linear model. Other link functions are also available, of course. In addition to establishing attractive theoretical and asymptotic properties of the proposed model, its good finite-sample performance is exhibited through various examples and a simulation study based on real datasets. Moreover, the MATLAB routine to fit the model to data is demonstrated to be up to an order of magnitude faster than the current software to fit standard CMP models, and over two orders of magnitude faster than the recently proposed hyper-Poisson model.
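For orientation, the mean parametrisation described in this abstract can be written out as follows. The symbols ($\lambda$ for the rate, $\nu$ for the dispersion, $Z$ for the normalising constant, $x_i$ and $\beta$ for covariates and coefficients) are notation assumed here, not taken from the abstract. The rate is defined implicitly as the value that makes the distribution's mean equal to $\mu$, and a log-linear model is then placed on $\mu$.

```latex
% Mean-parametrised CMP probability mass function (notation assumed)
P(Y = y \mid \mu, \nu)
  = \frac{\lambda(\mu,\nu)^{y}}{(y!)^{\nu}\, Z\bigl(\lambda(\mu,\nu), \nu\bigr)},
  \qquad y = 0, 1, 2, \ldots,
\qquad
Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}} .

% The rate \lambda(\mu,\nu) is the solution of the mean constraint
\sum_{y=0}^{\infty} (y - \mu)\, \frac{\lambda^{y}}{(y!)^{\nu}} = 0 .

% Log-linear mean model
\log \mu_i = x_i^{\top} \beta .
```

With $\nu = 1$ the constraint gives $\lambda = \mu$ and the model reduces to log-linear Poisson regression.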
Affiliation(s)
- Alan Huang
- School of Mathematics and Physics, University of Queensland, St Lucia, QLD, Australia
16
Affiliation(s)
- Kimberly F. Sellers
- Department of Mathematics and Statistics, Georgetown University, Washington DC, USA
- Center for Statistical Research and Methodology, U.S. Census Bureau, Washington DC, USA
- Darcy S. Morris
- Center for Statistical Research and Methodology, U.S. Census Bureau, Washington DC, USA