2009, RePEc: Research Papers in Economics
20 pages
An excess of zeros is not a rare feature in count data. Statisticians advocate the Poisson-type hurdle model (among other techniques) as an interesting approach to handle this data peculiarity. However, the frequency of gross errors and the complexity intrinsic to some of the phenomena considered may render this classical model unreliable and too limiting. In this paper, we develop a robust version of the Poisson hurdle model by extending the robust procedure for GLM (Cantoni and Ronchetti, 2001) to the truncated Poisson regression model. The performance of the new robust approach is then investigated via a simulation study, a real data application and a sensitivity analysis. The results show the reliability of the new technique in the neighborhood of the truncated Poisson model. This robust modelling approach is therefore a valuable complement to the classical one, providing a tool for drawing reliable statistical conclusions and making more effective decisions.
Journal of Statistical Distributions and Applications, 2021
Zero-inflated and hurdle models are widely applied to count data possessing excess zeros, where they can simultaneously model the process by which the zeros were generated and potentially help mitigate the effects of overdispersion relative to the assumed count distribution. Which model to use depends on how the zeros are generated: zero-inflated models add an additional probability mass at zero, while hurdle models are two-part models composed of a degenerate distribution for the zeros and a zero-truncated distribution for the positive counts. Developing confidence intervals for such models is challenging since no closed-form function is available to calculate the mean. In this study, generalized fiducial inference is used to construct confidence intervals for the means of zero-inflated Poisson and Poisson hurdle models. The proposed methods are assessed by an intensive simulation study. An illustrative example demonstrates the inference methods.
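The distinction drawn above between the two model classes can be made concrete with a minimal sketch. It assumes the textbook parameterizations: a zero-inflated Poisson with rate `lam` and extra zero mass `pi`, and a Poisson hurdle with zero probability `p0` and a zero-truncated Poisson with rate `lam` for the positive counts. All function and parameter names are hypothetical and chosen for illustration; none of this is taken from the paper, and it does not implement the fiducial intervals the abstract describes.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k) with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: extra mass pi is added at zero.

    Zeros can come from the point mass or from the Poisson part.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, lam, p0):
    """Poisson hurdle: zeros have probability p0; positive counts follow
    a zero-truncated Poisson scaled by (1 - p0)."""
    if k == 0:
        return p0
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p0) * truncated


def zip_mean(lam, pi):
    """Mean of the zero-inflated Poisson under this parameterization."""
    return (1.0 - pi) * lam


def hurdle_mean(lam, p0):
    """Mean of the Poisson hurdle under this parameterization."""
    return (1.0 - p0) * lam / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    # Illustrative parameter values only (assumptions, not from any data set).
    lam, pi, p0 = 2.0, 0.3, 0.5
    print("ZIP    P(0):", zip_pmf(0, lam, pi), "mean:", zip_mean(lam, pi))
    print("Hurdle P(0):", hurdle_pmf(0, lam, p0), "mean:", hurdle_mean(lam, p0))
```

In the hurdle version the zero probability is a free parameter, so it can sit above or below the Poisson zero probability, whereas the zero-inflated version can only add zeros to the Poisson part.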
Communications Faculty Of Science University of Ankara Series A1Mathematics and Statistics
Count data regression has been widely used in various disciplines, particularly in the health field. Classical models like Poisson and negative binomial regression may not perform reasonably in the presence of excessive zeros and overdispersion. Zero-inflated and hurdle variants of these models can be a remedy for these problems. In addition to zero-inflated and hurdle models, alternatives based on biased estimators such as ridge and Liu may improve performance when multicollinearity is present alongside excessive zeros and overdispersion. In this study, ten different regression models, including classical Poisson and negative binomial regression with their variants based on zero-inflated, hurdle, ridge and Liu approaches, have been compared using a health data set. Criteria including the Akaike information criterion, log-likelihood value, mean squared error and mean absolute error have been used to investigate the performance of the models. The results show th...
2019
Generalised Linear Models such as Poisson and Negative Binomial models have been routinely used to model count data. However, these models' assumptions are violated when the data exhibit over-dispersion and zero-inflation. Over-dispersion arises as a result of excess zeros in the data. For modelling data with such characteristics, several extensions of the Negative Binomial and Poisson models have been proposed, such as zero-inflated and hurdle models. Our study focuses on identifying the best-fitting model(s) that can be adopted in the presence of over-dispersion and excess zeros in count data. We simulate data sets with varying proportions of zeros and varying levels of dispersion, then fit Poisson, Negative Binomial, Zero-inflated Poisson, Zero-inflated Negative Binomial, Poisson hurdle and Negative Binomial hurdle models to the data. Model selection is based on AIC, log-likelihood, Vuong statistics and box plots. The results suggest that the Negative Binomial hurdle model performed well in most scenarios compared to the other models and is hence the best-fitting model for overdispersed count data with excess zeros.
Quality & Quantity, 2012
In this paper, we employed SAS PROC NLMIXED (nonlinear mixed model procedure) to analyze three example data sets having inflated zeros. The examples include data with covariates and data without covariates. The covariates used in this article are binary, to simplify the analysis; the analysis can readily be extended to situations with several covariates having multiple levels. Models fitted include the Poisson (P), the negative binomial (NB), the generalized Poisson (GP), and their zero-inflated variants, namely the ZIP, the ZINB and the ZIGP models respectively. Parameter estimates and the appropriate goodness-of-fit statistic (the deviance D) are computed, and in some cases Pearson's X² statistic, which is based on the variance of the relevant model distribution, is also computed. The expected frequencies for the models are also obtained, and GOF tests are conducted based on the rule established by Lawal (Appl Stat 29:292-298, 1980). Our results extend previous results on the analysis of the chosen data sets and are very consistent with previous analyses of them. We also present a hierarchical figure relating all the models employed in this paper. While we do not pretend that the results obtained are entirely new, the analyses give researchers in the field a much-needed means of implementing these models in SAS without having to resort to S-PLUS, R or Stata.
Entropy, 2021
Count datasets are traditionally analyzed using the ordinary Poisson distribution. However, this model has limited applicability, as it can be too restrictive to handle specific data structures. The need therefore arises for alternative models that accommodate, for example, overdispersion and zero modification (inflation/deflation at the frequency of zeros). In practical terms, these are the most prevalent structures ruling the nature of discrete phenomena nowadays. Hence, this paper’s primary goal was to jointly address these issues by deriving a fixed-effects regression model based on the hurdle version of the Poisson–Sujatha distribution. In this framework, the zero modification is incorporated by considering that a binary probability model determines which outcomes are zero-valued, and a zero-truncated process is responsible for generating positive observations. Posterior inferences for the model parameters were obtained from a fully Bayesian approach ba...
International Journal of Statistics and Probability, 2015
Researchers in many fields, including biomedicine, often make statistical inferences involving the analysis of count data that exhibit a substantially large proportion of zeros. Subjects in such research are broadly categorized into a low-risk group that produces only zero counts and a high-risk group leading to counts that can be modeled by a standard Poisson regression model. The aim of this study is to estimate the model parameters in the presence of covariates, some of which may not have significant effects on the magnitude of the counts in the presence of a large proportion of zeros. The estimation procedures we propose for the study are the pretest, shrinkage, and penalty estimators when some of the covariates may be subject to certain restrictions. Properties of the pretest and shrinkage estimators are discussed in terms of the asymptotic distributional biases and risks. We show that if the dimension of the parameters exceeds two, the risk of the shrinkage estimator is strictly less than that of the maximum likelihood estimator, and the risk of the pretest estimator depends on the validity of the restrictions on the parameters. A Monte Carlo simulation study shows that the mean squared errors (MSE) of the shrinkage estimator are comparable to the MSE of the penalty estimators, and in particular that it performs better than the penalty estimators when the dimension of the restricted parameter space is large. For illustrative purposes, the methods are applied to a real-life data set.
The Annals of Applied Statistics, 2010
Poisson regression is a popular tool for modeling count data and is applied in a vast array of applications from the social to the physical sciences and beyond. Real data, however, are often over- or underdispersed and, thus, not conducive to Poisson regression. We propose a regression model based on the Conway-Maxwell-Poisson (COM-Poisson) distribution to address this problem. The COM-Poisson regression generalizes the well-known Poisson and logistic regression models, and is suitable for fitting count data with a wide range of dispersion levels. With a GLM approach that takes advantage of exponential family properties, we discuss model estimation, inference, diagnostics, and interpretation, and present a test for determining the need for a COM-Poisson regression over a standard Poisson regression. We compare the COM-Poisson to several alternatives and illustrate its advantages and usefulness using three data sets with varying dispersion.
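How one extra parameter lets the COM-Poisson distribution cover different dispersion levels can be illustrated with a short sketch. It assumes the standard form of the probability mass function, P(Y = y) proportional to lam^y / (y!)^nu, with the normalizing constant approximated by a truncated sum. The function names and the truncation length `terms` are hypothetical choices for illustration; this covers only the distribution, not the regression model or the test proposed in the paper.

```python
import math


def com_poisson_table(lam, nu, terms=200):
    """Return [P(Y = 0), ..., P(Y = terms - 1)] for a COM-Poisson(lam, nu).

    The infinite normalizing sum is truncated at `terms` (an assumption that
    is adequate when the weights decay quickly, as for the values used here).
    Weights are handled in log space to avoid overflow.
    """
    log_w = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(w - m) for w in log_w))
    return [math.exp(w - log_z) for w in log_w]


def mean_and_variance(probs):
    """Mean and variance of a distribution given as a probability table."""
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second - mean ** 2


if __name__ == "__main__":
    # nu = 1 recovers the Poisson; nu > 1 gives underdispersion and
    # nu < 1 gives overdispersion relative to the Poisson.
    for nu in (0.5, 1.0, 2.0):
        mean, var = mean_and_variance(com_poisson_table(3.0, nu))
        print(f"nu={nu}: mean={mean:.4f}, variance={var:.4f}")
```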
British Journal of Mathematical and Statistical Psychology, 2012
Infrequent count data in psychological research are commonly modelled using zero-inflated Poisson regression. This model can be viewed as a latent mixture of an "always-zero" component and a Poisson component. Hurdle models are an alternative class of two-component models that are seldom used in psychological research, but clearly separate the zero counts and the non-zero counts by using a left-truncated count model for the latter. In this tutorial we revisit both classes of models, and discuss model comparisons and the interpretation of their parameters. As illustrated with an example from relational psychology, both types of models can easily be fitted using the R-package pscl.
Statistical Modelling: An International Journal
We propose a new class of discrete generalized linear models based on the class of Poisson-Tweedie factorial dispersion models with variance of the form µ + φµ^p, where µ is the mean, and φ and p are the dispersion and Tweedie power parameters, respectively. The models are fitted by using an estimating function approach obtained by combining the quasi-score and Pearson estimating functions for estimation of the regression and dispersion parameters, respectively. This provides a flexible and efficient regression methodology for a comprehensive family of count models including Hermite, Neyman Type A, Pólya-Aeppli, negative binomial and Poisson-inverse Gaussian. The estimating function approach allows us to extend the Poisson-Tweedie distributions to deal with underdispersed count data by allowing negative values for the dispersion parameter φ. Furthermore, the Poisson-Tweedie family can automatically adapt to highly skewed count data with excessive zeros, without the need to introduce zero-inflated or hurdle components, by the simple estimation of the power parameter. Thus, the proposed models offer a unified framework to deal with under, equi, overdispersed, zero-inflated and heavy-tailed count data. The computational implementation of the proposed models is fast, relying only on a simple Newton scoring algorithm. Simulation studies showed that the estimating function approach provides unbiased and consistent estimators for both regression and dispersion parameters. We highlight the ability
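The variance form µ + φµ^p quoted in this abstract can be explored with a small sketch showing how the sign of φ moves the variance above or below the Poisson value. The function names are hypothetical, and the positivity check is an assumption added here so that negative φ values remain meaningful; the estimating-function fitting procedure itself is not reproduced.

```python
def poisson_tweedie_variance(mu, phi, p):
    """Variance mu + phi * mu**p for mean mu, dispersion phi and power p.

    Raises ValueError when the inputs do not yield a positive variance,
    which can happen for sufficiently negative phi (assumed guard).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    variance = mu + phi * mu ** p
    if variance <= 0:
        raise ValueError("phi is too negative for this mu and p")
    return variance


def dispersion_index(mu, phi, p):
    """Variance-to-mean ratio: 1 for Poisson, above 1 for overdispersion,
    below 1 for underdispersion."""
    return poisson_tweedie_variance(mu, phi, p) / mu


if __name__ == "__main__":
    mu = 4.0
    # phi = 0 gives the Poisson variance; p = 2 with phi > 0 gives the
    # familiar negative binomial form mu + phi * mu**2.
    for phi, p in [(0.0, 1.0), (0.5, 2.0), (-0.1, 1.0)]:
        print(phi, p, dispersion_index(mu, phi, p))
```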
The American Journal of Drug and Alcohol Abuse, 2011
Journal of Statistical Theory and Applications
Communications in Statistics - Simulation and Computation, 2018
Applied Stochastic Models in Business and Industry, 2012
Journal of Physics: Conference Series, 2017
Hacettepe Journal of Mathematics and Statistics
Journal of Economic Surveys, 1995
Austrian Journal of Statistics, 2018
Clinical Epidemiology and Global Health
Statistical Modelling, 2014
Communications in Statistics - Theory and Methods, 2005
Scientific and Academic Publisher, 2018
… and Computers in …, 2005