2011, Journal of Statistical Planning and Inference
An excess of zeros is not a rare feature in count data. Statisticians advocate the Poisson-type hurdle model (among other techniques) as an interesting approach to handle this data peculiarity. However, the frequency of gross errors and the complexity intrinsic to some of the phenomena considered may render this classical model unreliable and too limiting. In this paper, we develop a robust version of the Poisson hurdle model by extending the robust procedure for GLM (Cantoni and Ronchetti, 2001) to the truncated Poisson regression model. The performance of the new robust approach is then investigated via a simulation study, a real data application and a sensitivity analysis. The results show the reliability of the new technique in the neighborhood of the truncated Poisson model. This robust modelling approach is therefore a valuable complement to the classical one, providing a tool for reaching reliable statistical conclusions and making more effective decisions.
Journal of Statistical Distributions and Applications, 2021
Zero-inflated and hurdle models are widely applied to count data possessing excess zeros, where they can simultaneously model the process by which the zeros were generated and potentially help mitigate the effects of overdispersion relative to the assumed count distribution. Which model to use depends on how the zeros are generated: zero-inflated models add an additional probability mass on zero, while hurdle models are two-part models composed of a degenerate distribution for the zeros and a zero-truncated distribution. Developing confidence intervals for such models is challenging since no closed-form function is available to calculate the mean. In this study, generalized fiducial inference is used to construct confidence intervals for the means of zero-inflated Poisson and Poisson hurdle models. The proposed methods are assessed by an intensive simulation study. An illustrative example demonstrates the inference methods.
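The distinction drawn above between the two model families can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names and the parameters `pi` (extra zero mass) and `p0` (hurdle probability of a zero) are assumptions chosen for illustration, and a plain Poisson is assumed as the count distribution.

```python
import math

def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: extra mass pi at zero, mixed with Poisson(lam).

    Zeros come from two sources: the point mass and the Poisson part.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, lam, p0):
    """Poisson hurdle: all zeros come from one binary part with P(Y = 0) = p0.

    Positive counts follow a zero-truncated Poisson(lam).
    """
    if k == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

if __name__ == "__main__":
    for k in range(4):
        print(k, round(zip_pmf(k, 2.0, 0.3), 4), round(hurdle_pmf(k, 2.0, 0.3), 4))
```

In this parameterisation the zero-inflated model can only add zeros relative to the Poisson, whereas the hurdle model can also produce fewer zeros than the Poisson by setting `p0` below `exp(-lam)`.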
Communications Faculty Of Science University of Ankara Series A1 Mathematics and Statistics
Count data regression has been widely used in various disciplines, particularly in the health field. Classical models like Poisson and negative binomial regression may not provide reasonable performance in the presence of excessive zeros and overdispersion. Zero-inflated and hurdle variants of these models can be a remedy for these problems. In addition to zero-inflated and hurdle models, alternatives based on biased estimators such as ridge and Liu may improve performance when multicollinearity is present alongside excessive zeros and overdispersion. In this study, ten different regression models, including classical Poisson and negative binomial regression and their variants based on zero-inflated, hurdle, ridge and Liu approaches, have been compared using a health data set. Criteria including the Akaike information criterion, log-likelihood value, mean squared error and mean absolute error have been used to investigate the performance of the models. The results show th...
2019
Generalised Linear Models such as Poisson and Negative Binomial models have been routinely used to model count data. However, the assumptions of these models are violated when the data exhibit over-dispersion and zero-inflation. Over-dispersion arises as a result of excess zeros in the data. For modelling data with such characteristics, several extensions of the Negative Binomial and Poisson models have been proposed, such as zero-inflated and hurdle models. Our study focuses on identifying the most statistically fit model(s) that can be adopted in the presence of over-dispersion and excess zeros in count data. We simulate data sets at varying proportions of zeros and varying levels of dispersion, then fit Poisson, Negative Binomial, Zero-inflated Poisson, Zero-inflated Negative Binomial, Poisson Hurdle and Negative Binomial Hurdle models to the data. Model selection is based on AIC, log-likelihood, Vuong statistics and box plots. The results obtained suggest that the Negative Binomial Hurdle model performed well in most scenarios compared to the other models and is hence the most statistically fit model for overdispersed count data with excess zeros.
Entropy, 2021
Count datasets are traditionally analyzed using the ordinary Poisson distribution. However, this model has limited applicability, as it can be somewhat restrictive in handling specific data structures. In this case, the need arises for alternative models that accommodate, for example, overdispersion and zero modification (inflation/deflation at the frequency of zeros). In practical terms, these are the most prevalent structures ruling the nature of discrete phenomena nowadays. Hence, this paper’s primary goal was to jointly address these issues by deriving a fixed-effects regression model based on the hurdle version of the Poisson–Sujatha distribution. In this framework, the zero modification is incorporated by considering that a binary probability model determines which outcomes are zero-valued, and a zero-truncated process is responsible for generating positive observations. Posterior inferences for the model parameters were obtained from a fully Bayesian approach ba...
Quality & Quantity, 2012
In this paper, we employed SAS PROC NLMIXED (nonlinear mixed model procedure) to analyze three example data sets with inflated zeros. The examples include data with and without covariates. The covariates used in this article are binary, to simplify our analysis. Of course, the analysis can readily be extended to situations with several covariates having multiple levels. Models fitted include the Poisson (P), the negative binomial (NB), the generalized Poisson (GP), and their zero-inflated variants, namely the ZIP, ZINB and ZIGP models respectively. Parameter estimates and the appropriate goodness-of-fit statistic (the deviance D) are computed, and in some cases Pearson's X² statistic, which is based on the variance of the relevant model distribution, is also computed. Expected frequencies for the models are also obtained, and GOF tests are conducted based on the rule established by Lawal (Appl Stat 29:292-298, 1980). Our results extend previous results on the analysis of the chosen data. Further, the results obtained are very consistent with previous analyses of the data sets chosen for this article. We also present a hierarchical figure relating all the models employed in this paper. While we do not claim that the results obtained are entirely new, the analyses give researchers in the field the much-needed means of implementing these models in SAS without having to resort to S-PLUS, R or Stata.
2016
Marginalised models are in great demand by many researchers in the life sciences, particularly in clinical trials, epidemiology, health economics, surveys and many others, since they allow generalisation of inference to the entire population under study. For count data, standard procedures such as Poisson regression and the negative binomial model provide population-average inference for model parameters. However, the occurrence of excess zero counts and lack of independence in empirical data have necessitated their extension to accommodate these phenomena. These extensions, though useful, complicate the interpretation of effects. For example, the zero-inflated Poisson model accounts for the presence of excess zeros, but its parameter estimates do not have the direct marginal interpretation of the base Poisson model. Marginalisations due to the presence of excess zeros are underdeveloped, though demand for them is high. The aim of this paper, therefore, is to deve...
Hacettepe Journal of Mathematics and Statistics
In this paper, we propose a new generalization of the Poisson distribution by using the concept of the weighted distribution; a trigonometric weight with the cosine function is used. We derive some distributional properties of the new distribution, such as the cumulative distribution function, moment generating function, factorial moments, and index of dispersion. Then, the related model is considered for modeling purposes, with estimation of the model parameters performed via several methods. Zero-inflated count regression analysis is introduced by using the new distribution. Finally, we provide two applications of the obtained results on practical data sets.
Clinical Epidemiology and Global Health
Background: Count data represent the number of occurrences of an event within a fixed period of time. In count data modelling, overdispersion is inevitable. Sometimes, this overdispersion may not be due only to excess zeros but also to the presence of two or more mixtures. Hence the main objective is to examine the presence of mixtures, if any, with excess zeros and to compare the Generalized Poisson model and mixture models with other count data models using real time and simulated data. Methods: Three real time over-dispersed datasets were used for the comparison of the models. The real time data models were compared using information criteria such as AIC and BIC and regression coefficients. Data were also simulated using a mixture Poisson with excess zeros. The simulation was repeated for different sample sizes to identify the better model. Results: Generalized Poisson showed consistently lower bias and MSE when compared to the other models for varying sample sizes. AIC and BIC values were almost similar for the Generalized Poisson, ZIP and Mixture Poisson models. Similar findings were also obtained from real time data. Conclusion: Generalized Poisson models provide a better fit for data overdispersed due to excess zeros, consistently across real time and simulated data with varying sample sizes. Negative Binomial models can be redistricted or reevaluated against the Generalized Poisson model.
2011
The Poisson distribution is a popular distribution for modeling count data, yet it is constrained by its equidispersion assumption, making it less than ideal for modeling real data that often exhibit over-dispersion or under-dispersion. The COM-Poisson distribution is a two-parameter generalization of the Poisson distribution that allows for a wide range of over-dispersion and under-dispersion. It not only generalizes the Poisson distribution but also contains the Bernoulli and geometric distributions as special cases.
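The special cases mentioned above can be checked numerically. The following Python sketch is not from the paper; it assumes the usual COM-Poisson parameterisation P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), with Z approximated by a truncated sum. The function name and the `max_terms` cutoff are illustrative assumptions.

```python
import math

def com_poisson_pmf(y, lam, nu, max_terms=500):
    """COM-Poisson probability P(Y = y) with rate lam > 0 and dispersion nu >= 0.

    The normalising constant Z = sum_j lam**j / (j!)**nu is approximated by
    its first max_terms terms, computed in log space to avoid overflow.
    For nu = 0 the series only converges when lam < 1.
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_terms)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

if __name__ == "__main__":
    # nu = 1 reproduces the Poisson distribution.
    print(com_poisson_pmf(3, 2.0, 1.0), math.exp(-2.0) * 2.0**3 / 6)
    # nu = 0 with lam < 1 reproduces a geometric distribution.
    print(com_poisson_pmf(2, 0.5, 0.0), (1 - 0.5) * 0.5**2)
    # A large nu approaches a Bernoulli with success probability lam / (1 + lam).
    print(com_poisson_pmf(1, 2.0, 50.0), 2.0 / 3.0)
```

Values of ν below 1 give over-dispersion relative to the Poisson, and values above 1 give under-dispersion.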
Journal of Physics: Conference Series, 2017
This study focuses on analysing count data of butterfly communities in Jasin, Melaka. For analysing a count dependent variable, the Poisson regression model is known as the benchmark model for regression analysis. Continuing from the previous literature that used Poisson regression analysis, this study uses zero-inflated Poisson (ZIP) regression analysis to gain greater precision in analysing the count data of butterfly communities in Jasin, Melaka. Poisson regression should be abandoned in favour of count data models that are capable of taking the extra zeros into account explicitly. One of the most popular such models is the ZIP regression model. The butterfly community data, referred to as the number of subjects in this study, were collected in Jasin, Melaka and consisted of 131 subjects visiting Jasin, Melaka. Since the researchers consider the number of subjects, the data set comprises five butterfly families, which represent the five variables involved in the analysis, namely the types of subjects. In addition, the ZIP analysis used the SAS procedure for overdispersion in analysing zero values, and the main purpose of continuing the previous study is to determine which model is better when zero values exist among the count observations. The analysis used AIC, BIC and the Vuong test at the 5% significance level in order to achieve the objectives. The findings indicate the presence of over-dispersion in analysing zero values. The ZIP regression model is better than the Poisson regression model when zero values exist.
International Journal of Statistics and Probability, 2015
Researchers in many fields, including biomedicine, often make statistical inferences involving the analysis of count data that exhibit a substantially large proportion of zeros. Subjects in such research are broadly categorized into a low-risk group that produces only zero counts and a high-risk group leading to counts that can be modeled by a standard Poisson regression model. The aim of this study is to estimate the model parameters in the presence of covariates, some of which may not have significant effects on the magnitude of the counts in the presence of a large proportion of zeros. The estimation procedures we propose for the study are the pretest, shrinkage, and penalty estimators when some of the covariates may be subject to certain restrictions. Properties of the pretest and shrinkage estimators are discussed in terms of the asymptotic distributional biases and risks. We show that if the dimension of parameters exceeds two, the risk of the shrinkage estimator is strictly less than that of the maximum likelihood estimator, and the risk of the pretest estimator depends on the validity of the restrictions on parameters. A Monte Carlo simulation study shows that the mean squared errors (MSE) of the shrinkage estimator are comparable to the MSE of the penalty estimators, and in particular it performs better than the penalty estimators when the dimension of the restricted parameter space is large. For illustrative purposes, the methods are applied to a real life data set.
The Annals of Applied Statistics, 2010
Poisson regression is a popular tool for modeling count data and is applied in a vast array of applications from the social to the physical sciences and beyond. Real data, however, are often over- or underdispersed and, thus, not conducive to Poisson regression. We propose a regression model based on the Conway-Maxwell-Poisson (COM-Poisson) distribution to address this problem. The COM-Poisson regression generalizes the well-known Poisson and logistic regression models, and is suitable for fitting count data with a wide range of dispersion levels. With a GLM approach that takes advantage of exponential family properties, we discuss model estimation, inference, diagnostics, and interpretation, and present a test for determining the need for a COM-Poisson regression over a standard Poisson regression. We compare the COM-Poisson to several alternatives and illustrate its advantages and usefulness using three data sets with varying dispersion.
Statistical Modelling: An International Journal
We propose a new class of discrete generalized linear models based on the class of Poisson-Tweedie factorial dispersion models with variance of the form µ + φµ^p, where µ is the mean and φ and p are the dispersion and Tweedie power parameters, respectively. The models are fitted by using an estimating function approach obtained by combining the quasi-score and Pearson estimating functions for estimation of the regression and dispersion parameters, respectively. This provides a flexible and efficient regression methodology for a comprehensive family of count models including Hermite, Neyman Type A, Pólya-Aeppli, negative binomial and Poisson-inverse Gaussian. The estimating function approach allows us to extend the Poisson-Tweedie distributions to deal with underdispersed count data by allowing negative values for the dispersion parameter φ. Furthermore, the Poisson-Tweedie family can automatically adapt to highly skewed count data with excessive zeros, without the need to introduce zero-inflated or hurdle components, by the simple estimation of the power parameter. Thus, the proposed models offer a unified framework to deal with under-, equi- and overdispersed, zero-inflated and heavy-tailed count data. The computational implementation of the proposed models is fast, relying only on a simple Newton scoring algorithm. Simulation studies showed that the estimating function approach provides unbiased and consistent estimators for both regression and dispersion parameters. We highlight the ability
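The variance form quoted above can be explored with a few lines of Python. This is only a sketch of the mean-variance relationship, not of the paper's estimating-function fitting procedure; the function names are assumptions made for illustration.

```python
def poisson_tweedie_variance(mu, phi, p):
    """Variance mu + phi * mu**p for mean mu, dispersion phi and power p."""
    return mu + phi * mu**p

def dispersion_index(mu, phi, p):
    """Variance-to-mean ratio: 1 is equidispersed, >1 over-, <1 underdispersed."""
    return poisson_tweedie_variance(mu, phi, p) / mu

if __name__ == "__main__":
    mu = 2.0
    print(dispersion_index(mu, 0.0, 2.0))   # phi = 0: Poisson-like, ratio 1
    print(dispersion_index(mu, 0.5, 2.0))   # p = 2: negative binomial form mu + phi*mu**2
    print(dispersion_index(mu, -0.1, 1.0))  # negative phi: underdispersion
```

With φ = 0 the variance equals the mean, p = 2 gives the familiar negative binomial variance, and a negative φ pushes the ratio below one, which is how the abstract describes handling underdispersed counts.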
British Journal of Mathematical and Statistical Psychology, 2012
Infrequent count data in psychological research are commonly modelled using zero-inflated Poisson regression. This model can be viewed as a latent mixture of an "always-zero" component and a Poisson component. Hurdle models are an alternative class of two-component models that are seldom used in psychological research, but clearly separate the zero counts and the non-zero counts by using a left-truncated count model for the latter. In this tutorial we revisit both classes of models, and discuss model comparisons and the interpretation of their parameters. As illustrated with an example from relational psychology, both types of models can easily be fitted using the R-package pscl.
The American Journal of Drug and Alcohol Abuse, 2011
Communications in Statistics - Simulation and Computation, 2018
This study aimed to examine the performance of count data models under various outlier and zero-inflation situations with simulated data. Poisson, Negative Binomial, Zero-inflated Poisson, Zero-inflated Negative Binomial, Poisson Hurdle and Negative Binomial Hurdle models were considered to test how well each model fits the selected datasets having outliers and excess zeros. We found that the Zero-inflated Negative Binomial and Negative Binomial Hurdle models were more successful than the other count data models. The results also indicated that, in some scenarios, the Negative Binomial model outperformed the other models in the presence of outliers and/or excess zeros.
Applied Stochastic Models in Business and Industry, 2012
The Poisson distribution is a popular distribution for modeling count data, yet it is constrained by its equi-dispersion assumption, making it less than ideal for modeling real data that often exhibit over- or under-dispersion. The COM-Poisson distribution is a two-parameter generalization of the Poisson distribution that allows for a wide range of over- and under-dispersion. It not only generalizes the Poisson distribution, but also contains the Bernoulli and geometric distributions as special cases. This distribution's flexibility and special properties have prompted a fast growth of methodological and applied research in various fields. This paper surveys the different COM-Poisson models that have been published thus far, and their applications in areas including marketing, transportation, and biology, among others.
Journal of Statistical Theory and Applications
In count data sets, some points may occur more frequently than expected under the standard data analysis models. Indeed, in many situations, the frequencies of zero and of some other points tend to be higher than those of the Poisson. Adapting existing models for analyzing inflated observations has been studied in the literature. One method for modeling inflated data is the inflated distribution. In this paper, we extend this inflated distribution. Indeed, if inflations occur in three or more of the support points, then the previous models are not suitable. We propose a model based on zero, one, $$\ldots,$$ and k inflated points with probabilities $$w_{0}, w_{1}, \ldots,$$ and $$w_{k},$$ respectively. By choosing the appropriate values for the weights $$w_{0}, \ldots, w_{k},$$ various inflated distributions, such as the zero-inflated, zero–one-inflated, and zero–k-inflated distributions, are derived as special cases of the proposed mo...
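A minimal Python sketch of a distribution inflated at the points 0, 1, ..., k is given below. The abstract is truncated, so the exact form used in the paper is not shown here; the sketch assumes the common mixture form in which point j receives extra mass w_j and the remaining mass 1 - (w_0 + ... + w_k) is spread according to a baseline Poisson. The function names and the Poisson baseline are assumptions.

```python
import math

def poisson_pmf(y, lam):
    """Baseline Poisson probability P(Y = y)."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def inflated_pmf(y, lam, weights):
    """Probability of y when points 0..k carry extra masses weights[0..k].

    Assumed form: w_y * 1[y <= k] + (1 - sum(weights)) * Poisson(y; lam).
    Setting a weight to 0.0 leaves that point uninflated.
    """
    total = sum(weights)
    if total >= 1.0 or any(w < 0.0 for w in weights):
        raise ValueError("weights must be non-negative and sum to less than 1")
    extra = weights[y] if y < len(weights) else 0.0
    return extra + (1.0 - total) * poisson_pmf(y, lam)

if __name__ == "__main__":
    # Zero-inflated special case: only the point 0 is inflated.
    print(inflated_pmf(0, 2.0, [0.3]))
    # Zero-one-inflated special case.
    print(inflated_pmf(1, 2.0, [0.2, 0.1]))
```

Under this assumed form, the zero-inflated and zero-one-inflated distributions correspond to weight lists of length one and two, and inflation at 0 and k only corresponds to a list whose intermediate entries are zero.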
Journal of Economic Surveys, 1995
This paper deals with statistical methods for modelling individual behavior when the endogenous variable is a nonnegative integer. Examples are the number of children, the number of job changes or the number of shopping trips in a given period. Several approaches are discussed with a focus on specification, estimation, and testing: Poisson, robust Poisson, negative binomial (NEGBIN), NEGBIN,, hurdle Poisson, and truncated-at-zero Poisson. An application to labor mobility data illustrates the gain obtained by carefully taking into account the specific structure of the data.
Entropy, 2021
For count data, though a zero-inflated model can work perfectly well with an excess of zeroes and the generalized Poisson model can tackle over- or under-dispersion, most models cannot simultaneously deal with both zero-inflated or zero-deflated data and over- or under-dispersion. Ear disease data are important in healthcare and fall into this kind of count data. This paper introduces a generalized Poisson Hurdle model that works with count data having both too many or too few zeroes and a sample variance not equal to the mean. To estimate parameters, we use the generalized method of moments. In addition, the asymptotic normality and efficiency of these estimators are established. Moreover, this model is applied to ear disease using data obtained from the New South Wales Health Research Council in 1990. This model performs better than both the generalized Poisson model and the Hurdle model.