Journal of Statistical Theory and Applications

Volume 18, Issue 4, December 2019, Pages 343 - 350

A Modified Negative Binomial Distribution: Properties, Overdispersion and Underdispersion

Authors
Ghobad Barmalzan*, Hadi Saboori, Sajad Kosari
Department of Statistics, University of Zabol, Sistan and Baluchestan, Iran
*Corresponding author. Email: ghobad.barmalzan@gmail.com
Corresponding Author
Ghobad Barmalzan
Received 6 January 2018, Accepted 27 August 2018, Available Online 28 November 2019.
DOI
10.2991/jsta.d.191105.001
Keywords
Modified negative binomial distribution; Overdispersion; Underdispersion; Maximum likelihood estimator; Sufficient statistics; Exponential family; Weighted geometric distribution
Abstract

In this paper, we introduce a new discrete distribution, the modified negative binomial (Mod-NB) distribution, and discuss its statistical and probabilistic properties. This distribution is a three-parameter extension of the negative binomial distribution that includes the well-known negative binomial and geometric distributions as special cases. We derive various statistical and probabilistic properties, such as the moments, the probability and moment generating functions, and the maximum likelihood estimators of the parameters. The Mod-NB distribution is appealing from a theoretical point of view since it belongs to the exponential family as well as to the family of weighted negative binomial distributions. It is a flexible distribution that can account for the overdispersion or underdispersion commonly encountered in count data. Finally, a real data example is presented for illustration.

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

The binomial, Poisson and negative binomial distributions are the usual choices for analyzing discrete data. However, it is wise to consider more flexible alternative models that can account for overdispersion or underdispersion. The negative binomial distribution has become increasingly popular as a more flexible alternative to the Poisson distribution, especially when it is doubtful whether the strict requirements of a Poisson distribution, particularly independence, are satisfied. For various applications of the negative binomial distribution, see Johnson et al. [1].

Greenwood and Yule [2] presented the negative binomial distribution as a mixture of Poisson distributions in which the Poisson mean λ follows a gamma distribution. Various extensions and modifications of the negative binomial distribution exist in the literature, including Engen's extended negative binomial distribution [3,4], the generalized negative binomial distribution of Jain and Consul [5], the generalization of the negative binomial distribution proposed by Gupta and Ong [6], and the weighted negative binomial distribution; see Johnson et al. [1] for more details.

For many sets of observed data, the sample variance is greater or smaller than the sample mean; these situations are referred to as overdispersion and underdispersion, respectively. Several authors have worked on the case of overdispersion; see, for example, Gelfand and Dalal [7], Hougaard et al. [8] and Kokonendji et al. [9].

In this paper, we show that, under some conditions, the weighted geometric distributions are modified negative binomial (Mod-NB) distributions, which provide a unified approach to handling both overdispersion and underdispersion. Interested readers may refer to Gupta and Kirmani [10] and Pakes et al. [11] for comprehensive discussions of weighted distributions. Weighted distributions were originally introduced by Fisher [12] through the method of ascertainment, which is a method of adjustment applicable to many situations; see also Rao [13] and Patil [14].

The rest of this paper is organized as follows: In Section 2, we introduce the Mod-NB distribution along with its mathematical properties. Section 3 describes the maximum likelihood method for estimating the parameters. In Section 4, we investigate connections of the geometric weight function to overdispersion and underdispersion. A real numerical example is also considered for illustrative purposes in Section 5. Finally, some concluding remarks are made in Section 6.

2. Mod-NB AND ITS PROPERTIES

Adding parameters to a known distribution is a useful way of constructing flexible families of distributions. In this paper, based on the negative binomial distribution, we introduce a new model, referred to as Mod-NB, which includes the well-known negative binomial and geometric distributions as special cases. We say that the random variable X follows the Mod-NB distribution if its probability mass function is

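$$P(X=x\mid p,\nu)=\frac{1}{\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}p^{x}(1-p)^{r}}\binom{x+r-1}{r-1}^{\nu}p^{x}(1-p)^{r},\qquad x=0,1,\ldots, \tag{1}$$
for $r\in\mathbb{R}^{+}$ (a known positive constant), $p\in(0,1)$ and $\nu\in\mathbb{R}$. Throughout, r is treated as a constant. For $\nu=1$ we obtain the usual negative binomial distribution, and for $\nu=0$ and $r=1$ we obtain the geometric distribution (see Figures 1 and 2).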
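As a minimal numerical sketch of Eq. (1), the following Python code evaluates the Mod-NB probabilities by truncating the normalizing sum at a large cut-off. The function names and the cut-off `x_max` are illustrative assumptions and do not come from the paper; the binomial coefficient is computed through log-gamma functions so that r need not be an integer. The last lines compare the output with the two special cases mentioned above.

```python
import math


def log_binom(x, r):
    """log of C(x + r - 1, r - 1), written with log-gamma so r may be non-integer."""
    return math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)


def modnb_pmf(p, nu, r, x_max=2000):
    """Approximate [P(X=0), ..., P(X=x_max)] for the Mod-NB distribution.

    The factor (1 - p)**r cancels between the numerator and the normalizing
    sum, so only C(x + r - 1, r - 1)**nu * p**x is needed. x_max is an
    assumed truncation point for the infinite sum.
    """
    logs = [nu * log_binom(x, r) + x * math.log(p) for x in range(x_max + 1)]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def nb_pmf(x, p, r):
    """Negative binomial pmf C(x + r - 1, r - 1) * p**x * (1 - p)**r."""
    return math.exp(log_binom(x, r) + x * math.log(p) + r * math.log(1 - p))


nb_case = modnb_pmf(p=0.4, nu=1.0, r=10)   # nu = 1: negative binomial
geo_case = modnb_pmf(p=0.4, nu=0.0, r=1)   # nu = 0, r = 1: geometric
print(nb_case[3], nb_pmf(3, 0.4, 10))
print(geo_case[2], (1 - 0.4) * 0.4 ** 2)
```

Working on the log scale keeps the evaluation stable when ν or r is large; the truncation point only needs to be large enough that the omitted tail is negligible for the chosen p.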

Figure 1

Probability mass function of the Mod-NB distribution. Left panel: r = 10, p = 0.4. Right panel: r = 10, ν = 1.5. Bottom panel: p = 0.8, ν = 2.

Figure 2

Surfaces of the expectation, variance, skewness and kurtosis of the modified negative binomial (Mod-NB) distribution as functions of the parameters ν and p, with r = 2.

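The Mod-NB distribution can be interpreted as the distribution of a sum of non-independent geometric variables $Z_i$ ($i=1,\ldots,r$) with joint distribution

$$P(Z_1=z_1,\ldots,Z_r=z_r)=\frac{1}{\sum_{z_1=0}^{\infty}\cdots\sum_{z_r=0}^{\infty}\binom{x+r-1}{r-1}^{\nu-1}p^{x}(1-p)^{r}}\binom{x+r-1}{r-1}^{\nu-1}p^{x}(1-p)^{r},\qquad x=\sum_{i=1}^{r}z_i.$$

If we set $Z(p,\nu)=\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}p^{x}(1-p)^{r}$, then Eq. (1) can be rewritten as follows:

$$P(X=x)=\frac{1}{Z(p,\nu)}\binom{x+r-1}{r-1}^{\nu}p^{x}(1-p)^{r}=\frac{1}{Z(p,\nu)}\exp\left\{x\log p+r\log(1-p)+\nu\log\frac{(x+r-1)!}{x!}-\nu\log (r-1)!\right\}. \tag{2}$$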

It follows from (2) that the Mod-NB distribution belongs to the exponential family on $\mathbb{N}$, where $T(X)=\left(X,\log\frac{(X+r-1)!}{X!}\right)$ is the sufficient statistic and $(\log p,\nu)\in\mathbb{R}^{2}$ is the corresponding natural parameter (see Barndorff-Nielsen [15]). For a set of n independent and identically distributed variables $\mathbf{X}=(X_1,\ldots,X_n)$, the sufficient statistic based on $\mathbf{X}$ is $T(\mathbf{X})=\left(\sum_{i=1}^{n}X_i,\sum_{i=1}^{n}\log\frac{(X_i+r-1)!}{X_i!}\right)$. We now establish that the Mod-NB distribution also belongs to the family of weighted geometric distributions, defined as follows:

$$P(X=x\mid p,\nu)=\frac{w(x;\nu)\,P(X=x\mid 1-p)}{E_{1-p}(w(X;\nu))}, \tag{3}$$
where $w(\cdot;\nu)$ is a non-negative weight function with parameter ν, $P(\cdot\mid 1-p)$ is the probability mass function of a geometric distribution with parameter $1-p$, and $E_{1-p}(\cdot)$ indicates that the expectation is taken with respect to the geometric distribution with parameter $1-p$. Therefore, if we take the weight function as $w(x;\nu)=\left(\frac{(x+r-1)!}{(r-1)!\,x!}\right)^{\nu}$, then the Mod-NB distribution in (1) can be expressed as the weighted geometric distribution in (3).
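The weighted geometric representation can be checked numerically. The sketch below builds the probabilities from a weight function and the geometric baseline, and compares them with a direct evaluation of the Mod-NB probabilities for an integer r. The helper names, the parameter values and the truncation point `x_max` are assumptions made for illustration only.

```python
import math


def weighted_geometric_pmf(weight, p, x_max=400):
    """w(x) * P_geo(x) / E[w(X)], with the expectation truncated at x_max."""
    geo = [(1 - p) * p ** x for x in range(x_max + 1)]
    num = [weight(x) * g for x, g in enumerate(geo)]
    expected_weight = sum(num)  # truncated value of E_{1-p}[w(X)]
    return [v / expected_weight for v in num]


def modnb_weight(r, nu):
    """Weight w(x; nu) = C(x + r - 1, r - 1)**nu for a positive integer r."""
    return lambda x: float(math.comb(x + r - 1, r - 1)) ** nu


p, nu, r = 0.4, 1.5, 10
via_weights = weighted_geometric_pmf(modnb_weight(r, nu), p)

# Direct evaluation of the Mod-NB probabilities with the same truncation.
terms = [float(math.comb(x + r - 1, r - 1)) ** nu * p ** x * (1 - p) ** r
         for x in range(401)]
total = sum(terms)
direct = [t / total for t in terms]

print(max(abs(a - b) for a, b in zip(via_weights, direct)))
```

Passing the weight as a function keeps the routine usable for other weighted geometric laws, not only for the Mod-NB weight.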

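The probability generating function of this weighted geometric variable is given by

$$\psi_X(s)=E(s^{X})=\sum_{x=0}^{\infty}s^{x}\frac{w(x;\nu)P(X=x;1-p)}{E_{1-p}(w(X;\nu))}=\frac{1}{E_{1-p}(w(X;\nu))}\sum_{x=0}^{\infty}w(x;\nu)(1-p)(sp)^{x}=\frac{1-p}{1-sp}\,\frac{E_{1-sp}(w(X;\nu))}{E_{1-p}(w(X;\nu))}=\left(\frac{1-p}{1-sp}\right)^{r}\frac{Z(sp,\nu)}{Z(p,\nu)},\qquad 0\le s\le 1,$$
where the last equality uses $E_{1-p}(w(X;\nu))=(1-p)^{-(r-1)}Z(p,\nu)$ for the Mod-NB weight function.

The moment generating function of X, say $M_X(s)$, is immediately derived from $\psi_X(s)$ as

$$M_X(s)=E(e^{sX})=\psi_X(e^{s})=\left(\frac{1-p}{1-pe^{s}}\right)^{r}\frac{Z(pe^{s},\nu)}{Z(p,\nu)},\qquad pe^{s}<1.$$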

3. MAXIMUM LIKELIHOOD ESTIMATION OF THE PARAMETERS

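Let $\mathbf{X}=(X_1,\ldots,X_n)$ be a random sample of size n, with observed values $\mathbf{x}=(x_1,\ldots,x_n)$, from a Mod-NB distribution with parameters p and ν. Let us define

$$t_1(\mathbf{x})=\frac{1}{n}\sum_{i=1}^{n}x_i\qquad\text{and}\qquad t_2(\mathbf{x})=\frac{1}{n}\sum_{i=1}^{n}\log\frac{(x_i+r-1)!}{x_i!}.$$

Note that $t_1(\mathbf{x})$ is the sample mean and $t_2(\mathbf{x})$ is the logarithm of the geometric mean of the quantities $(x_i+r-1)!/x_i!$.

The log-likelihood function for the Mod-NB distribution model based on the observed sample $\mathbf{x}$ is

$$l(p,\nu)=n\left(-\log Z(p,\nu)+t_1\log p+r\log(1-p)+\nu t_2-\nu\log (r-1)!\right). \tag{4}$$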

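We can show that

$$\begin{aligned}
E(X)&=\frac{1}{Z(p,\nu)}\sum_{x=0}^{\infty}x\binom{x+r-1}{r-1}^{\nu}p^{x}(1-p)^{r}\\
&=p\,\frac{1}{\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}p^{x}}\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}\frac{d}{dp}p^{x}\\
&=p\,\frac{1}{\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}p^{x}}\,\frac{d}{dp}\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}p^{x}\\
&=p\,\frac{1}{(1-p)^{-r}Z(p,\nu)}\,\frac{d}{dp}\left((1-p)^{-r}Z(p,\nu)\right)\\
&=p\,\frac{d\log\left((1-p)^{-r}Z(p,\nu)\right)}{dp},
\end{aligned}$$
and
$$\begin{aligned}
E\left(\log\frac{(X+r-1)!}{X!\,(r-1)!}\right)&=\frac{1}{Z(p,\nu)}\sum_{x=0}^{\infty}\log\frac{(x+r-1)!}{x!\,(r-1)!}\binom{x+r-1}{r-1}^{\nu}p^{x}(1-p)^{r}\\
&=\frac{1}{\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}p^{x}}\sum_{x=0}^{\infty}\log\frac{(x+r-1)!}{x!\,(r-1)!}\left(\frac{(x+r-1)!}{x!\,(r-1)!}\right)^{\nu}p^{x}\\
&=\frac{1}{\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}p^{x}}\sum_{x=0}^{\infty}\frac{d}{d\nu}\left(\frac{(x+r-1)!}{x!\,(r-1)!}\right)^{\nu}p^{x}\\
&=\frac{1}{\sum_{x=0}^{\infty}\binom{x+r-1}{r-1}^{\nu}p^{x}}\,\frac{d}{d\nu}\sum_{x=0}^{\infty}\left(\frac{(x+r-1)!}{x!\,(r-1)!}\right)^{\nu}p^{x}\\
&=\frac{1}{(1-p)^{-r}Z(p,\nu)}\,\frac{d}{d\nu}\left((1-p)^{-r}Z(p,\nu)\right)\\
&=\frac{1}{Z(p,\nu)}\,\frac{d}{d\nu}Z(p,\nu)=\frac{d\log Z(p,\nu)}{d\nu},
\end{aligned}$$
and then, we have
$$E\left(\log\frac{(X+r-1)!}{X!}\right)=\frac{d\log Z(p,\nu)}{d\nu}+\log((r-1)!).$$

As mentioned earlier, the Mod-NB distribution belongs to the exponential family; therefore, the likelihood equations based on the observed sample $\mathbf{x}$ may be written as

$$\begin{cases}
p\,\dfrac{d\log\left((1-p)^{-r}Z(p,\nu)\right)}{dp}=E(X)=t_1(\mathbf{x}),\\[2mm]
\dfrac{d\log Z(p,\nu)}{d\nu}+\log((r-1)!)=E\left(\log\dfrac{(X+r-1)!}{X!}\right)=t_2(\mathbf{x}).
\end{cases}$$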

Since these equations cannot be solved analytically, an iterative method such as the Newton-Raphson method can be used (see Gelman et al. [16], pages 272–273). In each iteration, the expectations, variances and covariance of X and $\log\frac{(X+r-1)!}{X!}$ are computed by plugging the estimates of p and ν obtained from the previous iteration into the expression

$$E(g(X))=\frac{1}{Z(p,\nu)}\sum_{x=0}^{\infty}g(x)\binom{x+r-1}{r-1}^{\nu}p^{x}(1-p)^{r}.$$

The maximum likelihood estimators of the parameters can also be obtained by direct maximization of the log-likelihood function in (4), for example with PROC NLMIXED in SAS, the MaxBFGS routine of the Ox program (Doornik [17]) or the optim routine in R (R Development Core Team [18]).
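The Newton-Raphson iteration described above can be sketched in pure Python as follows. This is an illustrative sketch and not the authors' implementation. It assumes r is known, works with the natural parameters (log p, ν), truncates the infinite sums at an assumed cut-off `x_max`, and adds step halving as a safeguard. For convenience the second statistic is centred as the sample mean of log C(x_i + r − 1, r − 1), which equals t2 minus log Γ(r). All function names are hypothetical.

```python
import math


def log_binom(x, r):
    """log C(x + r - 1, r - 1) via log-gamma (r may be non-integer)."""
    return math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)


def moments(theta, nu, r, x_max=1000):
    """Log-normalizer and mean/covariance of T = (X, log C(X + r - 1, r - 1))."""
    xs = list(range(x_max + 1))
    bs = [log_binom(x, r) for x in xs]
    logs = [theta * x + nu * b for x, b in zip(xs, bs)]
    top = max(logs)
    ws = [math.exp(v - top) for v in logs]
    total = sum(ws)
    m1 = sum(x * w for x, w in zip(xs, ws)) / total
    m2 = sum(b * w for b, w in zip(bs, ws)) / total
    v11 = sum((x - m1) ** 2 * w for x, w in zip(xs, ws)) / total
    v22 = sum((b - m2) ** 2 * w for b, w in zip(bs, ws)) / total
    v12 = sum((x - m1) * (b - m2) * w for x, b, w in zip(xs, bs, ws)) / total
    return top + math.log(total), (m1, m2), (v11, v12, v22)


def fit_modnb(t1, b_bar, r, p0=0.5, nu0=1.0, tol=1e-10, max_iter=200):
    """Newton-Raphson with step halving for (p, nu); r is treated as known."""
    theta, nu = math.log(p0), nu0
    for _ in range(max_iter):
        log_norm, (m1, m2), (v11, v12, v22) = moments(theta, nu, r)
        g1, g2 = t1 - m1, b_bar - m2            # score per observation
        if max(abs(g1), abs(g2)) < tol:
            break
        det = v11 * v22 - v12 * v12
        d_theta = (v22 * g1 - v12 * g2) / det   # Newton direction Cov^{-1} g
        d_nu = (v11 * g2 - v12 * g1) / det
        objective = theta * t1 + nu * b_bar - log_norm
        step, accepted = 1.0, False
        while step > 1e-10 and not accepted:
            th_new, nu_new = theta + step * d_theta, nu + step * d_nu
            if th_new < 0.0:                    # keep p = exp(theta) below 1
                new_obj = (th_new * t1 + nu_new * b_bar
                           - moments(th_new, nu_new, r)[0])
                accepted = new_obj >= objective - 1e-12
            step *= 0.5
        if not accepted:
            break
        theta, nu = th_new, nu_new
    return math.exp(theta), nu


# Self-check: feed the exact moments of a known model (p = 0.3, nu = 2, r = 3)
# and see whether the iteration recovers the parameters.
_, (t1_true, b_true), _ = moments(math.log(0.3), 2.0, 3)
p_hat, nu_hat = fit_modnb(t1_true, b_true, r=3)
print(p_hat, nu_hat)
```

With real data, `t1` and `b_bar` would be computed from the sample. Because the log-likelihood of an exponential family is concave in the natural parameters, the Newton direction is an ascent direction and the step halving only guards against overshooting.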

4. OVERDISPERSION AND UNDERDISPERSION IN WEIGHTED GEOMETRIC DISTRIBUTION

In this section, we discuss the connection between the weighted geometric distribution and overdispersion and underdispersion. We generalize the Mod-NB distribution to a general weight function $w(x;\nu)$, which is not necessarily of the form $w(x;\nu)=\left(\frac{(x+r-1)!}{(r-1)!\,x!}\right)^{\nu}$. We say that the random variable $X^{w}$ follows a weighted geometric distribution if its probability mass function is

$$P(X^{w}=x\mid p,\nu)=\frac{w(x;\nu)}{E(w(X;\nu))}(1-p)p^{x},\qquad x=0,1,\ldots$$

Theorem 4.1.

Let X be a geometric random variable with mean $\frac{p}{1-p}>0$, and let $w(x;\nu)$, $x\in\mathbb{N}$, be a weight function not depending on the geometric distribution parameter. Then,

$$\mathrm{Var}_{1-p}(X^{w})=E_{1-p}(X^{w})+\frac{p^{2}}{(1-p)^{2}}+p^{2}\frac{d^{2}}{dp^{2}}\log E_{1-p}(w(X;\nu)).$$

Proof.

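Let $\theta=\log p$. Then, the canonical probability mass function of $X^{w}$ is given by

$$P(X^{w}=x\mid\theta,\nu)=w(x;\nu)\exp\left\{x\theta+\log(1-e^{\theta})-\log E_{\theta}(w(X;\nu))\right\},\qquad x\in\mathbb{N}.$$

For fixed ν, the probability mass function of $X^{w}$ is therefore an element of the natural exponential family on $\mathbb{N}$ with cumulant function $K(\theta,\nu)=\log E_{\theta}(w(X;\nu))-\log(1-e^{\theta})$; see Jørgensen [19]. So, the mean $E_{1-p}(X^{w})=\frac{dK(\theta,\nu)}{d\theta}$ of $X^{w}$ is

$$E_{1-p}(X^{w})=\frac{e^{\theta}}{1-e^{\theta}}+\frac{d}{d\theta}\log E_{\theta}(w(X;\nu))=\frac{p}{1-p}+p\frac{d}{dp}\log E_{1-p}(w(X;\nu)),\qquad\left(\frac{d}{d\theta}=\frac{d/dp}{d\theta/dp}=p\frac{d}{dp}\right).$$

Next, the variance $\mathrm{Var}_{1-p}(X^{w})=\frac{d^{2}K(\theta,\nu)}{d\theta^{2}}$ of $X^{w}$ is

$$\begin{aligned}
\mathrm{Var}_{1-p}(X^{w})&=\frac{d}{d\theta}\left(\frac{d}{d\theta}K(\theta,\nu)\right)=\frac{d}{d\theta}\left(\frac{e^{\theta}}{1-e^{\theta}}+\frac{d}{d\theta}\log E_{\theta}(w(X;\nu))\right)\\
&=\frac{e^{\theta}}{(1-e^{\theta})^{2}}+\frac{d^{2}}{d\theta^{2}}\log E_{\theta}(w(X;\nu))\\
&=\frac{p}{(1-p)^{2}}+p\frac{d}{dp}\left(p\frac{d}{dp}\log E_{1-p}(w(X;\nu))\right)\\
&=\frac{p}{(1-p)^{2}}+p\frac{d}{dp}\log E_{1-p}(w(X;\nu))+p^{2}\frac{d^{2}}{dp^{2}}\log E_{1-p}(w(X;\nu))\\
&=E_{1-p}(X^{w})+\frac{p^{2}}{(1-p)^{2}}+p^{2}\frac{d^{2}}{dp^{2}}\log E_{1-p}(w(X;\nu)).
\end{aligned}$$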

Hence, the theorem is proved.

The following corollary is a direct consequence of Theorem 4.1 and enables us to compare weighted geometric distributions in terms of overdispersion and underdispersion.

Corollary 4.1.

Let X be a geometric random variable with mean $\frac{p}{1-p}>0$, and let $w(x;\nu)$, $x\in\mathbb{N}$, be a weight function not depending on the geometric mean parameter. Then,

  • If $\frac{d^{2}}{dp^{2}}\log E_{1-p}(w(X;\nu))>-\frac{1}{(1-p)^{2}}$, then the weighted version $X^{w}$ of X is overdispersed.

  • If $\frac{d^{2}}{dp^{2}}\log E_{1-p}(w(X;\nu))<-\frac{1}{(1-p)^{2}}$, then the weighted version $X^{w}$ of X is underdispersed.

Corollary 4.1 is clearly useful when $E_{1-p}(w(X;\nu))$ is available in an explicit form, as in the case of the size-biased geometric distribution with $w(x;\nu)=x$, $x\in\mathbb{N}$.
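To make the size-biased case concrete: for $w(x)=x$ the geometric mean gives $E_{1-p}(w(X))=p/(1-p)$, so the second derivative of its logarithm is $-1/p^{2}+1/(1-p)^{2}$, and the condition of Corollary 4.1 for overdispersion reduces to $p>\sqrt{2}-1\approx 0.414$. The Python sketch below compares this criterion with the mean and variance obtained by direct summation. The function names, the two values of p and the truncation point `x_max` are assumptions chosen for illustration.

```python
def weighted_geometric_moments(weight, p, x_max=2000):
    """Mean and variance of the weighted geometric law with pmf proportional to w(x) * p**x."""
    terms = [weight(x) * p ** x for x in range(x_max + 1)]
    total = sum(terms)
    mean = sum(x * t for x, t in enumerate(terms)) / total
    second = sum(x * x * t for x, t in enumerate(terms)) / total
    return mean, second - mean ** 2


def criterion(p):
    """d^2/dp^2 log E[w(X)] for w(x) = x, where E[w(X)] = p / (1 - p)."""
    return -1.0 / p ** 2 + 1.0 / (1.0 - p) ** 2


for p in (0.3, 0.6):
    mean, var = weighted_geometric_moments(lambda x: float(x), p)
    threshold = -1.0 / (1.0 - p) ** 2
    label = "overdispersed" if criterion(p) > threshold else "underdispersed"
    print(p, round(mean, 4), round(var, 4), label)
```

The label printed from the criterion can be compared with the sign of the difference between the variance and the mean on the same line.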

It is of interest to use a statistical test for detecting overdispersion or underdispersion in observed count data (Mizère et al. [20]), and it is therefore useful to have a family of count distributions that exhibits both overdispersion and underdispersion, depending on the parameter values. In this case, parameter estimation leads to an appropriate model within the family for overdispersed or underdispersed count data.
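As an illustration of a family with both properties, the following Python sketch computes the variance-to-mean ratio of the Mod-NB distribution by truncated summation. The function name, the truncation point `x_max` and the two parameter settings are assumptions chosen for this example and do not come from the paper. The first setting is the negative binomial case, whose ratio is 1/(1 − p); the second is one choice for which the ratio falls below 1.

```python
import math


def modnb_dispersion_index(p, nu, r, x_max=2000):
    """Variance-to-mean ratio of the Mod-NB distribution (truncated sums)."""
    logs = [nu * (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1))
            + x * math.log(p) for x in range(x_max + 1)]
    top = max(logs)  # stabilise the exponentials
    ws = [math.exp(v - top) for v in logs]
    total = sum(ws)
    mean = sum(x * w for x, w in enumerate(ws)) / total
    var = sum((x - mean) ** 2 * w for x, w in enumerate(ws)) / total
    return var / mean


print(modnb_dispersion_index(p=0.4, nu=1.0, r=3))    # negative binomial case
print(modnb_dispersion_index(p=0.05, nu=6.0, r=2))   # a setting with ratio below 1
```

A ratio above 1 indicates overdispersion and a ratio below 1 indicates underdispersion, so scanning this index over a grid of (p, ν) values is a simple way to explore where each regime occurs.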

5. REAL DATA EXAMPLE

To illustrate the usefulness and flexibility of the Mod-NB distribution, we consider a real data set. Jaggia and Thosar (1993) model the number of bids received by 126 U.S. firms that were targets of tender offers during the period from 1978 through 1985 and were actually taken over within 52 weeks of the initial offer. The count variable is the number of bids after the initial bid (NUMBIDS) received by the target firm (see Table 1). Here, we have k=126. We fit the Mod-NB, negative binomial (NB), binomial (B), beta negative binomial (BNB; Wang [21]) and COM-Poisson binomial (COMP; Borges et al. [22]) distributions, estimating the parameters of each model by the maximum likelihood method.

Count 0 1 2 3 4 5 6 7 8 9 10
Frequency 9 63 31 12 6 1 2 1 0 0 1
Relative frequency .071 .500 .246 .095 .048 .008 .016 .008 .000 .000 .008
Table 1

Takeover bids: Actual frequency distribution.

The probability mass functions of the BNB and COMP distributions used in Table 2 are, respectively, as follows:

P(X=x)=kxB(a+k,nxb)B(a,b),x=0,1,...,k,
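$$P(X=x)=\frac{1}{Z(\theta,\nu)}\,\frac{\theta^{x}}{[x!\,(k-x)!]^{\nu}},\qquad x=0,1,\ldots,k,$$
where $B(\cdot,\cdot)$ is the beta function and $Z(\theta,\nu)=\sum_{j=0}^{k}\frac{\theta^{j}}{[j!\,(k-j)!]^{\nu}}$.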

Number of Takeover Bids        Observed Frequency   Expected Frequency
                                                    Mod-NB         NB             B              BNB            COMP
0                              9                    9.91           23.92          18.67          23.30          23.82
1                              63                   61.55          38.02          39.28          37.48          36.82
2                              31                   30.78          31.84          37.19          32.45          32.45
3                              12                   13.76          18.69          20.86          19.53          19.78
4                              6                    5.88           8.62           7.68           8.92           8.97
5                              1                    2.45           3.33           1.94           3.19           3.12
6                              2                    1.00           1.12           0.34           0.90           0.84
7                              1                    0.41           0.34           0.04           0.19           0.17
8                              0                    0.16           0.09           0.00           0.03           0.03
9                              0                    0.07           0.02           0.00           0.00           0.00
10                             1                    0.03           0.01           0.00           0.00           0.00

Parameter estimates                                 p̂ = 0.3818     p̂ = 0.0855     p̂ = 0.1738     â = 6.0821     ν̂ = 0.7037
                                                    ν̂ = −0.3898    r̂ = 18.6020                   b̂ = 28.8135    θ̂ = 0.3058
                                                    r̂ = 0.0008

Kernel of the log-likelihood   −180.1697            −184.5131      −201.1203      −208.3501      −205.4349      −206.4554

Chi-square                                          3.0104         29.7659        29.3274        31.0962        33.0807
P-value                                             0.9812         0.0009         0.0011         0.0005         0.0002

Mod-NB, modified negative binomial; NB, negative binomial; B, binomial; BNB, beta negative binomial; COMP, COM Poisson binomial.

Table 2

Goodness of fit of the Mod-NB, NB, B, BNB and COMP distributions for the real data.

Suppose $p_i=P(X=i)$ and $O_i$ is the observed frequency of $(X=i)$. Then, for the data in Table 1, the kernel of the log-likelihood of any model is given by

$$\sum_{i=0}^{k}O_i\log(p_i)=\sum_{i=0}^{k}O_i\log\left(\frac{n\hat{p}_i}{n}\right)=\sum_{i=0}^{k}O_i\log(E_i)-n\log(n),$$
where $n=\sum_{i=0}^{k}O_i$ is the total frequency, $E_i=n\hat{p}_i$ is the expected frequency of $X=i$ and k is the number of levels of the data. It also follows that the kernel of the log-likelihood for the data is $\sum_{i=0}^{k}O_i\log(O_i)-n\log(n)$.
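The kernel is straightforward to evaluate from the tabulated frequencies. The Python sketch below uses the observed frequencies of Table 1 and the Mod-NB expected frequencies as printed in Table 2; the function and variable names are illustrative assumptions. Because the tabulated expected frequencies are rounded to two decimals, the value obtained for a fitted model only approximates the kernel reported in Table 2, whereas the kernel for the data depends on the observed frequencies alone and should come out close to the first entry of that row.

```python
import math

# Observed frequencies from Table 1 (counts 0..10) and the Mod-NB expected
# frequencies as printed (rounded to two decimals) in Table 2.
observed = [9, 63, 31, 12, 6, 1, 2, 1, 0, 0, 1]
expected_modnb = [9.91, 61.55, 30.78, 13.76, 5.88, 2.45, 1.00, 0.41, 0.16, 0.07, 0.03]


def loglik_kernel(obs, exp):
    """sum_i O_i log(E_i) - n log(n); cells with O_i = 0 contribute nothing."""
    n = sum(obs)
    return sum(o * math.log(e) for o, e in zip(obs, exp) if o > 0) - n * math.log(n)


n = sum(observed)
kernel_data = loglik_kernel(observed, observed)        # kernel for the data
kernel_modnb = loglik_kernel(observed, expected_modnb)  # kernel for a fitted model
print(n, kernel_data, kernel_modnb)
```

Skipping cells with zero observed frequency avoids evaluating log(0) for models whose rounded expected frequency is 0.00 in the upper tail.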

Table 2 also reports the chi-square statistic for testing the goodness of fit of each model to the real data. According to the p-values of this test, the Mod-NB distribution is the only fitted model that is not rejected at the 1% significance level, and it is clearly the best choice among the candidate distributions for these data.

6. CONCLUDING REMARKS

We have studied the mathematical properties of the Mod-NB distribution as an extension of the negative binomial distribution. The main advantage of this model is its flexibility in handling the overdispersion or underdispersion commonly encountered in count data sets. The Mod-NB distribution is appealing from a theoretical point of view since it belongs to the exponential family as well as to the family of weighted negative binomial distributions. Various statistical and probabilistic properties were derived, such as the moments, the probability and moment generating functions, and the maximum likelihood estimators of the parameters. Since the Mod-NB distribution belongs to the exponential family, it will also be possible to develop a subjective or objective Bayesian analysis for this model. Work in this direction is currently in progress, and we hope to report the findings in a future paper.

AUTHORS' CONTRIBUTIONS

The authors thank the Associate Editor and anonymous reviewers for their useful comments and suggestions on an earlier version of this manuscript which led to this improved one.

Funding Statement

This work has been conducted by University of Zabol, Grant Number: UOZ-GR-9618-14.

REFERENCES

1. N.L. Johnson, S. Kotz, and A.W. Kemp, Univariate Discrete Distributions, second ed., Wiley, New York, 1992.
9. C.C. Kokonendji, C.G.B. Demétrio, and S. Dossou-Gbété, J. Stat. Oper. Res. Trans., Vol. 28, 2004, pp. 201-214.
13. C.R. Rao, G.P. Patil (editor), Classical and Contagious Discrete Distributions, Pergamon Press and Statistical Publishing Society, Calcutta, 1965, pp. 320-332.
14. G.P. Patil, A.H. El-Shaarawi and W.W. Piegorsch (editors), Encyclopedia of Environmetrics, Wiley, Chichester, Vol. 4, 2002, pp. 2369-2377.
15. O. Barndorff-Nielsen, Information and Exponential Families in Statistical Theory, Wiley, Chichester, 1978.
17. J. Doornik, Ox 5: Object-Oriented Matrix Programming Language, fifth ed., Timberlake Consultants, London, 2013.
18. R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2013.
19. B. Jørgensen, The Theory of Dispersion Models, Chapman & Hall, London, 1997.
20. D. Mizère, C.C. Kokonendji, and S. Dossou-Gbété, Revue de statistique appliquée, Vol. 54, 2006, pp. 61-84.
Publication Date
2019/11/28
ISSN (Online)
2214-1766
ISSN (Print)
1538-7887
