Negative binomial distribution

This is a concise discussion of the negative binomial distribution. Links to detailed discussion are given below.

A counting distribution is the probability distribution of a random variable that takes on only the non-negative integers 0, 1, 2, … The negative binomial distribution is a counting distribution. In the present discussion, N is a random variable that follows a negative binomial distribution. This means that the probability that N takes on the value k is given by one of the following probability functions.

    (1)…….\displaystyle P(N=k)=\binom{r+k-1}{k} \ p^r (1-p)^k \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k=0,1,2,\cdots

    (2)…….\displaystyle P(N=k)=\binom{r+k-1}{k} \ \biggl(\frac{1}{1+\theta} \biggr)^r \biggl(\frac{\theta}{1+\theta} \biggr)^k \ \ \ \ \ \ k=0,1,2,\cdots

    (3)…….\displaystyle P(N=k)=\binom{r+k-1}{k} \ \biggl(\frac{\beta}{1+\beta} \biggr)^r \biggl(\frac{1}{1+\beta} \biggr)^k \ \ \ \ \ \ k=0,1,2,\cdots

These three functions are called probability functions. We discuss (1) first. In (1), the numbers r and p are fixed constants. They are called the parameters of the negative binomial distribution. The parameter r can be any positive number r>0. The parameter p can be any number 0<p<1.

A Natural Interpretation

When the parameter r>0 is an integer, (1) has a natural interpretation. Suppose that a coin, when flipped, shows a head with probability p. Let’s say we keep tossing this coin until we get r heads. Then the probability function (1) describes this random phenomenon. The probability P(N=k) in (1) is the probability that the rth head occurs on the (r+k)th toss. In other words, P(N=k) is the probability that it takes r+k tosses to get r heads.

As an illustration, let p=0.5 (a fair coin). Let’s say we flip the coin until the third head. Here are several probabilities.

    …….\displaystyle P(N=0)=\binom{2}{0} \ 0.5^3 \cdot 0.5^0=0.5^3=0.125

    …….\displaystyle P(N=1)=\binom{3}{1} \ 0.5^3 \cdot 0.5^1=3 \cdot 0.5^4=0.1875

    …….\displaystyle P(N=2)=\binom{4}{2} \ 0.5^3 \cdot 0.5^2= 6 \cdot 0.5^5=0.1875

    …….\displaystyle P(N=3)=\binom{5}{3} \ 0.5^3 \cdot 0.5^3=10 \cdot 0.5^6=0.15625
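The four values above can be reproduced with a short Python sketch of probability function (1) for integer r. The function name neg_binomial_pmf is only an illustrative choice, and math.comb is used here because r is a positive integer in this example.

```python
from math import comb


def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(N = k) from (1) when r is a positive integer.

    k counts the tails seen before the r-th head, and p is the
    probability of a head on each toss.
    """
    return comb(r + k - 1, k) * p**r * (1 - p) ** k


# Fair coin, tossing until the third head: P(N=0), ..., P(N=3).
probs = [neg_binomial_pmf(k, r=3, p=0.5) for k in range(4)]
print(probs)  # [0.125, 0.1875, 0.1875, 0.15625]
```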

A quick note about binomial coefficients. Numbers such as \binom{2}{0} and \binom{3}{1} are called binomial coefficients. In general \binom{m}{n} is defined by the ratio \frac{m!}{n! (m-n)!}. The number m! is called m factorial, which is the product of m and all the positive integers below m. So m! is the number m!=m \cdot (m-1) \cdots 3 \cdot 2 \cdot 1, with 0!=1 by convention.

The above 4 probabilities tell us that in flipping a fair coin, there is a 12.5% chance that it takes 3 tosses (0 + 3) to get three heads, an 18.75% chance that it takes 4 tosses (1 + 3) to get three heads, and so on. The sum of these 4 probabilities is 0.65625. So there is a 65.625% chance that it takes at most 6 tosses to get three heads.

Note that in the coin tossing example, the random variable N counts the number of tails. Since the goal is to get 3 heads, the number of tosses to achieve the goal would be N+3. Thus the probability of flipping the coin 7 times to get r=3 heads would be P(N=4).
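The shift between the count of tails N and the count of tosses N+r can be sketched in code as follows. The helper names prob_exactly_n_tosses and prob_at_most_n_tosses are hypothetical, introduced only for this illustration, and r is assumed to be a positive integer.

```python
from math import comb


def pmf(k: int, r: int, p: float) -> float:
    """P(N = k) from (1): k tails before the r-th head (integer r)."""
    return comb(r + k - 1, k) * p**r * (1 - p) ** k


def prob_exactly_n_tosses(n: int, r: int, p: float) -> float:
    """Probability that the r-th head arrives on toss n, i.e. N = n - r."""
    if n < r:
        return 0.0  # r heads cannot appear in fewer than r tosses
    return pmf(n - r, r, p)


def prob_at_most_n_tosses(n: int, r: int, p: float) -> float:
    """Probability that r heads are reached within n tosses, i.e. N <= n - r."""
    return sum(pmf(k, r, p) for k in range(n - r + 1))


print(prob_at_most_n_tosses(6, r=3, p=0.5))  # 0.65625
print(prob_exactly_n_tosses(7, r=3, p=0.5))  # 0.1171875, which is P(N=4)
```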

The coin tossing example can be generalized by a random experiment such as this: perform a series of independent trials, where each trial has only two distinct outcomes (for convenience one is called success and the other is called failure). The probability of getting a success in each trial is constant across all the trials. Let p be the probability of a success in a trial. Let’s say this experiment stops when r successes are obtained. The probability P(N=k) in (1) is the probability that it will take k failures to obtain r successes. Equivalently, P(N=k) is the probability it will take r+k trials in the experiment to obtain r successes.
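One way to see that (1) describes this experiment is to simulate it. The following is a minimal sketch, not a definitive implementation: the function names, the seed, and the number of simulated experiments are all arbitrary choices made for illustration.

```python
import random
from math import comb


def simulate_failures(r: int, p: float, rng: random.Random) -> int:
    """Run independent trials until r successes; return the number of failures."""
    successes = 0
    failures = 0
    while successes < r:
        if rng.random() < p:  # success with probability p
            successes += 1
        else:
            failures += 1
    return failures


def estimate_pmf(k: int, r: int, p: float, trials: int = 20_000, seed: int = 0) -> float:
    """Estimate P(N = k) as the fraction of simulated experiments with k failures."""
    rng = random.Random(seed)
    hits = sum(simulate_failures(r, p, rng) == k for _ in range(trials))
    return hits / trials


# Compare the simulated frequency with formula (1) for r=3, p=0.5, k=1.
exact = comb(3 + 1 - 1, 1) * 0.5**3 * 0.5**1
estimate = estimate_pmf(k=1, r=3, p=0.5)
print(exact, estimate)
```

The estimate should land close to the exact value 0.1875, with the gap shrinking as the number of simulated experiments grows.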

When the Parameter r Is Not Integer

When the parameter r is a positive real number but not an integer, the natural setting of tossing a coin until the rth head is not applicable, so the distribution cannot be interpreted as the count of failures until the rth success. However, the negative binomial distribution is still a useful model. It can be used as a model for the count of some type of random occurrence, for example, the number of insurance losses from an insurance contract in a policy period.

To calculate the probability when r is not an integer, we need to relax the definition of the binomial coefficient. When r is a positive integer, the binomial coefficient \binom{r+k-1}{k} is defined as follows:

…….\displaystyle \binom{r+k-1}{k}=\frac{(r+k-1)!}{k! (r-1)!}

A further simplification of this calculation is informative.

…….\displaystyle \begin{aligned} \binom{r+k-1}{k}&=\frac{(r+k-1)!}{k! \cdot (r-1)!} \\&=\frac{(r+k-1) \cdot (r+k-2) \cdots (r+1) \cdot r \cdot (r-1)!}{k! \cdot (r-1)!} \\&=\frac{(r+k-1) \cdot (r+k-2) \cdots (r+1) \cdot r}{k!} \ \ \ \ k=1,2,\cdots \end{aligned}

We can take the last step in the above derivation as the definition of \binom{r+k-1}{k} when r is any positive number, not necessarily an integer. For example, let r=0.5 and k=3. Then \binom{0.5+3-1}{3} is \binom{2.5}{3}=(2.5 \cdot 1.5 \cdot 0.5)/3!, which is 1.875/6=0.3125.
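The relaxed binomial coefficient can be sketched as a small Python function. The name nb_coefficient is hypothetical. The loop multiplies the k factors r, r+1, ..., r+k-1 and divides by k!; for k=0 the loop is empty, so the function returns 1, which matches the usual convention for a bottom number of 0.

```python
from math import factorial


def nb_coefficient(r: float, k: int) -> float:
    """Relaxed binomial coefficient: r (r+1) ... (r+k-1) / k! for any r > 0."""
    numerator = 1.0
    for i in range(k):
        numerator *= r + i  # factors r, r+1, ..., r+k-1
    return numerator / factorial(k)


print(nb_coefficient(0.5, 3))  # 0.3125
print(nb_coefficient(3, 2))    # 6.0, the ordinary binomial coefficient 4 choose 2
```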

Note that the new definition of the binomial coefficient \binom{r+k-1}{k} requires that the bottom number k is a positive integer (1 or higher). When k=0, we define \binom{r+0-1}{0}=1. Whenever the bottom number is 0, the value of the binomial coefficient is 1. With this understanding, we calculate a few probabilities for the parameters r=0.5 and p=0.5.

…….\displaystyle P(N=0)=\binom{0.5+0-1}{0} \ 0.5^{0.5} \cdot 0.5^0=0.5^{0.5}=0.7071

…….\displaystyle \begin{aligned} P(N=1)&=\binom{0.5+1-1}{1} \ 0.5^{0.5} \cdot 0.5^1=\binom{0.5}{1} \cdot 0.5^{1.5} \\&=0.5 \cdot 0.5^{1.5}=0.1768 \end{aligned}

…….\displaystyle \begin{aligned} P(N=2)&=\binom{0.5+2-1}{2} \ 0.5^{0.5} \cdot 0.5^2=\binom{1.5}{2} \cdot 0.5^{2.5} \\&=0.375 \cdot 0.5^{2.5}=0.06629 \end{aligned}

…….\displaystyle \begin{aligned} P(N=3)&=\binom{0.5+3-1}{3} \ 0.5^{0.5} \cdot 0.5^3=\binom{2.5}{3} \cdot 0.5^{3.5} \\&=0.3125 \cdot 0.5^{3.5}=0.02762 \end{aligned}
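These probabilities can be checked numerically. The sketch below evaluates (1) with the relaxed binomial coefficient, so it accepts any positive r; the function name is an illustrative choice.

```python
from math import factorial


def neg_binomial_pmf(k: int, r: float, p: float) -> float:
    """P(N = k) from (1) for any real r > 0 and 0 < p < 1."""
    coeff = 1.0
    for i in range(k):
        coeff *= r + i          # r (r+1) ... (r+k-1)
    coeff /= factorial(k)       # divided by k!; equals 1 when k = 0
    return coeff * p**r * (1 - p) ** k


for k in range(4):
    print(k, round(neg_binomial_pmf(k, r=0.5, p=0.5), 5))
```

The same coefficient can also be written as \Gamma(r+k)/(k! \ \Gamma(r)), which is convenient when a gamma function routine is available.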

Compare the negative binomial probabilities between the example of r=3 and p=0.5 and the example of r=0.5 and p=0.5. The two negative binomial distributions have different shapes. In the example of r=0.5, the probability is concentrated in the lower values. About 88% of the probability is concentrated at N=0 and N=1. On the other hand, in the example of r=3, a significant amount of probability (about 34%) still remains at N=k for k \ge 4. For this reason, the parameter r is called the shape parameter of the negative binomial distribution.
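The contrast between the two examples can be put side by side with a short calculation. This is a sketch with hypothetical helper names; it reports the probability at N=0 or N=1 and the probability at N=4 or higher for each choice of r.

```python
from math import factorial


def pmf(k: int, r: float, p: float) -> float:
    """P(N = k) from (1) for any real r > 0."""
    coeff = 1.0
    for i in range(k):
        coeff *= r + i
    return coeff / factorial(k) * p**r * (1 - p) ** k


def head_and_tail(r: float, p: float) -> tuple[float, float]:
    """Return (P(N <= 1), P(N >= 4)) for the given parameters."""
    first_four = [pmf(k, r, p) for k in range(4)]
    return first_four[0] + first_four[1], 1 - sum(first_four)


for r in (0.5, 3):
    low, high = head_and_tail(r, 0.5)
    print(f"r={r}: P(N<=1)={low:.4f}, P(N>=4)={high:.4f}")
```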

The Other Two Parametrizations

We now discuss the negative binomial distribution as described by (2) and (3). These give the same probabilities as (1); only the second parameter is expressed differently. The shape parameter is still r. In (2), the other parameter is \theta, a positive real number. The rule for relating (2) to (1) is to set p=1/(1+\theta), so that 1-p=\theta/(1+\theta). Otherwise, (2) works the same way as (1) in evaluating the probabilities P(N=k).

Similarly the parameters for (3) would be r and \beta where \beta is a positive real number. The parameters p and \beta would be related by setting p=\beta/(1+\beta) and 1-p=1/(1+\beta).
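The relations among the three parametrizations can be sketched in code. The function names below are hypothetical. Solving p=1/(1+\theta) and p=\beta/(1+\beta) for the other parameter gives \theta=(1-p)/p and \beta=p/(1-p), so \theta and \beta are reciprocals of each other.

```python
from math import factorial


def pmf_form1(k: int, r: float, p: float) -> float:
    """Probability function (1), valid for any real r > 0."""
    coeff = 1.0
    for i in range(k):
        coeff *= r + i
    return coeff / factorial(k) * p**r * (1 - p) ** k


def pmf_form2(k: int, r: float, theta: float) -> float:
    """Probability function (2): substitute p = 1 / (1 + theta) into (1)."""
    return pmf_form1(k, r, 1 / (1 + theta))


def pmf_form3(k: int, r: float, beta: float) -> float:
    """Probability function (3): substitute p = beta / (1 + beta) into (1)."""
    return pmf_form1(k, r, beta / (1 + beta))


p = 0.25
theta = (1 - p) / p   # 3.0
beta = p / (1 - p)    # 1/3, the reciprocal of theta

# All three forms should agree for matching parameters.
for k in range(4):
    print(k, pmf_form1(k, 2.5, p), pmf_form2(k, 2.5, theta), pmf_form3(k, 2.5, beta))
```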

Why would there be a need for the parametrizations of (2) and (3)? Both (2) and (3) arise naturally through the idea of mixture. The negative binomial is a mixture of Poisson distributions with gamma mixing weights. More specifically, if the mean \lambda of a Poisson distribution is uncertain and \lambda follows a gamma distribution, the resulting mixture is a negative binomial distribution as described by (2) or (3), depending on the form of the gamma distribution used.
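The mixing calculation can be sketched as follows. As an assumption for this sketch, the gamma density is written with shape parameter r and scale parameter \theta; the text above does not say which gamma form goes with which version, so this labeling is illustrative.

```latex
\[
\begin{aligned}
P(N=k) &= \int_0^\infty \frac{e^{-\lambda} \lambda^k}{k!} \cdot
          \frac{\lambda^{r-1} e^{-\lambda/\theta}}{\Gamma(r) \, \theta^r} \, d\lambda \\
       &= \frac{1}{k! \, \Gamma(r) \, \theta^r}
          \int_0^\infty \lambda^{r+k-1} e^{-\lambda (1+\theta)/\theta} \, d\lambda \\
       &= \frac{\Gamma(r+k)}{k! \, \Gamma(r)} \cdot \frac{1}{\theta^r}
          \biggl(\frac{\theta}{1+\theta}\biggr)^{r+k} \\
       &= \frac{\Gamma(r+k)}{k! \, \Gamma(r)}
          \biggl(\frac{1}{1+\theta}\biggr)^r \biggl(\frac{\theta}{1+\theta}\biggr)^k
\end{aligned}
\]
```

Since \Gamma(r+k)/(k! \ \Gamma(r)) = r (r+1) \cdots (r+k-1)/k!, which is the relaxed binomial coefficient \binom{r+k-1}{k}, the last line is (2). If the gamma density is instead written with a rate parameter \beta (that is, \theta = 1/\beta), the same steps give (3).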

The notion of mixture is applicable in many areas. The notion of mixture distributions and Poisson-gamma mixture in particular are discussed here. Many distributions applicable in actuarial applications are mixture distributions (see here for examples).

Here is a discussion on how Poisson is related to gamma.


Discussion on the negative binomial distribution is found in blog posts in several companion blogs. Here is a detailed discussion of the negative binomial distribution. Further discussions are found here and here.

This post discusses the negative binomial survival function. Here is a detailed discussion on the three versions of the negative binomial distribution.

Two sets of practice problems are found here and here.


\copyright 2018 – Dan Ma

