Negative binomial distribution

This is a concise discussion of the negative binomial distribution. Links to detailed discussion are given below.

A counting distribution is a random variable that only takes on the non-negative integers 0, 1, 2, … The negative binomial distribution is a counting distribution. In the present discussion, N is a random variable that follows a negative binomial distribution. This means that the probability that N takes on the value k is given by one of the following probability functions.

    (1)…….\displaystyle P(N=k)=\binom{r+k-1}{k} \ p^r (1-p)^k \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k=0,1,2,\cdots

    (2)…….\displaystyle P(N=k)=\binom{r+k-1}{k} \ \biggl(\frac{1}{1+\theta} \biggr)^r \biggl(\frac{\theta}{1+\theta} \biggr)^k \ \ \ \ \ \ k=0,1,2,\cdots

    (3)…….\displaystyle P(N=k)=\binom{r+k-1}{k} \ \biggl(\frac{\beta}{1+\beta} \biggr)^r \biggl(\frac{1}{1+\beta} \biggr)^k \ \ \ \ \ \ k=0,1,2,\cdots

These three functions are called probability functions. We discuss (1) first. In (1), the numbers r and p are fixed constants. They are called the parameters of the negative binomial distribution. The parameter r can be any positive number r>0. The parameter p can be any number 0<p<1.

A Natural Interpretation

When the parameter r>0 is an integer, (1) has a natural interpretation. Let’s say r is a positive integer. Suppose that a coin has the characteristics that when flipped, the probability of getting a head is p. Let’s say we keep tossing this coin until we get r heads. Then the probability function (1) describes this random phenomenon. The probability P(N=k) in (1) is the probability that the rth head is on the (r+k)th toss. In other words, P(N=k) is the probability that it takes r+k tosses to get r heads.

As an illustration, let p=0.5 (a fair coin). Let’s say we flip the coins until the third head. Here’s several probabilities.

    …….\displaystyle P(N=0)=\binom{2}{0} \ 0.5^3 \cdot 0.5^0=0.5^3=0.125

    …….\displaystyle P(N=1)=\binom{3}{1} \ 0.5^3 \cdot 0.5^1=3 \cdot 0.5^4=0.1875

    …….\displaystyle P(N=2)=\binom{4}{2} \ 0.5^3 \cdot 0.5^2= 6 \cdot 0.5^5=0.1875

    …….\displaystyle P(N=3)=\binom{5}{3} \ 0.5^3 \cdot 0.5^3=10 \cdot 0.5^6=0.15625

A quick note about binomial coefficients. Numbers such as \binom{2}{1} and \binom{3}{2} are called binomial coefficients. In general \binom{m}{n} is defined by the ratio \frac{m!}{n! (m-n)!}. The number such as m! is called factorial, which is the product of m and all the positive integers below m. So m is the number m!=m \cdot (m-1) \cdots 3 \cdot 2 \cdot 1.

The above 4 probabilities tell us that in flipping a fair coins, there are 12.5% chance that it takes 3 tosses (0+3) to get three heads, that there is an 18.75% chance that it takes 4 tosses (1 + 3) to get three heads and so on. The sum of these 4 probabilities is 0.65625. So there is a 65.625% chance that it takes at most 6 tosses to get three heads.

Note that in the coin tossing example, the random variable N counts the number of tails. Since the goal is to get 3 heads, the number of tosses to achieve the goal would be N+3. Thus the probability of flipping the coin 7 times to get r=3 heads would be P(N=4).

The coin tossing example can be generalized by a random experiment such as this: perform a series of independent trials, where each trial has only two distinct outcomes (for convenience one is called success and the other is called failure). The probability of getting a success in each trial is constant across all the trials. Let p be the probability of a success in a trial. Let’s say this experiment stops when r successes are obtained. The probability P(N=k) in (1) is the probability that it will take k failures to obtain r successes. Equivalently, P(N=k) is the probability it will take r+k trials in the experiment to obtain r successes.

When the Parameter r Is Not Integer

When the parameter r is a positive real number but not an integer, the natural setting of tossing a coin until the rth head would not be applicable. However, the negative binomial distribution is still a useful model. It cannot be interpreted as the counting of failures until the rth success. It can be used as a model for the count of some type of random occurrences. For example, the number of insurance losses from an insurance contract in a policy period.

To calculate the probability when r is not an integer, we need to relax the definition of the binomial coefficient. When r is a positive integer, the binomial coefficient \binom{r+k-1}{k} is defined as follows:

…….\displaystyle \binom{r+k-1}{k}=\frac{(r+k-1)!}{k! (r-1)!}

A further simplification of this calculation is informative.

…….\displaystyle \begin{aligned} \binom{r+k-1}{k}&=\frac{(r+k-1)!}{k! \cdot (r-1)!} \\&=\frac{(r+k-1) \cdot (r+k-1) \cdots (r+1) \cdot r \cdot (r-1)!}{k! \cdot (r-1)!} \\&=\frac{(r+k-1) \cdot (r+k-1) \cdots (r+1) \cdot r}{k!} \ \ \ \ k=1,2,\cdots \end{aligned}

We can let the last step in the above derivation as the definition for \binom{r+k-1}{k} when r is just a positive number not necessarily an integer. For example, let r=0.5 and k=3. Then \binom{0.5+3-1}{3} is \binom{2.5}{3}=(2.5 \cdot 1.5 \cdot 0.5)/3!, which is 1.875/6=0.3125.

Note that the new definition of the binomial coefficient \binom{r+k-1}{k} requires that the bottom number k is a positive integer (1 or higher). When k=0, we define \binom{r+0-1}{0}=1. Whenever the bottom number is 0, the value of the binomial coefficient is 1. With this understanding, we calculate a few probabilities for the parameters r=0.5 and p=0.5.

…….\displaystyle P(N=0)=\binom{0.5+0-1}{0} \ 0.5^{0.5} \cdot 0.5^0=0.5^{0.5}=0.7071

…….\displaystyle \begin{aligned} P(N=1)&=\binom{0.5+1-1}{1} \ 0.5^{0.5} \cdot 0.5^1=\binom{0.5}{1} \cdot 0.5^{1.5} \\&=0.5 \cdot 0.5^{1.5}=0.1768 \end{aligned}

…….\displaystyle \begin{aligned} P(N=2)&=\binom{0.5+2-1}{2} \ 0.5^{0.5} \cdot 0.5^2=\binom{1.5}{2} \cdot 0.5^{2.5} \\&=0.375 \cdot 0.5^{2.5}=0.06629 \end{aligned}

…….\displaystyle \begin{aligned} P(N=3)&=\binom{0.5+3-1}{3} \ 0.5^{0.5} \cdot 0.5^3=\binom{2.5}{3} \cdot 0.5^{3.5} \\&=0.3125 \cdot 0.5^{3.5}=0.02762 \end{aligned}

Compare the negative binomial probabilities between the example of r=3 and p=0.5 and the example of r=0.5 and p=0.5. The two negative binomial distributions have different shapes. In the example of r=0.5, the probabilities are concentrated in the lower values. About 88% of the probabilities are concentrated at N=0 and N=1. On the other hand, in the example of r=3, there are still significant amount of probabilities at N=k for k \ge 4. For this reason, the parameter r is called the shape parameter of the negative binomial distribution.

The Other Two Parametrizations

We now discuss the negative binomial distribution as described by (2) and (3). These give the same probabilities as (1), just that one of the parameters is different. The shape parameter is still r. In (2), the other parameter is \theta, a positive real number. The rule for relating (2) and (1) would be making p=1/(1+\theta) and 1-p=\theta/(1+\theta). Otherwise, (2) would work the same way as in (1) in terms of evaluating the probabilities P(N=k).

Similarly the parameters for (3) would be r and \beta where \beta is a positive real number. The parameters p and \beta would be related by setting p=\beta/(1+\beta) and 1-p=1/(1+\beta).

Why would there be a need for the parametrizations of (2) and (3)? Both (2) and (3) arise naturally through the idea of mixture. The negative binomial is a mixture of Poisson distributions with gamma mixing weights. More specifically, mixing Poisson distributions with uncertain mean \lambda with \lambda following a gamma distribution will produce a negative binomial distribution as described by (2) or (3) depending on the form of the gamma distribution used.

The notion of mixture is applicable in many areas. The notion of mixture distributions and Poisson-gamma mixture in particular are discussed here. Many distributions applicable in actuarial applications are mixture distributions (see here for examples).

Here is a discussion on how Poisson is related to gamma.


Discussion on the negative binomial distribution is found in blog posts in several companion blogs. Here is a detailed discussion of the negative binomial distribution. Further discussions are found here and here.

This post discusses the negative binomial survival function. Here is a detailed discussion on the three versions of the negative binomial distribution.

Two sets of practice problems are found here and here.

Dan Ma math
Dan Ma mathematics topics

Daniel Ma mathematics
Daniel Ma mathematics topics

\copyright 2018 – Dan Ma


Benford’s Law

Take a look at a natural data set, e.g. the populations of all the counties in the United States.

Figure 1 – County Map

There are over 3,000 counties. If you look at all the 3000+ population counts, the first digit is 1 about 30% of the time, the first digit is 2 about 18% of the time, and 9 about 5% of the time.

Figure 2

The digit 1 shows up as a first digit about 6 times as often as 9. Amazing! This phenomenon is called Benford’s law.

Figure 3 – Benford’s Law

This phenomenon shows up in many other data sets. In addition to population data, it shows up in geographical data (e.g. areas of rivers), baseball statistics, numbers in magazine articles, numbers in street addresses, house prices and stock market trading data. Benford’s law also has practical applications in business because it shows up in corporate financial statements.

Let’s take a look at the annual financial statements of Google from 2013 to 2016. This is the summary of the first digits in these financial statements.

Figure 4 – First Digits in Goolge’s Financial Statements

Look very similar to Benford’s law, right? Here’s is a side by side comparison with Benford’s law.

Figure 5 – Side by Side Comparison of Google and Benford

Overall, there is a general agreement between first digits in Google’s financial statements and the predicted percentages according to Benford’s law. There are some noticeable differences. For example, the first digit 1 shows up about 35% of the time in Google’s statements versus 30% of the time according to Benford’s law. The differences are not statistically significant in this case. This conclusion is based on a statistical test called the chi-squared test.

The comparison between the first digits in Google’s financial statements and Benford’s law shows that Benford’s law is a powerful tool for detecting financial frauds.

How do you use Benford’s law? Compare the actual frequencies of the first digits in a set of financial statements with the predicted frequencies according to the Benford’s law. Anyone who fakes financial data at random will not produce data that look convincing. Even when the first digits do not distribute uniformly, too big of a discrepancy between the actual first digits and the Benford’s law (e.g. too few 1’s or too many 7’s, 8’s and 9’s) will raise a giant red flag, at which time the investigator can use more sophisticated tests for further evaluation.

Remember thus guy?

Figure 6 – Bernie Madoff’s Mug Shot

Bernie Madoff perpetrated most likely the most massive Ponzi scheme in all of history. His operation would have a constant need to make up numbers for the purpose of keeping up the appearance of legitimate investing. There was a study showing that the first digits in the monthly returns over a 215-month period did not conform to Benford’s law. So Bernie Madoff could have been caught a lot sooner if auditors and regulators were willing to look more closely.

Whether the financial data in question are or are not close to Benford’s law does not prove anything. But too big of a discrepancy should raise suspicion. Then the investigator can further test or evaluate using more sophisticated methods.

There are other applications in addition to fraud detection and forensic accounting. Benford’s law can be used to detect changes in natural processes (e.g. earthquake detection) and as a tool to assess the appropriateness of mathematical models (e.g. assessing whether projected population counts make sense).

This post is an abbreviated version of an article on Benford’s law in an affiliated blog.

\copyright 2017 – Dan Ma

Law of large numbers

Gambling games are a great way to illustrate the law of large numbers, a fundamental principle in probability. In a gambling context, it states that the individual bets can be unpredictable but in the long run (after hundreds or thousands or more bets), the results of the bets are stable and predictable. Predictable in this case mean that the casino always win. Furthermore, the average win of the casino approaches the house edge, which is the theoretical winning per bet. Of course, this principle describes many other random phenomena that have nothing to do with gambling. In this post we point out several examples of illustrations of the law of large numbers.

From a game of chance perspective, one way to confirm the law of large numbers is through actual gambling (with real money). Walk into casino several nights in a row and play. Then see what happens at the end of each night. Of course, this could be a very costly way to obtain data. Another way is through simulation, which can be done by hand (rolling dice and flipping coins) or performed in a computer. Game results can be generated using computer generated random numbers based on the presumed odds of the game in question. In a computer simulation, thousands or tens of thousands of bets can be simulated. We can then see what the long run results look like.

Take the game of roulette as an example. The payout rules of the game are designed in such a way that the house edge of any bet that can be made is always 5.26%. This is the theoretical average winning of the house (or the theoretical loss of the gambler) per unit bet. So for each $1 bet (say the bet on the color red), the house is expect to win $0.0526 (5.26 cents). Per $100 bet, the house is expected to win $5.26. If the total dollar value of the bets made in a given night is $1 million, then the house expects to collect $52,600. As long as customers are flowing into the casino, the casino wins. The mathematical aspect is discussed in this post in a companion blog. Simulations are demonstrated in this post.

Another simulation is performed on the carnival game of Chuck-a-Luck (see here). Some people mistakenly think that the bets in Chuck-a-Luck are made in even-odds basis. The house edge is actually close to 8% (worse odds than the roulette wheel for the gambler’s perspective).

Here’s an example of a simulation on a context outside of gambling. In rolling a fair die, the value of the die can range from 1 to 6 and is random and unpredictable. Yet the long run results are actually stable and predictable. The simulated values of the dice average out to be around 3.5 in the long run. See this post in a companion blog for details. The blog post describes a random experiment called n-dice gas station. This is a gas station where you roll dice to determine the price of gas. For example, in the 1-die gas station, the price is determined by rolling a fair die (whatever the value of the die that comes up, that will be the price of gas per gallon). In a 2-dice gas station, you roll two dice and the price of gas per gallon is the average of the two dice. The blog post is to illustrate the sampling distribution of the sample mean as well as the law of large numbers.

Here’s a simulation of the law of large numbers in the context of the coupon collector problem (see here).

The possibilities for simulations for gambling games and other random phenomena are endless. For any one who thinks that he can “beat the odds” over a long series of bets, just try to simulate bets by rolling dice and/or tossing coins. A clear advantage for the house should emerge fairly quickly.

\copyright 2017 – Dan Ma

The gambler’s ruin

Here’s two formulas that tell a great story about what would happen to a gambler if he or she plays long enough in a casino.

Formula 1

\displaystyle A_i=\frac{i}{n}

Formula 2

\displaystyle A_i=\frac{\displaystyle 1-\biggl(\frac{q}{p} \biggr)^i}{\displaystyle 1- \biggl(\frac{q}{p} \biggr)^n}

Both formulas are answers to the following problem called the gambler’s ruin.

The Gambler’s Ruin

Two gamblers, A and B, are betting on the tosses of a coin such that the probability of getting a head is p. Let q=1-p, which would be the probability of getting a tail in a coin toss. At the beginning of the game, player A has i units of wealth and player B has n-i units (together both players have a combined wealth of n units at the beginning). In each play of the game, the coin is tossed. If the result of the coin toss is head, player A collects 1 unit from player B. If the result of the coin toss is tail, player A pays player B 1 unit. The game continues until one of the players has all the n units of wealth. What is the probability that player A ends up with all n units? What is the probability that player B ends up with all n units?

This problem is very interesting if there is a great disparity in wealth between the two players. If the wealth of one of the players is a tiny percentage of the other player, then the problem would describe the game between a gambler and a casino.

Imagine that player A is a gambler and player B is the casino. In this scenario, the number n is a large number and i is a very small number relative to n. For example, the casino has $999,000 on hand and the gambler has only $1,000 so that the combined wealth is $1,000,000. One unit is $1. If the game is played in the manner described above, what is the probability that player A ends up with $1,000,000? This would be the probability of player A breaking the bank. Invest $1,000 and end up with $1 million. How likely is that? What is the probability of player B (casino) ends up with $1,000,000? This would be the probability that player A losing the $1,000 he starts with (the probability of ruin for player A).

Let’s explain what A_i is in the above two formulas. If player A starts with i units of wealth and player B starts with n-i units, let A_i be the probability that player A will end up with all n units of wealth and let B_i be the probability that player B will end up with all n units of wealth. Of course, this is assuming that the game continues until one of the players is broke.

Formula 1 is the case that the coin used in the game is a fair coin so that p = 0.5. Formula 2 is the case that the coin is not a fair coin so that p is not 0.5.

Also A_i+B_i is always 1.0. When A_i is computed using one of the above formulas, B_i=1-A_i.

In the event that the gambler plays a fair game in a casino, the probability that he or she will win everything is the ratio of the gambler’s wealth to the total combined wealth of the gambler and the casino. In the case that player A only has $1,000 and the combined bankroll is $1,000,000, A_i = 1,000 / 1,000,000 = 0.001 (0.1%). The probability of ruin for player A is then 99.9%. So there is a 99.9% chance that player A will lose the $1,000. But this is only if they play a fair game in the casino, which they don’t.

Let’s say the game is biased for the house so that the house has 1% edge. So the probability of getting a head is p = 0.49. If the exponent is too big, the calculator would give an overflow error. So let’s assume that the gambler has 5 units (i = 5) and the total bankroll is n = 1,000. Then we would use Formula 2 to find A_{5}.

    \displaystyle A_{5}=\frac{\displaystyle 1-\biggl(\frac{0.51}{0.49} \biggr)^5}{\displaystyle 1- \biggl(\frac{0.51}{0.49} \biggr)^{1000}}=9.35731 \times 10^{-19}

This is virtually a zero probability. This means B_5 = 1. So the probability of ruin is virtually certain. We all know that the casino always has the upper hand. The formulas tell a great story.

For more information about gamber’s ruin, see the following two blog posts. This post explains how the two formulas are derived. This post explain how the formulas are used and has further commentary.

\copyright 2017 – Dan Ma