Comprehensive Guide on Negative Binomial Distribution
Let's derive the negative binomial distribution ourselves using a motivating example.
Motivating example
Recall that the geometric distribution is the distribution of the number of trials needed to observe the first success in repeated independent Bernoulli trials. The negative binomial distribution generalizes this: it is the distribution of the number of trials needed to observe the $r$-th success.
Now, let's go over a simple example that will allow us to derive the probability mass function of the negative binomial distribution. Suppose we have an unfair coin whose probability of heads is $p$ and whose probability of tails is $1-p$.

Let's denote the outcome of heads as a success and the outcome of tails as a failure. Suppose we are interested in observing the $r$-th success at the $n$-th toss.

What is the probability of observing exactly $r-1$ successes in the first $n-1$ tosses? This probability is binomial because:

- the number of trials ($n-1$) is fixed.
- the probability of success ($p$) is fixed.
- the trials are independent.

The probability of observing $r-1$ successes in the first $n-1$ tosses is therefore:

$$\binom{n-1}{r-1}p^{r-1}(1-p)^{(n-1)-(r-1)}=\binom{n-1}{r-1}p^{r-1}(1-p)^{n-r}$$

Note that the reason why we don't compute the binomial probability of $r$ successes in $n$ tosses directly is that the $r$-th success must occur precisely at the $n$-th toss, whereas the binomial distribution would allow it to occur at any of the $n$ positions.

We also need to observe the $r$-th success at the $n$-th toss itself, which happens with probability $p$. Multiplying the two probabilities gives the probability of observing the $r$-th success at the $n$-th toss:

$$\binom{n-1}{r-1}p^{r-1}(1-p)^{n-r}\cdot p=\binom{n-1}{r-1}p^{r}(1-p)^{n-r}$$

Let's now state this in general. Suppose we perform repeated independent Bernoulli trials where the probability of success is $p$, and let random variable $X$ be the number of trials needed to observe the $r$-th success.

We assume that $r$ is a fixed positive integer.

The minimum value that $X$ can take is $r$, because we need at least $r$ trials to observe $r$ successes.
Negative Binomial Distribution

A random variable $X$ is said to follow the negative binomial distribution with parameters $r$ and $p$, written $X\sim\text{NB}(r,p)$, if its probability mass function is:

$$p(x)=\mathbb{P}(X=x)=\binom{x-1}{r-1}p^{r}(1-p)^{x-r},\qquad x=r,r+1,r+2,\dots$$

Where $r$ is the number of successes we wish to observe and $p$ is the probability of success of each Bernoulli trial.
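As a quick sanity check on the formula above, we can compare it against SciPy's `nbinom`. One caveat (covered later in this guide): SciPy's `nbinom` counts the number of failures before the $r$-th success rather than the number of trials, so $\mathbb{P}(X=x)$ corresponds to `nbinom.pmf(x - r, r, p)`. A minimal sketch:

```python
from math import comb
from scipy.stats import nbinom

def nb_pmf_trials(x, r, p):
    """P(X = x) for X ~ NB(r, p), where X counts the number of
    trials needed to observe the r-th success."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# Compare against SciPy's failure-counting parametrization.
r, p = 3, 1/6
for x in range(r, 30):
    assert abs(nb_pmf_trials(x, r, p) - nbinom.pmf(x - r, r, p)) < 1e-12
```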
Rolling a dice

Suppose we keep rolling a fair dice until we observe a six for the third time. What is the probability that we roll the third six on the 8th roll?

Solution. Let's treat the event of observing a six as a success, so the probability of success is $p=1/6$. If random variable $X$ is the number of rolls needed to observe the third six, then $X\sim\text{NB}(3,1/6)$.

The probability of observing the third six on the 8th roll is:

$$\mathbb{P}(X=8)=\binom{8-1}{3-1}\Big(\frac{1}{6}\Big)^{3}\Big(\frac{5}{6}\Big)^{8-3}=\binom{7}{2}\Big(\frac{1}{6}\Big)^{3}\Big(\frac{5}{6}\Big)^{5}\approx0.039$$

Therefore, the probability of rolling the third six on the 8th roll is roughly 0.039, or about 3.9%.
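The arithmetic above can be verified with Python's built-in `math.comb`:

```python
from math import comb

p = 1/6                                   # probability of rolling a six
prob = comb(7, 2) * p**3 * (1 - p)**5     # P(X = 8) for X ~ NB(3, 1/6)
print(round(prob, 4))                     # 0.0391
```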
Let's also plot the negative binomial probability mass function of $X\sim\text{NB}(3,1/6)$:

We can indeed see that when $x=8$, the probability mass is roughly $0.039$.
Properties of negative binomial distribution
Geometric distribution is a special case of the negative binomial distribution
If $X\sim\text{NB}(1,p)$, then $X$ follows a geometric distribution, that is, $X\sim\text{Geom}(p)$.

Proof. If $X\sim\text{NB}(1,p)$, then substituting $r=1$ into the negative binomial probability mass function gives:

$$\mathbb{P}(X=x)=\binom{x-1}{0}p^{1}(1-p)^{x-1}=p(1-p)^{x-1},\qquad x=1,2,3,\dots$$

Notice that this is the probability mass function of the geometric distribution. Therefore, $X\sim\text{Geom}(p)$. This completes the proof.
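We can also confirm this special case numerically using SciPy, comparing `nbinom` with $r=1$ against `geom` (note that `geom` counts trials while `nbinom` counts failures, hence the shift by one):

```python
from scipy.stats import geom, nbinom

p = 0.3
for x in range(1, 30):
    # P(X = x) under NB(1, p) in trial-counting form is nbinom.pmf(x - 1, 1, p)
    # in SciPy's failure-counting form; geom.pmf(x, p) counts trials directly.
    assert abs(nbinom.pmf(x - 1, 1, p) - geom.pmf(x, p)) < 1e-12
```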
Mean of negative binomial distribution
If $X\sim\text{NB}(r,p)$, then the expected value of $X$ is:

$$\mathbb{E}(X)=\frac{r}{p}$$
Proof. Most proofs for the mean and variance involve tedious algebraic manipulations, but we can avoid them by recognizing that a negative binomial random variable is the sum of independent geometric random variables. Let's take a moment to understand why.
Recall that the difference between the negative binomial distribution and the geometric distribution is:

- a negative binomial random variable $X$ represents the number of trials needed to observe the first $r$ successes.
- a geometric random variable $Y$ represents the number of trials needed to observe the first success.

The diagram below illustrates the relationship between the two types of random variables:

Here, $Y_1$ counts the trials up to and including the first success, $Y_2$ counts the trials after the first success up to and including the second success, and so on. In other words, $X$ decomposes as:

$$X=Y_1+Y_2+\cdots+Y_r$$

Note that the diagram above illustrates the case for a small number of successes, but the same decomposition holds for any $r$.
In our guide on geometric distribution, we have already proven that the expected value of a geometric random variable is $\mathbb{E}(Y_i)=1/p$. Taking the expected value of both sides of the decomposition gives:

$$\mathbb{E}(X)=\mathbb{E}(Y_1)+\mathbb{E}(Y_2)+\cdots+\mathbb{E}(Y_r)=\underbrace{\frac{1}{p}+\frac{1}{p}+\cdots+\frac{1}{p}}_{r\text{ times}}=\frac{r}{p}$$

Where the first equality holds by the linearity of expectation.
This completes the proof.
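The decomposition into geometric random variables also lends itself to a quick simulation check - a minimal sketch using NumPy, whose `geometric` sampler counts trials just like our convention:

```python
import numpy as np

rng = np.random.default_rng(42)
r, p = 3, 1/6

# Simulate X = Y_1 + ... + Y_r by summing r independent geometric draws.
samples = rng.geometric(p, size=(100_000, r)).sum(axis=1)
print(samples.mean())   # close to r / p = 18
```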
Variance of negative binomial distribution
If $X\sim\text{NB}(r,p)$, then the variance of $X$ is:

$$\mathbb{V}(X)=\frac{r(1-p)}{p^2}$$
Proof. We will again treat a negative binomial random variable $X$ as the sum of $r$ independent geometric random variables:

$$X=Y_1+Y_2+\cdots+Y_r$$

Let's take the variance of both sides:

$$\mathbb{V}(X)=\mathbb{V}(Y_1+Y_2+\cdots+Y_r)=\mathbb{V}(Y_1)+\mathbb{V}(Y_2)+\cdots+\mathbb{V}(Y_r)$$

Note that the second equality holds by property of variance because the geometric random variables are independent.

In our guide on geometric distribution, we have already proven that the variance of a geometric random variable is $\mathbb{V}(Y_i)=(1-p)/p^2$.

Finally, substituting this into the expression above gives:

$$\mathbb{V}(X)=\underbrace{\frac{1-p}{p^2}+\frac{1-p}{p^2}+\cdots+\frac{1-p}{p^2}}_{r\text{ times}}=\frac{r(1-p)}{p^2}$$
This completes the proof.
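Because SciPy's `nbinom` counts failures rather than trials, and the two counts differ only by the constant $r$ (which does not affect variance), SciPy reports exactly this variance - a quick way to verify the formula:

```python
from scipy.stats import nbinom

r, p = 3, 1/6
var = nbinom.stats(r, p, moments='v')   # variance of the failure count
# Shifting by the constant r does not change variance, so this is also
# Var(X) = r(1-p)/p^2 = 3 * (5/6) / (1/6)^2, approximately 90.
print(float(var))
```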
Rolling a dice (revisited)
Let's revisit our example from earlier - suppose we keep rolling a fair dice until we roll a six for the third time.

We computed the probability of rolling the third six on the 8th roll to be roughly $0.039$.

Such a low probability is to be expected because, on average, it should take us:

$$\mathbb{E}(X)=\frac{r}{p}=\frac{3}{1/6}=18$$

rolls to observe the third six.

No wonder it's extremely rare to observe the third six as early as the 8th roll!
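We can back this up with a short simulation of the dice experiment (a sketch using Python's standard `random` module):

```python
import random

random.seed(0)

def rolls_until_third_six():
    """Roll a fair die until the third six appears; return the roll count."""
    rolls, sixes = 0, 0
    while sixes < 3:
        rolls += 1
        if random.randint(1, 6) == 6:
            sixes += 1
    return rolls

trials = [rolls_until_third_six() for _ in range(100_000)]
print(sum(trials) / len(trials))                   # close to E(X) = 18
print(sum(t == 8 for t in trials) / len(trials))   # close to P(X=8), about 0.039
```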
Alternate parametrization of the negative binomial distribution
A random variable $Y$ is said to follow the second parametrization of the negative binomial distribution if its probability mass function is:

$$p(y)=\mathbb{P}(Y=y)=\binom{y+r-1}{y}p^{r}(1-p)^{y},\qquad y=0,1,2,\dots$$

Where $r$ is the number of successes we wish to observe, $p$ is the probability of success of each trial, and $Y$ counts the number of failures observed before the $r$-th success.
Proof and intuition. The negative binomial distribution is sometimes formulated in a different way - instead of counting the number of trials at which the $r$-th success occurs, we count the number of failures observed before the $r$-th success.

For instance, observing the 3rd success on the 8th trial is equivalent to observing $8-3=5$ failures before the 3rd success:

Let's now go the other way - observing $y$ failures before the $r$-th success is equivalent to observing the $r$-th success at trial number $y+r$.

To generalize, let random variable $X$ count the number of trials needed to observe the $r$-th success, and let random variable $Y$ count the number of failures observed before the $r$-th success. The two random variables are related by $Y=X-r$, and so:

$$\mathbb{P}(Y=y)=\mathbb{P}(X-r=y)=\mathbb{P}(X=y+r)$$

Let's simplify this, starting with the exponent. Substituting $x=y+r$ into the probability mass function of $X$ gives:

$$\mathbb{P}(Y=y)=\binom{y+r-1}{r-1}p^{r}(1-p)^{(y+r)-r}=\binom{y+r-1}{r-1}p^{r}(1-p)^{y}$$

We then simplify the combination term using the symmetry $\binom{y+r-1}{r-1}=\binom{y+r-1}{y}$:

$$\mathbb{P}(Y=y)=\binom{y+r-1}{y}p^{r}(1-p)^{y}$$

Remember, the minimum value that $X$ can take is $r$, so the minimum value that $Y=X-r$ can take is $0$ - this is why the second parametrization is supported on $y=0,1,2,\dots$ This completes the proof.
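The equivalence between the two parametrizations is easy to spot-check numerically, since SciPy's `nbinom` implements the failure-counting form:

```python
from math import comb
from scipy.stats import nbinom

r, p = 3, 1/6
for y in range(0, 30):
    # Trial-counting PMF evaluated at x = y + r ...
    p_trials = comb(y + r - 1, r - 1) * p**r * (1 - p)**y
    # ... matches the failure-counting PMF evaluated at y.
    assert abs(p_trials - nbinom.pmf(y, r, p)) < 1e-12
```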
Expected value and variance
The expected value and variance of the second definition of the negative binomial random variable are:

$$\mathbb{E}(Y)=\frac{r(1-p)}{p},\qquad\mathbb{V}(Y)=\frac{r(1-p)}{p^2}$$
Proof. We've already derived the expected value and variance of the first definition of the negative binomial random variable $X$, namely $\mathbb{E}(X)=r/p$ and $\mathbb{V}(X)=r(1-p)/p^2$. Since $Y=X-r$, the expected value of $Y$ is:

$$\mathbb{E}(Y)=\mathbb{E}(X-r)=\mathbb{E}(X)-r=\frac{r}{p}-r=\frac{r(1-p)}{p}$$

The variance of $Y$ is:

$$\mathbb{V}(Y)=\mathbb{V}(X-r)=\mathbb{V}(X)=\frac{r(1-p)}{p^2}$$

The second equality holds by property of variance because subtracting the constant $r$ does not change the variance. This completes the proof.
Overdispersion
The variance of the second definition of a negative binomial random variable is always greater than its expected value, that is:

$$\mathbb{V}(Y)>\mathbb{E}(Y)$$
This is known as the overdispersion property of the negative binomial distribution.
Proof. We have derived the expected value and variance of the second definition of a negative binomial random variable to be:

$$\mathbb{E}(Y)=\frac{r(1-p)}{p},\qquad\mathbb{V}(Y)=\frac{r(1-p)}{p^2}$$

We can easily express the variance in terms of the expected value:

$$\mathbb{V}(Y)=\frac{r(1-p)}{p^2}=\frac{1}{p}\cdot\frac{r(1-p)}{p}=\frac{\mathbb{E}(Y)}{p}$$

Since $0<p<1$, we have $1/p>1$, and therefore $\mathbb{V}(Y)>\mathbb{E}(Y)$. This completes the proof.
Note that the overdispersion property only applies to the case when we use the second definition of the negative binomial random variable.
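Overdispersion is easy to observe with SciPy, whose `nbinom` uses the failure-counting parametrization:

```python
from scipy.stats import nbinom

for r in [1, 3, 10]:
    for p in [0.1, 0.5, 0.9]:
        mean, var = nbinom.stats(r, p, moments='mv')
        assert var > mean                      # overdispersion
        assert abs(var - mean / p) < 1e-9      # V(Y) = E(Y) / p
```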
Working with negative binomial distribution using Python
Computing the probability mass function
Recall the example from earlier once more:

Suppose we keep rolling a fair dice until we observe a six for the third time. What is the probability that we roll the third six on the 8th roll?

To align with Python's SciPy library, let's use the second parametrization of the negative binomial distribution to answer this question. We let random variable $Y$ be the number of failures (non-six rolls) observed before the third six.

Observing the third six on the 8th roll is equivalent to observing $8-3=5$ failures before the third six, so we wish to compute:

$$\mathbb{P}(Y=5)=\binom{5+3-1}{5}\Big(\frac{1}{6}\Big)^{3}\Big(\frac{5}{6}\Big)^{5}$$
Instead of calculating by hand, we can use Python's SciPy library like so:
```python
from scipy.stats import nbinom

r = 3        # number of successes (sixes)
p = 1/6      # probability of success on each roll
x = 5        # number of failures before the third six
nbinom.pmf(x, r, p)
```
0.03907143061271146
Plotting the probability mass function
Suppose we wanted to plot the probability mass function of random variable $Y\sim\text{NB}(3,1/6)$. We can call the nbinom.pmf(~) function on a list of non-negative integers:
import matplotlib.pyplot as plt
```python
import matplotlib.pyplot as plt
from scipy.stats import nbinom

r = 3
p = 1/6
n = 50
xs = list(range(n + 1))        # [0,1,2,...,50]
pmfs = nbinom.pmf(xs, r, p)
plt.bar(xs, pmfs)
plt.xlabel('$x$')
plt.ylabel('$p(x)$')
plt.show()
```
This generates the following plot:
