Before we formally define the geometric distribution, let's go through a motivating example.
Motivating example
Suppose we repeatedly toss an unfair coin with the following probabilities:
Where $\mathrm{H}$ and $\mathrm{T}$ represent heads and tails respectively.
Let the random variable $X$ denote the trial at which we obtain heads for the very first time. To obtain heads for the very first time at the $x$-th trial, we must have obtained $x-1$ tails before that. For instance, suppose we are interested in the probability of obtaining a heads for the first time at the 3rd trial. This means that the outcome of our tosses has to be:

$$\mathrm{T},\ \mathrm{T},\ \mathrm{H}$$
The probability of this specific outcome is:
What would the probability of obtaining a heads for the first time at the 5th trial be? In this case, the outcome of the tosses must be:

$$\mathrm{T},\ \mathrm{T},\ \mathrm{T},\ \mathrm{T},\ \mathrm{H}$$
The probability that this specific outcome occurs is:
Hopefully you can see that, in general, the probability of obtaining a heads for the first time at the $x$-th trial is given by:
Let's generalize further - instead of heads and tails, let's denote a heads as a success and a tails as a failure. If the probability of success is $p$, then the probability of failure is $1-p$. Therefore, the probability of observing a success for the first time at the $x$-th trial is given by:

$$(1-p)^{x-1}\,p$$
Random variables that have this specific distribution are said to follow the geometric distribution.
Assumptions of the geometric distribution
While deriving the geometric distribution, we made the following implicit assumptions:
the probability of success is constant for every trial. For our example, the probability of heads is the same for each coin toss.
the trials are independent. For our example, the outcome of a coin toss does not affect the outcome of the next coin toss.
the outcome of each trial is binary. For our example, the outcome of a coin toss is either heads or tails.
Statistical experiments that satisfy these conditions are called repeated Bernoulli trials.
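To make the idea of repeated Bernoulli trials concrete, here is a minimal simulation sketch in Python; the heads probability of 0.3 is an arbitrary value chosen purely for illustration.

import random

def trials_until_first_heads(p_heads):
    # Toss a biased coin until heads appears and return the trial number
    trial = 1
    while random.random() >= p_heads:   # this toss came up tails
        trial += 1                      # move on to the next independent toss
    return trial

p_heads = 0.3   # hypothetical heads probability, for illustration only
print([trials_until_first_heads(p_heads) for _ in range(10)])

Each printed number is one observation of the random variable described above - the trial at which the first heads occurred.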
Definition.
Geometric distribution
A discrete random variable $X$ is said to follow a geometric distribution with parameter $p$ if and only if the probability mass function of $X$ is:

$$p(x)=(1-p)^{x-1}\,p$$

Where $x=1,2,3,\ldots$ and $0<p\le1$. If a random variable $X$ follows a geometric distribution with parameter $p$, then we can write $X\sim\text{Geometric}(p)$.
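As a quick illustration of this definition, the probability mass function can be coded up directly; the function name geometric_pmf below is just a convenience chosen for this sketch.

def geometric_pmf(x, p):
    # P(X = x) = (1 - p)^(x - 1) * p for x = 1, 2, 3, ...
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

print(geometric_pmf(3, 0.4))   # (0.6 ** 2) * 0.4 = 0.144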
Example.
Drawing balls from a bag
Suppose we randomly draw with replacement from a bag containing 3 red balls and 2 green balls until a green ball is drawn. Answer the following questions:
What is the probability of drawing a green ball at the 3rd trial?
What is the probability of drawing a green ball at or before the 3rd trial?
Solution. Let's start by confirming that this experiment consists of repeated Bernoulli trials:
the probability of success, that is, drawing a green ball, is constant ($p=2/5$).
the trials are independent because we are drawing with replacement.
the outcome is binary - we either draw a green ball or we don't.
Let $X$ be a geometric random variable representing the trial at which we draw a green ball for the first time. The probability of success, that is, the probability of drawing a green ball at each trial, is $p=2/5$. Therefore, the geometric probability mass function is:

$$p(x)=\left(\frac{3}{5}\right)^{x-1}\left(\frac{2}{5}\right)$$
The probability of drawing a green ball at the 3rd trial is:

$$p(3)=\left(\frac{3}{5}\right)^{2}\left(\frac{2}{5}\right)=\frac{18}{125}=0.144$$
The probability of drawing a green ball at or before the 3rd trial is:

$$\mathbb{P}(X\le3)=p(1)+p(2)+p(3)=\frac{2}{5}+\frac{6}{25}+\frac{18}{125}=\frac{98}{125}=0.784$$
Finally, let's graph our geometric probability mass function:
We can see that $p(3)$ is indeed roughly around $0.14$. Note that we've truncated the graph, but $x$ can be any positive integer.
■
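As an informal check of the two answers above, we can also compare them against SciPy's geom functions, which are covered in more detail in the Python section at the end of this guide:

from scipy.stats import geom

p = 2 / 5                # probability of drawing a green ball
print(geom.pmf(3, p))    # P(X = 3) = 0.144
print(geom.cdf(3, p))    # P(X <= 3) = 0.784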
Properties of the geometric distribution
Theorem.
Expected value of a geometric random variable
If $X$ follows a geometric distribution with parameter $p$, then the expected value of $X$ is given by:

$$\mathbb{E}(X)=\frac{1}{p}$$
Proof. Let $X\sim\text{Geometric}(p)$. By the definitionlink of expected values, we have that:

$$\mathbb{E}(X)=\sum_{x=1}^{\infty}x\,p(x)=\sum_{x=1}^{\infty}x(1-p)^{x-1}p=p\sum_{x=1}^{\infty}x(1-p)^{x-1}$$
We now want to compute the summation:

$$\sum_{x=1}^{\infty}x(1-p)^{x-1}$$
Let's rewrite each term as:
To get the overall sum, we must add up all of these terms - but the trick is to do so vertically:
Notice that each of these is an infinite geometric series with common ratio $1-p$. The only difference between them is the starting value. Because $p$ is a probability, we have that $0<p\le1$. This also means that $0\le1-p<1$. In our guide on geometric series, we have shownlink that all infinite geometric series converge to the following sum when the common ratio is between $-1$ and $1$:

$$\frac{a}{1-r}$$

Where $a$ is the starting value of the series and $r$ is the common ratio. Therefore, in our case, the sum would be:
The orange sum can therefore be written as:
Once again, we end up with yet another infinite geometric series with starting value $\frac{1}{p}$ and common ratio $1-p$. Using the formula for the sum of infinite geometric series once again gives:

$$\frac{1/p}{1-(1-p)}=\frac{1}{p^2}$$
Substituting this result back into the expression for $\mathbb{E}(X)$ gives:

$$\mathbb{E}(X)=p\cdot\frac{1}{p^2}=\frac{1}{p}$$
This completes the proof.
■
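As a rough numerical sanity check of this result, we can truncate the infinite sum defining the expected value; the success probability of 0.4 and the truncation point of 1,000 terms are arbitrary choices made for this sketch.

p = 0.4   # illustrative success probability

# Truncated version of E(X) = sum over x of x * (1 - p)^(x - 1) * p
approx = sum(x * (1 - p) ** (x - 1) * p for x in range(1, 1001))
print(approx)   # approximately 2.5
print(1 / p)    # 2.5, matching E(X) = 1/p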
Theorem.
Variance of a geometric random variable
If $X$ follows a geometric distribution with parameter $p$, then the variance of $X$ is given by:

$$\mathbb{V}(X)=\frac{1-p}{p^2}$$
Proof. We know from the propertylink of variance that:

$$\mathbb{V}(X)=\mathbb{E}(X^2)-\big[\mathbb{E}(X)\big]^2$$

We already know what $\mathbb{E}(X)$ is from earlierlink, so we have that:

$$\mathbb{V}(X)=\mathbb{E}(X^2)-\frac{1}{p^2}$$

We now need to derive an expression for $\mathbb{E}(X^2)$. From the definition of expected values, we have that:

$$\mathbb{E}(X^2)=\sum_{x=1}^{\infty}x^2\,(1-p)^{x-1}p$$
Now, notice how we can obtain the purple term by taking the derivative like so:
Note that we are using the power rule of differentiation here.
Let's now use the properties of geometric series to find an expression for the green summation above. We define a new variable $q$ such that $q=1-p$, which also means that $p=1-q$. Rewriting the green summation in terms of $q$ gives:
Now, recall that we derived the following lemma when proving the expected value earlier:

$$\sum_{x=1}^{\infty}x(1-p)^{x-1}=\frac{1}{p^2}$$
Notice how the only difference between this summation and the one in our lemma is the symbol used. Since the symbol itself doesn't matter, we have that:
Taking the derivative of both sides with respect to $q$ gives:
Equating the two expressions gives:
Substituting this result into our expression for $\mathbb{E}(X^2)$ gives:
Finally, substituting $\mathbb{E}(X^2)$ into our earlier expression for the variance gives:

$$\mathbb{V}(X)=\frac{2-p}{p^2}-\frac{1}{p^2}=\frac{1-p}{p^2}$$
This completes the proof.
■
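Again, we can informally verify the formula, this time using SciPy's built-in moment functions; the value p = 0.4 is an arbitrary illustrative choice.

from scipy.stats import geom

p = 0.4
print(geom.var(p))         # variance computed by SciPy: 3.75
print((1 - p) / p ** 2)    # (1 - p) / p^2 = 3.75, matching the formula above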
Theorem.
Cumulative distribution function of the geometric distribution
The cumulative distribution function of the geometric distribution with success probability $p$ is given by:

$$F(x)=\mathbb{P}(X\le x)=1-(1-p)^{x}$$
Proof. We use the definition of the cumulative distribution function and the geometric probability mass function:

$$F(x)=\mathbb{P}(X\le x)=\sum_{k=1}^{x}(1-p)^{k-1}p$$
Notice that this is a finite geometric series with starting value $p$ and common ratio $1-p$. Using the formulalink for the sum of a finite geometric series, we have that:

$$F(x)=\frac{p\left(1-(1-p)^{x}\right)}{1-(1-p)}=1-(1-p)^{x}$$
This completes the proof.
■
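To see the closed-form cumulative distribution function in action, here is a small comparison against SciPy's geom.cdf for a few values of x, where p = 0.4 is again an illustrative choice.

from scipy.stats import geom

p = 0.4
for x in range(1, 6):
    closed_form = 1 - (1 - p) ** x          # F(x) = 1 - (1 - p)^x
    print(x, closed_form, geom.cdf(x, p))   # the two values should agree for every x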
Theorem.
Memoryless property of the geometric distribution
The geometric distribution satisfies the memoryless property, that is:

$$\mathbb{P}(X>a+b\mid X>a)=\mathbb{P}(X>b)$$

Where $a$ and $b$ are non-negative integers. Note that the geometric distribution is the only discrete probability distribution with the memoryless property.
Intuition. Suppose we keep tossing a coin until we observe our first heads. We know that if we let the random variable $X$ represent the trial at which the first heads occurs, then $X$ is a geometric random variable with success probability $p$. Suppose we are interested in the probability that we get heads for the first time after a particular trial, that is:
Now, suppose we have already observed a certain number of tails. We can update our probability to include this information:
We now use the formula for conditional probability:
Notice how the intersection of the two events is equal to the more restrictive event alone, because if the first heads occurs after the later trial, then it must also occur after the earlier trial. Therefore, we have that:
To calculate the two probabilities, we can use the geometric cumulative distribution functionlink that we derived earlier:

$$\mathbb{P}(X\le x)=1-(1-p)^{x}$$

Taking the complement on both sides:

$$\mathbb{P}(X>x)=(1-p)^{x}$$
Therefore, the numerator and the denominator of the conditional probability can be expressed as:
We can simplify this further using the complement expression again:
This means that the probability of observing a heads after a given trial, given that we have already observed a certain number of tails, is equal to the probability of starting over and observing the first heads after the remaining number of trials. This makes sense because the past outcomes (the tails we have already observed) do not affect subsequent outcomes, and hence we can forget about them and act as if we're starting a new coin-toss experiment with the remaining number of trials.
■
Proof. The proof of the memoryless property follows the same logic. Consider a geometric random variable $X$ with probability of success $p$. Recall from earlier that the probability of the first heads occurring after a given trial, given that we have already observed a certain number of tails, is:
Instead of using concrete numbers, we now work with arbitrary non-negative integers $a$ and $b$:

$$\mathbb{P}(X>a+b\mid X>a)=\frac{\mathbb{P}(X>a+b)}{\mathbb{P}(X>a)}=\frac{(1-p)^{a+b}}{(1-p)^{a}}=(1-p)^{b}=\mathbb{P}(X>b)$$
This completes the proof.
■
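Here is an informal numerical check of the memoryless property using SciPy's survival function geom.sf, which computes P(X > x); the values of p, a and b are arbitrary illustrative choices.

from scipy.stats import geom

p, a, b = 0.4, 3, 5   # illustrative values

# P(X > a + b | X > a) = P(X > a + b) / P(X > a)
conditional = geom.sf(a + b, p) / geom.sf(a, p)
print(conditional)      # 0.6^5 = 0.07776
print(geom.sf(b, p))    # P(X > b) = 0.6^5 = 0.07776, the same value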
Theorem.
Alternate parametrization of the geometric distribution
A discrete random variable $Y$ is also said to follow a geometric distribution with parameter $p$ if the probability mass function of $Y$ is:

$$p(y)=(1-p)^{y}\,p$$

Where $y=0,1,2,\ldots$ and $0<p\le1$.
Intuition and proof. We have introduced the geometric random variable $X$ as observing the first success at the $x$-th trial. The probability mass function of $X$ was derived to be:

$$p(x)=(1-p)^{x-1}\,p$$

Where $p$ is the probability of success and $x=1,2,3,\ldots$.
There exists an equivalent formulation of the geometric distribution where we let the random variable $Y$ represent the number of failures before the first success. The key is to notice that observing the first success at the $x$-th trial is logically equivalent to observing $x-1$ failures before the first success. For instance, observing the first success at, say, the 4th trial is the same as observing 3 failures before the first success:
Let's go the other way now - if we let the random variable $Y$ represent the number of failures before the first success, then we must observe the first success at the $(y+1)$-th trial, where $y$ is the observed number of failures. We know that $X$ follows a geometric distribution with probability mass function:
Let's simplify the left-hand side:
Next, we simplify the right-hand side:
Therefore, the probability mass function of $Y$ is:

$$p(y)=(1-p)^{y}\,p$$
Finally, since $Y$ represents the number of failures before the first success, $y$ can take on the values $0,1,2,\ldots$. This is slightly different from the values that $x$ can take on in the original definition of the geometric distribution, which were $1,2,3,\ldots$. This completes the proof.
■
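As an implementation-side note, SciPy's geom uses the original trial-count parametrization of $X$; a minimal sketch of how to evaluate the failure-count version for $Y$ is shown below, where the variable names are our own.

from scipy.stats import geom

p = 0.4
y = 2   # number of failures before the first success

print((1 - p) ** y * p)    # P(Y = y) = (1 - p)^y * p = 0.144
print(geom.pmf(y + 1, p))  # equivalently, P(X = y + 1) under the trial-count parametrization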
Working with the geometric distribution using Python
Computing probabilities
Consider the example from earlier:
Suppose we randomly draw with replacement from a bag containing 3 red balls and 2 green balls until a green ball is drawn. What is the probability of drawing a green ball at the 3rd trial?
If we define the random variable $X$ as the number of trials needed to observe the first green ball, then $X\sim\text{Geometric}(2/5)$. We then use the geometric probability mass function to compute the probability of $X=3$:

$$p(3)=\left(\frac{3}{5}\right)^{2}\left(\frac{2}{5}\right)=0.144$$
Instead of computing the probability by hand, we can use Python's SciPy library:
from scipy.stats import geom

x = 3        # trial at which the first green ball is drawn
p = 2 / 5    # probability of drawing a green ball
geom.pmf(x, p)   # 0.144
Notice that the computed result is identical to the hand-calculated result.
Drawing probability mass function
Suppose we wanted to draw the probability mass function of our random variable $X$:
We can call the geom.pmf(~) function on a list of positive integers:
import matplotlib.pyplot as plt
from scipy.stats import geom

p = 2 / 5   # probability of drawing a green ball
n = 15      # number of trials to show on the graph
xs = list(range(1, n + 1))       # [1, 2, ..., 15]
pmfs = geom.pmf(xs, p)           # P(X = x) for each x in xs
str_xs = [str(x) for x in xs]    # convert list of integers into list of string labels
plt.bar(str_xs, pmfs)
plt.xlabel('$x$')
plt.ylabel('$p(x)$')
plt.show()
This generates the following plot:
Note that we converted the list of integers into a list of string labels, otherwise the $x$-axis would contain decimals: