# Comprehensive Guide on Geometric Distribution

Aug 12, 2023


Before we formally define the geometric distribution, let's go through a motivating example.

# Motivating example

Suppose we repeatedly toss an unfair coin with the following probabilities:

$$\mathbb{P}(\mathrm{H})=0.2,\qquad\mathbb{P}(\mathrm{T})=0.8$$

Where $\mathrm{H}$ and $\mathrm{T}$ represent heads and tails respectively.

Let the random variable $X$ denote the trial at which heads occurs for the first time. To obtain heads for the very first time at the $X$-th trial, we must have obtained $X-1$ tails before that. For instance, suppose we are interested in the probability of obtaining a heads for the first time at the 3rd trial. This means that the outcome of our tosses has to be:

$$\mathrm{T},\;\mathrm{T},\;\mathrm{H}$$

The probability of this specific outcome is:

$$\mathbb{P}(\mathrm{T})\cdot\mathbb{P}(\mathrm{T})\cdot\mathbb{P}(\mathrm{H})=(0.8)^2(0.2)=0.128$$

What would the probability of obtaining a heads for the first time at the 5th trial be? In this case, the outcome of the tosses must be:

$$\mathrm{T},\;\mathrm{T},\;\mathrm{T},\;\mathrm{T},\;\mathrm{H}$$

The probability that this specific outcome occurs is:

$$(0.8)^4(0.2)\approx0.082$$

Hopefully you can see that, in general, the probability of obtaining a heads for the first time at the $x$-th trial is given by:

$$\mathbb{P}(X=x)=(0.8)^{x-1}(0.2)$$

Let's generalize further - instead of heads and tails, let's denote a heads as a success and a tails as a failure. If the probability of success is $p$, then the probability of failure is $1-p$. Therefore, the probability of observing a success for the first time at the $x$-th trial is given by:

$$\mathbb{P}(X=x)=(1-p)^{x-1}p$$

Random variables with this specific probability mass function are said to follow the geometric distribution.
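As a quick numerical sanity check of this formula (our own sketch, not part of the original example - the helper `first_heads_trial` is a name we made up), we can simulate the unfair coin and estimate the probability that the first heads lands on the 3rd toss, which should be close to $(0.8)^2(0.2)=0.128$:

```python
import random

random.seed(42)

def first_heads_trial(p_heads):
    """Toss a biased coin until heads appears; return the trial number."""
    trial = 1
    while random.random() >= p_heads:  # random() < p_heads counts as heads
        trial += 1
    return trial

p_heads = 0.2
n_sims = 100_000
hits = sum(first_heads_trial(p_heads) == 3 for _ in range(n_sims))
estimate = hits / n_sims

exact = (1 - p_heads) ** 2 * p_heads  # (0.8)^2 * 0.2 = 0.128
print(exact, estimate)
```

With 100,000 simulations, the empirical estimate typically lands within about $0.01$ of the exact value.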

## Assumptions of the geometric distribution

While deriving the geometric distribution, we made the following implicit assumptions:

- the probability of success is constant for every trial. For our example, the probability of heads is $0.2$ for each coin toss.

- the trials are independent. For our example, the outcome of a coin toss does not affect the outcome of the next coin toss.

- the outcome of each trial is binary. For our example, the outcome of a coin toss is either heads or tails.

Statistical experiments that satisfy these conditions are called repeated Bernoulli trials.

# Geometric distribution

A discrete random variable $X$ is said to follow a geometric distribution with parameter $p$ if and only if the probability mass function of $X$ is:

$$p(x)=(1-p)^{x-1}p,\qquad x=1,2,3,\cdots$$

Where $0\le{p}\le{1}$. If a random variable $X$ follows a geometric distribution with parameter $p$, then we can write $X\sim\text{Geom}(p)$.

## Drawing balls from a bag

Suppose we randomly draw with replacement from a bag containing 3 red balls and 2 green balls until a green ball is drawn. Answer the following questions:

1. What is the probability of drawing a green ball at the 3rd trial?

2. What is the probability of drawing a green ball at or before the 3rd trial?

Solution. Let's start by confirming that this experiment consists of repeated Bernoulli trials:

- the probability of success, that is, drawing a green ball, is constant ($p=2/5$).

- the trials are independent because we are drawing with replacement.

- the outcome is binary - we either draw a green ball or we don't.

Let $X$ be a geometric random variable representing the trial at which we first draw a green ball. The probability of success, that is, the probability of drawing a green ball at each trial, is $p=2/5$. Therefore, the geometric probability mass function is:

$$p(x)=\left(\frac{3}{5}\right)^{x-1}\left(\frac{2}{5}\right)$$

The probability of drawing a green ball at the 3rd trial is:

$$\mathbb{P}(X=3)=\left(\frac{3}{5}\right)^{2}\left(\frac{2}{5}\right)=\frac{18}{125}=0.144$$

The probability of drawing a green ball at or before the 3rd trial is:

$$\mathbb{P}(X\le3)=p(1)+p(2)+p(3)=\frac{2}{5}+\frac{6}{25}+\frac{18}{125}=\frac{98}{125}=0.784$$
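Both answers can be reproduced directly from the probability mass function. Here is a minimal check in plain Python (our own sketch, not part of the original solution):

```python
p = 2 / 5  # probability of drawing a green ball

# P(X = 3): two reds (failures) followed by a green (success)
p_x3 = (1 - p) ** 2 * p

# P(X <= 3): sum the pmf over x = 1, 2, 3
p_le3 = sum((1 - p) ** (x - 1) * p for x in range(1, 4))

print(p_x3)   # ~0.144
print(p_le3)  # ~0.784
```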

Finally, let's graph our geometric probability mass function:

We can see that $\mathbb{P}(X=3)$ is indeed approximately $0.144$. Note that we've truncated the graph at $x=10$, but $x$ can be any positive integer.

# Properties of geometric distribution

## Expected value of a geometric random variable

If $X$ follows a geometric distribution with parameter $p$, then the expected value of $X$ is given by:

$$\mathbb{E}(X)=\frac{1}{p}$$

Proof. Let $X\sim\mathrm{Geom}(p)$. By the definition of expected values, we have that:

$$\mathbb{E}(X)=\sum_{x=1}^{\infty}x\cdot(1-p)^{x-1}p=p\sum_{x=1}^{\infty}x(1-p)^{x-1}$$

We now want to compute the summation:

$$S=\sum_{x=1}^{\infty}x(1-p)^{x-1}=1+2(1-p)+3(1-p)^2+4(1-p)^3+\cdots$$

Let's rewrite each term $x(1-p)^{x-1}$ as $x$ copies of $(1-p)^{x-1}$ - the trick is to take the summation vertically:

$$\begin{aligned}
S&=1+(1-p)+(1-p)^2+(1-p)^3+\cdots\\
&\phantom{=1}+(1-p)+(1-p)^2+(1-p)^3+\cdots\\
&\phantom{=1+(1-p)}+(1-p)^2+(1-p)^3+\cdots\\
&\phantom{=1+(1-p)+(1-p)^2}+\cdots
\end{aligned}$$

Notice that each row is an infinite geometric series with common ratio $(1-p)$. The only difference between them is the starting value. Because $p$ is a probability, we have that $0\lt{p}\lt1$. This also means that $0\lt{1-p}\lt1$. In our guide on geometric series, we have shown that all infinite geometric series converge to the following sum when the common ratio is between $-1$ and $1$:

$$\sum_{k=0}^{\infty}ar^k=\frac{a}{1-r}$$

Where $a$ is the starting value of the series and $r$ is the common ratio. Therefore, in our case, the $k$-th row, whose starting value is $(1-p)^{k-1}$, sums to:

$$\frac{(1-p)^{k-1}}{1-(1-p)}=\frac{(1-p)^{k-1}}{p}$$

The sum $S$ can therefore be written as:

$$S=\sum_{k=1}^{\infty}\frac{(1-p)^{k-1}}{p}=\frac{1}{p}\sum_{k=1}^{\infty}(1-p)^{k-1}$$

Once again, we end up with yet another infinite geometric series with starting value $1/p$ and common ratio $(1-p)$. Using the formula for the sum of an infinite geometric series once again gives:

$$S=\sum_{x=1}^{\infty}x(1-p)^{x-1}=\frac{1/p}{1-(1-p)}=\frac{1}{p^2}$$

Substituting this back into our expression for the expected value gives:

$$\mathbb{E}(X)=p\cdot\frac{1}{p^2}=\frac{1}{p}$$

This completes the proof.
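As a numerical sanity check of $\mathbb{E}(X)=1/p$ (our own sketch, separate from the proof - the helper `sample_geometric` is a name we made up), we can average many simulated geometric draws; with $p=0.4$ the sample mean should hover near $1/0.4=2.5$:

```python
import random

random.seed(0)

def sample_geometric(p):
    """Return the trial number of the first success."""
    trials = 1
    while random.random() >= p:
        trials += 1
    return trials

p = 0.4
n_sims = 200_000
sample_mean = sum(sample_geometric(p) for _ in range(n_sims)) / n_sims
print(sample_mean)  # should be close to 1/p = 2.5
```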

## Variance of a geometric random variable

If $X$ follows a geometric distribution with parameter $p$, then the variance of $X$ is given by:

$$\mathbb{V}(X)=\frac{1-p}{p^2}$$

Proof. We know from the property of variance that:

$$\mathbb{V}(X)=\mathbb{E}(X^2)-\big(\mathbb{E}(X)\big)^2$$

We already know what $\mathbb{E}(X)$ is from earlier, so we have that:

$$\mathbb{V}(X)=\mathbb{E}(X^2)-\frac{1}{p^2}$$

We now need to derive the expression for $\mathbb{E}(X^2)$. From the definition of expected values, we have that:

$$\mathbb{E}(X^2)=\sum_{x=1}^{\infty}x^2(1-p)^{x-1}p$$

Now, notice how we can rewrite $x^2$ as $x(x+1)-x$ to split the summation:

$$\mathbb{E}(X^2)=p\sum_{x=1}^{\infty}x(x+1)(1-p)^{x-1}-p\sum_{x=1}^{\infty}x(1-p)^{x-1}$$

Let's now use the properties of geometric series to find an expression for the first summation. We define a new variable $k$ such that $k=x+1$, which also means that $x=k-1$. Rewriting the first summation in terms of $k$ gives:

$$\sum_{x=1}^{\infty}x(x+1)(1-p)^{x-1}=\sum_{k=2}^{\infty}k(k-1)(1-p)^{k-2}=\sum_{k=1}^{\infty}k(k-1)(1-p)^{k-2}$$

Note that including the $k=1$ term changes nothing because that term is zero. Now, recall that we derived the following lemma when proving the expected value earlier:

$$\sum_{x=1}^{\infty}x(1-p)^{x-1}=\frac{1}{p^2}$$

Since the summation symbol itself doesn't matter, we also have:

$$\sum_{k=1}^{\infty}k(1-p)^{k-1}=\frac{1}{p^2}$$

Taking the derivative of both sides with respect to $p$ using the power rule of differentiation gives:

$$-\sum_{k=1}^{\infty}k(k-1)(1-p)^{k-2}=-\frac{2}{p^3}
\;\;\implies\;\;
\sum_{k=1}^{\infty}k(k-1)(1-p)^{k-2}=\frac{2}{p^3}$$

Substituting the two summations back into our expression for $\mathbb{E}(X^2)$ gives:

$$\mathbb{E}(X^2)=p\cdot\frac{2}{p^3}-p\cdot\frac{1}{p^2}=\frac{2}{p^2}-\frac{1}{p}$$

Finally, substituting this into the variance formula gives:

$$\mathbb{V}(X)=\frac{2}{p^2}-\frac{1}{p}-\frac{1}{p^2}=\frac{1}{p^2}-\frac{1}{p}=\frac{1-p}{p^2}$$

This completes the proof.
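The variance formula can also be sanity-checked by simulation (our own sketch, separate from the proof): with $p=0.4$, the sample variance should be close to $(1-0.4)/0.4^2=3.75$:

```python
import random

random.seed(123)

def sample_geometric(p):
    """Return the trial number of the first success."""
    trials = 1
    while random.random() >= p:
        trials += 1
    return trials

p = 0.4
n = 200_000
samples = [sample_geometric(p) for _ in range(n)]
mean = sum(samples) / n
variance = sum((s - mean) ** 2 for s in samples) / n

print(variance)  # should be close to (1 - p) / p**2 = 3.75
```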

## Cumulative distribution function of the geometric distribution

The cumulative distribution function of the geometric distribution with success probability $p$ is given by:

$$F(x)=\mathbb{P}(X\le x)=1-(1-p)^x,\qquad x=1,2,3,\cdots$$

Proof. We use the definitions of the cumulative distribution function and the geometric distribution:

$$\mathbb{P}(X\le x)=\sum_{k=1}^{x}(1-p)^{k-1}p$$

Notice that this is a finite geometric series with starting value $p$ and common ratio $(1-p)$. Using the formula for the sum of a finite geometric series, we have that:

$$\mathbb{P}(X\le x)=\frac{p\big(1-(1-p)^x\big)}{1-(1-p)}=\frac{p\big(1-(1-p)^x\big)}{p}=1-(1-p)^x$$

This completes the proof.
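We can confirm the closed form $1-(1-p)^x$ against a direct term-by-term summation of the pmf (a small check of ours, with hypothetical helper names):

```python
p = 2 / 5

def geom_cdf_closed(x, p):
    """Closed-form CDF: P(X <= x) = 1 - (1 - p)^x."""
    return 1 - (1 - p) ** x

def geom_cdf_sum(x, p):
    """CDF computed by summing the pmf term by term."""
    return sum((1 - p) ** (k - 1) * p for k in range(1, x + 1))

for x in range(1, 11):
    assert abs(geom_cdf_closed(x, p) - geom_cdf_sum(x, p)) < 1e-12

print(geom_cdf_closed(3, p))  # ~0.784, matching the earlier ball-drawing example
```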

## Memoryless property of the geometric distribution

The geometric distribution satisfies the memoryless property, that is:

$$\mathbb{P}(X\gt m+n\mid X\gt n)=\mathbb{P}(X\gt m)$$

Where $m$ and $n$ are non-negative integers. Note that the geometric distribution is the only discrete probability distribution with the memoryless property.

Intuition. Suppose we keep tossing a coin until we observe our first heads. We know that if we let random variable $X$ represent the trial at which the first heads occurs, then $X$ is a geometric random variable with success probability $p$. Suppose we are interested in the probability that we get heads for the first time after trial $5$, that is:

$$\mathbb{P}(X\gt5)$$

Now, suppose we have already observed $2$ tails. We can update our probability to include this information:

$$\mathbb{P}(X\gt5\mid X\gt2)$$

We now use the formula for conditional probability:

$$\mathbb{P}(X\gt5\mid X\gt2)=\frac{\mathbb{P}(X\gt5\;\text{and}\;X\gt2)}{\mathbb{P}(X\gt2)}$$

Notice how $\mathbb{P}(X\gt5\;\text{and}\;X\gt2)$ is equal to $\mathbb{P}(X\gt5)$ because when $X\gt5$, then $X\gt2$ is always true. Therefore, we have that:

$$\mathbb{P}(X\gt5\mid X\gt2)=\frac{\mathbb{P}(X\gt5)}{\mathbb{P}(X\gt2)}$$

To calculate the two probabilities, we can use the geometric cumulative distribution function that we derived earlier:

$$\mathbb{P}(X\le x)=1-(1-p)^x$$

Taking the complement on both sides:

$$\mathbb{P}(X\gt x)=(1-p)^x$$

Therefore, $\mathbb{P}(X\gt5)$ and $\mathbb{P}(X\gt2)$ can be expressed as:

$$\mathbb{P}(X\gt5\mid X\gt2)=\frac{(1-p)^5}{(1-p)^2}=(1-p)^3$$

We can simplify this further using the complement identity again:

$$\mathbb{P}(X\gt5\mid X\gt2)=(1-p)^3=\mathbb{P}(X\gt3)$$

This means that the probability of observing a heads after the $5$-th trial given that we have already observed $2$ tails is equal to the probability of starting over and observing the first heads after $3$ trials. This makes sense because the past outcomes ($2$ tails in this case) do not affect subsequent outcomes, and hence we can forget about them and act as if we're starting a new coin-toss experiment with the remaining number of trials ($3$ in this case).

Proof. The proof of the memoryless property follows the same logic. Consider a geometric random variable $X$ with probability of success $p$. Instead of using the concrete numbers from the intuition above, we replace $5$ with $m+n$ and $2$ with $n$, and use $\mathbb{P}(X\gt x)=(1-p)^x$:

$$\mathbb{P}(X\gt m+n\mid X\gt n)=\frac{\mathbb{P}(X\gt m+n)}{\mathbb{P}(X\gt n)}=\frac{(1-p)^{m+n}}{(1-p)^{n}}=(1-p)^{m}=\mathbb{P}(X\gt m)$$

This completes the proof.
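Numerically, the memoryless property falls straight out of the tail formula $\mathbb{P}(X\gt x)=(1-p)^x$. Here is a tiny check of ours (the values of $p$, $m$, and $n$ are arbitrary choices, and `tail_prob` is a name we made up):

```python
p = 0.2  # probability of success on each trial

def tail_prob(x, p):
    """P(X > x) = (1 - p)^x for a geometric random variable."""
    return (1 - p) ** x

for m in range(5):
    for n in range(5):
        lhs = tail_prob(m + n, p) / tail_prob(n, p)  # P(X > m+n | X > n)
        rhs = tail_prob(m, p)                        # P(X > m)
        assert abs(lhs - rhs) < 1e-12

print("memoryless property verified numerically")
```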

# Alternate parametrization of the geometric distribution

A discrete random variable $X$ is also said to follow a geometric distribution with parameter $p$ if the probability mass function of $X$ is:

$$p(x)=(1-p)^{x}p,\qquad x=0,1,2,\cdots$$

Where $0\le{p}\le{1}$.

Intuition and proof. We have introduced the geometric random variable $X$ as observing the first success at the $X$-th trial. The probability mass function of $X$ was derived to be:

$$\mathbb{P}(X=x)=(1-p)^{x-1}p$$

Where $p$ is the probability of success and $x=1,2,3,\cdots$.

There exists an equivalent formulation of the geometric distribution where we let random variable $X$ represent the number of failures before the first success. The key is to notice that observing the first success at the $X$-th trial is logically equivalent to observing $X-1$ failures before the first success. For instance, observing the first success at the $5$-th trial is the same as observing $5-1=4$ failures before the first success:

$$\underbrace{\mathrm{F},\;\mathrm{F},\;\mathrm{F},\;\mathrm{F}}_{4\text{ failures}},\;\mathrm{S}$$

Let's go the other way now - if we let random variable $X$ represent the number of failures before the first success, then we must observe the first success at the $(X+1)^\text{th}$ trial. We know that $X+1$ follows a geometric distribution with probability mass function:

$$\mathbb{P}(X+1=x)=(1-p)^{x-1}p,\qquad x=1,2,3,\cdots$$

Let's simplify the left-hand side:

$$\mathbb{P}(X+1=x)=\mathbb{P}(X=x-1)$$

Therefore, the probability mass function becomes:

$$\mathbb{P}(X=x-1)=(1-p)^{x-1}p$$

Next, substituting $k=x-1$ on both sides gives:

$$\mathbb{P}(X=k)=(1-p)^{k}p$$

Finally, since $X$ represents the number of failures before the first success, $X$ can take on the values $0,1,2,\cdots$. This is slightly different from the original definition of the geometric distribution, where $X$ takes on the values $1,2,3,\cdots$. This completes the proof.
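The equivalence of the two parametrizations is easy to verify numerically. In this sketch of ours, `pmf_trials` and `pmf_failures` are hypothetical helper names: the probability of $k$ failures before the first success should equal the probability of the first success landing on trial $k+1$:

```python
p = 2 / 5

def pmf_trials(x, p):
    """Original parametrization: P(first success at trial x), x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def pmf_failures(k, p):
    """Alternate parametrization: P(k failures before first success), k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

for k in range(10):
    assert abs(pmf_failures(k, p) - pmf_trials(k + 1, p)) < 1e-12

print("the two parametrizations agree")
```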

# Working with geometric distribution using Python

## Computing probabilities

Consider the example from earlier:

Suppose we randomly draw with replacement from a bag containing 3 red balls and 2 green balls until a green ball is drawn. What is the probability of drawing a green ball at the 3rd trial?

If we define random variable $X$ as the trial at which we draw the first green ball, then $X\sim\text{Geom}(2/5)$. We then use the geometric probability mass function to compute the probability of $X=3$:

$$\mathbb{P}(X=3)=\left(\frac{3}{5}\right)^{2}\left(\frac{2}{5}\right)=0.144$$

Instead of computing the probability by hand, we can use Python's SciPy library:

```
from scipy.stats import geom

x = 3
p = 2/5
geom.pmf(x, p)
0.144
```

Notice that the computed result is identical to the hand-calculated result.

## Drawing probability mass function

Suppose we wanted to draw the probability mass function of $X\sim\text{Geom}(2/5)$:

We can call the `geom.pmf(~)` function on a list of positive integers:

```
import matplotlib.pyplot as plt
from scipy.stats import geom

p = 2/5
n = 15
xs = list(range(1, n+1))  # [1,2,...,15]
pmfs = geom.pmf(xs, p)
# Convert list of integers into list of string labels
str_xs = [str(x) for x in xs]
plt.bar(str_xs, pmfs)
plt.xlabel('$x$')
plt.ylabel('$p(x)$')
plt.show()
```

This generates the following plot:

Note that we converted the list of integers into a list of string labels; otherwise, the $x$-axis would contain decimals.