
# Comprehensive Guide on Binomial Distribution

Mar 5, 2023


Before we dive into the formal definition of the binomial distribution, we will first go through a simple motivating example and try to derive the binomial distribution ourselves.

# Motivating example of the binomial distribution

Suppose we toss an unfair coin three times. The probability that the coin lands on heads is $0.8$. Compute the probabilities of tossing $0$, $1$, $2$ and $3$ heads.

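Solution. Let's define the random variable $X$ as the number of heads obtained from the three tosses. The probabilities of tossing $0$, $1$, $2$ and $3$ heads are:

$$
\begin{align*}
\mathbb{P}(X=0)&=(0.2)^3=0.008\\
\mathbb{P}(X=1)&=3\cdot(0.8)^1(0.2)^2=0.096\\
\mathbb{P}(X=2)&=3\cdot(0.8)^2(0.2)^1=0.384\\
\mathbb{P}(X=3)&=(0.8)^3=0.512
\end{align*}
$$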

Let's understand how these probabilities were calculated. $\mathbb{P}(X=0)$ and $\mathbb{P}(X=3)$ are straightforward because there is only one way to obtain $3$ tails or $3$ heads from $3$ tosses. Let's now focus on $\mathbb{P}(X=1)$, which is the probability of obtaining exactly $1$ head. There are three ways to obtain $1$ head:

$$\text{HTT},\qquad\text{THT},\qquad\text{TTH}$$

The key here is to realize that each of these outcomes occurs with the same probability:

$$\mathbb{P}(\text{HTT})=\mathbb{P}(\text{THT})=\mathbb{P}(\text{TTH})=(0.8)(0.2)^2=0.032$$

Therefore, instead of computing all of these probabilities, we can simply compute one probability and multiply it by the number of ways to obtain the outcome. In this case, the number of ways of obtaining $1$ head out of $3$ tosses is:

$$\binom{3}{1}=\frac{3!}{1!\,2!}=3$$

If you are unfamiliar with the concept of combinations, please refer to our comprehensive guide here.

Similarly, for $\mathbb{P}(X=2)$, we had to multiply the probability of a single outcome by the number of ways of getting $2$ heads out of $3$ tosses:

$$\binom{3}{2}=\frac{3!}{2!\,1!}=3$$

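Let's now go back to our four probabilities and express the leading multiples using combinations:

$$
\begin{align*}
\mathbb{P}(X=0)&=\binom{3}{0}(0.8)^0(0.2)^3\\
\mathbb{P}(X=1)&=\binom{3}{1}(0.8)^1(0.2)^2\\
\mathbb{P}(X=2)&=\binom{3}{2}(0.8)^2(0.2)^1\\
\mathbb{P}(X=3)&=\binom{3}{3}(0.8)^3(0.2)^0
\end{align*}
$$

Can you see that there is a pattern here? In fact, we can compute each probability like so:

$$\mathbb{P}(X=x)=\binom{3}{x}(0.8)^x(0.2)^{3-x},\qquad x=0,1,2,3$$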

Here:

- $3$ is the number of trials.
- $0.8$ is the probability of success.
- $0.2$ is the probability of failure.
- $x$ is the number of successes we observe.

We can generalize this formula for $\mathbb{P}(X=x)$ by replacing:

- the number of trials ($3$) with $n$.
- the probability of success ($0.8$) with $p$.
- the probability of failure ($0.2$) with $1-p$.

Doing so will give us the famous binomial distribution below.
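The counting argument from the motivating example can be checked by brute force. The following is a minimal sketch, not part of the original guide, that enumerates all $2^3$ head/tail sequences and compares the result with the combination shortcut. The variable names (`p_heads`, `n_tosses`, `brute_force`, `shortcut`) are illustrative choices.

```python
from itertools import product
from math import comb

p_heads = 0.8   # probability of heads in the example
n_tosses = 3    # number of tosses in the example

# Enumerate every sequence of heads (1) and tails (0), and add the
# probability of each sequence to the bucket for its number of heads.
brute_force = [0.0] * (n_tosses + 1)
for outcome in product([0, 1], repeat=n_tosses):
    heads = sum(outcome)
    brute_force[heads] += p_heads**heads * (1 - p_heads)**(n_tosses - heads)

# Shortcut: (number of ways) * (probability of any one such sequence)
shortcut = [
    comb(n_tosses, x) * p_heads**x * (1 - p_heads)**(n_tosses - x)
    for x in range(n_tosses + 1)
]

for x in range(n_tosses + 1):
    print(x, round(brute_force[x], 3), round(shortcut[x], 3))
```

Both columns should agree for every $x$, which is the point of the shortcut: sequences with the same number of heads all have the same probability, so we only need to count them.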

# Probability mass function of binomial distribution

A random variable $X$ is said to follow a binomial distribution if and only if the probability mass function of $X$ is:

$$p(x)=\mathbb{P}(X=x)=\binom{n}{x}p^x(1-p)^{n-x},\qquad x=0,1,2,\cdots,n$$

where $n$ and $p$ are the parameters of the distribution, defined as:

- $n$ is the number of trials.
- $p$ is the probability of success.

Note that we often use the notation $X\sim\text{Bin}(n,p)$ to denote that random variable $X$ follows a binomial distribution with parameters $n$ and $p$.
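As a rough illustration, the probability mass function can be written directly with Python's standard library. The helper name `binomial_pmf` and the parameter values below are assumptions made for this sketch, not something defined in the guide.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p); zero outside x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Illustrative parameters (arbitrary choice for this sketch)
n, p = 10, 0.3
pmf_values = [binomial_pmf(x, n, p) for x in range(n + 1)]

# A valid probability mass function sums to one (up to rounding error)
total = sum(pmf_values)
print(total)
```

Summing the values over $x=0,1,\cdots,n$ is a quick sanity check that the formula defines a proper distribution.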

# Assumptions of binomial distribution

From our motivating example, it should be clear that the following conditions must be satisfied for the binomial distribution to be appropriate:

- for each trial, there are only $2$ outcomes - either success or failure.
- the number of trials $n$ is fixed.
- each trial has the same probability of success $p$. For instance, each time we toss a coin, the probability of getting a heads should always be the same.
- each trial is independent, which means that the result of one trial does not affect the result of another. For instance, obtaining a heads in the current trial should not affect the probability of the outcome of the next coin trial.

Whenever these assumptions are satisfied, the experiment is called a binomial experiment. A single trial of a binomial experiment is called a Bernoulli trial. We will later look at Bernoulli random variables, which are special cases of the binomial random variables with number of trials $n=1$.
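A binomial experiment can also be simulated as a fixed number of independent Bernoulli trials. The sketch below is only an illustration under the assumptions listed above: the function name `simulate_binomial`, the seed and the number of runs are arbitrary choices, and the parameters reuse the unfair coin ($n=3$, $p=0.8$) from the motivating example.

```python
import random

def simulate_binomial(n: int, p: float, rng: random.Random) -> int:
    """Run n independent Bernoulli trials and return the number of successes."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)            # fixed seed so the run is repeatable
n, p, runs = 3, 0.8, 100_000      # unfair coin from the motivating example

counts = [0] * (n + 1)
for _ in range(runs):
    counts[simulate_binomial(n, p, rng)] += 1

# Relative frequencies of observing 0, 1, 2 and 3 heads
frequencies = [c / runs for c in counts]
print(frequencies)
```

With this many runs, the relative frequencies should land close to the exact probabilities $0.008$, $0.096$, $0.384$ and $0.512$, though they will not match exactly.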

# Rolling a dice

Suppose we roll a fair dice $3$ times. What is the probability of rolling a $4$ once?

Solution. Let's define the random variable $X$ as the number of $4$s we roll. Let's check whether or not $X$ is binomially distributed:

- each trial has only $2$ outcomes - either a $4$ or not a $4$. You may think that a dice roll has $6$ outcomes, but we are only concerned with the binary outcome of either $4$ or not $4$.
- the number of trials is fixed at $n=3$.
- each trial has the same probability of success, $p=1/6$.
- the trials are independent since the outcome of the current roll does not affect the outcome of the next roll.

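Since the conditions of a binomial experiment are satisfied, $X$ is indeed a binomial random variable. Specifically, $X\sim\text{Bin}(3,1/6)$, which means that the probability mass function of $X$ is:

$$p(x)=\binom{3}{x}\Big(\frac{1}{6}\Big)^x\Big(\frac{5}{6}\Big)^{3-x},\qquad x=0,1,2,3$$

The probability of rolling a $4$ once is therefore:

$$\mathbb{P}(X=1)=\binom{3}{1}\Big(\frac{1}{6}\Big)^1\Big(\frac{5}{6}\Big)^{2}=\frac{75}{216}=\frac{25}{72}\approx0.347$$

We show the probability mass function of $X$ below: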

We can see that the probability is indeed roughly $0.35$ for $x=1$. Intuitively, we should on average expect to roll a $4$ once every six rolls. Because we have only rolled $n=3$ times, it makes sense that the probability that we don't roll a $4$ at all is the highest! Later on, we will dive deeper into what the expected value of a binomial random variable should be!
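To make the values behind this discussion concrete, here is a small sketch that evaluates the probability mass function of $X\sim\text{Bin}(3,1/6)$ exactly using fractions. It relies only on the standard library, and the dictionary name `pmf` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

n = 3
p = Fraction(1, 6)   # probability of rolling a 4 with a fair dice

# Exact pmf values of X ~ Bin(3, 1/6) for x = 0, 1, 2, 3
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

for x, prob in pmf.items():
    print(x, prob, round(float(prob), 4))
```

The largest value belongs to $x=0$ ($125/216\approx0.579$), followed by $x=1$ ($25/72\approx0.347$), which matches the reasoning that not rolling a $4$ at all is the most likely outcome in three rolls.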

# Case when binomial distribution does not apply

Suppose we have a bag that contains 3 red balls and 1 green ball. We randomly draw two balls in succession **without replacement**. If we let random variable $X$ denote the number of red balls we draw, does $X$ follow the binomial distribution?

Solution. Since we are not putting the ball back in after every trial, the probability of success (getting a red ball) changes. For instance, suppose we wanted to compute the probability of drawing two red balls. The probability of getting a red ball in the first draw is $3/4$, and the probability of getting a red ball in the second draw is $2/3$ since we've taken a red ball out after the first draw. Therefore, the probability of drawing two red balls is:

$$\mathbb{P}(X=2)=\frac{3}{4}\cdot\frac{2}{3}=\frac{1}{2}$$

This does not align with our formula for the binomial distribution. In particular, this scenario violates the assumption of the binomial distribution that each trial is independent. Note that if the balls are drawn with replacement, that is, we put the ball back into the bag after each draw, then $X$ will follow a binomial distribution.
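The gap between the two sampling schemes can be verified by enumeration. The following sketch is an illustration only: it lists every ordered pair of balls drawn without replacement and compares the probability of two reds with the value a binomial model with $n=2$ and $p=3/4$ would give. The names `bag`, `draws` and the two result variables are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

bag = ["red", "red", "red", "green"]

# Without replacement: every ordered pair of distinct balls is equally
# likely, giving 4 * 3 = 12 possible draws.
draws = list(permutations(range(len(bag)), 2))
two_reds = sum(1 for i, j in draws if bag[i] == "red" and bag[j] == "red")
p_without_replacement = Fraction(two_reds, len(draws))

# With replacement: X ~ Bin(2, 3/4), so P(X = 2) = (3/4)^2
p_with_replacement = Fraction(3, 4) ** 2

print(p_without_replacement, p_with_replacement)
```

The enumeration gives $1/2$ without replacement, whereas the binomial formula gives $9/16$, so the binomial model overstates the probability when the draws are dependent.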

The Bernoulli distribution is a special case of the binomial distribution where the number of trials $n$ is one. For instance, the outcome of a single coin toss can be represented by a Bernoulli random variable with a probability of heads $p$.

Let's briefly discuss the properties of Bernoulli random variables as they will come in handy when deriving properties of binomial random variables.

# Bernoulli random variables

A random variable $X$ is said to be a Bernoulli random variable with parameter $p$ if and only if the probability mass function of $X$ is:

$$p(x)=p^x(1-p)^{1-x},\qquad x\in\{0,1\}$$

where $0\lt{p}\lt1$. A Bernoulli random variable is denoted as $X\sim\text{Ber}(p)$.

Intuition. The outcome of a Bernoulli random variable is either $0$ (failure) or $1$ (success). The outcome $0$ occurs with a probability of $1-p$ and the outcome $1$ occurs with a probability of $p$.

Here are two examples of Bernoulli random variables:

- suppose we toss a fair coin once. If $X$ is a random variable such that $X=1$ if heads (success) and $X=0$ if tails (failure), then $X\sim\text{Ber}(1/2)$.
- suppose we roll a fair dice once. If $X$ is a random variable such that $X=1$ if the outcome is one or two (success) and $X=0$ otherwise (failure), then $X\sim\text{Ber}(1/3)$ because the probability of success is $2/6=1/3$.

The probability mass functions of the two examples are as follows:

*Figure: probability mass functions of $X\sim\text{Ber}(1/2)$ (left) and $X\sim\text{Ber}(1/3)$ (right).*

Note that if we assume each bar has a width of one, the total area of the bars adds up to one!

## Expected value and variance of Bernoulli random variables

If $X$ is a Bernoulli random variable with a probability of success $p$, then the expected value and variance of $X$ are:

$$
\begin{align*}
\mathbb{E}(X)&=p\\
\mathbb{V}(X)&=p(1-p)
\end{align*}
$$

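Proof. Let's derive $\mathbb{E}(X)$ first. Using the definition of expected values, we have that:

$$\mathbb{E}(X)=\sum_{x\in\{0,1\}}x\cdot{p(x)}=(0)(1-p)+(1)(p)=p$$

To derive the variance $\mathbb{V}(X)$, let's use the following property:

$$\mathbb{V}(X)=\mathbb{E}(X^2)-\big[\mathbb{E}(X)\big]^2$$

We already know what $\mathbb{E}(X)$ is, so let's derive $\mathbb{E}(X^2)$. Using the definition of expected values once more, we have that:

$$\mathbb{E}(X^2)=\sum_{x\in\{0,1\}}x^2\cdot{p(x)}=(0)^2(1-p)+(1)^2(p)=p$$

Substituting $\mathbb{E}(X)$ and $\mathbb{E}(X^2)$ into the variance property gives us the variance:

$$\mathbb{V}(X)=p-p^2=p(1-p)$$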

This completes the proof.
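As a quick numerical check of this result, the sketch below computes the expected value and variance of a Bernoulli random variable directly from the definitions, using exact fractions. The function name `bernoulli_moments` and the choice $p=1/3$ are assumptions made for illustration.

```python
from fractions import Fraction

def bernoulli_moments(p: Fraction):
    """Return (E[X], V[X]) for X ~ Ber(p), computed from the definitions."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(x * prob for x, prob in pmf.items())
    second_moment = sum(x**2 * prob for x, prob in pmf.items())
    # V(X) = E(X^2) - [E(X)]^2
    return mean, second_moment - mean**2

p = Fraction(1, 3)
mean, variance = bernoulli_moments(p)
print(mean, variance)
```

For $p=1/3$ this gives an expected value of $1/3$ and a variance of $2/9$, which agree with $p$ and $p(1-p)$.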

Now, let's get back to exploring the properties of binomial random variables!

# Expected value and variance of a binomial random variable

The expected value and variance of a binomial random variable $X\sim\text{Bin}(n,p)$ are:

$$
\begin{align*}
\mathbb{E}(X)&=np\\
\mathbb{V}(X)&=np(1-p)
\end{align*}
$$

Proof. Perhaps the simplest and most elegant way to derive the expected value and variance of a binomial random variable $X\sim\text{Bin}(n,p)$ is to treat $X$ as the sum of $n$ independent Bernoulli random variables, each with a probability $p$ of success.

For instance, suppose we toss a fair coin $3$ times. If we let random variable $X$ represent the total number of heads, then $X\sim\text{Bin}(3,0.5)$. However, the outcome of each toss can be represented by a Bernoulli random variable $Y_i$ where $Y_i=1$ if heads and $Y_i=0$ if tails, and each of these outcomes occurs with a probability of $0.5$. Therefore, the binomial random variable $X$ can be expressed as the sum of three independent Bernoulli random variables:

$$X=Y_1+Y_2+Y_3$$

Let's go back to the general case when $X\sim\text{Bin}(n,p)$. Again, $X$ can be expressed as the sum of $n$ independent Bernoulli random variables:

$$X=Y_1+Y_2+\cdots+Y_n$$

Taking the expected value of both sides and using the linearity of expected values:

$$\mathbb{E}(X)=\mathbb{E}(Y_1)+\mathbb{E}(Y_2)+\cdots+\mathbb{E}(Y_n)$$

We have already proven earlier that the expected value of a Bernoulli random variable $Y_i\sim\text{Ber}(p)$ is:

$$\mathbb{E}(Y_i)=p$$

Plugging this into the sum of expected values gives us the expected value of $X$:

$$\mathbb{E}(X)=\underbrace{p+p+\cdots+p}_{n}=np$$

To derive the variance of our binomial random variable $X$, we can again use the decomposition $X=Y_1+Y_2+\cdots+Y_n$, but this time we take the variance of both sides:

$$\mathbb{V}(X)=\mathbb{V}(Y_1+Y_2+\cdots+Y_n)$$

Because $Y_1,Y_2,\cdots,Y_n$ are independent, the variance of their sum equals the sum of their variances:

$$\mathbb{V}(X)=\mathbb{V}(Y_1)+\mathbb{V}(Y_2)+\cdots+\mathbb{V}(Y_n)$$

We have previously shown that the variance of a Bernoulli random variable $Y_i\sim\text{Ber}(p)$ is:

$$\mathbb{V}(Y_i)=p(1-p)$$

Plugging this into the sum of variances gives us the variance of $X$:

$$\mathbb{V}(X)=\underbrace{p(1-p)+p(1-p)+\cdots+p(1-p)}_{n}=np(1-p)$$

This completes the proof.
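The formulas $\mathbb{E}(X)=np$ and $\mathbb{V}(X)=np(1-p)$ can also be checked numerically by summing over the probability mass function. This is a minimal sketch with exact fractions. The function name `binomial_moments` and the parameters $n=10$, $p=1/4$ are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n: int, p: Fraction):
    """Return (E[X], V[X]) for X ~ Bin(n, p) by summing over the pmf."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum(x * prob for x, prob in enumerate(pmf))
    second_moment = sum(x**2 * prob for x, prob in enumerate(pmf))
    return mean, second_moment - mean**2

n, p = 10, Fraction(1, 4)
mean, variance = binomial_moments(n, p)

print(mean, n * p)                 # both should equal 5/2
print(variance, n * p * (1 - p))   # both should equal 15/8
```

The direct summation is far more laborious than the Bernoulli decomposition used in the proof, which is why the decomposition is the preferred derivation.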

# Effects of parameters on the shape of the distribution

## Increasing the sample size

Increasing the sample size $n$ causes the probability mass function of a binomial random variable to take on a bell-curve shape:

*Figure: probability mass functions of $X\sim\text{Bin}(5,0.5)$ (left), $X\sim\text{Bin}(25,0.5)$ (middle) and $X\sim\text{Bin}(50,0.5)$ (right).*

Here, the probability of success is fixed at $p=0.5$, and the sample size is $n=5,25,50$ from left to right. We can see that the probability mass function already starts to resemble a continuous bell curve at $n=50$. Although we won't do so here, it can be proven that a suitably standardized binomial random variable converges in distribution to a normal distribution as the sample size $n$ grows (the de Moivre-Laplace theorem).
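The bell-curve behavior can be illustrated numerically. The sketch below is an informal check rather than a proof: for each $n$ it measures the largest gap between the binomial probability mass function and a normal density with the same mean $np$ and variance $np(1-p)$. The helper names `binomial_pmf`, `normal_pdf` and `max_gaps` are assumptions for this example.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

def normal_pdf(x: float, mean: float, sd: float) -> float:
    return exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

p = 0.5
max_gaps = {}
for n in (5, 25, 50):
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    # Largest absolute difference between the pmf and the matching
    # normal density, taken over x = 0, 1, ..., n
    max_gaps[n] = max(
        abs(binomial_pmf(x, n, p) - normal_pdf(x, mean, sd))
        for x in range(n + 1)
    )

print(max_gaps)
```

The gap shrinks as $n$ increases, which is consistent with the probability mass function looking more and more like a normal curve.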

## Increasing probability of success

Increasing the probability of success $p$ causes the probability mass function of a binomial random variable to shift to the right:

*Figure: probability mass functions of $X\sim\text{Bin}(10,0.25)$ (left), $X\sim\text{Bin}(10,0.50)$ (middle) and $X\sim\text{Bin}(10,0.75)$ (right).*

Here, the sample size is fixed at $n=10$, and the probability of success is $p=0.25,0.50,0.75$ from left to right. This shifting behavior makes sense because the higher the probability of success, the more successes we should expect.
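One way to see the shift without a plot is to track the most likely number of successes (the mode) alongside the expected value $np$. The sketch below is illustrative only. `binomial_pmf` and `modes` are names introduced for this example.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 10
modes = []
for p in (0.25, 0.50, 0.75):
    pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
    # Mode: the value of x with the highest probability
    mode = max(range(n + 1), key=lambda x: pmf[x])
    modes.append(mode)
    print(p, mode, n * p)
```

The mode moves from $2$ to $5$ to $8$ as $p$ increases, staying close to the expected values $2.5$, $5$ and $7.5$.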

# Working with binomial random variables using Python

## Computing the binomial probability mass function

Recall the following example question from earlier:

Suppose we roll a fair dice $3$ times. What is the probability of rolling a $4$ once?

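We can define a binomial random variable $X\sim\text{Bin}(3,1/6)$ to represent the number of times we roll a $4$. The probability of rolling a $4$ once is therefore:

$$\mathbb{P}(X=1)=\binom{3}{1}\Big(\frac{1}{6}\Big)^1\Big(\frac{5}{6}\Big)^{2}=\frac{25}{72}\approx0.347$$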

We can also use a statistical library in Python called SciPy to easily compute the value of the binomial probability mass function:

```
from scipy.stats import binom

n = 3
p = (1/6)
x = 1
binom.pmf(x, n, p)
```

```
0.3472222222222223
```

Notice how this aligns with our hand-calculated result!

## Drawing the binomial probability mass function

Consider the probability mass function of $X\sim\text{Bin}(3,1/6)$:

$$p(x)=\binom{3}{x}\Big(\frac{1}{6}\Big)^x\Big(\frac{5}{6}\Big)^{3-x},\qquad x=0,1,2,3$$

We can call the `binom.pmf(~)` function for all the possible values of $x$ like so:

```
import matplotlib.pyplot as plt
from scipy.stats import binom

n = 3
p = (1/6)
xs = list(range(n+1))   # [0,1,2,3]
# Calculate the pmf values (returned as a NumPy array)
pmfs = binom.pmf(xs, n, p)
# Plot a bar chart with Matplotlib
plt.bar(xs, pmfs)
plt.xlabel('$x$')
plt.ylabel('$p(x)$')
plt.show()
```

This generates the following plot: