
Comprehensive Guide on Probability Density Functions

Last updated: Aug 12, 2023
Tags: Probability and Statistics

Before reading this guide, please be familiar with the concept of random variables and probability mass functions.

What is a probability density function?

We have previously introduced the probability mass function as a probability distribution that assigns a probability to each possible value of a discrete random variable. This is possible because a discrete random variable takes on a finite (or countably infinite) number of possible values. For instance, we can assign a probability of $1/6$ to each face of a fair die.

In contrast, a continuous random variable can take on any value in an interval of real numbers, which means there are uncountably many possible outcomes. Therefore, unlike in the discrete case, we cannot assign a positive probability to each individual outcome.

Deriving the definition of probability density functions

Suppose we randomly selected $5000$ adults and measured their heights in centimeters. If we group the heights using a bin width of $10$, then the frequency histogram might look as follows:

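We can convert a frequency histogram into a density histogram by dividing each frequency by the total number of observations multiplied by the bin width, so that the area of each rectangle equals the proportion of observations in that bin: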

The main property of density histograms is that the areas of the rectangles sum up to one. This means, for instance, that we can find the probability that a person's height is at most 160cm, that is $\mathbb{P}(X\le160)$, by summing the following areas:

We can also easily find the probability $\mathbb{P}(160\le{X}\le180)$ by summing the area of the following rectangles:
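The conversion and the two area sums above can be sketched in Python as follows. This is only an illustration under stated assumptions: the heights are simulated from a normal distribution with mean 170 and standard deviation 7 (hypothetical values, not the data behind this guide), and `density_histogram` is a helper name made up for this sketch.

```python
import random

def density_histogram(values, bin_width, low, high):
    """Return (left_edges, frequencies, densities) for equal-width bins on [low, high)."""
    n_bins = round((high - low) / bin_width)
    frequencies = [0] * n_bins
    for v in values:
        if low <= v < high:
            # Index of the bin containing v; min() guards against float rounding.
            idx = min(int((v - low) / bin_width), n_bins - 1)
            frequencies[idx] += 1
    total = sum(frequencies)
    # Density = frequency / (number of observations * bin width),
    # so each rectangle's area is the proportion of observations in its bin.
    densities = [f / (total * bin_width) for f in frequencies]
    left_edges = [low + i * bin_width for i in range(n_bins)]
    return left_edges, frequencies, densities

random.seed(0)
heights = [random.gauss(170, 7) for _ in range(5000)]  # simulated, hypothetical data

bin_width = 10
left_edges, freqs, dens = density_histogram(heights, bin_width, low=130, high=210)

# The rectangle areas sum to one (up to floating-point rounding).
total_area = sum(d * bin_width for d in dens)

# P(X <= 160): sum the rectangles whose right edge is at most 160.
p_le_160 = sum(d * bin_width for left, d in zip(left_edges, dens)
               if left + bin_width <= 160)

# P(160 <= X <= 180): sum the rectangles lying between 160 and 180.
p_160_180 = sum(d * bin_width for left, d in zip(left_edges, dens)
                if left >= 160 and left + bin_width <= 180)

print(f"total area: {total_area:.6f}")
print(f"P(X <= 160) is roughly {p_le_160:.3f}")
print(f"P(160 <= X <= 180) is roughly {p_160_180:.3f}")
```

Rerunning the sketch with a smaller `bin_width` gives more, thinner rectangles whose areas still sum to one.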

Now, instead of a bin width of $10$, let's use a smaller bin width of, say, $5$. Here's what the frequency histogram would look like:

Notice how using a smaller bin width results in more rectangles. Once again, we can convert this frequency histogram into a density histogram such that the areas of the rectangles sum up to one:

Let's now use an even smaller bin width - here's the density histogram with bin width $1$:

We can still clearly see that the density histogram is tracing out a curve! In fact, the density histogram becomes increasingly smooth as we keep decreasing the bin width. As a final example, here's a density histogram with a bin width of $0.01$:

Remember, this is a density histogram, so the areas of these extremely small rectangles still add up to one! Of course, we can choose an even smaller bin width and smooth out the histogram even more! If we were to use an infinitesimally small bin width, we would end up with an infinite number of rectangles that trace out a continuous function:

This function is called the probability density function! The probability density function is essentially a density histogram whose bin width is infinitesimally small.

Deriving the properties of probability density functions

The way to compute probabilities using the probability density function is the same as when using the density histogram, except that we can now compute the area under the curve instead of summing up the area of rectangles. For instance, the probability $\mathbb{P}(160\le{X}\le180)$ is given by the following area:

Again, an important property of the probability density function is that the area under the curve over all possible values of $X$ (e.g. from $150$ to $190$ in this case) is one:

The probability density function is non-negative for all possible values of $X$ because a density histogram cannot have negative density.

One last property of the probability density function is that the probability of a random variable $X$ taking a specific value is $0$. For instance, what is the probability that a random person's height is exactly 170cm? Here's the mind-blowing truth - the probability is, in fact, zero 🤯:

$$\mathbb{P}(X=170)=0$$

This should make sense because for a person's height to be exactly 170cm, the height must be 170.00000...cm with an infinite number of 0s. There are infinitely many possible heights arbitrarily close to 170cm, so the chance of landing on this one value is zero. The same holds for every other value: the probability that a continuous random variable $X$ takes on any particular value is always $0$.

This can also be justified mathematically by the fact that the bin width of the probability density function is infinitesimally small, which means that each interval of $X$ shrinks to a single point of width zero. We therefore have a vertical line for each possible value of $X$:

Recall that the area of a rectangle represents the probability that the random variable is within the corresponding interval. However, in this case, our rectangle's width is infinitesimally small such that it is a vertical line. We know that lines have an area of $0$, and so the probability that $X$ takes on a specific value is $0$.
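To see this shrinking numerically, here is a small sketch that computes the probability of an interval centered at 170cm as the interval gets narrower. It assumes, purely for illustration, that heights follow a normal distribution with mean 170 and standard deviation 7. Neither the distribution nor these parameters come from this guide, and `normal_cdf` and `prob_within` are hypothetical helper names.

```python
import math

MU, SIGMA = 170.0, 7.0  # hypothetical height model, for illustration only

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_within(center, half_width):
    """P(center - half_width <= X <= center + half_width) under the model above."""
    upper = normal_cdf(center + half_width, MU, SIGMA)
    lower = normal_cdf(center - half_width, MU, SIGMA)
    return upper - lower

half_widths = (5.0, 0.5, 0.05, 0.005, 0.0)
probs = [prob_within(170.0, hw) for hw in half_widths]

for hw, p in zip(half_widths, probs):
    print(f"half-width {hw}: probability {p:.6f}")
```

The probabilities decrease as the interval narrows, and the interval of width zero, which is the single point 170, has probability exactly zero.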

Definition.

Probability density function

If $X$ is a continuous random variable, then the probability density function (pdf) of $X$ is a function $f(x)$ such that the probability of $X$ taking on a value in the interval $[a,b]$ is given by:

$$\mathbb{P}(a\le{X}\le{b})= \int^b_af(x)\;dx$$

In other words, the probability that $X$ is between values $a$ and $b$ is given by the area under the curve of $f(x)$ in the interval $[a,b]$.

Theorem.

Properties of probability density function

A probability density function $f(x)$ must satisfy the following two properties. The first property is that the function must be non-negative for all $x$, that is:

$$f(x)\ge0,\;\;\;\;\; \text{for all }\;x$$

The second property is that the area under the graph of $f(x)$ must be one:

$$\int^\infty_{-\infty}f(x)\;dx=1$$

Proof. Both properties were established when deriving the definition of the probability density function.
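As a rough numerical sanity check of the two properties, the sketch below samples a candidate function on a grid and approximates its integral with the midpoint rule. It assumes the candidate is zero outside a known interval $[a,b]$. The helper names `integrate` and `looks_like_pdf` and the two candidate functions are hypothetical, and a grid check like this can only suggest that a function is a valid density, not prove it.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def looks_like_pdf(f, a, b, n=100_000, tol=1e-6):
    """Check both properties numerically for a function that is zero outside [a, b]."""
    h = (b - a) / n
    # Property 1: non-negative at every sampled point.
    non_negative = all(f(a + (i + 0.5) * h) >= 0 for i in range(n))
    # Property 2: the total area is (approximately) one.
    area_is_one = abs(integrate(f, a, b, n) - 1.0) < tol
    return non_negative and area_is_one

def constant_on_unit_interval(x):
    """Equals 1 on [0, 1] and 0 elsewhere; its area is 1."""
    return 1.0 if 0 <= x <= 1 else 0.0

def constant_on_wide_interval(x):
    """Equals 1 on [0, 2] and 0 elsewhere; its area is 2, so it is not a density."""
    return 1.0 if 0 <= x <= 2 else 0.0

print(looks_like_pdf(constant_on_unit_interval, 0, 1))
print(looks_like_pdf(constant_on_wide_interval, 0, 2))
```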

Example.

Computing the probability that a continuous random variable takes on a certain range of values

Consider the following probability density function:

$$f(x)=\begin{cases} \frac{1}{2}x,&0\lt{x}\lt{2}\\ 0,&\text{otherwise} \end{cases}$$

Do the following:

  • verify that the two properties of probability density functions are satisfied.

  • compute $\mathbb{P}(0.5\le{X}\le1.5)$.

Solution. The graph of the probability density function is:

We can clearly see that the function satisfies the first property because it never outputs a negative value. Since $f(x)=0$ outside $0\lt{x}\lt2$, the total area under the curve is equal to the area of the triangle with base $2$ and height $1$, that is:

$$\mathbb{P}(0\le{X}\le2)= \frac{1}{2}\times2\times1=1$$

This satisfies the second property of a probability density function. Because the two properties are satisfied, $f(x)$ is a valid probability density function.

Next, the probability $\mathbb{P}(0.5\le{X}\le1.5)$ is given by the following area under the curve:

The area of the trapezoid is given by:

$$\begin{align*} \mathbb{P}(0.5\le{X}\le1.5) &=\frac{f(0.5)+f(1.5)}{2}\times(1.5-0.5)\\ &=\frac{0.25+0.75}{2}\times1\\ &=\frac{1}{2} \end{align*}$$
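
As a cross-check, the same probability can be obtained directly from the definition by integrating the density. This approach also works when the region under the curve is not a simple shape:

$$\begin{align*} \mathbb{P}(0.5\le{X}\le1.5) &=\int^{1.5}_{0.5}\frac{1}{2}x\;dx\\ &=\left[\frac{x^2}{4}\right]^{1.5}_{0.5}\\ &=\frac{2.25-0.25}{4}\\ &=\frac{1}{2} \end{align*}$$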

Special probability density functions

There are several special probability density functions:

  • uniform distribution - a distribution where outcomes are equally likely.

  • normal distribution - a distribution that resembles a bell curve.

  • gamma distribution - a family of right-skewed distributions.

  • exponential and chi-squared distributions - special cases of the gamma distribution.
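
To illustrate the last point, the sketch below writes out the three densities and compares them at a few points. It uses the shape and scale parameterisation of the gamma distribution, which is one common convention and an assumption here because this guide does not fix one. Under that convention, the exponential distribution with rate $\lambda$ is a gamma distribution with shape $1$ and scale $1/\lambda$, and the chi-squared distribution with $\nu$ degrees of freedom is a gamma distribution with shape $\nu/2$ and scale $2$. The function names are hypothetical.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale, for x > 0."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def exponential_pdf(x, rate):
    """Exponential density with the given rate, for x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def chi_squared_pdf(x, dof):
    """Chi-squared density with dof degrees of freedom, for x > 0."""
    if x <= 0:
        return 0.0
    half = dof / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

points = (0.5, 1.0, 3.0)
rate, dof = 2.0, 5

# Exponential(rate) agrees with Gamma(shape=1, scale=1/rate).
exponential_matches = all(
    math.isclose(gamma_pdf(x, 1.0, 1.0 / rate), exponential_pdf(x, rate))
    for x in points
)

# Chi-squared(dof) agrees with Gamma(shape=dof/2, scale=2).
chi_squared_matches = all(
    math.isclose(gamma_pdf(x, dof / 2, 2.0), chi_squared_pdf(x, dof))
    for x in points
)

print(exponential_matches, chi_squared_matches)
```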

Published by Isshin Inada