# Comprehensive Guide on Correlation of Two Random Variables

Last updated: Jan 20, 2024

Tags: Probability and Statistics

As a prerequisite to this guide, you should be familiar with the concept of the covariance of random variables.

# What is a correlation coefficient?

The correlation coefficient of two random variables is a numeric measure between $-1$ and $1$ that captures the strength of the linear relationship between them. Recall that the covariance of two variables describes how one variable changes linearly with the other. The correlation coefficient normalizes the covariance so that its value always falls between $-1$ and $1$ - a property we will prove later in this guide.

# Intuition behind correlation

Correlation can be thought of as the normalized version of covariance. Recall that the covariance $\text{cov}(X,Y)$ is a numeric measure of the association between the random variables $X$ and $Y$. The problem with covariance is that it is unbounded and heavily affected by the scale of the random variables, so we cannot meaningfully compare covariances across different pairs of datasets.

Unlike covariance, correlation always lies between $-1$ and $1$. As we shall prove later, dividing the covariance $\text{cov}(X,Y)$ by $\sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}$ guarantees this mathematical property.
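This scale-dependence is easy to see numerically. Below is a minimal pure-Python sketch (the helper names `mean`, `cov`, and `corr` are our own, not from any library): rescaling $X$ by a factor of $100$ inflates the covariance by the same factor, but leaves the correlation unchanged.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Population covariance: average product of deviations from the means.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    # Correlation = covariance normalized by the standard deviations.
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# Rescaling x (e.g. metres -> centimetres) inflates the covariance 100-fold ...
print(cov(x, y), cov([100 * v for v in x], y))
# ... but leaves the correlation untouched.
print(corr(x, y), corr([100 * v for v in x], y))
```

This is exactly why correlations, unlike covariances, can be compared across datasets measured on different scales.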

The interpretation of correlation and covariance is very similar:

• a positive correlation close to $1$ means that there is a strong positive linear association between the random variables - as $X$ increases, $Y$ tends to increase as well.

• a negative correlation close to $-1$ means that there is a strong negative linear association between the random variables - as $X$ increases, $Y$ tends to decrease.

• a near-zero correlation means that there is no linear association between the random variables - as $X$ increases, $Y$ shows no consistent linear tendency.

The relationship between correlation and the pattern of data points is illustrated below:

For a more intuitive explanation of correlation, please consult the corresponding section in our guide on covariance - remember, correlation is just the normalized version of covariance, so the intuitions carry over!

Example.

# Computing the correlation of two random variables

Let's revisit an earlier example. Suppose random variables $X$ and $Y$ have the following joint probability mass function:

| $f_{X,Y}(x,y)$ | $x=1$ | $x=2$ | $x=3$ | $f_Y(y)$ |
| --- | --- | --- | --- | --- |
| $y=1$ | $0.2$ | $0.1$ | $0.1$ | $0.4$ |
| $y=2$ | $0.1$ | $0.3$ | $0.2$ | $0.6$ |
| $f_X(x)$ | $0.3$ | $0.4$ | $0.3$ | |

Compute the correlation of $X$ and $Y$.

Solution. The correlation of $X$ and $Y$ is computed by:

$$\mathrm{corr}(X,Y)=\frac{\mathrm{cov}(X,Y)} {\sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}}$$

We have already computed the covariance of $X$ and $Y$ before:

$$\mathrm{cov}(X,Y)=0.1$$

Next, let's find $\mathbb{V}(X)$ and $\mathbb{V}(Y)$ using the computational formula of variance:

$$\label{eq:o7EilmMgGlrcPSNyJ5E} \begin{aligned} \mathbb{V}(X)&=\mathbb{E}(X^2)-\big[\mathbb{E}(X)\big]^2\\ \mathbb{V}(Y)&=\mathbb{E}(Y^2)-\big[\mathbb{E}(Y)\big]^2 \end{aligned}$$

Let's now find $\mathbb{E}(X^2)$ and $\mathbb{E}(Y^2)$ using the definition of expected value:

\begin{align*} \mathbb{E}(X^2) &=\sum_xx^2\cdot{f_X(x)}\\ &=(1)^2(0.3)+(2)^2(0.4)+(3)^2(0.3)\\ &=4.6\\\\ \mathbb{E}(Y^2) &=\sum_yy^2\cdot{f_Y(y)}\\ &=(1)^2(0.4)+(2)^2(0.6)\\ &=2.8 \end{align*}

Using the marginal distributions, the means are $\mathbb{E}(X)=(1)(0.3)+(2)(0.4)+(3)(0.3)=2$ and $\mathbb{E}(Y)=(1)(0.4)+(2)(0.6)=1.6$. Plugging these expected values into \eqref{eq:o7EilmMgGlrcPSNyJ5E} gives:

\begin{align*} \mathbb{V}(X) &=\mathbb{E}(X^2)-\big[\mathbb{E}(X)\big]^2\\ &=4.6-(2)^2\\ &=0.6\\\\ \mathbb{V}(Y) &=\mathbb{E}(Y^2)-\big[\mathbb{E}(Y)\big]^2\\ &=2.8-(1.6)^2\\ &=0.24 \end{align*}

We've now found $\mathrm{cov}(X,Y)$, $\mathbb{V}(X)$ and $\mathbb{V}(Y)$. Let's plug them into the formula for correlation:

\begin{align*} \mathrm{corr}(X,Y) &=\frac{\mathrm{cov}(X,Y)} {\sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}}\\ &=\frac{0.1}{\sqrt{(0.6)(0.24)}}\\ &\approx0.26 \end{align*}

Since the correlation is small, we conclude that $X$ and $Y$ have a weak positive linear association.
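The computation above can be reproduced in a few lines of Python. This is a sketch using the joint pmf from the example; the helper `E` (expected value of a function of $X$ and $Y$) is our own naming, not a library function.

```python
# Joint pmf from the example: keys are (x, y), values are probabilities.
pmf = {(1, 1): 0.2, (2, 1): 0.1, (3, 1): 0.1,
       (1, 2): 0.1, (2, 2): 0.3, (3, 2): 0.2}

# Expected value of an arbitrary function g(X, Y) under the joint pmf.
E = lambda g: sum(g(x, y) * p for (x, y), p in pmf.items())

mean_x, mean_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: x**2) - mean_x**2          # computational formula
var_y = E(lambda x, y: y**2) - mean_y**2
cov_xy = E(lambda x, y: x * y) - mean_x * mean_y  # cov = E(XY) - E(X)E(Y)

corr_xy = cov_xy / (var_x * var_y) ** 0.5
print(round(cov_xy, 2), round(var_x, 2), round(var_y, 2), round(corr_xy, 2))
# cov ≈ 0.1, V(X) ≈ 0.6, V(Y) ≈ 0.24, corr ≈ 0.26
```

The printed values match the hand computation above.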

Theorem.

# Correlation is bounded between -1 and 1

The correlation of two random variables is bounded between $-1$ and $1$, that is:

$$-1\le\mathrm{corr}(X,Y)\le1$$

Proof. Suppose we have two random variables $X$ and $Y$. Our first goal is to prove the following useful property:

$$\big[\mathbb{E}(XY)\big]^2 \le \mathbb{E}(X^2)\cdot\mathbb{E}(Y^2)$$

Let $t\in\mathbb{R}$ be any constant, and define the following function:

$$p(t)= \mathbb{E}\big[(Xt+Y)^2\big]$$

Note that defining an auxiliary function like this is perfectly acceptable in a proof because we do not violate any assumptions. Notice that $(Xt+Y)^2$ is always non-negative because of the square. The expected value of a non-negative random variable must also be non-negative:

$$\label{eq:hQq0dN8lW1QwVhDStCZ} p(t)=\mathbb{E}\big[(Xt+Y)^2\big]\ge0$$

Let's now expand the brackets:

$$\label{eq:W4ftULTB4klW2dOyzlm} \begin{aligned} p(t)&=\mathbb{E}\big[(Xt+Y)^2\big]\\ &=\mathbb{E}(X^2t^2+2XYt+Y^2)\\ &=t^2\cdot\mathbb{E}(X^2)+2t\cdot\mathbb{E}(XY)+\mathbb{E}(Y^2) \end{aligned}$$

Notice that $p(t)$ is a quadratic function in $t$. Since $p(t)\ge0$ from \eqref{eq:hQq0dN8lW1QwVhDStCZ}, the quadratic can have at most one real root, which means its discriminant must be less than or equal to $0$. As a refresher from elementary mathematics, if a quadratic function $ax^2+bx+c$ is non-negative, then its discriminant satisfies:

$$b^2-4ac\le0$$

Therefore, the discriminant of $p(t)$ in \eqref{eq:W4ftULTB4klW2dOyzlm} is:

$$\big[2\cdot\mathbb{E}(XY)\big]^2 -4\cdot\mathbb{E}(X^2)\cdot\mathbb{E}(Y^2)\le0$$

Simplifying gives us the desired property:

$$\label{eq:xEP2kqT3afsWywykrvW} \big[\mathbb{E}(XY)\big]^2 \le \mathbb{E}(X^2)\cdot\mathbb{E}(Y^2)$$
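As a quick sanity check, we can evaluate the coefficients of $p(t)$ for the joint pmf from the earlier example and confirm that the discriminant is indeed non-positive. The pmf and the helper `E` below are just for illustration:

```python
# Joint pmf from the worked example earlier in this guide.
pmf = {(1, 1): 0.2, (2, 1): 0.1, (3, 1): 0.1,
       (1, 2): 0.1, (2, 2): 0.3, (3, 2): 0.2}

# Expected value of an arbitrary function g(X, Y) under the joint pmf.
E = lambda g: sum(g(x, y) * p for (x, y), p in pmf.items())

# p(t) = a t^2 + b t + c with the coefficients from the expansion above.
a = E(lambda x, y: x * x)      # E(X^2) = 4.6
b = 2 * E(lambda x, y: x * y)  # 2 E(XY) = 6.6
c = E(lambda x, y: y * y)      # E(Y^2) = 2.8

# The discriminant b^2 - 4ac must be <= 0, i.e. [E(XY)]^2 <= E(X^2) E(Y^2).
print(b**2 - 4 * a * c)  # negative
```

Since $b^2-4ac\le0$ is equivalent to $\big[\mathbb{E}(XY)\big]^2 \le \mathbb{E}(X^2)\cdot\mathbb{E}(Y^2)$, this numerically verifies the property for one concrete pair of random variables.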

Now, let's go back to our initial goal of proving that the correlation coefficient must be bounded between $-1$ and $1$. Let's define new random variables $X_*$ and $Y_*$ such that:

\begin{align*} X_*&=X-\mu_X\\ Y_*&=Y-\mu_Y \end{align*}

where $\mu_X$ and $\mu_Y$ are the means of $X$ and $Y$ respectively. Remember, property \eqref{eq:xEP2kqT3afsWywykrvW} must hold for any pair of random variables, including the pair $X_*$ and $Y_*$, and so:

$$\label{eq:R85bKXTYpwCyaQ6qyk2} \big[\mathbb{E}(X_*Y_*)\big]^2 \le \mathbb{E}(X_*^2)\cdot\mathbb{E}(Y_*^2)$$

Substituting the definitions of $X_*$ and $Y_*$ into \eqref{eq:R85bKXTYpwCyaQ6qyk2} gives:

$$\label{eq:wmiYuklgWblXxsyEF0s} \Big[\mathbb{E}\big[(X-\mu_X)(Y-\mu_Y)\big]\Big]^2 \le \mathbb{E}\big[(X-\mu_X)^2\big]\cdot\mathbb{E}\big[(Y-\mu_Y)^2\big]$$

Now, by definition of the variance of a random variable, we have:

\begin{align*} \mathbb{V}(X)&=\mathbb{E}\big[(X-\mu_X)^2\big]\\ \mathbb{V}(Y)&=\mathbb{E}\big[(Y-\mu_Y)^2\big]\\ \end{align*}

Moreover, by definition of covariance, we have:

$$\mathrm{cov}(X,Y)= \mathbb{E}\big[(X-\mu_X)(Y-\mu_Y)\big]$$

Therefore, \eqref{eq:wmiYuklgWblXxsyEF0s} can be written as:

$$\big[\mathrm{cov}(X,Y)\big]^2 \le \mathbb{V}(X)\cdot\mathbb{V}(Y)$$

We now want to take the square root of both sides, but we must do so carefully because of the inequality. Notice that the right-hand side is non-negative because variances are non-negative, so we can take the square root of both sides while preserving the inequality:

$$\label{eq:ktSWtFvNwavUlIOOwOd} \Big\vert\mathrm{cov}(X,Y)\Big\vert \le \sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}$$

Assuming $X$ and $Y$ are not constant, the right-hand side is strictly positive, so we can divide both sides by it without changing the direction of the inequality:

$$\label{eq:hCFVeMQsqm0WoTtgSgC} \left\vert \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}} \right\vert \le1$$

The left-hand side is precisely the correlation coefficient:

$$\label{eq:H2fbGKITIu6eZfeddgD} |\mathrm{corr}(X,Y)| \le1$$

We're essentially done here, but let's also convert this into an interval:

$$\label{eq:nbg3cNH19XnFo95ymtH} -1\le\mathrm{corr}(X,Y) \le1$$

This completes the proof.
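To see that the bound is tight, note that a perfect linear relationship $Y=aX+b$ attains $\mathrm{corr}(X,Y)=1$ when $a>0$ and $-1$ when $a<0$. A small illustrative sketch (the helper functions are our own, not from any library):

```python
def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    # Correlation = covariance divided by the product of standard deviations.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0]
print(corr(x, [2 * v + 1 for v in x]))   # ≈ 1.0  (positive slope)
print(corr(x, [-3 * v + 5 for v in x]))  # ≈ -1.0 (negative slope)
```

The endpoints $\pm1$ are reached exactly when one variable is a deterministic linear function of the other, which is consistent with the discriminant argument: $p(t)=0$ for some $t$ exactly when $Xt+Y$ is zero with probability one.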
