The correlation coefficient of two random variables is a numeric measure between $-1$ and $1$ that captures the strength and direction of the linear relationship between them. Recall that the covariance of two variables describes how one variable changes linearly with the other. The correlation coefficient normalizes the covariance so that the value falls between $-1$ and $1$. We will prove this later in this guide.

Definition.

Correlation of two random variables

If $X$ and $Y$ are random variables, then their correlation coefficient (or just correlation) is defined as:

$$\mathrm{corr}(X,Y)=\frac{\mathrm{cov}(X,Y)} {\sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}}$$

Where:

$\mathrm{cov}(X,Y)$ is the covariance of $X$ and $Y$.
$\mathbb{V}(X)$ and $\mathbb{V}(Y)$ are the variances of $X$ and $Y$ respectively, both of which must be positive for the ratio to be defined.

Correlation is bounded between $-1$ and $1$.

Intuition behind correlation

Correlation can be thought of as the normalized version of the covariance. Recall that the covariance $\text{cov}(X,Y)$ is a numeric measure of the linear association between the random variables. However, covariance is unbounded and depends heavily on the scale (units) of the random variables. Therefore, we cannot meaningfully compare the covariance of one pair of random variables with that of another pair.
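The scale dependence can be illustrated with a short Python sketch. The data values and the helper names `covariance` and `correlation` are hypothetical, chosen only for illustration, and each listed pair is assumed to be equally likely. The same heights are expressed in meters and in centimeters:

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Covariance when each (a, b) pair is equally likely."""
    mean_a, mean_b = mean(a), mean(b)
    return mean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])

def correlation(a, b):
    """Covariance divided by the square root of the product of variances."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

# Hypothetical paired values (not taken from this guide).
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [52.0, 58.0, 69.0, 77.0, 84.0]
height_cm = [100.0 * h for h in height_m]  # same heights, different unit

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
corr_m = correlation(height_m, weight_kg)
corr_cm = correlation(height_cm, weight_kg)

print(cov_m, cov_cm)    # the covariance is multiplied by 100
print(corr_m, corr_cm)  # the correlation is the same up to rounding error
```

Changing the unit rescales the covariance, while the correlation stays the same, which is why correlation is the quantity to compare across pairs of variables.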

Unlike covariance, correlation is always between $-1$ and $1$. As we shall prove later, dividing the covariance $\text{cov}(X,Y)$ by the square root of the product of the variances $\mathbb{V}(X)$ and $\mathbb{V}(Y)$ guarantees this mathematical property.

The interpretation of correlation and covariance is very similar:

a positive correlation that is close to $1$ means that there is a strong positive linear association between the random variables: as $X$ increases, $Y$ tends to increase as well.
a negative correlation that is close to $-1$ means that there is a strong negative linear association between the random variables: as $X$ increases, $Y$ tends to decrease.
a near-zero correlation means that there is little or no linear association between the random variables: as $X$ increases, $Y$ shows no consistent linear trend.
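The three cases can be sketched in Python with small constructed examples. The values of `xs`, the three relationships and the helper `correlation` are assumptions made for illustration, and each listed value of $X$ is taken to be equally likely:

```python
import math

def correlation(xs, ys):
    """Correlation when each (x, y) pair is equally likely."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

xs = [-2, -1, 0, 1, 2]

# Exact line with positive slope.
corr_increasing = correlation(xs, [2 * x + 1 for x in xs])
# Exact line with negative slope.
corr_decreasing = correlation(xs, [-3 * x for x in xs])
# Y is fully determined by X, but the relationship is not linear.
corr_parabola = correlation(xs, [x ** 2 for x in xs])

print(corr_increasing, corr_decreasing, corr_parabola)
```

The two exact lines give correlations of $1$ and $-1$, while the parabola gives $0$ even though $Y$ depends entirely on $X$. A near-zero correlation therefore rules out a linear trend only, not every kind of dependence.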

The relationship between correlation and the pattern of data points is illustrated below:

For a more intuitive explanation of correlation, please consult our guide about covariance. Remember, correlation is just the normalized version of the covariance, so the intuitions are the same!

Example.

Computing correlation of two random variables

Let's revisit an earlier example. Suppose random variables $X$ and $Y$ have the following joint probability mass function:

$$\begin{array}{cc|ccc|c} f_{X,Y}(x,y) & & & x & & f_Y(y)\\ & & 1 & 2 & 3 & \\ \hline y & 1 & 0.2 & 0.1 & 0.1 & 0.4\\ & 2 & 0.1 & 0.3 & 0.2 & 0.6\\ \hline f_X(x) & & 0.3 & 0.4 & 0.3 & \end{array}$$

Compute the correlation of $X$ and $Y$.

Solution. The correlation of $X$ and $Y$ is computed by:

$$\mathrm{corr}(X,Y)=\frac{\mathrm{cov}(X,Y)} {\sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}}$$

We have already computed the covariance of $X$ and $Y$ before:

$$\mathrm{cov}(X,Y)=0.1$$

Next, let's find $\mathbb{V}(X)$ and $\mathbb{V}(Y)$ using the computational formula of variance:

$$\begin{equation}\label{eq:o7EilmMgGlrcPSNyJ5E} \begin{aligned}[b] \mathbb{V}(X)&=\mathbb{E}(X^2)-\big[\mathbb{E}(X)\big]^2\\ \mathbb{V}(Y)&=\mathbb{E}(Y^2)-\big[\mathbb{E}(Y)\big]^2 \end{aligned} \end{equation}$$

Let's now find $\mathbb{E}(X^2)$ and $\mathbb{E}(Y^2)$ using the definition of expected value:

$$\begin{align*} \mathbb{E}(X^2) &=\sum_xx^2\cdot{f_X(x)}\\ &=(1)^2(0.3)+(2)^2(0.4)+(3)^2(0.3)\\ &=4.6\\\\ \mathbb{E}(Y^2) &=\sum_yy^2\cdot{f_Y(y)}\\ &=(1)^2(0.4)+(2)^2(0.6)\\ &=2.8 \end{align*}$$

Plugging in the expected values in \eqref{eq:o7EilmMgGlrcPSNyJ5E} gives:

$$\begin{align*} \mathbb{V}(X) &=\mathbb{E}(X^2)-\big[\mathbb{E}(X)\big]^2\\ &=4.6-(2)^2\\ &=0.6\\\\ \mathbb{V}(Y) &=2.8-(1.6)^2\\ &=0.24 \end{align*}$$

We've now found $\mathrm{cov}(X,Y)$, $\mathbb{V}(X)$ and $\mathbb{V}(Y)$. Let's plug them into the formula for correlation:

$$\begin{align*} \mathrm{corr}(X,Y) &=\frac{\mathrm{cov}(X,Y)} {\sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}}\\ &=\frac{0.1}{\sqrt{(0.6)(0.24)}}\\ &\approx0.26 \end{align*}$$

Since the correlation is positive but fairly close to $0$, we conclude that $X$ and $Y$ have a weak positive linear association.
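The calculation above can be double-checked with a short Python sketch. The dictionary `joint_pmf` and the helper `expectation` are names introduced here for illustration; the probabilities are the ones from the joint probability mass function in this example:

```python
import math

# Joint PMF from the example, keyed by (x, y).
joint_pmf = {
    (1, 1): 0.2, (2, 1): 0.1, (3, 1): 0.1,
    (1, 2): 0.1, (2, 2): 0.3, (3, 2): 0.2,
}

def expectation(g):
    """E[g(X, Y)] as a probability-weighted sum over the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

mean_x = expectation(lambda x, y: x)
mean_y = expectation(lambda x, y: y)

# Computational formulas: V(X) = E(X^2) - [E(X)]^2, cov = E(XY) - E(X)E(Y).
var_x = expectation(lambda x, y: x ** 2) - mean_x ** 2
var_y = expectation(lambda x, y: y ** 2) - mean_y ** 2
cov_xy = expectation(lambda x, y: x * y) - mean_x * mean_y

corr_xy = cov_xy / math.sqrt(var_x * var_y)

# Rounded to 4 decimal places to hide floating-point noise.
print(round(cov_xy, 4), round(var_x, 4), round(var_y, 4), round(corr_xy, 4))
```

Up to floating-point rounding, the values agree with the hand computation: a covariance of $0.1$, variances of $0.6$ and $0.24$, and a correlation of roughly $0.26$.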

Theorem.

Correlation is bounded between -1 and 1

The correlation of two random variables is bounded between $-1$ and $1$, that is:

$$-1\le\mathrm{corr}(X,Y)\le1$$

Proof. Suppose we have two random variables $X$ and $Y$. Our first goal is to prove the following useful property:

$$\big[\mathbb{E}(XY)\big]^2 \le \mathbb{E}(X^2)\cdot\mathbb{E}(Y^2)$$

Let $t\in\mathbb{R}$ be any constant, and define the following function:

$$p(t)= \mathbb{E}\big[(Xt+Y)^2\big]$$

Note that defining a custom function like this is perfectly acceptable in proofs because we don't violate any assumptions. Notice how $(Xt+Y)^2$ is always non-negative because of the square. The expected value of a non-negative random variable must also be non-negative:

$$\begin{equation}\label{eq:hQq0dN8lW1QwVhDStCZ} p(t)=\mathbb{E}\big[(Xt+Y)^2\big]\ge0 \end{equation}$$

Let's now expand the brackets:

$$\begin{equation}\label{eq:W4ftULTB4klW2dOyzlm} \begin{aligned}[b] p(t)&=\mathbb{E}\big[(Xt+Y)^2\big]\\ &=\mathbb{E}(X^2t^2+2XYt+Y^2)\\ &=t^2\cdot\mathbb{E}(X^2)+2t\cdot\mathbb{E}(XY)+\mathbb{E}(Y^2)\\ \end{aligned} \end{equation}$$

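Notice how $p(t)$ is a quadratic function of $t$. Here we assume that $\mathbb{E}(X^2)>0$; if $\mathbb{E}(X^2)=0$, then $X=0$ with probability $1$ and the desired property holds trivially because both sides equal $0$. Since $p(t)\ge0$ for every $t$ from \eqref{eq:hQq0dN8lW1QwVhDStCZ}, the graph of $p$ never dips below the horizontal axis, so there can be at most one real root. This means that the discriminant must be less than or equal to $0$. As a refresher on elementary mathematics, if a quadratic function $ax^2+bx+c$ is non-negative for all $x$, then its discriminant satisfies: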

$$b^2-4ac\le0$$
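One way to see why this holds, sketched here for the case $a\gt0$ (an assumption made for this aside), is to complete the square:

$$ax^2+bx+c =a\left(x+\frac{b}{2a}\right)^2-\frac{b^2-4ac}{4a}$$

The squared term is zero at $x=-b/(2a)$, so the smallest value of the quadratic is $-(b^2-4ac)/(4a)$. This minimum is non-negative exactly when $b^2-4ac\le0$.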

Therefore, the discriminant of $p(t)$ in \eqref{eq:W4ftULTB4klW2dOyzlm} satisfies:

$$\big[2\cdot\mathbb{E}(XY)\big]^2 -4\cdot\mathbb{E}(X^2)\cdot\mathbb{E}(Y^2)\le0$$

Simplifying gives us the desired property:

$$\begin{equation}\label{eq:xEP2kqT3afsWywykrvW} \big[\mathbb{E}(XY)\big]^2 \le \mathbb{E}(X^2)\cdot\mathbb{E}(Y^2) \end{equation}$$
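As a quick sanity check, we can plug in the numbers from the earlier example. There we had $\mathbb{E}(X)=2$, $\mathbb{E}(Y)=1.6$ and $\mathrm{cov}(X,Y)=0.1$, so $\mathbb{E}(XY)=\mathrm{cov}(X,Y)+\mathbb{E}(X)\cdot\mathbb{E}(Y)=0.1+(2)(1.6)=3.3$. Together with $\mathbb{E}(X^2)=4.6$ and $\mathbb{E}(Y^2)=2.8$, this gives:

$$\big[\mathbb{E}(XY)\big]^2 =(3.3)^2 =10.89 \le12.88 =(4.6)(2.8) =\mathbb{E}(X^2)\cdot\mathbb{E}(Y^2)$$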

Now, let's go back to our initial goal of proving that the correlation coefficient must be bounded between $-1$ and $1$. Let's define new random variables $X_*$ and $Y_*$ such that:

$$\begin{align*} X_*&=X-\mu_X\\ Y_*&=Y-\mu_Y \end{align*}$$

Where $\mu_X$ and $\mu_Y$ are the means of $X$ and $Y$ respectively. Remember, the property \eqref{eq:xEP2kqT3afsWywykrvW} must hold for any pair of random variables, which includes the pair $X_*$ and $Y_*$, and so:

$$\begin{equation}\label{eq:R85bKXTYpwCyaQ6qyk2} \big[\mathbb{E}(X_*Y_*)\big]^2 \le \mathbb{E}(X_*^2)\cdot\mathbb{E}(Y_*^2) \end{equation}$$

Substituting the definitions of $X_*$ and $Y_*$ into \eqref{eq:R85bKXTYpwCyaQ6qyk2} gives:

$$\begin{equation}\label{eq:wmiYuklgWblXxsyEF0s} \Big[\mathbb{E}\big[(X-\mu_X)(Y-\mu_Y)\big]\Big]^2 \le \mathbb{E}\big[(X-\mu_X)^2\big]\cdot\mathbb{E}\big[(Y-\mu_Y)^2\big] \end{equation}$$

Now, by definition of the variance of a random variable, we have that:

$$\begin{align*} \mathbb{V}(X)&=\mathbb{E}\big[(X-\mu_X)^2\big]\\ \mathbb{V}(Y)&=\mathbb{E}\big[(Y-\mu_Y)^2\big]\\ \end{align*}$$

Moreover, by definition of covariance, we have that:

$$\mathrm{cov}(X,Y)= \mathbb{E}\big[(X-\mu_X)(Y-\mu_Y)\big]$$

Therefore, \eqref{eq:wmiYuklgWblXxsyEF0s} can be written as:

$$\big[\mathrm{cov}(X,Y)\big]^2 \le \mathbb{V}(X)\cdot\mathbb{V}(Y)$$
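For instance, with the values from the earlier example, namely $\mathrm{cov}(X,Y)=0.1$, $\mathbb{V}(X)=0.6$ and $\mathbb{V}(Y)=0.24$, the inequality reads:

$$\big[\mathrm{cov}(X,Y)\big]^2 =(0.1)^2 =0.01 \le0.144 =(0.6)(0.24) =\mathbb{V}(X)\cdot\mathbb{V}(Y)$$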

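We now want to take the square root on both sides, but we must do so carefully because of the inequality. Firstly, notice how both sides are non-negative: the left-hand side is a square, and the right-hand side is a product of variances. Since the square root is an increasing function on non-negative numbers and $\sqrt{a^2}=\vert{a}\vert$, taking the square root preserves the inequality: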

$$\begin{equation}\label{eq:ktSWtFvNwavUlIOOwOd} \Big\vert\mathrm{cov}(X,Y)\Big\vert \le \sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)} \end{equation}$$

We can divide both sides by the right-hand term without changing the direction of the inequality because it is positive. It is strictly positive because the correlation is only defined when both variances are non-zero:

$$\begin{equation}\label{eq:hCFVeMQsqm0WoTtgSgC} \left\vert \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathbb{V}(X)\cdot\mathbb{V}(Y)}} \right\vert \le1 \end{equation}$$

The expression inside the absolute value on the left-hand side is the formula for the correlation coefficient:

$$\begin{equation}\label{eq:H2fbGKITIu6eZfeddgD} |\mathrm{corr}(X,Y)| \le1 \end{equation}$$

We're essentially done here, but let's also convert this into an interval:

$$\begin{equation}\label{eq:nbg3cNH19XnFo95ymtH} -1\le\mathrm{corr}(X,Y) \le1 \end{equation}$$

This completes the proof.

Published by Isshin Inada
