Properties of expected values

Last updated: Aug 12, 2023

This guide will go over the mathematical properties of the expected value of a random variable. For those unfamiliar with the concept of expected values, please check out our comprehensive guide on expected value first.

The proofs we provide here will be for discrete random variables, but the properties hold for continuous random variables as well. The proof for the continuous case is analogous to the discrete case with the summation sign essentially replaced by the integral sign.

Theorem.

Expected value of a constant

The expected value of a scalar constant $c$ is:

$$\mathbb{E}(c)=c$$

Proof. Let $X$ denote a discrete random variable with probability mass function $p(x)$, and define the constant function $g(x)=c$ for all $x$. Now, using the definition of expected values:

$$\begin{align*} \mathbb{E}(c) &=\mathbb{E}[g(x)]\\ &=\sum_xg(x)\cdot{p(x)}\\ &=\sum_x{c}\cdot{p(x)}\\ &=c\sum_x{p(x)}\\ &=c\\ \end{align*}$$

This completes the proof.
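As a quick numerical sanity check, we can verify this with a concrete pmf (the choice of a fair six-sided die and the constant $c=5$ here are hypothetical):

```python
# Hypothetical pmf: a fair six-sided die. The constant function g(x) = c
# ignores the outcome, so its expected value is just c.
pmf = {x: 1 / 6 for x in range(1, 7)}
c = 5

# E(c) = sum_x c * p(x) = c * sum_x p(x) = c, since the probabilities sum to 1
E_c = sum(c * p for p in pmf.values())
assert abs(E_c - c) < 1e-9
```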

Theorem.

Expected value of a constant times a random variable

If $X$ is a random variable and $c$ is a constant, then the expected value of their product is:

$$\mathbb{E}(cX)=c\cdot{\mathbb{E}(X)}$$

Proof. Let $p(x)$ be the probability mass function of $X$. From the definition of expected value:

$$\begin{align*} \mathbb{E}(cX) &=\sum_xcx\cdot{p(x)}\\ &=c\sum_xx\cdot{p(x)}\\ &=c\cdot\mathbb{E}(X) \end{align*}$$

This completes the proof.
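The same identity can be checked numerically (the pmf values and the constant $c=4$ below are hypothetical):

```python
# Hypothetical pmf for X.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
c = 4

E_X = sum(x * p for x, p in pmf.items())       # E(X)
E_cX = sum(c * x * p for x, p in pmf.items())  # E(cX)

# E(cX) = c * E(X)
assert abs(E_cX - c * E_X) < 1e-9
```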

Theorem.

Expected value of a random variable plus a constant

If $X$ is a random variable and $c$ is a constant, then the expected value of their sum is:

$$\mathbb{E}(X+c)=\mathbb{E}(X)+c$$

Proof. Let $p(x)$ be the probability mass function of $X$. From the definition of expected value:

$$\begin{align*} \mathbb{E}(X+c) &=\sum_x(x+c)\cdot{p(x)}\\ &=\sum_x\Big(x\cdot{p(x)}+c\cdot{p(x)}\Big)\\ &=\sum_x\Big(x\cdot{p(x)}\Big)+\sum_x\Big(c\cdot{p(x)}\Big)\\ &=\mathbb{E}(X)+c\cdot\sum_x\Big(p(x)\Big)\\ &=\mathbb{E}(X)+c(1)\\ &=\mathbb{E}(X)+c\\ \end{align*}$$

This completes the proof.
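Again, a short numerical check (the pmf and the constant $c=10$ are hypothetical):

```python
# Hypothetical pmf for X.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
c = 10

E_X = sum(x * p for x, p in pmf.items())                # E(X)
E_X_plus_c = sum((x + c) * p for x, p in pmf.items())   # E(X + c)

# E(X + c) = E(X) + c
assert abs(E_X_plus_c - (E_X + c)) < 1e-9
```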

Theorem.

Expected value of XY where X and Y are independent

If $X$ and $Y$ are independent random variables, then:

$$\mathbb{E}(XY)=\mathbb{E}(X)\cdot\mathbb{E}(Y)$$

Proof. Let $X$ and $Y$ be discrete random variables with probability mass functions $p_X(x)$ and $p_Y(y)$, respectively. Let $p_{X,Y}(x,y)$ denote their joint probability mass function. From the definition of expected value, we have that:

$$\begin{align*} \mathbb{E}(XY) &=\sum_i\sum_jx_iy_j\cdot{p_{X,Y}(x_i,y_j)}\\ &=\sum_i\sum_jx_iy_j\cdot{p_X(x_i)\cdot{p_Y(y_j)}}\\ &=\Big(\sum_ix_i\cdot{p_X(x_i)}\Big)\Big(\sum_jy_j\cdot{p_Y(y_j)}\Big)\\ &=\mathbb{E}(X)\cdot\mathbb{E}(Y) \end{align*}$$

Here, the second equality holds because $p_{X,Y}(x_i,y_j)=p_X(x_i)\cdot{p_Y(y_j)}$ when $X$ and $Y$ are independent random variables.

This completes the proof.
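We can verify the factorization numerically. Under independence the joint pmf is the product of the marginals, so we build the joint pmf that way (all pmf values below are hypothetical):

```python
# Hypothetical marginal pmfs for two independent random variables.
p_X = {0: 0.3, 1: 0.7}
p_Y = {2: 0.4, 5: 0.6}

E_X = sum(x * p for x, p in p_X.items())
E_Y = sum(y * p for y, p in p_Y.items())

# Under independence, the joint pmf factorizes: p(x, y) = p_X(x) * p_Y(y).
E_XY = sum(x * y * p_X[x] * p_Y[y] for x in p_X for y in p_Y)

# E(XY) = E(X) * E(Y)
assert abs(E_XY - E_X * E_Y) < 1e-9
```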

Theorem.

Expected value of a sum of two random variables (X+Y)

If $X$ and $Y$ are random variables, then the expected value of their sum $X+Y$ is:

$$\mathbb{E}(X+Y)= \mathbb{E}(X)+\mathbb{E}(Y)$$

Proof. Let $X$ and $Y$ be discrete random variables with probability mass functions $p_X(x)$ and $p_Y(y)$, respectively. Let $p_{X,Y}(x,y)$ denote their joint probability mass function. From the definition of expected value, we have that:

$$\begin{align*} \mathbb{E}(X+Y) &=\sum_i\sum_j(x_i+y_j)\cdot{p_{X,Y}(x_i,y_j)}\\ &=\sum_i\sum_j\Big(x_i\cdot{p_{X,Y}(x_i,y_j)}+y_j\cdot{p_{X,Y}(x_i,y_j)}\Big)\\ &=\sum_i\sum_jx_i\cdot{p_{X,Y}(x_i,y_j)}+\sum_i\sum_jy_j\cdot{p_{X,Y}(x_i,y_j)}\\ &=\sum_ix_i\sum_j{p_{X,Y}(x_i,y_j)}+\sum_jy_j\sum_i{p_{X,Y}(x_i,y_j)}\\ &=\sum_ix_i\cdot{p_{X}(x_i)}+\sum_jy_j\cdot{p_Y(y_j)}\\ &=\mathbb{E}(X)+\mathbb{E}(Y) \end{align*}$$

This completes the proof.
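Note that the proof above never uses independence, so the result holds for dependent variables as well. The following check uses a hypothetical joint pmf in which $X$ and $Y$ are deliberately *not* independent:

```python
# Hypothetical joint pmf; X and Y are NOT independent here
# (e.g. p(0,0) = 0.4 but p_X(0) * p_Y(0) = 0.5 * 0.5 = 0.25),
# yet E(X + Y) = E(X) + E(Y) still holds.
p_XY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal expectations computed directly from the joint pmf.
E_X = sum(x * p for (x, y), p in p_XY.items())
E_Y = sum(y * p for (x, y), p in p_XY.items())
E_X_plus_Y = sum((x + y) * p for (x, y), p in p_XY.items())

assert abs(E_X_plus_Y - (E_X + E_Y)) < 1e-9
```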

Theorem.

Linearity of expected values

If $X$ and $Y$ are two random variables and $a$ and $b$ are some constants, then:

$$\mathbb{E}(aX+bY) =a\cdot{\mathbb{E}(X)}+ b\cdot{\mathbb{E}(Y)}$$

Proof. The linearity of expected values follows from two properties of expected values that we have already proven:

$$\begin{align*} \mathbb{E}(X+Y)&=\mathbb{E}(X)+\mathbb{E}(Y)\\ \mathbb{E}(aX)&=a\cdot\mathbb{E}(X) \end{align*}$$

The proof is as follows:

$$\begin{align*} \mathbb{E}(aX+bY) &=\mathbb{E}(aX)+\mathbb{E}(bY)\\ &=a\cdot\mathbb{E}(X)+b\cdot\mathbb{E}(Y) \end{align*}$$

This completes the proof.

Theorem.

Taking the summation sign in and out of expected values

If $X_1,X_2,\cdots,X_n$ are random variables, then:

$$\sum_{i=1}^n \mathbb{E}(X_i)= \mathbb{E}\Big(\sum_{i=1}^nX_i\Big)$$

Proof. We can use the linearity of expected values to prove this easily:

$$\begin{align*} \sum_{i=1}^n \mathbb{E}(X_i) &=\mathbb{E}(X_1)+\mathbb{E}(X_2)+\cdots+\mathbb{E}(X_n)\\ &=\mathbb{E}(X_1+X_2+\cdots+X_n)\\ &=\mathbb{E}\Big(\sum^n_{i=1}X_i\Big) \end{align*}$$

This completes the proof.

Theorem.

Expected value of X given Y where X and Y are independent

If $X$ and $Y$ are independent random variables, then:

$$\mathbb{E}(X|Y)=\mathbb{E}(X)$$

Proof. We use the fact that $p(x|y)=p(x)$ when $X$ and $Y$ are independent:

$$\begin{align*} \mathbb{E}(X|Y=y) &=\sum_xx\cdot{p(x|y)}\\ &=\sum_xx\cdot{p(x)}\\ &=\mathbb{E}(X) \end{align*}$$

This completes the proof.

Theorem.

Bounding a random variable

If a random variable $X$ is bounded between scalars $a$ and $b$, the expected value of $X$ is also bounded between $a$ and $b$, that is:

$$a\le{X}\le{b} \;\;\;\implies\;\;\; a\le\mathbb{E}(X)\le{b}$$

Proof. The idea is to incrementally apply transformations to the inequality until $X$ becomes $\mathbb{E}(X)$. We begin with the given inequality:

$$a\le{x}\le{b}$$

Multiplying every term by the probability mass function of $X$ gives:

$$\begin{equation}\label{eq:lLrkhvQDcb8ndDEZep8} a\cdot{p(x)}\le{x}\cdot{p(x)}\le{b}\cdot{p(x)} \end{equation}$$

This is allowed because $p(x)\ge0$. Let's take a moment to understand what \eqref{eq:lLrkhvQDcb8ndDEZep8} means. Suppose the possible values that $X$ can take are $\{x_1,x_2,x_3\}$. \eqref{eq:lLrkhvQDcb8ndDEZep8} implies that the following are all true:

$$\begin{align*} a\cdot{p(x_1)}\le{x_1}\cdot{p(x_1)}\le{b}\cdot{p(x_1)}\\ a\cdot{p(x_2)}\le{x_2}\cdot{p(x_2)}\le{b}\cdot{p(x_2)}\\ a\cdot{p(x_3)}\le{x_3}\cdot{p(x_3)}\le{b}\cdot{p(x_3)} \end{align*}$$

Let's add the three inequalities:

$$\sum_{i=1}^{3}a\cdot{p(x_i)} \le\sum_{i=1}^{3}{x_i}\cdot{p(x_i)} \le\sum_{i=1}^{3}{b}\cdot{p(x_i)}$$

To generalize, suppose $X$ can take on the values $\{x_1,x_2,\cdots,x_n\}$, which leads to:

$$\sum_{i=1}^{n}a\cdot{p(x_i)} \le\sum_{i=1}^{n}{x_i}\cdot{p(x_i)} \le\sum_{i=1}^{n}{b}\cdot{p(x_i)}$$

Notice how the middle term is the definition of $\mathbb{E}(X)$! We can also take out $a$ and $b$ from the summation since they are constants:

$$\begin{equation}\label{eq:AHOFCS3qRCNcPbYKujk} a\sum_{i=1}^{n}p(x_i) \le\mathbb{E}(X) \le{b\sum_{i=1}^{n}p(x_i)} \end{equation}$$

Finally, by definition of probability mass function, the sum of the probabilities of all possible values of a random variable must sum up to one, that is:

$$\sum_{i=1}^{n}p(x_i)=1$$

Using this property on \eqref{eq:AHOFCS3qRCNcPbYKujk} gives us the desired result:

$$a\le\mathbb{E}(X)\le{b}$$

This completes the proof.
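A quick numerical illustration (the pmf below is hypothetical; $X$ takes values between $a=1$ and $b=6$):

```python
# Hypothetical pmf for X, with all values bounded between a and b.
pmf = {1: 0.2, 3: 0.5, 6: 0.3}
a, b = 1, 6

E_X = sum(x * p for x, p in pmf.items())

# a <= X <= b implies a <= E(X) <= b
assert a <= E_X <= b
```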

Theorem.

Expected value of a function of random variables

Let $X$ be a random variable with a known probability mass function $p(x)$, and $Y$ be another random variable that is a function of the original random variable $X$, say $Y=g(x)$. The expected value of $Y$ is:

$$\mathbb{E}(Y) =\mathbb{E}[g(x)] =\sum_{x}g(x)\cdot{p(x)}$$

Proof. Let's work with a simple example first for some intuition before we tackle the formal proof. Suppose $X$ is a random variable that can take on the following $3$ different values that occur with some known probabilities:

$$\begin{array}{c|c} X & \mathbb{P}(X=x)\\\hline -1 & \mathbb{P}(X=-1)\\ 1 & \mathbb{P}(X=1)\\ 2 & \mathbb{P}(X=2) \end{array}$$

Here, instead of explicitly assigning probabilities (e.g. $\mathbb{P}(X=-1)=0.3$), we stick with the general form.

Now, let $Y$ be a random variable defined by $Y=X^2$. Our goal is to compute the expected value of $Y$, that is, $\mathbb{E}(Y)$. Let's start by examining how the function $y=g(x)=x^2$ maps each value of $X$ to a value of $Y$.

Here, notice how $X=-1$ and $X=1$ are mapped to the same $Y$ value. This is because our function $g(x)$ is a many-to-one function, that is, multiple inputs can output the same value. To compute the probability $\mathbb{P}(Y=1)$, we need to sum up all the probabilities that result in $Y=1$. In this case, $X=-1$ and $X=1$ both result in $Y=1$ so:

$$\begin{equation}\label{eq:jyZgThRLEE7WVZHvlzS} \mathbb{P}(Y=1)=\mathbb{P}(X=-1)+\mathbb{P}(X=1) \end{equation}$$

We can write this more generally as:

$$\begin{equation}\label{eq:EKGL0NiHnsSUQpYAqMJ} \mathbb{P}(Y=y_1) =\mathbb{P}(X=x_1) +\mathbb{P}(X=x_2) \end{equation}$$

Where $y_1=g(x_1)=g(x_2)$. Similarly, $\mathbb{P}(Y=4)$ can be computed by:

$$\mathbb{P}(Y=4)=\mathbb{P}(X=2)$$

Or more generally by:

$$\begin{equation}\label{eq:Xr9cwUTNraDeWtdvs5U} \mathbb{P}(Y=y_2) =\mathbb{P}(X=x_3) \end{equation}$$

Where $y_2=g(x_3)$.

Now, by referring to \eqref{eq:EKGL0NiHnsSUQpYAqMJ} and \eqref{eq:Xr9cwUTNraDeWtdvs5U}, we can generalize the process of computing the probability of any $Y$ using the following summation:

$$\begin{equation}\label{eq:HDzS3cB1MNYZ4S2sIej} \mathbb{P}(Y=y_j)= \sum_{\substack{\text{All }x_i\text{ such that }\\g(x_i)=y_j}} \mathbb{P}(X=x_i) \end{equation}$$

Again, the reason why the summation is necessary here is that the function $g(x)$ may not necessarily be one-to-one (e.g. $g(x)=x^2$).

Now, let's go back to our original goal of computing $\mathbb{E}(Y)$. From the definition of expected value, we have that:

$$\begin{equation}\label{eq:nQpvD0Xh9mTv1Ngz6EN} \mathbb{E}(Y) =\sum_{j=1}^2y_j\cdot\mathbb{P}(Y=y_j) \end{equation}$$

Substituting \eqref{eq:HDzS3cB1MNYZ4S2sIej} into \eqref{eq:nQpvD0Xh9mTv1Ngz6EN} gives:

$$\begin{equation}\label{eq:NXSEJ3sYDPVDuFvBakh} \begin{aligned}[b] \mathbb{E}(Y) &=\sum_{j=1}^2y_j\cdot \Big( \sum_{\substack{\text{All }x_i\text{ such that }\\g(x_i)=y_j}} \mathbb{P}(X=x_i) \Big)\\ &= \sum_{j=1}^2 \sum_{\substack{\text{All }x_i\text{ such that }\\g(x_i)=y_j}} y_j\cdot\mathbb{P}(X=x_i)\\ &= y_1\cdot\mathbb{P}(X=-1)+ y_1\cdot\mathbb{P}(X=1)+ y_2\cdot\mathbb{P}(X=2)\\ &= g(x_1)\cdot\mathbb{P}(X=-1)+ g(x_2)\cdot\mathbb{P}(X=1)+ g(x_3)\cdot\mathbb{P}(X=2)\\ &=\sum_{i=1}^3g(x_i)\cdot\mathbb{P}(X=x_i)\\ &=\sum_{x}g(x)\cdot\mathbb{P}(X=x) \end{aligned} \end{equation}$$

Let's generalize this further:

  • instead of 3 values for random variable $X$, let's make this $m$.

  • instead of 2 values for random variable $Y$, let's make this $n$.

  • if $g(x)$ is a one-to-one function, then $m=n$.

  • if $g(x)$ is a many-to-one function (as in our example), then $m\ge{n}$.

  • instead of a probability $\mathbb{P}(X=x)$, we use a probability mass (or density) function $p(x)$.

The formal proof simply uses these general variables; the logic is exactly the same:

$$\begin{align*} \mathbb{E}(Y) &=\sum_{j=1}^ny_j\cdot \Big( \sum_{\substack{\text{All }x_i\text{ such that }\\g(x_i)=y_j}} {p(x_i)} \Big)\\ &= \sum_{j=1}^n \sum_{\substack{\text{All }x_i\text{ such that }\\g(x_i)=y_j}} y_j\cdot{p(x_i)}\\ &=\sum_{i=1}^mg(x_i)\cdot{p(x_i)}\\ &=\sum_{x}g(x)\cdot{p(x)} \end{align*}$$

This completes the proof.
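We can check this numerically with the $Y=X^2$ example from above, assigning hypothetical probabilities to the values $-1$, $1$, and $2$. The first method applies the theorem directly; the second builds the pmf of $Y$ explicitly, collapsing the many-to-one mapping, and then applies the definition of expected value:

```python
# X takes values -1, 1, 2 as in the example; probabilities are hypothetical.
p_X = {-1: 0.3, 1: 0.5, 2: 0.2}

def g(x):
    return x ** 2  # a many-to-one function: g(-1) = g(1) = 1

# Method 1: apply the theorem — sum g(x) * p(x) directly over values of X.
E_Y_theorem = sum(g(x) * p for x, p in p_X.items())

# Method 2: build the pmf of Y = g(X) explicitly, summing the probabilities
# of all x values that map to the same y (here X = -1 and X = 1 both give Y = 1).
p_Y = {}
for x, p in p_X.items():
    p_Y[g(x)] = p_Y.get(g(x), 0) + p
E_Y_direct = sum(y * p for y, p in p_Y.items())

assert abs(E_Y_theorem - E_Y_direct) < 1e-9
```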

Theorem.

Law of total expectation

If $X$ and $Y$ are random variables, then the Law of Total Expectation states that:

$$\mathbb{E}_{Y}(\mathbb{E}(X|Y))=\mathbb{E}(X)$$

To emphasize that $\mathbb{E}(X|Y)$ is a random quantity based on $Y$, the outer expected value is subscripted with $Y$.

From the definition of expected values, the Law of Total Expectation can also be written as:

$$\begin{align*} \mathbb{E}(X) =\mathbb{E}_{Y}[\mathbb{E}(X|Y)] =\sum_{y}\Big[\mathbb{E}(X|Y=y)\cdot{p(y)}\Big] \end{align*}$$

Where $p(y)$ is the probability distribution of $Y$.

Proof. We use the definition of expected values to rewrite the inner term:

$$\begin{align*} \mathbb{E}_Y[\mathbb{E}(X|Y)]&= \mathbb{E}_Y\Big[\sum_xx\cdot{p(x|Y)}\Big]\\ \end{align*}$$

We again use the definition of expected values:

$$\begin{align*} \mathbb{E}_Y[\mathbb{E}(X|Y)] &=\sum_y\Big[\sum_xx\cdot{p(x|y)}\Big]\cdot{p(y)}\\ \end{align*}$$

We rearrange the summations:

$$\begin{equation}\label{eq:TNDHb0OXFTj2QNpgYRH} \mathbb{E}_Y[\mathbb{E}(X|Y)]=\sum_xx\sum_y{p(x|y)}\cdot{p(y)} \end{equation}$$

We know from the definition of conditional probability that:

$$p(x|y)=\frac{p(x,y)}{p(y)} \;\;\;\;\; \Longleftrightarrow \;\;\;\;\; p(x,y)=p(x|y)\cdot{p(y)}$$

Therefore, \eqref{eq:TNDHb0OXFTj2QNpgYRH} becomes:

$$\begin{equation}\label{eq:u74Q0ZeRpZLHy47b4G1} \mathbb{E}_Y[\mathbb{E}(X|Y)]=\sum_xx\sum_y{p(x,y)} \end{equation}$$

Now, the marginal distribution $p(x)$ is defined as:

$$p(x)=\sum_yp(x,y)$$

Substituting this into \eqref{eq:u74Q0ZeRpZLHy47b4G1} gives:

$$\begin{align*} \mathbb{E}_Y[\mathbb{E}(X|Y)] &=\sum_xx\cdot{p(x)}\\ &=\mathbb{E}(X)\\ \end{align*}$$

This completes the proof.
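The law can also be verified numerically. Starting from a hypothetical joint pmf of $(X, Y)$, we compute each conditional expectation $\mathbb{E}(X|Y=y)$, average them over the marginal of $Y$, and compare against $\mathbb{E}(X)$ computed directly:

```python
# Hypothetical joint pmf of (X, Y).
p_XY = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}

# Marginal pmf of Y: p_Y(y) = sum_x p(x, y)
p_Y = {}
for (x, y), p in p_XY.items():
    p_Y[y] = p_Y.get(y, 0) + p

def cond_exp_X_given(y):
    # E(X | Y = y) = sum_x x * p(x, y) / p_Y(y), using p(x|y) = p(x, y) / p_Y(y)
    return sum(x * p for (x, yy), p in p_XY.items() if yy == y) / p_Y[y]

# Outer expectation over Y: E_Y[E(X|Y)] = sum_y E(X | Y = y) * p_Y(y)
E_X_total = sum(cond_exp_X_given(y) * p for y, p in p_Y.items())

# Direct computation of E(X) from the joint pmf.
E_X = sum(x * p for (x, y), p in p_XY.items())

assert abs(E_X_total - E_X) < 1e-9
```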

Published by Isshin Inada