# Introduction to Bayesian Statistics

Probability and Statistics › Bayesian Statistics

Last updated: Jul 1, 2022

# Philosophy

The pivotal difference between Frequentist and Bayesian statistics is that the former considers the population parameter to be fixed, while the latter treats it as a random variable. This difference in interpretation carries significant implications: in the eyes of Bayesians, it makes sense to assign probability distributions to parameters and to compute their expected values and variances, just as we do for any random variable.

# Bayes' Theorem

Bayes' theorem is as follows:

$$\mathbb{P}(A|B)=\frac{\mathbb{P}(B|A)\cdot{}\mathbb{P}(A)}{\mathbb{P}(B)}$$

If the events $B_1,B_2,…,B_k$ constitute a partition of the sample space $S$ such that $\mathbb{P}(B_i)\ne0$ for $i=1,2,…,k$, then for any event $A$ in $S$ such that $\mathbb{P}(A)\ne0$:

$$\mathbb{P}(B_r|A)= \frac{\mathbb{P}(B_r\cap{A})}{\sum^k_{i=1}\mathbb{P}(B_i\cap{A})} =\frac{\mathbb{P}(B_r)\cdot{\mathbb{P}(A|B_r)}}{\sum^k_{i=1}\mathbb{P}(B_i)\cdot{\mathbb{P}(A|B_i)}}$$
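The partition formula above can be checked numerically. The following Python sketch uses a hypothetical three-machine scenario (not from the text): machines $B_1,B_2,B_3$ produce all items, and $A$ is the event that an item is defective.

```python
# Bayes' theorem over a partition: three machines B1, B2, B3 produce
# all items; A is the event that an item is defective.
priors = [0.5, 0.3, 0.2]          # P(B_i); must sum to 1
likelihoods = [0.01, 0.02, 0.05]  # P(A | B_i)

# Denominator: P(A) = sum_i P(B_i) * P(A|B_i)  (rule of elimination)
p_a = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior P(B_i | A) for each machine
posteriors = [l * p / p_a for l, p in zip(likelihoods, priors)]

print(posteriors)   # the posteriors sum to 1
```

Note that the posteriors always sum to one, since the denominator is precisely the total probability of $A$.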

For the continuous case:

$$\begin{equation}\label{eq:oyWcmvy42akS8TCH9AF} f(\theta|x)=\frac{f(x|\theta)\cdot{f(\theta)}}{\int{f(x|\theta)\cdot{f(\theta)}\;d\theta}} \end{equation}$$

Here, the denominator is a normalising constant for the posterior distribution and does not depend on $\theta$. For this reason, we often write \eqref{eq:oyWcmvy42akS8TCH9AF} as:

$$f(\theta|x)\propto{f}(x|\theta)\cdot{f(\theta)}$$

Proof. By the definition of conditional probability, $\mathbb{P}(B_r|A)=\mathbb{P}(B_r\cap{A})/\mathbb{P}(A)$ and $\mathbb{P}(B_r\cap{A})=\mathbb{P}(B_r)\cdot\mathbb{P}(A|B_r)$. The rule of elimination then expands the denominator:

$$\mathbb{P}(A)= \sum^k_{i=1}\mathbb{P}(B_i\cap{A})= \sum^k_{i=1} \mathbb{P}(B_i)\cdot{\mathbb{P}(A|B_i)}$$

The continuous case is analogous: the denominator of \eqref{eq:oyWcmvy42akS8TCH9AF} is the marginal density of $x$, obtained by integrating $\theta$ out of the joint density:

$$f_X(x)=\int^\infty_{-\infty}f_X(x|\theta)\cdot{f_{\theta}(\theta)\;d\theta}$$
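The role of the denominator as a normalising constant can be verified numerically. The sketch below assumes a hypothetical Binomial$(n,\theta)$ likelihood and a Uniform$(0,1)$ prior, approximates the marginal $f(x)=\int f(x|\theta)f(\theta)\,d\theta$ with a midpoint Riemann sum, and checks that the resulting posterior integrates to 1:

```python
# Numerical check of the normalising constant with a Binomial(n, theta)
# likelihood and a Uniform(0, 1) prior (hypothetical data).
from math import comb

n, x = 10, 3                        # hypothetical data: 3 successes in 10 trials

def likelihood(theta):              # f(x | theta) as a function of theta
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

N = 100_000
d = 1.0 / N
thetas = [(i + 0.5) * d for i in range(N)]   # midpoint grid on (0, 1)

# Marginal f(x) = integral of f(x|theta) * f(theta), with f(theta) = 1
f_x = sum(likelihood(t) * d for t in thetas)

# Posterior f(theta|x) = f(x|theta) * f(theta) / f(x) integrates to 1
posterior_mass = sum(likelihood(t) / f_x * d for t in thetas)
print(round(posterior_mass, 6))
```

For this particular model the marginal has a closed form, $f(x)=1/(n+1)$, which the Riemann sum reproduces.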

# Terminology

## Prior Distribution

The initial probability distribution assigned to the parameter is referred to as the prior distribution. Note that the prior distribution is given before we process the data - hence the name "prior".

## Likelihood Function

The likelihood function is often denoted $L(\theta|x)$, which makes explicit that we are treating $\theta$ as the variable. The notation used in the Bayesian world differs in that the likelihood appears, at first glance, to be in reverse order: $f(x|\theta)$. However, $x$ is still treated as a constant, while $\theta$ is treated as the variable. For this very reason, we do not regard $f(x|\theta)$ as a probability distribution over $\theta$.
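To make this concrete, here is a minimal sketch (using a hypothetical Binomial model, not from the text) that evaluates the likelihood at several values of $\theta$ while holding the data $x$ fixed:

```python
# The likelihood treats the observed data x as fixed and theta as the
# variable: f(x | theta) becomes a function of theta alone.
from math import comb

n, x = 10, 7                               # fixed, observed data

def f(theta):                              # f(x | theta), with x held fixed
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Evaluate the likelihood across a grid of candidate theta values
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
values = [f(t) for t in grid]
best = max(grid, key=f)                    # maximised at theta = x/n = 0.7
print(best)
```

As expected, on this grid the likelihood peaks at $\theta=x/n$, the maximum-likelihood estimate; the values do not sum to one over $\theta$, underscoring that the likelihood is not a probability distribution in $\theta$.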

## Posterior Distribution

Once we incorporate and process the data, the prior distribution then becomes a posterior distribution, $f_\theta(\theta|\mathbf{x})$. The posterior distribution is the basis for statistical inference in the Bayesian world.

# Example

Question. Company A is trying to estimate what proportion of its products are defective. Out of the thousands of products made, the company took a random sample of size $n$ and found that $k$ of them were defective. As an additional insight, suppose company A knows from past experience that around 5% of its products are defective. Determine the posterior distribution.

Solution. We know directly from the question that $X\sim\mathcal{B}(n,\theta)$, which means that the likelihood function is:

$$p(x|n,\theta)=\binom{n}{x}\theta^x(1-\theta)^{n-x}$$

One way of modelling the company's insight of $\theta$ is to use the beta distribution:

$$f_\theta(\theta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}$$

We need to come up with parameters $\alpha$ and $\beta$ that align with the company's insight; that is, we want the beta distribution to concentrate most of its mass near small values of $\theta$. Setting $\alpha=2$ and $\beta=8$ seems adequate. This means that the prior distribution is:
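As a quick sanity check on this choice of prior, we can compute the mean and mode of a $\mathrm{Beta}(\alpha,\beta)$ distribution in closed form; both land near small values of $\theta$, consistent with the intent:

```python
# Closed-form summaries of the Beta(alpha, beta) prior chosen above.
alpha, beta = 2, 8

mean = alpha / (alpha + beta)              # prior mean of theta
mode = (alpha - 1) / (alpha + beta - 2)    # prior mode (valid for alpha, beta > 1)
print(mean, mode)
```

Note that the prior mean is 0.2 rather than exactly 0.05; a prior more tightly centred on 5% would require different values of $\alpha$ and $\beta$, at the cost of a stronger (more opinionated) prior.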

$$f_\theta(\theta)=\frac{\Gamma(2+8)}{\Gamma(2)\cdot\Gamma(8)}\theta^{2-1}(1-\theta)^{8-1}=\frac{\Gamma(10)}{\Gamma(2)\cdot\Gamma(8)}\,\theta\,(1-\theta)^{7}$$

As it turns out, the posterior distribution is proportional to the likelihood times the prior:

$$f(\theta|\mathbf{x})\propto\theta^{k}(1-\theta)^{n-k}\cdot\theta\,(1-\theta)^{7}=\theta^{k+1}(1-\theta)^{n-k+7}$$

This is the kernel of a $\mathrm{Beta}(k+2,\;n-k+8)$ distribution, so the posterior is again a beta distribution.
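Because the beta prior is conjugate to the binomial likelihood, the posterior is again a beta distribution, $\mathrm{Beta}(\alpha+k,\;\beta+n-k)$. The sketch below applies this update with hypothetical values of $n$ and $k$ (not given in the text):

```python
# Beta-Binomial conjugacy: prior Beta(alpha, beta) plus data
# (k defectives out of n) yields posterior Beta(alpha + k, beta + n - k).
alpha, beta = 2, 8          # prior from the example
n, k = 50, 4                # hypothetical sample: 4 defectives out of 50

post_alpha = alpha + k
post_beta = beta + n - k

# Posterior mean of theta, the basis for Bayesian inference
post_mean = post_alpha / (post_alpha + post_beta)
print(post_alpha, post_beta, round(post_mean, 3))
```

The posterior mean, $(\alpha+k)/(\alpha+\beta+n)$, sits between the prior mean and the sample proportion $k/n$, and moves toward the latter as $n$ grows.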