Introduction to Bayesian Statistics

Philosophy

The pivotal difference between Frequentist and Bayesian statistics is that the former considers the population parameter to be fixed, while the latter treats it as a random variable. This difference in the interpretation of the nature of population parameters carries significant implications: in the eyes of Bayesians, it makes sense to assign probability distributions to parameters and to compute their expected values and variances, just as we do for any other random variable.

Bayes' Theorem

Bayes' theorem is as follows:

$$\mathbb{P}(A|B)=\frac{\mathbb{P}(B|A)\cdot{}\mathbb{P}(A)}{\mathbb{P}(B)} $$

If the events $B_1,B_2,…,B_k$ constitute a partition of the sample space $S$ such that $\mathbb{P}(B_i)\ne0$ for $i=1,2,…,k$, then for any event $A$ in $S$ such that $\mathbb{P}(A)\ne0$:

$$\mathbb{P}(B_r|A)= \frac{\mathbb{P}(B_r\cap{A})}{\sum^k_{i=1}\mathbb{P}(B_i\cap{A})} =\frac{\mathbb{P}(B_r)\cdot{\mathbb{P}(A|B_r)}}{\sum^k_{i=1}\mathbb{P}(B_i)\cdot{\mathbb{P}(A|B_i)}}$$
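To make the formula concrete, here is a small numerical illustration in Python. The partition (three suppliers) and all probabilities are hypothetical numbers chosen for this example:

```python
# Hypothetical scenario: a part comes from one of three suppliers B1, B2, B3
# (a partition of the sample space), and A = "the part is defective".
prior = {"B1": 0.5, "B2": 0.3, "B3": 0.2}          # P(B_i), sums to 1
likelihood = {"B1": 0.01, "B2": 0.02, "B3": 0.05}  # P(A | B_i)

# Denominator: P(A) = sum_i P(B_i) * P(A | B_i)   (rule of elimination)
p_a = sum(prior[b] * likelihood[b] for b in prior)

# Bayes' theorem: P(B_r | A) for each supplier
posterior = {b: prior[b] * likelihood[b] / p_a for b in prior}
print(posterior)   # {'B1': ~0.238, 'B2': ~0.286, 'B3': ~0.476}
```

Note that the posterior probabilities sum to 1, which is exactly what the denominator guarantees.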

For the continuous case:

$$\begin{equation}\label{eq:oyWcmvy42akS8TCH9AF} f(\theta|x)=\frac{f(x|\theta)\cdot{f(\theta)}}{\int{f(x|\theta)\cdot{f(\theta)}\;d\theta}} \end{equation}$$

Here, the denominator is a normalising constant for the posterior distribution and does not depend on $\theta$. For this reason, we often write \eqref{eq:oyWcmvy42akS8TCH9AF} as:

$$f(\theta|x)\propto{f}(x|\theta)\cdot{f(\theta)}$$
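Since the denominator is a constant, one common numerical trick is to evaluate the unnormalised product $f(x|\theta)\cdot f(\theta)$ on a grid of $\theta$ values and then normalise by its approximate integral. Below is a minimal sketch of this idea; the normal prior and likelihood are arbitrary choices for illustration:

```python
import numpy as np

theta = np.linspace(-5, 5, 1001)   # grid over the parameter space
dtheta = theta[1] - theta[0]

# Hypothetical choices for illustration: a standard normal prior on theta,
# and a normal likelihood for a single observation x with unit variance.
prior = np.exp(-theta**2 / 2) / np.sqrt(2 * np.pi)
x = 1.2
likelihood = np.exp(-(x - theta)**2 / 2) / np.sqrt(2 * np.pi)

unnormalised = likelihood * prior                          # f(x|theta) * f(theta)
posterior = unnormalised / (unnormalised.sum() * dtheta)   # Riemann-sum normalisation

print((posterior * dtheta).sum())  # ≈ 1.0: the posterior is a proper density
```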

Proof. By the definition of conditional probability, $\mathbb{P}(B_r|A)=\mathbb{P}(B_r\cap{A})/\mathbb{P}(A)$ and $\mathbb{P}(B_r\cap{A})=\mathbb{P}(B_r)\cdot\mathbb{P}(A|B_r)$. The rule of elimination lets us expand the denominator over the partition:

$$\mathbb{P}(A)= \sum^k_{i=1}\mathbb{P}(B_i\cap{A})= \sum^k_{i=1} \mathbb{P}(B_i)\cdot{\mathbb{P}(A|B_i)}$$

Substituting these two expressions into the ratio yields Bayes' theorem.

The continuous case is analogous: the denominator of \eqref{eq:oyWcmvy42akS8TCH9AF} is the marginal density of $x$, obtained by combining conditional probability with the rule of elimination:

$$f_X(x)=\int^\infty_{-\infty}f_X(x|\theta)\cdot{f_{\theta}(\theta)}\;d\theta$$
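As a sanity check that this integral really plays the role of the normalising constant, we can compute it numerically for a concrete model. The sketch below assumes a binomial likelihood and a beta prior (the same combination used in the example further down, with hypothetical values for $n$ and $x$), and compares the quadrature result against the known closed-form beta-binomial marginal:

```python
from math import comb
from scipy import integrate
from scipy.special import beta as beta_fn

n, x = 20, 3        # hypothetical sample size and observed defect count
a, b = 2, 8         # Beta prior parameters (matching the example below)

# Prior density f(θ) = θ^(a-1) (1-θ)^(b-1) / B(a, b)
prior = lambda t: t**(a - 1) * (1 - t)**(b - 1) / beta_fn(a, b)

# f_X(x) = ∫ f(x|θ) f(θ) dθ, evaluated by numerical quadrature
integrand = lambda t: comb(n, x) * t**x * (1 - t)**(n - x) * prior(t)
marginal, _ = integrate.quad(integrand, 0, 1)

# Known closed form (beta-binomial): C(n,x) · B(x+a, n-x+b) / B(a,b)
closed_form = comb(n, x) * beta_fn(x + a, n - x + b) / beta_fn(a, b)

print(marginal, closed_form)  # the two values agree up to quadrature error
```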

Terminology

Prior Distribution

The initial probability distribution assigned to the parameter is referred to as the prior distribution. Note that the prior distribution is given before we process the data, hence the name "prior".

Likelihood Function

The likelihood function that we are used to is often denoted $L(\theta|x)$, which makes explicit that we are treating $\theta$ as the variable. In the Bayesian world, the likelihood is instead written as $f(x|\theta)$, which appears to be in the reverse order at first glance. However, $x$ is still treated as a constant (the observed data), while $\theta$ is treated as the variable. For this very reason, we do not regard $f(x|\theta)$, viewed as a function of $\theta$, as a probability distribution.
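We can check this numerically. For a binomial model with the data held fixed, integrating $f(x|\theta)$ over $\theta$ does not give 1, whereas summing over $x$ for a fixed $\theta$ does. A small sketch with hypothetical numbers:

```python
import numpy as np
from scipy.stats import binom

n, x = 10, 3                      # hypothetical fixed data
theta = np.linspace(0, 1, 1001)   # the likelihood varies over theta
dtheta = theta[1] - theta[0]

likelihood = binom.pmf(x, n, theta)  # f(x | theta) as a function of theta

# Integrating over theta does NOT give 1, so this is not a density in theta...
print((likelihood * dtheta).sum())   # ≈ 1/(n+1) ≈ 0.0909, not 1

# ...whereas summing over x for a fixed theta gives 1 (a genuine pmf in x).
print(binom.pmf(np.arange(n + 1), n, 0.3).sum())  # = 1.0
```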

Posterior Distribution

Once we incorporate the observed data, the prior distribution is updated to the posterior distribution, $f_\theta(\theta|\mathbf{x})$. The posterior distribution is the basis for statistical inference in the Bayesian world.

Example

Question. Company A is trying to estimate what proportion of its products are defective. Out of the thousands of products made, the company took a random sample of size $n$ and found that $x$ of them are defective. As an additional insight, suppose company A knows from past experience that around 5% of its products are defective. Determine the posterior distribution of the defect rate $\theta$.

Solution. We know directly from the question that $X\sim\mathcal{B}(n,\theta)$, which means that the likelihood function is:

$$p(x|n,\theta)=\binom{n}{x}\theta^x(1-\theta)^{n-x}$$

One way of modelling the company's insight about $\theta$ is to use the beta distribution:

$$f_\theta(\theta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}$$

We need to come up with parameters $\alpha$ and $\beta$ that align with the company's insight, that is, we want the beta distribution to concentrate most of its mass near small values of $\theta$. Setting $\alpha=2$ and $\beta=8$ seems adequate. This means that the prior distribution is:

$$f_\theta(\theta)=\frac{\Gamma(2+8)}{\Gamma(2)\cdot\Gamma(8)}\theta^{1}(1-\theta)^{7}$$

As it turns out, the posterior distribution is again a beta distribution. Multiplying the likelihood by the prior and dropping all factors that do not depend on $\theta$:

$$f(\theta|\mathbf{x})\propto p(x|n,\theta)\cdot f_\theta(\theta)\propto \theta^{x}(1-\theta)^{n-x}\cdot\theta^{1}(1-\theta)^{7}=\theta^{(x+2)-1}(1-\theta)^{(n-x+8)-1}$$

This is the kernel of a beta distribution, so $\theta|\mathbf{x}\sim\text{Beta}(x+2,\;n-x+8)$. The beta prior is said to be conjugate to the binomial likelihood, since the posterior belongs to the same family as the prior.
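We can reproduce this conjugate update in code. The sketch below plugs in hypothetical data ($x=7$ defective products in a sample of $n=100$) together with the $\text{Beta}(2,8)$ prior, and compares the prior and posterior means using scipy:

```python
from scipy.stats import beta

a, b = 2, 8          # prior Beta(alpha, beta) parameters from the example
n, x = 100, 7        # hypothetical: 7 defective products in a sample of 100

# Conjugate update: posterior is Beta(alpha + x, beta + n - x)
posterior = beta(a + x, b + n - x)

print(beta(a, b).mean())         # prior mean: 2/10 = 0.2
print(posterior.mean())          # posterior mean: 9/110 ≈ 0.0818
print(posterior.interval(0.95))  # a 95% credible interval for theta
```

The posterior mean of roughly 0.082 sits between the prior mean and the observed sample proportion $7/100$, illustrating how the data pulls the prior belief towards the evidence.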