Sample estimators

Last updated: Oct 5, 2022

What is a sample estimator?

To estimate a population parameter, we typically draw a small sample from the population and compute the estimate from that sample. For instance, to estimate the average salary of fresh graduates, we would find 100 random fresh graduates and ask for their salaries. We would then take the average of the 100 salaries as our estimate of the mean salary of all fresh graduates.

Because we are picking random fresh graduates, the salaries will also be random. Therefore, we can represent the salary as a random variable $X$. Our sample $\boldsymbol{X}$ consists of $n=100$ random salaries:

$$\boldsymbol{X}=(X_1,X_2,\cdots,X_n)$$

Intuition should tell you that the mean of this sample would be a good estimate of the mean salary of all fresh graduates:

$$\bar{X}=\frac{1}{n}\sum^n_{i=1}X_i$$

Here, $\bar{X}$ is known as the sample mean. More generally, the sample mean is a type of sample estimator because we are using sampled data to estimate a population parameter. Mathematically, a sample estimator is defined as a function of observations $X_1,X_2,\cdots,X_n$ of a random sample that estimates a population parameter.
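To make the definition concrete, here is a minimal Python sketch of the sample mean as a function of the observations. The salary data is simulated: the normal distribution and its parameters (mean 1200, standard deviation 300) are assumptions made purely for illustration and do not come from real data.

```python
import numpy as np

def sample_mean(observations):
    """Sample mean: the sum of the observations divided by their count n."""
    observations = np.asarray(observations, dtype=float)
    return observations.sum() / observations.size

# Hypothetical salaries of n = 100 random fresh graduates.
# The distribution and its parameters are illustrative assumptions.
rng = np.random.default_rng(seed=0)
salaries = rng.normal(loc=1200.0, scale=300.0, size=100)

x_bar = sample_mean(salaries)
print(f"Sample mean of {salaries.size} salaries: {x_bar:.2f}")
```

The function takes only the observed sample as input, which is what makes it an estimator: it never needs to know the true population mean.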

Defining multiple estimators for a population parameter

The sample mean is the most common estimator for the population mean, but we could use any other estimator for the population mean. For instance, consider the estimator $Y$ below:

$$Y=\frac{1}{n-1}\sum^n_{i=1}X_i$$

We could also use $Y$ to estimate the population mean, although we will later prove that the sample mean $\bar{X}$ is a better estimator than $Y$. The point here is that estimators can take any form: there are potentially infinitely many estimators we can define for a population parameter. Of course, some estimators possess properties that make them more desirable than others.
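As a rough preview of why $\bar{X}$ is preferable, the sketch below computes both estimators on many simulated samples. Since $Y=\frac{n}{n-1}\bar{X}$, the estimator $Y$ is always a scaled-up version of the sample mean. The population model (normal with mean 1200 and standard deviation 300) and the number of repetitions are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

population_mean = 1200.0  # assumed true mean, for illustration only
n = 100                   # sample size
num_samples = 10_000      # number of repeated samples (illustrative choice)

# Each row is one simulated sample of n salaries.
samples = rng.normal(loc=population_mean, scale=300.0, size=(num_samples, n))

x_bars = samples.sum(axis=1) / n        # sample mean of each sample
ys = samples.sum(axis=1) / (n - 1)      # estimator Y of each sample

print(f"Average of sample means: {x_bars.mean():.2f}")
print(f"Average of Y estimates:  {ys.mean():.2f}")
```

Under these assumptions, the average of the $Y$ estimates sits noticeably further from the assumed population mean than the average of the sample means does, which hints at the property the later proof formalizes.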

Estimators are random variables

Because a sample estimator is a function of the random variables $X_i$, the sample estimator itself is a random variable. Here's a diagram that illustrates this point for the sample mean:

Every time we draw a sample from the population, we will end up with different values for the sample mean because the fresh graduates we ask are randomly chosen. For instance, for the first sample, the sample mean could be $\$1200$, and for the second sample, the sample mean could be $\$1100$.

Because estimators are random variables, it makes sense to talk about properties related to random variables such as their expected value, variance and probability distribution. In the upcoming sections, we will derive these properties for common sample estimators such as the sample mean.

Sampling distribution of an estimator

Suppose we ask 100 random fresh graduates for their salaries to form a single sample. Let's say the mean of this sample is $\$1200$. Next, we ask another 100 random fresh graduates to obtain a second sample. This time, let's say the sample mean is $\$1100$. Suppose we repeat this sampling process 1000 times, which means we would end up with 1000 samples (each of size 100) and of course, 1000 sample means. Let's illustrate the sampling process:

Let's plot the frequency histogram of our 1000 sample means:

This distribution approximates the sampling distribution of the sample mean. The sampling distribution of an estimator is the theoretical distribution of the estimates that the estimator would produce over an infinite number of samples of the same size. The histogram above, built from 1000 samples, is an empirical approximation of the sampling distribution of the sample mean. Other estimators may have a different sampling distribution.
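The repeated sampling process described above can be sketched in Python. The sketch draws 1000 samples of size 100, computes the mean of each, and prints a rough text histogram of the 1000 sample means. The salary distribution (normal with mean 1200 and standard deviation 300) and the number of bins are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

num_samples = 1000  # number of repeated samples
n = 100             # graduates asked per sample

# Each row is one sample of n hypothetical salaries.
samples = rng.normal(loc=1200.0, scale=300.0, size=(num_samples, n))

# One sample mean per row, giving 1000 sample means in total.
sample_means = samples.mean(axis=1)

# Frequency histogram of the sample means (20 bins is an arbitrary choice).
counts, bin_edges = np.histogram(sample_means, bins=20)
for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    bar = "#" * (int(count) // 5)
    print(f"{left:8.1f} to {right:8.1f}: {bar}")
```

Rerunning with a different seed gives a slightly different histogram, because each histogram is itself built from a finite number of random samples.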

As we shall explore later, the central limit theorem states that, provided $X$ has finite variance and the sample size $n$ is large enough, the sampling distribution of the sample mean is approximately normal regardless of the true distribution of our original random variable $X$. Therefore, it's no coincidence that we ended up with a normal-looking distribution above!
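A small simulation can make this claim more tangible. The sketch below deliberately uses a strongly skewed population, an exponential distribution with mean 1200 (an assumption chosen for illustration, not a model of real salaries), and compares the skewness of the raw observations with the skewness of the sample means. The `skewness` helper is defined here for the example and is not a NumPy function.

```python
import numpy as np

def skewness(values):
    """Standardized third moment: roughly 0 for a symmetric distribution."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.mean(centered**3) / values.std() ** 3

rng = np.random.default_rng(seed=3)

n = 100               # sample size
num_samples = 10_000  # number of repeated samples (illustrative choice)
scale = 1200.0        # mean (and standard deviation) of the exponential population

# Each row is one sample drawn from the skewed population.
samples = rng.exponential(scale=scale, size=(num_samples, n))
sample_means = samples.mean(axis=1)

print(f"Skewness of raw observations: {skewness(samples.ravel()):.2f}")
print(f"Skewness of sample means:     {skewness(sample_means):.2f}")
print(f"Std of sample means:          {sample_means.std():.2f}")
print(f"Population std / sqrt(n):     {scale / np.sqrt(n):.2f}")
```

Under these assumptions, the raw observations are heavily skewed while the sample means are close to symmetric, and the spread of the sample means is close to the population standard deviation divided by $\sqrt{n}$.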

Common sample estimators

In most cases, we are interested in estimating population parameters such as the population mean, variance, covariance and correlation. We will cover the sample estimators for each of these parameters:

  • sample mean to estimate population mean.

  • sample variance to estimate population variance.

  • sample covariance to estimate population covariance.

  • sample correlation to estimate population correlation.

We will also discuss basic statistical properties of sample estimators such as bias and mean squared error.

Published by Isshin Inada