Comprehensive Guide on Sample Correlation
The prerequisites of this guide are as follows:
sample variance.
sample covariance.
Sample correlation coefficient
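Suppose we have two samples $x=(x_1,x_2,\cdots,x_n)$ and $y=(y_1,y_2,\cdots,y_n)$. The sample correlation coefficient of $x$ and $y$, denoted $r_{xy}$, is computed as:
$$r_{xy}=\frac{s_{xy}}{\sqrt{s_x^2\,s_y^2}}$$
Where:
$s_{xy}$ is the sample covariance of $x$ and $y$.
$s_x^2$ and $s_y^2$ are the sample variance of $x$ and $y$ respectively.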
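As a rough illustration of this definition, the sketch below divides the sample covariance by the square root of the product of the two sample variances. The helper names `sample_covariance` and `sample_correlation` and the height/weight numbers are made up for this example, and the $n-1$ denominator is assumed for both covariance and variance.

```python
import math

def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance divided by the square root of the product of the variances."""
    var_x = sample_covariance(x, x)  # the variance is the covariance of a sample with itself
    var_y = sample_covariance(y, y)
    return sample_covariance(x, y) / math.sqrt(var_x * var_y)

# Made-up data in which weight is an exact linear function of height
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 55, 60, 65]

r = sample_correlation(heights_cm, weights_kg)
print(r)  # equal to 1 up to floating-point rounding
```

Treating the variance as the covariance of a sample with itself keeps the sketch short; a separate variance function would work just as well.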
Note that the sample correlation coefficient is sometimes referred to as:
correlation
sample correlation
Pearson product-moment correlation coefficient (PPMCC)
Pearson's correlation coefficient
Pearson’s r
Intuition behind sample correlation
Recall that the sample covariance measures the association between two variables:
Negative covariance | Zero covariance | Positive covariance |
---|---|---|
![]() | ![]() | ![]() |
For a detailed explanation and intuition behind this diagram, please consult our guide on sample covariance. The problem with covariance is that its magnitude depends on the scale of the samples, so a large covariance does not necessarily mean that two variables have a strong positive association.
The sample correlation rectifies this issue by dividing the covariance by the product of the standard deviations of the two samples. This normalization removes the units and bounds the correlation between $-1$ and $1$.
Here's how to interpret correlation:
a correlation close to $1$: there is a strong positive association between the two variables. As $x$ increases, $y$ also tends to increase. We say that there is a strong positive linear relationship between $x$ and $y$.
a correlation close to $0$: there is no linear association between the two variables. This means that $y$ does not change linearly with $x$.
a correlation close to $-1$: there is a strong negative association between the two variables. As $x$ increases, $y$ tends to decrease. We say that there is a strong negative linear relationship between $x$ and $y$.
We illustrate these cases below:
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
Notice how when
Another equation for sample correlation
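The sample correlation $r_{xy}$ can also be computed as:
$$r_{xy}=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{y})^2}}$$
Where:
$\bar{x}$ and $\bar{y}$ are the sample mean of $x$ and $y$ respectively.
$n$ is the sample size.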
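Proof. Recall that the formal definition of sample correlation is:
$$r_{xy}=\frac{s_{xy}}{\sqrt{s_x^2\,s_y^2}}$$
Where $s_{xy}$ is the sample covariance:
$$s_{xy}=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})$$
Whereas the sample variances $s_x^2$ and $s_y^2$ are:
$$s_x^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2,\qquad s_y^2=\frac{1}{n-1}\sum_{i=1}^n(y_i-\bar{y})^2$$
Substituting the covariance and the variances into the definition, the factors of $\frac{1}{n-1}$ cancel:
$$\begin{aligned}
r_{xy}&=\frac{\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2\cdot\frac{1}{n-1}\sum_{i=1}^n(y_i-\bar{y})^2}}\\
&=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{y})^2}}
\end{aligned}$$
This completes the proof.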
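A quick numerical check of this equivalence: the sketch below computes the correlation once from the covariance and variances and once from the raw sums of deviations, then compares the two. The function names and the data are hypothetical, chosen only for illustration.

```python
import math

def correlation_from_definition(x, y):
    """Covariance over the square root of the product of variances (n - 1 denominators)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return s_xy / math.sqrt(var_x * var_y)

def correlation_from_sums(x, y):
    """Same quantity written with sums of deviations only; the 1 / (n - 1) factors have cancelled."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_sq_x = sum((a - mean_x) ** 2 for a in x)
    sum_sq_y = sum((b - mean_y) ** 2 for b in y)
    return numerator / (math.sqrt(sum_sq_x) * math.sqrt(sum_sq_y))

# Made-up data with no particular pattern
x = [1.0, 2.5, 4.0, 7.5, 9.0]
y = [3.2, 1.1, 6.4, 5.9, 8.8]

r_definition = correlation_from_definition(x, y)
r_sums = correlation_from_sums(x, y)
print(math.isclose(r_definition, r_sums))  # the two forms agree up to rounding
```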
Computing the sample correlation by hand
Suppose we have the following dataset:
$x$ | $y$ |
---|---|
2 | 3 |
3 | 5 |
5 | 8 |
6 | 12 |
Compute the sample correlation of $x$ and $y$.
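Solution. Let's use the formal definition to compute the correlation coefficient:
$$r_{xy}=\frac{s_{xy}}{\sqrt{s_x^2\,s_y^2}}$$
We first need to compute the sample means $\bar{x}$ and $\bar{y}$.
In our example, $n=4$, so:
$$\bar{x}=\frac{2+3+5+6}{4}=4,\qquad \bar{y}=\frac{3+5+8+12}{4}=7$$
The covariance $s_{xy}$ is:
$$s_{xy}=\frac{1}{3}\Big[(2-4)(3-7)+(3-4)(5-7)+(5-4)(8-7)+(6-4)(12-7)\Big]=\frac{1}{3}(8+2+1+10)=\frac{21}{3}$$
Here, we're just leaving the fraction unsimplified for now.
The variance $s_x^2$ is:
$$s_x^2=\frac{1}{3}\Big[(2-4)^2+(3-4)^2+(5-4)^2+(6-4)^2\Big]=\frac{1}{3}(4+1+1+4)=\frac{10}{3}$$
The variance $s_y^2$ is:
$$s_y^2=\frac{1}{3}\Big[(3-7)^2+(5-7)^2+(8-7)^2+(12-7)^2\Big]=\frac{1}{3}(16+4+1+25)=\frac{46}{3}$$
Putting this all together:
$$r_{xy}=\frac{21/3}{\sqrt{(10/3)(46/3)}}=\frac{21}{\sqrt{460}}\approx0.979$$
Because the correlation coefficient is close to $1$, we conclude that there is a strong positive linear relationship between $x$ and $y$.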

Indeed, we can see that as $x$ increases, $y$ also increases.
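The hand calculation can be double-checked step by step. The sketch below uses Python's `fractions.Fraction` so that the intermediate covariance and variances stay exact; only the final square root is evaluated in floating point. The variable names are chosen here for illustration.

```python
from fractions import Fraction
import math

x = [2, 3, 5, 6]
y = [3, 5, 8, 12]
n = len(x)

# Sample means as exact fractions
mean_x = Fraction(sum(x), n)
mean_y = Fraction(sum(y), n)

# Sample covariance and variances with the n - 1 denominator
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)

# Correlation: covariance over the square root of the product of the variances
r = float(s_xy) / math.sqrt(var_x * var_y)

print(mean_x, mean_y)      # 4 7
print(s_xy, var_x, var_y)  # 7 10/3 46/3
print(round(r, 4))         # 0.9791
```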
Computing sample correlation using Python
We can easily compute the sample correlation using Python's numpy library:
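A minimal sketch of the call, assuming `x` and `y` hold the values from the hand-worked example and that the result is stored in a variable named `corr_matrix` (the name the later indexing snippet uses):

```python
import numpy as np

# The same values as in the hand-worked example
x = [2, 3, 5, 6]
y = [3, 5, 8, 12]

# corrcoef treats each input as one variable and returns the 2x2 correlation matrix
corr_matrix = np.corrcoef(x, y)
corr_matrix  # in a notebook, this last line displays the matrix
```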
array([[1. , 0.97913005], [0.97913005, 1. ]])
Here, x and y are the same values we used in the previous example. NumPy's corrcoef(~) function returns a symmetric correlation matrix whose diagonal entries are always 1. To extract the sample correlation, we use NumPy's [~] syntax:
corr_matrix[0][1]
0.9791300486523296
This is roughly equal to the sample correlation we computed by hand!