Pandas DataFrame | cov method
Pandas DataFrame.cov(~) method computes the covariance matrix of the columns in the source DataFrame. Note that the unbiased estimator of the covariance is used:
$$\mathrm{cov}(\mathbf{x},\mathbf{y})=\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})$$
Where,
$N$ is the number of values in a column
$\bar{x}$ is the sample mean of column $\mathbf{x}$
$\bar{y}$ is the sample mean of column $\mathbf{y}$
$x_i$ and $y_i$ are the $i$th values in columns $\mathbf{x}$ and $\mathbf{y}$ respectively.
NaN values are ignored: for each pair of columns, only the rows in which both values are non-NaN are used.
Parameters
1. min_periods | int | optional
The minimum number of non-NaN values required to compute the covariance for a pair of columns. Pairs with fewer values yield NaN.
Return Value
A DataFrame that represents the covariance matrix of the values in the source DataFrame.
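As a rough illustration of the shape of the return value, the sketch below checks that the result is a square, symmetric DataFrame labelled by the source column names, and that its diagonal holds each column's sample variance. The data and the variable names (`is_symmetric`, `diag_matches_var`) are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only for illustration
df = pd.DataFrame({
    "A": [1.0, 2.0, 4.0],
    "B": [2.0, 1.0, 5.0],
    "C": [0.0, 3.0, 3.0],
})

cov = df.cov()

# Rows and columns are both labelled by the source column names
print(list(cov.index), list(cov.columns))

# The matrix is symmetric, since cov(x, y) equals cov(y, x)
is_symmetric = np.allclose(cov.values, cov.values.T)

# The diagonal holds each column's sample variance (df.var() uses ddof=1 by default)
diag_matches_var = np.allclose(np.diag(cov.values), df.var().values)

print(is_symmetric, diag_matches_var)
```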
Examples
Basic usage
Consider the following DataFrame:
df
   A  B
0  2  3
1  4  4
2  6  5
To compute the covariance matrix of the two columns:
df.cov()
     A    B
A  4.0  2.0
B  2.0  1.0
Here, we get the following results:
the sample covariance of columns A and B is 2.0.
the sample variance of column A is 4.0 and that of column B is 1.0.
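To see where these numbers come from, here is a minimal sketch that applies the unbiased estimator (dividing by N - 1) by hand to the same values and compares the result with df.cov(). The helper name `sample_cov` is hypothetical and not part of Pandas, and it assumes the inputs contain no NaN.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 4, 5]})

def sample_cov(x, y):
    """Unbiased sample covariance of two equal-length Series without NaN."""
    n = len(x)
    # Sum of products of deviations from the mean, divided by N - 1
    return ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

cov_ab = sample_cov(df["A"], df["B"])   # covariance of A and B
var_a = sample_cov(df["A"], df["A"])    # variance of A (covariance with itself)
var_b = sample_cov(df["B"], df["B"])    # variance of B

print(cov_ab, var_a, var_b)
print(df.cov())
```

For column A the deviations from the mean are -2, 0 and 2, and for column B they are -1, 0 and 1, so the covariance is (2 + 0 + 2) / 2 = 2.0, which matches the off-diagonal entries returned by df.cov().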
Specifying min_periods
Consider the following DataFrame with some missing values:
df
     A    B
0  3.0  5.0
1  NaN  6.0
2  4.0  7.0
Setting min_periods=3 yields:
df.cov(min_periods=3)
     A    B
A  NaN  NaN
B  NaN  1.0
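Here, every entry involving column A is NaN because the method ignores NaN, leaving column A with only 2 values. Since we've set the minimum number of values required to compute the covariance to 3, neither the variance of A nor the covariance of A and B can be computed. Column B has 3 values, so its variance (1.0) is still returned.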
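For comparison, the following sketch rebuilds the same DataFrame and lowers the threshold to min_periods=2, so that the two rows where column A is non-NaN are enough to compute every entry. The variable names `strict` and `relaxed` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3.0, np.nan, 4.0], "B": [5.0, 6.0, 7.0]})

# Threshold of 3: pairs involving A have only 2 usable rows, so they become NaN
strict = df.cov(min_periods=3)

# Threshold of 2: rows 0 and 2 are enough for every pair involving A
relaxed = df.cov(min_periods=2)

print(strict)
print(relaxed)
```

With the lower threshold, the covariance of A and B is computed from rows 0 and 2 only, while the variance of B still uses all three of its values.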