# Pandas DataFrame | cov method

Last updated: Jul 1, 2022
Pandas' DataFrame.cov(~) method computes the covariance matrix of the columns in the source DataFrame. Note that, by default, the unbiased estimator of the covariance is used:

$$\mathrm{cov}(\mathbf{x},\mathbf{y})=\frac{1}{N-1}\sum_{i=0}^{N-1}\left[\left(x_i-\bar{x}\right)(y_i-\bar{y})\right]$$

Where,

• $N$ is the number of values in a column (counting only the rows where both $\mathbf{x}$ and $\mathbf{y}$ are non-NaN)

• $\bar{x}$ is the sample mean of column $\mathbf{x}$

• $\bar{y}$ is the sample mean of column $\mathbf{y}$

• $x_i$ and $y_i$ are the $i$th values in columns $\mathbf{x}$ and $\mathbf{y}$, respectively.
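As a minimal sketch of the formula above, the covariance of two columns can be computed by hand and compared with the result of cov(). The helper name sample_cov and the column names x and y are hypothetical and chosen only for illustration; the data contains no NaN.

```python
import pandas as pd

# Hypothetical data without missing values
df = pd.DataFrame({"x": [1.0, 2.0, 4.0, 7.0], "y": [2.0, 3.0, 7.0, 8.0]})

def sample_cov(x, y):
    """Unbiased sample covariance of two equal-length Series without NaN."""
    n = len(x)
    # Sum of products of deviations from the mean, divided by N - 1
    return ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

manual = sample_cov(df["x"], df["y"])
builtin = df.cov().loc["x", "y"]
print(manual, builtin)
```

Both values should agree up to floating-point rounding.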

NOTE

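All NaN values are ignored. The exclusion is pairwise: for each pair of columns, only the rows where both values are non-NaN are used.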
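The following sketch illustrates how NaN values are skipped, using made-up data. It assumes the pairwise behavior of cov(): the covariance of A and B is computed from the rows where both columns have a value, while the variance of B on the diagonal is computed from all of B's non-NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column A has one missing value
df = pd.DataFrame({"A": [1.0, np.nan, 3.0, 5.0], "B": [2.0, 4.0, 8.0, 6.0]})
cov = df.cov()

# Off-diagonal entry: matches the covariance over rows complete for both A and B
complete = df.dropna()
pair_cov = complete["A"].cov(complete["B"])

# Diagonal entry for B: matches the variance over all four values of B
b_var_all = df["B"].var()

print(cov.loc["A", "B"], pair_cov)
print(cov.loc["B", "B"], b_var_all)
```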

# Parameters

1. min_periods | int | optional

The minimum number of non-NaN observations required for each pair of columns to compute the covariance. Pairs with fewer observations yield NaN.

# Return Value

A DataFrame that represents the covariance matrix of the columns in the source DataFrame.

# Examples

## Basic usage

Consider the following DataFrame:

    import pandas as pd

    df = pd.DataFrame({"A":[2,4,6],"B":[3,4,5]})
    df

       A  B
    0  2  3
    1  4  4
    2  6  5

To compute the covariance matrix of the columns:

    df.cov()

         A    B
    A  4.0  2.0
    B  2.0  1.0

Here, we get the following results:

• the sample covariance of columns A and B is 2.0.

• the sample variance of column A is 4.0 and that of column B is 1.0.
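As a cross-check, the same matrix can be reproduced with NumPy's np.cov, which also divides by N-1 by default. This is only a sketch for verification; rowvar=False tells NumPy to treat each column as a variable, matching the layout of a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 4, 5]})

pandas_cov = df.cov().to_numpy()
# rowvar=False: columns are variables, rows are observations
numpy_cov = np.cov(df.to_numpy(), rowvar=False)

print(np.allclose(pandas_cov, numpy_cov))
```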

## Specifying min_periods

Consider the following DataFrame with some missing values:

    import numpy as np

    df = pd.DataFrame({"A":[3,np.nan,4],"B":[5,6,7]})
    df

         A  B
    0  3.0  5
    1  NaN  6
    2  4.0  7

Setting min_periods=3 yields:

    df.cov(min_periods=3)

         A    B
    A  NaN  NaN
    B  NaN  1.0

Here, every entry involving column A is NaN because the method ignores NaN, which leaves column A with only 2 usable values. Since we've set the minimum threshold to compute the covariance to be 3, those entries cannot be computed. The variance of column B is still returned because B has 3 non-NaN values.
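For contrast, here is a short sketch comparing two thresholds on the same data. With min_periods=2, the 2 usable values in column A are enough, so no entry of the result should be NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, np.nan, 4], "B": [5, 6, 7]})

strict = df.cov(min_periods=3)   # pairs involving A have only 2 observations
relaxed = df.cov(min_periods=2)  # 2 observations now meet the threshold

print(strict)
print(relaxed)
```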

