
Pandas DataFrame | cov method

Last updated: Jul 1, 2022
Tags: Python, Pandas

Pandas' DataFrame.cov(~) method computes the covariance matrix of the columns of the source DataFrame, using the unbiased (sample) estimator of the covariance:

$$\mathrm{cov}(\mathbf{x},\mathbf{y})=\frac{1}{N-1}\sum_{i=0}^{N-1}\left[\left(x_i-\bar{x}\right)(y_i-\bar{y})\right]$$

Where,

  • $N$ is the number of rows in which both columns hold non-NaN values

  • $\bar{x}$ is the sample mean of column $\mathbf{x}$

  • $\bar{y}$ is the sample mean of column $\mathbf{y}$

  • $x_i$ and $y_i$ are the $i$th value in the column $\mathbf{x}$ and $\mathbf{y}$ respectively.
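As a rough check of the formula above, the sketch below computes the sample covariance by hand and compares it with the value reported by DataFrame.cov(). The helper name sample_cov is hypothetical and introduced only for illustration; it assumes both columns have no missing values.

```python
import pandas as pd

def sample_cov(x: pd.Series, y: pd.Series) -> float:
    """Unbiased sample covariance of two equal-length Series.

    Hypothetical helper for illustration; assumes no NaN values.
    """
    n = len(x)
    # Sum of products of deviations from the mean, divided by N - 1.
    return float(((x - x.mean()) * (y - y.mean())).sum() / (n - 1))

df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 4, 5]})

manual = sample_cov(df["A"], df["B"])   # formula applied by hand
builtin = df.cov().loc["A", "B"]        # value from DataFrame.cov()
print(manual, builtin)
```

Both values should agree, since the deviations multiply to (-2)(-1) + (0)(0) + (2)(1) = 4, and 4 / (3 - 1) = 2.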

NOTE

NaN values are excluded pairwise: for each pair of columns, only the rows in which both values are non-NaN are used.

Parameters

1. min_periods | int | optional

The minimum number of non-NaN observations required for each pair of columns to compute the covariance. Pairs with fewer observations yield NaN.

Return Value

A DataFrame that represents the covariance matrix of the values in the source DataFrame.

Examples

Basic usage

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[2,4,6],"B":[3,4,5]})
df
   A  B
0  2  3
1  4  4
2  6  5

To compute the covariance matrix of the columns:

df.cov()
     A    B
A  4.0  2.0
B  2.0  1.0

Here, we get the following results:

  • the sample covariance of columns A and B is 2.0.

  • the sample variance of column A is 4.0 and that of column B is 1.0.
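One way to sanity-check these entries is to compare them against Series.var() and Series.cov(), which compute the same quantities one column (or one pair of columns) at a time. This is only a sketch; the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 4, 5]})
cov_matrix = df.cov()

# Diagonal entries are the sample variances of each column
# (Series.var() uses ddof=1 by default, matching the N - 1 divisor).
var_a = df["A"].var()
var_b = df["B"].var()

# Off-diagonal entries are the pairwise sample covariances;
# the matrix is symmetric, so (A, B) equals (B, A).
cov_ab = df["A"].cov(df["B"])

print(var_a, var_b, cov_ab)
```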

Specifying min_periods

Consider the following DataFrame with some missing values:

import numpy as np

df = pd.DataFrame({"A":[3,np.nan,4],"B":[5,6,7]})
df
     A  B
0  3.0  5
1  NaN  6
2  4.0  7

Setting min_periods=3 yields:

df.cov(min_periods=3)
    A    B
A NaN  NaN
B NaN  1.0

Here, we get NaN because the method ignores missing values, which leaves column A with only 2 usable values. Since we've set the minimum threshold to 3, every entry that involves column A is NaN. Column B has 3 non-NaN values, so its variance is still computed as 1.0.
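For comparison, the sketch below contrasts the default call with min_periods=3 on the same data. It illustrates the pairwise handling of NaN: without a threshold, each entry is computed from whichever rows are complete for that particular pair of columns. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, np.nan, 4], "B": [5, 6, 7]})

# Default: no minimum threshold, so pairs involving A use rows 0 and 2 only,
# while the variance of B uses all three rows.
default_cov = df.cov()

# With min_periods=3, any pair with fewer than 3 complete rows becomes NaN.
strict_cov = df.cov(min_periods=3)

print(default_cov)
print(strict_cov)
```

In the default result, the variance of A comes from the two values 3 and 4, and the covariance of A and B comes from the pairs (3, 5) and (4, 7).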

Published by Isshin Inada