# Pandas DataFrame | var method

Last updated: Jul 1, 2022

Pandas' DataFrame.var(~) method computes the variance of each row or column of the source DataFrame. By default, the unbiased (sample) variance is computed using the following formula:

$$\frac{1}{N-1}\sum_{i=0}^{N-1}\left(x_i-\bar{x}\right)^2$$

Where,

$N$ is the size of the row or column

$x_i$ is the value of the $i$-th index in the row or column

$\bar{x}$ is the mean of the values in the row or column.

NOTE

The var(~) method can also compute the population variance. We do this by setting ddof=0.
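As a quick sanity check, the unbiased variance (sum of squared deviations from the mean, divided by $N-1$) can be computed by hand and compared with var(~). This is only a minimal sketch: the helper name `manual_var` is made up for illustration and is not part of Pandas.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

def manual_var(values):
    """Unbiased variance: squared deviations from the mean, summed, over N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Apply the hand-written formula to each column and compare with var(~)
manual = pd.Series({col: manual_var(df[col].tolist()) for col in df.columns})
print(manual)
print(df.var())
```

Both printed Series should show 4.0 for column A and 9.0 for column B.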

# Parameters

1. axis | int or string | optional

Whether to compute the variance column-wise or row-wise:

| Axis | Description |
|---|---|
| "index" or 0 | Variance is computed for each column. |
| "columns" or 1 | Variance is computed for each row. |

By default, axis=0.
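The string and integer forms of axis should be interchangeable. The following sketch, which uses a small DataFrame chosen only for illustration, checks that each string alias gives the same result as its integer counterpart:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

col_var = df.var(axis="index")    # same as axis=0: one value per column
row_var = df.var(axis="columns")  # same as axis=1: one value per row

print(col_var.index.tolist())  # labels are the column names
print(row_var.index.tolist())  # labels are the row labels
```

Note that the result for axis=0 is labelled by column names, while the result for axis=1 is labelled by the row index.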

2. skipna | boolean | optional

Whether or not to skip NaN. Skipped NaN would not count towards the total size ($N$). By default, skipna=True.
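To illustrate the effect of skipna, here is a minimal sketch with a single column containing one NaN (the DataFrame `df_nan` is made up for this example). With skipna=True, only 3 and 7 are used, so $N=2$; with skipna=False, the NaN propagates to the result:

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({"A": [3, np.nan, 7]})

skipped = df_nan.var()                  # skipna=True: NaN is ignored, N = 2
not_skipped = df_nan.var(skipna=False)  # NaN is kept, so the result is NaN

print(skipped)
print(not_skipped)
```

Here, the skipped variance is ((3-5)^2 + (7-5)^2) / (2-1) = 8.0.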

3. level | string or int | optional

The name or the integer index of the level to consider. This is needed only if your DataFrame is Multi-index.
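Be aware that the level parameter was deprecated in Pandas 1.3 and, as far as we know, removed from var(~) in Pandas 2.0. A sketch that should work regardless of version is to group by the level first and then compute the variance. The multi-index DataFrame and the level names "outer" and "inner" below are assumptions made for illustration:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["outer", "inner"]
)
df_multi = pd.DataFrame({"A": [3, 5, 7, 11]}, index=index)

# Variance of column A within each value of the "outer" level
by_outer = df_multi.groupby(level="outer").var()
print(by_outer)
```

The result is a DataFrame with one row per value of the chosen level: 2.0 for "a" and 8.0 for "b".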

4. ddof | int | optional

The delta degrees of freedom. This can be used to modify the denominator:

$$\frac{1}{N\color{#6495ed}{-ddof}}\sum_{i=0}^{N-1}\left(x_i-\bar{x}\right)^2$$

By default, ddof=1.
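Setting ddof=0 divides by $N$ instead of $N-1$, which gives the population variance. This is also the default of NumPy's np.var, so the two should agree. A minimal sketch on a small illustrative DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

sample_var = df.var()                   # ddof=1 (default): divide by N - 1
pop_var = df.var(ddof=0)                # ddof=0: divide by N
np_var = np.var(df.to_numpy(), axis=0)  # NumPy uses ddof=0 by default

print(sample_var)
print(pop_var)
print(np_var)
```

For column B, the sum of squared deviations is 18, so the sample variance is 18/2 = 9.0 while the population variance is 18/3 = 6.0.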

5. numeric_only | None or boolean | optional

The allowed values are as follows:

| Value | Description |
|---|---|
| True | Only numeric rows/columns will be considered (e.g. float, int, boolean). |
| False | Attempt computation with all types (e.g. strings and dates), and raise an error whenever the variance cannot be computed. |
| None | Attempt computation with all types, and silently ignore any row/column whose variance cannot be computed instead of raising an error. |

Note that the variance can only be computed when the + operator is well-defined between the types.

By default, numeric_only=None.

# Return Value

If the level parameter is specified, then a DataFrame will be returned. Otherwise, a Series will be returned.

# Examples

Consider the following DataFrame:

    import pandas as pd

    df = pd.DataFrame({"A":[3,5,7], "B":[2,5,8]})
    df

       A  B
    0  3  2
    1  5  5
    2  7  8

## Column-wise variance

To compute the variance for each column:

    df.var()   # axis=0

    A    4.0
    B    9.0
    dtype: float64

## Row-wise variance

To compute the variance for each row:

    df.var(axis=1)

    0    0.5
    1    0.0
    2    0.5
    dtype: float64

## Specifying numeric_only

Consider the following DataFrame:

    df = pd.DataFrame({"A":[3,5], "B":[True,5], "C":["x",7]})
    df

       A     B  C
    0  3  True  x
    1  5     5  7

Here, columns B and C are of mixed type.

### None

By default, numeric_only=None, which means that rows/columns with mixed types will also be considered:

    df.var()   # numeric_only=None

    A    2.0
    B    8.0
    dtype: float64

The variance is still computable for column B because True is internally represented as 1 in Pandas. In contrast, the variance for column C cannot be computed since "x"+7 is undefined.
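Note that numeric_only=None is, to our knowledge, no longer the default from Pandas 2.0 onwards (the default became False), so the call above may raise an error on newer versions. One version-independent way to get the same result is to cast column B to integers explicitly and then pass numeric_only=True. This is only a sketch of one possible workaround, not the behaviour of numeric_only=None itself:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5], "B": [True, 5], "C": ["x", 7]})

# Cast B explicitly so that True becomes 1; column C stays as object dtype
df_cast = df.astype({"B": int})

# Only the numeric columns (A and B) are considered; C is skipped
result = df_cast.var(numeric_only=True)
print(result)
```

Column B becomes [1, 5], whose variance is 8.0, which matches the output shown above.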

### False

numeric_only=False means that the rows/columns of mixed type will also be considered, but an error will be raised if the variance is not computable:

    df.var(numeric_only=False)

    TypeError: could not convert string to float: 'x'

### True

To compute the variance of numeric rows/columns only:

    df.var(numeric_only=True)

    A    2.0
    dtype: float64
