# Pandas DataFrame | var method

Aug 11, 2023

The Pandas `DataFrame.var(~)` method computes the variance of each row or column of the source DataFrame. The (unbiased) variance is computed using the following formula:

$$s^2=\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$$

Where,

$N$ is the size of the row or column

$x_i$ is the value of the $i$-th index in the row or column

$\bar{x}$ is the mean of the values in the row or column.

The `var(~)` method can also compute the population variance. We do this by setting `ddof=0`.
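As a rough check of the formula above, the sketch below compares `var(~)` against a hand-written calculation for both `ddof=1` and `ddof=0`. The helper `manual_var` is a hypothetical name introduced only for this illustration; it is not part of Pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

def manual_var(values, ddof=1):
    """Sum of squared deviations from the mean, divided by (N - ddof)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.sum() / n
    return ((values - mean) ** 2).sum() / (n - ddof)

sample_var = df.var()            # ddof=1 (unbiased), the default
population_var = df.var(ddof=0)  # divides by N instead of N - 1

print(sample_var["A"], manual_var(df["A"]))      # 4.0 4.0
print(population_var["B"], manual_var(df["B"], ddof=0))
```

For column `A`, the squared deviations sum to 8, so the unbiased variance is 8/2 and the population variance is 8/3.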

# Parameters

1. `axis` | `int` or `string` | `optional`

Whether to compute the variance column-wise or row-wise:

| Axis | Description |
|---|---|
| `0` or `"index"` | Variance is computed for each column. |
| `1` or `"columns"` | Variance is computed for each row. |

By default, `axis=0`.

2. `skipna` | `boolean` | `optional`

Whether or not to skip `NaN`. Skipped `NaN` would not count towards the total size ($N$). By default, `skipna=True`.

3. `level` | `string` or `int` | `optional`

The name or the integer index of the level to consider. This is needed only if your DataFrame has a MultiIndex. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.

4. `ddof` | `int` | `optional`

The delta degrees of freedom. This can be used to modify the denominator:

$$\frac{1}{N-\text{ddof}}\sum_{i=1}^{N}(x_i-\bar{x})^2$$

By default, `ddof=1`.

5. `numeric_only` | `None` or `boolean` | `optional`

The allowed values are as follows:

| Value | Description |
|---|---|
| `True` | Only numeric rows/columns will be considered (e.g. `int`, `float`, `boolean`). |
| `False` | Attempt computation with all types (e.g. strings and dates), and throw an error whenever the variance cannot be computed. |
| `None` | Attempt computation with all types, and ignore all rows/columns whose variance cannot be computed without raising an error. |

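Note that the variance can only be computed when the `+` operator is well-defined between the types.

By default, `numeric_only=None`. This applies to pandas versions before 2.0; from pandas 2.0 onwards the default is `numeric_only=False`.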
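To illustrate the `skipna` parameter described above, here is a minimal sketch using a small DataFrame with one missing value. The DataFrame `df_nan` and its values are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Column A has one missing value; column B is complete
df_nan = pd.DataFrame({
    "A": [3.0, 5.0, np.nan, 7.0],
    "B": [2.0, 5.0, 8.0, 11.0],
})

skipped = df_nan.var()              # skipna=True: NaN is dropped, so N = 3 for column A
kept = df_nan.var(skipna=False)     # NaN propagates, so column A yields NaN

print(skipped["A"])   # 4.0
print(kept["A"])      # nan
print(kept["B"])      # 15.0
```

With `skipna=True`, column `A` is treated as `[3, 5, 7]`, which is why its variance matches the three-value case rather than being divided by the original length.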

# Return Value

If the `level` parameter is specified, then a `DataFrame` will be returned. Otherwise, a `Series` will be returned.
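The sketch below checks the return type when `level` is not passed. Because recent pandas versions may not accept `level`, it also shows a per-level variance computed with `groupby(level=...)`. This is a substitute technique, not the `level` parameter itself, and the MultiIndex DataFrame `df_multi` is made up for this illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["outer", "inner"]
)
df_multi = pd.DataFrame({"A": [3, 5, 7, 11], "B": [2, 5, 8, 8]}, index=index)

overall = df_multi.var()                           # Series: one value per column
per_group = df_multi.groupby(level="outer").var()  # DataFrame: one row per outer label

print(type(overall).__name__)    # Series
print(type(per_group).__name__)  # DataFrame
print(per_group)
```

Here the grouped result has one row per value of the `outer` level, which mirrors the DataFrame return described above.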

# Examples

Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[3,5,7], "B":[2,5,8]})
df
   A  B
0  3  2
1  5  5
2  7  8
```

## Column-wise variance

To compute the variance for each column:

```
df.var()   # axis=0
A    4.0
B    9.0
dtype: float64
```

## Row-wise variance

To compute the variance for each row:

```
df.var(axis=1)
0    0.5
1    0.0
2    0.5
dtype: float64
```
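As a small cross-check of the row-wise result, the sketch below passes the axis by name and compares against NumPy. Note that `np.var` defaults to `ddof=0`, so `ddof=1` is set explicitly to match the Pandas default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

by_number = df.var(axis=1)
by_name = df.var(axis="columns")                    # string alias for axis=1
with_numpy = np.var(df.to_numpy(), axis=1, ddof=1)  # NumPy equivalent, unbiased

print(by_number.equals(by_name))                       # True
print(np.allclose(by_number.to_numpy(), with_numpy))   # True
```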

## Specifying numeric_only

Consider the following DataFrame:

```
df = pd.DataFrame({"A":[3,5], "B":[True,5], "C":["x",7]})
df
   A     B  C
0  3  True  x
1  5     5  7
```

Here, columns `B` and `C` are of mixed type.

### None

By default, `numeric_only=None`, which means that rows/columns with mixed types will also be considered:

```
df.var()   # numeric_only=None
A    2.0
B    8.0
dtype: float64
```

The variance is still computable for column `B` because `True` is internally represented as `1` in Pandas. In contrast, the variance for column `C` cannot be computed since `"x"+7` is undefined.

### False

`numeric_only=False` means that the rows/columns of mixed type will also be considered, but an error will be raised if the variance is not computable:

```
df.var(numeric_only=False)
TypeError: could not convert string to float: 'x'
```

### True

To compute the variance of numeric rows/columns only:

```
df.var(numeric_only=True)
A    2.0
dtype: float64
```
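Since the handling of mixed-type columns depends on `numeric_only` (and on the pandas version), one alternative is to select the numeric columns explicitly before calling `var(~)`. The sketch below uses `select_dtypes` for this. It is one possible approach, not the only one, and the name `numeric_part` is introduced just for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5], "B": [True, 5], "C": ["x", 7]})

# A is an integer column; B and C hold mixed values and are stored as object
print(df.dtypes)

# Keep only columns with a numeric dtype, then compute the variance
numeric_part = df.select_dtypes(include="number")
print(numeric_part.var())
```

Only column `A` survives the selection, so the result does not depend on how mixed-type columns are treated.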