
# Pandas DataFrame | mean method

Mar 5, 2023


The pandas `DataFrame.mean(~)` method computes the mean of each row or column of a DataFrame.

# Parameters

1. `axis` | `int` or `string` | `optional`

Whether to compute the mean row-wise or column-wise:

Axis | Description |
---|---|
`0` or `"index"` | Mean is computed for each column. |
`1` or `"columns"` | Mean is computed for each row. |

By default, `axis=0`.

2. `skipna` | `boolean` | `optional`

Whether or not to skip `NaN`. Skipped `NaN` values do not count towards the total size, which is the divisor when computing the mean. By default, `skipna=True`.

3. `level` | `string` or `int` | `optional`

The name or the integer index of the level to consider. This is only relevant if your DataFrame has a MultiIndex. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
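
On pandas versions where `mean` does not accept `level`, a per-level mean can be written with `groupby` instead. The following is a minimal sketch, assuming a hypothetical DataFrame with a two-level row index; the level names `group` and `item` and all values are made up for illustration:

```python
import pandas as pd

# Hypothetical MultiIndex DataFrame; level names and values are illustrative only
index = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")],
    names=["group", "item"],
)
df_multi = pd.DataFrame({"A": [2, 4, 6], "B": [1, 3, 5]}, index=index)

# Mean of each column within each value of the outer level
level_means = df_multi.groupby(level="group").mean()
print(level_means)
```

The result here is a `DataFrame` with one row per value of the chosen level.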

4. `numeric_only` | `None` or `boolean` | `optional`

The allowed values are as follows:

Value | Description |
---|---|
`True` | Only numeric rows/columns will be considered. |
`False` | Attempt computation with all types (e.g. strings and dates), and throw an error whenever the mean cannot be computed. |
`None` | Attempt computation with all types, and silently ignore all rows/columns whose mean cannot be computed, without raising an error. |

Note that means can only be computed when the `+` operator is well-defined between the types.

By default, `numeric_only=None`. This describes pandas 1.x: the `None` option was deprecated in pandas 1.5, and from pandas 2.0 the default is `False`.

# Return Value

If the `level` parameter is specified, then a `DataFrame` will be returned. Otherwise, a `Series` will be returned.

# Examples

Consider the following DataFrame:

```
df
   A  B
0  2  4
1  3  5
```

## Column-wise mean

To compute the mean for each column:

```
df.mean() # or axis=0
A    2.5
B    4.5
dtype: float64
```
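
For readers who want to run this themselves, here is a self-contained sketch. The construction of `df` is an assumption reconstructed from the printed table, and the variable name `col_means` is introduced only for illustration:

```python
import pandas as pd

# Rebuild the example DataFrame from the printed table (assumed construction)
df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})

col_means = df.mean()              # same as df.mean(axis=0)
print(col_means)

# The result is a Series indexed by the column labels
print(type(col_means).__name__)    # Series
print(col_means["A"])              # 2.5
```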

## Row-wise mean

To compute the mean for each row, set `axis=1`:

```
df.mean(axis=1)
0    3.0
1    4.0
dtype: float64
```
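
As a quick sanity check, the row-wise mean should match the row sum divided by the number of columns. The sketch below assumes the same two-column DataFrame; `row_means` and `manual` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})

row_means = df.mean(axis=1)              # axis="columns" is equivalent
manual = df.sum(axis=1) / df.shape[1]    # row sum divided by the number of columns

print(row_means.tolist())    # [3.0, 4.0]
print(manual.tolist())       # [3.0, 4.0]
```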

## Specifying skipna

Consider the following DataFrame with a missing value:

```
df
     A
0  3.0
1  NaN
2  5.0
```

By default, `skipna=True`, which means that all missing values will be ignored when computing the mean:

```
df.mean() # skipna=True
A    4.0
dtype: float64
```

To take missing values into account, set `skipna=False`:

```
df.mean(skipna=False)
A   NaN
dtype: float64
```

Note that with `skipna=False`, if a row/column contains a missing value, then the mean for that row/column will be `NaN`.
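
To make the role of the divisor concrete, the following sketch recomputes the mean by hand. It assumes the single-column DataFrame shown above; `total` and `n_valid` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3.0, np.nan, 5.0]})

total = df["A"].sum()        # sum() also skips NaN by default: 3.0 + 5.0
n_valid = df["A"].count()    # count() excludes NaN, giving 2 rather than 3

print(total / n_valid)               # 4.0
print(df.mean()["A"])                # 4.0
print(df.mean(skipna=False)["A"])    # nan
```

Dividing by the full length of 3 instead would give roughly 2.67, which is not what `mean` returns.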

## Specifying numeric_only

Consider the following DataFrame:

```
df
   A     B      C
0  4     2    "6"
1  5  True  False
```

Here, both columns `B` and `C` contain mixed types, but the key difference is that summation is defined for `B`, but not for `C`. Computing the mean requires summation between the types to be well-defined.

Recall that the internal representation of a `True` boolean is `1`, so the operation `2+True` actually evaluates to `3`:

```
2 + True
3
```

On the other hand, `"6"+False` throws an error:

```
"6" + False
TypeError: can only concatenate str (not "bool") to str
```
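
Both behaviours can be checked in plain Python without pandas. This is a small sketch; the variable name `message` is introduced only for illustration:

```python
print(2 + True)     # 3, because True behaves as the integer 1

try:
    "6" + False     # str + bool has no defined meaning
except TypeError as err:
    message = str(err)
    print(message)
```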

### None

By default, `numeric_only=None`, which means that rows/columns with mixed types will also be considered:

```
df.mean(numeric_only=None)
A    4.5
B    1.5
dtype: float64
```

Here, notice how the mean was computed for column `B`, but not for `C`. By passing in `None`, rows/columns where the mean cannot be computed (due to invalid summation of types) will simply be ignored without raising an error.

### False

By setting `numeric_only=False`, rows/columns with mixed types will again be considered, but an error will be thrown when the mean cannot be computed:

```
df.mean(numeric_only=False)
TypeError: can only concatenate str (not "bool") to str
```

Here, we end up with an error because column `C` contains mixed types where the `+` operation is not defined.

### True

By setting `numeric_only=True`, only numeric rows/columns will be considered:

```
df.mean(numeric_only=True)
A    4.5
dtype: float64
```

Notice how columns `B` and `C` were ignored since they contain mixed types.
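
The reason these columns count as non-numeric shows up in their dtypes. Below is a minimal sketch, assuming the mixed-type DataFrame was built from plain Python lists as shown; `numeric_means` is an illustrative name:

```python
import pandas as pd

# Assumed construction of the mixed-type example DataFrame
df = pd.DataFrame({"A": [4, 5], "B": [2, True], "C": ["6", False]})

# Mixed-type columns are stored with the generic object dtype
print(df.dtypes)

numeric_means = df.mean(numeric_only=True)
print(numeric_means)    # only column A remains
```

Since `B` and `C` hold the `object` dtype rather than a numeric one, `numeric_only=True` excludes them even though the values in `B` could be summed.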