# Pandas DataFrame | median method

Aug 12, 2023

Pandas `DataFrame.median(~)` method computes the median for each row or column of the DataFrame.

# Parameters

1. `axis` | `int` or `string` | `optional`

Whether to compute the median row-wise or column-wise:

Axis | Description |
---|---|
`0` or `"index"` | Median is computed for each column. |
`1` or `"columns"` | Median is computed for each row. |

By default, `axis=0`.

2. `skipna` | `boolean` | `optional`

Whether or not to skip `NaN`. If `skipna=False`, then a row/column containing even one `NaN` will have `NaN` as its median. By default, `skipna=True`.

3. `level` | `string` or `int` | `optional`

The name or the integer index of the level to consider. This is only relevant if your DataFrame has a MultiIndex. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0.

4. `numeric_only` | `None` or `boolean` | `optional`

The allowed values are as follows:

Value | Description |
---|---|
`True` | Only numeric rows/columns will be considered (e.g. `float`, `int` and `boolean`). |
`False` | Attempt computation with all types (e.g. strings and dates), and throw an error whenever the median cannot be computed. |
`None` | Attempt computation with all types, and silently ignore any rows/columns whose median cannot be computed, without raising an error. |

Note that medians can only be computed when we can perform summation between the types.

By default, `numeric_only=None`. This describes pandas versions before 2.0; from pandas 2.0 onwards, `None` is no longer accepted and the default is `False`.

# Return Value

If the `level` parameter is specified, then a `DataFrame` will be returned. Otherwise, a `Series` will be returned.
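As a quick check of the return type when `level` is not specified, here is a minimal sketch. The DataFrame `df_demo` is a made-up example introduced only for illustration:

```python
import pandas as pd

# Hypothetical numeric DataFrame used only for illustration.
df_demo = pd.DataFrame({"A": [2, 3], "B": [4, 5]})

result = df_demo.median()

print(type(result).__name__)   # Series
print(list(result.index))      # ['A', 'B']
```

The returned `Series` is indexed by the column labels, with one median per column.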

# Examples

Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[2,3], "B":[4,5], "C":["6",7]})
df
   A  B  C
0  2  4  "6"
1  3  5  7
```

## Column-wise median

To compute the median for each column:

```
df.median()   # axis=0
A    2.5
B    4.5
C    6.5
dtype: float64
```

Notice how `"6"` has been automatically cast to a `float` in order to compute the median.

## Row-wise median

To compute the median for each row, set `axis=1`:

```
df.median(axis=1)
0    4.0
1    5.0
dtype: float64
```
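The `axis` parameter also accepts the string names `"index"` and `"columns"` in place of `0` and `1`. The sketch below uses a hypothetical all-numeric DataFrame `df_num` (not the one above) so that it does not depend on how strings are cast:

```python
import pandas as pd

# Hypothetical all-numeric DataFrame used only for illustration.
df_num = pd.DataFrame({"A": [2, 3], "B": [4, 5], "C": [6, 7]})

col_medians = df_num.median(axis="index")    # same as axis=0
row_medians = df_num.median(axis="columns")  # same as axis=1

print(col_medians.tolist())  # [2.5, 4.5, 6.5]
print(row_medians.tolist())  # [4.0, 5.0]
```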

## Specifying skipna

Consider the following DataFrame:

```
import numpy as np

df = pd.DataFrame({"A":[3,4,6], "B":[7,9,np.nan]})
df
   A    B
0  3  7.0
1  4  9.0
2  6  NaN
```

By default, `skipna=True`, which means that all missing values are skipped when computing the median:

```
df.median()   # skipna=True
A    4.0
B    8.0
dtype: float64
```

To take into account the missing values:

```
df.median(skipna=False)
A    4.0
B    NaN
dtype: float64
```

Note that with `skipna=False`, if a row/column contains one or more missing values, the median for that row/column will be `NaN`.
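The same behaviour applies row-wise. As a minimal sketch, the following rebuilds the DataFrame above under the assumed name `df_nan` and compares the two `skipna` settings with `axis=1`:

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({"A": [3, 4, 6], "B": [7, 9, np.nan]})

# Count missing values per row to see which rows are affected.
missing_per_row = df_nan.isna().sum(axis=1)

rows_skip = df_nan.median(axis=1)                # skipna=True (default)
rows_keep = df_nan.median(axis=1, skipna=False)  # NaN propagates

print(missing_per_row.tolist())  # [0, 0, 1]
print(rows_skip.tolist())        # [5.0, 6.5, 6.0]
print(rows_keep.tolist())        # [5.0, 6.5, nan]
```

Only the last row contains a missing value, so it is the only row whose median changes between the two calls.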

## Specifying numeric_only

Consider the following DataFrame:

```
df = pd.DataFrame({"A":[3,4], "B":[5,True], "C":[6,"7@8"]})
df
   A     B    C
0  3     5    6
1  4  True  7@8
```

Here, both columns `B` and `C` contain mixed types, but the key difference is that the median can be computed for `B`, but not for `C`. When the sample size is even, which is the case here, the median is computed by taking the average of the middle two numbers, which means that the summation operation between the types must be well-defined.
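The rule for even-sized samples can be sketched in plain Python. The helper `manual_median` is a hypothetical name introduced here for illustration, and the sample values are made up:

```python
import pandas as pd

def manual_median(values):
    """Median of a non-empty list of numbers, computed by hand."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: the single middle value is the median.
        return float(ordered[mid])
    # Even sample size: average the two middle values (requires addition).
    return (ordered[mid - 1] + ordered[mid]) / 2

even_sample = [7, 3, 9, 5]
odd_sample = [7, 3, 9]

print(manual_median(even_sample))       # 6.0
print(manual_median(odd_sample))        # 7.0
print(pd.Series(even_sample).median())  # 6.0
```

The even-sized branch is the one that needs the `+` operation, which is why the types in a column must support addition.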

Recall that the internal representation of a `True` boolean is `1`, so the operation `5+True` actually evaluates to `6`:

```
5 + True
6
```

On the other hand, `6+"7@8"` throws an error:

```
6 + "7@8"
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

### None

By default, `numeric_only=None`, which means that rows/columns with mixed types will also be considered:

```
df.median(numeric_only=None)
A    3.5
B    3.0
dtype: float64
```

Here, notice how the median was computed for column `B`, but not for `C`. By passing in `None`, rows/columns where the median cannot be computed (due to invalid summation) will simply be ignored without raising an error.
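The value `3.0` for column `B` can be reproduced by hand. This is only a sketch of the arithmetic, not of how pandas computes it internally; the explicit cast to `float` is an assumption made here so that the check does not depend on the `numeric_only` setting:

```python
import pandas as pd

# The two values stored in column B of the DataFrame above.
values = [5, True]

print(isinstance(True, int))  # True: bool is a subclass of int
print(sum(values) / 2)        # 3.0

# Casting explicitly to float gives the same median.
b_median = pd.Series(values).astype(float).median()
print(b_median)               # 3.0
```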

### False

By setting `numeric_only=False`, rows/columns with mixed types will again be considered, but an error will be thrown when the median cannot be computed:

```
df.median(numeric_only=False)
TypeError: could not convert string to float: '7@8'
```

Here, we end up with an error because column `C` contains mixed types where summation is not defined.
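If raising an error is not desirable, one possible pattern is to catch it and fall back to the numeric columns. The following is a minimal sketch rather than a recommended recipe; the name `df_mixed` is introduced here, and catching `ValueError` in addition to `TypeError` is a precaution, not something the example above shows:

```python
import pandas as pd

df_mixed = pd.DataFrame({"A": [3, 4], "B": [5, True], "C": [6, "7@8"]})

try:
    medians = df_mixed.median(numeric_only=False)
except (TypeError, ValueError) as err:
    # TypeError is what the example above shows; ValueError is caught as
    # well as a precaution (an assumption, not documented behaviour).
    print(f"Falling back to numeric columns only: {err}")
    medians = df_mixed.median(numeric_only=True)

print(medians["A"])  # 3.5
```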

### True

By setting `numeric_only=True`, only numeric rows/columns will be considered:

```
df.median(numeric_only=True)
A    3.5
dtype: float64
```

Notice how columns `B` and `C` were ignored since they contain mixed types.
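To see in advance which columns count as numeric, the column dtypes can be inspected. The sketch below rebuilds the DataFrame under the assumed name `df_mixed` and uses `select_dtypes(include="number")` as an approximation: unlike `numeric_only=True`, it does not select boolean columns, which makes no difference here because there are none:

```python
import pandas as pd

df_mixed = pd.DataFrame({"A": [3, 4], "B": [5, True], "C": [6, "7@8"]})

# Mixed-type columns are stored with the generic object dtype.
print(df_mixed.dtypes)

numeric_cols = df_mixed.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['A']

numeric_medians = df_mixed.median(numeric_only=True)
print(numeric_medians.tolist())  # [3.5]
```

Columns `B` and `C` have the `object` dtype because they hold mixed types, so only `A` takes part in the computation.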