
# Pandas DataFrame | rank method

Mar 5, 2023


Pandas `DataFrame.rank(~)` method computes the ordering (rank) of the values for each row or column of the DataFrame.

# Parameters

1. `axis` · `int` or `string` · `optional`

Whether to compute the ordering row-wise or column-wise:

| Axis | Description |
|---|---|
| `0` or `"index"` | Ordering is computed for each column. |
| `1` or `"columns"` | Ordering is computed for each row. |

By default, `axis=0`.

2. `method` · `string` · `optional`

How to rank duplicate values in a group:

| Value | Description |
|---|---|
| `"average"` | Return the average of the ranks. |
| `"min"` | Return the minimum of the ranks. |
| `"max"` | Return the maximum of the ranks. |
| `"first"` | Return the ranks based on the ordering in the DataFrame. |
| `"dense"` | Similar to `"min"`, but the rank increases by exactly 1 between groups. |

Check examples below for clarification. By default, `method="average"`.

3. `numeric_only` · `boolean` · `optional`

If `True`, ordering is performed only on numeric columns. By default, `numeric_only=False` (as of pandas 2.0; earlier versions defaulted to `None`).

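4. `na_option` · `string` · `optional`

How to deal with `NaN` values:

| Value | Description |
|---|---|
| `"keep"` | Leave the `NaN` values as `NaN` in the result. |
| `"top"` | Assign the lowest ranks (`1`, `2`, `...`) to the `NaN` values. |
| `"bottom"` | Assign the highest ordering to the `NaN` values. |

By default, `na_option="keep"`.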

5. `ascending` · `boolean` · `optional`

If `True`, then the smallest value will have a rank of 1. If `False`, then the largest value will have a rank of 1.

By default, `ascending=True`.

6. `pct` · `boolean` · `optional`

If `True`, then the ranks are returned as percentile fractions (values between 0 and 1) instead of raw ranks. By default, `pct=False`.
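The `numeric_only` parameter has no dedicated example on this page, so here is a minimal sketch of its effect. It assumes a small DataFrame with one numeric column and one string column; the variable names `ranked_all` and `ranked_numeric` are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# Rank every column, including the string column "B"
# (strings are ranked in lexicographic order).
ranked_all = df.rank()

# Rank numeric columns only; the string column "B" is left out of the result.
ranked_numeric = df.rank(numeric_only=True)

print(list(ranked_all.columns))
print(list(ranked_numeric.columns))
```

With `numeric_only=True`, non-numeric columns are excluded from the returned DataFrame rather than being passed through unranked.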

# Return Value

A `DataFrame` containing the ordering of the values in the source DataFrame.

# Examples

Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[4,5,3,3], "B": ["b","a","c","d"]})
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
```

## Ranking column-wise

To obtain the ordering of the values of each column:

```
df.rank()   # axis=0
     A    B
0  3.0  2.0
1  4.0  1.0
2  1.5  3.0
3  1.5  4.0
```

Notice how we have two `1.5` in column `A`. This is because of a tie: the entries at rows `2` and `3` of column `A` share the same value, so the `rank(~)` method assigned each of them the average of their ranks (`method="average"` by default), that is, the average of `1` and `2`.
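To make the tie-handling arithmetic concrete, the sketch below reproduces the `"average"` ranks by hand: it first breaks ties by position with `method="first"`, then averages those positional ranks within each group of equal values. This is only an illustration of the arithmetic, not a description of how pandas implements ranking internally; the names `first_ranks` and `manual_average` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3]})

# Ties broken by position: the two 3s receive ranks 1 and 2.
first_ranks = df["A"].rank(method="first")

# Average the positional ranks within each group of equal values.
manual_average = first_ranks.groupby(df["A"]).transform("mean")

# Compare against the built-in default (method="average").
builtin_average = df["A"].rank()
print(manual_average.tolist() == builtin_average.tolist())
```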

## Ranking row-wise

Consider the following DataFrame:

```
df = pd.DataFrame({"A":[3,4],"B":[1,2],"C":[5,6]})
df
   A  B  C
0  3  1  5
1  4  2  6
```

To rank the values for each row, set `axis=1`:

```
df.rank(axis=1)
     A    B    C
0  2.0  1.0  3.0
1  2.0  1.0  3.0
```

## Specifying method

Consider the following DataFrame:

```
df = pd.DataFrame({"A":[8,6,6,8]})
df
   A
0  8
1  6
2  6
3  8
```

### average

By default, `method="average"`, which means that the average rank is computed for duplicate values:

```
df.rank()
     A
0  3.5
1  1.5
2  1.5
3  3.5
```

### max

To use the largest rank of each group:

```
df.rank(method="max")
     A
0  4.0
1  2.0
2  2.0
3  4.0
```

Here's `df` again for your reference:

```
df
   A
0  8
1  6
2  6
3  8
```

### min

To use the smallest rank of each group:

```
df.rank(method="min")
     A
0  3.0
1  1.0
2  1.0
3  3.0
```

### first

To use the ordering of the values in the original DataFrame:

```
df.rank(method="first")
     A
0  3.0
1  1.0
2  2.0
3  4.0
```

Here, notice how the first value `8` is assigned a rank of `3`, while the last value `8` is assigned a rank of `4`. This is because of their ordering in `df`: among equal values, the one that appears earlier in `df` is assigned the lower rank.

Here's `df` again for your reference:

```
df
   A
0  8
1  6
2  6
3  8
```

### dense

This is similar to `"min"`, except that the rank always increases by exactly 1 from one group of equal values to the next, so no ranks are skipped:

```
df.rank(method="dense")
     A
0  2.0
1  1.0
2  1.0
3  2.0
```

To clarify, in the case of `"min"`, the `8`s were assigned a rank of `3` because the two `6`s had already used up ranks `1` and `2`. With `"dense"`, the rank only gets incremented by 1 after each group, so the `8`s end up with a rank of `2`.
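As a quick way to see all five tie-breaking methods at once, the following sketch ranks the same column with each method and places the results side by side. The `comparison` DataFrame and its `value` column are names introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [8, 6, 6, 8]})

methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per method, aligned on the original index.
comparison = pd.DataFrame({m: df["A"].rank(method=m) for m in methods})

# Put the original values in front for easier reading.
comparison.insert(0, "value", df["A"])

print(comparison)
```

Reading across a row shows how a single value's rank changes with the chosen `method`.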

## Specifying na_option

Consider the following DataFrame with some missing values:

```
import numpy as np

df = pd.DataFrame({"A":[np.nan,6,np.nan,5]})
df
     A
0  NaN
1  6.0
2  NaN
3  5.0
```

By default, `na_option="keep"`, which means that `NaN`s are ignored during the ranking and kept in the resulting DataFrame:

```
df.rank()   # na_option="keep"
     A
0  NaN
1  2.0
2  NaN
3  1.0
```

To assign the lowest ranks (`1`, `2`, `...`) to missing values:

```
df.rank(na_option="top")
     A
0  1.5
1  4.0
2  1.5
3  3.0
```

Here, both missing values receive `1.5` because there are 2 `NaN`s, and so the average of their ranks (`1` and `2`) was computed.

To assign the highest ranks to the missing values:

```
df.rank(na_option="bottom")
     A
0  3.5
1  2.0
2  3.5
3  1.0
```

## Ranking in descending order

Consider the same DataFrame we had before:

```
df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
```

To rank in descending order (largest value has a rank of 1), set `ascending=False`:

```
df.rank(ascending=False)
     A    B
0  2.0  3.0
1  1.0  4.0
2  3.5  2.0
3  3.5  1.0
```
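Descending ranks are often combined with `method="min"` to produce leaderboard-style placings, where tied entries share the best available place. The sketch below assumes a hypothetical `scores` DataFrame whose names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical scores, invented for this example.
scores = pd.DataFrame(
    {"score": [90, 75, 90, 60]},
    index=["ann", "bob", "cat", "dan"],
)

# Highest score gets place 1; tied scores share the smaller place.
places = scores.rank(method="min", ascending=False).astype(int)

print(places)
```

Here the two scores of `90` both take place 1, and the next score takes place 3 rather than 2, since `"min"` skips the ranks consumed by the tie.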

## Ranking using percentiles

Consider the following DataFrame:

```
df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
```

To rank using percentiles, set `pct=True`:

```
df.rank(pct=True)
       A     B
0  0.750  0.50
1  1.000  0.25
2  0.375  0.75
3  0.375  1.00
```
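The percentile values can be read as each rank divided by the number of non-missing values in its column. The sketch below checks that relationship for the default `method="average"` on a column with no missing values; `manual_pct` is a name made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3]})

# Built-in percentile ranks.
pct_ranks = df.rank(pct=True)

# Raw ranks divided by the count of non-missing values per column.
manual_pct = df.rank() / df.count()

# Largest absolute difference between the two results.
max_diff = (pct_ranks - manual_pct).abs().max().max()
print(max_diff)
```

For example, the value `4` has a rank of `3` out of `4` values, giving `0.75`.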