Pandas DataFrame | rank method
Pandas DataFrame.rank(~) method computes the rank (that is, the position in sorted order) of each value within its row or column of the DataFrame.
Parameters
1. axis · int or string · optional
Whether to compute the ordering row-wise or column-wise:
| Axis | Description |
|---|---|
| 0 or "index" | Ordering is computed for each column. |
| 1 or "columns" | Ordering is computed for each row. |
By default, axis=0.
2. method · string · optional
How to rank duplicate values in a group:
| Value | Description |
|---|---|
| "average" | Return the average of the ranks. |
| "min" | Return the minimum of the ranks. |
| "max" | Return the maximum of the ranks. |
| "first" | Return the ranks based on the ordering in the DataFrame. |
| "dense" | Similar to "min", except that the rank increases by only one from one group of duplicates to the next. |
Check the examples below for clarification. By default, method="average".
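3. numeric_only · boolean · optional
If True, then only numeric columns are ranked and non-numeric columns are left out of the result. By default, numeric_only=False in pandas 2.0 and later (earlier versions defaulted to None).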
4. na_option · string · optional
How to deal with NaN values:
| Value | Description |
|---|---|
| "keep" | Leave the NaN as they are in the result. |
| "top" | Assign the lowest (1, 2, ...) ordering to the NaN. |
| "bottom" | Assign the highest ordering to the NaN. |
By default, na_option="keep".
5. ascending · boolean · optional
If True, then the smallest value will have a rank of 1. If False, then the largest value will have a rank of 1.
By default, ascending=True.
6. pct · boolean · optional
If True, then rank will be in terms of percentiles instead. By default, pct=False.
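The numeric_only parameter is not demonstrated in the examples, so here is a minimal sketch of its effect. The DataFrame is the same small mixed-type one used in the examples; the variable name numeric_ranks is just for illustration.

```python
import pandas as pd

# A DataFrame with one numeric column (A) and one string column (B)
df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# With numeric_only=True, the string column B is dropped from the result
numeric_ranks = df.rank(numeric_only=True)
print(numeric_ranks)
```

Only column A should appear in the printed result, ranked with the default method="average".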
Return Value
A DataFrame containing the ordering of the values in the source DataFrame.
Examples
Consider the following DataFrame:
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
Ranking column-wise
To obtain the ordering of the values of each column:
df.rank() # axis=0
     A    B
0  3.0  2.0
1  4.0  1.0
2  1.5  3.0
3  1.5  4.0
Notice how we have two 1.5 in column A. This is because we had a tie - entries A2 and A3 shared the same value, and so the rank(~) method computed the average of their ranks (method="average" by default), that is, the average of 1 and 2.
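The page shows df without the code that builds it. The following sketch is one way to construct it and reproduce the column-wise ranking; the constructor call is an assumption about how the DataFrame was created.

```python
import pandas as pd

# Rebuild the example DataFrame (construction is assumed, values are from the example)
df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# axis=0 and method="average" are the defaults, so each column is ranked independently
ranks = df.rank()
print(ranks)
```

Strings in column B are ranked in lexicographic order, which is why "a" receives a rank of 1.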
Ranking row-wise
Consider the following DataFrame:
df
   A  B  C
0  3  1  5
1  4  2  6
To rank the values for each row, set axis=1:
df.rank(axis=1)
     A    B    C
0  2.0  1.0  3.0
1  2.0  1.0  3.0
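As a runnable version of the row-wise example, here is a short sketch. The DataFrame construction is assumed; the values are taken from the example above.

```python
import pandas as pd

# Rebuild the row-wise example (construction is assumed)
df = pd.DataFrame({"A": [3, 4], "B": [1, 2], "C": [5, 6]})

# axis=1 ranks the values within each row; axis="columns" is equivalent
row_ranks = df.rank(axis=1)
print(row_ranks)
```

Both rows produce the same ranks because the relative order of A, B and C is the same in each row.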
Specifying method
Consider the following DataFrame:
df
   A
0  8
1  6
2  6
3  8
average
By default, method="average", which means that the average rank is computed for duplicate values:
df.rank()
     A
0  3.5
1  1.5
2  1.5
3  3.5
max
To use the largest rank of each group:
df.rank(method="max")
     A
0  4.0
1  2.0
2  2.0
3  4.0
Here's df again for your reference:
df
   A
0  8
1  6
2  6
3  8
min
To use the smallest rank of each group:
df.rank(method="min")
     A
0  3.0
1  1.0
2  1.0
3  3.0
first
To use the ordering of the values in the original DataFrame:
df.rank(method="first")
     A
0  3.0
1  1.0
2  2.0
3  4.0
Here, notice how the first value 8 is assigned a rank of 3, while the last value 8 is assigned a rank of 4. This is because of their ordering in df, that is, the first 8 is assigned a lower rank since it appears earlier in df.
Here's df again for your reference:
df
   A
0  8
1  6
2  6
3  8
dense
This is similar to "min", except that the ranks are incremented by one after each duplicate group:
df.rank(method="dense")
     A
0  2.0
1  1.0
2  1.0
3  2.0
To clarify, in the case of "min", the group values 8 were assigned a rank of 3, but for "dense", the rank only gets incremented by 1 after each group, so we end up with a rank of 2 for the next group.
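To compare the five tie-breaking methods at a glance, the following sketch ranks the same column once per method and places the results side by side. The names methods and comparison are illustrative only.

```python
import pandas as pd

# Same data as the examples above: two 6s and two 8s
df = pd.DataFrame({"A": [8, 6, 6, 8]})

methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per method, aligned on the original index
comparison = pd.DataFrame({m: df["A"].rank(method=m) for m in methods})
comparison.insert(0, "value", df["A"])
print(comparison)
```

Reading across a row shows how the same tied value is ranked under each method.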
Specifying na_option
Consider the following DataFrame with some missing values:
df
     A
0  NaN
1  6.0
2  NaN
3  5.0
By default, na_option="keep", which means that NaNs are ignored during the ranking and kept in the resulting DataFrame:
df.rank() # na_option="keep"
     A
0  NaN
1  2.0
2  NaN
3  1.0
To assign the lowest ranks (1, 2, ...) to missing values:
df.rank(na_option="top")
     A
0  1.5
1  4.0
2  1.5
3  3.0
Here, both NaN values receive a rank of 1.5 because they tie for ranks 1 and 2, and the average of those two ranks is used (method="average" by default).
To assign the highest ranks to the missing values:
df.rank(na_option="bottom")
     A
0  3.5
1  2.0
2  3.5
3  1.0
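The three na_option settings can be compared in one table with a sketch like the one below. The DataFrame construction and the name na_comparison are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the example above, with two missing values
df = pd.DataFrame({"A": [np.nan, 6.0, np.nan, 5.0]})

options = ["keep", "top", "bottom"]

# One column of ranks per na_option setting
na_comparison = pd.DataFrame({opt: df["A"].rank(na_option=opt) for opt in options})
print(na_comparison)
```

With "keep" the NaN rows stay NaN, while "top" and "bottom" give them the tied lowest and highest ranks respectively.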
Ranking in descending order
Consider the same DataFrame we had before:
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
To rank in descending order (largest value has a rank of 1), simply set ascending=False:
df.rank(ascending=False)
     A    B
0  2.0  3.0
1  1.0  4.0
2  3.5  2.0
3  3.5  1.0
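A quick way to sanity-check a descending ranking is to compare it with the ascending one. The sketch below assumes the default method="average" and no missing values, in which case the two ranks of any entry add up to the number of rows plus one.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

asc = df.rank(ascending=True)    # smallest value gets rank 1
desc = df.rank(ascending=False)  # largest value gets rank 1

# With method="average" and no NaN, asc + desc equals n + 1 for every entry
rank_sum = asc["A"] + desc["A"]
print(desc)
print(rank_sum)
```

For this four-row DataFrame every entry of rank_sum is 5.0.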
Ranking using percentiles
Consider the following DataFrame:
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
To rank using percentiles, set pct=True:
df.rank(pct=True)
       A     B
0  0.750  0.50
1  1.000  0.25
2  0.375  0.75
3  0.375  1.00
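To see where these fractions come from, the sketch below compares pct=True against the ordinary ranks divided by the number of rows. This equivalence is shown only for this example, which uses the default method and has no missing values.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

pct_ranks = df.rank(pct=True)

# For this example, the same numbers result from dividing the plain ranks by the row count
manual = df.rank() / len(df)

print(pct_ranks)
print(pct_ranks.equals(manual))
```

For instance, the tied 3s in column A have an average rank of 1.5, and 1.5 / 4 gives 0.375.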
Ranking by multiple columns
Consider the following DataFrame:
df
   A  B
0  8  7
1  9  6
2  9  5
To rank by column A while using column B as a tie-breaker:
0    1.0
1    3.0
2    2.0
dtype: float64
Note the following:
- the first row is assigned a rank of 1 because its value of A is the lowest.
- the second and third rows both have the same value of A. Therefore, we use their value of B as a tie-breaker; since the third row has a smaller value of B, it is assigned a rank of 2.
Let's now break down the code. We first use the apply(~) method to combine the two columns into a single column of tuples:
0    (8, 7)
1    (9, 6)
2    (9, 5)
dtype: object
We then use the rank method like so:
0    1.0
1    3.0
2    2.0
dtype: float64
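The code for this section did not survive on the page, so the following is a reconstruction based on the description above, not necessarily the original snippet. It assumes that apply(tuple, axis=1) is used to build the tuples and that rank(~) is then called on the resulting Series; the names tuples and multi_rank are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [8, 9, 9], "B": [7, 6, 5]})

# Step 1: combine columns A and B into a single Series of (A, B) tuples
tuples = df[["A", "B"]].apply(tuple, axis=1)

# Step 2: rank the tuples; they compare element by element,
# so A decides the order first and B breaks ties
multi_rank = tuples.rank()

print(tuples)
print(multi_rank)
```

Because tuples compare lexicographically, (9, 5) sorts before (9, 6), which is why the third row is ranked ahead of the second.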