Pandas DataFrame rank method
Pandas DataFrame.rank(~) method computes the rank (ordering) of the values along each row or column of the DataFrame.
Parameters
1. axis · int or string · optional
Whether to compute the ordering rowwise or columnwise:
Axis              Description
0 or "index"      Ordering is computed for each column.
1 or "columns"    Ordering is computed for each row.
By default, axis=0.
2. method · string · optional
How to rank duplicate values in a group:
Value        Description
"average"    Return the average of the ranks.
"min"        Return the minimum of the ranks.
"max"        Return the maximum of the ranks.
"first"      Return the ranks based on the ordering in the DataFrame.
"dense"      Similar to "min", except that the rank increases by exactly one from each group to the next.
Check the examples below for clarification. By default, method="average".
3. numeric_only · boolean · optional
If True, ordering is performed only on numeric columns. By default, numeric_only=False in pandas 2.0 and later.
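4. na_option · string · optional
How to deal with NaN values:
Value       Description
"keep"      Leave the NaN values as NaN in the result.
"top"       Assign the lowest (1, 2, ...) ordering to the NaN values.
"bottom"    Assign the highest ordering to the NaN values.
By default, na_option="keep".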
5. ascending · boolean · optional
If True, then the smallest value will have a rank of 1. If False, then the largest value will have a rank of 1.
By default, ascending=True.
6. pct · boolean · optional
If True, then the ranks are returned as percentiles instead. By default, pct=False.
Return Value
A DataFrame containing the ordering of the values in the source DataFrame.
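As a quick sanity check on the defaults, the sketch below compares a bare rank() call with one that spells out every parameter. It assumes the pandas 2.x signature, and the DataFrame named scores is made up purely for illustration.

```python
import pandas as pd

# Hypothetical numeric data, used only for illustration
scores = pd.DataFrame({"A": [4, 5, 3, 3], "B": [10, 30, 20, 20]})

# Bare call relying on the defaults
implicit = scores.rank()

# Same call with every parameter written out (assumes the pandas 2.x defaults)
explicit = scores.rank(
    axis=0,
    method="average",
    numeric_only=False,
    na_option="keep",
    ascending=True,
    pct=False,
)

# Should print True if the explicit values match the library defaults
print(implicit.equals(explicit))
```

If the two results ever differ on your installation, the defaults in your pandas version are not the ones assumed here.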
Examples
Consider the following DataFrame:
df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
Ranking columnwise
To obtain the ordering of the values of each column:
df.rank() # axis=0
     A    B
0  3.0  2.0
1  4.0  1.0
2  1.5  3.0
3  1.5  4.0
Notice how we have two 1.5 values in column A. This is because we had a tie: entries A2 and A3 share the same value, and so the rank(~) method computed the average of their ranks (method="average" by default), that is, the average of 1 and 2.
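The DataFrame above mixes a numeric column with a string column, which makes it a convenient place to illustrate the numeric_only parameter. The following is a minimal sketch; the variable name numeric_ranks is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# With numeric_only=True, only the numeric column A is ranked;
# the string column B does not appear in the result
numeric_ranks = df.rank(numeric_only=True)
print(numeric_ranks)
```

This can be useful when string columns are present but an alphabetical ranking of them is not meaningful.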
Ranking rowwise
Consider the following DataFrame:
df = pd.DataFrame({"A":[3,4],"B":[1,2],"C":[5,6]})
df
   A  B  C
0  3  1  5
1  4  2  6
To rank the values for each row, set axis=1:
df.rank(axis=1)
     A    B    C
0  2.0  1.0  3.0
1  2.0  1.0  3.0
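The axis parameter also accepts a string. As a small sketch, the code below checks that axis="columns" behaves the same as axis=1; the names by_number and by_name are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [1, 2], "C": [5, 6]})

by_number = df.rank(axis=1)
by_name = df.rank(axis="columns")  # string alias for axis=1

# Both calls rank the values within each row
print(by_number.equals(by_name))
```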
Specifying method
Consider the following DataFrame:
df = pd.DataFrame({"A":[8,6,6,8]})
df
   A
0  8
1  6
2  6
3  8
average
By default, method="average", which means that the average rank is computed for duplicate values:
df.rank()
     A
0  3.5
1  1.5
2  1.5
3  3.5
max
To use the largest rank of each group:
df.rank(method="max")
     A
0  4.0
1  2.0
2  2.0
3  4.0
Here's df again for your reference:
df
   A
0  8
1  6
2  6
3  8
min
To use the smallest rank of each group:
df.rank(method="min")
     A
0  3.0
1  1.0
2  1.0
3  3.0
first
To use the ordering of the values in the original DataFrame:
df.rank(method="first")
     A
0  3.0
1  1.0
2  2.0
3  4.0
Here, notice how the first value 8 is assigned a rank of 3, while the last value 8 is assigned a rank of 4. This is because of their ordering in df, that is, the first 8 is assigned a lower rank since it appears earlier in df.
Here's df again for your reference:
df
   A
0  8
1  6
2  6
3  8
dense
This is similar to "min", except that the rank increases by exactly one from one duplicate group to the next, leaving no gaps:
df.rank(method="dense")
     A
0  2.0
1  1.0
2  1.0
3  2.0
To clarify, in the case of "min", the two 8s were assigned a rank of 3 because two values (the 6s) rank below them. For "dense", the rank only gets incremented by 1 after each group, so the 8s end up with a rank of 2.
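To see all five methods at once, the ranks can be collected into a single table. This is only an illustrative sketch: the names methods and comparison are introduced here, and it uses Series.rank on column A, which accepts the same method values.

```python
import pandas as pd

df = pd.DataFrame({"A": [8, 6, 6, 8]})

methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per method, computed on the same values
comparison = pd.DataFrame({m: df["A"].rank(method=m) for m in methods})

# Put the original values in the first column for reference
comparison.insert(0, "value", df["A"])
print(comparison)
```

Reading across a row shows how the same value is ranked under each tie-breaking rule.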
Specifying na_option
Consider the following DataFrame with some missing values:
import numpy as np
df = pd.DataFrame({"A":[np.nan,6,np.nan,5]})
df
     A
0  NaN
1  6.0
2  NaN
3  5.0
By default, na_option="keep", which means that NaNs are ignored during the ranking and kept in the resulting DataFrame:
df.rank() # na_option="keep"
     A
0  NaN
1  2.0
2  NaN
3  1.0
To assign the lowest ranks (1, 2, ...) to missing values:
df.rank(na_option="top")
     A
0  1.5
1  4.0
2  1.5
3  3.0
Here, both NaN values receive a rank of 1.5: since there are two of them, the average of their ranks (1 and 2) was computed.
To assign the highest ranks to the missing values:
df.rank(na_option="bottom")
     A
0  3.5
1  2.0
2  3.5
3  1.0
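The three na_option settings can also be compared side by side. The sketch below is illustrative only; options and comparison are names made up for this example, and missing values are created with np.nan.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 6, np.nan, 5]})

options = ["keep", "top", "bottom"]

# One column of ranks per na_option setting
comparison = pd.DataFrame({opt: df["A"].rank(na_option=opt) for opt in options})
print(comparison)
```

Note that the ranks of the non-missing values shift by the number of NaNs only under "top", since that is the only setting that places the NaNs ahead of them.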
Ranking in descending order
Consider the same DataFrame we had before:
df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
To rank in descending order (largest value has a rank of 1), simply set ascending=False:
df.rank(ascending=False)
     A    B
0  2.0  3.0
1  1.0  4.0
2  3.5  2.0
3  3.5  1.0
Ranking using percentiles
Consider the following DataFrame:
df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
To rank using percentiles, set pct=True:
df.rank(pct=True)
       A     B
0  0.750  0.50
1  1.000  0.25
2  0.375  0.75
3  0.375  1.00
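One way to read these percentile values is as the ordinary rank divided by the number of non-missing values in the column. The sketch below checks this for the default method="average" on the numeric column only; the names pct_ranks and manual are made up for illustration, and the relationship is not claimed for every method.

```python
import numpy as np
import pandas as pd

# Numeric column from the example above
df = pd.DataFrame({"A": [4, 5, 3, 3]})

pct_ranks = df.rank(pct=True)

# Ordinary ranks divided by the count of non-missing values per column
manual = df.rank() / df.count()

# Compare with a tolerance rather than exact equality
print(np.allclose(pct_ranks, manual))
```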