Pandas DataFrame | rank method

Last updated: Aug 10, 2023
Tags: Python, Pandas

Pandas' DataFrame.rank(~) method computes the rank of each value within its column or row of the DataFrame, that is, the value's position when the values are sorted.

Parameters

1. axis · int or string · optional

Whether to compute the ordering row-wise or column-wise:

Axis | Description
0 or "index" | Ordering is computed for each column.
1 or "columns" | Ordering is computed for each row.

By default, axis=0.

2. method · string · optional

How to rank a group of tied (equal) values:

Value | Description
"average" | Assign the average of the group's ranks.
"min" | Assign the minimum of the group's ranks.
"max" | Assign the maximum of the group's ranks.
"first" | Assign ranks in the order the values appear in the DataFrame.
"dense" | Like "min", but the rank increases by exactly one from one group to the next.

See the examples below for clarification. By default, method="average".
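To make the tie rules in this table concrete, here is a minimal pure-Python sketch. The helper rank_values is hypothetical: it is not part of pandas and is not how pandas implements ranking, and it assumes a list with no missing values.

```python
from itertools import groupby


def rank_values(values, method="average"):
    """Hypothetical helper: rank a list of values (no NaNs) using one tie rule."""
    # Stable sort of positions by value, so tied values keep their original order
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0    # how many values have already been ranked
    dense_rank = 0  # how many tie groups have been seen so far
    for _, group in groupby(order, key=lambda i: values[i]):
        members = list(group)
        dense_rank += 1
        low, high = position + 1, position + len(members)  # ranks this group occupies
        for offset, i in enumerate(members):
            if method == "average":
                ranks[i] = (low + high) / 2
            elif method == "min":
                ranks[i] = float(low)
            elif method == "max":
                ranks[i] = float(high)
            elif method == "first":
                ranks[i] = float(low + offset)
            elif method == "dense":
                ranks[i] = float(dense_rank)
            else:
                raise ValueError(f"unknown method: {method}")
        position = high
    return ranks


for m in ["average", "min", "max", "first", "dense"]:
    print(m, rank_values([8, 6, 6, 8], m))
```

For [8, 6, 6, 8] this yields the same ranks as the pandas examples further down, for instance [3.5, 1.5, 1.5, 3.5] for "average" and [2.0, 1.0, 1.0, 2.0] for "dense".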

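3. numeric_only · boolean · optional

If True, only numeric columns are ranked and the others are dropped from the result. By default (pandas 2.0 and later), numeric_only=False, so non-numeric values such as strings are ranked as well.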
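As a quick illustration (a sketch assuming a recent pandas version), ranking the mixed DataFrame used in the examples below with numeric_only=True drops the string column B, whereas the default ranks both columns:

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# Default: both the numeric column A and the string column B are ranked
all_ranks = df.rank()

# numeric_only=True: column B is dropped before ranking
numeric_ranks = df.rank(numeric_only=True)

print(list(all_ranks.columns))      # ['A', 'B']
print(list(numeric_ranks.columns))  # ['A']
```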

4. na_option · string · optional

How to deal with NaN values:

Value | Description
"keep" | Exclude the NaNs from the ordering and assign them a rank of NaN.
"top" | Assign the lowest ranks (1, 2, ...) to the NaNs.
"bottom" | Assign the highest ranks to the NaNs.

By default, na_option="keep".

5. ascending · boolean · optional

  • If True, then the smallest value will have a rank of 1.

  • If False, then the largest value will have a rank of 1.

By default, ascending=True.

6. pct · boolean · optional

If True, the ranks are expressed as percentiles between 0 and 1; with the default method, each rank is divided by the number of non-missing values. By default, pct=False.

Return Value

A DataFrame of the same shape as the source DataFrame, holding the rank of each value as a float.
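A small check of the return value, assuming the same DataFrame used in the examples below: the result keeps the source's shape and labels, and every column comes back as float64, which is what allows averaged ranks such as 1.5.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})
ranks = df.rank()

# Same shape and the same row labels as the source DataFrame
print(ranks.shape == df.shape)       # True
print(ranks.index.equals(df.index))  # True

# Every column is float64, even though A holds integers and B holds strings
print(ranks.dtypes.tolist())
```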

Examples

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({"A":[4,5,3,3], "B": ["b","a","c","d"]})
df
A B
0 4 b
1 5 a
2 3 c
3 3 d

Ranking column-wise

To obtain the ordering of the values of each column:

df.rank() # axis=0
A B
0 3.0 2.0
1 4.0 1.0
2 1.5 3.0
3 1.5 4.0

Notice that column A contains two ranks of 1.5. This is because of a tie: the values at index 2 and 3 are both 3, so together they occupy ranks 1 and 2. Since method="average" by default, rank(~) assigns each of them the average of those ranks, (1 + 2) / 2 = 1.5.
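One way to see where 1.5 comes from: a tie group occupying consecutive positions k, ..., k + m - 1 gets the average rank k + (m - 1)/2, which is the midpoint of its "min" and "max" ranks. The sketch below checks this on column A; it illustrates the arithmetic and is not how pandas computes the average internally.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})
a = df["A"]

# Midpoint of the smallest and largest rank that each tie group occupies
midpoint = (a.rank(method="min") + a.rank(method="max")) / 2

print(midpoint.tolist())          # [3.0, 4.0, 1.5, 1.5]
print(midpoint.equals(a.rank()))  # True
```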

Ranking row-wise

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4],"B":[1,2],"C":[5,6]})
df
A B C
0 3 1 5
1 4 2 6

To rank the values for each row, set axis=1:

df.rank(axis=1)
A B C
0 2.0 1.0 3.0
1 2.0 1.0 3.0
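Ranking row-wise is equivalent to ranking the columns of the transposed DataFrame. The sketch below assumes all columns share a numeric dtype (so transposing does not change the values) and confirms the equivalence for the DataFrame above:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [1, 2], "C": [5, 6]})

# Rank across each row directly...
row_ranks = df.rank(axis=1)

# ...or transpose, rank each column (each original row), and transpose back
via_transpose = df.T.rank().T

print(row_ranks.equals(via_transpose))  # True
```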

Specifying method

Consider the following DataFrame:

df = pd.DataFrame({"A":[8,6,6,8]})
df
A
0 8
1 6
2 6
3 8

average

By default, method="average", which means that the average rank is computed for duplicate values:

df.rank()
A
0 3.5
1 1.5
2 1.5
3 3.5

max

To use the largest rank of each group:

df.rank(method="max")
A
0 4.0
1 2.0
2 2.0
3 4.0

Here's df again for your reference:

df
A
0 8
1 6
2 6
3 8

min

To use the smallest rank of each group:

df.rank(method="min")
A
0 3.0
1 1.0
2 1.0
3 3.0

first

To break ties by the order in which the values appear in the original DataFrame:

df.rank(method="first")
A
0 3.0
1 1.0
2 2.0
3 4.0

Notice how the first 8 (index 0) is assigned a rank of 3, while the second 8 (index 3) is assigned a rank of 4. With method="first", ties are broken by position, so the 8 that appears earlier in df receives the lower rank.

Here's df again for your reference:

df
A
0 8
1 6
2 6
3 8

dense

This is similar to "min", except that the rank increases by exactly one from one group of duplicates to the next:

df.rank(method="dense")
A
0 2.0
1 1.0
2 1.0
3 2.0

To clarify, with "min" the 8s were assigned a rank of 3 because the two 6s occupy ranks 1 and 2. With "dense", the rank only increases by 1 after each group, so the 8s receive a rank of 2.

Specifying na_option

Consider the following DataFrame with some missing values:

import numpy as np
df = pd.DataFrame({"A":[np.nan,6,np.nan,5]})
df
A
0 NaN
1 6.0
2 NaN
3 5.0

By default, na_option="keep", which means that NaN values are excluded from the ranking and receive a rank of NaN in the result:

df.rank() # na_option="keep"
A
0 NaN
1 2.0
2 NaN
3 1.0

To assign the lowest ranks (1, 2, ...) to missing values:

df.rank(na_option="top")
A
0 1.5
1 4.0
2 1.5
3 3.0

Here, the two NaN values tie for ranks 1 and 2, so each receives the average of those ranks, 1.5.

To assign the highest ranks to the missing values:

df.rank(na_option="bottom")
A
0 3.5
1 2.0
2 3.5
3 1.0
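A useful mental model for "top" and "bottom" is that they behave as if every NaN were replaced by negative or positive infinity, respectively, before ranking. This sketch checks that model on the DataFrame above; it only holds when the data does not already contain infinite values, since those would tie with the filled NaNs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 6, np.nan, 5]})

# "top": NaNs rank as if they were smaller than every other value
top = df.rank(na_option="top")
# "bottom": NaNs rank as if they were larger than every other value
bottom = df.rank(na_option="bottom")

print(top.equals(df.fillna(-np.inf).rank()))    # True
print(bottom.equals(df.fillna(np.inf).rank()))  # True
```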

Ranking in descending order

Consider the same DataFrame we had before:

df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
A B
0 4 b
1 5 a
2 3 c
3 3 d

To rank in descending order (largest value has a rank of 1), simply set ascending=False:

df.rank(ascending=False)
A B
0 2.0 3.0
1 1.0 4.0
2 3.5 2.0
3 3.5 1.0
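For the default method="average" and data without missing values, descending ranks mirror ascending ones: a value's descending rank equals n + 1 minus its ascending rank, where n is the number of rows. The sketch below checks this identity on the DataFrame above (n = 4):

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# Mirror the ascending ranks around (n + 1) / 2
n = len(df)
mirrored = (n + 1) - df.rank()

print(mirrored.equals(df.rank(ascending=False)))  # True
```

The identity does not hold for "min", "max" or "first", since mirroring swaps which end of a tie group is used.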

Ranking using percentiles

Consider the following DataFrame:

df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
A B
0 4 b
1 5 a
2 3 c
3 3 d

To rank using percentiles, set pct=True:

df.rank(pct=True)
A B
0 0.750 0.50
1 1.000 0.25
2 0.375 0.75
3 0.375 1.00
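Each percentile above is the regular rank divided by the number of non-missing values in the column (4 here). The sketch below checks this, including a Series with NaNs, where the NaNs are left out of the count under the default na_option="keep":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# Divide each column's ranks by that column's count of non-missing values
manual_pct = df.rank() / df.count()
print(manual_pct.equals(df.rank(pct=True)))  # True

# With NaNs and na_option="keep", only the two non-missing values are counted
s = pd.Series([np.nan, 6, np.nan, 5])
print(s.rank(pct=True).equals(s.rank() / s.count()))  # True
```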

Ranking by multiple columns

Consider the following DataFrame:

df = pd.DataFrame({"A":[8,9,9], "B":[7,6,5]})
df
A B
0 8 7
1 9 6
2 9 5

To rank by column A while using column B as a tie-breaker:

df[["A","B"]].apply(tuple, axis=1).rank()
0 1.0
1 3.0
2 2.0
dtype: float64

Note the following:

  • the first row is assigned a rank of 1 because its value of A is the lowest.

  • the second and third rows have the same value of A, so their values of B are used as a tie-breaker. Since the third row has the smaller value of B (5 versus 6), it is assigned a rank of 2, and the second row a rank of 3.

Let's now break down the code. We first use the apply(~) method to combine the two columns into a single column of tuples:

df[["A","B"]].apply(tuple, axis=1)
0 (8, 7)
1 (9, 6)
2 (9, 5)
dtype: object

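We then call rank() on this Series of tuples. Tuples are compared element by element, so B only decides the order when the values of A are equal: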

df[["A","B"]].apply(tuple, axis=1).rank()
0 1.0
1 3.0
2 2.0
dtype: float64
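The tuple approach produces an object-dtype Series. As an alternative sketch (an assumption of this rewrite, not part of the original method), you can sort the rows by A and then B with sort_values(~) and use each row's position in the sorted order as its rank. Note that this gives distinct ranks to rows that tie on both columns, similar to method="first", whereas the tuple approach would average them.

```python
import pandas as pd

df = pd.DataFrame({"A": [8, 9, 9], "B": [7, 6, 5]})

# Row labels sorted by A, with B breaking ties in A
order = df.sort_values(["A", "B"]).index

# The 1-based position in the sorted order becomes the rank,
# then reindex puts the ranks back in the original row order
ranks = pd.Series(range(1, len(df) + 1), index=order, dtype="float64").reindex(df.index)

print(ranks.tolist())  # [1.0, 3.0, 2.0]
```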
Published by Isshin Inada