# Getting the top n rows with largest column value in each group in Pandas

Aug 12, 2023
## Example

Consider the following DataFrame about some products:

```
import pandas as pd

df = pd.DataFrame({"price": [500, 300, 700, 200, 900],
                   "brand": ["apple", "google", "apple", "google", "apple"],
                   "device": ["phone", "phone", "computer", "phone", "phone"]},
                  index=["a", "b", "c", "d", "e"])
df
   price   brand    device
a    500   apple     phone
b    300  google     phone
c    700   apple  computer
d    200  google     phone
e    900   apple     phone
```

## Solution

To get the top 2 priciest products of each brand:

```
df.sort_values("price", ascending=False).groupby("brand").head(2)   # returns a DataFrame
   price   brand    device
e    900   apple     phone
c    700   apple  computer
b    300  google     phone
d    200  google     phone
```
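To reuse the same pattern with a different `n` or different columns, the one-liner can be wrapped in a small helper. The sketch below is only an illustration: the function name `top_n_per_group` and its parameters are hypothetical and not part of Pandas.

```python
import pandas as pd


def top_n_per_group(df, group_col, value_col, n):
    """Return the n rows with the largest value_col in each group_col group.

    Hypothetical helper for illustration; not a Pandas API.
    """
    # Sort the whole DataFrame so the largest values come first.
    ordered = df.sort_values(value_col, ascending=False)
    # Keep the first n rows of every group.
    return ordered.groupby(group_col).head(n)


df = pd.DataFrame({"price": [500, 300, 700, 200, 900],
                   "brand": ["apple", "google", "apple", "google", "apple"],
                   "device": ["phone", "phone", "computer", "phone", "phone"]},
                  index=["a", "b", "c", "d", "e"])

top1 = top_n_per_group(df, "brand", "price", 1)
top2 = top_n_per_group(df, "brand", "price", 2)
print(top1)
```

With `n=1`, this should keep only rows `e` (apple) and `b` (google), the single priciest product of each brand.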

## Explanation

We first sort by `price` in descending order using `sort_values(~)`:

```
df.sort_values("price", ascending=False)
   price   brand    device
e    900   apple     phone
c    700   apple  computer
a    500   apple     phone
b    300  google     phone
d    200  google     phone
```

Next, we group by `brand`. The key here is that `groupby(~)` preserves the order of the rows within each group, so even after grouping by `brand`, the rows in every group are still sorted by `price` in descending order. For this reason, calling `head(2)` returns the two priciest products in each group.
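One way to see this order-preserving behavior is to iterate over the groups of the sorted DataFrame and list the prices in each one. This is a minimal sketch; the variable names `sorted_df` and `prices_by_brand` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [500, 300, 700, 200, 900],
                   "brand": ["apple", "google", "apple", "google", "apple"],
                   "device": ["phone", "phone", "computer", "phone", "phone"]},
                  index=["a", "b", "c", "d", "e"])

sorted_df = df.sort_values("price", ascending=False)

# Collect the prices of each group in the order the group holds them.
prices_by_brand = {
    brand: group["price"].tolist()
    for brand, group in sorted_df.groupby("brand")
}

print(prices_by_brand)
# {'apple': [900, 700, 500], 'google': [300, 200]}
```

The prices inside each brand stay in descending order, which is why taking the first two rows of each group gives the two largest values.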

