# Difference between methods apply and transform for groupby in Pandas

Mar 9, 2022

The main differences are the input and output of the argument function:

| | Input | Output |
|---|---|---|
| `apply(~)` | A `DataFrame` representing each group. | A scalar, a sequence or a `DataFrame`. |
| `transform(~)` | A `Series` representing a column of each group. | A sequence that has the same length as the input `Series`. Scalars are broadcast to become a sequence. |

What this means is that `apply(~)` allows you to perform operations on columns, rows and the entire DataFrame of each group, whereas `transform(~)` is restricted to operations on individual columns of each group.
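As a minimal sketch of an operation that needs the whole group at once, the function below combines two columns of each group into a single number. The DataFrame and the function name `weighted_total` are made up for illustration; the column selection `[["A", "B"]]` is an assumption added so that only the numeric columns are passed to the function.

```python
import pandas as pd

# Illustrative data: two numeric columns and a grouping column
df = pd.DataFrame({"A": [2, 5, 4], "B": [10, 100, 8], "group": ["a", "a", "b"]})

# Hypothetical helper: needs columns A and B together, so it must see the
# whole group as a DataFrame
def weighted_total(my_df):
    return (my_df["A"] * my_df["B"]).sum()

# apply passes each group as a DataFrame, so both columns are available
result = df.groupby("group")[["A", "B"]].apply(weighted_total)
print(result)
```

A function like this cannot be written for `transform(~)`, because a single column does not know about its neighbouring columns.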

# Examples

## Difference in input

Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[2,5,4],"B":[10,100,8],"group":["a","a","b"]})
df
   A    B group
0  2   10     a
1  5  100     a
2  4    8     b
```

To compute the cumulative sum across each row (that is, from column `A` to column `B`) within each group, you must use `apply(~)`:

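```
# my_df is a DataFrame representing each group
def f(my_df):
    # returns a DataFrame
    return my_df.cumsum(axis=1)

# Select the numeric columns so the string "group" column is not summed;
# group_keys=False keeps the original row index in the result
df.groupby("group", group_keys=False)[["A", "B"]].apply(f)
   A    B
0  2   12
1  5  105
2  4   12
```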

Here, our function `f` is called twice, once for each group. `transform(f)` would not work in this case because `transform(~)` only allows operations on individual columns, so row-wise operations are not possible.
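The following sketch contrasts the two calls on the same row-wise function. It assumes the example DataFrame from this section; the name `row_cumsum` and the flag `transform_failed` are introduced only for illustration. The exact exception type and message may differ between pandas versions, so the sketch catches a general `Exception`.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 5, 4], "B": [10, 100, 8], "group": ["a", "a", "b"]})

# Row-wise cumulative sum: only meaningful on a DataFrame
def row_cumsum(my_df):
    return my_df.cumsum(axis=1)

grouped = df.groupby("group", group_keys=False)[["A", "B"]]

# Works: apply passes the whole group as a DataFrame
via_apply = grouped.apply(row_cumsum)
print(via_apply)

# Fails: a single column (Series) has no axis=1 to accumulate along
try:
    grouped.transform(row_cumsum)
    transform_failed = False
except Exception as err:
    transform_failed = True
    print("transform raised:", type(err).__name__)
```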

To compute the cumulative sum down each column of each group, you can use `transform(f)`:

```
# my_col is a Series representing a single column of each group
def f(my_col):
    # returns a Series
    return my_col.cumsum()

df.groupby("group").transform(f)
   A    B
0  2   10
1  7  110
2  4    8
```

Conceptually, our function `f` is applied 4 times here, since we have two groups and each group has two columns.

NOTE

In many cases, using `apply(f)` instead of `transform(f)` would produce the same values, since many DataFrame operations, including `cumsum(~)`, are performed for each column by default.
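A quick way to check this is to run the same column-wise operation through both methods and compare the values. This is a sketch under two assumptions that are not part of the examples above: the numeric columns are selected explicitly, and `group_keys=False` is passed so that `apply(~)` keeps the original row index.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 5, 4], "B": [10, 100, 8], "group": ["a", "a", "b"]})
grouped = df.groupby("group", group_keys=False)[["A", "B"]]

# cumsum works column by column by default, so the function can receive
# either a whole group (apply) or a single column (transform)
via_apply = grouped.apply(lambda my_df: my_df.cumsum())
via_transform = grouped.transform(lambda my_col: my_col.cumsum())

# Compare the underlying values of the two results
print(via_apply.values.tolist() == via_transform.values.tolist())
```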

## Difference in output

Consider the same DataFrame as before:

```
df = pd.DataFrame({"A":[2,5,4],"B":[10,100,8],"group":["a","a","b"]})
df
   A    B group
0  2   10     a
1  5  100     a
2  4    8     b
```

Returning a scalar for `apply(~)` yields:

```
def f(my_df):
    # return the maximum value (scalar) in the entire my_df for each group
    return my_df.max().max()

# Select the numeric columns so the string "group" column is excluded
df.groupby("group")[["A", "B"]].apply(f)   # returns a Series
group
a    100
b      8
dtype: int64
```

Returning a scalar for `transform(~)` yields:

```
# my_col is a Series representing a single column of each group
def f(my_col):
    # maximum value (scalar) in column gets broadcasted to become a Series of the same length as my_col
    return my_col.max()

df.groupby("group").transform(f)   # returns a DataFrame
   A    B
0  5  100
1  5  100
2  4    8
```
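The practical consequence of the two outputs above is their shape: a scalar returned to `apply(~)` gives one entry per group, while a scalar returned to `transform(~)` is stretched to one row per original row. The sketch below checks this on the same DataFrame; the variable names `per_group` and `per_row` are illustrative, and the numeric columns are selected explicitly as an added assumption.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 5, 4], "B": [10, 100, 8], "group": ["a", "a", "b"]})
grouped = df.groupby("group")[["A", "B"]]

# Scalar per group: the result is indexed by the group labels
per_group = grouped.apply(lambda my_df: my_df.max().max())

# Scalar per column per group: broadcast back to the original rows
per_row = grouped.transform(lambda my_col: my_col.max())

print(len(per_group), "entries from apply")
print(len(per_row), "rows from transform")
```

Because the `transform(~)` result lines up with the original index, it can be compared or combined with the original rows directly.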