search
Search
Publish
menu
menu search toc more_vert
Robocat
Guest 0reps
Thanks for the thanks!
close
Comments
Log in or sign up
Cancel
Post
account_circle
Profile
exit_to_app
Sign out
help Ask a question
Share on Twitter
search
keyboard_voice
close
Searching Tips
Search for a recipe: "Creating a table in MySQL"
Search for an API documentation: "@append"
Search for code: "!dataframe"
Apply a tag filter: "#python"
Useful Shortcuts
/ to open search panel
Esc to close search panel
to navigate between search results
d to clear all current filters
Enter to expand content preview
icon_star
Doc Search
icon_star
Code Search Beta
SORRY NOTHING FOUND!
mic
Start speaking...
Voice search is only supported in Safari and Chrome.
Navigate to
A
A
share
thumb_up_alt
bookmark
arrow_backShare
Twitter
Facebook

Pandas | pivot_table method

Programming
chevron_right
Python
chevron_right
Pandas
chevron_right
Documentation
chevron_right
General Functions
schedule Mar 10, 2022
Last updated
local_offer PythonPandas
Tags

Pandas pivot_table(~) method converts the input DataFrame into what is called a pivot table. The concept of pivot tables is best explained using examples, so check them out down below.

Parameters

1. datalink | DataFrame

The DataFrame from which to create the pivot table.

2. valueslink | string or list<string> | optional

The label of the columns whose values will fill the resulting pivot table. These values will be aggregated using the specified aggfunc, which just means that we compute a statistic (e.g. sum, average, max and so on) to summarise these values.

3. indexlink | string or list<string>| optional

The label of the columns that will become the index of the resulting pivot table.

4. aggfunclink | function or list<function> | optional

The function used to aggregate the values. By default, aggfunc=np.mean.

5. fill_valuelink | scalar | optional

The value to replace NaN. By default, fill_value=None.

6. marginslink | boolean | optional

Whether or not to add a new row and column that shows the sum of the values. By default, margins=False.

7. dropnalink | boolean | optional

Whether or not to remove entries with NaN. By default, dropna=True.

8. margins_namelink | string | optional

The label assigned to the new row and column that contain the sum of the values. This is only relevant when margins=True. By default, margins_name="All".

Return Value

A new DataFrame that represents a pivot table.

Examples

Basic usage

Consider the following DataFrame about the bonus received by employees:

df = pd.DataFrame({"name":["alice","alice","bob","alice","bob","bob"], "year":[2012,2012,2012,2013,2013,2013], "bonus":[10,15,15,10,10,20]})
df
name year bonus
0 alice 2012 10
1 alice 2012 15
2 bob 2012 15
3 alice 2013 10
4 bob 2013 10
5 bob 2013 20

Our goal is to compute the total bonus of each employee per year. We can do this by creating what is called a pivot table using the pivot_table(~) method:

pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum)
year 2012 2013
name
alice 25 10
bob 15 30

Note the following:

  • the values of column bonus is used to fill the resulting DataFrame.

  • the name column is assigned as the new index.

  • the values of the year column (2012 and 2013) became the new column labels.

  • the aggfunc is summation, as indicated by NumPy's sum(~) method.

Multiple aggregates

We could also compute multiple aggregates at once. In addition to the sum, let's also compute the average bonus of each employee per year:

pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=[np.sum, np.mean])
sum mean
year 2012 2013 2012 2013
name
alice 25 10 12.5 10.0
bob 15 30 15.0 15.0

Notice how we had to pass in a list of aggregate functions for aggfunc here.

Multiple values

Suppose we want to aggregate two columns this time. To demonstrate this, here's a DataFrame with a new column indicating the number of days absent of each employee per year:

df = pd.DataFrame({"name":["alice","alice","bob","alice","bob","bob"], "year":[2012,2012,2012,2013,2013,2013], "bonus":[10,15,15,10,10,20], "days_absent":[1,2,5,2,3,5]})
df
name year bonus days_absent
0 alice 2012 10 1
1 alice 2012 15 2
2 bob 2012 15 5
3 alice 2013 10 2
4 bob 2013 10 3
5 bob 2013 20 5

To compute the total bonus as well as the days absent per year for each employee:

pd.pivot_table(df, values=["bonus","days_absent"], index="name", columns="year", aggfunc=np.sum)
bonus days_absent
year 2012 2013 2012 2013
name
alice 25 10 3 2
bob 15 30 5 8

Notice how we passed in an array of column labels for values here.

Multiple indices

Suppose we want to use two columns as the index of the resulting DataFrame. To demonstrate this, here's a DataFrame with employees now identified by their first and last names:

df = pd.DataFrame({"first_name":["alice","alice","bob","alice","bob","bob"], "last_name":["A","B","A","A","A","B"], "year":[2012,2012,2012,2013,2013,2013], "bonus":[10,15,15,10,10,20]})
df
first_name last_name year bonus
0 alice A 2012 10
1 alice B 2012 15
2 bob A 2012 15
3 alice A 2013 10
4 bob A 2013 10
5 bob B 2013 20

Now, alice A and alice B represent two different employees, so it would make sense to differentiate them when computing the total bonus per year for each employee. We do this by passing in a list for the index parameter:

pd.pivot_table(df, values=["bonus"], index=["first_name", "last_name"], columns="year", aggfunc=np.sum)
bonus
year 2012 2013
first_name last_name
alice A 10.0 10.0
B 15.0 NaN
bob A 15.0 10.0
B NaN 20.0

We get NaN for the total bonus of alice B in 2013 since it does not exist in df.

Filling missing values

Consider the following DataFrame:

df = pd.DataFrame({"name":["alice","bob","bob"], "year":[2012,2012,2013], "bonus":[10,15,20]})
name year bonus
0 alice 2012 10
1 bob 2012 15
2 bob 2013 20

Notice how we don't have data about Alice's bonus in 2013.

By default, missing values will be indicated by NaN in the resulting pivot table:

pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum)
year 2012 2013
name
alice 10.0 NaN
bob 15.0 20.0

We can choose to fill these NaN with our own values by passing in the fill_value parameter:

pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum, fill_value=-1)
year 2012 2013
name
alice 10 -1
bob 15 20

Specifying margin and margins_name

Consider the following DataFrame:

df = pd.DataFrame({"name":["alice","bob","alice","bob"], "year":[2012,2012,2013,2013], "bonus":[10,15,20,30]})
df
name year bonus
0 alice 2012 10
1 bob 2012 15
2 alice 2013 20
3 bob 2013 30

By setting margins=True, we obtain an additional row and column that holds the total sum of the values:

pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum, margins=True)
year 2012 2013 All
name
alice 10 20 30
bob 15 30 45
All 25 50 75

Note the following:

  • the row sum tells us the total bonus earned by each employee

  • the column sum is the total bonus earned per year of all employees

  • the bottom-right entry tells us the total bonus earned over the 2 years of all employees

By default, these newly introduced row and column are assigned the label "All". We can assign our own label using the margins_name parameter:

pd.pivot_table(df, ~~~, margins=True, margins_name="Total")
year 2012 2013 Total
name
alice 10 20 30
bob 15 30 45
Total 25 50 75

Specifying dropna

Consider the following DataFrame:

df = pd.DataFrame({"name":[np.NaN,np.NaN,"alice","bob"], "year":[2012,2012,2013,2013], "bonus":[10,15,20,25]})
df
name year bonus
0 NaN 2012 10
1 NaN 2012 15
2 alice 2013 20
3 bob 2013 25

Here, we're missing the names of the employee for the year 2012. This means that the resulting pivot table would have column corresponding to 2012 filled with NaN.

By default, dropna=True, which means that columns whose entries are all NaN in the resulting pivot table will be removed:

pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum)
year 2013
name
alice 20
bob 25

We can choose to keep the NaN by setting dropna=False like so:

pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum, dropna=False)
year 2012 2013
name
alice NaN 20
bob NaN 25
robocat
Published by Isshin Inada
Edited by 0 others
Did you find this page useful?
thumb_up
thumb_down
Ask a question or leave a feedback...