Reducing DataFrame memory size in Pandas

Last updated: Aug 12, 2023
Tags: Python, Pandas

There are two main ways to reduce DataFrame memory size in Pandas without necessarily compromising the information contained within the DataFrame:

  • Use smaller numeric types

  • Convert object columns to categorical columns

Examples

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]}, index=[1, 2, 3, 4, 5, 6])
df
    A  B
1   7  A
2   8  B
3   9  A
4  10  B
5  11  A
6  12  B

To check the memory usage of the DataFrame:

df.info(memory_usage="deep")
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 1 to 6
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       6 non-null      int64
 1   B       6 non-null      object
dtypes: int64(1), object(1)
memory usage: 444.0 bytes

Note here that:

  • The memory usage of the DataFrame is 444 bytes

  • Datatype of column A is int64

  • Datatype of column B is object
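To see which column accounts for most of that total, the per-column breakdown can be inspected with df.memory_usage(deep=True). The sketch below rebuilds the same DataFrame; the variable names per_column and total_bytes are only illustrative, and the exact byte count for column B may differ between Pandas and Python versions.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]},
    index=[1, 2, 3, 4, 5, 6],
)

# deep=True also measures the Python objects held in object columns,
# matching what df.info(memory_usage="deep") reports in total
per_column = df.memory_usage(deep=True)
print(per_column)

# Sum of the index and all columns, in bytes
total_bytes = int(per_column.sum())
print(total_bytes)
```

The result is a Series with one entry for the index and one per column, so the string column's share of the total is easy to spot.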

Smaller numeric types

To reduce the memory usage we can convert column A to int8:

df["A"] = df["A"].astype('int8')
df.info(memory_usage="deep")
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 1 to 6
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       6 non-null      int8
 1   B       6 non-null      object
dtypes: int8(1), object(1)
memory usage: 402.0 bytes

Note that:

  • Column A has been converted to int8

  • The memory usage of the DataFrame has decreased from 444 bytes to 402 bytes
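The 42-byte saving follows directly from the element sizes: an int64 value takes 8 bytes and an int8 value takes 1 byte, so six values shrink from 48 bytes to 6 bytes. A minimal check of that arithmetic, using standalone Series (a64 and a8 are illustrative names) rather than the DataFrame above:

```python
import pandas as pd

a64 = pd.Series([7, 8, 9, 10, 11, 12], dtype="int64")
a8 = a64.astype("int8")

# nbytes is the size of the underlying array: number of elements * bytes per element
bytes_int64 = a64.to_numpy().nbytes  # 6 * 8
bytes_int8 = a8.to_numpy().nbytes    # 6 * 1
saved = bytes_int64 - bytes_int8

print(bytes_int64, bytes_int8, saved)  # 48 6 42
```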

Always check the minimum and maximum values in a column before converting it to a smaller numeric type. A smaller type reduces memory usage, but it can only represent a narrower range of values: integers outside the target range are not preserved (they can silently wrap around to different numbers), and floating-point columns lose precision when converted to a smaller float type. Depending on the analysis you are performing, this may be significant. Below is a reference for the range of integers supported by each datatype:

Datatype    Integer range supported
int8        -128 to 127
int16       -32768 to 32767
int32       -2147483648 to 2147483647
int64       -9223372036854775808 to 9223372036854775807
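The range check described above can be automated. The following is a rough sketch, not the only way to do it: fits_in is a hypothetical helper written for this example, np.iinfo supplies the limits listed in the table, and pd.to_numeric with downcast="integer" lets Pandas pick the smallest signed integer type that holds the data.

```python
import numpy as np
import pandas as pd

def fits_in(series, dtype):
    """Hypothetical helper: True if every value lies within dtype's integer range."""
    info = np.iinfo(np.dtype(dtype))
    return bool(series.min() >= info.min and series.max() <= info.max)

a = pd.Series([7, 8, 9, 10, 11, 12], dtype="int64")
big = pd.Series([100, 200, 300], dtype="int64")

print(fits_in(a, "int8"))    # True
print(fits_in(big, "int8"))  # False, since 200 and 300 exceed 127

# Let Pandas choose the smallest signed integer type that can hold the values
a_small = pd.to_numeric(a, downcast="integer")
big_small = pd.to_numeric(big, downcast="integer")
print(a_small.dtype, big_small.dtype)  # int8 int16
```

Checking first and converting second avoids ending up with wrapped-around values in a column that looked safe to shrink.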

Categorical columns

Here is the DataFrame we are working with again:

df = pd.DataFrame({"A": [7, 8, 9, 10, 11, 12], "B": ["A", "B", "A", "B", "A", "B"]}, index=[1, 2, 3, 4, 5, 6])
df
    A  B
1   7  A
2   8  B
3   9  A
4  10  B
5  11  A
6  12  B

To reduce the memory usage, we can convert the datatype of column B from object to category:

df["B"] = df["B"].astype('category')
df.info(memory_usage="deep")
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 1 to 6
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       6 non-null      int64
 1   B       6 non-null      category
dtypes: category(1), int64(1)
memory usage: 326.0 bytes

Note here that:

  • Column B has been converted from object to category

  • The memory usage of the DataFrame has decreased from 444 bytes to 326 bytes

For object columns, every row holds a reference to a Python string object, and the deep memory measurement counts a string for every row, even when the same value appears many times. A categorical column instead stores each distinct value only once and represents the rows as small integer codes that point to those values. This is where the memory saving comes from.
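The two parts of a categorical column, the distinct values and the integer codes, can be inspected through the .cat accessor. A small illustration on a standalone Series (the name b is chosen only for this example):

```python
import pandas as pd

b = pd.Series(["A", "B", "A", "B", "A", "B"]).astype("category")

# Each distinct value is stored once...
categories = list(b.cat.categories)
print(categories)         # ['A', 'B']

# ...and each row is a small integer pointing at one of them
codes = b.cat.codes
print(codes.tolist())     # [0, 1, 0, 1, 0, 1]
print(codes.dtype)        # int8
```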

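WARNING

Categorical columns are suited to columns that take only a small, fixed set of possible values, such as blood type or marital status. If most values in a column are unique, converting to category saves little and can even increase memory usage, because every distinct value is still stored alongside the integer codes.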
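A quick way to judge whether a column is a good candidate is to compare its memory usage before and after conversion. The sketch below uses synthetic data and a hypothetical helper, category_ratio, which returns the categorical size divided by the object size; a ratio well below 1 means the conversion pays off.

```python
import pandas as pd

n = 10_000
# Two distinct values repeated many times (a good candidate)
low = pd.Series(["A", "B"] * (n // 2), dtype="object")
# Every value unique (a poor candidate)
high = pd.Series([f"id_{i}" for i in range(n)], dtype="object")

def category_ratio(s):
    """Hypothetical helper: memory as category divided by memory as object."""
    as_object = s.memory_usage(deep=True)
    as_category = s.astype("category").memory_usage(deep=True)
    return as_category / as_object

low_ratio = category_ratio(low)
high_ratio = category_ratio(high)
print(f"few distinct values: {low_ratio:.2f}")
print(f"all values unique:   {high_ratio:.2f}")
```

With only two distinct values the ratio is a small fraction, while the all-unique column ends up slightly larger as a category, since the codes are stored on top of every distinct string.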

Published by Arthur Yanagisawa