Pandas DataFrame | reindex method

Last updated: Jul 1, 2022
Tags: Python, Pandas

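Pandas' DataFrame.reindex(~) method returns a DataFrame that conforms to a new set of row or column labels. Labels that already exist in the source DataFrame keep their values, while labels that are new are filled with NaN by default. See the examples below for clarification.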

Parameters

1. labels | array-like | optional

The new labels to set as the index. Whether to set new labels for the rows or columns is indicated using axis.

2. index | array-like | optional

The new row labels.

3. columns | array-like | optional

The new column labels.

4. axis | int or str | optional

Whether to apply the labels to the index or the columns:

Value | Description
0 or "index" | The labels will be applied to the index (i.e. row labels).
1 or "columns" | The labels will be applied to the columns.

NOTE

You can change the labels of the index or the columns in two ways:

  • specify index and/or columns

  • specify labels and axis

Prefer the index and columns parameters over labels and axis, since the intent is clearer and the syntax is shorter.
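As a quick check of the note above, the following sketch compares the two calling styles on a small DataFrame. The variable names (rows_kw, rows_axis, and so on) are made up for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Row labels: keyword form vs. labels + axis form
rows_kw = df.reindex(index=["a", "c"])
rows_axis = df.reindex(labels=["a", "c"], axis=0)

# Column labels: keyword form vs. labels + axis form
cols_kw = df.reindex(columns=["B", "D"])
cols_axis = df.reindex(labels=["B", "D"], axis="columns")

# DataFrame.equals treats NaNs in the same position as equal
print(rows_kw.equals(rows_axis))
print(cols_kw.equals(cols_axis))
```

Both comparisons should report that the two forms produce the same result, so the choice between them is a matter of readability.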

5. method | None or string | optional

The logic to use when filling missing values:

Value | Description
None | Leave missing values as is.
"pad" or "ffill" | Use the values of the previous row/column.
"backfill" or "bfill" | Use the values of the next row/column.
"nearest" | Use the values of the nearest row/column.

By default, method=None. Check out our examples for clarification.

WARNING

The method parameter can only be used when the row or column labels of the source DataFrame are monotonically increasing or decreasing. Otherwise, pandas raises an error.
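To illustrate the warning, here is a minimal sketch that tries a forward-fill on an unsorted index and then on the sorted version. It assumes that pandas raises a ValueError in the unsorted case, which is the behavior we are aware of in recent releases. The names df_unsorted, df_sorted and raised are hypothetical.

```python
import pandas as pd

# Index 3, 1, 2 is neither increasing nor decreasing
df_unsorted = pd.DataFrame({"A": [1, 2, 3]}, index=[3, 1, 2])

try:
    df_unsorted.reindex(index=[0, 4], method="ffill")
    raised = False
except ValueError as err:
    # Assumption: pandas rejects filling on a non-monotonic index
    raised = True
    print(err)

# Sorting the index first makes the fill well-defined
df_sorted = df_unsorted.sort_index()
filled = df_sorted.reindex(index=[0, 4], method="ffill")
print(filled)
```

After sorting, label 4 is filled from label 3, while label 0 stays NaN because no smaller label exists.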

6. copy | boolean | optional

Whether or not to return a new DataFrame even when the new labels are identical to the existing ones. Note that reindex(~) does not modify the source DataFrame in place, regardless of this setting. By default, copy=True.

7. level | int or string | optional

The level to target. This is only relevant if the source DataFrame is multi-index.

8. fill_value | scalar | optional

The value to fill missing values. By default, fill_value=NaN.
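As a small illustration of fill_value, the sketch below replaces the default NaN with 0 for a row label that does not exist in the source DataFrame. The variable name filled is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# "c" is a new row label, so its values come from fill_value
filled = df.reindex(index=["a", "c"], fill_value=0)
print(filled)
```

Row "a" keeps its original values, and row "c" holds 0 in every column. Since no NaN is introduced, the integer columns do not need to be converted to floats.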

9. limit | int | optional

The maximum number of consecutive missing values to forward/backward fill. By default, limit=None.
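The effect of limit is easier to see on a longer gap. The following sketch forward-fills a gap of three missing labels but allows at most one consecutive fill. The DataFrame and the name limited are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 50]}, index=[1, 5])

# Labels 2, 3 and 4 are new; only the first one after label 1 is filled
limited = df.reindex(index=[1, 2, 3, 4, 5], method="ffill", limit=1)
print(limited)
```

Label 2 takes the value of label 1, while labels 3 and 4 stay NaN because the limit of one consecutive fill has been reached. Label 5 is an exact match, so it is unaffected by the limit.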

10. tolerance | scalar or list | optional

The maximum distance allowed between a new label and the source label used to fill it. A value is filled only if the following condition holds:

abs(index[indexer] - target) <= tolerance.

Specifying tolerance without method will result in an error. By default, tolerance=None.

Return Value

A DataFrame with the row labels or column labels updated.
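Because the result is returned rather than applied in place, the source DataFrame keeps its original labels unless you reassign it. A minimal sketch with the default settings, using hypothetical variable names:

```python
import pandas as pd

source = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

result = source.reindex(index=["a", "c"])

# The returned object carries the new labels
print(list(result.index))

# The source DataFrame is left untouched
print(list(source.index))
print(result is source)
```

To keep the new labels, assign the return value back, for instance source = source.reindex(index=["a", "c"]).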

Examples

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
   A  B
a  2  4
b  3  5

Changing the row labels

Changing the index (i.e. the row labels) to "a" and "c":

df.reindex(index=["a","c"])
     A    B
a  2.0  4.0
c  NaN  NaN

Here, notice the following:

  • the values at [aA] and [aB] are left as is. This is because [aA] and [aB] both existed in the source DataFrame.

  • the values at [cA] and [cB] are NaN. This is because [cA] and [cB] did not exist in the source DataFrame.
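The switch from 2 to 2.0 in the output above happens because NaN is a float, so integer columns are converted once a missing value appears. The sketch below contrasts this with reindexing using only existing labels, which simply reorders the rows. The names reordered and with_new are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Only existing labels: rows are reordered and no NaN is introduced
reordered = df.reindex(index=["b", "a"])
print(reordered)
print(reordered.dtypes)

# A new label introduces NaN, so the columns become floats
with_new = df.reindex(index=["a", "c"])
print(with_new.dtypes)
```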

Changing the column labels

Here's the same df we had before:

df
   A  B
a  2  4
b  3  5

To set new column labels:

df.reindex(columns=["B","D"])
   B   D
a  4 NaN
b  5 NaN

Here, note the following:

  • the values at [aB] and [bB] are left as is. This is because [aB] and [bB] both existed in the source DataFrame.

  • the values at [aD] and [bD] are NaN. This is because column D did not exist in the source DataFrame.
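The index and columns parameters can also be passed together. As a sketch, the following call changes the row labels and the column labels in one step, using the same df as in this section. The name result is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# New row labels and new column labels at the same time
result = df.reindex(index=["a", "c"], columns=["B", "D"])
print(result)
```

Only the value at [aB] existed in the source DataFrame, so it is the only one that is not NaN in the result.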

Specifying method

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["b","d"])
df
   A  B
b  2  4
d  3  5

None

By default, method=None, which means no filling will be performed so values that have a new row label or column label will be NaN:

df.reindex(index=["a","c"])
    A   B
a NaN NaN
c NaN NaN

ffill

To fill using the previous values, pass in method="ffill" like so:

df.reindex(index=["a","c"], method="ffill")
     A    B
a  NaN  NaN
c  2.0  4.0

Here, note the following:

  • we still have NaN for index "a" because there is no index that is smaller than "a", that is, the source DataFrame contains index "b" and "d", which are both greater than "a".

  • the values in index "c" are filled using the values in index "b" of the source DataFrame. This is because index "b" is the last index that is smaller than index "c".

bfill

Just as reference, here's df again:

df
   A  B
b  2  4
d  3  5

To fill using the next values, pass in method="bfill" like so:

df.reindex(index=["a","c"], method="bfill")
   A  B
a  2  4
c  3  5

Here, note the following:

  • the values in row "a" are filled with those in row "b" of the source DataFrame. This is because the next index that is larger than index "a" is index "b".

  • the exact same reasoning applies for how index "c" was filled.

nearest

Although not officially documented, method="nearest" does not seem to work for string labels. Hence, we'll use a DataFrame with an integer index to demonstrate how it works:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[6,9])
df
   A  B
6  2  4
9  3  5

To fill using the values of the nearest index:

df.reindex(index=[7,8], method="nearest")
   A  B
7  2  4
8  3  5

Here, note the following:

  • index 7 is filled with the values of index 6, since 6 is the index in the source DataFrame that is closest to 7.

  • index 8 is filled with the values of index 9, since 9 is the index in the source DataFrame that is closest to 8.

Specifying tolerance

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[3,6])
df
   A  B
3  2  4
6  3  5

Suppose we wanted to set a new index [5,7] with forward-fill. We can specify a tolerance to dictate whether the forward fill should take effect:

df.reindex(index=[5,7], method="ffill", tolerance=1)
     A    B
5  NaN  NaN
7  3.0  5.0

Here, note the following:

  • the row with index 5 has NaN. This is because abs(3-5)=2, which is greater than the specified tolerance.

  • the row with index 7 has been forward-filled using index 6 of the source DataFrame. This is because abs(6-7)=1, which is less than or equal to the specified tolerance.
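The same tolerance check applies to the other fill methods. As a sketch, here is method="nearest" with tolerance=1 on the same source DataFrame, using new labels 4 and 10 that were picked for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[3, 6])

# 4 is within distance 1 of index 3; 10 is 4 away from its nearest index, 6
result = df.reindex(index=[4, 10], method="nearest", tolerance=1)
print(result)
```

The row with index 4 is filled from index 3 because abs(3-4)=1 is within the tolerance, while the row with index 10 stays NaN because abs(6-10)=4 exceeds it.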

Published by Isshin Inada