Pandas DataFrame | replace method

Last updated: Aug 11, 2023
Tags: Python, Pandas

Pandas' DataFrame.replace(~) method replaces the specified values with another set of values.

Parameters

1. to_replace | string or regex or list or dict or Series or number or None

The values that will be replaced.

2. value | number or dict or list or string or regex or None | optional

The value(s) that will replace to_replace. By default, value=None.

3. inplace | boolean | optional

  • If True, then the method will directly modify the source DataFrame instead of creating a new DataFrame.

  • If False, then a new DataFrame will be created and returned.

By default, inplace=False.

4. limit | int | optional

The maximum number of consecutive fills to perform. By default, limit=None. Note that this parameter is deprecated as of pandas 2.1.

5. regex | boolean or string | optional

If True, then to_replace (a string, or a list or dict of strings) is interpreted as one or more regular expressions. If a string is passed instead, that string is used as the pattern, and to_replace must be None.

By default, regex=False.
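
The two ways of supplying a pattern described above can be compared side by side. The following is a minimal sketch; the DataFrame and the names out1 and out2 are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["alex", "bob"], "B": ["cathy", "doge"]})

# Form 1: the pattern goes in to_replace, and regex=True switches on regex matching.
out1 = df.replace(r"^a.*", "eric", regex=True)

# Form 2: the pattern goes in regex itself, and the replacement is passed as value.
out2 = df.replace(regex=r"^a.*", value="eric")

print(out1)
print(out2)
```

Both calls should produce the same DataFrame, so the choice between them is mostly a matter of readability.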

6. method | string or None | optional

The rule by which to replace to_replace:

Method | Description
"ffill" or "pad" | Fills the value with the preceding row's value.
"bfill" | Fills the value with the next row's value.

This parameter takes effect only when value=None. By default, method="pad". Note that this parameter is deprecated as of pandas 2.1.

Return Value

A DataFrame with the specified values replaced with your desired values.
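
As a quick illustration of the return value, the sketch below keeps a reference to the returned DataFrame and checks that the source is left alone when inplace is not set. The name df_new is an arbitrary choice for this example:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# replace returns a new DataFrame, so we store it under a new name.
df_new = df.replace(1, 5)

print(df_new)  # the 1 in column A has become 5
print(df)      # the source DataFrame still holds the original values
```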

Examples

Replacing single value with a single value

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df
   A  B
0  1  3
1  2  4

To replace all values of 1 with 5:

df.replace(1, 5)
   A  B
0  5  3
1  2  4

Replacing multiple values with a single value

Consider the following DataFrame:

df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df
   A  B
0  1  3
1  2  4

To replace all values of 1 and 2 with 5:

df.replace([1,2], 5)
   A  B
0  5  3
1  5  4

Replacing multiple values with corresponding values

Using array

Consider the following DataFrame:

df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df
   A  B
0  1  3
1  2  4

To replace all values of 1 and 2 with 5 and 6, respectively:

df.replace([1,2], [5,6])
   A  B
0  5  3
1  6  4

Using dict

Consider the following DataFrame:

df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df
   A  B
0  1  3
1  2  4

To replace all values of 1 and 3 with 5 and 6, respectively:

df.replace({1:5, 3:6})
   A  B
0  5  6
1  2  4
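
One consequence of passing a dict is worth a small sketch: each key is matched against the original values, so two values can be swapped in a single call. The DataFrame below is made up for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [2, 1]})

# Each key is looked up in the original data, so 1 -> 2 and 2 -> 1
# do not undo each other.
swapped = df.replace({1: 2, 2: 1})

print(swapped["A"].tolist())  # [2, 1]
print(swapped["B"].tolist())  # [1, 2]
```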

Replacing using regex

Consider the following DataFrame:

df = pd.DataFrame({"A":["alex","bob"], "B":["cathy","doge"]})
df
   A     B
0  alex  cathy
1  bob   doge

To replace all values starting with the letter "a" with "eric":

df.replace("^a.*", "eric", regex=True)
   A     B
0  eric  cathy
1  bob   doge

Notice how we had to enable regex by specifying regex=True.
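
The pattern above ends in .* for a reason: with a string replacement, only the part of the value that matches the pattern is substituted. The sketch below contrasts a partial match with a whole-string match; the replacement "X" and the variable names are arbitrary choices for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["alex", "bob"], "B": ["cathy", "doge"]})

# The pattern "a" matches a single character, so only that character is swapped.
partial = df.replace("a", "X", regex=True)
print(partial["A"].tolist())  # ['Xlex', 'bob']
print(partial["B"].tolist())  # ['cXthy', 'doge']

# Anchoring the pattern and consuming the rest replaces the whole string.
whole = df.replace(r"^a.*", "X", regex=True)
print(whole["A"].tolist())    # ['X', 'bob']
```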

Replacing for certain columns only

Replacing single value with a single value

Consider the following DataFrame:

df = pd.DataFrame({"A":[1,2], "B":[1,2]})
df
   A  B
0  1  1
1  2  2

To replace all values of 1 with 3 in column A only, we must provide a dict, as follows:

df.replace({"A":1}, 3)
   A  B
0  3  1
1  2  2

Notice how column B is unaffected despite containing a value of 1.

Replacing multiple values with corresponding values

Consider the following DataFrame:

df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df
   A  B
0  1  3
1  2  4

To replace values 1 and 2 with 5 and 6 respectively for just column A:

df.replace({"A":{1:5, 2:6}})
   A  B
0  5  3
1  6  4
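
The same nested dict can target several columns at once, with a different mapping for each. A minimal sketch, using a made-up DataFrame in which both columns hold the same values:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [1, 2]})

# Outer keys are column labels; each inner dict maps old value -> new value.
out = df.replace({"A": {1: 5}, "B": {2: 6}})

print(out["A"].tolist())  # [5, 2]
print(out["B"].tolist())  # [1, 6]
```

Here the 1 in column B and the 2 in column A are left alone because neither appears in the mapping for its own column.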

Replacing using fills

When you don't explicitly provide the value parameter and to_replace is a scalar or a list, the replace(~) method will forward-fill the matched values by default, that is, replace each match with the preceding row's value.

Consider the following DataFrame:

df = pd.DataFrame({"A":["a","b","c"]})
df
   A
0  a
1  b
2  c

Forward-fill

To forward-fill all occurrences of "b":

df.replace("b", method="ffill")   # or simply leave out the method parameter.
   A
0  a
1  a
2  c

Notice how the value "a", which is the value preceding the match (i.e. "b"), was used as the filler.

WARNING

When there is no preceding row, nothing will be replaced.

Consider the case when we want to forward-fill all occurrences of "a":

df.replace("a", method="ffill")
   A
0  a
1  b
2  c

Notice how, even though we have "a" in our DataFrame, it did not get replaced. Since there is no preceding row, we have no filler and hence no replacement is performed.

Backward-fill

To backward-fill all occurrences of "b":

df.replace("b", method="bfill")
   A
0  a
1  c
2  c

Notice how the value "c", which comes right after the match (i.e. "b"), was used as the filler.

WARNING

When there is no next row, nothing will be replaced.

Consider the case when we want to backward-fill all occurrences of "c":

df.replace("c", method="bfill")
   A
0  a
1  b
2  c

Notice how, even though we have "c" in our DataFrame, it did not get replaced. Since there is no next row, we have no filler and hence no replacement is performed.
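
On pandas versions where the method parameter is not available, a similar result can be approximated by masking the matches and then filling them. This is only a sketch of an alternative approach, not what replace(~) does internally; the names col, masked, forward and backward are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"]})
col = df["A"]

# Turn every "b" into a missing value so that it can be filled.
masked = col.mask(col == "b")

forward = masked.ffill()   # borrow from the preceding row
backward = masked.bfill()  # borrow from the next row

print(forward.tolist())    # ['a', 'a', 'c']
print(backward.tolist())   # ['a', 'c', 'c']
```

Unlike the examples above, a match with no neighbour to borrow from ends up as a missing value here. Chaining .fillna(col) restores the original value in that case.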

Limit

Consider the following DataFrame:

df = pd.DataFrame({"A":["a","b","b"]})
df
   A
0  a
1  b
2  b

By default, limit=None, which means that there is no restriction on how many consecutive fills are allowed:

df.replace("b", method="ffill")
   A
0  a
1  a
2  a

In contrast, setting limit=1 yields:

df.replace("b", method="ffill", limit=1)
   A
0  a
1  a
2  b

Here, notice how only the first "b" was filled. Also note that limit restricts the number of consecutive fills only; it is not a cap on the total number of replacements.
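
The effect of limit=1 can also be approximated without the limit parameter of replace(~), by masking the matches, forward-filling with a limit, and restoring whatever was left unfilled. This is a sketch of an alternative under that assumption; the variable names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "b"]})
col = df["A"]

# Turn every "b" into a missing value.
masked = col.mask(col == "b")

# Fill at most one consecutive gap, then put the original value back
# wherever a gap was left unfilled.
limited = masked.ffill(limit=1).fillna(col)

print(limited.tolist())  # ['a', 'a', 'b']
```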

Replacing in-place

To perform replacement in-place, we need to set inplace=True. This will directly perform the replace operation on the source DataFrame instead of creating a new one.

Consider the following DataFrame:

df = pd.DataFrame({"A":[1,2],"B":[3,4]})
df
   A  B
0  1  3
1  2  4

We replace all occurrences of 1 with 5 using inplace=True:

df.replace(1, 5, inplace=True)
df
   A  B
0  5  3
1  2  4

As shown in the output, the source DataFrame has been directly modified.

Published by Isshin Inada