Pandas DataFrame | update method
Pandas DataFrame.update(~) method replaces the values in the source DataFrame using non-NaN values from another DataFrame.
The update is done in-place, which means that the source DataFrame will be directly modified.
Parameters
1. other | Series or DataFrame
The Series or DataFrame that holds the values to update the source DataFrame.
If a Series is provided, then its name attribute must match the name of the column you wish to update.
If a DataFrame is provided, then the column names must match.
2. overwrite | boolean | optional
If True, then all values in the source DataFrame, NaN or not, are eligible to be updated using other.
If False, then only NaN values in the source DataFrame will be updated using other.
By default, overwrite=True.
3. filter_func | function | optional
A function that selects which values to update. It takes in a column of the source DataFrame as a 1D NumPy array, and returns a 1D array of booleans that indicate whether or not each value should be updated.
4. errors | string | optional
Whether or not to raise errors:
| Value | Description |
|---|---|
| "raise" | An error will be raised if a non-NaN value is to be updated with another non-NaN value. |
| "ignore" | No error will be raised. |
By default, errors="ignore".
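To illustrate the note on the other parameter, here is a minimal sketch of passing a Series instead of a DataFrame. The data values are assumptions chosen purely for illustration.

```python
import pandas as pd

# Assumed sample data for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# The name of the Series ("B") selects the column to update.
s = pd.Series([5, 6], name="B")

df.update(s)  # modifies df in place
print(df)
```

Column A is untouched because the Series is matched to column B by its name.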
Return value
Nothing is returned since the update is performed in-place. This means that the source DataFrame will be directly modified.
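A short sketch of what this means in practice, using assumed sample data: capturing the return value yields None, so writing df = df.update(...) would discard the DataFrame.

```python
import pandas as pd

# Assumed sample data for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_other = pd.DataFrame({"B": [5, 6]})

result = df.update(df_other)

print(result)  # the method returns None
print(df)      # df itself now holds the updated values
```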
Examples
Basic usage
Consider the following DataFrames:
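A minimal sketch of two such DataFrames, with values chosen to be consistent with the result shown below. The specific numbers and the extra column C in df_other are assumptions.

```python
import pandas as pd

# Source DataFrame (assumed sample data).
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# DataFrame holding the new values; only column B also exists in df.
df_other = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

print(df)
print(df_other)
```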
Notice how the two DataFrames both have a column with label B. Performing the update gives:
df.update(df_other)
df
   A  B
0  1  5
1  2  6
The values in column B of the original DataFrame have been replaced by those in column B of the other DataFrame.
Case when other DataFrame contains missing values
Consider the following DataFrames:
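A minimal sketch of two such DataFrames, with values chosen to be consistent with the result shown below. The specific numbers and the extra column C are assumptions.

```python
import numpy as np
import pandas as pd

# Source DataFrame (assumed sample data).
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Column B of the other DataFrame contains one missing value.
df_other = pd.DataFrame({"B": [5, np.nan], "C": [7, 8]})

print(df)
print(df_other)
```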
Notice how the other DataFrame has a NaN.
Performing the update gives:
df.update(df_other)
df
   A    B
0  1  5.0
1  2  4.0
The takeaway here is that if the new value is a missing value, then no update is performed for that value.
Specifying the overwrite parameter
Consider the following DataFrames:
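A minimal sketch of two such DataFrames, with values chosen to be consistent with the results shown below. The specific numbers, the extra column C, and the helper name df_original are assumptions.

```python
import numpy as np
import pandas as pd

# Source DataFrame (assumed sample data): column B holds one NaN.
df = pd.DataFrame({"A": [1, 2], "B": [3, np.nan]})
df_other = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# update() works in place, so keep a pristine copy to restart from.
df_original = df.copy()
```

Restoring the source with df = df_original.copy() before each call keeps the two runs below independent of each other.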
Performing the update with the default overwrite=True gives:
df.update(df_other)
df
   A    B
0  1  5.0
1  2  6.0
Notice how all the values in column B of the source DataFrame got updated.
Now, let's compare this with overwrite=False, starting again from the original source DataFrame since the previous call modified it in-place:
df.update(df_other, overwrite=False)
df
   A    B
0  1  3.0
1  2  6.0
Here, the value 3 was left intact, while the NaN was replaced by the corresponding value of 6. This is because overwrite=False ensures that only NaNs get updated, while non-NaN values remain unchanged.
Specifying the filter_func parameter
Consider the following DataFrames:
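A minimal sketch of two such DataFrames, with values chosen to be consistent with the result shown below. The specific numbers and the extra column C are assumptions.

```python
import pandas as pd

# Source DataFrame (assumed sample data).
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# DataFrame holding the candidate replacement values.
df_other = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

print(df)
print(df_other)
```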
Suppose we wanted to update only those values in the source DataFrame that are larger than 3. We could do so by specifying a custom function like so:
def foo(vals):
    return vals > 3

df.update(df_other, filter_func=foo)
df
   A  B
0  1  3
1  2  6
Notice how the value 3 was left unchanged.
Specifying the errors parameter
Consider the following DataFrames:
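A minimal sketch of two such DataFrames, with values chosen to be consistent with the results shown below. The specific numbers and the extra column C are assumptions.

```python
import pandas as pd

# Source DataFrame (assumed sample data): column B has no missing values.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Column B here is also fully populated, so every update overlaps
# an existing non-NaN value in the source.
df_other = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

print(df)
print(df_other)
```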
Performing the update with the default parameter errors="ignore" gives:
df.update(df_other)   # errors="ignore"
df
   A  B
0  1  5
1  2  6
The update completes without any error, even though non-NaN values are being updated with non-NaN values.
Performing the update with errors="raise" gives:
df.update(df_other, errors="raise")
ValueError: Data overlaps.
We end up with an error because we are trying to update non-NaN values with non-NaN values. Note that if column B in df_other contained only NaN values, then no error would be raised.