# Conditionally updating values of a DataFrame in Pandas

Last updated: Aug 11, 2023

Tags: Python, Pandas

Consider the following DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
df
#    A  B
# 0  3  5
# 1  4  6
```

# Conditionally updating all values

To update values that are larger than `3` in the entire DataFrame:

```python
df[df > 3] = 10
df
#     A   B
# 0   3  10
# 1  10  10
```

## Explanation

Here, we first create a DataFrame of booleans based on our criterion:

```python
df > 3
#        A     B
# 0  False  True
# 1   True  True
```

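`True` marks the entries that match our criterion. Indexing `df` with this mask using `[~]` returns a DataFrame that keeps the matched entries and fills every other entry with `NaN`. Column `A` is shown as floats because `NaN` is a floating-point value: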

```python
df[df > 3]
#      A  B
# 0  NaN  5
# 1  4.0  6
```
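In recent pandas versions, indexing a DataFrame with a boolean DataFrame gives the same result as `DataFrame.where(cond)`, which keeps entries where `cond` is `True` and puts `NaN` everywhere else. A minimal check of that equivalence, assuming the same starting DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
mask = df > 3

# Boolean-DataFrame indexing: matched entries kept, the rest become NaN
selected = df[mask]
# where(cond) keeps entries where cond is True and fills the rest with NaN
via_where = df.where(mask)

print(selected.equals(via_where))
# Column A now contains a NaN, so pandas upcasts it from int64 to float64
print(selected["A"].dtype)
```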

Assigning a value to this selection with `=` then overwrites only the matched entries of `df` in place:

```python
df[df > 3] = 10
df
#     A   B
# 0   3  10
# 1  10  10
```
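If you would rather leave `df` untouched and receive the updated values as a new DataFrame, `DataFrame.mask(cond, other)` replaces the entries where `cond` is `True` with `other`. A sketch of the same update done without mutating the original:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})

# mask() returns a new DataFrame: entries where df > 3 are replaced by 10
updated = df.mask(df > 3, 10)

print(updated["A"].tolist(), updated["B"].tolist())  # [3, 10] [10, 10]
# The original DataFrame is left unchanged
print(df["A"].tolist(), df["B"].tolist())  # [3, 4] [5, 6]
```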

# Conditionally updating values for specific columns

Consider the same DataFrame we had before:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
df
#    A  B
# 0  3  5
# 1  4  6
```

Instead of updating the entire DataFrame, we can restrict the conditional update to specific columns using the `loc` property, which takes a row selector followed by a column selector:

```python
df.loc[df["A"] > 3, "A"] = 10
df
#     A  B
# 0   3  5
# 1  10  6
```

Here, we are updating values that are greater than `3` in column `A`.
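The column being tested and the column being updated do not have to be the same. As a sketch (the values `0` and `-1` are arbitrary), here the condition is checked on column `A` while the update lands in column `B`, and then a list of labels updates several columns of the matched rows at once:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})

# Rows where A > 3 are selected; only column B of those rows is overwritten
df.loc[df["A"] > 3, "B"] = 0
print(df["B"].tolist())  # [5, 0]

# A list of column labels updates several columns in the matched rows
df.loc[df["A"] > 3, ["A", "B"]] = -1
print(df.values.tolist())  # [[3, 5], [-1, -1]]
```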

## Explanation

To break down the `loc` call, here is the boolean mask we pass in as the row selector:

```python
df["A"] > 3
# 0    False
# 1     True
# Name: A, dtype: bool
```

This is a boolean `Series`, where `True` marks the rows that satisfy the criterion.

The trap here is that if we pass only this mask to `loc`, every column of the matching row (here, the second row) is updated:

```python
df.loc[df["A"] > 3] = 10
df
#     A   B
# 0   3   5
# 1  10  10
```

This is not what we want, since we only want to update column `A`. To restrict the update, we pass the column label as the second argument to `loc`:

```python
df.loc[df["A"] > 3, "A"] = 10
df
#     A  B
# 0   3  5
# 1  10  6
```
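The row mask passed to `loc` can also combine several conditions. Each comparison has to be wrapped in parentheses and joined with the element-wise operators `&` (and), `|` (or) or `~` (not), because Python's `and`/`or` raise an error on a `Series`. A sketch with an illustrative second condition on column `B`:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})

# Parentheses are required: & binds more tightly than > and <
mask = (df["A"] > 2) & (df["B"] < 6)
print(mask.tolist())  # [True, False]

# Only row 0 satisfies both conditions, so only its A value changes
df.loc[mask, "A"] = 10
print(df["A"].tolist())  # [10, 4]
```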

# Conditionally updating values based on their value

Consider the following DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
df
#    A  B
# 0  3  5
# 1  4  6
```

## All values in the DataFrame

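To update values based on their current value, use the `applymap(~)` method, which calls a function on every element of the DataFrame. (As of pandas 2.1.0, `applymap(~)` is deprecated in favor of `DataFrame.map(~)`, which behaves the same way.)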

```python
df = df.applymap(lambda val: 2*val if val > 3 else val)
df
#    A   B
# 0  3  10
# 1  8  12
```

Here, we're doubling values that are greater than `3`. Because the function receives each value, the new value can depend on the value being replaced, which a constant assignment such as `df[df > 3] = 10` or `df.loc[df["A"] > 3, "A"] = 10` cannot do.
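For a simple rule like this one, a vectorized alternative is `DataFrame.mask(cond, other)` with a DataFrame as `other`: wherever `cond` is `True`, the value is taken from `other` at the same position. This avoids calling a Python function once per cell, which is usually faster on large DataFrames. A sketch of the same doubling, assuming the original DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})

# Where df > 3, take the value from df * 2; elsewhere keep the original value
doubled = df.mask(df > 3, df * 2)
print(doubled.values.tolist())  # [[3, 10], [8, 12]]
```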

## Values of specific columns

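To update values of specific columns based on their value, starting again from the original DataFrame, call `apply(~)` on the column and assign the result back: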

```python
df["A"] = df["A"].apply(lambda val: 2*val if val > 3 else val)
df
#    A  B
# 0  3  5
# 1  8  6
```

Here, note the following:

