Updating rows based on column values in Pandas DataFrame
Start your free 7-days trial now!
Filling rows where condition is based on their values with a constant
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]}, index=["a","b","c"])df
A B Ca 1 4 7b 2 5 8c 3 6 9
To fill rows where value for column A is 1 or value for column C is greater than or equal to 9:
df.loc[(df["A"] == 1) | (df["C"] >= 9)] = 0df
A B Ca 0 0 0b 2 5 8c 0 0 0
Here, we are first extracting the following Series of booleans:
(df["A"] == 1) | (df["C"] >= 9)
a Trueb Falsec Truedtype: bool
Passing in this boolean mask into the loc property will return the rows that correspond to True. We then fill these rows with the value 0 using standard assignment (=).
Filling certain row values where condition is based on their values with a constant
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]}, index=["a","b","c"])df
A B Ca 1 4 7b 2 5 8c 3 6 9
Instead of filling the entire rows with a constant, you can specify which rows to fill like so:
df.loc[(df["A"] == 1) | (df["C"] >= 9), "B"] = 0df
A B Ca 1 0 7b 2 5 8c 3 0 9
Here, the "B" after the comma indicates that we want to only update column B, and leave the other column values intact.
Filling rows where condition is based on a function of their values
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]}, index= ["a","b","c"])df
A B Ca 1 4 7b 2 5 8c 3 6 9
To fill rows where the sum of the value for column A and the value for column B is greater than 6:
def criteria(my_df): return my_df["A"] + my_df["B"] > 6
df.loc[criteria] = 0df
A B Ca 1 4 7b 0 0 0c 0 0 0
To clarify, criteria(my_df) takes in as argument the source DataFrame, and returns a Series of booleans where True corresponds to the rows that satisfy the condition:
def criteria(df): print(df["A"] + df["B"] > 6) return df["A"] + df["B"] > 6
df.loc[criteria] = 0
a Falseb Truec Truedtype: bool
The loc property will then return all the rows that correspond to True in this boolean mask.
Filling rows using a function of their values
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]}, index= ["a","b","c"])df
A B Ca 1 4 7b 2 5 8c 3 6 9
To double the values of rows where the value for column B is larger than 4:
df.loc[df["B"] > 4] = df * 2df
A B Ca 1 4 7b 4 10 16c 6 12 18
Here, loc returns all the rows where the value for column B is larger than 4. These rows are then assigned new values using =. Note that the assignment only updates the rows returned by loc, and so the rows that do not satisfy the condition will be kept intact.