# Creating new column using if, elif and else in Pandas DataFrame

Last updated: Jul 1, 2022
Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[3,4,5],"B":[6,7,8]}, index=["a","b","c"])
df
   A  B
a  3  6
b  4  7
c  5  8
```

# Using apply method

To add a new column using `if`, `elif` and `else`, pass a function to the `apply(~)` method like so:

```
def foo(row):   # row is a Series
    if row.A + row.B == 9:
        return -1
    elif row.A + row.B == 11:
        return 0
    else:
        return 1

df["C"] = df.apply(foo, axis=1)
df
   A  B  C
a  3  6 -1
b  4  7  0
c  5  8  1
```

Note the following:

• `axis=1` means that each row (as a Series) is passed to `foo(~)`, rather than each column.

• The `apply(~)` method is notoriously slow for large DataFrames because it calls the Python function once per row instead of using vectorised operations.
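To see what `axis=1` changes, here is a minimal sketch that applies a function along each axis of the same DataFrame. The lambda functions are illustrative only and are not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 5], "B": [6, 7, 8]}, index=["a", "b", "c"])

# axis=1: the function is called once per row; each row is a Series indexed by column labels
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)
print(row_sums)   # a: 9, b: 11, c: 13

# axis=0 (the default): the function is called once per column instead
col_sums = df.apply(lambda col: col.sum())
print(col_sums)   # A: 12, B: 21
```

The row sums 9, 11 and 13 are exactly the values that `foo(~)` tests. The same sums can be computed without `apply(~)` as the vectorised expression `df["A"] + df["B"]`, which is what the `loc` approach relies on.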

# Using loc

The `apply(~)` method is readable, but it performs poorly on large DataFrames. If performance is a concern, use `loc` with boolean masks instead, which operates on whole columns at once:

```
df.loc[df["A"] + df["B"] == 9, "C"] = -1
df.loc[df["A"] + df["B"] == 11, "C"] = 0
df.loc[df["A"] + df["B"] == 13, "C"] = 1
df
   A  B    C
a  3  6 -1.0
b  4  7  0.0
c  5  8  1.0
```

To explain, we first compute a Series of booleans from the condition:

```
df["A"] + df["B"] == 9
a     True
b    False
c    False
dtype: bool
```

We then pass this boolean Series directly to `loc`, together with the label of the target column `"C"`:

```
df.loc[df["A"] + df["B"] == 9, "C"] = -1
df
   A  B    C
a  3  6 -1.0
b  4  7  NaN
c  5  8  NaN
```

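We repeat this step for each `if` and `elif` branch. Rows that match none of the conditions keep the value `NaN`, which is why column `C` ends up with a float dtype. Note that the third condition (`== 13`) is an explicit test rather than a true `else`: a row whose sum is any other value would be left as `NaN`.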
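If you need a genuine `else` branch, one option is to assign the fallback value to the whole column first and then overwrite it with `loc`. A second, swapped-in technique is NumPy's `np.select`, which takes a list of conditions, a matching list of values, and a `default` for the `else` case. Below is a minimal sketch of both, assuming the same DataFrame as above; the helper variable `total` and the column `D` are illustrative additions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 5], "B": [6, 7, 8]}, index=["a", "b", "c"])
total = df["A"] + df["B"]  # compute the sum once and reuse it in every condition

# Approach 1: loc with a default value acting as the "else" branch
df["C"] = 1                      # else: every row starts with the fallback value
df.loc[total == 9, "C"] = -1     # if
df.loc[total == 11, "C"] = 0     # elif

# Approach 2: np.select checks the conditions in order; the first match wins
df["D"] = np.select(
    [total == 9, total == 11],   # if, elif
    [-1, 0],                     # value for each condition
    default=1,                   # else
)
print(df)
```

Because column `C` starts out as integers and is only overwritten with integers, it keeps an integer dtype instead of turning into floats. One difference worth keeping in mind: with repeated `loc` assignments the last matching condition wins, whereas `np.select` (like `if`/`elif`) uses the first matching condition, so the two only agree when the conditions do not overlap.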