Creating a new column based on other columns in Pandas DataFrame
To create a new column based on other columns, either:
- use column arithmetic for the fastest performance.
- use NumPy's where(~) function for creating binary columns.
- use the DataFrame's apply(~) method, which is the slowest but offers the most flexibility.
- use the Series' replace(~) method for mapping existing column values to new values.
Creating new columns using arithmetic
Consider the following DataFrame:
df
   A  B
a  3  5
b  4  6
The simplest and fastest way to create a new column is to use column arithmetic:
df["C"] = df["A"] + df["B"]
df
   A  B   C
a  3  5   8
b  4  6  10
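A self-contained version of this example might look like the following. The page only shows the printed DataFrame, so the constructor call is an assumption, and column D is a hypothetical extra column showing that other operators work the same way and that scalars are broadcast to every row.

```python
import pandas as pd

# Assumed constructor for the example DataFrame (row labels "a" and "b")
df = pd.DataFrame({"A": [3, 4], "B": [5, 6]}, index=["a", "b"])

# Element-wise addition: each row's A value is added to that row's B value
df["C"] = df["A"] + df["B"]

# Hypothetical column D: other operators work too, and the scalar 10 is broadcast
df["D"] = df["A"] * 10 - df["B"]

print(df)
```

Because the operation is applied to whole columns at once rather than row by row in Python, this is the approach the overview above lists as the fastest.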
For slightly more complicated operations, use the DataFrame's native methods:
df["C"] = df.max(axis=1)
df
   A  B  C
a  3  5  5
b  4  6  6
Note the following:
- we are populating the new column C with the maximum of each row (axis=1).
- the return type of df.max(axis=1) is Series.
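To make the second point concrete, here is a minimal sketch, assuming the same two-column DataFrame as above. The column names total and mean are hypothetical and only illustrate that other row-wise reductions follow the same pattern.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]}, index=["a", "b"])

# max(axis=1) reduces across the columns of each row; the result is a Series
row_max = df.max(axis=1)

# The Series keeps the DataFrame's row labels, so assignment aligns by index
df["C"] = row_max

# Hypothetical extra columns: restrict to A and B so C is not included
df["total"] = df[["A", "B"]].sum(axis=1)
df["mean"] = df[["A", "B"]].mean(axis=1)

print(df)
```

Because the returned Series is indexed by the same row labels as df, pandas matches each value to its row when the new column is assigned.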
Creating binary column values
Consider the following Pandas DataFrame:
df
    name  age
0   Alex   20
1    Bob   30
2  Cathy   40
To create a new column of binary values based on the age column, use NumPy's where(~) function:
Here, the first argument of where(~) is a boolean mask. Where the mask is True, the resulting value is 'JUNIOR'; otherwise, it is 'SENIOR'.
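The exact call is not shown on the page, so the following is only a sketch. It assumes a DataFrame with name and age columns like the one printed in the mapping section, a hypothetical cut-off of 30 for the mask, and a hypothetical new column named status.

```python
import numpy as np
import pandas as pd

# Assumed data, matching the name/age columns used elsewhere on this page
df = pd.DataFrame({"name": ["Alex", "Bob", "Cathy"], "age": [20, 30, 40]})

# Boolean mask: True where age is below the hypothetical cut-off of 30
is_junior = df["age"] < 30

# np.where picks "JUNIOR" where the mask is True and "SENIOR" elsewhere
df["status"] = np.where(is_junior, "JUNIOR", "SENIOR")

print(df)
```

np.where(~) returns a NumPy array of the same length as the mask, which pandas assigns to the new column position by position.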
Creating a column with multiple values
Once again, consider the following Pandas DataFrame:
df
    name  age
0   Alex   20
1    Bob   30
2  Cathy   40
To create a new column with multiple values based on the age column, use the DataFrame's apply(~) method:
Here, with axis=1, the function passed to apply(~) is called once for each row and receives that row as a Series.
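The original code for this step is not shown, so the sketch below is an assumption: the helper age_group, its cut-offs (25 and 35), and the labels YOUNG, MIDDLE and OLD are all hypothetical and only illustrate mapping each row to one of several values.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alex", "Bob", "Cathy"], "age": [20, 30, 40]})

def age_group(row: pd.Series) -> str:
    """Hypothetical helper: map one row to one of several labels."""
    if row["age"] < 25:
        return "YOUNG"
    elif row["age"] < 35:
        return "MIDDLE"
    return "OLD"

# axis=1 makes apply call age_group once per row, passing that row as a Series
df["age_group"] = df.apply(age_group, axis=1)

print(df)
```

Since the function runs in Python once per row, this is the slow but flexible option from the overview. For a simple chain of thresholds like this one, np.select(~) or pd.cut(~) can often express the same logic in vectorized form.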
Creating a column via mapping
Consider the same Pandas DataFrame as before:
To create a new column that is based on some mapping of an existing column:
mapping = {'Alex': 'ALEX', 'Bob': 'BOB', 'Cathy': 'CATHY'}
df['upper_name'] = df['name'].replace(mapping)
df
    name  age upper_name
0   Alex   20       ALEX
1    Bob   30        BOB
2  Cathy   40      CATHY
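One behaviour worth knowing: replace(~) only changes values that appear in the mapping, whereas Series.map(~) with a dict returns NaN for any value without a key. The sketch below contrasts the two; the extra row "Dan" and the column names via_replace and via_map are hypothetical and exist only to show an unmatched value.

```python
import pandas as pd

# "Dan" is a hypothetical extra row with no entry in the mapping
df = pd.DataFrame({"name": ["Alex", "Bob", "Cathy", "Dan"], "age": [20, 30, 40, 50]})
mapping = {"Alex": "ALEX", "Bob": "BOB", "Cathy": "CATHY"}

# replace() swaps matching values and leaves unmatched ones unchanged
df["via_replace"] = df["name"].replace(mapping)

# map() looks every value up in the dict; unmatched values become NaN
df["via_map"] = df["name"].map(mapping)

print(df)
```

For this particular mapping (upper-casing names), df["name"].str.upper() would give the same result without a dictionary; the dictionary approach is useful when the new values do not follow a simple rule.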
Creating a column using the assign method
Consider the following Pandas DataFrame:
df
   A  B
a  3  5
b  4  6
We can also use the DataFrame's assign(~) method. It takes keyword arguments whose values can be functions; each function receives the DataFrame as input and returns the new column values:
   A  B  C
0  3  5  0
1  4  6  0
Note the following:
- if the sum of column A is larger than that of column B, then [-1,-1] is used as the new column; otherwise, [0,0] is used.
- the keyword argument (C) becomes the new column label.
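The page does not show the assign(~) call itself, so the following sketch reconstructs it from the notes above; the lambda is an assumption rather than the original code. Since assign(~) keeps the original row labels, the result here is indexed a and b.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]}, index=["a", "b"])

# The keyword (C) becomes the column label; the lambda receives the DataFrame
result = df.assign(
    C=lambda d: [-1, -1] if d["A"].sum() > d["B"].sum() else [0, 0]
)

# sum(A) = 7 and sum(B) = 11, so the else branch runs and C is [0, 0]
print(result)

# assign returns a new DataFrame; the original df does not gain column C
print("C" in df.columns)
```

Because assign(~) returns a copy instead of modifying df in place, it fits well into method chains where each step builds on the previous result.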