PySpark DataFrame | withColumn method
PySpark DataFrame's withColumn(~) method can be used to:
add a new column
update an existing column
colName: The label of the new column. If colName already exists, then the supplied col will update the existing column. If colName does not exist, then col will be added as a new column.
col: The new column, given as a Column object.
Return value: A PySpark DataFrame.
Consider the following PySpark DataFrame:
Updating column values based on original column values in PySpark
To update an existing column, supply its column label as the first argument:
+-----+---+
| name|age|
+-----+---+
| Alex| 50|
|  Bob| 60|
|Cathy|100|
+-----+---+
Note that you must pass in a Column object as the second argument, and so you cannot simply use a list as the new column values.
Adding a new column to a PySpark DataFrame
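To add a new column, supply a column label that does not yet exist as the first argument and a Column object, such as F.lit(0), as the second.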
F.lit(0), where F refers to the pyspark.sql.functions module, returns a Column object holding 0s. Note that since column labels are case insensitive by default, if you pass in "AGE" as the first argument, you would end up overwriting the