PySpark SQL Functions | when method
Start your free 7-days trial now!
PySpark SQL Functions' when(~) method is used to update values of a PySpark DataFrame column to other values based on the given conditions.
The when(~) method is often used in conjunction with the otherwise(~) method to implement an if-else logic. See examples below for clarification.
Parameters
1. condition | Column | optional
A boolean Column expression. See examples below for clarification.
2. value | any | optional
The value to map to if the condition is true.
Return Value
A PySpark Column (pyspark.sql.column.Column).
Examples
Consider the following PySpark DataFrame:
+-----+---+| name|age|+-----+---+| Alex| 20|| Bob| 24||Cathy| 22|+-----+---+
Implementing if-else logic using when and otherwise
To rename the name Alex to Doge, and others to Eric:
+-----------------------------------------------+|CASE WHEN (name = Alex) THEN Doge ELSE Eric END|+-----------------------------------------------+| Doge|| Eric|| Eric|+-----------------------------------------------+
Notice how we used the method otherwise(~) to set values for cases when the conditions are not met.
Case when otherwise method is not used
Note that if you do not include the otherwise(~) method, then any value that does not fulfil the if condition will be assigned null:
Specifying multiple conditions
Using pipeline and ampersand operator
We can combine conditions using & (and) and | (or) like so:
+----+---+|name|age|+----+---+|Doge| 20||Eric| 24||Eric| 22|+----+---+
Chaining the when method
The when(~) method can be chained like so:
+----------------------------------------------------------------------------+|CASE WHEN (name = Alex) THEN Doge WHEN (name = Bob) THEN Zebra ELSE Eric END|+----------------------------------------------------------------------------+| Doge|| Zebra|| Eric|+----------------------------------------------------------------------------+
Setting a new value based on original value
To set a new value based on the original value:
Using an alias
By default, the new column label is convoluted:
+-----------------------------------------------+|CASE WHEN (name = Alex) THEN Doge ELSE Eric END|+-----------------------------------------------+| Doge|| Eric|| Eric|+-----------------------------------------------+
To assign a new column, simply use the alias(~) method: