PySpark SQL Functions | expr method
PySpark SQL Functions' expr(~) method parses the given SQL expression.
Parameters

1. str | string

The SQL expression to parse.

Return Value

A PySpark Column.
Consider the following PySpark DataFrame:
Using the expr method to convert column values to uppercase
The expr(~) method takes in as its argument a SQL expression, so we can use SQL functions such as upper(~):
The expr(~) method can often be written more succinctly using the PySpark DataFrame's selectExpr(~) method, which returns a new DataFrame based on the specified SQL expression. For instance, the above case can be rewritten as:
I recommend that you use selectExpr(~) whenever possible because:

- you won't have to import the SQL functions library (pyspark.sql.functions)

- the syntax is shorter
Parsing complex SQL expressions using expr method
Here's a more complex SQL expression using clauses like CASE WHEN:
Note the following:

- we are checking for rows where age is larger than a given threshold

- we are assigning a label to each row based on the outcome of that check
Practical applications of boolean masks returned by expr method
As we can see in the above example, the expr(~) method can return a boolean mask depending on the SQL expression you supply:
This allows us to check for the existence of rows that satisfy a given condition by aggregating the boolean mask:
Here, we get True because there exists at least one True value in the boolean mask.
Mapping column values using expr method
We can map column values using CASE WHEN in the expr(~) method like so:
Here, note the following:

- we are using the DataFrame's withColumn(~) method to obtain a new PySpark DataFrame that includes the column returned by expr(~)