PySpark SQL Functions | regexp_replace method
PySpark SQL Functions' regexp_replace(~) method replaces every substring that matches the given regular expression with the specified replacement string.
Parameters
1. str | string or Column
The column whose values will be replaced.
2. pattern | string
The regular expression (Java regex syntax) whose matches will be replaced.
3. replacement | string
The string that replaces each match of pattern.
Return Value
A new PySpark Column.
Examples
Consider the following PySpark DataFrame:
+----+---+
|name|age|
+----+---+
|Alex| 10|
|Mile| 30|
+----+---+
Replacing a specific substring
To replace the substring 'le' with 'LE', use regexp_replace(~):
The second argument is a regular expression, so characters such as $ and [ carry special meaning. To treat these characters literally, escape them with the \ character (e.g. \$). In Python source code, write the escaped pattern as a raw string (r"\$") or double the backslash ("\\$").
Passing in a Column object
Instead of referring to the column by its name, we can also pass in a Column object:
Getting a new PySpark DataFrame
We can use the PySpark DataFrame's withColumn(~) method to obtain a new PySpark DataFrame with the updated column like so:
+----+---+
|name|age|
+----+---+
|ALEx| 10|
|MiLE| 30|
+----+---+
Replacing a specific substring using regular expression
To replace the substring 'le' with 'LE' only when it occurs at the end of the string, use regexp_replace(~):
Here, we use the special regular expression character '$', which matches only at the end of the string. This is why the 'le' in Alex is not replaced: it is followed by 'x' rather than by the end of the string.