Replacing certain substrings in PySpark DataFrame column

Last updated: Jul 1, 2022
Tags: PySpark

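To replace certain substrings in column values of a PySpark DataFrame, use either the translate(~) method or the regexp_replace(~) method from PySpark SQL Functions. Use translate(~) for character-by-character replacements and regexp_replace(~) for substring or regex-based replacements.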

As an example, consider the following PySpark DataFrame:

df = spark.createDataFrame([["!A@lex"], ["B#ob"]], ["name"])
df.show()
+------+
|  name|
+------+
|!A@lex|
|  B#ob|
+------+

Replacing certain characters

Suppose we wanted to make the following character replacements:

'!' replaced by '3'
'@' replaced by '4'
'#' replaced by '5'

We can use the translate(~) method like so:

from pyspark.sql import functions as F
# Map each character in "!@#" to the character at the same position in "345"
df_new = df.withColumn("name", F.translate("name", "!@#", "345"))
df_new.show()
+------+
|  name|
+------+
|3A4lex|
|  B5ob|
+------+

Here, withColumn(~) replaces the existing name column with the translated column, since we pass the same column name.
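Because translate(~) maps characters one-to-one by position, the mapping can be sanity-checked locally before running it on a DataFrame. The sketch below uses Python's built-in str.translate, which is not part of PySpark; it is only a local analogue for equal-length mappings such as "!@#" to "345".

```python
# Plain-Python analogue of F.translate("name", "!@#", "345").
# str.maketrans pairs characters by position: '!'->'3', '@'->'4', '#'->'5'.
table = str.maketrans("!@#", "345")

names = ["!A@lex", "B#ob"]
translated = [name.translate(table) for name in names]

for before, after in zip(names, translated):
    print(before, "->", after)
```

This prints the same values as the name column in the output above.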

Replacing certain substrings

Consider the following PySpark DataFrame:

df = spark.createDataFrame([["A@@ex"], ["@Bob"]], ["name"])
df.show()
+-----+
| name|
+-----+
|A@@ex|
| @Bob|
+-----+

To replace certain substrings, use the regexp_replace(~) method:

from pyspark.sql import functions as F
df_new = df.withColumn("name", F.regexp_replace("name", "@@", "l"))
df_new.show()
+----+
|name|
+----+
|Alex|
|@Bob|
+----+

Here, we are replacing every occurrence of the substring "@@" with the letter "l". The second row is unchanged because "@Bob" contains only a single "@".

NOTE

The second argument of regexp_replace(~) is a regular expression, so certain characters such as $ and [ carry special meaning. To replace literal substrings, escape special regex characters with a backslash \ (e.g. \[).
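When the substring comes from a variable, escaping by hand is error-prone. The following is a minimal sketch, not part of the original recipe, that builds an escaped pattern with Python's re.escape. The helper names literal_pattern and replace_literal are hypothetical. It assumes the backslash-escaped output of re.escape is also accepted by the Java regex engine that Spark uses, which should hold for punctuation characters but is worth verifying for unusual input.

```python
import re


def literal_pattern(text):
    """Return a regex pattern that matches `text` literally."""
    return re.escape(text)


def replace_literal(df, column, old, new):
    """Replace every literal occurrence of `old` in `column` with `new`."""
    # Imported here (an illustrative choice) so that the helpers can be
    # defined without a running Spark installation.
    from pyspark.sql import functions as F

    return df.withColumn(column, F.regexp_replace(column, literal_pattern(old), new))


# Special characters such as $ and [ come back prefixed with a backslash.
print(literal_pattern("$["))
```

For example, replace_literal(df, "name", "$[", "x") would replace the literal two-character substring $[ rather than treating it as regex syntax.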

Replacing certain substrings using Regex

Consider the following PySpark DataFrame:

df = spark.createDataFrame([["A@ex"], ["@Bob"]], ["name"])
df.show()
+----+
|name|
+----+
|A@ex|
|@Bob|
+----+

To replace @ with another string only when it appears at the beginning of the string, use regexp_replace(~):

from pyspark.sql import functions as F
df_new = df.withColumn("name", F.regexp_replace("name", "^@", "*"))
df_new.show()
+----+
|name|
+----+
|A@ex|
|*Bob|
+----+

Here, the regex ^@ represents @ that is at the start of the string.
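A simple pattern like this can be tried out locally before applying it to a DataFrame. The sketch below uses Python's re module as a stand-in. This is an assumption of convenience: Spark evaluates the pattern with Java's regex engine, and the two engines agree on basic anchors such as ^ but can differ on more advanced syntax.

```python
import re

pattern = r"^@"  # '@' only when it is the first character
names = ["A@ex", "@Bob"]

# re.sub replaces matches of the pattern; the '@' in the middle of "A@ex"
# is not at the start of the string, so it is left alone.
replaced = [re.sub(pattern, "*", name) for name in names]
print(replaced)  # ['A@ex', '*Bob']
```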

Replacing certain substrings in multiple columns

The regexp_replace(~) method operates on one column at a time, so each column needs its own call.

For example, consider the following PySpark DataFrame:

df = spark.createDataFrame([['@a','@b'], ['@c','@d']], ['A', 'B'])
df.show()
+---+---+
|  A|  B|
+---+---+
| @a| @b|
| @c| @d|
+---+---+

To replace the substring '@' with '#' in columns A and B:

from pyspark.sql import functions as F
str_before = '@'
str_after = '#'
# Apply regexp_replace to each column in turn, chaining on the previous result
df_new = df.withColumn('A', F.regexp_replace('A', str_before, str_after))
df_new = df_new.withColumn('B', F.regexp_replace('B', str_before, str_after))
df_new.show()
+---+---+
|  A|  B|
+---+---+
| #a| #b|
| #c| #d|
+---+---+
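With more than a couple of columns, repeating the withColumn(~) call by hand gets tedious. One way to generalize the two calls above is a loop over column names, as sketched below. The helper name replace_in_columns is hypothetical and not part of PySpark.

```python
def replace_in_columns(df, columns, pattern, replacement):
    """Return a DataFrame with regexp_replace applied to each listed column."""
    # Imported here (an illustrative choice) so that the helper can be
    # defined without a running Spark installation.
    from pyspark.sql import functions as F

    for col_name in columns:
        # One withColumn call per column, chained on the previous result
        df = df.withColumn(col_name, F.regexp_replace(col_name, pattern, replacement))
    return df
```

Calling replace_in_columns(df, ['A', 'B'], '@', '#') on the DataFrame above should give the same result as the two explicit calls. Columns that are not listed are left untouched.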
Published by Isshin Inada