PySpark Column | substr method
PySpark Column's substr(~) method returns a Column of substrings extracted from string column values.
startPos: The starting position. This position is inclusive and 1-based rather than 0-based, meaning the first character is in position 1. A negative position is also allowed, in which case the position is counted from the end of the string - please consult the example below for clarification.
length: The length of the substring to extract.
Consider the following PySpark DataFrame:
Extracting substrings from column values in PySpark DataFrame
To extract substrings from column values:
Note the following:
F.col("name").substr(2,3) means that we are extracting a substring starting from the 2nd character and up to a length of 3.
even if the string is too short (e.g. "Bob"), no error will be thrown; the returned substring is simply shorter than the requested length.
the alias(~) method is used to assign a label to our column.
Note that you could also specify a negative starting position like so:
Here, we are starting from the third character from the end (inclusive).