PySpark Column | getItem method
PySpark Column's getItem(~) method extracts a value from the lists or dictionaries in a PySpark Column.
Parameters
1. key | any
The key value depends on the column type:
For lists, key should be a zero-based integer index indicating the position of the value that you wish to extract.
For dictionaries, key should be the key of the value that you wish to extract.
Return Value
A new PySpark Column.
Examples
Consider the following PySpark DataFrame:
+------+
|  vals|
+------+
|[5, 6]|
|[7, 8]|
+------+
Extracting n-th item in lists
To extract the second value from each list in the vals column:
Note that we could also use [~] syntax instead of getItem(~):
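Specifying an index position that is out of bounds for the list will return a null value, provided ANSI mode (spark.sql.ansi.enabled) is off; with ANSI mode on, Spark raises an error instead: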
Extracting values using keys in dictionaries
Consider the following PySpark DataFrame:
+----------------+
|            vals|
+----------------+
|        {A -> 4}|
|{A -> 5, B -> 6}|
+----------------+
To extract the value where the key is 'A':
Note that referring to keys that do not exist will return null: