Extracting the n-th value of lists in PySpark DataFrame

Last updated: Jul 9, 2022
Tags: PySpark

Consider the following PySpark DataFrame:

rows = [[[10,20]], [[30,40]]]
df = spark.createDataFrame(rows, ['my_col'])
df.show()
+--------+
| my_col|
+--------+
|[10, 20]|
|[30, 40]|
+--------+

Here, my_col contains some lists.

Extracting a single value from arrays in a PySpark Column

To extract the second value of each list in my_col:

import pyspark.sql.functions as F

# df.select(F.col('my_col').getItem(1)) also works!
df_res = df.select(F.col('my_col')[1].alias('second_value'))
df_res.show()
+------------+
|second_value|
+------------+
| 20|
| 40|
+------------+

Here, we are assigning a label to the Column returned by F.col('my_col')[1] using alias(~).

Equivalently, we can use the element_at(~) method instead of using the [~] syntax:

df_res = df.select(F.element_at('my_col',2).alias('second_value'))
df_res.show()
+------------+
|second_value|
+------------+
| 20|
| 40|
+------------+

Note that element_at(~) does not use zero-based indexing; positions start at 1, so the second value in a list is at position 2.

Extracting values from the back

I recommend using element_at(~) rather than the [~] syntax because element_at(~) also lets you extract elements from the back using negative positions:

df_res = df.select(F.element_at('my_col', -1).alias('last_val'))
df_res.show()
+--------+
|last_val|
+--------+
| 20|
| 40|
+--------+

This is not possible using the [~] syntax or the getItem(~) method.

In the case of out-of-bound positions

Specifying an out-of-bound position returns null values:

df_res = df.select(F.element_at('my_col',5))
df_res.show()
+---------------------+
|element_at(my_col, 5)|
+---------------------+
| null|
| null|
+---------------------+

Extracting multiple values from arrays in a PySpark Column

To extract multiple values from arrays in a PySpark Column:

col = F.col('my_col')
df_res = df.select(col[0], col[1])
df_res.show()
+---------+---------+
|my_col[0]|my_col[1]|
+---------+---------+
| 10| 20|
| 30| 40|
+---------+---------+

Here, we are extracting the first and second values of each list.

Equivalently, we could use element_at(~) once again:

col = F.col('my_col')
df_res = df.select(F.element_at(col,1), F.element_at(col,-1))
df_res.show()
+---------------------+----------------------+
|element_at(my_col, 1)|element_at(my_col, -1)|
+---------------------+----------------------+
| 10| 20|
| 30| 40|
+---------------------+----------------------+

Again, you can provide an alias for each column by using the alias(~) method:

col = F.col('my_col')
df_res = df.select(col[0].alias('1st'), col[1].alias('2nd'))
df_res.show()
+---+---+
|1st|2nd|
+---+---+
| 10| 20|
| 30| 40|
+---+---+
Published by Isshin Inada