PySpark SQL Functions | instr method
PySpark SQL Functions' instr(~) method returns a new PySpark Column holding the position of the first occurrence of the specified substring in each value of the specified column.
Positions are 1-based: the first character of a string is at position 1, not 0.
Parameters
1. str | string or Column
The column to perform the operation on.
2. substr | string
The substring whose position is to be found.
Return Value
A PySpark Column of integer positions.
Examples
Consider the following PySpark DataFrame:
+----+
|   x|
+----+
| ABA|
| BBB|
| CCC|
|null|
+----+
        
Getting the position of the first occurrence of a substring in PySpark Column
To get the position of the first occurrence of the substring "B" in column x, use the instr(~) method:
Here, note the following:
- we see 2 returned for the column value "ABA" because the substring "B" occurs in the 2nd position. Remember, this method counts positions from 1 instead of 0.
- if the substring does not exist in the string, then a value of 0 is returned. This is the case for "CCC" because this string does not include "B".
- if the string is null, then the result will also be null.
