# PySpark SQL Functions | instr method

Last updated: Aug 12, 2023

Tags: PySpark

PySpark SQL Functions' `instr(~)` method returns a new PySpark Column holding the position of the first occurrence of the specified substring in each value of the specified column.

WARNING

The position is 1-based rather than 0-based: the first character of a string is at position `1`, not `0`.

# Parameters

1. `str` | `string` or `Column`

The column to perform the operation on.

2. `substr` | `string`

The substring whose position is to be found.

# Return Value

A PySpark Column.

# Examples

Consider the following PySpark DataFrame:

```
df = spark.createDataFrame([("ABA",), ("BBB",), ("CCC",), (None,)], ["x",])
df.show()

+----+
|   x|
+----+
| ABA|
| BBB|
| CCC|
|null|
+----+
```

## Getting the position of the first occurrence of a substring in PySpark Column

To get the position of the first occurrence of the substring `"B"` in column `x`, use the `instr(~)` method:

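```
from pyspark.sql import functions as F

# Position (1-based) of the first "B" in each value of column x
df.select(F.instr("x", "B")).show()

+-----------+
|instr(x, B)|
+-----------+
|          2|
|          1|
|          0|
|       null|
+-----------+
```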

Here, note the following:

• we see `2` returned for the column value `"ABA"` because the substring `"B"` first occurs at the 2nd position. Remember, this method counts positions from `1` instead of `0`.

• if the substring does not exist in the string, then a value of `0` is returned. This is the case for `"CCC"` because this string does not include `"B"`.

• if the string is `null`, then the result will also be `null`.
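The three rules above can be sketched in plain Python, without a Spark session. This is only an illustrative model of the behaviour described here, not how Spark implements `instr`. The helper name `instr_like` is hypothetical and is not part of PySpark.

```python
from typing import Optional


def instr_like(value: Optional[str], substr: str) -> Optional[int]:
    """Pure-Python model of the rules above (hypothetical helper, not a PySpark API)."""
    if value is None:
        # A null input yields a null result.
        return None
    # str.find is 0-based and returns -1 when the substring is absent,
    # so adding 1 gives a 1-based position, and 0 for "not found".
    return value.find(substr) + 1


# Same values as column x in the example DataFrame
values = ["ABA", "BBB", "CCC", None]
positions = [instr_like(v, "B") for v in values]
print(positions)  # [2, 1, 0, None]
```

Because a missing substring maps to `0`, a check such as `position > 0` separates "found" from "not found", while a `None` result has to be handled separately.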

