Pandas Series str | extract method
Pandas Series str.extract(~) extracts the capture groups of the first regular expression match in each string.
To extract all matches instead of just the first one, use str.extractall(~).
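The difference between the two methods can be seen in a minimal sketch. The Series s and its values are made up for illustration and are not part of the original examples:

```python
import pandas as pd

# Hypothetical data: the first string contains two digits, the last contains none
s = pd.Series(["a1b2", "c3", "xyz"])

# extract: one row per input string, holding only the first match
first = s.str.extract(r"(\d)")

# extractall: one row per match, indexed by (original index, match number)
every = s.str.extractall(r"(\d)")

print(first)
print(every)
```

Note that extract keeps a row of NaN for the string without a match, whereas extractall has no row for it at all.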
Parameters
1. pat | str
Regular expression to match. The pattern must contain at least one capturing group.
2. flags | int | optional
The flags to set from the re library (e.g. re.IGNORECASE). Multiple flags can be set by combining them with the bitwise | (e.g. re.IGNORECASE | re.MULTILINE).
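As a rough illustration of the flags parameter, the sketch below compares a case-sensitive and a case-insensitive extraction, and then combines two flags. The Series s and multiline are assumed sample data:

```python
import re
import pandas as pd

# Hypothetical data with mixed-case letters
s = pd.Series(["A1", "b2", "c3"])

# Without flags, the uppercase "A" does not match [ab]
case_sensitive = s.str.extract(r"[ab](\d+)")

# With re.IGNORECASE, "A1" matches as well
case_insensitive = s.str.extract(r"[ab](\d+)", flags=re.IGNORECASE)

# Combining flags with |: ^ matches at the start of every line, ignoring case
multiline = pd.Series(["x\nA7"])
combined = multiline.str.extract(r"^a(\d)", flags=re.IGNORECASE | re.MULTILINE)

print(case_sensitive)
print(case_insensitive)
print(combined)
```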
3. expand | boolean | optional
If True, then a pattern with one group will return a DataFrame.
If False, then a pattern with one group will return a Series or Index.
By default, expand=True.
Return Value
If expand=True, then a DataFrame is returned.
If expand=False, then a pattern with one group will return a Series or Index.
In case of multiple capturing groups, a DataFrame is returned regardless of expand.
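The return types described above can be checked with a short sketch. The variable names and sample values are assumptions chosen for illustration:

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])
idx = pd.Index(["a1", "b2", "c3"])

# One group, expand=True (the default): DataFrame
one_group_expanded = s.str.extract(r"[ab](\d+)")

# One group, expand=False: Series when called on a Series
one_group_flat = s.str.extract(r"[ab](\d+)", expand=False)

# One group, expand=False: Index when called on an Index
from_index = idx.str.extract(r"[ab](\d+)", expand=False)

# Two groups: DataFrame even though expand=False
two_groups_flat = s.str.extract(r"([ab])(\d+)", expand=False)

for result in (one_group_expanded, one_group_flat, from_index, two_groups_flat):
    print(type(result).__name__)
```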
Examples
Basic usage
Consider the following DataFrame:
    A
0  a1
1  b2
2  c3
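To extract substrings that match a given regex:
df['A'].str.extract(r'[ab](\d+)')
     0
0    1
1    2
2  NaN
Here, [ab] matches either a or b, and \d+ matches one or more digits. The parentheses () mark the capturing group, that is, the part we want to extract. The last row is NaN because c3 does not match the pattern.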
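For readers who want to run the basic example end to end, here is a self-contained sketch. The construction of df is an assumption, since the original only shows the DataFrame's contents:

```python
import pandas as pd

# Assumed construction of the example DataFrame shown above
df = pd.DataFrame({"A": ["a1", "b2", "c3"]})

# Raw string (r'...') keeps \d from being treated as a string escape
extracted = df["A"].str.extract(r"[ab](\d+)")
print(extracted)

# The extracted values are strings, not integers
print(type(extracted.loc[0, 0]))
```

Because the extracted digits are strings, a numeric conversion would be needed before doing arithmetic on them.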
Multiple capturing groups
We can capture multiple groups using multiple pairs of parentheses like so:
df['A'].str.extract(r'([ab])(\d+)') # returns a DataFrame
     0    1
0    a    1
1    b    2
2  NaN  NaN
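When there are several groups, numbered columns such as 0 and 1 can be hard to read. As a sketch, named groups (the (?P<name>...) syntax from Python's re module) can be used so that the group names become the column labels. The names letter and digits are hypothetical and chosen only for illustration:

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])

# Named groups: the group names are used as the column labels
named = s.str.extract(r"(?P<letter>[ab])(?P<digits>\d+)")

print(named)
print(list(named.columns))
```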
Setting expand
Consider the following DataFrame:
    A
0  a1
1  b2
2  c3
By default, expand=True, which means that even if there is only one capturing group, a DataFrame will be returned:
df['A'].str.extract(r'[ab](\d+)') # expand=True
     0
0    1
1    2
2  NaN
To get a Series (or Index) instead, set expand=False:
df['A'].str.extract(r'[ab](\d+)', expand=False)
0      1
1      2
2    NaN
Name: A, dtype: object