Pandas Series str | extractall method
Pandas Series' str.extractall(~) extracts all substrings that match the capturing groups of a regular expression.
To extract the first match instead of all matches, use str.extract(~).
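The difference between the two methods can be sketched as follows. This is a minimal illustration; the Series values and the variable names first and every are made up for this example.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch)
s = pd.Series(["k23", "45k", "67k89"], index=["a", "b", "c"])

# str.extract keeps only the first match, one row per input row
first = s.str.extract(r"(\d+)")

# str.extractall keeps every match, one row per match
every = s.str.extractall(r"(\d+)")

print(first)
print(every)
```

Here first has three rows (one per input string), while every has four rows because "67k89" contains two runs of digits.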
Parameters
1. pat | str
Regular expression to match. The pattern must contain at least one capturing group.
2. flags | int | optional
The flags to set from the re library (e.g. re.IGNORECASE). Multiple flags can be set by combining them with the bitwise | (e.g. re.IGNORECASE | re.MULTILINE).
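As a rough sketch of how flags changes the result, the snippet below runs the same pattern with and without re.IGNORECASE. The sample strings and variable names are assumptions chosen for illustration.

```python
import re
import pandas as pd

# Illustrative data (an assumption for this sketch)
s = pd.Series(["Ab aB", "ab"], index=["x", "y"])

# Without flags, only the exact lowercase "ab" matches
case_sensitive = s.str.extractall(r"(ab)")

# With re.IGNORECASE, "Ab" and "aB" match as well
case_insensitive = s.str.extractall(r"(ab)", flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

The case-sensitive call finds a single match (in row y), whereas the case-insensitive call finds three matches across both rows.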
Return Value
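A multi-index DataFrame with one row per match. The first index level holds the labels of the source Series, and the second level, named match, numbers the matches found for each label starting from 0. Each capturing group becomes a column.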
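To make the shape of the return value concrete, here is a small sketch that inspects the index of the result. The input Series is hypothetical, and its second entry deliberately contains no digits.

```python
import pandas as pd

# Illustrative data (an assumption); "none" has no digits to match
s = pd.Series(["k23", "none", "67k89"])

result = s.str.extractall(r"(\d+)")

# Each index entry is a (source label, match number) pair
print(result.index.tolist())

# The second level of the index is named "match"
print(list(result.index.names))
```

Rows of the source Series without any match, such as label 1 here, do not appear in the result at all.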
Examples
Basic usage
Consider the following DataFrame:
       A
a    k23
b    45k
c  67k89
To extract the substrings that match a given regex:
df['A'].str.extractall(r'(\d+)') # returns a multi-index DataFrame
          0
  match
a 0      23
b 0      45
c 0      67
  1      89
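Here, the input string is a regex in which \d+ matches one or more consecutive digits, while the parentheses () mark the capturing group, that is, the portion we want to extract. Row c appears twice because its string contains two separate runs of digits.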
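For a reproducible version of this example, the sketch below rebuilds the DataFrame from the values shown in the table. The constructor call is an assumption, since the original does not show how df was created. It also converts the matches to integers, because the extracted values are strings.

```python
import pandas as pd

# Rebuild the example DataFrame (construction is assumed from the printed table)
df = pd.DataFrame({"A": ["k23", "45k", "67k89"]}, index=["a", "b", "c"])

matches = df["A"].str.extractall(r"(\d+)")

# Extracted values are strings; convert explicitly when numbers are needed
numbers = matches[0].astype(int)
print(numbers.tolist())   # [23, 45, 67, 89]
```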
Since the resulting DataFrame has a multi-index, we can obtain the matches for a specific index label like so:
df_result = df['A'].str.extractall(r'(\d+)')
df_result.loc['c'] # all matches for row c
        0
match
0      67
1      89
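A few other ways of selecting from the multi-index result are sketched below. The DataFrame construction and the variable names are assumptions for illustration; loc and xs are standard pandas indexers.

```python
import pandas as pd

# Example data, reconstructed from the table above (construction is assumed)
df = pd.DataFrame({"A": ["k23", "45k", "67k89"]}, index=["a", "b", "c"])
df_result = df["A"].str.extractall(r"(\d+)")

# All matches for row c, indexed by match number
matches_for_c = df_result.loc["c"]

# A single value: the second match (match number 1) of row c, column 0
second_for_c = df_result.loc[("c", 1), 0]

# The first match of every row, selected on the "match" level
first_per_row = df_result.xs(0, level="match")

print(matches_for_c)
print(second_for_c)
print(first_per_row)
```

Selecting with xs on the match level drops that level, so first_per_row is indexed by the original labels a, b and c.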
Multiple capturing groups
Consider the following DataFrame:
       A
a    k23
b    45y
c  67k89
We can capture multiple groups by using multiple pairs of parentheses, each of which becomes its own column:
df['A'].str.extractall(r'(\d+)([ky])')
          0  1
  match
b 0      45  y
c 0      67  k
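With several groups, the default column labels 0 and 1 can be hard to read. One option, sketched here, is to use named groups, which pandas uses as column names. The group names number and letter are made up for this example, and the DataFrame construction is assumed from the table above.

```python
import pandas as pd

# Example data, reconstructed from the table above (construction is assumed)
df = pd.DataFrame({"A": ["k23", "45y", "67k89"]}, index=["a", "b", "c"])

# (?P<name>...) defines a named capturing group; the names become column labels
named = df["A"].str.extractall(r"(?P<number>\d+)(?P<letter>[ky])")

print(named)
```

Row a is absent from the result because the digits in "k23" are not followed by k or y, and the same holds for the trailing 89 in row c.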