The flags to set from the re library (e.g. re.IGNORECASE). Multiple flags can be set by combining them with the bitwise | (e.g. re.IGNORECASE | re.MULTILINE).

3. expand | boolean | optional

If True, then a pattern with one group will return a DataFrame.
If False, then a pattern with one group will return a Series or Index.

By default, expand=True.
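As a minimal sketch of the flags parameter described above, the snippet below uses a small made-up Series (the values are illustrative, not from the examples that follow) to show how re.IGNORECASE changes which rows match:

```python
import re
import pandas as pd

# Hypothetical data: note the uppercase 'B' in the second row
s = pd.Series(['a1', 'B2', 'c3'])

# Without flags, [ab] is case-sensitive, so 'B2' does not match
no_flags = s.str.extract(r'([ab])(\d+)')

# With re.IGNORECASE, [ab] also matches 'A' and 'B'
ignore_case = s.str.extract(r'([ab])(\d+)', flags=re.IGNORECASE)

# Multiple flags are combined with the bitwise | operator
combined = s.str.extract(r'([ab])(\d+)', flags=re.IGNORECASE | re.MULTILINE)

print(no_flags)
print(ignore_case)
```

Here re.MULTILINE has no visible effect because the pattern contains no ^ or $ anchors; it is included only to show how flags are combined.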

Return Value

If expand=True, a DataFrame is returned.
If expand=False and the pattern has one capturing group, a Series or Index is returned.
If the pattern has multiple capturing groups, a DataFrame is returned regardless of expand.
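The three return-value rules above can be checked directly. This is a small sketch using an illustrative Series; the variable names are chosen here for clarity only:

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# One capturing group, expand=True (the default)
one_group_expanded = s.str.extract(r'[ab](\d+)')

# One capturing group, expand=False
one_group_flat = s.str.extract(r'[ab](\d+)', expand=False)

# Two capturing groups, expand=False
two_groups_flat = s.str.extract(r'([ab])(\d+)', expand=False)

print(type(one_group_expanded).__name__)   # DataFrame
print(type(one_group_flat).__name__)       # Series
print(type(two_groups_flat).__name__)      # DataFrame
```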

Examples

Basic usage

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({'A':['a1','b2','c3']})
df

    A
0  a1
1  b2
2  c3

To extract substrings that match a given regex:

df['A'].str.extract(r'[ab](\d+)')

     0
0    1
1    2
2  NaN

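Here, [ab] matches either a or b, and \d+ matches one or more digits. The parentheses () mark the capturing group, which is the part we want to extract. The last row is NaN because c3 does not match the pattern.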
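To make the role of the capturing group concrete, the plain-Python sketch below approximates what happens for each string, on the assumption that extract takes the groups from the first match found in each value. It uses the standard re module rather than pandas, and None stands in for NaN:

```python
import re

# Same pattern as above, written as a raw string
pattern = re.compile(r'[ab](\d+)')

results = {}
for value in ['a1', 'b2', 'c3']:
    match = pattern.search(value)           # first match in the string, if any
    # group(1) is the text captured by the first pair of parentheses
    results[value] = match.group(1) if match else None

print(results)
```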

Multiple capturing groups

We can capture multiple groups by using multiple pairs of parentheses like so:

df['A'].str.extract(r'([ab])(\d+)')   # returns a DataFrame

     0    1
0    a    1
1    b    2
2  NaN  NaN
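The columns above are labelled 0 and 1 because the groups are unnamed. As a sketch that goes slightly beyond the example, named groups written as (?P<name>...) give the resulting columns those names instead; the names letter and digit are chosen here purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a1', 'b2', 'c3']})

# Each named group becomes a column with that name
named = df['A'].str.extract(r'(?P<letter>[ab])(?P<digit>\d+)')

print(list(named.columns))
print(named)
```

Rows that do not match, such as c3, still come back as NaN in every column.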

Setting expand

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({'A':['a1','b2','c3']})
df

    A
0  a1
1  b2
2  c3

By default, expand=True, which means that even if there is only one capturing group, a DataFrame will be returned:

df['A'].str.extract(r'[ab](\d+)')   # expand=True

     0
0    1
1    2
2  NaN

To get a Series (or Index) instead, set expand=False:

df['A'].str.extract(r'[ab](\d+)', expand=False)

0      1
1      2
2    NaN
Name: A, dtype: object
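The "or Index" case arises when the method is called on an Index rather than a Series. A minimal sketch, assuming an Index built from the same three strings:

```python
import pandas as pd

idx = pd.Index(['a1', 'b2', 'c3'])

# One capturing group with expand=False on an Index returns an Index
extracted_index = idx.str.extract(r'[ab](\d+)', expand=False)

# With the default expand=True, a DataFrame is returned instead
extracted_frame = idx.str.extract(r'[ab](\d+)')

print(type(extracted_index).__name__)
print(type(extracted_frame).__name__)
```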