PySpark DataFrame | colRegex method
PySpark DataFrame's colRegex(~) method returns a Column object whose label matches the specified regular expression. If the regular expression matches several labels, all of those columns are selected.
Parameters
1. colName | string
The regex to match the label of the columns.
Return Value
A PySpark Column.
Examples
Selecting columns using regular expression in PySpark
Consider the following PySpark DataFrame:
+-----+----+
| col1|col2|
+-----+----+
| Alex|  20|
|  Bob|  30|
|Cathy|  40|
+-----+----+
To select columns using a regular expression, use the colRegex(~) method:
Here, note the following:
- we wrapped the column label in backticks (`); this is required, otherwise PySpark will throw an error.
- the regular expression col[123] matches columns with the label col1, col2 or col3.
- the select(~) method is used to convert the Column object into a PySpark DataFrame.
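To see which labels a pattern such as col[123] would pick up, the matching can be approximated in plain Python. This is only an illustration: it uses Python's re module rather than Spark's own regex engine, it assumes the whole label has to match the pattern, and the list of labels is hypothetical.

```python
import re

# Hypothetical column labels, used only for illustration.
labels = ["col1", "col2", "col3", "col4", "my_col1"]

# Assumption: the entire label must match, hence fullmatch rather than search.
pattern = re.compile(r"col[123]")
matched = [label for label in labels if pattern.fullmatch(label)]

print(matched)  # ['col1', 'col2', 'col3']
```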
Getting column labels that match regular expression as list of strings in PySpark
To get column labels as a list of strings instead of PySpark Column objects:
Here, we are using the columns property of the PySpark DataFrame returned by select(~).
