PySpark DataFrame | unionByName method
PySpark DataFrame's unionByName(~) method concatenates PySpark DataFrames vertically by aligning the column labels.
Parameters
1. other | PySpark DataFrame
The other DataFrame with which to concatenate.
2. allowMissingColumns | boolean | optional
If True, then no error will be thrown if the column labels of the two DataFrames do not align. Columns that are missing from one of the DataFrames are filled with null values.
If False, then an error will be thrown if the column labels of the two DataFrames do not align.
By default, allowMissingColumns=False.
Return Value
A new PySpark DataFrame.
Examples
Concatenating PySpark DataFrames vertically by aligning columns
Consider the following PySpark DataFrame:
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
+---+---+---+
Here's another PySpark DataFrame:
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+
To concatenate these two DataFrames vertically by aligning the columns:
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+
Dealing with cases when column labels mismatch
By default, allowMissingColumns=False, which means that if the two DataFrames do not have exactly matching column labels, then an error will be thrown.
For example, consider the following PySpark DataFrames:
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
+---+---+---+
Here's the other PySpark DataFrame, which has slightly different column labels:
+---+---+---+
|  B|  C|  D|
+---+---+---+
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+
Since the column labels do not match, calling unionByName(~) will result in an error:
AnalysisException: Cannot resolve column name "A" among (B, C, D)
To allow for misaligned columns, set allowMissingColumns=True:
+----+---+---+----+
|   A|  B|  C|   D|
+----+---+---+----+
|   1|  2|  3|null|
|null|  4|  5|   6|
|null|  7|  8|   9|
+----+---+---+----+
Notice how we have null values for the misaligned columns.