PySpark DataFrame | union method
PySpark DataFrame's union(~) method concatenates two DataFrames vertically based on column positions.
Note the following:
the two DataFrames must have the same number of columns
the DataFrames will be vertically concatenated based on the column position rather than the labels. See examples below for clarification.
Parameters
other | PySpark DataFrame
The other DataFrame to concatenate vertically with.
Return value
A PySpark DataFrame.
Concatenating PySpark DataFrames vertically based on column position
Consider the following two PySpark DataFrames:
The other DataFrame:
To concatenate the two DataFrames:
+-----+---+
| name|age|
+-----+---+
| Alex| 20|
|  Bob| 24|
|Cathy| 22|
| Alex| 25|
| Doge| 30|
| Eric| 50|
+-----+---+
Union is based on column position
Consider the following PySpark DataFrames:
The other PySpark DataFrame has a different label for its second column.
Joining the two DataFrames using union(~):
+-----+---+
| name|age|
+-----+---+
| Alex| 20|
|  Bob| 24|
|Cathy| 22|
| Alex|250|
| Doge|200|
| Eric|100|
+-----+---+
Notice how the method still concatenated the two DataFrames even though their column labels differ. This is because the concatenation is based on column positions, so the labels play no role here. The resulting DataFrame takes its column labels from the first DataFrame. You should be wary of this behaviour because the union(~) method may yield incorrect DataFrames like the one above without throwing an error!
The unionByName(~) method concatenates PySpark DataFrames vertically by aligning the column labels.