
PySpark DataFrame | unionByName method

Last updated: Aug 12, 2023
Tags: PySpark

PySpark DataFrame's unionByName(~) method concatenates PySpark DataFrames vertically by aligning the column labels.

Parameters

1. other | PySpark DataFrame

The other DataFrame with which to concatenate.

2. allowMissingColumns | boolean | optional

  • If True, then no error will be thrown if the column labels of the two DataFrames do not match. A column that is missing from one of the DataFrames is filled with null values for the rows coming from that DataFrame.

  • If False, then an error will be thrown if the column labels of the two DataFrames do not align.

By default, allowMissingColumns=False.

Return Value

A new PySpark DataFrame.

Examples

Concatenating PySpark DataFrames vertically by aligning columns

Consider the following PySpark DataFrame:

df1 = spark.createDataFrame([[1, 2, 3]], ["A", "B", "C"])
df1.show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
+---+---+---+

Here's another PySpark DataFrame:

df2 = spark.createDataFrame([[4, 5, 6], [7, 8, 9]], ["A", "B", "C"])
df2.show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+

To concatenate these two DataFrames vertically by aligning the columns:

df1.unionByName(df2).show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+

Dealing with cases when column labels mismatch

By default, allowMissingColumns=False, which means that if the two DataFrames do not have exactly matching column labels, then an error will be thrown.

For example, consider the following PySpark DataFrames:

df1 = spark.createDataFrame([[1, 2, 3]], ["A", "B", "C"])
df1.show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
+---+---+---+

Here's the other PySpark DataFrame, which has slightly different column labels:

df2 = spark.createDataFrame([[4, 5, 6], [7, 8, 9]], ["B", "C", "D"])
df2.show()
+---+---+---+
|  B|  C|  D|
+---+---+---+
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+

Since the column labels do not match, calling unionByName(~) will result in an error:

df1.unionByName(df2).show() # allowMissingColumns=False
AnalysisException: Cannot resolve column name "A" among (B, C, D)

To allow for misaligned columns, set allowMissingColumns=True:

df1.unionByName(df2, allowMissingColumns=True).show()
+----+---+---+----+
|   A|  B|  C|   D|
+----+---+---+----+
|   1|  2|  3|null|
|null|  4|  5|   6|
|null|  7|  8|   9|
+----+---+---+----+

Notice how we have null values for the misaligned columns.
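
To make the alignment rule concrete, here is a plain-Python model of the behaviour shown above. It is only a sketch and not how Spark implements unionByName(~): the function union_by_name and the (columns, rows) tuple representation are hypothetical, None stands in for null, and details such as duplicate column labels and type checks are ignored.

```python
def union_by_name(left, right, allow_missing_columns=False):
    """Hypothetical model: each argument is a (columns, rows) pair."""
    left_cols, left_rows = left
    right_cols, right_rows = right

    # Without allow_missing_columns, both sides must carry the same labels.
    if not allow_missing_columns and set(left_cols) != set(right_cols):
        raise ValueError(
            f"column labels do not match: {left_cols} vs {right_cols}"
        )

    # Output keeps the left columns first, then any right-only columns.
    out_cols = list(left_cols) + [c for c in right_cols if c not in left_cols]

    def realign(cols, rows):
        # Look up each output column by label; use None where it is missing.
        position = {name: i for i, name in enumerate(cols)}
        return [
            tuple(row[position[c]] if c in position else None for c in out_cols)
            for row in rows
        ]

    return out_cols, realign(left_cols, left_rows) + realign(right_cols, right_rows)


df1 = (["A", "B", "C"], [(1, 2, 3)])
df2 = (["B", "C", "D"], [(4, 5, 6), (7, 8, 9)])

columns, rows = union_by_name(df1, df2, allow_missing_columns=True)
print(columns)  # ['A', 'B', 'C', 'D']
print(rows)     # [(1, 2, 3, None), (None, 4, 5, 6), (None, 7, 8, 9)]
```

Calling union_by_name(df1, df2) without the flag raises an error in this model, mirroring the AnalysisException shown earlier.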

Published by Isshin Inada