
PySpark DataFrame | unionByName method

Last updated: Jul 1, 2022
Tags: PySpark

PySpark DataFrame's unionByName(~) method concatenates PySpark DataFrames vertically, matching columns by their labels rather than by their position.

Parameters

1. other | PySpark DataFrame

The other DataFrame with which to concatenate.

2. allowMissingColumns | boolean | optional

  • If True, then no error will be thrown if the column labels of the two DataFrames do not align. Columns missing from either DataFrame are filled with null values.

  • If False, then an error will be thrown if the column labels of the two DataFrames do not align.

By default, allowMissingColumns=False.
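To make the two modes concrete, here is a minimal plain-Python model of the behaviour described above. It is only an illustration of the semantics, not how Spark implements the method; the function name union_by_name and its arguments are hypothetical, and rows are represented as plain tuples.

```python
def union_by_name(left_cols, left_rows, right_cols, right_rows, allow_missing=False):
    """Illustrative model of name-based vertical concatenation (not Spark's code)."""
    if allow_missing:
        # Keep the left labels first, then append labels that only the right side has.
        out_cols = list(left_cols) + [c for c in right_cols if c not in left_cols]
    else:
        # Strict mode: both sides must carry exactly the same set of labels.
        if set(left_cols) != set(right_cols):
            raise ValueError(f"column labels differ: {left_cols} vs {right_cols}")
        out_cols = list(left_cols)

    def realign(cols, rows):
        # Map each label to its position, then rebuild every row in output order,
        # using None (null) where the source side has no such column.
        position = {c: i for i, c in enumerate(cols)}
        return [
            tuple(row[position[c]] if c in position else None for c in out_cols)
            for row in rows
        ]

    return out_cols, realign(left_cols, left_rows) + realign(right_cols, right_rows)


cols, rows = union_by_name(
    ["A", "B", "C"], [(1, 2, 3)],
    ["B", "C", "D"], [(4, 5, 6), (7, 8, 9)],
    allow_missing=True,
)
print(cols)  # ['A', 'B', 'C', 'D']
print(rows)  # [(1, 2, 3, None), (None, 4, 5, 6), (None, 7, 8, 9)]
```

With allow_missing=False the same call raises an error, mirroring the default allowMissingColumns=False.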

Return Value

A new PySpark DataFrame.

Examples

Concatenating PySpark DataFrames vertically by aligning columns

Consider the following PySpark DataFrame:

df1 = spark.createDataFrame([[1, 2, 3]], ["A", "B", "C"])
df1.show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
+---+---+---+

Here's another PySpark DataFrame:

df2 = spark.createDataFrame([[4, 5, 6], [7, 8, 9]], ["A", "B", "C"])
df2.show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+

To concatenate these two DataFrames vertically by aligning the columns:

df1.unionByName(df2).show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+

Dealing with cases when column labels mismatch

By default, allowMissingColumns=False, which means that if the two DataFrames do not have exactly matching column labels, then an error will be thrown.

For example, consider the following PySpark DataFrames:

df1 = spark.createDataFrame([[1, 2, 3]], ["A", "B", "C"])
df1.show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
+---+---+---+

Here's the other PySpark DataFrame, which has slightly different column labels:

df2 = spark.createDataFrame([[4, 5, 6], [7, 8, 9]], ["B", "C", "D"])
df2.show()
+---+---+---+
|  B|  C|  D|
+---+---+---+
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+

Since the column labels do not match, calling unionByName(~) will result in an error:

df1.unionByName(df2).show() # allowMissingColumns=False
AnalysisException: Cannot resolve column name "A" among (B, C, D)

To allow for misaligned columns, set allowMissingColumns=True:

df1.unionByName(df2, allowMissingColumns=True).show()
+----+---+---+----+
|   A|  B|  C|   D|
+----+---+---+----+
|   1|  2|  3|null|
|null|  4|  5|   6|
|null|  7|  8|   9|
+----+---+---+----+

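Notice how the misaligned columns are filled with null values: column A is null for the rows that came from df2, and column D is null for the row that came from df1.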
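Before setting allowMissingColumns=True, it can help to check which columns will end up filled with null. A minimal sketch using plain Python lists follows; the helper name column_mismatch is hypothetical, and in practice you could pass df1.columns and df2.columns as the two arguments.

```python
def column_mismatch(left_cols, right_cols):
    """Return the labels found only on the left and only on the right, in order."""
    only_left = [c for c in left_cols if c not in right_cols]
    only_right = [c for c in right_cols if c not in left_cols]
    return only_left, only_right


# These lists stand in for df1.columns and df2.columns from the example above.
only_left, only_right = column_mismatch(["A", "B", "C"], ["B", "C", "D"])
print(only_left)   # ['A']  -> null for rows coming from the right DataFrame
print(only_right)  # ['D']  -> null for rows coming from the left DataFrame
```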

Published by Isshin Inada