Pandas DataFrame | align method

Last updated: Mar 9, 2022
Tags: Python, Pandas

Pandas' DataFrame.align(~) method aligns two DataFrames (or a DataFrame and a Series) so that they share the same row labels, the same column labels, or both.

Parameters

1. other | DataFrame or Series

The DataFrame or Series that you want to align with.

2. join | string | optional

The type of join to perform:

  • "outer"

  • "inner"

  • "left"

  • "right"

By default, join="outer". See examples below for clarification.

3. axis | None or int or string | optional

The axis along which to perform the alignment:

  • 0 or "index": align using row labels

  • 1 or "columns": align using column labels

By default, axis=None, which aligns both the row labels and the column labels.

4. level | int or string | optional

The level to target. This is only relevant for DataFrames with a MultiIndex. By default, level=None.

5. copy | boolean | optional

Whether to return a new copy. If copy=False and no reindexing is performed, then the original DataFrames/Series will be returned. By default, copy=True.

6. fill_value | scalar | optional

The value used to fill the missing values (NaN) introduced by the alignment. By default, fill_value=np.nan, that is, the missing values are left as NaN.

7. method | None or string | optional

The method by which to fill missing values:

  • "pad" or "ffill": fill using the previous valid observation

  • "backfill" or "bfill": fill using the next valid observation

By default, method=None.

8. limit | int | optional

The maximum number of consecutive fills allowed. For instance, if you have 3 consecutive NaNs, and you set limit=2, then only the first two NaNs will be filled, and the third will be left as is. By default, limit=None.

9. fill_axis | int or string | optional

Whether to apply the method horizontally or vertically:

  • 0 or "index": filling is applied vertically

  • 1 or "columns": filling is applied horizontally

By default, fill_axis=0.

10. broadcast_axis | int or string | optional

The axis along which to perform broadcasting:

  • 0 or "index": broadcast along the index axis

  • 1 or "columns": broadcast along the columns axis

By default, broadcast_axis=None. This is only relevant when the source DataFrame and other have different dimensions.

Return value

A tuple of size two: (aligned source DataFrame, aligned other DataFrame/Series).
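As a small illustration of the return value when other is a Series, the sketch below aligns a DataFrame with a Series along the row labels. The names df and s and their data are made up for this example. As far as I know, axis has to be given explicitly when a DataFrame is aligned with a Series, so axis=0 is passed here.

```python
import pandas as pd

# Hypothetical data, used only for this illustration
df = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
s = pd.Series([10, 20], index=["y", "z"])

# align(~) returns a tuple of two objects: (aligned df, aligned s)
result = df.align(s, axis=0)  # join="outer" by default
a_df, a_s = result

print(type(result), len(result))
print(a_df)  # rows x, y, z; row z holds NaN
print(a_s)   # labels x, y, z; label x holds NaN
```

Both returned objects end up with the union of the row labels, and labels that were missing on one side are filled with NaN.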

Examples

Specifying the join type

Consider the following two DataFrames:

import pandas as pd
df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]        [df_two]
   A  B  C         A   E   B
0  1  3  5      a  7   9  11
1  2  4  6      b  8  10  12

Outer join

To align the two DataFrames via an outer join:

a_one, a_two = df_one.align(df_two, axis=1) # join="outer"
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10

Here, note the following:

  • By default, join="outer", which means that the resulting DataFrames will have every column label that appears in either input DataFrame (the union of the labels). This is why we see column label E in a_one, and column label C in a_two.

  • The axis=1 parameter tells Pandas to align the column labels.

  • The newly added columns hold no data; they are filled with NaN.

Inner join

To align via an inner-join:

a_one, a_two = df_one.align(df_two, join="inner", axis=1)
[a_one]     [a_two]
   A  B        A   B
0  1  3     a  7  11
1  2  4     b  8  12

We obtain this result because column labels "A" and "B" are the only ones present in both DataFrames; all other columns are dropped.

Left join

To align via a left-join:

a_one, a_two = df_one.align(df_two, join="left", axis=1)
[a_one]        [a_two]
   A  B  C        A   B   C
0  1  3  5     a  7  11 NaN
1  2  4  6     b  8  12 NaN

A left join keeps exactly the column labels of the source DataFrame in both results. This is why column C appears in a_two (filled with NaN), while column E, which exists only in df_two, is dropped.
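The remaining join type, "right", is not shown above. As a rough sketch, the loop below runs all four join types on the same two DataFrames and prints the resulting column labels, so the differences can be compared side by side. The dictionary columns_by_join is a helper name introduced only for this illustration.

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# Collect the resulting column labels for every join type
columns_by_join = {}
for how in ["outer", "inner", "left", "right"]:
    a_one, a_two = df_one.align(df_two, join=how, axis=1)
    columns_by_join[how] = (list(a_one.columns), list(a_two.columns))
    print(how, columns_by_join[how])
```

With join="right", both results should keep exactly the column labels of df_two (A, E and B), so a_one gains a column E filled with NaN and loses column C.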

Specifying the axis

Once again, suppose we have the following two DataFrames:

df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]        [df_two]
   A  B  C         A   E   B
0  1  3  5      a  7   9  11
1  2  4  6      b  8  10  12

axis=0

a_one, a_two = df_one.align(df_two, axis=0)
[a_one]              [a_two]
     A    B    C          A     E     B
0  1.0  3.0  5.0     0  NaN   NaN   NaN
1  2.0  4.0  6.0     1  NaN   NaN   NaN
a  NaN  NaN  NaN     a  7.0   9.0  11.0
b  NaN  NaN  NaN     b  8.0  10.0  12.0

By setting axis=0, we are telling Pandas to align the row labels, that is, for both resulting DataFrames to have the exact same row labels. However, notice how the column labels are kept intact for both DataFrames.

axis=1

a_one, a_two = df_one.align(df_two, axis=1)
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10

By setting axis=1, we are telling Pandas to align the column labels, that is, for both resulting DataFrames to have the exact same column labels. However, notice how the row labels are kept intact for both DataFrames.

axis=None

The default parameter value is axis=None:

a_one, a_two = df_one.align(df_two) # axis=None
[a_one]                  [a_two]
     A    B    C   E          A     B   C     E
0  1.0  3.0  5.0 NaN     0  NaN   NaN NaN   NaN
1  2.0  4.0  6.0 NaN     1  NaN   NaN NaN   NaN
a  NaN  NaN  NaN NaN     a  7.0  11.0 NaN   9.0
b  NaN  NaN  NaN NaN     b  8.0  12.0 NaN  10.0

Setting axis=None combines axis=0 and axis=1, that is, the resulting DataFrames share the same row labels as well as the same column labels.
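One way to confirm this is to compare the labels of the two results directly. The following is only a quick check on the same two DataFrames, using the standard Index.equals method:

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

a_one, a_two = df_one.align(df_two)  # axis=None, join="outer"

# Both results should carry identical row and column labels
same_rows = a_one.index.equals(a_two.index)
same_cols = a_one.columns.equals(a_two.columns)
print(same_rows, same_cols)
print(a_one.shape, a_two.shape)
```

Because the labels now match on both axes, the two results have the same shape and can be compared or combined position by position.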

Performing filling

Consider the same DataFrames we had before:

df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]        [df_two]
   A  B  C         A   E   B
0  1  3  5      a  7   9  11
1  2  4  6      b  8  10  12

Aligning the column labels (axis=1) using an outer join yields:

a_one, a_two = df_one.align(df_two, axis=1) # join="outer"
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10

Notice how we end up with missing values here since no filling is performed by default.
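If a constant placeholder is enough, the fill_value parameter described earlier can be used instead. The sketch below fills the newly created columns with 0; the choice of 0 is arbitrary and only serves the illustration.

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# Fill the columns created by the alignment with a constant (0 is arbitrary)
a_one, a_two = df_one.align(df_two, axis=1, fill_value=0)

print(a_one)  # column E is filled with 0
print(a_two)  # column C is filled with 0
```

Only the cells created by the alignment receive the constant; the existing values are left untouched.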

To fill the NaNs from neighbouring values instead of leaving them as is, specify the method parameter and, optionally, fill_axis:

a_one, a_two = df_one.align(df_two, axis=1, method="ffill", fill_axis=1)
[a_one]                 |  [a_two]
     A    B    C    E   |       A     B     C     E
0  1.0  3.0  5.0  5.0   |  a  7.0  11.0  11.0   9.0
1  2.0  4.0  6.0  6.0   |  b  8.0  12.0  12.0  10.0

Here, note the following:

  • method="ffill" applies a forward-fill, meaning NaNs are filled using the previous valid observation.

  • fill_axis=1 performs the forward-fill horizontally.
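To my knowledge, recent pandas releases (2.1 onward) deprecate the method, limit and fill_axis arguments of align(~) in favour of filling the aligned results explicitly. A minimal sketch of that alternative, assuming the same two DataFrames and using DataFrame.ffill, might look as follows. The names filled_one and filled_two are introduced only for this example.

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# Step 1: align the column labels without any filling
a_one, a_two = df_one.align(df_two, axis=1)  # join="outer"

# Step 2: forward-fill horizontally on each aligned result
filled_one = a_one.ffill(axis=1)
filled_two = a_two.ffill(axis=1)

print(filled_one)
print(filled_two)
```

This should give the same values as the method="ffill", fill_axis=1 call above: column E of the first result takes its values from column C, and column C of the second result takes its values from column B.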

Published by Isshin Inada