
Pandas DataFrame | align method

Last updated: Aug 10, 2023
Tags: Python, Pandas

Pandas' DataFrame.align(~) method aligns two DataFrames (or a DataFrame and a Series) so that they share the same row labels, the same column labels, or both.

Parameters

1. other | DataFrame or Series

The DataFrame or Series that you want to align with.

2. join | string | optional

The type of join to perform:

  • "outer"

  • "inner"

  • "left"

  • "right"

By default, join="outer". See examples below for clarification.

3. axis | None or int or string | optional

The axis along which to perform the alignment:

Axis | Description
0 or "index" | Align using row labels
1 or "columns" | Align using column labels

By default, axis=None.

4. level | int or string | optional

The level to target. This is only relevant for DataFrames with a MultiIndex. By default, level=None.

5. copy | boolean | optional

Whether to return a new copy. If copy=False and no reindexing is performed, then the original DataFrames/Series will be returned. By default, copy=True.

6. fill_value | scalar | optional

The value used to fill the missing values (NaN) that alignment introduces. By default, fill_value=np.nan, that is, the missing values are left as NaN.
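As a minimal sketch of fill_value, assuming two small hypothetical DataFrames named left and right (these are not part of the examples further down), the cells created by the alignment can be filled with 0 instead of NaN:

```python
import pandas as pd

# Hypothetical inputs: they share column "A" only
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"A": [5, 6], "C": [7, 8]})

# Outer-align the column labels and fill the newly created cells with 0
a_left, a_right = left.align(right, join="outer", axis=1, fill_value=0)

print(a_left)   # gains a column "C" filled with 0
print(a_right)  # gains a column "B" filled with 0
```

Only the cells introduced by the alignment are affected by fill_value here, since neither input contains NaN to begin with.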

7. method | None or string | optional

The method by which to fill missing values:

Method | Description
"pad" or "ffill" | Fill using the previous valid observation
"backfill" or "bfill" | Fill using the next valid observation

By default, method=None.

8. limit | int | optional

The maximum number of consecutive fills allowed. For instance, if you have 3 consecutive NaNs, and you set limit=2, then only the first two NaNs will be filled, and the third will be left as is. By default, limit=None.

9. fill_axis | int or string | optional

Whether to apply the method horizontally or vertically:

Axis | Description
0 or "index" | Filling is applied vertically.
1 or "columns" | Filling is applied horizontally.

By default, fill_axis=0.

10. broadcast_axis | int or string | optional

The axis along which to perform broadcasting:

Axis | Description
0 or "index" | Broadcast along the index axis.
1 or "columns" | Broadcast along the columns axis.

By default, broadcast_axis=None. This is only relevant when the source DataFrame and other have different dimensions.

Return value

A tuple of size two: (aligned source DataFrame, aligned other DataFrame/Series).
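To illustrate the return value when other is a Series, here is a small sketch with hypothetical data (df and s are made up for this illustration). When aligning against a Series, axis must be given explicitly:

```python
import pandas as pd

# Hypothetical inputs: the DataFrame and the Series share only the label "y"
df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
s = pd.Series([10, 20], index=["y", "w"])

# Align the row labels of the DataFrame with the index of the Series
a_df, a_s = df.align(s, join="outer", axis=0)

print(type(a_df).__name__, type(a_s).__name__)  # the types of the two results
print(a_df.index.equals(a_s.index))             # whether the labels now match
```

The first element of the tuple is still a DataFrame and the second is still a Series; only their labels have been made to agree.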

Examples

Specifying the join type

Consider the following two DataFrames:

import pandas as pd
df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]        [df_two]
   A  B  C         A   E   B
0  1  3  5      a  7   9  11
1  2  4  6      b  8  10  12

Outer full-join

To align the two DataFrames via a full outer join:

a_one, a_two = df_one.align(df_two, axis=1) # join="outer"
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10

Here, note the following:

  • By default, join="outer", which means that the resulting DataFrames will have every column label that appears in either of the input DataFrames. This is the reason we see column label E in a_one, and column label C in a_two.

  • The axis=1 parameter tells Pandas to align the column labels; the row labels of each DataFrame are left untouched.

  • The newly added columns do not hold any values: they are filled with NaN.

Inner join

To align via an inner-join:

a_one, a_two = df_one.align(df_two, join="inner", axis=1)
[a_one]      [a_two]
   A  B         A   B
0  1  3      a  7  11
1  2  4      b  8  12

We obtain this result because only the column labels "A" and "B" are present in both DataFrames, so all other columns are dropped.

Left join

To align via a left-join:

a_one, a_two = df_one.align(df_two, join="left", axis=1)
[a_one]        [a_two]
   A  B  C        A   B   C
0  1  3  5     a  7  11 NaN
1  2  4  6     b  8  12 NaN

By performing a left join, we ensure that the other DataFrame ends up with exactly the column labels of the source DataFrame. This is why column C appears in a_two, and why column E, which the source DataFrame does not have, is dropped from it.
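The remaining option, join="right", is the mirror image of the left join. A minimal sketch using the same df_one and df_two:

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# Right join: the source DataFrame takes on the column labels of df_two
a_one, a_two = df_one.align(df_two, join="right", axis=1)

print(a_one)  # column C is dropped, column E is added and holds NaN
print(a_two)  # holds the same labels and values as df_two
```

This time it is the source DataFrame that is reshaped to match the column labels of the other DataFrame.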

Specifying the axis

Once again, suppose we have the following two DataFrames:

df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]        [df_two]
   A  B  C         A   E   B
0  1  3  5      a  7   9  11
1  2  4  6      b  8  10  12

axis=0

a_one, a_two = df_one.align(df_two, axis=0)
[a_one]               [a_two]
     A    B    C           A     E     B
0  1.0  3.0  5.0      0  NaN   NaN   NaN
1  2.0  4.0  6.0      1  NaN   NaN   NaN
a  NaN  NaN  NaN      a  7.0   9.0  11.0
b  NaN  NaN  NaN      b  8.0  10.0  12.0

By setting axis=0, we are telling Pandas to align the row labels, that is, for both resulting DataFrames to have the exact same row labels. However, notice how the column labels are kept intact for both DataFrames.

axis=1

a_one, a_two = df_one.align(df_two, axis=1)
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10

By setting axis=1, we are telling Pandas to align the column labels, that is, for both resulting DataFrames to have the exact same column labels. However, notice how the row labels are kept intact for both DataFrames.

axis=None

The default parameter value is axis=None:

a_one, a_two = df_one.align(df_two) # axis=None
[a_one]                   [a_two]
     A    B    C   E           A     B   C     E
0  1.0  3.0  5.0 NaN      0  NaN   NaN NaN   NaN
1  2.0  4.0  6.0 NaN      1  NaN   NaN NaN   NaN
a  NaN  NaN  NaN NaN      a  7.0  11.0 NaN   9.0
b  NaN  NaN  NaN NaN      b  8.0  12.0 NaN  10.0

Setting axis=None combines axis=0 and axis=1, that is, the resulting DataFrames will share the same row labels as well as the same column labels.
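As a quick check of this behaviour, the sketch below uses two hypothetical DataFrames (left and right, with string row labels) and confirms that both axes match after alignment. It also shows how join="inner" interacts with axis=None:

```python
import pandas as pd

# Hypothetical inputs: they overlap only in row "r2" and column "B"
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r1", "r2"])
right = pd.DataFrame({"B": [10, 20], "C": [30, 40]}, index=["r2", "r3"])

# Outer join on both axes: each result gets the union of rows and columns
a_left, a_right = left.align(right)  # axis=None, join="outer"
print(a_left.index.equals(a_right.index))      # row labels match
print(a_left.columns.equals(a_right.columns))  # column labels match

# Inner join on both axes: only the shared row and shared column survive
i_left, i_right = left.align(right, join="inner")
print(i_left)
print(i_right)
```

Once both axes agree, the two results have the same shape, so they can be compared or combined cell by cell.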

Performing filling

Consider the same DataFrames we had before:

df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]        [df_two]
   A  B  C         A   E   B
0  1  3  5      a  7   9  11
1  2  4  6      b  8  10  12

Performing horizontal alignment using outer full-join yields:

a_one, a_two = df_one.align(df_two, axis=1) # join="outer"
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10

Notice how we end up with missing values here since no filling is performed by default.

To fill the NaNs, we can specify the method parameter and, optionally, fill_axis:

a_one, a_two = df_one.align(df_two, axis=1, method="ffill", fill_axis=1)
[a_one]                |  [a_two]
     A    B    C    E  |       A     B     C     E
0  1.0  3.0  5.0  5.0  |  a  7.0  11.0  11.0   9.0
1  2.0  4.0  6.0  6.0  |  b  8.0  12.0  12.0  10.0

Here, note the following:

  • method="ffill" applies a forward-fill, meaning NaNs are filled using the previous valid observation.

  • fill_axis=1 performs the forward-fill horizontally.
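Recent Pandas releases (2.1 onwards, as far as we are aware) deprecate the method, limit and fill_axis parameters of align(~). A minimal sketch of an alternative that avoids them, assuming the same df_one and df_two, is to align first and then call ffill(~) on each result:

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# Step 1: align the column labels without any filling keywords
a_one, a_two = df_one.align(df_two, axis=1)

# Step 2: forward-fill horizontally, so each NaN takes the value to its left
f_one = a_one.ffill(axis=1)
f_two = a_two.ffill(axis=1)

print(f_one)
print(f_two)
```

Splitting the work into two steps also makes it possible to fill the two results differently, for instance forward-filling one and leaving the other untouched.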

Published by Isshin Inada