Pandas DataFrame | align method
The DataFrame.align(~) method in Pandas aligns two DataFrames (or a DataFrame and a Series) so that they share the same row labels, column labels, or both.
Parameters
1. other | DataFrame or Series
The DataFrame or Series that you want to align with.
2. join | string | optional
The type of join to perform: "outer", "inner", "left" or "right".
By default, join="outer". See examples below for clarification.
3. axis | None or int or string | optional
The axis along which to perform the alignment:
| Axis | Description |
|---|---|
| 0 or "index" | Align using row labels |
| 1 or "columns" | Align using column labels |
By default, axis=None.
4. level | int or string | optional
The level to target. This is only relevant for Multi-index DataFrames. By default, level=None.
5. copy | boolean | optional
Whether to return a new copy. If copy=False and no reindexing is performed, then the original DataFrames/Series will be returned. By default, copy=True.
6. fill_value | scalar | optional
The value with which to fill missing values (NaN). By default, fill_value=np.nan, that is, missing values are left as NaN.
7. method | None or string | optional
The method by which to fill missing values:
| Method | Description |
|---|---|
| "ffill" or "pad" | Fill using the previous valid observation |
| "bfill" or "backfill" | Fill using the next valid observation |
By default, method=None.
8. limit | int | optional
The maximum number of consecutive fills allowed. For instance, if you have 3 consecutive NaNs, and you set limit=2, then only the first two NaNs will be filled, and the third will be left as is. By default, limit=None.
9. fill_axis | int or string | optional
Whether to apply the method horizontally or vertically:
| Axis | Description |
|---|---|
| 0 or "index" | Filling is applied vertically. |
| 1 or "columns" | Filling is applied horizontally. |
By default, fill_axis=0.
10. broadcast_axis | int or string | optional
The axis along which to perform broadcasting:
| Axis | Description |
|---|---|
| 0 or "index" | Broadcast along the index axis. |
| 1 or "columns" | Broadcast along the columns axis. |
By default, broadcast_axis=None. This is only relevant when the source DataFrame and other have different dimensions.
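As a rough illustration of the fill_value parameter described above, the following sketch aligns two small DataFrames column-wise and fills the newly introduced columns with 0 instead of NaN. The DataFrames are the same ones used in the examples on this page; the choice of 0 as the fill value is arbitrary.

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# Align the column labels (outer join by default).
# Columns introduced by the alignment are filled with 0 rather than NaN.
a_one, a_two = df_one.align(df_two, axis=1, fill_value=0)

print(a_one)  # gains column E, filled with 0
print(a_two)  # gains column C, filled with 0
```

Values that were already present in the inputs are left unchanged in this example; only the cells created by the alignment receive the fill value.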
Return value
A tuple of two elements: (aligned source DataFrame, aligned other DataFrame/Series).
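To make the return value concrete, here is a minimal sketch that aligns a DataFrame with a Series along the row labels and unpacks the resulting tuple. The names df and s are made up for this illustration. The axis is passed explicitly, on the assumption that Pandas expects an axis when other is a Series.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"])
s = pd.Series([10, 20], index=["b", "c"])

# Align the row labels of the DataFrame with the labels of the Series
result = df.align(s, axis=0)

# The result is a tuple: (aligned DataFrame, aligned Series)
a_df, a_s = result

print(a_df)  # rows a, b, c; row c holds NaN
print(a_s)   # labels a, b, c; label a holds NaN
```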
Examples
Specifying the join type
Consider the following two DataFrames:
import pandas as pd
df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]      [df_two]
   A  B  C       A   E   B
0  1  3  5    a  7   9  11
1  2  4  6    b  8  10  12
Outer full-join
To align the two DataFrames via a full outer join:
a_one, a_two = df_one.align(df_two, axis=1) # join="outer"
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10
Here, note the following:
- By default, join="outer", which means that the resulting DataFrames will have every column label that appears in either of the input DataFrames. This is why we see column label E in a_one, and column label C in a_two.
- The axis=1 parameter tells Pandas to perform the alignment column-wise.
- Although new columns are added, they do not hold any values as they are filled with NaN.
Inner join
To align via an inner-join:
a_one, a_two = df_one.align(df_two, join="inner", axis=1)
[a_one]     [a_two]
   A  B        A   B
0  1  3     a  7  11
1  2  4     b  8  12
We obtain this result because column labels "A" and "B" are the only ones present in both DataFrames. All other columns are dropped.
Left join
To align via a left-join:
a_one, a_two = df_one.align(df_two, join="left", axis=1)
[a_one]       [a_two]
   A  B  C       A   B   C
0  1  3  5    a  7  11 NaN
1  2  4  6    b  8  12 NaN
By performing a left join, we are ensuring that the other DataFrame has all the column labels of the source DataFrame. This is why we see column C appear in a_two.
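The remaining join type, "right", is the mirror image of the left join: the source DataFrame is made to carry the column labels of the other DataFrame. A minimal sketch, reusing the same two DataFrames:

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# Right join: both results carry the column labels of df_two
a_one, a_two = df_one.align(df_two, join="right", axis=1)

print(a_one)  # column C is dropped, column E is added and filled with NaN
print(a_two)  # same column labels as df_two
```

Here column C, which only exists in df_one, is dropped from a_one, while column E appears in a_one filled with NaN.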
Specifying the axis
Once again, suppose we have the following two DataFrames:
df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]      [df_two]
   A  B  C       A   E   B
0  1  3  5    a  7   9  11
1  2  4  6    b  8  10  12
axis=0
a_one, a_two = df_one.align(df_two, axis=0)
[a_one]               [a_two]
     A    B    C           A     E     B
0  1.0  3.0  5.0      0  NaN   NaN   NaN
1  2.0  4.0  6.0      1  NaN   NaN   NaN
a  NaN  NaN  NaN      a  7.0   9.0  11.0
b  NaN  NaN  NaN      b  8.0  10.0  12.0
By setting axis=0, we are telling Pandas to align the row labels, that is, for both resulting DataFrames to have the exact same row labels. However, notice how the column labels are kept intact for both DataFrames.
axis=1
a_one, a_two = df_one.align(df_two, axis=1)
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10
By setting axis=1, we are telling Pandas to align the column labels, that is, for both resulting DataFrames to have the exact same column labels. However, notice how the row labels are kept intact for both DataFrames.
axis=None
The default parameter value is axis=None:
a_one, a_two = df_one.align(df_two) # axis=None
[a_one]                    [a_two]
     A    B    C   E            A     B   C     E
0  1.0  3.0  5.0 NaN       0  NaN   NaN NaN   NaN
1  2.0  4.0  6.0 NaN       1  NaN   NaN NaN   NaN
a  NaN  NaN  NaN NaN       a  7.0  11.0 NaN   9.0
b  NaN  NaN  NaN NaN       b  8.0  12.0 NaN  10.0
Setting axis=None combines axis=0 and axis=1, that is, the resulting DataFrames will share the same row labels as well as the same column labels.
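One way to confirm that both axes were aligned is to compare the labels of the two results directly. The sketch below does this with Index.equals(~), using the same DataFrames as above.

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# axis=None (the default) aligns both the row labels and the column labels
a_one, a_two = df_one.align(df_two)

same_rows = a_one.index.equals(a_two.index)
same_columns = a_one.columns.equals(a_two.columns)

print(same_rows, same_columns)
print(a_one.shape, a_two.shape)
```

Because the two results share identical labels on both axes, they also have the same shape.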
Performing filling
Consider the same DataFrames we had before:
df_one = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df_two = pd.DataFrame({"A":[7,8], "E":[9,10], "B":[11,12]}, index=["a","b"])
[df_one]      [df_two]
   A  B  C       A   E   B
0  1  3  5    a  7   9  11
1  2  4  6    b  8  10  12
Performing horizontal alignment using outer full-join yields:
a_one, a_two = df_one.align(df_two, axis=1) # join="outer"
[a_one]           |  [a_two]
   A  B  C   E    |     A   B   C   E
0  1  3  5 NaN    |  a  7  11 NaN   9
1  2  4  6 NaN    |  b  8  12 NaN  10
Notice how we end up with missing values here since no filling is performed by default.
To fill the NaNs, we can specify the method parameter and, optionally, fill_axis:
a_one, a_two = df_one.align(df_two, axis=1, method="ffill", fill_axis=1)
[a_one]                |  [a_two]
     A    B    C    E  |       A     B     C     E
0  1.0  3.0  5.0  5.0  |  a  7.0  11.0  11.0   9.0
1  2.0  4.0  6.0  6.0  |  b  8.0  12.0  12.0  10.0
Here, note the following:
- method="ffill" applies a forward-fill, meaning NaNs are filled using the previous valid observation.
- fill_axis=1 performs the forward-fill horizontally.
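The method, limit and fill_axis parameters are understood to be deprecated as of Pandas 2.1, so on newer versions the call above may emit a warning or fail. A sketch of an alternative that should give the same values is to align first and then forward-fill each result with DataFrame.ffill(~):

```python
import pandas as pd

df_one = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df_two = pd.DataFrame({"A": [7, 8], "E": [9, 10], "B": [11, 12]}, index=["a", "b"])

# Step 1: align the column labels without any filling
a_one, a_two = df_one.align(df_two, axis=1)

# Step 2: forward-fill horizontally, so each NaN takes the value to its left
a_one = a_one.ffill(axis=1)
a_two = a_two.ffill(axis=1)

print(a_one)
print(a_two)
```

Splitting alignment and filling into two steps also makes it easy to fill the two results differently, for instance forward-filling one and leaving the other untouched.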