Python | merge_asof method
Start your free 7-days trial now!
Pandas merge_asof(~) method is used to perform a left join on two DataFrames where the join keys are matched not by equality but by proximity.
Both left and right DataFrames must be sorted by the join key.
Parameters
1. leftlink | DataFrame
The left DataFrame to perform the join on.
2. rightlink | DataFrame
The right DataFrame to perform the join on.
3. onlink | string
The label of the column to join on. The label must be present in both left and right.
The on parameter is there only for convenience. If the column to join on have different labels, then you must use a combination of left_on, right_on, left_index or right_index.
4. left_on | string or array-like
The label of the column in left to perform join on.
5. right_on | string or array-like
The label of the column in right to perform join on.
6. left_index | boolean | optional
Whether or not to perform the join on the index of the left DataFrame. By default, left_index=False.
7. right_index | boolean | optional
Whether or not to perform the join on the index of the right DataFrame. By default, right_index=False.
Many textbooks and documentation use the words merge keys or join keys to denote the columns by which a join is performed.
8. by | string or list<string> | optional
The label of the columns that must additionally match up for the join to take effect. Just like on, by must be present in both left and right.
9. left_bylink | string | optional
The label of the column in left to perform additional matching on. See examples below for clarification.
10. right_by | string | optional
The label of the column in right to perform additional matching on. See examples below for clarification.
If left_by is specified, then right_by must also be specified, and vice versa.
11. suffixeslink | tuple of (string, string) | optional
The suffix names to append to the duplicate column labels in the resulting DataFrame. You can also pass a single None instead of a string in suffixes to indicate that the left or right column label should be left as is. By default, suffixes=("_x", "_y").
12. tolerance | int or Timedelta | optional
The maximum acceptable difference between a pair of join keys. By default, tolerance=None.
13. allow_exact_matches | boolean | optional
Whether or not to allow exact matches between a pair of join keys. By default, allow_exact_matches=True.
14. direction | string | optional
The direction of proximity matching:
Value | Description |
|---|---|
| Match if left join key is smaller than (or equal to) the right join key. |
| Match if left join key is larger than (or equal to) the right join key, |
| Match regardless of the relative size of the two keys. |
By default, direction="backward". Note that the (or equal to) part depends on allow_exact_matches.
Return Value
The merged DataFrame.
Examples
Basic usage
Consider the following two DataFrames:
Performing the join on column B:
pd.merge_asof(df, df_other, on="B")
A B C0 2 3 71 3 5 82 4 9 8
Note the following:
the original column
Bof the left DataFrame appears in the resulting DataFrame.the values in the pair of column
Bs do not match exactly -[3,5,9]and[2,5,10].since the value
3does not exist in therightjoin key, the method looks for the closest value that is smaller than3(direction="backward"), which in this case is2. The corresponding value for columnCis7, so this is why we see the value7there in the first row.again, the value
9does not exist in the right join key, so the closest match smaller than9is5. The corresponding value for columnCfor this match is8, so we end up with a8at the bottom-right entry.on="B"can actually be omitted here since the method will infer the join key if there is a single pair of overlapping column labels inleftandright.
Specifying left_by
Consider the following DataFrames:
By default, without specifying by, left_by or right_by:
pd.merge_asof(df, df_other, on="B")
A B C0 7 3 71 9 5 8
We can restrict the matches by allowing those where the specified column values match up:
pd.merge_asof(df, df_other, on="B", left_by="A", right_by="C")
A B C0 7 3 7.01 9 5 NaN
Note the following:
the value
7.0is still returned for the first row because the column value forAmatched up with the column value forC(both were7).we got
NaNfor the second row because the column value forA(9) did not match up with the column value forC(8).
Specifying suffixes
Consider the following two DataFrames:
By default, suffixes=("_x","_y"), which means that if duplicate column labels arise in the resulting DataFrame, "_x" will be appended as a suffix to the overlapping label for the left, and "_y" for the right:
pd.merge_asof(df, df_other, on="B")
A_x B A_y0 2 3 71 3 5 8
We can specify our own suffixes like so:
pd.merge_asof(df, df_other, on="B", suffixes=["_X","_Y"])
A_X B A_Y0 2 3 71 3 5 8
You can pass in a single None instead of a string to keep the original name for either the left or right overlapping label:
pd.merge_asof(df_one, df_two, on="B", suffixes=["_X",None])
A_X B A0 2 3 71 3 5 82 4 9 8