search
Search
Publish
menu
menu search toc more_vert
Robocat
Guest 0reps
Thanks for the thanks!
close
Comments
Log in or sign up
Cancel
Post
account_circle
Profile
exit_to_app
Sign out
help Ask a question
Share on Twitter
search
keyboard_voice
close
Searching Tips
Search for a recipe: "Creating a table in MySQL"
Search for an API documentation: "@append"
Search for code: "!dataframe"
Apply a tag filter: "#python"
Useful Shortcuts
/ to open search panel
Esc to close search panel
to navigate between search results
d to clear all current filters
Enter to expand content preview
icon_star
Doc Search
icon_star
Code Search Beta
SORRY NOTHING FOUND!
mic
Start speaking...
Voice search is only supported in Safari and Chrome.
Navigate to
A
A
share
thumb_up_alt
bookmark
arrow_backShare
Twitter
Facebook

Python | merge_asof method

Programming
chevron_right
Python
chevron_right
Pandas
chevron_right
Documentation
chevron_right
General Functions
schedule Jul 1, 2022
Last updated
local_offer PythonPandas
Tags

Pandas merge_asof(~) method is used to perform a left join on two DataFrames where the join keys are matched not by equality but by proximity.

WARNING

Both left and right DataFrames must be sorted by the join key.

Parameters

1. leftlink | DataFrame

The left DataFrame to perform the join on.

2. rightlink | DataFrame

The right DataFrame to perform the join on.

3. onlink | string

The label of the column to join on. The label must be present in both left and right.

NOTE

The on parameter is there only for convenience. If the column to join on have different labels, then you must use a combination of left_on, right_on, left_index or right_index.

4. left_on | string or array-like

The label of the column in left to perform join on.

5. right_on | string or array-like

The label of the column in right to perform join on.

6. left_index | boolean | optional

Whether or not to perform the join on the index of the left DataFrame. By default, left_index=False.

7. right_index | boolean | optional

Whether or not to perform the join on the index of the right DataFrame. By default, right_index=False.

NOTE

Many textbooks and documentation use the words merge keys or join keys to denote the columns by which a join is performed.

8. by | string or list<string> | optional

The label of the columns that must additionally match up for the join to take effect. Just like on, by must be present in both left and right.

9. left_bylink | string | optional

The label of the column in left to perform additional matching on. See examples below for clarification.

10. right_by | string | optional

The label of the column in right to perform additional matching on. See examples below for clarification.

WARNING

If left_by is specified, then right_by must also be specified, and vice versa.

11. suffixeslink | tuple of (string, string) | optional

The suffix names to append to the duplicate column labels in the resulting DataFrame. You can also pass a single None instead of a string in suffixes to indicate that the left or right column label should be left as is. By default, suffixes=("_x", "_y").

12. tolerance | int or Timedelta | optional

The maximum acceptable difference between a pair of join keys. By default, tolerance=None.

13. allow_exact_matches | boolean | optional

Whether or not to allow exact matches between a pair of join keys. By default, allow_exact_matches=True.

14. direction | string | optional

The direction of proximity matching:

Value

Description

"backward"

Match if left join key is smaller than (or equal to) the right join key.

"forward"

Match if left join key is larger than (or equal to) the right join key,

"nearest"

Match regardless of the relative size of the two keys.

By default, direction="backward". Note that the (or equal to) part depends on allow_exact_matches.

Return Value

The merged DataFrame.

Examples

Basic usage

Consider the following two DataFrames:

df = pd.DataFrame({"A":[2,3,4],"B":[3,5,9]}, index=["a","b","c"])
df_other = pd.DataFrame({"B":[2,5,10],"C":[7,8,1]}, index=["d","e","f"])
A B | B C
a 2 3 | d 2 7
b 3 5 | e 5 8
c 4 9 | f 10 1

Performing the join on column B:

pd.merge_asof(df, df_other, on="B")
A B C
0 2 3 7
1 3 5 8
2 4 9 8

Note the following:

  • the original column B of the left DataFrame appears in the resulting DataFrame.

  • the values in the pair of column Bs do not match exactly - [3,5,9] and [2,5,10].

  • since the value 3 does not exist in the right join key, the method looks for the closest value that is smaller than 3 (direction="backward"), which in this case is 2. The corresponding value for column C is 7, so this is why we see the value 7 there in the first row.

  • again, the value 9 does not exist in the right join key, so the closest match smaller than 9 is 5. The corresponding value for column C for this match is 8, so we end up with a 8 at the bottom-right entry.

  • on="B" can actually be omitted here since the method will infer the join key if there is a single pair of overlapping column labels in left and right.

Specifying left_by

Consider the following DataFrames:

df = pd.DataFrame({"A":[7,9],"B":[3,5]}, index=["a","b"])
df_other = pd.DataFrame({"B":[2,5],"C":[7,8]}, index=["d","e"])
A B | B C
a 7 3 | d 2 7
b 9 5 | e 5 8

By default, without specifying by, left_by or right_by:

pd.merge_asof(df, df_other, on="B")
A B C
0 7 3 7
1 9 5 8

We can restrict the matches by allowing those where the specified column values match up:

pd.merge_asof(df, df_other, on="B", left_by="A", right_by="C")
A B C
0 7 3 7.0
1 9 5 NaN

Note the following:

  • the value 7.0 is still returned for the first row because the column value for A matched up with the column value for C (both were 7).

  • we got NaN for the second row because the column value for A (9) did not match up with the column value for C (8).

Specifying suffixes

Consider the following two DataFrames:

df = pd.DataFrame({"A":[2,3],"B":[3,5]}, index=["a","b"])
df_other = pd.DataFrame({"B":[2,5],"A":[7,8]}, index=["d","e"])
A B | B A
a 2 3 | d 2 7
b 3 5 | e 5 8

By default, suffixes=("_x","_y"), which means that if duplicate column labels arise in the resulting DataFrame, "_x" will be appended as a suffix to the overlapping label for the left, and "_y" for the right:

pd.merge_asof(df, df_other, on="B")
A_x B A_y
0 2 3 7
1 3 5 8

We can specify our own suffixes like so:

pd.merge_asof(df, df_other, on="B", suffixes=["_X","_Y"])
A_X B A_Y
0 2 3 7
1 3 5 8

You can pass in a single None instead of a string to keep the original name for either the left or right overlapping label:

pd.merge_asof(df_one, df_two, on="B", suffixes=["_X",None])
A_X B A
0 2 3 7
1 3 5 8
2 4 9 8
robocat
Published by Isshin Inada
Edited by 0 others
Did you find this page useful?
thumb_up
thumb_down
Ask a question or leave a feedback...
thumb_up
0
thumb_down
0
chat_bubble_outline
0
settings
Enjoy our search
Hit / to insta-search docs and recipes!