Pandas | concat method

Last updated Aug 12, 2023
Tags: Pandas, Python

Pandas' concat(~) method concatenates a list of Series or DataFrames, either vertically or horizontally.

Parameters

1. objs | list-like or map-like of Series or DataFrame

The Series or DataFrames to concatenate.

2. axis | int or string | optional

The axis along which to concatenate:

Axis            Description
0 or "index"    Concatenate vertically (stack the rows).
1 or "columns"  Concatenate horizontally (place the objects side by side).

By default, axis=0.

3. join | string | optional

How to handle the labels on the other (non-concatenation) axis:

  • "inner": keep only the labels shared by all objects (an inner join)

  • "outer": keep all labels, filling gaps with NaN (an outer, or full, join)

By default, join="outer".

4. ignore_index | boolean | optional

If True, the labels along the concatenation axis are discarded and replaced with 0, 1, ..., n-1, where n is the length of that axis in the result (for axis=0, the number of rows). By default, ignore_index=False.

5. keys | sequence | optional

Used to construct a hierarchical index, with the keys forming the outermost level. By default, keys=None.

6. levels | list<sequence> | optional

The specific levels (unique values) used to construct the MultiIndex. By default, levels=None, in which case the levels are inferred from keys.

7. names | list<string> | optional

The labels assigned to the levels in the resulting hierarchical index. By default, names=None.

8. verify_integrity | boolean | optional

If True, a ValueError is raised if the concatenation axis of the result contains duplicate labels. This check may be computationally expensive. By default, verify_integrity=False.

9. sort | boolean | optional

Whether to sort the non-concatenation axis when it is not already aligned. This only has an effect when join="outer"; join="inner" already preserves the original order. By default, sort=False.

10. copy | boolean | optional

Whether to return a new Series/DataFrame or reuse the provided objs if possible. By default, copy=True.

Return Value

The return type depends on the following parameters:

  • When axis=0 and all the objects are Series, a Series is returned.

  • When at least one of the objects is a DataFrame, a DataFrame is returned.

  • When axis=1, a DataFrame is returned.
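As a quick check of these rules, the sketch below inspects the type of each result. It is a minimal illustration that reuses small inputs similar to the examples that follow; the variable names are arbitrary.

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])
df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})

# Series stacked along axis=0 stay a Series
vertical_series = pd.concat([s1, s2])
# Series placed side by side (axis=1) become the columns of a DataFrame
side_by_side = pd.concat([s1, s2], axis=1)
# Mixing a DataFrame with a Series yields a DataFrame, even with axis=0
mixed = pd.concat([df, s1])

print(type(vertical_series).__name__)  # Series
print(type(side_by_side).__name__)     # DataFrame
print(type(mixed).__name__)            # DataFrame
```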

Examples

Consider the following DataFrames:

import pandas as pd
df = pd.DataFrame({"A":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"B":[8,9]})
   A  B |    A  B
0  2  4 | 0  6  8
1  3  5 | 1  7  9

Concatenating multiple DataFrames vertically

To concatenate multiple DataFrames vertically:

pd.concat([df, df_other])   # axis=0
   A  B
0  2  4
1  3  5
0  6  8
1  7  9
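Because axis also accepts the string "index", the call above can be spelled either way. The following minimal sketch (variable names are illustrative) confirms that both forms give the same result and that the original row labels are kept.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})

by_number = pd.concat([df, df_other], axis=0)
by_name = pd.concat([df, df_other], axis="index")

# Both spellings stack the rows: 2 + 2 rows, same 2 columns
print(by_number.shape)            # (4, 2)
print(by_number.equals(by_name))  # True
# The original row labels are kept, so 0 and 1 each appear twice
print(by_number.index.tolist())   # [0, 1, 0, 1]
```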

Concatenating multiple DataFrames horizontally

To concatenate multiple DataFrames horizontally, pass in axis=1 like so:

pd.concat([df, df_other], axis=1)
   A  B  A  B
0  2  4  6  8
1  3  5  7  9
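Concatenating horizontally keeps every column label, even duplicates. This short sketch uses the string form axis="columns" and shows that selecting a duplicated label such as "A" then returns a DataFrame rather than a Series.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})

wide = pd.concat([df, df_other], axis="columns")

print(wide.shape)                # (2, 4)
print(wide.columns.tolist())     # ['A', 'B', 'A', 'B']
# "A" now labels two columns, so selecting it returns both of them as a DataFrame
print(type(wide["A"]).__name__)  # DataFrame
```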

Specifying join

Consider the following DataFrames:

df = pd.DataFrame({"A":[2],"B":[3]})
df_other = pd.DataFrame({"B":[4],"C":[5]})
   A  B |    B  C
0  2  3 | 0  4  5

Here, both DataFrames have column B.

Outer join

By default, join="outer", which means that all columns will appear in the resulting DataFrame, and the columns with the same label will be stacked:

pd.concat([df, df_other])   # join="outer"
     A  B    C
0  2.0  3  NaN
0  NaN  4  5.0

Some entries are NaN because column A exists only in df and column C exists only in df_other. Column B is shared, so its values are simply stacked, whereas A and C each have only one value, and NaN is inserted as a filler for the missing one.
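The fillers also affect data types: NaN is a floating-point value, so integer columns that need a filler are upcast to float, which is why A and C print as 2.0 and 5.0. A minimal check, assuming the inputs start with the default integer dtype:

```python
import pandas as pd

df = pd.DataFrame({"A": [2], "B": [3]})
df_other = pd.DataFrame({"B": [4], "C": [5]})

result = pd.concat([df, df_other])  # join="outer" by default

# One NaN filler in A, none in the shared column B, one in C
print(result.isna().sum())
# A and C become float64 to hold NaN; B has no gaps and stays an integer column
print(result.dtypes)
```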

Inner join

To perform an inner join instead, set join="inner":

pd.concat([df,df_other], join="inner")
   B
0  3
0  4

Here, only columns that appear in all the DataFrames will appear in the resulting DataFrame. Since only column B is shared between df and df_other, we only see column B in the output.
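In other words, an inner join keeps the intersection of the column labels. This sketch compares the result against Index.intersection(~) to make that explicit.

```python
import pandas as pd

df = pd.DataFrame({"A": [2], "B": [3]})
df_other = pd.DataFrame({"B": [4], "C": [5]})

inner = pd.concat([df, df_other], join="inner")

# The surviving columns are exactly the labels common to both inputs
shared = df.columns.intersection(df_other.columns)
print(shared.tolist())               # ['B']
print(inner.columns.equals(shared))  # True
# No filler was needed, so B keeps its integer values
print(inner["B"].tolist())           # [3, 4]
```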

Concatenating Series

Concatenating Series works in the same way as concatenating DataFrames.

To concatenate two Series vertically:

s1 = pd.Series(['a','b'])
s2 = pd.Series(['c','d'])
pd.concat([s1, s2])         # returns a Series
0 a
1 b
0 c
1 d
dtype: object

To concatenate two Series horizontally:

s1 = pd.Series(['a','b'])
s2 = pd.Series(['c','d'])
pd.concat([s1, s2], axis=1)   # returns a DataFrame
0 1
0 a c
1 b d
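When the Series have a name, those names become the column labels of the resulting DataFrame instead of 0 and 1. The names "first" and "second" below are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical Series names, used only for illustration
s1 = pd.Series(["a", "b"], name="first")
s2 = pd.Series(["c", "d"], name="second")

wide = pd.concat([s1, s2], axis=1)
print(wide.columns.tolist())  # ['first', 'second']
```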

Specifying ignore_index

By default, ignore_index=False, which means the original index labels of the inputs are preserved:

s1 = pd.Series([3,4], index=["a","b"])
s2 = pd.Series([5,6], index=["c","d"])
pd.concat([s1, s2])
a 3
b 4
c 5
d 6
dtype: int64

To replace them with a default integer index (0, 1, ..., n-1), set ignore_index=True:

s1 = pd.Series([3,4], index=["a","b"])
s2 = pd.Series([5,6], index=["c","d"])
pd.concat([s1, s2], ignore_index=True)
0 3
1 4
2 5
3 6
dtype: int64
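ignore_index acts on the concatenation axis, so with axis=1 it is the column labels that get replaced. A minimal sketch using the DataFrames from the first examples:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})

# Along axis=1 the concatenation axis is the columns, so those labels are replaced
wide = pd.concat([df, df_other], axis=1, ignore_index=True)
print(wide.columns.tolist())  # [0, 1, 2, 3]
# The row labels (the non-concatenation axis) are left untouched
print(wide.index.tolist())    # [0, 1]
```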

Specifying keys

To form a multi-index, specify the keys parameter:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=["A","B"])
A  0    a
   1    b
B  0    c
   1    d
dtype: object
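The keys make each input easy to retrieve afterwards. In this sketch, selecting an outer key with loc returns the values that came from the corresponding Series:

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])
combined = pd.concat([s1, s2], keys=["A", "B"])

# The keys form the outer level, the original 0/1 labels the inner level
print(combined.index.nlevels)      # 2
# Selecting an outer key returns the piece that came from s2
print(combined.loc["B"].tolist())  # ['c', 'd']
# A full (key, label) pair pinpoints a single value
print(combined.loc[("A", 1)])      # b
```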

To create more levels, pass tuples as the keys:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=[("A","B"),("C","D")])
A  B  0    a
      1    b
C  D  0    c
      1    d
dtype: object
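Each element of a tuple key becomes its own level, so the index above has three levels in total. A short check:

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])
combined = pd.concat([s1, s2], keys=[("A", "B"), ("C", "D")])

# Two levels from each tuple key plus the original index level
print(combined.index.nlevels)       # 3
# Index into all three levels at once to get a single value
print(combined.loc[("C", "D", 0)])  # c
```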

Specifying names

The names parameter assigns names to the levels of the resulting hierarchical index:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=["A","B"], names=["Groups"])
Groups
A       0    a
        1    b
B       0    c
        1    d
dtype: object

Here, the name "Groups" is assigned to the outer level of the index. Because names contains only one entry, the inner level (the original index) remains unnamed.
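To name the inner level as well, pass one name per level. "Position" below is a hypothetical name used only for illustration; once a level is named, it can be referenced by name, for example with xs(~):

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

grouped = pd.concat([s1, s2], keys=["A", "B"], names=["Groups"])
print(list(grouped.index.names))  # ['Groups', None]

# One name per level; "Position" is a hypothetical label for the inner level
named = pd.concat([s1, s2], keys=["A", "B"], names=["Groups", "Position"])
print(list(named.index.names))    # ['Groups', 'Position']

# A named level can be used to select a cross-section
print(named.xs("B", level="Groups").tolist())  # ['c', 'd']
```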

Specifying verify_integrity

By default, verify_integrity=False, which means that duplicate labels along the concatenation axis are allowed:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2])         # verify_integrity=False
0 a
1 b
0 c
1 d
dtype: object

Notice the overlapping index labels 0 and 1.

Setting verify_integrity=True raises a ValueError in such cases:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], verify_integrity=True)
ValueError: Indexes have overlapping values: Int64Index([0, 1], dtype='int64')

If you want to ensure that the resulting Series/DataFrame has a unique index, consider setting ignore_index=True.
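The sketch below combines both ideas: it catches the ValueError raised for the overlapping labels, then shows that adding ignore_index=True produces a fresh 0 to 3 index, so no error is raised. The variable names are illustrative.

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# Overlapping labels 0 and 1 make the integrity check fail
try:
    pd.concat([s1, s2], verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print(f"Rejected: {err}")

# ignore_index=True builds a new 0..3 index, so there are no duplicates to reject
unique = pd.concat([s1, s2], ignore_index=True, verify_integrity=True)
print(unique.index.tolist())  # [0, 1, 2, 3]
```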

Specifying sort

By default, sort=False, which means the labels of the non-concatenation axis are not sorted:

df = pd.DataFrame({"C":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]})
pd.concat([df, df_other])      # axis=0
     C    B    A    D
0  2.0  4.0  NaN  NaN
1  3.0  5.0  NaN  NaN
0  NaN  NaN  6.0  8.0
1  NaN  NaN  7.0  9.0

Notice how the columns are not sorted by column labels.

When axis=0 and sort=True, the columns will be sorted by column labels:

df = pd.DataFrame({"C":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]})
pd.concat([df, df_other], sort=True)
     A    B    C    D
0  NaN  4.0  2.0  NaN
1  NaN  5.0  3.0  NaN
0  6.0  NaN  NaN  8.0
1  7.0  NaN  NaN  9.0

When axis=1 and sort=True, the rows will be sorted by row labels:

df = pd.DataFrame({"C":[2,3],"B":[4,5]}, index=[3,2])
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]}, index=[1,4])
pd.concat([df, df_other], axis=1, sort=True)
     C    B    A    D
1  NaN  NaN  6.0  8.0
2  3.0  5.0  NaN  NaN
3  2.0  4.0  NaN  NaN
4  NaN  NaN  7.0  9.0
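For comparison, this sketch runs the same axis=1 concatenation with and without sort=True. Only the row labels (the non-concatenation axis) are reordered; the column order, which comes from the concatenation itself, is unchanged.

```python
import pandas as pd

df = pd.DataFrame({"C": [2, 3], "B": [4, 5]}, index=[3, 2])
df_other = pd.DataFrame({"A": [6, 7], "D": [8, 9]}, index=[1, 4])

unsorted_rows = pd.concat([df, df_other], axis=1)            # sort=False
sorted_rows = pd.concat([df, df_other], axis=1, sort=True)

# Without sorting, row labels appear in the order they are first encountered
print(unsorted_rows.index.tolist())  # [3, 2, 1, 4]
print(sorted_rows.index.tolist())    # [1, 2, 3, 4]
# The concatenation axis (the columns here) is not reordered by sort
print(sorted_rows.columns.tolist())  # ['C', 'B', 'A', 'D']
```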
Published by Isshin Inada