Pandas | concat method
Pandas concat(~) method concatenates a list of Series or DataFrames, either horizontally or vertically.
Parameters
1. objs | list-like or map-like of Series or DataFrame
The Series or DataFrames to stack horizontally or vertically.
2. axis | int or string | optional
Whether to concatenate horizontally or vertically:
| Axis | Description |
|---|---|
| 1 or "columns" | Concatenate horizontally. |
| 0 or "index" | Concatenate vertically. |
By default, axis=0.
3. join | string | optional
Whether to perform an inner or outer (full) join:
"inner": performs an inner join
"outer": performs an outer join
By default, join="outer".
4. ignore_index | boolean | optional
If True, then the index of the resulting DataFrame will be reset to 0,1,...,n-1 where n is the number of rows of the DataFrame. By default, ignore_index=False.
5. keys | sequence | optional
Used to construct a hierarchical index. By default, keys=None.
6. levels | list<sequence> | optional
The levels (unique values) used to construct a MultiIndex. By default, levels=None, in which case the levels are inferred from keys.
7. names | list<string> | optional
The labels assigned to the levels in the resulting hierarchical index. By default, names=None.
8. verify_integrity | boolean | optional
If True, then an error will be thrown if the resulting Series/DataFrame contains duplicate index or column labels. This checking process may be computationally expensive. By default, verify_integrity=False.
9. sort | boolean | optional
Whether or not to sort the non-concatenation axis. This is only applicable for join="outer", and not for join="inner". By default, sort=False.
10. copy | boolean | optional
Whether to return a new Series/DataFrame or reuse the provided objs if possible. By default, copy=True.
Return Value
The return type depends on the following parameters:
When axis=0 and the concatenation is between Series, then a Series is returned.
When the concatenation involves at least one DataFrame, then a DataFrame is returned.
When axis=1, then a DataFrame is returned.
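These rules can be checked directly. The sketch below uses two hypothetical Series, s1 and s2, and a hypothetical DataFrame named frame; the names and values are illustrative and not taken from the original page.

```python
import pandas as pd

# Illustrative inputs (assumed, not from the original page)
s1 = pd.Series([1, 2], name="A")
s2 = pd.Series([3, 4], name="A")
frame = pd.DataFrame({"A": [5, 6]})

# axis=0 with only Series: the result is a Series
vertical_series = pd.concat([s1, s2])
# At least one DataFrame among the inputs: the result is a DataFrame
mixed = pd.concat([s1, frame])
# axis=1: the result is a DataFrame, even with only Series
horizontal_series = pd.concat([s1, s2], axis=1)

print(type(vertical_series).__name__)
print(type(mixed).__name__)
print(type(horizontal_series).__name__)
```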
Examples
Consider the following DataFrames:
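One pair of DataFrames consistent with the outputs shown in this section is sketched below. The exact definitions are an assumption inferred from those outputs.

```python
import pandas as pd

# Assumed definitions, inferred from the outputs in this section
df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})

print(df)
print(df_other)
```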
Concatenating multiple DataFrames vertically
To concatenate multiple DataFrames vertically:
pd.concat([df, df_other]) # axis=0
   A  B
0  2  4
1  3  5
0  6  8
1  7  9
Concatenating multiple DataFrames horizontally
To concatenate multiple DataFrames horizontally, pass in axis=1 like so:
pd.concat([df, df_other], axis=1)
   A  B  A  B
0  2  4  6  8
1  3  5  7  9
Specifying join
Consider the following DataFrames:
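A pair of single-row DataFrames that reproduces the join outputs in this section could look like the following. The definitions are an assumption inferred from those outputs.

```python
import pandas as pd

# Assumed definitions, inferred from the join outputs in this section
df = pd.DataFrame({"A": [2], "B": [3]})
df_other = pd.DataFrame({"B": [4], "C": [5]})

print(df)
print(df_other)
```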
Here, both DataFrames have column B.
Outer join
By default, join="outer", which means that all columns will appear in the resulting DataFrame, and the columns with the same label will be stacked:
pd.concat([df,df_other], join="outer")
     A  B    C
0  2.0  3  NaN
0  NaN  4  5.0
The reason why we get NaN for some entries is that, since column B is shared between the DataFrames, the values get stacked for B, but columns A and C only have a single value, so NaN must be inserted as a filler.
Inner join
To perform an inner join instead, set join="inner" like so:
pd.concat([df,df_other], join="inner")
   B
0  3
0  4
Here, only columns that appear in all the DataFrames will appear in the resulting DataFrame. Since only column B is shared between df and df_other, we only see column B in the output.
Concatenating Series
Concatenating Series works in the same way as concatenating DataFrames.
To concatenate two Series vertically:
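A minimal sketch, assuming two illustrative Series named s1 and s2 (the names and values are not from the original page):

```python
import pandas as pd

# Illustrative inputs (assumed)
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# axis=0 is the default, so the values are stacked into one Series
stacked = pd.concat([s1, s2])
print(stacked)
```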
To concatenate two Series horizontally:
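As a sketch with two illustrative Series (assumed names s1 and s2), passing axis=1 places each Series in its own column of a DataFrame:

```python
import pandas as pd

# Illustrative inputs (assumed)
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# Each Series becomes one column; rows are aligned on the index
side_by_side = pd.concat([s1, s2], axis=1)
print(side_by_side)
```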
Specifying ignore_index
By default, ignore_index=False, which means the original indexes of the inputs will be preserved:
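For instance, with two illustrative Series that both use the default integer index (s1 and s2 are assumed names), the labels of each input are carried over unchanged:

```python
import pandas as pd

# Illustrative inputs (assumed); both have index 0, 1
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

kept = pd.concat([s1, s2])  # ignore_index=False
print(kept.index.tolist())  # [0, 1, 0, 1]
```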
To reset the index to the default integer indices:
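A short sketch using the same kind of illustrative Series (assumed names s1 and s2):

```python
import pandas as pd

# Illustrative inputs (assumed)
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# The original labels are discarded and replaced with 0..n-1
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```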
Specifying keys
To form a multi-index, specify the keys parameter:
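In the sketch below, the Series s1 and s2 and the key labels "A" and "B" are illustrative assumptions. Each key becomes the outer index level for the rows of the corresponding input:

```python
import pandas as pd

# Illustrative inputs (assumed)
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# One key per input; the keys form the outer level of the MultiIndex
keyed = pd.concat([s1, s2], keys=["A", "B"])
print(keyed)

# The outer level can be used to recover the rows of one input
print(keyed.loc["A"])
```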
To add more levels, pass a tuple like so:
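A possible sketch, where the Series and the tuple labels are illustrative assumptions. Each element of a tuple contributes one additional index level:

```python
import pandas as pd

# Illustrative inputs (assumed)
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# Two-element tuples add two outer levels on top of the original index
nested = pd.concat([s1, s2], keys=[("A", "B"), ("C", "D")])
print(nested)
print(nested.index.nlevels)  # 3
```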
Specifying names
The names parameter is used to assign a label to the index of the resulting Series/DataFrame:
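The following sketch assumes illustrative Series s1 and s2 and keys "A" and "B"; only the label "Groups" comes from the text. The name is applied to the level created by keys:

```python
import pandas as pd

# Illustrative inputs (assumed)
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# names labels the index level that keys creates
named = pd.concat([s1, s2], keys=["A", "B"], names=["Groups"])
print(named)
print(named.index.names[0])  # Groups
```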
Here, the label "Groups" is assigned to the index of the Series.
Specifying verify_integrity
By default, verify_integrity=False, which means that duplicate indexes and column labels are allowed:
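As an illustration with two assumed Series that share the default index labels 0 and 1:

```python
import pandas as pd

# Illustrative inputs (assumed); both have index 0, 1
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# No check is performed, so the duplicate labels are kept
unchecked = pd.concat([s1, s2])  # verify_integrity=False
print(unchecked)
print(unchecked.index.is_unique)  # False
```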
Notice how we have overlapping indexes 0 and 1.
Setting verify_integrity=True will throw an error in such cases:
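A sketch with the same kind of overlapping inputs (s1 and s2 are assumed names). The error is caught here only so that the message can be printed:

```python
import pandas as pd

# Illustrative inputs (assumed); both have index 0, 1
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

try:
    pd.concat([s1, s2], verify_integrity=True)
    raised = False
except ValueError as err:
    # pandas raises ValueError when the index labels overlap
    raised = True
    print(f"ValueError: {err}")
```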
If you want to ensure that the resulting Series/DataFrame has a unique index, consider setting ignore_index=True.
Specifying sort
By default, sort=False, which means that the resulting column labels or indexes will not be sorted:
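A small sketch, assuming two illustrative DataFrames whose columns are deliberately out of alphabetical order. With a recent pandas version, the columns are expected to appear in order of first appearance:

```python
import pandas as pd

# Illustrative inputs (assumed); column labels are not in sorted order
df = pd.DataFrame({"C": [1], "A": [2]})
df_other = pd.DataFrame({"B": [3], "A": [4]})

unsorted = pd.concat([df, df_other])  # axis=0, sort=False
print(unsorted)
print(unsorted.columns.tolist())  # ['C', 'A', 'B']
```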
Notice how the columns are not sorted by column labels.
When axis=0 and sort=True, the columns will be sorted by column labels:
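For example, with two illustrative DataFrames (assumed definitions) whose columns are out of alphabetical order:

```python
import pandas as pd

# Illustrative inputs (assumed); column labels are not in sorted order
df = pd.DataFrame({"C": [1], "A": [2]})
df_other = pd.DataFrame({"B": [3], "A": [4]})

# Rows are stacked; the column labels (non-concatenation axis) are sorted
sorted_cols = pd.concat([df, df_other], sort=True)
print(sorted_cols)
print(sorted_cols.columns.tolist())  # ['A', 'B', 'C']
```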
When axis=1 and sort=True, the rows will be sorted by row labels:
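A sketch of the horizontal case, assuming two illustrative DataFrames with unsorted string row labels:

```python
import pandas as pd

# Illustrative inputs (assumed); row labels are not in sorted order
df = pd.DataFrame({"A": [1, 2]}, index=["b", "a"])
df_other = pd.DataFrame({"B": [3, 4]}, index=["c", "a"])

# Columns are placed side by side; the row labels are sorted
sorted_rows = pd.concat([df, df_other], axis=1, sort=True)
print(sorted_rows)
print(sorted_rows.index.tolist())  # ['a', 'b', 'c']
```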