Pandas DataFrame | sum method

Last updated: Aug 12, 2023

Tags: Python, Pandas

Pandas `DataFrame.sum(~)` method computes the sum for each row or column of the source DataFrame.

Parameters

1. `axis` | `int` or `string` | `optional`

Whether to compute the sum row-wise or column-wise:

| Axis | Description |
|---|---|
| `"index"` or `0` | Sum is computed for each column. |
| `"columns"` or `1` | Sum is computed for each row. |

By default, `axis=0`.
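
As a quick check of the two spellings of `axis`, the following minimal sketch compares the integer and string forms on a small DataFrame. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})

# The string and integer forms of `axis` are interchangeable.
col_sums_int = df.sum(axis=0)
col_sums_str = df.sum(axis="index")
row_sums_int = df.sum(axis=1)
row_sums_str = df.sum(axis="columns")

print(col_sums_int.equals(col_sums_str))  # True
print(row_sums_int.equals(row_sums_str))  # True
```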

2. `skipna` | `boolean` | `optional`

Whether or not to ignore missing values (`NaN`). By default, `skipna=True`.

3. `level` | `string` or `int` | `optional`

The name or the integer index of the level to sum over. This is relevant only if your DataFrame has a `MultiIndex`. Note that this parameter was deprecated in Pandas 1.3 and removed in Pandas 2.0.
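
Per-level sums can also be obtained without relying on the `level` argument of `sum(~)`, by grouping on the index level first. The sketch below assumes a hypothetical two-level index with level names `group` and `item`; these names are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (group, item)
index = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")], names=["group", "item"]
)
df_multi = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=index)

# Sum the rows that share the same value in the "group" level.
by_group = df_multi.groupby(level="group").sum()
print(by_group)
```

The result is a `DataFrame` with one row per distinct value of the chosen level.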

4. `numeric_only` | `None` or `boolean` | `optional`

The allowed values are as follows:

| Value | Description |
|---|---|
| `True` | Only numeric rows/columns will be considered (e.g. `float`, `int`, `boolean`). |
| `False` | Attempt computation with all types (e.g. strings and dates), and throw an error whenever the summation is invalid. |
| `None` | Attempt computation with all types, and ignore all rows/columns that do not allow for summation without raising an error. |

For summation to be valid, the `+` operator must be well-defined between the types.

By default, `numeric_only=None` in Pandas versions before 2.0. From Pandas 2.0 onward, the default is `False`.

5. `min_count` | `int` | `optional`

The minimum number of values that must be present to perform summation. If there are fewer than `min_count` values (excluding `NaN`), then `NaN` will be returned. By default, `min_count=0`, which means that no minimum is enforced.
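
To illustrate `min_count`, here is a minimal sketch using a DataFrame in which column `A` holds only one non-missing value. The DataFrame and variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({"A": [2, np.nan], "B": [4, 5]})

# Column A has one non-NaN value, so requiring at least two yields NaN for A.
# Column B has two values, so its sum is computed as usual.
result = df_nan.sum(min_count=2)
print(result)
# A    NaN
# B    9.0
# dtype: float64
```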

Return Value

If the `level` parameter is specified, then a `DataFrame` will be returned. Otherwise, a `Series` will be returned.

Examples

Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[2,3], "B":[4,5]})
df
   A  B
0  2  4
1  3  5
```

Column-wise summation

To compute the sum for each column:

```
df.sum()   # axis=0
A    5
B    9
dtype: int64
```

Here, the return type is `Series`.

Row-wise summation

To compute the sum for each row, set `axis=1`:

```
df.sum(axis=1)
0    6
1    8
dtype: int64
```

Specifying skipna

Consider the following DataFrame with a missing value:

```
import numpy as np

# pd.np was removed in Pandas 2.0, so NumPy is imported directly
df = pd.DataFrame({"A":[2,np.nan], "B":[4,5]})
df
     A  B
0  2.0  4
1  NaN  5
```

By default, `skipna=True`, which means that `NaN`s are ignored in the computation:

```
df.sum()
A    2.0
B    9.0
dtype: float64
```

Setting `skipna=False` takes the `NaN`s into account:

```
df.sum(skipna=False)
A    NaN
B    9.0
dtype: float64
```

The reason we get `NaN` for the sum of column `A` is that any arithmetic computation involving `NaN`s will result in `NaN`s.
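
The propagation of `NaN` through addition can be checked directly with NumPy. This is a small sketch and the variable name `total` is only illustrative.

```python
import numpy as np

# Adding anything to NaN gives NaN again.
total = np.nan + 2
print(total)            # nan
print(np.isnan(total))  # True
```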

Specifying numeric_only

Consider the following DataFrame:

```
df = pd.DataFrame({"A":[4,5], "B":[2,True], "C":["6",False]})
df
   A     B      C
0  4     2    "6"
1  5  True  False
```

Here, both columns `B` and `C` contain mixed types; the key difference is that summation is defined for `B` but not for `C`. In Python, `bool` is a subclass of `int` and `True` equals `1`, so the operation `2+True` evaluates to `3`:

```
2 + True
3
```

On the other hand, `"6"+False` raises an error:

```
"6" + False
TypeError: can only concatenate str (not "bool") to str
```

None

With `numeric_only=None` (the default in Pandas versions before 2.0), rows/columns with mixed types will also be considered:

```
df.sum(numeric_only=None)
A    9
B    3
dtype: int64
```

Here, notice how summation was performed on column `B`, but not on `C`. By passing in `None`, rows/columns that result in invalid summations will simply be ignored without throwing an error.

False

By setting `numeric_only=False`, rows/columns with mixed types will again be considered, but an error will be thrown when summation cannot be performed:

```
df.sum(numeric_only=False)
TypeError: can only concatenate str (not "bool") to str
```

Here, we end up with an error because column `C` contains mixed types where the `+` operation is not defined.

True

By setting `numeric_only=True`, only numeric rows/columns will be considered:

```
df.sum(numeric_only=True)
A    9
dtype: int64
```

Notice how columns `B` and `C` were ignored since they contain mixed types.
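
One way to see why `B` is skipped even though its values can be added is to inspect the column dtypes. The sketch below rebuilds the same DataFrame; mixed-type columns are stored with the `object` dtype, which `numeric_only=True` does not treat as numeric.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5], "B": [2, True], "C": ["6", False]})

# A is stored as int64, while the mixed-type columns B and C are object.
print(df.dtypes)

# Only the numeric column A takes part in the sum.
numeric_sum = df.sum(numeric_only=True)
print(numeric_sum)
```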

Case of empty DataFrame

Computing a sum of an empty DataFrame or Series will result in `0`:

```
df = pd.DataFrame({"A":[]})
df.sum()
A    0.0
dtype: float64
```
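
If a `NaN` is preferred over `0` for empty input, the `min_count` parameter can be used. The following is a minimal sketch on an empty `Series` with an explicitly chosen `float64` dtype; the variable name is illustrative.

```python
import pandas as pd

empty = pd.Series([], dtype="float64")

# With the default min_count=0, the sum of no values is 0.
print(empty.sum())             # 0.0

# Requiring at least one value turns the result into NaN.
print(empty.sum(min_count=1))  # nan
```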
