
Pandas DataFrame | sum method

Last updated: Jul 1, 2022
Tags: Python, Pandas

Pandas DataFrame.sum(~) method computes the sum for each row or column of the source DataFrame.

Parameters

1. axis | int or string | optional

Whether to compute the sum row-wise or column-wise:

Axis              Description
"index" or 0      Sum is computed for each column.
"columns" or 1    Sum is computed for each row.

By default, axis=0.

2. skipna | boolean | optional

Whether or not to ignore missing values (NaN). By default, skipna=True.

3. level | string or int | optional

The name or the integer index of the level to consider for summation. This is relevant only if your DataFrame has a MultiIndex. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
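Since the level parameter is not available in every pandas version, a per-level sum can also be obtained by grouping on the index level first. The following is a minimal sketch of that alternative, not the method's own implementation; the index names "group" and "item" and the variable names are made up for illustration:

```python
import pandas as pd

# Hypothetical two-level index: "group" is the outer level, "item" the inner one.
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["group", "item"]
)
df_multi = pd.DataFrame({"A": [2, 3, 4], "B": [5, 6, 7]}, index=index)

# Sum the rows that share the same value in the "group" level.
level_sums = df_multi.groupby(level="group").sum()
print(level_sums)
```

The result is a DataFrame with one row per distinct value of the chosen level.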

4. numeric_only | None or boolean | optional

The allowed values are as follows:

Value    Description
True     Only numeric rows/columns will be considered (e.g. float, int, boolean).
False    Attempt computation with all types (e.g. strings and dates), and throw an error whenever the summation is invalid.
None     Attempt computation with all types, and ignore all rows/columns that do not allow for summation without raising an error.

For summation to be valid, the + operator must be well-defined between the types.

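By default, numeric_only=None. Note that from pandas 2.0 onwards the default is numeric_only=False, and the silent-dropping behavior of None is no longer available.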

5. min_count | int | optional

The minimum number of non-NaN values that must be present to perform summation. If a row/column has fewer than min_count non-NaN values, then NaN is returned for it. By default, min_count=0, so no minimum is enforced.
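To make the effect of min_count concrete, here is a small sketch; the DataFrame and variable names are illustrative only. Column A holds one non-NaN value and column B holds two:

```python
import numpy as np
import pandas as pd

df_min = pd.DataFrame({"A": [2.0, np.nan], "B": [4.0, 5.0]})

# Default (min_count=0): every column gets a sum, with NaN skipped.
sums_default = df_min.sum()

# Require at least 2 non-NaN values: column A falls short and becomes NaN.
sums_min2 = df_min.sum(min_count=2)

print(sums_default)
print(sums_min2)
```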

Return Value

If the level parameter is specified, then a DataFrame will be returned. Otherwise, a Series will be returned.

Examples

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[2,3], "B":[4,5]})
df
   A  B
0  2  4
1  3  5

Column-wise summation

To compute the sum for each column:

df.sum()   # axis=0
A    5
B    9
dtype: int64

Here, the return type is Series.

Row-wise summation

To compute the sum for each row, set axis=1:

df.sum(axis=1)
0    6
1    8
dtype: int64
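The string labels listed for the axis parameter can be used in place of the integers. A brief sketch, with illustrative variable names, showing that both spellings give the same result:

```python
import pandas as pd

df_axis = pd.DataFrame({"A": [2, 3], "B": [4, 5]})

# axis="index" is the same as axis=0: one sum per column.
col_sums = df_axis.sum(axis="index")

# axis="columns" is the same as axis=1: one sum per row.
row_sums = df_axis.sum(axis="columns")

print(col_sums)
print(row_sums)
```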

Specifying skipna

Consider the following DataFrame with a missing value:

import numpy as np   # pd.np is no longer available in recent pandas versions

df = pd.DataFrame({"A":[2,np.nan], "B":[4,5]})
df
     A  B
0  2.0  4
1  NaN  5

By default, skipna=True, which means that NaNs are ignored in the computation:

df.sum()
A    2.0
B    9.0
dtype: float64

Setting skipna=False takes the NaNs into account:

df.sum(skipna=False)
A    NaN
B    9.0
dtype: float64

The reason we get NaN for the sum of column A is that any arithmetic computation involving NaNs will result in NaNs.
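The propagation of NaN can be checked directly on a single value and on a Series. This is a small illustrative sketch; the variable names are not from the original example:

```python
import numpy as np
import pandas as pd

# Plain arithmetic with NaN yields NaN.
print(np.nan + 2)   # nan

s = pd.Series([2, np.nan])
total_skip = s.sum()               # NaN is skipped by default
total_keep = s.sum(skipna=False)   # NaN takes part in the sum and propagates

print(total_skip, total_keep)
```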

Specifying numeric_only

Consider the following DataFrame:

df = pd.DataFrame({"A":[4,5], "B":[2,True], "C":["6",False]})
df
   A     B      C
0  4     2      6
1  5  True  False

Here, both columns B and C contain mixed types, but the key difference is that summation is defined for B, but not for C. Recall that the internal representation of a True boolean is 1, so the operation 2+True actually evaluates to 3:

2 + True
3

On the other hand, "6"+False throws an error:

"6" + False
TypeError: can only concatenate str (not "bool") to str

None

By default, numeric_only=None, which means that rows/columns with mixed types will also be considered:

df.sum(numeric_only=None)
A    9
B    3
dtype: int64

Here, notice how summation was performed on column B, but not on C. By passing in None, rows/columns that result in invalid summations will simply be ignored without throwing an error.

False

By setting numeric_only=False, rows/columns with mixed types will again be considered, but an error will be thrown when summation cannot be performed:

df.sum(numeric_only=False)
TypeError: can only concatenate str (not "bool") to str

Here, we end up with an error because column C contains mixed types where the + operation is not defined.

True

By setting numeric_only=True, only numeric rows/columns will be considered:

df.sum(numeric_only=True)
A    9
dtype: int64

Notice how columns B and C were ignored: because they contain mixed types, pandas stores them with the object dtype, which is not considered numeric.
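Inspecting the column dtypes is one way to see which columns numeric_only=True will keep. The sketch below rebuilds the same DataFrame under a different, illustrative name:

```python
import pandas as pd

df_mixed = pd.DataFrame({"A": [4, 5], "B": [2, True], "C": ["6", False]})

# A is an integer column; B and C hold mixed Python objects.
print(df_mixed.dtypes)

# Only the numeric column A takes part in the summation.
numeric_sums = df_mixed.sum(numeric_only=True)
print(numeric_sums)
```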

Case of empty DataFrame

Computing a sum of an empty DataFrame or Series will result in 0:

df = pd.DataFrame({"A":[]})
df.sum()
A    0.0
dtype: float64
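If a sum of 0 for empty data is not desirable, min_count can be used to get NaN instead. A minimal sketch, assuming an explicitly float-typed empty column so that the result does not depend on how a given pandas version infers the dtype of an empty list:

```python
import pandas as pd

# Empty column with an explicit float dtype (an assumption made for this sketch).
empty = pd.DataFrame({"A": pd.Series([], dtype="float64")})

empty_sum = empty.sum()                 # no values present, sum is 0.0
empty_sum_min = empty.sum(min_count=1)  # fewer than 1 value present, sum is NaN

print(empty_sum)
print(empty_sum_min)
```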
Published by Isshin Inada