Pandas DataFrame | std method

Last updated: Jul 1, 2022
Tags: Python, Pandas

The Pandas DataFrame.std(~) method computes the standard deviation of each row or column of the source DataFrame. By default, it computes the sample standard deviation (with $N-1$ in the denominator) using the following formula:

$$\sqrt{\frac{1}{N-1}\sum_{i=0}^{N-1}\left(x_i-\bar{x}\right)^2}$$

Where,

$N$ is the size of the row or column

$x_i$ is the $i$-th value in the row or column

$\bar{x}$ is the mean of the values in the row or column.
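As a sanity check on the formula, here is a minimal pure-Python sketch that evaluates it term by term. The helper name sample_std is hypothetical and not part of pandas; the standard library's statistics.stdev uses the same $N-1$ denominator, so the two should agree.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation with an N-1 denominator (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n                              # x-bar
    squared_devs = [(x - mean) ** 2 for x in values]    # (x_i - x-bar)^2 for each i
    return math.sqrt(sum(squared_devs) / (n - 1))       # divide by N-1, then take the root


col_a = [3, 5, 7]
print(sample_std(col_a))         # 2.0
print(statistics.stdev(col_a))   # 2.0
```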

NOTE

std(~) can also compute the population standard deviation: set ddof=0 so that the denominator is $N$ instead of $N-1$.

Parameters

1. axis | int or string | optional

Whether to compute the standard deviation column-wise or row-wise:

Axis              Description
"index" or 0      Standard deviation is computed for each column.
"columns" or 1    Standard deviation is computed for each row.

By default, axis=0.
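The string and integer forms of axis are interchangeable. A short sketch, reusing the small DataFrame from the Examples section, that checks this and shows which labels end up in the result:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

per_column = df.std(axis="index")   # same as axis=0: one value per column
per_row = df.std(axis="columns")    # same as axis=1: one value per row

print(per_column.index.tolist())    # ['A', 'B']
print(per_row.index.tolist())       # [0, 1, 2]
```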

2. skipna | boolean | optional

Whether to skip NaN values. Skipped NaN values do not count towards the total size $N$. By default, skipna=True.
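A sketch of the effect of skipna, using a made-up DataFrame with one missing value in column A:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, np.nan, 7], "B": [2, 5, 8]})

skipped = df.std()            # NaN dropped: column A uses N = 2 values (3 and 7)
kept = df.std(skipna=False)   # NaN kept: column A's result becomes NaN

print(skipped)
print(kept)
```

With skipna=True, column A is computed from 3 and 7 only, giving $\sqrt{8/1} \approx 2.828427$; column B is unaffected either way.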

3. level | int or string | optional

The name or the integer index of the level to consider. This is needed only if your DataFrame has a MultiIndex.
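Note that the level keyword was deprecated in pandas 1.3 and removed in pandas 2.0; the suggested replacement is to group by the index level instead. A sketch with a made-up two-level index (the level names outer and inner are illustrative only):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2)],
    names=["outer", "inner"],  # illustrative level names
)
df = pd.DataFrame({"A": [3, 5, 7, 11], "B": [2, 5, 8, 8]}, index=index)

# Equivalent of the older df.std(level="outer"): one row per "outer" label
result = df.groupby(level="outer").std()
print(result)
```

As the Return Value section describes for level, the result is a DataFrame rather than a Series.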

4. ddof | int | optional

Delta degrees of freedom. The denominator of the formula becomes $N - ddof$:

$$\sqrt{\frac{1}{N\color{#4fc3f7}{-ddof}}\sum_{i=0}^{N-1}\left(x_i-\bar{x}\right)^2}$$

By default, ddof=1.
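A sketch comparing the default ddof=1 with ddof=0 on the same small DataFrame used in the Examples section. One detail worth knowing: NumPy's np.std defaults to ddof=0, so it matches pandas only when ddof=0 is passed explicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

sample = df.std()                          # ddof=1: denominator N-1 = 2
population = df.std(ddof=0)                # ddof=0: denominator N = 3
numpy_std = np.std(df.to_numpy(), axis=0)  # NumPy's default is ddof=0

print(sample)
print(population)
print(numpy_std)
```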

5. numeric_only | None or boolean | optional

The allowed values are as follows:

Value    Description
True     Only numeric rows/columns are considered (e.g. float, int, boolean).
False    Attempt computation with all types (e.g. strings and dates), and raise an error whenever the standard deviation cannot be computed.
None     Attempt computation with all types, and silently drop any rows/columns whose standard deviation cannot be computed instead of raising an error.

Note that the standard deviation can only be computed when the + operator is well-defined between the types.

By default, numeric_only=None. This applies to pandas 1.x: in pandas 2.0 the default became False and None is no longer accepted.

Return Value

If the level parameter is specified, a DataFrame is returned. Otherwise, a Series is returned, indexed by the column labels (axis=0) or by the row labels (axis=1).
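For instance, with the default arguments the result is a Series whose index holds the column labels, so individual values can be looked up by name (a short sketch using the DataFrame from the Examples section):

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

result = df.std()
print(type(result).__name__)   # Series
print(result["B"])             # 3.0
```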

Examples

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[3,5,7], "B":[2,5,8]})
df
   A  B
0  3  2
1  5  5
2  7  8

Column-wise standard deviation

To compute the standard deviation for each column:

df.std()   # axis=0
A 2.0
B 3.0
dtype: float64
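Working through column A by hand with the formula above confirms the first value (column B works the same way, giving $\sqrt{18/2} = 3$):

$$\bar{x} = \frac{3+5+7}{3} = 5, \qquad \sqrt{\frac{(3-5)^2+(5-5)^2+(7-5)^2}{3-1}} = \sqrt{\frac{8}{2}} = 2$$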

Row-wise standard deviation

To compute the standard deviation for each row:

df.std(axis=1)
0 0.707107
1 0.000000
2 0.707107
dtype: float64
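Since axis=1 runs the same computation along rows, it should match the column-wise result of the transposed DataFrame. A quick sketch checking this; row 0, for example, holds 3 and 2, so its mean is 2.5 and its standard deviation is $\sqrt{0.5} \approx 0.707107$:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 5, 7], "B": [2, 5, 8]})

row_wise = df.std(axis=1)
via_transpose = df.T.std()   # the columns of df.T are the rows of df

print(np.allclose(row_wise, via_transpose))  # True
```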

Specifying numeric_only

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,5], "B":[True,5], "C":["x",6]})
df
   A     B  C
0  3  True  x
1  5     5  6

Here, columns B and C hold values of mixed types.

None

By default, numeric_only=None, which means that rows/columns with mixed types will also be considered:

df.std()   # numeric_only=None
A 1.414214
B 2.828427
dtype: float64

The standard deviation of column B can still be computed because True is treated as 1 in arithmetic. In contrast, the standard deviation of column C cannot be computed since "x"+6 is undefined, so column C is silently left out of the result.
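The arithmetic behind this can be reproduced in plain Python. This is a sketch of the underlying operations only, not of pandas' internal code path:

```python
import math

col_b = [True, 5]
col_c = ["x", 6]

# bool is a subclass of int, so True takes part in arithmetic as 1
mean_b = sum(col_b) / len(col_b)   # (1 + 5) / 2 = 3.0
std_b = math.sqrt(sum((v - mean_b) ** 2 for v in col_b) / (len(col_b) - 1))
print(std_b)   # sqrt(8), about 2.828427

# A str and an int cannot be added, so column C has no mean and no std
try:
    sum(col_c)
    c_failed = False
except TypeError as err:
    c_failed = True
    print("TypeError:", err)
```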

False

numeric_only=False means that the rows/columns of mixed type will also be considered, but an error will be raised if the standard deviation cannot be computed:

df.std(numeric_only=False)
TypeError: could not convert string to float: 'x'

True

To compute the standard deviation of numeric rows/columns only:

df.std(numeric_only=True)
A 1.414214
dtype: float64
Published by Isshin Inada