Pandas DataFrame | describe method

Last updated: Aug 12, 2023
Tags: Python, Pandas

Pandas' DataFrame.describe(~) method returns a DataFrame of descriptive statistics (e.g. mean and min) for the columns of the source DataFrame. It is most commonly used to summarise a dataset numerically.

Parameters

1. percentiles | array-like of numbers | optional

The percentiles to include as part of the descriptive statistics. Each value must be between 0 and 1. By default, percentiles=[0.25, 0.50, 0.75].

2. include | "all" or array-like of dtypes or None | optional

The columns in the source DataFrame to consider:

Value | Description
"all" | All columns of the source DataFrame will be included.
list-like of dtypes | Only columns with the data-types specified in the list will be included.
None | Only columns of numeric type will be considered.

By default, include=None.

3. exclude | list-like of dtypes or None | optional

Similar to include, but exclude specifies the column data-types to ignore. By default, exclude=None.
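
As a rough illustration of exclude, the sketch below drops float columns from the summary. The DataFrame is a hypothetical one made up for this example (the grade column is stored as floats here so that there is something to exclude); it is not taken from the examples further down.

```python
import pandas as pd

# Hypothetical data: "grade" is deliberately a float column
df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],
    "age": [20, 30, 40],
    "grade": [60.0, 60.0, 70.0],
})

# Summarise every column except those with a float data-type
summary = df.describe(exclude=[float])
print(summary)
```

Note that once exclude is given, the remaining columns are no longer restricted to numeric types, so the summary here also covers the non-numeric name column.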

Return Value

A DataFrame holding the descriptive statistics of the column values in the source DataFrame.
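
Since the return value is an ordinary DataFrame, individual statistics can be looked up by row label (the statistic) and column label (the source column). A minimal sketch, assuming a small made-up DataFrame and the hypothetical variable names stats, mean_age and age_range:

```python
import pandas as pd

df = pd.DataFrame({"age": [20, 30, 40], "grade": [60, 60, 70]})
stats = df.describe()

# Rows are the statistics, columns are the source column names
mean_age = stats.loc["mean", "age"]  # 30.0
age_range = stats.loc["max", "age"] - stats.loc["min", "age"]

print(list(stats.index))
print(mean_age, age_range)
```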

Examples

Basic usage

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"name":["alex","bob","cathy"],"age":[20,30,40],"grade":[60,60,70]})
df
   name   age  grade
0  alex   20    60
1  bob    30    60
2  cathy  40    70

We can obtain some descriptive statistics using the describe(~) method:

df.describe()
       age   grade
count  3.0   3.000000
mean   30.0  63.333333
std    10.0  5.773503
min    20.0  60.000000
25%    25.0  60.000000
50%    30.0  60.000000
75%    35.0  65.000000
max    40.0  70.000000

Here, the 50% row (the 50th percentile) is the median.

Specifying percentiles

Instead of the default 25th and 75th percentiles, we can specify which percentiles to include by passing in percentiles:

df.describe(percentiles=[0.3, 0.6, 0.9])
       age   grade
count  3.0   3.000000
mean   30.0  63.333333
std    10.0  5.773503
min    20.0  60.000000
30%    26.0  60.000000
50%    30.0  60.000000
60%    32.0  62.000000
90%    38.0  68.000000
max    40.0  70.000000

Notice how the 50th percentile is still there even though we did not ask for it - this is because it represents the median.
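
One way to sanity-check these rows is to compare them against Series.quantile(~) and Series.median(). This is only a sketch: it rebuilds the numeric columns of the DataFrame above, and the names stats, q30 and median_age are introduced purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"age": [20, 30, 40], "grade": [60, 60, 70]})
stats = df.describe(percentiles=[0.3, 0.6, 0.9])

# The "30%" row should agree with the 0.3 quantile of each column
q30 = df["age"].quantile(0.3)

# The "50%" row should agree with the median
median_age = df["age"].median()

print(q30, stats.loc["30%", "age"])
print(median_age, stats.loc["50%", "age"])
```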

Specifying include

Consider the following DataFrame:

names = pd.Series(["alex","bob","cathy"], dtype="string")
gender = pd.Series(["male","male","female"], dtype="category")
age = pd.Series([20,30,20], dtype="int")
df = pd.DataFrame({"names":names,"gender":gender,"age":age})
df
   names  gender  age
0  alex   male    20
1  bob    male    30
2  cathy  female  20

To compute descriptive statistics for only the columns of type category and int (statistics that do not apply to a column's type appear as NaN in the output):

df.describe(include=["category",int])
       gender     age
count    3     3.000000
unique   2        NaN
top     male      NaN
freq    2         NaN
mean    NaN    23.333333
std     NaN    5.773503
min     NaN    20.000000
25%     NaN    20.000000
50%     NaN    20.000000
75%     NaN    25.000000
max     NaN    30.000000
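
For comparison, passing include="all" summarises every column regardless of data-type. The sketch below rebuilds the same DataFrame; summary_all is a name chosen for this example only.

```python
import pandas as pd

names = pd.Series(["alex", "bob", "cathy"], dtype="string")
gender = pd.Series(["male", "male", "female"], dtype="category")
age = pd.Series([20, 30, 20], dtype="int")
df = pd.DataFrame({"names": names, "gender": gender, "age": age})

# "all" keeps the string column as well as the category and int columns
summary_all = df.describe(include="all")
print(summary_all)
```

The names column now appears alongside gender and age, with NaN in the rows for numeric statistics such as mean.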
Published by Isshin Inada