
# Pandas DataFrame | quantile method

Last updated: Aug 10, 2023

Tags: Python, Pandas

Pandas `DataFrame.quantile(~)` method returns the value at the specified quantile for each column or row, interpolating between data points where necessary.

# Parameters

1. `q` | `float` or `array-like` of `float` | `optional`

The desired quantile(s) to compute. Each value must be between 0 and 1 (both inclusive). By default, `q=0.5`, that is, the value at the 50th percentile is computed.

2. `axis` | `None` or `int` or `string` | `optional`

Whether to compute the quantile row-wise or column-wise:

| Axis | Description |
|---|---|
| `0` or `"index"` | Compute the quantile for each column. |
| `1` or `"columns"` | Compute the quantile for each row. |

By default, `axis=0`.

3. `numeric_only` | `boolean` | `optional`

Whether to compute the quantiles only for rows/columns of numeric type. If set to `False`, then quantiles of rows/columns of type `datetime` and `timedelta` will also be computed. By default, `numeric_only=True` in Pandas versions before 2.0; from version 2.0 onwards, the default is `False`.

4. `interpolation` | `string` | `optional`

How the values are interpolated when the given percentile sits between two data-points, say `i` and `j` where `i<j`:

| Value | Description |
|---|---|
| `"linear"` | Standard linear interpolation |
| `"lower"` | Returns `i` |
| `"higher"` | Returns `j` |
| `"midpoint"` | Returns `(i+j)/2` |
| `"nearest"` | Returns `i` or `j`, whichever is closer |

By default, `interpolation="linear"`.
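To make "standard linear interpolation" concrete, here is a minimal sketch that reproduces the default behaviour by hand. It assumes the quantile's fractional position in the sorted data is `(n - 1) * q`, which matches the results shown in this guide; `linear_quantile` is a hypothetical helper written for illustration and is not part of Pandas.

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: quantile via linear interpolation between neighbours."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q   # assumed fractional position in the sorted data
    lo = math.floor(pos)           # index of the lower neighbour i
    hi = math.ceil(pos)            # index of the upper neighbour j
    i, j = ordered[lo], ordered[hi]
    return i + (j - i) * (pos - lo)


values = [2, 4, 6, 8]
manual = linear_quantile(values, 0.75)
from_pandas = pd.DataFrame({"A": values}).quantile(0.75)["A"]

print(manual, from_pandas)
```

For `q=0.75` the position is `2.25`, so the result lies a quarter of the way from 6 to 8.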

# Return Value

If `q` is a scalar, then a `Series` is returned. Otherwise, a `DataFrame` is returned.
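As a quick check of the rule above, the following sketch compares the return types for a scalar `q` and a list-like `q`. The variable names `single` and `multiple` are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

single = df.quantile(0.5)        # scalar q
multiple = df.quantile([0.5])    # list-like q, even with a single element

print(type(single).__name__)
print(type(multiple).__name__)
```

Note that wrapping a single quantile in a list is enough to get a `DataFrame` back, which can be handy when downstream code expects a consistent shape.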

# Examples

Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[2,4,6,8],"B":[5,6,7,8]})
df
   A  B
0  2  5
1  4  6
2  6  7
3  8  8
```

## Computing percentile column-wise

To compute the 50th percentile of each column:

```
df.quantile()   # q=0.5
A    5.0
B    6.5
Name: 0.5, dtype: float64
```

Here, the return type is `Series`. To interpret the output, half of the values in column `A` are smaller than `5.0`, and the other half are larger.
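The interpretation can be verified directly. This small sketch computes the share of values in column `A` that fall below the 50th percentile; the names `median_a` and `share_below` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

median_a = df.quantile()["A"]                 # value at the 50th percentile of A
share_below = (df["A"] < median_a).mean()     # fraction of values below it

print(median_a, share_below)
```

The split is exact here because the interpolated value sits strictly between two data points; with ties or an odd number of rows the share below may differ from one half.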

## Computing percentile row-wise

To compute the 30th percentile of each row:

```
df.quantile(q=0.3, axis=1)
0    2.9
1    4.6
2    6.3
3    8.0
Name: 0.3, dtype: float64
```

## Computing multiple percentiles

To get the values at the 50th and 75th percentiles for each column:

```
df.quantile([0.5, 0.75])   # returns a DataFrame
        A     B
0.50  5.0  6.50
0.75  6.5  7.25
```
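Since the result is a `DataFrame` indexed by the requested quantiles, individual values can be looked up with `loc`. The sketch below shows one way to do this; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

result = df.quantile([0.5, 0.75])

a_75 = result.loc[0.75, "A"]   # 75th percentile of column A
row_50 = result.loc[0.5]       # all columns at the 50th percentile (a Series)

print(a_75)
print(row_50)
```

Looking up a float label this way relies on an exact match with the value passed in `q`, so it is safest to reuse the same literal or variable.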

## Changing interpolation methods

Consider the following DataFrame:

```
df = pd.DataFrame({"A":[2,4,6,8]})
df
   A
0  2
1  4
2  6
3  8
```

### linear

Consider the case where the specified quantile falls between two data points:

```
df.quantile(0.5)   # interpolation="linear"
A    5.0
Name: 0.5, dtype: float64
```

Here, since the value corresponding to the 50th percentile does not exist in column `A`, the value was linearly interpolated between 4 and 6.

### lower

```
df.quantile(0.5, interpolation="lower")
A    4
Name: 0.5, dtype: int64
```

Again, the 50th percentile falls between two data points, 4 and 6, so interpolation is needed. By passing in `"lower"`, we select the lower value, that is, 4 in this case.

### higher

```
df.quantile(0.5, interpolation="higher")
A    6
Name: 0.5, dtype: int64
```

Same logic as `"lower"`, but we take the upper value.

Here's the same `df` for your reference:

```
df
   A
0  2
1  4
2  6
3  8
```

### nearest

```
df.quantile(0.5, interpolation="nearest")
A    6
Name: 0.5, dtype: int64
```

By passing in `"nearest"`, instead of always selecting the lower or upper value, we take whichever is nearest. In this case, the 50% quantile is 5, which is exactly halfway between 4 and 6, so neither is closer. Here the upper value, 6, is returned, but how such ties are broken is an implementation detail and should not be relied on to always pick the upper value.

### midpoint

```
df.quantile(0.5, interpolation="midpoint")
A    5.0
Name: 0.5, dtype: float64
```

Here, we just take the midpoint of the lower and upper value, so `(4+6)/2=5`.
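To see the five methods side by side, here is a small sketch that runs each of them on the same column. It uses `q=0.25` rather than `0.5` so that the quantile is not exactly halfway between two data points and `"nearest"` has an unambiguous answer; the `results` dictionary is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8]})

methods = ["linear", "lower", "higher", "midpoint", "nearest"]

# The 25th percentile sits between 2 and 4, three quarters of the way towards 4
results = {m: df["A"].quantile(0.25, interpolation=m) for m in methods}

for method, value in results.items():
    print(f"{method:>8}: {value}")
```

With this data, `"lower"` and `"higher"` return the two neighbours 2 and 4, `"midpoint"` returns their average, `"linear"` returns the proportionally weighted value, and `"nearest"` returns 4 because the quantile is closer to it.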
