Pandas DataFrame | quantile method
Pandas DataFrame.quantile(~) method returns the interpolated value at the specified quantile.
Parameters
1. q | float or array-like of float | optional
The desired quantile(s) to compute; each value must be between 0 and 1 (both inclusive). By default, q=0.5, that is, the value at the 50th percentile is computed.
2. axis | None or int or string | optional
Whether to compute the quantile row-wise or column-wise:
| Axis | Description |
|---|---|
| 0 or "index" | Compute the quantile for each column. |
| 1 or "columns" | Compute the quantile for each row. |
By default, axis=0.
3. numeric_only | boolean | optional
Whether to compute the quantiles only for rows/columns of numeric type. If set to False, then quantiles of rows/columns with datetime and timedelta values will also be computed. By default, numeric_only=True in pandas versions before 2.0; from pandas 2.0 onward, the default is False.
4. interpolation | string | optional
How the values are interpolated when the given percentile sits between two data-points, say i and j where i<j:
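| Value | Description |
|---|---|
| "linear" | Standard linear interpolation |
| "lower" | Returns i |
| "higher" | Returns j |
| "nearest" | Returns i or j, whichever is nearest |
| "midpoint" | Returns (i+j)/2 |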
By default, interpolation="linear".
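To make the numeric_only parameter described above more concrete, here is a minimal sketch. The frame `mixed` and its column contents are made up for illustration and do not appear elsewhere in this guide. Passing numeric_only explicitly avoids relying on the version-dependent default.

```python
import pandas as pd

# Hypothetical mixed-type frame: numeric, datetime and timedelta columns.
mixed = pd.DataFrame({
    "A": [1, 2],
    "B": [pd.Timestamp("2010"), pd.Timestamp("2011")],
    "C": [pd.Timedelta("1 days"), pd.Timedelta("2 days")],
})

# Only the numeric column A takes part in the computation.
numeric_result = mixed.quantile(0.5, numeric_only=True)
print(numeric_result)

# Datetime and timedelta columns are included as well.
full_result = mixed.quantile(0.5, numeric_only=False)
print(full_result)
```

With numeric_only=False, the result holds values of different types, so the returned Series is expected to have an object dtype.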
Return Value
If q is a scalar, then a Series is returned. Otherwise, a DataFrame is returned.
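As a quick check of the return types described above, the sketch below passes q once as a scalar and once as a one-element list. The frame mirrors the one used in the Examples section; the variable names are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

scalar_result = df.quantile(0.5)    # scalar q
list_result = df.quantile([0.5])    # array-like q, even with a single element

print(type(scalar_result).__name__)
print(type(list_result).__name__)
```

Wrapping a single quantile in a list is a simple way to get a DataFrame back when downstream code expects one.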
Examples
Consider the following DataFrame:
df
   A  B
0  2  5
1  4  6
2  6  7
3  8  8
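The source shows only the printed frame, so the construction call below is an assumption: one straightforward way to build a DataFrame with the same contents.

```python
import pandas as pd

# Column values copied from the printed frame above.
df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})
print(df)
```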
Computing percentile column-wise
To compute the 50th percentile of each column:
df.quantile() # q=0.5
A    5.0
B    6.5
Name: 0.5, dtype: float64
Here, the return type is Series. To interpret the output, half of the values in column A (2 and 4) are smaller than 5.0.
Computing percentile row-wise
To compute the 30th percentile of each row:
df.quantile(q=0.3, axis=1)
0    2.9
1    4.6
2    6.3
3    8.0
Name: 0.3, dtype: float64
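The row-wise result above can be cross-checked in two ways, sketched below: the string label "columns" should behave the same as axis=1, and the first row's value can be reproduced by hand with linear interpolation between 2 and 5. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

by_number = df.quantile(q=0.3, axis=1)
by_label = df.quantile(q=0.3, axis="columns")
print(by_number.equals(by_label))

# First row holds 2 and 5, so the 30th percentile sits 30% of the way from 2 to 5.
first_row_by_hand = 2 + 0.3 * (5 - 2)
print(first_row_by_hand, by_number.loc[0])
```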
Computing multiple percentiles
To get the values at the 50th and 75th percentiles for each column:
df.quantile([0.5, 0.75]) # returns a DataFrame
        A     B
0.50  5.0  6.50
0.75  6.5  7.25
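Since the returned DataFrame is indexed by the requested quantiles, individual values can be looked up with .loc. The following is a small sketch with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

result = df.quantile([0.5, 0.75])

# Row label is the quantile, column label is the original column name.
b_75 = result.loc[0.75, "B"]
a_50 = result.loc[0.5, "A"]
print(b_75, a_50)
```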
Changing interpolation methods
Consider the following DataFrame:
df
   A
0  2
1  4
2  6
3  8
linear
Consider the case when the value corresponding to the specified quantile does not exist:
df.quantile(0.5) # interpolation="linear"
A    5.0
Name: 0.5, dtype: float64
Here, since the value corresponding to the 50th percentile does not exist in column A, the value was linearly interpolated between 4 and 6.
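To show what linear interpolation does here, the sketch below recomputes the quantile by hand. The helper `linear_quantile` is hypothetical and written only for illustration; it assumes the quantile's position in the sorted data is q * (n - 1), which reproduces the outputs shown in this guide.

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: quantile of values using linear interpolation."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)   # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    # Move `fraction` of the way from the lower neighbour to the upper one.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


df = pd.DataFrame({"A": [2, 4, 6, 8]})

by_hand = linear_quantile(df["A"], 0.5)
from_pandas = df.quantile(0.5)["A"]
print(by_hand, from_pandas)
```

For q=0.5 and four values, the position is 1.5, so the result lies halfway between the values at sorted positions 1 and 2, namely 4 and 6.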
lower
df.quantile(0.5, interpolation="lower")
A    4
Name: 0.5, dtype: int64
Again, no value in column A sits exactly at the 50% quantile, so interpolation is needed. The quantile lies between the values 4 and 6. By passing in "lower", we select the lower value, that is, 4 in this case.
higher
df.quantile(0.5, interpolation="higher")
A    6
Name: 0.5, dtype: int64
Same logic as "lower", but we take the upper value.
Here's the same df for your reference:
df
   A
0  2
1  4
2  6
3  8
nearest
df.quantile(0.5, interpolation="nearest")
A    6
Name: 0.5, dtype: int64
By passing in "nearest", instead of always selecting the lower or upper value, we take whichever is nearest. In this case, the 50% quantile is 5, which happens to sit exactly halfway between 4 and 6. For this tie, the upper value, 6, is returned.
midpoint
df.quantile(0.5, interpolation="midpoint")
A    5.0
Name: 0.5, dtype: float64
Here, we just take the midpoint of the lower and upper value, so (4+6)/2=5.
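To compare the interpolation options side by side, the sketch below loops over all five values on the same single-column frame. The dictionary name `results` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8]})

methods = ["linear", "lower", "higher", "nearest", "midpoint"]

# Map each interpolation method to the 50% quantile of column A.
results = {m: df.quantile(0.5, interpolation=m)["A"] for m in methods}

for method, value in results.items():
    print(f"{method}: {value}")
```

Note that "lower", "higher" and "nearest" always return a value that exists in the data, whereas "linear" and "midpoint" can return a value between two data points.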