# PySpark DataFrame | summary method

Last updated: Aug 12, 2023

Tags: PySpark

PySpark DataFrame's `summary(~)` method returns a PySpark DataFrame containing basic summary statistics of the numeric and string columns.

# Parameters

1. `statistics` | `string` | `optional`

The statistics to compute, passed as one or more strings. The following are available:

• count

• mean

• stddev

• min

• max

• arbitrary percentiles (e.g. `"60%"`)

By default, `count`, `mean`, `stddev`, `min`, the 25%, 50% and 75% percentiles, and `max` are computed. Percentiles are approximate.

# Return Value

PySpark DataFrame (`pyspark.sql.dataframe.DataFrame`).

# Examples

Consider the following PySpark DataFrame:

```
df = spark.createDataFrame([["Alex", 20], ["Bob", 24], ["Cathy", 22], ["Doge", 30]], ["name", "age"])
df.show()
+-----+---+
| name|age|
+-----+---+
| Alex| 20|
|  Bob| 24|
|Cathy| 22|
| Doge| 30|
+-----+---+
```

## Getting the summary statistics of numeric columns of PySpark DataFrame

The summary statistics of our DataFrame are as follows:

```
df.summary().show()
+-------+----+-----------------+
|summary|name|              age|
+-------+----+-----------------+
|  count|   4|                4|
|   mean|null|             24.0|
| stddev|null|4.320493798938574|
|    min|Alex|               20|
|    25%|null|               20|
|    50%|null|               22|
|    75%|null|               24|
|    max|Doge|               30|
+-------+----+-----------------+
```
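The `stddev` row is the sample standard deviation (it divides by n - 1), not the population standard deviation. As a quick sanity check outside Spark, the value can be reproduced with Python's standard `statistics` module. The `ages` list below is simply the `age` column copied by hand, so this is an illustration rather than anything Spark runs internally.

```python
import statistics

# The age column from the DataFrame above, copied by hand (assumption: no Spark needed here)
ages = [20, 24, 22, 30]

mean = statistics.mean(ages)               # arithmetic mean
sample_std = statistics.stdev(ages)        # divides by n - 1 (sample)
population_std = statistics.pstdev(ages)   # divides by n (population)

print("mean:", mean)
print("sample stddev:", round(sample_std, 6))
print("population stddev:", round(population_std, 6))
```

The sample figure agrees with the `stddev` row above, while the population figure is noticeably smaller on such a small dataset.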

To compute certain summary statistics only:

```
df.summary("max", "min").show()
+-------+----+---+
|summary|name|age|
+-------+----+---+
|    max|Doge| 30|
|    min|Alex| 20|
+-------+----+---+
```
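For the string column `name`, `min` and `max` follow lexicographic (dictionary) ordering rather than string length. The sketch below mimics that in plain Python with the names copied by hand. Python compares strings by code point, which gives the same order as Spark for simple ASCII names like these, though that equivalence is an assumption for other character sets.

```python
# The name column from the DataFrame above, copied by hand
names = ["Alex", "Bob", "Cathy", "Doge"]

smallest = min(names)           # first in dictionary order
largest = max(names)            # last in dictionary order
shortest = min(names, key=len)  # shortest string, for contrast

print(smallest, largest, shortest)
```

Note that the shortest name, `Bob`, is not the minimum: ordering is decided character by character from the left.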

## Getting n-th percentile of numeric columns in PySpark DataFrame

To compute the 60th percentile:

```
df.summary("60%").show()
+-------+----+---+
|summary|name|age|
+-------+----+---+
|    60%|null| 24|
+-------+----+---+
```
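Notice that the percentile is always a value that exists in the column, with no interpolation between neighbours. The sketch below uses a simple nearest-rank rule in plain Python to show how 20, 22, 24 and 24 arise for the 25%, 50%, 60% and 75% percentiles of `age`. The helper `nearest_rank_percentile` is hypothetical and is not Spark's algorithm. Spark computes approximate percentiles, which happen to coincide with this rule on a dataset this small.

```python
import math

def nearest_rank_percentile(values, percent):
    """Hypothetical helper: smallest value whose 1-based rank covers `percent` of the data."""
    ordered = sorted(values)
    # Clamp the rank to at least 1 so that 0% maps to the minimum
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# The age column from the DataFrame above, copied by hand
ages = [20, 24, 22, 30]

results = {p: nearest_rank_percentile(ages, p) for p in (25, 50, 60, 75)}
for percent, value in results.items():
    print(f"{percent}%: {value}")
```

On larger datasets Spark's approximation may return a slightly different element than an exact rule like this one.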

## Getting summary statistics of certain columns in PySpark DataFrame

To summarize only certain columns, first pick them with the `select(~)` method and then call `summary(~)` on the result:

```
df.select("age").summary("max", "min").show()
+-------+---+
|summary|age|
+-------+---+
|    max| 30|
|    min| 20|
+-------+---+
```
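The DataFrame returned by `summary(~)` stores every statistic as a string, so numeric values usually need casting before further arithmetic. The sketch below shows the idea in plain Python. The `collected` list is a hand-written stand-in for what `collect()` would return for the output above, not a real Spark result.

```python
# Hypothetical stand-in for df.select("age").summary("max", "min").collect():
# each row is (summary, age), and both fields are strings
collected = [("max", "30"), ("min", "20")]

# Cast each statistic to a float so it can be used in arithmetic
stats = {name: float(value) for name, value in collected}
age_range = stats["max"] - stats["min"]

print(stats)
print("range:", age_range)
```

With a real DataFrame, the same cast could instead be done in Spark before collecting.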
