
# Counting frequency of values in PySpark DataFrame Column

Aug 12, 2023


Consider the following PySpark DataFrame:

```
+----+
|col1|
+----+
|   A|
|   A|
|   B|
+----+
```
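
For reference, a DataFrame like this could be constructed as follows (a minimal sketch; the variable name `df` is assumed throughout the snippets below):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build a small single-column DataFrame with repeated values
df = spark.createDataFrame([("A",), ("A",), ("B",)], ["col1"])
df.show()
```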

# Counting frequency of values using aggregation (groupBy and count)

To count the frequency of values in column `col1`:

```
df.groupBy('col1').count().show()

+----+-----+
|col1|count|
+----+-----+
|   A|    2|
|   B|    1|
+----+-----+
```

Here, we are first grouping by the values in `col1`, and then for each group, we are counting the number of rows.
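
If you also need the frequencies back on the driver as plain Python values, one possible follow-up (not part of the original recipe) is to collect the grouped result into a dictionary:

```
# Collect the grouped counts into a Python dict, e.g. {'A': 2, 'B': 1}
freqs = {row['col1']: row['count'] for row in df.groupBy('col1').count().collect()}
print(freqs)
```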

# Sorting PySpark DataFrame by frequency counts

The resulting PySpark DataFrame is not sorted in any particular order by default. We can sort the DataFrame by the `count` column using the `orderBy(~)` method:

```
df.groupBy('col1').count().orderBy('count', ascending=False).show()

+----+-----+
|col1|count|
+----+-----+
|   A|    2|
|   B|    1|
+----+-----+
```

Here, the output is similar to Pandas' `value_counts(~)` method, which returns the frequency counts in descending order.
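
For comparison only, here is a rough Pandas sketch of `value_counts(~)` applied to the same three values; the exact display may vary between Pandas versions:

```
import pandas as pd

# value_counts() sorts the frequencies in descending order by default
s = pd.Series(['A', 'A', 'B'], name='col1')
print(s.value_counts())
```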

# Assigning label to count aggregate column

Similar to what we did with the methods `groupBy(~)` and `count()`, we can also use the `agg(~)` method, which takes as input an aggregate function:

```
import pyspark.sql.functions as F

df.groupBy('col1').agg(F.count('col1').alias('my_count')).show()

+----+--------+
|col1|my_count|
+----+--------+
|   A|       2|
|   B|       1|
+----+--------+
```

This is more verbose than the solution using `groupBy(~)` and `count()`, but the advantage is that we can use the `alias(~)` method to assign a name to the resulting aggregate column - here the label is `my_count` instead of the default `count`.
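
As a small follow-up sketch (not shown in the original example), the aliased column can then be referenced by its new name, for instance to sort the frequencies in descending order:

```
import pyspark.sql.functions as F

# Sort by the renamed aggregate column in descending order
df.groupBy('col1') \
  .agg(F.count('col1').alias('my_count')) \
  .orderBy('my_count', ascending=False) \
  .show()
```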