# PySpark SQL Functions | count method

Last updated: Aug 12, 2023

PySpark SQL Functions' count(~) is an aggregate method, typically used with the agg(~) method, that computes the number of non-null values of a column in each group.

# Parameters

1. col | string or Column

The column to perform the count on.

# Return Value

A new PySpark Column.

# Examples

Consider the following PySpark DataFrame:

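```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([['Alex','A'],['Bob','B'],['Cathy','A']], ['name','class'])
df.show()
```

```
+-----+-----+
| name|class|
+-----+-----+
| Alex|    A|
|  Bob|    B|
|Cathy|    A|
+-----+-----+
```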

## Counting the number of items in each group

To count the number of rows for each class group:

```python
import pyspark.sql.functions as F
df.groupBy('class').agg(F.count('class').alias('COUNT')).show()
```

```
+-----+-----+
|class|COUNT|
+-----+-----+
|    A|    2|
|    B|    1|
+-----+-----+
```

Here, note the following:

• we first group by the class column using groupBy(~), and then count the rows in each group. Strictly speaking, F.count('class') counts the non-null class values in each group. This matches the row count here only because the class column contains no null values.

• we label the resulting aggregate column using the alias(~) method. Without alias(~), the column would be labeled count(class).

