PySpark SQL Functions | count method
count(~) is an aggregate function used in conjunction with the agg(~) method to compute the number of items in each group.

Parameters

1. col

The column to perform the count on.

Return Value

A new PySpark Column.
Consider the following PySpark DataFrame:
Counting the number of items in each group
To count the number of rows for each class:
Here, note the following:
- we are first grouping by the class column using groupBy(~), and then for each group, we are counting how many rows there are. Technically speaking, we are counting the number of class values in each group (F.count('class')), but this is equivalent to simply counting the number of rows in each group.
- we are assigning a label to the resulting aggregate column using the alias(~) method. Note that the default label assigned is count(class).
Note that the related countDistinct(~) method returns the number of distinct rows for the specified columns.