Difference between a group's count and size in Pandas
The difference between a group's count() and size() is the following:
- count() returns the number of non-NaN values in each column of a group. When called on a grouped DataFrame, the result is a DataFrame with one column of counts per non-grouping column.
- size() returns the length of each group, that is, its number of rows. This method does not differentiate between NaN and non-NaN values.
Example
Consider the following DataFrame about some products:
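import numpy as np
import pandas as pd

# Five products; the last one has a missing price
df = pd.DataFrame(
    {
        "price": [500, 300, 700, 200, np.nan],
        "brand": ["apple", "google", "apple", "google", "apple"],
        "device": ["phone", "phone", "computer", "phone", "phone"],
    },
    index=["a", "b", "c", "d", "e"],
)
df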
   price   brand    device
a  500.0   apple     phone
b  300.0  google     phone
c  700.0   apple  computer
d  200.0  google     phone
e    NaN   apple     phone
Notice that the price of the last product is missing (NaN).
Here's the count() of each brand group:
df.groupby("brand").count()
        price  device
brand
apple       2       3
google      2       2
Note the following:
- the return type is DataFrame.
- the count for apple's price is 2, since only non-NaN values are counted.
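If only one column's count is needed, selecting that column before calling count() gives a Series instead of a DataFrame. The following is a minimal sketch that rebuilds the same products DataFrame; the variable name price_counts is only illustrative.

```python
import numpy as np
import pandas as pd

# Same products DataFrame as in the example above
df = pd.DataFrame(
    {
        "price": [500, 300, 700, 200, np.nan],
        "brand": ["apple", "google", "apple", "google", "apple"],
        "device": ["phone", "phone", "computer", "phone", "phone"],
    },
    index=["a", "b", "c", "d", "e"],
)

# Selecting a single column first yields a Series of non-NaN counts per brand
price_counts = df.groupby("brand")["price"].count()

print(type(price_counts).__name__)  # Series
print(price_counts)
```

Here the apple group still has a price count of 2, because its NaN price is skipped.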
Now, consider the size() of each brand group:
df.groupby("brand").size()
brand
apple     3
google    2
dtype: int64
Note the following:
- the return type is Series.
- the size of the apple group is 3, since size() just counts the number of rows in each group.
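One way to see the two methods side by side is to subtract a column's count() from the group's size(), which gives the number of missing values in that column for each group. This is a small sketch using the same products DataFrame; the names missing_prices and expected are illustrative, not part of pandas.

```python
import numpy as np
import pandas as pd

# Same products DataFrame as in the example above
df = pd.DataFrame(
    {
        "price": [500, 300, 700, 200, np.nan],
        "brand": ["apple", "google", "apple", "google", "apple"],
        "device": ["phone", "phone", "computer", "phone", "phone"],
    },
    index=["a", "b", "c", "d", "e"],
)

grouped = df.groupby("brand")

# size() counts all rows; count() on "price" skips NaN prices,
# so the difference is the number of missing prices per brand
missing_prices = grouped.size() - grouped["price"].count()
print(missing_prices)

# Cross-check by counting NaN prices directly within each brand
expected = df["price"].isna().groupby(df["brand"]).sum()
print(expected)
```

For this data, both results report one missing price for apple and none for google.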