Calculating the percentage of each value in each group in Pandas
Last updated: Aug 12, 2023
Tags: Python, Pandas
Consider the following DataFrame:
import pandas as pd

df = pd.DataFrame({"A":[2,3,4],"B":[6,7,8],"group":["a","a","b"]})
df
   A  B group
0  2  6     a
1  3  7     a
2  4  8     b
To compute the percentage of each value in each distinct group, that is, each value divided by the total of its column within its group:
df.groupby("group").apply(lambda my_df: my_df / my_df.sum())
     A         B
0  0.4  0.461538
1  0.6  0.538462
2  1.0  1.000000
Note the following:
- the function passed to apply(~) is called twice in this case, once for each group.
- the argument (my_df) passed to this function is a DataFrame representing a single group.
- my_df.sum() returns a Series containing the sum of each column of my_df. In this case, for group a, my_df.sum() evaluates to a Series holding the values [5,13].
- dividing my_df by this Series divides the values in column A by 5 and the values in column B by 13.
- the function passed to apply(~) returns a DataFrame.
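The arithmetic for group a described in the notes above can be reproduced by hand. The following is a minimal sketch that assumes the same df as before; the names group_a, col_sums and shares_a are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 4], "B": [6, 7, 8], "group": ["a", "a", "b"]})

# Select the rows of group "a", keeping only the numeric columns
group_a = df.loc[df["group"] == "a", ["A", "B"]]

# Column sums for this group: A -> 5, B -> 13
col_sums = group_a.sum()

# Dividing a DataFrame by a Series matches the Series index against the
# column labels, so column A is divided by 5 and column B by 13
shares_a = group_a / col_sums
print(shares_a)
```

The rows of shares_a correspond to the first two rows of the result returned by apply(~) above.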
* * *
To compute the percentage of a specific column instead of all numeric columns:
df.groupby("group").apply(lambda my_df: my_df["A"] / my_df["A"].sum())
group
a      0    0.4
       1    0.6
b      2    1.0
Name: A, dtype: float64
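As an alternative sketch (not the method used above), the same values can be obtained with transform("sum"), which returns each group's total aligned to the original rows. The result then keeps the original row index, whereas the index labels produced by apply(~) may differ between Pandas versions. The values above are fractions, so multiplying by 100 turns them into percentages. The names group_total_A, pct_A and pct_all are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 4], "B": [6, 7, 8], "group": ["a", "a", "b"]})

# Group total of column A, repeated on every row of that group
group_total_A = df.groupby("group")["A"].transform("sum")

# Share of each value within its group, expressed as a percentage
pct_A = df["A"] / group_total_A * 100

# Same idea for several numeric columns at once
cols = ["A", "B"]
pct_all = df[cols] / df.groupby("group")[cols].transform("sum") * 100

print(pct_A)
print(pct_all)
```

Because the grouping column is never passed to the division, this form also sidesteps any handling of the non-numeric group column.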
Published by Isshin Inada