I have a dataframe with 2 experiment groups, and I am trying to get percentile distributions for each group. However, the data is already aggregated into counts per group and month:
import pandas as pd

df = pd.DataFrame({'group': ['control', 'control', 'control', 'treatment', 'treatment', 'treatment'],
                   'month': [1, 4, 9, 2, 5, 12],
                   'ct': [8, 4, 2, 5, 5, 7]})
I want to calculate which month represents the 25th, 50th, and 75th percentile of each group, but the dataframe is already aggregated on the group/month variables, with the number of observations stored in ct.
Update 1: I realize I didn't clarify the trouble I am running into. This is an aggregated dataframe, so control, for example, has 8 data points where month = 1, 4 where month = 4, and 2 where month = 9. The percentile values for control should therefore be:
x = pd.Series([1,1,1,1,1,1,1,1,4,4,4,4,9,9])
x.quantile([0.25,0.5,0.75])
>> 0.25 1.0
0.50 1.0
0.75 4.0
dtype: float64
Grouping by group and taking quantiles of month directly doesn't give an accurate answer, because it treats each row as a single observation and ignores ct. Is there a way to explode out the counts and take the percentiles of the ungrouped values? The final object should have these values:
           p25  p50  p75
control      1    1    4
treatment    2    5   12
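For reference, here is a minimal sketch of the "explode out the counts" idea described above. It assumes ct holds non-negative integers and that the expanded frame fits in memory; the names expanded and result are only illustrative. Each row is repeated ct times with Index.repeat, and then the quantiles are taken per group with pandas' default linear interpolation:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['control', 'control', 'control', 'treatment', 'treatment', 'treatment'],
    'month': [1, 4, 9, 2, 5, 12],
    'ct': [8, 4, 2, 5, 5, 7],
})

# Repeat each row label `ct` times, then select those labels so that
# every original observation gets its own row again.
expanded = df.loc[df.index.repeat(df['ct'])]

# Quantiles of month per group; unstack turns the quantile level into columns.
result = (
    expanded.groupby('group')['month']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
result.columns = ['p25', 'p50', 'p75']
print(result)
```

The values come back as floats (for example 1.0 rather than 1). For very large counts, a weighted-quantile approach based on cumulative sums of ct might be preferable to materializing every row, but that is a different technique from the one sketched here.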