I'm transitioning from pandas, so please excuse my non-parallelized brain. Suppose we have the following pandas code:
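import numpy as np
import pandas as pd

# 100 rows of random integers in [1, 4] for each of the seven columns
dfx = pd.DataFrame({val: np.random.randint(1, 5, 100) for val in ['a', 'b', 'c', 'd', 'x', 'y', 'z']})
(
    dfx
    .groupby('a')
    .apply(
        # the lambda receives the whole sub-DataFrame for each value of 'a'
        lambda df:
            df
            .sort_values('c')
            .groupby('d')
            [['x', 'y', 'z']]
            .agg(['max', 'mean', 'median'])
    )
)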
How can I rewrite this in polars?
The core idea of the exercise is that in apply I can do something with the whole DataFrame group, e.g. sort it and then aggregate (which doesn't make sense here, I know, but the idea is the freedom to do whatever I want). Do I lose this freedom if I want my code to be parallelizable, or is there a way to get hold of the whole group? I tried pl.all() but couldn't figure out the trick to at least sort each sub-DataFrame.