I'm transitioning from pandas, so please excuse my non-parallelized brain. Suppose we have the following pandas code:

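import numpy as np
import pandas as pd

# seven integer columns, 100 rows, random values from 1 to 4
dfx = pd.DataFrame({val: np.random.randint(1, 5, 100) for val in ['a', 'b', 'c', 'd', 'x', 'y', 'z']})
(
dfx
.groupby('a')
.apply(
    # each group arrives here as a whole DataFrame
    lambda df:
    df
    .sort_values('c')
    .groupby('d')
    [['x', 'y', 'z']]
    .agg(['max', 'mean', 'median'])
    )
)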

How can I rewrite this in polars?

The core idea of the exercise is that inside apply I can do something with the whole group DataFrame, e.g. sort it and then aggregate (which doesn't make sense here, I know, but the point is the freedom to do whatever I want). Do I lose this freedom if I want my code to be parallelizable, or is there a way to get hold of the whole group? I tried pl.all() but couldn't figure out how to at least sort each sub-DataFrame.