Below is the dataframe coming from SQL, for example:
id | days | type | delta | A | B |
---|---|---|---|---|---|
1 | 30 | X | 1 | 0.1 | 0.5 |
1 | 30 | Y | 2 | 0.2 | 0.6 |
1 | 60 | X | 1 | 0.3 | 0.7 |
1 | 60 | Y | 2 | 0.4 | 0.8 |
Transforming data frame, I want to achieve something like this:
id | day_30 | day_60 |
---|---|---|
1 | {A: {X1: 0.1, Y2: 0.2}, B: {X1: 0.5, Y2: 0.6}} | {A: {X1: 0.3, Y2: 0.4}, B: {X1: 0.7, Y2: 0.8}} |
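For reference, the sample input above can be rebuilt as a DataFrame (column names and values taken directly from the table; nothing else assumed):

```python
import pandas as pd

# Reconstruct the sample table from the question
df = pd.DataFrame({
    'id':    [1, 1, 1, 1],
    'days':  [30, 30, 60, 60],
    'type':  ['X', 'Y', 'X', 'Y'],
    'delta': [1, 2, 1, 2],
    'A':     [0.1, 0.2, 0.3, 0.4],
    'B':     [0.5, 0.6, 0.7, 0.8],
})
```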
Currently I am able to achieve this by grouping by id, then grouping by days, and using iterrows. But it is very slow for millions of rows. How can I make it faster? An apply or vectorization example that does the same would be very helpful.
Current code:

```python
import json

import pandas as pd

grp = df.groupby('id')
row_list = []
for name, df_grp in grp:
    o_row = [name]  # 'name' is the id for this group
    day_grp = df_grp.groupby('days')
    for d_name, df_day_grp in day_grp:
        a_dict = {}
        b_dict = {}
        for idx, row in df_day_grp.iterrows():
            a_dict[f"{row['type']}{row['delta']}"] = row['A']
            b_dict[f"{row['type']}{row['delta']}"] = row['B']
        row_dict = {'A': a_dict, 'B': b_dict}
        o_row.append(json.dumps(row_dict))
    row_list.append(o_row)

cols = ['id', 'day_30', 'day_60']
df = pd.DataFrame(row_list, columns=cols)
```
I am guessing that creating the new dataframe row by row and using iterrows is the reason for the slowness. Any ideas? How can I achieve the same result using apply or assign?
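One possible direction, sketched under the assumption that the sample data above is representative: build the `X1`/`Y2` key column once with vectorized string concatenation, then use a single `groupby(['id', 'days']).apply` to pack each group into a dict, and `unstack` the `days` level into the `day_30`/`day_60` columns. This avoids the per-row `iterrows` loop entirely; the `pack` helper name is made up for illustration.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'id':    [1, 1, 1, 1],
    'days':  [30, 30, 60, 60],
    'type':  ['X', 'Y', 'X', 'Y'],
    'delta': [1, 2, 1, 2],
    'A':     [0.1, 0.2, 0.3, 0.4],
    'B':     [0.5, 0.6, 0.7, 0.8],
})

# Build the 'X1'/'Y2' labels once, vectorized, instead of per row
df['key'] = df['type'] + df['delta'].astype(str)

def pack(g):
    # One dict per value column, keyed by the precomputed label
    return json.dumps({'A': dict(zip(g['key'], g['A'])),
                       'B': dict(zip(g['key'], g['B']))})

out = (df.groupby(['id', 'days'])[['key', 'A', 'B']].apply(pack)
         .unstack('days')       # one column per distinct 'days' value
         .add_prefix('day_')    # 30 -> day_30, 60 -> day_60
         .reset_index())
```

This still calls a Python function once per (id, days) group, so it is not fully vectorized, but one apply over precomputed keys is typically much cheaper than nested loops with `iterrows`, which materializes a Series for every row.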