
Below is an example of the dataframe coming from SQL:

id days type delta A B
1 30 X 1 0.1 0.5
1 30 Y 2 0.2 0.6
1 60 X 1 0.3 0.7
1 60 Y 2 0.4 0.8
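For reference, the sample frame above can be reconstructed like this (a minimal sketch; dtypes are assumed):

```python
import pandas as pd

# Reconstruction of the sample data shown above
df = pd.DataFrame({
    'id':    [1, 1, 1, 1],
    'days':  [30, 30, 60, 60],
    'type':  ['X', 'Y', 'X', 'Y'],
    'delta': [1, 2, 1, 2],
    'A':     [0.1, 0.2, 0.3, 0.4],
    'B':     [0.5, 0.6, 0.7, 0.8],
})
print(df)
```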

Transforming data frame, I want to achieve something like this:

id day_30 day_60
1 {A: {X1: 0.1, Y2: 0.2}, B: {X1: 0.5, Y2: 0.6}} {A: {X1: 0.3, Y2: 0.4}, B: {X1: 0.7, Y2: 0.8}}

Currently I am able to achieve this by grouping by id, then grouping by days, and using iterrows. But it's very slow for millions of rows. How can I make it faster? Any apply or vectorization example that does the same would be very helpful.

current code:

import json

row_list = []
for name, df_grp in df.groupby('id'):
    o_row = [name]
    for d_name, df_day_grp in df_grp.groupby('days'):
        a_dict = {}
        b_dict = {}
        for idx, row in df_day_grp.iterrows():
            key = f"{row['type']}{row['delta']}"
            a_dict[key] = row['A']
            b_dict[key] = row['B']
        row_dict = {'A': a_dict, 'B': b_dict}
        o_row.append(json.dumps(row_dict))
    row_list.append(o_row)

df = pd.DataFrame(row_list, columns=['id', 'day_30', 'day_60'])

I am guessing that creating the new dataframe and using iterrows could be the reason for the slowness. Any ideas? How can I achieve the same by using apply or assign?


1 Answer


First join type with delta, using DataFrame.pop to consume and remove the delta column. Then use a custom lambda function per group in GroupBy.apply to create the nested dictionaries, and finally reshape with Series.unstack and DataFrame.add_prefix:

c = ['A','B']
df['type'] = df['type'] + df.pop('delta').astype(str)

f = lambda x: x.set_index('type')[c].to_dict()
df = df.groupby(['id','days']).apply(f).unstack().add_prefix('day_').reset_index()
print (df)
days  id                                             day_30  \
0      1  {'A': {'X1': 0.1, 'Y2': 0.2}, 'B': {'X1': 0.5,...   

days                                             day_60  
0     {'A': {'X1': 0.3, 'Y2': 0.4}, 'B': {'X1': 0.7,...