Below is the dataframe coming from SQL, for example:
id | days | type | delta | A | B |
---|---|---|---|---|---|
1 | 30 | X | 1 | 0.1 | 0.5 |
1 | 30 | Y | 2 | 0.2 | 0.6 |
1 | 60 | X | 1 | 0.3 | 0.7 |
1 | 60 | Y | 2 | 0.4 | 0.8 |
Transforming data frame, I want to achieve something like this:
id | day_30 | day_60 |
---|---|---|
1 | {A: {X1: 0.1, Y2: 0.2}, B: {X1: 0.5, Y2: 0.6}} | {A: {X1: 0.3, Y2: 0.4}, B: {X1: 0.7, Y2: 0.8}} |
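For reference, the sample input above can be rebuilt as a DataFrame (column names and values taken directly from the table; nothing else assumed):

```python
import pandas as pd

# Reconstruct the sample table from the question
df = pd.DataFrame({
    'id':    [1, 1, 1, 1],
    'days':  [30, 30, 60, 60],
    'type':  ['X', 'Y', 'X', 'Y'],
    'delta': [1, 2, 1, 2],
    'A':     [0.1, 0.2, 0.3, 0.4],
    'B':     [0.5, 0.6, 0.7, 0.8],
})
```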
Currently I am able to achieve this by grouping by id, then grouping by days, and using iterrows. But it is very slow for millions of rows. How can I make it faster? An apply or vectorization example that does the same would be very helpful.
Current code:

```python
import json

import pandas as pd

grp = df.groupby('id')
row_list = []
for name, df_grp in grp:
    o_row = [name]  # 'name' is the id for this group
    day_grp = df_grp.groupby('days')
    for d_name, df_day_grp in day_grp:
        a_dict = {}
        b_dict = {}
        for idx, row in df_day_grp.iterrows():
            a_dict[f"{row['type']}{row['delta']}"] = row['A']
            b_dict[f"{row['type']}{row['delta']}"] = row['B']
        row_dict = {'A': a_dict, 'B': b_dict}
        o_row.append(json.dumps(row_dict))
    row_list.append(o_row)

cols = ['id', 'day_30', 'day_60']
df = pd.DataFrame(row_list, columns=cols)
```
I am guessing that creating the new dataframe row by row and using iterrows is the reason for the slowness. Any ideas? How can I achieve the same result using apply or assign?
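One possible direction, sketched under the assumption that the sample data above is representative: build the `X1`/`Y2` key column once with vectorized string concatenation, then use a single `groupby(['id', 'days']).apply` to pack each group into a dict, and `unstack` the `days` level into the `day_30`/`day_60` columns. This avoids the per-row `iterrows` loop entirely; the `pack` helper name is made up for illustration.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'id':    [1, 1, 1, 1],
    'days':  [30, 30, 60, 60],
    'type':  ['X', 'Y', 'X', 'Y'],
    'delta': [1, 2, 1, 2],
    'A':     [0.1, 0.2, 0.3, 0.4],
    'B':     [0.5, 0.6, 0.7, 0.8],
})

# Build the 'X1'/'Y2' labels once, vectorized, instead of per row
df['key'] = df['type'] + df['delta'].astype(str)

def pack(g):
    # One dict per value column, keyed by the precomputed label
    return json.dumps({'A': dict(zip(g['key'], g['A'])),
                       'B': dict(zip(g['key'], g['B']))})

out = (df.groupby(['id', 'days'])[['key', 'A', 'B']].apply(pack)
         .unstack('days')       # one column per distinct 'days' value
         .add_prefix('day_')    # 30 -> day_30, 60 -> day_60
         .reset_index())
```

This still calls a Python function once per (id, days) group, so it is not fully vectorized, but one apply over precomputed keys is typically much cheaper than nested loops with `iterrows`, which materializes a Series for every row.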