1
votes

I have two data frames in python. The first is raw rainfall data for a single day of year and the second is the sum of daily rainfall using group.by.

One data frame looks like this (with many more rows in between device_ids):

>>> df1
                             device_id      rain  day  month  year
0     9z849362-b05d-4317-96f5-f267c1adf8d6   0.0   31     12  2016
1     9z849362-b05d-4317-96f5-f267c1adf8d6   0.0   31     12  2016
6     e7z581f0-2693-42ad-9896-0048550ccda7   0.0   31     12  2016
11    e7z581f0-2693-42ad-9896-0048550ccda7   0.0   31     12  2016
12    ceez972b-135f-45b3-be4w-7c23102676bq   0.2   31     12  2016
13    ceez972b-135f-45b3-be4w-7c23102676bq   0.0   31     12  2016
18    ceez972b-135f-45b3-be4w-7c23102676bq   0.0   31     12  2016
19    1d28dz3a-c923-4967-a7bb-5881d232c9a7   0.0   31     12  2016
24    1d28dz3a-c923-4967-a7bb-5881d232c9a7   0.0   31     12  2016
25    a044ag4f-fd7c-4ae4-bff3-9158cebad3b1   0.0   31     12  2016
29    a044ag4f-fd7c-4ae4-bff3-9158cebad3b1   0.0   31     12  2016
29    a044ag4f-fd7c-4ae4-bff3-9158cebad3b1   0.0   31     12  2016
...                                    ...   ...  ...    ...   ...
3903  9z849362-b05d-4317-96f5-f267c1adf8d6   0.0   31     12  2016
3904  9z849362-b05d-4317-96f5-f267c1adf8d6   0.0   31     12  2016
3905  9z849362-b05d-4317-96f5-f267c1adf8d6   0.0   31     12  2016

And the other looks something like this:

>>> df2
                                      rain
device_id
1d28dz3a-c923-4967-a7bb-5881d232c9a7   0.0
9z849362-b05d-4317-96f5-f267c1adf8d6   0.0
a044ag4f-fd7c-4ae4-bff3-9158cebad3b1   1.2
ceez972b-135f-45b3-be4w-7c23102676bq   2.2
e7z581f0-2693-42ad-9896-0048550ccda7   0.2

... which I got by using:

df2 = df1.groupby(['device_id'])[["rain"]].sum()

I want my final data frame to look like this:

>>> df3
                                  rain      day  month  year
device_id
1d28dz3a-c923-4967-a7bb-5881d232c9a7   0.0  31   12     2016
9z849362-b05d-4317-96f5-f267c1adf8d6   0.0  31   12     2016
a044ag4f-fd7c-4ae4-bff3-9158cebad3b1   1.2  31   12     2016
ceez972b-135f-45b3-be4w-7c23102676bq   2.2  31   12     2016 
e7z581f0-2693-42ad-9896-0048550ccda7   0.2  31   12     2016

Which is to say that I want the "day month year" columns from df1 to be added to df2. I'm not sure if I should use merge, append, or do something else.

1

1 Answers

1
votes

Maybe this will work? groupby day month and year as well.

df.groupby(['device_id', 'day', 'month', 'year']).sum()
                                                     rain
device_id                            day month year      
1d28dz3a-c923-4967-a7bb-5881d232c9a7 31  12    2016   0.0
9z849362-b05d-4317-96f5-f267c1adf8d6 31  12    2016   0.0
a044ag4f-fd7c-4ae4-bff3-9158cebad3b1 31  12    2016   0.0
ceez972b-135f-45b3-be4w-7c23102676bq 31  12    2016   0.2
e7z581f0-2693-42ad-9896-0048550ccda7 31  12    2016   0.0

Or you could add reset_index to return these columns to the DataFrame like

df.groupby(['device_id', 'day', 'month', 'year']).sum().reset_index()

0  1d28dz3a-c923-4967-a7bb-5881d232c9a7   31     12  2016   0.0
1  9z849362-b05d-4317-96f5-f267c1adf8d6   31     12  2016   0.0
2  a044ag4f-fd7c-4ae4-bff3-9158cebad3b1   31     12  2016   0.0
3  ceez972b-135f-45b3-be4w-7c23102676bq   31     12  2016   0.2
4  e7z581f0-2693-42ad-9896-0048550ccda7   31     12  2016   0.0

Or the following should match your index / column structure exactly.

df.groupby(['device_id', 'day', 'month', 'year']).sum().reset_index([1, 2, 3])