0
votes

I have 2 columns of data in a pandas DF that looks like this with the "DateTime" column in format YYYY-MM-DD HH:MM:SS - this is first 24 hrs but the df is for one full year or 8784 x 2.

        BAFFIN BAY      DateTime
8759    8.112838 2016-01-01 00:00:00
8760    7.977169 2016-01-01 01:00:00
8761    8.420204 2016-01-01 02:00:00
8762    9.515370 2016-01-01 03:00:00
8763    9.222840 2016-01-01 04:00:00
8764    8.872423 2016-01-01 05:00:00
8765    8.776145 2016-01-01 06:00:00
8766    9.030668 2016-01-01 07:00:00
8767    8.394983 2016-01-01 08:00:00
8768    8.092915 2016-01-01 09:00:00
8769    8.946967 2016-01-01 10:00:00
8770    9.620883 2016-01-01 11:00:00
8771    9.535951 2016-01-01 12:00:00
8772    8.861761 2016-01-01 13:00:00
8773    9.077692 2016-01-01 14:00:00
8774    9.116074 2016-01-01 15:00:00
8775    8.724343 2016-01-01 16:00:00
8776    8.916940 2016-01-01 17:00:00
8777    8.920438 2016-01-01 18:00:00
8778    8.926278 2016-01-01 19:00:00
8779    8.817666 2016-01-01 20:00:00
8780    8.704014 2016-01-01 21:00:00
8781    8.496358 2016-01-01 22:00:00
8782    8.434297 2016-01-01 23:00:00

I am trying to calculate daily averages of the "BAFFIN BAY" and I've tried these approaches:

davg_df2 = df2.groupby(pd.Grouper(freq='D', key='DateTime')).mean()
davg_df2 = df2.groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
davg_df2 = df2.groupby(by=df2['DateTime'].dt.date).mean()

All of these approaches yields the same answer as shown below : BAFFIN BAY DateTime
2016-01-01 6.008044

However, if you do the math, the correct average for 2016-01-01 is 8.813134 Thank you kindly for your help. I'm assuming the grouping is just by day or 24hrs to make consecutive DAILY averages but the 3 approaches above clearly is looking at other data in my 8784 x 2 DF.

1
It's possible there are other rows for '2016-01-01' somewhere else in your DataFrame. Look at df2.groupby(pd.Grouper(freq='D', key='DateTime')).size().loc['2016-01-01']. Is it more than 24?ALollz
yeah, the answer is 192 not 24. How can i print out to screen all the rows with dates == 2016-01-01?user2100039
You could do something like df2[df2['DateTime'].dt.normalize() == '2016-01-01']. The normalize 0s out the time part so you can use that to select all days from the same date with that mask.ALollz
ok, the problem lies in how I'm converting the original date col which looks like this and the problem is that the format changes from yyyy-mm-dd to yyyy-dd-mm. how can i make all the data set to the yyyy-mm-dd format? My data looks like this -user2100039
8782 2016-01-01 2300 8783 2016-02-01 0000 8784 2016-02-01 0100user2100039

1 Answers

0
votes

I just ran your df with this code and i get 8.813134:

df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.groupby(by=pd.Grouper(freq='D', key='DateTime')).mean()
print(df)

Output:

            BAFFIN BAY
DateTime
2016-01-01    8.813134