Pandas DateTime Calculating Daily Averages

Question

I have 2 columns of data in a pandas DF that looks like this with the "DateTime" column in format YYYY-MM-DD HH:MM:SS - this is first 24 hrs but the df is for one full year or 8784 x 2.

        BAFFIN BAY      DateTime
8759    8.112838 2016-01-01 00:00:00
8760    7.977169 2016-01-01 01:00:00
8761    8.420204 2016-01-01 02:00:00
8762    9.515370 2016-01-01 03:00:00
8763    9.222840 2016-01-01 04:00:00
8764    8.872423 2016-01-01 05:00:00
8765    8.776145 2016-01-01 06:00:00
8766    9.030668 2016-01-01 07:00:00
8767    8.394983 2016-01-01 08:00:00
8768    8.092915 2016-01-01 09:00:00
8769    8.946967 2016-01-01 10:00:00
8770    9.620883 2016-01-01 11:00:00
8771    9.535951 2016-01-01 12:00:00
8772    8.861761 2016-01-01 13:00:00
8773    9.077692 2016-01-01 14:00:00
8774    9.116074 2016-01-01 15:00:00
8775    8.724343 2016-01-01 16:00:00
8776    8.916940 2016-01-01 17:00:00
8777    8.920438 2016-01-01 18:00:00
8778    8.926278 2016-01-01 19:00:00
8779    8.817666 2016-01-01 20:00:00
8780    8.704014 2016-01-01 21:00:00
8781    8.496358 2016-01-01 22:00:00
8782    8.434297 2016-01-01 23:00:00

I am trying to calculate daily averages of the "BAFFIN BAY" and I've tried these approaches:

davg_df2 = df2.groupby(pd.Grouper(freq='D', key='DateTime')).mean()
davg_df2 = df2.groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
davg_df2 = df2.groupby(by=df2['DateTime'].dt.date).mean()

All of these approaches yields the same answer as shown below : BAFFIN BAY DateTime
2016-01-01 6.008044

However, if you do the math, the correct average for 2016-01-01 is 8.813134 Thank you kindly for your help. I'm assuming the grouping is just by day or 24hrs to make consecutive DAILY averages but the 3 approaches above clearly is looking at other data in my 8784 x 2 DF.

It's possible there are other rows for '2016-01-01' somewhere else in your DataFrame. Look at df2.groupby(pd.Grouper(freq='D', key='DateTime')).size().loc['2016-01-01']. Is it more than 24? — ALollz
yeah, the answer is 192 not 24. How can i print out to screen all the rows with dates == 2016-01-01? — user2100039
You could do something like df2[df2['DateTime'].dt.normalize() == '2016-01-01']. The normalize 0s out the time part so you can use that to select all days from the same date with that mask. — ALollz
ok, the problem lies in how I'm converting the original date col which looks like this and the problem is that the format changes from yyyy-mm-dd to yyyy-dd-mm. how can i make all the data set to the yyyy-mm-dd format? My data looks like this - — user2100039
8782 2016-01-01 2300 8783 2016-02-01 0000 8784 2016-02-01 0100 — user2100039

NYC Coder NYC Coder · Accepted Answer · 2020-06-16T16:01:22

I just ran your df with this code and i get 8.813134:

df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.groupby(by=pd.Grouper(freq='D', key='DateTime')).mean()
print(df)

Output:

            BAFFIN BAY
DateTime
2016-01-01    8.813134

Pandas DateTime Calculating Daily Averages

1 Answers