0
votes

i have a DF (df2) that is 8784 x 13 that looks like this with a "DATE" column in yyyy-mm-dd format and a "TIME" column in hours like below and I need to calculate daily and monthly averages for the year 2016:

     DATE  TIME  BAFFIN BAY  GATUN II   GATUN I  KLONDIKE IIIG  \
8759 2016-01-01  0000    8.112838  3.949518  3.291540       7.629178   
8760 2016-01-01  0100    7.977169  4.028678  3.097562       7.477159   

  KLONDIKE II  LAGOA II  LAGOA I  PENASCAL II  PENASCAL I    SABINA  \
8759     7.095450       NaN      NaN     8.250527    8.911508  3.835205   
8760     7.362562       NaN      NaN     7.877099    7.858908  3.766714   

  SIERRA QUEMADA  
8759        3.405049  
8760        4.386598  

I have tried converting the 'DATE' column to datetime to use groupby but I'm not sure how to do this. I have tried the following below but it is not grouping my data as expected for day or month averages when i test the calculation in Excel:

davg_df2 = df2.groupby(by=df2['DATE'].dt.date).mean() #
davg_df2m = df2.groupby(by=df2['DATE'].dt.month).mean() # 

Thank you as I'm still learning python and to understand working with dates and different data types!

2

2 Answers

3
votes

Try this:

df2['DATE'] = pd.to_datetime(df2['DATE'], format='%Y-%m-%d')
# monthly
davg_df2 = df2.groupby(pd.Grouper(freq='M', key='DATE')).mean()
# daily
davg_df2 = df2.groupby(pd.Grouper(freq='D', key='DATE')).mean()
0
votes
# first convert the DATE column to datetime data type:

df2['DATE'] = pd.to_datetime(df2['DATE'])

# create new columns for month and day like so:

df2['month'] = df2['DATE'].apply(lambda t:t.month)
df2['day'] = df2['DATE'].apply(lambda t:t.day)

# then you group by day and month and get the mean like so:

davg_df2m = df2.groupby('month').mean()
davg_df2 = df2.groupby('day').mean()