3
votes

I am working with a dataset that has temperature readings once an hour, 24 hrs a day for 100+ years. I want to get an average temperature for each day to reduce the size of my dataset. The headings look like this:

     YR MO DA HR MN TEMP
  1943  6 19 10  0   73
  1943  6 19 11  0   72
  1943  6 19 12  0   76
  1943  6 19 13  0   78
  1943  6 19 14  0   81
  1943  6 19 15  0   85
  1943  6 19 16  0   85
  1943  6 19 17  0   86
  1943  6 19 18  0   86
  1943  6 19 19  0   87

etc for 600,000+ data points.

How can I run a nested function to calculate the daily average temperature so that I preserve YR, MO, DA, and TEMP? Once I have this, I want to be able to look at long-term averages and calculate, say, the average temperature for the month of January across 30 years. How do I do this?

3
Two warnings: make sure to remove incomplete days (or interpolate them), and note that a simple mean over all hours is not what meteo people usually consider the average temperature -- there are some stupid standards, like the temperature at 9:00 with weight 0.4 plus the temperature at 13:00 with weight 0.6. - mbq
thanks for the heads up! right now this is just for a course project & will not be used for publication. i will look into that though for the future. - user2113985

3 Answers

10
votes

In one step you could do this:

 meanTbl <- with(datfrm, tapply(TEMP, ISOdate(YR, MO, DA), mean) )

This gives you a date-time formatted index as well as the values. If you wanted just the Date as character without the trailing time:

meanTbl <- with(datfrm, tapply(TEMP, as.Date(ISOdate(YR, MO, DA)), mean) )

The monthly averages could be done with:

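 # meanTbl is a named array, not a data frame, so take the month from its names
 monMeans <- tapply(meanTbl, format(as.Date(names(meanTbl)), "%m"), mean)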
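
To check the tapply approach on something small, here is a minimal sketch on made-up data in the question's layout. The values, and the names dailyMeans and monthOfDay, are assumptions for illustration only. Note that this averages the daily means within each calendar month, pooled over all years.

```r
# Toy data in the question's layout: two days, three hourly readings each
# (values are made up for illustration).
datfrm <- data.frame(
  YR   = c(1943, 1943, 1943, 1943, 1943, 1943),
  MO   = c(6, 6, 6, 7, 7, 7),
  DA   = c(19, 19, 19, 1, 1, 1),
  HR   = c(10, 11, 12, 10, 11, 12),
  MN   = 0,
  TEMP = c(73, 72, 76, 80, 82, 84)
)

# Daily means, indexed by calendar date
dailyMeans <- with(datfrm, tapply(TEMP, as.Date(ISOdate(YR, MO, DA)), mean))

# The result is a named array, so recover the month from the date names
monthOfDay <- format(as.Date(names(dailyMeans)), "%m")

# Mean of the daily means per calendar month, pooled over all years
monMeans <- tapply(dailyMeans, monthOfDay, mean)
```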
6
votes

You can do it with aggregate:

# daily means
aggregate(TEMP ~ YR + MO + DA, FUN=mean, data=data) 

# monthly means 
aggregate(TEMP ~ YR + MO, FUN=mean, data=data)

# yearly means
aggregate(TEMP ~ YR, FUN=mean, data=data)

# monthly means independent of year
aggregate(TEMP ~ MO, FUN=mean, data=data)
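
For the "January across 30 years" part of the question, one possible sketch is to aggregate to daily means first and then subset before averaging. The data frame name temps, the toy values, and the 1951 to 1980 window are all assumptions for illustration. Averaging daily means rather than raw hours keeps days with more readings from counting more heavily.

```r
# Hypothetical toy data: January readings for three years (made-up values)
temps <- data.frame(
  YR   = rep(c(1951, 1952, 1990), each = 4),
  MO   = 1,
  DA   = rep(c(1, 1, 2, 2), times = 3),
  HR   = rep(c(9, 13), times = 6),
  TEMP = c(30, 34, 28, 32,   40, 44, 38, 42,   50, 54, 48, 52)
)

# Step 1: daily means, keeping YR, MO and DA as columns
daily <- aggregate(TEMP ~ YR + MO + DA, FUN = mean, data = temps)

# Step 2: keep January days inside the assumed window, then average them
jan <- subset(daily, MO == 1 & YR >= 1951 & YR <= 1980)
janNormal <- mean(jan$TEMP)
```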
2
votes

Your first question can be answered using the plyr package:

library(plyr)
daily_mean = ddply(df, .(YR, MO, DA), summarise, mean_temp = mean(TEMP))

In analogy to the above solution, to get monthly means:

monthly_mean = ddply(df, .(YR, MO), summarise, mean_temp = mean(TEMP))

or to get monthly averages over the whole dataset (30 years, aka normals in climate), not per year:

monthly_mean_normals = ddply(df, .(MO), summarise, mean_temp = mean(TEMP))
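
Whichever package is used, the warning in the comments about incomplete days still applies. A rough base R sketch of one way to drop them is to count the readings per day next to the mean and filter on that count. The toy data, the names n_obs, required and complete_days, and the threshold of 3 are assumptions for illustration. Real hourly data would presumably use 24.

```r
# Hypothetical toy data: day 19 has 3 readings, day 20 only 1
df <- data.frame(
  YR   = 1943,
  MO   = 6,
  DA   = c(19, 19, 19, 20),
  HR   = c(10, 11, 12, 10),
  TEMP = c(73, 72, 76, 90)
)
required <- 3  # assumed threshold for this toy example; use 24 for hourly data

# Number of readings per day
n_obs <- aggregate(TEMP ~ YR + MO + DA, FUN = length, data = df)
names(n_obs)[names(n_obs) == "TEMP"] <- "n"

# Mean temperature per day
daily_mean <- aggregate(TEMP ~ YR + MO + DA, FUN = mean, data = df)

# Join on YR, MO, DA and keep only days with enough readings
daily_mean <- merge(daily_mean, n_obs)
complete_days <- daily_mean[daily_mean$n >= required, ]
```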