aggregate function produces daily instead of hourly mean

Question

I have a data.frame with 15 minute time steps in the first column and 16 more columns full of data. I want to get the hourly mean for each column. I am using aggregate and it works perfectly fine for 1 min data.

mydata <- list()
for(j in colnames(data_frame)){
  data_mean <- aggregate(data_frame[j], 
                        list(hour=cut(as.POSIXct(data_frame$TIME), "hour")),
                        mean, na.rm=TRUE)
  mydata[[j]] <- data_mean
}

When I use this same setup for a 15 min data set it gives me the daily mean instead of the hourly mean. Any idea why?

My data looks like this for 1 min data:

"TIME","Tair","RH"
2016-01-01 00:01:00,5.9,82
2016-01-01 00:02:00,5.9,82
2016-01-01 00:03:00,5.9,82
2016-01-01 00:04:00,5.89,82
2016-01-01 00:05:00,5.8,82
2016-01-01 00:06:00,5.8,82
2016-01-01 00:07:00,5.8,82
2016-01-01 00:08:00,5.8,82
2016-01-01 00:09:00,5.8,82
2016-01-01 00:10:00,5.8,82
2016-01-01 00:11:00,5.8,82
2016-01-01 00:12:00,5.8,82
2016-01-01 00:13:00,5.8,82
2016-01-01 00:14:00,5.8,82
2016-01-01 00:15:00,5.8,82
2016-01-01 00:16:00,5.8,82
2016-01-01 00:17:00,5.8,82
2016-01-01 00:18:00,5.8,82
2016-01-01 00:19:00,5.8,82
2016-01-01 00:20:00,5.8,82
2016-01-01 00:21:00,5.75,82
2016-01-01 00:22:00,5.78,82
2016-01-01 00:23:00,5.78,83
2016-01-01 00:24:00,5.8,82
2016-01-01 00:25:00,5.73,82
2016-01-01 00:26:00,5.7,82
2016-01-01 00:27:00,5.7,82
2016-01-01 00:28:00,5.7,82
2016-01-01 00:29:00,5.7,82
2016-01-01 00:30:00,5.7,82
2016-01-01 00:31:00,5.7,83
2016-01-01 00:32:00,5.76,83
2016-01-01 00:33:00,5.8,83
2016-01-01 00:34:00,5.8,82
2016-01-01 00:35:00,5.8,82
2016-01-01 00:36:00,5.8,83
2016-01-01 00:37:00,5.79,83
2016-01-01 00:38:00,5.7,82

And for 15 min data:

"TIME","Tair","RH"
2016-01-01 00:15:00,6.228442,80.40858
2016-01-01 00:30:00,6.121088,81.00000
2016-01-01 00:45:00,6.075000,NA
2016-01-01 01:00:00,5.951910,NA
2016-01-01 01:15:00,5.844144,NA
2016-01-01 01:30:00,5.802242,NA
2016-01-01 01:45:00,5.747619,NA
2016-01-01 02:00:00,5.742889,NA
2016-01-01 02:15:00,5.752584,81.12135
2016-01-01 02:30:00,5.677753,81.00000
2016-01-01 02:45:00,5.500224,81.61435
2016-01-01 03:00:00,5.225282,82.29797
2016-01-01 03:15:00,5.266441,83.00000
2016-01-01 03:30:00,5.200448,83.32584
2016-01-01 03:45:00,5.098876,84.00000
2016-01-01 04:00:00,5.081061,83.76894
2016-01-01 04:15:00,5.230769,82.88664
2016-01-01 04:30:00,5.300000,82.06742
2016-01-01 04:45:00,5.300000,NA
2016-01-01 05:00:00,5.399776,NA

bgoldst bgoldst · Accepted Answer · 2016-06-09T10:35:59

Your code works for me.

However, your loop is slightly wasteful in that it repeatedly computes the cut of the TIME column for every column of the data.frame. You could precompute it, but there's a better solution.

You can produce the same result but in a simpler, more conventional, and more useful form with a single call to aggregate():

aggregate(df1[names(df1)!='TIME'],list(hour=cut(df1$TIME,'hour')),mean,na.rm=T);
##         hour     Tair       RH
## 1 2016-01-01 5.786316 82.15789
aggregate(df15[names(df15)!='TIME'],list(hour=cut(df15$TIME,'hour')),mean,na.rm=T);
##                  hour     Tair       RH
## 1 2016-01-01 00:00:00 6.141510 80.70429
## 2 2016-01-01 01:00:00 5.836479      NaN
## 3 2016-01-01 02:00:00 5.668362 81.24523
## 4 2016-01-01 03:00:00 5.197762 83.15595
## 5 2016-01-01 04:00:00 5.227957 82.90767
## 6 2016-01-01 05:00:00 5.399776      NaN

Data

df1 <- data.frame(TIME=as.POSIXct(c('2016-01-01 00:01:00','2016-01-01 00:02:00',
'2016-01-01 00:03:00','2016-01-01 00:04:00','2016-01-01 00:05:00','2016-01-01 00:06:00',
'2016-01-01 00:07:00','2016-01-01 00:08:00','2016-01-01 00:09:00','2016-01-01 00:10:00',
'2016-01-01 00:11:00','2016-01-01 00:12:00','2016-01-01 00:13:00','2016-01-01 00:14:00',
'2016-01-01 00:15:00','2016-01-01 00:16:00','2016-01-01 00:17:00','2016-01-01 00:18:00',
'2016-01-01 00:19:00','2016-01-01 00:20:00','2016-01-01 00:21:00','2016-01-01 00:22:00',
'2016-01-01 00:23:00','2016-01-01 00:24:00','2016-01-01 00:25:00','2016-01-01 00:26:00',
'2016-01-01 00:27:00','2016-01-01 00:28:00','2016-01-01 00:29:00','2016-01-01 00:30:00',
'2016-01-01 00:31:00','2016-01-01 00:32:00','2016-01-01 00:33:00','2016-01-01 00:34:00',
'2016-01-01 00:35:00','2016-01-01 00:36:00','2016-01-01 00:37:00','2016-01-01 00:38:00')),
Tair=c(5.9,5.9,5.9,5.89,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.75,
5.78,5.78,5.8,5.73,5.7,5.7,5.7,5.7,5.7,5.7,5.76,5.8,5.8,5.8,5.8,5.79,5.7),RH=c(82L,82L,82L,
82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,83L,82L,82L,82L,
82L,82L,82L,82L,83L,83L,83L,82L,82L,83L,83L,82L));

df15 <- data.frame(TIME=as.POSIXct(c('2016-01-01 00:15:00','2016-01-01 00:30:00',
'2016-01-01 00:45:00','2016-01-01 01:00:00','2016-01-01 01:15:00','2016-01-01 01:30:00',
'2016-01-01 01:45:00','2016-01-01 02:00:00','2016-01-01 02:15:00','2016-01-01 02:30:00',
'2016-01-01 02:45:00','2016-01-01 03:00:00','2016-01-01 03:15:00','2016-01-01 03:30:00',
'2016-01-01 03:45:00','2016-01-01 04:00:00','2016-01-01 04:15:00','2016-01-01 04:30:00',
'2016-01-01 04:45:00','2016-01-01 05:00:00')),Tair=c(6.228442,6.121088,6.075,5.95191,
5.844144,5.802242,5.747619,5.742889,5.752584,5.677753,5.500224,5.225282,5.266441,5.200448,
5.098876,5.081061,5.230769,5.3,5.3,5.399776),RH=c(80.40858,81,NA,NA,NA,NA,NA,NA,81.12135,81,
81.61435,82.29797,83,83.32584,84,83.76894,82.88664,82.06742,NA,NA));

aggregate function produces daily instead of hourly mean

1 Answers