1
votes

I am stuck in R. I need to aggregate per-minute data into 30-minute intervals. My data frame looks like this:

    Date        Time       Power
    2008-10-11  23:57:00   0.376
    2008-10-11  23:58:00   0.374
    2008-10-11  23:59:00   0.374
    2008-10-12  0:00:00    0.334
    2008-10-12  0:01:00    0.330
...
    2008-12-13  19:24:00   1.390
    2008-12-13  19:25:00   1.370
    2008-12-13  19:26:00   1.368
    2008-12-13  19:27:00   1.362
    2008-12-13  19:28:00   1.352
    2008-12-13  19:29:00   1.360

...

Basically, I have 500,979 rows of data for 2008, with one Power value per minute. To do time series analysis, I need to aggregate my data into 30-minute intervals, which would give me around 17,520 rows in the end. My questions are:

  1. Is it possible to plot a time series when some values in the Power column are missing?

  2. How can I aggregate the Power values into 30-minute intervals (e.g., 0:00:00 to 0:29:00; 0:30:00 to 0:59:00)?

Thank you very much!

2 Answers

1
votes

With magrittr and dplyr, you can generate a new column holding a grouping variable that gives which time unit each measurement falls in, then summarise to calculate whatever aggregate metric you want. (Note that %<>% saves the result of the pipe back into the original variable.)

I am not sure what format your time is in already. When I read it in, it is just a character string, so converting it to numeric hours takes a bit of doing. If it is already in some other format for you, that step will likely be easier.

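library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

df %<>%
  mutate(cleanTime =
           # split "H:MM:SS" into parts and convert to decimal hours
           strsplit(as.character(Time), ":") %>%
           sapply(function(x){
             x <- as.numeric(x)
             x[1] + x[2]/60 + x[3]/(60*60)
           })
         # floor to the start of the half hour (0.0, 0.5, 1.0, ...)
         , roundTime = floor(cleanTime * 2)/2
         )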

Then, group by the generated variable, and take whatever aggregate stats you want:

df %>%
  group_by(Date, roundTime) %>%
  summarise(avgPower = mean(Power)
            , sumPower = sum(Power))

gives:

        Date roundTime  avgPower sumPower
      <date>     <dbl>     <dbl>    <dbl>
1 2008-10-11      23.5 0.3746667    1.124
2 2008-10-12       0.0 0.3320000    0.664
3 2008-12-13      19.0 1.3670000    8.202
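
On the missing-values part of the question: mean() and sum() return NA for any group that contains an NA, so the summary above would give NA for every half hour with a gap. Below is a minimal sketch of one way to handle that, assuming missing readings are stored as NA and should simply be skipped. The NA in the sample data and the names with_na and nMissing are made up for illustration, and roundTime is filled in by hand to keep the example short.

```r
library(dplyr)

# Sample rows from the question; the NA is introduced purely for illustration
df <- data.frame(
  Date      = c("2008-10-11", "2008-10-11", "2008-10-11", "2008-10-12", "2008-10-12"),
  roundTime = c(23.5, 23.5, 23.5, 0, 0),
  Power     = c(0.376, NA, 0.374, 0.334, 0.330),
  stringsAsFactors = FALSE
)

with_na <- df %>%
  group_by(Date, roundTime) %>%
  summarise(avgPower = mean(Power, na.rm = TRUE),  # skip missing readings
            nMissing = sum(is.na(Power)),          # count how many were skipped
            .groups = "drop")

print(with_na)
```

Keeping a count such as nMissing makes it easy to spot half hours whose average rests on only a few readings.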
1
votes

With dplyr you can do the following (replace mean with whatever aggregation function you need):

df %>%
  mutate(DateTime = as.POSIXct(paste(Date, Time))) %>%
  group_by(DateTime = cut(DateTime, breaks="30 min")) %>%
  summarize(Power = mean(Power))

This produces the following output:

             DateTime  Power
               <fctr>  <dbl>
1 2008-10-11 23:57:00 0.3576
2 2008-12-13 18:57:00 1.3760
3 2008-12-13 19:27:00 1.3580
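
Note that in the output above the bins start at the first timestamp (23:57), not on the hour or half hour as the question asks. A minimal sketch of one way to align the bins to :00 and :30 is to floor each timestamp to a multiple of 1800 seconds. It assumes the times can be treated as UTC (so no daylight-saving shifts), and the names HalfHour and half_hourly are made up for illustration.

```r
library(dplyr)

# First five sample rows from the question
df <- data.frame(
  Date  = c("2008-10-11", "2008-10-11", "2008-10-11", "2008-10-12", "2008-10-12"),
  Time  = c("23:57:00", "23:58:00", "23:59:00", "0:00:00", "0:01:00"),
  Power = c(0.376, 0.374, 0.374, 0.334, 0.330),
  stringsAsFactors = FALSE
)

half_hourly <- df %>%
  mutate(DateTime = as.POSIXct(paste(Date, Time), tz = "UTC"),
         # 1800 seconds = 30 minutes; flooring gives the start of each bin
         HalfHour = as.POSIXct(floor(as.numeric(DateTime) / 1800) * 1800,
                               origin = "1970-01-01", tz = "UTC")) %>%
  group_by(HalfHour) %>%
  summarise(Power = mean(Power))

print(half_hourly)
```

With these five rows, the three readings before midnight fall into the 23:30 bin and the other two into the 00:00 bin of the next day.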