2 votes

I'm having trouble calculating the average temperature in 10-minute intervals within a 24-hour profile vector.

I have a time series stored in a data frame in proper POSIX format. The only problem is that the data are irregularly spaced (10 ±3 min).

I know how to average the values by hour, day, month or year, but I need to get averages for a profile of 24, 72 or 168 hours.

For example, for the 168-hour profile I want to average every Monday in the observation period at 00:00:00, then at 00:10:00, 00:20:00 etc., then every Tuesday, Wednesday and so on.

So my averaged data must fit into a regular 24/72/168-hour vector.

For 24 hours the vector will be defined like this:

seq(ISOdatetime(2001,2,3,0,0,0), ISOdatetime(2001,2,4,0,0,0), by=(60*5))

This results in a regular 24-hour vector (source of this solution: here):

  [1] "2001-02-03 00:00:00 PST" "2001-02-03 00:05:00 PST"
  [3] "2001-02-03 00:10:00 PST" "2001-02-03 00:15:00 PST"
  [5] "2001-02-03 00:20:00 PST" "2001-02-03 00:25:00 PST"
  [7] "2001-02-03 00:30:00 PST" "2001-02-03 00:35:00 PST"
  [9] "2001-02-03 00:40:00 PST" "2001-02-03 00:45:00 PST"

The problem is that the timestamps of my data shift with each day, as you can see from the samples below. On 2016-09-01 the first reading (supposed to be at 00:00:00) is at 00:01:00; the next day it is at 00:04:00, the next day at 00:07:00, and so on.

I tried xts and zoo with no success, as their aggregation seemed to be limited to hourly steps and I need to define the interval in minutes.

The multiple answers I have found deal with averaging a time series over a continuous set of data (example1, example2).

Unfortunately I couldn't find an answer to my problem.

Structure of the sample data frame:

'data.frame':   9490 obs. of  2 variables:
 $ Date_Time_Stamp : POSIXct, format: "2016-09-01 00:01:00" "2016-09-01 00:11:00" "2016-09-01 00:22:00" "2016-09-01 00:32:00" ...
 $ Signal_Raw_Value: num  778 694 592 523 567 ...

My data look like this (head):

Date_Time_Stamp Signal_Raw_Value
1 2016-09-01 00:01:00           777.51
2 2016-09-01 00:11:00           694.38
3 2016-09-01 00:22:00           591.69
4 2016-09-01 00:32:00           523.23
5 2016-09-01 00:42:00           567.24
6 2016-09-01 00:52:00           547.68

Tail:

Date_Time_Stamp Signal_Raw_Value
9485 2016-11-06 23:02:00           660.15
9486 2016-11-06 23:12:00           635.70
9487 2016-11-06 23:22:00           498.78
9488 2016-11-06 23:32:00           415.65
9489 2016-11-06 23:42:00           425.43
9490 2016-11-06 23:53:00           440.10

First hour of 2016-09-01:

Date_Time_Stamp Signal_Raw_Value
1    2016-09-01 00:01:00           777.51
2    2016-09-01 00:11:00           694.38
3    2016-09-01 00:22:00           591.69
4    2016-09-01 00:32:00           523.23
5    2016-09-01 00:42:00           567.24
6    2016-09-01 00:52:00           547.68
7    2016-09-01 01:02:00           562.35

First hour of the next day (2016-09-02):

143  2016-09-02 00:04:00           557.46
144  2016-09-02 00:14:00           557.46
145  2016-09-02 00:24:00           562.35
146  2016-09-02 00:35:00           552.57
147  2016-09-02 00:45:00           503.67
148  2016-09-02 00:55:00           484.11
149  2016-09-02 01:05:00           454.77

First hour of the third day in a row (2016-09-03):

285  2016-09-03 00:07:00           655.26
286  2016-09-03 00:17:00           537.90
287  2016-09-03 00:27:00           464.55
288  2016-09-03 00:38:00           454.77
289  2016-09-03 00:48:00           425.43
290  2016-09-03 00:58:00           420.54
291  2016-09-03 01:08:00           400.98

And a fourth day:

426  2016-09-04 00:00:00           865.53
427  2016-09-04 00:10:00           723.72
428  2016-09-04 00:20:00           621.03
429  2016-09-04 00:30:00           562.35
430  2016-09-04 00:40:00           493.89
431  2016-09-04 00:51:00           459.66
432  2016-09-04 01:01:00           435.21

After processing the raw signal values I need to produce something like this: 24 hrs profile, and for 168 hrs: 168 hrs profile.

Thank you!

One way of solving this problem is to use the seq function to create an array of 10-minute intervals from the start of your data until the end. Use the cut function with your newly created array as the breaks parameter. Now you can aggregate by the newly defined cut values. – Dave2e
I have had partial success with the following (elegant?) solution: aggregate(qxts, format(index(qxts), "%H"), mean). This produces hourly averages for every hour. I'm stuck at changing the interval to 10 or 15 minutes. Any ideas how to move on? – Patryk
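
Building on those two comments, one way to move from hourly to 10-minute keys could be to build the grouping label from the hour plus the minute rounded down to a multiple of 10. A rough sketch, assuming qxts is the xts/zoo series mentioned in the comment above:

# group key "HH:MM" where MM is the start of the 10-minute slot (00, 10, ..., 50)
mins <- as.integer(format(index(qxts), "%M"))
slot <- sprintf("%s:%02d", format(index(qxts), "%H"), (mins %/% 10) * 10)
# average every observation that falls into the same 10-minute-of-day slot
aggregate(qxts, slot, mean)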

1 Answer

1 vote

Since you are still struggling with this problem, try this:

#Create sample data
#create a sequence of random times (about 10 minutes apart)
rtime <- as.integer(rnorm(1000, 10, 2))
Date_Time_Stamp <- as.POSIXct("2016-09-01")
Date_Time_Stamp <- Date_Time_Stamp + cumsum(rtime) * 60
Signal_Raw_Value <- rnorm(1000, 600, 20)
df <- data.frame(Date_Time_Stamp, Signal_Raw_Value)
#End of sample data creation

#Calculate the number of minutes since midnight
df$minutes <- as.integer(format(df$Date_Time_Stamp, "%H")) * 60 + as.integer(format(df$Date_Time_Stamp, "%M"))
#break each day into 144 ten-minute intervals
df$mybreaks <- cut(df$minutes, breaks = seq(0, 1440, 10), include.lowest = TRUE)

#Using dplyr
library(dplyr)
#find the mean of each group
summarise(group_by(df, mybreaks), mean(Signal_Raw_Value))
#find the number of elements in each group
summarise(group_by(df, mybreaks), n())

Your problem statement was not very clear. Here is a solution which breaks each day into 144 ten-minute segments (1440 minutes per day) and averages the data across the entire dataset into those 144 intervals.
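
The same grouping idea could be extended to the 72- or 168-hour profiles by adding a second grouping column. A sketch for the weekly (168-hour) case, reusing the df built above (the weekday column name is illustrative):

#add the weekday of each observation, then average per weekday and 10-minute slot
df$weekday <- weekdays(df$Date_Time_Stamp)
summarise(group_by(df, weekday, mybreaks), mean(Signal_Raw_Value))

Note that weekdays() returns plain character labels, so the result sorts alphabetically unless the column is converted to an ordered factor (e.g. with levels Monday through Sunday).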