0
votes

I have a list of data frames, with each data frame corresponding to a date and the rows of each data frame corresponding to hourly periods in the day. I need to apply a rolling function over equivalent time stamps for every day.

For example, for a rolling 5 period window I would want to a apply a function to every 01:00:00, 02:00:00, and so on, time stamp across the first 5 days rolling until end = length(list)-5, with each function application resulting in one value. The result would be a vector of values for each time stamp of length end.

I am considering creating a new list of data frames where each data frame will correspond to a time stamp and the rows correspond to days, in which case I want to know the best way to convert my current list of data frames to this new format. However, if there are other more efficient ways I am open to other suggestions.

Edit: I've tried to include an example data set and result below, but I don't know how to format the data properly

If, for example, a list contains 4 data frames with the following format. With Col1 being one variable and Col2 another variable

Date    Time    Col1    Col2
1   2014-08-12  09:00:00    -0.0001079389   0.0002159128
2   2014-08-12  09:30:00    -0.0002158895   0.0002159361
3   2014-08-12  10:00:00    -0.0001079739   0.0008642109
4   2014-08-12  10:30:00    -0.0003241666   0.0003242717
5   2014-08-12  11:00:00    -0.0006482282   0.0005405113
6   2014-08-12  11:30:00    -0.0010794474   0.0002160294
7   2014-08-12  12:00:00    -0.0002157963   0.0002158429
8   2014-08-12  12:30:00    -0.0001079040   0.0009716600
9   2014-08-12  13:00:00    -0.0005397528   0.0002159827
10  2014-08-12  13:30:00    0.0000000000    0.0005398111
11  2014-08-12  14:00:00    0.0000000000    0.0003238517
12  2014-08-12  14:30:00    -0.0002158895   0.0003239216
13  2014-08-12  15:00:00    -0.0002159128   0.0001079739
14  2014-08-12  15:30:00    0.0000000000    0.0008642109
15  2014-08-12  16:00:00    -0.0003240966   0.0004322922
16  2014-08-12  16:30:00    -0.0003241666   0.0003242717
17  2014-08-12  17:00:00    -0.0008642109   0.0001080789
18  2014-08-12  17:30:00    -0.0001079739   0.0004320121
19  2014-08-12  18:00:00    -0.0005396945   0.0001079739

If the rolling window of the function is 2, using a simple function such as multiplication, the equivalent rows in the first two data frames would be multiplied, rolling forward until the equivalent rows in the 3rd and 4th data frames were multiplied.

Res1 = data.frame(matrix(NA,ncol=3,nrow=19))
rownames(Res1) = Example[[1]][,2]
colnames(Res1) = c("2014-08-13","2014-08-14","2014-08-15")
Res1[,1]= Example[[1]][,3]*Example[[2]][,3]
Res1[,2]= Example[[2]][,3]*Example[[3]][,3]
Res1[,3]= Example[[3]][,3]*Example[[4]][,3]

This is a simple example, as the function I want to apply would roll over the equivalent rows in 5 data frames at a time rolling forward. Below, the function is applied to just the Col1 variable to simplify, but I would do it for all variables separately.

The result should look something like the below, where the first column name is the end of each rolling window, hence 4 data frames, equivalent to 4 dates becomes 3 columns

row.names   2014-08-13  2014-08-14  2014-08-15
1   09:00:00    0.000000e+00    0.000000e+00    4.612407e-08
2   09:30:00    6.987091e-08    0.000000e+00    0.000000e+00
3   10:00:00    6.987846e-08    6.954805e-08    4.614140e-08
4   10:30:00    4.889919e-07    1.621737e-07    2.307567e-08
5   11:00:00    0.000000e+00    0.000000e+00    0.000000e+00
6   11:30:00    6.977682e-07    4.869397e-07    8.083873e-08
7   12:00:00    2.325015e-08    5.794719e-08    1.154280e-07
8   12:30:00    0.000000e+00    0.000000e+00    0.000000e+00
9   13:00:00    5.816615e-08    0.000000e+00    0.000000e+00
10  13:30:00    0.000000e+00    9.270554e-08    6.921964e-08
11  14:00:00    0.000000e+00    0.000000e+00    0.000000e+00
12  14:30:00    0.000000e+00    0.000000e+00    9.224832e-08
13  15:00:00    1.627863e-07    1.621387e-07    2.305836e-08
14  15:30:00    0.000000e+00    0.000000e+00    0.000000e+00
15  16:00:00    6.972068e-08    0.000000e+00    0.000000e+00
16  16:30:00    1.045980e-07    3.122420e-07    4.148280e-07
17  17:00:00    0.000000e+00    0.000000e+00    2.419698e-07
18  17:30:00    0.000000e+00    0.000000e+00    2.303854e-07
19  18:00:00    5.810978e-08    1.040920e-07    6.218404e-07
1
You may not need a list of dataframes, it may be easier to have01:00:00, 2:00:00 as a new grouping variable by using any of the aggregating functions i.e. aggregate, data.table, dplyr, and within the group, may be use rollapply from library(zoo). It is better to provide a small reproducible example with an expected output based on that example for testing.akrun
Please provide a reproducible example and desired output.David Arenburg
@akrun thank you for your advice, I've included an example now if that helpsCeleste
@DavidArenburg I've tried to include an example and desired output, I hope this makes it clearerCeleste

1 Answers

0
votes

For any others who find themselves at this link, I found my answer at this link

Precisely, I did the following, after which you can apply a rolling function to the newly formed data

DataSample = lapply(DataList, function(x) with(x,x[x$Col==Idx,,drop=FALSE]))

DataSample = data.frame(matrix(unlist(DataSample),nrow=noDays,byrow=T),stringsAsFactors=FALSE)