0
votes

I have a ton of data that I'm feeding through R in order to generate averages. The relevant data involves dates and temperature readings. There are often multiple temperature readings for a single day. The dates span approximately 6 months.

Two of the criteria the researchers requested were described as follows:

Average Weekly – 7 day rolling average (not calendar week)
Average Max – 7 day rolling max

So, if my data started on 1/1/13, I'd average ALL the temperature readings between 1/1/13 and 1/7/13, and then do the same thing for 1/8/13 - 1/14/13 and so on. I've been told elsewhere on Stack that this is actually called an "average-by-week-of-year", though I'll admit I don't quite get how it's not a moving average. I've done some research, but total newb that I am, I've struggled to understand how to approach this problem.

For the visual among you, this is essentially the sort of data I'm dealing with (the actual data.frame looks a lot different (see the dput head below) and is several thousand records long, but these are the proper names of the two relevant columns):

DATE    |    TEMP
-----------------
1/2/13     34.4
1/2/13     36.4
1/2/13     34.3
1/4/13     45.6
1/4/13     33.5
1/5/13     45.2
1/6/13     53.9
1/7/13     34.6
1/7/13     36.2
1/8/13     22.4
1/9/13     30.8
1/9/13     33.2

I've been looking at the xts library:

xts(x = NULL,
    order.by = index(x),
    frequency = NULL,
    unique = TRUE,
    tzone = Sys.getenv("TZ"),
    ...)

This looks promising, but I can't seem to quite figure it out, and the documentation isn't helping much.

xts(x = mydf, order.by = DATE(x), frequency = 7...?

Ideas? Thank you.

Here's a small sample of the dput head info:

structure(list(RECID = 579:584, SITEID = c(101L, 101L, 101L, 
101L, 101L, 101L), MONTH = c(6L, 6L, 6L, 6L, 6L, 6L), DAY = c(7L, 
7L, 7L, 7L, 7L, 7L), DATE = structure(c(34L, 34L, 34L, 34L, 34L, 
34L), .Label = c("10/1/2013", "10/10/2013", "10/11/2013", "10/12/2013", 
"10/2/2013", "10/3/2013", "10/4/2013", "10/5/2013", "10/6/2013", 
"10/7/2013", "10/8/2013", "10/9/2013", "6/10/2013", "6/11/2013","9/9/2013"), class = "factor"), TIMESTAMP = structure(784:789, .Label = c("10/1/2013 0:00", 
"10/1/2013 1:00", "10/1/2013 10:00", "10/1/2013 11:00", "10/1/2013 12:00", 
"10/1/2013 13:00", "10/1/2013 14:00", "10/1/2013 15:00", "10/1/2013 16:00", 
"10/1/2013 17:00", "10/1/2013 18:00", "10/1/2013 19:00", "10/1/2013 2:00"), class = "factor"), TEMP = c(23.376, 23.376, 23.833, 24.146, 
24.219, 24.05), X.C = c(NA, NA, NA, NA, NA, NA)), .Names = c("RECID", 
"SITEID", "MONTH", "DAY", "DATE", "TIMESTAMP", "TEMP", "X.C"), row.names = c(NA, 
6L), class = "data.frame") 
It would help a lot if we could tell the column classes of your data.frame. Could you post dput(head(mydf)), or at least dput(head(mydf[, c("DATE", "TEMP")]))? - Gregor Thomas
@shujaa - done. :) As I said, the actual data is different from the sample data I posted, as I only really need to bother with the temp and date, but if the dput head helps, there it is. - TheNovice
It definitely helps, mostly because I can see that your date is a factor, rather than something like a Date object. - Gregor Thomas
Got it, still learning R. I appreciate it. - TheNovice
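
For reference, a minimal sketch of turning a factor of date strings into Date objects. The `date_factor` vector is a hypothetical stand-in for the DATE column, using the four-digit-year strings visible in the dput output. Those strings need `%Y`; the two-digit `%y` only suits values like 1/2/13 and would likely misread the year here.

```r
# Hypothetical stand-in for the DATE column: a factor of m/d/YYYY strings
date_factor <- factor(c("6/7/2013", "6/7/2013", "10/1/2013"))

# Go through character explicitly, and use %Y because the years have four digits
dates <- as.Date(as.character(date_factor), format = "%m/%d/%Y")

class(dates)  # should now be "Date" rather than "factor"
dates
```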

1 Answer

3
votes
sampledata = ' 
    DATE       TEMP
    1/2/13     34.4
    1/2/13     36.4
    1/2/13     34.3
    1/4/13     45.6
    1/4/13     33.5
    1/5/13     45.2
    1/6/13     53.9
    1/7/13     34.6
    1/7/13     36.2
    1/8/13     22.4
    1/9/13     30.8
    1/9/13     33.2
'

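# Read the sample text into a data frame
ex1 = read.table(text = sampledata, header = TRUE)

library(xts)

# Parse the dates (two-digit years in this sample), then build an xts series ordered by date
ex1$DATE = as.Date(ex1$DATE, format = '%m/%d/%y')
ex2 = xts(ex1$TEMP, order.by = ex1$DATE)

# Mean of all readings that fall within each week
xts::apply.weekly(ex2, mean)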

It doesn't sound like a moving average to me, since the windows you describe (1/1/13 to 1/7/13, then the next 7 days, and so on) don't overlap.
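
As far as I know, apply.weekly() groups by calendar week, so if the 7-day windows have to start at the first date in the data, or if a true day-by-day rolling window is wanted, something like the base R sketch below might be closer. It is only an illustration under those assumptions. The names `ex`, `block_stats` and `roll` are made up for the example, and the dates are typed in ISO form to skip the parsing step.

```r
# Sample readings from the question
ex <- data.frame(
  DATE = as.Date(c("2013-01-02", "2013-01-02", "2013-01-02", "2013-01-04",
                   "2013-01-04", "2013-01-05", "2013-01-06", "2013-01-07",
                   "2013-01-07", "2013-01-08", "2013-01-09", "2013-01-09")),
  TEMP = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2, 53.9, 34.6, 36.2, 22.4, 30.8, 33.2)
)

# --- Non-overlapping 7-day blocks anchored at the first date ---
# Block 0 covers days 0-6 after the first date, block 1 covers days 7-13, ...
days_in <- as.integer(ex$DATE - min(ex$DATE))
block <- days_in %/% 7

block_stats <- data.frame(
  BLOCK_START = min(ex$DATE) + 7 * sort(unique(block)),
  MEAN_TEMP   = as.vector(tapply(ex$TEMP, block, mean)),
  MAX_TEMP    = as.vector(tapply(ex$TEMP, block, max))
)
block_stats

# --- Rolling window: for each day, all readings from that day and the 6 before it ---
days <- seq(min(ex$DATE), max(ex$DATE), by = "day")
in_window <- function(i) ex$TEMP[ex$DATE > days[i] - 7 & ex$DATE <= days[i]]

roll <- data.frame(
  DAY     = days,
  MEAN_7D = sapply(seq_along(days), function(i) mean(in_window(i))),
  MAX_7D  = sapply(seq_along(days), function(i) max(in_window(i)))
)
roll
```

The block version gives one row per 7-day block, while the rolling version gives one row per day. In the rolling version, a day with no readings at all in its window would produce NaN for the mean and -Inf (with a warning) for the max, so longer gaps in the real data would need handling.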