
I am trying to figure out how to smooth data that I've averaged into "day of year" data. I have simplified the problem in the example code below to the minimum possible. In my actual script I have calculated a data frame that has a "day of year" index column from 1 to 365 and a second column that is the average of a specific measurement for that day of the year over many years. I want to smooth the data further by calculating a centered running average (over, for example, 11 days) on this data.

I am having a hard time figuring out how to efficiently handle the "calendar break": at the beginning and end of the "day of year" data, the window needs to wrap from DoY = 365 back to DoY = 1. How do I calculate the running average when the center of the window runs from day = 360 to day = 5?

I started to kluge together a solution but quickly arrived at less than elegant code. Is there an efficient means to do this?

The code below builds a sample data frame with trial data.

# A simulated daily time series average
ann_data <- data.frame(day = 1:365,
                       data = sin(pi * (1:365) / 182 + 90) + rnorm(365) / 10)
plot(ann_data)

ann_data_smooth <- ?
If the answer addresses your question, please accept it; doing so not only provides a little perk to the answerer with some points, but also provides some closure for readers with similar questions. Though you can only accept one answer (when more than one are offered), you have the option to up-vote as many as you think are helpful. (If there are still issues, you will likely need to edit your question with further details.) - r2evans

1 Answer


If you already know how to do the running average, just append a copy of the data to the end of itself, compute the running average, then trim the result back to the original length. E.g.

yearDataLength <- length(yearData)
yearData <- c(yearData, yearData)
runningAve <- running_average_function(yearData)[1:yearDataLength]

running_average_function stands for whatever function you're currently using. The 1:yearDataLength part just limits the range you keep after applying the function. It's fairly common to do this when smoothing cyclical data like this. If you need the start to line up with the end as well, then rather than taking 1:yearDataLength, take the middle 50% of the data rather than the first 50%.

EDIT: After re-reading, I see you are concerned about the beginning as well. With the above approach, that means keeping a shifted range rather than 1:yearDataLength. For a centered 11-day window, the first position with a full window is 6, so you could take the data from 6:(yearDataLength+5) (which is just 6:370). This gives the calculation the room it needs on both sides of the calendar break.
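As a minimal sketch of the copy-and-trim idea on a plain vector, assuming an 11-day centered window: the body of running_average_function below is only a stand-in built on base R's stats::filter, and the names window, half, doubled, idx, dayOfYear and smoothed are hypothetical. With a half-width of 5, the first position in the doubled series that has a full window is 6, so the range kept here is 6:370.

```r
set.seed(1)
yearData <- sin(pi * (1:365) / 182 + 90) + rnorm(365) / 10

window <- 11               # assumed window width (must be odd)
half <- (window - 1) / 2   # 5 days on either side of the center

# Stand-in for "whatever function you're currently using":
# a centered moving average that returns NA where the window is incomplete.
running_average_function <- function(x, k = window) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

yearDataLength <- length(yearData)
doubled <- c(yearData, yearData)

# Keep one full cycle, starting at the first position with a complete window.
idx <- (half + 1):(yearDataLength + half)
runningAve <- running_average_function(doubled)[idx]

# Map each kept position back to its day of year, then restore calendar order.
dayOfYear <- ((idx - 1) %% yearDataLength) + 1
smoothed <- runningAve[order(dayOfYear)]
```

Here smoothed[1] is the average over days 361 to 365 and 1 to 6, so the window crosses the calendar break without any special-case code.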

Your index should still be intact from the copy, so once you have that range, just sort the rows on the index column (in R, order() does this for a data frame) to get the data back in the correct order.
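The same steps applied to the question's data frame might look like the sketch below. It assumes an 11-day centered window and uses stats::filter as the moving average; the names n, half, doubled, trimmed and wrapped are made up for illustration. The last lines are a cross-check rather than part of the approach described above: stats::filter also takes circular = TRUE, which wraps the series around its ends, so it should give the same values.

```r
set.seed(42)
ann_data <- data.frame(day = 1:365,
                       data = sin(pi * (1:365) / 182 + 90) + rnorm(365) / 10)

n <- nrow(ann_data)
half <- 5   # assumed half-width of an 11-day centered window

# Stack the year on top of itself; the day column is copied along with the data.
doubled <- rbind(ann_data, ann_data)
doubled$smooth <- as.numeric(
  stats::filter(doubled$data, rep(1 / 11, 11), sides = 2)
)

# Keep one full cycle of rows that all have a complete window ...
trimmed <- doubled[(half + 1):(n + half), ]

# ... and put them back in calendar order using the day index.
ann_data_smooth <- trimmed[order(trimmed$day), ]
rownames(ann_data_smooth) <- NULL

# Cross-check: let stats::filter do the wrap-around itself.
wrapped <- as.numeric(
  stats::filter(ann_data$data, rep(1 / 11, 11), sides = 2, circular = TRUE)
)
isTRUE(all.equal(ann_data_smooth$smooth, wrapped))
```

If the circular option suits your running-average function, it avoids the copy entirely; the copy-and-trim version has the advantage of working with any smoother.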