
I'm attempting to use the timeSeries package in R to aggregate data from a timeSeries object. I wrote some basic sample code for reference:

library(timeSeries)
library(timeDate)
BD <- as.timeDate(paste("2015-01-01", "00:00:00")) # Creates a timeDate.
ED <- as.timeDate(paste("2015-01-31", "23:59:00")) # Creates a timeDate.
DR <- seq(BD, ED, by = 60) # Creates a sequence by minutes in between the 2 dates.

data <- runif(length(DR), 0, 100) # Creating random sample data.

x <- timeSeries(data, DR) # Initializing a timeSeries object from data and DR.
colnames(x) <- "Data" # Renaming column.

by <- timeSequence(BD, ED, by = "hour") # Setting the sequence to be aggregated on.
x.agg <- timeSeries::aggregate(x, by, sum) # Aggregating on that sequence.

After running the code, head(x.agg) looks like this:

> head(x.agg)
GMT
                          Data
2015-01-01 00:00:00   29.71688
2015-01-01 01:00:00 3129.84860
2015-01-01 02:00:00 2398.92438
2015-01-01 03:00:00 3134.78608
2015-01-01 04:00:00 2743.79543
2015-01-01 05:00:00 3159.38404

Notice that the first value, for "2015-01-01 00:00:00", is much smaller than the other hourly sums. In fact, it is exactly the first data point of the original sample:

> head(x)
GMT
                        Data
2015-01-01 00:00:00 29.71688
2015-01-01 00:01:00 38.73175
2015-01-01 00:02:00  1.01945
2015-01-01 00:03:00 89.64938
2015-01-01 00:04:00 34.23608
2015-01-01 00:05:00 90.48571

Investigating where the sums come from, I found that the aggregate for the "2015-01-01 01:00:00" hour is the sum of all observations between "2015-01-01 00:01:00" and "2015-01-01 01:00:00" (inclusive), as shown here:

> sum(x[2:61,])
[1] 3129.849

> x.agg[2,]
GMT
                        Data
2015-01-01 01:00:00 3129.849

What I need is for the aggregation to sum all the data points within the "00:00:00" hour. That is, the aggregate for "2015-01-01 00:00:00" should be equal to:

> sum(x[1:60,])
[1] 3065.829

That sum includes the first minute of that hour and excludes the first minute of the next hour, which is the opposite of what aggregate is doing. The aggregation function seems to treat the first minute of an hour as not being part of that hour, which I find very strange. Any help would be greatly appreciated.
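For reference, the bucketing asked for above (each hour owning its own minutes 00 through 59) can be sketched in base R alone, without timeSeries. This is only an illustration of the desired result, not a fix for timeSeries::aggregate; the names stamps, values, hour_start and hourly are made up for the example, and the data covers just three hours.

```r
# Hypothetical minute-level sample: three hours of data in GMT.
set.seed(1)
stamps <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "GMT"),
              as.POSIXct("2015-01-01 02:59:00", tz = "GMT"),
              by = 60)
values <- runif(length(stamps), 0, 100)

# Label every observation with the start of the clock hour it falls in,
# so 00:00:00 through 00:59:00 all map to "2015-01-01 00:00:00".
hour_start <- format(stamps, "%Y-%m-%d %H:00:00", tz = "GMT")

# Sum within each hour label.
hourly <- tapply(values, hour_start, sum)
hourly
```

Here the first hourly value equals sum(values[1:60]), which is the behaviour the question is after.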

The results you obtained are expected. Pay attention to the documentation in ?aggregate. Something like aggregate.ts(x, nfrequency = 1/60) yields better results, but it still misses your objective. - DJJ
I definitely see that the way aggregate works is simply inherent to the function. I'm hoping to find a workaround, as I would like to keep using the timeSeries package, and having my data aggregated differently from how I need it throws a wrench in that endeavor. - giraffehere

1 Answer


I found an answer to my own question, and it involves modifying the source code of the timeSeries::aggregate() function. To get the behaviour described in my question, open the source code of the timeSeries package, which is most easily obtained by downloading the tar.gz file from CRAN here:

https://cran.r-project.org/web/packages/timeSeries/index.html

Extract the archive and go to the R folder inside the timeSeries folder. Find the "stats-aggregate.R" file and open it. In it, you'll see the .aggregate.timeSeries function. To get the result I wanted, the +1's need to be removed from lines 80 and 81 of that function. After doing so, the aggregate function aggregates the way I wanted it to.
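To see why removing the +1's changes which minutes land in which hour, here is a small base R sketch. It assumes the two +1's sat on the findInterval break points and on the resulting index, which is how the modified lines in the function below read; pos and brk are made-up stand-ins for x@positions and the numeric 'by' values.

```r
# Two hours of minute timestamps, in seconds since the first observation.
pos <- seq(0, by = 60, length.out = 120)
# Hourly break points at 00:00:00 and 01:00:00, in seconds.
brk <- c(0, 3600)

# Assumed original indexing (with both +1's): a break point closes the
# interval it labels, so 00:00:00 sits alone in bucket 1.
idx_orig <- findInterval(pos, brk + 1) + 1

# Indexing with both +1's removed: a break point opens the interval
# it labels, so bucket 1 holds 00:00:00 through 00:59:00.
idx_mod <- findInterval(pos, brk)

table(idx_orig)  # bucket sizes 1, 60, 59
table(idx_mod)   # bucket sizes 60, 60
```

With the +1's, the single observation at 00:00:00 forms its own bucket and the 01:00:00 bucket covers 00:01:00 through 01:00:00, which matches the output in the question. The third bucket (index 3) has no matching break point, which is the case the is.na(INDEX) line in the function handles.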

Here is the modified function in text (I changed its name as well):

`modTSAgg <- 
 function(x, by, FUN, ...)
{
# A function implemented by Yohan Chalabi and Diethelm Wuertz

# Description:
#   Aggregates a 'timeSeries' object

# Details:
#   This function can be used to aggregate and coarsen a
#   'timeSeries' object.

# Arguments:
#   x - a 'timeSeries' object to be aggregated
#   by - a calendarical block
#   FUN - function to be applied, by default 'colMeans'
#   ... - additional argument to be passed to the newly generated
#       'timeSeries' object

# Value:
#   Returns a S4 object of class 'timeSeries'.

# Examples:
# Quarterly Aggregation:
#   m = matrix(rep(1:12,2), ncol = 2)
#   ts = timeSeries(m, timeCalendar())
#   Y = getRmetricsOptions("currentYear"); Y
#   from = paste(Y, "04-01", sep = "-"); to = paste(Y+1, "01-01", sep = "-")
#   by = timeSequence(from, to, by = "quarter") - 24*3600; by
#   ts; aggregate(ts, by, sum)
# Weekly Aggregation:
#   dates = timeSequence(from = "2009-01-01", to = "2009-02-01", by = "day")
#   data = 10 * round(matrix(rnorm(2*length(dates)), ncol = 2), 1); data
#   ts = timeSeries(data = data, charvec = dates)
#   by = timeSequence(from = "2009-01-08",  to = "2009-02-01", by = "week")
#   by = by - 24*3600; aggregate(ts, by, sum)

# FUNCTION:

# Check Arguments:
if (!((inherits(by, "timeDate") && x@format != "counts") ||
      (is.numeric(by) && x@format == "counts")))
  stop("'by' should be of the same class as 'time(x)'", call.=FALSE)

# Extract Title and Documentation:
Title <- x@title
Documentation <- x@documentation

# Make sure that x is sorted:
if (is.unsorted(x))
  x <- sort(x)

# Sort and remove double entries in by:
by <- unique(sort(by))

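# Modified: the "+ 1" offsets on the break points and on INDEX were removed,
# so each 'by' value now opens the interval it labels.
INDEX <- findInterval(x@positions, as.numeric(by, "sec"))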
is.na(INDEX) <- !(INDEX <= length(by))

# YC : ncol important to avoid problems of dimension dropped by apply
data <- matrix(apply(getDataPart(x), 2, tapply, INDEX, FUN), ncol=ncol(x))
rownames(data) <- as.character(by[unique(na.omit(INDEX))])
colnames(data) <- colnames(x)
ans <- timeSeries(data, ...)

# Preserve Title and Documentation:
ans@title <- Title
ans@documentation <- Documentation

# Return Value:
ans

  }


setMethod("aggregate", "timeSeries", function(x, by, FUN, ...)
  modTSAgg(x, by, FUN, ...))


# until UseMethod dispatches S4 methods in 'base' functions
 aggregate.timeSeries <- function(x, ...) modTSAgg(x, ...)`