0
votes

I have a huge dataset of time series represented as vectors (no time labels are available). Due to errors in the measuring process, their lengths (as reported by length()) vary slightly (~10%), but each of them definitely describes a time interval of exactly two minutes. I would like to rescale/resize them and then calculate some statistics between them, so I need time series of equal length.

I need a very fast approach, and linear interpolation is a perfectly good choice for me because speed is more important than interpolation quality.

A simple example, rescaling a vector of length 5 to a vector of length 10:

input <- 0:4 # should be rescaled/resized into :
output <- c(0, .444, .888, 1.333, 1.777, 2.222, 2.666, 3.111, 3.555, 4)

I think the fastest approach is to create a matrix w ('w' for weights) with dimensions length(output) x length(input), so that w %*% input gives output (as a matrix object). If this is the fastest way, how can I create the matrices w efficiently?
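To make the weight-matrix idea concrete, here is a minimal sketch of how such a w could be built for linear interpolation. The helper name make_weights is hypothetical (it is not part of base R), and the sketch assumes the input has at least two points. Each row of w holds at most two non-zero weights, one for each neighbouring input sample.

```r
# Hypothetical helper: build the (len_out x len_in) linear-interpolation
# weight matrix. Assumes len_in >= 2.
make_weights <- function(len_in, len_out) {
  # Positions of the output samples on the input index scale 1..len_in
  pos <- seq(1, len_in, length.out = len_out)
  # Left neighbour of each position, clamped so that lo + 1 is a valid index
  lo <- pmin(floor(pos), len_in - 1)
  # Fractional distance from the left neighbour (0 = on it, 1 = on the right one)
  frac <- pos - lo
  w <- matrix(0, nrow = len_out, ncol = len_in)
  rows <- seq_len(len_out)
  w[cbind(rows, lo)] <- 1 - frac      # weight of the left neighbour
  w[cbind(rows, lo + 1)] <- frac      # weight of the right neighbour
  w
}

w <- make_weights(5, 10)
drop(w %*% 0:4)   # drop() turns the 10 x 1 matrix back into a plain vector
```

A given w can be reused for every series that shares the same input length, but because this w is dense, each multiplication touches length(output) x length(input) entries. Whether that is actually faster than a direct interpolation call is worth benchmarking on the real data.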

1 Answer

2
votes

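I think this could be enough; approx() interpolates linearly by default, and n = len makes it return len equally spaced points over the input range: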

resize <- function (input, len) approx(seq_along(input), input, n = len)$y

For example:

> resize(0:4, 10)
 [1] 0.0000000 0.4444444 0.8888889 1.3333333 1.7777778 2.2222222 2.6666667 3.1111111 3.5555556 4.0000000

> resize( c(0, 3, 2, 1), 10)
 [1] 0.000000 1.000000 2.000000 3.000000 2.666667 2.333333 2.000000 1.666667 1.333333 1.000000
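Applied to the original problem, the same function can bring a whole collection of series to a common length. The following is only a sketch under assumptions: the list series and the value target_len are made-up stand-ins for the real data, and cor() is just one example of a statistic computed between series.

```r
resize <- function(input, len) approx(seq_along(input), input, n = len)$y

# Hypothetical data: three series of slightly different lengths
series <- list(sin(seq(0, 1, length.out = 95)),
               sin(seq(0, 1, length.out = 100)),
               sin(seq(0, 1, length.out = 108)))
target_len <- 100

# One column per resized series; vapply() also checks that every
# result really has length target_len
resized <- vapply(series, resize, numeric(target_len), len = target_len)

dim(resized)   # 100 3
cor(resized)   # pairwise correlations between the resized series
```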