I have a data set of about 400 bivariate time series, each containing roughly 80,000 observations. After inspecting them manually, it is obvious that some are very similar, so I want to cluster them using DTW (Dynamic Time Warping).

Now, if I try creating the distance matrix for the whole set using the DTW method, R tells me it needs 50 GB of RAM (which I don't have). Is it possible to calculate the distance between two time series separately using a for loop (or similar)?

Which other distance methods would you recommend for clustering time series?

1 Answer

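If you do DTW naively, it is quadratic in the series length: for two series of 80,000 points, the cost matrix has 80,000 × 80,000 = 6,400,000,000 elements. At 8 bytes per element that is about 50 GB.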

However, if you only need the distance and not the warping path, you can do DTW keeping only two columns of the matrix at a time. That is just 160,000 elements, or roughly one megabyte (about 1.3 MB at 8 bytes per element).
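The two-column idea can be sketched as follows. This is a minimal illustration in plain Python, not tuned for speed, and it makes some assumptions that are not from the question: each series is a list of `(x, y)` tuples, the local cost is the Euclidean distance between points, and there is no warping window. The names `dtw_distance` and `pairwise_dtw` are made up for this sketch. The second function shows the "for loop over pairs" the question asks about.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two series of points, keeping only two rows.

    Memory is O(len(b)); time is still O(len(a) * len(b)).
    Assumption: each point is a tuple of floats, e.g. (x, y).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")

    # prev[j] holds the accumulated cost for row i-1, curr[j] for row i.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean local cost
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def pairwise_dtw(series_list):
    """Fill a symmetric distance matrix one pair at a time."""
    k = len(series_list)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(series_list[i], series_list[j])
            dist[i][j] = dist[j][i] = d
    return dist
```

The resulting matrix for 400 series has only 400 × 400 entries, so it is the per-pair cost matrix, not the final distance matrix, that needs the memory saving.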

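There is still some bad news, though: the space complexity is a non-issue, but the time complexity will kill you. With 400 series there are 400 × 399 / 2 = 79,800 pairs, and each pair still needs all 6.4 billion cell updates.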

That said, there are some tricks, such as downsampling [a], that could help.
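As a rough illustration of downsampling, the sketch below averages consecutive blocks of points. This is only one simple scheme and is not necessarily the one described in [a]; the function name `downsample` and the block-averaging choice are assumptions made for this example. Reducing each series by a factor of 10 (80,000 points down to 8,000) cuts the number of DTW cells per pair by a factor of 100.

```python
def downsample(series, factor):
    """Replace each block of `factor` consecutive points by its mean.

    Assumption: `series` is a non-empty list of equal-length tuples,
    e.g. (x, y). A shorter trailing block is averaged over its own length.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    dims = len(series[0])
    out = []
    for start in range(0, len(series), factor):
        block = series[start:start + factor]
        mean = tuple(sum(p[d] for p in block) / len(block) for d in range(dims))
        out.append(mean)
    return out
```

Whether the clusters survive this much smoothing depends on the data, so it is worth comparing a few pairs at full and reduced resolution before committing to a factor.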

If you want more help, email the last author of [a] (me).

[a] http://www.cs.ucr.edu/~eamonn/SIGKDD_trillion.pdf