8
votes

As indicated in the title, I am wondering whether DTW (Dynamic Time Warping) can be used to calculate the distance between two time series with missing values.

Let's say the two time series are daily temperatures from two weather stations, are of equal length (e.g. 365 days), and have missing values on different days.

If this is possible, is the dtw package in R able to handle the missing values? I didn't find a parameter that could be set in dtw() like na.rm = T.

Thanks a lot!

Thanks thelatemail for the suggestion. Below is a simplified example of the two time series, where each time series contains only 52 elements and the missing values are set to NA.

TS1 = c(-3.26433,  -5.09096,    NA, -8.4158,    -5.85485,   -3.49234,   -7.64666,   -4.90124,   NA, -4.68836,   -1.38114,   1.55527,    2.81872,    2.44261,    3.57963,    6.19983,    7.42515,    8.41524,    6.32686,    10.0144,    9.53251,    13.4781,    12.3585,    10.6706,    10.2647,    16.6848,    16.4855,    20.1482,  NA,   21.5734,    20.3946,    20.8824,    18.0325,    18.5813,    17.5453,    16.3315,    14.3068,    11.3164,   9.96398, 5.53102,    9.55094,    9.05897,    6.81199,    5.20343,    1.63158,    -0.661077,  -4.33853,   -6.53655,   NA,   -10.8646, 1.11843,    1.23786)

TS2 = c(-5.76852,  -10.2207,    -11.8465,   NA, -1.70019,   -3.60319,   -5.7718,    -3.81106,   -5.62284,   -3.57516,        0.314511,  0.64058,    0.476162,   NA, 4.23757,    5.15417,    7.29422,    NA, 1.57376,    9.28236,    8.05182,    13.7175,    9.5453, 10.2417,    9.32423,    18.214, 18.3726,    16.661, 20.6563,    22.2901,  22.1109,  19.129, 15.8615,    16.7817,    17.247, 15.9921,    14.5804,    11.3693,    10.9349,    10.1196,  3.7467,   9.09229,    6.91285,    NA, 4.20934,    -0.566403,  -2.94184,   -3.81432,   -10.0212,   -15.9876,    -2.56286,  -1.88976)
3
This sounds interesting - could you post a simplified example of the type of data you are using for the analyses so that those who might be able to answer your query have something concrete to work with? - thelatemail
Following from Ali's answer, could you just impute the missing values first and then run the dtw program? I know there are a number of imputation methods, but even something as simple as TS2[is.na(TS2)] <- sapply(which(is.na(TS2)),function(x) mean(c(TS2[x-1],TS2[x+1]))) could work alright. - thelatemail
Thanks! I actually thought about imputation. But the actual data gap is much worse than shown in the example. For some of the time series to be analyzed, as many as 1/3 of the data points can be missing... - user1795375

3 Answers

8
votes

Probably not. I looked over the package manual and found nothing about missing or NA values. I also tried feeding your data to dtw(), and it fails:

Error in dtw(TS1, TS2) : 
  No warping paths exists that is allowed by costraints

But when I changed all NA values to 0, it worked easily.

So if this package is your only option, you can post on the DTW package forum, or you will probably have to deal with the missing data yourself. You may find some hints here, or use the na() function of the fSeries package*.

*This package is no longer available. It is suggested to use the timeSeries package instead.
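As one way of dealing with the missing data yourself, the gaps can be filled by linear interpolation before calling dtw(), which distorts the series less than replacing NA with 0. This is only a minimal sketch: fill_linear is a hypothetical helper (not part of the dtw package), the two short vectors are made-up toy data, and with very long gaps the interpolated values may not be trustworthy.

```r
# Hypothetical helper: fill NA gaps by linear interpolation over the time index.
# Needs at least two non-NA values; rule = 2 carries the nearest observed
# value outward when the series starts or ends with NA.
fill_linear <- function(ts) {
  idx <- seq_along(ts)
  ok  <- !is.na(ts)
  approx(x = idx[ok], y = ts[ok], xout = idx, rule = 2)$y
}

# Toy data, not the series from the question
a <- c(1, NA, 3, NA, NA, 6)
b <- c(NA, 2, 2, 5, NA, 7)

a_filled <- fill_linear(a)   # 1 2 3 4 5 6
b_filled <- fill_linear(b)   # 2 2 2 5 6 7

# With the gaps filled, dtw() no longer sees any NA values
if (requireNamespace("dtw", quietly = TRUE)) {
  print(dtw::dtw(a_filled, b_filled)$distance)
}
```

The same helper could be applied to TS1 and TS2 from the question before passing them to dtw().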

2
votes

I also ran into this situation. The reason you get an error message when using DTW on a time series containing NA values is that the warping distance is undetermined whenever an NA falls on the DTW path. I suggest you impute the NA values using some ARIMA model and then use DTW. Check out this or this for imputing missing time series values.
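To illustrate why a single NA makes the distance undetermined, here is a textbook DTW recursion written in base R. It is a simplified sketch with unit step weights and absolute difference as the local cost, not the dtw package's own implementation; naive_dtw is a name made up for this example.

```r
# Textbook DTW with unit step weights; NOT the dtw package's code.
naive_dtw <- function(x, y) {
  n <- length(x)
  m <- length(y)
  cost <- abs(outer(x, y, "-"))     # local cost; NA wherever x or y is NA
  D <- matrix(Inf, n + 1, m + 1)    # cumulative cost with a padded border
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # min() returns NA as soon as one of its arguments is NA
      D[i + 1, j + 1] <- cost[i, j] + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n + 1, m + 1]
}

naive_dtw(c(1, 2, 3), c(1, 2, 4))    # 1
naive_dtw(c(1, NA, 3), c(1, 2, 4))   # NA
```

An NA in x turns a whole row of the local cost matrix into NA, and since every warping path has to cross that row, the NA propagates to the final cumulative cost.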

1
votes

The dtw function can be made to work by dropping the NA values from each series before computing the distance, as follows.

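# This shows how to register a custom distance function with proxy
# install.packages(c("proxy", "dtw"))   # run once if not yet installed
library(proxy)
library(dtw)

# DTW distance between two series after dropping the NA values from each
DWT.DIST <- function(x, y) {
  a <- as.numeric(na.omit(x))
  b <- as.numeric(na.omit(y))
  dtw(a, b)$normalizedDistance
}

# Create a new entry in the proxy registry
pr_DB$set_entry(FUN = DWT.DIST, names = c("DWT.DIST"))

# appliances_t is my own data (not shown); proxy's dist() compares its rows
d <- dist(appliances_t, method = "DWT.DIST")
hc <- hclust(d, "ave")
plot(hc)

# Remove the entry from the registry again
pr_DB$delete_entry("DWT.DIST")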

Sources:

Link 01; Link 02