Irregular/missing data when clustering time series

Question

Whether the method is dynamic time warping or some form of Euclidean k-means, clustering time series (nearly?) always requires dealing with irregularly spaced observations, series of unequal length, and/or missing data.

While I realize that each of these issues has its own considerations, is there a general reason why one cannot pre-process each time series with a spline, interpolating (or very minimally extrapolating) the data, to ameliorate these issues?

BonStats · Accepted Answer · 2020-02-22T01:28:39

I don't see why not. I think the main thing to consider is what assumption(s) you are making. The assumptions that come to mind for such a procedure, to me at least, are:

The splines can adequately describe (smooth) each time series, and capture differences between them.
The inputs into the clustering procedure describe true differences between the splines, and hence the time series.
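As a rough illustration of the procedure under these two assumptions, here is a minimal sketch in Python with NumPy and SciPy. Everything in it is an assumption made for the example rather than part of the question: the toy sine and line signals, the helper name make_series, the noise level, the 50-point common grid, and the choice of Ward hierarchical clustering as a stand-in for whichever clustering method you actually use. Each series gets its own interpolating cubic spline, and all splines are evaluated on one shared grid restricted to the time range that every series covers, so nothing is extrapolated.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

def make_series(signal, n_obs):
    """Toy data (hypothetical): one irregularly spaced, slightly noisy series."""
    t = np.linspace(0.0, 10.0, n_obs)
    step = t[1] - t[0]
    # Jitter by less than half a step so the time stamps stay strictly increasing.
    t = t + rng.uniform(-0.3, 0.3, n_obs) * step
    y = signal(t) + rng.normal(scale=0.02, size=n_obs)
    return t, y

def sine(t):
    return np.sin(t)

def line(t):
    return 0.2 * t - 1.0

# Six series of unequal length: three per underlying shape.
signals = [sine, sine, sine, line, line, line]
lengths = [25, 40, 31, 28, 35, 22]
series = [make_series(f, n) for f, n in zip(signals, lengths)]

# Simulate missingness: drop every observation of one series with 4 <= t <= 5.
t1, y1 = series[1]
keep = (t1 < 4.0) | (t1 > 5.0)
series[1] = (t1[keep], y1[keep])

# Common grid limited to the range all series cover, so no extrapolation occurs.
lo = max(t[0] for t, _ in series)
hi = min(t[-1] for t, _ in series)
grid = np.linspace(lo, hi, 50)

# One interpolating cubic spline per series, evaluated on the shared grid.
resampled = np.vstack([CubicSpline(t, y)(grid) for t, y in series])

# Equal-length rows can now go into any Euclidean clustering routine.
Z = linkage(resampled, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

With noisier data, an interpolating spline will chase the noise, so a smoothing or least-squares spline may be the better choice; that is exactly where the first assumption needs checking.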

The input into the clustering procedure could be the estimated spline function (for example, its values on a common grid) or the coefficients of the spline. Certainly, the estimated coefficients would be easier to use, but you'd want to ensure that differences between them truly represent differences in the spline function. This might boil down to orthogonality of the basis functions of the splines, but I'm not sure whether there is existing theory to back that up.
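One way to see why orthogonality matters: if two splines share the same basis, with coefficient vectors c1 and c2, then the integrated squared difference between the functions equals d' G d, where d = c1 - c2 and G is the Gram matrix of the basis (the integrals of products of basis functions). That reduces to the plain Euclidean distance between coefficients only when G is the identity, i.e. when the basis is orthonormal. The sketch below checks this numerically for a cubic B-spline basis. It is only an illustration under assumptions of mine: the shared knot vector, the toy signals, the helper name fit_coefficients, and the trapezoid quadrature on a fine grid are not from the question.

```python
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline

rng = np.random.default_rng(1)

# Shared cubic B-spline basis on [a, b]; every series is fitted with the same
# knots so that coefficient vectors are comparable (an assumption of this sketch).
k = 3
a, b = 0.0, 10.0
interior = np.linspace(a, b, 7)[1:-1]  # 5 interior knots
knots = np.r_[[a] * (k + 1), interior, [b] * (k + 1)]
n_coef = len(knots) - k - 1

def fit_coefficients(signal, n_obs):
    """Least-squares B-spline fit to toy irregular data; returns the coefficients."""
    t = np.linspace(a, b, n_obs)
    step = t[1] - t[0]
    # Jitter the interior time stamps only, keeping them strictly increasing.
    t[1:-1] += rng.uniform(-0.3, 0.3, n_obs - 2) * step
    y = signal(t) + rng.normal(scale=0.02, size=n_obs)
    return make_lsq_spline(t, y, knots, k=k).c

c1 = fit_coefficients(np.sin, 30)
c2 = fit_coefficients(lambda t: 0.2 * t - 1.0, 24)

# Evaluate all basis functions on a fine grid: column j of B is the j-th basis function.
grid = np.linspace(a, b, 2001)
B = BSpline(knots, np.eye(n_coef), k)(grid)

# Trapezoid weights, then the Gram matrix G[i, j] ~ integral of B_i * B_j.
dx = grid[1] - grid[0]
w = np.full(grid.size, dx)
w[0] = w[-1] = dx / 2
G = B.T @ (w[:, None] * B)

d = c1 - c2
coef_dist = np.linalg.norm(d)                        # distance between coefficients
l2_dist = np.sqrt(d @ G @ d)                         # function distance via the Gram matrix
direct_dist = np.sqrt(np.sum(w * (B @ d) ** 2))      # function distance computed directly

print(coef_dist, l2_dist, direct_dist)
```

Since neighbouring B-spline basis functions overlap, G is not the identity here, and the coefficient distance and the function distance generally differ. If you want to cluster on coefficients anyway, one option is to use the G-weighted distance, or equivalently to transform the coefficients by a Cholesky factor of G before running a Euclidean method.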
