Clustering Time Series Data of Different Length

Question

I have time series data in which the series have different lengths. I want to cluster them based on DTW distance but could not find any library for it. sklearn throws an error straight away, while tslearn's k-means gave a wrong answer.

My problem is solved if I pad the series with zeros, but I am not sure whether it is correct to pad time series data for clustering.
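DTW itself does not require equal lengths: its dynamic-programming recurrence aligns any two series. The minimal NumPy sketch below (`dtw_distance` is a hypothetical helper, not a library function; it assumes univariate series and an absolute-difference local cost) also illustrates why zero-padding is questionable. Prepending zeros, as Keras's `pad_sequences` does by default, forces those zeros to be aligned with real values and so changes the distance.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW with |x - y| local cost; a and b may differ in length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = cheapest alignment of a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # a[i-1] repeats against b
                                 cost[i, j - 1],      # b[j-1] repeats against a
                                 cost[i - 1, j - 1])  # one-to-one step
    return cost[n, m]

short = [1.0, 2.0, 3.0]
long_ = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(dtw_distance(short, long_))   # → 0.0 (same shape, no padding needed)

padded = [0.0, 0.0, 0.0] + short    # "pre" zero-padding to length 6
print(dtw_distance(padded, long_))  # → 3.0 (the zeros now add cost)
```

The padded zeros are not "missing data" to DTW; they are real values, so the distances (and therefore the clusters) can shift.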

Suggestions for other clustering techniques for time series data are welcome.

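from tensorflow.keras.preprocessing import sequence  # assumed source of `sequence`
from tslearn.clustering import TimeSeriesKMeans

# train_1: list of 1-D series with different lengths
max_length = max(len(s) for s in train_1)
print(max_length)

# Zero-pad every series at the front ("pre", the default) up to max_length.
# dtype="float32" stops pad_sequences from casting real values to its default int32.
train_1 = sequence.pad_sequences(train_1, maxlen=max_length, dtype="float32")
km3 = TimeSeriesKMeans(n_clusters=4, metric="dtw", verbose=False, random_state=0).fit(train_1)

print(km3.labels_)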

I am the one who asked the question. After some analysis I reached the conclusion that padding is not the solution, as it gives different answers on data with more than two classes. (Yash Gupta)

Anix Breaker · Accepted Answer · 2019-06-20T06:41:15

You can try a custom-made k-means or another clustering algorithm; the source code is readily available in the sklearn library. Padding is really not a great option, as it changes the problem itself. As alternatives you can also use tslearn, and pyclustering (for optimal clusters), but remember to use DTW distance rather than Euclidean distance.
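As one example of a custom clustering algorithm, the sketch below swaps k-means for k-medoids: every cluster center is an actual member series, so unequal-length series never have to be averaged or padded. It precomputes a DTW distance matrix and alternates assignment and medoid updates (a simple alternating variant, not full PAM). `dtw_distance` and `kmedoids_dtw` are hypothetical helpers, not a library API, and the toy data is made up.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW with |x - y| local cost; a and b may differ in length."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def kmedoids_dtw(series, k, max_iter=100):
    """Cluster variable-length series with k-medoids on a DTW distance matrix."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j])

    # Deterministic farthest-point init: start at the most central series,
    # then repeatedly add the series farthest from all chosen medoids.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))

    for _ in range(max_iter):
        # Assign each series to its nearest medoid
        labels = np.argmin(D[:, medoids], axis=1)
        # New medoid = member with the smallest total DTW distance to its cluster
        new_medoids = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if len(members) == 0:          # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)  # labels consistent with final medoids
    return labels, medoids

series = [
    [1.0, 2.0, 3.0],
    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    [1.0, 2.0, 2.0, 3.0],
    [9.0, 8.0, 7.0, 7.0],
    [9.0, 9.0, 8.0, 7.0, 7.0],
]
labels, medoids = kmedoids_dtw(series, k=2)
print(labels, medoids)
```

Here the three rising series form one cluster and the two falling series the other. The quadratic DTW loop is fine for a sketch, but for large datasets a compiled DTW implementation is preferable.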
