I am trying to perform time series clustering with the Dynamic Time Warping (DTW) distance using the dtwclust package.

I use this function:

dtwclust(data = NULL, type = "partitional", k = 2L, method = "average",
distance = "dtw", centroid = "pam", preproc = NULL, dc = NULL,
control = NULL, seed = NULL, distmat = NULL, ...)

My data is stored as a list of time series with different lengths, like the example below:

$a
[1]  0  0  0  0  2  3  6  7  8  9 11 13

$b
[1]  0  1  1  2  4  7  8 11 13 15 17 19 22 25 28 31 34 35

$c
[1]  1  2  4  4  4  4  4  4  4  4  5  5  5  5  5  5  5  6  6  6  6  7  7  8  8  9 10 10 12 14 15 17 19

$d
[1] 0 0 0 0 0 1 2 4 4 4

$e
[1]  0  1  1  3  5  6  9 12 14 17 19 20 22 24 28 31 32 34

My problems are:

(1) Because the series have different lengths, I can only choose dtw, dtw2 or sbd as the distance and dba, shape or pam as the centroid. But I don't know which distance and centroid are correct.

(2) I have plotted some graphs, but I don't know how to choose the right and reasonable one.

k = 6, distance = dtw, centroid = dba:

k = 4, distance = dtw, centroid = dba (the cluster center seems weird?)

I have tried all the combinations with k from 4 to 13, but I have no idea how to choose the right one.

1 Answer

You would not want to "choose" the parameters up front but rather evaluate the result. Therefore, you need to choose a criterion for evaluating the clustering. You vary the parameters, such as the distance and k, and then score each resulting clustering with that criterion. Generally there are two possibilities for evaluating a clustering:

external evaluation:

You can use labels (which were not used for clustering and are therefore considered external) to count true positives, false positives, etc. over pairs of series, which leads to measures such as the Rand index or the F-measure.

It seems that your data is not labelled, so you cannot calculate any accuracy, even though that would be the easiest way to go.
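
If labels were available, the pair counting could be sketched as follows in base R. This is only an illustration: the function name rand_index and the two label vectors are hypothetical and not taken from the question's data. A pair of series counts as a true positive when both the labels and the clustering put it in the same group.

```r
# Pair-counting external evaluation (Rand index), base R only.
# `truth` holds the external labels, `pred` the cluster assignments.
rand_index <- function(truth, pred) {
  stopifnot(length(truth) == length(pred), length(truth) >= 2)
  pairs <- combn(length(truth), 2)            # every unordered pair of items
  same_truth <- truth[pairs[1, ]] == truth[pairs[2, ]]
  same_pred  <- pred[pairs[1, ]]  == pred[pairs[2, ]]
  tp <- sum(same_truth & same_pred)           # together in both
  tn <- sum(!same_truth & !same_pred)         # apart in both
  fp <- sum(!same_truth & same_pred)          # clustered together, labels differ
  fn <- sum(same_truth & !same_pred)          # same label, clustered apart
  c(TP = tp, FP = fp, FN = fn, TN = tn, rand = (tp + tn) / ncol(pairs))
}

# Hypothetical labels and cluster assignments for five series.
truth <- c("x", "x", "y", "y", "y")
pred  <- c(1, 1, 1, 2, 2)
result <- rand_index(truth, pred)
print(result)
```

A Rand index of 1 means the clustering agrees with the labels on every pair; the cluster ids themselves do not need to match the label names.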

internal evaluation:

Alternatively, you can try to maximize the intra-cluster similarity (that is, keep the average distance of a cluster member to all other members of its own cluster small) and minimize the inter-cluster similarity (that is, keep the average distance of a cluster member to all elements outside of its own cluster large).

Further information can be found here:
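
One common way to combine both quantities is the silhouette width. The sketch below uses base R only and makes several assumptions that are not part of the question: dtw_distance is a simple textbook DTW (it may not match the values dtwclust's "dtw" produces), base R's hclust with average linkage stands in for the dtwclust call, and k runs only from 2 to 4 because the example has five series. In practice you would pass the cluster assignments from your own runs to mean_silhouette instead.

```r
# Textbook DTW: absolute-difference local cost, steps left / up / diagonal.
dtw_distance <- function(x, y) {
  n <- length(x); m <- length(y)
  cost <- matrix(Inf, n + 1, m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- abs(x[i] - y[j])
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1], cost[i + 1, j], cost[i, j])
    }
  }
  cost[n + 1, m + 1]
}

# Mean silhouette width: a = average distance to the own cluster,
# b = smallest average distance to any other cluster.
mean_silhouette <- function(distmat, labels) {
  stopifnot(length(unique(labels)) >= 2)
  n <- nrow(distmat)
  widths <- numeric(n)
  for (i in seq_len(n)) {
    own <- labels == labels[i]
    own[i] <- FALSE
    if (!any(own)) next                       # singleton cluster: width 0 by convention
    a <- mean(distmat[i, own])
    others <- setdiff(unique(labels), labels[i])
    b <- min(sapply(others, function(cl) mean(distmat[i, labels == cl])))
    denom <- max(a, b)
    widths[i] <- if (denom > 0) (b - a) / denom else 0
  }
  mean(widths)
}

# The five series from the question.
series <- list(
  a = c(0, 0, 0, 0, 2, 3, 6, 7, 8, 9, 11, 13),
  b = c(0, 1, 1, 2, 4, 7, 8, 11, 13, 15, 17, 19, 22, 25, 28, 31, 34, 35),
  c = c(1, 2, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6,
        7, 7, 8, 8, 9, 10, 10, 12, 14, 15, 17, 19),
  d = c(0, 0, 0, 0, 0, 1, 2, 4, 4, 4),
  e = c(0, 1, 1, 3, 5, 6, 9, 12, 14, 17, 19, 20, 22, 24, 28, 31, 32, 34)
)

# Pairwise DTW distance matrix.
n <- length(series)
distmat <- matrix(0, n, n, dimnames = list(names(series), names(series)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    distmat[i, j] <- dtw_distance(series[[i]], series[[j]])
  }
}

# Stand-in clustering (assumption): average-linkage hierarchical clustering.
hc <- hclust(as.dist(distmat), method = "average")
ks <- 2:4
scores <- sapply(ks, function(k) mean_silhouette(distmat, cutree(hc, k)))
names(scores) <- ks
best_k <- ks[which.max(scores)]
print(scores)
print(best_k)
```

Values close to 1 indicate compact, well-separated clusters, so the combination of distance, centroid and k with the highest mean silhouette is a reasonable candidate. It is one criterion among several, not proof that the clustering is "right".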

http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

http://www.ims.uni-stuttgart.de/institut/mitarbeiter/schulte/theses/phd/algorithm.pdf