1
votes

I have time-series data for 12 consumers, named a through l. [image: time series of the 12 consumers]

I want to cluster these consumers to find out which of them have the most similar consumption behavior. To that end, I found the clustering method pamk, which automatically estimates the number of clusters in the input data.

I assume I have only two options for calculating the distance between any two time series: Euclidean and DTW. I tried both and got different clusters. Which one should I rely on, and why?
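To make the difference between the two distances concrete, here is a minimal Python sketch (not the R code used for the plots in this question). The series `x` and `y` are hypothetical toy data, and `dtw` is a plain textbook implementation with absolute difference as the local cost, not the function from any particular package. It shows why the two measures can lead to different clusters: Euclidean distance compares values at the same time index, while DTW allows the time axis to stretch.

```python
import math

def euclidean(a, b):
    """Point-by-point distance; both series must have the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Textbook dynamic time warping, O(len(a) * len(b)), |x - y| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical toy data: the same consumption peak, shifted by one time step.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 0, 0, 1, 2, 1, 0]

print(euclidean(x, y))  # 2.0: the shift is penalized
print(dtw(x, y))        # 0.0: the shift is warped away
```

So the choice depends on whether two consumers with the same load shape at different times of day should count as similar (DTW) or not (Euclidean).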

When I use Euclidean distance, I get the following clusters: [image]

and using DTW distance I get: [image]

Conclusion: How will you decide which clustering approach is the best in this case?

Note: I have also asked the same question on Cross Validated.

1
I'm voting to close this question as off-topic because you cross-posted on SE, which is a better site for such a question. – user3710546
How well does your data fit into each clustering result? Does one give more outliers than the other? Do the clusters have any physical meaning? – Tim Biegeleisen
@Pascal, you are correct that SE is better. But for the last few days I have not received any comments or answers on any of my questions there. I find Stack Overflow much more active than Cross Validated. – Haroon Rashid
But that is not a reason to post an off-topic question here. – user3710546
@Pascal, I do not think it is off-topic for Stack Overflow. On Stack Overflow there are 2.1k questions related to cluster-analysis, whereas on Cross Validated there are only 1.6k. – Haroon Rashid

1 Answer

0
votes
  1. None of the time series above look similar to me. Do you see any pattern? Maybe there is no pattern?

  2. The clustering visualizations also indicate that there are no clusters. b and l appear to be the most unusual outliers, followed by d, e, h, but there are no clusters there.

  3. Also try hierarchical clustering. The dendrogram may be more understandable.
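
As a rough illustration of point 3, here is a minimal Python sketch of average-linkage hierarchical clustering on a precomputed distance matrix (Euclidean or DTW, whichever was used). The function name `average_linkage` and the four-consumer matrix are hypothetical, not the data from the question. The merge heights it prints are what a dendrogram would draw: if all merges happen at similar heights, that suggests there is no clear cluster structure.

```python
def average_linkage(dist, labels):
    """Agglomerative clustering on a symmetric distance matrix.

    Returns the merges in order as (cluster_a, cluster_b, height) tuples,
    where each cluster is a tuple of labels.
    """
    clusters = [(i,) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                pairs = [dist[i][j] for i in clusters[p] for j in clusters[q]]
                height = sum(pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, p, q)
        height, p, q = best
        a, b = clusters[p], clusters[q]
        merges.append((tuple(labels[i] for i in a),
                       tuple(labels[i] for i in b),
                       height))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [a + b]
    return merges

# Hypothetical distances between four consumers (not the data from the question).
labels = ["a", "b", "c", "d"]
dist = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
]
for left, right, height in average_linkage(dist, labels):
    print(left, right, height)
# ('a',) ('b',) 1.0
# ('c',) ('d',) 2.0
# ('a', 'b') ('c', 'd') 6.0
```

In this toy example the last merge happens much higher (6.0) than the first two, which is what well-separated clusters look like.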

Either way, there may be no clusters. You need to be prepared for this outcome and consider it a valid hypothesis. Double-check any result. As you have seen, pam will always return a result, and you have no means of deciding which result is more "correct" than the other. Most likely neither is correct, and, to answer your question, you should rely on neither.
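
One possible way to double-check a result is the mean silhouette width, sketched below in Python. This is only a heuristic and does not prove that either clustering is correct. The function `mean_silhouette` and the toy matrix are hypothetical, and the sketch assumes at least two clusters. Values near 1 indicate compact, well-separated clusters, while values near 0 or below are consistent with the "no clusters" hypothesis.

```python
def mean_silhouette(dist, assignment):
    """Mean silhouette width from a distance matrix and cluster ids.

    Assumes at least two distinct cluster ids in `assignment`.
    """
    n = len(dist)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if assignment[j] == assignment[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of i's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if assignment[j] == c)
            / sum(1 for j in range(n) if assignment[j] == c)
            for c in set(assignment)
            if c != assignment[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n

# Hypothetical distances between four consumers (not the data from the question).
dist = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
]
print(mean_silhouette(dist, [0, 0, 1, 1]))  # clearly positive: matches the structure
print(mean_silhouette(dist, [0, 1, 0, 1]))  # negative: a poor clustering
```

If both the Euclidean and the DTW clusterings score close to 0 on their own distance matrices, that supports treating neither as meaningful.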