I have a dataset that contains 3 categories {c1, c2 and c3}. I'm using the single-linkage hierarchical clustering method (from MATLAB) to cluster the dataset, with my own distance measure. The following figure shows the results. Note that the hierarchical clustering groups the data correctly: the points of c1 (yellow) are very close to each other, and similarly for c2 (green) and c3 (blue).

[Figure: dendrogram of the single-linkage result, with c1 in yellow, c2 in green and c3 in blue]

From the figure, we can see that the distances between the points in c1 are very small compared to those in c2 and c3. So, for example, if I decide to cut the tree at 8, I get c1 as one cluster, while c2 and c3 are split into 8 clusters, with each point in a different cluster.

How can I overcome this problem? Do I need to change the clustering method, or should I cut the tree at 17 and then cluster the resulting clusters again?

1 Answer

There are different ways of extracting clusters from a dendrogram. You are not required to make a single cut (although MATLAB may only offer this choice). Selecting regions as you did is also reasonable, and so is cutting the dendrogram at multiple heights. But not every tool has all of these capabilities.

Notice that c3 was split into two, half of which is not well separated from c2.
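
To make the "multiple heights" idea concrete, here is a minimal sketch in plain Python (not MATLAB, and not any particular library's API). It relies on the fact that cutting a single-linkage tree at height h gives the connected components of the graph that links every pair of points with distance <= h. The function names, the toy 1-D data and the cut heights are all made up for illustration; `dist` stands in for your own distance measure.

```python
from itertools import combinations


def single_linkage_cut(points, dist, height):
    """Return the clusters a single-linkage tree gives when cut at `height`.

    Equivalent to the connected components of the graph that joins every
    pair of points whose distance is <= height (found with union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= height:
            parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def two_level_cut(points, dist, coarse, fine_for):
    """Cut at `coarse`, then re-cut each resulting cluster at its own height.

    `fine_for(cluster)` is a hypothetical callback that returns the height
    to use inside that cluster (return `coarse` to leave it unchanged).
    """
    result = []
    for cluster in single_linkage_cut(points, dist, coarse):
        result.extend(single_linkage_cut(cluster, dist, fine_for(cluster)))
    return result


# Toy data (invented): two tight groups close together, one loose group.
points = [0.0, 0.1, 0.2, 2.0, 2.1, 2.2, 10.0, 13.0, 16.0]
dist = lambda a, b: abs(a - b)  # placeholder for a custom distance measure

low = single_linkage_cut(points, dist, 1.0)   # loose group falls apart
high = single_linkage_cut(points, dist, 5.0)  # tight groups get merged

# Assumed rule for this toy data only: use a small height in the dense region.
mixed = two_level_cut(points, dist, 5.0,
                      lambda c: 1.0 if max(c) < 5 else 5.0)
print(len(low), len(high), mixed)
```

On this toy data no single height recovers the three groups: the low cut breaks the loose group into single points and the high cut merges the two tight groups, while the two-level cut returns all three. How to pick the per-cluster height is the real design decision; here it is hard-coded, which mirrors selecting regions of the dendrogram by eye.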