This is a homework problem and I'm having some difficulty understanding it. The homework question is:
Cluster the following bit sequences using hierarchical clustering. If d(·,·) defines the
distance between two bit sequences a and b, then d(a,b) = Hamming-Distance(a,b). If C1 and C2 are
two clusters, the distance between C1 and C2 is $d(C_1,C_2) = \frac{1}{|C_1||C_2|} \sum_{a \in C_1,\, b \in C_2} d(a,b)$.
Show the cluster hierarchy with all the intermediate steps.
1 10001011
2 11010111
3 00101010
4 00011110
5 10101110
6 11100001
I read in a book that I initially have to treat each sequence as its own cluster and then merge the two closest clusters, which forms a new cluster. I then have to find the cluster closest to this newly formed one by computing its distance to every other cluster, where the distance is the average of the distances between each element of one cluster and each element of the other, as stated in the question.
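To make that procedure concrete, here is a minimal sketch of the merge loop as I understand it from the book's description. The names `hamming`, `average_distance`, `clusters`, and `merges` are my own, and breaking ties by list order is an arbitrary assumption, since the question does not say what to do when two pairs of clusters are equally close.

```python
from itertools import combinations

# Bit sequences from the question, keyed by their labels.
seqs = {
    1: "10001011",
    2: "11010111",
    3: "00101010",
    4: "00011110",
    5: "10101110",
    6: "11100001",
}

def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def average_distance(c1, c2):
    """Mean Hamming distance over all cross-cluster pairs of sequences."""
    total = sum(hamming(seqs[a], seqs[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

# Start with every sequence in its own cluster (a set holding one label).
clusters = [frozenset([label]) for label in seqs]
merges = []  # one (cluster_a, cluster_b, distance) entry per merge step

while len(clusters) > 1:
    # Closest pair of current clusters; min() keeps the first pair on ties
    # (assumption: the question does not specify a tie-breaking rule).
    a, b = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
    merges.append((a, b, average_distance(a, b)))
    # Replace the two merged clusters with their union.
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]

for a, b, dist in merges:
    print(sorted(a), sorted(b), dist)
```

Each printed line is one intermediate step of the hierarchy. A different tie-breaking rule could give a different but equally valid merge order.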
My solution: I will compute the Hamming distance between every pair of sequences and choose the pair with the smallest distance, which is sequences 3 and 5 (Hamming distance 2). These two can now be merged into a new cluster.
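As a check on that first step, here is a small sketch that computes all 15 pairwise Hamming distances and picks the closest pair. The helper name `hamming` and the dictionary `pairwise` are just illustrative choices of mine.

```python
from itertools import combinations

# Bit sequences from the question, keyed by their labels.
seqs = {
    1: "10001011",
    2: "11010111",
    3: "00101010",
    4: "00011110",
    5: "10101110",
    6: "11100001",
}

def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Distance for every unordered pair of labels.
pairwise = {(i, j): hamming(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}

# The pair with the smallest distance is the first merge candidate.
closest = min(pairwise, key=pairwise.get)
print(closest, pairwise[closest])  # (3, 5) 2
```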
My concern is: what exactly is meant by merging here? How do I do it? Or do I simply keep the two sequences as they are and call them a new cluster?
And how do I find the average distance between the elements of the new cluster and the other clusters?
Also, to calculate the average, the formula says to divide by |C1| and |C2|. Does that mean I have to divide by the number of elements (8 bits per sequence) times the size of the cluster it gets merged into?
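My current reading, which may be what I am getting wrong, is that |C1| and |C2| count the sequences in each cluster rather than the bits in a sequence, and that "merging" just means taking the union of the two clusters. Under that assumption, a sketch of the distances from the merged cluster {3, 5} to each remaining sequence looks like this (`average_distance` and `distances` are names I made up):

```python
# Bit sequences from the question, keyed by their labels.
seqs = {
    1: "10001011",
    2: "11010111",
    3: "00101010",
    4: "00011110",
    5: "10101110",
    6: "11100001",
}

def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def average_distance(c1, c2):
    """Sum of d(a, b) over all a in c1, b in c2, divided by |c1| * |c2|."""
    total = sum(hamming(seqs[a], seqs[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

# Assumption: merging is the set union of the two clusters' labels.
merged = {3} | {5}

# Here |C1| = 2 and |C2| = 1, so each sum is divided by 2, not by 8.
distances = {other: average_distance(merged, {other}) for other in (1, 2, 4, 6)}
print(distances)  # {1: 3.0, 2: 6.0, 4: 3.0, 6: 5.0}
```

For example, the distance to sequence 2 is (d(3,2) + d(5,2)) / (2 · 1) = (7 + 5) / 2 = 6.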
Any help is greatly appreciated. Thank you.