I'm trying to use KMeans to cluster RGB colors and automatically count how many pixels of each group are present in an image. To do that, I set the initial centroids to the colors I would like to categorize and run KMeans from sklearn.

The problem is that, depending on the image, the order of the centroids in the algorithm's output differs from the order of the initial centroid vector, so when I count the number of elements, the counts are attributed to the wrong color.

This usually happens when one or more of the colors in the initial centroids are not present in the image. In that case, I would like the count for that color to be 0 instead.

Does anyone know how to keep the order of the initial centroids in the output of the KMeans prediction?

Code below:

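    import numpy as np
    import cv2 as cv
    from collections import Counter
    from sklearn.cluster import KMeans

    # Initial centroids: the RGB colors to categorize (row index = expected label)
    centroid_start = np.array([[0, 0, 0],        # Black
                               [38, 64, 87],     # Col1
                               [43, 68, 98],     # Col2
                               [23, 42, 45],     # Col3
                               [160, 62, 0],     # Col4
                               [153, 82, 33],    # Col5
                               [198, 130, 109],  # Col6
                               [100, 105, 79],   # Col7
                               [220, 138, 22]    # Col8
                               ], np.float64)
    # img is the input image (HSV), loaded earlier
    image = cv.cvtColor(img, cv.COLOR_HSV2RGB)
    # Flatten to one row per pixel: shape (height * width, 3)
    reshape = image.reshape((image.shape[0] * image.shape[1], 3))
    cluster = KMeans(n_clusters=centroid_start.shape[0], init=centroid_start).fit(reshape)
    # Count how many pixels were assigned to each cluster label
    pixels = Counter(cluster.labels_)
    print(pixels)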

The problem is that when I check the 'pixels' variable, 0 does not always correspond to black, 1 does not always correspond to Col1, and so on.

1 Answer

If you don't want the colors to migrate, you probably should not use k-means. Instead, just compute the pairwise distances between your colors and the image pixels, then assign each pixel to the color with the smallest distance.
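As a rough illustration of the pairwise-distance approach, here is a minimal NumPy sketch. The function name `count_pixels_per_color` is made up for this example, and the palette is copied from the question. It uses squared Euclidean distance in RGB, which is an assumption; another color space or metric may suit your images better.

```python
import numpy as np

# Reference colors from the question (RGB); the row index is the fixed label.
palette = np.array([[0, 0, 0],        # Black
                    [38, 64, 87],     # Col1
                    [43, 68, 98],     # Col2
                    [23, 42, 45],     # Col3
                    [160, 62, 0],     # Col4
                    [153, 82, 33],    # Col5
                    [198, 130, 109],  # Col6
                    [100, 105, 79],   # Col7
                    [220, 138, 22]],  # Col8
                   dtype=np.float64)


def count_pixels_per_color(pixels, palette):
    """Assign each pixel to its nearest palette color and count per color.

    Hypothetical helper: `pixels` is any array whose last axis is RGB.
    Returns one count per palette row, in palette order.
    """
    pixels = np.asarray(pixels, dtype=np.float64).reshape(-1, 3)
    palette = np.asarray(palette, dtype=np.float64)
    # Squared Euclidean distances, shape (n_pixels, n_colors).
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest palette color for every pixel.
    labels = d2.argmin(axis=1)
    # minlength keeps a 0 entry for colors that never occur in the image.
    return np.bincount(labels, minlength=len(palette))
```

Because the labels are indices into the palette itself, index 0 is always black, index 1 is always Col1, and so on, and a color that does not occur in the image gets a count of 0. With the question's variables this would be called as `count_pixels_per_color(reshape, centroid_start)`. For very large images the intermediate distance array can get big; `sklearn.metrics.pairwise_distances_argmin` computes the same nearest-color labels in chunks.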

If you really do want the initial colors to migrate, then you have to accept that some of your initial cluster centers (colors) may disappear or migrate to something very different from your initial colors. One option is to re-order the rows of the cluster_centers_ attribute (and possibly labels_) of your fitted KMeans object. Another, probably safer, option is to compute a mapping of fitted cluster centers to your original colors (again using pairwise distances), then translate the results of your subsequent k-means classification. If you want to do it all in one step, you could subclass KMeans or wrap it by creating your own class derived from BaseEstimator.
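The mapping option could look roughly like the sketch below. The helper name `counts_by_original_color` is invented for this example; its inputs are meant to be the `cluster_centers_` and `labels_` attributes of the fitted KMeans object plus the original color array. It assumes squared Euclidean distance in RGB. Note that two fitted centers can map to the same original color, in which case their pixel counts are merged.

```python
import numpy as np


def counts_by_original_color(fitted_centers, labels, original_colors):
    """Translate k-means labels back to the indices of the original colors.

    Hypothetical helper: `fitted_centers` and `labels` would come from a
    fitted KMeans object (cluster_centers_ and labels_).
    Returns one pixel count per original color, in the original order.
    """
    fitted_centers = np.asarray(fitted_centers, dtype=np.float64)
    original_colors = np.asarray(original_colors, dtype=np.float64)
    # Squared distances between fitted centers and original colors,
    # shape (n_centers, n_colors).
    d2 = ((fitted_centers[:, None, :] - original_colors[None, :, :]) ** 2).sum(axis=2)
    # For each fitted center, the index of the closest original color.
    center_to_color = d2.argmin(axis=1)
    # Replace every k-means label with the original color index it maps to.
    translated = center_to_color[np.asarray(labels)]
    # minlength keeps a 0 entry for original colors no center mapped to.
    return np.bincount(translated, minlength=len(original_colors))
```

With the question's variables this would be called as `counts_by_original_color(cluster.cluster_centers_, cluster.labels_, centroid_start)`. If you need a strict one-to-one assignment between centers and colors instead of nearest-neighbor mapping, a matching step would have to replace the `argmin`.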