
I want to cluster images using K-Means or another algorithm (suggestions welcome).

The problem is this: I want to cluster images into 3 clusters (nature, sunset, water). I loaded all the images using os.listdir(), converted each image into an (RGB) array, and then created a data frame with three columns: ID, Image_array, Label.

Now, when I run K-Means clustering with n_clusters = 3, I get this error:

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters = 3).fit(img_array)

ERROR: Found array with dim 4. Estimator expected <= 2.

I need your help with this clustering problem. This is how I created the data frame:

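import os

import numpy as np
import pandas as pd
from PIL import Image

img_array = []

path = "C://Users/shivam/Desktop/freelancer/p22/data/green_nature/"
# List the files in `path` itself, not in the current working directory ('.')
for f in os.listdir(path):
    if f.endswith('.jpg'):
        # convert('RGB') guarantees 3 channels even for greyscale or RGBA files
        img = Image.open(os.path.join(path, f)).convert('RGB')
        data = np.asarray(img, dtype='uint8')  # shape (height, width, 3)
        img_array.append(data)

df = pd.DataFrame({'image_arrays': img_array})
df['id'] = range(1, len(df) + 1)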

2 Answers


This happens because you pass a 4-dimensional array, while a 2-dimensional one is expected: the array you pass to fit must have shape (n_samples, n_features), i.e. one row per image. You need a feature extraction step that turns each image into a fixed-length vector.

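This can be done with the scikit-image module, for example with HOG (histogram of oriented gradients) features. Convert the images to greyscale first, because hog expects a single-channel image by default. Code: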

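import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize
from sklearn.cluster import KMeans

# Greyscale + one fixed size, so every HOG vector has the same length
img_converted = [hog(resize(rgb2gray(img), (128, 128))) for img in img_array]
model = KMeans(n_clusters=3).fit(np.array(img_converted))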

Documentation: http://scikit-image.org/docs/dev/api/skimage.feature.html#hog
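To make the shape requirement from the start of this answer concrete, here is a minimal sketch on made-up data (the `images` array is a hypothetical stand-in for the stacked question images). Flattening raw pixels is the crudest possible "feature" and is used here only to show the (n_samples, n_features) shape; HOG or colour statistics are usually far more useful for clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the stacked images: 6 RGB images of 32x32 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 32, 32, 3), dtype=np.uint8)
print(images.ndim)  # 4 -> KMeans(...).fit(images) fails with "Found array with dim 4"

# One row per image: here simply every pixel value flattened into one vector
X = images.reshape(len(images), -1).astype(np.float64)
print(X.shape)  # (6, 3072), i.e. (n_samples, n_features)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # one cluster index per image
```

Whatever features you choose, the final input to KMeans must have this 2D layout: one row per image, with the same number of columns for every image.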


Well, as you said, k-means expects one vector per input, whereas you give it a 3D array per image. The easiest way to solve a problem like this (which does require some creativity) is to devise a set of features that discriminate between the classes you have.

Since in this case you wish to separate nature (lots o' green), water (lots o' blue) and sunset (lots o' red/yellow/pink, maybe?), you could use the total or average red, green and blue values as features. To check whether the features you have selected are discriminative, you can plot a histogram of each feature per class.
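A minimal sketch of that histogram check, assuming you already have one mean (R, G, B) row per image in a hypothetical `features` array and the matching class names in `labels` (for example from the Label column of the question's data frame). The numbers generated below are invented only so the snippet runs on its own; replace them with your real features.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to get interactive windows
import matplotlib.pyplot as plt

# Hypothetical stand-in data: class names plus a noisy mean (R, G, B) per image
rng = np.random.default_rng(0)
labels = np.array(["nature", "sunset", "water"] * 20)
base = {"nature": (60, 140, 60), "sunset": (200, 110, 80), "water": (50, 90, 170)}
features = np.array([base[name] for name in labels]) + rng.normal(0, 15, size=(len(labels), 3))

# One panel per colour channel, one overlaid histogram per class
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for channel, ax in enumerate(axes):
    for name in base:
        ax.hist(features[labels == name, channel], bins=15, alpha=0.5, label=name)
    ax.set_title("mean " + "RGB"[channel])
    ax.legend()
fig.savefig("colour_histograms.png")
```

If the class histograms in a panel barely overlap, that channel separates the classes well; heavy overlap in all three panels suggests you need different features.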

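To go from your 4D (image x height x width x colour) array to a 2D (image x average colour) array, take np.mean over the height and width dimensions only, not over the colour dimension. In the end you should have an (images x 3 colours) array. Note that stacking the images into one 4D array requires them all to have the same size; otherwise, take the mean of each image separately.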
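A minimal sketch of the mean-colour approach, assuming `img_array` is a list of RGB arrays of shape (height, width, 3) like the one built in the question. The arrays below are random stand-ins of deliberately different sizes, so averaging each image over its own axes 0 and 1 is used instead of stacking.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for img_array: RGB arrays of shape (height, width, 3),
# deliberately of different sizes, as real photos usually are
rng = np.random.default_rng(1)
sizes = [(48, 64), (64, 64), (32, 80), (60, 40), (50, 50), (70, 30)]
img_array = [rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8) for h, w in sizes]

# Average each image over height and width (axes 0 and 1), keeping the colour axis
mean_colours = np.array([img.mean(axis=(0, 1)) for img in img_array])
print(mean_colours.shape)  # (6, 3): one (R, G, B) row per image

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(mean_colours)
print(kmeans.labels_)  # cluster index (0, 1 or 2) for each image
```

With real data, drop the stand-in lines and use the question's `img_array` directly. If all images share one size, `np.stack(img_array).mean(axis=(1, 2))` gives the same (n_images, 3) result from the 4D array.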