
I have a set of "good" and "bad" images, represented as gray-scale arrays. I would like to extract "good" and "bad" features from these images and populate a dictionary. Here is my high-level algorithm for this task:

  1. Load the images and represent them as a NumPy matrix img_mtx [img_mtx.shape = (10, 255, 255)]
  2. Use image.PatchExtractor over img_mtx to get 1000 patches per image, 10000 7x7-pixel patches in total [patches.shape = (10000, 49)]
  3. Next, I assume my patches matrix is something like a bag of words, so I want to create a sparse matrix of patches for each image and assign a "good" or "bad" class to each image.
  4. At that point I should have a fairly classic classification task, like spam detection, and simply adding more images to the training set should give a good result.
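Steps 1 and 2 might look roughly like the sketch below. It is only an illustration: random data stands in for the real images, and the names `rng` and `image_idx` are made up for the example. `image_idx` records which image each patch came from, which is needed later to group patches per image.

```python
import numpy as np
from sklearn.feature_extraction.image import PatchExtractor

rng = np.random.RandomState(0)
# Stand-in for the real data: 10 gray-scale images of 255x255 pixels.
img_mtx = rng.rand(10, 255, 255)

# Draw 1000 random 7x7 patches from every image.
extractor = PatchExtractor(patch_size=(7, 7), max_patches=1000, random_state=0)
patches = extractor.fit(img_mtx).transform(img_mtx)  # shape (10000, 7, 7)
patches = patches.reshape(len(patches), -1)          # shape (10000, 49)

# Patches are returned grouped by image, so this maps each patch
# back to the index of the image it was taken from.
image_idx = np.repeat(np.arange(len(img_mtx)), 1000)
```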

But I have some problems here:

  • How do I implement step 3? I've seen examples for text classification, but not for image classification.
  • When I need to classify a new image, I again split it into patches, but now I need to map the patches from the new image onto my "patch dictionary". What is the best way to do this, keeping in mind that I may never get a 100% match with the dictionary? It looks like I need to find the closest dictionary feature for each patch, but this sounds expensive.

... or have I taken a completely wrong approach to this task?


1 Answer


You should first think about what good features for your task would be. You should also think about whether your images are always the same shape and aligned. If you think describing patches is a good idea, you may want to look into standard image features like SIFT, SURF or BRIEF (maybe look into scikit-image, opencv or mahotas), though using just the raw patches is a possible first step. If you want to use patch descriptors and throw away the spatial arrangement (which is the bag-of-words approach), you need to cluster the descriptors and then build a histogram over the resulting "words" for each image. Then you can train on the histograms and get a single prediction for the whole image. There is a vast amount of literature on this; I'm not sure what a good starting point would be, though. Maybe look into Szeliski's book on computer vision.
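A minimal sketch of the cluster-then-histogram idea, using raw 7x7 patches as descriptors. The specific choices here are my assumptions, not requirements: `MiniBatchKMeans` for the clustering, `LogisticRegression` as the classifier, a vocabulary of 50 words, 200 patches per image, and random arrays standing in for real images and labels. The helpers `extract_patches` and `to_histograms` are hypothetical names written for this example.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import PatchExtractor
from sklearn.linear_model import LogisticRegression

N_WORDS = 50             # vocabulary size (assumed; tune for real data)
PATCHES_PER_IMAGE = 200  # patches sampled per image (assumed)

def extract_patches(images, seed=0):
    """Return raw 7x7 patches with shape (n_images, PATCHES_PER_IMAGE, 49)."""
    extractor = PatchExtractor(patch_size=(7, 7),
                               max_patches=PATCHES_PER_IMAGE,
                               random_state=seed)
    patches = extractor.fit(images).transform(images)
    return patches.reshape(len(images), PATCHES_PER_IMAGE, -1)

def to_histograms(kmeans, patches_per_image):
    """Assign each patch to its nearest cluster and count words per image."""
    n_images, n_patches, n_features = patches_per_image.shape
    words = kmeans.predict(patches_per_image.reshape(-1, n_features))
    words = words.reshape(n_images, n_patches)
    hists = np.stack([np.bincount(w, minlength=kmeans.n_clusters)
                      for w in words])
    return hists / n_patches  # normalize so each row sums to 1

# Placeholder data: 10 random images with alternating 0/1 labels.
rng = np.random.RandomState(0)
images = rng.rand(10, 64, 64)
labels = np.array([0, 1] * 5)

# 1. Learn the visual vocabulary from all training patches.
train_patches = extract_patches(images)
kmeans = MiniBatchKMeans(n_clusters=N_WORDS, random_state=0, n_init=3)
kmeans.fit(train_patches.reshape(-1, train_patches.shape[-1]))

# 2. One histogram per image, then an ordinary classifier on top.
X_train = to_histograms(kmeans, train_patches)
clf = LogisticRegression(max_iter=1000).fit(X_train, labels)

# 3. A new image goes through the same patch -> word -> histogram steps.
new_image = rng.rand(1, 64, 64)
X_new = to_histograms(kmeans, extract_patches(new_image))
prediction = clf.predict(X_new)
```

Note that `kmeans.predict` assigns every patch to its nearest cluster center, so a patch from a new image never needs an exact match in the dictionary, and the lookup cost scales with the number of words rather than with the number of training patches.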