0
votes

I am messing around with sparse coding from scikit-learn and I want to try to classify images. I have images of size 128 x 128. From these I extract random 7x7 patches to feed to k-means with 100 centroids, which gives me a dictionary of 100 atoms. Given an image to classify, I first extract patches from it with extract_patches_2d, which if I am not mistaken is also called convolutional sampling. This means I have (128-7+1)^2 = 14884 patches per image. I can encode every patch against my dictionary with orthogonal matching pursuit, leaving me with a (128-7+1)^2 x 100 = 14884 x 100 matrix of (sparse) features.
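For concreteness, here is a minimal sketch of that pipeline on a single random image (real use would fit k-means on patches pooled from many training images; the sparsity level of 5 non-zero coefficients is an arbitrary assumption):

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import SparseCoder

rng = np.random.RandomState(0)
image = rng.rand(128, 128)

# Convolutional sampling: all (128-7+1)^2 = 14884 overlapping 7x7 patches.
patches = extract_patches_2d(image, (7, 7))          # shape (14884, 7, 7)
patches = patches.reshape(len(patches), -1)          # shape (14884, 49)

# Learn a 100-atom dictionary with k-means over the patches.
kmeans = MiniBatchKMeans(n_clusters=100, random_state=0).fit(patches)
dictionary = kmeans.cluster_centers_                 # shape (100, 49)
# OMP assumes unit-norm atoms, so normalize the centroids.
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

# Encode every patch with orthogonal matching pursuit.
coder = SparseCoder(dictionary=dictionary,
                    transform_algorithm='omp',
                    transform_n_nonzero_coefs=5)     # sparsity level: assumption
codes = coder.transform(patches)                     # shape (14884, 100), sparse rows
print(codes.shape)
```

Each row of `codes` has at most 5 non-zero entries, one row per patch position.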

What would be the next step to transform this (14884, 100) matrix into a single feature vector? From what I am reading this is done with average or max pooling, but I can't quite figure out how that works given this matrix.

1

1 Answer

1
votes

Are your images natural images or do they come from some very specific setup or scientific imaging? If you want to classify natural images, I recommend you look into either feature extraction using neural networks, or handcrafted descriptors like SIFT (for example try DAISY from scikit-image).

To answer your question: to do max-pooling or average-pooling, you first need to decide whether you want to keep locality in the image. If not, you can just take the max or average over the 14884 rows, which leaves you with a single 100-dimensional feature vector per image. If you want to keep locality, put a 3x3 or similar grid over the image and take the average / max only over those patches whose positions fall within a given grid cell. That would give you, for example, 3x3x100 = 900 features per image.
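The locality-preserving variant can be sketched like this with plain NumPy (the 3x3 grid and the choice of max over mean are just the example values from above; `codes` stands in for your OMP code matrix):

```python
import numpy as np

n_side = 128 - 7 + 1                                  # 122 patch positions per axis
codes = np.random.rand(n_side * n_side, 100)          # stand-in for the (14884, 100) codes

# Recover each patch's spatial position: row-major order matches extract_patches_2d.
codes = codes.reshape(n_side, n_side, 100)

grid = 3
edges = np.linspace(0, n_side, grid + 1).astype(int)  # grid-cell boundaries

pooled = np.empty((grid, grid, 100))
for i in range(grid):
    for j in range(grid):
        cell = codes[edges[i]:edges[i + 1], edges[j]:edges[j + 1]]
        pooled[i, j] = cell.max(axis=(0, 1))          # or cell.mean(axis=(0, 1))

feature_vector = pooled.ravel()                       # length 3*3*100 = 900
print(feature_vector.shape)
```

Dropping the grid (pooling over the whole image at once) is the `grid = 1` special case, which collapses everything to the 100-dimensional vector described first.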