I'm messing around with sparse coding in scikit-learn and want to try to classify images. My images are 128 x 128; from each one I extract random 7x7 patches and feed them to k-means with 100 centroids. This means I have a dictionary of 100 atoms.
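Here is roughly what my dictionary-learning step looks like (a minimal sketch; the random image and `max_patches=1000` are placeholders for my actual training data):

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# stand-in 128x128 grayscale image (in practice I loop over my training set)
image = rng.rand(128, 128)

# sample random 7x7 patches and flatten each to a 49-dim vector
patches = extract_patches_2d(image, (7, 7), max_patches=1000, random_state=0)
patches = patches.reshape(len(patches), -1)

# the 100 cluster centers serve as the dictionary atoms
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(patches)
dictionary = kmeans.cluster_centers_  # shape (100, 49)
```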
So given an image to classify, I first extract patches from it with extract_patches_2d, which if I am not mistaken is also called convolutional sampling. This gives me (128-7+1)^2 = 14884 patches per image. I can encode every patch against my dictionary with orthogonal matching pursuit, leaving me with 14884 * 100 (sparse) features.
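My encoding step looks roughly like this (a sketch: the random image and random dictionary stand in for my real data and the learned k-means centroids, and `transform_n_nonzero_coefs=5` is an arbitrary sparsity choice):

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import SparseCoder

rng = np.random.RandomState(0)
image = rng.rand(128, 128)
dictionary = rng.rand(100, 49)  # placeholder for the k-means centroids

# convolutional sampling: every overlapping 7x7 patch -> (122*122, 49)
patches = extract_patches_2d(image, (7, 7)).reshape(-1, 49)

# OMP expects unit-norm atoms
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

coder = SparseCoder(dictionary=dictionary,
                    transform_algorithm='omp',
                    transform_n_nonzero_coefs=5)
codes = coder.transform(patches)  # shape (14884, 100)
```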
What would be the next step to transform this (14884, 100) matrix into a single feature vector? From what I am reading this is done with average or max pooling, but I can't quite figure out how that works given this matrix.
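My best guess so far is that I should restore the 122x122 spatial layout of the codes, split it into regions, and pool each atom's activations per region (the 2x2 quadrant split and the use of max are my assumptions). Is something like this right?

```python
import numpy as np

# placeholder for the (14884, 100) sparse codes from the encoding step
codes = np.abs(np.random.RandomState(0).rand(14884, 100))

# one 100-dim code per patch position -> (122, 122, 100)
spatial = codes.reshape(122, 122, 100)

# max-pool each atom over four spatial quadrants -> 4 * 100 = 400 features
h = 122 // 2
quadrants = [spatial[:h, :h], spatial[:h, h:],
             spatial[h:, :h], spatial[h:, h:]]
feature_vector = np.concatenate([q.max(axis=(0, 1)) for q in quadrants])
```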