I would take the two images and scale them down to a much lower resolution, something like a 16 x 16 grid. Mark each cell in the grid as on or off (drawn in or not drawn in).
Then overlay the two grids and count how many cells are set in one but not in the other. If that count is over a threshold, flag it as not a match.
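Here is a rough sketch of that idea in Python. It assumes each image is already available as a list of rows of 0/1 values (1 meaning drawn); the function names, the "any drawn pixel turns the cell on" rule, and the default threshold of 40 are all my own placeholders, so tune them against real drawings.

```python
def to_grid(pixels, size=16):
    """Reduce a 2D list of 0/1 pixels to a size x size on/off grid.

    Assumption: a cell is "on" if any source pixel that falls into it is drawn.
    """
    height, width = len(pixels), len(pixels[0])
    grid = [[False] * size for _ in range(size)]
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            if value:
                grid[y * size // height][x * size // width] = True
    return grid


def mismatches(grid_a, grid_b):
    """Count cells that are on in one grid and off in the other."""
    return sum(
        a != b
        for row_a, row_b in zip(grid_a, grid_b)
        for a, b in zip(row_a, row_b)
    )


def is_match(pixels_a, pixels_b, size=16, threshold=40):
    """Return True if the grids differ in at most `threshold` cells.

    The default threshold is a guess, not a recommended value.
    """
    grid_a = to_grid(pixels_a, size)
    grid_b = to_grid(pixels_b, size)
    return mismatches(grid_a, grid_b) <= threshold
```

With a 16 x 16 grid there are 256 cells, so the threshold is effectively "how many of the 256 cells may disagree".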
You could improve the algorithm by scaling the drawn image. Find the topmost and bottom-most drawn pixels in both images, and scale the drawn image so its height matches that of the reference image. You could do the same with width, using the leftmost and rightmost drawn pixels. This way a player wouldn't be penalized for drawing a good, but smaller, version of the picture.
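A possible sketch of the scaling step, again assuming images are lists of rows of 0/1 values. The helper names are hypothetical, and I use nearest-neighbour sampling only because it is the simplest thing that works; it can drop thin lines when shrinking, so a real implementation might prefer an image library's resize.

```python
def crop_to_drawing(pixels):
    """Return the sub-image bounded by the outermost drawn pixels."""
    rows = [y for y, row in enumerate(pixels) if any(row)]
    cols = [x for x in range(len(pixels[0])) if any(row[x] for row in pixels)]
    if not rows:
        return pixels  # nothing drawn; leave the image unchanged
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]


def scale_nearest(pixels, new_width, new_height):
    """Resize a 2D list with nearest-neighbour sampling."""
    old_height, old_width = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def match_size(drawn, reference):
    """Crop the drawing and stretch it to the reference's drawn extent."""
    ref = crop_to_drawing(reference)
    return scale_nearest(crop_to_drawing(drawn), len(ref[0]), len(ref))
```

The result of `match_size(drawn, reference)` has the same dimensions as `crop_to_drawing(reference)`, so those two are the images you would then reduce to grids and compare.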
Another improvement would be to do multiple comparisons, shifting the drawn image left and right, up and down by a few cells each time, and taking the 'best' match. That way you won't get penalized for drawing something offset from the center.
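The shifting could look something like the sketch below, working on the on/off grids rather than the full images. The names and the `max_shift=2` default are my own assumptions, and cells that slide in from outside the grid are treated as off.

```python
def shifted_mismatches(grid_a, grid_b, dx, dy):
    """Count differing cells with grid_b moved dx cells right and dy cells down.

    Cells shifted in from outside grid_b are treated as off.
    """
    size_y, size_x = len(grid_a), len(grid_a[0])
    count = 0
    for y in range(size_y):
        for x in range(size_x):
            sy, sx = y - dy, x - dx  # where this cell comes from in grid_b
            inside = 0 <= sy < size_y and 0 <= sx < size_x
            b = bool(grid_b[sy][sx]) if inside else False
            if bool(grid_a[y][x]) != b:
                count += 1
    return count


def best_mismatches(grid_a, grid_b, max_shift=2):
    """Try every offset within max_shift cells and keep the lowest count."""
    return min(
        shifted_mismatches(grid_a, grid_b, dx, dy)
        for dx in range(-max_shift, max_shift + 1)
        for dy in range(-max_shift, max_shift + 1)
    )
```

With `max_shift=2` this is 25 comparisons of 256 cells each, which is cheap enough to run on every drawing.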
It's all a bit hacky, but I think going this route is probably more practical than trying to parse individual strokes or bring in OCR- or gesture-recognition algorithms.