I need a simple and fast way to compare two images for similarity. That is, I want to get a high value if they show exactly the same thing, even if the background differs slightly or the content is moved / resized by a few pixels.
(More concretely, if that matters: one picture is an icon and the other is a subarea of a screenshot, and I want to know whether that subarea is exactly the icon or not.)
I have OpenCV at hand, but I am not yet very familiar with it.
One possibility I have thought about so far: divide both pictures into 10x10 cells and, for each of those 100 cells, compare the color histograms. Then I can set some made-up threshold, and if the value I get is above that threshold, I assume that the images are similar.
I haven't tried yet how well that works, but I guess it would be good enough. The images are already very similar in my use case, so I can use a fairly high threshold.
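To make the idea concrete, here is a minimal sketch of the cell-histogram comparison in Python with NumPy. It has not been tried on real icons. The names `cell_histogram` and `grid_similarity`, the 8 bins per channel, and histogram intersection as the comparison measure are my own assumptions, and it expects color images of shape H x W x C (`uint8`) that are at least 10 pixels on each side.

```python
import numpy as np

def cell_histogram(cell, bins=8):
    """Normalized per-channel color histogram of one cell (H x W x C)."""
    hists = [np.histogram(cell[..., c], bins=bins, range=(0, 256))[0]
             for c in range(cell.shape[-1])]
    hist = np.concatenate(hists).astype(np.float64)
    total = hist.sum()
    # An empty cell yields an all-zero histogram instead of dividing by zero.
    return hist / total if total > 0 else hist

def grid_similarity(img_a, img_b, grid=10, bins=8):
    """Mean histogram intersection over a grid x grid partition.

    Returns a value in [0, 1]; 1.0 means every pair of cells has
    identical histograms. The two images may differ in size.
    """
    def edges(n):
        # Cell boundaries along one axis of length n.
        return np.linspace(0, n, grid + 1).astype(int)

    ya, xa = edges(img_a.shape[0]), edges(img_a.shape[1])
    yb, xb = edges(img_b.shape[0]), edges(img_b.shape[1])
    scores = []
    for i in range(grid):
        for j in range(grid):
            ha = cell_histogram(img_a[ya[i]:ya[i + 1], xa[j]:xa[j + 1]], bins)
            hb = cell_histogram(img_b[yb[i]:yb[i + 1], xb[j]:xb[j + 1]], bins)
            # Histogram intersection: overlap of the two normalized histograms.
            scores.append(np.minimum(ha, hb).sum())
    return float(np.mean(scores))
```

The threshold on the returned score would still have to be tuned by hand. With OpenCV, `cv2.calcHist` and `cv2.compareHist` could replace the NumPy histogram and intersection steps.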
I guess there are dozens of other possible solutions that would work more or less well (the task itself is quite simple, as I only want to detect similarity when the images are really very similar). What would you suggest?
There are a few closely related questions about obtaining a signature/fingerprint/hash from an image:
- OpenCV / SURF How to generate a image hash / fingerprint / signature out of the descriptors?
- Image fingerprint to compare similarity of many images
- Near-Duplicate Image Detection
- OpenCV: Fingerprint Image and Compare Against Database.
- more, more, more, more, more, more, more
Also, I stumbled upon these implementations which have such functions to obtain a fingerprint:
- pHash
- imgSeek (GitHub repo) (GPL) based on the paper Fast Multiresolution Image Querying
- image-match. Very similar to what I was searching for. Similar to pHash, based on An image signature for any kind of image, Goldberg et al. Uses Python and Elasticsearch.
- iqdb
- ImageHash. Supports pHash.
- Image Deduplicator (imagededup). Supports CNN, PHash, DHash, WHash, AHash.
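For illustration, the fingerprint idea behind these tools can be sketched with a very simple average hash in Python with NumPy. This is not the algorithm of pHash or of any library listed above, which may resize and threshold differently; `average_hash`, `hamming_distance`, and the 8x8 hash size are assumptions made for this sketch. The input is assumed to be a 2-D grayscale array at least 8 pixels on each side.

```python
import numpy as np

def average_hash(gray, hash_size=8):
    """Return hash_size**2 bits: True where a block is brighter than average."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    ys = np.linspace(0, h, hash_size + 1).astype(int)
    xs = np.linspace(0, w, hash_size + 1).astype(int)
    # Shrink the image to hash_size x hash_size by averaging each block.
    small = np.array([[gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                       for j in range(hash_size)]
                      for i in range(hash_size)])
    # Comparing against the mean makes the bits insensitive to a uniform
    # brightness shift.
    return (small > small.mean()).ravel()

def hamming_distance(hash_a, hash_b):
    """Number of differing bits; 0 means the fingerprints are identical."""
    return int(np.count_nonzero(hash_a != hash_b))
```

Two images would then count as similar if the Hamming distance between their hashes is below some small, hand-tuned limit. Because the image is reduced to a fixed grid first, a resized copy tends to produce the same or a nearby hash.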
Some discussions about perceptual image hashes: here
A bit off-topic: There exist many methods to create audio fingerprints. MusicBrainz, a web service which provides fingerprint-based lookup for songs, has a good overview in their wiki. They are using AcoustID now. This is for finding exact (or mostly exact) matches. For finding similar matches (or if you only have some snippets or high noise), take a look at Echoprint. A related SO question is here. So it seems like this is solved for audio. All these solutions work quite well.
A somewhat more generic question about fuzzy search in general is here. E.g. there is locality-sensitive hashing and nearest neighbor search.
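As a rough illustration of how locality-sensitive hashing could be applied to bit fingerprints, here is a minimal bit-sampling sketch in plain Python. The class name `BitSamplingLSH` and the parameters `n_tables` and `bits_per_key` are hypothetical choices for this example, and fingerprints are assumed to be equal-length sequences of 0/1 values. Each table keys items by a random subset of bit positions, so fingerprints that differ in only a few bits are likely, though not guaranteed, to share a bucket in at least one table.

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """Approximate nearest-neighbor lookup for bit fingerprints (Hamming distance)."""

    def __init__(self, n_bits, n_tables=8, bits_per_key=12, seed=0):
        rng = random.Random(seed)
        # Each table looks at its own random subset of bit positions.
        self.positions = [rng.sample(range(n_bits), bits_per_key)
                          for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = {}

    def _keys(self, bits):
        for pos, table in zip(self.positions, self.tables):
            yield table, tuple(bits[p] for p in pos)

    def add(self, name, bits):
        self.items[name] = bits
        for table, key in self._keys(bits):
            table[key].append(name)

    def query(self, bits, max_distance):
        # Collect candidates that share a bucket in at least one table.
        candidates = set()
        for table, key in self._keys(bits):
            candidates.update(table.get(key, ()))
        # Verify candidates with the exact Hamming distance.
        results = []
        for name in candidates:
            dist = sum(x != y for x, y in zip(bits, self.items[name]))
            if dist <= max_distance:
                results.append((dist, name))
        return sorted(results)
```

The exact-distance check at the end removes false positives, but near matches can still be missed if they never collide in any table; more tables or fewer bits per key trade speed for recall.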