2
votes

I am very new to image processing and image matching and don't understand them very clearly. What I need to do is: a) take an image; b) extract features from it (SIFT and SURF are better for matching); c) create a hash (like MD5 or SHA1); d) store it in the database and search other images to see if any are similar.

Basically, something like TinEye.

I referred to the question "OpenCV / SURF How to generate a image hash / fingerprint / signature out of the descriptors?". I also checked pHash and tried to run SIFT/SURF matching via OpenCV's simple_matcher.cpp.

I have read a little about Geometric Hashing and Locality-Sensitive Hashing, but I am not sure if I am going in the right direction.

How could I create a hash from features extracted with SIFT/SURF (OpenCV)? I would be grateful if someone could outline the simple steps to follow or point me to a reference to move forward.

1
Is your end goal to match one image to a database of many? – kamjagin
@kamjagin Yes. I am trying to build a small application: suppose we find 500 images on one laptop and 100 images on another. I want to find out whether any images have been shared between them. The images may have been modified, so I can't just MD5-hash them. – bitvijays

1 Answer

3
votes

OK, there are a ton of nice ways of matching images, with varying levels of complexity. I will suggest one that I think is good enough for the problem you described and really simple to implement (since you say that you are very new to CV :) ).

  1. Compute sparse or dense SURF features on the images on computer1
  2. Create a vocabulary of visual words (for this task, generating a random one is probably also good enough)
  3. Assign each feature to its nearest vocabulary word (nearest neighbour)
  4. Build a kd-tree (to use for nearest-neighbour search) or learn some classifier (like an SVM)
  5. Apply the classifier to the images on computer2 (after having computed SURF features and assigned them to the vocabulary)

The same images will most likely produce the highest classification scores.
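
To make steps 2 to 5 a bit more concrete, here is a minimal pure-Python sketch of the bag-of-visual-words idea. It is only an illustration under a few assumptions: the descriptors are plain lists of floats (in practice they would come from a SIFT/SURF extractor), the function names are made up for this example, and the kd-tree/classifier of step 4 is swapped for a brute-force cosine-similarity comparison of histograms, which should be fine for a few hundred images.

```python
import math
import random


def random_vocabulary(descriptors, k, seed=0):
    """Step 2: pick k descriptors at random to act as visual words."""
    rng = random.Random(seed)
    return rng.sample(list(descriptors), k)


def nearest_word(descriptor, vocabulary):
    """Step 3: index of the closest visual word (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, word in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bow_histogram(descriptors, vocabulary):
    """L2-normalised histogram of visual-word counts for one image."""
    counts = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def best_match(query_hist, database):
    """Steps 4-5 (simplified): return (name, cosine similarity) of the
    stored histogram most similar to the query histogram."""
    scored = [
        (sum(q * h for q, h in zip(query_hist, hist)), name)
        for name, hist in database.items()
    ]
    score, name = max(scored)
    return name, score
```

A rough usage pattern: build one histogram per image on computer1 and store them in a dict keyed by file name, then call `best_match` with the histogram of each image from computer2 and keep the pairs whose score is above a threshold you choose by experiment.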

The reason I suggest this approach over the faster hashing-based approaches is that you are unlikely to have performance issues with as few as ~500 images, and there is a nice example in OpenCV (bagofwords_classification.cpp) that you can follow step by step to achieve what you want.