11
votes

I am working on a project where I have to detect a known picture in a scene in "real time" in a mobile context (that means I'm capturing frames using a smartphone camera and resizing each frame to 150x225). The picture itself can be rather complex. Right now, processing each frame takes 1.2 s on average (using OpenCV). I'm looking for ways to improve this processing time and the overall accuracy. My current implementation works as follows:

  1. Capture the frame
  2. Convert it to grayscale
  3. Detect the keypoints and extract their descriptors using ORB
  4. Match the descriptors (2-NN, object -> scene) and filter them with the ratio test
  5. Match the descriptors (2-NN, scene -> object) and filter them with the ratio test
  6. Remove the non-symmetrical matches using the results of 4 and 5
  7. Compute the matching confidence (% of matched keypoints out of the total keypoints)

My approach might not be the right one, but the results are OK even though there's a lot of room for improvement. I already noticed that SURF extraction is too slow, and I couldn't manage to get homography estimation working (it might be related to ORB). All suggestions are welcome!

2
When you profile this process, how long does each listed step take? What part of the 1.2 s does each listed item account for? – Brad Larson♦
On average, the grayscale conversion takes 15 ms, the detection and extraction phase 300 ms, and the rest (~900 ms) is spent in the matching phase. – Cladouros
I've been attempting the same process myself, only doing it entirely on-GPU. I have everything up to the keypoint detection (using Harris corners, although I'm working on a FAST corner implementation), and am working on the rest. I was able to detect and extract keypoints for a 640x480 RGB frame in ~60 ms on an iPhone 4, although I think I caused the performance to regress a little recently with some failed optimizations. I've seen a few fast GPU-bound brute-force matchers that I'm thinking of applying here. The code for what I have so far can be found here: github.com/BradLarson/GPUImage – Brad Larson♦
Great work, I'm definitely going to take a close look at it. – Cladouros

2 Answers

7
votes

Performance is always an issue on mobiles :)

There are a few things you can do. OpenCV: C++ and C performance comparison explains general methods for improving processing time.

And some specifics for your project:

  • If you capture color images and then convert them to grayscale, that is a big waste of resources. The native camera format is YUV. It gets converted to RGB, which is costly, then to gray, which is costly again. All this while the first channel of YUV (Y) is the grayscale image... So capture YUV, and extract the first channel by copying the first part of the image data (YUV on Android is planar, which means that the first w*h bytes belong to the Y channel)
  • ORB was created to be fast. And it is. But just a few weeks ago FREAK was added to OpenCV. That is a new descriptor, which its authors claim is both more accurate and faster than ORB/SIFT/SURF/etc. Give it a try. You can find it in OpenCV >= 2.4.2 (the current version at the time of writing)
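
The Y-plane trick from the first bullet needs no color-conversion call at all: for an NV21/YUV420 buffer (what Android's camera preview delivers by default), the grayscale image is literally the first width*height bytes. A sketch, with a function name of my own:

```python
import numpy as np


def y_plane(yuv_bytes, width, height):
    """Extract the grayscale (Y) plane from an NV21/YUV420 buffer.

    The Y plane occupies the first width*height bytes, so no
    YUV -> RGB -> gray conversion is needed at all.
    """
    y = np.frombuffer(yuv_bytes, dtype=np.uint8, count=width * height)
    return y.reshape(height, width)
```

On Android this is the byte array handed to the preview callback; the chroma bytes after the Y plane are simply ignored.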

EDIT

Brad Larson's question is illuminating – if the matching takes 900 ms, then that's the problem! Check this post by Andrey Kamaev, How Does OpenCV ORB Feature Detector Work?, where he explains the possible combinations between descriptors and matchers. Try the FLANN-based uchar matcher.

And also, I suppose you get an awful lot of detections – hundreds or thousands – if matching them takes that long. Try to limit the detections, or keep only the n best keypoints.

3
votes

You should try FAST to detect the object in the scene; it is faster than SURF, and you can find articles that use a pyramidal version of FAST. To improve performance on mobiles you can also optimize loops, use fixed-point arithmetic, etc. Good luck.