Contour detection takes up the majority of my time in my computer vision, and it needs to be faster. I've so optimized everything else via NEON instructions and vectorizing that really, the contour detection dominates the profile. Unfortunately, it isn't obvious to me how to optimize this.
I'm doing the classic rectangle detection process to find fiducial markers, i.e. cvFindContours(), followed by approximating squares from the contours. In cases with many, many markers visible (or catastrophically, when a dense grid of rectangles that aren't markers is visible), the call to cvFindContours() alone can take >30ms on an iPhone.
I've already replaced the incredibly expensive C++ cv::FindContours() with cvFindContours(). Particularly if passed a vector >, the C++ version spent longer allocating and populating vectors than than its internal cvFindContours() took!
Now, I'm bound completely by the time in cvFindContours, or more specifically in cvFindNextContour(). The code inside cvFindNextContour is branch-heavy, and not obviously easy to vectorize. It also implements a complex algorithm that I don't trust myself not to get wrong in any attempt to optimize.
I have already looked at cvBlobLib (for disambiguition, I mean this one: http://code.google.com/p/cvblob/) to see if it provided alternative algorithms that could do the same thing faster. A base download of the source is incredibly slow because it records the contours into a std::list(), and spends almost all of its time in memory allocation. Replacing that list with a std::vector pre-sized to 256 elements to eliminate the initial copies on push_back() still leaves you with a function that takes 3x longer than cvFindContours(), 66% of that directly in cvb::cvLabel(). So it doesn't seem viable to go this way.
Does anyone have any ideas for how I can optimize detection of many rectangles. My vague handwaving includes:
Are there any fast implementations equivalent to cvFindContour(), ideally as source code as I'm multiplatform, out there?
The majority of contours are not required, only the "successful" rectangles are useful. In particular, their internal contours are then not useful. Theoretically, could I not call cvFindContours at all, and instead call cvStartFindContours/cvFindNextContour, testing each contour as found, and not recursing if I found a rectangle I'm looking for, as subrectangles are then guaranteed to be useless?
Is there a completely different rectangle detection algorithm I can use from the classic FindContours()/ApproxPoly() approach?
IS there a way to "prime" cvFindContours with useful regions of interest? E.g. a FAST corner detection almost always returns my fiducial marker corners even with a very agressive threshold. Is there a way to use that point set to limit detection? (Unfortunately, I'm not sure how much this helps, again in the case of many markers, or dense gridlines unrelated to the markers, which happens often in my app.)
In the same vein as above, since Blob detection can (if I understand correctly) be implemented as recursive flood-filling, are there any fast vectorized implementations of this that can then be used to somehow pull out interesting Blob rectangles, and seed contour detection from there?
Any ideas would be welcome!