What's the common way to merge detection windows in OpenCV / computer vision?

Question

I've written my own pedestrian detection algorithm (I need it for research purposes, so I decided not to use the supplied HOG detector).

After detection, I have many overlapping rectangles around each detected object / human. I then apply non-maxima suppression to retain the local maxima. However, overlapping rectangles remain at locations outside the search range of the non-maxima suppression algorithm.

How would you merge the rectangles? I tried to use groupRectangles, but I'm lost as to how it arrives at its result (e.g. groupRectangles( rects, 1.0, 0.2 )).

I applied a rudimentary merging algorithm that merges rectangles whose overlap reaches a certain percentage of the smaller rectangle's area; the code is shown below.

/**
 * Merge rectangles whose overlap is at least the specified fraction of the
 * smaller rectangle's area
 * @param   boxes a set of rectangles to be merged
 * @param   overlap the minimum overlap, as a fraction of the smaller rectangle's area, before 2 rectangles are merged
 * @param   group_threshold only groups containing more than group_threshold rectangles will be retained
 * @return  a set of merged rectangles
 **/
vector<Rect> Util::mergeRectangles( const vector<Rect>& boxes, float overlap, int group_threshold ) {
    vector<Rect> output;
    vector<Rect> intersected;
    vector< vector<Rect> > partitions;
    vector<Rect> rects( boxes.begin(), boxes.end() );

    while( rects.size() > 0 ) {
        Rect a      = rects[rects.size() - 1];
        int a_area  = a.area();
        rects.pop_back();

        if( partitions.empty() ) {
            vector<Rect> vec;
            vec.push_back( a );
            partitions.push_back( vec );
        }
        else {
            bool merge = false;
            for( int i = 0; i < partitions.size(); i++ ){

                for( int j = 0; j < partitions[i].size(); j++ ) {
                    Rect b = partitions[i][j];
                    int b_area = b.area();

                    Rect intersect = a & b;
                    int intersect_area = intersect.area();

                    // All three cases compare the intersection against the smaller area
                    int smaller_area = ( a_area < b_area ) ? a_area : b_area;
                    if( intersect_area >= overlap * smaller_area )
                        merge = true;

                    if( merge )
                        break;
                }

                if( merge ) {
                    partitions[i].push_back( a );
                    break;
                }
            }

            if( !merge ) {
                vector<Rect> vec;
                vec.push_back( a );
                partitions.push_back( vec );
            }
        }
    }

    for( int i = 0; i < partitions.size(); i++ ) {
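        // Cast so a negative group_threshold is not converted to a huge unsigned value
        if( static_cast<int>( partitions[i].size() ) <= group_threshold )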
            continue;

        Rect merged = partitions[i][0];
        for( int j = 1; j < partitions[i].size(); j++ ) {
            merged |= partitions[i][j];
        }

        output.push_back( merged );

    }

    return output;
}

However, what I'd like to know is whether this is actually an accepted way to merge rectangles in computer vision, especially when I want to check the precision and recall of my algorithm. My approach seems too simplistic at times, and every merged rectangle gets bigger and bigger, mainly because of merged |= partitions[i][j]; which finds the minimum rectangle that encloses both rectangles.

If this is an acceptable way to merge detection windows, what's a common value for the merging overlap (i.e. merge if the overlap area >= what percentage)?

Nallath Nallath · Accepted Answer · 2013-04-25T07:44:53

I dare say that there is no "accepted" way to merge areas of interest. Even the overlap percentage at which to merge depends entirely on what you are trying to do.

You could try some sort of weighting/voting mechanism that gives a larger weight to certain observations, based on the size of the originally detected square (other things could be used as well, such as the amount of overlap with others or the number of overlaps).
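A minimal sketch of the weighting idea, under stated assumptions: Box is a hypothetical stand-in for cv::Rect so the snippet compiles without OpenCV, weightedMerge is a made-up helper name, and weighting by area is only one possible reading of "size of the original detected square" (a detector score would work the same way). Instead of taking the enclosing rectangle, it averages the coordinates of one group, so the result does not keep growing.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for cv::Rect (assumption, not OpenCV's type).
struct Box {
    int x, y, width, height;
    int area() const { return width * height; }
};

// Merge one group of overlapping detections into a single box by taking an
// area-weighted average of position and size. Larger windows get more say.
// The choice of area as the weight is an assumption made for illustration.
Box weightedMerge( const std::vector<Box>& group ) {
    assert( !group.empty() );

    double sum_w = 0.0, x = 0.0, y = 0.0, w = 0.0, h = 0.0;
    for( const Box& b : group ) {
        double weight = b.area();
        sum_w += weight;
        x += weight * b.x;
        y += weight * b.y;
        w += weight * b.width;
        h += weight * b.height;
    }

    // If every box is degenerate (zero area), fall back to the first one.
    if( sum_w == 0.0 )
        return group[0];

    return Box{ static_cast<int>( std::lround( x / sum_w ) ),
                static_cast<int>( std::lround( y / sum_w ) ),
                static_cast<int>( std::lround( w / sum_w ) ),
                static_cast<int>( std::lround( h / sum_w ) ) };
}
```

In the question's code this could be called once per partition in place of the merged |= partitions[i][j] loop, after copying each cv::Rect into a Box.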

You could also merge the found squares into a mask: an image in which every pixel inside a square is white and everything else is black. Applying that mask to the original image gives a fairly exact set of "merged" areas of interest that are exactly as large as the squares you found.
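The mask idea can be sketched without OpenCV as follows. Everything named here is an assumption for illustration: Box stands in for cv::Rect, the mask is a plain row-major byte vector rather than a cv::Mat, and buildMask and countRegions are hypothetical helpers. With OpenCV itself, the same mask could presumably be drawn with cv::rectangle using a negative thickness (filled) and applied with Mat::copyTo( dst, mask ).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Rect (assumption, not OpenCV's type).
struct Box { int x, y, width, height; };

// Paint every detection into a binary mask: 1 inside any box, 0 elsewhere.
// The mask is stored row by row; boxes are clipped to the image bounds.
std::vector<unsigned char> buildMask( const std::vector<Box>& boxes, int img_w, int img_h ) {
    assert( img_w > 0 && img_h > 0 );
    std::vector<unsigned char> mask( static_cast<std::size_t>( img_w ) * img_h, 0 );
    for( const Box& b : boxes ) {
        int x0 = std::max( 0, b.x ),               y0 = std::max( 0, b.y );
        int x1 = std::min( img_w, b.x + b.width ), y1 = std::min( img_h, b.y + b.height );
        for( int y = y0; y < y1; y++ )
            for( int x = x0; x < x1; x++ )
                mask[y * img_w + x] = 1;
    }
    return mask;
}

// Count the 4-connected white regions, i.e. the "merged" areas of interest.
// The mask is taken by value because visited pixels are cleared during the fill.
int countRegions( std::vector<unsigned char> mask, int img_w, int img_h ) {
    int regions = 0;
    std::vector<int> stack;
    for( int start = 0; start < img_w * img_h; start++ ) {
        if( mask[start] != 1 )
            continue;
        regions++;
        mask[start] = 0;
        stack.push_back( start );
        while( !stack.empty() ) {
            int p = stack.back();
            stack.pop_back();
            int px = p % img_w, py = p / img_w;
            const int nx[4] = { px - 1, px + 1, px, px };
            const int ny[4] = { py, py, py - 1, py + 1 };
            for( int k = 0; k < 4; k++ ) {
                if( nx[k] < 0 || nx[k] >= img_w || ny[k] < 0 || ny[k] >= img_h )
                    continue;
                int q = ny[k] * img_w + nx[k];
                if( mask[q] == 1 ) {
                    mask[q] = 0;
                    stack.push_back( q );
                }
            }
        }
    }
    return regions;
}
```

Overlapping squares end up in the same white region, so each region is one merged area of interest whose shape is the exact union of its squares rather than an enclosing rectangle.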
