8
votes

I have a set of axis aligned rectangles. When two rectangles overlap (partly or completely), they shall be merged into their common bounding box. This process works recursively.

Detecting all overlaps and using union-find to form groups, which you merge in the end will not work, because the merging of two rectangles covers a larger area and can create new overlaps. (In the figure below, after the two overlapping rectangles have been merged, a new overlap appears.)

enter image description here

As in my case the number of rectangles is moderate (say N<100), a brute force solution is usable (try all pairs and if an overlap is found, merge and restart from the beginning). Anyway I would like to reduce the complexity, which is probably O(N³) in the worst case.

Any suggestion how to improve this ?

3
From a few quick sketches, this problem is hard, because an additional rectangle can cause "avalanche merges" in all directions. - greybeard
Please change your example to show two non-overlapping rectangles to the left "connected" by an overlapping one to the right. (Easy way out: can be merged isn't shall be merged.) - greybeard
Shot from the hip: line-sweep to and fro - O(n²logn)?. - greybeard
@greybeard: interesting. Sweepline-to only won't work as you may have to backtrack. But to & fro is worth the try. - Yves Daoust
I guess to and fro isn't helpful, in the end: not backtracking, but starting over if there was a merge is (at least in complexity analysis…). - greybeard

3 Answers

1
votes

I think an R-Tree will do the job here. An R-Tree indexes rectangular regions and lets you insert, delete and query (e.g, intersect queries) in O(log n) for "normal" queries in low dimensions.

The idea is to process your rectangles successively, for each rectangle you do the following:

  1. perform an intersect query on the current R-Tree (empty in the beginning)

  2. If there are results then delete the results from the R-Tree, merge the current rectangle with all result rectangles and insert the newly merged rectangle (for the last step jump to step 1.).

  3. If there are no results just insert the rectangle in the R-Tree

In total you will perform

  • O(n) intersect queries in step 1. (O(n log n))
  • O(n) insert steps in step 3. (O(n log n))
  • at most n delete and n insert steps in step 2. This is because each time you perform step 2 you decrease the number of rectangles by at least 1 (O(n log n))

In theory you should get away with O(n log n), however the merging steps in the end (with large rectangles) might have low selectivity and need more than O(log n), but depending on the data distribution this should not ruin the overall runtime.

0
votes

Use a balanced normalized quad-tree.

Normalized: Gather all the x coordinates, sort them and replace them with the index in the sorted array. Same for the y coordinates.

Balanced: When building the quad-tree always split at the middle coordinate.

So when you get a rectangle you want to go and mark the correct nodes in the tree with some id of the rectangle. If you find any other rectangles underneath(that means they will be overlapping), gather them in a set. When done, if the vector is not empty (you found overlapping rectangles), then we create a new rectangle to represent the union of the subrectangles. If the computed rectangle is bigger then the one you just inserted, then apply the algorithm again using the new computed rectangle. Repeat this until it no longer grows, then move to the next input rectangle.

For performance every node in the quad-tree store all the rectangles overlapping that node, in addition to marking it as an end-node.

Complexity: Initial normalization is O(NlogN). Inserting and checking for overlaps will be O(log(N)^2). You need to do this for the original N rectangles and also for the overlaps. Every time you find an overlap you eliminate at least one of the original rectangles so you can find at most (N-1) overlaps. So overall you need 2N operations. So overall the complexity is going to be O(N(log(N)^2)).

This is better than other approaches because you don't need to check any-to-any rectangles for overlap.

0
votes

This can be solved using a combination of plane sweep and spatial data structure: we merge intersecting rectangles along the sweep line and put any rectangles behind the sweep line to spatial data structure. Every time we get a newly merged rectangle we check spatial data structure for any rectangles intersecting this new rectangle and merge it if found.

If any rectangle (R) behind the sweep line intersects some rectangle (S) under sweep line then either of two corners of R nearest to the sweep line is inside S. This means that spatial data structure should store points (two points for each rectangle) and answer queries about any point lying inside a given rectangle. One obvious way to implement such data structure is segment tree where each leaf contains the rectangle with top and bottom sides at corresponding y-coordinate and each other node points to one of its descendants containing the rightmost rectangle (where its right side is nearest to the sweep line).

To use such segment tree we should compress (normalize) y-coordinates of rectangles' corners.

  1. Compress y-coordinates
  2. Sort rectangles by x-coordinate of their left sides.
  3. Move sweep line to next rectangle, if it passes right sides of some rectangles, move them to the segment tree.
  4. Check if current rectangle intersects anything along the sweep line. If not go to step 3.
  5. Check if union of rectangles found on step 4 intersects anything in the segment tree and merge recursively, then go to step 4.
  6. When step 3 reaches the end of list get all rectangles under sweep line and all rectangles in segment tree and uncompress their coordinates.

Worst-case time complexity is determined by segment tree: O(n log n). Space complexity is O(n).