2
votes

I need to match a given point (lat, lon) against several polygons to decide if there is a match.

The easiest way would be to iterate over each polygon and apply the point-in-polygon check algorithm, but that is prohibitively expensive.

The next optimization that I did was to define a bounding rectangle for each polygon (upper bound, lower bound) and iteratively check the point against each bounding box (fewer comparisons as against checking all the points in the polygon).

Is there any other optimizations possible? Would a spatial index on the bound rectangle points or a geohash help?

1

1 Answers

2
votes

Further optimizations:

The bounding box idea is good. Checking if a point is in a bounding box is extremely fast.

If you still need more speed, you can do more pre-calculation like this:

  1. For each polgon create a bounding box.
  2. Define equally sized "tiles" that cover your map.
  3. For each tile, create a list of polygons that overlap. You can do that by first checking if the bounding box overlaps with the tile. If they do, you check if the polygon overlaps with the tile.

When searching, do this:

  1. Determine the tile that you're in. That's a fast operation.
  2. Now you have the list of potential polygons.
  3. For each polygon, check if the point is in the bounding box.
    if it is, check if the point is in the polygon using the more expensive algorithm that you've mentioned.

I've used this algorithm several times and it's very fast. By changing the tile size you can choose the right balance between memory footprint and performance:

Think of the extreme cases:

  • One huge tile that covers the entire map:
    You'll get one list of all elements in your map, you'll have to check all of the bounding boxes.

  • Very tiny tiles (1x1 m for a map that only has a polygon per country):
    You'll get a huge amount of tiles. All polygons will be split over many tiles, and each tile will only have one polygon. But, once you've figured out in which tile the point is (fast), it's almost 100% sure that there's just one polygon that needs to be checked.

You need to be somewhere in between. If you only need this once and a while, you might want to choose a low memory footprint over performance. The optimal tilesize can also depends on the homogeneity of the polygon sizes. So, there is no automatic way to calculate an optimal tile-size, and you'll just have to tweak a bit until you get it right.