0
votes

I'm dealing with a (a) set of points P in Euclidean space (let's assume the plane to keep things simple), (b) a bounding rectangle [a,b] x [c,d] containing all the points in P, along with (c) a vector (m,n) of positive integers encoding subdivisions along each dimension. The basic question is

How does one efficiently build an m x n grid of distances to P?

In particular, I'd like to generate a cubical approximation to the distance function from P in the obvious m x n grid decomposition of the rectangle [a,b] x [c,d]: chop the X-direction into m pieces and the Y direction into n pieces, and for each little rectangle R which results from the partition, compute a positive integer distance d(R) to P defined as follows.

Consider a weighted graph G whose vertex set V corresponds to little rectangles in our grid with an edge of weight $1$ between neighbors (even diagonal neighbors). Call a vertex "red" if the corresponding rectangle contains a point of P. What I want to compute is the function d on V taking positive integer values which associates to each vertex the shortest distance in G to a red vertex. So, if a little rectangle actually contains a point from P, then its associated vertex gets assigned "0". If it does not contain a point of P but is adjacent (even diagonally) to a rectangle which does, then it gets "1" and so on...

The naive approach of considering the distance of each point from each little rectangle and keeping track of the minimum incurs a cost of |P|mn which appears prohibitive for large grids. So here's the second approach that I considered: set each d(R) to some large MAX to each R in the grid which does NOT contain a point from P. Then, set d(R) = 0 for each R that does contain a point of P and throw all its neighbors with strictly larger d-values into a Queue. Then, iterate this until the queue is empty:

  1. pop a grid rectangle R from the queue,
  2. if d(R) > 1 + d(R') where R' is the neighbor whose processing enqueued R, then set d(R) = 1 + d(R') and enqueue all neighbors of R' with d-values exceeding 1+d(R).

I'd appreciate any ideas on how one solves this problem more efficiently than this "second approach".

2
It seems that by "distance to a set of points" you mean "shortest distance to any point in the set", is that correct? (Note that this does not necessarily obey a metric, which is the usual mathematical notion of "distance", because it can violate the triangle inequality.)j_random_hacker
Yes, I mean shortest distance to any point... and even then, I really mean distance counted in terms of number of cubes, not the intrinsic Euclidean distance. Thanks for pointing this out!Vidit Nanda
Not to be a pedant, but although you have updated your question, you still don't actually define the distance measure you use! There is more than one possible definition of "distance to a set of points" in graph theory.j_random_hacker

2 Answers

2
votes

You are talking about the topological distance in a binary image. The rectangles can be seen as pixels and those containing points of P define distance 0. Distance 1 is for the pixels immediately adjacent (by an edge or a corner, 8-connexity), distance 2 for the pixels immediately adjacent to the latter and so on.

Computing the distance map with a queue is a good idea. Initialize the queue by pushing all 8-neighbors of the level0 pixels. Then pop them, assigning level1, and push the unassigned 8-neighbors... (like what you describe).

This is linear process (every pixel is handled a constant number of times), so that it is optimal, as the level of every pixel needs to be computed. You can't do much better.

I guess that you can also fill the distance map by a two-passes scanline process.

1) process top-left to down-right, considering the 4 top and left 8-neighbors of every pixel in turn. (Take the smallest level among the neighbors and add 1).

2) then process bottom-right to top-left, considering the 4 bottom and right 8-neighbors.

1
votes

Your current algorithm is fine for low dimensions. In general for k dimensions there will be 3^k-1 neighbours for each grid cell according to your current scheme in which diagonals are allowed, since for any given grid position, to generate a neighbouring position you can choose any way to assign a displacement of -1, 0 or 1 to each co-ordinate independently, with the exception that the displacement vector consisting of k 0s is not allowed.

Let's call the total number of cells p (so e.g. p = mn for k=2). Every grid cell must be pulled out of the queue at least once, and even if we apply tricks to look for ways of avoiding the need to push all of its neighbours into the queue, it seems to me that all 3^k-1 of its neighbours must still be checked at this time, so I think there's no way to avoid a running time of O(3^k*p).

However this also gives an upper space bound for this algorithm of O(3^k*p) for holding the queue. This might not be tight, but I can still get a lower bound of O(1.5^k*p) space usage for k dimensions with the following construction, which I'll first describe for the 2D case: Consider a problem instance built from any number of 2x2 blocks in which the cell (0, 0) within each block contains no point and the surrounding 3 cells each contain one or more points. (The analogue for higher dimensions has a repeating block of width 2 in every dimension, with the cell at (0, 0, ..., 0) in each block containing no point, and the other 2^k-1 having one or more.) The cell at (0, 0, ..., 0) within each block has a full contingent of 3^k-1 neighbours having distance value 0, so it will get pushed on the queue 3^k-1 times before the first copy is pulled off the queue. The only mitigating factor we have is that only 1 in every 2^k cells in the input has this property, so in total (3^k-1)/2^k = O(1.5^k) cells will be pushed onto the queue for each cell on average, or O(1.5^k*p) overall. (And of course it's possible that other constructions will yield even higher upper bounds...)

Nevertheless, it is possible to get rid of this 1.5^k factor simply by maintaining an m*n array of flags indicating whether a particular cell has been pushed on the queue yet. This is valid because the total distance never decreases as cells are read from the queue (it's a breadth-first search, which is a best-first search when all edges in the search graph have unit weight), so the first time a cell is pulled out of the queue is when it will be assigned its smallest value. That means there's no need to ever push the remaining copies of that cell in the first place, and you can avoid doing so by checking the corresponding flag for that cell before doing so, and setting it afterwards. You will need O(p) extra space for the flags, but this isn't asymptotically more than what you already use to hold the input, and each flag can literally be a single bit, and this drops the (upper bound on the) overall space complexity down to O(p).