2
votes

We have sets of labeled points whose convex hulls do not overlap. There is some empty space between the convex hulls.

Given an unlabeled point that is not in our data we want to approximately determine which convex hull it lies within.

To make the computation faster, we want to reduce the number of sides on the convex hull (thus expanding the convex hull a bit, but not too much).

What algorithms could I use?

Update: Ideally I want to do the expansion under the constraint that it not intersect a given nearby polygon. (The motivation for this constraint is that I have several disjoint hulls and want to reduce the number of sides of all of them while still keeping them disjoint. But treat this as a parenthetical because I do not want to do a joint modification. I am happy to modify one hull while keeping the others constant. I am happy to hack this simple case to do a joint modification iteratively.)

4
You mean, you convex hulls would not be technically hulls, since they are bigger than necessary? Mathematically, there is only one possible convex hull for a set of points.Daniel Möller
@Daniel Yes, that's correct. I am looking to reduce number of sides (sometimes to a triangle or a square) while giving up the "tightness" of fit. The benefit is easier and faster tests for membership. Thanks for thinking about the question. Also updated the question with an additional constraint so that I can do this for several disjoint hulls.Real Geek N
Why is the problem stated in terms of processing an additional point ("Given an unlabeled point that is not in our data..."). Is this because that unlabeled point was not known in advance and arrived later? Or is it becuse you subconsciously settled for an incremental processing approach (point by point) and are trying to force that approach on us? This is an importnat distinction that can critcially affect the choice of algorithm in this case. Are all data points available in advance? Or do they arrive one by one?AnT
@AnT the data points for which labels are known for sure are available in advance. other unlabeled data points are relatively few and their labeling (however we do it) will not be authoritative. whether they arrive one by one or not is moot.Real Geek N

4 Answers

2
votes

Perhaps this is worth trying. Find the convex hull A' of A union x, and the convex hull B' of B union x. Select whichever increases the hull area the least. In the example below, A' is the winner.


Hulls2x
k
  • Mictchell et al.: "Minimum-Perimeter Enclosing k-gon" 2006 (CiteSeer link)
  • Aggarwal et al.: "Minimum Area Circumscribing Polygons" 1985 (CiteSeer link)
  • O'Rourke et al.: "An optimal algorithm for fnding minimal enclosing triangles" 1986, Algorithmica (ACM link)

These algorithms are, however, quite intricate and unlikely to help much.

2
votes

The "point in convex polygon" test is not so expensive, as it can be performed in Lg(N) comparisons by dichotomy (split the polygon in two with a straight line, recursively until you have a single triangle left). N is the number of sides. Actually, a polygon of 27 (resp. 130) sides will cost you the double (triple) of a triangle.

If you have many hulls, exhaustive comparisons of the point against every hull is a waste. There are better approaches such as using monotone subdivisions, which could lower the search time to O(Log(M)) query time for a total of M sides, after preprocessing.

1
votes

I wouldn't be surprised if the saving of processing one less edge in your rough-phase contains check is outweighed by the increased false-positive rate of the inflated hull. Indeed, you might even be making more work for yourself - every point that passes the rough-phase check will have to also be checked against the true hull anyway.

Instead of trying to reduce the n in your O(n) contains check, I'd be tempted to go straight to something amortised O(1) for the rough passes:

  1. 1st pass - check against the axis-aligned bounding box (AABB). This gives quick rejection for the vast majority of points outside the polygon.
  2. 2nd pass - divide the AABB into a grid, where each grid quad is in one of three states: fully outside the hull, intersecting the hull edge or fully inside the hull. If your point lies in an "in" or "out" quad, you can stop here.
  3. 3rd pass - any point that lies in a grid quad that intersects the polygon is checked against the hull as normal.

The state of the grid can be computed ahead of time:

  1. Initialise each grid quad to be outside of the hull
  2. Use the algorithm linked in this answer to trace the edges of the hull over the grid and set all intersecting quads.
  3. The grid now contains the outline of the hull, so use a simple floodfill or scanline approach to find and set all quads on the interior of the hull.

The resolution of the grid can be varied to give a tradeoff between memory cost (2 bits per quad) and false-positive rate (low-resolution grids will lead to more O(n) conventional hull checks).

1
votes

It looks like your ultimate goal is not really about convex hulls, it is about solving the point location problem (https://en.wikipedia.org/wiki/Point_location). And you seem to be determined to solve it by simply iteratively checking your point against a number of convex hulls. While I understand where convex hulls come from (they actually represent sets of points), it is still not a reason to directly use them in the algorithm. Point location problem can be solved by a number of more efficent algorithms (like search tree based on trapezoidal decomposition), which are much less sensitive to the number of edges in your hulls.