0 votes

I have a bunch of points in 2D space and have calculated a convex hull for them. I would now like to "tighten" the hull so that it no longer necessarily encompasses all points. In the typical nails-in-board-with-rubber-band analogy, what I'd like to achieve is to be able to tune the elasticity of the rubber band and allow nails to bend at pressure above some limit. That's just an analogy; there is no real physics here.

This would be somewhat related to the reduction in hull area if a given point were removed, but not quite, because there could be two points that are very close to each other. It is not necessarily related to outlier detection either, because you could imagine a pattern where a large fraction of the nails would bend if they lie on a narrow line (imagine a hammer shape, for example).

All of this has to be reasonably fast for thousands of points. Any hints where I should look in terms of algorithms? An implementation in R would be perfect, but is not required.

[plot: the points with their convex hull; three points are labelled]

EDIT AFTER COMMENT: The three points I've labelled are those with largest potential for reducing the hull area if they are excluded. In the plot there is no other set of three points that would result in a larger area reduction. A naive implementation of what I'm looking for would maybe be to randomly sample some fraction of the points, calculate the hull area, remove each point on the hull iteratively, recalculate the area, repeat many times and remove points that tend to lead to high area reduction. Maybe this could be implemented in some random forest variant? It's not quite right though, because I would like the points to be removed one by one so that you get the following result. If you looked at all points in one go it would possibly be best to trim from the edges of the "hammer head".
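A rough sketch of that greedy, one-at-a-time idea (my own illustration, not a tested solution; `poly_area` and `greedy_peel` are made-up names, and the hull area comes from a base-R shoelace formula):

```r
# Greedy peeling: repeatedly drop the hull point whose removal leaves the
# smallest hull area, i.e. gives the largest area reduction.
poly_area <- function(xy) {
  # shoelace formula over the convex hull of xy
  h <- xy[chull(xy[, 1], xy[, 2]), , drop = FALSE]
  x <- h[, 1]; y <- h[, 2]
  abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2
}

greedy_peel <- function(xy, n_remove = 3) {
  for (k in seq_len(n_remove)) {
    ss <- chull(xy[, 1], xy[, 2])                      # current hull points
    # hull area after dropping each hull point in turn
    areas <- sapply(ss, function(p) poly_area(xy[-p, , drop = FALSE]))
    xy <- xy[-ss[which.min(areas)], , drop = FALSE]    # bend the costliest nail
  }
  xy
}
```

Because it removes one point per pass and recomputes the hull each time, it trims a "hammer head" from its corners inward rather than all at once.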

[plot: the desired result, with the hull tightened after the labelled points are removed one by one]

3 Comments
So you want to remove some points that currently sit on the convex hull then recalculate the convex hull? How do you decide which ones? In your sample image you have 9 points on the convex hull and you seem to have randomly labelled two of them saying "this should bend", but it's not clear why these points were chosen. Are they random? Your description is too vague to allow for an algorithm in its current form. - Allan Cameron
Yes, that is what I'd like to do. I've made an edit to hopefully make it clearer. - Rasmus
I still don't follow what you want to do. Do you want to remove points from the hull, or move some towards the center to obtain a smaller polygon? Are you trying to minimize or maximize the change in area? Lastly, are you asking an XY question? - Yves Daoust

3 Answers

2 votes

Suppose I have a set of points like this:

set.seed(69)

x <- runif(20)
y <- runif(20)

plot(x, y)

[scatter plot of the 20 random points]

Then it is easy to find the subset of points that sit on the convex hull by doing:

ss <- chull(x, y)

This means we can plot the convex hull by doing:

lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "red")

[the points with their convex hull drawn in red]

Now we can randomly remove one of the points that sits on the convex hull (i.e. "bend a nail") by doing:

bend <- sample(ss, 1)

x <- x[-bend]
y <- y[-bend]

And we can then repeat the process of finding the convex hull of this new set of points:

ss <- chull(x, y)
lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "blue", lty = 2)

[the new convex hull drawn as a dashed blue line]

To get the point which will, on removal, cause the greatest reduction in area, one option would be the following function:

library(sp)

shrink <- function(coords)
{
  ss <- chull(coords[, 1], coords[, 2])
  # drop the hull point whose omission leaves the smallest remaining polygon,
  # i.e. the point whose removal reduces the area the most
  outlier <- ss[which.min(sapply(seq_along(ss),
                function(i) Polygon(coords[ss[-i], ], hole = FALSE)@area))]
  coords[-outlier, ]
}

So you could do something like:

coords <- cbind(x, y)

new_coords <- shrink(coords)

new_chull <- new_coords[chull(new_coords[, 1], new_coords[, 2]),]
new_chull <- rbind(new_chull, new_chull[1,])

plot(x, y)

lines(new_chull[,1], new_chull[, 2], col = "red")

[the points with the tightened hull in red]

Of course, you could do this in a loop so that new_coords is fed back into shrink multiple times.
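For example, five passes might look like this (shrink() restated so the snippet runs standalone):

```r
library(sp)

# same function as above: drop the hull point whose removal reduces area most
shrink <- function(coords)
{
  ss <- chull(coords[, 1], coords[, 2])
  outlier <- ss[which.min(sapply(seq_along(ss),
                function(i) Polygon(coords[ss[-i], ], hole = FALSE)@area))]
  coords[-outlier, ]
}

set.seed(69)
coords <- cbind(x = runif(20), y = runif(20))

for (i in 1:5) coords <- shrink(coords)   # bend five nails, one per pass

ss <- chull(coords[, 1], coords[, 2])
plot(coords)
lines(coords[c(ss, ss[1]), 1], coords[c(ss, ss[1]), 2], col = "red")
```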

1 vote

Calculate a robust center and covariance using cov.mcd in MASS and the Mahalanobis distance of each point from it (using mahalanobis from base stats). We then show a quantile plot of the Mahalanobis distances using PlotMD from modi, and also show the associated outliers in red in the second plot. (There are other functions in modi that may be of interest as well.)

library(MASS)
library(modi)

set.seed(69)   
x <- runif(20)
y <- runif(20)
m <- cbind(x, y)

mcd <- cov.mcd(m)
md <- mahalanobis(m, mcd$center, mcd$cov)
stats <- PlotMD(md, 2, alpha = 0.90)

giving:

[quantile plot of the Mahalanobis distances]

and we show the convex hull using lines and the outliers in red:

plot(m)    
ix <- chull(m)
lines(m[c(ix, ix[1]), ])

wx <- which(md > stats$halpha)
points(m[wx, ], col = "red", pch = 20)

[the convex hull with the outlying points marked in red]
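To tie this back to the question, one could then recompute the hull without the flagged points. A sketch of that step (my assumption: a plain chi-squared quantile stands in for PlotMD's stats$halpha, so the snippet runs with MASS alone):

```r
library(MASS)

set.seed(69)
m <- cbind(runif(20), runif(20))

mcd <- cov.mcd(m)                            # robust center and covariance
md <- mahalanobis(m, mcd$center, mcd$cov)    # robust distances

# chi-squared quantile standing in for PlotMD's stats$halpha (df = 2 dims)
wx <- which(md > qchisq(0.90, df = 2))

m2 <- if (length(wx)) m[-wx, , drop = FALSE] else m   # drop flagged outliers
ix <- chull(m2)

plot(m)
lines(m2[c(ix, ix[1]), ], col = "blue")               # tightened hull
points(m[wx, , drop = FALSE], col = "red", pch = 20)  # the bent nails
```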

0 votes

Thank you both! I've tried various methods for outlier detection, but that is not quite what I was looking for; they worked badly due to the weird shapes of my clusters. I know I talked about convex hull area, but I think filtering on segment lengths yields better results and is closer to what I really wanted. It would then look something like this:

shrink <- function(xy, max_length = 30){
  to_keep <- seq_len(nrow(xy))
  centroid <- c(mean(xy[, 1]), mean(xy[, 2]))

  while (TRUE){
    ss <- chull(xy[, 1], xy[, 2])
    ss <- c(ss, ss[1])
    # squared segment lengths, so max_length is in squared units too
    lengths <- sapply(1:(length(ss) - 1),
                      function(i) sum((xy[ss[i + 1], ] - xy[ss[i], ])^2))

    # This gets the point with the longest convex hull segment. chull returns
    # points in clockwise order, so the point to remove is either this one or
    # the one after it. Remove the one furthest from the centroid.
    max_point <- which.max(lengths)
    if (lengths[max_point] < max_length) return(to_keep)

    if (sum((xy[ss[max_point], ] - centroid)^2) >
        sum((xy[ss[max_point + 1], ] - centroid)^2)){
      xy <- xy[-ss[max_point], ]
      to_keep <- to_keep[-ss[max_point]]
    } else {
      xy <- xy[-ss[max_point + 1], ]
      to_keep <- to_keep[-ss[max_point + 1]]
    }
  }
}

It's not optimal because it factors in the distance to the centroid, which I would have liked to avoid, and there is a max_length parameter that should be calculated from the data instead of being hard-coded.
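One possibility for deriving it from the data (untested on the real data; auto_max_length is a hypothetical helper) is to take a multiple of the median squared segment length of the initial hull:

```r
# Hypothetical data-driven threshold: a multiple of the median squared
# segment length of the initial convex hull.
auto_max_length <- function(xy, factor = 10) {
  ss <- chull(xy[, 1], xy[, 2])
  ss <- c(ss, ss[1])                                 # close the hull
  sq <- sapply(1:(length(ss) - 1),
               function(i) sum((xy[ss[i + 1], ] - xy[ss[i], ])^2))
  factor * median(sq)
}

# usage with the shrink() above: to_keep <- shrink(xy, auto_max_length(xy))
```

The factor is still a tuning knob, but at least the threshold now scales with the data rather than being hard-coded.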

No filter: [scatter plot of the full data set, showing the cluster shapes]

It looks like this because there are 500,000 cells in here, and many end up "wrong" when projecting from ~20,000 dimensions down to 2.

Filter: [the same plot after filtering]

Note that it filters out points at the tips of some clusters. This is less than optimal but acceptable. The overlap between some clusters is real and should be there.