Find clusters in 3D point data using a massively parallel algorithm

Question

I have a large number of points in 3D space (x,y,z) represented as an array of 3 float structs. I also have access to a strong graphics card with CUDA capability. I want the following:

Divide the points in the array into clusters so that every point within a cluster has a maximum euclidean distance of X to at least one other point within the cluster.

Examle in 2D:

The "brute force" way of doing this is of course to calculate the distance between every point and every other point, to see if any of the distances is below the threshold X, and if so mark those points as belonging to the same cluster. This is an O(n²) algorithm.

This can be done in parallel in CUDA ofcourse with n² threads, but is there a better way?

Sigi Sigi · Accepted Answer · 2015-10-22T13:41:47

The algorithm can be reduced to O(n) by using binning:

impose a 3D grid spaced as X, that is a 3D lattice (each cell of the lattice is a cubic bin);
assign each points in space to the corresponding bin (the bin that geometrically contains that points);
every time you need to evaluate the distances from one point, you just use only the points in the bin of the point itself and the ones in the 26 neighbouring bins (3x3x3 = 27)

The points in the other bins are further than X, so you don't need to evaluate the distances at all.

In this way, assuming a constant density in the points, you will have to compute the distance only for a constant number of pair points / total number of points.

Assigning the points to the bins is O(n) as well.

If the points are not uniformly distributed, the bins can be smaller (and you must consider more than 26 neighbours to evaluate the distances) and eventually sparse.

This is a typical trick used for molecular dynamics, ray tracing, meshing,... However I know of the term binning from molecular dynamics simulation: the name can change (link-cell, kd-trees too use the same principle, even if more articulated), the algorithm remains the same!

And, good news, the algorithm is well suited for parallel implementation.

refs:

https://en.wikipedia.org/wiki/Cell_lists

Find clusters in 3D point data using a massively parallel algorithm

1 Answers