Group geometry points according to spatial proximity

Question

I have the following points in 3D space:

I need to group the points, according to D_max and d_max:

D_max = max dimension of each group
d_max = max distance of points inside each group

Like this:

The shape of the group in the above image looks like a box, but the shape can be anything which would be the output of the grouping algorithm.

I'm using Python and visualize the results with Blender. I'm considering using the scipy.spatial.KDTree and calling its query API, however, I'm not sure if that's the right tool for the job at hand. I'm worried that there might be a better tool which I'm not aware of. I'm curious to know if there is any other tool/library/algorithm which can help me.

As @CoMartel pointed out, there is DBSCAN and also HDBSCAN clustering modules which look like a good fit for this type of problems. However, as pointed out by @Paul they lack the option for max size of the cluster which correlates to my D_max parameter. I'm not sure how to add a max cluster size feature to DBSCAN and HDBSCAN clustering.

Thanks to @Anony-Mousse I watched Agglomerative Clustering: how it works and Hierarchical Clustering 3: single-link vs. complete-link and I'm studying Comparing Python Clustering Algorithms, I feel like it's getting more clear how these algorithms work.

It looks like a DBScan could do a better job : scikit-learn.org/stable/modules/generated/… — CoMartel
There is also a derivative called HDBScan, with some differences that might be good for you — CoMartel
DBScan only considers d_max but not D_max (to stay in the parlance of this post). Also, I am not sure how well DBScan will work anyway, as most of the points seem to be on a grid, and hence most nearest neighbors will be equally far away. — Paul Brodersen
If you do apply DBSCAN first, what is the maximum distance of points within a cluster (let's call it D)? How does that compare to your target D_max? You may find that that currently, D >> D_max, in which case you probably would just want to pre-cluster your points with DBSCAN, and then partition those clusters using some other algorithm to enforce your D_max criterion (which to me really seems more of a packaging problem than a clustering problem). — Paul Brodersen
@CoMartel Can you please post your comments on DBSCAN and HDBSCAN as an answer so that I can accept it and close the question as answered. Your suggestion of those two algorithms actually solved my problem. — user3405291

CoMartel CoMartel · Accepted Answer · 2018-05-24T10:00:30

As requested, my comment as an answer :

You could use DBSCAN(http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) or HDBSCAN.

Both these algorithm allow to group each point according to d_max (maximum distance between 2 points of the same dataset), but they don't take the maximum cluster size. The only way to limit the maximum size of a cluster is by reducing the epsparameter, which control the max distance between 2 points of the same cluster.

Group geometry points according to spatial proximity

3 Answers