Algorithm for optimal placement of cropped regions to capture features (blobs) in an image

Question

I have a large image which contains "blobs" of interest over a background. I have the position (centroid, bounding box, area) of all blobs. I want to crop a limited number of regions of a fixed size in the image that allow me to capture most of the blobs. Example below for 1, 2 or 3 cropped regions in the same image.

This example shows that cropping 1 region (in red) is relatively easy: just pick the region with as many blobs as possible. This can be solved by just trying everything or possibly by computing a density of blobs using a kernel density estimator or something like this.

But cropping 2 regions (in dashed blue) is not just cropping the next best crop after the first selected above. It is a new problem where I need to find the optimal combination of 2 crops. Trying all combinations of 2 crops (brute force) probably becomes too computationally expensive (I have many images to process and they are large).

Similarly, cropping 3 regions (in green) is a new problem, and one for which brute force is even less suited. In that particular example 2 of the 3 regions are identical to the blue case and a new one is added but this is not the general case (I just wanted to show a slightly complex scenario).

I have no idea regarding the algorithm to solve the n-crops case. I am wondering if there is a theoretical/well known solution to this problem.

In addition:

the geometry of the problem is approximately that of the example above (maximum two crops over the height of the image, many crops over the width); in case that simplifies things
crops should not intersect
blobs should be as centred as possible in the crop (i.e. https://dl.dropboxusercontent.com/u/1047321/SO_crop/cutout_one_blob.png )
crops should stay within the original image boundaries (cf. either example above)
the area of the blob should be taken into account (I am more interested in large blobs than small ones); but that can probably be introduced in any algorithm by associating a weight with each blob to compute the score of each cropping layout.
it is OK to leave some blobs out. Actually, I'll probably compute a cost-complexity parameter such as how many new blobs adding a crop would get and set a threshold under which I stop adding crops.

Thanks in advance for any pointer.

PS: The coding language does not really matter here since the core of the algorithm (finding the optimal position of the crops given the location/size of blobs) only needs small arrays (position/size of ~100 blobs per image) to be computed. I'll probably use Python or R.

I'm preparing a nice and hopefully more complete answer for you, in the meantime do check these : en.wikipedia.org/wiki/Binary_space_partitioning and to a lesser extent en.wikipedia.org/wiki/Quadtree More specifically check en.wikipedia.org/wiki/R-tree and stackoverflow.com/questions/2959564/… — Jiby
Thanks for the pointers, it looks very relevant indeed. R*-trees seem to be very appropriate (I know the location of all points of interest and want no overlap). What I can't figure out is how to force an aspect ratio (does not look possible using binary partitioning). Looking forward to your answer! — Jean-Olivier Irisson

midin midin · Accepted Answer · 2016-01-05T21:41:05

If the blobs are relatively small as shown in the image, you could run k-Means Clustering with the blob center x,y pairs. The python scikit-learn package is pretty mature and should work well: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html (function fit_predict of the KMeans classifier)

k is an input and denotes the number of clusters you want. The algorithm will split the blobs (samples) into k clusters (sets). You could then compute the x,y frame (min-x,max-x,min-y,max-y) of each set and also include the blobs individual sizes or just take their max if they are fairly small.

You could then sort the clusters by their #blobs/frame-area ratio and sum them e.g. until either enough blobs are covered (done) - or your total area becomes too big (in that case re-run with larger k).

Algorithm for optimal placement of cropped regions to capture features (blobs) in an image

1 Answers