6
votes

I have a large image which contains "blobs" of interest over a background. I have the position (centroid, bounding box, area) of all blobs. I want to crop a limited number of regions of a fixed size in the image that allow me to capture most of the blobs. Example below for 1, 2 or 3 cropped regions in the same image.

This example shows that cropping 1 region (in red) is relatively easy: just pick the region with as many blobs as possible. This can be solved by just trying everything or possibly by computing a density of blobs using a kernel density estimator or something like this.

But cropping 2 regions (in dashed blue) is not just cropping the next best crop after the first selected above. It is a new problem where I need to find the optimal combination of 2 crops. Trying all combinations of 2 crops (brute force) probably becomes too computationally expensive (I have many images to process and they are large).

Similarly, cropping 3 regions (in green) is a new problem, and one for which brute force is even less suited. In that particular example 2 of the 3 regions are identical to the blue case and a new one is added but this is not the general case (I just wanted to show a slightly complex scenario).

I have no idea regarding the algorithm to solve the n-crops case. I am wondering if there is a theoretical/well known solution to this problem.

In addition:

  • the geometry of the problem is approximately that of the example above (maximum two crops over the height of the image, many crops over the width); in case that simplifies things
  • crops should not intersect
  • blobs should be as centred as possible in the crop (i.e. https://dl.dropboxusercontent.com/u/1047321/SO_crop/cutout_one_blob.png )
  • crops should stay within the original image boundaries (cf. either example above)
  • the area of the blob should be taken into account (I am more interested in large blobs than small ones); but that can probably be introduced in any algorithm by associating a weight with each blob to compute the score of each cropping layout.
  • it is OK to leave some blobs out. Actually, I'll probably compute a cost-complexity parameter such as how many new blobs adding a crop would get and set a threshold under which I stop adding crops.

Thanks in advance for any pointer.

PS: The coding language does not really matter here since the core of the algorithm (finding the optimal position of the crops given the location/size of blobs) only needs small arrays (position/size of ~100 blobs per image) to be computed. I'll probably use Python or R.

1
I'm preparing a nice and hopefully more complete answer for you, in the meantime do check these : en.wikipedia.org/wiki/Binary_space_partitioning and to a lesser extent en.wikipedia.org/wiki/Quadtree More specifically check en.wikipedia.org/wiki/R-tree and stackoverflow.com/questions/2959564/…Jiby
Thanks for the pointers, it looks very relevant indeed. R*-trees seem to be very appropriate (I know the location of all points of interest and want no overlap). What I can't figure out is how to force an aspect ratio (does not look possible using binary partitioning). Looking forward to your answer!Jean-Olivier Irisson

1 Answers

1
votes

If the blobs are relatively small as shown in the image, you could run k-Means Clustering with the blob center x,y pairs. The python scikit-learn package is pretty mature and should work well: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html (function fit_predict of the KMeans classifier)

k is an input and denotes the number of clusters you want. The algorithm will split the blobs (samples) into k clusters (sets). You could then compute the x,y frame (min-x,max-x,min-y,max-y) of each set and also include the blobs individual sizes or just take their max if they are fairly small.

You could then sort the clusters by their #blobs/frame-area ratio and sum them e.g. until either enough blobs are covered (done) - or your total area becomes too big (in that case re-run with larger k).