CUDA implementation of the Circle Hough Transform

Question

I'm trying to implement a maximum performance Circle Hough Transform in CUDA, whereby edge pixel coordinates cast votes in the hough space. Pseudo code for the CHT is as follows, I'm using image sizes of 256 x 256 pixels:

int maxRadius = 100;
int minRadius = 20;
int imageWidth = 256;
int imageHeight = 256;

int houghSpace[imageWidth x imageHeight * maxRadius];

for(int radius = minRadius; radius < maxRadius; ++radius)
{
    for(float theta = 0.0; theta < 180.0; ++theta)
    {
        xCenter = edgeCoordinateX + (radius * cos(theta));
        yCenter = edgeCoordinateY + (radius * sin(theta));

        houghSpace[xCenter, yCenter, radius] += 1;
    }
}

My basic idea is to have each thread block calculate a (small) tile of the output Hough space (maybe one block for each row of the output hough space). Therefore, I need to get the required part of the input image into shared memory somehow in order to carry out the voting in a particular output sub-hough space.

My questions are as follows:

How do I calculate and store the coordinates for the required part of the input image in shared memory?
How do I retrieve the x,y coordinates of the edge pixels, previously stored in shared memory?
Do I cast votes in another shared memory array or write the votes directly to global memory?

Thanks everyone in advance for your time. I'm new to CUDA and any help with this would be gratefully received.

What you are asking us here is making your job for you. I suggest you these two readings and you'll be able to answer that yourself. 1: docs.nvidia.com/cuda/cuda-c-programming-guide/index.html 2: docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html — KiaMorot
KiaMorot - If I knew the answers myself, I wouldn't be posting on here. — Adam
@MrX Obviously, but you haven't read those docs either, have you? If you had you'd have known the answers... — KiaMorot

talonmies talonmies · Accepted Answer · 2013-07-20T10:10:39

I don't profess to know much about this sort of filtering, but the basic idea of propagating characteristics from a source doesn't sound too different to marching and sweeping methods for solving the stationary Eikonal equation. There is a very good paper on solving this class of problem (PDF might still be available here):

A Fast Iterative Method for Eikonal Equations. Won-Ki Jeong, Ross T. Whitaker. SIAM Journal on Scientific Computing, Vol 30, No 5, pp.2512-2534, 2008

The basic idea is to decompose the computational domain into tiles, and the sweep the characteristic from source across the domain. As tiles get touched by the advancing characteristic, they get added to a list of active tiles and calculated. Each time a tile is "solved" (converged to a numerical tolerance in the Eikonal case, probably a state in your problem) it is retired from the working set and its neighbours are activated. If a tile is touched again, it is re-added to the active list. The process continues until all tiles are calculated and the active list is empty. Each calculation iteration can be solved by a kernel launch, which explictly synchronizes the calculation. Run as many kernels in series as required to reach an empty work list.

I don't think it is worth trying to answer your questions until you have a more concrete algorithmic approach and are getting into implementation details.

CUDA implementation of the Circle Hough Transform

1 Answers