I'm trying to implement a high-performance Circle Hough Transform (CHT) in CUDA, in which each edge pixel casts votes in the Hough space. Pseudocode for the CHT is below; the images are 256 x 256 pixels:
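// Needs <math.h> for cosf, sinf and roundf.
const int   minRadius   = 20;
const int   maxRadius   = 100;
const int   imageWidth  = 256;
const int   imageHeight = 256;
const float PI          = 3.14159265f;
// One accumulator plane per radius, indexed as [radius][y][x]; static storage is zero-initialised.
static int houghSpace[imageWidth * imageHeight * maxRadius];

// Executed once for every edge pixel (edgeCoordinateX, edgeCoordinateY):
for (int radius = minRadius; radius < maxRadius; ++radius)
{
    // Sweep the full circle in 1-degree steps; cosf/sinf expect radians.
    for (int degrees = 0; degrees < 360; ++degrees)
    {
        float theta = degrees * PI / 180.0f;
        int xCenter = edgeCoordinateX + (int)roundf(radius * cosf(theta));
        int yCenter = edgeCoordinateY + (int)roundf(radius * sinf(theta));
        // Skip candidate centers that fall outside the image.
        if (xCenter < 0 || xCenter >= imageWidth || yCenter < 0 || yCenter >= imageHeight)
            continue;
        houghSpace[(radius * imageHeight + yCenter) * imageWidth + xCenter] += 1;
    }
}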
My basic idea is to have each thread block compute a small tile of the output Hough space (perhaps one block per row of the output). To carry out the voting for a given output tile, I therefore need to get the relevant part of the input image into shared memory somehow.
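To make "the required part of the input image" concrete, here is a minimal CPU-side C++ sketch (not CUDA) of how that region could be computed for one output tile, and how the edge pixels inside it could be gathered into a compact coordinate list. It assumes a vote lands at most maxRadius - 1 pixels away from its edge pixel, and that the edge image is a row-major byte array with non-zero meaning "edge". The names Rect, EdgePoint, inputRegionForTile and collectEdges are hypothetical helpers invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper types for this sketch (not from any library).
struct Rect { int x0, y0, x1, y1; };   // half-open: [x0, x1) x [y0, y1)
struct EdgePoint { int x, y; };

// Region of the input image whose edge pixels can vote into `tile`
// when radii are strictly below maxRadius. The tile is grown by a halo
// of (maxRadius - 1) pixels on every side and clamped to the image.
Rect inputRegionForTile(Rect tile, int maxRadius, int width, int height) {
    assert(maxRadius >= 1);
    const int halo = maxRadius - 1;
    return Rect{ std::max(tile.x0 - halo, 0),
                 std::max(tile.y0 - halo, 0),
                 std::min(tile.x1 + halo, width),
                 std::min(tile.y1 + halo, height) };
}

// Gathers the coordinates of all edge pixels inside `region`.
// Assumption: edgeImage is row-major, one byte per pixel, non-zero = edge.
std::vector<EdgePoint> collectEdges(const std::vector<std::uint8_t>& edgeImage,
                                    int width, Rect region) {
    std::vector<EdgePoint> edges;
    for (int y = region.y0; y < region.y1; ++y)
        for (int x = region.x0; x < region.x1; ++x)
            if (edgeImage[y * width + x] != 0)
                edges.push_back({x, y});
    return edges;
}
```

For a one-row tile at y = 128 with maxRadius = 100, inputRegionForTile(Rect{0, 128, 256, 129}, 100, 256, 256) covers rows 29 to 227, which is 199 of the 256 rows. So with these parameters each block would need most of the image, which suggests storing a compact list of edge coordinates rather than the raw pixels.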
My questions are as follows:
How do I calculate and store the coordinates for the required part of the input image in shared memory?
How do I retrieve the (x, y) coordinates of the edge pixels once they are stored in shared memory?
Do I cast votes in another shared memory array or write the votes directly to global memory?
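To make the last question concrete, here is a minimal CPU-side C++ sketch (not CUDA) of the first option: votes for one radius are accumulated in a tile-local array, standing in for shared memory, and then added to the full Hough space, standing in for global memory. It assumes a full 360-degree sweep in 1-degree steps and a [radius][y][x] layout for the Hough space; Edge, Tile, voteIntoTile and flushTile are hypothetical names made up for this illustration. In a real kernel, the increments to the local array would presumably need to be atomic, since several threads can vote for the same cell.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch (not from any library).
struct Edge { int x, y; };            // one edge pixel coordinate
struct Tile { int x0, y0, w, h; };    // output tile within one radius plane

// Accumulates the votes of `edges` for a single radius into a tile-local
// array (the CPU stand-in for a shared-memory accumulator).
std::vector<int> voteIntoTile(const std::vector<Edge>& edges, Tile tile, int radius) {
    const double kPi = 3.14159265358979323846;
    std::vector<int> local(static_cast<std::size_t>(tile.w) * tile.h, 0);
    for (const Edge& e : edges) {
        for (int deg = 0; deg < 360; ++deg) {
            const double theta = deg * kPi / 180.0;   // degrees to radians
            const int xc = e.x + static_cast<int>(std::lround(radius * std::cos(theta)));
            const int yc = e.y + static_cast<int>(std::lround(radius * std::sin(theta)));
            const int tx = xc - tile.x0;
            const int ty = yc - tile.y0;
            // Only votes that land inside this tile are counted here.
            if (tx >= 0 && tx < tile.w && ty >= 0 && ty < tile.h)
                ++local[ty * tile.w + tx];
        }
    }
    return local;
}

// Adds the tile-local votes to the full Hough space.
// Assumption: houghSpace is laid out as [radius][y][x].
void flushTile(const std::vector<int>& local, Tile tile, int radius,
               int width, int height, std::vector<int>& houghSpace) {
    assert(local.size() == static_cast<std::size_t>(tile.w) * tile.h);
    for (int ty = 0; ty < tile.h; ++ty)
        for (int tx = 0; tx < tile.w; ++tx)
            houghSpace[(radius * height + tile.y0 + ty) * width + tile.x0 + tx]
                += local[ty * tile.w + tx];
}
```

The alternative in the question would skip the local array and increment houghSpace directly inside the voting loop; the sketch only shows the two-step variant so the trade-off is easier to discuss.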
Thanks everyone in advance for your time. I'm new to CUDA and any help with this would be gratefully received.