
This is the first time I've actually felt the need to ask a question on StackOverflow, and I ended up solving my problem while rubberducking and hacking at my own OpenCL code. However, given how little useful and approachable OpenCL debugging information I've found over the couple of months I've been learning it, I thought writing this down might help someone else in my position, since the solution wasn't obvious for a beginner.

Context: I'm writing a raytracer for school, with constraints on my C but permission to use OpenCL. I've already built and debugged an OpenCL RNG library that I can call from simple kernels, and I have ported some algorithms into subfunctions, but I'm still learning memory management and how to decompose large algorithms into an organized sequence of kernels to be queued.

OS: Xubuntu 18.04 | Platform: NVIDIA CUDA | Device: GeForce GTX 950M | Version: OpenCL 1.2 CUDA

My data seemed inconsistent: printf() told me that the data was present and coherent in my second kernel (the one where the problem was happening), but it never passed the checks in the corresponding 'if' statements. Worse, the code clearly seemed to execute the bodies of 'if' statements whose conditions were false, and given the weirdness of GPU control flow, I was at a loss.

The two pages I found that discuss the most similar symptoms are listed below. Neither turned out to be my problem, but one of them might be yours, which is why I'm including them:

https://community.amd.com/thread/225707

https://computergraphics.stackexchange.com/questions/4115/gpu-branching-if-without-else

To debug, I used the following snippet in the subfunction that returns a pixel's color to the main kernel that calls it.

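    // Debug probe: encode the value of one matrix component as a pixel color.
    if (isequal((float)scene->camera.c_to_w.sF, 0.0f))
    {
        return ((float3)(0.0f, 255.0f, 0.0f));      // green: component is 0
    }
    else if (isequal((float)scene->camera.c_to_w.sF, 0.5f))
    {
        return ((float3)(255.0f, 0.0f, 255.0f));    // magenta: component is 0.5
    }
    else //if (some other condition)
        return ((float3)(255.0f, 255.0f, 0.0f));    // yellow: anything else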

Without this snippet, the function returned a black screen. With it, the whole screen took the color of one of the branches, as follows. By commenting out the 'else' branches one at a time and together, and by playing with the values, I figured out that as long as this snippet existed, one of the 'return (R,G,B)' statements was always taken: if at least one condition was true, its return was used; otherwise the result was consistently that of the first branch of this variable-length if-else sequence.


1 Answer


My error was the simple absence of the line "return (result_pixel_color);" at the end of my get_pixel_color() subfunction. Yes, I am dumb.

It seems the OpenCL compiler (at least the one I was using) does not warn you about 'control reaches end of non-void function' errors, as most C compilers would. In my case, the undefined behavior of the missing return showed up as ANY return statement in the function being used as the general return value. If this one can slide, there are probably other classic errors the OpenCL compiler does not warn you about: be more critical of your own code!
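For anyone who wants to see the shape of the fix, here is a minimal sketch in plain C rather than OpenCL C, so it can be compiled and checked with an ordinary compiler. Everything except the names get_pixel_color and result_pixel_color is an assumption made for illustration: color3 is a hypothetical stand-in for float3, probe stands in for scene->camera.c_to_w.sF, and the debug flag and the placeholder shading line are invented.

```c
#include <stdbool.h>

/* Plain-C stand-in for OpenCL's float3 (hypothetical, for illustration only). */
typedef struct { float x, y, z; } color3;

static color3 make_color(float r, float g, float b)
{
    color3 c = { r, g, b };
    return c;
}

/*
 * Hypothetical reduction of get_pixel_color(): `probe` stands in for
 * scene->camera.c_to_w.sF, and `debug` toggles the color-coded checks.
 */
color3 get_pixel_color(float probe, bool debug)
{
    color3 result_pixel_color = make_color(0.0f, 0.0f, 0.0f);

    if (debug)
    {
        if (probe == 0.0f)
            return make_color(0.0f, 255.0f, 0.0f);   /* green: probe is 0 */
        else if (probe == 0.5f)
            return make_color(255.0f, 0.0f, 255.0f); /* magenta: probe is 0.5 */
    }

    /* Placeholder for the real shading work (an assumption, not the real code). */
    result_pixel_color = make_color(probe, probe, probe);

    /* The line that was missing. Without it, control falls off the end of a
       non-void function, and using the returned value is undefined behavior. */
    return result_pixel_color;
}
```

With gcc or clang, deleting the final return and compiling with -Wall should produce a -Wreturn-type warning, and -Werror=return-type turns it into a hard error. On the OpenCL side, the standard build options include -Werror, and the build log can be read with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG, but whether a given implementation emits this particular warning at all is not something I would count on.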

This is a more general statement, but I feel it might be useful to someone who runs into some obscure bug while learning OpenCL. My problem was that I overestimated how helpful the OpenCL compiler was, especially given the size of my code. We're trying to split the code into many subfunctions in different .cl files with a .cl.h header, to keep the architecture and comments legible and modular: it's a team project, but I already know OpenCL best... It seems that kernel coding is, for the most part, really about writing functions hundreds of lines long, which is a real problem for maintainability and modularity IMO. With more than 1 kernel per file and more than 1 file per program, you start to run into problems, especially with compilation. For a complex algorithm like (bidirectional/fast/etc.) path tracing, which requires modeling many different types of large data, building acceleration structures and sorting rays so that intersections run in a workgroup-coherent manner, you should be wary of the compiler: you never know how dumb/mundane your mistake actually is.