I'm writing a program that uses OpenCL 2.0 because I want to take advantage of device-side enqueue. My test program performs the following steps on the host side:

  1. Allocates a 16 KB buffer of floats on the device and zeroes it.
  2. Builds the OpenCL program below and creates a kernel object for masterKernel().
  3. Sets the first argument of masterKernel() (heap) to the buffer allocated in step 1.
  4. Enqueues masterKernel() via clEnqueueNDRangeKernel() with a work_dim of 1 and a global work size of 1 (so it runs only once, and get_global_id(0) is always zero).
  5. Reads the buffer back to the host and displays it.

Here is the OpenCL code:

//This function was stripped down to nothing for testing purposes.
kernel void childKernel(global float* heap)
{
}

//Enqueues the child kernel.
kernel void masterKernel(global float* heap)
{
    ndrange_t ndRange = ndrange_1D(16);   //Arbitrary, could be any number.

    if(get_global_id(0) == 0)
    {
        enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_NO_WAIT, ndRange,
            ^{ childKernel(heap); });
    }
}

The program builds successfully. However, when I try to run masterKernel(), the call to enqueue_kernel() causes the host-side call to clEnqueueNDRangeKernel() to fail with CL_OUT_OF_RESOURCES. The OpenCL documentation says enqueue_kernel() returns CLK_SUCCESS or CLK_ENQUEUE_FAILURE depending on whether the block was enqueued successfully; it says nothing about clEnqueueNDRangeKernel() itself failing. Here are some other things I've tried:

  • Commenting out the call to enqueue_kernel() makes the program succeed.
  • Adding a line that sets heap[0] to some number makes that value show up on the host side, so the kernel argument is being passed correctly.
  • Changing the if condition to something that can never be true, such as if(get_global_id(0) == 6000), still causes the error (I verified that get_global_size(0) == 1). So the error is not caused by enqueue_kernel() actually executing, but merely by its presence in the program.
  • Changing the if condition to if(0) does make the error go away, presumably because the compiler then removes the call entirely.
  • Making childKernel() actually do something does not make the error go away.

I'm not sure what to try next. My device, an AMD Radeon R9 380 graphics card, supports OpenCL 2.0, and I don't have access to any other OpenCL 2.0-capable hardware to test on.


1 Answer


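I figured this one out. The problem was that I had not created a device-side queue, i.e. a command queue created with the properties CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT. get_default_queue() returns that default device queue (or CLK_NULL_QUEUE if none has been created), so without it enqueue_kernel() has no valid queue to enqueue into.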