I'm implementing an FFT algorithm in OpenCL (specifically, the algorithm from OpenCL in Action). It runs fine on two different NVIDIA GPUs (the Tesla K20c and the GeForce GTX 650), but gives me a segmentation fault when I run it on my Intel CPU.
I've located the problem in the kernel code, but it doesn't make sense. The only two lines that trigger the error when included are the last two writes to local memory in the block below. No other memory write causes a problem on the CPU, and none of them cause a problem on the GPUs.
    __kernel void fft_init(__global float2 *g_data, __local float2 *l_data,
                           uint points_per_group, uint size, int dir) {
        uint4 br, index;
        uint points_per_item, g_addr, l_addr, i, fft_index, stage, N2;
        float2 x1, x2, x3, x4, sum12, diff12, sum34, diff34;
        points_per_item = points_per_group/get_local_size(0);
        l_addr = get_local_id(0) * points_per_item;
        g_addr = get_group_id(0) * points_per_group + l_addr;
        for(i=0; i<points_per_item; i+=4) {
            ...
            l_data[l_addr] = sum12 + sum34;
            l_data[l_addr+1] = diff12 + diff34;
            l_data[l_addr+2] = sum12 - sum34;
            l_data[l_addr+3] = diff12 - diff34;
            l_addr += 4;
        }
I also know that the problem isn't the l_addr + {2,3} offsets themselves, since further on in the kernel the array is accessed at offsets of at least l_addr + 4.
Has anyone encountered a problem like this before, or does anyone have ideas on how I can fix it? To run the kernel I use EnqueueNDRangeKernel, and when setting the argument for the local memory array I pass the entire available local memory size.
Thanks in advance!
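One way to sanity-check the indexing is to replay the loop's address arithmetic on the host. The sketch below is plain C and not part of the original kernel; the function name elements_touched and the sizes used afterwards are hypothetical. It mirrors only the points_per_item and l_addr computations from fft_init and reports how many l_data elements the loop would need, assuming each iteration writes l_addr through l_addr+3 as in the excerpt.

```c
#include <assert.h>

/* Hypothetical helper: replays the l_addr arithmetic of the first loop in
 * fft_init for every work-item of one work-group and returns the number of
 * l_data elements the loop needs (highest index written + 1), or 0 if the
 * loop body never runs. */
static unsigned elements_touched(unsigned points_per_group, unsigned local_size)
{
    assert(local_size > 0); /* the kernel divides by get_local_size(0) */

    unsigned points_per_item = points_per_group / local_size;
    unsigned needed = 0;

    for (unsigned lid = 0; lid < local_size; ++lid) {
        unsigned l_addr = lid * points_per_item;
        for (unsigned i = 0; i < points_per_item; i += 4) {
            /* Each iteration writes l_data[l_addr] .. l_data[l_addr + 3]. */
            if (l_addr + 4 > needed) {
                needed = l_addr + 4;
            }
            l_addr += 4;
        }
    }
    return needed;
}
```

With a hypothetical 4096 points per group, a work-group of 1024 items stays within 4096 elements. A work-group of 2048 items gives points_per_item = 2, so the last work-item writes indices 4094 through 4097, and only its l_addr+2 and l_addr+3 writes land out of bounds. Whether that is what happens on the CPU in question depends on the actual work-group size and points_per_group, which the post does not state.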
[...] l_data array to be (in bytes), and what value are you passing in the clSetKernelArg size argument? Also, what does clGetDeviceInfo for CL_DEVICE_LOCAL_MEM_SIZE return? Does it fit? Finally, if your usage of local memory is fixed-size (rather than dynamic), a better form is: __local float localBuffer[1024]; rather than using a kernel argument. - Dithermaster
[...] get_local_id(0). Since the number of work items per group is much smaller than LOCAL_MEM_SIZE, I cannot understand why accessing l_data[l_addr+3] is accessing past the end of the array. - Chris
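The "does it fit?" question in the comments comes down to converting between float2 elements and bytes. A minimal sketch in plain C, assuming float2 occupies 8 bytes (two 32-bit floats); the helper names are hypothetical, and local_mem_bytes stands for whatever clGetDeviceInfo reports for CL_DEVICE_LOCAL_MEM_SIZE on the device:

```c
#include <assert.h>
#include <stddef.h>

/* An OpenCL float2 is two 32-bit floats, i.e. 8 bytes per element. */
#define FLOAT2_BYTES ((size_t)8)

/* Bytes to request for l_data in the size argument of clSetKernelArg. */
static size_t local_arg_bytes(size_t points_per_group)
{
    return points_per_group * FLOAT2_BYTES;
}

/* Largest points_per_group whose float2 buffer fits in local memory. */
static size_t max_points_per_group(size_t local_mem_bytes)
{
    return local_mem_bytes / FLOAT2_BYTES;
}

/* Returns 1 if a buffer of points_per_group float2 values fits, else 0. */
static int fits_in_local_mem(size_t points_per_group, size_t local_mem_bytes)
{
    assert(local_mem_bytes > 0);
    return local_arg_bytes(points_per_group) <= local_mem_bytes;
}
```

This only checks the buffer size against the device limit. It says nothing about whether individual work-items index past the end of the buffer, which depends on the work-group size as well.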