I'm working on a GPU kernel and I'm having problems copying data from global to local memory. Here is my kernel function:

__kernel void nQueens( __global int * data, __global int * result, int board_size)

I want to copy from __global int * data to __local int aux_data[OBJ_SIZE]. I tried copying it like a normal array:

for(int i = 0; i < OBJ_SIZE; ++i)
{
    aux_data[stack_size*OBJ_SIZE + i] = data[index*OBJ_SIZE + i];
}

and also with the built-in async copy functions:

event_t e = async_work_group_copy ( aux_data,   (data + (index*OBJ_SIZE)),  OBJ_SIZE, 0);
wait_group_events (1, e);

In both cases the values in local memory differ from the values in global memory. I don't know what I'm doing wrong.

Answer:

One problem with the way you copy the data in your first attempt is that you write to parts of the array that don't exist. aux_data has only OBJ_SIZE elements (valid indices 0 to OBJ_SIZE - 1), so aux_data[stack_size*OBJ_SIZE + i] overflows whenever stack_size > 0.
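To see where the index leaves the array, here is a small plain-C check of the arithmetic (a sketch, not kernel code). OBJ_SIZE = 8 is a hypothetical value, and last_index/write_fits are illustrative helper names. The largest index the loop writes is stack_size*OBJ_SIZE + OBJ_SIZE - 1, which is below OBJ_SIZE only when stack_size is 0.

```c
#include <assert.h>
#include <stdbool.h>

#define OBJ_SIZE 8 /* hypothetical value; the question does not give it */

/* Largest index the question's loop writes into aux_data for a given stack_size. */
int last_index(int stack_size)
{
    assert(stack_size >= 0);
    return stack_size * OBJ_SIZE + (OBJ_SIZE - 1);
}

/* aux_data is declared as __local int aux_data[OBJ_SIZE],
   so the valid indices are 0 .. OBJ_SIZE - 1. */
bool write_fits(int stack_size)
{
    return last_index(stack_size) < OBJ_SIZE;
}
```

write_fits(1) is already false (the loop would reach index 15 with OBJ_SIZE = 8), so the overflow starts at stack_size == 1.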

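The problem with your second attempt is that wait_group_events expects a pointer to an array of events (event_t *), not a single event_t passed by value. For one event, pass its address: wait_group_events(1, &e).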

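Make sure you know what index refers to. My solutions assume it is the work-group ID, not the work-item (thread) ID. If it is the work-item ID, there are other problems: work-items in the same group share one aux_data array but would each try to fill it from a different slice of data, and async_work_group_copy must be called by all work-items in a group with the same arguments.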
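The difference between the two readings of index can be sketched in plain C. Assuming a 1-D NDRange with no global offset and equal-sized work-groups, OpenCL's get_global_id(0) equals get_group_id(0) * get_local_size(0) + get_local_id(0). The helper names and OBJ_SIZE value below are hypothetical; they only show which slice of data a work-item would read under each reading.

```c
#include <assert.h>

#define OBJ_SIZE 8 /* hypothetical value */

/* Group ID recovered from a global ID (1-D, no global offset assumed). */
int group_id_of(int global_id, int local_size)
{
    assert(local_size > 0);
    return global_id / local_size;
}

/* Start of the slice if index is the work-group ID:
   every work-item in the group reads the same slice. */
int offset_if_group_id(int global_id, int local_size)
{
    return group_id_of(global_id, local_size) * OBJ_SIZE;
}

/* Start of the slice if index is the work-item (global) ID:
   neighbouring work-items in one group read different slices. */
int offset_if_global_id(int global_id)
{
    return global_id * OBJ_SIZE;
}
```

With a local size of 4, work-items 4 and 5 are both in group 1 and share one aux_data array, but under the work-item-ID reading they would copy from different offsets (32 and 40) into it.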

Possible Solution 1:

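int gid = get_group_id(0);   // index of this work-group
int lid = get_local_id(0);   // index of this work-item within its group
int l_s = get_local_size(0); // number of work-items in the group (loop stride)
// Each work-item copies every l_s-th element, starting at its own lid
for(int i = lid; i < OBJ_SIZE; i += l_s)
{
    aux_data[i] = data[gid*OBJ_SIZE + i];
}
// Wait until the whole group has finished writing aux_data
barrier(CLK_LOCAL_MEM_FENCE);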
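The strided loop in Solution 1 can be checked with a sequential plain-C emulation. It runs each work-item's loop one after another, so it says nothing about concurrency or the barrier; it only counts how often each element of aux_data is written. OBJ_SIZE and emulate_group_copy are hypothetical names for this sketch. With the stride equal to the local size, every element is written exactly once. If the stride were get_local_id(0) instead, work-item 0 would use a stride of 0 and never leave the loop.

```c
#include <assert.h>

#define OBJ_SIZE 8 /* hypothetical value */

/* Sequentially emulates one work-group running Solution 1's loop.
   data: global buffer; aux: local buffer of OBJ_SIZE ints;
   writes: per-element write counter, also OBJ_SIZE ints. */
void emulate_group_copy(const int *data, int *aux, int *writes,
                        int gid, int local_size)
{
    assert(local_size > 0); /* a stride of 0 would never terminate */
    for (int i = 0; i < OBJ_SIZE; ++i)
        writes[i] = 0;
    /* Run each work-item's loop in turn; lid plays the role of get_local_id(0). */
    for (int lid = 0; lid < local_size; ++lid) {
        for (int i = lid; i < OBJ_SIZE; i += local_size) {
            aux[i] = data[gid * OBJ_SIZE + i];
            writes[i]++;
        }
    }
}
```

Work-group sizes larger than OBJ_SIZE also work: the extra work-items simply skip the loop.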

Possible Solution 2:

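int gid = get_group_id(0);
// Every work-item in the group must make this call with the same arguments
event_t e = async_work_group_copy (aux_data, data + (gid*OBJ_SIZE), OBJ_SIZE, 0);
// wait_group_events takes a pointer to an array of events, hence &e
wait_group_events (1, &e);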
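To find where the values diverge, one debugging approach (an assumption, not part of the answer) is to have each work-group temporarily write aux_data back to result[gid*OBJ_SIZE + i] after the copy, read result back on the host, and compare it with the original data. A plain-C comparison helper for that check might look like this; first_mismatch and OBJ_SIZE = 8 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_SIZE 8 /* hypothetical value */

/* Returns the first index where the copied-back local data differs from the
   original global data, or -1 if all num_groups * OBJ_SIZE ints match.
   Assumes the kernel temporarily wrote result[gid*OBJ_SIZE + i] = aux_data[i]. */
long first_mismatch(const int *data, const int *result, size_t num_groups)
{
    assert(data != NULL && result != NULL);
    for (size_t k = 0; k < num_groups * OBJ_SIZE; ++k) {
        if (data[k] != result[k])
            return (long)k;
    }
    return -1;
}
```

Under that layout, dividing a mismatch index by OBJ_SIZE gives the work-group whose slice was copied incorrectly.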