Suppose a struct X with some primitives and an array of Y structs:

typedef struct 
{ 
   int a;    
   Y** y;
} X;

An instance X1 of X is initialized on the host and then copied to an instance X2 of X in device memory through cudaMemcpy.

This works fine for all the primitives in X (such as int a), but cudaMemcpy seems to flatten any double pointer into a single pointer, thus causing out of bounds exceptions wherever there's an access to the struct arrays in X (such as y).

In this case am I supposed to use another memcpy function, such as cudaMemcpy2D or cudaMemcpyArrayToArray?

Suggestions are much appreciated. Thanks!

edit

The natural approach (as in "that's what I'd do if it were just C") towards copying an array of structures would be to cudaMalloc the array and then cudaMalloc and initialize each element separately, e.g.:

X** h_x;
X** d_x;
int num_x;

cudaMalloc((void**)&d_x, sizeof(X)*num_x);

int i=0;
for(;i<num_x;i++)
{
    cudaMalloc((void**)d_x[i], sizeof(X));
    cudaMemcpy(&d_x[i], &h_x[i], sizeof(X), cudaMemcpyHostToDevice);
}

However, the cudaMalloc inside the for loop crashes. I confess I'm not yet comfortable with the usage of pointers in CUDA functions, so perhaps I got the cudaMalloc and cudaMemcpy parameters wrong?

CUDA compute capability 2.0 and above support double precision operations. Otherwise, the compiler would cast double to float. Please note that the compilation would go through without any errors. - TripleS

I told you that double pointers (**) make this extra challenging. If you want to see how to copy ** arrays from host to device, look here. It's not for the faint of heart. Note that a.lasram is suggesting flattening on the host first. I also suggest you accept the answer given by a.lasram, and post new questions if you have them. It makes the question messy and confusing for others to read when you make wholesale edits and post mostly new questions in your old one that's already been answered. - Robert Crovella

1 Answer

cudaMemcpy, cudaMemcpy2D and cudaMemcpyArrayToArray all copy from a contiguous memory region in the host to a contiguous memory region on the device.
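To make the consequence concrete, here is a minimal host-only sketch in C. It uses plain memcpy as a stand-in for cudaMemcpy, and the layout of Y is an assumption, since the question never defines it. The helper names shallow_copy and shares_pointer are hypothetical. The point is that a byte-wise copy of X duplicates the pointer value stored in y, not the Y data it refers to. With a real host-to-device copy, that value would still be a host address, which device code generally cannot dereference.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: the question does not define Y. */
typedef struct { int value; } Y;

typedef struct
{
    int a;
    Y** y;
} X;

/* Byte-wise copy of one X. This stands in for
   cudaMemcpy(dst, src, sizeof(X), cudaMemcpyHostToDevice). */
void shallow_copy(X* dst, const X* src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(X));
}

/* Returns 1 when both structs store the same pointer value,
   i.e. the Y data itself was never duplicated. */
int shares_pointer(const X* p, const X* q)
{
    return p->y == q->y;
}
```

After shallow_copy(&x2, &x1), shares_pointer(&x1, &x2) returns 1: the primitive a was copied by value, but y in the copy still refers to the original memory.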

You have to copy all your data into an intermediate contiguous buffer on the host, and then send that single buffer to the device.
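A rough sketch of the flattening step in C follows. It assumes that y points to rows row pointers, each leading to cols Y elements. The question does not state the actual shape, so rows and cols are assumptions, as are the layout of Y and the helper names flatten and flat_index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the question does not define Y. */
typedef struct { int value; } Y;

/* Pack the rows x cols elements reachable through y[r][c] into one
   contiguous, row-major block. The caller owns the result and must
   free() it. Returns NULL if the allocation fails. */
Y* flatten(Y** y, size_t rows, size_t cols)
{
    assert(y != NULL && rows > 0 && cols > 0);
    Y* flat = (Y*)malloc(rows * cols * sizeof(Y));
    if (flat == NULL)
        return NULL;
    for (size_t r = 0; r < rows; r++)
        memcpy(flat + r * cols, y[r], cols * sizeof(Y));
    return flat;
}

/* Position of element (r, c) inside the flattened block. */
size_t flat_index(size_t r, size_t c, size_t cols)
{
    return r * cols + c;
}
```

The flattened block can then be transferred with one cudaMalloc and one cudaMemcpy(d_flat, flat, rows * cols * sizeof(Y), cudaMemcpyHostToDevice). Device code would index it as d_flat[r * cols + c] instead of going through a double pointer.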