copying host memory to cuda device variable

Question

i've tried to find a solution to my problem using google but failed. there were a lot of snippets that didn't fit my case exactly, although i would think that it's a pretty standard situation.

I'll have to transfer several different data arrays to cuda. all of them being simple struct arrays with dynamic size. since i don't want to put everything into the cuda kernel call, i thought, that __device__ variables should be exactly what i need.

this is how i tried to copy my host data to the __device__ variable:

// MaterialDescription.h
struct MaterialDescription {
    unsigned char type;
    unsigned char diffuseR, diffuseG, diffuseB;
    __device__ __forceinline__ float4 diffuseColour() const {  return make_float4((float) diffuseR / 255.f, (float) diffuseG / 255.f, (float) diffuseB / 255.f, 0); }
};

// kernel.h
__device__ MaterialDescription* g_materials;
__global__ void deferredRenderKernel() {
     something = g_materials[indexDependingOnData].diffuseColour();
}

//Cuda.cu
const std::vector<MaterialDescription>& materials = getData();

// version 1
cudaMemcpyToSymbol(g_materials, &materials.front(), sizeof(MaterialDescription) * materialCount);

// version 2
MaterialDescription* ptr;
cudaMalloc((void**)&ptr, sizeof(MaterialDescription) * materialCount);
cudaMemcpy(ptr, &materials.front(), sizeof(MaterialDescription) * materialCount, cudaMemcpyHostToDevice);
cudaMemcpyToSymbol(g_materials, ptr, sizeof(MaterialDescription) * materialCount);

// version 3
cudaMalloc((void**)&g_materials, sizeof(MaterialDescription) * materialCount);
cudaMemcpyToSymbol(g_materials, &materials.front(), sizeof(MaterialDescription) * materialCount);

deferredRenderKernel<<<numBlocks, threadsPerBlock>>();

however, the only version that worked included a kernel parameter

// kernel.h
__device__ MaterialDescription* g_materials;
__global__
void deferredRenderKernel(MaterialDescription* ptr) {
    g_materials = ptr;
    something = g_materials[indexDependingOnData].diffuseColour();
}

//Cuda.cu
// version 4, the only one working. but i pass again via kernel param
// in the worst case i'll stick to this, at least i wouldn't have to pass the
// parameters into device functions
MaterialDescription* ptr;
cudaMalloc((void**)&ptr, sizeof(MaterialDescription) * materialCount);
cudaMemcpy(ptr, &materials.front(), sizeof(MaterialDescription) * materialCount, cudaMemcpyHostToDevice);

deferredRenderKernel<<<numBlocks, threadsPerBlock>>(ptr);

edit: this version (as proposed by Robert Crovella) also works, but the memory is not allocated dynamically.

 // kernel.h
 __device__ MaterialDescription g_materials[VIENNA_MAX_MATERIAL_COUNT];
__global__
void deferredRenderKernel() {
    something = g_materials[indexDependingOnData].diffuseColour();
}

// cuda.h
// version 1
cudaMemcpyToSymbol(g_materials, &materials.front(), sizeof(MaterialDescription) * materialCount);

other variables and structures are the same as above.

edit:

SOLUTION

It finally works just the way i want.

MaterialDescription.h

struct MaterialDescription {
    unsigned char type;
    unsigned char diffuseR, diffuseG, diffuseB;
    __device__ __forceinline__ float4 diffuseColour() const {  return make_float4((float) diffuseR / 255.f, (float) diffuseG / 255.f, (float) diffuseB / 255.f, 0); }
};

kernel.h

__device__ MaterialDescription* g_materials;
__global__ void deferredRenderKernel() {
    something = g_materials[indexDependingOnData].diffuseColour();
}

Cuda.cu

const std::vector<MaterialDescription>& materials = getData();
MaterialDescription* dynamicArea;

// allocate memory on the device for our data
cudaMalloc((void**)&dynamicArea, sizeof(MaterialDescription) * materialCount); 

// copy our data into the allocated memory
cudaMemcpy(dynamicArea, &materials.front(), sizeof(MaterialDescription) * materialCount, cudaMemcpyHostToDevice);

// copy the pointer to our data into the global __device__ variable.
cudaMemcpyToSymbol(g_materials, &dynamicArea, sizeof(MaterialDescription*));

If your structure is only composed of POD types, then your version #2 is almost correct. Just change the size of the last memcpy to the correct size (its only a pointer you are copying), and it should work. — talonmies
not only the size, but also a reference is necessary :) i'll update with the working version in a second. — Adam

Robert Crovella Robert Crovella · Accepted Answer · 2014-10-12T13:12:11

It would be nice if you gave a complete example when asking questions like this. It would be useful to see your definition of MaterialDescription and materials. Take a look at what SO expects for questions of the type "why isn't my code working?"

This only holds storage for a pointer:

__device__ MaterialDescription* g_materials;

You can't copy a whole structure/object onto a pointer.

When you allocate a device variable like this, it is a static allocation, which means the size needs to be known at compile time. So if you know the size (or max size) at compile time, you could do something like this:

__device__ MaterialDescription g_materials[MAX_SIZE];

// this assumes materialCount <= MAX_SIZE
cudaMemcpyToSymbol(g_materials, &(materials.front()), sizeof(MaterialDescription) * materialCount);

copying host memory to cuda __device__ variable

SOLUTION

1 Answers

copying host memory to cuda device variable