CUDA texture memory to bind a sub-portion of global memory

Question

I am having trouble binding a sub-portion of global device memory to texture memory.

I have a large global device array filled with data as follows:

double * device_global;

cudaMalloc((void **)&device_global, sizeof(double)*N);

cudaMemcpy(device_global, host, sizeof(double)*N, cudaMemcpyHostToDevice);

I am running numerous kernels in a for loop.

Each kernel requires a small portion of device_global (int offset = 100), which I bind to a texture through:

cudaBindTexture(0, texRef, device_global, channelDesc, sizeof(double)*10);

However, I am unable to use pointer arithmetic to bind a different section of device_global on each iteration, using an offset that advances with the loop.

I would like to do something like:

cudaBindTexture(0, texRef, device_global + offset * i, channelDesc, sizeof(double)*10);

Note that the above approach does work if the offset is set to 0; with a nonzero offset, the pointer arithmetic somehow does not work.

Any help or other guidelines would be much appreciated.

IIRC pointer arithmetic is ok, even for device pointers. Did you unbind the texture at the end of the loop? What's the error? — Jonas Bötel

sgarizvi · Accepted Answer · 2013-01-18T19:14:04

It is bad practice to pass 0 or NULL as the first argument of cudaBindTexture. CUDA texture binding requires the bound pointer to be aligned. The alignment requirement is given by the cudaDeviceProp::textureAlignment device property.

cudaBindTexture can bind any device pointer to the texture. If the pointer is not aligned, the function returns, through its first argument, the offset in bytes from the nearest preceding aligned address. In that case, if the first argument is NULL, the function call fails.
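To make the offset concrete, here is a minimal host-side sketch in plain C++ of the arithmetic described above. It makes no CUDA calls. bind_offset_bytes is a hypothetical helper, and its alignment parameter stands in for the value reported by cudaDeviceProp::textureAlignment, which has to be queried on the actual device.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not a CUDA API call.
// Models the offset described above: the distance in bytes from the
// nearest preceding aligned address to the requested address.
// `alignment` stands in for cudaDeviceProp::textureAlignment.
std::size_t bind_offset_bytes(std::uintptr_t address, std::size_t alignment) {
    return static_cast<std::size_t>(address % alignment);
}
```

For example, with an assumed alignment of 512 bytes, an address 800 bytes (100 doubles) past an aligned base gives an offset of 288 bytes. That would explain why only the i = 0 case binds cleanly when the first argument is NULL.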

Binding should be done as:

size_t texture_offset = 0;
cudaBindTexture(&texture_offset, texRef, device_global + offset * i, channelDesc, sizeof(double)*10);
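The offset written to texture_offset is in bytes, so it has to be divided by the texel size and passed to the kernel, which adds it to the index used in the texture fetch. A minimal host-side sketch of that index arithmetic in plain C++ follows. fetch_index is a hypothetical helper, not part of the CUDA API.

```cpp
#include <cstddef>

// Hypothetical helper, not a CUDA API call.
// logical_index: element the kernel wants, counted from the start of the
//                bound window.
// texture_offset_bytes: value written by cudaBindTexture through its
//                first argument.
// texel_size: size in bytes of one texture element (8 for a double).
std::size_t fetch_index(std::size_t logical_index,
                        std::size_t texture_offset_bytes,
                        std::size_t texel_size) {
    return logical_index + texture_offset_bytes / texel_size;
}
```

With a reported offset of 288 bytes and 8-byte elements, element 0 of the window would be fetched at index 36. With an offset of 0, the index is unchanged.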
