I am having a problem binding a sub-portion of global device memory to texture memory.
I have a large global device array, allocated and filled from the host as follows:
double * device_global;
cudaMalloc((void **)&device_global, sizeof(double)*N);
cudaMemcpy(device_global, host, sizeof(double)*N, cudaMemcpyHostToDevice);
I am launching numerous kernels in a for loop.
Each kernel requires only a small portion of device_global (10 doubles, with consecutive portions int offset = 100 elements apart),
which I bind to a texture through:
cudaBindTexture(0, texRef, device_global, channelDesc, sizeof(double)*10);
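One thing worth checking here: the CUDA Runtime API documentation for cudaBindTexture states that texture base addresses must satisfy a hardware alignment requirement (reported in cudaDeviceProp::textureAlignment, a device-dependent value such as 256 or 512 bytes), and that passing 0 (NULL) as the first argument is only safe for pointers that are already aligned, such as one returned directly by cudaMalloc. The plain C++ sketch below (host arithmetic only, no CUDA calls) checks whether the window for loop index i starts on an aligned address. The helper names are hypothetical, and the alignment value has to be queried on the actual device rather than assumed.

```cpp
#include <cstddef>

// Hypothetical helper (not a CUDA API): byte distance from the start of the
// cudaMalloc'd buffer to window i, where windows are `offset` doubles apart.
std::size_t windowByteOffset(std::size_t offset, std::size_t i) {
    return sizeof(double) * offset * i;
}

// True if that byte distance is a multiple of the texture alignment.
// Assumes the base pointer itself is aligned (as cudaMalloc's result is),
// so the window pointer is aligned exactly when the distance is.
// `alignment` should come from cudaDeviceProp::textureAlignment.
bool isTextureAligned(std::size_t byteOffset, std::size_t alignment) {
    return byteOffset % alignment == 0;
}
```

With offset = 100 each window is 800 bytes further along, so under an assumed 256-byte alignment only i = 0, 8, 16, ... start on an aligned address, and under 512 bytes only i = 0, 16, 32, ... do.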
However, I am unable to use pointer arithmetic to bind only the section of device_global that belongs to the current loop iteration,
i.e. a section whose offset advances with the loop counter.
I would like to do something like:
cudaBindTexture(0, texRef, device_global + offset * i, channelDesc, sizeof(double)*10);
It should be noted that the above approach does work if offset is set to 0; for any other offset, the pointer arithmetic somehow does not work.
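If texture alignment is the cause (cudaBindTexture requires the bound address to be a multiple of cudaDeviceProp::textureAlignment), the CUDA documentation describes a workaround: pass the address of a size_t variable as the first argument instead of 0. The runtime then binds at an aligned address at or below the requested pointer and writes the byte distance into that variable, and the kernel has to add that distance, divided by the texel size, to every tex1Dfetch index. The sketch below models only that arithmetic on the host in plain C++, using a host array in place of device memory; AlignedBinding, alignDown and emulatedFetch are hypothetical names, not CUDA APIs.

```cpp
#include <cstddef>

// Hypothetical model of the documented behaviour: the texture starts at the
// window address rounded down to `alignment`, and fetches must be shifted.
struct AlignedBinding {
    std::size_t alignedByteOffset; // where the bound texture actually starts
    std::size_t texelShift;        // returned byte offset / sizeof(texel)
};

// Assumes windowByteOffset is measured from an aligned base (e.g. cudaMalloc)
// and that the misalignment is a whole number of texels.
AlignedBinding alignDown(std::size_t windowByteOffset, std::size_t alignment,
                         std::size_t texelSize) {
    std::size_t misalignment = windowByteOffset % alignment;
    return {windowByteOffset - misalignment, misalignment / texelSize};
}

// Emulates tex1Dfetch(texRef, j + shift) against the aligned base.
double emulatedFetch(const double* buffer, const AlignedBinding& b, std::size_t j) {
    const double* texBase = buffer + b.alignedByteOffset / sizeof(double);
    return texBase[j + b.texelShift];
}
```

For window i = 1 (800 bytes in) and an assumed 256-byte alignment, the texture would start at byte 768 and every fetch would need a shift of 4 doubles. An alternative that avoids the per-iteration alignment bookkeeping may be to bind the whole device_global once (its cudaMalloc'd base is aligned) and pass offset * i to each kernel as the starting fetch index, provided N stays within the device's maximum linear texture size.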
Any help or other guidelines would be much appreciated.