
I'm doing a thrust::transform_reduce and need to access a thrust::device_vector from within the functor. I am not iterating over the device_vector. Thrust lets me construct the functor with a reference to the device_vector, but won't let me dereference it, either through begin() or operator[]. The compiler reports:

1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include\thrust/detail/function.h(187): warning : calling a host function("thrust::detail::vector_base > ::operator []") from a host device function("thrust::detail::host_device_function ::operator () ") is not allowed

I assume I'll be able to pass in the base pointer and do the pointer math myself, but is there a reason this isn't supported?

You need to pass in the result of your_vector.data(). CUDA __device__ functions can't dereference host pointers in general, so it wouldn't make sense to try to give your functor a reference to your_vector. - Jared Hoberock

1 Answer


This just expands on what @JaredHoberock has already indicated; I don't think he will mind.

A functor usable by thrust must (for the most part) conform to the requirements imposed on any CUDA device code.

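Both thrust::host_vector and thrust::device_vector are host code containers, used to manipulate host data and device data respectively. A reference to such a host code container cannot be used successfully in device code. This means that even if you passed a reference to the container successfully, you could not use it in device code (e.g. you could not call .push_back()). Even element access through begin() and operator[] is implemented by host-only member functions, which is why the compiler warns about calling a host function from the __host__ __device__ wrapper that Thrust places around your functor.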
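For contrast, this is the context operator[] is designed for: host code. It returns a thrust::device_reference, and each read or write through that reference copies a single element between host and device memory. A minimal sketch; roundtrip_element is a hypothetical name used only for illustration.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>

// Hypothetical helper: everything here runs on the HOST. The device_vector
// object lives in host memory; only its elements live in device memory.
float roundtrip_element() {
  thrust::device_vector<float> v(3, 0.0f);  // three zeros in device memory
  v[1] = 4.0f;     // host-side write through device_reference: copies one float to the device
  float x = v[1];  // host-side read through device_reference: copies one float back
  return x;
}

int main() {
  printf("x = %f\n", roundtrip_element());  // prints x = 4.000000
  return 0;
}
```

Doing per-element transfers like this is fine for occasional host access, but none of it can run inside a kernel or functor.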

For direct manipulation in device code (such as functors or kernels), you must extract raw device pointers from Thrust and use those directly, doing your own pointer arithmetic. Higher-level container functions (such as .push_back()) will not be available.
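Applied to the transform_reduce in the question, a minimal sketch might look like the following. It assumes the functor only needs to read the vector; lookup_square and sum_of_squares are hypothetical names, and a thrust::counting_iterator supplies the index range because the vector itself is not being iterated. The functor stores a const float* obtained with thrust::raw_pointer_cast(table.data()) plus the length, so operator() only does plain pointer indexing, which is legal in device code.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/sequence.h>
#include <thrust/transform_reduce.h>

// Hypothetical functor: for index i it reads element i of a table that is
// NOT the range being iterated, through a raw device pointer.
struct lookup_square {
  const float *table;  // raw device pointer, not a device_vector reference
  int n;               // table length, used for bounds checking

  lookup_square(const float *t, int n_) : table(t), n(n_) {}

  __host__ __device__
  float operator()(int i) const {
    // Manual pointer arithmetic: table[i] is *(table + i)
    return (i < n) ? table[i] * table[i] : 0.0f;
  }
};

// Hypothetical wrapper: sums table[i]^2 over i = 0 .. n-1 on the GPU.
float sum_of_squares(const thrust::device_vector<float> &table) {
  int n = static_cast<int>(table.size());
  lookup_square op(thrust::raw_pointer_cast(table.data()), n);
  return thrust::transform_reduce(thrust::device,  // run on the device, where the raw pointer is valid
                                  thrust::counting_iterator<int>(0),
                                  thrust::counting_iterator<int>(n),
                                  op, 0.0f, thrust::plus<float>());
}

int main() {
  thrust::device_vector<float> table(5);
  thrust::sequence(table.begin(), table.end());  // 0, 1, 2, 3, 4
  printf("sum = %f\n", sum_of_squares(table));   // prints sum = 30.000000
  return 0;
}
```

Passing thrust::device explicitly makes it unambiguous that the reduction runs on the GPU; because operator() is also marked __host__, it would be a bug to invoke this functor from host code, where the stored pointer cannot be dereferenced.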

There are a variety of ways to extract the raw device pointer corresponding to thrust data, and the following example code demonstrates two possibilities:

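$ cat t651.cu
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

// Reads one float through a raw device pointer and prints it from the GPU.
__global__ void printkernel(float *data){

  printf("data = %f\n", *data);
}

int main(){

  thrust::device_vector<float> mydata(5);
  thrust::sequence(mydata.begin(), mydata.end());   // 0, 1, 2, 3, 4
  // Method 1: data() returns a thrust::device_ptr; .get() yields the raw float*
  printkernel<<<1,1>>>(mydata.data().get());
  // Method 2: &mydata[2] is a device_ptr to element 2; raw_pointer_cast unwraps it
  printkernel<<<1,1>>>(thrust::raw_pointer_cast(&mydata[2]));
  cudaDeviceSynchronize();  // wait for the kernels so their printf output is flushed
  return 0;
}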
$ nvcc -o t651 t651.cu
$ ./t651
data = 0.000000
data = 2.000000
$
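The same raw pointer also works for a kernel that touches every element rather than just one, with the index arithmetic done by hand. This is a sketch; scalekernel and scale_in_place are hypothetical names, and 128 threads per block is an arbitrary launch configuration.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>

// Hypothetical kernel: multiplies every element by factor. Indexing the raw
// pointer replaces device_vector::operator[], which is host-only.
__global__ void scalekernel(float *data, int n, float factor){
  int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
  if (i < n) data[i] *= factor;                   // same as *(data + i) *= factor
}

// Hypothetical host wrapper: extracts the raw pointer and launches the kernel.
void scale_in_place(thrust::device_vector<float> &v, float factor){
  int n = static_cast<int>(v.size());
  if (n == 0) return;                             // avoid a zero-block launch
  int threads = 128;
  int blocks = (n + threads - 1) / threads;
  scalekernel<<<blocks, threads>>>(thrust::raw_pointer_cast(v.data()), n, factor);
  cudaDeviceSynchronize();
}

int main(){
  thrust::device_vector<float> mydata(5);
  thrust::sequence(mydata.begin(), mydata.end()); // 0, 1, 2, 3, 4
  scale_in_place(mydata, 2.0f);
  thrust::host_vector<float> h = mydata;          // one bulk device-to-host copy
  for (size_t i = 0; i < h.size(); ++i) printf("%f\n", h[i]);
  return 0;
}
```

Copying back into a host_vector once is preferable to reading mydata[i] in a loop, which would issue one transfer per element.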