I'm a bit of a newcomer to CUDA and thrust. I seem to be unable to get the thrust::for_each algorithm to work when supplied with a counting_iterator. Here is my simple functor:
struct print_Functor {
print_Functor(){}
__host__ __device__
void operator()(int i)
{
printf("index %d\n", i);
}
};
Now if I call this with a host-vector prefilled with a sequence, it works fine:
thrust::host_vector<int> h_vec(10);
thrust::sequence(h_vec.begin(),h_vec.end());
thrust::for_each(h_vec.begin(),h_vec.end(), print_Functor());
However, if I try to do this with thrust::counting_iterator it fails:
thrust::counting_iterator<int> first(0);
thrust::counting_iterator<int> last = first+10;
for(thrust::counting_iterator<int> it=first;it!=last;it++)
printf("Value %d\n", *it);
printf("Launching for_each\n");
thrust::for_each(first,last,print_Functor());
What I get is that the for loop executes correctly, but the for_each fails with the error message:
after cudaFuncGetAttributes: unspecified launch failure
I tried to do this by making the iterator type a template argument:
thrust::for_each<thrust::counting_iterator<int>>(first,last, print_Functor());
but the same error results.
For completeness, I'm calling this from a MATLAB mex file (64 bit).
I've been able to get other thrust algorithms to work with the counting iterator (e.g. thrust::reduce gives the right result).
As a newcomer I'm probably doing something really stupid and missing something obvious - can anyone help?
Thanks for the comments so far. I have taken on board the comments so far. The worked example (outside Matlab) worked correctly and produced output, but if this was made into a mex file it still did not work - the first time producing no output at all and the second time just producing the same error message as before (only fixed by a recompile, when it goes back to no output).
However there is a similar problem with it not executing the functor from thrust::for_each even under DOS. Here is a complete example:
#include <thrust/for_each.h>
#include <thrust/iterator/counting_iterator.h>
struct sum_Functor {
int *sum;
sum_Functor(int *s){sum = s;}
__host__ __device__
void operator()(int i)
{
*sum+=i;
printf("In functor: i %d sum %d\n",i,*sum);
}
};
int main(){
thrust::counting_iterator<int> first(0);
thrust::counting_iterator<int> last = first+10;
int sum = 0;
sum_Functor sf(&sum);
printf("After constructor: value is %d\n", *(sf.sum));
for(int i=0;i<5;i++){
sf(i);
}
printf("Initiating for_each call - current value %d\n", (*(sf.sum)));
thrust::for_each(first,last,sf);
cudaDeviceSynchronize();
printf("After for_each: value is %d\n",*(sf.sum));
}
This is compiled under a DOS prompt with:
nvcc -o pf pf.cu
The output produced is:
After constructor: value is 0
In functor: i 0 sum 0
In functor: i 1 sum 1
In functor: i 2 sum 3
In functor: i 3 sum 6
In functor: i 4 sum 10
Initiating for_each call - current value 10
After for_each: value is 10
In other words the functor's overloaded operator() is called correctly from the for loop but is never called by the thrust::for_each algorithm. The only way to get the for_each to execute the functor when using the counting iterator is to omit the member variable.
( I should add that after years of using pure Matlab, my C++ is very rusty, so I could be missing something obvious ...)