I'm running Windows 7 64-bit, CUDA 4.2, and Visual Studio 2010.
First, I run some code on CUDA, then copy the data back to the host. Then I do some processing on the host and move the data back to the device. Then I do the following copy from device to host, and it runs very fast, about 1 ms:
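```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <ctime>
#include <iostream>

clock_t start, end;
int count = 1000000;
thrust::host_vector<int> h_a(count);
thrust::device_vector<int> d_b(count, 0);
int *d_bPtr = thrust::raw_pointer_cast(&d_b[0]);

start = clock();
thrust::copy(d_b.begin(), d_b.end(), h_a.begin());
end = clock();
std::cout << "Time Spent:" << end - start << std::endl;
```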
It takes ~1ms to finish.
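The printed value `end - start` is a count of clock ticks, not milliseconds. A small helper that divides by `CLOCKS_PER_SEC` makes the unit explicit. On MSVC, `CLOCKS_PER_SEC` is 1000 and `clock()` measures elapsed wall-clock time, so ticks and milliseconds coincide there, which is consistent with the ~1 ms reading. The helper name `ticks_to_ms` is hypothetical; this is only a sketch for printing the timings in a fixed unit.

```cuda
#include <ctime>
#include <iostream>

// Hypothetical helper: convert a clock() interval to milliseconds.
// clock() counts in units of 1/CLOCKS_PER_SEC seconds.
double ticks_to_ms(clock_t start, clock_t end)
{
    return 1000.0 * static_cast<double>(end - start) / CLOCKS_PER_SEC;
}

int main()
{
    clock_t start = clock();
    // ... code being timed goes here ...
    clock_t end = clock();
    std::cout << "Time Spent: " << ticks_to_ms(start, end) << " ms" << std::endl;
    return 0;
}
```

This code is host-only, so it compiles with nvcc as part of a `.cu` file or with a plain C++ compiler.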
Then I run some other code on the GPU again, mainly atomic operations. When I then copy the data from device to host, it takes a very long time, about 9 s:
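```cuda
__global__ void dosomething(int *d_bPtr)
{
    // ....
    atomicExch(d_bPtr, c);  // c is defined in the elided code
    // ....
}

// dosomething<<<...>>>(d_bPtr) runs before this point (launch not shown)
start = clock();
thrust::copy(d_b.begin(), d_b.end(), h_a.begin());
end = clock();
std::cout << "Time Spent:" << end - start << std::endl;
```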
It takes about 9 s.
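One thing worth checking: a kernel launch returns control to the host before the kernel finishes, and a device-to-host copy issued afterwards on the default stream waits for that kernel. The `clock()` interval around `thrust::copy` can therefore include whatever was left of the kernel's runtime. The sketch below separates the two costs by calling `cudaDeviceSynchronize()` (available since CUDA 4.0) before timing the copy. It assumes a hypothetical stand-in kernel `busy_atomics` (the real `dosomething` body is elided above) and an arbitrary launch configuration of 128 blocks of 256 threads.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <ctime>

// Hypothetical stand-in for the elided dosomething kernel: every thread
// hammers the same address with atomicExch so the kernel takes a while.
__global__ void busy_atomics(int *p, int iterations)
{
    for (int k = 0; k < iterations; ++k)
        atomicExch(p, k);
}

// Milliseconds elapsed since start, measured with clock().
static double ms_since(clock_t start)
{
    return 1000.0 * static_cast<double>(clock() - start) / CLOCKS_PER_SEC;
}

int main()
{
    const int count = 1000000;
    thrust::host_vector<int> h_a(count);
    thrust::device_vector<int> d_b(count, 0);
    int *d_bPtr = thrust::raw_pointer_cast(&d_b[0]);

    // The launch itself returns almost immediately; the kernel keeps running.
    clock_t start = clock();
    busy_atomics<<<128, 256>>>(d_bPtr, 1000);
    cudaError_t err = cudaGetLastError();        // launch-configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();           // wait for the kernel to finish
    std::printf("Kernel (incl. sync): %.1f ms\n", ms_since(start));
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The device is now idle, so this interval covers only the transfer.
    start = clock();
    thrust::copy(d_b.begin(), d_b.end(), h_a.begin());
    std::printf("Copy: %.1f ms\n", ms_since(start));
    return 0;
}
```

If the kernel line accounts for most of the time and the copy drops back to the ~1 ms range, the long "copy" time was kernel execution being waited on rather than the transfer itself.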
I ran the code multiple times, for example:
```cuda
int i = 0;
while (i < 10)
{
    clock_t start, end;
    int count = 1000000;
    thrust::host_vector<int> h_a(count);
    thrust::device_vector<int> d_b(count, 0);
    int *d_bPtr = thrust::raw_pointer_cast(&d_b[0]);

    // Copy before running the atomic kernel (~1 ms)
    start = clock();
    thrust::copy(d_b.begin(), d_b.end(), h_a.begin());
    end = clock();
    std::cout << "Time Spent:" << end - start << std::endl;

    // Run dosomething (the __global__ kernel defined above, which uses
    // atomicExch) on d_bPtr here; launch configuration not shown
    // ....

    // Copy after running the atomic kernel (~9 s)
    start = clock();
    thrust::copy(d_b.begin(), d_b.end(), h_a.begin());
    end = clock();
    std::cout << "Time Spent:" << end - start << std::endl;
    i++;
}
```
The results are pretty much the same in every iteration.
What could be the problem?
Thank you!
`thrust::raw_ptr_cast` with `device_vector` first index. I am trying to run a snippet from your code and I am getting `error: argument list for class template "thrust::device_ptr" is missing` error... - Recker