I'm trying to learn how to use CUDA with Thrust, and I have seen some code where the printf function seems to be used from the device.
Consider this code:
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>
#include <cstdio>

struct functor
{
    __host__ __device__
    void operator()(int val)
    {
        printf("Call for value : %d\n", val);
    }
};

int main()
{
    thrust::host_vector<int> cpu_vec(100);
    for (int i = 0; i < 100; ++i)
        cpu_vec[i] = i;
    thrust::device_vector<int> cuda_vec = cpu_vec; // transfer to GPU
    thrust::for_each(cuda_vec.begin(), cuda_vec.end(), functor());
}
This seems to run fine and prints the message "Call for value : " followed by a number, 100 times.
Now, if I include <iostream> and replace the printf line with a C++ stream-based equivalent:

std::cout << "Call for value : " << val << std::endl;

I get compilation warnings from nvcc, and the compiled program prints nothing:
warning: address of a host variable "std::cout" cannot be directly taken in a device function
warning: calling a __host__ function from a __host__ __device__ function is not allowed
warning: calling a __host__ function("std::basic_ostream<char, std::char_traits<char> >::operator <<") from a __host__ __device__ function("functor::operator ()") is not allowed
- Why does it work with printf?
- Why does it not work with cout?
- What is actually run on the GPU? I'd guess that at least sending to stdout requires some CPU work.
printf is "overloaded" as a __device__ function, while cout is not. You need explicit "overloading" of printing functions because you need to properly deal with the output buffer. Have a look at the simplePrintf example and you will get a feeling for why you need explicit overloading and how you can do it. Since cout is a __host__ function only, nvcc cannot compile it. – Vitality
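To sketch the buffering point from the comment above: device-side printf writes each thread's output into a fixed-size buffer in GPU memory, and that buffer is only copied back and flushed to the host's stdout at certain points, such as an explicit cudaDeviceSynchronize() or implicit synchronization at program exit. A minimal plain-CUDA illustration (assuming nvcc and a device of compute capability 2.0 or higher, which device printf requires):

```
#include <cstdio>

__global__ void hello()
{
    // Each thread formats its message into the device-side
    // printf buffer; nothing reaches the host's stdout yet.
    printf("Hello from thread %d\n", threadIdx.x);
}

int main()
{
    hello<<<1, 4>>>();
    // Synchronizing with the device transfers the buffer back
    // to the host, where the CPU actually writes it to stdout.
    cudaDeviceSynchronize();
    return 0;
}
```

This is why the "CPU work" guess in the question is right: the GPU only fills a buffer, and the host side performs the real I/O when it synchronizes. std::cout has no such device-side counterpart, which is why nvcc rejects it in device code.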