I'm trying to understand the difference between timing kernel execution with CUDA timers (events) and with regular CPU timing methods (gettimeofday on Linux, etc.).
From reading http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/ section 8.1, it seems to me that the only real difference is that with CPU timers you have to remember to synchronize with the GPU, because kernel launches are asynchronous with respect to the host. Presumably the CUDA event APIs handle this for you.
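For reference, a minimal sketch of the CPU-timer approach, assuming Linux/POSIX and a hypothetical `scale` kernel (`scale`, `wall_ms` and `time_kernel_cpu_ms` are illustrative names, not taken from the guide). The key line is the `cudaDeviceSynchronize()` after the launch: without it, the two `gettimeofday` readings would only bracket the asynchronous launch, not the kernel's execution.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>
#include <sys/time.h>

// Hypothetical example kernel: scales every element of x by a.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Wall-clock time in milliseconds via gettimeofday (POSIX only).
static double wall_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

// Times one launch of scale() with a host timer.
// Returns elapsed milliseconds, or -1.0 if a CUDA call fails.
double time_kernel_cpu_ms(int n) {
    float *d_x = NULL;
    if (cudaMalloc((void **)&d_x, n * sizeof(float)) != cudaSuccess) return -1.0;
    cudaMemset(d_x, 0, n * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Drain earlier work (e.g. the memset) so it is not counted.
    cudaDeviceSynchronize();
    double t0 = wall_ms();
    scale<<<blocks, threads>>>(d_x, 2.0f, n);
    // The launch returns immediately; block until the kernel finishes
    // before reading the clock again.
    cudaError_t err = cudaDeviceSynchronize();
    double t1 = wall_ms();
    if (err == cudaSuccess) err = cudaGetLastError();  // launch-config errors

    cudaFree(d_x);
    return err == cudaSuccess ? t1 - t0 : -1.0;
}
```

Note that the host clock also picks up launch overhead and any host-side scheduling jitter, so for short kernels the figure is an upper bound on the GPU time.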
So is this really just a matter of:
- With GPU events you don't need to explicitly call cudaDeviceSynchronize
- With GPU events you get an inherently platform-independent timing API, while with CPU timers you need a separate API per OS?
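For comparison, a minimal sketch of the event-based version, using the same hypothetical `scale` kernel (`time_kernel_events_ms` is likewise an illustrative name). The events are recorded into the same stream as the kernel, so the timestamps are taken on the GPU. The host still has to wait before reading the result, but via `cudaEventSynchronize(stop)`, which waits only for that event rather than the whole device, and `cudaEventElapsedTime` reports milliseconds through the same call on every OS.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Hypothetical example kernel: scales every element of x by a.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Times one launch of scale() with CUDA events.
// Returns elapsed milliseconds, or -1.0f if a CUDA call fails.
float time_kernel_events_ms(int n) {
    float *d_x = NULL;
    if (cudaMalloc((void **)&d_x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_x, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Both events go into the default stream (0), bracketing the kernel.
    cudaEventRecord(start, 0);
    scale<<<blocks, threads>>>(d_x, 2.0f, n);
    cudaEventRecord(stop, 0);

    // Wait for the stop event only; cudaEventElapsedTime returns an error
    // if the event has not completed yet.
    cudaError_t err = cudaEventSynchronize(stop);
    if (err == cudaSuccess) err = cudaGetLastError();  // launch-config errors
    float ms = -1.0f;
    if (err == cudaSuccess) err = cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return err == cudaSuccess ? ms : -1.0f;
}
```

Because the timestamps come from the GPU, this excludes host-side launch overhead, which is one reason the two methods can report different numbers for the same kernel.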
Thanks in advance