
I'm trying to understand the difference between timing kernel execution using CUDA timers (events) and regular CPU timing methods (gettimeofday on Linux, etc.).

From reading http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/ section 8.1, it seems to me that the only real difference is that when using CPU timers you need to remember to synchronize with the GPU, because kernel launches are asynchronous. Presumably the CUDA event APIs handle this for you.

So is this really a matter of:

  1. With GPU events you don't need to explicitly call cudaDeviceSynchronize
  2. With GPU events you get an inherently platform-independent timing API, while with the CPU you need to use separate APIs per OS

?
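
For reference, the CPU-timer pattern I have in mind is roughly this (a simplified, Linux-only sketch with no error checking; myKernel and N are just placeholders):

```cuda
#include <sys/time.h>
#include <cstdio>

// Placeholder kernel, just for illustration.
__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, N * sizeof(float));

    timeval start, stop;
    gettimeofday(&start, nullptr);

    myKernel<<<(N + 255) / 256, 256>>>(d_data, N);

    // The launch returns immediately, so without this the second
    // gettimeofday would run while the kernel is still executing.
    cudaDeviceSynchronize();

    gettimeofday(&stop, nullptr);
    double ms = (stop.tv_sec - start.tv_sec) * 1000.0 +
                (stop.tv_usec - start.tv_usec) / 1000.0;
    printf("Kernel time (CPU timer): %.3f ms\n", ms);

    cudaFree(d_data);
    return 0;
}
```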

Thanks in advance


1 Answer


You've got it down. Because the GPU operates asynchronously from the CPU, the CPU can continue on its merry way as soon as you launch a kernel. When timing, this means you could reach the end of your timing code (i.e. record the duration) before the GPU has actually finished the kernel. This is why we synchronize: to make sure the kernel has finished before we move forward with the CPU code. It's particularly important when we need the results from the GPU kernel for a following operation (e.g. successive steps in an algorithm).

If it helps, you can think of cudaEventSynchronize as a synchronization point between the CPU and the GPU: a CPU timer depends on both CPU and GPU code, whereas the CUDA timer events depend only on the GPU code. And because those CUDA timing events are compiled by nvcc specifically for CUDA platforms, they're CPU-platform independent but GPU-platform dependent.
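
For comparison, here's a minimal sketch of the event-based pattern (error checking omitted; myKernel is just a placeholder). You still synchronize, but on the stop event rather than the whole device, and the elapsed time comes from timestamps recorded on the GPU:

```cuda
#include <cstdio>

// Placeholder kernel, just for illustration.
__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, N * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                     // queued on the GPU stream
    myKernel<<<(N + 255) / 256, 256>>>(d_data, N);
    cudaEventRecord(stop, 0);                      // queued after the kernel

    // Block the CPU only until the stop event has been reached;
    // the interval itself is measured between the two GPU-side events.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Kernel time (CUDA events): %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```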