8
votes

What is the difference between using a CPU timer and the CUDA timer event to measure the time taken for the execution of some CUDA code? Which of these should a CUDA programmer use and why?

Using a CPU timer involves calling cudaThreadSynchronize (cudaDeviceSynchronize in current CUDA releases) before each time reading, since kernel launches are asynchronous with respect to the host. The time itself can be read with clock(), or with a high-resolution performance counter such as QueryPerformanceCounter (on Windows).
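As a rough sketch of the CPU-timer approach (the kernel `myKernel` is a made-up placeholder, and std::chrono stands in for clock()/QueryPerformanceCounter as a portable high-resolution host clock):

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so there is something to time.
__global__ void myKernel(float *d_data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // Kernel launches are asynchronous, so synchronize before reading
    // the host clock on both sides of the timed region.
    cudaDeviceSynchronize();  // replaces the older cudaThreadSynchronize
    auto start = std::chrono::high_resolution_clock::now();

    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    cudaDeviceSynchronize();  // wait until the kernel has actually finished
    auto stop = std::chrono::high_resolution_clock::now();

    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    printf("Host-timed: %.3f ms\n", ms);

    cudaFree(d_data);
    return 0;
}
```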

Using CUDA events involves recording an event before and one after the code of interest with cudaEventRecord. The elapsed time is then obtained by calling cudaEventSynchronize on the stop event, followed by cudaEventElapsedTime.
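The event-based approach can be sketched as follows (again with a placeholder kernel `myKernel`; the NULL stream is used for the recordings):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the code being timed.
__global__ void myKernel(float *d_data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);   // record in the NULL stream
    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);

    cudaEventSynchronize(stop);  // block until 'stop' has been reached
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Event-timed: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```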

2
Did you start writing one question and finish writing another? I don't understand how the last paragraph fits in with the rest of the question. What is it that you really want to know? Are you attempting to reconcile the output from host and device timer measurements and can't, or something else? – talonmies
Talonmies: I have removed the last paragraph. So the question simply is: as a programmer, I am confused about which of these 2 timers to use, and why. – Ashwin Nanjappa

2 Answers

9
votes

The answer to the first part of the question is that cudaEvent timers are based on high-resolution counters on board the GPU, and they have lower latency and better resolution than a host timer because they come "off the metal". You should expect sub-microsecond resolution from cudaEvent timers, and you should prefer them for timing GPU operations for precisely that reason. The per-stream nature of cudaEvents can also be useful for instrumenting asynchronous operations such as simultaneous kernel execution and overlapped copy and kernel execution. That sort of time measurement is just about impossible with host timers.
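A sketch of the per-stream instrumentation the answer alludes to, timing an async copy and a kernel that run concurrently in separate streams (all identifiers here are illustrative; the copy requires pinned host memory to actually overlap):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel running concurrently with the copy.
__global__ void myKernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *h_a, *d_a, *d_b;
    cudaHostAlloc(&h_a, n * sizeof(float), cudaHostAllocDefault);  // pinned
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    cudaEvent_t c0, c1, k0, k1;
    cudaEventCreate(&c0); cudaEventCreate(&c1);
    cudaEventCreate(&k0); cudaEventCreate(&k1);

    // Copy in stream s0 while the kernel runs in stream s1; each event
    // pair brackets only the work enqueued in its own stream.
    cudaEventRecord(c0, s0);
    cudaMemcpyAsync(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice, s0);
    cudaEventRecord(c1, s0);

    cudaEventRecord(k0, s1);
    myKernel<<<(n + 255) / 256, 256, 0, s1>>>(d_b, n);
    cudaEventRecord(k1, s1);

    cudaDeviceSynchronize();
    float copyMs = 0.0f, kernMs = 0.0f;
    cudaEventElapsedTime(&copyMs, c0, c1);
    cudaEventElapsedTime(&kernMs, k0, k1);
    printf("copy: %.3f ms, kernel: %.3f ms (overlapped)\n", copyMs, kernMs);

    cudaEventDestroy(c0); cudaEventDestroy(c1);
    cudaEventDestroy(k0); cudaEventDestroy(k1);
    cudaStreamDestroy(s0); cudaStreamDestroy(s1);
    cudaFreeHost(h_a); cudaFree(d_a); cudaFree(d_b);
    return 0;
}
```

A host timer wrapped around this whole region could only report the combined wall time, not the per-stream breakdown.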

EDIT: I won't answer the last paragraph because you deleted it.

3
votes

The main advantage of using CUDA events for timing is that they are less subject to perturbation by other system events, such as paging or interrupts from the disk or network controller. Also, because cu(da)EventRecord is asynchronous, there is less of a Heisenberg effect when timing short, GPU-intensive operations.

Another advantage of CUDA events is that they have a clean cross-platform API - no need to wrap gettimeofday() or QueryPerformanceCounter().

One final note: use caution when using streamed CUDA events for timing - if you do not specify the NULL stream, you may wind up timing operations that you did not intend to. There is a good analogy between CUDA events and reading the CPU's timestamp counter, which is a serializing instruction. On modern superscalar processors, the serializing semantics make the timing unambiguous. Also like RDTSC, you should always bracket the events you want to time with enough work that the timing is meaningful (just like you can't use RDTSC to meaningfully time a single machine instruction).
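The NULL-stream caution can be illustrated with a fragment like this (a hedged sketch; `kernelOfInterest`, `grid`, `block`, and the stream `s` are assumed to exist in the surrounding code):

```cuda
// Events recorded in the NULL stream order against work in all
// (non-cudaStreamNonBlocking) streams, so this interval is unambiguous
// but may include concurrent work from other streams:
cudaEventRecord(start, 0);
kernelOfInterest<<<grid, block, 0, s>>>();
cudaEventRecord(stop, 0);

// Events recorded in stream s only order against stream s, so this
// interval brackets just that stream's work -- but with other streams
// active, make sure that is really what you intend to measure:
cudaEventRecord(start, s);
kernelOfInterest<<<grid, block, 0, s>>>();
cudaEventRecord(stop, s);
```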