What is the difference between using a CPU timer and the CUDA timer event to measure the time taken for the execution of some CUDA code? Which of these should a CUDA programmer use and why?
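Using a CPU timer would involve calling cudaThreadSynchronize (deprecated in current CUDA releases in favor of cudaDeviceSynchronize) before each timestamp is taken, because kernel launches return to the host before the GPU has finished. To take the timestamp, clock() could be used, or a high-resolution performance counter such as QueryPerformanceCounter (on Windows) could be queried.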
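For illustration, the CPU timer approach might look roughly like the sketch below. The scale kernel, the time_with_cpu_timer helper, and the launch configuration are made up for this example, and std::chrono::steady_clock stands in for clock() or QueryPerformanceCounter as a portable host timer. Error checking is omitted to keep it short.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so the timer has something to measure.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns host wall-clock time in milliseconds for one launch of scale().
double time_with_cpu_timer(int n) {
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Drain any earlier GPU work so it is not charged to this measurement.
    cudaDeviceSynchronize();
    auto start = std::chrono::steady_clock::now();

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);

    // The launch is asynchronous: wait for the kernel before reading the clock.
    cudaDeviceSynchronize();
    auto stop = std::chrono::steady_clock::now();

    cudaFree(d_data);
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    printf("CPU timer: %.3f ms\n", time_with_cpu_timer(1 << 20));
    return 0;
}
```

Without the second synchronization the host clock would mostly capture launch overhead, since control returns to the CPU as soon as the kernel is queued.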
Using CUDA timer events would involve recording an event before and after the code being timed with cudaEventRecord. At a later time, the elapsed time (in milliseconds) would be obtained by calling cudaEventSynchronize on the events, followed by cudaEventElapsedTime.
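A minimal sketch of the event-based approach, under the same caveats: the scale kernel and the time_with_events helper are hypothetical names used only for this example, the default stream (0) is assumed, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so the events have something to bracket.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns the time in milliseconds between the two events around scale().
float time_with_events(int n) {
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Both events are queued in the same stream as the kernel.
    cudaEventRecord(start, 0);
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, 0);

    // Block the host until the stop event has actually been reached.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}

int main() {
    printf("CUDA events: %.3f ms\n", time_with_events(1 << 20));
    return 0;
}
```

Here only the stop event is waited on, which is enough because the start event precedes it in the same stream.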