I am a bit confused about the usage of cudaEvent_t. Currently, I am using the clock() call like this to find the duration of a kernel call:
cudaThreadSynchronize();
clock_t begin = clock();
fooKernel<<< x, y >>>( z, w );
cudaThreadSynchronize();
clock_t end = clock();
// Print time difference: ( end - begin )
Looking for a higher-resolution timer, I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before I note down the time using cudaEventRecord(), or is it redundant?
The reason I am asking is that there is another call, cudaEventSynchronize(), which seems to wait until the event has been recorded. If the recording is delayed, won't the calculated time difference include some extra time after the kernel has finished execution?
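For reference, here is a minimal sketch of the event-based version I have in mind. The kernel body, the helper name timeFooKernelMs, and the launch configuration are placeholders I made up for illustration, not my real code. My understanding, which may be incomplete, is that cudaEventRecord() only enqueues the event in the stream, the timestamp is taken on the GPU when the stream reaches that point, and cudaEventSynchronize() merely blocks the host until that has happened.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for fooKernel; the body is an assumption.
__global__ void fooKernel(float *z, int w)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < w) {
        z[i] = 2.0f * z[i];
    }
}

// Hypothetical helper: times one launch with CUDA events.
// Returns elapsed GPU time in milliseconds, or -1.0f if allocation fails.
float timeFooKernelMs(int w)
{
    float *z = nullptr;
    if (cudaMalloc(&z, w * sizeof(float)) != cudaSuccess) {
        return -1.0f;
    }
    cudaMemset(z, 0, w * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int y = 256;              // threads per block (arbitrary choice)
    int x = (w + y - 1) / y;  // enough blocks to cover w elements

    // Both events are enqueued in stream 0, in order with the kernel.
    cudaEventRecord(start, 0);
    fooKernel<<<x, y>>>(z, w);
    cudaEventRecord(stop, 0);

    // Block the host until the stop event has actually been recorded.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // result is in milliseconds

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(z);
    return ms;
}

int main()
{
    float ms = timeFooKernelMs(1 << 20);
    printf("fooKernel took %f ms\n", ms);
    return 0;
}
```

Note that there is no cudaThreadSynchronize() before either cudaEventRecord() call in this sketch; whether leaving it out is correct is exactly what I am asking.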