How can I measure the execution time of cudaMalloc using CUDA events? I can measure kernel time and cudaMemcpy time with events, but the same approach doesn't work for cudaMalloc. With the code below I get an execution time of 3.104e-06 sec, which is wrong; Nvidia Nsight Compute reports 0.109 sec.

cudaEventRecord(startCuda);
cudaMalloc(&devMatrix, allocSize);
cudaEventRecord(stopCuda);
cudaEventSynchronize(stopCuda);
cudaEventElapsedTime(&timeCudaMalloc, startCuda, stopCuda);

1 Answer

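cudaMalloc is a blocking host-side call that is not enqueued in a stream, so the two events only bracket an empty stretch of the GPU timeline and the elapsed time comes out close to zero. Try measuring it with the CPU clock instead, using std::chrono::high_resolution_clock: http://www.cplusplus.com/reference/chrono/high_resolution_clock/now/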
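
Something along these lines might work as a starting point. It is only a sketch: time_host_call is a hypothetical helper name of my own (not part of CUDA), and I used std::chrono::steady_clock rather than high_resolution_clock because it is guaranteed to be monotonic. The helper itself is plain C++; the cudaMalloc usage is shown in the trailing comment and assumes the devMatrix and allocSize variables from the question.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of CUDA): runs a blocking host-side call once
// and returns the wall-clock time it took, in seconds.
template <typename F>
double time_host_call(F&& call) {
    // steady_clock is monotonic, so it is a safe choice for measuring intervals.
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(call)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Intended use with the question's code (needs <cuda_runtime.h> and nvcc):
//
//   cudaFree(0);  // optional: force context creation first so it is not counted
//   double timeCudaMalloc =
//       time_host_call([&] { cudaMalloc(&devMatrix, allocSize); });
```

Since cudaMalloc returns only once the allocation is done, no extra synchronization should be needed around the timed call. Averaging over several allocate/free pairs would give a steadier number than a single run.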

Also, I'm not familiar with Nvidia Nsight Compute, but could it be adding profiling overhead to your code?

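Finally, cudaMalloc execution time is highly variable, so don't expect consistent results. In particular, the first CUDA runtime call in a process usually also pays for one-time context initialization, which can dominate the measurement.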