How can I measure the execution time of cudaMalloc using CUDA events?

I can measure kernel time and cudaMemcpy time with events, but the same approach doesn't work for cudaMalloc. With the code below I get an execution time of 3.104e-06 sec (which is wrong). With NVIDIA Nsight Compute I get 0.109 sec.

cudaEventRecord(startCuda);
cudaMalloc(&devMatrix, allocSize);
cudaEventRecord(stopCuda);
cudaEventSynchronize(stopCuda);
cudaEventElapsedTime(&timeCudaMalloc, startCuda, stopCuda);