Running nvprof with the --metrics option on Windows produces an error:
==6580== NVPROF is profiling process 6580, command: Project1.exe
==6580== Error: Internal profiling error 4292:1.
======== Error: CUDA profiling error.
If I run nvprof without --metrics, no error is reported:
F:\vstest\Project1\x64\Release>nvprof Project1.exe
==384== NVPROF is profiling process 384, command: Project1.exe
sumMatrixOnGPU2D <<<(512,512), (32,32)>>> elapsed 22 ms
==384== Profiling application: Project1.exe
==384== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   61.28%  538.11ms         2  269.06ms  260.98ms  277.13ms  [CUDA memcpy HtoD]
                   36.29%  318.68ms         1  318.68ms  318.68ms  318.68ms  [CUDA memcpy DtoH]
                    2.43%  21.364ms         1  21.364ms  21.364ms  21.364ms  sumMatrixOnGPU2D(float*, float*, float*, int, int)
      API calls:   56.77%  1.29771s         3  432.57ms  47.895ms  1.19911s  cudaMalloc
                   37.53%  857.94ms         3  285.98ms  261.20ms  319.19ms  cudaMemcpy
                    2.56%  58.617ms         1  58.617ms  58.617ms  58.617ms  cudaDeviceReset
                    2.13%  48.594ms         3  16.198ms  14.312ms  17.671ms  cudaFree
                    0.95%  21.732ms         2  10.866ms  275.60us  21.456ms  cudaDeviceSynchronize
                    0.02%  512.70us         1  512.70us  512.70us  512.70us  cudaLaunchKernel
                    0.02%  359.30us        96  3.7420us     100ns  204.60us  cuDeviceGetAttribute
                    0.02%  347.80us         1  347.80us  347.80us  347.80us  cudaGetDeviceProperties
                    0.01%  180.60us         1  180.60us  180.60us  180.60us  cuDeviceGetPCIBusId
                    0.00%  32.100us         1  32.100us  32.100us  32.100us  cuDeviceTotalMem
                    0.00%  13.400us         1  13.400us  13.400us  13.400us  cudaSetDevice
                    0.00%  4.0000us         3  1.3330us     200ns  3.5000us  cuDeviceGetCount
                    0.00%  3.9000us         1  3.9000us  3.9000us  3.9000us  cudaGetLastError
                    0.00%  1.1000us         2     550ns     200ns     900ns  cuDeviceGet
                    0.00%  1.0000us         1  1.0000us  1.0000us  1.0000us  cuDeviceGetName
                    0.00%     300ns         1     300ns     300ns     300ns  cuDeviceGetUuid
                    0.00%     300ns         1     300ns     300ns     300ns  cuDeviceGetLuid
What is causing this error, and how can I run nvprof with --metrics?