A piece of code that takes well over a minute when run from the command line completes in a matter of seconds under NVIDIA Visual Profiler (running the same .exe). So the natural question is: why? Is there something wrong with running it from the command line, or does Visual Profiler do something different and not actually execute everything the way the command line does?
I'm using CUBLAS, Thrust and cuRAND.
Incidentally, I've noticed a clear slowdown in compiled code on my machine very recently, even in old code that previously ran quickly, which is why I'm getting suspicious.
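One way to see where the extra time goes is to time the phases from inside the program, so the command-line run and the profiler run report the same measurements. The sketch below is a minimal example under stated assumptions: `placeholderWork` is a hypothetical kernel standing in for the real cuBLAS/Thrust/cuRAND calls, and `CHECK` and `blocksFor` are helpers introduced here for illustration. It times context creation (the first `cudaFree(0)`) and the work with a host wall clock, and the kernel alone with CUDA events.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(err_));                     \
            std::exit(1);                                               \
        }                                                               \
    } while (0)

// Number of blocks needed to cover n elements (rounds up).
constexpr int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical stand-in for the real cuBLAS/Thrust/cuRAND work.
__global__ void placeholderWork(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
    using Clock = std::chrono::steady_clock;
    const auto t0 = Clock::now();

    // The first runtime call creates the CUDA context; time it on its own.
    CHECK(cudaFree(0));
    const auto t1 = Clock::now();

    const int n = 1 << 24;
    float* d = nullptr;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaMemset(d, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Events bracket only the kernel, so gpuMs excludes host-side overhead.
    const int threads = 256;
    CHECK(cudaEventRecord(start));
    placeholderWork<<<blocksFor(n, threads), threads>>>(d, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    const auto t2 = Clock::now();

    float gpuMs = 0.0f;
    CHECK(cudaEventElapsedTime(&gpuMs, start, stop));

    std::chrono::duration<double, std::milli> initMs = t1 - t0;
    std::chrono::duration<double, std::milli> workMs = t2 - t1;
    std::printf("context creation (wall):    %.1f ms\n", initMs.count());
    std::printf("allocation + kernel (wall): %.1f ms\n", workMs.count());
    std::printf("kernel only (CUDA events):  %.3f ms\n", gpuMs);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d));
    return 0;
}
```

Running the same executable both ways and comparing the three numbers would suggest whether the difference lies in context creation, in host-side work, or in the GPU work itself.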
Update:
- I have checked that the output computed from the command line and under Visual Profiler is identical, i.e. all the required code ran in both cases.
- GPU Shark indicated that the performance state stayed at P0 when I switched from the command line to Visual Profiler.
- However, GPU usage was reported as 0.0% when run under Visual Profiler, but went as high as 98% when run from the command line.
- Moreover, far less memory is used under Visual Profiler. When run from the command line, Task Manager shows 650-700MB of memory in use (it spikes at the first `cudaFree(0)` call). Under Visual Profiler that figure drops to ~100MB.
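Task Manager's figure is presumably the process's host-side memory, which the CUDA runtime does not report. On the device side, `cudaMemGetInfo` returns free and total device memory, so printing it at the same points in both runs is one way to check whether the profiler run really allocates less. The sketch below is a minimal example under stated assumptions: `toMiB` and `reportDeviceMemory` are helpers introduced here, and the 256 MiB allocation is a hypothetical stand-in for the real buffers. "Used" is computed as total minus free, so it also counts other processes and the display.

```cuda
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Convert a byte count to mebibytes for printing.
constexpr double toMiB(std::size_t bytes) { return bytes / (1024.0 * 1024.0); }

// Print device memory in use (total - free, so it includes other processes
// and the display) with a label; exit on error.
void reportDeviceMemory(const char* label) {
    std::size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n",
                     cudaGetErrorString(err));
        std::exit(1);
    }
    std::printf("%-26s used %8.1f MiB of %8.1f MiB\n", label,
                toMiB(totalBytes - freeBytes), toMiB(totalBytes));
}

int main() {
    // Force context creation first, as the original program does.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFree(0) failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    reportDeviceMemory("after cudaFree(0)");

    // Hypothetical 256 MiB buffer standing in for the real allocations.
    void* buf = nullptr;
    err = cudaMalloc(&buf, std::size_t(256) << 20);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    reportDeviceMemory("after 256 MiB cudaMalloc");

    cudaFree(buf);
    reportDeviceMemory("after cudaFree(buf)");
    return 0;
}
```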