Nvidia Visual Profiler showing offset GPU/CPU computations

Question

I have a CUDA program that I am profiling on three machines:

1. Windows 7 workstation with a GeForce GTX 690 GPU.
2. Windows 7 laptop with an NVS 5200M GPU.
3. Fedora 19 workstation with a GeForce GTX 690 GPU.

The first machine (the Windows 7 workstation) uses the GeForce GTX 690 as its primary display card, in addition to doing CUDA processing. The other two machines (the Windows laptop and the Linux workstation) use other graphics hardware for display rendering: integrated graphics on the laptop and a lower-end ATI card on the Linux workstation.

I have compiled the same program (with all the CUDA profiling compiler flags set) on all three platforms, and am using nvvp to profile it. The timelines for machines #2 and #3 are what I would expect:

Windows 7 Laptop

Linux Workstation

However, the profiling timeline for the Windows Workstation is very different:

Windows 7 Workstation

I don't know how or why it happened, but the CPU and GPU computations seem to have gotten out of sync (at least as far as the profiler is concerned). Could this have something to do with the Windows 7 workstation not having an additional graphics card dedicated to graphics?

There is a known issue in the CUDA 5.0/5.5 drivers: GPU time synchronization is off for devices in SLI groups. All devices in the SLI group use the offset for the first device in the group, so skew between the GPUs is not corrected. If you are using SLI, please disable SLI mode in the NVIDIA Control Panel and see if this fixes the problem. This may already be fixed in the most recent NVIDIA display drivers. - Greg Smith

Greg Smith · Accepted Answer · 2013-08-28T21:48:29

The NVIDIA Visual Profiler, NVIDIA Nsight Visual Studio Edition, and nvprof use a common method in the driver to synchronize the GPU timers with the CPU timers. In the NVIDIA display drivers for CUDA 5.0 and CUDA 5.5, a driver bug affected timer synchronization for devices in SLI groups. Specifically, all devices in the SLI group used the timer from the first device, which results in the other devices in the SLI group displaying events at a fixed positive or negative offset from the correct location. This issue should be fixed in the GeForce R326.41 driver or newer.
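To make the "fixed offset" symptom concrete, here is a toy Python model of it. It is only a sketch of the idea, not the driver's actual algorithm: the device names, the offsets, and the helper functions (`to_cpu_time`, `displayed_start`) are all made up for illustration.

```python
# Toy model of the timer-synchronization bug described above.
# All names and numbers are hypothetical; this is not the driver's real code.

def to_cpu_time(gpu_timestamp, offset):
    """Map a raw GPU timestamp onto the CPU timeline using a clock offset."""
    return gpu_timestamp + offset

# Assumed true offsets (CPU clock minus GPU clock), in microseconds.
true_offsets = {"gpu0": 1_000, "gpu1": 1_350}

# A kernel that really starts at CPU time 5_000 us on each device,
# expressed in each device's own clock.
true_cpu_start = 5_000
raw_start = {dev: true_cpu_start - off for dev, off in true_offsets.items()}

def displayed_start(dev, buggy):
    """Return where the profiler would place the kernel on the CPU timeline."""
    # Buggy behaviour: every device in the group reuses the first device's offset.
    offset = true_offsets["gpu0"] if buggy else true_offsets[dev]
    return to_cpu_time(raw_start[dev], offset)

for dev in true_offsets:
    correct = displayed_start(dev, buggy=False)
    shown = displayed_start(dev, buggy=True)
    print(f"{dev}: correct={correct} us, displayed={shown} us, error={shown - correct} us")
```

In this model the first device is always placed correctly, while every event on the second device is shifted by the same amount (the difference between the two offsets), which matches the constant skew seen in the timeline.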
