I have a CUDA program that I am profiling on three machines:
1. Windows 7 workstation with a GeForce GTX 690 GPU.
2. Windows 7 laptop with an NVS 5200M GPU.
3. Fedora 19 workstation with a GeForce GTX 690 GPU.
The first machine (the Windows 7 workstation) uses the GeForce GTX 690 as its primary display card in addition to doing the CUDA processing. The other two machines (the Windows laptop and the Linux workstation) use different graphics hardware for display rendering: integrated graphics on the laptop, and a lower-end ATI card on the Linux workstation.
I have compiled the same program (with all the CUDA profiling compiler flags set) on all three platforms, and I am using nvvp to profile it. The timelines for machines #2 and #3 (the Windows laptop and the Linux workstation) look the way I would expect:
[Screenshot: nvvp timeline, Windows 7 laptop]
[Screenshot: nvvp timeline, Linux workstation]
However, the profiling timeline for machine #1 (the Windows 7 workstation) looks very different:
[Screenshot: nvvp timeline, Windows 7 workstation]
I don't know how or why it happened, but the CPU and GPU activity seem to have gotten out of sync, at least as far as the profiler is concerned. Could this be related to the fact that the Windows 7 workstation has no separate graphics card dedicated to display rendering, so the GTX 690 is handling both the display and the CUDA work?