Some time back, NVIDIA introduced the concept of vGPUs on its GRID GPUs, where a physical GPU is virtualized into multiple vGPUs, each of which is assigned to a guest VM. While the GRID documentation is fairly clear on memory segregation, what is not clear is how kernels originating from a guest VM execute on the underlying hardware.
The GRID datasheet, in one of its tables (Table 1), mentions "CUDA Cores (time-sliced shared)". Does this imply that a CUDA kernel originating from one guest VM captures the entire GPU for a time slice, followed by kernels from the other VMs?
Reference to GRID Datasheet: http://images.nvidia.com/content/pdf/grid/whitepaper/NVIDIA-GRID-WHITEPAPER-vGPU-Delivering-Scalable-Graphics-Rich-Virtual-Desktops.pdf
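
For context, below is the kind of minimal probe I have in mind (my own hypothetical test code, not something from the datasheet): a long-running kernel launched from inside each guest VM. If the table really means whole-GPU time-slicing between VMs, I would expect this kernel's wall-clock time to roughly double when it runs concurrently in two guests compared to running in one guest alone.

```cuda
// Hypothetical probe kernel to be launched from inside a guest VM.
// Compare its timing when run in one VM vs. concurrently in two VMs.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busy_kernel(float *out, int iters)
{
    float v = threadIdx.x * 0.001f + blockIdx.x;
    for (int i = 0; i < iters; ++i)
        v = v * 1.000001f + 0.5f;      // dependent FMAs keep the SMs busy
    out[blockIdx.x * blockDim.x + threadIdx.x] = v;
}

int main()
{
    const int blocks = 256, threads = 256, iters = 1 << 22;
    float *d_out;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    busy_kernel<<<blocks, threads>>>(d_out, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.2f ms\n", ms);

    cudaFree(d_out);
    return 0;
}
```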