I am studying Intel VMX through Linux KVM, and I could not clearly understand how KVM (Linux) schedules multiple VMs running concurrently on the same host.
For example, suppose the host has 1 physical CPU and runs 2 KVM VMs, each configured with 1 vCPU.
Once they are started, KVM/QEMU configures a VMCS for each vCPU, so there are 2 VMCSes in KVM. Since there is only 1 pCPU, KVM/Linux has to schedule the vCPUs one at a time.
My understanding is that while vCPUa is running, KVM has done VMPTRLD on vCPUa's VMCS and runs the guest's code. Then, when vCPUb is to be scheduled, KVM will VMPTRST vCPUa's VMCS to somewhere, and VMPTRLD vCPUb's VMCS from somewhere.
Reading KVM's code, I did not find where VMPTRLD/VMPTRST happens for vCPU scheduling, or what 'somewhere' is.
I'd guess that KVM works like a process under the host kernel, getting scheduled by the regular scheduler. It must need code for switching between running a guest VM vs. running host code (including non-VM user-space processes on the host). Did you already look at context-switching functions to see if they're KVM-aware?
– Peter Cordes
That makes sense, and that is also my guess. I have to dig into the scheduler code for details and will update my findings here.
– wangt13
2 Answers
Firstly, KVM registers two callbacks for vCPU sched-in and sched-out, like below:

kvm_preempt_ops.sched_in = kvm_sched_in;
kvm_preempt_ops.sched_out = kvm_sched_out;

So, each time the host scheduler switches a vCPU thread in or out (I have not deeply dived into the scheduler itself), these preempt notifiers are called. Take kvm_sched_in() as an example; it calls:
static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
        struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

        if (vcpu->preempted)
                vcpu->preempted = false;

        kvm_arch_sched_in(vcpu, cpu);
        kvm_arch_vcpu_load(vcpu, cpu);
}
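(For reference, the registration side: the notifiers are attached to the vCPU's thread when user space enters the vCPU ioctl path. In kernels of roughly this vintage, vcpu_load() looks like the sketch below; details vary by kernel version.)

int vcpu_load(struct kvm_vcpu *vcpu)
{
        int cpu;

        if (mutex_lock_killable(&vcpu->mutex))
                return -EINTR;
        cpu = get_cpu();
        /* From here on, kvm_sched_in/kvm_sched_out fire on every
         * context switch of this thread. */
        preempt_notifier_register(&vcpu->preempt_notifier);
        kvm_arch_vcpu_load(vcpu, cpu);
        put_cpu();
        return 0;
}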
And kvm_arch_vcpu_load() then operates on the VMCS, like below:
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
        /* Address WBINVD may be executed by guest */
        if (need_emulate_wbinvd(vcpu)) {
                if (kvm_x86_ops->has_wbinvd_exit())
                        cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
                else if (vcpu->cpu != -1 && vcpu->cpu != cpu)
                        smp_call_function_single(vcpu->cpu,
                                        wbinvd_ipi, NULL, 1);
        }

        kvm_x86_ops->vcpu_load(vcpu, cpu);    <<======
        ...
Since .vcpu_load = vmx_vcpu_load, the marked call ends up in vmx_vcpu_load():
/*
 * Switches to specified vcpu, until a matching vcpu_put(), but assumes
 * vcpu mutex is already taken.
 */
static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        u64 phys_addr = __pa(per_cpu(vmxarea, cpu));

        if (!vmm_exclusive)
                kvm_cpu_vmxon(phys_addr);
        else if (vmx->loaded_vmcs->cpu != cpu)
                loaded_vmcs_clear(vmx->loaded_vmcs);

        if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
                per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
                vmcs_load(vmx->loaded_vmcs->vmcs);
        }
        ...
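And vmcs_load() is where the actual VMPTRLD instruction is finally issued. In kernels of roughly this vintage it is a thin wrapper around the instruction (a sketch; the exact asm macros vary by version):

static void vmcs_load(struct vmcs *vmcs)
{
        u64 phys_addr = __pa(vmcs);
        u8 error;

        /* VMPTRLD: make this VMCS current and active on this CPU */
        asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
                        : "=qm"(error) : "a"(&phys_addr), "m"(phys_addr)
                        : "cc", "memory");
        if (error)
                printk(KERN_ERR "kvm: vmptrld %p/%llx failed\n",
                       vmcs, phys_addr);
}

The sched-out side is symmetric: loaded_vmcs_clear() ends up issuing VMCLEAR, which flushes the CPU's cached VMCS state back to the in-memory VMCS region, so KVM never needs VMPTRST.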
That is it. So the 'somewhere' you were looking for is simply the kernel memory that KVM allocated for each vCPU's VMCS (tracked in vmx->loaded_vmcs->vmcs); KVM never uses VMPTRST for scheduling because it already knows each VMCS's address itself.