3
votes

I'm writing an open source document about qemu internals so if you help me you're helping the growth of Qemu project

The closest answer I found was: In which conditions the ioctl KVM_RUN returns?

This is the thread loop for a single CPU running on KVM:

static void *qemu_kvm_cpu_thread_fn(void *arg)
{
    CPUState *cpu = arg;
    int r;

    rcu_register_thread();

    qemu_mutex_lock_iothread();
    qemu_thread_get_self(cpu->thread);
    cpu->thread_id = qemu_get_thread_id();
    cpu->can_do_io = 1;
    current_cpu = cpu;

    r = kvm_init_vcpu(cpu);
    if (r < 0) {
        error_report("kvm_init_vcpu failed: %s", strerror(-r));
        exit(1);
    }

    kvm_init_cpu_signals(cpu);

    /* signal CPU creation */
    cpu->created = true;
    qemu_cond_signal(&qemu_cpu_cond);
    qemu_guest_random_seed_thread_part2(cpu->random_seed);

    do {
        if (cpu_can_run(cpu)) {
            r = kvm_cpu_exec(cpu);
            if (r == EXCP_DEBUG) {
                cpu_handle_guest_debug(cpu);
            }
        }
        qemu_wait_io_event(cpu);
    } while (!cpu->unplug || cpu_can_run(cpu));

    qemu_kvm_destroy_vcpu(cpu);
    cpu->created = false;
    qemu_cond_signal(&qemu_cpu_cond);
    qemu_mutex_unlock_iothread();
    rcu_unregister_thread();
    return NULL;
}

You can see here:

do {
        if (cpu_can_run(cpu)) {
            r = kvm_cpu_exec(cpu);
            if (r == EXCP_DEBUG) {
                cpu_handle_guest_debug(cpu);
            }
        }
        qemu_wait_io_event(cpu);
    } while (!cpu->unplug || cpu_can_run(cpu));

that every time the KVM returns, it gives an opportunity for Qemu to emulate things. I suppose that when the kernel on the guest tries to access a PCIe device, KVM on the host returns. I don't know how KVM knows how to return. Maybe KVM maintains the addresses of the PCIe device and tells Intel's VT-D or AMD's IOV which addresses should generate an exception. Can someone clarify this?

Well, by the look of the qemu_kvm_cpu_thread_fn, the only place where a PCIe access could be emulated, is qemu_wait_io_event(cpu), which is defined here: https://github.com/qemu/qemu/blob/stable-4.2/cpus.c#L1266 and which calls qemu_wait_io_event_common defined here: https://github.com/qemu/qemu/blob/stable-4.2/cpus.c#L1241 which calls process_queued_cpu_work defined here: https://github.com/qemu/qemu/blob/stable-4.2/cpus-common.c#L309

Let's see the code which executes the queue functions:

 while (cpu->queued_work_first != NULL) {
        wi = cpu->queued_work_first;
        cpu->queued_work_first = wi->next;
        if (!cpu->queued_work_first) {
            cpu->queued_work_last = NULL;
        }
        qemu_mutex_unlock(&cpu->work_mutex);
        if (wi->exclusive) {
            /* Running work items outside the BQL avoids the following deadlock:
             * 1) start_exclusive() is called with the BQL taken while another
             * CPU is running; 2) cpu_exec in the other CPU tries to takes the
             * BQL, so it goes to sleep; start_exclusive() is sleeping too, so
             * neither CPU can proceed.
             */
            qemu_mutex_unlock_iothread();
            start_exclusive();
            wi->func(cpu, wi->data);

It looks like that the only power the VCPU thread qemu_kvm_cpu_thread_fn has when KVM returns, is to execute the queued functions:

wi->func(cpu, wi->data);

This means that a PCIe device would have to constantly queue itself as a function for qemu to execute. I don't see how it would work.

The functions that are able to queue work on this cpu have run_on_cpu on its name. By searching it on VSCode I found some functions that queue work but none related to PCIe or even emulation. The nicest function I found was this one that apparently patches instructions: https://github.com/qemu/qemu/blob/stable-4.2/hw/i386/kvmvapic.c#L446. Nice, I wanted to know that also.

1

1 Answers

2
votes

Device emulation (of all devices, not just PCI) under KVM gets handled by the "case KVM_EXIT_IO" (for x86-style IO ports) and "case KVM_EXIT_MMIO" (for memory mapped IO including PCI) in the "switch (run->exit_reason)" inside kvm_cpu_exec(). qemu_wait_io_event() is unrelated.

Want to know how execution gets to "emulate a register read on a PCI device" ? Run QEMU under gdb, set a breakpoint on, say, the register read/write function for the ethernet PCI card you're using, and then when you get dropped into the debugger look at the stack backtrace. (Compile QEMU --enable-debug to get better debug info for this kind of thing.)

PS: If you're examining QEMU internals for educational purposes, you'd be better to use the current code, not a year-old release of it.