2 votes

I am trying to achieve the following scenario.

I have a Linux box with 4 NUMA nodes, each with 6 CPUs. To achieve better KVM guest performance, I pin each vCPU to a set of CPUs, preferably in the same NUMA cell.

For example, if I want to start a 12-core guest, I pin the first 6 vCPUs to the cpuset of NUMA node 1 and the second 6 to the cpuset of NUMA node 2.
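For reference, the pinning itself could be done roughly like this (guest name, thread IDs and host CPU numbering are placeholders; the real layout can be checked with numactl --hardware):

    # direct qemu: pin each vCPU thread (thread IDs from the monitor's "info cpus")
    taskset -pc 0-5 $VCPU0_TID      # vCPU 0 -> node 1's host CPUs 0-5
    # or, if the guest is libvirt-managed:
    virsh vcpupin myguest 0 0-5     # vCPU 0 -> host CPUs 0-5
    virsh vcpupin myguest 6 6-11    # vCPU 6 -> host CPUs 6-11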

So far so good; the problems start when I try to expose that topology to the guest, i.e. make the guest aware that it has two cpusets on 2 NUMA nodes.

I thought that if I use the options -smp 12,sockets=2,cores=6,threads=1 with qemu-kvm, it would most probably split them in half, grouping the first 6 in one socket and the second 6 in another, and that I could then use the -numa option to place the appropriate vCPUs on 2 NUMA nodes (the full command I have in mind is sketched after the questions). So my questions are as follows:

  1. Will the -numa option do its thing? The documentation says it is for NUMA simulation. If it's a simulation, doesn't that mean it will hurt performance? What I need is a way to tell the guest: "these CPUs are on the same NUMA node" (even if they are not). Is this the way to achieve that?

  2. It seems there is a bug in qemu (1.2.0) and the topology is exposed very badly. When I set the CPU topology to (for example) -smp 9,sockets=3,cores=3,threads=1, for some weird reason, inside the guest I see them (using lstopo) arranged in three sockets, but with 4 cores on the first, 4 on the second and 1 on the third (4|4|1). It appears qemu splits them into powers of 2 rather than equally. I observed the same behavior with sockets=2,cores=10 and sockets=2,cores=18, you name it: it always splits them not in half, but into powers of 2 (i.e. 8|2 and 16|2). sockets=2,cores=8 works fine, though (which is kind of expected). Has anyone experienced something like that?
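For completeness, the full qemu-kvm command line I have in mind (referenced above) would look something like this; the memory split is just an example, and disk/network options are omitted:

    qemu-kvm -m 4096 \
        -smp 12,sockets=2,cores=6,threads=1 \
        -numa node,mem=2048,cpus=0-5,nodeid=0 \
        -numa node,mem=2048,cpus=6-11,nodeid=1 \
        ...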

I am actually facing a similar issue. I want to pin the first half of the vCPUs to the first half of the cores on the host, and the second half of the vCPUs to the second half of the cores; and on the guest, I need to know which of the guest's cores are pinned to the second half of the host cores. Since both our problems have to do with 'exposure' between host and guest, my analysis and conclusion is that it is totally opaque between guest and host. If there were such 'exposure', I think it would be a violation of virtualization: a guest is supposed to be as independent, sovereign and standalone to the user as the host. - madhu

1 Answer

1 vote

Since libvirt 0.9.8, one can use the numa element to specify the guest NUMA topology. Combined with vcpupin under the cputune element, you should be able to achieve the desired CPU/memory mapping between host and guest.

http://libvirt.org/formatdomain.html#elementsCPU
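A minimal sketch of the relevant domain XML for your 12-vCPU case (the cell memory sizes, given in KiB, are just an example):

    <vcpu>12</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='0-5'/>
      <!-- ... vCPUs 1-5 pinned to host CPUs 0-5 likewise ... -->
      <vcpupin vcpu='6' cpuset='6-11'/>
      <!-- ... vCPUs 7-11 pinned to host CPUs 6-11 likewise ... -->
    </cputune>
    <cpu>
      <topology sockets='2' cores='6' threads='1'/>
      <numa>
        <cell cpus='0-5' memory='2097152'/>
        <cell cpus='6-11' memory='2097152'/>
      </numa>
    </cpu>

With this in place, lstopo inside the guest should show two sockets of 6 cores each, one per guest NUMA node.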