I am trying to maximize throughput on a single RX queue of my network card. My current setup uses the shared Umem feature: multiple sockets are bound to the same RX queue, each holding a reference to the same Umem. My kernel XDP program then steers each stream of packets to the correct socket via a BPF_MAP_TYPE_XSKMAP. This all works fine, but at around 600,000 pps, ksoftirqd/18 reaches 100% CPU load (I moved my userspace application to another core via taskset -c 1 to reduce the load on core 18). My userspace app never exceeds 14% CPU load, so the reason I cannot process more packets is not my application but the huge number of interrupts.
I then read about the XDP bind flag XDP_USE_NEED_WAKEUP which, as far as I understand it (there is not a lot of information out there on this topic), lets the driver put the Umem fill ring to sleep and thereby reduces interrupt overhead. Because the fill ring might be sleeping, one has to regularly check whether it needs a wakeup:
if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
    const int ret = poll(fds, len, 10);
}
fds is an array of struct pollfd entries containing the file descriptor of every socket. I am not quite sure where to add the XDP_USE_NEED_WAKEUP flag, but here is how I use it:
static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem, struct config *cfg,
                                                    const bool rx, const bool tx) {
    struct xsk_socket_config xsk_socket_cfg;
    struct xsk_socket_info *xsk;
    struct xsk_ring_cons *rxr;
    struct xsk_ring_prod *txr;
    int ret;

    xsk = calloc(1, sizeof(*xsk));
    if (!xsk) {
        fprintf(stderr, "xsk `calloc` failed: %s\n", strerror(errno));
        exit(1);
    }

    xsk->umem = umem;
    xsk_socket_cfg.rx_size = XSK_CONS_AMOUNT;
    xsk_socket_cfg.tx_size = XSK_PROD_AMOUNT;
    if (cfg->ip_addrs_len > 1) {
        xsk_socket_cfg.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
    } else {
        xsk_socket_cfg.libbpf_flags = 0;
    }
    xsk_socket_cfg.xdp_flags = cfg->xdp_flags;
    xsk_socket_cfg.bind_flags = cfg->xsk_bind_flags | XDP_USE_NEED_WAKEUP;

    rxr = rx ? &xsk->rx : NULL;
    txr = tx ? &xsk->tx : NULL;
    ret = xsk_socket__create(&xsk->xsk, cfg->ifname_buf, cfg->xsk_if_queue, umem->umem, rxr, txr, &xsk_socket_cfg);
    if (ret) {
        /* xsk_socket__create returns a negative errno value; errno itself
         * is not guaranteed to be set, so decode the return value instead */
        fprintf(stderr, "`xsk_socket__create` returned error: %s\n", strerror(-ret));
        exit(-ret);
    }
    return xsk;
}
I observed that this had a small impact on the load of ksoftirqd/18, and I was able to process about 50,000 pps more than before (though this could also be due to changes in the general load of the system - I am not sure). But I also noticed that XDP_USE_NEED_WAKEUP doesn't seem to work with a shared Umem, because libbpf has this code in xsk.c:
sxdp.sxdp_family = PF_XDP;
sxdp.sxdp_ifindex = xsk->ifindex;
sxdp.sxdp_queue_id = xsk->queue_id;
if (umem->refcount > 1) {
    sxdp.sxdp_flags = XDP_SHARED_UMEM;
    sxdp.sxdp_shared_umem_fd = umem->fd;
} else {
    sxdp.sxdp_flags = xsk->config.bind_flags;
}
As you can see, bind_flags is only used if the Umem has a refcount of 1 (it cannot be less than that, because it is incremented earlier in xsk_socket__create). And because refcount is increased for every socket that is created, these bind_flags are only applied to the first socket - the only one bound while refcount is still 1.
I don't quite understand why XDP_USE_NEED_WAKEUP can apparently only be set for one socket. In fact, I don't understand why this flag is tied to the socket at all if it actually affects the Umem. Nevertheless, I am looking for a way to reduce the interrupt overhead - any ideas how this could be achieved? I need to handle at least 1,000,000 pps.