  1. As we know, we can bind the IRQs of certain devices to specific CPU cores on Linux by using IRQ affinity:

echo <cpu-bitmask> > /proc/irq/[irq-num]/smp_affinity
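For example (a minimal sketch; the IRQ number 44 and the mask value are hypothetical, and writing the mask requires root):

cat /proc/irq/44/smp_affinity          # read the current CPU bitmask for IRQ 44
echo 4 > /proc/irq/44/smp_affinity     # pin IRQ 44 to CPU 2 (bitmask 0x4)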

Also

  1. We also know that we can bind IRQs (hardware interrupts) to a particular NUMA node (a processor socket on the motherboard) on NUMA systems, as described in: https://events.linuxfoundation.org/sites/events/files/eeus13_shelton.pdf

cat /proc/irq/[irq-num]/node
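For example (a sketch; /proc/irq/[irq-num]/node is read-only and only reports the device's node, the IRQ number 44 is hypothetical, and the CPU list 0-7 stands for whatever lscpu reports for that node):

cat /proc/irq/44/node                       # NUMA node of the device behind IRQ 44 (read-only)
lscpu | grep "NUMA node0 CPU(s)"            # which CPUs belong to node 0
echo 0-7 > /proc/irq/44/smp_affinity_list   # pin the IRQ to those CPUs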


But if one PCIe device (Ethernet, GPU, ...) is connected to NUMA node 0 and another PCIe device is connected to NUMA node 1, then it would be optimal to handle each device's interrupts on the NUMA node (CPUs) to which that device is connected, to avoid high-latency communication between the nodes: Is CPU access asymmetric to Network card
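For reference, sysfs reports which node a PCIe device is attached to (a sketch; the interface name eth0 and the PCI address 0000:81:00.0 are placeholders, and a value of -1 means the kernel has no NUMA information for that device):

cat /sys/class/net/eth0/device/numa_node           # node of a network interface's PCIe device
cat /sys/bus/pci/devices/0000:81:00.0/numa_node    # node of any PCIe device by bus address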

Does Linux automatically bind IRQs to the nodes to which the PCIe devices are connected, or does this have to be done manually?

And if we have to do it manually, what is the best way to do it?


I am particularly interested in Linux x86_64: Debian 8 (kernel 3.16) and Red Hat Enterprise Linux 7 (kernel 3.10), and others...

Motherboard chipsets: Intel C612 / Intel C610, and others...

Ethernet cards: Solarflare Flareon Ultra SFN7142Q Dual-Port 40GbE QSFP+ PCIe 3.0 Server I/O Adapter - Part ID: SFN7142Q

I think the answer may be specific to the hardware platform and the card driver... The kernel does some mapping at boot, and the init scripts (specific to your Linux distribution) of tools like irqbalance may change that initial mapping. What are your kernel version, platform (motherboard or chipset name), cards, and distribution? - osgx
And what are the cards? - osgx
I'm surprised there aren't more general answers available for this than "you need to know the hardware architecture". NUMA servers are very common these days (e.g. the HPE DL360/DL380 family have existed as dual-socket servers for many years), and wanting IRQs routed appropriately to each PCIe slot "automatically" seems a reasonable expectation for a Linux user. - Michael Firth

1 Answer


By architecture, all low (legacy) IRQs are mapped to node 0. Some of them, such as IRQ 0 (the timer), can't be remapped at all. In any case, you need to review your system (its blueprints).

If you have a high network load and are doing routing, it makes sense to pin the NIC queues. It is most effective to pin the TX and RX queues to the "nearest" cores in terms of caches. But before suggesting anything concrete, it would be great to know your architecture.
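As a rough sketch of that kind of pinning (assuming the driver exposes its queue vectors as eth0-0, eth0-1, ... in /proc/interrupts; eth0 is a placeholder, and irqbalance should be stopped first or it may rewrite the affinities):

local_cpus=$(cat /sys/class/net/eth0/device/local_cpulist)   # CPUs local to the NIC's node, e.g. 0-7
for irq in $(awk -F: '/eth0-/ {gsub(/ /,"",$1); print $1}' /proc/interrupts); do
    echo "$local_cpus" > /proc/irq/$irq/smp_affinity_list    # pin each queue IRQ to the local node
done

A further refinement is to give each queue its own core from that list instead of the whole node.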

What I need to know: 1. Your system (dmidecode and lspci output, cat /proc/interrupts). 2. Your requirements (what the server is for). In other words, it would be great to understand what your server does, so just explain the flows and the architecture.
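For reference, that information can be collected with standard tools (a sketch; run as root, and numactl may need to be installed):

dmidecode -t system          # system model
dmidecode -t baseboard       # motherboard model
lspci -vv                    # verbose PCIe device listing
cat /proc/interrupts         # per-CPU interrupt counts for every IRQ
numactl --hardware           # NUMA node / CPU / memory layout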