2
votes

What is the difference between memory and I/O bandwidth, and how do you measure each one?

I have so many assumptions; forgive the verbosity of this two-part question.

The inspiration for these questions came from "What is the meaning of IB read, IB write, OB read and OB write? They came as output of Intel® PCM while monitoring PCIe bandwidth", where Hadi explains:

DATA_REQ_OF_CPU is NOT used to measure memory bandwidth but i/o bandwidth.

I’m wondering if the difference between memory and I/O bandwidth is similar to the difference between DMA (direct memory access) and MMIO (memory-mapped I/O), or if the bandwidth of both is really I/O bandwidth?

I’m trying to use this picture to help visualize:

[image]

(Hopefully I have this right.) In x86 there are two address spaces: memory and I/O. Would I/O bandwidth be what is measured between the CPU (or DMA controller) and the I/O device, and memory bandwidth what is measured between the CPU and main memory? Does all data in both scenarios run through the memory bus? Just for clarity, do we all agree that the memory bus is the combination of the address and data buses? If so, that part of the image might be a little misleading...
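To make the two address spaces concrete, here is a minimal sketch, assuming GCC/Clang on x86 and a privileged (driver or bare-metal) context; the port number (0x80) and the MMIO address (0xFEB00000) are made-up placeholders, not real device registers:

    // Port-mapped I/O uses the legacy 64 KiB I/O-port space and can only be
    // reached with IN/OUT instructions; memory-mapped I/O is an ordinary
    // load/store to a physical address range that the system routes to a
    // device instead of DRAM. Running this requires I/O privileges
    // (e.g. inside a driver); at user level the accesses would fault.
    #include <cstdint>

    // Port-mapped I/O: dedicated I/O address space, dedicated instructions.
    static inline void port_write8(uint16_t port, uint8_t value) {
        asm volatile("outb %0, %1" : : "a"(value), "Nd"(port));
    }

    static inline uint8_t port_read8(uint16_t port) {
        uint8_t value;
        asm volatile("inb %1, %0" : "=a"(value) : "Nd"(port));
        return value;
    }

    // Memory-mapped I/O: a plain MOV, but the System Agent steers the access
    // out over PCIe to the device rather than to the memory controller.
    static inline void mmio_write32(volatile uint32_t* reg, uint32_t value) {
        *reg = value;
    }

    int main() {
        port_write8(0x80, 0x42);            // hypothetical port in the I/O address space
        uint8_t status = port_read8(0x80);  // read back through the same port space

        // Hypothetical device BAR in the *physical* address space; normally you
        // would obtain a virtual mapping from the OS (mmap of a PCI resource
        // file, or ioremap inside a driver) instead of a raw physical address.
        volatile uint32_t* bar0 = reinterpret_cast<volatile uint32_t*>(0xFEB00000);
        mmio_write32(bar0, status);

        return 0;
    }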

If we can measure I/O bandwidth with Intel® Performance Counter Monitor (PCM) using the pcm-iio program, how would we measure memory bandwidth? Now I’m wondering why the two would differ if they run through the same wires, unless I just have this all wrong. The GitHub page for much of this test code is a bit overwhelming: https://github.com/opcm/pcm

Thank you

1
Yes, memory bandwidth is normally either the theoretical max for the DRAM itself, or for the CPU<=>memory connection. I/O bandwidth usually refers to a specific I/O device, but sure, you could talk about possible aggregate I/O bandwidth over all PCIe links that connect the CPU to the outside world, e.g. from multiple video cards, 100G NICs, and/or SSDs. On modern x86, the memory controllers are built into the CPU, so there's no side channel from I/O to DRAM that bypasses the CPU. DMA bypasses any specific CPU core, though. – Peter Cordes
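To make "theoretical max for the DRAM itself" concrete, a back-of-the-envelope calculation with made-up but typical numbers (DDR4-2666, two channels assumed) looks like this:

    // Back-of-the-envelope peak DRAM bandwidth; the numbers are illustrative
    // (DDR4-2666, two channels, 8 bytes per transfer). Plug in your own platform.
    #include <cstdio>

    int main() {
        const double transfers_per_second = 2666e6;  // DDR4-2666 = 2666 MT/s
        const double bytes_per_transfer   = 8.0;     // 64-bit channel
        const double channels             = 2.0;

        const double peak = transfers_per_second * bytes_per_transfer * channels;
        std::printf("Theoretical peak: %.1f GB/s\n", peak / 1e9);  // ~42.7 GB/s
        return 0;
    }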
The picture is misleading, if not wrong. The links are: for CPU => DRAM, L3 -> ring bus/mesh -> (Home Agent ->) iMC; for DMA, PCI device -> PCIe bus -> System Agent -> ring bus/mesh -> (Home Agent ->) DRAM; for IO (memory and port mapped), L3 -> ring bus/mesh -> System Agent -> PCIe bus -> PCI device (assuming the Cache Agent is the unified path out to the uncore; IO is of course not cached when traversing this path). In a NUMA architecture the "ring bus/mesh" segment must be extended to include an eventual QPI/UPI link between sockets. – Margaret Bloom
"In x86 there are two address spaces: Memory and IO". Yes, but not in the way it is usually described. There is a legacy "IO Address Space" consisting of 64Ki individually addressable 8-bit "IO ports", and accessed exclusively via special IO instructions [IN,INS,OUT,OUTS]. The other address space is "physical address space", which is subdivided to allow access to "regular" memory and to "memory-mapped IO" in different address ranges. (To make it more confusing, in some engineering disciplines every signal leaving the chip is considered "IO", including DRAM access.)John D McCalpin

1 Answer

2
votes

The DATA_REQ_OF_CPU event cannot be used to measure memory bandwidth, for the following reasons:

  • Not all inbound memory requests from an IIO controller are serviced by a memory controller, because a request could also be serviced by the LLC (or an LLC, in the case of multiple sockets). Note, however, that on Intel processors that don't support DDIO, IO memory read requests may cause speculative read requests to be sent to memory in parallel with the LLC lookup.
  • The DATA_REQ_OF_CPU event has many subevents. The inbound memory metrics measured by the pcm-iio tool don't include all types of memory requests. Specifically, they don't include atomic memory reads and writes or IOMMU memory requests, which may consume memory bandwidth.
  • Some subevents count non-memory requests. For example, there are peer-to-peer requests (from one IIO to another).
  • An IO device may want to access memory on a NUMA node that is different from the node to which it's connected. In this case, it will consume memory bandwidth on a different NUMA node.

Now I realize the statement you quoted is a little ambiguous; I don't remember whether I was talking specifically about the metrics measured by pcm-iio or about the event in general, or whether "memory bandwidth" refers to total memory bandwidth or only the portion consumed by IO devices attached to an IIO. However it is interpreted, though, the statement is correct for the reasons mentioned above.

The pcm-iio tool only measures IO bandwidth. Use the pcm-memory tool instead for measuring memory bandwidth; it utilizes the performance events of the IMCs. It appears to me that none of the PCM tools can measure the memory bandwidth consumed by IO devices, which requires using the CBox events.
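If you want to do the same thing programmatically, here is a sketch using the C++ API from the linked opcm/pcm repo (cpucounters.h). The function names (PCM::getInstance, getSystemCounterState, getBytesReadFromMC, getBytesWrittenToMC) are taken from that library but may differ between PCM versions, so treat this as an outline rather than a drop-in program; it must be built against PCM and run with access to the MSR/uncore counters:

    // Sketch: measure DRAM (IMC) bandwidth over a 1-second window with the PCM
    // C++ API from https://github.com/opcm/pcm. Newer releases wrap these names
    // in namespace pcm; adjust to the version you build against.
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include "cpucounters.h"

    int main() {
        PCM* m = PCM::getInstance();
        if (m->program() != PCM::Success) {   // program core + uncore counters
            std::fprintf(stderr, "PCM could not program the counters\n");
            return 1;
        }

        SystemCounterState before = getSystemCounterState();
        std::this_thread::sleep_for(std::chrono::seconds(1));  // measurement window
        SystemCounterState after = getSystemCounterState();

        // Bytes moved by the integrated memory controllers during the window;
        // this is DRAM traffic regardless of whether a core or an IO device
        // (via DMA) caused it.
        std::printf("DRAM read : %.2f GB/s\n", getBytesReadFromMC(before, after) / 1e9);
        std::printf("DRAM write: %.2f GB/s\n", getBytesWrittenToMC(before, after) / 1e9);

        m->cleanup();
        return 0;
    }

Under the hood this is essentially what pcm-memory does as well: count CAS operations (UNC_M_CAS_COUNT.RD/WR) in each IMC channel and multiply by the 64-byte cache-line size.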

The main source of information on uncore performance events is Intel's uncore performance monitoring manuals. You'll find nice figures in the introduction chapters of these manuals showing how the different units of a processor are connected to each other.