I found that my MMIO read/write latency is unreasonably high, and I hope someone can give me some suggestions.
In kernel space, I wrote a simple program to read a 4-byte value from a PCIe device's BAR0 region. The device is an Intel 10G PCIe NIC plugged into a PCIe x16 slot on my Xeon E5 server. I use rdtsc to measure the time between the beginning and the end of the MMIO read; the code snippet looks like this:
vaddr = ioremap_nocache(0xf8000000, 128); // 0xf8000000 is the BAR0 physical address of the device
rdtscl(init);                             // TSC timestamp before the read
ret = readl(vaddr);                       // 4-byte MMIO read from BAR0
rmb();                                    // keep the read ordered before the second timestamp
rdtscl(end);                              // TSC timestamp after the read
I expected the elapsed time between init and end to be less than 1 us; after all, the time the data spends traversing the PCIe link itself should only be a few nanoseconds. However, my test results show at least 5.5 us for an MMIO read from the PCIe device, and I'm wondering whether this is reasonable. I also changed the code to remove the memory barrier (rmb()), but I still get around 5 us of latency.
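To rule out one-shot rdtsc noise, here is how I would extend the test to average over many reads. This is only a minimal sketch using the same rdtscl()/readl()/ioremap_nocache() calls as above; the 0xf8000000 BAR0 address, the 1000-read count, and the use of the kernel's exported tsc_khz for the cycles-to-ns conversion are my own assumptions for illustration:

#include <linux/io.h>
#include <linux/printk.h>
#include <asm/msr.h>
#include <asm/tsc.h>

static void measure_mmio_read(void)
{
        void __iomem *vaddr = ioremap_nocache(0xf8000000, 128); /* assumed BAR0 */
        unsigned long init, end;
        u32 val = 0;
        int i;

        if (!vaddr)
                return;

        rdtscl(init);                      /* low 32 bits of the TSC before the loop */
        for (i = 0; i < 1000; i++)
                val = readl(vaddr);        /* each readl() is a non-posted PCIe read */
        rmb();                             /* keep the reads ordered before the timestamp */
        rdtscl(end);

        /* tsc_khz is the kernel's calibrated TSC frequency on x86;
         * cycles * 1e6 / tsc_khz gives nanoseconds. */
        pr_info("last value 0x%x, avg MMIO read: %lu ns\n",
                val, ((end - init) * 1000000UL / tsc_khz) / 1000);

        iounmap(vaddr);
}

The per-read average with this loop still comes out in the same 5 us range on my machine, so the number does not look like measurement jitter.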
This paper discusses PCIe latency measurement, and usually it is less than 1 us: www.cl.cam.ac.uk/~awm22/.../miller2009motivating.pdf. Do I need to do any special kernel or device configuration to get lower MMIO access latency? Or does anyone have experience doing this?