I want all read & write requests to a PCIe device to be cached by CPU caches. However, it does not work as I expected.
These are my assumptions on write-back MMIO regions.
- Writes to the PCIe device happen only on cache write-back.
- The size of TLP payloads is cache block size (64B).
However, captured TLPs do not follow my assumptions.
- Writes to the PCIe device happen on every write to the MMIO region.
- The size of TLP payloads is 1B.
I write 8-byte of 0xff
to the MMIO region with the following user space program & device driver.
Part of User Program
struct pcie_ioctl ioctl_control;
ioctl_control.bar_select = BAR_ID;
ioctl_control.num_bytes_to_write = atoi(argv[1]);
if (ioctl(fd, IOCTL_WRITE_0xFF, &ioctl_control) < 0) {
printf("ioctl failed\n");
}
Part of Device Driver
case IOCTL_WRITE_0xFF:
{
int i;
char *buff;
struct pci_cdev_struct *pci_cdev = pci_get_drvdata(fpga_pcie_dev.pci_device);
copy_from_user(&ioctl_control, (void __user *)arg, sizeof(ioctl_control));
buff = kmalloc(sizeof(char) * ioctl_control.num_bytes_to_write, GFP_KERNEL);
for (i = 0; i < ioctl_control.num_bytes_to_write; i++) {
buff[i] = 0xff;
}
memcpy(pci_cdev->bar[ioctl_control.bar_select], buff, ioctl_control.num_bytes_to_write);
kfree(buff);
break;
}
I modified MTRRs to make the corresponding MMIO region write-back. The MMIO region starts from 0x0c7300000, and the length is 0x100000 (1MB). Followings are cat /proc/mtrr
results for different policies. Please note that I made each region exclusive.
Uncacheable
reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: uncachable
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable
Write-combining
reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: write-combining
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable
Write-back
reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: write-back
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable
Followings are waveform captures for 8B write with different policies. I have used integrated logic analyzer (ILA) to capture these waveform. Please watch pcie_endpoint_litepcietlpdepacketizer_tlp_req_payload_dat
when pcie_endpoint_litepcietlpdepacketizer_tlp_req_valid
is set. You can count the number of packets by counting pcie_endpoint_litepcietlpdepacketizer_tlp_req_valid
in these waveform example.
- Uncacheable: link -> correct, 1B x 8 packets
- Write-combining: link -> correct, 8B x 1 packet
- Write-back: link -> unexpected, 1B x 8 packets
System configuration is like below.
- CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
- OS: Linux kernel 4.15.0-38
- PCIe Device: Xilinx FPGA KC705 programmed with litepcie
Related Links
- Generating a 64-byte read PCIe TLP from an x86 CPU
- How to Implement a 64B PCIe* Burst Transfer on Intel® Architecture
- Write Combining Buffer Out of Order Writes and PCIe
- Do Ryzen support write-back caching for Memory Mapped IO (through PCIe interface)?
- MTRR (Memory Type Range Register) control
- PATting Linux
- Down to the TLP: How PCI express devices talk (Part I)
mempcy
being implemented asrep movsb
inside the kernel (because your CPU is new enough for ERMSB which makesrep movsb
fairly good.) – Peter Cordesuncached-minus
in PAT. I will try to solve it. – Taekyung Heo