I want to do a 64-byte transaction on PCIe. I am using an Intel i7 9th-gen CPU.
I was able to do a 64-byte write transaction to PCIe device memory by mapping it as a WC region and writing the data like this:
_mm256_store_si256(pcie_memory_address, ymm0);
_mm256_store_si256(pcie_memory_address+32, ymm1);
_mm_mfence();
I tried a 64-byte read using the instruction:
_mm256_loadu_si256();
I used it the same way as the write, but here the read occurs as two 32-byte reads.
Can anyone help me with this? I want to do a 64-byte read as a single burst.
I referred to the Intel documentation at this link: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/pcie-burst-transfer-paper.pdf
As WC implies, this feature is about write-combining memory. You can find some information about how this works here. Effectively, the processor has a couple of 64-byte registers in which it can buffer writes (non-temporal, or to wc/uc memory), so multiple separate writes (ideally) combine into a single bus transaction. The buffers don't do loads, and you don't want to load from wc memory if at all avoidable. Maybe AVX512 enables a single 64-byte load to cause a single bus transaction, but I'm not certain about that. – EOF
Try replacing _mm256_loadu_si256() with _mm_stream_load_si128() while keeping the memory wc. This should fetch a 64-byte cacheline in a single transaction into a fill buffer. A second aligned 32-byte load from the same cacheline should not cause a second bus transaction if the fill buffer was not evicted in-between (but you might not always be able to prevent this, depending on things like read-for-ownership of unrelated cachelines by other processors). – EOF
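The streaming-load suggestion above could be sketched roughly as follows. This is only an illustration, assuming GCC or Clang on x86-64 with SSE4.1; read_line_64 is a hypothetical helper name, not something from the question or from Intel's paper. It issues four back-to-back 16-byte _mm_stream_load_si128() loads (MOVNTDQA) from one 64-byte-aligned line. Whether the device actually sees a single 64-byte read depends on the CPU, on the mapping really being WC, and on the fill buffer not being evicted between the loads, so it is worth checking with a PCIe analyzer or device-side counters.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: copy one 64-byte line from a (presumably WC-mapped)
 * source into ordinary memory using four streaming loads (MOVNTDQA).
 * On WC memory the first load may pull the whole line into a fill buffer,
 * and the following three may then be served from that buffer; this is
 * the behaviour described in the comment, not a guarantee. */
__attribute__((target("sse4.1")))
static void read_line_64(const void *wc_src, uint8_t dst[64])
{
    /* All four loads must fall inside the same 64-byte line. */
    assert(((uintptr_t)wc_src & 63u) == 0);

    /* Some headers declare the intrinsic with a non-const pointer. */
    __m128i *p = (__m128i *)wc_src;

    /* Issue the loads back to back, before any store, to keep the
     * window in which the fill buffer could be evicted short. */
    __m128i a = _mm_stream_load_si128(p + 0);
    __m128i b = _mm_stream_load_si128(p + 1);
    __m128i c = _mm_stream_load_si128(p + 2);
    __m128i d = _mm_stream_load_si128(p + 3);

    /* The destination is ordinary memory, so unaligned stores are fine. */
    _mm_storeu_si128((__m128i *)(dst + 0),  a);
    _mm_storeu_si128((__m128i *)(dst + 16), b);
    _mm_storeu_si128((__m128i *)(dst + 32), c);
    _mm_storeu_si128((__m128i *)(dst + 48), d);
}
```

On ordinary write-back memory MOVNTDQA behaves like a normal aligned load, so the helper can be exercised on a plain aligned buffer before pointing it at the device mapping.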