0
votes

I want to do a 64-byte transaction on PCIe. I am using Intel i7 9th gen CPU.

I was able to do 64-byte write transaction to PCIe device memory by making it WC region and wrote data like this:

_mm256_store_si256(pcie_memory_address, ymm0); 
_mm256_store_si256(pcie_memory_address+32, ymm1);
_mm_mfence();

I tried a 64-byte read using the instruction:

_mm256_loadu_si256();

Used it as like write, but here read occurs as 2* 32-byte reads.

Can anyone help me with this? I want to do a 64-byte read as a single burst.

I referred Intel documentation from this link: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/pcie-burst-transfer-paper.pdf

1
As the name WC implies, this feature is about write combining memory. You can find some information about how this works here. Effectively, the processor has a couple of 64-byte registers that it can buffer writes (non-temporal or to wc/uc memory) in, so multiple separate writes (ideally) combine into a single bus transaction. The buffers don't do loads, and you don't want to load from wc memory if at all avoidable. Maybe AVX512 enables a single 64-byte load to cause a single bus transaction, but I'm not certain about that.EOF
AFAICT, you should be able to replace _mm256_loadu_si256() with _mm_stream_load_si128() while keeping the memory wc. This should fetch a 64-byte cacheline in a single transaction into a fill buffer. A second aligned 32-byte load from the same cacheline should not cause a second bus transaction if the fill buffer was not evicted in-between (but you might not always be able to prevent this, depending on things like read-for-ownership of unrelated cachelines by other processors).EOF

1 Answers

0
votes

As you guys said i have used the _mm256_stream_load_si256() with wc memory, i am now able to do 64 byte read also. This is how i used it

__m256i a =   _mm256_stream_load_si256 ((__m256i*)mem_base + 0);
__m256i b =   _mm256_stream_load_si256 ((__m256i*)mem_base + 1);
_mm_mfence();

Thank you guys for your help