Are writes on the PCIe bus atomic?

Question

I am a newbie to PCIe, so this might be a dumb question. This seems like fairly basic information to ask about PCIe interfaces, but I am having trouble finding the answer so I am guessing that I am missing some information which makes the answer obvious.

I have a system in which I have an ARM processor (host) communicating to a Xilinx SoC via PCIe (device). The endpoint within the SoC is an ARM processor as well.

The external ARM processor (host) is going to be writing to the register space of the SoC's ARM processor (device) via PCIe. This will command the SoC to do various things. That register space will be read-only with respect to the SoC (device). The external ARM processor (host) will make a write to this register space, and then signal an interrupt to indicate to the SoC that new parameters have been written and it should process them.

My question is: are the writes made by the external ARM (host) guaranteed to be atomic with respect to the reads by the SoC (device)? In conventional shared memory situations, a write to a single byte is guaranteed to be an atomic operation (i.e. you can never be in a situation where the reader has read the first 2 bits of the byte, but before it reads the last 6 bits the writer replaces them with a new value, leading to garbage data). Is this the case in PCIe as well? And if so, what is the "unit" of atomicity? Are all bytes in a single transaction atomic with respect to the entire transaction, or is each byte atomic only in relation to itself?

Does this question make sense?

Basically I want to know to what extent memory protection is necessary in my situation. If at all possible, I would like to avoid locking memory regions as both processors are running RTOSes and avoiding memory locks would make design simpler.

It depends in part on how the write and the interrupt are implemented; it may be possible for the interrupt to pass the write and get there first. But that wouldn't be a PCIe thing, it would be an SoC thing: the IP used by the chip vendor, bus implementation, address decoding, etc. — old_timer
You mention that you will be using an interrupt to signal when the operation is done. What kind? Will you be using legacy PCI interrupts, MSI, MSI-X, or some personal thing? I'm guessing the "host" is the RootPort but I want to confirm this is a direct connection or are you going across PCIe switches? — arduic
@arduic , I will be using MSI interrupts here. And yes, the host is the RootPort with no PCIe switches. Would the answer to this question change if PCIe switches were involved? — dykeag
@dykeag It should not; I just wanted a better understanding of the setup. To my knowledge, switches just use the BAR address specified to route packets to the correct device, and in this case they are always going to the correct device. I believe I have your answer; I'll be posting it below soon. — arduic

arduic · Accepted Answer · 2020-03-12T12:10:22

On the question of atomicity: it is mentioned in a few places in the PCIe 3.0 specification (the only one I have).

First there is Section 6.5, Locked Transactions. This is likely not what you need, but I want to document it anyway. Basically it covers the worst-case scenario of what you were describing earlier.

Locked Transaction support is required to prevent deadlock in systems that use legacy software which causes the accesses to I/O devices

Even when using locked transactions, you still need to check that the lock actually held, as the spec notes:

If any read associated with a locked sequence is completed unsuccessfully, the Requester must assume that the atomicity of the lock is no longer assured, and that the path between the Requester and Completer is no longer locked

With that said, Section 6.15, Atomic Operations (AtomicOps), is much closer to what you are interested in. There are three types of AtomicOp requests:

FetchAdd (Fetch and Add): Request contains a single operand, the “add” value

Swap (Unconditional Swap): Request contains a single operand, the “swap” value

CAS (Compare and Swap): Request contains two operands, a “compare” value and a “swap” value
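To make the three operations concrete, here is a minimal sketch that models their semantics on ordinary local memory using C11 `<stdatomic.h>`. It is only an analogy: it does not issue PCIe AtomicOp requests, and the function names (`atomicop_fetch_add`, `atomicop_swap`, `atomicop_cas`) are made up for illustration. Each function returns the value the target held before the operation, which is how an AtomicOp completion reports its result.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* FetchAdd: add `add` to the target, return the value it held before. */
uint32_t atomicop_fetch_add(_Atomic uint32_t *target, uint32_t add)
{
    return atomic_fetch_add(target, add);
}

/* Swap: unconditionally store `swap_value`, return the previous value. */
uint32_t atomicop_swap(_Atomic uint32_t *target, uint32_t swap_value)
{
    return atomic_exchange(target, swap_value);
}

/* CAS: store `swap_value` only if the target currently equals `compare`.
 * Either way, return the value that was in the target beforehand. */
uint32_t atomicop_cas(_Atomic uint32_t *target, uint32_t compare,
                      uint32_t swap_value)
{
    uint32_t expected = compare;
    /* On failure, `expected` is overwritten with the current value;
     * on success it still equals `compare`, which was the old value. */
    atomic_compare_exchange_strong(target, &expected, swap_value);
    return expected;
}

/* Small walk-through of the three operations on one location. */
void atomicop_demo(void)
{
    _Atomic uint32_t reg = 5;

    assert(atomicop_fetch_add(&reg, 3) == 5);  /* reg: 5 -> 8  */
    assert(atomicop_swap(&reg, 42) == 8);      /* reg: 8 -> 42 */
    assert(atomicop_cas(&reg, 7, 1) == 42);    /* mismatch, reg stays 42 */
    assert(atomicop_cas(&reg, 42, 1) == 42);   /* match, reg: 42 -> 1 */
    assert(atomic_load(&reg) == 1);
}
```

The point of the analogy is that the read, the modification and the write happen as one indivisible step at the target, so no other agent can observe or alter the location in between.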

Reading Section 6.15.1, we see that these operations are intended largely for cases where multiple producers/consumers share a single bus.

AtomicOps enable advanced synchronization mechanisms that are particularly useful when there are multiple producers and/or multiple consumers that need to be synchronized in a non-blocking fashion. For example, multiple producers can safely enqueue to a common queue without any explicit locking.
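The "enqueue without explicit locking" idea from that passage can be sketched as follows, again using C11 atomics on local memory as a stand-in for a FetchAdd AtomicOp. The `shared_queue` layout, the slot count and the `enqueue` function are assumptions made for illustration, not anything defined by the spec. Each producer reserves a slot index with a fetch-and-add, so two producers can never be handed the same slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u  /* hypothetical queue depth */

struct shared_queue {
    _Atomic uint32_t tail;        /* next free slot, bumped by fetch-add */
    uint32_t slots[QUEUE_SLOTS];  /* payload, one word per entry */
};

/* Reserve the next slot and store `value` in it.
 * Returns the slot index, or -1 if the queue is already full. */
int enqueue(struct shared_queue *q, uint32_t value)
{
    /* The fetch-add is the only synchronisation point: every caller
     * gets a distinct index, without taking a lock. */
    uint32_t idx = atomic_fetch_add(&q->tail, 1u);

    if (idx >= QUEUE_SLOTS)
        return -1;  /* sketch only: no wrap-around or dequeue handling */

    q->slots[idx] = value;
    return (int)idx;
}
```

This only shows the reservation step. A real queue would also need a way for the consumer to tell that a reserved slot has actually been filled, plus wrap-around handling, both of which are left out here.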

Searching the rest of the specification, I find little mention of atomicity outside the sections on AtomicOps. That implies to me that the spec only ensures such behavior when these operations are used. However, the rationale given for AtomicOps suggests the spec authors only expect such questions to arise in a multi-producer/consumer environment, which yours clearly is not.

The last place I would suggest looking is Section 2.4, Transaction Ordering. Note that I am fairly sure the idea of transactions "passing" one another only makes sense with switches in the middle, since the switches are what make such decisions; once you put bits on the bus in your case, there is no going back. So this likely only applies if you place a switch in the path.

Your concern is whether a write can bypass a read. The write is a posted request; the read is a non-posted request.

A3, A4 A Posted Request must be able to pass Non-Posted Requests to avoid deadlocks.

So in general the write is allowed to bypass the read to avoid deadlocks.
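The rule quoted above can be written down as a small lookup. This is a hedged sketch, not a full encoding of the ordering table: it covers only the case of a later posted request relative to an earlier one. The non-posted entry is the A3/A4 rule quoted above. The posted-versus-posted entry is not quoted in this answer; it reflects the same table's rule that a memory write or message with Relaxed Ordering clear must not pass an earlier posted request, which matters here because an MSI is itself a posted memory write. All type and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Kind of the request that was issued EARLIER on the link. */
enum earlier_kind { EARLIER_POSTED, EARLIER_NON_POSTED };

enum pass_rule {
    PASS_MUST_NOT,      /* later request must stay behind */
    PASS_PERMITTED,     /* later request may pass, not required to */
    PASS_MUST_BE_ABLE   /* later request must be able to pass */
};

/* May a later POSTED request (memory write, message, MSI) pass `earlier`?
 * `relaxed_ordering` is the Relaxed Ordering attribute of the later request. */
enum pass_rule posted_may_pass(enum earlier_kind earlier, bool relaxed_ordering)
{
    if (earlier == EARLIER_NON_POSTED)
        return PASS_MUST_BE_ABLE;   /* A3, A4: needed to avoid deadlock */

    /* Posted vs. posted: strict order unless Relaxed Ordering is set. */
    return relaxed_ordering ? PASS_PERMITTED : PASS_MUST_NOT;
}

/* Walk-through for the setup in the question. */
void ordering_demo(void)
{
    /* Parameter write followed by an MSI, both without Relaxed Ordering:
     * the MSI must not overtake the parameter write on the PCIe link. */
    assert(posted_may_pass(EARLIER_POSTED, false) == PASS_MUST_NOT);

    /* A write issued after a read request must be able to pass it. */
    assert(posted_may_pass(EARLIER_NON_POSTED, false) == PASS_MUST_BE_ABLE);
}
```

Note that this only describes ordering on the PCIe fabric itself. What happens between the endpoint and the SoC's internal memory and interrupt controller is up to the SoC, as the comments on the question point out.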

With that concern raised, I do not believe it is possible for the write to bypass the read on your system, since there is no device on the bus to do this transaction reordering. Since you have RTOSes, I highly doubt they are enqueuing the PCIe transactions and reordering them before sending, although I have not looked into that personally.
