I am writing a Linux device driver which supports multiple devices. I have a x8 PCIe card with 4 of these devices on it. Each runs through a PCIe switch and gets 2 PCIe lanes. Is there a way to have the driver write to multiple lanes at the same time? If so, how would I do this? I would think it should be possible since it is all on one PCIe slot, but I have no idea how this would be done from the driver.
2 Answers
PCIe doesn't work quite the way you think it does. The switch is not partitioning up the upstream x8 link into multiple x2 links -- it simply forwards traffic from one link to another. So what you will see is a x8 link to the switch, and then 4 x2 links from the switch to the downstream devices. However with a different switch and different downstream devices, it would be equally possible to (for example) have x8 links everywhere, ie a x8 link from the root port to the switch and x8 links from the switch to the downstream devices.
However, in your case you have a matched amount of bandwidth on both sides of the switch, so there should be no issue with devices competing for a limited amount of bandwidth. Your driver can talk to all the devices simultaneously as efficiently as if there were independent links.
It sounds like you're looking for PCIe multicast. This has no connection to the number of lanes, but would simply be a function of delivering a single write to multiple destinations as efficiently as possible. There is a standard for this, mostly intended for backplane uses, see: http://www.pcisig.com/developers/main/training_materials/get_document?doc_id=12f5c260ccf5e054366d4c96ee655fa6827db5b3
It looks like this is supported with a new PCI BAR type, where multiple devices would have the same mapped physical address range, and the switch would also be configured to know about this multicast range. But this all needs OS support, and I haven't found anything on the web to suggest that Linux has the pieces necessary to configure the devices to do all this.
Since your parent link has enough bandwidth to saturate all four child links, you don't have a throughput problem. The only thing you'd save with multicast is bandwidth from the memory subsystem. If you have a modern architecture, the amount you'd save would be in the noise.
In other words, don't worry about it. Treat your devices as independent (this will make for a cleaner driver, anyway) and get on with your project.