26
votes

I have an FPGA (like most people asking this question) that gets configured after my Linux kernel does the initial PCIe bus scan and enumeration. As you can guess, the FPGA implements a PCIe endpoint.

I would like to have the PCIe core re-enumerate the ENTIRE PCIe bus so that my FPGA then shows up and I can load my driver module. I would also like the ability to SWAP the FPGA load out for a different configuration. By this I mean I would like to be able to:

  1. Boot Linux
  2. Configure FPGA
  3. Enumerate PCIe endpoint and load module
  4. Remove PCIe endpoint
  5. Re-configure FPGA
  6. Re-enumerate PCIe endpoint

All without rebooting Linux.

Here are solutions that have been proposed elsewhere but do not solve the problem.

echo 1 > /sys/bus/pci/rescan: this works only some of the time, and it does not work if I want to hot-swap the FPGA load after it was first enumerated.

Can the hotplug/power management facilities of PCIe be used to make this work? If so, are there any good resources on how to use the hotplug system with PCIe? (LDD does not cover it thoroughly enough.)

3
Look at the PCIe hotplug mechanism. It's supported in newer kernels. How do you think Thunderbolt works? It's the same here. - 0andriy
Are you executing the rescan on the host machine or inside a Xen VM? Xen had problems rescanning the PCIe tree and crashed in the past. I don't know if that has been solved. - Paebbels
I'm wondering what base hardware you are using. In my experience with commercial-grade motherboards, the rescan method rarely worked. I went the partial reconfiguration route to solve the problem (by not re-enumerating). @Paebbels @whh4000 can you share your setup? - Claudio
To my knowledge, it's independent of the hardware. Enumeration is already done by the BIOS/UEFI and a second time by the kernel. Whether a system supports rescanning is a matter of kernel software and support for the particular platform (root complex driver, ...). It's also a question of whether the kernel and the drivers support disassembling the PCI tree for a short time before it's assembled again. The main copyright of the source code is not mine, but I'll forward your request. - Paebbels
@Claudio The general setup is a 'cloud system' where users can allocate FPGA resources. There is a login node where you can allocate an FPGA, program the FPGA, enable ChipScope forwarding, register your PCI driver and free the FPGA. In a future version, it's planned to allocate a VM per user and to integrate GPUs, too. - Paebbels

3 Answers

17
votes

Re-enumerating the PCIe bus/tree via echo 1 > /sys/bus/pci/rescan is the correct solution. We are using it the same way as you described it.

We are using echo 1 > $pcidevice/remove to disconnect the driver from the device and to detach the device from the tree. The driver (xillybus) is not unloaded, just disconnected.

A better solution is to rescan only the node to which your FPGA is attached. This reduces the overall impact on the system.
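As a rough sketch of the remove-then-local-rescan idea (not the RC3E scripts themselves), the helpers below look up the bridge above an endpoint, remove the endpoint, and later rescan only the bus behind that bridge. The function names and the SYSFS_PCI override are made up for illustration; the remove and pci_bus/.../rescan attributes are the standard sysfs ones, and writing to them needs root.

```shell
#!/bin/sh
# Per-device PCI entries; overridable so the logic can be tried on a fake tree (assumption).
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# parent_bridge BDF: print the sysfs directory of the bridge directly above BDF.
# Call this while the endpoint is still present; its entry disappears on remove.
parent_bridge() {
    dev_path=$(readlink -f "$SYSFS_PCI/$1") || return 1
    dirname "$dev_path"
}

# remove_endpoint BDF: detach the endpoint from the tree (its driver stays loaded).
remove_endpoint() {
    echo 1 > "$SYSFS_PCI/$1/remove"
}

# rescan_below BRIDGE_DIR: rescan only the bus behind that bridge,
# instead of the global /sys/bus/pci/rescan.
rescan_below() {
    for f in "$1"/pci_bus/*/rescan; do
        echo 1 > "$f"
    done
}
```

With a hypothetical endpoint at 0000:01:00.0, the sequence would be: bridge=$(parent_bridge 0000:01:00.0), then remove_endpoint 0000:01:00.0, then reprogram the FPGA, then rescan_below "$bridge".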

This technique is used in the RC3E FPGA cloud system.

1
votes

This is really dependent on exactly what is changed on the FPGA. The problem is in how PCIe enumeration and address assignment is done, particularly how the PCIe switches are configured. The allocation MUST be done in one shot as a depth-first search. After this is complete, it is not possible to go insert additional bus numbers or address space without changing all of the subsequent allocations, which would require reloading all of the corresponding device drivers. Basically, once the bus is enumerated and addresses are assigned, you can't change the overall allocations without re-enumerating the entire bus, which requires a reboot. Preallocating resources on a specific PCIe port can alleviate this problem, and is required for PCIe hot plugging.
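For the preallocation mentioned above, one option on Linux is the pci= kernel command-line parameter, which can reserve extra memory window space and bus numbers below hotplug-capable bridges. The fragment below is only a sketch for a GRUB-based system: the sizes are placeholder assumptions, and the options take effect only on ports the kernel treats as hotplug bridges, which may not include the port your FPGA sits on.

```shell
# /etc/default/grub (fragment; values are illustrative assumptions)
# hpmemsize: memory window reserved below each hotplug bridge
# hpbussize: minimum number of extra bus numbers reserved below each hotplug bridge
GRUB_CMDLINE_LINUX="pci=hpmemsize=256M,hpbussize=8"
```

After editing the file, the GRUB configuration has to be regenerated and the machine rebooted once for the reservation to apply.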

If the PCIe BAR configuration has not changed, then usually doing a remove/hot reset/rescan is sufficient and no reboots are required.
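A minimal sketch of such a remove/hot reset/rescan cycle follows, assuming the hot reset is issued as a secondary bus reset through the upstream bridge with setpci (bit 6 of the Bridge Control register). The function names, the DRY_RUN switch, and the one-second delays are assumptions for illustration, not something this answer prescribes; run it as root and try DRY_RUN=1 first.

```shell
#!/bin/sh
# With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# write_attr VALUE FILE: write VALUE to a sysfs attribute.
write_attr() {
    if [ -n "${DRY_RUN:-}" ]; then echo "echo $1 > $2"; else echo "$1" > "$2"; fi
}

# hot_reset_cycle BRIDGE_BDF ENDPOINT_BDF
hot_reset_cycle() {
    bridge=$1
    endpoint=$2
    sys=/sys/bus/pci/devices
    write_attr 1 "$sys/$endpoint/remove"          # detach the endpoint from the tree
    # (reprogram the FPGA here if the bitstream is being swapped)
    run setpci -s "$bridge" BRIDGE_CONTROL=40:40  # assert Secondary Bus Reset (hex value:mask)
    run sleep 1
    run setpci -s "$bridge" BRIDGE_CONTROL=00:40  # release the reset
    run sleep 1                                   # conservative settle time (assumption)
    write_attr 1 "$sys/$bridge/rescan"            # rescan the bridge's bus and its children
}
```

For example, DRY_RUN=1 followed by hot_reset_cycle 0000:00:01.0 0000:01:00.0 prints the six steps for a hypothetical bridge/endpoint pair without touching the hardware.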

If the BAR configuration has changed, then it's a different story. If the new BARs are smaller, then there should be no problem. But if the new BARs are larger or there are more BARs, and there isn't enough address space allocated to the switch port that the device is attached to, then those BARs cannot be allocated address space and the device will fail to enumerate. In this case, a reboot is required so that resources can be reassigned. Don't forget that there are also 32-bit BARs and 64-bit BARs, and these BARs are assigned from two different pools of address space, so changing BAR types can also require a reboot to re-enumerate.
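To see whether a new FPGA load changes the BAR layout, one can compare the endpoint's BAR sizes and widths before and after reconfiguration. The sketch below parses the sysfs resource file of a device, assuming its usual format of one "start end flags" triple in hex per line with the first six lines being BAR0 to BAR5, and assuming the kernel's IORESOURCE_MEM_64 flag value of 0x00100000. The function name bar_sizes is made up for illustration.

```shell
#!/bin/sh
# bar_sizes RESOURCE_FILE: print "BARn <size in bytes> <32bit|64bit>" for each populated BAR.
bar_sizes() {
    n=0
    while [ "$n" -lt 6 ] && read -r start end flags; do
        if [ $(($end)) -ne 0 ]; then
            width=32bit
            # IORESOURCE_MEM_64 marks a BAR taken from the 64-bit address pool
            if [ $(($flags & 0x100000)) -ne 0 ]; then
                width=64bit
            fi
            echo "BAR$n $(($end - $start + 1)) $width"
        fi
        n=$((n + 1))
    done < "$1"
}
```

Running bar_sizes /sys/bus/pci/devices/0000:01:00.0/resource (hypothetical address) before removing the device, and again after the rescan, shows whether any BAR grew or switched between the 32-bit and 64-bit pools.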

If you're going from no device to a device (i.e. blank FPGA to configured FPGA), then bus numbers may need to be reassigned, which requires a reboot.

0
votes

From The Doctor

Here is how to reset the Vegas, the same as a device reset in Windows. The devices are selected by vendor ID (1002).

lspci -n | grep ' 1002:' | grep -v '\.1 ' | awk '{print $1}' | while read -r addr; do find /sys/devices -path "*$addr/rescan"; done | sed 's/.*/echo 1 > "&"/'

Put the output of that command in your /etc/rc.local to reset your Vegas after boot, similar to the devcon restart script.

echo 1 > "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rescan"
echo 1 > "/sys/devices/pci0000:00/0000:00:1c.5/0000:03:00.0/rescan"
echo 1 > "/sys/devices/pci0000:00/0000:00:1d.0/0000:06:00.0/rescan"
echo 1 > "/sys/devices/pci0000:00/0000:00:1d.1/0000:07:00.0/rescan"