0
votes

I am trying to debug a somewhat strange problem in the device driver for a PCIe FPGA device. Both the device driver and the FPGA image are developed in-house.

The target system is x86, and the OS is Fedora 9. It has a PCIe card with the FPGA plugged into its only PCIe slot. The FPGA image is loaded from the EEPROM after boot.

The driver is written in such a way that it uses the /sys/bus/pci/devices/0000:02:00.0/ resource files (where 0000:02:00.0 is the PCI slot of the card containing the FPGA) to configure the FPGA.
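To make the "resource files are missing" symptom easy to check, here is a minimal sketch that lists the per-BAR resource files in that sysfs directory. The helper name `list_resource_files` is hypothetical (it is not part of the driver), and the `resourceN` / `resourceN_wc` naming is the usual Linux sysfs convention, assumed here rather than taken from the driver source.

```python
import os
import re

# Path taken from the question; adjust the slot address for your system.
DEVICE_DIR = "/sys/bus/pci/devices/0000:02:00.0"

# Matches the per-BAR files (resource0, resource1, ..., optionally with _wc),
# but not the plain-text "resource" summary file.
_RESOURCE_RE = re.compile(r"^resource\d+(_wc)?$")


def list_resource_files(device_dir=DEVICE_DIR):
    """Return the sorted per-BAR resource file names, or [] if the
    device directory cannot be read (e.g. device not enumerated)."""
    try:
        entries = os.listdir(device_dir)
    except OSError:
        return []
    return sorted(name for name in entries if _RESOURCE_RE.match(name))


if __name__ == "__main__":
    found = list_resource_files()
    if found:
        print("resource files:", ", ".join(found))
    else:
        print("no resource files found (device not enumerated yet?)")
```

Running this right after boot or resume, and again a few seconds later, would show whether the files are permanently missing or just late.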

When the system boots (or when it returns from hibernation), the FPGA link seems to be lost, and the resource files are missing. When the FPGA boots properly, everything works fine (the resource files are there). When the system enters hibernation, the FPGA is powered off. When it returns from hibernation, the FPGA is powered on before the driver initialization starts.

I suspect one of the following:

  • a bug in firmware - something related to PCI plug in?
  • a bug in the kernel - least likely, because other PCI cards are recognized fine; only
    this PCI card causes problems

My questions are:

  • Has anyone had similar problems?
  • What else could be wrong?
  • Any suggestions on how to debug this issue?

EDIT

I just found this bug, which is very similar to the problem I am seeing.

2
Losing is spelled L-O-S-I-N-G. – Rob
@Paul My target system is x86, and I am not sure if it is relevant. Do you think it is not? – BЈовић
Maybe. It's not really clear from your question what sort of answer you're looking for; it seems more of a "cry for help" than a specific question. You might want to mention details of the target system in your question though, e.g. Linux, x86, 32-bit PCI, etc. – Paul R
More information is needed. For example, how do you program the FPGA? Is it loaded from a configuration memory on the PCI card itself? – Dr. Watson

2 Answers

0
votes

A PCIe card has to reply to an "Is anybody there?" message within a certain time. Is it possible that your card is not responding quickly enough after hibernation / reset?

Without more details of your design, it is hard to do anything but guess.

Can you list the differences between the system working and not working, i.e. what do you do differently to get the card to work?

1
votes

I finally managed to debug my problem. Just before the system enters hibernation, all processes that are still using the resource files are killed. For some unknown reason, one process didn't release its resources and was killed. We have a watchdog, which respawns any process that is not running.

When the system came back from hibernation, this process was respawned, and since it couldn't open the resource files, it died again, and a critical error was declared. A short time later, the OS added the resource files, and the process could continue normally.
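One possible way to avoid the spurious critical error (an assumption for illustration, not necessarily how it was fixed here) is to have the respawned process wait a bounded time for the resource file to appear before giving up. The helper name `wait_for_path` and the timeout values are hypothetical.

```python
import os
import time


def wait_for_path(path, timeout_s=5.0, poll_interval_s=0.1):
    """Poll until `path` exists or `timeout_s` seconds have elapsed.

    Returns True if the path appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    # Path from the question; "resource0" assumes the FPGA is mapped via BAR 0.
    resource = "/sys/bus/pci/devices/0000:02:00.0/resource0"
    if wait_for_path(resource, timeout_s=10.0):
        print("resource file present, continuing start-up")
    else:
        print("resource file still missing after timeout")
```

With something like this, the process would only report a failure if the file is still missing after the timeout, instead of dying on the first failed open.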