6
votes

I'm trying to get DMA transfer working between an FPGA and an x86_64 Linux machine.

On the PC side I'm doing this initialization:

//driver probe
... 
pci_set_master(dev); //set endpoint as master
result = pci_set_dma_mask(dev, 0xffffffffffffffff); //set as 64bit capable
...

//read
pagePointer = __get_free_page(__GFP_HIGHMEM); //get 1 page
temp_addr = dma_map_page(&myPCIDev->dev,pagePointer,0,PAGE_SIZE,DMA_TO_DEVICE);
printk(KERN_WARNING "[%s]Page address: 0x%lx Bus address: 0x%lx\n",DEVICE_NAME,pagePointer,temp_addr);
writeq(cpu_to_be64(temp_addr),bar0Addr); //send address to FPGA
wmb();
writeq(cpu_to_be64(1),bar1Addr); //start transfer
wmb();

The bus address is a 64-bit address. On the FPGA side, this is the TLP I'm sending out to read 1 DW:

Fmt: "001"
Type: "00000"
R|TC|R|Attr|R|TH : "00000000"
TD|EP|Attr|AT : "000000"
Length : "0000000001"
Requester ID
Tag : "00000000"
Byte Enable : "00001111";
Address : (address from dma map page)

The completion that I get back from the PC is :

Fmt: "000"
Type: "01010"
R|TC|R|Attr|R|TH : "00000000"
TD|EP|Attr|AT : "000000"
Length : "0000000000"
Completer ID
Compl Status|BCM : "0010"
Length : "0000000000";
Requester ID
Tag : "00000000"
R|Lower address : "00000000"

So basically I get a completion without data and with the status Unsupported Request. I don't think there is anything wrong with the construction of the TLP, but I cannot see any problem on the driver side either. The kernel I'm using has PCIe error reporting enabled, but I see nothing in the dmesg output. What's wrong? Or is there a way to find out why I get that Unsupported Request completion?

Marco

1
You could compare your code to other open PCIe drivers such as Riffa 2.x or XilliBus to see how they use the kernel functions for DMA. - Paebbels

1 Answer

2
votes

This is an extract from one of my designs (that works!). It's VHDL and slightly different but hopefully it will help you:

-- First dword of TLP Header
tlp_header_0(31 downto 30)  <= "01";            -- Format = MemWr
tlp_header_0(29)                        <= '0' when pcie_addr(63 downto 32) = 0 else '1'; -- 3DW header or 4DW header
tlp_header_0(28 downto 24)  <= "00000";         -- Type
tlp_header_0(23)                        <= '0'; -- Reserved
tlp_header_0(22 downto 20)  <= "000";           -- Default traffic class
tlp_header_0(19)                        <= '0'; -- Reserved
tlp_header_0(18)                        <= '0'; -- No ID-based ordering
tlp_header_0(17)                        <= '0'; -- Reserved
tlp_header_0(16)                        <= '0'; -- No TLP processing hint
tlp_header_0(15)                        <= '0'; -- No TLP Digest
tlp_header_0(14)                        <= '0'; -- Not poisoned
tlp_header_0(13 downto 12)  <= "00";            -- No relaxed ordering, No Snoop attribute not set
tlp_header_0(11 downto 10)  <= "00";            -- No address translation
tlp_header_0( 9 downto  0)  <= "00" & X"20";    -- Length = 32 dwords

-- Second dword of TLP Header
-- Bits 31 downto 16 are Requester ID, set by hardware PCIe core
tlp_header_1(15 downto 8)       <= X"00";   -- Tag, it may have to increment
tlp_header_1( 7 downto 4)       <= "1111";  -- Last dword byte enable
tlp_header_1( 3 downto 0)       <= "1111";  -- First dword byte enable

-- Third and fourth dwords of TLP Header, fourth is *not* sent when pcie_addr is 32 bits
tlp_header_2    <= std_logic_vector(pcie_addr(31 downto  0)) when pcie_addr(63 downto 32) = 0 else std_logic_vector(pcie_addr(63 downto 32));
tlp_header_3    <= std_logic_vector(pcie_addr(31 downto  0));

Let's ignore the obvious difference that I was performing a MemWr of 32 dwords instead of reading a single dword. The other difference, which caused me trouble the first time I did this, is that you have to use the 3DW header if the address is below 4 GB. Your request uses Fmt "001", which is the 4DW header without data, regardless of the address.

That means you have to check the address you get from the host and determine if you need to use the 3DW header (with only LSBs of address) or the full 4DW header mode.
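The check described above can be sketched in plain C as a model of what the FPGA logic has to do for a Memory Read request. This is only an illustration under assumptions: `build_mrd_header` and its parameters are hypothetical names, not part of the design above, and the traffic class and attribute bits are simply left at zero as in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original design): fills hdr[] with a
 * Memory Read request header and returns the number of header dwords. */
size_t build_mrd_header(uint64_t addr, uint16_t requester_id, uint8_t tag,
                        uint16_t length_dw, uint32_t hdr[4])
{
    /* Any bit set above bit 31 means the 4DW (64-bit address) format. */
    int use_4dw = (addr >> 32) != 0;

    /* Fmt "000" = 3DW header, no data; Fmt "001" = 4DW header, no data. */
    uint32_t fmt = use_4dw ? 0x1u : 0x0u;

    /* Type "00000", TC 0, no attributes; Length is a 10-bit field. */
    hdr[0] = (fmt << 29) | (length_dw & 0x3FFu);

    /* A single-dword request must have Last DW BE = 0000. */
    uint32_t last_be = (length_dw == 1) ? 0x0u : 0xFu;
    hdr[1] = ((uint32_t)requester_id << 16) | ((uint32_t)tag << 8) |
             (last_be << 4) | 0xFu;

    if (use_4dw) {
        hdr[2] = (uint32_t)(addr >> 32);   /* upper 32 address bits first */
        hdr[3] = (uint32_t)addr & ~0x3u;   /* lower bits, dword aligned */
        return 4;
    }

    hdr[2] = (uint32_t)addr & ~0x3u;       /* only the lower 32 bits are sent */
    return 3;
}
```

For a bus address such as 0x12345678 this produces a three-dword header, while an address like 0x212345678 produces four dwords with the upper half of the address in the third dword.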

Unless you need to transfer a very large amount of data, you can set the DMA address mask to 32 bits so that you are always in the 3DW case. Linux should reserve plenty of memory below 4 GB by default.
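To see why a 32-bit mask keeps you in the 3DW case, here is a small user-space sketch. The `DMA_BIT_MASK` macro is copied from the kernel's definition so the example builds outside a driver; `addr_fits_mask` and `header_dwords` are hypothetical helpers for illustration only. In the driver itself this would correspond to passing `DMA_BIT_MASK(32)` instead of the all-ones value to the mask call in the probe function.

```c
#include <stdint.h>

/* Same definition as the Linux kernel's DMA_BIT_MASK macro, repeated here
 * so the sketch compiles in user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: nonzero if addr is reachable under the given mask. */
int addr_fits_mask(uint64_t addr, uint64_t mask)
{
    return (addr & ~mask) == 0;
}

/* Hypothetical helper: header size in dwords required for addr. */
int header_dwords(uint64_t addr)
{
    return (addr >> 32) ? 4 : 3;
}
```

Every address that fits a 32-bit mask has its upper 32 bits clear, so `header_dwords` returns 3 for all of them and the FPGA never needs the 4DW format.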