Pretty sure I already know the answer to this question since there are related questions on SO already (here, here, and here, and this was useful), but I wanted to be absolutely sure before I dive into kernel-space driver land (never been there before).

I have a PCIe device that I need to communicate with (and vice versa) from an app in Linux user space. By opening /dev/mem and then mmap'ing, I have been able to write a user-space driver built on top of pciutils that has allowed me to mmap the BARs and successfully write data to the device. Now we need communication to go the other direction, from the PCIe device to the Linux user app. For this to work, we believe we are going to need a large chunk (~100MB) of physically contiguous memory that never gets paged/swapped. Once allocated, that address will need to be passed to the PCIe device so it knows where to write its data (thus I don't see how this could be virtual, swappable memory). Is there any way to do this without a kernel-space driver? One idea floated here was that perhaps we can open /dev/mem and then feed it an ioctl command to allocate what we need. If this is possible, I haven't been able to find any examples online yet and will need to research it more heavily.

Assuming we need a kernel-space driver, it will be best to allocate our large chunk during bootup, then use ioremap to get a kernel virtual address, then mmap from there to user space, correct? From what I've read on kmalloc, we won't get anywhere close to 100MB using that call, and vmalloc is no good since that's virtual memory. In order to allocate at bootup, the driver should be statically linked into the kernel, correct? This is basically an embedded application, so portability is not a huge concern to me. A module rather than a statically linked driver could probably work, but my worry there is that memory fragmentation could prevent a physically contiguous region from being found, so I'd like to allocate it as soon as possible after power-on. Any feedback?

EDIT1: My CPU is an ARM7 architecture.

2 Answers

2
votes

Hugepages-1G

Current x86_64-processors not only support 4k and 2M, but also 1G-pages (flag pdpe1gb in /proc/cpuinfo indicates support).

These 1G-pages must already be reserved at kernel boot, so the boot-parameters hugepagesz=1GB hugepages=1 must be specified.

Then, the hugetlbfs must be mounted:

mkdir /hugetlb-1G
mount -t hugetlbfs -o pagesize=1G none /hugetlb-1G

Then open some file and mmap it:

fd = open("/hugetlb-1G/page-1", O_CREAT | O_RDWR, 0755);
addr = mmap(NULL, SIZE_1G, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

You can now access 1G of physically contiguous memory at addr. To be sure it doesn't get swapped out you can use mlock (but this is probably not even necessary at all for hugepages).
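
For reference, the open/mmap calls above could be wrapped with error handling roughly as follows. This is only a sketch: the function name map_hugepage_file is made up for illustration, and it assumes hugetlbfs is already mounted at the directory containing path and that len is a multiple of the huge page size. A failing mlock is treated as a warning only, since pages backed by hugetlbfs are not swapped out anyway.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of a file on hugetlbfs.
 * Returns the mapping, or NULL if open() or mmap() fails. */
void *map_hugepage_file(const char *path, size_t len)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;

    /* Optional pinning; may fail if RLIMIT_MEMLOCK is too low. */
    if (mlock(addr, len) != 0)
        perror("mlock (continuing without it)");

    return addr;
}
```

With the mount point from above, a call would look like map_hugepage_file("/hugetlb-1G/page-1", 1UL << 30).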

Even if your process crashes, the huge page stays reserved and can be mapped again as shown above, so the PCIe device will not end up writing into memory that has been handed back to the system or to another process.

You can find out the physical address by reading /proc/pid/pagemap (where pid is the ID of your process, or use /proc/self/pagemap from within the process).
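
A rough sketch of that lookup, assuming the documented pagemap layout (one 64-bit entry per base page, PFN in bits 0-54, "present" in bit 63). The names pagemap_entry_to_phys and virt_to_phys are mine, not part of any API. Note that on newer kernels the PFN field reads as zero unless the process has CAP_SYS_ADMIN, and the page must have been touched at least once so that it is actually present.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)  /* bits 0-54: page frame number */
#define PM_PRESENT  (UINT64_C(1) << 63)        /* bit 63: page is present in RAM */

/* Hypothetical helper: turn one pagemap entry into a physical address.
 * Returns 0 if the page is not present or the PFN is hidden (reads as 0). */
uint64_t pagemap_entry_to_phys(uint64_t entry, uintptr_t vaddr, uint64_t page_size)
{
    uint64_t pfn = entry & PM_PFN_MASK;
    if (!(entry & PM_PRESENT) || pfn == 0)
        return 0;
    return pfn * page_size + ((uint64_t)vaddr % page_size);
}

/* Hypothetical helper: physical address of `vaddr` in the calling process,
 * or 0 on any failure. */
uint64_t virt_to_phys(const void *vaddr)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t entry = 0;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;

    /* The file is indexed by virtual page number, 8 bytes per entry. */
    off_t off = (off_t)((uintptr_t)vaddr / page_size * sizeof(entry));
    ssize_t n = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return 0;

    return pagemap_entry_to_phys(entry, (uintptr_t)vaddr, page_size);
}
```

The value returned for the start of the mapping is what would be handed to the device as its write target.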

0
votes

Actually Ctx's comment about memmap is what got me down the right path. To reserve memory, I gave a bootloader argument as memmap=[size]$[location] which I found here. Different symbols mean different things, and they aren't exactly intuitive. Just another slight correction, the flag is CONFIG_STRICT_DEVMEM, which my kernel was not compiled with.

There are still some mysteries. For instance, the [location] in the memmap argument seemed to be meaningless. No matter what I set for the location, Linux took everything that was not reserved with [size] as one contiguous chunk, and the space that I reserved ended up at the end. The only indication of this was in /proc/iomem: the amount of space I reserved matched the gap between the end of the Linux memory space and the end of the system memory space. I could find no indication anywhere that Linux said "I see your reserved chunk and I won't touch it", other than that the region wasn't claimed by Linux in /proc/iomem. But the FPGA has been writing to this space for days now with no visible ill effects for Linux, so I guess we're all good! I can just mmap that location and read the data (surprised this works since Linux doesn't indicate the region exists, but glad it does). Thanks for the help! Ian, I'll come back to your comment if I go to kernel driver space.
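
In case it helps someone else, the read side looks roughly like the sketch below. The function name map_reserved_region is just for illustration, and the physical base and length have to be whatever gap /proc/iomem shows on your own board; nothing here is specific to my hardware. It assumes a kernel built without CONFIG_STRICT_DEVMEM and a page-aligned physical address. O_SYNC is commonly used to ask for an uncached mapping through /dev/mem, but I have not verified the caching behaviour on every architecture.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of physical memory starting at
 * `phys` (page-aligned) read-only through `dev_path`, normally "/dev/mem".
 * Returns NULL on failure. */
volatile void *map_reserved_region(const char *dev_path, off_t phys, size_t len)
{
    int fd = open(dev_path, O_RDONLY | O_SYNC);
    if (fd < 0)
        return NULL;

    /* For /dev/mem the file offset is the physical address. */
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, phys);
    close(fd);

    return p == MAP_FAILED ? NULL : p;
}
```

The pointer is volatile because the device changes the contents behind the compiler's back.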