I am working with a high-speed serial card for high-rate data transfers from an external source to a Linux box with a PCIe card. The PCIe card came with some third-party drivers that use dma_alloc_coherent to allocate the DMA buffers that receive the data. Because of Linux limitations, however, this approach limits data transfers to 4MB. I have read about and tried multiple methods for allocating a large DMA buffer and haven't been able to get any of them to work.

This system has 32GB of memory and is running Red Hat with kernel version 3.10, and I would like to make 4GB of that available for contiguous DMA. I know the preferred method is scatter/gather, but that is not possible in my situation: a hardware chip beyond my control translates the serial protocol into DMA, and the only thing I can control is an offset added to the incoming addresses (i.e., address zero as seen from the external system can be mapped to address 0x700000000 on the local bus).
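To make the offset scheme concrete, here is a minimal sketch of the translation described above, written as plain C that can be checked in user space. The function name device_to_bus and the bounds check are illustrative assumptions, not part of the vendor driver; the only values taken from the question are the 0x700000000 base and the 4GB window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate an address as seen by the external
 * system into a local bus address by adding a fixed offset.
 * Returns 0 on success, -1 if the address falls outside the window. */
int device_to_bus(uint64_t dev_addr, uint64_t window_base,
                  uint64_t window_len, uint64_t *bus_out)
{
    assert(bus_out != NULL);
    if (dev_addr >= window_len)
        return -1;                      /* outside the reserved region */
    *bus_out = window_base + dev_addr;  /* the offset the bridge chip applies */
    return 0;
}
```

With a base of 0x700000000 and a length of 0x100000000, external address zero lands on 0x700000000 and the last valid byte on 0x7FFFFFFFF.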

Since this is a one-off lab machine, I think the fastest/easiest approach would be to use the mem=28GB boot parameter. I have this working fine, but the next step, accessing that memory from virtual address space, is where I am having problems. Here is my code condensed to the relevant components:

In the kernel module:

size_t len = 0x100000000ULL; // 4GB
size_t phys = 0x700000000ULL; // 28GB
size_t virt = (size_t)ioremap_nocache( phys, len ); // address not usable via direct reference
size_t bus = (size_t)virt_to_bus( (void*)virt ); // this should be the same as phys for x86-64, shouldn't it?

// OLD WAY
/*size_t len = 0x400000; // 4MB
size_t bus;
size_t virt = dma_alloc_coherent( devHandle, len, &bus, GFP_ATOMIC );
size_t phys = (size_t)virt_to_phys( (void*)virt );*/

In the application:

// Attempt to make a usable virtual pointer
u32 pSize = sysconf(_SC_PAGESIZE);
void* mapAddr = mmap(0, len+(phys%pSize), PROT_READ|PROT_WRITE, MAP_SHARED, devHandle, phys-(phys%pSize));
virt = (size_t)mapAddr + (phys%pSize);

// do DMA to 0x700000000 bus address

printf("Value %x\n", *((u32*)virt)); // this is returning zero
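The page-alignment arithmetic in the mmap call above can be checked on its own. The sketch below reproduces the same three expressions (aligned offset, pad, padded length) using 64-bit types; the struct and function names are hypothetical. It assumes the driver's mmap handler interprets the file offset as a physical address, which the question does not confirm, and that off_t is 64 bits wide so an offset of 0x700000000 is not truncated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the values passed to mmap(). */
struct map_request {
    uint64_t offset;  /* page-aligned offset to pass to mmap() */
    uint64_t pad;     /* bytes between offset and the requested address */
    uint64_t length;  /* mapping length, including the pad */
};

/* Split a physical address into a page-aligned offset plus a pad,
 * mirroring phys-(phys%pSize) and len+(phys%pSize) in the snippet. */
struct map_request align_for_mmap(uint64_t phys, uint64_t len,
                                  uint64_t page_size)
{
    struct map_request r;

    /* Page sizes are powers of two, so a mask works as a modulo. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    r.pad = phys & (page_size - 1);
    r.offset = phys - r.pad;
    r.length = len + r.pad;
    return r;
}
```

For phys = 0x700000000 and a 4096-byte page the pad is zero, so the usable pointer is the address mmap returns. Note that a 32-bit variable cannot hold a 4GB length, so any length or offset derived from it should be kept in a 64-bit type.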

Another interesting thing is that, before I made any of these changes, the physical address returned from dma_alloc_coherent was greater than the amount of RAM on the system (0x83d000000). I thought that on x86 RAM always occupies the lowest addresses, so I would expect an address below 32GB.

Any help would be appreciated.

Err... 0x770000000ULL is 29.75 GB, not 28... Try 0x700000000 instead. - Iwillnotexist Idonotexist
D'oh, stupid math error. It still shouldn't matter, as that area should still be valid RAM. I hadn't gone up to a 4GB test case yet and was still only using 4MB. Will update the question. - LINEMAN78
I have a 32GB memory system handy. Could you post an absolute-barebones but complete kernel module source file, as well as an absolute-minimum usermode program to test? Also, why did you tag with [c++] when the Linux kernel is exclusively [c], and the usermode snippet you show uses exclusively C APIs? - Iwillnotexist Idonotexist
The application is compiled with C++ as it is part of a C++ application, however the code associated with this problem is exclusively C. I'm not sure I can post the kernel module as it is proprietary software of the hardware manufacturer, but I can try to strip out the relevant parts. - LINEMAN78
An explanation for the large physical address from dma_alloc_coherent: hardware that needs memory-mapped I/O (like your video card) uses the 3GB-4GB physical range, so the largest physical RAM address is pushed 1GB past the size of the memory (to 0x8 4000 0000 in this case). The 0x8 3D00 0000 address you got is therefore near the top of your physical memory but not past it. - 1201ProgramAlarm
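The address arithmetic in these comments is easy to verify. The following is a small sketch, assuming the simplified model from the last comment (a single 1GB memory-mapped I/O hole below 4GB that pushes the same amount of RAM above the installed size). The real layout depends on the firmware memory map, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Simplified model: RAM displaced by the MMIO hole below 4GB is
 * remapped above the installed size, so the highest RAM address
 * grows by the size of the hole. */
uint64_t top_of_ram(uint64_t installed_bytes, uint64_t mmio_hole_bytes)
{
    assert(mmio_hole_bytes <= 4 * GIB);  /* the hole sits below 4GB */
    return installed_bytes + mmio_hole_bytes;
}

/* Convert a byte address to GiB for a quick sanity check. */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (double)GIB;
}
```

Under that model, top_of_ram(32 * GIB, 1 * GIB) is 0x840000000, which is above the 0x83d000000 address that dma_alloc_coherent returned. Likewise, bytes_to_gib(0x770000000) gives 29.75 and bytes_to_gib(0x700000000) gives 28.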

1 Answer

Instead of limiting the amount of system memory via mem, try using CMA: https://lwn.net/Articles/486301/

The cma= kernel command-line argument reserves a fixed amount of memory for DMA allocations that is guaranteed to be physically contiguous. The kernel still lets non-DMA users allocate from that region, but as soon as a DMA allocation needs the memory, those pages are moved out of the way. So I would advise leaving your mem parameter alone and adding cma=4G to your kernel command line instead. dma_alloc_coherent should then pull from that reserved area automatically, and you can enable CMA debugging in your kernel config to confirm that it does.
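One quick way to check from user space whether the reservation took effect is to look for the CmaTotal and CmaFree lines in /proc/meminfo. This is a sketch under an assumption: those fields are present on newer kernels built with CMA support, and may well be missing on the 3.10 kernel in the question, in which case the lookup below simply reports "not found". The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find a "Key:   12345 kB" line in /proc/meminfo-style text.
 * Returns 0 and stores the value in *kb_out, or -1 if the key is absent. */
int meminfo_lookup_kb(const char *text, const char *key,
                      unsigned long long *kb_out)
{
    size_t key_len;
    const char *line = text;

    assert(text != NULL && key != NULL && kb_out != NULL);
    key_len = strlen(key);

    while (line && *line) {
        /* Match the key only at the start of a line, followed by ':'. */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            if (sscanf(line + key_len + 1, "%llu", kb_out) == 1)
                return 0;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;  /* step past the newline to the next line */
    }
    return -1;
}

/* Read /proc/meminfo and report CmaTotal in kB, or -1 if unavailable. */
int cma_total_kb(unsigned long long *kb_out)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_lookup_kb(buf, "CmaTotal", kb_out);
}
```

With cma=4G in effect, a kernel that reports these fields would be expected to show a CmaTotal of about 4194304 kB. If the field is absent, the boot log is the other place to look for a CMA reservation message.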