1
votes

I'm building a data acquisition system based on the UltraScale+ FPGA equipped with arm64 CPU. The data are transmitted to RAM via DMA. The DMA buffers in the driver are reserved as below:

virt_buf[i] = dma_zalloc_coherent(&pdev->dev, BUF_SIZE, &phys_buf[i],GFP_KERNEL);

In the driver's mmap function, the mapping to the user space is done in the following way:

#ifdef ARCH_HAS_DMA_MMAP_COHERENT
   printk(KERN_INFO "Mapping with dma_map_coherent DMA buffer at phys: %p virt %p\n",phys_buf[off],virt_buf[off]);
   res = dma_mmap_coherent(&my_pdev->dev, vma, virt_buf[off], phys_buf[off],  vsize);
#else
   physical = phys_buf[off];
   res=remap_pfn_range(vma,vma->vm_start, physical >> PAGE_SHIFT , vsize, pgprot_noncached(vma->vm_page_prot));
   printk(KERN_INFO "Mapping with remap_pfn_range DMA buffer at phys: %p virt %p\n",physical,virt_buf[off]);
#endif

On my UltraScale+ CPU remap_pfn_range is used. In the user space application the data are read from the buffer, and currently immediately send in UDP packets with length limited to MAX_DGRAM (originally equal to 572).

 int i = 0;
 int bleft = nbytes;
 while(i<nbytes) {
    int bts = bleft < MAX_DGRAM ? bleft : MAX_DGRAM;
    if (sendto(fd,&buf[nbuf][i],bts,0, res2->ai_addr,res2->ai_addrlen)==-1) {
       printf("%s",strerror(errno));
       exit(1);
    }
    bleft -= bts;
   i+= bts;
 }

Everything worked perfectly on the 32-bit Zynq FPGA. However, after I moved it to the 64-bit UltraScale+ FPGA, I started to receive random errors, after a few hundreds of transfers.

[  852.703491] Unhandled fault: alignment fault (0x96000021) at 0x0000007f82635584
[  852.710739] Internal error: : 96000021 [#4] SMP
[  852.715235] Modules linked in: axi4s2dmov(O) ksgpio(O)
[  852.720358] CPU: 0 PID: 1870 Comm: a4s2dmov_send Tainted: G      D    O    4.4.0 #3
[  852.728001] Hardware name: ZynqMP ZCU102 RevB (DT)
[  852.732769] task: ffffffc0718ac180 ti: ffffffc0718b8000 task.ti: ffffffc0718b8000
[  852.740248] PC is at __copy_from_user+0x8c/0x180
[  852.744836] LR is at copy_from_iter+0x70/0x24c
[  852.749261] pc : [<ffffffc00039210c>] lr : [<ffffffc0003a36a8>] pstate: 80000145
[  852.756644] sp : ffffffc0718bba40
[  852.759935] x29: ffffffc0718bba40 x28: ffffffc06a4bae00 
[  852.765228] x27: ffffffc0718ac820 x26: 000000000000000c 
[  852.770523] x25: 0000000000000014 x24: 0000000000000000 
[  852.775818] x23: ffffffc0718bbe08 x22: ffffffc0710eba38 
[  852.781112] x21: ffffffc0718bbde8 x20: 000000000000000c 
[  852.786407] x19: 000000000000000c x18: ffffffc000823020 
[  852.791702] x17: 0000000000000000 x16: 0000000000000000 
[  852.796997] x15: 0000000000000000 x14: 00000000c0a85f32 
[  852.802292] x13: 0000000000000000 x12: 0000000000000032 
[  852.807586] x11: 0000000000000014 x10: 0000000000000014 
[  852.812881] x9 : ffffffc0718bbcf8 x8 : 000000000000000c 
[  852.818176] x7 : ffffffc0718bbdf8 x6 : ffffffc0710eba2c 
[  852.823471] x5 : ffffffc0710eba38 x4 : 0000000000000000 
[  852.828766] x3 : 000000000000000c x2 : 000000000000000c 
[  852.834061] x1 : 0000007f82635584 x0 : ffffffc0710eba2c 
[  852.839355] 
[  852.840833] Process a4s2dmov_send (pid: 1870, stack limit = 0xffffffc0718b8020)
[  852.848134] Stack: (0xffffffc0718bba40 to 0xffffffc0718bc000)
[  852.853858] ba40: ffffffc0718bba90 ffffffc0006a1b2c 000000000000000c ffffffc06a9bdb00
[  852.861676] ba60: 00000000000005dc ffffffc071a0d200 0000000000000000 ffffffc0718bbdf8
[  852.869488] ba80: 0000000000000014 ffffffc06a959000 ffffffc0718bbad0 ffffffc0006a2358
[...]
[  853.213212] Call trace:
[  853.215639] [<ffffffc00039210c>] __copy_from_user+0x8c/0x180
[  853.221284] [<ffffffc0006a1b2c>] ip_generic_getfrag+0xa4/0xc4
[  853.227011] [<ffffffc0006a2358>] __ip_append_data.isra.43+0x80c/0xa70
[  853.233434] [<ffffffc0006a3d50>] ip_make_skb+0xc4/0x148
[  853.238642] [<ffffffc0006c9d04>] udp_sendmsg+0x280/0x740
[  853.243937] [<ffffffc0006d38e4>] inet_sendmsg+0x7c/0xbc
[  853.249145] [<ffffffc000651f5c>] sock_sendmsg+0x18/0x2c
[  853.254352] [<ffffffc000654b14>] SyS_sendto+0xb0/0xf0
[  853.259388] [<ffffffc000084470>] el0_svc_naked+0x24/0x28
[  853.264682] Code: a88120c7 a8c12027 a88120c7 36180062 (f8408423) 
[  853.270791] ---[ end trace 30e1cd8e2ccd56c5 ]---
Segmentation fault
root@Xilinx-ZCU102-2016_2:~#

The strange thing is, that when I simply read words from the buffer, it does not cause any alignment errors.

It seems, that the send function improperly uses the __copy_from_user function, causing unaligned memory access. The question is: is it the kernel bug, or have I done something incorrectly?

However, usually, sending the data block not starting at 8-byte boundary does not trigger the alignment error. The problem occurs with relatively low probability. I was not able to isolate the conditions that lead to the error.

I have worked around the problem by adjusting the MAX_DGRAM so that it is a multiple of 8. However I'm afraid, that the problem may reappear if the data in the mmapped buffer are submitted to more complex processing. Some people reported similar problems in arm64 architecture related to memcpy function (e.g. [https://bugs.launchpad.net/linux-linaro/+bug/1271649]).

What is the correct method for mapping of coherent DMA buffers to user space to avoid memory alignment errors?

1

1 Answers

2
votes

That driver needs updating. ARCH_HAS_DMA_MMAP_COHERENT hasn't been defined by anything other than PowerPC for a long time, and even that looks like a forgotten leftover.

There has been a generic dma_mmap_coherent() implementation since 3.6, so that can, and should, be used unconditionally. The result of the current code is that, thanks to the #ifdef, you always take the other path, then thanks to pgprot_noncached() you end up making the userspace mapping of the buffer Strongly-ordered (Device nGnRnE in AArch64 terms). That's generally a bad idea, as userspace code will assume it's always operating on Normal memory (unless explicitly crafted not to), and can safely do things like unaligned or exclusive accesses, both of which are liable to go badly wrong on Device-type memory. I'm not even going to ask what kind of craziness ends up with the kernel copying data back out of a userspace mapping of a kernel buffer*, but suffice to say the kernel - via copy_{to,from,in}_user() - also assumes userspace addresses are mapped as Normal memory and thus safe for unaligned accesses. Frankly I'm a little surprised this doesn't blow up similarly on 32-bit ARM, so I guess your data happens to always be at least 4-byte aligned - that would also explain why reading words (with 32-bit accesses) is fine, if only 64-bit doubleword accesses can potentially be misaligned.

In short, just use dma_mmap_coherent(), and get rid of the open-coded poor equivalent. That will give userspace a Normal non-cacheable mapping (or a cacheable one for a hardware-coherent device) which will work as expected. It's also not broken in terms of assuming a dma_addr_t is a physical address, as your driver code seems to do - that's another thing that's liable to come around and bite you in the bum sooner or later (ZynqMP has a System MMU, so you can presumably update to a 4.9 kernel, wire up some Stream IDs, add them to the DT, and watch that assumption go bang in new and exciting ways).

* Although it does occur to me that there was some circumstance under which copying from the very end of a page may sometimes over-read into the next page, which could trigger this unwittingly if the following page happened to be a Device/Strongly-ordered mapping, which led to this patch in 4.5. Linus' response to such memory layouts was "...and nobody sane actually does that..."