I am getting an alignment error when accessing part of my DDR memory on a Xilinx ZynqMP embedded system with Linux Kernel 4.9.0.

We hide a few MB at the top of the memory from the Linux system and map them with ioremap in a dedicated driver. The memory is exposed to userspace on the embedded system through an mmap interface. We use this chunk of memory to communicate between different processors (RPU, APU, Host).

static int genmem_mmap(struct file *pFile, struct vm_area_struct *pVma)
{
   struct aim_mem_device *pInst;
   int ret;

   /* get private data */
   pInst = (struct aim_mem_device*)(pFile->private_data);

 ....[Some checks omitted here ]...

   /* mapping */
   pVma->vm_page_prot = pgprot_noncached(pVma->vm_page_prot);
   ret = remap_pfn_range(
         pVma,                                             /* user vma to map to */
         pVma->vm_start,                                   /* target user address to start at */
         pVma->vm_pgoff + ((pInst->res.start)>>PAGE_SHIFT),/* physical address of kernel memory */
         pVma->vm_end - pVma->vm_start,                    /* size of map area */
         pVma->vm_page_prot                                /* page protection flags for this mapping */
         );

 ....[Some checks omitted here ]...


   return ret;
}

Unfortunately, this raises an unhandled alignment fault on memcpy in userspace with sizes not aligned to 8 bytes. I found some information about pgprot_noncached and MT_DEVICE_nGnRnE and that this leads to a strict alignment setting. However, I do not really understand what alternatives I have. Since I am communicating with other processors, I need the non-cached setting.

[  509.376525] esmartd[1505]: unhandled alignment fault (11) at 0x7f80b24032, esr 0x92000021
[  509.384674] pgd = ffffffc0341bf000
[  509.388038] [7f80b24032] *pgd=0000000034bba003, *pud=0000000034bba003, *pmd=0000000033c24003, *pte=01e800003f000f43
[  509.404174] CPU: 0 PID: 1505 Comm: esmartd Not tainted 4.9.0-aim2-00054-g1ccf631 #183
[  509.411995] Hardware name: ZynqMP AIM APXX (DT)
[  509.416850] task: ffffffc034136e80 task.stack: ffffffc033e14000
[  509.422757] PC is at 0x7f80bafe94
[  509.426042] LR is at 0x401200
[  509.429001] pc : [<0000007f80bafe94>] lr : [<0000000000401200>] pstate: 00000000
[  509.436384] sp : 0000007ffffe0040
[  509.439676] x29: 0000007ffffe0040 x28: 0000000000000000
[  509.444963] x27: 0000000000000000 x26: 0000000000000000
[  509.450262] x25: 0000000000000000 x24: 0000000000000000
[  509.455553] x23: 0000000000000000 x22: 0000000000000000
[  509.460848] x21: 0000000000000000 x20: 0000000000000000
[  509.466147] x19: 00000000004038d8 x18: 0000000000000001
[  509.471438] x17: 0000007f80bafe40 x16: 0000000000414350
[  509.476733] x15: 0000007f80cab030 x14: 00007a672e726174
[  509.482027] x13: 2e6574616470752d x12: 3031393378787061
[  509.487323] x11: 0000001600000000 x10: 0074005300200033
[  509.492617] x9 : 00320020006c0065 x8 : 0064006f00000042
[  509.497911] x7 : 0000000145534d54 x6 : 000000000ccb80c0
[  509.503207] x5 : 0000000000000003 x4 : 0000000000000000
[  509.508502] x3 : 0000000000000000 x2 : 0000000000000002
[  509.513796] x1 : 0000007f80b24042 x0 : 000000000ccb8080
[  509.519090]
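
For anyone reading the log, the esr value can be decoded by hand. The sketch below assumes the ESR_ELx layout from the Arm architecture manual (EC in bits [31:26], and for data aborts WnR in bit 6 and the fault status code in bits [5:0]). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* ESR_ELx fields relevant to a data abort (layout per the Arm ARM). */
#define ESR_EC_SHIFT          26
#define ESR_EC_MASK           0x3fu
#define ESR_WNR_BIT           (1u << 6)   /* 1 = fault on a write, 0 = on a read */
#define ESR_FSC_MASK          0x3fu

#define ESR_EC_DABT_LOWER_EL  0x24u       /* data abort taken from a lower EL (userspace) */
#define ESR_EC_DABT_SAME_EL   0x25u       /* data abort taken from the same EL (kernel) */
#define ESR_FSC_ALIGNMENT     0x21u       /* alignment fault */

/* Hypothetical helper type: the decoded pieces of one ESR value. */
struct esr_info {
    unsigned ec;        /* exception class */
    unsigned fsc;       /* fault status code */
    int      is_write;  /* nonzero if the faulting access was a store */
};

struct esr_info esr_decode(uint32_t esr)
{
    struct esr_info info;

    info.ec       = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
    info.fsc      = esr & ESR_FSC_MASK;
    info.is_write = (esr & ESR_WNR_BIT) != 0;
    return info;
}

/* True if the ESR describes an alignment fault raised by a userspace access. */
int esr_is_user_alignment_fault(uint32_t esr)
{
    struct esr_info info = esr_decode(esr);

    return info.ec == ESR_EC_DABT_LOWER_EL && info.fsc == ESR_FSC_ALIGNMENT;
}
```

Applied to 0x92000021 this gives a data abort from userspace, fault status 0x21 (alignment), on a read, which matches a memcpy loading from the unaligned source address 0x7f80b24032.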

BTW: The same code works on a Xilinx Zynq-7000 (ARM32) series device.

A similar issue but with DMA buffers can be found here: Linux on arm64: sendto causes “Unhandled fault: alignment fault (0x96000021)” when sending data from mmapped coherent DMA buffer

Is there an alternative to pgprot_noncached which provides non-cached access but does not impose any alignment requirements?

Is the source code of your project available? I mean the Vivado .tcl to build the hardware, the kernel .config, and the applications you are running on the RPU and APU. – Leos313

1 Answer

I may be coming a bit late to the party but for future reference, here is my answer:

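On ARM64/ARMv8, Device memory (which is what pgprot_noncached selects) does not tolerate unaligned accesses: every load or store must be naturally aligned to its access size, otherwise the CPU raises an alignment fault (ARMv8 ARM). An optimized memcpy freely issues unaligned multi-byte accesses, which is why it faults on such a mapping.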

#define pgprot_noncached(prot) __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRnE) | PTE_PXN | PTE_UXN)

To communicate with other processors you can use Normal memory, as long as it is mapped non-cacheable. The main differences from Device memory, as set by the macro above, are that Normal memory allows speculative accesses and that the G, R and E restrictions (gathering, reordering, early write acknowledgement; see the ARMv8 ARM), all of which nGnRnE disables, no longer apply. Take special care with R (reordering): the processor may reorder the memory accesses made by the program, so a flag may become visible before the buffer copy it is meant to announce has actually completed.
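
To illustrate the reordering point, here is a minimal sketch of publishing a buffer and then a flag with release/acquire ordering, using C11 atomics. The mailbox layout and function names are hypothetical, since the question does not show the real shared-memory layout. Whether a release store alone is a strong enough barrier towards the RPU depends on the platform's shareability setup, so treat the ordering choice as an assumption to verify.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox placed in the shared, Normal non-cacheable region. */
struct mailbox {
    uint8_t          payload[64];
    uint32_t         length;
    _Atomic uint32_t ready;    /* 0 = empty, 1 = payload valid */
};

/* Producer: copy the payload first, then set the flag with release ordering
 * so the payload writes cannot become visible after the flag. */
int mailbox_publish(struct mailbox *mb, const void *src, size_t len)
{
    const uint8_t *s = src;

    if (len > sizeof mb->payload)
        return -1;
    for (size_t i = 0; i < len; i++)
        mb->payload[i] = s[i];
    mb->length = (uint32_t)len;
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
    return 0;
}

/* Consumer: read the flag with acquire ordering before touching the payload.
 * Returns the number of bytes copied, or -1 if nothing is ready or dst is too small. */
int mailbox_consume(struct mailbox *mb, void *dst, size_t cap)
{
    uint8_t *d = dst;
    uint32_t len;

    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
        return -1;
    len = mb->length;
    if (len > cap)
        return -1;
    for (uint32_t i = 0; i < len; i++)
        d[i] = mb->payload[i];
    /* Hand the slot back only after the payload has been read. */
    atomic_store_explicit(&mb->ready, 0, memory_order_release);
    return (int)len;
}
```

Only plain atomic loads and stores are used on the flag, no read-modify-write operations. Without the release/acquire pair, the consumer could see ready == 1 while the payload still holds stale data.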

TL;DR: Use pgprot_writecombine to set your memory as Normal and non-cached.

#define pgprot_writecombine(prot) __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)

Alternatively, you can force 64-bit alignment on the struct/memory you are actually copying (I don't know whether memcpy then falls back to 8-bit copies or uses 64-bit accesses).
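
If the mapping has to stay Device memory, one way to avoid depending on what memcpy does internally is to copy by hand using only naturally aligned accesses. This is a rough sketch, not a tuned implementation. The function names are made up, and it assumes the compiler emits exactly one load or store per volatile access of a naturally aligned scalar, which is what GCC and Clang normally do on AArch64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes out of a Device-mapped region into ordinary memory.
 * Every access to src is either a single byte or an 8-byte-aligned 64-bit load. */
void devmem_read(void *dst, const volatile void *src, size_t len)
{
    uint8_t *d = dst;
    const volatile uint8_t *s = src;

    /* Head: byte loads until the device address is 8-byte aligned. */
    while (len > 0 && ((uintptr_t)s & 7u) != 0) {
        *d++ = *s++;
        len--;
    }
    /* Body: aligned 64-bit loads; dst is normal memory, so memcpy is fine there. */
    while (len >= 8) {
        uint64_t v = *(const volatile uint64_t *)s;
        memcpy(d, &v, sizeof v);
        s += 8;
        d += 8;
        len -= 8;
    }
    /* Tail: remaining bytes one at a time. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
}

/* Copy len bytes from ordinary memory into a Device-mapped region,
 * with the same alignment discipline on the device side. */
void devmem_write(volatile void *dst, const void *src, size_t len)
{
    volatile uint8_t *d = dst;
    const uint8_t *s = src;

    while (len > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        len--;
    }
    while (len >= 8) {
        uint64_t v;
        memcpy(&v, s, sizeof v);
        *(volatile uint64_t *)d = v;
        s += 8;
        d += 8;
        len -= 8;
    }
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
}
```

The volatile qualifier matters here: it keeps the compiler from merging the byte loops back into a memcpy call or into wider unaligned accesses.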