I have a quad-core ARM Cortex-A9 device, and I'm writing a multi-process application. The processes share the same input source: a DMA buffer that they all access through an mmap() call.

I noticed that the processes take significantly longer to access the DMA memory than they do when I switch the input source to a normally allocated buffer (i.e. one allocated with malloc()).

I understand why a DMA buffer must be non-cacheable. However, I can determine when the buffer is stable (unchanged by the hardware, which is the case most of the time) and when it is dirty (the data has changed), so I thought I might get a significant speed improvement by making the memory region temporarily cacheable.

Is there a way to do that?

I'm currently using this line to map the memory:

void *buf = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phy_addr);

Thanks!

"DMA buffer" is too broad a definition. How do you obtain/create one? - user405725
I create the DMA buffer with dma_alloc_coherent() in the device's kernel module. - oferlivny
You could make the memory permanently cacheable; just take care to flush the caches before the DMA starts and to invalidate them after it finishes, for each CPU. The drawback is that cache operations also take time, so copying the memory might turn out to be faster. If cache coherence is supported/implemented by the CPU/OS, the manual cache handling can be omitted. - user3124812

1 Answer

Most modern CPUs use snooping to determine if/when cache lines must be flushed to memory or marked invalid. On such CPUs a "DMA buffer" is identical to a kmalloc() buffer. This, of course, assumes the snoop feature works correctly and that the OS takes advantage of the snoop feature. If you are seeing differences in accesses to DMA and non-DMA memory regions then I can only assume your CPU either does not have cache snooping capabilities (check CPU docs) or the capability is not used because it doesn't work (check CPU errata).

Problems with your proposed approach:

  1. Do you know when it is time to change the memory region back to non-cacheable?
  2. Changing MMU settings for a memory region is not always trivial (it is CPU-dependent), and I'm not sure an API even exists within your OS for changing such settings.
  3. Changing MMU settings for a memory region is risky even when it is possible and such changes must be carefully synchronized with your DMA operation or data corruption is virtually guaranteed.

Given all of these significant problems, I suggest a better approach is to copy the data from the DMA buffer to the kmalloc() buffer when you detect the DMA buffer has been updated.
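
To make the copy-on-update idea concrete, here is a minimal user-space sketch in C. It is only an illustration under stated assumptions, not a tested implementation for your driver: the struct dma_snapshot type, the snapshot_refresh() function and the "generation" counter are all hypothetical names, and the sketch assumes your driver gives you some way to tell that the buffer has changed and is stable while it is being copied. It also uses an ordinary malloc()-style buffer for the cacheable copy, since the readers in the question are user-space processes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bookkeeping for a cacheable private copy of the DMA data. */
struct dma_snapshot {
    void    *copy;      /* ordinary cacheable buffer, e.g. from malloc() */
    size_t   size;      /* number of bytes to mirror */
    uint32_t seen_gen;  /* generation of the data currently held in copy */
    int      valid;     /* 0 until the first copy has been made */
};

/*
 * Refresh the cacheable copy only when the DMA data has changed.
 * dma_buf is the uncached mapping returned by mmap(); current_gen is an
 * assumed "data changed" counter supplied by the driver.
 * Returns 1 if a copy was performed, 0 if the existing copy was reused.
 */
static int snapshot_refresh(struct dma_snapshot *s, const void *dma_buf,
                            uint32_t current_gen)
{
    assert(s != NULL && s->copy != NULL && dma_buf != NULL);

    if (s->valid && s->seen_gen == current_gen)
        return 0;                       /* buffer is stable: keep reading the copy */

    memcpy(s->copy, dma_buf, s->size);  /* one pass over the slow uncached region */
    s->seen_gen = current_gen;
    s->valid = 1;
    return 1;
}
```

Each process would call snapshot_refresh() before reading and then work only on s->copy, so the uncached region is touched once per update rather than on every access. Whether this wins depends on how often the buffer changes compared with how often it is read.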