
I'm implementing cache maintenance functions for ARMv8 (Cortex-A53) running in 32-bit mode. There is a problem when I try to flush a memory region by virtual address (VA). DCacheFlushByRange looks like this:

// some init.
// kDCacheL1 = 0; kDCacheL2 = 2;
while (alignedVirtAddr < endAddr)
{
    // Flushing L1
    asm volatile("mcr   p15, 2, %0,  c0,  c0,  0" : : "r"(kDCacheL1) :);        // select cache
    isb();
    asm volatile("mcr   p15, 0, %0,  c7, c14,  1" : : "r"(alignedVirtAddr) :);  // clean & invalidate
    dsb();

    // Flushing L2
    asm volatile("mcr   p15, 2, %0,  c0,  c0,  0" : : "r"(kDCacheL2) :);        // select cache
    isb();
    asm volatile("mcr   p15, 0, %0,  c7, c14,  1" : : "r"(alignedVirtAddr) :);  // clean & invalidate
    dsb();

    alignedVirtAddr += lineSize;
}

DMA is used to validate the functions. The DMA copies one buffer into another. The source buffer is flushed before the DMA transfer, and the destination buffer is invalidated after the DMA completes. The buffers are 64-byte aligned. The test:

for (uint32_t i = 0; i < kBufSize; i++)
    buf1[i] = 0;
for (uint32_t i = 0; i < kBufSize; i++)
    buf0[i] = kRefValue;

DCacheFlushByRange(buf0, sizeof(buf0));

// run DMA
while (1) // wait DMA completion;

DCacheInvalidateByRange(buf1, sizeof(buf1));
compare(buf0, buf1);

In the dump I can see that buf1 still contains only zeroes. When the caches are turned off, the result is correct, so the DMA itself works.

Another point: when the whole D-cache is flushed/invalidated by set/way, the result is correct.

// loop through way & set for L1 & L2
asm volatile("mcr   p15, 0, %0,  c7, c14,  2" : : "r"(setway) :);

In short, flush/invalidate by set/way works correctly, but flushing/invalidating by VA doesn't. What could be the problem?

PS: kBufSize = 4096, so the total buffer size is 4096 * sizeof(uint32_t) == 16KB.

You should probably just flush the entire L1 cache if the range/buffer is large and then pause to make sure it completes before flushing the L2 cache. Also, there is a write buffer (or the like) which is not part of the cache. You don't give sizes nor if 'buf1' is completely zero or partially. Sets are usually consecutive addresses. - artless noise

Buffer size is 16KB. I also tried a 64B buffer, the result is the same. Flushing the whole L1 and flushing the L2 region by VA doesn't work. The whole destination buffer is zeroes in all cases. - user3124812

Sorry, I am not familiar with the A53, however on Cortex-A7, there is a memory mapped register interface to the L2. The CP15 registers will not flush the L2 cache (even though a manual may seem to indicate this). Do you have an ARM manual besides the Cortex-A53 TRM? Usually the SCU and on-chip timers, etc have separate register files at least with earlier ARM chips. Linux uses different mechanisms in cache.S. - artless noise

@artlessnoise The L2 cache controller is a separate chip in the A7 (similar to other peripherals, even though they are all packed together). The A53 has an integrated L2 cache. - user3124812

1 Answer


The problem is not in the function itself but in a feature of the Cortex-A53 cache implementation.

From the Cortex-A53 TRM:

DCIMVAC operations in AArch32 and DC IVAC instructions in AArch64 perform an invalidate of the target address. If the data is dirty within the cluster then a clean is performed before the invalidate.


So there is no pure invalidate here: a dirty line gets a clean and invalidate.

The normal sequence (at least for me) is:

flush(src);
dma(); // copy src -> dst
invalidate(dst);

But because invalidate() also cleans dirty lines, the old data in the cache (for the dst region) is written on top of the data that the DMA transfer has just put in memory.
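To make the ordering problem concrete, here is a toy software model of a single write-back cache line in C. It is only a sketch of the behaviour described in the TRM quote, not real hardware: `ToyCache`, `cpu_write`, `dma_write`, `invalidate` and the two `run_*` functions are hypothetical names made up for this illustration, and 0xA5A5A5A5 stands in for the DMA payload.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): one word of RAM shadowed by one cache line. */
typedef struct {
    uint32_t mem;   /* value in RAM, which is what the DMA engine sees */
    uint32_t line;  /* value held in the cache line */
    bool     valid; /* line holds data */
    bool     dirty; /* line is newer than RAM */
} ToyCache;

/* CPU store into a write-back, write-allocate cache: RAM is not updated. */
static void cpu_write(ToyCache *c, uint32_t v)
{
    c->line = v;
    c->valid = true;
    c->dirty = true;
}

/* CPU load: hit in the cache if the line is valid, else fill from RAM. */
static uint32_t cpu_read(ToyCache *c)
{
    if (!c->valid) {
        c->line = c->mem;
        c->valid = true;
        c->dirty = false;
    }
    return c->line;
}

/* DMA writes go straight to RAM and do not touch the cache. */
static void dma_write(ToyCache *c, uint32_t v)
{
    c->mem = v;
}

/* Invalidate as described in the TRM quote: a dirty line is cleaned first. */
static void invalidate(ToyCache *c)
{
    if (c->valid && c->dirty)
        c->mem = c->line;       /* write-back overwrites whatever is in RAM */
    c->valid = false;
    c->dirty = false;
}

/* Order used in the question: dma, then invalidate(dst). */
uint32_t run_original_order(void)
{
    ToyCache dst = {0};
    cpu_write(&dst, 0);             /* buf1[i] = 0 leaves a dirty line */
    dma_write(&dst, 0xA5A5A5A5u);   /* DMA result lands in RAM */
    invalidate(&dst);               /* cleans the stale zero over the DMA data */
    return cpu_read(&dst);
}

/* Order with an extra invalidate(dst) before the DMA starts. */
uint32_t run_workaround_order(void)
{
    ToyCache dst = {0};
    cpu_write(&dst, 0);
    invalidate(&dst);               /* dirty zero is written back before the DMA */
    dma_write(&dst, 0xA5A5A5A5u);
    invalidate(&dst);               /* line is not dirty: pure invalidate */
    return cpu_read(&dst);
}
```

In this model `run_original_order()` reads back zero, matching the all-zero buf1 seen in the dump, while `run_workaround_order()` reads back the DMA payload.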


Solution/workaround is

flush(src);
invalidate(dst);
dma(); // copy src -> dst
invalidate(dst); // again, that's right*.


* Data from the 'dst' memory region could be fetched into the cache in advance. If that happens before the DMA puts its data in memory, the old data from the cache would be used. The second invalidate is safe: since the line is not marked 'dirty', it is performed as a pure invalidate, with no clean/flush.
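For reference, a minimal sketch of what an invalidate-by-range helper could look like in AArch32. This is not the code from the question: `dcache_invalidate_by_range`, `dcimvac`, `dsb` and `CACHE_LINE_SIZE` are names chosen for this example, the 64-byte line size is an assumption that should really be read from the cache ID registers, and the sketch does not guard against `addr + size` wrapping around. As far as I know, maintenance operations by VA act to the point of coherency, so no cache level is selected before each operation, and a single DSB after the loop is used instead of one per line. On a non-ARM host the maintenance operations compile to no-ops, so only the loop arithmetic can be checked there.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumption: 64-byte data cache lines */

/* DCIMVAC: invalidate data cache line by VA (AArch32 CP15 encoding). */
static inline void dcimvac(uintptr_t va)
{
#if defined(__arm__)
    __asm__ volatile("mcr p15, 0, %0, c7, c6, 1" : : "r"(va) : "memory");
#else
    (void)va;  /* host build: no-op, only the loop logic is exercised */
#endif
}

/* Data synchronization barrier: wait for the maintenance ops to complete. */
static inline void dsb(void)
{
#if defined(__arm__)
    __asm__ volatile("dsb sy" : : : "memory");
#endif
}

/* Invalidates every line overlapping [addr, addr + size).
 * Returns the number of lines touched, which makes the loop easy to test. */
size_t dcache_invalidate_by_range(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;

    uintptr_t va  = addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u); /* align down */
    uintptr_t end = addr + size;
    size_t lines = 0;

    for (; va < end; va += CACHE_LINE_SIZE) {
        dcimvac(va);
        lines++;
    }
    dsb();  /* one barrier after the whole range */
    return lines;
}
```

With such a helper, the workaround order above amounts to calling it on dst once before starting the DMA and once more after the DMA has completed. The 16KB buffer from the question covers 256 lines of 64 bytes.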