I'm looking for an efficient way to copy 42 consecutive 32-bit memory locations.
Note: A snapshot array is copied to a log array.
I'm using the LDMIA and STMIA pair (10 registers per instruction):
LDMIA R0!, {R2-R11} ; Read 10 array slots (R2-R12 would be 11 registers)
STMIA R1!, {R2-R11} ; Write 10 array slots
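To make the access pattern concrete, here is a rough C model of the whole copy, assuming 10 words per LDMIA/STMIA pair as described above. The names `copy_snapshot`, `SNAPSHOT_WORDS` and `BLOCK_WORDS` are made up for illustration and are not from my actual code; each pass of the outer loop stands in for one LDMIA/STMIA pair.

```c
#include <stddef.h>
#include <stdint.h>

#define SNAPSHOT_WORDS 42u  /* array length from the question */
#define BLOCK_WORDS    10u  /* assumed: words moved per LDMIA/STMIA pair */

/* Hypothetical helper: copies SNAPSHOT_WORDS 32-bit words from src to dst. */
static void copy_snapshot(uint32_t *dst, const uint32_t *src)
{
    size_t i = 0;

    /* Full blocks: each pass models one LDMIA/STMIA pair (4 passes for 42 words). */
    while (i + BLOCK_WORDS <= SNAPSHOT_WORDS) {
        for (size_t k = 0; k < BLOCK_WORDS; ++k) {
            dst[i + k] = src[i + k];
        }
        i += BLOCK_WORDS;
    }

    /* Tail: the remaining 2 words that do not fill a full block. */
    for (; i < SNAPSHOT_WORDS; ++i) {
        dst[i] = src[i];
    }
}
```

So 42 words would take four full 10-word transfers plus one short 2-word transfer, which is the pattern the questions below are about.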
My questions:
- How do these instructions affect the data cache?
- Is the data bus locked during the entire load/store, or is it only locked per 32-bit load/store?
In other words, for the LDM instruction, does the ARM lock the data bus and load all the data into registers, or is the data bus only locked for each 32-bit transfer?
The code is running on an ARM Cortex-A8 (Texas Instruments AM3358).
I didn't find any hardware details on this in the ARM Architecture Reference Manual.