I am trying to emit an ldm (resp. stm) instruction with inline assembly, but I have trouble expressing the operands (in particular, their order).
A naive attempt such as
void *ptr;
unsigned int a;
unsigned int b;
__asm__("ldm %0!,{%1,%2}" : "+&r"(ptr), "=r"(a), "=r"(b));
does not work because it might put a into r1 and b into r0:
ldm ip!, {r1, r0}
ldm expects its registers in ascending order (they are encoded in a bitfield), so I need a way to say that the register used for a has a lower number than the one used for b.
One workaround is to assign the registers explicitly:
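To illustrate why the order written in the assembly text cannot be honored, here is a minimal sketch of how a register list collapses into the 16-bit register_list field of the A32 encoding. The helper name reglist_mask is hypothetical and only serves the illustration; it is not part of gcc or any toolchain API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build the 16-bit register_list field of an
 * A32 LDM/STM encoding. Bit i is set when r<i> is in the list, so
 * the textual order of the registers is lost. */
uint16_t reglist_mask(const unsigned *regs, size_t n)
{
    uint16_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        assert(regs[i] < 16);               /* only r0..r15 exist */
        mask |= (uint16_t)(1u << regs[i]);
    }
    return mask;
}
```

Both {r1, r0} and {r0, r1} produce the same mask, and the hardware always transfers the lowest-numbered register from the lowest address, which is why the operand order has to be enforced at register allocation time.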
register unsigned int a asm("r0");
register unsigned int b asm("r1");
__asm__("ldm %0!,{%1,%2}" : "+&r"(ptr), "=r"(a), "=r"(b));
But this removes a lot of flexibility and might make the generated code suboptimal.
Does gcc (4.8) support special constraints for ldm/stm? Or are there better ways to solve this (e.g. some __builtin function)?
EDIT:
Since there were recommendations to use "higher-level" constructs: the problem I want to solve is packing 20 bits out of each 32-bit word (e.g. the input is 8 words, the output is 5 words). Pseudocode:
asm("ldm %[in]!,{ %[a],%[b],%[c],%[d] }" ...)
asm("ldm %[in]!,{ %[e],%[f],%[g],%[h] }" ...) /* splitting the ldm generates better code;
gcc runs out of registers otherwise */
/* do some arithmetic on a - h */
asm volatile("stm %[out]!,{ %[a],%[b],%[c],%[d],%[e] }" ...)
Speed matters here, and ldm is 50% faster than ldr. The arithmetic is tricky, and because gcc generates much better code than I do ;) I would like to solve it with inline assembly that gives gcc some hints about optimized memory access.
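For reference, a portable C sketch of the packing step described in the EDIT, without any inline assembly. The bit layout is an assumption, since the question does not specify it: the low 20 bits of each input word are used, and value i occupies bits 20*i to 20*i+19 of the output stream, least significant bit first. The function name pack20 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the low 20 bits of 8 input words into 5 output words
 * (8 * 20 = 5 * 32 = 160 bits). Assumed layout: LSB first. */
void pack20(const uint32_t in[8], uint32_t out[5])
{
    uint64_t acc = 0;    /* bit accumulator */
    unsigned bits = 0;   /* number of valid bits currently in acc */
    unsigned o = 0;      /* next output index */

    for (unsigned i = 0; i < 8; i++) {
        acc |= (uint64_t)(in[i] & 0xFFFFFu) << bits;
        bits += 20;
        if (bits >= 32) {            /* a full output word is ready */
            out[o++] = (uint32_t)acc;
            acc >>= 32;
            bits -= 32;
        }
    }
    assert(bits == 0 && o == 5);     /* 160 bits consumed exactly */
}
```

A version like this can serve as a correctness baseline against which an ldm/stm based variant is compared.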