In gcc inline assembly, you can use labels and have the assembler sort out the jump target for you. Something like (contrived example):
int max(int a, int b)
{
int result;
__asm__ __volatile__(
"movl %1, %0\n"
"cmpl %2, %0\n"
"jeq a_is_larger\n"
"movl %2, %0\n"
"a_is_larger:\n" : "=r"(result), "r"(a), "r"(b));
return (result);
}
That's one thing. The other thing you could do to avoid multiplication is to make the assembler align the blocks for you, say, at a multiple of 32 bytes (I don't think the instruction sequence fits into 16 Bytes), like:
".align 32\n" \
".Lmul"
"movq -"
"mulq "
"addq %%rax,%0\n" \
"adcq %%rdx,%1\n" \
"adcq $0,%2\n"
This will simply pad the instruction stream with nop
. If yo do choose not to align these blocks, you can still, in your main expression, use the generated local labels to find the size of the assembly blocks:
#ifdef UNALIGNED
__asm__ ("imul $(.Lmul0-.Lmul1), %[label]\n"
#else
__asm__ ("shlq $5, %[label]\n"
#endif
"leaq .Lmulblkstart, %[dummy]\n"
"jmp (%[dummy], %[label])\n"
".align 32\n"
".Lmulblkstart:\n"
__mul(127)
...
__mul(0)
: ... [dummy]"=r"(dummy) : [label]"r"((128-count)))
And for the case where count
is a compile-time constant, you can even do:
__asm__("jmp .Lmul" #count "\n" ...);
Little note on the end:
Aligning the blocks is a good idea if the autogenerated _mul()
thing can create sequences of different lengths. For constants 0..127
as you use, that won't be the case as they all fit into a byte, but if you'll scale them larger it would go to 16- or 32-bit values and the instruction block would grow alongside. By padding the instruction stream, the jumptable technique can still be used.