GCC inline asm early-clobber constraints are described in the GCC docs here:
http://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers
We have an amd64 implementation of a 128-bit add:
#define ADD128(rh, rl, ah, al, bh, bl) \
__asm__("addq %2, %0; adcq %3, %1" \
/* outputs */ : "=r"(rl), /* %0 */ \
"=r"(rh) /* %1 */ \
/* inputs */ : "emr"(bl), /* %2 */ \
"emr"(bh), /* %3 */ \
"0"(al), /* %4 == %0 */ \
"1"(ah) /* %5 == %1 */ \
/* clobbers */: "cc" /* condition registers (CF, ...) */ \
)
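For reference, here is a portable C sketch of what the macro computes: (rh:rl) = (ah:al) + (bh:bl) modulo 2^128. The low words are added first, and the carry out of that add (CF, which adcq consumes) feeds into the high-word add. The name add128_ref is hypothetical and is meant only as a cross-check for the asm.

```c
#include <stdint.h>

/* Hypothetical reference helper, not part of the original code.
 * Computes (rh:rl) = (ah:al) + (bh:bl) modulo 2^128 in plain C. */
static inline void add128_ref(uint64_t *rh, uint64_t *rl,
                              uint64_t ah, uint64_t al,
                              uint64_t bh, uint64_t bl)
{
    uint64_t lo = al + bl;      /* wraps modulo 2^64, like addq */
    uint64_t carry = lo < al;   /* 1 if the low add wrapped (the CF bit) */
    *rl = lo;
    *rh = ah + bh + carry;      /* like adcq: high words plus carry */
}
```

For example, adding 1 to (0:UINT64_MAX) yields (1:0), because the carry propagates into the high word.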
I was wondering whether this must use an early clobber (&) on %0:
#define ADD128(rh, rl, ah, al, bh, bl) \
__asm__("addq %2, %0; adcq %3, %1" \
/* outputs */ : "=&r"(rl), /* %0 */ \
"=r"(rh) /* %1 */ \
/* inputs */ : "emr"(bl), /* %2 */ \
"emr"(bh), /* %3 */ \
"0"(al), /* %4 == %0 */ \
"1"(ah) /* %5 == %1 */ \
/* clobbers */: "cc" /* condition registers (CF, ...) */ \
)
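The same early-clobber variant can also be written with "+" (read-write) output operands instead of "=r" outputs plus matching "0"/"1" inputs; the GCC docs allow & together with a read-write operand. The sketch below assumes x86-64 and a GCC-compatible compiler (the Intel compiler also defines __GNUC__ on Linux). It falls back to plain C elsewhere so that it still builds. The function name add128_asm is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: ADD128 with "+&r" on the low word instead of
 * "=&r" plus a tied "0" input. */
static inline void add128_asm(uint64_t *rh, uint64_t *rl,
                              uint64_t ah, uint64_t al,
                              uint64_t bh, uint64_t bl)
{
    uint64_t lo = al, hi = ah;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("addq %2, %0; adcq %3, %1"
            : "+&r"(lo),  /* %0: written by addq, before adcq reads %3 */
              "+r"(hi)    /* %1: written only by the last instruction */
            : "emr"(bl),  /* %2 */
              "emr"(bh)   /* %3 */
            : "cc");      /* addq/adcq modify the flags */
#else
    /* Portable fallback for targets other than x86-64. */
    uint64_t sum = lo + bl;
    hi = hi + bh + (sum < lo);
    lo = sum;
#endif
    *rl = lo;
    *rh = hi;
}
```

The "+" form makes the read-modify-write of %0 and %1 explicit, so there are no separate %4/%5 operands whose numbering has to be kept in sync.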
However, I was not sure, because in the amd64 version the inputs are explicitly tied to the outputs (%0 == %4, %1 == %5).
The first (non-early-clobber) version currently appears to work at all the optimization levels we use, at least with the Intel compiler. (We wouldn't need this with GCC, since GCC now supports native int128 operations on this target.)
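For comparison, here is a minimal sketch of the native route, assuming a GCC-compatible compiler that provides unsigned __int128 (GCC and Clang do on 64-bit targets). The name add128_native is hypothetical. On x86-64 the addition typically compiles to an add/adc pair, but it is worth checking the generated assembly.

```c
#include <stdint.h>

/* __extension__ suppresses -pedantic warnings about the non-standard type. */
__extension__ typedef unsigned __int128 u128;

/* Hypothetical helper: 128-bit add using the compiler's native type. */
static inline void add128_native(uint64_t *rh, uint64_t *rl,
                                 uint64_t ah, uint64_t al,
                                 uint64_t bh, uint64_t bl)
{
    u128 a = ((u128)ah << 64) | al;
    u128 b = ((u128)bh << 64) | bl;
    u128 r = a + b;                  /* wraps modulo 2^128 */
    *rl = (uint64_t)r;               /* low 64 bits */
    *rh = (uint64_t)(r >> 64);       /* high 64 bits */
}
```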
For strict conformance to the GCC specification of early clobber in inline asm, would we need the & on the %0 constraint, even though the inputs are tied to the outputs?
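One reading of the GCC docs is offered here as a sketch, not a definitive answer. Without &, GCC assumes every input is consumed before any output is written. It may therefore give a non-tied input the same register as an output when both hold the same value. Here %0 is written by addq before adcq reads %3 (bh). If bh and al happen to be the same value, the non-& version would allow %3 to share the register of %0/%4, and adcq would then read the already-updated sum. The GCC docs also say an input may be tied to an early-clobber output if its only use as an input occurs before the early result is written, and that holds for %4. %1 needs no & because it is written by the last instruction. The test below exercises that aliased case (bh == al) with the early-clobber macro, renamed ADD128_EC, and a hypothetical wrapper add_aliased. It assumes x86-64 with a GCC-compatible compiler and uses a portable stand-in elsewhere. A passing run of the non-& version would prove nothing, since register sharing is allowed but not guaranteed.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
/* Early-clobber variant from the question, renamed ADD128_EC. */
#define ADD128_EC(rh, rl, ah, al, bh, bl)              \
    __asm__("addq %2, %0; adcq %3, %1"                 \
            : "=&r"(rl), "=r"(rh)                      \
            : "emr"(bl), "emr"(bh), "0"(al), "1"(ah)   \
            : "cc")
#else
/* Portable stand-in so the sketch still builds on other targets. */
#define ADD128_EC(rh, rl, ah, al, bh, bl)              \
    do {                                               \
        uint64_t ec_lo_ = (al) + (bl);                 \
        (rh) = (ah) + (bh) + (ec_lo_ < (al));          \
        (rl) = ec_lo_;                                 \
    } while (0)
#endif

/* Hypothetical wrapper: computes (ah:x) + (x:bl), so the same value x is
 * both al (tied to %0) and bh (%3). The '&' on %0 forbids the compiler
 * from placing %3 in the register that addq overwrites. */
static inline void add_aliased(uint64_t *rh, uint64_t *rl,
                               uint64_t ah, uint64_t x, uint64_t bl)
{
    uint64_t hi, lo;
    ADD128_EC(hi, lo, ah, x, x, bl);
    *rh = hi;
    *rl = lo;
}
```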