Does this gcc style asm with inputs=outputs require an early clobber?

Question

GCC inline asm early-clobber constraints are described in the GCC docs here:

http://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers

We have an amd64 implementation of 128 bit add:

#define ADD128(rh, rl, ah, al, bh, bl)                                     \
    __asm__("addq %2, %0; adcq %3, %1"                                     \
            /* outputs */ : "=r"(rl),  /* %0 */                            \
                            "=r"(rh)   /* %1 */                            \
            /* inputs */  : "emr"(bl), /* %2 */                            \
                            "emr"(bh), /* %3 */                            \
                            "0"(al),   /* %4 == %0 */                      \
                            "1"(ah)    /* %5 == %1 */                      \
            /* clobbers */: "cc"       /* condition registers (CF, ...) */ \
           )

I was wondering if this must use early clobber (&) for %0:

#define ADD128(rh, rl, ah, al, bh, bl)                                     \
    __asm__("addq %2, %0; adcq %3, %1"                                     \
            /* outputs */ : "=&r"(rl), /* %0 */                            \
                            "=r"(rh)   /* %1 */                            \
            /* inputs */  : "emr"(bl), /* %2 */                            \
                            "emr"(bh), /* %3 */                            \
                            "0"(al),   /* %4 == %0 */                      \
                            "1"(ah)    /* %5 == %1 */                      \
            /* clobbers */: "cc"       /* condition registers (CF, ...) */ \
           )

However, I am not sure, since the amd64 version explicitly ties inputs to outputs (%0 == %4, %1 == %5).

The first (non-early-clobber) version appears to work at all optimization levels we use, at least with the Intel compiler (we wouldn't need this with gcc, since gcc now supports native int128 operations on this target).

For strict conformance to the gcc specs for early clobber in inline asm, would we need the & on the %0 constraint, even with the inputs=outputs constraints?

Chris Dodd · Accepted Answer · 2012-06-19T18:20:49

You need the early clobber if you call this macro with the exact same expression for bh and al. In that case, without the early clobber, the compiler might choose the same register for %3 and %4 (which is the same as %0), so the first instruction (addq) would overwrite that value before the second instruction (adcq) reads it.

It's pretty unlikely that you actually call the macro in a way that might trigger this problem, so it's not surprising that you don't see any problems without the early clobber. Adding the early clobber also introduces an extra (unneeded) register copy when you call the macro with al identical to bl (e.g., adding a 128-bit value to itself in place), so it is slightly undesirable.
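
To make the aliasing case concrete, here is a minimal sketch that calls the early-clobber variant with the same expression for bh and al. The names ADD128_EC, u128_parts, add128_aliased and add128_aliased_check are hypothetical, introduced only for this illustration, and the plain C fallback branch is an assumption added so the sketch still builds on targets other than x86-64.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
/* Early-clobber variant from the question, renamed ADD128_EC (hypothetical name). */
#define ADD128_EC(rh, rl, ah, al, bh, bl)                        \
    __asm__("addq %2, %0; adcq %3, %1"                           \
            : "=&r"(rl), "=r"(rh)                                \
            : "emr"(bl), "emr"(bh), "0"(al), "1"(ah)             \
            : "cc")
#else
/* Assumption: plain C fallback so the sketch builds on other targets. */
#define ADD128_EC(rh, rl, ah, al, bh, bl)                        \
    do {                                                         \
        uint64_t al_ = (al), ah_ = (ah), bl_ = (bl), bh_ = (bh); \
        uint64_t lo_ = al_ + bl_;                                \
        (rl) = lo_;                                              \
        (rh) = ah_ + bh_ + (lo_ < al_); /* carry out of low */   \
    } while (0)
#endif

/* Hypothetical result type: the two 64-bit halves of a 128-bit sum. */
typedef struct { uint64_t hi, lo; } u128_parts;

/* Computes (ah:x) + (x:bl); x is passed as both al and bh, which is the
 * aliasing pattern the answer describes. */
static inline u128_parts add128_aliased(uint64_t ah, uint64_t x, uint64_t bl)
{
    u128_parts r;
    ADD128_EC(r.hi, r.lo, ah, x, x, bl);
    return r;
}

/* Usage sketch: (1:5) + (5:2^64-1). */
static inline void add128_aliased_check(void)
{
    u128_parts r = add128_aliased(1, 5, UINT64_MAX);
    assert(r.lo == 4); /* 5 + (2^64 - 1) wraps to 4 with carry out 1 */
    assert(r.hi == 7); /* 1 + 5 + carry */
}
```

With the & on %0, the compiler has to keep bh out of the register used for %0, so adcq still sees the original x. Without it, whether the wrong high word (6 in this example) actually appears depends on register allocation, which is why the problem can stay hidden for a long time.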
