In GCC I use various double-machine-word types, e.g. (u)int128_t on x86_64 and (u)int64_t on i386, ARM etc. I am looking for a correct/portable/clean way of accessing and manipulating the individual machine words they are made of (mostly from inline assembler). For example, on 32-bit machines I want to access the high/low 32-bit halves of an int64_t directly, the way GCC holds them internally, without clumsy, error-prone code like the one below. Similarly, for the "native" 128-bit types I want to get at the two 64-bit parts GCC is using (not needed for the example below, since "add" is simple enough, but in general).
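To illustrate what I mean by "accessing the halves", here is a union-based sketch of the idea (purely illustrative, little-endian layout assumed; whether GCC keeps such a union in registers instead of spilling it is exactly the part I am unsure about):

typedef union {
uint64_t whole;
struct { uint32_t lo, hi; } half; // little-endian member order assumed
} split64;

// e.g. reading the high 32 bits of an int64_t value without shifting:
// split64 s = { .whole = (uint64_t)some_int64 };
// uint32_t top = s.half.hi;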
Consider the 32-bit ASM path in the following code, which adds two int128_t values (a type that may be "native" to GCC, native to the machine, or only "half native" to it); it is horrendous, hard to maintain and slower than it should be.
#include <stdint.h>

#define BITS 64
#if defined(USENATIVE)
// USE "NATIVE" 128bit GCC TYPE
typedef __int128_t int128_t;
typedef __uint128_t uint128_t;
typedef int128_t I128;
#define HIGH(x) (x)
#define HIGHVALUE(x) ((uint64_t)((x) >> BITS))
#define LOW(x) (x)
#define LOWVALUE(x) ((uint64_t)(x))
#else
typedef struct I128 {
int64_t high;
uint64_t low;
} I128;
#define HIGH(x) x.high
#define HIGHVALUE(x) x.high
#define LOW(x) x.low
#define LOWVALUE(x) x.low
#endif
#define HIGHHIGH(x) (HIGHVALUE(x) >> (BITS / 2))
#define HIGHLOW(x) (HIGHVALUE(x) & 0xFFFFFFFF)
#define LOWHIGH(x) (LOWVALUE(x) >> (BITS / 2))
#define LOWLOW(x) (LOWVALUE(x) & 0xFFFFFFFF)
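// Worked example: with BITS == 64 and the struct variant, a value whose
// high word is 0x0123456789ABCDEF and low word is 0xFEDCBA9876543210 gives
// HIGHHIGH = 0x01234567, HIGHLOW = 0x89ABCDEF,
// LOWHIGH = 0xFEDCBA98, LOWLOW = 0x76543210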
static inline I128 I128add(I128 a, const I128 b) {
#if defined(USENATIVE)
return a + b;
#elif defined(USEASM) && defined(X86_64)
// destinations kept in "r" so ADD/ADC never see a memory/memory pair;
// sources are "rm" so GCC cannot hand ADD a 64-bit immediate it cannot encode
__asm(
"ADD %[blo], %[alo]\n"
"ADC %[bhi], %[ahi]"
: [alo] "+r" (a.low), [ahi] "+r" (a.high)
: [blo] "rm" (b.low), [bhi] "rm" (b.high)
: "cc"
);
return a;
#elif defined(USEASM) && defined(X86_32)
// SLOWER DUE TO ALL THE CRAP
int32_t ahihi = HIGHHIGH(a), bhihi = HIGHHIGH(b);
uint32_t ahilo = HIGHLOW(a), bhilo = HIGHLOW(b);
uint32_t alohi = LOWHIGH(a), blohi = LOWHIGH(b);
uint32_t alolo = LOWLOW(a), blolo = LOWLOW(b);
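// the 128-bit add becomes four 32-bit adds, with the carry rippling up through ADC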
__asm(
"ADD %[blolo], %[alolo]\n"
"ADC %[blohi], %[alohi]\n"
"ADC %[bhilo], %[ahilo]\n"
"ADC %[bhihi], %[ahihi]\n"
: [alolo] "+r" (alolo), [alohi] "+r" (alohi), [ahilo] "+r" (ahilo), [ahihi] "+r" (ahihi)
: [blolo] "g" (blolo), [blohi] "g" (blohi), [bhilo] "g" (bhilo), [bhihi] "g" (bhihi)
: "cc"
);
a.high = ((int64_t)ahihi << (BITS / 2)) + ahilo;
a.low = ((uint64_t)alohi << (BITS / 2)) + alolo;
return a;
#else
// this seems faster than adding to a directly
I128 r = {a.high + b.high, a.low + b.low};
// the low 64 bits wrapped around iff the sum is now smaller than an addend,
// so a single unsigned compare yields the carry without needing a branch
r.high += (r.low < a.low);
return r;
#endif
}
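For reference, a minimal standalone sketch for sanity-checking the carry handling of the portable fallback against GCC's native __int128 (needs a target where GCC provides __int128; the names exist only for this test):

#include <stdint.h>
#include <stdio.h>

typedef struct { int64_t high; uint64_t low; } I128chk;

static I128chk addchk(I128chk a, I128chk b) {
I128chk r = {a.high + b.high, a.low + b.low};
r.high += (r.low < a.low); // carry out of the low 64 bits
return r;
}

int main(void) {
// values chosen so the low half wraps around
I128chk a = {0x0123456789ABCDEF, 0xFFFFFFFFFFFFFFFFULL};
I128chk b = {1, 1};
I128chk r = addchk(a, b);
__int128 n = (((__int128)a.high << 64) | a.low) + (((__int128)b.high << 64) | b.low);
printf("portable: %016llx%016llx\n", (unsigned long long)r.high, (unsigned long long)r.low);
printf("native  : %016llx%016llx\n", (unsigned long long)(uint64_t)(n >> 64), (unsigned long long)(uint64_t)n);
return 0;
}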
Please note that I don't use C/ASM much; in fact, this is my first attempt at inline ASM. Being used to Java/C#/JS/PHP etc. means that something obvious to a seasoned C developer may not be apparent to me (besides the obvious quirkiness in code style ;)). Also, this whole topic may go by a different name entirely, because I had a very hard time finding anything about it online (I'm not a native English speaker either).
Thanks a lot!
Edit 1
After much digging I have found the following solution, which works but is unnecessarily slow (slower than GCC's much longer output!) because it forces everything into memory, whereas I am looking for a generic solution (reg/mem/possibly imm). I have also found that if you use an "r" constraint on, e.g., a 64-bit int on a 32-bit machine, GCC will actually put the value into 2 registers (e.g. eax and ebx); the problem is that I cannot reliably access the second one. I am sure there is some hidden operand modifier, just hard to find, that tells GCC I want to access that second half.
uint32_t t1, t2;
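// b's halves are shuttled through t1/t2 because ADD/ADC cannot take two
// memory operands; the MOVs in between are safe since MOV leaves the carry flag untouched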
__asm(
"MOV %[blo], %[t1]\n"
"MOV 4+%[blo], %[t2]\n"
"ADD %[t1], %[alo]\n"
"ADC %[t2], 4+%[alo]\n"
"MOV %[bhi], %[t1]\n"
"MOV 4+%[bhi], %[t2]\n"
"ADC %[t1], %[ahi]\n"
"ADC %[t2], 4+%[ahi]\n"
: [alo] "+o" (a.low), [ahi] "+o" (a.high), [t1] "=&r" (t1), [t2] "=&r" (t2)
: [blo] "o" (b.low), [bhi] "o" (b.high)
: "cc"
);
return a;
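For comparison, a hedged sketch using the i386 "A" machine constraint (documented by GCC), which pins a 64-bit value to the edx:eax pair so its two halves can at least be named explicitly. It only ever yields that one fixed pair and is i386-only, so it is not the generic reg/mem solution I am after:

static inline uint64_t add64_pair(uint64_t x, uint64_t y) {
__asm(
"ADD %[ylo], %%eax\n"
"ADC %[yhi], %%edx"
: "+A" (x) // x lives in edx:eax, low half in eax
: [ylo] "g" ((uint32_t)y), [yhi] "g" ((uint32_t)(y >> 32))
: "cc"
);
return x;
}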