
The underlying architecture of my cloud server changed recently, and it looks like I'm running into trouble with code compiled with gcc -march=native (Ubuntu 14.04, gcc 4.8).

It used to always run on a 16-core Intel Xeon E5-2650 v2, and now from time to time (depending on availability, I guess) it gets a much faster Xeon E5-2650 v3 instead.

The binary works, but there now seems to be strange undefined behavior in the lockless thread synchronization code, which used to work 100% before. The only thing I can think of is that the code somehow got compiled for v3 and run on v2 (or the other way round), and there's some incompatibility between the two.

I'd rather avoid this in the future. Is there a good way to detect binaries that were compiled for the wrong architecture?


Edit: I checked, gcc docs are pretty clear about -march=native:

-march=cpu-type
Generate instructions for the machine type cpu-type. In contrast to -mtune=cpu-type, which merely tunes the generated code for the specified cpu-type, -march=cpu-type allows GCC to generate code that may not run at all on processors other than the one indicated. Specifying -march=cpu-type implies -mtune=cpu-type.

‘native’
This selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine. Using -march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines).

The difference between v2 and v3 looks fairly substantial judging by the flags gcc -march=native uses under the hood (strings prg | grep march if prg was compiled with debugging symbols). v3 adds all of these and changes l2-cache-size:

-mabm -mavx2 -mbmi -mbmi2 -mfma -mlzcnt -mmovbe

If any of these new instruction sets gets used, it'd be like compiling something for MMX and expecting it to run on a non-MMX arch...
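One way to catch such a mismatch at startup is to compare the extensions the compiler was allowed to use (visible as predefined macros such as `__AVX2__`) against what the running CPU reports through GCC's `__builtin_cpu_supports()`. The following is only a minimal sketch: `count_missing_cpu_features` is a hypothetical helper name, and the list of checked extensions is deliberately short because older GCC releases recognise only a limited set of feature names (FMA and BMI are not checked here for that reason).

```c
#include <stdio.h>

/* Hypothetical helper: counts instruction-set extensions this binary was
 * compiled to use (according to GCC's predefined macros) that the running
 * CPU does not report. The list is illustrative, not exhaustive. */
int count_missing_cpu_features(void)
{
    int missing = 0;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* populate the CPU feature data first */

#ifdef __AVX2__
    if (!__builtin_cpu_supports("avx2")) {
        fprintf(stderr, "compiled with AVX2, but this CPU lacks it\n");
        missing++;
    }
#endif
#ifdef __AVX__
    if (!__builtin_cpu_supports("avx")) {
        fprintf(stderr, "compiled with AVX, but this CPU lacks it\n");
        missing++;
    }
#endif
#ifdef __SSE4_2__
    if (!__builtin_cpu_supports("sse4.2")) {
        fprintf(stderr, "compiled with SSE4.2, but this CPU lacks it\n");
        missing++;
    }
#endif
#endif /* GCC-compatible compiler on x86 */

    return missing;
}
```

Calling this at the top of `main()` and exiting when it returns non-zero gives a readable error in place of a crash. It is not watertight: if the compiler happens to emit an unsupported instruction before the check runs, the process still dies with SIGILL.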

Does the code work 100% again if the binary is rebuilt? It sounds more like a software bug that was exposed by a CPU change. - that other guy
Trying to reproduce, but it's hard to test (I can't just choose which CPU I'm getting...). Could be a bug, but unlikely; the software has been stable on different architectures. - lemonsqueeze
@lemonsqueeze The x86 memory model hasn't changed recently. You could have a look at the errata sheet for the newer CPU and see if there's anything suspicious, but I strongly suspect it's a software bug. - EOF
If the new instruction sets get used, it will SIGILL on an architecture that doesn't support them. - caf
And it's not getting SIGILL'ed, so it's more likely a bug... - lemonsqueeze

1 Answer


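GCC defines a predefined macro for the architecture it targets; for example, -march=core2 defines the macro __core2. You could add a command-line option to your program that uses these macros to show the architecture it was compiled for. In this case you would check for the Ivy Bridge and Haswell macros (__ivybridge and __haswell; the exact names can differ between GCC versions, and gcc -march=native -dM -E - < /dev/null lists every macro that is actually defined on a given machine).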
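As a rough illustration of the idea, the sketch below reports the build target using per-extension macros (`__AVX2__`, `__FMA__`, and so on) rather than the architecture-name macros, since the former are stable across GCC versions. The function name `build_target_features` is made up for this example, and the set of macros checked is an assumption based on the flags listed in the question.

```c
/* Hypothetical helper: returns a string literal describing which x86
 * extensions were enabled when this translation unit was compiled.
 * Adjacent string literals are concatenated by the compiler, so each
 * #ifdef simply adds one more word to the result. */
const char *build_target_features(void)
{
    return "features:"
#ifdef __AVX__
        " avx"
#endif
#ifdef __AVX2__
        " avx2"
#endif
#ifdef __FMA__
        " fma"
#endif
#ifdef __BMI__
        " bmi"
#endif
#ifdef __BMI2__
        " bmi2"
#endif
#ifdef __LZCNT__
        " lzcnt"
#endif
        ;
}
```

Printing this string from something like a `--version` option would show whether a given binary was built on the v2 or the v3 machine: with the flags from the question, only the v3 build would list avx2, fma, bmi, bmi2 and lzcnt.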

Alternatively (or as well), you could have the program show the compiler flags that were used to compile it (by having your build system provide those flags to the program). That won't help much if you're using -march=native, though, because the flag string would just say "native".
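A minimal sketch of passing the flags in from the build system, assuming a made-up macro name `BUILD_CFLAGS` that the build defines on the command line, for example with `gcc -DBUILD_CFLAGS='"-O2 -march=native"' -O2 -march=native prog.c`:

```c
/* BUILD_CFLAGS is a hypothetical macro that the build system is assumed
 * to define as a string literal containing the compiler flags. Fall back
 * to a placeholder so the file still compiles without it. */
#ifndef BUILD_CFLAGS
#define BUILD_CFLAGS "unknown"
#endif

/* Returns the flags recorded at build time, for a --version style option. */
const char *build_cflags(void)
{
    return BUILD_CFLAGS;
}
```

To get around the -march=native limitation, the build could record the output of `gcc -march=native -Q --help=target` instead, which shows the target options the compiler resolved on the build machine.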

It does sound more likely that your issue is a latent bug in the program though, especially since it involves thread synchronisation - the newer machine is likely more aggressively reordering memory accesses, and this exposes a missing barrier in your lockless algorithm.
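For illustration only, since the asker's code is not shown: a typical place where a barrier goes missing is a flag-based handoff, where one thread writes data and then sets a flag that another thread polls. The sketch below uses GCC's `__atomic` builtins with release/acquire ordering; the names `payload`, `ready`, `publish` and `try_consume` are invented for this example. With plain, non-atomic accesses to `ready`, the compiler is free to reorder or cache the accesses, and such a bug can stay hidden until timing changes on a different CPU.

```c
static int payload;  /* data being handed over (hypothetical) */
static int ready;    /* flag: non-zero once payload is valid  */

/* Producer: write the data, then publish it. The release store keeps
 * the payload write from being reordered after the flag write. */
void publish(int value)
{
    payload = value;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Consumer: returns 1 and fills *out if data has been published,
 * 0 otherwise. The acquire load keeps the payload read from being
 * reordered before the flag read. */
int try_consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        return 0;
    *out = payload;
    return 1;
}
```

If the existing lockless code relies on plain loads and stores, or on `volatile` alone, auditing each handoff point against a pattern like this is a reasonable first step.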