The VC++ compiler is less smart than you think it is. Here’s how these settings work.
When you’re building 32-bit code and enable SSE1 or SSE2, the compiler enables **automatic** vectorization into the respective instruction set.
When you’re building 64-bit code, both SSE1 and SSE2 are part of the base instruction set; every AMD64 processor in the world is required to support both of them. That’s why you’re getting the warning with `/arch:SSE2`.
When you set up AVX the compiler does 2 things: it enables **automatic** vectorization into AVX1, and it switches instruction encoding (for all of them: SSE and AVX, manually vectorized and auto-vectorized) from legacy to VEX. VEX is good stuff: it lets the compiler fuse unaligned RAM reads into other instructions as memory operands. It also solves dependency issues which may affect performance: VEX-encoded `vaddps xmm0, xmm0, xmm1` zeroes out the higher 16 bytes of `ymm0`, while legacy-encoded `addps xmm0, xmm1` keeps the data there.
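To make the load-fusing point concrete, here’s a minimal sketch. The function name `addUnaligned` is made up for illustration, and the instruction sequences in the comments are what I’d typically expect from an optimized build, not a guarantee; check the disassembly of your own build.

```cpp
#include <immintrin.h>

// Adds four floats read from a possibly unaligned address to an accumulator.
// The source code is identical for both encodings; only the /arch setting changes.
inline __m128 addUnaligned(__m128 acc, const float* p)
{
    // Legacy SSE encoding: a 16-byte memory operand of addps must be aligned,
    // so the compiler usually needs two instructions:
    //     movups xmm1, [p]
    //     addps  xmm0, xmm1
    // VEX encoding (/arch:AVX or higher): memory operands may be unaligned,
    // so the load can be folded into the arithmetic instruction:
    //     vaddps xmm0, xmm0, [p]
    return _mm_add_ps(acc, _mm_loadu_ps(p));
}
```

The result is the same either way; the difference is only in code size and the number of instructions.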
When you set up AVX2 it does a few minor optimizations, most notably intrinsics like `_mm_set1_epi32` may compile into `vpbroadcastd`. It also switches encoding to VEX, same as for AVX1.
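Here’s a small sketch of the kind of code affected. The wrapper name `broadcast32` is hypothetical, and the instructions in the comments are the typical outcome rather than something the compiler promises.

```cpp
#include <immintrin.h>
#include <cstdint>

// Broadcasts one 32-bit integer into all four lanes of an SSE register.
// _mm_set1_epi32 is an SSE2 intrinsic, so this compiles with any /arch setting.
inline __m128i broadcast32(std::int32_t v)
{
    // Without AVX2 this is usually a movd followed by a pshufd shuffle.
    // With /arch:AVX2 the compiler may emit vmovd + vpbroadcastd instead.
    return _mm_set1_epi32(v);
}
```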
Note I marked automatic in bold. The Microsoft compiler doesn’t do runtime dispatch or cpuid checks, and the automatic vectorizer doesn’t use SSE3 or 4.1. If you’re writing manually vectorized code, the compiler won’t generate fallbacks; it will emit whatever instructions you asked for. When set, the AVX/AVX2 option only affects their encoding.
If you want to write manually vectorized code that uses SSE3, SSSE3, SSE 4.1, FMA3, AES, SHA, etc., you don’t need to enable anything. You just need to include the relevant headers, and ideally verify at runtime that the CPU supports them. For the last part, I usually call `__cpuid` early on startup and check these bits. This is to show a comprehensible error message about an unsupported CPU, instead of a hard crash later.
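A minimal sketch of such a startup check, using SSE 4.1 as the example. The function names `cpuHasSse41` and `checkCpuOrReport` are made up, and the non-MSVC branch (using `__get_cpuid` from `<cpuid.h>`) is my addition so the snippet also builds with GCC and clang. The bit position comes from the CPUID documentation: leaf 1, ECX bit 19. Note that AVX and AVX2 need more than a single bit test, because the OS must also enable the wider registers; that part is not shown here.

```cpp
#include <cstdio>
#ifdef _MSC_VER
#include <intrin.h>   // __cpuid
#else
#include <cpuid.h>    // __get_cpuid (GCC / clang)
#endif

// Returns the ECX register of CPUID leaf 1 (feature flags), or 0 on failure.
static unsigned cpuidLeaf1Ecx()
{
#ifdef _MSC_VER
    int regs[4] = { 0, 0, 0, 0 };   // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return static_cast<unsigned>(regs[2]);
#else
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return ecx;
#endif
}

// SSE 4.1 support is reported in bit 19 of ECX.
bool cpuHasSse41()
{
    return ((cpuidLeaf1Ecx() >> 19) & 1u) != 0;
}

// Hypothetical startup helper: prints a readable message instead of letting
// the process die later on an illegal instruction.
bool checkCpuOrReport()
{
    if (cpuHasSse41())
        return true;
    std::fputs("This program requires a CPU with SSE 4.1 support.\n", stderr);
    return false;
}
```

Call `checkCpuOrReport()` at the very beginning of `main` and exit with an error code when it returns false, before any code path that uses the intrinsics gets a chance to run.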
`-O3 -msse4.1` or `-O3 -march=penryn` – Peter Cordes