Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible to find such a list online. Correct me if I'm wrong.

5 Answers

189
votes

These days you should normally just include <immintrin.h>. It includes everything except a few AMD-only extensions such as SSE4A.

GCC and clang will stop you from using intrinsics for instructions you haven't enabled at compile time (e.g. with -march=native or -mavx2 -mbmi2 -mpopcnt -mfma -mcx16 -mtune=znver1 or whatever).

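MSVC and ICC will let you use intrinsics without enabling anything at compile time. You should still enable AVX (e.g. MSVC's /arch:AVX) before using AVX intrinsics, though. Otherwise the compiler mixes legacy-SSE and AVX instruction encodings, which can be slow.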


Historically (before immintrin.h pulled in everything) you had to manually include a header for the highest level of intrinsics you wanted.

This may still be useful with MSVC and ICC to stop yourself from using instruction sets you don't want to require.

<mmintrin.h>  MMX
<xmmintrin.h> SSE
<emmintrin.h> SSE2
<pmmintrin.h> SSE3
<tmmintrin.h> SSSE3
<smmintrin.h> SSE4.1
<nmmintrin.h> SSE4.2
<ammintrin.h> SSE4A
<wmmintrin.h> AES
<immintrin.h> AVX, AVX2, FMA

Including one of these pulls in all the previous ones (except AMD-only SSE4A: immintrin.h doesn't pull that in).

Some compilers also have <zmmintrin.h> for AVX512.

83
votes

On GCC/clang, if you use just

#include <x86intrin.h>

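it will include all the SSE/AVX headers. Older GCC versions included only the headers enabled by compiler switches like -march=haswell or just -march=native. Current GCC and clang include them all, but you can still only use the intrinsics for instruction sets those switches enable. Additionally, some x86-specific instructions like bswap or ror become available as intrinsics.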


The MSVC equivalent of this header is <intrin.h>.


If you just want SIMD intrinsics that are portable across compilers, use #include <immintrin.h>.

MSVC, ICC, and gcc/clang (and other compilers like Sun I think) all support this header for the SIMD intrinsics documented by Intel's online intrinsics finder / search tool: https://software.intel.com/sites/landingpage/IntrinsicsGuide/

57
votes

The header name depends on your compiler and target architecture.

  • For Microsoft C++ (targeting x86, x86-64 or ARM) and Intel C/C++ Compiler for Windows use intrin.h
  • For gcc/clang/icc targeting x86/x86-64 use x86intrin.h
  • For gcc/clang/armcc targeting ARM with NEON use arm_neon.h
  • For gcc/clang/armcc targeting ARM with WMMX use mmintrin.h
  • For gcc/clang/xlcc targeting PowerPC with VMX (aka Altivec) and/or VSX use altivec.h
  • For gcc/clang targeting PowerPC with SPE use spe.h

You can handle all these cases with conditional preprocessing directives:

#if defined(_MSC_VER)
     /* Microsoft C/C++-compatible compiler */
     #include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
     /* GCC-compatible compiler, targeting x86/x86-64 */
     #include <x86intrin.h>
#elif defined(__GNUC__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
     /* GCC-compatible compiler, targeting ARM with NEON (__ARM_NEON also covers AArch64) */
     #include <arm_neon.h>
#elif defined(__GNUC__) && defined(__IWMMXT__)
     /* GCC-compatible compiler, targeting ARM with WMMX */
     #include <mmintrin.h>
#elif (defined(__GNUC__) || defined(__xlC__)) && (defined(__VEC__) || defined(__ALTIVEC__))
     /* XLC or GCC-compatible compiler, targeting PowerPC with VMX/VSX */
     #include <altivec.h>
#elif defined(__GNUC__) && defined(__SPE__)
     /* GCC-compatible compiler, targeting PowerPC with SPE */
     #include <spe.h>
#endif
47
votes

From this page

+----------------+------------------------------------------------------------------------------------------+
|     Header     |                                         Purpose                                          |
+----------------+------------------------------------------------------------------------------------------+
| x86intrin.h    | Everything, including non-vector x86 instructions like _rdtsc().                         |
| mmintrin.h     | MMX (Pentium MMX!)                                                                       |
| mm3dnow.h      | 3dnow! (K6-2) (deprecated)                                                               |
| xmmintrin.h    | SSE + MMX (Pentium 3, Athlon XP)                                                         |
| emmintrin.h    | SSE2 + SSE + MMX (Pentium 4, Athlon 64)                                                  |
| pmmintrin.h    | SSE3 + SSE2 + SSE + MMX (Pentium 4 Prescott, Athlon 64 San Diego)                        |
| tmmintrin.h    | SSSE3 + SSE3 + SSE2 + SSE + MMX (Core 2, Bulldozer)                                      |
| popcntintrin.h | POPCNT (Nehalem (Core i7), Phenom)                                                       |
| ammintrin.h    | SSE4A + SSE3 + SSE2 + SSE + MMX (AMD-only, starting with Phenom)                         |
| smmintrin.h    | SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Penryn, Bulldozer)                             |
| nmmintrin.h    | SSE4_2 + SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Nehalem (aka Core i7), Bulldozer)     |
| wmmintrin.h    | AES (Core i7 Westmere, Bulldozer)                                                        |
| immintrin.h    | AVX, AVX2, AVX512, all SSE+MMX (except SSE4A and XOP), popcnt, BMI/BMI2, FMA             |
+----------------+------------------------------------------------------------------------------------------+

So in general you can just include immintrin.h to get all Intel extensions, or x86intrin.h if you want everything. That includes _bit_scan_forward and _rdtsc, as well as all vector intrinsics, including the AMD-only ones. If you are against including more than you actually need, you can pick the right include by looking at the table.

x86intrin.h is the recommended way to get intrinsics for AMD XOP (Bulldozer-family only; later AMD CPUs dropped it). XOP has no standalone header meant for direct inclusion.

Some compilers will still generate error messages if you use intrinsics for instruction sets you haven't enabled (e.g. _mm_fmadd_ps without enabling FMA, even if you include immintrin.h and enable AVX2).

14
votes

20200914: latest best practice: <immintrin.h> (also supported by MSVC)

I'll leave the rest of the answer for historic purposes; it might be useful for older compiler / platform combinations...


As many of the answers and comments have stated, <x86intrin.h> is the comprehensive header for x86[-64] SIMD intrinsics. It also provides intrinsics supporting instructions for other ISA extensions. gcc, clang, and icc have all settled on this. I needed to do some digging on versions that support the header, and thought it might be useful to list some findings...

  • gcc : support for x86intrin.h first appears in gcc-4.5.0. The gcc-4 release series is no longer being maintained, while gcc-6.x is the current stable release series. gcc-5 also introduced the __has_include extension present in all clang-3.x releases. gcc-7 is in pre-release (regression testing, etc.) and following the current versioning scheme, will be released as gcc-7.1.0.

  • clang : x86intrin.h appears to have been supported for all clang-3.x releases. The latest stable release is clang (LLVM) 3.9.1. The development branch is clang (LLVM) 5.0.0. It's not clear what's happened to the 4.x series.

  • Apple clang : annoyingly, Apple's versioning doesn't correspond with that of the LLVM projects. That said, the current release: clang-800.0.42.1, is based on LLVM 3.9.0. The first LLVM 3.0 based version appears to be Apple clang 2.1 back in Xcode 4.1. LLVM 3.1 first appears with Apple clang 3.1 (a numeric coincidence) in Xcode 4.3.3.

    Apple also defines __apple_build_version__ e.g., 8000042. This seems about the most stable, strictly ascending versioning scheme available. If you don't want to support legacy compilers, make one of these values a minimum requirement.

Any recent version of clang, including Apple versions, should therefore have no issue with x86intrin.h. With gcc-5 or later (or any clang), you can also test for the header directly:

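/* Test __has_include in its own #if: a compiler lacking it can't parse __has_include(<...>) */
#if defined(__has_include)
#  if __has_include(<x86intrin.h>)
#    include <x86intrin.h>
#  else
#    error "upgrade your compiler. it's free..."
#  endif
#else
#  error "upgrade your compiler. it's free..."
#endif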

One trick you can't really rely on is checking the __GNUC__ version macros in clang. For historical reasons, clang reports itself as GCC 4.2.1, a version that predates the x86intrin.h header. The trick is occasionally useful for, say, simple GNU C extensions that have remained backwards compatible.

  • icc : as far as I can tell, the x86intrin.h header is supported since at least Intel C++ 16.0. The version test can be performed with: #if (__INTEL_COMPILER >= 1600). This version (and possibly earlier versions) also provides support for the __has_include extension.

  • MSVC : MSVC provides the intrin.h header, not x86intrin.h. intrin.h has shipped since Visual C++ 2005 (MSVC++ 8.0), which suggests #if (_MSC_VER >= 1400) as a version test. Of course, if you're trying to write code that's portable across all these different compilers, the header name on this platform will be the least of your problems.