Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible to find such a list online. Correct me if I'm wrong.
These days you should normally just include <immintrin.h>. It includes all of Intel's SIMD intrinsics (but not AMD-only extensions such as SSE4A).
GCC and clang will stop you from using intrinsics for instructions you haven't enabled at compile time (e.g. with -march=native or -mavx2 -mbmi2 -mpopcnt -mfma -mcx16 -mtune=znver1 or whatever).
MSVC and ICC will let you use intrinsics without enabling anything at compile time, but you still should enable AVX before using AVX intrinsics.
Historically (before immintrin.h pulled in everything) you had to manually include a header for the highest level of intrinsics you wanted.
This may still be useful with MSVC and ICC to stop yourself from using instruction-sets you don't want to require.
<mmintrin.h> MMX
<xmmintrin.h> SSE
<emmintrin.h> SSE2
<pmmintrin.h> SSE3
<tmmintrin.h> SSSE3
<smmintrin.h> SSE4.1
<nmmintrin.h> SSE4.2
<ammintrin.h> SSE4A
<wmmintrin.h> AES
<immintrin.h> AVX, AVX2, FMA
Including one of these pulls in all previous ones (except AMD-only SSE4A: immintrin.h doesn't pull that in).
Some compilers also have <zmmintrin.h> for AVX512.
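To illustrate including only the header for the level you need, here is a minimal sketch that includes just <emmintrin.h> and adds four 32-bit integers with one SSE2 instruction. It assumes an x86-64 target, where SSE2 is baseline, so no extra compiler flags are required; the function name add4_i32 is only for illustration.

```c
#include <emmintrin.h>  /* SSE2 only; also pulls in xmmintrin.h (SSE) and mmintrin.h (MMX) */
#include <stdint.h>

/* out[i] = a[i] + b[i] for i = 0..3, using a single SSE2 paddd instruction.
   Unaligned loads/stores, so the arrays need no special alignment. */
void add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Calling add4_i32 on {1, 2, 3, 4} and {10, 20, 30, 40} fills out with {11, 22, 33, 44}.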
On GCC/clang, if you use just
#include <x86intrin.h>
it will include all SSE/AVX headers that are enabled by compiler switches like -march=haswell or just -march=native. Additionally, some x86-specific instructions like bswap or ror become available as intrinsics.
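A minimal GCC-oriented sketch of those scalar helpers: GCC's x86intrin.h pulls in ia32intrin.h, which declares __bswapd (byte swap) and __rord (32-bit rotate right). Recent clang releases ship the same names, but older ones may not, so treat their availability as an assumption and check your compiler's headers. The wrapper names are hypothetical.

```c
/* GCC/clang on x86: x86intrin.h also pulls in scalar helpers (GCC: ia32intrin.h).
   Assumption: your compiler's headers define __bswapd and __rord (GCC does;
   older clang releases may not). */
#include <x86intrin.h>

/* Reverse the byte order of a 32-bit value (typically a single bswap). */
int swap_bytes32(int x)
{
    return __bswapd(x);
}

/* Rotate a 32-bit value right by n bits (typically a single ror). */
unsigned int rotate_right32(unsigned int x, int n)
{
    return __rord(x, n);
}
```

For example, swap_bytes32(0x12345678) gives 0x78563412, and rotate_right32(1u, 1) moves the low bit to the top, giving 0x80000000.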
The MSVC equivalent of this header is <intrin.h>.
If you just want portable SIMD, use #include <immintrin.h>. MSVC, ICC, and gcc/clang (and other compilers like Sun, I think) all support this header for the SIMD intrinsics documented by Intel's online intrinsics finder / search tool: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
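A minimal sketch of such portable code: it adds two float arrays four lanes at a time using only intrinsics from Intel's guide, so the same source should build with MSVC, ICC, GCC and clang. It assumes an x86-64 target, where SSE is always enabled (32-bit GCC builds would additionally need -msse); add_floats is a hypothetical name.

```c
#include <immintrin.h>  /* one header for all of Intel's SIMD intrinsics */

/* out[i] = a[i] + b[i] for i = 0..n-1: four floats per SSE add,
   then a scalar loop for the leftover elements. */
void add_floats(const float *a, const float *b, float *out, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned loads are fine */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)                     /* scalar tail */
        out[i] = a[i] + b[i];
}
```

With n = 6, the first four elements go through the SSE loop and the last two through the scalar tail.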
The header name depends on your compiler and target architecture.
- For Microsoft C++ (targeting x86, x86-64 or ARM) and Intel C/C++ Compiler for Windows use intrin.h
- For gcc/clang/icc targeting x86/x86-64 use x86intrin.h
- For gcc/clang/armcc targeting ARM with NEON use arm_neon.h
- For gcc/clang/armcc targeting ARM with WMMX use mmintrin.h
- For gcc/clang/xlcc targeting PowerPC with VMX (aka Altivec) and/or VSX use altivec.h
- For gcc/clang targeting PowerPC with SPE use spe.h
You can handle all these cases with conditional preprocessing directives:
#if defined(_MSC_VER)
/* Microsoft C/C++-compatible compiler */
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC-compatible compiler, targeting x86/x86-64 */
#include <x86intrin.h>
#elif defined(__GNUC__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
/* GCC-compatible compiler, targeting ARM with NEON (__ARM_NEON is the ACLE macro; AArch64 GCC defines only that one) */
#include <arm_neon.h>
#elif defined(__GNUC__) && defined(__IWMMXT__)
/* GCC-compatible compiler, targeting ARM with WMMX */
#include <mmintrin.h>
#elif (defined(__GNUC__) || defined(__xlC__)) && (defined(__VEC__) || defined(__ALTIVEC__))
/* XLC or GCC-compatible compiler, targeting PowerPC with VMX/VSX */
#include <altivec.h>
#elif defined(__GNUC__) && defined(__SPE__)
/* GCC-compatible compiler, targeting PowerPC with SPE */
#include <spe.h>
#endif
From this page
+----------------+------------------------------------------------------------------------------------------+
| Header | Purpose |
+----------------+------------------------------------------------------------------------------------------+
| x86intrin.h | Everything, including non-vector x86 instructions like _rdtsc(). |
| mmintrin.h | MMX (Pentium MMX!) |
| mm3dnow.h | 3dnow! (K6-2) (deprecated) |
| xmmintrin.h | SSE + MMX (Pentium 3, Athlon XP) |
| emmintrin.h | SSE2 + SSE + MMX (Pentium 4, Athlon 64) |
| pmmintrin.h | SSE3 + SSE2 + SSE + MMX (Pentium 4 Prescott, Athlon 64 San Diego) |
| tmmintrin.h | SSSE3 + SSE3 + SSE2 + SSE + MMX (Core 2, Bulldozer) |
| popcntintrin.h | POPCNT (Nehalem (Core i7), Phenom) |
| ammintrin.h | SSE4A + SSE3 + SSE2 + SSE + MMX (AMD-only, starting with Phenom) |
| smmintrin.h | SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Penryn, Bulldozer) |
| nmmintrin.h | SSE4_2 + SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Nehalem (aka Core i7), Bulldozer) |
| wmmintrin.h | AES (Core i7 Westmere, Bulldozer) |
| immintrin.h | AVX, AVX2, AVX512, all SSE+MMX (except SSE4A and XOP), popcnt, BMI/BMI2, FMA |
+----------------+------------------------------------------------------------------------------------------+
So in general you can just include immintrin.h to get all Intel extensions, or x86intrin.h if you want everything, including _bit_scan_forward and _rdtsc, as well as all vector intrinsics including AMD-only ones. If you are against including more than you actually need, you can pick the right include by looking at the table.
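For example, this sketch (GCC/clang, x86intrin.h) uses two of the non-vector intrinsics mentioned above. It assumes your compiler's x86intrin.h provides _bit_scan_forward and __rdtsc (the function behind the _rdtsc macro), as GCC's and current clang's do; the wrapper names are made up for illustration.

```c
#include <x86intrin.h>  /* GCC/clang: vector and scalar x86 intrinsics */

/* Index of the lowest set bit (the bsf instruction). Like bsf itself,
   the result is undefined for x == 0, so callers must check first. */
int lowest_set_bit(int x)
{
    return _bit_scan_forward(x);
}

/* Raw time-stamp counter (rdtsc). Only the difference between two reads
   is meaningful, and it does not necessarily count core clock cycles. */
unsigned long long read_tsc(void)
{
    return __rdtsc();
}
```

lowest_set_bit(0x18) returns 3, since 0x18 is binary 11000.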
For AMD XOP (Bulldozer-family only, not supported by later AMD CPUs), x86intrin.h is the recommended way to get the intrinsics, rather than XOP having its own header.
Some compilers will still generate error messages if you use intrinsics for instruction sets you haven't enabled (e.g. _mm_fmadd_ps without enabling FMA, even if you include immintrin.h and enable AVX2).
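If you don't want to enable FMA for the whole translation unit, GCC and clang also let you enable it for a single function with __attribute__((target("fma"))) and check the CPU at run time with __builtin_cpu_supports. This is a sketch of that approach with a plain scalar fallback. It is GCC/clang only (MSVC has neither extension), and the function names are hypothetical.

```c
#include <immintrin.h>

/* GCC/clang only: enable FMA for this one function, as if it were compiled with
   -mfma. Without the attribute (or -mfma), GCC refuses to inline _mm_fmadd_ps
   and reports a "target specific option mismatch" error. */
__attribute__((target("fma")))
static void muladd4_fma(const float *a, const float *b, const float *c, float *out)
{
    __m128 r = _mm_fmadd_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), _mm_loadu_ps(c));
    _mm_storeu_ps(out, r);  /* out[i] = a[i] * b[i] + c[i], rounded once */
}

/* out[i] = a[i] * b[i] + c[i] for i = 0..3. Uses FMA only if the CPU running
   the program supports it; returns 1 if the FMA path was taken, else 0. */
int muladd4(const float *a, const float *b, const float *c, float *out)
{
    if (__builtin_cpu_supports("fma")) {
        muladd4_fma(a, b, c, out);
        return 1;
    }
    for (int i = 0; i < 4; ++i)  /* scalar fallback */
        out[i] = a[i] * b[i] + c[i];
    return 0;
}
```

The rest of the file can be compiled without -mfma, so the binary still runs on CPUs that lack FMA.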
2020-09-14: latest best practice: <immintrin.h> (also supported by MSVC).
I'll leave the rest of the answer for historic purposes; it might be useful for older compiler / platform combinations...
As many of the answers and comments have stated, <x86intrin.h> is the comprehensive header for x86[-64] SIMD intrinsics. It also provides intrinsics supporting instructions for other ISA extensions. gcc, clang, and icc have all settled on this. I needed to do some digging on versions that support the header, and thought it might be useful to list some findings...
gcc: support for x86intrin.h first appears in gcc-4.5.0. The gcc-4 release series is no longer being maintained, while gcc-6.x is the current stable release series. gcc-5 also introduced the __has_include extension present in all clang-3.x releases. gcc-7 is in pre-release (regression testing, etc.) and, following the current versioning scheme, will be released as gcc-7.1.0.
clang: x86intrin.h appears to have been supported for all clang-3.x releases. The latest stable release is clang (LLVM) 3.9.1. The development branch is clang (LLVM) 5.0.0. It's not clear what's happened to the 4.x series.
Apple clang: annoyingly, Apple's versioning doesn't correspond with that of the LLVM project. That said, the current release, clang-800.0.42.1, is based on LLVM 3.9.0. The first LLVM 3.0 based version appears to be Apple clang 2.1, back in Xcode 4.1. LLVM 3.1 first appears with Apple clang 3.1 (a numeric coincidence) in Xcode 4.3.3.
Apple also defines __apple_build_version__, e.g., 8000042. This seems about the most stable, strictly ascending versioning scheme available. If you don't want to support legacy compilers, make one of these values a minimum requirement.
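A minimal sketch of such a minimum-version check. It uses the 8000042 example value mentioned above as an assumed cut-off; pick whatever build number matches the oldest Apple clang you actually support. The helper apple_build_version is hypothetical and only there to make the value inspectable.

```c
/* Assumed cut-off: the example build number 8000042 mentioned above.
   Non-Apple compilers don't define __apple_build_version__ and skip the check. */
#if defined(__apple_build_version__) && (__apple_build_version__ < 8000042)
#error "Apple clang is too old: need __apple_build_version__ >= 8000042"
#endif

/* Returns Apple's clang build number, or 0 when built with another compiler. */
long apple_build_version(void)
{
#if defined(__apple_build_version__)
    return __apple_build_version__;
#else
    return 0;
#endif
}
```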
Any recent version of clang, including Apple versions, should therefore have no issue with x86intrin.h. Of course, with clang or gcc-5 and later, you can always use the following:
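/* Nested #if: on a compiler without __has_include, the expression
   __has_include(<x86intrin.h>) is a syntax error even after a false defined() test */
#if defined(__has_include)
#  if __has_include(<x86intrin.h>)
#    include <x86intrin.h>
#  else
#    error "upgrade your compiler. it's free..."
#  endif
#else
#  error "upgrade your compiler. it's free..."
#endif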
One trick you can't really rely on is using the __GNUC__ version macros in clang. For historical reasons, clang reports 4.2.1, a version that predates the x86intrin.h header. It's occasionally useful for, say, simple GNU C extensions that have remained backwards compatible.
icc: as far as I can tell, the x86intrin.h header is supported since at least Intel C++ 16.0. The version test can be performed with: #if (__INTEL_COMPILER >= 1600). This version (and possibly earlier versions) also provides support for the __has_include extension.
MSVC: It appears that MSVC++ 12.0 (Visual Studio 2013) is the first version to provide the intrin.h header (not x86intrin.h), which suggests #if (_MSC_VER >= 1800) as a version test. Of course, if you're trying to write code that's portable across all these different compilers, the header name on this platform will be the least of your problems.
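Putting the icc version test and the MSVC header name together gives something like the following sketch. It assumes, as described above, that Intel C++ 16.0 is the minimum for x86intrin.h, and it targets x86-64 so SSE2 is available; first_lane_sum is a hypothetical helper that only shows the SIMD intrinsics are usable after either include.

```c
#if defined(_MSC_VER)
#  include <intrin.h>      /* MSVC: intrin.h, not x86intrin.h */
#elif defined(__INTEL_COMPILER) && (__INTEL_COMPILER < 1600)
#  error "assumed minimum for x86intrin.h: Intel C++ 16.0"
#else
#  include <x86intrin.h>   /* gcc, clang, newer icc */
#endif

/* Either header exposes the Intel SIMD intrinsics: add a + b in all four
   32-bit lanes with SSE2, then return lane 0. */
int first_lane_sum(int a, int b)
{
    __m128i v = _mm_add_epi32(_mm_set1_epi32(a), _mm_set1_epi32(b));
    return _mm_cvtsi128_si32(v);
}
```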