1
votes

I'm reading this document about how to compile C/C++ code using the Intel C++ compiler and AVX512 support on a Intel Knights Landing.

However, I'm a little bit confused about this part:

-xMIC-AVX512: use this option to generate AVX-512F, AVX-512CD, AVX-512ER and AVX-512FP.

-xCORE-AVX512: use this option to generate AVX-512F, AVX-512CD, AVX-512BW, AVX-512DQ and AVX-512VL.

For example, to generate Intel AVX-512 instructions for the Intel Xeon Phi processor x200, you should use the option –xMIC-AVX512. For example, on a Linux system

$ icc –xMIC-AVX512 application.c This compiler option is useful when you want to build a huge binary for the Intel Xeon Phi processor x200. Instead of building it on the coprocessor where it will take more time, build it on an Intel Xeon processor-based machine

My Xeon Phi KNL doesn't have a coprocessor (No need to ssh micX or to compile with the -mmic flag). However, I don't understand if it's better to use the -xMIC or -xCORE?

In second place about -ax instead of -x:

This compiler option is useful when you try to build a binary that can run on multiple platforms.

So -ax is used for cross-platform support, but is there any performance difference comapred to -x?

2
-xCORE will not work on KNL, because it doesn't support AVX-512BW, AVX-512DQ and AVX-512VL.Ilya Verbin

2 Answers

2
votes

For the first question, please use –xMIC-AVX512 if you want to compile for the Intel Xeon Phi processor x200 (aka KNL processor). Note that the phrase in the paper that you mentioned was mistyped, it should read "This compiler option is useful when you want to build a huge binary for the Intel Xeon Phi processor x200. Instead of building it on the Intel Xeon Phi processor x200 where it will take more time, build it on an Intel Xeon processor-based machine."

For the second question, there should not be a performance difference if you run the binaries on an Intel Xeon Phi processor x200. However, the size of the binary complied with -ax should be bigger than the one compiled with -x option.

2
votes

Another option from the link you provide is to build with -xCOMMON-AVX512. This is a tempting option because in my case it has all the instructions that I need and I can use the same option for both a KNL and a Sklake-AVX512 system. Since I don't build on a KNL system I cannot use -xHost (or -march=native with GCC).

However, -xCOMMON-AVX512 should NOT be used with KNL. The reason is that it generates the vzeroupper instruction (https://godbolt.org/z/PgFX55) which is not only not necessary it actually is very slow on a KNL system.

From Agner Fog's micro-architecture manual he writes in the KNL section.

The VZEROALL or VZEROUPPER instructions are not only superfluous here, they are actually harmful for the performance. A VZEROALL or VZEROUPPER instruction takes 36 clock cycles in 64 bit mode...

Therefore for a KNL system you should use -xMIC-AVX512for other systems with AVX512 you should use -xCORE-AVX512 (or -xSKYLAKE-AVX512). I use -qopt-zmm-usage=high as well.

I am not aware of a switch for ICC to disable vzeroupper once it is enabled (with GCC you can use -mno-vzeroupper).

Incidentally, by the same logic you should use -march=knl with GCC and not -mavx512f (-mavx512f -mno-vzeroupper may work if you are sure you don't need AVX512ER or AVX512PF).