I'm trying to understand the VEX prefix encoding for the SSE/AVX instructions. So please bear with me if I ask something simple. I have the following related questions.

Let's take the MOVUP(D/S) instruction (0F 10). If I follow the 2-byte VEX prefix encoding correctly, the following two instruction encodings produce the same result:

db 0fh, 10h, 00000000b              ; movups      xmm0,xmmword ptr [rax]
db 0c5h, 11111000b, 10h, 00000000b  ; vmovups     xmm0,xmmword ptr [rax]

As these two:

db 066h, 0fh, 10h, 00000000b        ; movupd      xmm0,xmmword ptr [rax]
db 0c5h, 11111001b, 10h, 00000000b  ; vmovupd     xmm0,xmmword ptr [rax]
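To make the bit layout of the byte after 0C5h explicit, here is a minimal Python sketch that splits it into its fields. The names `decode_vex2_payload` and `PP_PREFIX` are made up for illustration; the layout assumed is the 2-byte VEX form (inverted R, inverted vvvv, L, pp).

```python
# Implied legacy prefix selected by the 2-bit pp field of a VEX prefix.
PP_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}


def decode_vex2_payload(payload: int) -> dict:
    """Split the byte that follows 0C5h into its fields.

    R and vvvv are stored inverted (one's complement), so they are
    returned raw here; vvvv_inv == 0b1111 means "no extra operand".
    """
    pp = payload & 0b11
    return {
        "R_inv": (payload >> 7) & 1,       # bit 7: inverted REX.R
        "vvvv_inv": (payload >> 3) & 0xF,  # bits 6..3: inverted register specifier
        "L": (payload >> 2) & 1,           # bit 2: 0 = 128-bit, 1 = 256-bit
        "pp": pp,                          # bits 1..0: implied prefix selector
        "implied_prefix": PP_PREFIX[pp],
    }


vmovups_fields = decode_vex2_payload(0b11111000)
vmovupd_fields = decode_vex2_payload(0b11111001)
print(vmovups_fields)
print(vmovupd_fields)
```

The two payloads differ only in pp, which stands in for the 66h prefix that distinguishes movupd from movups in the legacy encoding.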

Thus my questions:

  1. What does the first v stand for in those instructions? Is it just to denote the use of the VEX prefix?

  2. Does it make any difference (with the exception of the length of the instructions) if I use or don't use the VEX prefix in the examples above?

  3. I'm trying to understand Intel's syntax in their documentation. Say, this screenshot:

[screenshot from Intel's manual showing the VEX.128.0F.WIG opcode entry]

In VEX.128.0F.WIG I can see that .128 is bit 2 (L) of the 2nd VEX byte. Then .0F means that, in a 3-byte VEX prefix, the m-mmmm field is 00001, right? But what does the WIG part stand for?

  4. Is the VEX prefix recognized by Intel CPUs only? How about AMD?

  5. Lastly, what is the difference between movups and movupd? It seems like both of them simply move 16 bytes from the source memory into the xmm register, and the "double" or "single" precision packing really doesn't make any difference.

Thanks for your patience with me.

AMD CPUs since Bulldozer have supported AVX (and thus VEX encodings). See agner.org/optimize for more details about x86 microarchitectures. - Peter Cordes
movups vs. movupd makes no difference on any CPU made so far. Some CPUs have domain-crossing latency for integer vs. FP (especially for reg-reg moves), but no CPUs have separate double/single domains. Use movups because it's shorter. - Peter Cordes
W ignored, as opposed to the W bit being meaningful or being mandated as 0 or 1. - harold
@HadiBrais: The manual is wrong. Assuming mov-elimination works at all, throughput = 0.25 for both if even 1 of the 4 uops is eliminated on average. (The mov-elimination success rate is normally much higher.) movups/d same,same would defeat mov-elimination and give you 0.33 throughput. - Peter Cordes
Yup, not exactly rare. But fortunately, between Agner Fog's experimental results + instlatx64, we can usually check Intel's numbers. Or just ignore Intel's because they're wrong more often than Agner cross-checked by instlat, and only Agner Fog tells you which ports instructions run on, which is essential because real code rarely just repeats the same instruction back-to-back (or in a simple loop). Unfortunately Agner's tables have errors too, but they're often surprising enough to make you double-check (e.g. 5 instead of 0.5). (IACA is handy for uop->port stuff, e.g. for SKX.) - Peter Cordes

1 Answer

  1. What does the first v stand for in those instructions? Is it just to denote the use of the VEX prefix?

v stands for the AVX version of the instruction.

  2. Does it make any difference (with the exception of the length of the instructions) if I use or don't use the VEX prefix in the examples above?

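Yes, it does. A VEX-encoded instruction zeroes the upper bits of the destination vector register, whereas the legacy SSE encoding leaves them unchanged. For example, vmovups xmm0, [rax] clears the upper half of ymm0, while movups xmm0, [rax] preserves it.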
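The difference can be pictured with a toy Python model that treats a 256-bit ymm register as an integer. This only illustrates the architectural rule; the function names `legacy_sse_write` and `vex128_write` are hypothetical, and nothing here executes real instructions.

```python
MASK128 = (1 << 128) - 1  # low 128 bits, i.e. the xmm part of a ymm register


def legacy_sse_write(ymm: int, value128: int) -> int:
    """Model of e.g. movups xmm0, [mem]: bits 128 and up are preserved."""
    return (ymm & ~MASK128) | (value128 & MASK128)


def vex128_write(ymm: int, value128: int) -> int:
    """Model of e.g. vmovups xmm0, [mem]: bits 128 and up are zeroed."""
    return value128 & MASK128


# Arbitrary example contents: a nonzero upper half and a nonzero lower half.
old_ymm0 = (0xDEADBEEF << 128) | 0x1111

after_sse = legacy_sse_write(old_ymm0, 0x2222)
after_vex = vex128_write(old_ymm0, 0x2222)

print(hex(after_sse >> 128), hex(after_sse & MASK128))
print(hex(after_vex >> 128), hex(after_vex & MASK128))
```

Both writes leave the same low 128 bits; only the legacy form carries the old upper half along.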

  3. I'm trying to understand Intel's syntax in their documentation. [snip]. But what does the WIG part stand for?

"W" = width flag. "IG" = ignored.

From section "3.1.1.2 Opcode Column in the Instruction Summary Table (Instructions with VEX prefix)" in the manual,

"— WIG: can use C5H form (if not requiring VEX.mmmmm) or VEX.W value is ignored in the C4H form of VEX prefix."

"— If WIG is present, the instruction may be encoded using either the two-byte form or the three-byte form of VEX. When encoding the instruction using the three-byte form of VEX, the value of VEX.W is ignored."
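As a rough illustration of what that means for vmovups xmm0, [rax], the Python sketch below builds the 2-byte form and the 3-byte form with W=0 and W=1. The helper names `vex2` and `vex3` are invented for this example; R, X, B and vvvv are passed in their stored (inverted) form.

```python
def vex2(r_inv: int, vvvv_inv: int, l: int, pp: int) -> bytes:
    """2-byte VEX prefix: C5h, then R' vvvv' L pp."""
    return bytes([0xC5, (r_inv << 7) | (vvvv_inv << 3) | (l << 2) | pp])


def vex3(r_inv: int, x_inv: int, b_inv: int, mmmmm: int,
         w: int, vvvv_inv: int, l: int, pp: int) -> bytes:
    """3-byte VEX prefix: C4h, then R' X' B' mmmmm, then W vvvv' L pp."""
    return bytes([
        0xC4,
        (r_inv << 7) | (x_inv << 6) | (b_inv << 5) | mmmmm,
        (w << 7) | (vvvv_inv << 3) | (l << 2) | pp,
    ])


OPCODE_MODRM = bytes([0x10, 0x00])  # opcode 10h, ModRM 00h = xmm0, [rax]

# VEX.128.0F.WIG: L=0, pp=00 (no implied prefix), mmmmm=00001 (0F map).
two_byte = vex2(1, 0b1111, 0, 0b00) + OPCODE_MODRM
three_byte_w0 = vex3(1, 1, 1, 0b00001, 0, 0b1111, 0, 0b00) + OPCODE_MODRM
three_byte_w1 = vex3(1, 1, 1, 0b00001, 1, 0b1111, 0, 0b00) + OPCODE_MODRM

for encoding in (two_byte, three_byte_w0, three_byte_w1):
    print(encoding.hex(" "))
```

Per the WIG rule quoted above, all three byte sequences should decode as the same instruction; assemblers normally pick the shorter 2-byte form when it is available.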

  4. Is VEX prefix recognized by the Intel CPUs only? How about AMD?

It is recognized by any CPU that supports AVX. Both Intel and AMD have supported it since ~2011 (Intel's Sandy Bridge and later, and AMD's Bulldozer and later).

  5. Lastly, what is the difference between movups and movupd? It seems like both of them simply move 16 bytes from the source memory:

I believe that some processors may maintain flags on the contents of floating point SIMD registers; and using the wrong width/type may cause a stall in some situations.