I'm trying to understand the VEX prefix encoding for the SSE/AVX instructions. So please bear with me if I ask something simple. I have the following related questions.
Let's take the MOVUP(D/S) instruction (0F 10). If I follow the 2-byte VEX prefix encoding correctly:
The following two instruction encodings produce the same result:
db 0fh, 10h, 00000000b ; movups xmm0,xmmword ptr [rax]
db 0c5h, 11111000b, 10h, 00000000b ; vmovups xmm0,xmmword ptr [rax]
As do these two:
db 066h, 0fh, 10h, 00000000b ; movupd xmm0,xmmword ptr [rax]
db 0c5h, 11111001b, 10h, 00000000b ; vmovupd xmm0,xmmword ptr [rax]
Thus my questions:
1. What does the first v stand for in those instructions? Is it just to denote the use of the VEX prefix?
2. Does it make any difference (with the exception of the length of the instructions) if I use or don't use the VEX prefix in the examples above?
3. I'm trying to understand Intel's syntax in their documentation. Say, this screenshot: in VEX.128.0F.WIG I can see that .128 is the bit 2 (L) of the 2nd VEX byte. Then .0F is for a 3-byte VEX prefix, m-mmmm form to be 00001, right? But what does the WIG part stand for?
4. Is the VEX prefix recognized by Intel CPUs only? How about AMD?
5. Lastly, what is the difference between movups and movupd? It seems like both of them simply move 16 bytes from the source memory into the xmm register, and the "double" or "single" precision packing really doesn't make any difference.
Thanks for your patience with me.




movups vs. movupd makes no difference on any CPU made so far. Some CPUs have domain-crossing latency for integer vs. FP (especially for reg-reg moves), but no CPUs have separate double/single domains. Use movups because it's shorter. - Peter Cordes
movups/d same,same would defeat mov-elimination and give you 0.33 throughput. - Peter Cordes